Ignoring whitespace, carriage returns and tab characters between elements when parsing an XML file
I want to parse an XML file with the Xerces-C++ SAX parser while ignoring any white space, carriage returns and tab characters that are NOT inside attribute values or inside the text between a start tag and its end tag. In other words, I want to ignore the white space, carriage returns and tabs that sit between tags.
For instance, in the following XML file:
<tag1 attr1="val 1"><tag2>my text here</tag2>
[many white spaces here] </tag1>
I want to preserve the white space within the strings 'val 1' and 'my text here', but ignore the carriage return and the many whitespace characters between the closing </tag2> and the closing </tag1>.
I tried using a boolean flag 'withinElement' that is set to true in startElement() and set to false in endElement(), but that does not let me ignore the whitespace characters between </tag2> and </tag1>, for instance.
Should that be done in the characters() method? If so, how, given that there does not seem to be a way to know precisely where we are in the document when characters() is invoked?
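You could ask the parser to validate the XML file. You will then get all the ignorable whitespace through the ignorableWhitespace method and the "good" whitespace through characters. Note that this relies on a DTD or schema describing the document, since that is how the parser knows which whitespace is ignorable.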
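If validation is not an option (no DTD or schema), one possible fallback is to buffer the text in characters() and drop the buffer when it turns out to contain only white space. The sketch below is only an illustration of that idea, not Xerces code: WhitespaceFilter and replaySample are hypothetical names, and the callbacks take std::string instead of XMLCh* so that the example compiles without the library. In a real handler the same logic would go into the corresponding methods of your handler class.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for a SAX handler. The three public methods mirror
// the startElement/endElement/characters callbacks, but use std::string
// (an assumption made so the sketch has no Xerces dependency).
class WhitespaceFilter {
public:
    void startElement(const std::string& name) {
        flush();
        events_.push_back("<" + name + ">");
    }

    void endElement(const std::string& name) {
        flush();
        events_.push_back("</" + name + ">");
    }

    // A parser may deliver one text node in several chunks,
    // so only accumulate here and decide later.
    void characters(const std::string& chunk) { pending_ += chunk; }

    const std::vector<std::string>& events() const { return events_; }

private:
    // True when the string is empty or holds only space, tab, CR or LF.
    static bool isBlank(const std::string& s) {
        return s.find_first_not_of(" \t\r\n") == std::string::npos;
    }

    // Called at every tag boundary: keep the buffered text unless it is blank.
    void flush() {
        if (!isBlank(pending_)) {
            events_.push_back("text:" + pending_);
        }
        pending_.clear();
    }

    std::vector<std::string> events_;
    std::string pending_;
};

// Replays the callbacks a parser would roughly produce for the sample
// document in the question and returns what the filter kept.
std::vector<std::string> replaySample() {
    WhitespaceFilter handler;
    handler.startElement("tag1");
    handler.startElement("tag2");
    handler.characters("my text ");
    handler.characters("here");
    handler.endElement("tag2");
    handler.characters("\n        ");  // whitespace between </tag2> and </tag1>
    handler.endElement("tag1");
    return handler.events();
}
```

The white space inside 'my text here' survives because that buffer is not blank as a whole, while the run between </tag2> and </tag1> is discarded. The trade-off is that an element whose content is deliberately white space only (for example <a> </a>) loses it too, which the validation approach would not do.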