Parsing XML Textlist
I'm trying to parse a XML file. I'm able to parse normal text node but how do I parse a textlist? I'm getting the firstChild of the textlist thats sadly all. If I try to do
elem.nextSibling();
it is always null which can't be, I know there are two other values left.
Does someone can provide me an example maybe?
Thanks!
XML example
<viewentry position="1" unid="7125D090682C3C3EC1257671002F66F4" noteid="962" siblings="65">
<entrydata columnnumber开发者_运维技巧="0" name="Categories">
<textlist>
<text>Lore1</text>
<text>Lore2</text>
</textlist>
</entrydata>
<entrydata columnnumber="1" name="CuttedSubjects">
<text>
LoreImpsum....
</text>
</entrydata>
<entrydata columnnumber="2" name="$35">
<datetime>20091117T094224,57+01</datetime>
</entrydata>
</viewentry>
I assume you're using a DOM parser.
The first child of the <textlist>
node is not the first <text>
node but rather the raw text that contains the whitespace and carriage return between the end of <textlist>
and the beginning of <text>
. The output of the following snippet (using org.w3c.dom.* and javax.xml.parsers.*)
Node grandpa = document.getElementsByTagName("textlist").item(0);
Node daddy = grandpa.getFirstChild();
while (daddy != null) {
System.out.println(">>> " + daddy.getNodeName());
Node child = daddy.getFirstChild();
if (child != null)
System.out.println(">>>>>>>> " + child.getTextContent());
daddy = daddy.getNextSibling();
}
shows that <textlist>
has five children: the two <text>
elements and the three raw text pieces before, between and after them.
>>> #text
>>> text
>>>>>>>> Lore1
>>> #text
>>> text
>>>>>>>> Lore2
>>> #text
When parsing XML this way, it's easy to overlook that the structure of the DOM-tree can be complicated. You can quickly end up iterating over a NodeList in the wrong generation, and then you get nulls where you would expect siblings. This is one of the reasons why people came up with all kinds of xml-to-java stuff, from homegrown XMLHelper classes to XPath expressions to Digester to JAXB, so you need to go down to the DOM level only when you absolutely have to.
精彩评论