
Parsing XML Textlist

I'm trying to parse an XML file. I can parse a normal text node, but how do I parse a textlist? I can get the firstChild of the textlist, but sadly that's all. If I try

 elem.nextSibling();

it always returns null, which can't be right, because I know there are two more values left.

Could someone provide me with an example?

Thanks!

XML example

<viewentry position="1" unid="7125D090682C3C3EC1257671002F66F4" noteid="962" siblings="65">
    <entrydata columnnumber="0" name="Categories">
        <textlist>
            <text>Lore1</text>
            <text>Lore2</text>
        </textlist>
    </entrydata>
    <entrydata columnnumber="1" name="CuttedSubjects">
        <text>
            LoreImpsum....
        </text>
    </entrydata>
    <entrydata columnnumber="2" name="$35">
        <datetime>20091117T094224,57+01</datetime>
    </entrydata>
</viewentry>


I assume you're using a DOM parser.

The first child of the <textlist> node is not the first <text> node. It is a text node holding the whitespace (line break and indentation) between the opening <textlist> tag and the first <text> tag. The output of the following snippet (using org.w3c.dom.* and javax.xml.parsers.*)

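// 'document' is an org.w3c.dom.Document already built with a DocumentBuilder
Node grandpa = document.getElementsByTagName("textlist").item(0);
Node daddy = grandpa.getFirstChild();    // first child of <textlist>: a whitespace #text node
while (daddy != null) {
    System.out.println(">>> " + daddy.getNodeName());
    Node child = daddy.getFirstChild();  // for a <text> element, this is its text content
    if (child != null) {
        System.out.println(">>>>>>>> " + child.getTextContent());
    }
    daddy = daddy.getNextSibling();
}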

shows that <textlist> has five children: the two <text> elements and the three whitespace-only text nodes before, between and after them.

>>> #text
>>> text
>>>>>>>> Lore1
>>> #text
>>> text
>>>>>>>> Lore2
>>> #text
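Putting this together, one way to get the values the question asks for is to keep walking the siblings with getNextSibling() but skip everything that is not an element. This is a minimal sketch assuming the JDK's built-in DOM parser; the class name TextlistValues, the method textValues and the shortened sample string are made up for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper class, for illustration only.
public class TextlistValues {

    // Shortened copy of the sample XML from the question (illustrative only).
    static final String SAMPLE =
            "<viewentry position=\"1\" siblings=\"65\">\n"
            + "  <entrydata columnnumber=\"0\" name=\"Categories\">\n"
            + "    <textlist>\n"
            + "      <text>Lore1</text>\n"
            + "      <text>Lore2</text>\n"
            + "    </textlist>\n"
            + "  </entrydata>\n"
            + "</viewentry>\n";

    // Parses the XML and returns the content of each <text> element
    // that is a direct child of the first <textlist>.
    static List<String> textValues(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document document = builder.parse(new InputSource(new StringReader(xml)));

        Node textlist = document.getElementsByTagName("textlist").item(0);
        List<String> values = new ArrayList<>();
        // Same sibling walk as before, but the whitespace #text nodes are skipped
        // because only element nodes named "text" are accepted.
        for (Node n = textlist.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE && "text".equals(n.getNodeName())) {
                values.add(n.getTextContent().trim());
            }
        }
        return values;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(textValues(SAMPLE)); // prints [Lore1, Lore2]
    }
}
```

Alternatively, you can cast the <textlist> node to org.w3c.dom.Element and call getElementsByTagName("text") on it; that returns only the matching descendant elements, so the whitespace nodes never show up.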

When parsing XML this way, it's easy to overlook how complicated the structure of a DOM tree can be. You can quickly end up iterating over a NodeList in the wrong generation of the tree, and then you get nulls where you would expect siblings. This is one of the reasons people came up with all kinds of XML-to-Java tools, from homegrown XMLHelper classes to XPath expressions to Digester to JAXB, so that you only need to go down to the DOM level when you absolutely have to.
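For comparison, here is a hedged sketch of the XPath route, using the javax.xml.xpath API that ships with the JDK. The expression selects only the <text> elements inside the Categories <textlist>, so the whitespace nodes never have to be filtered by hand, and the <text> under CuttedSubjects is excluded. The class TextlistXPath, the method categories and the sample string are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class, for illustration only.
public class TextlistXPath {

    // Shortened copy of the sample XML from the question (illustrative only).
    static final String SAMPLE =
            "<viewentry position=\"1\" siblings=\"65\">\n"
            + "  <entrydata columnnumber=\"0\" name=\"Categories\">\n"
            + "    <textlist>\n"
            + "      <text>Lore1</text>\n"
            + "      <text>Lore2</text>\n"
            + "    </textlist>\n"
            + "  </entrydata>\n"
            + "  <entrydata columnnumber=\"1\" name=\"CuttedSubjects\">\n"
            + "    <text>LoreImpsum....</text>\n"
            + "  </entrydata>\n"
            + "</viewentry>\n";

    // Lets XPath parse the input and select only the <text> elements
    // under the <textlist> of the "Categories" column.
    static List<String> categories(String xml) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList nodes = (NodeList) xpath.evaluate(
                "//entrydata[@name='Categories']/textlist/text",
                new InputSource(new StringReader(xml)),
                XPathConstants.NODESET);

        List<String> values = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            values.add(nodes.item(i).getTextContent().trim());
        }
        return values;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(categories(SAMPLE)); // prints [Lore1, Lore2]
    }
}
```

The trade-off is that the XPath expression encodes the document structure in one string, so a change in nesting means editing the expression rather than the traversal code.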
