How to traverse this XML to get DATA?

2023-03-31 06:51

I am trying to get information about the item in XML that looks like this:

<item>
  <title>The Colbert Report - Confused by Rick Parry With an "A" for America</title>

  <guid isPermaLink="false">http://www.hulu.com/watch/267788/the-colbert-report-confused-by-rick-parry-with-an-a-for-america#http%3A%2F%2Fwww.hulu.com%2Ffeed%2Fpopular%2Fvideos%2Fthis_week%3Frd%3D0</guid>
  <link>http://rss.hulu.com/~r/HuluPopularVideosThisWeek/~3/6aeJ5cWMBzw/the-colbert-report-confused-by-rick-parry-with-an-a-for-america</link>
  <description>&lt;a href="http://www.hulu.com/watch/267788/the-colbert-report-confused-by-rick-parry-with-an-a-for-america#http%3A%2F%2Fwww.hulu.com%2Ffeed%2Fpopular%2Fvideos%2Fthis_week%3Frd%3D0"&gt;&lt;img src="http://thumbnails.hulu.com/507/40025507/40025507_145x80_generated.jpg" align="right" hspace="10" vspace="10" width="145" height="80" border="0" /&gt;&lt;/a&gt;&lt;p&gt;The fat cat media elites in Des Moines think they can sit in their ivory corn silos and play puppet master with national politics.&lt;/p&gt;&lt;p&gt;&lt;a href="http://www.hulu.com/users/add_to_playlist?from=feed&amp;video_id=267788"&gt;Add this to your queue&lt;/a&gt;&lt;br/&gt;Added: Fri Aug 12 09:59:14 UTC 2011&lt;br/&gt;Air date: Thu Aug 11 00:00:00 UTC 2011&lt;br/&gt;Duration: 05:39&lt;br/&gt;Rating: 4.7 / 5.0&lt;br/&gt;&lt;/p&gt;&lt;img src="http://feeds.feedburner.com/~r/HuluPopularVideosThisWeek/~4/6aeJ5cWMBzw" height="1" width="1"/&gt;</description>

  <pubDate>Fri, 12 Aug 2011 09:59:14 -0000</pubDate>
  <media:thumbnail height="80" width="145" url="http://thumbnails.hulu.com/507/40025507/40025507_145x80_generated.jpg" />
  <media:credit>Comedy Central</media:credit>
  <dcterms:valid>start=2011-08-12T00:15:00Z; end=2011-09-09T23:45:00Z; scheme=W3C-DTF</dcterms:valid>
  <feedburner:origLink>http://www.hulu.com/watch/267788/the-colbert-report-confused-by-rick-parry-with-an-a-for-america#http%3A%2F%2Fwww.hulu.com%2Ffeed%2Fpopular%2Fvideos%2Fthis_week%3Frd%3D0</feedburner:origLink></item>
<item>

I need the title, link, media:thumbnail url and description.

I have used the method found in: http://www.rgagnon.com/javadetails/java-0573.html

This works fine for the title and link, but not for the image url or the description.

Can someone help me with this?

You can use XPath to retrieve particular data from an XML document.

For example, to retrieve the value of the url attribute:

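import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

XPathFactory factory = XPathFactory.newInstance();
XPath xpath = factory.newXPath();
// The "media" prefix must be bound through xpath.setNamespaceContext(...) before evaluating.
String url = xpath.evaluate("/item/media:thumbnail/@url", new InputSource("data.xml"));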
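An XPath expression that uses a prefix such as media: only evaluates when that prefix is mapped to a namespace URI. Here is a minimal sketch of one way to do that. The class name ThumbnailXPath, the inline sample document, and the example.com address are made up for illustration, and the namespace URI is assumed to be the Media RSS one; check the xmlns:media declaration in the real feed.

```java
import java.io.StringReader;
import java.util.Iterator;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

public class ThumbnailXPath {
    // Assumption: the feed binds "media" to the Media RSS namespace.
    static final String MEDIA_NS = "http://search.yahoo.com/mrss/";

    static String thumbnailUrl(String xml) throws XPathExpressionException {
        XPath xpath = XPathFactory.newInstance().newXPath();
        // Map the prefix used in the expression to its namespace URI.
        xpath.setNamespaceContext(new NamespaceContext() {
            @Override
            public String getNamespaceURI(String prefix) {
                return "media".equals(prefix) ? MEDIA_NS : XMLConstants.NULL_NS_URI;
            }
            @Override
            public String getPrefix(String namespaceURI) { return null; }   // not needed here
            @Override
            public Iterator<String> getPrefixes(String namespaceURI) { return null; } // not needed here
        });
        return xpath.evaluate("/item/media:thumbnail/@url",
                new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical, cut-down item with the namespace declared on the root.
        String xml = "<item xmlns:media=\"" + MEDIA_NS + "\">"
                + "<title>Example</title>"
                + "<media:thumbnail height=\"80\" width=\"145\" url=\"http://example.com/t.jpg\"/>"
                + "</item>";
        System.out.println(thumbnailUrl(xml)); // prints http://example.com/t.jpg
    }
}
```

In a full RSS feed the item elements usually sit below rss/channel, so the path would need adjusting (for example //item/media:thumbnail/@url).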

    try {
        DocumentBuilderFactory dbf =
        DocumentBuilderFactory.newInstance();
        DocumentBuilder db = dbf.newDocumentBuilder();
        InputSource is = new InputSource(new FileReader(new File("item.xml")));

        Document doc = db.parse(is);
        NodeList nodes = doc.getElementsByTagName("item");

        // iterate over the <item> elements
        for (int i = 0; i < nodes.getLength(); i++) {
           Element element = (Element) nodes.item(i);

           NodeList title = element.getElementsByTagName("title");
           Element line = (Element) title.item(0);
           System.out.println("title: " + line.getTextContent());

           NodeList link = element.getElementsByTagName("link");
           line = (Element) link.item(0);
           System.out.println("link: " + line.getTextContent());

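           // media:thumbnail is an empty element, so it has no text content to print;
           // the image address is stored in its url attribute.
           NodeList mt = element.getElementsByTagName("media:thumbnail");
           line = (Element) mt.item(0);
           Attr url = line.getAttributeNode("url");
           System.out.println("media:thumbnail -> url: " + url.getValue());

           // description holds escaped HTML; getTextContent() returns it unescaped
           NodeList description = element.getElementsByTagName("description");
           line = (Element) description.item(0);
           System.out.println("description: " + line.getTextContent());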
        }
    }
    catch (Exception e) {
        e.printStackTrace();
    }

To get the url, first fetch the media:thumbnail element; because url is an attribute of that element rather than its text, call getAttributeNode("url") on it.
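The lookup above matches the literal tag name "media:thumbnail", so it depends on the feed using exactly that prefix. As a hedged alternative, a namespace-aware parser can match on the namespace URI instead. In the sketch below the class name NamespaceAwareThumbnail and the sample input are hypothetical, and the URI is assumed to be the Media RSS namespace:

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class NamespaceAwareThumbnail {
    // Assumption: the feed binds its thumbnail elements to the Media RSS namespace.
    static final String MEDIA_NS = "http://search.yahoo.com/mrss/";

    // Returns the url attribute of the first thumbnail element, or null if there is none.
    static String firstThumbnailUrl(String xml) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true); // off by default; required for the *NS lookups
        DocumentBuilder db = dbf.newDocumentBuilder();
        Document doc = db.parse(new InputSource(new StringReader(xml)));

        // Match by namespace URI + local name, whatever prefix the document uses.
        NodeList thumbs = doc.getElementsByTagNameNS(MEDIA_NS, "thumbnail");
        if (thumbs.getLength() == 0) {
            return null;
        }
        return ((Element) thumbs.item(0)).getAttribute("url");
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical item that uses the prefix "m" instead of "media".
        String xml = "<item xmlns:m=\"" + MEDIA_NS + "\">"
                + "<m:thumbnail url=\"http://example.com/t.jpg\"/>"
                + "</item>";
        System.out.println(firstThumbnailUrl(xml)); // prints http://example.com/t.jpg
    }
}
```

A namespace-aware parser rejects documents that use an undeclared prefix, so this needs the whole feed (with its xmlns declarations), not a bare item fragment copied out of it.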

For a pure DOM solution, you could use the following code to fetch the wanted values:

DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = factory.newDocumentBuilder();
Document doc = builder.parse("document.xml");

Element item = doc.getDocumentElement(); // assuming that item is a root element
NodeList itemChilds = item.getChildNodes();

for (int i = 0; i != itemChilds.getLength(); ++i)
{
    Node itemChildNode = itemChilds.item(i);
    if (!(itemChildNode instanceof Element))
        continue;
    Element itemChild = (Element) itemChildNode;
    String itemChildName = itemChild.getNodeName();

    if (itemChildName.equals("title")) // possible switch in Java 7
        System.out.println("title: " + itemChild.getTextContent());
    else if (itemChildName.equals("link"))
        System.out.println("link: " + itemChild.getTextContent());
    else if (itemChildName.equals("description"))
        System.out.println("description: " + itemChild.getTextContent());
    else if (itemChildName.equals("media:thumbnail"))
        System.out.println("image url: " + itemChild.getAttribute("url"));
}

Result:

title: The Colbert Report - Confused by Rick Parry With an "A" for America
link: http://rss.hulu.com/~r/HuluPopularVideosThisWeek/~3/6aeJ5cWMBzw/the-colbert..
description: <a href="http://www.hulu.com/watch/267788/the-colbert-report-confuse..
image url: http://thumbnails.hulu.com/507/40025507/40025507_145x80_generated.jpg

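The problem here is that the description tag contains an escaped XML (or perhaps HTML) string rather than nested XML elements, so the parser hands it back as plain text.

Probably the easiest thing to do is to get the text contained in this tag and run it through a second XML parser as a separate document. That only works if the text is well-formed XML. The sample above is an HTML fragment with several top-level elements and, once unescaped, a bare & in a query string, so a strict XML parser would reject it as-is.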
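As a rough sketch of that second parsing pass, the fragment can be wrapped in a synthetic root element so that several top-level elements are accepted. The class name DescriptionFragment, the wrapper element name root, and the sample fragment are all hypothetical. The fragment here was chosen to be well-formed; input with a bare & or unclosed tags would make parse() throw a SAXException, in which case a lenient HTML parser is the better tool.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class DescriptionFragment {
    // Returns the src of the first <img> in the fragment, or null if there is none.
    // Throws org.xml.sax.SAXException if the fragment is not well-formed XML.
    static String firstImageSrc(String fragment) throws Exception {
        // A document needs exactly one root, so wrap the fragment in a made-up element.
        String wrapped = "<root>" + fragment + "</root>";
        DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = db.parse(new InputSource(new StringReader(wrapped)));

        NodeList imgs = doc.getElementsByTagName("img");
        if (imgs.getLength() == 0) {
            return null;
        }
        return ((Element) imgs.item(0)).getAttribute("src");
    }

    public static void main(String[] args) throws Exception {
        // In practice this string would come from description.getTextContent(),
        // which already returns the text with &lt; and &gt; unescaped.
        String fragment = "<a href=\"http://example.com/watch\">"
                + "<img src=\"http://example.com/t.jpg\" width=\"145\"/></a>"
                + "<p>Some text.</p>";
        System.out.println(firstImageSrc(fragment)); // prints http://example.com/t.jpg
    }
}
```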
