
How to traverse this XML to get DATA?

I am trying to get information about the items in an XML feed that looks like this:

<item>
  <title>The Colbert Report - Confused by Rick Parry With an "A" for America</title>

  <guid isPermaLink="false">http://www.hulu.com/watch/267788/the-colbert-report-confused-by-rick-parry-with-an-a-for-america#http%3A%2F%2Fwww.hulu.com%2Ffeed%2Fpopular%2Fvideos%2Fthis_week%3Frd%3D0</guid>
  <link>http://rss.hulu.com/~r/HuluPopularVideosThisWeek/~3/6aeJ5cWMBzw/the-colbert-report-confused-by-rick-parry-with-an-a-for-america</link>
  <description>&lt;a href="http://www.hulu.com/watch/267788/the-colbert-report-confused-by-rick-parry-with-an-a-for-america#http%3A%2F%2Fwww.hulu.com%2Ffeed%2Fpopular%2Fvideos%2Fthis_week%3Frd%3D0"&gt;&lt;img src="http://thumbnails.hulu.com/507/40025507/40025507_145x80_generated.jpg" align="right" hspace="10" vspace="10" width="145" height="80" border="0" /&gt;&lt;/a&gt;&lt;p&gt;The fat cat media elites in Des Moines think they can sit in their ivory corn silos and play puppet master with national politics.&lt;/p&gt;&lt;p&gt;&lt;a href="http://www.hulu.com/users/add_to_playlist?from=feed&amp;video_id=267788"&gt;Add this to your queue&lt;/a&gt;&lt;br/&gt;Added: Fri Aug 12 09:59:14 UTC 2011&lt;br/&gt;Air date: Thu Aug 11 00:00:00 UTC 2011&lt;br/&gt;Duration: 05:39&lt;br/&gt;Rating: 4.7 / 5.0&lt;br/&gt;&lt;/p&gt;&lt;img src="http://feeds.feedburner.com/~r/HuluPopularVideosThisWeek/~4/6aeJ5cWMBzw" height="1" width="1"/&gt;</description>

  <pubDate>Fri, 12 Aug 2011 09:59:14 -0000</pubDate>
  <media:thumbnail height="80" width="145" url="http://thumbnails.hulu.com/507/40025507/40025507_145x80_generated.jpg" />
  <media:credit>Comedy Central</media:credit>
  <dcterms:valid>start=2011-08-12T00:15:00Z; end=2011-09-09T23:45:00Z; scheme=W3C-DTF</dcterms:valid>
  <feedburner:origLink>http://www.hulu.com/watch/267788/the-colbert-report-confused-by-rick-parry-with-an-a-for-america#http%3A%2F%2Fwww.hulu.com%2Ffeed%2Fpopular%2Fvideos%2Fthis_week%3Frd%3D0</feedburner:origLink></item>
<item>

I need the title, link, media:thumbnail url and description.

I have used the method found in: http://www.rgagnon.com/javadetails/java-0573.html

This works for the title and link, but not for the image URL or the description.

Can someone help me with this?


You can use XPath to retrieve particular data from an XML document.

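For example, to retrieve the value of the url attribute (this assumes item is the root element of the file):

XPathFactory factory = XPathFactory.newInstance();
XPath xpath = factory.newXPath();
// The media: prefix must be bound with xpath.setNamespaceContext(...) before evaluating.
String url = xpath.evaluate("/item/media:thumbnail/@url", new InputSource("data.xml"));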
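Because the expression uses the media: prefix, the XPath object has to know which namespace that prefix stands for. Here is a minimal sketch of one way to do that. It assumes the feed binds media to the Media RSS namespace http://search.yahoo.com/mrss/ (check the xmlns:media declaration on the feed's root element), and the class name, method name and sample XML are made up for illustration.

```java
import java.io.StringReader;
import java.util.Iterator;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

class ThumbnailXPath {
    // Assumption: the feed declares xmlns:media with this URI.
    static final String MEDIA_NS = "http://search.yahoo.com/mrss/";

    // Hypothetical helper: returns the url attribute of the first matching thumbnail.
    static String thumbnailUrl(String xml) throws XPathExpressionException {
        XPath xpath = XPathFactory.newInstance().newXPath();
        xpath.setNamespaceContext(new NamespaceContext() {
            @Override public String getNamespaceURI(String prefix) {
                // Only the "media" prefix is used by the expression below.
                return "media".equals(prefix) ? MEDIA_NS : XMLConstants.NULL_NS_URI;
            }
            // Reverse lookups are not needed for evaluating this expression.
            @Override public String getPrefix(String uri) { return null; }
            @Override public Iterator<String> getPrefixes(String uri) { return null; }
        });
        // "//item" matches item elements at any depth, so item need not be the root.
        return xpath.evaluate("//item/media:thumbnail/@url",
                new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        // Made-up sample document, not the real feed.
        String xml = "<rss xmlns:media=\"" + MEDIA_NS + "\"><channel><item>"
                + "<title>Example</title>"
                + "<media:thumbnail height=\"80\" width=\"145\" url=\"http://example.com/t.jpg\"/>"
                + "</item></channel></rss>";
        System.out.println(thumbnailUrl(xml)); // prints http://example.com/t.jpg
    }
}
```

With several items in the feed, evaluating to a String returns only the first match; asking for XPathConstants.NODESET instead would give all of them.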


    // Needs java.io.File, javax.xml.parsers.DocumentBuilder,
    // javax.xml.parsers.DocumentBuilderFactory and org.w3c.dom.*.
    try {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        DocumentBuilder db = dbf.newDocumentBuilder();
        // Parsing the File directly lets the parser detect the encoding itself.
        Document doc = db.parse(new File("item.xml"));
        NodeList nodes = doc.getElementsByTagName("item");

        // iterate over the <item> elements
        for (int i = 0; i < nodes.getLength(); i++) {
           Element element = (Element) nodes.item(i);

           NodeList title = element.getElementsByTagName("title");
           Element line = (Element) title.item(0);
           System.out.println("title: " + line.getTextContent());

           NodeList link = element.getElementsByTagName("link");
           line = (Element) link.item(0);
           System.out.println("link: " + line.getTextContent());

           NodeList description = element.getElementsByTagName("description");
           line = (Element) description.item(0);
           // getTextContent() returns the unescaped markup held in <description>
           System.out.println("description: " + line.getTextContent());

           // <media:thumbnail> is an empty element, so its data lives in attributes
           NodeList mt = element.getElementsByTagName("media:thumbnail");
           line = (Element) mt.item(0);
           Attr url = line.getAttributeNode("url");
           System.out.println("media:thumbnail -> url: " + url.getValue());
        }
    }
    catch (Exception e) {
        e.printStackTrace();
    }

For the url, first get the media:thumbnail element. Since url is an attribute of media:thumbnail, you then call getAttributeNode("url") on that element.
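One thing to watch with getAttributeNode is that it returns null when the attribute is missing, whereas getAttribute returns an empty string. The sketch below shows both lookups side by side; the class and method names and the sample element are made up for illustration, and it relies on the default (non-namespace-aware) parser so that media:thumbnail is treated as a plain tag name.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Attr;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

class AttributeLookup {
    // Returns the attribute's value, or null when the attribute is absent.
    static String viaAttributeNode(Element element, String name) {
        Attr attr = element.getAttributeNode(name);
        return attr == null ? null : attr.getValue();
    }

    // Parses a small XML string and returns its root element.
    static Element parseRoot(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        return doc.getDocumentElement();
    }

    public static void main(String[] args) throws Exception {
        // Made-up sample element, not taken from the real feed.
        Element thumb = parseRoot(
                "<media:thumbnail width=\"145\" url=\"http://example.com/t.jpg\"/>");

        System.out.println(viaAttributeNode(thumb, "url"));      // http://example.com/t.jpg
        System.out.println(viaAttributeNode(thumb, "missing"));  // null
        System.out.println(thumb.getAttribute("url"));           // http://example.com/t.jpg
        System.out.println(thumb.getAttribute("missing").isEmpty()); // true
    }
}
```

If you only need the value, getAttribute is the shorter call. getAttributeNode is useful when you need to tell "absent" apart from "present but empty".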


For a pure DOM solution, you could use the following code to fetch the values you want:

DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = factory.newDocumentBuilder();
Document doc = builder.parse("document.xml");

Element item = doc.getDocumentElement(); // assuming that item is a root element
NodeList itemChilds = item.getChildNodes();

for (int i = 0; i != itemChilds.getLength(); ++i)
{
    Node itemChildNode = itemChilds.item(i);
    if (!(itemChildNode instanceof Element))
        continue;
    Element itemChild = (Element) itemChildNode;
    String itemChildName = itemChild.getNodeName();

    if (itemChildName.equals("title")) // possible switch in Java 7
        System.out.println("title: " + itemChild.getTextContent());
    else if (itemChildName.equals("link"))
        System.out.println("link: " + itemChild.getTextContent());
    else if (itemChildName.equals("description"))
        System.out.println("description: " + itemChild.getTextContent());
    else if (itemChildName.equals("media:thumbnail"))
        System.out.println("image url: " + itemChild.getAttribute("url"));
}

Result:

title: The Colbert Report - Confused by Rick Parry With an "A" for America
link: http://rss.hulu.com/~r/HuluPopularVideosThisWeek/~3/6aeJ5cWMBzw/the-colbert..
description: <a href="http://www.hulu.com/watch/267788/the-colbert-report-confuse..
image url: http://thumbnails.hulu.com/507/40025507/40025507_145x80_generated.jpg


The problem here is that the description tag contains an escaped XML (or perhaps HTML) string rather than child elements.

Probably the easiest thing to do is to get the text contained in this tag and hand it to a second XML parser as a separate document. However, this will not work if the text is actually an HTML fragment that is not well-formed XML.
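A rough sketch of that second parsing step follows, with a fallback for text that is not well-formed. The class and method names and the sample strings are made up for illustration. The fragment is wrapped in an artificial root element because the description holds several top-level elements. Note that the description shown in the question, once unescaped, contains from=feed&video_id=267788 with a bare ampersand, which a strict XML parser rejects, so the fallback path matters for real data.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class DescriptionFragment {
    // Hypothetical helper: returns the src of the first <img> in the markup,
    // or null if there is no <img> or the markup is not well-formed XML.
    static String firstImageSrc(String markup)
            throws ParserConfigurationException, IOException {
        DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
        try {
            // Wrap the fragment so the parser sees exactly one root element.
            Document doc = builder.parse(new InputSource(
                    new StringReader("<root>" + markup + "</root>")));
            NodeList images = doc.getElementsByTagName("img");
            if (images.getLength() == 0) {
                return null;
            }
            return ((Element) images.item(0)).getAttribute("src");
        } catch (SAXException notWellFormed) {
            // The text was HTML that does not parse as XML.
            return null;
        }
    }

    public static void main(String[] args) throws Exception {
        // Made-up fragments standing in for description.getTextContent().
        String wellFormed = "<a href=\"http://example.com/watch\">"
                + "<img src=\"http://example.com/t.jpg\" width=\"145\" /></a>"
                + "<p>Some text.<br/></p>";
        String bareAmpersand =
                "<a href=\"http://example.com/add?from=feed&video_id=1\">Add</a>";

        System.out.println(firstImageSrc(wellFormed));    // http://example.com/t.jpg
        System.out.println(firstImageSrc(bareAmpersand)); // null
    }
}
```

When the fallback is hit often, a lenient HTML parser is usually a better fit than a second XML parser.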
