Splitting up an XML file using Java
<?xml version="1.0" encoding="utf-8"?>
<rss xmlns:media="http://search.yahoo.com/mrss/" xmlns:ynews="http://news.yahoo.com/rss/" version="2.0">
<channel>
<title>Cricket News Headlines | Cricket News - Yahoo! News India</title>
<link>http://in.news.yahoo.com/cricket/</link>
<description>Check out the latest Cricket news headlines from Yahoo! News India. Find top Cricket stories and in-depth coverage of Cricket news from India and around the world.</description>
<language>en-IN</language>
<copyright>Copyright (c) 2011 Yahoo! Inc. All rights reserved</copyright>
<pubDate>2011-04-06T15:30:02+05:30</pubDate>
<ttl>5</ttl>
<image>
<title>Cricket News Headlines | Cricket News - Yahoo! News India</title>
<link>http://in.news.yahoo.com/cricket/</link>
<url>http://l.yimg.com/os/mit/media/m/index/img/Yahoo_logo_en-IN.gif</url>
</image>
<item>
<title>Hectic schedule will drain players, says Dhoni</title>
<description>Chennai, Apr 6 (PTI) ...</description>
<link>http://in.news.yahoo.com/hectic-schedule-drain-players-says-dhoni-20110406-023100-889.html</link>
<pubDate>2011-04-06T09:31:00Z</pubDate>
<source>PTI</source>
<guid isPermaLink="false">/hectic-schedule-drain-players-says-dhoni-20110406-023100-889.html</guid>
</item>
<item>
<title>India, Pakistan trade secretaries to meet on April 27-28</title>
<description>New Delhi, Apr 6 (PTI) ...</description>
<link>http://in.news.yahoo.com/india-pakistan-trade-secretaries-meet-april-27-28-20110406-023100-140.html</link>
I want only the headlines from this XML, that is, only the text between the <item><title> and </title> tags. I also need to print the headlines one after another. How can I do this?
I would use the javax.xml.xpath APIs, which are included in Java SE 5 and later, for this.
import java.io.FileInputStream;
import java.io.InputStream;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class Demo {
    public static void main(String[] args) throws Exception {
        XPath xPath = XPathFactory.newInstance().newXPath();
        // Pass a byte stream (not a Reader) so the parser honours encoding="utf-8";
        // try-with-resources closes the file afterwards
        try (InputStream in = new FileInputStream("input.xml")) {
            InputSource xml = new InputSource(in);
            // Select the <title> child of every <item>, i.e. only the headlines
            NodeList titleNodes = (NodeList) xPath.evaluate("//item/title", xml, XPathConstants.NODESET);
            for (int x = 0; x < titleNodes.getLength(); x++) {
                System.out.println(titleNodes.item(x).getTextContent());
            }
        }
    }
}
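If the feed is already in memory (for example, downloaded into a String), the same XPath expression can be wrapped in a small reusable method. This is a minimal sketch; the class name HeadlineExtractor, the method headlines and the trimmed sample feed are made up for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class HeadlineExtractor {
    // Returns the text of every <item>/<title> element in the given RSS string
    static List<String> headlines(String rss) throws Exception {
        XPath xPath = XPathFactory.newInstance().newXPath();
        InputSource source = new InputSource(new StringReader(rss));
        NodeList titleNodes = (NodeList) xPath.evaluate("//item/title", source, XPathConstants.NODESET);
        List<String> result = new ArrayList<>();
        for (int i = 0; i < titleNodes.getLength(); i++) {
            result.add(titleNodes.item(i).getTextContent());
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical, trimmed-down sample feed
        String rss = "<rss version=\"2.0\"><channel><title>Cricket News</title>"
                + "<item><title>Hectic schedule will drain players, says Dhoni</title></item>"
                + "<item><title>India, Pakistan trade secretaries to meet on April 27-28</title></item>"
                + "</channel></rss>";
        for (String headline : headlines(rss)) {
            System.out.println(headline);
        }
    }
}
```

Because the expression is //item/title, the channel's own <title> is skipped and only the two item titles are printed, one per line.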
Parse the file into a DOM document. On this DOM, select all title elements inside item elements; their text contents are the headlines you're looking for.
Quick example with dom4j:
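import java.io.File;
import java.util.List;
import org.dom4j.Document;
import org.dom4j.Node;
import org.dom4j.io.SAXReader;
File xml = new File("input.xml"); // replace with your document
SAXReader reader = new SAXReader();
Document doc = reader.read(xml);
List<Node> titles = doc.selectNodes("//item/title"); // all <title> elements inside <item>; dom4j's XPath needs Jaxen on the classpath
for (Node title : titles)
    System.out.println(title.getText());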
This should print all the headlines to the console.
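If you'd rather not add dom4j, the same parse-then-select idea works with the DOM parser built into the JDK (javax.xml.parsers). A minimal sketch, assuming the feed is saved as input.xml; DomHeadlines and headlines are hypothetical names:

```java
import java.io.FileInputStream;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class DomHeadlines {
    // Parses the feed into a DOM tree and returns the title of each <item>
    static List<String> headlines(InputStream in) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(in);
        List<String> result = new ArrayList<>();
        // Visit only <item> elements so the channel and image titles are skipped
        NodeList items = doc.getElementsByTagName("item");
        for (int i = 0; i < items.getLength(); i++) {
            NodeList titles = ((Element) items.item(i)).getElementsByTagName("title");
            if (titles.getLength() > 0) {
                result.add(titles.item(0).getTextContent());
            }
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        try (InputStream in = new FileInputStream("input.xml")) {
            for (String headline : headlines(in)) {
                System.out.println(headline);
            }
        }
    }
}
```

getElementsByTagName returns elements in document order, so the headlines come out in the same order as in the feed.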
Splitting an XML file is something that comes up often. I have a Groovy script that does this; it is available here:
https://github.com/ramanathanrv/utils/blob/master/groovy/split_xml.groovy
Usage: groovy split_xml.groovy <input_file_name> <no_of_pieces>
PS: This is not my code. I got it from somewhere but have forgotten the source.
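The linked script isn't reproduced here, so the following is not a port of it. It is a hypothetical Java sketch of one way to split an RSS file like the one in the question: the <item> elements are divided into N groups of consecutive items, and each group is written to its own file together with a copy of the surrounding <rss>/<channel> markup. The class name SplitFeed, the output file naming and the choice to split on <item> are all assumptions.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class SplitFeed {
    // Splits the <item> elements into 'pieces' documents of consecutive items;
    // every piece keeps a copy of the rest of the markup (channel title etc.)
    static List<Document> split(Document original, int pieces) throws Exception {
        if (pieces < 1) {
            throw new IllegalArgumentException("pieces must be at least 1");
        }
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        int total = original.getElementsByTagName("item").getLength();
        int perPiece = Math.max(1, (total + pieces - 1) / pieces); // ceiling division
        List<Document> result = new ArrayList<>();
        for (int p = 0; p < pieces; p++) {
            // Deep-copy the whole root element into a fresh document
            Document copy = builder.newDocument();
            copy.appendChild(copy.importNode(original.getDocumentElement(), true));
            // getElementsByTagName returns a live list, so collect before removing
            NodeList items = copy.getElementsByTagName("item");
            List<Node> unwanted = new ArrayList<>();
            for (int i = 0; i < items.getLength(); i++) {
                if (i / perPiece != p) {
                    unwanted.add(items.item(i));
                }
            }
            for (Node n : unwanted) {
                n.getParentNode().removeChild(n);
            }
            result.add(copy);
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        // Usage: java SplitFeed <input_file_name> <no_of_pieces>
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(new File(args[0]));
        List<Document> parts = split(doc, Integer.parseInt(args[1]));
        Transformer out = TransformerFactory.newInstance().newTransformer();
        for (int p = 0; p < parts.size(); p++) {
            // Writes <input>.part1, <input>.part2, ... (naming chosen for illustration)
            out.transform(new DOMSource(parts.get(p)), new StreamResult(new File(args[0] + ".part" + (p + 1))));
        }
    }
}
```

This sketch loads the whole file into a DOM first, so for very large files a streaming parser would use far less memory.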