
Faster API than javax.xml.xpath to parse XML for a value?

I am using javax.xml.xpath to search for specific strings in XML files. However, because of the huge number of XML files that need to be searched, this is turning out to be much slower than expected.

Is there any API available in Java that is faster than javax.xml.xpath, and which is the fastest one available?


As pointed out by skaffman, you will want to be sure you are using the javax.xml.xpath libraries as efficiently as possible. If you are executing an XPath statement more than once, you will want to make sure to compile it into an XPathExpression.

XPathExpression xPathExpression = xPath.compile("/root/device/modelname");
nl = (NodeList) xPathExpression.evaluate(dDoc, XPathConstants.NODESET);

Demo

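In the example, option #2 will be faster than option #1, because the expression is compiled once and reused instead of being compiled again on every call to evaluate.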

import java.io.File;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathFactory;

import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

public class Demo {

    public static void main(String[] args) {
        DocumentBuilderFactory domFactory = DocumentBuilderFactory.newInstance();
        try {
            DocumentBuilder builder = domFactory.newDocumentBuilder();
            File xml = new File("input.xml");
            Document dDoc = builder.parse(xml);

            NodeList nl;

            // OPTION #1
            XPath xPath = XPathFactory.newInstance().newXPath();
            nl = (NodeList) xPath.evaluate("/root/device/modelname", dDoc, XPathConstants.NODESET);
            printResults(nl);
            nl = (NodeList) xPath.evaluate("/root/device/modelname", dDoc, XPathConstants.NODESET);
            printResults(nl);

            // OPTION #2
            XPathExpression xPathExpression = xPath.compile("/root/device/modelname");
            nl = (NodeList) xPathExpression.evaluate(dDoc, XPathConstants.NODESET);
            printResults(nl);
            nl = (NodeList) xPathExpression.evaluate(dDoc, XPathConstants.NODESET);
            printResults(nl);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    private static void printResults(NodeList nl) {
        for(int x=0; x<nl.getLength(); x++) {
            System.out.println("the value is: " + nl.item(x).getTextContent());
        }
    }

}

input.xml

<?xml version="1.0" encoding="UTF-8"?>
<root>
  <blah>foo</blah>
  <device>
    <modelname>xbox</modelname>
  </device>
  <blah>bar</blah>
  <device>
    <modelname>wii</modelname>
  </device>
  <blah/>
</root>
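
Since the question is about a large number of files, the same idea extends across documents: the compiled XPathExpression and the DocumentBuilder can both be created once and reused for every file. The following is only a sketch under some assumptions. The class name ReuseAcrossDocuments and the method search are made up for illustration, and the documents are passed in as strings so the example is self-contained (for real files, builder.parse(File) would be used instead). Neither object is thread-safe, so each thread would need its own pair.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathFactory;

import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class, not part of the original answer.
class ReuseAcrossDocuments {

    // Evaluates one XPath expression against many documents,
    // creating the builder and compiling the expression only once.
    static List<String> search(List<String> xmlDocuments, String expression) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        XPathExpression compiled = XPathFactory.newInstance().newXPath().compile(expression);

        List<String> values = new ArrayList<>();
        for (String xml : xmlDocuments) {
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            NodeList nl = (NodeList) compiled.evaluate(doc, XPathConstants.NODESET);
            for (int i = 0; i < nl.getLength(); i++) {
                values.add(nl.item(i).getTextContent());
            }
        }
        return values;
    }

    public static void main(String[] args) throws Exception {
        List<String> docs = Arrays.asList(
                "<root><device><modelname>xbox</modelname></device></root>",
                "<root><device><modelname>wii</modelname></device></root>");
        for (String value : search(docs, "/root/device/modelname")) {
            System.out.println("the value is: " + value);
        }
    }
}
```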


I wonder if the XPath searching is really your bottleneck, or whether it's actually the XML parsing? I would suspect the latter. I don't know how persistent your XML documents are, but I would think the solution is to store them in an XML database so you only incur the parsing cost once, and so that they can be indexed to make XPath/XQuery searching more efficient.
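
One rough way to check this before changing libraries is to time the two phases separately. The sketch below is an assumption-laden illustration, not a proper benchmark: the class name ParseVsXPathTiming and the method measure are invented here, the input is an in-memory string rather than a file, and System.nanoTime measurements are noisy until the JIT has warmed up. A harness such as JMH would give more trustworthy numbers.

```java
import java.io.StringReader;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathFactory;

import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper class for a rough comparison only.
class ParseVsXPathTiming {

    // Returns { total nanoseconds spent parsing, total nanoseconds spent in XPath }.
    static long[] measure(String xml, String expression, int rounds) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        XPathExpression compiled = XPathFactory.newInstance().newXPath().compile(expression);

        long parseNanos = 0;
        long xpathNanos = 0;
        for (int i = 0; i < rounds; i++) {
            long t0 = System.nanoTime();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            long t1 = System.nanoTime();
            compiled.evaluate(doc, XPathConstants.NODESET);
            long t2 = System.nanoTime();
            parseNanos += t1 - t0;
            xpathNanos += t2 - t1;
        }
        return new long[] { parseNanos, xpathNanos };
    }

    public static void main(String[] args) throws Exception {
        String xml = "<root><device><modelname>xbox</modelname></device>"
                + "<device><modelname>wii</modelname></device></root>";
        long[] totals = measure(xml, "/root/device/modelname", 1000);
        System.out.println("parsing: " + totals[0] + " ns, XPath: " + totals[1] + " ns");
    }
}
```

If parsing dominates, a faster XPath engine alone is unlikely to help much.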


You can look at my previous answer for something related.

Basically, I have used JXPath and Xerces as well as dom4j and javax. I can say with confidence from my experience that VTD-XML is hands down the fastest of these options.

There are plenty of other questions on using VTD-XML on SO if you care to search.

EDIT:
OK, so based on your comment, the code snippet would be something like this:

VTDGen vg = new VTDGen();
AutoPilot ap = new AutoPilot();
int i;
ap.selectXPath("/root/device/modelname");
if (vg.parseFile(PATH_TO_FILE,true)){
    VTDNav vn = vg.getNav();
    ap.bind(vn); // apply XPath to the VTDNav instance
    // AutoPilot moves the cursor for you
    while((i=ap.evalXPath())!=-1){
        System.out.println("the value is: " + vn.toNormalizedString(vn.getText()));
    }
}

For the following XML:

<root>
  <blah>foo</blah>
  <device>
    <modelname>xbox</modelname>
  </device>
  <blah>bar</blah>
  <device>
    <modelname>wii</modelname>
  </device>
  <blah/>
</root>

The output will be:

the value is: xbox
the value is: wii

You can take it from here...


You should elaborate on what kinds of things you are searching for. If it's plain content strings, I would consider using the StAX API (javax.xml.stream.XMLStreamReader), for example. XPath is good if you need to restrict your search to a specific subset.

One problem with XPath, however, is that depending on the expression it may end up building a DOM tree in memory, which is rather costly (relative to parsing XML) in terms of both speed and memory use. So if this can be avoided, that alone can speed up processing by a factor of 3.
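
As a rough illustration of the StAX approach, the sketch below streams through the document and collects the text of every modelname element without building a DOM tree. The class name StaxSearch and the method findModelNames are made up for this example, and the input is a string for self-containment. Note one difference from the XPath /root/device/modelname: this simple version matches modelname at any depth, so matching an exact path would require tracking the element nesting as well.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper class, not taken from the original answer.
class StaxSearch {

    // Collects the text content of every <modelname> element, at any depth.
    static List<String> findModelNames(String xml) throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        XMLStreamReader reader = factory.createXMLStreamReader(new StringReader(xml));
        List<String> values = new ArrayList<>();
        try {
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT
                        && "modelname".equals(reader.getLocalName())) {
                    // getElementText() reads up to the matching end tag;
                    // it only works for elements that contain text and no child elements.
                    values.add(reader.getElementText());
                }
            }
        } finally {
            reader.close();
        }
        return values;
    }

    public static void main(String[] args) throws XMLStreamException {
        String xml = "<root><blah>foo</blah><device><modelname>xbox</modelname></device>"
                + "<blah>bar</blah><device><modelname>wii</modelname></device><blah/></root>";
        for (String value : findModelNames(xml)) {
            System.out.println("the value is: " + value);
        }
    }
}
```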
