
Java: How to speed up the xpath string generation on a given w3c dom document?

I have the following method, which takes an element of an org.w3c.dom.Document and generates its absolute xpath string.

I notice it takes a long time to go through hundreds of elements on a page.

Is there any way to speed it up, or perhaps a different approach?

Important note: I am only given an org.w3c.dom.Document.

   public String getElementXpath(DOMElement elt){
       String path = "";

       for (Node fib = (Node) elt; fib != null; fib = fib.getParentNode()){
           if (fib.getNodeType() == Node.ELEMENT_NODE){

               DOMElement thisparent = (DOMElement) fib;
               int idx = getElementIdx(thisparent);
               String xname = thisparent.getTagName();

               if (idx >= 1) xname += "[" + idx + "]";
               path = "/" + xname + path;
           }
       }
       return path;
   }

   private int getElementIdx(DOMElement elt) {
       int count = 1;
       for (Node sib = elt.getPreviousSibling(); sib != null; sib = sib.getPreviousSibling()){
           if (sib.getNodeType() == Node.ELEMENT_NODE){
               DOMElement thiselement = (DOMElement) sib;
               if (thiselement.getTagName().equals(elt.getTagName())){
                   count++;
               }
           }
       }
       return count;
   }


Your code is O(n^2) in the number of siblings (that is, the maximum fan-out of the tree).

Given any DOM problem, a better approach is always to avoid using DOM! But I don't know if that's an option in your case.

A less radical change is to walk the children of each node once and keep a hashmap from each element name you encounter to the number of elements with that name seen so far. Use that running count as the subscript (index), instead of counting back through all the previous siblings separately for every element.
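Here is a minimal sketch of that idea. It assumes plain org.w3c.dom.Element in place of the question's DOMElement, and Java 8+ for Map.merge. SiblingIndexer, indexChildren and describeRootChildren are made-up names for illustration. A single pass over a parent's children gives every child element its position among same-named siblings.

```java
import java.io.StringReader;
import java.util.HashMap;
import java.util.IdentityHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class SiblingIndexer {
    // One pass over the children of `parent`: a running count per tag name gives
    // each child element its 1-based XPath position among same-named siblings.
    static Map<Node, Integer> indexChildren(Element parent) {
        Map<String, Integer> countsByName = new HashMap<>();
        Map<Node, Integer> positions = new IdentityHashMap<>(); // DOM nodes keyed by identity
        for (Node child = parent.getFirstChild(); child != null; child = child.getNextSibling()) {
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                String name = ((Element) child).getTagName();
                positions.put(child, countsByName.merge(name, 1, Integer::sum));
            }
        }
        return positions;
    }

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
    }

    // Usage helper: "name[pos]" for each child element of the root, in document order.
    static String describeRootChildren(String xml) {
        Element root = parse(xml).getDocumentElement();
        Map<Node, Integer> positions = indexChildren(root);
        StringBuilder sb = new StringBuilder();
        for (Node c = root.getFirstChild(); c != null; c = c.getNextSibling()) {
            Integer pos = positions.get(c);
            if (pos != null) {
                if (sb.length() > 0) sb.append(' ');
                sb.append(c.getNodeName()).append('[').append(pos).append(']');
            }
        }
        return sb.toString();
    }
}
```

If you call indexChildren once per parent and keep the resulting map, it replaces the backwards scan that getElementIdx performs for every single element.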


I am not sure whether you generate XPaths for many nodes or just a single node in each DOM document. If you generate many, you can cache the expressions, as others have suggested. The gain is hard to estimate, but if you want to generate very many XPaths from the same document, you might as well reverse the algorithm and start from the root element. Also note that you can normalize text nodes if you have a lot of them, but I am unsure how much that helps overall ;)
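To show what normalizing does, here is a small sketch using Node.normalize(), which merges adjacent Text nodes. The document is built by hand because a freshly parsed document typically has adjacent character data merged already, so normalization mostly pays off after programmatic edits. Fewer Text siblings means fewer nodes for the sibling-counting loops to skip over. NormalizeDemo is a hypothetical name.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class NormalizeDemo {
    // Builds <root> holding two adjacent Text nodes followed by an <x/> element and
    // returns the root's child count before and after Node.normalize().
    static int[] childCountsAroundNormalize() {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
        Element root = doc.createElement("root");
        doc.appendChild(root);
        root.appendChild(doc.createTextNode("a"));
        root.appendChild(doc.createTextNode("b"));
        root.appendChild(doc.createElement("x"));

        int before = root.getChildNodes().getLength(); // "a", "b", <x/>
        root.normalize();                              // merges "a" and "b" into one Text node
        int after = root.getChildNodes().getLength();  // "ab", <x/>
        return new int[] { before, after };
    }
}
```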

Regardless, iterating over DOM nodes is really fast. Your String handling is not; in fact it is rather bad. Switch to a single StringBuilder (thanks, Alvin) instead of your current approach. Each path = "/" + xname + path allocates a new String and copies the whole path built so far (see the javadoc of String on how + concatenation is implemented), so the total work grows quadratically with the depth of the element. Make sure you initialize the StringBuilder to a sensible capacity in its constructor.
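Here is a sketch of the single-StringBuilder version. It assumes org.w3c.dom.Element instead of DOMElement, and the class name is illustrative. The method first collects the ancestor chain, then appends every step exactly once from the root down, so nothing is copied repeatedly. The capacity of 16 characters per step is only a guess.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.Deque;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class StringBuilderXpath {
    static String getElementXpath(Element elt) {
        // push() puts each parent in front, so iterating afterwards runs root -> elt.
        Deque<Element> chain = new ArrayDeque<>();
        for (Node n = elt; n != null && n.getNodeType() == Node.ELEMENT_NODE; n = n.getParentNode()) {
            chain.push((Element) n);
        }
        // One pre-sized StringBuilder for the whole path (16 chars per step is a guess).
        StringBuilder path = new StringBuilder(chain.size() * 16);
        for (Element e : chain) {
            path.append('/').append(e.getTagName()).append('[').append(getElementIdx(e)).append(']');
        }
        return path.toString();
    }

    // Same counting as the question's getElementIdx, on org.w3c.dom.Element.
    static int getElementIdx(Element elt) {
        int count = 1;
        for (Node sib = elt.getPreviousSibling(); sib != null; sib = sib.getPreviousSibling()) {
            if (sib.getNodeType() == Node.ELEMENT_NODE
                    && ((Element) sib).getTagName().equals(elt.getTagName())) {
                count++;
            }
        }
        return count;
    }

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
    }
}
```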

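You do not really need to check the tag name either: the wildcard * matches an element of any name in XPath, e.g. /*[1]/*[2]. The index then counts all preceding element siblings whatever their names, which makes the counting loop simpler.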
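Here is a sketch of the wildcard variant, again on org.w3c.dom.Element with illustrative names. It builds /*[i]/*[j]/... paths. It also evaluates the generated expression with the JDK's javax.xml.xpath API to check that the expression selects the element it was built from.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.Deque;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class WildcardXpath {
    // Builds /*[i]/*[j]/...; each index counts ALL preceding element siblings,
    // so no tag-name comparison is needed.
    static String getElementXpath(Element elt) {
        Deque<Element> chain = new ArrayDeque<>();
        for (Node n = elt; n != null && n.getNodeType() == Node.ELEMENT_NODE; n = n.getParentNode()) {
            chain.push((Element) n);
        }
        StringBuilder path = new StringBuilder();
        for (Element e : chain) {
            path.append("/*[").append(elementPosition(e)).append(']');
        }
        return path.toString();
    }

    static int elementPosition(Element elt) {
        int pos = 1;
        for (Node sib = elt.getPreviousSibling(); sib != null; sib = sib.getPreviousSibling()) {
            if (sib.getNodeType() == Node.ELEMENT_NODE) pos++;
        }
        return pos;
    }

    // Check: evaluating the generated expression finds the very same node.
    static boolean selectsSameNode(Document doc, Element target) {
        try {
            Node found = (Node) XPathFactory.newInstance().newXPath()
                    .evaluate(getElementXpath(target), doc, XPathConstants.NODE);
            return target.isSameNode(found);
        } catch (XPathExpressionException e) {
            throw new IllegalStateException(e);
        }
    }

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
    }
}
```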


=== New - So you need to use DOM ===

To speed things up you can cache results (as the other person suggested). Notice that your current code computes the xpath of the same node many times: for each node N, the steps for N and its ancestors are recomputed every time you ask for the xpath of one of N's children. Here is what I have in mind for caching:

HashMap<Node, String> xpathCache = new HashMap<Node, String>();
HashMap<Node, Integer> nodeIndexCache = new HashMap<Node, Integer>();

public String getElementXpath(DOMElement elt){
    String path = "";

    for (Node fib = (Node) elt; fib != null; fib = fib.getParentNode()){
        if (fib.getNodeType() == Node.ELEMENT_NODE){

            String cachedParentPath = xpathCache.get(fib);

            if (cachedParentPath != null){
                // this ancestor's path is already known: prepend it and stop walking up
                path = cachedParentPath + path;
                break;
            }

            DOMElement thisparent = (DOMElement) fib;
            int idx = getElementIdx(thisparent);
            String xname = thisparent.getTagName();

            if (idx >= 1) xname += "[" + idx + "]";
            path = "/" + xname + path;
        }
    }

    /*
     * here, not only do you know the xpath to elt, you also
     * know the xpath to the ancestors of elt. You can leverage
     * this to cache the ancestors' xpaths as well. But I just
     * cache elt for illustration purposes.
     *
     * To compute the ancestors' xpaths efficiently, maybe you want to
     * store the xpath in a data structure other than a String,
     * e.g. a Stack of Strings.
     */
    if (! xpathCache.containsKey(elt)){
        xpathCache.put(elt, path);
    }

    return path;
}

private int getElementIdx(DOMElement elt) {
    Integer cached = nodeIndexCache.get(elt);
    if (cached != null){
        return cached;
    }

    int count = 1;
    for (Node sib = elt.getPreviousSibling(); sib != null; sib = sib.getPreviousSibling()){
        if (sib.getNodeType() == Node.ELEMENT_NODE
                && ((DOMElement) sib).getTagName().equals(elt.getTagName())){
            Integer sibIdx = nodeIndexCache.get(sib);
            if (sibIdx != null){
                // a same-named sibling's index is cached: ours is that index
                // plus the same-named siblings counted so far
                count += sibIdx;
                break;
            }
            count++;
        }
    }

    nodeIndexCache.put(elt, count);
    return count;
}
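The code above is tied to DOMElement. Below is a runnable sketch of the same caching idea on plain org.w3c.dom.Element. It follows the comment's suggestion and caches the path of every ancestor too, by computing xpath(parent) recursively. It assumes IdentityHashMap so that nodes are compared by object identity. CachedXpaths and its methods are illustrative names.

```java
import java.io.StringReader;
import java.util.IdentityHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class CachedXpaths {
    private final Map<Node, String> xpathCache = new IdentityHashMap<>();
    private final Map<Node, Integer> indexCache = new IdentityHashMap<>();

    // xpath(elt) = xpath(parent) + "/name[idx]". The recursion stores the path of
    // every ancestor it passes through, so later calls for siblings or descendants
    // stop as soon as they reach a cached ancestor.
    String xpath(Element elt) {
        String cached = xpathCache.get(elt);
        if (cached != null) {
            return cached;
        }
        Node parent = elt.getParentNode();
        String prefix = (parent != null && parent.getNodeType() == Node.ELEMENT_NODE)
                ? xpath((Element) parent)
                : "";
        String path = prefix + "/" + elt.getTagName() + "[" + index(elt) + "]";
        xpathCache.put(elt, path);
        return path;
    }

    // Position among same-named siblings, reusing a cached sibling index when one is found.
    int index(Element elt) {
        Integer cached = indexCache.get(elt);
        if (cached != null) {
            return cached;
        }
        int count = 1;
        for (Node sib = elt.getPreviousSibling(); sib != null; sib = sib.getPreviousSibling()) {
            if (sib.getNodeType() == Node.ELEMENT_NODE
                    && ((Element) sib).getTagName().equals(elt.getTagName())) {
                Integer sibIdx = indexCache.get(sib);
                if (sibIdx != null) {
                    count += sibIdx;
                    break;
                }
                count++;
            }
        }
        indexCache.put(elt, count);
        return count;
    }

    int cachedPathCount() {
        return xpathCache.size();
    }

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
    }
}
```

After asking for the second c below, the paths of r, b and that c are all cached. A following request for the first c only has to compute its own step.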

It looks like you are given an arbitrary node and have to compute its xpath by walking back up the tree? If what you ultimately want is the xpath of every node, the fastest way is to start at the root node and traverse the tree downwards, provided you have a reference to the root node.
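Here is a sketch of that top-down traversal, assuming you want every element's path. TopDownXpaths and allXpaths are illustrative names. Each element keeps per-name counters for its own children, so every sibling list is scanned once. A single StringBuilder is extended on the way down and truncated with setLength on the way back. The result is a list in document order; you could store (element, path) pairs in an IdentityHashMap instead.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class TopDownXpaths {
    // Returns the xpath of every element, in document order.
    static List<String> allXpaths(Document doc) {
        List<String> out = new ArrayList<>();
        Element root = doc.getDocumentElement();
        StringBuilder path = new StringBuilder("/").append(root.getTagName()).append("[1]");
        visit(root, path, out);
        return out;
    }

    private static void visit(Element elt, StringBuilder path, List<String> out) {
        out.add(path.toString());
        // Per-name counters for this element's children.
        Map<String, Integer> counts = new HashMap<>();
        for (Node c = elt.getFirstChild(); c != null; c = c.getNextSibling()) {
            if (c.getNodeType() != Node.ELEMENT_NODE) {
                continue;
            }
            Element child = (Element) c;
            int idx = counts.merge(child.getTagName(), 1, Integer::sum);
            int mark = path.length();
            path.append('/').append(child.getTagName()).append('[').append(idx).append(']');
            visit(child, path, out);
            path.setLength(mark); // drop the child's step before moving to the next sibling
        }
    }

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
    }
}
```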

=== OLD ===

You can try using an event-based XML parsing API instead of DOM. The JDK ships with an event-based parser called SAXParser, so you can start with that. There is also StAX, which you can try.

An event-based XML parser emits "events" as it does a depth-first traversal of the document, instead of parsing the XML into an in-memory DOM. It visits each element of your XML and emits events such as "onOpenTag", "onCloseTag" and "onAttribute" (in SAX these are startElement and endElement, with the attributes passed to startElement). By writing an event handler, you can build and/or store the paths of the elements like this:

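...
// inside a class that extends org.xml.sax.helpers.DefaultHandler
// (imports: java.util.ArrayDeque, java.util.ArrayList, java.util.Deque,
//  java.util.List, org.xml.sax.Attributes)
private final Deque<String> currentPath = new ArrayDeque<String>();
private final List<String> cache = new ArrayList<String>();

@Override
public void startElement(String uri, String localName, String qName, Attributes attributes){
   this.currentPath.addLast(qName);   // push the actual tag name, not the literal "tagName"

   if ("Item".equals(qName)){
      cache.add("/" + String.join("/", currentPath));   // path from the root to this Item
   }
}

@Override
public void endElement(String uri, String localName, String qName){
   this.currentPath.removeLast();
}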

The nice thing about an event-based API is that it is fast and saves a lot of memory on big XML documents.

The bad thing is that you have to write more code to get the data you want.
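The path stack above records only tag names. If you also want XPath indices, each level needs its own per-name counter, which is the same hashmap idea suggested earlier. Here is a sketch using StAX (javax.xml.stream, part of the JDK). It assumes namespaces can be ignored, because it uses local names only. StaxXpaths and elementXpaths are illustrative names.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

class StaxXpaths {
    // Streams the document and returns each element's indexed xpath, in document order.
    static List<String> elementXpaths(String xml) {
        List<String> paths = new ArrayList<>();
        Deque<Map<String, Integer>> childCounts = new ArrayDeque<>(); // one counter map per open element
        Deque<Integer> marks = new ArrayDeque<>();                    // path length before each step
        StringBuilder path = new StringBuilder();
        childCounts.push(new HashMap<>()); // counts for the top level (the root element)
        try {
            XMLStreamReader r = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            while (r.hasNext()) {
                int event = r.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    String name = r.getLocalName();
                    int idx = childCounts.peek().merge(name, 1, Integer::sum);
                    marks.push(path.length());
                    path.append('/').append(name).append('[').append(idx).append(']');
                    paths.add(path.toString());
                    childCounts.push(new HashMap<>()); // fresh counters for this element's children
                } else if (event == XMLStreamConstants.END_ELEMENT) {
                    childCounts.pop();
                    path.setLength(marks.pop());
                }
            }
            r.close();
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
        return paths;
    }
}
```

Only the current path and one small map per open element are kept in memory, so the memory use depends on the nesting depth rather than on the size of the document.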
