Resilient searching of elements via XPath?
From my previous question, "How does this XPath behave?", I found that
html//p//table//tr//td/a
can cope with unexpected elements that show up between the steps of the above XPath.
For instance, the XPath above can handle:
html/p/div/table/tr/td/a
html/p/table/tr/td/b/div/a
However, how can I formulate an XPath that is fully resilient to missing or unexpected elements?
For instance, the XPath mentioned at the beginning cannot handle the following:
/html/table/tr/td/a (p is missing)
/html/div/span/table/tr/td/a (p is missing, and its position is taken by `div/span/`)
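Which of these sample paths the original expression actually matches can be checked with a short sketch using Python's standard xml.etree.ElementTree, which understands `//` and child steps. The helpers `build` and `matches` are hypothetical: each sample path is turned into one chain of nested elements, and because ElementTree evaluates paths relative to an element, the leading html step becomes a check on the root tag.

```python
import xml.etree.ElementTree as ET


def build(path):
    """Turn a slash-separated path such as 'html/p/div/table/tr/td/a'
    into a chain of nested elements and return the outermost one."""
    tags = path.strip("/").split("/")
    root = ET.Element(tags[0])
    parent = root
    for tag in tags[1:]:
        parent = ET.SubElement(parent, tag)
    return root


# html//p//table//tr//td/a, written relative to the html root element.
QUERY = ".//p//table//tr//td/a"


def matches(path):
    root = build(path)
    return root.tag == "html" and len(root.findall(QUERY)) > 0


for sample in ["html/p/div/table/tr/td/a",
               "html/p/table/tr/td/b/div/a",
               "/html/table/tr/td/a",
               "/html/div/span/table/tr/td/a"]:
    print(sample, matches(sample))
```

Run as is, it reports a match only for the first sample. Besides the two cases where p is missing, html/p/table/tr/td/b/div/a is not matched either, because the final td/a step still requires a to be a direct child of td; relaxing that step to td//a would cover it.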
Does an XPath syntax exist to deal with the above cases? If not, what would be an alternative approach?
My gut tells me it's not possible with XPath alone, so I am using the following algorithm (here written with Selenium's Python bindings).
Essentially, it splits the given XPath into steps and, for each step, looks for the expected element as an immediate child of the element found so far. If that child doesn't exist (or some other element is there instead), it digs through everything inside the current element, level by level, and tries to discover the expected element there.

    # Assumes Selenium 4's Python bindings.
    from selenium.common.exceptions import NoSuchElementException
    from selenium.webdriver.common.by import By


    def dig_for(element, step):
        # Start with the children of the current element; the caller has
        # already checked that `step` is not one of them.
        level = element.find_elements(By.XPATH, "./*")
        while level:
            for candidate in level:
                hits = candidate.find_elements(By.XPATH, "./" + step)
                if hits:
                    return hits[0]  # successful, element found
            # Not successful on this level: keep digging one level deeper.
            level = [child for candidate in level
                     for child in candidate.find_elements(By.XPATH, "./*")]
        # Nothing found anywhere below: give up instead of looping forever.
        raise NoSuchElementException(f"no <{step}> inside the current element")


    def search_element(driver, path="/html/p/table/tr/td/a"):
        # "/html/p/...".split("/") starts with an empty string; drop it.
        steps = [step for step in path.split("/") if step]
        # First step: the root element of the page.
        this_element = driver.find_element(By.XPATH, "/" + steps[0])
        for step in steps[1:]:
            try:
                # Look for the expected element as an immediate child.
                this_element = this_element.find_element(By.XPATH, "./" + step)
            except NoSuchElementException:
                # Not a direct child: dig through everything inside it.
                this_element = dig_for(this_element, step)
        return this_element
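To try the same walk without a browser, here is a minimal sketch on Python's standard xml.etree.ElementTree. The `walk` and `build` helpers and the `skip_missing` switch are hypothetical additions, not part of the algorithm above: with `skip_missing=True`, a step that is absent altogether (like the missing p) is skipped, while the last step must still be found.

```python
import xml.etree.ElementTree as ET


def build(path):
    """Build a chain of nested elements from a slash-separated path."""
    tags = path.strip("/").split("/")
    root = ET.Element(tags[0])
    parent = root
    for tag in tags[1:]:
        parent = ET.SubElement(parent, tag)
    return root


def walk(root, path, skip_missing=False):
    """Follow `path` from `root`: prefer an immediate child, fall back to
    any descendant, optionally skip absent steps. Returns the element
    reached for the last step, or None."""
    steps = [s for s in path.split("/") if s]
    if root.tag != steps[0]:
        return None
    current = root
    for step in steps[1:]:
        found = current.find(step)                # immediate child first
        if found is None:
            found = current.find(".//" + step)    # then any descendant
        if found is None:
            if not skip_missing:
                return None
            continue                              # hypothetical: step is optional
        current = found
    # The target itself (the last step) must have been found.
    return current if current.tag == steps[-1] else None


PATH = "/html/p/table/tr/td/a"
for sample in ["html/p/div/table/tr/td/a",
               "html/p/table/tr/td/b/div/a",
               "/html/table/tr/td/a",
               "/html/div/span/table/tr/td/a"]:
    print(sample,
          walk(build(sample), PATH) is not None,
          walk(build(sample), PATH, skip_missing=True) is not None)
```

Because it always takes the first match and never backtracks, this greedy walk can still miss a target that a different branch of the tree would have led to.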
Is this an optimal approach? I am worried that searching for "*" and digging through each element is rather inefficient.
I don't know what to tag this question besides "xpath"; feel free to edit.
Thank you.
If I understand you correctly, you want to select `a` elements with specific ordered optional ancestors.
Then, instead of your expression /html//p//table//tr//td/a, it should be:
//a[(self::*|parent::td)[1]
      [(self::*|ancestor::tr)[1]
         [(self::*|ancestor::table)[1]
            [(self::*|ancestor::p)[1]
               [ancestor::html[not(parent::*)]]
            ]
         ]
      ]
   ]
But this is the same as:
/html//a |
/html//td/a |
/html//tr//a |
/html//tr//td/a |
/html//table//a |
/html//table//td/a |
/html//table//tr//a |
/html//table//tr//td/a |
/html//p//a |
/html//p//td/a |
/html//p//tr//a |
/html//p//tr//td/a |
/html//p//table//a |
/html//p//table//td/a |
/html//p//table//tr//a |
/html//p//table//tr//td/a
And /html//a, the first branch, is so general that it selects any a element under the root html, so the whole expression selects every a.
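The collapse of the union can be checked with a small sketch in Python's standard xml.etree.ElementTree. It generates the sixteen branches listed above, written relative to the html root since ElementTree evaluates paths from an element, and runs them against a made-up sample document; the document and the helper name `union_branches` are illustrative assumptions.

```python
import itertools
import xml.etree.ElementTree as ET

# Ordered optional steps from the question's path; td is treated as the
# optional direct parent of a.
OPTIONAL = ["p", "table", "tr"]


def union_branches():
    """Every ordered subset of p/table/tr, ending in //a or //td/a."""
    branches = []
    for keep in itertools.product([False, True], repeat=len(OPTIONAL)):
        middle = "".join("//" + tag for tag, k in zip(OPTIONAL, keep) if k)
        for tail in ("//a", "//td/a"):
            branches.append("." + middle + tail)
    return branches


doc = ET.fromstring(
    "<html><body>"
    "<div><a id='1'/></div>"
    "<p><table><tr><td><a id='2'/></td></tr></table></p>"
    "<span><a id='3'/></span>"
    "</body></html>"
)

# Collect everything the union selects (elements hash by identity).
selected = set()
for branch in union_branches():
    selected.update(doc.findall(branch))

everything = set(doc.findall(".//a"))
print(len(union_branches()), selected == everything)
print(sorted(a.get("id") for a in selected))
```

It prints 16 True and then ['1', '2', '3']: every a in the sample is selected, wherever it sits.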
It's possible, but a really bad idea.
The // construct means "skip any number of elements." So you could use a path of //td to find a td element anywhere in the DOM.
That also means you'll pick up the element at /html/body/im/not/what/you/want/td.
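The pitfall is easy to reproduce with Python's standard xml.etree.ElementTree. In this sketch the document is simply the example path from this answer turned into nested elements; `.//td` from the root is ElementTree's spelling of //td, and a child-to-parent map (the `path_of` helper is hypothetical) recovers where the match was found.

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring(
    "<html><body><im><not><what><you><want><td/></want></you></what></not></im></body></html>"
)

# //td evaluated from the root: any td, however deeply nested.
hits = doc.findall(".//td")

# Child -> parent map, used to rebuild each hit's absolute path.
parent = {child: p for p in doc.iter() for child in p}


def path_of(elem):
    steps = []
    while elem is not None:
        steps.append(elem.tag)
        elem = parent.get(elem)  # the root has no parent, ending the loop
    return "/" + "/".join(reversed(steps))


print([path_of(h) for h in hits])
# ['/html/body/im/not/what/you/want/td']
```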