resilient searching of elements via xpath?

2023-02-02 22:45

From my previous question, "how does this xpath behave?", I found that

html//p//table//tr//td/a

can cope with unexpected elements that show up between the steps of the path.

For instance the xpath above could handle:

html/p/div/table/tr/td/a
html/p/table/tr/td/b/div/a

However, how can I formulate an xpath which is fully resilient to missing/unexpected elements?

For instance, the xpath mentioned in the beginning cannot handle the following:

/html/table/tr/td/a (p is missing)
/html/div/span/table/tr/td/a (p is missing and position replaced with `div/span/`)
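
One way to check cases like these is to run the path against small test documents. The sketch below uses Python's standard xml.etree.ElementTree, which supports only a subset of XPath; its paths are relative to the root element, so `.//p//table//tr//td/a` stands in for `html//p//table//tr//td/a`. The helper names `build` and `matches` are made up for illustration.

```python
import xml.etree.ElementTree as ET

def build(path):
    """Build a document holding one chain of nested elements, e.g. 'html/p/a'."""
    tags = path.strip("/").split("/")
    root = ET.Element(tags[0])
    node = root
    for tag in tags[1:]:
        node = ET.SubElement(node, tag)
    return root

def matches(path):
    """True if the question's XPath finds an <a> in the document built from path."""
    root = build(path)
    return len(root.findall(".//p//table//tr//td/a")) > 0

for sample in ["html/p/div/table/tr/td/a",
               "/html/table/tr/td/a",
               "/html/div/span/table/tr/td/a"]:
    print(sample, "->", matches(sample))
```

The first sample (an extra `div`) is found; the two samples without `p` are not.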

Does an xpath syntax exist to deal with the above cases? If not, what would be an alternative approach?

My gut tells me it's not possible with xpath alone, so I am using the following algorithm, shown here as pseudocode.

Essentially, it splits the given xpath into steps and, for each step, looks for the expected element as an immediate child of the element found in the previous step. If the expected child doesn't exist or is some other element, it digs through all children of the current element and attempts to discover the expected child further down.

function searchElement() {
  elements[] = "/html/p/table/tr/td/a".split("/");
  thisElement = "";

  for (element in elements) {
    if (firstItem) {
      thisElement = findElementByXpath(element);
    } else {
      try {
        // look for this element inside the element found in the previous iteration
        thisElement = thisElement.findElementByXpath(element);
      } catch (NotFoundException e) {
        // not an immediate child: dig through the children of the previous element
        foundElement = false;
        discoveredElement = thisElement.findElementByXpath("*");
        while (foundElement != true) {
          if (discoveredElement.findElementByXpath(element) != null) {
            // successful, element found, overwrite.
            thisElement = discoveredElement.findElementByXpath(element);
            foundElement = true;
          } else {
            // not successful, keep digging.
            discoveredElement = discoveredElement.findElementByXpath("*");
          }
        }
      }
    }
  }

  return thisElement;
}

Is this an optimal approach? I am worried that searching for "*" and digging through each element is rather inefficient.
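
For comparison, the same idea can be sketched in Python with the standard xml.etree.ElementTree module instead of a browser driver. Here the manual digging is replaced by a single descendant search (`.//tag`), and an intermediate step that cannot be found anywhere below the current element is skipped. That skipping rule is an assumption of this sketch, not part of the pseudocode above, and `search_element` is a hypothetical name.

```python
import xml.etree.ElementTree as ET

def search_element(root, path):
    """Follow path step by step, tolerating extra or missing intermediate elements."""
    steps = [s for s in path.split("/") if s]
    if not steps or root.tag != steps[0]:
        return None
    current = root
    for i, tag in enumerate(steps[1:], start=1):
        found = current.find(tag)               # expected immediate child
        if found is None:
            # fall back to the first descendant with that tag, in document order
            found = current.find(".//" + tag)
        if found is None:
            if i == len(steps) - 1:
                return None                     # the target element itself is missing
            continue                            # assumed rule: skip a missing intermediate step
        current = found
    return current

doc = ET.fromstring(
    "<html><div><span><table><tr><td><a>link</a></td></tr></table></span></div></html>"
)
hit = search_element(doc, "/html/p/table/tr/td/a")
print(hit.tag, hit.text)
```

Note that the sketch is greedy: it commits to the first match at every step and never backtracks, so it can miss a target that sits under a later sibling.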

I don't know what to tag this question besides "xpath"...feel free to edit.

Thank you.

If I understand you correctly, you want to select `a` elements with specific, ordered, optional ancestors.

Then, instead of your expression /html//p//table//tr//td/a, it should be:

//a[(self::*|parent::td)[1]
       [(self::*|ancestor::tr)[1]
           [(self::*|ancestor::table)[1]
               [(self::*|ancestor::p)[1]
                        [ancestor::html[not(parent::*)]]
               ]
           ]
       ]
   ]

But this is the same as:

/html//a |
/html//td/a |
/html//tr//a |
/html//tr//td/a |
/html//table//a |
/html//table//td/a |
/html//table//tr//a |
/html//table//tr//td/a |
/html//p//a |
/html//p//td/a |
/html//p//tr//a |
/html//p//tr//td/a |
/html//p//table//a |
/html//p//table//td/a |
/html//p//table//tr//a |
/html//p//table//tr//td/a

And /html//a is so general that it would select any `a` element in the document.
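
The 16 alternatives are every combination of keeping or dropping the four optional steps. As a rough check, the sketch below generates them and evaluates their union with Python's standard xml.etree.ElementTree. That module implements only a subset of XPath and has no `|` operator, so the union is built by hand, and `/html` is rewritten to `.` relative to a root element assumed to be `html`. The names `optional_paths` and `union_matches` and the sample document are illustrative.

```python
import xml.etree.ElementTree as ET
from itertools import product

def optional_paths():
    """Generate the 16 paths: each of p, table, tr, td is either kept or dropped."""
    paths = []
    for keep_p, keep_table, keep_tr, keep_td in product([False, True], repeat=4):
        path = "/html"
        for tag, keep in (("p", keep_p), ("table", keep_table), ("tr", keep_tr)):
            if keep:
                path += "//" + tag
        path += "//td/a" if keep_td else "//a"
        paths.append(path)
    return paths

def union_matches(root, paths):
    """Collect, without duplicates, the elements matched by any of the paths."""
    hits = []
    for path in paths:
        relative = "." + path[len("/html"):]   # '/html//a' becomes './/a'
        for elem in root.findall(relative):
            if all(elem is not seen for seen in hits):
                hits.append(elem)
    return hits

doc = ET.fromstring(
    "<html><a/><div><a/></div>"
    "<p><table><tr><td><a/></td></tr></table></p></html>"
)
paths = optional_paths()
print(len(paths))
print(len(union_matches(doc, paths)) == len(doc.findall(".//a")))
```

On this sample the union matches exactly the elements that `.//a` alone matches, including the two that have none of the expected ancestors.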

It's possible, but a really bad idea.

The // construct means "skip any number of elements", so a path of //td finds a "td" element anywhere in the DOM.

That means you will also pick up the element at /html/body/im/not/what/you/want/td
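
That over-matching is easy to reproduce. A small sketch using Python's standard xml.etree.ElementTree, with a made-up document that contains one wanted and one unwanted `td`:

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring(
    "<html><body>"
    "<table><tr><td>wanted</td></tr></table>"
    "<im><not><what><you><want><td>unwanted</td></want></you></what></not></im>"
    "</body></html>"
)

# './/td' is ElementTree's spelling of '//td' relative to the root element.
loose = [td.text for td in doc.findall(".//td")]
# A fully spelled-out path only reaches the cell inside the table.
strict = [td.text for td in doc.findall("./body/table/tr/td")]

print(loose)
print(strict)
```

The loose path returns both cells, while the explicit path returns only the one inside the table.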
