How do I select sets of nodes with a single XPath query?

2023-02-13 05:32 Q&A author:

I'm trying to extract journey and price information from my favorite airline.

I have a search results page that looks like this:

MASwings search results http://img28.imagevenue.com/aAfkjfp01fo1i-2846/loc29/42467_dayview_oneway_122_29lo.jpg

EDIT: Image host might have blocked the hotlink. See the image on this page: http://img28.imagevenue.com/img.php?image=42467_dayview_oneway_122_29lo.jpg

Repro URL for booking query

I can select each row that represents a flight using this XPath selector:

//*[@class="servicecode "]/ancestor::tr[1]

But each flight row is not an independent journey; the flights are really grouped into legs, and these are what I want to select.

The row class alternates for each new leg: the rows of the first leg have class "datarow", and the rows of the next leg have "datarow alt". In Python I can group the nodes selected by the above expression using itertools.groupby, but if there is a way to achieve this purely in XPath, I would prefer it.
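For reference, the itertools.groupby approach mentioned above might look like the following minimal sketch. The table markup and flight codes are made up as a stand-in for the real results page, and the sketch uses the standard library's xml.etree.ElementTree so that it runs without extra dependencies; with lxml, the rows would come from the XPath selector instead.

```python
from itertools import groupby
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified markup; the real page has more columns.
SAMPLE = """
<table>
  <tr class="datarow"><td class="servicecode">F1</td></tr>
  <tr class="datarow"><td class="servicecode">F2</td></tr>
  <tr class="datarow alt"><td class="servicecode">F3</td></tr>
  <tr class="datarow"><td class="servicecode">F4</td></tr>
</table>
""".strip()


def group_legs(rows):
    """Group consecutive rows that share the same class attribute into legs."""
    return [list(leg) for _, leg in groupby(rows, key=lambda tr: tr.get("class"))]


rows = ET.fromstring(SAMPLE).findall(".//tr")
legs = group_legs(rows)

# One list of flight codes per leg.
leg_codes = [[tr.findtext("td") for tr in leg] for leg in legs]
print(leg_codes)
```

groupby only merges adjacent items with equal keys, which is what makes the alternating class usable as a leg boundary.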

An extension to this question: my selector selects all rows, whether the flight is sold out or not. I can select the first flight of every bookable journey using this selector:

//*[contains(@class, "datarow")][.//input]

But if the leg has more than one flight, then I will have to look for following siblings with the same class using another XPath query.

Is there a single XPath query that will return me each bookable leg as a nodeset?

Note: I'm using the Python lxml library, in case that matters.
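As a point of comparison for any pure-XPath answer, here is a rough Python sketch of the two-step approach: group the rows into legs, then keep only the legs in which some row contains an input element. The markup, the flight codes, and the helper name bookable_legs are invented for illustration; the standard library's xml.etree.ElementTree stands in for lxml here.

```python
from itertools import groupby
import xml.etree.ElementTree as ET

# Hypothetical markup: a leg is treated as bookable when one of its rows has an <input>.
SAMPLE = """
<table>
  <tr class="datarow"><td class="servicecode">F1</td><td><input type="radio"/></td></tr>
  <tr class="datarow"><td class="servicecode">F2</td><td/></tr>
  <tr class="datarow alt"><td class="servicecode">F3</td><td>Sold out</td></tr>
  <tr class="datarow"><td class="servicecode">F4</td><td><input type="radio"/></td></tr>
</table>
""".strip()


def bookable_legs(rows):
    """Return legs (lists of consecutive same-class rows) that contain an <input>."""
    legs = (list(leg) for _, leg in groupby(rows, key=lambda tr: tr.get("class")))
    return [leg for leg in legs
            if any(tr.find(".//input") is not None for tr in leg)]


rows = ET.fromstring(SAMPLE).findall(".//tr")
bookable = bookable_legs(rows)
bookable_codes = [[tr.findtext("td[@class='servicecode']") for tr in leg]
                  for leg in bookable]
print(bookable_codes)
```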

I can select each row that represents a flight using this XPath selector:

     //*[@class="servicecode "]/ancestor::tr[1] 

But each flight row is not an independent journey; the flights are really grouped into legs, and these are what I want to select.

The row class alternates for each new leg: the rows of the first leg have class "datarow", and the rows of the next leg have "datarow alt".

Use:

//tr[@class='datarow'][.//*[@class='servicecode']]

An extension to this question: my selector selects all rows, whether the flight is sold out or not. I can select the first flight of every bookable journey using this selector:

//*[contains(@class, "datarow")][.//input]

But if the leg has more than one flight, then I will have to look for following sibling with the same class using another XPath query.

Is there a single XPath query that will return me each bookable leg as a nodeset?

Yes:

  (//tr[@class='datarow'])[1]//input 
| 
  (//tr[@class='datarow'])[1]
         //following-sibling::tr[@class='datarow altrow']
                   [count(preceding-sibling::tr[@class='datarow'])=1]
                         //input

This XPath expression selects all tr elements that represent each bookable leg (in this case 3 legs) of the first journey.

To get all legs of the second journey, substitute 1 in the above expression with 2.

To get all legs of the k-th journey, substitute 1 in the above expression with the actual value of k.
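The substitution can be done with a small helper that builds the expression for a given k. This is only a sketch: the function name legs_of_journey_xpath is made up, and it assumes that every 1 in the expression above, including the one in the count() comparison, is meant to be replaced. The resulting string could then be passed to lxml's xpath() method on the parsed page.

```python
def legs_of_journey_xpath(k):
    """Build the XPath expression above with every 1 replaced by k (assumed intent)."""
    if k < 1:
        raise ValueError("k is a 1-based journey index")
    return (
        f"(//tr[@class='datarow'])[{k}]//input"
        " | "
        f"(//tr[@class='datarow'])[{k}]"
        "//following-sibling::tr[@class='datarow altrow']"
        f"[count(preceding-sibling::tr[@class='datarow'])={k}]"
        "//input"
    )


# Expression for the second journey.
expr = legs_of_journey_xpath(2)
print(expr)
```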

This does what I want. But is there a more elegant solution?

//*[contains(@class, "columns")]//tr[contains(@class, "datarow")][1]
|
//*[contains(@class, "columns")]//tr[not(contains(@class, "altrow"))]
       [preceding-sibling::tr[1]
           [contains(@class, "altrow")]]
|
//*[contains(@class, "columns")]//tr[contains(@class,"altrow")]
       [preceding-sibling::tr[1]
           [not(contains(@class, "altrow"))]]

The second part selects the first row of each run of consecutive rows whose class does not contain "altrow" (a row without "altrow" whose immediately preceding sibling row has it).

The third part selects the first row of each run of consecutive rows whose class contains "altrow".

The first part selects the first row of the first run, which the other two parts miss because no row of the other kind precedes it. From each selected row, the remaining rows of its leg are the following siblings up to the next selected row.
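To check which rows the three-part union matches, the same test can be written in plain Python: a row matches when it is the first row, or when its "altrow" status differs from that of the row immediately before it. The sketch below assumes that all rows are siblings whose class contains "datarow"; the markup and flight codes are invented, and xml.etree.ElementTree is used instead of lxml so that it runs with the standard library alone.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: three legs with two, one, and two rows.
SAMPLE = """
<table>
  <tr class="datarow"><td>F1</td></tr>
  <tr class="datarow"><td>F2</td></tr>
  <tr class="datarow altrow"><td>F3</td></tr>
  <tr class="datarow"><td>F4</td></tr>
  <tr class="datarow"><td>F5</td></tr>
</table>
""".strip()


def is_alt(tr):
    """True when the row's class attribute contains 'altrow'."""
    return "altrow" in tr.get("class", "")


def leg_starts(rows):
    """Rows where a new leg begins: the first row, or a change in 'altrow' status."""
    return [tr for i, tr in enumerate(rows)
            if i == 0 or is_alt(tr) != is_alt(rows[i - 1])]


rows = ET.fromstring(SAMPLE).findall(".//tr")
start_codes = [tr.findtext("td") for tr in leg_starts(rows)]
print(start_codes)
```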
