Non-trivial screen scraping selections using pQuery

I'm using pQuery (a Perl port of jQuery) to select elements and retrieve text from an HTML document.

Consider the following markup:

<x>
   <y>code1</y>
   <z>stuff</z>
   <y>code2</y>
   <z>foobar</z>
</x>

And the following pQuery code:

my $target_value = pQuery($markup)->find($pquery_selector)->text;

I'm trying to formulate $pquery_selector so that it matches <z>foobar</z> in the markup above, using the following rule: find the z element that follows a y element whose body contains "code2". This is possible in jQuery, but I'm not sure that the pQuery syntax is powerful enough to handle such an expression.

Is this type of selection possible using the pQuery syntax?


In jQuery it might be possible to write a selector like 'y:contains(code2)+z'. However, pQuery is still unfinished (as of version 0.07), and a selector like x+z just raises an error, which suggests that the module's developer hasn't gotten around to porting that part of the jQuery code.

Since pQuery hasn't been touched since 2008, I'd recommend either fixing it yourself (the code is on CPAN and GitHub) or using a more mature module like HTML::TreeBuilder::XPath (which does require learning XPath syntax, but actually works for non-trivial things).

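The XPath equivalent of the above jQuery selector would be //y[contains(text(), 'code2')]/following-sibling::z. Note that this matches every z sibling that comes after the matching y. To mirror jQuery's +, which only matches a z that immediately follows the y, use //y[contains(text(), 'code2')]/following-sibling::*[1][self::z] instead.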
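As a rough sketch of how that XPath expression could be used from Perl, here is a minimal HTML::TreeBuilder::XPath script run against the markup from the question. One assumption to flag: x, y and z are not real HTML tags, and HTML::TreeBuilder ignores unknown tags by default, so the sketch turns that off with ignore_unknown(0). With real HTML element names that line should not be needed.

```perl
use strict;
use warnings;
use HTML::TreeBuilder::XPath;

# The sample markup from the question.
my $markup = <<'END_MARKUP';
<x>
   <y>code1</y>
   <z>stuff</z>
   <y>code2</y>
   <z>foobar</z>
</x>
END_MARKUP

my $tree = HTML::TreeBuilder::XPath->new;

# Assumption: needed only because x/y/z are made-up tags, which
# HTML::TreeBuilder would otherwise skip while parsing.
$tree->ignore_unknown(0);

$tree->parse($markup);
$tree->eof;

# The z element(s) following a y element whose text contains "code2".
my $xpath = q{//y[contains(text(), 'code2')]/following-sibling::z};

# findvalue returns the text content of the matched node(s).
my $target_value = $tree->findvalue($xpath);

print "$target_value\n";

# Free the tree explicitly, as recommended for HTML::Element trees.
$tree->delete;
```

If several z elements can follow the matching y, findnodes returns them as a list instead, so you can pick the one you want.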
