
XPath getting text from an element after a certain element

So right now if I have something like this:

//div[@class='artist']/p[x]/text()

x can either be 3 or 4, or maybe even a different number. Luckily, if what I am looking for is not in 3, I can just check for null and go on until I find text. The issue is I would rather know I'm going to the right element every time. So I tried this:

div[@class='people']/h3[text()='h3 text']/p/text()

since there will always be a <p> right after <h3>h3 text</h3>. However this never returns anything, and usually results in an error. If I remove /p I will get 'h3 text' returned.

Anyway, how do I get that <p> directly after <h3>?

BTW, I'm using HTMLCleaner in Java for this.


By default when you don't specify an axis you get the child:: axis, which is why the / operator seems to descend the DOM tree child by child. There is an implied child:: after each slash.
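To see the implied child:: axis in action, here is a minimal Java sketch. It uses the JDK's built-in javax.xml.xpath API rather than HtmlCleaner. The class name ChildAxisDemo and the sample markup are hypothetical. It checks only that the abbreviated path and the same path with every child:: written out select the same two <p> elements.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class ChildAxisDemo {
    // Hypothetical well-formed markup modelled on the question's <div class="people">.
    static final String XML =
        "<html><body><div class='people'>"
        + "<h3>h3 text</h3><p>first</p><p>second</p>"
        + "</div></body></html>";

    // Parses the XML and returns how many nodes the XPath expression selects.
    static int count(String xml, String expr) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(expr, doc, XPathConstants.NODESET);
            return nodes.getLength();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        // Abbreviated form: every bare step name implicitly uses child::.
        String abbreviated = "/html/body/div[@class='people']/p";
        // The same location path with the child:: axis spelled out.
        String verbose = "/child::html/child::body/child::div[@class='people']/child::p";
        System.out.println(count(XML, abbreviated)); // 2
        System.out.println(count(XML, verbose));     // 2
    }
}
```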

In your case you don't want to find a child of the <h3>, you want to find a sibling of it. A sibling is an element at the same nesting level. Specifically, you should use the following-sibling:: axis.

div[@class='people']/h3[text()='h3 text']/following-sibling::p/text()
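The difference between the two queries can be checked with a minimal Java sketch. It evaluates both expressions with the JDK's javax.xml.xpath API (not HtmlCleaner's own evaluator) against a small hypothetical, well-formed snippet. The class name SiblingAxisDemo and the markup are made up for illustration. Note that following-sibling::p with no predicate returns every <p> sibling after the <h3>, not just the first.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class SiblingAxisDemo {
    // Hypothetical markup: the <p> elements are siblings of <h3>, not its children.
    static final String XML =
        "<div class='people'><h3>h3 text</h3><p>first</p><p>second</p></div>";

    // Returns the text content of every node the expression selects.
    static List<String> select(String expr) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(XML)));
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(expr, doc, XPathConstants.NODESET);
            List<String> out = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                out.add(nodes.item(i).getTextContent());
            }
            return out;
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        // The question's query looks for <p> *inside* <h3>, so it selects nothing.
        System.out.println(select("div[@class='people']/h3[text()='h3 text']/p/text()"));
        // prints []
        // The answer's query moves sideways to the <p> siblings that follow <h3>.
        System.out.println(select("div[@class='people']/h3[text()='h3 text']/following-sibling::p/text()"));
        // prints [first, second]
    }
}
```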

XPath Axes

Axes are an advanced feature of XPath. They are one of the features that make XPath especially powerful.

You're already familiar with one other axis, though you may not have realized it: the @ symbol is shorthand for attribute::. When you write @href you're really saying attribute::href, as in look for an attribute called "href" instead of a child.
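As a quick check that @href and attribute::href are interchangeable, here is a minimal Java sketch using the JDK's javax.xml.xpath API. The class name AttributeAxisDemo and the two-link markup are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class AttributeAxisDemo {
    // Hypothetical markup with two links.
    static final String XML =
        "<body><a href='one.html'>One</a><a href='two.html'>Two</a></body>";

    // Evaluates the expression against the sample document and returns its string value.
    static String string(String expr) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(XML)));
            return XPathFactory.newInstance().newXPath().evaluate(expr, doc);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        // "@" is shorthand for the attribute:: axis.
        System.out.println(string("/body/a[2]/@href"));          // two.html
        System.out.println(string("/body/a[2]/attribute::href")); // two.html
    }
}
```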

Axes, eh? Shorthand, eh? Tell me more, you say? OK!

  • . and .. are shorthand for the more verbose self::node() and parent::node(), respectively. You could use the longer forms if you wished.

  • The // operator you commonly see as //p or body//a has a hidden descendant-or-self::node() between the slashes. //p is shorthand for /descendant-or-self::node()/p.
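The shorthand forms in the list can be compared with their long forms directly. This is a minimal Java sketch using the JDK's javax.xml.xpath API, with the <h3> element as the context node for . and ... The class name AbbreviationDemo, its helper methods and the markup are hypothetical.

```java
import java.io.StringReader;
import javax.xml.namespace.QName;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class AbbreviationDemo {
    // Hypothetical markup; <h3> serves as the context node below.
    static final String XML =
        "<body><div class='people'><h3>h3 text</h3><p>first</p><p>second</p></div></body>";

    // Parses a well-formed XML string into a DOM document.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    // Evaluates an expression against a context node, returning the requested type.
    static Object eval(Object context, String expr, QName type) {
        try {
            return XPathFactory.newInstance().newXPath().evaluate(expr, context, type);
        } catch (XPathExpressionException e) {
            throw new RuntimeException(e);
        }
    }

    // Evaluates a string-valued expression with the <h3> element as the context node.
    static String fromH3(String expr) {
        Node h3 = (Node) eval(parse(XML), "//h3", XPathConstants.NODE);
        return (String) eval(h3, expr, XPathConstants.STRING);
    }

    // Evaluates a number-valued expression against the whole document.
    static double number(String expr) {
        return (Double) eval(parse(XML), expr, XPathConstants.NUMBER);
    }

    public static void main(String[] args) {
        // "." and self::node() both select the context node itself.
        System.out.println(fromH3("name(.)") + " " + fromH3("name(self::node())"));     // h3 h3
        // ".." and parent::node() both select the context node's parent.
        System.out.println(fromH3("name(..)") + " " + fromH3("name(parent::node())")); // div div
        // "//p" expands to "/descendant-or-self::node()/p".
        System.out.println(number("count(//p)") + " "
                + number("count(/descendant-or-self::node()/p)"));                     // 2.0 2.0
    }
}
```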


Anyway, how do I get that <p> directly after <h3>?

Use:

div[@class='people']/h3[text()='h3 text']/following-sibling::p[1]
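The [1] predicate matters when more than one <p> follows the <h3>. following-sibling:: is a forward axis, so positions count in document order and [1] is the nearest following <p>. A minimal Java sketch using the JDK's javax.xml.xpath API shows the difference. The class name FirstSiblingDemo and the markup (which adds a second, hypothetical <h3> section) are made up for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class FirstSiblingDemo {
    // Hypothetical markup where several <p> siblings follow the first <h3>.
    static final String XML =
        "<div class='people'><h3>h3 text</h3><p>first</p><p>second</p>"
        + "<h3>other</h3><p>third</p></div>";

    // Returns the text content of every node the expression selects.
    static List<String> select(String expr) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(XML)));
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(expr, doc, XPathConstants.NODESET);
            List<String> out = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                out.add(nodes.item(i).getTextContent());
            }
            return out;
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        // Without a predicate, every later <p> sibling matches, even past the next <h3>.
        System.out.println(select("div[@class='people']/h3[text()='h3 text']/following-sibling::p"));
        // prints [first, second, third]
        // [1] keeps only the nearest following <p>.
        System.out.println(select("div[@class='people']/h3[text()='h3 text']/following-sibling::p[1]"));
        // prints [first]
    }
}
```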
