Extracting a link's text from an HTML document
I'm trying to parse this piece of HTML:
<div>
<p>
<a href="#" class="transcriptLink" onclick="seekVideo(0); return false;">A few years ago,</a>
<a href="#" class="transcriptLink" onclick="seekVideo(2000); return false;">I felt like I was stuck in a rut,</a>
<a href="#" class="transcriptLink" onclick="seekVideo(5000); return false;">so I decided to follow in the footsteps</a>
<a href="#" class="transcriptLink" onclick="seekVideo(7000); return false;">of the great American philosopher, Morgan Spurlock,</a>
<a href="#" class="transcriptLink" onclick="seekVideo(10000); return false;">and try something new for 30 days.</a>
</p>
</div>
I want to get the text inside each link, for example "A few years ago,".
I can get the text from a plain "<a> text </a>", but I don't know how to get "A few years ago," out of an element like "<a href="#" class="transcriptLink" onclick="seekVideo(0); return false;">A few years ago,</a>".
<a href="#" class="transcriptLink" onclick="seekVideo(0); return false;">
<a href="#" class="transcriptLink" onclick="seekVideo(2000); return false;">
...
The only difference between these elements is the onclick="seekVideo(...)" value.
You can use XPath. This expression selects the a by index:
/div/p/a[1]/text()
and this one selects it by matching the @onclick value:
/div/p/a[starts-with(@onclick, 'seekVideo(0)')]/text()
Both queries return "A few years ago,".
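To check these two queries outside a browser, here is a minimal sketch that runs them with the XPath 1.0 engine bundled with the JDK (javax.xml.xpath). It assumes the snippet from the question is parsed as well-formed XML (a real page that is not well-formed would need an HTML parser first); the class name TranscriptQueries and the helper eval are hypothetical names chosen for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Sketch: evaluate the answer's two XPath queries with the JDK's XPath 1.0 engine.
// Assumption: the question's markup is well-formed XML (it is, for this snippet).
final class TranscriptQueries {
    // First two links from the question, enough to tell a[1] from a[2].
    static final String SAMPLE =
        "<div><p>"
        + "<a href=\"#\" class=\"transcriptLink\" onclick=\"seekVideo(0); return false;\">A few years ago,</a>"
        + "<a href=\"#\" class=\"transcriptLink\" onclick=\"seekVideo(2000); return false;\">I felt like I was stuck in a rut,</a>"
        + "</p></div>";

    // Parses SAMPLE and returns the expression's result converted to a string
    // (for a node-set: the string-value of the first node, or "" if it is empty).
    static String eval(String expression) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(SAMPLE)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            return xpath.evaluate(expression, doc);
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed: " + expression, e);
        }
    }

    public static void main(String[] args) {
        // Select the first link by position.
        System.out.println(eval("/div/p/a[1]/text()"));
        // Select the link whose onclick value starts with seekVideo(0).
        System.out.println(eval("/div/p/a[starts-with(@onclick, 'seekVideo(0)')]/text()"));
    }
}
```

Both lines of output are "A few years ago,". The positional query breaks if links are reordered, while the @onclick query keeps working as long as the seekVideo argument is unique.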
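To get the number passed to seekVideo in @onclick, you can use this expression, which takes everything after the first "(" and keeps what comes before the next ")":
substring-before(substring-after(@onclick, '('), ')')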
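The expression above is meant to be evaluated with an a element as the context node. A minimal sketch of that with the JDK's javax.xml.xpath API: select every link as a node-set, then evaluate the substring expression relative to each one to pair its seekVideo argument with its text. The class name SeekTimes and the method timesToText are hypothetical, and the snippet is again assumed to parse as well-formed XML.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Sketch: map each seekVideo(...) argument to its link text, in document order.
final class SeekTimes {
    static final String SAMPLE =
        "<div><p>"
        + "<a href=\"#\" class=\"transcriptLink\" onclick=\"seekVideo(0); return false;\">A few years ago,</a>"
        + "<a href=\"#\" class=\"transcriptLink\" onclick=\"seekVideo(2000); return false;\">I felt like I was stuck in a rut,</a>"
        + "<a href=\"#\" class=\"transcriptLink\" onclick=\"seekVideo(5000); return false;\">so I decided to follow in the footsteps</a>"
        + "<a href=\"#\" class=\"transcriptLink\" onclick=\"seekVideo(7000); return false;\">of the great American philosopher, Morgan Spurlock,</a>"
        + "<a href=\"#\" class=\"transcriptLink\" onclick=\"seekVideo(10000); return false;\">and try something new for 30 days.</a>"
        + "</p></div>";

    static Map<Integer, String> timesToText() {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(SAMPLE)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList links = (NodeList) xpath.evaluate("/div/p/a", doc, XPathConstants.NODESET);
            Map<Integer, String> result = new LinkedHashMap<>();
            for (int i = 0; i < links.getLength(); i++) {
                Node a = links.item(i);
                // The <a> element is the context node here, so @onclick is its own attribute.
                String ms = xpath.evaluate("substring-before(substring-after(@onclick, '('), ')')", a);
                result.put(Integer.parseInt(ms), a.getTextContent());
            }
            return result;
        } catch (Exception e) {
            throw new IllegalStateException("Could not read seekVideo times", e);
        }
    }

    public static void main(String[] args) {
        timesToText().forEach((ms, text) -> System.out.println(ms + " -> " + text));
    }
}
```

A LinkedHashMap is used so the entries come back in the same order as the links appear in the transcript.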
For example, to find the a whose @onclick calls seekVideo with 0, you can use this XPath:
/div/p/a[substring-before(substring-after(@onclick, '('), ')') = '0']/text()
or
/div/p/a[number(substring-before(substring-after(@onclick, '('), ')')) = 0]/text()
Both queries return "A few years ago,".
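The number() variant lends itself to a small lookup by time. A minimal sketch, assuming the JDK's javax.xml.xpath engine and well-formed XML input; the class name TextAtTime and the method textAt are hypothetical. The integer is appended to the expression with Integer.toString, which is safe here because a number cannot contain quote characters.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Sketch: look up a transcript line by the number passed to seekVideo(...).
final class TextAtTime {
    static final String SAMPLE =
        "<div><p>"
        + "<a href=\"#\" class=\"transcriptLink\" onclick=\"seekVideo(0); return false;\">A few years ago,</a>"
        + "<a href=\"#\" class=\"transcriptLink\" onclick=\"seekVideo(7000); return false;\">of the great American philosopher, Morgan Spurlock,</a>"
        + "</p></div>";

    // Returns the link text for the given seekVideo argument, or "" if no link matches.
    static String textAt(int ms) {
        String expression =
            "/div/p/a[number(substring-before(substring-after(@onclick, '('), ')')) = "
            + Integer.toString(ms) + "]/text()";
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(SAMPLE)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            return xpath.evaluate(expression, doc);
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed: " + expression, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(textAt(0));
        System.out.println(textAt(7000));
    }
}
```

Comparing numerically rather than as strings means an argument written as, say, 07000 would still match 7000.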
Use:
string(//div/p/a[starts-with(@onclick, 'seekVideo(0)')])
This expression evaluates to the string-value of the first a element in the document that is a child of a p inside a div and whose onclick attribute starts with the string "seekVideo(0)".
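Because string() converts an empty node-set to "", a path that matches nothing fails silently rather than raising an error. A minimal sketch with the JDK's javax.xml.xpath engine makes this visible: in the question's markup the a elements are children of p, not of div, so //div/a finds nothing while //div/p/a and //div//a find the link. The class name StringValueCheck is hypothetical, and the snippet is assumed to be well-formed XML.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Sketch: compare string(...) results for paths with and without the <p> step.
final class StringValueCheck {
    static final String SAMPLE =
        "<div><p>"
        + "<a href=\"#\" class=\"transcriptLink\" onclick=\"seekVideo(0); return false;\">A few years ago,</a>"
        + "</p></div>";

    static String eval(String expression) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(SAMPLE)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            return xpath.evaluate(expression, doc);
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed: " + expression, e);
        }
    }

    public static void main(String[] args) {
        String predicate = "[starts-with(@onclick, 'seekVideo(0)')]";
        // No <a> is a direct child of <div>, so this prints an empty line.
        System.out.println(eval("string(//div/a" + predicate + ")"));
        // Both of these reach the <a> inside <p>.
        System.out.println(eval("string(//div/p/a" + predicate + ")"));
        System.out.println(eval("string(//div//a" + predicate + ")"));
    }
}
```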