XQuery extract between two tags
I am currently working on extracting data from HTML. I would like to extract the text between two <p class="xfHeading"> tags.
<p class="xfHeading"><b>XYZ:</b></p>
<p>asdfghjk</p>
<p>sdsdsd</p>
<p>asdvcvcfghjk</p>
<p class="xfHeading"><b>ABC:</b></p>
<P>fvgbhnjm</P>
<p class="xfHeading"><b>PQR:</b></p>
<ul>
</ul>
<p class="xfHeading"><b>MNO:</b></p>
<ul>
<li>jdjshdj</li>
</ul>
The output should be:
asdfghjk
sdsdsd
asdvcvcfghjk
One way to do this is:
//p[@class="xfHeading"]/following-sibling::p[1] | //p[@class="xfHeading"]/following-sibling::p[2] | //p[@class="xfHeading"]/following-sibling::p[3]
or
//p[@class="xfHeading"]/following-sibling::p[position() < 4]
However, since the content between the headings keeps changing, I need a solution that extracts whatever lies between two <p class="xfHeading"> tags.
Use:
(//p[@class="xfHeading"])[1]
/following-sibling::p
[. << (//p[@class="xfHeading"])[2]]
/text()
This means: select the text-node children of all p elements that are following siblings of the first p element in the document whose class attribute has the value xfHeading, and that also precede the second such p element in document order. The << operator is the node-order comparison from XPath 2.0 / XQuery, so this expression will not work in an XPath 1.0 engine.
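If the heading should be picked by its label rather than by position, the same "following siblings that precede the next heading" idea can be wrapped in a function. This is only a sketch: local:section-text and the <div> wrapper are names introduced here for illustration, and the sample data is adapted from the question (with the upper-case <P> lower-cased, since element names are case-sensitive in XML).

```xquery
(: Hypothetical helper, not part of the original answer: returns the string
   value of every <p> between the heading labelled $label and the next heading. :)
declare function local:section-text($root as node(), $label as xs:string) as xs:string* {
  let $start := ($root//p[@class = "xfHeading"][b = $label])[1]
  let $end   := ($start/following-sibling::p[@class = "xfHeading"])[1]
  for $p in $start/following-sibling::p[not(@class = "xfHeading")]
  (: if there is no later heading, $end is empty and every remaining <p> is kept :)
  where empty($end) or $p << $end
  return string($p)
};

(: Sample data adapted from the question, wrapped in a <div> so it is well-formed :)
let $html :=
  <div>
    <p class="xfHeading"><b>XYZ:</b></p>
    <p>asdfghjk</p>
    <p>sdsdsd</p>
    <p>asdvcvcfghjk</p>
    <p class="xfHeading"><b>ABC:</b></p>
    <p>fvgbhnjm</p>
  </div>
return local:section-text($html, "XYZ:")
```

For the sample above this should return the three strings asdfghjk, sdsdsd and asdvcvcfghjk. Calling it with "ABC:" should return only fvgbhnjm, because there is no later heading to stop at.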
EDIT: After your clarification, my suggestion is to use a FLWOR expression such as the following. It finds the heading <p> whose <b> child has the unique content you are after, and returns the text of each <p> that follows that heading, stopping at the next heading.
for $h in //p[@class = "xfHeading"][b = "XYZ:"]
for $p in $h/following-sibling::p[not(@class = "xfHeading")]
where $p/preceding-sibling::p[@class = "xfHeading"][1] is $h
return $p/text()
Note that // is an XPath construct, not a comment.
OLD ANSWER: Without an example of what you'd like the resulting data to look like, answering the question is a bit tough. However, to select, for instance, the text inside a <b> tag, you'd do:
//p[@class = "xfHeading"]/b/text()
In general, appending /text() to the end of a path expression selects the text-node children of the node in question.
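One caveat worth illustrating: text() only reaches the direct text-node children, so text nested inside child elements is skipped, whereas string() concatenates all descendant text. A small sketch with a made-up <p> element (not taken from the question):

```xquery
(: Made-up element for illustration only :)
let $p := <p>one <b>two</b> three</p>
return (
  count($p/text()),   (: 2, the text nodes "one " and " three" :)
  string($p)          (: "one two three" :)
)
```

If the paragraphs being extracted may contain inline markup such as <b> or <a>, string($p) or data($p) is usually the safer choice than $p/text().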