XQuery extract between two tags

I am currently working on extracting data from HTML. I would like to extract the text between two <p class="xfHeading"> tags.

         <p class="xfHeading"><b>XYZ:</b></p> 
            <p>asdfghjk</p>  
            <p>sdsdsd</p>  
            <p>asdvcvcfghjk</p>  

         <p class="xfHeading"><b>ABC:</b></p> 
            <P>fvgbhnjm</P>  

         <p class="xfHeading"><b>PQR:</b></p> 
            <ul> 

            </ul> 

         <p class="xfHeading"><b>MNO:</b></p> 
             <ul> 
                <li>jdjshdj</li>  
             </ul> 

The output should be:

asdfghjk

sdsdsd

asdvcvcfghjk

One way to do this is:

(//p[@class="xfHeading"])[1]/following-sibling::p[1] | (//p[@class="xfHeading"])[1]/following-sibling::p[2] | (//p[@class="xfHeading"])[1]/following-sibling::p[3]

or

(//p[@class="xfHeading"])[1]/following-sibling::p[position() < 4]

However, the number of elements between the headings changes all the time, so I need a solution that extracts whatever content lies between two <p class="xfHeading"> tags without hard-coding positions.


Use:

(//p[@class="xfHeading"])[1]
          /following-sibling::p
             [. << (//p[@class="xfHeading"])[2]]
                 /text()

This means: select the text-node children of all p elements that are following siblings of the first p element in the document whose class attribute is xfHeading, and that also precede the second such p element. The << operator is the XQuery (XPath 2.0) node-order comparison: a << b is true when a comes before b in document order.
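To sanity-check what the expression above should select, here is a minimal Python sketch of the same logic using only the standard library. It is not XQuery, and it rests on some assumptions: the sample is the question's markup wrapped in a hypothetical <div> and lowercased so that it parses as XML, and the names is_heading and text_between_first_two_headings are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Assumption: the question's fragment, wrapped in a <div> and made well-formed XML.
SAMPLE = """
<div>
  <p class="xfHeading"><b>XYZ:</b></p>
  <p>asdfghjk</p>
  <p>sdsdsd</p>
  <p>asdvcvcfghjk</p>
  <p class="xfHeading"><b>ABC:</b></p>
  <p>fvgbhnjm</p>
  <p class="xfHeading"><b>PQR:</b></p>
  <ul></ul>
  <p class="xfHeading"><b>MNO:</b></p>
  <ul><li>jdjshdj</li></ul>
</div>
"""


def is_heading(el):
    """True for <p class="xfHeading"> elements."""
    return el.tag == "p" and el.get("class") == "xfHeading"


def text_between_first_two_headings(parent):
    """Collect the text of <p> children after the first heading and before the second."""
    texts = []
    seen_first = False
    for child in parent:
        if is_heading(child):
            if seen_first:
                break  # reached the second heading: stop collecting
            seen_first = True
        elif seen_first and child.tag == "p":
            texts.append(child.text or "")
    return texts


root = ET.fromstring(SAMPLE)
result = text_between_first_two_headings(root)
print(result)  # ['asdfghjk', 'sdsdsd', 'asdvcvcfghjk']
```

Like the XPath expression, this only looks at siblings of the first heading, so it does not depend on how many paragraphs sit between the two headings.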


EDIT: After your clarification, my suggestion is to use a FLWOR expression such as the following. It finds the heading <p> whose <b> child has the unique label you want, and returns the text of each <p> that follows it as a sibling and comes before the next heading.

for $h in //p[@class = "xfHeading"][b = "XYZ:"],
    $p in $h/following-sibling::p[not(@class = "xfHeading")]
              [preceding-sibling::p[@class = "xfHeading"][1] is $h]
return $p/text()

Note that // is an XPath construct, not a comment.
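The idea of selecting a section by its heading label can also be sketched outside XQuery. The following Python example (standard library only) groups the plain <p> paragraphs under the label of the nearest preceding heading. It is an illustration under assumptions, not part of the original answer: the sample is the question's markup wrapped in a hypothetical <div> so that it parses as XML, and paragraphs_by_heading is a made-up helper name.

```python
import xml.etree.ElementTree as ET

# Assumption: the question's fragment, wrapped in a <div> and made well-formed XML.
SAMPLE = """
<div>
  <p class="xfHeading"><b>XYZ:</b></p>
  <p>asdfghjk</p>
  <p>sdsdsd</p>
  <p>asdvcvcfghjk</p>
  <p class="xfHeading"><b>ABC:</b></p>
  <p>fvgbhnjm</p>
  <p class="xfHeading"><b>PQR:</b></p>
  <ul></ul>
  <p class="xfHeading"><b>MNO:</b></p>
  <ul><li>jdjshdj</li></ul>
</div>
"""


def paragraphs_by_heading(parent):
    """Map each heading label to the text of the plain <p> siblings that follow it."""
    sections = {}
    current = None  # label of the most recent heading seen so far
    for child in parent:
        if child.tag == "p" and child.get("class") == "xfHeading":
            current = "".join(child.itertext()).strip()
            sections[current] = []
        elif current is not None and child.tag == "p":
            sections[current].append(child.text or "")
    return sections


sections = paragraphs_by_heading(ET.fromstring(SAMPLE))
print(sections["XYZ:"])  # ['asdfghjk', 'sdsdsd', 'asdvcvcfghjk']
```

Looking a section up by its label, for example sections["ABC:"], then plays the role of the where clause on the <b> text.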

OLD ANSWER: Without an example of what you'd like the resulting data to look like, answering the question is a bit tough. However, to select, for instance, the text inside a <b> tag, you'd do:

//p[@class = "xfHeading"]/b/text()

In general, appending /text() to the end of a path expression returns the text-node children of the selected nodes.
