How to filter certain words from selected text using XPath?

To select the text here:

     Alpha Bravo Charlie Delta Echo Foxtrot

from this HTML structure:

<div id="entry-2" class="item-asset asset hentry">
  <div class="asset-header">
    <h2 class="asset-name entry-title">
      <a rel="bookmark" href="http://blahblah.com/politics-democrat">Pelosi Q&amp;A</a>
    </h2>
  </div>
  <div class="asset-content entry-content">
    <div class="asset-body">
     <p>Alpha Bravo Charlie Delta Echo Foxtrot</p>
    </div>
  </div>
</div>

I apply the following XPath expression to select the text inside asset-body:

//div[contains(
            div/h2[
              contains(concat(' ',@class,' '),' asset-name ')
              and
              contains(concat(' ',@class,' '),' entry-title ')
            ]/a[@rel='bookmark']/@href
         ,'democrat')
        ]/div/div[
           contains(concat(' ',@class,' '),' asset-body ')
           ]//text()

How would I sanitize the following words from the text:

Alpha
Charlie
Echo

So that I end up with only the following text in this example:

Bravo Delta


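With XPath 1.0, assuming each word occurs exactly once as a space-separated token:

concat(substring-before(concat(' ',$Node,' '),' Alpha '),
       ' ',
       substring-after(concat(' ',$Node,' '),' Alpha '))

The ' ' between the two halves keeps the neighboring words from running together; wrap the final result in normalize-space() to drop the padding. This removes a single word. To remove several, you have to nest the expression once per word, which quickly becomes verbose and performs poorly.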
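
To see how the nesting plays out for all three words, here is a small Python sketch that imitates the XPath 1.0 string functions. It is not XPath itself: substring_before, substring_after and remove_token are hypothetical helper names, and the sketch puts a single space between the two halves so that neighboring words do not fuse.

```python
def substring_before(s: str, sep: str) -> str:
    """Mimic XPath 1.0 substring-before(): '' when sep is absent."""
    head, found, _ = s.partition(sep)
    return head if found else ""


def substring_after(s: str, sep: str) -> str:
    """Mimic XPath 1.0 substring-after(): '' when sep is absent."""
    _, found, tail = s.partition(sep)
    return tail if found else ""


def remove_token(padded: str, word: str) -> str:
    """One concat(substring-before(...), ' ', substring-after(...)) step.

    `padded` must start and end with a space; the result does too,
    so the steps can be chained (the XPath version nests instead).
    """
    needle = f" {word} "
    return substring_before(padded, needle) + " " + substring_after(padded, needle)


text = "Alpha Bravo Charlie Delta Echo Foxtrot"
padded = f" {text} "  # same role as concat(' ', $Node, ' ')
for word in ("Alpha", "Charlie", "Echo"):
    padded = remove_token(padded, word)

result = " ".join(padded.split())  # plays the role of normalize-space()
print(result)
```

This prints Bravo Delta Foxtrot (Foxtrot is not on the removal list, so it stays). If a word is missing from the text, both halves come back empty and the whole text is lost, which is why the approach only works when every word is known to be present.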

With XPath 2.0:

string-join(tokenize($Node,' ')[not(.=('Alpha','Charlie','Echo'))],' ')


How would I sanitize the following words from the text:

Alpha 
Charlie 
Echo 

So that I end up with only the following text in this example:

Bravo Delta 

This can't be done in XPath 1.0 alone -- you'll need to get the text in the host language and do the replacement there.
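
As a rough illustration of the host-language route, here is a sketch in Python using only the standard library. Assumptions: the fragment is well-formed XML (real-world HTML usually needs a proper HTML parser), the class tests on the h2 are dropped for brevity, and has_class and filtered_body_text are hypothetical helper names. ElementTree supports only a subset of XPath without contains(), so those checks are done in Python.

```python
import xml.etree.ElementTree as ET

HTML = """<div id="entry-2" class="item-asset asset hentry">
  <div class="asset-header">
    <h2 class="asset-name entry-title">
      <a rel="bookmark" href="http://blahblah.com/politics-democrat">Pelosi Q&amp;A</a>
    </h2>
  </div>
  <div class="asset-content entry-content">
    <div class="asset-body">
     <p>Alpha Bravo Charlie Delta Echo Foxtrot</p>
    </div>
  </div>
</div>"""

STOP_WORDS = {"Alpha", "Charlie", "Echo"}


def has_class(elem, name):
    """True if `name` is one of the space-separated class tokens of elem."""
    return name in elem.get("class", "").split()


def filtered_body_text(entry, stop_words):
    """Return the asset-body text of `entry` without the stop words.

    Returns None when the bookmark link's href does not contain
    'democrat' or when no asset-body div is found.
    """
    link = entry.find("./div/h2/a[@rel='bookmark']")
    if link is None or "democrat" not in link.get("href", ""):
        return None
    body = next(
        (d for d in entry.findall("./div/div") if has_class(d, "asset-body")),
        None,
    )
    if body is None:
        return None
    # Collect all descendant text, split on whitespace, drop the stop words.
    words = "".join(body.itertext()).split()
    return " ".join(w for w in words if w not in stop_words)


entry = ET.fromstring(HTML)
print(filtered_body_text(entry, STOP_WORDS))
```

For the sample entry this prints Bravo Delta Foxtrot, since Foxtrot is not among the words to remove.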

In XPath 2.0 one can use the replace() function:

normalize-space(
  replace(replace(replace(concat(' ', $vText, ' '),
    ' Alpha ', ' '), ' Charlie ', ' '), ' Echo ', ' '))