How to filter certain words from selected text using XPath?
To select the text here:
Alpha Bravo Charlie Delta Echo Foxtrot
from this HTML structure:
<div id="entry-2" class="item-asset asset hentry">
<div class="asset-header">
<h2 class="asset-name entry-title">
<a rel="bookmark" href="http://blahblah.com/politics-democrat">Pelosi Q&A</a>
</h2>
</div>
<div class="asset-content entry-content">
<div class="asset-body">
<p>Alpha Bravo Charlie Delta Echo Foxtrot</p>
</div>
</div>
</div>
I apply the following XPath expression to select the text inside asset-body:
//div[contains(
div/h2[
contains(concat(' ',@class,' '),' asset-name ')
and
contains(concat(' ',@class,' '),' entry-title ')
]/a[@rel='bookmark']/@href
,'democrat')
]/div/div[
contains(concat(' ',@class,' '),' asset-body ')
]//text()
How would I sanitize the following words from the text:
Alpha
Charlie
Echo
So that I end up with only the following text in this example:
Bravo Delta
With XPath 1.0, supposing each word is an NMTOKEN that occurs only once in the text:
normalize-space(concat(substring-before(concat(' ',$Node,' '),' Alpha '),
' ',
substring-after(concat(' ',$Node,' '),' Alpha ')))
As you can see, this becomes very verbose (and performs poorly), because the whole expression has to be nested once more for every additional word to remove.
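To make the mechanics of that idiom easier to follow, here is a rough Python sketch that mimics the XPath 1.0 functions involved. The helper names (substring_before, substring_after, remove_once) are made up for illustration, and this variant keeps one space between the two halves and then normalizes whitespace so that neighbouring words do not fuse. Each loop pass corresponds to one more level of nesting in the XPath expression.

```python
def substring_before(s: str, sep: str) -> str:
    """Mimic XPath 1.0 substring-before(): '' when sep is absent."""
    head, found, _ = s.partition(sep)
    return head if found else ""


def substring_after(s: str, sep: str) -> str:
    """Mimic XPath 1.0 substring-after(): '' when sep is absent."""
    _, found, tail = s.partition(sep)
    return tail if found else ""


def remove_once(text: str, word: str) -> str:
    """One level of the concat(substring-before(..), substring-after(..)) idiom."""
    padded = f" {text} "   # concat(' ', $Node, ' ')
    needle = f" {word} "   # the word delimited by spaces
    # Keep a single space between the halves so the neighbours stay separate.
    joined = substring_before(padded, needle) + " " + substring_after(padded, needle)
    return " ".join(joined.split())   # rough equivalent of normalize-space()


result = "Alpha Bravo Charlie Delta Echo Foxtrot"
for unwanted in ("Alpha", "Charlie", "Echo"):
    # In XPath, each pass would be another nested copy of the expression.
    result = remove_once(result, unwanted)

print(result)  # Bravo Delta Foxtrot
```

Foxtrot survives because it is not on the list of words to remove. Note the limitation that the sketch shares with the XPath version: if a word is not present at all, both substring functions return an empty string and the whole text is lost (remove_once("Bravo Delta", "Zulu") gives ""), which is why the words are assumed to occur in the text.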
With XPath 2.0:
string-join(tokenize($Node,' ')[not(.=('Alpha','Charlie','Echo'))],' ')
How would I sanitize the following words from the text:
Alpha
Charlie
Echo
So that I end up with only the following text in this example:
Bravo Delta
This can't be done in XPath 1.0 alone -- you'll need to get the text in the host language and do the replacement there.
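As a minimal sketch of the host-language route, assuming Python is the host and the text has already been pulled out with whatever XPath API is in use, the filtering itself can be a few lines. The function name remove_words is hypothetical, and the sketch assumes whitespace-separated words and case-sensitive, whole-word matching.

```python
from typing import Iterable


def remove_words(text: str, unwanted: Iterable[str]) -> str:
    """Drop every whitespace-separated token that appears in `unwanted`."""
    blocked = set(unwanted)  # set membership keeps the lookup cheap
    kept = [token for token in text.split() if token not in blocked]
    return " ".join(kept)


# `selected_text` stands in for the string returned by the XPath query.
selected_text = "Alpha Bravo Charlie Delta Echo Foxtrot"
print(remove_words(selected_text, ["Alpha", "Charlie", "Echo"]))
# Bravo Delta Foxtrot
```

Unlike plain substring replacement, splitting into tokens first means a word is only removed when it matches a whole token, and words that are absent from the text are simply ignored.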
In XPath 2.0 one can use the replace() function:
normalize-space(replace(replace(replace(concat(' ', $vText, ' '), ' Alpha ', ' '), ' Charlie ', ' '), ' Echo ', ' '))