XPath - find multiple sequential occurrences of an element

I have an XHTML node that I need to clean. Its inner HTML is:

<img style="width: 402px; height: 312px;" src="http://www.mydomain.com/test.jpg" align="left" border="0" height="312" hspace="5" vspace="5" width="402"> <br><font size="1" face="Arial"><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><font face="Verdana">Image text goes here</font> </font>

I can't figure out an XPath expression that finds multiple sequential occurrences of the <br> element. Do I need to recurse through the nodes and compare each one against the previous match?

UPDATE: I'm using HtmlAgilityPack to navigate through the doc.

Thanks in advance!

Regards, byte_slave


Not really sure what you want to do with this. I've asked in a comment on the question what you want it transformed into…

Guessing what you might want to do though…

To find out the total number of <br/> elements, you can just use the XPath count(//descendant-or-self::br), which is equivalent to count(//br).
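As a rough illustration of the count, here is a minimal Python sketch using the standard library's xml.etree.ElementTree. The fragment is a shortened, hand-closed version of the markup from the question (an assumption, since the original is not well-formed), and ElementTree has no count() function, so the sketch takes the length of the .//br match list instead.

```python
import xml.etree.ElementTree as ET

# Hypothetical well-formed fragment: the <br> and <img> tags from the
# question have been closed by hand so an XML parser will accept them.
xhtml = (
    '<div><img src="http://www.mydomain.com/test.jpg" />'
    '<br /><font size="1"><br /><br /><br />'
    '<font face="Verdana">Image text goes here</font></font></div>'
)

root = ET.fromstring(xhtml)

# ElementTree supports only a subset of XPath and has no count(),
# so count the elements matched by .//br instead.
br_total = len(root.findall(".//br"))
print(br_total)
```

With a full XPath 1.0 engine the count(...) expression can be evaluated directly.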

Or, if you want to do something with all the <br/> elements that have another <br/> among their siblings, you could use the XPath //descendant-or-self::br[following-sibling::br or preceding-sibling::br] to return just that long list of <br/>s.
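The sibling test can be sketched in Python too. ElementTree does not implement the following-sibling or preceding-sibling axes, so this sketch swaps in a plain loop over each parent's children that selects the same nodes. The helper name br_with_br_siblings and the hand-closed fragment are made up for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical well-formed fragment based on the markup in the question.
xhtml = (
    '<div><img src="http://www.mydomain.com/test.jpg" />'
    '<br /><font size="1"><br /><br /><br />'
    '<font face="Verdana">Image text goes here</font></font></div>'
)


def br_with_br_siblings(root):
    """Return every <br> whose parent holds at least one other <br>."""
    matches = []
    for parent in root.iter():
        # Direct children of this parent that are <br> elements.
        brs = [child for child in parent if child.tag == "br"]
        # A <br> has a <br> sibling exactly when its parent has two or more.
        if len(brs) >= 2:
            matches.extend(brs)
    return matches


root = ET.fromstring(xhtml)
grouped = br_with_br_siblings(root)
print(len(grouped))
```

The lone <br> right after the image is left out because it has no <br> sibling. Only the three inside the <font> element are returned.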


XPath is not going to work because this is NOT XHTML. All the br tags are unclosed. Heck, even the img tag itself is incomplete...

You need to clean this with plain text handling (regular expressions, likely) or HTML sanitizers. Look at xmllint and HTML Tidy.
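For the plain-text route, a minimal Python sketch with a regular expression might look like this. It assumes the goal is to collapse each run of two or more <br> tags into a single one, which the question does not confirm. The names BR_RUN and collapse_br_runs are made up for the example.

```python
import re

# Two or more consecutive <br>, <br/> or <br /> tags, in any letter case,
# with optional whitespace between them.
BR_RUN = re.compile(r"(?:<br\s*/?>\s*){2,}", re.IGNORECASE)


def collapse_br_runs(html: str) -> str:
    """Replace each run of repeated <br> tags with a single <br>."""
    return BR_RUN.sub("<br>", html)


sample = (
    '<font size="1"><br><br>\n<br /><BR>'
    '<font face="Verdana">Image text goes here</font> </font>'
)
cleaned = collapse_br_runs(sample)
print(cleaned)
```

Note that the pattern also swallows any whitespace that directly follows a run, and a single <br> on its own is left untouched.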
