Convert links in blockquotes to plain text
So, I've been asking a lot of XPath questions recently. Sorry, but I've only just started using it, and I'm working on a fairly hard project. You see, at the moment I'm parsing HTML like this (not a copy and paste, just an example):
<span id="no153434"></span>
<blockquote>Text here.<br/>More text.<br/>Some more text.</blockquote>
And I'm using
//span[starts-with(@id, 'no')]/following::*[1][name()='blockquote']//node()
To get the text inside. It's working fine, although it's very frustrating. I need to manually check for <br/>, then manually combine the strings before and after the br, add a newline, and so on. But it still works. Until there is a link in the text, that is. Then the code is like this:
<span id="no153434"></span>
<blockquote>Text here.<br/>Text.<br/><font class = "unkfunc"><a href="linkhere" class="link">linkhere</a></font></blockquote>
I have absolutely NO idea where to go from here, as the link is included as a completely separate item (twice) in the array. At least with the br I knew where it had to be moved to. Really contemplating giving up on this project after all this effort.
You can use this XPath to obtain only the text inside the element:
//span[starts-with(@id, 'no')]/following::*[1][name()='blockquote']//text()
With it you receive the following result:
- Text here.
- Text.
- linkhere
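To see why selecting text nodes removes the duplicate link entries, here is a minimal sketch in Python using only the standard library. It does not run the XPath above (xml.etree.ElementTree has no following:: axis); instead it mirrors the same selection by hand. The <root> wrapper, the SAMPLE constant and the blockquote_texts helper are assumptions made for illustration, and the fragment is assumed to be well-formed XML, which real-world HTML often is not.

```python
import xml.etree.ElementTree as ET

# Sample fragment from the question, wrapped in a hypothetical <root>
# element so that it parses as well-formed XML.
SAMPLE = (
    '<root><span id="no153434"></span>'
    '<blockquote>Text here.<br/>Text.<br/>'
    '<font class="unkfunc"><a href="linkhere" class="link">linkhere</a></font>'
    '</blockquote></root>'
)

def blockquote_texts(root):
    """Collect the text nodes of every blockquote that comes right after
    a span whose id starts with 'no' (both must be children of root)."""
    results = []
    children = list(root)
    for prev, cur in zip(children, children[1:]):
        is_marker = prev.tag == "span" and prev.get("id", "").startswith("no")
        if is_marker and cur.tag == "blockquote":
            # itertext() yields text nodes only, in document order,
            # so <font> and <a> contribute their text exactly once.
            results.append(list(cur.itertext()))
    return results

texts = blockquote_texts(ET.fromstring(SAMPLE))
print(texts)  # [['Text here.', 'Text.', 'linkhere']]
```

The link text shows up once because the font and a elements themselves are never returned, only the text node inside them.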
If you want only the text nodes and the br elements:
//span
[starts-with(@id, 'no')]/
following::*[1][name()='blockquote']
//node()
[ count(.|..//text()) = count(..//text())
or
name()='br'
]
returns
Text here.
<br />
Text.
<br />
linkhere
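Once the result is reduced to text nodes and br elements, joining them into one string is a single pass with no special case for links. The following is a rough sketch in Python with the standard library, under the same assumptions as the question's sample (a well-formed fragment); the flatten function name is hypothetical and it walks the element tree directly rather than evaluating the XPath above.

```python
import xml.etree.ElementTree as ET

def flatten(elem):
    """Return the text of elem with every <br/> replaced by a newline
    and every other tag (font, a, ...) dropped, keeping only its text."""
    parts = []
    if elem.tag == "br":
        parts.append("\n")
    if elem.text:
        parts.append(elem.text)
    for child in elem:
        parts.append(flatten(child))
        # Text that follows a child element is stored as that child's tail.
        if child.tail:
            parts.append(child.tail)
    return "".join(parts)

blockquote = ET.fromstring(
    '<blockquote>Text here.<br/>Text.<br/>'
    '<font class="unkfunc"><a href="linkhere" class="link">linkhere</a></font>'
    '</blockquote>'
)
joined = flatten(blockquote)
print(joined)
```

For the sample from the question this gives three lines: "Text here.", "Text." and "linkhere".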
The answer is to not use XPath for this kind of work. Got it working 1,000,000x easier with Objective-C-HTML-Parser.