Extracting context from a set point in the middle of an HTML file

I have some HTML, and I'm extracting a snippet at a certain point (an inline image), but I'd like to show some context around this image.

I'm using PHP, and I know that both Symfony and WordPress provide functions for dealing with what happens when you chop up text in the middle of some HTML (they close all open tags), but nothing for dealing with snippets in the other direction.

So, in the case of:

 'Snippet of text and a <a href="#moo">link right her'

I can use the above-mentioned function to fix, but what about:

'nk right here</a> and then more text after the link.'

I've considered the possibility that even the tag-closing approach is the wrong way to go about this, and that I should instead be using XPath to parse the HTML. However, I can't find any examples or mentions of using XPath to create snippets like this.

Update:

So my current idea is:

  1. Move up the parse tree until I get to the tag that encloses all the content (div class=post in my case). The last node I visited before reaching this div is the starting point (most likely a p tag).

  2. From here, get the previous sibling (which should be a p tag again).

  3. Descend into this node and take its last child, saving the text content to a temporary string. Keep stepping back through the children until the snippet is long enough.

This still isn't ideal, as I'm not sure how far I'll have to step down to get the text content.

Does anyone know of an implementation of this idea anywhere?


This isn't a complete answer, but you can use an XPath query to get just the node(s) you're interested in, then use the nextSibling and previousSibling properties (in whatever form the extension supports) to get context for the node(s).
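
As a rough illustration of that idea, here is a minimal sketch using PHP's DOM extension (DOMDocument and DOMXPath). The sample markup, the `cat.png` image, and the helper name `contextText` are made up for the example, and the container is assumed to be the `div class=post` from the question. The helper climbs from the image towards the container and, at each level, reads `textContent` from the siblings on one side, so there is no need to descend into each sibling by hand.

```php
<?php
// Hypothetical sample markup, loosely based on the question.
$html = '<div class="post">'
      . '<p>First paragraph with a <a href="#moo">link right here</a>.</p>'
      . '<p>Before <img src="cat.png" alt=""> after.</p>'
      . '<p>And then more text after the image.</p>'
      . '</div>';

libxml_use_internal_errors(true);   // keep parser warnings out of the output
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
$post  = $xpath->query('//div[@class="post"]')->item(0);
$img   = $xpath->query('.//img', $post)->item(0);

/**
 * Hypothetical helper: collect up to $limit bytes of plain text on one side
 * of $node, staying inside $container.
 * $prop is 'previousSibling' (text before) or 'nextSibling' (text after).
 */
function contextText(DOMNode $node, DOMNode $container, string $prop, int $limit): string
{
    $before = ($prop === 'previousSibling');
    $text   = '';
    // Climb from the node up to (but not including) the container.
    for ($cur = $node; $cur !== null && !$cur->isSameNode($container); $cur = $cur->parentNode) {
        // At each level, walk the siblings on the requested side.
        for ($sib = $cur->$prop; $sib !== null; $sib = $sib->$prop) {
            if ($sib instanceof DOMComment) {
                continue; // skip HTML comments
            }
            // textContent already includes the text of all descendants.
            $text = $before ? $sib->textContent . $text : $text . $sib->textContent;
            if (strlen($text) >= $limit) {
                // Byte-based cut; a multibyte-aware cut would be needed for non-ASCII text.
                return $before ? substr($text, -$limit) : substr($text, 0, $limit);
            }
        }
    }
    return $text;
}

$before = contextText($img, $post, 'previousSibling', 20);
$after  = contextText($img, $post, 'nextSibling', 20);
echo $before, '[IMG]', $after, "\n";
```

Because the result is plain text taken from the parsed tree, there are no dangling tags to repair on either side. Text from adjacent paragraphs is joined without a separator, so you may want to insert a space when crossing block-level elements.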
