开发者

Extract all text from a HTML page without losing context

For a translation program I am trying to g开发者_JS百科et a 95% accurate text from a HTML file in order to translate the sentences and links.

For example:

<div><a href="stack">Overflow</a> <span>Texts <b>go</b> here</span></div>

Should give me 2 results to translate:

Overflow

Texts <b>go</b> here

Any suggestions or commercial packages available for this problem?


I'm not exactly sure what you're asking, but look at simplehtmldom. Specifically the "Extract Contents from HTML" tab under quick start on that front page (can't link directly, sigh). With that you can extract the text of a website without all those pesky tags.

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜