How can I create a basic human-readable plain text representation of XHTML using Java?

Given some simple XHTML, I'd like to create a human-readable plain text version of it. This would involve removing all HTML tags while adding or preserving some whitespace.

For example, this input:

<div>
<p>This is some text, some is <b>bold</b>.</p>
<ul>
  <li>Point one</li>
  <li>Point two</li>
</ul>
</div>

would become:

"This is some text, some is bold. Point one Point two"

(Commas between the list items would be ideal.)


Try the Jericho HTML Parser. You can either strip all the tags, or use its "renderer" class, which tries to mimic the look of the page (e.g. your bulleted lists would be indented).
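If pulling in a library is not an option, here is a rough sketch that uses only the JDK. It does not use Jericho at all: it relies on the input being well-formed XML (which XHTML should be) and walks the DOM, joining list items with commas. The class name XhtmlToText, the method toPlainText and the set of "block" tags are my own choices for illustration. It assumes a fragment with a single root element, no DOCTYPE, and no named HTML entities such as &nbsp;.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class XhtmlToText {

    // Tags treated as block-level; an assumption, extend as needed.
    private static final Set<String> BLOCK =
            Set.of("div", "p", "ul", "ol", "li", "br");

    /** Parses a well-formed XHTML fragment and returns its text on one line. */
    static String toPlainText(String xhtml) {
        try {
            Node root = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xhtml)))
                    .getDocumentElement();
            return normalize(textOf(root));
        } catch (Exception e) {
            throw new IllegalArgumentException("Input is not well-formed XML", e);
        }
    }

    // Recursively collects text; <li> children of a list are joined with ", ".
    private static String textOf(Node node) {
        short type = node.getNodeType();
        if (type == Node.TEXT_NODE || type == Node.CDATA_SECTION_NODE) {
            return node.getNodeValue();
        }
        String name = node.getNodeName().toLowerCase();
        boolean isList = name.equals("ul") || name.equals("ol");
        StringBuilder sb = new StringBuilder();
        List<String> items = new ArrayList<>();
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (isList && "li".equalsIgnoreCase(child.getNodeName())) {
                items.add(normalize(textOf(child)));
            } else {
                sb.append(textOf(child));
            }
        }
        if (isList) {
            sb.append(String.join(", ", items));
        }
        // Pad block elements so adjacent blocks do not run together.
        return BLOCK.contains(name) ? " " + sb + " " : sb.toString();
    }

    // Collapses runs of whitespace into single spaces.
    private static String normalize(String s) {
        return s.replaceAll("\\s+", " ").trim();
    }

    public static void main(String[] args) {
        String input = "<div>\n"
                + "<p>This is some text, some is <b>bold</b>.</p>\n"
                + "<ul>\n  <li>Point one</li>\n  <li>Point two</li>\n</ul>\n"
                + "</div>";
        // prints: This is some text, some is bold. Point one, Point two
        System.out.println(toPlainText(input));
    }
}
```

For real-world HTML that is not strictly well-formed, a tolerant parser such as Jericho is still the safer choice; this sketch will simply reject such input.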
