How can I create a basic human-readable plain-text representation of XHTML using Java?
Given some simple XHTML, I'd like to create a human-readable plain-text version of it. This would involve removing all HTML tags while adding or preserving some whitespace.
For example, this input:
<div>
<p>This is some text, some is <b>bold</b>.</p>
<ul>
<li>Point one</li>
<li>Point two</li>
</ul>
</div>
would become:
"This is some text, some is bold. Point one Point two"
(commas between the LIs would be ideal... :)
Use the Jericho HTML Parser. You can either strip all the tags or use its "renderer" class, which tries to mimic the look of the page (e.g. your bulleted lists would be indented).
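If adding a library is not an option and the input really is well-formed XHTML, a similar result can be sketched with the XML parser that ships with the JDK. To be clear, this is not Jericho's API: the class name XhtmlToText, the method toPlainText, and the list of tags treated as block-level are all made up for illustration. The sketch walks the DOM, keeps the text nodes, puts whitespace around block elements, and inserts a comma between adjacent LIs as the question asks.

```java
import java.io.StringReader;
import java.util.Arrays;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class XhtmlToText {

    // Hypothetical, deliberately short list of tags that should be
    // surrounded by whitespace in the plain-text output.
    private static final List<String> BLOCK_TAGS =
            Arrays.asList("div", "p", "ul", "ol", "li", "br", "h1", "h2", "h3");

    static String toPlainText(String xhtml) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xhtml)));
            StringBuilder out = new StringBuilder();
            append(doc.getDocumentElement(), out);
            // Collapse every run of whitespace into a single space.
            return out.toString().replaceAll("\\s+", " ").trim();
        } catch (Exception e) {
            throw new IllegalArgumentException("Input is not well-formed XHTML", e);
        }
    }

    private static void append(Node node, StringBuilder out) {
        if (node.getNodeType() == Node.TEXT_NODE) {
            out.append(node.getNodeValue());
            return;
        }
        boolean block = BLOCK_TAGS.contains(node.getNodeName());
        if ("li".equals(node.getNodeName()) && followsListItem(node)) {
            // Put the comma directly after the previous item's text.
            trimTrailingWhitespace(out);
            out.append(", ");
        }
        if (block) {
            out.append(' ');
        }
        for (Node child = node.getFirstChild(); child != null;
                child = child.getNextSibling()) {
            append(child, out);
        }
        if (block) {
            out.append(' ');
        }
    }

    // True when the nearest preceding element sibling is also an <li>.
    private static boolean followsListItem(Node node) {
        Node prev = node.getPreviousSibling();
        while (prev != null && prev.getNodeType() != Node.ELEMENT_NODE) {
            prev = prev.getPreviousSibling();
        }
        return prev != null && "li".equals(prev.getNodeName());
    }

    private static void trimTrailingWhitespace(StringBuilder sb) {
        int end = sb.length();
        while (end > 0 && Character.isWhitespace(sb.charAt(end - 1))) {
            end--;
        }
        sb.setLength(end);
    }

    public static void main(String[] args) {
        String input = "<div>\n<p>This is some text, some is <b>bold</b>.</p>\n"
                + "<ul>\n<li>Point one</li>\n<li>Point two</li>\n</ul>\n</div>";
        // Prints: This is some text, some is bold. Point one, Point two
        System.out.println(toPlainText(input));
    }
}
```

This only works for input that parses as XML. Real-world HTML with unclosed tags, a DOCTYPE, or named entities such as &nbsp; needs extra handling, which is where a tolerant parser like Jericho is the better fit.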