Render HTML Webpage to text in Java
I would like to render a web page as human-readable text that still shows elements such as hyperlink locations and input fields.
Is there any library that does this? (I've checked Jericho Renderer, but it does not show input fields.) For example, I want to convert this:
<div>
<form action="example.php">
Name:
<input type="text" name="name_field">
<input type="button" value="OK">
</form>
</div>
to something like this:
Name: [________] [OK]
Try TagSoup and build the renderer yourself. You get a DOM model of the HTML and can walk it to spit out the text.
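
A rough sketch of the "build it yourself" idea, under some assumptions: to stay dependency-free it uses the HTML parser that ships with the JDK (javax.swing.text.html) instead of TagSoup, and it reacts to parser callbacks rather than walking a DOM. The class name HtmlTextRenderer, the box width, and the <href> notation for links are made up for illustration, and only text/button/submit inputs and links are handled. The same idea should carry over to a tree produced by TagSoup.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical helper: renders HTML as one line of text and draws
// form inputs as bracketed boxes.
class HtmlTextRenderer {

    static String render(String html) {
        List<String> parts = new ArrayList<>();

        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            private String href; // target of the link currently open, if any

            @Override
            public void handleText(char[] data, int pos) {
                // Collapse whitespace so source line breaks do not leak into the output.
                String text = new String(data).replaceAll("\\s+", " ").trim();
                if (!text.isEmpty()) {
                    parts.add(text);
                }
            }

            @Override
            public void handleSimpleTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
                if (tag != HTML.Tag.INPUT) {
                    return;
                }
                Object type = attrs.getAttribute(HTML.Attribute.TYPE);
                Object value = attrs.getAttribute(HTML.Attribute.VALUE);
                String kind = (type == null) ? "text" : type.toString();
                if (kind.equalsIgnoreCase("button") || kind.equalsIgnoreCase("submit")) {
                    parts.add("[" + (value == null ? "" : value) + "]");
                } else if (kind.equalsIgnoreCase("text")) {
                    // An empty text field is drawn as a fixed-width box (width is arbitrary).
                    parts.add(value == null ? "[________]" : "[" + value + "]");
                }
            }

            @Override
            public void handleStartTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
                if (tag == HTML.Tag.A) {
                    Object target = attrs.getAttribute(HTML.Attribute.HREF);
                    href = (target == null) ? null : target.toString();
                }
            }

            @Override
            public void handleEndTag(HTML.Tag tag, int pos) {
                // Emit the link location right after the link text.
                if (tag == HTML.Tag.A && href != null) {
                    parts.add("<" + href + ">");
                    href = null;
                }
            }
        };

        try {
            new ParserDelegator().parse(new StringReader(html), callback, true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return String.join(" ", parts);
    }

    public static void main(String[] args) {
        String html = "<div><form action=\"example.php\">Name:"
                + "<input type=\"text\" name=\"name_field\">"
                + "<input type=\"button\" value=\"OK\"></form></div>";
        System.out.println(render(html));
    }
}
```

Everything ends up on a single line here. A fuller renderer would also need line breaks for block elements and handling for the remaining input types (checkbox, radio, select, textarea).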