HTML to Text - Remove "hidden" Text
I am currently looking for ways to read the visible text of a website and store it in a plain-text string using Java.
In other words, I'd like to convert something like this:
Hello <span style="display: none">stupid</span> World into "Hello World"
or something like
<span>Un</span>friendly into "Unfriendly" (and not something like "Un friendly")
or
Hello
World
into "Hello World" (as new lines are ignored in HTML)
Do you know of any lib capable of assisting in this task?
Cheers,
Matthias
Boilerpipe is a Java library that strips boilerplate (navigation, templates and similar clutter) from HTML pages and extracts the main text.
Have a look at Cobra to see if the API provides any method to render the HTML and convert it into plain text.
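To make the three rules from the question concrete, here is a rough sketch using only the JDK. It is not how Boilerpipe or Cobra work, and the class and method names (VisibleText, visibleText) are made up for illustration. It assumes well-formed markup and only recognises an inline style="display: none". It knows nothing about stylesheets, classes, script/style bodies, HTML entities, or a '>' inside an attribute value or comment, so for real pages a proper parser or renderer is still the better choice.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.HashSet;
import java.util.Set;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not part of Boilerpipe or Cobra.
final class VisibleText {
    // Matches an inline style attribute that sets display to none.
    private static final Pattern HIDDEN = Pattern.compile(
            "style\\s*=\\s*[\"'][^\"']*display\\s*:\\s*none", Pattern.CASE_INSENSITIVE);

    // Elements that never have a closing tag, so they must not go on the stack.
    private static final Set<String> VOID_TAGS = new HashSet<>(Arrays.asList(
            "br", "hr", "img", "input", "link", "meta"));

    static String visibleText(String html) {
        StringBuilder out = new StringBuilder();
        // One entry per open element: true if that element hides its content.
        Deque<Boolean> open = new ArrayDeque<>();
        int hiddenDepth = 0; // number of open elements that hide their content
        int i = 0;
        while (i < html.length()) {
            char c = html.charAt(i);
            if (c != '<') {
                // Plain character: keep it only if no enclosing element is hidden.
                if (hiddenDepth == 0) {
                    out.append(c);
                }
                i++;
                continue;
            }
            int end = html.indexOf('>', i);
            if (end < 0) {
                break; // unterminated tag: stop rather than guess
            }
            String tag = html.substring(i + 1, end).trim();
            String name = tag.split("[\\s/]+")[0].toLowerCase();
            if (tag.startsWith("/")) {
                // Closing tag: assume it closes the most recently opened element.
                if (!open.isEmpty() && open.pop()) {
                    hiddenDepth--;
                }
            } else if (tag.startsWith("!")) {
                // Declarations such as <!DOCTYPE html> contribute no text.
            } else if (VOID_TAGS.contains(name) || tag.endsWith("/")) {
                // A visible <br> separates words; other void tags add nothing.
                if (name.equals("br") && hiddenDepth == 0) {
                    out.append(' ');
                }
            } else {
                boolean hides = HIDDEN.matcher(tag).find();
                open.push(hides);
                if (hides) {
                    hiddenDepth++;
                }
            }
            i = end + 1;
        }
        // Collapse runs of whitespace (including newlines) into single spaces.
        return out.toString().replaceAll("\\s+", " ").trim();
    }
}
```

Tags themselves never add a space, which is why <span>Un</span>friendly stays one word, and the final whitespace collapse is what turns a line break in the source into a single space. Calling VisibleText.visibleText(html) on the three inputs from the question gives "Hello World", "Unfriendly" and "Hello World".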
 