HTML to TXT library that mimics the output of "lynx -dump"?
The problem is really that specific.
I need a library in Java that can take HTML content and generate text in the same format that the Linux lynx program generates.
I need to expose data provided by third-party servers to end users on Android. The data comes as ancient, badly formatted HTML, bad enough that my attempts to read it in Java fail occasionally, which is unacceptable. The data also grows every month (so preinstalling it is ruled out), and I can't convince the providers to switch to something "modern" (life would be great with XML, etc.).
Shortest route: I wrote a class that uses the online W3 html2txt service (search for it on Google). It worked fine in the app until I got complaints and noticed that the W3 service fails occasionally. That alone is not a big deal, but the black-box logic expects the output to be in this "lynx-like" text format.
So I would like a library that does the conversion (HTML->TXT) in "lynx style" inside the app, to avoid the outages of the W3 service. Besides, the lynx output is probably the best I've seen: the most organized and neat.
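For readers unsure what "lynx style" means here: `lynx -dump` prints the page as plain text, marks each link inline as `[n]`, and ends with a "References" list of the numbered URLs. The following is only a rough sketch of that one convention in plain Java, not a port of lynx and not an existing library. The class name `LynxStyleDump` and its `dump` method are made up for illustration; it does not reproduce lynx's indentation, line wrapping, list or table layout, and it does not resolve relative URLs. It is regex-based, so it can still misbehave on sufficiently broken HTML.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: approximates the "[n]label + References" convention of lynx -dump. */
public class LynxStyleDump {

    // <a ... href="url" ...>label</a>, case-insensitive, label may span lines.
    private static final Pattern LINK = Pattern.compile(
            "<a\\b[^>]*?href\\s*=\\s*[\"']([^\"']*)[\"'][^>]*>(.*?)</a>",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    public static String dump(String html) {
        // 1. Drop script/style elements and comments, then flatten source whitespace.
        String s = html
                .replaceAll("(?is)<(script|style)\\b.*?</\\1\\s*>", "")
                .replaceAll("(?s)<!--.*?-->", "")
                .replaceAll("\\s+", " ");

        // 2. Replace every link with "[n]label" and remember its URL.
        //    StringBuffer is used because appendReplacement(StringBuilder, ...)
        //    only exists from Java 9 on.
        List<String> urls = new ArrayList<>();
        Matcher m = LINK.matcher(s);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            urls.add(decode(m.group(1)));
            String marked = "[" + urls.size() + "]" + m.group(2);
            m.appendReplacement(sb, Matcher.quoteReplacement(marked));
        }
        m.appendTail(sb);

        // 3. Turn line breaks and a few block-level closing tags into newlines,
        //    strip all remaining tags, decode entities, tidy up whitespace.
        String text = decode(sb.toString()
                .replaceAll("(?i)<br\\s*/?>|</(p|div|h[1-6]|li|tr)\\s*>", "\n")
                .replaceAll("<[^>]+>", ""))
                .replaceAll("[ \\t]+", " ")
                .replaceAll(" ?\\n ?", "\n")
                .replaceAll("\\n{3,}", "\n\n")
                .trim();

        // 4. Append the numbered reference list, if there were any links.
        StringBuilder out = new StringBuilder(text);
        if (!urls.isEmpty()) {
            out.append("\n\nReferences\n\n");
            for (int i = 0; i < urls.size(); i++) {
                out.append("   ").append(i + 1).append(". ").append(urls.get(i)).append('\n');
            }
        }
        return out.toString();
    }

    // Decodes only a handful of common entities; "&amp;" goes last to avoid double decoding.
    private static String decode(String s) {
        return s.replace("&nbsp;", " ").replace("&lt;", "<").replace("&gt;", ">")
                .replace("&quot;", "\"").replace("&amp;", "&");
    }
}
```

Calling `LynxStyleDump.dump(html)` on a page returns the text body followed by the reference list. Whether that is close enough for a consumer that expects real lynx output would have to be checked against actual `lynx -dump` results for the same pages.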
Are you guys aware of any?
Not sure what you mean by lynx style, so I might be completely off in submitting this (if so, please excuse me).
I used this piece of code a while back to check HTML/XML files (at the time I was just printing the content out in the logs):
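// Needs java.io.BufferedReader, java.io.InputStream, java.io.InputStreamReader.
// Reads a raw Android resource line by line into a single String.
InputStream in = context.getResources().openRawResource(id);
BufferedReader inRd = new BufferedReader(new InputStreamReader(in));
StringBuilder inLine = new StringBuilder();
String text;
while ((text = inRd.readLine()) != null) {
    inLine.append(text);
    inLine.append("\n");
}
inRd.close(); // closing the reader also closes the underlying stream
return inLine.toString();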
I hope it helps but I got the feeling you need something more complex :P
After a year, I give up. The answer is: there is no way to handle this, because I found no library in Java that does it. At least for now.
I'm closing this. Thank you for your attention.