What is the best HTML parser for Java? [closed]
Closed 8 years ago.
Assuming we have to use Java, what is the best HTML parser that is flexible enough to handle many different kinds of HTML content and does not require a lot of code for complex parsing tasks?
I would recommend Jsoup for this. It has a very nice API with support for jQuery-like CSS selectors and non-verbose element iteration. As an example, the following prints your own question and the names of all answerers here:
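import java.net.URL;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

// Fetch and parse the page, giving up after 3000 ms.
URL url = new URL("https://stackoverflow.com/questions/3121136");
Document document = Jsoup.parse(url, 3000);
String question = document.select("#question .post-text").text();
System.out.println("Question: " + question);
// Each matched link holds the display name of one answerer.
Elements answerers = document.select("#answers .user-details a");
for (Element answerer : answerers) {
    System.out.println("Answerer: " + answerer.text());
}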
An alternative would be XPath, but Jsoup is more useful for web developers who already have a good grasp of CSS selectors.
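For comparison, here is a rough sketch of what the XPath route could look like using only the JDK's built-in `javax.xml` classes. It assumes the input is already well-formed XHTML, which real-world pages often are not. The class name `XPathSketch`, the method `answererNames`, and the sample markup are made up for illustration and only mimic the selectors used above.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical example class; not part of any library.
public class XPathSketch {

    // Returns the text of every link inside a "user-details" div under #answers.
    // Assumption: the input is well-formed XML; the JDK parser rejects tag soup.
    static List<String> answererNames(String xhtml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xhtml)));

        XPath xpath = XPathFactory.newInstance().newXPath();
        // Rough counterpart of the CSS selector "#answers .user-details a".
        // Note that @class='user-details' only matches when the attribute is
        // exactly that string, whereas the CSS class selector matches one
        // class among several.
        NodeList links = (NodeList) xpath.evaluate(
                "//div[@id='answers']//div[@class='user-details']//a",
                doc, XPathConstants.NODESET);

        List<String> names = new ArrayList<>();
        for (int i = 0; i < links.getLength(); i++) {
            names.add(links.item(i).getTextContent());
        }
        return names;
    }

    public static void main(String[] args) throws Exception {
        // Made-up sample markup, not the real page.
        String xhtml = "<html><body><div id='answers'>"
                + "<div class='user-details'><a href='#'>Alice</a></div>"
                + "<div class='user-details'><a href='#'>Bob</a></div>"
                + "</div></body></html>";
        for (String name : answererNames(xhtml)) {
            System.out.println("Answerer: " + name);
        }
    }
}
```

The XPath expression is noticeably longer than the CSS selector, and the input has to be valid XML before it can be queried at all, which is the main reason the selector-based approach tends to be less work for ordinary web pages.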
The best would be the one that gets the job done right.
There is an open-source one called TagSoup, and also JTidy.