Java web scraper [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
We don’t allow questions seeking recommendations for books, tools, software libraries, and more. You can edit the question so it can be answered with facts and citations.

Closed 2 years ago.

What is the best library for a Java web scraper? I know the following choices:

  1. Selenium
  2. HtmlUnit
  3. Lobo browser

I need to pick one of these to build a scraper for a project that has to scale.


If you are only scraping, why do you need a browser? Plain HTTP requests (cURL-style calls) that fetch a page and read the response give you everything you need, as long as the data is present in the HTML the server returns.

Skipping the browser also helps with scalability. If you do want a browser, go for HtmlUnit: it is headless (no GUI), which again helps with scalability.
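
A minimal sketch of the browserless approach, assuming Java 11+ and only the standard `java.net.http` client. The class name `PlainHttpScraper`, the helpers `fetch` and `extractTitle`, the User-Agent string and the default URL are all hypothetical names chosen for illustration. The regex is a deliberately naive stand-in for a real HTML parser.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PlainHttpScraper {

    // Naive pattern for illustration only; a real HTML parser is more robust.
    private static final Pattern TITLE = Pattern.compile(
            "<title[^>]*>(.*?)</title>",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    /** Fetches the raw body of a page with a plain HTTP GET, no browser involved. */
    static String fetch(String url) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newBuilder()
                .connectTimeout(Duration.ofSeconds(10))
                .followRedirects(HttpClient.Redirect.NORMAL)
                .build();
        HttpRequest request = HttpRequest.newBuilder(URI.create(url))
                .timeout(Duration.ofSeconds(20))
                .header("User-Agent", "example-scraper/0.1") // hypothetical value
                .GET()
                .build();
        HttpResponse<String> response =
                client.send(request, HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() != 200) {
            throw new IOException(
                    "Unexpected HTTP status " + response.statusCode() + " for " + url);
        }
        return response.body();
    }

    /** Returns the trimmed text of the first <title> element, if there is one. */
    static Optional<String> extractTitle(String html) {
        Matcher m = TITLE.matcher(html);
        return m.find() ? Optional.of(m.group(1).trim()) : Optional.empty();
    }

    public static void main(String[] args) throws Exception {
        // Placeholder URL; pass the page you actually want as the first argument.
        String url = args.length > 0 ? args[0] : "https://example.com/";
        String html = fetch(url);
        System.out.println(extractTitle(html).orElse("(no title found)"));
    }
}
```

Keeping the fetch and the extraction in separate methods means the parsing step can be exercised on saved HTML without touching the network. If you scrape many pages, it is probably worth building the `HttpClient` once and reusing it.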


Web-Harvest was recently recommended to me, and I found it worked well out of the box, except for some issues around HTTP 500 response codes ...


Use jsoup. It fetches the response from a URL and lets you extract data from it with CSS selectors, or with XPath expressions via `selectXpath` in jsoup 1.14.3 and later. I've implemented a scraper this way and it works well.
