Java web scraper [closed]
We don’t allow questions seeking recommendations for books, tools, software libraries, and more. You can edit the question so it can be answered with facts and citations.
Closed 2 years ago.
What is the best library for a Java web scraper? I know the following choices:
- Selenium
- HtmlUnit
- Lobo browser
I need to pick one of these to build a scraper for a project that has to scale.
If you are only scraping, why do you need a browser? Making plain HTTP requests to a page (as you would with cURL) and reading the response gives you what you need for scraping.
This also helps with scalability, since no browser has to run for each page. If you do want a browser, go for HtmlUnit: it is headless, so it would again help with scalability.
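To make the "basic cURL calls" idea concrete on the Java side, here is a minimal sketch using the JDK's built-in `java.net.http.HttpClient` (Java 11 or later). The class name `PlainHttpFetch`, the `User-Agent` value, the timeouts, and the example URL are all placeholders I made up for illustration, not something from the question. It treats any non-2xx status (an HTTP 500, say) as a failure instead of returning the error page as if it were content.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

class PlainHttpFetch {

    // One shared client: it is immutable once built and can be reused for many requests.
    private static final HttpClient CLIENT = HttpClient.newBuilder()
            .connectTimeout(Duration.ofSeconds(10))
            .followRedirects(HttpClient.Redirect.NORMAL)
            .build();

    /** Builds a GET request for the given URL. The User-Agent value is a placeholder. */
    static HttpRequest buildRequest(String url) {
        return HttpRequest.newBuilder(URI.create(url))
                .timeout(Duration.ofSeconds(20))
                .header("User-Agent", "example-scraper/0.1")
                .GET()
                .build();
    }

    /** Fetches the page body as a string and fails on any non-2xx status. */
    static String fetch(String url) throws IOException, InterruptedException {
        HttpResponse<String> response =
                CLIENT.send(buildRequest(url), HttpResponse.BodyHandlers.ofString());
        int status = response.statusCode();
        if (status < 200 || status >= 300) {
            throw new IOException("Unexpected HTTP status " + status + " for " + url);
        }
        return response.body();
    }

    public static void main(String[] args) throws Exception {
        // Placeholder URL; pass a real one as the first argument.
        String url = args.length > 0 ? args[0] : "https://example.com/";
        String html = fetch(url);
        System.out.println(html.length() + " characters fetched from " + url);
    }
}
```

The returned string is the raw HTML, which you would then hand to whatever parser you choose. Note that this only sees what the server sends; content that the page builds with JavaScript will not be there, which is the case where a headless browser starts to make sense.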
I was recently recommended Web Harvest, and thought it worked well out of the box, except for some issues around HTTP 500 response codes ...
Use jsoup. It fetches the response from a URL, and you can then use XPath expressions to pull data out of the response (jsoup's own `select` takes CSS selectors; recent versions also offer `selectXpath`). I've implemented this and it works great.