Java API for web scraping or web mining [duplicate]
I'm looking for a good Java API for web scraping. I tried the Web-Harvest API, but I find it a bit clunky. Any other suggestions?
I've used HttpUnit for exactly this task in production.
(Maven Dependency)
I use this:
It supports HttpClient and HtmlUnit (a headless browser that supports JavaScript) and can parallelize requests over a large pool of proxies if required. I can also recommend jsoup for static HTML processing.
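For the static HTML case, a minimal jsoup sketch might look something like this. It assumes the jsoup library is on the classpath; the class name `StaticHtmlExample`, the helper `extractLinks`, the sample markup, and the base URL are all made up for illustration. It parses an in-memory string, so no network access is involved.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

import java.util.ArrayList;
import java.util.List;

// Hypothetical example class; not part of jsoup itself.
public class StaticHtmlExample {

    // Parses the HTML string and returns the absolute URL of every <a href> link.
    // baseUri is used to resolve relative links such as "/docs".
    public static List<String> extractLinks(String html, String baseUri) {
        Document doc = Jsoup.parse(html, baseUri);
        List<String> links = new ArrayList<>();
        for (Element anchor : doc.select("a[href]")) {
            links.add(anchor.absUrl("href"));
        }
        return links;
    }

    public static void main(String[] args) {
        // Sample markup, invented for this sketch.
        String html = "<html><head><title>Demo</title></head><body>"
                + "<a href=\"/docs\">Docs</a> "
                + "<a href=\"https://example.org/x\">X</a>"
                + "</body></html>";

        for (String link : extractLinks(html, "https://example.com/")) {
            System.out.println(link);
        }
    }
}
```

To scrape a live page instead of a string, `Jsoup.connect(url).get()` returns the same kind of `Document`, so the `select` call stays unchanged. Keep in mind that jsoup does not execute JavaScript, which is why a headless browser such as HtmlUnit is the better fit for script-heavy pages.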