What is the best method or tool to scrape web sites? [closed]

As it currently stands, this question is not a good fit for our Q&A format. We expect answers to be supported by facts, references, or expertise, but this question will likely solicit debate, arguments, polling, or extended discussion. If you feel that this question can be improved and possibly reopened, visit the help center for guidance. Closed 9 years ago.

I need to scrape web sites (with approval). Before I start writing my own scraper: what is the best tool or approach for scraping web sites that is both fast (multithreaded) and easy to learn?


Take a look at this recent blog post by Lee Holmes. He wrote a pretty cool screen scraper using PowerShell and the HTML Agility Pack.


Consider using TestPlan. It has a display-less browser mode for fast scraping, and the basics of its scripting language are simple and quick to learn.


TagSoup, a SAX-compliant parser written in Java, parses HTML as it is found in the wild: poor, nasty and brutish, though quite often far from short.

Details here: http://mercury.ccil.org/~cowan/XML/tagsoup/
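
To give a feel for the event-driven (SAX-style) approach on messy markup, here is a minimal sketch. It does not use TagSoup: it swaps in `html.parser.HTMLParser` from the Python standard library, which is also tolerant of malformed HTML and reports tags as callbacks. The `LinkCollector` class, the `extract_links` helper, and the sample markup are made up for illustration.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects href values from <a> tags as parse events arrive."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # Called once per opening tag, roughly like SAX startElement.
        # HTMLParser lowercases tag and attribute names before calling us.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(html):
    """Hypothetical helper: return all link targets found in an HTML string."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links


# Deliberately malformed markup: unclosed <p>, unquoted attribute,
# a missing </a>, and upper-case tag and attribute names.
messy = "<html><body><p>Intro<a href=/one>One<p><A HREF='/two'>Two</a></body>"
print(extract_links(messy))
```

Because the parser only fires callbacks and never builds a tree, memory use stays small even on long pages, which is the same trade-off a SAX parser makes.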


Have you taken a look at ScraperWiki? https://scraperwiki.com/
