
Best screen scraper: Simple HTML DOM or Snoopy?

Which one is better for screen scraping, Simple HTML DOM or Snoopy? I use Simple HTML DOM and find it comfortable. Does Snoopy have any advantage over Simple HTML DOM?

My requirements: I want to scrape content from a page after logging in. Simple HTML DOM is easy, but it takes a lot of time to print the results.


Is Snoopy a well-known, mature package?

If it's not, then all other things being equal, I'd probably go with generic HTML DOM code, especially if the scraping is fairly simple.

But only you know when your code is starting to get too big or unmanageable, at which point it might be better to look at another tool such as Snoopy.

(Admittedly, I don't have any experience with it. For those not familiar with it, it's apparently at http://sourceforge.net/projects/snoopy/: "Snoopy is a PHP class that simulates a web browser. It automates the task of retrieving web page content and posting forms, for example.")

The real reason I'm posting, even though I don't know Snoopy per se and thus can't definitively answer your question, is to ask if you've considered using Selenium (http://www.seleniumhq.org/) instead of Snoopy.

Selenium is a fairly well-known testing tool, and it occurred to me that one of the nice things about using it for what you're doing (if you can) is that it has built-in testing support.

That matters because screen scraping is an inherently brittle task: if the target site changes something, bam, your scraping fails. So a system that automatically scrapes and then tests that the scrape worked is a nice design.
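As a rough illustration of the scrape-then-verify idea (not Selenium, Snoopy, or Simple HTML DOM), here is a minimal sketch using only Python's standard-library `html.parser`. It assumes the page of interest has a `<title>` and at least one link. The class and function names (`TitleAndLinkParser`, `scrape`, `check_scrape`, `ScrapeError`) and the expectations are hypothetical, chosen for illustration only. The point is that the scraper checks its own result and fails loudly when the page no longer matches, instead of silently returning empty data.

```python
from html.parser import HTMLParser


class ScrapeError(Exception):
    """Raised when the page no longer looks the way the scraper expects."""


class TitleAndLinkParser(HTMLParser):
    """Collects the <title> text and every <a href> on the page."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.links = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        # Only accumulate text that appears inside <title>...</title>.
        if self._in_title:
            self.title += data


def check_scrape(result):
    # Hypothetical expectations: adjust them to whatever the target page must contain.
    if not result["title"]:
        raise ScrapeError("no <title> found; the page layout may have changed")
    if not result["links"]:
        raise ScrapeError("no links found; the page layout may have changed")


def scrape(html):
    """Parse the page, then verify the result before handing it back."""
    parser = TitleAndLinkParser()
    parser.feed(html)
    parser.close()
    result = {"title": parser.title.strip(), "links": parser.links}
    check_scrape(result)
    return result
```

For example, `scrape('<html><head><title>Inbox</title></head><body><a href="/msg/1">Hi</a></body></html>')` returns `{'title': 'Inbox', 'links': ['/msg/1']}`, while `scrape('<html><body></body></html>')` raises `ScrapeError`. Running such checks on a schedule tells you when the target site has changed.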

Something to think about, anyway.


I've stumbled upon BeautifulSoup, which is Python-based. I suppose there are plenty of others too.

It looks like Snoopy is PHP-based and hence can only run server-side. Is that what you are really looking for? What are your requirements? Please elaborate.

