
Web crawler in Ruby [closed]

As it currently stands, this question is not a good fit for our Q&A format. We expect answers to be supported by facts, references, or expertise, but this question will likely solicit debate, arguments, polling, or extended discussion. If you feel that this question can be improved and possibly reopened, visit the help center for guidance. Closed 9 years ago.

What is your recommendation for writing a web crawler in Ruby? Is there any library better than Mechanize?


I'd give Anemone a try. It's simple to use, especially if you need to write a simple crawler, and in my opinion it's well designed too. For example, I used it to write a Ruby script that checked my sites for 404 errors in very little time.
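A minimal sketch of that kind of 404 check, using Anemone's on_every_page callback; the URL is just a placeholder for your own site:

    require 'anemone'

    # Crawl the whole site and print every URL that responds with a 404.
    Anemone.crawl('https://example.com') do |anemone|
      anemone.on_every_page do |page|
        puts "404: #{page.url}" if page.code == 404
      end
    end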


If you just want to fetch pages' content, the simplest way is to use the open-uri functions from the standard library; they don't require any additional gems. You just have to require 'open-uri' and you're set: http://ruby-doc.org/stdlib-2.2.2/libdoc/open-uri/rdoc/OpenURI.html
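For example, on recent Rubies (where Kernel#open for URLs is deprecated in favour of URI.open), fetching a page is a one-liner; the URL below is just a placeholder:

    require 'open-uri'

    # Fetch the raw HTML of a page using nothing but the standard library.
    html = URI.open('https://example.com').read
    puts html[0, 200]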

To parse the content you can use Nokogiri or other gems, which also offer useful features such as XPath support. You can find other parsing libraries here on SO.
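A small Nokogiri sketch; the HTML string here is just an inline stand-in for a fetched page:

    require 'nokogiri'

    html = '<html><body><h1>Hello</h1><a href="/about">About</a></body></html>'
    doc  = Nokogiri::HTML(html)

    # XPath query: collect every link's href attribute
    doc.xpath('//a/@href').each { |attr| puts attr.value }

    # CSS selectors work as well
    puts doc.at_css('h1').text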


You might want to check out Wombat, which is built on top of Mechanize/Nokogiri and provides a DSL (like Sinatra, for example) for parsing pages. Pretty neat :)
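A rough sketch of what a Wombat scrape looks like, based on the block DSL shown in its README; the URL and the property names (headline, tagline) are made up for illustration and simply become keys in the returned hash:

    require 'wombat'

    results = Wombat.crawl do
      base_url 'https://example.com'
      path '/'

      # Each property takes an xpath: or css: selector
      headline xpath: '//h1'
      tagline  css: 'p'
    end

    puts results.inspect  # e.g. {"headline" => "...", "tagline" => "..."}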


I am working on the pioneer gem, which is not a spider but a simple asynchronous crawler based on the em-synchrony gem.


I just released one recently called Klepto. It's got a pretty simple DSL, is built on top of Capybara, and has lots of cool configuration options.
