How to Programmatically Take Snapshots of Crawled Webpages (in Ruby)?

2022-12-10 02:59

What is the best solution to programmatically take a snapshot of a webpage?

The situation is this: I would like to crawl a bunch of webpages and take thumbnail snapshots of them periodically, say once every few months, without having to manually go to each one. I would also like to be able to take jpg/png snapshots of websites that might be completely Flash/Flex, so I'd have to wait until the page has loaded before taking the snapshot.

It would be nice if there was no limit to the number of thumbnails I could generate (within reason, say 1000 per day).

Any ideas how to do this in Ruby? Seems pretty tough.

Browsers to do this in: Safari or Firefox, preferably Safari.

Thanks so much.

This really depends on your operating system. What you need is a way to hook into a web browser and save its rendered output as an image.

If you are on a Mac, I would imagine your best bet would be to use MacRuby (or RubyCocoa, although I believe this is going to be deprecated in the near future) together with the WebKit framework to load the page and render it as an image.

This is definitely possible. For inspiration, you may wish to look at the Paparazzi! and webkit2png projects.
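As a rough sketch of the webkit2png route, Ruby can simply shell out to the tool once per URL. The flag names used here (`--fullsize`, `--thumb`, `--delay`, `--dir`, `--filename`) are assumptions taken from webkit2png's help output and should be checked against the installed version. `snapshot_basename`, `webkit2png_command` and `snapshot` are hypothetical helper names, and the delay is only a crude way of giving slow pages time to finish loading.

```ruby
require "uri"

# Turn a URL into a file-name base, e.g. "http://example.com/about/" -> "example.com-about".
def snapshot_basename(url)
  uri = URI.parse(url)
  slug = "#{uri.host}#{uri.path}".gsub(/[^A-Za-z0-9.]+/, "-").gsub(/\A-+|-+\z/, "")
  slug.empty? ? "snapshot" : slug
end

# Build the argument list for webkit2png.
# Assumption: these long options match the installed webkit2png (see `webkit2png --help`).
def webkit2png_command(url, out_dir:, delay: 5)
  [
    "webkit2png",
    "--fullsize",                              # full-size PNG of the page
    "--thumb",                                 # thumbnail PNG as well
    "--delay=#{delay}",                        # seconds to wait before capturing
    "--dir=#{out_dir}",                        # output directory
    "--filename=#{snapshot_basename(url)}",    # base name for the output files
    url
  ]
end

# Run webkit2png for one URL; returns true if the command exited successfully.
def snapshot(url, out_dir: "snapshots", delay: 5)
  system(*webkit2png_command(url, out_dir: out_dir, delay: delay))
end
```

Calling `snapshot("http://example.com/about/", delay: 10)` for each crawled URL from a cron job would cover the "once every few months" requirement. Passing the command as an array to `system` avoids shell quoting problems with unusual URLs.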

Another option, which isn't dependent on the OS, might be to use the BrowserShots API.

There is no built in library in Ruby for rendering a web page.

Using Selenium with Ruby is one possibility. You can run Firefox as a headless browser (i.e. on a server).
The source code for Browsershots is available at http://sourceforge.net/projects/browsershots/files/
If you are on Linux, you could use http://khtml2png.sourceforge.net/ and script it via Ruby.
Some paid services that you could try to automate:
- http://webthumb.bluga.net/home
- http://www.thumbalizr.com
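For the Selenium option, a minimal sketch might look like the following. It assumes the `selenium-webdriver` gem (the WebDriver bindings, not the older Selenium RC), plus Firefox and geckodriver installed on the machine. `snapshot_path` and `take_snapshots` are hypothetical helper names, and the fixed `settle_seconds` pause is a simple stand-in for "wait until the page has loaded".

```ruby
require "digest"
require "fileutils"

# Derive a stable PNG path for a URL, so a later crawl overwrites the earlier snapshot.
def snapshot_path(url, dir)
  File.join(dir, "#{Digest::SHA1.hexdigest(url)}.png")
end

# Visit each URL in headless Firefox and save a PNG screenshot of it.
def take_snapshots(urls, dir: "snapshots", settle_seconds: 5)
  # Loaded lazily so the helper above works without the gem installed.
  require "selenium-webdriver"

  FileUtils.mkdir_p(dir)
  options = Selenium::WebDriver::Firefox::Options.new(args: ["-headless"])
  driver = Selenium::WebDriver.for(:firefox, options: options)
  begin
    urls.each do |url|
      driver.get(url)
      sleep settle_seconds                     # crude wait for late-loading content
      driver.save_screenshot(snapshot_path(url, dir))
    end
  ensure
    driver.quit                                # always shut the browser down
  end
end
```

A call such as `take_snapshots(["http://example.com"], settle_seconds: 10)` writes one PNG per URL into `snapshots/`. Reusing a single browser session for the whole batch keeps the per-page cost low, which matters at volumes like 1000 pages per day.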

As viewed by which browser? IE? Firefox? Opera? One of the myriad WebKit engines?

If only it were possible to automate http://browsershots.org :)

Use selenium-rc; it comes with snapshot capabilities.

With JRuby you can use SWT's browser library.
