How to programmatically take snapshots of crawled webpages (in Ruby)?
What is the best solution to programmatically take a snapshot of a webpage?
The situation is this: I would like to crawl a bunch of webpages and take thumbnail snapshots of them periodically, say once every few months, without having to manually go to each one. I would also like to be able to take jpg/png snapshots of websites that might be completely Flash/Flex, so I'd have to wait until the page has loaded before taking the snapshot somehow.
It would be nice if there was no limit to the number of thumbnails I could generate (within reason, say 1000 per day).
Any ideas how to do this in Ruby? Seems pretty tough.
Browsers to do this in: Safari or Firefox, preferably Safari.
Thanks so much.
This really depends on your operating system. What you need is a way to hook into a web browser and save that to an image.
If you are on a Mac - I would imagine your best bet would be to use MacRuby (or RubyCocoa - although I believe this is going to be deprecated in the near future) and then to use the WebKit framework to load the page and render it as an image.
This is definitely possible; for inspiration you may wish to look at the Paparazzi! and webkit2png projects.
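As a rough illustration of the webkit2png route, here is a minimal sketch that shells out to it from Ruby. It assumes webkit2png is installed and on your PATH (so Mac only), and it uses the `-T` (thumbnail only), `-D` (output directory), `-o` (base filename) and `--delay` (seconds to wait after load) options. The method names and the host-based file naming are made up for the example, and the delay is only a crude way to give late-loading content such as Flash time to appear.

```ruby
require 'fileutils'
require 'uri'

# Build the argument list for one webkit2png run.
# Naming files after the host is an assumption made for this sketch.
def webkit2png_command(url, out_dir:, delay: 5)
  host = URI.parse(url).host.to_s
  name = host.gsub(/[^A-Za-z0-9.-]/, '_')
  ['webkit2png', '-T', '-D', out_dir, '-o', name, "--delay=#{delay}", url]
end

# Run webkit2png once per URL and report the ones that failed.
def snapshot_with_webkit2png(urls, out_dir: 'snapshots', delay: 5)
  FileUtils.mkdir_p(out_dir)
  urls.each do |url|
    ok = system(*webkit2png_command(url, out_dir: out_dir, delay: delay))
    warn "webkit2png failed for #{url}" unless ok
  end
end
```

Calling `snapshot_with_webkit2png(['http://example.com/'])` from a cron job every few months would cover the periodic part. Passing the arguments to `system` as a list avoids shell quoting problems with odd URLs.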
Another option, which isn't dependent on the OS, might be to use the BrowserShots API.
There is no built in library in Ruby for rendering a web page.
Using Selenium with Ruby is one possibility. You can run Firefox as a headless browser (i.e. on a server with no display).
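A minimal sketch of the Selenium approach, assuming the selenium-webdriver gem, Firefox and geckodriver are installed. The helper names, the host-plus-date file naming and the fixed `settle` wait are assumptions for illustration; a fixed sleep is a blunt way to wait for slow content, so adjust it to the sites you crawl.

```ruby
require 'fileutils'
require 'uri'

# Start Firefox without a visible window. The gem is required here so the
# other helpers can be loaded even where it is not installed.
def headless_firefox
  require 'selenium-webdriver'
  options = Selenium::WebDriver::Firefox::Options.new
  options.add_argument('-headless')
  Selenium::WebDriver.for(:firefox, options: options)
end

# Hypothetical naming scheme: host plus capture date, e.g. example.com-20200102.png
def snapshot_path(url, out_dir, date = Time.now)
  host = URI.parse(url).host.to_s.gsub(/[^A-Za-z0-9.-]/, '_')
  File.join(out_dir, "#{host}-#{date.strftime('%Y%m%d')}.png")
end

# Visit each URL, wait `settle` seconds, save a PNG, and return the paths.
# `driver` only needs to respond to #get and #save_screenshot.
def snapshot_all(driver, urls, out_dir: 'snapshots', settle: 5)
  FileUtils.mkdir_p(out_dir)
  urls.map do |url|
    driver.get(url)
    sleep(settle)
    path = snapshot_path(url, out_dir)
    driver.save_screenshot(path)
    path
  end
end
```

Typical use would be `driver = headless_firefox`, then `snapshot_all(driver, urls)`, then `driver.quit` in an `ensure` block. This saves a screenshot of the browser viewport; scaling it down to a thumbnail would be a separate step with an image library.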
Here is the source code for Browsershots: http://sourceforge.net/projects/browsershots/files/
If you are using Linux you could use http://khtml2png.sourceforge.net/ and script it via Ruby.
Some paid services that automate this:
- http://webthumb.bluga.net/home
- http://www.thumbalizr.com
As viewed by which browser? IE? Firefox? Opera? One of the myriad WebKit engines?
If only it were possible to automate http://browsershots.org :)
Use selenium-rc, it comes with snapshot capabilities.
With jruby you can use SWT's browser library.