Automatically refresh and download Asirra images
If you're unfamiliar with Asirra, it's a CAPTCHA technique developed by Microsoft that uses the identification of cats and dogs, rather than a string of text, for human verification.
I'd like to use their database of millions of pictures of cats and dogs for some machine learning experiments, so I'm trying to write a script that will automatically refresh their site and download 12 images at a regular interval. Unfortunately, I'm a novice when it comes to JavaScript.
The problem is that, for obvious security reasons, it's hard to find the actual URL of an image, because it's all behind obfuscated JavaScript. I tried using curl from a terminal to see what HTML was returned, and it's the same deal - just JavaScript. So, using a script, how do I access the actual images? The images are obviously being transferred to my computer, since they're showing up on my screen, but I don't know how to capture them with a script.
Another problem is that I don't want the smaller images that load first; I need the larger ones that only show up when you mouse over them. So I guess I also need to override that JavaScript function to get the larger images via the script.
I'd prefer something in Python or C#, but I'll take anything - thanks!
Edit: Their public corpus doesn't have nearly enough images for my uses, so that won't work. Also, I'm not necessarily asking you to write my script for me, just for some guidance on how to access the full-size images using a script.
Try using their public corpus http://research.microsoft.com/en-us/projects/asirra/corpus.aspx
While waiting for an answer here, I kept digging and eventually figured out a somewhat hacked-together way of getting done what I wanted.
First off, the reason this is a somewhat complicated problem (at least to a JavaScript novice like me) is that the images from Asirra are loaded onto the webpage via JavaScript, which runs client-side. This is a problem when you download the page with something like wget or curl, because those tools don't actually run the JavaScript; they just download the source HTML. Therefore, you don't get the images.
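To illustrate the distinction, here's a minimal Python sketch (my own illustration, not part of the workflow described here) that pulls `<img>` URLs out of HTML using only the standard library. Against the raw source that curl returns it finds nothing, because the `<img>` tags are only injected later by JavaScript; it would only find images in HTML saved after a browser has rendered the page. The two sample strings below are hypothetical stand-ins for those two cases.

```python
from html.parser import HTMLParser


class ImgSrcParser(HTMLParser):
    """Collect the src attribute of every <img> tag in an HTML document."""

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.urls.append(src)


def extract_image_urls(html):
    parser = ImgSrcParser()
    parser.feed(html)
    return parser.urls


# Hypothetical raw source as returned by curl: only a script tag, no images.
raw = '<html><head><script src="asirra.js"></script></head><body></body></html>'
# Hypothetical HTML after a browser has run the JavaScript.
rendered = '<html><body><img src="cat1.jpg"><img src="dog2.jpg"></body></html>'

print(extract_image_urls(raw))       # []
print(extract_image_urls(rendered))  # ['cat1.jpg', 'dog2.jpg']
```

This is why the browser-based approach below works where curl fails: the browser executes the JavaScript first, so the saved page actually contains the images.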
However, I realized that Firefox's "Save Page As..." did exactly what I needed: it ran the JavaScript, which loaded the images, and then saved everything into the well-known directory structure on my hard drive. That's exactly what I wanted to automate. So I found a Firefox add-on called iMacros and wrote this macro:
VERSION BUILD=6240709 RECORDER=FX
TAB T=1
URL GOTO=http://www.asirra.com/examples/ExampleService.html
SAVEAS TYPE=CPL FOLDER=C:\Cat-Dog\Downloads FILE=*
Set to loop 10,000 times, it worked perfectly. In fact, since it always saved to the same folder, duplicate images were overwritten (which is what I wanted).
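Since duplicates simply overwrite each other in one folder, a quick way to check how many distinct images have actually accumulated is to hash the files. A small Python sketch (my own addition, standard library only; the commented-out path is the download folder from the macro above):

```python
import hashlib
from pathlib import Path


def unique_image_hashes(folder):
    """Return the set of MD5 digests of all .jpg files under `folder`.

    Because identical downloads overwrite each other on disk, the size of
    this set is the number of distinct images collected so far.
    """
    hashes = set()
    for path in Path(folder).rglob("*.jpg"):
        hashes.add(hashlib.md5(path.read_bytes()).hexdigest())
    return hashes


# Hypothetical usage with the macro's download folder:
# print(len(unique_image_hashes(r"C:\Cat-Dog\Downloads")))
```

Hashing file contents rather than comparing filenames matters here, because the saved files keep the same names on every loop iteration even when the underlying images change.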