What's the best way to crawl a batch of URLs for a specific HTML element and retrieve the image?

I'm looking to crawl ~100 webpages that share the same structure, but the image I need has a different filename on each page.

The image tag is located at:

#content div.artwork img.artwork

and I need the src URL of that element so I can download the image.

Any ideas? I have the URLs in a .txt file, and I'm on a Mac OS X box.


I am not sure how you can run a selector-style query against the raw files, but a Perl regex might do the job just as well:

for url in `cat urls.txt`; do wget -q -O- "$url"; done \
  | perl -nle 'print $1 if /<img[^>]*class="artwork"[^>]*src="([^"]+)"/' \
  | xargs -n1 wget

Note that perl's -n flag processes input line by line, so this assumes each <img> tag sits on a single line and that the class attribute appears before src. If the src values are relative, you'll also need to prepend each page's base URL before the final wget.
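If you do want a selector-style query, Mac OS X ships with xmllint (part of libxml2), which can evaluate the equivalent XPath against the fetched HTML. Here's a minimal sketch, assuming each page has exactly one matching img and that your xmllint build supports --xpath (libxml2 2.7.7 or later):

while read -r url; do
  # fetch the page quietly, parse it as HTML, and pull out the src attribute
  wget -q -O- "$url" \
    | xmllint --html --xpath \
        'string(//*[@id="content"]//div[@class="artwork"]//img[@class="artwork"]/@src)' \
        - 2>/dev/null
  echo    # string() prints no trailing newline, so add one per page
done < urls.txt | xargs -n1 wget

One caveat: the @class tests here are exact attribute matches, unlike CSS class selectors which match a single token. If those elements carry extra classes you'd need the usual contains(concat(' ', normalize-space(@class), ' '), ' artwork ') workaround, and as above, relative src values need the base URL prepended before the final wget.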