Automate finding all images used by a webpage
When I load a page with Firebug I can see a list of all the images required by the site. How can I automate finding a list of the image URLs used by a webpage, including those referenced in external CSS?
With PHP Simple HTML DOM Parser it is as easy as:
include 'simple_html_dom.php'; // the library's single source file
$html = file_get_html('http://www.google.com/');
$ret = $html->find('img'); // array of all <img> elements
Simple HTML DOM Parser also exposes the attributes of each element it finds. Since find('img') returns an array, loop over it to grab each URL. Something like:
foreach ($ret as $img) {
    $URL = $img->src; // src attribute of this <img>
}
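(This only searches the parsed HTML for <img> elements, so it will not find images referenced from CSS, such as background-image rules; those have to be pulled from the stylesheets separately. I have not had a chance to test it.)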
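For images referenced from CSS, one option is to scan each stylesheet's text for url(...) values. The following is a minimal sketch, assuming the CSS has already been fetched into a string (for example with file_get_contents). The function name extract_css_urls and the sample CSS are hypothetical and not part of Simple HTML DOM Parser, and a regex like this is an approximation rather than a full CSS parser.

```php
<?php
// Hypothetical helper: return the distinct url(...) targets found in a CSS string.
function extract_css_urls(string $css): array
{
    // Drop /* ... */ comments so commented-out rules are ignored.
    $css = preg_replace('~/\*.*?\*/~s', '', $css);

    // Match url(...) with optional single or double quotes around the target.
    preg_match_all('~url\(\s*([\'"]?)(.*?)\1\s*\)~i', $css, $matches);

    $urls = [];
    foreach ($matches[2] as $url) {
        // Skip empty targets and inline data: URIs, which are not separate files.
        if ($url !== '' && stripos($url, 'data:') !== 0) {
            $urls[] = $url;
        }
    }

    // Remove duplicates and reindex from 0.
    return array_values(array_unique($urls));
}

// Sample stylesheet text (made up for illustration).
$css = "body { background: url('img/bg.png'); }\n"
     . ".logo { background-image: url( \"/sprites.png\" ); }";

print_r(extract_css_urls($css));
```

The returned values are as written in the stylesheet, so relative ones still need to be resolved against the stylesheet's own URL rather than the page's. url(...) can also point at fonts and other files, so filter by extension if only images are wanted.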
There are a few Firefox extensions that deal with downloading images from a web page. How about trying the "Image Download" add-on?
In the end I used WebKit to load each webpage and watched which resources were downloaded.