Get all images from a URL and select the biggest one

I want to open a URL and use a regex to extract all the image URLs from the page. Then I want to fetch each of them with cURL and check its size. In the end I want to get the biggest one. How do I do this?


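You could start by fetching the page with cURL and saving the HTML in a variable.
Then you could apply a regex like this one: <img.*?src=['"](.*?)['"]>
Note that this pattern captures the URL correctly only when src is the last attribute directly before the closing >.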

Check whether the source starts with http or is a relative link; if it's a relative link, you can prepend the URL of the page.

Finally, get the size of each image using getimagesize(), which returns the pixel dimensions rather than the file size: http://php.net/manual/en/function.getimagesize.php
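The steps in this answer could be sketched roughly as follows. This is a minimal, untested-against-real-sites sketch, not a definitive implementation. The function names (extractImageSrcs, resolveUrl, largestImage) are hypothetical, the regex is a slightly loosened variant of the one above so that src need not be the last attribute, "biggest" is assumed to mean largest pixel area, and resolveUrl does not handle "../" segments. Calling getimagesize() on a URL also assumes allow_url_fopen is enabled.

```php
<?php
// Hypothetical helpers illustrating the regex-based approach.

// Collect the src value of every <img> tag found in the HTML string.
function extractImageSrcs(string $html): array
{
    preg_match_all('/<img[^>]*?\ssrc=[\'"]([^\'"]*)[\'"]/i', $html, $matches);
    return $matches[1];
}

// Turn a src value into an absolute URL, based on the URL of the page.
function resolveUrl(string $src, string $pageUrl): string
{
    if (preg_match('#^https?://#i', $src)) {
        return $src; // already absolute
    }
    $parts  = parse_url($pageUrl);
    $origin = $parts['scheme'] . '://' . $parts['host'];
    if (isset($parts['port'])) {
        $origin .= ':' . $parts['port'];
    }
    if (strpos($src, '//') === 0) {
        return $parts['scheme'] . ':' . $src; // protocol-relative link
    }
    if (strpos($src, '/') === 0) {
        return $origin . $src; // root-relative link
    }
    // Relative to the directory of the page.
    $path = $parts['path'] ?? '/';
    $dir  = substr($path, 0, strrpos($path, '/') + 1);
    return $origin . $dir . $src;
}

// Return the URL with the largest pixel area, or null if none could be read.
// $sizeOf defaults to getimagesize(); it must return [width, height, ...] or false.
function largestImage(array $urls, ?callable $sizeOf = null): ?string
{
    $sizeOf   = $sizeOf ?? 'getimagesize';
    $best     = null;
    $bestArea = -1;
    foreach ($urls as $url) {
        $info = @$sizeOf($url); // @ hides warnings for unreadable images
        if ($info === false) {
            continue;
        }
        $area = $info[0] * $info[1];
        if ($area > $bestArea) {
            $bestArea = $area;
            $best     = $url;
        }
    }
    return $best;
}
```

To use it, pass the fetched HTML to extractImageSrcs(), map each result through resolveUrl() with the page URL, and hand the resulting list to largestImage().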


Use the PHP DOM extension to find the images.

I have not tested this code at all, but it should get you headed in the right direction.

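$urls = array();
// loadHTML() is an instance method, so create the document first.
$dom = new DOMDocument();
// @ suppresses the warnings libxml emits for malformed real-world HTML.
@$dom->loadHTML(YOUR_HTML);
$imgList = $dom->getElementsByTagName('img');
$imgCount = $imgList->length;
for ($i = 0; $i < $imgCount; $i++) {
    $imgElement = $imgList->item($i);
    // Skip <img> tags that have no src attribute.
    if ($imgElement->hasAttribute('src')) {
        $urls[] = $imgElement->getAttribute('src');
    }
}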

If you want to get linked images, you can change 'img'/'src' to 'a'/'href', but you will then need a way to filter the list so that only image links remain.
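One possible way to filter 'a'/'href' values down to images is to check the file extension of the link's path. The sketch below is only a heuristic under that assumption: the function name isImageLink and the extension list are hypothetical, and links that serve images without a recognizable extension would be missed.

```php
<?php
// Hypothetical helper: guess whether a link points at an image by its extension.
function isImageLink(string $href): bool
{
    // Look only at the path, so query strings and fragments are ignored.
    $path = (string) parse_url($href, PHP_URL_PATH);
    $ext  = strtolower(pathinfo($path, PATHINFO_EXTENSION));
    return in_array($ext, ['jpg', 'jpeg', 'png', 'gif', 'webp', 'bmp'], true);
}
```

With the loop above changed to 'a'/'href', something like array_filter($urls, 'isImageLink') would keep only the links that look like images.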

You did not say what your criterion for image size is, so I can't help you there. Do you want the maximum file size or the maximum resolution?


As may already be obvious by now: use a DOM parser, not a regex. Get all <img> elements by tag name, then read each one's URL from its src attribute. To determine an image's file size without downloading the entire image, you can send an HTTP HEAD request instead and read the Content-Length header of the response. The http_head() function (from the pecl_http extension) may be useful for this.
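As an alternative to http_head(), the same HEAD request can be made with PHP's cURL extension. The following is a minimal sketch under a few assumptions: the function names remoteFileSize and pickLargestBySize are hypothetical, the cURL extension is installed, and the server actually reports a Content-Length for HEAD requests (not all do, in which case the size is treated as unknown).

```php
<?php
// Hypothetical helper: ask the server for the file size without downloading the body.
function remoteFileSize(string $url): ?int
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_NOBODY, true);          // send HEAD instead of GET
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);  // do not echo the response
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);  // follow redirects
    if (curl_exec($ch) === false) {
        return null; // request failed
    }
    // cURL reports -1 when the server sent no Content-Length header.
    $length = curl_getinfo($ch, CURLINFO_CONTENT_LENGTH_DOWNLOAD);
    return $length >= 0 ? (int) $length : null;
}

// Given a map of URL => size in bytes (or null if unknown), return the URL
// with the largest known size, or null if no size is known.
function pickLargestBySize(array $sizesByUrl): ?string
{
    $best     = null;
    $bestSize = -1;
    foreach ($sizesByUrl as $url => $size) {
        if ($size !== null && $size > $bestSize) {
            $bestSize = $size;
            $best     = (string) $url;
        }
    }
    return $best;
}
```

A caller would build the map with remoteFileSize() for each absolute image URL and then pass it to pickLargestBySize(); images whose size is unknown could be downloaded as a fallback.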
