Fast Remote PHP Technique To Detect Image 404
What PHP script technique runs the fastest in detecting if a remote image does not exist before I include the image? I mean, I don't want to download all the bytes of the remote image -- just enough to detect if it exists.
And while on the subject but with just a slight deviation, I'd like to download just enough bytes to determine a JPEG's width and height information.
Speed is very important for the system I'm designing.
I've modified @Volomike's code to return the width too. Here you go...
function get_image_dim($sURL) {
    // note that for jpeg you may need to change 300 to a larger value,
    // as some height/width info is farther out in the header
    try {
        $hSock = @ fopen($sURL, 'rb');
        if ($hSock) {
            while(!feof($hSock)) {
                $vData = fread($hSock, 300);
                break;
            }
            fclose($hSock);
            if (strpos(' ' . $vData, 'JFIF')>0) {
                $vData = substr($vData, 0, 300);
                $asResult = unpack('H*',$vData);
                $sBytes = $asResult[1];
                $width = 0;
                $height = 0;
                $hex_width = '';
                $hex_height = '';
                if (strstr($sBytes, 'ffc2')) {
                    $hex_height = substr($sBytes, strpos($sBytes, 'ffc2') + 10, 4);
                    $hex_width = substr($sBytes, strpos($sBytes, 'ffc2') + 14, 4);
                } else {
                    $hex_height = substr($sBytes, strpos($sBytes, 'ffc0') + 10, 4);
                    $hex_width = substr($sBytes, strpos($sBytes, 'ffc0') + 14, 4);
                }
                $width = hexdec($hex_width);
                $height = hexdec($hex_height);
                return array('width' => $width, 'height' => $height);
            } elseif (strpos(' ' . $vData, 'GIF')>0) {
                $vData = substr($vData, 0, 300);
                $asResult = unpack('h*',$vData);
                $sBytes = $asResult[1];
                $sBytesH = substr($sBytes, 16, 4);
                $height = hexdec(strrev($sBytesH));
                $sBytesW = substr($sBytes, 12, 4);
                $width = hexdec(strrev($sBytesW));
                return array('width' => $width, 'height' => $height);
            } elseif (strpos(' ' . $vData, 'PNG')>0) {
                $vDataH = substr($vData, 22, 4);
                $asResult = unpack('n',$vDataH);
                $height = $asResult[1];
                $vDataW = substr($vData, 18, 4);
                $asResult = unpack('n',$vDataW);
                $width = $asResult[1];
                return array('width' => $width, 'height' => $height);
            }
        }
    } catch (Exception $e) {}
    return FALSE;
}
So, using it we have...
// jpeg
$url = 'http://upload.wikimedia.org/wikipedia/commons/thumb/c/ce/Quality_comparison_jpg_vs_saveforweb.jpg/250px-Quality_comparison_jpg_vs_saveforweb.jpg';
// png
//$url = 'http://upload.wikimedia.org/wikipedia/commons/thumb/4/47/PNG_transparency_demonstration_1.png/280px-PNG_transparency_demonstration_1.png';
// gif
//$url = 'http://upload.wikimedia.org/wikipedia/commons/e/e2/Sunflower_as_gif_small.gif';
$dim = get_image_dim($url);
print_r($dim);
Run a cURL request that does a HEAD instead of a full GET.
I didn't test this, but hopefully you'll get the idea:
<?php
$url = 'http://www.example.com/image.gif';
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_NOBODY, true); // this is what sets it as HEAD request
curl_exec($ch);
if (curl_getinfo($ch, CURLINFO_HTTP_CODE) === 200) { // 200 = OK; the code is returned as an integer
    // image exists ..
}
curl_close($ch);
?>
See the cURL documentation for more information about cURL.
You should be able to determine a JPEG's dimensions without loading up its entire contents. For baseline JPEGs, that is, non-progressive-scan JPEGs, scan in bytes until you come across 0xFFC0. Skip the next three bytes. The next two bytes indicate the height. They are followed by two more bytes that indicate the width.
For example, in "FF C0 00 11 08 01 DE 02 D0", 01DE represents a height of 478 and 02D0 represents a width of 720.
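The marker layout described above can be sketched in PHP as follows. This is a minimal sketch, not code from the answer: jpeg_dimensions_from_bytes() is a hypothetical helper name, and it assumes the bytes passed in start at the beginning of the file and reach as far as the frame header. It walks the segment markers instead of searching byte by byte, so an FF C0 pair inside another segment's payload is not mistaken for the frame header, and it also accepts 0xFFC2 (progressive), where the dimension fields sit at the same offsets.

```php
<?php
// Hypothetical helper (not from the original answer): read height and width
// from the leading bytes of a JPEG by walking its segment markers.
function jpeg_dimensions_from_bytes($sData)
{
    $nLen = strlen($sData);
    // Every JPEG starts with the SOI marker FF D8.
    if ($nLen < 4 || substr($sData, 0, 2) !== "\xFF\xD8") {
        return false;
    }
    $nPos = 2;
    while ($nPos + 4 <= $nLen) {
        if ($sData[$nPos] !== "\xFF") {
            return false; // not positioned on a marker; give up
        }
        $nMarker = ord($sData[$nPos + 1]);
        if ($nMarker === 0xFF) {
            $nPos++; // fill byte before the real marker
            continue;
        }
        // SOF0 (baseline) or SOF2 (progressive) frame header
        if ($nMarker === 0xC0 || $nMarker === 0xC2) {
            if ($nPos + 9 > $nLen) {
                return false; // header is cut off; more bytes are needed
            }
            // After the marker: length (2), precision (1), height (2), width (2)
            $aDim = unpack('nheight/nwidth', substr($sData, $nPos + 5, 4));
            return array('width' => $aDim['width'], 'height' => $aDim['height']);
        }
        // Any other segment: skip it. The 2-byte big-endian length counts itself.
        $aLen = unpack('n', substr($sData, $nPos + 2, 2));
        $nPos += 2 + $aLen[1];
    }
    return false; // no frame header within the bytes supplied
}
```

Fed the example bytes above behind an FF D8 start marker, this should report a height of 478 and a width of 720. A return value of false means either that the data is not a JPEG or that more bytes need to be downloaded.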
I'd send a GET request that contains a RANGE header to limit the actual data transfer where possible (the remote server might not honour the RANGE request but it's still worth a try). It probably doesn't make much difference whether you use sockets (directly) or curl to make the requests. But... you never know without benchmarks. For curl take a look at the "CURLOPT_RANGE" option at http://docs.php.net/function.curl-setopt
It probably doesn't fit your profile ("several an hour, on a server with only slim CPU power available.") but you might want to try handling multiple URLs at a time, i.e. having multiple active connections and only handling those that won't block on a read operation. If the limiting factor is mostly/only CPU power ...forget this part.
sockets: take a look at stream_select()
curl: see curl_multi_exec()
If the curl module is unavailable you can also use the http url wrapper in combination with stream_context_create() to send a request containing a RANGE header.
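A minimal sketch of the http wrapper approach, assuming the first 300 bytes are enough. The helper names make_range_context() and fetch_leading_bytes() and the 5-second timeout are made up for illustration. Because the server may ignore the Range header, the read itself is capped as well.

```php
<?php
// Hypothetical helper: build an HTTP stream context that asks the server
// for only the first $nBytes bytes. The server is free to ignore Range.
function make_range_context($nBytes, $nTimeout = 5)
{
    return stream_context_create(array(
        'http' => array(
            'method'  => 'GET',
            'header'  => 'Range: bytes=0-' . ($nBytes - 1) . "\r\n",
            'timeout' => $nTimeout, // seconds; an assumed value
        ),
    ));
}

// Hypothetical helper: return at most $nBytes from the start of $sURL,
// or false if the resource cannot be opened (for example on a 404).
function fetch_leading_bytes($sURL, $nBytes = 300)
{
    $hSock = @fopen($sURL, 'rb', false, make_range_context($nBytes));
    if (!$hSock) {
        return false;
    }
    // Stop reading after $nBytes even if the server sent the whole file.
    $sData = stream_get_contents($hSock, $nBytes);
    fclose($hSock);
    return $sData;
}
```

With the default settings of the http wrapper, fopen() fails on a 4xx or 5xx response, so a false return value doubles as the existence check.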
Looks like you've already figured out what to do with the data once you've received it.
I think the following routine will retrieve just the image height for JPEG, GIF, and PNG files, or return FALSE (check with ===) on a 404 or an unsupported image type. It also uses the fewest server resources: file_get_contents() appears to download the whole file even with a byte limit added, and so does getimagesize(). You can see the performance hit of those compared to this.
The routine downloads just the first 300 bytes of the file. Unfortunately, JPEG pushes its height value much farther into the file than GIF or PNG do, so I had to read that many bytes. It then scans those bytes for JFIF, PNG, or GIF to identify the file type, and uses a type-specific routine to parse the header. Note that JPEG must first use unpack() with H* (hex string, high nibble first) and then scan for ffc2 or ffc0. GIF, however, must use unpack() with h* (low nibble first), which is a big difference.
I created this function by trial and error, and it could be wrong. I ran it on several images and it appears to work well. If you find a fault in it, consider letting me know.
Anyway, this lets me determine an image's height, discard the image if it is too tall, and find another. For whatever random image I find, I set the width in the HTML IMG tag and the browser scales the height automatically, but the result looks good only if the image is under a certain height. The routine also doubles as a 404 check: it tells me whether the URL another server gave me points to an image that no longer exists or to a host that prohibits cross-site linking. Since I set the images to a fixed width manually, I don't need to read the image width. You can adapt this function to find the width too; it is usually just a few bytes away from the height.
function getImageHeight($sURL) {
    try {
        $hSock = @ fopen($sURL, 'rb');
        if ($hSock) {
            while(!feof($hSock)) {
                $vData = fread($hSock, 300);
                break;
            }
            fclose($hSock);
            if (strpos(' ' . $vData, 'JFIF')>0) {
                $vData = substr($vData, 0, 300);
                $asResult = unpack('H*',$vData);
                $sBytes = $asResult[1];
                if (strstr($sBytes, 'ffc2')) {
                    $sBytes = substr($sBytes, strpos($sBytes, 'ffc2') + 10, 4);
                } else {
                    $sBytes = substr($sBytes, strpos($sBytes, 'ffc0') + 10, 4);
                }
                return hexdec($sBytes);
            } elseif (strpos(' ' . $vData, 'GIF')>0) {
                $vData = substr($vData, 0, 300);
                $asResult = unpack('h*',$vData);
                $sBytes = $asResult[1];
                $sBytes = substr($sBytes, 16, 4);
                $sBytes = strrev($sBytes);
                return hexdec($sBytes);
            } elseif (strpos(' ' . $vData, 'PNG')>0) {
                $vData = substr($vData, 22, 4);
                $asResult = unpack('n',$vData);
                $nHeight = $asResult[1];
                return $nHeight;
            }
        }
    } catch (Exception $e) {}
    return FALSE;
}
Store the images locally. That's a very simple and guaranteed solution.