What is the best way to check if a URL exists in PHP?
What is the best way to check that a URL exists and the response is not a 404?
You can use get_headers($url)
Example 2 from the manual:
<?php
// By default get_headers uses a GET request to fetch the headers. If you
// want to send a HEAD request instead, you can do so using a stream context:
stream_context_set_default(
    array(
        'http' => array(
            'method' => 'HEAD'
        )
    )
);
// Passing true as the second argument returns the headers as an associative array
print_r(get_headers('http://example.com', true));
// gives
Array
(
    [0] => HTTP/1.1 200 OK
    [Date] => Sat, 29 May 2004 12:28:14 GMT
    [Server] => Apache/1.3.27 (Unix) (Red-Hat/Linux)
    [Last-Modified] => Wed, 08 Jan 2003 23:11:55 GMT
    [ETag] => "3f80f-1b6-3e1cb03b"
    [Accept-Ranges] => bytes
    [Content-Length] => 438
    [Connection] => close
    [Content-Type] => text/html
)
The first array element contains the HTTP status line (here, HTTP/1.1 200 OK). You have to parse the status code out of it.
Note that the get_headers function in the example will issue an HTTP HEAD request, which means it will not fetch the body of the URL. This is more efficient than using a GET request, which will also return the body.
Also note that once a default context is set, any subsequent calls that use an HTTP stream context will issue HEAD requests as well. So make sure to reset the default context to use GET again when done.
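To avoid touching the default context at all, a context can be passed to a single call instead (get_headers() accepts one as its third argument since PHP 7.1). A minimal sketch of that idea; the function names statusCodeFromLine() and headStatus() are made up for illustration and are not part of PHP:

```php
<?php
// Hypothetical helper: pull the numeric code out of a status line such as
// "HTTP/1.1 404 Not Found". Returns null if the line is not a status line.
function statusCodeFromLine(string $line): ?int
{
    if (preg_match('#^HTTP/\d+(?:\.\d+)?\s+(\d{3})\b#', $line, $m)) {
        return (int) $m[1];
    }
    return null;
}

// Hypothetical helper: send a HEAD request with a context that applies to
// this call only, so the default context (and later GET requests) stay untouched.
function headStatus(string $url): ?int
{
    $context = stream_context_create(array(
        'http' => array('method' => 'HEAD'),
    ));
    $headers = @get_headers($url, false, $context);
    if ($headers === false) {
        return null; // the request itself failed (DNS error, refused connection, ...)
    }
    // get_headers() follows redirects and lists every response in order,
    // so the last status line found belongs to the final response.
    $code = null;
    foreach ($headers as $line) {
        $code = statusCodeFromLine($line) ?? $code;
    }
    return $code;
}
```

A check would then read something like headStatus($url) === 200, with null meaning that no response was received at all.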
PHP also provides the variable $http_response_header:
The $http_response_header array is similar to the get_headers() function. When using the HTTP wrapper, $http_response_header will be populated with the HTTP response headers. $http_response_header will be created in the local scope.
If you want to download the content of a remote resource, you don't want to do two requests (one to see if the resource exists and one to fetch it), but just one. In that case, use something like file_get_contents to fetch the content and then inspect the headers from the variable.
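The single-request approach could look roughly like this. It is a sketch under a few assumptions: the names lastStatusCode() and fetchIfExists() are invented here, the ignore_errors context option is set so that a 404 still yields a body and the decision is made from the status code, and any 2xx status is treated as "exists".

```php
<?php
// Hypothetical helper: return the status code of the last response in a list
// of raw header lines (each redirect hop adds its own status line).
function lastStatusCode(array $headerLines): ?int
{
    $code = null;
    foreach ($headerLines as $line) {
        if (preg_match('#^HTTP/\d+(?:\.\d+)?\s+(\d{3})\b#', $line, $m)) {
            $code = (int) $m[1];
        }
    }
    return $code;
}

// Hypothetical helper: fetch the body with a single request and return it
// only if the final status is 2xx; otherwise return null.
function fetchIfExists(string $url): ?string
{
    // ignore_errors makes the HTTP wrapper return the body even for 4xx/5xx
    // responses, so the status code below decides the outcome.
    $context = stream_context_create(array(
        'http' => array('ignore_errors' => true),
    ));
    $body = @file_get_contents($url, false, $context);
    if ($body === false) {
        return null; // nothing could be read at all
    }
    // The HTTP wrapper creates $http_response_header in this function's scope.
    $lines = isset($http_response_header) ? $http_response_header : array();
    $code = lastStatusCode($lines);
    return ($code !== null && $code >= 200 && $code < 300) ? $body : null;
}
```

Calling fetchIfExists($url) then gives either the content or null, without a separate existence check beforehand.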
@Gordon - Here is a more complete library routine based on your answer. It includes some preliminary checking for URL validity, some more error handling, and parsing of the returned headers. It also follows any redirect chains for a reasonable number of steps.
class cLib {
    static $lasterror = 'No error set yet';
    /**
     * @brief See whether a URL is valid - i.e. a page can be successfully retrieved from it without error
     * @param string $url The URL to be checked
     * @param int $nredirects The number of redirects checked so far
     * @return boolean True if OK, false if the URL cannot be fetched
     */
    static function checkUrl($url, $nredirects = 0) {
        // First, see if the URL is sensible
        if (filter_var($url, FILTER_VALIDATE_URL) === false) {
            self::$lasterror = sprintf('URL "%s" did not validate', $url);
            return false;
        }
        // Now try to fetch it
        $headers = @get_headers($url);
        if ($headers == false) {
            $error = error_get_last();
            self::$lasterror = sprintf('URL "%s" could not be read: %s', $url, $error ? $error['message'] : 'unknown error');
            return false;
        }
        $status = $headers[0];
        $rbits = explode(' ', $status);
        if (count($rbits) < 2) {
            self::$lasterror = sprintf('Cannot parse status "%s" from URL "%s"', $status, $url);
            return false;
        }
        // 303 (See Other) is a redirect; 304 (Not Modified) carries no Location header, so it is not listed
        if (in_array($rbits[1], array(301, 302, 303, 307, 308))) {
            // This URL has been redirected. Follow the redirection chain
            foreach ($headers as $header) {
                // Header names are case-insensitive, so compare without regard to case
                if (stripos($header, 'Location:') === 0) {
                    if (++$nredirects > 10) {
                        self::$lasterror = sprintf('URL "%s" was redirected over 10 times: abandoned check', $url);
                        return false;
                    }
                    return self::checkUrl(trim(substr($header, strlen('Location:'))), $nredirects);
                }
            }
            self::$lasterror = sprintf('URL "%s" was redirected but location could not be identified', $url);
            return false;
        }
        if ($rbits[1] != 200) {
            self::$lasterror = sprintf('URL "%s" returned status "%s"', $url, $status);
            return false;
        }
        return true;
    }
}
With apologies to @FranciscoLuz - if you're expecting errors based on user input, the "@ and error_get_last" method seems perfectly sensible to me - I don't see that there's anything more proper about using set_error_handler.
BTW, not sure if I should have done this as an edit to @Gordon's answer rather than as a separate answer. Can someone advise?
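// Method of a class; drop "public" to use it as a standalone function.
public function isLink($url)
{
    $result = false;
    if (filter_var($url, FILTER_VALIDATE_URL) !== false) {
        $getHeaders = get_headers($url);
        // get_headers() returns false on failure, so check before reading the status line
        if (!empty($getHeaders)) {
            $result = strpos($getHeaders[0], '200') !== false;
        }
    }
    return $result;
}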
I use this function because it validates the URL and also prepends the protocol if it is missing. Note that it checks only the syntax of the URL, not whether the URL actually responds.
$theUrl = 'google.com';
function isValidURL($url) {
    $urlRegex = '@(http(s)?)?(://)?(([a-zA-Z])([-\w]+\.)+([^\s\.]+[^\s]*)+[^,.\s])@';
    if (preg_match($urlRegex, $url)) {
        return preg_replace($urlRegex, "http$2://$4", $url);
    } else {
        return false;
    }
}
var_dump(isValidURL($theUrl));
Here is a script I wrote to determine whether a URL actually exists. It could be improved by analyzing the error results in more detail. For now it takes a simple approach and treats a URL as nonexistent only when cURL reports "could not resolve host" or a 404.
function URL_EXIST($pUrl)
{
    $etat = true;
    $ch = curl_init($pUrl);
    curl_setopt($ch, CURLOPT_FAILONERROR, true);
    curl_setopt($ch, CURLOPT_NOBODY, true);
    if (curl_exec($ch) === false)
    {
        $mes = strtolower(curl_error($ch));
        $cdt_wrong = preg_match('#could not resolve host#', $mes);
        // Match the status code only; the reason phrase ("not found") is not always part of the message
        $cdt_wrong |= preg_match('#returned error: 404\b#', $mes);
        if ($cdt_wrong == true)
        {
            $etat = false;
        }
    }
    curl_close($ch);
    return $etat;
}
It works well on the examples I tried.
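One way to analyze the result more finely, as suggested above, might be to look at the numeric error and status codes rather than the error text. A sketch, assuming the cURL extension is loaded; the names isMissing() and curlUrlExists() are invented for illustration, the 10-second timeout is an arbitrary choice, and only "could not resolve host" and 404 count as missing, as in the function above:

```php
<?php
// Hypothetical helper: decide from cURL's error number and the HTTP status
// code whether the URL should be reported as missing.
function isMissing(int $errno, int $httpCode): bool
{
    // 6 is the value of CURLE_COULDNT_RESOLVE_HOST; it is written as a literal
    // so this helper also runs where the cURL extension is not loaded.
    return $errno === 6 || $httpCode === 404;
}

// Hypothetical wrapper around the request itself.
function curlUrlExists(string $url): bool
{
    $ch = curl_init($url);
    if ($ch === false) {
        return false; // the handle could not be created
    }
    curl_setopt($ch, CURLOPT_NOBODY, true);  // HEAD request, no body is transferred
    curl_setopt($ch, CURLOPT_TIMEOUT, 10);   // assumed limit so the check cannot hang
    curl_exec($ch);
    $errno = curl_errno($ch);
    // The status code is 0 when no HTTP response was received.
    $httpCode = (int) curl_getinfo($ch, CURLINFO_HTTP_CODE);
    return !isMissing($errno, $httpCode);
}
```

Other failures, such as timeouts or 5xx responses, are still reported as "exists" here, which mirrors the original function; whether that is the right call depends on the use case.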