Is this site blocking/ignoring my HTTP requests?
I'm only able to fetch from a site when I use cURL with a proxy. cURL without a proxy and file_get_contents()
return nothing开发者_如何转开发 (cURL HTTP code "0" and curl_error()
Empty reply from server
). I'm able to fetch other sites just fine without a proxy.
Aside from being blocked, is there any other possible explanation of why I can only access this site via proxy?
Did you set a USER AGENT in cURL? Sometimes websites will block you if your USER AGENT isn't set or if your HTTP request looks suspicious.
To set your USER AGENT in PHP:
curl_setopt($curl, CURLOPT_USERAGENT, "Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)");
Is this from your workplace or something? Many companies disable file_get_contents()
on shared PHP installs, as it's quite risky.
The site probably has user agent detection. You can fake that in your curl call but I don't believe that's possible with file_get_contents()
. Another method sites use is to only display content once a cookie has been set so site scrapers will never see the data.
Try this:
function curl_scrape($url,$data,$proxy,$proxystatus)
{
$fp = fopen("cookie.txt", "w");
fclose($fp);
$ch = curl_init();
curl_setopt($ch, CURLOPT_COOKIEJAR, "cookie.txt");
curl_setopt($ch, CURLOPT_COOKIEFILE, "cookie.txt");
curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)");
curl_setopt($ch, CURLOPT_TIMEOUT, 40);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
if ($proxystatus == 'on')
{
curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, FALSE);
curl_setopt($ch, CURLOPT_HTTPPROXYTUNNEL, TRUE);
curl_setopt($ch, CURLOPT_PROXY, $proxy);
}
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_HEADER, TRUE);
curl_setopt($ch, CURLOPT_USERAGENT, $_SERVER['HTTP_USER_AGENT']);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, TRUE);
curl_setopt($ch, CURLOPT_POST, TRUE);
curl_setopt($ch, CURLOPT_POSTFIELDS, $data);
ob_start(); // prevent any output
return curl_exec ($ch); // execute the curl command
ob_end_clean(); // stop preventing output
curl_close ($ch);
unset($ch);
}
I'm guessing I was truly blocked. Using proxy now and it works fine.
精彩评论