Page returns 404 to PHP but is viewable in browser

If you visit this page in the browser, http://www.x-rates.com/d/TRY/table.html, you can see that it works fine, but when I try $doc = new DOMDocument(); $doc->loadHTMLFile('http://www.x-rates.com/d/TRY/table.html'); it returns a 404. I have also tried file_get_contents() and passing the HTML to DOMDocument that way, but no luck. Any help gratefully received.


404 appears to be the actual response code the server sends for that URL:

$ curl -I http://www.x-rates.com/d/TRY/table.html
HTTP/1.1 404 Not Found
Date: Mon, 01 Aug 2011 12:23:49 GMT
Server: Apache/2.2.19
Content-Type: text/html

You can still fetch the HTTP response body and load it into DOMDocument as a string.

This can be done with file_get_contents() by setting the ignore_errors HTTP context option. Example code:

$url = 'http://www.x-rates.com/d/TRY/table.html';

// Create a stream context that returns the body even on HTTP error statuses
$opts = array(
  'http' => array(
    'ignore_errors' => true,
  )
);

$context = stream_context_create($opts);

// Fetch the page using the context options set above
$file = file_get_contents($url, false, $context);

$doc = new DOMDocument();
$doc->loadHTML($file);
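
Because the server answers 404 while still sending a usable body, it may help to record the status code alongside the HTML instead of ignoring it. The following is only a sketch: it assumes the http:// wrapper fills $http_response_header in the scope that calls file_get_contents(), and the helper names statusCodeFromHeaders and fetchWithStatus are hypothetical, not part of PHP.

```php
<?php
// Hypothetical helper: pull the numeric status code out of raw header lines.
function statusCodeFromHeaders(array $headers): ?int
{
    $status = null;
    foreach ($headers as $line) {
        // After redirects several status lines can appear; keep the last one.
        if (preg_match('#^HTTP/\d+(?:\.\d+)?\s+(\d{3})#', $line, $m)) {
            $status = (int) $m[1];
        }
    }
    return $status;
}

// Hypothetical helper: fetch a URL and return both the status and the body,
// even when the server reports an error status such as 404.
function fetchWithStatus(string $url): array
{
    $context = stream_context_create(array(
        'http' => array('ignore_errors' => true),
    ));
    $body = file_get_contents($url, false, $context);

    // Assumption: the http wrapper populated $http_response_header here.
    $headers = isset($http_response_header) ? $http_response_header : array();

    return array(
        'status' => statusCodeFromHeaders($headers),
        'body'   => $body,
    );
}
```

A call such as $result = fetchWithStatus($url); would then let you check $result['status'] and confirm that $result['body'] is a non-empty string before handing it to $doc->loadHTML().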


The page is returning a 404, and I believe it is doing this deliberately to make it harder to scrape it. I found this on their site:

Fetching data with tools such as PHP, LWP, Java and Microsoft controls for example are not permitted

You might want to double-check that you are actually allowed to do what you are doing. I'm concerned you may be infringing copyright.
