Saving files of unknown type with cURL in PHP 5.3.x

I'm trying to archive a web-based forum that contains attachments posted by its users. So far, I have used PHP's cURL library to fetch the individual topics and save the raw pages. Now I need a way to archive the attachments hosted on the site as well.

Here is the problem: since the file types vary, I need a way to save each file with the correct extension. Note that I plan to rename each file when I save it, so that the archive is organized and the files are easy to find later.

The links to the attached files in a page have this format:

<a href="https://example.com/get_file?fileId=4342343212223">some file.txt</a>

I've already used preg_match() to get the URLs of the attached files. My main problem now is making sure each fetched file is saved with the correct extension.

My question: is there an efficient way to determine the file type? I'd rather not use a regular expression, but I don't see any other way.


Does the server send the correct Content-Type header when serving the files? If so, you can read it by setting CURLOPT_HEADER, or by using file_get_contents() and inspecting $http_response_header.

http://www.php.net/manual/en/reserved.variables.httpresponseheader.php
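As a rough sketch of the header approach, the functions below look for a file name in Content-Disposition first and fall back to Content-Type. They assume the forum actually sends one of these headers; the helper names (header_value, extension_from_headers) and the short MIME-to-extension map are hypothetical, not part of any PHP API. The syntax sticks to PHP 5.3 (no short array syntax).

```php
<?php
// Sketch: guess a file extension from HTTP response header lines, such as
// the $http_response_header array filled in by file_get_contents().
// Assumption: the server sends Content-Disposition and/or Content-Type.
// The MIME-to-extension map is a hypothetical, deliberately short example.

// Return the value of the last header called $name (case-insensitive),
// so that after redirects the final response wins.
function header_value(array $lines, $name)
{
    $value = null;
    foreach ($lines as $line) {
        $parts = explode(':', $line, 2); // status lines have no colon and are skipped
        if (count($parts) === 2 && strcasecmp(trim($parts[0]), $name) === 0) {
            $value = trim($parts[1]);
        }
    }
    return $value;
}

function extension_from_headers(array $lines)
{
    // 1. Prefer a file name suggested by the server, e.g.
    //    Content-Disposition: attachment; filename="some file.txt"
    //    (simple parsing: quoted names containing ';' are not handled).
    $disposition = header_value($lines, 'Content-Disposition');
    if ($disposition !== null) {
        foreach (explode(';', $disposition) as $param) {
            $kv = explode('=', $param, 2);
            if (count($kv) === 2 && strtolower(trim($kv[0])) === 'filename') {
                $ext = pathinfo(trim($kv[1], " \t\""), PATHINFO_EXTENSION);
                if ($ext !== '') {
                    return strtolower($ext);
                }
            }
        }
    }

    // 2. Fall back to the MIME type, dropping parameters like "; charset=UTF-8".
    $type = header_value($lines, 'Content-Type');
    if ($type === null) {
        return null;
    }
    $parts = explode(';', $type);
    $mime = strtolower(trim($parts[0]));
    $map = array(
        'text/plain'      => 'txt',
        'application/pdf' => 'pdf',
        'application/zip' => 'zip',
        'image/jpeg'      => 'jpg',
        'image/png'       => 'png',
        'image/gif'       => 'gif',
    );
    return isset($map[$mime]) ? $map[$mime] : null;
}

// Usage (needs network access; URL format from the question):
//   $body = file_get_contents('https://example.com/get_file?fileId=4342343212223');
//   $ext  = extension_from_headers($http_response_header);
// With cURL and CURLOPT_HEADER set to true, split the raw response first:
//   $raw   = curl_exec($ch);
//   $size  = curl_getinfo($ch, CURLINFO_HEADER_SIZE);
//   $lines = explode("\r\n", substr($raw, 0, $size));
//   $body  = substr($raw, $size);
```

Both paths hand the same array of header lines to extension_from_headers(). If only the MIME type is needed, curl_getinfo($ch, CURLINFO_CONTENT_TYPE) returns the Content-Type value without any splitting. When the function returns null (for example with a generic application/octet-stream), inspecting the file contents is a reasonable fallback.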


I would look into the Fileinfo extension:

http://www.php.net/manual/en/book.fileinfo.php

to see whether you can detect the file type automatically once you have downloaded the file.
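A minimal sketch of the Fileinfo approach, assuming the extension is loaded (it is enabled by default as of PHP 5.3.0). The downloaded bytes are identified by their content alone, so this also works when the server sends a generic type such as application/octet-stream. The function name extension_from_content and the MIME-to-extension map are hypothetical.

```php
<?php
// Sketch: identify a downloaded attachment from its bytes with Fileinfo,
// ignoring whatever name or headers the forum supplied.
// The MIME-to-extension map is a hypothetical, deliberately short example.
function extension_from_content($bytes)
{
    $finfo = new finfo(FILEINFO_MIME_TYPE); // report only the type, e.g. "application/pdf"
    $mime = $finfo->buffer($bytes);
    if (!is_string($mime)) {
        return null; // detection failed
    }
    $map = array(
        'text/plain'      => 'txt',
        'application/pdf' => 'pdf',
        'application/zip' => 'zip',
        'image/jpeg'      => 'jpg',
        'image/png'       => 'png',
        'image/gif'       => 'gif',
    );
    return isset($map[$mime]) ? $map[$mime] : null;
}

// Usage: $ext = extension_from_content($body);
// For a file already saved to disk, $finfo->file($path) works the same way.
```

Container formats are the weak spot: a .docx file, for example, may be reported as a ZIP archive depending on the libmagic version, so the extension in the link text is still a useful hint.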


You can use DOMDocument and DOMXPath to extract the URLs and file names more safely than with a regular expression.

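libxml_use_internal_errors(true); // forum HTML is rarely valid; silence parser warnings
$doc = new DOMDocument();
$doc->loadHTML($content);         // $content holds the raw topic page
$xpath = new DOMXPath($doc);
// Query examples:
// the link texts, e.g. "some file.txt"
foreach ($xpath->query('//a') as $node)
    echo $node->nodeValue, "\n";
// the link targets, e.g. "https://example.com/get_file?fileId=4342343212223"
foreach ($xpath->query('//a/@href') as $node)
    echo $node->nodeValue, "\n";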
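Building on the queries above, this sketch pairs each attachment link's text with its fileId, read with parse_url() and parse_str() rather than a hand-written pattern. It assumes, from the link format shown in the question, that attachment links contain get_file and carry a fileId query parameter; the function name attachment_links is hypothetical.

```php
<?php
// Sketch: collect attachment links from a topic page.
// Assumption: attachment URLs look like .../get_file?fileId=NNN
function attachment_links($html)
{
    if (trim($html) === '') {
        return array(); // loadHTML() rejects empty input
    }
    $previous = libxml_use_internal_errors(true); // tolerate invalid forum HTML
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $result = array();
    foreach ($xpath->query('//a[contains(@href, "get_file")]') as $a) {
        $href = $a->getAttribute('href');
        // Read fileId from the query string without a regular expression.
        $params = array();
        parse_str((string) parse_url($href, PHP_URL_QUERY), $params);
        if (!isset($params['fileId'])) {
            continue;
        }
        $name = trim($a->nodeValue); // link text, e.g. "some file.txt"
        $result[] = array(
            'id'   => $params['fileId'],
            'url'  => $href,
            'name' => $name,
            'ext'  => strtolower(pathinfo($name, PATHINFO_EXTENSION)),
        );
    }
    return $result;
}
```

The ext value only reflects what the poster named the file, so it can be cross-checked against the response headers or the file contents before saving.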