PHP cURL: CURLOPT_HEADER and DOM
I have the following code:
curl_setopt($ch, CURLOPT_URL, $host);
curl_setopt($ch, CURLOPT_HEADER, 1);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_USERAGENT, $user_agent);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, 0);
curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, 0);
$html = curl_exec($ch);
preg_match_all('|Set-Cookie: (.*);|U', $html, $results);
$cookies = implode(';', $results[1]);
$dom = new DOMDocument();
$dom->loadHTML($html);
On the line $dom->loadHTML($html); I am getting the following warnings:
Warning: DOMDocument::loadHTML() [function.DOMDocument-loadHTML]: Misplaced DOCTYPE declaration in Entity, line: 12 in D:\Programs\xampp\xampp\htdocs\ip\megafonmoscow.php on line 39
Warning: DOMDocument::loadHTML() [function.DOMDocument-loadHTML]: htmlParseStartTag: misplaced tag in Entity, line: 13 in D:\Programs\xampp\xampp\htdocs\ip\megafonmoscow.php on line 39
Warning: DOMDocument::loadHTML() [function.DOMDocument-loadHTML]: htmlParseStartTag: misplaced tag in Entity, line: 14 in D:\Programs\xampp\xampp\htdocs\ip\megafonmoscow.php on line 39
Warning: DOMDocument::loadHTML() [function.DOMDocument-loadHTML]: Unexpected end tag : head in Entity, line: 32 in D:\Programs\xampp\xampp\htdocs\ip\megafonmoscow.php on line 39
Warning: DOMDocument::loadHTML() [function.DOMDocument-loadHTML]: htmlParseStartTag: misplaced tag in Entity, line: 34 in D:\Programs\xampp\xampp\htdocs\ip\megafonmoscow.php on line 39
Is the line curl_setopt($ch, CURLOPT_HEADER, 1); the cause of this error? I need it because of the cookies. Any ideas on how to solve this?
An alternative to mck89's approach is to download the headers and body together, but to split them apart before you try to parse the HTML:
$html = curl_exec($ch);
[snip]
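$html = preg_replace('/^.*?\r?\n\r?\n/s', '', $html, 1); // strip everything up to and including the first blank line between headers and body (HTTP uses \r\n line endings; the match must be non-greedy or it would also eat blank lines inside the body)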
$dom = new DOMDocument();
$dom->loadHTML($html);
This saves an HTTP request and therefore a certain amount of time.
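As a variant of the split above that avoids a regular expression, cURL can report how many bytes of the response are headers via curl_getinfo($ch, CURLINFO_HEADER_SIZE). The sketch below assumes PHP 7.1 or later; split_response() and extract_cookies() are hypothetical helper names, and a hard-coded sample response stands in for the result of curl_exec($ch) so that the example runs without a network connection.

```php
<?php
// Split a raw response (as returned by curl_exec() with CURLOPT_HEADER on)
// into its header part and its body part, given the header size in bytes.
function split_response(string $raw, int $headerSize): array
{
    return [substr($raw, 0, $headerSize), substr($raw, $headerSize)];
}

// Collect the "name=value" part of every Set-Cookie header line.
function extract_cookies(string $headers): string
{
    preg_match_all('/^Set-Cookie:\s*([^;\r\n]+)/mi', $headers, $m);
    return implode('; ', $m[1]);
}

// Sample data only: in real code $raw comes from curl_exec($ch) and
// $headerSize from curl_getinfo($ch, CURLINFO_HEADER_SIZE).
$raw = "HTTP/1.1 200 OK\r\n"
     . "Set-Cookie: sid=abc123; path=/\r\n"
     . "Set-Cookie: lang=en; path=/\r\n"
     . "\r\n"
     . "<!DOCTYPE html><html><head><title>t</title></head><body><p>hi</p></body></html>";
$headerSize = strpos($raw, "\r\n\r\n") + 4;

[$headers, $body] = split_response($raw, $headerSize);
$cookies = extract_cookies($headers);

// Only the body is handed to the parser, so the header lines can no longer
// push the DOCTYPE out of place.
$dom = new DOMDocument();
$dom->loadHTML($body);

echo $cookies, "\n";
echo $dom->getElementsByTagName('p')->item(0)->textContent, "\n";
```

Because the header size comes from cURL itself, this should also cope with responses that carry more than one header block (for example after a redirect), which a single "first blank line" match would not.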
Try removing that line so that cURL doesn't return the headers, and then use the get_headers function to fetch them after the cURL request. Note that get_headers sends a separate HTTP request of its own.
curl_setopt($ch, CURLOPT_URL, $host);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_USERAGENT, $user_agent);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, 0);
curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, 0);
$html = curl_exec($ch);
$headers=get_headers($host, 1);
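To get the cookies out of that array, something like the following might work. It is only a sketch: cookies_from_headers() is a hypothetical helper, and $sample is made-up data in the shape get_headers() returns when its second argument is 1 (status line at index 0, repeated headers collected into an array). Keep in mind that these cookies come from the second request, so they may not belong to the same session as the HTML fetched by cURL.

```php
<?php
// Build a "name=value; name=value" string from the associative array
// returned by get_headers($url, 1).
function cookies_from_headers(array $headers): string
{
    $pairs = [];
    foreach ($headers as $name => $value) {
        // Skip the numeric status-line entry and any non-cookie header.
        if (!is_string($name) || strcasecmp($name, 'Set-Cookie') !== 0) {
            continue;
        }
        // A single Set-Cookie header is a string, several are an array.
        foreach ((array) $value as $cookie) {
            $pairs[] = trim(explode(';', $cookie, 2)[0]);
        }
    }
    return implode('; ', $pairs);
}

// Made-up sample in place of get_headers($host, 1).
$sample = [
    0              => 'HTTP/1.1 200 OK',
    'Content-Type' => 'text/html',
    'Set-Cookie'   => ['sid=abc123; path=/', 'lang=en; path=/'],
];

echo cookies_from_headers($sample), "\n";
```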