PHP cURL CURLOPT_HEADER and DOM

I have the following code:

  curl_setopt($ch, CURLOPT_URL, $host);
  curl_setopt($ch, CURLOPT_HEADER, 1); 
  curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
  curl_setopt($ch, CURLOPT_USERAGENT, $user_agent);
  curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, 0);
  curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, 0);
  $html = curl_exec($ch);


  preg_match_all('|Set-Cookie: (.*);|U', $html, $results);  
  $cookies = implode(';', $results[1]);


  $dom = new DOMDocument();
  $dom->loadHTML($html);

On the line $dom->loadHTML($html); I get the following warnings:

Warning: DOMDocument::loadHTML() [function.DOMDocument-loadHTML]:
Misplaced DOCTYPE declaration in
Entity, line: 12 in
D:\Programs\xampp\xampp\htdocs\ip\megafonmoscow.php
on line 39

Warning: DOMDocument::loadHTML()
[function.DOMDocument-loadHTML]:
htmlParseStartTag: misplaced 
tag in Entity, line: 13 in
D:\Programs\xampp\xampp\htdocs\ip\megafonmoscow.php
on line 39

Warning: DOMDocument::loadHTML()
[function.DOMDocument-loadHTML]:
htmlParseStartTag: misplaced 
tag in Entity, line: 14 in
D:\Programs\xampp\xampp\htdocs\ip\megafonmoscow.php
on line 39

Warning: DOMDocument::loadHTML()
[function.DOMDocument-loadHTML]:
Unexpected end tag : head in Entity,
line: 32 in
D:\Programs\xampp\xampp\htdocs\ip\megafonmoscow.php
on line 39

Warning: DOMDocument::loadHTML()
[function.DOMDocument-loadHTML]:
htmlParseStartTag: misplaced 
tag in Entity, line: 34 in
D:\Programs\xampp\xampp\htdocs\ip\megafonmoscow.php
on line 39

Is the line curl_setopt($ch, CURLOPT_HEADER, 1); the cause of these warnings? I need it in order to read the cookies. Any ideas on how to solve this?


An alternative to mck89's approach is to download the headers and body together, then split them before parsing:

$html = curl_exec($ch);

[snip]

$html = preg_replace('/^.*?\r?\n\r?\n/s', '', $html, 1); // strip everything up to and including the first blank line between headers and body (non-greedy, and tolerant of CRLF line endings)

$dom = new DOMDocument();
$dom->loadHTML($html);

This avoids the extra HTTP request that get_headers() would make, and therefore saves some time.
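
The same split can also be done without a regex, since cURL reports the length of the header block through curl_getinfo($ch, CURLINFO_HEADER_SIZE). The following is only a sketch: the helper names splitResponse and extractCookies are hypothetical, and the canned $raw string stands in for the result of curl_exec($ch) so that the example runs without a network connection.

```php
<?php
// Hypothetical helper: cut a raw response (headers + body) at a known header size.
function splitResponse(string $raw, int $headerSize): array
{
    return [
        'headers' => substr($raw, 0, $headerSize),
        'body'    => (string) substr($raw, $headerSize),
    ];
}

// Hypothetical helper: collect the "name=value" part of every Set-Cookie header.
function extractCookies(string $headers): string
{
    preg_match_all('/^Set-Cookie:\s*([^;\r\n]+)/mi', $headers, $m);
    return implode('; ', $m[1]);
}

// Canned response (assumption: in real code this is $raw = curl_exec($ch)).
$raw = "HTTP/1.1 200 OK\r\n"
     . "Set-Cookie: sid=abc; path=/\r\n"
     . "Set-Cookie: lang=ru; HttpOnly\r\n"
     . "\r\n"
     . "<!DOCTYPE html><html><body>hi</body></html>";

// Stand-in for curl_getinfo($ch, CURLINFO_HEADER_SIZE).
$headerSize = strpos($raw, "\r\n\r\n") + 4;

$parts   = splitResponse($raw, $headerSize);
$cookies = extractCookies($parts['headers']);

echo $cookies, "\n";       // sid=abc; lang=ru
echo $parts['body'], "\n"; // only the HTML, safe to pass to DOMDocument::loadHTML()
```

With a real handle, $parts['body'] is what goes into $dom->loadHTML(). Because CURLINFO_HEADER_SIZE covers all header blocks cURL received, this should also hold up when redirects are followed, where a regex that stops at the first blank line would leave later headers in the body.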


Try removing that line so that the response no longer includes the headers, then call get_headers() after the cURL request to fetch them:

  curl_setopt($ch, CURLOPT_URL, $host);
  curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
  curl_setopt($ch, CURLOPT_USERAGENT, $user_agent);
  curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, 0);
  curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, 0);
  $html = curl_exec($ch);
  $headers=get_headers($host, 1);
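
To get the cookies out of that $headers array, something like the sketch below may help. The helper name cookiesFromHeaders is hypothetical, and the sample array is canned data shaped like the output of get_headers($host, 1) so the example runs offline. Keep in mind that get_headers() issues its own request, so the cookies it sees may differ from those set in response to the cURL request.

```php
<?php
// Hypothetical helper: build a Cookie header value from the array returned by
// get_headers($url, 1). Header names keep the server's casing, and a header
// sent several times arrives as an array, so both cases are handled.
function cookiesFromHeaders(array $headers): string
{
    $pairs = [];
    foreach ($headers as $name => $value) {
        // Skip the numeric status-line entries and all other headers.
        if (!is_string($name) || strcasecmp($name, 'Set-Cookie') !== 0) {
            continue;
        }
        foreach ((array) $value as $line) {
            // Keep only "name=value"; drop attributes such as path or expires.
            $pairs[] = trim(explode(';', $line, 2)[0]);
        }
    }
    return implode('; ', $pairs);
}

// Canned data (assumption: real code would use $headers = get_headers($host, 1)).
$headers = [
    0              => 'HTTP/1.1 200 OK',
    'Content-Type' => 'text/html',
    'Set-Cookie'   => ['sid=abc; path=/', 'lang=ru; HttpOnly'],
];

$cookies = cookiesFromHeaders($headers);
echo $cookies, "\n"; // sid=abc; lang=ru
```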