Perplexed by simple XPath bug

<?php
$response = 
'<style><div id="subhead"></div></style>';
//echo $response;

$doc = new DOMDocument();

$doc->loadHTML($response);  

$finder = new DomXPath($doc);

$term_select = $finder->query('//div[@id="subhead"]');

var_dump($term_select->item(0));

?>

The var_dump gets NULL, and I also get this Warning on line 8:

Warning: DOMDocument::loadHTML(): Unexpected end tag : div in Entity, line: 1 on line 8

Note that this is not my HTML (I'm scraping), so changing the HTML is not an option.


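The problem is that you can't have a DIV element inside a STYLE one, so when you use loadHTML(), the parser does not treat the contents of <style> as markup. If you had called $doc->saveHTML(), you would have quickly seen that the <div id="subhead"> is kept as CDATA (raw text) inside the STYLE element rather than becoming an element of its own, which is why the XPath query finds nothing.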
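To check this yourself, here is a minimal sketch that captures the parser's complaints and dumps the tree it actually built. It assumes the same $response string as in the question; the exact messages and serialized output can vary with the libxml2 version PHP was built against, so none are shown here.

```php
<?php
// Same fragment as in the question: a <div> nested inside <style>.
$response = '<style><div id="subhead"></div></style>';

// Collect libxml parse problems instead of emitting PHP warnings.
libxml_use_internal_errors(true);

$doc = new DOMDocument();
$doc->loadHTML($response);

// Print whatever the HTML parser complained about, if anything.
foreach (libxml_get_errors() as $error) {
    echo trim($error->message), ' (line ', $error->line, ")\n";
}
libxml_clear_errors();

// Dump the document as the parser built it, so you can see where the <div> ended up.
echo $doc->saveHTML();

// Count real <div> element nodes in the parsed tree.
$divCount = $doc->getElementsByTagName('div')->length;
echo 'div elements found: ', $divCount, "\n";
```

If the count printed at the end is zero, the <div> only exists as text inside <style>, and no XPath expression that selects elements will reach it.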

To solve the problem, use loadXML() instead.

$doc->loadXML($response);


loadHTML() expects to find valid HTML in the string, but this markup is not valid HTML, so it is not loaded the way you expect: the resulting document contains no <div> element for XPath to reach. Try loadXML() instead.
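Putting the loadXML() suggestion together with the code from the question gives roughly the following. Treat it as a sketch: it works for this fragment because the string happens to be well-formed XML, which real scraped pages often are not, so the return value of loadXML() is checked before querying.

```php
<?php
// The scraped fragment from the question.
$response = '<style><div id="subhead"></div></style>';

$doc = new DOMDocument();

// loadXML() parses the string as XML, where <style> has no special meaning,
// so the nested <div> becomes a real element node.
// It returns false if the input is not well-formed XML.
if ($doc->loadXML($response) === false) {
    throw new RuntimeException('Response is not well-formed XML');
}

$finder = new DOMXPath($doc);
$term_select = $finder->query('//div[@id="subhead"]');

$node = $term_select->item(0);
var_dump($node !== null);   // bool(true)
var_dump($node->nodeName);  // string(3) "div"
```

If the full page you are scraping is not well-formed XML, loadXML() will fail on it, and you would need another approach for that input.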
