Parsing HTML with DOMDocument

I'm parsing HTML with DOMDocument in PHP.

I found that I'm unable to select all <a> elements using an XPath query. However, the getElementsByTagName() method works fine.

Here is the code:

$xml = new DOMDocument();
$xml->load("file.html");
$xpath = new DOMXPath($xml);

$links = $xpath->query("//a");
$links2 = $xml->getElementsByTagName("a");

foreach ($links as $k => $link) {
    echo "<br>$k: " . $link->nodeValue; // this doesn't print the node value; $links is empty
}
foreach ($links2 as $k => $link) {
    echo "<br>$k: " . $link->nodeValue; // this prints the node value OK
}

I'd have thought $xpath->query("//a") would be the same as getElementsByTagName("a"), but apparently it isn't.

Could anybody tell me why they aren't the same? Or, if they are, what am I doing wrong when selecting the nodes with the XPath query?

Thank you


Cannot reproduce: http://codepad.org/N8BlsQro
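A minimal sketch of what "cannot reproduce" looks like when the markup is parsed with loadHTML. The inline string is a hypothetical stand-in for file.html; with it, both lookups return the same elements in document order.

```php
<?php
// Hypothetical sample markup standing in for file.html.
$html = '<html><body><a href="/one">One</a> <p><a href="/two">Two</a></p></body></html>';

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

$viaXPath = $xpath->query('//a');            // DOMNodeList produced by XPath
$viaTag   = $doc->getElementsByTagName('a'); // DOMNodeList produced by the DOM API

echo $viaXPath->length, ' ', $viaTag->length, "\n";

// Both lists hold the same elements in the same (document) order.
foreach ($viaXPath as $i => $link) {
    echo $i, ': ', $link->nodeValue, ' / ', $viaTag->item($i)->nodeValue, "\n";
}
```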

If you want to use load or loadXML, your markup has to be valid X(HT)ML. HTML is SGML-based, not XML. Try loadHTML or loadHTMLFile instead.
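The question doesn't show file.html, so the following is only one plausible way to get exactly the reported symptom, under the assumption that the file is well-formed XHTML declaring the default namespace http://www.w3.org/1999/xhtml. Parsed as XML (loadXML parses a string the same way load() parses a file), every element lands in that namespace; an unprefixed XPath name test such as //a only matches elements in no namespace, while getElementsByTagName("a") still matches by name. Parsed with loadHTML, the elements carry no namespace and //a matches.

```php
<?php
// Hypothetical XHTML contents: well-formed XML with a default namespace.
$xhtml = '<html xmlns="http://www.w3.org/1999/xhtml"><body><a href="/one">One</a></body></html>';

// Parsed as XML: <a> is in the XHTML namespace.
$xmlDoc = new DOMDocument();
$xmlDoc->loadXML($xhtml);
$xmlXPath = new DOMXPath($xmlDoc);
$xmlQueryCount = $xmlXPath->query('//a')->length;          // unprefixed name test = "no namespace"
$xmlTagCount   = $xmlDoc->getElementsByTagName('a')->length; // matches on the tag name

// Parsed as HTML: libxml's HTML parser does not put elements in a namespace.
$htmlDoc = new DOMDocument();
$htmlDoc->loadHTML($xhtml);
$htmlXPath = new DOMXPath($htmlDoc);
$htmlQueryCount = $htmlXPath->query('//a')->length;

echo "loadXML:  //a = $xmlQueryCount, getElementsByTagName = $xmlTagCount\n";
echo "loadHTML: //a = $htmlQueryCount\n";
```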

Note that when you use loadHTML or loadHTMLFile, DOM will try to repair invalid HTML to the point where it can work with it. For instance, it will add a basic HTML skeleton (html and body elements) around a partial HTML document, and that can affect your XPath queries (though not in the case of //a).
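A small sketch of the skeleton effect, using a hypothetical fragment with no html or body element. Queries anchored at the fragment's own root stop matching once the parser wraps it, while a descendant query like //a is unaffected. Parser warnings are collected with libxml_use_internal_errors() so they don't clutter the output.

```php
<?php
// Hypothetical partial document: just two links, no <html> or <body>.
$fragment = '<a href="/one">One</a><a href="/two">Two</a>';

libxml_use_internal_errors(true); // collect parser warnings instead of printing them
$doc = new DOMDocument();
$doc->loadHTML($fragment);
libxml_clear_errors();

$xpath = new DOMXPath($doc);

// The parser has wrapped the fragment in an implied <html><body> skeleton...
$htmlCount = $xpath->query('/html')->length;
$bodyLinks = $xpath->query('/html/body//a')->length;
// ...so a query that treats <a> as the document root finds nothing,
$rootLinks = $xpath->query('/a')->length;
// while a descendant query such as //a still finds every link.
$anyLinks  = $xpath->query('//a')->length;

echo "$htmlCount $bodyLinks $rootLinks $anyLinks\n";
```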


Try:

$links = $xpath->query('//a/@href');
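A sketch of what that query returns, assuming the document is loaded with loadHTML (the markup is a hypothetical stand-in for file.html). The result contains attribute nodes rather than the <a> elements, so each item's nodeValue is the link target, and anchors without an href are skipped.

```php
<?php
// Hypothetical sample markup; in the question this would come from file.html.
$html = '<html><body><a href="/one">One</a><a name="top">No href</a><a href="/two">Two</a></body></html>';

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

// //a/@href selects DOMAttr nodes, not the <a> elements themselves.
$hrefs = [];
foreach ($xpath->query('//a/@href') as $attr) {
    $hrefs[] = $attr->nodeValue; // for a DOMAttr, nodeValue is the attribute's value
}

echo implode("\n", $hrefs), "\n";
```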
