Parsing HTML with DOMDocument
I'm parsing HTML with DOMDocument in PHP.
I found that I can't select all the a elements with an XPath query, although the getElementsByTagName() method works fine.
Here is the code:
$xml = new DOMDocument();
$xml->load("file.html");
$xpath = new DOMXPath($xml);
$links = $xpath->query("//a");
$links2 = $xml->getElementsByTagName("a");
foreach ($links as $k => $link) {
    echo "<br>$k: " . $link->nodeValue; // this doesn't print the node value. $links is empty
}
foreach ($links2 as $k => $link) {
    echo "<br>$k: " . $link->nodeValue; // this prints the node value OK
}
I'd have thought $xpath->query("//a") would be the same as getElementsByTagName("a"), but apparently it isn't.
Could anybody tell me why they aren't the same? Or, if they are, what am I doing wrong when selecting the nodes with the XPath query?
Thank you
Cannot reproduce: http://codepad.org/N8BlsQro
If you want to use load or loadXML, your markup has to be valid X(HT)ML. HTML is SGML based. Try with loadHTML or loadHTMLFile.
Note that when you use loadHTML or loadHTMLFile, DOM will try to repair any invalid HTML to the extent that it becomes workable for DOM. For instance, it will add a basic HTML skeleton around any partial HTML document, and that can have an effect on your XPath queries (not in the case of //a, though).
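A minimal sketch of the loadHTML route, assuming a small inline string in place of file.html (the markup and the variable names $html, $viaXPath and $viaTagName are made up for illustration). With the HTML parser, both lookups should find the same links:

```php
<?php
// Hypothetical markup standing in for file.html; the <p> is never closed.
$html = '<p>Intro<br><a href="/one">First</a> <a href="/two">Second</a>';

$doc = new DOMDocument();
// Collect libxml warnings about sloppy markup instead of printing them.
libxml_use_internal_errors(true);
$doc->loadHTML($html); // for a file on disk, use $doc->loadHTMLFile("file.html")
libxml_clear_errors();

$xpath = new DOMXPath($doc);
$viaXPath = $xpath->query('//a');
$viaTagName = $doc->getElementsByTagName('a');

// Both node lists are iterated the same way; $k is the list index.
foreach ($viaXPath as $k => $link) {
    echo "$k: " . $link->nodeValue . "\n";
}
```

Comparing $viaXPath->length with $viaTagName->length is a quick way to check that the two approaches agree on your own file.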
Try:
$links = $xpath->query('//a/@href');
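Note that this query returns the href attribute nodes rather than the a elements, so it only matches anything once //a itself matches. A rough sketch, assuming the document was loaded with loadHTML and using made-up markup:

```php
<?php
$doc = new DOMDocument();
libxml_use_internal_errors(true); // keep parser warnings out of the output
// Hypothetical markup: the second link has no href attribute.
$doc->loadHTML('<p><a href="/one">First</a> <a>No href</a> <a href="/two">Second</a></p>');
libxml_clear_errors();

$xpath = new DOMXPath($doc);

// Each result is a DOMAttr; its nodeValue is the attribute's value.
$hrefs = [];
foreach ($xpath->query('//a/@href') as $attr) {
    $hrefs[] = $attr->nodeValue;
}

print_r($hrefs);
```

Links without an href attribute are simply skipped by this query, which may or may not be what you want.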