QueryPath and Malformed HTML
I'm using QueryPath to manipulate a page's DOM. The page I'm manipulating has some tags that QueryPath doesn't know how to interpret.
I've tried passing the following as options but I still get errors:
ignore_parser_warnings
use_parser (html)
I get the following errors with these enabled:
Warning: DOMDocument::loadHTML() [domdocument.loadhtml]: Tag nobr invalid in Entity
Warning: DOMDocument::loadHTML() [domdocument.loadhtml]: htmlParseEntityRef: expecting ';' in Entity
Any help would be greatly appreciated.
Use htmlqp() instead of qp(). The htmlqp() function does a substantial amount of fixing for yucky HTML.
Try the libxml functions:
libxml_use_internal_errors(TRUE);
$dom->load('whatever'); // or whatever you use for loading the DOM
libxml_clear_errors();
Instead of just clearing the errors, you can opt to handle them (libxml_get_errors() returns the ones collected so far), though the above should be sufficient for most cases.
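If you do want to handle the errors rather than discard them, something along these lines might work. This is only a sketch using plain DOMDocument rather than QueryPath; the sample markup and the $messages and $previous variables are made up for illustration, and which problems libxml actually reports depends on the libxml2 version installed.

```php
<?php
// Hypothetical malformed markup similar to the question: a non-standard
// <nobr> tag and a bare ampersand that is not a valid entity reference.
$html = '<html><body><nobr>Fish & chips</nobr></body></html>';

// Collect libxml errors internally instead of emitting PHP warnings.
// The call returns the previous setting so it can be restored later.
$previous = libxml_use_internal_errors(true);

$dom = new DOMDocument();
$dom->loadHTML($html);

// Handle the collected errors: here they are only formatted into strings,
// but they could just as well be logged or filtered by severity.
$messages = [];
foreach (libxml_get_errors() as $error) {
    $messages[] = sprintf('line %d: %s', $error->line, trim($error->message));
}

// Empty the error buffer and restore the previous error-handling mode.
libxml_clear_errors();
libxml_use_internal_errors($previous);

// The document is still usable despite the reported problems.
$text = $dom->getElementsByTagName('nobr')->item(0)->textContent;
```

Restoring the previous setting at the end keeps the change from leaking into other code that relies on the default warning behaviour.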
Just use an @ in front of your QueryPath functions to suppress the warnings. While invalid HTML may generate warnings, QueryPath can generally handle it just fine.