XHTML namespace issues with cssselect in lxml
I have problems using cssselect with XHTML (or XML with a namespace). The documentation has a section on using namespaces with cssselect ("cssselect namespaces"), but I do not understand it.
My Input XHTML string:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>Teststylesheet</title>
<style type="text/css">
/*<![CDATA[*/
ol{margin:0;padding:0}
/*]]>*/
</style>
</head>
<body>
</body>
</html>
My Python Script:
from lxml import etree
from lxml.cssselect import CSSSelector

parser = etree.XMLParser()
tree = etree.fromstring(xhtmlstring, parser).getroottree()
for style in CSSSelector("style")(tree):
    print("HAVE CSS!")
The Python script never prints "HAVE CSS!". Using etree.HTMLParser instead of etree.XMLParser works, but I really want to use the XMLParser and keep everything (namespace, structure) of the XHTML.
Can anybody help me with this namespace problem?
The doc string for cssselect.CSSSelector (version 2.0) shows how to use namespaces:
class CSSSelector(etree.XPath):
    """ ...
    To use CSS namespaces, you need to pass a prefix-to-namespace
    mapping as ``namespaces`` keyword argument::

        >>> rdfns = 'http://www.w3.org/1999/02/22-rdf-syntax-ns#'
        >>> select_ns = cssselect.CSSSelector('root > rdf|Description',
        ...                                   namespaces={'rdf': rdfns})
        >>> rdf = etree.XML((
        ...     '<root xmlns:rdf="%s">'
        ...       '<rdf:Description>blah</rdf:Description>'
        ...     '</root>') % rdfns)
        >>> [(el.tag, el.text) for el in select_ns(rdf)]
        [('{http://www.w3.org/1999/02/22-rdf-syntax-ns#}Description', 'blah')]
    """
If you've tried this but your version of cssselect.CSSSelector does not have a namespaces parameter, then your version of lxml may need to be upgraded.