XHTML namespace issues with cssselect in lxml

I have problems using cssselect with an XHTML document (or any XML with a namespace). The lxml documentation has a section on using namespaces with cssselect, but I do not understand it.

My Input XHTML string:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
  <title>Teststylesheet</title>
  <style type="text/css">
  /*<![CDATA[*/
  ol{margin:0;padding:0}
  /*]]>*/
  </style>
</head>
<body>
</body>
</html>

My Python Script:

from lxml import etree
from lxml.cssselect import CSSSelector

parser = etree.XMLParser()
tree = etree.fromstring(xhtmlstring, parser).getroottree()
for style in CSSSelector("style")(tree):
    print("HAVE CSS!")

The Python script never prints "HAVE CSS!". Using etree.HTMLParser instead of etree.XMLParser works, but I really want to use the XMLParser and keep everything (namespace, structure) of the XHTML.

Can anybody help me with this namespace problem?


The docstring for cssselect.CSSSelector (version 2.0) shows how to use namespaces:

class CSSSelector(etree.XPath):
    """ ...
    To use CSS namespaces, you need to pass a prefix-to-namespace
    mapping as ``namespaces`` keyword argument::

        >>> rdfns = 'http://www.w3.org/1999/02/22-rdf-syntax-ns#'
        >>> select_ns = cssselect.CSSSelector('root > rdf|Description',
        ...                                   namespaces={'rdf': rdfns})

        >>> rdf = etree.XML((
        ...     '<root xmlns:rdf="%s">'
        ...       '<rdf:Description>blah</rdf:Description>'
        ...     '</root>') % rdfns)
        >>> [(el.tag, el.text) for el in select_ns(rdf)]
        [('{http://www.w3.org/1999/02/22-rdf-syntax-ns#}Description', 'blah')]
    """

If you've tried this but your version of cssselect.CSSSelector does not have a namespaces parameter, then your version of lxml may need to be upgraded.
