Equivalent of Beautiful Soup's renderContents() method in lxml?
Is there an equivalent of Beautiful Soup's tag.renderContents()
method in lxml?
I've tried using element.text
, but that doesn't render child tags, as well as '开发者_StackOverflow社区'.join(etree.tostring(child) for child in element)
, but that doesn't render child text. The closest I've been able to find is etree.tostring(element)
, but that renders the opening and closing tags of element
, which I do not want.
Is there another method I'm overlooking (or an alternative approach to accomplish this)?
You're most of the way there with your original idea. element.text
gives you the first text child of the element, and your list comprehension gives you everything else. If you concatenate the two strings together, you get what you're looking for:
>>> xmlstr = "<sec>header <p>para 0</p> text <p>para 1</p> footer</sec>"
>>> element = etree.fromstring(xmlstr)
>>>
>>> element.text + "".join(map (etree.tostring, element))
'header <p>para 0</p> text <p>para 1</p> footer'
>>>
Ari.
One hackish solution:
from lxml import etree
def render_contents(element):
"""
Surely there is a safe lxml built-in for this...
"""
tagname = element.tag
return re.sub('</%s>\s*$' % tagname, '',
re.sub(r'^<%s(\s+\w+=".*?")*?>' % tagname,
'', etree.tostring(element))).strip()
Edit
Really, there's no better method than this?
精彩评论