开发者

Equivalent of Beautiful Soup's renderContents() method in lxml?

Is there an equivalent of Beautiful Soup's tag.renderContents() method in lxml?

I've tried using element.text, but that doesn't render child tags, as well as '开发者_StackOverflow社区'.join(etree.tostring(child) for child in element), but that doesn't render child text. The closest I've been able to find is etree.tostring(element), but that renders the opening and closing tags of element, which I do not want.

Is there another method I'm overlooking (or an alternative approach to accomplish this)?


You're most of the way there with your original idea. element.text gives you the first text child of the element, and your list comprehension gives you everything else. If you concatenate the two strings together, you get what you're looking for:

>>> xmlstr = "<sec>header <p>para 0</p> text <p>para 1</p> footer</sec>"
>>> element = etree.fromstring(xmlstr)
>>>
>>> element.text + "".join(map (etree.tostring, element))
'header <p>para 0</p> text <p>para 1</p> footer'
>>>

Ari.


One hackish solution:

from lxml import etree
def render_contents(element):
    """
    Surely there is a safe lxml built-in for this...
    """
    tagname = element.tag
    return re.sub('</%s>\s*$' % tagname, '',
                  re.sub(r'^<%s(\s+\w+=".*?")*?>' % tagname,
                         '', etree.tostring(element))).strip()

Edit

Really, there's no better method than this?

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜