Parsing ODF in Python with lxml

I'm trying to parse the content.xml inside an ODF file. I've read the file into a string and I've got a tree object with lxml.etree:

tree = etree.XML(string)

But now I need to find every subelement that is text:a OR text:h. I was told in a previous question that I could use XPath. I've tried, but I get stuck every single time and can't find even one of those elements.

If I try:

elem = tree.xpath('//text:p')
I just get a
XPathEvalError: Undefined namespace prefix

So how do I get a list with BOTH of those subelements in the right order so I can iterate over them?


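That's because text is a namespace prefix, and lxml's xpath() does not know which namespace URI it stands for unless you tell it. The prefix is bound to a URI inside the ODF document, but you still have to pass that mapping explicitly. Try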

tree.xpath('//text:a | //text:h',
           namespaces={'text': 'urn:oasis:names:tc:opendocument:xmlns:text:1.0'})

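| is the XPath union operator, so the result contains every element matched by either path. lxml returns the combined node set in document order, so you can iterate over it directly. See also the lxml docs.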
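
For a quick way to try this without a real ODF file, here is a minimal sketch. The SAMPLE document is a made-up stand-in for content.xml, and the names TEXT_NS, NSMAP and found are just illustrative; only the text namespace URI comes from the answer above.

```python
from lxml import etree

# Namespace URI that ODF binds to the text: prefix.
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"
NSMAP = {"text": TEXT_NS}

# Tiny hypothetical stand-in for content.xml (not a complete ODF document).
SAMPLE = b"""<office:document-content
    xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0"
    xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0">
  <office:body>
    <office:text>
      <text:h>First heading</text:h>
      <text:p>See <text:a>a link</text:a> here.</text:p>
      <text:h>Second heading</text:h>
    </office:text>
  </office:body>
</office:document-content>"""

tree = etree.XML(SAMPLE)

# Union of both element types; the prefix is resolved through NSMAP.
elements = tree.xpath("//text:a | //text:h", namespaces=NSMAP)

# Collect (local tag name, text) pairs so the order is easy to inspect.
found = [(etree.QName(el).localname, el.text) for el in elements]
for name, text in found:
    print(name, text)
```

The prefix in the dictionary does not have to match the one used in the file; only the URI matters, so {'t': TEXT_NS} with '//t:a | //t:h' should work the same way.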
