
Python toprettyxml() formatting problems

I'm trying to process XML using Python's minidom, and then output the result using toprettyxml(). I ran into two problems:

  1. There are added blank lines.
  2. There are added newlines and tabs for text nodes.

Here's the code and output:

$ cat test.py
from xml.dom import minidom

dom = minidom.parse("test.xml")
print(dom.toprettyxml())

$ cat test.xml
<?xml version="1.0" encoding="UTF-8"?>

<store>
    <product>
        <fruit>orange</fruit>
    </product>
</store>


$ python test.py
<?xml version="1.0" ?>
<store>


    <product>


        <fruit>
            orange
        </fruit>


    </product>


</store>
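The blank lines in problem 1 come from the indentation in test.xml itself. The parser keeps the whitespace between tags as text nodes, and toprettyxml() then writes each of those nodes on its own indented line. A quick way to see them is sketched below. It parses the same document from an inline bytes string (SOURCE, a name introduced only for this example) instead of from test.xml:

```python
from xml.dom import minidom

# Same content as test.xml, inlined so the snippet runs on its own.
SOURCE = b"""<?xml version="1.0" encoding="UTF-8"?>

<store>
    <product>
        <fruit>orange</fruit>
    </product>
</store>
"""

dom = minidom.parseString(SOURCE)
store = dom.documentElement

# List every child of <store> with its value: the source indentation
# shows up as '#text' nodes around the <product> element.
children = [(child.nodeName, child.nodeValue) for child in store.childNodes]
for name, value in children:
    print(name, repr(value))
```

This prints two #text entries, holding '\n    ' and '\n', on either side of product. Those whitespace-only nodes are what toprettyxml() turns into the extra blank lines.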

I can work around problem 1 by using strip() to remove blank lines, and I can work around problem 2 with the hack (fixed_writexml) described at this link: http://ronrothman.com/public/leftbraned/xml-dom-minidom-toprettyxml-and-silly-whitespace/. However, since the hack is almost 3 years old now, I was wondering if there's a better solution. I'm open to using something other than minidom, but I'd like to avoid adding external packages like lxml.
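If moving away from minidom is acceptable, the standard library's xml.etree.ElementTree has an indent() function, added in Python 3.9. It overwrites whitespace-only text and tails with fresh indentation and leaves real text alone. The sketch below assumes Python 3.9 or later and a two-space indent. The document is inlined as SOURCE instead of being read from test.xml:

```python
import xml.etree.ElementTree as ET

# Same content as test.xml, inlined so the snippet runs on its own.
SOURCE = b"""<?xml version="1.0" encoding="UTF-8"?>

<store>
    <product>
        <fruit>orange</fruit>
    </product>
</store>
"""

root = ET.fromstring(SOURCE)

# indent() (Python 3.9+) replaces whitespace-only .text/.tail values with
# new indentation; non-blank text such as "orange" is left untouched, so
# <fruit>orange</fruit> stays on one line.
ET.indent(root, space="  ")

pretty = ET.tostring(root, encoding="unicode")
print(pretty)
```

tostring() as used here does not emit an XML declaration. When writing to a file, ET.ElementTree(root).write("out.xml", encoding="UTF-8", xml_declaration=True) should add one.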


One solution is to patch the minidom library with the patch proposed in the bug report you mention.

I haven't tested it myself, and it's a bit hacky too, so it may not suit you!
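To stay with minidom without patching it, another option is to delete the whitespace-only text nodes before pretty-printing. On current Python 3 releases, minidom already writes a lone text child such as orange on the same line as its tags, so this handles both problems there. The sketch below assumes that whitespace between elements is not significant in your documents. remove_blank_text_nodes is a hypothetical helper name, and SOURCE again stands in for test.xml:

```python
from xml.dom import minidom, Node

# Same content as test.xml, inlined so the snippet runs on its own.
SOURCE = b"""<?xml version="1.0" encoding="UTF-8"?>

<store>
    <product>
        <fruit>orange</fruit>
    </product>
</store>
"""


def remove_blank_text_nodes(node):
    """Hypothetical helper: recursively drop whitespace-only text nodes.

    Assumes whitespace between elements carries no meaning.
    """
    # Iterate over a copy, because removeChild() mutates childNodes.
    for child in list(node.childNodes):
        if child.nodeType == Node.TEXT_NODE and not child.data.strip():
            node.removeChild(child)
        elif child.hasChildNodes():
            remove_blank_text_nodes(child)


dom = minidom.parseString(SOURCE)
remove_blank_text_nodes(dom)
pretty = dom.toprettyxml()  # default indent is a tab
print(pretty)
```

Passing indent="    " to toprettyxml() gives four-space indentation instead of tabs. On older Python versions that still split text nodes across lines, this removes only the blank lines, so the writexml fix would still be needed for problem 2.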
