
Getting rid of the encoding in lxml

I am trying to print an XML file using lxml and Python.

Here is the code:

>>> from lxml import etree
>>> root = etree.Element('root')
>>> child = etree.SubElement(root, 'child')
>>> print etree.tostring(root, pretty_print = True, xml_declaration = True, encoding = None)

Output:

<?xml version='1.0' encoding='ASCII'?>
<root>
  <child/>
</root>

As you can see, I passed encoding = None, yet the output still shows encoding='ASCII'. I guess that is expected. If I leave out the encoding argument altogether, it still shows ASCII.

Is there any way to get just the XML version declaration, without the encoding part? I want the output to look like this:

<?xml version='1.0'?>


It shouldn't matter what lxml.etree outputs as long as it's valid XML. If you really want to, you can glue strings together:

'<?xml version="1.0"?>\n' + etree.tostring(root, pretty_print=True, encoding='ASCII').decode('ascii')
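The same idea can be wrapped in a small function. This is a minimal sketch, assuming Python 3, where `etree.tostring()` returns bytes, so the prefix has to be bytes as well. The helper name `tostring_version_only` is hypothetical, not part of lxml. It serializes the body as UTF-8 rather than ASCII: with no encoding declaration, a parser assumes UTF-8, so non-ASCII text can stay as raw UTF-8 bytes (the ASCII variant above also works, since ASCII is a subset of UTF-8 and lxml writes non-ASCII characters as character references).

```python
from lxml import etree


def tostring_version_only(element, pretty_print=True):
    """Serialize an element with a version-only XML declaration.

    Hypothetical helper, not part of lxml. Without an encoding declaration,
    XML 1.0 parsers assume UTF-8 (absent a BOM or external information),
    so the body is serialized as UTF-8 bytes.
    """
    # xml_declaration=False stops lxml from writing its own declaration.
    body = etree.tostring(
        element,
        pretty_print=pretty_print,
        encoding="UTF-8",
        xml_declaration=False,
    )
    # tostring() returns bytes here, so the prefix is a bytes literal too.
    return b'<?xml version="1.0"?>\n' + body


root = etree.Element("root")
etree.SubElement(root, "child")
print(tostring_version_only(root).decode("utf-8"))
```

For the tree in the question this prints `<?xml version="1.0"?>` followed by the same pretty-printed `<root>` element shown in the output above.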

It's unclear why you want to remove it, since a parser ultimately needs to know which charset a document is in to make sense of it. The XML 1.0 spec includes a method of autodetecting the charset, and seems to encourage the use of encoding declarations:

In the absence of [external information], it is a fatal error ... for an entity which begins with neither a Byte Order Mark nor an encoding declaration to use an encoding other than UTF-8.

...

Unless an encoding is determined by a higher-level protocol, it is also a fatal error if an XML entity contains no encoding declaration and its content is not legal UTF-8 or UTF-16.
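The rule quoted above is easy to observe with a real parser. The sketch below uses the standard library's `xml.etree.ElementTree` (backed by expat) as a stand-in for any conforming parser, so it needs no third-party packages; the sample documents are made up for illustration. A declaration-less document is read as UTF-8, so UTF-8 bytes parse fine, Latin-1 bytes are rejected, and adding an encoding declaration makes the Latin-1 bytes acceptable again.

```python
import xml.etree.ElementTree as ET

# UTF-8 bytes with no encoding declaration: the parser assumes UTF-8.
utf8_doc = b'<?xml version="1.0"?>\n<root>caf\xc3\xa9</root>'
print(ET.fromstring(utf8_doc).text)  # café

# The same text encoded as Latin-1, still without a declaration:
# the byte 0xE9 is not legal UTF-8 here, so parsing fails.
latin1_doc = b'<?xml version="1.0"?>\n<root>caf\xe9</root>'
try:
    ET.fromstring(latin1_doc)
except ET.ParseError as err:
    print("ParseError:", err)

# Declaring the encoding tells the parser how to decode the Latin-1 bytes.
declared_doc = b'<?xml version="1.0" encoding="ISO-8859-1"?>\n<root>caf\xe9</root>'
print(ET.fromstring(declared_doc).text)  # café
```

So dropping the encoding declaration is only safe when the bytes you write really are UTF-8 (or plain ASCII).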
