Getting rid of the encoding in lxml
I am trying to print an XML file using lxml and Python.
Here is the code:
>>> from lxml import etree
>>> root = etree.Element('root')
>>> child = etree.SubElement(root, 'child')
>>> print etree.tostring(root, pretty_print = True, xml_declaration = True, encoding = None)
Output:
<?xml version='1.0' encoding='ASCII'?>
<root>
<child/>
</root>
As you can see, I have passed encoding = None, but the final output still shows encoding='ASCII'. I guess that is expected: if I leave out the encoding argument entirely, it still shows ASCII.
Is there any way I can get just the XML version in the declaration, without the encoding part? I want the output to start like this:
<?xml version='1.0'?>
It shouldn't matter what lxml.etree outputs as long as it's valid XML. If you really want to, you can glue strings together:
'<?xml version="1.0"?>\n' + etree.tostring(root, pretty_print = True, encoding = 'ASCII')
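On Python 3 the concatenation above raises a TypeError, because etree.tostring returns bytes when given a byte encoding such as 'ASCII'. A minimal sketch of the same gluing idea, assuming Python 3 and lxml; it relies on encoding="unicode", which makes lxml return a str without any declaration. The helper name tostring_with_bare_declaration is made up for illustration:

```python
from lxml import etree


def tostring_with_bare_declaration(element):
    """Serialize element with a declaration that carries no encoding."""
    # encoding="unicode" makes lxml return str and emit no XML declaration,
    # so the hand-written declaration below is the only one in the output.
    body = etree.tostring(element, pretty_print=True, encoding="unicode")
    return '<?xml version="1.0"?>\n' + body


root = etree.Element("root")
etree.SubElement(root, "child")
document = tostring_with_bare_declaration(root)
print(document)
```

If the result is written to a file, encoding it as UTF-8 keeps the document consistent with the missing encoding declaration.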
It's unclear why you want to remove it, since a parser ultimately needs to know which charset the XML is in to make sense of anything. The XML 1.0 spec includes a method of guessing charsets, and seems to encourage the use of encoding declarations:
In the absence of [external information], it is a fatal error ... for an entity which begins with neither a Byte Order Mark nor an encoding declaration to use an encoding other than UTF-8.
...
Unless an encoding is determined by a higher-level protocol, it is also a fatal error if an XML entity contains no encoding declaration and its content is not legal UTF-8 or UTF-16.
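To make the guessing concrete, here is a simplified sketch in the spirit of the spec's non-normative autodetection appendix. It is not a full implementation: it only handles UTF-8 and UTF-16 byte order marks and ASCII-compatible declarations, and ignores UTF-32 and EBCDIC. The function name guess_xml_encoding is hypothetical:

```python
import codecs
import re

# Matches an encoding declaration at the very start of an ASCII-compatible document.
_DECL_ENCODING = re.compile(
    rb"""<\?xml[^>]*?encoding\s*=\s*["']([A-Za-z][A-Za-z0-9._-]*)["']"""
)


def guess_xml_encoding(data: bytes) -> str:
    """Guess the encoding of an XML document from its first bytes."""
    # A byte order mark takes priority over anything else.
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8"
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"
    # No BOM: look for an explicit encoding declaration.
    match = _DECL_ENCODING.match(data)
    if match:
        return match.group(1).decode("ascii").lower()
    # Neither BOM nor declaration: the content has to be UTF-8.
    return "utf-8"


print(guess_xml_encoding(b"<?xml version='1.0' encoding='ASCII'?><root/>"))
print(guess_xml_encoding(b"<?xml version='1.0'?><root/>"))
```

The last branch is the case the question asks for: a document with no encoding declaration and no byte order mark is only valid if its bytes really are UTF-8 (pure ASCII output qualifies, since ASCII is a subset of UTF-8).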