XML encoding error while writing to a file
I think I am following the right approach but I am still getting an encoding error:
from xml.dom.minidom import Document
import codecs
doc = Document()
wml = doc.createElement("wml")
doc.appendChild(wml)
property = doc.createElement("property")
wml.appendChild(property)
descriptionNode = doc.createElement("description")
property.appendChild(descriptionNode)
descriptionText = doc.createTextNode(description.decode('ISO-8859-1'))
descriptionNode.appendChild(descriptionText)
file = codecs.open('contentFinal.xml', 'w', encoding='ISO-8859-1')
file.write(doc.toprettyxml())
file.close()
The description node contains some characters in ISO-8859-1 encoding; this is the encoding specified by the site itself in its meta tag. But when the output of doc.toprettyxml() is being written to the file, I get the following error:
Traceback (most recent call last):
  File "main.py", line 467, in <module>
    file.write(doc.toprettyxml())
  File "C:\Python27\lib\xml\dom\minidom.py", line 60, in toprettyxml
    return writer.getvalue()
  File "C:\Python27\lib\StringIO.py", line 271, in getvalue
    self.buf += ''.join(self.buflist)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe1 in position 10: ordinal not in range(128)
Why am I getting this error, given that I am decoding and encoding with the same standard?
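For context, the byte named in the traceback, 0xe1, is "á" in ISO-8859-1 and lies outside the ASCII range. Python 2 decodes a byte string with the ASCII codec whenever it is joined with a unicode string, so the error probably means that at least one byte string reached the writer without being decoded first. A minimal sketch of that failure, using a made-up sample byte string (the name raw is hypothetical, not from the script above):

```python
# Hypothetical sample: the single byte 0xe1, which is "á" in ISO-8859-1.
raw = b"\xe1"

# Decoding with the correct codec works and yields one text character.
text = raw.decode("ISO-8859-1")

# Decoding with ASCII fails, because 0xe1 is above 127. This is the same
# decode that Python 2 performs implicitly when bytes meet unicode.
try:
    raw.decode("ascii")
    message = None
except UnicodeDecodeError as exc:
    message = str(exc)

print(repr(text))
print(message)
```

If this is the cause, the thing to check is that every string passed to createTextNode has been decoded, not only the description.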
Edited
I have the following declaration in my script file:
#!/usr/bin/python
# -*- coding: utf-8 -*-
Maybe this is conflicting?
OK, I have found a solution. Whenever the data is in another language, you just need to declare the proper encoding in the XML header. You do not need to pass the encoding to toprettyxml, as in file.write(doc.toprettyxml(encoding='ISO-8859-1')), nor specify it when opening the file for writing, as in file = codecs.open('contentFinal.xml', 'w', encoding='ISO-8859-1'). Below is the technique I used. It may not be a professional method, but it works for me.
file = codecs.open('abc.xml', 'w')
xm = doc.toprettyxml()
xm = xm.replace('<?xml version="1.0" ?>', '<?xml version="1.0" encoding="ISO-8859-1"?>')
file.write(xm)
file.close()
Maybe there is a way to set the encoding in the header by default, but I could not find it. The method above does not produce any errors in the browser, and all the data displays perfectly.
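For comparison, here is a sketch of an alternative that avoids the string replacement. It relies on toprettyxml accepting an encoding argument, in which case minidom returns encoded bytes and names that encoding in the XML declaration. The helper name build_xml_bytes and the sample text are assumptions made for illustration, and the text is assumed to be already decoded to unicode before it reaches createTextNode:

```python
from xml.dom.minidom import Document


def build_xml_bytes(description, encoding="ISO-8859-1"):
    """Build the wml/property/description tree and serialize it to bytes.

    Hypothetical helper: description must be a text (unicode) string,
    not raw bytes.
    """
    doc = Document()
    wml = doc.createElement("wml")
    doc.appendChild(wml)
    prop = doc.createElement("property")
    wml.appendChild(prop)
    description_node = doc.createElement("description")
    prop.appendChild(description_node)
    description_node.appendChild(doc.createTextNode(description))
    # With an encoding argument, the result is a byte string and the
    # declaration reads <?xml version="1.0" encoding="ISO-8859-1"?>.
    return doc.toprettyxml(encoding=encoding)


def save_xml(path, description, encoding="ISO-8859-1"):
    """Write the serialized document in binary mode so nothing re-encodes it."""
    with open(path, "wb") as out:
        out.write(build_xml_bytes(description, encoding))


# Made-up sample text containing characters outside ASCII.
xml_bytes = build_xml_bytes(u"caf\u00e9 b\u00e1sico")
print(xml_bytes.decode("ISO-8859-1"))
```

Calling save_xml('contentFinal.xml', text) would then write the file directly. The file is opened in binary mode because the data is already encoded; opening it through codecs.open with an encoding would try to encode it a second time.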