XML UTF-8 data being written differently
Unfortunately I'm working in an obscure platform called uniPaaS so I'm probably after some platform-agnostic advice.
I've got a Web Service request where the XML document contains those irritating smart quotes. The byte data for the character is E2 80 99 (which is U+2019 RIGHT SINGLE QUOTATION MARK).
When I write the XML file to disk on our staging server, it writes it correctly. When I write it on our production server, it totally changes the values of those bytes and malforms the XML document:
E2 80 99 becomes 92. Has anyone ever seen this sort of behaviour before? It seems to affect only that one byte sequence (but the SOAP response is 50 MB, so I haven't had a chance to diff the entire file).
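It's encoding it as CP1251. In that code page the single byte 92 is the right single quotation mark (CP1252 maps 92 to the same character), so the production server is most likely writing the file in its Windows ANSI code page instead of UTF-8.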
>>> b'\x92'.decode('cp1251').encode('utf-8')
b'\xe2\x80\x99'
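Since the file is too big to diff by eye, it may help to reproduce the corruption and locate the first damaged byte programmatically. This is only a rough sketch in Python 3, not anything uniPaaS-specific: the helper name `first_invalid_utf8_offset` and the sample string are made up for illustration, and it assumes the file fits in memory (50 MB should be fine).

```python
def first_invalid_utf8_offset(data):
    """Return the offset of the first byte that is not valid UTF-8, or None.

    Hypothetical helper for illustration; `data` is a bytes object.
    """
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as exc:
        return exc.start  # index of the first offending byte
    return None


# What staging writes: U+2019 encoded as UTF-8 (E2 80 99).
original = "It\u2019s".encode("utf-8")

# What production appears to do: decode the UTF-8 and write it back out
# in a Windows code page, which turns E2 80 99 into the single byte 92.
mangled = original.decode("utf-8").encode("cp1251")

print(original)                             # b'It\xe2\x80\x99s'
print(mangled)                              # b'It\x92s'
print(first_invalid_utf8_offset(original))  # None
print(first_invalid_utf8_offset(mangled))   # 2
```

To check a real file, read it in binary mode and pass the bytes in, for example `first_invalid_utf8_offset(open("response.xml", "rb").read())`, where `response.xml` is a placeholder path. If the staging copy returns `None` and the production copy returns an offset, the write step on production is transcoding the data, and the thing to look for is wherever that server's output encoding or code page is configured.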