XML UTF-8 data being written differently

Unfortunately I'm working on an obscure platform called uniPaaS, so I'm probably after some platform-agnostic advice.

I've got a Web Service request where the XML document contains those irritating smart quotes. The byte data for the character is E2 80 99 (the UTF-8 encoding of U+2019 RIGHT SINGLE QUOTATION MARK).

When I write the XML file to disk on our staging server, it is written correctly. When I write it on our production server, those bytes are changed and the XML document ends up malformed:

E2 80 99 becomes 92. Has anyone ever seen this sort of behaviour before? It seems to affect only that one byte sequence (but the SOAP response is 50 MB, so I haven't had a chance to diff the entire file).
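
One way to check for other affected sequences without diffing the whole file is to scan the output for bytes that are not valid UTF-8. The following is a minimal sketch in Python, not uniPaaS; the function name `find_invalid_utf8` and the `limit` parameter are made up for illustration, and it assumes the file fits in memory.

```python
def find_invalid_utf8(data: bytes, limit: int = 20):
    """Return (offset, offending_bytes) pairs for sequences that are not valid UTF-8.

    Hypothetical helper for illustration. It stops after `limit` findings.
    """
    problems = []
    pos = 0
    while pos < len(data) and len(problems) < limit:
        try:
            # Slicing copies the remainder; acceptable for a small `limit`.
            data[pos:].decode("utf-8")
            break  # everything from pos onwards is valid
        except UnicodeDecodeError as err:
            start = pos + err.start
            end = pos + err.end
            problems.append((start, data[start:end]))
            pos = end  # resume after the offending bytes
    return problems
```

Reading the file with `open(path, "rb").read()` and passing the result in would list the offsets of any stray single bytes such as 92.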


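The production server is encoding it as CP1251, a single-byte Windows code page, instead of UTF-8. In CP1251 the byte 92 is U+2019 (CP1252 maps 92 to the same character, so it could be either):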

>>> b'\x92'.decode('cp1251').encode('utf-8')
b'\xe2\x80\x99'
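
The same mapping can be checked in both directions, and the damage can be undone by decoding with the code page and re-encoding as UTF-8. This is only a sketch in Python 3: `to_utf8` and `write_xml` are hypothetical helper names, and the choice of `cp1251` as the source encoding is an assumption based on the byte 92 alone. How uniPaaS selects its output encoding is not covered here.

```python
# U+2019 RIGHT SINGLE QUOTATION MARK, the character from the question.
QUOTE = "\u2019"

utf8_bytes = QUOTE.encode("utf-8")      # three bytes: E2 80 99
cp1251_bytes = QUOTE.encode("cp1251")   # one byte: 92
cp1252_bytes = QUOTE.encode("cp1252")   # also 92, so the byte alone cannot tell them apart


def to_utf8(data: bytes, source_encoding: str = "cp1251") -> bytes:
    """Re-encode bytes that were written in a single-byte code page as UTF-8.

    source_encoding is an assumption; bytes that are undefined in the chosen
    code page raise UnicodeDecodeError.
    """
    return data.decode(source_encoding).encode("utf-8")


def write_xml(path: str, text: str) -> None:
    """Write text as UTF-8 regardless of the machine's default encoding."""
    with open(path, "w", encoding="utf-8", newline="") as handle:
        handle.write(text)
```

The point of `write_xml` is that the encoding is stated explicitly. A write that relies on the platform default can behave differently on two servers with different locale settings, which would match the staging/production difference described in the question.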