
Handling unicode data in XMLRPC

I have to migrate data to OpenERP through XMLRPC by using TerminatOOOR.

I send a name with value "Rotule right Aurélia".

In Python the name will be encoded with value : 'Rotule right Aur\xc3\xa9lia '

But in TerminatOOOR (xmlrpc client) the data is encoded with value 'Rotule middle Aur\357\277\275lia'

So on the server side, the value is not decoded correctly and I get bad data.
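To make the difference between the two values concrete, here is a small Python 3 sketch that decodes both byte sequences. It only uses the name part of the value, and it assumes the \357\277\275 in the client output are octal byte escapes, which is how Ruby prints raw bytes.

```python
# The value as Python shows it: "é" encoded as the two UTF-8 bytes C3 A9.
expected = b"Aur\xc3\xa9lia"

# The value as the client shows it. Python byte literals accept the same
# octal escapes, so this is the byte sequence EF BF BD in place of "é".
received = b"Aur\357\277\275lia"

print(expected.decode("utf-8"))   # Aurélia
print(received.hex())             # 417572efbfbd6c6961

# EF BF BD is the UTF-8 encoding of U+FFFD, the Unicode replacement character.
decoded = received.decode("utf-8")
print([hex(ord(ch)) for ch in decoded])
print("\ufffd" in decoded)        # True
```

If that reading is right, the "é" was already replaced before the value was encoded for the request, so the server cannot recover it by decoding differently.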

TerminatOOOR is a Ruby plugin for Kettle (a Java product), and I would expect it to encode data as UTF-8.

I just don't know why it happens like this.

Any help?


This issue comes from Kettle. My program uses Kettle to open an Excel file, read the active sheet, and pass the data in that sheet to TerminatOOOR for further handling. When it reads the data from the Excel file, Kettle cannot recognize the encoding, so it passes bad data on to TerminatOOOR.
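A rough Python 3 sketch of how a misrecognized encoding can produce exactly the bytes seen in the question. The legacy encoding (Latin-1 here) and the decoding step are assumptions for illustration only; I have not checked what Kettle actually does internally.

```python
text = "Rotule right Aurélia"

# Assumption: the input delivers "é" as a single byte (0xE9 in Latin-1).
legacy_bytes = text.encode("latin-1")

# Reading those bytes as UTF-8 fails on 0xE9; a lenient decoder substitutes
# U+FFFD (the replacement character) and carries on with the next byte.
misread = legacy_bytes.decode("utf-8", errors="replace")

# Encoding the damaged text as UTF-8 for the request gives EF BF BD,
# i.e. the \357\277\275 seen on the client side.
on_wire = misread.encode("utf-8")

print(repr(misread))
print(repr(on_wire))
```

Once the replacement character is in the string, the original "é" is lost, which is why fixing the input step is what removes the problem.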

My workaround is to export the Excel file to CSV manually before giving the data to TerminatOOOR. The downside is that I can no longer use Kettle's feature for mapping Excel column names to variable names.


First off, whenever you deal with text (and all text is bound to contain some non-US-ASCII character sooner or later), you'll be much happier doing that in Python 3.x instead of the 2.x series. If Py3 is not an option, try to always use from __future__ import unicode_literals (available in Python 2.6 and 2.7).

Basically, when you send text or any other data over the wire, it travels only as bytes (octets), so it has to be encoded at some point. Try to find out exactly where that encoding takes place in your tool chain; if necessary, use a debugging tool (or add print(repr(x)) statements) to inspect the relevant variables. The other software you mention is, by your description, a Ruby plugin running inside a Java product, so each of those layers is a place where text may be converted. You say that 'it should encode the data by utf-8', but when the receiving end sees the data of an incoming RPC request, that data should already be in UTF-8, and it has to be decoded to obtain Unicode text again.
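As a minimal Python 3 sketch of the encode-once, decode-once idea, using only the standard xmlrpc.client module: the method name create_partner is made up for the example and is not an OpenERP call.

```python
import xmlrpc.client

name = "Rotule right Aurélia"

# Build the XML-RPC request body as text.
# "create_partner" is a hypothetical method name used only for illustration.
request_text = xmlrpc.client.dumps((name,), methodname="create_partner")

# Encoding happens once, right before the bytes go on the wire.
request_bytes = request_text.encode("utf-8")
print(repr(request_bytes))

# The receiving end parses the bytes and gets Unicode text back.
params, method = xmlrpc.client.loads(request_bytes)
print(repr(params[0]))
```

Printing repr() at each step like this shows where the text stops matching what you expect, which narrows down the layer that does the wrong conversion.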
