How can I get Unicode characters from a URL parameter?

I need to use a GET request to send JSON to my server via a JavaScript client, so I started echoing responses back to make sure nothing is lost in translation. There doesn't seem to be a problem with normal text, but as soon as I include a Unicode character of any sort (e.g. "ç"), the character is encoded somehow (e.g. "\u00e7") and the return value differs from the request value. My primary concerns are that A) my Python code correctly saves to the database what the client intended to send, and B) I echo back to the client the same values that were sent (when testing).

Perhaps this means I can't use base64, or have to do something different along the way. I'm ok with that. My implementation is just an attempt at a means to an end.

Current steps (any step can be changed, if needed):

Raw JSON string which I want to send to the server:

'{"weird-chars": "°ç"}'

JavaScript Base64 encoded version of the string passed to server via GET param (on a side note, will the equals sign at the end of the encoded string cause any issues?):

http://www.myserver.com/?json=eyJ3ZWlyZC1jaGFycyI6ICLCsMOnIn0=

Python str result from b64decode of param:

'{"weird-chars": "\xc2\xb0\xc3\xa7"}'

Python dict from json.loads of decoded param:

{'weird-chars': u'\xb0\xe7'}

Python str from json.dumps of that dict (and subsequent output to the browser):

'{"weird-chars": "\u00b0\u00e7"}'


Everything looks fine to me.

>>> hex(ord(u'°'))
'0xb0'
>>> hex(ord(u'ç'))
'0xe7'

Perhaps you should decode the JSON before attempting to use it.
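
To illustrate that nothing is lost, here is a minimal sketch of the same round trip. It assumes Python 3 (the question uses Python 2, where the decoded value would be a str of bytes and the parsed value a unicode object), and the variable names are only illustrative.

```python
import base64
import json

# The query-string value from the question.
param = "eyJ3ZWlyZC1jaGFycyI6ICLCsMOnIn0="

# Base64 decoding yields the UTF-8 bytes of the original JSON text.
raw = base64.b64decode(param)

# Decode the bytes to text, then parse the JSON into a dict.
data = json.loads(raw.decode("utf-8"))

# json.dumps escapes non-ASCII characters as \uXXXX by default.
echoed = json.dumps(data)

# Parsing the escaped form gives back exactly the same characters.
round_tripped = json.loads(echoed)
```

The escaped string and the original are two spellings of the same JSON value, so a client that parses the echoed response should see the characters it sent.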


Your procedure is fine; you just need one more step: encoding from unicode to UTF-8 (or any other encoding that supports the 'weird characters').

Think of decoding as what you do to go from a regular string to unicode and encoding as what you do to get back from unicode. In other words:

You decode a str to produce a unicode string

and encode a unicode string to produce a str.

So:

params = {'weird-chars': u'\xb0\xe7'}

encodedchars = params['weird-chars'].encode('utf-8')

encodedchars will hold your characters as a byte string in the selected encoding (in this case, UTF-8).
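
If you want the echoed JSON itself to carry the literal characters rather than \uXXXX escapes, one option is json.dumps with ensure_ascii=False followed by an explicit encode. This is a minimal sketch, assuming Python 3 and a response sent as UTF-8 bytes; text and body are hypothetical names for whatever you write to the response.

```python
import json

# Same value as u'\xb0\xe7' in the example above.
params = {"weird-chars": "\u00b0\u00e7"}

# ensure_ascii=False keeps non-ASCII characters instead of escaping them.
text = json.dumps(params, ensure_ascii=False)

# Encode explicitly so the bytes sent to the client are UTF-8.
body = text.encode("utf-8")
```

Either form parses to the same value on the client. If you send the unescaped form, the response should also declare UTF-8 as its charset so the browser decodes it correctly.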
