How can I get unicode characters from a URL parameter?
I need to use a GET request to send JSON to my server via a JavaScript client, so I started echoing responses back to make sure nothing is lost in translation. There doesn't seem to be a problem with normal text, but as soon as I include a Unicode character of any sort (e.g. "ç") the character is encoded somehow (e.g. "\u00e7") and the return value is different from the request value. My primary concerns are that A) my Python code correctly saves to the database what the client intended to send, and B) I echo back to the client the same values that were sent (when testing).
Perhaps this means I can't use base64, or have to do something different along the way. I'm ok with that. My implementation is just an attempt at a means to an end.
Current steps (any step can be changed, if needed):
Raw JSON string which I want to send to the server:
'{"weird-chars": "°ç"}'
JavaScript Base64 encoded version of the string passed to server via GET param (on a side note, will the equals sign at the end of the encoded string cause any issues?):
http://www.myserver.com/?json=eyJ3ZWlyZC1jaGFycyI6ICLCsMOnIn0=
Python str result from b64decode of param:
'{"weird-chars": "\xc2\xb0\xc3\xa7"}'
Python dict from json.loads of decoded param:
{'weird-chars': u'\xb0\xe7'}
Python str from json.dumps of that dict (and subsequent output to the browser):
'{"weird-chars": "\u00b0\u00e7"}'
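For reference, the steps above can be sketched roughly as follows. This is only a sketch and assumes Python 3 (the transcript above is Python 2, where str and unicode play the roles that bytes and str play here). The helper names decode_param and encode_response are made up for illustration. It also percent-encodes the parameter, which sidesteps the question about the trailing equals sign.

```python
import base64
import json
from urllib.parse import urlencode, parse_qs

def decode_param(b64_value):
    """Hypothetical helper: base64 GET parameter -> Python object."""
    raw_bytes = base64.b64decode(b64_value)   # UTF-8 encoded bytes
    text = raw_bytes.decode("utf-8")          # bytes -> text
    return json.loads(text)

def encode_response(obj):
    """Hypothetical helper: Python object -> JSON text for the echo."""
    # ensure_ascii=False keeps non-ASCII characters literal
    # instead of emitting \uXXXX escapes.
    return json.dumps(obj, ensure_ascii=False)

param = "eyJ3ZWlyZC1jaGFycyI6ICLCsMOnIn0="

# Percent-encode the value when building the query string;
# the trailing "=" becomes %3D and is restored on parsing.
query = urlencode({"json": param})
restored = parse_qs(query)["json"][0]

data = decode_param(restored)
echoed = encode_response(data)
print(echoed)
```

With ensure_ascii left at its default, json.dumps would emit the \u00b0\u00e7 form instead. Both are valid JSON for the same string, so the choice mostly affects how the raw response looks.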
Everything looks fine to me.
>>> hex(ord(u'°'))
'0xb0'
>>> hex(ord(u'ç'))
'0xe7'
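To illustrate the same point from the JSON side, here is a small check (a sketch, assuming Python 3; the variable names are only for illustration) that the escaped output and the literal characters parse to the same value:

```python
import json

# What json.dumps emits by default: backslash-u escapes as plain ASCII.
escaped = '{"weird-chars": "\\u00b0\\u00e7"}'

# The same document with the characters written literally.
literal = '{"weird-chars": "\u00b0\u00e7"}'

# The two texts differ, but a JSON parser treats them as equal.
same_text = escaped == literal
same_value = json.loads(escaped) == json.loads(literal)
print(same_text, same_value)
```

So a JavaScript client that runs the response through JSON.parse should get the original characters back, even though the raw response text looks different.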
Perhaps you should decode the JSON before attempting to use it.
Your procedure's fine; you just need one more step: encoding from unicode to UTF-8 (or any other encoding that supports the 'weird characters').
Think of decoding as what you do to go from a regular string to unicode, and encoding as what you do to get back from unicode. In other words:
You de-code a str to produce a unicode string,
and en-code a unicode string to produce a str.
So:
params = {'weird-chars': u'\xb0\xe7'}
encodedchars = params['weird-chars'].encode('utf-8')
encodedchars will contain your characters, represented in the selected encoding (in this case, utf-8).
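If you are on Python 3 instead (an assumption; the snippet above is Python 2), the same idea holds with the types renamed: text is str and encoded data is bytes. A minimal sketch:

```python
# Text string: unicode in Python 2, str in Python 3.
text = "\u00b0\u00e7"

# Encoding turns text into bytes in the chosen encoding.
encoded = text.encode("utf-8")

# Decoding turns those bytes back into text.
decoded = encoded.decode("utf-8")

print(repr(encoded))
print(decoded == text)
```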