Confusion with unicode in Python
As part of a Django site, users can enter street names and the entry will be added to a google maps geocoding link. Everything works well, until users enter special characters.
I would like to display the special character in the link, but Python replaces the character with a Unicode escape. Is there a way to prevent Python from switching to Unicode and simply take the user's input as-is? I have tried several decoders and formats, but none of them solved the problem.
edit: The code is programmed in Python 2.
I am currently requesting the JSON response as follows:
url = ("http://maps.googleapis.com/maps/api/geocode/json?address=" +
       addressString.decode('ascii') + "&sensor=false")
googleResponse = urllib.urlopen(url)
Thank you for your help and advice.
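First, check whether addressString is a unicode object (in Python 2, type(addressString) shows <type 'unicode'>; in Python 3, every str is Unicode text, and quote lives in urllib.parse instead of urllib). If it is, you probably need to encode it to UTF-8 and percent-escape the result: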
url = ("http://maps.googleapis.com/maps/api/geocode/json?address=" +
       urllib.quote(addressString.encode('utf-8')) + "&sensor=false")
If addressString is a (non-unicode) str object in Python 2, or a bytes object in Python 3, then it has to be encoded in UTF-8 already. In that case, try the following:
url = ("http://maps.googleapis.com/maps/api/geocode/json?address=" +
       urllib.quote(addressString) + "&sensor=false")
Both of these snippets convert the non-ASCII characters to URL escape sequences using % signs. That is the standard way of putting non-ASCII characters in a URL. Modern browsers should decode these sequences and display them as Unicode characters.
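The two cases above can be folded into one small helper. This is only a sketch: build_geocode_url and GEOCODE_BASE are names made up for illustration, and the helper assumes that any byte string it receives is already UTF-8. The import fallback is there so the same code runs on Python 2 and Python 3.

```python
try:
    from urllib import quote          # Python 2
except ImportError:
    from urllib.parse import quote    # Python 3

GEOCODE_BASE = "http://maps.googleapis.com/maps/api/geocode/json?address="


def build_geocode_url(address):
    """Return the geocoding URL for a text or UTF-8 byte-string address.

    Hypothetical helper for illustration, not part of any library.
    """
    if not isinstance(address, bytes):
        # Text (unicode in Python 2, str in Python 3): encode to UTF-8 first.
        address = address.encode('utf-8')
    # Byte strings are assumed to be UTF-8 already; percent-escape them.
    return GEOCODE_BASE + quote(address) + "&sensor=false"


url = build_geocode_url(u'Wilhelmstra\xdfe 123, T\xfcbingen')
print(url)
```

Passing the same address as UTF-8 bytes ('Wilhelmstra\xc3\x9fe 123, T\xc3\xbcbingen') should give an identical URL, since only the encoding step is skipped.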
[big fat comment because comments can't be formatted well]
Following the instructions of @Boaz Yaniv works for me:
>>> addressString = 'Wilhelmstra\xc3\x9fe 123, T\xc3\xbcbingen, Deutschland'
That's a str object, encoded in UTF-8. We need to percent-escape it so that it can be used in a URL.
>>> import urllib
>>> fixed = urllib.quote(addressString)
>>> print repr(fixed)
'Wilhelmstra%C3%9Fe%20123%2C%20T%C3%BCbingen%2C%20Deutschland'
Now let's try it out:
>>> url = "http://maps.googleapis.com/maps/api/geocode/json?address=" + fixed + "&sensor=false"
>>> guff = urllib.urlopen(url).read()
>>> import json
>>> print repr(json.loads(guff)['results'][0]['formatted_address'])
u'Wilhelmstra\xdfe 123, 72074 T\xfcbingen, Germany'
>>>
If you have something like this: 'Wilhelmstra\xdfe 123, T\xfcbingen, Deutschland', that's a str object encoded in latin1 or cp1252 or whatever. You'll need to decode that to a unicode object, then encode that in UTF-8, then percent-escape it.
However, if you have (VERY subtle difference) u'Wilhelmstra\xdfe 123, T\xfcbingen, Deutschland', that's a unicode object, and you'll need to encode that in UTF-8, then percent-escape it.
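Here is a rough sketch of those two cases side by side. It assumes the byte string really is latin1; if the data is actually cp1252 or something else, the codec name passed to decode has to change. The variable names are just for illustration, and the import fallback lets it run on Python 2 or 3.

```python
try:
    from urllib import quote          # Python 2
except ImportError:
    from urllib.parse import quote    # Python 3

# Case 1: a byte string in latin1 (assumed encoding).
latin1_bytes = b'Wilhelmstra\xdfe 123, T\xfcbingen, Deutschland'
as_text = latin1_bytes.decode('latin-1')        # bytes -> unicode text
from_latin1 = quote(as_text.encode('utf-8'))    # text -> UTF-8 -> %XX

# Case 2: already a unicode object, so only encode and percent-escape.
already_text = u'Wilhelmstra\xdfe 123, T\xfcbingen, Deutschland'
from_text = quote(already_text.encode('utf-8'))

print(repr(from_latin1))
print(repr(from_text))
```

Both paths should end at the same percent-escaped string as the UTF-8 example earlier in this answer.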
You said: "I still get the same error message: Exception Type: UnicodeEncodeError Exception Value: 'ascii' codec can't encode character u'\xdf' in position 10: ordinal not in range(128) when requesting the link"
This looks like you are feeding a unicode object to something which wants a str object and tries to get it by encoding using the (usual default) ascii encoding. If you continue to have this problem, show your code. Break it down to the minimum necessary (as I did above). Show repr(step_by_step_results).
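To make that concrete, here is a minimal sketch that reproduces this kind of error on purpose and prints the repr and type at each step. The sample address is made up, so the reported position (11) differs from the one in your traceback; the explicit encode('ascii') only stands in for whatever implicit conversion your code triggers.

```python
address = u'Wilhelmstra\xdfe 123'    # a unicode object (sample data)
steps = [('input', address)]

try:
    # Roughly what happens implicitly when a unicode object is handed
    # to something that wants a byte string and falls back to ascii.
    address.encode('ascii')
    failed_at = None
    failure = None
except UnicodeEncodeError as exc:
    failed_at = exc.start            # index of the offending character
    failure = str(exc)

# Encoding explicitly to UTF-8 avoids the implicit ascii step.
utf8_bytes = address.encode('utf-8')
steps.append(('utf-8 bytes', utf8_bytes))

for label, value in steps:
    print("%s: %r (%s)" % (label, value, type(value).__name__))
print("ascii encode failed at index %r: %s" % (failed_at, failure))
```

Output like this, for each step of the real code, is what makes it possible to see where the unicode object sneaks in.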
I'm not sure, but try:
url = ("http://maps.googleapis.com/maps/api/geocode/json?address=" +
       addressString.decode('utf-8') + "&sensor=false")
googleResponse = urllib.urlopen(url)