How to decode a URL-encoded string in Python
I have created a function for decoding a URL.
    from urllib import unquote

    def unquote_u(source):
        result = source
        if '%u' in result:
            result = result.replace('%u','\\u').decode('unicode_escape')
        result = unquote(result)
        print result
        return result

    if __name__=='__main__':
        unquote_u('{%22%22%3A%22test_%E5%93%A6%E4%BA%88%E4%BB%A5%E8%85%BF%E5%93%A6.doc.txt%22%2C%22mimeType%22%3A%22text%2Fplain%22%2C%22compressed%22%3Afalse%7D')
But I am not able to get the proper file name. The proper file name is: test_哦予以腿哦.doc
Can anyone tell me how to do that?
urllib.unquote can do it:
>>> urllib.unquote('{%22%22%3A%22test_%E5%93%A6%E4%BA%88%E4%BB%A5%E8%85%BF%E5%93%A6.doc.txt%22%2C%22mimeType%22%3A%22text%2Fplain%22%2C%22compressed%22%3AFalse%7D')
'{"":"test_\xe5\x93\xa6\xe4\xba\x88\xe4\xbb\xa5\xe8\x85\xbf\xe5\x93\xa6.doc.txt","mimeType":"text/plain","compressed":False}'
>>> eval(_)
{'': 'test_\xe5\x93\xa6\xe4\xba\x88\xe4\xbb\xa5\xe8\x85\xbf\xe5\x93\xa6.doc.txt', 'mimeType': 'text/plain', 'compressed': False}
>>> _['']
'test_\xe5\x93\xa6\xe4\xba\x88\xe4\xbb\xa5\xe8\x85\xbf\xe5\x93\xa6.doc.txt'
>>> print _
test_哦予以腿哦.doc.txt
Note that I had to change "false" to "False" in the quoted string so that eval() would accept it as a Python literal. Also note that the string returned by unquote is still a UTF-8 encoded byte string; you can use str.decode('utf8') to get a Unicode string if that is what you require.
As JBernardo mentions, eval() of unsafe data is a very bad idea. Anybody knowing, or even suspecting, that a server-side script is eval()-ing form data can easily craft a POST with commands that can compromise the server. Better would be this:
>>> import json, urllib
>>> json.loads(urllib.unquote('{%22%22%3A%22test_%E5%93%A6%E4%BA%88%E4%BB%A5%E8%85%BF%E5%93%A6.doc.txt%22%2C%22mimeType%22%3A%22text%2Fplain%22%2C%22compressed%22%3Afalse%7D'))['']
u'test_\u54e6\u4e88\u4ee5\u817f\u54e6.doc.txt'
>>> print _
test_哦予以腿哦.doc.txt
Also note that this latter approach didn't require changing false to False; in fact it doesn't work if I do. The json package takes care of that, because lowercase false is the JSON spelling of the boolean.
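For anyone reading this on Python 3, here is a rough sketch of the same idea. It assumes Python 3, where unquote lives in urllib.parse and by default decodes the percent-escapes as UTF-8 into a text string. The helper name decode_payload is only made up for illustration.

```python
import json
from urllib.parse import unquote

# The quoted string from the question, split only for readability.
ENCODED = ('{%22%22%3A%22test_%E5%93%A6%E4%BA%88%E4%BB%A5%E8%85%BF%E5%93%A6'
           '.doc.txt%22%2C%22mimeType%22%3A%22text%2Fplain%22%2C'
           '%22compressed%22%3Afalse%7D')


def decode_payload(quoted):
    """Percent-decode the string, then parse the result as JSON.

    Hypothetical helper name; not part of any library.
    """
    # In Python 3, unquote() returns str and decodes %XX bytes as UTF-8.
    return json.loads(unquote(quoted))


payload = decode_payload(ENCODED)
filename = payload[""]  # the file name is stored under the empty key
print(filename)
```

No separate decode('utf8') step should be needed here, since unquote already hands back text rather than bytes.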
One thing to add: after getting the unquoted URL from urllib.unquote(url), you will probably need to call decode('utf8') on it to convert the raw byte string into a unicode string.
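To make the "unquote first, decode second" step visible, here is a small sketch. It assumes Python 3, where urllib.parse.unquote_to_bytes returns the raw bytes, which plays the role of the Python 2 byte string above. The quoted value is just the file-name portion of the string from the question.

```python
from urllib.parse import unquote_to_bytes

# File-name portion of the quoted string from the question.
quoted = 'test_%E5%93%A6%E4%BA%88%E4%BB%A5%E8%85%BF%E5%93%A6.doc.txt'

# Step 1: undo the percent-encoding; the result is still UTF-8 encoded bytes.
raw = unquote_to_bytes(quoted)

# Step 2: decode those bytes as UTF-8 to get a text (unicode) string.
text = raw.decode('utf-8')

print(repr(raw))
print(text)
```

If the bytes were not valid UTF-8, the decode call would raise UnicodeDecodeError, so it is worth knowing which encoding the sender used.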