开发者

how to encode url in python

I have create开发者_高级运维d a function for decoding url.

from urllib import unquote

def unquote_u(source):
  result = source
  if '%u' in result:
    result = result.replace('%u','\\u').decode('unicode_escape')
  result = unquote(result)
  print result
  return result

if __name__=='__main__':
    unquote_u('{%22%22%3A%22test_%E5%93%A6%E4%BA%88%E4%BB%A5%E8%85%BF%E5%93%A6.doc.txt%22%2C%22mimeType%22%3A%22text%2Fplain%22%2C%22compressed%22%3Afalse%7D')

But, I am not ale to get proper file name. proper file name is : test_哦予以腿哦.doc

can anyone tell me how to do that?


urllib.unquote can do it:

>>> urllib.unquote('{%22%22%3A%22test_%E5%93%A6%E4%BA%88%E4%BB%A5%E8%85%BF%E5%93%A6.doc.txt%22%2C%22mimeType%22%3A%22text%2Fplain%22%2C%22compressed%22%3AFalse%7D')
'{"":"test_\xe5\x93\xa6\xe4\xba\x88\xe4\xbb\xa5\xe8\x85\xbf\xe5\x93\xa6.doc.txt","mimeType":"text/plain","compressed":False}'
>>> eval(_)
{'': 'test_\xe5\x93\xa6\xe4\xba\x88\xe4\xbb\xa5\xe8\x85\xbf\xe5\x93\xa6.doc.txt', 'mimeType': 'text/plain', 'compressed': False}
>>> _['']
'test_\xe5\x93\xa6\xe4\xba\x88\xe4\xbb\xa5\xe8\x85\xbf\xe5\x93\xa6.doc.txt'
>>> print _
test_哦予以腿哦.doc.txt

Note that I had to change "false" to "False" in the quoted string. Also that the string after unquote is still UTF-8 encoded; you can use str.decode('utf8') to get a Unicode string if that is what you require.


As JBernardo mentions, eval() of unsafe data is a very bad idea. Anybody knowing, or even suspecting, that a server-side script is eval()-ing form data can easily craft a POST with commands that can compromise the server. Better would be this:

>>> import json, urllib
>>> json.loads(urllib.unquote('{%22%22%3A%22test_%E5%93%A6%E4%BA%88%E4%BB%A5%E8%85%BF%E5%93%A6.doc.txt%22%2C%22mimeType%22%3A%22text%2Fplain%22%2C%22compressed%22%3Afalse%7D'))['']
u'test_\u54e6\u4e88\u4ee5\u817f\u54e6.doc.txt'
>>> print _
test_哦予以腿哦.doc.txt

Also note that this later approach didn't require changing false to False; in fact it doesn't work if I do. The json package takes care of that.


One thing to add, after get unquoted url from urllib.unquote(url), you probably need use decode('utf8') to convert the raw string into a unicode string.

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜