
Python string and UTF-8 problems

I am writing a script that fetches some data from my website using an HTTP GET request.

My problem is that I have to pass Unicode characters to the website.

I am reading a file that contains these characters and then trying to build a URL from them in order to make the request.

The file is UTF-8 encoded, and I use this to read from it:

import codecs
f = codecs.open("values.txt", encoding='utf-8')

Then I read the first line of the file and concatenate the value with the URL:

sUrl = "http://example.com?word="
value = f.readline()
visitUrl = sUrl + value

If I use print visitUrl, the output is correct, i.e. http://example.com?word=π

How can I use visitUrl without destroying my special characters? I tried to encode the string to ASCII, but that doesn't work for all characters.


Quote the URL:

import urllib
s = u'Здравей'
urllib.quote(s.encode('utf-8'))
# %D0%97%D0%B4%D1%80%D0%B0%D0%B2%D0%B5%D0%B9

Or use urlencode directly to build the query part of the URL:

urllib.urlencode({'data': s.encode('utf-8')})
# 'data=%D0%97%D0%B4%D1%80%D0%B0%D0%B2%D0%B5%D0%B9'


Build the URL with urllib.urlencode rather than constructing it by concatenating strings. Non-ASCII characters in a URL need to be percent-encoded (URL encoded).
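
As a rough sketch of how that could look with the code from the question: the helper name build_visit_url is made up for illustration, the query parameter name 'word' is taken from the question's URL, and the try/except import is only there so the snippet runs on both Python 2 (as used in this thread) and Python 3.

```python
# -*- coding: utf-8 -*-
try:
    from urllib.parse import urlencode  # Python 3 location
except ImportError:
    from urllib import urlencode  # Python 2 location, as used in this thread


def build_visit_url(base_url, word):
    """Hypothetical helper: append `word` to base_url as a percent-encoded query value."""
    # readline() keeps the trailing newline, so strip it before encoding.
    # Encoding to UTF-8 bytes first lets urlencode percent-encode each byte.
    query = urlencode({'word': word.strip().encode('utf-8')})
    return base_url + '?' + query


# u"π\n" stands in for a line read from the UTF-8 file
print(build_visit_url("http://example.com", u"π\n"))
```

With the file object from the question, the call would be build_visit_url("http://example.com", f.readline()). The strip() is there because the line read from the file still ends with a newline, which would otherwise be encoded into the URL as well.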
