Python encoding

Using mechanize, I retrieved the source of a web page that contains non-ASCII characters, such as Chinese characters.

The code is below:

# using Python 2.6
from mechanize import Browser

br = Browser()
br.open("http://www.example.html")

src = br.response().read()  # retrieve the source of the page

print src   # print the source

Questions:

1. The page source declares charset=gb2312, yet when I print src, all the contents display correctly, with no gibberish. Why? Does print know src's encoding?

2. Should I explicitly decode or encode src?


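If src is a unicode object, it has no encoding of its own; print (or more correctly, sys.stdout.write()) figures out what encoding to use when outputting, based on sys.stdout.encoding. If src is instead a byte string (str), which is what read() normally returns in Python 2, print writes the bytes unchanged, and they display correctly only because the terminal's encoding happens to be compatible with gb2312. Decoding explicitly with src.decode('gb2312') removes the ambiguity.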
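To make the decode step concrete, here is a minimal sketch. It assumes the page really is gb2312 and uses a hard-coded byte string, raw, as a hypothetical stand-in for what br.response().read() would return; decode_page is an illustrative helper name, not part of mechanize.

```python
# -*- coding: utf-8 -*-
import sys

# Hypothetical stand-in for the bytes returned by br.response().read():
# a short HTML fragment followed by two Chinese characters in gb2312.
raw = b'<meta charset="gb2312">\xd6\xd0\xce\xc4'


def decode_page(data, charset='gb2312'):
    """Turn raw page bytes into a unicode string using the page's charset."""
    return data.decode(charset)


text = decode_page(raw)

# Encode explicitly for output; fall back to UTF-8 when stdout has no
# encoding (for example when output is piped in Python 2).
out_encoding = sys.stdout.encoding or 'utf-8'
encoded = text.encode(out_encoding, 'replace')

print(repr(text))
```

Once the bytes are decoded, the text no longer depends on the terminal happening to match the page's charset, and the encoding used for output becomes an explicit choice.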
