python encoding
Using mechanize, I retrieved the source of a web page that contains some non-ASCII characters, such as Chinese characters.
The code is below:
# Using Python 2.6
from mechanize import Browser

br = Browser()
br.open("http://www.example.html")
src = br.response().read()  # retrieve the source of the web page
print src  # print the source
Questions:
1. According to the page source, its charset is gb2312, but when I print src, all the content is displayed correctly, with no gibberish. Why? Does print know the encoding of src?
2. Should I explicitly decode or encode src?
src is a unicode object, which has no encoding. print (or more correctly, sys.stdout.write()) figures out what encoding to use when outputting.
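On the second question, it is worth checking type(src) first. If it turns out to be a byte string rather than a unicode object, it can be decoded explicitly with the charset the page declares. Here is a minimal sketch, assuming the page really is encoded as gb2312. The byte string raw is a hypothetical stand-in for what br.response().read() returns, so the snippet runs without network access, and the fallback to UTF-8 is my own assumption for the case where sys.stdout reports no encoding.

```python
import sys

# Hypothetical sample: two Chinese characters encoded as gb2312,
# standing in for the bytes returned by br.response().read().
raw = b"\xd6\xd0\xce\xc4"

# Decode explicitly with the charset declared by the page.
text = raw.decode("gb2312")

# Pick the encoding of the output stream; fall back to UTF-8 when the
# stream reports none (for example, when output is piped to a file).
out_encoding = getattr(sys.stdout, "encoding", None) or "utf-8"

# Encode for output, replacing characters the target cannot represent.
encoded = text.encode(out_encoding, "replace")
```

Once the text is decoded, printing it leaves the choice of output encoding to print, as described above. Encoding by hand, as in the last line, is mainly useful when writing to a file or a pipe.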