UnicodeDecodeError problem with mechanize [duplicate]
I receive the following string from one website via mechanize:
'We\x92ve'
I know that \x92 stands for the ’ character. I'm trying to convert that string to Unicode:
>>> unicode('We\x92ve','utf-8')
UnicodeDecodeError: 'utf8' codec can't decode byte 0x92 in position 2: unexpected code byte
What am I doing wrong?
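A short sketch of why UTF-8 rejects this byte, written for Python 3 (where bytes.decode replaces Python 2's unicode(..., encoding)). In UTF-8 every byte from 0x80 to 0xBF is a continuation byte that may only follow a lead byte, so a lone 0x92 right after ASCII text is invalid. The helper name utf8_failure is hypothetical, and Python 3 words the error differently from the Python 2 message above.

```python
def utf8_failure(raw: bytes):
    """Return (reason, position) if raw is not valid UTF-8, else None."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError as exc:
        # exc.reason is the codec's explanation, exc.start the offending offset
        return exc.reason, exc.start
    return None


print(utf8_failure(b"We\x92ve"))       # ('invalid start byte', 2)

# In UTF-8 the same character (U+2019 RIGHT SINGLE QUOTATION MARK)
# is encoded as three bytes, none of which is 0x92 on its own.
print("\u2019".encode("utf-8"))        # b'\xe2\x80\x99'
```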
Edit: The reason I tried 'utf-8' was this:
>>> response = browser.response()
>>> response.info()['content-type']
'text/html; charset=utf-8'
Now I see I can't always trust the Content-Type header.
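Since the header can lie, one defensive pattern is to try the declared charset first and fall back to Windows-1252 when decoding fails. This is a minimal Python 3 sketch, assuming body holds the raw bytes read from the response and content_type is the header string shown above; decode_body, charset_from_content_type and the cp1252 fallback are illustrative choices, not part of mechanize.

```python
from email.message import Message


def charset_from_content_type(content_type: str):
    """Extract the charset parameter (lower-cased) from a Content-Type value, or None."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()


def decode_body(body: bytes, content_type: str, fallback: str = "cp1252") -> str:
    """Decode with the declared charset; if that fails, retry with the fallback."""
    declared = charset_from_content_type(content_type) or fallback
    try:
        return body.decode(declared)
    except (UnicodeDecodeError, LookupError):
        # The header was wrong (or named an unknown codec); cp1252 leaves a few
        # bytes undefined, so errors="replace" keeps this step from raising.
        return body.decode(fallback, errors="replace")


print(decode_body(b"We\x92ve", "text/html; charset=utf-8"))   # We’ve
```

The ordering matters: text that is really Windows-1252 almost never happens to be valid UTF-8, so trying UTF-8 first rarely misfires, while the reverse order would silently accept anything.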
\x92 stands for ’ all right, but it does so in the Windows-1252 encoding, not in UTF-8:
>>> print unicode('We\x92ve','1252')
We’ve
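For reference, the Python 3 equivalent of the line above decodes the bytes object directly; "1252", "cp1252" and "windows-1252" are all accepted names for the same codec. Re-encoding the result shows the UTF-8 bytes the site would have had to send for its charset=utf-8 header to be true.

```python
raw = b"We\x92ve"

text = raw.decode("cp1252")     # same codec as unicode(raw, '1252') in Python 2
print(text)                     # We’ve

# What a genuinely UTF-8 page would contain for the same text:
print(text.encode("utf-8"))     # b'We\xe2\x80\x99ve'
```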
If you don't know what encoding your source data is in, you can detect it using chardet (extremely easy to use).
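A minimal sketch of the chardet approach in Python 3, assuming the third-party chardet package is installed (pip install chardet); chardet.detect takes bytes and returns a dict whose "encoding" key holds its best guess, which may be None. The wrapper detect_and_decode and its cp1252 fallback are hypothetical, and the guess for very short inputs like 'We\x92ve' can be unreliable.

```python
try:
    import chardet  # third-party: pip install chardet
except ImportError:  # keep the sketch runnable without chardet
    chardet = None


def detect_and_decode(data: bytes, fallback: str = "cp1252") -> str:
    """Guess the encoding with chardet (if available) and decode; fall back otherwise."""
    encoding = None
    if chardet is not None:
        guess = chardet.detect(data)      # e.g. {'encoding': ..., 'confidence': ..., ...}
        encoding = guess.get("encoding")  # may be None when chardet has no idea
    try:
        return data.decode(encoding or fallback)
    except (UnicodeDecodeError, LookupError):
        # Wrong or unknown guess: decode with the fallback, replacing bad bytes.
        return data.decode(fallback, errors="replace")


print(detect_and_decode(b"hello"))   # hello
```

Longer samples (a whole page rather than one word) give chardet much more to work with, so it is worth detecting on the full response body.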