
UTF-8 in Python

This seems to be a common question among international developers, but I haven't found a straight answer yet. I'm getting the following string from a feed: "Carlos e Carlos mostram o que há de melhor na internet"

The following error is returned to the console: UnicodeDecodeError: 'utf8' codec can't decode bytes in position 31-33: invalid data

thanks in advance,

fbr


You can't just decode using some random encoding, even if it is UTF-8; you must decode using the encoding returned in the HTTP headers or an equivalent within the document (such as within the META element of HTML).
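As a rough sketch of that idea, the snippet below pulls the charset out of a Content-Type header value and decodes the raw bytes with it. The helper names `charset_from_content_type` and `decode_body`, the UTF-8 fallback, and the sample header value are all assumptions made for illustration; the snippet targets Python 3 and does not cover a charset declared inside the document itself.

```python
from email.message import Message


def charset_from_content_type(content_type, default="utf-8"):
    """Return the charset named in a Content-Type value, or `default`."""
    # Hypothetical helper: lets the stdlib header parser handle the parameters.
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset() or default


def decode_body(raw_bytes, content_type):
    """Decode raw bytes using the charset declared in the header."""
    return raw_bytes.decode(charset_from_content_type(content_type))


# Simulate a feed that was served as Latin-1 (an assumption about the feed).
text = "Carlos e Carlos mostram o que há de melhor na internet"
raw = text.encode("latin-1")
decoded = decode_body(raw, "text/xml; charset=ISO-8859-1")
```

With the declared charset applied, `decoded` round-trips to the original string instead of raising UnicodeDecodeError.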

If the encoding isn't available or is incorrect, specify in the decode operation how an invalid byte sequence should be handled; 'replace' usually suffices for this.

>>> print u'Carlos e Carlos mostram o que há de melhor na internet'.encode('latin1').decode('utf-8', 'replace')
Carlos e Carlos mostram o que h�e melhor na internet
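
For Python 3, where the session above would need `print()` and a plain string literal, roughly the same approach could look like the sketch below. `decode_lenient` is a hypothetical helper name, and treating the feed bytes as Latin-1 is an assumption carried over from the example above.

```python
def decode_lenient(raw_bytes, encoding="utf-8"):
    """Try a strict decode first; fall back to replacing invalid bytes."""
    try:
        return raw_bytes.decode(encoding)
    except UnicodeDecodeError:
        # Each undecodable sequence becomes U+FFFD (the replacement character).
        return raw_bytes.decode(encoding, errors="replace")


text = "Carlos e Carlos mostram o que há de melhor na internet"
raw = text.encode("latin-1")  # assumption: the feed bytes are Latin-1

try:
    raw.decode("utf-8")
    error_position = None
except UnicodeDecodeError as exc:
    error_position = exc.start  # index of the first offending byte

result = decode_lenient(raw)
```

The strict decode fails at the same position the original error reports. On Python 3 the replacement covers only the offending byte, so the result keeps the following " d" and differs slightly from the Python 2 output shown above.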