
Determine encoding of text

I'm getting some weird characters in the response from a webpage. I'm pretty sure it's a message in Russian, but the encoding seems to be all wrong. The webpage info tells me the encoding is ISO-8859-1. Here is a sample response:

Âû ñòðàíè÷êå ïðåâüþøêàìè

Is there a way to decode this response? Is the response salvageable at all?


It looks like the encoding is actually Cyrillic Windows-1251. Switch your web browser's encoding accordingly.

For example, the text you supplied, decoded as Windows-1251, is:

Вы страничке превьюшками

which an auto-translation says means "You page previews".
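To see where the garbled text comes from, here is a minimal Python 3 sketch that re-creates it. It assumes the server stored the Russian text as Windows-1251 bytes and the client decoded those bytes as ISO-8859-1 (Latin-1) because it trusted the page's label; the variable names are just for illustration.

```python
# Assumption: the page's bytes are Windows-1251, but they were decoded as Latin-1.
original = "Вы страничке превьюшками"

raw = original.encode("windows-1251")  # the bytes as the server sent them
garbled = raw.decode("latin-1")        # what a client trusting the ISO-8859-1 label shows

print(garbled)

# Latin-1 maps every byte 0x00-0xFF to exactly one code point and back,
# so re-encoding with Latin-1 recovers the original bytes losslessly.
repaired = garbled.encode("latin-1").decode("windows-1251")
print(repaired)
```

Because the Latin-1 step is lossless, the damage is fully reversible as long as nothing else re-encoded the text in between.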


It is hardly possible to recognize 8-bit encodings automatically, because nearly every byte value maps to some character in them, so decoding almost never fails outright. In this case, I'm pretty sure it is Windows-1251, because the text reads as meaningful Russian in that encoding:

Вы страничке превьюшками

It's clearly not ISO-8859-1.
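To illustrate why a successful decode proves nothing, this sketch decodes the same bytes with several 8-bit encodings. The candidate list is my own assumption of encodings a Russian page might plausibly use; none of them raises an error on this input, and only one produces readable text.

```python
# The bytes from the question, recovered by re-encoding the Latin-1 view.
raw = "Âû ñòðàíè÷êå ïðåâüþøêàìè".encode("latin-1")

# Hypothetical candidates: 8-bit encodings that could label a Russian page.
candidates = ["latin-1", "windows-1251", "koi8-r", "cp866", "iso-8859-5"]

decoded = {}
for enc in candidates:
    # No exception here: each codec maps every byte in this input to some character.
    decoded[enc] = raw.decode(enc)
    print(f"{enc:>12}: {decoded[enc]}")
```

Every line prints without complaint, so choosing between them comes down to which output is meaningful in the expected language.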

To convert this into a Unicode string, use the bytes.decode method:

b = "Âû ñòðàíè÷êå ïðåâüþøêàìè".encode("Latin-1")  # simulate the incoming byte string
u = b.decode("Windows-1251")  # reinterpret the same bytes as Cyrillic Windows-1251
print(u)
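If you have to pick the encoding in code rather than by eye, one rough approach is to decode with each candidate and score how plausible the result looks. The sketch below is a toy heuristic, not a real detector: it assumes the text is Russian prose and scores each decoding by the share of characters that are lowercase Russian letters or spaces. The function names and the letter set are hypothetical; dedicated libraries such as chardet use statistical language models instead.

```python
from typing import Sequence, Tuple

# Assumption: the text is ordinary Russian prose, which is mostly lowercase letters and spaces.
RUSSIAN_LOWER = set("абвгдеёжзийклмнопрстуфхцчшщъыьэюя ")


def russian_score(text: str) -> float:
    """Fraction of characters that are lowercase Russian letters or spaces."""
    if not text:
        return 0.0
    return sum(ch in RUSSIAN_LOWER for ch in text) / len(text)


def guess_encoding(raw: bytes, candidates: Sequence[str]) -> Tuple[str, str]:
    """Return (encoding, decoded_text) for the highest-scoring candidate."""
    best = max(candidates, key=lambda enc: russian_score(raw.decode(enc, errors="replace")))
    return best, raw.decode(best, errors="replace")


raw = "Âû ñòðàíè÷êå ïðåâüþøêàìè".encode("latin-1")
enc, text = guess_encoding(raw, ["latin-1", "koi8-r", "cp866", "iso-8859-5", "windows-1251"])
print(enc, text)
```

On short inputs a heuristic like this can easily pick the wrong candidate, so treat its answer as a guess to confirm, not a verdict.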
