Determine the encoding of text
I'm getting some weird characters in the response from a webpage. I'm pretty sure it's a message in Russian, but the encoding seems to be wrong. The webpage info tells me the encoding is ISO-8859-1. Here is a sample of the response:
Âû ñòðàíè÷êå ïðåâüþøêàìè
Is there a way to decode this response? Is the response salvageable at all?
It looks like the encoding is actually Windows-1251 (Cyrillic). Switch your web browser's encoding accordingly.
For example, the text you supplied in that encoding is:
Вы страничке превьюшками
which an auto-translation says means "You page previews".
It is hardly possible to detect 8-bit encodings reliably by automatic means, because nearly every byte sequence is technically valid in each of them. In this case, I'm pretty sure it is Windows-1251, because the text is meaningful when decoded that way:
Вы страничке превьюшками
It's clearly not ISO-8859-1.
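Since automatic detection is unreliable, one practical approach is to decode the same bytes under a few plausible encodings and judge the results by eye. The following is a minimal sketch of that idea; the helper name decode_candidates and the list of candidate encodings are assumptions chosen for illustration, not part of any library.

```python
# Hypothetical helper: decode the same bytes under several candidate
# 8-bit encodings so a human can pick the one that reads as real text.
# The candidate list is an illustrative assumption (common Cyrillic
# code pages plus Latin-1 for comparison).
CANDIDATES = ["cp1251", "koi8_r", "iso8859_5", "cp866", "latin_1"]

def decode_candidates(raw, candidates=CANDIDATES):
    """Return {encoding_name: decoded_text} for every candidate that decodes."""
    results = {}
    for name in candidates:
        try:
            results[name] = raw.decode(name)
        except UnicodeDecodeError:
            # Some code pages leave a few byte values undefined; skip those.
            continue
    return results

# Simulate the incoming bytes from the garbled sample in the question.
raw = "Âû ñòðàíè÷êå ïðåâüþøêàìè".encode("latin_1")

for name, text in decode_candidates(raw).items():
    print(f"{name:10s} {text}")
```

Only one of the printed lines should look like readable Russian; the others are valid decodings that happen to be gibberish, which is why the choice is left to a person rather than to the program.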
To convert this into a Unicode string, use the decode method:
b = "Âû ñòðàíè÷êå ïðåâüþøêàìè".encode("Latin-1") # simulate the incoming byte string
u = b.decode("Windows-1251")
print(u)