Character encoding cross-reference
I have just migrated a database containing Latin American place names from MS Access to MySQL. In the process, every instance of á has been changed to ‡. Here is my question:
Does there exist some sort of reference for looking up which character encoding has been translated to which other? For example, a place where I can enter a character and see how it would be misrepresented after a variety of erroneous encoding translations (e.g. ASCII to ISO 8859-1, ISO 8859-1 to UTF-8, etc.)?
Not that I'm aware of, but if you have a list of possible encodings, you can write a simple program like:
# Candidate text encodings; one way to enumerate the codecs Python
# knows about is via the stdlib's encodings.aliases table.
import encodings.aliases

ENCODINGS = set(encodings.aliases.aliases.values())

# If the bytes of 'á' in encoding x equal the bytes of '‡' in
# encoding y, then text stored as x and misread as y would show
# exactly this corruption.
for x in ENCODINGS:
    for y in ENCODINGS:
        try:
            if 'á'.encode(x) == '‡'.encode(y):
                print(x, '→', y)
        except (UnicodeError, LookupError):
            pass
Doing that, it appears in your case that the original encoding is one of:
- mac_arabic
- mac_centeuro
- mac_croatian
- mac_farsi
- mac_iceland
- mac_latin2
- mac_roman
- mac_romanian
- mac_turkish
and the misinterpreted encoding is one of:
- cp1250
- cp1251
- cp1252
- cp1253
- cp1254
- cp1255
- cp1256
- cp1257
- cp1258
- palmos
If you live in a "Western" locale, then mac_roman → cp1252 is the most likely possibility.
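Once you have identified the pair, the corruption can often be reversed in place, assuming the bytes survived the migration intact: re-encode the garbled text with the encoding it was misread as, then decode with the original encoding. A minimal sketch for the mac_roman → cp1252 case:

```python
# Undo the mojibake: encode the garbled text back to bytes using the
# misinterpreted encoding (cp1252), then decode those bytes with the
# presumed original encoding (mac_roman).
garbled = '‡'
restored = garbled.encode('cp1252').decode('mac_roman')
print(restored)  # á
```

This only works if every corrupted character still round-trips through both encodings; if some bytes were replaced or dropped during the migration, the original data is not recoverable this way.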