
Character encoding cross-reference

I have just migrated a database containing Latin American place names from MS Access to MySQL. In the process, every instance of á has been changed to ‡. Here is my question:

Is there a reference for looking up which character encoding has been translated into which other? For example, a place where I can enter a character and see how it would be misrepresented after a variety of erroneous encoding translations (e.g. ASCII to ISO 8859-1, ISO 8859-1 to UTF-8, etc.)?


Not that I'm aware of, but if you have a list of possible encodings, you can write a short Python program like this:

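# Candidate codec names to test (an illustrative subset; extend as needed)
ENCODINGS = ['ascii', 'latin_1', 'utf_8', 'cp1250', 'cp1252', 'mac_roman', 'mac_latin2']

for x in ENCODINGS:          # x: encoding the text was originally written in
    for y in ENCODINGS:      # y: encoding it was mistakenly read as
        try:
            if 'á'.encode(x) == '‡'.encode(y):
                print(x, '→', y)
        except (UnicodeError, LookupError):
            pass  # character not representable in x or y, or codec unavailable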
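The same idea can be turned around to do what the question asks for: start from one character and list what a reader would see after each wrong decoding. This is a minimal sketch; `misreadings` and `CANDIDATES` are hypothetical names, and the candidate list is only an example.

```python
def misreadings(char, encodings):
    """Map (written_as, read_as) pairs to the text a reader would see."""
    results = {}
    for src in encodings:
        try:
            raw = char.encode(src)  # bytes actually stored
        except (UnicodeError, LookupError):
            continue  # char not representable in src, or codec unknown
        for dst in encodings:
            if dst == src:
                continue
            try:
                shown = raw.decode(dst)  # what a dst-reading program displays
            except (UnicodeError, LookupError):
                continue  # those bytes are invalid in dst
            if shown != char:
                results[(src, dst)] = shown
    return results


CANDIDATES = ['utf_8', 'latin_1', 'cp1252', 'mac_roman', 'cp437']

for (src, dst), shown in sorted(misreadings('á', CANDIDATES).items()):
    print(f"{src:>9} -> {dst:<9} {shown!r}")
```

Pairs whose bytes decode back to the same character (latin_1 and cp1252 both store á as 0xE1, for instance) are left out, since they are not misreadings.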

Running that, it appears that, in your case, the original encoding is one of:

  • mac_arabic
  • mac_centeuro
  • mac_croatian
  • mac_farsi
  • mac_iceland
  • mac_latin2
  • mac_roman
  • mac_romanian
  • mac_turkish

and the misinterpreted encoding is one of:

  • cp1250
  • cp1251
  • cp1252
  • cp1253
  • cp1254
  • cp1255
  • cp1256
  • cp1257
  • cp1258
  • palmos
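The matches come down to shared bytes: a pair is reported only when á in the first encoding and ‡ in the second are stored as identical bytes (0x87 for mac_roman and cp1252, for example). A small sketch to print the bytes for each listed encoding, assuming a standard CPython install; `byte_for` is a hypothetical helper, and any codec not available in the running build simply shows up as None.

```python
# Encoding names copied from the two lists above. Codecs missing from the
# running Python build are reported as None rather than raising.
ORIGINALS = ['mac_arabic', 'mac_centeuro', 'mac_croatian', 'mac_farsi',
             'mac_iceland', 'mac_latin2', 'mac_roman', 'mac_romanian',
             'mac_turkish']
MISREADINGS = ['cp1250', 'cp1251', 'cp1252', 'cp1253', 'cp1254',
               'cp1255', 'cp1256', 'cp1257', 'cp1258', 'palmos']


def byte_for(char, encoding):
    """Return the bytes `encoding` uses for `char`, or None if unavailable."""
    try:
        return char.encode(encoding)
    except (UnicodeError, LookupError):
        return None  # char not representable, or codec not installed


for name in ORIGINALS:
    print(f"{name:>13}: 'á' -> {byte_for('á', name)!r}")
for name in MISREADINGS:
    print(f"{name:>13}: '‡' -> {byte_for('‡', name)!r}")
```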

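If you live in a "Western" locale, then mac_roman → cp1252 is the most likely possibility: the text was written as Mac Roman, where á is byte 0x87, and read as Windows-1252 (cp1252), where byte 0x87 is ‡.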
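One way to test that guess on real data is to reverse the misreading: encode the garbled text with the encoding it was wrongly read as (cp1252) to recover the stored bytes, then decode those bytes with the encoding they were really in (mac_roman). This sketch assumes the mac_roman → cp1252 diagnosis; `repair` and the sample string are hypothetical. It raises UnicodeEncodeError on text containing characters cp1252 cannot represent, and applying it to text that was never garbled would corrupt it, so try it on a few rows first.

```python
def repair(text, wrong='cp1252', right='mac_roman'):
    """Undo a misreading: get the original bytes back via `wrong`, decode via `right`.

    The defaults reflect the assumed mac_roman -> cp1252 mix-up.
    """
    return text.encode(wrong).decode(right)


print(repair('Bogot‡'))  # hypothetical garbled value; prints Bogotá
```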
