Character encoding cross-reference
I have just migrated a database containing Latin American place names from MS Access to MySQL. In the process, every instance of á has been changed to ‡. Here is my question:
Does there exist some sort of reference for looking up which character encoding has been translated to which other? For example, a place where I can enter a character and see how it would be misrepresented after a variety of erroneous encoding translations (e.g. ASCII to ISO 8859-1, ISO 8859-1 to UTF-8, etc.)?
Not that I'm aware of, but if you have a list of possible encodings, you can write a simple program like:
# Candidate text encodings; one way to enumerate the codecs Python
# knows about is via the stdlib's encodings.aliases table.
import encodings.aliases

ENCODINGS = set(encodings.aliases.aliases.values())

# If the bytes of 'á' in encoding x equal the bytes of '‡' in
# encoding y, then text stored as x and misread as y would show
# exactly this corruption.
for x in ENCODINGS:
    for y in ENCODINGS:
        try:
            if 'á'.encode(x) == '‡'.encode(y):
                print(x, '→', y)
        except (UnicodeError, LookupError):
            pass
Doing that, it appears in your case that the original encoding is one of:
- mac_arabic
- mac_centeuro
- mac_croatian
- mac_farsi
- mac_iceland
- mac_latin2
- mac_roman
- mac_romanian
- mac_turkish
and the misinterpreted encoding is one of:
- cp1250
- cp1251
- cp1252
- cp1253
- cp1254
- cp1255
- cp1256
- cp1257
- cp1258
- palmos
If you live in a "Western" locale, then mac_roman → cp1252 is the most likely possibility.
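Once you have identified the pair, the corruption can often be reversed in place, assuming the bytes survived the migration intact: re-encode the garbled text with the encoding it was misread as, then decode with the original encoding. A minimal sketch for the mac_roman → cp1252 case:

```python
# Undo the mojibake: encode the garbled text back to bytes using the
# misinterpreted encoding (cp1252), then decode those bytes with the
# presumed original encoding (mac_roman).
garbled = '‡'
restored = garbled.encode('cp1252').decode('mac_roman')
print(restored)  # á
```

This only works if every corrupted character still round-trips through both encodings; if some bytes were replaced or dropped during the migration, the original data is not recoverable this way.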