
What code page encodes a 'ç' as '?º' (0x3f 0xba)?

Today I received a file from a customer that I have to read, but it contains strange characters. Using names I already know, I can guess the meaning of some of them.

For example:

Real name | Encoded as   | Char  | Hex
----------|--------------|-------|-------
Françios  | Fran?ºios    | ç     | 3f ba
André     | Andr??       | é     | 3f 3f
Hélène    | H??l?¿ne     | è     | 3f bf
etc.
  • I have tried importing the file with every code page known to .NET and checked whether the result contains the words I know. None of them produces the expected names.
  • Notepad++ opens the file as ANSI and also shows the unwanted characters. (It does have a hex-editor plugin, which is useful.)
  • Other files (from the same user and the same zip file) are encoded in UTF-8.

I cannot expect help from the person who sent me the files. Using Google Translate, he made it clear that he found it very hard just to create them, and he is using software (I believe SAP) that I do not have access to.

Is there any other way I can find the encoding of the files he just sent me?


I can get those results if I take UTF-8 encoded text, pretend it is CP850, and then convert it to Latin-1, Windows-1252, or a similar encoding. The "?" comes from the fact that the CP850 character at 0xc3 is "├", which doesn't exist in Latin-1 or derived encodings, so the conversion replaces it with a "?".


Edit: I did a somewhat wider search using iconv, and CP437, CP862, or CP865 are better matches than CP850. Since you asked, the one-liner I used this time was:

for enc in `iconv -l`; do echo -n "$enc: "; echo -n "ç é è" | iconv -s -f $enc -t "LATIN1//TRANSLIT" 2>/dev/null; echo; done
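
The same round trip can be sketched in Python, which may be easier to adapt than the shell loop. This is only an illustration of the hypothesis above, not a confirmed description of what the customer's software does: the function name `mangle` and its parameters are made up for this example, and the final step assumes Latin-1 with unmappable characters replaced by "?".

```python
def mangle(text: str, misread_as: str, target: str = "latin-1") -> bytes:
    """Reproduce the suspected corruption (hypothetical helper).

    1. Encode the text as UTF-8 (what the file presumably started as).
    2. Misinterpret those bytes as a DOS code page.
    3. Re-encode to a Latin-1 style encoding, replacing anything
       that has no equivalent there with "?".
    """
    raw = text.encode("utf-8")
    misread = raw.decode(misread_as)
    return misread.encode(target, errors="replace")


def as_hex(data: bytes) -> str:
    """Format bytes like the hex column in the question."""
    return " ".join(f"{b:02x}" for b in data)


for codepage in ("cp850", "cp437"):
    for ch in "çéè":
        print(codepage, ch, as_hex(mangle(ch, codepage)))
```

With `cp437` the three characters come out as `3f ba`, `3f 3f` and `3f bf`, which matches the table in the question. With `cp850` the "é" becomes `3f ae` instead, which is why CP437 is the better match.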


The file should be UTF-8 or UTF-16; those encodings cover almost all regular characters. It looks like you have a decode/encode problem.

Notepad++ may be confused because your files do not use a byte-order mark.

How do you process your files?

Try to read them as binary and then try different encodings to get a string. If you do not read them as binary, a default encoding may be applied.

The "?" is a sign of that.

Maybe that helps.
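
A minimal sketch of that approach in Python: take the raw bytes, decode them with each candidate encoding, and keep the candidates whose result contains a name you already know. The helper name `try_encodings`, the candidate list, and the sample bytes are assumptions made for illustration; in practice the bytes would come from reading the customer's file in binary mode.

```python
def try_encodings(data: bytes, candidates, known_words):
    """Return (encoding, matched words) for each candidate that decodes
    the bytes without error and yields at least one known word."""
    hits = []
    for enc in candidates:
        try:
            text = data.decode(enc)
        except (UnicodeDecodeError, LookupError):
            # Bytes are invalid in this encoding, or the name is unknown.
            continue
        found = [word for word in known_words if word in text]
        if found:
            hits.append((enc, found))
    return hits


# Stand-in for the real file contents, e.g. open(path, "rb").read().
sample = "André;Hélène".encode("utf-8")

print(try_encodings(sample,
                    ["utf-8", "utf-16", "cp1252", "cp850"],
                    ["André", "Hélène"]))
```

Note that this only helps while the original bytes are intact. If the file already contains literal 0x3f bytes where accented letters should be, the information was lost in an earlier conversion and no choice of encoding will bring it back.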
