Four-byte encoding of U+00F6 (LATIN SMALL LETTER O WITH DIAERESIS)?
Which character encoding (or combination of encodings) represents the character ö (U+00F6, LATIN SMALL LETTER O WITH DIAERESIS, or simply chr(246) in ISO-8859-1) as the four-octet sequence chr(195) . chr(63) . chr(194) . chr(164)?
This page lists a fairly comprehensive set of binary representations of that character, and none of them is even close to what you have. Are you certain that no other transformation is being applied on top of the text encoding?
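One way to check this for yourself is to print the bytes that a few common encodings produce for ö and compare them with the four octets from the question. The sketch below uses Python; the codec list is a small, arbitrary sample rather than an exhaustive one, and `single_step_encodings` is a name made up for this example.

```python
# Bytes reported in the question: chr(195) . chr(63) . chr(194) . chr(164)
TARGET = bytes([195, 63, 194, 164])

# Illustrative sample of codecs that ship with Python (an assumption, not a full list).
CODECS = ["latin-1", "cp1252", "utf-8", "utf-16-le", "utf-16-be",
          "utf-32-le", "cp437", "mac_roman"]


def single_step_encodings(ch, codecs=CODECS):
    """Map each codec name to the bytes it produces for ch.

    Codecs that cannot represent ch are skipped.
    """
    result = {}
    for name in codecs:
        try:
            result[name] = ch.encode(name)
        except UnicodeEncodeError:
            continue
    return result


table = single_step_encodings("\u00f6")
for name, raw in table.items():
    print(f"{name:10} {raw.hex(' ')}")

# Codecs whose single-step output equals the bytes from the question.
matches = [name for name, raw in table.items() if raw == TARGET]
print("matches:", matches)
```

None of the sampled codecs yields the four octets in a single step; UTF-8, for instance, gives c3 b6.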
If you think that the data might have been encoded multiple times, try this:
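using System;
using System.Collections.Generic;
using System.Text;

public static class EncodingPathFinder
{
    // ISO-8859-1 maps each byte 0..255 to the char with the same value, so it
    // models intermediate bytes being misread as text before the next encode.
    // (Assumption: the wrong decoder could just as well be another code page.)
    private static readonly Encoding Latin1 = Encoding.GetEncoding("ISO-8859-1");

    // Returns the encodings applied in order, or null if no path is found.
    // The search is exponential in depth; lower the 5 if it runs too long.
    public static IEnumerable<Encoding> FindEncodingPath(char desiredChar, byte[] data)
    {
        return FindEncodingPath(new char[] { desiredChar }, data, 5);
    }

    private static List<Encoding> FindEncodingPath(char[] chars, byte[] data, int iterationsLeft)
    {
        // GetEncodings() returns EncodingInfo[], so resolve each entry explicitly.
        foreach (EncodingInfo info in Encoding.GetEncodings())
        {
            Encoding enc = info.GetEncoding();
            byte[] temp = enc.GetBytes(chars);

            if (BytesEqual(temp, data))
            {
                return new List<Encoding> { enc };
            }

            if (iterationsLeft > 0)
            {
                // Reinterpret the bytes as characters and try encoding them again.
                List<Encoding> rest = FindEncodingPath(Latin1.GetChars(temp), data, iterationsLeft - 1);
                if (rest != null)
                {
                    rest.Insert(0, enc);
                    return rest;
                }
            }
        }
        return null;
    }

    private static bool BytesEqual(byte[] a, byte[] b)
    {
        if (a.Length != b.Length) return false;
        for (int i = 0; i < a.Length; i++)
        {
            if (a[i] != b[i]) return false;
        }
        return true;
    }
}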
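For comparison, here is a minimal Python sketch of the same brute-force idea. It assumes that each extra round consists of the bytes being misread with a wrong codec and then encoded again. The codec list, the function name `find_paths`, and the round limit are illustrative choices, not part of the original answer.

```python
# Illustrative sample of codecs; extend it to widen the search (cost grows quickly).
CODECS = ["latin-1", "cp1252", "utf-8", "cp437"]


def find_paths(text, target, codecs=CODECS, rounds=2, path=()):
    """Return every codec chain (encode, decode, encode, ...) turning text into target.

    Each chain ends with an encode step. Strict error handling is used, so
    steps that would need a lossy substitution are not explored.
    """
    found = []
    for enc in codecs:
        try:
            raw = text.encode(enc)
        except UnicodeEncodeError:
            continue
        if raw == target:
            found.append(path + (enc,))
        if rounds > 1:
            for dec in codecs:
                if dec == enc:
                    continue  # decoding with the same codec only round-trips
                try:
                    misread = raw.decode(dec)
                except UnicodeDecodeError:
                    continue
                found.extend(
                    find_paths(misread, target, codecs, rounds - 1, path + (enc, dec))
                )
    return found


# Classic double encoding: UTF-8 bytes misread as Latin-1, then encoded as UTF-8 again.
doubled = "\u00f6".encode("utf-8").decode("latin-1").encode("utf-8")
print(doubled.hex(" "))

paths = find_paths("\u00f6", doubled)
for p in paths:
    print(" -> ".join(p))
```

The 63 in the question's data is '?', the character many converters substitute for something they cannot map. Because this sketch uses strict error handling, it would not find a path with such a lossy step unless `errors="replace"` were passed to `encode`.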