开发者

Four byte encoding of U+00F6 (LATIN SMALL LETTER O WITH DIAERESIS)?

Which character encoding (or combinations of encodings) represents the character ö (U+00F6, LATIN SMAL开发者_如何学JAVAL LETTER O WITH DIAERESIS or simply put chr(246) in ISO-8859-1) as the four octets combination chr(195) . chr(63) . chr(194) . chr(164)?


This page lists a fairly comprehensive set of all of the various binary representations of that particular character, and none of them are even close to what you have. Are you certain that there isn't some other transformation being done on top of the text encoding?

If you think that the data might have been encoded multiple times, try this:

public static IEnumerable<Encoding> FindEncodingPath(char desiredChar, byte[] data)
{
    return FindEncodingPath(new char[] { desiredChar }, data, 5);
}

private static IEnumerable<Encoding> FindEncodingPath(char[] desiredChar, byte[] data, int iterationsLeft)
{
    List<Encoding> encodings = null;

    foreach(Encoding enc in Encoding.GetEncodings())
    {
        byte[] temp = enc.GetBytes(desiredChar);

        bool match = false;

        if(temp.Length == data.Length)
        {
            match = true;

            for(int i = 0; i < data.Length; i++) 
            {
                if(data[i] != temp[i])
                {
                    match = false;
                    break;
                }
            }
        }

        if(match)
        {
            encodings = new List<Encoding>();

            encodings.Add(enc);

            break;
        }
        else if(iterationsLeft > 0)
        {
            IEnumerable<Encoding> tempEnc = FindEncodingPath(desiredChar, temp, iterationsLeft - 1);

            if(tempEnc != null)
            {
                encodings = new List<Encoding>();

                encodings.Add(enc);
                encodings.AddRange(tempEnc);

                break;
            }
        }
    }

    return encodings;
}
0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜