What is the character encoding?

I have several characters that aren't recognized properly. Characters like:

º
á
ó
(etc..)

This means that the character encoding is not UTF-8, right? So, can you tell me what character encoding it could be, please?


We don't have nearly enough information to really answer this, but the gist of it is: you shouldn't just guess. You need to work out where the data is coming from, and find out what the encoding is. You haven't told us anything about the data source, so we're completely in the dark. You might want to try Encoding.Default if these are files saved with something like Notepad.

If you know what the characters are meant to be and how they're represented in binary, that should suggest an encoding... but again, we'd need to know more information.
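As a rough illustration of comparing a character against its binary representation, the sketch below prints the bytes that the characters from the question produce in two candidate encodings. The helper name `ByteDump.Hex` and the choice of UTF-8 and ISO-8859-1 as candidates are assumptions made for the example; the real candidates depend on where the data comes from.

```csharp
using System;
using System.Text;

static class ByteDump
{
    // Returns the bytes of `text` in the given encoding as space-separated hex.
    public static string Hex(string text, Encoding encoding)
    {
        return BitConverter.ToString(encoding.GetBytes(text)).Replace("-", " ");
    }

    static void Main()
    {
        Encoding latin1 = Encoding.GetEncoding("ISO-8859-1");

        // \u00BA = º, \u00E1 = á, \u00F3 = ó (escaped so the source file's own encoding does not matter)
        foreach (string s in new[] { "\u00BA", "\u00E1", "\u00F3" })
        {
            Console.WriteLine("U+{0:X4}: UTF-8 = {1}, ISO-8859-1 = {2}",
                (int)s[0], Hex(s, Encoding.UTF8), Hex(s, latin1));
        }
    }
}
```

If a hex dump of the real data shows a single byte E1 where an á is expected, a single-byte encoding such as ISO-8859-1 or Windows-1252 is a plausible candidate; the two-byte sequence C3 A1 points to UTF-8.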


Read this first: http://www.joelonsoftware.com/articles/Unicode.html

There are two encodings involved: the one that was used to encode the string and the one that is used to decode it. They must be the same to get the expected result; if they differ, some characters will be displayed incorrectly. We can try to guess if you post the actual and expected results.
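To make the encode/decode mismatch concrete, here is a minimal sketch that writes a string with one encoding and reads it back with another. The helper `MismatchDemo.Recode` is a hypothetical name, and UTF-8 versus ISO-8859-1 is only one possible mismatch, chosen because it is a common one.

```csharp
using System;
using System.Text;

static class MismatchDemo
{
    // Encodes `text` with `writer`, then decodes the resulting bytes with `reader`.
    public static string Recode(string text, Encoding writer, Encoding reader)
    {
        return reader.GetString(writer.GetBytes(text));
    }

    static void Main()
    {
        Encoding latin1 = Encoding.GetEncoding("ISO-8859-1");
        string original = "\u00E1\u00F3"; // the characters á and ó

        // Same encoding on both sides: the text survives the round trip.
        Console.WriteLine(Recode(original, Encoding.UTF8, Encoding.UTF8) == original);

        // UTF-8 bytes read as ISO-8859-1: each character turns into two wrong ones.
        Console.WriteLine(Recode(original, Encoding.UTF8, latin1));

        // ISO-8859-1 bytes read as UTF-8: the bytes are invalid UTF-8,
        // so the decoder substitutes the replacement character U+FFFD.
        Console.WriteLine(Recode(original, latin1, Encoding.UTF8).Contains("\uFFFD"));
    }
}
```

The shape of the garbage is itself a clue: pairs of characters starting with Ã usually mean UTF-8 data decoded as a single-byte encoding, while replacement characters or question marks usually mean single-byte data decoded as UTF-8.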


I wrote a couple of methods to narrow down the possibilities a while back for situations just like this.

    using System;
    using System.Collections.Generic;
    using System.Text;

    class Program
    {
        static void Main(string[] args)
        {
            // On .NET Core / .NET 5+, Encoding.GetEncodings() only lists a few encodings
            // unless the code pages provider (System.Text.Encoding.CodePages) is registered.
            Encoding[] matches = FindEncodingTable('Ÿ');
            Encoding[] enc2 = FindEncodingTable(159, 'Ÿ');
        }

        // Locates all encodings that decode the given byte value to the given character.
        // characterPosition: decimal position of the character in the unknown encoding table, e.g. 159
        // character: the character expected at that position, e.g. 'Ÿ'
        static Encoding[] FindEncodingTable(int characterPosition, char character)
        {
            List<Encoding> matches = new List<Encoding>();
            byte[] bytes = { (byte)characterPosition };
            foreach (EncodingInfo encInfo in Encoding.GetEncodings())
            {
                Encoding thisEnc = encInfo.GetEncoding();
                char[] chars = thisEnc.GetChars(bytes);
                // Multi-byte encodings may decode a lone byte to zero or several chars.
                // No break here: we want every matching encoding, not just the first.
                if (chars.Length == 1 && chars[0] == character)
                    matches.Add(thisEnc);
            }
            return matches.ToArray();
        }

        // Locates all encodings that can represent the specified character.
        static Encoding[] FindEncodingTable(char character)
        {
            List<Encoding> matches = new List<Encoding>();
            string original = character.ToString();
            foreach (EncodingInfo encInfo in Encoding.GetEncodings())
            {
                Encoding thisEnc = encInfo.GetEncoding();
                // GetBytes never returns null; unsupported characters are replaced
                // (typically with '?'), so check that the character survives a round trip.
                byte[] temp = thisEnc.GetBytes(original);
                if (thisEnc.GetString(temp) == original)
                    matches.Add(thisEnc);
            }
            return matches.ToArray();
        }
    }


Encoding means transforming existing content into a form that the destination system or protocol can parse.

An example of encoding can be seen when browsing the internet:

A site you visit, such as www.example.com, may let you run custom searches through the URL:

www.example.com?search=...

The value after search= must be URL-encoded. If you were to write:

www.example.com?search=cat food cheap

The request may not be understood, because ' ' (a space) is not a valid character in a URL.

To correct this, replace each ' ' with '%20' to form this URL:

www.example.com?search=cat%20food%20cheap

Different systems use different forms of encoding. In this example I have used URL (percent) encoding, where a character is replaced by '%' followed by its byte value in hexadecimal. In other applications you may need other types of encoding.
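In C#, the replacement does not have to be done by hand. The sketch below uses `Uri.EscapeDataString` to encode the search value; the helper name `SearchUrl.Build` and the `search` parameter are taken from the example URL above and are illustrative only.

```csharp
using System;

static class SearchUrl
{
    // Appends `query` to `baseUrl` as a percent-encoded "search" parameter.
    public static string Build(string baseUrl, string query)
    {
        return baseUrl + "?search=" + Uri.EscapeDataString(query);
    }

    static void Main()
    {
        // The spaces in the query are emitted as %20.
        Console.WriteLine(Build("www.example.com", "cat food cheap"));
    }
}
```

`Uri.EscapeDataString` also handles non-ASCII characters by percent-encoding their UTF-8 bytes, so 'á' becomes %C3%A1. This is one more place where the sender and the receiver have to agree on the character encoding.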

Good Luck!
