
Why does char take 2 bytes when it could be stored in one byte?

Can anybody tell me why a char in C# takes two bytes, even though it could be stored in one byte? Don't you think that is a waste of memory? If not, how is the extra byte used? In simple words, please make clear what the extra 8 bits are for.


although it can be stored in one byte

What makes you think that?

It only takes one byte to represent every character in the English language, but other languages use other characters. Consider the number of different writing systems (Latin, Chinese, Arabic, Cyrillic...) and the number of symbols in each of them (not only letters or digits, but also punctuation marks and other special symbols): there are tens of thousands of different symbols in use in the world! One byte can hold only 256 distinct values, so it is never going to be enough to represent them all. That's why the Unicode standard was created.
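To make the counting argument concrete, here is a small sketch that checks whether a few characters' Unicode code points fit into a single byte. It uses Python only because `ord` returns a character's code point directly. The sample characters are arbitrary picks for illustration and do not come from the answer.

```python
# One byte can distinguish 2**8 = 256 values (0..255).
ONE_BYTE_VALUES = 2 ** 8

# Arbitrary sample characters from a few writing systems (illustrative only).
samples = {
    "Latin": "A",
    "Cyrillic": "Ж",
    "Arabic": "ع",
    "Chinese": "中",
}

for script, ch in samples.items():
    code_point = ord(ch)  # the character's Unicode code point as an int
    fits = code_point < ONE_BYTE_VALUES
    # Print only ASCII so the output works on any console encoding.
    print(f"{script:9} U+{code_point:04X} fits in one byte: {fits}")
```

Only the Latin letter fits. The Cyrillic (U+0416), Arabic (U+0639) and Chinese (U+4E2D) characters all need more than 8 bits.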

Unicode has several encodings (UTF-8, UTF-16, UTF-32...). .NET strings use UTF-16, which takes two bytes per char. Strictly speaking, a char is one UTF-16 code unit, not one code point. Of course, two bytes are still not enough to represent every symbol in the world, so characters above U+FFFF are represented by surrogate pairs: two chars, four bytes in total.
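The following sketch compares how many bytes the same characters take in UTF-8, UTF-16 and UTF-32. It also splits a code point above U+FFFF into a UTF-16 surrogate pair, using the standard arithmetic: subtract 0x10000, then add the top 10 bits to 0xD800 and the bottom 10 bits to 0xDC00. It is written in Python for illustration. The helper names `encoded_sizes` and `to_surrogate_pair` are hypothetical and are not .NET APIs.

```python
def encoded_sizes(text: str) -> dict:
    """Byte count of `text` in three Unicode encodings (hypothetical helper)."""
    return {
        "UTF-8": len(text.encode("utf-8")),
        "UTF-16": len(text.encode("utf-16-le")),  # "-le" variant: no byte-order mark
        "UTF-32": len(text.encode("utf-32-le")),
    }


def to_surrogate_pair(code_point: int) -> tuple:
    """Split a code point above U+FFFF into a UTF-16 (high, low) surrogate pair."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only U+10000..U+10FFFF are encoded as surrogate pairs")
    offset = code_point - 0x10000       # a 20-bit value
    high = 0xD800 + (offset >> 10)      # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)     # bottom 10 bits
    return high, low


# "A", a Chinese character (U+4E2D), and an emoji above U+FFFF (U+1F600).
for ch in ["A", "\u4e2d", "\U0001F600"]:
    print(f"U+{ord(ch):04X}", encoded_sizes(ch))

high, low = to_surrogate_pair(0x1F600)
print(f"U+1F600 -> 0x{high:04X} 0x{low:04X}")  # U+1F600 -> 0xD83D 0xDE00
```

In UTF-16, "A" and U+4E2D each take 2 bytes (one code unit), while U+1F600 takes 4 bytes (two code units). This is why a C# string holding that single emoji has a `Length` of 2: `Length` counts chars, which are UTF-16 code units.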


The char keyword is used to declare a Unicode character in the range U+0000 to U+FFFF. Unicode characters are 16-bit characters used to represent most of the known written languages throughout the world.

http://msdn.microsoft.com/en-us/library/x9h8tsay%28v=vs.80%29.aspx


Unicode characters. True, we have enough room in 8 bits for the English alphabet, but Chinese and similar languages need far more characters than that.


In C#, chars are always 16-bit Unicode (UTF-16) characters. Unicode supports a much larger character set than ASCII can.

If memory really is a concern, here is a good discussion on SO regarding how you might work with 8-bit chars: Is there a string type with 8 BIT chars?
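The memory trade-off can be sketched as follows. Text restricted to U+0000..U+00FF can be stored at one byte per character with an 8-bit encoding such as Latin-1 (ISO-8859-1), which is half the size of its UTF-16 form. Anything outside that range cannot be stored at all. This is a Python illustration, and `pack_8bit` is a hypothetical name. In C#, the rough equivalent would be keeping a `byte[]` produced by an `Encoding` (for example `Encoding.ASCII.GetBytes`) instead of a `string`. That mapping is an assumption here, not necessarily what the linked discussion recommends.

```python
def pack_8bit(text: str) -> bytes:
    """Store text at one byte per character using Latin-1 (hypothetical helper).

    Works only for characters U+0000..U+00FF; anything else raises
    UnicodeEncodeError, which is the price of an 8-bit representation.
    """
    return text.encode("latin-1")


text = "Hello, world"
print(len(text.encode("utf-16-le")), len(pack_8bit(text)))  # 24 12

try:
    pack_8bit("\u4e2d")  # a Chinese character, U+4E2D
except UnicodeEncodeError as err:
    print("cannot store in 8 bits:", err.reason)
```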

References:

On C#'s char datatype: http://msdn.microsoft.com/en-us/library/x9h8tsay(v=vs.80).aspx

On Unicode: http://en.wikipedia.org/wiki/Unicode


Because UTF-8 was probably still too young for Microsoft to consider using it.
