Size of char type in C#
Just wondering: why does the char
type in C# (.NET) have a size of 2 bytes, unlike the 1-byte char in other programming languages?
A `char` is Unicode in C#, so the number of possible characters far exceeds 255. One byte is therefore not enough.
Extended ASCII, for example, has a 256-character set and can be stored in a single byte. That's also the whole purpose of the `System.Text.Encoding`
namespace: different systems can have different character sets and character sizes. C# can therefore handle one-, two-, or four-byte encodings when reading and writing, but Unicode UTF-16 is the default in memory.
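Since Java shares .NET's UTF-16 string model (a comparison other answers here also draw), a small Java sketch can stand in for `System.Text.Encoding` to show how the byte count of the same string varies by encoding; the class name and sample string are illustrative:

```java
import java.nio.charset.StandardCharsets;

public class EncodingSizes {
    public static void main(String[] args) {
        // "héllo": 'é' (U+00E9) sits outside 7-bit ASCII.
        String s = "h\u00E9llo";

        // UTF-8 is variable width: 'é' needs 2 bytes, the rest 1 byte each.
        int utf8 = s.getBytes(StandardCharsets.UTF_8).length;

        // UTF-16 (big-endian, no BOM): every BMP character needs 2 bytes.
        int utf16 = s.getBytes(StandardCharsets.UTF_16BE).length;

        System.out.println(utf8 + " " + utf16); // prints "6 10"
    }
}
```

The in-memory representation of the string is always UTF-16; the encodings only matter when converting to and from bytes.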
I'm guessing that by "other programming languages" you mean C. C actually has two different character types: `char` and `wchar_t`. `char` may be one byte long; `wchar_t` is not necessarily one byte.
In C# (and .NET, for that matter), all character strings are encoded as Unicode in UTF-16. That's why a `char` in .NET represents a single UTF-16 code unit, which may be a full code point or half of a surrogate pair (in which case it is not actually a character on its own).
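The code-unit/code-point distinction is easy to demonstrate. Java uses the same UTF-16 `char` model as .NET, so this Java sketch (illustrative class name) shows a single character that occupies two `char`s:

```java
public class SurrogateDemo {
    public static void main(String[] args) {
        // U+1D11E MUSICAL SYMBOL G CLEF lies outside the Basic Multilingual
        // Plane, so UTF-16 stores it as a surrogate pair: two code units.
        String clef = "\uD834\uDD1E";

        System.out.println(clef.length());                         // 2 code units
        System.out.println(clef.codePointCount(0, clef.length())); // 1 code point
        System.out.println(Character.isHighSurrogate(clef.charAt(0))); // true
    }
}
```

In .NET the equivalent check would go through `System.Char.IsHighSurrogate` and `System.Globalization.StringInfo`.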
Actually, C#'s (or more accurately, the CLR's) char size is consistent with most other managed languages. Managed languages like Java tend to be newer and had features like Unicode support built in from the ground up. The natural extension of supporting Unicode strings is to have Unicode chars.
Older languages like C/C++ started out ASCII-only and only later added Unicode support.
Because strings in .NET are encoded as 2-byte Unicode (UTF-16) characters.
Because a character in a C# string defaults to the UTF-16 encoding of Unicode, whose code units are 2 bytes each.
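That 2-byte width can be checked directly. In .NET you would use `sizeof(char)`; the Java equivalent, under the same UTF-16 char model, is `Character.BYTES`:

```java
public class CharWidth {
    public static void main(String[] args) {
        // Java's char, like .NET's System.Char, is a 16-bit UTF-16 code unit.
        System.out.println(Character.BYTES); // prints "2"
        System.out.println(Character.SIZE);  // prints "16" (bits)
    }
}
```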
C# using 16 bit character width probably has more to do with performance rather than anything else.
Firstly if you use UTF-8 you can fit every character in the "right" amount of space. This is because UTF-8 is variable width. ASCII chars will use 8 bits while larger characters will use more.
But a variable-length character encoding encourages O(n) algorithmic complexity in common scenarios, e.g., retrieving the character at a particular index in a string. There have been public discussions on this point. The simplest solution is to keep a character width that fits most of your character set and handle the rest as special cases. Now you have a fixed character width.
Strictly speaking, UTF-16 is also a variable-width encoding, so C# (and Java, for that matter) use something of a hybrid, since their character width is fixed at 16 bits even though a code point can require two code units.
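The performance trade-off above can be made concrete. With 16-bit code units, indexing by code unit is O(1), but indexing by code point must still scan past any surrogate pairs; a Java sketch (same UTF-16 model as the CLR, illustrative class name):

```java
public class IndexingDemo {
    public static void main(String[] args) {
        // 'a', then U+1D11E (a surrogate pair), then 'b': 3 code points, 4 code units.
        String s = "a\uD834\uDD1Eb";

        // O(1): indexing by UTF-16 code unit -- what the fixed 16-bit width buys.
        System.out.println(s.charAt(3)); // prints "b"

        // O(n): finding the n-th code point has to walk past variable-width pairs.
        int idx = s.offsetByCodePoints(0, 2); // index of the 3rd code point
        System.out.println(idx);              // prints "3"

        System.out.println(s.codePointCount(0, s.length())); // prints "3"
    }
}
```

In .NET, the analogous O(n) walk is what `System.Globalization.StringInfo` performs when enumerating text elements.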