Memory-wise, is storing a string as bytes cheaper than its UTF equivalent?
If I store a string as bytes, does it use less memory than if it was stored in UTF-8?
e.g.
string text = "Hello, World!";
Versus encoding it into a byte variable?
If you stored that in a byte array it would be more efficient than in a string, yes - because all of that text is ASCII, which would be encoded as a single byte per character. However, it's not universally true for all strings (some characters would take 2 bytes, some would take 3 - and for non-BMP characters it would take even more), and it's also a darned sight less convenient to work with in binary form...
I would stick with strings unless you had a really really good reason to keep them in memory as byte arrays.
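To make the per-character cost concrete, here is a minimal sketch that compares the encoded size of a few sample strings. The class name EncodingSizes and the helper Measure are made up for illustration; the only library calls are Encoding.UTF8.GetByteCount and Encoding.Unicode.GetByteCount.

```csharp
using System;
using System.Text;

public static class EncodingSizes
{
    // Returns (UTF-8 byte count, UTF-16 byte count) for the given text.
    // Hypothetical helper, not part of the framework.
    public static (int Utf8, int Utf16) Measure(string s) =>
        (Encoding.UTF8.GetByteCount(s), Encoding.Unicode.GetByteCount(s));

    public static void Main()
    {
        // ASCII letter, accented Latin letter, euro sign, non-BMP emoji, and the question's string.
        string[] samples = { "A", "\u00E9", "\u20AC", "\U0001F600", "Hello, World!" };
        foreach (var s in samples)
        {
            var (utf8, utf16) = Measure(s);
            Console.WriteLine($"{s}: UTF-8 = {utf8} bytes, UTF-16 = {utf16} bytes");
        }
    }
}
```

The ASCII samples come out at half the size in UTF-8, the accented letter breaks even at 2 bytes, the euro sign is larger in UTF-8 (3 bytes versus 2), and the emoji takes 4 bytes either way.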
UTF-8 will only use 1 byte per char if you stick to 7-bit ASCII.
But internally .NET uses UTF-16 (not UCS-2), which uses 2 bytes per char, or 4 bytes for characters outside the BMP. So yes, storing it as UTF-8 will use less memory than storing it as a string, assuming the text is mostly ASCII. Note that accented Western European characters (the upper half of Latin-1) take 2 bytes each in UTF-8, so they save nothing over UTF-16.
In the example you gave, UTF-8 encoding would save you some bytes since you only use ASCII characters, but it does depend on the input string - some UTF-8 encoded strings might actually be larger than the corresponding UTF-16 version.
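// In memory the string is UTF-16: 13 chars x 2 bytes = 26 bytes of character data
string text = "Hello, World!";
// UTF-8 length will be 13 (only ASCII chars used)
var bytesUTF8 = Encoding.UTF8.GetBytes(text);
// Encoding.Unicode is UTF-16 (little-endian), so the length will be 26
var bytesUTF16 = Encoding.Unicode.GetBytes(text);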
Strings are sequences of characters, which in .NET are UTF-16 encoded. Each Char is thus a 16-bit value (twice the space of a byte). Characters outside the Basic Multilingual Plane don't fit in 16 bits, so they are stored as a surrogate pair: two Char values, 4 bytes in total.
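As an illustration of the two-Char case, the sketch below counts code points rather than Char values. SurrogateDemo and CodePointCount are hypothetical names; char.IsHighSurrogate, char.IsLowSurrogate and char.IsSurrogatePair are standard framework methods.

```csharp
using System;

public static class SurrogateDemo
{
    // Counts Unicode code points rather than UTF-16 code units (Char values).
    // Hypothetical helper written for this example.
    public static int CodePointCount(string s)
    {
        int count = 0;
        for (int i = 0; i < s.Length; i++)
        {
            // A high surrogate followed by a low surrogate forms one code point.
            if (char.IsHighSurrogate(s[i]) && i + 1 < s.Length && char.IsLowSurrogate(s[i + 1]))
                i++;
            count++;
        }
        return count;
    }

    public static void Main()
    {
        string emoji = "\U0001F600"; // U+1F600, outside the BMP
        Console.WriteLine(emoji.Length);                    // 2 (two Char values)
        Console.WriteLine(CodePointCount(emoji));           // 1
        Console.WriteLine(char.IsSurrogatePair(emoji, 0));  // True
    }
}
```

This is why string.Length can overstate the number of visible characters: it reports 16-bit code units, not code points.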
If you're only dealing with ASCII, yes, you can put a string in a byte array that takes half the space of a char array and doesn't lose information. However, as Jon said, that's not a very convenient way to work with strings. By default a single object in .NET is limited to roughly 2 GIGABYTES. As bytes, yes, you'd get about 2 billion characters, but as a string you still get about 1 BILLION characters. If you really need more than that in a single string, I worry about what you think you need it for.
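If you do decide the saving is worth it, a rough sketch of the ASCII-only round trip might look like this. AsciiPacking, IsAscii, Pack and Unpack are made-up names for illustration. The explicit check matters because Encoding.ASCII.GetBytes silently replaces non-ASCII characters with '?' instead of failing.

```csharp
using System;
using System.Linq;
using System.Text;

public static class AsciiPacking
{
    // True when every char fits in 7-bit ASCII, so one byte per char is lossless.
    public static bool IsAscii(string s) => s.All(c => c <= 0x7F);

    // Encodes to one byte per character, refusing input that would lose data.
    public static byte[] Pack(string s)
    {
        if (!IsAscii(s))
            throw new ArgumentException("Non-ASCII text would be lost.", nameof(s));
        return Encoding.ASCII.GetBytes(s);
    }

    // Decodes the byte array back into a regular .NET string.
    public static string Unpack(byte[] bytes) => Encoding.ASCII.GetString(bytes);

    public static void Main()
    {
        string text = "Hello, World!";
        byte[] packed = Pack(text);
        Console.WriteLine(packed.Length);           // 13
        Console.WriteLine(Unpack(packed) == text);  // True
    }
}
```

Every operation on the packed form (searching, comparing, displaying) needs an Unpack first or byte-level code of your own, which is the inconvenience the answers above warn about.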