Addition of strings in C#: how does the compiler do it?
A = string.Concat("abc", "def");
B = "abc" + "def";
A vs. B
Lately I have been confused about why many people say that A is definitely much faster than B. But they only say so because somebody else said it, or because "that's just the way it is". I suppose I can get a much better explanation here.
How does the compiler treat these strings?
Thank you!
The very first thing I did when I joined the C# compiler team was I rewrote the optimizer for string concatenations. Good times.
As already noted, concatenations of constant strings are done at compile time. Concatenations of non-constant strings get some fancy treatment:
a + b --> String.Concat(a, b)
a + b + c --> String.Concat(a, b, c)
a + b + c + d --> String.Concat(a, b, c, d)
a + b + c + d + e --> String.Concat(new String[] { a, b, c, d, e })
The benefit of these optimizations is that the String.Concat method can look at all the arguments, determine the sum of their lengths, and then allocate one big string that can hold the entire result.
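The "sum the lengths, then allocate once" idea can be sketched in C#. ConcatSketch below is a hypothetical helper, not a real API and not the actual String.Concat source; it assumes .NET Core 2.1 or later for string.Create. It measures every piece first, then allocates the final string a single time and fills it in place, so no intermediate strings are built.

```csharp
using System;

// Hypothetical helper: NOT the real String.Concat source, just the same idea.
static string ConcatSketch(params string[] parts)
{
    // Pass 1: add up the lengths (null counts as empty, as it does for String.Concat).
    // Overflow checks are omitted to keep the sketch short.
    int total = 0;
    foreach (string p in parts)
        total += p?.Length ?? 0;

    // Pass 2: string.Create allocates the final string exactly once and lets us
    // fill its characters in place, so no intermediate strings are produced.
    return string.Create(total, parts, (span, state) =>
    {
        int pos = 0;
        foreach (string piece in state)
        {
            if (piece is null) continue;
            piece.AsSpan().CopyTo(span.Slice(pos));
            pos += piece.Length;
        }
    });
}

// Five pieces, like the a + b + c + d + e case (one of them null).
Console.WriteLine(ConcatSketch("a", "bc", null, "def", "g")); // prints abcdefg
```

Contrast this with evaluating a + b + c + d + e pairwise, which would allocate and copy four intermediate strings before producing the final one.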
Here's an interesting one. Suppose you have a method M that returns a string:
s = M() + "";
If M() returns null, then the result is the empty string. (null + empty is empty.) If M() does not return null, then the result is unchanged by the concatenation of the empty string. Therefore, this is actually optimized into something that is not a call to String.Concat at all! It becomes
s = M() ?? ""
Neat, eh?
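A quick way to convince yourself that the two forms are interchangeable is to run both on a null and a non-null result. M below is a hypothetical stand-in for the method in the answer; the code only checks that the results are equal, it does not show what IL the compiler actually emits.

```csharp
using System;

// Hypothetical stand-in for the answer's M(): returns either null or a real string.
static string M(bool giveNull) => giveNull ? null : "hello";

foreach (bool returnNull in new[] { true, false })
{
    string written = M(returnNull) + "";   // what you write in source
    string lowered = M(returnNull) ?? "";  // what the answer says the compiler emits
    Console.WriteLine($"M() returns null: {returnNull}, results equal: {written == lowered}");
}
```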
Read this: The Sad Tragedy of Micro-Optimization Theater (Coding Horror)
In C#, the addition operator for strings is just syntactic sugar for String.Concat. You can verify that by opening the output assembly in Reflector.
Another thing to note is that if you have string literals (or constants) in your code, as in the example, the compiler even changes this to B = "abcdef".
But if you use String.Concat with two string literals or constants, String.Concat will still be called, skipping the optimization, so the + operation would actually be faster.
So, to sum it up:
stringA + stringB becomes String.Concat(stringA, stringB).
"abc" + "def" becomes "abcdef".
String.Concat("abc", "def") stays the same.
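One way to see the difference without a decompiler is that the C# compiler accepts "abc" + "def" in a const declaration, which it only allows for compile-time constant expressions, while a String.Concat call is rejected there. The snippet below is a small sketch of that; the commented-out line is the one that would fail to compile. The C# specification guarantees that identical string literals in one assembly share an instance, so the folded constant is the same object as the literal "abcdef"; the run-time Concat result is equal in value but is typically a separate object.

```csharp
using System;

// "abc" + "def" is a constant expression, so it is legal in a const declaration:
// the compiler folds it to "abcdef" and no concatenation happens at run time.
const string B = "abc" + "def";

// A method call is never a constant expression, so this line would not compile
// (error CS0133); String.Concat has to run when the program executes.
// const string A = string.Concat("abc", "def");
string A = string.Concat("abc", "def");

// The folded constant is the very same interned literal as "abcdef"...
Console.WriteLine(object.ReferenceEquals(B, "abcdef"));
// ...while the run-time result is equal in value, usually as a separate object.
Console.WriteLine(A == B);
Console.WriteLine(object.ReferenceEquals(A, B));
```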
Something else I just had to try:
In C++/CLI, "abc" + "def" + "ghi" is actually translated to String.Concat(String.Concat("abc", "def"), "ghi").
Actually, B is resolved at compile time. You will end up with B = "abcdef", whereas for A, the concatenation is postponed until run time.
In this particular case, the two are actually identical. The compiler will transform the second variant, the one using the + operator, into a call to Concat, the first variant.
Well, that is, it would if the two expressions actually concatenated string variables.
This code:
B = "abc" + "def";
actually transforms into this, without concatenation at all:
B = "abcdef";
This can be done because the result of the addition can be computed at compile-time, so the compiler does this.
However, if you were to use something like this:
A = String.Concat(stringVariable1, stringVariable2);
B = stringVariable1 + stringVariable2;
Then those two will generate the same code.
However, I would like to know exactly what those "many" said, as I think it is something different.
What I think they said is that string concatenation is bad, and you should use StringBuilder or similar.
For instance, if you do this:
String s = "test";
for (int index = 1; index <= 10000; index++)
s = s + "test";
Then what happens is that for each iteration through the loop, you'll build one new string, and let the old one be eligible for garbage collection.
Additionally, each such new string will have all the contents of the old one copied into it, which means you'll be moving a large amount of memory around.
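To put a number on "a large amount of memory", here is a back-of-the-envelope cost model for the loop above, written as C#. It assumes every concatenation copies the whole new string (old contents plus "test") into a fresh allocation, and it counts characters copied rather than measuring anything. With 10,000 iterations that comes to 200,060,000 characters copied in order to build a final string of only 40,004 characters, and the total grows quadratically with the iteration count.

```csharp
using System;

// Cost model (an assumption, not a measurement): after iteration i the string
// holds pieceLength * (i + 1) chars, and all of them are copied into a new string.
static long CharsCopiedByNaiveLoop(int iterations, int pieceLength)
{
    long total = 0;
    for (int i = 1; i <= iterations; i++)
        total += (long)pieceLength * (i + 1); // length of the string built in iteration i
    return total;
}

long copied = CharsCopiedByNaiveLoop(10000, "test".Length);
long finalLength = (long)"test".Length * (10000 + 1);
Console.WriteLine($"chars copied: {copied}, final length: {finalLength}");
```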
Whereas the following code:
StringBuilder sb = new StringBuilder("test");
for (int index = 1; index <= 10000; index++)
sb.Append("test");
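Will instead use an internal buffer that is larger than needed, in case you want to append more text to it. When that buffer becomes full, a new, larger one is allocated, and the old one is left for garbage collection. (Since .NET Framework 4, StringBuilder actually keeps a linked list of buffer chunks and adds a new chunk rather than copying the old contents, but either way far fewer allocations and copies happen than with plain concatenation.)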
So in terms of memory use and CPU usage, the latter variant is much better.
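Both loops from this answer produce exactly the same string; only the amount of intermediate garbage differs. The sketch below wraps each loop in a local function (the names WithConcatenation and WithStringBuilder are made up for illustration) and compares the results.

```csharp
using System;
using System.Text;

// The naive loop: every iteration allocates a new string and copies everything so far.
static string WithConcatenation(int iterations)
{
    string s = "test";
    for (int index = 1; index <= iterations; index++)
        s = s + "test";
    return s;
}

// The StringBuilder loop: appends go into spare buffer capacity.
static string WithStringBuilder(int iterations)
{
    var sb = new StringBuilder("test");
    for (int index = 1; index <= iterations; index++)
        sb.Append("test");
    return sb.ToString(); // the final string is materialized once, here
}

string viaConcat = WithConcatenation(10000);
string viaBuilder = WithStringBuilder(10000);
Console.WriteLine(viaConcat == viaBuilder); // True
Console.WriteLine(viaBuilder.Length);       // 40004
```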
Other than that, I would try to avoid focusing too much on "is code variant X better than Y", beyond what you already have experience with. For instance, I use StringBuilder now just because I'm aware of the case, but that isn't to say that all the code I write that uses it actually needs it.
Try to avoid spending time micro-optimizing your code until you know you have a bottleneck. At that point, the usual advice still applies: measure first, cut later.
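"Measure first" can start as simply as timing both variants with System.Diagnostics.Stopwatch. This is only a crude sketch: the single warm-up run, the iteration count and the Time helper are illustrative choices, and GC, JIT tiering and machine noise all affect the numbers. For decisions that matter, a dedicated benchmarking harness such as BenchmarkDotNet gives far more reliable results.

```csharp
using System;
using System.Diagnostics;
using System.Text;

// Crude timing helper: one warm-up run (so JIT compilation is not counted), then one timed run.
static TimeSpan Time(Action action)
{
    action();
    var sw = Stopwatch.StartNew();
    action();
    sw.Stop();
    return sw.Elapsed;
}

const int N = 10000;
string sink = ""; // keeps each result alive so the work is not trivially dead

TimeSpan concatTime = Time(() =>
{
    string s = "test";
    for (int i = 1; i <= N; i++) s = s + "test";
    sink = s;
});

TimeSpan builderTime = Time(() =>
{
    var sb = new StringBuilder("test");
    for (int i = 1; i <= N; i++) sb.Append("test");
    sink = sb.ToString();
});

Console.WriteLine($"concatenation: {concatTime.TotalMilliseconds:F1} ms");
Console.WriteLine($"StringBuilder: {builderTime.TotalMilliseconds:F1} ms");
```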
If the strings are literals, as in your question, then the concatenation of the strings assigned to B will be done at compile time. Your example translates to:
string a = string.Concat("abc", "def");
string b = "abcdef";
If the strings aren't literals, then the compiler will translate the + operator into a Concat call.
So this...
string x = GetStringFromSomewhere();
string y = GetAnotherString();
string a = string.Concat(x, y);
string b = x + y;
...is translated to this at compile-time:
string x = GetStringFromSomewhere();
string y = GetAnotherString();
string a = string.Concat(x, y);
string b = string.Concat(x, y);
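Because x + y becomes string.Concat(x, y), the two lines also agree in the edge case where one operand is null: String.Concat treats null as the empty string. In the sketch below, the hard-coded values are hypothetical stand-ins for GetStringFromSomewhere() and GetAnotherString().

```csharp
using System;

// Hypothetical stand-ins for GetStringFromSomewhere() and GetAnotherString().
string x = "foo";
string y = null;

string a = string.Concat(x, y);
string b = x + y; // compiled to a string.Concat(x, y) call

Console.WriteLine(a == b);   // True
Console.WriteLine($"[{b}]"); // [foo]  (null contributes nothing)
```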