How does the C# compiler optimize a code fragment?
If I have code like this:
for (int i = 0; i < 10; i++)
{
    int iTemp;
    iTemp = i;
    //.........
}
Does the compiler instantiate iTemp 10 times?
Or does it optimize it?
I mean, if I rewrite the loop as
int iTemp;
for (int i = 0; i < 10; i++)
{
    iTemp = i;
    //.........
}
Will it be faster?
Using Reflector, you can view the IL generated by the C# compiler.
.method private hidebysig static void Way1() cil managed
{
.maxstack 2
.locals init (
[0] int32 i)
L_0000: ldc.i4.0
L_0001: stloc.0
L_0002: br.s L_0008
L_0004: ldloc.0
L_0005: ldc.i4.1
L_0006: add
L_0007: stloc.0
L_0008: ldloc.0
L_0009: ldc.i4.s 10
L_000b: blt.s L_0004
L_000d: ret
}
.method private hidebysig static void Way2() cil managed
{
.maxstack 2
.locals init (
[0] int32 i)
L_0000: ldc.i4.0
L_0001: stloc.0
L_0002: br.s L_0008
L_0004: ldloc.0
L_0005: ldc.i4.1
L_0006: add
L_0007: stloc.0
L_0008: ldloc.0
L_0009: ldc.i4.s 10
L_000b: blt.s L_0004
L_000d: ret
}
They're exactly the same (note that the compiler has removed the unused iTemp local entirely; only i appears in the locals), so it makes no performance difference where you declare iTemp.
As others have said, the code you've shown produces equivalent IL, except when the variable is captured by a lambda expression for later execution. In that case the code is different as it must keep track of the current value of the variable for the expression. There may be other instances where the optimization doesn't take place as well.
Creating a fresh copy of the loop variable is a common technique when you want to capture the value for a lambda expression.
Try:
var a = new List<int> { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 };
var q = a.AsEnumerable();
int iTemp;
for (int i = 0; i < 10; i++)
{
    iTemp = i;
    q = q.Where( x => x <= iTemp );
}
Console.WriteLine( string.Format( "{0}, count is {1}",
    string.Join( ":", q.Select( x => x.ToString() ).ToArray() ),
    q.Count() ) );
and
var a = new List<int> { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 };
var q = a.AsEnumerable();
for (int i = 0; i < 10; i++)
{
    var iTemp = i;
    q = q.Where( x => x <= iTemp );
}
Console.WriteLine( string.Format( "{0}, count is {1}",
    string.Join( ":", q.Select( x => x.ToString() ).ToArray() ),
    q.Count() ) );
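If I'm tracing the deferred execution correctly, these two fragments print different results: in the first, every lambda shares the single iTemp and sees its final value (9) when the query runs, so all ten elements pass the filters; in the second, each iteration captures its own copy (0 through 9), and the combined filters leave only the element 0.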
If you're really curious about how CSC (the C# compiler) treats your code, you might want to play with LINQPad; among other things, it lets you enter short C# expressions or programs and look at the resulting IL (CLR bytecode).
One thing to keep in mind is that local variables are typically allocated on the stack. One task that a compiler must do is figure out how much stack space a particular method requires and set that aside.
Consider:
int Func(int a, int b, int c)
{
    int x = a * 2;
    int y = b * 3;
    int z = c * 4;
    return x + y + z;
}
Ignoring the fact that this can be easily optimized to return (a * 2) + (b * 3) + (c * 4), the compiler is going to see three local variables and set aside room for them.
If I have this:
int Func(int a, int b, int c)
{
    int x = a * 2;
    {
        int y = b * 3;
        {
            int z = c * 4;
            {
                return x + y + z;
            }
        }
    }
}
It's still the same 3 local variables - just in different scopes. A for loop is nothing but a scope block with a little glue code to make it work.
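As a rough sketch (this isn't the compiler's literal output, just the language-level picture, reusing the loop from the question), a for loop expands to something like:
{
    int i = 0;             // initializer, in its own outer scope
    while (i < 10)         // condition
    {
        {
            int iTemp;     // the body's own scope
            iTemp = i;
            //.........
        }
        i++;               // iterator ("glue" that runs after each pass)
    }
}
The braces only control where names are visible; they don't change how much stack space the method as a whole sets aside.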
Now consider this:
int Func(int a, int b, int c)
{
    int x = a * 2;
    {
        int y = b * 3;
        x += y;
    }
    {
        int z = c * 4;
        x += z;
    }
    return x;
}
This is the only case where it could be different. You have variables y and z which go in and out of scope - once they are out of scope, the stack space is no longer needed. The compiler could choose to reuse those slots such that y and z share the same space. As optimizations go, it's simple but it doesn't gain much - it saves some space, which might be important on embedded systems, but not in most .NET applications.
As a side note, the C# compiler in VS2008, even in a release build, isn't performing even the simplest optimizations here, such as eliminating the redundant stores to and loads from the locals. The IL for the first version is this:
L_0000: ldarg.0      // a
L_0001: ldc.i4.2
L_0002: mul          // a * 2
L_0003: stloc.0      // x = a * 2
L_0004: ldarg.1      // b
L_0005: ldc.i4.3
L_0006: mul          // b * 3
L_0007: stloc.1      // y = b * 3
L_0008: ldarg.2      // c
L_0009: ldc.i4.4
L_000a: mul          // c * 4
L_000b: stloc.2      // z = c * 4
L_000c: ldloc.0
L_000d: ldloc.1
L_000e: add          // x + y
L_000f: ldloc.2
L_0010: add          // + z
L_0011: ret
whereas I fully expected to see this:
L_0000: ldarg.0
L_0001: ldc.i4.2
L_0002: mul
L_0003: ldarg.1
L_0004: ldc.i4.3
L_0005: mul
L_0006: add
L_0007: ldarg.2
L_0008: ldc.i4.4
L_0009: mul
L_000a: add
L_000b: ret
The compiler will do the optimisation you've shown for you.
It's a simple form of loop hoisting.
A lot of people have provided you IL to show you that your two code fragments are effectively the same from a performance perspective. It's not really necessary to go to that level of detail to see why this is the case. Just think about this from the perspective of the call stack.
Effectively, for a method containing a code fragment like the two you provided, the compiler will emit code at the beginning of the method to allocate space for all the locals used within that method.
In both cases what the compiler sees is a local named iTemp, so when it allocates space on the stack for the locals it will allocate 32 bits to hold iTemp. It doesn't matter to the compiler that iTemp has different scope in the two code fragments; the compiler enforces that simply by not allowing you to refer to iTemp outside the for loop in the first fragment. What it will do is allocate this space once (at the beginning of the method) and reuse it as needed during the loop in the first fragment.
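In IL terms (a sketch based on a debug build, since an optimized build may drop the otherwise-unused iTemp altogether), both fragments end up with a .locals init directive that reserves one int32 slot for i and one for iTemp for the whole method; the scoping difference exists only at the C# level.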
The C# compiler doesn't always need to do a good job. The JIT optimizer is tuned for the IL that the C# compiler emits; better-looking IL does not (necessarily) produce better-looking machine code.
Let's take an earlier example:
static int Func(int a, int b, int c)
{
    int x = a * 2;
    int y = b * 3;
    int z = c * 4;
    return x + y + z;
}
The emitted IL from the 3.5 compiler with optimizations enabled looks like this:
.method private hidebysig static int32 Func(int32 a,
int32 b,
int32 c) cil managed
{
// Code size 18 (0x12)
.maxstack 2
.locals init (int32 V_0,
int32 V_1,
int32 V_2)
IL_0000: ldarg.0
IL_0001: ldc.i4.2
IL_0002: mul
IL_0003: stloc.0
IL_0004: ldarg.1
IL_0005: ldc.i4.3
IL_0006: mul
IL_0007: stloc.1
IL_0008: ldarg.2
IL_0009: ldc.i4.4
IL_000a: mul
IL_000b: stloc.2
IL_000c: ldloc.0
IL_000d: ldloc.1
IL_000e: add
IL_000f: ldloc.2
IL_0010: add
IL_0011: ret
} // end of method test::Func
Not very optimal, right? I'm compiling it into an executable and calling it from a simple Main method, and the compiler isn't inlining it or really doing any optimizations.
So what is happening at runtime?
The JIT is in fact inlining the call to Func() and producing much better code than you might imagine when looking at the stack-based IL up above:
mov edx,dword ptr [rbx+10h]        ; load the value used as a
mov eax,1
cmp rax,rdi
jae 000007ff`00190265              ; range check
mov eax,dword ptr [rbx+rax*4+10h]  ; load the value used as b
mov ecx,2
cmp rcx,rdi
jae 000007ff`00190265              ; range check
mov ecx,dword ptr [rbx+rcx*4+10h]  ; load the value used as c
add edx,edx                        ; a * 2 (add instead of mul)
lea eax,[rax+rax*2]                ; b * 3 (lea instead of mul)
shl ecx,2                          ; c * 4 (shift instead of mul)
add eax,edx                        ; a * 2 + b * 3
lea esi,[rax+rcx]                  ; ... + c * 4