When using object initializers, why does the compiler generate an extra local variable?

While answering a question on SO yesterday, I noticed that if an object is initialized using an Object Initializer, the compiler creates an extra local variable.

Consider the following C# 3.0 code, compiled in release mode in VS2008:

using System;

public class Class1
{
    public string Foo { get; set; }
}

public class Class2
{
    public string Foo { get; set; }
}

public class TestHarness
{
    static void Main(string[] args)
    {
        Class1 class1 = new Class1();
        class1.Foo = "fooBar";

        Class2 class2 =
            new Class2
            {
                Foo = "fooBar2"
            };

        Console.WriteLine(class1.Foo);
        Console.WriteLine(class2.Foo);
    }
}

Using Reflector, we can examine the IL generated for the Main method:

.method private hidebysig static void Main(string[] args) cil managed
{
    .entrypoint
    .maxstack 2
    .locals init (
        [0] class ClassLibrary1.Class1 class1,
        [1] class ClassLibrary1.Class2 class2,
        [2] class ClassLibrary1.Class2 <>g__initLocal0)
    L_0000: newobj instance void ClassLibrary1.Class1::.ctor()
    L_0005: stloc.0 
    L_0006: ldloc.0 
    L_0007: ldstr "fooBar"
    L_000c: callvirt instance void ClassLibrary1.Class1::set_Foo(string)
    L_0011: newobj instance void ClassLibrary1.Class2::.ctor()
    L_0016: stloc.2 
    L_0017: ldloc.2 
    L_0018: ldstr "fooBar2"
    L_001d: callvirt instance void ClassLibrary1.Class2::set_Foo(string)
    L_0022: ldloc.2 
    L_0023: stloc.1 
    L_0024: ldloc.0 
    L_0025: callvirt instance string ClassLibrary1.Class1::get_Foo()
    L_002a: call void [mscorlib]System.Console::WriteLine(string)
    L_002f: ldloc.1 
    L_0030: callvirt instance string ClassLibrary1.Class2::get_Foo()
    L_0035: call void [mscorlib]System.Console::WriteLine(string)
    L_003a: ret 
}

Here, we can see that the compiler has generated two locals that reference an instance of Class2 (class2 and <>g__initLocal0), but only one local that references an instance of Class1 (class1).

Now, I'm not very familiar with IL, but it looks like it's instantiating <>g__initLocal0, before setting class2 = <>g__initLocal0.

Why does this happen?

Does it follow then, that there is a performance overhead when using Object Initializers (even if it is very slight)?


Thread-safety and atomicity.

First, consider this line of code:

MyObject foo = new MyObject { Name = "foo", Value = 42 };

Anybody reading that statement might reasonably assume that the construction of the foo object will be atomic. Before the assignment the object doesn't exist at all. Once the assignment has completed the object exists and is in the expected state.

Now consider two possible ways of translating that code:

// #1
MyObject foo = new MyObject();
foo.Name = "foo";
foo.Value = 42;

// #2
MyObject temp = new MyObject();  // temp will be a compiler-generated name
temp.Name = "foo";
temp.Value = 42;
MyObject foo = temp;

In the first case the foo object is instantiated on the first line, but it won't be in the expected state until the final line has finished executing. What happens if another thread tries to access the object before the last line has executed? The object will be in a semi-initialised state.

In the second case the foo object doesn't exist until the final line, when it is assigned from temp. Since reference assignment is an atomic operation, this gives exactly the same semantics as the original single-line assignment statement, i.e., the foo object never exists in a semi-initialised state.
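
One way to observe translation #2 without involving threads is to make an initializer expression throw. The sketch below is only an illustration: Fail() is a hypothetical helper standing in for any initializer expression that fails, and AtomicInitDemo is a made-up name. Because the member assignments happen on the hidden temporary, the target variable should still hold its old value afterwards.

```csharp
using System;

public class MyObject
{
    public string Name { get; set; }
    public int Value { get; set; }
}

public static class AtomicInitDemo
{
    // Hypothetical helper: stands in for any initializer expression that fails.
    static int Fail()
    {
        throw new InvalidOperationException("initializer failed");
    }

    public static MyObject Run()
    {
        MyObject foo = null;
        try
        {
            // Name is set on the compiler-generated temporary, then Fail()
            // throws before the temporary is ever copied into foo.
            foo = new MyObject { Name = "foo", Value = Fail() };
        }
        catch (InvalidOperationException)
        {
            // Swallowed on purpose so we can inspect foo below.
        }
        return foo;
    }

    public static void Main()
    {
        Console.WriteLine(Run() == null); // True
    }
}
```

Under translation #1, foo would instead be left pointing at an object with Name set and Value still at its default.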


Luke's answer is both correct and excellent, so good on you. It is not, however, complete. There are even more good reasons why we do this.

The specification is extremely clear that this is the correct codegen; the specification says that an object initializer creates a temporary, invisible local which stores the result of the expression. But why did we spec it that way? That is, why is it that

Foo foo = new Foo() { Bar = bar };

means

Foo foo;
Foo temp = new Foo();
temp.Bar = bar;
foo = temp;

and not the more straightforward

Foo foo = new Foo();
foo.Bar = bar;

Well, as a purely practical matter, it's always easier to specify the behaviour of an expression as based on its contents, not its context. For this specific case though, suppose we specified that this was the desired codegen for assignment to a local or field. In that case, foo would be definitely assigned after the (), and therefore could be used in the initializer. Do you REALLY want

Foo foo = new Foo() { Bar = M(foo) };

to be legal? I hope not. foo is not definitely assigned until after the initialization is done.
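
For a local, the compiler rejects that line with a "use of unassigned local variable" error (CS0165). A rough way to see the same rule at run time is to assign to a field instead, since reading a field before it is assigned is legal. In this sketch, ContextDemo, the field current and the body of M are all assumptions made up for illustration:

```csharp
using System;

public class Foo
{
    public string Bar { get; set; }
}

public static class ContextDemo
{
    // A field rather than a local, so reading it inside the initializer compiles.
    static Foo current;

    // Hypothetical M: reports what the target looked like when the initializer ran.
    static string M(Foo f)
    {
        return f == null ? "target not assigned yet" : "target already assigned";
    }

    public static string Run()
    {
        current = null;
        // The new Foo lives in the hidden temporary while M(current) runs,
        // so M still sees the old value of the field.
        current = new Foo() { Bar = M(current) };
        return current.Bar;
    }

    public static void Main()
    {
        Console.WriteLine(Run()); // target not assigned yet
    }
}
```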

Or, consider properties.

Frob().MyFoo = new Foo() { Bar = bar };

This has to be

Foo temp = new Foo();
temp.Bar = bar;
Frob().MyFoo = temp;

and not

Frob().MyFoo = new Foo();
Frob().MyFoo.Bar = bar;

because we don't want Frob() called twice and we don't want property MyFoo accessed twice, we want them each accessed once.
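
A small sketch that counts the calls makes this visible. Holder, the counters and this particular Frob() are hypothetical names invented for the example; only the assignment statement itself comes from the text above.

```csharp
using System;

public class Foo
{
    public string Bar { get; set; }
}

public class Holder
{
    public int SetCount; // how many times the MyFoo setter ran
    private Foo myFoo;

    public Foo MyFoo
    {
        get { return myFoo; }
        set { SetCount++; myFoo = value; }
    }
}

public static class SingleEvaluationDemo
{
    public static int FrobCalls; // how many times Frob() ran
    static Holder holder = new Holder();

    static Holder Frob()
    {
        FrobCalls++;
        return holder;
    }

    public static string Run()
    {
        // Reset state so Run() gives the same answer on every call.
        FrobCalls = 0;
        holder = new Holder();

        Frob().MyFoo = new Foo() { Bar = "bar" };

        return FrobCalls + " " + holder.SetCount + " " + holder.MyFoo.Bar;
    }

    public static void Main()
    {
        Console.WriteLine(Run()); // 1 1 bar
    }
}
```

With the rejected two-statement translation, Frob() would run twice and the property would be accessed twice (one set, then one get).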

Now, in your particular case, we could write an optimizing pass that detects that the extra local is unnecessary and optimize it away. But we have other priorities, and the jitter probably does a good job of optimizing locals.

Good question. I've been meaning to blog this one for a while.


As for the why: it could be done to ensure that no "known" reference to an object that is not (fully) initialized (from the language's point of view) ever exists. Something like (pseudo-)constructor semantics for the object initializer? But that's just an idea, and I can't think of a way to use the reference to access the uninitialized object except in a multi-threaded environment.

EDIT: too slow..
