
How does the CLR (.NET) internally allocate and pass around custom value types (structs)?

Question:

Do all CLR value types, including user-defined structs, live on the evaluation stack exclusively, meaning that they will never need to be reclaimed by the garbage collector, or are there cases where they are garbage-collected?

Background:

I have previously asked a question on SO about the impact that a fluent interface has on the runtime performance of a .NET application. I was particularly worried that creating a large number of very short-lived temporary objects would negatively affect runtime performance through more frequent garbage collection.

Now it has occurred to me that if I declared those temporary objects' types as struct (i.e. as user-defined value types) instead of class, the garbage collector might not be involved at all if it turns out that all value types live exclusively on the evaluation stack.

(This occurred to me mainly because I was thinking of C++'s way of handling local variables. Usually being automatic (auto) variables, they are allocated on the stack and therefore freed when program execution returns to the caller, with no dynamic memory management via new/delete involved at all. I thought the CLR just might handle structs similarly.)

What I've found out so far:

I did a brief experiment to see what the differences are in the CIL generated for user-defined value types and reference types. This is my C# code:

struct SomeValueType     {  public int X;  }
class SomeReferenceType  {  public int X;  }
.
.
static void TryValueType(SomeValueType vt) { ... }
static void TryReferenceType(SomeReferenceType rt) { ... }
.
.
var vt = new SomeValueType { X = 1 };
var rt = new SomeReferenceType { X = 2 };
TryValueType(vt);
TryReferenceType(rt);

And this is the CIL generated for the last four lines of code:

.locals init
(
    [0] valuetype SomeValueType vt,
    [1] class SomeReferenceType rt,
    [2] valuetype SomeValueType <>g__initLocal0,  //
    [3] class SomeReferenceType <>g__initLocal1,  // why are these generated?
    [4] valuetype SomeValueType CS$0$0000         //
)

L_0000: ldloca.s CS$0$0000
L_0002: initobj SomeValueType  // no newobj required, instance already allocated
L_0008: ldloc.s CS$0$0000
L_000a: stloc.2
L_000b: ldloca.s <>g__initLocal0
L_000d: ldc.i4.1 
L_000e: stfld int32 SomeValueType::X
L_0013: ldloc.2 
L_0014: stloc.0 
L_0015: newobj instance void SomeReferenceType::.ctor()
L_001a: stloc.3
L_001b: ldloc.3 
L_001c: ldc.i4.2 
L_001d: stfld int32 SomeReferenceType::X
L_0022: ldloc.3 
L_0023: stloc.1 
L_0024: ldloc.0 
L_0025: call void Program::TryValueType(valuetype SomeValueType)
L_002a: ldloc.1 
L_002b: call void Program::TryReferenceType(class SomeReferenceType)

What I cannot figure out from this code is this:

  • Where are all those local variables mentioned in the .locals block allocated? How are they allocated? How are they freed?

  • (Off-topic: Why are so many anonymous local variables needed and copied to-and-fro, only to initialize my two local variables rt and vt?)


Your accepted answer is wrong.

> The difference between value types and reference types is primarily one of assignment semantics. Value types are copied on assignment - for a struct, that means copying the contents of all fields. Reference types only copy the reference, not the data. The stack is an implementation detail. The CLI spec promises nothing about where an object is allocated, and it's a bad idea to depend on behaviour that isn't in the spec.

Value types are characterised by their pass-by-value semantics but that does not mean they actually get copied by the generated machine code.

For example, a function that squares a complex number can accept the real and imaginary components in two floating point registers and return its result in two floating point registers. The code generator optimizes away all of the copying.
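For reference, here is a minimal sketch of the assignment semantics under discussion, reusing the two types from the question. It only shows what is observable at the C# level; whether the JIT actually emits a copy, or keeps the field in a register, is a separate matter. The class and method names (AssignmentDemo and so on) are made up for illustration.

```csharp
using System;

struct SomeValueType     {  public int X;  }
class SomeReferenceType  {  public int X;  }

static class AssignmentDemo
{
    // Assigning a struct copies its fields, so mutating the copy
    // leaves the original untouched.
    public static int MutateStructCopy()
    {
        var a = new SomeValueType { X = 1 };
        var b = a;      // copies all fields of a into b
        b.X = 99;       // changes only b
        return a.X;     // still 1
    }

    // Assigning a class instance copies only the reference, so both
    // variables refer to the same object.
    public static int MutateClassCopy()
    {
        var a = new SomeReferenceType { X = 1 };
        var b = a;      // copies the reference, not the object
        b.X = 99;       // visible through a as well
        return a.X;     // now 99
    }

    static void Main()
    {
        Console.WriteLine(MutateStructCopy()); // prints 1
        Console.WriteLine(MutateClassCopy());  // prints 99
    }
}
```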

Several people had explained why this answer was wrong in comments below it but some moderator has deleted all of them.

> Temporary objects (locals) will live in the GC generation 0. The GC is already smart enough to free them as soon as they go out of scope. You do not need to switch to struct instances for this.

This is complete nonsense. The GC sees only the information available at run-time, by which point all notions of scope have disappeared. The GC will not collect anything "as soon as it goes out of scope". The GC will collect it at some point after it has become unreachable.

> Mutable value types already have a tendency to lead to bugs because it's hard to understand when you're mutating a copy vs. the original. But introducing reference properties on those value types, as would be the case with a fluent interface, is going to be a mess, because it will appear that some parts of the struct are getting copied but others aren't (i.e. nested properties of reference properties). I can't recommend against this practice strongly enough, it's liable to lead to all kinds of maintenance headaches in the long haul.

Again, this is complete nonsense. There is nothing wrong with having references inside a value type.

Now, to answer your question:

> Do all CLR value types, including user-defined structs, live on the evaluation stack exclusively, meaning that they will never need to be reclaimed by the garbage-collector, or are there cases where they are garbage-collected?

Value types certainly do not "live on the evaluation stack exclusively". The preference is to store them in registers. If necessary, they will be spilled to the stack. Sometimes they are even boxed on the heap.

For example, if you write a function that loops over the elements of an array then there is a good chance that the int loop variable (a value type) will live entirely in a register and never be spilled to the stack or written into the heap. This is what Eric Lippert (of the Microsoft C# team, who wrote of himself "I don’t know all the details" regarding .NET's GC) meant when he wrote that value types can be spilled to the stack when "the jitter chooses to not enregister the value". This is also true of larger value types (like System.Numerics.Complex) but there is a higher chance of larger value types not fitting in registers.

Another important example where value types do not live on the stack is when you're using an array with elements of a value type. In particular, the .NET Dictionary collection uses an array of structs in order to store the key, value and hash for each entry contiguously in memory. This dramatically improves memory locality, cache efficiency and, consequently, performance. Value types (and reified generics) are the reason why .NET is 17× faster than Java on this hash table benchmark.
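A minimal sketch of the array case, assuming a hypothetical Entry struct (a simplified stand-in, not the actual layout used inside Dictionary). The elements of a struct array are stored inline in the array object itself, which lives on the GC heap, so indexing the array gives you the element in place rather than a copy.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical entry type, for illustration only.
struct Entry
{
    public int Key;
    public int Value;
}

static class ArrayDemo
{
    // An array element is a variable in its own right: the write below
    // goes straight into the array's storage on the heap.
    public static int UpdateInArray()
    {
        var entries = new Entry[4];   // one heap object holding four Entry values inline
        entries[0].Value = 42;        // modifies the element in place
        return entries[0].Value;      // 42
    }

    // A List<T> indexer returns a copy of the struct, so changing the
    // copy does not touch the stored element.
    public static int UpdateCopyFromList()
    {
        var list = new List<Entry> { new Entry() };
        Entry copy = list[0];         // copy of the stored value
        copy.Value = 42;              // changes only the local copy
        return list[0].Value;         // still 0
    }

    static void Main()
    {
        Console.WriteLine(UpdateInArray());      // prints 42
        Console.WriteLine(UpdateCopyFromList()); // prints 0
    }
}
```

The struct values in the array are never collected individually; they go away when the array that contains them is collected.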

> I did a brief experiment to see what the differences are in the CIL generated...

CIL is a high-level intermediate language and, consequently, will not give you any information about register allocation and spilling to the stack and does not even give you an accurate picture of boxing. Looking at CIL can, however, let you see how the front-end C# or F# compiler boxes some value types as it translates even higher-level constructs like async and comprehensions into CIL.

For more information on garbage collection I highly recommend The Garbage Collection Handbook and The Memory Management Reference. If you want a deep dive into the internal implementation of value types in VMs then I recommend reading the source code of my own HLVM project. In HLVM, tuples are value types and you can see the assembler generated and how it uses LLVM to keep the fields of value types in registers whenever possible and optimizes away unnecessary copying, spilling to the stack only when necessary.


Please consider the following:

  1. The difference between value types and reference types is primarily one of assignment semantics. Value types are copied on assignment - for a struct, that means copying the contents of all fields. Reference types only copy the reference, not the data. The stack is an implementation detail. The CLI spec promises nothing about where an object is allocated, and it's ordinarily a dangerous idea to depend on behaviour that isn't in the spec.

  2. Temporary objects (locals) will live in the GC generation 0. The GC is already smart enough to free them (almost) as soon as they go out of scope - or whenever it is actually most efficient to do so. Gen0 runs frequently enough that you do not need to switch to struct instances for efficiently managing temporary objects.

  3. Mutable value types already have a tendency to lead to bugs because it's hard to understand when you're mutating a copy vs. the original. Many of the language designers themselves recommend making value types immutable whenever possible for exactly this reason, and the guidance is echoed by many of the top contributors on this site.

Introducing *reference properties* on those value types, as would be the case with a fluent interface, further violates the Principle of Least Surprise by creating inconsistent semantics. The expectation for value types is that they are copied, *in their entirety*, on assignment, but when reference types are included among their properties, you will actually only be getting a shallow copy. In the worst case you have a mutable struct containing *mutable* reference types, and the consumer of such an object will likely erroneously assume that one instance can be mutated without affecting the other.

There are always exceptions - some of them in the framework itself - but as a general rule of thumb, I would not recommend writing "optimized" code that (a) depends on private implementation details and (b) you know will be difficult to maintain, *unless* you (a) have full control over the execution environment and (b) have actually profiled your code and verified that the optimization would make a significant difference in latency or throughput.

  4. The <>g__initLocal0 and related locals are there because you are using object initializers. Switch to parameterized constructors and you'll see those disappear.
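To make the shallow-copy concern from point 3 concrete, here is a small sketch. The Basket type and its members are hypothetical, chosen only to show a mutable struct that holds a mutable reference type.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical mutable struct holding a reference to a mutable object.
struct Basket
{
    public int Count;           // plain value field: copied on assignment
    public List<string> Items;  // reference field: only the reference is copied
}

static class ShallowCopyDemo
{
    // Mutates a copy and reports what the original looks like afterwards.
    public static (int Count, int ItemCount) MutateCopy()
    {
        var original = new Basket { Count = 0, Items = new List<string>() };
        var copy = original;        // shallow copy: Items refers to the same list

        copy.Count = 1;             // affects only the copy
        copy.Items.Add("apple");    // affects the list shared with the original

        return (original.Count, original.Items.Count);
    }

    static void Main()
    {
        var result = MutateCopy();
        Console.WriteLine(result.Count);     // prints 0
        Console.WriteLine(result.ItemCount); // prints 1
    }
}
```

The value field and the reference field behave differently under the same assignment, which is the inconsistency a consumer of such a type can trip over.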

Value types are typically allocated on the stack, and reference types are typically allocated on the heap, but that is not actually part of the .NET specification and is not guaranteed (in the first linked post, Eric even points out some obvious exceptions).

More importantly, it's simply incorrect to assume that the stack being generally cheaper than the heap automatically means that any program or algorithm using stack semantics will run faster or more efficiently than a GC-managed heap. There are a number of papers written on this topic and it is entirely possible and often likely for a GC heap to outperform stack allocation with a large number of objects, because modern GC implementations are actually more sensitive to the number of objects that don't need freeing (as opposed to stack implementations which are entirely pinned to the number of objects on the stack).

In other words, if you've allocated thousands or millions of temporary objects - even if your assumption about value types having stack semantics holds true on your particular platform in your particular environment - utilizing it could still make your program slower!

Therefore I'll return to my original advice: Let the GC do its job, and don't assume that your implementation can outperform it without a full performance analysis under all possible execution conditions. If you start with clean, maintainable code, you can always optimize later; but if you write what you believe to be performance-optimized code at the cost of maintainability and later turn out to be wrong in your performance assumptions, the cost to your project will be far greater in terms of maintenance overhead, defect counts, etc.


Where the .locals are allocated is a JIT compiler implementation detail. Right now, I don't know of any JIT compiler that doesn't allocate them on a stack frame. They are "allocated" by adjusting the stack pointer and "freed" by resetting it back. Very fast, hard to improve. But who knows, 20 years from now we might all be running machines with CPU cores that are optimized to run only managed code with a completely different internal implementation. Probably cores with a ton of registers; the JIT optimizer already uses registers to store locals now.

The temporaries are emitted by the C# compiler to provide some minimum consistency guarantees in case object initializers throw exceptions. They prevent your code from ever seeing a partially initialized object in a catch or finally block. Temporaries are also used in the using and lock statements, where they prevent the wrong object from being disposed or unlocked if you replace the object reference in your code.
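A minimal sketch of that guarantee for object initializers, reusing SomeReferenceType from the question. The Fail helper and the InitializerDemo class are hypothetical names added for illustration. Because the new object is built in a compiler-generated temporary and assigned to the target only after every initializer has run, an exception in an initializer leaves the target variable untouched.

```csharp
using System;

class SomeReferenceType  {  public int X;  }

static class InitializerDemo
{
    // Hypothetical helper that simulates an initializer expression throwing.
    static int Fail()
    {
        throw new InvalidOperationException("initializer failed");
    }

    public static bool TargetStaysNull()
    {
        SomeReferenceType rt = null;
        try
        {
            // Roughly equivalent to:
            //   var temp = new SomeReferenceType();
            //   temp.X = Fail();   // throws here
            //   rt = temp;         // never reached
            rt = new SomeReferenceType { X = Fail() };
        }
        catch (InvalidOperationException)
        {
            // rt was never assigned, so no half-initialized object is visible here.
        }
        return rt == null;
    }

    static void Main()
    {
        Console.WriteLine(TargetStaysNull()); // prints True
    }
}
```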


Structures are value types and allocated on the stack when used for local variables. But if you cast a local variable to Object or an interface, the value is boxed and allocated on the heap.

Consequently, structures are freed when they fall out of scope, unless they have been boxed and moved to the heap. In that case the garbage collector becomes responsible for freeing them once there is no longer any reference to the boxed object.
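A short sketch of boxing, reusing SomeValueType from the question (the BoxingDemo class is a hypothetical name). Converting the struct to object copies it into a new heap object, so later changes to the local do not affect the boxed copy.

```csharp
using System;

struct SomeValueType  {  public int X;  }

static class BoxingDemo
{
    public static int BoxedCopyIsIndependent()
    {
        var vt = new SomeValueType { X = 1 };
        object boxed = vt;      // boxing: copies vt into a new object on the GC heap
        vt.X = 2;               // changes only the local variable

        // Unboxing copies the value back out of the heap object.
        return ((SomeValueType)boxed).X;
    }

    static void Main()
    {
        Console.WriteLine(BoxedCopyIsIndependent()); // prints 1
    }
}
```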

I am not sure about the reason for all the compiler-generated local variables, but I assume they are used because you use object initializers. The objects are first initialized using a compiler-generated local variable and copied to your local variable only after the object initializers have run to completion. This ensures that you will never see an instance with only some of the object initializers executed.
