How does the .NET CLR distinguish Managed from Unmanaged Pointers?
Everything is ultimately JITed into native machine code, so ultimately, we have a native stack in .NET which the GC needs to scan for object pointers whenever it does a garbage collection.
Now, the question is: How does the .NET garbage collector figure out if a pointer to an object inside the GC heap is actually a managed pointer or a random integer that happens to have a value that corresponds to a valid address?
Obviously, if it can't distinguish the two, then there can be memory leaks, so I'm wondering how it works. Or -- dare I say it -- does .NET have the potential to leak memory? :O
As others have pointed out, the GC knows precisely which fields of every block on the stack and the heap are managed references, because the GC and the jitter know the type of everything.
However, your point is well-taken. Imagine an entirely hypothetical world in which there are two kinds of memory management going on in the same process. For example, suppose you have an entirely hypothetical program called "InterMothra Chro-Nagava-Sploranator" written in C++ that uses traditional COM-style reference-counted memory management where everything is just a pointer to process memory, and objects are released by invoking a Release method the correct number of times. Suppose Sploranator hypothetically has a scripting language, JabbaScript, that maintains a garbage-collected pool of objects.
Trouble arises when a JabbaScript object has a reference to a non-managed Sploranator object, and that same Sploranator object has a reference right back. That's a circular reference that cannot be broken by the JabbaScript garbage collector, because it doesn't know about the memory layout of the Sploranator object. So there is the potential here for memory leaks.
One way to solve this problem is to rewrite the Sploranator memory manager so that it allocates its objects out of the managed GC pool.
Another way is to use a heuristic; the GC can dedicate a thread of a processor to scan all of memory looking for integers that happen to be pointers to its objects. That sounds like a lot, but it can omit pages that are uncommitted, pages in its own managed heap, pages that are known to contain only code, and so on. The GC can make a guess that if it thinks an object might be dead, and it cannot find any pointer to that object in any memory outside of its control, then the object is almost certainly dead.
The down side of this heuristic is of course that it can be wrong. You might have an integer that accidentally matches a pointer (though that is less likely in 64 bit land). That would extend the lifetime of the object. But who cares? We are already in the situation where circular references can extend the lifetimes of objects. We're trying to make that situation better, and this heuristic does so. That it is not perfect is irrelevant; it's better than nothing.
The other way it can be wrong is that Sploranator could have encoded the pointer, by, say, flipping all of its bits when storing the value and only flipping it back right before the call. If Sploranator is actively hostile to this GC heuristic strategy then it doesn't work.
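To make the two failure modes concrete, here is a minimal Python sketch of the heuristic. Everything in it is a toy assumption: the addresses, the object names, and the `conservative_scan` helper are invented for illustration and do not describe any real collector.

```python
# Toy model: the "managed heap" maps addresses to object names, and
# "foreign memory" is a list of machine words the GC cannot interpret.
MASK = 0xFFFFFFFFFFFFFFFF  # treat words as 64-bit values

managed_heap = {0x1000: "A", 0x2000: "B", 0x3000: "C"}


def conservative_scan(foreign_words, heap):
    """Return the managed objects that some foreign word *might* point to."""
    return {heap[word] for word in foreign_words if word in heap}


foreign_memory = [
    0x1000,          # a genuine pointer to A
    0x2000,          # an integer that merely happens to equal B's address
    ~0x3000 & MASK,  # a pointer to C stored with all of its bits flipped
    42,              # an unrelated integer
]

maybe_alive = conservative_scan(foreign_memory, managed_heap)
```

Here `maybe_alive` contains A and B but not C. B is kept alive by a coincidence, which only extends its lifetime. C is missed because its pointer was encoded, which is the case where the heuristic actually breaks.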
Resemblance between the garbage collection strategy outlined here and the actual GC strategy of any product is almost entirely coincidental. Eric's musings about implementation details of garbage collectors of hypothetical non-existing products are for entertainment purposes only.
The garbage collector doesn't need to infer whether a particular byte pattern (whether 4 or 8 bytes) is a pointer or not - it already knows.
In the CLR everything is strongly typed, so the garbage collector knows whether the bytes are an int, a long, an object reference, an untyped pointer, etc.
The layout of an object in memory is defined at compile time - metadata stored in the assembly gives the type and location of every member of the instance.
The layout of stack frames is similar - the JITter lays out the stack frame when the method is compiled, and keeps track of what kinds of data are stored where. (It's done by the JITter to allow for different optimizations depending on the capabilities of your processor).
When the garbage collector runs, it has access to all this metadata, so it never needs to guess whether a specific bit pattern might be a reference or not.
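As a rough illustration of why no guessing is needed, the Python sketch below uses a made-up layout table to decide which words of an object are references. The `LAYOUTS` table, the `Node` type, and `reference_fields` are hypothetical names; the real CLR metadata format is not shown here.

```python
import struct

# Hypothetical per-type metadata: field name -> (byte offset, kind).
LAYOUTS = {
    "Node": {"value": (0, "int64"), "next": (8, "ref")},
}


def reference_fields(type_name, raw_bytes):
    """Return only the words that the metadata marks as references."""
    refs = {}
    for field, (offset, kind) in LAYOUTS[type_name].items():
        if kind == "ref":
            (word,) = struct.unpack_from("<Q", raw_bytes, offset)
            refs[field] = word
    return refs


# Both fields hold the same bit pattern, 0x2000.
raw = struct.pack("<QQ", 0x2000, 0x2000)
refs = reference_fields("Node", raw)
```

Although both words contain 0x2000, only `next` is reported, because the layout says `value` is an integer. The bit pattern never enters into the decision.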
Eric Lippert's blog is a good place to find out more - "References are not addresses" would be a place to start.
Well, when JITing the code, the compiler knows in which places it puts references to objects. Whenever you use a field in a method that holds a reference, it knows that there's a reference in that place. This information can also be preserved when you JIT the code.
Now a reference points to the object. Each object has a pointer to its class (the .GetType()-method). Basically the GC can now take a pointer, follow it, read the type of the object. The type tells you if there are other fields which contain references to other objects. This way the GC can walk the entire stack and heap.
Of course this is a bit oversimplified, but that's the basic principle. And in the end it's an implementation detail. There are certainly other ways and all kinds of tricks to do this efficiently.
Update after comment: The pointer on the stack points to an object on the heap. Every object has a header, which also contains a pointer to its type info. So you can dereference the pointer on the stack, then dereference the pointer to the type info to find out what kind of object it is.
Remember that all managed memory is managed by the CLR. Any actual managed reference was created by the CLR. It knows what it created and what it didn't.
If you really feel you must know the details of the implementation, then you should read CLR via C# by Jeffrey Richter. The answer is not simple - it's quite a bit more than can be answered on SO.
When you create a new reference type object in .NET you are automatically "registering" it with the CLR and its GC. There is no way to inject random value types into this process. In other words:
The CLR does not maintain some large, disorganized heap of pointers mixed with value types. It just tracks CLR-created objects (for garbage collection purposes anyway). Any value type will be short-lived on the stack or be a member of a class instance. There is no potential for confusing the GC.
Have a look at Garbage Collection: Automatic Memory Management in the Microsoft .NET Framework
(Some of the technical details might be a bit dated but the structure described is valid.)
Some brief points from the article....
When a process is initialized, the runtime reserves a contiguous region of address space that initially has no storage allocated for it. This address space region is the managed heap. The heap also maintains a pointer, which I'll call the NextObjPtr. This pointer indicates where the next object is to be allocated within the heap. Initially, the NextObjPtr is set to the base address of the reserved address space region.
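The allocation scheme described in that paragraph can be sketched in a few lines of Python. This is only a model of the idea: the `ManagedHeap` class, the base address, and the sizes are assumptions made for illustration, with `next_obj_ptr` standing in for the article's NextObjPtr.

```python
class ManagedHeap:
    """Toy model of a reserved address range with a bump pointer."""

    def __init__(self, base, size):
        self.base = base
        self.limit = base + size
        self.next_obj_ptr = base  # starts at the base of the reserved region

    def allocate(self, nbytes):
        """Hand out the next nbytes and advance the pointer past them."""
        if self.next_obj_ptr + nbytes > self.limit:
            raise MemoryError("region exhausted; a real runtime would collect here")
        address = self.next_obj_ptr
        self.next_obj_ptr += nbytes
        return address


heap = ManagedHeap(base=0x10000, size=64)
a = heap.allocate(24)
b = heap.allocate(16)
```

The second object lands directly after the first, so allocation is just an addition and a bounds check.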
...
Every application has a set of roots. Roots identify storage locations, which refer to objects on the managed heap or to objects that are set to null. For example, all the global and static object pointers in an application are considered part of the application's roots. In addition, any local variable/parameter object pointers on a thread's stack are considered part of the application's roots. Finally, any CPU registers containing pointers to objects in the managed heap are also considered part of the application's roots. The list of active roots is maintained by the just-in-time (JIT) compiler and common language runtime, and is made accessible to the garbage collector's algorithm.
...
When the garbage collector starts running, it makes the assumption that all objects in the heap are garbage. In other words, it assumes that none of the application's roots refer to any objects in the heap. Now, the garbage collector starts walking the roots and building a graph of all objects reachable from the roots. For example, the garbage collector may locate a global variable that points to an object in the heap.
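A minimal Python sketch of that walk, assuming a toy heap in which each object is just a name mapped to the names it references (the `mark` function and the sample graph are invented for illustration):

```python
def mark(roots, heap):
    """Return every object reachable from the roots."""
    reachable = set()
    pending = [root for root in roots if root is not None]  # null roots refer to nothing
    while pending:
        obj = pending.pop()
        if obj in reachable:
            continue  # already visited, which also stops cycles from looping forever
        reachable.add(obj)
        pending.extend(heap[obj])
    return reachable


# object -> objects it references
heap = {"A": ["B"], "B": ["A"], "C": ["D"], "D": ["C"], "E": []}
roots = ["A", None]

live = mark(roots, heap)
garbage = set(heap) - live
```

C and D reference each other but are still garbage, because nothing reachable from a root points at them. Reachability, not reference counts, decides what survives.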
On the question...
...or a random integer that happens to have a value that corresponds to a valid address?....memory leaks ?
If the object is not reachable the GC will destroy it regardless.
According to the "CLR via C#" book, the runtime knows exactly where it will find the references/pointers by inspecting the "method's internal table". What this internal table holds in Microsoft's implementation is unknown, but it can accurately identify call frames on the stack, local variables, and even what kind of value the registers hold for each EIP address.
The Mono implementation used conservative scanning, which means that it treated every value on the stack as a potential pointer. That not only translates to memory leaks, but also (since it cannot update those values) means the objects identified this way are treated as pinned (unmovable by the GC compactor), and that leads to memory fragmentation.
Mono now has the option of "Precise Stack Marking", which uses GCMaps. You can read more about it here: http://www.mono-project.com/Generational_GC#Precise_Stack_Marking
Note that this implementation is not as precise as the Microsoft one, since it continues to treat the current frame conservatively.
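Since the actual contents of that table are not documented here, the following Python sketch only illustrates the general idea of a lookup keyed by instruction offset. The table entries, slot names, and `live_reference_slots` are all hypothetical and are not taken from the CLR.

```python
import bisect

# Hypothetical table for one compiled method. Each entry says which
# registers and stack slots hold references from that code offset onward.
GC_TABLE = [
    (0x00, set()),
    (0x10, {"rbx"}),
    (0x24, {"rbx", "[rbp-8]"}),
    (0x40, {"[rbp-8]"}),
]
OFFSETS = [offset for offset, _ in GC_TABLE]


def live_reference_slots(instruction_offset):
    """Return the slots that hold references at the given code offset."""
    if instruction_offset < 0:
        raise ValueError("offset lies before the start of the method")
    index = bisect.bisect_right(OFFSETS, instruction_offset) - 1
    return GC_TABLE[index][1]


slots = live_reference_slots(0x30)
```

With a table like this, a collector that suspends a thread at offset 0x30 would look only at `rbx` and `[rbp-8]` and ignore every other word in the frame.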
The objects that references point to have headers, so a reference is not just a random integer.