How does callvirt work under the hood?
I am trying to understand how the CLR implements reference types and polymorphism. I have referred to Don Box's Essential .NET Vol 1, which has been a great help in clarifying most of it. But I am confused by the following issue, which came up when I played around with some IL code to understand things better.
I will try to explain the problem as best as I can. Consider the following code:
class Base
{
    public void m()
    {
        Console.WriteLine("Base.m");
    }
}
class Derived : Base
{
    public void m()
    {
        Console.WriteLine("Derived.m");
    }
}
Now consider a simple console application whose Main method has the IL shown below. I manually tweaked the IL that the compiler generated, to understand it better, and reassembled it with ILAsm.exe:
.class private auto ansi beforefieldinit Console1.Program
extends [mscorlib]System.Object
{
.method private hidebysig static void Main(string[] args) cil managed
{
.entrypoint
// Code size 44 (0x2c)
.maxstack 1
.locals init ([0] class Console1.Base d)
nop
newobj instance void Console1.Base::.ctor()
stloc.0
ldloc.0
callvirt instance void Console1.Derived::m()
nop
call string [mscorlib]System.Console::ReadLine()
pop
ret
} // end of method Program::Main
} // end of class Console1.Program
I was expecting this code NOT to run as the object reference is pointing to an object of Base and there is no way the method table of a base object will have an entry for the method m() defined in Derived class.
But magically this code executes the Derived.m()!!
So, there are two things I don't understand in the above code:
- What is the significance of the type specified in the IL line below? I have experimented with changing it to different types (e.g. System.Exception!!) and no errors are reported. Why?
.locals init ([0] class Console1.Base d)
- How exactly does callvirt work? How did the call get routed to Derived.m()?
Thanks in advance!!
Regards, Ajay
My guess is that the jitter realizes that Derived.m isn't virtual and thus can never point anywhere else. So the callvirt reduces to a null check and a call instead of a call through the v-table.
Try making Derived.m virtual. I bet it'll throw then.
The C# compiler emits callvirt instructions even when calling non-virtual methods if it can't prove that this != null, so that it gets a null check. The jitter is intelligent enough in that case to replace the virtual call with a normal call to a fixed address (or even inline it).
And you should check whether your code is verifiable. I think it isn't.
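The null check mentioned above can be observed from plain C#. The following is a minimal sketch, not taken from the thread: the class and method names (Base.M, NullCheckDemo, CallOn) are made up for illustration. M is non-virtual and never touches this, so the only thing that can fail on a null reference is the check that comes with callvirt.

```csharp
using System;

class Base
{
    // Non-virtual, and the body never reads 'this'.
    public string M() => "Base.M";
}

static class NullCheckDemo
{
    // Hypothetical helper: calls M on whatever reference it is given.
    public static string CallOn(Base target)
    {
        try
        {
            // The C# compiler emits callvirt here although M is not virtual,
            // so a null target is rejected before M's body runs.
            return target.M();
        }
        catch (NullReferenceException)
        {
            return "NullReferenceException";
        }
    }

    public static void Main()
    {
        Console.WriteLine(CallOn(new Base())); // Base.M
        Console.WriteLine(CallOn(null));       // NullReferenceException
    }
}
```

If the call were emitted as a plain call instruction instead, a body that ignores this could run even with a null reference, which is the behaviour the compiler avoids by emitting callvirt.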
Your code isn't verifiable (run it through peverify). I've written a blog post about how callvirt works under the hood that might help you understand what it does and how your code executes.
Bear in mind that the CLR does try to execute non-verifiable code if it is run as a normal program; it only fails if the code actually causes a problem.
In your example, calling Derived.m() on an instance of Base works because the actual run-time binary representation of the object instances is the same; the this object is basically the same, and no instance fields of the objects are accessed.
Try putting an instance field access into both methods and see what happens...
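One way to set up that experiment is sketched below. The field names (baseField, derivedField) and the FieldDemo class are assumptions added for illustration. Compiled normally, the program behaves as expected. What happens once the IL is hand-edited to call Derived::m() on a Base instance is not something this sketch predicts, because derivedField would then be read from memory that a Base object does not own.

```csharp
using System;

namespace Console1
{
    class Base
    {
        private int baseField = 1; // hypothetical field, present in every Base

        public void m()
        {
            Console.WriteLine("Base.m baseField=" + baseField);
        }
    }

    class Derived : Base
    {
        private int derivedField = 2; // hypothetical field, exists only in Derived instances

        public new void m()
        {
            // On a genuine Derived instance this read is safe.
            Console.WriteLine("Derived.m derivedField=" + derivedField);
        }
    }

    static class FieldDemo
    {
        public static void Main()
        {
            new Base().m();    // Base.m baseField=1
            new Derived().m(); // Derived.m derivedField=2
        }
    }
}
```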
Please note that by default, code executed from the local machine is not verified. This means that invalid code can be written and executed. I suspect your Main method will not pass verification as-is. The PEVerify tool can check an assembly to ensure the code is type-safe, or you can enable these checks for code from the local machine or from a specific location via Security Policy Administration.
The purpose of the type in the locals statement is to declare the type of the local variable. This provides the information needed by the type verifier to verify that member accesses on the local variable are operating on an object of the correct type.
Callvirt could be implemented several ways. The most likely way is in the same way C++ vtables are implemented: An object contains a table of function pointers. Each function is located at a predefined offset in the table. To call the function, the address at the predefined offset is loaded and called. Note that in some cases, the CLR could do additional optimizations if the type of the object is known. Whether this is done, I don't know.
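The table-of-function-pointers idea can be modelled in a few lines of C#. This is a conceptual sketch only and says nothing about the CLR's real data structures: MethodTable, Obj, VTableSketch and SlotM are all hypothetical names, and delegates stand in for raw function pointers.

```csharp
using System;

// Hypothetical model: one table of function pointers per type.
sealed class MethodTable
{
    public readonly Func<Obj, string>[] Slots;
    public MethodTable(Func<Obj, string>[] slots) { Slots = slots; }
}

// Hypothetical model: every object carries a reference to its type's table.
sealed class Obj
{
    public readonly MethodTable Table;
    public Obj(MethodTable table) { Table = table; }
}

static class VTableSketch
{
    public const int SlotM = 0; // predefined offset of m() in every table

    static readonly MethodTable BaseTable =
        new MethodTable(new Func<Obj, string>[] { self => "Base.m" });
    static readonly MethodTable DerivedTable =
        new MethodTable(new Func<Obj, string>[] { self => "Derived.m" });

    public static Obj NewBase() => new Obj(BaseTable);
    public static Obj NewDerived() => new Obj(DerivedTable);

    // Virtual call: null check, load the table from the object,
    // load the function at the predefined slot, call it.
    public static string CallVirtual(Obj target, int slot)
    {
        if (target == null) throw new NullReferenceException();
        return target.Table.Slots[slot](target);
    }

    public static void Main()
    {
        Console.WriteLine(CallVirtual(NewBase(), SlotM));    // Base.m
        Console.WriteLine(CallVirtual(NewDerived(), SlotM)); // Derived.m
    }
}
```

In this model the method that runs depends only on the table the object points to, not on the declared type of the variable holding it.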
I think this is a side effect of a JIT compiler optimization. If the m() method were virtual, the JIT compiler would have to generate machine code to dig the method table pointer out of the object and then make the virtual call. But this method isn't virtual, and the JIT compiler already knows the call target in the Derived class, so it bypasses the pointer retrieval and supplies the address directly, which makes the call work as you observed. You can verify my guess by checking the generated machine code.
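The difference between a non-virtual and a virtual m() is visible in ordinary, verifiable C# as well. The sketch below uses made-up names (Plain, Virt, DispatchDemo) and is only meant to illustrate the distinction: a non-virtual call is bound to the declared type of the variable, while a virtual call is resolved from the object at run time.

```csharp
using System;

class Base
{
    public string Plain() => "Base.Plain";        // non-virtual
    public virtual string Virt() => "Base.Virt";  // virtual
}

class Derived : Base
{
    public new string Plain() => "Derived.Plain";    // hides Base.Plain
    public override string Virt() => "Derived.Virt"; // overrides Base.Virt
}

static class DispatchDemo
{
    // Both helpers see the object only through a Base-typed reference.
    public static string CallPlain(Base b) => b.Plain();
    public static string CallVirt(Base b) => b.Virt();

    public static void Main()
    {
        Base b = new Derived();
        Console.WriteLine(CallPlain(b)); // Base.Plain
        Console.WriteLine(CallVirt(b));  // Derived.Virt
    }
}
```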
Yeah, the IL verifier isn't scoring any points here. You could make it more interesting by having the Derived.m() method tinker with a field that's only declared in Derived. I've seen too much Reflection.Emit code crash with an AccessViolation to be greatly surprised by this. It may well be intentional, however: there is no need to verify IL that crashes anyway. I'm not sure; exploiting this kind of verification loophole isn't (yet) common. Thankfully.
For more information about how this works even deeper under the hood, check out this StackExchange question/answer: How does the callvirt .NET instruction work for interfaces?