Types in Bytecode
I've been working with (Java) bytecode for some time; however, it had never occurred to me to ask why some instructions are typed. I understand that in an ADD operation we need to distinguish between an integer addition and a floating-point addition (that's why we have IADD and FADD). However, why do we need to distinguish between ISTORE and FSTORE? They both perform the exact same operation: moving 32 bits from the stack to a local variable slot.
The only answer I can think of is type safety, to prevent this: (ILOAD, ILOAD, FADD). However, I believe that type safety is already enforced at the Java language level. OK, the class file format is not directly coupled with Java, so is this a way to enforce type safety for languages that do not support it? Any thoughts? Thank you.
EDIT: to follow up on Reedy's answer. I wrote this minimal program:
public static void main(String args[])
{
    int x = 1;
}
which compiled to:
iconst_1
istore_1
return
Using a bytecode editor, I changed the second instruction:
iconst_1
fstore_1
return
and running it threw a java.lang.VerifyError: Expecting to find float on stack.
I wonder: if there's no type information on the stack, just bits, how did the FSTORE instruction know that it was dealing with an int and not a float?
Note: I couldn't find a better title for this question. Feel free to improve it.
These instructions are typed to ensure the program is typesafe. When loading a class the virtual machine performs verification on the bytecodes to ensure that, for example, a float isn't passed as an argument to a method expecting an integer. This static verification requires that the verifier can determine the types and number of values on the stack for any given execution path. The load and store instructions need the type tag because the local variables in the stack frames are not typed (i.e. you can istore to a local variable and later fstore to the same position). The type tags on the instructions allow the verifier to know what type of value is stored in each local variable.
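To illustrate the point about untyped local variable slots, here is a small sketch in Java. The class and method names are made up for this example, and the bytecode in the comments is only what javac typically emits; which slot a variable gets is a compiler choice, not something the language guarantees. Running javap -c on the compiled class shows what your compiler actually produced.

```java
// Hypothetical example class: two block-scoped locals of different types
// whose lifetimes do not overlap, so a compiler may give them the same slot.
class SlotReuse {
    static float demo() {
        {
            int i = 1;        // typically: iconst_1, istore_0
        }
        {
            float f = 2.0f;   // typically: fconst_2, fstore_0 (same slot, new type)
            return f;         // typically: fload_0, freturn
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

If the slot is reused as described, the same local variable position holds an int at one point and a float at another, and only the typed store instructions tell the verifier which is which.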
The verifier looks at each opcode in the method and keeps track of what types will be on the stack and in the local variables after executing each one. You are right that this is another form of type checking and that it duplicates some of the checks done by the Java compiler. The verification step prevents loading of any code that would cause the VM to execute an illegal instruction, and it ensures the safety properties of the Java platform without incurring the large runtime penalty of checking types before each operation. Runtime type checking for each opcode would be a performance hit each time the method is executed, but the static verification is done only once, when the class is loaded.
Case 1:
Instruction      Verification   Stack Types   Local Variable Types
---------------  -------------  ------------  ---------------------
<method entry>   OK             []            1: none
iconst_1         OK             [int]         1: none
istore_1         OK             []            1: int
return           OK             []            1: int
Case 2:
Instruction      Verification   Stack Types   Local Variable Types
---------------  -------------  ------------  ---------------------
<method entry>   OK             []            1: none
iconst_1         OK             [int]         1: none
fstore_1         Error: Expecting to find float on stack
The error is given because the verifier knows that fstore_1 expects a float on the stack but the result of executing the previous instructions leaves an int on the stack.
This verification is done without executing the opcodes; rather, it is done by looking at the types of the instructions, just like the Java compiler gives an error when you write (Integer)"abcd". The compiler doesn't have to run the program to know that "abcd" is a string and can't be cast to Integer.
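The two cases above can be replayed with a toy model of the type tracking. This is only a sketch under heavy simplifying assumptions (straight-line code, four instructions, a single method), not how the real JVM verifier is implemented; the class MiniVerifier and its verify method are names invented for this illustration.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// A toy model of the verifier's type tracking; NOT the real JVM verifier.
class MiniVerifier {
    enum Type { INT, FLOAT }

    // Returns "OK" or an error message for a straight-line instruction sequence.
    // No opcode is executed: only the types they would produce are tracked.
    static String verify(String... code) {
        Deque<Type> stack = new ArrayDeque<>();   // types on the operand stack
        Type[] locals = new Type[4];              // null means "no value yet"
        for (String insn : code) {
            switch (insn) {
                case "iconst_1":
                    stack.push(Type.INT);
                    break;
                case "fconst_1":
                    stack.push(Type.FLOAT);
                    break;
                case "istore_1":
                    // peek() returns null on an empty stack, which also fails the check
                    if (stack.peek() != Type.INT) return "Error: Expecting to find int on stack";
                    locals[1] = stack.pop();
                    break;
                case "fstore_1":
                    if (stack.peek() != Type.FLOAT) return "Error: Expecting to find float on stack";
                    locals[1] = stack.pop();
                    break;
                case "return":
                    return "OK";
                default:
                    return "Error: unknown instruction " + insn;
            }
        }
        return "Error: falling off the end of the code";
    }

    public static void main(String[] args) {
        System.out.println(verify("iconst_1", "istore_1", "return")); // Case 1
        System.out.println(verify("iconst_1", "fstore_1", "return")); // Case 2
    }
}
```

The stack here holds types, not values, which is why the mismatch in Case 2 is caught without any bits ever being moved.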
To answer your first question with my best guess: these bytecodes are different because they may require different implementations. For example, a particular architecture may keep integer operands on the main stack, but floating-point operands in hardware registers.
To answer your second question, VerifyError is thrown when the class is loaded, not when it's executed. The verification process is described here; note pass #3.
Geoff Reedy explained in his answer what the verifier does when a class is loaded. I just want to add that you can disable the verifier using a JVM parameter (on HotSpot, -Xverify:none or its shorthand -noverify). This is not recommended!
For your example program (with iconst and fstore), the result of running with verification disabled is a VM error that halts the JVM with the following message:
=============== DEBUG MESSAGE: illegal bytecode sequence - method not verified ================
#
# An unexpected error has been detected by HotSpot Virtual Machine:
#
# EXCEPTION_PRIV_INSTRUCTION (0xc0000096) at pc=0x00a82571, pid=2496, tid=3408
#
# Java VM: Java HotSpot(TM) Client VM (1.5.0_15-b04 mixed mode, sharing)
# Problematic frame:
# j BytecodeMismatch.main([Ljava/lang/String;)V+0
#
...
All bytecode must be provably typesafe with a static data flow analysis as mentioned above. However, this doesn't really explain why instructions like _store have different types, since the type can be inferred from the type of the value on the stack. In fact, there are some instructions like pop, dup, and swap that do exactly that and operate on multiple types. Why some instructions are typed and others aren't is something that can only be explained by the original developers of Java.
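To make that argument concrete, here is a small sketch of a type tracker in which the store instruction is untyped. The store_1 instruction is hypothetical (no such JVM opcode exists), and the class and method names are invented for this illustration; dup and pop are modelled only for the single-slot types int and float.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Sketch: a data flow analysis can infer a local's type from the stack alone.
class UntypedStoreSketch {
    enum Type { INT, FLOAT }

    // Returns the type recorded for local 1 after the sequence, or null if the
    // sequence is invalid or never stores to local 1.
    static Type localAfter(String... code) {
        Deque<Type> stack = new ArrayDeque<>();
        Type local1 = null;
        for (String insn : code) {
            switch (insn) {
                case "iconst_1":
                    stack.push(Type.INT);
                    break;
                case "fconst_1":
                    stack.push(Type.FLOAT);
                    break;
                case "dup":      // real opcode, untyped: copies whatever type is on top
                    if (stack.isEmpty()) return null;
                    stack.push(stack.peek());
                    break;
                case "pop":      // real opcode, untyped: discards whatever type is on top
                    if (stack.isEmpty()) return null;
                    stack.pop();
                    break;
                case "store_1":  // HYPOTHETICAL untyped store: takes its type from the stack
                    if (stack.isEmpty()) return null;
                    local1 = stack.pop();
                    break;
                default:
                    return null;
            }
        }
        return local1;
    }

    public static void main(String[] args) {
        System.out.println(localAfter("iconst_1", "store_1"));
        System.out.println(localAfter("fconst_1", "dup", "pop", "store_1"));
    }
}
```

In this model the analysis still knows the type of every local, which is the sense in which the type tag on a store is redundant for verification purposes.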