How much instruction-level optimisation can a JIT apply?
To what extent can a JIT replace platform-independent code with processor-specific machine instructions?
For example, the x86 instruction set includes the BSWAP instruction to reverse a 32-bit integer's byte order. In Java, the Integer.reverseBytes() method is implemented using multiple bitwise masks and shifts, even though in x86 native code it could be implemented with a single BSWAP instruction. Are JITs (or static compilers, for that matter) able to make this substitution automatically, or is it too complex or not worth it because of a poor speed/compile-time trade-off?
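For reference, here is a minimal sketch of the portable mask-and-shift approach the question describes, checked against Integer.reverseBytes(). The class and method names (ReverseBytesDemo, manualReverseBytes) are hypothetical and only for illustration; the point is that four shifts, two masks and three ORs do what BSWAP does in one instruction.

```java
// Sketch: portable byte reversal with shifts and masks, compared with
// Integer.reverseBytes(). Names here are hypothetical, for illustration only.
public class ReverseBytesDemo {

    // Moves each of the four bytes of i into its mirrored position
    // without any processor-specific instruction.
    static int manualReverseBytes(int i) {
        return (i >>> 24)                // byte 3 -> byte 0
             | ((i >> 8) & 0x0000FF00)   // byte 2 -> byte 1
             | ((i << 8) & 0x00FF0000)   // byte 1 -> byte 2
             | (i << 24);                // byte 0 -> byte 3
    }

    public static void main(String[] args) {
        int[] samples = {0x12345678, 0, -1, Integer.MIN_VALUE, 0x000000FF};
        for (int s : samples) {
            int expected = Integer.reverseBytes(s);
            int actual = manualReverseBytes(s);
            System.out.printf("%08x -> %08x (matches: %b)%n", s, actual, actual == expected);
        }
    }
}
```

For 0x12345678 this yields 0x78563412. A naive JIT would compile these shifts and masks as written; the answer explains how HotSpot avoids that.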
(I know that this is in most cases a micro-optimisation, but I'm interested nonetheless.)
In this case, yes: the HotSpot server compiler (C2) can do this optimization. The reverseBytes() methods are registered as intrinsics (vmIntrinsics) in HotSpot. When the JIT compiler encounters a call to one of these methods, it generates a special IR node instead of compiling the method's bytecode, and on x86 that node is translated into a single bswap instruction. See src/share/vm/opto/library_call.cpp:
//---------------------------- inline_reverseBytes_int/long/char/short-------------------
// inline Integer.reverseBytes(int)
// inline Long.reverseBytes(long)
// inline Character.reverseBytes(char)
// inline Short.reverseBytes(short)
bool LibraryCallKit::inline_reverseBytes(vmIntrinsics::ID id) {
  assert(id == vmIntrinsics::_reverseBytes_i || id == vmIntrinsics::_reverseBytes_l ||
         id == vmIntrinsics::_reverseBytes_c || id == vmIntrinsics::_reverseBytes_s,
         "not reverse Bytes");
  if (id == vmIntrinsics::_reverseBytes_i && !Matcher::has_match_rule(Op_ReverseBytesI))  return false;
  if (id == vmIntrinsics::_reverseBytes_l && !Matcher::has_match_rule(Op_ReverseBytesL))  return false;
  if (id == vmIntrinsics::_reverseBytes_c && !Matcher::has_match_rule(Op_ReverseBytesUS)) return false;
  if (id == vmIntrinsics::_reverseBytes_s && !Matcher::has_match_rule(Op_ReverseBytesS))  return false;
  _sp += arg_size();        // restore stack pointer
  switch (id) {
  case vmIntrinsics::_reverseBytes_i:
    push(_gvn.transform(new (C, 2) ReverseBytesINode(0, pop())));
    break;
  case vmIntrinsics::_reverseBytes_l:
    push_pair(_gvn.transform(new (C, 2) ReverseBytesLNode(0, pop_pair())));
    break;
  case vmIntrinsics::_reverseBytes_c:
    push(_gvn.transform(new (C, 2) ReverseBytesUSNode(0, pop())));
    break;
  case vmIntrinsics::_reverseBytes_s:
    push(_gvn.transform(new (C, 2) ReverseBytesSNode(0, pop())));
    break;
  default:
    ;
  }
  return true;
}
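To see whether this intrinsic is actually applied on a given JVM, a small driver that makes a call to Integer.reverseBytes() hot can help. This is a sketch under assumptions: the class and method names are hypothetical, and it relies on the diagnostic HotSpot flags -XX:+UnlockDiagnosticVMOptions -XX:+PrintIntrinsics, whose availability and output format vary between JDK versions.

```java
// Sketch: a driver that calls Integer.reverseBytes() often enough for HotSpot
// to JIT-compile the loop. Class and method names are hypothetical.
public class HotReverse {

    // Accumulates the reversed values so the JIT cannot drop the calls as dead code.
    static long sumReversed(int iterations) {
        long acc = 0;
        for (int i = 0; i < iterations; i++) {
            acc += Integer.reverseBytes(i);
        }
        return acc;
    }

    public static void main(String[] args) {
        long total = 0;
        // Repeated calls make sumReversed hot, so the server compiler should
        // eventually compile it and, where supported, apply the intrinsic.
        for (int round = 0; round < 200; round++) {
            total += sumReversed(100_000);
        }
        System.out.println(total);
    }
}
```

Running it as java -XX:+UnlockDiagnosticVMOptions -XX:+PrintIntrinsics HotReverse should print a line mentioning java.lang.Integer::reverseBytes if the intrinsic is used. Seeing the bswap instruction itself requires -XX:+PrintAssembly, which needs the separately built hsdis disassembler plugin.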
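The intrinsic is only used when the platform's matcher has a rule for the node (the has_match_rule checks above); on x86-64 that rule is in src/cpu/x86/vm/x86_64.ad: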
instruct bytes_reverse_int(rRegI dst) %{
  match(Set dst (ReverseBytesI dst));
  format %{ "bswapl  $dst" %}
  opcode(0x0F, 0xC8);  /*Opcode 0F /C8 */
  ins_encode( REX_reg(dst), OpcP, opc2_reg(dst) );
  ins_pipe( ialu_reg );
%}
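The encoding line is compact, so here is a hedged sketch of the bytes it produces, based on my reading of the encoding classes: REX_reg(dst) emits a REX prefix with the B bit set (0x41) only for r8d to r15d, OpcP emits the primary opcode 0x0F, and opc2_reg(dst) emits the secondary opcode 0xC8 plus the low three bits of the register number. This matches the x86 BSWAP r32 encoding 0F C8+rd. bswapEncoding and hex are hypothetical helpers, not HotSpot code, and the 64-bit bswapq form (which also needs REX.W) is not modelled.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: the machine-code bytes for "bswapl dst" on x86-64.
// Register numbers follow x86 encoding order: 0 = eax, 1 = ecx, ..., 8 = r8d, ..., 15 = r15d.
public class BswapEncoding {

    // Hypothetical helper mirroring REX_reg(dst), OpcP, opc2_reg(dst).
    static List<Integer> bswapEncoding(int reg) {
        if (reg < 0 || reg > 15) {
            throw new IllegalArgumentException("register number must be 0..15");
        }
        List<Integer> bytes = new ArrayList<>();
        if (reg >= 8) {
            bytes.add(0x41);             // REX prefix with REX.B, needed for r8d..r15d
        }
        bytes.add(0x0F);                 // primary opcode byte
        bytes.add(0xC8 + (reg & 7));     // secondary opcode plus low 3 bits of the register
        return bytes;
    }

    // Formats bytes as space-separated uppercase hex, e.g. "41 0F C8".
    static String hex(List<Integer> bytes) {
        StringBuilder sb = new StringBuilder();
        for (int b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println("bswap eax: " + hex(bswapEncoding(0)));  // 0F C8
        System.out.println("bswap ecx: " + hex(bswapEncoding(1)));  // 0F C9
        System.out.println("bswap r8d: " + hex(bswapEncoding(8)));  // 41 0F C8
    }
}
```

So the whole of Integer.reverseBytes() collapses into a two- or three-byte instruction once the intrinsic fires.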