Virtually free - JVM callsite optimization by example
14 Dec 2015
Recently, this post by Swift creator Chris Lattner, which does a nice job of detailing method dispatch in various languages, has been making the rounds. I thought I would add some color on how the Hotspot JIT can safely turn a virtual call into a static one.
Let's say we have a simple class Foo:
public class Foo {
    protected int x;
    public Foo(int x) {
        this.x = x;
    }
    public int function() {
        return x + 1;
    }
}
And another class Bar, which extends Foo and implements its own function:
public class Bar extends Foo {
    public Bar(int x) {
        super(x);
    }
    @Override
    public int function() {
        return x + 2;
    }
}
Now we’ll create a method that calls Foo.function, and then give that method a workout:
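import java.util.Random;
public class CallTest {
    static Random random = new Random();
    public static int call(Foo f) {
        return f.function();
    }
    public static void main(String[] args) {
        long res = 0;
        // First loop: Bar has not been loaded yet, so Foo is the only known receiver type.
        for(int i = 0; i < 100000; i++) {
            res += call(new Foo(random.nextInt(150)));
        }
        // First use of Bar: this is the point at which the class gets loaded.
        new Bar(0);
        // Second loop: still only Foo instances, but Bar is now part of the hierarchy.
        for(int i = 0; i < 100000; i++) {
            res += call(new Foo(random.nextInt(150)));
        }
        System.out.println(res);
    }
}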
We can run this via:
jackson@serv cha $ java -XX:+UnlockDiagnosticVMOptions \
-XX:CompileCommand=print,CallTest.call -XX:-UseCompressedOops CallTest
1. As always, if you want to follow along you’ll need to follow these steps to be able to see assembly on your JVM
The first c2 compile of call looks like this:
# {method} {0x00007f19b8ecc4e8} 'call' '(LFoo;)I' in 'CallTest'
# parm0: rsi:rsi = 'Foo'
# [sp+0x20] (sp of caller)
mov %eax,-0x14000(%rsp)
push %rbp
sub $0x10,%rsp #*synchronization entry
# - CallTest::call@-1 (line 8)
mov 0x10(%rsi),%eax # implicit exception: dispatches to 0x00007f1d7d113a3d
inc %eax #*iadd
# - Foo::function@5 (line 9)
# - CallTest::call@1 (line 8)
add $0x10,%rsp
pop %rbp
test %eax,0x149395c4(%rip) # 0x00007f1d91a4d000
retq
Notice that it has simply inlined Foo.function. At first glance, this seems odd, as this code is not correct if passed a Bar, which is valid input. What’s going on?
The trick here is that Hotspot doesn’t know about the contents of Bar yet, because classes are loaded lazily. When it came time to compile call, it saw that no loaded subclass overrode Foo.function, so it just made the virtual call static (and subsequently inlined it). This is known as Class Hierarchy Analysis (CHA), and it is an important optimization for cleaning up Java’s virtual-by-default model.
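To make the idea concrete, here is a minimal sketch of that check as a toy model in plain Java. It is not Hotspot's implementation: the class ChaSketch and its methods load and monomorphicTarget are hypothetical names, classes are represented by strings, and abstract classes and interfaces are ignored. It only shows that the answer depends on which classes have been loaded so far.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Toy model of Class Hierarchy Analysis; every name here is hypothetical.
class ChaSketch {
    // Loaded class name -> superclass name (null for a root class).
    private final Map<String, String> superOf = new HashMap<>();
    // Loaded class name -> names of the methods it declares itself.
    private final Map<String, Set<String>> declared = new HashMap<>();

    // Simulates loading a class into the hierarchy.
    void load(String name, String superName, String... methods) {
        superOf.put(name, superName);
        declared.put(name, new HashSet<>(Arrays.asList(methods)));
    }

    // True if name is root itself or a loaded descendant of root.
    private boolean isSubclassOf(String name, String root) {
        for (String c = name; c != null; c = superOf.get(c)) {
            if (c.equals(root)) return true;
        }
        return false;
    }

    // Walks up from cls to the class whose implementation would actually run.
    private String resolve(String cls, String method) {
        for (String c = cls; c != null; c = superOf.get(c)) {
            Set<String> methods = declared.get(c);
            if (methods != null && methods.contains(method)) return c;
        }
        return null;
    }

    // Returns the one class implementing method for every loaded receiver
    // type under root, or null if more than one implementation is reachable.
    String monomorphicTarget(String root, String method) {
        Set<String> targets = new HashSet<>();
        for (String c : superOf.keySet()) {
            if (isSubclassOf(c, root)) {
                String target = resolve(c, method);
                if (target != null) targets.add(target);
            }
        }
        return targets.size() == 1 ? targets.iterator().next() : null;
    }

    public static void main(String[] args) {
        ChaSketch cha = new ChaSketch();
        cha.load("Foo", null, "function");
        System.out.println(cha.monomorphicTarget("Foo", "function")); // prints Foo
        cha.load("Bar", "Foo", "function");
        System.out.println(cha.monomorphicTarget("Foo", "function")); // prints null
    }
}
```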
This only works, however, if there is some way of invalidating the compiled method when this assumption stops being true (as happens in this example). This is handled in doCall.cpp:
// Identify possible target method and inlining style
ciMethod* Compile::optimize_inlining(ciMethod* caller, int bci,
ciInstanceKlass* klass,
ciMethod* callee,
const TypeOopPtr* receiver_type,
bool check_access) {
// only use for virtual or interface calls
(...)
ciInstanceKlass* calling_klass = caller->holder();
ciMethod* cha_monomorphic_target = callee->find_monomorphic_target(
calling_klass, klass, actual_receiver, check_access);
(...)
if (cha_monomorphic_target != NULL) {
// Hardwiring a virtual.
// If we inlined because CHA revealed only a single target method,
// then we are dependent on that target method not getting overridden
// by dynamic class loading. Be sure to test the "static" receiver
// dest_method here, as opposed to the actual receiver, which may
// falsely lead us to believe that the receiver is final or private.
dependencies()->assert_unique_concrete_method(actual_receiver,
cha_monomorphic_target);
return cha_monomorphic_target;
}
When we get to loading Bar, our compile of call is discarded due to this failed dependency. If you have a debug JVM, you can see this happening by running with -XX:+TraceDependencies:
Failed dependency of type unique_concrete_method
context = Foo
method = {method} {0x7f04a0f1fc58} 'function' '()I' in public synchronized 'Foo'
witness = Bar
code: nmethod 760 36 4 CallTest::call (5 bytes)
Marked for deoptimization
context = Foo
dependee = Bar
context supers = 2, interfaces = 0
Compiled method (c2) 760 36 4 CallTest::call (5 bytes)
(...)
Dependencies:
Dependency of type unique_concrete_method
context = Foo
method = {method} {0x7f04a0f1fc58} 'function' '()I' in public synchronized 'Foo'
[nmethod<=klass]Foo
Failed dependency of type leaf_type
context = Foo
witness = Bar
code: nmethod 760 28 2 CallTest::call (5 bytes)
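The record-then-invalidate pattern in this trace can be sketched in a few lines of Java. This is only an illustration under simplifying assumptions, not how Hotspot stores dependencies: DependencySketch, Dependency, and onClassLoad are hypothetical names, and only a direct subclass of the context class counts as a witness.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model of dependency tracking; every name here is hypothetical.
class DependencySketch {
    // One recorded assumption: "method, as declared in contextClass,
    // is not overridden by any loaded subclass".
    static final class Dependency {
        final String contextClass;
        final String method;
        final String compiledCode; // label of the code relying on the assumption
        boolean valid = true;

        Dependency(String contextClass, String method, String compiledCode) {
            this.contextClass = contextClass;
            this.method = method;
            this.compiledCode = compiledCode;
        }
    }

    private final List<Dependency> recorded = new ArrayList<>();

    // Called by the "compiler" when it hardwires a virtual call.
    void assertUniqueConcreteMethod(String contextClass, String method, String compiledCode) {
        recorded.add(new Dependency(contextClass, method, compiledCode));
    }

    // Called when a class is "loaded". Returns the labels of compiled code
    // whose assumptions the new class breaks, and marks them invalid.
    List<String> onClassLoad(String superClass, List<String> declaredMethods) {
        List<String> invalidated = new ArrayList<>();
        for (Dependency d : recorded) {
            if (d.valid && d.contextClass.equals(superClass)
                    && declaredMethods.contains(d.method)) {
                d.valid = false;
                invalidated.add(d.compiledCode);
            }
        }
        return invalidated;
    }

    public static void main(String[] args) {
        DependencySketch deps = new DependencySketch();
        deps.assertUniqueConcreteMethod("Foo", "function", "CallTest.call");
        // A class that does not extend Foo breaks nothing.
        System.out.println(deps.onClassLoad("Object", Arrays.asList("toString"))); // prints []
        // A subclass of Foo that overrides function is a witness.
        System.out.println(deps.onClassLoad("Foo", Arrays.asList("function"))); // prints [CallTest.call]
    }
}
```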
Now when Hotspot gets around to compiling call again in the second loop, it will correct this:
# {method} {0x00007f19b8ecc4e8} 'call' '(LFoo;)I' in 'CallTest'
# parm0: rsi:rsi = 'Foo'
# [sp+0x20] (sp of caller)
mov %eax,-0x14000(%rsp)
push %rbp
sub $0x10,%rsp #*synchronization entry
# - CallTest::call@-1 (line 8)
mov 0x8(%rsi),%r10 # implicit exception: dispatches to 0x00007f1d7d1144e5
movabs $0x7f19b8ecd028,%r11 # {metadata('Foo')}
cmp %r11,%r10
jne 0x00007f1d7d1144d0 #*invokevirtual function
# - CallTest::call@1 (line 8)
mov 0x10(%rsi),%eax
inc %eax #*iadd
# - Foo::function@5 (line 9)
# - CallTest::call@1 (line 8)
add $0x10,%rsp
pop %rbp
test %eax,0x14938b31(%rip) # 0x00007f1d91a4d000
retq
This version looks similar, but there is now a class guard around the inlined Foo.function. The other branch, which I’ve omitted here, jumps back to the interpreter. Note that although this works for both classes, the profiled type information has led Hotspot to only optimize this for Foo.
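Written back as Java, the guarded version behaves roughly like the sketch below. GuardSketch and guardedCall are hypothetical names, Foo and Bar are repeated as nested classes so the example is self-contained, and the ordinary virtual call on the slow path is only a stand-in for the omitted branch that leaves the compiled code.

```java
// Source-level picture of the class guard; GuardSketch and guardedCall are hypothetical names.
class GuardSketch {
    static class Foo {
        protected int x;
        Foo(int x) { this.x = x; }
        public int function() { return x + 1; }
    }

    static class Bar extends Foo {
        Bar(int x) { super(x); }
        @Override
        public int function() { return x + 2; }
    }

    static int guardedCall(Foo f) {
        // Guard: compare the receiver's exact class against Foo,
        // mirroring the cmp/jne pair in the listing.
        if (f.getClass() == Foo.class) {
            return f.x + 1; // inlined body of Foo.function
        }
        // Slow path: a plain virtual call stands in for the omitted branch.
        return f.function();
    }

    public static void main(String[] args) {
        System.out.println(guardedCall(new Foo(1))); // prints 2
        System.out.println(guardedCall(new Bar(1))); // prints 3
    }
}
```

The fast path stays as cheap as the first compile plus one comparison, while any other receiver type still gets the correct result.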
If you’re interested in learning more, I highly recommend Aleksey Shipilëv’s epic rundown of Java method dispatch, which covers this and more in detail.
(Standard self-plug: If you enjoy these sorts of random details, follow me on twitter)