Virtually free - JVM callsite optimization by example

Recently, this post by Swift creator Chris Lattner has been making the rounds; it does a nice job of detailing method dispatch in various languages. I thought I would add some color on how the HotSpot JIT can safely turn a virtual call into a static one.

Let's say we have a simple class, Foo:

public class Foo {
  protected int x;

  public Foo(int x) {
    this.x = x;
  }

  public int function() {
    return x + 1;
  }
}

And another class, Bar, which extends Foo and overrides function:

public class Bar extends Foo {
  public Bar(int x) {
    super(x);
  }
  @Override
  public int function() {
    return x + 2;
  }
}

Now we’ll create a method that calls Foo.function, and then give that method a workout:

import java.util.Random;

public class CallTest {
  static Random random = new Random();

  public static int call(Foo f) {
    return f.function();
  }

  public static void main(String[] args) {
    long res = 0;
    for(int i = 0; i < 100000; i++) {
      res += call(new Foo(random.nextInt(150)));
    }
    new Bar(0);
    for(int i = 0; i < 100000; i++) {
      res += call(new Foo(random.nextInt(150)));
    }
    System.out.println(res);
  }
}

We can run this via:

jackson@serv cha $ java -XX:+UnlockDiagnosticVMOptions \
 -XX:CompileCommand=print,CallTest.call -XX:-UseCompressedOops CallTest

(As always, if you want to follow along, you’ll need to follow these steps to be able to see assembly on your JVM.)

The first C2 compile of call looks like this:

# {method} {0x00007f19b8ecc4e8} 'call' '(LFoo;)I' in 'CallTest'
# parm0:    rsi:rsi   = 'Foo'
#           [sp+0x20]  (sp of caller)
mov    %eax,-0x14000(%rsp)
push   %rbp
sub    $0x10,%rsp         #*synchronization entry
                          # - CallTest::call@-1 (line 8)

mov    0x10(%rsi),%eax    # implicit exception: dispatches to 0x00007f1d7d113a3d
inc    %eax               #*iadd
                          # - Foo::function@5 (line 9)
                          # - CallTest::call@1 (line 8)

add    $0x10,%rsp
pop    %rbp
test   %eax,0x149395c4(%rip)        # 0x00007f1d91a4d000
retq

Notice that it has simply inlined Foo.function. At first glance, this seems odd: the code is not correct if passed a Bar, which is valid input. What’s going on?

The trick here is that HotSpot doesn’t know about the contents of Bar yet, because classes are loaded lazily. When it came time to compile call, it saw that no loaded subclass overrode Foo.function, so it made the virtual call static (and subsequently inlined it). This is known as Class Hierarchy Analysis (CHA), and it is an important optimization for cleaning up Java’s virtual-by-default model.
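The idea behind CHA can be modeled in plain Java. This is only a toy sketch, not how HotSpot implements it: the `loaded` list is a hypothetical stand-in for the VM's view of loaded classes, and `findMonomorphicTarget` is an illustrative helper that borrows its name from the HotSpot function shown later in this post. It uses reflection to ask whether every loaded class assignable to the receiver type resolves the call to the same method.

```java
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Optional;
import java.util.Set;

class ChaSketch {
  static class Foo {
    protected int x;
    Foo(int x) { this.x = x; }
    public int function() { return x + 1; }
  }

  static class Bar extends Foo {
    Bar(int x) { super(x); }
    @Override public int function() { return x + 2; }
  }

  // Hypothetical stand-in for the set of classes the VM has loaded so far.
  static final List<Class<?>> loaded = new ArrayList<>();

  // Returns the one implementation of `name` reachable through `receiver`
  // among the modeled loaded classes, or empty if there is not exactly one.
  static Optional<Method> findMonomorphicTarget(Class<?> receiver, String name) {
    Set<Method> targets = new HashSet<>();
    for (Class<?> c : loaded) {
      if (!receiver.isAssignableFrom(c)) continue;
      try {
        // getMethod resolves to the most specific public implementation,
        // so a subclass without an override maps back to Foo.function.
        targets.add(c.getMethod(name));
      } catch (NoSuchMethodException e) {
        throw new IllegalArgumentException(e);
      }
    }
    return targets.size() == 1
        ? Optional.of(targets.iterator().next())
        : Optional.empty();
  }

  public static void main(String[] args) {
    loaded.add(Foo.class);
    // Only Foo is "loaded": the call has a single possible target.
    System.out.println(findMonomorphicTarget(Foo.class, "function").isPresent());
    loaded.add(Bar.class);
    // Bar overrides function, so the target is no longer unique.
    System.out.println(findMonomorphicTarget(Foo.class, "function").isPresent());
  }
}
```

With only Foo in the modeled hierarchy the lookup finds one target, which is the situation the first compile of call was in; once Bar is added, the lookup comes back empty.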

This only works, however, if there is some way of invalidating the compiled method when this assumption no longer holds (as in this example). This is handled in doCall.cpp:

// Identify possible target method and inlining style
ciMethod* Compile::optimize_inlining(ciMethod* caller, int bci,
                                     ciInstanceKlass* klass,
                                     ciMethod* callee,
                                     const TypeOopPtr* receiver_type,
                                     bool check_access) {
  // only use for virtual or interface calls
  (...)
  ciInstanceKlass*   calling_klass = caller->holder();
  ciMethod* cha_monomorphic_target = callee->find_monomorphic_target(
    calling_klass, klass, actual_receiver, check_access);
  (...)
  if (cha_monomorphic_target != NULL) {
    // Hardwiring a virtual.
    // If we inlined because CHA revealed only a single target method,
    // then we are dependent on that target method not getting overridden
    // by dynamic class loading.  Be sure to test the "static" receiver
    // dest_method here, as opposed to the actual receiver, which may
    // falsely lead us to believe that the receiver is final or private.
    dependencies()->assert_unique_concrete_method(actual_receiver,
      cha_monomorphic_target);
    return cha_monomorphic_target;
  }

When we get to loading Bar, our compile of call is discarded because this dependency fails. If you have a debug JVM, you can see this happening by running with -XX:+TraceDependencies:

Failed dependency of type unique_concrete_method
  context = Foo
  method  = {method} {0x7f04a0f1fc58} 'function' '()I' in public synchronized 'Foo'
  witness = Bar
  code: nmethod    760   36       4       CallTest::call (5 bytes)

Marked for deoptimization
  context = Foo
  dependee = Bar
  context supers = 2, interfaces = 0
Compiled method (c2)     760   36       4       CallTest::call (5 bytes)
 (...)
Dependencies:
Dependency of type unique_concrete_method
  context = Foo
  method  = {method} {0x7f04a0f1fc58} 'function' '()I' in public synchronized 'Foo'
   [nmethod<=klass]Foo
Failed dependency of type leaf_type
  context = Foo
  witness = Bar
  code: nmethod    760   28       2       CallTest::call (5 bytes)
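The bookkeeping behind that log can be sketched as a toy model in Java. Everything here is hypothetical and for illustration only: `CompiledCode`, `UniqueConcreteMethod`, and `onClassLoad` are made-up names, not HotSpot APIs, and Baz is an extra subclass added to show a class load that does not break the assumption. The point is just the shape of the mechanism: compiled code records an assumption, and each class load is checked against the recorded assumptions.

```java
import java.util.ArrayList;
import java.util.List;

class DependencySketch {
  static class Foo { public int function() { return 1; } }
  static class Baz extends Foo { }  // hypothetical subclass with no override
  static class Bar extends Foo {
    @Override public int function() { return 2; }
  }

  // Hypothetical stand-in for a compiled method that can be thrown away.
  static class CompiledCode {
    final String name;
    boolean valid = true;
    CompiledCode(String name) { this.name = name; }
  }

  // Assumption: `context` has exactly one concrete implementation of `method`.
  static class UniqueConcreteMethod {
    final Class<?> context;
    final String method;
    final CompiledCode dependent;

    UniqueConcreteMethod(Class<?> context, String method, CompiledCode dependent) {
      this.context = context;
      this.method = method;
      this.dependent = dependent;
    }

    // A new class is a witness against the assumption if it is a strict
    // subclass of the context and declares its own version of the method.
    boolean violatedBy(Class<?> newClass) {
      if (newClass == context || !context.isAssignableFrom(newClass)) return false;
      try {
        newClass.getDeclaredMethod(method);
        return true;
      } catch (NoSuchMethodException e) {
        return false;
      }
    }
  }

  static final List<UniqueConcreteMethod> dependencies = new ArrayList<>();

  // Modeled class-load hook: invalidate code whose assumption just broke.
  static void onClassLoad(Class<?> newClass) {
    for (UniqueConcreteMethod d : dependencies) {
      if (d.dependent.valid && d.violatedBy(newClass)) {
        d.dependent.valid = false;
        System.out.println("invalidated " + d.dependent.name
            + ", witness = " + newClass.getSimpleName());
      }
    }
  }

  public static void main(String[] args) {
    CompiledCode call = new CompiledCode("CallTest::call");
    dependencies.add(new UniqueConcreteMethod(Foo.class, "function", call));
    onClassLoad(Baz.class);  // no override, so the compiled code survives
    System.out.println(call.valid);
    onClassLoad(Bar.class);  // overrides function, so the code is invalidated
    System.out.println(call.valid);
  }
}
```

Loading Baz leaves the modeled compile alone, while loading Bar plays the role of the witness in the trace above.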

Now when HotSpot gets around to compiling call again in the second loop, it corrects this:

# {method} {0x00007f19b8ecc4e8} 'call' '(LFoo;)I' in 'CallTest'
# parm0:    rsi:rsi   = 'Foo'
#           [sp+0x20]  (sp of caller)
mov    %eax,-0x14000(%rsp)
push   %rbp
sub    $0x10,%rsp         #*synchronization entry
                          # - CallTest::call@-1 (line 8)
mov    0x8(%rsi),%r10     # implicit exception: dispatches to 0x00007f1d7d1144e5
  
movabs $0x7f19b8ecd028,%r11  #   {metadata('Foo')}
cmp    %r11,%r10
jne    0x00007f1d7d1144d0 #*invokevirtual function
                          # - CallTest::call@1 (line 8)

mov    0x10(%rsi),%eax
inc    %eax               #*iadd
                          # - Foo::function@5 (line 9)
                          # - CallTest::call@1 (line 8)

add    $0x10,%rsp
pop    %rbp
test   %eax,0x14938b31(%rip)        # 0x00007f1d91a4d000
retq

This version looks similar, but there is now a class guard around the inlining of Foo.function. The other branch, which I’ve omitted here, jumps back to the interpreter. Note that although this works for both classes, the profiled type information has led HotSpot to only optimize this for Foo.
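In Java terms, the two compiles behave roughly like the two methods below. This is a hand-written approximation rather than anything the JIT emits: `callUnguarded` and `callGuarded` are hypothetical names, and the slow path is written as an ordinary virtual call, whereas the real compiled code bails out to the interpreter instead.

```java
class GuardSketch {
  static class Foo {
    protected int x;
    Foo(int x) { this.x = x; }
    public int function() { return x + 1; }
  }

  static class Bar extends Foo {
    Bar(int x) { super(x); }
    @Override public int function() { return x + 2; }
  }

  // Roughly the first compile: Foo.function inlined with no type check.
  // Only safe while no loaded class overrides function.
  static int callUnguarded(Foo f) {
    return f.x + 1;
  }

  // Roughly the second compile: the inlined body sits behind a class guard.
  static int callGuarded(Foo f) {
    if (f.getClass() == Foo.class) {
      return f.x + 1;     // fast path: inlined Foo.function
    }
    return f.function();  // stand-in for the slow path back to the interpreter
  }

  public static void main(String[] args) {
    System.out.println(callUnguarded(new Bar(1)));  // prints 2, wrong for a Bar
    System.out.println(callGuarded(new Foo(1)));    // prints 2
    System.out.println(callGuarded(new Bar(1)));    // prints 3
  }
}
```

The guard costs a load and a compare on the fast path, which matches the extra mov/cmp/jne sequence in the second listing.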

If you’re interested in learning more, I highly recommend Aleksey Shipilëv’s epic rundown of Java method dispatch, which covers this and more in detail.

(Standard self-plug: If you enjoy these sorts of random details, follow me on Twitter.)