How the JVM compares your strings using the craziest x86 instruction you've never heard of
01 Sep 2016We’ve all probably seen Java’s String comparison function before. It compares strings by the first differing character, falling back to the length difference when they are identical up to the end of the shorter string:
But did you know there is also a secret second implementation? String.compareTo
is one of a few methods that is important enough to also get a special hand-rolled assembly version. On my machine, it something like this:
The code that generates this, MacroAssembler::string_compare
in macroAssembler_x86.cpp is well-documented for the curious. Its worth noting that there is an even fancier version for modern systems using AVX2 (with its 256bit vectorized registers) that I’m not going to cover here.
PCMPESTRIwhat?
Introduced in SSE4.2, pcmpestri
is a member of the pcmpxstrx
family of vectorized string comparison instructions. With a control byte to specify options for their complex functionality, they are complicated enough to get their own subsection in the x86 ISR. Intel even provides a flow diagram for our viewing pleasure:
Now that’s really putting the C in CISC!
The option bits for the control byte are specified as follows:
-------0b 128-bit sources treated as 16 packed bytes.
-------1b 128-bit sources treated as 8 packed words.
------0-b Packed bytes/words are unsigned.
------1-b Packed bytes/words are signed.
----00--b Mode is equal any.
----01--b Mode is ranges.
----10--b Mode is equal each.
----11--b Mode is equal ordered.
---0----b IntRes1 is unmodified.
---1----b IntRes1 is negated (1’s complement).
--0-----b Negation of IntRes1 is for all 16 (8) bits.
--1-----b Negation of IntRes1 is masked by reg/mem validity.
-0------b Index of the least significant, set, bit is used
(regardless of corresponding input element validity).
IntRes2 is returned in least significant bits of XMM0.
-1------b Index of the most significant, set, bit is used
(regardless of corresponding input element validity).
Each bit of IntRes2 is expanded to byte/word.
0-------b This bit currently has no defined effect, should be 0.
1-------b This bit currently has no defined effect, should be 0.
1. If you want to learn more, Section 4.1 of the Instruction Set Reference covers these options in detail.
compareTo
uses 0x19
, which means doing the “equal each” (aka string comparison) operation across 8 unsigned words (thanks UTF-16!) with a negated result. This monster of an instruction takes in 4 registers of input: the 2 strings themselves as parameters, plus their lengths in %rax
and %rdx
(‘e’ meaning explicit length - pcmpistri & pcmpistrm instead look for terminating nulls). The result (the index generated from IntRes2) is placed in %ecx
. And just in case that wasn’t enough, pcmpxstrx
also reappropriate flags as well:
CFlag – Reset if IntRes2 is equal to zero, set otherwise
ZFlag – Set if absolute-value of EDX is < 16 (8), reset otherwise
SFlag – Set if absolute-value of EAX is < 16 (8), reset otherwise
OFlag – IntRes2[0]
AFlag – Reset
PFlag – Reset
With all of out of our way, lets look at the main loop in detail with some setup before it for context:
Going in, %rax%
is the minimum of the strings’ lengths, and %rdx
is that minimum masked by ~0x7
(so 8x the maximum number of iterations). It then bumps the pointers in the character arrays (%rsi
and %rdi
) by that many characters and then negates %rax
, so the indexing into the array in the main loop is actually backwards. After loading 8 characters of the first string into %xmm0
, it then does the comprison against 8 characters of the second, jumping out if CFlag
is set (which means the index of the differing character is in %ecx
), and then adjusts the 2 length registers and checks to see if this was the last iteration (which would make %rdx
0). How does a negative number make a valid length? Oops, almost forgot to mention that pcmpestri
actually considers the lengths to be the absolute value:
The length of each input is interpreted as being the absolute-value of the value in
the length register.
Following the main loop, there is a fallthrough case to check the remaining characters when the minimum length isn’t a multiple of 8, and then the final case of diffing the lengths when the strings are identical up the shortest’s length. Phew!
More matching fun
If this wasn’t complicated enough for you, have a quick gander at the indexOf implementations (there are 2, depending on the size of the matching string), which use control byte 0x0d
, which does “equal ordered” (aka substring) matching.
As always, if you are crazy enough to find wierd JVM internals interesting you should totally follow me on twitter