We have run performance tests to evaluate the impact of introducing reflective capabilities into a Java interpreter. Like the few other papers in the literature on reflection that provide performance data, we have preferred to evaluate the overhead of reflection on each particular operation, instead of running standard benchmarks. In fact, there are no standard benchmarks for evaluating the impact of reflection. Existing general-purpose benchmarks usually focus either on the optimization of complex control-flow patterns, which would not be affected by the introduction of interception of object operations, or on calculations over large arrays, which would incur a huge overhead.
Table 1: Platforms on which the performance tests were run.
Our tests have been performed on four different platforms, listed in Table 1. On the Solaris platforms, the tests were run in real-time scheduling mode, so as to ensure that no other processes would affect the measured times. On the GNU/Linux platforms, this scheduling mechanism was not available, so we just ensured that the tested hosts were as lightly loaded as possible.
On each host, we have run the same Java program, compiled with Sun JDK's Java compiler, without optimization, to prevent method inlining. The produced bytecodes were executed by different interpreters under different configurations.
We have used Guaraná 1.4.1 and the snapshot of Kaffe 1.0.b1 distributed with it, using both the JIT compiler and the interpreter engines. Kaffe and Guaraná were compiled with EGCS 1.1b, with default optimization levels. The program used to perform the tests was the one distributed with Guaraná 1.4.1.
Operation | Description |
emptyloop | No reflective operation. |
synchronized | Empty block synchronized on an arbitrary object. |
invokestatic | Invoke an empty static method that takes no arguments and returns void. |
invokespecial | Invoke a non-static private do-nothing method that returns void and takes only the implicit this as argument. The same bytecode is used to invoke constructors and, in some cases, final methods. |
invokevirtual | Invoke an empty method that takes only the implicit this as argument, and returns void. Dynamic binding, performed with a dispatch table, occurs before the interception test. |
invokeinterface | Invoke the same method, but through an object reference of interface type. Dynamic binding is much slower in this case. |
getstatic | Load a static int field into a variable. |
putstatic | Store a zero-valued variable in a static int field. |
getfield | Load a non-static int field into a variable. |
putfield | Store a zero-valued variable in a non-static int field. |
arraylength | Load the length of an array of int into a variable. |
iaload | Load the first element of an array of int into a variable. |
iastore | Store a zero-initialized variable in the first element of an array of int. |
println | Print the line "Hello world!" to System.err, which was redirected to /dev/null before starting the Virtual Machine. It is a first attempt to estimate the overall impact of introducing interception abilities. |
compile | Compile the test program itself. Section 5.1 contains a detailed description and analysis. |
Table 2: Operation(s) performed within a loop in our performance tests.
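To make the operations concrete, the following sketch shows what the loop bodies for a few of them might look like in Java source. It is not the test program distributed with Guaraná: the class, interface, and method names are hypothetical, and which bytecode a given compiler emits for each statement, or whether a modern JIT removes the loop altogether, is an assumption to be checked with a disassembler.

```java
// Illustrative loop bodies only; all names here are hypothetical.
final class TimedOperations {
    /** Hypothetical interface, used to reach the method through an interface reference. */
    interface Op { void run(); }

    /** Hypothetical receiver with one int field and one empty method. */
    static final class Target implements Op {
        int field;
        public void run() { }          // empty method taking only the implicit this
    }

    static void emptyStatic() { }      // empty static method, no arguments, returns void

    static void emptyloop(long n) {
        for (long i = 0; i < n; i++) { }                          // loop overhead only
    }

    static void synchronizedBlock(long n, Object lock) {
        for (long i = 0; i < n; i++) { synchronized (lock) { } }  // empty synchronized block
    }

    static void invokestatic(long n) {
        for (long i = 0; i < n; i++) { emptyStatic(); }
    }

    static void invokevirtual(long n, Target t) {
        for (long i = 0; i < n; i++) { t.run(); }                 // call through the class type
    }

    static void invokeinterface(long n, Op o) {
        for (long i = 0; i < n; i++) { o.run(); }                 // same method, interface type
    }

    static int getfield(long n, Target t) {
        int v = 0;
        for (long i = 0; i < n; i++) { v = t.field; }             // load a non-static int field
        return v;
    }

    static void putfield(long n, Target t) {
        int zero = 0;
        for (long i = 0; i < n; i++) { t.field = zero; }          // store a zero-valued variable
    }

    static int iaload(long n, int[] a) {
        int v = 0;
        for (long i = 0; i < n; i++) { v = a[0]; }                // load the first element
        return v;
    }

    static void iastore(long n, int[] a) {
        int zero = 0;
        for (long i = 0; i < n; i++) { a[0] = zero; }             // store into the first element
    }
}
```

Each method takes the iteration count as a parameter, so the same body can be run once before timing and then repeatedly inside the timed loop.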
For each configuration, we have timed several different operations, described in Table 2. Each operation was timed by running it repeatedly inside a loop, after running it once outside the loop, before starting the timer. This ensures that, before the loop starts, any JIT compilation has already taken place, all the data and code have been brought into the cache and, unless the test involves object allocation, the garbage collector will not run.
This inner loop is run repeatedly, with the iteration count being adjusted at every outer iteration, aiming at a running time longer than 1 second. Since the operations that read the clock at the beginning and at the end of each inner loop take less than 1 microsecond to run, and the clock resolution is 1 millisecond, a total running time of 1 second is enough to eliminate any effects they might have on the outcome of the tests: the clock granularity then accounts for roughly 0.1% of the measured time at most, and the cost of reading the clock for far less.
The inner-loop iteration count starts at 1, and is repeatedly multiplied by 10 until the elapsed time is large enough to be measurable at the clock resolution. As soon as this happens, the elapsed time and the iteration count start to be used to estimate the running time of an iteration. If the total elapsed time of an execution of the inner loop is longer than one second, the estimate is the final result of the test. Otherwise, it is used to compute the iteration count for the next execution of the inner loop, aiming at a total execution time of 1100 milliseconds.
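The calibration procedure just described can be sketched as follows. This is a minimal reconstruction from the description, not the code of the actual test program: the names `AdaptiveTimer`, `Clock`, and `nanosPerIteration` are hypothetical, "measurable" is assumed to mean that at least one clock tick elapsed, and the guard that forces the iteration count to grow is an addition for safety.

```java
import java.util.function.LongConsumer;

// Sketch of the adaptive iteration-count scheme; all names are hypothetical.
final class AdaptiveTimer {
    static final long FINAL_THRESHOLD_MS = 1000; // a run longer than this yields the result
    static final long TARGET_MS = 1100;          // running time aimed at for the next run

    /** Millisecond clock, abstracted so that the sketch can be driven by a simulated clock. */
    interface Clock { long millis(); }

    /**
     * body.accept(n) is assumed to execute the operation under test n times in a loop.
     * Returns the estimated time of one iteration, in nanoseconds.
     */
    static double nanosPerIteration(LongConsumer body, Clock clock) {
        body.accept(1);                       // run once outside the timed region (warm-up)
        long n = 1;
        while (true) {
            long start = clock.millis();
            body.accept(n);
            long elapsed = clock.millis() - start;
            if (elapsed <= 0) {               // assumption: below the clock resolution
                n *= 10;
                continue;
            }
            double msPerIteration = (double) elapsed / n;
            if (elapsed > FINAL_THRESHOLD_MS) {
                return msPerIteration * 1e6;  // milliseconds to nanoseconds
            }
            // Aim the next run at TARGET_MS; the max() only guarantees progress.
            n = Math.max(n + 1, (long) Math.ceil(TARGET_MS / msPerIteration));
        }
    }
}
```

With a real clock, a call might look like `AdaptiveTimer.nanosPerIteration(n -> { for (long i = 0; i < n; i++) { } }, System::currentTimeMillis)`. Aiming at 1100 milliseconds rather than exactly 1000 leaves a margin, so that a slightly optimistic estimate still produces a run longer than one second.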
With the exception of the println and compile tests, this mechanism selected an iteration count between 50,000 and 100,000,000 for the final execution of the inner loop of each test. In the case of println, the iteration count was never smaller than 500. The compile test was run stand-alone, not within this framework.
Each test case was run 50 times on each configuration and platform, and the average times of the runs were used to compute the relative overheads presented in Table 3 and Table 4. Although we have introduced the ability to intercept operations, no actual interception took place during those tests.
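As an illustration of how averaged run times can be turned into a relative overhead, the sketch below assumes that the overhead is expressed as the percentage increase of the mean time on the reflective configuration over the mean time on the baseline configuration. That definition, and the names `RelativeOverhead`, `mean`, and `overheadPercent`, are assumptions made for this example; they are not taken from the test program.

```java
import java.util.Arrays;

// Sketch only: assumes overhead = (reflective mean - baseline mean) / baseline mean.
final class RelativeOverhead {
    /** Arithmetic mean of the per-run times; NaN if there are no runs. */
    static double mean(double[] runs) {
        return Arrays.stream(runs).average().orElse(Double.NaN);
    }

    /** Percentage increase of the reflective mean over the baseline mean. */
    static double overheadPercent(double[] baselineRuns, double[] reflectiveRuns) {
        double base = mean(baselineRuns);
        return 100.0 * (mean(reflectiveRuns) - base) / base;
    }
}
```

Under this definition, a result of 0 means the reflective configuration is as fast as the baseline, and 100 means it takes twice as long.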
No interception occurs in these tests; they measure only the overhead that the ability to intercept operations imposes on the interpreter.
No interception occurs in these tests; they measure only the overhead that the ability to intercept operations imposes on the JIT compiler and on the code it produces.