Boosting Java Performance: Native Code & JIT Compilers

by
Hank Shiffman
Strategic Technologist
Silicon Graphics, Inc.
September, 1996

Many of the virtues attributed to Java come not from the language but from the environment in which it runs. To be sure, Java is a nice language. It has features that make it easier and more convenient to write than C++ and more considerate of machine resources than languages like Lisp or Smalltalk. But the world's excitement hasn't come from the invention of yet another programming language that trades off efficiency for programmer productivity.

The proper source of excitement is the Java Virtual Machine. At its core the JVM is an interpreter that examines Java byte codes and performs the requested actions. The concept is much the same as a BASIC or APL interpreter. The difference is in the interpreted language. Traditional interpreters do a simple translation from source code into something that's faster to work with but still close to the original. Java byte code is closer to a real machine language, with many byte code instructions for each line of source code. Unlike BASIC and APL, Java code must be compiled before it can be run.
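To make the interpreter idea concrete, here is a minimal sketch of the fetch-and-dispatch loop at the heart of a stack-based interpreter. The four opcodes are hypothetical and far simpler than real JVM byte codes; the point is the shape of the loop, and how a single source expression such as (2 + 3) * 7 becomes several instructions.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// A toy stack-machine interpreter. The opcodes are hypothetical, not real JVM
// byte codes, but the loop has the same shape: fetch an opcode, dispatch on it,
// and manipulate an operand stack.
final class ToyInterpreter {
    static final byte PUSH = 0; // followed by one signed operand byte
    static final byte ADD  = 1; // pop two values, push their sum
    static final byte MUL  = 2; // pop two values, push their product
    static final byte HALT = 3; // stop and return the top of the stack

    static int run(byte[] code) {
        Deque<Integer> stack = new ArrayDeque<>();
        int pc = 0; // program counter: index of the next byte to fetch
        while (true) {
            byte op = code[pc++];           // fetch
            switch (op) {                   // decode and dispatch
                case PUSH: stack.push((int) code[pc++]); break;
                case ADD:  stack.push(stack.pop() + stack.pop()); break;
                case MUL:  stack.push(stack.pop() * stack.pop()); break;
                case HALT: return stack.pop();
                default: throw new IllegalStateException("bad opcode " + op);
            }
        }
    }

    public static void main(String[] args) {
        // (2 + 3) * 7: one line of source, six instructions
        byte[] code = { PUSH, 2, PUSH, 3, ADD, PUSH, 7, MUL, HALT };
        System.out.println(run(code)); // prints 35
    }
}
```

Every instruction pays for the fetch and the switch before doing any useful work, which is where much of the interpreter's overhead comes from.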

The Java Interpreter: The Good, The Bad & The Ugly

Not having to compile is one of the great virtues of interpreted languages. It gives us fast turnaround during development at the expense of runtime performance. Since Java doesn't have this advantage (we still have to compile just as with C), why the interpreter? Using byte code instead of the system's native code gives Java three advantages:

Portability: Each kind of computer has its own instruction set. While some processors include the instructions for their predecessors, it's generally true that a program that runs on one kind of computer won't run on any other. Add in the services provided by the operating system, which each system describes in its own unique way, and you have a compatibility problem. In general, you can't write and compile a program for one kind of system and run it on any other without a lot of work. Java gets around this limitation by inserting its virtual machine between the application and the real environment (computer + operating system). If an application is compiled to Java byte code and that byte code is interpreted the same way in every environment, then you can write a single program that will work on all the different platforms where Java is supported. (That's the theory, anyway. In practice there are always small incompatibilities lying in wait for the programmer.)
Security: One of Java's virtues is its integration into the Web. Load a web page that uses Java into your browser and the Java code is automatically downloaded and executed. But what if the code destroys files, whether through malice or sloppiness on the programmer's part? Java prevents downloaded applets from doing anything destructive by disallowing potentially dangerous operations. Before it allows the code to run it examines it for attempts to bypass security. It verifies that data is used consistently: code that manipulates a data item as an integer at one stage and then tries to use it as a pointer later will be caught and prevented from executing. (The Java language doesn't allow pointer arithmetic, so you can't write Java code to do what we just described. However, there is nothing to prevent someone from writing destructive byte code themselves using a hexadecimal editor or even building a Java byte code assembler.) It generally isn't possible to analyze a program's machine code before execution and determine whether it does anything bad. Tricks like writing self-modifying code mean that the evil operations may not even exist until later. But Java byte code was designed for this kind of validation: it doesn't have the instructions a malicious programmer would use to hide their assault.
Size: In the microprocessor world RISC is generally preferable to CISC. It's better to have a small instruction set and use many fast instructions to do a job than to have many complex operations implemented as single instructions. RISC designs require fewer gates on the chip to implement their instructions, leaving more room for pipelines and other techniques that make each instruction faster. In an interpreter, however, none of this matters. If you want to implement a single instruction for the switch statement, with a length that varies with the number of case clauses, there's no reason not to do so. In fact, a complex instruction set is an advantage for a web-based language: the same program will be smaller (fewer instructions of greater complexity), which means less time to transfer it across our speed-limited network.
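The consistency check described under Security can be sketched as an abstract interpretation of the operand stack: instead of values, the verifier tracks the type of each stack slot and rejects any instruction that would treat an integer as an object reference. This minimal sketch handles straight-line code only, and its four opcodes are hypothetical. The real JVM verifier also handles branches, checking that the types reaching an instruction along different paths agree.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// A sketch of byte code verification for straight-line code: simulate the
// operand stack using *types* instead of values, and reject any instruction
// that would use an int as an object reference. Opcodes are hypothetical.
final class ToyVerifier {
    enum Type { INT, REF }
    enum Op { PUSH_INT, LOAD_REF, IADD, GET_FIELD }

    static boolean verify(Op[] code) {
        Deque<Type> types = new ArrayDeque<>();
        for (Op op : code) {
            switch (op) {
                case PUSH_INT: types.push(Type.INT); break;
                case LOAD_REF: types.push(Type.REF); break;
                case IADD:
                    // needs two ints, produces one int
                    if (!popIs(types, Type.INT) || !popIs(types, Type.INT)) return false;
                    types.push(Type.INT);
                    break;
                case GET_FIELD:
                    // needs an object reference, produces an int field value
                    if (!popIs(types, Type.REF)) return false;
                    types.push(Type.INT);
                    break;
            }
        }
        return true;
    }

    // Pops the top type and checks it; an empty stack (underflow) also fails.
    private static boolean popIs(Deque<Type> types, Type expected) {
        return !types.isEmpty() && types.pop() == expected;
    }

    public static void main(String[] args) {
        Op[] ok  = { Op.LOAD_REF, Op.GET_FIELD, Op.PUSH_INT, Op.IADD };
        Op[] bad = { Op.PUSH_INT, Op.PUSH_INT, Op.IADD, Op.GET_FIELD }; // int used as a pointer
        System.out.println(verify(ok));  // true
        System.out.println(verify(bad)); // false
    }
}
```

Nothing is executed: the check runs before the code does, which is exactly what machine code's self-modifying tricks would defeat.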

There is of course one big disadvantage of interpreters: their performance. That extra layer between the application and the hardware and operating system consumes a large share of the system's performance. How much we lose varies considerably from application to application, and it's even possible (although very rare) for interpreted Java to run faster than compiled C++. A few examples will give us a reasonable idea of what to expect. Here are timings for iterative and recursive versions of a program that computes Fibonacci numbers, as well as an implementation of the Sieve of Eratosthenes, which finds prime numbers. Timings are in seconds, with the C++ versions compiled at the -O1 level of optimization:

Program	Java VM (s)	C++ Native (s)	Ratio (VM / C++)
fib_loop (64,.1M)	25.1	1.1	22.82x
sieve (100K)	25.0	4.0	6.25x
fib_rec (32)	18.2	1.0	18.20x
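The article doesn't include the benchmark sources. As a rough guide to the kind of code being timed, here are Java sketches of the three kernels. The parameters are our reading of the table labels and are assumptions: fib_loop (64,.1M) is taken to mean computing the first 64 Fibonacci numbers 100,000 times, sieve (100K) to mean counting the primes below 100,000, and fib_rec (32) to mean computing fib(32) recursively. A current JVM has its own JIT compiler, so timings from this code won't resemble the 1996 figures.

```java
// Hypothetical reconstructions of the three benchmark kernels. The article does
// not publish its sources, so the parameters below are assumptions.
final class Kernels {
    // fib_loop: compute the first n Fibonacci numbers iteratively, 'reps' times.
    static long fibLoop(int n, int reps) {
        long last = 0;
        for (int r = 0; r < reps; r++) {
            long a = 0, b = 1;
            for (int i = 0; i < n; i++) {
                long t = a + b;
                a = b;
                b = t;
            }
            last = a; // a now holds fib(n)
        }
        return last;
    }

    // sieve: count the primes below 'limit' with the Sieve of Eratosthenes.
    static int sieve(int limit) {
        boolean[] composite = new boolean[limit];
        int count = 0;
        for (int i = 2; i < limit; i++) {
            if (composite[i]) continue;
            count++;
            // start at i*i (as a long, to avoid int overflow) and strike out multiples
            for (long j = (long) i * i; j < limit; j += i) composite[(int) j] = true;
        }
        return count;
    }

    // fib_rec: naive recursion, so the time is dominated by method-call overhead.
    static int fibRec(int n) {
        return n < 2 ? n : fibRec(n - 1) + fibRec(n - 2);
    }

    public static void main(String[] args) {
        long t0 = System.nanoTime();
        long f = fibLoop(64, 100_000);
        long t1 = System.nanoTime();
        int primes = sieve(100_000);
        long t2 = System.nanoTime();
        int r = fibRec(32);
        long t3 = System.nanoTime();
        System.out.printf("fib_loop: fib(64) = %d (%.3f s)%n", f, (t1 - t0) / 1e9);
        System.out.printf("sieve:    %d primes (%.3f s)%n", primes, (t2 - t1) / 1e9);
        System.out.printf("fib_rec:  fib(32) = %d (%.3f s)%n", r, (t3 - t2) / 1e9);
    }
}
```

The results themselves are fixed (fib(64) = 10610209857723, 9592 primes below 100,000, fib(32) = 2178309); only the times vary.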

Native Code Translation: Getting The Speed Boost

Clearly, Java's value is going to be severely limited if we can expect to lose anywhere from 85 to 95 percent of our performance. The obvious solution is to replace interpreted byte code with native machine code. Native Java code should get us a lot closer to C++ performance. We'll still have the advantages of the Java language, although we lose the portability, security and code size benefits of byte code. Cosmo Code 2.0 includes a native code translator for Java. javat takes a Java .class file (the file that contains the byte code that implements the methods of a class), reads and converts the byte codes into native MIPS instructions and stores the translated code back into the .class file along with the byte code. If we run a Java application on an IRIX system using the standalone interpreter (e.g. java my_application) it will load and execute the native code instead of the byte code. Any other system attempting to run the program will use the byte code, so we still have portability at the expense of performance.
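For reference, the 85-to-95-percent figure comes from the ratios in the first table. If interpreted code takes r times as long as native code, it runs at 1/r of native speed and so loses a fraction 1 - 1/r of the performance:

```latex
\[
1 - \frac{1}{6.25} = 0.84 \;(\text{sieve}), \qquad
1 - \frac{1}{18.20} \approx 0.945 \;(\text{fib\_rec}), \qquad
1 - \frac{1}{22.82} \approx 0.956 \;(\text{fib\_loop})
\]
```

That is, roughly 84 to 96 percent for these three programs.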

How much difference does this make? Here are execution times for the same applications, this time with the addition of timings for javat-translated code. Notice how much closer we can get to our C++ times:

Program	Java VM (s)	Java Native (s)	C++ Native (s)	Ratio (Native / C++)
fib_loop (64,.1M)	25.1	1.2	1.1	1.09x
sieve (100K)	25.0	5.0	4.0	1.25x
fib_rec (32)	18.2	1.5	1.0	1.50x

Java now looks a lot more competitive for a lot more applications. Of course we can only use this approach for Java applications, not the applets that make Java so exciting for web page authors. Native code is bigger than byte code, which would mean even longer download times. (And for those of us with modems the download times are already too long.) And native code makes validation impossible or at least impractical, which means we give up any hope of being protected against intentional or accidental attack on our systems by misbehaving applets. As positive as the development of native translators may be for Java applications, they don't apply to the web.

Or do they? What if we could stay with byte code for all of its advantages, converting the byte code to native code in our browser just before we start execution? We would maintain the byte code on our web servers, transfer it to our browsers that way (getting the benefit of its smaller size and its platform independence), validate it in the browser (getting all the security benefits) and then quickly convert the byte codes to their native equivalents. The native code will execute much faster, giving us the best of both worlds.

This is just-in-time translation. A JIT translator (or compiler) needs to be embedded in our web browser or standalone Java interpreter (java for applications or appletviewer for applets). Once the code for a class has passed security validation it can be handed to the JIT translator. The translator needs to be very fast, since we don't want to make loading and starting the Java code any slower than it already is. This concern with startup performance means that we can't do the kind of time-consuming optimizations that are generally available in C++ or FORTRAN compilers. We have to generate the fastest code we can in the least time.

The Translation Process: What Happens When?

With most languages, source files are compiled to create object code files. These object files are linked together to create an executable program file. During linking, each reference in one file to a procedure or data item defined in another file is converted to the target's actual location. Java uses an approach called late binding, in which the individual class files (the Java equivalent of object files) don't have their external references resolved until we actually try to use them.

Imagine an applet on a web page. The page contains an <APPLET> tag which identifies the name of the class to load and run, the location of the class file (if it isn't in the same directory as the page) and the size of the region on the page which will be the applet's canvas. This single class file is transferred to the browser, which validates it according to the security rules. Assuming it passes muster, the class is initialized. A table is created which maps each method signature (the class name, method name and argument list) to the location of the method's code. Each call to a method of another class becomes a call to a global method resolver. The resolver's job is to find the desired method in the table and then replace the call to the resolver with a call to the method itself.
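A resolver that patches its own call sites can be sketched in Java with a mutable function reference per call site. Everything here is hypothetical: the CallSite class, the method table keyed by a signature string, and the name MathUtil.square stand in for structures the article describes only in prose. The first call through a site goes to the resolver, which looks the method up and overwrites the site; later calls go straight to the target.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.IntUnaryOperator;

// A sketch of lazy method resolution with call-site patching. Each call site
// starts out pointing at a resolver stub; the first call looks the target up by
// signature and overwrites the call site so later calls go direct.
// The table layout and signature strings are hypothetical.
final class LazyResolution {
    // Method table: signature -> code, filled in when a class is initialized.
    static final Map<String, IntUnaryOperator> methodTable = new HashMap<>();
    static int resolverCalls = 0;

    static final class CallSite {
        final String signature;
        IntUnaryOperator target; // initially the resolver stub

        CallSite(String signature) {
            this.signature = signature;
            this.target = this::resolveAndCall;
        }

        int invoke(int arg) { return target.applyAsInt(arg); }

        private int resolveAndCall(int arg) {
            resolverCalls++;
            IntUnaryOperator method = methodTable.get(signature);
            if (method == null) throw new NoSuchMethodError(signature);
            target = method;               // patch: later calls skip the resolver
            return method.applyAsInt(arg);
        }
    }

    public static void main(String[] args) {
        methodTable.put("MathUtil.square(I)I", x -> x * x);
        CallSite site = new CallSite("MathUtil.square(I)I");
        int sum = 0;
        for (int i = 1; i <= 3; i++) sum += site.invoke(i);
        System.out.println(sum);           // 14
        System.out.println(resolverCalls); // 1: only the first call went through the resolver
    }
}
```

In translated machine code the "patch" is a rewritten call instruction rather than a field assignment, which is why it interacts with the instruction cache.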

Execution begins with the invocation of the class's first method, generally the init or start method. This method executes some code, perhaps calling other methods from this class. Eventually we try to create an object of another class or invoke a static method like Math.pow(). The class loader looks for the class file in the directories on CLASSPATH. Failing that, it tries the directory from which the initial class was loaded. Assuming that the class exists in one of these places, it is loaded, validated and initialized. The method resolver points the calling method to the one it wants and lets execution proceed. In this way each class in a program is loaded when it is first used. The goal is to reduce startup time for applets that have to load a large number of classes over the Web. We get faster startup at the expense of delays during execution. We may also fail to discover missing class files. Since classes aren't loaded until they're first used, we won't detect a missing file until we try to access it. It is vital that our testing exercise every part of the code, something we should be doing anyway.
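Deferred loading can still be observed on a current JVM. The sketch below relies on the Java Language Specification's rule that a class is initialized (its static initializer run) just before its first active use, such as the first call to one of its static methods. Strictly, it demonstrates lazy initialization, since a JVM is allowed to read the class file earlier. It also shows that a missing class goes unnoticed until something asks for it; the name com.example.NoSuchClass is made up and assumed not to exist.

```java
import java.util.ArrayList;
import java.util.List;

final class LazyLoadingDemo {
    static final List<String> log = new ArrayList<>();

    static final class Helper {
        // Runs once, immediately before Helper's first active use.
        static { log.add("Helper initialized"); }

        static int twice(int x) { return 2 * x; }
    }

    public static void main(String[] args) {
        System.out.println(log);              // [] : Helper has not been initialized yet
        System.out.println(Helper.twice(21)); // first use triggers initialization; prints 42
        System.out.println(log);              // [Helper initialized]

        // A missing class goes unnoticed until something actually asks for it.
        try {
            Class.forName("com.example.NoSuchClass"); // hypothetical, assumed not to exist
        } catch (ClassNotFoundException e) {
            System.out.println("not found: " + e.getMessage());
        }
    }
}
```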

What changes when we bring in a JIT translator? Once a class is loaded and validated, the translator goes to work on it. The translator in Cosmo Code 2.0 uses a three stage process:

Scan the code in each method and locate the boundaries between instructions. Most Java instructions are between one and three bytes in length, with the opcode stored in the first byte. (Some instructions are longer. The instruction which implements the switch statement includes data for all of the case clauses. A single switch with many case clauses could generate a single instruction that is hundreds of bytes in length.) Basic blocks are identified as well; these are blocks of instructions which can be entered only at the top and exited only at the bottom.
Determine the depth of the stack at each instruction and identify the ultimate destination for each intermediate result. Java's virtual machine is based on a stack architecture using main memory, while the computers that run the JVM all use fast registers for calculation. The more work we can do in registers the less time we will spend waiting for slower main memory and the faster the program will run.
Generate the native code equivalent for each instruction. The opcode becomes an offset into a table of templates for the generated code. Some limited optimization is done at this stage to eliminate stack accesses wherever possible.
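The three stages above can be sketched in a few dozen lines of Java. This is a toy, not the Cosmo Code translator: it handles only five real JVM opcodes (bipush, iload, iadd, ifeq and ireturn, with their real encodings and lengths), assumes every instruction is reachable, and emits MIPS-flavored pseudo-assembly that ignores branch delay slots and assumes the stack never grows deeper than the available registers. Keeping stack slot d in register $t<d> is the crudest form of eliminating stack accesses.

```java
import java.util.*;

// Toy version of the three translation stages. The five opcodes use their real
// JVM encodings and lengths; the MIPS-flavored output is illustrative only
// (no branch delay slots, no register spilling, all code assumed reachable).
final class ToyJit {
    static final int BIPUSH = 0x10, ILOAD = 0x15, IADD = 0x60, IFEQ = 0x99, IRETURN = 0xac;

    // Encoded length in bytes of each supported opcode.
    static int length(int op) {
        if (op == IADD || op == IRETURN) return 1;
        if (op == BIPUSH || op == ILOAD) return 2;
        if (op == IFEQ) return 3;
        throw new IllegalArgumentException("unsupported opcode " + op);
    }

    // Net change in stack depth: the two loads push one value, the rest pop one more than they push.
    static int stackDelta(int op) {
        return (op == BIPUSH || op == ILOAD) ? 1 : -1;
    }

    // Branch target: signed 16-bit offset relative to the branch instruction's own pc.
    static int target(byte[] code, int pc) {
        return pc + (short) (((code[pc + 1] & 0xff) << 8) | (code[pc + 2] & 0xff));
    }

    // Stage 1a: find instruction boundaries by stepping over each instruction's length.
    static List<Integer> boundaries(byte[] code) {
        List<Integer> pcs = new ArrayList<>();
        for (int pc = 0; pc < code.length; pc += length(code[pc] & 0xff)) pcs.add(pc);
        return pcs;
    }

    // Stage 1b: basic blocks start at pc 0, at branch targets, and right after branches.
    static SortedSet<Integer> blockStarts(byte[] code, List<Integer> pcs) {
        SortedSet<Integer> starts = new TreeSet<>();
        starts.add(0);
        for (int pc : pcs) {
            if ((code[pc] & 0xff) != IFEQ) continue;
            starts.add(target(code, pc));
            if (pc + 3 < code.length) starts.add(pc + 3);
        }
        return starts;
    }

    // Stage 2: propagate the stack depth (before each instruction) along every path.
    static Map<Integer, Integer> stackDepths(byte[] code) {
        Map<Integer, Integer> depth = new HashMap<>();
        Deque<Integer> work = new ArrayDeque<>();
        depth.put(0, 0);
        work.push(0);
        while (!work.isEmpty()) {
            int pc = work.pop(), op = code[pc] & 0xff;
            int after = depth.get(pc) + stackDelta(op);
            List<Integer> next = new ArrayList<>();
            if (op != IRETURN) next.add(pc + length(op));
            if (op == IFEQ) next.add(target(code, pc));
            for (int s : next) {
                if (s >= code.length) throw new IllegalStateException("falls off the end");
                Integer known = depth.putIfAbsent(s, after);
                if (known == null) work.push(s);
                else if (known != after) throw new IllegalStateException("depth mismatch at " + s);
            }
        }
        return depth;
    }

    // Stage 3: one template per opcode. Stack slot d lives in $t<d>, local n in $s<n>.
    static List<String> generate(byte[] code, List<Integer> pcs,
                                 Set<Integer> starts, Map<Integer, Integer> depth) {
        List<String> out = new ArrayList<>();
        for (int pc : pcs) {
            if (starts.contains(pc)) out.add("L" + pc + ":");
            int op = code[pc] & 0xff, d = depth.get(pc);
            switch (op) {
                case BIPUSH:  out.add("  li    $t" + d + ", " + code[pc + 1]); break;
                case ILOAD:   out.add("  move  $t" + d + ", $s" + code[pc + 1]); break;
                case IADD:    out.add("  addu  $t" + (d - 2) + ", $t" + (d - 2) + ", $t" + (d - 1)); break;
                case IFEQ:    out.add("  beq   $t" + (d - 1) + ", $zero, L" + target(code, pc)); break;
                case IRETURN: out.add("  move  $v0, $t" + (d - 1)); out.add("  jr    $ra"); break;
            }
        }
        return out;
    }

    // Hand-assembled: return x == 0 ? 5 : x + 1, with x in local variable 0.
    static byte[] sample() {
        return new byte[] { ILOAD, 0, (byte) IFEQ, 0, 9, ILOAD, 0, BIPUSH, 1, IADD,
                            (byte) IRETURN, BIPUSH, 5, (byte) IRETURN };
    }

    public static void main(String[] args) {
        byte[] code = sample();
        List<Integer> pcs = boundaries(code);
        SortedSet<Integer> starts = blockStarts(code, pcs);
        System.out.println(pcs);    // [0, 2, 5, 7, 9, 10, 11, 13]
        System.out.println(starts); // [0, 5, 11]
        generate(code, pcs, starts, stackDepths(code)).forEach(System.out::println);
    }
}
```

Because the stack depth at every instruction is known at translation time, each push and pop becomes a fixed register name, and no memory traffic for the operand stack remains in the generated code.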

Once the translation is done the code begins executing. When the code makes reference to a method in another class the process of loading the new class and resolving the methods is similar to the one described earlier. One difference is that every reference in the methods of the current class to the new class is resolved at one time, instead of waiting for each one to be hit. Since resolving a reference will cause a change to the native code we're executing (the call to the resolver gets replaced by a call to the resolved method), we have to invalidate the processor's instruction cache and force it to reload the code from main memory. This is a relatively time-consuming operation for a fast processor, one we want to do as infrequently as possible. So it's better for our applet's performance to resolve as many method references as we can before we invalidate the instruction cache.
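The payoff of resolving all references to a newly loaded class at once can be shown by counting cache invalidations. In this sketch the call sites, the class names Vector3 and Matrix4, and flushInstructionCache() are all hypothetical; the "flush" only increments a counter where a real translator would invalidate the processor's instruction cache. Assuming every call site is eventually executed, five sites that call into the same class cost five flushes when resolved one at a time, but a single flush when patched together.

```java
import java.util.ArrayList;
import java.util.List;

// Compares one-at-a-time resolution with resolving every reference to a class
// at once. All names are hypothetical; flushInstructionCache() only counts.
final class BatchResolution {
    static final class CallSite {
        final String targetClass;
        boolean resolved;
        CallSite(String targetClass) { this.targetClass = targetClass; }
    }

    static int flushes = 0;

    // Stand-in for invalidating the processor's instruction cache.
    static void flushInstructionCache() { flushes++; }

    // Five call sites into Vector3 and one into Matrix4.
    static List<CallSite> sampleSites() {
        List<CallSite> sites = new ArrayList<>();
        for (int i = 0; i < 5; i++) sites.add(new CallSite("Vector3"));
        sites.add(new CallSite("Matrix4"));
        return sites;
    }

    // One at a time: each Vector3 site is patched (and the cache flushed) when first hit.
    static int countOneAtATime() {
        flushes = 0;
        for (CallSite s : sampleSites()) {
            if (s.targetClass.equals("Vector3")) {
                s.resolved = true;
                flushInstructionCache();
            }
        }
        return flushes;
    }

    // Batched: when Vector3 is loaded, patch every site that refers to it, then flush once.
    static int countBatched() {
        flushes = 0;
        for (CallSite s : sampleSites())
            if (s.targetClass.equals("Vector3")) s.resolved = true;
        flushInstructionCache();
        return flushes;
    }

    public static void main(String[] args) {
        System.out.println("one at a time: " + countOneAtATime() + " flushes"); // 5
        System.out.println("batched:       " + countBatched() + " flush");      // 1
    }
}
```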

Where do we stand?

Native code translation is new for Cosmo Code 2.0. Java applications can achieve native code performance using either just-in-time translation or batch translation. New options to the java command enable just-in-time translation (the -jit switch) and use of native code placed in the class file by javat (the -tran switch). The two switches can be combined: translated code is used where it exists, and JIT translation is done where required.

Native code for Java applets is not yet supported. We are working with Netscape to integrate our JIT technology into an upcoming version of Netscape Navigator for IRIX. Expect a major announcement as soon as a JIT-enabled browser is available.

More information on Silicon Graphics' Cosmo products, including Cosmo Code, can be found on Silicon Surf at http://www.sgi.com/Products/cosmo. Evaluation copies of the software and information on beta releases are available on the site.

Comments to: Hank Shiffman, Mountain View, California
