Introduction
This paper will discuss Java and its application to the growing class of devices known as information appliances. It will consider the reasons for Java's success with developers of these devices, the challenges these developers face and some of the approaches being taken to address these challenges. Finally, it will introduce JSTAR, a new product from JEDI Technologies, Inc. that delivers high performance Java to embedded applications.
The Java Phenomenon
The rise of Java is unprecedented in the computer industry. Never before
has a single technology captured so much attention from companies both
large and small, from investors and developers, from journalists and
columnists and writers of every kind. That all this attention is focused
on something as esoteric as a programming language is even more remarkable.
But Java is far more than a language. It has characteristics that go beyond the definition of typical programming languages into the realms of operating systems, networking and other application services. This wider definition of the Java environment has sparked fundamental changes in our thinking about software and about the computers that run it. Java is forcing us to reconsider the ways we develop applications, the ways we distribute them and even how they are used.

That there is excitement about Java is obvious. That this excitement is justified should be clear as well; after several years of development and tremendous investment in both money and talent, the interest in using Java to solve real problems is only growing. What may not be quite so obvious is that not everyone who sees value in Java wants the same thing from it. Java offers many benefits; which of these matter will vary with the needs of its users.
Moving a working application to an incompatible kind of system is not an easy task. For one thing, not every service the program may use is available on every system. Where these services are available, they may sport subtle differences that must be worked around. A small example is the symbol used to separate directory names in a file path. UNIX systems use the slash (/) character. Microsoft Windows relies on the backslash (\). And MacOS uses a colon, although most Macintosh owners are probably unaware of that fact. Even in cases where porting an application to a new platform might be easy and inexpensive (for example, moving from one system running UNIX to another), one must consider the cost to test, market and support the new version. This cost consideration is the reason that many applications that could be made to run on multiple systems are rarely made available on those systems. For too many developers, any cost associated with a new version is too much.

Java applications don't have such problems; the same piece of code will run on any computer that supports the language. Portability is achieved by not compiling programs for a specific computer. Instead, the target of the compilation is something called the Java Virtual Machine, a conceptual computer whose machine language, known as Java byte code, bears a striking resemblance to the Java language itself. Since real computers can't run this Java machine language directly, a piece of software has to be inserted between the computer and the application. This Java Virtual Machine emulator reads the compiled byte code and instructs the computer to perform the desired tasks. What this means is that any computer, armed with a Java Virtual Machine emulator (which we'll refer to as a JVM from now on), can run any Java application built for the JVM. The processor doesn't matter; the operating system doesn't matter. In the words of the Sun Microsystems marketing department, "Write Once, Run Anywhere."
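The path-separator example above is exactly the kind of difference Java's class libraries hide. A minimal sketch (the class and method names here are our own, purely for illustration): a program that builds a file path through java.io.File.separator runs unchanged on UNIX, Windows or MacOS, with the library supplying the right character on each platform.

```java
import java.io.File;

public class PortablePaths {
    // Build a path without hard-coding '/' or '\'; File.separator
    // holds whatever separator the underlying platform uses.
    public static String docsPath(String root) {
        return root + File.separator + "docs" + File.separator + "readme.txt";
    }

    public static void main(String[] args) {
        // On UNIX this prints home/docs/readme.txt;
        // on Windows, home\docs\readme.txt.
        System.out.println(docsPath("home"));
    }
}
```

The same compiled class file produces the right result on every platform, which is the point of targeting the JVM rather than a particular operating system.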
Initially this meant "run on any well equipped desktop computer for which someone has written a JVM." But as we will see, the meaning of "Run Anywhere" has grown almost beyond measure.

Java is more productive than popular languages like C and C++. But why should this be so? To understand the difference between C/C++ and Java it helps to know a little history. The C language was developed at Bell Labs in the early days of UNIX. Like every other operating system of its day, the first version of UNIX was programmed in assembly language. This made it very efficient, since assembly language lets a programmer get as close to the hardware of a computer as anyone could possibly want. But assembly language is specific to a computer, the DEC PDP-7 in the case of that first UNIX. UNIX could never be ported to another kind of computer without rewriting everything from scratch.

Enter C. C was designed as a systems programming language, so it still gave the programmer very precise control of the hardware. But as a high level compiled language, it gave programmers much higher productivity than assembly language. And by abstracting all the concepts shared by individual computers, it let a programmer create portable source code. UNIX was rewritten in C. And that led to UNIX being ported to many different kinds of computers. C became the language of choice not just for UNIX but for much of the software that would run on top of it. In time C would become the systems programming language not just for UNIX but also for the Macintosh, where it displaced Pascal, and Microsoft Windows.

But traditional C (often referred to as K&R C after its authors, Brian Kernighan and Dennis Ritchie) had its flaws. For one, it was extremely forgiving and would compile even the most obviously broken code without complaint. Getting a C program to compile was easy; getting it to work was another matter entirely. And C had scalability problems.
As applications got very large it became harder and harder for programmers to keep track of all the relationships among their code and data. An answer to these problems arrived in the form of Bjarne Stroustrup's C++. A mostly compatible superset of C, C++ let programmers catch their more common mistakes at compile time. And it added a set of object-oriented programming features that made it easier for programmers to manage the code and data complexity of large programs. What it did not do, or at least did its best not to do, was give up the precise control and efficiency it inherited from C. Properly written C++ code should be every bit as fast as its equivalent in C, which made C++ a popular replacement for C for most applications.

But in some respects C++ is a giant step backward. One of C's virtues is its simplicity; the first C programming text is a surprisingly thin book. And it's possible for the typical programmer to learn every nuance of C within a reasonable period of time. C++ adds so many new features that interact in so many subtle ways that most programmers give up trying to understand more than a subset of the language.

Java code looks a lot like C++. But don't be fooled; these two languages were designed to solve vastly different problems. C++, like C, is intended for developers who need tight control over computing resources. Java is designed for developers who don't need such control and are willing to let the system handle more low level details for them. A few examples: Both C/C++ and Java let programs allocate chunks of memory at run time for new objects. In C/C++ it is the programmer's job to release the memory when it's no longer needed so it can be used later for other objects. Forget to do so and you have a memory leak; leak enough memory and eventually the program will fail. By contrast, Java uses an automatic Garbage Collector (GC) to identify and reclaim memory that's no longer in use.
Making a GC a standard part of the language eliminates a whole class of logic errors and performance problems from Java code.

Java and C/C++ also differ in the way they handle common run time errors. C and C++ treat error detection as a programmer problem. It is the programmer's responsibility to check the value returned by each system or library call to determine if an error occurred. (The specific error is identified by the value of a variable called errno.) Similarly, the programmer must check subscript values against the size of an array before using them. Such out-of-range subscript errors are both common and extremely hard to detect. (The usual symptom is a bad value in an unrelated piece of data, with no clue to what part of the code wrote that value.) Java avoids both of these problems by detecting errors automatically and then reporting them to the application using its exception mechanism. If the application handles the exception, it will take whatever action the programmer deemed appropriate. If it doesn't, Java will terminate the application. Better to fail than to continue with bad information.

Java eliminates one other common source of problems in C/C++ code: the incorrect use of pointer arithmetic. It does so in the most direct fashion possible: by not letting the programmer manipulate pointers at all. This certainly eliminates a significant source of bugs. However, it does so at the expense of utility. You can't write operating systems or device drivers in Java. Without the ability to address arbitrary locations in memory, Java can't be used for the kind of low level tasks for which C was designed. Still, for applications that don't need low level control Java does provide tremendous productivity benefits. And even those that do need such control may benefit from a hybrid approach, using Java for most of the work and C or C++ native methods where Java can't do the job.

Java was designed for a much more dynamic style of linking.
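The difference in error handling fits in a few lines of Java. This sketch (our own illustrative code, not from the paper) reads an array element with an out-of-range subscript; where C might silently corrupt unrelated data, the JVM raises an ArrayIndexOutOfBoundsException that the program can catch, or allow to terminate it.

```java
public class BoundsDemo {
    // Return data[index], or fallback if the subscript is out of range.
    // The JVM checks every array access; a bad index raises an exception
    // instead of silently reading or writing the wrong memory.
    public static int elementOr(int[] data, int index, int fallback) {
        try {
            return data[index];
        } catch (ArrayIndexOutOfBoundsException e) {
            return fallback;
        }
    }

    public static void main(String[] args) {
        int[] data = {10, 20, 30};
        System.out.println(elementOr(data, 1, -1));   // in range: 20
        System.out.println(elementOr(data, 7, -1));   // out of range: exception caught, -1
    }
}
```

Had the catch block been omitted, the bad subscript would have ended the program with a stack trace pointing at the offending line, which is exactly the "better to fail than continue with bad information" behavior described above.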
Java programs do have a link phase, where individual object classes and their methods (procedures) are pulled together into an executable program. What is different is that this link operation takes place every time the program is run. This late binding mechanism makes it much easier to incorporate changes, enhancements and new capabilities into existing applications. Java supports several mechanisms that take advantage of this dynamism to produce especially flexible software. JavaBeans, Enterprise JavaBeans and Jini serve different kinds of developers building different kinds of software. But they all work by letting applications know about new kinds of components: what they are, what operations they support, how they can be integrated, how they can be customized and so on.

(A brief explanation of these three technologies: JavaBeans is a specification that lets programs learn about new object classes. It was designed to permit graphical application-building tools to adapt to new components and let their users connect them together to make applications. Enterprise JavaBeans is similar in concept; the big difference is that EJB components exist on a server system and are shared among multiple applications. EJB is oriented around large objects that make up a corporate multitier architecture. Finally we have Jini, a mechanism for devices on a shared network to learn about each other and to interact in a sort of open community. Put another way, Jini is intended for smart appliances, Enterprise JavaBeans is for in-house client/server business applications and JavaBeans is for objects within a single application running on a single Java Virtual Machine.)
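This late binding is visible in even the simplest Java program: a class named only by a run-time string can be loaded, instantiated and inspected. A minimal sketch (using the standard java.util.ArrayList purely as a stand-in for a component discovered at run time):

```java
public class LateBinding {
    // Resolve a class by name at run time, the way JavaBeans-style
    // tools discover components they were never compiled against.
    public static Object create(String className) throws Exception {
        Class<?> c = Class.forName(className);           // looked up now, not at compile time
        return c.getDeclaredConstructor().newInstance(); // call its no-argument constructor
    }

    public static void main(String[] args) throws Exception {
        Object component = create("java.util.ArrayList");
        System.out.println(component.getClass().getName());
    }
}
```

Because the class name is just data, an application can accept new component names from a configuration file or a network service and integrate classes that did not exist when the application shipped.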
Java & Information Appliances
Information appliances are a new class of specialized devices that are designed to interact with the Internet or a local network. There is incredible momentum behind such smart devices, with many industry analysts predicting that such appliances will outnumber PCs on the Internet within the next two years. What is equally interesting is that in every application category, manufacturers who are building information appliances are making Java an integral part of their devices. A few examples:
Each of the first six categories listed above represents tens of millions of devices that will rely on Java. The market for Java-based smart cards is far larger, with estimates as high as three billion units worldwide. Taken together, they represent a huge demand for Java solutions.
Embedded Java: A Paradigm Shift
All of this interest in Java among embedded developers represents a dramatic move away from their traditional model of development. Embedded developers have always focused on the practical aspects of the devices they build: typically some combination of cost, size/weight and power consumption. Reducing these means finding low performance components that are just good enough to do the job. If a device will be sold in volume, cheap components mean lower costs and higher profits. Software development costs are not nearly so important. Reducing the development cost per device is easy: just sell more units.
This fact has kept embedded projects from enjoying the kinds of productivity gains seen by virtually every other class of software developer. But all this is beginning to change, with Java at the forefront of that change. Here are some of the ways embedded development is being turned on its head:
Much of Java's success has occurred on web pages (Java applets), as a programming language for database client applications, in a few simple embedded devices and as a server-side programming language (Java servlets). These are at the low end of a range of applications, based on their size and complexity. Further up the scale we find potential applications like smart clients, more sophisticated embedded devices, shrink wrapped software, scientific & engineering computing and large enterprise applications. And although there are many developers working with Java in these areas, their success to date has been limited. The problem is that Java's flexibility and portability come at too high a price for these applications. Java's memory needs are often much larger than those of C, once the requirements for the virtual machine and all the class libraries are added to the application. And remember that the JVM is an interpreter. Between the overhead of byte code interpretation and all of Java's extra error checking, it takes a much more powerful processor to get to the required level of performance. More memory and faster processors mean higher cost. Worse, for wireless applications they also mean higher power consumption and shorter battery life.
The Issue of Java Performance
Sun released its first implementation of Java in 1995. That version ran applications much more slowly than their native code equivalents, with typical applications reporting slowdowns of 20X and some applications running more than fifty times slower. Since then, huge investments have been made by Sun and others in an effort to close the gap between Java performance and that of native code.
Today, there are many different approaches to executing Java code. What follows is a survey of the different techniques commonly in use, each of which has benefits and tradeoffs. Keep in mind that these approaches are not mutually exclusive; a single Java solution may benefit from a combination of approaches.
But not all of a program's time is spent in interpretation. Java's powerful run-time environment reduces the demand on programmers by taking on many of their responsibilities. Features like garbage collection and thread synchronization, significant sources of overhead in early implementations, have seen their contribution to total execution time reduced as more sophisticated algorithms have been applied to them. And these improvements are being applied to every Java implementation, not just to interpreters.
JIT compilers provide dramatically better performance than interpreters, often reducing run time by a factor of ten. But they can be a mixed blessing. Using a JIT compiler means adding some translation overhead up front to get better performance later in the execution. And a JIT means a big increase in memory footprint. Depending on the processor, translated code will be between five and ten times the size of the original byte code. In server applications where large memory configurations are the norm a tradeoff of memory for time may well be acceptable. But in an embedded application more memory means higher cost and more drain on limited power. There are variations on the JIT theme. Sun has released a translation engine it calls HotSpot. HotSpot is a dynamic JIT. Instead of translating every routine as it's used, HotSpot monitors the program's execution and identifies the most often used and most expensive routines. It translates only those routines it deems worthwhile, letting the interpreter deal with the rest. In theory this will offer better performance than a JIT and lower memory usage, since less time and memory are taken up with translated code. But HotSpot's monitoring adds its own overhead. Experiments show that HotSpot produces better performance for some applications but slows down others. And its memory requirements are no better than those of JIT solutions.
A Java native compiler violates some of the rules of Java in exchange for better performance. It treats Java byte code the way other language compilers treat source, analyzing it, optimizing it and then converting it into native code. This object code is linked with Java run time support and other system libraries to create an executable program. Such a program gives up many of the advantages of Java. It isn't portable, it can no longer be validated for safety and, depending on the implementation, it may not support run time integration with new classes into dynamic networks of components. Why would someone give up so much of what makes Java special? Native code compilers will be of interest primarily to developers who choose Java for its productivity rather than its flexibility. For them Java is a better language, what C++ might have been if its designers had worried more about the needs of the programmer than about those of the computer. Developers of compute-intensive software may fall into this category, as would developers of in-house enterprise applications. Java native applications will have better performance than JIT compiled code, although the difference may not be significant. They may also have somewhat lower memory requirements, since they don't need to carry around the compiler at run time. The benefit provided by a native code compiler will depend heavily on the specifics of the application.
Although Java chips are designed around their ability to run Java, they must be able to do more than that. Java byte code does not permit the arbitrary manipulation of memory you need to write operating systems and device drivers. Java chips need these capabilities to run system software and reserve some unassigned byte codes for such privileged operations. Any attempt to use these byte codes from a Java application will be reported as a security violation. The challenge for a Java chip is the same as for any new processor with a new instruction set: software. One can assume that a supplier of a Java chip will provide a real-time operating system (RTOS) and a Java Virtual Machine for their chip, whether as custom software or ported versions of existing packages. Note that any new processor, even a Java processor, needs someone to implement a JVM for it before it can run Java applications. This can be a significant effort: while Java applications are portable, the underlying Java platform software is not. This raises important questions each embedded device designer must answer: Is the software my device will need available on this processor? Is my first choice of RTOS (either because of its feature set or my organization's familiarity with it) available? What about the availability and quality of JVM(s) for that chip? And what about other software: applications, libraries, etc.? In short, how much of the software already runs on the chip and how much time and money will it take to get the missing pieces in place? Designers of embedded devices have large investments in hardware, software and, even more significant, in the experience and expertise of their people. Java processors run counter to that investment, requiring designers to start again with new hardware, new and potentially less stable software and a possibly short but still not insignificant learning curve. It is uncertain how many developers will consider the new investment worthwhile. 
It is clear that each approach to running Java has its strengths and its weaknesses. None emerges as the clear winner for embedded devices, as designers are forced to trade off processor speed against memory requirements and the availability of software.
JEDI Technologies & The JSTAR Accelerator
What would an ideal Java solution look like? Obviously, it would run Java applications at least as well as the alternatives available today, offering performance five to ten times higher than interpreters. It would achieve this level of performance without the need for a faster CPU clock or extra memory. And it would be compatible with existing processor architectures, to take advantage both of existing expertise and a wide variety of available software.
It is from this problem definition that JEDI Technologies began. Founded in late 1998, JEDI set itself the task of eliminating the technical barriers to widespread use of Java in embedded devices. JEDI's approach was to develop a Java accelerator that could be added to existing microprocessor designs. Acting like an on-the-fly JIT compiler, this JSTAR accelerator provides similar levels of performance without any need for memory to hold the translated code. And because JSTAR enhances an existing processor it gets all the benefits of using that processor, including the catalogue of software that supports it.

Architecturally, JSTAR is a coprocessor that interfaces to the native microprocessor core and its cache or memory subsystem. JSTAR acts as a Java interpreter in silicon, retrieving byte code instructions from memory and executing them in conjunction with the native processor. JSTAR operates directly on Java byte code, eliminating the extra memory JIT compilers need to hold the native code they generate.

[Figure: The JSTAR-enabled processor]

Adding JSTAR to a microprocessor requires no modification to the native core. In particular, the native instruction set and pipeline architecture of the processor are unchanged. Operating systems and native applications, software components and tools run on a JSTAR-enabled processor just as they do on the original chip. Even Java native methods compiled for the native processor run without modification.

JSTAR was designed to integrate with existing Java Virtual Machine implementations from Sun, HP and others. The JVM is modified to initialize JSTAR and then to give it control of the main fetch/decode/execute instruction processing loop. Making a JVM work with JSTAR is greatly simplified by JSTAR's ability to adapt to the internals of the JVM. In particular, JSTAR does not impose any specific layout for local variables, call stack frames or the Java operand stack. (The operand stack stores intermediate results from Java computations.
Where a register-oriented processor would implement a simple expression like C = A + B as load A into r1, load B into r2, add r2 to r1, store r1 into C, the stack-based Java byte code would use something more like push A, push B, add, pop C.) JSTAR also works with the JVM's implementation of garbage collection, native method invocation and thread switching & synchronization. It is designed to work with the variety of thread schedulers used by JVMs, including both native threads and cooperative thread schemes like Sun's green threads. It also supports multiple JVMs running concurrently on the same processor. JSTAR's flexibility minimizes the work required to convert a new JVM to take advantage of it. It also permits JSTAR to benefit from all the work being done to optimize specific Java run-time implementations.
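The stack discipline described in the parenthetical above is exactly what the Java compiler emits. Compiling the trivial method below and disassembling it with javap -c shows the push/add/pop pattern (the mnemonics in the comments are standard JVM byte codes; the slot numbers assume a static method, where the arguments occupy local variable slots 0 and 1):

```java
public class StackDemo {
    // javap -c StackDemo shows sum compiled to roughly:
    //   iload_0   // push a onto the operand stack
    //   iload_1   // push b
    //   iadd      // pop both, push a + b
    //   ireturn   // pop the result and return it
    public static int sum(int a, int b) {
        return a + b;
    }
}
```

Note that no registers are named anywhere in the byte code; it is this register-free representation that JSTAR executes directly, mapping stack operations onto the native core's resources in hardware.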
JSTAR Application Performance
The first JSTAR implementation makes use of an R3000 class
processor core, the VxWorks real-time operating system from Wind River
Systems and Sun's PersonalJava 3.0 virtual machine
environment. Performance was measured using a set of industry standard
benchmarks on a field-programmable gate array (FPGA) clocked at ten
and twelve megahertz (MHz). These benchmarks were run using the same
software environment with JSTAR enabled and then disabled, giving an
accurate picture of its effect on the performance of each
application.
Pendragon Software Corporation's Embedded CaffeineMark suite consists of six tests of basic Java execution, whose results are combined using a geometric mean calculation. In addition to the R3000 and R3000/JSTAR tests, results were obtained for a MIPS R4600 processor running at 200 MHz and a StrongARM processor running at 166 MHz, both using the same combination of PersonalJava and VxWorks, as well as an Intel Pentium-based PC running Microsoft Windows 98 and Sun's JVM 1.2.2. The JIT compiler in the latter system was deactivated to provide an accurate comparison of interpreters and JSTAR. Benchmark performance was divided by processor clock rate to provide a common measurement of CaffeineMarks per MHz. Experiments with the same processor and software environment at varying clock rates indicate that the benchmark results scale linearly with increases in processor speed.
As we can see from the graph, the MIPS and StrongARM processors produce similar performance of between 0.52 and 0.61 CM/MHz. The more complex Pentium processor gets between 50% and 74% better performance than these other processors, for reasons we will discuss shortly. Adding JSTAR to the R3000 improves its performance by 5.5 times, giving the R3000/JSTAR combination Java throughput more than three times that of the larger and more power-hungry Pentium running at the same clock rate. A large portion of the Pentium processor's advantage in these tests comes from a single benchmark. The CaffeineMark Float benchmark performs a large number of double precision floating point calculations, giving a significant edge to the only processor in this group with floating point hardware. Removing the Float benchmark from the set places the Pentium processor more in line with the others and gives JSTAR an acceleration of nearly seven times that of the R3000 alone and four times that of the Pentium.

[Figure: Java Interpreter Performance]

Other benchmark applications show similar results. JSTAR improves an R3000 processor's Dhrystone performance by 5.2 times, increasing a 12 MHz processor from 256.52 Dhrystones per second to 1333.33. The Tak benchmark, a measure of recursive procedure calling, shows accelerations of 7.3 times and 2.1 times for integer and single precision floating point, respectively. Once again, note that this last result was achieved using a software implementation of floating point; having floating point hardware would produce even greater improvement. Also be aware that all of these results were obtained from a particular virtual machine implementation. Greater efficiencies in the VM would produce higher overall acceleration.
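The normalizations used above are simple ratios. A sketch of the arithmetic (the Dhrystone figures are the ones reported in the text; the helper class itself is our own illustration):

```java
public class BenchMath {
    // Score divided by clock rate gives a per-MHz figure that can be
    // compared across processors running at different speeds.
    public static double perMHz(double score, double mhz) {
        return score / mhz;
    }

    // Speedup is the ratio of the accelerated and baseline scores
    // measured on the same software environment.
    public static double speedup(double accelerated, double baseline) {
        return accelerated / baseline;
    }

    public static void main(String[] args) {
        // Dhrystone results quoted in the text for a 12 MHz R3000:
        double boost = speedup(1333.33, 256.52);
        System.out.printf("JSTAR Dhrystone speedup: %.1fx%n", boost);  // ~5.2x
    }
}
```

Dividing by clock rate is what makes the cross-processor comparison fair: the experiments cited above show the raw scores scale linearly with clock speed, so the per-MHz figure isolates the contribution of the architecture itself.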
JSTAR Hardware Specs
A best-performance JSTAR core requires approximately 30,000 gates to support a 32 bit, single issue processor. It can be implemented with a unified cache or memory system, or with the separate instruction and data caches of Harvard architecture processors like the ARM9, either with or without a memory management unit.
Power requirements for JSTAR execution are estimated to be 18 milliwatts for a 1.5 volt, 100 megahertz processor. This represents less than 15% of the power required by a typical native processor at the same clock rate. With appropriate hardware and software support, JSTAR will draw no more than leakage power when idle.
The JSTAR Advantage
Building an embedded device involves making choices: technical, economic and practical. Each choice is defined by a series of tradeoffs and implicit decisions regarding other choices. Choose a processor and you either limit yourself to software that already runs on that processor or to software that can be moved to it within the time available and at an acceptable cost. Add an operating system and you restrict your options further. And so it goes with each new layer of software and each new component.
Developers working with embedded Java must balance their need for Java performance against all of their other requirements: cost, power, size, the ability to run native code, the ability to interact with the outside world and so on. Java technology that intrudes on a device's ability to satisfy its non-Java requirements is not a viable solution.

JSTAR represents the low risk, low cost Java solution most developers need. It offers a high performance engine for running Java applications without placing unacceptable demands on scarce resources. And it offers this benefit without giving up the high C/C++ performance of a native processor, its large catalogue of available software or the years of expertise built up by embedded developers. This symbiosis between JSTAR and a native core means the best of both Java and traditional computing.
Comments to: Hank Shiffman, Mountain View, California