Sidebar


A Tercentennial


By Hanpeter van Vliet

Hanpeter van Vliet was the author of Mocha, the controversial Java decompiler. The first beta version of Mocha was released in June 1996, to no great fanfare. However, when its existence was reported in C|net in August, a furor arose in the Java-development community.

Van Vliet subsequently removed the decompiler from his site and wrote "A Tercentennial," a manifesto of sorts which he published in The Local, a "virtual pub" located at the Java UK Experience site (java.motiv.co.uk).

Van Vliet then held a vote to determine whether Mocha should be re-posted. The response was overwhelmingly in favor of its return, and it reappeared on his site along with Crema, a companion Java obfuscator.

Not long afterward, many Mocha links inexplicably "went 404." Attempts to track down subsequent versions of Mocha and Crema came up short. Finally, it became known that Hanpeter had succumbed to the cancer he had been battling. He passed away on December 31, 1996, at the age of 34.

Mocha and Crema are still available at web.inter.NL.net/users/H.P.van.Vliet/. "A Tercentennial" is reprinted here in its entirety with permission from Motiv Systems, Ltd.

The Dutch have a reputation for stealing coffee. Exactly three centuries ago, in 1696, my ancestors stole a coffee plant from the heavily guarded plantations of Mocha (Yemen). They shipped it to their East Indian colony and cultivated it into a unique and successful species that would become known as "Java."

So what could be more appropriate to celebrate this than to release Mocha, the Java decompiler, this year? And isn't it apt that both the Java compiler and decompiler were written by Dutchmen? Not everybody seems to agree.

I should have known. By American standards, three hundred years ago is prehistoric. Coffee should be weak and instant, and in an oversized mug (with plenty of free refills). A cup of Mocha was bound to upset some stomachs. Whoever brewed that cup was liable for damages.

What's the fuss? Mocha is a Java decompiler, a program that reconstructs source code from binary classes. Although there are decompilers for many languages (Visual Basic, C, Clipper, and Smalltalk, to name a few), the situation with Java is unusual.

First of all, by design, Java's compiled classes contain an exceptional amount of symbolic information. Class names, field names, method names, and method signatures are necessary for the runtime linking of classes. In addition, data types and exceptions are required for the bytecode-verification process to ensure that downloaded programs play by the rules of the language. More symbolic information also means more meaningful decompiler output. And because Java programs—applets—are typically small, the absence of comments in the source code is hardly an obstacle to understanding.
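As an illustration (not part of the original essay), the sketch below uses the standard java.lang.reflect API in present-day Java to read back the names that survive compilation. The classes SymbolDump and Counter and the helper methods are hypothetical, invented for this example. Class, field, and method names, along with parameter types and declared exceptions, all come back; the parameter name "amount" is not reported, because by default it is not among the symbols the runtime exposes.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class SymbolDump {

    // Hypothetical applet-style class, invented for this illustration.
    public static class Counter {
        private int clickCount;

        public void add(int amount) throws IllegalArgumentException {
            if (amount < 0) {
                throw new IllegalArgumentException("negative amount");
            }
            clickCount += amount;
        }

        public int total() {
            return clickCount;
        }
    }

    // Joins the fully qualified names of the given types with ", ".
    static String names(Class<?>[] types) {
        List<String> parts = new ArrayList<>();
        for (Class<?> t : types) {
            parts.add(t.getName());
        }
        return String.join(", ", parts);
    }

    // Lists the symbolic information that reflection reads back from a loaded class.
    public static List<String> describe(Class<?> c) {
        List<String> out = new ArrayList<>();
        out.add("class " + c.getSimpleName());
        for (Field f : c.getDeclaredFields()) {
            if (f.isSynthetic()) {
                continue; // skip compiler- or tool-generated members
            }
            out.add("field " + f.getType().getName() + " " + f.getName());
        }
        for (Method m : c.getDeclaredMethods()) {
            if (m.isSynthetic()) {
                continue;
            }
            String line = "method " + m.getReturnType().getName() + " " + m.getName()
                    + "(" + names(m.getParameterTypes()) + ")";
            Class<?>[] thrown = m.getExceptionTypes();
            if (thrown.length > 0) {
                line += " throws " + names(thrown);
            }
            out.add(line);
        }
        Collections.sort(out); // reflection does not guarantee member order
        return out;
    }

    public static void main(String[] args) {
        for (String line : describe(Counter.class)) {
            System.out.println(line);
        }
    }
}
```

Running main prints one line per recovered symbol. A decompiler starts from the same information, read directly from the class file rather than through reflection.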

Secondly, compiled Java programs are free. In fact, you get them without asking for them. This obviously does not help to convince the recipient that they have any value. Like the plastic toys you find in the cereal box (if the kids give you a chance), they sometimes appear cute, but always worthless. It can be tempting to use the interesting parts of that free stuff in creative ways.

Last but not least, Java is in a rather explosive phase. Many companies are attempting to stake out a part of that expanding market. A small advantage in know-how could prove essential in establishing yourself over your competitors. Making your source code available to the world is not the smartest move, but in a way that's what you're doing if you distribute binary classes.

All in all, there is a very low threshold to "borrowing" code, and at the same time, the "tactical" value of a few lines of code is, apparently, considered enormous.

Enough reason for two companies to threaten to sue me for damages. Not immediately recognizing this as a knee-jerk reaction, I responded by temporarily removing Mocha from my site. I needed some time to figure out whether the claims could be substantiated, and I wanted to give developers a time-out to get over the shock. In the meantime, I have discovered that it is a small (but vocal) minority that objects to Java decompilers, and that their moaning has no legal basis.

In other words, Mocha will be back. In fact, it will be meaner than ever.

I'm not going to defend Mocha. It defends itself. Even if its existence cannot be justified in other ways, it at least drives home the point that compilation is not a good way to hide your secrets. That is a valuable insight both for implementers of security (remember the hole in Netscape's first SSL implementation?) and for commercial applet developers. Attempting to ban decompilers--to the extent that they are only available to criminals--is ostrich policy.

A smarter response is to find ways to deal with decompilers. In the area of cryptography, the answer was found long ago in public-key algorithms. Knowledge of the algorithm (and the public key) simply does not help you to break the cipher. Hence, publication of the algorithm (whether explicitly or via reverse engineering) is safe.

Commercial software developers have two options. Whether they use Java or not, the only way to keep their algorithms absolutely confidential is not to distribute them. Partitioning an application into client and server modules is a way to do that, and with Java this is becoming easier than ever. But it has some obvious drawbacks--it increases load on the server, and it is overkill for little applets.

The alternative is to accept the risk of reverse engineering (as we've done for years), but to try to make it as hard as possible. To that end, it would help if the difference in abstraction between source code and object code were large, for the more complex the transformation done by the compiler, the more difficult it is to do the reverse transformation. Unfortunately, the Java language is not very abstract, and the bytecode was designed to be close to the language.

With Java, it seems that the best thing you can do is to remove as much symbolic information as possible from your program. Or better yet, replace the symbolic information with invalid identifiers like numbers and keywords. This does not stop a decompiler, but it does make its output unintelligible to both humans and compilers. This is a good opportunity to plug my Java obfuscator, called "Crema." I used an early version of this program to protect Mocha itself against decompilation. If you happen to have a copy of Mocha, you'll notice that most of its classes have numbers rather than names. Decompilation of such a class results in an interesting potpourri of numbers, not in valid Java source. Crema has since been refined in many ways, and will soon be released.
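The renaming idea can be sketched in a few lines of present-day Java. This is an illustration only: the essay does not describe how Crema works internally, and the class RenameSketch, its methods, and the sample names are all hypothetical. The sketch shows just the naming scheme, in which each original identifier is mapped consistently to a keyword or a number that the Java language rejects as a name. A real obfuscator would additionally have to rewrite every occurrence of the name inside the compiled class files, which this sketch does not attempt.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import javax.lang.model.SourceVersion;

public class RenameSketch {

    // A few reserved words to hand out first; the selection is arbitrary.
    private static final String[] KEYWORDS = {"if", "for", "while", "int", "class"};

    // Remembers each assignment so that every reference to a name is renamed the same way.
    private final Map<String, String> mapping = new LinkedHashMap<>();

    // Returns the replacement for an identifier, assigning a new one on first use.
    public String rename(String original) {
        String existing = mapping.get(original);
        if (existing != null) {
            return existing;
        }
        int n = mapping.size();
        String replacement = (n < KEYWORDS.length)
                ? KEYWORDS[n]                              // keywords first
                : Integer.toString(n - KEYWORDS.length);   // then 0, 1, 2, ...
        mapping.put(original, replacement);
        return replacement;
    }

    // True if the string is acceptable as a name in Java source code.
    public static boolean isValidJavaName(String name) {
        return SourceVersion.isName(name);
    }

    public static void main(String[] args) {
        RenameSketch sketch = new RenameSketch();
        String[] originals = {
            "AccountManager", "computeInterest", "balance",
            "rate", "owner", "auditLog", "computeInterest"
        };
        for (String original : originals) {
            String replacement = sketch.rename(original);
            System.out.println(original + " -> " + replacement
                    + " (valid in source: " + isValidJavaName(replacement) + ")");
        }
    }
}
```

The design point is the one the essay makes: the class-file format is more permissive about names than the language is, so decompiled output that uses such names faithfully will not compile again.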