Hidden in Plain Sight
By Al Williams
I've been accused of having strange dreams, and I suppose that's true. One night I dreamt of sending secret data by taking advantage of a subtle drift in the signals between airplane transponders. To outward appearances, the signals were ordinary, but if you knew where to look, there was a message. When I woke up, I was intrigued by the idea, so I did some research.
As it turns out, I was 7,000 years too late to file a patent application. The Greek Histiaeus wrote secret messages on the shaved head of a slave, and dispatched the slave after his hair grew back. This sort of code writing is known as steganography and it is in use to this day. (Well, maybe not the part about the slave's shaved head.)
Today, computers handle most of the work involved in steganography. Think of all the files that fly across the Internet every day. If you see an email message that starts with
PGP, you can guess it was encrypted with the Pretty Good Privacy (PGP) program. This is a good sign that the sender and the recipient are exchanging information that they wish to keep private. However, if that email message had contained a picture of a car, you probably wouldn't have suspected that it contained any private information. It's possible, though, that the sender and recipient were transmitting the same private information in the picture of the car as they were in the PGP-encrypted email.
Steganography programs can hide data inside a picture so that the picture appears the same (or nearly the same) to the naked eye, but contains data if you know where to look. This is handy in many situations, such as for storing credit card numbers on your server. Imagine that a cracker breaks into your server. You can tempt the cracker with a file full of dummy credit card numbers, drawing his or her attention away from the pictures of your hometown. This gives you an extra layer of security because a cracker must find the files before he or she can try to crack them. Of course, the hidden file can itself be encrypted for extra protection. (For more information, see
Java's I/O System
When I was reading up on data obfuscation, I realized that it would be easy to create an I/O filter in Java that hides or reveals messages in an ordinary, innocent-looking stream of data. Java is a great language to use for steganography because the I/O system is built on streams and filters. Streams can be any source of I/O data, like network sockets, strings, arrays, files, or pipes. Filters plug into existing stream classes to provide extra features. For example, you might write buffering classes that have special knowledge about the data your program uses to improve I/O performance. Many of the predefined classes in the java.io package are filters.
Working with Java I/O reminds me of plumbing. You take two pipes, put them together, and then find an adapter that makes them fit. You can join lots of pipes together if you have enough different adapters. It's perfectly acceptable to mix more than one adapter to fit the plumbing pipes together. For example, suppose you're writing a network program that uses the
Socket class. You can read or write to the socket using streams (from the
getOutputStream methods). You might use an
InputStreamReader to convert the stream to a reader object. Then, for example, you would pass the
InputStreamReader to the constructor for a
If you're a Unix user, this might remind you of how you perform many tasks in Unix: using pipes. So while
more shows you a file,
more shows you the same file with its lines sorted. This is such a powerful idea that you'll probably want to write your own plug-in modules that can alter a stream of input or output. That's what my steganography object doesit attaches to an output stream, letting you easily hide or reveal a stream of data.
This isn't a new idea. For example, similar objects in the java.util.zip package perform zip file compression and decompression. It's a flexible technique, too. After all, you can make an ordinary string appear as a stream using
StringWriter, so you can perform these filtering operations on strings, sockets, files, pipesanything that can produce a stream. Other objects can convert arrays to streams, so you could apply filters to arrays as well.
Filtering modules are so commonplace that Java makes it easy to produce them via special prototype classes that you can subclass. In particular, the
FilterWriter classes will preprocess data for an
Writer object respectively.
You won't use the prototype classes (like
FilterReader) directly. Instead, you'll create a subclass to do whatever specialized processing you need. Although each class has quite a few functions, you'll mostly need to override the
write methods. You can usually write one function to handle single characters and then define the remaining functions in terms of the character-oriented function.
For example, you could create a class called
UCWriter that forces all of the output to a writer into upper case. When the filter class wants to write to the underlying writer, it uses the protected
out field from
FilterWriter. Nothing actually appears on the output stream until the class specifically writes to the
If you were to build a little test program, you could convert the
System.out stream to an
OutputStreamWriter object, then attach it to a
UCWriter. Finally, you would attach the entire set of writers and streams to a
PrintWriter (which performs the formatting). The end result is a stream that prints everything in uppercase to the system console.
The Steg Object
My steganography algorithm, as implemented in Listing 1, is very simple. The encoder reads a template file of plain text. This text should look like something innocenta press release, a joke, or a resume. Be sure to use text that doesn't appear to have secret meaning. As the encoder writes the text out to the next stream, it uses the spaces between words in the template file to encode the bits of the file you want to hide.
Any data you want to encode is essentially a sequence of binary numbers. You only need two states to represent a binary number. To make it a little harder to guess the scheme, I alternated the representation of a 0 and a 1 on each bit. Two spaces indicate one distinct state while a single space indicates the opposite state. Because the states alternate between 1 and 0, if two consecutive words have two spaces after them, and the first one indicates a 1, the second set of two blanks encodes a 0.
This isn't a perfect scheme. The compression ratio is terrible because we're using many bytes to encode a single bit of data. Of course, you can always compress the resulting text file if you like. Another problem is that if any process adds text to the start of the encoded file, decoding will yield incorrect results. You could solve this problem by making the decoder search for a keyword to start the encoded message (and perhaps another key word to stop decoding). The encoder could then frame the file with these words.
As you examine
Listing 1, keep the following points in mind:
Steg object maintains a boolean variable (
encoding) that tracks whether the object is hiding text or revealing it. There are two different constructors, so the state of
encoding depends on which constructor created the object. When encoding, the object needs a template file containing the dummy text that will hide the subject file.
- Camouflaging your hidden file may require more text than the template file contains. If the
Steg class runs out of template text, I wanted it to simply start reading the template file from the beginning again. Unfortunately,
FileReader doesn't support the
reset method, so I had to improvise.
You can force a
FileReader to support
mark/reset by adding a
BufferedReader, but then you must agree to buffer the entire file. That seems wasteful, so instead the encoding constructor accepts the filename of the template text. It then creates a
FileReader that the rest of the code uses. The
readChar method closes this
FileReader when it reaches the end of the file and recreates the
FileReader using the filename. Although this seems inefficient, it's better than buffering the entire file in memory.
readWord routines are used internally by the encoder to strip words from the template file. These methods also provide a way for the caller to determine whether the end of the word was a newline character. The encoder preserves newlines to make the text look more realistic. On decoding, the routine ignores any newline characters in case any line wrapping has occurred (for example, when an email client attempts to reformat a message).
Using the Class
You could easily use the
Steg class for any sort of Java program, including a servlet or a JSP page. I included a sample
main method so that you can test the class from the command line. To do so, prepare a template file named template.txt (which should be long compared to your input text). Also create a text file that you wish to hide (I'll name mine input.txt). Then issue this command:
java Steg template.txt input.txt >output.txt
To recover the hidden text, issue the following command:
java Steg output.txt
I've included a sample template file and a hidden output file, zipped in the Source Code section, for you to decode.
Before you can crack an encoding method, you must recognize that one is in use. Steganography helps you encode private information so that most people can't even tell you're trying to hide something. The method of encoding used in my
Steg class probably isn't all that secure from attack, but it doesn't have to be that way. For maximum security, you could also encrypt your message before hiding it. That way, if for some reason a cracker found the actual files and revealed the message, the private information would remain encrypted.
In fact, there are many ways you could improve the program's resistance to attack. For example, you could add the start and stop keywords that I mentioned earlier and then put dummy text before and after those keywords. This would make it more difficult to decode without knowing the keywords. You could also compress the data (using classes from the java.util.zip package) before hiding it. Compression, by definition, tends to remove patterns from the data (that is, it increases the average entropy) making statistical analysis more difficult. If you know how to use the Java cryptography libraries, you could easily use them to perform actual encryption of the data as you're hiding it.
(Get the source code for this article here.)
Al is a consultant, instructor, and author who lives near Houston, TX. Look for his new Java Network Programming Black Book available soon from The Coriolis Group. You can find Al on the Web at www.al-williams.com.