Is CGI Dead?
By Lincoln D. Stein
In this "back to basics" issue of Web Techniques it seems appropriate to look back at the Common Gateway Interface (CGI) protocol and take stock. Six years ago (almost an eternity in Web terms) CGI's package of rules and programming conventions revolutionized the embryonic Web, turning boring, static HTML documents into exciting, dynamic documents, and, as a side effect, catapulting Perl from an obscure UNIX sysadmin's scripting language into a well-respected software development tool.
Without having exact statistics (which are impossible to derive), it's fair to say that static HTML files account for a minority of the documents on the Web, dwarfed by the infinite variations on dynamic pages emitted by search engines, database interfaces, news gateways, media outlets, and more specialized programs. All these dynamic pages are driven by software wired either directly or indirectly into a Web server. A significant fraction, possibly the majority, of these programs use the CGI protocol.
But CGI is an archaic technology. It's heavily tied to assumptions about the UNIX operating system, particularly about using environment variables. The protocol is inefficient because it requires a new process to be spawned each time a page is served. Furthermore, the protocol is severely limited in scope because it restricts scripts to interacting with the Web server during the content-generation phase of the transaction and does not allow scripts to intervene in other interesting operations, such as user authentication and URL-to-filename translation. To overcome these deficiencies, a host of CGI alternatives and spin-offs has appeared in recent years. There are persistent CGI variants such as FastCGI, embedded CGI emulators such as mod_perl and Velocigen, server APIs such as ISAPI, and template-driven solutions such as ASP and PHP. Most recently we've been seeing an entirely new generation of Web software built on top of application servers (see Web Techniques, February 1999). Is CGI dead?
'Tis a Gift To Be Simple
There was a good reason for CGI's rapid adoption six years ago, and there's good reason for its enduring popularity today. It's simple. So simple that a fully compliant CGI script can be written with three lines of Perl. So simple that someone who has never written a line of code, let alone a network server, can install and customize a CGI script with ease.
Under the CGI protocol the script can recover all the relevant information about the server, the connection, the browser, and the requested document by reading from a handful of environment variables, or, in the case of a POSTed Web form, by reading text from standard input. To send data back to the browser, the script merely needs to print what it wants to display to standard output. True, there are a few gotchas (such as correctly decoding URL-encoded form strings), but those things are handled well by the many CGI support libraries and collections of example code.
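To make that flow concrete, here is a minimal sketch in Python rather than Perl. It assumes a form with a single field called name, which is a made-up example and not part of the protocol. The variables REQUEST_METHOD, CONTENT_LENGTH, and QUERY_STRING are standard CGI/1.1 names; the function name handle_request is just an illustration.

```python
import os
import sys
from urllib.parse import parse_qs


def handle_request(environ, stdin, stdout):
    """Answer one CGI request using only environment variables and streams."""
    method = environ.get("REQUEST_METHOD", "GET")
    if method == "POST":
        # A POSTed form arrives on standard input; CONTENT_LENGTH says how much to read.
        length = int(environ.get("CONTENT_LENGTH") or 0)
        raw = stdin.read(length)
    else:
        # For GET, the form data travels in the query string.
        raw = environ.get("QUERY_STRING", "")
    # parse_qs undoes the URL encoding (%XX escapes and '+' for spaces).
    fields = parse_qs(raw)
    # "name" is a hypothetical form field used only for this example.
    name = fields.get("name", ["world"])[0]
    # A header block, a blank line, then the body, all on standard output.
    stdout.write("Content-Type: text/plain\r\n\r\n")
    stdout.write(f"Hello, {name}!\n")


if __name__ == "__main__":
    handle_request(os.environ, sys.stdin, sys.stdout)
```

Passing the environment and the two streams in as arguments is a design choice for the sketch, not a CGI requirement; it simply makes the script easy to exercise without a server.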
Because a CGI script runs as an independent process from the Web server, it has no language limitations. You can write scripts in C or C++, in Perl or Python, in Java or JCL. CGI scripts are also easy to debug, since you can easily create an environment that simulates the Web server in various states and step through the script in your favorite debugger.
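Simulating the server can be as little as setting a few environment variables and launching the script yourself. The sketch below does that from Python. The stand-in script, the path /cgi-bin/demo, and the helper names run_cgi and demo are all invented for illustration; the environment variable names are the usual CGI/1.1 ones.

```python
import os
import subprocess
import sys
import tempfile

# A stand-in CGI script. Any language would do, since the server only sees a process.
SCRIPT = '''\
import os
print("Content-Type: text/plain")
print()
print("query=" + os.environ.get("QUERY_STRING", ""))
'''


def run_cgi(script_path, query_string=""):
    """Run a script roughly as a server would: set the environment, capture stdout."""
    env = dict(os.environ)
    env.update({
        "GATEWAY_INTERFACE": "CGI/1.1",
        "REQUEST_METHOD": "GET",
        "QUERY_STRING": query_string,
        "SERVER_NAME": "localhost",
        "SERVER_PORT": "80",
        "SERVER_PROTOCOL": "HTTP/1.0",
        "SCRIPT_NAME": "/cgi-bin/demo",  # hypothetical script location
    })
    result = subprocess.run([sys.executable, script_path], env=env,
                            capture_output=True, text=True, check=True)
    return result.stdout


def demo():
    """Write the stand-in script to a temporary file and run it with a fake query."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo_cgi.py")
        with open(path, "w") as fh:
            fh.write(SCRIPT)
        return run_cgi(path, "item=42")
```

Changing the dictionary passed to the script is all it takes to replay a different request, and the same variables can be set in a shell before starting the script under a debugger.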
CGI scripting gives developers' imaginations free rein. CGI scripts can do anything that stand-alone programs can do. They can control external devices, open up network connections, and talk to databases. Do you need a Web page to interface with a 1970s-era tape-punch device? No problem! You can whip up a CGI script to do it.
Properly written CGI scripts are also highly portable. Plenty of scripts written five years ago for now-obsolete Web servers on now-obsolete platforms are running today in completely different environments. The combination of language independence and a simple, widely supported protocol makes CGI portability a cinch.
Contrast the simplicity and flexibility of CGI scripting to some of the alternatives. Web-server APIs rarely give developers a choice of development language, usually limiting them to one or, occasionally, a couple of alternatives. The process of building and installing a new server module can be formidable, and the rich APIs can be hard to learn. Template systems, such as PHP, offer simplicity, but offer no way of stepping outside the envelope. (What do you mean there's no driver for our inventory database?) Even the simplest Java servlet requires more work to get up and running than an equivalent CGI script.
Where CGI fails is in scalability. A CGI script that runs quickly when the server is getting hit 50,000 times a week becomes as slow as a slug on a winter morning when the number of hits goes up a hundredfold. The reason for this dramatic deterioration in performance is the need for the server to spawn a new CGI script process every time it needs to run. This is why there has been so much effort put into creating CGI-like environments that don't require frequent process creation. Technologies that I've talked about in previous columns include FastCGI, a protocol that keeps the script running in a separate process until it is needed; mod_perl, an embedded Perl interpreter for Apache; and Velocigen, an embedded Perl interpreter for Netscape and Microsoft servers. If you find that your CGI scripts just aren't keeping up but you're loath to switch to a radically different technology, one of these development environments is for you.
The other thing that some development environments have and CGI notably lacks is session management. Because the HTTP protocol is stateless, simple CGI scripts have no memory of previous requests nor any notion of a continuous user session that spans multiple requests. Built-in session management is the big (possibly the biggest) selling point of application servers. However, I would take issue with anyone who considers the lack of session management to be a fatal flaw in CGI. Most scripts don't need the overhead or complexity of a full-blown session-management system, and can make do with a few simple tricks for maintaining state. For those scripts that need sophisticated session-management systems, such as applications that are physically spread over redundant servers, there are a number of good session-management packages written for various languages and platforms. My favorite is Apache::Session, written by Jeffrey Baker for use with Apache/mod_perl.
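One of the simplest of those tricks is to carry the state along in a hidden form field, so that each request hands the script whatever it needs to remember. The sketch below is an illustration in Python under that assumption; the field name visits and the function render_page are invented for the example.

```python
from html import escape
from urllib.parse import parse_qs


def render_page(query_string):
    """Return an HTML form that remembers how many times it has been submitted."""
    fields = parse_qs(query_string)
    # State arrives from the previous page's hidden field; start at 0 on a first visit
    # or when the value is not a number.
    try:
        visits = int(fields.get("visits", ["0"])[0])
    except ValueError:
        visits = 0
    visits += 1
    return (
        f"<p>You have loaded this page {visits} time(s).</p>\n"
        '<form method="GET">\n'
        # The hidden field carries the updated state to the next request.
        f'<input type="hidden" name="visits" value="{escape(str(visits))}">\n'
        '<input type="submit" value="Again">\n'
        "</form>\n"
    )
```

Because the browser holds the state, a user can alter it at will. That is fine for a counter, but anything that matters would need to be validated or kept on the server side.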
Standardizing the Standard
One measure of a protocol's health is its rate of evolution. Living protocols change to meet the changing environment, just as the HTTP protocol is changing to meet the requirements of a Web that has become commercially oriented, and the TCP/IP protocol is changing to meet the depletion of IP address space.
Readers may be surprised to hear that the CGI "standard" isn't really a standard at all. The current protocol is a "common practice" adopted from a terse description of CGI version 1.1 published by the NCSA HTTPD team in 1993, and incorporated into its popular Web server.
Two years ago there was an effort to formalize the de facto CGI/1.1 protocol as an Internet Engineering Task Force (IETF) Request for Comments (RFC). However, this effort lost momentum. Then, in the spring of 1998, Ken Coar and David Robinson picked up the torch, and successfully published CGI/1.1 as an Internet Draft. This draft is now available for comment by the community and may go through several revisions before it comes up for review by the IETF (see "Online"). This represents an important milestone in the CGI protocol, because for the first time there is an unambiguous standards document to which server vendors and script writers can adhere. In fact, I found a few minor surprises myself while reading through the draft, because there were some places where my interpretation of "common practice" diverges from the draft's. Without a formal specification, such divergences of interpretation would go unnoticed.
What's New in CGI/1.2?
Having submitted the CGI/1.1 draft, Coar and Robinson, along with a loose group of other volunteers, have begun work on the CGI/1.2 draft. CGI/1.2 seeks to fix some of the problems in the current CGI/1.1 protocol, and add new features and functionality. Nothing is finalized at this point, but the following is a sampling of what might be added to CGI/1.2.
Disentangling Scheme and Protocol. It sounds crazy, but there's no reliable way for a CGI script to reconstruct its own URL for use in self-referencing scripts. This is because of a confusion in the CGI/1.1 specification between the scheme, which is the prefix used for URLs (such as "http:"), and the protocol, which specifies the communications protocol (like "HTTP"). Usually these are the same, but there are some cases in which they're not. The most important example occurs when secure sockets layer (SSL) is in use. The scheme is "https" but the protocol is still HTTP. Under CGI/1.1, scripts have access only to the protocol. This will be fixed in CGI/1.2.
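To see why this is awkward, consider what a script has to do today. The Python sketch below rebuilds a URL from the CGI/1.1 variables SERVER_PROTOCOL, SERVER_NAME, SERVER_PORT, and SCRIPT_NAME, and then guesses at the scheme. The guesswork is an assumption on my part and not part of CGI/1.1: it checks for an HTTPS variable that some servers set, and treats port 443 as a sign of SSL. The function name self_url is invented for the example.

```python
def self_url(environ):
    """Best-effort reconstruction of a script's own URL from CGI/1.1 variables."""
    # SERVER_PROTOCOL looks like "HTTP/1.0"; it names the protocol, not the URL scheme.
    protocol = environ.get("SERVER_PROTOCOL", "HTTP/1.0").split("/")[0].lower()
    port = environ.get("SERVER_PORT", "80")
    # Heuristic only (not defined by CGI/1.1): some servers set HTTPS=on,
    # and port 443 usually means SSL is in use.
    secure = environ.get("HTTPS", "").lower() == "on" or port == "443"
    scheme = "https" if secure else protocol
    default_port = "443" if secure else "80"
    host = environ.get("SERVER_NAME", "localhost")
    # Leave the port out of the URL when it is the default for the scheme.
    netloc = host if port == default_port else f"{host}:{port}"
    return f"{scheme}://{netloc}{environ.get('SCRIPT_NAME', '')}"
```

A script running behind SSL on an unusual port, on a server that sets no HTTPS variable, would still come out as "http", which is exactly the ambiguity described above.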
Defining Meta-Variables for SSL. When SSL is used for encrypting Web communications, it carries with it a lot of interesting information, including such things as the encryption algorithm in use and authentication information from the user's digital signature. However, the CGI/1.1 protocol has nothing to say on the issue of which SSL information the server is obligated to transmit to the script. As a result, every server does it a little bit differently. CGI/1.2 will mend this gap.
Giving Scripts More Control over the Server. CGI/1.2 will probably define a new header field that scripts can emit named script-control. This field will allow the script to send a number of directives directly to the server. One directive that has been proposed so far is no-abort, which prevents the server from aborting the execution of the script while it's doing something delicate, like updating a database.
More HTTP/1.1 Support. HTTP/1.1, which is just becoming widespread, specifies a host of techniques for making Web communications faster and more reliable. However, CGI/1.1 can't deal effectively with some of these techniques. For instance, HTTP/1.1 Web browsers are allowed to submit large POSTings as a stream without declaring the total length of the content data in advance. To transmit this data to CGI/1.1 scripts, servers currently must buffer the entire POST in memory or on disk, using resources and forcing the entire data stream to be received before the CGI script can start working on it. CGI/1.2 may allow this to be handled more efficiently.
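The streaming mechanism in question is HTTP/1.1's chunked transfer coding, in which the body arrives as a series of length-prefixed chunks ending with a zero-length one. The Python sketch below shows, under simplifying assumptions, the buffering a server has to do before a CGI/1.1 script can run. It assumes a well-formed body, ignores trailers, and holds everything in memory; the function names are invented for the example.

```python
import io


def buffer_chunked_body(stream):
    """Read an HTTP/1.1 chunked body to the end and return it as one bytes object."""
    body = bytearray()
    while True:
        size_line = stream.readline()
        # The chunk size is hexadecimal; anything after ';' is a chunk extension.
        size = int(size_line.split(b";")[0].strip(), 16)
        if size == 0:
            break  # a zero-length chunk marks the end of the body
        body += stream.read(size)
        stream.readline()  # consume the CRLF that ends each chunk
    return bytes(body)


def cgi_input_for(stream):
    """Build what a CGI/1.1 script needs: CONTENT_LENGTH plus a stdin to read from."""
    body = buffer_chunked_body(stream)
    # Only now, with the whole body in hand, is the length known.
    return {"CONTENT_LENGTH": str(len(body))}, io.BytesIO(body)
```

The script cannot be started until the last chunk has been read, because CONTENT_LENGTH has to be in its environment from the outset.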
Header Continuation Characters. A minor point, but apparently an issue in some quarters: The CGI/1.1 protocol gives scripts no way to continue long header fields on subsequent lines. CGI/1.2 will define a header continuation character.
Abolishing the NPH Protocol. The No-Parse Header protocol is a variant on CGI/1.1 that lets the script send data to the browser without the server examining and possibly modifying the header. However, NPH becomes problematic when the HTTP/1.1 protocol is involved, and all of the situations where it might be used can be dealt with using a script-control header instead. NPH may disappear in CGI/1.2.
The interest in the development of a CGI/1.2 standard shows that CGI is no fossil. It's also comforting to see that most of the proposed changes to CGI are enhancements and cosmetic improvements. Nobody has suggested a complete overhaul. This tells us that the design of CGI is fundamentally on target, and that the majority of developers are happy with it the way it is.
CGI is dead. Long live CGI!
Lincoln is an M.D. and Ph.D. who designs information systems for the human genome project at Cold Spring Harbor Laboratory in New York. He can be reached at firstname.lastname@example.org.