Web Techniques Magazine
January 1997
Volume 2, Issue 1

Load Testing Intranet Applications

Finding Hidden Bottlenecks

By Jeff Straathof
jstraathof@pureatria.com

Just as telephone lines used to get tied up on Mother's Day, Web sites are prone to traffic overload. According to the February 14, 1996, issue of USA Today, thousands of users per second were turned away from IBM's Web site during the chess competition between Garry Kasparov and Big Blue's chess-playing computer. Another example of a high-profile cyberevent that left Web surfers high and dry was the 1996 Super Bowl Web site, which turned away millions of would-be users.

Web-server bandwidth is a problem not only for Internet Web sites, but also for corporate intranets, which often function as the backbone of a company's operations. A recent report from Zona Research (Redwood City, CA) makes some startling predictions about the growth of the intranet marketplace: by the end of 1998, the intranet market will generate $8 billion in revenue, almost four times that of the Internet.

Businesses everywhere are launching internal applications on the World Wide Web, using it as a channel to reach scores of employees in moments, and these applications must deliver quality, performance, and scalability. Test engineers must measure how fast Web-system components work together so they can determine what kind of workload their Web servers can withstand: how many simultaneous hits the Web site can handle, and how that load affects quality and performance.

Load Testing

A load test emulates user activity and analyzes the effect of the real-world user environment on an application. By load testing a Web application throughout development, IS departments can identify problematic parts of a Web application before it is accessed by hundreds or thousands of users. Load testing can offer proof that an important intranet or Internet application will work properly, and not produce work stoppages or unacceptably poor performance. In almost all cases, load testing uncovers fatal errors that likely would have led to a system crash had deployment proceeded without testing.

Benefits of Load Testing

By simultaneously emulating multiple users, load testing first determines whether an intranet application can support its intended workload of employees. Using one driver machine to simulate hundreds or thousands of users, load testing creates scripts that represent actual users and their daily, often disparate, operations. With a capture agent, load testing records user activities (including keystrokes, mouse movements, and HTTP requests) to create emulation scripts. Then, it plays back a mix of scripts representing a set of real users.
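The capture-and-replay idea can be sketched in miniature. Assuming a hypothetical captured-script format (a list of think-time/method/path entries such as a capture agent might record), replay simply walks the list, pausing for each recorded think time before issuing the request; the transport is pluggable, so a stub stands in for a real HTTP client here:

```python
import time

# Hypothetical captured script: (think-time seconds, HTTP method, path).
# A real capture agent would record these from live browser traffic.
CAPTURED_SCRIPT = [
    (0.0, "GET", "/expenses/form"),
    (0.5, "POST", "/expenses/submit"),
    (0.2, "GET", "/expenses/confirm"),
]

def replay(script, send, think_scale=1.0):
    """Replay a captured script, honoring recorded think times.
    `send` issues the request (a real HTTP client in practice,
    a stub here), so replay works without a live server."""
    responses = []
    for think, method, path in script:
        time.sleep(think * think_scale)   # emulate the user's pause
        responses.append(send(method, path))
    return responses

log = []
def fake_send(method, path):   # stub standing in for a real HTTP client
    log.append((method, path))
    return 200

statuses = replay(CAPTURED_SCRIPT, fake_send, think_scale=0.0)
print(statuses)  # [200, 200, 200]
```

Scaling the think times to zero, as above, is handy for smoke-testing the script itself; a realistic load run would leave them at full scale.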

Load testing also charts the time a visitor to the site has to wait for browser responses. It finds hidden bugs and bottlenecks and gives developers the chance to correct them before the site goes into production. All hardware, software, and database vendors boast of the speed of their products, but load testing discloses how fast those products work within a unique environment, for primary transactions, during peak business hours.

Furthermore, load testing checks and maintains applications as their workloads increase, so systems can be adjusted accordingly. As businesses grow and change, it is important to confirm a Web site's ability to sustain growth. Developers can reuse scripts to alter usage levels, transaction mixes and rates, and application complexity. Load testing is the only way to verify the scalability of components working together.

Hardware versus Software

Load testing can be implemented in either hardware or software. Hardware-based ("multi-PC") load testing requires one PC for each user emulated, while software-based ("virtual-PC") load testing emulates many users from one driver machine. Hardware-based load testing is practical for companies that want to run small tests (under 20 users) and own the necessary machines and test lab. You can create load tests by building on top of existing scripts for GUI test packages (utilities that run in single-user mode to detect interface errors). This configuration is relatively simple: Each PC can run a copy of the GUI test script over the network and simultaneously stress the server.

Software-based load-testing tools require a capture or recording agent that records real-world user activity (including HTTP transactions) into script format. These scripts are then executed from a single driver machine, which behaves as if it were an arbitrarily large number of PCs, load testing the server. This method is cost effective, especially for large numbers of users. It's easy to change the script content and/or the number of scripts, which simplifies checking the scalability of your Web site as it grows.
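The virtual-PC approach can be illustrated with ordinary threads, each thread standing in for one emulated user driven from the same machine. The script body here is a counting stub rather than real HTTP traffic, purely to keep the sketch self-contained:

```python
import threading

def run_virtual_users(n_users, iterations, user_script):
    """Drive n_users concurrent virtual users from a single driver
    machine; each thread plays its script `iterations` times."""
    def worker(user_id):
        for _ in range(iterations):
            user_script(user_id)
    threads = [threading.Thread(target=worker, args=(i,))
               for i in range(n_users)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

# Stub script: count emulated hits instead of sending real requests.
hits = []
lock = threading.Lock()
def stub_script(user_id):
    with lock:
        hits.append(user_id)

run_virtual_users(n_users=50, iterations=4, user_script=stub_script)
print(len(hits))  # 50 virtual users x 4 iterations = 200 hits
```

Changing the number of emulated users is a one-argument edit, which is exactly the scalability-checking convenience the software-based approach offers.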

A combined hardware/software-based solution allows you to drive both virtual users and client PCs from one control monitor using the same scripts. This ideal test configuration allows simultaneous measurement of both client and server, as well as easy detection of network bottlenecks.

Canned Benchmarks Versus Load-Testing Tools

It's possible to take load testing one step further and create hypothetical scenarios of application usage. This approach is preferable to canned benchmarks, such as WebStone, which use predetermined workloads to stress-test an application and compare the relative performance of different Web servers. Because they do not reflect actual user behavior on a site, canned benchmarks are useful for comparing one system or configuration to another, but don't test how a system behaves under the load of real users.

Load-Testing Steps

Load testing consists of four basic steps:

Planning. This step can itself be divided into four parts:

1. Define and prioritize your goals. An example of a goal might be to demonstrate scalability, that is, to find the maximum number of concurrent hits that meet the requirements of acceptable response time and minimum throughput. In such a test, response times are reported as a function of the number of concurrent hits. Another example is to determine "breaking points," that is, the outright failures of the hardware, software, operating system, database server, or other system components, including failures due to insufficient memory or similar resources. A breaking point should be based on response time and/or throughput. For example, the breaking point for a short query might be ten times worse than the acceptable response time, whereas the breaking point for a long query might be two times worse than the acceptable response time.

2. Determine the number and size of your database(s). In creating test databases, you must choose between real-life and artificially created databases. Choose the one that gives you the most realistic test database with a reasonable amount of effort.

3. Determine whether your databases need refreshing between tests. If so, to what degree? The optimal case requires the least refreshing for the most testing. Database refreshes between tests can be time consuming, especially with large databases; often, refreshing can take more time than the actual test.

4. Define the application workload. For example, a bank wants to test its intranet application, so the company creates ten different scripts of typical user activity. Script 1 might be an employee submitting an expense report; script 2 might be an employee searching for information about medical insurance; and so on. In the end, the bank might have ten scripts, each representing a common user activity. Script 1 might account for 20 percent of the calls, whereas the other nine scripts comprise the other 80 percent. In this way, the bank would design a realistic load to test its internal Web application.
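The bank's workload definition maps naturally onto weighted random selection of scripts. The script names and percentages below are the hypothetical ones from the example, with the remaining 80 percent split evenly among the other nine scripts for illustration:

```python
import random

# Hypothetical mix: script 1 (expense reports) gets 20 percent of the
# calls; the other nine scripts share the remaining 80 percent evenly.
SCRIPTS = ["submit_expense_report"] + [f"script_{i}" for i in range(2, 11)]
WEIGHTS = [20] + [80 / 9] * 9

def pick_script(rng):
    """Choose the next script to replay according to the workload mix."""
    return rng.choices(SCRIPTS, weights=WEIGHTS, k=1)[0]

# Over many draws, about one call in five is the expense-report script.
rng = random.Random(42)
draws = [pick_script(rng) for _ in range(10_000)]
share = draws.count("submit_expense_report") / len(draws)
print(round(share, 2))  # roughly 0.20
```

Each virtual user would call a selector like this at the top of its loop, so the aggregate traffic approximates the real mix of employee activity.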

Building the scripts. Scripts are built using a recording agent that captures user activity and HTTP transactions from any browser. The recording agent captures not only typing and mouse movements, but also the time it takes for the employee to think and pause. More important, the recording agent captures the HTTP transactions between the PC and server and measures the time required for the numerous connects and disconnects. It emulates users at SLIP and PPP connection speeds, even when driving users from a machine on the same network as the server under test.

Without the recording agent, you would have to create scripts by hand, and since there is no foolproof method to recreate HTTP transactions generated by today's development tools, there would be no guarantee that the test was accurate, making the results meaningless.

Generally, load-testing tools require some programming knowledge (usually C) to modify scripts that emulate complex environments. However, advanced products offer libraries of commands to simplify script modification. For example, reading from files of shared input should be simple for the script writer.

Once the scripts are captured, they must be transferred to a driver machine for compilation and execution. At this point, you can alter them for realism, adding functions, loops, and think delays. You might also adjust typing and mouse-pointer delays, and utilize global variables and fixed-throughput pacing. You can add looping to make a single captured activity act like many, branching to randomize function order and change data so your emulated users don't make identical queries and updates. User entries can be substituted by generating random values, sharing pools of input on the driver, accessing data returned from the application under test, or passing data between scripts. With simple programming changes, the replayed stream represents exactly the requests generated by multiple users continuously operating a Web application.
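A hand-modified script of the kind described above might loop, draw from a shared pool of input on the driver, randomize its data, and insert think delays, so emulated users don't issue identical queries. Everything here is illustrative: the account pool, the request paths, and the stub transport are all invented:

```python
import random
import time

# Shared pool of input on the driver (in practice, read from a file).
ACCOUNT_POOL = ["1001", "1002", "1003", "1004"]

def parameterized_script(send, rng, loops=3, think=0.0):
    """One virtual user's session: loop, vary the data, pause."""
    issued = []
    for _ in range(loops):                  # looping: one capture acts like many
        account = rng.choice(ACCOUNT_POOL)  # draw from the shared input pool
        amount = rng.randint(10, 500)       # randomized data per iteration
        time.sleep(think)                   # think delay between operations
        issued.append(send("POST",
                           f"/expenses/submit?acct={account}&amt={amount}"))
    return issued

sent = []
def stub_send(method, path):   # stub standing in for a real HTTP client
    sent.append((method, path))
    return 200

parameterized_script(stub_send, random.Random(1))
print(len(sent))  # 3 looped requests, each built from freshly drawn data
```

Swapping the stub for a real client and raising `loops` and `think` turns this sketch into the continuously operating emulated user the paragraph describes.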

Replaying the scripts. This step determines the effect of the emulated workload on the system. Replay is started at the touch of a button and can be configured to slowly increase the number of users on the system. The driver machine executes the scripts, which stresses the server.
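Slowly increasing the load can be sketched as a staged schedule: start a batch of virtual users, hold for an interval, then add the next batch. The stage sizes and interval below are arbitrary illustrations, and the user script is a counting stub:

```python
import threading
import time

def ramp_up(stages, stage_interval, user_script):
    """Start virtual users in batches: `stages` is a list of batch
    sizes, e.g. [10, 10, 30] ramps from 10 to 20 to 50 users."""
    threads = []
    for batch in stages:
        for _ in range(batch):
            t = threading.Thread(target=user_script)
            t.start()
            threads.append(t)
        time.sleep(stage_interval)   # hold before the next increase
    for t in threads:
        t.join()

started = []
lock = threading.Lock()
def stub_user():
    with lock:
        started.append(1)

ramp_up([5, 5, 10], stage_interval=0.01, user_script=stub_user)
print(sum(started))  # 20 users in total, added in three stages
```

Recording response times at each stage then shows directly how performance degrades as the population grows.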

Analysis. With graphs and charts depicting the performance of an application under the weight of many users, test engineers spot bottlenecks and system slowdowns. Why did the application work perfectly with 50 users, but break with 51? Why did the application perform to expectations with 100 users, but take a plunge with 200?
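One way to read those charts programmatically: given the measured response times at each concurrency level, the maximum supportable workload is the largest level still meeting the acceptable response time, and the breaking point is where responses blow past a multiple of it (ten times, as the planning discussion suggested for a short query). The data and thresholds below are invented:

```python
def find_limits(measurements, acceptable, breaking_factor=10):
    """measurements: list of (concurrent_hits, response_seconds),
    sorted by concurrency. Returns (max_ok_load, breaking_point);
    either may be None if that limit was never reached."""
    max_ok = None
    breaking = None
    for hits, resp in measurements:
        if resp <= acceptable:
            max_ok = hits                      # still within spec
        if breaking is None and resp > acceptable * breaking_factor:
            breaking = hits                    # first outright failure
    return max_ok, breaking

# Hypothetical results for a short query, 2-second acceptable response.
data = [(10, 0.4), (50, 1.1), (100, 1.9), (150, 6.0), (200, 25.0)]
limits = find_limits(data, acceptable=2.0)
print(limits)  # (100, 200): fine at 100 users, broken by 200
```

The interesting engineering work starts between those two numbers: the 150-user measurement above is past acceptable but short of breaking, which is exactly the degradation zone worth investigating.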

Because load-testing scripts can be saved, a test engineer can stress an application any time a piece of the application changes. This is particularly useful after there is a new release of the operating system, changes to hardware, or upgrade revisions of the application with new functionality or bug "fixes."

Note that while response-time charts can demonstrate application performance, there is no substitute for practical human feedback, so consider inviting a few "live" users to browse the Web site being tested. These users frequently provide valuable insights that no report can offer.

What to Look For

Load-testing tools for the Web should be compatible with other testing products. Find technology efficient enough to emulate the number of users you expect; you don't want the driver itself to become a bottleneck because the testing tool has consumed all of its memory and CPU. As the Web market grows, testing grows as a component of Web-site development, so keep the entire process as simple as possible.


Jeff is a vice president of Pure Atria Corp. Prior to joining the company in 1995, he was president of Performix Inc., a leading provider of load-testing software. He can be reached at jstraathof@pureatria.com.
Copyright Web Techniques. All rights reserved.