SOAP: Clean and Secure
By Paul Sholtz
Web services offer an innovative approach to distributed computing, in which software components are designed and built as discrete services, deployed globally on the Internet, and then dynamically assembled into complex distributed applications. If the hype becomes reality, such services will make life easier not only for developers but for end consumers as well.
For example, say you're on a flight from Boston to L.A. when a massive storm breaks out in the Midwest. The pilot has to fly two hours out of the way to get around it, and now you're at risk of losing your car rental and hotel reservation because you won't arrive on time. Luckily, your airline offers Web services updates, so everyone involved can be notified instantly of your new arrival time. Your car will still be waiting for you, you'll still have a clean hotel room to sleep in, and your business associates might even be pinged on their BlackBerry pagers.
It sounds nice. Advances in information technology are supposed to make our lives simpler and easier. But dig a little deeper, and you'll quickly uncover several important privacy issues. To accurately respond to your late arrival, the Web services infrastructure needs to know many details about who you are, where you are (in real time), and what you're about to do. Service providers must operate this infrastructure with an extraordinarily high level of integrity and in a way that promotes customer trust. The model even faces potential legal difficulties: For reasons of public safety, most countries prohibit airlines from publicly disclosing passenger flight lists. A breach in the integrity of Web services here could lead not just to downtime and lost revenue, but to loss of life through terrorist acts or other criminal wrongdoings.
As more organizations deploy infrastructure based on Web services, certain privacy risks inherent in this model will begin to surface. Microsoft learned this lesson the hard way during its recent announcement of HailStorm, a group of Web services through which the company plans to collect, indirectly, as much personally identifying information on consumers as possible. To its credit, Microsoft was quick to see how critical personal information would be to delivering an end-to-end Web services platform. However, HailStorm faces tough scrutiny from privacy, security, and legal experts. Many critics question Microsoft's ability to operate such an infrastructure reliably.
Simple Object Access Protocol
Services like the hotel and car rental notification agent can be made possible with the Simple Object Access Protocol. SOAP is a method invocation protocol that lets participating computers exchange instructions and information in a manner that's independent of platform or vendor. SOAP encodes information requests and responses using XML. The messages can be run over standard Internet transport protocols, such as HTTP or SMTP. Unfortunately, SOAP makes no provision for security, leaving important decisions regarding data privacy in the hands of application developers.
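To make the idea of an XML-encoded method call concrete, here is a minimal Python sketch that wraps a call in a SOAP 1.1 envelope and prepares the HTTP headers a client might send with it. The service namespace, the UpdateArrival method, and its parameter names are hypothetical, invented for the flight-delay scenario; only the envelope namespace and the text/xml content type come from SOAP 1.1 itself.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
SERVICE_NS = "urn:example:flight-updates"  # hypothetical service namespace


def build_request(flight: str, new_arrival: str) -> bytes:
    """Wrap a method call and its parameters in a SOAP envelope."""
    ET.register_namespace("SOAP-ENV", SOAP_ENV)
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    # Hypothetical method name; a real service would publish its own.
    call = ET.SubElement(body, f"{{{SERVICE_NS}}}UpdateArrival")
    ET.SubElement(call, "flight").text = flight
    ET.SubElement(call, "newArrival").text = new_arrival
    return ET.tostring(envelope, encoding="utf-8")


def http_headers(payload: bytes) -> dict:
    """Headers a SOAP 1.1 client would typically send with an HTTP POST."""
    return {
        "Content-Type": "text/xml; charset=utf-8",
        "Content-Length": str(len(payload)),
        "SOAPAction": f'"{SERVICE_NS}#UpdateArrival"',
    }
```

Note that nothing in the envelope says who is calling or whether the payload may be read in transit. Those decisions are left entirely to the application.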
From a security standpoint, one of the most frustrating aspects of SOAP is how effectively it evades perimeter security devices such as firewalls. A firewall filters traffic from potentially untrusted sources by shutting off access to and from certain application-specific ports outside the network. For performance reasons, ports that are used by relatively benign applications like Web servers are generally left open by firewall administrators and are subject to few screening rules. Ports that are used by more powerful and potentially dangerous applications are carefully screened or shut down completely. SOAP undermines this trust model because it lets any distributed application make potentially dangerous method invocation calls directly over highly trusted ports, including 80 (the standard port used by most Web servers).
Because SOAP is a more sophisticated use of HTTP than HTML page display, it requires a much more complex security model than HTTP was originally designed to provide. For now, it's best to configure the firewall to screen all HTTP POSTs for content of type text/xml and then process them according to special rules.
Example 1 shows a sample SOAP HTTP header and the fields it contains. Administrators may also be able to configure filtering rules based on HTTP header fields containing the word SOAP.
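The screening rule described above can be sketched as a small predicate. This is an illustration in Python, not the rule syntax of any particular firewall product; the function name is made up, and treating any header whose name contains "soap" as a signal (SOAP 1.1's SOAPAction header is one such field) is an assumption of this sketch.

```python
def needs_soap_screening(method: str, headers: dict) -> bool:
    """Return True when a request looks like SOAP and deserves special rules."""
    # HTTP header names are case-insensitive, so normalize them first.
    normalized = {name.lower(): value for name, value in headers.items()}

    # Rule 1: an HTTP POST whose content type is text/xml.
    media_type = normalized.get("content-type", "").split(";")[0].strip().lower()
    is_xml_post = method.upper() == "POST" and media_type == "text/xml"

    # Rule 2: any header field whose name contains the word SOAP.
    has_soap_header = any("soap" in name for name in normalized)

    return is_xml_post or has_soap_header
```

A request flagged this way would then be passed to whatever stricter rule set you maintain for method invocation traffic, rather than waved through with ordinary page requests.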
Firewalls, of course, are only your first line of defense. A comprehensive security model also authenticates, authorizes, and logs the activity of users and business processes. Because Web servers often act as method invocation endpoints for SOAP, administrators can use techniques with which they're already familiar when designing security for Web services. For example, Apache includes a module called mod_access for limiting access to Web resources by IP address or Internet domain. You're probably already using this so that certain parts of your Web site are accessible only by clients inside your company's network. The same technique can be used to restrict access to Web services based on network address. Whenever possible, you should configure mod_access to use IP addresses instead of domain names, because this will keep Apache from getting bogged down with slow DNS lookups.
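The address-based restriction is simple enough to express in a few lines. The sketch below is Python, not mod_access configuration; it only illustrates the same allow-by-network logic, and the two network ranges are placeholder values you would replace with your own.

```python
import ipaddress

# Placeholder ranges standing in for "clients inside your company's network".
ALLOWED_NETWORKS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("192.168.1.0/24"),
]


def is_allowed(client_ip: str) -> bool:
    """Allow a Web services call only from an address in a trusted network."""
    address = ipaddress.ip_address(client_ip)
    # Matching on addresses needs no DNS lookup, which is why IP rules stay fast.
    return any(address in network for network in ALLOWED_NETWORKS)
```

Because the check works on numeric addresses alone, it never waits on a name server, which is the same reason to prefer IP addresses over domain names in the Apache rules.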
More powerful access control is possible if you require clients to present evidence of their identity, like a username and password. One way to do this is to use the authentication services built directly into HTTP. Most Web servers support both basic and digest authentication. Digest is considerably more secure: basic authentication sends the username and password in a trivially reversible Base64 encoding, while digest sends only an MD5 hash computed from the credentials and a server-supplied nonce, so the password itself never crosses the wire.
Example 2 demonstrates how HTTP headers can contain authentication credentials. You can configure Apache to use either authentication method by setting the AuthType directive in the httpd.conf file. If you choose this security policy, clients using Web services will have to know to include authentication credentials directly in the HTTP request, just as a Web browser would.
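For a client that is a program rather than a browser, including the credentials means computing the Authorization header values yourself. The following Python sketch shows both calculations as defined in RFC 2617 (digest with qop=auth). The function names are my own, and a real client would also have to parse the server's challenge and assemble the full header, which is omitted here.

```python
import base64
import hashlib


def basic_authorization(username: str, password: str) -> str:
    """Basic: the credentials are only Base64-encoded, not protected."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8"))
    return "Basic " + token.decode("ascii")


def _md5_hex(text: str) -> str:
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def digest_response(username: str, realm: str, password: str, method: str,
                    uri: str, nonce: str, nc: str, cnonce: str) -> str:
    """Digest (qop=auth): only a hash tied to the server's nonce is sent."""
    ha1 = _md5_hex(f"{username}:{realm}:{password}")  # secret part
    ha2 = _md5_hex(f"{method}:{uri}")                 # request part
    return _md5_hex(f"{ha1}:{nonce}:{nc}:{cnonce}:auth:{ha2}")
```

Anyone who captures a basic header can decode it back to the password in one step. A captured digest response is bound to a single nonce and request, which is what makes it the better choice when SSL is not in use.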
A craftier way to pass credentials would be to encode them directly into XML fields within the SOAP message itself. This technique requires the client and server to agree on a common schema for understanding how authentication credentials will be represented in XML. It also requires you to take precautions to ensure data confidentiality, preferably by encrypting XML elements that contain sensitive information. One promising approach is the XML Encryption standard that's being developed by the W3C XML Encryption Working Group. XML Encryption is still a work in progress, but it defines ways to encrypt XML documents at a wide range of granularities.
Example 3 shows sample XML documents before and after XML Encryption has been applied. At lists.w3.org/Archives/Public/xml-encryption, you can find more information on XML Encryption.
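Encrypting at element granularity can be sketched as follows: one sensitive child element is serialized, encrypted, and replaced by an EncryptedData element, while the rest of the document stays readable. The namespace and the EncryptedData, CipherData, and CipherValue names follow the W3C XML Encryption syntax. Everything else is an assumption of this sketch: the function name is mine, and the cipher is deliberately left as a caller-supplied function because Python's standard library does not include one. Key information and algorithm identifiers, which a complete document would carry, are omitted.

```python
import base64
import xml.etree.ElementTree as ET
from typing import Callable

XMLENC = "http://www.w3.org/2001/04/xmlenc#"


def encrypt_element(parent: ET.Element, tag: str,
                    encrypt: Callable[[bytes], bytes]) -> None:
    """Replace parent's child <tag> with an EncryptedData element in place."""
    target = parent.find(tag)
    if target is None:
        raise ValueError(f"no <{tag}> child to encrypt")

    # Keep trailing whitespace outside the ciphertext.
    tail, target.tail = target.tail, None
    plaintext = ET.tostring(target, encoding="utf-8")

    index = list(parent).index(target)
    parent.remove(target)

    # Type="...#Element" marks the ciphertext as a whole serialized element.
    wrapper = ET.Element(f"{{{XMLENC}}}EncryptedData", {"Type": XMLENC + "Element"})
    cipher_data = ET.SubElement(wrapper, f"{{{XMLENC}}}CipherData")
    value = ET.SubElement(cipher_data, f"{{{XMLENC}}}CipherValue")
    value.text = base64.b64encode(encrypt(plaintext)).decode("ascii")
    wrapper.tail = tail
    parent.insert(index, wrapper)
```

In practice the encrypt argument would wrap a vetted cipher from a cryptography library, with the key agreed on out of band. The point of the sketch is only the granularity: a credit card element can be hidden while the customer's name in the same message remains in the clear.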
If this description is too complicated, or you don't want to wait for the standards to be fully defined, you can still secure the entire SOAP message by encrypting it with SSL. Nearly every Web server on the market today supports SSL, but the choices for Apache can be a little confusing because there are two similar options: installing and configuring the mod_ssl package, and upgrading to the Apache-SSL server.
mod_ssl provides strong cryptography services for Apache 1.3 Web servers by using the SSL (v2 or v3) and the Transport Layer Security (TLS) protocols. The package integrates with Apache much as any other module (for example, mod_perl or mod_access) does, with the important exception that it requires the Apache Extended API code set. If you're applying mod_ssl directly to the Apache source tree, this should be taken care of automatically. It may, however, be an issue for vendors who want to build separate packages for Apache and mod_ssl. When configuring mod_ssl, remember that although HTTPS can run on any port, the standard defines the default as port 443.
Apache-SSL is a stand-alone secure Web server based on Apache and the SSLeay/OpenSSL libraries. Apache-SSL and mod_ssl are two different (and mutually exclusive) ways of accomplishing the same goal: securing Web transactions with SSL. The mod_ssl code was originally based on the Apache-SSL project. However, there are important functional differences, and you should carefully read the documentation for each to decide which best suits your requirements. For example, if you make extensive use of Apache and have already integrated several other Apache packages into your infrastructure, it may prove easier to install mod_ssl than to upgrade your entire infrastructure to Apache-SSL.
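Whichever server-side option you pick, the client's job is the same: open an encrypted connection before the SOAP message is sent. Here is a minimal Python sketch using only the standard library. The function names and the 10-second timeout are arbitrary choices of mine; the default port comes from the standard library's HTTPS_PORT constant, which is 443.

```python
import http.client
import ssl


def make_context() -> ssl.SSLContext:
    """Default client context: verifies the server certificate and host name."""
    return ssl.create_default_context()


def open_soap_connection(host: str,
                         port: int = http.client.HTTPS_PORT
                         ) -> http.client.HTTPSConnection:
    """Prepare an HTTPS connection; nothing is sent until a request is made."""
    return http.client.HTTPSConnection(host, port, context=make_context(), timeout=10)


def post_soap(conn: http.client.HTTPSConnection, path: str,
              envelope: bytes) -> http.client.HTTPResponse:
    """POST a SOAP envelope; the entire message travels inside the SSL/TLS channel."""
    headers = {"Content-Type": "text/xml; charset=utf-8"}
    conn.request("POST", path, body=envelope, headers=headers)
    return conn.getresponse()
```

This protects the whole message in transit, headers and credentials included, but only between the two endpoints of the connection. Once the server decrypts the request, the XML is in the clear again, which is the gap element-level encryption is meant to close.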
Web services require developers to consider privacy and security very carefully. However, service-based computing also opens the door for companies to introduce more transparency into their information handling practices, thereby holding themselves accountable and improving the trust they have with their customers. A Web services platform should inspire you not just to tighten your security, but also to give your customers more control over the way their personal information is used. (For related information, read "Code Signing in Java.")
The problem of consumer data privacy essentially boils down to a question of context. Companies expose themselves to risks whenever they collect information for one purpose and use it for something completely unrelated. For example, one customer may feel that it's all right for a business to use his or her personal data for services such as payment processing, order fulfillment, and even Web site personalization. Yet, the customer might not be so comfortable if he or she knew the same data would be used for services like outbound marketing campaigns or shared with an insurance company. Another customer may have a totally different set of preferences about how personal data should be used. Thus, a customer's sensitivity to privacy violations changes with the context in which the information is used, and these preferences are different for every customer. Segmenting your computing infrastructure according to context (by organizing it around high-level business services) is an excellent way to ensure that personal information is used only for purposes that are consistent with customer preferences, thereby reducing your privacy risk.
Imagine a Web page for a hypothetical financial institution. It lists the services offered by the institution, and the consumer can opt in to or opt out of each one. Naturally, some services will be mandatory (for instance, the customer can't open a bank account unless he or she consents to receiving bank statements in the mail). But as the idea demonstrates, Web services can make it easier for companies to offer opt-in and opt-out capabilities to customers based on the services provided by the enterprise.
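Behind such a page, the per-service preferences could be kept in a structure as simple as the one sketched below in Python. The class, the service names, and the choice of account_statements as the one mandatory service are all hypothetical, chosen to mirror the bank example. The idea is that every service checks the customer's record before it touches personal data.

```python
from dataclasses import dataclass, field

# Hypothetical: services a customer must accept to hold an account at all.
MANDATORY = frozenset({"account_statements"})


@dataclass
class ConsentRecord:
    """One customer's opt-in choices, keyed by business service."""
    customer_id: str
    opted_in: set = field(default_factory=set)

    def opt_in(self, service: str) -> None:
        self.opted_in.add(service)

    def opt_out(self, service: str) -> None:
        if service in MANDATORY:
            raise ValueError(f"{service} is mandatory and cannot be declined")
        self.opted_in.discard(service)

    def permits(self, service: str) -> bool:
        """A service may use personal data only if this returns True."""
        return service in MANDATORY or service in self.opted_in
```

Because each check names the service making the request, the context in which data is used is explicit at every call, and two customers with different preferences are handled by the same code.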
Though still an emerging technology, Web services hold a great deal of promise. Many important standards have yet to be hashed out, and there are still substantial security risks involved in deploying a computing infrastructure based entirely on Web services. On the other hand, service-based computing is a new business paradigm that could be just what's needed to spur the industry into proactively managing privacy risks and fueling electronic commerce growth. The information economy of the twenty-first century depends critically on the efficient use of personal information, and companies that can enhance and enable customer privacy will have a sustainable, long-term competitive advantage.
Paul is the cofounder and CTO of PrivacyRight, a San Mateo, CA developer of enterprise privacy products. Contact him at firstname.lastname@example.org.