Debugging XML Applications
By Michael Floyd
When developing XML applications, it's often necessary to use several technologies to manage data. As a developer, you must first design a schema to represent the data, use markup to describe it, DTDs to validate it, XSL to transform and present it, and the DOM to access and modify it programmatically. In a client-server arrangement, you also have to deal with the technologies of the existing infrastructure.
Because there are so many factors in a single XML application, it's not always easy to tell where things went wrong when a document doesn't display correctly. This is particularly true when generating XML dynamically on a server. Without a systematic approach, you can spend hours debugging the program. So what can you do to locate the problem quickly and solve it?
Debugging a static XML document is relatively straightforward. All you need to do is run the document through a parser, which will point out any errors in the markup. However, when debugging applications that generate XML on the fly, your chances of finding the errors in the document are slim. Because the document lives (and dies) in memory, you never actually get to see the XML that's being generated. There's no physical XML file to run through a parser.
In this case (actually, in any case), the first thing you want to do is ensure that your XML document is well formed. If the data being generated isn't well formed, the parser fails to produce any output, and if your application is supposed to produce a resulting HTML document, you'll typically end up with a blank screen instead. If this happens and you're not sure whether your document is well formed, you can turn on the parser's validation option and redirect any error messages back to your Web browser. And if your XML documents have DTDs associated with them, you can embed a minivalidator in your application.
Listing 1 presents the code that does exactly that. The key step is to set the parser's validateOnParse property to true. Once this property has been set, you can load the document and let the parser do its job. If the document is well formed, you won't see any messages. However, if there's a problem, Listing 1 uses the parseError property to get the error message and the line number on which the error occurred. You can then send all of this information back to the browser.
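The same catch-and-report pattern can be sketched outside ASP. The following Python example is an analogue, not Listing 1: it uses the standard library's ElementTree, which checks well-formedness only (no DTD validation), and the helper name check_well_formed is hypothetical.

```python
import xml.etree.ElementTree as ET


def check_well_formed(xml_text):
    """Return None if xml_text is well formed, else (line, column, message).

    Like the error report in Listing 1, the point is to capture where the
    parse failed so it can be shown to the developer instead of a blank page.
    """
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        line, column = err.position  # line is 1-based, column is 0-based
        return line, column, str(err)
    return None


# A raw ampersand on line 2 makes this document malformed.
problem = check_well_formed("<catalog>\n<tool>Near & Far</tool>\n</catalog>")
if problem:
    line, column, message = problem
    print("XML error on line %d, column %d: %s" % (line, column, message))
```

In a server setting, the printed message would instead be written into the response or a log, which is the role the parseError details play in the article.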
If your document references a DTD or schema, the parser also validates the document. One thing to note is that you should have the parser first check that the document is well formed (and correct any errors found at that stage) before you try to validate it. To do this, temporarily remove the <!DOCTYPE> declaration and any reference to the DTD during the first stage. Only when the document passes the well-formedness test should you move on. At that point, you can reintroduce the <!DOCTYPE> declaration and run the document through the parser again for validation. Again, any errors are reported back to the browser. Fix 'em, and move on!
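The first of the two stages can be sketched in Python as follows. This is an illustration under stated assumptions, not the article's code: strip_doctype and is_well_formed are hypothetical helpers, and the regular expression only handles ordinary DOCTYPE declarations, with or without an internal subset.

```python
import re
from xml.dom import minidom
from xml.parsers.expat import ExpatError

# Matches a <!DOCTYPE ...> declaration, with or without an internal subset
# in square brackets. Assumption: no "]" followed by ">" appears inside the
# internal subset's comments or literals.
DOCTYPE_RE = re.compile(r"<!DOCTYPE[^\[>]*(\[.*?\])?\s*>", re.DOTALL)


def strip_doctype(xml_text):
    """Stage one: drop the DOCTYPE so only well-formedness is tested."""
    return DOCTYPE_RE.sub("", xml_text, count=1)


def is_well_formed(xml_text):
    """Return True if the text parses; no DTD validation happens here."""
    try:
        minidom.parseString(xml_text)
    except ExpatError:
        return False
    return True


doc = ('<?xml version="1.0"?>\n'
       '<!DOCTYPE catalog SYSTEM "catalog.dtd">\n'
       '<catalog><tool>Amaya</tool></catalog>')
print(is_well_formed(strip_doctype(doc)))
```

Only after the stripped text passes would you hand the original, DOCTYPE included, to a validating parser for stage two. Python's standard library doesn't include a DTD validator, so that stage would need a separate tool.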
If you're still having problems after you've ensured that your application is generating well-formed, valid XML, the next step is to check your style sheet. Run it through a parser to ensure that it's well formed. One quick way to do this is to load the style sheet file directly in Internet Explorer 5. If the style sheet isn't well formed, IE will quickly point out the error.
If it turns out that your style sheet is well formed but you're still having problems, compare your XML document's structure to the style sheet. A common problem with style sheets is that the templates don't correctly traverse the document tree. As a result, you may not be selecting the element or attribute you think you're selecting. To resolve this, you must be able to examine the structure of your document. Even with dynamically generated documents, this is easy to do if you have a formal DTD or schema. Unfortunately, if you're not using a DTD, the only record of the structure is the document itself. How do you view a dynamically generated document that exists only in memory? Write it out to a file by calling the document object's save method and passing it a file path.
You can, of course, name the file anything you want.
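For comparison, a rough Python equivalent uses minidom's writexml. This sketch assumes the document is already a minidom Document; dump_dom is a hypothetical helper and dump.xml an arbitrary file name. The indentation options make the dump easier to read, though they can add blank lines if the document already contains whitespace-only text nodes.

```python
import os
import tempfile
from xml.dom import minidom


def dump_dom(doc, path):
    """Write an in-memory DOM document to a file, indented for reading."""
    with open(path, "w", encoding="utf-8") as out:
        doc.writexml(out, addindent="  ", newl="\n", encoding="utf-8")


# Stand-in for a document that would normally exist only in memory.
doc = minidom.parseString(
    "<catalog><tool><prodName>Amaya</prodName></tool></catalog>")
dump_path = os.path.join(tempfile.gettempdir(), "dump.xml")
dump_dom(doc, dump_path)
print("Wrote", dump_path)
```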
Converting Database Data
Let's see how you might put these general strategies into practice. In the May issue, I showed you how to access raw data from a back-end database using ASP, how to mark the resulting data with XML tags, and how to apply a style sheet to the dynamically generated document.
You may recall that the example application in that article let users search a catalog of XML tools. The results of the search were presented in a table that listed the name of the product, the company that created the product, and a brief description. The presentation of the product name also contained a hyperlink to the home page for that product. For example, if you search the catalog for "authoring tools," the first entry that comes up is Amaya, developed by the W3C. The name "Amaya" is underlined and rendered in blue, indicating a hyperlink. The hyperlink takes you to the Amaya home page at the W3C Web site.
One problem I subsequently ran into with the application was that while some hyperlinks worked correctly, others generated a 404: File Not Found error when users selected them. Apparently, an erroneous URL was occasionally being plugged in for the hyperlink's HREF value. I knew the dynamically generated code was well formed, because my error-handling routines weren't reporting errors. And I knew that the style sheet wasn't responsible, because some of the hyperlinks turned out OK. So I dumped the XML document to a file (as described above) and soon discovered the problem.
Consider the markup in Listing 2, which is an entry that was generated by searching the database. The information between the opening and closing <homepage> tags is used to create the hyperlink in question. The problem is that not all products in the database have home pages. In such cases, the field in the original database is left blank. Unfortunately, when the SQL query returns records with empty fields, the null string is returned for that field. As the ASP script is currently written, this value is placed directly in the XML document and subsequently used by the XSL style sheet to create the hyperlink. The XSL processor places the null value in the HREF attribute, and because null is not a valid URL, the result is a broken link. Unsuspecting users then receive the 404: File Not Found error when they select the hyperlink.
You could solve the problem directly in the ASP script. Feeling, however, that this really boils down to a presentation issue, I decided to handle it in the XSL style sheet. In Listing 3, the modified style sheet uses an <xsl:choose> construct to decide whether a hyperlink should be created. The construct provides two elements, <xsl:when> and <xsl:otherwise>, that act much like the switch statement found in programming languages like C++ and Java. The <xsl:when> element takes a test attribute that lets you check for certain conditions. The value of the test attribute can be any valid XSL expression. And you can include as many <xsl:when> elements in an <xsl:choose> as you need, so you can handle any number of cases. The <xsl:otherwise> element acts as a catchall, so you can provide default handling for cases that don't meet the other conditions. The particular setup of elements in Listing 3 can be read as: "When the <homepage> element is null, just write out the name of the product; otherwise, create a hyperlink for the product name."
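The decision made by the <xsl:choose> in Listing 3 can be sketched in Python for comparison. This is an analogue, not the style sheet itself: product_cell is a hypothetical helper, and it assumes a missing home page can show up as None, an empty string, or the literal text null.

```python
from html import escape


def product_cell(name, homepage):
    """Render a product name, linking it only when a usable URL exists.

    The first branch plays the role of <xsl:when> (no home page); the
    second plays the role of <xsl:otherwise> (build the hyperlink).
    """
    url = (homepage or "").strip()
    if url == "" or url.lower() == "null":
        # "when" branch: no home page, so emit the name as plain text
        return escape(name)
    # "otherwise" branch: wrap the escaped name in an anchor
    return '<a href="%s">%s</a>' % (escape(url), escape(name))


print(product_cell("Amaya", "http://www.w3.org/Amaya/"))
print(product_cell("Near & Far Designer", "null"))
```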
Anchor Tags in a Style Sheet
Another problem related to the last example is generating anchor tags in an XSL style sheet. Normally, you can create a hyperlink simply by writing the HTML into the style sheet. For example, you could write:
<A HREF="http://www.somedomain.com">Click Here</A>
When the style sheet is processed, this HTML is simply output as part of the XSL transformation. In the database example, however, we get the label (the product name) and the URL from a dynamically generated XML document. Typically, you would use <xsl:value-of> to traverse the document tree and retrieve content from these elements. Listing 3 does exactly this when it uses <xsl:value-of select="prodName"/> to display the name of the product. But the problem in generating anchor elements comes when you try to embed the <xsl:value-of> in the HREF attribute of the anchor element. That is, you can't write the following directly in your style sheet:
<A HREF="<xsl:value-of select='homepage'/>">...</A>
This generates an error because XML doesn't allow markup (in particular, the < character) inside an attribute value. Also, an attribute value can't contain the quote character used to delimit it unless that character is escaped.
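A quick way to see which of these rules a parser actually enforces is to feed it a few test strings. In this Python sketch (parses is a hypothetical helper built on the standard library's ElementTree), only the < inside the attribute and the unescaped delimiting quote cause failures; a single quote inside a double-quoted value is accepted.

```python
import xml.etree.ElementTree as ET


def parses(xml_text):
    """Return True if xml_text is well formed."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


# A "<" inside an attribute value is a well-formedness error.
print(parses('<A HREF="<xsl:value-of select=\'homepage\'/>">x</A>'))  # False
# The other kind of quote is allowed inside an attribute value...
print(parses('<A HREF="it\'s fine">x</A>'))  # True
# ...but the delimiting quote itself must be escaped as &quot;.
print(parses('<A TITLE="say "hi"">x</A>'))  # False
print(parses('<A TITLE="say &quot;hi&quot;">x</A>'))  # True
```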
The answer, then, is to use <xsl:element> to create a new anchor element that can be placed in the result tree. Listing 3 does this with the <xsl:otherwise> element contained in the first <xsl:choose> construct. The <xsl:element> construct takes a name attribute (A in this case) that names the element you're about to create. You can then use <xsl:attribute> to specify attributes for the element; this is where you place the <xsl:value-of> statement. Finally, you place the value of the prodName element inside the anchor element. The closing </xsl:element> tag automatically generates the closing </A> in the result tree. All this properly generates working links to the product home pages.
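The same idea carries over to any DOM API: create the element and its attribute as nodes rather than splicing strings together, and let the serializer handle escaping. The following Python sketch uses the standard library's minidom; make_anchor is a hypothetical helper and the example.com URL is a placeholder.

```python
from xml.dom.minidom import Document


def make_anchor(doc, label, url):
    """Build an <A> element as nodes, much as <xsl:element> and
    <xsl:attribute> do, instead of concatenating markup text."""
    anchor = doc.createElement("A")                # like <xsl:element name="A">
    anchor.setAttribute("HREF", url)               # like <xsl:attribute name="HREF">
    anchor.appendChild(doc.createTextNode(label))  # the product name
    return anchor


doc = Document()
link = make_anchor(doc, "Near & Far Designer",
                   "http://www.example.com/?id=1&lang=en")
print(link.toxml())
# <A HREF="http://www.example.com/?id=1&amp;lang=en">Near &amp; Far Designer</A>
```

Because the serializer escapes the ampersands in both the attribute and the text, the output is well formed even though the inputs contain markup characters.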
Data with Markup
When working with databases and XML, another problem I've run into is database data containing characters that might be interpreted as markup, such as the less-than (<), greater-than (>), and ampersand (&) symbols. In markup languages like XML and HTML, these characters have special meanings, so they must be replaced with predefined entity references such as &lt;, &gt;, and &amp;. I first noticed the problem when entries from the authoring category of the database wouldn't display correctly. A quick dump of the XML document to a file helped me track down the error.
The XML tools database contains product names that include markup characters. XML <Pro> from Vervet Logic is a classic example. When the record is processed, the product name is written, unchanged, between the opening and closing <prodName> tags.
Clearly, this generates an error when parsed by the XML processor. Vervet Logic's Web site seems to avoid the markup problem by not using these characters at all. Likewise, Open Text's Near & Far Designer contains an ampersand in its product name, which looks like the start of an entity reference to the parser. The parser expects a name and a terminating semicolon (;) to follow any ampersand, and when it doesn't find them, it generates an error.
You could force the database author to input markup characters using predefined entities. For example, "Near & Far" could be input as "Near &amp; Far" in the database. However, this isn't a practical solution in most cases. One alternative is to include code in your ASP script that looks for these characters before processing the data and replaces each one with the corresponding predefined entity. Unfortunately, this approach involves expensive string-handling methods that can slow the server down. Assuming you're simply planning to pass the data on for display, a better option is to wrap the content of each field in a CDATA section.
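Both options can be sketched in Python to show what each one produces (this is not the ASP code of Example 1). escape_field relies on the standard library's xml.sax.saxutils.escape; the helper names are hypothetical. One caveat applies to CDATA: a section ends at the first ]]>, so data containing that sequence has to be split across two sections, as cdata_field does.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def escape_field(value):
    """Option 1: replace &, < and > with the predefined entity references."""
    return escape(value)


def cdata_field(value):
    """Option 2: wrap the raw value in a CDATA section.

    A CDATA section ends at the first "]]>", so any occurrence of that
    sequence in the data is split across two adjacent sections.
    """
    return "<![CDATA[" + value.replace("]]>", "]]]]><![CDATA[>") + "]]>"


def tag(name, content):
    """Wrap already-encoded content in an element, e.g. <prodName>."""
    return "<%s>%s</%s>" % (name, content, name)


for raw in ["XML <Pro>", "Near & Far"]:
    for encoded in (escape_field(raw), cdata_field(raw)):
        # Either encoding parses back to the original product name.
        assert ET.fromstring(tag("prodName", encoded)).text == raw
```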
Example 1 shows the changes you need to make to the script in the xmltools.asp file presented in the May issue (see "Online" for pertinent links).
The strategies presented here should serve you well in tracking down most of the problems you'll encounter in XML applications. Remember to validate and revalidate. Include robust error-handling routines in your code, and embed validation in those routines. This will help you narrow down possible problems quickly and will ultimately save you countless hours of head scratching.
Michael is the author of Building Web Sites with XML (Prentice Hall), and teaches XML BootCamp. He also carries the honorary title of editor at large for Web Techniques. He can be reached at mfloyd@BeyondHTML.com.