Unofficial JAXP FAQ

This FAQ is intended to help developers use the Java API for XML Processing (JAXP) effectively. The opinions here are my own (Edwin Goei), as an engineer who works on the reference implementation, and not those of my employer, Sun Microsystems. Comments about this FAQ can be emailed to me at the address below. For questions about JAXP itself, please refer to the question on support below.

The official JAXP FAQ can be found at http://java.sun.com/xml/jaxp/faq.html. The official and unofficial versions should contain essentially the same information, since they are derived from the same XML input document; however, the unofficial version is less formal and may be updated more frequently.

Q. What is JAXP?

The Java API for XML Processing, or JAXP for short, enables applications to parse and transform XML documents using an API that is independent of a particular XML processor implementation. JAXP also provides a pluggability feature which enables applications to easily switch between particular XML processor implementations.

To achieve the goal of XML processor independence, an application should limit itself to the JAXP API and avoid implementation-dependent APIs and behavior. This may or may not be easy depending on the application. See this question for more information. JAXP includes industry standard APIs such as DOM and SAX. See these slides (PDF) for more information.

The reason JAXP exists is to facilitate the use of XML on the Java platform. For example, current APIs such as DOM Level 2 do not provide a method to bootstrap a DOM Document object from an XML input document; JAXP does. (When DOM Level 3 provides this functionality, a new version of the JAXP specification will probably support the new Level 3 scheme as well.) Other parts of JAXP, such as the javax.xml.transform package, have no other XSLT processor-independent equivalent.

Q. What is the difference between the specification version and the implementation version?

A JAXP implementation has both a JAXP specification version number and an implementation name and version number. Specification versions are limited to the form N.N, where N is a number. Specifications are developed according to the Java Community Process (JCP).

Implementations attempt to implement a particular specification version. However, a particular implementation may have bugs in it so that it deviates from the specification. Implementations may use an independent and arbitrary naming and versioning scheme from the JAXP specification version.

One particular implementation, the JAXP reference implementation (RI), can cause confusion because its name and version numbers are similar to those of the specification. Unlike the specification version number, the JAXP RI version number may contain more than two numbers. For example, JAXP RI version 1.1.1 implements the JAXP 1.1 specification and contains fixes for bugs found in the previous JAXP RI 1.1. Note the difference between specification and implementation versions here. Also, the first RI version has the same number as the specification version, namely 1.1.

As of June 2002, the current specification version of JAXP is 1.2. The current RI version is JAXP RI 1.2.0.

Q. Where can I download an implementation?

The following tables list implementations that claim to support at least some portions of JAXP. Please note that not all claims have been verified and that the information may not be current.

JAXP can be divided into two main parts: a parsing API and a transform API. Implementations that support the transform API are typically XSLT processors, which require an XML parser to read input documents. Because of this, these implementations typically bundle an XML parser as part of their distribution.

The following implementations support the transform component of JAXP and also bundle a parser (in alphabetical order):

Name | Parser Implementation | XSLT Processor Implementation | Comment
Apache Xalan-J | Xerces-J 2.x | Xalan-J XSLT | None
JAXP Reference Implementation | Xerces2 or Crimson | Xalan-J XSLT | See JAXP RI questions below.
Java 2 Standard Edition 1.4 | Crimson | Xalan-J XSLT, cvs tag: xalan_2_2_d10 | Uses JAXP RI version later than 1.1.2
Saxon | Old fork of Ælfred2 | Saxon XSLT | No DOM support

The parsing component often is distributed separately. The following are implementations that support just the parsing component of JAXP (in alphabetical order):

Name | Comment
Ælfred2 portion of GNUJAXP | Non-Apache-style license; see link for details
Apache Crimson | None
Apache Xerces-J 1.x | Supports XML Schema. Obsoleted by Xerces2-J.
Apache Xerces2-J | Supports XML Schema. Supersedes Xerces-J 1.x.

The above information was last updated 2002-06-11. Please email me with updated information.

Q. Why does Apache have multiple XML parsers?

As of June 2002, Apache has three Java parsers: Crimson, Xerces 1, and Xerces 2. The reason is historical: Apache accepted two donations from two different companies. IBM donated XML4J, which became Apache Xerces 1, and Sun donated Project X, which became Apache Crimson. Xerces 2 is a third parser, a rewrite with goals such as maintainability, modularity, and the implementation of certain features that neither of the original parsers achieved. Xerces 2 was designed to meet the long-term needs of Apache projects going forward.

Q. Where can I get the JAXP specification?

A PDF version of the JAXP 1.1 specification can be found by following this link. In addition, there is a JAXP 1.2 maintenance specification that adds support for W3C XML Schema and is based on the previous JAXP 1.1 specification.

Q. How do I start developing an application which uses JAXP?

Perhaps a good place to start is to look at the JAXP RI docs and some sample programs that use JAXP.

If your application needs to programmatically perform XSLT transformations, then you need an implementation that supports the transform parts of the JAXP API. One resource that provides sample code for this type of application is the Xalan-J documentation.
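For a rough idea of what such code looks like, here is a minimal sketch of a transformation through the JAXP transform API. It assumes a JAXP-compatible XSLT processor is available (J2SE 1.4 and later bundle one); the stylesheet, the greeting input, and the class and method names are hypothetical, and exception handling is collapsed into "throws Exception".

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Illustrative sketch; class, method, and stylesheet are hypothetical.
final class XsltExample {
    // Turns <greeting>text</greeting> into the plain text "Hello, text!".
    static final String XSL =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='text'/>"
      + "<xsl:template match='/greeting'>Hello, <xsl:value-of select='.'/>!</xsl:template>"
      + "</xsl:stylesheet>";

    static String transform(String xml, String xsl) throws Exception {
        // TransformerFactory.newInstance() uses the same pluggability lookup as the parser factories.
        TransformerFactory tf = TransformerFactory.newInstance();
        // Compile the stylesheet into a Transformer.
        Transformer t = tf.newTransformer(new StreamSource(new StringReader(xsl)));
        StringWriter out = new StringWriter();
        t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(transform("<greeting>world</greeting>", XSL)); // prints Hello, world!
    }
}
```

The Xalan-J documentation covers more involved cases such as compiled Templates objects and parameters.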

If your application does not need XSLT, then it is a pure parsing application and you need to decide between the DOM and SAX APIs. In the interest of having less API to learn, I would recommend limiting usage to the standard DOM and SAX APIs as much as possible and using the auxiliary JAXP methods only for functionality that is unavailable or difficult to use via DOM or SAX. For example, DOM currently does not specify a method to bootstrap or load an XML document and return a DOM Document object; this is available via JAXP.
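To illustrate that division of labor, the sketch below counts elements with a given name both ways: JAXP only creates the DOM Document or the SAX parser, and everything else is plain DOM or SAX. The class name, method names, and sample document are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import org.w3c.dom.Document;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Illustrative sketch; class and method names are hypothetical.
final class DomVsSax {
    // DOM: JAXP bootstraps the Document; the query itself is standard DOM.
    static int countWithDom(String xml, String tag) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
            .newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
        return doc.getElementsByTagName(tag).getLength();
    }

    // SAX: JAXP creates the parser; the callbacks are standard SAX.
    static int countWithSax(String xml, String tag) throws Exception {
        int[] count = {0};
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                // Namespace processing is off by default in JAXP, so match on the qualified name.
                if (qName.equals(tag)) count[0]++;
            }
        };
        SAXParserFactory.newInstance().newSAXParser()
            .parse(new InputSource(new StringReader(xml)), handler);
        return count[0];
    }

    public static void main(String[] args) throws Exception {
        String xml = "<list><item/><item/><other/></list>";
        System.out.println(countWithDom(xml, "item") + " " + countWithSax(xml, "item")); // prints 2 2
    }
}
```

DOM keeps the whole tree in memory and allows random access; SAX streams events and keeps nothing unless the handler does.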

Q. Where can I ask questions about JAXP?

One place to ask questions about JAXP is with the provider of your implementation. For example, if you are using the Apache Xerces parser, use the xerces-j-user mailing list. For Apache implementations, see the XML mailing lists page for subscription information. One location where you can find Apache mailing list archives is at MARC.

Sun also hosts a web-based Java and XML Forum where you can communicate with other JAXP users.

Q. Warning about Namespace processing default values

JAXP has a namespaceAware property that is directly tied to the SAX 2.0 "http://xml.org/sax/features/namespaces" feature, which controls whether the parser performs namespace processing. However, the JAXP default value of this property differs from the native SAX 2.0 default. When an application creates a parser through JAXP, the default value is false, but when it uses SAX 2.0 directly via the static method org.xml.sax.helpers.XMLReaderFactory.createXMLReader(), the default is true.

The following code samples illustrate the typical use case of creating a parser with namespace processing turned on using JAXP (exception handling has been omitted). First, an example to create a DOM Document object:

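import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
dbf.setNamespaceAware(true);
DocumentBuilder db = dbf.newDocumentBuilder();
Document doc = db.parse(new InputSource("http://some-uri.com/foo.xml"));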

This next example instantiates a SAX 2.0 XMLReader using JAXP. Note that it does not use the SAX 2.0 createXMLReader() static method to instantiate the XMLReader.

import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;

SAXParserFactory spf = SAXParserFactory.newInstance();
spf.setNamespaceAware(true);
XMLReader xmlReader = spf.newSAXParser().getXMLReader();
xmlReader.setContentHandler(new MyContentHandler());
xmlReader.parse(new InputSource("http://some-uri.com/foo.xml"));

You may ask, "Why are the default values different?" The reason is historical. JAXP 1.0 first defined the default value to be false. SAX 2.0 then defined a "namespaces" feature with a default of true. Finally, JAXP 1.1 had to support SAX 2.0 while maintaining backward compatibility with JAXP 1.0.
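To see the default in action, the sketch below parses the same prefixed element twice through JAXP, once with the default setting and once with setNamespaceAware(true), and reports the namespace URI the DOM assigns to it. The document and the urn:example namespace are arbitrary, and the class and method names are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Illustrative sketch; class and method names are hypothetical.
final class NamespaceDefaults {
    static final String XML = "<p:root xmlns:p='urn:example'/>";

    // Returns the namespace URI the parser assigned to the root element.
    static String rootNamespace(boolean namespaceAware) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(namespaceAware);
        Element root = dbf.newDocumentBuilder()
            .parse(new InputSource(new StringReader(XML)))
            .getDocumentElement();
        return root.getNamespaceURI();
    }

    public static void main(String[] args) throws Exception {
        // JAXP default: false, unlike the SAX 2.0 "namespaces" feature.
        System.out.println(DocumentBuilderFactory.newInstance().isNamespaceAware()); // prints false
        System.out.println(rootNamespace(false)); // prints null: no namespace processing
        System.out.println(rootNamespace(true));  // prints urn:example
    }
}
```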

Q. How can I write my JAXP application to be implementation-independent?

JAXP enables your application to be implementation-independent, but it does not enforce this constraint. For example, if your instance XML documents use character encodings that the XML Recommendation does not require processors to support, this may cause compatibility problems. You may have tested your application with a JAXP parser that implements an optional character encoding, but not all JAXP parsers support that encoding. Therefore, to ensure portability, instance documents that contain an encoding declaration should use only the required encoding names: "UTF-8" or "UTF-16". In particular, Java encoding names such as "UTF8" in an encoding declaration may not work with all JAXP-compatible processors.
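If you want to catch non-portable declarations before they reach another parser, one possible approach is a small check over your instance documents. The sketch below is hypothetical (the class, the method, and the regular expression are not part of JAXP). It only inspects an XML declaration at the very start of already-decoded text, and it treats a document with no encoding declaration as portable, since the XML Recommendation requires such documents to be in UTF-8 or UTF-16 when no external encoding information is present.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; not part of JAXP.
final class EncodingCheck {
    // Captures the encoding pseudo-attribute of an XML declaration at the start of the text.
    private static final Pattern DECL = Pattern.compile(
        "^<\\?xml\\s[^?]*encoding\\s*=\\s*[\"']([A-Za-z][A-Za-z0-9._-]*)[\"']");

    // True if there is no encoding declaration, or it names one of the two required encodings.
    static boolean isPortable(String documentText) {
        Matcher m = DECL.matcher(documentText);
        if (!m.find()) return true;
        String enc = m.group(1);
        // XML processors should match encoding names case-insensitively.
        return enc.equalsIgnoreCase("UTF-8") || enc.equalsIgnoreCase("UTF-16");
    }

    public static void main(String[] args) {
        System.out.println(isPortable("<?xml version=\"1.0\" encoding=\"UTF-8\"?><a/>")); // true
        System.out.println(isPortable("<?xml version=\"1.0\" encoding=\"UTF8\"?><a/>"));  // false: Java name
        System.out.println(isPortable("<a/>"));                                            // true
    }
}
```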

Q. How do I output/marshal/serialize a DOM tree into a stream?

Currently, the only way to do this using JAXP is through the transform component. See the list of implementations. This note may also be useful.
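The usual technique is the "identity" transform: a Transformer created without a stylesheet copies its DOMSource unchanged to a StreamResult. The following is a minimal sketch of that technique; the class and method names and the sample tree are hypothetical.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Illustrative sketch; class and method names are hypothetical.
final class DomSerializer {
    // Serializes a DOM tree with the identity transformer (no stylesheet).
    static String serialize(Document doc) throws Exception {
        Transformer identity = TransformerFactory.newInstance().newTransformer();
        identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        // To write to a file or socket instead, wrap that OutputStream in a StreamResult.
        identity.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    // Builds a small in-memory tree to serialize.
    static Document sampleDocument() throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element root = doc.createElement("root");
        root.setAttribute("id", "1");
        root.appendChild(doc.createTextNode("text"));
        doc.appendChild(root);
        return doc;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(serialize(sampleDocument())); // prints <root id="1">text</root>
    }
}
```

Other OutputKeys settings, such as ENCODING or INDENT, can be set on the same Transformer before calling transform().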

Note that there are several implementation-dependent ways of doing this, such as the org.apache.xml.serialize package in Xerces or the XmlDocument.write(OutputStream) method in Crimson, but these tie your application to particular parsers and are thus non-portable.

In the future, DOM Level 3 should also provide this feature and it will likely be incorporated into a future version of JAXP.

Q. When I turn on validation, why do I fail to get any errors?

This is probably because you have not set an ErrorHandler. To get validation errors, three things must be true:

  1. The source document must be associated with a schema. For example, the source document contains a DOCTYPE declaration.
  2. Validation must be turned on.
  3. The application must set a SAX ErrorHandler.

Applications often fail to do the last item.
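As a sketch of all three conditions together, the code below validates against an internal DTD subset and collects validity errors through an ErrorHandler. The DTD, documents, and class and method names are hypothetical; validity errors arrive via error(), while well-formedness errors arrive via fatalError().

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

// Illustrative sketch; class and method names are hypothetical.
final class ValidationErrors {
    // Parses with DTD validation on and returns the validity errors reported.
    static List<String> validate(String xml) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setValidating(true);                       // item 2: validation is turned on
        DocumentBuilder db = dbf.newDocumentBuilder();
        List<String> errors = new ArrayList<>();
        db.setErrorHandler(new ErrorHandler() {        // item 3: an ErrorHandler is set
            public void warning(SAXParseException e) { }
            // Validity errors are reported here; they do not stop the parse by themselves.
            public void error(SAXParseException e) { errors.add(e.getMessage()); }
            // Well-formedness errors are fatal; rethrow them.
            public void fatalError(SAXParseException e) throws SAXParseException { throw e; }
        });
        db.parse(new InputSource(new StringReader(xml)));
        return errors;
    }

    public static void main(String[] args) throws Exception {
        // item 1: the document is associated with a DTD, here via an internal subset.
        String dtd = "<!DOCTYPE root [<!ELEMENT root (a)><!ELEMENT a EMPTY>]>";
        System.out.println(validate(dtd + "<root><b/></root>")); // one or more messages
        System.out.println(validate(dtd + "<root><a/></root>")); // []
    }
}
```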

Q. How do I validate my instance document to a particular schema?

If you are using DTDs, the only standard way of controlling the DTD used to validate a document is to insert or replace the document type declaration within the XML document itself. One tool that does this is DOCTYPEChanger. A simpler method that requires a minimum of code is to use an EntityResolver. In general, you can use an EntityResolver to override any external entity in your XML document. However, an EntityResolver cannot override the internal subset of a DTD, only the external subset.
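A minimal sketch of the EntityResolver approach follows. The system identifier in the DOCTYPE is a made-up URL, and the resolver answers it with a DTD held in a string, so the parser never fetches that URL. The class and method names are hypothetical; a real resolver would more likely load the replacement DTD from a local file or class path resource.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Illustrative sketch; class, method, and URL are hypothetical.
final class DtdOverride {
    // Hypothetical system identifier that the instance document's DOCTYPE points at.
    static final String DTD_URI = "http://example.com/hypothetical/root.dtd";
    // The DTD we actually want to validate against, supplied in its place.
    static final String LOCAL_DTD = "<!ELEMENT root (a)><!ELEMENT a EMPTY>";

    static List<String> validate(String xml) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setValidating(true);
        DocumentBuilder db = dbf.newDocumentBuilder();
        // Substitute the external DTD subset; returning null lets other entities resolve normally.
        db.setEntityResolver((publicId, systemId) ->
            DTD_URI.equals(systemId) ? new InputSource(new StringReader(LOCAL_DTD)) : null);
        List<String> errors = new ArrayList<>();
        db.setErrorHandler(new DefaultHandler() {
            @Override
            public void error(SAXParseException e) { errors.add(e.getMessage()); }
        });
        db.parse(new InputSource(new StringReader(xml)));
        return errors;
    }

    public static void main(String[] args) throws Exception {
        String doctype = "<!DOCTYPE root SYSTEM \"" + DTD_URI + "\">";
        System.out.println(validate(doctype + "<root><a/></root>")); // []
        System.out.println(validate(doctype + "<root><b/></root>")); // one or more messages
    }
}
```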

If you are using another schema language like W3C XML Schema, you can use the JAXP 1.2 API to programmatically set the schema used to validate your instance document. See this question for more information on this topic.

Q. How do I use a different JAXP compatible implementation?

The JAXP 1.1 API allows applications to plug in different JAXP-compatible implementations of parsers or XSLT processors. For example, when an application wants to create a new JAXP DocumentBuilderFactory instance, it calls the static method DocumentBuilderFactory.newInstance(). This causes a search for the name of a concrete subclass of DocumentBuilderFactory in the following order:

  1. The value of a system property like javax.xml.parsers.DocumentBuilderFactory if it exists and is accessible.
  2. The contents of the file $JAVA_HOME/jre/lib/jaxp.properties if it exists.
  3. The Jar Service Provider discovery mechanism specified in the Jar File Specification. A jar file can have a resource (i.e. an embedded file) such as META-INF/services/javax.xml.parsers.DocumentBuilderFactory containing the name of the concrete class to instantiate.
  4. The fallback platform default implementation.

Of the above ways to specify an implementation, perhaps the most useful is the jar service provider mechanism. To use this mechanism, place the implementation jar file on your classpath. For example, to use Xerces 1.4.4 instead of the version of Crimson which is bundled with JDK 1.4 (Java Development Kit version 1.4), place xerces.jar in your classpath. This mechanism also works with older versions of the JDK which do not bundle JAXP. If you are using JDK 1.4 and above, see this question for potential problems.
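When it is unclear which implementation you are getting, a small diagnostic can help. The sketch below is a simplified, hypothetical report of which of the four steps above would supply the class name; it mirrors the documented order but is not the factory's own lookup code. It assumes jaxp.properties lives in java.home/lib (JDK 1.4 through 8) or java.home/conf (JDK 9 and later), and it approximates step 3 with the system class loader.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Properties;
import javax.xml.parsers.DocumentBuilderFactory;

// Hypothetical diagnostic; approximates, but does not reuse, the real lookup.
final class FactoryLookup {
    static final String KEY = "javax.xml.parsers.DocumentBuilderFactory";

    // True if the given properties file exists and names a factory class.
    static boolean definesKey(File f) {
        if (!f.isFile()) return false;
        Properties p = new Properties();
        try (InputStream in = new FileInputStream(f)) {
            p.load(in);
        } catch (IOException e) {
            return false;
        }
        return p.getProperty(KEY) != null;
    }

    static String whichStep() {
        if (System.getProperty(KEY) != null) return "1: system property";
        String home = System.getProperty("java.home");
        if (definesKey(new File(home, "lib/jaxp.properties"))
                || definesKey(new File(home, "conf/jaxp.properties"))) return "2: jaxp.properties";
        if (ClassLoader.getSystemResource("META-INF/services/" + KEY) != null) return "3: jar service provider";
        return "4: platform default";
    }

    public static void main(String[] args) {
        System.out.println("lookup step: " + whichStep());
        System.out.println("factory class: " + DocumentBuilderFactory.newInstance().getClass().getName());
    }
}
```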

Q. Why are there Apache classes in the J2SE 1.4 RI?

The J2SE 1.4 RI is the first version of the JDK that bundles in an implementation of JAXP 1.1. This allows developers to write applications without having to provide a parser and XSLT processor with their application. However, in some cases, it may create additional problems.

The Sun J2SE 1.4 RI uses Apache software for its implementation of JAXP 1.1, with package names unchanged from the Apache software distributions. This can cause problems, for example, if your application wants to use a newer version of Apache software. Under the Java 2 class loader delegation model, the java launcher's ClassLoader will load the bundled version of a class (in rt.jar) before any other version. Thus, if you place a newer version of xalan.jar in the extensions directory or on your CLASSPATH, that version will be ignored, since the runtime will use the older bundled version instead. As a workaround, see the question on overriding the implementation in JDK 1.4.

The future plan is to rename the org.apache.** packages to be something like com.sun.org.apache.** to fix this problem. In addition, other package-dependent parts of the software may also need to be modified. However, this may not be done until after JDK 1.4.1.
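To check which copy of a class is actually in use, you can ask the class where it was loaded from. In the sketch below (class and method names are hypothetical), a class loaded by the bootstrap loader, such as the copies bundled in rt.jar in JDK 1.4, reports no CodeSource, while a class loaded from your own jar reports that jar's location.

```java
import java.security.CodeSource;
import javax.xml.transform.TransformerFactory;

// Illustrative sketch; class and method names are hypothetical.
final class WhereLoaded {
    // Returns the jar or directory a class came from, or a note for bootstrap classes.
    static String locationOf(Class<?> c) {
        CodeSource src = c.getProtectionDomain().getCodeSource();
        return src == null ? "bootstrap class path (bundled with the JDK)" : String.valueOf(src.getLocation());
    }

    public static void main(String[] args) {
        Class<?> impl = TransformerFactory.newInstance().getClass();
        System.out.println(impl.getName() + " loaded from " + locationOf(impl));
    }
}
```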

Q. How do I override the JAXP implementation in JDK 1.4 and above?

In JDK 1.4, there is an Endorsed Standards Override Mechanism that can be used to override classes in the JDK itself. One way to replace the classes in the JDK with the classes contained in a set of jar files is to place the jar files in some directory, "my-endorsed", and define a system property. For example, to use a newer version of Xalan, place the newer xalan.jar in the "my-endorsed" directory and invoke the Java launcher with the -Djava.endorsed.dirs=my-endorsed option. Another way is to place the jar file in the $JAVA_HOME/jre/lib/endorsed directory of the JDK installation itself. You may need to create the endorsed directory if it does not yet exist.

Q. How do I use W3C XML Schema with JAXP?

Use the JAXP 1.2 API to validate instance documents with W3C XML Schema. The JAXP 1.2.0 RI contains two sample programs (DOMEcho and SAXLocalNameCount) that illustrate how to do this. See the JAXP 1.2 specification for more details.
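In outline, JAXP 1.2 schema validation means turning on namespace awareness and validation, then setting two factory attributes defined by the JAXP 1.2 specification: schemaLanguage (set to the W3C XML Schema namespace) and, optionally, schemaSource. The sketch below is not taken from the RI samples; the schema, documents, and class and method names are hypothetical, and the schema is written to a temporary file only so that it can be passed as a java.io.File.

```java
import java.io.File;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Illustrative sketch; class, method, and schema are hypothetical.
final class SchemaValidation {
    // Property names and value defined by the JAXP 1.2 specification.
    static final String SCHEMA_LANGUAGE = "http://java.sun.com/xml/jaxp/properties/schemaLanguage";
    static final String SCHEMA_SOURCE = "http://java.sun.com/xml/jaxp/properties/schemaSource";
    static final String W3C_XML_SCHEMA = "http://www.w3.org/2001/XMLSchema";

    // Hypothetical schema: <root> must contain one or more <a> elements.
    static final String XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
      + "<xs:element name='root'><xs:complexType><xs:sequence>"
      + "<xs:element name='a' maxOccurs='unbounded'/>"
      + "</xs:sequence></xs:complexType></xs:element>"
      + "</xs:schema>";

    static List<String> validate(String xml, File schema) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);                   // schema validation needs namespace processing
        dbf.setValidating(true);
        dbf.setAttribute(SCHEMA_LANGUAGE, W3C_XML_SCHEMA);
        dbf.setAttribute(SCHEMA_SOURCE, schema);       // the spec also allows a URI string, InputStream, or InputSource
        DocumentBuilder db = dbf.newDocumentBuilder();
        List<String> errors = new ArrayList<>();
        db.setErrorHandler(new DefaultHandler() {
            @Override
            public void error(SAXParseException e) { errors.add(e.getMessage()); }
        });
        db.parse(new InputSource(new StringReader(xml)));
        return errors;
    }

    // Writes the example schema to a temporary file (for this demo only).
    static File writeSchema() throws Exception {
        File f = File.createTempFile("example", ".xsd");
        f.deleteOnExit();
        Files.write(f.toPath(), XSD.getBytes(StandardCharsets.UTF_8));
        return f;
    }

    public static void main(String[] args) throws Exception {
        File xsd = writeSchema();
        System.out.println(validate("<root><a/><a/></root>", xsd)); // []
        System.out.println(validate("<root><b/></root>", xsd));     // one or more messages
    }
}
```

As with DTD validation, nothing is reported unless an ErrorHandler is set.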


JAXP RI (Reference Implementation) Questions

Q. Where do I get the latest version of the JAXP RI?

Newer versions of the JAXP RI are being released through the following Sun software releases:

To approximate an unbundled version, you can also download the major components individually from Apache. See the question on source code for more information.

Q. Where can I find JAXP RI docs online?

Since Sun does not provide one at the time of this writing, I will try to maintain a browsable online version of the JAXP RI docs. The JAXP RI docs also contain a link to the JAXP API javadoc.

Q. Where do I get the source code to the JAXP RI?

The JAXP RI is based on open sourced code. Although Sun no longer provides a free source distribution (there might be a way to pay for one), you can obtain the same source code from the Apache CVS repositories. For example, the JAXP RI 1.2.0 release consists of:

Note: there may be some differences from the actual JAXP RI. For example, the JAXP RI comes with documentation that is not yet available at Apache.

Q. Why all the jar files?

Starting with JAXP RI 1.2.0, the jar packaging scheme has changed. There are now a total of six jar files. The reasons are to support the J2SE 1.4 Endorsed Standards Rules and to be compatible with the current Apache packaging scheme. The Endorsed Standards Rules essentially state that only endorsed APIs and their implementations can be replaced with newer versions of those APIs. SAX 2 and DOM Level 2 Core are endorsed APIs; however, the javax.xml.{parsers, transform} classes are not. See the link above for the precise rules. Splitting the jar files was a compromise, because six jar files also make life difficult for users.

Q. (Obsolete) What happened to jaxp.jar?

To summarize, starting with JAXP RI 1.1.3, there is no jaxp.jar. This fact should have been emphasized in the JAXP RI 1.1.3 documentation. The motivation for this decision was to match the Apache packaging scheme at the time of release, as well as to simplify life for developers.

In JAXP RI 1.1.3, which includes the Apache Crimson 1.1.3 parser, both API and implementation classes are contained in the same jar files: crimson.jar for the parser and xalan.jar for the XSLT processor. The JAXP RI packaging scheme matched the Apache packaging scheme at the time of release; however, the Apache scheme has since changed. See the Apache site for details.


Edwin Goei via email or web
Dates are usually in ISO 8601 YYYY-MM-DD format.
Last modified: Thu Nov 7 10:36:18 PST 2002