This FAQ is intended to help developers use the Java API for XML Processing (JAXP) effectively. It represents my current opinions as the author (Edwin Goei) and as an engineer who works on the reference implementation, not those of my employer, Sun Microsystems. Any comments about this FAQ can be emailed to me using the email address below. For questions about JAXP itself, please refer to the question on support below.
The official JAXP FAQ can be found at http://java.sun.com/xml/jaxp/faq.html. Both official and unofficial versions should contain essentially the same information since they are derived from the same XML input document. However, the unofficial version is less formal and may be updated more frequently.
Q. What is JAXP?
The Java API for XML Processing, or JAXP for short, enables applications to parse and transform XML documents using an API that is independent of a particular XML processor implementation. JAXP also provides a pluggability feature which enables applications to easily switch between particular XML processor implementations.
To achieve the goal of XML processor independence, an application should limit itself to the JAXP API and avoid implementation-dependent APIs and behavior. This may or may not be easy depending on the application. See this question for more information. JAXP includes industry standard APIs such as DOM and SAX. See these slides (PDF) for more information.
The reason for the existence of JAXP is to facilitate the use of XML on
the Java platform. For example, current APIs such as DOM Level 2 do not
provide a method to bootstrap a DOM Document object from an
XML input document, but JAXP does. (When DOM Level 3 provides this
functionality, a new version of the JAXP specification will probably
support the new Level 3 scheme as well.) Other parts of JAXP, such as the
javax.xml.transform portion, have no equivalent
XSLT processor-independent API.
Q. What is the difference between the specification version and the implementation version?
A JAXP implementation has both a JAXP specification version
number and an implementation name and version number. Specification
versions are limited to the form N.N, where N is
a number. Specifications are developed according to the Java Community Process
(JCP).
Implementations attempt to implement a particular specification version. However, a particular implementation may have bugs in it so that it deviates from the specification. Implementations may use an independent and arbitrary naming and versioning scheme from the JAXP specification version.
There is one particular implementation called the JAXP reference implementation (RI) which can cause confusion. The JAXP RI has a similar name to the specification and has similar version numbers. Unlike the specification version number, the JAXP RI may use a version number containing more than two numbers. For example, JAXP RI version 1.1.1 implements the JAXP 1.1 specification and contains fixes for bugs found in the previous JAXP RI 1.1. Note the difference between specification and implementation versions here. Also, the first RI version has the same number as the specification version, namely 1.1.
As of June 2002, the current specification version of JAXP is 1.2. The current RI version is JAXP RI 1.2.0.
Q. Where can I download an implementation?
The following tables list implementations that claim to support at least some portions of JAXP. Please note that not all claims have been verified and that the information may not be current.
JAXP can be divided into two main parts: a parsing API and a transform API. Implementations that support the transform API are typically XSLT processors which require an XML parser to read input documents. Because of this, these implementations typically bundle an XML parser as part of their distribution.
The following implementations support the transform component of JAXP and also bundle a parser (in alphabetical order):
| Name | Parser Implementation | XSLT Processor Implementation | Comment |
|---|---|---|---|
| Apache Xalan-J | Xerces-J 2.x | Xalan-J XSLT | None |
| JAXP Reference Implementation | Xerces2 or Crimson | Xalan-J XSLT | See JAXP RI questions below. |
| Java 2 Standard Edition 1.4 | Crimson | Xalan-J XSLT, cvs tag: xalan_2_2_d10 | Uses JAXP RI version later than 1.1.2 |
| Saxon | Old fork of Ælfred2 | Saxon XSLT | No DOM support |
The parsing component often is distributed separately. The following are implementations that support just the parsing component of JAXP (in alphabetical order):
| Name | Comment |
|---|---|
| Ælfred2 portion of GNUJAXP | Non-apache style license, see link for details |
| Apache Crimson | None |
| Apache Xerces-J 1.x | Supports XML Schema. Obsoleted by Xerces2-J. |
| Apache Xerces2-J | Supports XML Schema. Supersedes Xerces-J 1.x. |
The above information was last updated 2002-06-11. Please email me with updated information.
Q. Why does Apache have multiple XML parsers?
As of June 2002, Apache has three Java parsers: Crimson, Xerces 1, and Xerces 2. The reason is historical -- because Apache accepted two donations from two different companies. IBM donated XML4J which became Apache Xerces 1. Sun donated Project X which became Apache Crimson. Xerces 2 is a new third parser which is a rewrite. It has goals such as maintainability, modularity, and the implementation of certain features, which neither of the previous original parsers has achieved. Xerces 2 was designed to fill the long-term needs of Apache projects going forward.
Q. Where can I get the JAXP specification?
A PDF version of the JAXP 1.1 specification can be found by following this link. In addition, there is a JAXP 1.2 maintenance specification that adds support for W3C XML Schema and is based on the previous JAXP 1.1 specification.
Q. How do I start developing an application which uses JAXP?
Perhaps a good place to start is to look at the JAXP RI docs and some sample programs that use JAXP.
If your application needs to programmatically perform XSLT transformations, then you need an implementation that supports the transform parts of the JAXP API. One resource that provides sample code for this type of application is the Xalan-J documentation.
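As a rough illustration of the transform part of the API, the following minimal sketch applies a stylesheet to a document using only javax.xml.transform classes. The stylesheet, the input document, and the class name TransformSketch are made up for this example, and exception handling is omitted.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class TransformSketch {
    // Hypothetical stylesheet: outputs the text content of a root <greeting> element.
    static final String XSLT =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='text'/>"
      + "<xsl:template match='/greeting'><xsl:value-of select='.'/></xsl:template>"
      + "</xsl:stylesheet>";

    static String transform(String xml) throws Exception {
        // The factory locates whichever XSLT processor is plugged in.
        TransformerFactory tf = TransformerFactory.newInstance();
        Transformer t = tf.newTransformer(new StreamSource(new StringReader(XSLT)));
        StringWriter out = new StringWriter();
        t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(transform("<greeting>Hello, JAXP</greeting>"));
    }
}
```

The code never names a concrete XSLT processor class, which is what keeps it portable across implementations.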
If not, then your application is a pure parsing application and you need
to decide between using the DOM or SAX APIs. In the interest of having
less API to learn, I would recommend limiting usage to the standard DOM and
SAX APIs as much as possible and using the auxiliary JAXP methods only for
functionality that is not available or perhaps difficult to use via the DOM
or SAX APIs. For example, currently DOM does not specify a method to
bootstrap or load an XML document and return a DOM Document
object. This is available via JAXP.
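The division of labor described above might look like the following sketch: JAXP is used only to bootstrap the Document, and everything after that goes through the standard DOM API. The class name DomBootstrapSketch and the sample XML are hypothetical, and exception handling is omitted.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

class DomBootstrapSketch {
    // JAXP part: obtain a parser and load the document.
    static Document load(String xml) throws Exception {
        DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        return db.parse(new InputSource(new StringReader(xml)));
    }

    // Standard DOM part: no JAXP or implementation-specific calls.
    static String rootAttribute(String xml, String name) throws Exception {
        Element root = load(xml).getDocumentElement();
        return root.getAttribute(name);
    }

    public static void main(String[] args) throws Exception {
        System.out.println(rootAttribute("<book id='42'/>", "id"));
    }
}
```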
Q. Where can I ask questions about JAXP?
One place to ask questions about JAXP is with the provider of your implementation. For example, if you are using the Apache Xerces parser, use the xerces-j-user mailing list. For Apache implementations, see the XML mailing lists page for subscription information. One location where you can find Apache mailing list archives is at MARC.
Sun also hosts a web-based Java and XML Forum where you can communicate with other JAXP users.
Q. Warning about Namespace processing default values
JAXP has a namespaceAware property that is directly tied to the SAX 2.0 "http://xml.org/sax/features/namespaces" feature, which controls whether the parser performs namespace processing. However, the JAXP default value of this property is different from the native SAX 2.0 default. When an application creates a parser using JAXP, the default value is false, but when an application uses SAX 2.0 directly through the static method org.xml.sax.helpers.XMLReaderFactory.createXMLReader(), the default is true.
The following code samples illustrate the typical use case of
creating a parser with namespace processing turned on using JAXP
(exception handling has been omitted). First, an example to create a DOM
Document object:
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
dbf.setNamespaceAware(true);
DocumentBuilder db = dbf.newDocumentBuilder();
Document doc = db.parse(new InputSource("http://some-uri.com/foo.xml"));
This next example instantiates a SAX 2.0 XMLReader using JAXP. Note, in this example, the code does not use the SAX 2.0 createXMLReader() static method to instantiate the XMLReader.
SAXParserFactory spf = SAXParserFactory.newInstance();
spf.setNamespaceAware(true);
XMLReader xmlReader = spf.newSAXParser().getXMLReader();
xmlReader.setContentHandler(new MyContentHandler());
xmlReader.parse(new InputSource("http://some-uri.com/foo.xml"));
You may ask, "Why are the default values different?" The reason is
historical. JAXP 1.0 first defined the default value to be
false. Then SAX 2.0 came along and defined a
"namespaces" feature with a default of true. Finally,
JAXP 1.1 came along and had to maintain backward compatibility with
JAXP 1.0 while also supporting SAX 2.0.
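One way to observe the JAXP side of this difference is to ask a JAXP-created XMLReader for the state of the SAX "namespaces" feature, with and without calling setNamespaceAware(true). This is a small sketch only; the class name NamespaceDefaultSketch is made up, and exception handling is omitted.

```java
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.XMLReader;

class NamespaceDefaultSketch {
    static final String NAMESPACES_FEATURE = "http://xml.org/sax/features/namespaces";

    // Returns the state of the SAX namespaces feature on a JAXP-created XMLReader.
    static boolean namespacesFeature(boolean namespaceAware) throws Exception {
        SAXParserFactory spf = SAXParserFactory.newInstance();
        if (namespaceAware) {
            spf.setNamespaceAware(true);
        }
        XMLReader reader = spf.newSAXParser().getXMLReader();
        return reader.getFeature(NAMESPACES_FEATURE);
    }

    public static void main(String[] args) throws Exception {
        System.out.println("JAXP default: " + namespacesFeature(false));
        System.out.println("after setNamespaceAware(true): " + namespacesFeature(true));
    }
}
```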
Q. How can I write my JAXP application to be implementation-independent?
JAXP enables your application to be implementation-independent, but it does not enforce this constraint. For example, if your instance XML documents use character encodings that the XML Recommendation does not require processors to support, then this may cause compatibility problems. You may have tested your application with a JAXP parser that implements an optional character encoding, but not all JAXP parsers may support that character encoding. Therefore, to ensure portability, your application should limit itself to the following required encoding names in instance documents that contain an encoding declaration: "UTF-8" or "UTF-16". In particular, Java encoding names such as "UTF8" in an encoding declaration may not work with all JAXP-compatible processors.
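If your application generates encoding declarations from Java encoding names, one possible precaution is to map the Java name to the charset's canonical name first. The sketch below uses java.nio.charset.Charset for that; it assumes the canonical name is the one you want in the declaration, which holds for "UTF-8" and "UTF-16" but should be checked for other charsets. The class and method names are hypothetical.

```java
import java.nio.charset.Charset;

class EncodingNameSketch {
    // Builds an XML declaration using the canonical charset name
    // instead of a Java-specific alias such as "UTF8".
    static String xmlDeclaration(String javaEncodingName) {
        String canonical = Charset.forName(javaEncodingName).name();
        return "<?xml version=\"1.0\" encoding=\"" + canonical + "\"?>";
    }

    public static void main(String[] args) {
        System.out.println(xmlDeclaration("UTF8"));
    }
}
```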
Q. How do I output/marshal/serialize a DOM tree into a stream?
Currently, there is only one way to do this using JAXP but it requires
using the transform component. See the list of implementations. This note
may also be useful.
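A minimal sketch of the transform-based approach: a Transformer created without a stylesheet performs an identity transformation, so it can copy a DOMSource to a StreamResult. The class name SerializeSketch and the sample document are made up, and exception handling is omitted.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class SerializeSketch {
    static String serialize(Document doc) throws Exception {
        // newTransformer() with no arguments gives an identity transform.
        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        t.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    // Builds a tiny sample tree: <greeting>hello</greeting>
    static Document sample() throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element root = doc.createElement("greeting");
        root.appendChild(doc.createTextNode("hello"));
        doc.appendChild(root);
        return doc;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(serialize(sample()));
    }
}
```

To write to a file or socket instead of a string, pass an OutputStream to the StreamResult constructor.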
Note there are several implementation-dependent ways of doing this such
as using the org.apache.xml.serialize package in Xerces or the
XmlDocument.write(OutputStream) method in Crimson, but this
ties your application to particular parsers and is thus non-portable.
In the future, DOM Level 3 should also provide this feature and it will likely be incorporated into a future version of JAXP.
Q. When I turn on validation, why do I fail to get any errors?
This is probably because you have not set an ErrorHandler.
To get validation errors, three things must be true; the last of them is that the application has set an ErrorHandler.
Often, applications fail to perform the last item.
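The following sketch shows one way to set an ErrorHandler so that validation errors are actually reported to the application. It validates against a DTD in the internal subset; the class name ValidationErrorsSketch and the sample documents are hypothetical, and exception handling is omitted.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

class ValidationErrorsSketch {
    // Parses with validation on and returns the reported warnings and errors.
    static List<String> validate(String xml) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setValidating(true);
        DocumentBuilder db = dbf.newDocumentBuilder();
        final List<String> problems = new ArrayList<>();
        db.setErrorHandler(new ErrorHandler() {
            public void warning(SAXParseException e) { problems.add("warning: " + e.getMessage()); }
            // Validity errors are recoverable, so they are collected rather than thrown.
            public void error(SAXParseException e) { problems.add("error: " + e.getMessage()); }
            public void fatalError(SAXParseException e) throws SAXParseException { throw e; }
        });
        db.parse(new InputSource(new StringReader(xml)));
        return problems;
    }

    public static void main(String[] args) throws Exception {
        String dtd = "<!DOCTYPE note [<!ELEMENT note (#PCDATA)>]>";
        System.out.println(validate(dtd + "<note><extra/></note>"));
    }
}
```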
Q. How do I validate my instance document against a particular schema?
If you are using DTDs, the only standard way of controlling the DTD that
is used to validate a document is to insert or replace the document type
declaration within the XML document itself. An example of some software
that will do this is
DOCTYPEChanger. A simpler method that requires a minimum of code is to
use an EntityResolver. In general, you can use an
EntityResolver to override any external entity in your XML
document. However, an EntityResolver cannot override the internal subset of a
DTD, only the external subset.
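As a sketch of the EntityResolver approach, the code below substitutes a local DTD whenever the parser asks for a system identifier ending in "note.dtd". The DTD text, the system identifier, and the class name EntityResolverSketch are all made up for illustration, and exception handling is omitted.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class EntityResolverSketch {
    // Hypothetical replacement for the external DTD subset.
    static final String LOCAL_DTD =
        "<!ELEMENT note (#PCDATA)><!ATTLIST note lang CDATA 'en'>";

    static Document parseWithLocalDtd(String xml) throws Exception {
        DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        // Returning null tells the parser to resolve the entity in the default way.
        db.setEntityResolver((publicId, systemId) ->
            systemId != null && systemId.endsWith("note.dtd")
                ? new InputSource(new StringReader(LOCAL_DTD))
                : null);
        return db.parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        Document doc = parseWithLocalDtd(
            "<!DOCTYPE note SYSTEM 'http://example.invalid/note.dtd'><note>hi</note>");
        // The default attribute value comes from the substituted DTD.
        System.out.println(doc.getDocumentElement().getAttribute("lang"));
    }
}
```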
If you are using another schema language like W3C XML Schema, you can use the JAXP 1.2 API to programmatically set the schema used to validate your instance document. See this question for more information on this topic.
Q. How do I use a different JAXP compatible implementation?
The JAXP 1.1 API allows applications to plug in different JAXP-compatible implementations of parsers or XSLT processors. For example, when an application wants to create a new JAXP DocumentBuilderFactory instance, it calls the static method DocumentBuilderFactory.newInstance(). This causes a search for the name of a concrete subclass of DocumentBuilderFactory using the following order:
1. The value of the system property javax.xml.parsers.DocumentBuilderFactory if it exists and is accessible.
2. The contents of the file $JAVA_HOME/jre/lib/jaxp.properties if it exists.
3. The jar service provider mechanism: a jar file resource named META-INF/services/javax.xml.parsers.DocumentBuilderFactory containing the name of the concrete class to instantiate.
Of the above ways to specify an implementation, perhaps the most useful is the jar service provider mechanism. To use this mechanism, place the implementation jar file on your classpath. For example, to use Xerces 1.4.4 instead of the version of Crimson which is bundled with JDK 1.4 (Java Development Kit version 1.4), place xerces.jar in your classpath. This mechanism also works with older versions of the JDK which do not bundle JAXP. If you are using JDK 1.4 or above, see this question for potential problems.
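When experimenting with these mechanisms, it can help to print which concrete classes the factories actually resolved to. The following is a small diagnostic sketch; the class name WhichImplementationSketch is made up, and the printed class names depend entirely on your environment.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.TransformerFactory;

class WhichImplementationSketch {
    // Reports the concrete class chosen by each JAXP factory lookup.
    static String describe() {
        return "DocumentBuilderFactory: "
                + DocumentBuilderFactory.newInstance().getClass().getName() + "\n"
            + "SAXParserFactory: "
                + SAXParserFactory.newInstance().getClass().getName() + "\n"
            + "TransformerFactory: "
                + TransformerFactory.newInstance().getClass().getName();
    }

    public static void main(String[] args) {
        // The system property is the first place the lookup consults; it may be unset (null).
        System.out.println("system property: "
            + System.getProperty("javax.xml.parsers.DocumentBuilderFactory"));
        System.out.println(describe());
    }
}
```

Running it once with and once without an alternative implementation on the classpath should show whether the switch took effect.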
Q. Why are there Apache classes in the J2SE 1.4 RI?
The J2SE 1.4 RI is the first version of the JDK that bundles an implementation of JAXP 1.1. This allows developers to write applications without having to provide a parser and XSLT processor with their application. However, in some cases, it may create additional problems.
The Sun J2SE 1.4 RI uses Apache software for its implementation of JAXP 1.1 with package names unchanged from Apache software distributions. This can cause problems, for example, if your application wants to use a newer version of Apache software. Under the Java 2 class loader delegation model, the java launcher's ClassLoader will load the bundled version of a class (in rt.jar) before any other version. Thus, if you place a newer version of xalan.jar in the extensions directory or on your CLASSPATH, then that version will be ignored since the runtime will use the older bundled version instead. As a workaround, see the question on overriding the implementation in JDK 1.4.
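To check whether a class came from a jar on your classpath or from the runtime itself, one possible diagnostic is to look at the class's code source. This is only a sketch: the class name ClassOriginSketch is hypothetical, and the interpretation of a missing code source as "loaded from the runtime" is an assumption that is typical but not guaranteed on every JDK.

```java
import java.security.CodeSource;
import javax.xml.transform.TransformerFactory;

class ClassOriginSketch {
    // Returns the location a class was loaded from, if the runtime exposes one.
    static String originOf(Class<?> c) {
        CodeSource src = c.getProtectionDomain().getCodeSource();
        // Assumption: classes bundled with the runtime usually have no code source.
        return (src == null || src.getLocation() == null)
            ? "no code source (typically bundled with the runtime)"
            : src.getLocation().toString();
    }

    public static void main(String[] args) {
        Class<?> impl = TransformerFactory.newInstance().getClass();
        System.out.println(impl.getName() + " <- " + originOf(impl));
    }
}
```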
The future plan is to rename the org.apache.** packages to be something like com.sun.org.apache.** to fix this problem. In addition, other package-dependent parts of the software may also need to be modified. However, this may not be done until after JDK 1.4.1.
Q. How do I override the JAXP implementation in JDK 1.4 and above?
In JDK 1.4, there is an Endorsed Standards Override Mechanism which can be used to override the classes in the JDK itself. One way to replace the classes in the JDK with the classes contained in a set of jar files is to place the jar files in some directory, "my-endorsed", and define a system property. For example, to use a newer version of Xalan, place the newer version of xalan.jar in the "my-endorsed" directory and invoke the Java launcher with the -Djava.endorsed.dirs=my-endorsed option. Another way is to place a jar file in the $JAVA_HOME/lib/endorsed directory of the JDK installation itself. You may need to create the endorsed directory if it does not yet exist.
Q. How do I use W3C XML Schema with JAXP?
Use the JAXP 1.2 API to validate instance documents with W3C XML Schema. The JAXP 1.2.0 RI contains two sample programs (DOMEcho and SAXLocalNameCount) that illustrate how to do this. See the JAXP 1.2 specification for more details.
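As a rough sketch of the JAXP 1.2 approach, the code below sets the schemaLanguage and schemaSource properties defined by the JAXP 1.2 specification on a DocumentBuilderFactory. The schema, the sample documents, and the class name SchemaValidationSketch are made up for illustration; a parser that does not support these properties may throw IllegalArgumentException from setAttribute. Exception handling is omitted.

```java
import java.io.ByteArrayInputStream;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

class SchemaValidationSketch {
    static final String JAXP_SCHEMA_LANGUAGE =
        "http://java.sun.com/xml/jaxp/properties/schemaLanguage";
    static final String JAXP_SCHEMA_SOURCE =
        "http://java.sun.com/xml/jaxp/properties/schemaSource";
    static final String W3C_XML_SCHEMA = "http://www.w3.org/2001/XMLSchema";

    // Hypothetical schema: a single <count> element holding an integer.
    static final String XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
      + "<xs:element name='count' type='xs:integer'/>"
      + "</xs:schema>";

    static List<String> validate(String xml) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);
        dbf.setValidating(true);
        // Set the schema language before the schema source.
        dbf.setAttribute(JAXP_SCHEMA_LANGUAGE, W3C_XML_SCHEMA);
        dbf.setAttribute(JAXP_SCHEMA_SOURCE,
            new ByteArrayInputStream(XSD.getBytes(StandardCharsets.UTF_8)));
        DocumentBuilder db = dbf.newDocumentBuilder();
        final List<String> errors = new ArrayList<>();
        db.setErrorHandler(new ErrorHandler() {
            public void warning(SAXParseException e) { }
            public void error(SAXParseException e) { errors.add(e.getMessage()); }
            public void fatalError(SAXParseException e) throws SAXParseException { throw e; }
        });
        db.parse(new InputSource(new StringReader(xml)));
        return errors;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(validate("<count>three</count>"));
    }
}
```

The schema source can also be given as a URI string or a File rather than an InputStream, which is more convenient when the same factory is reused for several parses.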
Q. Where do I get the latest version of the JAXP RI?
Newer versions of the JAXP RI are being released through the following Sun software releases:
To approximate an unbundled version, you can also download the major components individually from Apache. See the question on source code for more information.
Q. Where can I find JAXP RI docs online?
Since at the time of this writing Sun does not provide one, I will try to maintain a browsable online version of the JAXP RI docs. The JAXP RI docs also contain a link to the JAXP API javadoc.
Q. Where do I get the source code to the JAXP RI?
The JAXP RI is based on open sourced code. Although Sun no longer provides a free source distribution (there might be a way to pay for one), you can obtain the same source code from the Apache CVS repositories. For example, the JAXP RI 1.2.0 release consists of:
Note: there may be some differences with the actual JAXP RI. For example, the JAXP RI comes with documentation which is not yet available at Apache.
Q. Why all the jar files?
Starting with JAXP RI 1.2.0, the jar packaging scheme has changed.
There are now a total of six jar files. The reasons are to support the
J2SE 1.4 Endorsed
Standards Rules and to be compatible with the current Apache packaging
scheme. The Endorsed Standards Rules essentially state that only
endorsed APIs and their implementations can be replaced with newer versions
of those APIs. SAX 2 and DOM Level 2 Core are endorsed APIs, but the
javax.xml.{parsers, transform} classes are not. See the link
above for the precise rules. The decision to split the jar files was a
compromise because six jar files also make life difficult for users.
Q. (Obsolete) What happened to jaxp.jar?
To summarize, starting with JAXP RI 1.1.3, there is no jaxp.jar. This fact should have been emphasized in the JAXP RI 1.1.3 documentation. The motivation for this decision was to match the Apache packaging scheme at the time of release, as well as to simplify life for developers.
In JAXP RI 1.1.3, which includes the Apache Crimson 1.1.3 parser, both API and implementation classes are contained in the same jar files: crimson.jar for the parser and xalan.jar for the XSLT processor. The JAXP RI packaging scheme matches the Apache packaging scheme at the time of release. However, the Apache scheme has since changed. See the Apache site for details.