Converting DTD Grammars


Provided Stylesheets

Using XSLT, a DTDx document can easily be transformed to another grammar format. A DTD grammar can be flattened or converted to another format like XML Schema or Relax NG. To make this easier, NekoDTD includes the following stylesheets in the package:

Performing a Transformation

In order to convert a DTD file to another format, use the following steps:

  1. Parse the DTD file with NekoDTD and serialize the XML representation to a file.
  2. Process the output of the first step with the stylesheet of your choice using an XSLT processor, such as Xalan.
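The second step can be sketched as a small Python wrapper around Xalan's command-line entry point. This is a minimal sketch under stated assumptions, not part of NekoDTD: it assumes the Xalan JAR files are on the classpath given by `classpath`, and the file names used in the usage note are hypothetical placeholders. Step 1 is not shown because the exact NekoDTD invocation is not given here.

```python
import subprocess


def build_xalan_command(dtdx_file, stylesheet, output_file, classpath="lib/*"):
    """Build the command line for step 2 (XSLT transformation with Xalan).

    Assumptions: the Xalan JAR files are reachable through `classpath`, and
    all three file arguments are placeholders chosen by the caller.
    """
    return [
        "java", "-cp", classpath,
        "org.apache.xalan.xslt.Process",  # Xalan's command-line class
        "-IN", dtdx_file,                 # XML representation from step 1
        "-XSL", stylesheet,               # stylesheet of choice
        "-OUT", output_file,              # converted grammar
    ]


def run_transform(dtdx_file, stylesheet, output_file, classpath="lib/*"):
    """Run the transformation; raises CalledProcessError if Xalan fails."""
    command = build_xalan_command(dtdx_file, stylesheet, output_file, classpath)
    subprocess.run(command, check=True)
```

For example, `run_transform("test.dtdx.xml", "stylesheet.xsl", "test.xsd")` would transform a serialized DTDx file into a schema file, where all three names are illustrative only.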

Convenience Batch Files

To alleviate the burden of performing these steps, the NekoDTD package includes a number of batch files that run them on Windows. [Only one shell script is available at this time, but it would be very easy to port the .bat files to .sh.] The following batch files are included:

These batch files assume that you have downloaded Xerces2 and Xalan and placed the appropriate JAR files in the lib/ directory. Note: NekoDTD does not provide these JAR files.

Converting Sample DTD to Other Grammar Types

The data/dtd/ directory contains a sample DTD grammar called test.dtd; an XML document called test.xml that references the DTD for validation in the DOCTYPE line; and an XML document called test-schema.xml that references the generated XML Schema grammar, test.xsd, via the xsi:noNamespaceSchemaLocation attribute. (There is no need to include a separate document file for Relax NG because Relax NG does not currently define a way of associating a grammar with a document from within the document instance.) For convenience, copies of the DTD grammar converted to XML Schema and Relax NG are also provided; they are called test.xsd and test.rng, respectively.
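To illustrate how a document points at its XML Schema grammar, the following minimal sketch reads the xsi:noNamespaceSchemaLocation attribute with Python's standard library. The `SAMPLE` document is an assumption made for illustration; it is not the actual content of test-schema.xml.

```python
import xml.etree.ElementTree as ET

# Namespace of the XML Schema instance attributes (the "xsi" prefix).
XSI_NS = "http://www.w3.org/2001/XMLSchema-instance"

# Hypothetical instance document; the shipped test-schema.xml may differ.
SAMPLE = """<root xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xsi:noNamespaceSchemaLocation="test.xsd" version="1.0">
  <foo/>
</root>"""


def schema_location(xml_text):
    """Return the referenced schema file, or None if no reference is present."""
    root = ET.fromstring(xml_text)
    return root.get("{%s}noNamespaceSchemaLocation" % XSI_NS)


print(schema_location(SAMPLE))
```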

Sample DTD

The sample DTD grammar looks like this:

<!ELEMENT root (foo|(bar,baz)+)*>
<!ATTLIST root version CDATA #FIXED '1.0'>
<!ELEMENT foo EMPTY>
<!ELEMENT bar (#PCDATA)>
<!ELEMENT baz (#PCDATA|mumble)*>
<!ELEMENT mumble ANY>
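To make the content model of root concrete, the sketch below expresses (foo|(bar,baz)+)* as a regular expression over the sequence of child element names. It is only an illustration of what the model accepts, not how NekoDTD or a validating parser works, and it deliberately ignores attributes and character data. The function name is hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# (foo|(bar,baz)+)* over child names, each name followed by one space.
ROOT_MODEL = re.compile(r"(?:foo |(?:bar baz )+)*")


def root_children_match(xml_text):
    """Check whether the children of <root> follow (foo|(bar,baz)+)*."""
    root = ET.fromstring(xml_text)
    if root.tag != "root":
        return False
    names = "".join(child.tag + " " for child in root)
    return ROOT_MODEL.fullmatch(names) is not None
```

A document such as `<root><foo/><bar>x</bar><baz>y</baz></root>` matches, while `<root><bar>x</bar></root>` does not, because bar must always be followed by baz. The nested repetition mirrors the DTD literally, so the pattern is meant for small samples only.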

Converting DTD to XML Schema

Running the dtd2xsd batch file as shown with the sample DTD grammar:

> dtd2xsd data/dtd/test.dtd

produces the following equivalent XML Schema grammar:

<?xml version="1.0" encoding="UTF-8" ?> 
<!-- Generated from data/dtd/test.dtd -->
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
 <!-- <!ELEMENT root (foo|(bar,baz)+)*> -->
 <xsd:element name="root">
  <xsd:complexType>
   <xsd:choice minOccurs="0" maxOccurs="unbounded">
    <xsd:element ref="foo" />
    <xsd:sequence minOccurs="1" maxOccurs="unbounded">
     <xsd:element ref="bar" />
     <xsd:element ref="baz" />
    </xsd:sequence>
   </xsd:choice>
   <!-- <!ATTLIST root version CDATA #FIXED "1.0"> -->
   <xsd:attribute name="version" fixed="1.0">
    <xsd:simpleType>
     <xsd:restriction base="xsd:string" />
    </xsd:simpleType>
   </xsd:attribute>
  </xsd:complexType>
 </xsd:element>
 ...
</xsd:schema>

This can be a very convenient way to convert existing DTDs in order to transition to using XML Schema.
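The occurrence attributes in the generated schema follow the usual correspondence between DTD occurrence indicators and XML Schema minOccurs/maxOccurs values. The sketch below spells out that correspondence; it is an illustration consistent with the output shown above, not the code of the provided stylesheet, and the function name is hypothetical.

```python
# DTD occurrence indicator -> (minOccurs, maxOccurs)
OCCURS = {
    "":  ("1", "1"),           # exactly once (XML Schema default)
    "?": ("0", "1"),           # optional
    "*": ("0", "unbounded"),   # zero or more
    "+": ("1", "unbounded"),   # one or more
}


def occurs_attributes(indicator=""):
    """Return the XML Schema occurrence attributes for a DTD indicator."""
    min_occurs, max_occurs = OCCURS[indicator]
    return {"minOccurs": min_occurs, "maxOccurs": max_occurs}
```

For the sample grammar, the outer `*` on root's content model corresponds to the attributes on xsd:choice, and the `+` on (bar,baz) corresponds to those on xsd:sequence.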

Converting DTD to Relax NG

Running the dtd2rng batch file as shown with the sample DTD grammar:

> dtd2rng data/dtd/test.dtd

produces the following equivalent Relax NG grammar:

<?xml version="1.0" encoding="UTF-8" ?> 
<!-- Generated from data/dtd/test.dtd -->
<grammar xmlns="http://relaxng.org/ns/structure/1.0" 
         datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes">
 <start>
  <choice>
   <element name="root">
    <ref name="T_root" /> 
   </element>
   ...
  </choice>
 </start>
 <!-- <!ELEMENT root (foo|(bar,baz)+)*> -->
 <define name="T_root">
  <zeroOrMore>
   <choice>
    <element name="foo">
     <ref name="T_foo" />
    </element>
    <oneOrMore>
     <group>
      <element name="bar">
       <ref name="T_bar" />
      </element>
      <element name="baz">
       <ref name="T_baz" />
      </element>
     </group>
    </oneOrMore>
   </choice>
  </zeroOrMore>
  <!-- <!ATTLIST root version CDATA #FIXED "1.0"> -->
  <optional>
   <attribute name="version">
    <value>1.0</value>
   </attribute>
  </optional>
 </define>
 ...
</grammar>

This can be a very convenient way to convert existing DTDs in order to transition to using Relax NG.

More Powerful Conversions

The un-flattened DTDx document provides enough information to analyze the DTD declarations and produce more meaningful XML Schema and Relax NG content types. However, no stylesheet or code is currently provided with NekoDTD to perform this type of processing. [If you would like to write such a stylesheet or a tool to perform this conversion from the DTDx file, please contact me.]