Using XSLT, a DTDx document can easily be transformed to another grammar format. A DTD grammar can be flattened or converted to another format like XML Schema or Relax NG. To make this easier, NekoDTD includes the following stylesheets in the package:
dtdx2dtd.xsl: Converts an XML document in DTDx format to a flattened DTD file. This stylesheet is useful for converting a DTD that is broken up into multiple entities (e.g. DocBook) into a single DTD file.
dtdx2flat.xsl: Flattens the full DTDx format to a simpler, but equivalent, format. This stylesheet removes all occurrences of externalSubset, parameterEntity, and similar tags, making all declarations appear as if they appeared in the internal subset of the document. Processing the DTDx file with this stylesheet first simplifies any further processing performed on the DTDx document.
flat2xsd.xsl: Converts a flattened DTDx document to an equivalent XML Schema grammar.
Note: The DTDx document must be processed with the dtdx2flat.xsl stylesheet before conversion to XML Schema using this stylesheet. If not, the XSLT stylesheet will (most likely) signal an error, reminding you to flatten the DTDx file first.
flat2rng.xsl: Converts a flattened DTDx document to an equivalent Relax NG grammar.
Note: The DTDx document must be processed with the dtdx2flat.xsl stylesheet before conversion to Relax NG using this stylesheet. If not, the XSLT stylesheet will (most likely) signal an error, reminding you to flatten the DTDx file first.
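In order to convert a DTD file to another format, use the following steps: parse the DTD file to produce a DTDx document; flatten the DTDx document with the dtdx2flat.xsl stylesheet; then transform the flattened DTDx document with the flat2xsd.xsl or flat2rng.xsl stylesheet.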
To alleviate the burden of performing these steps, the NekoDTD package includes a number of useful batch files to run these steps on Windows. [Only one shell script is available at this time. But it would be very easy to port the .bat files to .sh.] The following batch files are included:
Parses the given DTD document and writes the generated DTDx document to standard out. The output can then be re-directed to a file for use with subsequent processing.
Parses the given DTDx document, performs an XML transformation using the dtdx2dtd.xsl XSLT stylesheet, and writes the generated document to standard out. The output can then be re-directed to a file for use with subsequent processing.
Parses the given DTDx document, performs an XML transformation using the dtdx2flat.xsl XSLT stylesheet, and writes the generated document to standard out. The output can then be re-directed to a file for use with subsequent processing.
Parses the given DTDx document, performs an XML transformation using the flat2xsd.xsl XSLT stylesheet, and writes the generated document to standard out. The output can then be re-directed to a file for use with subsequent processing.
Parses the given DTDx document, performs an XML transformation using the flat2rng.xsl XSLT stylesheet, and writes the generated document to standard out. The output can then be re-directed to a file for use with subsequent processing.
Parses the given DTD document, flattens the resultant DTDx document, transforms the DTDx to XML Schema, and writes the final generated document to standard out. The output can then be re-directed to a file for use with subsequent processing.
Parses the given DTD document, flattens the resultant DTDx document, transforms the DTDx to Relax NG, and writes the final generated document to standard out. The output can then be re-directed to a file for use with subsequent processing.
These batch files assume that you have downloaded Xerces2 and Xalan and placed the appropriate Jar files in the lib/ directory. Note: NekoDTD does not provide these Jar files.
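The flatten-then-convert sequence can also be driven from Java through the standard JAXP transformation API, which Xalan implements. The following is only a minimal sketch, not code shipped with NekoDTD: the class name StylesheetChain is hypothetical, and it assumes the DTDx document has already been generated and is available as a string.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Source;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

/** Hypothetical helper (not part of NekoDTD): applies XSLT stylesheets in order. */
final class StylesheetChain {

    /**
     * Runs the input document through each stylesheet in turn,
     * feeding the output of one transformation into the next.
     */
    static String apply(String inputXml, Source... stylesheets) throws TransformerException {
        TransformerFactory factory = TransformerFactory.newInstance();
        String current = inputXml;
        for (Source stylesheet : stylesheets) {
            // Compile the stylesheet and transform the current document into a string.
            Transformer transformer = factory.newTransformer(stylesheet);
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(current)),
                                  new StreamResult(out));
            current = out.toString();
        }
        return current;
    }
}
```

For example, StylesheetChain.apply(dtdx, new StreamSource(new java.io.File("dtdx2flat.xsl")), new StreamSource(new java.io.File("flat2xsd.xsl"))) would flatten a DTDx document and then convert it to XML Schema. The stylesheet paths here are assumptions and should be adjusted to wherever the stylesheets live in your copy of the package.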
The data/dtd/ directory contains a sample DTD grammar called test.dtd; an XML document called test.xml that references the DTD for validation in the DOCTYPE line; and an XML document called test-schema.xml that references the generated XML Schema grammar called test.xsd via the xsi:noNamespaceSchemaLocation attribute. (There is no need to include a separate document file for Relax NG because Relax NG does not currently define a way of associating a grammar with a document from within the document instance.) For convenience, copies of the DTD grammar converted to XML Schema and Relax NG are also provided as test.xsd and test.rng, respectively.
The sample DTD grammar looks like this:
<!ELEMENT root (foo|(bar,baz)+)*>
<!ATTLIST root version CDATA #FIXED '1.0'>
<!ELEMENT foo EMPTY>
<!ELEMENT bar (#PCDATA)>
<!ELEMENT baz (#PCDATA|mumble)*>
<!ELEMENT mumble ANY>
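To get a feel for which documents this grammar accepts, a document can be checked against its DTD with a validating JAXP parser. The sketch below uses only the standard javax.xml.parsers and org.xml.sax APIs; the class name DtdCheck is hypothetical and is not part of NekoDTD.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

/** Hypothetical helper (not part of NekoDTD): DTD validation with plain JAXP. */
final class DtdCheck {

    /** Returns true if the document is well-formed and valid against the DTD named in its DOCTYPE. */
    static boolean isValid(InputSource document)
            throws ParserConfigurationException, IOException {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setValidating(true); // turn on DTD validation
        DocumentBuilder builder = factory.newDocumentBuilder();
        // Validity errors are only reported, not thrown, unless a handler rethrows them.
        builder.setErrorHandler(new ErrorHandler() {
            public void warning(SAXParseException e) { /* ignore warnings */ }
            public void error(SAXParseException e) throws SAXException { throw e; }
            public void fatalError(SAXParseException e) throws SAXException { throw e; }
        });
        try {
            builder.parse(document);
            return true;
        } catch (SAXException e) {
            return false;
        }
    }
}
```

With the sample grammar, a document such as <root><foo/><bar>x</bar><baz/></root> is accepted, while <root><bar>x</bar></root> is rejected because every bar must be followed by a baz. To check the shipped sample, something like DtdCheck.isValid(new InputSource(new java.io.File("data/dtd/test.xml").toURI().toString())) could be used, assuming the working directory is the package root.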
Running the dtd2xsd batch file as shown with the sample DTD grammar:
> dtd2xsd data/dtd/test.dtd
produces the following equivalent XML Schema grammar:
<?xml version="1.0" encoding="UTF-8" ?>
<!-- Generated from data/dtd/test.dtd -->
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">

 <!-- <!ELEMENT root (foo|(bar,baz)+)*> -->
 <xsd:element name="root">
  <xsd:complexType>
   <xsd:choice minOccurs="0" maxOccurs="unbounded">
    <xsd:element ref="foo" />
    <xsd:sequence minOccurs="1" maxOccurs="unbounded">
     <xsd:element ref="bar" />
     <xsd:element ref="baz" />
    </xsd:sequence>
   </xsd:choice>
   <!-- <!ATTLIST root version CDATA #FIXED "1.0"> -->
   <xsd:attribute name="version" fixed="1.0">
    <xsd:simpleType>
     <xsd:restriction base="xsd:string" />
    </xsd:simpleType>
   </xsd:attribute>
  </xsd:complexType>
 </xsd:element>

 ...

</xsd:schema>
This can be a very convenient way to convert existing DTDs in order to transition to using XML Schema.
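One way to sanity-check a generated schema is to validate a document against it with the standard javax.xml.validation API. This is a minimal sketch rather than anything provided by NekoDTD; the class name SchemaCheck is hypothetical, and the schema is passed in explicitly instead of being located through xsi:noNamespaceSchemaLocation.

```java
import java.io.IOException;
import javax.xml.XMLConstants;
import javax.xml.transform.Source;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

/** Hypothetical helper (not part of NekoDTD): XML Schema validation with plain JAXP. */
final class SchemaCheck {

    /**
     * Returns true if the document validates against the schema.
     * A SAXException from this method means the schema itself could not be compiled.
     */
    static boolean isValid(Source schemaSource, Source document)
            throws SAXException, IOException {
        SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        Schema schema = factory.newSchema(schemaSource);
        Validator validator = schema.newValidator();
        try {
            // With no error handler set, validity errors are thrown as SAXParseException.
            validator.validate(document);
            return true;
        } catch (SAXException e) {
            return false;
        }
    }
}
```

For the sample files, the call might look like SchemaCheck.isValid(new StreamSource(new java.io.File("data/dtd/test.xsd")), new StreamSource(new java.io.File("data/dtd/test-schema.xml"))), with StreamSource taken from javax.xml.transform.stream and the paths assumed to be relative to the package root.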
Running the dtd2rng batch file as shown with the sample DTD grammar:
> dtd2rng data/dtd/test.dtd
produces the following equivalent Relax NG grammar:
<?xml version="1.0" encoding="UTF-8" ?>
<!-- Generated from data/dtd/test.dtd -->
<grammar xmlns="http://relaxng.org/ns/structure/1.0"
         datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes">

 <start>
  <choice>
   <element name="root">
    <ref name="T_root" />
   </element>
   ...
  </choice>
 </start>

 <!-- <!ELEMENT root (foo|(bar,baz)+)*> -->
 <define name="T_root">
  <zeroOrMore>
   <choice>
    <element name="foo">
     <ref name="T_foo" />
    </element>
    <oneOrMore>
     <group>
      <element name="bar">
       <ref name="T_bar" />
      </element>
      <element name="baz">
       <ref name="T_baz" />
      </element>
     </group>
    </oneOrMore>
   </choice>
  </zeroOrMore>
  <!-- <!ATTLIST root version CDATA #FIXED "1.0"> -->
  <optional>
   <attribute name="version">
    <value>1.0</value>
   </attribute>
  </optional>
 </define>

 ...

</grammar>
This can be a very convenient way to convert existing DTDs in order to transition to using Relax NG.
The un-flattened DTDx document provides enough information to analyze the DTD declarations and produce more meaningful XML Schema and Relax NG content types. However, no stylesheet or code is currently provided with NekoDTD to perform this type of processing. [If you would like to write such a stylesheet or a tool to perform this conversion from the DTDx file, please contact me.]