Processor List

Core

These processors form the core of the Pipeline utility processor that is bundled with NekoStyle. Because the Pipeline program is itself written as a processor, it can be used within other programs that are written to the NekoStyle architecture. Most of the time, though, the Pipeline processor is run either directly or as an Ant task.

org.cyberneko.style.core.Pipeline

Provides basic pipeline infrastructure. This class can be run as a standalone program.

org.cyberneko.style.core.DocumentStore

Stores a document in the pipeline cache.

Parameters:
DocumentStore.id
The identifier to use to store the document. This parameter is required.

org.cyberneko.style.core.DocumentLoad

Retrieves a document from the pipeline cache.

Parameters:
DocumentLoad.idref
The identifier of the document to retrieve. This parameter is required.
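
The store/load pairing above can be pictured as a keyed map of documents. The following is a minimal sketch of that idea only, not the NekoStyle implementation: the class PipelineCacheSketch and its methods are hypothetical, and the real processors are configured through the DocumentStore.id and DocumentLoad.idref parameters rather than called directly.

```java
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

// Hypothetical stand-in for the pipeline cache; not a NekoStyle class.
public class PipelineCacheSketch {
    private final Map<String, Document> cache = new HashMap<>();

    /** Plays the role of DocumentStore: keeps the document under the given id. */
    public void store(String id, Document doc) {
        if (id == null) {
            throw new IllegalArgumentException("id is required");
        }
        cache.put(id, doc);
    }

    /** Plays the role of DocumentLoad: returns the document stored under idref. */
    public Document load(String idref) {
        if (idref == null) {
            throw new IllegalArgumentException("idref is required");
        }
        Document doc = cache.get(idref);
        if (doc == null) {
            throw new IllegalStateException("no document stored as \"" + idref + "\"");
        }
        return doc;
    }

    public static void main(String[] args) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        doc.appendChild(doc.createElement("listing"));

        PipelineCacheSketch cache = new PipelineCacheSketch();
        cache.store("listing", doc);
        // A later stage retrieves the same document by its identifier.
        System.out.println(cache.load("listing").getDocumentElement().getTagName());
    }
}
```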

Parsers

An assortment of parsers that can generate an XML data structure as output. The traditional parser would be any conformant XML parser implementation. However, any component that can generate a data structure that "looks like" XML can be considered a parser within the NekoStyle architecture.

org.cyberneko.style.parsers.XMLParser

Parses XML documents.

Requires: XML Parser (JAXP)
Parameters:
XMLParser.href
The URI of the XML file to parse. If the URI is specified as "stdin:", the parser will read from standard input. This parameter is required.
XMLParser.validation
Specifies whether the parser should perform validation. The default value of this parameter is "false".
XMLParser.namespaces
Specifies whether the parser processes namespaces. The default value of this parameter is "true".
XMLParser.exitOnError
Specifies whether the parser should exit on errors instead of only on fatal errors. The default value of this parameter is "false".
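
Since XMLParser relies on a JAXP parser, its parameters plausibly correspond to standard JAXP settings. The sketch below shows one way such a mapping could look. It is an assumption about the behaviour, not the XMLParser source: the class XmlParseSketch and its parse method are hypothetical, and only the JAXP and SAX calls are real API.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

// Hypothetical helper; illustrates JAXP settings, not NekoStyle internals.
public class XmlParseSketch {
    public static Document parse(InputStream in, boolean validation,
                                 boolean namespaces, boolean exitOnError) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setValidating(validation);      // compare XMLParser.validation
        factory.setNamespaceAware(namespaces);  // compare XMLParser.namespaces

        DocumentBuilder builder = factory.newDocumentBuilder();
        builder.setErrorHandler(new ErrorHandler() {
            public void warning(SAXParseException e) {
                System.err.println("warning: " + e.getMessage());
            }
            public void error(SAXParseException e) throws SAXException {
                if (exitOnError) {
                    throw e;  // compare XMLParser.exitOnError: stop on recoverable errors too
                }
                System.err.println("error: " + e.getMessage());
            }
            public void fatalError(SAXParseException e) throws SAXException {
                throw e;      // fatal errors always stop the parse
            }
        });
        return builder.parse(in);
    }

    public static void main(String[] args) throws Exception {
        // An in-memory document keeps the example reproducible; passing
        // System.in instead would correspond to the "stdin:" case.
        byte[] xml = "<a:doc xmlns:a='urn:example'/>".getBytes(StandardCharsets.UTF_8);
        Document doc = parse(new ByteArrayInputStream(xml), false, true, false);
        System.out.println(doc.getDocumentElement().getNamespaceURI());
    }
}
```

With namespace processing switched off, the same document parses, but the root element reports no namespace URI.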

org.cyberneko.style.parsers.HTMLParser

Parses HTML documents.

Requires: NekoHTML 0.3.3 (or higher)
Parameters:
HTMLParser.href
The URI of the HTML file to parse. If the URI is specified as "stdin:", the parser will read from standard input. This parameter is required.
HTMLParser.encoding
Specifies the default encoding of parsed files. For example, you may want Japanese web pages to be parsed with Shift_JIS by default. This does not override the encoding explicitly set within the META[@http-equiv="Content-Type"]/@content node.

Transformers

Transformation processors are the most powerful and useful components used in any pipeline. Using a transformer, input documents can be converted to other formats. In addition, the output of a transformation can be used within an <inline> pipeline to dynamically add further processing stages to the pipeline. For example, the output of the DirectoryList processor can be transformed into additional processing stages that clean the HTML files found within a certain subdirectory.

org.cyberneko.style.processors.XSLTProcessor

Transforms documents using XSLT. The document output from the previous stage in the pipeline is transformed using the stylesheet specified in the XSLTProcessor.style parameter.

Requires: XSLT Processor (JAXP/TrAX)
Parameters:
XSLTProcessor.style
The identifier of the stylesheet document. If not specified, the identity transform is performed.
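
In TrAX terms, a transformation stage with an optional stylesheet could be sketched as follows. This is a rough illustration under stated assumptions: the class XsltStageSketch is hypothetical, and it takes the stylesheet as literal text, whereas XSLTProcessor.style names a stylesheet document by identifier. A null stylesheet stands in for the "not specified" case and yields the identity transform.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical helper; illustrates the TrAX calls, not NekoStyle internals.
public class XsltStageSketch {
    public static String transform(String xml, String stylesheet) throws Exception {
        TransformerFactory factory = TransformerFactory.newInstance();
        Transformer transformer = (stylesheet == null)
                ? factory.newTransformer()  // no stylesheet: identity transform
                : factory.newTransformer(new StreamSource(new StringReader(stylesheet)));

        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String sheet =
              "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
            + "<xsl:output method='text'/>"
            + "<xsl:template match='/'><xsl:value-of select='/greeting/@to'/></xsl:template>"
            + "</xsl:stylesheet>";
        String input = "<greeting to='world'/>";

        System.out.println(transform(input, sheet));  // stylesheet applied
        System.out.println(transform(input, null));   // identity transform
    }
}
```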

Printers

After a document has been loaded and transformed, the next step is to serialize, or "print", the result back to a file representation. The NekoStyle package comes with an assortment of common printers for this purpose.

org.cyberneko.style.printers.TextPrinter

Prints the text within a document ignoring all elements and other non-text nodes.

Parameters:
TextPrinter.href
The name of the output file. If not specified, writes to standard output.
TextPrinter.encoding
The output encoding. Default is "UTF-8".
TextPrinter.mkdirs
Specifies whether to create any missing directories for the output file. Default is "true".
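
A text-only printer with these three parameters could be approximated as below. This is a sketch under assumptions, not the TextPrinter source: the class TextPrintSketch is hypothetical, the output name is treated as a local file path, and a null href stands in for the "not specified" case. The DOM getTextContent call returns the text of an element's descendants while skipping comments and processing instructions.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.StringReader;
import java.io.Writer;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper; illustrates the parameters, not NekoStyle internals.
public class TextPrintSketch {
    /** Opens the target: standard output when href is null, else a file. */
    static OutputStream open(String href, boolean mkdirs) throws IOException {
        if (href == null) {
            return System.out;
        }
        File file = new File(href);
        File parent = file.getParentFile();
        if (mkdirs && parent != null) {
            parent.mkdirs();  // create missing directories for the output file
        }
        return new FileOutputStream(file);
    }

    public static void print(Document doc, String href, String encoding, boolean mkdirs)
            throws IOException {
        // Text of all descendants; comments and processing instructions are skipped.
        String text = doc.getDocumentElement().getTextContent();
        Writer writer = new OutputStreamWriter(open(href, mkdirs), encoding);
        try {
            writer.write(text);
            writer.flush();
        } finally {
            if (href != null) {
                writer.close();  // leave standard output open
            }
        }
    }

    public static void main(String[] args) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader("<p>Hello <b>world</b></p>")));
        print(doc, null, "UTF-8", true);  // no href: write to standard output
    }
}
```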

org.cyberneko.style.printers.XMLPrinter

Prints an XML document. This class is derived from the XSLTPrinter and shares the same requirements.

Parameters:
XMLPrinter.href
The name of the output file. If not specified, writes to standard output.
XMLPrinter.encoding
The output encoding. Default is "UTF-8".
XMLPrinter.mkdirs
Specifies whether to create any missing directories for the output file. Default is "true".

org.cyberneko.style.printers.HTMLPrinter

Prints an HTML document. This class is derived from the XSLTPrinter and shares the same requirements.

Parameters:
HTMLPrinter.href
The name of the output file. If not specified, writes to standard output.
HTMLPrinter.encoding
The output encoding. Default is "UTF-8".
HTMLPrinter.mkdirs
Specifies whether to create any missing directories for the output file. Default is "true".

org.cyberneko.style.printers.XSLTPrinter

Prints documents by performing the identity transform with XSLT and setting the output keys. The XSLT printer can output XML, HTML, and text documents by setting the XSLTPrinter.method parameter appropriately.

Requires: XSLT Processor (JAXP/TrAX)
Parameters:
XSLTPrinter.href
The name of the output file. If not specified, writes to standard output.
XSLTPrinter.method
The output method. Allowed values are "xml", "html", and "text".
XSLTPrinter.encoding
The output encoding. Default is "UTF-8".
XSLTPrinter.mkdirs
Specifies whether to create any missing directories for the output file. Default is "true".
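
Printing through the identity transform amounts to creating a TrAX transformer without a stylesheet and setting its output properties. The sketch below illustrates that technique with the standard OutputKeys constants. The class IdentityPrintSketch is hypothetical and returns the serialized result as a string rather than writing to a file, so it shows only the method and encoding settings.

```java
import java.io.ByteArrayOutputStream;
import java.io.StringReader;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Source;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical helper; illustrates output keys, not NekoStyle internals.
public class IdentityPrintSketch {
    public static String print(Source source, String method, String encoding)
            throws Exception {
        // A transformer created without a stylesheet performs the identity transform.
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.METHOD, method);      // "xml", "html", or "text"
        transformer.setOutputProperty(OutputKeys.ENCODING, encoding);  // e.g. "UTF-8"

        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        transformer.transform(source, new StreamResult(bytes));
        return bytes.toString(encoding);
    }

    public static void main(String[] args) throws Exception {
        String xml = "<p>Hello <b>world</b></p>";
        for (String method : new String[] {"xml", "html", "text"}) {
            Source source = new StreamSource(new StringReader(xml));
            System.out.println(method + ": " + print(source, method, "UTF-8"));
        }
    }
}
```

The same document is serialized three ways; only the output method changes between runs.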

Other

Typical processors used in the NekoStyle architecture are parsers, transformers, and printers. However, any component that can be written to accept an XML document as input and output an XML document can be written as a NekoStyle processor.

org.cyberneko.style.processors.Echo

Prints text messages to standard output or standard error. This component can be used anywhere in the pipeline because it does not modify the input document.

Parameters:
Echo.message
The message to print.
Echo.stream
The output stream, specified as either "stdout" or "stderr". Default is "stdout".
Echo.newline
Specifies whether to append a newline to the message when printing. This value can be specified as either "true" or "false". Default is "true".

org.cyberneko.style.processors.DirectoryList

Generates an XML document of a directory listing. Transforming the output of this processor with an XSLT stylesheet provides an easy way to dynamically generate additional processing pipelines based on the contents of a directory. For example, you could generate a list of all HTML files in order to remove extraneous formatting tags (e.g. <FONT> and <SPAN>).

Parameters:
DirectoryList.dir
The directory to list. This parameter is required.
DirectoryList.recursive
Set to "true" to recurse into subdirectories of the specified directory. Default is "false".
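
To make the idea of a directory listing as XML concrete, here is a minimal sketch that builds a DOM tree from a directory. The element and attribute names ("directory", "file", "name") are invented for illustration, since the actual output vocabulary of DirectoryList is not described here, and the class DirectoryListSketch is hypothetical.

```java
import java.io.File;
import java.util.Arrays;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical helper; the element names are assumptions, not NekoStyle's format.
public class DirectoryListSketch {
    public static Document list(File dir, boolean recursive) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        doc.appendChild(describe(doc, dir, recursive, true));
        return doc;
    }

    /** Builds a "directory" element; its entries are listed only if descend is true. */
    private static Element describe(Document doc, File dir, boolean recursive, boolean descend) {
        Element element = doc.createElement("directory");
        element.setAttribute("name", dir.getName());
        File[] entries = descend ? dir.listFiles() : null;
        if (entries == null) {
            return element;  // not descending, or the directory is unreadable
        }
        Arrays.sort(entries);  // stable ordering across platforms
        for (File entry : entries) {
            if (entry.isDirectory()) {
                // Subdirectories are always named, but expanded only when recursive.
                element.appendChild(describe(doc, entry, recursive, recursive));
            } else {
                Element file = doc.createElement("file");
                file.setAttribute("name", entry.getName());
                element.appendChild(file);
            }
        }
        return element;
    }

    public static void main(String[] args) throws Exception {
        File dir = new File(args.length > 0 ? args[0] : ".");
        Document listing = list(dir, false);
        System.out.println(listing.getElementsByTagName("file").getLength() + " file(s) listed");
    }
}
```

A stylesheet applied to such a tree could then emit one processing stage per listed file, which is the dynamic pipeline generation described above.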