QuarkXPress Tags

To convert Dr. Dobb's Journal articles from QuarkXPress format to HTML, we've created a custom process built on XPress Tags, a feature of QuarkXPress. XPress Tags preserve character and paragraph attributes (such as font size and color, line spacing, and rules above or below text) alongside the text of an XPress document, in a plain ASCII text file.

A QuarkXPress document comprises text and picture boxes placed on document pages. XPress Tags are designed to describe the contents of text boxes. An AppleScript selects certain boxes in our documents and exports their contents to tag files. A Perl script then collects these individual tag files and parses them for the content we want.
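The collection step might look something like the following. This is a minimal Python sketch, not the Perl script itself, and it rests on several assumptions: each exported box lands in its own file with a `.xtg` extension in a single directory, the files are Mac Roman encoded, and each line of a tag file holds one paragraph. The function name `collect_tag_files` is hypothetical.

```python
from pathlib import Path

def collect_tag_files(directory):
    """Return the paragraphs from every tag file in `directory`.

    Assumptions (hypothetical, for illustration): one exported box per
    file, files named *.xtg, Mac Roman encoding, one paragraph per line.
    Files are read in sorted filename order so the output order is stable.
    """
    paragraphs = []
    for path in sorted(Path(directory).glob("*.xtg")):
        text = path.read_text(encoding="mac_roman")
        # splitlines() also handles the bare carriage-return line
        # endings used by classic Mac OS files.
        for line in text.splitlines():
            if line.strip():  # skip blank lines
                paragraphs.append(line)
    return paragraphs
```

Sorting by filename lets the exporting script control the order of boxes simply by how it names the files.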

XPress Tags are markup, similar in concept to HTML and XML, which makes them relatively easy to translate to HTML. For example, a <B> tag turns on bold character formatting, <I> activates italic formatting, and so on. Groups of character and paragraph attributes are collected into named paragraph styles. The following is an example of a paragraph-style definition for a style named "FooBar":
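Because the tags are set off by angle brackets, as in HTML, a translator can begin by splitting tagged text into tag tokens and runs of plain text. The following minimal Python sketch of such a tokenizer is an illustration rather than the article's Perl code. It assumes the text contains no literal `<` characters, which a real tag file would need to escape in some way. The name `tokenize` is hypothetical.

```python
import re

# A tag such as <B>, <I>, or <$>, or else a run of plain text.
# Assumption: the text contains no literal "<" characters.
TOKEN = re.compile(r"<([^>]*)>|([^<]+)")

def tokenize(tagged_text):
    """Split tagged text into ("tag", name) and ("text", chars) tuples."""
    tokens = []
    for match in TOKEN.finditer(tagged_text):
        tag, text = match.groups()
        if tag is not None:
            tokens.append(("tag", tag))
        else:
            tokens.append(("text", text))
    return tokens
```

For example, `tokenize("Some <B>bold<$> words")` yields five tokens: a text run, the `B` tag, a text run, the `$` tag, and a final text run.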

(0,0,0,0,0,36,g,"U.S. English")*t(0,0,"2 "):

Many of the attributes defined here are the defaults for this new paragraph style in XPress. For example, *ra0 indicates no rule (line) above the paragraph; *ra1 would create a rule above. *L indicates that the paragraph should be left-aligned. The *t(0,0,"2 ") section defines various attributes for tab stops. The FooBar style we've defined specifies the typeface (f"Tekton") at a size of 14 points (z14). There are far too many attributes to describe here; Appendix C of the XPress Reference Manual documents the full set of tags.
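To show how such attributes can be picked out, here is a minimal Python sketch that extracts just the typeface (`f"..."`) and point size (`z...`) from a style definition. It is an illustration under assumptions, not a full parser: it recognizes only these two codes, and it skips over every other quoted string so that letters inside values such as `"U.S. English"` are not mistaken for attribute codes. The names `ATTR` and `font_and_size` are hypothetical.

```python
import re

# Scanned left to right: a typeface code f"...", a size code z<number>,
# or any other quoted string. Other quoted strings are matched (and then
# ignored) so their contents are never read as attribute codes.
ATTR = re.compile(r'f"([^"]*)"|z(\d+(?:\.\d+)?)|"[^"]*"')

def font_and_size(style_definition):
    """Return (typeface, point_size) from a paragraph-style definition.

    Either element is None if that code does not appear. Only the first
    occurrence of each code is used.
    """
    font, size = None, None
    for match in ATTR.finditer(style_definition):
        typeface, points = match.group(1), match.group(2)
        if typeface is not None and font is None:
            font = typeface
        elif points is not None and size is None:
            size = float(points)
    return font, size
```

Applied to a definition containing `z14` and `f"Tekton"`, as described for the FooBar style, `font_and_size` returns `("Tekton", 14.0)`.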

The actual text in a paragraph follows this format:

@FooBar:<$>Some <U>text<$>.

Everything between the @ sign and the colon represents the name of the style. This style applies to the whole paragraph, and any other tags in the text represent departures from that paragraph's stated style. The <$> tag resets the text attributes to the stated paragraph style. This paragraph would consist of the words "Some text" in 14-point Tekton, with the word "text" underlined.
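A paragraph line in this format can be converted to HTML in a single pass. The Python sketch below is a hedged illustration rather than the production Perl script. It follows the description above, treating `<B>`, `<I>`, and `<U>` as switching a format on and `<$>` as returning to the paragraph style, and it silently ignores every other tag. The mapping table and the function name `paragraph_to_html` are hypothetical.

```python
import html
import re

# Hypothetical mapping from the character tags named in the article to
# HTML elements; any other tag is ignored by this sketch.
INLINE = {"B": "b", "I": "i", "U": "u"}
LINE = re.compile(r"@([^:]*):(.*)")       # @StyleName:paragraph text
TOKEN = re.compile(r"<([^>]*)>|([^<]+)")  # a <tag> or a run of text

def paragraph_to_html(line):
    """Convert one '@Style:text' line to a (style_name, html) pair.

    <B>, <I>, and <U> open an HTML element; <$> closes every element
    opened since the paragraph style was last in effect, and anything
    still open at the end of the paragraph is closed there.
    """
    match = LINE.match(line)
    if match is None:
        raise ValueError("not a tagged paragraph: %r" % line)
    style, body = match.groups()
    out, open_elements = [], []
    for token in TOKEN.finditer(body):
        tag, text = token.groups()
        if text is not None:
            out.append(html.escape(text))
        elif tag in INLINE:
            element = INLINE[tag]
            out.append("<%s>" % element)
            open_elements.append(element)
        elif tag == "$":
            # Reset to the paragraph style: close elements innermost first.
            while open_elements:
                out.append("</%s>" % open_elements.pop())
    while open_elements:
        out.append("</%s>" % open_elements.pop())
    return style, "".join(out)
```

For the example line `@FooBar:<$>Some <U>text<$>.`, it returns `("FooBar", "Some <u>text</u>.")`. The style name is kept separate so that a later step can decide what kind of HTML element the paragraph becomes.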

The paragraph styles in our XPress documents contain a great deal of information about the paragraph's place in the structure of the article (we use style names like "Headline," "Byline," "Bulleted List," and so on). Processing this information with Perl lets us make automatic decisions about the structure of the output HTML files.
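One way to turn style names into document structure is a lookup table from style to HTML element, plus a rule that gathers runs of list paragraphs into a single list. The Python sketch below assumes its input is a sequence of (style name, plain text) pairs. It uses the style names mentioned above, but the table entries, the `byline` class, and the name `build_html` are hypothetical choices for illustration.

```python
import html

# Hypothetical table from paragraph-style names (those mentioned in the
# article) to an HTML element and optional CSS class. Unlisted styles
# fall back to a plain <p>.
BLOCK_FOR_STYLE = {"Headline": ("h1", None), "Byline": ("p", "byline")}
LIST_STYLE = "Bulleted List"

def build_html(paragraphs):
    """Turn (style_name, text) pairs into an HTML fragment, one block per
    line. Consecutive "Bulleted List" paragraphs share a single <ul>."""
    out = []
    in_list = False
    for style, text in paragraphs:
        body = html.escape(text)  # input is assumed to be plain text
        if style == LIST_STYLE:
            if not in_list:
                out.append("<ul>")
                in_list = True
            out.append("<li>%s</li>" % body)
            continue
        if in_list:  # a non-list paragraph ends the current list
            out.append("</ul>")
            in_list = False
        tag, css_class = BLOCK_FOR_STYLE.get(style, ("p", None))
        attrs = ' class="%s"' % css_class if css_class else ""
        out.append("<%s%s>%s</%s>" % (tag, attrs, body, tag))
    if in_list:
        out.append("</ul>")
    return "\n".join(out)
```

Because the sketch escapes its input, text that has already been converted to inline HTML would need to be passed through unescaped instead.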

--Kevin Carlson, managing editor of digital media for Dr. Dobb's Journal