%
% \documentstyle[12pt,stree]{article} % if LaTeX 2.09
%
\documentclass[12pt]{article}
\usepackage{stree}
%
% WARNING: `stree' employs emTeX specials (em:lineto, etc.)
%
\begin{document}
\def\<#1>{$\langle$#1$\rangle$}
\def\UPSILON{\char'7}
\def\XyM{X\kern-.30em\smash{\raise.50ex\hbox{\UPSILON}}\kern-.30em{M}}
\def\XyMTeX{\XyM\kern-.1em\TeX}
\def\ChemTeX{Chem\kern-0.1em\TeX}
%\def\verb{\sverb}
\def\CNMR{$^{13}$C NMR}
\def\topfraction{.9}
\def\bottomfraction{.9}
\def\textfraction{.1}
\font\twelvett=cmtt12
%\def\tt{\twelvett}
\def\0{\twelvett\symbol{92}}
\def\1{\twelvett\symbol{123}}
\def\2{\twelvett\symbol{125}}
\def\4{\twelvett\symbol{36}}
\def\5{\twelvett\symbol{95}}
\pagestyle{myheadings}
\thispagestyle{empty}
\markright{macropackage for typesetting structural formulas with LaTeX}

\begin{center}
\LARGE
One more macropackage for typesetting structural formul\ae\ with \LaTeX

\bigskip
\bigskip
\normalsize
Igor Strokov$^*$

\bigskip
\small
Novosibirsk Institute of Organic Chemistry, Siberian Division of Russian Academy of Sciences,\\
Lavrentiev avenue~9, Novosibirsk~90, Russia
\end{center}

\bigskip
\noindent
A new macropackage for \LaTeX\ provides high-quality, easy, and uniform typesetting of structural formulas of almost any complexity. To use the new package, called \treeTeX, one traverses a structure depth-first and describes the bonds and vertex labels passed along the way. Additional features of \treeTeX\ include input of mass spectra and simple flow charts.

\bigskip
\noindent
keywords: TeX, LaTeX, chemical structures

\insert\footins{\small\rm $^*$ Tel: (3832) 354745, E-mail: strokov@nioch.nsc.ru}

\clearpage

\section{Introduction}

The choice of tools for scientific publishing is not very wide: most often it is MS Word$^{\rm TM}$ or \TeX, a typesetting language and system developed in the early 1980s by Knuth (Knuth, 1984).
In brief, \TeX\ stands out for its detailed treatment of every aspect affecting document appearance and, therefore, for an output quality practically unreachable by modern word processors. On the other hand, \TeX\ is not very comfortable for working with graphics, though it allows importing either various bitmaps (at the cost of device independence) or PostScript images. The latter option can produce almost any visual effect, but it requires special software and/or hardware that understands PostScript. At the same time, \TeX\ itself is quite a powerful programming language that allows simple charts to be composed with the aid of pseudographical fonts. The most popular format of \TeX, \LaTeX\ (Lamport, 1984), can be said to establish a standard for the use of pseudographics. Although most structural formulas of organic compounds can be fully typeset in this way, doing so by direct use of \LaTeX\ drawing commands is too laborious.

There are at least three macro packages designed especially for typesetting chemical structures. Ramek's package (Ramek, 1990) is rather compact and easy to use, but its style of formulas is rare in contemporary literature. The more complex packages of Haas and O'Kane (Haas \& O'Kane, 1987) and Fujita (Fujita, 1994) are free from this drawback. Both have a similar design based on a set of different macro commands, each for a specific chemical fragment. Usually a command defines some ring system whose inner bond types and substituents can be altered by means of command parameters. For example, in Fujita's \XyMTeX\ 5,5-dimethylcyclohexen-2-on-1 is entered as \verb"\cyclohexanev[b]{1D==O;5Sb==;5Sa==}", where {\tt[b]} specifies the C=C double bond, {\tt 1D==O} the carbonyl group, and {\tt 5Sb==;5Sa==} the two methyl substituents. Here, as in other cases, the command name correlates with the compound's systematic name, which makes the meaning of a command easier to understand.
On the other hand, describing structures at the level of characteristic fragments has some disadvantages. First of all, such fragments are so numerous that any significant coverage of their set is practically unreachable. Neither \ChemTeX\ nor \XyMTeX\ accomplishes this task, yet the complete description of each fills quite a sizable book. In addition, when a formula consists of many fragments, arranging them in one figure becomes a tiresome task of calculating coordinates for the nodes to be linked.

Therefore, a description at the lower level of individual bonds (following Ramek's lead) seems more appropriate. The diversity of bonds is drastically lower than that of characteristic fragments, so typesetting formulas in terms of bonds promises to be simpler and more universal. The downside is a possible loss of input speed and convenience. However, the author's experience has given no evidence of this. To support this statement, let us examine how exactly structural formulas are entered in a new macropackage called {\em\treeTeX.}

Following the \TeX\ programmers' tradition, the new package name receives the common logo. The other part of the name reflects the essence of the new approach: a chemical \underline{s}tructure is regarded as a bond \underline{tree} which is traversed in depth-first order. The complete \treeTeX\ consists of a single file, {\tt stree.sty}, which is used in the same way as other style files in \LaTeX.
%Thus, the first line in this document looks like:
%\begin{verbatim}
%\documentstyle[12pt,stree]{article}
%\end{verbatim}

\section{Basics}

All input in \treeTeX\ goes through a single command, \verb"\stree{}". The formula description contained inside the braces is composed in the following way: starting from any vertex, one has to traverse the whole structure, moving along the bonds and describing both the bonds passed and the vertex labels met on the route.
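
A complete input file can be sketched as follows. This is only a minimal sketch: it assumes that {\tt stree.sty} is installed where \LaTeX\ can find it and that the DVI driver understands the emTeX specials the package employs. The formula itself is the one examined in detail in the text that follows.
\begin{verbatim}
\documentclass[12pt]{article}
\usepackage{stree}
\begin{document}
$$ \stree{{HO}20>242\Me>68N>10} $$
\end{document}
\end{verbatim}
Everything between the braces of \verb"\stree" is the bond-by-bond description; the rest is an ordinary \LaTeX\ document.
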
For example, the formula of 1-hydroxy-4-methylpyridine
$$ \stree{{HO}20>242\Me>68N>10} $$
is entered as \verb"\stree{{HO} 2 0 >2 4 [2{CH$_3$}] >6 8N >10}". Starting the traversal at vertex OH, one first has to input its label: \verb"{HO}". A label is enclosed in curly braces if it contains more than one token. The next character, {\tt 2}, means: draw a bond in the direction of 2 o'clock on a 12-hour clock face. A traversal always has a current point (vertex). Setting the direction of a bond shifts the current point to the vertex at the other end of this bond. For instance, after character {\tt2} a ring vertex becomes current. Character {\tt0} refers to the next bond, at 0 o'clock (that is, upward). Then comes a double bond at 2 o'clock. If a bond is not single, one or more tokens (so-called {\em prefixes\/}) are put before its direction to specify the bond's features. A double bond may be displayed in several ways. In our case it is desirable to draw the second line to the right of the main one. This effect is achieved by prefix {\tt>} (the right angle bracket). The next single bond, at 4 o'clock, leads to a substituted vertex of the ring. Let us describe the substituent first. Construction \verb"[2{CH$_3$}]" does this as follows: the opening bracket {\tt[} marks the current vertex, the 2 o'clock bond goes to vertex CH$_3$ (\verb"{CH$_3$}" is the label description), and finally the closing bracket {\tt]} returns to the marked vertex. A methyl group is quite common, yet the sequence \verb"{CH$_3$}" is not very easy to type. A shorter command \verb"\Me" may be used instead, so that \verb"2\Me" is equivalent to \verb"[2{CH$_3$}]" (any other number designating a bond direction may take the place of digit {\tt2} here). After the substituent is set, the traversal may be continued. Three bonds remain: at 6, 8, and 10 o'clock. The 6 and 10 o'clock bonds are double, drawn inside the ring, which is indicated by prefix {\tt>}.
The 8 o'clock bond leads to a vertex labelled ``N'', therefore {\tt N} follows immediately after {\tt8}. Curly braces are unnecessary here because the label consists of a single token.

In this example, token groups related to different bonds are separated by spaces. Though spaces are not obligatory (\verb"\stree{{HO}20>242\Me>68N>10}" is also correct), one may use them to make the input more readable. However, a space is not allowed before a label or a command. In our example, writing \verb"8 N" (or \verb"2 {CH$_3$}" or \verb"2 \Me") would cause an error. In one position, however, a {\sl binding\/} space is required: between digit {\tt1} designating a 1 o'clock direction and another digit following it (this prevents confusing, say, two ones with eleven).

\section{Formal description}

Now let us try to state the basic rules just introduced more rigorously. An argument of \verb"\stree" is composed of descriptions of individual bonds. To describe a bond, one has to specify at least its direction, most often as a number of hours on a clock face. One or more prefixes modifying the bond's length, degree, appearance, etc.\ may precede the direction. If a bond leads to a labelled vertex, the label must follow the direction. A command (in \TeX's terms, something starting with {\tt\0}) may take the place of a label, providing a shorter way to set the same structural fragment. Here a label or a command is regarded as a bond attribute called {\em a suffix\/} because of its terminal position. In contrast to prefixes, a bond may have only one suffix. Besides, a suffix is never preceded by a space. The form \<prefix>\<direction>\<suffix> resembles a common verbal explanation of a traversal: {\sl such-and-such\/} bond at {\sl that\/} o'clock leads to {\sl so-and-so\/} label (only words like {\sl such-and-such\/} are replaced by conventional tokens).
\def\is{$\longrightarrow$ }
\def\ili{\vrule{} }

A question may arise: how to distinguish the suffix of the current bond from a prefix (or a direction) of the next bond?
Spaces generally affect only visual perception; a computer mostly ignores them and obeys the syntax rules given below. (Backus--Naur form is used, where an arrow \is\ means ``is defined as'' and token \ili\ means ``or''.) Let us start from the definition of \, i.e., of an argument of command \verb"\stree".
\begin{description}
\item[] \ \is \ \ili \\ \ili\\ \{\tt[}\{\tt]}\ \end{description}
%
This is just a formal way of saying that an argument is composed of descriptions of individual bonds and contains paired and possibly nested square brackets. Then a definition of \ itself follows:
\begin{description}\frenchspacing
\item[] \ \is \\\% \ 
\item[] \ \is \\ \ili \ 
\item[] \ \is {\tt / \ili , \ili = \ili > \ili < \ili \_ \ili ' \ili ` \ili \verb"~" \ili " \ili . \ili : \ili * \ili \verb"^"}
\item[] \ \is \ \ili \