%
% \documentstyle[12pt,stree]{article} % if LaTeX 2.09
%
\documentclass[12pt]{article}
\usepackage{stree}
%
% WARNING: `stree' employs emTeX specials (em:lineto, etc.)
%
\begin{document}

\def\<#1>{$\langle$#1$\rangle$}
\def\UPSILON{\char'7}
\def\XyM{X\kern-.30em\smash{\raise.50ex\hbox{\UPSILON}}\kern-.30em{M}}
\def\XyMTeX{\XyM\kern-.1em\TeX}
\def\ChemTeX{Chem\kern-0.1em\TeX}
%\def\verb{\sverb}
\def\CNMR{$^{13}$C NMR}

\def\topfraction{.9}
\def\bottomfraction{.9}
\def\textfraction{.1}

\font\twelvett=cmtt12
%\def\tt{\twelvett}
\def\0{\twelvett\symbol{92}}
\def\1{\twelvett\symbol{123}}
\def\2{\twelvett\symbol{125}}
\def\4{\twelvett\symbol{36}}
\def\5{\twelvett\symbol{95}}

\pagestyle{myheadings}
\thispagestyle{empty}

\markright{macropackage for typesetting structural formulas with LaTeX}

\begin{center}
\LARGE
One more macropackage for typesetting structural formul\ae\
with \LaTeX

\bigskip
\bigskip
\normalsize Igor Strokov$^*$

\bigskip
\small Novosibirsk Institute of Organic Chemistry, Siberian Division of
Russian Academy of Sciences,\\
Lavrentiev avenue~9, Novosibirsk~90, Russia
\end{center}

\bigskip
\noindent
A new macropackage for \LaTeX\ provides high-quality, easy, and uniform
typesetting of structural formulas of almost any complexity. With the
new package, called \treeTeX, a structure is entered by traversing it
depth-first and describing the bonds and vertex labels met along the way. Additional
features of \treeTeX\ include input of mass spectra and simple flow charts.

\bigskip
\noindent
keywords: TeX, LaTeX, chemical structures

\insert\footins{\small\rm $^*$ Tel: (3832) 354745,
E-mail: strokov@nioch.nsc.ru}

\clearpage

\section{Introduction}

The choice of tools for scientific publishing is not very wide: most often it
is MS Word$^{\rm TM}$ or \TeX, a typesetting language and system
developed in the early 1980s by Knuth (Knuth, 1984). In brief,
\TeX\ stands out for its detailed control of every aspect that affects
document appearance and, therefore, for an output quality that modern word
processors can hardly reach. On the other hand, \TeX\ is not very
comfortable for working with graphics, though it can import either
various bitmaps (at the cost of device independence) or PostScript
images.  The latter option can produce almost any visual effect, but it
requires software and/or hardware that understands PostScript. At the same
time, \TeX\ itself is quite a powerful programming language that allows one to compose
simple charts with the aid of pseudographical fonts. The most popular format
of \TeX, \LaTeX\ (Lamport, 1984), can be said to establish a standard for the
use of pseudographics. Although most structural formulas of organic compounds
can be fully typeset in this way, doing so by the direct use of \LaTeX\
drawing commands is too laborious. There are at least three macropackages designed
especially for typesetting chemical structures. Ramek's package (Ramek, 1990)
is rather compact and easy to use, but it draws formulas in a
style which is rare in contemporary literature.
The more complex packages of Haas and O'Kane (Haas \& O'Kane, 1987) and Fujita
(Fujita, 1994) are free from this drawback. Both of them have a similar design
based on a set of different macrocommands, each for a specific chemical
fragment.  Usually a command defines some ring system, whose inner bond types
and substituents can be altered by means of command parameters. For example,
in Fujita's \XyMTeX\ 5,5-dimethylcyclohexen-2-on-1 is entered as
\verb"\cyclohexanev[b]{1D==O;5Sb==;5Sa==}", where {\tt[b]} specifies the C=C
double bond, {\tt 1D==O} the carbonyl group, and {\tt 5Sb==;5Sa==} the
two methyl substituents. Here, as in other cases, a command name
corresponds to the systematic name of a compound, which makes the
meaning of the command easier to understand. On the other hand, describing
structures at the level of characteristic fragments has some disadvantages.
First of all, such fragments are so numerous that any significant coverage
of their set is practically unreachable. Neither
\ChemTeX\ nor \XyMTeX\ achieves it, yet the complete description of each one fills
quite a thick book. Additionally, when a formula consists of many
fragments, arranging them in one figure becomes a tiresome task of
calculating coordinates for the nodes to be linked.

Therefore, a description at the lower level of individual bonds (following
Ramek's approach) seems more appropriate. The diversity of bonds is
drastically lower than that of characteristic fragments, so typesetting
formulas in terms of bonds promises to be simpler and more universal. The
possible downside is a loss of input speed and convenience.
However, the author's experience has given no evidence of such a loss. To support
this statement let us examine how exactly structural formulas are entered
in a new macropackage called {\em\treeTeX.}

Following the tradition of \TeX\ programmers, the name of the new package ends with the
common logo. The other part of the name reflects the essence of the new
approach: a chemical \underline{s}tructure is regarded as a bond
\underline{tree} which is
traversed in a depth-first order. The complete \treeTeX\ consists of a single
file, {\tt stree.sty}, which is used like any other style file in \LaTeX.

%Thus, the first line in this document looks like:
%\begin{verbatim}
%\documentstyle[12pt,stree]{article}
%\end{verbatim}

\section{Basics}

All input in \treeTeX\ goes through a single command, \verb"\stree{}". A
formula description placed inside the braces is composed in the following
way:  starting from any vertex,
one has to traverse the whole structure, moving along bonds and describing both
the bonds passed and the vertex labels met on the route. For example, the
formula of 1-hydroxy-4-methylpyridine
$$
\stree{{HO}20>242\Me>68N>10}
$$
is entered as \verb"\stree{{HO} 2 0 >2 4 [2{CH$_3$}] >6 8N >10}".
Starting the traversal at vertex OH, one first has to input
its label: \verb"{HO}". A label is enclosed in curly
braces if it contains more than one token. The next character {\tt 2}
means: draw a bond in the direction of 2 o'clock on a 12-hour
clockface.  A traversal always has a current point (vertex). Setting the
direction of a bond shifts the current point to the vertex at the far end
of this bond.  E.\ g.,\ after character {\tt2} a ring vertex becomes current.
Character {\tt0} refers to the next bond at 0 o'clock (or upward). Then comes
a double bond at 2 o'clock.  If a bond is not single, then one or more tokens
(so-called {\em prefixes\/}) are put before its direction to specify
the bond features. A double bond may be displayed in several ways. In our case it
is desirable to draw the second line to the right of the main one. This
is exactly what prefix {\tt>} (the right angle bracket) does.

The next single bond at 4 o'clock leads to a substituted vertex of the ring.
Let us describe the substituent first. Construction \verb"[2{CH$_3$}]"
does this as follows: opening bracket {\tt[} marks the current
vertex, the 2 o'clock bond goes to vertex CH$_3$ (\verb"{CH$_3$}" is the label
description), and finally closing bracket {\tt]} restores the marked
vertex as current. A methyl group is quite common, yet sequence
\verb"{CH$_3$}" is not very easy to input. A shorter command
\verb"\Me" may be used instead, so that \verb"2\Me" is equivalent to
\verb"[2{CH$_3$}]" (any other number designating a bond direction may take
the place of digit {\tt2} here).

After setting a substituent the traversal may be continued. Three bonds
remain: at 6, 8, and 10 o'clock. The 6 and 10 o'clock bonds are double, with the
second line drawn inside the ring, which is indicated by prefix {\tt>}. The 8 o'clock bond leads
to a vertex labelled ``N'', therefore {\tt N} follows immediately
after {\tt8}. Curly braces are unnecessary here because the label consists
of a single token.

In this example token groups related to different
bonds are separated by spaces. Though spaces are not obligatory
(\verb"\stree{{HO}20>242\Me>68N>10}" is also correct), one may use them to
make the input more readable. However, a space is not allowed before a
label or a command. In our example writing \verb"8 N" (also \verb"2 {CH$_3$}"
or \verb"2 \Me") would cause an error. In one position, however, a {\sl
binding\/} space is required: between digit {\tt1} designating a 1 o'clock
direction and another digit that follows it (this prevents confusing, say, two ones
with eleven).
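
A minimal complete document may help to show where this input goes. The
following sketch only assembles pieces already present in this paper (the
package loaded in the preamble and the formula discussed above). It assumes
that {\tt stree.sty} is installed and that the output driver understands the
emTeX specials on which the package relies.
\begin{verbatim}
\documentclass[12pt]{article}
\usepackage{stree}
\begin{document}
% the formula discussed above, with optional spaces kept
$$
\stree{{HO} 2 0 >2 4 [2{CH$_3$}] >6 8N >10}
$$
\end{document}
\end{verbatim}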

\section{Formal description}

Now let us try to state more rigorously the basic rules just introduced. An
argument of \verb"\stree" is composed of descriptions of individual bonds. To
describe a bond one has to specify at least its direction; most often it
is a number of hours on a clockface. One or more prefixes modifying the bond
length, degree, appearance, etc.,\ may precede the direction. If a bond leads
to a labelled vertex, then the label must follow the direction. A command (in
\TeX's terms, something starting with {\tt\0}) may take the place of a label,
providing a shorter way to set a common structural fragment.
Here a label or a command is regarded as a bond attribute called {\em a
suffix\/} due to its terminal position. In contrast to prefixes, a bond may
have only one suffix. Besides, a suffix is never preceded by a space.  The form
\<possible prefix(es)>\<direction>\<possible suffix> resembles a common
verbal explanation of a traversal:  {\sl such-and-such\/} bond at {\sl
that\/} o'clock leads to {\sl so-and-so\/} label (only words like {\sl
such-and-such\/} are replaced by conventional tokens).

\def\is{$\longrightarrow$ }
\def\ili{\vrule{} }

A question may arise: how to distinguish the suffix of the current bond from a prefix
(or a direction) of the next bond? Spaces generally affect
only the visual perception, while the computer most often ignores them and obeys the
syntax rules given below. (The Backus--Naur form is used, where an arrow
\is\ means ``is defined as'' and token \ili\ means ``or''.) Let us
start from the definition of \<structure>, i.\ e.,\ of an argument of command
\verb"\stree".

\begin{description}
\item[]
\<structure> \is \<none> \ili \<bond>\<structure> \ili\\
 \<structure>{\tt[}\<structure>{\tt]}\<structure>
\end{description}
%
This is just a formal way to say that an argument is composed of descriptions
of individual bonds and may contain paired and possibly nested square
brackets. The definition of \<bond> itself follows:

\begin{description}\frenchspacing

\item[]
\<bond> \is \<possible prefix(es)>\<direction>\<possible suffix>%
\<possible space(s)>

\item[]
\<possible prefix(es)> \is
\<prefix>\<possible prefix(es)> \ili \<none>

\item[]
\<prefix> \is {\tt / \ili , \ili = \ili >
\ili < \ili \_ \ili ' \ili ` \ili \verb"~" \ili " \ili . \ili : \ili
* \ili \verb"^"}

\item[]
\<possible suffix> \is \<command> \ili \<label> \ili \<none>

\item[]
\<label> \is {\1}\<any sequence of tokens>{\2} \ili
\<any character other than digit or prefix>

\item[]
\<direction> \is
\<hours direction> \ili \<offsets direction>

\item[]
\<hours direction> \is {\tt 0 \ili
1\<{\rm delimiter}> \tt \ili 2 \ili
3 \ili 4 \ili 5 \ili 6 \ili 7
\ili 8 \ili 9 \ili 10 \ili 11 \ili 12}

\item[]
\<{\rm delimiter}> \is \<space>, if the next token is a digit,
otherwise \<none>.

\item[]
\<offsets direction> \is \<sign>\<cardinal number>%
\<sign>\<cardinal number>

\item[]
\<sign> \is {\tt +} \ili {\tt -}

\end{description}
%
These rules eventually expand
syntax elements in angle brackets either to definite tokens or to something
that seemingly requires no further explanation. Nothing is said, however, about
the meaning of prefixes or what \<offsets direction> means. The semantics of
individual syntax elements is systematically explored in the following
subsections.
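
Before turning to semantics, it may help to see the grammar applied once.
The table below splits the pyridine example of the previous section,
\verb"\stree{{HO} 2 0 >2 4 [2{CH$_3$}] >6 8N >10}", into \<bond> items. It
is a reading of the rules above made by hand, not output of the package. The
leading label relies on the void direction that is inserted implicitly (see
the subsection on labels), and the bracketed item is a nested \<structure>
consisting of one bond.

\begin{center}
\begin{tabular}{l|l|l|l}
input&prefix(es)&direction&suffix\\ \hline
\verb"{HO}"&&(void)&\verb"{HO}"\\
\verb"2"&&\verb"2"&\\
\verb"0"&&\verb"0"&\\
\verb">2"&\verb">"&\verb"2"&\\
\verb"4"&&\verb"4"&\\
\verb"[2{CH$_3$}]"&&\verb"2"&\verb"{CH$_3$}"\\
\verb">6"&\verb">"&\verb"6"&\\
\verb"8N"&&\verb"8"&\verb"N"\\
\verb">10"&\verb">"&\verb"10"&
\end{tabular}
\end{center}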

\subsection{Prefixes}

\begin{table}[tbh]\centering
{\bf Table 1.} Prefixes.\par
\medskip
\begin{tabular}{c|c|c|c}
prefix&feature&view&category\\ \hline\hline
\tt /&skew&\stree{/2}&\\ \cline{1-3}
\tt //&very skew&\stree{//2}&direction\\ \cline{1-3}
\tt ,&at 45$^\circ$&\stree{,2}&prefixes\\ \cline{1-3}
{\tt+} {\tt-}&\<sign> in offsets directions&&\\
\hline\hline
\tt =&double centered&\stree{=3}\\ \cline{1-3}
\tt >&double right&\stree{>3}&\\ \cline{1-3}
\tt <&double left&\stree{<3}&bond\\ \cline{1-3}
\tt ==&triple &\stree{==3}&degrees\\ \cline{1-3}
\tt ">&right delocalized&\stree{">3}&\\ \cline{1-3}
\tt "<&left delocalized&\stree{"<3}&\\
\hline\hline
\tt \_&long ($5/3$ normal)&\stree{_3}&bond\\ \cline{1-3}
\tt '&short ($1/3$ normal)&\stree{'3}&length\\
\hline\hline
\verb"~"&invisible &\\ \cline{1-3}
\verb"~~"&invisible label&\\ \cline{1-3}
\tt :&dotted&\stree{:3}&visual\\ \cline{1-3}
\tt *&bold&\stree{*3}&effects\\ \cline{1-3}
\verb"^"&arrow&\stree{^3}\\ \cline{1-3}
\verb"^*"&sphenoid&\stree{^*3}\\
\hline\hline
\tt .&short invisible&\stree{C.2*}&used for marks
\end{tabular}
\end{table}

We have already used prefix {\tt>} to specify double
bonds. The full list of prefixes is given in Table 1. The prefixes are
divided into four categories according to their effect
on the bond direction, degree, length, or appearance.
Prefixes of different categories can be combined with each other. For
example, the result of \verb"\stree{_=:/4}" is a long skew dotted double bond
at 4 o'clock: \stree{_=:/4}.\ Let us consider, however, more helpful cases of
prefix use.

Prefix {\tt.}\ (point) is especially designed to set numbers or similar
marks at vertices. E.\ g.,\ to obtain the following formula
with numbered vertices \stree{{HO}.01 2.02 4.03 2{OH}.04}\ one has to
enter \0stree\1\1HO\2 .01 2 .02 4 .03 2\1OH\2 .04\2\rm\ (all spaces are
optional here). In full agreement with the form
\<prefix>\<direction>\<suffix> a construction like {\tt.01} means the
following:  put a {\sl short invisible\/} bond at {\sl 0 o'clock\/} and set
label ``1''.  However, this prefix has two special features: 1)
the current vertex is not changed, and 2) a direction preceded
by a point {\sl must\/} be followed by a label.  Moreover, {\sl any\/}
character at this place is necessarily regarded as a label. This makes it
unnecessary to enclose, say, digit ``1'' in curly braces: there is no risk of
confusing it with a 1 o'clock direction.{\sloppy\par}

Vertex marks are usually smaller than other characters.
The corresponding font definition is held in macro \verb"\numfnt",
which normally equals \verb"\small". By changing it to, say,
\verb"\mit" one can also mark vertices with math symbols
just by writing something like {\tt.0a} (evidently, a font for any label can
also be specified explicitly, e.\ g.,\ \verb".0{\bf 9}").

%{\samepage
\rpic{\stree{:1:3:5^*7**9*^11}}
Three other prefixes were used to typeset the dummy
stereoformula in this paragraph: \verb"\stree{:1 :3 :5 *^7 **9 *^11}" (the
result of a clockwise traversal starting from the left vertex).  Here the
use of colon {\tt:} to draw dotted bonds is quite transparent. Note
that the same combination \verb"*^" produces a widening sphenoid
bond the first time and a narrowing one the second time, according to the thickness
of the previous bond.  Such automatic behavior is welcome until one needs to specify
the bond shape explicitly. The simplest way to get control over it is to
insert a void bond of a complementary thickness.  Thus, the isolated
narrowing bond \stree{**17 *^3}\ is input as \verb"\stree{**17 *^3}", where
17 means a ``nonexistent'' direction corresponding to a void bond of zero
length.

Please note also that the lower bond is {\sl very\/} bold (input with
two {\tt*}).  Merely bold bonds (introduced with one {\tt*}) are half as
thick as very bold ones and twice as thick as ordinary ones. The normal thickness ({\tt 0.4 pt})
is enough for paper documents. However, such output might look too pale
on a transparent film. To obtain a bolder formula one may either use
{\tt*} before every bond or say \verb"\defwmode=1" at the beginning of a
document (\verb"\defwmode=0" restores the normal thickness).
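
In a document this might look as follows. The sketch uses only the two
assignments named above; that the setting may be switched between formulas,
rather than once at the beginning, is an assumption based on the remark that
\verb"\defwmode=0" restores the normal thickness.
\begin{verbatim}
\defwmode=1   % bonds are drawn bold from here on
$$\stree{3690}$$
\defwmode=0   % back to the normal thickness
$$\stree{3690}$$
\end{verbatim}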

Prefix {\tt'} (apostrophe) produces short ``dashes'',
which usually denote free valences in fragments. For example,
\verb"\stree{'8'0 4'6 2N '0'4}" gives formula
\stree{'8'0 4'6 2N '0'4}. A short bond, like
a short invisible one, does not alter the current vertex.

Complete invisibility can be achieved by prefix \verb"~"
(a tilde). Invisible bonds are well suited to unconnected formulas.
E.\ g.,\ reaction scheme
$$\stree{{H$_2$C} =3{CH$_2$} ~6{CH$_2$} =9{H$_2$C}}
\quad\longrightarrow\quad \stree{3690}$$ can be input as
\begin{verbatim}
\stree{{H$_2$C}=3{CH$_2$} ~6{CH$_2$}=9{H$_2$C}}
\quad\longrightarrow\quad \stree{3690}
\end{verbatim}
Invisibility is also helpful for alignment purposes. The main idea is to
balance protruding substituents with their invisible counterparts. Let the
following three isomers be placed in a row: $$ \stree{O =2 [~0] 4 [~6] 2
4{Cl}}\quad\stree{4 6\O 2 ~~0\O 4{Cl}}\quad\stree{4 ~~6\O 2 0\O 4{Cl}}$$
Since the base lines of formulas pass through their
geometric centers, the right solution is
\begin{verbatim}
\stree{O =2 [~0] 4 [~6] 2 4{Cl}} \quad
\stree{4 6\O 2 ~~0\O 4{Cl}} \quad
\stree{4 ~~6\O 2 0\O 4{Cl}}
\end{verbatim}
A brief
explanation is required here. First, the new command \verb"\O" is intended
for a carbonyl group.  Second, two tildes at once in clauses \verb"~~0\O" and
\verb"~~6\O" extend invisibility to both the bond and the label which ends it
(one tilde would leave a hanging ``O'').

Thus the combined use of prefixes gives a variety of useful effects within a
limited set of tokens. The absence of digits and letters in this set
makes it easy to distinguish prefixes from other syntax elements.
Only tokens {\tt+} and {\tt-} do not fit this picture: they are used to set
an \<offsets direction>, whose meaning will be discussed in the following
section.

\subsection{Directions}

\rpic{\stree{0,23,56,89,11}}
Let us systematically consider all the different ways to specify bond directions.
In addition to twelve {\em straight\/} directions (agreeing with numbers of
hours) one can derive nearly twice as many more with the aid of prefixes
{\tt/} or {\tt,} (see Figure 1). A comma applied to one of the four numbers
({\tt2}, {\tt5}, {\tt8}, or {\tt11}) produces four {\em median\/} directions
at 45$^\circ$ to the horizontal (they usually appear in 8-membered rings
like the one in this paragraph).

\begin{figure}\centering
\def\znakf#1{\tt #1}
\begin{tabular}{cccc}
\stree{[0.00][1.1 1][2.22][3.33][4.44][5.55][6.66][7.77][8.88][9.99]%
[10.10{10}][11.11{1\rlap1}]}&
\stree{[,2.,2{,2}][,5.,5{,5}][,8.,8{,8}][,11.,11{,11}]}&
\stree{[:1][:2][:4][:5][:7][:8][:10][:11]%
[/1./1{/1}][/2./2{/2}][/4./4{/4}][/5./5{/5}][/7./7{/7}][/8./8{/8}]%
[/10./10{/10}][/11./11{/11}]}&
\stree{[:1][:2][:4][:5][:7][:8][:10][:11]%
[//1.//1{//1}][//2.//2{//2}][//4.//4{//4}][//5.//5{//5}][//7.//7{//7}]%
[//8.//8{//8}][//10.//10{//10}][//11.//11{//11}]}\\
straight&median&skew&very skew
\end{tabular}\par\medskip
{\bf Figure 1.} Bond directions (dotted lines show straight directions for a
comparison with skew ones).
\end{figure}

The other prefix {\tt/} (a slash) can precede all non-orthogonal
directions, making them incline toward the nearest
coordinate axis by $\simeq10^\circ$. These directions (let them be called
{\em skew\/}) are well suited to 5-membered rings. A pentagon is usually drawn
with one strictly vertical or horizontal side. If the two bonds adjacent to that
side are made skew, then the figure becomes almost regular:
$$
\stree{/1124/79}\quad\tabcolsep=0pt
\begin{tabular}{l}\0stree\1\\\tt/11 2 4 /7 9\2\end{tabular}
\qquad
\stree{0/257/10}\quad\tabcolsep=0pt
\begin{tabular}{l}\0stree\1\\\tt0 /2 5 7 /10\2\end{tabular}
$$

An irregular 5-membered ring
{\bondlen3.5mm\stree{0_36810}}
may also appear in formulas. There are no skew directions here, but one bond is
$2/3$ longer than the others. The extra length is set by prefix \verb"_"
(an underscore),
and the whole formula is input as
\verb"{\bondlen3.5mm\stree{0 _3 6 8 10}}" (here assignment
\verb"\bondlen3.5mm" is responsible for the reduced size of the formula, while the
curly braces limit the scope of the reduction).

To look decent, 7-membered rings require another eight directions which lie
even closer to the coordinate axes than the skew ones. These directions are
called {\em very skew\/} and require one more slash. Their use is
illustrated below:
$$
\stree{1//2468//1011}\quad\tabcolsep=0pt
\begin{tabular}{l}\0stree\11 //2\\\tt4 6 8 //10 11\2\end{tabular}
\qquad
\stree{531//11108//7}\quad
\begin{tabular}{l}\0stree\15 3 1\\\tt//11 10 8 //7\2\end{tabular}
$$

Thus twelve numbers 0--11 optionally prefixed with {\tt,} or {\tt/} can
compose $12 + 4 + 8 + 8 = 32$ different directions. Although this seems to
cover the majority of formulas, in some cases a more general device is
required.  With it a direction is set by means of {\sl two\/} numbers
denoting the horizontal and vertical offsets of a bond. The offset unit (called {\em
a quad\/}) equals $1/6$ of the default bond length. Each offset is a cardinal
number starting with a {\sl binding\/} sign {\tt+} or {\tt-} (the
sign prevents confusing offsets with numbers of hours) and ending at the first
non-digit.  Let us comment on the following example of the use of offset directions:

$$
\begin{tabular}{cc}
\stree{202/4-5-6//8-2+2 02[+1+5[=01042]+1-7]/428+2-2}&
\begin{tabular}{l}
\verb"\stree{202/4-5-6//8-2+2 "\\
\verb"02[+1+5[=01042]+1-7]/428+2-2}"
\end{tabular}
\end{tabular}
$$

Having a sketch of a formula at hand is always helpful. Moreover, if it contains
bonds of nonstandard length and/or direction, one should prefer
grid paper, with one quad per grid cell. While preparing such a sketch
requires a bit of skill, its further description in \verb"\stree" is a rather
mechanical procedure. One should only keep in mind that bonds with the
short-to-long side proportion $0:6$ and $3:5$ correspond to straight,
$2:6$ to skew, and $1:6$ to very skew directions and thus can be
specified via numbers of hours.
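
These proportions can be checked by elementary trigonometry. Assuming that
the proportions are realised exactly in quads (the actual slopes are fixed
inside {\tt stree.sty} and may be rounded differently), the angle $\alpha$
between a bond and the nearest coordinate axis and the bond length $\ell$ are
$$
\begin{array}{lll}
3:5 & \alpha=\arctan(3/5)\approx 31.0^\circ &
      \ell=\sqrt{3^2+5^2}\approx 5.8\ \mbox{quads}\\
2:6 & \alpha=\arctan(2/6)\approx 18.4^\circ &
      \ell=\sqrt{2^2+6^2}\approx 6.3\ \mbox{quads}\\
1:6 & \alpha=\arctan(1/6)\approx 9.5^\circ &
      \ell=\sqrt{1^2+6^2}\approx 6.1\ \mbox{quads}
\end{array}
$$
Under this assumption a non-orthogonal straight bond deviates from the ideal
$30^\circ$ by about one degree, and all three lengths stay close to the
default length of 6 quads.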

Let us start from the left methyl group
and traverse the adjacent 5-membered ring clockwise. The first three bonds on
this way are straight (at 2, 0, and 2 o'clock), the next one is skew at 4
o'clock. The next bond (5 quads left and 6 down) does not conform to any
``clock'' proportion and thus is written as {\tt-5-6}. For the last bond (6 left
and 1 down) we use notation {\tt//8}, though {\tt-6-1} is also acceptable.
The next object is the distorted hexagon around the ring just completed. Here
sequence {\tt-2+2 02} leads to the bridge across the two rings (the bridge
description is given in square brackets). The remaining path to the first ring
is described by {\tt/428+2-2}, where pair {\tt28} corresponds to the methyl
group (a more general way {\tt[2]} fits too).

In this example all optional spaces were omitted. The single space
that is required stands between {\tt-2+2} and {\tt02[}. Indeed, with no space
{\tt-2+202} would mean 2 quads left and 202 up, a very long bond! The
necessity to delimit numbers is not the only peculiarity of
offset directions. They also cannot be used
before commands which draw more than one bond (the explanation of this
restriction is deferred to the section on commands).
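
The relation between the two notations can be tried on a simple square. The
sketch below assumes, from the definition of a quad as $1/6$ of the default
bond length, that a straight bond of normal length spans exactly 6 quads, and
that a signed zero offset is accepted in every position as it is in the
examples of this paper. Under these assumptions both lines should give the
same figure.
\begin{verbatim}
\stree{3 6 9 0}              % clock notation
\stree{+6+0 -0-6 -6+0 +0+6}  % offsets: right, down, left, up
\end{verbatim}
The first number of each pair is the horizontal offset and the second the
vertical one, with positive values pointing right and up.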

\subsection{Labels}

However, before coming to commands one should dwell on simpler suffixes, i.\
e.,\ labels. Let us repeat the most significant rules. If a bond leads to a
labelled vertex, then this label must be written immediately after the bond
direction. No spaces are allowed before a label or a command\footnote{Since
neither a label nor a command can start with a digit, this rule cannot
conflict with the necessity to separate numbers of quads or digit 1
denoting a 1 o'clock direction.}.  If a label consists of several characters, or
is a digit, or is any character reserved for prefixes, then it must be enclosed
in curly braces. If a label is just a letter (e.\ g.,\ C, N, O, H) then
braces are unnecessary (however, it would not be an error to put them).

A vertex label requires all adjacent bonds to become shorter to avoid
overlapping it.  The main duty of \treeTeX\ lies precisely in the
proper calculation of label dimensions to update the coordinates of the lines
which depict bonds.  Because \TeX\ has no arrays, only one current label (its
metrics) is remembered at any moment. A complete traversal of a structure
may require some vertices to be met twice. Each time a vertex is met, its
label has to be specified again. E.\ g.,\ if formula
$$\stree{{HO}3N3\6}$$ is traversed from vertex OH to N and along the ring
back to N, then the nitrogen label should be set twice:
\verb"\stree{{HO}3N 1 3 5 7 9 11N}".  Omitting the last {\tt N} would make
the last 11 o'clock bond cross the first label N (since the presence of a
label here is already forgotten). However, square brackets almost always
allow one to avoid returning to a label.  In our case, for example,
the following construction will do:  \verb"\stree{{HO}3N [5] 1 3 5 7 9}".
Everything inside square brackets passes as if unnoticed:  the closing
bracket recalls vertex N, so that the next 1 o'clock bond is drawn
correctly.

A careful reader may note that many formulas (e.\ g.,\ the last one) begin
with a label instead of a prefix or a direction as the formal syntax
implies. Here a tiny trick works: number {\tt17} (i.\ e.,\ a void direction)
is always inserted at the very beginning of an argument of \verb"\stree"
before its processing. If the argument begins with a prefix or a direction,
then {\tt17} does nothing, otherwise {\tt17}\<label> sets this \<label>
at the current point.

Another departure from the formal syntax concerns justification of long labels.
Evidently a label with more than one character can be placed in different
ways relative to the vertex center. The default rule states: if a bond goes
left (i.\ e.,\ has a negative horizontal offset) then the vertex center
coincides with the center of the rightmost character, otherwise (the
bond is vertical or goes right) with that of the leftmost one. Although this works
perfectly with most terminal moieties, ``internal'' labels may require
additional devices.

%{\samepage
\rpic[pyrrole]{\stree{~'8/>13/>58`{NH}10}}
For example, the habitual clockwise traversal of the pyrrole structure
would cause label NH to be justified by H instead of N as it
should be. The default orientation may be altered by prefix
{\tt`} (back apostrophe):  \verb"\stree{/>1 3/>5 8`{NH} 10}". Since
{\tt`} controls the behavior of a label (in contrast to the others,
which are responsible for bonds), this prefix is the only one allowed to stand
before a label (of course, construction \verb"`8{NH}" is also valid though not
as logical as \verb"8`{NH}").
%\par}

%{\samepage
\rpic{\stree{/>13/>58``{NH}10}}
But let us return to pyrrole. One may find the centered label (in this
paragraph) to be more appropriate. Here the vertex center coincides with
the center of the whole label, which is achieved by doubling the same
prefix: \verb"8``{NH}". However, one can get finer effects by means of
\TeX\ boxes, kerns, and glue (the next paragraph assumes the reader to
be familiar with these things).
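
For comparison, the three justification modes discussed so far differ only in
the number of back apostrophes before the label. The lines below collect the
inputs already described; the comments restate the rules given above.
\begin{verbatim}
\stree{/>1 3/>5 8{NH} 10}    % default: the bond goes left, so H is at the vertex
\stree{/>1 3/>5 8`{NH} 10}   % reversed: N is at the vertex
\stree{/>1 3/>5 8``{NH} 10}  % centered: the middle of NH is at the vertex
\end{verbatim}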
%\par}

\rpic{\stree{/>1 3/>5 8``{\vtop{\baselineskip=0pt \hbox{N}\hbox{H}}}10}}
Indeed, neither pyrrole formula is perfect, and one may try the vertical
form of label NH by means of the following long construction:
\0stree\1/>1 3/>5
8``\1\0vtop\1\0baselineskip=0pt \0hbox\1N\2\0hbox\1H\2\2\2
10\2\rm. Let us comment on it. Two {\tt``}
before the label prevent an ``automatic'' alignment by some marginal
token, which is evidently undesirable here. \TeX\ primitive \verb"\vtop"
makes a vertical box aligned by the topmost of the contained boxes. In our case
the base lines of \verb"\hbox{N}" and of the whole \verb"\vtop" coincide, or, in
other words, the height of \verb"\vtop" equals the height of
\verb"\hbox{N}". As \treeTeX\ ignores the box depth while setting a label, the
vertex center will just coincide with the center of N\@. Without assignment
\verb"\baselineskip=0pt" N and H would stand as far from each other as lines
in a common paragraph. Curiously, almost the same result is achievable via a
short invisible bond, 3 quads down:
\verb"\stree{/>1 3/>5 8N [~-0-3H] 10}"
(however, in this case the gap between N and H depends on the fonts
used for labels). {\sloppy\par}

\subsection{Commands}

\begin{table}\centering
{\bf Table 3.} The available commands.\par\bigskip
\begin{tabular}{rl|rl|rl}
\verb"\ph"&\stree{3\ph'9}&\verb"\tbu"&\stree{3\tbu}&{\tabcolsep=0pt
\begin{tabular}{r}\verb"\O"\\\verb"\CO"\end{tabular}}&\stree{3\O'11'7}\\
&&&&\\ \hline &&&&\\
\verb"\pho"&\stree{3\pho'9}&\verb"\tbx"&\stree{3\tbx}&\verb"\OH"&\stree{3\OH}\\
&&&&\\ \hline &&&&\\ {\tabcolsep=0pt
\begin{tabular}{r}\verb"\six"\\\verb"\6"\end{tabular}}&\stree{3\6'9}&
\verb"\ip"&\stree{3\ip}&\verb"\COOH"&\stree{3\COOH}\\
&&&&\\ \hline &&&&\\
&&\verb"\Me"&\stree{3\Me}&
\end{tabular}
\end{table}

A command may substitute for a label in the role of a suffix. We mean a
regular \TeX\ command composed of escape character \verb"\" and either
one non-letter or a sequence of several letters. Specific commands
may be used {\sl inside\/} command \verb"\stree" to facilitate input of some
fragments (see Table 3). None of the available commands
changes the current vertex (though this is not an obligatory rule).  The reason for
such behavior is evident for monovalent moieties, which constitute columns 2 and
3 in Table 3.  Cyclic fragments allow more flexible use.  Terminal rings
require an additional bond before a command in the same direction (e.\ g.,\
biphenyl is input as \verb"\stree{9\ph 3 3\ph}"). This might seem not very
convenient, but it makes it easy to input condensed rings and simple
heterocycles:  $$
\begin{tabular}{ccccc}
\stree{10\6 2\6}&\begin{tabular}{l}\verb"\stree{"\\
\verb"10\6 2\6}"\end{tabular}& &
\stree{N3\6 9\Me}&\begin{tabular}{l}\verb"\stree{"\\
\verb"N3\6 9\Me}"\end{tabular}
\end{tabular}
$$

A fragment is drawn in the direction which precedes the command (all figures
in Table 3 correspond to 3 o'clock). Sometimes the direction affects the
appearance of a label or the ``chirality'' of a whole fragment.
An example is command \verb"\COOH", which always places the carbonyl oxygen on
top (the reader may wish to check this).
The definition of \verb"\COOH" is worth considering in full:
%
\begin{verbatim}
\def\COOH#1{{  % argument #1 = direction in hours
  \ang=#1      % remember the initial direction
  \b\ang[C]    % draw a bond ending with label C
  \ifnum\ang>6 % if more than 6 hours then
    \rot2           % turn 2 hours right,
    {\bt2\b\ang[O]} % put the carbonyl =O,
    \rot8           % turn 8 hours right (= 4 left),
    \OH\ang         % draw -OH
  \else        % otherwise
    \rot2           % turn 2 hours right,
    \OH\ang         % draw -OH,
    \rot8           % turn 8 hours right,
    {\bt2\b\ang[O]} % draw =O
  \fi}}
\def\OH#1{{\ifnum#1>6\b{#1}[HO]\else\b{#1}[OH]\fi}}
\end{verbatim}
%
Let us explain three new commands used in this listing. The most important
one is \verb"\b", which draws a bond in a given direction (the first,
mandatory parameter) and sets an optional label given in square brackets. The use
of variable \verb"\ang" instead of an absolute value makes the description
depend on the initial direction passed via the first parameter. After
assignment \verb"\ang=#1" this variable undergoes only relative increments
corresponding to a rotation by a whole number of hours.  The use of the special
command \verb"\rot" for this purpose guarantees that the resulting value is
kept within 11 hours. Evidently, such simple arithmetic is applicable only to
directions expressed in numbers of hours. This explains why offset directions
are not allowed before commands drawing more than one bond.
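
As a check, the listing can be traced by hand for two initial directions. The
trace assumes that \verb"\rot" adds its argument to \verb"\ang" modulo 12;
this is inferred from the remark above and has not been verified against
{\tt stree.sty}.

\begin{center}
\begin{tabular}{l|c|c}
initial direction&3&9\\ \hline
test \verb"\ang>6"&false&true\\
after the first \verb"\rot2"&$3+2=5$&$9+2=11$\\
drawn in that direction&OH at 5&=O at 11\\
after \verb"\rot8"&$(5+8) \bmod 12=1$&$(11+8) \bmod 12=7$\\
drawn in that direction&=O at 1&HO at 7
\end{tabular}
\end{center}

In both cases the carbonyl oxygen lands in the upper half of the clockface
(at 1 or 11 o'clock) and the hydroxyl group in the lower half, which agrees
with the statement that \verb"\COOH" places the carbonyl oxygen on top.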

Command \verb"\bt2" changes the character of a bond to double
centered (equivalent to the action of prefix {\tt=}). Other
bond attributes (visibility, thickness, etc.)\ can also be altered by means
of control variables, which are not listed here. These
details, if required, can be obtained from the source file {\tt
stree.sty}.

The additional curly braces in the definitions of \verb"\COOH" and \verb"\OH"
use the common grouping mechanism of \TeX\ to restore the current vertex
after the command action. Let us
also note that the simpler command \verb"\OH" is used inside
\verb"\COOH".  The logic of \verb"\OH" itself is limited to the choice of
either label OH or HO depending on the bond direction.

The set of commands for drawing fragments may seem meagre and
unrepresentative. Whether it should be extended further is doubtful, however,
since the description at the level of bonds and labels is already so concise
that broad use of internal commands promises little advantage. The optimal
case assumes a few simple commands that are easy to use and to remember. This
does not mean, however, that defining new commands is always pointless. For
instance, authors of mathematical papers often draw graphs whose vertices are
marked with bullets, e.\ g.
$$
\def\*#1{\b{#1}[$\bullet$]}
\stree{\*5\*[7\*]3\*1\*3\*5\*7\*9\*11\*}
$$
Bearing in mind the \TeX\ notation \verb"$\bullet$", one might straightforwardly
input the graph as \verb"{$\bullet$} 5{$\bullet$} [7{$\bullet$}" \ldots\ but
there is a better way: define command \verb"\*" for setting a bullet in
a given direction (\verb"\def\*#1{\b{#1}[$\bullet$]}") and then describe
everything much more briefly as
\verb"\stree{\*5\*[7\*]3\*1\*3\*5\*7\*9\*11\*}".

In fact, suffix commands may take an additional parameter, which must
immediately follow the command and be enclosed in braces. Incidentally, this
feature can be used to input flow charts composed of standard
polygons, e.\ g.,

$$
\def\(#1,#2){\sx#1\sy#2\ab30[]}
\def\com#1[#2]{\(15,0)\(0,-6)\(-3,-3)\(-27,0)\(0,6)\(3,3)\(12,0)
 \bmode-1 \dmode2 \sy-5 \ab30[#2]\bmode-1 \(0,-4)}
\stree{0\com{sugar}-0-3+20-0[+20-0-0+3~-0+9 0\com{milk}]
::-0-3 0\com{coffee}::-0-3}
$$

After defining a polygon by means of command \verb"\com"
%
\begin{verbatim}
\def\(#1,#2){\sx#1\sy#2\ab30[]} % draw a bond offset (#1,#2)
\def\com#1[#2]{\(15,0)\(0,-6)\(-3,-3)\(-27,0)\(0,6)\(3,3)
 \(12,0)\bmode-1 \dmode2 \sy-5 \ab30[#2]\bmode-1 \(0,-4)}
\end{verbatim}
%
one can input the whole scheme as a kind of chemical formula:
\begin{verbatim}
\stree{0\com{sugar}-0-3+20-0
[+20-0-0+3~-0+9 0\com{milk}]
::-0-3 0\com{coffee}::-0-3}
\end{verbatim}

A few comments are in order. Variables \verb"\sx" and \verb"\sy" contain
the offsets used for a bond if its direction equals the conventional value of
30 hours. Assignment \verb"\bmode-1" makes a bond invisible, while
\verb"\dmode2" produces a centered label (equivalent to {\tt``}).

This example completes the systematic consideration of all the components of
a bond description. One topic remains: how to use \verb"\stree" inside other
\LaTeX\ constructions, i.\ e.,\ the alignment modes.

\section{Alignment}

By default, \verb"\stree" uses the \LaTeX\ environment \verb"picture" to
produce an \verb"\hbox" whose height slightly (by {\tt1ex}) exceeds its depth.
This small difference makes structures line up with math symbols in
displayed formulas. E.~g.,\ the following equation
$$
\stree{9\6 3\OH}+\stree{9\OH 1\O
5\Me}\longrightarrow\ \stree{9\6 3O 3 1\O 5\Me}+\stree{{H$_2$O}}
$$
is input just as
\begin{verbatim}
\stree{9\63\OH}+\stree{9\OH1\O5\Me}\longrightarrow
\stree{9\63O31\O5\Me}+\stree{{H$_2$O}}
\end{verbatim}

Vertical alignment may also be specified explicitly with a token in square
brackets placed immediately after \verb"\stree". In agreement with the
\LaTeX\ conventions, {\tt[t]} means top alignment and {\tt[b]} means bottom
alignment (a centered box is obtained by default, which makes {\tt[c]}
redundant). Parameter {\tt[u]} (no alignment) leads to a box with null
dimensions at the place of the starting vertex. This is useful if
\verb"\stree" itself is part of another picture whose elements are bound to
vertices. Since all vertices are situated at nodes of a grid with a step of
one quad, knowing the exact coordinates of one vertex allows all the others
to be calculated easily. Use of {\tt[u]} guarantees this knowledge for the
starting vertex, whereas the other parameters do not, since the box produced
by \verb"\stree" in these cases tightly bounds everything found in a
formula: labels, marks, etc.
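
The alignment tokens can be compared on a single structure. The lines below
are a minimal sketch that reuses the first formula of the equation above;
only the tokens described in this section are used, and the comments restate
their meaning.
%
\begin{verbatim}
\stree[t]{9\6 3\OH}   % top alignment
\stree[b]{9\6 3\OH}   % bottom alignment
\stree{9\6 3\OH}      % default, centered box (same as [c])
\stree[u]{9\6 3\OH}   % no alignment: box with null dimensions
\end{verbatim}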

On the other hand, variant {\tt[u]} obliges the user to keep track of the
size of a formula personally. There is a way free of this drawback:
instead of \verb"\stree{" \<description> {\tt\2} one can write
\verb"\begstr \tree{" \<description> \verb"} \endstr".
After \verb"\begstr" or before \verb"\endstr" any \LaTeX\
commands of the {\tt picture} environment are allowed, as if
coordinates (0,0) belonged to the starting vertex and
the unit of measure \verb"\unitlength" were equal to
the standard bond length \verb"\bondlen".
In the following example this method is used to
mark a fragment with a dashed box:
$$\begstr
\tree{9\OH1//2[~/7{$R_1$}]4[0~2][6]/2539>760/108[-4-5/118~9{$F_1$}]//1011}
\put(-2,-2){\dashbox{0.2}(4.8,3.8){}}
\endstr
$$
This figure is input as
\begin{verbatim}
\begstr
\tree{9\OH1//2[~/7{$R_1$}]4[0~2][6]/2539>760/108
 [-4-5/118~9{$F_1$}]//1011}
\put(-2,-2){\dashbox{0.2}(4.8,3.8){}}
\endstr
\end{verbatim}
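
Other standard {\tt picture} commands can be placed in the same way. The
following lines are a hedged sketch, not taken from the package
documentation: they assume only what is stated above, namely that (0,0) is
the starting vertex and that lengths are measured in units of
\verb"\bondlen".
%
\begin{verbatim}
\begstr
\tree{9\6 3\OH}
\put(0,0){\circle{0.4}}      % small circle at the starting vertex
\put(-1,-2){\line(1,0){2}}   % horizontal rule, two bond lengths long
\endstr
\end{verbatim}
%
The positions and sizes are arbitrary and would have to be adjusted to the
actual formula.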

\section{Additional features}

Given the chemical character of this work, we should also mention the
ability to input mass and \CNMR\ spectra with \treeTeX. For example, the
spectrum
$$
\massp{220-18!187-33!135-26!133-25,121-100!14,2,107-38!43-35,41-41!}
$$
is obtained as a result of
\begin{verbatim}
\massp{220-18!187-33!135-26!133-25,121-100!14,2,
 107-38!43-35,41-41!}
\end{verbatim}
The argument is just a sequence of numbers separated by punctuation marks:
a mass number is terminated by a dash, and a relative intensity (in percent)
by either a comma or an exclamation mark. The latter causes the $m/z$ value
to be set over the peak. Several intensities following one mass number
(e.\ g.,\ {\tt 121-100!14,2,}) correspond to neighboring peaks with
increasing $m/z$. A comma may be omitted if it is the last token in a
\verb"\massp" argument.
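
To make the notation concrete, a shortened argument assembled from pieces of
the spectrum above can be read token by token. This is a sketch for
illustration only; assigning the extra intensities to $m/z$ 122 and 123
assumes that neighboring peaks are one mass unit apart.
%
\begin{verbatim}
\massp{220-18!133-25,121-100!14,2,}

220-18!        m/z 220, intensity 18%, m/z value set over the peak
133-25,        m/z 133, intensity 25%, no number over the peak
121-100!14,2,  m/z 121 (100%, labelled), then 122 (14%), 123 (2%)
\end{verbatim}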

Four parameters control the appearance of a mass spectrum:
\begin{itemize}
\item
\verb"\mzlen" --- the horizontal distance corresponding to 1 $m/z$\\
(default \verb"\mzlen=0.6pt"),
\item
\verb"\imax"  --- the height of the maximal peak in units of \verb"\mzlen"\\
(default \verb"\imax=50"),
\item
\verb"\numfnt" --- the font used to set numbers on a figure\\
(default \verb"\def\numfnt{\small}"),
\item
\verb"\msdir" --- the direction of peaks.
\end{itemize}
%
The last parameter is rarely used. The default \verb"\msdir=1" means the usual
upward direction; any other value makes the peaks point downward. The
latter case may be useful for visual comparison of two spectra:
$$
\begms
\mass{39-11,0,21,51-5,7,15,64-5,35,7,77-10,5,91-4,74!27,104-6,
119-26,132-42,4,159-26,81!8,176-15,187-93!12,204-4,232-100!14}
\put(-1,0){\line(1,0){250}}\msdir=-1
\mass{39-16,51-7,6,15,65-11,8,77-5,92-73!24,104-9,119-15,
132-21!159-26,100!14,176-16,187-83!14,204-3,232-70!}\vpos=1
\endms
$$
This figure is input in the following way:
%
\begin{verbatim}
\begms
\mass{39-11,0,21,51-5,7,15,64-5,35,7,77-10,5,91-4,74!27,104-6,
119-26,132-42,4,159-26,81!8,176-15,187-93!12,204-4,232-100!14}
\put(-1,0){\line(1,0){250}}\msdir=-1
\mass{39-16,51-7,6,15,65-11,8,77-5,92-73!24,104-9,119-15,
132-21!159-26,100!14,176-16,187-83!14,204-3,232-70!}\vpos=1
\endms
\end{verbatim}
%
By analogy with \verb"\stree", instead of
\verb"\massp{" \<argument> \verb"}" one can write
\verb"\begms \mass{" \<argument> \verb"} \scale1{} \endms" and insert
any commands of the \LaTeX\ {\tt picture} environment
after \verb"\begms" or before \verb"\endms". In our example the usual scale
drawn by \verb"\scale1{}" is replaced by a plain horizontal line (its length
is the rounded maximal mass number in the spectrum), and then another
spectrum is drawn in the opposite direction, since it is preceded by
\verb"\msdir=-1". Finally, assignment \verb"\vpos=1" switches on bottom
alignment (i.\ e.,\ the box depth equals zero). Although command
\verb"\massp" itself supports the usual alignment control by means of
parameters {\tt[b]}, {\tt[c]}, or {\tt[t]}, on the lower level the variable
\verb"\vpos" is in effect (this is also true for the decomposition of
\verb"\stree" into \verb"\begstr", \verb"\tree", and \verb"\endstr"). Value 0
means center alignment (same as parameter {\tt[c]}), 1 means bottom
({\tt[b]}), 2 means top ({\tt[t]}), and 3 means no alignment (valid only for
structural formulas; it corresponds to {\tt[u]}).
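
The view parameters listed above can be combined with an alignment token in
one call. The following is a sketch under two assumptions that are not
confirmed by the text: that the parameters may be assigned locally inside a
group just before \verb"\massp", and that the assignment syntax is the one
shown for the defaults. The values themselves are arbitrary.
%
\begin{verbatim}
{\mzlen=0.8pt                 % 1 m/z is 0.8pt wide (default 0.6pt)
 \imax=40                     % maximal peak: 40 units of \mzlen
 \def\numfnt{\footnotesize}   % smaller numbers on the figure
 \massp[b]{220-18!187-33!135-26!133-25,121-100!14,2,107-38!}}
\end{verbatim}
%
Keeping the assignments inside braces is meant to leave the defaults intact
for any spectra that follow.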

\treeTeX\ also allows one to input simple figures of \CNMR\ spectra like the
following (the height of a peak reflects its multiplicity):
$$
\cnmrs{2000}{1912s1645s354d1041s{\llap{$^b\searrow$}}%
901s138q1576s1703s575q912d1047d{$*$}566q1696s1956s571q953s395t}
$$
The corresponding text is:
%
\begin{verbatim}
\cnmrs{2000}{1912s1645s354d1041s{\llap{$^b\searrow$}}%
901s138q1576s1703s575q912d1047d{$*$}%
566q1696s1956s571q953s395t}
\end{verbatim}
%
Command \verb"\cnmrs" has two mandatory parameters: the first is the scale
length (in $\hbox{p.\,p.\,m.}\times10$), and the second is a spectrum
description which contains, for each signal, its chemical shift value
(multiplied by 10) and multiplicity, where {\tt s} means singlet, {\tt d}
doublet, {\tt t} triplet, and {\tt q} quartet. A peak label may be set in
braces after a multiplicity token (in our case construction
\verb"\llap{$^b\searrow$}" is used to shift the label left and prevent it
from interfering with the neighboring doublet marked by a star).
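
As a reading aid, the first three signals of the spectrum above can be
isolated in a shorter call. This is only a sketch; the shifts in the
annotation follow from dividing the input numbers by 10, as described.
%
\begin{verbatim}
\cnmrs{2000}{1912s1645s354d}

2000    scale length: 200 p.p.m.
1912s   singlet at 191.2 p.p.m.
1645s   singlet at 164.5 p.p.m.
354d    doublet at  35.4 p.p.m.
\end{verbatim}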

Both kinds of spectra are input in quite a similar way: as a
sequence of numbers with delimiters. There are small differences, however.
A long \verb"\massp" command can be broken into lines at any place, whereas in
\verb"\cnmrs" each end of line should be commented out by \verb"%".
Besides, in \verb"\cnmrs" the length of the scale (and so of the whole
figure) is set explicitly, while for a mass spectrum the total length depends
on the largest $m/z$ value (to obtain a mass spectrum of a given length one
can simply specify a void peak with null intensity and the corresponding
$m/z$).
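
The void-peak trick might look as follows. This is a sketch: the value 300 is
an arbitrary target length, and it is assumed that the position of the void
peak within the argument does not matter.
%
\begin{verbatim}
\massp{220-18!187-33!135-26!300-0,}
\end{verbatim}
%
The last entry is ended by a comma, not by an exclamation mark, so that no
$m/z$ value is requested over the empty position.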

\section{Conclusion}

An obvious question might arise on becoming acquainted with this work: does
anybody need to learn yet another ``bird language'' to input chemical
formulas when plenty of convenient programs for visual
editing exist? Essentially this is the old
dispute {\sl pro et contra\/} a command interface vs.\ a graphical one.
Professional programmers know very well that a command language provides
faster and more flexible control, though it takes more time to learn.
In fact there is no contradiction between these modes of interaction, and the
optimal case should include both. In any case, in the preparation of
scientific manuscripts words play the main role, i.\ e.,\ a writer
basically remains in the verbal mode of thinking. If numerous
chemical formulas in a text are input with a graphic editor, the
writer will frequently switch from one program to another. Even if the
switching is fast and elegant (a case rarely met in practice), the change of
interface mode still requires some psychological adaptation. In this
case the ability to input a formula directly in the text, without reaching
for a mouse and becoming a designer, can save a lot of time and mental energy.

\section*{References}
\frenchspacing\parindent=0pt
\parskip=2ex

Knuth D. (1984)
\sl The \TeX book.\/
\rm Addison-Wesley, Reading.

Lamport L. (1984)
\sl \LaTeX: A Document Preparation System.\/
\rm Addison-Wesley, Reading.

Ramek M. (1990)
\sl \TeX: Applications, Uses, Methods.
\rm ed. Clark M., p. 227.
Ellis Horwood, London.
%227--258

Haas R. T. \& O'Kane K. C. (1987)
%Typesetting Chemical Equations Using \LaTeX.
\sl Comput. Chem.\/,\bf\ 11,\rm\ 251
%251--271

Fujita S. (1994)
%Typesetting Structural Formulas with the Text Formatter \TeX/\LaTeX.
\sl Comput. Chem.\/,\bf\ 18,\rm\ 109
%109--116

\end{document}