% \iffalse meta-comment % % Copyright (C) 1993-2024 % The LaTeX Project and any individual authors listed elsewhere % in this file. % % This file is part of the LaTeX base system. % ------------------------------------------- % % It may be distributed and/or modified under the % conditions of the LaTeX Project Public License, either version 1.3c % of this license or (at your option) any later version. % The latest version of this license is in % http://www.latex-project.org/lppl.txt % and version 1.3c or later is part of all distributions of LaTeX % version 2008 or later. % % This file has the LPPL maintenance status "maintained". % % The list of all files belonging to the LaTeX base distribution is % given in the file `manifest.txt'. See also `legal.txt' for additional % information. % % The list of derived (unpacked) files belonging to the distribution % and covered by LPPL is defined by the unpacking scripts (with % extension .ins) which are part of the distribution. % % \fi \NeedsTeXFormat{LaTeX2e}[1995/12/01] \documentclass{ltxguide}[1999/02/28] \title{Cyrillic languages support in \LaTeX} \author{\copyright~Copyright 1998--1999,\\ Vladimir Volovich, Werner Lemberg and \LaTeX\ Project Team.\\ All rights reserved.% \footnote{This file may be distributed and/or modified under the conditions of the \LaTeX{} Project Public License, either version 1.3c of this license or (at your option) any later version. See the source \texttt{cyrguide.tex} for full details.}% } \date{12 March 1999} \begin{document} \maketitle \tableofcontents \begin{abstract} This document contains basic information on the Cyrillic setup for \LaTeX{}: how to get the fonts, how to set them up, how to use the interface, its interaction with \babel{}, etc. This is only a first draft of the document and it will probably be modified in future; so please send in comments on it via the \texttt{latexbug} system (see below). \end{abstract} \section{Introduction} Most Latin-based European languages were supported in \LaTeX{} by introducing the~|T1|~font encoding and by using the \textsf{fontenc} and \textsf{inputenc} packages; these use only standard \TeX{} means to support any \mbox{8-bit} input encoding and this one standard font encoding. The restriction to a single font encoding guarantees that multiple languages can happily coexist in one document (\eg hyphenation will be correct for all languages). Starting with the December~1998 Release, \LaTeX{} finally supports Cyrillic languages. This support is based on the new standard Cyrillic \TeX{} font encodings---|T2A|, |T2B|, |T2C|, and~|X2|. The first three of these satisfy some basic requirements for \LaTeX{}~|T*|~encodings, and thus can be used in multi-lingual documents with other languages based on standard font encodings. The reason why we need four different Cyrillic font encodings is that these font encodings support \emph{all} the Cyrillic languages that have been used during the twentieth century (see Section~\ref{fontencs})! The number of Cyrillic glyphs is large, so they cannot be represented with 128~character slots; the other (lower) 128~slots are reserved for Latin letters and other invariant symbols that are needed for the encoding to be a conformant \LaTeX{}~\texttt{T}~encoding. There are some glyphs in the |T2*|~encodings which do not yet have associated characters in \emph{Unicode}, the world-wide character standard. Also, one more font encoding, |T2D|,~is planned for a forthcoming release of \LaTeX{}. A lot of Cyrillic input encodings are already supported (see Section~\ref{inputencs}), and additional encodings could be added easily. \subsection{Acknowledgments} \label{sec:acks} The work on |T2*|~encodings was carried out by the T2~Team, led by Alexander Berdnikov (other members are Mikhail Kolodin and Andrew Janishewskii). The LH~fonts were produced by Olga Lapko (with A.~Khodulev). The \textsf{T2} bundle and \textsf{ruhyphen} package were written by Werner Lemberg and Vladimir Volovich (except that the concrete hyphenation patterns which are part of \textsf{ruhyphen} came from individual authors). The support for the Ukrainian language was prepared by Andrij Shvaika. \section{Installation} The \textsf{fontenc} and \textsf{inputenc} packages are installed automatically in every base \LaTeX{} distribution. All the necessary extra files to use with these packages for Cyrillic are in the \textsf{cyrillic} bundle, which at present contains the following: four font encoding definition files (|t2aenc.def|, |t2benc.def|, |t2cenc.def|, |x2enc.def|); several input encoding definition files (all the other |*.def| files), and font definition files (|*.fd|). The installation of these is described here. \subsection{Fonts} The default font families in \LaTeX{} are the Computer Modern families, namely the CM~fonts (|OT1|~encoded) and the EC~fonts (|T1|~encoded). The LH~fonts, which are now available, provide Computer Modern fonts for all Cyrillic font encodings. They are designed to be compatible with the EC~fonts, and they provide the same font shapes and sizes; they are available at |CTAN:fonts/cyrillic/lh| (the latest version is 3.20). The installation instructions for the fonts are in the file |INSTALL| in the font distribution. Other fonts, including Type~1 fonts, can also be used, provided that their encoding (for \TeX{}) is \mbox{|T2|-compatible}. Some ready-to-use packages supporting such fonts are also available, \eg at \URL{ftp://ftp.vsu.ru/pub/tex} (they should soon be on \ctan). Currently, you will find two packages there: \textsf{PsCyr}, which contains some freely distributable Cyrillic Type~1 fonts with support for \LaTeX{}; and \textsf{c1fonts}, which contains virtual fonts similar to the \textsf{AE}~fonts package using the BlueSky and BaKoMa fonts available from \ctan{} (see the |README| file in that package for detailed information). Further font packages are expected soon. \subsection{Hyphenation patterns} You can find a collection of hyphenation patterns for the Russian language in the \textsf{ruhyphen} package at |CTAN:language/hyphenation/ruhyphen|. These patterns support the |T2*|~encodings, as well as other popular font encodings used for Russian typesetting (including the Omega internal encoding). Patterns for other Cyrillic languages should be adapted to work with the |T2*|~encodings. \subsection{\babel{} support for Russian and Ukrainian} \label{bblrus} Version~3.6k of \babel{} includes support for the |T2*|~encodings and for typesetting both Russian and Ukrainian texts using the Cyrillic letters. The temporary fontencoding |LWN|, which was used in earlier releases of \babel{}, will be withdrawn in the near future and replaced by the |OT2| encoding. \subsection{Getting pre-built packages} Many of the major \TeX{} distributions, such as te\TeX{}, fp\TeX{} and \TeX{}live, contain (or soon will) everything that is needed, including the LH~fonts, \textsf{ruhyphen} and the latest version of \babel{}. We hope that all \TeX{} distributions will soon include all of these, so that the chances are that you will not need to install this by yourself (but it is not difficult). If you are using em\TeX, Mik\TeX, or fp\TeX, you can download the \textsf{ruemtex} package from \URL{ftp://ftp.vsu.ru/pub/tex}. \section{Usage} Support for Cyrillic is based on these standard \LaTeX{} mechanisms: the \textsf{fontenc} and \textsf{inputenc} packages (and on \babel{}). Thus the basic principles for its use are similar to those for other European languages: you simply add, to your document preamble, lines like the following. \begin{verbatim} \usepackage[T2A]{fontenc} \usepackage[koi8-r]{inputenc} \end{verbatim} Here you can put any desired input encoding instead of \mbox{\texttt{koi8-r}}: for example, it would be \texttt{cp866} if you are using a MS-DOS text editor with this Cyrillic code page to prepare your documents, or \texttt{cp1251} if you are a MS~Windows user with Cyrillic support. A full list of the available Cyrillic encodings can be found in Section~\ref{inputencs} and in the file |cyinpenc.dtx|. Documents are, naturally, not restricted to a single font encoding; this is essential for multi-lingual journals or documents. Such changes can be made by using the |\fontencoding| command as part of a font-change. However, it is best to access these font encodings via a higher-level interface. Since such changes are often closely related to other language-dependent settings, it is often sensible to use the \babel{} system, which provides further useful `localisation' and standardised multi-lingual interfaces (for further details, see Section~\ref{bblrus}). Then you can use lines like the following in your document: \begin{verbatim} \usepackage[koi8-r]{inputenc} \usepackage[russian]{babel} \end{verbatim} This will automatically choose the default font encoding for Russian, which is |T2A|, if available. Documentation of the complete set of font-encoding selection rules can be found in |cyrillic.dtx| which is part of |rusbabel|. These \LaTeX{} interfaces are very convenient because they make your documents completely portable, being based solely on standard \TeX{} features. This will mean that your documents can be processed on any \TeX{} system without any need for re-encoding to the `native' encoding used on each platform; this is because the encoding of the document is specified in the document itself. Moreover, if necessary, more than one input encoding can be used within a document; this could be useful if, for example, you need to combine articles prepared by authors on different machines. Each part of the document is then identified by a |\inputencoding| command, which can therefore only be used between paragraphs. Please note that you must always use the two standard \LaTeX{} commands, |\MakeUppercase| and |\MakeLowercase| to produce uppercase or lowercase text in your documents. This is because |\uppercase| and |\lowercase| will not work at all for Cyrillic (note that these latter two commands are not, and never have been, available for use directly in \LaTeX{} documents). \section{Font encodings for Cyrillic languages} \label{fontencs} The Cyrillic font encodings support the following languages. Note that some languages can be properly typeset with more than one encoding. \begin{itemize} \raggedright \item[|T2A|:] Abaza, Avar, Agul, Adyghei, Azerbaijani, Altai, Balkar, Bashkir, Bulgarian, Buryat, Byelorussian, Gagauz, Dargin, Dungan, Ingush, Kabardino-Cherkess, Kazakh, Kalmyk, Karakalpak, Karachaevskii, Karelian, Kirghiz, Komi-Zyrian, Komi-Permyak, Kumyk, Lak, Lezghin, Macedonian, Mari-Mountain, Mari-Valley, Moldavian, Mongolian, Mordvin-Moksha, Mordvin-Erzya, Nogai, Oroch, Osetin, Russian, Rutul, Serbian, Tabasaran, Tadzhik, Tatar, Tati, Teleut, Tofalar, Tuva, Turkmen, Udmurt, Uzbek, Ukrainian, Hanty-Obskii, Hanty-Surgut, Gipsi, Chechen, Chuvash, Crimean-Tatar. \item[|T2B|:] Abaza, Avar, Agul, Adyghei, Aleut, Altai, Balkar, Byelorussian, Bulgarian, Buryat, Gagauz, Dargin, Dolgan, Dungan, Ingush, Itelmen, Kabardino-Cherkess, Kalmyk, Karakalpak, Karachaevskii, Karelian, Ketskii, Kirghiz, Komi-Zyrian, Komi-Permyak, Koryak, Kumyk, Kurdian, Lak, Lezghin, Mansi, Mari-Valley, Moldavian, Mongolian, Mordvin-Moksha, Mordvin-Erzya, Nanai, Nganasan, Negidal, Nenets, Nivh, Nogai, Oroch, Russian, Rutul, Selkup, Tabasaran, Tadzhik, Tatar, Tati, Teleut, Tofalar, Tuva, Turkmen, Udyghei, Uigur, Ulch, Khakass, Hanty-Vahovskii, Hanty-Kazymskii, Hanty-Obskii, Hanty-Surgut, Hanty-Shurysharskii, Gipsi, Chechen, Chukcha, Shor, Evenk, Even, Enets, Eskimo, Yukagir, Crimean Tatar, Yakut. \item[|T2C|:] Abkhazian, Bulgarian, Gagauz, Karelian, Komi-Zyrian, Komi-Permyak, Kumyk, Mansi, Moldavian, Mordvin-Moksha, Mordvin-Erzya, Nanai, Orok (Uilta), Negidal, Nogai, Oroch, Russian, Saam, Old-Bulgarian, Old-Russian, Tati, Teleut, Hanty-Obskii, Hanty-Surgut, Evenk, Crimean Tatar. \end{itemize} The |X2|~encoding was designed to support all the above languages. Its name does not start with |T| because, for example, it contains no Latin letters (it is purely a Cyrillic glyph container); it therefore cannot be used in mixed-script documents along with the other |T*| encodings. Please consult Section~6.4 \textit{Naming conventions} of the file |fntguide.tex| in the base \LaTeX{} distribution for details of the differences between \LaTeX{} font encodings and how they are named. There are two other \LaTeX{} Cyrillic font encodings, |OT2| and |LCY|, that are not included in the base \LaTeX{} distribution. The first is a \mbox{7-bit} encoding (hence the |O|) developed by the AMS; it is useful for typesetting relatively small fragments of text in Cyrillic, using a Latin transliteration scheme. The other, |LCY|, is an \mbox{8-bit} Cyrillic encoding which is not compatible with the requirements for \LaTeX{} |T*|~encodings (hence the |L|); thus it is not suitable for typesetting multi-lingual documents, but it can be used in Plain \TeX{}-based macro packages because it is an extension of |OT1|. These two encodings are supported by \babel{} and by \textsf{ot2cyr}. \section{Input encodings} \label{inputencs} Several Cyrillic code-pages are widely used. Currently, \LaTeX{} contains support for 20~Cyrillic input encodings (some of which are variants of each other). \begin{itemize} \item |cp855| --- the standard \mbox{MS-DOS} Cyrillic code-page. \item |cp866| --- the standard \mbox{MS-DOS} Russian code-page. Several code-pages very similar to this are also supported (the differences are all in the range 242--254). \begin{itemize} \item |cp866av| -- the `Cyrillic Alternative' code-page (an alternative variant of cp866); \item |cp866mav| -- the `Modified Alternative Variant'; \item |cp866nav| -- the `New Alternative Variant'; \item |cp866tat| -- an experimental Tatarian code-page. \end{itemize} \item |cp1251| --- the standard MS Windows Cyrillic code-page. \item \mbox{\texttt{koi8-r}} --- a standard Cyrillic code-page widely used in UNIX-like systems for Russian language support that is specified in RFC~1489. The situation with \mbox{\texttt{koi8-r}} is somewhat similar to that for |cp866|: there are several similar code-pages which coincide for all Russian letters but add some other Cyrillic letters. The following are supported: \begin{itemize} \item \mbox{\texttt{koi8-u}} -- for Ukrainian; \item \mbox{\texttt{koi8-ru}} -- this is described in a draft RFC document specifying a widely used character set for mail and news exchange in the Ukrainian internet community, as well as for presenting WWW information resources in the Ukrainian language; \item |isoir111| -- the \mbox{ISO-IR-111 ECMA} Cyrillic Code Page. \end{itemize} \item |iso88595| --- the \mbox{ISO 8859-5} Cyrillic code-page (also called \mbox{ISO-IR-144}). \item |maccyr| --- the Apple Macintosh Cyrillic code-page (also known as Microsoft cp10007) and |macukr|, the Apple Macintosh Ukrainian code-page, very similar to the Cyrillic code-page. \item The Mongolian code-pages: |ctt| |dbk| |mnk| |mos| |ncc| |mls|. These code-pages were taken from Oliver Corff's `Mon\TeX' package (available at |CTAN:language/mongolian/montex|). Since the |T2*| encodings support the Mongolian Cyrillic script, it is convenient to have support for Mongolian input encodings as well. Pointers to documentation for these code-pages will be much appreciated. \end{itemize} \section{Reporting bugs} In case you find a bug and want to report it, please follow the guidelines given in the file |bugs.txt| in the base \LaTeX{} distributions. Note that there is a category specifically for reporting any bugs that occur only when using Cyrillic fonts or support packages. \section{Miscellanea in the \textsf{T2} bundle} \label{t2m} The \textsf{T2}~bundle at |CTAN:macros/latex/contrib/supported/t2| contains some other useful files, including support for Plain \TeX{}-based macro packages, support for Bib\TeX{} and MakeIndex (see also the \textsf{xindy} program and package---highly recommended for making indices with Cyrillic), support for the \textsf{fontinst} package, mapping tables relating these Cyrillic font encodings (and input encodings) to the Unicode character names and slots (these are in the subdirectory |enc-maps|), and more! To produce documented source listings of the \textsf{T2}~package, run \LaTeX{} on the |*.dtx| and |*.fdd| files therein. When typesetting Cyrillic texts, there is a tradition of using Cyrillic letters (in some situations) within math formul\ae, in exactly the same way as most of the world uses Latin letters. By default this does not work, because symbols declared with |\DeclareTextSymbol| may not be used in math. If you need within math to `transparently' typeset glyphs declared in font encoding definition files, then you could try using the experimental \textsf{mathtext} package, which is also in the \textsf{T2}~bundle. Note that this package uses up at least one additional math alphabet per font encoding. For this and other reasons, The \LaTeX\ Project Team considers that this experimental extension to \LaTeX{}'s glyph-handling mechanisms should be used with caution; but please try it out and send us your opinions and ideas. Note that it is not included in the core of \LaTeX{} because both the coding and the interfaces are likely to change at some point in the future. Finally, here are some pointers to further information: \begin{quote} \URL{http://www.cemi.rssi.ru/cyrtug}\\ \URL{http://xtalk.price.ru/tex} \end{quote} \end{document}