\documentclass{article}
%\usepackage[english]{mem}
\addtolength{\topmargin}{-3pc}
\addtolength{\textwidth}{6pc}
\addtolength{\oddsidemargin}{-2pc}
\addtolength{\textheight}{7pc}
\def\lc{\texttt{\string<}}
\def\rc{\texttt{\string>}}
\DeclareRobustCommand{\cs}[1]{\texttt{\char`\\#1}}
\newenvironment{cmd}%
  {\par\addvspace{4.5ex plus 1ex}%
   \vskip-\parskip\small
   \renewcommand{\arraystretch}{1.2}%
   \noindent\hspace{-\leftmargini}%
   \begin{tabular}{|l|}\hline\ignorespaces}
  {\\\hline\end{tabular}\nobreak\par\nobreak
   \vspace{2.3ex}\vskip-\parskip}
\newenvironment{sample}{\small\quote}{\endquote}
\newcommand{\note}[1]{\textit{Note: #1}}
\catcode`>=11
\catcode`<=\active
\def<#1>{\textit{#1}}
\catcode`|=\active
\gdef|{\verb|\def<{|\syntx}}
\gdef\syntx#1>{\textit{#1}|}
\raggedright
\parindent1em
\nofiles
\begin{document}
\title{Mem\\ A \emph{M}ultilingual \emph{e}nviron\emph{m}ent for Lamedh/Lambda}
\author{Javier Bezos-L\'opez}
\maketitle
\note{This is a draft. Don't expect everything explained here to work, but much of it does!}

\section{Introduction}

The package \textsf{Mem} provides a language selection system for Aleph/Omega which takes advantage of the features of \LaTeXe. It provides utilities which make writing a language style quite easy and straightforward, and its aims are mainly:
\begin{itemize}
\item to provide a set of high level macros for users and developers of language styles, which ``hide'' the primitives involved and make them easier to use.
\item to coordinate different languages so that Aleph/Omega becomes a true multilingual environment.
\end{itemize}
Some of the features implemented by \textsf{Mem} are:
\begin{itemize}
\item You can switch between languages freely. You need not worry about head lines or toc and bib entries---the right language is always used.\footnote{Index entries follow a special syntax and the current version of \textsf{mem} cannot handle it. For this reason index entries remain untouched and you may use language commands in them only to some extent.} You can even take just a few commands from a language, not all of them.
\item Dialects---small language variants---are supported. For instance, American is a variant of English with another date format. Hyphenation patterns are attached to dialects, so different rules for US and British English can be used.
\item Customization is quite easy---just redefine a command of a language with |\renewcommand| when the language is in force. The new definition will be remembered, even if you switch back and forth between languages. This way, \textsf{Mem} essentially provides suggestions, while the final typographical decisions, which are a matter of style, are left to you. \note{And this idea must be further extended.}
\item A single layout can be used throughout the document, even with lots of languages. If you use a class with a certain layout you don't want to modify, you or the class can tell \textsf{mem} not to touch it at all.
\item Mem understands Unicode composite characters, at least to some extent. A file using decomposed characters (e.g., a\textasciicircum{} instead of \^{a}) will be typeset as expected.\footnote{Actually, \texttt{\string\^} is redefined to create a composite character.}
\item Integrated tools to type transliterated text, so that you can write \textit{dobryj den'} and it will be transliterated to the Cyrillic alphabet. Text can also be entered with the Unicode characters used in transliterations (e.g., Shar\d{h} Ibn `Aq\={\i}l `al\'{a} Alf\={\i}yat Ibn M\={a}lik).
\end{itemize}

\section{Quick start}

Once installed, you can use mem right away. To write a, say, German document, simply state the |german| option in |\documentclass| and load the package with |\usepackage[charset=isolat1]{mem}| (if you are using the Latin-1 encoding; if the document uses Unicode this setting is not necessary). That's all. If you are happy with that you need not go further; but if you are interested in advanced features (how to insert a Spanish date, for instance), just continue. A minimal document is sketched below.
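As a minimal sketch (the class and the body text are of course placeholders, and |charset=isolat1| is only needed for a Latin-1 source, as noted above):
\begin{verbatim}
\documentclass[german]{article}
\usepackage[charset=isolat1]{mem}
\begin{document}
Deutscher Text ...
\end{document}
\end{verbatim}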
\section{User Interface}

\subsection{Components}

A language has a series of commands and variables (counters and lengths) stored in several components. These components are:
\begin{description}
\item[names] Translations of commands for titles, etc., following the international \LaTeX{} conventions.
\item[layout] Commands and variables for the general layout of the document (|enumerate|, |itemize|, etc.)
\item[date] Logically, commands concerning dates.
\item[tools] Supplementary command definitions.
\end{description}
These components belong to two categories: names and layout are considered \emph{document} components, since they are intended for formatting the document; date and tools are considered \emph{text} components, since you will want to use them temporarily for a short (or maybe not so short) text in some language. There is a further component which is hidden from users: processes, which contains the macros used by translation processes dealing with typographical conventions and text transformations.

\subsection{Selecting a Language}

You must load languages with |\usepackage|:
\begin{sample}
|\usepackage[english,spanish]{mem}|
\end{sample}
Global options (those of |\documentclass|) will be recognized as usual. You can cancel this automatic selection with the package option |loadonly|:
\begin{sample}
|\usepackage[german,spanish,loadonly]{mem}|
\end{sample}
\begin{cmd}
|\languageset*[<options>]{<language>}|
\end{cmd}
Selects all components of <language>, except when the optional argument is present. <options> is a list of components \emph{not} to be selected, in the form |no<component>,no<component>,...|. For instance, if you dislike the tools:
\begin{sample}
|\languageset*[notools]{french}|
\end{sample}
\note{Perhaps this syntax based on |no|-forms should be changed.}
This command also sets the defaults to be used by the unstarred version (see below). Properties change the behaviour of some features of the language.
\begin{cmd}
|\begin{<language>} ... \end{<language>}|\\
|\begin{languageset}[<options>]{<language>} ... \end{languageset}|
\end{cmd}
Here <options> is a list of components and/or |no|-forms. The simplest way to use a language locally is the first environment above; the second one provides a more flexible approach. There are three possibilities:
\begin{itemize}
\item When a component is cited in its plain form (e.g., |date| or |names|), this component is selected for the current language, i.e., that of the current |\languageset|.
\item When a component is cited in its |no|-form (e.g., |nodate| or |nonames|), this component is cancelled.
\item When a component is not cited, the default, i.e., that set by the |\languageset*| in force, is used; it can be a |no|-form, and then the component will be disabled if necessary. If there is no previous starred selection, the |no|-form is presumed for all components.
\end{itemize}
If no optional argument is given, then |text,tools,date| is presumed. Thus, at most two languages are selected at the same time; this point should be stressed, because it means that you cannot use features belonging to three or more languages at the same time.
While at first glance this behaviour may seem very limiting... \note{to be filled} \note{To repeat: perhaps the way components are selected should be changed, but I have not found a simpler syntax.}

Here is an example:
\begin{sample}
|\languageset*[notools]{spanish}|\\
Now we have Spanish |names|, |date|, and |layout|,\\
but no |tools| at all.\\
|\languageset[names,nodate]{french}|\\
Now we have Spanish |layout|, French |names|,\\
but no |tools|.\\
|\languageset[tools]{german}|\\
Now we have Spanish |names|, |date|, |layout|,\\
and German |tools|.\\
\end{sample}
Selection is always local. There is also the possibility of using |languageset| as a command and |languageset*| as an environment:
\begin{sample}
|\begin{languageset}*[<options>]{<language>} ... \end{languageset}|\\
|\languageset[<options>]{<language>}|
\end{sample}
It is very unlikely that you will use the starred version at all. For the sake of clarity, spaces are ignored in the optional argument, so that you can write
\begin{sample}
|\begin{languageset}[date, tools, no text]{spanish}|
\end{sample}
\begin{cmd}
|\text<language>{<text>}|\\
|\languagetext[<options>]{<language>}{<text>}|
\end{cmd}
A short text in another language. The behaviour of some features could be different. Use these commands inside paragraphs (i.e., in horizontal mode) and |languageset| between paragraphs (i.e., in vertical mode); \verb+\languagetext+, unlike \verb|\languageset|, does not change the shape of paragraphs.
\begin{cmd}
|\languageunset|\\
|\languagereset|
\end{cmd}
You can switch all components off with |\languageunset|. Thus, you return to the original \LaTeX{} as far as \textsf{mem} is concerned.\footnote{Well, not exactly, but you should not notice it at all.} You can return to the status before the last |\languageunset| with |\languagereset|.

A typical file will look like this:
\begin{sample}
|\documentclass[german]{...}|\\
|\usepackage[spanish]{mem}|\\
|\newenvironment{spanish}%|\\
| {\begin{quote}\begin{languageset}{spanish}}%|\\
| {\end{languageset}\end{quote}}|\\
|\begin{document}|\\
|Deutscher text|\\
|\begin{spanish}|\\
| Texto en espa'nol|\\
|\end{spanish}|\\
|Deutscher text|\\
|\end{document}|\\
\end{sample}
Note that |\languageset*{german}| is not necessary, since it is selected by the package (because there is no |loadonly| package option). Note also that the environments can be redefined in terms of |languageset| (but |languageset| itself must not be modified).

\subsection{Package options}

\begin{itemize}
\item |charset=<encoding>| Sets the input encoding for the whole document.
%\item |texinput=| Sets the behaviour of
%some \LaTeX{} commands (accents, symbols, etc.)
\end{itemize}

\subsection{Properties}

Properties change the behaviour of some language features. Some properties are available in all languages, while others are specific to certain languages. The first group includes (all take a value):
\begin{itemize}
\item \texttt{charset} The input encoding, if different from the encoding of the rest of the document.
\item \texttt{rmfamily}, \texttt{sffamily}, \texttt{ttfamily} Set the font families for the language. If not given, Mem uses the script values, or else the current font.
\item \texttt{encoding} Overrides the default font encoding list in the cfg file. It can be a single value or a list enclosed in braces.
\item \texttt{hyphenation} Overrides the default hyphenation (currently, patterns should be loaded when generating a format). \note{To be implemented}
\item \texttt{script} Writing system. \note{To be implemented}
\end{itemize}
\begin{cmd}
|\languageproperties{<language>}{<properties>}|
\end{cmd}
The properties set in the preamble with this command will be added automatically whenever <language> is used.
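For instance, a sketch (the values are merely illustrative; |grtimes| is the Greek font family used in a later example):
\begin{verbatim}
% Illustrative only: read Spanish text as Latin-1, and typeset
% Greek text with the grtimes roman family.
\languageproperties{spanish}{charset=isolat1}
\languageproperties{greek}{rmfamily=grtimes}
\end{verbatim}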
\subsection{Tools}

\begin{cmd}
|\languagename|
\end{cmd}
The name of the current language. You must not redefine this command in a document.
\begin{cmd}
|\languagelist|
\end{cmd}
Provides a list of the requested languages.
\begin{cmd}
|\allowhyphens|
\end{cmd}
Allows further hyphenation in words containing the primitive |\accent|.
\begin{cmd}
|\nofiles|
\end{cmd}
Not really a new command, but it has been reimplemented to optimize some internal macros related to file writing.
\begin{cmd}
|\languageensure|
\end{cmd}
The \textsf{mem} package internally modifies some \LaTeX{} commands in order to do its best to make sure that the current language is used in a head/foot line, even if the page is shipped out while another language is in force. Take for instance
\begin{sample}
(With |\languageset*{spanish}|)\\
|\section{Sobre la confecci'on de t'itulos}|
\end{sample}
In this case, if the page is broken inside, say, a German text, an implicit |\languageensure| restores |spanish| and hence the accents. Yet, some non-standard classes or packages can modify the marks. Most of the time (but not always) |\languageensure| solves the problem:
\begin{sample}
(With |\languageset*{spanish}|)\\
|\section{\languageensure Sobre la confecci'on de t'itulos}|
\end{sample}
In typesetting, writing and other modes it is ignored.
\begin{cmd}
|\unichar{<number>}|\\
|\unitext{<text>}|\\
|\utftext{<text>}|
\end{cmd}
Conversion tools. The first one produces the character with Unicode position <number>. The argument of the second and third macros is text to be preserved in the two-byte Unicode encoding, or transcoded from UTF-8, respectively. Ligatures are preserved with \verb|\unichar|, \verb|\unitext| and \verb|\utfstring|, but not with \verb|\utftext|. \note{See yatest.tex. The names must be cleaned up.}

\subsection{Scripts}

Writing systems are automatically handled by mem, so you should not be too concerned with them. But there are some interesting points to note:
\begin{itemize}
\item If the document uses a single encoding, no matter whether it is Unicode or, say, ISO 8859-6 (Latin/Arabic), mem will do its best to switch the language automatically when the script changes. The order in which the languages are loaded in |\usepackage| is important, because the last language loaded using a certain script will be the default language for that script, used when an untagged change of script comes along. \note{How should that be implemented? Is it possible?}
\item Fonts can be attached to a certain script. Mem stores the current font attributes and reselects the font (automatically selecting the font encoding) when the language changes, if desired. For example,
\begin{verbatim}
\scriptproperties{El}{rmfamily = grtimes}
\end{verbatim}
\note{Still under study and development}
\end{itemize}

\section{Developer Commands}

Some command names could seem inconsistent with those of the user commands. In particular, when you refer to a language in a document you are in fact referring to a dialect, which belongs to a language. As a user, you cannot access a language; you instead access a dialect named like the language. From now on, when we say ``current language'' we mean ``current language or dialect.'' For more details on dialects, see below.

\subsection{General}

\begin{cmd}
|\DeclareLanguage{<language>}|
\end{cmd}
The first command in the |.ld| file must be this one. Files don't always have the same name as the language, so this command makes things work, as in the sketch below.
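A minimal sketch of how a language file might begin (the names are purely illustrative; as described in the Configuration section, file names use the three-letter ISO-639 code, while the language is declared under the name used in documents):
\begin{verbatim}
% A hypothetical file spa.ld
\ProvidesFile{spa.ld}
\DeclareLanguage{spanish}
\end{verbatim}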
\begin{cmd}
|\DeclareLanguageCommand{<command>}{<component>}[<num>][<default>]{<definition>}|\\
|\DeclareLanguageCommand*{<command>}{<component>}[<num>][<default>]{<definition>}|
\end{cmd}
Stores a command in a component of the current language. The starred version makes sure that the UTF-8 encoding is used, thus overriding the document encoding. The definition will be activated when the component is selected, and the old definition, if any, will be stored for later recovery when the component is switched off. There is a point to note (which also applies to the next commands). Consider the following code:
\begin{sample}
In |spanish.ld|:\\
|\DeclareLanguageCommand*{\partname}{names}{Parte}|\\
| |\\
In the document:\\
|\languageset*{spanish}|\\
|\renewcommand{\partname}{Libro}|\\
|\languageset*{english}|\\
|\partname|
\end{sample}
Obviously, at this point |\partname| is `Part'. But if you continue with
\begin{sample}
|\languageset*{spanish}|\\
|\partname|
\end{sample}
Surprise! \emph{Your} value of |\partname|, i.e., `Libro', is recovered. So you can easily customize these macros in your document, even if you switch back and forth between languages.
\begin{cmd}
|\SetLanguageVariable{<variable>}{<component>}{<value>}|
\end{cmd}
Here <variable> stands for the internal name of a counter or a length as defined by |\newcounter| (|\c@...|) or |\newlength|. The variable must already be defined. When the component is selected, the new value will be assigned to the variable, and the old one will be stored.
\begin{cmd}
|\SetLanguageCode{<code>}{<component>}{<char>}{<value>}|
\end{cmd}
Similar to |\SetLanguageVariable| but for codes. For instance:
\begin{sample}
|\SetLanguageCode{\sfcode}{text}{`.}{1000}|
\end{sample}
% \verb|\mem@frenchspacing| does that for you.
Languages with |\frenchspacing| should set the |\sfcode|s with this command, so that a change with |\nonfrenchspacing| is recovered after a switch.
\begin{cmd}
|\UpdateSpecial{<char>}|
\end{cmd}
Updates |\dospecials| and |\@sanitize|. It first removes <char> from both lists; then it adds it if its category code is other than `other' or `letter'. With this method we avoid duplicate entries, as well as removing a character which is usually special (for instance |~|).

\subsection{Components}

\begin{cmd}
|\DeclareLanguageComponent{<component>}|\\
|\DeclareLanguageComponent*{<component>}|
\end{cmd}
Adds a new component. With the starred version, the component will be considered a |text| component, and hence included in the defaults of |\languageset|. Component names cannot begin with |no|, because of the |no|-form convention given above.

\subsection{Translation processes}

In the context of Mem, OCP's/OTP's become \textit{processes}. However, a single conceptual process can be split into several OCP files when it requires more than one step. There are several levels of processes, each of which performs a specific task. The order in which processes are applied, and their names, are determined by the following commands.
\begin{cmd}
|\DeclareLanguageProcess{<process>}{<level>}|
\end{cmd}
Declares a slot where ocp's can be added. You won't use this command very often, except if the four basic components---namely, charset, input, text, and font---don't fit your needs. \note{A more ``abstract'' syntax could replace the numeric <level> by the name of an existing process, after which the new one would be added.}
\begin{cmd}
|\AddLanguageProcess{<process>}{<ocp-list>}|
\end{cmd}
Adds the stated ocp's to the given slot for the current language.
\begin{cmd}
|\UseLanguageProcess|
\end{cmd}
Activates the ocp's corresponding to the current language, including those declared with \verb|\DeclareLanguageProcess| but excluding the generic processes described below.
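As a rough sketch of how a language file might use these commands (the ocp name \texttt{frpunct} is invented here, and the argument layout is the one shown above):
\begin{verbatim}
% In a hypothetical french.ld file: attach an ocp implementing
% French punctuation spacing to the predefined text slot.
\AddLanguageProcess{text}{frpunct}
\end{verbatim}
French punctuation spacing is precisely the kind of task mentioned for the \texttt{text} level in the list below.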
\begin{cmd}
|\DeclareMemProcess{<process>}{<level>}|
\end{cmd}
Translation processes not attached to languages, but used as generic tools. There are several processes predefined in Mem, some of them in the kernel and some others in the script definition files:
\begin{description}
\item[charset] (200) Converts the input text to Unicode (language).
\item[unicode] (400) Applies Unicode transformations if necessary, such as normalization (language). \note{Which one should be used is still under study---composed or decomposed?}
\item[transcript] (600) Transliterates from one script to another, for example with \texttt{charset=isolat1, input=latin} for Cyrillic (language). \note{Is this the right order?}
\item[input] (800) Input conventions like the \TeX{} pseudo-ligatures such as \verb#---# (language).
\item[case] (1000) Case changes and similar transformations within a script (like Japanese katakana/hiragana) (Mem).
\item[text] (1000) Language dependent processes to follow typographical conventions (letter variants in Greek, spacing with punctuation marks in French, contextual forms in Arabic, etc.) (language).
\item[font] (1200) Transcoding to the target font and faking missing characters (like accented letters) (Mem).
\end{description}
\begin{cmd}
|\AddMemProcess{<level>}{<ocp>}|
\end{cmd}
Defines the stated ocp and creates a process with level <level> and the same name as the ocp file.
\begin{cmd}
|\AddMemProcess{<level>}[<name>]{<ocp-list>}|
\end{cmd}
Defines the stated ocp's and creates a process with level <level> and the name <name>. To be used when the process consists of several ocp files.
\begin{cmd}
|\UseMemProcess{<name>}|
\end{cmd}
Activates the translation process corresponding to the given <name> ($=$ the ocp name, unless the optional argument was used).

\subsection{Scripts}

Every language has at least one attached writing system, which it is written in; information for scripts is generic to the languages using them (for instance, what |\guillemetright| or |\'| means). Each script has an associated file with the extension |.sd|, named with the two-letter codes from the ISO 15924 standard (lowercased; in the arguments you should use the mixed case of the standard).
\begin{cmd}
|\SetLanguageScript{<script>}|
\end{cmd}
Every language file should contain a command like this. It loads the macros corresponding to the language (diacriticals, punctuation, etc.) and performs some additional tasks. The following macros can be used in the |.sd| files.
\begin{cmd}
|\DeclareScript{<script>}|
\end{cmd}
Sets the script up.
\begin{cmd}
|\DeclareScriptCommand{<command>}{<definition>}|
\end{cmd}
Declares a macro whose definition will be in force with a certain script. |\DeclareScriptCommand| is essentially a disguise for |\DeclareTextCommand|, because the internal handling of script macros is essentially the same as that of \LaTeXe{} font encodings. For example, |la.sd| (Latin) contains
\begin{verbatim}
\DeclareScriptCommand{\~}[1]{#1\unichar{"0303}}
\end{verbatim}
while |el.sd| (Greek) contains
\begin{verbatim}
\DeclareScriptCommand{\~}[1]{#1\unichar{"0342}}
\end{verbatim}
\begin{cmd}
|\SwitchScript{<script>}|
\end{cmd}
You should not use this macro. It is inserted automatically by Aleph/Omega when it thinks that a change of script is necessary.
\begin{cmd}
|\UseMemAccent{<accent>}{<slot>}{<char>}|
\end{cmd}
Places the accent in <slot> over <char>. The <accent> parameter is just a reminder of the accent command (|\'|, |\"|, etc.). This command is intended to be used in otp files, and only to give support to fonts not compliant with mem (namely, 8t, 8r, 7t, etc.\ fonts).

\subsection{Dates}

\begin{cmd}
|\DeclareDateFunction{<function>}{<definition>}|\\
|\DeclareDateFunctionDefault{<function>}{<definition>}|\\
|\DeclareDateCommand{<command>}{<definition>}|
\end{cmd}
By means of |\DeclareDateCommand| you can define commands like |\today|.
The good news is that a special syntax is allowed in <definition>: date functions called with \lc<function>\rc. Here \lc<function>\rc{} stands for the definition given by |\DeclareDateFunction| for the current language; if no such function is given for the language, the definition from |\DeclareDateFunctionDefault| is used. See |english.ld| for a very illustrative example. (The \lc{} and \rc{} are actual less-than and greater-than signs.)

Predeclared functions (with |\DeclareDateFunctionDefault|) are:
\begin{itemize}
\item \lc|d|\rc{} one- or two-digit day: 1, 2, \dots, 30, 31.
\item \lc|dd|\rc{} two-digit day: 01, 02, \dots
\item \lc|m|\rc{} one- or two-digit month.
\item \lc|mm|\rc{} two-digit month.
\item \lc|yy|\rc{} two-digit year: 96, 97, 98, \dots
\item \lc|yyyy|\rc{} four-digit year: 1996, 1997, 1998, \dots
\end{itemize}
Functions which are not predeclared, and hence should be declared by the |.ld| file, are:
\begin{itemize}
\item \lc|www|\rc{} short weekday: mon., tue., wed., \dots
\item \lc|wwww|\rc{} weekday in full: Monday, Tuesday, \dots
\item \lc|mmm|\rc{} short month: jan., feb., mar., \dots
\item \lc|mmmm|\rc{} month in full: January, February, \dots
\end{itemize}
The counter |\weekday| (also |\value{weekday}|) gives a number between 1 and 7 for Sunday, Monday, etc. For instance:
\begin{sample}
|\DeclareDateFunction{wwww}{\ifcase\weekday\or Sunday\or Monday\or|\\
| Tuesday\or Wednesday\or Thursday\or Friday\or Saturday\fi}|\\
|\DeclareDateCommand{\weektoday}{|\lc|wwww|\rc |, |\lc|mmmm|\rc| |\lc|dd|\rc| |\lc|yyyy|\rc|}|
\end{sample}

\subsection{Dialects}

As stated above, |\languageset| accesses dialects rather than languages. |\DeclareLanguage| declares both a language and a dialect with the same name, and selects the actual language.
\begin{cmd}
|\DeclareDialect{<dialect>}|\\
|\SetDialect{<dialect>}|\\
|\SetLanguage{<language>}|
\end{cmd}
|\DeclareDialect| declares a dialect, which incorporates all the declarations of the current actual language. With |\SetDialect| you set the dialect, so that new declarations will belong only to that dialect; |\DeclareDialect| just declares it but does not set it. A dialect with the same name as the language is always implicit, and you can handle it exactly like any other dialect. In other words, after setting a dialect, new declarations belong only to it. If you want to return to the actual language, so that new declarations will be shared by all of its dialects, use |\SetLanguage|. Note that commands and variables declared for a language are set by |\languageset| before those of dialects, no matter the order in which you declared them. For example:
\begin{sample}
|\DeclareLanguage{english}|\\
|\DeclareDialect{american}|\\
Declarations\\
|\SetDialect{english}|\\
|\DeclareDateCommand{\today}{...}|\\
|\SetDialect{american}|\\
|\DeclareDateCommand{\today}{...}|\\
|\SetLanguage{english}|\\
More declarations shared by both |english| and |american|
\end{sample}

\subsection{Interaction with Classes}

\begin{cmd}
|\mem@no<component>|\\
(i.e.\ |\mem@nonames|, |\mem@nolayout|, |\mem@notext|, etc.)
\end{cmd}
Initially, these commands are not defined, but if they are, the corresponding components are not loaded. This mechanism is intended for classes designed for a certain publication, with a very concrete layout which we don't want to be changed. You simply write in the class file
\begin{sample}
|\newcommand{\mem@nolayout}{}|
\end{sample}
Note that with this procedure the component is never loaded at all---if you select it, nothing happens.

\section{Configuration}

The way languages and scripts are referred to inside mem is highly customizable.
You can refer to a language using the name in its own language (and even in its local script), its English name, or even the name in your own language. For that to be accomplished, a set of configuration files is provided. \note{Currently, only the files with English names are provided.} \note{To be implemented. However, I think that this feature leads to unmanageable configurations; I think that it should be restricted to English and local names.}
\begin{cmd}
|mem.cfg|
\end{cmd}
The languages in English. Every line must contain three fields:
\begin{description}
\item[name] The name to be used in the document; the name of the language/dialect as used in the |.ld| file. \note{A solution to that could be to introduce the possibility of defining ``synonyms'' in the document.}
\item[file] The file name, which uses the three-letter codes from ISO-639 and the extension |.ld|.
\item[patterns] The hyphenation patterns to be used with the language/dialect.
\end{description}

\section{Customization}

``Well, but I dislike |spanish.ld|.'' You can easily customize a language once it is loaded, with new commands or by redefining the existing ones.
\begin{itemize}
\item If you want to redefine a language command, simply select the component of the language which defines it (as with |\languageset*|) and then redefine it with |\renewcommand|.
\item If you want to define a new command for a language, first make sure no language is selected (for instance, with |\languageunset|). Then set the language with |\SetLanguage| and use the declaration commands provided by the package and described above.
\end{itemize}
A further step is creating a new file: by copying an existing one, modifying the commands and, of course, renaming the file! Or with a file with the extension |.ld| such as:
\begin{sample}
|\ProvidesFile{...ld}|\\
|\input{...ld} | the file you want to customize\\
|\languageset*{...}|\\
Commands to be redefined\\
|\languageunset|\\
|\SetLanguage{...}|\\
Commands to be created
\end{sample}
You can also modify languages by means of a package; |spguill| is an example.

\section{Errors}

\begin{cmd}
|Unknown component|
\end{cmd}
You are trying to assign a command to a component that does not exist. Perhaps you have misspelled it.
\begin{cmd}
|Missing language file|
\end{cmd}
Probable causes:
\begin{itemize}
\item Wrong configuration.
\item The corresponding |.ld| file is missing or misplaced.
\end{itemize}
\begin{cmd}
|Unknown language|
\end{cmd}
You forgot to request it. Note that dialects stand apart from languages; i.e., you have no access to |austrian| just by requesting |german|.
\begin{cmd}
|Invalid option/property skipped|
\end{cmd}
In the starred version of |\languageset| you must use the |no|-forms only.
\begin{cmd}
|Bug found (<number>)|
\end{cmd}
You will only find this error when using the developer commands---never with the user ones---or if there is a bug in description files written by others. In the latter case, contact the author. The meaning of <number> is:
\begin{enumerate}
\item \emph{Unknown component in declaration.} The component hasn't been declared. Perhaps you misspelled it.
\item \emph{Invalid component name.} Component names cannot begin with |no|, to avoid mistakes when disabling a component.
\item \emph{Declaration clash.} You are trying to redeclare a command or to set a new value for a variable of this language. If you want to redefine it, select the language and simply use |\renewcommand| (or |\set..|). If you intend to define a new command for the language, sorry, you must change its name.
\item \emph{Invalid language/dialect setting.} Generated by |\SetLanguage| or |\SetDialect| when the argument is not a declared language/dialect.
\end{enumerate}
Now we describe how mem generates \TeX{} and \LaTeX{} errors because of intrinsic syntax problems.
\begin{cmd}
|TeX capacity exceeded, sorry [save_size=<number>]|
\end{cmd}
You are using too many ungrouped language switches. Fix: use the environment version of |languageset|, the language environments, or alternatively use |\languageunset| before a new |\languageset|.

\section{Miscellaneous}

This section is devoted to some miscellaneous topics which will be put in the right context once this documentation is more complete.

\subsection{Mathematics}

Aleph/Omega extends the possibilities of math fonts by enlarging the range of possible glyphs to 65,535. However, currently OCP's are not applied in math mode ... [to be filled] \textsf{Mem} redefines \verb|\DeclareMathSymbol| to accept large values (i.e., to use \verb|\omathcode| and \verb|\omathchardef|).

\subsection{Accents}

\textsf{Mem} understands both composed and decomposed diacritical marks (at least in the simplest cases) and can normalize them so that they are properly displayed---this way you can write \verb+\'{\c{\"{a}}}+ and the accents are rearranged and stacked. Unicode allows two ways to represent accented letters: either composed (i.e., \textit{\.g} is a single character whose code is U+0121) or decomposed (i.e., \textit{\.g} is two characters, \textit{g} followed by \.{} with code U+0307). In order to get an internal representation as close to Unicode as possible, the accent commands are redefined as in the next example:
\begin{verbatim}
\DeclareScriptCommand\`[1]{#1\unichar{"0300}}
\end{verbatim}
so that the text can be handled uniformly. After normalizing to the composite form, \textsf{Mem} leaves it to the font encoding process to decide whether the character exists in the font or whether it should be decomposed and then faked with the help of \verb+\accent+ and related commands.

\subsection{Encodings}

Currently, the following encodings are supported:
\begin{itemize}
\item T1/TS1. Almost complete, but still with some parts missing.
\item OT1/TS1. Incomplete.
\item ULA. Omega Unicode-like for Latin.
\item UEL. Omega Unicode-like for Greek.
\item UCY. Omega Unicode-like for Cyrillic.
\item UAR. Omega Arabic. It uses cuni2oar, which apparently mixes contextual analysis and font encoding---to be investigated. \note{The previous encodings are incomplete and very problematic. Together, they are something like a UT1 for Latin/Greek/Cyrillic, but with other names, for two reasons: UT1/omlgc is a sort of ``modified Unicode'' and therefore not Unicode at all; and ut1cmr exists but points to an OT1 encoded font, while ot1omlgc points to a UT1 encoded font (!). Further, should a single encoding contain the full Unicode set?}
\item T2A/TS1. Incomplete.
\end{itemize}
\note{Currently, font processes point directly to the target glyph. Another possibility is to load the \LaTeX{} encoding and then use the LICR name, which in turn has the glyph code (which is, in fact, the procedure described below). Pros of the latter: better control from within \TeX; cons: font encodings have to be preloaded.
I think it would be a nice thing if people did not have to be concerned with font encodings (and, if possible, we should minimize input encodings, perhaps giving a platform/language default so that [linux, czech] would be enough?)}

\subsection{Private Use Area}

\textsf{Mem} uses the first page of the Private Use Area for special purposes, in the following way:
\begin{itemize}
\item \verb|\uE000-\uE00F| are reserved for characters always having catcodes in the range 0--15. Actually, some of them do not make sense (e.g., 11 and 12).
\item \verb|\uE020-\uE0FF| duplicates the ASCII range, but making sure the characters are not special.
\end{itemize}

\subsection{The original idea}

% This will be done is such a way that you may still use
% non-Lambda styles, because you may switch off the internal
% modifications (provided, of course, the non-Lambda styles
% can switch off their modifications). So, you will be able
% to say \languageunset, then swith to other language, and
% switch back to Lambda with, say, \languageset{spanish}.
% Small pieces of text are inserted with the help of
% \languagetext which is currently essentially the same as
% \languageset except that in a future it could handle writing
% direction in a somewhat different fashion.
This section is devoted to a few ideas which I put forward on the \LaTeX3 list, and which were followed by a very long discussion about a multilingual (or, more exactly, multiscript) model for \LaTeX. These ideas led to the introduction of the concept of the LICR (\LaTeX{} internal character representation). Actually, \LaTeX{} has for a long time had a rigorous concept of an internal representation, but it was only at this stage that it got publicly named as such and its importance realised.\footnote{Chris Rowley, ``Re(2): [Omega] Three threads'', e-mail to the Omega list, 2002/11/04. I have recently discovered an article by Robin Fairbairns anticipating some of the ideas in Mem (``Omega -- why bother with Unicode'', \textit{TUGboat} 16/3, 1995), such as the clear separation of the functions of ocp's, which has been applied, for example, to \textsf{devnag} after \textsf{Mem} (by then named \textsf{Lambda}) was presented in Tokyo in 2001.} The reader can find more on the LICR in the second edition of \emph{The \LaTeX{} Companion}, by Frank Mittelbach and others (section 7.11.2).

Let's explain how \TeX{} handles non-ASCII characters. \TeX{} can read Unicode files, as \textsf{xmltex} demonstrates, but non-ASCII chars cannot be represented internally by \TeX{} this way. Instead, it uses macros which are generated by \textsf{inputenc}, and which are expanded in turn into a true character or a \TeX{} macro by \textsf{fontenc}:
\begin{center}
\'e --- inputenc $\to$ \verb|\'{e}| --- fontenc $\to$ \verb|^^e9|
\end{center}
That's true even for Cyrillic, Arabic, etc., characters! Omega can represent non-ASCII chars internally, and therefore actual chars are used instead of macros (with a few exceptions). Trivial as it may seem, this is in fact a \textit{huge} difference. For example, the path followed by \'e will be:
\begin{center}
\begin{tabular}{rcccl}
\'e --- an encoding ocp && && T1 font ocp $\to$ \verb|^^e9|\\
& $\searrow$ && $\nearrow$\\
&& U+00E9 \\
& $\nearrow$ && $\searrow$\\
\verb|\'e| --- fontenc (!) && && OT1 font ocp $\to$ \verb|\OT1\'{e}|
\end{tabular}
\end{center}
It's interesting to note that fontenc is used as a sort of input method!
For that to be accomplished with ocp's, we must note that we can divide them into two groups: those generating Unicode from an arbitrary input, and those rendering the resulting Unicode using suitable (or maybe just available) fonts. The Unicode text may then be analyzed and transformed by external ocp's at the right place. \textsf{Mem} further divides these two groups into four (to repeat, these proposals are liable to change):
\begin{enumerate}
\item[1a)] charset: converts the source text to Unicode.
\item[1b)] input: sets input conventions. Keyboards have a limited number of keys, and hands a limited number of fingers. The goal of this group is to provide an easy way to enter Unicode chars using the most basic keys of the keyboard (which means ASCII chars on Latin keyboards). Examples could be:
\begin{itemize}
\item \verb|---| $\to$ em-dash (a well known \TeX{} input convention).
\item ij $\to$ U+0133 (in Dutch).
\item no $\to$ U+306E [the corresponding hiragana char]
\end{itemize}
\end{enumerate}
Now we have the Unicode (with \TeX{} tags) memory representation, which has to be rendered:
\begin{enumerate}
\item[2a)] writing: contextual analysis, ligatures, spaced punctuation marks, and so on.
\item[2b)] font: conversion from Unicode to the local font encoding or the appropriate \TeX{} macros (if the character is not available in the font).
\end{enumerate}
Since before step 2 we have a Unicode representation, we can process the text with external tools compatible with Unicode (using \verb|\externalocp|; an interface to this feature must be added in the near future). This would be useful for, say, Thai word boundaries. This scheme fits well with the Unicode Design Principles, which state that Unicode deals with memory representation and not with text rendering or fonts (which is left to ``appropriate standards''). There are some additional processes for ``shape'' changes (case, script variants, etc.).
% \SetLanguageProcess{input}{texinput,ndlinput}
%
% Two files for input conventions:
%
% - texinput provides ---, --, etc.
%
% - ndlinput provides ij => U+0133

\subsection{MTP files}

\textsf{Mem} has a new kind of OTP file named MTP. It extends the OCP syntax, by means of a preprocessor written in Python, to provide the following features:
\begin{enumerate}
\item Special characters are mapped to the Private Use Area in order to have characters with the right catcode in verbatim. \textsf{Mem} sets these catcodes accordingly.
\item Characters can be inserted by their Unicode name enclosed in brackets, like \verb|[COMBINING CARON]|, which is a lot more readable than \verb|@"030C|.
\end{enumerate}
This little tool, whose code is somewhat simple-minded, will be extended to allow UTF-8 characters. Binaries for Windows will be created, too, but a conversion to C would be nice, I think; Unix and Linux very often have a Python interpreter built in.

\subsection{Verbatim}

Verbatim text with OCP's is a nuisance because, unlike macro replacement, it does not save the catcodes of characters. [---To be filled---] \note{Verbatim sometimes makes Aleph/Omega enter an infinite loop, but so far I have not discovered why. Unfortunately, OTP's even recatcode letters, so that something like \texttt{\string\string\^{}} does not work as expected---\^{} is recatcoded to `math superscript'!}
\subsection{Extensions to Unicode}

The Latin script has a rich typographical history, which cannot always be reduced to the dual system character/glyph. As Jacques Andr\'e has pointed out, ``Glyphs or not, characters or not, types belong to a class that is not recognized as such.''\footnote{``The Cassetin Project,'' \textit{Proceedings of the Fourteenth Euro\TeX{} Conference,} Brest, 2003.} Being a typesetting system, neither Aleph nor Mem can ignore this reality, and therefore we will take into account projects like the Medieval Unicode Font Initiative (MUFI) or the Cassetin Project. However, this does not mean that Unicode mechanisms will be rejected when available. For example, ligatures can be created with the \textsc{zero width joiner}. If there is an established method to carry out a certain task in Unicode, it will be emulated, like, for example, glyph variant selectors. \note{Really??}
% Glyphs not available in Unicode will be mapped following its
% recommendations and so, for example, dotless j will be placed in slot
% U+E55C.

\section{Final remarks}

\note{There are some areas which I have not studied in depth, particularly writing directions. I have some ideas, but they must still be worked out.}
\end{document}