% $Id: tex4ht-mathjax.tex 1598 2024-11-20 11:40:35Z michal_h21 $
% compile: latex tex4ht-mathjax
%
% Copyright 2018-2024 TeX Users Group
% Released under LPPL 1.3c+.
% See tex4ht-cpright.tex for license text.

\ifx \HTML\UnDef
   \def\HTML{mathjax-latex-4ht}
   \def\CONFIG{\jobname}
   \def\MAKETITLE{\author{Michal Hoftich}}
   \def\next{\input mktex4ht.4ht \endinput}
   \expandafter\next
\fi

\input{common-code}
\input{common}
\input{tex4ht-cpright}

\<<<
% mathjax-latex-4ht.4ht (|version), generated from |jobname.tex
% Copyright 2018-2024 TeX Users Group
|
|
|
\AtBeginDocument{%
|
|
|
}
\endinput
>>>

We don't require additional packages anymore, as everything necessary is now
included in the LaTeX kernel.

% \RequirePackage{etoolbox,expl3,environ}
\<<<
>>>

The \verb|\alteqtoks| command writes the LaTeX math code verbatim to the
output document.

We use \verb|\detokenize| to output verbatim LaTeX code to the HTML document.
A side effect is that \verb|\detokenize| inserts a space after each control
sequence. This is completely valid TeX code, but earlier versions of MathJax
didn't accept it, and rendering resulted in an error. Fortunately, MathJax 3
supports these spaces, so we could remove the regular expressions for space
handling. The original code was this:

% % convert \ { to \:{
% \regex_replace_all:nnN { \x{5C} \x{20} \x{7B} } { \x{5C} \x{3A} \x{7B} } \l_tmpa_tl
% % delete spaces before left brackets
% \regex_replace_all:nnN { \x{20} \x{7B} } { \x{7B} } \l_tmpa_tl
% % replace \\:{ back to \\ { -- this can be introduced by the previous regex
% \regex_replace_all:nnN { \x{5C} \x{5C} \x{3A} \x{7B} } { \x{5C} \x{5C} \x{20} \x{7B} } \l_tmpa_tl

In July 2024, while researching another issue, I found that it is actually
possible to avoid these extra spaces using LaTeX 3 commands. The inspiration
comes from \Link[https://tex.stackexchange.com/a/44444/2891]this answer by Bruno Le Floch\EndLink.
One remaining issue is that newlines are not preserved.
But they weren't preserved with \verb|\detokenize| either, so it shouldn't be
a problem.

In September 2024 I found that the LaTeX 3 method of preserving spaces didn't
work correctly for code like this: \verb|$\int A f(x)\;dx$| -- the space
between \verb|\int| and \verb|A| was removed, so the result read as the command
\verb|\intA|. We are therefore moving back to \verb|\detokenize|. This is the
code that looked promising, but didn't work in the end:

\begin{verbatim}
  % save tokens, but preserve spaces
  % https://tex.stackexchange.com/a/44444/2891
  \tl_set:Nn \l_tmpa_tl {#1}
  \regex_replace_all:nnN { . } { \c{string} \0 } \l_tmpa_tl
  \tl_set:Nx \l_tmpa_tl { \l_tmpa_tl }
\end{verbatim}

We still use regular expressions to escape characters that are invalid in XML
as entities, so this works only with LaTeX. However, regular expressions may be
slow if the document contains a large amount of math, and they may fail if the
contents of any math environment are larger than 64 KB. We therefore provide a
``fastmathjax'' option, which doesn't do any post-processing of the math
contents. It should result in a 60--70\% speed-up. The downside is that if the
math contains characters that are invalid in XML mode, such as ampersand,
less-than and greater-than, the make4ht XML DOM filters will fail.
Fortunately, make4ht will then switch to the HTML parser mode, so the DOM
filters will still be executed.

\<<<
\ExplSyntaxOn
\:CheckOption{fastmathjax}\if:Option
|
\else
|
\fi
\ExplSyntaxOff
>>>

Output verbatim math to HTML without any post-processing. We still need to set
the token list to prevent some issues. For example, if the processed math
contains paragraphs, we would get an error, because \''\HCode' doesn't accept
them.

\<<<
\DeclareRobustCommand\alteqtoks[1]{%
  \tl_set:Ne \l_tmpa_tl {\detokenize{#1}}
  \HCode{\l_tmpa_tl}
}
>>>

Escape illegal XML characters.
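
As a rough illustration of what the escaping below is meant to achieve (a
sketch derived from the replacement rules, not captured output, and assuming
that inline math is routed through \verb|\AltMathOne|), a formula written in
the document as

\begin{verbatim}
$a < b$
\end{verbatim}

should reach the HTML file approximately as

\begin{verbatim}
\(a &lt; b\)
\end{verbatim}

so the page stays well-formed XML, while the browser decodes the entity again
before MathJax processes the formula.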
\<<<
\cs_new_protected:Npn \alteqtoks #1
{
  \tl_set:Ne \l_tmpa_tl {\detokenize{#1}}
  %
  % replace < > and & with xml entities
  \regex_replace_all:nnN { \x{26} } { &amp; } \l_tmpa_tl
  \regex_replace_all:nnN { \x{3C} } { &lt; } \l_tmpa_tl
  \regex_replace_all:nnN { \x{3E} } { &gt; } \l_tmpa_tl
  % replace \par command with blank lines
  \regex_replace_all:nnN { \x{5C}par\b } {\x{A}\x{A}} \l_tmpa_tl
  \tl_set:Ne \l_tmpb_tl{ \l_tmpa_tl }
  \HCode{\l_tmpb_tl}
}
>>>

Provide configuration for MathJax.

\<<<
\NewConfigure{MathJaxConfig}{1}
>>>

The MathJaxMacros configuration takes a file with TeX macro definitions and
includes its contents at the beginning of each HTML page. MathJax then
interprets them, so the macros are available in math. This is an alternative
to the JavaScript method using MathJaxConfig. The file must not contain any
commands other than \verb|\newcommand|, and all characters that could cause
issues in HTML parsing must be escaped, so \verb|<| should become
\verb|&lt;|, for example.

\<<<
\NewConfigure{MathJaxMacros}[1]{%
  \Configure{@BODY}{\bgroup\NoFonts\ttfamily\detokenize{\(}%
    \special{t4ht*<#1}%
    \detokenize{\)}\EndNoFonts\egroup}%
}
>>>

The following commands are used for patching the standard math commands and
environments. They then keep the LaTeX code unchanged.

\<<<
\long\def\AltlMath#1\){\expandafter\alteqtoks{\(#1\)}\)}
\long\def\AltlDisplay#1\]{\alteqtoks{\[#1\]}\]}
\long\def\AltMathOne#1${\alteqtoks{\(#1\)}$}
% this seems a bit hacky -- we need to skip some code inserted at the
% beginning of each display math
\long\def\AltlDisplayDollars#1$${\alteqtoks{\[#1\]}$$}
\NewConfigure{MathjaxEnv}{2}
\newcommand\VerbMathToks[2]{%
  \a:MathjaxEnv%
  \HCode{\string\begin{#2}}%
  \alteqtoks{#1}%
  \HCode{\string\end{#2}}%
  \b:MathjaxEnv%
}
>>>

In pictures, we want to use the original version of the math environments.
Because we use an aggressive verbatim method for the environments, we cannot
use the usual \verb'\HLet' method.
Instead, we save the original meaning, and the user needs to restore it
manually, using the \verb'\RestoreMathJaxEnvironments' command.

\<<<
\ExplSyntaxOn
\seq_new:N\:savedmathjaxenvs
\newcommand\:savemathjaxenv[1]{%
  \seq_gput_right:Nn\:savedmathjaxenvs{#1}
  \expandafter\let\csname mathjax-#1\expandafter\endcsname\csname #1\endcsname%
  \expandafter\let\csname mathjax-end#1\expandafter\endcsname\csname end#1\endcsname%
}
% we must not reintroduce the matrix environment in TikZ, because it interferes with the \matrix command
\newcommand\:ignoretikzmatrix{\seq_remove_all:Nn\:savedmathjaxenvs{matrix}}
\newcommand\RestoreMathJaxEnvironment[1]{%
  \expandafter\let\csname #1\expandafter\endcsname\csname mathjax-#1\endcsname%
  \expandafter\let\csname end#1\expandafter\endcsname\csname mathjax-end#1\endcsname%
}
\newcommand\RestoreMathJaxEnvironments{%
  \seq_map_function:NN\:savedmathjaxenvs\RestoreMathJaxEnvironment%
}
\ExplSyntaxOff
>>>

The \verb|\VerbMath| command redefines environments to pass their content
verbatim to the HTML output. The optional argument can contain the name of the
counter that should be updated. When it is given, LaTeX 3 regular expressions
are also used to search for \verb|\label| commands, so cross-references to the
math environments should work. This is somewhat limited; for example,
subequations cannot work. There can also be issues in books, where equation
numbers are based on chapters, but MathJax numbers them consecutively. In
these cases, other mechanisms may be necessary.
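
A minimal usage sketch, assuming two hypothetical environments,
\texttt{myalign} and \texttt{mynumbered}, that are already defined when
\verb|\VerbMath| is called and that MathJax itself is able to render:

\begin{verbatim}
% pass the body of the (hypothetical) myalign environment verbatim to HTML
\VerbMath{myalign}
% same for the (hypothetical) mynumbered environment, but also step the
% equation counter and pick up a \label found in the body
\VerbMath[equation]{mynumbered}
\end{verbatim}

Because the definition below first checks that the environment exists, a call
for an undefined environment is silently ignored.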
\<<<
\ExplSyntaxOn
\cs_generate_variant:Nn \regex_extract_once:nnNTF {nV}
\newcommand\VerbMath[2][]{%
  \cs_if_exist:cTF{#2}{
    \:savemathjaxenv{#2}%
    \RenewDocumentEnvironment{#2}{+!b}{%
      \NoFonts\expandafter\VerbMathToks\expandafter{\detokenize{##1}}{#2}\EndNoFonts%
      \ifx\relax#1\relax\else%
        \refstepcounter{#1}%
        \regex_extract_once:nVNTF { label\s* \x{7B}([^\x{7D}]*)\x{7D}} {\l_tmpb_tl} \l_tmp_seq {\label{\seq_item:Nn\l_tmp_seq{2}}} {}%
      \fi
    }{}
  }{}%
}
\ExplSyntaxOff
>>>

The \verb|\fixmathjaxtoc| command is used for patching commands which should
keep their verbatim appearance in the TOC.

\<<<
\def\fixmathjaxtoc#1{\Configure{writetoc}{\def#1{\detokenize{#1}}}}
>>>

This was meant to fix some commands in sectioning commands, but it is not a
good solution. It is better to define the problematic commands as robust.

\<<<
\def\fixmathjaxsec#1{\def#1{\detokenize{#1}}}
\fixmathjaxsec\right
\fixmathjaxsec\left
>>>

Require verbatim math environments.

\<<<
\VerbMath{subarray}
\VerbMath{smallmatrix}
\VerbMath{matrix}
\VerbMath{pmatrix}
\VerbMath{bmatrix}
\VerbMath{Bmatrix}
\VerbMath{vmatrix}
\VerbMath{Vmatrix}
\VerbMath{cases}
\VerbMath{subequations}
\VerbMath{aligned}
\VerbMath{alignedat}
\VerbMath{gathered}
\VerbMath{gather}
\VerbMath{gather*}
\VerbMath{alignat}
\VerbMath{alignat*}
\VerbMath{xalignat}
\VerbMath{xalignat*}
\VerbMath{xxalignat}
\VerbMath{align}
\VerbMath{align*}
\VerbMath{flalign}
\VerbMath{flalign*}
\VerbMath{split}
\VerbMath{multline}
\VerbMath{multline*}
\VerbMath[equation]{equation}
\VerbMath{equation*}
\VerbMath{math}
\VerbMath{displaymath}
\VerbMath{eqnarray}
\VerbMath{eqnarray*}
>>>

It is necessary to reset the environment configurations for multiline
environments, because the default TeX4ht configurations turn them into
pictures.

\<<<
\ConfigureEnv{gather}{}{}{}{}
\ConfigureEnv{gather*}{}{}{}{}
\ConfigureEnv{multline}{}{}{}{}
\ConfigureEnv{multline*}{}{}{}{}
>>>

Fixes for tables of contents.
\<<<
\fixmathjaxtoc\int
\fixmathjaxtoc\,
\fixmathjaxtoc\sin
\fixmathjaxtoc\cos
\fixmathjaxtoc\tan
\fixmathjaxtoc\arcsin
\fixmathjaxtoc\arccos
\fixmathjaxtoc\arctan
\fixmathjaxtoc\csc
\fixmathjaxtoc\sec
\fixmathjaxtoc\cot
\fixmathjaxtoc\sinh
\fixmathjaxtoc\cosh
\fixmathjaxtoc\tanh
\fixmathjaxtoc\coth
\fixmathjaxtoc\log
\fixmathjaxtoc\ln
\fixmathjaxtoc\sum
\fixmathjaxtoc\(
\fixmathjaxtoc\)
\fixmathjaxtoc\begin
\fixmathjaxtoc\end
\fixmathjaxtoc\\
\fixmathjaxtoc\exp
\fixmathjaxtoc\left
\fixmathjaxtoc\right
>>>

\<<<
\@ifpackageloaded{mhchem}{%
  \def\:tempa#1{\texttt{\detokenize{\(\ce{#1}\)}}}
  \HLet\ce\:tempa
}{}
>>>

Restore environments redefined to produce verbatim output to their original
meaning in the picture mode. Otherwise math cannot work inside TikZ etc.

\<<<
\@ifpackageloaded{tikz}{%
  %\tikzset{every picture/.append code={\:ignoretikzmatrix\RestoreMathJaxEnvironments}}
  \pend:def\tikz@installcommands{\RestoreMathJaxEnvironments}
}{}
>>>

\endinput