From: Antti Louko
To: spqr@uk.ac.soton.cm
In-Reply-To: Sebastian Rahtz's message of Wed, 20 Jul 88 17:43:37 GMT <88072017
Subject: multiple hyphenations
Resent-Date: Fri, 22 Jul 88 15:51:16 GMT
Resent-From: spqr@uk.ac.soton.cm
Resent-To: abbottp@uk.ac.aston.mail

I have modified TeX to handle multiple hyphenation languages. This is done with a new integer parameter, \patternsnum, which selects the pattern set currently in use. The version I modified is the web-to-c TeX, so it should work on most UNIX machines. For example:

% tex
This is TeX, C Version 2.9 (preloaded format=plain 88.7.5)
**\relax
*\patternsnum=0 % Well, this is default (English)
*\showhyphens{hyphenation}
Underfull \hbox (badness 10000) detected at line 0
[] \tenrm hy-phen-ation
*\showhyphens{tavutus} % tavutus = hyphenation in Finnish
Underfull \hbox (badness 10000) detected at line 0
[] \tenrm tavu-tus
*\patternsnum=1 % Finnish
*\showhyphens{hyphenation}
Underfull \hbox (badness 10000) detected at line 0
[] \tenrm hyp-he-na-tion
*\showhyphens{tavutus}
Underfull \hbox (badness 10000) detected at line 0
[] \tenrm ta-vu-tus
*\bye
(see the transcript file for additional information)
No pages of output.
Transcript written on texput.log.
%

You can see how the hyphenation changes. \patternsnum also applies to the \patterns primitive: when TeX reads the patterns, it loads them into the set selected by \patternsnum. This is our /usr/lib/tex/macros/hyphen.tex file:

\patternsnum=0
\input enghyphen % English hyphenation patterns (the original hyphen.tex)
\patternsnum=1
\input fh % Finnish hyphenation patterns
\patternsnum=0 % Reset to the default

If \patternsnum <> 0, the hyphenation algorithm sees accented characters as pairs: \"a shows up as ^^?a, and so on. This is useful for many languages.

\patternsnum can be used when several languages are needed in the same document. The value that counts is the one in effect when TeX hyphenates the paragraph as a whole, so you should change \patternsnum immediately after a \par.
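To illustrate the idea, here is a minimal Python sketch. It is not part of the patch. It implements Liang-style pattern matching in which \patternsnum only chooses which pattern table is consulted, much as the change file redirects trie to trie_arr[pat_num].

Some parts of the sketch are assumptions:

- The two pattern lists are invented toy data. They are not the contents of enghyphen or fh.
- The names PATTERN_SETS, parse_pattern and hyphenate are hypothetical.
- The rule of keeping at least two letters before a break and three after it is an assumption of the sketch.

An out-of-range set number falls back to 0. This mirrors the check if ((pat_num>pat_num_max)or(pat_num<0)) then pat_num:=0 in the diff.

```python
# Toy pattern sets: invented for illustration, NOT the real English or Finnish patterns.
PAT_NUM_MAX = 1  # mirrors pat_num_max=1 in the change file
PATTERN_SETS = [
    ["vu1t"],        # set 0 (stands in for the default English patterns)
    ["a1v", "u1t"],  # set 1 (stands in for the Finnish patterns)
]


def parse_pattern(pat):
    """Split a Liang pattern such as 'a1v' into its letters ('av') and
    the values of the gaps before, between and after them ([0, 1, 0])."""
    letters, values = [], [0]
    for ch in pat:
        if ch.isdigit():
            values[-1] = int(ch)
        else:
            letters.append(ch)
            values.append(0)
    return "".join(letters), values


def hyphenate(word, pat_num):
    """Return word with '-' at every break allowed by pattern set pat_num."""
    # Same fallback as the patch: an out-of-range set number means set 0.
    if pat_num < 0 or pat_num > PAT_NUM_MAX:
        pat_num = 0
    dotted = "." + word.lower() + "."  # '.' marks the word boundaries
    levels = [0] * (len(dotted) + 1)   # levels[j]: value of the gap before dotted[j]
    for pat in PATTERN_SETS[pat_num]:
        letters, values = parse_pattern(pat)
        for i in range(len(dotted) - len(letters) + 1):
            if dotted.startswith(letters, i):
                for k, v in enumerate(values):
                    levels[i + k] = max(levels[i + k], v)
    # An odd gap value permits a break; keep 2 letters before and 3 after it.
    n = len(word)
    out = []
    for t, ch in enumerate(word):
        if 2 <= t <= n - 3 and levels[t + 1] % 2 == 1:
            out.append("-")
        out.append(ch)
    return "".join(out)


print(hyphenate("tavutus", 0))  # tavu-tus
print(hyphenate("tavutus", 1))  # ta-vu-tus
```

With these toy patterns the two calls reproduce the tavu-tus and ta-vu-tus results from the \showhyphens session. The real pattern files are much larger and are stored in a packed trie rather than a list.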
I am sending diffs to the ctex.ch file with this message. If the moderator wants to put them in a public place, please do so. It might also be a good idea to send them to the UNIX-TeX distributor. The diff is below; it passes the trip test.

*** ctex.ch.orig Mon Jun 20 15:29:23 1988
--- ctex.ch Wed Jul 13 17:16:44 1988
***************
*** 12,17 ****
--- 12,19 ----
  % (1 /14/88) ETM Brought up to TeX 2.9
  % (2 /20/88) PAM Revised format and module numbers
  % (3 /1 /88) ETM Eliminated some unused variables and unnecesary tests
+ % (21.6.88) alo@santra Added Finnish hyphenation
+ % (5.7.88) alo@santra Added alternate hyphenation patterns capability
  % NOTE: the module numbers in this change file refer to the published
  % text in TeX, the Program, Volume B. 1986
***************
*** 19,25 ****
  %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
  % [0] WEAVE: only print changes
  %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
! @x Tell WEAVE to print only the changes: l.69
  \def\?##1]{\hbox to 1in{\hfil##1.\ }}
  }
  @y
--- 21,28 ----
  %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
  % [0] WEAVE: only print changes
  %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
! % Tell WEAVE to print only the changes:
! @x l.69
  \def\?##1]{\hbox to 1in{\hfil##1.\ }}
  }
  @y
***************
*** 193,198 ****
--- 196,202 ----
  at most |max_halfword|}
  @!trie_size=8000; {space for hyphenation patterns; should be larger for
  \.{INITEX} than it is in production versions of \TeX}
+ @!pat_num_max=1; {maximum number of hyphenation patterns -1}
  @!dvi_buf_size=16384; {size of the output buffer; must be a multiple of 8}
  @!file_name_size=1024; {file names shouldn't be longer than this}
  @!pool_name='tex.pool';
***************
*** 837,842 ****
--- 841,852 ----
  if r>toint(p+1) then @;
  @z
+ @x l.3236
+ @!k:integer; {index into |mem|, |eqtb|, etc.}
+ @y
+ @!j,k:integer; {index into |mem|, |eqtb|, etc.}
+ @z
+ 
  %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
  % [11.165] fix the word "free" so that it doesn't conflict with the C routine
  %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
***************
*** 923,928 ****
--- 933,976 ----
  end
  @z
  %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+ % Finnish hyphenation
+ %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+ @x l.4833
+ @d new_line_char_code=49 {character that prints as |print_ln|}
+ @d int_pars=50 {total number of integer parameters}
+ @y
+ @d new_line_char_code=49 {character that prints as |print_ln|}
+ @d pat_num_code=50 {}
+ @d int_pars=51 {total number of integer parameters}
+ @z
+ 
+ @x l.4891
+ @d new_line_char==int_par(new_line_char_code)
+ @y
+ @d new_line_char==int_par(new_line_char_code)
+ @d pat_num==int_par(pat_num_code)
+ @z
+ 
+ @x l.4950
+ new_line_char_code:print_esc("newlinechar");
+ @y
+ new_line_char_code:print_esc("newlinechar");
+ pat_num_code:print_esc("patternsnum");
+ @z
+ 
+ @x l.5059
+ primitive("newlinechar",assign_int,int_base+new_line_char_code);@/
+ @!@:new_line_char_}{\.{\\newlinechar} primitive@>
+ @y
+ primitive("newlinechar",assign_int,int_base+new_line_char_code);@/
+ @!@:new_line_char_}{\.{\\newlinechar} primitive@>
+ primitive("patternsnum",assign_int,int_base+pat_num_code);@/
+ @!@:pat_num_}{\.{\\pat_num} primitive@>
+ @z
+ 
+ %% End of Finnish
+ 
+ %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
  % [17.241] fix_date_and_time
  %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
  @x l.5079
***************
*** 1148,1154 ****
  make_name_string:=make_string;
  end;
! {The X_make_name_string functions are changed to macros in C}
  @z
  %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
--- 1196,1202 ----
  make_name_string:=make_string;
  end;
! {The |X_make_name_string| functions are changed to macros in C}
  @z
  %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
***************
*** 1259,1264 ****
--- 1307,1449 ----
  @z
  %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+ % Changes for Finnish hyphenation
+ %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+ 
+ @x l.16983
+ @!hu:array[1..63] of ASCII_code; {like |hc|, before conversion to lowercase}
+ @!hyf_char:integer; {hyphen character of the relevant font}
+ @y
+ @!hu:array[1..63] of ASCII_code; {like |hc|, before conversion to lowercase}
+ @!hyf_char:integer; {hyphen character of the relevant font}
+ @!hacc:array[1..63] of pointer;
+ @z
+ 
+ @x l.16990
+ @!c:0..255; {character being considered for hyphenation}
+ @y
+ @!c:0..255; {character being considered for hyphenation}
+ @!s2,s3:pointer;
+ @z
+ 
+ @x l.16998
+ begin @pat_num_max)or(pat_num<0)) then pat_num:=0;
+ @;
+ hyphenate;
+ end;
+ done1: end
+ @y
+ at least five letters have been found, otherwise |goto done1|@>;
+ hyphenate;
+ end;
+ done1: while hn>0 do begin if hacc[hn]<>null then
+ flush_node_list(hacc[hn]); decr(hn); end;
+ end
+ @z
+ 
+ @x l.17028
+ else if (type(s)=kern_node)and(subtype(s)=normal) then c:=128
+ @y
+ else if (type(s)=kern_node)and(subtype(s)=normal) then c:=128
+ else if (type(s)=kern_node)and(subtype(s)=acc_kern)and(pat_num<>0) then
+ begin s2:=link(s);
+ if is_char_node(s2) then
+ begin c:=qo(character(s2)); hf:=font(s2);
+ end
+ else goto done1
+ end
+ @z
+ 
+ @x l.17036
+ done2: hyf_char:=hyphen_char[hf];
+ if hyf_char<0 then goto done1;
+ if hyf_char>255 then goto done1;
+ @y
+ done2: hyf_char:=hyphen_char[hf];
+ if hyf_char<0 then goto done1;
+ if hyf_char>255 then goto done1;
+ for j:=1 to 63 do hacc[j]:=null;
+ @z
+ 
+ @x l.17053
+ @
+ @y
+ @
+ else if (type(s)=kern_node)and(subtype(s)=acc_kern)and(pat_num<>0) then
+ begin
+ s3:=link(s);
+ link(s):=null;
+ hacc[hn+1]:=copy_node_list(s);
+ link(s):=s3;
+ end
+ @z
+ 
+ @x l.17199
+ @ @=
+ if j=n then goto done;
+ @y
+ @ @=
+ if j=n then goto done;
+ if hacc[j+1]<>null then goto done;
+ @z
+ 
+ @x l.17251
+ @=
+ j:=0;
+ repeat l:=j; j:=reconstitute(j+1,hn);
+ @y
+ @=
+ j:=0;
+ repeat l:=j;
+ if hacc[j+1]<>null then
+ begin link(s):=copy_node_list(hacc[j+1]);
+ s:=link(s);
+ end;
+ j:=reconstitute(j+1,hn);
+ @z
+ 
+ @x l.17385
+ @ @d trie_link(#)==trie[#].rh {``downward'' link in a trie}
+ @d trie_char(#)==trie[#].b1 {character matched at this trie location}
+ @d trie_op(#)==trie[#].b0 {program for hyphenation at this trie location}
+ 
+ @=
+ @!trie:array[trie_pointer] of two_halves; {|trie_link|, |trie_char|, |trie_op|
+ @!hyf_distance:array[quarterword] of small_number; {position |k-j| of $n_j$}
+ @!hyf_num:array[quarterword] of small_number; {value of $n_j$}
+ @!hyf_next:array[quarterword] of quarterword; {continuation of this |trie_op|}
+ @y
+ @ @d trie==trie_arr[pat_num]
+ @d trie_link(#)==trie[#].rh {``downward'' link in a trie}
+ @d trie_char(#)==trie[#].b1 {character matched at this trie location}
+ @d trie_op(#)==trie[#].b0 {program for hyphenation at this trie location}
+ @d hyf_distance==hyf_distance_arr[pat_num]
+ @d hyf_num==hyf_num_arr[pat_num]
+ @d hyf_next==hyf_next_arr[pat_num]
+ @d trie_max==trie_max_arr[pat_num]
+ 
+ @=
+ @!trie_arr:array[0..pat_num_max] of array[trie_pointer] of two_halves;
+ {|trie_link|, |trie_char|, |trie_op|}
+ @!hyf_distance_arr:array[0..pat_num_max] of array[quarterword] of small_number;
+ {position |k-j| of $n_j$}
+ @!hyf_num_arr:array[0..pat_num_max] of array[quarterword] of small_number;
+ {value of $n_j$}
+ @!hyf_next_arr:array[0..pat_num_max] of array[quarterword] of quarterword;
+ {continuation of this |trie_op|}
+ @z
+ 
+ %% End of Finnish
+ 
+ %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
  % [43.944] Fix a C casting/expression evaluation problem
  %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
  @x l.17672
***************
*** 1278,1283 ****
--- 1463,1496 ----
  2718*toint(trie_l[p])+3142*toint(trie_r[p])) mod trie_size;
  @z
+ @x l.17774
+ begin for h:=0 to trie_op_hash_size do trie_op_hash[h]:=min_quarterword;
+ @y
+ begin if ((pat_num>pat_num_max)or(pat_num<0)) then pat_num:=0;
+ for h:=0 to trie_op_hash_size do trie_op_hash[h]:=min_quarterword;
+ @z
+ 
+ @x l.17805
+ @t\hskip1em@>@!trie_max:trie_pointer; {largest location used in |trie|}
+ @y
+ @t\hskip1em@>@!trie_max_arr:array[0..pat_num_max] of trie_pointer;
+ {largest location used in |trie|}
+ @z
+ 
+ @x l.17827
+ trie_link(0):=0; trie_char(0):=0; trie_op(0):=min_quarterword;
+ for k:=1 to 127 do trie[k]:=trie[0];
+ trie_max:=127;
+ @y
+ for j:=0 to pat_num_max do
+ begin pat_num:=j;
+ trie_link(0):=0; trie_char(0):=0; trie_op(0):=min_quarterword;
+ for k:=1 to 127 do trie[k]:=trie[0];
+ trie_max:=127
+ end;
+ pat_num:=0;
+ @z
+ 
  %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
  % [49.1275] a_open_in of \read file needs path specifier
  %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
***************
*** 1439,1445 ****
  dump_things(eqtb[k], l-k);
  @z
! @x l.23037
  while k
+ @^extensions to \TeX@>
+ 
+ @y
+ incompatible extensions of \TeX\ from proliferating.
+ @^system dependencies@>
+ @^extensions to \TeX@>
+ 
+ The support for different hyphenation algorithms is also implemented
+ as an extension.
+ @^Finnish hyphenation@>
+ 
+ @z
+ 
+ %% End of Finnish
+ 
+ %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
  % [54.1376] add editor-switch variables to globals
  %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
  @x l.23926
***************
*** 1735,1741 ****
  @^system dependencies@>
  @y
  Here is a temporary integer, used as a holder during reading and writing of
! TFM files, and a temporary memory_word, used in reading/writing format files.
  Also, the variables used to hold ``switch-to-editor'' information.
  @^
--- 1984,1990 ----
  @^system dependencies@>
  @y
  Here is a temporary integer, used as a holder during reading and writing of
! TFM files, and a temporary |memory_word|, used in reading/writing format files.
  Also, the variables used to hold ``switch-to-editor'' information.
  @^

*---------------------------------------------------------------------------*
alo@santra.UUCP (mcvax!santra!alo)      Antti Louko
alo@santra.hut.fi                       Helsinki University of Technology
alo@fingate.bitnet                      Computing Centre
alo%fingate.bitnet@cunyvm.cuny.edu      SF-02150, Espoo
                                        FINLAND
                                        tel. +358 0 4512624
*---------------------------------------------------------------------------*