\def\proposal{\smallskip\noindent{\sl Proposal:\/\enspace}}
\def\ssection#1{\par\noindent{\sl#1:}\enspace}
\def\MLTeX{Multilingual T\kern-.1667em\lower.5ex\hbox{\^E}\kern-.125emX}
\def\JTeX{\lower.5ex\hbox{J}\kern-.1667em\TeX}
\def\TeXXeT{\TeX-X\kern-.125em\lower.5ex\hbox{E}\kern-.1667emT}
\def\command#1{{\tt\char`\\#1}}
\def\cite#1{}
\def\caption#1{\centerline{#1}}
\frame{5pt}{%
\noindent The following contribution has some of the characteristics of a historical document. It is essentially the paper which Jan Michael took to Stanford and which contains the details of the arguments with which he and Roswitha Graham so ably and persistently bent the ear of Don Knuth -- with the outcome that after many years of stability, \TeX\ is changing.}
\medskip
\centerline{\bf Proposal to the TUG Meeting at Stanford}
\medskip
\noindent Donald Knuth created \TeX\ to typeset American text interspersed with mathematical formulas. Apart from a few limitations, \TeX\ has also proved suitable for typesetting other languages. Most limitations have been overcome by modifying the \TeX\ program itself (\MLTeX, \JTeX, \TeXXeT\dots), creating new fonts (Devan\=agar\=\i, Hebrew\dots), and modifying the existing ones (Icelandic Modern\dots). While covering the requirements of one or more specific languages, these modifications have created compatibility problems. \TeX\ input files may require features which are only available in certain extended versions of \TeX, or {\DVI} output files may not print correctly unless you make non-standard modifications to the device drivers. The only way to solve these problems is to put the necessary capabilities into the standard version of \TeX. In this paper I describe the problems of typesetting the Nordic languages with the current version of \TeX, and propose modifications to the \TeX\ program itself, the standard fonts, and macro packages, which will solve the problems.
Most of the ideas concerning hyphenation come from Michael Ferguson's \MLTeX, which solves some of the problems in that area, but not all. I also include a proposal for modifications to macro packages (\LaTeX\dots), in order to support the use of the language of your own choice. It originates from a proposal by Hubert Partl, approved by the German \TeX\ users group. In spite of the fact that I have restricted myself to the Nordic languages, the proposals that I put forward will also be useful for a large number of other languages. All the modifications proposed in this paper may be carried out in a backward compatible way, that will not influence the use of \TeX\ unless you decide to use the added capabilities. This list of proposals was first discussed at the Nordic \TeX\ meeting in Stockholm in June 1989, and followed up by discussions through electronic mail. Among those people who presented suggestions and critique on my original proposal were Staffan Romberger, Peter Busk Laursen, Bruce Wolman, Simen Gaure, Heikki Heikkil\"a, Bo Thid\'e, and Bernard Gaulle. I apologize if I left anyone out of this list. I also want to express my gratitude to Roswitha Graham, who arranged the Nordic \TeX\ meeting, and whose encouragement has been of invaluable help to me. \smallskip \leftline{\sl Introduction} \noindent The Nordic problems with \TeX\ may be divided into: \item{\rtr} typesetting text in the Nordic languages \item{\rtr} using our own words, date formats, etc., instead of the American ones hard wired into some macro packages (\LaTeX\dots) \item{\rtr} using our national letters in macro names, etc. The last-mentioned is of minor importance, so I set it aside, and concentrate on the typesetting. 
To avoid the confusion that may arise from using different character sets and handling at the various stages of \TeX's typesetting process, I have further subdivided this group of problems into: \item{\rtr} input of national letters \item{\rtr} internal handling of national letters and words \item{\rtr} output of typeset national letters \smallskip \centerline{\vbox{\halign{#&&\quad\hfil#\hfil\cr \multispan2{Code}&{\sc ascii}&Sw/Fi&Da/No\cr 91&&\tt[&\tt\"A&\tt\AE\cr 92&&\tt\char`\\&\tt\"O&\tt\O\cr 93&&\tt]&\tt\AA&\tt\AA\cr 123&&\tt\char`\{&\tt\"a&\tt\ae\cr 124&&\tt \char'174&\tt\"o&\tt\o\cr 125&&\tt\char`\}&\tt\aa&\tt\aa\cr }}} \caption{Nordic 7-bit national characters.} \smallskip \leftline{\sl Input of national letters} \ssection{7-bit national characters} ISO 646 \cite{iso:646} ({\sc ascii} is the U.S.\ national character set based on ISO 646) reserves character codes 64, 91--94, 96, and 123--126 for `national or application use'. Unfortunately, three of these codes are used both for important functions in \TeX, and for important national letters in the Swedish, Finnish, Danish, and Norwegian national character sets. Nordic users expect to be able to use their national letters on their national keyboards, just like any other letter. Hence, on computers with 7-bit character codes or where \TeX\ will only accept 7-bit input we have had to remap \TeX's escape and group delimiter characters, for national use. \smallskip \centerline{\vbox{\halign{#\hfil&&\quad\hfil#\hfil\cr Function&{\sc ascii}&Sw/Fi/Da/No\cr Escape character&\tt\char`\\&{\tt/}\quad or\quad {\tt!}\cr Begin group&\tt\char`\{&\tt<\cr End group&\tt\char`\}&\tt>\cr }}}\caption{Nordic 7-bit \TeX\ remapping.} \smallskip The only real problem with this, apart from us being incompatible, is that the \command{write} command uses the begin group and end group characters in force when a macro was defined, not the ones in force when the \command{write} command is executed. 
\proposal Add a \command{bgroupchar} and an \command{egroupchar} (\TeX\ already has an \command{escapechar}). \ssection{8-bit extended {\sc ascii} characters} Some implementations of \TeX\ running on computers with 8-bit extended {\sc ascii} character sets discard the 8th bit of the character code, silently converting characters with codes 128--255 (usually including European national letters) to other characters. This is both confusing to the users (who have these letters on their keyboards, and expect them to work in \TeX, just like in any other program), and not very useful. \proposal Make character codes 128--255 permanently active on computers with 8-bit extended {\sc ascii} character sets. This will make it possible to redefine them to do something sensible, or lead to an `Undefined control sequence' error if they have not been defined. As more and more computers get such character sets, this should be part of the \TeX\ standard. On computers with an {\sc ebcdic} character set, the character codes which don't correspond to {\sc ascii} characters should be mapped to 128--255, and made active. \smallskip \leftline{\sl Internal handling of national letters and words} \ssection{Hyphenating words with accented letters} \DeK\ restrained \TeX\ from hyphenating words with accented letters, as those words are foreign to American and probably won't hyphenate correctly anyway. When using \TeX\ for typesetting languages which use accented letters (with the appropriate patterns) this restriction has been worked around by creating non-standard versions of \TeX\ (like \MLTeX) or non-standard fonts. \proposal Make \TeX\ accept accented letters in \command{patterns} and \command{hyphenation}. This will make it possible to use the standard version of \TeX, and the standard fonts to typeset text in other languages than American, using the built-in hyphenation. 
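\smallskip
\noindent As a sketch only, and assuming the proposal above were adopted, an exception list might then contain accented letters written with the usual {\tt plain} macros. The word chosen and the exact input syntax are assumptions made for illustration, with hyphens shown at the compound boundaries of a Swedish word; the current version of \TeX\ does not accept such input:
\smallskip
\centerline{\command{hyphenation\char`\{sm\char`\\"or-g\char`\\aa\ s-bord\char`\}}}
\smallskip
\noindent The point of the sketch is that the author would keep typing national letters the same way in running text and in the exception list, without needing a modified \TeX\ or non-standard fonts.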
\smallskip \ssection{Don't use shifted boxes for national letters} \TeX\ (the macros in {\tt plain}, used with the Computer Modern fonts) uses shifted boxes for some national letters (\AA, \L, \c c\dots). This makes \TeX\ dependent on the metrics of the Computer Modern fonts, and it won't work with the above-mentioned hyphenation. \proposal Put national letters into the fonts or use \command{accent} to construct them. Use the same method for the uppercase and lowercase version of the same letter. Assign the appropriate \command{lccode} to each national letter put into the fonts, so that words with these letters may be hyphenated. \smallskip \ssection{Multilingual hyphenation} \TeX\ only supports one set of hyphenation patterns and exceptions. \proposal Make it possible to have multiple sets, and to switch between them, so \TeX\ may easily be used for multilingual typesetting. \smallskip \ssection{Variable suppression of hyphens} \TeX\ will never insert a hyphen that has fewer than 2 letters before it or fewer than 3 letters after it. These values are not suitable for all needs. \proposal Make the number of letters required before and after an inserted hyphen settable parameters. Either generate all patterns so that these parameters can be set to their minimum values (both 1), or add information to the patterns about the values used when they were generated, and restrict \TeX\ from setting the parameters to smaller values than that. \smallskip \ssection{Allow hyphenation of words with explicit hyphens} Some languages have very long words with explicit hyphens. Normally \TeX\ will refuse to hyphenate these words, though they are perfectly hyphenatable and must often be hyphenated. \proposal Make it possible to turn off this `feature', so that words with explicit hyphens may be hyphenated. 
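\smallskip
\noindent The letter-count rule described under {\sl Variable suppression of hyphens\/} can be made concrete with a little arithmetic. The symbols $n$, $k$, $l$ and $r$ are introduced here for illustration only and are not names used by \TeX. If a word has $n$ letters, and a hyphen is to be inserted after its $k$th letter, with at least $l$ letters required before the hyphen and at least $r$ letters after it, then
$$l\le k\le n-r.$$
With the present values $l=2$ and $r=3$, a five-letter word can be broken only after its second letter ($k=2$), and no word of fewer than $l+r=5$ letters can be hyphenated at all. With the minimum values $l=r=1$, every position $1\le k\le n-1$ becomes a candidate, so the patterns would have to be generated with those values in mind.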
\smallskip \ssection{Discretionary hyphens in patterns and exceptions} Some languages (Swedish, German, and probably some more) have words that require \command{discretionary} hyphenation. \TeX\ does not allow for this in \command{patterns} and \command{hyphenation}. \proposal Make \TeX\ accept \command{discretionary} hyphenation, both in \command{patterns} and in \command{hyphenation}, and make the corresponding changes to {\tt PATGEN}. It might be tempting to refer these words to the exception mechanism, but since it is possible to form an almost unlimited number of compound words and inflection forms that require \command{discretionary}, I think they are best handled by adding `hyphenation rules' to the hyphenation patterns. The basic rule, the only one \TeX\ knows of at the moment, is \command{discretionary\char`\{-\char`\}\char`\{\char`\}\char`\{\char`\}}. The German rule that `ck' is written as `k-k' when hyphenated, may be described by \command{discretionary\char`\{k-\char`\}\char`\{\char`\}\char`\{c\char`\}}, and the triple consonant suppression (for example, `tt' turns into `tt-t' in some words, because one of the t's was suppressed), takes a few more rules like \command{discretionary\char`\{t-\char`\}\char`\{\char`\}\char`\{\char`\}}, for a handful of consonants that may occur at such positions. The rules should be read into \TeX\ with the patterns, and the number of the rule to be applied at a certain position should be stored together with the odd interletter value at that position.\looseness1 \smallskip \ssection{Log hyphenations for proof-reading} No matter how good the hyphenation is, there will always be words which are not correctly hyphenated. In languages where compound words are frequent, more words tend to be incorrectly hyphenated. It may therefore be necessary to proof-read the hyphenations. 
As even a minimal change may cause a lot of hyphenations to change, all hyphenations may have to be proof-read after every change, unless there is a way to detect which hyphenations have changed.
\proposal Make it possible to log hyphenations, e.g.\ by adding a \command{tracinghyphenations}. This will facilitate the proof-reading of hyphenations, and make it possible to write a program which finds the hyphenations that have changed.
\smallskip
\leftline{\sl Output of typeset national characters}
\ssection{Full Latin character set} \TeX\ is capable of typesetting most letters used in Latin scripts, using the macros in {\tt plain} and the Computer Modern fonts. The following letters and accents in ISO 6937/2 \cite{iso:6937/2} (which is a superset of the ISO 8859/1, 2, 3, and 4 character sets) cannot be typeset that way: Icelandic {\em Thorn/thorn} and {\em Eth/eth}, Lapp {\em Eng/eng} and {\em T/t~with stroke}, Maltese {\em H/h~with stroke}, Croat and Lapp {\em D/d~with stroke}, Catalan {\em L/l~with middle dot}, Greenlandic {\em k~with short stem}, and the {\em ogonek accent}, used in Polish and Lithuanian. A few more letters and accents used in Latin scripts may be found in ISO/DP 10646\cite{iso:10646}. The baseline quotation marks (,,), required for Icelandic and German typesetting, and the guillemets (similar to $\ll$ and $\gg$), required for French typesetting, and occasionally used for typesetting the Nordic languages, are also missing from Computer Modern.
\proposal Extend \TeX's Latin fonts so that they contain all of the letters, accents, quotation marks, currency symbols and other graphic symbols needed to typeset languages written with a Latin script. Of the 234 letters in ISO 6937/2 (counting both upper and lower case), 155 may be formed from a basic letter and a diacritical mark. Therefore, only 13 accents and 25 special letter forms need to be in the fonts.
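\smallskip
\noindent To illustrate how a letter may be `formed from a basic letter and a diacritical mark', the following sketch shows what the macros in {\tt plain} do for the dieresis. It relies on the accent being at position {\tt"7F}, which holds for the Computer Modern text fonts but is an assumption for any other font:
\smallskip
\centerline{\command{"a}\quad expands to\quad {\tt\char`\{}\command{accent}{\tt"7F a\char`\}}}
\smallskip
\noindent Only the dieresis and the basic letter need to be present in the font; the combination is assembled when the text is typeset. Letters that cannot be composed this way are the ones that have to be added as special letter forms.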
\smallskip
\ssection{Placement of accents} \TeX's accent placement algorithm does not place all accents correctly\cite{romberger:texlatin}.
\proposal Extend the {\tfm} format, so that \TeX\ may:
\item{\rtr} use the current algorithm
\item{\rtr} use an explicit accent placement with respect to the base letter, instead of the one calculated by the algorithm
\item{\rtr} use an explicit accent placement for accent-letter pairs not handled by the above-mentioned method
\item{\rtr} use specially designed accented letters as replacements for accent-letter pairs
\smallskip\goodbreak
\ssection{Kern accented letters, too} \TeX\ performs no implicit kerning between two letters if the second is accented. In most cases, the result would look better if the letters were kerned.
\proposal Make \TeX\ insert implicit kerns between letters regardless of whether they are accented, or design an algorithm which takes both the basic letter and the accent into consideration when inserting implicit kerns.
\smallskip
\leftline{\sl Macro packages}
\noindent Several macro packages (like the \LaTeX\ style files) contain American words, date formats, etc., which should be replaced by the corresponding words and date formats from the language used in the text, when using another language. Thus, the heading `Contents' should be changed to `Inneh\aa ll' in Swedish, `Inhalt' in German, etc.
\proposal Define macros for all occurrences of American words, date formats, etc.\ in macro packages, so that they can easily be replaced, e.g.\ by redefining those macros in a style file.
\smallskip
\leftline{\sl Conclusion}
\noindent The problems of typesetting the Nordic languages with the current version of \TeX\ have been described, together with proposed modifications to \TeX, the Computer Modern fonts, and macro packages, which should solve the problems. This may serve as a basis for further discussions aimed at a European proposal.
The changes needed may then be incorporated into a standard version of \TeX.
\smallskip
\leftline{\sl Bibliography}
{\frenchspacing\parindent0pt\hangindent20pt\hangafter1
Michael Ferguson, {\sl A \MLTeX}, \TUGboat, 6(2), 1985, pp.\ 57--58.

\hangindent20pt\hangafter1
Michael Ferguson, {\sl\MLTeX\ Update}, \TUGboat, 7(1), 1986, p.\ 16.

\hangindent20pt\hangafter1
ISO, {\sl ISO 646: ISO 7-bit coded character set for information interchange}, 1983.

\hangindent20pt\hangafter1
ISO, {\sl ISO 6937/2: Coded character sets for text communication -- Part 2: Latin alphabetic and non-alphabetic characters}, 1983.

\hangindent20pt\hangafter1
ISO, {\sl ISO\slash DP 10646: Multiple octet coded character set}, 1989.

\hangindent20pt\hangafter1
Donald~E. Knuth, {\sl The \TeX book}, Addison-Wesley, Reading, Massachusetts, 1986.

\hangindent20pt\hangafter1
Donald~E. Knuth, {\sl\TeX: The Program}, Addison-Wesley, Reading, Massachusetts, 1986.

\hangindent20pt\hangafter1
Donald~E. Knuth and Pierre MacKay, {\sl Mixing right-to-left texts with left-to-right texts}, \TUGboat, 8(1), 1987, pp.\ 14--25.

\hangindent20pt\hangafter1
Leslie Lamport, {\sl\LaTeXsl: A Document Preparation System}, Addison-Wesley, Reading, Massachusetts, 1986.

\hangindent20pt\hangafter1
Hubert Partl, {\sl German \TeX}, \TUGboat, 9(1), 1988, pp.\ 70--72.

\hangindent20pt\hangafter1
Hubert Partl {\it et al.}, {\ttit german.sty}.

\hangindent20pt\hangafter1
Staffan Romberger and Yngve Sundblad, {\sl Adapting \TeX\ to Languages that use Latin Alphabetic Characters}, Proceedings of the First European Conference on \TeX\ for Scientific Documentation, Addison-Wesley, Reading, Massachusetts, 1985.

\hangindent20pt\hangafter1
Yasuki Saito, {\sl Report on \JTeX: A Japanese \TeX}, \TUGboat, 8(2), 1987, pp.\ 103--116.

\hangindent20pt\hangafter1
Dominik Wujastyk, {\sl The Many Faces of \TeX: A Survey of Digital \MFsl}, \TUGboat, 9(2), 1988, pp.\ 131--151. {\sl see also} \TeXline\ 8.

}
\rightline{\sl Jan Michael Rynning}