\def\proposal{\smallskip\noindent{\sl Proposal:\/\enspace}}
\def\ssection#1{\par\noindent{\sl#1:}\enspace}
\def\MLTeX{Multilingual T\kern-.1667em\lower.5ex\hbox{\^E}\kern-.125emX}
\def\JTeX{\lower.5ex\hbox{J}\kern-.1667em\TeX}
\def\TeXXeT{\TeX-X\kern-.125em\lower.5ex\hbox{E}\kern-.1667emT}
\def\command#1{{\tt\char`\\#1}}
\def\cite#1{}
\def\caption#1{\centerline{#1}}
\frame{5pt}{%
\noindent
The following contribution has some of the characteristics
of a historical document. It is essentially the paper which
Jan Michael took to Stanford, and which contains
the details of the arguments with which he and
Roswitha Graham so ably and persistently bent the ear of Don
Knuth -- with the outcome that after many years of stability,
\TeX\ is  changing.}
\medskip
\centerline{\bf Proposal to the TUG Meeting at Stanford} 
\medskip
\noindent
Donald  Knuth created \TeX\ 
to typeset American text interspersed with mathematical
formulas. Apart from a few limitations, \TeX\ has also
proved suitable for typesetting other languages. Most
limitations have been overcome by modifying the \TeX\
program itself (\MLTeX, \JTeX, \TeXXeT\dots),
creating new fonts (Devan\=agar\=\i,
Hebrew\dots), and modifying the
existing ones (Icelandic Modern\dots).  
While covering the requirements of one or more
specific languages these modifications have created
compatibility problems. \TeX\ input files may require
features which are only available in certain extended
versions of \TeX, or {\DVI}  output
files may not print correctly unless you make non-standard
modifications to the device drivers. The only way to solve
these problems is to put the necessary capabilities into the
standard version of \TeX. 

In this paper I describe the
problems of typesetting the Nordic languages with the
current version of \TeX, and propose modifications to the
\TeX\ program itself, the standard fonts, and macro
packages, which will solve the problems. Most of the ideas
concerning hyphenation come from Michael Ferguson's
\MLTeX, which solves some of the problems in that area, but
not all. I also include a proposal for modifications to
macro packages (\LaTeX\dots), in
order to support the use of the language of your own
choice.  It originates from a proposal by Hubert Partl, approved by the German
\TeX\ users group. In spite of the fact that I have
restricted myself to the Nordic languages, the proposals
that I put forward will also be useful for a large number
of other languages. All the modifications proposed in this
paper may be carried out in a backward-compatible way that
will not affect the use of \TeX\ unless you decide to
use the added capabilities.

This list of proposals was first discussed at the Nordic \TeX\ meeting
in Stockholm in June 1989, and followed up by discussions through
electronic mail. Among those who offered suggestions on, and
criticism of, my original proposal were Staffan Romberger, Peter Busk
Laursen, Bruce Wolman, Simen Gaure, Heikki Heikkil\"a, Bo Thid\'e,
and Bernard Gaulle. I apologize if I
left anyone out of this list. I also want to express my gratitude to
Roswitha Graham, who arranged the Nordic \TeX\ meeting, and whose
encouragement has been of invaluable help to me.

\smallskip
\leftline{\sl Introduction}
\noindent
The Nordic problems with \TeX\ may be divided into:
\item{\rtr} typesetting text in the Nordic languages
\item{\rtr} using our own words, date formats, etc., instead of the American
ones hard wired into some macro packages (\LaTeX\dots)
\item{\rtr} using our national letters in macro names, etc.

The last-mentioned is of minor importance, so I set it aside, and
concentrate on the typesetting. To avoid the confusion that may arise
from using different character sets and handling at the various stages
of \TeX's typesetting process, I have further subdivided this group of
problems into:

\item{\rtr} input of national letters
\item{\rtr} internal handling of national letters and words
\item{\rtr} output of typeset national letters

\smallskip
\centerline{\vbox{\halign{#&&\quad\hfil#\hfil\cr
\multispan2{Code}&{\sc ascii}&Sw/Fi&Da/No\cr
91&&\tt[&\tt\"A&\tt\AE\cr
92&&\tt\char`\\&\tt\"O&\tt\O\cr
93&&\tt]&\tt\AA&\tt\AA\cr
123&&\tt\char`\{&\tt\"a&\tt\ae\cr
124&&\tt \char'174&\tt\"o&\tt\o\cr
125&&\tt\char`\}&\tt\aa&\tt\aa\cr
}}}
\caption{Nordic 7-bit national characters.}

\smallskip
\leftline{\sl Input of national letters}
\ssection{7-bit national characters}
ISO 646 \cite{iso:646} ({\sc ascii} is the U.S.\ national character set based
on ISO 646) reserves character codes 64, 91--94, 96, and 123--126 for
`national or application use'.  Unfortunately, three of these codes
are used both for
important functions in \TeX, and for important national letters in
the Swedish, Finnish, Danish, and Norwegian national character sets.
 
Nordic users expect to be able to use their national letters on their
national keyboards, just like any other letter. Hence, on computers
with 7-bit character codes or where \TeX\ will only accept 7-bit input
we have had to remap \TeX's escape and group delimiter characters, for
national use.
\smallskip
\centerline{\vbox{\halign{#\hfil&&\quad\hfil#\hfil\cr
Function&{\sc ascii}&Sw/Fi/Da/No\cr
Escape character&\tt\char`\\&{\tt/}\quad or\quad {\tt!}\cr
Begin group&\tt\char`\{&\tt<\cr
End group&\tt\char`\}&\tt>\cr
}}}\caption{Nordic 7-bit \TeX\ remapping.}
\smallskip
The only real problem with this, apart from us being incompatible, is
that the \command{write} command uses the begin group and end group
characters in force when a macro was defined, not the ones in force
when the \command{write} command is executed.
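
As a minimal sketch of the remapping in the table above (one possible
set-up, assuming that {\tt/} is chosen as the escape character and that
the three freed codes should print as the Swedish letters), the category
codes could be changed as follows:

\begin{verbatim}
\catcode`\/=0        % `/' becomes an escape character
/catcode`/<=1        % `<' begins a group
/catcode`/>=2        % `>' ends a group
/catcode`/\=/active  % code 92 is now free for a national letter
/catcode`/{=/active  % code 123 likewise
/catcode`/}=/active  % code 125 likewise
/def\</"O>           % on a Swedish keyboard code 92 is O with dieresis
/def{</"a>           % code 123 is a with dieresis
/def}</aa>           % code 125 is a with ring
\end{verbatim}

Codes 91, 93, and 124 could be made active and defined in the same way.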

\proposal
Add a \command{bgroupchar} and an \command{egroupchar} (\TeX\ 
already has an \command{escapechar}).

\ssection{8-bit extended {\sc ascii} characters}
Some implementations of \TeX\ running on computers with 8-bit extended
{\sc ascii} character sets discard the 8th bit of the character code, silently
converting characters with codes 128--255 (usually including European
national letters) to other characters.  This is both confusing to the
users (who have these letters on their keyboards, and expect them to
work in \TeX, just like in any other program), and not very useful.

\proposal
Make character codes 128--255 permanently active on computers with 8-bit
extended {\sc ascii} character sets.  This will make it possible to redefine
them to do something sensible, or lead to an `Undefined control
sequence' error if they have not been defined.  As more and more
computers get such character sets, this should be part of the \TeX\
standard.  On computers with an {\sc ebcdic} character set, the character
codes which don't correspond to {\sc ascii} characters should be mapped to
128--255, and made active.
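
Under this proposal, a format or style file could then give each 8-bit
code a meaning. The following sketch assumes a \TeX\ that accepts
character codes above 127 in \command{lccode}, and an input encoding
(ISO 8859/1 is assumed here) in which code 229 is the letter \aa\ and
code 197 is the letter \AA:

\begin{verbatim}
% Define the active characters with codes 229 and 197.
% The \lowercase trick avoids typing the 8-bit characters themselves.
\begingroup
  \lccode`\~=229          % `~' is already active in plain TeX
  \lowercase{\endgroup\def~}{\aa}
\begingroup
  \lccode`\~=197
  \lowercase{\endgroup\def~}{\AA}
\end{verbatim}

Any other code in the range 128--255 would be handled in the same way,
one definition per national letter.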
\smallskip
\leftline{\sl Internal handling of national letters and words}
\ssection{Hyphenating words with accented letters}
\DeK\ prevented \TeX\ from hyphenating words with accented
letters, as those words are foreign to American and probably won't
hyphenate correctly anyway.  When using \TeX\ for typesetting languages
which use accented letters (with the appropriate patterns) this
restriction has been worked around by creating non-standard versions of
\TeX\ (like \MLTeX) or non-standard fonts.

\proposal
Make \TeX\ accept accented letters in \command{patterns} and
\command{hyphenation}. This will make it possible to use the
standard version of \TeX, and the standard fonts to typeset text in other
languages than American, using the built-in hyphenation.
 
\smallskip
\ssection{Don't use shifted boxes for national letters}
\TeX\ (the macros in {\tt plain}, used with the Computer Modern fonts) uses
shifted boxes for some national letters (\AA, \L, \c c\dots).
This makes \TeX\ dependent on the metrics of the Computer Modern fonts,
and it won't work with the above-mentioned hyphenation.

\proposal
Put national letters into the fonts or use \command{accent} to construct
them. Use the same method for the uppercase and lowercase version of the
same letter. Assign the appropriate \command{lccode} to each national
letter put into the fonts, so that words with these letters may be
hyphenated.
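
As an illustration of using one method for both cases: {\tt plain}
builds \aa\ with \command{accent}, while \AA\ is built from shifted
boxes. A sketch of a uniform definition, assuming the ring accent at
position 23 of the Computer Modern text fonts, might read as follows.
The font positions 197 and 229 in the second half are purely
hypothetical, and presuppose a \TeX\ that accepts such codes:

\begin{verbatim}
% Same construction for the lowercase and uppercase letter.
\def\aa{\accent23 a}   % same effect as the plain TeX definition
\def\AA{\accent23 A}   % replaces the shifted-box construction

% If a font held the letters themselves, at the hypothetical
% positions 197 (uppercase) and 229 (lowercase), hyphenation
% and case conversion would need:
\lccode197=229  \lccode229=229
\uccode229=197  \uccode197=197
\end{verbatim}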
 
\smallskip
\ssection{Multilingual hyphenation}
\TeX\ only supports one set of hyphenation patterns and exceptions.

\proposal
Make it possible to have multiple sets, and to switch between them,
so \TeX\ may easily be used for multilingual typesetting.
 
\smallskip
\ssection{Variable suppression of hyphens}
\TeX\ will never insert a hyphen that has fewer than 2 letters before it
or fewer than 3 letters after it. These values are not suitable for all
needs.

\proposal
Make the number of letters required before and after an inserted hyphen
settable parameters. Either generate all patterns so that these parameters
can be set to their minimum values (both 1), or add information to the
patterns about the values used when they were generated, and restrict \TeX\
from setting the parameters to smaller values than that.

\smallskip
\ssection{Allow hyphenation of words with explicit hyphens}
Some languages have very long words with explicit hyphens. Normally \TeX\
will refuse to hyphenate these words, though they are perfectly hyphenatable
and must often be hyphenated.

\proposal
Make it possible to turn off this `feature', so that words with explicit
hyphens may be hyphenated.
 
\smallskip
\ssection{Discretionary hyphens in patterns and exceptions}
Some languages (Swedish, German, and probably some more) have
words that require \command{discretionary} hyphenation.
\TeX\ does not allow for this in \command{patterns} and \command{hyphenation}.

\proposal
Make \TeX\ accept \command{discretionary} hyphenation, both in
\command{patterns} and in \command{hyphenation}, and
make the corresponding changes to {\tt PATGEN}.  It might be tempting to refer
these words to the exception mechanism, but since it is possible to form
an almost unlimited number of compound words and inflection forms that
require \command{discretionary}, I think they are best handled by adding
`hyphenation rules' to the hyphenation patterns.  The
basic rule, the only one \TeX\ knows of at the moment, is
\command{discretionary\char`\{-\char`\}\char`\{\char`\}\char`\{\char`\}}.
The German rule that `ck' is written as `k-k' when
hyphenated, may be described by
\command{discretionary\char`\{k-\char`\}\char`\{\char`\}\char`\{c\char`\}},
and the triple consonant suppression (for example, `tt'
turns into `tt-t' in some words, because one of the t's was
suppressed), takes a few more rules like
\command{discretionary\char`\{t-\char`\}\char`\{\char`\}\char`\{\char`\}},
for a handful of consonants that may occur at such
positions. The rules should be read into \TeX\ with the
patterns, and the number of the rule to be applied at a
certain position should be stored together with the odd
interletter value at that position.\looseness1
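
For comparison, such breaks can at present only be requested by hand,
word by word, with an explicit \command{discretionary} in the running
text. A short sketch, with German example words chosen for illustration
only:

\begin{verbatim}
% \discretionary{pre-break}{post-break}{no-break}
ba\discretionary{k-}{k}{ck}en      % backen, broken as bak-ken
Schi\discretionary{ff-}{f}{ff}ahrt % Schiffahrt, broken as Schiff-fahrt
\end{verbatim}

Here the three arguments spell out the complete letter group, whereas
the rules above describe only what is inserted or replaced at the break
position.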
 
\smallskip
\ssection{Log hyphenations for proof-reading}
No matter how good the hyphenation is, there will always be words which
are not correctly hyphenated.  In languages where compound words are
frequent, more words tend to be incorrectly hyphenated.  It may therefore
be necessary to proof-read the hyphenations.  As even a minimal change
may cause a lot of hyphenations to change, all hyphenations may have to
be proof-read after every change, unless there is a way to detect which
hyphenations have changed.

\proposal
Make it possible to log hyphenations, e.g.\ by adding a
\command{tracinghyphenations}.  This will facilitate the proof-reading
of hyphenations, and make it possible to write a program which finds
the hyphenations that have changed.
 

\smallskip
\leftline{\sl Output of typeset national characters}
\ssection{Full Latin character set}
\TeX\ is capable of typesetting most letters used in Latin
scripts, using the macros in {\tt plain} and the Computer
Modern fonts. The following letters and accents in ISO
6937/2 \cite{iso:6937/2} (which is a superset of the ISO
8859/1, 2, 3, and 4 character sets) cannot be typeset that
way: Icelandic {\em Thorn/thorn} and {\em Eth/eth}, Lapp
{\em Eng/eng} and {\em T/t~with stroke}, Maltese {\em
H/h~with stroke}, Croat and Lapp {\em D/d~with stroke},
Catalan {\em L/l~with middle dot}, Greenlandic {\em k~with
short stem}, and the {\em ogonek accent}, used in Polish
and Lithuanian.  A few more letters and accents used in
Latin scripts may be found in ISO/DP 10646\cite{iso:10646}.


The baseline quotation marks (,,), required for Icelandic and German
typesetting, and the guillemets (similar to $\ll$ and $\gg$), required
for French typesetting, and occasionally used for typesetting the Nordic
languages, are also missing from Computer Modern.

\proposal
Extend \TeX's Latin fonts so that they contain all of the letters,
accents, quotation marks, currency symbols and other graphic symbols needed
to typeset languages written with a Latin script.
Of the 234 letters in ISO 6937/2 (counting both upper and lower case),
155 may be formed from a basic letter and a diacritical mark.  Therefore,
only 13 accents and 25 special letter forms need to be in the fonts.  
 
\smallskip
\ssection{Placement of accents}
\TeX's accent placement algorithm does not place all
accents correctly\cite{romberger:texlatin}.

\proposal
Extend the {\tfm} format, so that \TeX\ may:

\item{\rtr} use the current algorithm
\item{\rtr} use an explicit accent placement with respect to the base letter,
instead of the one calculated by the algorithm
\item{\rtr} use an explicit accent placement for accent-letter pairs not
handled by the above-mentioned method
\item{\rtr} use specially designed accented letters as replacements for
accent-letter pairs

 
\smallskip\goodbreak
\ssection{Kern accented letters, too}
\TeX\ performs no implicit kerning between two letters if the second is
accented.  In most cases, the result would look better if the letters
were kerned.

\proposal
Make \TeX\ insert implicit kerns between letters regardless of whether they
are accented, or design an algorithm which takes both the basic letter and
the accent into consideration when inserting implicit kerns.
 
\smallskip
\leftline{\sl Macro packages}
\noindent
Several macro packages (like the \LaTeX\ style files) contain American words,
date formats, etc., which should be replaced by the corresponding words and
date formats from the language used in the text, when using another language.
Thus, the heading `Contents' should be changed to
`Inneh\aa ll' in Swedish, `Inhalt' in German, etc.

\proposal
Define macros for all occurrences of American words, date formats, etc.\ in
macro packages, so that they can easily be replaced, e.g.\ by
redefining those macros in a style file.
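
A minimal sketch of what this could look like, with the macro name
\command{contentsname} invented here for illustration
(\command{today} already exists in \LaTeX):

\begin{verbatim}
% In the macro package: every fixed word goes through a macro.
\def\contentsname{Contents}

% In a Swedish style file: only the macros are redefined.
\def\contentsname{Inneh\aa ll}
\def\today{\number\day\ \ifcase\month\or
  januari\or februari\or mars\or april\or maj\or juni\or
  juli\or augusti\or september\or oktober\or november\or
  december\fi\ \number\year}
\end{verbatim}

The package itself then never needs to be edited; a document selects
its language simply by loading the corresponding style file.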
 
\smallskip
\leftline{\sl Conclusion}
\noindent
The problems of typesetting the Nordic
languages with the current version of \TeX\ have been described, together with proposed
modifications to the \TeX\ program, the Computer Modern fonts, and macro packages,
which should solve the problems.  This may serve as a basis
for further discussions aimed at a European proposal.  The changes needed may then
be incorporated into a standard version of \TeX.

\smallskip
\leftline{\sl Bibliography}
{\frenchspacing\parindent0pt\hangindent20pt\hangafter1
Michael Ferguson,
{\sl A \MLTeX},
\TUGboat, 6(2), 1985, pp.\ 57--58.

\hangindent20pt\hangafter1
Michael Ferguson,
{\sl\MLTeX\ Update},
\TUGboat, 7(1), 1986, p.\ 16.

\hangindent20pt\hangafter1
ISO, {\sl  ISO 646: ISO 7-bit coded character set for
information interchange},
 1983.

\hangindent20pt\hangafter1
ISO,
{\sl ISO 6937/2: Coded character sets for text
communication -- Part 2: Latin alphabetic and
non-alphabetic characters},  1983.

\hangindent20pt\hangafter1
ISO,
{\sl ISO\slash DP 10646: Multiple octet coded character set},
1989.

\hangindent20pt\hangafter1
Donald~E.\ Knuth,
{\sl The \TeX book},
Addison Wesley, Reading, Massachusetts, 1986.
 
\hangindent20pt\hangafter1
Donald~E.\ Knuth,
{\sl\TeX: The Program},
Addison Wesley, Reading, Massachusetts, 1986.
 
\hangindent20pt\hangafter1
Donald~E.\ Knuth and Pierre MacKay,
{\sl Mixing right-to-left texts with left-to-right texts},
\TUGboat, 8(1), 1987, pp.\ 14--25.

\hangindent20pt\hangafter1
Leslie Lamport,
{\sl\LaTeXsl: A Document Preparation System},
Addison Wesley, Reading, Massachusetts, 1986.

\hangindent20pt\hangafter1
Hubert Partl,
{\sl German \TeX},
\TUGboat, 9(1), 1988, pp.\ 70--72.

\hangindent20pt\hangafter1
Hubert Partl {\it et al.},
{\ttit german.sty}

\hangindent20pt\hangafter1
Staffan Romberger and Yngve Sundblad,
{\sl
Adapting \TeX\ to Languages that use Latin Alphabetic
Characters},  Proceedings of the First
European Conference on \TeX\ for Scientific Documentation,
Addison Wesley, Reading, Massachusetts, 1985.

\hangindent20pt\hangafter1
Yasuki Saito,
{\sl Report on \JTeX: A Japanese \TeX},
\TUGboat, 8(2), 1987, pp.\ 103--116.

\hangindent20pt\hangafter1
Dominik Wujastyk,
{\sl The Many Faces of \TeX: A Survey of Digital \MFsl},
\TUGboat, 9(2), 1988, pp.\ 131--151. {\sl see also}
\TeXline\ 8.

}

\rightline{\sl Jan Michael Rynning}