% For macro-command names \DeclareRobustCommand\code[1]{\texttt{\char`\\#1}} \def\TFM{\acro{TFM}} \def\JAVA{Java} \def\DANTEeV{\acro{DANTE} e.V.} \def\CNAN{\acro{CNAN}} \def\eTeX{$\varepsilon$-\TeX} \let\stress\emph \def\NTS{\acro{NTS}} \let\Section\section \let\Subsection\subsection \title {\eTeX\ V2: a peek into the future} \author [Philip Taylor]{Philip Taylor\\ Royal Holloway and Bedford New College,\\ University of London\\ E-mail: \texttt{P.Taylor@Vms.Rhbnc.Ac.Uk} } \begin{Article} \begin{abstract} \eTeX\ V1 was released towards the end of 1996, and it was intended at the time that \eTeX\ V2 should be released approximately one year later. For various reasons the release date has slipped a little, but V2 is in alpha-test and we confidently expect to release it in the near future: indeed, it may well have been released by the time that this article appears in print. In this article the new features of V2 are reviewed, and we conclude by peeking a little further into the future to see what will happen with \NTS. \end{abstract} \Section {Introduction} When the \NTS\ project was first established in 1992, we hoped that work would commence fairly quickly. Sadly it became all too obvious that a project of that magnitude would require one or more full-time workers, and none of the (volunteer) team had that sort of time to spare. The team decided that, until funding could be found, they would work on the less-demanding but still worthwhile task of extending \TeX\ in its current (Pascal-WEB) form. The first fruits of that work were revealed in late 1996, when \eTeX\ V1 was released. This version added approximately 30 new primitives to \TeX\, and a small adjunct macro library was also produced which both extended the plain \TeX\ format to accommodate the new primitives and also added new functionality such as natural language handling and \TeX\ module libraries. Once \eTeX\ V1 had been released, the group were free to concentrate on the next version. Originally intended to ship a year later than the first, this version has slipped a little as pressure of other commitments has forced members of the team to invest less time in the project than they would have wished. However, V2 is now in alpha test, and we confidently expect to be able to make a general release in the near future: release may well have taken place by the time this article appears in print. The majority of the remainder of this article describes the features that we are fairly confident will be present in that release. \Section {Ideas which are almost certain to appear in \eTeX\ V2} Although we do not wish to give an absolute commitment at this stage, particularly as many of the proposals are the process of being tested at the time of writing, we do believe that the ideas contained in the following section are very likely to appear in \eTeX\ V2. A subsequent section discusses ideas which may appear in future releases. \Subsection {Increasing \TeX's registers} \TeX\ has 256 of the most commonly used registers: counts, dimens, skips, boxes, toks, etc., and whilst these are enough for normal applications, advanced formatting systems really require more. In \eTeX\ V2, we intend to provide 32768 of each of these, which we hope will be sufficient for the most demanding packages. Insertion classes will still be restricted to 256 or fewer, and \code{box} 255 will retain its special significance. The `etex' format will allow both local and global allocation of these registers (`plain' allows only global), and for efficiency reasons a user will be able to elect whether to allocate a register from the dense (0..255) pool or from the sparse (256..32767). To allow the allocation mechanism to overflow from dense to sparse without risking a conflict with the allocation of insertion classes, the format allows a user or package to pre-reserve a number of insertion classes. Facilities for block-allocating a contiguous set of registers will be provided. \Subsection {Improved natural language handling} \TeX\ overloads the \code{lccode} concept, using it both for `real' lower-casing operations and also for purposes of hyphenation. In \eTeX\ V2 these operations are unbundled, and the codes used for hyphenation can be staticised as the patterns are read in (the current set of lccodes is used). Thereafter, whenever a particular language is used, the corresponding set of hyphenation codes is loaded. \Subsection {Arithmetic expressions} Although \TeX\ can perform simple arithmetic (addition, multiplication and division), these operations are `assignments' and therefore cannot be used in expansion-only and certain other contexts. \eTeX\ V2 provides a set of arithmetic primitives which evaluate an expression in such a way that the value of the expression can be accessed in expansion-only contexts, as well as being usable (for example) when \TeX\ is looking for a \meta {number}, \meta {dimen}, etc. As \TeX\ intentionally uses only integer arithmetic wherever the results of a computation are accessible to the user, floating-point arithmetic has \stress {not} been provided. There are four new primitives, \code{numexpr}, \code{dimexpr}, \code{glueexpr} and \code{muexpr}, each of which requires its operands to be of appropriate type (or coercible to that type). Parentheses may be used to indicate precedence wherever this will clarify or disambiguate an expression. The normal arithmetic operators `+', `-', `*' and `/' are allowed within an expression. \Subsection {Discards are no longer discarded!} When \TeX\ performs page-breaking, so-called `discardable items' which follow the chosen breakpoint are discarded; whilst this is perfectly reasonable if the page break is actually taken, recursive techniques aimed at optimising the appearance of multiple pages require the ability to `undo' a pagebreak in order to try the effect of breaking elsewhere. The discarded items are therefore required in order to re-create the vertical list which \TeX\ is trying to break. In \eTeX\ V2, we allow access to these `discarded items' via a new primitive \code{pagediscards}. The discards which occur during \code {vsplit}ting are also accessible via an analogous primitive \code{splitdiscards}. Both of these primitives return a vertical list, similar to that obtained by \code{unvbox}ing a box register. \Subsection {Read-write access to \code{parshape}} Although \TeX\ allows the user to create arbitrarily complicated paragraph shapes through the use of the \code{parshape} primitive, it provides no way for the user to find out which \code{parshape} is currently active (although it does allow the user to ascertain the number of lines of the current \code{parshape} specification). In \eTeX\ V2, we allow full read access to all the elements of the current \code{parshape}. %%% Peter: is there any merit in allowing read/write access, do you think? \Subsection {Interrogating the current conditional context} In \eTeX\ V1 we allowed users to make environmental enquiries concerning the current group, both as to its depth of nesting and to its type. In \eTeX\ V2, we generalise this concept and allow analogous access to the current conditional context through the use of \code{currentiflevel}, \code{currentiftype}, \code{currentifbranch} and \code{showifs}. \Subsection {Access to information concerning font-character combinations} Although it is possible to gain some information about a particular character in a given font by typesetting that character in a box and then measuring the dimensions of the box, not all the dimensions of the character can be reliably obtained in this way, and there is no way to ensure that the character actually exists in the font before attempting to typeset and measure it. In \eTeX\ V2 we allow the user both to check whether a particular character exists in a given font, using \code{iffontchar}, and (if it does exist) to measure the four fundamental dimensions of that font/character combination using \code{fontcharwd}, \code{fontcharht}, \code{fontchardp}, and \code{fontcharic} (representing width, height, depth and italic correction respectively). Furthermore, we ensure that users are alerted to the existence of missing characters in a font by causing lost characters to be logged to the console as well as to the log file if \code{tracinglostchars} is set to a value greater than 1. \Subsection {Better debugging aids} In order to assist in diagnosing mis-matched or runaway group problems, \eTeX\ V2 allows the user to opt to be warned whenever a file is left in a group or conditional other than that at which it was entered. This may be accomplished by setting \code{tracingnesting} to a value greater than zero. \Subsection {Subtle change to the semantics of \code{protected}} \eTeX\ V1 introduced a new prefix, \code{protected}, which inhibited the expansion of the `protected' macro in contexts in which expansion was unlikely to be required. Further research into this area suggested that at least one such case had been missed, and `protected' macros are now inhibited from expansion when \TeX\ is scanning ahead while processing alignments. \Subsection {Optimisations} To improve the overall efficiency of \eTeX\, internal modifications have been made to reduce the resources required when there are a number of \code{aftergroup}s active for a single group, and to eliminate the stack space wasted in setting a register to the same value as it currently holds. \Subsection {Access to the components of a glue quantity} Whilst it is possible to gain access to the various components of a glue value by clever macro programming, the code required is sufficiently arcane to suggest that a better method is much to be preferred. Accordingly we are considering a set of primitives \code{gluestretch}, \code{gluestretchorder}, \code{glueshrink} and \code{glueshrinkorder} which will give much-simplified access to these quantities. As a part of the same process we are looking at two conversion primitives, \code{mutoglue} and \code{gluetomu}. \Subsection {Improved typographic quality} Whilst the majority of the work in \eTeX\ is aimed at providing the \eTeX\ programmer with more powerful tools, we are aware that the real purpose of \TeX\ is to generate typeset output of the highest quality. During a meeting in Brno with Prof. Knuth on the occasion of his honorary doctorate, he suggested that we might like to consider improving the typographic quality of the last line of a paragraph. According to Don, traditional (hot-lead) typesetters would set the last line to the same tightness or looseness as the immediately preceding line, and he thought that \eTeX\ should be capable of doing likewise. We are looking into providing this but in a parameterised manner, so that \stress {all} possibilities between \TeX's current behaviour and that suggested by Don can be achieved. We think that this might be controlled by a parameter called \code{lastlinefit}. \Subsection {Improved typographic quality, cont.} In the same vein, we are looking into ways of allowing better parameterisation of the page-breaking process by having not just one penalty for (say) a club- or widow line but a whole array of such penalties which can reflect the undesirability of leaving one, two, three to $n$ lines at the top or bottom of a page. Other related penalties are also candidates for this process. \Section {Ideas still under discussion} The following ideas are all under discussion but are very unlikely to find their way into \eTeX\ V2: some may be deferred to \eTeX\ V3, and some may never appear at all. Although the group have some idea into which category each of these ideas may fall, it is probably not helpful to go into too much detail here, and so they are all lumped together as `under discussion'. \Subsection {Can \TeX\ find this font?} In `the good old days', a \TeX\ program could count on finding all 76 of the standard \TeX\ fonts no matter where it was run in the world. These days, with many documents being set in exotic fonts from a myriad of sources, it is no longer certain that, just because site~A has font~F, site~B will have the same font. We are therefore considering providing an \code{iffont} primitive which will allow \eTeX\ to ascertain at run-time whether a particular font exists on the system on which the document is being processed. It is not certain at this stage whether this would be a simple `does this font exist?' test, or a more complex `does it exist and is the \TFM\ file for it valid?'. \code{tryfont} has also been suggested as an alternative approach. \Subsection {Maths alignments} Peter Breitenlohner, the implementer of \eTeX, probably typesets more mathematics than the rest of the group put together, and he believes that there is a case for a maths alignment primitive. He has not yet finished his research on this topic, and all that can be said at this stage is that we are considering implementing some form of \code{malign}. \Subsection {Typesetting on a grid} Whilst \TeX\ is \stress {excellent} at typesetting in designs where variable quantities of white space can be allowed to occur, trying to coerce it to set on a regular grid (something at which packages such as Quark Xpress excel) is \stress {far} more difficult. The various macro-based solutions which have been tried do not seem to address the underlying problems, and we are looking at providing an entirely new paradigm within \eTeX\ whereby material being typeset can be caused to `lock on' to a grid at some point in the page-building process. Although at first sight it might be thought that it is the reference points of the lines making up the page which need to lock to this grid, we are fairly certain that this is not always the case, and we are therefore looking at ways of associating one or more `handles' with a particular box. In the degenerate case there will be one handle which is coincident with the reference point of the line, but in more complex cases there may be two, three or ever more handles, each of which will lock on to one line of the grid. Even so, there are also situations in grid-based designs where the grid-lock contraints just have to be violated, and one topic still unresolved is whether it is then better simply to allow the box to float free, or whether it is better to constrain it in some way, perhaps by associating with the handle(s) a degree of flexibility which is in some way analogous to \TeX's current use of the `glue' concept. Subsection {More on improved typographic quality} Another point made by Don during his stay in Brno is that there are situations in which \TeX's (vertical) positioning of elements of mathematical formul\ae\ is less than ideal. He points out that even in typesetting the \TeX book he had to make use of kludges such as \code{sub \bs strut} in order to achieve the best visual effect. We are investigating ways in which the effect (and related effects) could be achieved by better parameterisation of the mathematical typesetting process. \Section {NTS} In the introduction to this paper it was mentioned that the \NTS\ project proper had been put `on ice' until the group had sufficient funds to allow a programmer to be employed full-time to work on the project. It is with great pleasure that I can now report that, as a result of the generosity of \DANTEeV\, the group has DM 30 000 which can be used for this purpose. On the recommendation of Ji\v r\'\i\ Zlatu\v ska, we have made an offer to Karel Skoupy of the Czech Republic, which he has accepted, and Karel will be starting work on \NTS\ during late February 1998. It has been agreed that the language of implementation will be Sun's \JAVA, and Karel's first task (apart from becoming a \JAVA\ expert\dots) will be to draw up a specification for \NTS. Provided that the group agree with his design, he will then start work on implementing \NTS, and we hope to be able to review his work after a further six months. Within one year of commencement we hope to have a working implementation of \NTS, not simply a port of \TeX\ to \JAVA\ but a total re-design intended to emphasise the deep structure of \TeX\ whilst avoiding the design features which make the present system rather difficult to extend or change. The group are still determined that \NTS\ will be 100\% \TeX-compatible, and are confident that it will remain so for at least the first five years of its life. We are less certain whether divergence should then be permitted, in order to add new functionality which is in some way incompatible with \TeX. If we do decide that compatibility must be sacrificed, we will give considerable notice of that decision, and users who must retain the ability to process legacy documents in a manner identical to \TeX\ will be advised to take an archive copy of \NTS\ before compatibility is lost. One exciting idea which the use of \JAVA\ permits is the possibility to integrate access to \CTAN\ (\CNAN?!) with \NTS: it is by no means impossible that \NTS\ might be able to fetch for itself any module which cannot be found on the local system and which is needed in order to process a document. If that becomes a reality, \TeX\ will have become truly integrated into today's (and tomorrow's) globally-networked world. \end{Article}