\def\CMR#1{{\fontfamily{cmr}\fontencoding{OT1}\selectfont#1}}
\iffalse
Dear Sebastian 29 May 1994
Here is an article for Baskerville.
To make your life easier, why don't I promise to add or subtract
material so that it occupies exactly two pages.
I need to send you material regarding consultants list.
As I am responsible for any errors of fact or exposition, if you need
to edit it for style or content, I would like that you send the
revised version to me for approval, and if possible consult with me
before making changes.
with best regards
Jonathan
\fi
\title{Backslash---Expansion of macros and so forth}
\author[Jonathan Fine]{Jonathan Fine\\\texttt{J.Fine@uk.ac.cam.pmms}}
\begin{Article}
\noindent
It is usual, in programming languages which admit compilation (such
as {\it C}, BASIC and Pascal), for there to be a rigid and inviolable
separation between code and data. It is possible for an interpreted
BASIC program to write a program source file which is then loaded and
run, but such is rather bad form. The same separation generally
applies to Smalltalk, which is probably the most sophisticated of the
interpreted languages. (My knowledge of LISP is limited. May its
supporters please note that my endorsement of Smalltalk is, for the
purposes of this column, a personal opinion only).

\TeX, however, has no inbuilt distinction between code and data. As
far as it is concerned, all is just one long sequence of varying
types of tokens. This will be made clearer later. It is not as if
there is one stream from which instructions are drawn, and another
from which data is drawn. It is usual for compiled programming
languages to have a ``\verb"GOTO"'' mechanism (usually implicit
within loop and conditional constructs, and also subroutine and
function calls) that allows forward and backward jumps within the
code stream, which is in fact more like a heap of tiny sequences of
instructions linked by random access pointers.

Why am I saying all this? Most beginners expect \TeX\ to behave like
other programming languages. Up to a point it does, particularly if
all one wishes to do is write a simple replacement text macro, or set
the values of some registers or parameters. But when it comes to
reading data from within a macro it definitely does not, and here
beginners generally become unstuck, in the sense of losing their
grip and running off the rails.
%% deletable
In another sense, of course, they become stuck. You pays your money,
you takes your choice.
%%
I know that I had these problems six years ago when I started with
\TeX. While the {\em \TeX book} explained to me how \TeX\ behaved,
it did not give examples to clearly dispel my wrong prejudices.
%% deletable
(If you have prejudices or habits, may they be beneficial.)
%%
Hence this article. Most people have some experience of writing a
program, even if only a humble batch file for use with MS-DOS.

It is a simplification, which does no harm for the purpose of this
article, to imagine the input stream to \TeX\ as being one enormously long
list of tokens. Change of category codes, \verb"\input" and
\verb"\endinput" commands, and also the \verb"\openin" and
\verb"\read" commands do not fundamentally alter this point of view.
If a format file or some macros have previously been loaded (and such
usually has been) then some of these tokens will be macros (or more
exactly will have macro meaning when executed) and will thus
influence the subsequent operation of \TeX.

It is now time to announce the fundamental law on the expansion of
\TeX\ macros. Suppose a \TeX\ macro in the input stream (usually but
not necessarily at the very head of the stream) is expanded. The
effect of this expansion is to alter or edit the input stream, in a
very specific manner. This is explained on [203]
(this means page~203 of {\em The \TeX book}).
Once the parameter text, if any, has been read, and the replacement
text, if any, has been put in its place, the expansion of the macro
is at an end. It is done, over, finished, and no more. However, for
the purposes of error reporting \TeX\ keeps a note of how the
replacement text came to arise. We will see the use of this later.
This information however in no way affects subsequent error-free
execution. As far as \TeX\ is concerned, it is just as if it had
been presented at this stage with the given amended input stream.
Processing by \TeX\ now continues with the current state and the new
stream of tokens.

Here is an example. Plain \TeX\ defines
\begin{verbatim}
\def\centerline #1{\line{\hss #1\hss}}
\end{verbatim}
and so the expansion of
\begin{verbatim}
\centerline{<text>}
\end{verbatim}
is
\begin{verbatim}
\line{\hss <text>\hss}
\end{verbatim}
and that's it. This is the end of the expansion of the
\verb"\centerline" macro. It so happens that \verb"\line" is also a
macro
\begin{verbatim}
\def\line{\hbox to \hsize}
\end{verbatim}
and so we obtain
\begin{verbatim}
\hbox to \hsize{\hss <text>\hss}
\end{verbatim}
as a subsequent stage from the \verb"\centerline" command. The token
\verb"\hbox" refers to a primitive \TeX\ command, which is now
executed. Note that if there are control sequences in the
\verb"<text>", they will not be executed until \TeX\ is
processing the contents of the \verb"\hbox".
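For instance, if the \verb"<text>" were, say, \verb"\bf Title", the
stages would be
\begin{verbatim}
\centerline{\bf Title}
% expands first to
\line{\hss \bf Title\hss}
% and then to
\hbox to \hsize{\hss \bf Title\hss}
% and \bf takes effect only when TeX builds the contents of the \hbox
\end{verbatim}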
If there is a misspelt control sequence within the \verb"<text>",
\TeX\ will produce one of its famous multiline error messages, saying
that within the expansion of \verb"\centerline" there was an
expansion of \verb"\line", within which there was an expansion of the
misspelt control sequence. But because it is misspelt, and thus
presumably unknown, the attempt at expansion produces an error
message instead. Knuth has new
users run through precisely this situation [33]. Did you follow his
advice and typeset the story about R.~J. Drofnats? I confess that I
did not.

The expansion of a macro results in a change in the input stream of
tokens. Let us use the word `performance' to mean the final result of
the expansion and execution of the macro and of the tokens contained
within, and perhaps their performance also.
The expansion of \verb"\centerline" is as above. The execution is to
set text in a horizontal box of width \verb"\hsize" and centered.
Beginners may be frightened by the line of code
\begin{verbatim}
\setbox 0=\centerline{Title}
\end{verbatim}
but experts will know that this is in fact legitimate, and for why.
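The reason, in outline, is that \TeX\ expands macros while it is
looking for the box that must follow the \verb"=", so that by the time
a box is actually needed only the primitive \verb"\hbox" remains:
\begin{verbatim}
\setbox 0=\centerline{Title}
% \centerline expands, giving
\setbox 0=\line{\hss Title\hss}
% \line expands, giving
\setbox 0=\hbox to \hsize{\hss Title\hss}
% which is an ordinary box assignment
\end{verbatim}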

Let us now move on to loops. I know that such things are avoided by
all except those with tendencies to ovine larceny
%% deletable
(I'm struggling to fill the white space at the end of the article)
%%
but just suppose we wish to read a sequence of letters and---oh
horror---put a small space between each and the next.
There are many ways to do this
(letter space, not steal sheep).
Without a context there is no right or wrong, although the more
bizarre solutions are more amusing and instructive of human
psychology than useful. Without further ado, let's have some
examples.

My favourite is admirable in its simplicity. Here it is.
\begin{verbatim}
\def \spaceit #1{#1\littlespace\spaceit}
\end{verbatim}
We assume that \verb"\littlespace" will produce a small space, say by
a kern.
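For definiteness, one possible definition (the amount of space is no
more than a guess at something pleasing) is
\begin{verbatim}
\def \littlespace {\kern 0.08em }
\end{verbatim}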
Let's see it in operation. The performance of
\begin{verbatim}
\spaceit Baskerville
\end{verbatim}
begins with the expansion of \verb"\spaceit"
\begin{verbatim}
B\littlespace \spaceit askerville
\end{verbatim}
and then the \verb"B" and \verb"\littlespace" are performed
(\ie typeset and added to the current horizontal list), leaving
\begin{verbatim}
\spaceit askerville
\end{verbatim}
which now proceeds as before. This is called ``tail recursion'' by
computer scientists [219]. It is an elegant way of repeating a story
(Groan).

All things, even \verb"Baskerville", will come to an end. We need to
find a way of persuading \verb"\spaceit" to stop. One way to do this
is to place a sentinel at the end of \verb"Baskerville", for which
\verb"\spaceit" can test with each iteration. I will show how to do
this next month.
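Just to give the flavour, here is one common shape for such a test; it
peeks at the next token with \verb"\futurelet", it replaces the
one-line \verb"\spaceit" above, and the auxiliary names are merely
illustrative.
\begin{verbatim}
\def \endspaceit {\endspaceit}% the sentinel; never expanded
\def \spaceit {\futurelet\next\checkspaceit}
\def \checkspaceit {%
  \ifx\next\endspaceit
    \expandafter\gobbleone   % sentinel found: remove it and stop
  \else
    \expandafter\spaceone    % otherwise space out one more letter
  \fi}
\def \gobbleone #1{}
\def \spaceone #1{#1\littlespace\spaceit}
\end{verbatim}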

Testing for the sentinel takes time. In some situations it is better
to take a more active approach. Let us look at this. We want
\begin{verbatim}
\endspaceit
\end{verbatim}
to break the \verb"\spaceit" loop, so that
\begin{verbatim}
\spaceit Baskerville\endspaceit
\end{verbatim}
will insert all those \verb"\littlespace"s.
The penultimate expansion of \verb"\spaceit" is
\begin{verbatim}
\spaceit e\endspaceit
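% which expands to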
e \littlespace \spaceit \endspaceit
\end{verbatim}
and once the `\verb"e"' and the \verb"\littlespace" have been done we
have
\begin{verbatim}
\spaceit \endspaceit
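% which expands to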
\endspaceit \littlespace \spaceit
\end{verbatim}
and now we go for a dirty trick. With the definition
\begin{verbatim}
\def \endspaceit \littlespace \spaceit {}
\end{verbatim}
the expansion of the previous line is
\begin{verbatim}
% empty
\end{verbatim}
which is just what we want. There we are, a loop without the use of
any of the control primitives. (It is worth noting that the so-called
{\em expansion\/} of a macro might be {\em smaller\/} than its
arguments, or even zero.)
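
Gathered in one place, with \verb"\littlespace" as sketched earlier,
the whole device is just
\begin{verbatim}
\def \spaceit #1{#1\littlespace\spaceit}
\def \endspaceit \littlespace \spaceit {}
% so that, for example,
\spaceit Baskerville\endspaceit
% letterspaces the word, leaving one trailing \littlespace
% after the final letter (see Exercise 6 below)
\end{verbatim}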

Finally, solutions and exercises.

\noindent
{\bf Solution 3.}
{\em Two tokens have the same meaning. When does the substitution of one
for the other make a difference?} For definiteness suppose that we
\begin{verbatim}
\let \RELAX \relax
\end{verbatim}
and then replace some occurrence of \verb"\relax" by \verb"\RELAX". I
know that this example is unlikely, but it serves to express the
solution to the problem. It will make a difference in the following
situations. Firstly,
\begin{verbatim}
\string \relax
\end{verbatim}
and secondly any assignment such as
\begin{verbatim}
\let \relax \something
\def \relax { ... }
\end{verbatim}
and finally
\begin{verbatim}
\def \macro { ... \relax ... }
\end{verbatim}
should an \verb"\ifx" or \verb"\meaning" be subsequently applied to
\verb"\macro", and as far as I know, that's it.
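Here, for illustration, are the first and last of these situations in
miniature (the names \verb"\one" and \verb"\two" are of no
significance):
\begin{verbatim}
\let \RELAX \relax
% \string\relax  produces the six characters  \relax
% \string\RELAX  produces the six characters  \RELAX
\def \one {\relax}
\def \two {\RELAX}
% \ifx \one \two  is false, and \meaning\one differs from
% \meaning\two, although \relax and \RELAX mean the same thing
\end{verbatim}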

\noindent
{\bf Solution 4.}
{\em What operational difference is there between
\begin{verbatim}
\def\aaa{aaaaaaaa}
\def\xyz{aaaaaaaa}
\end{verbatim}
and
\begin{verbatim}
\def\aaa{aaaaaaaa}
\let\xyz\aaa
\end{verbatim}
if any at all\/} was the problem. Macros need memory for their
storage, and [383] tells us how much. The second variant will
require less main memory (and make for quicker \verb"\ifx" tests I
presume) than the first. This is because the \verb"\let" command
[206--7] sets the meaning of the first argument (\verb"\xyz") to be
whatever the current meaning of the second (\verb"\aaa") is. \TeX\
stores meanings in its memory. The \verb"\let" command sets the
meaning pointer for \verb"\xyz" to be equal to (and so point to the
same meaning as) the meaning pointer for \verb"\aaa". Moreover, if
the code above itself appears in a macro, this macro will require
less storage {\em and\/} execute quicker when the second variant is
used.
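A small check, by the way: with either variant the two control
sequences compare equal under \verb"\ifx", which looks at the stored
meanings rather than at the names; the difference lies in the storage
and, presumably, in the speed.
\begin{verbatim}
\def \aaa {aaaaaaaa}
\let \xyz \aaa
% \ifx \xyz \aaa  is true: one stored text, two pointers to it
\def \xyz {aaaaaaaa}
% \ifx \xyz \aaa  is still true: two separate but identical texts
\end{verbatim}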

\noindent
{\bf Exercise 5.}
This comes from the excellent {\em Around the Bend\/} puzzle column
run by Michael Downes of the American Mathematical Society (email
{\tt mkd@math.ams.org}). The problem is to write a macro which will
trim the leading and trailing spaces from user supplied text, such as
the parameter text to \verb"\centerline" or \verb"\section".

\noindent
{\bf Exercise 6.}
When unexpandable commands are inserted between the letters of a word
the kerning and ligatures are lost [19, Exercise 5.1]. Compare
`WAW' to `W\relax A\relax W'. The second has had \verb"\relax"
commands inserted between the letters. Clearly, high class letter
spacing (should there be such a thing) will respect the kerning
information in the original font. For ligatures it is not so clear,
and certainly harder. The exercise is to deal with this kerned
letterspacing problem. And while you're at it, how do we deal with
the trailing \verb"\littlespace" that \verb"\spaceit" will leave at
the end of \verb"Baskerville"?
\end{Article}
\endinput