\MakeShortVerb{+}
\title{\HT: a working standard}
\author[Arthur P. Smith]{Arthur P. Smith\\
Dept. of Chemistry, BG-10,\\
University of Washington, \\Seattle, WA
98195\\\texttt{asmith@mammoth.chem.washington.edu}}
\begin{Article}
\section{Introduction}
\emph{Note: this paper was prepared for the American Physical Society
electronic publishing conference, Los Alamos, N.M., October 14--15, 1994.}
The past year has seen a revolution in the processes of
Internet-based information navigation and retrieval with the
advent of easy-to-use graphical browsers (in particular Mosaic)
based on the World-Wide-Web (WWW). The revolution is a result of two
components --- first the browsers allow a near-uniform (point-and-click
or other method) access to documents in almost any format
and from almost any Internet-based
source, accessed as regular files or via ftp, gopher, http or one of
many other possible methods; along with this the Uniform
Resource Locator (URL) mechanism provides
a surprisingly easy and uniform way to specify the location of any
document on the net. Second, for certain classes of documents
(html files, or gopher text files) embedded URL's or other
addresses are understood to refer to other, external, documents which can
be followed according to the interests of the person viewing the
document, producing an interconnected web of documents.
The goal of the \HT{} collaboration is to extend this second
privileged class of documents to include documents based on \TeX{},
the word-processing language of choice for mathematical and scientific
writing, thus fully incorporating \TeX{} documents into the burgeoning
\textbf{web} of information on the internet.
\section{Why \HT?}
There already exists one approach for incorporating \TeX{} documents
more fully into the \textbf{web} --- conversion to HTML, as in the
program \texttt{latex2html} by Nikos Drakos. This can work very well,
and is already used in some of the electronic publications in
mathematics, but there are also several serious problems with this,
aside from the technical issues associated with the complexity of the
conversion process. HTML by design allows very little author control
of the visual form of a document. This is touted as an advantage
because it preserves only the \emph{essential} elements of a document
and not the artificialities of a page --- in fact HTML documents do
not have pages at all, although some of the sense of a \emph{page} is
implied by separation of a single document into many files. Aside from
loss of author control, there is a practical problem of a lack of
mathematical tools in the current implementations of HTML --- tables
and equations are either difficult to implement or impossible.
\texttt{latex2html} gets around this by conversion of such things to
bitmapped images, but this is an inefficient and expensive process ---
and goes in just the opposite direction of HTML's theme of extracting
the \emph{essence} of a document, making the document essentially
unreadable without a good network connection and a computer with a
high quality display.
These problems with HTML are compounded if scientific authors
attempt to write documents directly in HTML rather than using
\TeX{} first --- the lack of authoring tools, the absence of macro
capabilities, and the ill-defined nature of the language make
this an unpleasant task; just dealing with ordinary text
is easy, but getting Greek letters, mathematical symbols, equations
and tables into your document is not. The one nice feature
of HTML is the ease with which figures can be incorporated into
a document. But at least PostScript figures can be incorporated into
a \TeX{} document with equal ease using modern \emph{dvi} interpreters, and the
\HT{} standard presented here allows arbitrary images and other
external documents to be referred to and brought to the screen
with a single mouse click.
The point of all this is that hypertext capabilities, and the use of
URL's to locate new documents --- the main feature of HTML that makes it
such a useful network information navigation tool --- can be much
more easily incorporated into \TeX{} than the mathematical capabilities
of \TeX{} and the years of experience embedded in various \TeX{} macro
packages can be incorporated into HTML. Whether \TeX{} in general provides
a better model for the viewing of on-line information remains to
be seen.
\section{How does it work?}
The underlying element of our implementation of \HT{} is the use
of a \TeX{} macro that bypasses the \TeX{} interpretation process and
sends a message directly to the \emph{dvi} interpreter that processes \TeX{}
output. This is the +\special+ command, previously used to define
procedures for drawing or including figures in \TeX{} documents. When
the characters +\special{+{\ttfamily\itshape string}+}+ appear in the
\TeX{} document, the \emph{string} is passed directly without
interpretation to the output \emph{dvi} file (preceded by a marker to
identify this as a \emph{special} message to the \emph{dvi} interpreter). The
\emph{dvi} previewers or processors then interpret this string according to
its first few characters. The original \HT{} specification (due
to Paul Ginsparg, Tanmoy Bhattacharya, and me) uses the initial
characters \emph{html:} to denote \HT{} elements in an HTML-like
style. David Oliver (\texttt{oliver@gang.umass.edu}) has introduced a slightly
different specification that uses the initial characters \emph{hyp} to
denote his own style of \HT. I will discuss only the original
specification in this paper, since as far as they are currently
implemented both specifications are essentially equivalent. Note that
\emph{dvi} interpreters that do not understand the \emph{html:} or \emph{hyp}
special commands will ignore them, or at worst print out warning
messages. Therefore \emph{dvi} files processed to include \HT{}
commands are fully compatible with old \emph{dvi} interpreters.
After the initial \emph{html:} string, the specification is identical
to a restricted form of HTML. The five arguments we have added to
the +\special+ command are:
\begin{description}
\item[href:] +html:<a href="href_string">+
\item[name:] +html:<a name="name_string">+
\item[end:] +html:</a>+
\item[image:] +html:<img src="href_string">+
\item[base\_name:] +html:<base href="href_string">+
\end{description}
The \emph{href}, \emph{name} and \emph{end} commands are used to do
the basic hypertext operations of establishing links between sections
of documents. The \emph{image} command is intended (as with current
HTML viewers) to eventually place an image of arbitrary graphical
format on the page in the current location. Currently for \XHDVI,
\emph{image} brings up an external viewer with the image, if such
a viewer is available. The \emph{base\_name}
command should be used to communicate
to the \emph{dvi} viewer the full (URL) location of the current document so that
files specified by relative URL's may be retrieved correctly.
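As a minimal sketch of the last two commands, assuming they take the
HTML-like forms +<base href="...">+ and +<img src="...">+, a document might
contain the following. The host and file names are placeholders invented
for this illustration, not real locations.
\begin{verbatim}
% Tell the dvi viewer where this document lives (placeholder
% URL), so that relative URLs can be resolved against it.
\special{html:<base href="http://www.example.org/papers/paper.dvi">}

% A relative image reference (placeholder file name).
\special{html:<img src="fig1.gif">}
\end{verbatim}
With the base set as above, a viewer that resolves relative URL's in the
usual way would look for the image at
+http://www.example.org/papers/fig1.gif+.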
The \emph{href} and \emph{name} commands must each be paired with an \emph{end} command later in
the \TeX{} file --- the \TeX{} commands between the two ends of a pair
form an \emph{anchor} in the document. In the case of an \emph{href}
command, the \emph{anchor} is to be highlighted in the \emph{dvi} viewer, and
when clicked on will cause the scene to shift to the destination
specified by \emph{href\_string}. The \emph{anchor} associated with a
name command represents a possible location to which other hypertext
links may refer, either as local references (of the form
\texttt{href="\#name\_string"} with the \emph{name\_string}
identical to the one in the name command) or as part of a URL (of the
form \emph{URL\#name\_string}). Here \emph{href\_string} is a valid
URL or local identifier, while \emph{name\_string} can be any string at
all, with two caveats: `+"+' characters should be escaped with a
backslash (+\+), and a name that looks like a URL may cause
problems. There may also be problems if \LaTeX\ tries to interpret the
\emph{href\_string} or \emph{name\_string} --- in that case preceding
the command with +\protect+ should usually work. Any defined
\emph{name\_string} can be referred to in any href referring to the
document, in the form \texttt{href="URL\#name\_string"}. Note that
anchors may be nested. The only restriction in current implementations
is that anchors are truncated at page boundaries.
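The pairing rules above can be illustrated with raw specials. This is
only a sketch, assuming the anchor specials take the HTML-like forms
+<a name="...">+, +<a href="...">+ and +</a>+; the name +result+ is
invented for the example. Because standard \TeX{} writes a bare +#+
character twice when it outputs a +\special+, the sketch uses
+\string#+ to obtain a single one in the local reference.
\begin{verbatim}
% A name anchor: a place that other links can point to.
\special{html:<a name="result">}Our main result\special{html:</a>}
is derived in the next section.

% An href anchor referring to the name anchor above.
% \string# yields a plain "#" character inside the \special.
As shown in \special{html:<a href="\string#result">}the main
result\special{html:</a>}, the two approaches agree.
\end{verbatim}
A \emph{dvi} interpreter that does not understand these specials should
simply typeset the enclosed text as usual.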
Because this html-based naming scheme is somewhat unwieldy, although
very general, Tanmoy Bhattacharya (\texttt{tanmoy@qcd.lanl.gov}) has
written several collections of \TeX{} macros to simplify things. The
basic package is \emph{hyperbasics.tex}\footnote{{\ttfamily
http://nqcd.lanl.gov/people/tanmoy/hypertex/hyperbasics.tex}}
which defines the following simple low level hypertex macros:
\begin{itemize}
\item+\href{url}{text}+: text becomes an href anchor referring to \emph{url}.
\item+\hname{myname}{text}+: text becomes a name anchor with name
\emph{myname}.
\end{itemize}
plus others that are used to automatically convert \LaTeX\ or other
style markup into corresponding names and references.
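To make the idea concrete, here is a sketch of how two such macros could
be built on the raw specials. The names +\myhref+ and +\myhname+ are
hypothetical stand-ins chosen for this illustration; the actual
definitions in \emph{hyperbasics.tex} may well differ, and the URL is a
placeholder.
\begin{verbatim}
% Hypothetical stand-ins for \href and \hname (not the real
% hyperbasics.tex definitions).
% #1 is the URL or name, #2 is the anchor text.
\def\myhref#1#2{\special{html:<a href="#1">}#2\special{html:</a>}}
\def\myhname#1#2{\special{html:<a name="#1">}#2\special{html:</a>}}

% Usage: mark a target, then link to an external document.
\myhname{intro}{Introduction}

See the \myhref{http://www.example.org/paper.dvi}{companion paper}.
\end{verbatim}
Wrapping the specials in macros keeps the opening and closing halves of
each anchor together, so an unpaired anchor cannot be produced by accident.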
\section{How do I use it?}
\subsection{As a reader}
There are currently two \emph{dvi} interpreters that understand the
\HT{} +\special+s: \XHDVI{} for X windows, and HyperTeXView.app
for NextStep. We are proceeding with work on
a \emph{dvi}-pdf converter that understands \HT{}, and we are
encouraging work on \emph{dvi} previewers
or \TeX{} authoring tools for Macintosh and PC that incorporate
\HT{} elements.
For a \TeX{} document that has already been processed to a
\emph{dvi} file with \HT{} elements, viewing the internal hypertext
is almost trivial --- you just fire up the \emph{dvi} previewer and navigate
by button clicks as with Mosaic or other WWW browsers. To have
\XHDVI, for example, brought up automatically from Mosaic when
a \emph{dvi} document is referenced, you need a +.mailcap+ file
in your home directory that contains the line:
\begin{verbatim}
application/x-dvi; xhdvi %s
\end{verbatim}
Your machine must already have the \TeX{} essentials on board of course --- in
particular the pk font files, and the location of those font files
needs to be communicated to the previewer. If xdvi is already working
for you, \XHDVI{} should work too. Details for getting \XHDVI{} working on
your machine are provided below.
For jumping to external documents from within the hypertexted \emph{dvi} file,
a couple of additional elements are needed, also described
below for the case of \XHDVI.
\subsection{As an author}
Here is where the power of \TeX's macro capabilities appears.
A working internal hypertext document can be made from a \LaTeX\
document with a one-line addition to the file, using Tanmoy Bhattacharya's
hypertex macros. These macros convert the standard \LaTeX\ markup
into hypertext links between the different sections of the document,
so that references to equations, tables, footnotes, and section
headings are in place, and bibliographic references and figures
refer back at least to the bibliography entry or figure caption.
These in turn may be set to refer to corresponding external documents
but this process is not automatic --- currently the author will have
to add these references by hand, although automatic procedures
can be envisioned. With an Internet connection, \XHDVI{} can be used
to preview the document and check that the references actually work,
before the document is submitted to the archives.
The macros developed thus far use standard naming conventions
for the underlying structures in \LaTeX\ and other standard
macro packages, so that appending \#equation.2.3, \#page.7,
\#figure.4, \#table.2, etc. to the URL for any \TeX{} file processed
with these packages will go to the right place, allowing
easy hypertext reference to the internal structure of other
documents.
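For instance, a link to equation (2.3) of another document might be
written with raw specials as in the following sketch, which assumes the
href special takes the HTML-like form +<a href="...">+. The URL and the
macro name +\paperurl+ are placeholders invented for the example, and
+\string#+ is used so that a single +#+ reaches the \emph{dvi} file.
\begin{verbatim}
% Placeholder location of another HyperTeX-processed document.
\def\paperurl{http://www.example.org/paper.dvi}

% Link to its equation (2.3) via the standard anchor name.
see \special{html:<a href="\paperurl\string#equation.2.3">}%
Eq.~(2.3)\special{html:</a>} of the companion paper.
\end{verbatim}
The same pattern would apply to the other standard names, such as
+page.7+ or +figure.4+.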
In order to get started, however, you need to place these macro files
in one of the standard areas that your \TeX{} looks for input files
(you can modify your TEXINPUTS environment variable to get it to look
in your own directories). The needed macro files are itemized in the
\HT{} introductory document at
\URL|http://xxx.lanl.gov/hypertex/index.html\#more| and can be
obtained in one lump by anonymous
ftp.\footnote{\relax\texttt{ftp://snorri.chem.washington.edu/hypertex/hypermacros.Z}}
\subsection{As an e-print manager}
Since we currently only have \emph{dvi} previewers, an e-print server
would have to serve the documents in pre-processed \emph{.dvi} form.
This means converting documents to \HT{} if the author
has not already done this, and possibly applying automated
insertion of URL's corresponding to references in the
bibliographic section. The manager could do this by hand
but it might be rather time-consuming.
For ease of use, the best way to serve the documents
is probably as a combined package of \emph{dvi} and PostScript files that
go together. This requires the e-print manager to create
a new content-type associated with this package, and to supply
an unpackaging program for the reader to place in their +.mailcap+
file, which automatically calls up \XHDVI{} or another \HT{}
browser on the resultant main \emph{dvi} file. The reason for doing this
is that .ps files included by standard macros
will not generally be understood as remote documents, at least at
the current level of previewer capabilities.
Another option in this unpackaging method is to supply the \TeX{} file
itself, pipe it through a simple converter to \HT{} and through
\TeX{} itself, and then call one of the \HT{} viewers. These approaches
are already in use at some locations (e.g., CERN).
When the pdf converter is available, the entire document should
come as a single pdf file, unless the document refers to non-PostScript
images or other inclusions in which case the packaging approach (or
use of absolute URL's) remains necessary.
\section{How do I get it?}
Currently the following are available:
\begin{enumerate}
\item A \HT{}
viewer\footnote{\texttt{ftp://snorri.chem.washington.edu/hypertex/xhdvi\_0.6.tar.Z}}
based on xdvi-18, modified by Arthur Smith. Precompiled versions
for various UNIX architectures are available in the same directory.
\item HyperTeXview.app,\footnote{\texttt{dmitri@physics.stanford.edu}}
courtesy of Dmitri Linde (also the author of InstantTeX.app) for
NextStep, precompiled for Motorola and Intel-based NeXT
machines.\footnote{See
\texttt{http://xxx.lanl.gov/hypertex/index.html\#dvi} for
availability.}
\end{enumerate}
The macro and style files listed above,
by Tanmoy Bhatta\-charya, are available at
\URL|ftp://nqcd.lanl.gov/people/tanmoy/hypertex|.
\section{Details on \protect\XHDVI}
\XHDVI{} retains all the features of the latest version of xdvi
(version 18) and adopts in addition many of the hypertext features
of Mosaic, the most popular WWW browser. Hypertext links are
underlined or altered in colour (the underlining can be turned off)
and a left-mouse click on a link causes the view to shift to
the destination point for the link, as long as the destination
is another \emph{dvi} file. If the link is not to a \emph{dvi} file, an external
viewer is employed, following the mime and mailcap definitions or
using standard defaults if those are not locally defined.
A middle mouse click on a link brings up a new viewer whether
or not the destination is a \emph{dvi} file --- this is intended to be useful
to refer back to equations or to bring up footnotes, since the new
\emph{dvi} window is small. There are also a large number of keyboard
accelerators, all described in detail in the man page.
In general, see the installation notes provided with \XHDVI.
In outline what is needed is:
\begin{enumerate}
\item The compiled \XHDVI{} program --- precompiled binaries are available for
Sun, NeXT, SGI, HP, IBM RS6000, or you can get the source and compile
it yourself. Let me know of any compilation troubles --- it's written in C.
\item The \TeX{} fonts, at least in pk format. If xdvi, dvips or some other
\emph{dvi} interpreter is working on your machine then the fonts must be around
somewhere.
\item Set up the connections between the Web browser and \XHDVI. If you
use Mosaic, for example,
\begin{verbatim}
setenv WWWBROWSER /usr/local/bin/mosaic
\end{verbatim}
will let \XHDVI{} know what to send HTML files to. To let Mosaic know to
bring up \XHDVI{} for any \emph{dvi} files, you need to amend
your +.mailcap+ file as described above.
\item The application defaults file for \XHDVI{} should be installed in
the standard application defaults directory on your machine, or you
can take lines from it and modify them for your own taste and put them
in your +~/.Xdefaults+ file. For example I use the following resource
specifications to get a particular size and position of the window
with white on black lettering and with the hyperlinks in cyan, and
to remove the buttons:
\begin{itemize}
\item[\null] xhdvi*geometry: 800x600-0-0
\item[\null] xhdvi*foreground: white
\item[\null] xhdvi*background: black
\item[\null] xhdvi*highlight: cyan
\item[\null] xhdvi*expert: true
\end{itemize}
\item You need to have the \textsf{ghostscript} program on your machine and in
your default execution path in order to view PostScript from \XHDVI.
Similarly, other viewers defined in the +.mailcap+ file should be
available on the machine.
\item You need to install the man page xhdvi.man in
\texttt{/usr/local/man/man1} and add \texttt{/usr/local/man} to your
MANPATH environment variable in order for \emph{help} to work from
\XHDVI.
\end{enumerate}
\section{Some examples}
This document is available in raw \HT{} format and in converted
\emph{dvi} format via anonymous ftp at the address
\URL|ftp://snorri.chem.washington.edu/hypertex|. The \HT{}
version of this paper uses the two-column APS journal style of revtex.
The table of contents at the beginning is generated automatically with
the \LaTeX\ +\tableofcontents+ command.
See also the examples provided by Paul Ginsparg in
the \HT{} introductory document at
\URL|http://xxx.lanl.gov/hypertex/index.html|. Some of these are
files randomly selected from the HEP archive, including \LaTeX,
Rev\TeX, and other formats.
\section{What still needs to be done?}
Unfortunately, at this point reference to networked files
(via URL's) suffers from a couple of problems. \XHDVI{} does not
yet include any of the network transport code that ordinary
WWW browsers use; the intention was to avoid adding
this layer of complexity by instead communicating back and forth
with a WWW browser. However, such communication is as yet
not standardized, and suffers from its own problems. So currently,
when \XHDVI{} comes across a URL reference, it forwards it directly
to the WWW browser (defined by environment or Xresource variables)
so that a reference to an external \emph{dvi} file would bring up a
new instance of the WWW browser which would in turn bring up a new
\XHDVI{} viewer. This is a rather inelegant solution, but it is
perhaps sufficient at the moment. A better solution will come along,
and it may simply be inclusion of network transport code in the \XHDVI{}
viewer itself, to make it a competing WWW browser\ldots
The other problem is that if brought up by a WWW browser, \XHDVI{}
is not provided with the absolute URL information used in
obtaining the \emph{dvi} file it is working on, and so cannot pass
this information on to further instances. Therefore, relative
URL's in a \HT{} document (unless they can be guaranteed to be to local
files that would have been transported along with the \emph{dvi} file) will not work.
Both of the above are problems intrinsic to current WWW browsers, and
we are working on promulgating solutions to these.
\section{How do I stay in contact?}
The Hypertex discussion group is a mailing list based at
\FTP|snorri.chem.washington.edu| which I maintain. Send me
e-mail if you want to join the list, or send queries directly to the
mailing list: \Email|hypertex@snorri.chem.washington.edu|.
\DeleteShortVerb{+}
\end{Article}