\MakeShortVerb{\|}
\title{Portable Documents: What Next?}
\author{Frank Harwood}
%(omits Klaus' talk, as I arranged with him: since Tardival
%provided her own report, you could also omit his summary of that)
\begin{Article}
\subsection{Introduction}
This article is a summary of the conference held on 15~February 1996
by the British
Computer Society Electronic Publishing Specialist Group in
conjunction with \ukt, continuing the `portable documents' theme.
The two preceding articles, on Euromath and Hyper-G, cover papers
presented at this conference, so those papers are not summarised
here.

\section{What Next?}
\subsection{Les Carr, University of Southampton}

``Now we have them, how do we use them/maximise the benefit?''
was the theme of Les's talk.  We have available a spectrum of
portable formats, from \acro{PDF}, with high visual fidelity, to \acro{SGML},
which preserves the logical structure of the content without reference to
appearance.  (An intriguing question was floated and left
open: we have a large measure of portability
between systems, but do we have temporal portability?  Can I
read today's electronic document in 30 years' time?)

Documents are now available from around the globe and, to be
most useful, need links between them.  Hypertext-type linking
is found in everything from personal systems (e.g., Guide, HyperCard) to
global systems (e.g., \acro{WWW}), but to a degree all are `closed': each
uses a non-universal, and therefore `proprietary', markup which
does not extend into other people's systems.

The Microcosm Model (at Southampton) separates the document
control system (how to produce and display a document) from the link
control system.  A link may be any type of relationship between
documents, and flexible link definitions are allowed for.  The
Open Journals Project, funded by \acro{JISC}, applies Microcosm
technology to the \acro{WWW}.  It makes it possible to integrate on-line
journals with each other and with various on-line databases
and teaching resources.  The concept of a document becomes
very broad indeed, and the databases of links (linkbases)
become value-added commodities in their own right.  Linkbases
are configurable for different levels and purposes.  This
could be seen as opening a new publishing idiom in which the
various `closed' technologies mentioned above become local and
short-term solutions.

See it at \url{http://journals.ecs.soton.ac.uk/}

\section{Converting from \LaTeX\ to \acro{SGML}}

\subsection{Sebastian Rahtz, Elsevier Science}

Sebastian gave an in-depth review, heavily illustrated with
examples, discussing the problem and various feasible
approaches, and demonstrating the results produced so far at
Elsevier (the package is not in the public domain).

The publisher faces a large community using \LaTeX, a mature
notation and free typesetting system, well suited to
scientific and multi-lingual work.  Unfortunately, it is not
what the publisher uses, it is not an international standard, and
it cannot be converted for various purposes in the way that \acro{SGML} can.

Four practical approaches were mentioned:
\begin{enumerate}
\item Throw away electronic file and retype.
\item Strip out \TeX\ coding and treat as unknown word processor.
\item Write program to parse \LaTeX\ and output \acro{SGML}.
\item Re-implement \TeX\ to output \acro{SGML} codes.
\end{enumerate}
of which the first two were not discussed.

The parser approach has been implemented in a number of ways but
can only be partially successful because \TeX\ is macro-based
with extensible syntax.  All results so far require tidying up
afterwards.  To implement approach 4, three methods have been
used:
\begin{enumerate}
\item Replace the \TeX\ backend.
\item Rewrite \TeX\ in a new language -- this has been done in \acro{LISP}.
\item Write \acro{SGML} code to the dvi file and extract it from there -- the
method used by Elsevier.
\end{enumerate}
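
The difference between approaches 3 and 4 can be illustrated with a
minimal sketch.  It is not the Elsevier package (which is not in the
public domain): the macro names |\vect|, |\sgmlopen| and |\sgmlclose|,
the |sgml:| prefix and the element name |emph| are all assumptions made
for illustration.
\begin{verbatim}
% (a) Why parsing is hard: an author can extend the syntax at will.
%     A converter that only knows standard LaTeX cannot tell how many
%     arguments \vect takes, or what it means, without expanding it.
\newcommand{\vect}[1]{\mathbf{#1}}

% (b) Writing SGML to the dvi file: TeX itself expands every macro,
%     and \special passes the tags through to the dvi file untouched.
%     The `sgml:' prefix and the element name `emph' are assumptions.
\newcommand{\sgmlopen}[1]{\special{sgml:<#1>}}
\newcommand{\sgmlclose}[1]{\special{sgml:</#1>}}
\renewcommand{\emph}[1]{\sgmlopen{emph}\textit{#1}\sgmlclose{emph}}
\end{verbatim}
A separate program would then have to read the dvi file and collect the
|\special| strings, together with the text typeset between them, to
assemble the \acro{SGML} document; that step is not sketched here.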

The work done was described in detail, stressing the importance
of the target \acro{DTD} and the richness of the \LaTeX\ source, and
highlighting a number of pitfalls.  The acid test is that it works and
real scientific papers can be translated, though human
intervention at some level is frequently needed to perfect the
end product.

For more detail, see \TUB\ 16.3.

\section{\acro{SGML} is here}

\subsection{Andrew Dorward and Neil Bradley, Pindar}

Substituting for the speaker originally planned, Andrew and
Neil gave a lightning rendition of the \acro{SGML} story -- %
principles, implementation considerations, current
developments and criteria for use or rejection!

The principles of \acro{SGML} are well understood within the group.  It
is an open system, defined by \acro{ISO}\,8879 (1986).  There has been
a recent expansion of interest, triggered by \acro{HTML}, which is restricted
and non-open.  The current state of play is that \acro{SGML} is used
for many more `pages' than \acro{HTML}, but has a lower profile.  Recent
developments, particularly \acro{DSSSL} and HyTime, provide
enhancements to the use of \acro{SGML}.  Although \acro{SGML} is independent
of any software publisher, there are numerous products around -- parsers,
editors and \acro{DTD} tools -- and in a production
situation it makes sense to adopt the best available.   The
new Frame \acro{SGML} software suite, just out of beta-test and
becoming available now, was strongly recommended.

Implementation considerations were discussed using the markup of
an article as an example.  Although the considerations are the
same whatever the task, the use of appropriate software can
aid efficiency, reduce errors and give more options on the use
of tagged data and the control of style.

Take account of:
\begin{description}
\item[``Granularity''] -- how far to break down the material.  High
granularity means a fine breakdown.  This adds value to the
information at a price (in effort), and the level should be chosen
to suit the purpose.

\item[``Hierarchy''] -- markup objects can contain other markup
objects.  Many levels are possible.  Again, the depth needs to
be set to suit the material and the purpose.

\item[``Attributes''] -- e.g., distinguishing |<name type="personal">| from
|<name type="company">| enables separate listings.

\item[``Hypertext''] -- the setting up of cross-reference jumps
from one point in the text to another, internal or
external to the document or database.  Here again the
software used can greatly facilitate the process.

\item[\acro{DTD}] -- the Document Type Definition controls granularity,
quality, optional/mandatory items, alternatives,
sequences, element names.  A visual \acro{DTD} tool such as
``Near and Far'' makes for faster and more accurate
production of \acro{DTD}s.

\item[Authoring Environment] -- may be structured (software from
Frame, SoftQuad), where the authoring process is
interactively constrained according to the \acro{DTD}, or
`loose' (software from Microsoft or others, checked afterwards by
out-of-line parsers).
\end{description}
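
These considerations can be made concrete with a small sketch.  The
fragment is not taken from the talk: every element and attribute name
in it is an assumption chosen for illustration.  The \acro{DTD} fixes the
hierarchy, the sequence of elements and which items are mandatory;
marking names inside paragraphs is a choice of fine granularity; and
the |type| attribute separates personal from company names.
\begin{verbatim}
<!DOCTYPE article [
<!-- Hypothetical DTD: all names here are illustrative assumptions. -->
<!ELEMENT article - - (title, author+, para+)>
<!ELEMENT title   - - (#PCDATA)>
<!ELEMENT author  - - (name)>
<!ELEMENT para    - - (#PCDATA | name)*>
<!ELEMENT name    - - (#PCDATA)>
<!ATTLIST name    type (personal | company) #REQUIRED>
]>
<article>
<title>An Example Article</title>
<author><name type="personal">A. N. Author</name></author>
<para>The work was supported by
<name type="company">Example Ltd</name>.</para>
</article>
\end{verbatim}
With markup of this kind, a list of contributors or of companies
mentioned could be extracted by selecting |name| elements according to
the value of |type|.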

There is a checklist of reasons to adopt \acro{SGML}.  With two ticks
against the list, \acro{SGML} should be considered; with four,
it would be very foolish not to use it.
\begin{itemize}
\item[--]long shelf life data? (\acro{SGML} is too expensive for transient
data)
\item[--]for multiple media publication?
\item[--]frequent republication?
\item[--]need searchable database?
\item[--]for inter-department or inter-company exchange?
\item[--]new product extraction from existing data?
\item[--]need heavy hypertext?
\item[--]industry requirement?
\end{itemize}

A fictional ``must use'' example was given.  A company with a
body of high-value articles wishes to publish in hard copy,
on \acro{CD}-\acro{ROM} and on the \acro{WWW}, and also to issue abstracts
and lists of articles and contributors as separate products.  All can be
pulled out with little effort after the initial investment in \acro{SGML}.


\section{Java -- The Krakatoa of the Web}

\subsection{Henry Rzepa, Imperial College, London}

This talk outlined a particular body of work done at Imperial
College before the appearance of Java, the improvements made
possible by Java, and some informed speculation on future
developments.

Work started in 1994 to try to publish representations of 3\acro{D}
molecular structures (\acro{MIME} type).  The concept was to be able
to click on a hyperlink and get a rotatable 3\acro{D} model within a
2\acro{D} document.  This required a marked-up dataset defining the
model and a script on the user's computer which read and
interpreted the markup.  A working result was achieved,
albeit with non-standard components and on Unix only.  When
\acro{VRML} was brought into use in March 1995, improved
communication between the 3\acro{D} model and the 2\acro{D} document resulted,
but the whole was still non-standard and Unix-only.

The introduction of the Java language to the project in July
1995 achieved a seamless interface between the \acro{WWW} client,
data and action, and seamless memory and security models for
the whole ``document''.  Not only did this fulfil the original
concept elegantly, but extra features became possible.
Rotatable models of various representations could be
interchanged and extra information delivered: e.g., clicking
on certain points of the model could invoke Java applets
which delivered inter-atomic distances.

Java in its present form, a \acro{C}++-like language, is a
powerful tool when used as directed, and has had a major impact on
this particular project.  There are plans to use the project
as a basis for an electronic conference in June 1996.  Java's
full features are available on Unix and Windows~95 platforms
only, and although these account for 90\% of visitors to
Imperial's web site, Windows~3.1 and Mac versions are really
needed for universal take-up.  There is also mileage to be
had from a future integration of \acro{VRML} with Java.  Apple's
Cyberdog is an alternative but may be released two years too
late for acceptance.

There are, of course, problems.  Already it appears that early
Java applets are not compatible with later releases: a serious flaw
which must be addressed before Java can become a mainstream
language/method.  For publishing/\acro{WWW} applications there is a
particular danger that data incorporated within applets will
``disappear'', in the sense that it will not be searchable as
\acro{HTML} data is.

See it (and rotate it!) on:

\url{http://WWW.ch.ic.ac.uk/java/java_1.html}

\end{Article}