\MakeShortVerb{\|}
\title{Portable Documents: What Next?}
\author{Frank Harwood}
%(omits Klaus' talk, as I arranged with him: since Tardival
%provided her own report, you could also omit his summary of that)
\begin{Article}
\subsection{Introduction}
%(I have to write this $\ldots$)
This article is a summary of the conference held on 15~February 1996 by the British Computer Society Electronic Publishing Specialist Group in conjunction with \ukt, continuing the `portable documents' theme. The two previous articles, on Euromath and Hyper-G, also cover papers presented at this conference, so those papers are not mentioned in this summary.

\section{What Next?}
\subsection{Les Carr, University of Southampton}
``Now we have them, how do we use them/maximise the benefit?'' was the theme of Les's talk. We have available a spectrum of portable formats, from \texttt{.PDF} with high visual fidelity, to \acro{SGML}, which preserves the logic of the content without reference to appearance. (An intriguing question was floated and left open: although we have a large measure of portability between systems, do we have temporal portability? Can I read today's electronic document in 30 years' time?)

Documents are now available from around the globe, and to be most useful they need links between them. Hypertext-type linking is found in personal systems (e.g., Guide, Hypercard) through to global systems (e.g., \acro{WWW}), but to a degree all are `closed' -- a non-universal and therefore `proprietary' markup is used which does not extend into other people's systems.

The Microcosm Model (at Southampton) separates the document control system (how to produce and display a document) from the link control system. Links are any type of relationship between documents, and flexible link definitions are allowed for. The Open Journals Project, funded by \acro{JISC}, applies Microcosm technology to \acro{WWW}.
It is possible to integrate on-line journals with each other and with various on-line databases and teaching resources. The concept of a document becomes very broad indeed, and the databases of links (linkbases) become value-added commodities in their own right. Linkbases are configurable for different levels and purposes. This could be seen as opening a new publishing idiom in which the various `closed' technologies mentioned become local and short-term solutions. See it at \url{http://journals.ecs.soton.ac.uk/}

\section{Converting from \LaTeX\ to \acro{SGML}}
\subsection{Sebastian Rahtz, Elsevier Science}
Sebastian gave an in-depth review, heavily illustrated with examples, discussing the problem and various feasible approaches, and demonstrating the results so far produced at Elsevier (the package is not in the public domain).

The publisher faces a large community using \LaTeX, a mature notation and free typesetting system, well suited to scientific and multi-lingual work. Unfortunately, it is not what the publisher uses, it is not an international standard, and it does not convert for various purposes as \acro{SGML} does. Four practical approaches were mentioned:
\begin{enumerate}
\item Throw away the electronic file and retype.
\item Strip out the \TeX\ coding and treat the file as coming from an unknown word processor.
\item Write a program to parse \LaTeX\ and output \acro{SGML}.
\item Re-implement \TeX\ to output \acro{SGML} codes.
\end{enumerate}
of which the first two were not discussed. The parser approach has been implemented in a number of ways but can only be partially successful, because \TeX\ is macro-based with an extensible syntax. All results so far require tidying up afterwards.

To implement route 4, three methods have been used:
\begin{enumerate}
\item Replace the \TeX\ backend.
\item Rewrite \TeX\ in a new language -- this has been done in \acro{LISP}.
\item Write \acro{SGML} code to the dvi file and extract it from there -- the method used by Elsevier.
\end{enumerate}
The work done was described in detail, stressing the importance of the target \acro{DTD} and the richness of the \LaTeX\ markup, and highlighting a number of pitfalls. The acid test is that it works and real scientific papers can be translated, though human intervention at some level is frequently needed to perfect the end product. For more detail, see \TUB\ 16.3.

\section{\acro{SGML} is here}
\subsection{Andrew Dorward and Neil Bradley, Pindar}
Substituting for the speaker originally planned, Andrew and Neil gave a lightning rendition of the \acro{SGML} story: principles, implementation considerations, current developments and criteria for use or rejection!

The principles of \acro{SGML} are well understood within the group. It is an open system, defined by \acro{ISO}\,8879 (1986). There has been a recent expansion of interest, triggered by \acro{HTML}, which is restricted and non-open. The current state of play is that \acro{SGML} is used for many more `pages', but has a lower profile. Recent developments, particularly \acro{DSSSL} and HyTime, provide enhancements to the use of \acro{SGML}.

Although \acro{SGML} is independent of any software publisher, there are numerous products around -- parsers, editors and \acro{DTD} tools -- and in a production situation it makes sense to adopt the best available. The new Frame \acro{SGML} software suite, just out of beta test and becoming available now, was strongly recommended.

Implementation considerations were discussed using the markup of an article as an example. Although the considerations are the same whatever the task, the use of appropriate software can aid efficiency, reduce errors and give more options on the use of tagged data and the control of style. The points to take account of are:
\begin{description}
\item[``Granularity''] -- how far to break down the material. High granularity $=$ a fine breakdown. This adds value to the information at a price (in effort) and should be chosen to suit the purpose.
\item[``Hierarchy''] -- markup objects can contain other markup objects. Many levels are possible. Again, this needs to be set to suit the material and the purpose.
\item[``Attributes''] -- e.g., use of || and || can enable separate listing.
\item[``Hypertext''] -- the setting up of cross-reference jumps from one point in the text to another, internal or external to the document or database. Here again the software used can greatly facilitate the process.
\item[\acro{DTD}] -- the Document Type Definition controls granularity, quality, optional/mandatory items, alternatives, sequences and element names. A visual \acro{DTD} tool such as ``Near and Far'' makes for faster and more accurate production of \acro{DTD}s.
\item[Authoring Environment] -- may be structured (software from Frame, SoftQuad), where the authoring process is interactively constrained according to the \acro{DTD}, or `loose' (software from Microsoft or other out-of-line parsers).
\end{description}

There is a checklist of reasons to adopt \acro{SGML}. With two ticks against the list, \acro{SGML} should be considered; with four ticks, it would be very foolish not to use it.
\begin{itemize}
\item[--] long shelf-life data? (\acro{SGML} is too expensive for transient data)
\item[--] for multiple media publication?
\item[--] frequent republication?
\item[--] need searchable database?
\item[--] for inter-department or inter-company exchange?
\item[--] new product extraction from existing data?
\item[--] need heavy hypertext?
\item[--] industry requirement?
\end{itemize}

A fictional ``must use'' example was given. A company with a body of high-value articles wishes to publish them in hard copy, on \acro{CD}-\acro{ROM} and on \acro{WWW}, and also to issue abstracts and lists of articles and contributors as separate products. All can be pulled out with little effort after the initial investment in \acro{SGML}.
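
As a rough illustration of how hierarchy, attributes and a \acro{DTD} fit together, a minimal sketch follows. The element names, the \texttt{role} attribute and the sample text are invented for this summary and were not shown in the talk; the declarations assume the \acro{SGML} reference concrete syntax.

\begin{verbatim}
<!-- Hypothetical DTD and instance, for illustration only -->
<!DOCTYPE article [
<!ELEMENT article  - - (title, author+, abstract, para+)>
<!ELEMENT title    - - (#PCDATA)>
<!ELEMENT author   - - (#PCDATA)>
<!ELEMENT abstract - - (para+)>
<!ELEMENT para     - - (#PCDATA)>
<!-- The attribute lets lead authors be listed separately -->
<!ATTLIST author   role (lead | contrib) contrib>
]>
<article>
<title>A sample article</title>
<author role="lead">First Author</author>
<author>Second Author</author>
<abstract><para>A one-paragraph abstract.</para></abstract>
<para>Body text of the article.</para>
</article>
\end{verbatim}

With markup of this kind, a list of lead authors or a set of abstracts could in principle be selected by element and attribute alone, which is the sort of separate product the fictional example describes.
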
\section{Java -- The Krakatoa of the Web}
\subsection{Henry Rzepa, Imperial College, London}
This item outlined a particular body of work done at Imperial College before the appearance of Java, the improvements made possible by Java, and some informed speculation on future developments.

Work started in 1994 to try to publish representations of 3\acro{D} molecular structures (\acro{MIME} type). The concept was to be able to click on a hyperlink and get a 3\acro{D} rotatable model within a 2\acro{D} document. This required a marked-up dataset defining the model and a script on the user's computer which read and interpreted the markup. A working result was achieved, albeit with non-standard components and on Unix only. When \acro{VRML} was brought into use in March 1995, improved communication between the 3\acro{D} model and the 2\acro{D} document resulted, but the whole was still non-standard and Unix only.

The introduction of the Java language to the project in July 1995 achieved a seamless interface between the \acro{WWW} client, data and action, and seamless memory and security models for the whole ``document''. Not only did this fulfil the original concept elegantly, but extra features became possible. Rotatable models of various representations could be interchanged and extra information delivered; e.g., clicking on certain points of the model could invoke Java applets which delivered inter-atomic distances.

Java in its present form, a \acro{C}++-like language, is a powerful tool when used as directed, and it had a major impact on this particular project. There are plans to use the project as a basis for an electronic conference in June 1996. Its full features are available on Unix and Windows~95 platforms only, and although this accounts for 90\% of visitors to Imperial's web site, it really needs Windows~3.1 and Mac versions for universal take-up. There is also mileage to be had from a future integration of \acro{VRML} with Java.
Apple's CyberDog is an alternative but may be released two years too late for acceptance.

There are, of course, problems. Already it appears that early Java applets are not compatible with later versions, a serious flaw which must be addressed before Java can become a mainstream language/method. For publishing/\acro{WWW} applications there is a particular danger that data incorporated within applets will ``disappear'', in the sense that it will not be searchable as \acro{HTML} data is. See it (and rotate it!) at \url{http://WWW.ch.ic.ac.uk/java/java_1.html}
\end{Article}