\def\question#1\par{{\noindent\em #1\par}}
\title{Questions and Answers}
\author{compiled by Jonathan Fine}
\begin{Article}
\bgroup
\begin{small}
\def\[#1]{\noindent[{\bf #1}] }

The last session of the Bridewell meeting was a panel comprising the speakers David Barron \[DB] and Martin Key \[MK], joined by Lou Burnard \[LB] (Oxford, Text Encoding Initiative) and David Evans \[DE] (part of David Brailsford's group in Nottingham). The session was chaired by David Penfold \[DP] from the co-sponsors, the BCS Electronic Publishing Special Group.

(I have prepared this report from a not always audible tape recording. Remarks have been edited. I hope I have not introduced any error or misrepresentation. Questions have been set in italics.---Jonathan Fine)

\question What solutions are there for going from one encoding method to another, say from Microsoft to \LaTeX?

\[LB] I recommend going via SGML as an internal format. The public domain tool Rainbow Makers has an interesting approach. It takes a document marked up with formatting codes, and turns those codes into SGML tags. So you do get terrible things such as tags for font and type-size, but now they are represented using the SGML format. Translating from that to real SGML is a lot easier.

\[DP] {\em Word-for-Word\/} has been well used in the publishing industry for years. It converts between stacks of mainly word-processor (but also Frame, for example) formats. I expect they will eventually get round to SGML.

\[Malcolm Clark]
\question If all I'm interested in is portable documents, in other words shifting a document from one site to another electronically, why don't I just standardise on Microsoft Word? It's not high enough quality for publishing, but for 90\% of what I do, memos and stuff like that, I do it in Word and attach it to the electronic mail message I am sending. The recipient on a Windows or Macintosh machine unbundles it and they've got it in the same format I created it in. It's solved all the problems.

\[LB] Because then you can only talk to people who've done the same thing.

\[MK] Several answers. As a company, Elsevier wishes to retain the material they produce for some considerable time. We still sell material that is 10 or 20 years old. Microsoft Word as a format is fine for sending it off today, but in 10 years' time, who knows? It probably won't be compatible with anything. So retaining it in that format is no use. We have to convert it into something we can do things with. Secondly, there are limitations on what you can do with a Word document, such as how you can search the text, specifying where the author's name is, etc. SGML allows us to structure the complete document properly.

\[DB] There is a distinction between portability for immediate delivery and portability for archiving (see his article). Also, Warwick University (MC's location) must be different from Southampton, where no such uniformity exists. For example, computer scientists use \TeX. The problem is like herding sheep to get them to move in the same direction.

\[Allan Reese] Word is not a standard, it's a mess. At our site we have different versions of Word on different platforms, and they have different and incompatible document formats. The lowest common denominator of portability is to have the text transmitted from one place to another. With the Internet this is generally not possible. As soon as you are using a particular character set lying outside ASCII you are lost. One example is a text file (produced by a software company) which was transferred from Mac Word to Win Word without being checked carefully. The left quote character had come out as an `O' slash. These funny things happen. Another example is from Spain, sent to a news group. The sender has the character `\~n' on his keyboard. He presses this key and it comes out as `\~n' on the screen. He sends it to me and it comes out as `\$'. He can't even send the word `Espa\~nol' to the Spanish language news group!
Transmission of text is a big problem which a lot of users haven't yet tackled. In academic publishing one will have to deal with multiple character sets, if only to be able to accommodate authors' names.

\[Jonathan Fine]
\question Two related questions. Firstly, where will we be in the year 2000? If we have a meeting here in five years' time, what will have happened? Secondly, if SGML succeeds, what will fail?

\[LB] In five years' time people will be talking about Microsoft in rather the same way we talk about IBM. Remember IBM, they used to make computers. What will fail will be the forces of the evil empire. Namely the idea that it is perfectly legal and correct for any software company or equipment vendor to take information away from the people who have created it and lock it up in a proprietary format. That is an idea that I really would like to see the end of.

(Others expressed doubt at the early demise of Microsoft.)

\[MK] As a publisher, we are still dependent on paper. Five years is not very long. Unless an electronic product appears that is really user friendly to read for any length of time, we will still want paper. There will be more electronic products, particularly on CD-ROM. In our environment a lot of specific document formats will probably fail. Ventura Publisher, for example. People will concentrate on just three or four products eventually. \LaTeX\ and \TeX\ will still be around, and a few word processors.

\[Sebastian Rahtz]
\question A slightly heretical question about maths and chemistry. A lot of effort has gone into providing DTDs for these things. Perhaps these will just wither and die. In five years' time perhaps we will stop pretending that math is structured and regard it in the same way as we regard graphics? Would anyone like to defend the SGML markup of maths?

\[MK] The only reason it is useful is that it is independent of fonts. With \LaTeX\ you are still dependent on the font.
When we combine different articles into a book, we want a uniform appearance. We don't want a mixture of fonts, otherwise we're back to the horrible camera ready copy. As for the fact that SGML maths is structured, I wouldn't particularly want to defend it.

There was quite some discussion of this from the floor, which the tape recorder did not clearly pick up.

\[Dina Desai]
\question We would like to use SGML markup for our maths. What would you suggest?

\[DB] Do you mean what DTD, or what software, or both?

{\em What DTD?}

(Some information given by Mike Popham about specific math DTDs.)

\[Gerard van Nes] The whole problem with maths and SGML is that we simply need an SGML editing program which is able to display such complicated formulae as we write them. It is of course very uncomfortable to write the huge amount of mark-up that one needs for SGML maths. But if you have a really good SGML editor it's no problem at all.

\[DB] I would have for maths a single tag, which says \verb"" with an attribute which is notation, whose values will obviously be \verb"tex", \verb"eqn", etc. If you know it's \verb"eqn" or whatever, it is searchable. You can put hyperlinks into it.

\question About DTDs. About compound documents really. Say someone wants a journal with a video snip of the author explaining the article, a sound bite, or what have you. Where would one get a DTD for this?

\[DB] Much the same as the maths. A tag that says \verb"" and an attribute which says which encoding scheme you are using.

\[LB] There is an application of SGML called HyTime which is (about to become?) an ISO Standard. It defines time-based media of all kinds and also different architectures within which you can associate events happening in time. There is one product that can function against HyTime specifications; it is something to watch out for. There is in the TEI Guidelines a simplified version of some HyTime concepts.

\[DB] In the latest issue of EPodd there is a paper with the title {\em Why Use HyTime?}

\[Angus Duggan]
\question Adobe put a lot of effort into Acrobat as a static encapsulation format, able to reproduce the exact form of documents. Will this in the future be important, or will content be all? Will the first published form of the document be important?

\[DE] For archival purposes things like Acrobat are very useful, because they can encapsulate exactly how a document looked. For other things, such as database access and searching, it is the content which is more important. So you might want two different electronic forms of the document. Also, the printed and on-screen versions of the document might be formatted differently.

\[Angus Duggan]
\question If we want to represent content and we do want to keep the original form it was published in, then obviously Acrobat solves the one problem and SGML the other. What thoughts about document formats which maintain both equally well?

\[LB] I'd like to question one of your premises. You talk about the original form of the document. I think we're going to forget what that is very soon. I don't know what the original form of the TEI guidelines as published is. It was produced as an SGML document. At home it is white letters on a black background. In the office it is black on white. It was yellow on green when I was in Chicago. This was the authoring. Similar remarks apply to the printing, on US and UK sized paper. We had to do some fiddling around with the page numbers, as you can imagine. There is another version of the guidelines which is equally authoritative and has exactly the same content, and that exists on a DynaText screen. It has no page numbers at all. I'm trying to make the point that I don't know which is the original form. They're all equally valid.

\[MK] When authors have their references in the article they often refer to an article in a book by its page number.
How on earth will you make a reference to a location in an electronic document which does not have page numbers?

\[LB] By referring to the logical organisation of the text. Paragraph~38 within division~3 of etc.

\[MK] I'm sure that we will eventually have a combined product which will have the format and all the structure in it from the SGML. All in one product. Because there's so much more you can do with that than you can with just PDF. The problem, as David Brailsford mentioned this morning, is the Brand-X between the SGML and the PDF. Until we can resolve that problem we can get to PDF, but not via the generic route, which is what we as publishers want.

\[Allan Reese] About chasing up a reference. Page numbers are physical objects, and when the document changes the physical indexing is out of date. With electronic documents you will go just by keyword and content indexing. You won't have to know where it is. You just say I want the paper by Fred Smith or whoever.

\[MK] That works when you are searching an electronic product from another electronic product. It won't work from a paper to an electronic product.

\[Angus Duggan] Intermediate versions of documents. If people are publishing on the Internet, and you put a content link in a document to a document being revised, the revision may change or break the link. So you need to have links to particular versions of particular documents.

\[David Coton] Chapters and verses are a menace. We are looking at how to regard text in an object-oriented way. One big problem with such a scheme is that Bible text has two hierarchical structures. It has chapter - verse structure, and it has section - paragraph - sentence structure. Both are useful, both are valid. Both are in different circumstances necessary. But the two do not coincide. Any object-oriented system has difficulty with that. There are ways round it. We are looking at introducing small enough units of text to give coincident boundaries. SGML also has a problem with this.
I'm told that the standard allows for dual hierarchies. However, none of the existing tools implement it.

\[DB] There is only one product I've seen that supports this CONCUR feature, which is the Mark-it parser, which is quite old. This problem is discussed in the TEI guidelines. I don't think this is an SGML problem. It is a characteristic of textual materials that they can be organised in many different ways. This is inherent in the nature of text.

\[Jonathan Fine]
\question I'm not sure I should be asking this question. What future for typography?

\[MK] If you believe the formatted original will continue, not replaced by a large amorphous glob of text, there still is a case for typography. It is very important to read something, to understand how it's put together. Reading off a screen is difficult anyway. Typography is purely there to allow you to read something easily and quickly. I can't see any reason why it should disappear. Especially as I still believe paper will be around for a while.

\[LB] I think that typography is hard, hard, difficult and very important. There are so many people clamouring in the Web and other electronic marketplaces that anything that stands out will be enormously important. One of the skills conspicuously missing on the Web is good typographic understanding. The skills needed to present stuff effectively and well in the electronic medium are really an outgrowth from the skills the typographers have developed in the past.

\[DP] Typography is one aspect, but information design is perhaps even more important. We've hardly covered information design. If more and more people are putting things on the Web and not thinking about how they design the document, then the morass of information overload is going to get worse and worse.

\[DB] If you go to \verb"www.whitehouse.gov" you'll find that it has been done by a professional graphics designer. That really makes it stand out on the Web.

\[Mary Dyson] Can I add to that?
Typography on the screen. Have we got it well sussed? I don't think we have yet. To go back to the old chestnut, it's not necessarily just transferring the same principles. Italics are not terribly good for emphasis on screen. So you want perceptual equivalents of legibility.

\[DP] We've come to a good point to stop. One final thing. The Web assumes one model of access to information. There are many others, such as browsing, which are virtually impossible on the Web. Maybe we should be thinking about how other forms of access to information are possible.

Can I thank the four people on the panel and everyone else, particularly the speakers.

\end{small}
\egroup
\end{Article}