\def\dvips{\textsf{dvips}}
\title{\LaTeX, \dvips, \acro{EPS} and the web \ldots}
\author[Sebastian Rahtz]{Sebastian Rahtz\\
  Elsevier Science Ltd\\The Boulevard, Langford Lane\\Kidlington\\
  Oxford, UK\\\texttt{s.rahtz@elsevier.co.uk}}
\begin{Article}
\begin{abstract}
Readers of \TeX\ question \emph{fora} like \texttt{comp.text.tex} will
often see questions about the issues surrounding Encapsulated PostScript,
how to make \acro{EPS} files from \LaTeX\ output, and perhaps how to use
them on the World Wide Web. This short note\footnote{Reprinted from \TUB{}
16(3) with kind permission of Barbara Beeton.} offers some suggestions.
\end{abstract}

\section{What and why is \acro{EPS}?}

\acro{EPS} stands for Encapsulated PostScript; \acro{EPS} files \emph{are}
PostScript, but they conform to a minimum standard of good behaviour. This
is so they can be included in other documents, possibly resized or rotated.
In practice \acro{EPS} means not using certain commands which have global
effects (don't worry, this is quite rare), and inserting structured comments
(starting with \verb|%%|) which tell other programs something about the
file. The \emph{PostScript Language Reference Manual} goes into great depth
describing what these comments can contain, but the minimum necessary for
practical purposes is:
\begin{enumerate}
\item A first line starting \texttt{\%!PS-Adobe}; \dvips, for instance,
  puts \texttt{\%!PS-Adobe-2.0 EPSF-2.0} in its output, meaning that it
  claims conformance with version 2 of the \acro{EPS} standard (we are now
  at version 3);
\item A `BoundingBox', like \texttt{\%\%BoundingBox: 33 101 584 715}, which
  tells applications how much space on the page is occupied.
\end{enumerate}

How do you turn \acro{PS} files into \acro{EPS} files? They probably are
already, if they come from a reputable bit of software (avoid anything from
Microsoft)\Dash a good check is to see if there is a BoundingBox.
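
To see why the BoundingBox matters, consider how \LaTeX\ itself consumes
it. The following is only a minimal sketch, assuming the standard
\textsf{graphicx} package with its \dvips{} driver and a hypothetical file
called \texttt{picture.eps}; the \texttt{bb} values simply repeat the
example comment above, and would only need to be given by hand if the
file's own comment were missing or wrong:
\begin{verbatim}
\documentclass{article}
\usepackage[dvips]{graphicx}
\begin{document}
% graphicx reads %%BoundingBox from picture.eps (a hypothetical
% file name) to decide how much space to reserve
\includegraphics{picture.eps}

% the same file resized and rotated; the reserved box follows suit
\includegraphics[width=5cm,angle=90]{picture.eps}

% override the comment by hand (illustrative values only)
\includegraphics[bb=33 101 584 715]{picture.eps}
\end{document}
\end{verbatim}
If the numbers are wrong, \LaTeX\ will leave too much or too little room,
whatever the PostScript actually draws.
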
You will come across three types of problem with files that look like
\acro{EPS}. Firstly, the BoundingBox may not be accurate; since this
determines how much space will be left in enclosing applications like \TeX,
it matters. Keith Reckdahl's recent tutorial in \TUB\ goes into detail on
this problem. Secondly, your file may be \emph{serious} \acro{EPS}, and use
all the facilities of structured comments to specify what sort of resources
(fonts etc.) it expects you to supply when you deal with it. This is bad
news if you are in the \TeX\ world outside a Macintosh. Look out for lines
with words like \texttt{ProcSetsNeeded}. Thirdly, your file may think it is
\acro{EPS}, but in fact break the rules, and have weird PostScript in it.

The rescue technique is to read it with a forgiving PostScript interpreter,
and get a new version written out. Three programs to try are:
\begin{enumerate}
\item Adobe Acrobat Distiller; this turns PostScript files into \acro{PDF},
  and Acrobat Exchange can then load them, and save them as ordinary
  PostScript. Since it is written by Adobe, Distiller is an extremely
  powerful PostScript interpreter, and can cope with almost anything you
  throw at it. It is not cheap (except to academics), but worth having.
\item Recent versions of Adobe Illustrator share some of the Acrobat code,
  and can read PostScript files, as well as edit \acro{PDF} files.
\item The free Ghostscript is now a very mature and sophisticated product.
  It understands all of the current Level~2 PostScript, and can turn it
  into a wide variety of bitmap forms. Version~4 (released in June~1996)
  also performs many of the functions of Distiller, and it already reads
  \acro{PDF} files and writes PostScript. Unfortunately, its handling of
  PostScript text to \acro{PDF} is at present unfinished. However, you can
  still use Ghostscript to read your PostScript and write it out again as a
  bitmap (e.g.~\acro{TIFF}).
\end{enumerate}

\section{What about \texttt{dvi} to Encapsulated PostScript?}

Most \TeX\ systems, free or commercial, supply a \texttt{dvi} to PostScript
driver; most of them write out more or less acceptable Encapsulated
PostScript, but three are especially well-featured (in the author's
experience): the Macintosh Textures driver, Y\&Y's \textsf{dvipsone} for
Windows and the free \dvips. Since the latter is available for all
platforms, is well-supported, and is probably the finest of its
type,\footnote{For several years, \textsf{dvipsone} has offered partial
downloading of fonts, a very powerful feature, but this is now coming into
\dvips; there are also flaws in \dvips' use of structured \acro{EPS}
comments, and Textures is superior in this respect.} we shall concentrate
on that.

If you want to produce re-useable PostScript output from \dvips{} (and this
includes output destined for Acrobat Distiller), the absolute priority is
to use outline fonts, not the PK fonts traditionally used by \TeX. You can
either use traditional fonts (usually commercial, like Adobe Times, but
Ghostscript now comes with an excellent free set donated by \acro{URW}) or
Computer Modern itself in PostScript Type 1 format. Either buy these from
\acro{Y}\&\acro{Y} for Windows and Unix or Blue Sky for Macintosh, or use
Basil Malyshev's BaKoMa set, of almost comparable
quality.\footnote{Windows-worshippers may prefer to get into the world of
TrueType fonts, which are available for Computer Modern from Kinch Computer
Company.} If you do not use outline fonts, and re-use your output scaled
up, you will not like the effect of Figure~\ref{tips1} at all, compared to
Figure~\ref{tips2}. If you want to turn your documents into \acro{PDF},
Distiller will produce vile results from PK fonts.
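
Returning to the first of the two font choices, here is a minimal sketch of
a document that asks for traditional outline fonts. It assumes that the
standard PSNFSS \textsf{times} package is installed and that your \dvips{}
font map knows about the Times family; neither assumption is checked here:
\begin{verbatim}
\documentclass{article}
% Times as the text font; dvips can then refer to an outline
% font instead of embedding PK bitmaps (assumes PSNFSS)
\usepackage{times}
\begin{document}
Hello, outline fonts!
\end{document}
\end{verbatim}
Mathematics is still set in Computer Modern with this preamble, so the
Type~1 versions of those fonts are needed as well if your fragments contain
formulae.
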
\begin{figure*}
\includegraphics[width=\textwidth,height=1.2in]{tips1}
\caption{Bitmap \acro{EPS} file, enlarged and distorted}\label{tips1}
\includegraphics[width=\textwidth,height=1.2in]{tips2}
\caption{Outline font \acro{EPS} file, enlarged and distorted}\label{tips2}
\end{figure*}

The second priority is to get the right bounding box. Surprisingly many
applications cheat by simply making it the page size, regardless of whether
the whole area is used. \dvips{} does this by default too, but has a
command-line option \texttt{-E}, which asks it to try and calculate the
actual extent used. Note that \acro{EPS} files are, by definition, only one
page, so you also have to use \dvips{} options to select just one page.

There are two caveats when preparing the input. Firstly, make sure you do
not include a page number (try \verb|\pagestyle{empty}| in \LaTeX), or else
the bounding box will cover that too. Secondly, \dvips{} does not always
work out the extent of text correctly. For instance, if you wrote (why, I
have no idea):
\begin{verbatim}
Hello\raisebox{10pt}[0pt][0pt]{Up there}!
\end{verbatim}
you would be asking \LaTeX{} to raise \emph{Up there} off the baseline, but
to pretend that it has no effect on the height calculation. \dvips{} will
believe this, and calculate a bounding box on the \emph{claimed} height. If
you use complicated add-in packages like PSTricks, which add in arbitrary
PostScript code, you will also end up in real trouble. In these cases you
can either adjust the BoundingBox by hand, or place invisible marks in
\LaTeX\ to make sure that \dvips{} recognizes the full extent.

A useful trick to remember if you think that \TeX{} knows what you want,
but \dvips{} does not, is to make judicious use of color.
Suppose you wanted to use PSTricks to encircle a mathematical symbol; you
might write:
\begin{verbatim}
absurd \pscirclebox{$\surd$}
\end{verbatim}
\TeX{} leaves the right space, since the PSTricks macros understand what is
going on, but \dvips{} is told to draw the circle in raw PostScript, and
the bounding box calculation ignores that. The result is that the limits
are set just around the size of the letters. If we wrote:
\begin{verbatim}
\framebox{absurd \pscirclebox{$\surd$}}
\end{verbatim}
it would work correctly, because \dvips{} would look at the enclosing
frame, not just the words. But you end up with an unwanted box; so make it
(in effect) invisible by writing:
\begin{verbatim}
{\color{white}\setlength{\fboxsep}{0pt}%
 \framebox{%
   {\color{black}absurd \pscirclebox{$\surd$}}%
 }%
}
\end{verbatim}
This creates a white frame around black text; \LaTeX{} proceeds happily,
and so does \dvips, calculating the right extents, but nothing shows on
paper. Obviously, this only works in a monochrome environment.

\section{\LaTeX\ to \acro{EPS} to \acro{GIF} to Web}

Why do we do all this in practice? Often, these days, because people want
their \LaTeX{} mathematical output on the World Wide Web, and their only
recourse is to embed \acro{GIF} images in their \acro{HTML}. The
sophisticated \emph{latex2html} program does all this for you; its
technique is worth understanding, as it has general utility. The sequence
of events is:
\begin{enumerate}
\item Place bits of \LaTeX\ in a special file, one fragment per page, and
  with no page numbers;
\item Run \LaTeX\ to generate a multi-page \texttt{dvi} file;
\item Use \dvips' \texttt{-i} and \texttt{-S} options to generate one
  self-contained output file per page;
\item Give each page to Ghostscript, and ask it to render them in
  \acro{PBM} (Portable Bitmap) form;
\item Use the \acro{PBM}plus/Netpbm utility \emph{pnmcrop} to trim away
  white space;
\item Use the \emph{ppmtogif} utility to convert the result to a \acro{GIF}
  image.
\end{enumerate}
Note that \emph{latex2html} does \emph{not} use the \texttt{-E} option for
\dvips{}, but relies on simply removing all white pixels until just text is
left. This has the advantage that it avoids the problem we saw in the last
section, but it has three disadvantages:
\begin{enumerate}
\item The \acro{PBM} utilities are primarily Unix tools, and many people do
  not have access to them;
\item The cropping process is memory-intensive, slow and eats temporary
  disk space;
\item The cropping forces everything to the baseline, effectively. A
  character like the em-dash (---), which sits above the baseline, will be
  cropped above and below, so that the placed \acro{GIF} looks wrong.
\end{enumerate}

The core of the problem is the use of Ghostscript, which always creates a
page-sized bitmap, even if there is only one word on the page. What we want
is for Ghostscript to render just the portion of the image inside the
bounding box, if we \emph{do} use the \texttt{-E} flag for \dvips. We can
achieve this by giving Ghostscript a customized page size, which is the
size of the bounding box. Then we can insert some extra PostScript code to
move the image so that it starts at the 0,0 coordinate (adjusting the
bounding box accordingly). Ghostscript then displays or converts the image
just within the desired area, and no cropping is needed.

The transformations of the bounding box can be achieved using
\emph{epsffit}, which is part of Angus Duggan's \texttt{psutils} collection
(\acro{CTAN}:\texttt{support/psutils}); the page size change is most easily
done using a Level~2 PostScript operator \texttt{setpagedevice}. Thus a
PostScript file which starts:
\begin{verbatim}
%!PS-Adobe-2.0 EPSF-2.0
%%BoundingBox: 135 528 284 668
...
\end{verbatim}
needs to be transformed to something like:
\begin{verbatim}
%!PS-Adobe-2.0 EPSF-2.0
%%BoundingBox: 0 0 149 140
<< /PageSize [149 140] >> setpagedevice
gsave -135 -528 translate
...
grestore
\end{verbatim}
Here we have worked out the width and height of the enclosing rectangle
(149 $\times$ 140 units), moved the origin down to 0,0 on the page, and set
the page size. PostScript purists will shudder at the
\texttt{setpagedevice} command, and point out that this is probably illegal
in Encapsulated PostScript, but as long as we only use this file strictly
in the controlled environment of Ghostscript, we are safe enough.
Figure~\ref{fitps} lists a simple Perl script which performs the necessary
changes to a PostScript file for Ghostscript to eat, without any need for
\emph{epsffit}.\footnote{I am aware that it does not cope with an
\texttt{(atend)} bounding box\ldots}

\begin{figure*}
\begin{verbatim}
#!/usr/local/bin/perl
# Move an EPS image to the origin and set a matching page size.
$bbneeded=1;
$bbpatt="[0-9\.\-]";
while (<>) {
  if ( /%%BoundingBox:\s*($bbpatt+)\s+($bbpatt+)\s+($bbpatt+)\s+($bbpatt+)/ ) {
    # only the first BoundingBox comment is rewritten
    if ($bbneeded) {
      $width   = $3 - $1;
      $height  = $4 - $2;
      $xoffset = 0 - $1;
      $yoffset = 0 - $2;
      print "%%BoundingBox: 0 0 $width $height\n";
      print "<< /PageSize [$width $height] >> setpagedevice\n";
      print "gsave $xoffset $yoffset translate\n";
      $bbneeded=0;
    }
  }
  else { print; }
}
print "grestore\n";
\end{verbatim}
\caption{A Perl script to transform an \acro{EPS} file for Ghostscript}
\label{fitps}
\end{figure*}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{figure*}[t!]
\includegraphics[width=\textwidth]{tips3}
\caption[]{\LaTeX{} $\rightarrow$ \texttt{dvi} $\rightarrow$ \acro{EPS}
  $\rightarrow$ \acro{GIF}}\label{tips3}
\includegraphics[width=\textwidth]{tips4}
\caption[]{\LaTeX{} $\rightarrow$ \texttt{dvi} $\rightarrow$ \acro{EPS}
  $\rightarrow$ \acro{GIF}, anti-aliased}\label{tips4}
\end{figure*}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{figure*}[t!]
\centerline{\includegraphics{tips5}}
\caption[]{Mid-aligned \acro{GIF} image in Netscape}\label{tips5}
\end{figure*}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

Now that Ghostscript is only rendering the desired area, we can use its
built-in bitmap output facilities. The Unix or \acro{DOS} command line:
\begin{verbatim}
gs -dNOPAUSE -q -r100 -sDEVICE=tiffg4 \
   -sOutputFile=foo.tif foo.ps -c quit
\end{verbatim}
will generate a \acro{TIFF} fax group 4 image (Ghostscript does not support
\acro{GIF} output directly, for legal reasons) at 100dpi of just the imaged
area of the PostScript file \texttt{foo.ps} with no further ado.
Ghostscript version 4 adds anti-aliasing facilities; using the Netpbm tools
under Unix, we can create a variant \acro{GIF} image, using the command
line:
\begin{verbatim}
gs -r100 -dNOPAUSE -q -sOutputFile=- \
   -sDEVICE=pnm -dTextAlphaBits=4 \
   -dGraphicsAlphaBits=4 foo.ps -c quit | \
   ppmtogif -interlace \
   -transparent \#ffffff > \
   equation.gif
\end{verbatim}
Figures~\ref{tips3} and \ref{tips4} show the result of transformations
without and with anti-aliasing respectively.

There is one remaining problem: Web browsers can usually align images top,
middle or bottom; but what if we have an image of some characters with
descenders below the baseline? Bottom alignment of the images places the
bottom of the descenders on the baseline; top alignment is ridiculous, and
middle alignment is not quite right either. The answer is to use middle
alignment, and make \TeX\ lie to \dvips{} (and thence down the chain) about
the extent of the character; making its depth equal to its height, and then
middle aligning it in the Web browser, has the desired effect. So how do we
make \TeX{} lie?
Here is my suggestion:
\begin{verbatim}
% the @ in these names needs \makeatletter outside a package file
\newsavebox{\@Fragment}
\def\Fragment#1{%
  \savebox{\@Fragment}{#1}%
  \@tempdima\ht\@Fragment
  \@tempdimb\dp\@Fragment
  \ifdim\@tempdima>\@tempdimb
    \dp\@Fragment\@tempdima
  \else
    \ht\@Fragment\@tempdimb
  \fi
  \fboxsep0pt
  \color{white}%
  \fbox{%
    {\color{black}%
     \box\@Fragment}%
  }%
}
\end{verbatim}
I use the \LaTeX{} box framing command to ensure that \dvips{} thinks the
depth is there, with the same color trick as we saw earlier.

Unfortunately, there is a side effect: an \acro{HTML} browser loading the
resulting \acro{GIF} image mid-aligns the image and sticks the `ballast'
white space into the line below, making an unsightly gap (see
Figure~\ref{tips5}, where the Greek \ensuremath{\eta}s have a small
descender). With the current browser technology, there is little that can
be done about this. In practice, we will have to check first whether there
\emph{is} any descender; if so, we use the mid-align technique, and accept
the gap; if there is not, we can use a simpler process with bottom
alignment.

It is imperative, of course, that Web-making readers do not take these
examples as `recipes', without both a precise specification of the desired
Web page and an understanding of some of the basic image-processing
techniques. The aim here has simply been to show how relatively trivial and
efficient it is to create bitmap output from \LaTeX{} and \dvips{} using
the free facilities of Ghostscript.
\end{Article}