  1. Introduction
  2. Formats
    1. PDF
    2. TeX
    3. Word Processors
    4. XHTML+CSS
  3. Requirements
    1. Automatic numbering
    2. Equations
    3. Cross-reference
    4. Multiple columns
    5. Required standards
  4. Browser support
    1. Opera 7
    2. Mozilla 1.4a
    3. Internet Explorer 6
    4. Other browsers
  5. XHTML Examples
    1. Numbered headings
    2. Numbered equations
    3. Numbered images with captions
    4. Numbered tables with captions
    5. Maths
  6. Conclusion
  7. References
  8. About the author

Introduction

This document explores the possibility of using XHTML and CSS (and other open standards) for publishing scientific and technical papers on the web. This could provide a lightweight, accessible and free alternative to existing formats. The text focuses purely on W3C web standards, and mostly on XHTML and CSS.

I will try to answer a couple of questions on the matter: "Which formats are currently being used for publishing scientific documents to the web? Which requirements have to be met when doing so and to which extent is this possible with XHTML and CSS, today and tomorrow?"

This article is mostly geared towards the use of CSS rather than advanced applications of, for instance, XML, simply because I do not know those standards very well. If you believe I am missing out on a technology, please contact me.

Document formats

PDF

A frequently used format for publishing papers on the web is Adobe's PDF, which is closely based on PostScript and captures the original document (What is PDF?). The use of PDFs on the web has become widespread and generally accepted, but it is a very inaccessible format: you need a plugin to view it and you often need an expensive application (Adobe Acrobat) to edit it. Free tools to create and edit PDF files are also available, but due to the very nature of a PDF file, editing is limited.

There are more drawbacks to PDF. The reader has no control over the final layout, and PDF is merely a medium to publish a document in: you need an additional (often expensive) program to actually write the document. PDF has advantages too. The main one is its widespread acceptance: a large share of web users already have the plugin installed. Another advantage is its high-quality printing, which is useful for publishing on paper afterwards.

TeX

A lot of scientific papers are traditionally marked up with LaTeX, a high-quality typesetting environment. It has a steep learning curve, but it allows detailed control over the layout of the document, and complex equations can also be created. The high-quality typesetting makes it especially useful for printing.

It is less suited for web publishing, though: viewing a TeX document requires having the software installed, and although both commercial and free versions are available, this reduces accessibility. The usual solution is to export the LaTeX document to another format, most often PDF. Another possibility is to export the LaTeX document to XHTML. Good programs for this already exist, and they can export equations as MathML or convert them into small images.

Word processors

Nowadays many papers are also created with general-purpose word processors such as Microsoft Word and OpenOffice Writer. Reading those formats often requires the same or similar software as the author used, once again limiting accessibility. These word processors often include equation editors, and though most can export to (X)HTML, only a few can also export the equations to MathML. It has to be noted, though, that the HTML pages produced by, for instance, Microsoft Word are notoriously complex and invalid.

XHTML with CSS

XHTML and CSS have the potential for providing a light-weight, accessible and free alternative to existing formats. A lot of the authoring tools are free and for editing a document, a simple text editor often suffices. The trend has been to separate content from style in the document, which strongly improves the readability of both. A drawback is the learning curve, which is similar to LaTeX: you have to learn the syntax.

Concerning accessibility: documents written in XHTML and CSS can be viewed in most modern web browsers, so no special program is needed. In theory, then, the format is very accessible. A major problem, however, is that support for the standards varies strongly between browsers, from no support at all to full support. In practice this means that there is no guarantee whatsoever that the person viewing your document sees it displayed correctly...

Another point of concern is whether the standards are well enough equipped to meet the requirements of scientific publishing. That is what this document will try to illustrate.

Requirements

When evaluating a format's suitability for publishing scientific papers on the web, the layout capabilities must meet a number of requirements. The most important ones are:

  1. Automatic numbering of headings, equations, images and tables, with captions
  2. Equations
  3. Cross-referencing
  4. Multiple columns

It will be shown that the combination of XHTML and CSS can meet many of these requirements. They are discussed individually in the following sections.

Automatic numbering

All the automatic numbering and captions can easily be done in CSS 2.1 with counters and generated content. Unfortunately, no browser other than Opera supports these features well. Opera has fully supported them since version 5.

Equations

Equations can be done in three different ways in browsers:

  1. Images
  2. MathML
  3. XHTML with CSS

Using images for equations is widespread because it does not require advanced standards support, so the equations can be viewed in virtually all browsers. Images also have disadvantages: the equations cannot be edited and printing can give poor quality. The print-quality problem can be solved with SVG, a standard for vector graphics that allows good scaling and sharp printing. However, SVG is currently only really supported through plugins.

MathML is a markup language specially designed for creating equations on the web. It is currently only supported by the Mozilla and Amaya browsers. MathML has both advantages and disadvantages. It is powerful and can express complicated equations. Editing them, however, requires a special editor because the markup is very complex and bandwidth-intensive, which makes editing by hand impractical. Another disadvantage is the way the language mixes content with presentational markup: the trend has been to separate style from content by introducing XHTML and CSS, and MathML does not fit this trend. The editing problem can be solved with special tools such as Amaya, or by exporting the equations from applications such as LaTeX, but it is desirable for the pages themselves to be easily editable.

Update: A reader commented that Internet Explorer can also render MathML equations using the MathPlayer plugin. That leaves Opera as the odd one out.

This leaves us with the third and final option, XHTML with CSS. A lot of mathematical markup can already be achieved with a combination of XHTML and CSS, which is easy to edit and light on bandwidth. This document showcases the method. The complexity of the equations is limited, though, and does not yet compare with the possibilities of MathML. This may improve with CSS3, whose planned modules include one for mathematics, although no draft has been published yet.
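To give an idea of how this works, here is a minimal sketch of a stacked fraction built with display: inline-table. It is not taken from George Chavchanidze's stylesheet. The class names frac, num and den are hypothetical.

```css
/* Hypothetical markup (not from math.css):
   <span class="frac"><span class="num">a + b</span><span class="den">2</span></span> */
span.frac {
   display: inline-table;     /* the fraction stays inside the line of text */
   vertical-align: middle;    /* centre it on the surrounding text */
   text-align: center;
   border-collapse: collapse; /* row borders are only drawn in the collapsing model */
   margin: 0 0.2em;
}
span.frac span.num,
span.frac span.den {
   display: table-row;        /* numerator row stacked above denominator row */
}
span.frac span.num {
   border-bottom: 1px solid;  /* the fraction bar */
}
```

The text of each row is wrapped in an anonymous table cell. Both rows therefore share one column, and the bar spans the wider of numerator and denominator.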

The methods for equations used in this document were pioneered by George Chavchanidze, who has some example pages showing the possibilities. He has recently added some really advanced (and annotated) stylesheets for XML pages, which are brilliant!

Cross-referencing

At first glance cross-referencing appears to be no problem. HTML supports internal linking, doesn't it? Well, yes, of course, but it does not update the link text automatically. For instance, when you write "... as seen in equation 14 ...", the equation number in the text will not be updated when the numbering of the equations changes. Another very good application of cross-referencing is an automatically generated index. The CSS3 draft on paged media contains specifications for cross-referencing, but these will not be supported for some years to come.
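The following sketch shows what automatic cross-references could look like with the target-counter() function from later CSS3 drafts (CSS Generated Content for Paged Media). The exact syntax may differ from the draft mentioned above, and you should not assume browser support. The eqref class and the link markup are hypothetical.

```css
/* Hypothetical markup:
   <e id="re">...</e>   and   ... as seen in <a class="eqref" href="#re"></a> ... */
body {
   counter-reset: equation;   /* merge with any other counter-reset on body */
}
e {
   /* increment on the element itself, so the value looked up at the
      target element is already that equation's own number */
   counter-increment: equation;
}
e:after {
   content: "[" counter(equation) "]";
}
a.eqref:after {
   /* value of 'equation' at the element the link points to */
   content: "equation " target-counter(attr(href url), equation);
}
```

If the equations are renumbered, the link text follows automatically. That is exactly the behaviour plain internal links lack.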

Multiple columns

The last requirement is multiple-column layout: a lot of scientific papers are currently published in multiple columns, and most authors will want to use this layout on the web too. It is currently possible to emulate multiple columns with CSS, but content will not flow from one column into the next. This will most likely be solved when CSS3 becomes generally supported in a few years' time.
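For comparison, here is a sketch of a two-column layout using the properties of the CSS3 multi-column module. Here text does flow from one column into the next. The div.paper wrapper is a hypothetical name.

```css
/* div.paper: hypothetical wrapper around the body text of the paper */
div.paper {
   column-count: 2;             /* two columns; text flows from the first into the second */
   column-gap: 2em;             /* space between the columns */
   column-rule: 1px solid #ccc; /* optional line between the columns */
}
div.paper h1 {
   column-span: all;            /* let chapter headings run across both columns */
}
```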

Required standards

OK, let's summarize: which standards are required to meet all the requirements set out above? As discussed in the previous paragraphs: XHTML, CSS 2.1 and CSS3. MathML and SVG support would also be welcome, but they are not necessary unless really complex equations are used.

Browser support

For the accessibility argument to hold, the required levels of XHTML and CSS must be supported by the mainstream browsers. Currently this is not the case. The major browsers lack support, and certain specifications have not yet been approved or even written.

Opera 7.11

The Opera 7.11 (and higher) browser currently has the best support for the required standards and is the only browser that will correctly display this page. It is the only browser to support automatic numbering and has the best support for the :before and :after pseudo-elements. It is also the only browser that properly shows the XHTML equations. Opera 6 renders most of the page correctly, with the exception of the equations.

Mozilla 1.4

Mozilla does not support the automatic numbering and has limited support for generated content. Also the equations used in this document are not properly rendered due to poor support for display:inline-table;. On the other hand Mozilla is the only browser to fully support MathML and SVG.

Internet Explorer 6.028

Internet Explorer currently does not support any of the required standards and displays this page very poorly.

Other browsers

I currently only have access to the browsers described above and only on the Windows platform. Some browsers have a cross-platform core and should render the same on different platforms, which is currently the case for Mozilla and Opera 7.11.

XHTML Examples

Some of the layout effects described previously will be demonstrated and explained in this chapter.

Numbered headings

This page itself uses automatic numbering of headings and sections. This can easily be done with CSS counters.

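body {
   counter-reset: chapter;      /* Create the chapter counter for the whole page */
}

h1:before {
   content: counter(chapter) ". ";
   counter-increment: chapter;  /* Add 1 to chapter */
}

h1 {
   /* Set section to 0; placed on h1 itself so the reset also
      applies to the h2 elements that follow it */
   counter-reset: section;
}

h2:before {
   content: counter(chapter) "." counter(section) " ";
   counter-increment: section;  /* Add 1 to section */
}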

It is remarkably simple, and it saves a lot of hassle when creating a page like this: just change, add or delete a heading and the numbering updates automatically!
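The same mechanism can number a nested table of contents. The sketch below assumes the contents are marked up as nested ordered lists with a hypothetical toc class. The counters() function joins the values of all nested instances of a counter, which gives numbers such as 2.3.

```css
/* Hypothetical markup: <ol class="toc"><li>...<ol><li>...</li></ol></li></ol> */
ol.toc, ol.toc ol {
   counter-reset: item;                /* each list level gets its own instance of 'item' */
}
ol.toc li {
   display: block;                     /* suppress the browser's own list numbering */
}
ol.toc li:before {
   content: counters(item, ".") ". ";  /* e.g. "2.3. " for the third entry of chapter 2 */
   counter-increment: item;
}
```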

Numbered equations

When equations are exported as image files or XHTML/CSS is used, it is possible to automatically number them. This will be demonstrated in the following sections.

Numbered images with captions

When using images in a publication, they have to be numbered for easy reference throughout the document. Correct captions are important too and they will be automatically added with CSS.

This can be achieved by the following code:

img:after {
   content: "[" counter(image) "] " attr(title); 
   counter-increment: image;
   display: block;
   font-size: 10px;
   font-weight: bold;
   margin-top: 5px;
   margin-bottom: 20px;
}

div.example {
   counter-reset: image;
}

This code uses the image 'title' attribute for the caption text. You can set the automatic numbering of images to continue throughout the document, or reset at certain locations. In this document numbering of images will restart whenever a new example is given.

[Image: Prosthesis Wilmer Valve]

Numbering of the images will restart.

[Image: Prosthesis Wilmer Valve]
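CSS 2.1 does not fully define how :before and :after interact with replaced elements such as img, so other browsers may not render the img:after rule shown earlier. The minimal sketch below moves the number and caption to a wrapper element instead. The figure class and the markup are assumptions, not the markup used on this page.

```css
/* Hypothetical markup (file name is illustrative only):
   <div class="figure" title="Prosthesis Wilmer Valve">
     <img src="valve.png" alt="Prosthesis Wilmer Valve" />
   </div> */
div.example {
   counter-reset: image;                         /* restart numbering in every example */
}
div.figure {
   counter-increment: image;                     /* one number per figure wrapper */
}
div.figure:after {
   content: "[" counter(image) "] " attr(title); /* caption from the wrapper's title */
   display: block;
   font-size: 10px;
   font-weight: bold;
   margin-top: 5px;
   margin-bottom: 20px;
}
```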

Numbered tables with captions

The numbering and captions as discussed for images can also be applied to tables, but this time the 'summary' attribute is used for the caption.

This can be achieved by the following code:

table:after {
   content: "[" counter(table) "] " attr(summary);
   counter-increment: table;
   display: table-caption;
   caption-side: bottom;
   font-size: 10px;
   font-weight: bold;
   margin-top: 5px;
   margin-bottom: 20px;
   white-space: nowrap; /* keep the caption on one line, even if wider than the table */
}

This is
a table

Another simple table

A1 A2
B1 B2

In contrast to the automatic numbering of the images, the numbering of the tables will continue in this example.

This is the first column of the first row | This is the second column of the first row
This is the first column of the second row | This is the second column of the second row

Another simple table

A1 A2
B1 B2
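HTML also has a real caption element. The summary attribute, by contrast, is intended as a description for non-visual user agents rather than as visible text. The following sketch applies the same numbering to caption instead of summary. The markup is hypothetical.

```css
/* Hypothetical markup: <table><caption>Another simple table</caption>...</table> */
body {
   counter-reset: table;      /* number tables throughout the document;
                                 merge with any other counter-reset on body */
}
table {
   counter-increment: table;  /* counted on the table, so its caption can read the value */
}
caption {
   caption-side: bottom;
   font-size: 10px;
   font-weight: bold;
}
caption:before {
   content: "[" counter(table) "] ";
}
```

This keeps the visible caption in the document content. The summary attribute stays free for its intended purpose.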

Maths

The following text and corresponding stylesheet were copied with permission from http://geocities.com/csssite/index.xml, where there are more examples and good explanations. The version of the math.css stylesheet used on this page is an old one; newer, more advanced versions are now available.

This is the code for the first equation in the next example. It gives an idea of the markup.

<e id="re">

<ov>R</ov><l>E</l>(u) =
<phic/><l>W</l><t><minus/> 1</t>([E , <phic/><l>W</l>(u)]) <minus/>
L<l>E</l>u
</e>
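Here is a very rough idea of how such custom tags can be styled. These hypothetical rules are not the actual math.css. The meaning of each tag is inferred from the example above: ov is an overline, l is a lower index, t is an upper index, and minus and phic produce the minus sign and the letter phi.

```css
/* Hypothetical, much-simplified rules (NOT the actual math.css).
   In an XML document unknown elements are inline unless styled otherwise. */
e {
   display: block;             /* one equation per line */
   margin: 1em 2em;
}
ov {
   text-decoration: overline;  /* bar over a symbol, as in R-bar */
}
l {
   vertical-align: sub;        /* lower index */
   font-size: 70%;
}
t {
   vertical-align: super;      /* upper index, e.g. the -1 of an inverse */
   font-size: 70%;
}
minus:before {
   content: "\2212";           /* U+2212 MINUS SIGN */
}
phic:before {
   content: "\03C6";           /* U+03C6 GREEK SMALL LETTER PHI (assumed meaning of phic) */
}
```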

The code for the equations is too elaborate to explain here and is better explained elsewhere. It uses custom (non-XHTML) tags for the equations, saving a lot of space in the process. The code for numbering the equations is shown here:

/* equation counter */

e:after {
   content: "[" counter(equation) "]";
   counter-increment: equation;
   display: block;
   float: right;
   margin-right: 10px;
}

This makes the number show to the right of the equation.
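You can extend the rule so that equation numbers restart in every chapter and carry the chapter number, e.g. [3.2]. The sketch below assumes two things: the chapter counter of the numbered-headings example is in scope for the whole document (reset on body), and equations follow their chapter heading as siblings or inside sibling elements.

```css
/* Assumes the 'chapter' counter from the numbered-headings example */
h1 {
   /* one declaration for both counters: a second counter-reset on h1
      would override the first instead of adding to it */
   counter-reset: section equation;
}
e:after {
   content: "[" counter(chapter) "." counter(equation) "]";
   counter-increment: equation;
   display: block;
   float: right;
   margin-right: 10px;
}
```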

Proof:
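Let us consider the following operator on the space of 1-forms

$$\bar{R}_E(u) = \varphi_W^{-1}\big([E, \varphi_W(u)]\big) - L_E u$$

(here $\varphi_W$ is the isomorphism (2)). It is obvious that $\bar{R}_E$ is a linear operator. It is invariant since the evolution operator $W(h)$ commutes with both $\varphi_W$ (as far as $[W(h), W] = 0$) and $E$ (because $E$ generates a symmetry, $[E, W(h)] = 0$). In terms of the local coordinates, $\bar{R}_E$ has the following form

$$\bar{R}_E = \sum_{ab} L_{ab}\, dz_a \otimes \frac{\partial}{\partial z_b}$$

and the invariance condition $\frac{d}{dt}\bar{R}_E = L_{W(h)}\bar{R}_E = 0$ yields

$$\begin{aligned}
\frac{d}{dt}\bar{R}_E &= \frac{d}{dt}\sum_{ab} L_{ab}\, dz_a \otimes \frac{\partial}{\partial z_b} \\
&= \sum_{ab}\Big(\frac{d}{dt}L_{ab}\Big)\, dz_a \otimes \frac{\partial}{\partial z_b}
 + \sum_{ab} L_{ab}\,\big(L_{W(h)}\, dz_a\big) \otimes \frac{\partial}{\partial z_b}
 + \sum_{ab} L_{ab}\, dz_a \otimes \Big(L_{W(h)}\frac{\partial}{\partial z_b}\Big) \\
&= \sum_{ab}\Big(\frac{d}{dt}L_{ab}\Big)\, dz_a \otimes \frac{\partial}{\partial z_b}
 + \sum_{abcd} L_{ab}\,\frac{\partial}{\partial z_c}\Big(W_{ad}\frac{\partial h}{\partial z_d}\Big)\, dz_c \otimes \frac{\partial}{\partial z_b}
 - \sum_{abcd} L_{ab}\,\frac{\partial}{\partial z_b}\Big(W_{cd}\frac{\partial h}{\partial z_d}\Big)\, dz_a \otimes \frac{\partial}{\partial z_c} \\
&= \sum_{ab}\Big(\frac{d}{dt}L_{ab} + \sum_c \big(P_{ac}L_{cb} - L_{ac}P_{cb}\big)\Big)\, dz_a \otimes \frac{\partial}{\partial z_b} = 0,
\end{aligned}$$

where $P_{ab} = \sum_d \frac{\partial}{\partial z_a}\Big(W_{bd}\frac{\partial h}{\partial z_d}\Big)$. In matrix notation this reads

$$\frac{d}{dt}L = [L, P].$$

So we have proved that the non-Noether symmetry canonically yields a Lax pair on the algebra of linear operators on the cotangent bundle over the phase space.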

Conclusion

At present, the available web specifications are not yet fully equipped for publishing scientific documents. For instance, cross-referencing and multi-column layout are scheduled for CSS3, which will not be supported for a few years. Also, most web browsers do not support the available standards sufficiently, which severely compromises the usefulness of the format. Currently the browser with the best support for the required specifications is Opera 7.11, which supports all required modules from CSS2.

Equations are a special point of concern. There is a MathML specification, but only a few browsers support it, and doubts have been raised about the language: its complexity and its mixing of markup and content. A limited range of equations can be achieved with XHTML and CSS. More extensions are planned for CSS3, but none have been written yet.

Despite the problems encountered, I still believe XHTML and CSS have potential for scientific publications. There are other advantages I have not mentioned yet: the same document can be used for online publishing, presentation and printing. For instance, press F11 in Opera and you get a slideshow (browse with PageUp/PageDown). Press 'P' and you get a version adapted for printing. There is even a special version for handhelds (press Shift-F11). Try that with PDF! :-)
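The slideshow, print and handheld versions work because one stylesheet can contain rules for different media types. The sketch below shows such rules. The media types print, projection and handheld come from CSS2, but the individual rules are illustrative assumptions. The page break per chapter assumes that Opera's slideshow mode starts a new slide at each page break.

```css
@media print {
   body { font-family: serif; font-size: 11pt; color: black; background: white; }
   a:after { content: " (" attr(href) ")"; }  /* show link targets on paper */
}
@media projection {
   h1 { page-break-before: always; }           /* assumed: one chapter per slide */
}
@media handheld {
   body { margin: 0; font-size: small; }
   img { max-width: 100%; }                    /* keep images within the small screen */
}
```

The projection and handheld media types were later deprecated in Media Queries Level 4.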

References

W3C CSS 2 specs: http://www.w3.org/TR/REC-CSS2/generate.html

W3C CSS 3 working draft: http://www.w3.org/Style/CSS/current-work

Mathematics with XHTML: http://geocities.com/chavchan/

About the author

My name is Mark Schenk and I am a Mechanical Engineering student at the Delft University of Technology, an Opera user/lover and web design hobbyist. You can contact me via e-mail.

Thanks to George Chavchanidze for letting me use his fantastic mathematics stylesheet. Kudos to him!

This work is licensed under a Creative Commons License.