MathML forges on

The standard for mathematical content in publishing workflows, technical writing, and math software

Twenty years into the web, math and science are still second-class citizens online. MathML is part of HTML 5 and its adoption has seen ups and downs, but if you look closely there is more light than shadow, and a great opportunity to revolutionize educational, scientific and technical communication.

[Image: a printer in 1568 CE]

Somebody once compared the first 20 years of the web to the first 100 years of the printing press. It has become my favorite perspective when thinking about web standards, the web platform and in particular browser development. 100 years after Gutenberg the novel had yet to be invented, typesetting quality was crude at best and the main products were illegally copied pamphlets. Still, the printing press had revolutionized communication and enabled social change on a massive scale.

[Image: newspaper web offset printing press (“DE-Zeitungsrollenoffsetdruck” by Steschke)]

In the near future, all our current web technology will look like Gutenberg’s original press sitting next to an offset digital printing machine.

With faster and faster release cycles it is sometimes hard to keep in mind what is important in the long run—enabling and revolutionizing human communication.

Since I joined the MathJax team in 2012, I have gained many new perspectives on MathML, the web standard for display of mathematical content, and its role in making scientific content a first class citizen on the web. But it is rather useless to talk about MathML’s potential without knowing about the state of MathML on the web. So let’s tackle that in this post.

A little bit of history

MathML has a long and somewhat particular history among the standards that make up HTML 5. It started out as the <math> tag in the draft of HTML 3 (all the way back in 1995) but HTML 3 proved too complex for browsers to implement (we are talking Netscape and IE here). As modularization was the hope and XML was the fashion, the math tag was eventually kicked out with HTML 3.2 and turned into a separate XML language—and a year later MathML was born.

In the past decade MathML became the standard for mathematical content in publishing workflows, technical writing, and math software. Despite its success, MathML remained hidden from the view of the (web) community, as most MathML content stayed behind paywalls, within intranets or in print. For lack of browser support, MathML entered the open web mostly as image renderings or PDFs, losing all the advantages of its markup in the process.

With HTML 5, MathML (now at version 3) was finally brought back into the fold as a regular part of HTML: no more namespaces, no more XML parsing. In HTML 5, MathML is just HTML.

(By the way, for those who know TeX/LaTeX (and maybe think it is the one true way), it is probably important to know that the MathML and LaTeX working groups overlap and that MathML, while a completely different beast, is sort of what you get when you try to fit math mode LaTeX into a DOM (this is, of course, a lie).)

Browser support

Given its odd history, it is perhaps not too surprising that the state of MathML support in browsers is a bit complicated. First off, MathML consists of two large parts, Presentation and Content MathML. No browser or JavaScript polyfill renders Content MathML (but see below for plugins). However, this is not really a problem in real life since Content MathML can be converted and embedded into Presentation MathML.
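To make that last point a little more concrete, here is a minimal sketch in Python of what such a conversion does. It assumes a tiny subset of Content MathML (apply with plus or times, ci and cn); the function name content_to_presentation and the INFIX table are made up for this illustration, and the sketch ignores operator precedence and parentheses. Real converters, typically XSLT stylesheets, cover far more.

```python
import xml.etree.ElementTree as ET

# Hypothetical operator table for this sketch:
# Content MathML operator element -> infix symbol in Presentation MathML.
INFIX = {"plus": "+", "times": "\u2062"}  # U+2062 INVISIBLE TIMES


def content_to_presentation(node):
    """Convert a tiny Content MathML subset into Presentation MathML."""
    if node.tag == "ci":  # identifier -> <mi>
        out = ET.Element("mi")
        out.text = (node.text or "").strip()
        return out
    if node.tag == "cn":  # number -> <mn>
        out = ET.Element("mn")
        out.text = (node.text or "").strip()
        return out
    if node.tag == "apply":  # operator application -> <mrow>
        op, *args = list(node)
        if op.tag not in INFIX:
            raise ValueError(f"unsupported operator: {op.tag}")
        row = ET.Element("mrow")
        for i, arg in enumerate(args):
            if i > 0:  # put the operator between consecutive arguments
                mo = ET.SubElement(row, "mo")
                mo.text = INFIX[op.tag]
            row.append(content_to_presentation(arg))
        return row
    raise ValueError(f"unsupported element: {node.tag}")


content = ET.fromstring("<apply><plus/><ci>x</ci><cn>1</cn></apply>")
presentation = ET.tostring(content_to_presentation(content), encoding="unicode")
print(presentation)  # prints <mrow><mi>x</mi><mo>+</mo><mn>1</mn></mrow>
```

The Content MathML input says "apply plus to x and 1"; the Presentation MathML output says "an x, a plus sign, a 1, in a row", which is what browsers and polyfills actually render.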

Gecko/Firefox

Mozilla has relied on its community to slowly move MathML in Gecko/Firefox forward. Most of the implementation has been done by unpaid volunteers (but of course code review, support and maintenance have been provided by Mozilla employees). Still, Mozilla does not seem too interested in pushing its MathML development over the finish line.

The current state of MathML in Mozilla is solid and sufficient for production use; see their overview for details. The largest part of (Presentation) MathML 3 is implemented. The big missing blocks are the “elementary math” elements (important for school-level math) as well as some of the more advanced <mtable> features and alignments (one of the trickiest parts of MathML that cannot easily be done with HTML/CSS constructs).

Gecko’s implementation re-uses HTML/CSS rendering code, which is as it should be—math is two-dimensional text, typographically speaking. There are a few Firefox Add-ons to tweak things, in particular for Content MathML and elementary math support via XSLT stylesheets.

The future outlook is much like the past. Whether the remaining MathML features are implemented will depend largely on volunteers or third parties stepping up but at least Mozilla considers the code base important.

WebKit/Safari

WebKit has also relied on unpaid volunteers for its implementation. There have been basically two separate pushes (by separate volunteers) and no steady development so far. In addition, Apple developers have been spotted working on VoiceOver/accessibility-related issues. WebKit has not really shown strong interest in or support for the volunteer work but has at least accepted contributions. The second volunteer push was in 2012 (more on that later) and it seems the last chunk of these contributions has finally been integrated into Safari 7.

WebKit’s MathML support can only be described as partial and is not ready for professional production. However, it is enough for many enthusiasts, which could create a productive feedback loop. While simple mathematical expressions will render, some basic constructs are not implemented. Notably limited or missing are horizontal stretch characters, RTL, linebreaking, elementary math, and most mtable attributes; check WebKit’s status page for details. As with Firefox, WebKit’s implementation re-uses HTML/CSS rendering components.

The future outlook is somewhat bleak. There is a little bit of volunteer work going on (fixing security bugs, implementing multiscripts). The 2012 push (see Blink below) has shown that without a clearer message from WebKit companies, it is unlikely WebKit will attract new contributions.

On the bright side, a few WebKit developers review code and Safari actually uses the MathML code. While the rendering quality of WebKit is currently too low, the 80/20 point is not that far off.

Internet Explorer

Microsoft has never indicated interest in implementing MathML in Internet Explorer directly. However, in the good old days of IE monopoly (aka 2000), Design Science released its free MathPlayer plugin for IE. MathPlayer (now at version 3) still provides the most complete MathML 3 support (including Content MathML), but unfortunately only on IE 6-9.

Accordingly, the current support in IE is a bit odd: for IE<10 with MathPlayer installed, MathML support is virtually flawless. Otherwise, there is absolutely no native support.

The outlook for the future is actually worse. Microsoft has begun to kick plugins out of IE so that MathPlayer will not be able to support IE10+. Unfortunately, Microsoft is still not showing interest in adding MathML support to IE. On the other hand, Microsoft has consistently sent a member to the W3C Math Working Group and supports MathML in its Office products and handwriting recognition.

Blink/Chrome/Opera

Chrome was the center of quite a bit of drama in early 2013. As mentioned above, 2012 saw the second volunteer effort to improve MathML in WebKit (which Chrome used at the time). A single volunteer re-wrote & improved most of the MathML code in WebKit with the specific goal of getting MathML activated in Chrome. He eventually succeeded and Chrome 24 enabled MathML. But, as it often goes, volunteers have to go back to making a living and the Chrome team did not take up ownership of the code. So when a security issue came up, Chrome decided to deactivate the MathML code again, rather than accept further community patches. (Those issues have since been resolved in WebKit.)

With the move to Blink, the MathML code was officially removed from Chrome’s code base. It could probably be brought back from WebKit for a while but for the time being Chrome has no support for MathML (and it does not exactly inspire third-party contributions in the future).

Opera gave up its (extremely) limited MathML support in Presto with the move to Chromium. Presto’s implementation was based on a CSS stylesheet using the CSS3 tables module designed for that purpose. Unfortunately, only Presto ever supported it and the module has since been retired.

As it turned out, stylesheets are simply too limited to implement MathML. If Opera’s switch to Chromium has one good consequence, it might be that the W3C “MathML CSS profile” (which you should never even link to) will finally be forgotten—it never worked and never will but gave casual observers plenty of wrong ideas.

Addendum. Shortly before publishing this post, a Chrome team member added a comment to issue 152430 (Enabling MathML support) stating that “MathML is not something that we want at this time”. This sounds very dramatic but won’t surprise anyone following the development over the past year. Unfortunately, the reactions on that thread are mostly noise. Personally, I believe this statement can actually be helpful and restart the conversation.

MathML Test Suite

As a point of reference, the W3C Math Working Group keeps an extensive MathML test suite. Feel free to run your favorite browser through it and compare the results.

Polyfills

MathML’s history makes polyfills challenging, to say the least. Most polyfills take on new APIs, e.g. web sockets, storage, shadow DOM. Accordingly, most polyfills are not held to extremely high standards and have time to grow and influence browser development. Above all, most polyfills do not have to implement a whole new text rendering capability that closely interacts with existing text rendering.

Mathematical layout is different. Math is two-dimensional text and there are already high standards, deeply ingrained into education and publishing. At the same time, the need for math on the web has been urgent from the start, which has led to horrifying stopgap solutions that are inherently wrong, such as image rendering (would you render regular text as an image?).

Besides providing acceptable rendering, MathML polyfills also have to tackle several other problems. For example, fonts remain a challenge (see more below) and browsers do not offer enough APIs for accessing font metrics, calculating correct widths and heights, or signaling the download of webfonts.

Another problem for polyfills is CSS. While picking up direct (inline) styling of MathML elements is easy enough and inheritance of surrounding CSS will work if a polyfill generates HTML, the only other proper method for accessing intended styles is getComputedStyle—which is too slow for hundreds or thousands of equations.

A common misconception is that a stylesheet may be enough to implement MathML. Opera tried and failed. (If you take away only one thing from reading this, please let it be this: a stylesheet for MathML will not work. Really. Save yourself the time and everyone the trouble.)

MathJax

[Disclaimer in case you missed it earlier: I am part of the MathJax team.]

MathJax is a bit more than just a MathML polyfill. The MathJax project started in 2008 as the successor to jsmath when technologies such as CSS 2.1 and webfonts were just about good enough to have (lots of clever) JavaScript solve the problem—no plugins, no font installation, just working out of the box.

MathJax implements the TeX layout algorithm, laying out subexpressions recursively with precise measurements and asynchronously providing webfonts, their font metrics and everything else needed for cross-browser support, all the way down to IE6.
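The recursive part can be illustrated with a toy model. The sketch below is not MathJax code and not the full TeX algorithm; the expression format, the Box type and the AXIS and GAP constants are assumptions invented for this example (units are em, and the thickness of the fraction rule is ignored). It only shows how the size of an expression follows from the measured sizes of its subexpressions.

```python
from dataclasses import dataclass

# Made-up constants for this sketch, in em units.
AXIS = 0.25   # height of the math axis (where the fraction rule sits)
GAP = 0.125   # clearance between the rule and numerator/denominator


@dataclass(frozen=True)
class Box:
    width: float
    height: float  # extent above the baseline
    depth: float   # extent below the baseline


def measure(expr):
    """Recursively measure a nested expression.

    expr is one of:
      ("glyph", width, height, depth)   a leaf with known font metrics
      ("row", child, child, ...)        children set side by side
      ("frac", numerator, denominator)  children stacked around the axis
    """
    kind = expr[0]
    if kind == "glyph":
        _, width, height, depth = expr
        return Box(width, height, depth)
    if kind == "row":
        boxes = [measure(child) for child in expr[1:]]
        return Box(
            width=sum(b.width for b in boxes),
            height=max((b.height for b in boxes), default=0.0),
            depth=max((b.depth for b in boxes), default=0.0),
        )
    if kind == "frac":
        num, den = measure(expr[1]), measure(expr[2])
        return Box(
            width=max(num.width, den.width),
            # the numerator's full extent sits above the axis plus a gap
            height=AXIS + GAP + num.height + num.depth,
            # the denominator's full extent hangs below the axis minus a gap
            depth=GAP + den.height + den.depth - AXIS,
        )
    raise ValueError(f"unknown expression kind: {kind}")


x = ("glyph", 0.5, 0.5, 0.0)
y = ("glyph", 0.5, 0.5, 0.25)
print(measure(("row", x, ("frac", x, y))))
```

The leaf measurements are the crux: without reliable font metrics for every glyph, none of the sizes further up the tree can be trusted, which is why a polyfill has to ship that data itself.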

MathJax is highly modular and extensible in input, output, and internal format; it currently accepts MathML, TeX and asciimath input, and creates either HTML/CSS, SVG or (tweaked) MathML output. Its MathML support covers most of Presentation MathML and there is experimental support for Content MathML. Notably missing are elementary math, RTL, and some of the advanced mtable attributes; see the MathJax documentation for details. MathJax also comes with a rich set of APIs which have enabled everything from StackExchange sites to web-based editors to interactive document formats (iPython, Sage) to ePub 3 reading systems.

jqmath

jqmath is actually almost as old as MathJax but only recently separated input and output so that it now works with standard MathML input (jqmath also offers a nice serialized input language).

As a polyfill, jqmath takes a completely different approach from MathJax, trying to let browser layout engines do most of the work. It is faster than MathJax but has trouble dealing with more complex content—browsers are just not reliable enough. jqmath mostly relies on local fonts and browser support for those.

jqmath is developed by Dave Barton, the volunteer who worked on WebKit’s MathML code in 2012. Accordingly, jqmath works especially well when augmenting WebKit/Safari MathML support.

tl;dr—samples!

Here are two sets of samples.

What works

First, a sample of four equations that should render OK out of the box in Firefox, Safari, MathPlayer, jqmath, and MathJax. Let’s start with your browser’s rendering.



Here are some screenshots for comparison.

What does not work

Here is a sample of four equations where at least one won’t work out of the box in your browser (except with MathPlayer on IE). This sample contains elementary math and RTL content in the bottom row. Again, let’s first look at your browser’s rendering.



Again, here are some screenshots for comparison.

Fonts

Fonts are a particular issue for mathematics, and for MathML and polyfills especially. The most obvious problem is that most fonts do not contain glyphs for mathematical characters. But even when they do, many mathematical and scientific characters lie outside the Unicode BMP and only recent browser versions support non-BMP codepoints well enough.
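A quick way to see what "outside the BMP" means in practice is to count UTF-16 code units, which is also how JavaScript strings are indexed. The Python sketch below compares a BMP character with a mathematical alphanumeric symbol; the helper names is_bmp and utf16_units are illustrative.

```python
import unicodedata

summation = "\u2211"       # U+2211 N-ARY SUMMATION, inside the BMP
script_a = "\U0001D49C"    # U+1D49C MATHEMATICAL SCRIPT CAPITAL A, outside the BMP


def is_bmp(ch):
    """True if the character lies in the Basic Multilingual Plane."""
    return ord(ch) <= 0xFFFF


def utf16_units(ch):
    """Number of 16-bit code units needed to encode ch in UTF-16."""
    return len(ch.encode("utf-16-le")) // 2


for ch in (summation, script_a):
    print(
        f"U+{ord(ch):04X} {unicodedata.name(ch)}: "
        f"BMP={is_bmp(ch)}, UTF-16 units={utf16_units(ch)}"
    )
```

The script capital A needs two code units (a surrogate pair), so any code that assumes one unit per character will split it in half.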

Mathematics also needs stretchy characters (parentheses, braces, root signs, etc.) which are built out of multiple glyphs; some of these glyphs have no Unicode codepoint and fonts store them at PUA codepoints (or even outside the Unicode range). In theory, the OpenType MATH table extension (developed but not officially released by Microsoft) could resolve these problems. However, no browser actually supports OpenType MATH tables and, as with most low-level font technology, JavaScript polyfills would probably not be able to access them.
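For a feel of how such an assembly works, here is a small Python sketch that builds a tall left parenthesis from its pieces. The parenthesis is a convenient case because its pieces do have Unicode codepoints (U+239B, U+239C, U+239D); the function name tall_left_paren and the row-based sizing are simplifications assumed for this example, whereas real renderers work with glyph heights taken from font metrics.

```python
UPPER_HOOK = "\u239B"  # LEFT PARENTHESIS UPPER HOOK
EXTENSION = "\u239C"   # LEFT PARENTHESIS EXTENSION
LOWER_HOOK = "\u239D"  # LEFT PARENTHESIS LOWER HOOK


def tall_left_paren(rows):
    """Return the glyphs, top to bottom, for a left parenthesis `rows` lines tall."""
    if rows < 1:
        raise ValueError("rows must be at least 1")
    if rows == 1:
        return ["("]  # the ordinary glyph is enough, no assembly needed
    # One hook at each end, with as many extension pieces as needed in between.
    return [UPPER_HOOK] + [EXTENSION] * (rows - 2) + [LOWER_HOOK]


print("\n".join(tall_left_paren(4)))
```

Pieces that live at PUA codepoints are assembled the same way, except that the codepoints differ from font to font, which is why support ends up being hardcoded per font.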

On the browser side, Gecko/Firefox supports stretchy constructions with Unicode characters in general, but for non-Unicode components support is limited to the STIX and Asana fonts (by hardcoding the PUA codepoints); see Mozilla’s documentation. WebKit/Safari only supports some stretchy constructions with Unicode characters. A big problem is for users to have the necessary fonts on their system. Firefox does not ship math fonts but there is a math fonts add-on. Safari ships with the STIX fonts on OS X, but not on iOS.

On the polyfill side, MathJax provides the necessary font data for its own webfonts and the STIX fonts with more font options in the upcoming release; jqmath leverages local fonts.

Accessibility

One of the great advantages of MathML is accessibility. Accessibility today is not just about low vision and blindness but everything from physical to learning disabilities. Most of all, it improves content for all users. With MathML, mathematical content becomes native—searchable, re-usable, copy&paste-able, and can take part in dynamic content. Simply put: it does anything we’ve come to expect from text on the web.

The state of MathML accessibility is another one of those odd aspects of its history. With so little browser support, you might not expect accessibility tools. However, the already mentioned MathPlayer plugin for IE is also the gold standard for math accessibility, providing state-of-the-art speech generation in several languages as well as Braille output, synchronized highlighting and other advanced features.

The newcomer is ChromeVox (for Chrome, ChromeOS and Android), which added math support earlier this year. Apple’s VoiceOver also recently added some support for voicing MathML in iOS 7 and OS X 10.9 (Mavericks). Given the state of MathML support in Safari and Chrome, this is truly “putting accessibility first”. Very recently, NVDA’s James Teh announced a prototype with MathML support.

Both MathPlayer and ChromeVox work well with MathJax. MathJax will recognize MathPlayer and hand off rendering & accessibility features. ChromeVox leverages MathJax’s APIs to make MathJax output as accessible as native MathML and in turn uses MathJax to make image renderings accessible on sites like Wikipedia, MathWorld and WordPress.com.


If you don’t see the screencast, please follow this link to the Design Science demo page.



The future

You might be dismayed when you hear that there is little reliable browser support and little to no active development. Or you might be frustrated with the complexities of polyfilling MathML.

A different perspective is to see MathML as the comeback kid of web standards. Browser vendors may not be able to see the importance or the opportunities that lie in MathML but its community won’t give up. Where other standards have slowly withered away, MathML has not only stood its ground, it was kicked out and made it back into HTML.

There is a simple reason for this: the standard is good. Its community is robust, the development steady and the need for math simply universal in education, research and industry around the world.

The most important outlook is that we are on the brink of solving the problem of browser support. Gecko is already past the 80/20 point and with a bit of funding WebKit could get there quickly. This might, in turn, lead to Blink reconsidering and re-importing the code from WebKit. At that point a large majority of users would be covered and polyfills could start augmenting the native rendering, instead of replacing it—and develop the web forward, towards future iterations of MathML.

We do not even have to wait for browser vendors to get around to this; it can start right now.
