March 24, 2014

Shadow DOM

SVG, CSS, React and Angular

For a while now I've been working on MathBox 2. I want to have an environment where you take a bunch of mathematical legos, bind them to data models, draw them, and modify them interactively at scale. Preferably in a web browser.

Unfortunately HTML is crufty, CSS is annoying and the DOM's unwieldy. Hence we now have libraries like React. It creates its own virtual DOM just to be able to manipulate the real one—the Agile Bureaucracy design pattern.

The more we can avoid the DOM, the better. But why? And can we fix it?

<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 1.1//EN" "http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd">
<svg version="1.1" id="Layer_1" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" x="0px" y="0px"
  width="400px" height="400px" viewBox="0 0 400 400" enable-background="new 0 0 400 400" xml:space="preserve">
  <polygon fill="#FDBD10" stroke="#BE1E2D" stroke-width="3" stroke-miterlimit="10" points="357.803,105.593 276.508,202.82 
    343.855,310.18 226.266,262.91 144.973,360.139 153.592,233.697 36.002,186.426 158.918,155.551 167.538,29.109 234.885,136.469 "/>
  <polygon fill="#FDEB10" points="326.982,114.932 259.695,195.408 315.441,284.271 218.109,245.146 150.821,325.623 157.955,220.966 
    60.625,181.838 162.364,156.283 169.499,51.625 225.242,140.488 "/>
</svg>

Dangling Nodes

Take SVG. Each XML tag is a graphical shape or instruction. Like all XML, the data has to be serialized into tags with attributes made of strings. Large data sets turn into long string attributes to be parsed. Large collections of stuff turn into many separate tags to be iterated over. Neither is really desirable.

It only represents basic operations, so all serious prep work has to be done by the user up front. This is what D3 is used for, generating and managing more complex mappings for you.

When you put SVG into HTML, each element becomes a full DOM node. A simple <tag> with attributes is now a colossal binding between HTML, JS, CSS and native. It's a JavaScript object that pretends to be an XML tag, embedded inside a layout model that takes years to understand fully.

Its namespace mixes metadata with page layout, getters and setters with plain properties, native methods with JS, string shorthands with nested objects, and so on. Guess how many properties the DOM Node Object actually has in total? We'll be generous and count style as one.

A hundred is not even close. A plain <div> doesn't fare much better. Just serializing a chunk of DOM back into its constituent XML is a tricky task once you get into fun stuff like namespaces. Nothing in the DOM is as simple as JSON.stringify. Why does my polygon have a base URI?

We have all these awesome dev tools now, yet we're using them to teach a terrible model to people who don't know any better.

DOM Shader

In contrast, there's Angular. I like it because they've pulled off a very neat trick: convincing people to adopt a whole new DOM by disguising it as HTML.

<body ng-controller="PhoneListCtrl">
  <ul>
    <li ng-repeat="phone in phones">
      {{phone.name}}
      <p>{{phone.snippet}}</p>
    </li>
  </ul>
</body>

When you use <input ng-model="foo"> or <my-directive>, you're creating a controller and a scope, entirely separate from the DOM, with their own rules and chain of inheritance. The pseudo-HTML in the source code is merely an initial definition, most of it inert to the browser. Angular parses it out and replaces much of it.

Like React, the browser's live DOM is subsumed and used as a sort of render tree, a generic canvas to be cleverly manipulated to match a given set of views. The real view tree hides in the shadows of JS, where controllers operate on scopes. They only use the DOM to find each other on creation, and then communicate directly. The DOM is mostly there to trigger events, do layout and look pretty. Form controls are the one exception.

It's a bad fit because the DOM was built for text markup and there's tons of baggage in the form of inline spans, floats, alignment, indentation, etc. Most of these are layout systems disguised as typography, of which CSS now has several.

The whole idea of cascading styles is suspect. In reality, most styles don't actually cascade: paddings and backgrounds are set on individual elements. The inherited ones are almost all about typography: font styles, text justification, writing direction, word wrap, etc.

Think of it this way: why should a table have a font size? Only the text inside the table can have a font size, the table is just a box with layout that contains other boxes. Why don't we write table text { size: 16px; } instead of table { font-size: 16px; }? Text nodes exist today.

Well because that's how HTML's <font> tag worked. Instead of just making a selector for text nodes, they gave all the other elements font properties. They didn't get rid of font tags, they made them invisible and put one inside each DOM node.

<html><font>
  <body><font>
    <h1><font>Hello World</font></h1>
    <p><font>Welcome to the future.</font></p>
  </font></body>
</font></html>

Unreasonable Behavior

It was decided the world would be made of block and inline elements—divs and spans—and they saw that it was good, until someone came along and said, hey, so what about my table?

<table>
  <tr>
    <td>Forever</td>
    <td>Alone</td>
  </tr>
</table>

This <table> can't be replicated with CSS 1. Tables require a particular arrangement of children and apply their own box model. It's a directive posing amongst generic markup, just like Angular.

CSS has never been able to deliver on the promise of turning semantic HTML into arbitrary layout. We've always been forced to add extra divs or classes. These are really just attachment points for independent behaviors.

Purists see these as a taint upon otherwise pristine HTML, even though I've never seen someone close a website because the markup was messy. Not all HTML should be semantic. Rather, HTML stripped of its non-semantic parts should remain meaningful to robots.

CSS 2's solution was instead to make <table> invisible too, to go with the invisible <float>, <layer>, <clear> and <frame> tags which we pretended we didn't have. Watch:

17.2.1 Anonymous table objects

[…] Any table element will automatically generate necessary anonymous table objects around itself, consisting of at least three nested objects corresponding to a 'table'/'inline-table' element, a 'table-row' element, and a 'table-cell' element. Missing elements generate anonymous objects (e.g., anonymous boxes in visual table layout) according to the following rules […]

.grid {
  display: table;
}
.grid > ul {
  display: table-row;
}
.grid > ul > li {
  display: table-cell;
}

This is called Not Using Tables.

Without typographical styles, block elements start to look very different. They're styled boxes with implied layout constraints. They stack vertically, expand horizontally and shrink wrap vertically. Floated blocks are boxes that stack horizontally, and shrink wrap both ways. Tables are grids of boxes that are locked together.

Just think how much simpler CSS would be if boxes had box styles and text had text styles, instead of all of them having both. Besides, block margins and paddings don't even work the same on inline elements, there's a whole new layout behavior there.

So we do have two kinds of objects, text and boxes, but several different ways of combining them into layout: inline, stacked, nested, absolute, relative, fixed, floated, flex or table. We have optional behaviors like scrollable, draggable, clipped or overflowing.

They're spread across display, position, float and more, only meaningful in some combinations. And presence is mixed in there too. As a result, you can't unhide an element without knowing its display model. This is a giant red flag.

Thinking with Portals

It should further raise eyebrows that the binary world of inline and block now also includes a hybrid called inline-block.

Medium share thing

You generally don't need to embed a contact form–or all of Gmail—in the middle of mixed English/Hebrew poetry shaped like a bird. You just don't. To attach something to flowing text, you should insert an anchor point instead and add floating constraints. Links are called anchor tags for a reason. Why did we forget this?

Don't shove your entire widget right between the words. You'd inherit styles, match new selectors and bubble all your events up through the text just for the sake of binding a pair of (x, y) coordinates.

Heck, pointer events, cursors, hover states... these are for interactive elements only. Why isn't that optional, so mouse events wouldn't need to bubble up through inert markup? This would completely avoid the mouseover vs mouseenter problem. What is the point of putting a resize cursor on something that is dead without JavaScript? Pointer events shouldn't fire on inert children, and inert parents shouldn't care about interactive children. It's about boundaries, not hierarchy.

Things like SVG are better used as image tags instead of embedded trees, just slotting into place while ignoring their surroundings. They do need their own tree structure, but there is little reason to graft it onto HTML/CSS, inheriting original sin. The nodes have too little in common. At most you can share the models, not the controllers.

We should be able to manipulate them from the outside, like a <canvas>, but define and load them declaratively, like an image tag.

For that matter, MathML should really be a single inline text tag, optimized for math, not a bunch of tags. Regular text spans are not just "plain text". They are trimmed, joined, bidirectionalized, word wrapped and ellipsified before display. It's a separate embedded layout model that makes up the true, invisible <p> tag. A tag that HTML1 actually sort of got right: as an operator.

We create JavaScript with code, not as abstract syntax trees. Why should I build articles and embedded languages out of enormously nested trees, instead of just typing them out and adding some anchor tags around specific interesting parts? The DOM already inserts invisible text nodes everywhere. We didn't need to wrap all our words in <text> tags by hand just to embiggen one of them. The mutant tree on the right could just look like this:

<math>x = (-b &pm; &Sqrt;(b^2 - 4 a c)) / 2a</math>

<math>x = (-b ± √(b^2 - 4 a c)) / 2a</math>

Wasn't HTML5 supposed to match how people write it? LaTeX exists.

And which is easier: defining a hairy new category of pseudo-elements like :first-letter and :first-line… or just telling people to wrap their first letter in a span if they really want to make it giant? It was ridiculous to have this feature in a spec that didn't include tables.

The :first-line problem should be solved differently: you define two separate blocks inside a directive, to spread markup across two children with a content binding. It's no different from flowing text across lines and columns.

<mrow>
  <mi>x</mi>
  <mo>=</mo>
  <mfrac>
    <mrow>
      <mrow>
        <mo>-</mo>
        <mi>b</mi>
      </mrow>
      <mo>&#xB1;<!--PLUS-MINUS SIGN--></mo>
      <msqrt>
        <mrow>
          <msup>
            <mi>b</mi>
            <mn>2</mn>
          </msup>
          <mo>-</mo>
          <mrow>
            <mn>4</mn>
            <mo>&#x2062;<!--INVISIBLE TIMES--></mo>
            <mi>a</mi>
            <mo>&#x2062;<!--INVISIBLE TIMES--></mo>
            <mi>c</mi>
          </mrow>
        </mrow>
      </msqrt>
    </mrow>
    <mrow>
      <mn>2</mn>
      <mo>&#x2062;<!--INVISIBLE TIMES--></mo>
      <mi>a</mi>
    </mrow>
  </mfrac>
</mrow>

This is the first example in the MathML spec. Really. "Invisible times".

<join>
  <box class="first-line"></box>
  <box></box>
  <content>Hello New World</content>
</join>

Would this really be insane?

The Boxed Model

CSS got it wrong and we're now suffering the consequences. The HTML feature that was ignored in CSS 1 was the thing they should've focused on: tables, which were directives that generated layout. It set us on a path of trying to fake them by piggybacking on supposedly semantic elements, like lipstick on a div. Really we were pigeonholing non-linear layout as a nested styling problem.

Semantic content was a false spectre on the document level. Making our menus out of <ul> and <li> tags did not help impaired users skip to the main article. Adding roman numerals for lists did not help us number our headers and chapters automatically.

View and render trees are supposed to be simple and transparent data structures, the model for and result of layout. This is why absolute positioning is a performance win for mobile: it avoids creating invisible dynamic constraints between things that rarely change. Styles are orthogonal to that, they merely define the shape, not where it goes.

Flash had its flaws, but it worked 15 years ago. Shoving raw SVG or MathML into the DOM—or god forbid XML3D–is a terrible idea. It's like there's an entire class of developers who've now forgotten how fast computers actually are and how memory is supposed to work. A stringly typed kitchen sink is not it.

So I frown when I see people excited about SVG in the browser in the year 2014, making polygons out of CSS 3D or driving divs with React. Yes I know, it's fun and it does work. And Angular shows the web component approach has merit. But we need a way out.

CSS should be limited to style and typography. We can define a real layout system next to it rather than on top of it. The two can combine in something that still includes semantic HTML fragments, but wraps layout as a first class citizen. We shouldn't be afraid to embrace a modular web page made of isolated sections, connected by reference instead of hierarchy.

Not my problem though, I can make better SVGs with WebGL in the meantime. But one can dream.

Angular CSS DOM Featured JavaScript React SVG March 24, 2014

Hackery, Math & Design

Steven Wittens i