Acko.net

Yak Shading

2015-09-27T00:00:00+02:00

Data-Driven Geometry

MathBox primitives need to take arbitrary data, transform it on the fly, and render it as styled geometry based on their attributes. Done as much as possible on the graphics hardware.

Three.js can render points, lines and triangles, but only with a few predetermined strategies. The alternative is to write your own vertex and fragment shaders and do everything from scratch. Each new use case means a new ShaderMaterial with its own properties, so-called uniforms. If the stock geometry doesn't suffice, you can make your own triangles by filling a raw BufferGeometry and assigning custom per-vertex attributes. Essentially, to leverage GPU computation with Three.js (and most engines, really), you have to ignore most of it.

Virtual Geometry

Shader computations are mainly rote transforms. For example, if you want to draw a line between two points, you'll have to make a long rectangle, made out of two triangles. But this simple idea gets complicated quickly once you add corner joins, depth scaling, 3D clipping, and so on. Doing this to an entire data set at once is what GPUs are made for, through vertex shaders which transform points.
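As a minimal sketch of the "long rectangle" idea, here is the expansion done on the CPU in plain JavaScript, in 2D and without joins, depth scaling or clipping. The function name `segmentToQuad` and its return shape are made up for illustration; the point of the article is that the equivalent work happens in a vertex shader instead.

```javascript
// Expand one 2D line segment into a quad made of two triangles.
// Illustrative only: segmentToQuad and its return shape are hypothetical.
function segmentToQuad(a, b, width) {
  const dx = b[0] - a[0];
  const dy = b[1] - a[1];
  const len = Math.hypot(dx, dy);
  if (len === 0) throw new Error('segment has zero length');

  // Unit normal, perpendicular to the segment direction.
  const nx = -dy / len;
  const ny = dx / len;
  const h = width / 2;

  // Four corners: each end point pushed out by half the width on both sides.
  const vertices = [
    [a[0] + nx * h, a[1] + ny * h],
    [a[0] - nx * h, a[1] - ny * h],
    [b[0] + nx * h, b[1] + ny * h],
    [b[0] - nx * h, b[1] - ny * h],
  ];

  // Two triangles sharing the diagonal between corners 1 and 2.
  const indices = [0, 1, 2, 2, 1, 3];
  return { vertices, indices };
}

const quad = segmentToQuad([0, 0], [10, 0], 2);
console.log(quad.vertices, quad.indices);
```

Even this toy version needs a special case for zero-length segments, which hints at why joins and clipping complicate things so quickly.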

Vertex shaders can only do 1-to-1 mappings. This isn't a problem by itself. You can use a gather approach to do N-to-1 mapping, where all the necessary data is pre-arranged into attribute arrays, with the data repeated and interleaved per vertex as necessary.
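To make the gather approach concrete, here is a small sketch that pre-arranges a polyline so every vertex carries its own point plus both neighbours. The attribute names (`prev`, `curr`, `next`, `side`) and the function `buildLineAttributes` are assumptions for illustration, not MathBox or Three.js API.

```javascript
// Build per-vertex attribute arrays for a polyline (gather approach).
// Each output vertex carries its own point plus both neighbours,
// so a 1-to-1 vertex shader can still see three inputs.
// All names here are illustrative.
function buildLineAttributes(points) {
  const n = points.length;          // points are [x, y, z] triples
  const count = n * 2;              // two vertices per point: one per side
  const prev = new Float32Array(count * 3);
  const curr = new Float32Array(count * 3);
  const next = new Float32Array(count * 3);
  const side = new Float32Array(count);

  for (let i = 0; i < n; i++) {
    const p = points[Math.max(i - 1, 0)];      // clamp at the ends
    const c = points[i];
    const q = points[Math.min(i + 1, n - 1)];
    for (let s = 0; s < 2; s++) {
      const v = i * 2 + s;
      prev.set(p, v * 3);
      curr.set(c, v * 3);
      next.set(q, v * 3);
      side[v] = s === 0 ? -1 : 1;   // which way to offset from the centre line
    }
  }
  return { prev, curr, next, side };
}

const attributes = buildLineAttributes([[0, 0, 0], [1, 0, 0], [2, 0, 0]]);
console.log(attributes.curr.length); // 18: 3 points x 2 sides x 3 components
```

The cost is visible in the output size: every point is stored six times over, and all of it has to be rebuilt and re-uploaded whenever the data changes.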

The proper tool for this is a geometry shader: a program that creates new geometry by N-to-M mapping of data, like making triangles out of points. WebGL doesn't support geometry shaders, won't any time soon, but you can emulate them with texture samplers. A texture image is just a big typed array, and you have random access unlike vertex attributes.

The original geometry acts only as a template, directing the shader's real data lookups. You lose some performance this way, but it's not too bad. Any procedural sampling pattern works, drawing 1 shape or 10,000. As textures can be rendered to, not just from, this also enables transform feedback, using the result of one pass to create new geometry in another.
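The template idea can be mimicked on the CPU. In this sketch a flat typed array stands in for the data texture, and the "geometry" holds nothing but indices. The helper names (`createDataTexture`, `texelUV`, `sample2D`) are hypothetical and the nearest-neighbour fetch is a simplification of what `texture2D()` does on the GPU.

```javascript
// CPU stand-in for a data texture: a flat typed array of RGBA texels.
// Names and layout are assumptions for illustration.
function createDataTexture(width, height) {
  return { width, height, data: new Float32Array(width * height * 4) };
}

// Convert an integer texel index to the UV at the texel centre,
// which is what a shader would pass to texture2D().
function texelUV(tex, i, j) {
  return [(i + 0.5) / tex.width, (j + 0.5) / tex.height];
}

// Nearest-neighbour fetch, mimicking texture2D() with NEAREST filtering.
function sample2D(tex, uv) {
  const i = Math.min(tex.width - 1, Math.floor(uv[0] * tex.width));
  const j = Math.min(tex.height - 1, Math.floor(uv[1] * tex.height));
  const o = (j * tex.width + i) * 4;
  return Array.from(tex.data.subarray(o, o + 4));
}

// Template "geometry": vertices hold only indices, never positions.
const tex = createDataTexture(4, 2);
tex.data.set([1, 2, 3, 1], (1 * 4 + 2) * 4);   // write texel (2, 1)

const templateVertex = { i: 2, j: 1 };
const position = sample2D(tex, texelUV(tex, templateVertex.i, templateVertex.j));
console.log(position); // [1, 2, 3, 1]
```

Because the template never changes, only the typed array needs to be touched when new data arrives.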

All geometry rendered this way is 100% static as far as Three.js is concerned. New values are uploaded directly to GPU memory just before the rendering starts. The only gotcha is handling variable-size input, because reallocation is costly. Pre-allocating a larger texture is easy, but clipping off the excess geometry in an O(1) fashion on the JS side is hard. In most cases there's the workaround of dynamically generating degenerate triangles in a shader, which collapse down to invisible edges or points. This way, MathBox can accept variable-sized arrays in multiple dimensions and will do its best to minimize disruption. If attribute instancing were more standard in WebGL, this wouldn't be such an issue, but as it stands, the workarounds are very necessary.
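The degenerate-triangle trick can be sketched in a few lines. Here a single `clip` value plays the role of a uniform holding the last valid index: every vertex index beyond it is clamped, so surplus triangles end up with repeated corners and zero area. The names and the 2D setup are illustrative assumptions.

```javascript
// Sketch of clipping pre-allocated geometry by clamping indices.
// `clip` plays the role of a uniform holding the last valid index.
function clampIndex(index, clip) {
  return Math.min(index, clip);
}

function triangleArea(a, b, c) {
  return Math.abs((b[0] - a[0]) * (c[1] - a[1]) - (c[0] - a[0]) * (b[1] - a[1])) / 2;
}

// Data buffer sized for 6 points, of which only the first 3 are live.
const data = [[0, 0], [1, 0], [1, 1], [9, 9], [9, 9], [9, 9]];
const clip = 2;

// Index triples covering the whole allocation, live or not.
const triangles = [[0, 1, 2], [1, 2, 3], [2, 3, 4], [3, 4, 5]];

const areas = triangles.map((tri) => {
  const [a, b, c] = tri.map((i) => data[clampIndex(i, clip)]);
  return triangleArea(a, b, c);
});
console.log(areas); // [0.5, 0, 0, 0]
```

Changing the visible size is then a matter of updating one number, with no reallocation and no index buffer rewrite.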

Vertex Party

If you squint very hard it looks a bit like React for live geometry. Except instead of a diffing algorithm, there's a few events, some texture uploads, a handful of draw calls and then an idle CPU. It's ideal for drawing thousands of things that look similar and follow the same rules. It can handle not just basic GL primitives like lines or triangles, but higher level shapes like 3D arrows or sprites.

My first prototype of this was my last Christmas demo. It was messy and tedious to make, especially the shaders, but it performed excellently: the final scene renders ~200,000 triangles. Despite being a layer around Three.js … around WebGL … around OpenGL … around a driver … around a GPU … performance has far exceeded my expectations. Even complex scenes run great on my Android phone, easily 10x faster than MathBox 1, in some cases more like 1000x.

Of course compared to cutting edge DirectX or OpenCL (not a typo), this is still very limited. In today's GPUs, the charade of attributes, geometries, vertices and samples has mostly been stripped away. What remains is buffers and massive operations on them, exposed raw in new APIs like AMD's Mantle and iOS's Metal. My vertex trickery acts like a polyfill, virtualizing WebGL's capabilities to bring them closer to the present. It goes a bit beyond what geometry shaders can provide, but still lacks many useful things like atomic append queues or stream compaction.

For large geometries, the setup cost can be noticeable though. Shader compilation time also grows with transform complexity, doubly so on Windows where shaders are recompiled to HLSL / Direct3D. This makes drawing ops the heaviest MathBox primitives to spawn and reallocate. You could call this the MathBox version of the dreaded 'paint' of HTML. Once warmed up though, most other properties can be animated instantly, including the data being displayed: this is the opposite of how HTML works. Hence you can mostly spawn things ahead of time, revealing and hiding objects as needed, with minimal overhead and jank at runtime.

This all relies on carefully constructed shaders which have to be wired up in all their individual permutations. This needed to be solved programmatically, which is where we go last.

ShaderGraph 2

2015-09-27T00:00:00+02:00

Functional GLSL

For MathBox 1, I already needed to generate GL shaders programmatically. So I built ShaderGraph. You gave it snippets of GLSL code, each with a function inside. It would connect them for you, matching up the inputs and outputs. It supported directed graphs of calls with splits and joins, which were compiled down into a single shader. To help build up the graph progressively, it came with a simple chainable factory API.

It worked despite being several steps short of being a real compiler and having gaps in its functionality. It also committed the cardinal sin of regex code parsing, and hence accepted only a small subset of GLSL. All in all it was a bit of a happy mess, weaving vertex and fragment shaders together in a very ad-hoc fashion. Each snippet could only appear once in a shader, as it was still just a dumb code concatenator. I needed a proper way to compose shaders.

Instanced Data Flow

Enter ShaderGraph 2. It's a total rewrite using Chris Dickinson's bona fide glsl-parser. It still parses snippets and connects them into a directed graph to be compiled. But a snippet is now a full GLSL program whose main() function can have open inputs and outputs. What's more, it now also links code in the proper sense of the word: linking up module entry points as callbacks.

Basically, snippets can now have inputs and outputs that are themselves functions. These connections don't obey the typical data flow of a directed graph and instead are for function calls. A callback connection provides a path along which calls are made and values are returned.

Snippets can be instanced multiple times, including their uniforms, attributes and varyings (if requested). Uniforms are bound to Three.js-style registers as you build the graph incrementally. So it's a module system, sort of, which enables functional shader building. Using callbacks as micro-interfaces feels very natural in practice, especially with bound parameters. You can decorate existing functions, e.g. turning a texture sampler into a convolution filter.

// Build shader graph
var shader = shadergraph.shader();
shader
  .callback()
    .pipe('sampleColor')
    .fan()
      .pipe('sepiaColor')
    .next()
      .pipe('invertColor')
    .join()
    .pipe('combineColors')
  .join()
  .pipe('convolveColor');
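The decoration pattern is easier to see outside GLSL. The following plain JavaScript analogy treats a sampler as a function and wraps it in a 3-tap filter with the same signature. It is not ShaderGraph code; `makeSampler` and `convolve` are hypothetical names, and the kernel is an arbitrary choice.

```javascript
// A "sampler" is just a function from an index to a value.
function makeSampler(data) {
  // Clamp to the edges, similar to CLAMP_TO_EDGE texture wrapping.
  return (i) => data[Math.max(0, Math.min(data.length - 1, i))];
}

// Decorator: takes a sampler callback and returns a new sampler with the
// same signature that sums weighted neighbouring samples.
function convolve(sample, kernel) {
  const half = Math.floor(kernel.length / 2);
  return (i) => {
    let sum = 0;
    for (let k = 0; k < kernel.length; k++) {
      sum += kernel[k] * sample(i + k - half);
    }
    return sum;
  };
}

const sampleColor = makeSampler([0, 0, 4, 0, 0]);
const convolveColor = convolve(sampleColor, [0.25, 0.5, 0.25]);

console.log([0, 1, 2, 3, 4].map(convolveColor)); // [0, 1, 2, 1, 0]
```

Anything that accepted `sampleColor` will accept `convolveColor` unchanged, which is the property that makes callbacks work as micro-interfaces.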

GLSL Composer

If you know GLSL, you can write ShaderGraph snippets: there is no extra syntax, you just add inputs and outputs to your main() function. You can use in/out/inout qualifiers or return a value. If there's no main function, the last defined function is exported.

vec3 callback(vec3 arg1, vec3 arg2);

To create a callback input in a snippet, you declare a function prototype in GLSL without a body. The function name and signature are used to create the outlet.

To create a callback output, you use the factory API. You can .require() a snippet directly, or bundle up a subgraph with .callback().….join(). In the latter case, the function signature includes all unconnected inputs and outputs inside. Outlets are auto-matched by name, type and order, with the semantics from v1 cleaned up.

Building basic pipes is easy: .pipe(…).pipe(…).…, passing in a snippet or factory. For forked graphs, you can .fan() (1-to-N) or .split() (N-to-N), use .next() to begin a new branch, and then .join() at the end. There's a few other operations, nothing crazy.

var v = shadergraph.shader();

// Graphs generated elsewhere
v.pipe(vertexColor(color, mask));
v.require(vertexPosition(position, material, map, 2, stpq));

v.pipe('line.position',    uniforms, defs);
v.pipe('project.position', uniforms);

By connecting pairs you create a functional data flow that compiles down to vanilla GLSL. It's not functional programming in GLSL, it just enables useful run-time assembly patterns, letting the snippets do the heavy lifting the old fashioned way.

As GPUs are massively parallel pure function applicators, the resulting mega-shaders are a great fit.

`$ cat *.glsl | magic`

The process still comes down to concatenating the code in a clever way, with global symbols namespaced to be unique. Function bodies are generated to call snippets in the right order, and the callbacks are linked. In the trivial case it links a callback by #define-ing the two symbols to be the same. It can also impedance match compatible signatures like void main(in float, out vec2) and vec2 main(float) by inserting an intermediate call.
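As a rough sketch of those two linking cases, here are two string generators in JavaScript. They only illustrate the shape of the emitted GLSL; the function names, the symbol names in the usage lines and the single-argument restriction are assumptions, not ShaderGraph's actual linker.

```javascript
// Trivial link: both sides have the same signature, so alias the symbols.
function linkByDefine(callerSymbol, calleeSymbol) {
  return `#define ${callerSymbol} ${calleeSymbol}`;
}

// Impedance match: the caller expects `vec2 f(float)` while the callee is
// `void f(in float, out vec2)`. Emit an intermediate function that calls
// the callee and returns its out parameter.
function linkOutToReturn(callerSymbol, calleeSymbol, argType, returnType) {
  return [
    `${returnType} ${callerSymbol}(${argType} a) {`,
    `  ${returnType} r;`,
    `  ${calleeSymbol}(a, r);`,
    `  return r;`,
    `}`,
  ].join('\n');
}

// Hypothetical symbols, following the naming pattern of generated code.
console.log(linkByDefine('_pg_1_', '_sn_2_getColor'));
console.log(linkOutToReturn('_pg_3_', '_sn_4_main', 'float', 'vec2'));
```

The first form costs nothing at run time, since the preprocessor simply substitutes one name for the other; the second adds one extra function call that a GLSL compiler is free to inline.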

precision highp float;
precision highp int;
uniform mat4 modelMatrix;
uniform mat4 modelViewMatrix;
uniform mat4 projectionMatrix;
uniform mat4 viewMatrix;
uniform mat3 normalMatrix;
uniform vec3 cameraPosition;
#define _sn_191_getPosition _pg_103_
#define _sn_190_getPosition _pg_102_
#define _sn_189_getSample _pg_100_
#define _pg_99_ _sn_185_warpVertex
#define _pg_103_ _sn_190_getMeshPosition
#define _pg_100_ _sn_188_getTransitionSDFMask
#define _pg_101_ _sn_189_maskLevel
vec2 _sn_180_truncateVec(vec4 v) { return v.xy; }
uniform vec2 _sn_181_dataResolution;
uniform vec2 _sn_181_dataPointer;

vec2 _sn_181_map2DData(vec2 xy) {
  return fract((xy + _sn_181_dataPointer) * _sn_181_dataResolution);
}

uniform sampler2D _sn_182_dataTexture;

vec4 _sn_182_sample2D(vec2 uv) {
  return texture2D(_sn_182_dataTexture, uv);
}

vec4 _sn_183_swizzle(vec4 xyzw) {
  return vec4(xyzw.x, xyzw.w, 0.0, 0.0);
}
uniform float _sn_184_polarBend;
uniform float _sn_184_polarFocus;
uniform float _sn_184_polarAspect;
uniform float _sn_184_polarHelix;

uniform mat4 _sn_184_viewMatrix;

vec4 _sn_184_getPolarPosition(vec4 position, inout vec4 stpq) {
  if (_sn_184_polarBend > 0.0) {

    if (_sn_184_polarBend < 0.001) {
      
      
      
      
      vec2 pb = position.xy * _sn_184_polarBend;
      float ppbbx = pb.x * pb.x;
      return _sn_184_viewMatrix * vec4(
        position.x * (1.0 - _sn_184_polarBend + (pb.y * _sn_184_polarAspect)),
        position.y * (1.0 - .5 * ppbbx) - (.5 * ppbbx) * _sn_184_polarFocus / _sn_184_polarAspect,
        position.z + position.x * _sn_184_polarHelix * _sn_184_polarBend,
        1.0
      );
    }
    else {
      vec2 xy = position.xy * vec2(_sn_184_polarBend, _sn_184_polarAspect);
      float radius = _sn_184_polarFocus + xy.y;
      return _sn_184_viewMatrix * vec4(
        sin(xy.x) * radius,
        (cos(xy.x) * radius - _sn_184_polarFocus) / _sn_184_polarAspect,
        position.z + position.x * _sn_184_polarHelix * _sn_184_polarBend,
        1.0
      );
    }
  }
  else {
    return _sn_184_viewMatrix * vec4(position.xyz, 1.0);
  }
}
uniform float _sn_185_time;
uniform float _sn_185_intensity;

vec4 _sn_185_warpVertex(vec4 xyzw, inout vec4 stpq) {
  xyzw +=   0.2 * _sn_185_intensity * (sin(xyzw.yzwx * 1.91 + _sn_185_time + sin(xyzw.wxyz * 1.74 + _sn_185_time)));
  xyzw +=   0.1 * _sn_185_intensity * (sin(xyzw.yzwx * 4.03 + _sn_185_time + sin(xyzw.wxyz * 2.74 + _sn_185_time)));
  xyzw +=  0.05 * _sn_185_intensity * (sin(xyzw.yzwx * 8.39 + _sn_185_time + sin(xyzw.wxyz * 4.18 + _sn_185_time)));
  xyzw += 0.025 * _sn_185_intensity * (sin(xyzw.yzwx * 15.1 + _sn_185_time + sin(xyzw.wxyz * 9.18 + _sn_185_time)));

  return xyzw;
}



vec4 _sn_186_getViewPosition(vec4 position, inout vec4 stpq) {
  return (viewMatrix * vec4(position.xyz, 1.0));
}

vec3 _sn_187_getRootPosition(vec4 position, in vec4 stpq) {
  return position.xyz;
}
vec3 _pg_102_(vec4 _io_510_v, in vec4 _io_519_stpq) {
  vec2 _io_509_return;
  vec2 _io_511_return;
  vec4 _io_513_return;
  vec4 _io_515_return;
  vec4 _io_517_return;
  vec4 _io_520_stpq;
  vec4 _io_527_return;
  vec4 _io_528_stpq;
  vec4 _io_529_return;
  vec4 _io_532_stpq;
  vec3 _io_533_return;

  _io_509_return = _sn_180_truncateVec(_io_510_v);
  _io_511_return = _sn_181_map2DData(_io_509_return);
  _io_513_return = _sn_182_sample2D(_io_511_return);
  _io_515_return = _sn_183_swizzle(_io_513_return);
  _io_520_stpq = _io_519_stpq;
  _io_517_return = _sn_184_getPolarPosition(_io_515_return, _io_520_stpq);
  _io_528_stpq = _io_520_stpq;
  _io_527_return = _pg_99_(_io_517_return, _io_528_stpq);
  _io_532_stpq = _io_528_stpq;
  _io_529_return = _sn_186_getViewPosition(_io_527_return, _io_532_stpq);
  _io_533_return = _sn_187_getRootPosition(_io_529_return, _io_532_stpq);
  return _io_533_return;
}
uniform vec4 _sn_190_geometryResolution;

#ifdef POSITION_STPQ
varying vec4 vSTPQ;
#endif
#ifdef POSITION_U
varying float vU;
#endif
#ifdef POSITION_UV
varying vec2 vUV;
#endif
#ifdef POSITION_UVW
varying vec3 vUVW;
#endif
#ifdef POSITION_UVWO
varying vec4 vUVWO;
#endif


vec3 _sn_190_getMeshPosition(vec4 xyzw, float canonical) {
  vec4 stpq = xyzw * _sn_190_geometryResolution;
  vec3 xyz = _sn_190_getPosition(xyzw, stpq);

  #ifdef POSITION_MAP
  if (canonical > 0.5) {
    #ifdef POSITION_STPQ
    vSTPQ = stpq;
    #endif
    #ifdef POSITION_U
    vU = stpq.x;
    #endif
    #ifdef POSITION_UV
    vUV = stpq.xy;
    #endif
    #ifdef POSITION_UVW
    vUVW = stpq.xyz;
    #endif
    #ifdef POSITION_UVWO
    vUVWO = stpq;
    #endif
  }
  #endif
  return xyz;
}

uniform float _sn_188_transitionEnter;
uniform float _sn_188_transitionExit;
uniform vec4  _sn_188_transitionScale;
uniform vec4  _sn_188_transitionBias;
uniform float _sn_188_transitionSkew;
uniform float _sn_188_transitionActive;

float _sn_188_getTransitionSDFMask(vec4 stpq) {
  if (_sn_188_transitionActive < 0.5) return 1.0;

  float enter   = _sn_188_transitionEnter;
  float exit    = _sn_188_transitionExit;
  float skew    = _sn_188_transitionSkew;
  vec4  scale   = _sn_188_transitionScale;
  vec4  bias    = _sn_188_transitionBias;

  float factor  = 1.0 + skew;
  float offset  = dot(vec4(1.0), stpq * scale + bias);

  vec2 d = vec2(enter, exit) * factor + vec2(-offset, offset - skew);
  if (exit  == 1.0) return d.x;
  if (enter == 1.0) return d.y;
  return min(d.x, d.y);
}
uniform float _sn_191_worldUnit;
uniform float _sn_191_lineWidth;
uniform float _sn_191_lineDepth;
uniform float _sn_191_focusDepth;

uniform vec4 _sn_191_geometryClip;
attribute vec2 line;
attribute vec4 position4;

#ifdef LINE_PROXIMITY
uniform float _sn_191_lineProximity;
varying float vClipProximity;
#endif

#ifdef LINE_STROKE
varying float vClipStrokeWidth;
varying float vClipStrokeIndex;
varying vec3  vClipStrokeEven;
varying vec3  vClipStrokeOdd;
varying vec3  vClipStrokePosition;
#endif


#ifdef LINE_CLIP
uniform float _sn_191_clipRange;
uniform vec2  _sn_191_clipStyle;
uniform float _sn_191_clipSpace;

attribute vec2 strip;

varying vec2 vClipEnds;

void _sn_191_clipEnds(vec4 xyzw, vec3 center, vec3 pos) {

  
  vec4 xyzwE = vec4(strip.y, xyzw.yzw);
  vec3 end   = _sn_191_getPosition(xyzwE, 0.0);

  
  vec4 xyzwS = vec4(strip.x, xyzw.yzw);
  vec3 start = _sn_191_getPosition(xyzwS, 0.0);

  
  vec3 diff = end - start;
  float l = length(diff) * _sn_191_clipSpace;

  
  float arrowSize = 1.25 * _sn_191_clipRange * _sn_191_lineWidth * _sn_191_worldUnit;

  vClipEnds = vec2(1.0);

  if (_sn_191_clipStyle.y > 0.0) {
    
    float depth = _sn_191_focusDepth;
    if (_sn_191_lineDepth < 1.0) {
      float z = max(0.00001, -end.z);
      depth = mix(z, _sn_191_focusDepth, _sn_191_lineDepth);
    }
    
    
    float size = arrowSize * depth;

    
    
    float mini = clamp(1.0 - l / size * .333, 0.0, 1.0);
    float scale = 1.0 - mini * mini * mini; 
    float invrange = 1.0 / (size * scale);
  
    
    diff = normalize(end - center);
    float d = dot(end - pos, diff);
    vClipEnds.x = d * invrange - 1.0;
  }

  if (_sn_191_clipStyle.x > 0.0) {
    
    float depth = _sn_191_focusDepth;
    if (_sn_191_lineDepth < 1.0) {
      float z = max(0.00001, -start.z);
      depth = mix(z, _sn_191_focusDepth, _sn_191_lineDepth);
    }
    
    
    float size = arrowSize * depth;

    
    
    float mini = clamp(1.0 - l / size * .333, 0.0, 1.0);
    float scale = 1.0 - mini * mini * mini; 
    float invrange = 1.0 / (size * scale);
  
    
    diff = normalize(center - start);
    float d = dot(pos - start, diff);
    vClipEnds.y = d * invrange - 1.0;
  }


}
#endif

const float _sn_191_epsilon = 1e-5;
void _sn_191_fixCenter(vec3 left, inout vec3 center, vec3 right) {
  if (center.z >= 0.0) {
    if (left.z < 0.0) {
      float d = (center.z - _sn_191_epsilon) / (center.z - left.z);
      center = mix(center, left, d);
    }
    else if (right.z < 0.0) {
      float d = (center.z - _sn_191_epsilon) / (center.z - right.z);
      center = mix(center, right, d);
    }
  }
}


void _sn_191_getLineGeometry(vec4 xyzw, float edge, out vec3 left, out vec3 center, out vec3 right) {
  vec4 delta = vec4(1.0, 0.0, 0.0, 0.0);

  center =                 _sn_191_getPosition(xyzw, 1.0);
  left   = (edge > -0.5) ? _sn_191_getPosition(xyzw - delta, 0.0) : center;
  right  = (edge < 0.5)  ? _sn_191_getPosition(xyzw + delta, 0.0) : center;
}

vec3 _sn_191_getLineJoin(float edge, bool odd, vec3 left, vec3 center, vec3 right, float width) {
  vec2 join = vec2(1.0, 0.0);

  _sn_191_fixCenter(left, center, right);

  vec4 a = vec4(left.xy, right.xy);
  vec4 b = a / vec4(left.zz, right.zz);

  vec2 l = b.xy;
  vec2 r = b.zw;
  vec2 c = center.xy / center.z;

  vec4 d = vec4(l, c) - vec4(c, r);
  float l1 = dot(d.xy, d.xy);
  float l2 = dot(d.zw, d.zw);

  if (l1 + l2 > 0.0) {
    
    if (edge > 0.5 || l2 == 0.0) {
      vec2 nl = normalize(d.xy);
      vec2 tl = vec2(nl.y, -nl.x);

#ifdef LINE_PROXIMITY
      vClipProximity = 1.0;
#endif

#ifdef LINE_STROKE
      vClipStrokeEven = vClipStrokeOdd = normalize(left - center);
#endif
      join = tl;
    }
    else if (edge < -0.5 || l1 == 0.0) {
      vec2 nr = normalize(d.zw);
      vec2 tr = vec2(nr.y, -nr.x);

#ifdef LINE_PROXIMITY
      vClipProximity = 1.0;
#endif

#ifdef LINE_STROKE
      vClipStrokeEven = vClipStrokeOdd = normalize(center - right);
#endif
      join = tr;
    }
    else {
      
      float lmin2 = min(l1, l2) / (width * width);

      
#ifdef LINE_PROXIMITY
      float lr     = l1 / l2;
      float rl     = l2 / l1;
      float ratio  = max(lr, rl);
      float thresh = _sn_191_lineProximity + 1.0;
      vClipProximity = (ratio > thresh * thresh) ? 1.0 : 0.0;
#endif
      
      
      vec2 nl = normalize(d.xy);
      vec2 nr = normalize(d.zw);

      vec2 tl = vec2(nl.y, -nl.x);
      vec2 tr = vec2(nr.y, -nr.x);

#ifdef LINE_PROXIMITY
      
      vec2 tc = normalize(mix(tl, tr, l1/(l1+l2)));
#else
      
      vec2 tc = normalize(tl + tr);
#endif
    
      float cosA   = dot(nl, tc);
      float sinA   = max(0.1, abs(dot(tl, tc)));
      float factor = cosA / sinA;
      float scale  = sqrt(1.0 + min(lmin2, factor * factor));

#ifdef LINE_STROKE
      vec3 stroke1 = normalize(left - center);
      vec3 stroke2 = normalize(center - right);

      if (odd) {
        vClipStrokeEven = stroke1;
        vClipStrokeOdd  = stroke2;
      }
      else {
        vClipStrokeEven = stroke2;
        vClipStrokeOdd  = stroke1;
      }
#endif
      join = tc * scale;
    }
    return vec3(join, 0.0);
  }
  else {
    return vec3(0.0);
  }

}

vec3 _sn_191_getLinePosition() {
  vec3 left, center, right, join;

  float edge = line.x;
  float offset = line.y;

  vec4 p = min(_sn_191_geometryClip, position4);
  edge += max(0.0, position4.x - _sn_191_geometryClip.x);

  
  _sn_191_getLineGeometry(p, edge, left, center, right);

#ifdef LINE_STROKE
  
  vClipStrokePosition = center;
  vClipStrokeIndex = p.x;
  bool odd = mod(p.x, 2.0) >= 1.0;
#else
  bool odd = true;
#endif

  
  float width = _sn_191_lineWidth * 0.5;

  float depth = _sn_191_focusDepth;
  if (_sn_191_lineDepth < 1.0) {
    
    float z = max(0.00001, -center.z);
    depth = mix(z, _sn_191_focusDepth, _sn_191_lineDepth);
  }
  width *= depth;

  
  width *= _sn_191_worldUnit;

  join = _sn_191_getLineJoin(edge, odd, left, center, right, width);

#ifdef LINE_STROKE
  vClipStrokeWidth = width;
#endif
  
  vec3 pos = center + join * offset * width;

#ifdef LINE_CLIP
  _sn_191_clipEnds(p, center, pos);
#endif

  return pos;
}

uniform vec4 _sn_189_geometryResolution;
uniform vec4 _sn_189_geometryClip;
varying float vMask;


void _sn_189_maskLevel() {
  vec4 p = min(_sn_189_geometryClip, position4);
  vMask = _sn_189_getSample(p * _sn_189_geometryResolution);
}

uniform float _sn_192_styleZBias;
uniform float _sn_192_styleZIndex;

void _sn_192_setPosition(vec3 position) {
  vec4 pos = projectionMatrix * vec4(position, 1.0);

  
  float bias  = (1.0 - _sn_192_styleZBias / 32768.0);
  pos.z *= bias;
  
  
  if (_sn_192_styleZIndex > 0.0) {
    float z = pos.z / pos.w;
    pos.z = ((z + 1.0) / (_sn_192_styleZIndex + 1.0) - 1.0) * pos.w;
  }
  
  gl_Position = pos;
}
void main() {
  vec3 _io_546_return;

  _io_546_return = _sn_191_getLinePosition();
  _sn_192_setPosition(_io_546_return);
  _pg_101_();
}

It still does guarded regex manipulation of code too, but those manipulations are now derived from a proper syntax tree. GLSL doesn't have strings and its scope is simple, so this is unusually safe. I'm sure you can still trip it up somehow, but it's worth it for speed. I'm seeing assembly times of ~10-30ms cold, 2-4ms warm, but it depends entirely on the particular shaders.
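A minimal sketch of what "guarded" can mean here: the list of symbols to rename is supplied up front (standing in for what a parser would report), and the regex only touches whole identifiers from that list. The function `namespaceSnippet` is hypothetical and the prefix format merely imitates the generated names; this is not the ShaderGraph implementation.

```javascript
// Rename a known set of global symbols in a GLSL snippet by prefixing them.
// `symbols` stands in for the output of a real parser; here it is supplied
// by hand. The `_sn_<id>_` prefix format is an assumption.
function namespaceSnippet(code, symbols, id) {
  if (symbols.length === 0) return code;
  // Escape regex metacharacters, although GLSL identifiers should not have any.
  const escaped = symbols.map((s) => s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&'));
  // The \b guards ensure only whole identifiers are replaced.
  const pattern = new RegExp(`\\b(${escaped.join('|')})\\b`, 'g');
  return code.replace(pattern, (name) => `_sn_${id}_${name}`);
}

const snippet = [
  'uniform float time;',
  'float timeScale(float x) { return x * time; }',
].join('\n');

console.log(namespaceSnippet(snippet, ['time', 'timeScale'], 7));
// uniform float _sn_7_time;
// float _sn_7_timeScale(float x) { return x * _sn_7_time; }
```

Note that a local variable that happens to share a global's name would be renamed too. That is harmless as long as the renaming is consistent, but it shows why the symbol list needs to come from a syntax tree rather than from guesswork.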

The assembly process is now properly recursive. Unassembled shaders can be used in factory form, standing in for snippets. Completed graphs form stand-alone programs with no open inputs or outputs. The result can be turned straight into a Three.js ShaderMaterial, but there is no strict Three dependency. It's just a dictionary with code and a list of uniforms, attributes and varyings. Unlike before, building a combined vertex/fragment program is now merely syntactic sugar for a pair of separate graphs.
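To illustrate the "just a dictionary" point, here is a sketch that merges a vertex and a fragment program into the parameter object accepted by `new THREE.ShaderMaterial(params)`. The shape of the two input objects (`code`, `uniforms`) is an assumption for illustration, not ShaderGraph's documented output, and the GLSL bodies are placeholders.

```javascript
// Hypothetical shape of two compiled graphs (vertex and fragment).
// The field names `code` and `uniforms` are assumptions for illustration.
const vertexProgram = {
  code: 'uniform float scale;\nvoid main() { gl_Position = vec4(position * scale, 1.0); }',
  uniforms: { scale: { type: 'f', value: 2 } },
};
const fragmentProgram = {
  code: 'uniform float opacity;\nvoid main() { gl_FragColor = vec4(1.0, 1.0, 1.0, opacity); }',
  uniforms: { opacity: { type: 'f', value: 0.5 } },
};

// Merge both halves into a plain parameter object with the
// uniforms / vertexShader / fragmentShader keys ShaderMaterial expects.
function toMaterialParams(vertex, fragment) {
  return {
    uniforms: Object.assign({}, vertex.uniforms, fragment.uniforms),
    vertexShader: vertex.code,
    fragmentShader: fragment.code,
  };
}

const params = toMaterialParams(vertexProgram, fragmentProgram);
console.log(Object.keys(params.uniforms)); // ['scale', 'opacity']
```

Since the result is plain data, nothing in it depends on Three.js until the moment it is handed to a material constructor.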

As it's run-time, you can slot in user-defined or code-generated GLSL just the same. Shaders are fetched by name or passed as inline code, mixed freely as needed. You supply the dictionary or lookup method. You could bundle your GLSL into JS with a build step or include embedded



PowerPoint Must Die








  "I think a lot of mathematics is really about how you understand things in your head. It's people that did mathematics, we're not just general purpose machines, we're people. We see things, we feel things, we think of things. A lot of what I have done in my mathematical career has had to do with finding new ways to build models, to see things, do computations. Really get a feel for stuff.


It may seem unimportant, but when I started out people drew pictures of 3-manifolds one way and I started drawing them a different way. People drew pictures of surfaces one way and I started drawing them a different way. There's something significant about how the representation in your head profoundly changes how you think.


It's very hard to do a brain dump. Very hard to do that. But I'm still going to try to do something to give a feel for 3-manifolds. Words are one thing, we can talk about geometric structures. There are many precise mathematical words that could be used, but they don't automatically convey a feeling for it. I probably can't convey a feeling for it either, but I want to try."
– William Thurston, The Mystery of 3-Manifolds (Video)






How do you convince web developers—heck, people in general—to care about math? This was the challenge underlying Making Things With Maths, a talk I gave three years ago. I didn't know either, I just knew why I liked this stuff: demoscene, games, simulation, physics, VR, … It had little to do with what passed for mathematics in my own engineering education. There we were served only eyesore PowerPoints or handwritten overhead transparencies, with simplified graphs, abstract flowcharts and rote formulas, available on black and white photocopies.

Smart people who were supposed to teach us about technology seemed unable to teach us with technology. Fixing this felt like a huge challenge where I'd have to start from scratch. This is why the focus was entirely on showing rather than telling, and why MathBox 1 was born. It's how this stuff looks and feels in my head, and how I got my degree: by translating formulas into mental pictures, which I could replay and reason about on demand.

PowerPoint Syndrome

Initially I used MathBox like an embedded image or video: compact diagrams, each a point or two in a presentation. My style quickly shifted though. I kept on finding ways to transform from one visualization to another. Not for show, but to reveal the similarities and relationships underneath. MathBox encouraged me to animate things correctly, leveraging the actual models themselves, instead of doing a visual morph from A to B. Each animation became a continuous stream of valid examples, a quality both captivating and revealing.






    




For instance, How to Fold a Julia Fractal is filled with animations of complex exponentials, right from the get-go. This way I avoid the scare that $ e^{i\pi} $ is a meaningful expression; symbology and tau-tology never have a chance to obscure geometrical workings. Instead a web page that casually demonstrates conformal mapping and complex differential equations got 340,000 visits. Despite spotty web browser support and excluding all mobile phones for years.




  
  Meanwhile academics voluntarily published their writings behind a $42 per PDF paywall, the colossal idiots.




The next talk, Making WebGL Dance, contained elaborate long takes worthy of an Alfonso Cuarón film, with only 3 separate shots for the bulk of a 30-minute talk. The lesson seemed obvious: the slides shouldn't have graphics in them, rather, the graphics should have slides in them. The diagnosis of PowerPoint syndrome is then the constant trashing of context from one slide to the next. A traditional blackboard doesn't have this problem: you build up diagrams slowly, by hand, across a large surface, erasing selectively and only when you run out of space.







It's not just about permanence and progression though, it's also about leveraging our natural understanding of shape, scale, color and motion. Think of how a toddler learns to interact with the world: poke, grab, chew, spit, smash. Which evolves into run, jump, fall, get back up again. Humans are naturals at taking multiple cases of "If I do this, that will happen" and turning it into a consistent, functional model of how things work. We learn language by bootstrapping random gibberish into situational meaning, converging on a shared protocol.

That said, I find the usual descriptions of how people experience language and thought foreign. Instead, when Temple Grandin speaks about visual thinking, I nod vigorously. Thought to me is analog concepts and sensory memories, remixed with visual and other simulations. It builds off the quantities and qualities present in spatial and temporal notions, which appear built-in to us.





Speech and writing is then a program designed to reconstruct particular thoughts in a compatible brain. There are a multitude of evolving languages, they can be used elegantly, bluntly, incomprehensibly, but the desired output remains the same. In my talks, armed with weapons-grade C2-continuous animations, it is easy to transcode my film reel into words, because the slides run themselves. The string of concepts already hangs in the air, I only add the missing grammar that links them up. This is a puzzle our brains are so good at solving, we usually do it without thinking.

Language is the ability of thoughts to compute their own source code.

(It's not proof, I just supply pudding.)




  
  Tip: PowerPoint remotes are 4-key USB keyboards with PageUp/PageDown, F5 and . keys.

Comes with dongle.





  


  



  I sketch rough thumbnails, then start animating until I hit a dead end. Then start another one. Titles and overlays always come last.




Manifold Dreams

I don't say all this to up my Rain Man cred, but to lay to rest the recurring question of where my work comes from. I translate the pictures in my head to HD, in order to learn from and refine the view. As I did with quaternions: I struggled to grok the hypersphere, it wouldn't fit together right. So I wrote the code to trace out geodesics in color and fly around in it, and suddenly the twisting made sense. Hence my entire tutorial was built to replicate the same discovery process I went through myself.







  
    

    For visualizing the 4D hypersphere, quaternions are a natural fit.
They reveal their underlying cyclic symmetry under 4D stereographic projection.
  





There was one big problem: scenes now consisted of diagrams of diagrams, which meant working around MathBox more than with it. Performance issues arose as complexity grew. Above all there was a total lack of composability in the components. None of this could be fixed without ripping out significant pieces, so doing it incrementally seemed futile. I started from scratch and set off to reinvent all the wheels.

$$ \text{MathBox}^2 = \int_1^2 \text{code}(v) dv $$

MathBox 2 was inevitably going to suffer second-system syndrome, parts would be overengineered. Rather than fight it, I embraced it and effectively wrote a strange vector GPU driver in CoffeeScript. (Such is life, this is a blueprint meant to be simplified and made obsolete over time, not expanded upon.) It's a freight train straight to the heart of a graphics card, combining low-level and high-level in a way that feels novel 🐴 when you use it, squeezing 🐴 through a very small opening.

What was tedious before, now falls out naturally. If I format the scene above as XML/JSX, it becomes:




  


  
  
  
  
    
    
      
      
      
      
        
        
        
        

        
        
        
        

        
        
        
        
      
    
  







  
In order to make these pieces behave, a bunch of additional attributes are applied, most of which are strings or values, some of which are functions/code, either JavaScript or GLSL:




  

id="1" scale={300}>
  id="2" proxy={true} position={[0, 0, 3]} />
  id="3" speed={1/4}>
    id="4" bend={1}>
      id="5" code="
uniform float cos1;
uniform float sin1;
uniform float cos2;
uniform float sin2;
uniform float cos3;
uniform float sin3;
uniform float cos4;
uniform float sin4;

vec4 getRotate4D(vec4 xyzw, inout vec4 stpq) {
  xyzw.xy = xyzw.xy * mat2(cos1, sin1, -sin1, cos1);
  xyzw.zw = xyzw.zw * mat2(cos2, sin2, -sin2, cos2);
  xyzw.xz = xyzw.xz * mat2(cos3, sin3, -sin3, cos3);
  xyzw.yw = xyzw.yw * mat2(cos4, sin4, -sin4, cos4);

  return xyzw;
}"
 cos1=>{(t) => Math.cos(t * .111)} sin1=>{(t) => Math.sin(t * .111)} cos2=>{(t) => Math.cos(t * .151 + 1)} sin2=>{(t) => Math.sin(t * .151 + 1)} cos3=>{(t) => Math.cos(t * .071 + Math.sin(t * .081))} sin3=>{(t) => Math.sin(t * .071 + Math.sin(t * .081))} cos4=>{(t) => Math.cos(t * .053 + Math.sin(t * .066) + 1)} sin4=>{(t) => Math.sin(t * .053 + Math.sin(t * .066) + 1)} />
      id="6">
        id="7" rangeX={[-π/2, π/2]} rangeY={[0, τ]} width={129} height={65} expr={(emit, θ, ϕ, i, j) => {
        q1.set(0, 0, Math.sin(θ), Math.cos(θ));
        q2.set(0, Math.sin(ϕ), 0, Math.cos(ϕ));
        q1.multiply(q2);
        emit(q1.x, q1.y, q1.z, q1.w);
      }} live={false} channels={4} />
        id="8" color="#3090FF" />
        id="9" rangeX={[-π/2, π/2]} rangeY={[0, τ]} width={129} height={65} expr={(emit, θ, ϕ, i, j) => {
        q1.set(0, Math.sin(θ), 0, Math.cos(θ));
        q2.set(Math.sin(ϕ), 0, 0, Math.cos(ϕ));
        q1.multiply(q2);
        emit(q1.x, q1.y, q1.z, q1.w);
      }} live={false} channels={4} />
        id="10" color="#20A000" />
        id="11" rangeX={[-π/2, π/2]} rangeY={[0, τ]} width={129} height={65} expr={(emit, θ, ϕ, i, j) => {
        q1.set(Math.sin(θ), 0, 0, Math.cos(θ));
        q2.set(0, 0, Math.sin(ϕ), Math.cos(ϕ));
        q1.multiply(q2);
        emit(q1.x, q1.y, q1.z, q1.w);
      }} live={false} channels={4} />
        id="12" color="#DF2000" />
      >
    >
  >
>





Phew. That's how you make a 4D diagram with Hopf fibration as far as the eye can see. Except it's not actually JSX, that's just me and my pretty-printer pretending.

Geometry Streaming

The key is the data itself. It's an array of points mostly, but how that data is laid out and interpreted determines how useful it can be.

Most basic primitives come in fixed-size chunks. Particles are single points, lines have two points, triangles have three points. Polygons and polylines have N points. So it made sense to have a tuple of N points be the basic logical unit. Unlike in raw GL, you can think in logical pieces of geometry rather than raw points or individual triangles.

Each primitive maps over data in a standard way. Feed an array of points to a line, you get a polyline. Feed a matrix of points to a surface and you get a grid mesh. Simple. But feed a voxel to a vector, and you get a 3D vector field. The general idea is that drawing 1 of something should be as easy as drawing 100×100×100.





  
    
  







This is particularly useful for custom data expressions, which stream in live or procedural data. They now receive an emit(x, y, z, w) function, for emitting a 4-vector like XYZW or RGBA. This is little more than an inlineable call to fill a floatArray[i++] = x, quite a lot faster than returning an array or object.







  
    
  

mathbox
  .interval({
    expr: function (emit, x, i, t) {
      var y = Math.sin(x + t);
      emit(x,  y);
      emit(x, -y);
    },
    width:   64,
    items:    2,
    channels: 2,
  })
  .vector({
    color: 0x3090FF,
    width: 3,
    start: true,
  });




Emitting 64 2D vectors on an interval, 2 points each.

More importantly it lets you emit N points in one iteration, which makes the JS expressions themselves feel like geometry shaders. The result feeds into one or more styled drawing ops. The number of emit calls has to be constant, but you can always knock out or mask the excess geometry.





  emit = switch channels
  when 1 then (x) ->
    array[i++] = x
    ++j
    return

  when 2 then (x, y) ->
    array[i++] = x
    array[i++] = y
    ++j
    return

  when 3 then (x, y, z) ->
    array[i++] = x
    array[i++] = y
    array[i++] = z
    ++j
    return

  when 4 then (x, y, z, w) ->
    array[i++] = x
    array[i++] = y
    array[i++] = z
    array[i++] = w
    ++j
    return

  
  
  Both the expression and emitter will be inlined into the stream's iteration loop.
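
To make the iteration loop concrete, here is a minimal sketch in plain JavaScript. It is not MathBox's actual code: the function name `fillInterval` and the `min`/`max` parameters are hypothetical, and only the 2-channel emitter is handled. The `emit`, `width`, `items` and `channels` concepts are the ones described above.

```javascript
// Hypothetical sketch: sample an interval and pack emitted points into a flat typed array.
function fillInterval(expr, width, items, channels, min, max, t) {
  if (channels !== 2) throw new Error('this sketch only handles 2 channels');

  var array = new Float32Array(width * items * channels);
  var i = 0; // write offset into the array
  var j = 0; // number of points emitted so far

  // 2-channel emitter, mirroring the `when 2` case above
  var emit = function (x, y) {
    array[i++] = x;
    array[i++] = y;
    ++j;
  };

  for (var k = 0; k < width; ++k) {
    // Map the sample index onto the interval [min, max] (assumes width > 1)
    var x = min + (max - min) * k / (width - 1);
    expr(emit, x, k, t);
  }
  return { array: array, count: j };
}

// Same expression as the interval example: 64 samples, 2 points each
var result = fillInterval(function (emit, x, i, t) {
  var y = Math.sin(x + t);
  emit(x,  y);
  emit(x, -y);
}, 64, 2, 2, -1, 1, 0);
```

Because the number of `emit` calls per iteration is constant, the array size is known up front and no intermediate arrays or objects are allocated per sample.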





  consume = switch channels
  when 1 then (emit) ->
    emit array[i++]
    ++j
    return

  when 2 then (emit) ->
    emit array[i++], array[i++]
    ++j
    return

  when 3 then (emit) ->
    emit array[i++], array[i++], array[i++]
    ++j
    return

  when 4 then (emit) ->
    emit array[i++], array[i++], array[i++], array[i++]
    ++j
    return

  
    
  Closures of Hanoi





(4-in-1)²

GPUs can operate on 4×1 vectors and 4×4 matrices, so working with 4D values is natural. Values can also be referenced by 4D indices. With one dimension reserved for the tuples, that leaves us 3 dimensions XYZ. Hence MathBox arrays are 3+1D. This is for width, height, depth, while the tuple dimension is called items. It does what it says on the tin, creating 1D W, 2D W×H and 3D W×H×D arrays of tuples. Each tuple is made of N vectors of up to 4 channels each.

Thanks to cyclic buffers and partial updates, history also comes baked in. You can use a spare dimension as a free time axis, retaining samples on the go. You can .set('history', N) to record a short log of a whole array over time, indefinitely.
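
The cyclic buffer idea can be sketched on the CPU as follows. This is an illustration only, with hypothetical names (`makeHistory`, `write`, `read`), and it says nothing about how MathBox actually lays out its history in a texture.

```javascript
// Hypothetical ring buffer keeping the last `history` snapshots of an array.
function makeHistory(length, history) {
  var data = new Float32Array(length * history);
  var head = 0; // slot that the next write goes to
  return {
    // Record a new snapshot, overwriting the oldest one
    write: function (values) {
      data.set(values, head * length);
      head = (head + 1) % history;
    },
    // Read the snapshot from `age` writes ago (0 = most recent)
    read: function (age) {
      var slot = ((head - 1 - age) % history + history) % history;
      return data.subarray(slot * length, (slot + 1) * length);
    },
  };
}

var log = makeHistory(2, 3);
log.write([1, 1]);
log.write([2, 2]);
log.write([3, 3]);
log.write([4, 4]); // overwrites [1, 1]
```

Each write only touches one slot, which is what makes partial updates cheap: older samples stay where they are and only the read offset moves.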

All of this is modular: a data source is something that can be sampled by a 4D pointer from GLSL. Underneath, arrays end up packed into a regular 2D float texture, with "items × width" horizontally and "height × depth" vertically. Each 'pixel' holds a 1/2/3/4D point.

Mapping a 4D 'pointer' to the real 2D UV coordinates is just arithmetic, and so are operators like transpose and repeat. You just swap the XY indices and tell everyone downstream that it's now this big instead. They can't tell the difference.
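
As a rough sketch of that arithmetic, written in JavaScript rather than GLSL: the names `makeLayout`, `toTexel` and `transposeXY` are hypothetical, and the choice of which index varies fastest within each texture axis is an assumption the text does not spell out.

```javascript
// Hypothetical layout: "items × width" horizontally, "height × depth" vertically.
// Assumption: the item index varies fastest horizontally, the y index fastest vertically.
function makeLayout(items, width, height, depth) {
  return {
    size: [items * width, height * depth],
    // Map a 4D index (x, y, z, item) to 2D texel coordinates
    toTexel: function (x, y, z, item) {
      return [item + items * x, y + height * z];
    },
  };
}

// Transpose X and Y without touching the data: swap the indices on lookup
// and report the swapped dimensions downstream.
function transposeXY(source) {
  return {
    width: source.height,
    height: source.width,
    depth: source.depth,
    items: source.items,
    sample: function (x, y, z, item) {
      return source.sample(y, x, z, item);
    },
  };
}

// A tiny 3×2 source backed by a flat array standing in for the texture
var layout = makeLayout(1, 3, 2, 1);
var texture = [10, 11, 12,
               20, 21, 22]; // row-major, 3 texels wide
var source = {
  width: 3, height: 2, depth: 1, items: 1,
  sample: function (x, y, z, item) {
    var uv = layout.toTexel(x, y, z, item);
    return texture[uv[0] + uv[1] * layout.size[0]];
  },
};
var transposed = transposeXY(source);
```

Here `transposed.sample(1, 2, 0, 0)` reads the same texel as `source.sample(2, 1, 0, 0)`, and a consumer that only sees `width`, `height` and `sample` cannot tell that no data was moved.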

You can create giant procedural arrays this way, including across rectangular texture size limits, as none of them actually exist except as transient values deep inside a GPU core. Until you materialize them by rendering to a texture using the memo primitive. Add in operators like interpolation and convolution and it's a pretty neat real-time finishing kit for data.





  
    
  













Continued in Part 2.



MathBox²
2015-09-27T00:00:00+02:00








Part 2







Continued from Part 1.

I-Can't-Believe-It's-Not-React

Underneath sits a large codebase driving it, 200+ files in JS/CS alone, not including dependencies that aren't my own. Much of it is infrastructure necessary to pull off certain tricks consistently: you can draw 2.5D lines with grace, render arbitrary Unicode text in GL, sync HTML to GPU-computed geometry, and do all this with GLSL code composed on the fly, including your own. Nobody needs all of this.





  
    
  





    
    LaTeX HTML with KaTeX.
    Plain HTML DOM.
    Virtual HTML with DOM diff.
    Live Signed Distance Fields for GL.
    
  




These wildly different strategies are actually all abstracted into the DOM path or the GL path, with a quacks-like-React component as the glue in the DOM path.






  
  
  
  
      
      
      
      
      
      
      
      
      
      
      
      

      ...      

      
      
      
      
      
      
      
      
      
      
      
      
      
      

    
  






// Define VDOM handler to clone real DOM elements
var clone = MathBox.DOM.createClass({
  render: function (el, props, children) {
    var element = children.cloneNode(true);
    return element;
  },
});

// Define VDOM handler to format 'latex' into an HTML span
var latex = MathBox.DOM.createClass({
  render: function (el) {
    this.props.innerHTML = katex.renderToString(this.children);
    return el('span', this.props);
  }
});



The interface with the HTML DOM.

Appearances deceive, however, as MathBox's own DOM is an entirely different beast.





Most of the code is for initialization only, building up a reactive machine by combining components. Once assembled it lets the GPU do most of the crunching, while relying on the JS VM to inline and optimize the chewy outside.

At the top level, MathBox is plain old JavaScript, used like this:




  /* Easy Mode */

// Bootstrap MathBox and Three.js
var mathbox = mathBox();
if (mathbox.fallback) { throw Error("No WebGL support.") };

// Make MathBox primitives
var view =
  mathbox
  .set({
    scale: null,
  })
  .camera({
    proxy: true,
    position: [0, 0, 3]
  })
  .polar({
    range: [[-2, 2], [-1, 1], [-1, 1]],
    scale: [2, 1, 1],
    bend: .25
  });

view.interval({
  width: 48,
  expr: function (emit, x, i, t) {
    // Emit sine wave
    var y = Math.sin(x + t / 4) * .5 + .75;
    emit(x, y);
  },
  channels: 2,
})
.line({
  color: 0x30C0FF,
  width: 16,
})
.resample({
  width: 8,
})
.point({
  color: 0x30C0FF,
  size: 60,
})
.html({
  width:  8,
  expr: function (emit, el, i, j, k, l, t) {
    // Emit random latex
    var color = ['#30D0FF','#30A0FF'][i%2];
    var a = Math.round(t + i) % symbols.length;
    var b = Math.round(Math.sqrt(t * t + Math.sin(t + i * i) + 5));
    emit(el(latex, {style: {color: color}},
      '\\sqrt{\\text{LaTeX} + '+(i + b)+' \\pi^{'+symbols[a]+'}}'));
  },
})
.dom({
  snap: false,
  offset: [0, 32],
  depth: 0,
  zoom: 1,
  outline: 2,
  size: 20,
});

// ...






  
    

    CSS 3D rims included.
  





  
    

    The default fallback message.
  




  
The WebGL Canvas bootstrapper is a separate piece though: it wraps Threestrap, the little non-framework that could. It lets you spawn a fully functioning GL canvas in one exceedingly configurable call. It takes care of browser support, resizing with retina, CSS alignment, warm-up and more.

If you prefer, you can instead spawn a bare MathBox context in a Three.js scene of your choosing, but you'll have to babysit it:




  /* Simple Mode */

// Vanilla Three.js
var renderer = new THREE.WebGLRenderer();
var scene = new THREE.Scene();
var camera = new THREE.PerspectiveCamera(60, WIDTH / HEIGHT, .01, 1000);

// Insert into document
document.body.appendChild(renderer.domElement);

// MathBox context
var context = new MathBox.Context(renderer, scene, camera).init();
var mathbox = context.api;

// Set size
renderer.setSize(WIDTH, HEIGHT);
context.resize({ viewWidth: WIDTH, viewHeight: HEIGHT });

// Place camera and set background
camera.position.set(0, 0, 3);
renderer.setClearColor(new THREE.Color(0xFFFFFF), 1.0);

// MathBox elements
var view = mathbox
.set({
  focus: 3,
})
.cartesian({
  range: [[-2, 2], [-1, 1], [-1, 1]],
  scale: [2, 1, 1],
});
// ...

// Render frames
var frame = function () {
  requestAnimationFrame(frame);
  context.frame();
  renderer.render(scene, camera);
};
requestAnimationFrame(frame);




Despite looking like a monolith, it really isn't. It was merely a matter of convenience and sanity not to decouple it more until its shape had stabilized. Minimal builds are, for now, left as an exercise to the reader. I've split up the thinking and design into several articles, mirroring the architecture: the document model, the geometry core and the shader assembly. You don't need to know any of this to use MathBox 2, though; the articles are for people who want to know the how and why.

In putting it all together, the devil's in the details of course. Depending on your imagination, it's either much more or much less powerful than you want. There's still far too much to cover: slideshows, keyframe tracks, fov-calibrated units, z-indexes, atlas retexting, … Most of this is unsurprising in that it all works. You can define a keyframe interpolation between two value or emitter expressions, and watch the smoothly lerped data go. Animation tracks are tied to triggers like clocks and slides, which lets them fit naturally in presentations.
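
One way to picture a keyframe between two emitter expressions is to run both and lerp their output. The sketch below is an assumption about the general technique, not MathBox's track implementation: `lerpExpr` is a hypothetical helper, and it assumes each expression emits exactly one 2D point, synchronously.

```javascript
// Hypothetical: blend two emitter expressions by running both and lerping their output.
function lerpExpr(a, b, f) {
  return function (emit, x) {
    var pa, pb;
    // Capture what each expression emits instead of writing it out
    a(function (ax, ay) { pa = [ax, ay]; }, x);
    b(function (bx, by) { pb = [bx, by]; }, x);
    emit(pa[0] + (pb[0] - pa[0]) * f, pa[1] + (pb[1] - pa[1]) * f);
  };
}

var sine = function (emit, x) { emit(x, Math.sin(x)); };
var line = function (emit, x) { emit(x, x); };

// Halfway between the two keyframes
var halfway = lerpExpr(sine, line, 0.5);

var out = [];
halfway(function (x, y) { out.push(x, y); }, 2);
```

Animating `f` from 0 to 1 over the duration of a step then morphs one data set into the other.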








  
  index={1}>
    
    steps={2}>
      
      
        
        
        
        
        
        script={[
            {props: {color: 'red'}},
            {props: {color: 'blue'}},
          ]} >
      
    
    
    
      
      
        
        
        
        script={[
                       {props: {expr: (emit, x) => emit(x, Math.sin(x))}},
                       {props: {expr: (emit, x) => emit(x, Math.tan(x))}},
                      ]} />
        
        

        
        
        
      
    
    
    
      
      
        
        
        
        
      
    
  






One More Thing…



Images are data. So is audio. That means MathBox 2 is Winamp AVS, Milkdrop and the mythical Fridge all rolled into one. You can replicate your everyday trippy music visualizer with two operators: render-to-texture (rtt) and compose. RTT acts as an embedded scene, rendering all of its children to an off-screen image, while compose renders a full-screen pass. This is where the model's expressiveness shines.

Milkdrop equals mathbox.rtt(…).compose(…).….end().compose(…), that is, an image feeding back into itself, but also rendered to the screen. The necessary double buffering and swaps are abstracted away. Drop in shapes and shaders, add transforms, nest as you like. RTTs have a history parameter like arrays, so Turing patterns, self-propagating hypno spirals, and other cool partial diffy eqs are a shader away.
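
The double buffering that gets abstracted away can be illustrated with a CPU stand-in. This is only a sketch of the ping-pong pattern with hypothetical names (`makeFeedback`, `seed`, `frame`). In MathBox the buffers are GPU render targets and the pass is a shader.

```javascript
// Hypothetical CPU stand-in for render-to-texture feedback with double buffering.
function makeFeedback(size, pass) {
  var read = new Float32Array(size);  // last frame's result
  var write = new Float32Array(size); // target of the current pass

  // Wrap-around lookup into the previous frame
  var sample = function (j) { return read[(j % size + size) % size]; };

  return {
    seed: function (i, value) { read[i] = value; },
    frame: function () {
      // Each output texel may sample anywhere in the previous frame
      for (var i = 0; i < size; ++i) {
        write[i] = pass(sample, i);
      }
      // Swap buffers so this result feeds the next frame
      var tmp = read; read = write; write = tmp;
      return read;
    },
  };
}

// A 1D blur: each frame averages a texel with its neighbours
var feedback = makeFeedback(4, function (sample, i) {
  return (sample(i - 1) + sample(i) + sample(i + 1)) / 3;
});
feedback.seed(0, 3);
var first = feedback.frame();
```

Reading and writing never touch the same buffer within a frame, which is the reason for the swap: a pass that sampled its own half-written output would give order-dependent results.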







  
    
  






  

id="1" scale={720}>
  id="render" minFilter="nearest" magFilter="nearest" type="unsignedByte">
    id="3" lookAt={[0, 0, 0]} position=>{(t) => { return [Math.cos(t) * 3, 0, Math.sin(t) * 3] }} />
    id="4" range={[[-2, 2], [-1, 1], [-1, 1]]} scale={[2, 1, 1]}>
      id="5" scale={[7/10, 7/10, 7/10]}>
        id="6" width={5} divideX={2} divideY={2} zBias={10} opacity={1/4} color={16768992} />
      >
    >
  >
  id="rtt1" history={4} type="unsignedByte" minFilter="linear" magFilter="linear">
    id="8" code="
uniform vec3 dataResolution;
uniform vec3 dataSize;
uniform float cosine;
uniform float sine;
vec4 getSample(vec3 xyz);
vec4 getFramesSample(vec3 xyz) {
  vec2 pos = xyz.xy * dataResolution.xy - .5;
  pos = ((pos * dataSize.xy) * mat2(cosine, sine, -sine, cosine) * .999) / dataSize.xy;
  xyz.xy = (pos + .5) * dataSize.xy;
  vec4 c = getSample(xyz + vec3( 0.0, 0.0, 1.0));
  vec3 t = getSample(xyz + vec3( 0.0, 1.5, 0.0)).xyz;
  vec3 b = getSample(xyz + vec3( 0.0,-1.5, 0.0)).xyz;
  vec3 l = getSample(xyz + vec3(-1.5, 0.0, 0.0)).xyz;
  vec3 r = getSample(xyz + vec3( 1.5, 0.0, 0.0)).xyz;
  return vec4((t + b + l + r) / 2.0 - c.xyz, c.w);
}"
 cosine=>{(t) => Math.cos(Math.sin(t * .2) * .005)} sine=>{(t) => Math.sin(Math.sin(t * .2) * .005)} />
    id="resample1" indices={3} channels={4} />
    id="10" />
    id="11" source="#render" blending="add" />
  >
  id="12" minFilter="linear" magFilter="linear" type="unsignedByte">
    id="colormap" code="
uniform float modulate1;
uniform float modulate2;
uniform float modulate3;
uniform float modulate4;
vec4 getSample(vec3 xyz);
vec4 getFramesSample(vec3 xyz) {
  vec4 color = (
    getSample(xyz) +
    getSample(xyz + vec3(0.0, 0.0, 1.0)) +
    getSample(xyz + vec3(0.0, 0.0, 2.0)) +
    getSample(xyz + vec3(0.0, 0.0, 3.0))
  ) / 4.0;
  color = color * color * color * 1.15;
  float v = color.x + color.y + color.z;
  vec3 c = vec3(v*v + color.x * .2, v*v, v*v*v + color.z) * .333;
  c = mix(c, mix(sqrt(c.yzx * c), c.zxy, modulate1), modulate2);
  c = mix(c, mix(c.yzx, c.zxy, modulate1), modulate2);
  c = mix(c, mix(abs(sin(c.yxz * 2.0)), c.zyx, modulate3), modulate4);
  return vec4(c, 1.0);
}"
 modulate1=>{(t) => Math.cos(t * .417) * .5 + .5} modulate2=>{(t) => Math.cos(t * .617 + Math.sin(t * .133)) * .5 + .5} modulate3=>{(t) => Math.cos(t * .217 + 2.0) * .5 + .5} modulate4=>{(t) => Math.cos(t * .117 + 3.0 + Math.sin(t * .133)) * .5 + .5} />
    id="resample2" source="#rtt1" indices={3} channels={4} />
    id="15" />
  >
  id="16" code="
vec4 getSample(vec2 xy);
vec4 getFramesSample(vec2 xy) {
  return getSample(xy + vec2(0.5, 0.5));
}"
 />
  id="resample3" indices={2} channels={4} />
  id="18" source="#resample2" />
>







The difference compared to AVS is that the .rtt() is inert to its container by default. Until you add on a .compose() pass, it's just a dangling data source. Meanwhile the .compose() op offers the necessary GL blend modes, opacity and color tints through style properties. Document order defines drawing order, so the decomposition into render passes is direct. On top of that, zOrder (drawing order) can be overridden, as can zIndex (2D stacking order) and zBias (3D stacking order).

You can nest effects and compose shaders to create recursive visualizations, sampling from themselves or each other:







  
    
  
  
  
  
(Burn that fillrate, baby.)
  
  Bonus: Endless Visualizer.





  
id="1" scale={720}>
  id="2" proxy={true} position={[3/10, 1/10, 2]} />
  id="3">
    id="audioTime" data={[]} width={1024} channels={1} />
    id="audioFreq" data={[]} width={512} channels={1} />
  >
  id="render" width={256} height={144} type="unsignedByte" minFilter="nearest" magFilter="nearest">
    id="7" position={[0, 0, 5/2]} />
    id="8">
      id="9" source="#audioTime" order="yx" />
      id="10" width={[861/250, 0, 0, 0]} />
      id="11" code="
vec4 getSample(vec4 xyzw);
vec4 getColor(vec4 xyzw) {
  float h = getSample(xyzw).y;
  return vec4(vec3(h), 1.0);
}"
 />
      id="12" />
      id="13" scale={[1, 3/4, 1]}>
        id="14" points="<<" colors="<" width={5} color={16777215} opacity={2/5} blending="add" />
      >
    >
    id="15" range={[[-2, 2], [-1, 1], [-1, 1]]} scale={[1/2, 1/4, 1/4]} quaternion=>{(t) => {
          c = Math.cos(t / 3);
          s = Math.sin(t / 3);
          c2 = Math.cos(t / 8.71);
          s2 = Math.sin(t / 8.71);
          return [s * s2, s * c2, .2, c];
        }}>
      id="16" divideX={4} divideY={4} zBias={10} opacity={1/10} color={16768992} width={6} />
    >
  >
  id="rtt1" history={4} width={256} height={144} type="unsignedByte">
    id="18" code="#map-rotate" />
    id="resample1" indices={3} channels={4} />
    id="20" color="#ffffff" zWrite={false} />
    id="21" source="#render" blending="add" color="#ffffff" zWrite={false} />
  >
  id="rtt2" width={256} height={144} type="float">
    id="23" position={[0, 0, 5/2]} />
    id="24" seek=>{(t) => audio ? audio.currentTime : t}>
      id="25" code="#map-temporal-blur" time=>{(t) => t * 16.0} modulate=>{(t) => {
            var bang = ((t > 69.229311)  && (t < 88.922656)) ||
                       ((t > 88.922656)  && (t < 148.9143)) ||
                       ((t > 148.9143)   && (t < 158.2)) ||
                       ((t > 168.284427) && (t < 188.00)) ? 1 : 0;
            if ((t > 88.922656)  && (t < 148.9143)) {
              bang *= .5 + .45 * Math.cos(t / 3);
            }
            if ((t > 168)) {
              bang *= .85 + .15 * Math.cos(t);
            }
            modulate = modulate + (bang - modulate) * .1;
            return modulate;
          }} pattern=>{(t) => {
            var bang = ((t > 88.922656) && (t < 148.9143));
            pattern = pattern + (bang - pattern) * .1;
            if ((t > 168)) {
              pattern = .5 + .4 * Math.cos(t * 2.311);
            }
            return pattern;
          }} warp=>{(t) => {
            var bang = (t > 148.9143);
            if ((t > 168)) {
              warp *= 1 + .5 * Math.cos(t * .556);
            }
            if ((t > 148.2) && (t < 158.2)) warp = warp + .75 + .25 * Math.cos((t - 158.2));
            warp = warp + (bang - warp) * .1;
            return warp;
          }} shift=>{(t) => {
            var bang = (t > 168) ? Math.max(0, Math.min(1, .1 * (t - 168))) : 0;
            bang *= .75 + .25 * Math.cos(t * .731);
            warp = warp + (bang - warp) * .1;
            return warp;
          }} />
      id="resample2" source="#rtt1" indices={3} channels={4} />
      id="27" color="#fff" zWrite={false} />
    >
    id="28" scale={[1, 1/4, 1]}>
      id="29" source="#audioTime" order="yx" />
      id="30" width={[861/250, 0, 0, 0]} />
      id="31" code="
vec4 getSample(vec4 xyzw);
vec4 getColor(vec4 xyzw) {
  float h = getSample(xyzw).y;
  return vec4(vec3(h) * .2, 1.0);
}"
 />
      id="32" />
      id="33" points="<<" colors="<" width={50} color={16777215} opacity={1} blending="add" />
    >
  >
  id="34" width={129} height={73} />
  id="lerp" depth={2} />
  id="36" code="#map-xy-to-xyz" />
  id="37" indices={3} channels={3} />
  id="transpose" order="xywz" />
  id="color" source="#lerp" order="xywz" />
  id="40" seek=>{(t) => audio ? audio.currentTime : t}>
    id="disco" speed=>{(t) => {
        var bang = ((t > 69.329311)  && (t < 89.122656)) ||
                   ((t > 148.9143)   && (t < 158.0)) ||
                   ((t > 168.284427) && (t < 188.077772));
        return bang ? 1 : .2;
      }}>
      id="42" code="#map-z-to-color" modulate1=>{(t) => Math.cos((t + 1) * .417) * .5 + .5} modulate2=>{(t) => Math.cos((t + 1) * .617 + Math.sin(t * .133)) * .5 + .5} modulate3=>{(t) => Math.cos((t + 1) * .217 + 2.0) * .5 + .5} modulate4=>{(t) => Math.cos((t + 1) * .117 + 3.0 + Math.sin(t * .133)) * .5 + .5} />
      id="color1" source="#lerp" indices={2} channels={4} />
      id="44" code="#map-z-to-color-2" modulate1=>{(t) => Math.cos((t + 1) * .417) * .5 + .5} modulate2=>{(t) => Math.cos((t + 1) * .617 + Math.sin(t * .133)) * .5 + .5} modulate3=>{(t) => Math.cos((t + 1) * .217 + 2.0) * .5 + .5} modulate4=>{(t) => Math.cos((t + 1) * .117 + 3.0 + Math.sin(t * .133)) * .5 + .5} />
      id="color2" source="#lerp" indices={2} channels={4} />
    >
    id="46" range={[[-1.7788, 1.7788], [-1, 1], [-1, 1]]} scale={[16/9, 1, 1]} quaternion=>{(t) => {
        t = t / 3;
        c = Math.cos(t / 4);
        s = Math.sin(t / 4);
        c2 = Math.cos(t / 11.71) * 1.71;
        s2 = Math.sin(t / 11.71) * 1.71;
        return [s * s2, s * c2, -.2, c];
      }}>
      id="47" source="#transpose" width={33} height={19} />
      id="48" source="#color2" width={33} height={19} />
      id="49">
        id="50" points="<<" colors="<" color="#ffffff" width={2} zBias={5} />
      >
      id="51" script={{19: {position: [0, 0, 0]}, 39: {position: [0, 0, 2]}, 57: {position: [0, 0, 0]}}} />
      id="52" source="<<" order="yxzw" />
      id="53" source="<<" order="yxzw" />
      id="54">
        id="55" points="<<" colors="<" color="#ffffff" width={2} zBias={5} />
      >
      id="56" script={{19: {position: [0, 0, 0]}, 39: {position: [0, 0, -2]}, 57: {position: [0, 0, 0]}}} />
      id="57">
        id="58" points="<<" colors="<" color="#ffffff" size={10} zBias={5} zOrder={1} blending="add" zWrite={false} />
      >
      id="59" script={{19: {position: [0, 0, 0]}, 39: {position: [0, 0, -1]}, 57: {position: [0, 0, 0]}}} />
      id="60">
        id="61" points="#transpose" colors="#color2" color="#ffffff" size={5} zBias={5} zOrder={1} blending="add" zWrite={false} />
      >
      id="62" script={{9: {position: [0, 0, 0]}, 39: {position: [0, 0, 1]}, 57: {position: [0, 0, 0]}}} />
      id="63" points="#transpose" colors="#color1" color="#ffffff" start={false} end={false} width={40} opacity={3/100} blending="add" zWrite={false} zOrder={-2} />
    >
  >
>
  






Full Stack

What's left is basically kicking the tires and fixing the blind spots. As such this is not MathBox 2.0, this is MathBox 2 Alpha 1. It's still rough in the compatibility department, easily letting you exceed GL limits satisfied by only 70-80% of WebGL implementations in the wild, without warning. My own goal for public release was to be able to make another one of those presentations with it, only this time, 100% idiosyncratic MathBox. The result: The Pixel Factory.

Some people have assumed this talk was another tour-de-force of multi-week autism, but in fact, rebuilding my old slides for v2 was easy and obvious. The RGBA subpixels and their labels are animated lambdas and GLSL. The multi-samples, the depth buffer columns, the tangents and normals, same thing. JavaScript twiddles the knobs while the GPU visualizes the visualizer, and in doing so, itself.

Here the code be.

To give it a whirl in your browser, open the JSBin Sandbox. There is a quick start introduction and a list of legos.


  MathBox² - PowerPoint Must Die
  A DOM for Robots - Modelling Live Data
  Yak Shading - Data Driven Geometry
  ShaderGraph 2 - Functional GLSL







A DOM for Robots
2015-09-27T00:00:00+02:00








Modelling Live Data

I want to render live 3D graphics based on a declarative data model. That means a choice of shapes and transforms, as well as data sources and formats. I also want to combine them and make live changes. Which sounds kind of DOMmy.




  




3D engines don't have Document Object Models though, they have scene graphs and render trees: minimal data structures optimized for rendering output. In Three.js, each tree node is a JS object with properties and children like itself. Composition only exists in a limited form, with a parent's matrix and visibility combining with that of its children. There is no fancy data binding: the renderer loops over the visible tree leaves every frame, passing in values directly to GL calls. Any geometry is uploaded once to GPU memory and cached. If you put in new parameters or data, it will be used to produce the next frame automatically, aside from a needsUpdate bit here and there for performance reasons.

So Three.js is a thin retained mode layer on top of an immediate mode API. It makes it trivial to draw the same thing over and over again in various configurations. That won't do: I want to draw dynamic things with the same ease. I need a richer model, which means wrapping another retained mode layer around it. That could mean observables, data binding, tree diffing, immutable data, and all the other fun stuff nobody can agree on.

However, I mostly feed data in, and many parameters will end up as shader properties. These are passed to Three as a dictionary of { type: '…', value: x } objects, each holding a single parameter. Any code that holds a reference to such an object will see the same value, so it acts as a register: you can share it, transparently binding one value to N shaders. This way a single .set('color', 'blue') call on the fringes can instantly affect data structures deep inside the WebGLRenderer, without actually cascading through.
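
The register behaviour is plain JavaScript reference semantics. A minimal sketch, using a float uniform for simplicity: `materialA`, `materialB` and the `set` helper are hypothetical stand-ins, not Three.js or MathBox objects.

```javascript
// One uniform object, shared by reference between two hypothetical materials.
var opacity = { type: 'f', value: 1.0 };

var materialA = { uniforms: { opacity: opacity } };
var materialB = { uniforms: { opacity: opacity } };

// A hypothetical setter on the fringes writes into the register once...
function set(register, value) {
  register.value = value;
}
set(opacity, 0.5);

// ...and every holder of the reference sees the new value on its next read.
var seen = [
  materialA.uniforms.opacity.value,
  materialB.uniforms.opacity.value,
];
```

Nothing is copied or notified: the renderer simply reads `value` when it draws the next frame, so the cost of a change does not depend on how many shaders share the register.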





  
    
  




  




I applied this to build a view tree which retains this property, storing all attributes as shareable registers. The Three.js scene graph is reduced to a single layer of THREE.Mesh objects, flattening the hierarchy. Rather than clumsy CSS3D divs which encode matrices as strings, there are binary arrays, GLSL shaders, and highly optimizable JS lambdas.

As long as you don't go overboard with the numbers, it runs fine even on mobile.




  

id="1" scale={600} focus={3}>
  id="2" proxy={true} position={[0, 0, 3]} />
  id="3" code="
uniform float time;
uniform float intensity;

vec4 warpVertex(vec4 xyzw, inout vec4 stpq) {
  xyzw +=   0.2 * intensity * (sin(xyzw.yzwx * 1.91 + time + sin(xyzw.wxyz * 1.74 + time)));
  xyzw +=   0.1 * intensity * (sin(xyzw.yzwx * 4.03 + time + sin(xyzw.wxyz * 2.74 + time)));
  xyzw +=  0.05 * intensity * (sin(xyzw.yzwx * 8.39 + time + sin(xyzw.wxyz * 4.18 + time)));
  xyzw += 0.025 * intensity * (sin(xyzw.yzwx * 15.1 + time + sin(xyzw.wxyz * 9.18 + time)));

  return xyzw;
}"
 time=>{(t) => t / 4} intensity=>{(t) => {
        t = t / 4;
        intensity = .5 + .5 * Math.cos(t / 3);
        intensity = 1.0 - Math.pow(intensity, 4);
        return intensity * 2.5;
      }} />
  id="4" stagger={[10, 0, 0, 0]} enter=>{(t) => 1.0 - Math.pow(1.0 - Math.min(1,  (1 + pingpong(t))*2), 2)} exit=>{(t) => 1.0 - Math.pow(1.0 - Math.min(1,  (1 - pingpong(t))*2), 2)}>
    id="5" pass="view">
      id="6" bend={1/4} range={[[-π, π], [0, 1], [-1, 1]]} scale={[2, 1, 1]}>
        id="7" position={[0, 1/2, 0]}>
          id="8" detail={512} />
          id="9" divide={10} unit={π} base={2} />
          id="10" width={3} classes=["foo", "bar"] />
          id="11" divide={5} unit={π} base={2} />
          id="12" expr={(x) => {
        return x ? (x / π).toPrecision(2) + 'π' : 0
      }} />
          id="13" depth={1/2} zIndex={1} />
        >
        id="14" axis={2} detail={128} crossed={true} />
        id="15" position={[π/2, 0, 0]}>
          id="16" axis={2} detail={128} crossed={true} />
        >
        id="17" position={[-π/2, 0, 0]}>
          id="18" axis={2} detail={128} crossed={true} />
        >
        id="19" divideX={40} detailX={512} divideY={20} detailY={128} width={1} opacity={1/2} unitX={π} baseX={2} zBias={-5} />
        id="20" width={512} expr={(emit, x, i, t) => {
        emit(x, .5 + .25 * Math.sin(x + t) + .25 * Math.sin(x * 1.91 + t * 1.81));
      }} channels={2} />
        id="21" width={5} />
        id="22" pace={10} loop={true} to={3} script=[[{color: "rgb(48, 144, 255)"}], [{color: "rgb(100, 180, 60)"}], [{color: "rgb(240, 20, 40)"}], [{color: "rgb(48, 144, 255)"}]] />
      >
    >
  >
>  
  
  
    
  Note: The JSX is a lie, you define nodes in pure JS.




Keep it Simple

From afar there's a tree of nodes, similar to SVG tags. This is the MathBox library of vector primitives. The basic shapes are all there: points, lines, faces, vectors, surfaces, etc. These nodes are placed inside a shallow hierarchy of views and transforms.

However none of the shapes draw anything by themselves. They only know how to draw data supplied by a linked source. Data can be an array (static or live), a procedural source, custom JS / GLSL code, etc. This is further augmented by data operators which can be sandwiched between source and shape, forming automatic pipelines between siblings.

The current set of components looks like this:









  
  Base: Group, Inherit, Root, Unit
  Camera: Camera
  Draw: Axis, Face, Grid, Line, Point, Strip, Surface, Ticks, Vector
  Data: Area, Array, Interval, Matrix, Scale, Volume, Voxel
  Operator: Grow, Join, Lerp, Memo, Resample, Repeat, Slice, Split, Spread, Swizzle, Transpose
  Overlay: DOM, HTML
  Present: Move, Play, Present, Reveal, Slide, Step
  RTT: Compose, RTT
  Shader: Shader
  Text: Format, Label, Text, Retext
  Time: Clock, Now
  Transform: Fragment, Layer, Transform, Transform4, Vertex
  View: Cartesian, Cartesian4, Polar, Spherical, Stereographic, Stereographic4, View
  
  









To make you feel at home, nodes have an id and classes, and you can use CSS selectors to identify them. Nodes link up with preceding siblings and parents by default, but you can select any node in the tree. This allows for arbitrary graphs, including feedback loops. However all of this is optional: you can also pass in direct node objects or MathBox's own jQuery-like selections. What it doesn't have is a notion of detached document fragments: nodes are immediately inserted on creation.

A node's attributes can be read with .get() and written with .set(), though there is also a read-only .props dictionary for fashionable reasons. The values are strongly typed as Three.js colors, vectors, matrices, … but accept e.g. CSS colors and ordinary arrays too. The values are normalized for immediate use, while the original values are preserved on the side for printing and serialization.
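
A rough sketch of what typed, normalizing attributes could look like. Everything here is hypothetical (`Types`, `makeAttributes`, `orig`), it uses plain arrays where MathBox uses Three.js types, and the color parser only understands '#rrggbb' strings and numbers.

```javascript
// Hypothetical attribute types: each normalizes loose input into a fixed shape.
var Types = {
  // Accept '#rrggbb' strings or 0xRRGGBB numbers, normalize to [r, g, b] in 0..1
  color: function (value) {
    var hex = typeof value === 'string' ? parseInt(value.slice(1), 16) : value;
    return [(hex >> 16 & 255) / 255, (hex >> 8 & 255) / 255, (hex & 255) / 255];
  },
  // Accept a short array, pad to 3 components with zeroes
  vec3: function (value) {
    return [value[0] || 0, value[1] || 0, value[2] || 0];
  },
};

function makeAttributes(schema) {
  var normalized = {}; // ready for immediate use
  var original = {};   // kept for printing and serialization
  return {
    set: function (key, value) {
      normalized[key] = Types[schema[key]](value);
      original[key] = value;
    },
    get: function (key) { return normalized[key]; },
    orig: function (key) { return original[key]; },
  };
}

var attributes = makeAttributes({ color: 'color', position: 'vec3' });
attributes.set('color', '#3090FF');
attributes.set('position', [0, 1]);
```

Consumers always read the normalized form, so they never have to branch on input format, while `orig` can still round-trip exactly what the user typed.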



What's unique is the emphasis on time. First, properties can be directly bound to time-dependent expressions, on creation or afterwards. Second, clocks are primitives on their own. This allows for nested timelines, on-demand bullet time, fast forwards and more. It even supports limited time travel, evaluating an expression several frames in the past. This can be used to ensure consistent 60 fps data logging through janky updates, useful for all sorts of things. It's exposed publicly as .bind(key, expr) and .evaluate(key, time) per node. It's also dogfood for declarative animation tracks. The primitives clock/now provide timing, while step and play handle keyframes on tracks.
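
To illustrate time-bound properties and nested clocks, here is a small sketch. Only the names `bind` and `evaluate` come from the text above. `makeNode`, `makeClock` and their internals are hypothetical and much simpler than the real thing.

```javascript
// Hypothetical node with time-bound attributes; not MathBox's implementation.
function makeNode(clock) {
  var values = {};
  var bound = {};
  return {
    set: function (key, value) { delete bound[key]; values[key] = value; },
    bind: function (key, expr) { bound[key] = expr; },
    // Evaluate an attribute at an arbitrary time, e.g. a few frames in the past
    evaluate: function (key, time) {
      return bound[key] ? bound[key](time) : values[key];
    },
    get: function (key) { return this.evaluate(key, clock.now()); },
  };
}

// A nested clock rescales its parent's time, like a clock with speed 1/4
function makeClock(parent, speed) {
  return { now: function () { return parent.now() * speed; } };
}

var real = { time: 8, now: function () { return this.time; } };
var slow = makeClock(real, 1 / 4);

var node = makeNode(slow);
node.bind('opacity', function (t) { return t / 10; });
```

Because a bound expression is a pure function of time, evaluating it in the past is just a call with a different argument, and slowing down a whole subtree only requires swapping the clock it reads from.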

This is definitely a DOM, but it has only basic features in common with the HTML DOM and does much less. Most of the magic comes from the components themselves. There's no cascade of styles to inherit. Children compose with a parent, they do not inherit from it, only caring about their own attributes. The namespace is clean, with no weird combo styles à la CSS. As much as possible, attributes are unique orthogonal knobs you can turn freely.

Model-View-Projection

On the inside I separate the generic data model from the type-specific View Controller attached to it. The controller's job is to create and manage Three.js objects to display the node (if any). Because a data source and a visible shape have very little in common, the nodes and their controllers are blank slates built and organized around named traits. Each trait is a data mix-in, with associated attributes and helpers for common behavior. Primitives with the same traits can be expected to work the same, as their public facing models are identical.
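
As an illustration of traits as data mix-ins, consider the sketch below. The trait names, defaults and helpers (`Traits`, `makeModel`, `findByTrait`) are invented for this example and do not reflect MathBox's actual trait definitions.

```javascript
// Hypothetical trait registry: each trait contributes attributes with defaults.
var Traits = {
  style: { color: '#808080', opacity: 1 },
  line:  { width: 2 },
  point: { size: 4 },
};

// Build a model as a blank slate plus the attributes of its traits
function makeModel(type, traits, attributes) {
  var props = {};
  traits.forEach(function (name) {
    Object.keys(Traits[name]).forEach(function (key) {
      props[key] = Traits[name][key];
    });
  });
  Object.keys(attributes).forEach(function (key) {
    if (!(key in props)) throw new Error('Unknown attribute: ' + key);
    props[key] = attributes[key];
  });
  return { type: type, traits: traits, props: props };
}

// Controllers can look for collaborators by trait instead of by type
function findByTrait(models, trait) {
  return models.filter(function (m) { return m.traits.indexOf(trait) >= 0; });
}

var line = makeModel('line', ['style', 'line'], { color: '#3090FF' });
var point = makeModel('point', ['style', 'point'], { size: 10 });
var styled = findByTrait([line, point], 'style');
```

Two primitives that share the `style` trait expose the same `color` and `opacity` knobs, which is what lets them be treated interchangeably by anything that only cares about styling.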

Controllers can traverse the graph to find each other by matching traits, listening for events and making calls in response. This way only specific events will cascade through cause and effect, often skipping large parts of the hierarchy. The only way to do a "global style recalculation" would be to send a forced change event to every single controller, and there's never a reason to do so.

The controller lifecycle is deliberately kept simple: make(), made(), change(…), unmake(), unmade(). When a model changes, its controller either updates in place, or rebuilds itself, doing an unmake/make cycle. The change handler is invoked on creation as well, to encourage stateless updates. It affords live editing of anything, without having to micro-optimize every possible change scenario. Controllers can also watch bound selectors, retargeting if their matched set changes. This lets primitives link up with elements that have yet to be inserted.
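To make the lifecycle concrete, here is a minimal sketch in Python. It is not MathBox's code: the `Controller` class, the `rebuild_on` set, the `build()` and `set()` helpers and the dictionary standing in for a renderable are all hypothetical, chosen only to show the update-in-place versus unmake/make decision and the change handler running on creation.

```python
class Controller:
    # Attributes whose change forces a full rebuild (an assumption of this sketch).
    rebuild_on = {"width"}

    def __init__(self, model):
        self.model = dict(model)
        self.renderable = None
        self.log = []

    def make(self):
        # Allocate whatever is needed to display the node.
        self.renderable = {"vertices": self.model["width"], "color": None}
        self.log.append("make")

    def made(self):
        self.log.append("made")

    def change(self, changed, init=False):
        # Same code path on creation and on later edits.
        if init or "color" in changed:
            self.renderable["color"] = self.model["color"]
        self.log.append("change")

    def unmake(self):
        self.renderable = None
        self.log.append("unmake")

    def unmade(self):
        self.log.append("unmade")

    def build(self):
        self.make()
        self.made()
        self.change(set(self.model), init=True)

    def set(self, **attributes):
        self.model.update(attributes)
        changed = set(attributes)
        if changed & self.rebuild_on:
            # Structural change: tear down and rebuild from scratch.
            self.unmake()
            self.unmade()
            self.build()
        else:
            # Cosmetic change: update in place.
            self.change(changed)


controller = Controller({"width": 4, "color": "red"})
controller.build()
controller.set(color="blue")   # in-place update
controller.set(width=8)        # unmake/make cycle
```

Because `change()` also runs right after `make()`, the rebuild path needs no special handling to restore the color.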

Unlike HTML, the DOM is not forced to contain a render tree as well. Only some of the leaf nodes have styles and create renderables. Siblings and parents are called upon to help, but the effects don't have to be strictly hierarchical. For example, a visual effect can wrap a single leaf but still be applied after all its parents, as transformations are collected and composed in passes.


It'll Do

The result is not so much a document model as it is a computational model inside a presentational model. You can feed it finalized data and draw it directly… or you can build new models within it and reproject them live. Memoization enables feedback and meta-visualization. The line between data viz and demo scene is rarely this blurry.

Here, the notion of a computed style has little meaning. Any value will end up being transformed and processed in arbitrary ways down the pipe. As I've tried to explain before, the kinds of things people do with getComputedStyle() and getBoundingClientRect() are better achieved by having an extensible layout model, one that affords custom constraints and composition on an equal footing. To do otherwise is to admit defeat and embrace a leaky abstraction by design.

The shallow hierarchy with composition between siblings is particularly appealing to me, even if I realize it introduces non-traditional semantics more reminiscent of a command-line. It acts as both a jQuery-style chainable API, and a minimal document model. If it offends your sensibilities, you could always defuse the magic by explicitly wiring up every relationship. In case of confusion, .inspect() will log syntax highlighted JSX, while .debug() will draw the underlying shader graphs.

I've defined a good set of basic primitives and iterated on them a few times. But how to implement it, when WebGL doesn't even fully cover OpenGL ES 2?



  MathBox² - PowerPoint Must Die
  A DOM for Robots - Modelling Live Data
  Yak Shading - Data Driven Geometry
  ShaderGraph 2 - Functional GLSL






Animate Your Way to Glory
2013-09-13T00:00:00+02:00








Math and Physics in Motion








  “The last time that I stood here was seven years ago.”

  “Seven years ago! How little do you mortals understand time.
Must you be so linear, Jean-Luc?”
  – Picard and Q, All Good Things, Star Trek: The Next Generation






Note: This article is significantly longer than previous instalments. It features 4 interactive slideshows, each introducing a new tool as well as related concepts around it. In one way, it's just another math guide, but going much deeper. In another, it's a thesis on everything I know about animating. Their intersection is a handbook for anyone who wants to make things move with code, but I hope it's an interesting read even if that's not your goal.





Developers have a tough job these days. A seamless experience is mandatory, polish is expected. On touch devices, they are expected to become magicians. The trick is to make an electronic screen look and feel like something you can physically manipulate. Animation is the key to all of this.

Not just any animation though. Flash intros were hated for a reason. The  tag is not your friend, and flashing banner ads only annoy rather than invite. If elaborately designed effects distract from the content, or worse, ruin smoothness and performance, they'll turn people off rather than endear. Animation can only add value when it's fast and fluid enough to be responsive.

It's not mere polish either, a finishing touch. Animation, and UI in general, should always be an additional conversation with the user, not a representation of internal software or hardware state. When we press Play in a streaming music app, the app should respond immediately by showing a Pause control, even if the music won't actually start playing for another second. When we enable Airplane Mode on our phones, we don't care that it'll take a few seconds to say goodbye to the cell tower and turn off the radio. The UI is there to respond to our wishes: it should act like a personal assistant, not a reluctant helper, or worse, a demanding master.






  
  The OS X 'genie' effect. Ridiculed, but it leaves no question where the window went.




Hence animation is a visual language that communicates both explicitly and implicitly. It establishes an unspoken trust and confidence between designer and user: we promise nothing will appear, change or disappear without explanation. It can show where to find things, like an application that minimizes into place in the dock, or a picture sliding into a thumbnail strip. It can tell miniature stories, like a Download button turning into a progress bar turning into a checkmark. More simply, it can be just the act of scrolling around a live document, creating the illusion of viewing an infinite canvas, persisting in space and time. Here, page layout is the use of placement and style to denote hierarchy and meaning in a 2D space.

As with any conversation, tone matters, in this case expressed through choreography. Items can fade into the background or pop to demand our attention, expressing calm or assertiveness. Elements can complement or oppose, creating harmony or dissonance. Animations can be minimalist or baroque, ordered or parallel, independent or layered. The proper term for this is staging, and research shows that it can significantly increase our understanding of diagrams and graphs when applied carefully. Whenever elements transition, preferably one at a time, it is easier to gauge changes in color, size, shape and position than when we are only shown a before and after shot.

This is important everywhere, but especially so for abstract topics like data visualization and mathematics. When we have no natural mental model of something, we build our understanding based on the interface we use to examine it. The more those interfaces act like real objects, the less surprising they are.

In doing so, we replace explicit explanations with implicit metaphors from the natural world: distance, direction, scale, shadow, color, contrast. These are the cues our brains evolved to be excellent at interpreting. By imbuing virtual objects with these properties, we make them more realistic and thus more understandable. Mind you, this is not a call for skeuomorphism, far from it. The properties we are seeking to mimic are far more basic, far more important, than some faux leather and stitching.




  
  D3.js Force Directed Graph — Mike Bostock



  
  Star Trek TNG PADD, aka the iPad. Arrived slightly before the 2360s.






The clearest example of this has to be inertial scrolling. Compared to an ordinary mouse wheel, scrolling on a tablet is actually much more complicated. We can flick and grab, go as fast or slow as we want. When skimming through a list, often we never wait for the page to stop moving, in theory requiring more effort to read. Yet everyone who's seen a toddler with an iPad can attest to its uncanny ease of use and efficiency, offering improved control and comprehension. Our brains are very good at tracking and manipulating objects in motion, particularly when they obey the laws of physics: moving with consistent inertia and force.

Which brings me to the actual topic of this post: how animation works on a fundamental level. I'd like to teach a mental model based on physics and math, and how to precisely control it. Along the way, we'll come to understand why Apple built a physics engine into iOS 7's UI, reveal some secrets of the demoscene, compose fluid keyframe animations, and defeat the final boss: seamless rotation in 3D. In doing so, we'll also go beyond just visual animation. The techniques described here work equally well for manipulating audio, processing data or driving meatspace devices. In a world of data, animation is just a different word for precise control.

A Matter of Time

An animation is something that changes over time. As it so happens, these three humble words are a veritable Pandora's box of mathematics. They open up to the strange world of the continuously and infinitely divisible, also known as calculus.

In a previous article, I covered the origins of calculus and how to approach the concept of infinity. In what follows, we won't be needing it much though. We'll be working with finite steps throughout, with discrete time. This makes it vastly easier to understand, and is an eminently useful stepping stone to the true theory of continuous motion, which you can find in any good physics textbook.

Math class hates it when we just punch numbers into our calculator instead of deducing the exact result: a decimal number is meaningless on its own. On that, I can agree. But when we punch in a couple thousand numbers and look at them in aggregate, it can tell us just as much. This page will be your calculator.





  
    
  

  

    
      Let's start where Isaac Newton supposedly did, with an apple.
    

    
      Gravity kicks in. The apple bounces off the ground, losing some energy in the process. After a few bounces, its kinetic energy (speed) and potential energy (height) have both dissipated, and the apple is at rest.
    

    
      But analyzing motion by watching it in real-time is tricky. It's better to visualize time as its own dimension, here horizontal, and look at the entire animation as a whole.
    

    
      $$ \class{blue}{p(t)} $$
      The apple's position $ \class{blue}{p(t)} $ moves through space and time, along arcs of decreasing height and duration. Once at rest, it continues advancing through time, without moving in space. In common parlance, this is the animation's easing curve.
    

    
      $$ \class{blue}{p_i}, \, t_i $$
      It's worth pointing out they're not really arcs. This animation consists of individually numbered frames $ i $, switching 60 times per second. While a frame is displayed, the position $ \class{blue}{p_i} $ of the apple is constant. In between its value changes instantly, at times $ t_i $.
    

    
      For convenience's sake, it's reasonable to consider this a curve, approximated by a series of straight lines. After all, that's the illusion that the animation successfully tricks us into seeing. The discrete nature of the curve will let us dissect it more easily. We're interested in the physics of this motion.
    

    
      $$ \class{green}{v_{i→}} = \frac{\class{blue}{p_{i+1}} - \class{blue}{p_{i}}}{t_{i+1} - t_i} $$
      To determine the speed of the apple, we find the slope of a line segment: vertical divided by horizontal. Dividing distance by time gives us speed, e.g. meters per second. But actually, we're dealing with its cousin velocity which has a direction too. Positive slope means going up, negative slope means going down. This operation is called a forward or backward difference, depending on whether you look forward ($ \class{green}{v_{i→}} $) or backward ($ \class{green}{v_{←i}} $) around a point.
    

    
      $$ \class{green}{v_{i↓}} = \frac{\class{blue}{p_{i+1}} - \class{blue}{p_{i-1}}}{t_{i+1} - t_{i-1}} $$
      Forward differences tell us about what's happening between two adjacent points. We're more interested in what's happening at the points themselves. To fix this, we can take a central difference $ \class{green}{v_{i↓}} $, spanning two frames instead. We now get a good approximation for the slope directly at a point of interest, and thus the velocity.
    

    
      $$ \class{blue}{p_i}, \, \class{green}{v_{i↓}} $$
      If we apply this procedure along the entire curve, we can graph the apple's velocity over time, in sync with its position. This is the discrete version of taking the derivative in calculus, or differentiation and shows these two quantities are intimately related.
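The three kinds of difference can be sketched in a few lines of Python. The sample data is an assumption made up for illustration: an apple thrown upward at 5 m/s under gravity of 9.81 m/s², sampled at 60 frames per second. The function names are mine, not from any library.

```python
def forward_difference(p, t, i):
    """Slope of the segment from frame i to frame i + 1."""
    return (p[i + 1] - p[i]) / (t[i + 1] - t[i])


def backward_difference(p, t, i):
    """Slope of the segment from frame i - 1 to frame i."""
    return (p[i] - p[i - 1]) / (t[i] - t[i - 1])


def central_difference(p, t, i):
    """Slope across frames i - 1 and i + 1, centered on frame i."""
    return (p[i + 1] - p[i - 1]) / (t[i + 1] - t[i - 1])


# Hypothetical sample data: one second of an upward throw, 60 fps.
dt = 1.0 / 60.0
g = 9.81
t = [i * dt for i in range(61)]
p = [5.0 * ti - 0.5 * g * ti * ti for ti in t]

# Velocity at every interior frame, in sync with the position samples.
v = [central_difference(p, t, i) for i in range(1, len(p) - 1)]
```

For a parabola with evenly spaced frames, the central difference reproduces the true velocity at each frame up to rounding, while the forward and backward differences are off by half a frame's worth of acceleration.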
    

    
      While in the air, the apple's velocity decreases along a straight line, first positive, then negative. On impact, the velocity suddenly reverses, though only to a portion of its previous value. At the top of each arc, the velocity passes through zero, which means the apple essentially hangs motionless in the air for a fraction of a second.
    

    
      $$ \class{blue}{p_i}, \, \class{green}{v_{i↓}}, \, \class{orangered}{a_{i↓}} $$
      To further analyze this, we can repeat the procedure, and find the slope of the velocity. This is the change in velocity over time, better known as acceleration. It can be expressed in meters per second per second, that is, $ m / s^2 $. According to Newton, acceleration is force divided by mass: the heavier something is, the less effect the same force has.
    

    
      What looked like a complicated animation at the position level is now revealed to be very simple: the apple undergoes a small constant acceleration downwards from gravity. It also experiences a short burst of much stronger acceleration upwards whenever it bounces. Once the upward force goes below a critical threshold, the apple stops moving. At the end, gravity is countered by the apple's resistance to being squished, and the net acceleration is zero.
    

    
      Suppose we were given only the acceleration, and wanted to reconstruct the animation. Can we do that?
    
    
    
      $$ \class{green}{v_{i+1→}} = \class{green}{v_{i→}} + \class{orangered}{a_{i→}} \cdot (t_{i+1} - t_i) $$
      Yep, we just work our way back up. If the acceleration represents a difference in velocity over time, then we can track the velocity by adding these differences back, accumulating them one step at a time. Since we divided the differences by time initially, we'll now have to multiply each value by the time between frames. Technically we need forward differences ($ \class{orangered}{a_{i→}} $) for this, not central ones ($ \class{orangered}{a_{i↓}} $), but the error will be minor.
    

    
      $$ \class{green}{v_{i+1→}} = \sum\limits_{k=0}^i\class{orangered}{a_{k→}} \cdot Δt $$
      In calculus, this accumulation process is called integration. In our case, it's a sum ($ \sum $). As we are multiplying the vertical value $ \class{orangered}{a_{k→}} $ by the horizontal time step $ Δt $, each term represents the area of a thin rectangle. By adding up all these signed areas, positive for up and negative for down, we can approximate the integral and get velocity back. Integrals and areas under curves are very closely linked.
    

    
      
        $$ \class{green}{v_{i+1→}} = \class{green}{v_{0→}} + \sum\limits_{k=0}^i\class{orangered}{a_{k→}} \cdot Δt $$
        $$ \class{blue}{p_{i+1}} = \class{blue}{p_0} + \sum\limits_{k=0}^i\class{green}{v_{k→}} \cdot Δt $$
      
      Similarly, we can integrate velocity into position by adding up strips of area under the velocity curve, recreating the original bounce. Note that for both sums, we needed to manually specify the starting point. If we didn't set it correctly, the apple would drift, bounce on thin air or penetrate the ground.
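Both sums are the same accumulator applied twice. A minimal Python sketch, assuming a constant time step, no bounce, and made-up starting values of 5 m/s and 0 m:

```python
def integrate(samples, start, dt):
    """Accumulate samples * dt one step at a time, beginning at `start`."""
    out = [start]
    for s in samples:
        out.append(out[-1] + s * dt)
    return out


dt = 1.0 / 60.0
g = 9.81
frames = 60

# Constant downward acceleration for one second.
a = [-g] * frames

# First sum: acceleration into velocity, starting from v_0 (assumed 5 m/s up).
v = integrate(a, 5.0, dt)

# Second sum: velocity into position, starting from p_0 (assumed 0 m).
# Each step uses the velocity at the start of the frame, v_k for k = 0..i.
p = integrate(v[:-1], 0.0, dt)
```

Changing the two `start` arguments shifts the whole curve, which is exactly the drift described above when they are set incorrectly.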
    

    
      We've produced real physical behavior from raw forces like gravity. That means we've just described a real physics engine. It's a one-dimensional one, but a physics engine nonetheless. It implements Euler integration, a fast but generally inaccurate method. In this case, the reconstruction is not perfect due to the earlier use of central rather than forward differences.
    

    
      We only need one of the three in order to produce a plausible copy of the other two. That means we can control animations on any of the three levels. If we want full control, we specify position directly. For simple constrained motions, we can manipulate velocity and integrate once. For full-on physics, we set acceleration from physical laws and integrate twice. This is why the Newtonian model of motion is so important.
    

    
      It also reveals smoothness. A smooth animation isn't just continuous in its path. Its velocity is continuous too, without sudden jumps. In some cases, we'll even want smooth acceleration too. An ordinary bounce effect is shown to involve a large acceleration, a sudden jerk. This is a noticeable visual disruption, the kind we generally want to avoid. If you've ever tried to ignore a bouncing icon, you'll know how hard this is.
    

    
      In fact, jerk is what we call the slope of acceleration. That's three derivatives deep, and it's turtles all the way down. The next ones are imaginatively called snap, crackle and pop, though they signify little directly. A large jerk however implies a sudden, jarring change in force.
    

    
      
        $$ \class{purple}{E_p} = m \cdot g \cdot h $$
      
      There's more physics hiding in plain sight. Earlier on, I mentioned energy: kinetic and potential. The apple's available potential energy $ \class{purple}{E_p} $ comes from gravity and is proportional to its height $ h $ above the ground, as well as the mass $ m $ and the local strength of gravity $ g $.
    

    
      
        $$ \class{cyan}{E_k} = \frac{1}{2} \cdot m \cdot v^2 $$
      
      The kinetic energy $ \class{cyan}{E_k} $ comes from its motion. It's proportional to the velocity squared. That means each additional meter per second makes the previous ones more energetic, adding more kinetic energy the faster it's already going. To explain, we can imagine the force required to stop a moving object. By increasing the speed, you don't just add additional momentum: the impact also takes less time, concentrating it.
    
    
    
      
        $$ \class{purple}{E_p} = m \cdot g \cdot h $$
        $$ \class{cyan}{E_k} = \frac{1}{2} \cdot m \cdot v^2 $$
      
      In a closed system, total momentum is conserved. As we are treating gravity as an outside force, this does not apply. Energy is conserved however. There's a vertical symmetry, where one energy level goes up as the other goes down, and vice versa. So we actually have a fourth level to control physics at: that of energy and potential. With some minor bookkeeping, we can create motion this way, called Hamiltonian mechanics.
    

    
      
        $$ \class{royal}{E_t} = \class{purple}{E_p} + \class{cyan}{E_k} $$
      
      The total energy, potential plus kinetic, is perfectly constant between bounces. On impact, a significant amount is lost. Note that the dips towards zero are a side effect of the finite approximation: if the bounce occurs between two frames, the apple appears to slow down for a frame, instantly falling down and bouncing back to where it was one frame earlier. Finite differences are oblivious to this.
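The constant total between bounces is easy to check numerically. This sketch uses the exact free-fall formulas rather than the finite-difference data from the slides, and the mass and drop height are assumed values picked for illustration.

```python
m = 0.1    # kg, assumed mass of the apple
g = 9.81   # m/s^2
h0 = 2.0   # m, assumed height at release


def energies(t):
    """Potential, kinetic and total energy t seconds after release from rest."""
    h = h0 - 0.5 * g * t * t
    v = -g * t
    e_p = m * g * h
    e_k = 0.5 * m * v * v
    return e_p, e_k, e_p + e_k


# Sample the first 30 frames at 60 fps, well before the apple reaches the ground.
totals = [energies(i / 60.0)[2] for i in range(30)]
```

Every entry in `totals` equals `m * g * h0` up to rounding: whatever the potential term loses, the kinetic term gains.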
    

    
      The energy levels follow a decaying exponential curve. This is very typical: exponentials show up whenever a quantity is related to its rate of change. Hamiltonian models are useful for more complicated things like 3D roller coasters, where they allow you to abstract away complex interactions into a few concise relations like this.
    

    
      In simple animation though, we'll generally stick to the direct Newtonian model. We can use it to analyze real use cases. Let's start with a common easing curve, cosine interpolation, used by default in jQuery.animate() and these slides too.
    

    
      
        $$ lerp(\class{orangered}{a}, \class{green}{b}, f) = \class{orangered}{a} + (\class{green}{b} - \class{orangered}{a}) \cdot f $$
      
      We animate the apple's position, changing its Y coordinate. In practice, that means we apply linear interpolation, lerping, between the start $ \class{orangered}{a} $ and end $ \class{green}{b} $. We take the starting point and add a fraction $ f $ of the difference $ \class{green}{b} - \class{orangered}{a} $ to it. Half the difference gets us halfway there, and so on. As long as $ f $ is between 0 and 1, we end up somewhere in the middle. When $ f $ reaches 1, the animation is complete.
    

    
      
        $$ elerp(\class{orangered}{a}, \class{green}{b}, f) = \class{orangered}{a} + (\class{green}{b} - \class{orangered}{a}) \cdot  \class{blue}{ease(f)} $$
        $$ \class{blue}{ease(f)} = 0.5 - 0.5 \cdot \cos πf $$
        


      
      The purpose of the easing curve is then to make the animation non-linear, not in space, but in time: in this case, the apple smoothly starts and stops. We can use any curve we like, e.g. half of a cosine wave of period 2. This eased lerp is the basic building block of any animation system.
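Written out as code, the two formulas are only a few lines. This is a sketch in Python using the same names as the equations, not jQuery's implementation:

```python
import math


def lerp(a, b, f):
    """Linear interpolation: start at a, add a fraction f of the way to b."""
    return a + (b - a) * f


def ease(f):
    """Cosine ease: half a cosine wave, mapping 0..1 onto 0..1."""
    return 0.5 - 0.5 * math.cos(math.pi * f)


def elerp(a, b, f):
    """Eased lerp: bend time through the easing curve before interpolating."""
    return lerp(a, b, ease(f))


# A quarter of the way through in time, the eased value has covered
# less than a quarter of the distance: the apple starts slowly.
quarter = elerp(0.0, 100.0, 0.25)
```

The easing function only ever touches the fraction `f`. Swapping in a different curve changes the timing without touching the start and end values.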
    

    
      
        $$ \class{blue}{p_i}, \, \class{green}{v_{i↓}}, \, \class{orangered}{a_{i↓}} $$
      
      The effect of the easing curve is visible when we take central differences again, and look at velocity and acceleration. The acceleration has been divided by 3 to fit. This doesn't seem bad, all three quantities appear to change smoothly. This picture is deceptive though.
    

    
      All curves continue before and after the animation. The smooth cosine ease turns out to be quite jarring in its acceleration: it's like flooring the accelerator from standstill then easing off gently. At the halfway point you start braking, more and more until you stop. It's one of the most responsive animations possible that's still smooth at both ends. Smoother easing curves have smoother accelerations, but respond slower.
    

    
      
        $$ \class{blue}{ease(f)} = f^2 $$
      
      A simpler example is the half-ease, here achieved with a quadratic curve $ \class{blue}{f^2} $. The velocity is a linearly increasing ramp. The acceleration is constant, except for a very large instant deceleration at the end. This is like flooring the accelerator from standstill, holding it down for the duration, and then crashing into a wall—the suicide ease. Due to this, half-easing is typically used for fading transitions, where the object is invisible–or the audio inaudible–at the start or end. 
    

    
      
        $$
        \class{blue}{ease(f)} = 
        \left\{
        	\begin{array}{ll}
        		f^2  & \mbox{if } f \leq 1 \\
        		2f - 1 & \mbox{if } f > 1
        	\end{array}
        \right.
        
        $$
      
      But we can repurpose it quite easily. By tweaking this at the velocity level, we can maintain a constant speed at the end. This is the slow start, and can be expressed directly as an open-ended easing curve. In this case, we allow $ f $ to exceed 1, and the linear interpolation turns into extrapolation for free, no extra charge. We can scale the curve vertically to change the final speed, and scale it horizontally to control the delay. The slow start (and stop) is used throughout these slides.
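As a sketch in Python, the piecewise curve and its two scalings look like this. The helper `slow_start_position` and its `ramp` and `speed` parameters are my own naming, assuming `f` is never negative: `ramp` stretches the curve horizontally, and the vertical scale is chosen so the final slope equals `speed`.

```python
def slow_start(f):
    """Quadratic ramp up to f = 1, then a straight line with the same slope."""
    if f <= 1.0:
        return f * f
    return 2.0 * f - 1.0


def slow_start_position(t, ramp, speed):
    """Distance covered after t seconds, reaching `speed` once `ramp` seconds have passed.

    The unscaled curve ends with slope 2, so scaling time by 1 / ramp and
    the value by speed * ramp / 2 gives a final slope of exactly `speed`.
    """
    return 0.5 * speed * ramp * slow_start(t / ramp)


# Past the ramp, every extra half second adds half a second's worth of travel.
step = slow_start_position(1.5, 0.5, 100.0) - slow_start_position(1.0, 0.5, 100.0)
```

Both pieces meet at `f = 1` with the same value and the same slope, so neither position nor velocity jumps there.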
    

    
      
        $$
        \class{blue}{ease(f)} = \frac{1}{4} \cdot (1 - \cos 2πf)
        + \left\{
        	\begin{array}{ll}
        		f^2  & \mbox{if } f \leq 1 \\
        		2f - 1 & \mbox{if } f > 1
        	\end{array}
        \right.
        $$
      
      We can combine curves too. Here, we add a cosine wave to the slow start, creating perhaps the motion of a rising jellyfish. Adding up animations is an easy way to create variations on a theme, used often in the demoscene. The derivatives add straight up too, so all three curves shift up and down by a sine or cosine wave. You can see how a small shift in position can have a large effect on both velocity and acceleration. 
    

    
      The next example is a bit different. Any guesses as to what this is? The hint is in the vertical scale, now measured in pixels. This animation moves almost 1000 pixels in just over one second.
    

    
      It's an inertial flick gesture, recorded on Mac OS X. We can plot velocity and acceleration again. There's a slight measurement error, visible as noisy ripples on the acceleration, even after smoothing out the data: derivatives are very sensitive to noise. The velocity and acceleration have also been scaled down to fit, as they are both quite large.
    

    
      The first part of the curve is not an animation at all: it was tracking the direct motion of my finger. Fingers move very smoothly: the acceleration follows a curve up and down. This is more physics: of nerve signals causing muscle fibers to contract and digits to move. This work smoothly converts chemical potential into kinetic energy. The small jump in speed at time 0 is easy to explain: my finger was already moving when it touched the pad.
    

    
      The second part is the actual inertial animation. It kicks in as soon as the finger leaves the pad. All three values follow an exponential curve past that point, disregarding the noise. But the important one is velocity: the animation starts with the last known velocity and smoothly decays it to zero. Where we end up depends on how fast we were going when the finger left the pad.
    

    
      
        $$ \class{green}{v_{i+1→}} = \class{green}{v_{i→}} \cdot (1 - \class{royal}{f}) $$
      
      Inertial scroll is easiest to control at the velocity level. We can measure the initial velocity by finding the position's slope, usually averaged over several frames. We then start at this velocity, but reduce it every frame by a fraction $ \class{royal}{f} $, which is a coefficient of friction. We don't need to care how far we'll go or how long it'll take: we can just keep animating until the velocity gets close enough to 0.
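A minimal sketch of that loop in Python. The `threshold` for "close enough to 0" and the sample numbers are assumptions; real scroll views will tune these differently.

```python
def coast(p0, v0, friction, dt, threshold=1.0):
    """Decay velocity by a fixed fraction each frame until it is nearly zero."""
    p, v = p0, v0
    path = [p]
    while abs(v) > threshold:
        p += v * dt            # integrate velocity into position
        v *= 1.0 - friction    # lose a fraction of the velocity every frame
        path.append(p)
    return path


# Hypothetical flick: released at 1000 px/s, 5% friction per frame, 60 fps.
path = coast(0.0, 1000.0, 0.05, 1.0 / 60.0)
```

The loop never asks how far it will travel or how long it will take. It just runs until the velocity has faded.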
    

    
      Suppose we do care where we end up. We might be showing a list of items, each 100 pixels tall. It could be good to control the animation so it always stops right at an item. We can't violate the principle of smooth motion, so we can't just change the position or velocity directly. We have to change the coefficient of friction.
    

    
      
        $$ \class{green}{v_{i→}} = \class{green}{v_{0→}} \cdot (1 - \class{royal}{f})^i $$
      
      As the velocity follows a simple curve, we don't have to track it manually. We can express it over time as a direct relation, based on the initial velocity $ \class{green}{v_{0→}} $. The exponential nature is clear, with the frame number $ i $ appearing as the exponent of a number between 0 and 1.
    

    
      
        $$
        \begin{array}{rl}
          \class{blue}{p_{i}} & = \class{blue}{p_0} + \sum\limits_{k=0}^{i-1} \class{green}{v_{0→} \cdot (1 - f)^k} \cdot Δt \\
                                & = \class{blue}{p_0} + \class{green}{v_{0→}} \cdot Δt \cdot \class{purple}{\sum\limits_{k=0}^{i-1} (1 - f)^k}
        \end{array}
        $$
      
      The position at frame $ i $ is then the sum of all the previous velocities times the time step $ Δt $, just like before, relative to the initial position $ \class{blue}{p_0} $. As the time step and initial velocity are constant, we can move both outside the sum.
    

    
      
        $$
        \begin{array}{rl}
        \class{blue}{p_∞} & = \class{blue}{p_0} + \class{green}{v_{0→}} \cdot Δt \cdot \class{purple}{\sum\limits_{k=0}^{∞} (1 - f)^k} \\
                           & = \class{blue}{p_0} + \frac{\class{green}{v_{0→}} \cdot Δt}{\class{royal}{f}}
        \end{array}
        $$
      
      To find the final resting position, we theoretically have to continue the animation all the way to infinity. This can be done using a limit. For now, we'll just look up the formula for this infinite sum, a geometric series. We end up dividing by the coefficient of friction: the lower it is, the further we go after all. If the coefficient were 0, there'd be no friction. We'd divide by zero, because there's no final resting position when you never slow down.
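The closed form can be checked against a brute-force sum. In this Python sketch, `position_after` uses the standard formula for a finite geometric series with ratio $ 1 - f $, and `terms` counts how many velocity terms have been added up. Both function names are mine, and the friction must be greater than 0.

```python
def resting_position(p0, v0, friction, dt):
    """Limit of the infinite geometric series: p0 + v0 * dt / f."""
    return p0 + v0 * dt / friction


def position_after(p0, v0, friction, dt, terms):
    """Position after summing the first `terms` velocities.

    Uses sum_{k=0}^{n-1} r^k = (1 - r^n) / (1 - r) with r = 1 - friction.
    """
    r = 1.0 - friction
    return p0 + v0 * dt * (1.0 - r ** terms) / friction


# Hypothetical flick: 1000 px/s at release, 5% friction per frame, 60 fps.
dt = 1.0 / 60.0
rest = resting_position(0.0, 1000.0, 0.05, dt)
brute = sum(1000.0 * 0.95 ** k * dt for k in range(10))
```

With these numbers the surface comes to rest a third of a thousand pixels from where the finger let go, and the partial sums approach that value from below.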
    

    
      
        $$
        \class{royal}{f} = \frac{\class{green}{v_{0→}} \cdot Δt}{\class{blue}{Δp}}
        $$
      
      We can invert this relationship to find the coefficient of friction required to stop at a given target. We just need the initial distance to the target, $ \class{blue}{Δp} $. To apply this in practice, we determine the friction needed to reach the next couple of items, and pick the one which is closest to the default case. The user won't notice the subtle change in friction—the UI will just magically seem better.
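One way to sketch that selection in Python. The choice of three candidate items around the natural stopping point, and the rule of discarding targets that would need a friction outside 0..1, are assumptions of mine rather than a description of any shipping scroll view.

```python
def friction_for_target(v0, dt, delta_p):
    """Friction needed to come to rest exactly delta_p away."""
    return v0 * dt / delta_p


def snap_friction(p0, v0, dt, default_friction, item_height=100.0):
    """Return (friction, target) for the item reachable with the least change in friction."""
    natural_stop = p0 + v0 * dt / default_friction
    nearest = round(natural_stop / item_height)
    best = None
    for index in (nearest - 1, nearest, nearest + 1):
        target = index * item_height
        delta_p = target - p0
        if delta_p * v0 <= 0.0:
            continue  # target is behind us, or we are already on it
        f = friction_for_target(v0, dt, delta_p)
        if not 0.0 < f < 1.0:
            continue  # too close to reach with a gentle decay
        if best is None or abs(f - default_friction) < abs(best[0] - default_friction):
            best = (f, target)
    return best  # None if no candidate qualifies


# Hypothetical flick: 1000 px/s at release, default friction 5%, items 100 px tall.
friction, target = snap_friction(0.0, 1000.0, 1.0 / 60.0, 0.05)
```

Here the natural stop would be at about 333 pixels, so the item at 300 wins with a friction only slightly above the default.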
    

    
      The simulation works identically in all cases and the velocities are still continuous and exponential, which means: physical. This effect only requires one additional calculation at the start, which makes it all the more strange that developers have come up with increasingly jarring ways to achieve something similar.
    

    
      Now let's try animating in 2D.
    

    
      
        $$ x(t) = \sin t $$
        $$ y(t) = \sin t $$
      
      We can move the apple in 2D by animating its X and Y coordinates. Here we animate both in lockstep, using a sine wave: the apple moves diagonally, as X and Y are always equal. By adjusting their relative amplitudes, we can control the angle of motion.
    

    
      
        $$ x(t) = \sin t $$
        $$ y(t) = \sin \frac{7}{8}t $$
      
      If we animate X and Y separately, we create arbitrary paths. Here they both follow a sine wave, but with different frequencies. The resulting path is called a Lissajous curve. The sine waves drift in and out of phase, going from a diagonal to an oval to a circle, and back again.
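In code, the curve is just two independent sine lookups. A short Python sketch, with the sampling rate and duration picked arbitrarily:

```python
import math


def lissajous(t, freq_x=1.0, freq_y=7.0 / 8.0):
    """X and Y animated separately, each on its own sine wave."""
    return math.sin(freq_x * t), math.sin(freq_y * t)


# Ten seconds of path at 60 samples per second.
path = [lissajous(i / 60.0) for i in range(600)]
```

With these two frequencies, X completes 8 cycles in the time Y completes 7, so the whole figure repeats every $ 16π $ units of time.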
    

    
      
        $$
        \class{blue}{\vec p(t)}
        
        =
        
        \begin{bmatrix}
          \class{blue}{p_x(t)} \\
          \class{blue}{p_y(t)}
        \end{bmatrix}
        
        =
        
        \begin{bmatrix}
          \sin t \\
          \sin \frac{7}{8}t
        \end{bmatrix}
        $$
      
      It makes more sense to picture the position as a 2D vector, an arrow. It has both a direction and a length, relative to the origin. While the calculation is equivalent—animating X and Y separately—the vector representation is more natural once we look at the derivatives.
    

    
      
        $$
          \class{green}{\vec v_{i→}} = \frac{\class{blue}{\vec p_{i+1}} - \class{blue}{\vec p_{i}}}{t_{i+1} - t_i}
        $$
      
      What do slope and velocity mean in this context? The same principle applies: we take the difference in position between two frames, and divide it by the difference in time $ Δt $. In this case, all quantities except time are vectors.
    

    
      As a single frame is very short, the velocity is quite large, and always tangent to the path. Its length directly represents speed.
    

    
      If we center the velocity vector, it traces out its own Lissajous curve. This one is slightly different and doubles back on itself at regular intervals.
    

    
      
        $$
          \class{orangered}{\vec a_{i→}} = \frac{\class{green}{\vec v_{i+1→}} - \class{green}{\vec v_{i→}}}{t_{i+1} - t_i}
        $$
      
      We can apply finite differences again to dissect velocity into acceleration. It follows yet another Lissajous curve, a scaled and rotated version of the position.
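The vector version is the scalar one applied per component. A Python sketch using plain tuples for vectors and a constant time step, both simplifying assumptions:

```python
import math


def vec_difference(a, b, dt):
    """Forward difference of two vectors: (b - a) / dt, component by component."""
    return tuple((bj - aj) / dt for aj, bj in zip(a, b))


def differentiate(samples, dt):
    """Forward differences along a whole list of vectors."""
    return [vec_difference(samples[i], samples[i + 1], dt)
            for i in range(len(samples) - 1)]


dt = 1.0 / 60.0
p = [(math.sin(i * dt), math.sin(7.0 / 8.0 * i * dt)) for i in range(240)]

v = differentiate(p, dt)   # velocity vectors, one fewer than positions
a = differentiate(v, dt)   # acceleration vectors, one fewer again
```

Each pass shortens the list by one sample, and the X component of `a` comes out close to the negated X position of the neighbouring frame, as expected for a sine wave.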
    

    
      Finally, we can disentangle these curves by plotting them out over time. Position, velocity and acceleration dance around each other. Despite its artificial construction, even this motion is physical: it's what happens when you take an object and hang it off independently moving horizontal and vertical springs of different stiffness. With the right visualization, raw physics is quite beautiful in its own right.
    
    
  
  





  
We've seen how to examine an animation at multiple levels of change: position, velocity, acceleration. Differences approximate derivatives and let us dissect our way down the chain. Accumulators approximate integration and let us construct higher levels from lower ones. Thus we can manipulate an animation at any level. By plugging in correct physical laws or arbitrary formulas, we can produce behavior that is as physical or unphysical as we like.

Customer is King

Everything we've done so far has been independent animation, without interaction. Even inertial scrolling has this luxury: whenever the user is touching, there is no inertia and the animation system is inactive. It's only when you let go that the surface coasts.

In many cases, this is not enough: animations need to be scheduled and executed while retaining full interactivity. Often the animation needs to continue despite its target changing midway. In order to handle such situations, we need to build adaptive models that remain continuous and smooth, no matter what.

We'll also need to drop the assumption that the frame rate—the time step—is constant. In the real world, the frame rate might drop here or there, or be variable altogether. In either case, we'd prefer it if the effect of this was minimal. If we're adding music to an animation, this is essential to prevent desynchronization. It will also have some nasty consequences for our physics engine, and we need to level it up significantly.





  
    
  

  

    
      So far, we've assumed a constant frame rate.
    

    
      If our animation is defined by an easing curve, we can look up its value at any point along the way.
    

    
      At first, it seems variable frame rates are trivial: we can evaluate the curve at arbitrary times instead of pre-set intervals.
    

    
      
        
$$ \class{green}{v_{i→}} = \frac{\class{blue}{p_{i+1}} - \class{blue}{p_{i}}}{t_{i+1} - t_i} $$




      
        




$$ \class{blue}{p_{i+1}} = \class{blue}{p_0} + \sum\limits_{k=0}^i\class{green}{v_{k→}} \cdot Δt_k $$
      
      If we take forward differences to measure slope, we still get a smooth velocity curve. We can accumulate—integrate—these differences back into position as long as we account for a variable time step $ Δt_i $. It seems our physics engine should be unbothered too. But there's a few problems.
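Here is a minimal sketch of that round trip with uneven frames. The smoothstep curve and the random frame jitter are made up for the example. The point is only that each velocity is multiplied by its own time step when we add the pieces back up.

```python
import random

def ease(t):
    """Assumed example curve: a smoothstep ease on [0, 1]."""
    return t * t * (3.0 - 2.0 * t)

random.seed(1)
# Irregular frame times between 0 and 1 (the jitter is made up for illustration).
times = [0.0]
while times[-1] < 1.0:
    times.append(min(1.0, times[-1] + random.uniform(0.01, 0.05)))

positions = [ease(t) for t in times]

# Forward differences: average velocity over each uneven frame.
velocities = [
    (p1 - p0) / (t1 - t0)
    for p0, p1, t0, t1 in zip(positions, positions[1:], times, times[1:])
]

# Accumulate v_k * dt_k, pairing each velocity with its own time step.
rebuilt = [positions[0]]
for v, t0, t1 in zip(velocities, times, times[1:]):
    rebuilt.append(rebuilt[-1] + v * (t1 - t0))
```

The rebuilt positions match the sampled ones up to floating point rounding, whatever the frame times are.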
    

    
      
        $$ \class{green}{v_{i+1→}} = \class{green}{v_{i→}} \cdot (1 - \class{royal}{f}) $$
      
      First, if we implemented inertial scrolling like we did before, multiplying the velocity by $ 1 - \class{royal}{f} $ every frame, we'd get the wrong curve. The amount of velocity lost per frame should now vary: we can no longer treat it as a convenient constant.
    

    
      
        $$ 
          \begin{array}{rcl}
        	
           (1 - \class{purple}{f_i})^\frac{t}{Δt_i} & = & (1 - \class{royal}{f})^\frac{t}{Δt} \\
            ⇔ \,\,\, \class{purple}{f_i} & = & 1 - e^{\frac{Δt_i}{Δt} \log_e (1 - \class{royal}{f})}
          
          \end{array}
          $$
      
      If we do the math, we can find an expression for the correct amount of friction $ \class{purple}{f_i} $ per frame for a given step $ Δt_i $, relative to the default $ \class{royal}{f} $ and $ Δt $. Not pretty, and this is just one object experiencing one force. In more elaborate scenarios, finding exact expressions for positions or velocities can be hard or even impossible. This is what the physics engine is supposed to be doing for us.
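The formula can be sketched in a few lines. The name `friction_for_step` and the 60 Hz reference step are assumptions. The power form used here is the same expression as the one with $ e $ and $ \log_e $, just written differently.

```python
def friction_for_step(f, dt_ref, dt_i):
    """Per-frame friction that loses velocity at the same rate as `f` per `dt_ref`."""
    return 1.0 - (1.0 - f) ** (dt_i / dt_ref)

f, dt_ref = 0.1, 1.0 / 60.0          # assumed defaults: lose 10% per 60 Hz frame

# One second at a steady 60 Hz.
v_fixed = 1.0
for _ in range(60):
    v_fixed *= 1.0 - f

# The same second, split into uneven frames (made-up durations that sum to 1 s).
steps = [1.0 / 30.0] * 15 + [1.0 / 120.0] * 60
v_variable = 1.0
for dt_i in steps:
    v_variable *= 1.0 - friction_for_step(f, dt_ref, dt_i)
```

Both loops end at the same velocity. This only fixes the velocity though, which is the single quantity we had an exact expression for.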
    

    
      There's another problem. If we integrate these curve segments to get position, we get an exponential curve, just as before. Did we achieve frame rate independence?
    

    
      Well, no. If we change the time steps and run the algorithm again, it looks the same. However, the new curve and old curve don't match up. The difference is surprisingly large, as this animation is only half a second long and the average frame rate is identical in both cases. Such errors will compound the longer it runs, and make your program unpredictable.
    

    
      Luckily we can have our cake and eat it too. We can achieve consistent physics and still render at arbitrary frame rates. We just have to decouple the physics clock from the rendering clock.
    

    
      Whenever we have to render a new frame, we compare both clocks. If the render clock has advanced past the physics clock, we do one or more simulation steps to catch up. Then we interpolate linearly between the last two values until we run out of physics again.
    
    
    
      
    This means the visuals are delayed by one physics frame, but this is usually acceptable. We can even run our physics at half the frame rate or less to conserve power. Though more error will creep in, this error will be identical between all runs, and we can manually compensate for it if needed.
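A minimal sketch of the two clocks, assuming the physics state is a single number and that `step` is a function we supply. The class name and its fields are hypothetical. For vectors, the final blend would be applied per component.

```python
class FixedStepSimulation:
    """Runs physics at a fixed rate and interpolates for rendering."""

    def __init__(self, step, state, dt):
        self.step = step          # function: (state, dt) -> next state
        self.dt = dt              # fixed physics time step
        self.previous = state     # physics state one step before self.time
        self.current = state      # physics state at self.time
        self.time = 0.0           # physics clock

    def render(self, render_time):
        """Return the interpolated state for the given render clock value."""
        # Catch up: step until the physics clock has reached the render clock.
        while self.time < render_time:
            self.previous = self.current
            self.current = self.step(self.current, self.dt)
            self.time += self.dt
        # Blend linearly between the last two physics frames.
        alpha = 1.0 - (self.time - render_time) / self.dt
        return self.previous + (self.current - self.previous) * alpha

# Usage: an object moving at 2 units per second, physics at 10 Hz.
sim = FixedStepSimulation(lambda p, dt: p + 2.0 * dt, 0.0, 0.1)
shown = sim.render(0.25)          # falls halfway between two physics frames
```

The physics only ever sees the fixed `dt`, so its results do not depend on when `render` is called.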
    

    
      
        When we implement variable frame rates correctly, we can produce an arbitrary number of frames at arbitrary times. This buys us something very important, not for the end-user, but for the developer: the ability to skip to any point in time, or at least fast-forward as quickly as your computer can manage.
      
    

    
      
        But just because the simulation is consistent, doesn't mean it's correct or even stable. Euler integration fits our intuitive model of how pieces add up, but it's actually quite terrible. For example, if we made our bouncing apple perfectly elastic in the physical sense (losing no energy at all) and applied Euler, it would start bouncing irregularly, gaining height.
      
    

    
      
        Which means the first bounce simulation wasn't using Euler at all. It couldn't have: the energy wouldn't have been conserved. All the finite differentiation and integration magic that followed only worked neatly because the position data was of a higher quality to begin with. We have to find the source of this phantom energy so we can correct for it, creating the Verlet integration that was used.
      
    

    
      
        We're trying to simulate this path, the ideal curve we'd get if we could integrate with infinitely small steps. We imagine we start at the point in the middle, and would like to step forward by a large amount. The time step is exactly 1 second, so we can visually add accelerations and velocities like vectors, without having to scale them. Note that this is not a gravity arc, the downward force now varies.
      
    

    
      
        
          $$ \class{green}{v_{i+1→}} = \class{green}{v_{i→}} + \class{orangered}{a_{i→}} \cdot Δt $$
          $$ \class{blue}{p_{i+1}} = \class{blue}{p_{i}} + \class{green}{v_{i→}} \cdot Δt $$
        
      
      
        Earlier, I said that if we used forward differences, we could get the velocity between two points. And that we could make a reconstruction of position from forward velocity by applying 'Euler integration'. While that's true, that's not actually what Euler integration is.
      
    

    
      
        See, this is a chicken and egg problem. This velocity isn't the slope at the start or the end or even the middle. It's the average velocity over the entire time step. We can't get this velocity without knowing the future position, and we can't get there without knowing the average velocity in the first place.
      
    

    
      
        
          $$ \class{green}{v_{i+1↓}} = \class{green}{v_{i↓}} + \class{orangered}{a_{i↓}} \cdot Δt $$
          $$ \class{blue}{p_{i+1}} = \class{blue}{p_{i}} + \class{green}{v_{i↓}} \cdot Δt $$
        
      
      
         The velocity that we're actually tracking is for the point itself, at the start of the frame. Any force or acceleration is calculated based on that single instant. If we integrate, we move forward along the curve's tangent, not the curve itself. This is where the extra height comes from, and thus, phantom gravitational energy.
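To see the phantom energy in numbers, here is a small sketch. It swaps the bouncing apple for a frictionless spring with unit mass and unit stiffness, which is an assumption made to keep the energy formula short. The Euler step itself is the pair of equations above.

```python
def euler_step(p, v, accel, dt):
    """Explicit Euler: both updates use values from the start of the frame."""
    a = accel(p)
    return p + v * dt, v + a * dt

def spring(p):
    return -p                 # unit mass on a unit-stiffness spring

def energy(p, v):
    return 0.5 * v * v + 0.5 * p * p

p, v, dt = 1.0, 0.0, 0.1
energies = [energy(p, v)]
for _ in range(100):
    p, v = euler_step(p, v, spring, dt)
    energies.append(energy(p, v))
```

For this particular system the energy grows by a factor of $ 1 + Δt^2 $ every step, so nothing is conserved even though no force adds energy.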
      
    

    
      
        For any finite step, there will always be some overshooting, because we don't yet know what happens along the way. Euler actually made the same mistake we made earlier: he used a central difference where a forward one was required, because the forward difference can only be gotten after the fact. The 'central difference' here is the actual velocity at a point, the true derivative.
      
    

    
      
        
          $$ \class{green}{v_{i+1↓}} = \class{green}{v_{i↓}} + \class{orangered}{a_{i↓}} \cdot Δt $$
          $$ \class{blue}{p_{i+1}} = \class{blue}{p_{i}} + \frac{\class{green}{v_{i↓}} + \class{green}{v_{i+1↓}}}{2} \cdot Δt $$
        
      
      
        As the acceleration changes in this particular scenario, we could try applying Euler, and then averaging the start and end velocities to get something in the middle. It fails, because the end velocity itself is totally wrong. Though we get closer than Euler did, we now undershoot by half the previous amount.
      
    

    
      
        
          $$ \class{green}{v_{←i}} = \frac{\class{blue}{p_{i}} - \class{blue}{p_{i-1}}}{Δt} $$
        
      
      
        To resolve the chicken and egg, we need to look to the past. We assume that rather than starting with one position, we start with two known good frames, defined by us. That means we can take a backwards difference and now know the average velocity of the previous frame. How does this help?
      
    

    
      
        Well, we assume that this velocity happens to be equal or close to the velocity at the halfway point. We also still assume the acceleration is constant for the entire duration. If we then integrate from here to the next halfway point, something magical happens.
      
    

    
      
        
          $$ \class{green}{v_{i→}} = \class{green}{v_{←i}} + \class{orangered}{a_{i↓}} \cdot Δt $$
        
      
      
        We get a perfect prediction for the next frame's average velocity, the forward difference. By always remembering the previous position, we can repeat this indefinitely. That this works at all is amazing: we're applying the exact same operation as before—constant acceleration—for the same amount of time. On just a slightly different concept of velocity. Without even knowing exactly when the object reaches that velocity. That's Verlet integration.
      
    
    
    
      Euler integration failed on a simple constant acceleration like gravity and can only accurately replicate a linear ease $ f $. This motion is a cubic ease $ f^3 $, with linear acceleration that decreases. Verlet still nails it, even when leaping seconds at a time. Why does this work?
    

    
      Euler integration applies a constant acceleration ahead of a point. If there's any decrease in acceleration, it overestimates by a significant amount. That's on top of stepping in the wrong direction to begin with. Both position and velocity will instantly begin to drift away from their true values.
      
    

    
      
        
          $$ \class{blue}{p_{i+1}} = 2 \cdot \class{blue}{p_{i}} - \class{blue}{p_{i-1}} + \class{orangered}{a_{i↓}} \cdot Δt^2 $$
        
      
      Verlet integration applies the same constant acceleration around a point. If the acceleration is a perfect line, the error cancels out: the two triangles make up an equal positive and negative area. By starting with a known good initial velocity and cancelling out subsequent errors, we can precisely track velocity through a linear force. If we simplify the formula, velocity even disappears: we can work with positions and acceleration directly.
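The simplified formula fits in one line of code. As a sketch, we can try it on an assumed cubic motion $ t^3 $, whose acceleration $ 6t $ is a straight line, seeded with two known good frames. The step of half a second is deliberately coarse.

```python
def verlet_step(p_prev, p, a, dt):
    """Position Verlet: p_next = 2*p - p_prev + a*dt^2."""
    return 2.0 * p - p_prev + a * dt * dt

def exact(t):
    return t ** 3             # assumed test motion: a cubic ease

def accel(t):
    return 6.0 * t            # its second derivative, a straight line

dt = 0.5                      # deliberately coarse step
p_prev, p, t = exact(0.0), exact(dt), dt   # two known good frames
path = [p_prev, p]
for _ in range(8):
    p_prev, p = p, verlet_step(p_prev, p, accel(t), dt)
    t += dt
    path.append(p)
```

Every entry of `path` lands on the true curve, because the errors on either side of each point cancel when the acceleration is linear.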
      
    

    
      As this captures the slope of acceleration, we only get errors if the acceleration curves. In this case, the left and right areas don't cancel out exactly. The missing area however smoothly approaches 0 as the time step shrinks, a further sign of Verlet's error-defeating properties. If we do the math, we find the position has $ O(Δt^2) $ global error: decrease the time step $ Δt $ by a factor of 10, and it becomes 100× more accurate. Not bad.
      
    

    
      
        For completeness, here's the 4th order Runge-Kutta method (RK4), which is a sophisticated modification of Euler integration. It involves taking full and half-frame steps and backtracking. It finds 4 estimates for the velocity based on the acceleration at the start, middle and end.
      
    
    
    
      
        The physics can then be integrated from a weighted sum of these estimates, with coefficients $ [\frac{1}{6}, \frac{2}{6}, \frac{2}{6}, \frac{1}{6}] $. We end up in the right place, at the right speed. This method offers an $ O(Δt^4) $ global error. Decrease the time step 10× and it becomes 10,000× more accurate. We have a choice of easy-and-good-enough (Verlet) or complicated-but-precise (RK4), at any frame rate. Each has its own perks, but Verlet is most attractive for games.
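For reference, a sketch of one classical RK4 step for a position and velocity pair. The test case, a unit spring released from rest, is an assumption chosen because its exact solution is $ \cos t $. The function names are hypothetical.

```python
import math

def rk4_step(p, v, accel, dt):
    """One classical RK4 step for p' = v, v' = accel(p)."""
    # Four slope estimates: start, two midpoint tries, and end of the frame.
    k1p, k1v = v, accel(p)
    k2p, k2v = v + 0.5 * dt * k1v, accel(p + 0.5 * dt * k1p)
    k3p, k3v = v + 0.5 * dt * k2v, accel(p + 0.5 * dt * k2p)
    k4p, k4v = v + dt * k3v, accel(p + dt * k3p)
    # Weighted sum with coefficients 1/6, 2/6, 2/6, 1/6.
    p_next = p + dt * (k1p + 2.0 * k2p + 2.0 * k3p + k4p) / 6.0
    v_next = v + dt * (k1v + 2.0 * k2v + 2.0 * k3v + k4v) / 6.0
    return p_next, v_next

def simulate(dt, duration=1.0):
    """Integrate an assumed test case, a unit spring released from p = 1."""
    p, v = 1.0, 0.0
    for _ in range(round(duration / dt)):
        p, v = rk4_step(p, v, lambda x: -x, dt)
    return p

error_coarse = abs(simulate(0.1) - math.cos(1.0))
error_fine = abs(simulate(0.01) - math.cos(1.0))
```

Comparing `error_coarse` with `error_fine` shows how quickly the error falls when the step shrinks by 10×, at the cost of four acceleration evaluations per step.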
       
    

    
      
        With physics under our belt, let's move on. Why not animate time itself? This is the variable speed clock and it's dead simple. It's also a great debugging tool: sync all your animations to a global clock and you can activate bullet time at will. You can tell right away if a glitch was an animation bug or a performance hiccup. On this site too: if you hold Shift, everything slows down 5×.
      
    

    
      
        $$ \class{green}{v_{←i}} = \frac{\class{blue}{t_i} - \class{blue}{t_{i-1}}}{\class{blue}{t_i} - \class{blue}{t_{i-1}}} = \frac{Δt_i}{Δt_i} = 1 $$
      
      
        First, we differentiate the clock's time backwards—because in real-time applications, we don't know what the future holds. This is time's velocity $ \class{green}{v_{←i}} $. As we have to divide by the time step too, the velocity is constant and equal to 1. Let's change that.
      
    

    
      
        $$ \class{blue}{t'_i} = \sum\limits_{k=0}^i \class{green}{v'_{←k}} \cdot Δt_k $$
      
      
        We can reduce the speed of time at will, by changing $ \class{green}{v_i} $. If we then multiply by the time step $ Δt_i $ again and add the pieces back together incrementally, we get a new clock $ t'_i $. By integrating this way, we only need to worry about slope, not position: time always advances consistently. This is also where variable frame rates pay off: going half the speed is the same job as rendering at twice the frame rate.
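A variable speed clock can be sketched as a tiny accumulator. The class and field names here are hypothetical. `now` is whatever real-time reading the application already has.

```python
class VariableSpeedClock:
    """Integrates a speed factor over real time to produce a warped clock."""

    def __init__(self):
        self.speed = 1.0      # time's velocity: 1 is real time, 0.2 is 5x slower
        self.time = 0.0       # warped time t'
        self.last = None      # previous real-time reading

    def tick(self, now):
        """Advance by the real time elapsed since the last tick, scaled by speed."""
        if self.last is not None:
            self.time += self.speed * (now - self.last)
        self.last = now
        return self.time

clock = VariableSpeedClock()
clock.tick(10.0)              # first reading only sets the reference point
clock.tick(11.0)              # one real second at normal speed
clock.speed = 0.2             # bullet time
slowed = clock.tick(12.0)     # one more real second, 0.2 warped seconds
```

Because only the slope changes, the warped time never jumps when the speed does.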
      
    

    
      
        Using our other tricks, we can animate $ \class{green}{v_i} $ smoothly, easing in and out of slow motion, or speeding into fast-forward. If we didn't do this, then any animation cued off this clock would jerk at the transition point. This is the chain rule for derivatives in action: derivatives compound when you compose functions. Any jerks caused along the way will be visible in the end result.
      
    

    
      If time is smooth, what about interruptions? Suppose we have a cosine eased animation. After half a second, the user interrupts and triggers a new animation. If we abort the animation and start a new one, we create a huge jerk. The object stops instantly and then slowly starts moving again.
      
    

    
      One way to solve this is to layer on another animation: one that blends between the two easing curves in the middle. Here it's just another cosine ease, interpolating in the vertical direction, between two changing values. We blend across the entire animation for maximum smoothness. This has a downside though: if the blended animation itself is interrupted, we'd have to layer on another blend, one for each additional interruption. That's too much bookkeeping, particularly when using long animations.
    

    
      We can fix this by mimicking inertial scrolling. We treat everything that came before as a black box, and assume nothing happens afterwards. We only look at one thing: velocity at the time of interruption.
    

    
      After determining the velocity of any running animations, we can construct a ramp to match. We start from 0 to create a relative animation.
    

    
      We can bend this ramp back to zero with another cosine ease, interpolating vertically. This time however, the first easing curve is no longer involved.
    

    
      If we then add this to the second animation, it perfectly fills the gap at the corner. We only need to track two animations at a time: the currently active one, and a corrective bend. If we get interrupted again, we measure the combined velocity, and construct a new bend that lets us forget everything that came before.
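One way to sketch the corrective bend, assuming the bend lasts as long as the new animation and that both use the same cosine ease. The function `interrupted` and its parameters are hypothetical names.

```python
import math

def ease(f):
    """Cosine ease from 0 to 1, flat at both ends."""
    f = min(max(f, 0.0), 1.0)
    return 0.5 - 0.5 * math.cos(math.pi * f)

def interrupted(start, target, velocity, duration):
    """Build position(t) for a new animation that inherits `velocity` at t = 0."""
    def position(t):
        # The new animation on its own starts from rest.
        main = start + (target - start) * ease(t / duration)
        # Ramp matching the old velocity, bent back to zero by another ease.
        bend = velocity * t * (1.0 - ease(t / duration))
        return main + bend
    return position

# Interrupted at position 2 while moving at 3 units/s, new target 5, over 1 s.
pos = interrupted(2.0, 5.0, 3.0, 1.0)
```

If this animation is interrupted in turn, we would measure the slope of `pos` at that moment, for example with a backward difference, and build a fresh pair from it.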
      
    

    
      By using a different easing curve for the correction, we can make it tighter, creating a slight wave at the end. Either way, it doesn't matter how the object was moving before, it will always recover correctly.
    

    
      But what if we get interrupted all the time? We could be tracking a moving pointer, following a changing audio volume, or just have a fidgety user in the chair. We'd like to smooth out this data. The interrupted easing approach would be constantly missing its target, because there is never time for the value to settle. There is an easier way.
      
    

    
      
        $$ \class{blue}{p_{i+1}} = lerp(\class{blue}{p_{i}}, \class{purple}{o_{i}}, \class{royal}{f}) $$
      
      We use an exponential decay, just like with inertial scrolling. Only now we manipulate the position $ p_{i} $ directly: we move it a certain constant fraction towards the target $ \class{purple}{o_{i}} $, chasing it. Here, $ \class{royal}{f} = 0.1 = 10\% $. This is a one-line feedback system that will keep trying to reach its target, no matter how or when it changes. When the target is constant, the position follows an exponential arc up or down.
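As a sketch, the whole feedback system is one `lerp` per frame. The helper `chase` is a hypothetical wrapper that records the path for a list of target values.

```python
def lerp(a, b, f):
    """Move a fraction f of the way from a to b."""
    return a + (b - a) * f

def chase(targets, f=0.1, start=0.0):
    """Follow a sequence of target values, one lerp per frame."""
    p = start
    path = [p]
    for o in targets:
        p = lerp(p, o, f)
        path.append(p)
    return path

path = chase([1.0] * 50)      # constant target: an exponential arc towards 1
```

With a constant target, the remaining distance shrinks by the factor $ 1 - f $ every frame.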
      
    

    
      
        $$ \class{blue}{p_{i+1}} = lerp(\class{blue}{p_{i}}, \class{purple}{o_{i}}, \class{royal}{f}) $$
        $$ \class{cyan}{q_{i+1}} = lerp(\class{cyan}{q_{i}}, \class{blue}{p_{i}}, \class{royal}{f}) $$
      
       The entire path is continuous, but not smooth. That's fixable: we can apply exponential decay again. This creates two linked pairs, each chasing the next, from $ \class{cyan}{q_{i}} $ to $ \class{blue}{p_{i}} $ to $ \class{purple}{o_{i}} $. Each level appears to do something akin to integration: it smooths out discontinuities, one derivative at a time. Where a curve crosses its parent, it has a local maximum or minimum. These are signs that calculus is hiding somewhere.
      
    

    
      
        $$ \class{blue}{p_{i+1}} = lerp(\class{blue}{p_{i}}, \class{purple}{o_{i}}, \class{royal}{f}) $$
        $$ \class{cyan}{q_{i+1}} = lerp(\class{cyan}{q_{i}}, \class{blue}{p_{i}}, \class{royal}{f}) $$
        $$ \class{slate}{r_{i+1}} = lerp(\class{slate}{r_{i}}, \class{cyan}{q_{i}}, \class{royal}{f}) $$
      
      That's not so surprising when you know these are difference equations: they describe a relation between a quantity and how it's changing from one step to the next. These are the finite versions of differential equations from calculus. They can describe sophisticated behavior with remarkably few operations. Here I added a third feedback layer. The path gets smoother, but also lags more behind the target.
      
    

    
      If we increase $ f $ to 0.25, the curves respond more quickly. Exponential decays are directly tuneable, and great for whiplash-like motions. The more levels, the more inertia, and the longer it takes to turn. 
      
    

    
      
        $$ \class{blue}{p_{i+1}} = lerp(\class{blue}{p_{i}}, \class{purple}{o_{i}}, \class{blue}{f_1}) $$
        $$ \class{cyan}{q_{i+1}} = lerp(\class{cyan}{q_{i}}, \class{blue}{p_{i}}, \class{cyan}{f_2}) $$
        $$ \class{slate}{r_{i+1}} = lerp(\class{slate}{r_{i}}, \class{cyan}{q_{i}}, \class{slate}{f_3}) $$
      
      We can also pick a different $ f_i $ for each stage. Remarkably, the order of the $ \class{royal}{f_i} $ values doesn't matter: 0.1, 0.2, 0.3 has the exact same result as 0.3, 0.2, 0.1. That's because these filters are all linear, time-invariant systems, which have some very interesting properties.
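The chained equations can be sketched as a loop over levels. The function `cascade` is a hypothetical helper that starts every level at rest (zero), which is the assumption under which the ordering claim is easy to check numerically. The input signal is made up.

```python
def lerp(a, b, f):
    return a + (b - a) * f

def cascade(targets, factors):
    """Chain of exponential decays; each stage chases the previous stage's last value."""
    levels = [0.0] * len(factors)
    output = []
    for o in targets:
        # Inputs for this frame: the target, then each level's value before updating.
        previous = [o] + levels[:-1]
        levels = [lerp(x, src, f) for x, src, f in zip(levels, previous, factors)]
        output.append(levels[-1])
    return output

signal = [float((i * 7) % 5) for i in range(100)]   # arbitrary jumpy input
forward = cascade(signal, [0.1, 0.2, 0.3])
backward = cascade(signal, [0.3, 0.2, 0.1])
```

The two outputs agree up to floating point rounding. The intermediate levels do differ between the two orderings; only the final stage matches.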
    

    
      
        If you shift or scale up/down a particular input signal, you'll get the exact same output back, just shifted and scaled in the same way. Even if you shift by less than a frame. We've created filters which manipulate the frequencies of signals directly. These are 1/2/3-pole low-pass filters that only allow slow changes. That's why this picture looks exactly like sampling continuous curves: the continuous and discrete are connected.
    

    
      Exponential decays retain all their useful properties in 2D and 3D too. Unlike splines such as Bézier curves, they require no setup or garbage collection: just one variable per coordinate per level, no matter how long it runs. It works equally well for adding a tiny bit of mouse smoothing, or for creating grand, sweeping arcs. You can also use it to smooth existing curves, for example after randomly distorting them.
    

    
      However, there's one area where decay is constantly used where it really shouldn't be: download meters and load gauges. Suppose we start downloading a file. The speed is relatively constant, but noisy. After 1 second, it drops by 50%. This isn't all that uncommon. Many internet connections are traffic shaped, allowing short initial bursts to help with video streaming for example.
    

    
      
        $$ \class{blue}{p_{i+1}} = lerp(\class{blue}{p_{i}}, \class{purple}{o_{i}}, \class{royal}{f}) $$
      
      Often developers apply slow exponential easing to try and get a stable reading. As you need to smooth quite a lot to get rid of all the noise, you end up with a long decaying tail. This gives a completely wrong impression, making it seem like the speed is still dropping, when it's actually been steady for several seconds. The same shape appears in Unix load meters: it's a lie.
    

    
      
        $$ p'_{i+1} = lerp(p'_{i}, \class{purple}{o_{i}}, \class{royal}{f}) $$
        $$ \class{cyan}{q_{i+1}} = lerp(\class{cyan}{q_{i}}, p'_{i}, \class{royal}{f}) $$
      
      If we apply double exponential easing, we can increase $ f $ to get a shorter tail for the same amount of smoothing. But we can't get rid of it entirely: the more levels of easing we add, the more the curve starts to lag behind the data. We can do much better.
    

    
      We can analyze the filters by examining their response to a standard input. If we pass in a single step from 0 to 1, we get the step response for the two filters.
    

    
      Another good test pattern is a single one-frame pulse. This is the impulse response for both filters. The impulse responses go on forever, decaying to 0, but never reaching it. This shows these filters effectively compute a weighted average of every single value they've ever seen before: they have a memory, an infinite impulse response (IIR).
    

    
      Doesn't this look somewhat familiar? It turns out, the step response is the integral of the impulse response. It's a position. Vice versa, the impulse response is the derivative of the step response. It's a velocity. Surprise, physics!
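We can check this relationship numerically for the single exponential filter. This sketch assumes the filter starts at rest and counts the time step as one frame. The names are hypothetical.

```python
def exponential_filter(signal, f=0.1):
    """Single exponential decay stage, starting from rest at 0."""
    p, out = 0.0, []
    for o in signal:
        p = p + (o - p) * f
        out.append(p)
    return out

n = 40
impulse = [1.0] + [0.0] * (n - 1)     # single one-frame pulse
step = [1.0] * n                      # jump from 0 to 1

impulse_response = exponential_filter(impulse)
step_response = exponential_filter(step)

# Running sum of the impulse response (time step taken as 1 frame).
running_sum, integrated = 0.0, []
for value in impulse_response:
    running_sum += value
    integrated.append(running_sum)
```

The list `integrated` matches `step_response` entry for entry, up to rounding.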
    

    
      But it gets weirder. Integration sums together all values starting from a certain point, multiplied by the (constant) time step. That means that integration is itself a filter: its impulse response is a single step, the integral of an impulse. Its step response is a ramp, a constant increase.
    

    
      It works the other way too. Differentiation takes the difference of neighbouring values. It's a filter and its step response is just an impulse, detecting the single change in the step. Its impulse response is an upward impulse followed by a downward one: the derivative of an impulse. When one value is weighted positively and the other weighted negatively, the sum is their difference.
    

    
      
        $$ \sum p_i \cdot Δt \,\,\, ↑ $$
      
      
        $$ ↓ \,\,\, \frac{Δp}{Δt} $$
      
      This explains why exponential filters seem to have integration-like qualities: these are all integrators, they just apply different weights to the values they add up. Every step response is another filter's impulse response, and vice versa, connected through integration and differentiation. We can use this to design filters to spec.
    

    
      
        $$ \class{green}{v_{i→}} = \sin \frac{π}{4} t_i $$
      
      That said, filter design is still an art. IIR filters are feedback systems: once a value enters, it never leaves, bouncing around forever. Controlling it precisely is difficult under real world conditions, with finite arithmetic and noisy measurements to deal with. Much simpler is the finite impulse response (FIR), where each value only affects the output for a limited time. Here I use one lobe of a sine wave over 4 seconds.
    

    
      
        $$ \class{blue}{p_{i+1}} = \class{blue}{p_0} + \sum\limits_{k=0}^i\class{green}{v_{k→}} \cdot Δt $$
      
      Even if we don't know how to build the filter, we can still analyze it. We can integrate the impulse response to get the step response. But there's a problem: it overshoots, and not by a little. Ideally the filtered signal should settle at the original height. The problem is that the area under the green curve does not add up to 1.
    

    
      
        $$ \class{green}{v_{i→}} = \frac{π}{8} \sin \frac{π}{4} t_i \,\,\,\,\,\,\,\,\,\,\, \class{blue}{p_{i}} = \frac{1}{2} - \frac{1}{2} \cos \frac{π}{4} t_i $$
      
      To fix this, we divide the impulse response by the area it spans, $ \class{green}{\frac{8}{π}} $, or multiply by $ \class{green}{\frac{π}{8}} $, normalizing it to 1. Such filters are said to have unit DC gain, revealing their ancestry in analog electronics. The step response turns out to be a cosine curve, and this filter must therefore act like perpetually interruptible cosine easing.
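A small numerical sketch of the normalization, assuming 60 samples per second and sampling the lobe at the midpoint of each frame. Both choices are mine, made only so the sums come out clean.

```python
import math

dt = 1.0 / 60.0                       # assumed sample spacing
n = round(4.0 / dt)                   # the lobe spans 4 seconds

# One lobe of sin(pi/4 * t), sampled at the midpoint of each frame.
lobe = [math.sin(math.pi / 4.0 * (i + 0.5) * dt) for i in range(n)]
area = sum(lobe) * dt                 # close to 8/pi
kernel = [w / area for w in lobe]     # now sum(kernel) * dt is 1: unit DC gain

# Step response: running integral of the normalized impulse response.
total, step_response = 0.0, []
for w in kernel:
    total += w * dt
    step_response.append(total)
```

The running integral rises from 0, passes one half at the 2 second mark, and settles at exactly 1 rather than overshooting.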
    
    
    
      There's two ways of interpreting the step response. One is that we pushed a step through the filter. Another is that we pushed the filter through a step—an integrator. This symmetry is a property of convolution, which is the integration-like operator we've been secretly using all along.
    

    
      Convolution is easiest to understand in motion. When you convolve two curves $ \class{purple}{q_i} ⊗ \class{green}{r_i} $, you slide them past each other, after mirroring one of them. As our impulse response is symmetrical, we can ignore that last part for now.
    
    
    
      
        $$ \class{blue}{p_i} = \class{purple}{q_i} ⊗ \class{green}{r_i} = \class{cyan}{\sum\limits_{k=-∞}^{+∞}} \class{purple}{q_k} \cdot \class{green}{r_{i-k}} $$
      
      We multiply both curves with each other, creating a new curve in the overlap: here a growing section of the impulse response. The area under this curve is the output of the filter at that time. The sum goes to infinity in both directions, allowing for infinite tails. We already saw something similar when we used a geometric series to determine the final resting position of an inertial scroll gesture. With a FIR filter, the sum ends.
    

    
      
        $$ \class{blue}{p_i} = \class{purple}{q_i} ⊗ \class{green}{r_i}  = \class{cyan}{\sum\limits_{k=-∞}^{+∞}} \class{purple}{q_k} \cdot \class{green}{r_{i-k}} $$
      
      But why did we have to mirror one curve? It's simple: from the impulse response's point of view, new values approach from the positive X side, now left, not the negative X side, right. By flipping the impulse response, it faces the other signal, which is what we want.
    

    
      
        $$ \class{blue}{p_i} = \class{green}{r_i}  ⊗ \class{purple}{q_i} = \class{cyan}{\sum\limits_{k=-∞}^{+∞}} \class{green}{r_k} \cdot \class{purple}{q_{i-k}} $$
      
      If we center the view on the impulse response, it's clear we've swapped the role of the two curves. Now it's the step that's passing backwards through the filter, rather than the other way around.
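The sum translates directly into two nested loops. This is a plain sketch for finite lists, treating everything outside them as zero. The three-tap kernel is made up for the example.

```python
def convolve(q, r):
    """Discrete convolution: p[i] = sum over k of q[k] * r[i - k]."""
    p = [0.0] * (len(q) + len(r) - 1)
    for k, qk in enumerate(q):
        for j, rj in enumerate(r):
            p[k + j] += qk * rj       # j plays the role of i - k
    return p

step = [1.0] * 8
kernel = [0.25, 0.5, 0.25]            # made-up symmetric FIR weights, sum 1

through_filter = convolve(step, kernel)
through_step = convolve(kernel, step)
```

Both orders give the same list, which is the symmetry described above: it doesn't matter which curve we call the signal and which the filter.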
    

    
      If we replace the step response with a random burst of signal, the filter can work its magic, smoothing out the input through convolution. It's a weighted average with a sliding window. The filter still lags behind the data, but the tail is now finite.
    
     
    
      If we make the window narrower, its amplitude increases due to the normalization. We get a more variable curve, but also a shorter tail. This is like a blur filter in Photoshop, only in 1D instead of 2D. As Photoshop has the entire image at its disposal, rather than processing a real-time signal, it doesn't have to worry about lag: it can compensate directly by shifting the result back a constant distance when it's done.
    

    
      
        $$ \class{blue}{ease(f)} = \frac{1}{2} - \frac{1}{2} \cdot \cos πf 
          \,\,\,\,\,\,\,\,\,\,\,
           \class{green}{slope(f)} = \frac{1}{2}π \cdot \sin πf $$
      
      What about custom filter design? Well, if you're an engineer, that's a topic for advanced study, learning to control the level and phase at exact frequencies. If you're an animator, it's much simpler: you pick a desired easing curve, and use its velocity to make a normalized filter. You end up with the exact same step response, turning the easing curve into a perpetually interruptible animation.
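The animator's recipe can be sketched as follows. Taking differences of the sampled easing curve, rather than evaluating the slope formula, is my own shortcut: the weights then add up to exactly `ease(1) - ease(0)`, so no separate normalization is needed. The function names and the tap count are hypothetical.

```python
import math

def ease(f):
    return 0.5 - 0.5 * math.cos(math.pi * f)

def kernel_from_ease(ease, taps):
    """FIR weights whose running sum retraces the easing curve."""
    return [ease((i + 1) / taps) - ease(i / taps) for i in range(taps)]

def fir_filter(signal, kernel, initial=0.0):
    """Sliding weighted average; samples before the start count as `initial`."""
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for k, w in enumerate(kernel):
            acc += w * (signal[i - k] if i - k >= 0 else initial)
        out.append(acc)
    return out

kernel = kernel_from_ease(ease, 20)
response = fir_filter([1.0] * 30, kernel)   # push a step through the filter
```

The step response retraces the cosine ease over 20 frames and then stays at 1. Feeding it a target that changes at any time gives the interruptible version of the same ease.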
    

    
      
        $$ \class{blue}{ease(f)} = (\frac{1}{2} - \frac{1}{2} \cdot \cos πf) \cdot (1 + 20f \cdot (1 - f)^\frac{5}{2}) $$
      
      Which leads to the last trick in this chapter: removing lag on a real-time filtered signal. There's always an inherent delay in any filter, where signals are shifted by roughly half the window length. We can't get rid of it, only reduce it. We have to change the filter to prefer certain frequencies over others, making it resonate to the kind of signal we expect. We use an easing curve that overshoots, and preferably a short one. This is just one I made up.
    

    
      The velocity—here scaled down—now has a positive and negative part. As neither part is normalized by itself, the filter will first amplify any signal it encounters. The second part then compensates by pulling the level back down.
    
    
    
      
     The result is that the filter actually tries to predict the signal, which you can imagine is a useful thing to do. At certain points, the lag is close to 0, when the resonance frequency matches and slides into phase. When applied to animation, resonant filters can create jelly-like motions. When applied to electronic music at about 220 Hz, you get Acid House.
     
   

    
      
     Let's put it all together, just for fun. Here we have some particles being simulated with Verlet integration. Each particle experiences three forces. Radial containment pushes them to within a certain distance of the target. Friction slows them down, opposing the direction of motion. A constantly rotating force, different for each particle, keeps them from bunching up. The target follows the mouse, with double exponential easing.
     
   

    
      
        Friction links acceleration to velocity. Containment links acceleration to position. And integration links them back the other way. These circular dependencies are not a problem for a good physics engine. Note that the particles do not interact, they just happen to follow similar rules.

        Tip: Move the mouse and hold Shift to see variable frame rate physics in action.
     
   

    
      
        If we add up the three forces and trace out curves again, we can watch the particles—and their derivatives—speed through time. Just like you are doing right now, in your chair. As velocity and acceleration only update in steps, their curves will only be smooth if the physics clock and rendering clock are synced.
     
   

  
  






By manipulating time, we've managed to eliminate frame rate issues altogether, even make it work to our advantage. We've discovered more accurate physics engines, so we don't have to waste time simulating tiny steps. We've also created interruptible animations and turned them into filters. We can choose their easing curves and use feedback systems to remove the need for any manual interruptions altogether.

Here, linear time-invariant systems are very useful building blocks: they are simple to implement, but eminently customizable. Both IIR and FIR filters are simple in their basic form. We can also combine feedback systems with other physical or unphysical forces: we can move the target any way we like, perhaps superimposing variation onto existing curves. If we broaden our horizons a bit, we can find applications outside of animation: data analysis, audio manipulation, image processing, and much, much more.

Of course, there are plenty of non-linear and/or non-time-invariant systems too, too many to cover. When dealing with animation though, we'll prefer systems based on physics. They're just the trick to turn a bunch of artificial data into something that feels slick and natural. That said, physics itself is sometimes non-linear: fluids like water, smoke or fire are perfect examples. Solving those particular boondoggles requires the kind of calculus that frightens most adults and large children, so we won't go into that here. It's the same thing though: you simulate it finitely with a couple of clever tricks and the awesome power of raw number crunching.

Continued in part two.





To Infinity… And Beyond!
2013-01-28T00:00:00+01:00








Exploring the outer limits








  “It is known that there are an infinite number of worlds, simply because there is an infinite amount of space for them to be in. However, not every one of them is inhabited. Therefore, there must be a finite number of inhabited worlds.


Any finite number divided by infinity is as near to nothing as makes no odds, so the average population of all the planets in the universe can be said to be zero. From this it follows that the population of the whole universe is also zero, and that any people you may meet from time to time are merely the products of a deranged imagination.”
  – The Restaurant at the End of the Universe, Douglas Adams






If there's one thing mathematicians have a love-hate relationship with, it has to be infinity. It's the ultimate tease: it beckons us to come closer, but never allows us anywhere near it. No matter how far we travel to impress it, infinity remains disinterested, equally distant from everything: infinitely far!

$$ 0 < 1 < 2 < 3 < … < \infty $$

Yet infinity is not just desirable, it is absolutely necessary. All over mathematics, we find problems for which no finite amount of steps will help resolve them. Without infinity, we wouldn't have real numbers, for starters. That's a problem: our circles aren't round anymore (no $ π $ and $ \tau $) and our exponentials stop growing right (no $ e $). We can throw out all of our triangles too: most of their sides have exploded.




  
  A steel railroad bridge with a 1200 ton counter-weight.
Completed in 1910. Source: Library of Congress.



We like infinity because it helps avoid all that. In fact even when things are not infinite, we often prefer to pretend they are—we do geometry in infinitely big planes, because then we don't have to care about where the edges are.

Now, suppose we want to analyze a steel beam, because we're trying to figure out if our proposed bridge will stay up. If we want to model reality accurately, that means simulating each individual particle, every atom in the beam. Each has its own place and pushes and pulls on others nearby.

But even just $ 40 $ grams of pure iron contains $ 4.31 \cdot 10^{23} $ atoms. That's an inordinate amount of things to keep track of for just 1 teaspoon of iron.
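
That figure is easy to check. The sketch below assumes the standard molar mass of iron (about 55.845 grams per mole) and Avogadro's constant, neither of which is stated above; `iron_atoms` is just an illustrative name.

```python
AVOGADRO = 6.02214076e23   # atoms per mole (exact by SI definition)
MOLAR_MASS_IRON = 55.845   # grams per mole, standard atomic weight of iron

def iron_atoms(grams):
    """Number of atoms in the given mass of pure iron."""
    return grams / MOLAR_MASS_IRON * AVOGADRO

atoms = iron_atoms(40)
print(f"{atoms:.3g}")      # 4.31e+23
```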




  
  The crystal structure of 32 iron atoms in the hot austenite phase.
If your steel looks like this, your bridge is on fire.



  
  A chunk of solid, mathematical iron.



Instead, we pretend the steel is solid throughout. Rather than being composed of atoms with gaps in between, it's made of some unknown, filled in material with a certain density, expressed e.g. as grams per cubic centimetre. Given any shape, we can determine its volume, and hence its total mass, and go from there. That's much simpler than counting and keeping track of individual atoms, right?

Unfortunately, that's not quite true.

The Shortest Disappearing Trick Ever
Like all choices in mathematics, this one has consequences we cannot avoid. Our beam's density is mass per volume. Individual points in space have zero volume. That would mean that at any given point inside the beam, the amount of mass there is $ 0 $. How can a beam that is entirely composed of nothing be solid and have a non-zero mass?



Bam! No more iron anywhere.

While Douglas Adams was being deliberately obtuse, there's a kernel of truth there, which is a genuine paradox: what exactly is the mass of every atom in our situation?

To make our beam solid and continuous, we had to shrink every atom down to an infinitely small point. To compensate, we had to create infinitely many of them. Dividing the finite mass of the beam between an infinite amount of atoms should result in $ 0 $ mass per atom. Yet all these masses still have to add up to the total mass of the beam. This suggests $ 0 + 0 + 0 + … > 0 $, which seems impossible.

If the mass of every atom were not $ 0 $, and we have infinitely many points inside the beam, then the total mass is infinity times the atomic mass $ m $. Yet the total mass is finite. This suggests $ m + m + m + … < \infty $, which also doesn't seem right.

It seems whatever this number $ m $ is, it can't be $ 0 $ and can't be non-zero. It's definitely not infinite, we only had a finite mass to begin with. It's starting to sound like we'll have to invent a whole new set of numbers again to even find it.





That's effectively what Isaac Newton and Gottfried Leibniz set in motion at the end of the 17th century, when they both discovered calculus independently. It was without a doubt the most important discovery in mathematics and resulted in formal solutions to many problems that were previously unsolvable; our entire understanding of physics has relied on it since. Yet it took until the 19th century for the works of Augustin Cauchy and Karl Weierstrass to pop up, which formalized the required theory of convergence. This allows us to describe exactly how differences can shrink down to nothing as you approach infinity. Even that wasn't enough: it was only in the 1960s that the idea of infinitesimals as fully functioning numbers (the hyperreal numbers) was finally proven to be consistent enough by Abraham Robinson.

But it goes back much further. Ancient mathematicians were aware of problems of infinity, and used many ingenious ways to approach it. For example, $ π $ was found by considering circles to be infinite-sided polygons. Archimedes' work is likely the earliest use of indivisibles, using them to imagine tiny mechanical levers and find a shape's center of mass. He's better known for running naked through the streets shouting Eureka! though.

That it took so long shows that this is not an easy problem. The proofs involved are elaborate and meticulous, all the way back. They have to be, in order to nail down something as tricky as infinity. As a result, students generally learn calculus through the simplified methods of Newton and Leibniz, rather than the most mathematically correct interpretation. We're taught to mix notations from 4 different centuries together, and everyone's just supposed to connect the dots on their own. Except the trail of important questions along the way is now overgrown with jungle.




  
  A diagram from Isaac Newton's Philosophiæ Naturalis Principia Mathematica (1687) about finding area under a smooth curve.


  
Still, it shows that even if we don't understand the whole picture, we can get a lot done. This article is in no way a formal introduction to infinitesimals. Rather, it's a demonstration of why we might need them.
  
What is happening when we shrink atoms down to points? Why does it make shapes solid yet seemingly hollow? Is it ever meaningful to write $ x = \infty $? Is there only one infinity, or are there many different kinds?

To answer that, we first have to go back to even simpler times, to Ancient Greece, and start with the works of Zeno.

Achilles and the Tortoise

Zeno of Elea was one of the first mathematicians to pose these sorts of questions, effectively trolling mathematics for the next two millennia. He lived in the 5th century BC in southern Italy, though his work survives only in second-hand references. In his series of paradoxes, he examines the nature of equality, distance, continuity, of time itself.

Because these are ancient times, our mathematical knowledge is limited. We know about zero, but we're still struggling with the idea of nothing. We've run into negative numbers, but they're clearly absurd and imaginary, unlike the positive numbers we find in geometry. We also know about fractions and ratios, but square roots still confuse us, even though our temples stay up.





  
    
  

  

    
      So the story goes: the tortoise challenges Achilles to a footrace.
    

    
      
        "If you give me a head start," it says, "any start at all, you can never win."

        Achilles laughs and decides to be a good sport: he'll only run twice as fast as the tortoise.
      
    

    
      
        The tortoise explains: "If you want to pass me, first you have to move to where I am. By the time you get there, I'll have walked ahead a little bit."
      
    

    
      
        "While you cross the next distance, I will move yet again. No matter how many times you try to catch up, I'll always be some small distance ahead. Therefore, you cannot beat me."
      
    

    
      Achilles realizes that talking tortoises are not a sign of positive mental health, so he decides to find a wall to run into instead. It will either confirm the theory, or end the pain.
      
    

    
      
        See, the race is actually unnecessary, because the problem remains the same.
In order to reach the wall, Achilles first has to cross half the way there.
      
    
    
    
      Then he has to go half that distance again, and again. No matter how many times he repeats this, there will always be some distance left. So if Achilles can't cross this distance in a finite amount of steps, why is he wearing that stupid helmet?
      
    

    
      $$ … $$
      
        The distance travelled forms a never ending sequence of expanding sums.
We have to examine the entire sequence, rather than individual numbers in it.
      
    

    
      
        By definition, the distance travelled and distance to the wall always add up to $ 1 $. So one simple way to resolve this conundrum is to say: Well yes, it's going to take you infinitely long to glue all those pieces together, but only because you already spent an infinite amount of time chopping them up!

        But that's not a very mathematically satisfying answer. Let's try something else.
      
    

    
      
        The distance to the wall is always equal to the last step taken. We know that each step is half as long as the previous one, starting with $ \frac{1}{2} $. Therefore, the distance to the wall must decrease exponentially: $ \frac{1}{2}, \frac{1}{4}, \frac{1}{8}, \frac{1}{16}, … $, getting closer to zero with every step.
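
        A quick way to convince yourself is to run the first few steps with exact fractions. This is only a sketch: the variable names are made up, and ten steps is an arbitrary place to stop.

```python
from fractions import Fraction

travelled = Fraction(0)
step = Fraction(1, 2)
history = []
for n in range(1, 11):
    travelled += step
    remaining = 1 - travelled
    # The gap to the wall is exactly as long as the step just taken.
    assert remaining == step
    history.append((n, travelled, remaining))
    step /= 2

# After ten steps: travelled 1023/1024, with 1/1024 left to go.
```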
      
    

    
      
        But why can we say that this gap effectively closes to zero after 'infinity steps'? The number that we're building up is $ \frac{1}{2} + \frac{1}{4} + \frac{1}{8} + \frac{1}{16} + … \,$
      
    
    
    
      
        We know our sum will never exceed $ 1 $, as there is only $ 1 $ unit of distance being divided. This means $ \frac{1}{2} + \frac{1}{4} + \frac{1}{8} + \frac{1}{16} + … \leq 1 $, which eliminates every number past the surface of the wall—but not the surface itself.
      
    

    
      
        Suppose we presume $ \frac{1}{2} + \frac{1}{4} + \frac{1}{8} + \frac{1}{16} + … < 1 $ and hence that this number lies some tiny distance in front of the wall.
      
    

    
      
        Well in that case, all we need to do is zoom in far enough, and we'll see our sequence jump past it after a certain finite number of steps.
      
    

    
      
        If we try to move it closer to the wall, the same thing happens. This number simply cannot be less than $ 1 $. Therefore $ \frac{1}{2} + \frac{1}{4} + \frac{1}{8} + \frac{1}{16} + … \geq 1 $
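
        The zooming argument can be played out as a small game: pick any guess below $ 1 $, and count how many steps it takes for the running sum to overtake it. The helper name `steps_to_pass` is hypothetical, a sketch of the argument rather than a proof.

```python
from fractions import Fraction

def steps_to_pass(guess):
    """Count the halving steps needed before the running sum exceeds `guess`."""
    if guess >= 1:
        raise ValueError("the running sum never exceeds 1")
    total, step, n = Fraction(0), Fraction(1, 2), 0
    while total <= guess:
        total += step
        step /= 2
        n += 1
    return n

close = steps_to_pass(Fraction(999, 1000))          # 10 steps
closer = steps_to_pass(1 - Fraction(1, 10**6))      # 20 steps
```

        However close to the wall we place the guess, the answer stays finite. It just grows slowly, by one step for every halving of the remaining gap.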
      
    

    
      
        The only place $ \frac{1}{2} + \frac{1}{4} + \frac{1}{8} + \frac{1}{16} + … \, $ can be is exactly $ 0 $ units away from $ 1 $. If two numbers have zero distance between them, then they are equal.
      
    

    
      $$ … $$
      
        What we've actually done here is applied the principle of limits: we've defined a procedure of steps that lets us narrow down the interval where the infinite sum might be. The lower bound is the sequence of sums itself: it only increases towards $ 1 $, never decreases. For the upper bound, we established no sum could exceed $ 1 $. Therefore the interval must shrink to nothing, and the sequence converges.
      
    

    
      
        $$ \lim_{n \to +\infty} x_n = \mathop{\class{no-outline}{►\hspace{-2pt}►}}_{\infty\hspace{2pt}} x_n $$
      
      
        The purpose of a limit is then to act as a supercharged fast-forward button. It lets us avoid the infinite amount of work required to complete sums like $ \frac{1}{2} + \frac{1}{4} + \frac{1}{8} + \frac{1}{16} + … $ and simply skip to the end. To do so, we have to step back, spot the pattern, and pin down where it ends. So limits allow us to literally reach the unreachable. But in fact, you already knew that.
      
    

    
      $$ \frac{2}{3} = 0.66666… $$
      $$ 0.6 + 0.06 + 0.006 + …\hspace{2pt} $$
      
        As soon as you learned to divide, you found $ 2 \div 3 = 0.666… = 0.6 + 0.06 + 0.006 + …\hspace{2pt} $ 
Even in primary school the opportunity to examine infinity is there. Rather than tackle it head on, it's simply noted and filed. Eight years later it's regurgitated in the form of cryptic epsilon-delta definitions.
    

    
      
        $$ 1 - 1 + 1 - 1 + 1 … $$

      
      
        But then there's those pesky consequences again. By allowing the idea of infinity, we can invent an entire zoo of paradoxical things. For example, imagine a lamp that's switched on ($1$) and off ($0$) at intervals that decrease by a factor of two: on for $ \frac{1}{2} $ second, off for $ \frac{1}{4} s $, on for $ \frac{1}{8} s $, off for $ \frac{1}{16} s $, …
After $ 1\,s $, when the switch has been flipped an infinite amount of times, is the lamp on or off?
      
    

    
      
        


        
        

        $$ (1 - 1) + (1 - 1) + (1 - 1) + … = 0 \,? $$
        
      
      
        


        
        

        


        $$ 1 + (-1 + 1) + (-1 + 1) + … = 1 \,? $$
        
      
      
        Another way to put this is that the lamp's state at $ 1\,s $ is the result of the infinite sum $ 1 - 1 + 1 - 1 + … $ 
Intuitively we might say each pair of $ +1 $ and $ -1 $ should cancel out and make the entire sum equal to $ 0 $. 
But we can pair them the other way, leading to $ 1 $ instead. It can't be both.
      
    

    
      
        If we zoom in, it's obvious that no matter how close we get to $ 1\,s $, the lamp's state keeps switching. Therefore it's meaningless to attempt to 'fast forward' to the end, and the limit does not exist. At $ 1\,s $ the lamp is neither on nor off: it's undefined. This infinite sum does not converge.
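
        We can watch this happen by listing the running totals of $ 1 - 1 + 1 - 1 + … $ for as long as we care to. The sketch below stops after twelve terms, which is an arbitrary choice.

```python
from itertools import accumulate, cycle, islice

terms = islice(cycle([1, -1]), 12)      # 1, -1, 1, -1, ...
partial_sums = list(accumulate(terms))  # running totals after each flip

print(partial_sums)  # [1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0]
```

        Unlike the halving sums, there is no shrinking interval to trap these totals in: they keep bouncing between the same two values.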
      
    

    
      
        But actually, we overcomplicated things. Thanks to the power of limits, we can ask a simpler, equivalent question. Given a lamp that switches on and off every second, what is its state at infinity? The answer's the same: it never settles.
      
    

  
  





  
  Limits are the first tool in our belt for tackling infinity. Given a sequence described by countable steps, we can attempt to extend it not just to the end of the world, but literally forever. If this works we end up with a finite value. If not, the limit is undefined. A limit can be equal to $ \infty $, but that's just shorthand for saying the sequence eventually grows past every finite bound and stays there. Negative infinity means the same thing in the downward direction.

Breaking Away From Rationality
  
  Until now we've only encountered fractions, that is, rational numbers. Each of our sums was made of fractions. The limit, if it existed, was also a rational number. We don't know whether this was just a coincidence.
  
  It might seem implausible that a sequence of numbers that is 100% rational and converges, can approach a limit that isn't rational at all. Yet we've already seen similar discrepancies. In our first sequence, every partial sum was less than $ 1 $. Meanwhile the limit of the sum was equal to $ 1 $. Clearly, the limit does not have to share all the properties of its originating sequence.
  
  We also haven't solved our original problem: we've only chopped things up into infinitely many finite pieces. How do we get to infinitely small pieces? To answer that, we need to go looking for continuity.
  
  Generally, continuity is defined by what it is and what its properties are: a noticeable lack of holes, and no paradoxical values. But that's putting the cart before the horse. First, we have to show which holes we're trying to plug.
  




  
    
  

  

    
      Let's imagine the rational numbers.
    

    
      Actually, hold on. Is this really a line? The integers certainly weren't connected.
    

    
      Rather than assume anything, we're going to attempt to visualize all the rational numbers. We'll start with the numbers between $ 0 $ and $ 1 $.
    

    
      $$ \class{blue}{\frac{0 + 1}{2}} $$
      Between any two numbers, we can find a new number in between: their average. This leads to $ \frac{1}{2} $.
    

    
      $$ \frac{a + b}{2} $$
      By repeatedly taking averages, we keep finding new numbers, filling up the interval.
    

    
      If we separate out every step, we get a binary tree.
    

    
      You can think of this as a map of all the fractions of $ 2^n $. Given any such fraction, say $ \frac{13}{32} = \frac{13}{2^5} $, there is a unique path of lefts and rights that leads directly to it. At least, as long as it lies between $ 0 $ and $ 1 $.
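
      Finding that path is a matter of repeated bisection. In this sketch the tree starts at $ \frac{1}{2} $, each turn moves to the average of the current node and one of its neighbours, and we stop as soon as we land on the target; `dyadic_path` and the `"L"`/`"R"` labels are made-up names.

```python
from fractions import Fraction

def dyadic_path(target):
    """Turns leading from 1/2 to a fraction k / 2**n strictly between 0 and 1."""
    target = Fraction(target)
    d = target.denominator
    if not (0 < target < 1) or d & (d - 1):
        raise ValueError("expected k / 2**n strictly between 0 and 1")
    lo, hi = Fraction(0), Fraction(1)
    path = []
    mid = (lo + hi) / 2
    while mid != target:
        if target < mid:
            path.append("L")
            hi = mid
        else:
            path.append("R")
            lo = mid
        mid = (lo + hi) / 2
    return path

path = dyadic_path(Fraction(13, 32))   # ['L', 'R', 'R', 'L']
```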
    
  
    
      
      Note that the graph resembles a fractal and that the distance to the top edge is divided in half with every step. But we only ever explore a finite amount of steps. Therefore, we are not taking a limit and we'll never actually touch the edge.
      
    

    
      $$ \frac{2 \cdot a + b}{3} $$
      $$ \frac{a + 2 \cdot b}{3} $$
      But we can take thirds as well, leading to fractions with $ 3^n $ in their denominator.
    

    
      As some numbers can be reached in multiple ways, we can eliminate some lines, and end up with this graph, where every number sprouts into a three-way, ternary tree. Again, we have a map that gives us a unique path to any fraction of $ 3^n $ in this range, like $ \frac{11}{27} = \frac{11}{3^3} $.
    

    
      
        $$ \frac{21}{60} = \frac{21}{2^2 \cdot 3 \cdot 5} $$
      
      
      Because we can do this for any denominator, we can define a way to get to any rational number in a finite amount of steps. Take for example $ \frac{21}{60} $. We decompose its denominator into prime numbers and begin with $ 0 $ and $ 1 $ again.
    

    
      
        
        $$ \frac{21}{60} = \frac{21}{2^2 \cdot 3 \cdot 5} $$
        
      

      There is a division of $ 2^2 $, so we do two binary splits. This time, I'm repeating the previously found numbers so you can see the regular divisions more clearly. We get quarters.
    

    
      The next factor is $ 3 $ so we divide into thirds once. We now have twelfths.
    

    
      For the last division we chop into fifths and get sixtieths.
    
    
    
      $ \frac{21}{60} $ is now the 21st number from the left.
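
      The whole procedure fits in a few lines. This is a sketch with made-up helper names (`prime_factors`, `subdivide`): it splits every gap between neighbouring tick marks once per prime factor, in increasing order, and then looks up where $ \frac{21}{60} $ ended up.

```python
from fractions import Fraction

def prime_factors(n):
    """Prime factors of n in increasing order, with repeats."""
    factors, p = [], 2
    while n > 1:
        while n % p == 0:
            factors.append(p)
            n //= p
        p += 1
    return factors

def subdivide(denominator):
    """Tick marks from 0 to 1 after one round of splits per prime factor."""
    ticks = [Fraction(0), Fraction(1)]
    for p in prime_factors(denominator):
        refined = []
        for a, b in zip(ticks, ticks[1:]):
            # Cut the gap between neighbours a and b into p equal parts.
            refined.extend(a + (b - a) * Fraction(k, p) for k in range(p))
        refined.append(ticks[-1])
        ticks = refined
    return ticks

ticks = subdivide(60)                  # factors 2, 2, 3, 5
index = ticks.index(Fraction(21, 60))  # 21, counting 0 as tick number zero
```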
    

    
      But this means we've found a clear way to visualize all the rational numbers between $ 0 $ and $ 1 $: it's all the numbers we can reach by applying a finite number of binary (2), ternary (3), quinary (5) etc. divisions, for any denominator. So there's always a finite gap between any two rational numbers, even though there are infinitely many of them.
    

    
      The rational numbers are not continuous. Therefore, it is more accurate to picture them as a set of tick marks than a connected number line.
    

    
      To find continuity then, we need to revisit one of our earlier trees. We'll pick the binary one.
While every fork goes two ways, we actually have a third choice at every step: we can choose to stop. That's how we get a finite path to a whole fraction of $ 2^n $.
    

    
      But what if we never stop? We have to apply a limit: we try to spot a pattern and try to fast-forward it. Note that by halving each step vertically on the graph, we've actually linearized each approach into a straight line which ends. Now we can take limits visually just by intersecting lines with the top edge.
    

    
      Right away we can spot two convergent limits: by always choosing either the left or the right branch, we end up at respectively $ 0 $ and $ 1 $.
    

    
      These two sequences both converge to $ \frac{1}{2} $. It seems that 'at infinity steps', the graph meets up with itself in the middle.
    

    
      But the graph is now a true fractal. So the same convergence can be found here. In fact, the graph meets up with itself anywhere there is a multiple of $ \frac{1}{2^n} $.
    
    
    
      That's pretty neat: now we can eliminate the option of stopping altogether. Instead of ending at $  \frac{5}{16} $, we can simply take one additional step in either direction, followed by infinitely many opposite steps. Now we're only considering paths that are infinitely long.
    

    
      But if this graph only leads to fractions of $ 2^n $, then there must be gaps between them. Yet in the limit, the distance between any two adjacent numbers in the graph shrinks down to exactly $ 0 $, which suggests there are no gaps. This infinite version of the binary tree must lead to a lot more numbers than we might think.

         Suppose we take a path of alternating left and right steps, and extend it forever. Where do we end up?
    

    
      We can apply the same principle of an upper and lower bound, but now we're approaching from both sides at once. Thanks to our linearization trick, the entire sequence fits snugly inside a triangle.
    

    
      If we zoom into the convergence at infinity, we actually end up at $ \class{orangered}{\frac{2}{3}} $.

        Somehow we've managed to coax a fraction of $ 3 $ out of a perfectly regular binary tree.
    

    
      If we alternate two lefts with one right, we can end up at $ \class{orangered}{\frac{4}{7}} $. This is remarkable: when we tried to visualize all the rational numbers by combining all kinds of divisions, we were overthinking it. We only needed to take binary divisions and repeat them infinitely with a limit.
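
      We can't take infinitely many steps in code, but we can take enough of them to see where a repeating pattern is heading. In this sketch each turn keeps the left or right half of the current range; the name `follow` and the choice to start each pattern with a right turn are assumptions made for the example.

```python
from fractions import Fraction

def follow(pattern, repeats):
    """Follow a repeating pattern of 'L'/'R' turns and return the final range."""
    lo, hi = Fraction(0), Fraction(1)
    for turn in pattern * repeats:
        mid = (lo + hi) / 2
        if turn == "L":
            hi = mid    # keep the left half
        else:
            lo = mid    # keep the right half
    return lo, hi

lo, hi = follow("RL", 20)       # alternating rights and lefts, 40 turns
lo7, hi7 = follow("RLL", 10)    # one right, two lefts, 30 turns
```

      After 40 turns the first range is only $ \frac{1}{2^{40}} $ wide and still contains $ \frac{2}{3} $. The second one closes in on $ \frac{4}{7} $ in the same way.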
    

    
      Every single rational number can then be found by taking a finite amount of steps to get to a certain point, and then settling into a repeating pattern of lefts and/or rights all the way to infinity.
    

    
      If we can find numbers between $ 0 $ and $ 1 $ this way, we can apply the exact same principle to the range $ 1 $ to $ 2 $. So we can connect two of these graphs into a single graph with its tip at $ 1 $.
    

    
      But we can repeat it as much as we like. The full graph is not just infinitely divided, but infinitely big, in that no finite box can contain it. That means it leads to every single positive rational number. We can start anywhere we like. Is your mind blown yet?
    

    
      No? Ok. But if this works for positives, we can build a similar graph for the negatives just by mirroring it. So we now have a map of the entire rational number set. All we need to do is take infinite paths that settle into a repeating pattern from either a positive or a negative starting point. When we do, we find every such path leads to a rational number.

        So any rational number can be found by taking an infinite stroll on one of two infinite binary trees.
    

    
      Wait, did I say two infinite trees? Sorry, I meant one infinitely big tree.
See, if we repeatedly scale up a fractal binary tree and apply a limit to that, we end up with almost exactly the same thing. Only this time, the two downward diagonals always eventually fold back towards $ 0 $. This creates a path of infinity + 1 steps downward. While that might not be very practical, it suggests you can ride out to the restaurant at the end of the universe, have dinner, and take a single step to get back home.
    

    
      Is it math, or visual poetry? It's time to bring this fellatio of the mind to its inevitable climax.
    
    
    
      $ \class{blue}{0} $
      $ \class{green}{1} $
      
      You may wonder, if this map is so amazing, how did we ever do without?

        Let's label our branches. If we go left, we call it $ 0 $. If we go right, we call it $ 1 $.
    

    
      
        $$
        \frac{5}{3} = \class{green}{11}\class{blue}{0}\hspace{2pt}\class{green}{1}\class{blue}{0}\hspace{2pt}\class{green}{1}\class{blue}{0}…
        $$
      

      We can then identify any number by writing out the infinite path that leads there as a sequence of ones and zeroes—bits.

But you already knew that.
    

    
      
        $$
        \frac{5}{3} = \class{green}{1}.\class{green}{1}\class{blue}{0}\hspace{2pt}\class{green}{1}\class{blue}{0}\hspace{2pt}\class{green}{1}\class{blue}{0}…_2
        $$
      

      See, we've just rediscovered the binary number system. We're so used to numbers in decimal, base 10, we didn't notice. Yet we all learned that rational numbers consist of digits that settle into a repeating sequence, a repeating pattern of turns. Disallowing finite paths works the same, even in decimal: the number $ 0.95 $ can be written as $\, 0.94999…\, $, i.e. take one final step in one direction, followed by infinitely many steps the other way.
    

    
      
        $$
        \frac{4}{5} = \class{blue}{0}.\class{green}{11}\class{blue}{00}\hspace{2pt}\class{green}{11}\class{blue}{00}…_2 
        $$
      

      
        When we write down a number digit by digit, we're really following the path to it in a graph like this, dialing the number's … er … number. The rationals aren't shaped like a binary tree, rather, they look like a binary tree when viewed through the lens of binary division. Every infinite binary, ternary, quinary, etc. tree is then a different but complete perspective of the same underlying thing. We don't have the map, we have one of infinitely many maps.
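
        To see the different maps side by side, we can do long division in any base and watch for the point where the remainder repeats. This is a sketch for non-negative rationals only; `expand` and its three-part return value are made up for the example.

```python
from fractions import Fraction

def expand(x, base=2):
    """Return (integer part, non-repeating digits, repeating digits) of x."""
    whole, frac = divmod(Fraction(x), 1)
    seen, digits = {}, []
    while frac not in seen:
        seen[frac] = len(digits)        # remember where this remainder showed up
        digit, frac = divmod(frac * base, 1)
        digits.append(int(digit))
    start = seen[frac]                  # the pattern repeats from here on
    return int(whole), digits[:start], digits[start:]

five_thirds = expand(Fraction(5, 3))            # (1, [], [1, 0])
four_fifths = expand(Fraction(4, 5))            # (0, [], [1, 1, 0, 0])
in_ternary = expand(Fraction(5, 3), base=3)     # (1, [2], [0])
in_decimal = expand(Fraction(19, 20), base=10)  # (0, [9, 5], [0])
```

        A path that stops shows up here as a tail of repeating zeroes, which is why $ \frac{5}{3} $ looks finite through the ternary lens and endless through the binary one.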
    

    
      
        $$
        π = \class{green}{11}.\class{blue}{00}\class{green}{1}\class{blue}{00}\class{green}{1}\class{blue}{0000}\class{green}{1}…_2
        $$
      
      
      
       Which means we can show this graph is actually an interdimensional number portal.

       See, we already know where the missing numbers are. Irrational numbers like $ π $ form a never-repeating sequence of digits. If we want to reach $ π $, we find it's at the end of an infinite path whose turns do not repeat. By allowing such paths, our map leads us straight to them. Even though it's made out of only one kind of rational number: division by two.
    

    
      
        $$
          π = \mathop{\class{no-outline}{►\hspace{-2pt}►}}_{\infty\hspace{2pt}} x_n \,?
        $$
        
      

      
        So now we've invented real numbers. How do we visualize this invention? And where does continuity come in? What we need is a procedure that generates such a non-repeating path when taken to the limit. Then we can figure out where the behavior at infinity comes from.
      
    

    
      Because the path never settles into a pattern, we can't pin it down with a single neat triangle like before. We try something else. At every step, we can see that the smallest number we can still reach is found by always going left. Similarly, the largest available number is found by always going right. Wherever we go from here, it will be somewhere in this range.
    

    
      We can set up shrinking intervals by placing such triangles along the path, forming a nested sequence.
    

    
      
        $$
          \begin{align}
            3 \leq & π \leq 4 \\
            3.1 \leq & π \leq 3.2 \\
            3.14 \leq & π \leq 3.15 \\ 
            3.141 \leq & π \leq 3.142 \\ 
            3.1415 \leq & π \leq 3.1416 \\
            3.14159 \leq & π \leq 3.14160 \\
          \end{align}
        $$
      

      
        $$
          \begin{align}
            11_2 \leq & π \leq 100_2 \\ 
            11.0_2 \leq & π \leq 11.1_2 \\ 
            11.00_2 \leq & π \leq 11.01_2 \\
            11.001_2 \leq & π \leq 11.010_2 \\
            11.0010_2 \leq & π \leq 11.0011_2 \\
            11.00100_2 \leq & π \leq 11.00101_2 \\
          \end{align}
        $$
      

      
        What we've actually done is rounded up and down at every step, to find an upper and lower bound with a certain amount of digits. This works in any number base.
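
        Here is that rounding as a sketch. It leans on the double-precision `math.pi`, which is only good for about 15 decimal digits, so it is fine for the first few levels and nothing more; `bounds` is a made-up helper name.

```python
import math
from fractions import Fraction

def bounds(x, base, digits):
    """Round x down and up to `digits` places after the point in `base`."""
    scale = base ** digits
    below = math.floor(x * scale)
    return Fraction(below, scale), Fraction(below + 1, scale)

decimal = [bounds(math.pi, 10, n) for n in range(6)]
binary = [bounds(math.pi, 2, n) for n in range(6)]

# decimal[5] is (314159/100000, 314160/100000), i.e. 3.14159 and 3.14160
# binary[5] is (100/32, 101/32), i.e. 11.00100 and 11.00101 in base 2
```

        Each pair sits inside the previous one, and its width is exactly one unit in the last digit: $ \frac{1}{10^n} $ in decimal, $ \frac{1}{2^n} $ in binary.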
      
    

    
      Let's examine these intervals by themselves. We can see that due to the binary nature, each interval covers either the left or right side of its ancestor. Because our graph goes on forever, there are infinitely many nested intervals. This tower of $ π $ never ends and never repeats itself; we just squeezed it into a finite space so we could see it better.
    

    
      If we instead approach a rational number like $ \frac{10}{3} = 3.333…\, $ then the tower starts repeating itself at some point. Note that the intervals don't slide smoothly. Each can only be in one of two places relative to its ancestor.
    

    
      In order to reach a different rational number, like $ 3.999… = 4 $, we have to establish a different repeating pattern. So we have to rearrange infinitely many levels of the tower all at once, from one configuration to another. This reinforces the notion that rational numbers are not continuous.
    

    
      If the tower converges to a number, then the top must be infinitely thin, i.e. $ 0 $ units wide. That would suggest it's meaningless to say what the interval at infinity looks like, because it stops existing. Let's try it anyway.
    

    
      
        There is only one question to answer: does the interval cover the left side, or the right?
      
    

    
      
        Oddly enough, in this specific case of $ 3.999…\, $ there is an answer. The tower leans to the right. Therefore, the state of the interval is the same all the way up. If we take the limit, it converges and the final interval goes right.
      
    

    
      
        But we can immediately see that we can build a second tower that leans left, which converges on the same number. We could distinguish between the two by writing it as $ 4.000…\, $ In this case the final interval goes left.
      
    

    
      
        If we approach $ 10/3 $, we take a path of alternating left and right steps. The state of the interval at infinity becomes like our paradoxical lamp from before: it has to be both left and right, and therefore it is neither. It's simply undefined.
      
    

    
      
        The same applies to irrational numbers like $ π $. Because the sequence of turns never repeats itself, the interval flips arbitrarily between left and right forever, and therefore it is in an undefined state at the end.
      
    

    
      
        But there's another way to look at this.

        If the interval converges to the number $ π $, then the two sequences of respectively lower and upper bounds also converge to $ π $ individually.
      
    

    
      
        Remember how we derived our bounds: we rounded down by always taking lefts and rounded up by always taking rights. The shape of the tower depends on the specific path you're taking, not just the number you reach at the end.
      
    

    
      
        That means we're approaching the lower bounds so they all end in $ 0000… \, $ Their towers always lean left.
      
    
    
    
      If we then take the limit of their final intervals as we approach $ π $, that goes left too. Note that this is a double limit: first we find the limit of the intervals of each tower individually, then we take the limit over all the towers as we approach $ π $.
      
    

    
      
        For the same reason, we can think of all the upper bounds as ending in $ 1111 …\, $ Their towers always lean right. When we take the limit of their final intervals and approach $ π $, we find it points right.
      
    

    
      
        But we could just as well reverse the rounding for the upper and lower bounds, and end up with the exact opposite situation. Therefore, it doesn't mean that we've invented a red $ π $ to the left and green $ π $ to the right which are somehow different. $ π $ is $ π $. This only says something about our procedure of building towers. It matters because the towers are how we're trying to reach a real number in the first place.
      
    

    
      
        See, our tower still represents a binary number of infinitely many bits. Every interval can still only be in one of two places. To run along the real number line, we'd have to rearrange infinitely many levels of the tower all at once to create motion. That still does not seem continuous.
      
    

    
      
        We can resolve this if we picture the final interval of each tower as a bit at infinity. If we flip the bit at infinity, we swap between two equivalent ways of reaching a number, so this has no effect on the resulting number.
      
    

    
      
         In doing so, we're actually imagining that every real number is a rational number whose non-repeating head has grown infinitely big. Its repeating tail has been pushed out all the way past infinity. That means we can flip the repeating part of our tower between different configurations without creating any changes in the number it leads to.
      
    
    
    
      
        That helps a little bit with the intuition: if the tower keeps working all the way up there, it must be continuous at its actual tip, wherever that really is. A continuum is then what happens when the smallest possible step you can take isn't just as small as you want. It's so small that it no longer makes any noticeable difference. While that's not a very mathematical definition, I find it very helpful in trying to imagine how this might work.
      
    

    
      
        $ 1, 2, 3, 4, 5, 6, … $
      
      
      
        Finally, we might wonder how many of each type of number there are.
The natural numbers are countably infinite: there is a procedure of steps which, in the limit, counts all of them. Just start at the beginning, and fast-forward.
      
    

    
      
        $$ 1, 2, 3, 4, 5, 6, … $$
        




      
      
        


        $$ \class{orangered}{2, 4, 6, 8, 10, 12, …} $$
        


      
      
        




        $$ \class{green}{0, 1, -1, 2, -2, 3, …} $$
      
    
      
        We can find a similar sequence for the even natural numbers by multiplying each number by two. We can also alternate between a positive and negative sequence to count the integers. We can match up the elements one-to-one, which means all three sequences are equally long. They're all countably infinite.
There are as many even positives as positives. Which is exactly as many as all the integers combined. As counter-intuitive as it is, it is the only consistent answer.
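
The one-to-one matching can be made concrete with three endless generators that are read side by side. This is a minimal sketch; the function names are made up for the example.

```python
from itertools import count, islice

def naturals():
    return count(1)                      # 1, 2, 3, ...

def evens():
    return (2 * n for n in naturals())   # 2, 4, 6, ...

def integers():
    yield 0
    for n in naturals():
        yield n                          # 1, 2, 3, ...
        yield -n                         # -1, -2, -3, ...

# Pair up the first few elements of all three sequences, position by position.
for n, e, z in islice(zip(naturals(), evens(), integers()), 6):
    print(n, e, z)
```

Every position in one sequence has exactly one partner in the other two, no matter how far we read.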
      
    
    
    
      
        $$
\begin{array}{cccccccc}
 1 \hspace{2pt}&\hspace{2pt} 2 \hspace{2pt}&\hspace{2pt} 3 \hspace{2pt}&\hspace{2pt} 4 \hspace{2pt}&\hspace{2pt} 5 \hspace{2pt}&\hspace{2pt} 6 \hspace{2pt}&\hspace{2pt} … \\[6pt]
 \frac{1}{2}
  \hspace{2pt}&\hspace{2pt} \class{grey}{\frac{2}{2}}
  \hspace{2pt}&\hspace{2pt} \frac{3}{2}
  \hspace{2pt}&\hspace{2pt} \class{grey}{\frac{4}{2}}
  \hspace{2pt}&\hspace{2pt} \frac{5}{2}
  \hspace{2pt}&\hspace{2pt} \class{grey}{\frac{6}{2}}
  \hspace{2pt}&\hspace{2pt}  \\[3pt]
 \frac{1}{3} 
  \hspace{2pt}&\hspace{2pt} \frac{2}{3} 
  \hspace{2pt}&\hspace{2pt} \class{grey}{\frac{3}{3}}
  \hspace{2pt}&\hspace{2pt} \frac{4}{3} 
  \hspace{2pt}&\hspace{2pt} \frac{5}{3} 
  \hspace{2pt}&\hspace{2pt} \class{grey}{\frac{6}{3}}
  \hspace{2pt}&\hspace{2pt}   \cdots \\[3pt]
 \frac{1}{4} 
  \hspace{2pt}&\hspace{2pt} \class{grey}{\frac{2}{4}}
  \hspace{2pt}&\hspace{2pt} \frac{3}{4} 
  \hspace{2pt}&\hspace{2pt} \class{grey}{\frac{4}{4}}
  \hspace{2pt}&\hspace{2pt} \frac{5}{4} 
  \hspace{2pt}&\hspace{2pt} \class{grey}{\frac{6}{4}}
  \hspace{2pt}&\hspace{2pt}  \\[3pt]
 \frac{1}{5} 
  \hspace{2pt}&\hspace{2pt} \frac{2}{5} 
  \hspace{2pt}&\hspace{2pt} \frac{3}{5} 
  \hspace{2pt}&\hspace{2pt} \frac{4}{5} 
  \hspace{2pt}&\hspace{2pt} \class{grey}{\frac{5}{5}}
  \hspace{2pt}&\hspace{2pt} \frac{6}{5} 
  \hspace{2pt}&\hspace{2pt}  \\[3pt]
 \frac{1}{6} 
  \hspace{2pt}&\hspace{2pt} \class{grey}{\frac{2}{6}}
  \hspace{2pt}&\hspace{2pt} \class{grey}{\frac{3}{6}}
  \hspace{2pt}&\hspace{2pt} \class{grey}{\frac{4}{6}}
  \hspace{2pt}&\hspace{2pt} \frac{5}{6} 
  \hspace{2pt}&\hspace{2pt} \class{grey}{\frac{6}{6}}
  \hspace{2pt}&\hspace{2pt}  \\[3pt]
  \hspace{2pt}&\hspace{2pt} \vdots  \hspace{2pt}&\hspace{2pt}   \hspace{2pt}&\hspace{2pt}  \vdots \hspace{2pt}&\hspace{2pt}   \hspace{2pt}&\hspace{2pt}   \hspace{2pt}&\hspace{2pt}   \hspace{2pt}&\hspace{2pt} \class{white}{\ddots}
 \end{array}
        $$
        
      
      
      
         But we can take it one step further: we can find such a sequence for the rational numbers too, by laying out all the fractions on a grid. We can follow diagonals up and down and pass through every single one. If we eliminate duplicates like $ 1 = 2/2 = 3/3 $ and alternate positives and negatives, we can 'count them all'. So there are as many fractions as there are natural numbers. "Deal with it", says Infinity, donning its sunglasses.
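
As a rough sketch of that counting procedure, the generator below walks the grid one diagonal at a time, skips fractions that are not in lowest terms, and emits each survivor with both signs. It always walks a diagonal in the same direction rather than zigzagging, which visits the same fractions in a slightly different order. Starting the list at $ 0 $ is an assumption made to cover every rational.

```python
from fractions import Fraction
from itertools import islice
from math import gcd

def rationals():
    """Yield every rational exactly once: 0, then +p/q and -p/q along grid diagonals."""
    yield Fraction(0)
    total = 2                                    # numerator + denominator on this diagonal
    while True:
        for numerator in range(1, total):
            denominator = total - numerator
            if gcd(numerator, denominator) == 1:  # skip duplicates like 2/2 or 2/4
                yield Fraction(numerator, denominator)
                yield Fraction(-numerator, denominator)
        total += 1

print([str(q) for q in islice(rationals(), 9)])
```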
      
    
    
    
      
        $$
        \begin{array}{c}
               0.\hspace{1pt}\class{green}{1}\hspace{1pt}0\hspace{1pt}0\hspace{1pt}1\hspace{1pt}1\hspace{1pt}1\hspace{1pt}0\hspace{1pt}…_2 \\
               0.\hspace{1pt}1\hspace{1pt}\class{blue}{0}\hspace{1pt}0\hspace{1pt}1\hspace{1pt}0\hspace{1pt}0\hspace{1pt}1\hspace{1pt}…_2 \\
               0.\hspace{1pt}1\hspace{1pt}0\hspace{1pt}\class{green}{1}\hspace{1pt}0\hspace{1pt}0\hspace{1pt}1\hspace{1pt}0\hspace{1pt}…_2 \\
               0.\hspace{1pt}0\hspace{1pt}1\hspace{1pt}1\hspace{1pt}\class{green}{1}\hspace{1pt}0\hspace{1pt}1\hspace{1pt}1\hspace{1pt}…_2 \\
               0.\hspace{1pt}1\hspace{1pt}0\hspace{1pt}1\hspace{1pt}1\hspace{1pt}\class{blue}{0}\hspace{1pt}0\hspace{1pt}1\hspace{1pt}…_2 \\
               0.\hspace{1pt}0\hspace{1pt}1\hspace{1pt}0\hspace{1pt}1\hspace{1pt}0\hspace{1pt}\class{blue}{0}\hspace{1pt}0\hspace{1pt}…_2 \\
               0.\hspace{1pt}0\hspace{1pt}1\hspace{1pt}1\hspace{1pt}1\hspace{1pt}1\hspace{1pt}0\hspace{1pt}\class{green}{1}\hspace{1pt}…_2 \\
               … \\
               \\
               0.\hspace{1pt}\class{blue}{0}\hspace{1pt}\class{green}{1}\hspace{1pt}\class{blue}{0\hspace{1pt}0}\hspace{1pt}\class{green}{1\hspace{1pt}1}\hspace{1pt}\class{blue}{0}\hspace{1pt}…_2
         \end{array}
        $$
      
      
      
        The real numbers on the other hand are uncountably infinite: no process can list them all in the limit. The basic proof is short: suppose we did have a sequence of all the real numbers between $ 0 $ and $ 1 $ in some order. We could then build a new number by taking all the bits on the diagonal, and flipping zeroes and ones.
That means this number is different from every listed number in at least one digit, so it's not on the list. But it's also between $ 0 $ and $ 1 $, so it should be on the list. Therefore, the list can't exist.
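
The diagonal trick is easy to try on the seven numbers listed above. The sketch below only looks at their first seven bits, written as strings; `flipped_diagonal` is a name invented for the example.

```python
def flipped_diagonal(rows):
    """Build a bit string that differs from rows[i] at position i."""
    return "".join("1" if row[i] == "0" else "0" for i, row in enumerate(rows))

# The first seven bits of the seven numbers listed above.
listed = ["1001110", "1001001", "1010010", "0111011",
          "1011001", "0101000", "0111101"]

new_bits = flipped_diagonal(listed)
print("0." + new_bits)

# The new number disagrees with the i-th listed number in its i-th bit.
print(all(new_bits[i] != row[i] for i, row in enumerate(listed)))
```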
      
    

    
      
        This even matches our intuitive explanation from earlier. There are so many real numbers, that we had to invent a bit at infinity to try and count them, and find something that would tick at least once for every real number. Even then we couldn't say whether it was $ 0 $ or $ 1 $ anywhere in particular, because it literally depends on how you approach it.
      
    

  




  
  What we just did was a careful exercise in hiding the obvious, namely the digit-based number systems we are all familiar with. By viewing them not as digits, but as paths on a directed graph, we get a new perspective on just what it means to use them. We've also seen how this means we can construct the rationals and reals using the least possible ingredients required: division by two, and limits.
  
Drowning By Numbers
  
  In school, we generally work with the decimal representation of numbers. As a result, the popular image of mathematics is that it's the science of digits, not the underlying structures they represent. This permanently skews our perception of what numbers really are, and is easy to demonstrate. You can google to find countless arguments about why $ 0.999… $ is or isn't equal to $ 1 $. Yet nobody's wondering why $ 0.000… = 0 $, though it's practically the same problem: $ 0.1, 0.01, 0.001, 0.0001, … $
  
  Furthermore, in decimal notation, rational numbers and real numbers look incredibly alike: $ 3.3333… $ vs $ 3.1415…\, $ The question of what it actually means to have infinitely many non-repeating digits, and why this results in continuous numbers, is hidden away in those 3 dots at the end. By imagining $ π $ as $ 3.1415…0000… $ or $ 3.1415…1111… $ we can intuitively bridge the gap to the infinitely small. We see how the distance between two neighbouring real numbers must be so small, that it really is equivalent to $ 0 $.
  
  That's not as crazy as it sounds. In the field of hyperreal numbers, every number actually has additional digits 'past infinity': that's its infinitesimal part. You can imagine this to be a multiple of $ \frac{1}{\infty} $, an infinitely small unit greater than $ 0 $, which I'll call $ ε $. You can add $ ε $ to a real number to take an infinitely small step. It represents a difference that can only be revealed with an infinitely strong microscope. Equality is replaced with adequality: being equal aside from an infinitely small difference.
  
You can explore this hyperreal number line below.





  
    
  
    



  
  As $ ε $ is a fully functioning hyperreal number, $ ε^2 $ is also infinitesimal. In fact, it's even infinitely smaller than $ ε $, and we can keep doing this for $ ε^3, ε^4, …\,$ To make matters worse, if $ ε $ is infinitesimal, then $ \frac{1}{ε} $ must be infinitely big, and $ \frac{1}{ε^2} $ infinitely bigger than that. So hyperreal numbers don't just have inwardly nested infinitesimal levels, but outward levels of increasing infinity too. They have infinitely many dimensions of infinity both ways.
  
  So it's perfectly possible to say that $ 0.999… $ does not equal $ 1 $, if you mean they differ by an infinitely small amount. The only problem is that in doing so, you get much, much more than you bargained for.

  A Tug of War Between the Gods
  
  That means we can finally answer the question we started out with: why did our continuous atoms seemingly all have $ 0 $ mass, when the total mass was not $ 0 $? The answer is that the mass per atom was infinitesimal. So was each atom's volume. The density, mass per volume, was the result of dividing one infinitesimal amount by another, to get a normal sized number again. To create a finite mass in a finite volume, we have to add up infinitely many of these atoms.
  
  These are the underlying principles of calculus, and the final puzzle piece to cover. The funny thing about calculus is, it's conceptually easy, especially if you start with a good example. What is hard is actually working with the formulas, because they can get hairy very quickly. Luckily, your computer will do them for you:







  
    
  

  

    
      We're going to go for a drive.
    

    
      We'll graph speed versus time. We have kilometers per hour vertically, and hours horizontally. We've also got a speedometer—how fast—and an odometer—how far.
    

    
      Suppose we drive for half an hour at 50 km/h.
    

    
      $ \class{orangered}{25} $
      
      We end up driving 25 km. This is the area spanned by the two lengths: $ 50 \cdot \frac{1}{2} $, a rectangle.
    

    
      $ \class{orangered}{60} $

      Now we hit the highway and maintain 120 km/h for the rest of the hour. We go an additional 60 km, the area of the second rectangle, $ 120 \cdot \frac{1}{2} $.
Whenever we multiply two units like speed and time, we can always visualize the result as an area.
    

    
      $ \class{slate}{85} $

      Because we crossed 85 km in one hour, this is equivalent to driving at a constant speed of 85 km/h for the duration. The total area is the same.
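
The arithmetic of this trip fits in a few lines. The speeds and durations are the ones from the drive above; the variable names are just for illustration.

```python
# Each leg of the trip is (speed in km/h, duration in hours).
legs = [(50, 0.5), (120, 0.5)]

distance = sum(speed * hours for speed, hours in legs)   # total area, in km
duration = sum(hours for _, hours in legs)               # total width, in hours
average_speed = distance / duration                      # height of the single equivalent rectangle

print(distance, average_speed)
```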
    

    
      If this were a race between two different cars, we would see a photo finish. The distance travelled in kilometers is identical at the 1 hour mark. Where they differ is in their speed along the way, with the red car falling behind and then catching up.
    

    
      The difference is visible in the slope of both paths. The faster the car, the more quickly it accumulates kilometers. If it drove 25 km in half an hour, then its speed was 50 km/h, $ \frac{25}{0.5} $. This is the distance travelled divided by the time it took, vertical divided by horizontal.
    

    
      Slope is a relative thing. If we shrink the considered time, the distance shrinks along with it, and the resulting speed is the same. What we're really doing is formalizing the concept of a rate of change, of distance over time.
    

    
      Constant speed means a constant increase in distance. We can directly relate the area being swept out left to right with the accumulated distance by each car. This is clue number 1.
    

    
      Now suppose the red car starts ahead by 10 km and drives the same speeds.

        It will also end up 10 km ahead after 1 hour: its path has simply been shifted by 10 units. The slope is unchanged: it doesn't matter where you are and where you've been, only how fast you're going right now. It's what's called an instantaneous quantity: it describes a situation only in the moment. This is clue number 2.
    

    
      In order to get ahead, the red car had to drive there. So we can imagine it started earlier, $ \frac{1}{5} $ of an hour, driving for 10 km at the same speed. Again, the equality holds: area swept out equals accumulated distance, we add another $ 50 \cdot \frac{1}{5} $. Constant slope still equals constant speed.
    

    
      One curve describes how the other changes in the moment, therefore the two quantities are linked somehow. We add up area to go from speed to distance; we find slope to go from distance to speed. We're going to examine this two-way relationship more.
    

    
      Real cars don't start or stop on a dime, they accelerate and decelerate. So we're going to try more realistic behavior.
    

    
      Suppose the speed follows a curve. In one hour, the car starts from 0 km/h, accelerates to over 100 km/h and then smoothly decelerates back to standstill. The distance travelled also curves smoothly, from 0 to 60 km, so we've driven 60 km in total.
    

    
      We can immediately see that at the point where the car was going fastest, the distance was increasing the most. Its slope is steepest at that point. The relationship between the two curves holds.
    

    
      But actually measuring it is a problem. First, there are no more straight sections to measure the slope on. If we take two points on a curve, the line that connects them doesn't touch the curve, it crosses it at an angle.
    

    
      Second, we can no longer measure the area by dividing it into rectangles, or any other simple geometric shape. There will always be gaps. We can solve both of these problems with a dash of infinity.
    

    
      
        We'll start with area. We have to find an upper and a lower bound again.
We're going to divide the curve into 4 sections.
      
    

    
      
        First, the upper bound. We find the highest value in each section and make a rectangle of that height. This approach is too greedy and overestimates.
      
    

    
      
        The lower bound is similar. We find the smallest value in each interval and make rectangles of that height.
This underestimates and leaves areas uncovered.
      
    

    
      
        If we do 7 divisions instead, we can see that the upper bound has decreased: there is less excess area. The lower bound has increased: the gaps are smaller and more area is covered.
      
    

    
      
        With 10 divisions, it's even better. It seems the upper and lower bounds are approaching each other.
      
    

    
      
        And the same at 13 divisions. If we keep doing this, our slices will get thinner and thinner, and we'll be adding more of them together. If we take a limit, each slice becomes infinitely thin, and there are infinitely many of them. Let's step back and see what that means.
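
To watch the two bounds close in numerically, here is a small sketch. The speed curve is a stand-in chosen for the example, not the one in the diagram: it starts and ends at 0 km/h, peaks at 120 km/h halfway, and covers 60 km in the hour. The code also relies on that curve having a single peak, so the lowest value of a slice is always at one of its ends.

```python
import math

def speed(t):
    """Hypothetical speed curve in km/h for t in hours (not the curve in the diagram)."""
    return 60 * (1 - math.cos(2 * math.pi * t))

PEAK = 0.5  # this particular curve rises until t = 0.5 and falls afterwards

def bounds(n):
    """Lower and upper rectangle sums over [0, 1] using n equal slices."""
    width = 1 / n
    lower = upper = 0.0
    for i in range(n):
        a, b = i / n, (i + 1) / n
        ends = (speed(a), speed(b))
        lowest = min(ends)                                      # single peak: the minimum is at an end
        highest = speed(PEAK) if a <= PEAK <= b else max(ends)  # the maximum may be the peak itself
        lower += lowest * width
        upper += highest * width
    return lower, upper

for n in (4, 7, 10, 13, 1000):
    lower, upper = bounds(n)
    print(n, round(lower, 2), round(upper, 2))
```

With more slices, the printed pair squeezes in on the same value from both sides.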
      
    

    
      
        Take for example the sequence of lower bounds.
      
    

    
      
        Because every slice is equally wide, we can glue them together into a single rectangle per step.
Its width $ w $ is the thickness of a single slice, and its height $ h $ is the sum of the heights of the slices.
      
    

    
      
        In the limit, this rectangle becomes both infinitely thin and infinitely tall. This is a tug of war between Zero and Infinity where at first sight, they both seem to win. That's a problem. Luckily, we're not interested in the rectangle itself, but rather its area.
      
    

    
      
        We can change a rectangle's sides without changing its area. We multiply its width by one factor (e.g. $ 2 $), and divide the height by the same amount. The area $ 2w \cdot \frac{h}{2} $ is unchanged. Hence, we can normalize our rectangles to all have the same width, for example $ 1 $.
      
    

    
      
        We can do the same for the upper bounds. We can see that both areas are converging on the same value. This is the true area under the curve, which is neither zero nor infinite. In this tug of war, both parties are equally matched.
      
    
    
    
      
        Now our sequence looks very different: it's approaching a definite area, sandwiched between red and blue.
      
    

    
      $ \class{slate}{60} $
      
        If we take the limit, we get the area under our curve.
      
    

    
      $ \class{orangered}{60} $
      
        This way we can find the area under any smooth curve. This process is called integration. The symbol for integration is $ \int_a^b $ where $ a $ and $ b $ are the start and end points you're integrating between. The S-shape stands for our sum, adding up infinitely many pieces.
      
    

    
      $$ \int_0^T \! f(t) \mathrm{d} t $$
      
        We can then integrate one curve to make another, by sweeping out area horizontally from a fixed starting point. We move the end point to a time $ T $ and plot the accumulated value along the way. Using limits, we can do this continuously. This takes us from speed to distance travelled. The quantity $ \,\mathrm{d}t\, $ is the infinitesimal width of each slice, an infinitely small amount of time.
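
A running integral like this can be approximated by adding up thin slices and recording the total along the way. The sketch below again uses a made-up speed curve (an assumption, since the real one is only drawn), and takes the height of each slice at its midpoint.

```python
import math

def speed(t):
    """Hypothetical speed curve in km/h (an assumption, not the curve in the diagram)."""
    return 60 * (1 - math.cos(2 * math.pi * t))

def distance_curve(T, slices=1000):
    """Accumulated distance after each of `slices` equal time steps up to T."""
    dt = T / slices
    total, curve = 0.0, []
    for i in range(slices):
        total += speed((i + 0.5) * dt) * dt   # area of one thin slice
        curve.append(total)
    return curve

odometer = distance_curve(1.0)
print(round(odometer[499], 2), round(odometer[-1], 2))   # halfway, and at the end
```

The list `odometer` plays the role of the distance curve: each entry is the area swept out so far.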
      
    
     
    
       Now we just need to figure out the reverse and find slopes. We'll go back to our failed attempt from earlier.
     

     
       If we shrink the distance we're considering, our slope estimate gets closer to the true value. But if we try to take a limit, we end up dividing $ 0 $ by $ 0 $.
     

     
       Instead we need to normalize our sequence again so it doesn't vanish.
     

     
       We only care about slope: the ratio of the two right sides. Which means, if we scale up each triangle, the ratio is unchanged. That just comes down to multiplying both sides by the same number. Again we can scale them all to the exact same width.
     

     
       Now we've created a limit that does converge to something rather than nothing.
     

     
       This finite value is the slope at the point we were homing in on. Because we can apply this process at any point on the curve, we can find the exact slope anywhere. This is called finding the derivative or differentiation.
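
The shrinking triangles can be imitated numerically by measuring rise over run for ever smaller widths. The distance curve here is a hypothetical one, picked because its exact slope is known: $ 60 \cdot (1 - \cos 2πt) $, which is $ 60 $ at $ t = 0.25 $.

```python
import math

def distance(t):
    """Hypothetical distance curve in km; its exact slope is 60 * (1 - cos(2*pi*t))."""
    return 60 * t - (30 / math.pi) * math.sin(2 * math.pi * t)

def secant_slope(f, t, width):
    """Rise over run between t and t + width."""
    return (f(t + width) - f(t)) / width

t = 0.25
for width in (0.1, 0.01, 0.001, 0.0001):
    print(width, secant_slope(distance, t, width))
```

Neither the rise nor the run survives the limit, but their ratio settles down to a finite value.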
     

     
       $$ \frac{ \mathrm{d} f(t) }{\mathrm{d} t} $$
       We can also apply this process over an entire curve to generate a new one.  So now we know how to go the other way: distance to speed. Mathematically, we are dividing an infinitesimal piece of the distance, $ \,\mathrm{d} \class{slate}{f(t)}\, $, by an infinitesimal slice of time $ \,\mathrm{d} t\, $. Working with infinitesimal formulas is tricky however. There's always an implied limit being taken in order to reach them in the first place. Indeed, it took centuries to formalize this fuzzy explanation into what we call differential forms today.
     

     
       We can note that if we shift the distance curve up or down, the speed is unchanged. When you take a derivative, any constant value you've added to your function simply disappears. This shows again that speed is always in the moment, it only describes what's going on in an infinitely short piece of curve.
     

     
       Differentiation is then like x-ray specs for curves and quantities, and it's turtles all the way down. For example, if we differentiate speed, we get acceleration. This is another rate of change, of speed over time. We see the car's acceleration is initially positive, speeding up, and then goes negative, to slow down, i.e. accelerate in the opposite direction.
Note: The acceleration has been divided by 4 to fit.
     

     
       If we integrate acceleration to get speed, we have to count the second part as negative area: it is causing the speed to decrease.
     

     
       We can see that the point of maximum speed is the point where the acceleration passes through $ 0 $. One of the most useful applications of derivatives is indeed to find a maximum or minimum of a curve more easily. No matter where it is, the slope at such a point must always be horizontal—provided the curve is smooth.
     

     
       Let's end this with a more exciting example. What's tall, fast and makes kids scream?
     

     
       A roller coaster! We'll construct a little track by welding together pieces of circles and lines.
     

     
       Alas, we shouldn't be too proud of our creation. Even though it looks smooth, there's something very wrong. This is how you build roller coasters when you don't want people to have fun. To see the problem, we need to use our x-ray specs.
     

     
       $$ \class{orangered}{f^{\prime}(x)} = \frac{\mathrm{d}\class{slate}{f(x)}}{\mathrm{d}x} $$
       
       We differentiate the height into its slope. It has sharp corners all over the place. Even though the track itself looks smooth, it doesn't change smoothly. The slope is constant in the straight sections and changes rapidly in the curved sections.
     

     
       $$ \class{green}{f^{\prime\prime}(x)} = \frac{\mathrm{d^2}\class{slate}{f(x)}}{\mathrm{d}x^2} $$

       If we take the derivative of the slope, i.e. find the slope's slope, we get a measure of curvature. It's positive inside valleys, negative on top of crests. This graph is even worse: there are sharp peaks and cliffs. Note that in the formula, we are now dividing by the square of the infinitesimal distance $ \mathrm{d}x $. This is like going two levels down on the hyperreal number line and back up again.
     

     
       $$ \class{teal}{κ(x)} = \frac{1}{ρ} = \frac{ \class{green}{f^{\prime\prime}(x)} } { (1 + \class{orangered}{f^{\prime}(x)}^2)^{3/2} } $$

             
         We can see better if we replace the second derivative with the 2D curvature.
It is based on the radius $ ρ $ of the circle that touches the curve at a given point. As this radius gets infinitely big on straight sections, we use its inverse, $ \class{teal}{κ} $. Because of how we built the track, $ κ $ switches between $ 0 $ and a constant positive or negative value.
At every switch, there will be a corresponding change in force, a jerk.
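
As a sanity check on the formula for $ κ $, we can estimate both derivatives with finite differences and test it on shapes where the answer is known. The circle radius and the straight ramp below are invented for the example: on a circular valley, $ κ $ should be $ 1/ρ $ everywhere, and on a straight piece it should be $ 0 $.

```python
import math

def curvature(f, x, h=1e-4):
    """Estimate f''(x) / (1 + f'(x)^2)^(3/2) using central differences."""
    first = (f(x + h) - f(x - h)) / (2 * h)
    second = (f(x + h) - 2 * f(x) + f(x - h)) / (h * h)
    return second / (1 + first ** 2) ** 1.5

RADIUS = 5.0  # hypothetical radius of a circular valley

def valley(x):
    """Bottom half of a circle of radius RADIUS centred on the origin."""
    return -math.sqrt(RADIUS ** 2 - x ** 2)

def straight(x):
    """A straight ramp."""
    return 0.5 * x + 1.0

print(curvature(valley, 0.0), curvature(valley, 2.0), curvature(straight, 2.0))
```

Note that the second derivative alone would not be constant along the valley. It is the division by $ (1 + f^{\prime}(x)^2)^{3/2} $ that corrects for the slope.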
     

     

             
         Let's simulate a ride. As riders go through our curved sections, their inertia will push them to the outside of the curve. From their point of view, this is a centrifugal force up or down. We'll plot the (subjective) vertical G force including gravity. It starts at a comfy 1 G, but then swings wildly between 0.5 G and 1.25 G.
     

     
       Even though the track seems smooth, we can see that the vertical G's are not. Every time we enter a curve, we experience a sudden jerk up or down. This is due to the jumps in the curvature. The G's are themselves curved, because the rider's sense of gravity decreases as the cart goes vertical. The sharp dips below 0.5 G are not simulation errors: this is actually what it would feel like.
     

     
       To really highlight the problem, we need to x-ray the G's and derive again. G forces are a form of acceleration. The derivative of acceleration is a change in force, called jerk. Whenever it's non-zero, you feel jerked in a particular direction.
     

     
       
         To fix this, we need to alter the curve of the track and smooth it out at all the different levels of differentiation. Here I've applied a relaxation procedure. It's like a blur filter in Photoshop: we replace every point on the track with the average of its neighbours. We get a subtly different curve. Its height hasn't changed much at all, it's just a little bit less tense.
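
A minimal sketch of such a relaxation, assuming the track is sampled as a list of heights and that the two end points stay pinned in place. The number of passes and the test track are invented, and this is not necessarily the exact procedure behind the diagrams.

```python
def relax(heights, passes=1):
    """Replace every interior point with the average of its two neighbours."""
    heights = list(heights)
    for _ in range(passes):
        smoothed = heights[:]
        for i in range(1, len(heights) - 1):
            smoothed[i] = (heights[i - 1] + heights[i + 1]) / 2
        heights = smoothed
    return heights

def sharpest_bend(heights):
    """Largest change in slope between neighbouring segments."""
    return max(abs(heights[i - 1] - 2 * heights[i] + heights[i + 1])
               for i in range(1, len(heights) - 1))

# A made-up track: flat, then a ramp, with a sharp corner where they meet.
track = [0.0] * 10 + [float(i) for i in range(1, 11)]
relaxed = relax(track, passes=5)
print(sharpest_bend(track), sharpest_bend(relaxed))
```

The heights barely move, but the sharpest change in slope drops, which is the effect we're after.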
      
     

     
       
         But this minor change has a huge effect on both slope and radius of curvature. They are completely smoothed out, with all corners and jumps removed.
      
     

     
       
         If we do another simulation, the G force graph looks completely different. There are no more jumps.
      
     

     
       
         But the real difference is in jerk. There are no more actual jerks, only smooth oscillations. Instead of bruises, riders will get butterflies. Thanks to calculus, we avoided that painful lesson without ever having to ride it ourselves.
      
     
     
     
       
         Please check your pockets for loose items. Lost property will not be returned.
      
     

     
       
         Let's start with the original, unrelaxed track. Thanks to calculus, we can simulate head-bobbing so you can get a feel for how jerky this is. Even virtually, this isn't very pleasant.
      
     

     
       
         This is the improved track. Notice the smooth transitions in and out of curves.
      
     

     
       
         And that's how you make sweet roller coasters: by building them out of infinitely small, smooth pieces, so you don't get jerked around too much.
      
     

  




  
That was differential and integral calculus in a nutshell. We saw how many people actually spend hours every day sitting in front of an integrator: the odometers in their cars, which integrate speed into distance. And the derivative of speed is acceleration—i.e. how hard you're pushing on the gas pedal or brake, combined with forces like drag and friction.

By using these tools in equations, we can describe laws that relate quantities to their rates of change. Drag, also known as air resistance, is a force which gets stronger the faster you go. This is a relationship between the first and second derivatives of position.
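
For instance, a car coasting against drag can be stepped forward in time by letting the speed decide the acceleration. This sketch assumes drag proportional to the square of the speed and uses plain Euler steps. The drag coefficient, the starting speed (in m/s) and the step size are all made up.

```python
def coast(speed, drag=0.02, dt=0.01, seconds=10.0):
    """Euler steps for a car coasting against drag: acceleration = -drag * speed**2."""
    position = 0.0
    for _ in range(round(seconds / dt)):
        acceleration = -drag * speed ** 2    # second derivative, set by the first
        speed += acceleration * dt           # integrate acceleration into speed
        position += speed * dt               # integrate speed into position
    return position, speed

position, speed = coast(30.0)
print(round(position, 1), round(speed, 2))
```

The faster the car goes, the harder it brakes, so the speed drops quickly at first and then ever more slowly.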

In fact, the relaxation procedure we applied to our track is equivalent to another physical phenomenon. If the curve of the coaster represented the temperature along a thin metal rod, then the heat would start to equalize itself in exactly that fashion. Temperature wants to be smooth, eventually averaging out completely into a flat curve.

Whether it's heat distribution, fluid dynamics, wave propagation or a head bobbing in a roller coaster, all of these problems can be naturally expressed as so-called differential equations. Solving them is a skill learned over many years, and some solutions come in the form of infinite series. Again, infinity shows up, ever the uninvited guest at the dinner table.

Closing Thoughts
  
Infinity is a many splendored thing but it does not lift us up where we belong. It boggles our mind with its implications, yet is absolutely essential in math, engineering and science. It grants us the ability to see the impossible and build new ideas within it. That way, we can solve intractable problems and understand the world better.

What a shame then that in pop culture, it only lives as a caricature. Conversations about infinity occupy a certain sphere of it—Pink Floyd has been playing on repeat, and there's usually someone peddling crystals and incense nearby.

"Man, have you ever, like, tried to imagine infinity…?" they mumble, staring off into the distance.

"Funny story, actually. We just came from there…"

Comments, feedback and corrections are welcome on Google Plus. Diagrams powered by MathBox.

More like this: How to Fold a Julia Fractal.
  




How to Fold a Julia Fractal
2013-01-05T00:00:00+01:00








A tale of numbers that like to turn








  "Take the universe and grind it down to the finest powder and sieve it through the finest sieve and then show me one atom of justice, one molecule of mercy. And yet," Death waved a hand, "And yet you act as if there is some ideal order in the world, as if there is some… some rightness in the universe by which it may be judged."
  – Hogfather, Discworld, Terry Pratchett





  




Mathematics has a dirty little secret. Okay, so maybe it's not so dirty. But neither is it little. It goes as follows:

Everything in mathematics is a choice.

You'd think otherwise, going through the modern day mathematics curriculum. Each theorem and proof is provided, each formula bundled with convenient exercises to apply it to. A long ladder of subjects is set out before you, and you're told to climb, climb, climb, with the promise of a payoff at the end. "You'll need this stuff in real life!", they say, oblivious to the enormity of this lie, to the fact that most of the educated population walks around with "vague memories of math class and clear memories of hating it."

Rarely is it made obvious that all of these things are entirely optional—that mathematics is the art of making choices so you can discover what the consequences are. That algebra, calculus, geometry are just words we invented to group the most interesting choices together, to identify the most useful tools that came out of them. The act of mathematics is to play around, to put together ideas and see whether they go well together. Unfortunately that exploration is mostly absent from math class and we are fed pre-packaged, pre-digested math pulp instead.





And so it also goes with the numbers. We learn about the natural numbers, the integers, the fractions and eventually the real numbers. At each step, we feel hoodwinked: we were only shown a part of the puzzle! As it turned out, there was a 'better' set of numbers waiting to be discovered, more comprehensive than the last.

Along the way, we feel like our intuition is mostly preserved. Negative numbers help us settle debts, fractions help us divide pies fairly, and real numbers help us measure diagonals and draw circles. But then there's a break. If you manage to get far enough, you'll learn about something called the imaginary numbers, where it seems sanity is thrown out the window in a variety of ways. Negative numbers can have square roots, you can no longer say whether one number is bigger than the other, and the whole thing starts to look like a pointless exercise for people with far too much time on their hands.

I blame it on the name. It's misleading for one very simple reason: all numbers are imaginary. You cannot point to anything in the world and say, "This is a 3, and that is a 5." You can point to three apples, five trees, or chalk symbols that represent 3 and 5, but the concepts of 3 and 5, the numbers themselves, exist only in our heads. It's only because we are taught them at such a young age that we rarely notice.




  
    $$ 3 - 5 = \,? $$
    $$ 4\;/\; 6 = \,? $$
    $$ \sqrt{50} = \,? $$
    $$ \sqrt{-4} = \,? $$
  
  
    Questions that required us to invent new numbers in order to answer them consistently.
  






So when mathematicians finally encountered numbers that acted just a little bit different, they couldn't help but call them fictitious and imaginary, setting the wrong tone for generations to follow. Expectations got in the way of seeing what was truly there, and it took decades before the results were properly understood.

Now, this is not some esoteric point about a mathematical curiosity. These imaginary numbers—called complex numbers when combined with our ordinary real numbers—are essential to quantum physics, electromagnetism, and many more fields. They are naturally suited to describe anything that turns, waves, ripples, combines or interferes, with itself or with others. But it was also their unique structure that allowed Benoit Mandelbrot to create his stunning fractals in the late 70s, dazzling every math enthusiast that saw them.

Yet for the most part, complex numbers are treated as an inconvenience. Because they are inherently multi-dimensional, they defy our attempts to visualize them easily. Graphs describing complex math are usually simplified schematics that only hint at what's going on underneath. Because our brains don't do more than 3D natively, we can glimpse only slices of the hyperspaces necessary to put them on full display. But it's not impossible to peek behind the curtain, and we can gain some unique insights in doing so. All it takes is a willingness to imagine something different.

So that's what this is about. And a lesson to be remembered: complex numbers are typically the first kind of numbers we see that are undeniably strange. Rather than seeing a sign that says Here Be Dragons, Abandon All Hope, we should explore and enjoy the fascinating result that comes from one very simple choice: letting our numbers turn. That said, there are dragons. Very pretty ones in fact.




  
  
    The Mandelbrot Fractal, powered by the simple formula $ f(z) = z^2 + c $ where $ z $ is a complex number. These sorts of relations were first studied by Gaston Julia.
  



  
  
    The Heighway Dragon Curve, which has a surprising connection to complex numbers.
  






Like Hands on a Clock

What does it mean to let numbers turn? Well, when making mathematical choices, we have to be careful. You could declare that $ 1 + 1 $ should equal $ 3 $, but that only opens up more questions. Does $ 1 + 1 + 1 $ equal $ 4 $ or $ 5 $ or $ 6 $? Can you even do meaningful arithmetic this way? If not, what good are these modified numbers? The most important thing is that our rules need to be consistent for them to work. But if all we do is swap out the symbols for $ 2 $ and $ 3 $, we didn't actually change anything in the underlying mathematics at all.

So we're looking for choices that don't interfere with what already works, but add something new. Just like the negative numbers complemented the positives, and the fractions snugly filled the space between them—and the reals somehow fit in between that—we need to go look for new numbers where there currently aren't any.





  
    
  

  
    
      We'll start with the classic real number line, marked at the integer positions, and poke around.

         We imagine the line continues to the left and right indefinitely.
    

    
      
        $$ \class{blue}{2} + \class{green}{3} = \class{red}{5} $$
      
      But there's a problem with this visualization: by picturing numbers as points, it's not clear how they act upon each other.

         For example, the two adjacent numbers $ \class{blue}{2} $ and $ \class{green}{3} $ sum to $ \class{red}{5} $ …
        
    

    
      
        $$ \class{blue}{-2} + \class{green}{-1} = \class{red}{-3} $$
      
      
        … but the similarly adjacent pair $ \class{blue}{-2} $ and $ \class{green}{-1} $ sums to $ \class{red}{-3} $.
We can't easily spot where the red point is going to be based on the blue and green.
    

    
      
        A better solution is to represent our numbers using arrows instead, or vectors.

        Each arrow represents a number through its length, pointing right/left for positive/negative.
      
    

    
      
        The nice thing about arrows is that you can move them around without changing them.

        To add two arrows, just lay them end to end. You can easily spot why $ \class{blue}{-2} + \class{green}{-1} = \class{red}{-3} $ …
      
    

    
      
        … and why $ \class{blue}{2} + \class{green}{3} = \class{red}{5} $, similarly.
As long as we apply positives and negatives correctly, everything still works.
      
    

    
      
        $$ \times \class{green}{1.5} ... $$
      
      
        Now let's examine multiplication. We're going to start with $ \class{blue}{1} $ and then we'll multiply it by $ \class{green}{1.5} $ repeatedly.
      
    

    
      
        With every multiplication, the vector gets longer by 50 percent.
These vectors represent the numbers $ \class{red}{1}, \class{red}{1.5}, \class{red}{2.25}, \class{red}{3.375} $, $ \class{red}{5.0625} $, a nice exponential sequence.        
      
    

    
      
        $$ \times (\class{green}{-1.5}) ... $$
      
      
        Now we're going to do the same, but multiplying by the negative, $ \class{green}{-1.5} $, repeatedly.
      
    

    
      
        The vectors still grow by 50%, but they also flip around, alternating between positive and negative.
These vectors represent the sequence $ \class{red}{1}, \class{red}{-1.5}, \class{red}{2.25}, \class{red}{-3.375}, \class{red}{5.0625} $.
      
    

    
      
        But there's another way of looking at this. What if instead of flipping from positive to negative, passing through zero, we went around instead, by rotating the vector as we're growing it?
      
    

    
      
        We'd get the same numbers, but we've discovered something remarkable: a way to enter and pass through the netherworld around the number line. The question is, is this mathematically sound, or plain nonsense?
      
    

    
      
        The challenge is to come up with a consistent rule for applying these rotations. We start with normal arithmetic. Multiplying by a positive doesn't flip the sign, so we say it rotates by $ 0^\circ $. Multiplying by a negative flips the sign, so it rotates by $ \class{green}{180^\circ} $. The lengths are multiplied normally in both cases.
      
    

    
      
        $$ \times \class{green}{1.5 \angle 90^\circ} ... $$
      

      
        Now suppose we pick one of the in-between nether-numbers, say the vector of length $ 1.5 $, at a $ 90^\circ $ angle. What does that mean? That's what we're trying to find out! We'll write that as $ \class{green}{1.5 \angle 90^\circ} $ (1.5 at 90 degrees). It could make sense to say that multiplying by this number should rotate by $ \class{green}{90^\circ} $ while again growing the length by 50%.
      
    

    
      
        This creates the spiral of points: $ \class{red}{1 \angle 0^\circ} $, $ \class{red}{1.5 \angle 90^\circ} $, $ \class{red}{2.25 \angle 180^\circ} $, $ \class{red}{3.375 \angle 270^\circ} $, $ \class{red}{5.0625 \angle 360^\circ} $. Three of those are normal numbers: $ +1 $, $ -2.25 $ and $ +5.0625 $, lying neatly on the real number line. The other two are new numbers conjured up from the void.
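The spiral can be reproduced numerically. The following is a minimal sketch in Python, assuming the standard `cmath` module; the helper names `polar` and `spiral` are made up for this illustration.

```python
import cmath
import math

def polar(length, degrees):
    """Build the complex number `length ∠ degrees` (hypothetical helper)."""
    return cmath.rect(length, math.radians(degrees))

def spiral(factor, steps):
    """Start at 1 and multiply by `factor` repeatedly, keeping every result."""
    values = [complex(1, 0)]
    for _ in range(steps):
        values.append(values[-1] * factor)
    return values

# Multiply by 1.5 ∠ 90° four times, as in the diagram.
points = spiral(polar(1.5, 90), 4)
for z in points:
    length, angle = cmath.polar(z)
    print(f"{length:.4f} at {math.degrees(angle):7.2f} degrees")
```

Note that `cmath.polar` reports angles between $ -180^\circ $ and $ +180^\circ $, so the $ 270^\circ $ step shows up as $ -90^\circ $, and tiny rounding errors are to be expected.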
      
    

    


      
        $$ \times \class{green}{1 \angle 45^\circ} ... $$
      

      
        Let's examine this rotation more. We can pick $ 1 $ at a $ \class{green}{45^\circ} $ angle. Multiplying by a $ 1 $ probably shouldn't change a vector's length, which means we'd get a pure rotation effect.
      
    
    
    
      
        By multiplying by $ \class{green}{1 \angle 45^\circ} $, we can rotate in increments of $ 45^\circ $.
It takes 4 multiplications to go from $ +1 $, halfway around the circle of ones, and back onto the real number line at $ -1 $.
      
    

    
      
        And that's actually a remarkable thing, because it means our invented rule has created a square root of $ -1 $.
It's the number $ \class{green}{1 \angle 90^\circ} $.
      
    

    
      
        $ (\class{green}{1 \angle 90^\circ})^2 = \class{blue}{-1} $
      

      
      If we multiply it by itself, we end up at angle $ \class{green}{90^\circ} + \class{green}{90^\circ} = \class{blue}{180^\circ} $, which is $ \class{blue}{-1} $ on the real line.

      
    
    
    
      
      But actually, the same goes for $ \class{green}{1 \angle 270^\circ} $.
      
    

    
      
        $ (\class{green}{1 \angle 270^\circ})^2 = \class{blue}{-1} $
      

      
      When we multiply it by itself, we end up at angle $ \class{green}{270^\circ} + \class{green}{270^\circ} = \class{blue}{540^\circ} $. But because we went around the circle once, that's the same as rotating by $ \class{blue}{180^\circ} $. So that's also equal to $ \class{blue}{-1} $.
      
    

    

      
        $ (\class{green}{1 \angle -90^\circ})^2 = \class{blue}{-1} $
      

      
        Or we could think of $ +270^\circ $ as $ -90^\circ $, and rotate the other way. It works out just the same. This is quite remarkable: our rule is consistent no matter how many times we've looped around the circle.
      
    

    
      
        $ (\class{green}{1 \angle 90^\circ})^2 = \class{blue}{-1} $
      
      
        $ (\class{green}{1 \angle 270^\circ})^2 = \class{blue}{-1} $
      

      
        Either way, $ \class{blue}{-1} $ has two square roots, separated by $ 180^\circ $, namely $ \class{green}{1 \angle 90^\circ} $ and $ \class{green}{1 \angle 270^\circ} $.
This is analogous to how both $ 2 $ and $ -2 $ are square roots of $ 4 $.
      
    

    
      $$ \class{blue}{a} \cdot \class{green}{b} = \class{red}{c}$$
      
        Complex multiplication can then be summarized as: angles add up, lengths multiply, taking care to preserve clockwise and counterclockwise angles. Above, we multiply two random complex numbers a and b to get c.
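The rule is easy to check against ordinary complex arithmetic. This is a small sketch using Python's built-in complex type; the values of `a` and `b` are arbitrary, and `same_angle` is a hypothetical helper that compares angles up to whole turns.

```python
import cmath
import math

a = complex(1.2, 0.9)    # arbitrary illustrative values
b = complex(-0.5, 1.1)
c = a * b

len_a, ang_a = cmath.polar(a)
len_b, ang_b = cmath.polar(b)
len_c, ang_c = cmath.polar(c)

def same_angle(p, q):
    """True when two angles (in radians) differ by a whole number of turns."""
    diff = (p - q) % (2 * math.pi)
    return min(diff, 2 * math.pi - diff) < 1e-9

print("lengths multiply:", math.isclose(len_c, len_a * len_b))
print("angles add:", same_angle(ang_c, ang_a + ang_b))
```

The angle comparison has to ignore full turns, because the sum of two angles can land outside the range that `cmath.polar` reports.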
      
    

    
      $$ \class{blue}{a} \cdot \class{green}{b} = \class{red}{c}$$
      
        When we start changing the vectors, c turns along, being tugged by both a and b's angles. It wraps around the circle, while its length changes. Hence, complex numbers like to turn, and it's this rule that separates them from ordinary vectors.
      
    
    
     

    

      
        We can then picture the complex plane as a grid of concentric circles. There's a circle of ones, a circle of twos, a circle of one-and-a-halfs, etc. Each number comes in many different versions or flavors, one positive, one negative, and infinitely many others in between, at arbitrary angles on both sides of the circle.
      
    

    
      
        Which brings us to our reluctant and elusive friend, $ \class{blue}{i} $. This is the proper name for $ \class{blue}{1 \angle 90^\circ} $, and the way complex numbers are normally introduced: $ i^2 = -1 $. The magic is that we can put a complex number anywhere a real number goes, and the math still works out, oddly enough. We get complex answers about complex inputs.
      
    

    
      
        Complex numbers are then usually written as the sum of their (real) X coordinate, and their (imaginary) Y coordinate, much like ordinary 2D vectors. But this is misleading: the ugly number $ \class{red}{\frac{\sqrt{3}}{2} + \frac{1}{2}i } $ is actually just $ \class{green}{1 \angle 30^\circ} $ in disguise, and it acts more like a $ 1 $ than a $ \frac{1}{2} $ or $ \frac{\sqrt{3}}{2} $. While knowing how to convert between the two is required for any real calculations, you can cheat by doing it visually.
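For real calculations, the conversion between the two forms is a one-liner in most languages. Here is a sketch in Python with the standard `cmath` module, using the example number from the text.

```python
import cmath
import math

# The "ugly" coordinate form from the text.
z = complex(math.sqrt(3) / 2, 0.5)

# Coordinates -> length and angle.
length, angle = cmath.polar(z)
print("length:", length, "angle in degrees:", math.degrees(angle))

# Length and angle -> coordinates.
back = cmath.rect(1, math.radians(30))
print("1 at 30 degrees:", back)
```

Both directions agree up to floating point rounding.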
      
    

    
      Diagram labels: $ \class{blue}{+1} $ at $ 0^\circ $, $ \class{green}{+i} $ at $ +90^\circ $, $ \class{blue}{-1} $ at $ \pm180^\circ $, $ \class{green}{-i} $ at $ -90^\circ $.

      
        But looking at individual vectors only gets us so far. We study functions of real numbers by looking at a graph that shows us every output for every input. To do the same for complex numbers, we need to understand how these numbers-that-like-to-turn, this field of vectors, change as a whole.

        Note: from now on, I'll put $ +1 $, i.e. $ 0^\circ $ at the 12 o'clock position for simplicity.
      
    

    
      
        When we apply a square root, each vector shifts. But really, it's the entire fabric of the complex plane that's warping. Each circle has been squeezed into a half-circle, because all the angles have been halved—the opposite of squaring, i.e. doubling the angle. The lengths have had a normal square root applied to them, compressing the grid at the edges and bulging it in the middle.
      
    

    
      
        But remember how every number had two opposite square roots? This comes from the circular nature of complex math. If we take a vector and rotate it $ 360 ^\circ $, we end up in the same place, and the two vectors are equal. But after dividing the angles in half, those two vectors are now separated by only $ 180 ^\circ $ and lie on opposite ends of the circle. In complex math, they can both emerge.
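The halving argument translates directly into code. In this sketch, `square_roots` is a hypothetical helper that halves the angle once as is, and once after adding a full turn; the input $ -3 + 4i $ is an arbitrary example.

```python
import cmath
import math

def square_roots(z):
    """Return both square roots of z by halving its angle two ways."""
    length, angle = cmath.polar(z)
    root_length = math.sqrt(length)
    # Same vector, described with and without an extra full turn.
    first = cmath.rect(root_length, angle / 2)
    second = cmath.rect(root_length, (angle + 2 * math.pi) / 2)
    return first, second

r1, r2 = square_roots(complex(-3, 4))
print(r1, r2)
print("squared:", r1 * r1, r2 * r2)
```

The two results are exact opposites, which is the $ 180^\circ $ separation described above.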
      
    

    
      
        Complex operations are then like folding or unfolding a piece of paper, only it's weird and stretchy and circular. This can be hard to grasp, but is easier to see in motion. To help see what's going on, I've cut the disc and separated the positive from the negative angles in 3D.
      
    

    
      
        When we square our numbers to undo the square root, the angles double, folding the plane in on itself. The lengths are also squared, restoring the grid spacing to normal.
      
    

    
      
        After squaring, each square root has now ended up on top of its identical twin, and we can merge everything back down to a flat plane. Everything matches up perfectly.
      
    
    
    
      
        Thus the square root actually looks like this. New numbers flow in from the 'far side' as we try and shear the disc apart. The complex plane is stubborn and wants to stay connected, and will fold and unfold to ensure this is always the case. This is one of its most remarkable properties. 
      
    

    
      
        There's no limit to this folding or unfolding. If we take every number to the fourth power, angles are multiplied by four, while lengths are taken to the fourth power. This results in 4 copies of the plane being folded into one.
      
    

    
      
        However, things are not always so neat. What happens if we were to take everything to an irrational power, say $ \frac{1}{\sqrt{2}} $? Angles get multiplied by $ 0.707106... $, which means a rotation of $ 360^\circ $ now becomes $ \sim 254.56^\circ $.
      
    

    
      
        Because $ \frac{1}{\sqrt{2}} $ is irrational, no whole number of these $ \sim 254.56^\circ $ turns ever adds up to a whole number of full $ 360^\circ $ turns. The circular grid never matches up with itself again, no matter how far we extend it. Hence, this operation splits a single unique complex number into an infinite number of distinct copies.
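One way to see the copies pile up is to describe the same number with more and more extra full turns, and apply the power to each description. This is a sketch under that assumption; `power_branches` and `count_distinct` are hypothetical helpers, and the cut-off of 50 turns is arbitrary.

```python
import cmath
import math

def power_branches(z, p, turns):
    """Raise z to the power p, once for each extra full turn of its angle."""
    length, angle = cmath.polar(z)
    return [
        cmath.rect(length ** p, p * (angle + 2 * math.pi * k))
        for k in range(turns)
    ]

def count_distinct(values, tolerance=1e-6):
    """Count values that differ from each other by more than the tolerance."""
    distinct = []
    for v in values:
        if all(abs(v - seen) > tolerance for seen in distinct):
            distinct.append(v)
    return len(distinct)

z = complex(0, 1)
half = count_distinct(power_branches(z, 0.5, 50))
irrational = count_distinct(power_branches(z, 1 / math.sqrt(2), 50))
print("power 1/2:", half, "distinct results")
print("power 1/sqrt(2):", irrational, "distinct results")
```

For the power $ \frac{1}{2} $ the results collapse onto the two familiar square roots, while for $ \frac{1}{\sqrt{2}} $ every extra turn produces a new result.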
      
    

    
      
        For any irrational power $ p $, there are an infinite number of solutions to $ z^p = c $, all lying on a circle. For a hint as to why this is so, we can look at Taylor series: many functions $ f(z) $ can be written as an infinite sum $ a + bz + cz^2 + dz^3 + ... \,$. When $ z $ is complex, such a sum doesn't just represent a finite number of folds, but a mind-boggling infinite origami of complex space.
      
    
  

  







We've seen how complex numbers are arrows that like to turn, which can be made to behave like numbers: we can add and multiply them, because we can come up with a consistent rule for doing so. We've also seen what powers of complex numbers look like: we fold or unfold the entire plane by multiplying or dividing angles, while simultaneously applying a power to the lengths.







Pulling a Dragon out of a Hat

With a basic grasp of what complex numbers are and how they move, we can start making Julia fractals.

At their heart lies the following function:

$$ f(z) = z^2 + c $$

This says: map the complex number $ z $ onto its square, and then add a constant number to it. To generate a Julia fractal, we have to apply this formula repeatedly, feeding the result back into $ f $ every time.

$$ z_{n+1} = (z_n)^2 + c $$

We want to examine how $ z_n $ changes when we plug in different starting values for $ z_1 $ and iterate $ n $ times. So let's try that and see what happens.
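In code, the iteration is only a few lines. The sketch below uses Python's built-in complex numbers; `julia_orbit` is a hypothetical helper name, and the constant `c` and the two starting values are arbitrary choices for illustration.

```python
def julia_orbit(z, c, steps):
    """Apply z -> z^2 + c repeatedly and return every value along the way."""
    orbit = [z]
    for _ in range(steps):
        z = z * z + c
        orbit.append(z)
    return orbit

c = complex(-0.4, 0.6)  # arbitrary constant chosen for illustration
for start in (complex(0.1, 0.1), complex(1.5, 1.5)):
    orbit = julia_orbit(start, c, 5)
    # Print only the lengths, to see which orbit stays small.
    print(start, "->", [round(abs(z), 3) for z in orbit])
```

Printing the lengths makes it easy to compare a starting value that lingers near the middle with one that is flung outwards.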





  
    
  

  
    
      Our region of interest is the disc of complex numbers less than $ 2 $ in length. I've marked the circle of ones as a reference.
    

    
      We take an arbitrary set of numbers, like this grid, and start applying the formula $ f(z) = z^2 + c $ to each. Rather than use vectors, I'll just draw points, to avoid cluttering the diagram.
    

    
      First we square each number. That is, their lengths are squared, their angles are doubled.
The squaring has a dual effect: numbers larger than $ 1 $ grow bigger and are pushed outwards, numbers less than $ 1 $ grow smaller and are pulled inwards.
    

    
      Next, we reset the grid back to neutral, keeping the numbers in their new place.
We also pick a random value for the constant $ \class{green}{c} $, e.g. $ \class{green}{0.57 \angle -59^\circ} $.
    

    
      Now we add $ \class{green}{c} $ to each point, completing one round of Julia iteration, $ f(z) = z^2 + c $. As a result, some numbers have ended up closer towards the origin (i.e. $ 0 $), others further away from it. The combination of folding + shifting has had a non-obvious effect on the numbers.
    

    
      We begin the second iteration and square each number again. Any number not inside the critical circle of $ 1 $ in the middle will get pushed out again. The other numbers continue to linger in the middle.
    

    
      If we zoom out, we can see the larger numbers are spiralling outwards and are permanently lost. The minor nudge by $ \class{green}{c} $ won't be enough to bring them back.
    

    
      Others remain in the middle, being drawn in, but are also at risk of being pushed out of the circle by $ \class{green}{c} $.
    

    
      Resetting the grid again, we add the same value $ \class{green}{c} $ to our vectors again to finish. At this point, our original grid of numbers has been completely jumbled up.
      
    
    
    
      
        If we continued this process, would any numbers remain in the middle? Or would they eventually all get flung out? Unfortunately it's very hard to see what's going on while iterating forwards, because we lose track of where each point came from.
    

    
      So we're going to go backwards instead. We'll establish a safe-zone of all numbers less than $ 2 $, forming a solid disc of all those which aren't irretrievably lost. We want to know where all these numbers can possibly come from. To help track these points, I've coloured one area in a different shade.
    

    
      First we have to shift the numbers again, this time in the opposite direction to subtract $ c $.
    

    
      Now we apply the square root to find $ z_{n-1} = \pm \sqrt{z_n - c} $, which is a Julia iteration in reverse.
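A reverse step can be sketched as follows, assuming Python's `cmath.sqrt` for the complex square root. The function names `forward` and `backward` are made up for this example, `c` is the example constant from the text, and `z` is an arbitrary point.

```python
import cmath
import math

def forward(z, c):
    """One Julia iteration: square, then shift by c."""
    return z * z + c

def backward(z, c):
    """One reverse iteration: shift back by c, then take both square roots."""
    root = cmath.sqrt(z - c)
    return root, -root

c = cmath.rect(0.57, math.radians(-59))  # 0.57 at -59 degrees
z = complex(0.3, 0.8)                    # arbitrary point inside the disc
for p in backward(z, c):
    print(p, "maps forward to", forward(p, c))
```

Each point has two predecessors, which is why every backwards iteration doubles the number of bulges.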
    

    
      After one backwards iteration, the disc has been squished down into an oval at an angle.
These are all the points that will definitely stay in the middle after one iteration.
    

    
      When we apply the second iteration, a pattern starts to develop. Because of the repeated unfolding, we create two bulges wherever there was previously only one.
    

    
      At the same time, the square root alters the length of each number as well. As a result, we squeeze in the radial direction, scaling down earlier features as they combine with newly created ones.
    

    
      After 5 iterations, we start to see the first hints of self-similarity. The shape's lobes are sprouting into spirals.
    
    
    
      But all we've really done is narrow down our blue safe-zone to include only those points that 'survive' up to 5 Julia iterations.
    

    
      
        Remarkably this seems to distort the fractal evenly: our highlighted circles don't stretch into ovals. This is not a coincidence. Complex operations are indeed stubborn, in that they all preserve right angles everywhere. To do so, the mapping must act like a pure scaling and rotation at every point, without shearing off in any particular direction. This is what allows the fractal to look like itself at different scales.
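This evenness can be checked numerically for a single Julia step. The sketch below takes two tiny perpendicular steps away from a point and looks at where they end up; the point `z0`, the constant `c` and the step size `h` are arbitrary assumptions.

```python
import cmath
import math

def f(z, c):
    """One Julia iteration."""
    return z * z + c

c = complex(0.3, -0.5)   # arbitrary constant
z0 = complex(0.8, -0.3)  # arbitrary point away from the origin
h = 1e-6                 # size of the tiny steps

# Images of a small step to the right and a small step up.
step_right = f(z0 + h, c) - f(z0, c)
step_up = f(z0 + 1j * h, c) - f(z0, c)

angle = math.degrees(cmath.phase(step_up / step_right))
stretch_ratio = abs(step_up) / abs(step_right)
print("angle between the mapped steps:", angle)
print("ratio of their lengths:", stretch_ratio)
```

The mapped steps stay very nearly perpendicular and equally long, which is the local scaling and rotation described above. The one exception for squaring is the origin itself, where angles are doubled instead.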
      
    

    
      
        Skipping ahead to iteration 12, we've definitely abandoned the realm of neat, traditional geometry.

        Despite curving wildly, the total mapping $ z_{12} $ still has this property of evenness, which is properly referred to as a conformal mapping.
    

    
      After 128 iterations, we end up with this intricate dragon-like shape, approximating the safe zone for the true fractal map $ z_\infty $. The numbers that make up the blue area are the hardiest points that will survive the next 128 attempts on their life. All the others will definitely get flung out.
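A rough picture of this safe zone can be drawn with plain text. The sketch below is a common forward-iteration approximation, not necessarily how the diagrams here were produced: a point is kept if it stays within length $ 2 $ for all 128 iterations. The names `survives` and `render`, the grid size and the viewing extent are assumptions.

```python
import cmath
import math

def survives(z, c, iterations=128, radius=2.0):
    """True if z stays inside the disc for the given number of iterations."""
    for _ in range(iterations):
        if abs(z) >= radius:
            return False
        z = z * z + c
    return abs(z) < radius

def render(c, iterations=128, width=64, height=24, extent=1.6):
    """Draw the surviving points as '#' on a character grid."""
    rows = []
    for j in range(height):
        y = extent - 2 * extent * j / (height - 1)
        row = ""
        for i in range(width):
            x = -extent + 2 * extent * i / (width - 1)
            row += "#" if survives(complex(x, y), c, iterations) else "."
        rows.append(row)
    return "\n".join(rows)

c = cmath.rect(0.57, math.radians(-59))  # the example constant from the text
print(render(c))
```

Lowering `iterations` shows the earlier, blobbier stages of the shape.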
      
    

    
      Yet this complicated shape is merely the result of folding over and over again, adding a simple constant in between. If we perform a forwards Julia iteration, i.e. squaring and shifting, we see this shape matches up with itself, and looks identical before and after.
      
    

    
      For different values of $ \class{green}{c} $, the fractal morphs into other shapes. There's literally an infinite variety to discover. Some sets are made up of disconnected parts. In this case, $ |c| $ is large enough to push the solid disc away from the center in a single iteration, but not so far that some points can't fold back in. If $ |c| $ gets much larger, the set vanishes.
      
    

    
      For a smaller $ c $, Julia sets are solid. Even a small shift in the value of $ c $ can accumulate into a large difference. Here we zoom in on some fluffy clouds right outside the 'solid zone'. Oddly enough, it seems that when $ c $ is not inside its own Julia set, the set is not solid. Note that in this case, 128 iterations are not sufficient: large solid patches remain, which would be divided further with more iterations.
      
    

    
      This area of fractal space is dubbed Seahorse Valley, for rather obvious reasons.
      
    

    
      Nearby, we find these jewel-like spirals.
      
    

    
      Buried deep inside, there are remarkable combinations of shapes, like this pearl necklace covered in something resembling palm trees.
      
    

    
      And we can even make snowflakes. The dramatic changes due to $ c $ reveal the chaotic nature of fractals. Mathematically, chaos occurs when even the tiniest change can accumulate and blow up to an arbitrarily large effect.
      
    

    
      If we change our iteration formula, for example to a fourth power $ f(z) = z^4 + c $, the entire shape changes. Because each iteration now turns one bulge into four, the resulting shape has four-fold rotational symmetry.
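The symmetry follows from the fact that a quarter turn disappears under a fourth power: $ (i z)^4 = z^4 $. The sketch below checks this on a small grid, assuming the same escape radius of $ 2 $; `escape_count`, the constant `c` and the sample grid are made up for this example.

```python
def escape_count(z, c, iterations=64, radius=2.0):
    """Number of iterations of z -> z^4 + c before z leaves the disc."""
    for n in range(iterations):
        if abs(z) >= radius:
            return n
        squared = z * z
        z = squared * squared + c  # fourth power via two squarings
    return iterations

c = complex(0.4, 0.3)  # arbitrary constant
samples = [complex(x / 4, y / 4) for x in range(-6, 7) for y in range(-6, 7)]

# Rotating a starting point by 90 degrees (multiplying by i) changes nothing.
symmetric = all(escape_count(z, c) == escape_count(1j * z, c) for z in samples)
print("four-fold symmetry holds on the sample grid:", symmetric)
```

After the first iteration, a point and its quarter-turn copies follow exactly the same path.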
      
    

    
      Again, different values of $ \class{green}{c} $ make different shapes, precipitating dramatic changes.
      
    

    
      To understand the effect of $ c $ we need to make a Mandelbrot set. This is similar to a Julia set, but the formula is applied differently. We'll use $ z^2 + c $ again. Instead of different starting values $ z_1 $, we choose different values of $ c $ and start with $ z_1 = 0 $ every time. Because $ c $ is no longer constant, the mapping stops being a simple folding operation. Each iteration is now unique and not so easy to visualize.
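The difference from the Julia case is small in code: the starting value is fixed at $ 0 $ and $ c $ becomes the input. A minimal sketch, assuming the usual escape radius of $ 2 $ and a cut-off of 128 iterations; `in_mandelbrot` is a hypothetical helper name.

```python
def in_mandelbrot(c, iterations=128, radius=2.0):
    """True if the orbit of 0 under z -> z^2 + c stays bounded so far."""
    z = 0j
    for _ in range(iterations):
        z = z * z + c
        if abs(z) > radius:
            return False
    return True

# Sample a few values of c along the real axis.
for c in (complex(0, 0), complex(-1, 0), complex(0.5, 0), complex(1, 0)):
    print(c, "inside" if in_mandelbrot(c) else "outside")
```

With a finite number of iterations this is only an approximation: points very close to the edge may need many more steps before they escape.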
        
    
    
    
        
        Because the Mandelbrot set traverses all possible values of $ c $ across its surface, it has a part of every associated Julia set in it. Around any number $ \class{green}{c} $ it looks like the Julia set which has that value as its constant. Here, we move towards the three-way cross at the bottom of the Mandelbrot set. The Julia set develops similar features.
      
    

    
        
        Where the Mandelbrot set is round and bulbous, the Julia set is too.
      
    

    
        
        The spirals and seahorses from earlier are located here. You can literally see the shapes on both sides of the valley evolving towards horseheads and spirals respectively. But the Mandelbrot set acts like a map to Julia sets in a much more direct way: anywhere the Mandelbrot set is filled in (blue), the corresponding Julia set is solid too. The white areas are values of $ c $ which create disconnected Julia sets.
      
    

    
        
        That the Mandelbrot set is a 'pixel-perfect' map of Julia sets is a big clue. It reflects that they're actually both slices of a single higher dimensional object. By viewing these slices as we travel through, we can get a vague idea of its shape and complexity. In this object, every point in the Mandelbrot set is connected to the center of the corresponding Julia set. Actually picturing this 4D object is a challenge.
    

    
      
        But like any fractal, the Mandelbrot set also contains copies of itself, buried inside its edge. This is just one of the many varied copies. As a result, deep Mandelbrot zooms can reach astonishing levels of beauty in complexity. This is best done with specialized software that can calculate with hundreds of digits of precision.
      
    

  

  







Making fractals is probably the least useful application of complex math, but it's an undeniably fascinating one. It also reveals the unique properties of complex operations, like conformal mapping, which provide a certain rigidity to the result.

However, in order to make complex math practical, we have to figure out how to tie it back to the real world.

Travelling without Moving

It's a good thing we don't have to look far to do so. Whenever we're describing wavelike phenomena, whether it's sound, electricity or subatomic particles, we're also interested in how the wave evolves and changes. Complex operations are eminently suited for this, because they naturally take place on circles. Numbers that oppose can cancel out, numbers in the same direction will amplify each other, just like two waves do when they meet. And by folding or unfolding, we can alter the frequency of a pattern, doubling it, halving it, or anything in between.

More complicated operations are used for example to model electromagnetic waves, whether they are FM radio, wifi packets or ADSL streams. This requires precise control of the frequencies you're generating and receiving. Doing it without complex numbers, well, it just sucks. So why use boring real numbers, when complex numbers can do the work for you?





  
    
  

  
    
      $$ w(x) = \sin(x) $$

      Take for example a sine wave $ w(x) $.
    

    
      
        $$
          w(x, t) = \sin(x - t) $$
        $$  \class{blue}{\frac{\partial w(x, t)}{\partial t}}
        $$
      

      For the wave to propagate across a distance, its values have to ripple up and down over time.
The rate of change over time is drawn on top. This is the vertical velocity at every point. Both the wave and its rates of change undergo a complicated numerical dance.
    

    
      
        $$
          w(x, t) = \sin(x - t) $$
        $$  \class{blue}{\frac{\partial w(x, t)}{\partial t}} \,\, \class{green}{\frac{\partial^2 w(x, t)}{\partial t^2}}
        $$
      

      But to properly describe this motion, we have to go one level deeper. We have to examine the rate of change of the vertical velocity of the wave. This is its vertical acceleration. We see that green vectors tug on blue vectors as blue vectors tug on the wave.
    

    
      
        $$
          w(x, t) = \sin(x - t) $$
        $$  \class{green}{\frac{\partial^2 w(x, t)}{\partial t^2}} = \,?
        $$
      

      It's easier to see what's going on if we center the vectors vertically. The acceleration appears to be equal but opposite to the wave itself.
    

    
      
        $$
          w(x, t) = \sin(x - t) + 1 $$
        $$ \class{green}{\frac{\partial^2 w(x, t)}{\partial t^2}} = \,?
        $$
      

      But that's just a lucky coincidence. If we shift the wave up by one unit, its opposite shifts down by a unit. Yet its velocity and acceleration are unaltered. So acceleration is not simply the opposite of the wave.
    

    

      What's actually going on is that the green vectors match the curvature of the wave, positive inside valleys, negative on top of crests. Intuitively, this can be explained by saying that waves tend to bounce towards an average level: this is going to pull the value up out of valleys and down from peaks.
    
    
    
      
        $$
          w(x, t) = \sin(x - t) + 1 $$
        $$  \class{green}{\frac{\partial^2 w(x, t)}{\partial t^2}} = \class{red}{\frac{\partial^2 w(x, t)}{\partial x^2}}
        $$
      

      
      But curvature is the rate of change of the slope, and slope is the rate of change over a distance. So to describe real waves, we need to relate 'second level' change over time to 'second level' change over distance, differentiating twice in each case. This is Complicated with a capital C.
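The relation between acceleration and curvature can be checked numerically for the shifted sine wave. This sketch estimates both second derivatives with central differences; the helper `second_difference`, the step size and the sample point are assumptions for illustration.

```python
import math

def wave(x, t):
    """The shifted travelling wave from the example."""
    return math.sin(x - t) + 1

def second_difference(f, x, t, axis, h=1e-3):
    """Central-difference estimate of the second derivative along one axis."""
    if axis == "x":
        return (f(x + h, t) - 2 * f(x, t) + f(x - h, t)) / h ** 2
    return (f(x, t + h) - 2 * f(x, t) + f(x, t - h)) / h ** 2

x, t = 0.7, 0.3
acceleration = second_difference(wave, x, t, "t")
curvature = second_difference(wave, x, t, "x")
print("acceleration:", acceleration)
print("curvature:   ", curvature)
print("minus the wave:", -wave(x, t))
```

The acceleration matches the curvature, while the negated wave is off by the one unit of vertical shift.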
    

    
      Let's try this with complex numbers instead. Until now, we had a 2D graph, showing the real value of the wave over real distance. We're going to make the wave's value complex. Mapping a 1D number (distance) to a 2D number (the wave function), means we need a 3D diagram.
    

    
      The complex plane is mapped into the old Y direction (real) and the new Z direction (imaginary).
    

    
      
        $$ w(x) = (1 \angle x) $$
      

      To make a complex wave, we do the thing complex numbers are best at: we make them turn, and make a helix. In this case, our wave function is simply the variable number $ 1 \angle x $, a constant length with a smoothly changing rotation over distance.
    

    
      
        $$ w(x, t) = (1 \angle x) \cdot (1 \angle t) = 1 \angle (x + t) $$
      
      
        $$  \class{blue}{\frac{\partial w(x, t)}{\partial t}} = \,? $$
      

      To make the wave move, we can simply twist it in place, which we now know is the same as multiplying by $ 1 \angle t $, a number whose angle increases over time. If we plot the complex velocity of each point, at first sight this might not look any simpler than the real wave. But in fact, these vectors are not changing in length at all, unlike the real version. As the wave is pulled by the velocity vectors, both undergo a pure rotation.
    

    
      
        $$  \class{blue}{\frac{\partial w(x, t)}{\partial t}} = i \cdot w(x, t)
        $$
      

      At all times, the velocity is offset by $ 90^\circ $ from the wave itself. And that means that described in complex numbers, wave equations are super easy. Instead of involving two derivatives, i.e. the rate of rate of change, we only need one. There is a direct relationship between a value and its rate of change. The necessary rotation by $ 90^\circ $ can then be written simply as multiplying by $ i $.
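To see where the factor $ i $ comes from, here is a short derivation. It assumes the standard identity $ 1 \angle \theta = \cos\theta + i \sin\theta $, which is just the X and Y coordinate form described earlier.

$$ w(x, t) = 1 \angle (x + t) = \cos(x + t) + i \sin(x + t) $$

$$ \frac{\partial w(x, t)}{\partial t} = -\sin(x + t) + i \cos(x + t) = i \, (\cos(x + t) + i \sin(x + t)) = i \cdot w(x, t) $$

The middle step uses $ i \cdot i = -1 $ to pull a factor of $ i $ out of both terms.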
    

    

      To recover a real wave from a complex wave, we can simply flatten it back to 2D, discarding the imaginary part. By using complex numbers to describe waves, we give them the power to rotate in place without changing their amplitude, which turns out to be much simpler.
    

    

      
        $$ \frac{1}{2} (\class{blue}{ 1 \angle (x + t) } + \class{green}{ 1 \angle -(x + t) }) = \cos(x + t) $$
      

      In fact, flattening the wave has a perfectly reasonable complex interpretation: it's what happens when we average out a counter-clockwise wave (positive frequency) with a clockwise wave (negative frequency). By twisting each in opposite directions, the combined wave travels along, locked to the real number line.
    

    

      
        $$ \frac{1}{2} (\class{blue}{ 1 \angle (x + t) } + \class{green}{ 1 \angle -(\frac{3}{2}x + t) }) = \,? $$
      

      But if we add up two arbitrary complex frequencies, their sum immediately turns into a spirograph pattern that manages to evolve and propagate, even as it just rotates in place. Though the original waves both had a constant amplitude of $ 1 $, the relative differences in angles (i.e. the phase) allow them to cancel out in surprising ways.
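The question mark in the formula above can be resolved with one worked step, offered here as a sketch in the same $ \angle $ notation. Write $ a = x + t $ and $ b = -(\frac{3}{2}x + t) $, then split both angles into their mean $ m = \frac{a + b}{2} = -\frac{x}{4} $ and half-difference $ d = \frac{a - b}{2} = \frac{5}{4}x + t $, so that $ a = m + d $ and $ b = m - d $. The averaging rule for two counter-rotating waves then applies to $ d $:

$$ \frac{1}{2} (1 \angle a + 1 \angle b) = (1 \angle m) \cdot \frac{1}{2} (1 \angle d + 1 \angle -d) = \cos(d) \cdot (1 \angle m) $$

$$ \frac{1}{2} (1 \angle (x + t) + 1 \angle -(\frac{3}{2}x + t)) = \cos(\frac{5}{4}x + t) \cdot (1 \angle -\frac{x}{4}) $$

The angle $ -\frac{x}{4} $ does not depend on time, so each point of the sum only moves back and forth along a fixed line through zero. The amplitude $ \cos(\frac{5}{4}x + t) $ is what travels along $ x $, and it vanishes wherever $ \frac{5}{4}x + t $ is an odd multiple of $ 90^\circ $.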
    
    
    
      Neither curve is actually moving forward: they're just spinning in place, creating motion anyway. This is actually what quantum superposition looks like, where two or more complex probability waves combine and interfere. Where the result cancels out to zero, that's where two separate possible states are cancelling each other out, creating interference. That the underlying numbers are complex doesn't prevent them from describing real physics; indeed, it seems that's how nature actually works.
    

    
      This serene display hides a whirlwind of phase. We can plot the velocity of the two frequencies, and their combination, scaled down for clarity. Once again you can see the power of describing waves with complex numbers, letting you split up a complicated motion into simple, repetitive rotations… into numbers that like to turn.
    

  

  







The End Is Just The Beginning

In visualizing complex waves, we've seen functions that map real numbers to complex numbers, and back again. These can be graphed easily in 3D diagrams, from $ \mathbb{R} $ to $ \mathbb{C} $ or vice-versa. You cross 1 real dimension with the 2 dimensions of the complex plane.

But complex operations in general work from $ \mathbb{C} $ to $ \mathbb{C} $. To view these, unfortunately you need four-dimensional eyes, which nature has yet to provide. There are ways to project these graphs down to 3D that still somewhat make sense, but it never stops being a challenge to interpret them.

For every mathematical concept that we have a built-in intuition for, there are countless more we can't picture easily. That's the curse of mathematics, yet at the same time, also its charm.

Hence, I tried to stick to the stuff that is (somewhat!) easy to picture. If there's interest, a future post could cover topics like: the nature of $ e^{ix} $, Fourier transforms, some actual quantum mechanics, etc.

For now, this story is over. I hope I managed to spark some light bulbs here and there, and that you enjoyed reading it as much as I did making it.

Comments, feedback and corrections are welcome on Google Plus. Diagrams powered by MathBox.

More like this: To Infinity… And Beyond!

For extra credit: check out these great stirring visualizations of Julia and Mandelbrot sets. I incorporated a similar graphic above. Hat tip to Tim Hutton for pointing these out. And for some actual paper mathematical origami, check out Vihart's latest video on Snowflakes, Starflakes and Swirlflakes.






Making MathBox
2012-11-14T00:00:00+01:00


  Presentation-Quality Math with Three.js and WebGL






  
  A fun little graph involving rational functions on the real projective line.




  For most of my life, I've found math to be a visual experience. My math scores went from crap to great once I started playing with graphics code, found some demoscene tutorials, and realized I could reason about formulas by picturing the graphs they create. I could apply operators by learning how they morph, shift, turn and fold those graphs and create symmetries. I could remember equations and formulas more easily when I could layer on top the visual relationships they embody. I was less likely to make mistakes when I could augment the boring symbolic manipulation with a mental set of visual cross-checks.

  So, when tasked with holding a conference talk on how to make things out of math at Full Frontal—later redone at Web Directions Code—I knew the resulting presentation would have to consist of intricate visualizations as the main draw, with whatever I had to say as mere glue to hold it together.

  The problem was, I didn't know of a good tool to do so, and creating animations by hand would probably be too time consuming. With the writings of Paul Lockhart and Bret Victor firmly in mind, I also knew I wanted to start blogging more about mathematical concepts in a non-traditional way, showing the principles of calculus, analysis and algebra the way I learnt to see them in my head, rather than through the obscure symbols served up in engineering school.

  So I set out to create that tool, keeping in mind the most important lesson I've picked up as a web developer: one cannot overstate the value in being able to send someone a link and have it just work, right there. It was obvious it would have to be browser-based.

  2015 Update: MathBox has evolved, version 2 is now available.




  
    
    
    Presentation Video (updated)
    
  



  
    
    
    Slide Deck (updated)
    
  




  Choose your Poison

  Now, when people think of graphs in a browser, the natural thought is vector graphics and SVG, which quickly leads to visualization powerhouse d3.js. It really is an amazing piece of tech with a vast library of useful code to accompany it. When I wrapped my head around how d3's enter/exit selections are implemented and how little it actually does to achieve so much, I was blown away. It's just so elegant and simple.

  Unfortunately, d3's core is intricately tied to the DOM through SVG and CSS. And that means ironically that d3 is not really capable of 3D. Additionally, d3 is a power tool that makes no assumptions: it is up to you to choose which visual elements and techniques to use to make your diagrams, and as such it is more like assembly language for graphs than a drop-in tool. These two were show stoppers.

  For one, manually designing layouts, grids, axes, etc. every time is tedious. You should be able to drop in a mathematical expression with as little fanfare as possible and have it come out looking right. This includes sane defaults for transitions and animations.




  




  For another, I've found that, when in doubt, adding an extra dimension always helps. The moment I finally realized that every implicit graph in N dimensions is really just a slice of an explicit one in N+1 dimensions, a ridiculous amount of things clicked together. And it took until years after studying signal processing to at long last discover the 4D picture of complex exponentiation that tied the entire thing together (projected into 3D below): it revealed the famous "magic formula" involving e, i and π to be a meaningless symbological distraction, a pinhole view of a much larger, much more beautiful structure, underpinning every Fourier and Z transform I'd ever encountered.




  
    $$ e^{i\pi} = -1 $$
  
  This particular formula is not that important.

  
    $$ e^{x+iy} = e^x \cdot e^{iy} = e^x \angle y $$
    
  
  This one is (∠ = rotate by).
Unfortunately, it has a four-dimensional graph.
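As a small illustration of the second formula, here is a sketch that treats a complex number as a `[re, im]` pair. The function name `cexp` is made up for this example. The real part of the input sets the length and the imaginary part sets the angle, and the first formula falls out as the special case where x is 0 and y is π.

```javascript
// Complex exponential of x + iy, returned as a [re, im] pair.
// The length is e^x and the angle is y, i.e. e^x ∠ y.
function cexp(x, y) {
  const length = Math.exp(x);
  return [length * Math.cos(y), length * Math.sin(y)];
}

// x = 0, y = π: length 1, rotated half a turn, so very nearly [-1, 0].
console.log(cexp(0, Math.PI));

// Increasing x only scales the result; increasing y only rotates it.
console.log(cexp(1, 0), cexp(0, Math.PI / 2));
```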
  



  




  So, WebGL it was, because I needed 3D. Unfortunately that meant the promise of having it just work everywhere was tempered by a lack of browser support, but I would certainly hope that's something we can overcome sooner rather than later. Dear Apple and Microsoft: get your shit together already. Dear Firefox and Opera: your WebGL performance could be a lot better.

  Shady Dealings

  These days I don't really touch WebGL without going through Three.js first. Three.js is a wonderful, mature engine that contains tons of useful high-level components. At the same time, it also does a great job in just handling the boilerplate of WebGL while not getting in the way of doing some heavy lifting yourself.

  Rendering vector-style graphics with WebGL is not hard, certainly easier than photorealistic 3D. Primitives like lines and points are sized in absolute pixels by default, and with hardware multisampling for anti-aliasing, you get somewhat decent image quality out of it. Though, as is typical for a Web API, we're treated like children and can only cross our fingers and request anti-aliasing politely, hoping it will be available. Meanwhile native developers have full control over speed and quality and can adjust their strategy to the specific hardware's capabilities. The more things change... And then Chrome decided to disable anti-aliasing altogether due to esoteric security issues with buggy drivers. Bah.

  Now, when rendering with WebGL, you really have two options. One is to just treat it as a dumb output layer, loading or generating all your geometry in JavaScript and rendering it directly in 3D. With the speed of JS engines today, this can get you pretty far.




  The second option is to leverage the GPU's own capabilities as much as possible, doing computations in GLSL through so-called vertex and fragment shader programs. These are run for every vertex in a mesh and every pixel being drawn, respectively, and have been the main force driving innovation in real-time graphics for the past decade. With the goal of butter-smooth 60fps graphical goodness, this seemed like the better choice.

  Unfortunately, GLSL shaders are rather monolithic things. While you do have the ability to create subroutines, every shader still has to be a stand-alone program with its own main() function. This means you either need to include a shader for every possible combination of operations, or generate shader code dynamically by concatenating pre-made snippets or using #ifdef switches to knock them out. This is the approach taken by Three.js, which results in some very hairy code that is neither easy to read nor easy to maintain.
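To make the #ifdef approach concrete, here is a minimal sketch of generating shader variants from one template. It is not Three.js's actual code: the template, the `USE_MAP` flag and the `buildShader` helper are illustrative assumptions. Only the idea is taken from the paragraph above, namely that optional blocks are fenced off with #ifdef and switched on by prepending #define lines.

```javascript
// Hypothetical fragment shader template: the texture lookup is optional.
const fragmentTemplate = `
precision mediump float;
uniform vec3 color;
#ifdef USE_MAP
uniform sampler2D map;
varying vec2 vUV;
#endif

void main() {
  vec4 result = vec4(color, 1.0);
#ifdef USE_MAP
  result *= texture2D(map, vUV);
#endif
  gl_FragColor = result;
}
`;

// Prepend one #define per enabled feature; the GLSL preprocessor
// then knocks out every block whose flag is not defined.
function buildShader(template, features) {
  const defines = features.map((name) => `#define ${name}`).join('\n');
  return defines ? `${defines}\n${template}` : template;
}

const textured = buildShader(fragmentTemplate, ['USE_MAP']);
const flat = buildShader(fragmentTemplate, []);
console.log(textured.split('\n')[0]); // the injected #define line
```

Every additional flag doubles the number of possible variants hiding inside one template, which is where the readability problems come from.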

  Having made a prototype, I knew I wanted to show continuous transitions between various coordinate systems (e.g. polar and spherical), knew I needed to render shaded and unshaded geometry, and knew I would need to slot in specific snippets for things like point sprites, bezier curves/surfaces, dynamic tick marks, and more. Sorting this all out Three.js-style would be a nightmare.





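precision mediump float; // required in raw WebGL fragment shaders

uniform sampler2D texture; // image to sample
varying vec2 vUV;          // interpolated texture coordinate

void main() {
  // Look up this fragment's color in the texture.
  gl_FragColor = texture2D(texture, vUV);
}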


A pixel or fragment shader that looks up a pixel's color in a texture.


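uniform mat4 projectionMatrix; // camera space to clip space
uniform mat4 modelViewMatrix;  // object space to camera space
attribute vec4 position;       // per-vertex position
attribute vec2 uv;             // per-vertex texture coordinate
varying vec2 vUV;              // passed on to the fragment shader

void main() {
  // Hand the UV coordinates to the fragment shader for interpolation.
  vUV = uv;

  // Project the 3D position into clip space.
  gl_Position = projectionMatrix
              * modelViewMatrix
              * position;
}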



A vertex shader that projects a 3D position into 2D by applying two matrices. It also provides UV coordinates for the texture look up.






var graph =
  factory
    .snippet('split')
    .group()
      .snippet('top')
    .next()
      .snippet('middle')
    .next()
      .snippet('bottom')
    .combine()
    .snippet('join')
    .end();


ShaderGraph's factory API lets you build shader chains with very little hassle. In this case, the names refer to IDs of <script> tags in the source.





  So I wrote a library to solve that problem, called ShaderGraph.js. It is best described as a smart code-concatenator, a few steps short of writing a full blown compiler. You feed it snippets of GLSL code, each with one or more inputs and outputs, and these get parsed and turned into lego-like building blocks. Each input/output becomes an outlet, and outlets are wired up in a typical dataflow style. Given a graph of connected snippets, it can be compiled back into a program by assembling the subroutines, assigning intermediate variables and constructing an appropriate main() function to invoke them. It also exports a list of all external variables, i.e. GLSL uniforms and attributes, so you can control the program's behavior easily.

  If I'd stopped there however, I'd have just replaced the act of manual code writing with that of manually wiring graphs. So I applied the principle of convention-over-configuration instead: you tell ShaderGraph to connect two snippets, and it will automatically match up outlets by name and type. This is augmented by a chainable factory API, which allows you to pass a partially built graph around. It allows different classes to work together to build shaders, each inserting their own snippets into the processing chain.
  
For example, to render a Bezier surface, the vertex shader is composed of: cubic interpolation, viewport transform (position + tangents), normal calculation and lighting. When transforming to e.g. a polar viewport, the surface normals are seamlessly recalculated. It really works like magic and I can't wait to use this in my next WebGL projects.
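To give a feel for what a "smart code-concatenator" does, here is a heavily simplified sketch. It is not ShaderGraph's implementation or API: the `snippets` library and the `compileChain` function are hypothetical, and every snippet is assumed to be a single GLSL function that takes and returns a vec3, which sidesteps the parsing and outlet matching described above. What it does show is the final step: assembling the subroutines, assigning intermediate variables and constructing a main() that invokes them in order.

```javascript
// Hypothetical snippet library. Assumption: one vec3 in, one vec3 out.
const snippets = {
  scale: 'vec3 scale(vec3 p) { return p * 2.0; }',
  polar: 'vec3 polar(vec3 p) { return vec3(p.y * cos(p.x), p.y * sin(p.x), p.z); }',
};

// Turn an ordered list of snippet names into a stand-alone vertex shader.
function compileChain(names, library) {
  const functions = names.map((name) => {
    if (!Object.prototype.hasOwnProperty.call(library, name)) {
      throw new Error(`Unknown snippet: ${name}`);
    }
    return library[name];
  });

  // Wire each snippet's output into the next one via intermediate variables.
  const calls = names.map((name, i) => `  vec3 v${i + 1} = ${name}(v${i});`);

  return [
    'attribute vec3 position;',
    ...functions,
    'void main() {',
    '  vec3 v0 = position;',
    ...calls,
    `  gl_Position = vec4(v${names.length}, 1.0);`,
    '}',
  ].join('\n');
}

console.log(compileChain(['scale', 'polar'], snippets));
```

A real graph compiler also has to handle branching, multiple outlets per snippet and the list of exported uniforms, none of which this linear chain attempts.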




  
  Viewports, Primitives and Renderables

  At its core, Three.js matches pretty directly with WebGL. You can insert objects such as a Mesh, Line or ParticleSystem into your scene, each of which invokes a specific GL drawing command with high efficiency. As such, I certainly didn't want to reinvent the wheel.

  Hence, MathBox is set up as a sort of scene-manager-within-a-scene-manager. It's a little sandbox that speaks the language of math, allowing you to insert various primitives like curves, vectors, axes and grids. Each of these primitives then instantiates one or more renderables, which simply wrap a native Three.js object and its associated ShaderGraph material. Thus, once instantiated, MathBox gets out of the way and Three.js does the heavy lifting as normal. You can even insert multiple mathboxen into a Three.js scene if you like, mixed in with other objects.



  For example, a vector primitive is rendered as an arrow: it consists of a shaft and an arrowhead, realized as a line segment and a cone. An axis primitive is an arrow as well, but it also has tick marks (specially transformed line segments), and is positioned implicitly just by specifying the axis' direction rather than a start and end point.

  To render curves and surfaces, you can either specify an array of data points or a live expression to be evaluated at every point. This turned out to be essential for the kinds of intricate visualizations I wanted to show, my slides being driven by timed clocks, shared arrays of data points, and live formulas and interpolations. I even fed in data from a physics engine, and it worked perfectly.

  This is all tied together through Viewport objects, which define a specific mapping from a mathematical coordinate space into the 3D world space of Three.js. For example, the default cartesian viewport has the range [–1, 1] in the X, Y and Z directions. Altering the viewport's extents will shift and scale anything rendered within, as well as reflow grids and tick marks on each axis.
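As a rough sketch of what such a mapping involves, the following assumes that a cartesian viewport linearly maps its range on each axis onto a fixed world-space cube from -1 to 1. That target cube and the `makeViewport` helper are assumptions made for illustration; MathBox's own Viewport classes are not shown here.

```javascript
// Build a function that maps math coordinates into world space.
// `range` holds one [min, max] pair per axis (X, Y, Z).
// Assumption: world space is the cube from -1 to 1 on every axis.
function makeViewport(range) {
  return function toWorld(point) {
    return point.map((value, axis) => {
      const [min, max] = range[axis];
      // Normalize to 0..1 within the range, then stretch to -1..1.
      return ((value - min) / (max - min)) * 2 - 1;
    });
  };
}

const toWorld = makeViewport([[0, 2], [0, 1], [-3, 3]]);
console.log(toWorld([1, 0.5, 0])); // center of the range
console.log(toWorld([0, 1, 3]));   // a corner of the range
```

Changing the range passed to `makeViewport` shifts and scales every point run through it, which is the effect the paragraph above describes for altering a viewport's extents.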

  There are two more sophisticated viewport types, polar and spherical, which each apply the relevant coordinate transform, and can transition smoothly to and from cartesian. More viewport types can be added: all that is required is to define an appropriate transformation in JavaScript and GLSL. That said, defining a seamless transition to and from cartesian space is not always easy, particularly if you want to preserve the aspect-ratio through the entire process.
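The simplest possible transition is a straight blend between the untransformed point and its polar image, sketched below. This is deliberately naive and is not MathBox's transition: the `polarBlend` name, the convention that x is the angle in radians and y the radius, and the linear blend itself are all assumptions. In particular, it makes no attempt to preserve the aspect ratio along the way.

```javascript
// Linear interpolation helper.
const lerp = (a, b, k) => a + (b - a) * k;

// Blend a 2D point between cartesian (k = 0) and polar (k = 1).
// Assumption: x is read as the angle in radians, y as the radius.
function polarBlend([x, y], k) {
  const polarX = y * Math.cos(x);
  const polarY = y * Math.sin(x);
  return [lerp(x, polarX, k), lerp(y, polarY, k)];
}

const point = [Math.PI / 2, 2];
console.log(polarBlend(point, 0));   // unchanged cartesian point
console.log(polarBlend(point, 0.5)); // halfway through the transition
console.log(polarBlend(point, 1));   // radius 2 at a quarter turn, very nearly [0, 2]
```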

  Interpolate all the things!

  Finally, I had to tackle the problem of animation, keeping in mind a tip I learnt from the ever so mindbending Vihart: "If I can draw the point of a sentence, I don't actually need to say the sentence." This applies doubly so for animation: every time you replace a "before" and "after" with a smooth transition, your audience implicitly understands the change rather than having to go look for it.

  Hence, each primitive can be fully animated. Each has a set of options (controlling behavior) and styles (controlling GLSL shaders), and there is a universal animator that can interpolate between arbitrary data types in a smart fashion.

  For example, given a viewport with the XYZ range [[–1, 1], [–1, 1], [–1, 1]], you can tell it to animate to [[0, 2], [0, 1], [–3, 3]], and it just works. The animator will recursively animate each subarray's elements, and any dependent objects like grids and axes will reflow to match the intermediate values. This works for colors, vectors and matrices too. In case of live curves with custom expressions, the animator will invoke both the old and the new, and interpolate between the results.
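The recursive idea can be sketched in a few lines. This is an illustration under assumptions rather than MathBox's animator: the `interpolate` function is hypothetical, it only handles numbers, nested arrays and functions, and it uses plain linear interpolation with no easing.

```javascript
// Interpolate between two values of the same shape, with t from 0 to 1.
function interpolate(from, to, t) {
  // Arrays: recurse into each element, so nested ranges just work.
  if (Array.isArray(from) && Array.isArray(to)) {
    if (from.length !== to.length) throw new Error('Array lengths differ');
    return from.map((value, i) => interpolate(value, to[i], t));
  }
  // Live expressions: invoke both the old and the new, blend the results.
  if (typeof from === 'function' && typeof to === 'function') {
    return (...args) => interpolate(from(...args), to(...args), t);
  }
  // Numbers: plain linear interpolation.
  if (typeof from === 'number' && typeof to === 'number') {
    return from + (to - from) * t;
  }
  throw new Error('Cannot interpolate between these values');
}

const before = [[-1, 1], [-1, 1], [-1, 1]];
const after = [[0, 2], [0, 1], [-3, 3]];
console.log(interpolate(before, after, 0.5)); // [[-0.5, 1.5], [-0.5, 1], [-2, 2]]

// Halfway between the curves y = x and y = x², evaluated at x = 4.
console.log(interpolate((x) => x, (x) => x * x, 0.5)(4)); // 10
```

Colors, vectors and matrices fit the same scheme once they are stored as arrays of numbers.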




  
    
      
    
    
      
      
      
      
      
      
      
    
  



  However, executing animations manually in code is tedious, particularly in a presentation, where you want to be able to step forward and backward. So I added a Director class whose job it is to coordinate things. All you do is feed it a script of steps (add this object, animate that object). Then, as it applies them, it remembers the previous state of each object and generates an automatic rollback script. It also contains logic to detect rapid navigation, and will hurry up animations appropriately. This avoids that agonizing situation of watching someone skip through their slide deck, playing the same cheesy PowerPoint transitions over and over again.
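The rollback mechanism can be sketched as follows. This is a toy model and not MathBox's Director: the class below, its action format (`add`, `remove`, `set`) and the use of a plain Map as the scene are all assumptions, and "animate" is reduced to setting properties immediately. The point is only to show how applying a step can produce its own undo script.

```javascript
// Toy director: steps through a script and can roll each step back.
class Director {
  constructor(script) {
    this.script = script;    // array of steps, each an array of actions
    this.scene = new Map();  // object id -> properties
    this.rollbacks = [];     // one generated undo script per applied step
  }

  // Apply a single action and return the action that would undo it.
  apply({ type, id, props }) {
    if (type === 'add') {
      this.scene.set(id, { ...props });
      return { type: 'remove', id };
    }
    if (type === 'remove') {
      const previous = this.scene.get(id);
      this.scene.delete(id);
      return { type: 'add', id, props: previous };
    }
    if (type === 'set') {
      const object = this.scene.get(id);
      const previous = {};
      // Remember only the properties that are about to change.
      for (const key of Object.keys(props)) previous[key] = object[key];
      Object.assign(object, props);
      return { type: 'set', id, props: previous };
    }
    throw new Error(`Unknown action type: ${type}`);
  }

  forward() {
    const step = this.script[this.rollbacks.length];
    if (!step) return false;
    // Undo actions must run in the opposite order.
    this.rollbacks.push(step.map((action) => this.apply(action)).reverse());
    return true;
  }

  back() {
    const undo = this.rollbacks.pop();
    if (!undo) return false;
    undo.forEach((action) => this.apply(action));
    return true;
  }
}

const director = new Director([
  [{ type: 'add', id: 'curve', props: { color: 'red' } }],
  [{ type: 'set', id: 'curve', props: { color: 'blue' } }],
]);
director.forward();
director.forward();
console.log(director.scene.get('curve').color); // blue
director.back();
console.log(director.scene.get('curve').color); // red
```

Because each undo script is generated from the state at the time the step ran, the script author never has to write the backward direction by hand.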

  Presenting Naturally
  
  With MathBox's core working, it was time to build my slides for the conference. After a brief survey, I settled on deck.js as an HTML5 slidedeck solution that was clean and flexible enough for my purposes. However, while MathBox can be spawned inside any DOM element, it wouldn't work to insert a dozen live WebGL canvases into the presentation. The entire thing would grind to a halt or at least become very choppy.
  
  So instead, I integrated each MathBox graphic as an IFRAME, and added some logic that only loads each IFRAME one slide before it's needed, and unloads it one slide after it's gone off screen. To sync up with the main presentation, all deck.js navigation events were forwarded into each active IFRAME using window.postMessage. With the MathBox Director running inside, this was very easy to do, and meant that I could skip around freely during the talk, without any worries of desynchronization between MathBox and the associated HTML5 overlays.
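The two pieces of logic described above can be sketched like this. The helper names `shouldBeLoaded` and `broadcast` and the message shape are assumptions for illustration, not the code used in the talk. The only real API involved is `postMessage(message, targetOrigin)` on an IFRAME's `contentWindow`.

```javascript
// A slide's IFRAME stays loaded from one slide before it is shown
// until one slide after it has gone off screen.
function shouldBeLoaded(slideIndex, currentIndex) {
  return Math.abs(slideIndex - currentIndex) <= 1;
}

// Forward a navigation event to every frame that is currently loaded.
// `frames` is any iterable of IFRAME elements.
function broadcast(frames, message, targetOrigin) {
  for (const frame of frames) {
    // Unloaded frames have no contentWindow to talk to.
    if (frame.contentWindow) {
      frame.contentWindow.postMessage(message, targetOrigin);
    }
  }
}

// With slide 2 on screen, slides 1 to 3 keep their IFRAMEs.
console.log([0, 1, 2, 3, 4].filter((i) => shouldBeLoaded(i, 2))); // [1, 2, 3]

// In a browser, usage might look like this (hypothetical message shape):
//   broadcast(document.querySelectorAll('iframe'),
//             { type: 'slide', step: 3 }, window.location.origin);
```

Inside each IFRAME, a `message` event listener on `window` would receive the event and pass the step on to whatever drives the diagram.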
  
  In fact, I applied a similar principle to this post. To avoid rendering all diagrams simultaneously and spinning up laptop fans more than necessary, each MathBox IFRAME is started as it scrolls into view and stopped once it's gone.
  
  I've also found that having a handheld clicker makes a huge difference while speaking, as it allows you to gesture freely and move around. So, I grabbed the infrared remote code from VLC and built a simple bridge from Cocoa to Node.js to WebSocket to allow the remote to work in a browser. It's a shame Apple's decided to discontinue IR ports on their laptops. I guess I'll have to come up with a Bluetooth-based solution when I upgrade my hardware.

  Towards MathBox 1.0

  In its current state, MathBox is still a bit rough. The selection of primitives and viewports is limited, and only includes the ones I needed for my presentation. That said, it is obvious you can already do quite a lot with it, and I couldn't have been happier to hear that all this effort had the desired response at the conference. I wasn't 100% sure whether other people would have the same a-ha moments that I've had, but I'm convinced more than ever that seeing math in motion is essential for honing our intuition about it. MathBox not only makes animated diagrams much easier to make and share, but it also opens the door to making them interactive in the future.

  I plan to continue to evolve MathBox as needed by using it on this site and addressing gaps that come up, though I've already identified a couple of sore points:

  
  I used tQuery as a boilerplate and because I liked the idea of having a chainable API for this. However, this also means it's currently running off an outdated version of Three.js. I need to look into updating and/or dropping tQuery.
    Update: MathBox has been updated to Three.js r53.
  Numeric or text labels are completely unsupported. It should be possible to use my CSS3D renderer for Three.js to layer on beautifully typeset MathJax formulas, positioning them correctly in 3D on top of the WebGL render.
    Update: I've added labeling for axes. I've integrated MathJax, but it's tricky because the typesetting is painfully slow in the middle of a 60fps render. But it's automatically used if MathJax is present.
  All styles have to be specified on a per-object basis. Some form of stylesheet, default styles or class mechanism to allow re-use seems like an obvious next step.
  There are undoubtedly memory leaks, as I was focused first and foremost on getting it to work.
  Expressions that don't change frame-to-frame are still continuously re-evaluated, which is wasteful. There is a live: false flag you can set on objects, but it triggers a few bugs here and there.
  There needs to be a predictable, built-in way of running a clock per slide to sync custom expressions off of. In my presentation I used a hack of clocks that start once first invoked, but this lacks repeatability.
    Update: I added a director.clock() method that gives you a clock per slide.
  

  Finally, it doesn't take much to imagine a MathBox Editor that would allow you to build diagrams visually rather than having to use code like I did. However, that's a can of worms I'm not going to open by myself, especially because the API is already quite straightforward to use, and the library itself is still a bit in flux. Perhaps this could be done as an extension of the Three.js editor.

  You can see what MathBox is really capable of in the conference video. I invite you to play around with MathBox and see what you can make it do. Contributions are welcome, and the architecture is modular enough to allow its functionality to grow for quite some time.


ShaderGraph 2

`$ cat *.glsl | magic`