Acko.net

Stable Fiddusion

2023-10-02T00:00:00+02:00

Frequency-domain blue noise generator

In computer graphics, stochastic methods are so hot right now. All rendering turns into calculus, except you solve the integrals by numerically sampling them.

As I showed with Teardown, this is all based on random noise, hidden with a ton of spatial and temporal smoothing. For this, you need a good source of high quality noise. There have been a few interesting developments in this area, such as Alan Wolfe et al.'s Spatio-Temporal Blue Noise.

This post is about how I designed noise in frequency space. I will cover:

What is blue noise?
Designing indigo noise
How swap works in the frequency domain
Heuristics and analysis to speed up search
Implementing it in WebGPU

Along the way I will also show you some "street" DSP math. This illustrates how getting comfy in this requires you to develop deep intuition about complex numbers. But complex doesn't mean complicated. It can all be done on a paper napkin.

The WebGPU interface I built

What I'm going to make is this:

If properly displayed, this image should look eerily even. But if your browser is rescaling it incorrectly, it may not be exactly right.

Colorless Blue Ideas

I will start by just recapping the essentials. If you're familiar, skip to the next section.

Ordinary random generators produce uniform white noise: every value is equally likely, and the average frequency spectrum is flat.

Time domain

Frequency domain

To a person, this doesn't actually seem fully 'random', because it has clusters and voids. Similarly, a uniformly random list of coin flips will still have long runs of heads or tails in it occasionally.

What a person would consider evenly random is usually blue noise: it prefers to alternate between heads and tails, and avoids long runs entirely. It is 'more random than random', biased towards the upper frequencies, i.e. the blue part of the spectrum.

Time domain

Frequency domain

Blue noise is great for e.g. dithering, because when viewed from afar, or blurred, it tends to disappear. With white noise, clumps remain after blurring:

Blurred white noise

Blurred blue noise

Blueness is a delicate property. If you have e.g. 3D blue noise in a volume XYZ, then a single 2D XY slice is not blue at all:

XYZ spectrum

XY slice

XY spectrum

The samples are only evenly distributed in 3D, i.e. when you consider each slice in front and behind it too.

Blue noise being delicate means that nobody really knows of a way to generate it statelessly, i.e. as a pure function f(x,y,z). Algorithms to generate it must factor in the whole, as noise is only blue if every single sample is evenly spaced. You can make blue noise images that tile, and sample those, but the resulting repetition may be noticeable.

Because blue noise is constructed, you can make special variants.

Uniform Blue Noise has a uniform distribution of values, with each value equally likely. An 8-bit 256x256 UBN image will have each unique byte appear exactly 256 times.
Projective Blue Noise can be projected down, so that a 3D volume XYZ flattened into either XY, YZ or ZX is still blue in 2D, and same for X, Y and Z in 1D.
Spatio-Temporal Blue Noise (STBN) is 3D blue noise created specifically for use in real-time rendering:
- Every 2D slice XY is 2D blue noise
- Every Z row is 1D blue noise

This means XZ or YZ slices of STBN are not blue. Instead, it's designed so that when you average out all the XY slices over Z, the result is uniform gray, again without clusters or voids. This requires the noise in all the slices to perfectly complement each other, a bit like overlapping slices of translucent swiss cheese.

This is the sort of noise I want to generate.

Indigo STBN 64x64x16

XYZ spectrum

Sleep Furiously

A blur filter's spectrum is the opposite of blue noise: it's concentrated in the lowest frequencies, with a bump in the middle.

If you blur the noise, you multiply the two spectra. Very little is left: only the ring-shaped overlap, creating a band-pass area.

This is why blue noise looks good when smoothed, and is used in rendering, with both spatial (2D) and temporal smoothing (1D) applied.

Blur filters can be designed. If a blur filter is perfectly low-pass, i.e. ~zero amplitude for all frequencies > $ f_{\rm{lowpass}} $ , then nothing is left of the upper frequencies past a point.

If the noise is shaped to minimize any overlap, then the result is actually noise free. The dark part of the noise spectrum should be large and pitch black. The spectrum shouldn't just be blue, it should be indigo.

When people say you can't design noise in frequency space, what they mean is that you can't merely apply an inverse FFT to a given target spectrum. The resulting noise is gaussian, not uniform. The missing ingredient is the phase: all the frequencies need to be precisely aligned to have the right value distribution.

This is why you need a specialized algorithm.

The STBN paper describes two: void-and-cluster, and swap. Both of these are driven by an energy function. It works in the spatial/time domain, based on the distances between pairs of samples. It uses a "fall-off parameter" sigma to control the effective radius of each sample, with a gaussian kernel.

$$ E(M) = \sum E(p,q) = \sum \exp \left( - \frac{||\mathbf{p} - \mathbf{q}||^2}{\sigma^2_i}-\frac{||\mathbf{V_p} - \mathbf{V_q}||^{d/2}}{\sigma^2_s} \right) $$

STBN (Wolfe et al.)

The swap algorithm is trivially simple. It starts from white noise and shapes it:

Start with e.g. 256x256 pixels initialized with the bytes 0-255 repeated 256 times in order
Permute all the pixels into ~white noise using a random order
Now iterate: randomly try swapping two pixels, check if the result is "more blue"

This is guaranteed to preserve the uniform input distribution perfectly.

The resulting noise patterns are blue, but they still have some noise in all the lower frequencies. The only blur filter that could get rid of it all, is one that blurs away all the signal too. My 'simple' fix is just to score swaps in the frequency domain instead.

If this seems too good to be true, you should know that a permutation search space is catastrophically huge. If any pixel can be swapped with any other pixel, the number of possible swaps at any given step is O(N²). In a 256x256 image, it's ~2 billion.

The goal is to find a sequence of thousands, millions of random swaps, to turn the white noise into blue noise. This is basically stochastic bootstrapping. It's the bulk of good old fashioned AI, using simple heuristics, queues and other tools to dig around large search spaces. If there are local minima in the way, you usually need more noise and simulated annealing to tunnel over those. Usually.

This set up is somewhat simplified by the fact that swaps are symmetric (i.e. (A,B) = (B,A)), but also that applying swaps S1 and S2 is the same as applying swaps S2 and S1 as long as they don't overlap.

Good Energy

Let's take it one hurdle at a time.

It's not obvious that you can change a signal's spectrum just by re-ordering its values over space/time, but this is easy to illustrate.

Take any finite 1D signal, and order its values from lowest to highest. You will get some kind of ramp, approximating a sawtooth wave. This concentrates most of the energy in the first non-DC frequency:

Now split the odd values from the even values, and concatenate them. You will now have two ramps, with twice the frequency:

You can repeat this to double the frequency all the way up to Nyquist. So you have a lot of design freedom to transfer energy from one frequency to another.

In fact the Fourier transform has the property that energy in the time and frequency domain is conserved:

$$ \int_{-\infty}^\infty |f(x)|^2 \, dx = \int_{-\infty}^\infty |\widehat{f}(\xi)|^2 \, d\xi $$

This means the sum of $ |\mathrm{spectrum}_k|^2 $ remains constant over pixel swaps. We then design a target curve, e.g. a high-pass cosine:

$$ \mathrm{target}_k = \frac{1 - \cos \frac{k \pi}{n} }{2} $$

This can be fit and compared to the current noise spectrum to get the error to minimize.

However, I don't measure the error in energy $ |\mathrm{spectrum}_k|^2 $ but in amplitude $ |\mathrm{spectrum}_k| $. I normalize the spectrum and the target into distributions, and take the L2 norm of the difference, i.e. a sqrt of the sum of squared errors:

$$ \mathrm{error}_k = \frac{\mathrm{target}_k}{||\mathbf{target}||} - \frac{|\mathrm{spectrum}_k|}{||\mathbf{spectrum}||} $$ $$ \mathrm{loss}^2 = ||\mathbf{error}||^2 $$

This keeps the math simple, but also helps target the noise in the ~zero part of the spectrum. Otherwise, deviations near zero would count for less than deviations around one.

Go Banana

So I tried it.

With a lot of patience, you can make 2D blue noise images up to 256x256 on a single thread. A naive random search with an FFT for every iteration is not fast, but computers are.

Making a 64x64x16 with this is possible, but it's certainly like watching paint dry. It's the same number of pixels as 256x256, but with an extra dimension worth of FFTs that need to be churned.

Still, it works and you can also make 3D STBN with the spatial and temporal curves controlled independently:

Converged spectra

I built command-line scripts for this, with a bunch of quality of life things. If you're going to sit around waiting for numbers to go by, you have a lot of time for this...

Save and load byte/float-exact state to a .png, save parameters to .json
Save a bunch of debug viz as extra .pngs with every snapshot
Auto-save state periodically during runs
Measure and show rate of convergence every N seconds, with smoothing
Validate the histogram before saving to detect bugs and avoid corrupting expensive runs

I could fire up a couple of workers to start churning, while continuing to develop the code liberally with new variations. I could also stop and restart workers with new heuristics, continuing where it left off.

Protip: you can write C in JavaScript

Drunk with power, I tried various sizes and curves, which created... okay noise. Each has the exact same uniform distribution so it's difficult to judge other than comparing to other output, or earlier versions of itself.

To address this, I visualized the blurred result, using a [1 4 6 4 1] kernel as my base line. After adjusting levels, structure was visible:

Semi-converged

Blurred

The resulting spectra show what's actually going on:

Semi-converged

Blurred

The main component is the expected ring of bandpass noise, the 2D equivalent of ringing. But in between there is also a ton of redder noise, in the lower frequencies, which all remains after a blur. This noise is as strong as the ring.

So while it's easy to make a blue-ish noise pattern that looks okay at first glance, there is a vast gap between having a noise floor and not having one. So I kept iterating:

It takes a very long time, but if you wait, all those little specks will slowly twinkle out, until quantization starts to get in the way, with a loss of about 1/255 per pixel (0.0039).

Semi converged

Fully converged

The effect on the blurred output is remarkable. All the large scale structure disappears, as you'd expect from spectra, leaving only the bandpass ringing. That goes away with a strong enough blur, or a large enough dark zone.

The visual difference between the two is slight, but nevertheless, the difference is significant and pervasive when amplified:

Semi converged

Fully converged

Difference

Final spectrum

I tried a few indigo noise curves, with different % of the curve zero. The resulting noise is all extremely equal, even after a blur and amplify. The only visible noise left is bandpass, and the noise floor is so low it may as well not be there.

As you make the black exclusion zone bigger, the noise gets concentrated in the edges and corners. It becomes a bit more linear and squarish, a contender for violet noise. This is basically a smooth evolution towards a pure pixel checkboard in the limit. Using more than 50% zero seems inadvisable for this reason:

Time domain

Frequency domain

At this point the idea was validated, but it was dog slow. Can it be done faster?

Spatially Sparse

An FFT scales like O(N log N). When you are dealing with images and volumes, that N is actually an N² or N³ in practice.

The early phase of the search is the easiest to speed up, because you can find a good swap for any pixel with barely any tries. There is no point in being clever. Each sub-region is very non-uniform, and its spectrum nearly white. Placing pixels roughly by the right location is plenty good enough.

You might try splitting a large volume into separate blocks, and optimize each block separately. That wouldn't work, because all the boundaries remain fully white. Overlapping doesn't fix this, because they will actively create new seams. I tried it.

What does work is a windowed scoring strategy. It avoids a full FFT for the entire volume, and only scores each NxN or NxNxN region around each swapped point, with N-sized FFTs in each dimension. This is enormously faster and can rapidly dissolve larger volumes of white noise into approximate blue even with e.g. N = 8 or N = 16. Eventually it stops improving and you need to bump the range or switch to a global optimizer.

Here's the progression from white noise, to when sparse 16x16 gives up, followed by some additional 64x64:

Time domain

Frequency domain

Time domain

Frequency domain

Time domain

Frequency domain

A naive solution does not work well however. This is because the spectrum of a subregion does not match the spectrum of the whole.

The Fourier transform assumes each signal is periodic. If you take a random subregion and forcibly repeat it, its new spectrum will have aliasing artifacts. This would cause you to consistently misjudge swaps.

To fix this, you need to window the signal in the space/time-domain. This forces it to start and end at 0, and eliminates the effect of non-matching boundaries on the scoring. I used a smoothStep window because it's cheap, and haven't needed to try anything else:

16x16 windowed data

$$ w(t) = 1 - (3|t|^2 - 2|t|^3) , t=-1..1 $$

This still alters the spectrum, but in a predictable way. A time-domain window is a convolution in the frequency domain. You don't actually have a choice here: not using a window is mathematically equivalent to using a very bad window. It's effectively a box filter covering the cut-out area inside the larger volume, which causes spectral ringing.

The effect of the chosen window on the target spectrum can be modeled via convolution of their spectral magnitudes:

$$ \mathbf{target}' = |\mathbf{target}| \circledast |\mathcal{F}(\mathbf{window})| $$

This can be done via the time domain as:

$$ \mathbf{target}' = \mathcal{F}(\mathcal{F}^{-1}(|\mathbf{target}|) \cdot \mathcal{F}^{-1}(|\mathcal{F}(\mathbf{window})|)) $$

Note that the forward/inverse Fourier pairs are not redundant, as there is an absolute value operator in between. This discards the phase component of the window, which is irrelevant.

Curiously, while it is important to window the noise data, it isn't very important to window the target. The effect of the spectral convolution is small, amounting to a small blur, and the extra error is random and absorbed by the smooth scoring function.

The resulting local loss tracks the global loss function pretty closely. It massively speeds up the search in larger volumes, because the large FFT is the bottleneck. But it stops working well before anything resembling convergence in the frequency-domain. It does not make true blue noise, only a lookalike.

The overall problem is still that we can't tell good swaps from bad swaps without trying them and verifying.

Sleight of Frequency

So, let's characterize the effect of a pixel swap.

Given a signal [A B C D E F G H], let's swap C and F.

Swapping the two values is the same as adding F - C = Δ to C, and subtracting that same delta from F. That is, you add the vector:

V = [0 0 Δ 0 0 -Δ 0 0]

This remains true if you apply a Fourier transform and do it in the frequency domain.

To best understand this, you need to develop some intuition around FFTs of Dirac deltas.

Consider the short filter kernel [1 4 6 4 1]. It's a little known fact, but you can actually sight-read its frequency spectrum directly off the coefficients, because the filter is symmetrical. I will teach you.

The extremes are easy:

The DC amplitude is the sum 1 + 4 + 6 + 4 + 1 = 16
The Nyquist amplitude is the modulated sum 1 - 4 + 6 - 4 + 1 = 0

So we already know it's an 'ideal' lowpass filter, which reduces the Nyquist signal +1, -1, +1, -1, ... to exactly zero. It also has 16x DC gain.

Now all the other frequencies.

First, remember the Fourier transform works in symmetric ways. Every statement "____ in the time domain = ____ in the frequency domain" is still true if you swap the words time and frequency. This has lead to the grotesquely named sub-field of cepstral processing where you have quefrencies and vawes, and it kinda feels like you're having a stroke. The cepstral convolution filter from earlier is called a lifter.

Usually cepstral processing is applied to the real magnitude of the spectrum, i.e. $ |\mathrm{spectrum}| $, instead of its true complex value. This is a coward move.

So, decompose the kernel into symmetric pairs:

[· · 6 · ·]
[· 4 · 4 ·]
[1 · · · 1]

All but the first row is a pair of real Dirac deltas in the time domain. Such a row is normally what you get when you Fourier transform a cosine, i.e.:

$$ \cos \omega = \frac{\mathrm{e}^{i\omega} + \mathrm{e}^{-i\omega}}{2} $$

A cosine in time is a pair of Dirac deltas in the frequency domain. The phase of a (real) cosine is zero, so both its deltas are real.

Now flip it around. The Fourier transform of a pair [x 0 0 ... 0 0 x] is a real cosine in frequency space. Must be true. Each new pair adds a new higher cosine on top of the existing spectrum. For the central [... 0 0 x 0 0 ...] we add a DC term. It's just a Fourier transform in the other direction:

|FFT([1 4 6 4 1])| =

  [· · 6 · ·] => 6 
  [· 4 · 4 ·] => 8 cos(ɷ)
  [1 · · · 1] => 2 cos(2ɷ)
  
 = |6 + 8 cos(ɷ) + 2 cos(2ɷ)|

Normally you have to use the z-transform to analyze a digital filter. But the above is a shortcut. FFTs and inverse FFTs do have opposite phase, but that doesn't matter here because cos(ɷ) = cos(-ɷ).

This works for the symmetric-even case too: you offset the frequencies by half a band, ɷ/2, and there is no DC term in the middle:

|FFT([1 3 3 1])| =

  [· 3 3 ·] => 6 cos(ɷ/2)
  [1 · · 1] => 2 cos(3ɷ/2)

 = |6 cos(ɷ/2) + 2 cos(3ɷ/2)|

So, symmetric filters have spectra that are made up of regular cosines. Now you know.

For the purpose of this trick, we centered the filter around $ t = 0 $. FFTs are typically aligned to array index 0. The difference between the two is however just phase, so it can be disregarded.

What about the delta vector [0 0 Δ 0 0 -Δ 0 0]? It's not symmetric, so we have to decompose it:

V1 = [· · · · · Δ · ·]
V2 = [· · Δ · · · · ·]

V = V2 - V1

Each is now an unpaired Dirac delta. Each vector's Fourier transform is a complex wave $ Δ \cdot \mathrm{e}^{-i \omega k} $ in the frequency domain (the k'th quefrency). It lacks the usual complementary oppositely twisting wave $ Δ \cdot \mathrm{e}^{i \omega k} $, so it's not real-valued. It has constant magnitude Δ and varying phase:

FFT(V1) = [Δ
Δ
Δ
Δ
Δ
Δ
Δ
Δ]
FFT(V2) = [Δ
Δ
Δ
Δ
Δ
Δ
Δ
Δ
]

These are vawes.

The effect of a swap is still just to add FFT(V), aka FFT(V2) - FFT(V1) to the (complex) spectrum. The effect is to transfer energy between all the bands simultaneously. Hence, FFT(V1) and FFT(V2) function as a source and destination mask for the transfer.

However, 'mask' is the wrong word, because the magnitude of $ \mathrm{e}^{i \omega k} $ is always 1. It doesn't have varying amplitude, only varying phase. -FFT(V1) and FFT(V2) define the complex direction in which to add/subtract energy.

When added together their phases interfere constructively or destructively, resulting in an amplitude that varies between 0 and 2Δ: an actual mask. The resulting phase will be halfway between the two, as it's the sum of two equal-length complex numbers.

FFT(V) = [·
Δ
Δ
Δ
Δ
Δ
Δ
Δ
]

For any given pixel A and its delta FFT(V1), it can pair up with other pixels B to form N-1 different interference masks FFT(V2) - FFT(V1). There are N(N-1)/2 unique interference masks, if you account for (A,B) (B,A) symmetry.

Worth pointing out, the FFT of the first index:

FFT([Δ 0 0 0 0 0 0 0]) = [Δ Δ Δ Δ Δ Δ Δ Δ]

This is the DC quefrency, and the fourier symmetry continues to work. Moving values in time causes the vawe's quefrency to change in the frequency domain. This is the upside-down version of how moving energy to another frequency band causes the wave's frequency to change in the time domain.

What's the Gradient, Kenneth?

Using vectors as masks... shifting energy in directions... this means gradient descent, no?

Well.

It's indeed possible to calculate the derivative of your loss function as a function of input pixel brightness, with the usual bag of automatic differentiation/backprop tricks. You can also do it numerically.

But, this doesn't help you directly because the only way you can act on that per-pixel gradient is by swapping a pair of pixels. You need to find two quefrencies FFT(V1) and FFT(V2) which interfere in exactly the right way to decrease the loss function across all bad frequencies simultaneously, while leaving the good ones untouched. Even if the gradient were to help you pick a good starting pixel, that still leaves the problem of finding a good partner.

There are still O(N²) possible pairs to choose from, and the entire spectrum changes a little bit on every swap. Which means new FFTs to analyze it.

Random greedy search is actually tricky to beat in practice. Whatever extra work you spend on getting better samples translates into less samples tried per second. e.g. Taking a best-of-3 approach is worse than just trying 3 swaps in a row. Swaps are almost always orthogonal.

But random() still samples unevenly because it's white noise. If only we had.... oh wait. Indeed if you already have blue noise of the right size, you can use that to mildly speed up the search for more. Use it as a random permutation to drive sampling, with some inevitable whitening over time to keep it fresh. You can't however use the noise you're generating to accelerate its own search, because the two are highly correlated.

What's really going on is all a consequence of the loss function.

Given any particular frequency band, the loss function is only affected when its magnitude changes. Its phase can change arbitrarily, rolling around without friction. The complex gradient must point in the radial direction. In the tangential direction, the partial derivative is zero.

The value of a given interference mask FFT(V1) - FFT(V2) for a given frequency is also complex-valued. It can be projected onto the current phase, and split into its radial and tangential component with a dot product.

The interference mask has a dual action. As we saw, its magnitude varies between 0 and 2Δ, as a function of the two indices k1 and k2. This creates a window that is independent of the specific state or data. It defines a smooth 'hash' from the interference of two quefrency bands.

But its phase adds an additional selection effect: whether the interference in the mask is aligned with the current band's phase: this determines the split between radial and tangential. This defines a smooth phase 'hash' on top. It cycles at the average of the two quefrencies, i.e. a different, third one.

Energy is only added/subtracted if both hashes are non-zero. If the phase hash is zero, the frequency band only turns. This does not affect loss, but changes how each mask will affect it in the future. This then determines how it is coupled to other bands when you perform a particular swap.

Note that this is only true differentially: for a finite swap, the curvature of the complex domain comes into play.

The loss function is actually a hyper-cylindrical skate bowl you can ride around. Just the movement of all the bands is tied together.

Frequency bands with significant error may 'random walk' freely clockwise or counterclockwise when subjected to swaps. A band can therefor drift until it gets a turn where its phase is in alignment with enough similar bands, where the swap makes them all descend along the local gradient, enough to counter any negative effects elsewhere.

In the time domain, each frequency band is a wave that oscillates between -1...1: it 'owns' some of the value of each pixel, but there are places where its weight is ~zero (the knots).

So when a band shifts phase, it changes how much of the energy of each pixel it 'owns'. This allows each band to 'scan' different parts of the noise in the time domain. In order to fix a particular peak or dent in the frequency spectrum, the search must rotate that band's phase so it strongly owns any defect in the noise, and then perform a swap to fix that defect.

Thus, my mental model of this is not actually disconnected pixel swapping.

It's more like one of those Myst puzzles where flipping a switch flips some of the neighbors too. You press one pair of buttons at a time. It's a giant haunted dimmer switch.

We're dealing with complex amplitudes, not real values, so the light also has a color. Mechanically it's like a slot machine, with dials that can rotate to display different sides. The cherries and bells are the color: they determine how the light gets brighter or darker. If a dial is set just right, you can use it as a /dev/null to 'dump' changes.

That's what theory predicts, but does it work? Well, here is a (blurred noise) spectrum being late-optimized. The search is trying to eliminate the left-over lower frequency noise in the middle:

Semi converged

Here's the phase difference from the late stages of search, each a good swap. Left to right shows 4 different value scales:

x16

x256

x4096

x16

x256

x4096

At first it looks like just a few phases are changing, but amplification reveals it's the opposite. There are several plateaus. Strongest are the bands being actively modified. Then there's the circular error area around it, where other bands are still swirling into phase. Then there's a sharp drop-off to a much weaker noise floor, present everywhere. These are the bands that are already converged.

Compare to a random bad swap:

x16

x256

x4096

Now there is strong noise all over the center, and the loss immediately gets worse, as a bunch of amplitudes start shifting in the wrong direction randomly.

So it's true. Applying the swap algorithm with a spectral target naturally cycles through focusing on different parts of the target spectrum as it makes progress. This information is positionally encoded in the phases of the bands and can be 'queried' by attempting a swap.

This means the constraint of a fixed target spectrum is actually a constantly moving target in the complex domain.

Frequency bands that reach the target are locked in. Neither their magnitude nor phase changes in aggregate. The random walks of such bands must have no DC component... they must be complex-valued blue noise with a tiny amplitude.

Knowing this doesn't help directly, but it does explain why the search is so hard. Because the interference masks function like hashes, there is no simple pattern to how positions map to errors in the spectrum. And once you get close to the target, finding new good swaps is equivalent to digging out information encoded deep in the phase domain, with O(N²) interference masks to choose from.

Gradient Sampling

As I was trying to optimize for evenness after blur, it occurred to me to simply try selecting bright or dark spots in the blurred after-image.

This is the situation where frequency bands are in coupled alignment: the error in the spectrum has a relatively concentrated footprint in the time domain. But, this heuristic merely picks out good swaps that are already 'lined up' so to speak. It only works as a low-hanging fruit sampler, with rapidly diminishing returns.

Next I used the gradient in the frequency domain.

The gradient points towards increasing loss, which is the sum of squared distance $ (…)^2 $. So the slope is $ 2(…) $, proportional to distance to the goal:

$$ |\mathrm{gradient}_k| = 2 \cdot \left( \frac{|\mathrm{spectrum}_k|}{||\mathbf{spectrum}||} - \frac{\mathrm{target}_k}{||\mathbf{target}||} \right) $$

It's radial, so its phase matches the spectrum itself:

$$ \mathrm{gradient}_k = \mathrm{|gradient_k|} \cdot \left(1 ∠ \mathrm{arg}(\mathrm{spectrum}_k) \right) $$

Eagle-eyed readers may notice the sqrt part of the L2 norm is missing here. It's only there for normalization, and in fact, you generally want a gradient that decreases the closer you get to the target. It acts as a natural stabilizer, forming a convex optimization problem.

You can transport this gradient backwards by applying an inverse FFT. Usually derivatives and FFTs don't commute, but that's only when you are deriving in the same dimension as the FFT. The partial derivative here is neither over time nor frequency, but by signal value.

The resulting time-domain gradient tells you how fast the (squared) loss would change if a given pixel changed. The sign tells you whether it needs to become lighter or darker. In theory, a pixel with a large gradient can enable larger score improvements per step.

It says little about what's a suitable pixel to pair with though. You can infer that a pixel needs to be paired with one that is brighter or darker, but not how much. The gradient only applies differentially. It involves two pixels, so it will cause interference between the two deltas, and also with the signal's own phase.

The time-domain gradient does change slowly after every swap—mainly the swapping pixels—so this only needs to add an extra IFFT every N swap attempts, reusing it in between.

I tried this in two ways. One was to bias random sampling towards points with the largest gradients. This barely did anything, when applied to one or both pixels.

Then I tried going down the list in order, and this worked better. I tried a bunch of heuristics here, like adding a retry until paired, and a 'dud' tracker to reject known unpairable pixels. It did lead to some minor gains in successful sample selection. But beating random was still not a sure bet in all cases, because it comes at the cost of ordering and tracking all pixels to sample them.

All in all, it was quite mystifying.

Pair Analysis

Hence I analyzed all possible swaps (A,B) inside one 64x64 image at different stages of convergence, for 1024 pixels A (25% of total).

The result was quite illuminating. There are 2 indicators of a pixel's suitability for swapping:

% of all possible swaps (A,_) that are good
score improvement of best possible swap (A,B)

They are highly correlated, and you can take the geometric average to get a single quality score to order by:

The curve shows that the best possible candidates are rare, with a sharp drop-off at the start. Here the average candidate is ~1/3rd as good as the best, though every pixel is pairable. This represents the typical situation when you have unconverged blue-ish noise.

Order all pixels by their (signed) gradient, and plot the quality:

The distribution seems biased towards the ends. A larger absolute gradient at A can indeed lead to both better scores and higher % of good swaps.

Notice that it's also noisier at the ends, where it dips below the middle. If you order pixels by their quality, and then plot the absolute gradient, you see:

Selecting for large gradient at A will select both the best and the worst possible pixels A. This implies that there are pixels in the noise that are very significant, but are nevertheless currently 'unfixable'. This corresponds to the 'focus' described earlier.

By drawing from the 'top', I was mining the imbalance between the good/left and bad/right distribution. Selecting for a vanishing gradient would instead select the average-to-bad pixels A.

I investigated one instance of each: very good, average or very bad pixel A. I tried every possible swap (A, B) and plotted the curve again. Here the quality is just the actual score improvement:

The three scenarios have similar curves, with the bulk of swaps being negative. Only a tiny bit of the curve is sticking out positive, even in the best case. The potential benefit of a good swap is dwarfed by the potential harm of bad swaps. The main difference is just how many positive swaps there are, if any.

So let's focus on the positive case, where you can see best.

You can order by score, and plot the gradient of all the pixels B, to see another correlation.

It looks kinda promising. Here the sign matters, with left and right being different. If the gradient of pixel A is the opposite sign, then this graph is mirrored.

But if you order by (signed) gradient and plot the score, you see the real problem, caused by the noise:

The good samples are mixed freely among the bad ones, with only a very weak trend downward. This explains why sampling improvements based purely on gradient for pixel B are impossible.

You can see what's going on if you plot Δv, the difference in value between A and B:

For a given pixel A, all the good swaps have a similar value for B, which is not unexpected. Its mean is the ideal value for A, but there is a lot of variance. In this case pixel A is nearly white, so it is brighter than almost every other pixel B.

If you now plot Δv * -gradient, you see a clue on the left:

Almost all of the successful swaps have a small but positive value.

This represents what we already knew: the gradient's sign tells you if a pixel should be brighter or darker. If Δv has the opposite sign, the chances of a successful swap are slim.

Ideally both pixels 'face' the right way, so the swap is beneficial on both ends. But only the combined effect on the loss matters: i.e. Δv * Δgradient < 0.

It's only true differentially so it can misfire. But compared to blind sampling of pairs, it's easily 5-10x better and faster, racing towards the tougher parts of the search.

What's more... while this test is just binary, I found that any effort spent on trying to further prioritize swaps by the magnitude of the gradient is entirely wasted. Maximizing Δv * Δgradient by repeated sampling is counterproductive, because it selects more bad candidates on the right. Minimizing Δv * Δgradient creates more successful swaps on the left, but lowers the average improvement per step so the convergence is net slower. Anything more sophisticated incurs too much computation to be worth it.

It does have a limit. This is what it looks like when an image is practically fully converged:

Eventually you reach the point where there are only a handful of swaps with any real benefit, while the rest is just shaving off a few bits of loss at a time. It devolves back to pure random selection, only skipping the coin flip for the gradient. It is likely that more targeted heuristics can still work here.

The gradient also works in the early stages. As it barely changes over successive swaps, this leads to a different kind of sparse mode. Instead of scoring only a subset of pixels, simply score multiple swaps as a group over time, without re-scoring intermediate states. This lowers the success rate roughly by a power (e.g. 0.8 -> 0.64), but cuts the number of FFTs by a constant factor (e.g. 1/2). Early on this trade-off can be worth it.

Even faster: don't score steps at all. In the very early stage, you can easily get up to 80-90% successful swaps just by filtering on values and gradients. If you just swap a bunch in a row, there is a very good chance you will still end up better than before.

It works better than sparse scoring: using the gradient of your true objective approximately works better than using an approximate objective exactly.

The latter will miss the true goal by design, while the former continually re-aims itself to the destination despite inaccuracy.

Obviously you can mix and match techniques, and gradient + sparse is actually a winning combo. I've only scratched the surface here.

Warp Speed

Time to address the elephant in the room. If the main bottleneck is an FFT, wouldn't this work better on a GPU?

The answer to that is an unsurprising yes, at least for large sizes where the overhead of async dispatch is negligible. However, it would have been endlessly more cumbersome to discover all of the above based on a GPU implementation, where I can't just log intermediate values to a console.

After checking everything, I pulled out my bag of tricks and ported it to Use.GPU. As a result, the algorithm runs entirely on the GPU, and provides live visualization of the entire process. It requires a WebGPU-enabled browser, which in practice means Chrome on Windows or Mac, or a dev build elsewhere.

I haven't particularly optimized this—the FFT is vanilla textbook—but it works. It provides an easy ~8x speed up on an M1 Mac on beefier target sizes. With a desktop GPU, 128x128x32 and larger become very feasible.

It lacks a few goodies from the scripts, and only does gradient + optional sparse. You can however freely exchange PNGs between the CPU and GPU version via drag and drop, as long as the settings match.

Layout components

Compute components

Worth pointing out: this visualization is built using Use.GPU's HTML-like layout system. I can put div-like blocks inside a flex box wrapper, and put text beside it... while at the same time using raw WGSL shaders as the contents of those divs. These visualization shaders sample and colorize the algorithm state on the fly, with no CPU-side involvement other than a static dispatch. The only GPU -> CPU readback is for the stats in the corner, which are classic React and real HTML, along with the rest of the controls.

I can then build an component and drop it inside an async , and it does exactly what it should. The rest is just a handful of elements and the ordinary headache of writing compute shaders. ensures all the shaders are compiled before dispatching.

While running, the bulk of the tree is inert, with only a handful of reducers triggering on a loop, causing a mere 7 live components to update per frame. The compute dispatch fights with the normal rendering for GPU resources, so there is an auto-batching mechanism that aims for approximately 30-60 FPS.

The display is fully anti-aliased, including the pixelized data. I'm using the usual per-pixel SDF trickery to do this... it's applied as a generic wrapper shader for any UV-based sampler.

It's a good showcase that Use.GPU really is React-for-GPUs with less hassle, but still with all the goodies. It bypasses most of the browser once the canvas gets going, and it isn't just for UI: you can express async compute just fine with the right component design. The robust layout and vector plotting capabilities are just extra on top.

I won't claim it's the world's most elegant abstraction, because it's far too pragmatic for that. But I simply don't know any other programming environment where I could even try something like this and not get bogged down in GPU binding hell, or have to round-trip everything back to the CPU.

* * *

So there you have it: blue and indigo noise à la carte.

What I find most interesting is that the problem of generating noise in the time domain has been recast into shaping and denoising a spectrum in the frequency domain. It starts as white noise, and gets turned into a pre-designed picture. You do so by swapping pixels in the other domain. The state for this process is kept in the phase channel, which is not directly relevant to the problem, but drifts into alignment over time.

Hence I called it Stable Fiddusion. If you swap the two domains, you're turning noise into a picture by swapping frequency bands without changing their values. It would result in a complex-valued picture, whose magnitude is the target, and whose phase encodes the progress of the convergence process.

This is approximately what you get when you add a hidden layer to a diffusion model.

What I also find interesting is that the notion of swaps naturally creates a space that is O(N²) big with only N samples of actual data. Viewed from the perspective of a single step, every pair (A,B) corresponds to a unique information mask in the frequency domain that extracts a unique delta from the same data. There is redundancy, of course, but the nature of the Fourier transform smears it out into one big superposition. When you do multiple swaps, the space grows, but not quite that fast: any permutation of the same non-overlapping swaps is equivalent. There is also a notion of entanglement: frequency bands / pixels are linked together to move as a whole by default, but parts will diffuse into being locked in place.

Phase is kind of the bugbear of the DSP world. Everyone knows it's there, but they prefer not to talk about it unless its content is neat and simple. Hopefully by now you have a better appreciation of the true nature of a Fourier transform. Not just as a spectrum for a real-valued signal, but as a complex-valued transform of a complex-valued input.

During a swap run, the phase channel continuously looks like noise, but is actually highly structured when queried with the right quefrency hashes. I wonder what other things look like that, when you flip them around.

More:

Sub-pixel Distance Transform

2023-07-17T00:00:00+02:00

High quality font rendering for WebGPU

This page includes diagrams in WebGPU, which has limited browser support. For the full experience, use Chrome on Windows or Mac, or a developer build on other platforms.

In this post I will describe Use.GPU's text rendering, which uses a bespoke approach to Signed Distance Fields (SDFs). This was borne out of necessity: while SDF text is pretty common on GPUs, some of the established practice on generating SDFs from masks is incorrect, and some libraries get it right only by accident. So this will be a deep dive from first principles, about the nuances of subpixels.

SDFs

The idea behind SDFs is quite simple. To draw a crisp, anti-aliased shape at any size, you start from a field or image that records the distance to the shape's edge at every point, as a gradient. Lighter grays are inside, darker grays are outside. This can be a lower resolution than the target.

Then you increase the contrast until the gradient is exactly 1 pixel wide at the target size. You can sample it to get a perfectly anti-aliased opacity mask:

This works fine for text at typical sizes, and handles fractional shifts and scales perfectly with zero shimmering. It's also reasonably correct from a signal processing math point-of-view: it closely approximates averaging over a pixel-sized circular window, i.e. a low-pass convolution.

Crucially, it takes a rendered glyph as input, which means I can remain blissfully unaware of TrueType font specifics, and bezier rasterization, and just offload that to an existing library.

To generate an SDF, I started with MapBox's TinySDF library. Except, what comes out of it is wrong:

The contours are noticeably wobbly and pixelated. The only reason the glyph itself looks okay is because the errors around the zero-level are symmetrical and cancel out. If you try to dilate or contract the outline, which is supposed to be one of SDF's killer features, you get ugly gunk.

Compare to:

The original Valve paper glosses over this aspect and uses high resolution inputs (4k) for a highly downscaled result (64). That is not an option for me because it's too slow. But I did get it to work. As a result Use.GPU has a novel subpixel-accurate distance transform (ESDT), which even does emoji. It's a combination CPU/GPU approach, with the CPU generating SDFs and the GPU rendering them, including all the debug viz.

The Classic EDT

The common solution is a Euclidean Distance Transform. Given a binary mask, it will produce an unsigned distance field. This holds the squared distance d² for either the inside or outside area, which you can sqrt.

Like a Fourier Transform, you can apply it to 2D images by applying it horizontally on each row X, then vertically on each column Y (or vice versa). To make a signed distance field, you do this for both the inside and outside separately, and then combine the two as inside – outside or vice versa.

The algorithm is one of those clever bits of 80s-style C code which is O(N), has lots of 1-letter variable names, and is very CPU cache friendly. Often copy/pasted, but rarely understood. In TypeScript it looks like this, where array is modified in-place and f, v and z are temporary buffers up to 1 row/column long. The arguments offset and stride allow the code to be used in either the X or Y direction in a flattened 2D array.

for (let q = 1, k = 0, s = 0; q < length; q++) {
  f[q] = array[offset + q * stride];

  do {
    let r = v[k];
    s = (f[q] - f[r] + q * q - r * r) / (q - r) / 2;
  } while (s <= z[k] && --k > -1);

  k++;
  v[k] = q;
  z[k] = s;
  z[k + 1] = INF;
}

for (let q = 0, k = 0; q < length; q++) {
  while (z[k + 1] < q) k++;
  let r = v[k];
  let d = q - r;
  array[offset + q * stride] = f[r] + d * d;
}

To explain what this code does, let's start with a naive version instead.

Given a 1D input array of zeroes (filled), with an area masked out with infinity (empty):

O = [·, ·, ·, 0, 0, 0, 0, 0, ·, 0, 0, 0, ·, ·, ·]

Make a matching sequence … 3 2 1 0 1 2 3 … for each element, centering the 0 at each index:

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9,10,11,12,13,14] + ∞
[1, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9,10,11,12,13] + ∞
[2, 1, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9,10,11,12] + ∞
[3, 2, 1, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9,10,11] + 0
[4, 3, 2, 1, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9,10] + 0
[5, 4, 3, 2, 1, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9] + 0
[6, 5, 4, 3, 2, 1, 0, 1, 2, 3, 4, 5, 6, 7, 8] + 0
[7, 6, 5, 4, 3, 2, 1, 0, 1, 2, 3, 4, 5, 6, 7] + 0
[8, 7, 6, 5, 4, 3, 2, 1, 0, 1, 2, 3, 4, 5, 6] + ∞
[9, 8, 7, 6, 5, 4, 3, 2, 1, 0, 1, 2, 3, 4, 5] + 0
[10,9, 8, 7, 6, 5, 4, 3, 2, 1, 0, 1, 2, 3, 4] + 0
[11,10,9, 8, 7, 6, 5, 4, 3, 2, 1, 0, 1, 2, 3] + 0
[12,11,10,9, 8, 7, 6, 5, 4, 3, 2, 1, 0, 1, 2] + ∞
[13,12,11,10,9, 8, 7, 6, 5, 4, 3, 2, 1, 0, 1] + ∞
[14,13,12,11,10,9, 8, 7, 6, 5, 4, 3, 2, 1, 0] + ∞

You then add the value from the array to each element in the row:

[∞, ∞, ∞, ∞, ∞, ∞, ∞, ∞, ∞, ∞, ∞, ∞, ∞, ∞, ∞]
[∞, ∞, ∞, ∞, ∞, ∞, ∞, ∞, ∞, ∞, ∞, ∞, ∞, ∞, ∞]
[∞, ∞, ∞, ∞, ∞, ∞, ∞, ∞, ∞, ∞, ∞, ∞, ∞, ∞, ∞]
[3, 2, 1, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9,10,11]
[4, 3, 2, 1, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9,10]
[5, 4, 3, 2, 1, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
[6, 5, 4, 3, 2, 1, 0, 1, 2, 3, 4, 5, 6, 7, 8]
[7, 6, 5, 4, 3, 2, 1, 0, 1, 2, 3, 4, 5, 6, 7]
[∞, ∞, ∞, ∞, ∞, ∞, ∞, ∞, ∞, ∞, ∞, ∞, ∞, ∞, ∞]
[9, 8, 7, 6, 5, 4, 3, 2, 1, 0, 1, 2, 3, 4, 5]
[10,9, 8, 7, 6, 5, 4, 3, 2, 1, 0, 1, 2, 3, 4]
[11,10,9, 8, 7, 6, 5, 4, 3, 2, 1, 0, 1, 2, 3]
[∞, ∞, ∞, ∞, ∞, ∞, ∞, ∞, ∞, ∞, ∞, ∞, ∞, ∞, ∞]
[∞, ∞, ∞, ∞, ∞, ∞, ∞, ∞, ∞, ∞, ∞, ∞, ∞, ∞, ∞]
[∞, ∞, ∞, ∞, ∞, ∞, ∞, ∞, ∞, ∞, ∞, ∞, ∞, ∞, ∞]

And then take the minimum of each column:

P = [3, 2, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 2, 3]

This sequence counts up inside the masked out area, away from the zeroes. This is the positive distance field P.

You can do the same for the inverted mask:

I = [0, 0, 0, ·, ·, ·, ·, ·, 0, ·, ·, ·, 0, 0, 0]

to get the complementary area, i.e. the negative distance field N:

N = [0, 0, 0, 1, 2, 3, 2, 1, 0, 1, 2, 1, 0, 0, 0]

That's what the EDT does, except it uses square distance … 9 4 1 0 1 4 9 …:

When you apply it a second time in the second dimension, these outputs are the new input, i.e. values other than 0 or ∞. It still works because of Pythagoras' rule: d² = x² + y². This wouldn't be true if it used linear distance instead. The net effect is that you end up intersecting a series of parabolas, somewhat like a 1D slice of a Voronoi diagram:

I' = [0, 0, 1, 4, 9, 4, 4, 4, 1, 1, 4, 9, 4, 9, 9]

Each parabola sitting above zero is the 'shadow' of a zero-level paraboloid located some distance in a perpendicular dimension:

The code is just a more clever way to do that, without generating the entire N² grid per row/column. It instead scans through the array left to right, building up a list v[k] of significant minima, with thresholds s[k] where two parabolas intersect. It adds them as candidates (k++) and discards them (--k) if they are eclipsed by a newer value. This is the first for/while loop:

for (let q = 1, k = 0, s = 0; q < length; q++) {
  f[q] = array[offset + q * stride];

  do {
    let r = v[k];
    s = (f[q] - f[r] + q * q - r * r) / (q - r) / 2;
  } while (s <= z[k] && --k > -1);

  k++;
  v[k] = q;
  z[k] = s;
  z[k + 1] = INF;
}

Then it goes left to right again (for), and fills out the values, skipping ahead to the right minimum (k++). This is the squared distance from the current index q to the nearest minimum r, plus the minimum's value f[r] itself. The paper has more details:

for (let q = 0, k = 0; q < length; q++) {
  while (z[k + 1] < q) k++;
  let r = v[k];
  let d = q - r;
  array[offset + q * stride] = f[r] + d * d;
}

The Broken EDT

So what's the catch? The above assumes a binary mask.

As it happens, if you try to subtract a binary N from P, you have an off-by-one error:

    O = [·, ·, ·, 0, 0, 0, 0, 0, ·, 0, 0, 0, ·, ·, ·]
    I = [0, 0, 0, ·, ·, ·, ·, ·, 0, ·, ·, ·, 0, 0, 0]

    P = [3, 2, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 2, 3]
    N = [0, 0, 0, 1, 2, 3, 2, 1, 0, 1, 2, 1, 0, 0, 0]

P - N = [3, 2, 1,-1,-2,-3,-2,-1, 1,-1,-2,-1, 1, 2, 3]

It goes directly from 1 to -1 and back. You could add +/- 0.5 to fix that.

But if there is a gray pixel in between each white and black, which we classify as both inside (0) and outside (0), it seems to work out just fine:

    O = [·, ·, ·, 0, 0, 0, 0, 0, ·, 0, 0, 0, ·, ·, ·]
    I = [0, 0, 0, 0, ·, ·, ·, 0, 0, 0, ·, 0, 0, 0, 0]

    P = [3, 2, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 2, 3]
    N = [0, 0, 0, 0, 1, 2, 1, 0, 0, 0, 1, 0, 0, 0, 0]

P - N = [3, 2, 1, 0,-1,-2,-1, 0, 1, 0,-1, 0, 1, 2, 3]

This is a realization that somebody must've had, and they reasoned on: "The above is correct for a 50% opaque pixel, where the edge between inside and outside falls exactly in the middle of a pixel."

"Lighter grays are more inside, and darker grays are more outside. So all we need to do is treat l = level - 0.5 as a signed distance, and use l² for the initial inside or outside value for gray pixels. This will cause either the positive or negative distance field to shift by a subpixel amount l. And then the EDT will propagate this in both X and Y directions."

The initial idea is correct, because this is just running SDF rendering in reverse. A gray pixel in an opacity mask is what you get when you contrast adjust an SDF and do not blow it out into pure black or white. The information inside the gray pixels is "correct", up to rounding.

But there are two mistakes here.

The first is that even in an anti-aliased image, you can have white pixels right next to black ones. Especially with fonts, which are pixel-hinted. So the SDF is wrong there, because it goes directly from -1 to 1. This causes the contours to double up, e.g. around this bottom edge:

To solve this, you can eliminate the crisp case by deliberately making those edges very dark or light gray.

But the second mistake is more subtle. The EDT works in 2D because you can feed the output of X in as the input of Y. But that means that any non-zero input to X represents another dimension Z, separate from X and Y. The resulting squared distance will be x² + y² + z². This is a 3D distance, not 2D.

If an edge is shifted by 0.5 pixels in X, you would expect a 1D SDF like:

  […, 0.5, 1.5, 2.5, 3.5, …]
= […, 0.5, 1 + 0.5, 2 + 0.5, 3 + 0.5, …]

Instead, because of the squaring + square root, you will get:

  […, 0.5, 1.12, 2.06, 3.04, …]
= […, sqrt(0.25), sqrt(1 + 0.25), sqrt(4 + 0.25), sqrt(9 + 0.25), …]

The effect of l² = 0.25 rapidly diminishes as you get away from the edge, and is significantly wrong even just one pixel over.

The correct shift would need to be folded into (x + …)² + (y + …)² and depends on the direction. e.g. If an edge is shifted horizontally, it ought to be (x + l)² + y², which means there is a term of 2*x*l missing. If the shift is vertical, it's 2*y*l instead. This is also a signed value, not positive/unsigned.

Given all this, it's a miracle this worked at all. The only reason this isn't more visible in the final glyph is because the positive and negative fields contains the same but opposite errors around their respective gray pixels.

The Not-Subpixel EDT

As mentioned before, the EDT algorithm is essentially making a 1D Voronoi diagram every time. It finds the distance to the nearest minimum for every array index. But there is no reason for those minima themselves to lie at integer offsets, because the second for loop effectively resamples the data.

So you can take an input mask, and tag each index with a horizontal offset Δ:

O = [·, ·, ·, 0, 0, 0, 0, 0, ·, ·, ·]
Δ = [A, B, C, D, E, F, G, H, I, J, K]

As long as the offsets are small, no two indices will swap order, and the code still works. You then build the Voronoi diagram out of the shifted parabolas, but sample the result at unshifted indices.

Problem 1 - Opposing Shifts

This lead me down the first rabbit hole, which was an attempt to make the EDT subpixel capable without losing its appealing simplicity. I started by investigating the nuances of subpixel EDT in 1D. This was a bit of a mirage, because most real problems only pop up in 2D. Though there was one important insight here.

O = [·, ·, ·, 0, 0, 0, 0, 0, ·, ·, ·]
Δ = [·, ·, ·, A, ·, ·, ·, B, ·, ·, ·]

Given a mask of zeroes and infinities, you can only shift the first and last point of each segment. Infinities don't do anything, while middle zeroes should remain zero.

Using an offset A works sort of as expected: this will increase or decrease the values filled in by a fractional pixel, calculating a squared distance (d + A)² where A can be positive or negative. But the value at the shifted index itself is always (0 + A)² (positive). This means it is always outside, regardless of whether it is moving left or right.

If A is moving left (–), the point is inside, and the (unsigned) distance should be 0. At B the situation is reversed: the distance should be 0 if B is moving right (+). It might seem like this is annoying but fixable, because the zeroes can be filled in by the opposite signed field. But this is only looking at the binary 1D case, where there are only zeroes and infinities.

In 2D, a second pass has non-zero distances, so every index can be shifted:

O = [a, b, c, d, e, f, g, h, i, j, k]
Δ = [A, B, C, D, E, F, G, H, I, J, K]

Now, resolving every subpixel unambiguously is harder than you might think:

It's important to notice that the function being sampled by an EDT is not actually smooth: it is the minimum of a discrete set of parabolas, which cross at an angle. The square root of the output only produces a smooth linear gradient because it samples each parabola at integer offsets. Each center only shifts upward by the square of an integer in every pass, so the crossings are predictable. You never sample the 'wrong' side of (d + ...)². A subpixel EDT does not have this luxury.

Subpixel EDTs are not irreparably broken though. Rather, they are only valid if they cause the unsigned distance field to increase, i.e. if they dilate the empty space. This is a problem: any shift that dilates the positive field contracts the negative, and vice versa.

To fix this, you need to get out of the handwaving stage and actually understand P and N as continuous 2D fields.

Problem 2 - Diagonals

Consider an aliased, sloped edge. To understand how the classic EDT resolves it, we can turn it into a voronoi diagram for all the white pixel centers:

Near the bottom, the field is dominated by the white pixels on the corners: they form diagonal sections downwards. Near the edge itself, the field runs perfectly vertical inside a roughly triangular section. In both cases, an arrow pointing back towards the cell center is only vaguely perpendicular to the true diagonal edge.

Near perfect diagonals, the edge distances are just wrong. The distance of edge pixels goes up or right (1), rather than the more logical diagonal 0.707…. The true closest point on the edge is not part of the grid.

These fields don't really resolve properly until 6-7 pixels out. You could hide these flaws with e.g. an 8x downscale, but that's 64x more pixels. Either way, you shouldn't expect perfect numerical accuracy from an EDT. Just because it's mathematically separable doesn't mean it's particularly good.

In fact, it's only separable because it isn't very good at all.

Problem 3 - Gradients

In 2D, there is also only one correct answer to the gray case. Consider a diagonal edge, anti-aliased:

Thresholding it into black, grey or white, you get:

If you now classify the grays as both inside and outside, then the highlighted pixels will be part of both masks. Both the positive and negative field will be exactly zero there, and so will the SDF (P - N):

This creates a phantom vertical edge that pushes apart P and N, and causes the average slope to be less than 45º. The field simply has the wrong shape, because gray pixels can be surrounded by other gray pixels.

This also explains why TinySDF magically seemed to work despite being so pixelized. The l² gray correction fills in exactly the gaps in the bad (P - N) field where it is zero, and it interpolates towards a symmetrically wrong P and N field on each side.

If we instead classify grays as neither inside nor outside, then P and N overlap in the boundary, and it is possible to resolve them into a coherent SDF with a clean 45 degree slope, if you do it right:

What seemed like an off-by-one error is actually the right approach in 2D or higher. The subpixel SDF will then be a modified version of this field, where the P and N sides are changed in lock-step to remain mutually consistent.

Though we will get there in a roundabout way.

Problem 4 - Commuting

It's worth pointing out: a subpixel EDT simply cannot commute in 2D.

First, consider the data flow of an ordinary EDT:

Information from a corner pixel can flow through empty space both when doing X-then-Y and Y-then-X. But information from the horizontal edge pixels can only flow vertically then horizontally. This is okay because the separating lines between adjacent pixels are purely vertical too: the red arrows never 'win'.

But if you introduce subpixel shifts, the separating lines can turn:

The data flow is still limited to the original EDT pattern, so the edge pixels at the top can only propagate by starting downward. They can only influence adjacent columns if the order is Y-then-X. For vertical edges it's the opposite.

That said, this is only a problem on shallow concave curves, where there aren't any corner pixels nearby. The error is that it 'snaps' to the wrong edge point, but only when it is already several pixels away from the edge. In that case, the smaller x² term is dwarfed by the much larger y² term, so the absolute error is small after sqrt.

The ESDT

Knowing all this, here's how I assembled a "true" Euclidean Subpixel Distance Transform.

Subpixel offsets

To start we need to determine the subpixel offsets. We can still treat level - 0.5 as the signed distance for any gray pixel, and ignore all white and black for now.

The tricky part is determining the exact direction of that distance. As an approximation, we can examine the 3x3 neighborhood around each gray pixel and do a least-squares fit of a plane. As long as there is at least one white and one black pixel in this neighborhood, we get a vector pointing towards where the actual edge is. In practice I apply some horizontal/vertical smoothing here using a simple [1 2 1] kernel.

The result is numerically very stable, because the originally rasterized image is visually consistent.

This logic is disabled for thin creases and spikes, where it doesn't work. Such points are treated as fully masked out, so that neighboring distances propagate there instead. This is needed e.g. for the pointy negative space of a W to come out right.

I also implemented a relaxation step that will smooth neighboring vectors if they point in similar directions. However, the effect is quite minimal, and it rounds very sharp corners, so I ended up disabling it by default.

The goal is then to do an ESDT that uses these shifted positions for the minima, to get a subpixel accurate distance field.

P and N junction

We saw earlier that only non-masked pixels can have offsets that influence the output (#1). We only have offsets for gray pixels, yet we concluded that gray pixels should be masked out, to form a connected SDF with the right shape (#3). This can't work.

SDFs are both the problem and the solution here. Dilating and contracting SDFs is easy: add or subtract a constant. So you can expand both P and N fields ahead of time geometrically, and then undo it numerically. This is done by pushing their respective gray pixel centers in opposite directions, by half a pixel, on top of the originally calculated offset:

This way, they can remain masked in in both fields, but are always pushed between 0 and 1 pixel inwards. The distance between the P and N gray pixel offsets is always exactly 1, so the non-zero overlap between P and N is guaranteed to be exactly 1 pixel wide everywhere. It's a perfect join anywhere we sample it, because the line between the two ends crosses through a pixel center.

When we then calculate the final SDF, we do the opposite, shifting each by half a pixel and trimming it off with a max:

SDF = max(0, P - 0.5) - max(0, N - 0.5)

Only one of P or N will be > 0.5 at a time, so this is exact.

To deal with pure black/white edges, I treat any black neighbor of a white pixel (horizontal or vertical only) as gray with a 0.5 pixel offset (before P/N dilation). No actual blurring needs to happen, and the result is numerically exact minus epsilon, which is nice.

ESDT state

The state for the ESDT then consists of remembering a signed X and Y offset for every pixel, rather than the squared distance. These are factored into the distance and threshold calculations, separated into its proper parallel and orthogonal components, i.e. X/Y or Y/X. Unlike an EDT, each X or Y pass has to be aware of both axes. But the algorithm is mostly unchanged otherwise, here X-then-Y.

The X pass:

At the start, only gray pixels have offsets and they are all in the range -1…1 (exclusive). With each pass of ESDT, a winning minima's offsets propagate to its affecting range, tracking the total distance (Δx, Δy) (> 1). At the end, each pixel's offset points to the nearest edge, so the squared distance can be derived as Δx² + Δy².

The Y pass:

You can see that the vertical distances in the top-left are practically vertical, and not oriented perpendicular to the contour on average: they have not had a chance to propagate horizontally. But they do factor in the vertical subpixel offset, and this is the dominant component. So even without correction it still creates a smooth SDF with a surprisingly small error.

Fix ups

The commutativity errors are all biased positively, meaning we get an upper bound of the true distance field.

You could take the min of X then Y and Y then X. This would re-use all the same prep and would restore rotation-independence at the cost of 2x as many ESDTs. You could try X then Y then X at 1.5x cost with some hacks. But neither would improve diagonal areas, which were still busted in the original EDT.

Instead I implemented an additional relaxation pass. It visits every pixel's target, and double checks whether one of the 4 immediate neighbors (with subpixel offset) isn't a better solution:

It's a good heuristic because if the target is >1px off there is either a viable commutative propagation path, or you're so far away the error is negligible. It fixes up the diagonals, creating tidy lines when the resolution allows for it:

You could take this even further, given that you know the offsets are supposed to be perpendicular to the glyph contour. You could add reprojection with a few dot products here, but getting it to not misfire on edge cases would be tricky.

While you can tell the unrelaxed offsets are wrong when visualized, and the fixed ones are better, the visual difference in the output glyphs is tiny. You need to blow glyphs up to enormous size to see the difference side by side. So it too is disabled by default. The diagonals in the original EDT were wrong too and you could barely tell.

Emoji

An emoji is generally stored as a full color transparent PNG or SVG. The ESDT can be applied directly to its opacity mask to get an SDF, so no problem there.

There are an extremely rare handful of emoji with semi-transparent areas, but you can get away with making those solid. For this I just use a filter that detects '+' shaped arrangements of pixels that have (almost) the same transparency level. Then I dilate those by 3x3 to get the average transparency level in each area. Then I divide by it to only keep the anti-aliased edges transparent.

The real issue is blending the colors at the edges, when the emoji is being rendered and scaled. The RGB color of transparent pixels is undefined, so whatever values are there will blend into the surrounding pixels, e.g. creating a subtle black halo:

Not Premultiplied

Premultiplied

A common solution is premultiplied alpha. The opacity is baked into the RGB channels as (R * A, G * A, B * A, A), and transparent areas must be fully transparent black. This allows you to use a premultiplied blend mode where the RGB channels are added directly without further scaling, to cancel out the error.

But the alpha channel of an SDF glyph is dynamic, and is independent of the colors, so it cannot be premultiplied. We need valid color values even for the fully transparent areas, so that up- or downscaling is still clean.

Luckily the ESDT calculates X and Y offsets which point from each pixel directly to the nearest edge. We can use them to propagate the colors outward in a single pass, filling in the entire background. It doesn't need to be very accurate, so no filtering is required.

RGB channel

Output

The result looks pretty great. At normal sizes, the crisp edge hides the fact that the interior is somewhat blurry. Emoji fonts are supported via the underlying ab_glyph library, but are too big for the web (10MB+). So you can just load .PNGs on demand instead, at whatever resolution you need. Hooking it up to the 2D canvas to render native system emoji is left as an exercise for the reader.

Use.GPU does not support complex Unicode scripts or RTL text yet—both are a can of worms I wish to offload too—but it does support composite emoji like "pirate flag" (white flag + skull and crossbones) or "male astronaut" (astronaut + man) when formatted using the usual Zero-Width Joiners (U+200D) or modifiers.

Shading

Finally, a note on how to actually render with SDFs, which is more nuanced than you might think.

I pack all the SDF glyphs into an atlas on-demand, the same I use elsewhere in Use.GPU. This has a custom layout algorithm that doesn't backtrack, optimized for filling out a layout at run-time with pieces of a similar size. Glyphs are rasterized at 1.5x their normal font size, after rounding up to the nearest power of two. The extra 50% ensures small fonts on low-DPI displays still use a higher quality SDF, while high-DPI displays just upscale that SDF without noticeable quality loss. The rounding ensures similar font sizes reuse the same SDFs. You can also override the detail independent of font size.

To determine the contrast factor to draw an SDF, you generally use screen-space derivatives. There are good and bad ways of doing this. Your goal is to get a ratio of SDF pixels to screen pixels, so the best thing to do is give the GPU the coordinates of the SDF texture pixels, and ask it to calculate the difference for that between neighboring screen pixels. This works for surfaces in 3D at an angle too. Bad ways of doing this will instead work off relative texture coordinates, and introduce additional scaling factors based on the view or atlas size, when they are all just supposed to cancel out.

As you then adjust the contrast of an SDF to render it, it's important to do so around the zero-level. The glyph's ideal vector shape should not expand or contract as you scale it. Like TinySDF, I use 75% gray as the zero level, so that more SDF range is allocated to the outside than the inside, as dilating glyphs is much more common than contraction.

At the same time, a pixel whose center sits exactly on the zero level edge is actually half inside, half outside, i.e. 50% opaque. So, after scaling the SDF, you need to add 0.5 to the value to get the correct opacity for a blend. This gives you a mathematically accurate font rendering that approximates convolution with a pixel-sized circle or box.

But I go further. Fonts were not invented for screens, they were designed for paper, with ink that bleeds. Certain renderers, e.g. MacOS, replicate this effect. The physical bleed distance is relatively constant, so the larger the font, the smaller the effect of the bleed proportionally. I got the best results with a 0.25 pixel bleed at 32px or more. For smaller sizes, it tapers off linearly. When you zoom out blocks of text, they get subtly fatter instead of thinning out, and this is actually a great effect when viewing document thumbnails, where lines of text become a solid mass at the point where the SDF resolution fails anyway.

In Use.GPU I prefer to use gamma correct, linear RGB color, even for 2D. What surprised me the most is just how unquestionably superior this looks. Text looks rock solid and readable even at small sizes on low-DPI. Because the SDF scales, there is no true font hinting, but it really doesn't need it, it would just be a nice extra.

Presumably you could track hinted points or edges inside SDF glyphs and then do a dynamic distortion somehow, but this is an order of magnitude more complex than what it is doing now, which is splat a contrasted texture on screen. It does have snapping you can turn on, which avoids jiggling of individual letters. But if you turn it off, you get smooth subpixel everything:

I was always a big fan of the 3x1 subpixel rendering used on color LCDs (i.e. ClearType and the like), and I was sad when it was phased out due to the popularity of high-DPI displays. But it turns out the 3x res only offered marginal benefits... the real improvement was always that it had a custom gamma correct blend mode, which is a thing a lot of people still get wrong. Even without RGB subpixels, gamma correct AA looks great. Converting the entire desktop to Linear RGB is also not going to happen in our lifetime, but I really want it more now.

The "blurry text" that some people associate with anti-aliasing is usually just text blended with the wrong gamma curve, and without an appropriate bleed for the font in question.

* * *

If you want to make SDFs from existing input data, subpixel accuracy is crucial. Without it, fully crisp strokes actually become uneven, diagonals can look bumpy, and you cannot make clean dilated outlines or shadows. If you use an EDT, you have to start from a high resolution source and then downscale away all the errors near the edges. But if you use an ESDT, you can upscale even emoji PNGs with decent results.

It might seem pretty obvious in hindsight, but there is a massive difference between getting it sort of working, and actually getting all the details right. There were many false starts and dead ends, because subpixel accuracy also means one bad pixel ruins it.

In some circles, SDF text is an old party trick by now... but a solid and reliable implementation is still a fair amount of work, with very little to go off for the harder parts.

By the way, I did see if I could use voronoi techniques directly, but in terms of computation it is much more involved. Pretty tho:

The ESDT is fast enough to use at run-time, and the implementation is available as a stand-alone import for drop-in use.

This post started as a single live WebGPU diagram, which you can play around with. The source code for all the diagrams is available too.

How to Fold a Julia Fractal

2013-01-05T00:00:00+01:00

A tale of numbers that like to turn

"Take the universe and grind it down to the finest powder and sieve it through the finest sieve and then show me one atom of justice, one molecule of mercy. And yet," Death waved a hand, "And yet you act as if there is some ideal order in the world, as if there is some… some rightness in the universe by which it may be judged."
– The Hogfather, Discworld, Terry Pratchett

Mathematics has a dirty little secret. Okay, so maybe it's not so dirty. But neither is it little. It goes as follows:

Everything in mathematics is a choice.

You'd think otherwise, going through the modern day mathematics curriculum. Each theorem and proof is provided, each formula bundled with convenient exercises to apply it to. A long ladder of subjects is set out before you, and you're told to climb, climb, climb, with the promise of a payoff at the end. "You'll need this stuff in real life!", they say, oblivious to the enormity of this lie, to the fact that most of the educated population walks around with "vague memories of math class and clear memories of hating it."

Rarely is it made obvious that all of these things are entirely optional—that mathematics is the art of making choices so you can discover what the consequences are. That algebra, calculus, geometry are just words we invented to group the most interesting choices together, to identify the most useful tools that came out of them. The act of mathematics is to play around, to put together ideas and see whether they go well together. Unfortunately that exploration is mostly absent from math class and we are fed pre-packaged, pre-digested math pulp instead.

And so it also goes with the numbers. We learn about the natural numbers, the integers, the fractions and eventually the real numbers. At each step, we feel hoodwinked: we were only shown a part of the puzzle! As it turned out, there was a 'better' set of numbers waiting to be discovered, more comprehensive than the last.

Along the way, we feel like our intuition is mostly preserved. Negative numbers help us settle debts, fractions help us divide pies fairly, and real numbers help us measure diagonals and draw circles. But then there's a break. If you manage to get far enough, you'll learn about something called the imaginary numbers, where it seems sanity is thrown out the window in a variety of ways. Negative numbers can have square roots, you can no longer say whether one number is bigger than the other, and the whole thing starts to look like a pointless exercise for people with far too much time on their hands.

I blame it on the name. It's misleading for one very simple reason: all numbers are imaginary. You cannot point to anything in the world and say, "This is a 3, and that is a 5." You can point to three apples, five trees, or chalk symbols that represent 3 and 5, but the concepts of 3 and 5, the numbers themselves, exist only in our heads. It's only because we are taught them at such a young age that we rarely notice.

So when mathematicians finally encountered numbers that acted just a little bit different, they couldn't help but call them fictitious and imaginary, setting the wrong tone for generations to follow. Expectations got in the way of seeing what was truly there, and it took decades before the results were properly understood.

Now, this is not some esoteric point about a mathematical curiosity. These imaginary numbers—called complex numbers when combined with our ordinary real numbers—are essential to quantum physics, electromagnetism, and many more fields. They are naturally suited to describe anything that turns, waves, ripples, combines or interferes, with itself or with others. But it was also their unique structure that allowed Benoit Mandelbrot to create his stunning fractals in the late 70s, dazzling every math enthusiast that saw them.

Yet for the most part, complex numbers are treated as an inconvenience. Because they are inherently multi-dimensional, they defy our attempts to visualize them easily. Graphs describing complex math are usually simplified schematics that only hint at what's going on underneath. Because our brains don't do more than 3D natively, we can glimpse only slices of the hyperspaces necessary to put them on full display. But it's not impossible to peek behind the curtain, and we can gain some unique insights in doing so. All it takes is a willingness to imagine something different.

So that's what this is about. And a lesson to be remembered: complex numbers are typically the first kind of numbers we see that are undeniably strange. Rather than seeing a sign that says Here Be Dragons, Abandon All Hope, we should explore and enjoy the fascinating result that comes from one very simple choice: letting our numbers turn. That said, there are dragons. Very pretty ones in fact.

Like Hands on a Clock

What does it mean to let numbers turn? Well, when making mathematical choices, we have to be careful. You could declare that $ 1 + 1 $ should equal $ 3 $, but that only opens up more questions. Does $ 1 + 1 + 1 $ equal $ 4 $ or $ 5 $ or $ 6 $? Can you even do meaningful arithmetic this way? If not, what good are these modified numbers? The most important thing is that our rules need to be consistent for them to work. But if all we do is swap out the symbols for $ 2 $ and $ 3 $, we didn't actually change anything in the underlying mathematics at all.

So we're looking for choices that don't interfere with what already works, but add something new. Just like the negative numbers complemented the positives, and the fractions snugly filled the space between them—and the reals somehow fit in between that—we need to go look for new numbers where there currently aren't any.

We'll start with the classic real number line, marked at the integer positions, and poke around.
We imagine the line continues to the left and right indefinitely.

$$ \class{blue}{2} + \class{green}{3} = \class{red}{5} $$

But there's a problem with this visualization: by picturing numbers as points,
it's not clear how they act upon each other.
For example, the two adjacent numbers $ \class{blue}{2} + \class{green}{3} $ sum to $ \class{red}{5} $ …

$$ \class{blue}{-2} + \class{green}{-1} = \class{red}{-3} $$

… but the similarly adjacent pair $ \class{blue}{-2} + \class{green}{-1} = \class{red}{-3} $.
We can't easily spot where the red point is going to be based on the blue and green.

A better solution is to represent our numbers using arrows instead, or vectors.
Each arrow represents a number through its length, pointing right/left for positive/negative.

The nice thing about arrows is that you can move them around without changing them.
To add two arrows, just lay them end to end. You can easily spot why $ \class{blue}{-2} + \class{green}{-1} = \class{red}{-3} $ …

… and why $ \class{blue}{2} + \class{green}{3} = \class{red}{5} $, similarly.
As long as we apply positives and negatives correctly, everything still works.

$$ \times \class{green}{1.5} ... $$

Now let's examine multiplication. We're going to start with $ \class{blue}{1} $ and then we'll multiply it by $ \class{green}{1.5} $ repeatedly.

With every multiplication, the vector gets longer by 50 percent.
These vectors represent the numbers $ \class{red}{1}, \class{red}{1.5}, \class{red}{2.25}, \class{red}{3.375} $, $ \class{red}{5.0625} $, a nice exponential sequence.

$$ \times (\class{green}{-1.5}) ... $$

Now we're going to do the same, but multiplying by the negative, $ \class{green}{-1.5} $, repeatedly.

The vectors still grow by 50%, but they also flip around, alternating between positive and negative.
These vectors represent the sequence $ \class{red}{1}, \class{red}{-1.5}, \class{red}{2.25}, \class{red}{-3.375}, \class{red}{5.0625} $.

But there's another way of looking at this. What if instead of flipping from positive to negative, passing through zero, we went around instead, by rotating the vector as we're growing it?

We'd get the same numbers, but we've discovered something remarkable: a way to enter and pass through the netherworld around the number line. The question is, is this mathematically sound, or plain non-sense?

$$ +180^\circ $$

$$ 0^\circ $$

The challenge is to come up with a consistent rule for applying these rotations. We start with normal arithmetic. Multiplying by a positive didn't flip the sign, so we say we rotated by $ 0^\circ $. Multiplying by a negative flips the sign, so we rotated by $ \class{green}{180^\circ} $. The lengths are multiplied normally in both cases.

$$ \times \class{green}{1.5 \angle 90^\circ} ... $$

$$ +90^\circ $$

$$ +270^\circ $$

Now suppose we pick one of the in-between nether-numbers, say the vector of length $ 1.5 $, at a $ 90^\circ $ angle. What does that mean? That's what we're trying to find out! We'll write that as $ \class{green}{1.5 \angle 90^\circ} $ (1.5 at 90 degrees). It could make sense to say that multiplying by this number should rotate by $ \class{green}{90^\circ} $ while again growing the length by 50%.

This creates the spiral of points: $ \class{red}{1 \angle 0^\circ} $, $ \class{red}{1.5 \angle 90^\circ} $, $ \class{red}{2.25 \angle 180^\circ} $, $ \class{red}{3.375 \angle 270^\circ} $, $ \class{red}{5.0625 \angle 360^\circ} $. Three of those are normal numbers: $ +1 $, $ -2.25 $ and $ +5.0625 $, lying neatly on the real number line. The other two are new numbers conjured up from the void.

$$ +135^\circ $$

$$ +45^\circ $$

$$ +225^\circ $$

$$ +315^\circ $$

$$ \times \class{green}{1 \angle 45^\circ} ... $$

Let's examine this rotation more. We can pick $ 1 $ at a $ \class{green}{45^\circ} $ angle. Multiplying by a $ 1 $ probably shouldn't change a vector's length, which means we'd get a pure rotation effect.

By multiplying by $ \class{green}{1 \angle 45^\circ} $, we can rotate in increments of $ 45^\circ $.
It takes 4 multiplications to go from $ +1 $, around the circle of ones, and back to the real number $ -1 $.

And that's actually a remarkable thing, because it means our invented rule has created a square root of $ -1 $.
It's the number $ \class{green}{1 \angle 90^\circ} $.

$ (\class{green}{1 \angle 90^\circ})^2 = \class{blue}{-1} $

If we multiply it by itself, we end up at angle $ \class{green}{90} + \class{green}{90} = \class{blue}{180^\circ} $, which is $ \class{blue}{-1} $ on the real line.

But actually, the same goes for $ \class{green}{1 \angle 270^\circ} $.

$ (\class{green}{1 \angle 270^\circ})^2 = \class{blue}{-1} $

When we multiply it by itself, we end up at angle $ \class{green}{270} + \class{green}{270} = \class{blue}{540^\circ} $. But because we went around the circle once, that's the same as rotating by $ \class{blue}{180^\circ} $. So that's also equal to $ \class{blue}{-1} $.

$$ \pm180^\circ $$

$$ 0^\circ $$

$$ -90^\circ $$

$$ +90^\circ $$

$$ -135^\circ $$

$$ -45^\circ $$

$$ +135^\circ $$

$$ +45^\circ $$

$ (\class{green}{1 \angle -90^\circ})^2 = \class{blue}{-1} $

Or we could think of $ +270^\circ $ as $ -90^\circ $, and rotate the other way. It works out just the same. This is quite remarkable: our rule is consistent no matter how many times we've looped around the circle.

$ (\class{green}{1 \angle 90^\circ})^2 = \class{blue}{-1} $

$ (\class{green}{1 \angle 270^\circ})^2 = \class{blue}{-1} $

Either way, $ \class{blue}{-1} $ has two square roots, separated by $ 180^\circ $, namely $ \class{green}{1 \angle 90^\circ} $ and $ \class{green}{1 \angle 270^\circ} $.
This is analogous to how both $ 2 $ and $ -2 $ are square roots of $ 4 $.

$$ \class{blue}{a} \cdot \class{green}{b} = \class{red}{c}$$

Complex multiplication can then be summarized as: angles add up, lengths multiply, taking care to preserve clockwise and counterwise angles. Above, we multiply two random complex numbers a and b to get c.

$$ \class{blue}{a} \cdot \class{green}{b} = \class{red}{c}$$

When we start changing the vectors, c turns along, being tugged by both a and b's angles. It wraps around the circle, while its length changes. Hence, complex numbers like to turn, and it's this rule that separates them from ordinary vectors.

$$ \hspace{35 pt} + $$

$$ - \hspace{35 pt} $$

We can then picture the complex plane as a grid of concentric circles. There's a circle of ones, a circle of twos, a circle of one-and-a-halfs, etc. Each number comes in many different versions or flavors, one positive, one negative, and infinitely many others in between, at arbitrary angles on both sides of the circle.

$$ \pm180^\circ $$

$$ 0^\circ $$

$$ +90^\circ $$

$$ \hspace{15pt} \class{blue}{i} $$

Which brings us to our reluctant and elusive friend, $ \class{blue}{i} $. This is the proper name for $ \class{blue}{1 \angle 90^\circ} $, and the way complex numbers are normally introduced: $ i^2 = -1 $. The magic is that we can put a complex number anywhere a real number goes, and the math still works out, oddly enough. We get complex answers about complex inputs.

Complex numbers are then usually written as the sum of their (real) X coordinate, and their (imaginary) Y coordinate, much like ordinary 2D vectors. But this is misleading: the ugly number $ \class{red}{\frac{\sqrt{3}}{2} + \frac{1}{2}i } $ is actually just $ \class{green}{1 \angle 30^\circ} $ in disguise, and it acts more like a $ 1 $ than a $ \frac{1}{2} $ or $ \frac{\sqrt{3}}{2} $. While knowing how to convert between the two is required for any real calculations, you can cheat by doing it visually.

$$ \pm180^\circ $$

$$ 0^\circ $$

$$ -90^\circ $$

$$ +90^\circ $$

$$ -135^\circ $$

$$ -45^\circ $$

$$ +135^\circ $$

$$ +45^\circ $$

$$ \class{blue}{+1} $$

$$ \hspace{55pt}\class{green}{+i} $$

$$ \class{blue}{-1} $$

$$ \class{green}{-i}\hspace{55pt} $$

But looking at individual vectors only gets us so far. We study functions of real numbers by looking at a graph that shows us every output for every input. To do the same for complex numbers, we need to understand how these numbers-that-like-to-turn, this field of vectors, change as a whole.
Note: from now on, I'll put $ +1 $, i.e. $ 0^\circ $ at the 12 o'clock position for simplicity.

When we apply a square root, each vector shifts. But really, it's the entire fabric of the complex plane that's warping. Each circle has been squeezed into a half-circle, because all the angles have been halved—the opposite of squaring, i.e. doubling the angle. The lengths have had a normal square root applied to them, compressing the grid at the edges and bulging it in the middle.

But remember how every number had two opposite square roots? This comes from the circular nature of complex math. If we take a vector and rotate it $ 360 ^\circ $, we end up in the same place, and the two vectors are equal. But after dividing the angles in half, those two vectors are now separated by only $ 180 ^\circ $ and lie on opposite ends of the circle. In complex math, they can both emerge.

Complex operations are then like folding or unfolding a piece of paper, only it's weird and stretchy and circular. This can be hard to grasp, but is easier to see in motion. To help see what's going on, I've cut the disc and separated the positive from the negative angles in 3D.

When we square our numbers to undo the square root, the angles double, folding the plane in on itself. The lengths are also squared, restoring the grid spacing to normal.

After squaring, each square root has now ended up on top of its identical twin, and we can merge everything back down to a flat plane. Everything matches up perfectly.

Thus the square root actually looks like this. New numbers flow in from the 'far side' as we try and shear the disc apart. The complex plane is stubborn and wants to stay connected, and will fold and unfold to ensure this is always the case. This is one of its most remarkable properties.

There's no limit to this folding or unfolding. If we take every number to the fourth power, angles are multiplied by four, while lengths are taken to the fourth power. This results in 4 copies of the plane being folded into one.

However, things are not always so neat. What happens if we were to take everything to an irrational power, say $ \frac{1}{\sqrt{2}} $? Angles get multiplied by $ 0.707106... $, which means a rotation of $ 360^\circ $ now becomes $ \sim 254.56^\circ $.

Because no multiple of $ 360 $ is divisible by $ \frac{1}{\sqrt{2}} $, the circular grid never matches up with itself again no matter how far we extend it. Hence, this operation splits a single unique complex number into an infinite amount of distinct copies.

For any irrational power $ p $, there are an infinite number of solutions to $ z^p = c $, all lying on a circle. For a hint as to why this is so, we can look at Taylor series: an arbitrary function $ f(z) $ can be written as an infinite sum $ a + bz + cz^2 + dz^3 + ... \,$ When z is complex, such a sum doesn't just represent a finite amount of folds, but a mindboggling infinite origami of complex space.

We've seen how complex numbers are arrows that like to turn, which can be made to behave like numbers: we can add and multiply them, because we can come up with a consistent rule for doing so. We've also seen what powers of complex numbers look like: we fold or unfold the entire plane by multiplying or dividing angles, while simultaneously applying a power to the lengths.

Pulling a Dragon out of a Hat

With a basic grasp of what complex numbers are and how they move, we can start making Julia fractals.

At their heart lies the following function:

$$ f(z) = z^2 + c $$

This says: map the complex number $ z $ onto its square, and then add a constant number to it. To generate a Julia fractal, we have to apply this formula repeatedly, feeding the result back into $ f $ every time.

$$ z_{n+1} = (z_n)^2 + c $$

We want to examine how $ z_n $ changes when we plug in different starting values for $ z_1 $ and iterate $ n $ times. So let's try that and see what happens.

Our region of interest is the disc of complex numbers less than $ 2 $ in length. I've marked the circle of ones as a reference.

We take an arbitrary set of numbers, like this grid, and start applying the formula $ f(z) = z^2 + c $ to each. Rather than use vectors, I'll just draw points, to avoid cluttering the diagram.

First we square each number. That is, their lengths are squared, their angles are doubled.
The squaring has a dual effect: numbers larger than $ 1 $ grow bigger and are pushed outwards, numbers less than $ 1 $ grow smaller and are pulled inwards.

Next, we reset the grid back to neutral, keeping the numbers in their new place.
We also pick a random value for the constant $ \class{green}{c} $, e.g. $ \class{green}{0.57 \angle -59^\circ} $.

Now we add $ \class{green}{c} $ to each point, completing one round of Julia iteration, $ f(z) = z^2 + c $. As a result, some numbers have ended up closer towards the origin (i.e. $ 0 $), others further away from it. The combination of folding + shifting has had a non-obvious effect on the numbers.

We begin the second iteration and square each number again. Any number not inside the critical circle of $ 1 $ in the middle will get pushed out again. The other numbers continue to linger in the middle.

If we zoom out, we can see the larger numbers are spiralling outwards and are permanently lost. The minor nudge by $ \class{green}{c} $ won't be enough to bring them back.

Others remain in the middle, being drawn in, but are also at risk of being pushed out of the circle by $ \class{green}{c} $.

Resetting the grid again, we add the same value $ \class{green}{c} $ to our vectors again to finish. At this point, our original grid of numbers has been completely jumbled up.

If we continued this process would any numbers remain in the middle? Or would they eventually all get flung out? Unfortunately it's very hard to see what's going on while iterating forwards, because we lose track of where each point came from.

So we're going to go backwards instead. We'll establish a safe-zone of all numbers less than $ 2 $, forming a solid disc of all those which aren't irretrievably lost. We want to know where all these numbers can possibly come from. To help track these points, I've coloured one area in a different shade.

First we have to shift the numbers again, this time in the opposite direction to subtract $ c $.

Now we apply the square root to find $ z_{n-1} = \pm \sqrt{z_n - c} $, which is a Julia iteration in reverse.

After one backwards iteration, the disc has been squished down into an oval at an angle.
These are all the points that will definitely stay in the middle after one iteration.

When we apply the second iteration, a pattern starts to develop. Because of the repeated unfolding, we create two bulges wherever there was previously only one.

At the same time, the square root alters the length of each number as well. As a result, we squeeze in the radial direction, scaling down earlier features as they combine with newly created ones.

After 4 iterations, we start to see the first hints of self-similarity. The shape's lobes are sprouting into spirals.

But all we've really done is narrow down our blue safe-zone to include only those points that 'survive' up to 5 Julia iterations.

Remarkably this seems to distort the fractal evenly: our highlighted circles don't stretch into ovals. This is not a coincidence. Complex operations are indeed stubborn, in that they all preserve right angles everywhere. To do so, the mapping must act like a pure scaling and rotation at every point, without shearing off in any particular direction. This is what allows the fractal to look like itself at different scales.

Skipping ahead to iteration 12, we've definitely abandoned the realm of neat, traditional geometry.
Despite curving wildly, the total mapping $ z_{12} $ still has this property of evenness, which is properly referred to as a conformal mapping.

After 128 iterations, we end up with this intricate dragon-like shape, approximating the safe zone for the true fractal map $ z_\infty $. The numbers that make up the blue area are the hardiest points that will survive the next 128 attempts on their life. All the others will definitely get flung out.

Yet this complicated shape is merely the result of folding over and over again, adding a simple constant in between. If we perform a forwards Julia iteration, i.e. squaring and shifting, we see this shape matches up with itself, and looks identical before and after.

For different values of $ \class{green}{c} $, the fractal morphs into other shapes. There's literally an infinite variety to discover. Some sets are made up of disconnected parts. In this case, $ |c| $ is large enough to push the solid disc away from the center in a single iteration, but not so far that some points can't fold back in. If $ |c| $ gets much larger, the set vanishes.

For a smaller $ c $, Julia sets are solid. Even a small shift in the value of $ c $ can accumulate into a large difference. Here we zone in on some fluffy clouds right outside the 'solid zone'. Oddly enough, it seems when $ c $ is not inside of its own Julia set, the set is not solid. Note that in this case, 128 iterations is not sufficient: large solid patches remain, which would be divided further with more iterations.

This area of fractal space is dubbed Seahorse Valley, for rather obvious reasons.

Nearby, we find these jewel-like spirals.

Buried deep inside, there are remarkable combinations of shapes, like this pearl necklace covered in something resembling palm trees.

And we can even make snowflakes. The dramatic changes due to $ c $ reveal the chaotic nature of fractals. Mathematically, chaos occurs when even the tiniest change can accumulate and blow up to an arbitrarily large effect.

If we change our iteration formula, for example to a fourth power $ f(z) = z^4 + c $, the entire shape changes. Because each iteration now turns one bulge into four, the resulting shape has four-fold rotational symmetry.

Again, different values of $ \class{green}{c} $ make different shapes, precipitating dramatic changes.

To understand the effect of $ c $ we need to make a Mandelbrot set. This is similar to a Julia set, but the formula is applied differently. We'll use $ z^2 + c $ again. Instead of different starting values $ z_1 $, we choose different values of $ c $ and start with $ z_1 = 0 $ every time. Because $ c $ is no longer constant, the mapping stops being a simple folding operation. Each iteration is now unique and not so easy to visualize.

Because the Mandelbrot set traverses all possible values of $ c $ across its surface, it has a part of every associated Julia set in it. Around any number $ \class{green}{c} $ it looks like the Julia set which has that value as its constant. Here, we move towards the three-way cross at the bottom of the Mandelbrot set. The Julia set develops similar features.

Where the Mandelbrot set is round and bulbous, the Julia set is too.

The spirals and seahorses from earlier are located here. You can literally see the shapes on both sides of the valley evolving towards horseheads and spirals respectively. But the Mandelbrot set acts like a map to Julia sets in a much more direct way: anywhere the Mandelbrot set is filled in (blue), the corresponding Julia set is solid too. The white areas are values of $ c $ which create disconnected Julia sets.

That the Mandelbrot set is a 'pixel-perfect' map of Julia sets is a big clue. It reflects that they're actually both slices of a single higher dimensional object. By viewing these slices as we travel through, we can get a vague idea of its shape and complexity. In this object, every point in the Mandelbrot set is connected to the center of the corresponding Julia set. Actually picturing this 4D object is a challenge.

But like any fractal, the Mandelbrot set also contains copies of itself, buried inside its edge. This is just one of the many varied copies. As a result, deep Mandelbrot zooms can reach astonishing levels of beauty in complexity. This is best done with specialized software that can calculate with hundreds of digits of precision.

Making fractals is probably the least useful application of complex math, but it's an undeniably fascinating one. It also reveals the unique properties of complex operations, like conformal mapping, which provide a certain rigidity to the result.

However, in order to make complex math practical, we have to figure out how to tie it back to the real world.

Travelling without Moving

It's a good thing we don't have to look far to do so. Whenever we're describing wavelike phenomena, whether it's sound, electricity or subatomic particles, we're also interested in how the wave evolves and changes. Complex operations are eminently suited for this, because they naturally take place on circles. Numbers that oppose can cancel out, numbers in the same direction will amplify each other, just like two waves do when they meet. And by folding or unfolding, we can alter the frequency of a pattern, doubling it, halving it, or anything in between.

More complicated operations are used for example to model electromagnetic waves, whether they are FM radio, wifi packets or ADSL streams. This requires precise control of the frequencies you're generating and receiving. Doing it without complex numbers, well, it just sucks. So why use boring real numbers, when complex numbers can do the work for you?

$$ w(x) = \sin(x) $$

Take for example a sine wave $ w(x) $.

$$ w(x, t) = \sin(x - t) $$ $$ \class{blue}{\frac{\partial w(x, t)}{\partial t}} $$

For the wave to propagate across a distance, its values have to ripple up and down over time.
The rate of change over time is drawn on top. This is the vertical velocity at every point. Both the wave and its rates of change undergo a complicated numerical dance.

$$ w(x, t) = \sin(x - t) $$ $$ \class{blue}{\frac{\partial w(x, t)}{\partial t}} \,\, \class{green}{\frac{\partial^2 w(x, t)}{\partial t^2}} $$

But to properly describe this motion, we have to go one level deeper. We have to examine the rate of change of the vertical velocity of the wave. This is its vertical acceleration. We see that green vectors tug on blue vectors as blue vectors tug on the wave.

$$ w(x, t) = \sin(x - t) $$ $$ \class{green}{\frac{\partial^2 w(x, t)}{\partial t^2}} = \,? $$

It's easier to see what's going on if we center the vectors vertically. The acceleration appears to be equal but opposite to the wave itself.

$$ w(x, t) = \sin(x - t) + 1 $$ $$ \class{green}{\frac{\partial^2 w(x, t)}{\partial t^2}} = \,? $$

But that's just a lucky coincidence. If we shift the wave up by one unit, its opposite shifts down by a unit. Yet its velocity and acceleration are unaltered. So acceleration is not simply the opposite of the wave.

What's actually going on is that the green vectors match the curvature of the wave, positive inside valleys, negative on top of crests. Intuitively, this can be explained by saying that waves tend to bounce towards an average level: this is going to pull the value up out of valleys and down from peaks.

$$ w(x, t) = \sin(x - t) + 1 $$ $$ \class{green}{\frac{\partial^2 w(x, t)}{\partial t^2}} = \class{red}{\frac{\partial^2 w(x, t)}{\partial x^2}} $$

But curvature is the rate of change of the slope, and slope is the rate of change over a distance. So to describe real waves, we need to relate 'second level' change over time and change over distance, each deriving twice. This is Complicated with a capital C.

Let's try this with complex numbers instead. Until now, we had a 2D graph, showing the real value of the wave over real distance. We're going to make the wave's value complex. Mapping a 1D number (distance) to a 2D number (the wave function), means we need a 3D diagram.

The complex plane is mapped into the old Y direction (real) and the new Z direction (imaginary).

$$ w(x) = (1 \angle x) $$

To make a complex wave, we do the thing complex numbers are best at: we make them turn, and make a helix. In this case, our wave function is simply the variable number $ 1 \angle x $ , a constant length with a smoothly changing rotation over distance.

$$ w(x, t) = (1 \angle x) \cdot (1 \angle t) = 1 \angle (x + t) $$

$$ \class{blue}{\frac{\partial w(x, t)}{\partial t}} = \,? $$

To make the wave move, we can simply twist it in-place. Which we now know is the same as multiplying by an increasing angle $ 1 \angle t $. If we plot the complex velocity of each point, at first sight this might not look any simpler than the real wave. But in fact, these vectors are not changing in length at all, unlike the real version. As the wave is pulled by the velocity vectors, both undergo a pure rotation.

$$ \class{blue}{\frac{\partial w(x, t)}{\partial t}} = i \cdot w(x, t) $$

At all times, the velocity is offset by $ 90^\circ $ from the wave itself. And that means that described in complex numbers, wave equations are super easy. Instead of involving two derivatives, i.e. the rate of rate of change, we only need one. There is a direct relationship between a value and its rate of change. The necessary rotation by $ 90^\circ $ can then be written simply as multiplying by $ i $.

To recover a real wave from a complex wave, we can simply flatten it back to 2D, discarding the imaginary part. By using complex numbers to describe waves, we give them the power to rotate in place without changing their amplitude, which turns out to be much simpler.

$$ \frac{1}{2} (\class{blue}{ 1 \angle (x + t) } + \class{green}{ 1 \angle -(x + t) }) = \cos(x + t) $$

In fact, flattening the wave has a perfectly reasonable complex interpretation: it's what happens when we average out a counter-clockwise wave (positive frequency) with a clockwise wave (negative frequency). By twisting each in opposite directions, the combined wave travels along, locked to the real number line.

$$ \frac{1}{2} (\class{blue}{ 1 \angle (x + t) } + \class{green}{ 1 \angle -(\frac{3}{2}x + t) }) = \,? $$

But if we add up two arbitrary complex frequencies, their sum immediately turns into a spirograph pattern that manages to evolve and propagate, even as it just rotates in place. Though the original waves both had a constant amplitude of $ 1 $, the relative differences in angles (i.e. the phase) allows them to cancel out in surprising ways.

Neither curve is actually moving forward: they're just spinning in place, creating motion anyway. This is actually what quantum superposition looks like, where two or more complex probability waves combine and interfere. Where the result cancels out to zero, that's where two separate possible states are cancelling out each other, creating interference. That the underlying numbers are complex doesn't prevent them from describing real physics, indeed, it seems that's how nature actually works.

This serene display hides a whirlwind of phase. We can plot the velocity of the two frequencies, and their combination, scaled down for clarity. Once again you can see the power of describing waves with complex numbers, letting you split up a complicated motion into simple, repetitive rotations… into numbers that like to turn.

The End Is Just The Beginning

In visualizing complex waves, we've seen functions that map real numbers to complex numbers, and back again. These can be graphed easily in 3D diagrams, from $ \mathbb{R} $ to $ \mathbb{C} $ or vice-versa. You cross 1 real dimension with the 2 dimensions of the complex plane.

But complex operations in general work from $ \mathbb{C} $ to $ \mathbb{C} $. To view these, unfortunately you need four-dimensional eyes, which nature has yet to provide. There are ways to project these graphs down to 3D that still somewhat make sense, but it never stops being a challenge to interpret them.

For every mathematical concept that we have a built-in intuition for, there are countless more we can't picture easily. That's the curse of mathematics, yet at the same time, also its charm.

Hence, I tried to stick to the stuff that is (somewhat!) easy to picture. If there's interest, a future post could cover topics like: the nature of $ e^{ix} $, Fourier transforms, some actual quantum mechanics, etc.

For now, this story is over. I hope I managed to spark some light bulbs here and there, and that you enjoyed reading it as much as I did making it.

Comments, feedback and corrections are welcome on Google Plus. Diagrams powered by MathBox.

More like this: To Infinity… And Beyond!.

For extra credit: check out these great stirring visualizations of Julia and Mandelbrot sets. I incorporated a similar graphic above. Hat tip to Tim Hutton for pointing these out. And for some actual paper mathematical origami, check out Vihart's latest video on Snowflakes, Starflakes and Swirlflakes.

JavaScript audio synthesis with HTML 5

2009-08-12T00:00:00+02:00

HTML5 gives us a couple new toys to play with, such as and tags. On the visual side, we've already seen live green-screening with Canvas and JS, and in terms of audio there's been several JS drum machines already. But the question I was interested in was: can you use JavaScript to stream live data into these media tags?

Enter the JavaScript audio synth. It generates a handful of samples using very basic time-domain synthesis, wraps them up in a WAVE file header and embeds them in tags using base64-encoded data URIs. Each sample is then triggered using timers to play the drum pattern. It's quite simple to do and runs fast enough in HTML5 capable browsers to be unnoticeable. Yes, it sounds tinny, but that's just because I'm too lazy to design proper filters for toys like this. Unfortunately, while the synthesis is fast enough to run real-time, you can't actually use it for a full live audio stream, as there is no way to queue up chunks of synthesized audio for seamless playback. I tried triggering multiple tags in parallel to address this, but that didn't work either.

My final attempt was to generate tons of periodic audio loops only a couple of ms long, and to play them back with looping turned on while altering each tag's volume in real time, hence doing a sort of additive wavetable synthesis. Unfortunately, looping is not a fully supported feature, and the only browser I found that does it (Safari) doesn't loop seamlessly at all.

All in all, my first brush with the tag was a major disappointment. The tag's high-level approach leads to similar limitations, but they are offset by the flexibility and power of the

Taming complex numbers in Grapher.app

2008-09-24T00:00:00+02:00

Of all the free extras that Mac OS X has, Grapher has to be one of the coolest. This little app, hidden away in the Applications/Utilities folder, is a powerful graphing tool for mathematical equations and data sets.

As you might expect from Apple, it typesets symbolic math beautifully and produces smooth, anti-aliased graphs. But this isn't just a little tech demo to showcase some of OS X's technologies: Grapher's features blow away your crusty old TI-83, and it comes with its own set of surprises. For example, not only can you save graphs as PDF or EPS, but it can export animations and even doubles as a LaTeX formula editor.

In fact, it does so much that its main weakness is the documentation, which only covers the very basics. The best way to learn Grapher is to look at the handful of included examples, although it might take you a while to find out how to replicate them from scratch.

The other day I needed to quickly graph a couple of things involving complex numbers, and it seemed that Grapher was doing some very freaky shit. Either that, or my math was really rusty. It turned out I'm not as stupid as I thought, and there are some weird caveats with using complex numbers in Grapher. Oddly, there is very little information online about it, so I figured for future reference, I should document the workarounds I discovered.

Let's dive in. Fuck MS Paint, I've got math to do.

Refresher

To type formulas into Grapher, you can use the symbol palette, available in the Window menu, or type away using various keyboard shortcuts:

Type ^ for exponents, _ for indices, / for fractions. Grapher understands exponents and other notations, for example the Bessel functions J_n(x).
Use the arrow keys to move around the equation: in and out of parentheses, exponents, fractions, etc. Pay attention to the cursor to see where you're typing.
Type out greek letter names for the symbols: alpha, omega, pi.
Common mathematical constants work: e, π, i.
The very useful 'Copy LaTeX expression' command is hidden away in the editor's right-click menu.

Using complex numbers

At first sight, complex numbers 'just work'. Using i as the imaginary unit, you can use numbers like 1 + 2i or plot graphs like y=e^ix. You can use the Re() and Im() operators to explicitly extract the real or imaginary part of a complex number and use abs() and arg() to extract the modulus and argument. If an expression's result is complex, Grapher will only plot the real part.

This last bit is where things get tricky, because this silent casting of complex numbers to reals also sometimes happens in intermediate values.

Silent truncation

Let's plot a complex parametric curve directly using formulas of the form x + iy=.... As an example, let's look at this:

These equations are using Euler's formula e^i·x = cos x + i·sin x to plot a half circle each. The only difference between the two formulas is that the second one is passing its value through the (useless) function f(t).

Now if we replace e^i·x with 1/e^i·x = e^–i·x = cos x – i·sin x and change f(t) to 1/t, all that should happen is that the graph is mirrored vertically. Instead, this happens:

The blue circle segment is drawn as a broken horizontal line. What's happening is that Grapher is treating the definition f(t) = 1/t as if it said f(t) = 1/Re(t). In other words, it is truncating the complex input of f(t) to a real number.

To fix this, you need to replace the variable t with complex(t). This complex() function is listed in the built-in definitions list in the Help menu, but lacks any documentation. With this fix applied, the graph will plot as expected:

Further tests reveal that complex(t) is in fact equivalent to writing out Re(t) + i·Im(t), thus manually recomposing the complex number from its own real and imaginary parts. If it weren't for the existence of the complex() helper, one might consider this issue a bug. The way it is now, it seems this behaviour is somewhat intentional.

Moral of the story: wrap all your function inputs in complex() to avoid nasty surprises.

Broken built-ins

Another annoying issue is that certain built-in functions don't handle complex inputs. To show this, you can try plotting y=sinh(–i²·x). Mathematically, this is equivalent to plotting y=sinh(x) directly. However the presence of the imaginary unit causes the plot to fail.

As a workaround, you need to define your own functions using known formulas and incorporating the complex() fix.

For example, you might define:

fixsinh(x) = (e^complex(x) – e^-complex(x))/ 2 fixcosh(x) = (e^complex(x) + e^-complex(x))/ 2

Other built-ins are trickier. For example, Γ(z) needs replacing, but mathematically it is defined as an improper integral. Unfortunately, Grapher's integrator doesn't seem to handle the definition for Γ(z) at all — though it's supposed to do improper integrals.

When using built-in definitions, always verify that you're getting the results you need with a simple example.

Math porn

To round this off, here's an example where I use these tricks to plot a Kaiser sampling window and its frequency response:

Happy graphing!