Acko.net

On TermKit

2011-05-17T00:00:00+02:00

I've been administering Unix machines for many years now, and frankly, it kinda sucks. It makes me wonder, when sitting in front of a crisp, 2.3 million pixel display (i.e. a laptop) why I'm telling those pixels to draw me a computer terminal from the 80s.

And yet, that's what us tech nerds do every day. The default Unix toolchain, marked in time by the 1970 epoch, operates in a world where data is either binary or text, and text is displayed in monospace chunks. The interaction is strictly limited to a linear flow of keystrokes, always directed at only one process. And that process is capable of communicating only in short little grunts of text, perhaps coalescing into a cutesy little ASCII art imitation of things that grown-ups call "dialogs", "progress bars", "tables" and "graphs".

The Unix philosophy talks about software as a toolset, about tiny programs that can be composed seamlessly. The principles are sound, and have indeed stood the test of time. But they were implemented in a time when computing resources were orders of magnitude smaller, and computer interaction was undiscovered country.

In the meantime, we've gotten a lot better at displaying information. We've also learned a lot of lessons through the web about data interchange, network transparency, API design, and more. We know better how small tweaks in an implementation can make a world of difference in usability.

And yet the world of Unix is rife with jargon, invisible processes, traps and legacy bits. Every new adept has to pass a constant trial by fire, of not destroying their system at every opportunity it gives them.

So while I agree that having a flexible toolbox is great, in my opinion, those pieces could be built a lot better. I don't want the computer equivalent of a screwdriver and a hammer, I want a tricorder and a laser saw. TermKit is my attempt at making these better tools and addresses a couple of major pain points.

I see TermKit as an extension of what Apple did with OS X, in particular the system tools like Disk Utility and Activity Monitor. Tech stuff doesn't have to look like it comes from the Matrix.

Rich Display

It's 2011, and monospace text just doesn't cut it anymore. In the default ANSI color palette, barely any of the possible color combinations are even readable. We can't display graphs, mathematical formulas, tables, etc. We can't use the principles of modern typography to lay out information in a readable, balanced way.

So instead, I opted for a front-end built in WebKit. Programs can display anything that a browser can, including HTML5 media. The output is built out of generic widgets (lists, tables, images, files, progress bars, etc.). The goal is to offer a rich enough set for the common data types of Unix, extensible with plug-ins. The back-end streams display output to the front-end, as a series of objects and commands.

I should stress that despite WebKit it is not my intent to make HTML the lingua franca of Unix. The front-end is merely implemented in it, as it makes it instantly accessible to anyone with HTML/CSS knowledge.

Pipes

Unix pipes are anonymous binary streams, and each process comes with at least three: Standard In, Standard Out and Standard Error. This corresponds to the typical Input > Processing > Output model, with an additional error channel. However, in actual usage, there are two very different scenarios.

One is the case of interactive usage: a human watches the program output (from Std Out) on a display, and types keystrokes to interact with it (into Std In). Another case is the data processing job: a program accepts a data stream in a particular format on Std In, and immediately outputs a related data stream on Std Out. These two can be mixed, in that a chain of piped commands can have a human at either end, though usually this implies non-interactive operation.

These two cases are shoehorned into the same pipes, but happen quite differently. Human input is spontaneous, sporadic and error prone. Data input is strictly formatted and continuous. Human output is ambiguous, spaced out and wordy. Data output is conservative and monolithic.

As a result, many Unix programs have to be careful about data. For example, many tools dynamically detect whether they are running in interactive mode, and adjust their output to be more human-friendly or computer-friendly. Other tools come with flags to request the input/output in specific formats.

This has lead to "somewhat parseable text" being the default interchange format of choice. This seems like an okay choice, until you start to factor in the biggest lesson learned on the web: there is no such thing as plain text. Text is messy. Text-based formats lie at the basis of every SQL injection, XSS exploit and encoding error. And it's in text-parsing code where you'll likely find buffer overflows.

What this means in practice is that in every context, there are some forbidden characters, either by convention or by spec. For example, no Unicode or spaces in filenames. In theory, it's perfectly fine, but in practice, there's at least one shell script on your system that would blow up if you tried. Despite the promise of text as the universal interchange format, we've been forced to impose tons of vague limits.

So how do we fix this? By separating the "data" part from the "human" part. Then we can use messy text for humans, and pure data for the machines. Enter "Data In/Out", "View In/Out".

The data pipes correspond to the classical Std pipes, with one difference: the stream is prefixed with MIME-like headers (Content-Type, Content-Length, etc). Of these, only the 'Content-Type' is required. It allows programs to know what kind of input they're receiving, and handle it graciously without sniffing. Aside from that, the data on the pipe is a raw binary stream.

The view pipes carry the display output and interaction to the front-end. Widgets and UI commands are streamed back and forth as JSON messages over the view pipes.

The real magic happens when these two are combined. The last dangling Std Out pipe of any command chain needs to go into the Terminal, to be displayed as output. But the data coming out of Data Out is not necessarily human-friendly.

Thanks to the MIME-types, we can solve this universally. TermKit contains a library of output formatters which each handle a certain type of content (text, code, images, ...). It selects the right formatter based on the Content-Type, which then generates a stream of view updates. These go over the View Out pipe and are added to the command output.

As a result, you can cat a PNG and have it just work. TermKit cat doesn't know how to process PNGs or HTML—it only guesses the MIME type based on the filename and pipes the raw data to the next process. Then the formatter sends the image to the front-end. If you cat a source code file, it gets printed with line numbers and syntax highlighting.

So where does "somewhat parseable text" fit in? It turns out to be mostly unnecessary. Commands like ls output structured data by nature, i.e. a listing of files from one or more locations. It makes sense to pipe around this data in machine-form. Output flags like ls -l become mere hints for the final display, which can toggle on-the-fly between compact and full listing.

In TermKit's case, JSON is the interchange format of choice. The Content-Type for file listings is application/json; schema=termkit.files. The schema acts as a marker to select the right output plug-in. In this case, we want the file formatter rather than the generic raw JSON formatter.

Isn't JSON data harder to work with than lines of text? Only in some ways, but parsing JSON is trivial these days in any language. Because of this, I built TermKit grep so it supports grepping JSON data recursively. This happens transparently when the input is application/json instead of text/plain. As a result ls | grep works as you'd expect it to.

To slot in traditional Unix utilities in this model, we can pipe their data as application/octet-stream to start with, and enhance specific applications with type hints and wrapper scripts.

Finally, having type annotations on pipes opens up another opportunity: it allows us to pipe in HTTP GET / POST requests almost transparently. Getting a URL becomes no different from catting a file, and both can have fancy progress bars, even when inside a pipe chain like get | grep.

Synchronous interaction

All interaction in a traditional terminal is synchronous. Only one process is interactive at a time, and each keystroke must be processed by the remote shell before it is displayed. This leads to an obvious daily frustration: SSH keystroke lag.

To fix this, TermKit is built out of a separate front-end and back-end. The front-end can run locally, controlling a back-end on a remote machine. The connection can be tunneled over SSH for security.

Architecture diagram (TK stands for TermKit)

Additionally, all display updates and queries are asynchronous. The WebKit-based HTML display is split up into component views, and the view pipes of each subprocess are routed to their own view. Vice-versa, any interactive widgets inside a view can send callback messages back to their origin process, as long as it's still running.

This also allows background processes to work without overflowing the command prompt.

String-based command line

A lot of my frustration comes from bash's arcane syntax. It has a particularly nasty variant of C-style escaping. Just go ahead and try to match a regular expression involving both types of quotes.

But at its core, a bash command is a series of tokens. Some tokens are single words, some are flags, some are quoted strings, some are modifiers (like | and >). It makes sense for the input to reflect this.

TermKit's input revolves around tokenfield.js, a new snappy widget with plenty of tricks. It can do auto-quoting, inline autocomplete, icon badges, and more. It avoids the escaping issue altogether, by always processing the command as tokens rather than text. Keys that trigger special behaviors (like a quote) can be pressed again to undo the behavior and just type one character.

The behaviors are encoded in a series of objects and regexp-based triggers, which transform and split tokens as they are typed. That means it's extensible too.

Usability

At the end of the day, Unix just has bad usability. It tricks us with unnecessary abbreviations, inconsistent arguments (-r vs -R) and nitpicky syntax. Additionally, Unix has a habit of giving you raw data, but not telling you useful facts, e.g. 'r-xr-xr-x' instead of "You can't touch this" (ba-dum tsshh).

One of the Unix principles is nobly called "Least Surprise", but in practice, from having observed new Unix users, I think it often becomes "Maximum Confusion". We should be more pro-active in nudging our users in the right direction, and our tools should be designed for maximum discoverability.

For example, I want to see the relevant part of a man page in a tooltip when I'm typing argument switches. I'd love for dangerous flags to be highlighted in red. I'd love to see regexp hints of possible patterns inline.

There's tons to be done here, but we can't do anything without modern UI abilities.

Focus and Status

With a project like TermKit, it's easy to look at the shiny exterior and think "meh", or that I'm just doing things differently for difference's sake. But to me, the real action is under the hood. With a couple of tweaks and some uncompromising spring cleaning, we can get Unix to do a lot more for us.

The current version of TermKit is just a rough alpha, and what it does is in many ways just parlour tricks compared to what it could be doing in a few months. The architecture definitely supports it.

I've worked on TermKit off and on for about a year now, so I'd love to hear feedback and ideas. Please go check out the code.

TermKit owes its existence to Node.js, Socket.IO, jQuery and WebKit. Thanks to everyone who has contributed to those projects.

Edit, a couple of quick points:

A Linux port will definitely happen, since it's built out of WebKit and Node.js. Whoever does it first gets a cookie.
TermKit is not tied to JSON except in its own internal communication channels. TermKit Pipes can be in any format, and old-school plain-text still works. JSON just happens to be very handy and very lightweight.
The current output is just a proof of concept and lacks many planned usability enhancements. There are mockups on github.
If you're going to tell me I'm stupid, please read all the other 100 comments doing so first, so we can keep this short for everyone else.

Edit, random fun:

Someone asked for AVS instead of TermKit in the comments... best I could do was JS1K with a PDF surprise:

My JS1K Demo - The Making Of

2010-08-06T00:00:00+02:00

If you haven't seen it yet, check out the JS1K demo contest. The goal is to do something neat in 1 kilobyte of JavaScript code.

I couldn't resist making one myself, so I pulled out my bag of tricks from my Winamp music visualization days and started coding. I'm really happy with how it turned out. And no, it won't work in Internet Explorer 8 or less.

Edit: OH SNAP! I just rewrote the demo to include volumetric light beams and still fit in 1K:

Original Version

Improved Version

Now, whenever size is an issue, the best way to make a small program is to generate all data on the fly, i.e. procedurally. This saves valuable storage space. While this might seem like a black art, often it just comes down to clever use of (high school) math. And as is often the case, the best tricks are also the simplest, as they use the least amount of code.

To illustrate this, I'm going to break down my demo and show you all the major pieces and shortcuts used. Unlike the actual 1K demo, the code snippets here will feature legible spacing and descriptive variable names.

Initialization

JS1K's rules give you a Canvas tag to work with, so the first piece of code initializes it and makes it fill the window.

From then on, it just renders frames of the demo. There are four major parts to this:

Animating the wires
Rotating and projecting the wires into the camera view
Coloring the wires
Animating the camera

All of this is done 30 times per second, using a normal setInterval timer:

setInterval(function () { ... }, 33);

Drawing Wires

The most obvious trick is that everything in the demo is drawn using only a single primitive: a line segment of varying color and stroke width. This allows the whole drawing process to be streamlined into two tight, nested loops. Each inner iteration draws a new line segment from where the previous one ended, while the outer iteration loops over the different wires.

The lines are blended additively, using the built-in 'lighten' mode, which means they can be drawn in any order. This avoids having to manually sort them back-to-front.

To simplify the perspective transformations, I use a coordinate system that places the point (0, 0) in the center of the canvas and ranges from -1 to 1 in both coordinates. This is a compact and convenient way of dealing with varying window sizes, without using up a lot of code:

with (graphics) { ratio = width / height; globalCompositeOperation = 'lighter'; scale(width / 2 / ratio, height / 2); translate(ratio, 1); lineWidthFactor = 45 / height; ...

I add a correction ratio for non-square windows and calculate a reference line width lineWidthFactor for later. Here, I'm using the with construct to save valuable code space, though its use is generally discouraged.

Then there's the two nested for loops: one iterating over the wires, and one iterating over the individual points along each wire. In pseudo-code they look like:

For (12 wires => wireIndex) { Begin new wire For (45 points along each wire => pointIndex) { Calculate path of point on a sphere: (x,y,z) Extrude outwards in swooshes: (x,y,z) Translate and Rotate into camera view: (x,y,z) Project to 2D: (x,y) Calculate color, width and luminance of this line: (r,g,b) (w,l) If (this point is in front of the camera) { If (the last point was visible) { Draw line segment from last point to (x,y) } } else { Mark this point as invisible } Mark beginning of new line segment at (x,y) } }

Mathbending

To generate the wires, I start with a formula which generates a sinuous path on a sphere, using latitude/longitude. This controls the tip of each wire and looks like:

offset = time - pointIndex * 0.03 - wireIndex * 3; longitude = cos(offset + sin(offset * 0.31)) * 2 + sin(offset * 0.83) * 3 + offset * 0.02; latitude = sin(offset * 0.7) - cos(3 + offset * 0.23) * 3;

This is classic procedural coding at its best: take a time-based offset and plug it into a random mish-mash of easily available functions like cosine and sine. Tweak it until it 'does the right thing'. It's a very cheap way of creating interesting, organic looking patterns.

This is more art than science, and mostly just takes practice. Any time spent with a graphical calculator will definitely pay off, as you will know better which mathematical ingredients result in which shapes or patterns. Also, there are a couple of things you can do to maximize the appeal of these formulas.

First, always include some non-linear combinations of operators, e.g. nesting the sin() inside the cos() call. Combined, they are more interesting than when one is merely overlaid on the other. In this case, it turns regular oscillations into time-varying frequencies.

Second, always scale different wave periods using prime numbers. Because primes have no factors in common, this ensures that it takes a very long time before there is a perfect repetition of all the individual periods. Mathematically, the least common multiple of the chosen periods is huge (414253 units ~ 4.8 hours). Plotting the longitude/latitude for offset = 0..600 you get:

The graph looks like a random tangled curve, with no apparent structure, which makes for motions that never seem to repeat. If however, you reduce each constant to only a single significant digit (e.g. 0.31 -> 0.3, 0.83 -> 0.8), then suddenly repetition becomes apparent:

This is because the least common multiple has dropped to 84 units ~ 3.5 seconds. Note that both formulas have the same code complexity, but radically different results. This is why all procedural coding involves some degree of creative fine tuning.

Extrusion

Given the formula for the tip of each wire, I can generate the rest of the wire by sweeping its tail behind it, delayed in time. This is why pointIndex appears as a negative in the formula for offset above. At the same time, I move the points outwards to create long tails.

I also need to convert from lat/long to regular 3D XYZ, which is done using the spherical coordinate transform:

Source: Wikipedia

distance = f.sqrt(pointIndex+.2); x = cos(longitude) * cos(latitude) * distance; y = sin(longitude) * cos(latitude) * distance; z = sin(latitude) * distance;

You might notice that rather than making distance a straight up function of the length pointIndex along the wire, I applied a square root. This is another one of those procedural tricks that seems arbitrary, but actually serves an important visual purpose. This is what the square root looks like (solid curve):

The dotted curve is the square root's derivative, i.e. it indicates the slope of the solid curve. Because the slope goes down with increasing distance, this trick has the effect of slowing down the outward motion of the wires the further they get. In practice, this means the wires are more tense in the middle, and more slack on the outside. It adds just enough faux-physics to make the effect visually appealing.

Rotation and Projection

Once I have absolute 3D coordinates for a point on a wire, I have to render it from the camera's point of view. This is done by moving the origin to the camera's position (X,Y,Z), and applying two rotations: one around the vertical (yaw) and one around the horizontal (pitch). It's like spinning on a wheely chair, while tilting your head up/down.

x -= X; y -= Y; z -= Z; x2 = x * cos(yaw) + z * sin(yaw); y2 = y; z2 = z * cos(yaw) - x * sin(yaw); x3 = x2; y3 = y2 * cos(pitch) + z2 * sin(pitch); z3 = z2 * cos(pitch) - y2 * sin(pitch);

The camera-relative coordinates are projected in perspective by dividing by Z — the further away an object, the smaller it is. Lines with negative Z are behind the camera and shouldn't be drawn. The width of the line is also scaled proportional to distance, and the first line segment of each wire is drawn thicker, so it looks like a plug of some kind:

plug = !pointIndex; lineWidth = lineWidthFactor * (2 + plug) / z3; x = x3 / z3; y = y3 / z3; lineTo(x, y); if (z3 > 0.1) { if (lastPointVisible) { stroke(); } else { lastPointVisible = true; } } else { lastPointVisible = false; } beginPath(); moveTo(x, y);

There's a subtle optimization hiding in plain sight here. Because I'm drawing lines, I need both the previous and the current point's coordinates at each step. To avoid using variables for these, I place the moveTo command at the end of the loop rather than at the beginning. As a result, the previous coordinates are remembered transparently into the next iteration, and all I need to do is make sure the first call to stroke() doesn't happen.

Coloring

Each line segment also needs an appropriate coloring. Again, I used some trial and error to find a simple formula that works well. It uses a sine wave to rotate overall luminance in and out of the (Red, Green, Blue) channels in a deliberately skewed fashion, and shifts the R component slowly over time. This results in a nice varied palette that isn't overly saturated.

pulse = max(0, sin(time * 6 - pointIndex / 8) - 0.95) * 70; luminance = round(45 - pointIndex) * (1 + plug + pulse); strokeStyle='rgb(' + round(luminance * (sin(plug + wireIndex + time * 0.15) + 1)) + ',' + round(luminance * (plug + sin(wireIndex - 1) + 1)) + ',' + round(luminance * (plug + sin(wireIndex - 1.3) + 1)) + ')';

Here, pulse causes bright pulses to run across the wires. I start with a regular sine wave over the length of the wire, but truncate off everything but the last 5% of each crest to turn it into a sparse pulse train:

Camera Motion

With the main visual in place, almost all my code budget is gone, leaving very little room for the camera. I need a simple way to create consistent motion of the camera's X, Y and Z coordinates. So, I use a neat low-tech trick: repeated interpolation. It looks like this:

sample += (target - sample) * fraction

target is set to a random value. Then, every frame, sample is moved a certain fraction towards it (e.g. 0.1). This turns sample into a smoothed version of target. Technically, this is a one-pole low-pass filter.

This works even better when you apply it twice in a row, with an intermediate value being interpolated as well:

intermediate += (target - intermediate) * fraction sample += (intermediate - sample) * fraction

A sample run with target being changed at random might look like this:

You can see that with each interpolation pass, more discontinuities get filtered out. First, jumps are turned into kinks. Then, those are smoothed out into nice bumps.

In my demo, this principle is applied separately to the camera's X, Y and Z positions. Every ~2.5 seconds a new target position is chosen:

if (frames++ > 70) { Xt = random() * 18 - 9; Yt = random() * 18 - 9; Zt = random() * 18 - 9; frames = 0; } function interpolate(a,b) { return a + (b-a) * 0.04; } Xi = interpolate(Xi, Xt); Yi = interpolate(Yi, Yt); Zi = interpolate(Zi, Zt); X = interpolate(X, Xi); Y = interpolate(Y, Yi); Z = interpolate(Z, Zi);

The resulting path is completely smooth and feels quite dynamic.

Camera Rotation

The final piece is orienting the camera properly. The simplest solution would be to point the camera straight at the center of the object, by calculating the appropriate pitch and yaw directly off the camera's position (X,Y,Z):

yaw = atan2(Z, -X); pitch = atan2(Y, sqrt(X * X + Z * Z));

However, this gives the demo a very static, artificial appearance. What's better is making the camera point in the right direction, but with just enough freedom to pan around a bit.

Unfortunately, the 1K limit is unforgiving, and I don't have any space to waste on more 'magic' formulas or interpolations. So instead, I cheat by replacing the formulas above with:

yaw = atan2(Z, -X * 2); pitch = atan2(Y * 2, sqrt(X * X + Z * Z));

By multiplying X and Y by 2 strategically, the formula is 'wrong', but the error is limited to about 45 degrees and varies smoothly. Essentially, I gave the camera a lazy eye, and got the perfect dynamic motion with only 4 bytes extra!

Addendum

After seeing the other demos in the contest, I wasn't so sure about my entry, so I started working on a version 2. The main difference is the addition of glowy light beams around the object.

As you might suspect, I'm cheating massively here: rather than do physically correct light scattering calculations, I'm just using a 2D effect. Thankfully it comes out looking great.

Essentially, I take the rendered image, and process it in a second Canvas that is hidden. This new image is then layered on the original.

I take the image and repeatedly blend it with a zoomed out copy of itself. With every pass the number of copies doubles, and the zoom factor is squared every time. After 3 passes, the image has been smeared out into an 8 = 2³ 'tap' radial blur. I lock the zooming to the center of the 3D object. This makes the beams look like they're part of the 3D world rather than drawn on later.

For additional speed, the beam image is processed at half the resolution. As a side effect, the scaling down acts like a slight blur filter for the beams.

Unfortunately, this effect was not very compact, as it required a lot of drawing mode changes and context switches. I had no room for it in the source code.

So, I had to squeeze out some more room in the original. First, I simplified the various formulas to the bare minimum required for interesting visuals. I replaced the camera code with a much simpler one, and started aggressively shaving off every single byte I could find. Then I got creative, and ended up recreating the secondary canvas every frame just to avoid switching back its state to the default.

Eventually, after a lot of bit twiddling, a version came out that was 1024 bytes long. I had to do a lot of unholy things to get it to fit, but I think the end result is worth it ;).

Closing Thoughts

I've long been a fan of the demo scene, and fondly remember Second Reality in 1993 as my introduction to the genre. Since then, I've always looked at math as a tool to be mastered and wielded rather than subject matter to be absorbed.

With this blog post, I hope to inspire you to take the plunge and see where some simple formulas can take you.

Making Worlds 4 - The Devil's in the Details

2009-12-25T00:00:00+01:00

Last time I'd reached a pretty neat milestone: being able to render a somewhat realistic rocky surface from space. The next step is to add more detail, so it still looks good up close.

Adding detail is, at its core, quite straightforward. I need to increase the resolution of the surface textures, and further subdivide the geometry. Unfortunately I can't just crank both up, because the resulting data is too big to fit in graphics memory. Getting around this will require several changes.

Strategy

Until now, the level-of-detail selection code has only been there to decide which portions of the planet should be drawn on screen. But the geometry and textures to choose from are all prepared up front, at various scales, before the first frame is started. The surface is generated as one high-res planet-wide map, using typical cube map rendering:

This map is then divided into a quad-tree structure of surface tiles. It allows me to adaptively draw the surface at several pre-defined levels of detail, in chunks of various sizes.

Source

This strategy won't suffice, because each new level of detail doubles the work up-front, resulting in exponentially increasing time and memory cost. Instead, I need to write an adaptive system to generate and represent the surface on the fly. This process is driven by the Level-of-Detail algorithm deciding if it needs more detail in a certain area. Unlike before, it will no longer be able to make snap decisions and instant transitions between pre-loaded data: it will need to wait several frames before higher detail data is available.

Uncontrolled growth of increasingly detailed tiles is not acceptable either: I only wish to maintain tiles useful for rendering views from the current camera position. So if a specific detailed portion of the planet is no longer being used—because the camera has moved away from it—it will be discarded to make room for other data.

Generating Individual Tiles

The first step is to be able to generate small portions of the surface on demand. Thankfully, I don't need to change all that much. Until now, I've been generating the cube map one cube face at a time, using a virtual camera at the middle of the cube. To generate only a portion of the surface, I have to narrow the virtual camera's viewing cone and skew it towards a specific point, like so:

This is easy using a mathematical trick called homogeneous coordinates, which are commonly used in 3D engines. This turns 2D and 3D vectors into respectively 3D and 4D. Through this dimensional redundancy, we can then represent most geometrical transforms as a 4x4 matrix multiplication. This covers all transforms that translate, scale, rotate, shear and project, in any combination. The right sequence (i.e. multiplication) of transforms will map regular 3D space onto the skewed camera viewing cone.

Given the usual centered-axis projection matrix, the off-axis projection matrix is found by multiplying with a scale and translate matrix in so-called "screen space", i.e. at the very end. The thing with homogeneous coordinates is that it seems like absolute crazy talk until you get it. I can only recommend you read a good introduction to the concept.

With this in place, I can generate a zoomed height map tile anywhere on the surface. As long as the underlying brushes are detailed enough, I get arbitrarily detailed height textures for the surface. The normal map requires a bit more work however.

Normals and Edges

As I described in my last entry, normals are generated by comparing neighbouring samples in the height map. At the edges of the height map texture, there are no neighbouring samples to use. This wasn't an issue before, because the height map was a seamless planet-wide cube map, and samples were fetched automatically from adjacent cube faces. In an adaptive system however, the map resolution varies across the surface, and there's no guarantee that those neighbouring tiles will be available at the desired resolution.

The easy way out is to make sure the process of generating any single tile is entirely self-sufficient. To do this, I expand each tile with a 1 pixel border when generating it. Each such tile is a perfectly dilated version of its footprint and overlaps with its neighbours in the border area:

This way all the pixels in the undilated area have easily accessible neighbour pixels to sample from. This border is only used during tile generation, and cropped out at the end. Luckily I did something similar when I played with dilated cube maps before, so I already had the technique down. When done correctly, the tiles match up seamlessly without any additional correction.

Adaptive Tree

Now I need to change the data structure holding the mesh. To make it adaptive, I've rewritten it in terms of real-time 'split' and 'merge' operations.

Just like before, the Level-of-Detail algorithm traverses the tree to determine which tiles to render. But if the detail available is not sufficient, the algorithm can decide that a certain tile in the tree needs a more detailed surface texture, or that its geometry should be split up further. Starting with only a single root tile for each cube face, the algorithm divides up the planet surface recursively, quickly converging to a stable configuration around the camera.

As the camera moves around, new tiles are generated, increasing memory usage. To counter this steady stream of new data, the code identifies tiles that fall into disuse and merges them back into their parent. The overall effect is that the tree grows and shrinks depending on the camera position and angle.

Queuing and scheduling

To do all this real-time, I need to queue up the various operations that modify the tree, such as 'split', 'merge' and 'generate new tile'. They need to be executed in between rendering regular frames on screen. Whenever the renderer decides a certain tile is not detailed enough, a request is placed in a job queue to address this.

While continuing to render regular frames, these requests need to be processed. This is harder than it sounds, because both planet rendering and planet generation have to share the GPU, preferably without causing major stutters in rendering speed.

The solution is to spread this process over enough visible frames so that the overal rendering speed is not significantly affected. For example, if a new surface texture is requested, several passes are made. First the height map is rendered, the next frame the normal map is derived from it, then the height/normal maps are analyzed and put into the tree, after which they will finally appear on screen:

I took some inspiration from id Tech 5, the next engine coming from technology powerhouse id Software. They describe a queued job system that covers any frame-to-frame computation in a game engine (from texture management to collision detection), and which schedules tasks intelligently.

Do the Google Earth

With all the above in place, the engine can now progressively increase the detail of the planet across several orders of magnitude. Here's a video that highlights it:

And some shots that show off the detail:

W00t, certainly one of the niftiest things I've built.

Engine tweaks

Along with the architecture changes, I implemented some engine tweaks, noted here for completeness.

In previous comments, Erlend suggested using displacement mapping, so I gave it a shot. Before, the mesh for every tile was calculated on the CPU once, then copied into GPU memory. However, this mesh data was redundant, because it was derived literally from the height map data. Instead I changed it so that now, the transformation of mesh points onto the sphere surface happens real-time on the GPU in a per-vertex program.

This saves memory and pre-calculation time, but increases the rendering load. I'll have to see whether this technique is sustainable, but overall, it seems to be performing just fine. As a side effect, the terrain height map can be changed real-time with very low cost.

Technical hurdles

I spent some time tweaking the engine to run faster, but there's still plenty of work and some technical hurdles to cover.

One involves the Ogre Scene Manager, which is the code object that manages the location of objects in space. In my case, I have to deal with both the 'real world' in space as well as the 'virtual world' of brushes that generate the planet's surface. I chose to use two independent scene managers to represent this, as it seemed like a natural choice. However, it turns out this is unsupported by Ogre and causes random crashes and edge cases. Argh. It looks like I'll have to refactor my code to fix this.

Another major hurdle involves the planet surface itself. Currently I'm still just using a single distored-crater-brush to create it, and the lack of variation is showing.

Finally, surfaces are being generated using 16-bit floating point height values, and their accuracy is not sufficient beyond a couple levels of zooming. This results in ugly bands of flat terrain. To fix this I'll need to increase the surface accuracy.

Future steps

With the basic planet surface covered, I can now start looking at color, atmosphere and clouds. I have plenty of reading and experimentation to do. Thankfully the web is keeping me supplied with a steady stream of awesome papers... nVidia's GPU Gems series has proven to be a gold mine, for example.

Random factoid: what game developers call a "cube map", cartographers call a "cubic gnomonic grid". It turns out that knowing the right terminology is important when you're looking for reference material...

Code

The code is available on GitHub.

References

Great ideas are best discovered when standing on the shoulders of giants:

id Tech 5 Challenges, From Texture Virtualization to Massive Parallelization, J.M.P. van Waveren, id Software
GPU Gems, nVidia

Making Worlds 3 - That's no Moon...

2009-11-05T00:00:00+01:00

It's been over two months since the last installment in this series. Oops. Unfortunately, while trying to get to the next stage of this project, I ran into some walls. My main problem is that I'm not just creating worlds, but also learning to work with the Ogre engine and modern graphics hardware in particular.

This presents some interesting challenges: between my own code and the pixels on the screen, there are no less than three levels of indirection. First, there's Ogre, a complex piece of C++ code that provides me with high-level graphics tools (i.e. objects in space). Ogre talks to OpenGL, which abstracts away low-level graphics operations (i.e. commands necessary to draw a single frame). The OpenGL calls are handed off to the graphics driver, which translates them into operations on the actual hardware (processing vertices and pixels in GPU memory). Given this long dependency chain, it's no surprise that when something goes wrong, it can be hard to pinpoint exactly where the problem lies. In my case, an oversight and misunderstanding of an Ogre feature lead to several days of wasted time and a lot of frustration that made me put aside the project for a while.

With that said, back to the planets...

Normal mapping

Last time, I ended with a bumpy surface, carved by applying brushes to the surface. The geometry was there, but the surface was still just solid white. To make it more visually interesting, I'm going to apply light shading.

The most basic information you need for shading a surface is the surface normal. This is the vector that points straight away from the surface at a particular point. For flat surfaces, the normal is the same everywhere. For curved surfaces, the normal varies continuously across the surface. Typical materials reflect the most light when the surface normal points straight at the light source. By comparing the surface normal with the direction of incoming light (using the vector dot product), you can get a good measure of how bright the surface should be under illumination:

Lighting a surface using its normals.

To use normals for lighting, I have two options. The first is to do this on a geometry basis, assigning a normal to every triangle in the planet mesh. This is straightforward, but ties the quality of the shading to the level of detail in the geometry. A second, better way is to use a normal map. You stretch an image over the surface, as you would for applying textures, but instead of color, each pixel in the image represents a normal vector in 3D. Each pixel's channels (red, green, blue) are used to describe the vector's X, Y and Z values. When lighting the surface, the normal for a particular point is found by looking it up in the normal map.

The benefit of this approach is that you can stretch a high resolution normal map over low resolution geometry, often with almost no visual difference.

Lighting a low-resolution surface using high-resolution normals.

Here's the technique applied to a real model:

(Source - Creative Commons Share-alike Attribution)

Normal mapping helps keep performance up and memory usage down.

Finding Normals

So how do you generate such a normal map, or even a single normal at a single point? There are many ways, but the basic principle is usually the same. First you calculate two different vectors which are tangent to the surface at the point in question. Then you use the cross product to find a vector perpendicular to the two. This third vector is unique and will be the surface normal.

For triangles, you can pick any two triangle edges as vectors. In my case, the surface is described by a heightmap on a sphere, which makes things a bit trickier and requires some math.

I asked my friend Djun Kim, Ph.D. and teacher of mathematics at UBC for help and he recommended Calculus on Manifolds by Michael Spivak. This deceptively small and thin book covers all the basics of calculus in a dense and compact way, and quickly became my new favorite reading material.

Differential Geometry

In this section, I'll describe the formulas needed to calculate the normals of a spherical heightmap. Unlike what I've written before, this will dive shamelessly into specifics and not eschew math. The reason I'm writing it down is because I couldn't find a complete reference online. If math scares you, this section might not be for you. Scroll down until you reach the crater, or take a detour by reading A Mathematician's Lament by Paul Lockhart, which will enlighten you.

First, we're going to derive normals for a regular flat terrain heightmap. To start, we need to define the terrain surface. Starting with a 2D heightmap, i.e. a function f(u,v) of two coordinates that returns a height value, we can create a 3 dimensional surface g:

We can use this formal description to find tangent and normal vectors. A vector is tangent when its direction matches the slope of the surface in a particular direction. Differential math tells us that slope is found by taking the derivative. For our function of 2 variables, that means we can find tangent vectors along curves of constant v or constant u. These curves are the thin grid lines in the diagram. Actually, we can find tangents along any line, in any direction. But along u and v lines, the other variable acts like a constant, which simplifies things.

To do this, we take partial derivatives with respect to u (with v constant) and with respect to v (with u constant). The set of all partial derivatives is called the Jacobian matrix J, whose rows form the tangent vectors t_u and t_v, indicated in red and purple:

The cross product of t_u and t_v gives us n, the surface normal.

When applied to a discrete heightmap, the function f(u,v) is a 2D array map[u][v], and the partial derivatives at the end have to be replaced with something else. We can use finite differences to approximate the slope of the surface by differencing neighbouring samples:

This result and the formula for n are usually provided as-is in terrain mapping guides, without going through the full process of finding tangents first. However, it's important to use the Jacobian matrix formulation once you switch to spherical terrain.

To make a sphere, we add an additional function k which warps the flat terrain into a spherical shell. Each shell is the result of warping a single face of the cubemap and covers exactly 1/6th of the sphere. In what follows, We'll only consider a single face and its shell.

We designate the intermediate pre-warp coordinates (s,t,h), and the final post-warp coordinates as (x,y,z):

The principle behind the spherical mapping is this: first we take the vector (s, t, 1), which lies in the base plane of the flat terrain. We normalize this vector by dividing it by its length w, which has the effect of projecting it onto the sphere: (s/w, t/w, 1/w) will be at unit distance from (0, 0, 0). Then we multiply the resulting vector by the terrain height h to create the terrain on the sphere's surface, relative to its center: (h·s/w, h·t/w, h/w)

Just like with the function g(u,v) and J(u,v), we can find the Jacobian matrix J(s,t,h) of k(s,t,h). Because there are 3 input values for the function k, there are 3 tangents, along curves of varying s (with constant t and h), varying t (constant s and h) and varying h (constant s and t). The three tangents are named t_s, t_t, t_h.

PS: If your skills at derivation are a bit rusty, remember that Wolfram Alpha can do it for you.

How does this help? The three vectors describe a local frame of reference at each point in space. Near the edges of the grid, they get more skewed and angular. We use these vectors to transform the flat frame of reference into the right shape, so we can construct a new 90 degree angle here.

In mathematical terms, we multiply the 'flat' partial derivatives by the Jacobian matrix. This is similar to the chain rule for regular derivatives, only for multiple variables.

That is, to find the partial derivatives (i.e. tangent vectors) of the final spherical terrain with respect to the original terrain coordinates u and v, we can take the flat terrain's tangents t_u and t_v and multiply them by J(s,t,h). Once we have the two post-warp tangents, we take their cross product, and find the normal of the spherical terrain:

It's imporant to note that this is not the same as simply multiplying the flat terrain normal with J(s,t,h). J(s,t,h)'s rows do not form a set of perpendicular vectors (it is not an orthogonal matrix), which means it does not preserve angles between vectors when you multiply by it. In other words, J(s,t,h) * n, with n the flat terrain normal, would not be perpendicular to the spherical terrain. This is why it's important to return to the basic calculus underneath, so we can get the correct, complete formula.

Thus ends the magical math adventure. If you read it all the way through, cheers!

No Wait, It is a Moon.

With the normal map in place, I can now render the planet's surface and get a realistic idea of what it looks like. To show this off, I tweaked the brush system a bit: instead of using the literal brush image (e.g. a smooth, round crater), the brush is distorted with fractal noise. It makes every application of the brush subtly different from the next, and saves me from manually drawing e.g. a hundred different craters.

Here's a side by side comparison of the original brush and a distorted version.

Currently I've only implemented one type of distortion, which lends a rocky appearance to the surface. With that in place, my engine can now generate somewhat realistic looking moon surfaces. Here's the demo:

References

The techniques I used were pioneered by people smarter and older than me, I'm just building my own little digital machine with them.

Creating Spherical Worlds, Maxis/Electronic Arts. (source).
Calculus on Manifolds, Michael Spivak.

Making Worlds 2 - Scaling Heights

2009-08-31T00:00:00+02:00

Last time, I had a working, smooth sphere mesh. The next step is to create terrain.

Scale

Though my goal is to render at a huge range of scales, I'm going to focus on views from space first. That strongly limits how much detail I need to store and render. Aside from being a good initial sandbox in terms of content generation, it also means I can comfortably keep using my current code, which doesn't do any sophisticated memory or resource management yet. I'd much rather work on getting something interesting up first rather than work on invisible infrastructure.

That said, this is not necessarily a limitation. The interesting thing about procedural content is that every generator you build can be combined with many others, including a copy of itself. In the case of terrain, there are definite fractal properties, like self-similarity at different levels of scale. This means that once I've generated the lowest resolution terrain, I can generate smaller scale variations and combine them with the larger levels for more detail. This can be repeated indefinitely and is only limited by the amount of memory available.

Perlin Noise is a celebrated classic procedural algorithm,
often used as a fractal generator.

Height

To build terrain, I need to create heightmaps for all 6 cube faces. Shamelessly stealing more ideas from Spore, I'm doing this on the GPU instead of the CPU, for speed. The GPU normally processes colored pixels, but there's no reason why you can't bind a heightmap's contents as a grayscale (one channel) image and 'draw' into it. As long I build my terrain using simple, repeated drawing operations, this will run incredibly fast.

In this case, I'm stamping various brushes onto the sphere's surface to create bumps and pits. Each brush is a regular PNG image which is projected onto the surface around a particular point. The luminance of the brush's pixels determines whether to raise or lower terrain and by how much.

Three example brushes from Spore. (source)

However, while the brushes need to appear seamless on the final sphere, the drawing area consists only of the straight, square set of cube map faces. It might seem tricky to make this work so that the terrain appears undistorted on the curved sphere grid, but in fact, this distortion is neatly compensated for by good old perspective. All I need to do is set up a virtual scene in 3D, where the brushes are actual shapes hovering around the origin and facing the center. Then, I place a camera in the middle and take a snapshot both ways along each of the main X, Y and Z directions with a perfect 90 degree field of view. The resulting 6 images can then be tiled to form a distortion-free cube map.

Rendering two different cube map faces. The red area is the camera's viewing cone/pyramid, which extends out to infinity.

To get started I built a very simple prototype, using Ogre's scene manager facilities. I'm starting with just a simple, smooth crater/dent brush. I generate all 6 faces in sequence on the GPU, pull the images back to the CPU to create the actual mesh, and push the resulting chunks of geometry into GPU memory. This is only done once at the beginning, although the possibility is there to implement live updates as well.

Here's a demo showing a planet and the brushes that created it, hovering over the surface. I haven't implemented any shading yet, so I have to toggle back and forth to wireframe mode so you can see the dents made by the brushes:

The cubemap for this 'planet' looks like this when flattened. You can see that I haven't actually implemented real terrain carving, because brushes cause sharp edges when they overlap:

The narrow dent on the left gets distorted and angular where it crosses the cube edge. This is a normal consequence of the cubemapping, as it looks perfectly normal when mapped onto the sphere in the video.

Engine Tweaks

The demo above also incorporates a couple of engine improvements. With a real heightmap in place, I can implement real level-of-detail selection. That means the resolution of any terrain tile is decided based on how much detail would be lost if a simpler tile was used. The flatter a tile, the less detail is necessary. This ensures complex geometry is used only on those sections that really need it. This is great for visual fidelity, but causes a lot of geometry to pop up if sharp ridges are present in the terrain. In this case, my rendering engine was happily trying to push 700k triangles through the GPU per frame. While even my laptop GPU can actually do that at pretty smooth frame rates nowadays, some optimizations are in order to give me some breathing room.

The culprit was that I wasn't really doing any early removal of geometry that was hidden or otherwise out of frame. To fix that, I now do visibility checks together with the level-of-detail selection. First it checks if a chunk is over the horizon or not before considering it for selection. This is easy to calculate and eliminates a lot of unnecessary drawing, especially when looking straight down. If that first visibility check passes, I perform a tighter check using the camera's viewing cone. With these two measures in place, I'm only averaging about 50,000-100,000 triangles visible per frame, with room for more optimization. These optimizations only remove geometry that's already off screen, so there is no visual difference.

Cubemap Seams and Dilation

When rendering into cube maps, each side is rendered independently. In theory each face should match perfectly with adjacent ones due to the way they've been created. In practice however, slight mismatches can occur due to rounding errors at the edges, creating seams. This can be fixed by explicitly copying one pixel-wide edges from one face into the adjacent ones, until they all match up.

The next big step is to start shading the surface, but in order to do that I need to be able to run filters on the cube map. Specifically, I need to be able to compare neighbouring height samples anywhere on the surface. In the straight forward cubemap scenario this is non-trivial, because neighbouring samples at the edges need to be fetched from different cube faces at different orientations in space.

I decided to implement something I call 'dilated cubemaps'. I've never really heard this described formally, though I doubt it's never been thought of before:

Instead of every face neatly matching with the next, I dilate the cube faces so they stick through eachother. At the same time, I use a larger texture size to compensate, and I adjust the field-of-view of the rendering camera to match. If done right, the resulting cubemap is a pixel-perfect expanded version of the undilated map.

The dilated cubemap provides reliable neighbouring samples for all samples in the original cube map up to a distance as wide as the new border. Unlike regular cubemap wrapping, the dilated regions are distorted to conform to the current face's grid. This matches the real change in grid direction that occurs on the final sphere mesh and lets you sample exactly across cube map edges.

I played with the cubemap dilation because I was thinking of some complicated filters to run that require regular grids (like CFD). But in retrospect, I probably don't need the exact spacing of sample points at the edges for this, so regular undilated cubemapping will probably do. Still, it's good to have around, and certainly was an interesting exercise in pixel-exact rendering.

What Next?

With basic heightmap generation in place, I can now start putting in some 'tech artist' time to play with various brushes and drawing behaviour. Lighting and shading is another big one and should provide a massive improvement to the visuals.

Right now I've taken a week between postings, though it remains to be seen whether I can maintain that. Creating these blog entries is turning into a pretty time consuming endeavour, especially as I get into territory where I have to make my own diagrams and illustrations.

References

The techniques I used were pioneered by people smarter and older than me, I'm just building my own little digital machine with them.

Creating Spherical Worlds, Maxis/Electronic Arts. (source).
Ken Perlin, who invented a lot of this stuff.

Making Worlds 1 - Of Spheres and Cubes

2009-08-23T00:00:00+02:00

Let's start making some planets! Now, while this started as a random idea kind of project, it was clear from the start that I'd actually need to do a lot of homework for this. Before I could get anywhere, I needed to define exactly what I was aiming for.

The first step in this was to shop around for some inspirational art and reference pictures. While there is plenty space art to be found online, in this case, nothing can substitute for the real thing. So I focused my search on real pictures, both of landscapes (terran or otherwise) as well as from space. I found classy shots like these:

Hopefully I'll be able to render something similar in a while. At the same time, I eagerly devoured any paper I could find on rendering techniques from the past decade, some intended for real-time rendering, some old enough to be real-time today. Out of all this, I quickly settled on my goals:

Represent spherical or lumpy heavenly bodies from asteroids to suns.
With realistic looking topography and features.
Viewable across all scales from surface to space.
At flight-simulator levels of detail.
Rendered with convincing atmosphere, water, clouds, haze.

For most of these points, I found one or more papers describing a useful technique I could use or adapt. At the same time, there are still plenty of unknowns I'll need to figure out along the way, not to mention significant amounts of fudging and experimentation.

The Spherical Grid

To get started I needed to build some geometry, and to do that I needed to figure out what geometry I should use. After reviewing some options, I quickly settled on a regular spherical displacement map (AKA a heightmap). That is, starting with a smooth sphere, move every surface point up or down, perpendicular to the surface, to create terrain on the surface.

If these vertical displacements are very small compared to the sphere radius, this can represent the surface of a typical planet (like Earth) at the levels of detail I'm looking for. If the displacements are of the same order as the sphere radius, you can deform it into very irregular potato-like shapes. The only thing heightmaps can't do is caves, tunnels, overhang and other kinds of holes, which is fine for now.

The big question is, how should the spherical surface be divided up and represented? With a sphere, this is not an easy question, because there is no single obvious way to divide a spherical surface into regular sections or grids. Various techniques exist, each with their own benefits and specific use cases, and I spent quite some time looking into them. Here's a comparison between four different tesselations:

Source

Note that the tesselation labeled ECP is just the regular geographic latitude-longitude grid.

The main features I was looking for were speed and simplicity, so I settled on the 'quadcube'. This is where you start with a cube whose faces have been divided into regular grids, and project every surface point out from the middle to an enclosing sphere. This results in a perfectly smooth sphere, built out of 6 identical round shells with curved edges. This arrangement is better known as the 'cube map' and often used for storing arbitrary 360 degree panorama views.

Here's a cube and its spherical projection:

The projected cube edges are indicated in red. Note that the resulting sphere is perfectly smooth and round, even though the grid has a bulgy appearance.

Cube maps are great, because they are very easy to calculate and do not require complicated trigonometry. In reverse, mapping arbitrary spherical points back onto the cube is even simpler and in fact natively supported by GPUs as a texture mapping feature.

This is important, because I'll be generating the surface terrain and texture dynamically and will need to index and access each surface 'pixel' efficiently. Using a cube map, I simply identify the corresponding face, and then index it using x/y coordinates on the face's grid.

The downside of cube maps is that the distance and area between points varies along the grid, which makes it harder to perform certain operations on a surface equally. However, these area distortions are much smaller than e.g. a lat-long grid, where the grid spacing actually approaches zero near the poles. Even more, the distortions made by a cube map are the exact opposite of those you get with a regular perspective projection. This makes it easy to render into cube maps, which will be useful for texture generation.

Level of Detail

There's another reason I picked the cube map approach, and that has to do with the level of detail requirements. My goal is to make a planet that can be viewed from the ground, the air as well as from space. It would be incredibly slow to always render everything at maximum detail, so I need to adaptively add and remove detail as the viewer gets closer to the surface.

However, increasing the level of detail uniformly across the entire sphere is not enough, because I only want to render detail where the viewer will see it. To a viewer on the ground, most of the planet is hidden by the horizon, and the engine should be able to effectively cut away the unseen pieces, so no wasteful processing takes place.

It is here that I get a huge benefit from the cube map layout of the sphere, because it lets me apply the well-researched realm of grid-based flat terrain rendering with only minor adjustments. Specifically, I am using a 'chunked LOD' approach. Every face of the cube map becomes a quadtree, with each level splitting four ways to form the next level with more detail:

Source

The chunks for the various levels of detail are all loaded into GPU memory, ready to be accessed at any time. When the terrain has to be rendered, the engine walks down the quad-tree, determines the appropriate level-of-detail for each section, and outputs the list of chunks to be rendered for a particular frame. Then, the GPU does its work, blasting through each chunk at a blistering pace, leaving the CPU to do other things.

Because all the data is already in memory, changing the level of detail just means rendering a different set of chunks. Each chunk has the same geometrical complexity, and performance is directly proportional to how many are rendered on screen. More detail means more chunks, but that usually also means you can cut away pieces of the terrain that are far away.

The chunked approach is also very easy to work with, because there is no data dependency between the different chunks. Each chunk has a copy of its own vertex data, which means individual chunks can be paged in and out of GPU memory at will. This is important for keeping memory usage down while still being able to scale to massive sizes.

Putting It All Together

At this point, I have all the pieces in place to render an adaptive sphere mesh. This is what it looks like (sorry, the video capture is a bit jerky):

The detail increases as the camera gets closer to the sphere and shifts around the surface as it moves.

Far from being a little coding experiment, it actually took me quite some time to get to this point, because I was learning OGRE, sharpening my C++ skills, as well as researching the techniques to use.

The next step is to look at generating heightmaps and textures for the surface.

References

The techniques I used were pioneered by people smarter and older than me, I'm just building my own little digital machine with them.

Creating Spherical Worlds, Maxis/Electronic Arts. (source).
Rendering Massive Terrains using Chunked Level of Detail Control, Thatcher Ulrich. (source)

Making Worlds: Introduction

2009-08-22T00:00:00+02:00

For the past year or so I've been reacquainting myself with an old friend: C++.

More specifically, I've been exploring graphics programming again, this time with the luxurious flexibility of the modern GPU at my fingertips. To get me started, I shopped around for an open source engine to play with. After trying Irrlicht and finding its promises to be a bit lacking, Ogre turned out to be a really good choice. Though its architecture is a bit intimidating at first, it is all the more sound. More importantly, it seems to have a relatively healthy open-source community around it.

So with Ogre as my weapon of choice, I've started a new project: Making Worlds. More specifically, I want to procedurally generate a 3D planet, viewable from outer space as well as the ground (at flight-sim levels of detail), which can be rendered real-time on recent graphics hardware.

Why? Because I really like procedural content generation. It's an odd discipline where anything goes, and techniques from across mathematics, engineering and physics are applied. Then, you add a good dose of creativity and artistic sense, and perhaps mix in some real-world data too, until you find something that looks right.

Plus, far from being an exercise in pointlessness, procedural content is gaining in popularity, especially for video games.

So, in the style of Shamus Young's excellent Procedural city series, I'm going to start blogging about Making Planets. Unlike him however, I'm not going to adhere to a strict schedule.

Here's a teaser for the first installment.

JavaScript audio synthesis with HTML 5

2009-08-12T00:00:00+02:00

HTML5 gives us a couple new toys to play with, such as and tags. On the visual side, we've already seen live green-screening with Canvas and JS, and in terms of audio there's been several JS drum machines already. But the question I was interested in was: can you use JavaScript to stream live data into these media tags?

Enter the JavaScript audio synth. It generates a handful of samples using very basic time-domain synthesis, wraps them up in a WAVE file header and embeds them in tags using base64-encoded data URIs. Each sample is then triggered using timers to play the drum pattern. It's quite simple to do and runs fast enough in HTML5 capable browsers to be unnoticeable. Yes, it sounds tinny, but that's just because I'm too lazy to design proper filters for toys like this. Unfortunately, while the synthesis is fast enough to run real-time, you can't actually use it for a full live audio stream, as there is no way to queue up chunks of synthesized audio for seamless playback. I tried triggering multiple tags in parallel to address this, but that didn't work either.

My final attempt was to generate tons of periodic audio loops only a couple of ms long, and to play them back with looping turned on while altering each tag's volume in real time, hence doing a sort of additive wavetable synthesis. Unfortunately, looping is not a fully supported feature, and the only browser I found that does it (Safari) doesn't loop seamlessly at all.

All in all, my first brush with the tag was a major disappointment. The tag's high-level approach leads to similar limitations, but they are offset by the flexibility and power of the

Sadly, undeniably true

2008-01-15T00:00:00+01:00

... at least the last bit:

“when the world ends, the only things left will be cockroaches, rats, Keith Richards, and mangled text that has been escaped one-too-many or one-too-few times” — Dave Walker

(found this little gem via Sam Ruby)

Cocoa, Lemons and Geeks

2006-04-22T00:00:00+02:00

Greetings from Amsterdam. I'm here for the second day of CocoaDevHouse! About 20 geeks have been camping out in the Post C.S. building to gather, discuss, develop and generally have fun.

I've mainly come to work on my first Cocoa app (LemonJuice) and benefit from the expertise of people who actually know the API ;). I've gone from "typing a bunch of random crap" to "doing cool stuff with ridiculously small amounts of code". Cocoa is definitely interesting, and far more powerful than the Windows API I've used for a couple of years.

More details inside...

The application I've been coding is essentially, a WebKit based Terminal. I dislike the fact that a typical bash session in a GUI terminal is not a very consistent experience. For example, bash does not understand the normal home/end keys. And because you're not typing in a normal textbox, you can't select text by holding shift, have spell checking, and non-destructive autocompletion. What's also important to me is that the typical shell command now outputs a very ugly, very UNIXy stream of text. Whereas usually, that information is quite structured (a list of files, a list of key/value pairs, a timestamp, a media file, ...). Files and hyperlinks should be clickable. Tables should be sortable. Etc. etc.

So, I want to use WebKit (the engine behind Safari) to make a Terminal that outputs nicey styled HTML. A directory listing would emerge as a nicely formatted table with icons, for example. But you could just as well pipe that list into another application, without losing the structure (sort of what Microsoft's Monad is supposed to be like).

Of course, I have no clue if I can build such a thing. I'm essentially going blind now, coding until I have a good prototype ;). So far I've got "cd", "ls" and "quit" working. Still, most of the work now has been trying to figure out how I can achieve things in a way that is Cocoa-ish and clean. At least I'm having fun...

Current prototype:

Handy Drupal Core Development

2006-04-12T00:00:00+02:00

Some quick tips for better productivity when developing Drupal core:

Alias your editor to e. If you use a GUI editor see if it comes with a command-line shortcut to use. TextMate by default has mate. Not nearly short enough ;).
Set up a d command to perform diffs. I use the following: #!/bin/bash cvs diff -u -N -F^f . | grep -v -e ^\? > $1.patch e $1.patch This opens up my editor afterwards so I can review the patch before submitting. The grep strips out unnecessary junk (unknown files).
Set up a p command to apply patches. I use the following: #!/bin/bash wget -O - $1 | patch -p0 This will take a patch URL and apply it locally.

Anyone else have anymore ideas?

The Cocoa Journey Begins

2006-04-09T00:00:00+02:00

Life is full of nice surprises.

Last week I decided I would start learning Cocoa (for the unaware, that's one of the two APIs/Frameworks for creating applications for OS X). Partially, because I have some cool ideas I want to try out. Mostly, because I want to be able to make applications that are just as nice as all those other sweet programs I've come to depend on since I joined the cult.

So when I was doing some undirected browsing yesterday I found out that Andy 'Termie' Smith is helping organise CocoaDevHouse Amsterdam. I'm definitely going to go there. It'll be interesting to be a total newbie at a geek event for once ;).

It's only 2.5 hours by regular train, and I have a laptop and a copy of the Apple Developer Docs to keep me occupied on the way. Interesting for those of you on the other side of the pond: the typical Belgian (and generally Western-European) mentality would consider such a trip a significant adventure. That's what happens when everyone is so focused on their own little patch of land. For me, it's now just 'popping over'. I might be back later for Barcamp Amsterdam II too.

Degradable Javascript Widget Fun

2005-10-22T00:00:00+02:00

At BarCamp Amsterdam, I worked with Adrian Rossouw on a UI for styling a website. The result is a pretty cool color picker like in Gimp or Photoshop, but without Flash or Java. It just uses Javascript, CSS and transparent PNGs. It degrades to regular textboxes where you type/paste an HTML color code.

A bit later, Chris Messina suggested a slider control. Not much later, it was finished. It degrades to a plain select box, which is where the slider values are taken from. Its main purpose is to be used to select between options, and not for arbitrary continuous ranges.

These will soon be coming to a Drupal site near you after some more polishing and bug testing. Whether they will be used in Drupal 4.7 remains to be seen (though I can already think of a few spots where they would be useful).

Yay for open-source developer cross pollination :).

Summer of Code - Ajax Functionality for Drupal

2005-09-01T00:00:00+02:00

This last summer I was sponsored by Google as part of their Summer of Code progam to work on Drupal. My goal was to introduce various AJAX functionalities to Drupal.

The official project description was:

"Drupal has recently begun to find meaningful ways to introduce AJAX functionality with the goal of improving the user experience. Work with Drupal's usability experts to identify the next steps and help implement new dynamic functions based on interaction with the XMLHttpRequest object."

I focused on the following Ajax-powered features:

Inline Editing of posts: Though I built a working prototype module, I decided not to develop this feature further because it is not flexible enough to work as a generic Drupal module. It would break on too many configurations and has limited usefulness anyhow.
Uploading of files: allows you to attach files to Drupal nodes (with upload.module) without having to reload the page.
Sorting tables inside a page: this changes the sort order of a table without reloading the entire page. It is not client-side sorting as you'd expect at first sight: because most tables in Drupal are spread across multiple pages, client-side sorting is not very useful.
Switching between multiple pages: this was implemented on top of the sorting functionality, and only works on paged tables (this covers most of the useful pagers though).
Progressbar widget: a typical progressbar that fetches the status from the server through Ajax.

The resulting code can be found in my sandbox in the Drupal contributions repository. Note however that most of the code is in patches against the (rapidly changing) Drupal HEAD, so they are likely to go out of date soon.

The file uploader is now already part of the Drupal HEAD, and at least the tablesorter is sitting in the patch queue being reviewed. I will try and keep them up to date.

A big thanks goes to Google for organising the Summer of Code!

PHP, Unicode and ostriches.

2005-03-25T00:00:00+01:00

Update: I've written a follow-up post that describes how I would like PHP's encoding support to be.

As the resident encoding geek on the Drupal team, it's usually my job to make sure Drupal handles encodings and Unicode correctly. I don't mind doing this, but PHP doesn't exactly make it easy. With the new search.module for Drupal 4.6 being Unicode-aware, this has become very obvious, as we've had to bump up the minimum required version of PHP to 4.3.3. The UTF-8 support in the Perl-compatible regular expressions in PHP 4.3.2 and earlier is completely broken. And now I've had a bug report about someone on PHP 4.3.8 who still had problems getting it to work.

I don't know why exactly, but as far as encodings go PHP is still in the stone-age. This is odd, as you'd expect a web-oriented scripting language to have excellent support for sharing and exchanging textual information. There is a multi-byte string extension available, but it's not available on 90% of PHP hosts out there, and it's more of a black-box library anyway: it does not present you your strings as Unicode character codepoints, but still as an array of bytes. Furthermore, if you actually enable the mbstring overrides, you lose the ability to work with bytes at will. Apparently, the PHP team still hasn't figured out that bytes and characters are not the same. The other extensions which deal with encodings (iconv, recode) are also unavailable on the majority of PHP installs out there.

This means that if you want to make a PHP application which supports any language and runs on the average PHP host out there, that there's only one option: use UTF-8 internally, and write your own functions for string truncation, email header encoding, validation, etc. Using UTF-8 ensures that you only have one encoding to worry about and because it's Unicode it is guaranteed to be able to represent any language. Of course, you will no longer be able to do something simple as upper/lowercasing a string, as these PHP functions don't take UTF-8 at all.

What PHP needs is Unicode string support in the core, along with a good library of useful functions for handling the very large Unicode character range efficiently. ASP, Perl, Python, Java all have it... for me, it's the only thing that would've made PHP5 worth to upgrade to.

It's as if the entire PHP team has stuck their head in the ground, hoping that all this Unicode stuff will somehow blow over. It won't.

Sprankle Character Map

2004-12-23T00:00:00+01:00

It hit me a while ago that entering characters which are not available on your keyboard or through your IME is much too complicated. Usually it involves opening up some character map, scrolling through hundreds of symbols to find the one you need and copy/pasting it into the application of your choice.

Not very handy. Enter Sprankle Character Map. The idea is to hit a special key combination when typing (WIN + S for Sprankle) which pops up a character map where you are typing. You then type a symbol to find similar characters and choose one from the list using either numbers or arrows + space. Here's how it looks.

This is just a prototype, but it demonstrates the idea nicely and it's actually pretty usable. Certainly better than firing up a full character map every time.

Notes:

Sprankle is a Unicode-application and only runs on Windows 2000/XP.
The map appears on top of the current text field. For large, multi-line text fields this is far from ideal. It would be better to have it appear at the current caret position.
Sprankle doesn't work on Mozilla Firefox (or other applications that do special keyboard processing). If anyone has an idea on how to fix this, please tell.
It might be better to implement Sprankle as a real IME so it integrates completely with the text field. I have no idea how to do this though, but I'm sure MSDN has some documentation about it. The downside would be that it might not work in combination with existing IMEs (e.g. for Japanese).
Many of the symbols in the character set are not present in most fonts. Sprankle currently looks for Arial Unicode MS, the universal font that comes with XP and Office.
It might be cool to make a JavaScript version of this, so it can be integrated on websites with CMSes like Drupal.
You can customize Sprankle's character sets by editing sprankle.txt (UTF-16LE encoded). Right now it covers most of the Latin characters, basic Greek plus some math symbols.

Download Sprankle (source + win32 binary).

Drupal search improvements

2004-10-28T00:00:00+02:00

I've finished up my search improvements patch for Drupal. It is now ready to be committed to core (if approved).

Changes:

Clean up the text analyser: make it handle UTF-8 and all sorts of characters. The word splitter now does intelligent splitting into words and supports all Unicode characters. It has smart handling of acronyms, URLs, dates, ...
It now indexes the filtered output, which means it can take advantage of HTML tags. Meaningful tags (headers, strong, em, ...) are analysed and used to boost certain words scores. This has the side-effect of allowing the indexing of PHP nodes.
Link analyser for node links. The HTML analyser also checks for links. If they point to a node on the current site (handles path aliases) then the link's words are counted as part of the target node. This helps bring out commonly linked FAQs and answers to the top of the results.
Index comments along with the node. This means that the search can make a difference between a single node/comment about 'X' and a whole thread about 'X'. It also makes the search results much shorter and more relevant (before this patch, comments were even shown first).
We now keep track of total counts as well as a per item count for a word. This allows us to divide the word score by the total before adding up the scores for different words, and automatically makes noisewords have less influence than rare words. This dramatically improves the relevancy of multiword searches. This also makes the disadvantage of now using OR searching instead of AND searching less problematic.
Includes support for text preprocessors through a hook. This is required to index Chinese and Japanese, because these languages do not use spaces between words. An external utility can be used to split these into words through a simple wrapper module. Other uses could be spell checking (although it would have no UI).
Indexing is now regulated: only a certain amount of items will be indexed per cron run. This prevents PHP from running out of memory or timing out. This also makes the reindexing required for this patch automatic. I also added an index coverage estimate to the search admin screen.
Code cleanup! Moved all the search stuff from common.inc into search.module, rewired some hooks and simplified the functions used. The search form and results now also use valid XHTML and form_ functions. The search admin was moved from search/configure to admin/search for consistency.
Improved search output: we also show much more info per item: date, author, node type, amount of comments and a cool dynamic excerpt à la Google. The search form is now much more simpler and the help is only displayed as tips when no search results are found.
By moving all search logic to SQL, I was able to add a pager to the search results. This improves usability and performance dramatically.

UFPDF: Unicode/UTF-8 extension for FPDF

2004-09-01T00:00:00+02:00

Note: I wrote UFPDF as an experiment, not as a finished product. If you have problems using it, don't bug me for support. Patches are welcome though, but I don't have much time to maintain this.

FPDF is a PHP class for generating PDF files on-the-fly. Unfortunately it does not support Unicode. So I've coded UFPDF, an extension of FPDF which accepts input in UTF-8.

Only TrueType fonts are supported for now. To embed .TTF files, you need to extract the font metrics and build the required tables using the provided utilities (see README.txt). Included is a modified version of TTF2PT1 which extracts the Unicode glyph info.

UFPDF works the same as FPDF, except that all text is in UTF-8, so consult the FPDF documentation for usage.

Download UFPDF Example PDF

UTF-8 conversion support for mIRC

2004-07-13T00:00:00+02:00

mIRC's lack of UTF-8 support has been an issue for quite some time. The author promised to 'look at it', but in the meantime, chatting in UTF-8 is not possible. This is problematic for any language that uses more than the occasional accented letter.

So I decided to make a temporary fix myself. The result is a flexible conversion mechanism between UTF-8 and the ANSI codepages. The user sees and types regular ANSI characters, but all data which is sent to and received from the IRC server is UTF-8 encoded. You are still limited to one ANSI codepage though: making mIRC support real Unicode is not possible without an mIRC rewrite.

The script performs a real UTF-8 encoding/decoding, so unlike a simple 'find and replace' approach, characters which do not fit into the current codepage are indicated as such.

I included conversion tables for all of the Windows ANSI codepages:

1250 (ANSI - Central Europe)
1251 (ANSI - Cyrillic)
1252 (ANSI - Western Europe / Latin I)
1253 (ANSI - Greek)
1254 (ANSI - Turkish)
1255 (ANSI - Hebrew)
1256 (ANSI - Arabic)
1257 (ANSI - Baltic)
1258 (ANSI/OEM - Viet Nam)

There is also a little utility (with source) for generating conversion tables for more codepages.

For instructions on how to use it, check the top of the utf-8.mrc file. You can download the script here (19 KB).

Important: This script is provided as-is without any guarantees. Use it if you like it, but don't bug me if you can't get it to work. If you find bugs, feel free to report them, but try to give a little more information than just 'it doesn't work'.

Drupal filter system updating

2004-07-03T00:00:00+02:00

For a long time people have been complaining about the filter system in Drupal. This is the part that handles the transformation from user-supplied input into the HTML output, and takes such responsibilities like HTML tag stripping, code tags, auto-links, etc.

Like most parts of Drupal, it's very modular and pluggable. Still, it doesn't do what most people want. In fact, it lacks some features which are present in most other CMS. To address these issues I've been thinking about a major filter system upgrade for a couple of months, but I haven't had time to actually do it, until now.

The root of the problem is that Drupal only has one global filtering profile: the same settings and rules are applied to all input, regardless of who posted it or where it was posted. Administrators cannot have looser filters than anonymous visitors. In some cases (blocks, book and site pages), some customizability is available through a module-specific selector for text, HTML or PHP code, which is then only available to administrators, but this is independent of the filter system.

My solution basically consists of multiple filter profiles.

Instead of one global profile, administrators will be free to define as many profiles as they want. Each profile contains its own filter configuration: which filters are enabled, in what order and with what settings. Access to filter profiles is configurable with roles.

In addition to this, some small extra filters will be created out of current pieces of code. For example, one for evaluating PHP code. Instead of block, book and page.module each having a PHP type, the admin can simply set up a PHP filtering profile, restricted to admins, and enter content with that type in the blocks and pages to be run as PHP code.

For anonymous users, only one profile is likely to be available, and in that case nothing changes for them. Only when multiple profiles are enabled do you get a selector (dropdown or radio) below a textarea to choose the format.

Now, the idea sounds nice, but how do we implement it?

1) Filters need to be made profile-aware. Since the filter-ordering changes in 4.4, filters have grown already from simple hooks to registered things. We simply expand the filter hook and require modules to store information per-profile. This is not a problem because most configuration is done with Drupal variables anyway: simple prefixing will work. For complex filters which have extra setting pages, the module can decide to have global settings or per-profile settings itself. For example, smileys.module will probably not require separate sets of smileys per profile: you either have smileys enabled or you don't.

2) How to store type information Secondly, and this is the biggest problem, is where and how to store the information about which profile a particular piece of content uses. Either we provide a function to output a profile form selector and put the responsibility for using it in modules, or we simply include the selector with form_textarea, and pick a standard format for handling metadata about pieces of text (a fieldname_meta column for fieldname? change textfields into arrays with 'text' 'type' members?). I prefer the form_textarea method because it fits in with how we now handle tips about filtering below textareas.

3) Updating modules that display content When a module has to display a piece of user-supplied text and passes it to check_output, it would also have to pass the profile used. This is all that is needed, so it keeps the hassle minimal. Checking which profile can be used and who used it is done on submission, not on viewing, so no permissions checks have to be done when filtering takes place.

4) Include a modified, profile-aware filtercache The additional complexity would reduce performance a bit, but the increase in power would be huge. On top of that, many people agree with me that filtercache offers significant speedups for a site that uses any sort of non-trivial filtering, so I will push for inclusion of a (modified) filtercache along with the major changes.

5) Handle editboxes A final problem is what to do with regular editboxes. Right now there is no consensus whether or not to filter them. Some people want to used HTML in them, others don't. Including a type selector for every editbox is unnecessary and nearly impossible to do from a UI point of view, so I would instead just let the admin choose one profile to be used for non-textarea content which would default to 'plain-text'.

A typical set-up of profiles would be: - Filtered HTML: default type for regular visitors, HTML is limited to a set of allowed tags, CSS can be stripped. - Plain-text: default type for editboxes - Raw HTML: only for admins, performs no filtering on the HTML - PHP: only for admins, executes the PHP code and outputs the result

Filters like Textile can either be used as the default profile or as an extra profile for those who want to give their visitors a choice. People who do not need any filtering complexity simply use the same setup as before and nothing changes for visitors. The only difference is that they, as admins, still have more control and options for filtering.