Acko.net

I is for Intent

2024-02-05T00:00:00+01:00

Why your app turned into spaghetti

"I do not like your software sir,
your architecture's poor.

Your users can't do anything,
unless you code some more.

This isn't how it used to be,
we had this figured out.

But then you had to mess it up
by moving into clouds."

There's a certain kind of programmer. Let's call him Stanley.

Stanley has been around for a while, and has his fair share of war stories. The common thread is that poorly conceived and specced solutions lead to disaster, or at least, ongoing misery. As a result, he has adopted a firm belief: it should be impossible for his program to reach an invalid state.

Stanley loves strong and static typing. He's a big fan of pattern matching, and enums, and discriminated unions, which allow correctness to be verified at compile time. He also has strong opinions on errors, which must be caught, logged and prevented. He uses only ACID-compliant databases, wants foreign keys and triggers to be enforced, and wraps everything in atomic transactions.

He hates any source of uncertainty or ambiguity, like untyped JSON or plain-text markup. His APIs will accept data only in normalized and validated form. When you use a Stanley lib, and it doesn't work, the answer will probably be: "you're using it wrong."

Stanley is most likely a back-end or systems-level developer. Because nirvana in front-end development is reached when you understand that this view of software is not just wrong, but fundamentally incompatible with the real world.

I will prove it.

State Your Intent

Take a text editor. What happens if you press the up and down arrows?

The keyboard cursor (aka caret) moves up and down. Duh. Except it also moves left and right.

The editor state at the start has the caret on line 1 column 6. Pressing down will move it to line 2 column 6. But line 2 is too short, so the caret is forcibly moved left to column 1. Then, pressing down again will move it back to column 6.

It should be obvious that any editor that didn't remember which column you were actually on would be a nightmare to use. You know it in your bones. Yet this only works because the editor allows the caret to be placed on a position that "does not exist." What is the caret state in the middle? It is both column 1 and column 6.

To accommodate this, you need more than just a View that is a pure function of a State, as is now commonly taught. Rather, you need an Intent, which is the source of truth that you mutate... and which is then parsed and validated into a State. Only then can it be used by the View to render the caret in the right place.

Editing the intent, which is what a classic Controller does, is a bit tricky. When you press left/right, it should determine the new Intent.column based on the validated State.column +/- 1. But when you press up/down, it should keep the Intent.column you had before and instead change only Intent.line. New intent is a mixed function of both previous intent and previous state.

The general pattern is that you reuse Intent if it doesn't change, but that new computed Intent should be derived from State. Note that you should still enforce normal validation of Intent.column when editing too: you don't allow a user to go past the end of a line. Any new intent should be as valid as possible, but old intent should be preserved as is, even if nonsensical or inapplicable.
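The caret rules above fit in a few lines. This is a minimal sketch rather than code from any real editor: it assumes 0-based lines and columns, a non-empty document stored as an array of line strings, and the hypothetical names `CaretIntent`, `CaretState`, `validate`, `moveVertical` and `moveHorizontal`.

```typescript
// Sketch only: a document is an array of line strings, lines and columns
// are 0-based, and the document is assumed to have at least one line.
type Doc = string[];

// Intent: what the user asked for. `column` may point past the end of a line.
interface CaretIntent { line: number; column: number; }

// State: the validated position that the view renders.
interface CaretState { line: number; column: number; }

const clamp = (x: number, lo: number, hi: number): number => Math.max(lo, Math.min(x, hi));

// Parse intent into state by clamping it to a position that exists.
function validate(doc: Doc, intent: CaretIntent): CaretState {
  const line = clamp(intent.line, 0, doc.length - 1);
  return { line, column: clamp(intent.column, 0, doc[line].length) };
}

// Up/down: change only the line and keep the old intended column as-is.
function moveVertical(doc: Doc, intent: CaretIntent, delta: number): CaretIntent {
  return { ...intent, line: clamp(intent.line + delta, 0, doc.length - 1) };
}

// Left/right: derive a fresh column from the validated state, so a stale
// out-of-range column is replaced by a valid one.
function moveHorizontal(doc: Doc, intent: CaretIntent, delta: number): CaretIntent {
  const state = validate(doc, intent);
  return {
    line: state.line,
    column: clamp(state.column + delta, 0, doc[state.line].length),
  };
}
```

Moving down from column 6 onto a two-character line renders the caret at column 2, but the intent still says 6, so moving down again onto a longer line puts it back at column 6. Pressing left instead derives a new intent from the clamped state.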

Functionally, for most of the code, it really does look and feel as if the state is just State, which is valid. It's just that when you make 1 state change, the app may decide to jump into a different State than one would think. When this happens, it means some old intent first became invalid, but then became valid again due to a subsequent intent/state change.

This is how applications actually work IRL. FYI.

Knives and Forks

I chose a text editor as an example because Stanley can't dismiss this as just frivolous UI polish for limp wristed homosexuals. It's essential that editors work like this.

The pattern is far more common than most devs realize:

A tree view remembers the expand/collapse state for rows that are hidden.
Inspector tabs remember the tab you were on, even if currently disabled or inapplicable.
Toggling a widget between type A/B/C should remember all the A, B and C options, even if mutually exclusive.

All of these involve storing and preserving something unknown, invalid or unused, and bringing it back into play later.

More so, if software matches your expected intent, it's a complete non-event. What looks like a "surprise hidden state transition" to a programmer is actually the exact opposite. It would be an unpleasant surprise if that extra state transition didn't occur. It would only annoy users: they already told the software what they wanted, but it keeps forgetting.

The ur-example is how nested popup menus should work: good implementations track the motion of the cursor so you can move diagonally from parent to child, without falsely losing focus.

This is an instance of the goalkeeper's curse: people rarely compliment or notice the goalkeeper if they do their job, only if they mess up. Successful applications of this principle are doomed to remain unnoticed and unstudied.

Validation is not something you do once, discarding the messy input and only preserving the normalized output. It's something you do continuously and non-destructively, preserving the mess as much as possible. It's UI etiquette: the unspoken rules that everyone expects but which are mostly undocumented folklore.

This poses a problem for most SaaS in the wild, both architectural and existential. Most APIs will only accept mutations that are valid. The goal is for the database to be a sequence of fully valid states:

The smallest possible operation in the system is a fully consistent transaction. This flattens any prior intent.

In practice, much software deviates from this in an ad hoc way. For example, spreadsheets let you create cyclic references, which are by definition invalid. They must let you do this because fixing one side of a cyclic reference also fixes the other side. A user wants and needs to do these operations in any order. So you must allow a state transition through an invalid state.

This requires an effective Intent/State split, whether formal or informal.

Because cyclic references can go several levels deep, identifying one cyclic reference may require you to spider out the entire dependency graph. This is functionally equivalent to identifying all cyclic references—dixit Dijkstra. Plus, you need to produce sensible, specific error messages. Many "clever" algorithmic tricks fail this test.
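To make the cost concrete, here is a sketch of whole-graph cycle detection with specific error messages. The `Deps` shape and the function names are hypothetical. It uses a plain depth-first search that reports one concrete cycle per back edge it meets; that is enough for messages like "A1 -> B1 -> C1 -> A1", but it does not enumerate every elementary cycle, and it is not Dijkstra's own formulation.

```typescript
// Hypothetical spreadsheet model: each cell id lists the cell ids it references.
type Deps = Record<string, string[]>;

// Depth-first search over the whole graph. A reference to a cell that is still
// on the current path is a back edge, which closes a concrete cycle.
function findCycles(deps: Deps): string[][] {
  const color = new Map<string, "active" | "done">();
  const path: string[] = [];
  const cycles: string[][] = [];

  const visit = (cell: string): void => {
    color.set(cell, "active");
    path.push(cell);
    for (const ref of deps[cell] ?? []) {
      const seen = color.get(ref);
      if (seen === "active") {
        // The cycle runs from `ref` down the current path and back to `ref`.
        cycles.push([...path.slice(path.indexOf(ref)), ref]);
      } else if (seen === undefined) {
        visit(ref);
      }
    }
    path.pop();
    color.set(cell, "done");
  };

  for (const cell of Object.keys(deps)) {
    if (!color.has(cell)) visit(cell);
  }
  return cycles;
}

// Specific, user-facing messages rather than a bare "invalid".
function describeCycles(deps: Deps): string[] {
  return findCycles(deps).map((cycle) => `Circular reference: ${cycle.join(" -> ")}`);
}
```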

Now imagine a spreadsheet API that doesn't allow for any cyclic references ever. This still requires you to validate the entire resulting model, just to determine if 1 change is allowed. It still requires a general validate(Intent). In short, it means your POST and PUT request handlers need to potentially call all your business logic.

That seems overkill, so the usual solution is bespoke validators for every single op. If the business logic changes, there is a risk your API will now accept invalid intent. And the app was not built for that.

If you flip it around and assume intent will go out-of-bounds as a normal matter, then you never have this risk. You can write the validation in one place, and you reuse it for every change as a normal matter of data flow.

Note that this is not cowboy coding. Records and state should not get irreversibly corrupted, because you only ever use valid inputs in computations. If the system is multiplayer, distributed changes should still be well-ordered and/or convergent. But the data structures you're encoding should be, essentially, entirely liberal to your user's needs.

Consider git. Here, a "unit of intent" is just a diff applied to a known revision ID. When something's wrong with a merge, it doesn't crash, or panic, or refuse to work. It just enters a conflict state. This state is computed by merging two incompatible intents.

It's a dirty state that can't be turned into a clean commit without human intervention. This means git must continue to work, because you need to use git to clean it up. So git is fully aware when a conflict is being resolved.

As a general rule, the cases where you actually need to forbid a mutation which satisfies all the type and access constraints are small. A good example is trying to move a folder inside itself: the file system has to remain a sensibly connected tree. Enforcing the uniqueness of names is similar, but also comes with a caution: falsehoods programmers believe about names. Adding (Copy) to a duplicate name is usually better than refusing to accept it, and most names in real life aren't unique at all. Having user-facing names actually requires creating tools and affordances for search, renaming references, resolving duplicates, and so on.
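Both rules fit in a few lines. A sketch, assuming a folder tree stored as a child-to-parent map that is itself acyclic; `canMove` and `uniqueName` are hypothetical helpers, and the exact "(Copy N)" numbering is an arbitrary choice.

```typescript
// Hypothetical file tree: each folder id maps to its parent id (null for root).
// The map is assumed to describe a valid tree, so walking up always terminates.
type Parents = Map<string, string | null>;

// Moving `folder` into `target` is forbidden only if `target` is the folder
// itself or one of its descendants, i.e. if `folder` is an ancestor of `target`.
function canMove(parents: Parents, folder: string, target: string): boolean {
  for (let at: string | null = target; at !== null; at = parents.get(at) ?? null) {
    if (at === folder) return false;
  }
  return true;
}

// Instead of rejecting a duplicate name, derive a free one:
// "Notes", "Notes (Copy)", "Notes (Copy 2)", ...
function uniqueName(wanted: string, taken: Set<string>): string {
  if (!taken.has(wanted)) return wanted;
  let candidate = `${wanted} (Copy)`;
  for (let n = 2; taken.has(candidate); n++) candidate = `${wanted} (Copy ${n})`;
  return candidate;
}
```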

Even among front-end developers, few people actually grok this mental model of a user. It's why most React(-like) apps in the wild are spaghetti, and why most blog posts about React gripes continue to miss the bigger picture. Doing React (and UI) well requires you to unlearn old habits and actually design your types and data flow so they use potentially invalid input as their single source of truth. That way, a one-way data flow can enforce the necessary constraints on the fly.

The way Stanley likes to encode and mutate his data is how programmers think about their own program: it should be bug-free and not crash. The mistake is to think that this should also apply to any sort of creative process that program is meant to enable. It would be like making an IDE that only allows you to save a file if the code compiles and passes all the tests.

Trigger vs Memo

Coding around intent is a very hard thing to teach, because it can seem overwhelming. But what's overwhelming is not doing this. It leads to codebases where every new feature makes ongoing development harder, because no part of the codebase is ever finished. You will sprinkle copies of your business logic all over the place, in the form of request validation, optimistic local updaters, and guess-based cache invalidation.

If this is your baseline experience, your estimate of what is needed to pull this off will simply be wrong.

In the traditional MVC model, intent is only handled at the individual input widget or form level. For example, while typing a number, the intermediate representation is a string. This may be empty, incomplete or not a number, but you temporarily allow that.
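A sketch of that widget-level split, with hypothetical names: the raw text is the intent and is never overwritten, while the committed number only changes when the text parses. What counts as "a number" here (JavaScript's `Number` after trimming, finite values only) is an assumption.

```typescript
// Intent: the raw text exactly as typed. State: the last value that parsed.
interface NumberInput {
  text: string;
  committed: number;
}

// Returns null for text that is empty, incomplete ("-", "1e") or not a number.
function parseNumber(text: string): number | null {
  const trimmed = text.trim();
  if (trimmed === "") return null;
  const value = Number(trimmed);
  return Number.isFinite(value) ? value : null;
}

// Typing always keeps the text; the committed value changes only on a valid parse.
function onType(input: NumberInput, text: string): NumberInput {
  return { text, committed: parseNumber(text) ?? input.committed };
}
```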

I've never seen people formally separate Intent from State in an entire front-end. Often their state is just an ad hoc mix of both, where validation constraints are relaxed in the places where it was most needed. They might just duplicate certain fields to keep a validated and unvalidated variant side by side.

There is one common exception. In a React-like, when you do a useMemo with a derived computation of some state, this is actually a perfect fit. The eponymous useState actually describes Intent, not State, because the derived state is ephemeral. This is why so many devs get lost here.

const state = useMemo(
  () => validate(intent),
  [intent]
);

Their usual instinct is that every action that has knock-on effects should be immediately and fully realized, as part of one transaction. Only, they discover some of those knock-on effects need to be re-evaluated if certain circumstances change. Often to do so, they need to undo and remember what it was before. This is then triggered anew via a bespoke effect, which requires a custom trigger and mutation. If they'd instead deferred the computation, it could have auto-updated itself, and they would've still had the original data to work with.

For example, in a WYSIWYG scenario, you often want to preview an operation as part of mouse hovering or dragging. It should look like the final result. You don't need to implement custom previewing and rewinding code for this. You just need the ability to layer on some additional ephemeral intent on top of the intent that is currently committed. Rewinding just means resetting that extra intent back to empty.
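A minimal sketch of layering ephemeral intent over committed intent, assuming a flat map of shapes; `Editor`, `resolveShapes`, `preview`, `cancelPreview` and `commitPreview` are hypothetical names. The preview and the final commit go through the same merge, so there is no separate rewinding code.

```typescript
interface Shape { x: number; y: number; color: string; }
type ShapeUpdate = Partial<Shape>;

interface Editor {
  committed: Record<string, Shape>;       // intent that has been applied
  ephemeral: Record<string, ShapeUpdate>; // preview intent layered on top
}

// What the view draws: committed intent with the preview layer merged in.
function resolveShapes(editor: Editor): Record<string, Shape> {
  const out: Record<string, Shape> = { ...editor.committed };
  for (const id of Object.keys(editor.ephemeral)) {
    if (id in out) out[id] = { ...out[id], ...editor.ephemeral[id] };
  }
  return out;
}

// Hover/drag adds ephemeral intent; committed data is never touched.
function preview(editor: Editor, id: string, update: ShapeUpdate): Editor {
  const merged = { ...editor.ephemeral[id], ...update };
  return { ...editor, ephemeral: { ...editor.ephemeral, [id]: merged } };
}

// Rewinding is just clearing the extra layer.
const cancelPreview = (editor: Editor): Editor => ({ ...editor, ephemeral: {} });

// Committing bakes the same layered result into the committed intent.
const commitPreview = (editor: Editor): Editor => ({ committed: resolveShapes(editor), ephemeral: {} });
```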

You can make this easy to use by treating previews as a special kind of transaction: now you can make preview states with the same code you use to apply the final change. You can also auto-tag the created objects as being preview-only, which is very useful. That is: you can auto-translate editing intent into preview intent, by messing with the contents of a transaction. Sounds bad, is actually good.

The same applies to any other temporary state, for example, highlighting of elements. Instead of manually changing colors, and creating/deleting labels to pull this off, derive the resolved style just-in-time. This is vastly simpler than doing it all on 1 classic retained model. There, you run the risk of highlights incorrectly becoming sticky, or shapes reverting to the wrong style when un-highlighted. You can architect it so this is simply impossible.
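A sketch of just-in-time style resolution. `ShapeStyle`, `Interaction` and `resolveStyle` are hypothetical, and the highlight color is arbitrary. Hover and selection live outside the shapes, so a highlight cannot become sticky: removing it just means calling the function with a different input.

```typescript
// Hypothetical persistent style of a shape, as the user set it.
interface ShapeStyle { fill: string; stroke: string; strokeWidth: number; }

// Transient interaction intent. It is never written into the shapes themselves.
interface Interaction {
  hovered: string | null;
  selected: ReadonlySet<string>;
}

// Resolve the style to draw just-in-time. Un-highlighting is simply a different
// input, so there is no "previous style" to remember or restore.
function resolveStyle(id: string, base: ShapeStyle, ui: Interaction): ShapeStyle {
  if (ui.selected.has(id)) return { ...base, stroke: "#0af", strokeWidth: base.strokeWidth + 2 };
  if (ui.hovered === id) return { ...base, stroke: "#0af" };
  return base;
}
```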

The trigger vs memo problem also happens on the back-end, when you have derived collections. Each object of type A must have an associated type B, created on-demand for each A. What happens if you delete an A? Do you delete the B? Do you turn the B into a tombstone? What if the relationship is 1-to-N, do you need to garbage collect?

If you create invisible objects behind the scenes as a user edits, and you never tell them, expect to see a giant mess as a result. It's crazy how often I've heard engineers suggest a user should only be allowed to create something, but then never delete it, as a "solution" to this problem. Everyday undo/redo precludes it. Don't be ridiculous.

The problem is having an additional layer of bookkeeping you didn't need. The source of truth was collection A, but you created a permanent derived collection B. If you instead make B ephemeral, derived via a stateless computation, then the problem goes away. You can still associate data with B records, but you don't treat B as the authoritative source for itself. This is basically what a WeakMap is.
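A sketch of the WeakMap idea, with hypothetical record types: B is computed on demand and cached against its A, and nothing else has to delete it.

```typescript
// Hypothetical types: A is the authoritative record, B is derived per A.
interface RecordA { id: string; title: string; }
interface RecordB { slug: string; notes: string[]; }

// B is never stored as its own collection. It is cached by A's identity, so
// once an A is no longer referenced, its B can be garbage-collected with it.
const derivedB = new WeakMap<RecordA, RecordB>();

function getB(a: RecordA): RecordB {
  let b = derivedB.get(a);
  if (!b) {
    b = { slug: a.title.toLowerCase().replace(/\s+/g, "-"), notes: [] };
    derivedB.set(a, b);
  }
  return b;
}
```

Because a WeakMap keys on object identity, an A that is replaced immutably gets a fresh B. If the data attached to B must survive edits to A, keying by id in an ordinary Map works too, at the cost of explicit cleanup.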

In database land this can be realized with a materialized view, which can be incremental and subscribed to. Taken to its extreme, this turns into event sourcing, which might seem like a panacea for this mindset. But in most cases, the latter is still a system by and for Stanley. The event-based nature of those systems exists to support housekeeping tasks like migration, backup and recovery. Users are not supposed to be aware that this is happening. They do not have any view into the event log, and cannot branch and merge it. The exceptions are extremely rare.

It's not a system for working with user intent, only for flattening it, because it's append-only. It has a lot of the necessary basic building blocks, but substitutes programmer intent for user intent.

What's most nefarious is that the resulting tech stacks are often quite big and intricate, involving job queues, multi-layered caches, distribution networks, and more. It's a bunch of stuff that Stanley can take joy and pride in, far away from users, with "hard" engineering challenges. Unlike all this *ugh* JavaScript, which is always broken and unreliable and uninteresting.

Except it's only needed because Stanley only solved half the problem, badly.

Patch or Bust

When factored in from the start, it's actually quite practical to split Intent from State, and it has lots of benefits. Especially if State is just a more constrained version of the same data structures as Intent. This doesn't need to be fully global either, but it needs to encompass a meaningful document or workspace to be useful.

It does create an additional problem: you now have two kinds of data in circulation. If reading or writing requires you to be aware of both Intent and State, you've made your code more complicated and harder to reason about.

More so, making a new Intent requires a copy of the old Intent, which you mutate or clone. But you want to avoid passing Intent around in general, because it's fishy data. It may have the right types, but the constraints and referential integrity aren't guaranteed. It's a magnet for the kind of bugs a type-checker won't catch.

I've published my common solution before: turn changes into first-class values, and make a generic update of type Update<T> be the basic unit of change. As a first approximation, consider a shallow merge {...value, ...update}. This allows you to make an updateIntent(update) function where update only specifies the fields that are changing.

In other words, Update<Intent> looks just like Update<State> and can be derived 100% from State, without Intent. Only one place needs access to the old Intent; all other code can just call that. You can make an app intent-aware without complicating all the code.
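A sketch of that arrangement, treating `Update<T>` as TypeScript's `Partial<T>` (the shallow first approximation) and using hypothetical caret helpers. Only `updateIntent` reads the old intent; the move functions see nothing but validated state.

```typescript
interface Caret { line: number; column: number; }

// Hypothetical generic update type: only the fields that change.
type Update<T> = Partial<T>;

// The single place that holds, and reads, the previous Intent.
let intent: Caret = { line: 0, column: 0 };

function updateIntent(update: Update<Caret>): void {
  intent = { ...intent, ...update }; // shallow merge as a first approximation
}

// All other code builds updates from validated State only.
// (Clamping against line lengths is omitted in this sketch.)
const moveRight = (state: Caret): Update<Caret> => ({ column: state.column + 1 });
const moveDown = (state: Caret): Update<Caret> => ({ line: state.line + 1 });
```

If the state column was clamped to 2 while the intent says 6, `moveDown` only sends a new `line`, so the remembered column 6 survives.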

If your state is cleaved along orthogonal lines, then this is all you need. That is, if column and line are two separate fields, you can selectively change only one of them. If they are stored as an XY tuple or vector, you now need to be able to describe a change that only affects either the X or Y component.

const value = {
  hello: 'text',
  foo: { bar: 2, baz: 4 },
};

const update = {
  hello: 'world',
  foo: { baz: 50 },
};

expect(
  patch(value, update)
).toEqual({
  hello: 'world',
  foo: { bar: 2, baz: 50 },
});

So in practice I have a function patch(value, update) which implements a comprehensive superset of a deep recursive merge, with full immutability. It doesn't try to do anything fancy with arrays or strings, they're just treated as atomic values. But it allows for precise overriding of merging behavior at every level, as well as custom lambda-based updates. You can patch tuples by index, but this is risky for dynamic lists. So instead you can express e.g. "append item to list" without the entire list, as a lambda.
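For reference, a deep patch with the behaviours listed here can be quite small. This is an independent minimal sketch, not the actual `patch` described here: lambdas are applied to the old value, plain objects merge recursively, and arrays, strings and other values replace wholesale. It leaves out the per-level merge overrides.

```typescript
// A lambda update receives the old value and returns the new one.
type Lambda = (value: any) => any;

const isPlainObject = (v: unknown): v is Record<string, unknown> =>
  typeof v === "object" && v !== null && Object.getPrototypeOf(v) === Object.prototype;

// Minimal deep patch, never mutating its inputs:
// - a function in the update is called with the old value (e.g. "append to list"),
// - a plain object merges recursively, key by key,
// - anything else (arrays, strings, numbers, null) replaces the old value whole.
function patch<T>(value: T, update: unknown): T {
  if (typeof update === "function") return (update as Lambda)(value);
  if (!isPlainObject(update)) return update as T;

  const base: Record<string, unknown> = isPlainObject(value) ? value : {};
  const out: Record<string, unknown> = { ...base };
  for (const key of Object.keys(update)) {
    out[key] = patch(base[key], update[key]);
  }
  return out as unknown as T;
}
```

Appending to a list then looks like `patch(doc, { items: (xs) => [...xs, item] })`, without sending the existing items.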

I've been using patch for years now, and the uses are myriad. To overlay a set of overrides onto a base template, patch(base, overrides) is all you need. It's the most effective way I know to erase a metric ton of {...splats} and ?? defaultValues and != null from entire swathes of code. This is a real problem.

You could also view this as a "poor man's OT", with the main distinction being that a patch update only describes the new state, not the old state. Such updates are not reversible on their own. But they are far simpler to make and apply.

It can still power a global undo/redo system, in combination with its complement diff(A, B): you can reverse an update by diffing in the opposite direction. This is an operation which is formalized and streamlined into revise(…), so that it retains the exact shape of the original update, and doesn't require B at all. The structure of the update is sufficient information: it too encodes some intent behind the change.
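One way to get an inverse with the same shape as the original update is to walk the update against the value it was applied to. This is a hypothetical `inverseOf`, not the `revise` mentioned here, and it includes a minimal deep patch so the block runs on its own. Like the approach described, it needs the original value and the update, but not the patched result.

```typescript
type Lambda = (value: any) => any;

const isPlainObject = (v: unknown): v is Record<string, unknown> =>
  typeof v === "object" && v !== null && Object.getPrototypeOf(v) === Object.prototype;

// Minimal deep patch: lambdas apply, plain objects merge, everything else replaces.
function patch<T>(value: T, update: unknown): T {
  if (typeof update === "function") return (update as Lambda)(value);
  if (!isPlainObject(update)) return update as T;
  const base: Record<string, unknown> = isPlainObject(value) ? value : {};
  const out: Record<string, unknown> = { ...base };
  for (const key of Object.keys(update)) out[key] = patch(base[key], update[key]);
  return out as unknown as T;
}

// Build an update that undoes `update`, given only the value it was applied to.
// The result mirrors the shape of `update`: it touches exactly the same keys.
function inverseOf(before: unknown, update: unknown): unknown {
  if (isPlainObject(update) && isPlainObject(before)) {
    const out: Record<string, unknown> = {};
    for (const key of Object.keys(update)) out[key] = inverseOf(before[key], update[key]);
    return out;
  }
  // Atomic values and lambdas replaced the old value wholesale, so the inverse
  // puts it back. A plain object would be merged rather than replaced, so it is
  // wrapped in a lambda. Keys that did not exist come back as `undefined`
  // rather than being deleted, a limitation of this sketch.
  return isPlainObject(before) ? () => before : before;
}
```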

With patch you also have a natural way to work with changes and conflicts as values. The earlier WYSIWYG scenario is just patch(committed, ephemeral) with bells on.

The net result is that mutating my intent or state is as easy as doing a {...value, ...update} splat, but I'm not falsely incentivized to flatten my data structures.

Instead it frees you up to think about what the most practical schema actually is from the data's point of view. This is driven by how the user wishes to edit it, because that's what you will connect it to. It makes you think about what a user's workspace actually is, and lets you align boundaries in UX and process with boundaries in data structure.

Remember: most classic "data structures" are not about the structure of data at all. They serve as acceleration tools to speed up specific operations you need on that data. Having the reads and writes drive the data design was always part of the job. What's weird is that people don't apply that idea end-to-end, from database to UI and back.

SQL tables are shaped the way they are because it enables complex filters and joins. However, I find this pretty counterproductive: it produces derived query results that are difficult to keep up to date on a client. They also don't look like any of the data structures I actually want to use in my code.

A Bike Shed of Schemas

This points to a very under-appreciated problem: it is completely pointless to argue about schemas and data types without citing specific domain logic and code that will be used to produce, edit and consume it. Because that code determines which structures you are incentivized to use, and which structures will require bespoke extra work.

From afar, column and line are just XY coordinates. Just use a 2-vector. But once you factor in the domain logic and etiquette, you realize that the horizontal and vertical directions have vastly different rules applied to them, and splitting might be better. Which one do you pick?

This applies to all data. Whether you should put items in a List or a Map largely depends on whether the consuming code will loop over it, or need random access. If an API only provides one, consumers will just build the missing Map or List as a first step. This is O(n log n) either way, because of sorting.

The method you use to read or write your data shouldn't limit use of everyday structure. Not unless you have a very good reason. But this is exactly what happens.

A lot of bad choices in data design come down to picking the "wrong" data type simply because the most appropriate one is inconvenient in some cases. This then leads to Conway's law, where one team picks the types that are most convenient only for them. The other teams are stuck with it, and end up writing bidirectional conversion code around their part, which will never be removed. The software will now always have this shape, reflecting which concerns were considered essential. What color are your types?

{
  order: [4, 11, 9, 5, 15, 43],
  values: {
    4: {...},
    5: {...},
    9: {...},
    11: {...},
    15: {...},
    43: {...},
  },
}

For List vs Map, you can have this particular cake and eat it too. Just provide a List for the order and a Map for the values. If you structure a list or tree this way, then you can do both iteration and ID-based traversal in the most natural and efficient way. Don't underestimate how convenient this can be.

This also has the benefit that "re-ordering items" and "editing items" are fully orthogonal operations. It decomposes the problem of "patching a list of objects" into "patching a list of IDs" and "patching N separate objects". It makes code for manipulating lists and trees universal. It lets you decide on a case by case basis whether you need to garbage collect the map, or whether preserving unused records is actually desirable.
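A sketch of these operations on the order/values shape, with hypothetical helper names and string ids. Note which parts each operation leaves untouched: moving shares `values`, editing shares `order`, and hiding keeps the record around.

```typescript
// Hypothetical list shape: ids in order, records by id. Ids are strings here,
// since object keys are strings in JavaScript anyway.
interface OrderedMap<T> {
  order: string[];
  values: Record<string, T>;
}

// Iteration: walk `order`, look up each id.
function listItems<T>(list: OrderedMap<T>): T[] {
  return list.order.map((id) => list.values[id]);
}

// Re-ordering only touches `order`; the records are shared untouched.
function moveItem<T>(list: OrderedMap<T>, id: string, toIndex: number): OrderedMap<T> {
  const order = list.order.filter((x) => x !== id);
  order.splice(toIndex, 0, id);
  return { ...list, order };
}

// Editing only touches one record; `order` is shared untouched.
function editItem<T>(list: OrderedMap<T>, id: string, value: T): OrderedMap<T> {
  return { ...list, values: { ...list.values, [id]: value } };
}

// Removing from the list drops the id from `order` but keeps the record,
// so whether to garbage-collect `values` stays a separate decision.
function hideItem<T>(list: OrderedMap<T>, id: string): OrderedMap<T> {
  return { ...list, order: list.order.filter((x) => x !== id) };
}
```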

Limiting it to ordinary JSON or JS types, rather than going full-blown OT or CRDT, is a useful baseline. With sensible schema design, at ordinary editing rates, CRDTs are overkill compared to the ability to just replay edits, or notify conflicts. This only requires version numbers and retries.

Users need those things anyway: just because a CRDT converges when two people edit, doesn't mean the result is what either person wants. The only case where OTs/CRDTs are absolutely necessary is rich-text editing, and you need bespoke UI solutions for that anyway. For simple text fields, last-write-wins is perfectly fine, and also far superior to what 99% of RESTy APIs do.
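A sketch of "version numbers and retries", using an in-memory stand-in for a server; `VersionedStore` and `applyWithRetry` are hypothetical. A write is accepted only against the current version; a stale client re-reads and replays its edit, and after a few failures the conflict is handed back to the user.

```typescript
interface Versioned<T> { version: number; value: T; }

// Server side: accept a write only if it was made against the current version.
class VersionedStore<T> {
  private current: Versioned<T>;

  constructor(value: T) {
    this.current = { version: 0, value };
  }

  read(): Versioned<T> {
    return this.current;
  }

  write(baseVersion: number, value: T): boolean {
    if (baseVersion !== this.current.version) return false; // stale: caller retries
    this.current = { version: baseVersion + 1, value };
    return true;
  }
}

// Client side: replay the edit on top of the latest value until it lands.
function applyWithRetry<T>(store: VersionedStore<T>, edit: (value: T) => T, maxTries = 5): boolean {
  for (let i = 0; i < maxTries; i++) {
    const { version, value } = store.read();
    if (store.write(version, edit(value))) return true;
  }
  return false; // give up and surface a conflict to the user
}
```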

A CRDT is just a mechanism that translates partially ordered intents into a single state. Like, it's cool that you can make CRDT counters and CRDT lists and whatnot... but each CRDT implements only one particular resolution strategy. If it doesn't produce the desired result, you've created invalid intent no user expected. With last-write-wins, you at least have something 1 user did intend. Whether this is actually destructive or corrective is mostly a matter of schema design and minimal surface area, not math.

The main thing that OTs and CRDTs do well is resolve edits on ordered sequences, like strings. If two users are typing text in the same doc, edits higher-up will shift edits down below, which means the indices change when rebased. But if you are editing structured data, you can avoid referring to indices entirely, and just use IDs instead. This sidesteps the issue, like splitting order from values.

For the order, there is a simple solution: a map with a fractional index, effectively a dead-simple list CRDT. It just comes with some overhead.
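A sketch of fractional indexing with plain numbers, hypothetical names throughout. Inserting between two neighbours picks a key strictly between theirs, so no other item moves, and equal keys from concurrent inserts are broken by id. Plain floats run out of precision after repeated bisection of the same gap; practical implementations usually use variable-length string keys instead, which is part of the overhead.

```typescript
// Pick a sort key between two neighbours (null = no neighbour on that side).
function keyBetween(before: number | null, after: number | null): number {
  if (before !== null && after !== null) return (before + after) / 2;
  if (before !== null) return before + 1;
  if (after !== null) return after - 1;
  return 0;
}

// The list order is derived by sorting on the keys, with ties broken by id
// so that every replica sees the same order.
function orderedIds(positions: Map<string, number>): string[] {
  return Array.from(positions.entries())
    .sort((a, b) => a[1] - b[1] || (a[0] < b[0] ? -1 : a[0] > b[0] ? 1 : 0))
    .map(([id]) => id);
}
```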

Using a CRDT for string editing might not even be enough. Consider Google Docs-style comments anchored to that text: their indices also need to shift on every edit. Now you need a bespoke domain-aware CRDT. Or you work around it by injecting magic markers into the text. Either way, it seems non-trivial to decouple a CRDT from the specific target domain of the data inside. The constraints get mixed in.

If you ask me, this is why the field of real-time web apps is still in somewhat of a rut. It's mainly viewed as a high-end technical problem: how do we synchronize data structures over a P2P network without any data conflicts? What they should be asking is: what is the minimal amount of structure we need to reliably synchronize, so that users can have a shared workspace where intent is preserved, and conflicts are clearly signposted? And how should we design our schemas, so that our code can manipulate the data in a straightforward and reliable way? Fixing non-trivial user conflicts is simply not your job.

Most SaaS out there doesn't need any of this technical complexity. Consider that a good multiplayer app requires user presence and broadcast anyway. The simplest solution is just a persistent process on a single server coordinating this, one per live workspace. It's what most MMOs do. In fast-paced video games, this even involves lag compensation. Reliable ordering is not the big problem.

The situations where this doesn't scale, or where you absolutely must be P2P, are a minority. If you run into them, you must be doing very well. The solution that I've sketched out here is explicitly designed so it can comfortably be done by small teams, or even just 1 person.

The (private) CAD app I showed glimpses of above is entirely built this way. It's patch all the way down and it's had undo/redo from day 1. It also has a developer mode where you can just edit the user-space part of the data model, and save/load it.

When the in-house designers come to me with new UX requests, they often ask: "Is it possible to do ____?" The answer is never a laborious sigh from a front-end dev with too much on their plate. It's "sure, and we can do more."

If you're not actively aware the design of schemas and code is tightly coupled, your codebase will explode, and the bulk of it will be glue. Much of it just serves to translate generalized intent into concrete state or commands. Arguments about schemas are usually just hidden debates about whose job it is to translate, split or join something. This isn't just an irrelevant matter of "wire formats" because changing the structure and format of data also changes how you address specific parts of it.

In an interactive UI, you also need a reverse path, to apply edits. What I hope you are starting to realize is that this is really just the forward path in reverse, on so many levels. The result of a basic query is just the ordered IDs of the records that it matched. A join returns a tuple of record IDs per row. If you pre-assemble the associated record data for me, you actually make my job as a front-end dev harder, because there are multiple forward paths for the exact same data, in subtly different forms. What I want is to query and mutate the same damn store you do, and be told when what changes. It's table stakes now.

With well-architected data, this can be wired up mostly automatically, without any scaffolding. The implementations you encounter in the wild just obfuscate this, because they don't distinguish between the data store and the model it holds. The fact that the data store should not be corruptible, and should enforce permissions and quotas, is incorrectly extended to the entire model stored inside. But that model doesn't belong to Stanley, it belongs to the user. This is why desktop applications didn't have a "Data Export". It was just called Load and Save, and what you saved was the intent, in a file.

Having a universal query or update mechanism doesn't absolve you from thinking about this either, which is why I think the patch approach is so rare: it looks like cowboy coding if you don't have the right boundaries in place. Patch is mainly for user-space mutations, not kernel-space, a concept that applies to more than just OS kernels. User-space must be very forgiving.

If you avoid it, you end up with something like GraphQL, a good example of solving only half the problem badly. Its getter assembles data for consumption by laboriously repeating it in dozens of partial variations. And it turns the setter part into an unsavory mix of lasagna and spaghetti. No wonder, it was designed for a platform that owns and hoards all your data.

* * *

Viewed narrowly, Intent is just a useful concept to rethink how you enforce validation and constraints in a front-end app. Viewed broadly, it completely changes how you build back-ends and data flows to support that. It will also teach you how adding new aspects to your software can reduce complexity, not increase it, if done right.

A good metric is to judge implementation choices by how many other places of the code need to care about them. If a proposed change requires adjustments literally everywhere else, it's probably a bad idea, unless the net effect is to remove code rather than add.

I believe reconcilers like React or tree-sitter are a major guide stone here. What they do is apply structure-preserving transforms on data structures, and incrementally. They actually do the annoying part for you. I based Use.GPU on the same principles, and use it to drive CPU canvases too. The tree-based structure reflects that one function's state just might be the next function's intent, all the way down. This is a compelling argument that the data and the code should have roughly the same shape.

You will also conclude there is nothing more nefarious than a hard split between back-end and front-end. You know, coded by different people, where each side is only half-aware of the other's needs, but one sits squarely in front of the other. Well-intentioned guesses about what the other end needs will often be wrong. You will end up with data types and query models that cannot answer questions concisely and efficiently, and which must be babysat to not go stale.

In the last 20 years, little has changed here in the wild. On the back-end, it still looks mostly the same. Even when modern storage solutions are deployed, people end up putting SQL- and ORM-like layers on top, because that's what's familiar. The split between back-end and database has the exact same malaise.

None of this work actually helps make the app more reliable, it's the opposite: every new feature makes on-going development harder. Many "solutions" in this space are not solutions, they are copes. Maybe we're overdue for a NoSQL-revival, this time with a focus on practical schema design and mutation? SQL was designed to model administrative business processes, not live interaction. I happen to believe a front-end should sit next to the back-end, not in front of it, with only a thin proxy as a broker.

What I can tell you for sure is: it's so much better when intent is a first-class concept. You neither need nor want to treat user data as something to pussyfoot around, or handle like it's radioactive. You can manipulate and transport it without a care. You can build rich, comfy functionality on top. Once implemented, you may find yourself not touching your network code for a very long time. It's the opposite of overwhelming, it's lovely. You can focus on building the tools your users need.

This can pave the way for more advanced concepts like OT and CRDT, but it will also show you that neither of them is a substitute for getting your application fundamentals right.

In doing so, you reach a synthesis of Dijkstra and anti-Dijkstra: your program should be provably correct in its data flow, which means it can safely break in completely arbitrary ways.

Because the I in UI meant "intent" all along.

APIs are About Policy

2019-07-26T00:00:00+02:00

A pox on both houses

"The Web is a system, Neo. That system is our enemy. But when you're inside, you look around, what do you see? Full-stack engineers, web developers, JavaScript ninjas. The very minds of the people we are trying to save.

"But until we do, these people are still a part of that system and that makes them our enemy. You have to understand, most of these people are not ready to be unplugged. And many of them are so inured, so hopelessly dependent on the system, that they will fight to protect it.

"Were you listening to me, Neo? Or were you looking at the widget library in the red dress?"

...

"What are you trying to tell me, that I can dodge unnecessary re-renders?"

"No Neo. I'm trying to tell you that when you're ready, you won't have to."

The web is always moving and shaking, or more precisely, shaking off whatever latest fad has turned out to be a mixed blessing after all. Specifically, the latest hotness for many is GraphQL, slowly but surely dethroning King REST. This means changing the way we shove certain data into certain packets. This then requires changing the code responsible for packing and unpacking that data, as well as replacing the entire digital last mile of routing it at both source and destination, despite the fact that all the actual infrastructure in between is unchanged. This is called full stack engineering. Available for hire now.

The expected custom, and indeed regular pastime, is of course to argue for or against the old or the new. But instead I'd like to tell you why both are completely wrong, for small values of complete. You see, APIs are about policy.

RESTless API

Take your typical RESTful API. I say typical, because an actual Representationally State Transferred API is as common as a unicorn. A client talks to a server by invoking certain methods on URLs over HTTP, let's go with that.

Optimists will take a constructive view. The API is a tool of empowerment. It enables you to do certain things in your program you couldn't do before, and that's why you are importing it as a dependency to maintain. The more methods in the swagger file, the better, that's why it's called swagger.

But instead I propose a subtractive view. The API is a tool of imprisonment. Its purpose is to take tasks that you are perfectly capable of doing yourself, and to separate them from you with bulletproof glass and a shitty telephone handset. One that is usually either too noisy or too quiet, but never just right. Granted, sometimes this is self-inflicted or benign, but rarely both.

This is also why there are almost no real REST APIs. If we consult the book of difficult-to-spot lies, we learn that the primary features of a REST API are Statelessness, Cacheability, Layeredness, Client-Side Injection and a Uniform Interface. Let's check them.

Statelessness means a simple generality. URLs point to blobs, which are fetched with GET and replaced with PUT, atomically. All the necessary information is supplied with the request, and no state is retained other than the contents saved and loaded. Multiple identical GETs and PUTs are idempotent. The DELETE verb is perhaps a PUT of a null value. So far mostly good. The PATCH verb is arguably a stateless partial PUT, and might be idempotent in some implementations, but only if you don't think too much about it. Which means a huge part of what remains are POST requests, the bread and butter of REST APIs, and those aren't stateless or idempotent at all.
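
The difference between the idempotent verbs and POST can be shown with a toy in-memory store. This is a hypothetical sketch, not any framework's API: repeating `put` or `remove` (modeled as a PUT of `null`, as suggested above) leaves the same state behind, while every `post` mints a new URL and a new record.

```typescript
// Hypothetical in-memory resource store, used only to contrast the verbs.
type Payload = string | null;

class Store {
  private blobs = new Map<string, Payload>();
  private nextId = 1;

  get(url: string): Payload | undefined {
    return this.blobs.get(url);
  }

  // PUT replaces the whole blob at a known URL: repeating it changes nothing.
  put(url: string, body: Payload): void {
    this.blobs.set(url, body);
  }

  // DELETE modeled as a PUT of a null value.
  remove(url: string): void {
    this.put(url, null);
  }

  // POST lets the server pick the URL: repeating it creates another record.
  post(collection: string, body: Payload): string {
    const url = `${collection}/${this.nextId++}`;
    this.blobs.set(url, body);
    return url;
  }

  size(): number {
    return this.blobs.size;
  }
}

const store = new Store();
store.put("/notes/x", "hello");
store.put("/notes/x", "hello"); // same state as after the first PUT
const a = store.post("/notes", "hello");
const b = store.post("/notes", "hello"); // a second, distinct record
console.log(store.size(), a, b); // 3 /notes/1 /notes/2
```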

Cacheability and layeredness (i.e. HTTP proxies) in turn have both been made mostly irrelevant. The move to HTTPS everywhere means the layering of proxies is more accurately termed a man-in-the-middle attack. That leaves mainly reverse proxying on the server or CDN side. The HTTP Cache-Control headers are also completely backwards in practice. For anything that isn't immutable, the official mechanism for cache invalidation is for a server to make an educated guess when its own data is going to become stale, which it can almost never know. If it guesses too late, the client will see stale data. If it guesses too soon, the client has to make a remote request before using its local cache, defeating the point. This was designed for a time when transfer time dominated over latency, whereas now we have the opposite problem. Common practice now is actually for the server to tag cacheable URLs with a revision ID, turning a mutable resource at an immutable URL into an immutable resource at a mutable URL.
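
The revision-ID practice can be sketched as follows, assuming Node's built-in `crypto` module. The naming scheme (the first eight hex characters of a SHA-256 digest, spliced in before the file extension) is an arbitrary assumption; build tools each use their own conventions.

```typescript
import { createHash } from "node:crypto";

// Derive a short revision ID from the contents (hypothetical scheme).
function revisionId(contents: string): string {
  return createHash("sha256").update(contents).digest("hex").slice(0, 8);
}

// Mutable resource at a fixed URL -> immutable resource at a URL that
// changes whenever the contents do.
function versionedUrl(path: string, contents: string): string {
  const dot = path.lastIndexOf(".");
  const hasExt = dot > path.lastIndexOf("/"); // ignore dots in directory names
  const base = hasExt ? path.slice(0, dot) : path;
  const ext = hasExt ? path.slice(dot) : "";
  return `${base}.${revisionId(contents)}${ext}`;
}

// No expiry guess needed: the response at a given URL can never change.
const headers = { "Cache-Control": "public, max-age=31536000, immutable" };

const v1 = versionedUrl("/static/app.js", "console.log(1)");
const v2 = versionedUrl("/static/app.js", "console.log(2)");
console.log(v1, v2, headers["Cache-Control"]);
```

Since the URL changes whenever the contents do, the server never has to guess when data goes stale: it can mark the response `immutable` (RFC 8246) and let clients keep it indefinitely.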

Client-Side Injection on the other hand, i.e. giving a browser JavaScript to run, is obviously here to stay, but still, no sane REST API makes you interpret JavaScript code to interact with it. That was mostly a thing Rubyists did in their astronautical pursuits to minimize the client/server gap from their point of view. In fact, we have entirely the opposite problem: we all want to pass bits of code to a server, but that's unsafe, so we find various ways of encoding lobotomized chunks of not-code and pretend that's sufficient.

Which leaves us with the need for a uniform interface, a point best addressed with a big belly laugh and more swagger definition files.

Take the most common REST API of them all, and the one nearly everyone gets wrong, /user. User accounts are some of the most locked up objects around, and as a result, this is a prime candidate for breaking all the rules.

The source of truth is usually something like:

ID	Email	Handle	Real Name	Password Hash	Picture	Karma	Admin
1	admin@example.com	admin	John Doe	sd8ByTq86ED...	s3://bucket/1.jpg	5	true
2	jane@example.com	jane	Jane Doe	j7gREnf63pO...	s3://bucket/2.jpg	-3	false

But if you GET /user/2, you likely see:

{
  "id": 2,
  "handle": "jane",
  "picture": "s3://bucket/2.jpg"
}

Unless you are Jane Doe, receiving:

{
  "id": 2,
  "email": "jane@example.com",
  "handle": "jane",
  "name": "Jane Doe",
  "picture": "s3://bucket/2.jpg"
}

Unless you are John Doe, the admin, who'll get:

{
  "id": 2,
  "email": "jane@example.com",
  "handle": "jane",
  "name": "Jane Doe",
  "picture": "s3://bucket/2.jpg",
  "karma": -3,
  "admin": false
}

What is supposedly a stateless, idempotent, cacheable, proxiable and uniform operation turns out to be a sparse GET of a database row, differentiated by both the subject and the specific objects being queried, which together opaquely determine the specific variant we get back. People say horizontal scaling means treating a million users as if they were one, but did they ever check how true that actually was?
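
All three responses can come out of a single projection function. This is a hypothetical sketch: `UserRow` mirrors the table above, and the rules (the user and admins see email and name, only admins see karma and the admin flag, nobody sees the hash) are inferred from the example responses rather than taken from any real API.

```typescript
// Hypothetical row type mirroring the table above.
interface UserRow {
  id: number;
  email: string;
  handle: string;
  name: string;
  passwordHash: string;
  picture: string;
  karma: number;
  admin: boolean;
}

// The subject: nobody (anonymous), or a specific user.
type Viewer = UserRow | null;

// One "GET /user/:id", three different documents, chosen by policy.
function viewUser(row: UserRow, viewer: Viewer): Record<string, unknown> {
  const isAdmin = viewer?.admin === true;
  const isSelf = viewer?.id === row.id;
  const view: Record<string, unknown> = { id: row.id };
  if (isAdmin || isSelf) view.email = row.email;
  view.handle = row.handle;
  if (isAdmin || isSelf) view.name = row.name;
  view.picture = row.picture;
  if (isAdmin) {
    view.karma = row.karma;
    view.admin = row.admin;
  }
  // passwordHash is never included, for anyone.
  return view;
}

const john: UserRow = { id: 1, email: "admin@example.com", handle: "admin", name: "John Doe",
  passwordHash: "sd8ByTq86ED...", picture: "s3://bucket/1.jpg", karma: 5, admin: true };
const jane: UserRow = { id: 2, email: "jane@example.com", handle: "jane", name: "Jane Doe",
  passwordHash: "j7gREnf63pO...", picture: "s3://bucket/2.jpg", karma: -3, admin: false };

console.log(JSON.stringify(viewUser(jane, null))); // {"id":2,"handle":"jane","picture":"s3://bucket/2.jpg"}
```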

I'm not done yet. These GETs won't even have matching PUTs, because likely the only thing Jane was allowed to do initially was:

POST /user/create

{
  "name": "Jane Doe",
  "email": "jane@example.com",
  "password": "hunter2"
}

Note the subtle differences with the above.

She couldn't supply her own picture URL directly, she will have to upload the actual file to S3 through another method. This involves asking the API for one-time permission and details to do so, after which her user record will be updated behind the scenes. Really, the type of picture is not string, it is a bespoke read-only boolean wearing a foreign key outfit.
She didn't get to pick her own id either. Its appearance in the GET body is actually entirely redundant, because it's merely humoring you by echoing back the number you gave it in the URL. Which it assigned to you in the first place. It's not part of the data, it's metadata... or rather the URL is. See, unless you put the string /user/ before the id you can't actually do anything with it. id is not even metadata, it's truncated metadata; unless you're crazy enough to have a REST API where IDs are mutable, in which case, stop that.
One piece of truth "data," the password hash, actually never appears in either GETs or POSTs. Only the unencoded password, which is shredded as soon as it's received, and never given out. Is the hash also metadata? Or is it the result of policy?
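
A minimal sketch of that policy using Node's built-in `scrypt`: the plaintext is only used to derive a salted hash, and the hash is only ever compared, never returned. The function names and parameters (16-byte salt, 64-byte key, Node's default scrypt cost) are illustrative assumptions, not a recommendation.

```typescript
import { randomBytes, scryptSync, timingSafeEqual } from "node:crypto";

// Stored form: salt and derived key, hex-encoded. Never sent to clients.
interface StoredSecret {
  salt: string;
  hash: string;
}

// Would run inside a hypothetical "POST /user/create" handler: the plaintext
// is used once to derive the hash, then simply goes out of scope.
function hashPassword(password: string): StoredSecret {
  const salt = randomBytes(16);
  const hash = scryptSync(password, salt, 64);
  return { salt: salt.toString("hex"), hash: hash.toString("hex") };
}

// Login re-derives the key and compares in constant time.
function verifyPassword(password: string, stored: StoredSecret): boolean {
  const expected = Buffer.from(stored.hash, "hex");
  const actual = scryptSync(password, Buffer.from(stored.salt, "hex"), expected.length);
  return timingSafeEqual(actual, expected);
}

const secret = hashPassword("hunter2");
console.log(verifyPassword("hunter2", secret), verifyPassword("hunter3", secret)); // true false
```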

PATCH /user/:id/edit is left as an exercise for the reader, but what happens when Jane tries to change her own email address? What about when John tries to change Jane's? Luckily nobody has ever accidentally mass emailed all their customers by running some shell scripts against their own API.

Neither from the perspective of the client, nor that of the server, do we have a /user API that saves and loads user objects. There is no consistent JSON schema for the client (not even for a single type during a single "stateless" session), nor are there idempotent whole-row updates in the database.

Rather, there is an endpoint which allows you to read/write one or more columns in a row in the user table, according to certain very specific rules per column. This is dependent not just on the field types and values (i.e. data integrity), but on the authentication state (i.e. identity and permission), which comes via an HTTP header and requires extra data and lookups to validate.
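
Those per-column rules can be made explicit as a table of predicates. Everything here is a hypothetical sketch: the rule set (owners may change their own email, only admins may change karma or the admin flag, some columns are never writable) is one possible answer to the questions raised about editing Jane's email, not a recommendation.

```typescript
// Hypothetical write rules: each sees the editor (from the auth header),
// the target row id, and the proposed value.
type Editor = { id: number; admin: boolean } | null;
type Rule = (editor: Editor, targetId: number, value: unknown) => boolean;

// A Map, so that keys like "toString" can never match a prototype method.
const writeRules = new Map<string, Rule>([
  // Owners may rename themselves; admins may rename anyone.
  ["name", (ed, id, v) => ed !== null && (ed.id === id || ed.admin) && typeof v === "string" && v.length > 0],
  // Only the owner may change an email address.
  ["email", (ed, id, v) => ed !== null && ed.id === id && typeof v === "string" && v.includes("@")],
  // Only admins may touch karma or the admin flag.
  ["karma", (ed, _id, v) => ed?.admin === true && typeof v === "number" && Number.isInteger(v)],
  ["admin", (ed, _id, v) => ed?.admin === true && typeof v === "boolean"],
]);
// Columns without a rule (id, passwordHash, picture) cannot be written here.

// Returns the rejected columns; an empty list means the PATCH is allowed.
function checkPatch(editor: Editor, targetId: number, patch: Record<string, unknown>): string[] {
  const rejected: string[] = [];
  for (const [column, value] of Object.entries(patch)) {
    const rule = writeRules.get(column);
    if (rule === undefined || !rule(editor, targetId, value)) rejected.push(column);
  }
  return rejected;
}

const jane: Editor = { id: 2, admin: false };
const john: Editor = { id: 1, admin: true };
console.log(checkPatch(jane, 2, { email: "jane@example.org" })); // []
console.log(checkPatch(john, 2, { email: "jane@example.org", karma: 10 })); // [ 'email' ]
console.log(checkPatch(jane, 2, { karma: 100 })); // [ 'karma' ]
```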

If there was no client/server gap, you'd just have data you owned fully and which you could manipulate freely. The effect and purpose of the API is to prevent that from happening, which is why REST is a lie in the real world. The only true REST API is a freeform key/value store. So I guess S3 and CouchDB qualify, but neither's access control or query models are going to win any awards for elegance. When "correctly" locked down, CouchDB too will be a document store that doesn't return the same kind of document contents for different subjects and objects, but it will at least give you a single ID for the true underlying data and its revision. It will even tell you in real-time when it changes, a superb feature, but one that probably should have been built into the application-session-transport-whatever-this-is layer as the SUBSCRIBE HTTP verb.

Couch is the exception though. In the usual case, if you try to cache any of your responses, you usually have too much or too little data, no way of knowing when and how it changed without frequent polling, and no way of reliably identifying let alone ordering particular snapshots. If you try to PUT it back, you may erase missing fields or get an error. Plus, I know your Express server spits out some kind of ETag for you with every response, but, without looking it up, can you tell me specifically what that's derived from and how? Yeah I didn't think so. If that field meant anything to you, you'd be the one supplying it.
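
For contrast, here is roughly what it looks like when the application supplies the ETag itself: a minimal optimistic-concurrency sketch, assuming a hypothetical store that bumps an integer revision on every write and, like HTTP's `If-Match` conditional requests, rejects writes based on a stale revision with 412 Precondition Failed.

```typescript
// Hypothetical document store: the revision is the ETag, and the app owns it.
interface Doc {
  rev: number;
  body: Record<string, unknown>;
}

class RevisionedStore {
  private docs = new Map<string, Doc>();

  get(id: string): { etag: string; body: Record<string, unknown> } | undefined {
    const doc = this.docs.get(id);
    return doc && { etag: `"${doc.rev}"`, body: doc.body };
  }

  // Mirrors HTTP status codes: 201 created, 200 updated, 412 stale If-Match.
  put(id: string, body: Record<string, unknown>, ifMatch?: string): { status: number; etag?: string } {
    const current = this.docs.get(id);
    if (ifMatch !== undefined && (current === undefined || ifMatch !== `"${current.rev}"`)) {
      return { status: 412 };
    }
    const rev = (current?.rev ?? 0) + 1;
    this.docs.set(id, { rev, body });
    return { status: current ? 200 : 201, etag: `"${rev}"` };
  }
}

const db = new RevisionedStore();
const created = db.put("user/2", { name: "Jane Doe" });
const first = db.put("user/2", { name: "Jane D." }, created.etag);
const stale = db.put("user/2", { name: "J. Doe" }, created.etag); // based on an old revision
console.log(created.status, first.status, stale.status); // 201 200 412
```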

If you're still not convinced, you can go through this exercise again but with a fully normalized SQL database. In that case, the /user API implementation reads/writes several tables, and what you have is a facade that allows you to access and modify one or more columns in specific rows in these particular tables, cross referenced by meaningless internal IDs you probably don't see. The rules that govern these changes are fickle and unknowable, because you trigger a specific set of rules through a combination of URL, HTTP headers, POST body, and internal database state. If you're lucky your failed attempts will come back with some notes about how you might try to fix them individually, if not, too bad, computer says no.

For real world apps, it is generally impossible by construction for a client to create and maintain an accurate replica of the data they are supposed to be able to query and share ownership of.

Regressive Web Apps

I can already hear someone say: my REST API is clean! My data models are well-designed! All my endpoints follow the same consistent pattern, all the verbs are used correctly, there is a single source of truth for every piece of data, and all the schemas are always consistent!

So what you're saying is that you wrote or scaffolded the exact same code to handle the exact same handful of verbs for all your different data types, each likely with their own Model(s) and Controller(s), and their own URL namespace, without any substantial behavioral differences between them? And you think this is good?

As an aside, consider how long ago people figured out that password hashes should go in the /etc/shadow file instead of the now misnamed /etc/passwd. This is a one-to-one mapping, the kind of thing database normalization explicitly doesn't want you to split up, with the same repeated "primary keys" in both "tables". This duplication is actually good though, because the OS' user API implements Policy™, and the rules and access patterns for shell information are entirely different from the ones for password hashes.

You see, if APIs are about policy and not empowerment, then it absolutely makes sense to store and access that data in a way that is streamlined to enforce those policies. Because you know exactly what people are and aren't going to be doing with it—if you don't, that's undefined behavior and/or a security hole. This is something most NoSQLers also got wrong, organizing their data not by policy but rather by how it would be indexed or queried, which is not the same thing.

This is also why people continue to write REST APIs, as flawed as they are. The busywork of creating unique, bespoke endpoints incidentally creates a time and place for defining and implementing some kind of rules. It also means you never have to tackle them all at once, consistently, which would be more difficult to pull off (but easier to maintain). The stunted vocabulary of ad-hoc schemas and their ill-defined nouns forces you to harmonize it all by hand before you can shove it into your meticulously typed and normalized database. The superfluous exercise of individually shaving down the square pegs you ordered, to fit the round holes you carved yourself, has incidentally allowed you to systematically check for woodworms.

It has nothing to do with REST or even HTTP verbs. There is no semantic difference between:

PATCH /user/1/edit

{"name": "Jonathan Doe"}

and

UPDATE users SET name = 'Jonathan Doe' WHERE id = 1

The main reason you don't pass SQL to your Rails app is because deciding on a policy for which SQL statements are allowed and which are not is practically impossible. At most you could pattern match on a fixed set of query templates. Which, if you do, would mean effectively using the contents of arbitrary SQL statements as enum values, using the power of SQL to express the absence of SQL. The Aristocrats.

But there is an entirely more practical encoding of sparse updates in {whatever} (of (tree you) prefer).

POST /api/update

{
  "user": {
    "1": {
      "name": {"$set": "Jonathan Doe"}
    }
  }
}

It even comes with free bulk operations.
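
Applying updates in this shape takes only a small recursive function. This sketch assumes only the `$set` operator from the example above (the same spelling MongoDB and immutability-helper use for setting a value), does not treat arrays specially, and `applyUpdate` is a hypothetical name.

```typescript
// A sparse update: nested objects descend into the tree, and a leaf of the
// form {"$set": value} replaces whatever is at that path.
type Update = { $set: unknown } | { [key: string]: Update };

const isSet = (u: Update): u is { $set: unknown } =>
  Object.prototype.hasOwnProperty.call(u, "$set");

// Applies an update immutably: untouched branches are shared with the input.
// Arrays are treated as plain objects in this sketch.
function applyUpdate(data: unknown, update: Update): unknown {
  if (isSet(update)) return update.$set;
  const base = (typeof data === "object" && data !== null ? data : {}) as Record<string, unknown>;
  const result: Record<string, unknown> = { ...base };
  for (const [key, sub] of Object.entries(update)) {
    result[key] = applyUpdate(base[key], sub as Update);
  }
  return result;
}

const state = {
  user: {
    "1": { name: "John Doe", handle: "admin" },
    "2": { name: "Jane Doe", handle: "jane" },
  },
};

// Bulk operations are just more keys in the same update.
const next = applyUpdate(state, {
  user: {
    "1": { name: { $set: "Jonathan Doe" } },
    "2": { handle: { $set: "janedoe" } },
  },
}) as typeof state;
console.log(next.user["1"].name, next.user["2"].handle); // Jonathan Doe janedoe
```

Because untouched branches keep their identity, the result also plays well with memoization on the client.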

Validating an operation encoded like this is actually entirely feasible. First you validate the access policy of the individual objects and properties being modified, according to a defined policy schema. Then you check if any new values are references to other protected objects or forbidden values. Finally you opportunistically merge the update, and check the result for any data integrity violations, before committing it.

You've been doing this all along in your REST API endpoints, you just did it with bespoke code instead of declarative functional schemas and lambdas, like a chump.
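
The four steps described above (access policy, references and forbidden values, merge, integrity) can be sketched as a single function. This is a hypothetical illustration rather than a library API: the `policy` schema, the `manager` reference field and the unique-handle rule are invented for the example, and the update is assumed to be already parsed into the expected shape.

```typescript
// Hypothetical shapes: tables of rows keyed by id, and sparse updates whose
// leaves are {"$set": value}.
type Row = Record<string, unknown>;
type Data = Record<string, Record<string, Row>>;
type SparseUpdate = Record<string, Record<string, Record<string, { $set: unknown }>>>;
type Viewer = { id: string; admin: boolean };
type FieldPolicy = {
  write: (viewer: Viewer, rowId: string) => boolean; // step 1: access
  check: (value: unknown, data: Data) => boolean; // step 2: values and references
};

// Own-property lookup, so keys like "toString" never hit the prototype.
function own<T>(obj: Record<string, T> | undefined, key: string): T | undefined {
  return obj !== undefined && Object.prototype.hasOwnProperty.call(obj, key) ? obj[key] : undefined;
}

const isText = (x: unknown) => typeof x === "string" && x.length > 0;

// Declarative policy schema: fields without an entry are not writable at all.
const policy: Record<string, Record<string, FieldPolicy>> = {
  user: {
    name: { write: (v, id) => v.admin || v.id === id, check: isText },
    handle: { write: (v, id) => v.admin || v.id === id, check: isText },
    // A reference to another object: must point at an existing user.
    manager: { write: (v) => v.admin, check: (x, data) => typeof x === "string" && own(data.user, x) !== undefined },
  },
};

// Step 3: opportunistic merge into a copy of the data.
function merge(data: Data, update: SparseUpdate): Data {
  const out: Data = { ...data };
  for (const [table, rows] of Object.entries(update)) {
    const copy = { ...out[table] };
    for (const [id, fields] of Object.entries(rows)) {
      const row: Row = { ...copy[id] };
      for (const [field, op] of Object.entries(fields)) row[field] = op.$set;
      copy[id] = row;
    }
    out[table] = copy;
  }
  return out;
}

// Step 4: integrity rules on the merged result (here: unique handles).
function integrity(data: Data): string[] {
  const handles = Object.values(data.user ?? {}).map((u) => u.handle);
  return new Set(handles).size === handles.length ? [] : ["user.handle must be unique"];
}

type Result = { ok: true; data: Data } | { ok: false; errors: string[] };

function validateUpdate(data: Data, update: SparseUpdate, viewer: Viewer): Result {
  const errors: string[] = [];
  for (const [table, rows] of Object.entries(update)) {
    for (const [id, fields] of Object.entries(rows)) {
      if (own(own(data, table), id) === undefined) errors.push(`${table}.${id}: no such object`);
      for (const [field, op] of Object.entries(fields)) {
        const rule = own(own(policy, table), field);
        if (rule === undefined || !rule.write(viewer, id)) errors.push(`${table}.${id}.${field}: forbidden`);
        else if (!rule.check(op.$set, data)) errors.push(`${table}.${id}.${field}: invalid value`);
      }
    }
  }
  if (errors.length > 0) return { ok: false, errors };
  const merged = merge(data, update);
  const violations = integrity(merged);
  return violations.length > 0 ? { ok: false, errors: violations } : { ok: true, data: merged };
}

const data: Data = {
  user: {
    "1": { name: "John Doe", handle: "admin" },
    "2": { name: "Jane Doe", handle: "jane" },
  },
};
const jane: Viewer = { id: "2", admin: false };
console.log(validateUpdate(data, { user: { "2": { name: { $set: "Jane D." } } } }, jane).ok); // true
console.log(validateUpdate(data, { user: { "1": { name: { $set: "Evil" } } } }, jane).ok); // false
```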

If the acronyms CRDT and OT don't mean anything to you, this is also your cue to google them so you can start to imagine a very different world. One where your sparse updates can be undone or rebased like git commits in realtime, letting users resolve any conflicts among themselves as they occur, despite latency. It's one where the idea of remote web apps being comparable to native local apps is actually true instead of a lie an entire industry has shamelessly agreed to tell itself.

You might also want to think about how easy it would be to make a universal reducer for said updates on the client side too, obviating all those Redux actions you typed out. How you could use the composition of closures during the UI rendering process to make memoized update handlers, which produce sparse updates automatically to match your arbitrary descent into your data structures. That is, react-cursor and its ancestors except maybe reduced to two and a half hooks and some change, with all the same time travel. Have you ever built a non-trivial web app that had undo/redo functionality that actually worked? Have you ever used a native app that didn't have this basic affordance?
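
As a rough illustration of why undo comes nearly for free once changes are data: if applying an update also records its inverse, undo and redo are just two stacks. This sketch assumes a flattened path/value form of the `$set` updates shown earlier; `UndoHistory`, `getIn` and `setIn` are hypothetical names, and undoing the creation of a key sets it back to `undefined` rather than deleting it.

```typescript
// A sparse update flattened into path/value pairs (assumed form).
type Path = string[];
type SetOp = { path: Path; value: unknown };
type Tree = Record<string, unknown>;

function getIn(tree: unknown, path: Path): unknown {
  return path.reduce<unknown>((node, key) => (node as Tree | undefined)?.[key], tree);
}

// Immutable set: copies only the nodes along the path.
function setIn(tree: unknown, path: Path, value: unknown): unknown {
  if (path.length === 0) return value;
  const [key, ...rest] = path;
  const node = (typeof tree === "object" && tree !== null ? tree : {}) as Tree;
  return { ...node, [key]: setIn(node[key], rest, value) };
}

// Applying ops also produces their inverse (old values, in reverse order).
// Missing keys are restored as undefined, not deleted.
function apply(tree: unknown, ops: SetOp[]): { tree: unknown; inverse: SetOp[] } {
  const inverse = ops.map((op) => ({ path: op.path, value: getIn(tree, op.path) })).reverse();
  const next = ops.reduce((t, op) => setIn(t, op.path, op.value), tree);
  return { tree: next, inverse };
}

class UndoHistory {
  state: unknown;
  private undoStack: SetOp[][] = [];
  private redoStack: SetOp[][] = [];

  constructor(initial: unknown) {
    this.state = initial;
  }

  // A new change discards any redo history, as in most editors.
  commit(ops: SetOp[]): void {
    this.undoStack.push(this.run(ops));
    this.redoStack = [];
  }

  undo(): void {
    const ops = this.undoStack.pop();
    if (ops) this.redoStack.push(this.run(ops));
  }

  redo(): void {
    const ops = this.redoStack.pop();
    if (ops) this.undoStack.push(this.run(ops));
  }

  // Applies ops to the current state and returns their inverse.
  private run(ops: SetOp[]): SetOp[] {
    const { tree, inverse } = apply(this.state, ops);
    this.state = tree;
    return inverse;
  }
}

const h = new UndoHistory({ user: { "1": { name: "John Doe" } } });
h.commit([{ path: ["user", "1", "name"], value: "Jonathan Doe" }]);
h.undo();
console.log(getIn(h.state, ["user", "1", "name"])); // John Doe
h.redo();
console.log(getIn(h.state, ["user", "1", "name"])); // Jonathan Doe
```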

It's entirely within your reach.

GraftQL

If you haven't been paying attention, you might think GraphQL answers a lot of these troubles. Isn't GraphQL just like passing an arbitrary SELECT query to the server? Except in a query language that is recursive, typed, composable, and all that? And doesn't GraphQL have typed mutations too, allowing for better write operations?

Well, no.

Let's start with the elephant in the room. GraphQL was made by Facebook. That Facebook. They're the same people who made the wildly successful React, but here's the key difference: you probably have the same front-end concerns as Facebook, but you do not have the same back-end concerns.

The value proposition here is of using a query language designed for a platform that boxes its 2+ billion users in, feeds them extremely precise selections from an astronomical trove of continuously harvested data, and only allows them to interact by throwing small pebbles into the relentless stream in the hope they make some ripples.

That is, it's a query language that is very good at letting you traverse an enormous graph while verifying all traversals, but it was mainly a tool of necessity. It lets them pick and choose what to query, because letting Facebook's servers tell you everything they know about the people you're looking at would saturate your line. Not to mention they don't want you to keep any of this data, you're not allowed to take it home. All that redundant querying over time has to be minimized and overseen somehow.

One problem Facebook didn't have though was to avoid busywork, that's what junior hires are for, and hence GraphQL mutations are just POST requests with a superficial layer of typed paint. The Graph part of the QL is only for reading, which few people actually had real issues with, seeing as GET was the one verb of REST that worked the most as advertised.

Retaining a local copy of all visible data is impractical and undesirable for Facebook's purposes, but should it be impractical for your app? Or could it actually be extremely convenient, provided you got there via technical choices and a data model adapted to your situation? In order to do that, you cannot be fetching arbitrary sparse views of unlabelled data, you need to sync subgraphs reliably both ways. If the policy boundaries don't match the data's own, that becomes a herculean task.

What's particularly poignant is that the actual realization of a GraphQL back-end in the wild is typically done by... hooking it up to an SQL database and grafting all the records together. You recursively query this decidedly non-graph relational database, which has now grown JSON columns and other mutations. Different peg, same hole, but the peg shaving machine is now a Boston Dynamics robot with a cute little dog called Apollo and they do some neat tricks together. It's just an act though, you're not supposed to participate.

Don't get me wrong, I know there are real benefits around GraphQL typing and tooling, but you do have to realize that most of this serves to scaffold out busywork, not eliminate it fully, while leaving the INSERT/UPDATE/DELETE side of things mostly unaddressed. You're expected to keep treating your users like robots that should only bleep the words GET and POST, instead of just looking at the thing and touching the thing directly, preferably in groups, tolerant to both error and lag.

This is IMO the real development and innovation bottleneck in practical client/server application architecture, the thing that makes so many web apps still feel like web apps instead of native apps, even if it's Electron. It makes any requirement of an offline mode a non-trivial investment rather than a sane default for any developer. The effect is also felt by the user, as an inability to freely engage with the data. You are only allowed to siphon it out at low volume, applying changes only after submitting a permission slip in triplicate and getting a stamped receipt. Bureaucracy is a necessary evil, but it should only ever be applied at minimum viable levels, not turned into an industry tradition.

The exceptions are rare, always significant feats of smart engineering, and unmistakable on sight. It's whenever someone has successfully managed to separate the logistics of the API from its policies, without falling into the trap of making a one-size-fits-all tool that doesn't fit by design.

Can we start trying to democratize that? It would be a good policy.

Using Web APIs for Research

2009-01-08T00:00:00+01:00

Recently we launched our new product at Strutta, a 'create your own contest site' web service. In each contest, users submit and vote on each other's videos, pictures, songs or writings.

As part of the research we did for the development, we wanted to examine our competition. So, I dove into YouTube to try and figure out some of their ideas and algorithms. For me, this wasn't entirely new: when I posted my Line Rider videos to YouTube, I followed up each video with manual statistics tracking and gained some insight into how a video becomes popular on YouTube. However, that only gave me a very narrow view of the community and its dynamics.

Since then though, things have changed a lot. YouTube now has a public API as well as pre-made libraries to use. With these, it becomes very easy to collect statistics and perform your own analysis. So, armed with Python, I set out to investigate YouTube's ubiquitous 'related videos' feature.

I found it interesting to analyse a big site through their own API rather than screen scraping. Traditionally, one first tries to collect as much data as possible, but the resulting data set can become very unwieldy. In this case, I already had full access and I could focus on exactly which queries I wanted to run, how to aggregate my data, and which measures to focus on.

The results led to some interesting conclusions. My big write-up can be found on the Strutta Blog, aptly titled Six Degrees of YouTube.