Robert Mosolgo

GraphQL Query as a State Machine

State machines are applied to a wide variety of programming problems. I found it useful to think of a GraphQL query as a state machine.

Part 0: Introduction to State Machines

Practically speaking, a state machine is a unit of code with these properties:

  • It has a set of states
  • It is in one state at a time
  • Transitions connect one state to another
  • Transitions can be triggered by outside activity and/or make changes to code on the “outside”
  • One state is the starting state
  • One or more states may be valid ending states

State machines are also called “finite automata”.

To see why code like this is useful, let’s examine a couple of applications of state machines:

  • Some ORMs use a state machine to track the lifecycle of persisted objects. For example, the set of states may be: new, persisted and destroyed. A new object begins in the new state. Calling save() initiates a transition to the persisted state. Calling destroy() moves the machine to the the destroyed state. Moving from destroyed to new is impossible; there is no transition between these states.
  • Regular expressions are often implemented with state machines. The regular expression’s various patterns are transformed into states. While the expression is tested against a string, matching characters cause transitions from one valid state to another. Non-matching characters case a transition to the “failed” state. After the string has been completely tested, the regular expression tests itself: if it’s in a valid ending state, then the string was a match. If it isn’t, then the string was not a match.

In summary, state machines provide a model for well-defined progression through a many-stepped process.

Determinism and Non-Determinism

Let’s examine a regular expression as a state machine. Here’s /^abc$/ (matches "abc" only):

1
2
3
4
5
6
+-------+          +-----+          +-----+          +-----+          +-----+
| start | - "a" -> | MS1 | - "b" -> | MS2 | - "c" -> | MS3 | - EOS -> | end |
+-------+          +-----+          +-----+          +-----+          +-----+

* MS: "Matching State"
* EOS: "End of string"

In the diagram above, each state is represented by a box. Between states, transitions are represented by arrows. Since this is a regular expression, the transitions are named after the strings which they match. For example, if the machine is in the start state and it observes an "a", it moves to Matching State 1 (MS1). As the regular expression matches the string "abc", it progresses through the states, finally reaching end.

Let’s see another regular expression, /^a(bc|bd)$/. It matches two strings, "abc" and "abd". Here’s a naive machine for this expression:

1
2
3
4
5
6
7
|                                      +-----+          +-----+  
|                             . "b" -> | MS2 | - "d" -> | MS3 | - EOS
| +-------+          +-----+ /         +-----+          +-----+       `-> +-----+
| | start | - "a" -> | MS1 |                                              | end |
| +-------+          +-----+ \         +-----+          +-----+       .-> +-----+
|                              `"b" -> | MS4 | - "c" -> | MS5 | - EOS
|                                      +-----+          +-----+

Contrasting this machine to the previous one, we can see a difference: this machine has a branch. To make matters “worse”, the branch is ambiguous: from MS1, when a "b" appears in the string, should the machine move to MS2 or MS4? It can only tell by looking ahead, and possibly backtracking, which is inefficient.

This difference is called deterministic vs non-deterministic. The first machine is deterministic: for each state, each input character can lead to exactly one state (failed state is not pictured). The second machine is non-deterministic: for some states, an input character may lead to multiple states.

Solving Non-Determinism

It turns out, you can transform a non-deterministic machine into a deterministic machine. The process works like this:

  • Inspect the non-deterministic machine:
  • For each state, gather the possible inputs for that state.
  • For each possible input, find the one-or-more destination states which it leads to.
  • Take those destination states a create a new state in deterministic machine
    • For a set of destination states S, the new state represents “any of S
    • Repeat the process from this new state (find possible inputs, derive a new state for its possible destinations)

The result is a deterministic machine, some of whose states represent a set of states in the non-deterministic machine.

Let’s apply the transformation to the non-deterministic machine above:

1
2
3
4
5
6
7
|                                                              +-----+
|                                                     . "d" -> | MS3 | - EOS .
| +-------+          +-----+          +------------+ /         +-----+        `-> +-----+
| | start | - "a" -> | MS1 | - "b" -> | MS2 or MS4 |                              | end |
| +-------+          +-----+          +------------+ \         +-----+        .-> +-----+
|                                                      `"c" -> | MS5 | - EOS '
|                                                              +-----+

The non-deterministic transitions on "b" have been replaced by a single transition to a newly-created state. The new state represents MS2 or MS4, and it has transitions from both of those states. It may transition on EOS or it may transition on "c".

The result is a deterministic machine: for each input, we have exactly one transition, so we never need to backtrack.

What about GraphQL?

Consider a GraphQL query:

1
2
3
4
5
6
7
8
9
{
  parent {
    child {
      field1
      field2
      field3
    }
  }
}

Executing this query can be articulated in terms of a state machine:

  • Each selection ({ ... }) is a state, which has a type (a GraphQL type) and a value (a value in the host language, eg, Ruby object)
  • Fields are transitions: they move execution from one state to another (that is, from one selection to a child selection)
  • When each field in a selection has been executed, the machine moves “back” to the parent state
  • The starting state is the root-level selection (eg, query { ... })
  • The ending state is also the root-level selection, after traversing all selections in the query
  • Some transitions are invalid: for example, if the value is nil, the machine can’t move into a state whose type is non-null.

Non-Determinism in GraphQL

Consider a query with three conditional fragments:

1
2
3
4
5
6
7
{
  node(id: $nodeId) {
    ... on Interface1 { child { field1 } }
    ... on Interface2 { child { field2 } }
    ... on Interface3 { child { field3 } }
  }
}

Depending on the runtime type of node(id: $nodeId), 0, 1, 2, or 3 of those typed selections may be executed. This is a kind of non-determinism.

A simple solution for a GraphQL AST interpreter is:

  • Test each condition and gather the selections which apply
  • For each unique field in the set of selections, evaluate it
  • Evaluate sub-selections for each field which matches that unique field

Concretely, that boils down to:

  • Get the runtime type (RT) of the object (O) returned by node(id: $nodeId)
  • For each interface type, if RT implements that interface, gather that selection
  • Find uniquely-named fields in the set of selections (child is the only one)
  • Resolve child field on O
  • For each selection in the set, find fields named child and gather them up (a subset of field1, field2, field3)
  • Repeat

This solution is not a good fit for graphql-ruby because we have a pre-execution phase for analyzing incoming queries. This flow requires runtime types which are not available until fields are actually executed.

Solving Non-Determinism in GraphQL

To streamline execution, we can apply a similar transformation to a GraphQL query. Before executing, we can check each field:

  • Identify each possible return type
  • Identify each type condition
  • Build a new “state” for each valid combination of return type and type conditions

The state contains a set of selections. This machine can be the basis of both pre-execution and execution, since all possible transitions have been identified. Execution will follow a subset of those transitions, depending on the runtime type of returned objects.

To visualize this transformation, consider a schema like this one:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
type Query {
  node(id: ID!) : Node
}

union Node = TypeA | TypeB

type TypeA implements Interface1, Interface2 {
  # ...
}

type TypeB implements Interface2, Interface3 {
  # ...
}

interface Interface1 { ... }
interface Interface2 { ... }
interface Interface3 { ... }

Revisiting the previous query, we can do some ahead-of-time calculation about the possible execution paths:

  • node(id: $nodeId) returns one of: TypeA, TypeB
  • selections are conditioned by: Interface1, Interface2, Inteface3
  • If node(...) returns TypeA, selections from Interface1 and Interface2 should be applied
  • If node(...) returns TypeB, selections from Interface2 and Interface3 should be applied

This information allows us to “rewrite” the query above in a deterministic way:

1
2
3
4
5
6
7
8
9
10
{
  node(id: $nodeId) {
    ... on TypeA {
      child { field1, field2 }
    }
    ... on TypeB {
      child { field2, field3 }
    }
  }
}

Now, each runtime type transitions to exactly one selection. This information simplifies pre-execution analysis and execution. Additionally, this computed state can cache field-level values like coerced arguments and field resolve functions.

In graphql-ruby, these transformations are implemented in GraphQL::InternalRepresentation. At time of writing, the multi-selection state is implemented as an array of InternalRepresentation::Nodes, but an incoming PR will formalize them as InternalRepresentation::Selections.

Ceci n’est pas un state machine

Although this mindset helps solve a problem with GraphQL fragment merging, there are also some key differences between a GraphQL query and a state machine:

  • A GraphQL query cannot be cyclical. That is, the same state may not be reached more than once. (Even “returning” from a selection is not actually the same state: you’re in a different place because some fields have been resolved.)
  • Because a GraphQL query can’t form a cycle, “precompiling” it to a deterministic machine doesn’t yield a big payoff. Instead, you can build states as you reach them, saving some work in case some branches are not reached at runtime (or if an error occurs during execution).