A tiny scheduler and systematic interleavings.
Do you know those bugs that show up only sometimes and disappear the moment you add a printf? Very often these are concurrency bugs, triggered only under a rare interleaving of thread actions.
Over the years, I have sporadically used model checkers such as TLA+/TLC, GenMC, and Dartagnan to diagnose and understand such bugs. While doing so, I was often intrigued by the apparent “magic” behind these tools. This post is a learning exercise: an attempt to demystify that magic by building a very small model checker from scratch.
See also other parts of this series.
My goal is to understand the core ideas behind model checking by re-implementing them in the simplest possible setting. Since I am also learning the Scheme programming language, this becomes a double learning exercise: I will build a tiny model checker in Scheme, using only a handful of language features.
In this post, we build the smallest possible execution explorer that already does something real:

- cooperative tasks with explicit yield points (SPAWN and YIELD),
- systematic, exhaustive enumeration of all interleavings (EXPLORE),
- explicit data nondeterminism via a CHOOSE operator.

At this stage, however, we are not checking any properties, and we do not yet model shared state. In that sense, this is not a model checker in the traditional sense – at least not yet. The sole focus of this first part is on systematically and exhaustively generating executions: turning implicit nondeterminism into an explicit, enumerable space of runs.
This separation is intentional. Before reasoning about invariants, assertions, or temporal properties, we need a precise understanding of how executions themselves are generated, controlled, and replayed.
To keep things manageable, the focus is on stateless exploration under the standard interleaving model. Each thread is modeled as a task: a sequential computation whose execution can be paused and resumed by the explorer. We deliberately avoid full state storage and comparison, as done by fully stateful checkers such as TLC.
Instead, we follow the approach pioneered by Verisoft, which demonstrated that many concurrency bugs can be exposed by systematically controlling and replaying executions of the real program, without constructing an explicit global state graph. In this style of checking, the execution itself is the primary object of exploration, and nondeterminism is made explicit through controlled scheduling decisions.
This line of work is closely related to what later became known as Systematic Concurrency Testing (SCT): non-exhaustive but systematic exploration of the execution space of concurrent programs, typically guided by a scheduler that enumerates thread interleavings. While SCT tools usually operate on full programs written in C, Java, or similar languages, the underlying idea is the same here: explicitly generate executions rather than abstract states, and reason about concurrency through controlled scheduling.
We also do not use OS threads and do not implement any reduction techniques (such as DPOR). The entire explorer fits in just a few dozen lines of Scheme.
In later parts of this series (two, possibly three in total), we will incrementally extend this core with:

- shared state between tasks,
- property checking (invariants and assertions),
- a discussion of where reduction techniques would naturally fit.
As a running example, we will use small concurrent programs with explicit yield points, gradually extending them to illustrate scheduling nondeterminism, dynamic task creation, and data choice.
We begin by fixing the execution model and a tiny domain-specific language (DSL) for concurrent programs formed by three primitives: SPAWN, YIELD, and EXPLORE. The following definitions are intentionally not a real implementation: they merely illustrate the intended control flow and execution model.
;;; Yields the current task, potentially resuming another task. YIELD
;;; cannot be called from within an atomic block.
(define (YIELD)
  (display "== YIELD: could change to another task here\n"))

;;; Spawns a new task with `thunk` as processing payload. The task should
;;; only communicate with other tasks via the shared state (TBD).
(define (SPAWN thunk)
  (display "== SPAWN: execute right away\n")
  (thunk))

;;; Starts execution exploration.
(define (EXPLORE)
  (display "== EXPLORE: tasks already ran\n"))
These definitions make the intent explicit but deliberately avoid any real scheduling logic. In particular, YIELD does not yet suspend execution, and SPAWN simply executes its argument immediately.
In the next sections, we will give this DSL a concrete operational semantics by modeling tasks as Scheme continuations and explicitly controlling their interleaving.
A scheduling decision happens at every YIELD and at every task completion.
Let’s see an example:
(import (scheme base)
        (scheme write))

(include "mc0.scm") ; version 0 of our model checker

(define (Joey-Ramone)
  (display "Hey!\n")
  (YIELD)
  (display "Let's\n"))

(define (Johnny-Ramone)
  (display "Ho!\n")
  (YIELD)
  (display "Go!\n"))

(SPAWN Joey-Ramone)
(SPAWN Johnny-Ramone)
(EXPLORE)
Here, (SPAWN thunk) takes a thunk as argument. A thunk is simply a procedure with no arguments. In the example above, passing a lambda expression would work equally well:
(SPAWN (lambda ()
         (display "Ho!\n")
         (YIELD)
         (display "Go!\n")))
We now need a place to store runnable tasks. For our purposes, a simple queue is sufficient. This implementation uses a global variable *tasks*, represented as a plain list.
(define *tasks* '())

(define (no-tasks?)
  (null? *tasks*))

(define (clear-taskqueue!)
  (set! *tasks* '()))

(define (push-task task)
  (when (memq task *tasks*)
    (error "push-task: duplicate continuation"))
  (set! *tasks* (cons task *tasks*)))

(define (take-task idx)
  (cond ((not idx) #f)
        ((or (negative? idx)
             (>= idx (length *tasks*))) #f)
        (else (let ((task (list-ref *tasks* idx)))
                (set! *tasks* (delete task *tasks*))
                task))))
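Note that delete is not part of (scheme base); it comes from SRFI 1. If SRFI 1 is not available in your Scheme, a minimal replacement (sufficient for our purposes, since we only need to remove an element by identity) could look like this:

```scheme
;; Minimal stand-in for SRFI 1's delete: returns lst without any
;; element eqv? to x. (SRFI 1's delete defaults to equal?, which
;; also works here, as tasks are compared by identity.)
(define (delete x lst)
  (cond ((null? lst) '())
        ((eqv? x (car lst)) (delete x (cdr lst)))
        (else (cons (car lst) (delete x (cdr lst))))))
```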
At this point we have a task queue, but we still do not have a way to pause and resume tasks. Executing a thunk from beginning to end is not sufficient: we want to run a task until it explicitly yields, then resume it later exactly where it left off.
This is where continuations come in.
In Scheme, a continuation represents “the rest of the computation from this point on”. Capturing a continuation is conceptually similar to taking a snapshot of the call stack and instruction pointer. Later, invoking that continuation resumes execution as if nothing had happened in between.
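As a tiny, scheduler-independent illustration of this idea (the variable name saved-k is just for this example):

```scheme
;; Capture a continuation in the middle of an addition.
(define saved-k #f)

(display
 (+ 1 (call/cc
       (lambda (k)
         (set! saved-k k) ; snapshot: "add 1 and display the result"
         41))))           ; first pass prints 42
(newline)

;; Invoking (saved-k 99) at this point would jump back into the
;; addition and print 100 -- execution resumes exactly where call/cc
;; returned. We leave the call commented out: after resuming, control
;; would reach this line again and loop forever.
;; (saved-k 99)
```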
I tried several approaches and found that using continuations is by far the most elegant solution. Continuations can be used like coroutines and are one of the defining features of Scheme. They are essentially a very powerful, language-level setjmp/longjmp mechanism.
Scheme exposes continuations via call/cc (“call with current continuation”).
When a task calls YIELD, we want to capture the task's current continuation, enqueue it as a runnable task, and hand control over to the scheduler:
(define (YIELD)
  (when (not (*finisher*))
    (error "scheduler not running"))
  (call/cc (lambda (cc)
             (push-task cc)
             (resume)
             (error "unreachable"))))
The *finisher* parameter is only set while an execution is being explored; calling YIELD outside that context is therefore an error.
(define (SPAWN thunk)
  (push-task (lambda (_)
               (thunk)
               (resume)
               (error "unreachable"))))
When the thunk terminates, its task is complete. At that point, control must return to the scheduler so that another task can run. For this reason, we do not enqueue the thunk directly. Instead, we wrap it in a small lambda that resumes the scheduler when the thunk finishes.
The wrapper takes one argument to match the continuation interface, but the argument itself is ignored.
(define (resume)
  (if (no-tasks?)
      ((*finisher*) #f)
      (let* ((idx (pick-next *tasks*))
             (cc (take-task idx)))
        (cc #f))))
If there are no tasks available, we go back to the caller of EXPLORE using its continuation stored in the *finisher* parameter. If there are tasks available, we pick the next one with pick-next. This function is the heart of the exploration strategy. Finally, we invoke the selected continuation. We always pass #f as the argument, since we do not use continuation values in this model.
(define *finisher* (make-parameter #f))

(define (make-finisher cc)
  (lambda (_) (clear-taskqueue!) (cc #f)))

(define (run)
  (call/cc (lambda (cc)
             (parameterize ((*finisher* (make-finisher cc)))
               (resume)))))
Before defining EXPLORE, we introduce a small helper called run. Its purpose is to execute one complete execution of the program under the control of the scheduler.
The call to call/cc in run captures a continuation cc that represents “return to the caller of run”. We then wrap this continuation using make-finisher and store it in the dynamically scoped parameter *finisher*. This gives the scheduler a well-defined way to terminate the current execution and jump back to the exploration loop once no runnable tasks remain.
A dynamically scoped parameter in Scheme is a controlled form of global variable. Its value is temporarily overridden within the dynamic extent of a (parameterize ...) form and automatically restored when execution leaves that scope. In this case, *finisher* is set while an execution is running and reverts to its original value (#f) once the execution completes.
When the scheduler eventually runs out of runnable tasks, resume invokes the finisher. This clears the task queue and transfers control back to the caller of run, effectively ending the current execution.
Now we have a fully functional cooperative scheduler: tasks can be spawned, paused, resumed, and interleaved at explicit yield points, and each execution cleanly returns control to the exploration loop.
As a first step, we choose the next runnable task at random. This does not give us systematic coverage, but it allows us to validate the scheduler and observe different interleavings in practice.
(define (pick-next lst)
  ;; pseudo-random-integer is provided by CHICKEN's (chicken random);
  ;; on other Schemes, SRFI 27's random-integer can be used instead.
  (pseudo-random-integer (length lst)))
(define (EXPLORE)
  (let ((initial-tasks (snapshot-taskqueue)))
    (let loop ((execution-count 0))
      (if (= execution-count 8)
          execution-count
          (begin
            (print "\n=== EXECUTION " (+ 1 execution-count) " ===")
            (run)
            (restore-taskqueue! initial-tasks)
            (loop (+ execution-count 1)))))))
This defines the last missing component of our DSL: EXPLORE.
Execution starts by taking a snapshot of the initial task queue. We then enter a loop, executing the program multiple times. Each iteration:

- prints a banner for the current execution,
- calls run, which installs the finisher and executes the tasks until completion,
- returns to EXPLORE via the finisher continuation,
- restores the initial task queue for the next run.

In this version, we arbitrarily limit exploration to 8 executions. Since scheduling decisions are random, repeated executions may produce identical interleavings, while others may differ.
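The helpers snapshot-taskqueue and restore-taskqueue! used by EXPLORE are not defined in the text. Since *tasks* is a plain list, a minimal sketch (one possible implementation, not necessarily the one in the downloadable source) suffices:

```scheme
;; Save a copy of the current task queue so it can be reinstated
;; before each new execution.
(define (snapshot-taskqueue)
  (list-copy *tasks*))

;; Reinstate a previously saved task queue.
(define (restore-taskqueue! tasks)
  (set! *tasks* (list-copy tasks)))
```

Copying the list is enough here because the initial tasks are plain closures created by SPAWN; re-running them starts the program from the beginning.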
Running the example from the previous section produces output such as:
=== EXECUTION 1 ===
Ho!
Hey!
Let's
Go!
=== EXECUTION 2 ===
Hey!
Ho!
Go!
Let's
=== EXECUTION 3 ===
Hey!
Let's
Ho!
Go!
=== EXECUTION 4 ===
Hey!
Ho!
Go!
Let's
=== EXECUTION 5 ===
Hey!
Ho!
Go!
Let's
=== EXECUTION 6 ===
Ho!
Hey!
Go!
Let's
=== EXECUTION 7 ===
Hey!
Ho!
Go!
Let's
=== EXECUTION 8 ===
Hey!
Ho!
Go!
Let's
This confirms that our scheduler is working: different executions produce different interleavings of the same program. However, randomness alone cannot guarantee that all possible executions will be explored. In the next section, we will replace random choice with systematic enumeration.
Random scheduling was useful to validate the scheduler, but it provides no guarantee that we will eventually see every possible interleaving. To explore executions systematically, we need one additional piece of bookkeeping: we must record the decisions made at each nondeterministic choice point.
Concretely, every time the scheduler (or the program) makes a choice among n alternatives, we record the pair:

(idx . max)

Meaning:

- idx: which alternative was chosen (an integer in 0 .. max),
- max: the maximum valid index at that point (n - 1).

In the scheduler, idx is the index into the current runnable task list. Importantly, this is not a stable task identifier: it is relative to the set of runnable tasks at that moment.
A complete execution is therefore represented by a decision history:
((idx0 . max0) (idx1 . max1) (idx2 . max2) ...)
This history fully characterizes the execution: if we replay the same program while forcing the same decisions, we obtain the same execution again.
The implementation of the decision history is straightforward. We record decisions in a list as they occur during execution, and later extract them in execution order.
;; history of selections of the form (cons idx max-idx)
(define *history* '())

;; returns #t if there is a scheduling point in the history
;; where a different option can still be taken
(define (has-future? hist)
  (cond
    ((null? hist) #f)
    ((< (caar hist) (cdar hist)) #t)
    (else (has-future? (cdr hist)))))

;; add index selection to history
(define (add-history idx max-idx)
  (let ((item (cons idx max-idx)))
    (set! *history* (cons item *history*))))

(define (extract-history)
  (reverse *history*))

(define (swap-history!)
  (let ((prev-hist (extract-history)))
    (set! *history* '())
    prev-hist))
With a recorded decision history, systematic exploration becomes straightforward: replay a prefix of the previous execution, change one decision, and re-discover the rest of the run.
At the top level, EXPLORE runs the program repeatedly. After each execution it saves the history of decisions made in that run, then uses it to drive the next run.
(define (EXPLORE)
  (let ((initial-taskqueue (snapshot-taskqueue)))
    (let loop ((execution-count 0))
      (if (or (zero? execution-count)
              (has-future? (extract-history)))
          (begin
            (print "\n=== EXECUTION " (+ 1 execution-count) " ===")
            (set! *prev-hist* (swap-history!))
            (run)
            (restore-taskqueue! initial-taskqueue)
            (print "== HISTORY: " (extract-history))
            (loop (+ execution-count 1)))
          execution-count))))
The condition

(or (zero? execution-count) (has-future? (extract-history)))

means: run the first execution unconditionally, and keep running further executions as long as the previous history still contains a choice point with an unexplored alternative.

Each history element (idx . max) is a digit whose range is 0 .. max. We want the next execution to be “as close as possible” to the previous one:

- keep all decisions unchanged up to the last choice point that still has an unexplored alternative,
- at that point, increment the decision by one,
- after that point, reset all decisions to 0 (because the set of runnable tasks may differ, so we must re-discover the remainder of the run).

That is exactly what pick/dfs implements.
We maintain a list *prev-hist* that represents the decisions we intend to replay in the current run. Each time we reach a choice point, we consult *prev-hist* and decide what to do:

- If *prev-hist* is empty, we are past the replayed prefix and simply pick 0 (the first alternative).
- If a later choice point still has an unexplored alternative, we take the current decision from *prev-hist* (replaying the prefix).
- Otherwise, the current choice point is the last one with an unexplored alternative: we increment its decision and discard the rest of *prev-hist*.

(define (pick/dfs hist)
  (cond
    ((null? hist)
     (values 0 '()))
    ; some future choice point still has an option
    ((has-future? (cdr hist))
     (let ((idx (caar hist)))
       (values idx (cdr hist))))
    ; there is still some option in the current choice point
    ((< (caar hist) (cdar hist))
     (let ((idx (+ (caar hist) 1)))
       (values idx '())))
    (else (error "should never happen"))))
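A few sample calls illustrate the three cases (histories are given in execution order):

```scheme
;; Past the replayed prefix: discover, pick the first alternative.
(pick/dfs '())                 ; => (values 0 '())

;; A later point still has an option: replay the current decision.
(pick/dfs '((0 . 1) (0 . 1)))  ; => (values 0 '((0 . 1)))

;; Last incrementable point: take the next alternative and drop
;; the rest of the previous history.
(pick/dfs '((0 . 1)))          ; => (values 1 '())
```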
Finally, pick-next ties everything together: it consumes one decision from *prev-hist*, records the decision actually taken into the current history (*history*), and returns the chosen index.
(define *prev-hist* '())

(define (pick-next lst)
  (let-values (((nxt hist) (pick/dfs *prev-hist*)))
    (set! *prev-hist* hist)
    (add-history nxt (- (length lst) 1))
    nxt))
At this point, pick-next is no longer random: it is a deterministic enumerator. Over repeated executions, it systematically explores every possible combination of choices, and therefore explores every distinct execution exactly once.
For our example program, there are exactly 6 distinct executions, which are enumerated below.
=== EXECUTION 1 ===
Ho!
Go!
Hey!
Let's
== HISTORY: ((0 . 1) (0 . 1) (0 . 0) (0 . 0))
=== EXECUTION 2 ===
Ho!
Hey!
Let's
Go!
== HISTORY: ((0 . 1) (1 . 1) (0 . 1) (0 . 0))
=== EXECUTION 3 ===
Ho!
Hey!
Go!
Let's
== HISTORY: ((0 . 1) (1 . 1) (1 . 1) (0 . 0))
=== EXECUTION 4 ===
Hey!
Let's
Ho!
Go!
== HISTORY: ((1 . 1) (0 . 1) (0 . 0) (0 . 0))
=== EXECUTION 5 ===
Hey!
Ho!
Go!
Let's
== HISTORY: ((1 . 1) (1 . 1) (0 . 1) (0 . 0))
=== EXECUTION 6 ===
Hey!
Ho!
Let's
Go!
== HISTORY: ((1 . 1) (1 . 1) (1 . 1) (0 . 0))
One subtlety is worth keeping in mind: idx is an index into the current runnable set. After changing a decision at some point, the set of runnable tasks later in the execution may differ. That is why we intentionally “forget” the suffix after the point we increment: we must re-discover the remainder of the schedule under the new prefix.
CHOOSE

We are now almost at the end of this post, but there is one feature that is super useful and easy to introduce at this point: programs often need to express nondeterminism explicitly. Let’s add a CHOOSE operator to our DSL:
(define (CHOOSE . options)
  (when (not (*finisher*))
    (error "CHOOSE: scheduler not running"))
  (when (null? options)
    (error "CHOOSE: empty option list"))
  (let ((i (pick-next options)))
    (list-ref options i)))
That is the entire implementation.
What happens:

- CHOOSE sees a list of n options.
- It picks an index i via pick-next, exactly like the scheduler does.
- The pair (i . (n-1)) is recorded in the decision history.

We don’t need any further machinery.
Let’s see a small example on how to use it.
(SPAWN (lambda ()
         (YIELD)
         (display (CHOOSE "Hello\n" "Hallo\n"))))

(SPAWN (lambda ()
         (display (CHOOSE "World!\n" "Welt!\n"))
         (YIELD)))

(EXPLORE)
This program contains:

- one YIELD in each task,
- two CHOOSE calls with two options each.

Each task has one YIELD and one implicit “finish” step (ignoring the CHOOSE). The number of ways to interleave two sequences of length 2 while preserving internal order is:
(2 + 2)! / (2! x 2!) = 6
The numerator 4! counts all permutations of the four scheduling events. We divide by 2! to account for the fact that swapping Task 1’s own events does not create a new valid execution, and by another 2! for the same reason for Task 2.
Each CHOOSE introduces two data choices, independent of scheduling. With two CHOOSE calls in the program, this contributes an additional factor of 4.
Therefore, the total number of executions explored is 6 x 4 = 24.
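As a quick sanity check of this count (a throwaway snippet, not part of the checker):

```scheme
(define (fact n) (if (zero? n) 1 (* n (fact (- n 1)))))

;; 6 interleavings times 2 options for each of the two CHOOSEs:
(* (/ (fact 4) (* (fact 2) (fact 2))) 2 2)  ; => 24
```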
And the checker explores exactly those 24 executions.
With very little code, we now have:

- a cooperative scheduler with tasks that pause and resume at explicit yield points,
- systematic, exhaustive enumeration of all interleavings,
- deterministic replay of executions via decision histories,
- dynamic task creation (SPAWN inside another task),
- explicit data nondeterminism via CHOOSE.

This already forms the core of a real model checker. Nevertheless, it still has many limitations:

- there is no shared state between tasks yet,
- no properties (invariants or assertions) are checked,
- no reduction techniques (such as DPOR) prune equivalent interleavings,
- scheduling decisions happen only at explicit yield points, never preemptively.
These omissions are deliberate. This first part was about building the smallest correct foundation: a precise and controllable execution generator that makes all sources of nondeterminism explicit.
In the following parts of this series, I plan to address these missing aspects step by step, by introducing shared state, property checking, and discussing where reduction techniques would naturally fit.
If you want to try out the code yourself, you can download the source and a small example.
I have tested the code with Chibi Scheme 0.11 and CHICKEN Scheme 6.0, but it should work with other R7RS-compliant Scheme implementations as well.
See also other parts of this series. Feedback is welcome – please send a message to my public inbox: ~db7/public-inbox@lists.sr.ht (Archive)
© 2024-2025 db7 — licensed under CC BY 4.0.