Concurrency and Cancellation

Cats Effect 3 pipeline, --parallel semantics, Resource lifecycle, and the cancellation contract

At a glance

Every stage of spec-to-rest (parse, IR build, translate, Z3, Alloy, dump, report) returns IO. The CLI runs as a CommandIOApp, so the whole verification pipeline is one fiber tree rooted at main. This page documents the model, the --parallel flag, the cancellation contract, and troubleshooting tips.

Effect system overview

We chose Cats Effect 3 (CE3) for three properties:

Structured concurrency. parTraverseN gives deterministic bounded parallelism and cancels losers when any sibling errors. Resource gives finalizer guarantees: when an enclosing effect completes (success, error, or cancellation of the parent fiber) the Z3 Context closes via WasmBackend.close(). Alloy's per-call executor is created and torn down inside the check() body's try/finally rather than via the Resource finalizer (AlloyBackend.close() is a no-op kept for symmetry).
Explicit blocking boundary. The Z3 and Alloy solver calls live inside IO.blocking, which is uncancelable until natural completion: parent-fiber cancellation waits for the in-flight solver call to return (bounded by the solver's own inner timeout) before Resource finalizers run. That is the contract: the abort is deterministic, even if not instantaneous.
Referentially transparent pipeline. The same IO value re-runs deterministically, which makes parTraverseN ordering, unit tests, and JMH benchmarks trivial.

Why not ZIO / Ox / Kyo

Runtime	Why not
ZIO 2	Roughly equivalent in capability. Typelevel ecosystem (circe, munit, decline-effect) integrates directly with CE3 at zero wiring cost.
Ox	Still pre-1.0 in 2026; structured-concurrency-on-virtual-threads is promising but doesn't yet have a decline-style CLI story or a stable `Resource`-equivalent.
Kyo	Fewer integrations; migrating back is hard once effects are encoded in its capability set.

The migration meta #95 tracks the work that landed between M_CE.1 and M_CE.8 and the numbers from M_CE.9.

Resource lifecycle

In parallel mode each fiber allocates a fresh (WasmBackend, AlloyBackend) pair via Resource.make, runs the check, and releases on either success, failure, or cancellation. Z3's Java bindings are not thread-safe across a single Context, so one Context per fiber is mandatory, which also means the use { ... } scope and the fiber lifetime coincide. In serial mode (--parallel 0 or 1) one outer backendsResource.use wraps the whole plans.traverse(...), so all checks share a single backend pair.

The Z3 Context.close() runs on every exit path (success, error, cancellation, or timeout fallback) because it's wired with Resource.make(acquire)(release) rather than try/finally. Alloy's executor lifecycle is not Resource-managed: it's created on the hot path inside check() and shut down in the same call's finally, so it always terminates when the IO.blocking body returns (naturally, on inner timeout, or after onCancel interrupts the future).

`--parallel` semantics

Value	Behaviour
(omitted)	Default. `Runtime.getRuntime.availableProcessors`, respects container cgroup CPU limits (JDK 10+).
`--parallel 0` or `1`	Serial: `plans.traverse(...)` inside a single `backendsResource.use`. All checks share one `WasmBackend`/`AlloyBackend` pair.
`--parallel n` ( $n \geq 2$ )	`plans.parTraverseN(n)`: bounded parallelism. Each check allocates its own backend pair inside the traversal's lambda.

parTraverseN preserves input order in the result list, so the CLI output is deterministic regardless of fiber completion order.

When to pick serial (`--parallel 0`)

Memory pressure. Each parallel fiber holds a Z3 Context (~tens of MB native). On a box running verify alongside other heavy processes (containers, IDE, browser), capping at --parallel 2 or dropping to serial avoids OOM.
CI determinism. Parallel-vs-serial paths are exercised in ParallelTest (identical verdicts, identical order), but if CI needs strict reproducibility of durationMs numbers the serial path is the one to pin.
Tiny specs. For a spec that enumerates only a handful of checks, fiber/backend allocation overhead dominates. safe_counter ( $7$ checks: $1$ global $+$ $2 \times 2$ op-check kinds $+$ $1 \times 2 \times 1$ preservation) barely differs between --parallel 1 and --parallel 4.
Profiling. JFR / async-profiler traces are easier to read without the fiber interleave.

JMH numbers

Benchmark ParallelVerifyBench.verifyUrlShortener on url_shortener.spec (25 checks), JDK 21, 3 warmup x 5 measurement iterations x 1 fork (CSV checked in at fixtures/golden/bench/parallel_verify.csv):

`--parallel`	avgt (ms/op)	speedup
1	244	1.00x
2	168	1.45x
4	109	2.24x
8	88	2.78x

--parallel 0 is omitted: runConsistencyChecks routes maxParallel <= 1 through the same serial path, so 0 and 1 measure identical code. Speedup saturates near 2.5-3x because each fiber still performs serial Z3 work inside its own Context; parallelism helps only until SMT solver threads saturate the box's cores. Error bars are wide (Z3 context init

JVM GC); read the numbers as directional rather than absolute.

Cancellation contract

Per-check timeouts are enforced inside the solver: Z3 honours its params.add("timeout", cfg.timeoutMs); Alloy honours Future.get(cfg.timeoutMs, MILLISECONDS) on the Kodkod worker thread. There is no outer IO.timeoutTo wrapper in Consistency.runConsistencyChecks; a misbehaving solver that ignores its own inner deadline will hold its fiber until natural completion.

A cancelled top-level fiber (Ctrl+C, SIGTERM, or parent cancel) flows through these layers:

IOApp shutdown hook (io-cancel-hook) cancels the top-level fiber. 30 s grace deadline before the JVM is force-killed.
parTraverseN cancels every in-flight sibling on the first error or the parent cancel.
IO.blocking is uncancelable: the in-flight solver call runs to natural completion (bounded by the inner Z3 / Alloy timeout above). Only after the blocking call returns does cancellation proceed.
Resource finalizers then close the Z3 Context and shut down the Alloy backend pool. On a cfg.timeoutMs deadline, the backend returns a CheckOutcome.Unknown carrying a solver_timeout diagnostic from the inner timeout; no outer fallback IO is involved.

Mid-call abort via `Context.interrupt()`

Z3's solver.check() doesn't observe Thread.isInterrupted, and because the call runs inside IO.blocking we wouldn't send a thread interrupt anyway. The fix (#113, shipped) wires Context.interrupt() (the only Z3 API designed to be called from a thread other than the one running the solver) into the onCancel finalizer:

IO.blocking {
  // declare sorts/funcs, build solver, solver.check(), decode result, ...
}.onCancel(IO.blocking(ctx.interrupt()))

On cancel, CE3 fires onCancel concurrently with the still-blocked solver thread; Z3 observes the interrupt, solver.check() aborts (typically returning Status.UNKNOWN, though the result is discarded by the cancelled fiber), the blocking body completes, and the Resource[IO, WasmBackend] finalizer closes the Context. Practical native mid-call abort is single-digit ms; TimeoutTest keeps a looser ~2 s regression bound to absorb JIT / GC / CI noise on slower runners.

Alloy's backend creates a single-thread Executor per call to drive its solveTask with a Future.get(timeoutMs, MILLISECONDS) deadline. The executor is shut down via pool.shutdownNow() in the same call's finally, so it always terminates regardless of whether the deadline expired or the call returned naturally; no Resource-managed pool is involved (AlloyBackend.close() is a no-op). Alloy's equivalent of Context.interrupt() is future.cancel(true), which is in the timeout path but not yet wired into the onCancel finalizer the way Z3's interrupt is. A symmetric onCancel(future.cancel(true)) is filed as a follow-up if mid-call cancel of a long Alloy run becomes a problem in practice.

SIGINT handling

spec-to-rest runs as decline-effect's CommandIOApp, which via the underlying IOApp installs the io-cancel-hook shutdown hook. SIGINT (Ctrl+C) and SIGTERM both feed into top-level fiber cancellation, which propagates through the path above: in-flight Z3 calls abort via Context.interrupt(), parTraverseN cancels sibling fibers, Resource[IO, WasmBackend] and Resource[IO, AlloyBackend] finalizers fire in reverse acquisition order (Alloy first, then Wasm), and the JVM exits cleanly. The Wasm finalizer calls ctx.close(); the Alloy finalizer is a no-op (its per-call executor pool is shut down inside the check() body rather than via Resource). The shutdown hook honours a 30 s grace deadline before forcing exit.

Troubleshooting

Reading a cancelled-fiber stack trace

CE3 rewrites stack traces to show the fiber callsite rather than the raw thread frames. Look for FiberTracker, parTraverseN, and the CE pool threads (io-compute-*, io-blocking-*). If you see a dangling Z3 Context leaked warning without a corresponding fiber cancelled line, that's a missing Resource.make; file an issue against the backend module rather than the orchestrator.

A check hangs past its timeout

If spec-to-rest verify --timeout 5000 <spec> doesn't return within ~10 s, one of these is true:

Z3 ignored its inner timeout. Rare; usually means the params.add("timeout", ...) key is wrong for the Z3 version, check z3-turnkey release notes.
Native memory exhaustion. Z3 swallows timeout when the solver is in the middle of a large memory allocation. Cap --parallel lower.
Alloy worker stuck before Future.get. Translation emitted .als source that the parser accepts but Kodkod can't decide within the scope. Lower --alloy-scope or rewrite the invariant.

"Backend error" on every check

--parallel 64 on an 8-core box can starve Z3's native allocator. Reduce --parallel, or rerun serial (--parallel 0) to confirm the spec itself is fine.

Ctrl+C doesn't feel clean

The JVM shutdown hook has a 30 s grace deadline. In that window the Alloy worker thread gets interrupted, Z3 finalizers fire, and the JVM exits cleanly. If you see a Error: Context leaked or a hung sbt run, retry without the sbt wrapper (cli/run inside sbt keeps a second JVM alive); ./modules/cli/target/native-image/spec-to-rest exits immediately on the first Ctrl+C.