Concurrency and Cancellation
Edit on GitHubCats Effect 3 pipeline, --parallel semantics, Resource lifecycle, and the cancellation contract
Last updated:
At a glance
Every stage of spec-to-rest (parse, IR build, translate, Z3, Alloy, dump, report) returns
IO. The CLI runs as a CommandIOApp, so the whole verification pipeline is one fiber tree
rooted at main. This page documents the model, the --parallel flag, the cancellation
contract, and troubleshooting tips.
Effect system overview
We chose Cats Effect 3 (CE3) for three properties:
- Structured concurrency.
parTraverseNgives deterministic bounded parallelism and cancels losers when any sibling errors.Resourcegives finalizer guarantees: when an enclosing effect completes (success, error, or cancellation of the parent fiber) the Z3Contextcloses viaWasmBackend.close(). Alloy's per-call executor is created and torn down inside thecheck()body'stry/finallyrather than via the Resource finalizer (AlloyBackend.close()is a no-op kept for symmetry). - Explicit blocking boundary. The Z3 and Alloy solver calls live inside
IO.blocking, which is uncancelable until natural completion: parent-fiber cancellation waits for the in-flight solver call to return (bounded by the solver's own inner timeout) beforeResourcefinalizers run. That is the contract: the abort is deterministic, even if not instantaneous. - Referentially transparent pipeline. The same
IOvalue re-runs deterministically, which makesparTraverseNordering, unit tests, and JMH benchmarks trivial.
Why not ZIO / Ox / Kyo
| Runtime | Why not |
|---|---|
| ZIO 2 | Roughly equivalent in capability. Typelevel ecosystem (circe, munit, decline-effect) integrates directly with CE3 at zero wiring cost. |
| Ox | Still pre-1.0 in 2026; structured-concurrency-on-virtual-threads is promising but doesn't yet have a decline-style CLI story or a stable Resource-equivalent. |
| Kyo | Fewer integrations; migrating back is hard once effects are encoded in its capability set. |
The migration meta #95 tracks the work that landed between M_CE.1 and M_CE.8 and the numbers from M_CE.9.
Resource lifecycle
In parallel mode each fiber allocates a fresh (WasmBackend, AlloyBackend) pair via
Resource.make, runs the check, and releases on either success, failure, or cancellation.
Z3's Java bindings are not thread-safe across a single Context, so one Context per
fiber is mandatory, which also means the use { ... } scope and the fiber lifetime
coincide. In serial mode (--parallel 0 or 1) one outer backendsResource.use wraps
the whole plans.traverse(...), so all checks share a single backend pair.
The Z3 Context.close() runs on every exit path (success, error, cancellation, or
timeout fallback) because it's wired with Resource.make(acquire)(release) rather than
try/finally. Alloy's executor lifecycle is not Resource-managed: it's created on the
hot path inside check() and shut down in the same call's finally, so it always
terminates when the IO.blocking body returns (naturally, on inner timeout, or after
onCancel interrupts the future).
--parallel semantics
| Value | Behaviour |
|---|---|
| (omitted) | Default. Runtime.getRuntime.availableProcessors, respects container cgroup CPU limits (JDK 10+). |
--parallel 0 or 1 | Serial: plans.traverse(...) inside a single backendsResource.use. All checks share one WasmBackend/AlloyBackend pair. |
--parallel n () | plans.parTraverseN(n): bounded parallelism. Each check allocates its own backend pair inside the traversal's lambda. |
parTraverseN preserves input order in the result list, so the CLI output is deterministic
regardless of fiber completion order.
When to pick serial (--parallel 0)
- Memory pressure. Each parallel fiber holds a Z3
Context(~tens of MB native). On a box runningverifyalongside other heavy processes (containers, IDE, browser), capping at--parallel 2or dropping to serial avoids OOM. - CI determinism. Parallel-vs-serial paths are exercised in
ParallelTest(identical verdicts, identical order), but if CI needs strict reproducibility ofdurationMsnumbers the serial path is the one to pin. - Tiny specs. For a spec that enumerates only a handful of checks, fiber/backend
allocation overhead dominates.
safe_counter( checks: global op-check kinds preservation) barely differs between--parallel 1and--parallel 4. - Profiling. JFR / async-profiler traces are easier to read without the fiber interleave.
JMH numbers
Benchmark ParallelVerifyBench.verifyUrlShortener on url_shortener.spec (25 checks), JDK 21,
3 warmup x 5 measurement iterations x 1 fork (CSV checked in at
fixtures/golden/bench/parallel_verify.csv):
--parallel | avgt (ms/op) | speedup |
|---|---|---|
| 1 | 244 | 1.00x |
| 2 | 168 | 1.45x |
| 4 | 109 | 2.24x |
| 8 | 88 | 2.78x |
--parallel 0 is omitted: runConsistencyChecks routes maxParallel <= 1 through the
same serial path, so 0 and 1 measure identical code. Speedup saturates near 2.5-3x
because each fiber still performs serial Z3 work inside its own Context; parallelism helps
only until SMT solver threads saturate the box's cores. Error bars are wide (Z3 context init
- JVM GC); read the numbers as directional rather than absolute.
Cancellation contract
Per-check timeouts are enforced inside the solver: Z3 honours its
params.add("timeout", cfg.timeoutMs); Alloy honours Future.get(cfg.timeoutMs, MILLISECONDS)
on the Kodkod worker thread. There is no outer IO.timeoutTo wrapper in
Consistency.runConsistencyChecks; a misbehaving solver that ignores its own inner deadline
will hold its fiber until natural completion.
A cancelled top-level fiber (Ctrl+C, SIGTERM, or parent cancel) flows through these layers:
IOAppshutdown hook (io-cancel-hook) cancels the top-level fiber. 30 s grace deadline before the JVM is force-killed.parTraverseNcancels every in-flight sibling on the first error or the parent cancel.IO.blockingis uncancelable: the in-flight solver call runs to natural completion (bounded by the inner Z3 / Alloy timeout above). Only after the blocking call returns does cancellation proceed.Resourcefinalizers then close the Z3Contextand shut down the Alloy backend pool. On acfg.timeoutMsdeadline, the backend returns aCheckOutcome.Unknowncarrying asolver_timeoutdiagnostic from the inner timeout; no outer fallbackIOis involved.
Mid-call abort via Context.interrupt()
Z3's solver.check() doesn't observe Thread.isInterrupted, and because the call runs inside
IO.blocking we wouldn't send a thread interrupt anyway. The fix
(#113, shipped) wires
Context.interrupt() (the only Z3 API designed to be called from a thread other than
the one running the solver) into the onCancel finalizer:
IO.blocking {
// declare sorts/funcs, build solver, solver.check(), decode result, ...
}.onCancel(IO.blocking(ctx.interrupt()))On cancel, CE3 fires onCancel concurrently with the still-blocked solver thread; Z3 observes
the interrupt, solver.check() aborts (typically returning Status.UNKNOWN, though the result
is discarded by the cancelled fiber), the blocking body completes, and the
Resource[IO, WasmBackend] finalizer closes the Context. Practical native mid-call abort is
single-digit ms; TimeoutTest keeps a looser ~2 s regression bound to absorb JIT / GC / CI
noise on slower runners.
Alloy's backend creates a single-thread Executor per call to drive its solveTask with
a Future.get(timeoutMs, MILLISECONDS) deadline. The executor is shut down via
pool.shutdownNow() in the same call's finally, so it always terminates regardless of
whether the deadline expired or the call returned naturally; no Resource-managed pool is
involved (AlloyBackend.close() is a no-op). Alloy's equivalent of Context.interrupt()
is future.cancel(true), which is in the timeout path but not yet wired into the
onCancel finalizer the way Z3's interrupt is. A symmetric onCancel(future.cancel(true))
is filed as a follow-up if mid-call cancel of a long Alloy run becomes a problem in
practice.
SIGINT handling
spec-to-rest runs as decline-effect's CommandIOApp, which via the underlying IOApp
installs the io-cancel-hook shutdown hook. SIGINT (Ctrl+C) and SIGTERM both feed into
top-level fiber cancellation, which propagates through the path above: in-flight Z3
calls abort via Context.interrupt(), parTraverseN cancels sibling fibers,
Resource[IO, WasmBackend] and Resource[IO, AlloyBackend] finalizers fire in reverse
acquisition order (Alloy first, then Wasm), and the JVM exits cleanly. The Wasm finalizer
calls ctx.close(); the Alloy finalizer is a no-op (its per-call executor pool is shut
down inside the check() body rather than via Resource). The shutdown hook honours a 30 s
grace deadline before forcing exit.
Troubleshooting
Reading a cancelled-fiber stack trace
CE3 rewrites stack traces to show the fiber callsite rather than the raw thread frames.
Look for FiberTracker, parTraverseN, and the CE pool threads (io-compute-*,
io-blocking-*). If you see a dangling Z3 Context leaked warning without a corresponding
fiber cancelled line, that's a missing Resource.make; file an issue against the backend
module rather than the orchestrator.
A check hangs past its timeout
If spec-to-rest verify --timeout 5000 <spec> doesn't return within ~10 s, one of these is true:
- Z3 ignored its inner timeout. Rare; usually means the
params.add("timeout", ...)key is wrong for the Z3 version, checkz3-turnkeyrelease notes. - Native memory exhaustion. Z3 swallows
timeoutwhen the solver is in the middle of a large memory allocation. Cap--parallellower. - Alloy worker stuck before
Future.get. Translation emitted.alssource that the parser accepts but Kodkod can't decide within the scope. Lower--alloy-scopeor rewrite the invariant.
"Backend error" on every check
--parallel 64 on an 8-core box can starve Z3's native allocator. Reduce --parallel, or
rerun serial (--parallel 0) to confirm the spec itself is fine.
Ctrl+C doesn't feel clean
The JVM shutdown hook has a 30 s grace deadline. In that window the Alloy worker thread
gets interrupted, Z3 finalizers fire, and the JVM exits cleanly. If you see a Error: Context leaked or a hung sbt run, retry without the sbt wrapper (cli/run inside sbt
keeps a second JVM alive); ./modules/cli/target/native-image/spec-to-rest exits
immediately on the first Ctrl+C.
See also
- Verification Engine, Z3/Alloy routing, timeout semantics, exit codes.
- Architecture, where the effect layer sits in the overall compiler.
- M_CE migration meta #95, the complete Cats Effect 3 migration log.
Diff migrations for schema evolution
Snapshot-driven incremental migrations across Alembic (Python), golang-migrate (Go), and Prisma (TS)
Synthesis Pipeline
LLM clients, prompt construction, diff-checker, cost ledger, on-disk cache, the CEGIS feedback loop wired to `dafny verify`, the graduated-fallback orchestrator, DafnyPro-style hint-augmentation, and `compile --with-synthesis` that translates verified bodies (or labelled skeletons) to the target language.