spec_to_rest
Pipelines

Verification Engine

Edit on GitHub

Z3 + Alloy 6 verification of .spec files before code generation

Last updated:

At a glance

The verification engine model-checks a .spec file against its own invariants before any code is generated. Two solvers are wired in, each owning a disjoint class of checks:

  • Z3 SMT solver (via tools.aqua:z3-turnkey) handles all first-order-expressible checks: invariant satisfiability, consistency, preservation, dead-op detection. Unbounded proof; fast.
  • Alloy 6 (in-process via org.alloytools:org.alloytools.alloy.core) handles checks whose expression uses supported second-order constructs (currently powerset ^s) or temporal declarations. Bounded-scope search only; scope 5 by default, tunable via --alloy-scope.

Every verification check is routed to exactly one backend based on structural inspection of the expression (there is no overlap), and each check line is tagged [z3] or [alloy] in the CLI output so the attribution is visible.

For the per-feature shipped/planned/not-supported inventory of the verifier (44 rows covering temporal, powerset, set algebra, JSON output, narration, cross-solver, trust dimension, etc.), see Roadmap \to Verification capabilities.

Preservation VC shape

For every (operation, invariant) pair the engine asks Z3 whether the negated post-invariant is satisfiable under the pre-invariant, the operation's contract, and a synthesised frame:

VC  =  I(s)    req(x,s)    ens(x,s,s,y)    frame(s,s)    ¬I(s)\textit{VC} \;=\; I(s) \;\land\; \textit{req}(\textbf{x}, s) \;\land\; \textit{ens}(\textbf{x}, s, s', \textbf{y}) \;\land\; \textit{frame}(s, s') \;\land\; \lnot\, I(s')

unsat \Rightarrow the invariant is preserved; sat \Rightarrow a counterexample exists and the reporter decodes it; unknown \Rightarrow the solver gave up (treated as a failure).

Total checks per spec, with NN operations, MM invariants, TT temporal declarations:

checks  =  1global  +  2Nrequires/enabled  +  N×Mpreservation  +  Ttemporal (Alloy)\textit{checks} \;=\; \underbrace{1}_{\text{global}} \;+\; \underbrace{2N}_{\textit{requires}\,/\,\textit{enabled}} \;+\; \underbrace{N \times M}_{\text{preservation}} \;+\; \underbrace{T}_{\text{temporal (Alloy)}}

Non-overlap routing (Z3 vs. Alloy)

Each check has exactly one owner. A small Classifier inspects the expression involved and returns either VerifierTool.Z3 or VerifierTool.Alloy:

def classifyInvariant(inv: InvariantDecl): VerifierTool =
  if inv.expr contains powerset (^s)
  then VerifierTool.Alloy
  else VerifierTool.Z3

Rule is deterministic and structural, no runtime fall-through, no "try Z3 then Alloy". Each CheckResult carries a tool field that surfaces in CLI output as [z3] or [alloy]. Specs whose invariants are pure first-order (the vast majority) never invoke Alloy.

What "bounded" means for Alloy routing

Alloy's decision procedure explores instances within a scope, an upper bound on the number of atoms per signature. Default is 5; tune with --alloy-scope N. Finding a counterexample within scope \Rightarrow real bug. No counterexample \Rightarrow no bug up to scope N; it is not an unbounded proof. The CLI line reads e.g.:

  OK [alloy] global              sat      23ms

The sat means "a model was found within scope 5"; increasing --alloy-scope makes the search deeper but more expensive.

--dump-alloy

Mirrors --dump-smt. Emits the generated Alloy .als source to stdout or to a file. Useful for debugging translator output, feeding the source into a standalone Alloy Analyzer, or snapshot-testing translator stability.

Temporal block (pragmatic v1)

Services may declare temporal properties next to invariants:

temporal everyUserReachable:
  eventually(some u in users | u.active)

temporal alwaysOK:
  always(all u in users | u.verified)

Semantics in v1 (bounded single-state, not trace-based):

  • always(P) is checked by asking whether I¬PI \land \neg P is satisfiable; if Alloy finds any II-satisfying state where PP is false within the current scope, the property is violated.
  • eventually(P) is checked by asking whether IPI \land P is satisfiable; if Alloy cannot find any II-satisfying state where PP holds within the current scope, the property is unreachable.
  • fairness(op) is not supported and raises a translator error; implementing it requires trace-based verification via Alloy 6's var-sig mode, deferred to a future milestone.

The bounded-scope caveat applies: absence of a counterexample is not a proof for all sizes. Increase --alloy-scope for deeper search.

Language convention

Inside ensures clauses, unprimed state references are pre-state; primed references (x') are post-state; pre(x) is an explicit-but-equivalent way to name the pre-state.

Smart frame synthesis

For each state slot X, the translator classifies how the operation's ensures mentions it:

PatternSynthesised axiom
X' = RHSnone; the ensures fully specifies X'
k not in X' (no other mention)k. X_dom_post(k)X_dom(k)kk\forall k'.\ \textit{X\_dom\_post}(k') \Leftrightarrow \textit{X\_dom}(k') \land k' \neq k and card_X_post=card_X1\textit{card\_X\_post} = \textit{card\_X} - 1
X'[k].f = vX_dom_post(k)\textit{X\_dom\_post}(k) plus k. kkmap_post(k)=map(k)\forall k'.\ k' \neq k \to \textit{map\_post}(k') = \textit{map}(k'); other fields of map_post(k)\textit{map\_post}(k) equal the corresponding pre-state fields; card_X_post=card_X\textit{card\_X\_post} = \textit{card\_X}
No Prime mentionk. X_dom_post(k)X_dom(k)map_post(k)=map(k)\forall k.\ \textit{X\_dom\_post}(k) \Leftrightarrow \textit{X\_dom}(k) \land \textit{map\_post}(k) = \textit{map}(k) and card_X_post=card_X\textit{card\_X\_post} = \textit{card\_X}
Unclassified Prime mentionno axiom (conservative; the under-constrained post-state admits more models, so the solver may return sat on the VC negation, surfacing as a spurious preservation failure the user investigates)

Cardinality delta synthesis

A structural pattern-matcher over ensures recognises three shapes on state relations and emits the corresponding cardinality-pre-post relationship:

  • kpre(X)X=pre(X)+{kv}    card_X_post=card_X+1k \notin \textit{pre}(X) \land X' = \textit{pre}(X) + \{k \mapsto v\} \;\Rightarrow\; \textit{card\_X\_post} = \textit{card\_X} + 1
  • kpre(X)X=pre(X){k}    card_X_post=card_X1k \in \textit{pre}(X) \land X' = \textit{pre}(X) - \{k\} \;\Rightarrow\; \textit{card\_X\_post} = \textit{card\_X} - 1
  • X=pre(X)X' = \textit{pre}(X) or frame-implied     card_X_post=card_X\;\Rightarrow\; \textit{card\_X\_post} = \textit{card\_X}

Any other arithmetic shape leaves card_X_post uninterpreted, which is sound but weak; affected preservation VCs may come back unknown.

dom(X) = dom(Y) lowers extensionally

dom(X) = dom(Y) and dom(X) != dom(Y) where both sides are Call(dom, [state-relation]) lower to k. X_dom(k)Y_dom(k)\forall k.\ \textit{X\_dom}(k) \Leftrightarrow \textit{Y\_dom}(k) (and its negation). This is load-bearing for invariants like metadataConsistent in url_shortener.spec.

Reading diagnostics

On failure the CLI prints a spec-aware multi-line diagnostic per failing check. Diagnostics carry a category, a primary source span, related spans, an optional counterexample, and an opt-in suggestion.

Categories

CategoryWhen it firesTypical remedy
contradictory_invariantsglobal check is unsattwo invariants overlap-but-contradict, narrow one's range
unsatisfiable_precondition<Op>.requires is unsatrequires can never be true in isolation, relax an input predicate
unreachable_operation<Op>.enabled is unsat while <Op>.requires is satinvariants block every valid pre-state, relax invariants or tighten the input type
invariant_violation_by_operationpreservation VC is satisfied (i.e. invariant fails post)tighten ensures so the invariant's constrained fields are pinned by = or a range predicate
solver_timeoutZ3 returns unknownraise --timeout, simplify the invariant, or split a heavy quantifier
translator_limitationIR construct not yet supported by the verifierskip the affected check or narrow the invariant so the unsupported construct doesn't appear
backend_errorsolver crash / init failurere-run with -v; if reproducible, file an issue with --dump-smt output

Worked example: preservation violation

fixtures/spec/broken_url_shortener.spec
service BrokenUrlShortener {

  entity UrlMapping {
    click_count: Int
  }

  state {
    metadata: Int -> lone UrlMapping
    totalClicks: Int
  }

  operation Tamper {
    input: code: Int

    requires:
      code in metadata

    ensures:
      metadata'[code].click_count = pre(metadata)[code].click_count - 100
  }

  operation Drain {
    requires:
      true

    ensures:
      totalClicks' = -1
  }

  invariant clickCountNonNegative:
    all c in metadata | metadata[c].click_count >= 0

  invariant totalClicksNonNegative:
    totalClicks >= 0
}
$ spec-to-rest verify fixtures/spec/broken_url_shortener.specexit 1
✘ <spec>: 2 failure(s), 0 skipped in 9 consistency checks (<elapsed>)

✘ <spec>:22:2: error: operation 'Drain' violates invariant 'totalClicksNonNegative'
  related: <spec>:33:2 (invariant 'totalClicksNonNegative' declared here)

  Counterexample:
  entities:
    UrlMapping#0 { click_count = 0 }
    UrlMapping#1 { click_count = 0 }
  pre-state:
    metadata = {}
    totalClicks = 0
  post-state:
    metadata' = {}
    totalClicks' = -1

  Why this violates the invariant:
    1. Invariant 'totalClicksNonNegative' requires:
         (totalClicks >= 0)
    2. Operation 'Drain' computes 'totalClicks' from:
         (- 1)
    3. The solver picked pre(totalClicks) = 0, producing post-state totalClicks' = -1.
    4. The post-state value violates the bound on 'totalClicks' from invariant 'totalClicksNonNegative'.

  hint: 'Drain' violates 'totalClicksNonNegative'. Tighten 'ensures' so the fields 'totalClicksNonNegative' constrains are pinned by '=' or a range predicate; see counterexample.

✘ <spec>:12:2: error: operation 'Tamper' violates invariant 'clickCountNonNegative'
  related: <spec>:30:2 (invariant 'clickCountNonNegative' declared here)

  Counterexample:
  inputs:
    code = 1
  entities:
    UrlMapping#0 { click_count = -1 }
    UrlMapping#1 { click_count = 99 }
    UrlMapping#2 { click_count = -1 }
  pre-state:
    metadata = { 1 → UrlMapping#1 }
    totalClicks = 0
  post-state:
    metadata' = { 1 → UrlMapping#0 }
    totalClicks' = 0

  Why this violates the invariant:
    1. Invariant 'clickCountNonNegative' requires:
         (all c in metadata | (metadata[c].click_count >= 0))
    2. Operation 'Tamper' computes 'click_count' from:
         (pre(metadata)[code].click_count - 100)
    3. The solver picked code = 1, pre(metadata)[1].click_count = 99, producing post-state metadata'[1].click_count = -1.
    4. The post-state value violates the bound on 'click_count' from invariant 'clickCountNonNegative'.

  hint: 'Tamper' violates 'clickCountNonNegative' on field(s) 'click_count' — see counterexample. Tighten 'ensures' with a range predicate or a constructor that initialises click_count correctly.

The hint names the violating operation, the violated invariant, and the field(s) the invariant constrains, extracted from the IR rather than re-rendered. Suggestions are capped at 200 chars and can be turned off entirely with --no-suggestions (the suggestion field then comes back as null in JSON output as well). The phrasing is intentionally non-stable text intended for humans; downstream tooling should switch on the category enum, not on the suggestion string.

The numbered "Why this violates" block is the narration layer added in #89: a deterministic structural walk over the violated invariant's IR, the relevant ensures clause, and the decoded counterexample. Steps 1 and 2 pretty-print IR (fully parenthesized, readability cost is small, ambiguity cost is none); step 3 reads input/pre/post values directly off the model (no interpreter, no arithmetic re-evaluation); step 4 is a fixed phrasing keyed to the invariant name. Narration is populated for invariant_violation_by_operation, contradictory_invariants, and unreachable_operation. Other categories surface narrative: null in JSON. Suppress with --no-narration (mirrors --no-suggestions).

The decoder enumerates the model's entity universe, evaluates field functions, and shows pre-/post-state relations side by side. Uninterpreted sort values like UrlMapping!val!0 are rewritten to the readable UrlMapping#N labels.

Worked example: contradictory invariants

fixtures/spec/unsat_invariants.spec
service Contradiction {
  invariant: 1 >= 10
  invariant: 1 <= 5
  invariant: 1 >= 10 and 1 <= 5
}
$ spec-to-rest verify fixtures/spec/unsat_invariants.specexit 1
✘ <spec>: 1 failure(s), 0 skipped in 1 consistency checks (<elapsed>)

✘ <spec>:2:2: error: invariants are jointly unsatisfiable — no valid state exists
  related: <spec>:3:2 (invariant 'inv_1')
  related: <spec>:4:2 (invariant 'inv_2')

  Why these invariants conflict:
    1. The verifier could not satisfy all invariants jointly.
    2. Invariants involved: inv_0, inv_1, inv_2.
       (Run with --explain to see the contributing pair.)

  hint: The invariant set is jointly unsatisfiable; for example, review 'inv_0', 'inv_1', 'inv_2' for a pair whose range constraints cannot overlap (e.g., 'x >= 10' alongside 'x <= 5'); narrow or drop one.

No counterexample is emitted here: the engine concluded unsat on the invariant conjunction alone, so there's no model to decode. Pass --explain to extract the unsat core and sharpen the related: list to the minimal contradictory subset; with --explain off, the related spans cover the full invariant set.

Using the CLI

fixtures/spec/url_shortener.spec
service UrlShortener {

  // --- Type Definitions ---

  type ShortCode = String where len(value) >= 6 and len(value) <= 10
                              and value matches /^[a-zA-Z0-9]+$/

  type LongURL = String where len(value) > 0 and isValidURI(value)

  type BaseURL = String where isValidURI(value)

  // --- Entities ---

  entity UrlMapping {
    code: ShortCode
    url: LongURL
    created_at: DateTime
    click_count: Int where value >= 0

    invariant: isValidURI(url)
  }

  // --- State ---

  state {
    store: ShortCode -> lone LongURL
    metadata: ShortCode -> lone UrlMapping
    base_url: BaseURL
  }

  // --- Operations ---

  operation Shorten {
    input:  url: LongURL
    output: code: ShortCode, short_url: String

    requires:
      isValidURI(url)

    ensures:
      code not in pre(store)
      store' = pre(store) + {code -> url}
      short_url = base_url + "/" + code
      #store' = #pre(store) + 1
      metadata'[code].url = url
      metadata'[code].click_count = 0
  }

  operation Resolve {
    input:  code: ShortCode
    output: url: LongURL

    requires:
      code in store

    ensures:
      url = store[code]
      store' = store
      metadata'[code].click_count = pre(metadata)[code].click_count + 1
  }

  operation Delete {
    input: code: ShortCode

    requires:
      code in store

    ensures:
      code not in store'
      code not in metadata'
      #store' = #pre(store) - 1
  }

  operation ListAll {
    output: entries: Set[UrlMapping]

    requires:
      true

    ensures:
      entries = { m in metadata | true }
      store' = store
  }

  // --- Global Invariants ---

  invariant allURLsValid:
    all c in store | isValidURI(store[c])

  invariant metadataConsistent:
    dom(store) = dom(metadata)

  invariant clickCountNonNegative:
    all c in metadata | metadata[c].click_count >= 0

  // --- Convention Overrides ---

  conventions {
    Shorten.http_method = "POST"
    Shorten.http_path = "/shorten"
    Shorten.http_status_success = 201

    Resolve.http_method = "GET"
    Resolve.http_path = "/{code}"
    Resolve.http_status_success = 302
    Resolve.http_header "Location" = output.url

    Delete.http_method = "DELETE"
    Delete.http_path = "/{code}"
    Delete.http_status_success = 204

    ListAll.http_method = "GET"
    ListAll.http_path = "/urls"
    ListAll.http_status_success = 200
  }
}
$ spec-to-rest verify fixtures/spec/url_shortener.specexit 4
⚠ <spec>: 4/21 checks passed; 17 skipped (17 soundness coverage) (<elapsed>)

⚠ <spec>:87:2: warning: check 'global' skipped: outside lower's coverage

  hint: The check involves a construct that is outside the formally verified subset captured by the Isabelle 'lower' projection. Narrow the spec to lowerable shapes — or, if the construct is essential to you…

⚠ <spec>:62:2: warning: check 'Delete.enabled' skipped: outside lower's coverage

  hint: The check involves a construct that is outside the formally verified subset captured by the Isabelle 'lower' projection. Narrow the spec to lowerable shapes — or, if the construct is essential to you…

⚠ <spec>:62:2: warning: check 'Delete.preserves.allURLsValid' skipped: outside lower's coverage

  hint: The check involves a construct that is outside the formally verified subset captured by the Isabelle 'lower' projection. Narrow the spec to lowerable shapes — or, if the construct is essential to you…

⚠ <spec>:62:2: warning: check 'Delete.preserves.metadataConsistent' skipped: outside lower's coverage

  hint: The check involves a construct that is outside the formally verified subset captured by the Isabelle 'lower' projection. Narrow the spec to lowerable shapes — or, if the construct is essential to you…

⚠ <spec>:62:2: warning: check 'Delete.preserves.clickCountNonNegative' skipped: outside lower's coverage

  hint: The check involves a construct that is outside the formally verified subset captured by the Isabelle 'lower' projection. Narrow the spec to lowerable shapes — or, if the construct is essential to you…

⚠ <spec>:74:2: warning: check 'ListAll.enabled' skipped: outside lower's coverage

  hint: The check involves a construct that is outside the formally verified subset captured by the Isabelle 'lower' projection. Narrow the spec to lowerable shapes — or, if the construct is essential to you…

⚠ <spec>:74:2: warning: check 'ListAll.preserves.allURLsValid' skipped: outside lower's coverage

  hint: The check involves a construct that is outside the formally verified subset captured by the Isabelle 'lower' projection. Narrow the spec to lowerable shapes — or, if the construct is essential to you…

⚠ <spec>:74:2: warning: check 'ListAll.preserves.metadataConsistent' skipped: outside lower's coverage

  hint: The check involves a construct that is outside the formally verified subset captured by the Isabelle 'lower' projection. Narrow the spec to lowerable shapes — or, if the construct is essential to you…

⚠ <spec>:74:2: warning: check 'ListAll.preserves.clickCountNonNegative' skipped: outside lower's coverage

  hint: The check involves a construct that is outside the formally verified subset captured by the Isabelle 'lower' projection. Narrow the spec to lowerable shapes — or, if the construct is essential to you…

⚠ <spec>:49:2: warning: check 'Resolve.enabled' skipped: outside lower's coverage

  hint: The check involves a construct that is outside the formally verified subset captured by the Isabelle 'lower' projection. Narrow the spec to lowerable shapes — or, if the construct is essential to you…

⚠ <spec>:49:2: warning: check 'Resolve.preserves.allURLsValid' skipped: outside lower's coverage

  hint: The check involves a construct that is outside the formally verified subset captured by the Isabelle 'lower' projection. Narrow the spec to lowerable shapes — or, if the construct is essential to you…

⚠ <spec>:49:2: warning: check 'Resolve.preserves.metadataConsistent' skipped: outside lower's coverage

  hint: The check involves a construct that is outside the formally verified subset captured by the Isabelle 'lower' projection. Narrow the spec to lowerable shapes — or, if the construct is essential to you…

⚠ <spec>:33:2: warning: check 'Shorten.requires' skipped: outside lower's coverage

  hint: The check involves a construct that is outside the formally verified subset captured by the Isabelle 'lower' projection. Narrow the spec to lowerable shapes — or, if the construct is essential to you…

⚠ <spec>:33:2: warning: check 'Shorten.enabled' skipped: outside lower's coverage

  hint: The check involves a construct that is outside the formally verified subset captured by the Isabelle 'lower' projection. Narrow the spec to lowerable shapes — or, if the construct is essential to you…

⚠ <spec>:33:2: warning: check 'Shorten.preserves.allURLsValid' skipped: outside lower's coverage

  hint: The check involves a construct that is outside the formally verified subset captured by the Isabelle 'lower' projection. Narrow the spec to lowerable shapes — or, if the construct is essential to you…

⚠ <spec>:33:2: warning: check 'Shorten.preserves.metadataConsistent' skipped: outside lower's coverage

  hint: The check involves a construct that is outside the formally verified subset captured by the Isabelle 'lower' projection. Narrow the spec to lowerable shapes — or, if the construct is essential to you…

⚠ <spec>:33:2: warning: check 'Shorten.preserves.clickCountNonNegative' skipped: outside lower's coverage

  hint: The check involves a construct that is outside the formally verified subset captured by the Isabelle 'lower' projection. Narrow the spec to lowerable shapes — or, if the construct is essential to you…

Add -v for stage timing, per-check timing, and the full check table. Each per-check line carries its routed backend tag ([z3] or [alloy]):

fixtures/spec/safe_counter.spec
service SafeCounter {
  state {
    count: Int
  }

  operation Increment {
    requires:
      true

    ensures:
      count' = count + 1
  }

  operation Decrement {
    requires:
      count > 0

    ensures:
      count' = count - 1
  }

  invariant countNonNegative:
    count >= 0
}
$ spec-to-rest verify fixtures/spec/safe_counter.spec -vexit 0
  Parsed in <elapsed>ms
  Built IR in <elapsed>ms
  Timeout: <elapsed>ms
  Alloy scope: 5
  Max parallel: <cores>
✔ <spec>: 7/7 consistency checks passed (<elapsed>)
    ✔ [z3]    [sound]       global                       sat <elapsed>ms
    ✔ [z3]    [sound]       Decrement.requires           sat <elapsed>ms
    ✔ [z3]    [sound]       Decrement.enabled            sat <elapsed>ms
    ✔ [z3]    [sound]       Decrement.preserves.countNonNegative sat <elapsed>ms
    ✔ [z3]    [sound]       Increment.requires           sat <elapsed>ms
    ✔ [z3]    [sound]       Increment.enabled            sat <elapsed>ms
    ✔ [z3]    [sound]       Increment.preserves.countNonNegative sat <elapsed>ms

Skipped checks (translator coverage gap) print as a separate WARN translator limitation on check '<id>': <reason> warning block before the surviving entries in the per-check table; they don't appear inline as a skipped row.

Options

Prop

Type

Exit codes

CodeMeaning
0All translated checks passed (sat); no checks were skipped.
1Specification or spec-file issue: at least one ran check failed (unsat / preservation sat / timeout), or the spec could not be read / parsed / IR-built.
2Translator coverage gap: the verifier hit an unsupported construct, either when translating the whole spec (e.g., under --dump-smt) or while running individual checks. In normal check runs affected checks surface as skipped; other checks may still pass.
3Backend error (solver init failure or solver crash).
4Soundness coverage gap: at least one check was skipped because its source exprs fall outside the Isabelle lower projection, and no higher-precedence outcome (Backend / Violation / Translator) was raised.

Precedence (highest -> lowest): Backend (3) -> Violation (1) -> Translator (2) -> Trust (4) -> Ok (0). A backend crash always wins over a translator gap; a violation always wins over a soundness gap.

Exit 2 is a partial-success signal: the spec may still be correct, but the verifier could not fully discharge every check. Skipped checks print as warnings with a translator_limitation diagnostic so CI / editors can surface the coverage gap alongside successes. Exit 4 is the same shape but driven by the verified subset: the skipped checks carry a soundness_limitation diagnostic, signalling the spec uses constructs the formal soundness theorem does not yet cover (rather than constructs the translator does not handle). Verify routes every check that does run through the extracted (Isabelle-verified) translator; checks that would route through the hand-written translator at the check-body level always skip instead.

Inspecting the SMT-LIB

--dump-smt emits a standalone .smt2 file you can hand to any SMT solver, useful for cross-solver sanity checks, bug reports, or pasting into the Z3 web playground.

$ spec-to-rest verify --dump-smt-out /tmp/out.smt2 fixtures/spec/url_shortener.spec
Wrote SMT-LIB to /tmp/out.smt2

$ z3 /tmp/out.smt2
sat

CI runs exactly this cross-check against the checked-in fixtures/golden/smt/url_shortener.smt2 snapshot using apt-get install -y z3, if the emitter ever produces invalid SMT-LIB or loses satisfiability, CI fails loudly.

Machine-readable output (--json / --json-out)

spec-to-rest verify --json <spec> runs the full engine and emits a structured JSON report to stdout instead of the human-readable text. --json-out <file> writes the same JSON to a file and keeps stdout clean. Either flag can compose with --explain (core spans land in the JSON) and --dump-vc <dir> (VC artifacts go to the dir, JSON to stdout/file).

Exit codes (text and JSON modes alike): 0 pass, 1 violation, 2 translator gap, 3 backend error, 4 soundness gap (at least one check was skipped because its source exprs fall outside the verified subset). Backend errors take precedence over violations and over soundness gaps; the JSON complements the exit signal rather than replacing it.

Combining --json / --json-out with --dump-smt / --dump-alloy is rejected with a clear error: the dump flags short-circuit before any checks run, so there is no report to serialize.

Top-level schema

{
  "schemaVersion": 2,
  "specFile": "fixtures/spec/url_shortener.spec",
  "ok": false,
  "totalMs": 842.0,
  "checks": [ /* ... */ ]
}

Each entry in checks mirrors the internal CheckResult 1:1:

{
  "id": "Tamper.preserves.clickCountNonNegative",
  "kind": "preservation",
  "tool": "z3",
  "trust": "best-effort",
  "operationName": "Tamper",
  "invariantName": "clickCountNonNegative",
  "status": "unsat",
  "durationMs": 142.0,
  "detail": "operation 'Tamper' does not preserve invariant '...', counterexample found",
  "sourceSpans": [{ "startLine": 11, "startCol": 2, "endLine": 19, "endCol": 3 }],
  "diagnostic": {
    "level": "error",
    "category": "invariant_violation_by_operation",
    "message": "operation 'Tamper' violates invariant 'clickCountNonNegative'",
    "primarySpan": { "startLine": 11, "startCol": 2, "endLine": 19, "endCol": 3 },
    "relatedSpans": [{ "span": {...}, "note": "invariant '...' declared here" }],
    "counterexample": {
      "entities": [{ "sortName": "UrlMapping", "label": "UrlMapping#0",
                     "rawElement": "UrlMapping!val!0",
                     "fields": [{ "name": "click_count",
                                  "value": { "display": "-1", "entityLabel": null } }] }],
      "stateRelations": [{ "stateName": "metadata", "side": "pre",
                           "entries": [{ "key": {...}, "value": {...} }] }],
      "stateConstants": [],
      "inputs": [{ "name": "code", "value": { "display": "2", "entityLabel": null } }]
    },
    "suggestion": "Tighten the 'ensures' clause so...",
    "narrative": "Why this violates the invariant:\n  1. Invariant 'clickCountNonNegative' requires:\n       (all c in metadata | (metadata[c].click_count >= 0))\n  ...",
    "coreSpans": []
  }
}

Passing checks have diagnostic: null. Checks without a counterexample (e.g. global unsat) have counterexample: null. coreSpans is an empty array when --explain is off or the backend can't provide a core. narrative is null outside the three categories that support narration (see Reading diagnostics) or when --no-narration is set.

Stable enum tokens

These snake_case strings are the contract consumers depend on. Breaking changes bump schemaVersion.

FieldValues
kindglobal, requires, enabled, preservation, temporal
statussat, unsat, unknown, skipped
toolz3, alloy
trustsound, best-effort
levelerror, warning
categorycontradictory_invariants, unsatisfiable_precondition, unreachable_operation, invariant_violation_by_operation, solver_timeout, translator_limitation, backend_error, soundness_limitation

Example consumer queries

# Did verification pass?
spec-to-rest verify --json spec.spec | jq .ok

# List every failing check ID.
spec-to-rest verify --json spec.spec | jq '.checks[] | select(.status != "sat" and .status != "skipped") | .id'

# Extract the first preservation counterexample's inputs.
spec-to-rest verify --json spec.spec \
  | jq '.checks[] | select(.diagnostic.category == "invariant_violation_by_operation") | .diagnostic.counterexample.inputs'

The JSON is complementary to --dump-vc: --dump-vc emits the raw solver inputs for replay; --json emits the decoded diagnostic data for consumption by editors, CI bots, and aggregators.

Verification certificates

The verifier can write per-check VC artifacts for replay and audit, and can extract unsat cores for diagnosing failure causes.

Per-check VC dump (--dump-vc)

spec-to-rest verify --dump-vc out/ <spec> writes one file per check into out/, plus out/verdicts.json:

out/
global.smt2combined satisfiability of all invariants
Shorten.requires.smt2precondition feasibility for operation Shorten
Shorten.preserves.clickCountNonNegative.smt2preservation VC: op x invariant
Delete.requires.smt2precondition feasibility for operation Delete
temporal.alwaysSafe.alsAlloy-routed temporal check
verdicts.json{ id, tool, outcome, rawStatus, durationMs, file } per check

Z3-routed checks emit SMT-LIB; Alloy-routed checks emit .als. Each file is the exact input that was passed to the live backend, replayable with a fresh z3 -in or the Alloy CLI to re-derive the verdict against an independent solver / version. The verdicts.json index records the verdict, duration, and tool for each check.

The optional z3-cross-check CI job iterates fixtures/golden/vc/*/verdicts.json, replays each .smt2 file listed under an entry with tool == "z3" through native Z3, and asserts agreement with the entry's rawStatus. The symmetric alloy-cross-check job iterates the same verdicts.json files, selects entries with tool == "alloy", replays each entry's .file through the native Alloy 6.2.0 CLI (java -jar org.alloytools.alloy.dist.jar exec), parses receipt.json (presence of a solution key on the single run command = sat, absence = unsat), and asserts the same agreement. The third sibling job, cvc5-cross-check, replays the same Z3-tool .smt2 files through native cvc5 (cvc5 --finite-model-find --tlimit-per=30000) to catch divergence between two independent SMT engines (defense in depth against solver-specific soundness bugs, since neither solver has had a CVE the other shared). A cvc5 unknown is treated as a skip (not all checks decide in 30s on cvc5's default strategy); a definite verdict that disagrees with the recorded rawStatus fails the job and uploads the offending .smt2 artifact. Regenerate the goldens by running sbt "cli/run verify <spec> --dump-vc fixtures/golden/vc/<name>".

rawStatus records what the solver actually returned for the SMT-LIB (sat / unsat / unknown). outcome records the user-facing verdict after preservation-style inversion (a preservation check is "preserved = sat outcome" when the solver returned unsat = no counterexample). The cross-check uses rawStatus; the CLI report uses outcome.

Unsat-core extraction (--explain)

spec-to-rest verify --explain <spec> re-runs the engine in core-extracting mode. For each check whose outcome is unsat, the diagnostic grows an unsat core (contributing assertions): section listing the spec source positions of the assertions that participated in the contradiction:

spec.spec:42:3: error: invariants are jointly unsatisfiable, no valid state exists

  unsat core (contributing assertions):
    spec.spec:14:3  contributing invariant
    spec.spec:30:3  contributing invariant

Mechanism per backend:

  • Z3 uses Solver.assertAndTrack with one tracker constant per assertion; after unsat, getUnsatCore() returns the tracker names whose originating assertions form the core.
  • Alloy switches SATFactory to minisat.prover (the only proof-emitting solver bundled in Pardinus 6.2.0), enables coreGranularity = 1 + coreMinimization = 1, and reads A4Solution.highLevelCore after unsat. Each returned Pos is mapped back to a spec span via the line-map produced during .als rendering.

--explain adds modest overhead (Z3 switches to assert-and-track; Alloy switches to a core-capable SAT backend and minimizes). Use it when investigating failures, not on every run.

Known limitation: Alloy's highLevelCore is empty for contradictions that depend on arithmetic encoding (e.g. cardinality constraints #s = N). Those still produce a correct unsat verdict; only the per-fact attribution is unavailable. Relational / quantifier-driven contradictions return populated cores.

Mechanically verified translator (Isabelle/HOL)

The translator's correctness is mechanically validated by the universal soundness theorem in proofs/isabelle/SpecRest/Soundness.thy. The theorem closes with zero sorry for the verified subset (atoms; logical/arithmetic/comparison operators; state-relation membership/cardinality/lookup/subset; set literals and set-valued operations; FieldAccess on entity-typed values; single-state Prime/Pre collapse; quantifiers over enums and state-relations; record-update via Skolem encoding).

When verify returns UNSAT for an in-subset obligation, that verdict reflects a property of the spec, not a coincidence between the translator and Z3.

Isabelle's Code_Target_Scala extracts the verified translate, eval, and smt_eval functions plus the canonical IR ADT to ~2.4 kLoC of idiomatic Scala 3 (BigInt-mapped) at ir/generated/SpecRestGenerated.scala. The Scala layer's translate is no longer hand-written; it is the extracted Isabelle definition. Since #202, the IR ADT consumed by every module is also extracted.

CI builds the proofs every PR via .github/workflows/isabelle-build.yml (~1m 30s for isabelle build SpecRest).

Trust closure: Isabelle/HOL kernel + Z3 driver + the extracted-Scala output. The previously emitted per-run --emit-cert Lake bundles were deleted post-pivot; universal soundness covers what they validated, structurally.

Delivered through #88 (closed 2026-04-26 after decomposition into M_L.0-M_L.4 + the #170 global-proof umbrella) and the Isabelle/HOL pivot #193.

What's intentionally not exported

Full Z3 proof terms (Alethe-format, machine-checkable by carcara / veriT) are not exported. Solver.getProof() is reachable through the JNI binding but format / replay tooling is unsettled and there is no concrete consumer today. Tracked separately in #90.

Timeout, cancellation, and parallel dispatch

--timeout maps to a solver-native deadline (Z3's timeout parameter, Alloy's Future.get(timeout, MILLISECONDS)); there is no outer IO.timeoutTo wrapper. The solver_timeout diagnostic above is the only verifier-specific surface , the inner deadline expiry produces a CheckOutcome.Unknown carrying that category. 0 disables the solver-native deadline; the 30 s default is conservative headroom for heavier preservation checks on large specs.

--parallel n (or 0 for serial) and the cancel-on-Ctrl+C / SIGINT path live entirely in the effect layer: per-check fibers via parTraverseN, Resource[IO, WasmBackend] / Resource[IO, AlloyBackend] for backend lifecycle, Context.interrupt() wired into onCancel, and a CommandIOApp shutdown hook for signal handling. The full mechanics (JMH numbers, native-memory caveats, the cancellation contract, and the Context.interrupt() mid-call abort details) are documented in Concurrency & Cancellation.

Set theory

Set[T] maps to Z3's built-in set theory (extensional Array T Bool under the hood). The SMT-LIB emitter and the Z3 Java backend agree on the same surface:

Spec formSMT-LIB
Set[Int] type(Set Int)
x in {A, B, C}(or (= x A) (= x B) (= x C)), exact lowering
x not in {A, B, C}(not (or ...))
x in (a union b) etc.(select <expr> x)
a union b(union a b)
a intersect b(intersection a b)
a minus b(setminus a b)
a subset b(subset a b)
{ } (empty set)((as const (Set T)) false), requires receiver context
{ e1, e2 } (non-empty lit)(store (store ((as const (Set T)) false) e1 true) e2 true)

Set-literal membership (in {A, B, C}) lowers to a disjunction of equalities rather than a select against a skolem set; this is exact and avoids introducing uninterpreted infrastructure for the common enum-membership pattern.

{} as a standalone expression is rejected (the element sort can't be inferred from the literal itself); assign it to a typed receiver or constrain it by equality.

Standalone set comprehensions (entries = { x in D | P }) remain a translator throw: the comprehension's element sort is the binder's sort (e.g. a relation's key sort), which typically mismatches the receiver's declared type and would produce a sort-mismatch rather than a useful verdict. Use the inline membership form instead: y in { x in S | P }.

Extending the translator

Adding a new IR expression kind is a three-file change:

  1. verify/z3/Translator.scala, add a case to translateExpr; declare any helper uninterpreted functions lazily via ctx.declareFunc. Spans flow automatically thanks to the wrapping translateExpr call.
  2. verify/z3/SmtLib.scala, add a case to renderExpr that produces valid SMT-LIB text.
  3. verify/z3/Backend.scala, add a case to renderExpr that produces a com.microsoft.z3.BoolExpr / Arith / Expr as appropriate.

Unit-test the new case in verify/test/z3/SmtLibGoldenTest.scala (Z3Script shape + SMT-LIB render, no solver), and add a scenario to verify/test/z3/BackendTest.scala if the case requires solver round-tripping. End-to-end behaviour goes in verify/test/ConsistencyTest.scala or verify/test/z3/CounterExampleTest.scala. Update fixtures/golden/smt/url_shortener.smt2 by dumping the new SMT via sbt "cli/run verify --dump-smt-out fixtures/golden/smt/url_shortener.smt2 fixtures/spec/url_shortener.spec" when the snapshot legitimately changes.

Verify-as-gate in compile

spec-to-rest compile runs the verification engine as a pre-codegen gate. If any check reports a failure, compile exits non-zero and writes no files, the output directory stays untouched (not even created). For a deployment profile that claims "verified correct", the gate is the guarantee that what ships matches what was checked.

The gate shares one code path with spec-to-rest verify: same Consistency.runConsistencyChecks call, same diagnostic reporter, same exit-code mapping (0 / 1 / 2 / 3). A spec that verify exits 1 on will make compile exit 1 before any emit runs. A spec that verify clears (exit 0) compiles normally.

Escape hatch

When you need to emit code against a spec that doesn't (yet) pass verification (active debugging, a known translator-coverage gap, a demo on a counterexample spec), pass --ignore-verify:

$ spec-to-rest compile --ignore-verify --out /tmp/out fixtures/spec/unsat_invariants.spec
WARN proceeding without verification (--ignore-verify)
OK wrote 26 files to /tmp/out

The warning prints on every --ignore-verify run; the flag is conspicuous by design, not a silent passthrough.

Gate defaults and tuning

The gate uses verify's defaults: 30 s per-check timeout, Alloy scope 5, host-CPU parallelism. There are no --timeout / --parallel / --alloy-scope flags on compile today. If the defaults don't fit your spec, run spec-to-rest verify --timeout X --parallel N <spec> first to confirm green, then spec-to-rest compile --ignore-verify <spec>. File an issue if the pattern comes up often and the gate deserves its own tuning knobs.

Exit codes

The gate inherits verify's contract unchanged:

CodeMeaning
0Gate passed. Codegen ran; files written.
1One or more checks reported unsat or unknown. No files written.
2Translator coverage gap (Skipped checks with TranslatorLimitation).
3Backend error (Z3 / Alloy crash or setup failure).

A spec that exits 2 blocks codegen because the gate cannot prove the spec correct; use --ignore-verify if you accept the coverage gap.

See also

  • Convention Engine, the other half of the M1-M10 pipeline.
  • Architecture, where verification sits in the overall compiler.
  • Research: Spec Verification, the full verification design (pipeline, Alloy/Quint backends, counterexample shape).

On this page