Implementation Architecture

Design document for the compiler toolchain, parser technology, IR design, project structure, dependency choices, and build plan. This covers HOW we build the compiler itself, rather than what it compiles to.

Status, actual decisions vs. this research doc. This doc evaluated five compiler-host languages and originally recommended TypeScript (§1.7). The shipped compiler runs on Scala 3.6.3 (see Architecture for the live overview). The language-comparison sketches in §1 are kept as historical context, they're "if we used Python / Rust / TS / Go / Kotlin, here's what code would look like", rather than project source. The Scala source of truth for each component is linked from the live docs: IR types (extracted from proofs/isabelle/SpecRest/IR.thy), parser, verifier, convention engine, codegen, testgen, CLI.

Language Choice for the Compiler Itself
Parser Technology
Intermediate Representation (IR) Design
Solver Integration Architecture
Project Structure
CLI Design
Testing Strategy for the Compiler Itself
Dependency Management and Distribution
Build Plan with Milestones
Risk Mitigation

1. Language choice for the compiler itself

The compiler implementation language determines development velocity, integration story with formal tools, and distribution ergonomics. We evaluate five candidates against nine criteria.

1.1 Python

Parser code example (hand-written recursive descent)

from dataclasses import dataclass
from typing import List, Optional

@dataclass
class OperationDecl:
    name: str
    inputs: List["Parameter"]
    outputs: List["Parameter"]
    requires: List["Expr"]
    ensures: List["Expr"]

class Parser:
    def __init__(self, tokens: List[Token]):
        self.tokens = tokens
        self.pos = 0

    def parse_operation(self) -> OperationDecl:
        self.expect(TokenKind.KEYWORD, "operation")
        name = self.expect(TokenKind.IDENT).value
        self.expect(TokenKind.LBRACE)
        inputs = self.parse_param_block("input")
        outputs = self.parse_param_block("output")
        requires = self.parse_constraint_block("requires")
        ensures = self.parse_constraint_block("ensures")
        self.expect(TokenKind.RBRACE)
        return OperationDecl(name, inputs, outputs, requires, ensures)

    def parse_constraint_block(self, keyword: str) -> List[Expr]:
        if not self.check(TokenKind.KEYWORD, keyword):
            return []
        self.advance()
        self.expect(TokenKind.COLON)
        constraints = []
        while not self.check(TokenKind.KEYWORD) and not self.check(TokenKind.RBRACE):
            constraints.append(self.parse_expr())
            self.skip_newlines()
        return constraints

Template/emitter code example (Jinja2)

from jinja2 import Environment, FileSystemLoader

class PythonFastAPIEmitter:
    def __init__(self, ir: ServiceIR):
        self.ir = ir
        self.env = Environment(loader=FileSystemLoader("templates/python"))

    def emit_route(self, op: OperationIR) -> str:
        template = self.env.get_template("route.py.j2")
        return template.render(
            method=op.http_method.lower(),
            path=op.http_path,
            func_name=snake_case(op.name),
            input_model=op.input_model_name,
            output_model=op.output_model_name,
            requires_checks=op.requires,
            status_code=op.http_status_success,
        )

Z3 integration. Native. The z3-solver PyPI package ships prebuilt binaries for all platforms. The Python API is the primary Z3 interface, it is the best-documented and most feature-complete binding. Example:

from z3 import Solver, Int, And, ForAll, Implies, sat

s = Solver()
code_len = Int("code_len")
s.add(And(code_len >= 6, code_len <= 10))
result = s.check()  # sat

LLM API integration. Best-in-class. The anthropic Python SDK is Anthropic's primary SDK. OpenAI, Cohere, and every other LLM provider ship Python SDKs first. Streaming, tool use, batch APIs are all production-grade.

Distribution/packaging. Weakest dimension. Python packaging is notoriously fragile. Options: pip install spec-to-rest requires users have a working Python environment. pipx improves isolation. PyInstaller or Nuitka can produce single binaries but add 50-100 MB and have platform-specific quirks. Docker is the reliable fallback.

Assessment. Best for prototyping and integration with Z3/LLM ecosystems. The packaging story is the main liability, mitigated by Docker distribution. Type annotations with mypy provide adequate safety for a project of this scale.

1.2 Rust

Parser code example (using pest PEG parser)

use pest::Parser;
use pest_derive::Parser;

#[derive(Parser)]
#[grammar = "dsl.pest"]
struct DslParser;

fn parse_operation(pair: pest::iterators::Pair<Rule>) -> OperationDecl {
    let mut inner = pair.into_inner();
    let name = inner.next().unwrap().as_str().to_string();
    let mut inputs = Vec::new();
    let mut outputs = Vec::new();
    let mut requires = Vec::new();
    let mut ensures = Vec::new();

    for item in inner {
        match item.as_rule() {
            Rule::input_block => inputs = parse_params(item),
            Rule::output_block => outputs = parse_params(item),
            Rule::requires_block => requires = parse_constraints(item),
            Rule::ensures_block => ensures = parse_constraints(item),
            _ => unreachable!(),
        }
    }
    OperationDecl { name, inputs, outputs, requires, ensures }
}

Template/emitter code example (askama)

use askama::Template;

#[derive(Template)]
#[template(path = "route.py.j2")]
struct RouteTemplate<'a> {
    method: &'a str,
    path: &'a str,
    func_name: &'a str,
    input_model: &'a str,
    output_model: &'a str,
    status_code: u16,
}

fn emit_route(op: &OperationIR) -> String {
    let tmpl = RouteTemplate {
        method: &op.http_method.to_lowercase(),
        path: &op.http_path,
        func_name: &snake_case(&op.name),
        input_model: &op.input_model_name,
        output_model: &op.output_model_name,
        status_code: op.http_status_success,
    };
    tmpl.render().unwrap()
}

Z3 integration. Available via z3 crate (Rust bindings to the C API). Functional but less ergonomic than Python. The bindings are maintained but lag behind the Python API in coverage. Building from source requires LLVM/Clang toolchain.

LLM API integration. The anthropic Rust crate exists but is community-maintained, not official. HTTP-level integration via reqwest is straightforward but requires implementing streaming, retry logic, and error handling manually.

Distribution/packaging. Excellent. cargo install spec-to-rest produces a single static binary. Cross-compilation via cross gives Linux/macOS/Windows from any host. Binary size ~10-20 MB. No runtime dependencies.

Assessment. Best for production distribution and performance. Slower development velocity (2-3x vs Python for compiler code). The Z3 and LLM integration stories are adequate but not first-class. Good choice if the project outlives prototyping and performance matters (large specs with many operations).

1.3 TypeScript/node

Parser code example (hand-written with tokenizer)

interface OperationDecl {
  name: string;
  inputs: Parameter[];
  outputs: Parameter[];
  requires: Expr[];
  ensures: Expr[];
}

class Parser {
  private tokens: Token[];
  private pos: number = 0;

  parseOperation(): OperationDecl {
    this.expect(TokenKind.Keyword, "operation");
    const name = this.expect(TokenKind.Ident).value;
    this.expect(TokenKind.LBrace);
    const inputs = this.parseParamBlock("input");
    const outputs = this.parseParamBlock("output");
    const requires = this.parseConstraintBlock("requires");
    const ensures = this.parseConstraintBlock("ensures");
    this.expect(TokenKind.RBrace);
    return { name, inputs, outputs, requires, ensures };
  }

  private parseConstraintBlock(keyword: string): Expr[] {
    if (!this.check(TokenKind.Keyword, keyword)) return [];
    this.advance();
    this.expect(TokenKind.Colon);
    const constraints: Expr[] = [];
    while (!this.check(TokenKind.Keyword) && !this.check(TokenKind.RBrace)) {
      constraints.push(this.parseExpr());
      this.skipNewlines();
    }
    return constraints;
  }
}

Template/emitter code example (EJS)

import * as ejs from "ejs";
import * as fs from "fs";

class PythonFastAPIEmitter {
  private templates: Map<string, string>;

  constructor(templateDir: string) {
    this.templates = new Map();
    this.templates.set("route", fs.readFileSync(`${templateDir}/route.py.ejs`, "utf-8"));
  }

  emitRoute(op: OperationIR): string {
    return ejs.render(this.templates.get("route")!, {
      method: op.httpMethod.toLowerCase(),
      path: op.httpPath,
      funcName: snakeCase(op.name),
      inputModel: op.inputModelName,
      outputModel: op.outputModelName,
      statusCode: op.httpStatusSuccess,
    });
  }
}

Z3 integration. The z3-solver npm package provides WASM-compiled Z3. It works but is ~3x slower than native Z3 and has a 35 MB WASM payload. Alternatively, shell out to the Z3 binary. Neither option is as clean as the Python bindings.

LLM API integration. Good. The @anthropic-ai/sdk is an official Anthropic package. OpenAI's Node SDK is equally mature. Streaming with async iterators works well.

Distribution/packaging. Decent. npm install -g spec-to-rest works for Node users. pkg or esbuild can produce single-binary executables (~50-80 MB). Bun offers faster startup and single-binary distribution.

Assessment. Good middle ground. TypeScript provides static typing with fast iteration. The Z3 story is the weakest link. Tree-sitter has native JS bindings. Best choice if the team is JS-native and prioritizes development speed over solver integration quality.

1.4 Go

Parser code example (hand-written)

type OperationDecl struct {
    Name     string
    Inputs   []Parameter
    Outputs  []Parameter
    Requires []Expr
    Ensures  []Expr
}

func (p *Parser) parseOperation() (*OperationDecl, error) {
    if err := p.expect(TokenKeyword, "operation"); err != nil {
        return nil, err
    }
    name, err := p.expectIdent()
    if err != nil {
        return nil, err
    }
    if err := p.expect(TokenLBrace, ""); err != nil {
        return nil, err
    }
    inputs, err := p.parseParamBlock("input")
    if err != nil {
        return nil, err
    }
    outputs, err := p.parseParamBlock("output")
    if err != nil {
        return nil, err
    }
    requires, err := p.parseConstraintBlock("requires")
    if err != nil {
        return nil, err
    }
    ensures, err := p.parseConstraintBlock("ensures")
    if err != nil {
        return nil, err
    }
    if err := p.expect(TokenRBrace, ""); err != nil {
        return nil, err
    }
    return &OperationDecl{
        Name: name, Inputs: inputs, Outputs: outputs,
        Requires: requires, Ensures: ensures,
    }, nil
}

Template/emitter code example

import "text/template"

const routeTemplate = `
@app.{{.Method}}("{{.Path}}", status_code={{.StatusCode}})
async def {{.FuncName}}({{if .InputModel}}body: {{.InputModel}}{{end}}) -> {{.OutputModel}}:
    {{range .RequiresChecks}}
    if not ({{.}}):
        raise HTTPException(status_code=422, detail="Precondition failed: {{.}}")
    {{end}}
    # TODO: implementation
    pass
`

func emitRoute(op *OperationIR) (string, error) {
    tmpl, err := template.New("route").Parse(routeTemplate)
    if err != nil {
        return "", err
    }
    var buf bytes.Buffer
    err = tmpl.Execute(&buf, op)
    return buf.String(), err
}

Z3 integration. No maintained Go bindings. Must shell out to the z3 binary or use CGo with the C API (fragile, breaks cross-compilation). This is a significant gap.

LLM API integration. No official Anthropic Go SDK. Community packages exist but are not feature-complete. HTTP-level integration is straightforward with net/http.

Distribution/packaging. Excellent. go install produces a single static binary. Cross-compilation is trivial (GOOS=linux GOARCH=amd64 go build). Binary size ~15 MB.

Assessment. Excellent distribution but poor integration with formal tools and LLM SDKs. The verbose error handling makes compiler code significantly more boilerplate. Go's lack of sum types makes IR representation awkward (must use interfaces with type switches). Not recommended for this project.

1.5 Kotlin/JVM

Parser code example (ANTLR4 + Kotlin)

class SpecIRBuilder : SpecBaseVisitor<Any>() {
    override fun visitOperationDecl(ctx: SpecParser.OperationDeclContext): OperationDecl {
        val name = ctx.IDENT().text
        val inputs = ctx.paramBlock("input")?.let { visitParamBlock(it) } ?: emptyList()
        val outputs = ctx.paramBlock("output")?.let { visitParamBlock(it) } ?: emptyList()
        val requires = ctx.constraintBlock("requires")?.let { visitConstraints(it) } ?: emptyList()
        val ensures = ctx.constraintBlock("ensures")?.let { visitConstraints(it) } ?: emptyList()
        return OperationDecl(name, inputs, outputs, requires, ensures)
    }

    override fun visitConstraintBlock(ctx: SpecParser.ConstraintBlockContext): List<Expr> {
        return ctx.expr().map { visitExpr(it) as Expr }
    }
}

Template/emitter code example (Kotlin + stringtemplate)

class PythonFastAPIEmitter(private val ir: ServiceIR) {
    fun emitRoute(op: OperationIR): String = buildString {
        appendLine("@app.${op.httpMethod.lowercase()}(\"${op.httpPath}\", status_code=${op.httpStatusSuccess})")
        append("async def ${op.name.toSnakeCase()}(")
        if (op.inputModelName != null) append("body: ${op.inputModelName}")
        appendLine(") -> ${op.outputModelName}:")
        for (req in op.requires) {
            appendLine("    if not (${emitExpr(req)}):")
            appendLine("        raise HTTPException(status_code=422, detail=\"Precondition failed\")")
        }
        appendLine("    # implementation")
    }
}

Z3 integration. Excellent. Z3 ships official Java bindings (com.microsoft.z3). Kotlin interoperates perfectly with Java libraries. The Z3 Java API mirrors the Python API closely.

LLM API integration. No official Anthropic Kotlin/Java SDK. However, the HTTP API is trivial to call via ktor-client or OkHttp. Alternatively, use langchain4j which wraps Anthropic and OpenAI with a unified interface.

Distribution/packaging. Moderate. GraalVM native-image can produce a single binary (~30-50 MB) with fast startup. Without GraalVM, requires a JVM (~200 MB). Fat JARs via shadowJar are the standard approach but require Java on the user's machine.

Z3 + Alloy + Dafny synergy. This is Kotlin's killer advantage. Alloy Analyzer is a Java library (edu.mit.csail.sdg:org.alloytools.alloy). Kodkod (Alloy's SAT backend) is Java. Dafny has a JVM compilation target and its API is accessible from the JVM. All three formal tools are natively available without subprocess calls.

Assessment. Strongest integration with formal methods tools (Alloy, Z3, Dafny all JVM-native). ANTLR4 is the gold standard parser generator and it targets Kotlin/Java natively. Kotlin's sealed classes provide excellent sum type support for IR representation. The main cost is JVM distribution weight, mitigated by GraalVM native-image. Recommended if the team is JVM-proficient.

1.6 Comparative matrix

Criterion	Python	Rust	TypeScript	Go	Kotlin
Dev velocity	5	2	4	3	4
Type safety	3 (mypy)	5	4	4	5
Parser ecosystem	4	4	3	2	5 (ANTLR)
Z3 integration	5	3	2	1	4
Alloy integration	2 (subprocess)	1 (subprocess)	1 (subprocess)	1 (subprocess)	5 (native)
Dafny integration	3 (subprocess)	2 (subprocess)	2 (subprocess)	2 (subprocess)	4 (JVM target)
LLM SDK quality	5	2	4	2	3
Distribution	2	5	3	5	3 (GraalVM)
IR modeling (sum types)	3	5 (enums)	3	2	5 (sealed)
Weighted total	32	29	26	22	38

Weights: Parser, Z3, Alloy, Dafny integration weighted 1.5x (core compiler concerns).

1.7 Recommendation

Original research recommendation: TypeScript. Actual decision: Scala 3 (the shipped compiler lives in Scala 3.6.3 under modules/). The TypeScript recommendation was overruled during M0 in favour of the Kotlin-leaning 38 row of the comparative matrix above, generalised to the JVM. Drivers:

Alloy 6 is a Java library (org.alloytools:org.alloytools.alloy.core) and Kodkod is Java, JVM-native invocation without subprocess fragility was decisive.
Z3 via tools.aqua:z3-turnkey ships libz3 natively for every supported OS/arch (no system install, no WASM step) and exposes the full Z3 Java API.
Scala 3's enum ADTs with derives CanEqual give the IR exhaustive pattern-match guarantees, with Mirror-based JSON via circe.
ANTLR4 has first-class Scala support via sbt-antlr4.
Cats Effect 3 + decline-effect + munit-cats-effect give the IO-typed pipeline, per-check Resource lifecycle, and parTraverseN parallelism, see Concurrency and Cancellation.
Translator soundness is mechanically validated by the universal soundness theorem in Isabelle/HOL (proofs/isabelle/SpecRest/Soundness.thy); Code_Target_Scala extracts the verified translate/eval/smt_eval plus the canonical IR ADT to ir/generated/SpecRestGenerated.scala. Pivoted from Lean 4 via #193; IR canonicalized in #202.

The generated code targets are unchanged from the original recommendation (Python/FastAPI shipped today; Go/chi #33 and TS/Express #35 tracked as Phase 7 work), those are output targets, rather than the compiler's own implementation language.

The remainder of this document presents design examples in Python and TypeScript for illustrative comparison; the production compiler is Scala 3, and the live source of truth for the IR ADT is proofs/isabelle/SpecRest/IR.thy (extracted to modules/ir/src/main/scala/specrest/ir/generated/SpecRestGenerated.scala by Code_Target_Scala).

2. Parser technology

2.1 Hand-written recursive descent

When is this the right choice?

When the grammar is small, error messages must be exceptional, and the team wants full control over recovery strategies. Most production compilers (Go, Rust, Swift, V8's JS parser) use hand-written recursive descent parsers. The DSL grammar for spec-to-rest is small enough (~30 productions) that a hand-written parser is viable.

Estimated code size: 800-1200 lines for the full grammar (lexer + parser).

Error recovery strategies

Synchronization: on error, skip tokens until a synchronization point (next }, next keyword like operation, entity, state)
Insert missing token: if { expected but IDENT found, insert { and report
Panic-mode with context: maintain a stack of "currently parsing X" for messages like "expected : after parameter name in operation Shorten"

Parser sketch for operation declaration

class Parser:
    def parse_operation(self) -> OperationDecl:
        """
        operation_decl := 'operation' IDENT '{' param_blocks constraint_blocks '}'
        param_blocks   := (input_block | output_block)*
        input_block    := 'input' ':' param_list
        output_block   := 'output' ':' param_list
        param_list     := param (',' param)*
        param          := IDENT ':' type_expr
        constraint_blocks := (requires_block | ensures_block)*
        requires_block := 'requires' ':' expr_list
        ensures_block  := 'ensures' ':' expr_list
        expr_list      := expr (NEWLINE expr)*
        """
        start = self.current_pos()
        self.expect(TokenKind.KEYWORD, "operation")
        name_tok = self.expect(TokenKind.IDENT)
        self.expect(TokenKind.LBRACE)

        inputs: List[Parameter] = []
        outputs: List[Parameter] = []
        requires: List[Expr] = []
        ensures: List[Expr] = []

        while not self.check(TokenKind.RBRACE) and not self.is_at_end():
            if self.check_keyword("input"):
                inputs = self.parse_input_block()
            elif self.check_keyword("output"):
                outputs = self.parse_output_block()
            elif self.check_keyword("requires"):
                requires = self.parse_requires_block()
            elif self.check_keyword("ensures"):
                ensures = self.parse_ensures_block()
            else:
                self.error_and_sync(
                    f"unexpected {self.current().kind} in operation body; "
                    f"expected 'input', 'output', 'requires', or 'ensures'",
                    sync_to={TokenKind.KEYWORD, TokenKind.RBRACE}
                )

        self.expect(TokenKind.RBRACE)
        return OperationDecl(
            name=name_tok.value,
            inputs=inputs,
            outputs=outputs,
            requires=requires,
            ensures=ensures,
            span=Span(start, self.current_pos()),
        )

    def parse_expr(self, min_prec: int = 0) -> Expr:
        """Pratt parser for expressions with operator precedence."""
        left = self.parse_prefix()

        while not self.is_at_end():
            op = self.current()
            prec = self.precedence(op)
            if prec < min_prec:
                break

            if op.kind == TokenKind.KEYWORD and op.value in ("in", "not"):
                left = self.parse_membership(left)
            elif op.kind == TokenKind.KEYWORD and op.value in ("and", "or"):
                left = self.parse_logical(left, prec)
            elif op.kind in (TokenKind.EQ, TokenKind.NEQ, TokenKind.LT,
                             TokenKind.GT, TokenKind.LTE, TokenKind.GTE):
                left = self.parse_comparison(left, prec)
            elif op.kind == TokenKind.DOT:
                left = self.parse_field_access(left)
            elif op.kind == TokenKind.LBRACKET:
                left = self.parse_index(left)
            elif op.kind in (TokenKind.PLUS, TokenKind.MINUS):
                left = self.parse_arithmetic(left, prec)
            else:
                break

        return left

    def parse_prefix(self) -> Expr:
        tok = self.current()

        if tok.kind == TokenKind.HASH:
            self.advance()
            operand = self.parse_prefix()
            return Cardinality(operand, span=Span(tok.pos, operand.span.end))

        if tok.kind == TokenKind.KEYWORD and tok.value == "pre":
            self.advance()
            self.expect(TokenKind.LPAREN)
            expr = self.parse_expr()
            self.expect(TokenKind.RPAREN)
            return PreState(expr, span=Span(tok.pos, self.prev_pos()))

        if tok.kind == TokenKind.KEYWORD and tok.value == "all":
            return self.parse_quantifier(QuantifierKind.FORALL)

        if tok.kind == TokenKind.KEYWORD and tok.value == "some":
            return self.parse_quantifier(QuantifierKind.EXISTS)

        if tok.kind == TokenKind.IDENT:
            self.advance()
            name = tok.value
            if name.endswith("'"):
                return PostState(Variable(name[:-1]), span=tok.span)
            return Variable(name, span=tok.span)

        if tok.kind == TokenKind.STRING:
            self.advance()
            return Literal(tok.value, LiteralKind.STRING, span=tok.span)

        if tok.kind == TokenKind.NUMBER:
            self.advance()
            return Literal(int(tok.value), LiteralKind.INT, span=tok.span)

        if tok.kind == TokenKind.LPAREN:
            self.advance()
            expr = self.parse_expr()
            self.expect(TokenKind.RPAREN)
            return expr

        self.error(f"expected expression, got {tok.kind}")

2.2 ANTLR4

Complete grammar for a significant subset of the DSL

grammar Spec;

// ===== Parser Rules =====

specFile
    : serviceDecl EOF
    ;

serviceDecl
    : 'service' IDENT '{' serviceBody* '}'
    ;

serviceBody
    : entityDecl
    | stateDecl
    | operationDecl
    | invariantDecl
    | conventionsDecl
    ;

entityDecl
    : 'entity' IDENT '{' entityField* entityInvariant* '}'
    ;

entityField
    : IDENT ':' typeExpr
    ;

entityInvariant
    : 'invariant' ':' expr
    ;

stateDecl
    : 'state' '{' stateRelation* '}'
    ;

stateRelation
    : IDENT ':' typeExpr '->' multiplicity typeExpr     // e.g., store: ShortCode -> lone LongURL
    ;

multiplicity
    : 'one' | 'lone' | 'some' | 'set'
    ;

operationDecl
    : 'operation' IDENT '{' operationBody* '}'
    ;

operationBody
    : inputBlock
    | outputBlock
    | requiresBlock
    | ensuresBlock
    ;

inputBlock
    : 'input' ':' paramList
    ;

outputBlock
    : 'output' ':' paramList
    ;

paramList
    : param (',' param)*
    ;

param
    : IDENT ':' typeExpr
    ;

requiresBlock
    : 'requires' ':' exprList
    ;

ensuresBlock
    : 'ensures' ':' exprList
    ;

exprList
    : expr (NEWLINE expr)*
    ;

invariantDecl
    : 'invariant' ':' expr
    ;

conventionsDecl
    : 'conventions' '{' conventionEntry* '}'
    ;

conventionEntry
    : qualifiedIdent '=' expr                           // Resolve.http_method = GET
    | qualifiedIdent STRING '=' expr                    // Resolve.http_header "Location" = ...
    ;

qualifiedIdent
    : IDENT ('.' IDENT)*
    ;

// ----- Type Expressions -----

typeExpr
    : IDENT                              // simple type: String, Int, ShortCode
    | typeExpr '?'                       // optional: String?
    | 'Set' '<' typeExpr '>'             // set type
    | 'Map' '<' typeExpr ',' typeExpr '>'// map type
    | 'List' '<' typeExpr '>'            // list type
    ;

// ----- Expressions -----

expr
    : expr 'and' expr                    // logical and
    | expr 'or' expr                     // logical or
    | 'not' expr                         // logical not
    | expr ('=' | '!=' | '<' | '>' | '<=' | '>=') expr  // comparison
    | expr ('+' | '-') expr              // arithmetic
    | expr ('*' | '/') expr              // arithmetic
    | expr 'in' expr                     // membership
    | expr 'not' 'in' expr              // non-membership
    | '#' expr                           // cardinality
    | 'pre' '(' expr ')'                // pre-state
    | expr PRIME                          // post-state (store')
    | expr '[' expr ']'                  // indexing
    | expr '.' IDENT                     // field access
    | IDENT '(' exprArgList? ')'         // function call
    | quantifierExpr                     // forall / exists
    | IDENT                              // variable
    | STRING                             // string literal
    | NUMBER                             // number literal
    | BOOL                               // boolean literal
    | '(' expr ')'                       // grouping
    ;

quantifierExpr
    : ('all' | 'some' | 'no') IDENT 'in' expr '|' expr
    ;

exprArgList
    : expr (',' expr)*
    ;

// ===== Lexer Rules =====

PRIME     : '\'' ;
IDENT     : [a-zA-Z_][a-zA-Z0-9_]* ;
STRING    : '"' (~["\\\n] | '\\' .)* '"' ;
NUMBER    : [0-9]+ ('.' [0-9]+)? ;
BOOL      : 'true' | 'false' ;
NEWLINE   : '\r'? '\n' ;
WS        : [ \t]+ -> skip ;
LINE_COMMENT : '//' ~[\r\n]* -> skip ;
BLOCK_COMMENT : '/*' .*? '*/' -> skip ;

Error reporting quality. ANTLR4 generates decent default error messages. Custom error strategies can be implemented via ANTLRErrorStrategy. The DefaultErrorStrategy provides synchronization-based recovery. For production quality, implement a custom BailErrorStrategy that collects multiple errors before aborting.

IDE integration. ANTLR4 grammars can be used to generate TextMate grammars (via antlr4-tm) for VS Code syntax highlighting, and Language Server Protocol (LSP) implementations can use the ANTLR parser for completions and diagnostics.

2.3 Tree-sitter

Complete tree-sitter grammar for a significant subset

// grammar.js
module.exports = grammar({
  name: "spec",

  extras: ($) => [/\s/, $.comment],

  rules: {
    source_file: ($) => $.service_decl,

    service_decl: ($) =>
      seq(
        "service",
        $.identifier,
        "{",
        repeat(
          choice(
            $.entity_decl,
            $.state_decl,
            $.operation_decl,
            $.invariant_decl,
            $.conventions_decl,
          ),
        ),
        "}",
      ),

    entity_decl: ($) =>
      seq("entity", $.identifier, "{", repeat(choice($.entity_field, $.entity_invariant)), "}"),

    entity_field: ($) => seq($.identifier, ":", $.type_expr),

    entity_invariant: ($) => seq("invariant", ":", $._expr),

    state_decl: ($) => seq("state", "{", repeat($.state_relation), "}"),

    state_relation: ($) => seq($.identifier, ":", $.type_expr, "->", $.multiplicity, $.type_expr),

    multiplicity: ($) => choice("one", "lone", "some", "set"),

    operation_decl: ($) =>
      seq(
        "operation",
        $.identifier,
        "{",
        repeat(choice($.input_block, $.output_block, $.requires_block, $.ensures_block)),
        "}",
      ),

    input_block: ($) => seq("input", ":", $.param_list),
    output_block: ($) => seq("output", ":", $.param_list),
    param_list: ($) => seq($.param, repeat(seq(",", $.param))),
    param: ($) => seq($.identifier, ":", $.type_expr),

    requires_block: ($) => seq("requires", ":", $.expr_list),
    ensures_block: ($) => seq("ensures", ":", $.expr_list),
    expr_list: ($) => seq($._expr, repeat($._expr)),

    invariant_decl: ($) => seq("invariant", ":", $._expr),

    conventions_decl: ($) => seq("conventions", "{", repeat($.convention_entry), "}"),

    convention_entry: ($) => seq($.qualified_ident, "=", $._expr),

    qualified_ident: ($) => seq($.identifier, repeat(seq(".", $.identifier))),

    type_expr: ($) =>
      choice(
        $.identifier,
        seq($.type_expr, "?"),
        seq("Set", "<", $.type_expr, ">"),
        seq("Map", "<", $.type_expr, ",", $.type_expr, ">"),
        seq("List", "<", $.type_expr, ">"),
      ),

    _expr: ($) =>
      choice(
        $.binary_expr,
        $.unary_expr,
        $.membership_expr,
        $.cardinality_expr,
        $.pre_state_expr,
        $.post_state_expr,
        $.index_expr,
        $.field_access_expr,
        $.call_expr,
        $.quantifier_expr,
        $.identifier,
        $.string,
        $.number,
        $.boolean,
        $.paren_expr,
      ),

    binary_expr: ($) =>
      choice(
        prec.left(1, seq($._expr, choice("and", "or"), $._expr)),
        prec.left(2, seq($._expr, choice("=", "!=", "<", ">", "<=", ">="), $._expr)),
        prec.left(3, seq($._expr, choice("+", "-"), $._expr)),
        prec.left(4, seq($._expr, choice("*", "/"), $._expr)),
      ),

    unary_expr: ($) => prec(5, seq("not", $._expr)),
    membership_expr: ($) => prec.left(2, seq($._expr, choice("in", seq("not", "in")), $._expr)),
    cardinality_expr: ($) => prec(6, seq("#", $._expr)),
    pre_state_expr: ($) => seq("pre", "(", $._expr, ")"),
    post_state_expr: ($) => seq($.identifier, "'"),
    index_expr: ($) => prec.left(7, seq($._expr, "[", $._expr, "]")),
    field_access_expr: ($) => prec.left(7, seq($._expr, ".", $.identifier)),
    call_expr: ($) => prec(7, seq($.identifier, "(", optional($.arg_list), ")")),
    quantifier_expr: ($) =>
      seq(choice("all", "some", "no"), $.identifier, "in", $._expr, "|", $._expr),
    paren_expr: ($) => seq("(", $._expr, ")"),

    arg_list: ($) => seq($._expr, repeat(seq(",", $._expr))),

    identifier: ($) => /[a-zA-Z_][a-zA-Z0-9_]*/,
    string: ($) => /"([^"\\]|\\.)*"/,
    number: ($) => /\d+(\.\d+)?/,
    boolean: ($) => choice("true", "false"),
    comment: ($) => choice(seq("//", /.*/), seq("/*", /[^*]*\*+([^/*][^*]*\*+)*/, "/")),
  },
});

IDE integration. Tree-sitter grammars automatically enable syntax highlighting, code folding, and incremental parsing in editors that support tree-sitter (Neovim, Helix, Zed, and VS Code via the tree-sitter extension). The incremental parsing is especially valuable for an IDE/LSP scenario where specs are edited live.

2.4 PEG parsers

pest (Rust) grammar snippet:

operation_decl = {
    "operation" ~ ident ~ "{" ~ operation_body* ~ "}"
}

operation_body = {
    input_block | output_block | requires_block | ensures_block
}

input_block = {
    "input" ~ ":" ~ param_list
}

requires_block = {
    "requires" ~ ":" ~ expr_list
}

expr_list = {
    expr ~ (NEWLINE ~ expr)*
}

expr = {
    prefix_expr ~ (binary_op ~ prefix_expr)*
}

prefix_expr = {
    "#" ~ atom_expr
  | "not" ~ atom_expr
  | "pre" ~ "(" ~ expr ~ ")"
  | atom_expr
}

atom_expr = {
    ident ~ "'" |               // post-state
    ident ~ "(" ~ arg_list? ~ ")" |  // function call
    ident ~ "[" ~ expr ~ "]" |  // index
    ident |                     // variable
    string | number | boolean |
    "(" ~ expr ~ ")"
}

ident = @{ ASCII_ALPHA ~ (ASCII_ALPHANUMERIC | "_")* }

Limitations. PEG parsers cannot handle left recursion (expr = expr "+" expr). This requires restructuring the grammar into precedence-climbing or Pratt-style layers. Backtracking in ordered choice (/) can cause exponential parse times on pathological inputs, though pest mitigates this with packrat memoization.

2.5 Xtext (eclipse)

Grammar snippet

grammar org.spectorest.Spec with org.eclipse.xtext.common.Terminals

generate spec "http://www.spectorest.org/Spec"

SpecFile:
    service=ServiceDecl;

ServiceDecl:
    'service' name=ID '{' body+=ServiceBody* '}';

ServiceBody:
    EntityDecl | StateDecl | OperationDecl | InvariantDecl | ConventionsDecl;

OperationDecl:
    'operation' name=ID '{'
        (inputs=InputBlock)?
        (outputs=OutputBlock)?
        (requires=RequiresBlock)?
        (ensures=EnsuresBlock)?
    '}';

InputBlock:
    'input' ':' params+=Param (',' params+=Param)*;

RequiresBlock:
    'requires' ':' constraints+=Expr+;

Pros. Xtext generates a full Eclipse IDE (editor with syntax highlighting, content assist, outline view, error markers, quick fixes) automatically from the grammar. It also generates an LSP server for VS Code. This is the lowest-effort path to a complete IDE experience.

Cons. Heavyweight Eclipse/OSGi dependency. The generated LSP server requires a JVM. The Xtext ecosystem is Eclipse-centric and has declined in mindshare since 2020. Not recommended unless the team is already in the Eclipse ecosystem.

2.6 Parser technology recommendation

Decision: ANTLR4 via the antlr-ng TypeScript target.

ANTLR4 is the most mature and capable parser generator available. The antlr-ng project provides a native TypeScript target, enabling direct integration with the TypeScript compiler codebase without JVM dependencies at build time.

Note on Langium: The DSL framework evaluation (09_dsl_compiler_frameworks.md) initially recommended Langium (TypeScript-based language workbench by TypeFox) for its integrated LSP server, cross-reference resolution, and grammar-to-AST generation. A subsequent devil's advocate audit found critical risks: (1) Typir (Langium's companion type system library) cannot handle our refinement types, relation types, or quantified expressions, so ~80% of the type checker must be built custom regardless; (2) Langium's community is very thin (985 stars, 22 contributors); (3) TypeFox's Fastbelt announcement (Go-based, 21-33x faster) signals strategic drift from Langium. ANTLR4/antlr-ng was chosen for its battle-tested stability (17k+ stars, 20+ year track record), no framework lock-in, and the fact that we must build scoping, type checking, and validation ourselves in either case. Langium remains a future option if IDE support (LSP with rich completions) becomes a top-3 user requirement and the framework matures further.

Factor	ANTLR4 (antlr-ng)	Hand-written	tree-sitter
Development speed	High (grammar file)	Medium (write code)	High (grammar.js)
Error message quality	Good (customizable)	Best (full control)	Basic
IDE support generation	Good (TextMate export)	Manual	Excellent (native)
Maintenance burden	Low (change grammar)	Medium (change code)	Low (change grammar)
Grammar evolution	Easy (regenerate)	Requires code changes	Easy (regenerate)
TypeScript target	Native (antlr-ng)	N/A	Bindings exist

Phase plan

All phases: Use the ANTLR4 grammar (Section 2.2) with the antlr-ng TypeScript runtime. ANTLR4 generates a robust parser with visitor/listener patterns that integrate cleanly with TypeScript interfaces for the IR.
IDE: Generate a tree-sitter grammar from the ANTLR4 grammar (tools exist for this conversion) for editor syntax highlighting. The LSP server will use the ANTLR4 parser directly for completions and diagnostics.

3. Intermediate representation (ir) design

The IR is the central nervous system of the compiler. Every stage reads from or writes to the IR. It must be:

Complete: represent everything in the spec
Annotatable: the convention engine and verifier add information
Serializable: for debugging, caching, and incremental compilation
Traversable: visitors/pattern matching for each compiler stage

3.1 Complete IR type definitions

from __future__ import annotations
from dataclasses import dataclass, field
from enum import Enum, auto
from typing import List, Optional, Dict, Any
import json

# ===== Source Location =====

@dataclass(frozen=True)
class Span:
    start_line: int
    start_col: int
    end_line: int
    end_col: int

# ===== Top-Level =====

@dataclass
class ServiceIR:
    name: str
    entities: List[EntityDecl]
    state: StateDecl
    operations: List[OperationDecl]
    invariants: List[InvariantDecl]
    conventions: Optional[ConventionsDecl]
    # -- Added by convention engine --
    annotations: Dict[str, Any] = field(default_factory=dict)
    span: Optional[Span] = None

# ===== Entities =====

@dataclass
class EntityDecl:
    name: str
    fields: List[FieldDecl]
    invariants: List[Expr]
    span: Optional[Span] = None

@dataclass
class FieldDecl:
    name: str
    type_expr: TypeExpr
    constraints: List[Expr] = field(default_factory=list)
    span: Optional[Span] = None

# ===== Types =====

class TypeKind(Enum):
    SIMPLE = auto()      # String, Int, ShortCode
    OPTIONAL = auto()    # String?
    SET = auto()         # Set<ShortCode>
    LIST = auto()        # List<String>
    MAP = auto()         # Map<String, Int>

@dataclass
class TypeExpr:
    kind: TypeKind
    name: Optional[str] = None          # for SIMPLE
    inner: Optional[TypeExpr] = None    # for OPTIONAL, SET, LIST
    key: Optional[TypeExpr] = None      # for MAP
    value: Optional[TypeExpr] = None    # for MAP
    span: Optional[Span] = None

# ===== State =====

@dataclass
class StateDecl:
    relations: List[RelationDecl]
    span: Optional[Span] = None

class Multiplicity(Enum):
    ONE = "one"        # exactly one (total function)
    LONE = "lone"      # zero or one (partial function)
    SOME = "some"      # one or more
    SET = "set"        # zero or more (relation)

@dataclass
class RelationDecl:
    name: str
    from_type: TypeExpr
    to_type: TypeExpr
    multiplicity: Multiplicity
    span: Optional[Span] = None

# ===== Operations =====

@dataclass
class OperationDecl:
    name: str
    inputs: List[ParamDecl]
    outputs: List[ParamDecl]
    requires: List[Expr]
    ensures: List[Expr]
    span: Optional[Span] = None
    # -- Added by convention engine --
    http_method: Optional[str] = None
    http_path: Optional[str] = None
    http_status_success: Optional[int] = None
    http_status_errors: Dict[str, int] = field(default_factory=dict)

@dataclass
class ParamDecl:
    name: str
    type_expr: TypeExpr
    span: Optional[Span] = None

# ===== Invariants =====

@dataclass
class InvariantDecl:
    expr: Expr
    span: Optional[Span] = None

# ===== Conventions =====

@dataclass
class ConventionsDecl:
    entries: List[ConventionEntry]
    span: Optional[Span] = None

@dataclass
class ConventionEntry:
    target: str         # e.g., "Resolve.http_method"
    key: Optional[str]  # e.g., "Location" for http_header
    value: Expr
    span: Optional[Span] = None

# ===== Expression AST =====

class ExprKind(Enum):
    BINARY_OP = auto()
    UNARY_OP = auto()
    QUANTIFIER = auto()
    FIELD_ACCESS = auto()
    INDEX = auto()
    FUNCTION_CALL = auto()
    LITERAL = auto()
    VARIABLE = auto()
    PRE_STATE = auto()
    POST_STATE = auto()
    SET_OP = auto()
    CARDINALITY = auto()
    MEMBERSHIP = auto()

class BinaryOp(Enum):
    AND = "and"
    OR = "or"
    EQ = "="
    NEQ = "!="
    LT = "<"
    GT = ">"
    LTE = "<="
    GTE = ">="
    ADD = "+"
    SUB = "-"
    MUL = "*"
    DIV = "/"

class UnaryOp(Enum):
    NOT = "not"
    NEGATE = "-"

class QuantifierKind(Enum):
    FORALL = "all"
    EXISTS = "some"
    NONE = "no"

class LiteralKind(Enum):
    STRING = auto()
    INT = auto()
    FLOAT = auto()
    BOOL = auto()

@dataclass
class Expr:
    """Tagged union for expression nodes."""
    kind: ExprKind
    span: Optional[Span] = None

    # -- BINARY_OP --
    binary_op: Optional[BinaryOp] = None
    left: Optional[Expr] = None
    right: Optional[Expr] = None

    # -- UNARY_OP --
    unary_op: Optional[UnaryOp] = None
    operand: Optional[Expr] = None

    # -- QUANTIFIER --
    quantifier_kind: Optional[QuantifierKind] = None
    quantifier_var: Optional[str] = None
    quantifier_domain: Optional[Expr] = None
    quantifier_body: Optional[Expr] = None

    # -- FIELD_ACCESS --
    base: Optional[Expr] = None
    field_name: Optional[str] = None

    # -- INDEX --
    index_base: Optional[Expr] = None
    index_key: Optional[Expr] = None

    # -- FUNCTION_CALL --
    func_name: Optional[str] = None
    func_args: List[Expr] = field(default_factory=list)

    # -- LITERAL --
    literal_kind: Optional[LiteralKind] = None
    literal_value: Any = None

    # -- VARIABLE --
    var_name: Optional[str] = None

    # -- PRE_STATE / POST_STATE --
    state_expr: Optional[Expr] = None

    # -- CARDINALITY --
    card_expr: Optional[Expr] = None

    # -- MEMBERSHIP (a in b, a not in b) --
    membership_negated: bool = False
    membership_element: Optional[Expr] = None
    membership_collection: Optional[Expr] = None

TypeScript equivalent using discriminated unions (production implementation)

interface Span {
  readonly startLine: number;
  readonly startCol: number;
  readonly endLine: number;
  readonly endCol: number;
}

type Expr =
  | { kind: "BinaryOp"; op: BinaryOperator; left: Expr; right: Expr; span?: Span }
  | { kind: "UnaryOp"; op: UnaryOperator; operand: Expr; span?: Span }
  | {
      kind: "Quantifier";
      quantifierKind: QuantifierKind;
      variable: string;
      domain: Expr;
      body: Expr;
      span?: Span;
    }
  | { kind: "FieldAccess"; base: Expr; field: string; span?: Span }
  | { kind: "Index"; base: Expr; key: Expr; span?: Span }
  | { kind: "FunctionCall"; name: string; args: Expr[]; span?: Span }
  | { kind: "Literal"; value: unknown; literalKind: LiteralKind; span?: Span }
  | { kind: "Variable"; name: string; span?: Span }
  | { kind: "PreState"; expr: Expr; span?: Span }
  | { kind: "PostState"; expr: Expr; span?: Span }
  | { kind: "Cardinality"; expr: Expr; span?: Span }
  | { kind: "Membership"; element: Expr; collection: Expr; negated: boolean; span?: Span };

The TypeScript discriminated union provides exhaustive checking via switch on the kind field, and each variant carries only its own data. This is the IR representation used by the production compiler.

3.2 Complete IR for the URL shortener spec

url_shortener_ir = ServiceIR(
    name="UrlShortener",
    entities=[
        EntityDecl(
            name="ShortCode",
            fields=[FieldDecl(name="value", type_expr=TypeExpr(kind=TypeKind.SIMPLE, name="String"))],
            invariants=[
                # len(value) >= 6 and len(value) <= 10
                Expr(kind=ExprKind.BINARY_OP, binary_op=BinaryOp.AND,
                    left=Expr(kind=ExprKind.BINARY_OP, binary_op=BinaryOp.GTE,
                        left=Expr(kind=ExprKind.FUNCTION_CALL, func_name="len",
                            func_args=[Expr(kind=ExprKind.VARIABLE, var_name="value")]),
                        right=Expr(kind=ExprKind.LITERAL, literal_kind=LiteralKind.INT, literal_value=6)),
                    right=Expr(kind=ExprKind.BINARY_OP, binary_op=BinaryOp.LTE,
                        left=Expr(kind=ExprKind.FUNCTION_CALL, func_name="len",
                            func_args=[Expr(kind=ExprKind.VARIABLE, var_name="value")]),
                        right=Expr(kind=ExprKind.LITERAL, literal_kind=LiteralKind.INT, literal_value=10))),
                # value matches /^[a-zA-Z0-9]+$/
                Expr(kind=ExprKind.FUNCTION_CALL, func_name="matches",
                    func_args=[
                        Expr(kind=ExprKind.VARIABLE, var_name="value"),
                        Expr(kind=ExprKind.LITERAL, literal_kind=LiteralKind.STRING,
                             literal_value="[a-zA-Z0-9]+")]),
            ],
        ),
        EntityDecl(
            name="LongURL",
            fields=[FieldDecl(name="value", type_expr=TypeExpr(kind=TypeKind.SIMPLE, name="String"))],
            invariants=[
                Expr(kind=ExprKind.FUNCTION_CALL, func_name="isValidURI",
                    func_args=[Expr(kind=ExprKind.VARIABLE, var_name="value")]),
            ],
        ),
    ],
    state=StateDecl(relations=[
        RelationDecl(
            name="store",
            from_type=TypeExpr(kind=TypeKind.SIMPLE, name="ShortCode"),
            to_type=TypeExpr(kind=TypeKind.SIMPLE, name="LongURL"),
            multiplicity=Multiplicity.LONE,
        ),
        RelationDecl(
            name="created_at",
            from_type=TypeExpr(kind=TypeKind.SIMPLE, name="ShortCode"),
            to_type=TypeExpr(kind=TypeKind.SIMPLE, name="DateTime"),
            multiplicity=Multiplicity.ONE,
        ),
    ]),
    operations=[
        OperationDecl(
            name="Shorten",
            inputs=[ParamDecl(name="url", type_expr=TypeExpr(kind=TypeKind.SIMPLE, name="LongURL"))],
            outputs=[
                ParamDecl(name="code", type_expr=TypeExpr(kind=TypeKind.SIMPLE, name="ShortCode")),
                ParamDecl(name="short_url", type_expr=TypeExpr(kind=TypeKind.SIMPLE, name="String")),
            ],
            requires=[
                # isValidURI(url.value)
                Expr(kind=ExprKind.FUNCTION_CALL, func_name="isValidURI",
                    func_args=[Expr(kind=ExprKind.FIELD_ACCESS,
                        base=Expr(kind=ExprKind.VARIABLE, var_name="url"),
                        field_name="value")]),
            ],
            ensures=[
                # code not in pre(store)
                Expr(kind=ExprKind.MEMBERSHIP, membership_negated=True,
                    membership_element=Expr(kind=ExprKind.VARIABLE, var_name="code"),
                    membership_collection=Expr(kind=ExprKind.PRE_STATE,
                        state_expr=Expr(kind=ExprKind.VARIABLE, var_name="store"))),
                # store'[code] = url
                Expr(kind=ExprKind.BINARY_OP, binary_op=BinaryOp.EQ,
                    left=Expr(kind=ExprKind.INDEX,
                        index_base=Expr(kind=ExprKind.POST_STATE,
                            state_expr=Expr(kind=ExprKind.VARIABLE, var_name="store")),
                        index_key=Expr(kind=ExprKind.VARIABLE, var_name="code")),
                    right=Expr(kind=ExprKind.VARIABLE, var_name="url")),
                # short_url = base_url + "/" + code.value
                Expr(kind=ExprKind.BINARY_OP, binary_op=BinaryOp.EQ,
                    left=Expr(kind=ExprKind.VARIABLE, var_name="short_url"),
                    right=Expr(kind=ExprKind.BINARY_OP, binary_op=BinaryOp.ADD,
                        left=Expr(kind=ExprKind.BINARY_OP, binary_op=BinaryOp.ADD,
                            left=Expr(kind=ExprKind.VARIABLE, var_name="base_url"),
                            right=Expr(kind=ExprKind.LITERAL, literal_kind=LiteralKind.STRING,
                                       literal_value="/")),
                        right=Expr(kind=ExprKind.FIELD_ACCESS,
                            base=Expr(kind=ExprKind.VARIABLE, var_name="code"),
                            field_name="value"))),
                # #store' = #store + 1
                Expr(kind=ExprKind.BINARY_OP, binary_op=BinaryOp.EQ,
                    left=Expr(kind=ExprKind.CARDINALITY,
                        card_expr=Expr(kind=ExprKind.POST_STATE,
                            state_expr=Expr(kind=ExprKind.VARIABLE, var_name="store"))),
                    right=Expr(kind=ExprKind.BINARY_OP, binary_op=BinaryOp.ADD,
                        left=Expr(kind=ExprKind.CARDINALITY,
                            card_expr=Expr(kind=ExprKind.VARIABLE, var_name="store")),
                        right=Expr(kind=ExprKind.LITERAL, literal_kind=LiteralKind.INT, literal_value=1))),
            ],
        ),
        OperationDecl(
            name="Resolve",
            inputs=[ParamDecl(name="code", type_expr=TypeExpr(kind=TypeKind.SIMPLE, name="ShortCode"))],
            outputs=[ParamDecl(name="url", type_expr=TypeExpr(kind=TypeKind.SIMPLE, name="LongURL"))],
            requires=[
                # code in store
                Expr(kind=ExprKind.MEMBERSHIP, membership_negated=False,
                    membership_element=Expr(kind=ExprKind.VARIABLE, var_name="code"),
                    membership_collection=Expr(kind=ExprKind.VARIABLE, var_name="store")),
            ],
            ensures=[
                # url = store[code]
                Expr(kind=ExprKind.BINARY_OP, binary_op=BinaryOp.EQ,
                    left=Expr(kind=ExprKind.VARIABLE, var_name="url"),
                    right=Expr(kind=ExprKind.INDEX,
                        index_base=Expr(kind=ExprKind.VARIABLE, var_name="store"),
                        index_key=Expr(kind=ExprKind.VARIABLE, var_name="code"))),
                # store' = store
                Expr(kind=ExprKind.BINARY_OP, binary_op=BinaryOp.EQ,
                    left=Expr(kind=ExprKind.POST_STATE,
                        state_expr=Expr(kind=ExprKind.VARIABLE, var_name="store")),
                    right=Expr(kind=ExprKind.VARIABLE, var_name="store")),
            ],
        ),
        OperationDecl(
            name="Delete",
            inputs=[ParamDecl(name="code", type_expr=TypeExpr(kind=TypeKind.SIMPLE, name="ShortCode"))],
            outputs=[],
            requires=[
                # code in store
                Expr(kind=ExprKind.MEMBERSHIP, membership_negated=False,
                    membership_element=Expr(kind=ExprKind.VARIABLE, var_name="code"),
                    membership_collection=Expr(kind=ExprKind.VARIABLE, var_name="store")),
            ],
            ensures=[
                # code not in store'
                Expr(kind=ExprKind.MEMBERSHIP, membership_negated=True,
                    membership_element=Expr(kind=ExprKind.VARIABLE, var_name="code"),
                    membership_collection=Expr(kind=ExprKind.POST_STATE,
                        state_expr=Expr(kind=ExprKind.VARIABLE, var_name="store"))),
                # #store' = #store - 1
                Expr(kind=ExprKind.BINARY_OP, binary_op=BinaryOp.EQ,
                    left=Expr(kind=ExprKind.CARDINALITY,
                        card_expr=Expr(kind=ExprKind.POST_STATE,
                            state_expr=Expr(kind=ExprKind.VARIABLE, var_name="store"))),
                    right=Expr(kind=ExprKind.BINARY_OP, binary_op=BinaryOp.SUB,
                        left=Expr(kind=ExprKind.CARDINALITY,
                            card_expr=Expr(kind=ExprKind.VARIABLE, var_name="store")),
                        right=Expr(kind=ExprKind.LITERAL, literal_kind=LiteralKind.INT, literal_value=1))),
            ],
        ),
    ],
    invariants=[
        # all c in store | isValidURI(store[c].value)
        InvariantDecl(expr=Expr(kind=ExprKind.QUANTIFIER,
            quantifier_kind=QuantifierKind.FORALL,
            quantifier_var="c",
            quantifier_domain=Expr(kind=ExprKind.VARIABLE, var_name="store"),
            quantifier_body=Expr(kind=ExprKind.FUNCTION_CALL, func_name="isValidURI",
                func_args=[Expr(kind=ExprKind.FIELD_ACCESS,
                    base=Expr(kind=ExprKind.INDEX,
                        index_base=Expr(kind=ExprKind.VARIABLE, var_name="store"),
                        index_key=Expr(kind=ExprKind.VARIABLE, var_name="c")),
                    field_name="value")]))),
        # all c in store | c in created_at
        InvariantDecl(expr=Expr(kind=ExprKind.QUANTIFIER,
            quantifier_kind=QuantifierKind.FORALL,
            quantifier_var="c",
            quantifier_domain=Expr(kind=ExprKind.VARIABLE, var_name="store"),
            quantifier_body=Expr(kind=ExprKind.MEMBERSHIP, membership_negated=False,
                membership_element=Expr(kind=ExprKind.VARIABLE, var_name="c"),
                membership_collection=Expr(kind=ExprKind.VARIABLE, var_name="created_at")))),
    ],
    conventions=ConventionsDecl(entries=[
        ConventionEntry(target="Resolve.http_method", key=None,
            value=Expr(kind=ExprKind.VARIABLE, var_name="GET")),
        ConventionEntry(target="Resolve.http_status_success", key=None,
            value=Expr(kind=ExprKind.LITERAL, literal_kind=LiteralKind.INT, literal_value=302)),
        ConventionEntry(target="Resolve.http_header", key="Location",
            value=Expr(kind=ExprKind.FIELD_ACCESS,
                base=Expr(kind=ExprKind.VARIABLE, var_name="output"),
                field_name="url")),
    ]),
)

3.3 Design decisions

Mutable vs immutable IR nodes

The IR uses mutable nodes for the initial implementation. The convention engine mutates OperationDecl in-place by setting http_method, http_path, etc. This is pragmatic for the MVP but has known downsides:

Difficult to implement undo/rollback
Hard to cache intermediate states
Debugging "who set this field?" requires tracing

For the production TypeScript compiler, use readonly interfaces with spread-based copies. The convention engine produces a new IR with annotations applied, leaving the original intact. This enables diffing (before/after convention application) and caching.

Visitor pattern vs pattern matching for IR traversal

For Python: use a match statement (Python 3.10+) on ExprKind for simple traversals, and a Visitor base class for multi-pass transformations:

class ExprVisitor:
    def visit(self, expr: Expr) -> Any:
        method_name = f"visit_{expr.kind.name.lower()}"
        visitor = getattr(self, method_name, self.generic_visit)
        return visitor(expr)

    def generic_visit(self, expr: Expr) -> Any:
        raise NotImplementedError(f"No visitor for {expr.kind}")

class ExprPrinter(ExprVisitor):
    def visit_binary_op(self, expr: Expr) -> str:
        left = self.visit(expr.left)
        right = self.visit(expr.right)
        return f"({left} {expr.binary_op.value} {right})"

    def visit_variable(self, expr: Expr) -> str:
        return expr.var_name

    def visit_literal(self, expr: Expr) -> str:
        if expr.literal_kind == LiteralKind.STRING:
            return f'"{expr.literal_value}"'
        return str(expr.literal_value)

    def visit_cardinality(self, expr: Expr) -> str:
        return f"#{self.visit(expr.card_expr)}"

    def visit_pre_state(self, expr: Expr) -> str:
        return f"pre({self.visit(expr.state_expr)})"

    def visit_post_state(self, expr: Expr) -> str:
        return f"{self.visit(expr.state_expr)}'"

For Kotlin: exhaustive when expressions on sealed classes are both cleaner and compiler-checked:

fun printExpr(expr: Expr): String = when (expr) {
    is Expr.BinaryOp -> "(${printExpr(expr.left)} ${expr.op.symbol} ${printExpr(expr.right)})"
    is Expr.Variable -> expr.name
    is Expr.Literal -> if (expr.kind == LiteralKind.STRING) "\"${expr.value}\"" else "${expr.value}"
    is Expr.Cardinality -> "#${printExpr(expr.expr)}"
    is Expr.PreState -> "pre(${printExpr(expr.expr)})"
    is Expr.PostState -> "${printExpr(expr.expr)}'"
    // ... all variants must be handled or the compiler errors
}

Convention engine annotation strategy

Use a separate annotation layer rather than inlining annotations into the IR nodes. The annotation layer is a map from IR node identity (by path or ID) to annotation records:

@dataclass
class ConventionAnnotations:
    """Annotations applied by the convention engine."""
    operation_annotations: Dict[str, OperationAnnotation]  # keyed by operation name
    entity_annotations: Dict[str, EntityAnnotation]        # keyed by entity name
    relation_annotations: Dict[str, RelationAnnotation]    # keyed by relation name

@dataclass
class OperationAnnotation:
    http_method: str              # GET, POST, PUT, PATCH, DELETE
    http_path: str                # /shorten, /{code}
    http_status_success: int      # 200, 201, 204, 302
    http_status_errors: Dict[str, int]  # "not_found" -> 404, "validation" -> 422
    request_body_model: Optional[str]   # Pydantic model name
    response_body_model: Optional[str]
    path_params: List[str]
    query_params: List[str]
    operation_kind: str           # "create", "read", "update", "delete", "action"
    is_idempotent: bool
    headers: Dict[str, str]       # response headers

@dataclass
class EntityAnnotation:
    table_name: str
    model_class_name: str
    columns: List[ColumnAnnotation]

@dataclass
class ColumnAnnotation:
    column_name: str
    sql_type: str
    nullable: bool
    primary_key: bool
    check_constraint: Optional[str]

@dataclass
class RelationAnnotation:
    table_name: str
    from_column: str
    to_column: str
    foreign_key: bool
    junction_table: Optional[str]  # for many-to-many

This separation means the original IR from parsing is never mutated. The convention engine produces ConventionAnnotations as a separate artifact. The emitter consumes both the IR and the annotations. This enables:

Diffing: compare annotations from different convention profiles
Override: user convention overrides modify the annotation layer only
Caching: the annotation layer can be cached independently of parsing

IR serialization

Serialize to JSON for debugging and caching. Use a custom encoder that handles Enum values and dataclass nesting:

import json
from dataclasses import asdict

class IREncoder(json.JSONEncoder):
    def default(self, obj):
        if isinstance(obj, Enum):
            return {"__enum__": f"{type(obj).__name__}.{obj.name}"}
        if hasattr(obj, "__dataclass_fields__"):
            return asdict(obj)
        return super().default(obj)

def serialize_ir(ir: ServiceIR) -> str:
    return json.dumps(ir, cls=IREncoder, indent=2)

def deserialize_ir(text: str) -> ServiceIR:
    # Custom decoder that reconstructs dataclass instances
    data = json.loads(text)
    return _reconstruct_service_ir(data)

4. Solver integration architecture

4.1 Z3 integration

Translation from IR constraints to Z3

The key insight is that our spec constraints map to Z3 constructs as follows:

Spec construct	Z3 construct
`entity` field type	Z3 Sort (IntSort, StringSort, or uninterpreted)
`state` relation	Z3 Array or Function
`invariant`	Z3 assertion (universal quantifier over state)
`requires`	Z3 precondition (existential satisfiability)
`ensures`	Z3 postcondition (post-state implies ensures)
`#collection`	Z3 custom cardinality function
`all x in S \| P(x)`	`ForAll(x, Implies(member(x, S), P(x)))`

Complete example, URL shortener invariant checking

from z3 import (
    Solver, DeclareSort, Function, BoolSort, IntSort, StringSort,
    ForAll, Exists, Implies, And, Or, Not, Const, sat, unsat, unknown
)

def check_url_shortener_invariants():
    """
    Verify that the URL shortener spec's invariants are consistent
    (not mutually contradictory) and that operations preserve them.
    """
    s = Solver()
    s.set("timeout", 30000)  # 30 second timeout

    # Declare sorts for entity types
    ShortCode = DeclareSort("ShortCode")
    LongURL = DeclareSort("LongURL")

    # State relations as functions
    # store: ShortCode -> lone LongURL (partial function)
    store = Function("store", ShortCode, LongURL)
    store_defined = Function("store_defined", ShortCode, BoolSort())

    # created_at: ShortCode -> DateTime
    created_at_defined = Function("created_at_defined", ShortCode, BoolSort())

    # Helper: valid_uri as uninterpreted predicate
    valid_uri = Function("valid_uri", LongURL, BoolSort())

    # Helper: code_len as uninterpreted function
    code_len = Function("code_len", ShortCode, IntSort())

    # --- Entity invariants ---
    c = Const("c", ShortCode)

    # ShortCode.invariant: len(value) >= 6 and len(value) <= 10
    s.add(ForAll(c, And(code_len(c) >= 6, code_len(c) <= 10)))

    # --- Global invariants ---

    # Invariant 1: all c in store | isValidURI(store[c].value)
    s.add(ForAll(c, Implies(store_defined(c), valid_uri(store(c)))))

    # Invariant 2: all c in store | c in created_at
    s.add(ForAll(c, Implies(store_defined(c), created_at_defined(c))))

    # --- Check consistency ---
    # The invariants should be satisfiable (not contradictory)
    result = s.check()

    if result == sat:
        print("Invariants are consistent (satisfiable)")
        model = s.model()
        print(f"  Example model: {model}")
        return True
    elif result == unsat:
        print("ERROR: Invariants are contradictory!")
        print(f"  Unsat core: {s.unsat_core()}")
        return False
    else:
        print("WARNING: Solver returned unknown (timeout or undecidable)")
        return None

def check_operation_preserves_invariants():
    """
    Verify that the Shorten operation preserves the global invariant
    'all c in store | isValidURI(store[c].value)'.

    Encoding: if invariant holds in pre-state AND requires holds AND
    ensures holds, THEN invariant holds in post-state.
    """
    s = Solver()
    s.set("timeout", 60000)

    ShortCode = DeclareSort("ShortCode")
    LongURL = DeclareSort("LongURL")

    # Pre-state
    store_pre = Function("store_pre", ShortCode, LongURL)
    store_pre_defined = Function("store_pre_defined", ShortCode, BoolSort())

    # Post-state
    store_post = Function("store_post", ShortCode, LongURL)
    store_post_defined = Function("store_post_defined", ShortCode, BoolSort())

    valid_uri = Function("valid_uri", LongURL, BoolSort())

    c = Const("c", ShortCode)
    new_code = Const("new_code", ShortCode)
    new_url = Const("new_url", LongURL)

    # Assume: invariant holds in pre-state
    s.add(ForAll(c, Implies(store_pre_defined(c), valid_uri(store_pre(c)))))

    # Assume: requires holds (isValidURI(url.value))
    s.add(valid_uri(new_url))

    # Assume: ensures holds
    # code not in pre(store)
    s.add(Not(store_pre_defined(new_code)))
    # store'[code] = url
    s.add(store_post(new_code) == new_url)
    s.add(store_post_defined(new_code))
    # Frame condition: everything else unchanged
    s.add(ForAll(c, Implies(
        c != new_code,
        And(
            store_post_defined(c) == store_pre_defined(c),
            store_post(c) == store_pre(c)
        )
    )))

    # Try to VIOLATE the invariant in post-state
    violation = Const("violation", ShortCode)
    s.add(store_post_defined(violation))
    s.add(Not(valid_uri(store_post(violation))))

    result = s.check()
    if result == unsat:
        print("VERIFIED: Shorten preserves the isValidURI invariant")
        return True
    elif result == sat:
        model = s.model()
        print("COUNTEREXAMPLE: Shorten can violate the invariant")
        print(f"  Violating code: {model[violation]}")
        return False
    else:
        print("WARNING: Solver returned unknown")
        return None

Timeout management. Z3 timeouts are set per-solver instance via s.set("timeout", milliseconds). Our compiler should use a tiered approach:

Quick check (5s): basic consistency of invariants
Medium check (30s): operation preservation of invariants
Deep check (120s): cross-operation interaction analysis
User-configurable via --verify-timeout

If Z3 returns unknown, report a warning rather than an error. The user can increase the timeout or accept the risk.

4.2 Alloy integration

How to invoke alloyapi from JVM (Kotlin)

import edu.mit.csail.sdg.alloy4.A4Reporter
import edu.mit.csail.sdg.alloy4compiler.ast.Command
import edu.mit.csail.sdg.alloy4compiler.parser.CompUtil
import edu.mit.csail.sdg.alloy4compiler.translator.A4Options
import edu.mit.csail.sdg.alloy4compiler.translator.TranslateAlloyToKodkod

class AlloyBackend {
    fun checkSpec(alloySource: String): AlloyResult {
        val reporter = A4Reporter()
        val world = CompUtil.parseEverything_fromString(reporter, alloySource)

        val options = A4Options().apply {
            solver = A4Options.SatSolver.SAT4J
            skolemDepth = 2
        }

        val results = mutableListOf<CommandResult>()
        for (command in world.allCommands) {
            val solution = TranslateAlloyToKodkod.execute_command(
                reporter, world.allReachableSigs, command, options
            )
            results.add(CommandResult(
                command = command.label,
                satisfiable = solution.satisfiable(),
                counterexample = if (solution.satisfiable()) solution.toString() else null
            ))
        }
        return AlloyResult(results)
    }
}

How to translate IR to Alloy source text

class AlloyTranslator:
    def translate(self, ir: ServiceIR) -> str:
        lines = []

        # Entities become sigs
        for entity in ir.entities:
            fields = ", ".join(
                f"{f.name}: {self.translate_type(f.type_expr)}"
                for f in entity.fields
            )
            lines.append(f"sig {entity.name} {{ {fields} }}")
            for inv in entity.invariants:
                lines.append(f"fact {{ all self: {entity.name} | {self.translate_expr(inv, 'self')} }}")
            lines.append("")

        # State relations become a single State sig
        lines.append("one sig State {")
        for rel in ir.state.relations:
            alloy_mult = {"one": "one", "lone": "lone", "some": "some", "set": "set"}
            lines.append(f"  {rel.name}: {rel.from_type.name} -> {alloy_mult[rel.multiplicity.value]} {rel.to_type.name},")
        lines.append("}")
        lines.append("")

        # Global invariants become facts
        for inv in ir.invariants:
            lines.append(f"fact {{ {self.translate_expr(inv.expr)} }}")

        # Operations become predicates
        for op in ir.operations:
            params = ", ".join(f"{p.name}: {p.type_expr.name}" for p in op.inputs + op.outputs)
            lines.append(f"pred {op.name}[{params}, s, s': State] {{")
            for req in op.requires:
                lines.append(f"  {self.translate_expr(req, state_var='s')}")
            for ens in op.ensures:
                lines.append(f"  {self.translate_expr(ens, state_var='s', post_state_var=\"s'\")}")
            lines.append("}")
            lines.append("")

        # Check that operations preserve invariants
        for op in ir.operations:
            lines.append(f"check {op.name}_preserves_invariants {{")
            lines.append(f"  all s, s': State, {self._op_params(op)} |")
            lines.append(f"    {op.name}[{self._op_args(op)}, s, s'] implies {{")
            for inv in ir.invariants:
                lines.append(f"      {self.translate_expr(inv.expr, state_var=\"s'\")}")
            lines.append(f"    }}")
            lines.append(f"}} for 5")
            lines.append("")

        return "\n".join(lines)

How to parse Alloy analyzer output

Alloy Analyzer returns A4Solution objects. Key fields:

solution.satisfiable(): boolean, was a counterexample found?
solution.getAllAtoms(): all atoms in the counterexample instance
solution.eval(expr): evaluate an expression in the counterexample
solution.toString(): human-readable counterexample text

For subprocess-based integration (TypeScript), run java -jar alloy.jar and parse stdout. The output format is XML (-xml flag) or text. XML parsing gives structured access to atoms and tuples.

4.3 Dafny integration

How to invoke the Dafny compiler programmatically

Dafny is distributed as a .NET tool (dotnet tool install dafny). Invocation is via subprocess:

import subprocess
import json
import re
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class DafnyVerificationResult:
    verified: bool
    errors: List["DafnyError"]
    counterexamples: List[str]

@dataclass
class DafnyError:
    file: str
    line: int
    col: int
    message: str
    related: List[str]

class DafnyBackend:
    def __init__(self, dafny_path: str = "dafny"):
        self.dafny_path = dafny_path

    def verify(self, dafny_source: str, timeout: int = 120) -> DafnyVerificationResult:
        """Write source to temp file, invoke dafny verify, parse results."""
        import tempfile, os
        with tempfile.NamedTemporaryFile(mode="w", suffix=".dfy", delete=False) as f:
            f.write(dafny_source)
            temp_path = f.name

        try:
            result = subprocess.run(
                [self.dafny_path, "verify", "--verification-time-limit", str(timeout),
                 "--json-diagnostics", temp_path],
                capture_output=True, text=True, timeout=timeout + 30
            )
            return self._parse_output(result.stdout, result.stderr, result.returncode)
        finally:
            os.unlink(temp_path)

    def compile_to_python(self, dafny_source: str) -> str:
        """Compile verified Dafny to Python."""
        import tempfile, os
        with tempfile.NamedTemporaryFile(mode="w", suffix=".dfy", delete=False) as f:
            f.write(dafny_source)
            temp_path = f.name

        try:
            out_dir = tempfile.mkdtemp()
            result = subprocess.run(
                [self.dafny_path, "translate", "py",
                 "--output", os.path.join(out_dir, "output"),
                 temp_path],
                capture_output=True, text=True, timeout=120
            )
            if result.returncode != 0:
                raise RuntimeError(f"Dafny compilation failed: {result.stderr}")

            # Read the generated Python file
            py_files = [f for f in os.listdir(out_dir) if f.endswith(".py")]
            if not py_files:
                raise RuntimeError("No Python output generated")
            with open(os.path.join(out_dir, py_files[0])) as f:
                return f.read()
        finally:
            os.unlink(temp_path)

    def _parse_output(self, stdout: str, stderr: str, returncode: int) -> DafnyVerificationResult:
        errors = []
        counterexamples = []

        # Parse JSON diagnostics if available
        for line in stderr.splitlines():
            m = re.match(r"(.+)\((\d+),(\d+)\): Error: (.+)", line)
            if m:
                errors.append(DafnyError(
                    file=m.group(1), line=int(m.group(2)),
                    col=int(m.group(3)), message=m.group(4), related=[]
                ))

            # Extract counterexample values
            if "counterexample" in line.lower():
                counterexamples.append(line.strip())

        return DafnyVerificationResult(
            verified=(returncode == 0 and len(errors) == 0),
            errors=errors,
            counterexamples=counterexamples,
        )

Complete example, generate Dafny from IR and verify

class DafnyGenerator:
    def generate_operation(self, op: OperationDecl, entities: List[EntityDecl],
                           state: StateDecl) -> str:
        """Generate a Dafny method from a spec operation."""
        lines = []

        # Generate state type
        lines.append("// Auto-generated from spec")
        lines.append("type ShortCode = s: string | 6 <= |s| <= 10 && forall i :: 0 <= i < |s| ==> s[i] in 'a'..'z' + 'A'..'Z' + '0'..'9'")
        lines.append("type LongURL = string  // simplified; full validation in runtime")
        lines.append("")
        lines.append("class Store {")
        lines.append("  var mapping: map<ShortCode, LongURL>")
        lines.append("")

        # Generate method with pre/postconditions
        input_params = ", ".join(f"{p.name}: {self._dafny_type(p.type_expr)}" for p in op.inputs)
        output_params = ", ".join(f"{p.name}: {self._dafny_type(p.type_expr)}" for p in op.outputs)

        returns_clause = f" returns ({output_params})" if op.outputs else ""

        lines.append(f"  method {op.name}({input_params}){returns_clause}")
        lines.append(f"    modifies this")

        # Requires clauses
        for req in op.requires:
            lines.append(f"    requires {self._dafny_expr(req)}")

        # Ensures clauses
        for ens in op.ensures:
            lines.append(f"    ensures {self._dafny_expr(ens)}")

        lines.append("  {")
        lines.append("    // TODO: LLM-synthesized body")
        lines.append("  }")
        lines.append("}")

        return "\n".join(lines)

4.4 LLM API integration

Architecture

import anthropic
import time
import hashlib
from typing import Optional
from dataclasses import dataclass, field

@dataclass
class SynthesisConfig:
    model: str = "claude-sonnet-4-20250514"
    max_iterations: int = 8
    temperature: float = 0.3
    max_tokens: int = 4096
    cost_budget_usd: float = 5.0
    cache_dir: Optional[str] = ".spec-to-rest-cache/synthesis"

@dataclass
class SynthesisResult:
    code: str
    verified: bool
    iterations: int
    total_tokens: int
    cost_usd: float
    verification_errors: List[str] = field(default_factory=list)

class LLMSynthesisEngine:
    def __init__(self, config: SynthesisConfig, dafny: DafnyBackend):
        self.config = config
        self.dafny = dafny
        self.client = anthropic.Anthropic()
        self.total_tokens = 0
        self.total_cost = 0.0

    def synthesize_operation(self, op: OperationDecl, context: ServiceIR) -> SynthesisResult:
        """
        CEGIS loop: generate Dafny code via LLM, verify, feed errors back.
        """
        # Check cache first
        cache_key = self._cache_key(op, context)
        cached = self._load_cache(cache_key)
        if cached:
            return cached

        prompt = self._build_initial_prompt(op, context)
        errors_history = []

        for iteration in range(self.config.max_iterations):
            # Check cost budget
            if self.total_cost >= self.config.cost_budget_usd:
                return SynthesisResult(
                    code="// Budget exhausted -- manual implementation needed",
                    verified=False, iterations=iteration,
                    total_tokens=self.total_tokens, cost_usd=self.total_cost,
                    verification_errors=errors_history[-1] if errors_history else [],
                )

            # Generate candidate
            if iteration == 0:
                messages = [{"role": "user", "content": prompt}]
            else:
                messages = self._build_repair_messages(prompt, errors_history)

            response = self.client.messages.create(
                model=self.config.model,
                max_tokens=self.config.max_tokens,
                temperature=self.config.temperature,
                messages=messages,
            )

            # Track costs
            input_tokens = response.usage.input_tokens
            output_tokens = response.usage.output_tokens
            self.total_tokens += input_tokens + output_tokens
            self.total_cost += self._estimate_cost(input_tokens, output_tokens)

            # Extract Dafny code from response
            dafny_code = self._extract_dafny_code(response.content[0].text)
            if not dafny_code:
                errors_history.append(["Failed to extract Dafny code from response"])
                continue

            # Verify with Dafny
            verification = self.dafny.verify(dafny_code)
            if verification.verified:
                result = SynthesisResult(
                    code=dafny_code, verified=True,
                    iterations=iteration + 1,
                    total_tokens=self.total_tokens, cost_usd=self.total_cost,
                )
                self._save_cache(cache_key, result)
                return result

            # Collect errors for next iteration
            error_msgs = [e.message for e in verification.errors]
            if verification.counterexamples:
                error_msgs.extend(verification.counterexamples)
            errors_history.append(error_msgs)

        # Exhausted iterations
        return SynthesisResult(
            code=dafny_code if dafny_code else "",
            verified=False, iterations=self.config.max_iterations,
            total_tokens=self.total_tokens, cost_usd=self.total_cost,
            verification_errors=errors_history[-1] if errors_history else [],
        )

    def _build_initial_prompt(self, op: OperationDecl, context: ServiceIR) -> str:
        return f"""You are generating a verified Dafny implementation for a REST service operation.

## Service Context
Service: {context.name}
Entities: {', '.join(e.name for e in context.entities)}

## Operation: {op.name}

Inputs:
{chr(10).join(f'  {p.name}: {self._type_str(p.type_expr)}' for p in op.inputs)}

Outputs:
{chr(10).join(f'  {p.name}: {self._type_str(p.type_expr)}' for p in op.outputs)}

Preconditions (requires):
{chr(10).join(f'  {self._expr_str(r)}' for r in op.requires)}

Postconditions (ensures):
{chr(10).join(f'  {self._expr_str(e)}' for e in op.ensures)}

## Task
Generate a complete Dafny method that:
1. Has the exact requires/ensures clauses above
2. Contains an implementation body that satisfies all postconditions
3. Compiles and verifies with `dafny verify`

Return ONLY the Dafny code in a ```dafny code block."""

    def _build_repair_messages(self, initial_prompt: str,
                                errors_history: List[List[str]]) -> list:
        messages = [{"role": "user", "content": initial_prompt}]
        for i, errors in enumerate(errors_history):
            messages.append({"role": "assistant", "content": f"[Previous attempt {i+1}]"})
            messages.append({"role": "user", "content":
                f"The Dafny verifier reported these errors:\n" +
                "\n".join(f"- {e}" for e in errors) +
                "\n\nPlease fix the implementation and try again. Return ONLY the corrected Dafny code."
            })
        return messages

    def _estimate_cost(self, input_tokens: int, output_tokens: int) -> float:
        # Approximate costs for Claude Sonnet (adjust per model)
        return (input_tokens * 3.0 + output_tokens * 15.0) / 1_000_000

    def _cache_key(self, op: OperationDecl, context: ServiceIR) -> str:
        content = f"{context.name}:{op.name}:{serialize_ir(context)}"
        return hashlib.sha256(content.encode()).hexdigest()[:16]

    def _load_cache(self, key: str) -> Optional[SynthesisResult]:
        if not self.config.cache_dir:
            return None
        import os, pickle
        path = os.path.join(self.config.cache_dir, f"{key}.pkl")
        if os.path.exists(path):
            with open(path, "rb") as f:
                return pickle.load(f)
        return None

    def _save_cache(self, key: str, result: SynthesisResult):
        if not self.config.cache_dir:
            return
        import os, pickle
        os.makedirs(self.config.cache_dir, exist_ok=True)
        path = os.path.join(self.config.cache_dir, f"{key}.pkl")
        with open(path, "wb") as f:
            pickle.dump(result, f)

    def _extract_dafny_code(self, text: str) -> Optional[str]:
        import re
        match = re.search(r"```dafny\n(.*?)```", text, re.DOTALL)
        if match:
            return match.group(1).strip()
        match = re.search(r"```\n(.*?)```", text, re.DOTALL)
        if match:
            return match.group(1).strip()
        return None

Token budget management. The SynthesisConfig.cost_budget_usd field caps total LLM spending per compilation. Each operation synthesis tracks cumulative cost. When the budget is exhausted, remaining operations get TODO stubs. A typical URL shortener spec with 3 operations costs approximately $0.10-0.50 depending on iteration count.

Retry logic. The CEGIS loop itself provides semantic retry (verification error feedback). For API-level failures (rate limits, network errors), add exponential backoff:

import time

def call_with_retry(fn, max_retries=3, base_delay=1.0):
    for attempt in range(max_retries):
        try:
            return fn()
        except anthropic.RateLimitError:
            delay = base_delay * (2 ** attempt)
            time.sleep(delay)
        except anthropic.APIConnectionError:
            delay = base_delay * (2 ** attempt)
            time.sleep(delay)
    raise RuntimeError(f"API call failed after {max_retries} retries")

5. Project structure

spec-to-rest/

README.md

LICENSE

package.jsonTypeScript build config

.github/

workflows/

ci.ymllint, typecheck, test on every push

release.ymlbuild + publish on tag

src/

spec_to_rest/

index.tsmain module entry point

cli.tsCLI argument parsing (commander/yargs)

config.pyconfiguration management

parser/

__init__.py

lexer.pytokenizer: spec text -> token stream

tokens.pytoken kind enum and Token dataclass

parser.pyrecursive descent parser: tokens -> AST

ast.pyAST node definitions (raw parse tree)

errors.pyparse error types with span info

ir/

__init__.py

types.pyIR type definitions (Section 3.1)

builder.pyAST -> IR transformation + desugaring

printer.pyIR -> human-readable format (for --verbose)

serializer.pyIR -> JSON (for caching, debugging)

validator.pyIR well-formedness checks (type resolution, etc.)

verify/

__init__.py

engine.pyverification orchestrator

z3_backend.pyIR -> Z3 translation and checking

alloy_backend.pyIR -> Alloy source and subprocess invocation

consistency.pyinvariant consistency checks

preservation.pyoperation-preserves-invariant checks

errors.pyverification error types + rendering

conventions/

__init__.py

engine.pyconvention application orchestrator

http.pyHTTP method, path, status code mapping

database.pyDB schema, table, column mapping

validation.pyvalidation rule extraction from requires

naming.pynaming conventions (snake_case, pluralize, etc.)

overrides.pyuser convention override application

profiles/

__init__.py

base.pyabstract profile interface

python_fastapi.pyPython/FastAPI/SQLAlchemy/PostgreSQL

go_chi.pyGo/chi/sqlx/PostgreSQL

ts_express.pyTypeScript/Express/Prisma/PostgreSQL

synthesis/

__init__.py

engine.pyLLM synthesis orchestrator (CEGIS loop)

dafny_gen.pyIR -> Dafny source generation

dafny_backend.pyDafny compiler invocation

prompts.pyprompt templates for LLM

feedback.pyDafny error parsing -> LLM feedback

cache.pysynthesis result caching

emit/

__init__.py

engine.pycode emission orchestrator

openapi.pyOpenAPI 3.1 spec generation

sql.pySQL DDL / migration generation

dockerfile.pyDockerfile + docker-compose generation

ci.pyGitHub Actions workflow generation

templates/

python/

main.py.j2

route.py.j2

model.py.j2

schema.py.j2

database.py.j2

config.py.j2

requirements.txt.j2

Dockerfile.j2

go/

main.go.j2

handler.go.j2

model.go.j2

repository.go.j2

go.mod.j2

Dockerfile.j2

typescript/

app.ts.j2

route.ts.j2

model.ts.j2

repository.ts.j2

package.json.j2

Dockerfile.j2

test_gen/

__init__.py

engine.pytest generation orchestrator

property.pyensures -> Hypothesis property tests

stateful.pyoperations -> Hypothesis state machine

schemathesis_cfg.pyOpenAPI -> Schemathesis config

templates/

test_property.py.j2

test_stateful.py.j2

conftest.py.j2

tests/

conftest.pyshared fixtures

fixtures/

url_shortener.spec

todo_list.spec

ecommerce.spec

expected/golden file outputs

url_shortener/

openapi.yaml

migrations/

app/

...

unit/

test_lexer.py

test_parser.py

test_ir_builder.py

test_conventions_http.py

test_conventions_database.py

test_emitter_openapi.py

test_emitter_python.py

test_z3_backend.py

integration/

test_parse_to_ir.pyspec text -> IR round-trip

test_conventions_to_emit.pyIR -> annotated IR -> generated code

test_end_to_end.pyspec -> docker-compose up -> tests pass

golden/

test_golden_files.pysnapshot testing against expected/ outputs

examples/

url_shortener/

spec.rest

todo_list/

spec.rest

ecommerce/

spec.rest

research/design documents (this directory)

00_comprehensive_analysis.md

spec_to_test_tools_research.md

07_implementation_architecture.md

docs/

language_reference.md

convention_rules.md

tutorial.md

Key design choices in the project structure

src/spec_to_rest/ layout: Uses the src/ layout recommended by the Python Packaging Authority. This prevents accidental imports of the source tree during testing (tests import the installed package, rather than the local directory).
Templates as package data: The emit/templates/ directory contains Jinja2 templates that are installed as package data. This ensures they are available when the compiler is installed via pip.
Separate ast.py and ir/types.py: The AST (raw parse tree) and IR (desugared, resolved) are distinct types. The ir/builder.py transforms one to the other. This separation allows AST changes (syntax evolution) without breaking the rest of the compiler.
conventions/profiles/: Each target platform (Python/FastAPI, Go/chi, TypeScript/Express) has its own convention profile module. The profile defines target-specific naming, typing, and structure conventions. This is the extension point for adding new targets.
tests/fixtures/expected/: Golden file outputs for snapshot testing. When code generation changes, pytest --update-snapshots regenerates these.

6. CLI design

6.1 Command structure

spec-to-rest <command> [options] <spec-file>

Commands:
  check       Parse and verify the spec (no code generation)
  generate    Generate the complete service
  test        Generate and run tests against an existing service
  diff        Show what would change if the spec were regenerated
  inspect     Print the IR or convention annotations (for debugging)

Global Options:
  --verbose, -v             Show detailed compilation progress
  --quiet, -q               Suppress non-error output
  --color / --no-color      Force color output on/off
  --config <path>           Path to config file (default: .spec-to-rest.toml)

check Options:
  --verify-level <n>        Verification depth:
                              0 = parse only
                              1 = type check + name resolution
                              2 = invariant consistency (Z3)
                              3 = operation preservation (Z3 + frame analysis)
  --verify-timeout <secs>   Z3 timeout per check (default: 30)

generate Options:
  --target <profile>        Target platform:
                              python-fastapi (default)
                              go-chi
                              ts-express
  --output <dir>            Output directory (default: ./generated)
  --llm <model>             LLM model for synthesis (default: claude-sonnet-4-20250514)
  --no-synthesis            Skip LLM synthesis (emit TODOs for complex operations)
  --no-verify               Skip spec verification before generation
  --no-tests                Skip test generation
  --no-infra                Skip Dockerfile/docker-compose generation
  --dry-run                 Show what would be generated without writing files

test Options:
  --server-url <url>        URL of the running service (default: http://localhost:8000)
  --test-kind <kind>        Test kind to run:
                              all (default)
                              property
                              stateful
                              schemathesis

inspect Options:
  --format <fmt>            Output format: ir, json, alloy, dafny, openapi
  --stage <stage>           Pipeline stage: parse, conventions, verify

diff Options:
  --output <dir>            Directory of previously generated code
  --format <fmt>            Diff format: unified (default), summary

6.2 Implementation

# cli.py
import click
import sys
from pathlib import Path

@click.group()
@click.version_option()
@click.option("--verbose", "-v", is_flag=True)
@click.option("--quiet", "-q", is_flag=True)
@click.option("--color/--no-color", default=True)
@click.option("--config", type=click.Path(), default=None)
@click.pass_context
def cli(ctx, verbose, quiet, color, config):
    ctx.ensure_object(dict)
    ctx.obj["verbose"] = verbose
    ctx.obj["quiet"] = quiet
    ctx.obj["color"] = color
    ctx.obj["config"] = config

@cli.command()
@click.argument("spec_file", type=click.Path(exists=True))
@click.option("--verify-level", type=int, default=2)
@click.option("--verify-timeout", type=int, default=30)
@click.pass_context
def check(ctx, spec_file, verify_level, verify_timeout):
    """Parse and verify a spec file."""
    from .parser import parse_spec_file
    from .ir import build_ir
    from .verify import verify_spec

    spec_text = Path(spec_file).read_text()

    # Stage 1: Parse
    _log(ctx, f"Parsing {spec_file}...")
    ast = parse_spec_file(spec_text, filename=spec_file)

    # Stage 2: Build IR
    _log(ctx, "Building IR...")
    ir = build_ir(ast)

    # Stage 3: Verify
    if verify_level > 0:
        _log(ctx, f"Verifying (level {verify_level})...")
        result = verify_spec(ir, level=verify_level, timeout=verify_timeout)
        if result.errors:
            for err in result.errors:
                _error(ctx, err.render(spec_text))
            sys.exit(1)
        if result.warnings:
            for warn in result.warnings:
                _warn(ctx, warn.render(spec_text))

    _success(ctx, f"Spec is valid ({len(ir.operations)} operations, "
                   f"{len(ir.invariants)} invariants)")

@cli.command()
@click.argument("spec_file", type=click.Path(exists=True))
@click.option("--target", type=click.Choice(["python-fastapi", "go-chi", "ts-express"]),
              default="python-fastapi")
@click.option("--output", "-o", type=click.Path(), default="./generated")
@click.option("--llm", default="claude-sonnet-4-20250514")
@click.option("--no-synthesis", is_flag=True)
@click.option("--no-verify", is_flag=True)
@click.option("--no-tests", is_flag=True)
@click.option("--no-infra", is_flag=True)
@click.option("--dry-run", is_flag=True)
@click.pass_context
def generate(ctx, spec_file, target, output, llm, no_synthesis, no_verify,
             no_tests, no_infra, dry_run):
    """Generate a complete service from a spec file."""
    from .parser import parse_spec_file
    from .ir import build_ir
    from .verify import verify_spec
    from .conventions import apply_conventions
    from .synthesis import synthesize_operations
    from .emit import emit_project
    from .test_gen import generate_tests

    spec_text = Path(spec_file).read_text()

    # Parse
    _log(ctx, f"Parsing {spec_file}...")
    ast = parse_spec_file(spec_text, filename=spec_file)
    ir = build_ir(ast)

    # Verify
    if not no_verify:
        _log(ctx, "Verifying spec consistency...")
        result = verify_spec(ir, level=2, timeout=30)
        if result.errors:
            for err in result.errors:
                _error(ctx, err.render(spec_text))
            sys.exit(1)

    # Apply conventions
    _log(ctx, f"Applying conventions for {target}...")
    annotations = apply_conventions(ir, profile=target)

    # LLM Synthesis
    if not no_synthesis:
        _log(ctx, "Synthesizing operation implementations...")
        synthesis_results = synthesize_operations(ir, annotations, model=llm)
        for name, result in synthesis_results.items():
            status = "verified" if result.verified else "UNVERIFIED"
            _log(ctx, f"  {name}: {status} ({result.iterations} iterations, ${result.cost_usd:.2f})")
    else:
        synthesis_results = {}

    # Emit
    _log(ctx, f"Generating project in {output}/...")
    manifest = emit_project(ir, annotations, synthesis_results,
                            target=target, output_dir=output, dry_run=dry_run)

    # Tests
    if not no_tests:
        _log(ctx, "Generating tests...")
        test_manifest = generate_tests(ir, annotations, target=target,
                                       output_dir=output, dry_run=dry_run)
        manifest.extend(test_manifest)

    # Infrastructure
    if not no_infra:
        _log(ctx, "Generating infrastructure files...")
        # Dockerfile, docker-compose, CI config
        # (handled by emit_project based on flags)

    # Summary
    if dry_run:
        _log(ctx, "Dry run -- files that would be generated:")
        for entry in manifest:
            _log(ctx, f"  {entry.action:8s} {entry.path}")
    else:
        _success(ctx, f"Generated {len(manifest)} files in {output}/")
        _log(ctx, f"Run: cd {output} && docker-compose up")

6.3 Example workflows

First-time generation

# Generate a URL shortener service
$ spec-to-rest generate --target python-fastapi --output ./url-shortener url_shortener.spec

Parsing url_shortener.spec...
Building IR...
Verifying spec consistency...
Applying conventions for python-fastapi...
Synthesizing operation implementations...
  Shorten: verified (3 iterations, $0.12)
  Resolve: verified (1 iteration, $0.02)
  Delete: verified (1 iteration, $0.02)
Generating project in ./url-shortener/...
Generating tests...
Generating infrastructure files...
Generated 18 files in ./url-shortener/

Run: cd ./url-shortener && docker-compose up

# Run the generated service
$ cd url-shortener
$ docker-compose up -d
$ curl -X POST http://localhost:8000/shorten -d '{"url": "https://example.com"}'
{"code": "abc123", "short_url": "http://localhost:8000/abc123"}

Iterative development (spec changes -> regenerate)

# Edit the spec to add a Stats operation
$ vim url_shortener.spec

# See what would change
$ spec-to-rest diff --output ./url-shortener url_shortener.spec
  added    url-shortener/app/routes/stats.py
  modified url-shortener/app/main.py (1 route added)
  modified url-shortener/openapi.yaml (1 endpoint added)
  added    url-shortener/tests/test_stats_property.py
  modified url-shortener/tests/test_stateful.py (1 rule added)

# Regenerate
$ spec-to-rest generate --target python-fastapi --output ./url-shortener url_shortener.spec

# The generated service includes the new Stats operation

CI/CD integration

# .github/workflows/spec-check.yml
name: Spec Verification
on: [push, pull_request]
jobs:
  verify:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: spec-to-rest/setup@v1 # installs compiler + Z3 + Dafny
      - run: spec-to-rest check --verify-level 3 spec.rest
      - run: spec-to-rest generate --target python-fastapi --output ./generated spec.rest
      - run: cd generated && docker-compose up -d && sleep 5
      - run: cd generated && pytest tests/ -v
      - run: cd generated && schemathesis run openapi.yaml --base-url http://localhost:8000

Running in a Docker container

# The compiler itself runs in Docker (includes Z3, Dafny, LLM SDK)
$ docker run --rm -v $(pwd):/workspace -e ANTHROPIC_API_KEY=$ANTHROPIC_API_KEY \
    spec-to-rest/compiler:latest \
    generate --target python-fastapi --output /workspace/generated /workspace/spec.rest

7. Testing strategy for the compiler itself

7.1 Unit tests

Parser tests (spec text -> expected AST)

# tests/unit/test_parser.py
import pytest
from spec_to_rest.parser import parse_spec_file
from spec_to_rest.parser.ast import *

class TestParseEntity:
    def test_simple_entity(self):
        ast = parse_spec_file("""
            service Test {
                entity Foo {
                    name: String
                    age: Int
                }
            }
        """)
        assert len(ast.entities) == 1
        entity = ast.entities[0]
        assert entity.name == "Foo"
        assert len(entity.fields) == 2
        assert entity.fields[0].name == "name"
        assert entity.fields[0].type_expr.name == "String"

    def test_entity_with_invariant(self):
        ast = parse_spec_file("""
            service Test {
                entity Code {
                    value: String
                    invariant: len(value) >= 6
                }
            }
        """)
        entity = ast.entities[0]
        assert len(entity.invariants) == 1
        inv = entity.invariants[0]
        assert inv.kind == ExprKind.BINARY_OP
        assert inv.binary_op == BinaryOp.GTE

    def test_parse_error_missing_brace(self):
        with pytest.raises(ParseError) as exc_info:
            parse_spec_file("""
                service Test {
                    entity Foo {
                        name: String
                    # missing closing brace
                }
            """)
        error = exc_info.value
        assert "expected '}'" in str(error)
        assert error.span.start_line > 0

class TestParseOperation:
    def test_operation_with_requires_ensures(self):
        ast = parse_spec_file("""
            service Test {
                state { items: String -> set Int }
                operation Add {
                    input: key: String, value: Int
                    requires: key not in items
                    ensures: items'[key] = value
                }
            }
        """)
        op = ast.operations[0]
        assert op.name == "Add"
        assert len(op.inputs) == 2
        assert len(op.requires) == 1
        assert len(op.ensures) == 1

class TestParseExpressions:
    def test_quantifier(self):
        expr = parse_expr("all x in store | valid(x)")
        assert expr.kind == ExprKind.QUANTIFIER
        assert expr.quantifier_kind == QuantifierKind.FORALL
        assert expr.quantifier_var == "x"

    def test_cardinality(self):
        expr = parse_expr("#store + 1")
        assert expr.kind == ExprKind.BINARY_OP
        assert expr.left.kind == ExprKind.CARDINALITY

    def test_pre_state(self):
        expr = parse_expr("code not in pre(store)")
        assert expr.kind == ExprKind.MEMBERSHIP
        assert expr.membership_negated == True
        assert expr.membership_collection.kind == ExprKind.PRE_STATE

    def test_post_state(self):
        expr = parse_expr("store'[code] = url")
        assert expr.kind == ExprKind.BINARY_OP
        assert expr.left.kind == ExprKind.INDEX
        assert expr.left.index_base.kind == ExprKind.POST_STATE

    def test_operator_precedence(self):
        # 'and' binds tighter than 'or'
        expr = parse_expr("a or b and c")
        assert expr.kind == ExprKind.BINARY_OP
        assert expr.binary_op == BinaryOp.OR
        assert expr.right.binary_op == BinaryOp.AND

Convention engine tests (IR -> expected HTTP mapping)

# tests/unit/test_conventions_http.py
from spec_to_rest.conventions.http import HttpConventionEngine
from spec_to_rest.ir.types import *

class TestHttpMethodMapping:
    def test_operation_with_state_mutation_is_post(self):
        """Operation that adds to state -> POST."""
        op = OperationDecl(
            name="Shorten",
            inputs=[ParamDecl("url", TypeExpr(TypeKind.SIMPLE, name="LongURL"))],
            outputs=[ParamDecl("code", TypeExpr(TypeKind.SIMPLE, name="ShortCode"))],
            requires=[],
            ensures=[
                # store'[code] = url  (state mutation)
                Expr(kind=ExprKind.BINARY_OP, binary_op=BinaryOp.EQ,
                    left=Expr(kind=ExprKind.INDEX,
                        index_base=Expr(kind=ExprKind.POST_STATE,
                            state_expr=Expr(kind=ExprKind.VARIABLE, var_name="store")),
                        index_key=Expr(kind=ExprKind.VARIABLE, var_name="code")),
                    right=Expr(kind=ExprKind.VARIABLE, var_name="url")),
            ],
        )
        engine = HttpConventionEngine()
        annotation = engine.annotate_operation(op)
        assert annotation.http_method == "POST"
        assert annotation.http_status_success == 201

    def test_operation_with_no_mutation_is_get(self):
        """Operation where store' = store -> GET."""
        op = OperationDecl(
            name="Resolve",
            inputs=[ParamDecl("code", TypeExpr(TypeKind.SIMPLE, name="ShortCode"))],
            outputs=[ParamDecl("url", TypeExpr(TypeKind.SIMPLE, name="LongURL"))],
            requires=[],
            ensures=[
                # store' = store  (no mutation)
                Expr(kind=ExprKind.BINARY_OP, binary_op=BinaryOp.EQ,
                    left=Expr(kind=ExprKind.POST_STATE,
                        state_expr=Expr(kind=ExprKind.VARIABLE, var_name="store")),
                    right=Expr(kind=ExprKind.VARIABLE, var_name="store")),
            ],
        )
        engine = HttpConventionEngine()
        annotation = engine.annotate_operation(op)
        assert annotation.http_method == "GET"
        assert annotation.http_status_success == 200

    def test_operation_with_removal_is_delete(self):
        """Operation where element removed from state -> DELETE."""
        op = OperationDecl(
            name="Delete",
            inputs=[ParamDecl("code", TypeExpr(TypeKind.SIMPLE, name="ShortCode"))],
            outputs=[],
            requires=[],
            ensures=[
                # code not in store'
                Expr(kind=ExprKind.MEMBERSHIP, membership_negated=True,
                    membership_element=Expr(kind=ExprKind.VARIABLE, var_name="code"),
                    membership_collection=Expr(kind=ExprKind.POST_STATE,
                        state_expr=Expr(kind=ExprKind.VARIABLE, var_name="store"))),
            ],
        )
        engine = HttpConventionEngine()
        annotation = engine.annotate_operation(op)
        assert annotation.http_method == "DELETE"
        assert annotation.http_status_success == 204

    def test_requires_membership_maps_to_404(self):
        """requires: code in store -> 404 when code not found."""
        op = OperationDecl(
            name="Resolve",
            inputs=[ParamDecl("code", TypeExpr(TypeKind.SIMPLE, name="ShortCode"))],
            outputs=[ParamDecl("url", TypeExpr(TypeKind.SIMPLE, name="LongURL"))],
            requires=[
                Expr(kind=ExprKind.MEMBERSHIP, membership_negated=False,
                    membership_element=Expr(kind=ExprKind.VARIABLE, var_name="code"),
                    membership_collection=Expr(kind=ExprKind.VARIABLE, var_name="store")),
            ],
            ensures=[],
        )
        engine = HttpConventionEngine()
        annotation = engine.annotate_operation(op)
        assert "not_found" in annotation.http_status_errors
        assert annotation.http_status_errors["not_found"] == 404

Emitter tests (IR -> expected code snippets)

# tests/unit/test_emitter_python.py
from spec_to_rest.emit.engine import PythonFastAPIEmitter
from spec_to_rest.ir.types import *
from spec_to_rest.conventions.engine import ConventionAnnotations, OperationAnnotation

class TestPythonRouteEmission:
    def test_post_route(self):
        ir = make_test_ir()  # fixture helper
        annotations = ConventionAnnotations(
            operation_annotations={
                "Shorten": OperationAnnotation(
                    http_method="POST", http_path="/shorten",
                    http_status_success=201,
                    http_status_errors={"validation": 422},
                    request_body_model="ShortenRequest",
                    response_body_model="ShortenResponse",
                    path_params=[], query_params=[],
                    operation_kind="create", is_idempotent=False,
                    headers={},
                ),
            },
            entity_annotations={}, relation_annotations={},
        )
        emitter = PythonFastAPIEmitter(ir, annotations)
        code = emitter.emit_route("Shorten")

        assert '@app.post("/shorten", status_code=201)' in code
        assert "async def shorten(body: ShortenRequest) -> ShortenResponse:" in code
        assert "HTTPException" in code  # for validation errors

7.2 Integration tests

End-to-end test

# tests/integration/test_end_to_end.py
import subprocess
import time
import requests
import pytest
from pathlib import Path

@pytest.fixture(scope="module")
def generated_service(tmp_path_factory):
    """Generate a service from a fixture spec and start it with docker-compose."""
    output_dir = tmp_path_factory.mktemp("generated")
    spec_path = Path(__file__).parent.parent / "fixtures" / "url_shortener.spec"

    # Generate
    result = subprocess.run(
        ["python", "-m", "spec_to_rest", "generate",
         "--target", "python-fastapi",
         "--output", str(output_dir),
         "--no-synthesis",  # use TODOs for speed in CI
         str(spec_path)],
        capture_output=True, text=True
    )
    assert result.returncode == 0, f"Generation failed: {result.stderr}"

    # Start service
    subprocess.run(
        ["docker-compose", "up", "-d", "--build"],
        cwd=str(output_dir),
        capture_output=True, text=True,
        check=True
    )

    # Wait for service to be ready
    for _ in range(30):
        try:
            resp = requests.get("http://localhost:8000/health")
            if resp.status_code == 200:
                break
        except requests.ConnectionError:
            time.sleep(1)
    else:
        pytest.fail("Service failed to start within 30 seconds")

    yield output_dir

    # Cleanup
    subprocess.run(
        ["docker-compose", "down", "-v"],
        cwd=str(output_dir),
        capture_output=True
    )

def test_generated_code_compiles(generated_service):
    """The generated Python code should have no syntax errors."""
    result = subprocess.run(
        ["python", "-m", "py_compile", "app/main.py"],
        cwd=str(generated_service),
        capture_output=True, text=True
    )
    assert result.returncode == 0

def test_openapi_spec_valid(generated_service):
    """The generated OpenAPI spec should be valid."""
    resp = requests.get("http://localhost:8000/openapi.json")
    assert resp.status_code == 200
    spec = resp.json()
    assert "paths" in spec
    assert "/shorten" in spec["paths"]

def test_crud_operations(generated_service):
    """Basic CRUD cycle should work."""
    # Create
    resp = requests.post("http://localhost:8000/shorten",
                         json={"url": "https://example.com"})
    assert resp.status_code == 201
    data = resp.json()
    code = data["code"]

    # Read
    resp = requests.get(f"http://localhost:8000/{code}",
                        allow_redirects=False)
    assert resp.status_code in (200, 302)

    # Delete
    resp = requests.delete(f"http://localhost:8000/{code}")
    assert resp.status_code == 204

    # Read after delete -> 404
    resp = requests.get(f"http://localhost:8000/{code}")
    assert resp.status_code == 404

7.3 Golden file tests

# tests/golden/test_golden_files.py
import pytest
from pathlib import Path
from spec_to_rest.parser import parse_spec_file
from spec_to_rest.ir import build_ir
from spec_to_rest.conventions import apply_conventions
from spec_to_rest.emit import emit_project

FIXTURES_DIR = Path(__file__).parent.parent / "fixtures"
EXPECTED_DIR = FIXTURES_DIR / "expected"

@pytest.mark.parametrize("spec_name", ["url_shortener", "todo_list"])
def test_golden_openapi(spec_name, snapshot):
    spec_text = (FIXTURES_DIR / f"{spec_name}.spec").read_text()
    ast = parse_spec_file(spec_text)
    ir = build_ir(ast)
    annotations = apply_conventions(ir, profile="python-fastapi")
    openapi = emit_openapi(ir, annotations)
    snapshot.assert_match(openapi, f"{spec_name}_openapi.yaml")

@pytest.mark.parametrize("spec_name", ["url_shortener", "todo_list"])
def test_golden_sql(spec_name, snapshot):
    spec_text = (FIXTURES_DIR / f"{spec_name}.spec").read_text()
    ast = parse_spec_file(spec_text)
    ir = build_ir(ast)
    annotations = apply_conventions(ir, profile="python-fastapi")
    sql = emit_sql(ir, annotations)
    snapshot.assert_match(sql, f"{spec_name}_migration.sql")

7.4 Property-based tests for the compiler

# tests/unit/test_parser_properties.py
from hypothesis import given, strategies as st, assume
from spec_to_rest.parser import parse_spec_file
from spec_to_rest.ir import build_ir
from spec_to_rest.ir.printer import print_ir

# Strategy: generate random valid spec text
@st.composite
def valid_entity_name(draw):
    first = draw(st.sampled_from("ABCDEFGHIJKLMNOPQRSTUVWXYZ"))
    rest = draw(st.text(alphabet="abcdefghijklmnopqrstuvwxyz0123456789_", min_size=0, max_size=20))
    return first + rest

@st.composite
def valid_field_type(draw):
    return draw(st.sampled_from(["String", "Int", "Float", "Bool", "DateTime"]))

@st.composite
def valid_entity(draw):
    name = draw(valid_entity_name())
    num_fields = draw(st.integers(min_value=1, max_value=5))
    fields = []
    used_names = set()
    for _ in range(num_fields):
        fname = draw(st.text(alphabet="abcdefghijklmnopqrstuvwxyz", min_size=1, max_size=10))
        assume(fname not in used_names)
        used_names.add(fname)
        ftype = draw(valid_field_type())
        fields.append(f"    {fname}: {ftype}")
    return f"  entity {name} {{\n" + "\n".join(fields) + "\n  }"

@st.composite
def valid_spec(draw):
    name = draw(valid_entity_name())
    num_entities = draw(st.integers(min_value=1, max_value=3))
    entities = [draw(valid_entity()) for _ in range(num_entities)]
    return f"service {name} {{\n" + "\n\n".join(entities) + "\n}"

@given(spec_text=valid_spec())
def test_parse_roundtrip(spec_text):
    """Any valid spec should parse without error."""
    ast = parse_spec_file(spec_text)
    ir = build_ir(ast)
    assert ir.name is not None
    assert len(ir.entities) >= 1

@given(spec_text=valid_spec())
def test_ir_serialization_roundtrip(spec_text):
    """IR -> JSON -> IR should be lossless."""
    ast = parse_spec_file(spec_text)
    ir = build_ir(ast)
    json_text = serialize_ir(ir)
    ir2 = deserialize_ir(json_text)
    assert ir.name == ir2.name
    assert len(ir.entities) == len(ir2.entities)

8. Dependency management and distribution

8.1 Dependencies (Python examples shown for illustrative reference)

# pyproject.toml
[project]
name = "spec-to-rest"
version = "0.1.0"
description = "Compile formal behavioral specs to verified REST services"
requires-python = ">=3.11"
dependencies = [
    "click>=8.1",            # CLI framework
    "jinja2>=3.1",           # template engine for code emission
    "z3-solver>=4.12",       # Z3 SMT solver (spec verification)
    "anthropic>=0.40",       # Anthropic API client (LLM synthesis)
    "pyyaml>=6.0",           # YAML generation (OpenAPI)
    "rich>=13.0",            # terminal output formatting
]

[project.optional-dependencies]
dev = [
    "pytest>=8.0",
    "pytest-snapshot>=0.9",
    "hypothesis>=6.100",
    "mypy>=1.9",
    "ruff>=0.4",
]

[project.scripts]
spec-to-rest = "spec_to_rest.cli:cli"

[tool.mypy]
strict = true

8.2 External tool dependencies

Tool	Version	Purpose	Size	Bundling Strategy
Z3	4.13+	Spec verification	~50 MB	Bundled via `z3-solver` pip package
Dafny	4.9+	Code verification + compilation	~200 MB	User install or Docker image
Python 3.11+	-	Target language toolchain	-	User's machine
Go 1.22+	-	Target language toolchain	-	User's machine (if targeting Go)
Node 20+	-	Target language toolchain	-	User's machine (if targeting TS)
Docker	24+	Running generated services	-	User's machine
PostgreSQL	16+	Database for generated services	-	Via docker-compose

8.3 Distribution options

Option 1: pip install (Python users)

pip install spec-to-rest
# or with pipx for isolation:
pipx install spec-to-rest

Pros: Simple, familiar to Python users. Cons: Requires Python 3.11+, does not include Dafny.

Option 2: Docker image (most reliable)

FROM python:3.12-slim

# Install Z3 (included via pip)
# Install Dafny
RUN apt-get update && apt-get install -y dotnet-sdk-8.0 && \
    dotnet tool install --global dafny --version 4.9.0

# Install compiler
COPY . /app
RUN pip install /app

ENTRYPOINT ["spec-to-rest"]

docker pull spec-to-rest/compiler:latest
docker run --rm -v $(pwd):/workspace spec-to-rest/compiler:latest generate spec.rest

Pros: Includes all dependencies (Z3, Dafny, SDKs). Works everywhere. Cons: Requires Docker, ~500 MB image.

Option 3: GitHub Action (CI users)

- uses: spec-to-rest/action@v1
  with:
    spec-file: spec.rest
    target: python-fastapi
    output: ./generated

The action uses the Docker image internally.

Option 4: Single binary (future, Kotlin/GraalVM path)

# macOS
brew install spec-to-rest

# Linux
curl -sSL https://get.spec-to-rest.dev | sh

# Windows
scoop install spec-to-rest

GraalVM native-image produces a single binary that bundles the JVM, Z3 (via JNI), and Alloy. Dafny still requires a separate install or bundled .NET runtime.

8.4 LLM API key management

The compiler needs an Anthropic API key for the synthesis stage. Key management follows standard practices:

# Priority order:
1. --llm-api-key CLI flag (not recommended -- visible in shell history)
2. ANTHROPIC_API_KEY environment variable (recommended for CI)
3. .spec-to-rest.toml config file (recommended for local dev)
4. ~/.config/spec-to-rest/config.toml (user-level default)

# .spec-to-rest.toml
[synthesis]
model = "claude-sonnet-4-20250514"
api_key_env = "ANTHROPIC_API_KEY"  # name of env var to read
cost_budget_usd = 5.0
cache_enabled = true
cache_dir = ".spec-to-rest-cache"

The compiler should never log or display the API key. If --no-synthesis is set, no API key is needed.

9. Build plan with milestones

Phase 1: Parser + convention engine MVP (~4 weeks)

Goal: spec-to-rest generate spec.rest produces a running Python/FastAPI service for the URL shortener spec.

Week 1-2: Parser

Implement lexer (tokenizer) for the DSL
Implement recursive descent parser for: entities, state, operations, invariants
Implement AST -> IR builder
Implement IR printer (for --verbose and debugging)
Write unit tests for all parser features
Deliverable: spec-to-rest inspect --format ir url_shortener.spec prints the IR

Week 2-3: Convention Engine

Implement HTTP method/path/status mapping from IR analysis
- State mutation detection (pre-state vs post-state comparison in ensures)
- Operation kind classification (create, read, update, delete, action)
- Path parameter extraction from inputs
- Error status code mapping from requires clauses
Implement database schema mapping
- Entity -> table, field -> column
- State relation -> foreign key or junction table
- Constraint -> CHECK constraint
- Type mapping (String -> TEXT, Int -> INTEGER, etc.)
Implement naming conventions (snake_case, pluralize, etc.)
Deliverable: spec-to-rest inspect --format openapi url_shortener.spec prints valid OpenAPI

Week 3-4: Code Emitter

Implement Jinja2 templates for Python/FastAPI target:
- main.py (FastAPI app, router registration)
- routes/{operation}.py (route handlers)
- models.py (Pydantic request/response models)
- schemas.py (SQLAlchemy ORM models)
- database.py (connection, session management)
- config.py (settings from environment)
- requirements.txt
- Dockerfile + docker-compose.yml (app + PostgreSQL)
- alembic/ (database migrations)
Implement SQL DDL generation for initial migration
Implement OpenAPI spec generation
Write golden file tests for all generated output
Deliverable: spec-to-rest generate url_shortener.spec && cd generated && docker-compose up

Phase 1 exit criteria

URL shortener spec -> running service that responds to requests
CRUD operations work (create short URL, resolve, delete)
Business logic is stubbed (TODO comments for non-trivial operations)
Generated code passes linting (ruff) and type checking (mypy)
80%+ test coverage on the compiler itself

Phase 2: Spec verification (~3 weeks)

Goal: spec-to-rest check spec.rest catches spec errors before code generation.

Week 5: Z3 Backend

Implement IR -> Z3 constraint translation
- Entity type invariants -> Z3 assertions
- State relation multiplicities -> Z3 cardinality constraints
- Global invariants -> Z3 universal quantifier assertions
Implement invariant consistency checking (are all invariants simultaneously satisfiable?)
Implement basic counterexample rendering
Deliverable: spec-to-rest check --verify-level 2 spec.rest reports contradictory invariants

Week 6: Operation Preservation

Implement pre/post-state encoding for Z3
- Pre-state as "given" variables
- Post-state as "exists" variables
- Ensures as constraints linking pre and post
- Frame condition inference (what does NOT change)
Implement operation-preserves-invariant checking
- For each (operation, invariant) pair: verify that if invariant holds in pre-state and ensures hold, then invariant holds in post-state
Deliverable: spec-to-rest check --verify-level 3 spec.rest reports operations that violate invariants

Week 7: Error Reporting

Implement rich error messages with source location highlighting
- "Invariant on line 12 conflicts with invariant on line 15"
- "Operation 'Shorten' on line 20 may violate invariant on line 30"
- Counterexample rendering: "When store contains {abc -> x.com}, calling Delete(abc) leaves store empty but invariant requires..."
Implement --verify-timeout and graceful timeout handling
Write integration tests for all verification scenarios

Phase 3: Test generation (~3 weeks)

Goal. Generated service includes comprehensive tests derived from the spec.

Week 8: Property Tests

Translate ensures clauses to Hypothesis property tests
- Each ensures clause -> one test function
- Input parameters generated by Hypothesis strategies
- Test body: call the API endpoint, assert ensures holds on response
Generate conftest.py with database fixtures and test client setup
Deliverable: cd generated && pytest tests/test_property.py runs ensures-derived tests

Week 9: Stateful Tests

Translate operations to Hypothesis RuleBasedStateMachine
- Each operation -> one @rule
- Each requires clause -> one @precondition
- Each global invariant -> one @invariant
- Operation inputs -> rule parameters with appropriate strategies
- Bundles for cross-operation data flow (e.g., created codes fed to resolve/delete)
Deliverable: cd generated && pytest tests/test_stateful.py runs state machine tests

Week 10: Schemathesis Integration

Generate Schemathesis configuration from OpenAPI spec
Generate a schemathesis.yml config file with:
- Target URL
- Authentication (if any)
- Stateful test options
- Custom checks from spec invariants
Deliverable: cd generated && schemathesis run openapi.yaml runs API fuzzing tests

Phase 4: LLM synthesis (~4 weeks)

Goal. Non-trivial operations are synthesized and verified via LLM + Dafny.

Week 11: Dafny Generation

Implement IR -> Dafny source translation
- Entity types -> Dafny type definitions
- State relations -> Dafny class fields
- Operations -> Dafny methods with requires/ensures
- Invariants -> Dafny class invariants
Implement Dafny compiler invocation and error parsing
Deliverable: spec-to-rest inspect --format dafny spec.rest emits valid Dafny

Week 12: LLM Integration

Implement Anthropic API client with:
- Prompt construction from IR + Dafny skeleton
- Response parsing (extract code blocks)
- Token tracking and cost budgeting
- Cache layer (hash of operation spec -> cached result)
Deliverable: LLM generates Dafny method bodies from spec operations

Week 13: CEGIS Feedback Loop

Implement the generate-verify-repair loop:
- LLM generates candidate Dafny
- Dafny verifier checks it
- On failure: extract error messages and counterexamples
- Feed errors back to LLM as repair context
- Repeat up to N iterations
Implement fallback: if CEGIS fails, emit TODO stub
Deliverable: Complex operations are verified and compiled

Week 14: Dafny -> Target Language

Implement Dafny -> Python compilation integration
- Invoke dafny translate py
- Post-process generated Python to integrate with FastAPI app
- Handle Dafny runtime library dependency
Implement integration of synthesized code with convention-generated scaffolding
Deliverable: Synthesized code plugs into generated service seamlessly

Phase 5: Multi-target support (~4 weeks)

Goal. Generate services for Go/chi and TypeScript/Express in addition to Python/FastAPI.

Week 15-16: Go/chi Target

Create Go convention profile (naming, types, imports)
Create Go templates:
- main.go (chi router, middleware)
- handler.go (request handlers)
- model.go (structs)
- repository.go (database access via sqlx)
- go.mod, go.sum
- Dockerfile
Implement Dafny -> Go compilation integration
Write golden file tests for Go output
Deliverable: spec-to-rest generate --target go-chi spec.rest

Week 17-18: TypeScript/Express Target

Create TypeScript convention profile
Create TypeScript templates:
- app.ts (Express app, middleware)
- routes/{operation}.ts (route handlers)
- models.ts (TypeScript interfaces + Zod schemas)
- repository.ts (Prisma ORM)
- package.json, tsconfig.json
- Dockerfile
Implement Dafny -> JavaScript compilation integration
Write golden file tests for TypeScript output
Deliverable: spec-to-rest generate --target ts-express spec.rest

Phase 6: Polish (~2 weeks)

Week 19: Error Messages and Documentation

Audit all error messages for clarity
- Every error should include: what went wrong, where (line:col), and what to do
- Example: "error[E012]: Operation 'Shorten' requires 'code' to not exist in store, but 'code' is not defined as an input parameter. Did you mean to add 'code' to the output? (spec.rest:22:5)"
Write language reference documentation
Write convention rules documentation
Write getting-started tutorial
Deliverable: docs/ directory with complete documentation

Week 20: Performance and Release

Profile compiler on large specs (10+ entities, 20+ operations)
Optimize hot paths (parser, convention engine, template rendering)
Implement --dry-run and diff commands
Set up release pipeline:
- PyPI publish on git tag
- Docker image publish on git tag
- GitHub Action publish
Deliverable: v0.1.0 release

10. Risk mitigation

Risk 1: Parser is too hard or grammar keeps changing

Probability. Medium Impact: Delays Phase 1 by 1-2 weeks

Mitigation strategies

Start with a minimal grammar subset (entities + operations only, no conventions block). Add features incrementally.
Use hand-written recursive descent for maximum flexibility during grammar iteration. Do not invest in ANTLR4 until the grammar is stable.
Keep a "parse test for every grammar feature" discipline. Any grammar change must update tests first.
If stuck on expression parsing, use a Pratt parser (well-documented, 200 lines of code, handles precedence automatically).

Escalation. If grammar design takes more than 2 weeks, freeze the grammar and revisit in Phase 6. A slightly awkward grammar that works is better than a perfect grammar that doesn't ship.

Risk 2: Z3 integration is fragile or slow

Probability. Medium-High Impact: Delays Phase 2 or produces unreliable verification

Mitigation strategies

Start with simple consistency checks (are invariants satisfiable?) before attempting operation preservation proofs. The simple checks use basic Z3 and are reliable.
Use Z3 timeouts aggressively (5s for quick checks, 30s for deep checks). Report "unknown" as a warning, rather than an error.
Use uninterpreted functions for complex predicates (like isValidURI). This makes Z3's job easier at the cost of weaker verification.
If Z3 Python bindings cause packaging issues, fall back to subprocess invocation of the Z3 CLI with SMT-LIB2 input format.

Escalation. If Z3 integration takes more than 2 weeks, ship Phase 2 with parse-level checks only (type resolution, name resolution, basic consistency) and defer SMT-based verification.

Risk 3: Dafny is too slow or produces non-idiomatic code

Probability. Medium Impact: Phase 4 is less useful; synthesized code needs manual cleanup

Mitigation strategies

Make LLM synthesis entirely optional (--no-synthesis flag). The compiler is useful without it , convention-based generation covers CRUD.
For simple operations where the body is deterministic from the spec (e.g., url = store[code] is just a database lookup), emit code directly without LLM/Dafny. Only invoke the LLM for operations with non-trivial computation.
Post-process Dafny-generated code to make it more idiomatic:
- Rename variables to match conventions
- Replace Dafny runtime library calls with native equivalents where possible
- Format with the target language's formatter (black, gofmt, prettier)
Cache Dafny verification results aggressively. The same spec operation should not be re-verified unless the spec changes.

Escalation. If Dafny integration takes more than 3 weeks, ship Phase 4 with LLM synthesis only (no formal verification). The LLM generates target-language code directly, checked by type system and tests rather than Dafny proofs.

Risk 4: LLM costs too high

Probability. Low-Medium Impact: Users avoid synthesis; compiler value proposition weakened

Mitigation strategies

Cache all synthesis results. A spec that hasn't changed should never re-invoke the LLM.
Skip LLM for CRUD operations (the majority of REST API operations). Only invoke for operations with non-trivial computation.
Use the smallest effective model. Start with Claude Sonnet (cheaper) and only escalate to Opus for operations that fail verification.
Implement a cost budget (--llm-budget 5.00) that caps total spending per compilation. Default to $5.00 which is sufficient for 10-20 complex operations.
Support local LLM models via OpenAI-compatible API (Ollama, vLLM) for cost-sensitive users willing to accept lower synthesis quality.

Estimated costs per compilation

Spec complexity	Operations	LLM calls	Estimated cost
URL shortener (3 ops)	1 synthesized	3-8 calls	$0.05-0.20
Todo list (5 ops)	2 synthesized	6-16 calls	$0.10-0.40
E-commerce (15 ops)	5 synthesized	15-40 calls	$0.30-1.20
Large service (30 ops)	10 synthesized	30-80 calls	$0.60-2.50

Risk 5: Generated code quality is poor

Probability. Medium Impact: Users don't trust the compiler; adoption stalls

Mitigation strategies

Extensive golden file tests. Every example spec has a "known-good" output that is manually reviewed. Any change to code generation is caught immediately.
Generate code that passes the target language's strictest linting:
- Python: ruff + mypy --strict
- Go: golangci-lint with all checks
- TypeScript: eslint + tsc --strict
Generate code that is formatted by the target language's canonical formatter (black, gofmt, prettier). Never emit unformatted code.
Limit the number of supported targets. Three well-tested targets (Python, Go, TypeScript) are better than ten mediocre ones.
Include the spec as a comment in the generated code. Users can see exactly which spec element produced which code, making manual adjustments easier.

Risk 6: Convention engine makes wrong decisions

Probability. Medium Impact: Generated HTTP API doesn't match user expectations

Mitigation strategies

Make every convention decision overridable via the conventions block in the spec. The engine's defaults are suggestions, rather than mandates.
Implement spec-to-rest inspect --stage conventions spec.rest to show all convention decisions before code generation. Users can preview and override.
Document every convention rule in docs/convention_rules.md with rationale.
Use conservative defaults that match REST API community norms (POST for create, GET for read, etc.).
When the convention engine is uncertain (e.g., an operation that both reads and writes state), emit a warning suggesting the user add an explicit convention override.

Risk 7: TypeScript ecosystem limitations

Probability. Low (TypeScript ecosystem is mature) Impact: May need to shell out to native binaries for some integrations

Mitigation strategies

TypeScript is the sole implementation language (no rewrite needed). The compiler is shippable from Phase 1.
The TypeScript ecosystem has mature tooling for every integration point (ANTLR4 via antlr-ng, Z3 via z3-solver npm, LLM SDKs).
Use Kotlin idiomatically from the start:
- Sealed classes for IR (not Java-style class hierarchies)
- Extension functions for utilities (not inheritance)
- Coroutines for concurrent LLM calls (not threads)
If Kotlin is off the table, the Python compiler can be optimized with:
- Mypyc compilation for hot paths (2-5x speedup)
- multiprocessing for parallel verification and LLM calls
- C extension for the parser (via tree-sitter Python bindings)

Risk 8: Spec language design is wrong

Probability. Medium-High (most likely risk) Impact: Fundamental rework needed; wasted effort on parser/IR

Mitigation strategies

Design the spec language iteratively. Start with the URL shortener example and add features only when a real spec needs them.
Write 5+ example specs before finalizing the grammar. Each spec will reveal missing features or awkward syntax.
Get user feedback on the spec language early (before building the code emitter). Show developers the spec and ask: "Does this make sense? Would you write it this way?"
Keep the parser/IR loosely coupled from the rest of the compiler. A grammar change should require updating parser/ and ir/builder.py only, the convention engine and emitters work on the IR, which is more stable than the syntax.
Study Quint's evolution carefully. Quint is the most recent formal spec DSL designed for developer adoption (not academics), and it went through multiple syntax iterations based on user feedback.

Summary decision matrix

Risk	Probability	Impact	Mitigation Cost	Priority
Spec language design is wrong	High	Critical	Low (iterate)	P0
Generated code quality is poor	Medium	High	Medium (golden tests)	P0
Z3 integration is fragile	Medium-High	Medium	Low (timeouts, fallback)	P1
Convention engine makes wrong decisions	Medium	Medium	Low (overrides)	P1
Dafny is too slow	Medium	Medium	Low (optional)	P2
LLM costs too high	Low-Medium	Low	Low (caching, budgets)	P2
Parser is too hard	Medium	Low	Low (hand-write)	P3
Team can't learn Kotlin	Variable	Low	Low (stay on Python)	P3

The highest-priority risk is getting the spec language design right. This is addressed by aggressive prototyping: build the simplest possible end-to-end flow (Phase 1) and iterate on the language based on real usage. Every other risk has a fallback that keeps the project moving forward.

11. Gradual adoption strategy

A common adoption barrier for spec-driven tools is the "all or nothing" problem: developers feel they must specify their entire service before getting any value. The spec-to-REST compiler is designed for incremental adoption from day one.

11.1 Start with one endpoint

A developer can write a minimal spec with a single entity and a single operation:

service MyService {
  entity Item {
    id: Int
    name: String
    invariant: len(name) >= 1
  }

  state {
    items: Int -> one Item
  }

  operation CreateItem {
    input:  name: String
    output: item: Item

    requires: len(name) >= 1
    ensures:  item.name = name and items' = pre(items) + {item.id -> item}
  }
}

Running spec-to-rest compile myservice.spec produces a working project with one POST endpoint, one database table, one migration, validation, and tests. The developer can then add operations incrementally, each spec-to-rest compile regenerates only the affected files (see incremental regeneration in the code generation pipeline).

11.2 Coexistence with hand-written code

The generated project includes clearly marked USER CODE regions in service files where developers can add hand-written logic that survives regeneration:

# === USER CODE START: custom_middleware ===
# Add your custom middleware here. This block is preserved on regeneration.
# === USER CODE END: custom_middleware ===

Developers can also add entirely hand-written endpoints to the router by placing files in a custom/ directory that the generated main module auto-discovers. This means a team can spec-generate their core CRUD while keeping complex endpoints hand-written, migrating them to the spec one at a time.

11.3 Adopt verification gradually

The compiler pipeline has clear skip points:

Stage	Can Skip?	Effect of Skipping
Spec parsing + convention engine	No (core)
Model checking (spec verification)	Yes (`--skip-verify`)	Spec may have inconsistencies; code still generates
Dafny synthesis (business logic)	Yes (`--skip-synthesis`)	Complex operations get `TODO` stubs instead of verified code
Conformance test generation	Yes (`--skip-tests`)	No auto-generated test suite

A team can start with --skip-verify --skip-synthesis and get the structural benefits (HTTP routing, DB schema, validation, OpenAPI) immediately. As they gain confidence, they enable verification and synthesis one operation at a time.

11.4 Migration from existing services

For teams with existing REST services, the recommended path is:

Write a spec for one resource/endpoint group
Generate the code and compare it against the existing implementation
Use the generated conformance tests to validate the existing service
Replace the existing implementation with the generated one, or keep both running behind a feature flag

The spec-to-REST compiler does not require a greenfield project. It is designed to be adopted incrementally, one endpoint at a time, within an existing codebase.

Implementation Architecture

On this page