spec_to_rest
Research

Comprehensive Research Analysis

Edit on GitHub

Synthesis across 7 research domains for the spec-to-REST compiler

Last updated:

A formal-specification-to-running-REST-service compiler. This document synthesizes research across 7 domains: Alloy-to-code tools, LLM+verification synthesis, spec-first REST generators, program synthesis foundations, model-driven engineering, property-based testing, and service DSLs.


1. Executive Summary

The goal: Write a formal behavioral specification of a REST service and have a compiler produce a complete, verified, running implementation in a single pass -- abstracting away HTTP, database, language, and infrastructure concerns.

The verdict: No such tool exists today. But every piece of it has been built separately by different communities who don't talk to each other. This is an engineering integration project, not a research moonshot.

The landscape gap (confirmed by research across 60+ tools):

CapabilityTools that do itMissing piece
Formal behavioral verificationTLA+, Alloy, Quint, P languageNo code generation
Structural API spec generationTypeSpec, Smithy, OpenAPINo behavioral verification
Code generation from specsopenapi-generator, JHipster, BallerinaNo behavioral correctness
Verified code extractionDafny, F*/KaRaMeL, CoqNo REST/HTTP awareness
LLM+verifier synthesis loopsClover, AutoVerus, DafnyPro (86% on benchmarks)No service targeting
Conformance testing from specsSchemathesis, RESTler, HypothesisTesting only, no generation
Alloy-to-code compilationAlchemy (2008, Brown/WPI)Dead research prototype

No tool bridges behavioral specification to REST service implementation. JHipster is closest on the structural side (full CRUD stacks from JDL). TLA+/P language are closest on the behavioral side (used at AWS on S3, DynamoDB). The gap between them is exactly what this compiler fills.


2. Prior Art: What Has Been Tried

2.1 Alchemy (2008) -- The Only Direct Predecessor

  • Authors: Krishnamurthi, Dougherty, Fisler, Yoo (WPI/Brown)
  • What it did: Compiled Alloy specs to database-backed implementations
  • Pipeline: Alloy sigs -> DB tables, Alloy predicates -> stored procedures, Alloy facts -> DB integrity constraints
  • Core algorithm: Rewriting relational algebra formulas into database transaction code
  • Why it died: Research prototype, subset of Alloy only, no HTTP layer, no maintenance after 2010
  • Key insight we inherit: The convention that "state-change predicates = write operations" and "facts = integrity constraints" is directly applicable to REST services
  • Paper: FSE 2008
  • Improved algorithm: arXiv:1003.5350 (2010)

2.2 Imperative Alloy (2010) -- Alloy-to-Prolog

  • Author: Joseph P. Near (MIT, Jackson's group)
  • What it did: Extended Alloy with imperative constructs, compiled to Prolog
  • Key insight: Prolog's native nondeterminism maps well to Alloy's declarative constraints
  • Why it matters: Shows that the relational-to-imperative translation is feasible for a useful subset
  • Thesis: people.csail.mit.edu/jnear/papers/jnear_ms.pdf

2.3 aRby (2014) -- Alloy Embedded in Ruby

  • Authors: Milicevic, Efrati, Jackson (MIT CSAIL)
  • What it did: Mixed imperative Ruby + declarative Alloy constraint solving in same program
  • Key insight: You can embed a spec language in an executable host and get both verification and execution
  • Status: Inactive since 2014, 19 GitHub stars
  • GitHub: github.com/sdg-mit/arby

2.4 Squander (2011) -- Alloy Specs as Java Annotations

  • Authors: Milicevic, Rayside, Yessenov, Jackson (MIT)
  • What it did: Java annotations with Alloy-like specs, solved at runtime against live heap via Kodkod/SAT
  • Key insight: Specs can be inline with executable code and enforced at runtime
  • GitHub: github.com/aleksandarmilicevic/squander

2.5 Milicevic PhD Thesis (2015) -- Unifying Framework

  • Title: "Advancing Declarative Programming" (MIT)
  • Covers: aRby, Alloy*, Squander, and SUNNY (a model-based reactive web framework)
  • SUNNY is notable: Domain-specific language with declarative constraints, runtime model checking, online code generation, reactive UI updates. This is the closest historical precedent to what we want to build.
  • PDF: aleksandarmilicevic.github.io/papers/mit15-milicevic-phd.pdf

2.6 Rosette (Active) -- The Spiritual Successor

  • Author: Emina Torlak (UW) -- also created Kodkod, Alloy's SAT backend
  • What it does: Solver-aided Racket programming. Write an interpreter for any DSL in Rosette, get synthesis/verification for free.
  • Key capability: (synthesize #:forall input #:guarantee expr) finds values for program holes satisfying all inputs
  • Why it matters: Demonstrates that the "DSL interpreter in a solver-aided host" pattern can make specs executable
  • Status: Active (v4.1), 688 stars
  • Site: emina.github.io/rosette/

3. The LLM+Verification Frontier (2023-2026)

This is the most active and promising research area. 16 major projects in the last 3 years.

3.1 The Dominant Pattern: Generate-Check-Repair

Nearly all successful systems use the same loop:

LLM generates candidate code
    -> Formal verifier checks it
    -> If fail: error message + counterexample fed back to LLM
    -> Repeat until verified or budget exhausted

3.2 Key Projects (ranked by relevance to our compiler)

ProjectYearTargetSuccess RateKey Innovation
Clover (Stanford)2023Dafny87% acceptance, 0% false positivesTriangulates code/annotations/docstrings for consistency
DafnyPro2026Dafny86% on DafnyBenchDiff-checker + pruner + hint augmentation
AutoVerus (MSR)2024Verus/Rust>90% on 150 tasksMulti-agent mimicking human proof phases
AlphaVerus (CMU)2024Verus/Rust32.9-65.7% verified code gen (75.7% proof-only annotation)Self-improving via iterative translation
SAFE (MSR)2024Verus/Rust43% on VerusBenchSelf-evolving spec + proof synthesis
Laurel (UCSD)2024Dafny56.6% assertion genLocalizes where assertions are needed
VerMCTS (Harvard)2024Dafny/Coq+30% absolute over baselinesVerifier as MCTS heuristic
Baldur (UMass/Google)2023Isabelle/HOL65.7% combinedWhole-proof generation + repair
LMGPA (Northeastern)2025TLA+38-59% on protocolsConstrained decomposition for TLA+ proofs
LLMLift (UC Berkeley)2024DSL transpilation44/45 Spark benchmarksPython as intermediate repr for LLMs
Eudoxus/SPEAC (UC Berkeley)2024UCLID584.8% parse rate"Parent language" design for LLM alignment
SynVer (Purdue)2024C + Rocq/VSTVerified linked lists, BSTsTwo-LLM: coder + prover

3.3 Key Trend: Dafny as the Verification IL

The field is converging on Dafny as the preferred intermediate verification language because:

  • Auto-active verification (specs inline with code, no separate proof scripts)
  • Compiles to C#, Java, Go, JavaScript, Python
  • Growing benchmark ecosystem (DafnyBench)
  • Success rates climbing: ~70% -> 86% in one year (2025-2026)
  • AWS already uses smithy-dafny for SDK verification

3.4 Implications for Our Compiler

The LLM+verifier loop is the key to filling the "business logic" gap. The convention engine handles structural mapping (spec -> HTTP routes, DB schema), but business logic (how to compute a short code, how to validate a URL) needs synthesis. The research shows:

  1. LLMs can reliably generate code that passes formal verification (86%+ on Dafny)
  2. The verifier feedback loop is critical -- raw LLM generation is much worse
  3. Dafny is the right intermediate target -- verified, then compiled to any language
  4. Clover's triangulation approach is especially relevant: generate code + annotations + docstrings, cross-validate all three

4. Spec-First REST Tools: The Structural Side

4.1 Tools That Generate Code from API Specs

ToolInputOutputQualityStars
OpenAPI GeneratorOpenAPI YAML40+ language stubsMixed -- "often doesn't compile" for some targets26.1k
Smithy (AWS)Smithy IDLClient+server SDKs (7+ langs)Production-proven (all AWS SDKs)--
TypeSpec (Microsoft)TypeSpec DSLOpenAPI + JSON Schema + protobufHigh -- enforces consistency5.7k
JHipsterJDLFull-stack CRUD apps (Spring Boot + frontend)Production-ready but CRUD-only--
BallerinaBallerina codeBidirectional OpenAPITight integration, niche language--
gRPC-GatewayProtobuf + annotationsGo reverse-proxy + OpenAPI v2Battle-tested, millions of req/day19.9k

4.2 Key Insights for Our Compiler

  • OpenAPI Generator quality varies wildly -- we should generate OpenAPI as an intermediate artifact but not rely on third-party generators for the final code
  • Smithy's trait system is the best model for extensible API metadata
  • JHipster is the closest existing tool to what we want (spec -> running app), but it's CRUD-only with no behavioral verification
  • Ballerina proves a language can be purpose-built for network services with structural typing and first-class HTTP
  • TypeSpec proves a focused DSL can generate multiple output formats from a single source

4.3 Testing Tools That Verify Implementations Against Specs

ToolTechniqueFindsStatus
SchemathesisProperty-based fuzzing from OpenAPISchema violations, 500s, validation bypassesActive (~4.15), 3.2k stars
RESTler (MSR)Stateful fuzzing with dependency inferenceSecurity bugs, resource leaks, race conditionsActive, 2.9k stars
EvoMasterEvolutionary test generation80 real bugs across 5 servicesActive, 695 stars
DreddSchema validationStructural conformanceArchived Nov 2024
PactConsumer-driven contractsIntegration mismatchesActive, mature

5. Formal Specification Languages: The Behavioral Side

5.1 Languages Ranked by Relevance

LanguageExpressivenessCode GenIndustry UseOur Relevance
DafnyPre/postconditions, invariants, terminationC#, Java, Go, JS, PythonAWS (Cedar)Primary target -- verified impl extraction
TLA+Temporal logic, state machines, concurrencyNone (verification only)AWS (S3, DynamoDB, 10 systems per 2015 CACM paper)Spec language inspiration
QuintTLA+ with TS-like syntax, temporal operatorsTraces + verificationBlockchain protocolsSyntax inspiration
AlloyRelational logic, transitive closureNone (analysis only)AT&T, Amazon (limited)Data modeling inspiration
P language (AWS)Communicating state machinesNone (verification only)AWS (S3, DynamoDB, Aurora, EC2)State machine model
Event-BRefinement-based, set theoryJava, C, Ada (via plugins)Safety-critical systemsRefinement concept
VDMPre/postconditions, set theoryJava, C (via Overture)Legacy industrialOperation modeling
F*Dependent types, effectsC (via KaRaMeL)HACL* crypto (Firefox, Linux)Extraction proof

5.2 The P Language Deserves Special Attention

AWS's P language is the closest to our target domain:

  • Specifies systems as communicating state machines (= microservices)
  • Used for S3's strong consistency migration, DynamoDB, MemoryDB, Aurora, EC2, IoT
  • PObserve (2023) validates production logs match P specs post-hoc
  • But: P is verification-only, does not generate service code

5.3 Session Types -- Behavioral Types for Communication

  • Scribble (Imperial College) generates type-safe Java API channels from global protocol specs
  • Session types guarantee freedom from communication errors, deadlocks, livelocks
  • Implementations exist for 16+ languages
  • Problem: REST is stateless request-response; session types assume stateful multi-step sessions
  • Opportunity: Session types could model multi-step API workflows (create -> retrieve -> update -> delete)

6. The Compiler Architecture

Based on all the research, here is the architecture that synthesizes the best ideas from each field.

6.1 Overview

parse SPEC LANGUAGE (DSL)behavioral + structural INTERMEDIATE REPR (IR)operations, types, stateinvariants, transitions CONVENTION ENGINEspec → HTTPspec → DBspec → OpenAPI VERIFICATION ENGINEmodel-checkthe spec itself LLM SYNTHESIS ENGINEgenerate body for eachop via Dafny + CEGIS CODE EMITTERSOpenAPI spec · SQL migrations · Server implClient SDK · Property tests · Dockerfile CONFORMANCE CHECKERschemathesis (structural) + property tests (behavioral)run against generated server to verify end-to-end

6.2 The Five Stages

Stage 1: Spec Language + Parser

A new DSL combining:

  • Alloy-like relational data modeling (sigs, fields, relations)
  • TLA+/Quint-like state transition definitions (pre/post conditions)
  • VDM-like operation modeling (requires/ensures)
  • TypeSpec-like API hints (when the user wants to override conventions)

See Section 7 for the language design.

Stage 2: Convention Engine (the key innovation)

A declarative mapping that bridges the abstraction gap:

state mutation with input  ->  POST /{resource}
state read, no mutation    ->  GET /{resource}/{id}
partial state mutation     ->  PATCH /{resource}/{id}
state deletion             ->  DELETE /{resource}/{id}
collection read            ->  GET /{resource}
requires clause            ->  HTTP 422 + request validation
ensures clause             ->  response schema + property test
fact / invariant           ->  DB constraint + runtime assertion
sig with fields            ->  DB table + ORM model
relation (one-to-many)     ->  foreign key
relation (many-to-many)    ->  junction table

This is what Alchemy did for databases in 2008. We extend it to HTTP + DB + tests.

Stage 3: Verification Engine

Before generating any code, verify the spec itself:

  • Model-check for reachability (can all operations be invoked?)
  • Check for conflicting invariants
  • Verify that postconditions are achievable given preconditions
  • Check for deadlocks in state machine transitions

Tools: Alloy Analyzer for relational checks, or compile to Quint/TLA+ for temporal checks.

Stage 4: LLM Synthesis + CEGIS Loop (for business logic)

For each operation:

  1. Extract the pre/postcondition pair from the spec
  2. Generate a Dafny implementation using an LLM (Clover-style triangulation)
  3. Verify with Dafny verifier
  4. If verification fails, feed error + counterexample back to LLM
  5. Repeat (budget: ~8 iterations, per DafnyPro research showing 86% success)
  6. Compile verified Dafny to target language (Python, Go, Java, etc.)

For simple CRUD operations, skip the LLM -- the convention engine can emit them directly (like JHipster does).

Stage 5: Conformance Testing

Generate a test suite from the spec:

  • Structural tests: Schemathesis-style fuzzing against the generated OpenAPI
  • Behavioral tests: Hypothesis-style stateful tests encoding the spec's pre/postconditions as a state machine model (QuickCheck/Hypothesis RuleBasedStateMachine)
  • Trace validation: If the spec has temporal properties, generate TLA+ traces and validate against running system (MongoDB's approach, Kuppe et al. 2024)

7. The Spec Language Design

7.1 Design Principles

  1. Looks like pseudocode, not math -- target audience is developers, not proof theorists
  2. Structural + behavioral in one file -- don't split across formalisms
  3. Conventions are defaults, overridable -- the convention engine makes decisions you can override
  4. No HTTP/DB/language concepts -- those are compilation targets, not spec concerns
  5. Machine-verifiable -- every statement can be model-checked before code generation

7.2 Language Sketch

service UrlShortener {

  // --- Data Model (Alloy-inspired) ---

  entity ShortCode {
    value: String
    invariant: len(value) >= 6 and len(value) <= 10
    invariant: value matches /^[a-zA-Z0-9]+$/
  }

  entity LongURL {
    value: String
    invariant: isValidURI(value)
  }

  // --- State (the abstract "database") ---

  state {
    store: ShortCode -> lone LongURL    // partial function: each code maps to at most one URL
    created_at: ShortCode -> DateTime   // metadata
  }

  // --- Operations (VDM/Dafny-inspired) ---

  operation Shorten {
    input:   url: LongURL
    output:  code: ShortCode, short_url: String

    requires: isValidURI(url.value)

    ensures:
      code not in pre(store)           // code was fresh
      store'[code] = url               // store updated
      short_url = base_url + "/" + code.value
      #store' = #store + 1             // exactly one entry added
  }

  operation Resolve {
    input:  code: ShortCode
    output: url: LongURL

    requires: code in store             // code must exist

    ensures:
      url = store[code]                // correct lookup
      store' = store                   // state unchanged
  }

  operation Delete {
    input:  code: ShortCode

    requires: code in store

    ensures:
      code not in store'               // removed
      #store' = #store - 1
  }

  // --- Global Invariants ---

  invariant: all c in store | isValidURI(store[c].value)
  invariant: all c in store | c in created_at

  // --- Optional: Override Conventions ---

  conventions {
    Resolve.http_method = "GET"           // default would be GET anyway (no mutation)
    Resolve.http_status_success = 302     // override: redirect instead of 200
    Resolve.http_header "Location" = output.url
  }
}

7.3 What the Convention Engine Produces from This

Spec ElementConvention Output
entity ShortCodeDB column type VARCHAR(10) CHECK(...), Pydantic/Go struct
entity LongURLDB column type TEXT CHECK(...), Pydantic/Go struct
state store: ShortCode -> lone LongURLDB table store(code VARCHAR PK, url TEXT NOT NULL)
operation Shorten (mutates state, has input)POST /shorten
operation Resolve (reads state, no mutation)GET /{code}
operation Delete (removes from state)DELETE /{code}
requires: code in storeHTTP 404 when code not found
requires: isValidURI(url.value)HTTP 422 with validation error
invariant: all c in store | isValidURI(...)DB CHECK constraint + runtime assertion
ensures: code not in pre(store)Property test: after POST, code didn't exist before

7.4 What the LLM Synthesizes

The convention engine handles the structural mapping. The LLM handles:

  • How to generate a short code (the algorithm inside Shorten -- hash? random? counter?)
  • How to construct the short_url (string concatenation with base URL)
  • Any non-trivial computation in operation bodies

The LLM generates these as Dafny functions with the spec's pre/postconditions as Dafny requires/ensures clauses. The Dafny verifier confirms correctness. Then Dafny compiles to the target language.


8. Technical Risks and Mitigations

Risk 1: The Abstraction Gap is Too Wide

Concern: The spec says store' = store + {code -> url} but code needs to decide about transactions, retries, connection pooling, etc.

Mitigation: The convention engine encodes these decisions as infrastructure templates. A Postgres template includes transaction handling. A Redis template includes connection management. The user selects a "deployment profile" (postgres+fastapi, redis+go-chi, etc.) and the templates fill the infrastructure gap. This is what JHipster does successfully.

Risk 2: LLM Synthesis Fails for Complex Operations

Concern: DafnyPro achieves 86% on benchmarks, but real operations may be harder.

Mitigation:

  • For CRUD operations (majority of REST APIs), skip the LLM entirely -- emit directly from conventions
  • For complex operations, fall back to generating a skeleton with a TODO comment
  • The user can always write the body manually; the spec still generates everything else
  • Use the Clover triangulation approach (code + annotations + docstrings cross-validated)

Risk 3: Generated Code Quality is Poor

Concern: openapi-generator "often doesn't compile" for some targets. Dafny's generated code is "not idiomatic" and requires runtime libraries.

Mitigation:

  • Generate to a small number of well-tested targets (Python/FastAPI, Go/chi, TypeScript/Express)
  • Use target-specific post-processing templates (not generic code generation)
  • Each target is hand-tuned, not auto-derived from a meta-generator

Risk 4: The Spec Language is Too Complex to Learn

Concern: TLA+ has high learning curve; Alloy is unfamiliar to most developers.

Mitigation:

  • Design the DSL to look like pseudocode (see Section 7.2)
  • Avoid mathematical notation -- use not in instead of \notin, and instead of \land
  • Provide error messages that explain what the spec means in plain English
  • Leverage LLMs to translate natural language requirements to spec language (Eudoxus approach)

Risk 5: Verification of the Spec Itself is Incomplete

Concern: Model checking is bounded; can't prove properties for all possible states.

Mitigation:

  • For REST services, state spaces are typically manageable (unlike distributed protocols)
  • Use Alloy-style bounded checking for data model properties
  • Use Quint/TLA+ for temporal properties if needed
  • Accept that bounded verification catches most bugs (Amazon's experience: found bugs in every system)

9. Build Plan

Phase 1: Spec Language + Convention Engine (Core)

  • Design and implement the DSL grammar (ANTLR4 via antlr-ng TypeScript target)
  • Implement the convention engine (spec IR -> HTTP mapping, DB schema, OpenAPI)
  • Generate: OpenAPI spec, SQL migrations, server stubs (one target language)
  • Deliverable: Given a spec, produce a running CRUD service with validation
  • Estimated scope: ~2000-3000 lines of code

Phase 2: Verification Engine

  • Integrate spec model-checking (compile to Alloy or Quint for analysis)
  • Verify spec consistency before code generation
  • Deliverable: Catch spec errors (conflicting invariants, unreachable operations) before generating code

Phase 3: Test Generation

  • Generate Schemathesis config from the produced OpenAPI
  • Generate Hypothesis RuleBasedStateMachine tests from spec operations
  • Generate property tests from ensures clauses
  • Deliverable: Run make test and get spec-derived conformance tests

Phase 4: LLM Synthesis Loop

  • Integrate LLM for non-trivial operation bodies
  • Implement Clover-style triangulation (code + Dafny annotations + docstrings)
  • Implement CEGIS feedback loop with Dafny verifier
  • Compile verified Dafny to target language
  • Deliverable: Complex operations are synthesized and verified, not just stubbed

Phase 5: Multi-Target + Polish

  • Add Go, TypeScript targets
  • Add deployment artifacts (Dockerfile, docker-compose, CI config)
  • Add conventions override system
  • Deliverable: Production-quality generated services in 3 languages

10. Key Sources

Foundational (Must-Read)

SourceWhy
Alchemy (FSE 2008)Only prior Alloy-to-implementation compiler
Milicevic PhD Thesis (MIT 2015)Unifying framework for declarative+imperative
Clover (Stanford 2023)Best LLM+verification triangulation approach
DafnyPro (2026)SOTA Dafny verification with LLMs (86%)
AWS Formal Methods (CACM)Industry validation that formal specs work for services
P LanguageAWS's state-machine spec language for services
SchemathesisBest spec-driven API testing tool

Spec Language Design

SourceWhy
Quint Language ReferenceModern TLA+ syntax inspiration
TypeSpecAPI-focused DSL design patterns
Smithy IDLResource+trait-based API modeling
Ballerina SpecNetwork-native type system design
JolieService-oriented language with formal semantics (SOCK calculus)
Dafny ReferenceVerification-aware language design

Code Generation Infrastructure

SourceWhy
Dafny Compilation TargetsMulti-language extraction pipeline
Smithy-DafnySmithy model -> Dafny -> verified code
Xtext DSL FrameworkGrammar-based DSL tooling
JetBrains MPSProjectional DSL workbench
RosetteSolver-aided DSL embedding
Event-B CodeGenRefinement-based code generation plugins

LLM+Verification

SourceWhy
AutoVerus (MSR 2024)Multi-agent proof generation
AlphaVerus (CMU 2024)Self-improving verified code via bootstrapping
Laurel (UCSD 2024)Localizing where proofs are needed
VerMCTS (Harvard 2024)Verifier-guided search over program space
Eudoxus/SPEAC (Berkeley 2024)"Parent language" design for LLM code gen
LMGPA (2025)LLM-guided TLA+ proof automation
CEGNS (Oxford 2020)Foundational CEGIS + neural network paper

Testing + Conformance

SourceWhy
Schemathesis GitHubProperty-based API testing from OpenAPI
RESTler (MSR)Stateful REST API fuzzing
TLA+ Trace Validation (Kuppe 2024)Checking implementations against TLA+ specs
Hypothesis Stateful TestingModel-based testing in Python
Spec ExplorerMicrosoft's pioneering MBT tool (saved 50 person-years)

MDE + Domain Engineering

SourceWhy
JHipster JDLMost mature spec-to-fullstack generator
EMF-REST (arXiv)EMF models to JAX-RS REST APIs
LEMMAMulti-DSL microservice architecture modeling
Context Mapper -> JHipsterDDD bounded contexts to microservices
Session Types CatalogBehavioral types for communication protocols
Scribble Protocol LanguageProtocol-safe code generation from global types

11. What Makes This Project Novel

No existing tool combines all five of:

  1. Formal behavioral specification (pre/postconditions, invariants, state transitions)
  2. Convention-based HTTP/DB mapping (no manual HTTP/SQL/ORM code)
  3. Verified code generation (Dafny-verified business logic)
  4. Spec-derived conformance tests (structural + behavioral + temporal)
  5. Single-pass compilation (spec in, running service out)

Individual pieces exist:

  • JHipster does #2 and #5 (but no behavioral specs)
  • Dafny does #3 (but no REST awareness)
  • Schemathesis does #4 (but only structural, and testing-only)
  • TLA+/Alloy do #1 (but no code generation)

The compiler is the integration -- and that integration is the product.


12. Competitive Positioning

The spec-to-REST compiler occupies a unique position in the landscape. Here is how it compares to the three categories of tools developers might reach for instead.

12.1 vs AI Coding Agents (Claude, ChatGPT, Cursor, Copilot, v0.dev)

AI coding agents can generate a REST service from a natural language prompt. But they provide none of the guarantees that matter for production software:

DimensionAI Agent ("build me a URL shortener")Spec-to-REST Compiler
CorrectnessBest-effort; may have subtle bugs (e.g., race conditions, missing validation)Formally verified: Dafny-checked business logic, model-checked invariants
ReproducibilityDifferent output every time; prompt sensitivityDeterministic: same spec always produces same code
Incremental updatesRegenerate everything or manually patchChange one operation in the spec, regenerate only affected files
Spec as documentationThe prompt is lost; code is the only artifactThe spec is a living, machine-checked document of system behavior
Conformance testingNone; you write your own testsAuto-generated: property tests from postconditions, fuzz tests from OpenAPI, stateful tests from state machine model
Behavioral verificationCannot verify that code matches intended behaviorPre/postconditions are checked by Dafny verifier and enforced by generated tests

AI agents are useful for prototyping and exploration. The spec-to-REST compiler is for when you need to know the service is correct and keep it correct as requirements change.

12.2 vs Low-Code / Instant-API Platforms (Supabase, Hasura, PostgREST)

These tools generate REST or GraphQL APIs directly from a database schema. They handle CRUD well but cannot express behavioral invariants:

DimensionSupabase / Hasura / PostgRESTSpec-to-REST Compiler
Data modelingSQL schema (tables, columns, FKs)Formal entities with invariants, state relations with multiplicities
Business logicCustom functions, triggers, or external codeSpecified as pre/postconditions; verified and synthesized
InvariantsCHECK constraints only (single-row)Cross-entity invariants, temporal properties, state machine transitions
AuthRow-level security policies (SQL)requires: caller.role in {Admin} as formal, verifiable spec constructs
TestingManualAuto-generated from the spec
PortabilityLocked to Postgres (PostgREST, Supabase) or Hasura's runtimeEmits to Python/FastAPI, Go/chi, TypeScript/Express with any SQL backend

Low-code tools are excellent for simple CRUD APIs where the database schema IS the spec. The spec-to-REST compiler is for services with behavioral complexity that goes beyond what database constraints can express.

12.3 vs Spec-First API Generators (OpenAPI Generator, Smithy, TypeSpec, JHipster)

These tools generate code from structural API descriptions but cannot verify behavior:

DimensionOpenAPI Generator / Smithy / JHipsterSpec-to-REST Compiler
Specification scopeStructural only (request/response shapes, status codes)Structural + behavioral (pre/postconditions, invariants, state transitions)
Code qualityStubs or scaffolds; business logic is manualComplete, runnable services with verified business logic
VerificationNone (except JHipster's entity validation)Model-checking of the spec, Dafny verification of generated code
TestingManual (or Schemathesis for structural fuzzing)Structural + behavioral + temporal tests, all auto-generated
Behavioral guaranteesNone; the generated code compiles but may not be correctPostconditions are formally verified before code emission

12.4 The Value Proposition

Formal specs + verified generation = correctness guarantees that no other approach provides.

  • AI agents generate code but cannot verify it behaves correctly.
  • Low-code platforms handle CRUD but cannot express behavioral invariants.
  • Spec-first generators describe structure but not behavior.

The spec-to-REST compiler is the only tool that takes a behavioral specification and produces a running service with machine-checked correctness guarantees. The spec is both the documentation and the source of truth; the generated code is proven to conform to it.

On this page

1. Executive Summary2. Prior Art: What Has Been Tried2.1 Alchemy (2008) -- The Only Direct Predecessor2.2 Imperative Alloy (2010) -- Alloy-to-Prolog2.3 aRby (2014) -- Alloy Embedded in Ruby2.4 Squander (2011) -- Alloy Specs as Java Annotations2.5 Milicevic PhD Thesis (2015) -- Unifying Framework2.6 Rosette (Active) -- The Spiritual Successor3. The LLM+Verification Frontier (2023-2026)3.1 The Dominant Pattern: Generate-Check-Repair3.2 Key Projects (ranked by relevance to our compiler)3.3 Key Trend: Dafny as the Verification IL3.4 Implications for Our Compiler4. Spec-First REST Tools: The Structural Side4.1 Tools That Generate Code from API Specs4.2 Key Insights for Our Compiler4.3 Testing Tools That Verify Implementations Against Specs5. Formal Specification Languages: The Behavioral Side5.1 Languages Ranked by Relevance5.2 The P Language Deserves Special Attention5.3 Session Types -- Behavioral Types for Communication6. The Compiler Architecture6.1 Overview6.2 The Five Stages7. The Spec Language Design7.1 Design Principles7.2 Language Sketch7.3 What the Convention Engine Produces from This7.4 What the LLM Synthesizes8. Technical Risks and MitigationsRisk 1: The Abstraction Gap is Too WideRisk 2: LLM Synthesis Fails for Complex OperationsRisk 3: Generated Code Quality is PoorRisk 4: The Spec Language is Too Complex to LearnRisk 5: Verification of the Spec Itself is Incomplete9. Build PlanPhase 1: Spec Language + Convention Engine (Core)Phase 2: Verification EnginePhase 3: Test GenerationPhase 4: LLM Synthesis LoopPhase 5: Multi-Target + Polish10. Key SourcesFoundational (Must-Read)Spec Language DesignCode Generation InfrastructureLLM+VerificationTesting + ConformanceMDE + Domain Engineering11. What Makes This Project Novel12. Competitive Positioning12.1 vs AI Coding Agents (Claude, ChatGPT, Cursor, Copilot, v0.dev)12.2 vs Low-Code / Instant-API Platforms (Supabase, Hasura, PostgREST)12.3 vs Spec-First API Generators (OpenAPI Generator, Smithy, TypeSpec, JHipster)12.4 The Value Proposition