spec_to_rest
Research

Parser Reuse Analysis

Edit on GitHub

Evaluating reuse of existing language parsers vs building new

Last updated:

Research document evaluating whether the spec-to-REST compiler should reuse or extend an existing formal specification language's parser instead of building a new DSL from scratch. Covers Alloy 6, Quint, Dafny, TypeSpec, Smithy, and Ballerina with detailed architectural analysis and a final recommendation.


Table of contents

  1. What Our Spec Language Needs
  2. Option 1: Extend Alloy 6
  3. Option 2: Extend Quint
  4. Option 3: Use Dafny's Parser
  5. Option 4: Use TypeSpec's Parser
  6. Option 5: Use Smithy's Parser
  7. Option 6: Use Ballerina
  8. Comparative Matrix
  9. Recommendation

1. What our spec language needs

From the grammar design in 01_spec_language_design.md, our DSL requires these core constructs that any reused parser must handle or be extended to handle:

ConstructDescriptionExample
Entity declarationsTyped fields with multiplicities and inline constraintsentity User { name: String where len(name) >= 1 }
State declarationsMutable service state as typed collectionsstate { users: Set[User] }
Operations with contractsInput/output + requires/ensures clausesoperation Create { requires: ... ensures: ... }
Invariants and factsGlobal constraints that must always holdinvariant: all u in users => len(u.email) > 0
Transition declarationsEntity lifecycle state machinesActive -> Suspended via Suspend when ...
Convention blocksHTTP mapping overridesconventions { Resolve.http_method = "GET" }
Multiplicity annotationsone, lone, some, set on relationsowner: one User
Quantified expressionsall, exists, no with set comprehensionall u in users => u.email != ""
Primed variablesAfter-state referencesusers' = users + {newUser}
Relational typesTyped relations with multiplicitymapping: ShortCode -> lone LongURL
Refinement typesTypes with constraint predicatestype Email = String where matches(email_regex)
Cardinality operator# for collection sizes#users' = #users + 1
Set/Map/Seq collectionsParameterized collection typesSet[User], Map[String, Int], Seq[Event]
Enum declarationsFinite value types for statesenum Status { Active, Suspended, Deleted }

The key insight from the prior survey: no existing language provides all of these. Our DSL uniquely combines Alloy's relational modeling and multiplicities, VDM/Dafny's pre/postcondition syntax, Quint's developer-friendly syntax, and TypeSpec/Smithy's HTTP override capability.


2. Option 1: Extend Alloy 6

Architecture overview

  • Language. Java (90.1%), C++ (5.0% for SAT solver natives)
  • Parser technology. CUP (Java-based LALR parser generator) + JFlex lexer
    • Grammar file: org.alloytools.alloy.core/parser/Alloy.cup
    • Lexer file: org.alloytools.alloy.core/parser/Alloy.lex
  • License. Apache 2.0 (migrated from MIT)
  • Latest release. Alloy 6.2.0 (January 2025)
  • Repository: AlloyTools/org.alloytools.alloy

Parser accessibility

The Alloy jar is explicitly designed to be used as an embeddable library. The CompUtil class provides a clean programmatic API:

// Parse a single module from a string
CompModule module = CompUtil.parseOneModule("sig User { name: one String }");

// Parse a complete specification with all submodules
CompModule module = CompUtil.parseEverything_fromFile(reporter, cache, "spec.als");

// Parse and typecheck an expression in a module context
Expr expr = CompUtil.parseOneExpression_fromString(module, "all u: User | #u.name > 0");

The CompModule return type provides access to all signatures (getAllReachableSigs()), facts, predicates, assertions, and commands.

What Alloy already handles (of our 14 constructs)

Our ConstructAlloy EquivalentCoverage
Entity declarationssig with fieldsFull, sigs are entities with typed fields
Multiplicity annotationsone, lone, some, setFull, native to the language
Relational typesFirst-class relationsFull, Alloy's core strength
Quantified expressionsall, some, noFull, native quantifiers
Primed variablesAlloy 6 var + primeFull, temporal extension
Invariants/factsfact, assertFull, native constructs
Cardinality operator#Full, native operator
Enum declarationsenum keywordFull, supported
State declarationsvar sig / var fieldsPartial, no dedicated state block
Operations with contractspred with pre/post commentsPartial, no requires/ensures keywords
Refinement typesFact constraints on sigsPartial, no inline where syntax
Set/Map/Seq typesRelations (sets native, maps via relations)Partial, no Map[K,V] or Seq[T] syntax
Transition declarationsCan be modeled with pred chainsNone, no dedicated syntax
Convention blocksNot applicableNone, no HTTP concept

Coverage: ~8.5 of 14 constructs (61%)

What would need to be added/modified

  1. operation blocks with requires/ensures, Alloy has predicates but no structured operation declaration. Would need new grammar productions in Alloy.cup for operation, input, output, requires, ensures.

  2. state blocks, Alloy 6 has var fields but no dedicated state declaration block. Minor syntactic addition.

  3. transition declarations, Entirely new. No Alloy equivalent for declarative state machine transitions with via and when clauses.

  4. conventions blocks, Entirely new. HTTP mapping is a foreign concept.

  5. Map[K,V] and Seq[T] type syntax, Alloy uses relations for everything. Would need to add parameterized collection types on top of Alloy's relational foundation.

  6. Refinement types with where, Alloy constrains via facts, rather than inline on type declarations. Would need new syntax.

Integration effort: HIGH

  • CUP is an older parser generator (LALR), less flexible than ANTLR or PEG for adding new productions. Grammar conflicts are likely when adding keyword-heavy blocks alongside Alloy's existing syntax.
  • The Alloy AST (Expr, Sig, Func hierarchy) is tightly coupled to the analyzer and SAT solver. Adding new node types requires changes deep in the compiler pipeline.
  • The Java ecosystem locks us into JVM deployment, which may conflict with desired target platforms.
  • Alloy's type system is relational (everything is a relation of atoms), which does not map cleanly to REST data types like String, DateTime, Map[K,V].

What we gain

  • Bounded model checking via Kodkod/SAT for free, can verify invariants.
  • Temporal operators (always, eventually) for state property verification.
  • Mature relational reasoning (transitive closure, join, etc.).
  • Academic credibility and a well-understood formal foundation.

What we lose

  • Syntax familiarity: sig/fact/pred vocabulary is alien to most developers. The comparative survey rates Alloy syntax as "Medium" friendliness.
  • String handling. Alloy's string support is extremely weak. It cannot reason about string operations, regex matching, or URL validation, critical for REST services.
  • Scalability. Bounded model checking means Alloy cannot prove properties for all sizes. The "small scope hypothesis" is useful but not a guarantee.
  • No code generation pipeline. Alloy is purely analytical. It finds counterexamples but cannot generate implementations.
  • Heavyweight dependency. The Alloy jar bundles SAT solvers, Kodkod, and a GUI application. Extracting just the parser is possible but messy.

The impedance mismatch between Alloy's relational atom-based universe and REST service types is too large. Adding HTTP concepts, structured operations, and developer-friendly syntax to Alloy would effectively mean rewriting most of the parser and building a new AST on top, gaining only the bounded checking backend, which we can invoke separately without coupling to Alloy's parser.


3. Option 2: Extend Quint

Architecture overview

  • Language. TypeScript (67.1%), Rust (16.3%)
  • Parser technology. ANTLR 4 (generates TypeScript parser from .g4 grammar)
  • License. Apache 2.0
  • Repository: informalsystems/quint
  • Ecosystem. VSCode extension, LSP server (npm: @informalsystems/quint-language-server), REPL, Apalache integration for model checking

Parser architecture

Quint's compiler pipeline has clearly separated phases:

  1. Parsing. ANTLR 4 grammar -> ANTLR parse tree -> Quint IR (JSON-serializable)
  2. Name resolution. Resolves identifiers to their declarations
  3. Type checking. Constraint-based type inference with unification
  4. Effect checking. Tracks read/write effects on state variables
  5. Simulation. Random execution for rapid testing
  6. Model checking. Translation to TLA+ for Apalache SMT-based checking

The IR uses a visitor pattern with well-documented ADRs (Architecture Decision Records). The IR is JSON-serializable, meaning external tools can consume it easily.

What Quint already handles

Our ConstructQuint EquivalentCoverage
State declarationsvar x: intFull, native state variables
Quantified expressionsall, existsFull, native quantifiers
Primed variablesx' = exprFull, core of the language
Invariants/factsval inv = ... / temporal propertiesFull, native
Enum declarationsSum types: type Status = Active | SuspendedFull, discriminated unions
Cardinality operatorsize() methodFull, via standard library
Operations with contractsaction with boolean pre/postconditionsPartial, actions can encode pre/post but no requires/ensures keywords
Entity declarationsNo direct equivalentPartial, can use type + record, but no multiplicity annotations
Set/Map/Seq typesSet[T], Map[K,V], List[T]Partial, sets and maps native, but List not Seq
Refinement typesNo inline constraint syntaxNone, types are not refinable
Multiplicity annotationsNot applicableNone, no relational multiplicities
Relational typesNot applicableNone, no first-class relations
Transition declarationsCan be encoded in actionsNone, no declarative state machine syntax
Convention blocksNot applicableNone, no HTTP concept

Coverage: ~6.5 of 14 constructs (46%)

What would need to be added/modified

  1. entity declarations with fields and multiplicities, Quint has record types but no entity keyword, no field-level multiplicity, and no inline where constraints. Would need new ANTLR grammar rules.

  2. service top-level block, Quint uses module. Would need a new top-level construct or treat service as a special module.

  3. operation blocks with requires/ensures, Quint actions are boolean expressions that update state. They can encode pre/postconditions but lack structured input/output declarations and explicit contract keywords.

  4. transition declarations, Entirely new. No Quint equivalent.

  5. conventions blocks, Entirely new.

  6. Multiplicity annotations (one, lone, some, set), Foreign to Quint's type system. Would require extending the type checker significantly.

  7. Relational types (A -> lone B), Quint uses Map[A, B] or Set[A], rather than relations with multiplicity. Fundamental type system difference.

  8. Refinement types, Would need where clause support on type aliases.

Integration effort: MEDIUM-HIGH

  • ANTLR 4 grammar is straightforward to extend with new productions. Grammar files are typically 200-500 lines; adding new rules is well-documented.
  • The IR visitor architecture (documented in ADR003) provides a clean extension point. New node types can be added to the IR and handled by visitors.
  • TypeScript codebase is accessible and well-structured. The compiler phases are cleanly separated (unlike Alloy's monolithic parser-to-SAT pipeline).
  • However, Quint's type system fundamentally lacks relations and multiplicities. Adding them would mean extending the type checker, effect checker, and all downstream consumers, a substantial change.
  • Quint's value proposition is TLA+-based temporal reasoning. Our DSL uses temporal concepts minimally (mostly for invariants). Much of Quint's complexity (non-determinism, temporal formulas, Apalache translation) would be dead weight.

What we gain

  • ANTLR grammar we can extend, saves parser bootstrapping effort.
  • TypeScript ecosystem, VSCode extension, LSP server, REPL.
  • Simulation engine, random execution for rapid spec testing.
  • Apalache backend, SMT-based model checking for invariant verification.
  • Developer-friendly syntax, rated "High" in comparative survey.
  • Active development, regular releases, npm packages, growing community.

What we lose

  • Relational modeling, Quint's set/map types cannot express Alloy-style relations with multiplicities. This is our DSL's most distinctive feature.
  • Entity declarations, Would need to bolt on an entirely foreign concept.
  • HTTP mapping, Still needs to be built from scratch.
  • Fork maintenance burden, Forking Quint means maintaining a divergent codebase against upstream changes. Quint is actively evolving, so merge conflicts would be constant.
  • No code generation, Quint is a specification language, rather than a generator. The implementation pipeline must still be built.

Verdict: INTERESTING BUT POOR FIT

Quint is the closest existing language to our needs in terms of state management and developer ergonomics. But the lack of relational types and multiplicities, the core of our entity modeling -- means we would need to add so much that the result would be a different language wearing Quint's skin. The fork maintenance burden is the killer: we would get Quint's tooling for free initially but pay for it indefinitely in merge conflicts.


4. Option 3: Use dafny's parser

Architecture overview

  • Language. C# (56.9%), Dafny (33.0%), plus Rust, Java, Go, F#
  • Parser technology. Hand-written recursive descent parser in C#
  • License. MIT
  • Latest releases. Active development with frequent releases (Dafny 4.x series)
  • Repository: dafny-lang/dafny
  • Verification backend. Boogie -> Z3 SMT solver

Parser architecture

Dafny's parser is a hand-written recursive descent parser in C#, tightly integrated with the compiler pipeline. The compiler phases are:

  1. Parsing -> AST construction
  2. Name resolution and type checking
  3. Translation to Boogie (intermediate verification language)
  4. Boogie generates verification conditions
  5. Z3 SMT solver discharges proof obligations
  6. Code generation to C#, Java, Go, JavaScript, Python

What Dafny already handles

Our ConstructDafny EquivalentCoverage
Operations with contractsmethod with requires/ensuresFull, Dafny's primary purpose
Invariants/factsinvariant in loops, class invariantsFull, native
Quantified expressionsforall, existsFull, native
Enum declarationsdatatype with constructorsFull, algebraic data types
Set/Map/Seq typesset<T>, map<K,V>, seq<T>Full, native collection types
Refinement typesSubset types with | constraintFull, type Pos = x: int | x > 0
Cardinality operator|s| for collection sizeFull, native
Primed variablesold() for pre-state referencePartial, old() refers to pre-state, not x' for post-state
Entity declarationsclass with fieldsPartial, classes but no multiplicities
State declarationsClass fieldsPartial, no dedicated state block
Multiplicity annotationsNot applicableNone, no relational multiplicities
Relational typesNot applicableNone, no first-class relations
Transition declarationsCan be encoded in methodsNone, no declarative syntax
Convention blocksNot applicableNone, no HTTP concept

Coverage: ~8 of 14 constructs (57%)

What would need to be added/modified

  1. entity with multiplicities, Dafny classes have fields but no multiplicity annotations. Would need new syntax.

  2. service top-level block, Dafny uses module. Would need a new construct.

  3. Relational types, Foreign to Dafny's type system.

  4. transition declarations, Entirely new.

  5. conventions blocks, Entirely new.

  6. Primed variables, Dafny uses old() for pre-state, not x' for post-state. Different paradigm (method-level vs. state-level).

Integration effort: VERY HIGH

  • The parser is hand-written in C#, no grammar file to extend. Every new construct requires manually coding parser functions, AST nodes, and all downstream handling.
  • Dafny's compiler is deeply integrated: parser, resolver, translator, and verifier are tightly coupled. Adding new AST node types propagates changes through the entire pipeline to Boogie translation.
  • C# ecosystem may conflict with deployment targets.
  • Dafny is a substantial project (~2M lines). Understanding the codebase enough to safely extend the parser is a multi-month investment.
  • The smithy-dafny project demonstrates that it is feasible to generate Dafny from external models, but not to extend Dafny's parser itself.

What we gain

  • Proven verification, Z3/Boogie backend provides mathematical proof of correctness, rather than just bounded checking.
  • Code generation, Dafny compiles to C#, Java, Go, JS, Python.
  • requires/ensures syntax, Exactly what we want for operation contracts.
  • Mature type system, Subset types, generic collections, algebraic data types.
  • smithy-dafny precedent, AWS has already mapped Smithy constraint traits to Dafny specifications, proving the Smithy-to-Dafny pipeline works.

What we lose

  • Relational modeling, Dafny thinks in terms of classes, methods, and mathematical types. No first-class relations.
  • Multiplicities, Cannot express one, lone, some, set on fields.
  • Developer ergonomics, Dafny syntax is rated "High" for developers familiar with verification, but the verification concepts (loop invariants, decreases clauses, ghost variables) are intimidating for REST API developers.
  • Implementation body requirement, Dafny requires the programmer to write the method body, not just the contract. For our use case, the compiler should synthesize the body from the contract.

Verdict: USE AS BACKEND, NOT AS PARSER

Dafny is extremely valuable as a verification backend (the smithy-dafny pattern: generate Dafny from our IR, verify with Z3, then generate target code). But extending Dafny's hand-written C# parser to accommodate our entity/state/ convention syntax is the wrong approach. The parser is not designed for extension, and the impedance mismatch on relational types and multiplicities is too large.


5. Option 4: Use typespec's parser

Architecture overview

  • Language. TypeScript/JavaScript (primary), Java (58.3% in repo includes Java emitter code), C#
  • Parser technology. Custom recursive descent parser in TypeScript
  • License. MIT
  • Stars: 5.7k GitHub stars
  • Repository: microsoft/typespec
  • Extension model. Libraries (npm packages) with decorators and emitters

Extension architecture

TypeSpec has the most mature extension model of all candidates:

  1. Custom decorators. Define @myDecorator in TypeSpec, implement $myDecorator in JavaScript. Decorators store metadata via context.program.stateMap.

  2. Custom emitters. TypeSpec emitters traverse the compiled type model using navigateProgram(), navigateType(), listServices() and generate arbitrary output.

  3. Typekits. Provide convenient compiler API access for examining type relationships and extracting decorator metadata.

  4. Library packages. Decorators, types, and emitters are packaged as npm libraries that can be imported.

What TypeSpec already handles

Our ConstructTypeSpec EquivalentCoverage
Entity declarationsmodelPartial, models define shapes but no multiplicities, no inline constraints
Enum declarationsenumFull, native
Set/Map/Seq typesArray types, Record<K,V>Partial, arrays but not Set or Seq semantics
Convention blocks@route, @get, @post, @query, @pathFull, native HTTP decorators
Operations with contractsopPartial, operations define signatures but NO pre/postconditions
Refinement types@minLength, @maxLength, @pattern decoratorsPartial, structural constraints only
State declarationsNot applicableNone, TypeSpec is stateless
Invariants/factsNot applicableNone, no constraint language
Quantified expressionsNot applicableNone, no expression language
Primed variablesNot applicableNone, no state concept
Multiplicity annotationsNot applicableNone, no relational multiplicities
Relational typesNot applicableNone, no first-class relations
Transition declarationsNot applicableNone, no state machines
Cardinality operatorNot applicableNone, no expression language

Coverage: ~3.5 of 14 constructs (25%)

What would need to be added/modified

Nearly everything behavioral:

  1. An entire expression language, TypeSpec has no expression language at all. It describes shapes, rather than behavior. Adding requires, ensures, quantified expressions, arithmetic, set operations, primed variables would require building a complete expression parser and evaluator.

  2. State concept, TypeSpec is fundamentally stateless. Adding mutable state variables requires a paradigm shift in the type model.

  3. Invariants, TypeSpec has no concept of global constraints.

  4. Multiplicity annotations, Foreign to TypeSpec's type system.

  5. Relational types, Not expressible.

  6. Transition declarations, Entirely new.

Could we use custom decorators instead?

One might imagine encoding behavioral specs as decorators:

@requires("code not in store.keys()")
@ensures("store'[code] = url")
@ensures("#store' = #store + 1")
op shorten(@body request: ShortenRequest): ShortenResponse;

This approach fails because:

  • Decorator values are opaque strings, TypeSpec cannot parse, typecheck, or verify them. They are just metadata.
  • No expression resolution: store, code, #store' have no meaning in TypeSpec's type system.
  • We would need to build our own expression parser to interpret these strings, at which point we are building a new parser anyway, just with TypeSpec as an awkward wrapper.

Integration effort: VERY HIGH (for behavioral specs)

  • TypeSpec's extension model is excellent for structural extensions (new decorators on shapes, custom emitters that generate different output formats).
  • But adding a behavioral specification language to a fundamentally structural tool requires building all the hard parts from scratch: expression parser, type checker for expressions, state model, verification semantics.
  • The TypeSpec parser itself is not designed to be extended with new syntax productions, only with new decorators and types.

What we gain

  • HTTP modeling, TypeSpec's @typespec/http and @typespec/rest packages provide the best HTTP API modeling of any option.
  • Emitter ecosystem, OpenAPI 3, JSON Schema, Protobuf emitters exist.
  • Developer experience, VSCode extension, playground, excellent docs.
  • Active community, 5.7k stars, Microsoft backing, Linux Foundation.
  • npm ecosystem, Easy distribution as packages.

What we lose

  • All behavioral specification, Must be built from scratch.
  • Verification, TypeSpec has no verification capability.
  • The core value proposition, Our compiler's differentiator is behavioral specification. TypeSpec provides exactly the part we do not need to innovate on (structural API modeling), while providing nothing for the part we do.

Verdict: WRONG DIRECTION

TypeSpec solves the opposite problem. It excels at structural API description but has zero behavioral specification capability. Extending it to add behavior would mean building our entire specification language as strings inside decorators or fundamentally restructuring TypeSpec's parser -- either way, we get almost no benefit from starting with TypeSpec.

However, TypeSpec is an excellent output target: our compiler could emit TypeSpec (or OpenAPI via TypeSpec) as part of the code generation pipeline.


6. Option 5: Use smithy's parser

Architecture overview

  • Language. Java
  • Parser technology. Hand-written parser for Smithy IDL
  • License. Apache 2.0
  • Repository: smithy-lang/smithy
  • Extension model. Custom traits (shapes), validators (Java SPI), code generators (plugins)

Semantic model and extension architecture

Smithy has a sophisticated, well-documented extension model:

  1. Custom traits. Define new trait shapes in Smithy IDL, implement in Java with TraitService (discovered via Java SPI). Traits carry metadata that code generators can consume.

  2. Validators. Implement the Validator interface, register via SPI. Run automatically during model validation.

  3. Knowledge indexes. Pre-computed caches on the Model object: TopDownIndex, HttpBindingIndex, PaginatedIndex, etc.

  4. Model transformers: ModelTransformer for preprocessing (flatten mixins, prune shapes, copy error shapes).

  5. Code generators: ShapeVisitor pattern for type-specific dispatch.

What Smithy already handles

Our ConstructSmithy EquivalentCoverage
Entity declarationsresource + structure shapesPartial, resources with identifiers/properties, but no multiplicities
Enum declarationsenum shapeFull, native
Convention blocks@http, @readonly, @idempotent traitsFull, native HTTP binding
Operations with contractsoperation with input/output shapesPartial, structural only, no pre/postconditions
Refinement types@pattern, @length, @range constraintsPartial, structural constraints only
Set/Map/Seq typeslist, map shapesPartial, list and map but not set semantics
State declarationsNot applicableNone, Smithy is stateless
Invariants/factsNot applicableNone, no constraint language
Quantified expressionsNot applicableNone, no expression language
Primed variablesNot applicableNone, no state concept
Multiplicity annotations@required onlyMinimal, binary required/optional only
Relational typesNot applicableNone, no first-class relations
Transition declarationsNot applicableNone, no state machines
Cardinality operatorNot applicableNone, no expression language

Coverage: ~4 of 14 constructs (29%)

The smithy-dafny precedent

AWS's smithy-dafny project is the most relevant precedent. Its pipeline:

  1. Write a Smithy model describing a service API
  2. smithy-dafny-codegen-cli generates a Dafny API skeleton
  3. Developer writes Dafny implementation with requires/ensures
  4. Dafny verifier proves correctness
  5. smithy-dafny-codegen-cli generates glue code for C#, Java, JS, Go, Python

Key insight: smithy-dafny maps Smithy constraint traits (@pattern, @length, @range, @required) to Dafny specifications that are statically checked. This proves the Smithy -> Dafny verification pipeline works.

But. In smithy-dafny, the developer still writes the Dafny implementation body. Our compiler needs to synthesize the body from the specification. And Smithy's constraints are purely structural, no behavioral pre/postconditions.

Could we add pre/postcondition traits?

We could define custom Smithy traits:

@trait(selector: "operation")
structure requires {
    conditions: ConditionList
}

@trait(selector: "operation")
structure ensures {
    conditions: ConditionList
}

list ConditionList {
    member: String
}

Applied:

@requires(conditions: ["code not in store.keys()"])
@ensures(conditions: ["store'[code] = url", "#store' = #store + 1"])
operation Shorten { ... }

This has the same fundamental problem as TypeSpec decorators: the condition strings are opaque. Smithy cannot parse, typecheck, or verify them. We would need to build a complete expression parser to interpret these strings, and Smithy's Java-based validator SPI would need to invoke our custom parser/checker.

Integration effort: HIGH

  • Custom traits are easy to define in Smithy IDL.
  • But traits with opaque expression strings gain nothing from Smithy's type system.
  • Building expression parsing, type checking, and verification inside Smithy's Java validator framework means writing our entire specification engine in Java, constrained by Smithy's validation lifecycle.
  • Smithy's parser is hand-written and not designed for syntax extensions.

What we gain

  • Service/resource/operation modeling, Smithy's resource lifecycle (CRUD) maps well to REST.
  • Code generation ecosystem, smithy-codegen generates clients/servers for 7+ languages.
  • AWS battle-testing, Used for all AWS services.
  • smithy-dafny pipeline, Proven Smithy -> Dafny verification path.
  • Trait extensibility, Can add custom metadata to any shape.

What we lose

  • All behavioral specification, Must be built as opaque strings inside traits.
  • Expression language, No benefit from Smithy's parser.
  • State modeling, Smithy is stateless.
  • Java lock-in, The entire Smithy ecosystem is Java-based.

Verdict: VALUABLE AS PIPELINE COMPONENT, NOT AS PARSER BASE

Like TypeSpec, Smithy excels at structural service modeling but provides nothing for behavioral specification. The smithy-dafny precedent is architecturally instructive: it demonstrates that the right approach is to generate Dafny from a service model, rather than to cram specifications into service model annotations.

The optimal use of Smithy in our pipeline would be as an output format: our compiler generates Smithy models (for SDK/client generation) alongside the implementation code.


7. Option 6: Use ballerina

Architecture overview

  • Language. Java (23.4% compiler), Ballerina (76.1% standard library)
  • Parser technology. Originally ANTLR-generated, now custom hand-written parser in Java
  • License. Apache 2.0
  • Stars: 3.8k GitHub
  • Repository: ballerina-platform/ballerina-lang
  • Latest release. Swan Lake 2201.13.2 (March 2026)

Compiler architecture

Ballerina has a three-phase compiler:

  1. Front-end (Java). Lexer -> Parser -> AST -> Symbol creation -> Type checking (structural subtyping) -> Desugaring -> BIR generation
  2. Optimizer. Analysis and transformation on BIR (Ballerina Intermediate Representation, a CFG)
  3. Back-end. JVM bytecode generation (primary), LLVM native (experimental), WebAssembly (planned)

The compiler supports a plugin interface (CompilerPlugin.java) that processes annotations on service nodes, enabling custom validation and code generation.

What ballerina already handles

Our ConstructBallerina EquivalentCoverage
Entity declarationstype records with structural typingPartial, records but no multiplicities
Enum declarationsenum typeFull, native
Convention blocksNative HTTP service declarationFull, first-class HTTP
Operations with contractsresource function with HTTP bindingsPartial, operations but no formal contracts
Set/Map/Seq typesmap<V>, arrays, tablesPartial, maps and arrays but not Set
Refinement typesNot applicableNone, no constraint types
State declarationsService-level variablesPartial, mutable state in services
Invariants/factsNot applicableNone, no constraint language
Quantified expressionsNot applicableNone, no quantifiers (it is a programming language)
Primed variablesNot applicableNone, imperative assignment, rather than spec
Multiplicity annotations? for optionalMinimal, optional/required only
Relational typesNot applicableNone, no first-class relations
Transition declarationsNot applicableNone, no state machines
Cardinality operator.length() methodPartial, method, rather than operator

Coverage: ~4.5 of 14 constructs (32%)

The fundamental mismatch

Ballerina is an implementation language, rather than a specification language. This is a category error. Our compiler needs a language to describe what a service should do (declarative specification), not how it does it (imperative implementation).

Using Ballerina would mean:

  • Writing REST services directly in Ballerina (eliminating the need for our compiler entirely, but also eliminating all formal specification benefits)
  • OR embedding a specification sublanguage in Ballerina (which means building our parser inside Ballerina's ecosystem)

Integration effort: VERY HIGH

  • Ballerina's parser is hand-written in Java and deeply coupled to its compilation pipeline.
  • Adding specification constructs (quantifiers, pre/postconditions, invariants, multiplicities) to an imperative language requires fundamental changes to the type checker and semantic analysis.
  • The compiler plugin mechanism is designed for annotation processing, rather than syntax extension.
  • 127k+ commits across the master branch means the codebase is massive.

What we gain

  • First-class HTTP, Best HTTP support of any option. resource function get users() returns User[] is native syntax.
  • Structural typing, Type compatibility by structure, rather than name.
  • Sequence diagrams, Ballerina can visualize service interactions as sequence diagrams.
  • JVM deployment, Compiles to JVM bytecode.
  • Active community, WSO2 backing, regular releases.

What we lose

  • Everything that makes our project distinctive, Formal specification, behavioral verification, contract-driven synthesis.
  • Declarative semantics, Ballerina is imperative. Our specs must be declarative.
  • The entire compiler value proposition, If developers write Ballerina, they do not need our tool.

Verdict: NOT APPLICABLE

Ballerina solves a different problem (implementation) than ours (specification). However, Ballerina is an excellent compilation target: our compiler could emit Ballerina service code as one of its output formats, leveraging Ballerina's native HTTP support.


8. Comparative matrix

Coverage of our 14 required constructs

ConstructAlloy 6QuintDafnyTypeSpecSmithyBallerina
Entity declarationsFullPartialPartialPartialPartialPartial
State declarationsPartialFullPartialNoneNonePartial
Operations w/ contractsPartialPartialFullPartialPartialPartial
Invariants/factsFullFullFullNoneNoneNone
Transition declarationsNoneNoneNoneNoneNoneNone
Convention blocksNoneNoneNoneFullFullFull
Multiplicity annotationsFullNoneNoneNoneMinimalMinimal
Quantified expressionsFullFullFullNoneNoneNone
Primed variablesFullFullPartialNoneNoneNone
Relational typesFullNoneNoneNoneNoneNone
Refinement typesPartialNoneFullPartialPartialNone
Cardinality operatorFullFullFullNoneNonePartial
Set/Map/Seq typesPartialPartialFullPartialPartialPartial
Enum declarationsFullFullFullFullFullFull

Summary scores

CriterionAlloy 6QuintDafnyTypeSpecSmithyBallerina
Constructs covered8.5/14 (61%)6.5/14 (46%)8/14 (57%)3.5/14 (25%)4/14 (29%)4.5/14 (32%)
Integration effortHighMed-HighVery HighVery HighHighVery High
Parser extensibilityLow (CUP)Medium (ANTLR)Low (hand-written C#)Low (hand-written TS)Low (hand-written Java)Low (hand-written Java)
Ecosystem valueLow (academic)MediumHighHighHighMedium
LicenseApache 2.0Apache 2.0MITMITApache 2.0Apache 2.0
Fork maintenance burdenHighHighVery HighVery HighHighVery High
Verification backendKodkod/SATApalache/SMTZ3/BoogieNoneNoneNone
Code generationNoneNoneC#/Java/Go/JS/PyVia emittersVia smithy-codegenJVM bytecode
HTTP awarenessNoneNoneNoneExcellentExcellentExcellent

The core tension

The analysis reveals a fundamental split:

  • Languages strong on behavior (Alloy, Quint, Dafny) have no HTTP awareness and weak/no code generation to services.
  • Languages strong on HTTP/API modeling (TypeSpec, Smithy, Ballerina) have no behavioral specification capability.

No existing language bridges this gap, which is exactly why our project exists.


9. Recommendation

The verdict: BUILD a NEW PARSER

Build a new parser from scratch, but strategically reuse existing tools as pipeline components rather than as parser bases.

Why not extend an existing parser

  1. No language covers more than 61% of our constructs. The best candidate (Alloy at 61%) still requires adding 5-6 entirely new construct types plus modifying 2-3 existing ones.

  2. The missing constructs are not additive, they are foundational. Adding HTTP convention blocks to Alloy is not like adding a new decorator; it requires rethinking what the language is for. Similarly, adding behavioral specifications to TypeSpec/Smithy requires building an entire expression language from scratch.

  3. Fork maintenance is a project killer. Every candidate is under active development. Forking creates a permanent maintenance obligation to merge upstream changes, which will inevitably conflict with our extensions. Quint, TypeSpec, and Dafny release frequently and make breaking changes.

  4. Parser technology mismatch. The two most extensible parser technologies (ANTLR for Quint, CUP for Alloy) still require deep changes to the AST, type checker, and all downstream phases. Extending a grammar file is 5% of the work; the other 95% is extending the semantic analysis pipeline.

  5. Language ecosystem lock-in. Alloy locks us into JVM + SAT solvers. Dafny locks us into C# + Boogie/Z3. Quint locks us into TypeScript + Apalache. Building our own parser lets us choose the optimal technology stack.

  6. Impedance mismatch on the core differentiator. Our DSL's unique value is combining relational entity modeling (from Alloy), pre/postcondition contracts (from Dafny/VDM), state transitions, and HTTP conventions in one language. No existing parser handles this combination, and the pieces do not compose: Alloy's relational types are incompatible with Dafny's class-based types, which are incompatible with TypeSpec's shape-based types.

                    OUR NEW PARSER
                    (ANTLR 4 grammar)
                         |
                         v
                    OUR IR (AST)
                    /    |    \
                   /     |     \
                  v      v      v
           Dafny     Smithy    Implementation
           Backend   Export    Code Generator
           (verify)  (SDKs)   (Spring/Express/etc.)
                              |
                              v
                         TypeSpec/OpenAPI
                         (API docs)

Instead of extending one parser, use each existing tool where it excels:

ToolRole in Our PipelineIntegration Point
ANTLR 4Parser generator for our grammarBuild time, generates TypeScript/Java parser from our .g4 grammar
DafnyVerification backendOur IR -> Dafny translation -> Z3 proof discharge
SmithyService model exportOur IR -> Smithy model -> SDK generation via smithy-codegen
TypeSpec/OpenAPIAPI documentationOur IR -> TypeSpec/OpenAPI -> docs, client stubs
Quint/ApalacheAlternative model checkingOur IR -> Quint translation -> Apalache SMT checking
BallerinaOptional compilation targetOur IR -> Ballerina service code (leveraging native HTTP)

Why ANTLR 4 for the new parser

  1. Our grammar is already designed in EBNF (in 01_spec_language_design.md). Converting to ANTLR .g4 format is mechanical.

  2. ANTLR 4 generates TypeScript, Java, Python, C#, Go, and more. This avoids locking into any single ecosystem.

  3. Quint uses ANTLR 4 for exactly the same reasons, it is the standard tool for DSL parsers in 2026.

  4. ANTLR 4's ALL(*) parsing algorithm handles the grammar complexity we need (quantified expressions, operator precedence, primed variables) without the ambiguity issues of LALR parsers like CUP.

  5. Excellent tooling. ANTLR 4 has VSCode extensions, grammar visualization, test rigs, and a large community.

Effort comparison

ApproachEstimated EffortRisk
Extend Alloy 6 parser4-6 months (parser + AST + type checker + drop SAT dependency)High (CUP conflicts, relational type mismatch)
Fork and extend Quint3-5 months (grammar + IR + type checker + ongoing merge conflicts)High (fork maintenance, missing relational types)
Extend Dafny parser6-9 months (hand-written C# parser, massive codebase)Very High (complexity, C# lock-in)
Build new ANTLR parser2-3 months (grammar + AST + type checker, purpose-built)Low (clean design, no legacy constraints)

Building new is faster than extending any existing parser because:

  • No time spent understanding a foreign codebase
  • No time fighting the existing parser's design assumptions
  • No time resolving grammar conflicts with existing productions
  • No ongoing maintenance burden from upstream changes
  • Complete control over the AST shape for our downstream pipeline

What to borrow (without forking)

Even though we build a new parser, we should study and borrow design patterns from these languages:

FromBorrow
QuintANTLR grammar structure, IR visitor pattern (ADR003), JSON-serializable IR, effect checking approach
Dafnyrequires/ensures semantics, refinement type syntax, the concept of "old()" for pre-state
AlloyRelational type semantics, multiplicity constraint checking, bounded analysis algorithms
TypeSpecDecorator/convention syntax patterns, emitter architecture for code generation
SmithyResource/operation lifecycle modeling, trait metadata system, knowledge index pattern
smithy-dafnyThe pipeline pattern: spec model -> Dafny skeleton -> verification -> target code

Bottom line

The research confirms the finding from the prior survey: "No existing language provides all of [our required constructs]." The parser reuse question has a clear answer: building a purpose-built ANTLR 4 parser is faster, lower risk, and more maintainable than extending any existing parser, while strategically reusing Dafny (verification), Smithy (SDK generation), and TypeSpec/OpenAPI (documentation) as pipeline components achieves the benefits of each ecosystem without the costs of parser-level coupling.

Note on the Langium evaluation

After this analysis recommended ANTLR 4, a separate evaluation (09_dsl_compiler_frameworks.md) explored whether Langium, a TypeScript-based language workbench by TypeFox, could replace ANTLR by providing an integrated parser + LSP + AST generation stack. Langium was initially recommended as the primary framework.

A subsequent devil's advocate audit reversed that recommendation and confirmed ANTLR 4 (specifically antlr-ng, the production-ready TypeScript port) as the parser technology. The key reasons:

  1. Langium's advantages were overstated for our use case. Its companion type system library (Typir) cannot handle refinement types, relation types, generics, or quantified expressions, all of which our DSL requires. We must build ~80% of the type checker ourselves regardless of framework choice, eliminating Langium's biggest productivity argument.

  2. ANTLR's community is orders of magnitude larger. ANTLR has 17k+ GitHub stars and thousands of contributors vs. Langium's 985 stars and 22 contributors (mostly TypeFox employees). For a multi-year project, community depth matters for bug fixes, documentation, and long-term support.

  3. No framework lock-in. Langium couples grammar, AST types, scoping, and LSP into a proprietary framework. ANTLR4 generates a parser; everything else is built on our own abstractions, making each component independently replaceable.

  4. Strategic risk. TypeFox's announcement of Fastbelt (a Go-based successor 21-33x faster than Langium) signals that Langium may enter de facto maintenance mode as TypeFox's best engineers shift focus.

The implementation uses antlr-ng targeting TypeScript, with Z3 via native subprocess for performance-critical verification and WASM as a deployment fallback. Langium remains a viable future option if VS Code IDE support with rich completions becomes a top-priority requirement and the framework matures further.


Sources

Alloy 6

Quint

Dafny

TypeSpec

Smithy

Ballerina

Comparative / general

On this page

Table of contents1. What our spec language needs2. Option 1: Extend Alloy 6Architecture overviewParser accessibilityWhat Alloy already handles (of our 14 constructs)What would need to be added/modifiedIntegration effort: HIGHWhat we gainWhat we loseVerdict: NOT RECOMMENDED3. Option 2: Extend QuintArchitecture overviewParser architectureWhat Quint already handlesWhat would need to be added/modifiedIntegration effort: MEDIUM-HIGHWhat we gainWhat we loseVerdict: INTERESTING BUT POOR FIT4. Option 3: Use dafny's parserArchitecture overviewParser architectureWhat Dafny already handlesWhat would need to be added/modifiedIntegration effort: VERY HIGHWhat we gainWhat we loseVerdict: USE AS BACKEND, NOT AS PARSER5. Option 4: Use typespec's parserArchitecture overviewExtension architectureWhat TypeSpec already handlesWhat would need to be added/modifiedCould we use custom decorators instead?Integration effort: VERY HIGH (for behavioral specs)What we gainWhat we loseVerdict: WRONG DIRECTION6. Option 5: Use smithy's parserArchitecture overviewSemantic model and extension architectureWhat Smithy already handlesThe smithy-dafny precedentCould we add pre/postcondition traits?Integration effort: HIGHWhat we gainWhat we loseVerdict: VALUABLE AS PIPELINE COMPONENT, NOT AS PARSER BASE7. Option 6: Use ballerinaArchitecture overviewCompiler architectureWhat ballerina already handlesThe fundamental mismatchIntegration effort: VERY HIGHWhat we gainWhat we loseVerdict: NOT APPLICABLE8. Comparative matrixCoverage of our 14 required constructsSummary scoresThe core tension9. RecommendationThe verdict: BUILD a NEW PARSERWhy not extend an existing parserThe recommended architectureWhy ANTLR 4 for the new parserEffort comparisonWhat to borrow (without forking)Bottom lineNote on the Langium evaluationSourcesAlloy 6QuintDafnyTypeSpecSmithyBallerinaComparative / general