Test Generation
Edit on GitHub--with-tests emits a native behavioral/stateful/structural conformance suite (per target language) + a gated admin router from spec ensures/requires/invariants
Last updated:
spec-to-rest compile --with-tests produces a property-test conformance suite alongside the
generated service, emitted in the target's own language — pytest + Hypothesis +
Schemathesis for python-fastapi-*, Vitest + fast-check for ts-express-*, go test +
rapid for go-chi-* (all nine targets: each language × postgres / sqlite / mysql). Each
ensures clause becomes a positive property test; each <input> in <state> requires clause
becomes a negative test (asserting 4xx); each global invariant becomes a post-operation
check. The examples on this page show the Python (fastapi) rendering; see
Native multi-language emission for how the same
spec-derived logic renders into TypeScript and Go.
Looking for the design rationale and the full vision (structural + behavioral + stateful
layers, conformance runner, mutation testing)? See
Test Generation Pipeline. This page documents what
--with-tests actually delivers today; the nightly mutation-testing CI gate that
keeps the suite honest is documented separately on the
Mutation Testing page.
Quick start
sbt "cli/run compile --with-tests --out /tmp/my-service fixtures/spec/url_shortener.spec"
cd /tmp/my-service
cp .env.example .env
# Full docker-compose pipeline (boot, run all 3 layers, tear down):
make test-conformance-docker PROFILE=smoke
# Or against an already-running service:
ENABLE_TEST_ADMIN=1 uv run uvicorn app.main:app --host 0.0.0.0 --port 8000 &
make test-conformance PROFILE=smokeThe generated docker-compose.yml already forwards ENABLE_TEST_ADMIN from the host
to the app container, so ENABLE_TEST_ADMIN=1 docker compose up works without manual
edits.
The ENABLE_TEST_ADMIN=1 env var is required: tests use a small admin router
(/__test_admin__/reset and /__test_admin__/state) to control state and read it back.
Production deployments must leave this unset; the router returns 403 by default.
Native multi-language emission
The conformance logic (which clauses become tests, the skip algebra, the
state/output scoping, the strategy derivation from IR types) is single-sourced and
language-agnostic. Only the leaf rendering varies, behind the Backend.scala seam in
modules/testgen:
ExprBackend— translates an IR expression to the target language (ExprToPython/TsExprBackend/GoExprBackend).StrategyBackend— renders a spec type into a generator (PythonHypothesisStrategyst.*/TsFastCheckStrategyfc.*/GoRapidStrategyrapid).HarnessTemplates— the language-native scaffold (pytest+conftest / Vitest+_client.ts/go test+conf_*.go), selected byTestBackend.
The Behavioral / Stateful / Structural decision predicates are shared by all
three; the parallel Ts* / Go* emitters reuse them and only swap the rendered
tokens. The Python (fastapi) path is held byte-identical through every slice, so it
serves as the differential oracle: CI runs the Python suite and the native suite against
the same contract, and any behavioral disagreement is a mechanically-caught translator
bug.
| Target | Behavioral / stateful | Structural | Run |
|---|---|---|---|
python-fastapi-* | pytest + Hypothesis | Schemathesis (full OpenAPI fuzzing + spec-derived checks) | python tests/run_conformance.py <profile> / make test-conformance |
ts-express-* | Vitest + fast-check | structural-lite (reuse the Strategies seam: type-valid input per non-stub op, assert no 5xx) | node tests/run_conformance.mjs <profile> / npm test |
go-chi-* | go test + rapid | structural-lite (same) | bash tests/run_conformance.sh <profile> / go test -tags conformance ./tests/... |
Structural-lite (ts/go): schemathesis has no TS/Go peer, so rather than reimplement
an OpenAPI/JSON-Schema fuzzer the structural phase reuses the IR Strategies seam — for
every non-stub operation it generates type-valid inputs, calls with a per-case
/__test_admin__/reset, and asserts the server did not 5xx (schemathesis's core
"documented, no server error" check, minus full schema validation). Fail-loud stubs are
honest-skipped exactly as the Python schemathesis renderer schema.exclude(...)s them.
The /__test_admin__/* reset/state/seed contract is identical across targets; only its
implementation is stack-specific (AdminRouter.scala → FastAPI, AdminRouterTs.scala
→ Express src/routes/testAdmin.ts, AdminRouterGo.scala → Go
internal/testadmin/admin_conformance.go). Go links statically, so there is no
dynamic-import gate: codegen always emits a !conformance no-op testadmin.Register +
main.go wiring, and --with-tests emits the real conformance-tagged router — keep
the test-admin code out of production binaries via the build tag, not a runtime branch.
Each backend's per-target page documents the exact file layout and run commands: python-fastapi · ts-express · go-chi.
What gets generated
The table below is the python-fastapi-* layout; the Vitest and go test
backends emit the analogous set in their own conventions (see the per-target pages and
Native multi-language emission above).
Generated files, full table (python-fastapi)
| File | Source | What it is |
|---|---|---|
app/routers/test_admin.py | AdminRouter.scala | /__test_admin__/reset truncates entity tables; /__test_admin__/state returns spec state as JSON; /__test_admin__/seed/<entity> (per entity that participates in a TransitionDecl) inserts a row directly so transition tests can drive entities into a chosen status. All endpoints 403 unless ENABLE_TEST_ADMIN=1. |
tests/__init__.py | empty | makes tests a Python package |
tests/conftest.py | static template | httpx client + session-scoped fixture that skips the whole suite with a clear message if the service is unreachable or the admin router isn't enabled |
tests/predicates.py | rendered from ir.predicates | _powerset plus one Python helper per spec/preamble predicate (auto-derived from the body) |
tests/strategies.py | Strategies.scala | one def strategy_<type>(): return ... per TypeAliasDecl and EnumDecl, plus one def strategy_<entity>() (returning st.fixed_dictionaries) per entity that participates in a TransitionDecl |
tests/strategies_user.py | static template | empty stub for user-supplied strategy functions referenced from <Type>.strategy = "module:symbol" convention rules. Preserved across re-compile. |
tests/redaction.py | static template | redact(strategy) wrapper + _RedactedStr mask class for sensitive Hypothesis values (M5.8) |
tests/test_behavioral_<service>.py | Behavioral.scala | the property tests themselves |
tests/test_stateful_<service>.py | Stateful.scala | a Hypothesis RuleBasedStateMachine exercising multi-step operation sequences |
tests/test_structural_<service>.py | Structural.scala | Schemathesis-driven structural tests: schema fuzzing + spec-derived custom checks + OpenAPI-Links state machine |
tests/run_conformance.py | static template | three-phase orchestrator (structural behavioral stateful) with JUnit XML output and a unified exit code |
tests/_testgen_skips.json | testgen | machine-readable list of clauses that were not turned into tests, with reasons |
pytest.ini | static template | disables pytest-xdist parallelism (admin-router state is global) |
Three test kinds emitted per operation
1. Positive ensures tests
For each operation whose requires does not reference state, every ensures clause that
the IR-to-Python translator can handle becomes a property test:
@given(url=strategy_long_url())
@settings(max_examples=20, suppress_health_check=[HealthCheck.function_scoped_fixture])
def test_shorten_ensures_0(url):
"""ensures: (code not in pre(store))"""
client.post("/__test_admin__/reset")
pre_state = client.get("/__test_admin__/state").json()
response = client.post("/shorten", json={"url": url})
assume(response.status_code == 201)
assert response.status_code == 201, response.text
response_data = response.json() if response.content else {}
post_state = client.get("/__test_admin__/state").json()
assert ((response_data["code"]) not in (pre_state["store"])), "ensures violated: ensures: (code not in pre(store))"State-dependent preconditions are partially covered. If requires references a state
field (e.g., requires: code in store), positive ensures tests for the generic case are
skipped; driving the system into the precondition-satisfying state requires either calls
to other operations or direct seed-writes. For the transition case (operation referenced
by a via clause in a TransitionDecl), M5.9 closes this gap with dedicated transition
tests (see below). The clause still appears in _testgen_skips.json, but the reason text
flips to state-dependent precondition; covered by transition tests (M5.9) for transition
operations and falls back to ... covered by stateful tests (M5.2) for ops bundled into the state machine ... for the residual non-transition operations (GetTodo, UpdateTodo,
DeleteTodo in todo_list). ExprToPython translates every IR Expr shape, so the
translator coverage gap is closed.
2. Negative <input> in <state> tests
For exactly the <input> in <state> requires pattern, a negative test is emitted that
generates a fresh-state input via Hypothesis, asserts the input isn't in the (empty) state,
calls the operation, and asserts a 4xx response:
@given(code=strategy_short_code())
@settings(max_examples=10, suppress_health_check=[HealthCheck.function_scoped_fixture])
def test_resolve_negative_code_not_in_store(code):
"""requires 'code in store' (negative): missing key returns 4xx."""
client.post("/__test_admin__/reset")
pre_state = client.get("/__test_admin__/state").json()
assume(code not in pre_state.get("store", {}))
response = client.get(f"/{code}")
assert 400 <= response.status_code < 500, f"expected 4xx, got {response.status_code}: {response.text}"A second negative pattern recognises status restrictions on existing entries:
<state>[<input>].<enum-field> = <required-value> (the precondition that the entry,
once it exists, must be in a particular status). When the underlying entity is in a
TransitionDecl (so /__test_admin__/seed/<entity> exists), a negative test is
emitted that seeds the entity in any other enum value of the field, picked by
st.sampled_from(<all-other-values>), and asserts a 4xx response:
@given(row=strategy_order(), wrong_status=st.sampled_from(["DRAFT", "PAID", "SHIPPED", "DELIVERED", "CANCELLED", "RETURNED"]), amount=strategy_money())
@settings(max_examples=10, suppress_health_check=[HealthCheck.function_scoped_fixture])
def test_record_payment_negative_orders_status_not_placed(row, wrong_status, amount):
"""requires 'orders[order_id].status = PLACED' (negative): wrong status returns 4xx."""
client.post("/__test_admin__/reset")
row = dict(row)
row["status"] = wrong_status
seed = client.post("/__test_admin__/seed/order", json=row)
assume(seed.status_code == 201)
seeded_id = seed.json()["id"]
response = client.post(f"/orders/{seeded_id}/payments", json={"amount": amount})
assert 400 <= response.status_code < 500, f"expected 4xx, got {response.status_code}: {response.text}"This covers business-logic illegal-from preconditions on non-via operations (e.g.,
ecommerce AddLineItem requires orders[order_id].status = DRAFT but is not the via
of any TransitionDecl). For via operations the same pattern emits an additional
negative alongside the per-illegal-from transition negatives, they assert the same
qualitative property from different angles.
Other requires shapes (predicate violation, ordering, set-content checks) are tracked
as follow-ups and skipped with reason in _testgen_skips.json.
3. Post-operation invariant checks
For each global invariant, one test per operation that has a non-state-dep requires:
@given(url=strategy_long_url())
@settings(max_examples=20, suppress_health_check=[HealthCheck.function_scoped_fixture])
def test_shorten_invariant_all_ur_ls_valid(url):
"""invariant allURLsValid: (all c in store | isValidURI(store[c]))"""
client.post("/__test_admin__/reset")
pre_state = client.get("/__test_admin__/state").json()
response = client.post("/shorten", json={"url": url})
assume(response.status_code == 201)
response_data = response.json() if response.content else {}
post_state = client.get("/__test_admin__/state").json()
assert all(is_valid_uri(post_state["store"][c]) for c in (post_state["store"])), "invariant violated: invariant allURLsValid: (all c in store | isValidURI(store[c]))"4. Temporal predicates (#86)
temporal X: always(P) and temporal X: eventually(P) declarations participate in
test emission alongside invariants. The always(P) form is structurally an
invariant (it asserts P after every step), so behavioral and stateful tests
emit it the same way an invariant decl is emitted, distinguished only by the
method name (temporal_always_<name> vs invariant_<name>).
eventually(P) is liveness-shaped: "at some point during the trace, P held."
The build-time Alloy proof is authoritative (it shows reachability up to the
configured scope), but at runtime we emit a per-trace witness. In the stateful
state machine:
_reset()initializes a sticky flagself._eventually_seen_<name> = False- An
@invariant()observer evaluatesPafter every step and ORs it into the flag if true - The state machine's
teardown(self)asserts the flag was set
@invariant()
def temporal_eventually_observe_some_user_exists(self):
"""temporal eventually(someUserExists): (some u in users | u.is_active)"""
post_state = client.get("/__test_admin__/state").json()
if any(u["is_active"] for u in post_state["users"].values()):
self._eventually_seen_some_user_exists = True
def teardown(self):
assert self._eventually_seen_some_user_exists, \
"temporal eventually never observed in trace: someUserExists: ..."This is strictly weaker than the build-time Alloy proof: each Hypothesis
case observes a single bounded trace, so a trace too short to reach P will
fail the teardown even on a sound spec. If a fixture's state machine defaults
(max_examples=25, stateful_step_count=20) don't reach P reliably, lengthen
the trace via Hypothesis settings or accept the gap and lean on the verifier.
fairness(op) is parsed but emits a _testgen_skips.json entry only; the
verifier already rejects it (v1 does not implement trace-based verification),
so a runtime stub for an unverifiable construct creates an unreachable code
path. Tracked alongside v2 fairness verification.
always / eventually decls also surface in the generated openapi.yaml as
top-level x-invariant and x-temporal extensions, machine-readable for
documentation generators and downstream tooling. Each x-temporal entry carries
{kind: always|eventually|fairness, expr: <pretty-printed predicate>}.
x-invariant:
usersAreValid: (all u in users | u.is_active)
x-temporal:
someUserExists:
kind: eventually
expr: '(some u in users | u.is_active)'
allUsersAlwaysValid:
kind: always
expr: (all u in users | u.is_active)Transition tests (M5.9)
For specs that declare a state machine via transition X { entity: E; field: f; FROM -> TO via Op ... },
testgen synthesizes per-via-operation behavioral tests that drive the entity directly into
each from status using a generated seed admin endpoint, then exercise the operation:
- Positive test per legal
(from, to)rule. Seed entity infrom, call the via operation, assert 2xx and the entity'sfieldpost-state equalsto. - Negative test per illegal
from(every value of the field's enum that is not afrom-state forvia X). Seed entity in that state, call the via operation, assert 4xx.
Setup uses POST /__test_admin__/seed/<entity>, emitted by AdminRouter.scala for every
entity referenced by a TransitionDecl. The endpoint accepts a JSON dict, parses any
DateTime-typed columns from ISO strings, inserts the row, and returns the new primary
key. Like /reset and /state, it 403s unless ENABLE_TEST_ADMIN=1.
Row strategies come from Strategies.scala's new strategy_<entity>() function, emitted
only for transition entities; non-transition specs (safe_counter, url_shortener)
generate byte-identical strategies.py to before.
@given(row=strategy_todo())
@settings(max_examples=20, suppress_health_check=[HealthCheck.function_scoped_fixture])
def test_start_work_transition_todo_to_in_progress(row):
"""transition StartWork: TODO -> IN_PROGRESS (post-state status = IN_PROGRESS)"""
client.post("/__test_admin__/reset")
row = dict(row)
row["status"] = "TODO"
seed = client.post("/__test_admin__/seed/todo", json=row)
assume(seed.status_code == 201)
seeded_id = seed.json()["id"]
response = client.post(f"/todos/{seeded_id}/start")
assert response.status_code == 200, response.text
post_state = client.get("/__test_admin__/state").json()
bucket = post_state.get("todos", {})
entity_view = bucket.get(str(seeded_id)) or bucket.get(seeded_id)
actual = entity_view.get("status") if isinstance(entity_view, dict) else entity_view
assert actual == "IN_PROGRESS", f"expected status=IN_PROGRESS, got {actual!r}"@given(row=strategy_todo())
@settings(max_examples=10, suppress_health_check=[HealthCheck.function_scoped_fixture])
def test_archive_transition_illegal_from_in_progress(row):
"""transition Archive: from=IN_PROGRESS is illegal (no rule); SUT must reject 4xx"""
client.post("/__test_admin__/reset")
row = dict(row)
row["status"] = "IN_PROGRESS"
seed = client.post("/__test_admin__/seed/todo", json=row)
assume(seed.status_code == 201)
seeded_id = seed.json()["id"]
response = client.post(f"/todos/{seeded_id}/archive")
assert 400 <= response.status_code < 500, f"expected 4xx, got {response.status_code}: {response.text}"What gets generated for todo_list.spec
via operation | Legal (from, to) | Positive tests | Illegal from (no rule) | Negative tests |
|---|---|---|---|---|
StartWork | TODO -> IN_PROGRESS | 1 | IN_PROGRESS, DONE, ARCHIVED | 3 |
Complete | IN_PROGRESS -> DONE | 1 | TODO, DONE, ARCHIVED | 3 |
Reopen | DONE -> IN_PROGRESS (guarded by updated_at > completed_at) | 1 | TODO, IN_PROGRESS, ARCHIVED | 3 |
Archive | TODO -> ARCHIVED, DONE -> ARCHIVED | 2 | IN_PROGRESS, ARCHIVED | 2 |
16 transition tests total, including the Reopen positive whose guard
updated_at > completed_at is satisfied deterministically by the recognizer in
#152 (see "Guarded
positive transitions" below). PauseWork is referenced in the spec's transition
graph but has no operation declaration in todo_list.spec, so it produces a
transition[PauseWork] skip in _testgen_skips.json with reason
no operation named 'PauseWork' for via clause.
Guarded positive transitions (#152)
Rules with a when <guard> clause are recognized when the guard reduces to a
conjunction (and), or single instance, of these shapes:
| Shape | Example | Emitted fix-up |
|---|---|---|
| ordered comparison on two entity fields, both DateTime-ish | updated_at > completed_at | row["updated_at"] = (datetime.datetime.fromisoformat(row["completed_at"]) + datetime.timedelta(seconds=1)).isoformat() (with optional if row[..] is None anchor when the RHS is Option[DateTime]) |
| ordered comparison on two entity fields, both numeric | score > threshold | row["score"] = row["threshold"] + 1 (or + 0/- 1 for >=/<=/<) |
| ordered comparison vs. numeric literal | score > 10 | row["score"] = 10 + 1 |
| equality with enum or literal RHS | tier = GOLD, count = 7, name = "x" | row["tier"] = "GOLD" |
equality with none on Option field | closed_at = none | row["closed_at"] = None |
existence (!= none) on Option field | closed_at != none | if row["closed_at"] is None: row["closed_at"] = <inner-default> |
| cardinality on Set/Seq field | #tags > 2, len(tags) >= 1, #tags = 0 | row["tags"] = ["x0", "x1", "x2"] (size and fillers depend on inner type) |
| literal membership in Set/Seq field | "URGENT" in tags | row["tags"] = list(row["tags"]) + ["URGENT"] |
transition-field self-equality matching from | rule LOW -> HIGH ... when phase = LOW | no extra fix line, already satisfied by the row[<field>] = LOW step |
top-level not (...) (De Morgan inversion) | not (a > b) | recurses on a <= b |
Recognized clauses can be combined freely with and. The arithmetic-shift form
on ordered comparisons composes with field-level where constraints already
enforced by the strategy because we keep the strategy-generated RHS value and
shift LHS off it.
Anything outside the table above keeps the skip with reason
guard '<expr>' not representable in seed dict (see #152). Cases that fall back:
- User function calls other than
len(...)(e.g.is_valid_email(addr),paymentCaptured(order_id)): would need runtime predicate evaluation. - Set/map/seq arithmetic beyond cardinality and membership (e.g.
tags - banned = {},tags = other_tags). - Nested field access on either side (e.g.
parent.status = ARCHIVED,priority.level > 3): would need multi-entity seed orchestration; the recognizer only accepts bareIdentifieroperands of the seeded entity. Pre(...)/Prime(...). References system pre-/post-state, not the row being seeded; structurally not row-local.- Compound RHS expressions other than the literal forms in the table
(e.g.
a > b + 1,a > some_function()). - Mismatched operand kinds in ordered comparison (one DateTime-ish, one numeric); the recognizer requires the same kind on both sides.
- Transition-field self-equality contradicting the rule's
from(e.g. ruleLOW -> HIGH ... when phase = HIGH; the guard can never hold for any row of phase=LOW). - Conjunctions where any branch fails recognition. All-or-nothing across
an
andtree; a partial pass would silently miss the unrecognized branch. - Conflicting fixes within a conjunction (e.g.
a > b and a < bproduces two writes torow[a]with different RHS); recognition fails fast rather than emitting an unsatisfiable test. - Top-level
or/implies. Pick-a-branch ambiguity; reduce to a conjunction and re-spec the rule if you want a deterministic test.
Body and query inputs on via operations (#155)
When the via op takes parameters beyond the path id (e.g.,
RecordPayment(order_id: OrderId, amount: Money) mounted at
POST /orders/{order_id}/payments), the body/query inputs are generated through the
same Strategies.expressionFor(...) used in non-transition tests, and wired into the
@given decorator alongside row=strategy_<entity>():
@given(row=strategy_order(), amount=strategy_money())
@settings(max_examples=10, suppress_health_check=[HealthCheck.function_scoped_fixture])
def test_record_payment_transition_illegal_from_draft(row, amount):
"""transition RecordPayment: from=DRAFT is illegal (no rule); SUT must reject 4xx"""
client.post("/__test_admin__/reset")
row = dict(row)
row["status"] = "DRAFT"
seed = client.post("/__test_admin__/seed/order", json=row)
assume(seed.status_code == 201)
seeded_id = seed.json()["id"]
response = client.post(f"/orders/{seeded_id}/payments", json={"amount": amount})
assert 400 <= response.status_code < 500, f"expected 4xx, got {response.status_code}: {response.text}"Body inputs go to json={...}, query inputs to params={...}. Sensitive-field redaction
and per-operation test_strategy overrides apply, since the same expressionFor path
is reused. If any non-path input has no known strategy (e.g., Map[K, V],
Relation[K, V], an unknown named type), the entire via is skipped with a reason
naming the un-generable input.
If a non-path input shares a name with a generated test local, row, seed,
seeded_id, response, client, pre_state, post_state, wrong_status, the
input is aliased to _arg_<name> (with a numeric suffix on collision) for the
function parameter and the @given keyword, while the JSON / query dict key keeps
the original parameter name. So a spec input named seed becomes
@given(..., _arg_seed=...) / def fn(row, _arg_seed) / json={"seed": _arg_seed}.
The positive transition still requires the via op's other requires clauses (besides
the from-state pin) to be satisfied by the generated values; if the SUT enforces
stricter preconditions (e.g., requires: amount = orders[order_id].total), the
positive test will fail by design, that's a real spec-vs-SUT divergence and surfaces
where it should. (Recognising such relational preconditions on body inputs is a
separate concern from #155.)
Other skipped categories
- Non-enum transition field. If
field: <name>resolves to a type other than an enum (or alias of an enum), the entireTransitionDeclis skipped; illegal-from enumeration is undefined for non-enum status fields. - Unknown via operation. A
via XwhoseXhas no matchingoperationdeclaration is skipped. Spec lint should catch this; the skip is a defensive backstop. - Multi-path or zero-path via operation. Transition tests need exactly one path
parameter to identify the seeded entity; nested-path shapes
(e.g.,
/orders/{order_id}/items/{item_id}) and zero-path shapes are skipped pending multi-entity seed orchestration. - Non-generable body or query input. If
Strategies.expressionFor(...)returnsSkipfor any non-path input (most commonlyMap/Relation-typed bodies), the via is skipped with a reason naming the input and the un-generable type.
Stateful tests (M5.2)
Alongside the per-clause behavioral tests, testgen emits one
tests/test_stateful_<service>.py containing a Hypothesis
RuleBasedStateMachine
that exercises multi-step operation sequences against the live SUT. Each spec operation
becomes an @rule performing the real HTTP call. Entity ids returned from Create-classified
operations flow into other rules through Bundles, so a Resolve rule receives a code that
was just produced by a Shorten rule, and a Delete rule consumes the code from the bundle.
Global invariants run after every step against /__test_admin__/state.
class UrlShortenerStateMachine(RuleBasedStateMachine):
url_mapping_ids = Bundle("url_mapping_ids")
@initialize()
def _reset(self):
client.post("/__test_admin__/reset")
@rule(target=url_mapping_ids, url=strategy_long_url())
def shorten(self, url):
response = client.post("/shorten", json={"url": url})
assert response.status_code == 201, response.text
return response.json()["code"]
@rule(code=url_mapping_ids)
def resolve(self, code):
response = client.get(f"/{code}")
assert response.status_code == 302, response.text
@rule(code=consumes(url_mapping_ids))
def delete(self, code):
response = client.delete(f"/{code}")
assert response.status_code == 204, response.text
@invariant()
def invariant_all_ur_ls_valid(self):
post_state = client.get("/__test_admin__/state").json()
assert all(is_valid_uri(post_state["store"][c]) for c in post_state["store"])
TestStatefulUrlShortener = UrlShortenerStateMachine.TestCaseBundle inference
| Operation kind (convention layer) | Role in the state machine |
|---|---|
Create / CreateChild (returns an entity) | @rule(target=<entity>_ids, ...), pushes the response id into the bundle |
Delete (id-typed input) | parameter via consumes(<entity>_ids), pops the id |
| Read / Update / Replace / Transition / SideEffect (id-typed input) | parameter via <entity>_ids (non-consuming draw) |
| any operation with no entity-id input or output | parameter-less rule that just exercises the endpoint |
The <entity>_ids bundle name follows the entity's snake-cased name (e.g., url_mapping_ids,
todo_ids). The id projection from the response uses the entity's primary-key field
(id if present, otherwise the first declared field, for url_shortener this is code).
Per-status bundles for transition-driven entities (#153)
When an entity has a TransitionDecl over an enum field AND every Create operation
deterministically sets the initial status (e.g., todo.status = TODO in the ensures),
testgen splits the single <entity>_ids bundle into one bundle per enum value:
| Entity | Status enum value | Bundle name |
|---|---|---|
Todo | TODO | todo_todo_ids |
Todo | IN_PROGRESS | todo_in_progress_ids |
Todo | DONE | todo_done_ids |
Todo | ARCHIVED | todo_archived_ids |
Rules then route ids through bundles by status:
- Create:
@rule(target=<initial-status-bundle>, ...), pushes the new id into the bundle for the status the ensures sets. - Unguarded transition (no
whenclause): one Python rule perTransitionRule, named<via>_from_<from>_to_<to>. Decorator is@rule(target=<to>_bundle, <id>=consumes(<from>_bundle)). The body asserts strict success andreturns the id, Hypothesis transfers it from<from>into<to>. Including_to_<to>in the function name keeps each rule unique even when several rules share the same via andfrombut differ into. An operation that fires from multiplefrom-states (e.g.,Archivefrom bothTODOandDONE) emits one rule per source bundle (archive_from_todo_to_archived,archive_from_done_to_archived). - Guarded transition (has a
whenclause that bundle membership cannot guarantee): sameconsumes+targetshape, but the body branches:if response.status_code == <success>: return id # SUT accepted -> id moves from -> to elif 400 <= response.status_code < 500: return multiple() # guard failed -> id is consumed and dropped (no move) else: assert False, ...multiple()with no arguments tells Hypothesis "no result"; the id leaves the source bundle without being added to the target, keeping bundle/SUT bookkeeping consistent. - Read / Update / Replace / SideEffect with an id parameter:
- If
requiresis just<id> in <state>(membership only), the rule draws fromst.one_of(<all-status-bundles>)and asserts strict success, bundle membership fully covers the precondition by construction. - If
requiresincludes<id> in <state> and <state>[id].<transition-field> (= X | in {X, Y, ...}), the rule draws fromst.one_of(<bundle X>, <bundle Y>, ...)and is strict. Repeated conjunctive restrictions on the same field are intersected (AND semantics). - If
requiresincludes any unrecognized non-key conjunct (or astate[id].fieldaccess on a field that isn't the entity's transition field), the rule draws fromst.one_of(<all-status-bundles>)non-consuming and falls back to loose assertion.
- If
- Delete:
- With a recognized status restriction that narrows to a single bundle,
consumes(<bundle>), strict. - With just
<id> in <state>membership, a multi-bundle restriction, or any unrecognized requires conjunct, non-consumingst.one_of(...)+ loose. (consumes(st.one_of(...))is not supported by Hypothesis, and consuming an id that the SUT may reject would leak bundle/SUT state.)
- With a recognized status restriction that narrows to a single bundle,
Entities without a TransitionDecl over an enum field, or whose Create operation
doesn't deterministically set an initial status, fall back to the legacy single-bundle
shape. url_shortener and safe_counter outputs stay byte-identical.
Strict vs loose status assertions
For each rule, testgen decides whether to assert the success status strictly or to also accept any 4xx (the SUT rejecting because its own preconditions failed):
- Strict (
assert response.status_code == <success>): used forCreaterules, unguarded transition rules (bundle membership guarantees the precondition by construction), and rules whose everyrequiresclause is either triviallytrueor a<input> in <state>pattern that is satisfied by construction via bundle membership. - Loose (
if-elif-else, accepting<success>or400..499): used for rules with any other state-dependent precondition (e.g., transition guards liketodos[id].status = TODOthat aren't recognised by the per-status splitter, orwhen-clause time guards likeupdated_at > completed_at). A 4xx response there is the SUT correctly rejecting an unsatisfied precondition, not a bug.
Settings
Settings are attached to the state machine's TestCase (which the TestStateful<Service>
alias refers to):
<Service>StateMachine.TestCase.settings = settings(
max_examples=25,
stateful_step_count=20,
deadline=None,
suppress_health_check=[HealthCheck.too_slow, HealthCheck.function_scoped_fixture],
)
TestStateful<Service> = <Service>StateMachine.TestCasedeadline=None is required because every step performs an HTTP round-trip; a per-step
deadline trips on slow CI machines.
What stateful does not do (in M5.2)
- It does not maintain a Python-side ghost model of state. The SUT, accessed via
/__test_admin__/state, is the model. This trades shrink-output detail for emitter simplicity and avoids re-translating everyensuresclause as a model update. - Per-rule ensures-asserts are handled by the behavioral tests, not duplicated here.
Stateful surfaces multi-step bugs via the
@invariant()checks that run after every step.
Structural tests (M5.3)
Alongside the per-clause behavioral tests and the hand-rolled stateful state machine,
testgen emits a tests/test_structural_<service>.py that wires
Schemathesis to the project's openapi.yaml.
Schemathesis fuzzes every (method, path) declared in the schema and validates
responses against the OpenAPI shape, catching wrong status codes, missing endpoints,
content-type drift, and 5xx unhandled exceptions. The structural file is orthogonal
to M5.2: the M5.2 stateful tests verify the spec contract; the M5.3 structural
tests verify the schema contract.
The generated file is self-contained: it reads SPEC_TEST_BASE_URL and SPEC_TEST_PROFILE
from the environment, declares the three profiles inline (see Profiles
below), and validates SPEC_TEST_PROFILE against the allowed set with an explicit
ValueError (no opaque KeyError):
import os
import schemathesis
from hypothesis import HealthCheck, settings
from tests.conftest import client
BASE_URL = os.environ.get("SPEC_TEST_BASE_URL", "http://localhost:8000")
PROFILE = os.environ.get("SPEC_TEST_PROFILE", "thorough")
PROFILES = {
"smoke": {"max_examples": 10, "stateful_step_count": 3},
"thorough": {"max_examples": 100, "stateful_step_count": 10},
"exhaustive": {"max_examples": 1000, "stateful_step_count": 25},
}
if PROFILE not in PROFILES:
raise ValueError(f"Invalid SPEC_TEST_PROFILE={PROFILE!r}. Expected one of: {', '.join(sorted(PROFILES))}")
_PROFILE = PROFILES[PROFILE]
schema = schemathesis.openapi.from_path("openapi.yaml")
def _check_invariant_metadata_consistent(response, case):
"""invariant metadataConsistent: dom(store) = dom(metadata)"""
if response.status_code >= 500:
return
post_state = client.get("/__test_admin__/state").json()
assert set(post_state["store"].keys()) == set(post_state["metadata"].keys()), \
"invariant violated: metadataConsistent"
_ALL_CHECKS = (_check_invariant_all_ur_ls_valid,
_check_invariant_metadata_consistent,
_check_invariant_click_count_non_negative)
@schema.parametrize()
@settings(max_examples=_PROFILE["max_examples"], deadline=None,
suppress_health_check=[HealthCheck.too_slow, HealthCheck.function_scoped_fixture])
def test_api_structural(case):
client.post("/__test_admin__/reset")
response = case.call(base_url=BASE_URL)
if _ALL_CHECKS:
case.validate_response(response, checks=_ALL_CHECKS)
else:
case.validate_response(response)
UrlShortenerLinksStateMachine = schema.as_state_machine()
TestStructuralLinksUrlShortener = UrlShortenerLinksStateMachine.TestCaseProfiles
A single env-var, SPEC_TEST_PROFILE, picks the depth tier:
| Profile | max_examples | stateful_step_count | Use case |
|---|---|---|---|
smoke | 10 | 3 | pre-commit hook, fast feedback |
thorough (default) | 100 | 10 | CI on PRs |
exhaustive | 1000 | 25 | nightly / release gate |
SPEC_TEST_PROFILE=smoke pytest tests/test_structural_*.py -vSpec-derived custom checks
For each global invariant that ExprToPython can translate, testgen emits one
def _check_invariant_<name>(response, case) function. The check fetches
/__test_admin__/state and asserts the translated predicate. Checks gate on
response.status_code < 500 so 5xx unhandled exceptions surface via Schemathesis's
own not_a_server_error check rather than a stale state read.
For Create operations whose ensures clauses reference only inputs and outputs
(no pre(), no state, no '), testgen emits per-operation checks gated by
(case.path, case.method, response.status_code), these run only when Schemathesis
hits the matching endpoint with a successful status code. Clauses that don't fit
this shape are recorded under the new structural_skipped array of
tests/_testgen_skips.json.
Reset between cases
Before every parametrized case, client.post("/__test_admin__/reset") runs so global
invariant checks stay meaningful across hundreds of fuzzed cases. The cost is one
extra round-trip per case (~1 ms locally). The Links state machine, in contrast,
fires Hypothesis's per-scenario __init__ on its own, no manual reset hook needed.
Two state machines, by design
The Schemathesis-built as_state_machine() infers links from response shapes and
parameter-name matches against the OpenAPI document. The M5.2 hand-rolled
<Service>StateMachine infers them from the spec's classified operations
(Create Bundle.target, Delete consumes, etc.). Both run; Schemathesis
fuzzes shapes the spec layer doesn't (e.g., body fields just outside the
where constraint).
What structural does not do (in M5.3)
- It does not emit explicit OpenAPI
links:blocks; Schemathesis discovers links viaLocationheaders and parameter-name heuristics. Explicit Links emission is tracked separately. - Entity-field
whereinvariants (e.g.,len(value) >= 6) are not translated to custom checks; Schemathesis already validates them as part of OpenAPI schema conformance at request and response time. - The structural skip rate on rich specs is high (~70-80% of
ensuresclauses reference state orpre()/primes); that is by design, behavioral and stateful layers cover those.
Conformance runner (M5.4)
tests/run_conformance.py is a static Python orchestrator that runs the three layers
sequentially against an already-running service, emits JUnit XML per phase, and
aggregates a single exit code. The Makefile wraps it with two targets:
# Service must already be reachable at SPEC_TEST_BASE_URL
make test-conformance PROFILE=smoke
# Brings up docker compose, waits for /health, runs, tears down
make test-conformance-docker PROFILE=thoroughPhase order is structural behavioral stateful. Each phase begins with
POST /__test_admin__/reset (on top of the per-case resets in the test files
themselves) so a wedged state from one phase cannot poison the next. The phase
glob is expanded inside the runner; pytest receives concrete paths, so pytest's
own collection-error path doesn't trigger when a phase has no matching files (a
service with no entities legitimately has no stateful tests).
| Profile | max_examples | Use case |
|---|---|---|
smoke | 10 | pre-commit / fast feedback |
thorough (default) | 100 | CI on PRs |
exhaustive | 1000 | nightly cron |
JUnit XML lands under results/<phase>-<profile>.xml. The generated GitHub Actions
workflow (.github/workflows/ci.yml) calls make test-conformance after starting the
service, picks the profile from the event type
(exhaustive on schedule, thorough otherwise), and uploads results/ as an
artifact. Cron is wired at 0 2 * * * for the nightly exhaustive run.
Exit codes:
0, all three phases passed1, at least one phase failed2, service unreachable, admin router disabled, or invalidPROFILE
The generated docker-compose.yml forwards ENABLE_TEST_ADMIN from the host
environment to the app container, so ENABLE_TEST_ADMIN=1 docker compose up is
sufficient, no manual edit required (the M5.1 caveat in the quick-start above is
historical; M5.4 plumbs this for you).
Sensitive fields (M5.8)
Spec fields whose names match SensitiveFields.isSensitive (exact: password,
password_hash, secret, token, api_key; suffix: _hash, _secret, _password,
_api_key, _token) get sensitivity-aware handling at every stage of the generated
suite:
- Hypothesis behavioral tests. Sensitive operation inputs are wrapped in
redact(...)fromtests/redaction.py. The wrapper returns a_RedactedStr(astrsubclass with__repr__masking), so failing-example output showspassword=<redacted len=N>instead of the raw value. JSON serialization is unaffected; the wire format is identical to a plain string. - Schemathesis structural tests. A
@schemathesis.hookbefore_callregisters itself with a_SENSITIVE_BODY_FIELDSfrozenset scoped to the operations in the current spec; sensitive body fields are wrapped in_RedactedStrbefore the request is sent, so failing-case output is masked. - Service-side logs.
app/redaction.pyconfigures structlog with aredact_sensitiveprocessor in the chain. Every log line (application code, uvicorn access logs, uvicorn errors) runs through the same processor; sensitive keys at any depth in the structured event dict become***REDACTED***. The processor is a pure function; tests intests/test_log_redaction.pycover it directly.
Per-field overrides
A test_strategy convention rule overrides the default redaction policy for a
specific operation input or entity field:
service Demo {
// ...
conventions {
Register.password.test_strategy = "live" // send constraint-derived live values
User.password_hash.test_strategy = "redacted" // force the placeholder
// Equivalent string-qualifier syntax (mirrors http_header):
// Register.test_strategy "password" = "live"
}
}Resolution: <Operation>.<input>.test_strategy wins over <Entity>.<field>.test_strategy,
which broadcasts to every operation whose input shares the field name. The validator
rejects unknown field qualifiers, missing qualifiers, values other than
"live" / "redacted", conflicting entity-field overrides for the same field name across
different entities (would resolve ambiguously), and duplicate
target+qualifier+property rules.
live => unwrap (no redact, no placeholder; emits the regular constrained strategy).
redacted => force st.just("***REDACTED***") regardless of constraints. Bare,
unconstrained sensitive types under redacted may fail server-side input validation
(the placeholder is short); pair with a constrained type alias if your invariants demand
a length/regex.
Coverage on canonical fixtures
ExprToPython covers every Expr constructor in the IR, no translator coverage gaps
remain. The remaining ExprPy.Skip outputs split into (i) user errors in the spec
(unbound identifier, undeclared function, wrong arity, reserved-name binding), (ii) typed
Quantifier bindings, and (iii) parser-side gaps (currently the multi-clause let ... in
body in ecommerce). These numbers are locked in SkipRateProbeTest.scala; CI fails if
they regress.
| Fixture | Total clauses | Translated | Skipped | Skip rate | Remaining-skip reason |
|---|---|---|---|---|---|
safe_counter.spec | 5 | 5 | 0 | 0.0% | , |
url_shortener.spec | 21 | 21 | 0 | 0.0% | , |
todo_list.spec | 50 | 50 | 0 | 0.0% | , |
edge_cases.spec | 29 | 29 | 0 | 0.0% | , |
ecommerce.spec | 76 | 74 | 2 | 2.6% | parser scope leak, multi-clause let ... in body, follow-up |
auth_service.spec | 40 | 34 | 6 | 15.0% | spec calls hash() and recentFailedAttempts() without declaring them as function/predicate; treated as user errors |
What _testgen_skips.json looks like
{
"version": 1,
"service": "UrlShortener",
"strategies_skipped": [],
"behavioral_skipped": [
{"operation": "Shorten", "kind": "requires[0]", "reason": "M5.1: only `<input> in <state>` and `<state>[input].<enum-field> = <value>` requires patterns get negative tests"},
{"operation": "Resolve", "kind": "ensures", "reason": "state-dependent precondition; positive ensures/invariant tests are covered by stateful tests (M5.2) for ops bundled into the state machine; the single-shot behavioral test would need explicit pre-seeding"},
{"operation": "Resolve", "kind": "invariant[allURLsValid]", "reason": "state-dependent precondition; positive ensures/invariant tests are covered by stateful tests (M5.2) for ops bundled into the state machine; the single-shot behavioral test would need explicit pre-seeding"}
],
"structural_skipped": [
{"operation": "Shorten", "kind": "structural_ensures[0]", "reason": "ensures references pre()/prime(), covered by behavioral/stateful layers"},
{"operation": "Shorten", "kind": "structural_ensures[2]", "reason": "ensures references state field, covered by stateful invariants"}
]
}The behavioral_skipped array enumerates clauses that the test generator could not turn
into a runnable test. Every entry signals one of: (a) state-machine setup is needed before
the precondition is reachable for a non-transition operation (no via rule covers it; the
remaining gap on GetTodo-shaped operations); (b) the requires shape is outside the
recognized negative-test pattern; (c) a guarded transition rule whose guard sits outside
the recognizer's recognized shapes (ordered comparison / enum or literal equality /
not-none / conjunction of those); (d) a real user error in the spec.
Runtime requirements
- The generated service must be running and reachable at
SPEC_TEST_BASE_URL(defaulthttp://localhost:8000). - The service must have started with
ENABLE_TEST_ADMIN=1. Without it, the conftest fixture skips the whole suite with a clear message rather than failing weirdly. - Setting
SPEC_TEST_INPROC=1flipsconftest.pyto use FastAPI'sTestClientagainst an in-process import ofapp.main:appinstead of going over HTTP toSPEC_TEST_BASE_URL. Eachpytestinvocation re-imports the app fresh, required for mutation testing (see Mutation Testing) where the runner has to observe edits toapp/. The admin router is auto-enabled in this mode (the conftest unconditionally setsENABLE_TEST_ADMIN=1before importingapp.main). Postgres (or whateverDATABASE_URLpoints at) is still required because the in-process app makes real DB calls. - By default,
compileemits service stubs that raiseNotImplementedErrorfor any operation classifiedLLM_SYNTHESIS. Pass--with-synthesisto invoke the M6 CEGIS loop (#27) and substitute verified Dafny output into those stubs. Runtime invariant and temporal-property enforcement (#86) ships the@invariant()observers andx-temporalextensions the assertions rely on.
Custom strategies (M5.6)
Strategies.scala synthesizes a Hypothesis strategy from each TypeAliasDecl's
where clause, recognised shapes are length bounds, regex (matches /.../), numeric
range, and the is_valid_uri / is_valid_email predicate filters. Anything outside
that set lands in _testgen_skips.json under strategies_skipped, with the synth
falling back to an unconstrained base strategy (st.text(), st.integers()).
To override the synthesized strategy with a hand-written one, declare a strategy
convention rule whose value is a module:symbol reference:
service AuthService {
type Email = String where value matches /^[^@]+@[^@]+\.[^@]+$/
type PasswordHash = String where len(value) = 64
conventions {
Email.strategy = "tests.strategies_user:valid_email"
PasswordHash.strategy = "tests.strategies_user:strong_password_hash"
}
}Then implement the symbols in tests/strategies_user.py (stub emitted on first
compile, preserved across re-compile):
from hypothesis import strategies as st
def valid_email():
return st.from_regex(r"^[^@]+@[^@]+\.[^@]+$", fullmatch=True)
def strong_password_hash():
return st.text(alphabet="0123456789abcdef", min_size=64, max_size=64)The generated tests/strategies.py automatically imports the symbols and routes
def strategy_<type>(): return <symbol>() to the override. Downstream test files
(test_behavioral_*.py, test_stateful_*.py) keep calling
strategy_<type>() unchanged, only the body of that function is swapped.
--strict-strategies CI gate
Pass --strict-strategies to compile --with-tests to fail the build if any
type alias or enum has an incomplete synthesized strategy AND no convention
override is registered. Use it as a CI guard against silently-broken strategies:
sbt "cli/run compile --with-tests --strict-strategies --out /tmp/svc fixtures/spec/auth_service.spec"
# Exit 1 if any strategy falls back to st.text() / st.integers() / st.nothing() with no overrideTwo synthesis gaps trip the gate:
- Unhandled
whereconstraint, e.g.,type Code = String where custom_pred(value)producesst.text()plus a# testgen-skip:note. The synthesized strategy is too permissive (custom_pred is not enforced). - Unsupported base type, e.g.,
type Mapping = Map<String, Int>falls through tost.nothing(). The synthesized strategy is unusable.
Both cases land in _testgen_skips.json's strategies_skipped array and
become compile errors under --strict-strategies.
Built-in predicates (M5.10)
isValidURI and isValidEmail are not hardcoded in Scala. They live in
parser/resources/specrest/parser/preamble.spec as ordinary
spec-language predicate declarations:
service Preamble {
predicate isValidURI(s: String) = s matches /^https?:..[^\s]+/
predicate isValidEmail(s: String) = s matches /^[^@\s]+@[^@\s]+\.[^@\s]+$/
}The parser auto-merges this preamble's predicates into every parsed
ServiceIR, so user specs can call isValidURI(x) or isValidEmail(x) without
declaring them. A user-declared predicate with the same name takes precedence.
What this enables
- Adding a new built-in predicate is one line in
preamble.spec, no Scala edit, no JSON registry, no helper-import wiring. - The verifier sees the predicate body (a regex match), not an opaque function.
Z3 still reasons about the
matches_<hash>declaration as uninterpreted (same proving power as before), but the framework no longer special-cases predicate names. - Test-gen synthesises a tighter strategy for type aliases that use a
preamble-style
s matches /regex/predicate: instead ofst.text().filter(lambda v: is_valid_uri(v)), the generator inlines the regex and emitsst.from_regex("^https?:..[^\s]+", fullmatch=True). Hypothesis can hit the property directly rather than rejection-sampling. - Adding a custom predicate in a user spec works identically:
predicate isValidUUID(s: String) = s matches /^[0-9a-f-]{36}$/and the same pipeline applies (Strategy inlines the regex, ExprToPython rendersis_valid_uuid, the verifier treats it as a Bool predicate).
Limitations
The preamble's isValidURI is a regex approximation, not Python's
urlparse-based check. For property-based testing this is symmetric (the
strategy generates URI-like strings; the assertion checks the same regex). The
spec's matches /regex/ literal cannot contain a literal /, character
classes ([/]) and lexer escapes are also unsupported. Predicates that need
non-regex semantics (e.g. opaque urlparse(s) strictness) would require a
separate extern predicate mechanism, out of scope here.
Limitations and follow-up issues
For the live test-generation follow-up backlog (state-dependent preconditions on
non-transition ops, GuardSatisfier extensions, M5.11 multi-target test gen, M5.12
default --with-tests), see Roadmap Test generation follow-ups.