Test Generation Pipeline

Design document for the spec-to-REST compiler's test generation subsystem. Covers structural, behavioral, and stateful test layers with complete worked examples for three service domains.

Implementation status. Phase 5 is largely shipped. Closed milestones cover §2 Schemathesis (#26), §3 Hypothesis behavioral tests (#24), §4 stateful tests (#25), §5 custom strategies (#134), §7 conformance runner (#23), §9 mutation testing gate (#135), plus M5.5–M5.10 follow-ups. §6 TLA+ trace validation remains a non-goal in v1. The shipped surface lives in modules/testgen/ (Scala 3); the live --with-tests reference is pipelines/test-generation.mdx. Open follow-ups: #86 (runtime invariant enforcement) and #139 / #140 (multi-target test gen, default-on --with-tests). Python code samples below are the spec-vision shapes; emitted Python in tests/*.py is what the live shipped pipeline produces and is documented in detail on the pipelines page.

Architecture Overview
Schemathesis Integration (Structural Layer)
Hypothesis Property Tests (Behavioral Layer)
Hypothesis Stateful Testing (State Machine Layer)
Custom Hypothesis Strategies from Spec Types
TLA+ Trace Validation (Advanced)
Conformance Test Runner Architecture
Spec-to-Test Mapping Table
Test Quality Metrics
Comparison with Existing Testing Approaches

1. Architecture overview

The test generation pipeline produces three complementary test layers from a single formal specification. Each layer catches a different class of defects, and together they provide end-to-end conformance checking between the spec and the running service.

What each layer catches

Layer	Catches	Misses
Structural (Schemathesis)	Wrong status codes, missing endpoints, schema mismatches, 500 errors, content-type violations	Correct-shaped but wrong-valued responses, invariant violations across calls
Behavioral (Hypothesis properties)	Single-operation postcondition failures, precondition bypass, wrong return values	Multi-step invariant drift, state corruption across operation sequences
Stateful (Hypothesis state machine)	Multi-step invariant violations, illegal state transitions, resource lifecycle bugs, ordering-dependent failures	Performance issues, concurrency bugs (unless parallel testing is added)

2. Schemathesis integration (structural layer)

2.1 How we generate Schemathesis configuration from the spec

The compiler already produces an OpenAPI specification as an intermediate artifact (see Section 6.2 of 00_comprehensive_analysis.md). This OpenAPI spec becomes the direct input to Schemathesis. No additional translation is needed for the structural layer, the OpenAPI document is the test specification.

However, we augment the basic Schemathesis run with:

Custom checks derived from the formal spec's ensures and invariant clauses
Stateful test links derived from operation input/output relationships
Test profiles that control depth and breadth of exploration
Auth configuration generated from the spec's security declarations
Database seeding scripts generated from the spec's initial state

2.2 Test profiles

The compiler generates three test profiles for different contexts:

Profile	Max Examples	Stateful Depth	Timeout	Use Case
`smoke`	10 per endpoint	3 steps	30s	Pre-commit hook, local dev
`thorough`	100 per endpoint	10 steps	5m	CI pipeline, pull requests
`exhaustive`	1000 per endpoint	25 steps	30m	Release gate, nightly builds

2.3 Complete generated Schemathesis test file (URL shortener)

The following file is generated from the URL shortener spec. It is complete and runnable against the generated service.

# file: tests/test_structural.py
#
# AUTO-GENERATED from spec: UrlShortener
# Do not edit manually -- regenerate with: spec-to-rest generate --tests

import schemathesis
import pytest
from urllib.parse import urlparse

# ---------------------------------------------------------------------------
# Configuration
# ---------------------------------------------------------------------------

BASE_URL = "http://localhost:8000"
SCHEMA_PATH = "./openapi.json"

schema = schemathesis.from_path(
    SCHEMA_PATH,
    base_url=BASE_URL,
)

# ---------------------------------------------------------------------------
# Test profile (set via environment variable SPEC_TEST_PROFILE)
# ---------------------------------------------------------------------------

import os

PROFILE = os.environ.get("SPEC_TEST_PROFILE", "thorough")

PROFILE_SETTINGS = {
    "smoke": {"max_examples": 10, "stateful_step_count": 3},
    "thorough": {"max_examples": 100, "stateful_step_count": 10},
    "exhaustive": {"max_examples": 1000, "stateful_step_count": 25},
}

settings = PROFILE_SETTINGS[PROFILE]

# ---------------------------------------------------------------------------
# Auth hook (if the spec declares authentication)
# ---------------------------------------------------------------------------

AUTH_TOKEN = os.environ.get("SPEC_TEST_AUTH_TOKEN", None)


@schemathesis.hook("before_call")
def add_auth(context, case, **kwargs):
    """Attach auth token to every request if configured."""
    if AUTH_TOKEN is not None:
        case.headers = case.headers or {}
        case.headers["Authorization"] = f"Bearer {AUTH_TOKEN}"


# ---------------------------------------------------------------------------
# Custom checks derived from spec ensures/invariants
# ---------------------------------------------------------------------------


@schemathesis.check
def spec_shorten_returns_valid_short_url(response, case):
    """
    Derived from: ensures short_url = base_url + "/" + code.value
    Only applies to POST /shorten with 201 responses.
    """
    if case.path != "/shorten" or case.method.upper() != "POST":
        return
    if response.status_code != 201:
        return
    data = response.json()
    short_url = data.get("short_url", "")
    code = data.get("code", "")
    assert short_url.endswith(
        f"/{code}"
    ), f"short_url '{short_url}' does not end with code '{code}'"
    parsed = urlparse(short_url)
    assert parsed.scheme in (
        "http",
        "https",
    ), f"short_url '{short_url}' has invalid scheme"


@schemathesis.check
def spec_shorten_code_length_valid(response, case):
    """
    Derived from entity invariant: len(ShortCode.value) >= 6 and len(ShortCode.value) <= 10
    """
    if case.path != "/shorten" or case.method.upper() != "POST":
        return
    if response.status_code != 201:
        return
    code = response.json().get("code", "")
    assert 6 <= len(code) <= 10, f"code length {len(code)} outside [6, 10]"


@schemathesis.check
def spec_shorten_code_alphanumeric(response, case):
    """
    Derived from entity invariant: ShortCode.value matches /^[a-zA-Z0-9]+$/
    """
    if case.path != "/shorten" or case.method.upper() != "POST":
        return
    if response.status_code != 201:
        return
    code = response.json().get("code", "")
    assert code.isalnum(), f"code '{code}' is not alphanumeric"


@schemathesis.check
def spec_resolve_returns_valid_uri(response, case):
    """
    Derived from: invariant all c in store | isValidURI(store[c].value)
    Applies to GET /{code} with 302 responses.
    """
    if not case.path.startswith("/{code}"):
        return
    if case.method.upper() != "GET":
        return
    if response.status_code != 302:
        return
    location = response.headers.get("Location", "")
    parsed = urlparse(location)
    assert parsed.scheme in (
        "http",
        "https",
    ), f"Location header '{location}' is not a valid URI"


@schemathesis.check
def spec_no_server_errors(response, case):
    """All operations: server must never return 5xx."""
    assert response.status_code < 500, (
        f"Server error {response.status_code} on "
        f"{case.method.upper()} {case.path}"
    )


# ---------------------------------------------------------------------------
# Stateless tests (one per endpoint)
# ---------------------------------------------------------------------------


from hypothesis import settings as hypothesis_settings

@schema.parametrize()
@hypothesis_settings(max_examples=settings["max_examples"])
def test_api_structural(case):
    """
    For each endpoint in the OpenAPI spec, Schemathesis generates random
    valid requests and validates that responses conform to the declared schemas.
    Custom checks above are run automatically for every response.
    """
    case.call_and_validate()


# ---------------------------------------------------------------------------
# Stateful tests (multi-step sequences via OpenAPI links)
# ---------------------------------------------------------------------------
#
# The generated OpenAPI spec includes links that express the data flow
# between operations:
#
#   POST /shorten  --[code]--> GET /{code}
#   POST /shorten  --[code]--> DELETE /{code}
#   GET /{code}    --[code]--> DELETE /{code}
#
# Schemathesis discovers these links and generates multi-step test sequences.

state_machine = schema.as_state_machine()


class TestStatefulUrlShortener(state_machine):
    """
    Stateful test class. Schemathesis builds a state machine from the OpenAPI
    links and explores multi-step sequences up to the configured depth.
    """

    pass


TestStatefulUrlShortener.settings = hypothesis_settings(
    max_examples=settings["max_examples"],
    stateful_step_count=settings["stateful_step_count"],
)

# ---------------------------------------------------------------------------
# Database seeding fixture
# ---------------------------------------------------------------------------


@pytest.fixture(scope="session", autouse=True)
def seed_database():
    """
    Seed the database with known test data before the test session.
    Generated from the spec's initial state constraints.
    """
    import httpx

    # Seed a few known URLs so that resolve/delete tests have targets
    seed_data = [
        {"url": "https://example.com/page1"},
        {"url": "https://example.com/page2"},
        {"url": "https://example.com/page3"},
    ]
    created_codes = []
    for item in seed_data:
        resp = httpx.post(f"{BASE_URL}/shorten", json=item)
        if resp.status_code == 201:
            created_codes.append(resp.json()["code"])

    yield created_codes

    # Teardown: delete seeded entries
    for code in created_codes:
        httpx.delete(f"{BASE_URL}/{code}")

2.4 Expected output interpretation

Schemathesis produces a structured report. Here is how to interpret it in the context of spec conformance:

Schemathesis Finding	Spec Violation
`status_code_conformance` failure	Convention engine mapped the wrong HTTP status to an operation
`content_type_conformance` failure	Response serialization does not match OpenAPI `produces`
`response_schema_conformance` failure	Entity or output type not correctly mapped to JSON schema
Custom check failure (`spec_shorten_*`)	An `ensures` or `invariant` clause is violated
`not_a_server_error` failure	Unhandled exception in generated code
Stateful test failure	Data flow between operations is broken (e.g., code from shorten cannot be used in resolve)

2.5 CI integration (schemathesis)

The generated Makefile includes targets for each profile:

.PHONY: test-structural test-structural-smoke test-structural-exhaustive

test-structural-smoke:
	SPEC_TEST_PROFILE=smoke pytest tests/test_structural.py -v --tb=short

test-structural:
	SPEC_TEST_PROFILE=thorough pytest tests/test_structural.py -v --tb=short

test-structural-exhaustive:
	SPEC_TEST_PROFILE=exhaustive pytest tests/test_structural.py -v --tb=long

3. Hypothesis property tests (behavioral layer)

This is the most critical layer. Every ensures clause in the spec becomes one or more Hypothesis property tests. Every requires clause generates a negative test. Every invariant generates an after-every-operation check.

3.1 Mapping rules

Spec Element	Generated Test	What It Verifies
`ensures: X not in pre(state)`	Freshness property test	Newly created identifier did not exist before
`ensures: state'[k] = v`	Storage correctness test	Value retrievable after mutation
`ensures: #state' = #state + N`	Cardinality test	Exactly N elements added/removed
`ensures: state' = state`	No-mutation test	Read operations do not change state
`ensures: output = f(input)`	Output correctness test	Return value matches specification
`requires: X in state`	Negative test (missing X)	Returns error, state unchanged
`requires: predicate(input)`	Negative test (invalid input)	Returns 422, state unchanged
`invariant: forall x in S \| P(x)`	Post-operation invariant check	Property holds after every operation

3.2 Helper infrastructure (generated)

Every generated test suite includes these shared utilities:

# file: tests/conftest.py
#
# AUTO-GENERATED from spec: UrlShortener

import pytest
import httpx
from hypothesis import strategies as st, given, settings, assume, HealthCheck

# ---------------------------------------------------------------------------
# Test client
# ---------------------------------------------------------------------------

BASE_URL = "http://localhost:8000"


@pytest.fixture(scope="session")
def client():
    with httpx.Client(base_url=BASE_URL, timeout=10.0) as c:
        yield c


# Global client for use in Hypothesis tests (which cannot use pytest fixtures
# directly due to the @given decorator)
_client = httpx.Client(base_url=BASE_URL, timeout=10.0)


def get_client() -> httpx.Client:
    return _client


# ---------------------------------------------------------------------------
# Spec-derived helper functions
# ---------------------------------------------------------------------------


def get_all_codes() -> set[str]:
    """Return all short codes currently in the store."""
    resp = get_client().get("/codes")
    resp.raise_for_status()
    return set(resp.json())


def get_store_count() -> int:
    """Return the number of entries in the store."""
    resp = get_client().get("/stats")
    resp.raise_for_status()
    return resp.json()["count"]


def is_valid_uri(s: str) -> bool:
    """Check if a string is a valid URI per the spec's valid_uri predicate."""
    from urllib.parse import urlparse

    try:
        result = urlparse(s)
        return all([result.scheme in ("http", "https"), result.netloc])
    except Exception:
        return False


# ---------------------------------------------------------------------------
# Database reset between tests
# ---------------------------------------------------------------------------


@pytest.fixture(autouse=True)
def reset_state():
    """Reset the service state before each test function."""
    resp = get_client().post("/test/reset")
    assert resp.status_code == 200
    yield
    # No teardown needed; next test will reset

3.3 URL shortener, complete property tests

Spec

service UrlShortener {
  entity ShortCode { value: String; invariant: len(value) >= 6 and len(value) <= 10; invariant: value matches /^[a-zA-Z0-9]+$/ }
  entity LongURL   { value: String; invariant: isValidURI(value) }
  state { store: ShortCode -> lone LongURL }

  operation Shorten {
    input:  url: LongURL
    output: code: ShortCode, short_url: String
    requires: isValidURI(url.value)
    ensures:  code not in pre(store); store'[code] = url; #store' = #store + 1
  }

  operation Resolve {
    input:  code: ShortCode
    output: url: LongURL
    requires: code in store
    ensures:  url = store[code]; store' = store
  }

  operation Delete {
    input:  code: ShortCode
    requires: code in store
    ensures:  code not in store'; #store' = #store - 1
  }

  invariant: all c in store | isValidURI(store[c].value)
}

Generated tests

# file: tests/test_behavioral_url_shortener.py
#
# AUTO-GENERATED from spec: UrlShortener
# Tests every ensures clause as a Hypothesis property test,
# every requires clause as a negative test,
# and every invariant as a post-operation check.

import re
import pytest
from hypothesis import given, settings, assume, HealthCheck
from hypothesis import strategies as st
from tests.conftest import get_client, get_all_codes, get_store_count, is_valid_uri

# ---------------------------------------------------------------------------
# Strategies derived from entity definitions
# ---------------------------------------------------------------------------

valid_urls = st.sampled_from([
    "https://example.com",
    "https://example.com/path/to/page",
    "https://subdomain.example.org/page?q=1&r=2",
    "http://example.net/a/b/c",
    "https://example.com/very/long/path/" + "x" * 200,
]) | st.from_regex(
    r"https?://[a-z]{3,12}\.[a-z]{2,6}(/[a-z0-9]{1,20}){0,5}",
    fullmatch=True,
)

invalid_urls = st.sampled_from([
    "",
    "not-a-url",
    "ftp://wrong-scheme.com",
    "://missing-scheme",
    "http://",
    "   ",
    "javascript:alert(1)",
])

# ---------------------------------------------------------------------------
# OPERATION: Shorten
# ---------------------------------------------------------------------------
# ensures: code not in pre(store)
# ensures: store'[code] = url
# ensures: #store' = #store + 1


@given(url=valid_urls)
@settings(
    max_examples=100,
    suppress_health_check=[HealthCheck.too_slow, HealthCheck.function_scoped_fixture],
)
def test_shorten_returns_fresh_code(url):
    """
    Ensures: code not in pre(store)

    After shortening, the returned code must not have existed before
    the operation was called.
    """
    client = get_client()
    client.post("/test/reset")

    # Capture pre-state
    existing_codes = get_all_codes()

    # Execute operation
    response = client.post("/shorten", json={"url": url})
    assert response.status_code == 201

    code = response.json()["code"]
    assert code not in existing_codes, (
        f"Returned code '{code}' already existed in store before shorten was called"
    )


@given(url=valid_urls)
@settings(
    max_examples=100,
    suppress_health_check=[HealthCheck.too_slow, HealthCheck.function_scoped_fixture],
)
def test_shorten_stores_correct_mapping(url):
    """
    Ensures: store'[code] = url

    After shortening, resolving the returned code must yield the original URL.
    """
    client = get_client()
    client.post("/test/reset")

    response = client.post("/shorten", json={"url": url})
    assert response.status_code == 201

    code = response.json()["code"]
    lookup = client.get(f"/{code}", follow_redirects=False)
    assert lookup.status_code == 302
    assert lookup.headers["Location"] == url, (
        f"Expected Location '{url}', got '{lookup.headers.get('Location')}'"
    )


@given(url=valid_urls)
@settings(
    max_examples=100,
    suppress_health_check=[HealthCheck.too_slow, HealthCheck.function_scoped_fixture],
)
def test_shorten_adds_exactly_one(url):
    """
    Ensures: #store' = #store + 1

    The store must contain exactly one more entry after a successful shorten.
    """
    client = get_client()
    client.post("/test/reset")

    count_before = get_store_count()
    response = client.post("/shorten", json={"url": url})
    assert response.status_code == 201

    count_after = get_store_count()
    assert count_after == count_before + 1, (
        f"Store count went from {count_before} to {count_after}, expected {count_before + 1}"
    )


# ---------------------------------------------------------------------------
# OPERATION: Resolve
# ---------------------------------------------------------------------------
# ensures: url = store[code]
# ensures: store' = store


@given(url=valid_urls)
@settings(
    max_examples=100,
    suppress_health_check=[HealthCheck.too_slow, HealthCheck.function_scoped_fixture],
)
def test_resolve_returns_correct_url(url):
    """
    Ensures: url = store[code]

    Resolving a code must return the URL that was stored when it was shortened.
    """
    client = get_client()
    client.post("/test/reset")

    # Create a mapping
    shorten_resp = client.post("/shorten", json={"url": url})
    assert shorten_resp.status_code == 201
    code = shorten_resp.json()["code"]

    # Resolve it
    resolve_resp = client.get(f"/{code}", follow_redirects=False)
    assert resolve_resp.status_code == 302
    assert resolve_resp.headers["Location"] == url


@given(url=valid_urls)
@settings(
    max_examples=100,
    suppress_health_check=[HealthCheck.too_slow, HealthCheck.function_scoped_fixture],
)
def test_resolve_does_not_mutate_state(url):
    """
    Ensures: store' = store

    The resolve operation must not change the store. The count before and after
    must be identical.
    """
    client = get_client()
    client.post("/test/reset")

    # Create a mapping
    shorten_resp = client.post("/shorten", json={"url": url})
    assert shorten_resp.status_code == 201
    code = shorten_resp.json()["code"]

    count_before = get_store_count()
    codes_before = get_all_codes()

    # Resolve
    client.get(f"/{code}", follow_redirects=False)

    count_after = get_store_count()
    codes_after = get_all_codes()

    assert count_after == count_before, "Resolve mutated the store count"
    assert codes_after == codes_before, "Resolve mutated the store contents"


# ---------------------------------------------------------------------------
# OPERATION: Delete
# ---------------------------------------------------------------------------
# ensures: code not in store'
# ensures: #store' = #store - 1


@given(url=valid_urls)
@settings(
    max_examples=100,
    suppress_health_check=[HealthCheck.too_slow, HealthCheck.function_scoped_fixture],
)
def test_delete_removes_code(url):
    """
    Ensures: code not in store'

    After deletion, the code must no longer be in the store.
    """
    client = get_client()
    client.post("/test/reset")

    # Create a mapping
    shorten_resp = client.post("/shorten", json={"url": url})
    assert shorten_resp.status_code == 201
    code = shorten_resp.json()["code"]

    # Delete it
    delete_resp = client.delete(f"/{code}")
    assert delete_resp.status_code == 204

    # Verify removal
    remaining_codes = get_all_codes()
    assert code not in remaining_codes, f"Code '{code}' still in store after delete"


@given(url=valid_urls)
@settings(
    max_examples=100,
    suppress_health_check=[HealthCheck.too_slow, HealthCheck.function_scoped_fixture],
)
def test_delete_removes_exactly_one(url):
    """
    Ensures: #store' = #store - 1

    The store must contain exactly one fewer entry after a successful delete.
    """
    client = get_client()
    client.post("/test/reset")

    # Create a mapping
    shorten_resp = client.post("/shorten", json={"url": url})
    assert shorten_resp.status_code == 201
    code = shorten_resp.json()["code"]

    count_before = get_store_count()
    delete_resp = client.delete(f"/{code}")
    assert delete_resp.status_code == 204

    count_after = get_store_count()
    assert count_after == count_before - 1, (
        f"Store count went from {count_before} to {count_after}, expected {count_before - 1}"
    )


# ---------------------------------------------------------------------------
# NEGATIVE TESTS: requires clause violations
# ---------------------------------------------------------------------------


@given(url=invalid_urls)
@settings(
    max_examples=50,
    suppress_health_check=[HealthCheck.too_slow, HealthCheck.function_scoped_fixture],
)
def test_shorten_rejects_invalid_url(url):
    """
    Requires: isValidURI(url.value)

    Violating this precondition must return 422 and must NOT mutate state.
    """
    client = get_client()
    client.post("/test/reset")

    count_before = get_store_count()
    response = client.post("/shorten", json={"url": url})
    assert response.status_code == 422, (
        f"Expected 422 for invalid URL '{url}', got {response.status_code}"
    )

    count_after = get_store_count()
    assert count_after == count_before, (
        "State was mutated despite requires clause violation"
    )


def test_resolve_rejects_nonexistent_code():
    """
    Requires: code in store

    Resolving a code that does not exist must return 404.
    """
    client = get_client()
    client.post("/test/reset")

    response = client.get("/NONEXISTENT", follow_redirects=False)
    assert response.status_code == 404


def test_delete_rejects_nonexistent_code():
    """
    Requires: code in store

    Deleting a code that does not exist must return 404 and must NOT mutate state.
    """
    client = get_client()
    client.post("/test/reset")

    count_before = get_store_count()
    response = client.delete("/NONEXISTENT")
    assert response.status_code == 404

    count_after = get_store_count()
    assert count_after == count_before, (
        "State was mutated despite requires clause violation"
    )


# ---------------------------------------------------------------------------
# INVARIANT CHECKS: run after every operation
# ---------------------------------------------------------------------------


def check_global_invariants():
    """
    Invariant: all c in store | isValidURI(store[c].value)

    Every URL in the store must be a valid URI. Called after every operation
    in the stateful test layer, and spot-checked here.
    """
    client = get_client()
    codes = get_all_codes()
    for code in codes:
        resp = client.get(f"/{code}", follow_redirects=False)
        if resp.status_code == 302:
            location = resp.headers["Location"]
            assert is_valid_uri(location), (
                f"Invariant violated: store[{code}] = '{location}' is not a valid URI"
            )


@given(url=valid_urls)
@settings(
    max_examples=50,
    suppress_health_check=[HealthCheck.too_slow, HealthCheck.function_scoped_fixture],
)
def test_invariant_holds_after_shorten(url):
    """Global invariant must hold after every Shorten operation."""
    client = get_client()
    client.post("/test/reset")
    client.post("/shorten", json={"url": url})
    check_global_invariants()


@given(url=valid_urls)
@settings(
    max_examples=50,
    suppress_health_check=[HealthCheck.too_slow, HealthCheck.function_scoped_fixture],
)
def test_invariant_holds_after_delete(url):
    """Global invariant must hold after every Delete operation."""
    client = get_client()
    client.post("/test/reset")

    resp = client.post("/shorten", json={"url": url})
    assert resp.status_code == 201
    code = resp.json()["code"]
    client.delete(f"/{code}")
    check_global_invariants()

3.4 Todo list with state machine, complete property tests

Spec

service TodoList {
  entity TodoId    { value: UUID }
  entity TodoTitle { value: String; invariant: len(value) >= 1 and len(value) <= 200 }

  enum TodoStatus { todo, in_progress, done }

  entity TodoItem {
    id:     TodoId
    title:  TodoTitle
    status: TodoStatus
  }

  state { items: TodoId -> lone TodoItem }

  operation CreateTodo {
    input:  title: TodoTitle
    output: item: TodoItem
    requires: len(title.value) >= 1
    ensures:
      item.id not in pre(items)
      item.status = TodoStatus.todo
      item.title = title
      items'[item.id] = item
      #items' = #items + 1
  }

  operation StartTodo {
    input:  id: TodoId
    output: item: TodoItem
    requires: id in items
    requires: items[id].status = TodoStatus.todo
    ensures:
      item.status = TodoStatus.in_progress
      items'[id].status = TodoStatus.in_progress
      #items' = #items
  }

  operation CompleteTodo {
    input:  id: TodoId
    output: item: TodoItem
    requires: id in items
    requires: items[id].status = TodoStatus.in_progress
    ensures:
      item.status = TodoStatus.done
      items'[id].status = TodoStatus.done
      #items' = #items
  }

  operation DeleteTodo {
    input:  id: TodoId
    requires: id in items
    ensures:
      id not in items'
      #items' = #items - 1
  }

  // State transition constraints
  invariant: all t in items | items[t].status = done implies
    (items[t] was in_progress)  // done items must have been in_progress

  // General invariants
  invariant: all t in items | len(items[t].title.value) >= 1
}

Generated tests

# file: tests/test_behavioral_todo.py
#
# AUTO-GENERATED from spec: TodoList

import pytest
from hypothesis import given, settings, assume, HealthCheck
from hypothesis import strategies as st
from tests.conftest import get_client

# ---------------------------------------------------------------------------
# Strategies
# ---------------------------------------------------------------------------

valid_titles = st.text(
    alphabet=st.characters(whitelist_categories=("L", "N", "P", "Z")),
    min_size=1,
    max_size=200,
).filter(lambda t: len(t.strip()) >= 1)

invalid_titles = st.sampled_from(["", None]) | st.text(min_size=201, max_size=300)

# ---------------------------------------------------------------------------
# Helpers
# ---------------------------------------------------------------------------


def get_all_items():
    resp = get_client().get("/todos")
    resp.raise_for_status()
    return resp.json()


def get_item_count():
    return len(get_all_items())


# ---------------------------------------------------------------------------
# OPERATION: CreateTodo
# ---------------------------------------------------------------------------
# ensures: item.id not in pre(items)
# ensures: item.status = TodoStatus.todo
# ensures: item.title = title
# ensures: items'[item.id] = item
# ensures: #items' = #items + 1


@given(title=valid_titles)
@settings(
    max_examples=100,
    suppress_health_check=[HealthCheck.too_slow, HealthCheck.function_scoped_fixture],
)
def test_create_todo_returns_fresh_id(title):
    """Ensures: item.id not in pre(items)"""
    client = get_client()
    client.post("/test/reset")

    existing_ids = {item["id"] for item in get_all_items()}
    response = client.post("/todos", json={"title": title})
    assert response.status_code == 201

    new_id = response.json()["id"]
    assert new_id not in existing_ids, f"ID '{new_id}' already existed"


@given(title=valid_titles)
@settings(
    max_examples=100,
    suppress_health_check=[HealthCheck.too_slow, HealthCheck.function_scoped_fixture],
)
def test_create_todo_initial_status_is_todo(title):
    """Ensures: item.status = TodoStatus.todo"""
    client = get_client()
    client.post("/test/reset")

    response = client.post("/todos", json={"title": title})
    assert response.status_code == 201

    item = response.json()
    assert item["status"] == "todo", f"Expected status 'todo', got '{item['status']}'"


@given(title=valid_titles)
@settings(
    max_examples=100,
    suppress_health_check=[HealthCheck.too_slow, HealthCheck.function_scoped_fixture],
)
def test_create_todo_stores_correct_title(title):
    """Ensures: item.title = title"""
    client = get_client()
    client.post("/test/reset")

    response = client.post("/todos", json={"title": title})
    assert response.status_code == 201

    item = response.json()
    assert item["title"] == title, f"Expected title '{title}', got '{item['title']}'"


@given(title=valid_titles)
@settings(
    max_examples=100,
    suppress_health_check=[HealthCheck.too_slow, HealthCheck.function_scoped_fixture],
)
def test_create_todo_adds_exactly_one(title):
    """Ensures: #items' = #items + 1"""
    client = get_client()
    client.post("/test/reset")

    count_before = get_item_count()
    response = client.post("/todos", json={"title": title})
    assert response.status_code == 201

    count_after = get_item_count()
    assert count_after == count_before + 1


# ---------------------------------------------------------------------------
# OPERATION: StartTodo
# ---------------------------------------------------------------------------
# ensures: item.status = TodoStatus.in_progress
# ensures: items'[id].status = TodoStatus.in_progress
# ensures: #items' = #items


@given(title=valid_titles)
@settings(
    max_examples=100,
    suppress_health_check=[HealthCheck.too_slow, HealthCheck.function_scoped_fixture],
)
def test_start_todo_transitions_to_in_progress(title):
    """Ensures: item.status = TodoStatus.in_progress"""
    client = get_client()
    client.post("/test/reset")

    # Create a todo (starts in 'todo' state)
    create_resp = client.post("/todos", json={"title": title})
    assert create_resp.status_code == 201
    todo_id = create_resp.json()["id"]

    # Start it
    start_resp = client.patch(f"/todos/{todo_id}/start")
    assert start_resp.status_code == 200

    item = start_resp.json()
    assert item["status"] == "in_progress"


@given(title=valid_titles)
@settings(
    max_examples=100,
    suppress_health_check=[HealthCheck.too_slow, HealthCheck.function_scoped_fixture],
)
def test_start_todo_persists_status(title):
    """Ensures: items'[id].status = TodoStatus.in_progress (check via GET)"""
    client = get_client()
    client.post("/test/reset")

    create_resp = client.post("/todos", json={"title": title})
    todo_id = create_resp.json()["id"]
    client.patch(f"/todos/{todo_id}/start")

    # Verify persisted state via a fresh GET
    get_resp = client.get(f"/todos/{todo_id}")
    assert get_resp.status_code == 200
    assert get_resp.json()["status"] == "in_progress"


@given(title=valid_titles)
@settings(
    max_examples=100,
    suppress_health_check=[HealthCheck.too_slow, HealthCheck.function_scoped_fixture],
)
def test_start_todo_does_not_change_count(title):
    """Ensures: #items' = #items"""
    client = get_client()
    client.post("/test/reset")

    create_resp = client.post("/todos", json={"title": title})
    todo_id = create_resp.json()["id"]

    count_before = get_item_count()
    client.patch(f"/todos/{todo_id}/start")
    count_after = get_item_count()

    assert count_after == count_before


# ---------------------------------------------------------------------------
# OPERATION: CompleteTodo
# ---------------------------------------------------------------------------
# ensures: item.status = TodoStatus.done
# ensures: items'[id].status = TodoStatus.done
# ensures: #items' = #items


@given(title=valid_titles)
@settings(
    max_examples=100,
    suppress_health_check=[HealthCheck.too_slow, HealthCheck.function_scoped_fixture],
)
def test_complete_todo_transitions_to_done(title):
    """Ensures: item.status = TodoStatus.done"""
    client = get_client()
    client.post("/test/reset")

    create_resp = client.post("/todos", json={"title": title})
    todo_id = create_resp.json()["id"]
    client.patch(f"/todos/{todo_id}/start")  # must be in_progress first

    complete_resp = client.patch(f"/todos/{todo_id}/complete")
    assert complete_resp.status_code == 200
    assert complete_resp.json()["status"] == "done"


@given(title=valid_titles)
@settings(
    max_examples=100,
    suppress_health_check=[HealthCheck.too_slow, HealthCheck.function_scoped_fixture],
)
def test_complete_todo_persists_status(title):
    """Ensures: items'[id].status = TodoStatus.done (check via GET)"""
    client = get_client()
    client.post("/test/reset")

    create_resp = client.post("/todos", json={"title": title})
    todo_id = create_resp.json()["id"]
    client.patch(f"/todos/{todo_id}/start")
    client.patch(f"/todos/{todo_id}/complete")

    get_resp = client.get(f"/todos/{todo_id}")
    assert get_resp.json()["status"] == "done"


# ---------------------------------------------------------------------------
# OPERATION: DeleteTodo
# ---------------------------------------------------------------------------
# ensures: id not in items'
# ensures: #items' = #items - 1


@given(title=valid_titles)
@settings(
    max_examples=100,
    suppress_health_check=[HealthCheck.too_slow, HealthCheck.function_scoped_fixture],
)
def test_delete_todo_removes_item(title):
    """Ensures: id not in items'"""
    client = get_client()
    client.post("/test/reset")

    create_resp = client.post("/todos", json={"title": title})
    todo_id = create_resp.json()["id"]

    delete_resp = client.delete(f"/todos/{todo_id}")
    assert delete_resp.status_code == 204

    get_resp = client.get(f"/todos/{todo_id}")
    assert get_resp.status_code == 404


@given(title=valid_titles)
@settings(
    max_examples=100,
    suppress_health_check=[HealthCheck.too_slow, HealthCheck.function_scoped_fixture],
)
def test_delete_todo_removes_exactly_one(title):
    """Ensures: #items' = #items - 1"""
    client = get_client()
    client.post("/test/reset")

    create_resp = client.post("/todos", json={"title": title})
    todo_id = create_resp.json()["id"]

    count_before = get_item_count()
    client.delete(f"/todos/{todo_id}")
    count_after = get_item_count()

    assert count_after == count_before - 1


# ---------------------------------------------------------------------------
# NEGATIVE TESTS: requires clause violations
# ---------------------------------------------------------------------------


def test_start_rejects_nonexistent_todo():
    """Requires: id in items"""
    client = get_client()
    client.post("/test/reset")
    resp = client.patch("/todos/00000000-0000-0000-0000-000000000000/start")
    assert resp.status_code == 404


@given(title=valid_titles)
@settings(
    max_examples=50,
    suppress_health_check=[HealthCheck.too_slow, HealthCheck.function_scoped_fixture],
)
def test_start_rejects_non_todo_status(title):
    """Requires: items[id].status = TodoStatus.todo -- cannot start an in_progress item."""
    client = get_client()
    client.post("/test/reset")

    create_resp = client.post("/todos", json={"title": title})
    todo_id = create_resp.json()["id"]
    client.patch(f"/todos/{todo_id}/start")  # now in_progress

    # Trying to start again should fail
    resp = client.patch(f"/todos/{todo_id}/start")
    assert resp.status_code == 422 or resp.status_code == 409, (
        f"Expected 422 or 409 for invalid transition, got {resp.status_code}"
    )


@given(title=valid_titles)
@settings(
    max_examples=50,
    suppress_health_check=[HealthCheck.too_slow, HealthCheck.function_scoped_fixture],
)
def test_complete_rejects_todo_status(title):
    """Requires: items[id].status = TodoStatus.in_progress -- cannot complete from todo."""
    client = get_client()
    client.post("/test/reset")

    create_resp = client.post("/todos", json={"title": title})
    todo_id = create_resp.json()["id"]

    # Item is in 'todo' status, not 'in_progress'
    resp = client.patch(f"/todos/{todo_id}/complete")
    assert resp.status_code == 422 or resp.status_code == 409

    # Verify state was not mutated
    get_resp = client.get(f"/todos/{todo_id}")
    assert get_resp.json()["status"] == "todo"


@given(title=invalid_titles)
@settings(
    max_examples=50,
    suppress_health_check=[HealthCheck.too_slow, HealthCheck.function_scoped_fixture],
)
def test_create_rejects_invalid_title(title):
    """Requires: len(title.value) >= 1"""
    client = get_client()
    client.post("/test/reset")

    count_before = get_item_count()
    resp = client.post("/todos", json={"title": title})
    assert resp.status_code == 422

    count_after = get_item_count()
    assert count_after == count_before, "State mutated despite requires failure"

3.5 E-commerce orders, complete property tests

Spec

service OrderSystem {
  entity ProductId   { value: UUID }
  entity OrderId     { value: UUID }
  entity CustomerId  { value: UUID }
  entity Money       { cents: Int; invariant: cents >= 0 }

  enum OrderStatus { pending, confirmed, shipped, delivered, cancelled }

  entity OrderLine {
    product: ProductId
    quantity: Int
    unit_price: Money
    invariant: quantity >= 1
  }

  entity Order {
    id:       OrderId
    customer: CustomerId
    lines:    set OrderLine
    status:   OrderStatus
    total:    Money
  }

  state {
    orders:   OrderId -> lone Order
    products: ProductId -> lone Product    // catalog (read-only for order ops)
    inventory: ProductId -> Int            // stock counts
  }

  operation PlaceOrder {
    input:  customer: CustomerId, lines: set OrderLine
    output: order: Order
    requires: #lines >= 1
    requires: all l in lines | l.product in products
    requires: all l in lines | inventory[l.product] >= l.quantity
    ensures:
      order.id not in pre(orders)
      order.status = OrderStatus.pending
      order.customer = customer
      order.lines = lines
      order.total.cents = sum(l in lines | l.quantity * l.unit_price.cents)
      orders'[order.id] = order
      all l in lines | inventory'[l.product] = inventory[l.product] - l.quantity
      #orders' = #orders + 1
  }

  operation ConfirmOrder {
    input:  id: OrderId
    output: order: Order
    requires: id in orders
    requires: orders[id].status = OrderStatus.pending
    ensures:
      order.status = OrderStatus.confirmed
      orders'[id].status = OrderStatus.confirmed
  }

  operation CancelOrder {
    input:  id: OrderId
    output: order: Order
    requires: id in orders
    requires: orders[id].status in {OrderStatus.pending, OrderStatus.confirmed}
    ensures:
      order.status = OrderStatus.cancelled
      orders'[id].status = OrderStatus.cancelled
      // Restore inventory
      all l in orders[id].lines | inventory'[l.product] = inventory[l.product] + l.quantity
  }

  invariant: all o in orders | orders[o].total.cents >= 0
  invariant: all o in orders | #orders[o].lines >= 1
  invariant: all p in inventory | inventory[p] >= 0
}

Generated tests

# file: tests/test_behavioral_orders.py
#
# AUTO-GENERATED from spec: OrderSystem

import pytest
from hypothesis import given, settings, assume, HealthCheck
from hypothesis import strategies as st
from tests.conftest import get_client

# ---------------------------------------------------------------------------
# Strategies
# ---------------------------------------------------------------------------

# These assume a pre-seeded product catalog with known product IDs and prices.
SEED_PRODUCTS = [
    {"id": "prod-001", "name": "Widget", "price_cents": 999},
    {"id": "prod-002", "name": "Gadget", "price_cents": 2499},
    {"id": "prod-003", "name": "Doohickey", "price_cents": 50},
]

CUSTOMER_ID = "cust-001"

order_lines = st.lists(
    st.fixed_dictionaries({
        "product": st.sampled_from([p["id"] for p in SEED_PRODUCTS]),
        "quantity": st.integers(min_value=1, max_value=5),
        "unit_price_cents": st.sampled_from([p["price_cents"] for p in SEED_PRODUCTS]),
    }),
    min_size=1,
    max_size=5,
)

# ---------------------------------------------------------------------------
# Helpers
# ---------------------------------------------------------------------------


def get_all_orders():
    resp = get_client().get("/orders")
    resp.raise_for_status()
    return resp.json()


def get_order_count():
    return len(get_all_orders())


def get_inventory(product_id: str) -> int:
    resp = get_client().get(f"/products/{product_id}/inventory")
    resp.raise_for_status()
    return resp.json()["quantity"]


def seed_catalog():
    """Ensure seed products exist with sufficient inventory."""
    client = get_client()
    for p in SEED_PRODUCTS:
        client.put(f"/products/{p['id']}", json={
            "name": p["name"],
            "price_cents": p["price_cents"],
        })
        client.put(f"/products/{p['id']}/inventory", json={"quantity": 1000})


# ---------------------------------------------------------------------------
# OPERATION: PlaceOrder
# ---------------------------------------------------------------------------
# ensures: order.id not in pre(orders)
# ensures: order.status = OrderStatus.pending
# ensures: order.customer = customer
# ensures: order.total.cents = sum(...)
# ensures: orders'[order.id] = order
# ensures: all l in lines | inventory'[l.product] = inventory[l.product] - l.quantity
# ensures: #orders' = #orders + 1


@given(lines=order_lines)
@settings(
    max_examples=100,
    suppress_health_check=[HealthCheck.too_slow, HealthCheck.function_scoped_fixture],
)
def test_place_order_returns_fresh_id(lines):
    """Ensures: order.id not in pre(orders)"""
    client = get_client()
    client.post("/test/reset")
    seed_catalog()

    existing_ids = {o["id"] for o in get_all_orders()}
    resp = client.post("/orders", json={"customer": CUSTOMER_ID, "lines": lines})
    assert resp.status_code == 201

    new_id = resp.json()["id"]
    assert new_id not in existing_ids


@given(lines=order_lines)
@settings(
    max_examples=100,
    suppress_health_check=[HealthCheck.too_slow, HealthCheck.function_scoped_fixture],
)
def test_place_order_initial_status_pending(lines):
    """Ensures: order.status = OrderStatus.pending"""
    client = get_client()
    client.post("/test/reset")
    seed_catalog()

    resp = client.post("/orders", json={"customer": CUSTOMER_ID, "lines": lines})
    assert resp.status_code == 201
    assert resp.json()["status"] == "pending"


@given(lines=order_lines)
@settings(
    max_examples=100,
    suppress_health_check=[HealthCheck.too_slow, HealthCheck.function_scoped_fixture],
)
def test_place_order_correct_total(lines):
    """Ensures: order.total.cents = sum(l in lines | l.quantity * l.unit_price.cents)"""
    client = get_client()
    client.post("/test/reset")
    seed_catalog()

    resp = client.post("/orders", json={"customer": CUSTOMER_ID, "lines": lines})
    assert resp.status_code == 201

    order = resp.json()
    expected_total = sum(l["quantity"] * l["unit_price_cents"] for l in lines)
    assert order["total_cents"] == expected_total, (
        f"Expected total {expected_total}, got {order['total_cents']}"
    )


@given(lines=order_lines)
@settings(
    max_examples=100,
    suppress_health_check=[HealthCheck.too_slow, HealthCheck.function_scoped_fixture],
)
def test_place_order_decrements_inventory(lines):
    """Ensures: all l in lines | inventory'[l.product] = inventory[l.product] - l.quantity"""
    client = get_client()
    client.post("/test/reset")
    seed_catalog()

    # Record pre-state inventory
    inventory_before = {}
    for line in lines:
        pid = line["product"]
        if pid not in inventory_before:
            inventory_before[pid] = get_inventory(pid)

    resp = client.post("/orders", json={"customer": CUSTOMER_ID, "lines": lines})
    assert resp.status_code == 201

    # Check inventory decremented correctly
    expected_decrements = {}
    for line in lines:
        pid = line["product"]
        expected_decrements[pid] = expected_decrements.get(pid, 0) + line["quantity"]

    for pid, decrement in expected_decrements.items():
        inventory_after = get_inventory(pid)
        assert inventory_after == inventory_before[pid] - decrement, (
            f"Product {pid}: expected inventory {inventory_before[pid] - decrement}, "
            f"got {inventory_after}"
        )


@given(lines=order_lines)
@settings(
    max_examples=100,
    suppress_health_check=[HealthCheck.too_slow, HealthCheck.function_scoped_fixture],
)
def test_place_order_adds_exactly_one(lines):
    """Ensures: #orders' = #orders + 1"""
    client = get_client()
    client.post("/test/reset")
    seed_catalog()

    count_before = get_order_count()
    resp = client.post("/orders", json={"customer": CUSTOMER_ID, "lines": lines})
    assert resp.status_code == 201

    count_after = get_order_count()
    assert count_after == count_before + 1


# ---------------------------------------------------------------------------
# OPERATION: CancelOrder
# ---------------------------------------------------------------------------
# ensures: order.status = OrderStatus.cancelled
# ensures: all l in orders[id].lines | inventory'[l.product] = inventory[l.product] + l.quantity


@given(lines=order_lines)
@settings(
    max_examples=100,
    suppress_health_check=[HealthCheck.too_slow, HealthCheck.function_scoped_fixture],
)
def test_cancel_order_restores_inventory(lines):
    """
    Ensures: all l in orders[id].lines |
        inventory'[l.product] = inventory[l.product] + l.quantity

    Cancelling an order must restore the inventory that was decremented.
    """
    client = get_client()
    client.post("/test/reset")
    seed_catalog()

    # Record inventory before placing order
    inventory_initial = {}
    for line in lines:
        pid = line["product"]
        if pid not in inventory_initial:
            inventory_initial[pid] = get_inventory(pid)

    # Place and then cancel
    place_resp = client.post("/orders", json={"customer": CUSTOMER_ID, "lines": lines})
    assert place_resp.status_code == 201
    order_id = place_resp.json()["id"]

    cancel_resp = client.patch(f"/orders/{order_id}/cancel")
    assert cancel_resp.status_code == 200
    assert cancel_resp.json()["status"] == "cancelled"

    # Inventory should be fully restored
    for pid, initial_qty in inventory_initial.items():
        restored_qty = get_inventory(pid)
        assert restored_qty == initial_qty, (
            f"Product {pid}: inventory after cancel is {restored_qty}, "
            f"expected restoration to {initial_qty}"
        )


# ---------------------------------------------------------------------------
# NEGATIVE TESTS
# ---------------------------------------------------------------------------


def test_place_order_rejects_empty_lines():
    """Requires: #lines >= 1"""
    client = get_client()
    client.post("/test/reset")
    seed_catalog()

    resp = client.post("/orders", json={"customer": CUSTOMER_ID, "lines": []})
    assert resp.status_code == 422


def test_confirm_rejects_non_pending():
    """Requires: orders[id].status = OrderStatus.pending"""
    client = get_client()
    client.post("/test/reset")
    seed_catalog()

    # Create and confirm an order
    lines = [{"product": "prod-001", "quantity": 1, "unit_price_cents": 999}]
    place_resp = client.post("/orders", json={"customer": CUSTOMER_ID, "lines": lines})
    order_id = place_resp.json()["id"]
    client.patch(f"/orders/{order_id}/confirm")

    # Trying to confirm again should fail
    resp = client.patch(f"/orders/{order_id}/confirm")
    assert resp.status_code in (409, 422)


def test_cancel_rejects_shipped():
    """Requires: orders[id].status in {pending, confirmed}"""
    client = get_client()
    client.post("/test/reset")
    seed_catalog()

    lines = [{"product": "prod-001", "quantity": 1, "unit_price_cents": 999}]
    place_resp = client.post("/orders", json={"customer": CUSTOMER_ID, "lines": lines})
    order_id = place_resp.json()["id"]
    client.patch(f"/orders/{order_id}/confirm")
    client.patch(f"/orders/{order_id}/ship")  # transition to shipped

    # Cannot cancel a shipped order
    resp = client.patch(f"/orders/{order_id}/cancel")
    assert resp.status_code in (409, 422)

    # Verify status was NOT changed
    get_resp = client.get(f"/orders/{order_id}")
    assert get_resp.json()["status"] == "shipped"


# ---------------------------------------------------------------------------
# INVARIANT CHECKS
# ---------------------------------------------------------------------------


def check_order_invariants():
    """
    invariant: all o in orders | orders[o].total.cents >= 0
    invariant: all o in orders | #orders[o].lines >= 1
    invariant: all p in inventory | inventory[p] >= 0
    """
    client = get_client()
    orders = get_all_orders()

    for order in orders:
        assert order["total_cents"] >= 0, (
            f"Order {order['id']} has negative total: {order['total_cents']}"
        )
        assert len(order["lines"]) >= 1, (
            f"Order {order['id']} has no lines"
        )

    for p in SEED_PRODUCTS:
        inv = get_inventory(p["id"])
        assert inv >= 0, f"Product {p['id']} has negative inventory: {inv}"


@given(lines=order_lines)
@settings(
    max_examples=50,
    suppress_health_check=[HealthCheck.too_slow, HealthCheck.function_scoped_fixture],
)
def test_invariants_hold_after_place_order(lines):
    client = get_client()
    client.post("/test/reset")
    seed_catalog()
    client.post("/orders", json={"customer": CUSTOMER_ID, "lines": lines})
    check_order_invariants()

4. Hypothesis stateful testing (state machine layer)

The stateful test layer builds a RuleBasedStateMachine that mirrors the spec's abstract state. Each spec operation becomes a rule. Each requires clause becomes a @precondition. Each ensures clause becomes an assertion inside the rule body. Each invariant becomes a @invariant-decorated method.

4.1 Mapping from spec to state machine

Spec Element	StateMachine Element	Purpose
`state { store: K -> V }`	`self.store: dict[K, V] = {}`	Model state mirrors spec state
`operation Foo { output: x }`	`@rule(target=xs)` returning x	Track created resources in Bundles
`operation Foo { input: x }`	`@rule(x=xs)` drawing from Bundle	Consume previously created resources
`requires: condition`	`@precondition(lambda self: condition)`	Guard rule applicability
`ensures: assertion`	`assert assertion` inside rule body	Check postconditions
`invariant: property`	`@invariant() def check(): assert property`	Check after every rule

4.2 URL shortener, complete state machine

# file: tests/test_stateful_url_shortener.py
#
# AUTO-GENERATED from spec: UrlShortener

import random
from hypothesis import settings, HealthCheck
from hypothesis import strategies as st
from hypothesis.stateful import (
    RuleBasedStateMachine,
    Bundle,
    rule,
    invariant,
    precondition,
    initialize,
)
from urllib.parse import urlparse
import httpx

BASE_URL = "http://localhost:8000"
client = httpx.Client(base_url=BASE_URL, timeout=10.0)

# Strategies derived from entity definitions
valid_urls = st.sampled_from([
    "https://example.com",
    "https://example.com/path/to/page",
    "https://subdomain.example.org/page?q=1&r=2",
    "http://example.net/a/b/c",
]) | st.from_regex(
    r"https?://[a-z]{3,12}\.[a-z]{2,6}(/[a-z0-9]{1,20}){0,5}",
    fullmatch=True,
)


def is_valid_uri(s: str) -> bool:
    try:
        result = urlparse(s)
        return all([result.scheme in ("http", "https"), result.netloc])
    except Exception:
        return False


class UrlShortenerStateMachine(RuleBasedStateMachine):
    """
    State machine that mirrors the spec's abstract state.
    The model state (self.store) tracks what the service should contain.
    Every rule executes a real HTTP request and verifies the response against
    both the model and the spec's ensures clauses.
    """

    # -----------------------------------------------------------------------
    # Model state (mirrors spec: state { store: ShortCode -> lone LongURL })
    # -----------------------------------------------------------------------
    def __init__(self):
        super().__init__()
        self.store: dict[str, str] = {}

    # -----------------------------------------------------------------------
    # Bundles for tracking created resources
    # -----------------------------------------------------------------------
    codes = Bundle("codes")

    # -----------------------------------------------------------------------
    # Initialize: reset the service state before each test case
    # -----------------------------------------------------------------------
    @initialize()
    def reset_service(self):
        resp = client.post("/test/reset")
        assert resp.status_code == 200
        self.store = {}

    # -----------------------------------------------------------------------
    # OPERATION: Shorten
    # Spec: requires isValidURI(url.value)
    #       ensures code not in pre(store)
    #       ensures store'[code] = url
    #       ensures #store' = #store + 1
    # -----------------------------------------------------------------------
    @rule(target=codes, url=valid_urls)
    def shorten(self, url):
        response = client.post("/shorten", json={"url": url})
        assert response.status_code == 201, (
            f"Shorten failed with {response.status_code}: {response.text}"
        )

        data = response.json()
        code = data["code"]

        # ensures: code not in pre(store)
        assert code not in self.store, (
            f"Code '{code}' already existed in model store"
        )

        # ensures: ShortCode entity invariants
        assert 6 <= len(code) <= 10, f"Code length {len(code)} outside [6, 10]"
        assert code.isalnum(), f"Code '{code}' is not alphanumeric"

        # ensures: short_url = base_url + "/" + code.value
        short_url = data.get("short_url", "")
        assert short_url.endswith(f"/{code}"), (
            f"short_url '{short_url}' does not end with code '{code}'"
        )

        # Update model state: store'[code] = url
        self.store[code] = url

        # ensures: #store' = #store + 1
        # (implicitly true since we added exactly one key that wasn't there)

        return code

    # -----------------------------------------------------------------------
    # OPERATION: Resolve
    # Spec: requires code in store
    #       ensures url = store[code]
    #       ensures store' = store
    # -----------------------------------------------------------------------
    @rule(code=codes)
    def resolve(self, code):
        # Only resolve codes that are still in the store
        if code not in self.store:
            # Code was deleted; expect 404
            response = client.get(f"/{code}", follow_redirects=False)
            assert response.status_code == 404, (
                f"Expected 404 for deleted code '{code}', got {response.status_code}"
            )
            return

        count_before = len(self.store)
        response = client.get(f"/{code}", follow_redirects=False)

        # ensures: url = store[code]
        assert response.status_code == 302, (
            f"Expected 302 for existing code '{code}', got {response.status_code}"
        )
        assert response.headers["Location"] == self.store[code], (
            f"Expected Location '{self.store[code]}', "
            f"got '{response.headers.get('Location')}'"
        )

        # ensures: store' = store (no mutation)
        assert len(self.store) == count_before

    # -----------------------------------------------------------------------
    # OPERATION: Delete
    # Spec: requires code in store
    #       ensures code not in store'
    #       ensures #store' = #store - 1
    # -----------------------------------------------------------------------
    @rule(code=codes)
    def delete(self, code):
        if code not in self.store:
            # Already deleted; expect 404
            response = client.delete(f"/{code}")
            assert response.status_code == 404
            return

        count_before = len(self.store)
        response = client.delete(f"/{code}")
        assert response.status_code == 204, (
            f"Expected 204 for delete of '{code}', got {response.status_code}"
        )

        # Update model: ensures code not in store'
        del self.store[code]

        # ensures: #store' = #store - 1
        assert len(self.store) == count_before - 1

    # -----------------------------------------------------------------------
    # NEGATIVE: Resolve nonexistent code
    # Tests: requires code in store -> 404 when violated
    # -----------------------------------------------------------------------
    @rule(code=st.text(
        alphabet=st.characters(whitelist_categories=("L", "N")),
        min_size=6,
        max_size=10,
    ))
    def resolve_nonexistent(self, code):
        if code in self.store:
            return  # Skip; this code happens to exist
        response = client.get(f"/{code}", follow_redirects=False)
        assert response.status_code == 404, (
            f"Expected 404 for nonexistent code '{code}', got {response.status_code}"
        )

    # -----------------------------------------------------------------------
    # INVARIANT: all c in store | isValidURI(store[c].value)
    # Checked after EVERY rule execution
    # -----------------------------------------------------------------------
    @invariant()
    def all_stored_urls_are_valid(self):
        for code, url in self.store.items():
            assert is_valid_uri(url), (
                f"Invariant violated: store[{code}] = '{url}' is not a valid URI"
            )

    # -----------------------------------------------------------------------
    # INVARIANT: model/service state consistency
    # The model store and the actual service store must agree
    # -----------------------------------------------------------------------
    @invariant()
    def model_matches_service(self):
        resp = client.get("/codes")
        if resp.status_code == 200:
            service_codes = set(resp.json())
            model_codes = set(self.store.keys())
            assert service_codes == model_codes, (
                f"Model/service mismatch. "
                f"Model has {model_codes - service_codes} extra, "
                f"service has {service_codes - model_codes} extra"
            )


# ---------------------------------------------------------------------------
# Test case generation
# ---------------------------------------------------------------------------

TestUrlShortener = UrlShortenerStateMachine.TestCase
TestUrlShortener.settings = settings(
    max_examples=200,
    stateful_step_count=15,
    suppress_health_check=[HealthCheck.too_slow],
)

4.3 Todo list with state transitions, complete state machine

# file: tests/test_stateful_todo.py
#
# AUTO-GENERATED from spec: TodoList

from hypothesis import settings, HealthCheck
from hypothesis import strategies as st
from hypothesis.stateful import (
    RuleBasedStateMachine,
    Bundle,
    rule,
    invariant,
    precondition,
    initialize,
    multiple,
)
import httpx

BASE_URL = "http://localhost:8000"
client = httpx.Client(base_url=BASE_URL, timeout=10.0)

valid_titles = st.text(
    alphabet=st.characters(whitelist_categories=("L", "N", "P", "Z")),
    min_size=1,
    max_size=200,
).filter(lambda t: len(t.strip()) >= 1)


class TodoItem:
    """Model representation of a todo item, mirrors the spec entity."""
    def __init__(self, id: str, title: str, status: str = "todo"):
        self.id = id
        self.title = title
        self.status = status


class TodoListStateMachine(RuleBasedStateMachine):
    """
    State machine that mirrors the spec's abstract state and enforces
    the state transition constraints:
        todo -> in_progress -> done
    """

    def __init__(self):
        super().__init__()
        self.items: dict[str, TodoItem] = {}

    # Bundles for tracking resources by state
    todo_ids = Bundle("todo_ids")
    in_progress_ids = Bundle("in_progress_ids")
    done_ids = Bundle("done_ids")
    all_ids = Bundle("all_ids")

    @initialize()
    def reset_service(self):
        resp = client.post("/test/reset")
        assert resp.status_code == 200
        self.items = {}

    # -------------------------------------------------------------------
    # OPERATION: CreateTodo
    # ensures: item.id not in pre(items)
    # ensures: item.status = TodoStatus.todo
    # ensures: item.title = title
    # ensures: #items' = #items + 1
    # -------------------------------------------------------------------
    @rule(target=todo_ids, title=valid_titles)
    def create_todo(self, title):
        count_before = len(self.items)

        response = client.post("/todos", json={"title": title})
        assert response.status_code == 201

        data = response.json()
        todo_id = data["id"]

        # ensures: item.id not in pre(items)
        assert todo_id not in self.items

        # ensures: item.status = TodoStatus.todo
        assert data["status"] == "todo"

        # ensures: item.title = title
        assert data["title"] == title

        # Update model
        self.items[todo_id] = TodoItem(id=todo_id, title=title, status="todo")

        # ensures: #items' = #items + 1
        assert len(self.items) == count_before + 1

        return todo_id

    # -------------------------------------------------------------------
    # OPERATION: StartTodo
    # requires: id in items
    # requires: items[id].status = TodoStatus.todo
    # ensures: items'[id].status = TodoStatus.in_progress
    # ensures: #items' = #items
    # -------------------------------------------------------------------
    @rule(target=in_progress_ids, todo_id=todo_ids)
    def start_todo(self, todo_id):
        if todo_id not in self.items:
            return multiple()  # Item was deleted, skip
        if self.items[todo_id].status != "todo":
            return multiple()  # Wrong state, skip

        count_before = len(self.items)

        response = client.patch(f"/todos/{todo_id}/start")
        assert response.status_code == 200

        data = response.json()

        # ensures: items'[id].status = TodoStatus.in_progress
        assert data["status"] == "in_progress"

        # Update model
        self.items[todo_id].status = "in_progress"

        # ensures: #items' = #items
        assert len(self.items) == count_before

        return todo_id

    # -------------------------------------------------------------------
    # OPERATION: CompleteTodo
    # requires: id in items
    # requires: items[id].status = TodoStatus.in_progress
    # ensures: items'[id].status = TodoStatus.done
    # ensures: #items' = #items
    # -------------------------------------------------------------------
    @rule(target=done_ids, ip_id=in_progress_ids)
    def complete_todo(self, ip_id):
        if ip_id not in self.items:
            return multiple()
        if self.items[ip_id].status != "in_progress":
            return multiple()

        count_before = len(self.items)

        response = client.patch(f"/todos/{ip_id}/complete")
        assert response.status_code == 200

        data = response.json()

        # ensures: items'[id].status = TodoStatus.done
        assert data["status"] == "done"

        # Update model
        self.items[ip_id].status = "done"

        # ensures: #items' = #items
        assert len(self.items) == count_before

        return ip_id

    # -------------------------------------------------------------------
    # OPERATION: DeleteTodo
    # requires: id in items
    # ensures: id not in items'
    # ensures: #items' = #items - 1
    # -------------------------------------------------------------------
    @rule(todo_id=todo_ids)
    def delete_todo(self, todo_id):
        if todo_id not in self.items:
            return  # Already deleted

        count_before = len(self.items)

        response = client.delete(f"/todos/{todo_id}")
        assert response.status_code == 204

        # ensures: id not in items'
        del self.items[todo_id]

        # ensures: #items' = #items - 1
        assert len(self.items) == count_before - 1

    # -------------------------------------------------------------------
    # NEGATIVE: Invalid state transitions
    # -------------------------------------------------------------------
    @rule(ip_id=in_progress_ids)
    def start_in_progress_fails(self, ip_id):
        """Cannot start an already in-progress item."""
        if ip_id not in self.items or self.items[ip_id].status != "in_progress":
            return
        response = client.patch(f"/todos/{ip_id}/start")
        assert response.status_code in (409, 422)

    @rule(todo_id=todo_ids)
    def complete_todo_status_fails(self, todo_id):
        """Cannot complete an item that is still in 'todo' status."""
        if todo_id not in self.items or self.items[todo_id].status != "todo":
            return
        response = client.patch(f"/todos/{todo_id}/complete")
        assert response.status_code in (409, 422)

    @rule(done_id=done_ids)
    def start_done_fails(self, done_id):
        """Cannot start a done item (no backward transitions)."""
        if done_id not in self.items or self.items[done_id].status != "done":
            return
        response = client.patch(f"/todos/{done_id}/start")
        assert response.status_code in (409, 422)

    # -------------------------------------------------------------------
    # INVARIANTS
    # -------------------------------------------------------------------
    @invariant()
    def all_titles_non_empty(self):
        """invariant: all t in items | len(items[t].title.value) >= 1"""
        for item in self.items.values():
            assert len(item.title) >= 1

    @invariant()
    def model_matches_service(self):
        """The model and the service must agree on the set of items and their states."""
        resp = client.get("/todos")
        if resp.status_code != 200:
            return
        service_items = {item["id"]: item["status"] for item in resp.json()}
        model_items = {k: v.status for k, v in self.items.items()}
        assert service_items == model_items, (
            f"Model/service mismatch.\n"
            f"  Model:   {model_items}\n"
            f"  Service: {service_items}"
        )

    @invariant()
    def valid_status_values(self):
        """All items have a status that is one of the enum values."""
        for item in self.items.values():
            assert item.status in ("todo", "in_progress", "done"), (
                f"Item {item.id} has invalid status '{item.status}'"
            )


TestTodoList = TodoListStateMachine.TestCase
TestTodoList.settings = settings(
    max_examples=200,
    stateful_step_count=20,
    suppress_health_check=[HealthCheck.too_slow],
)

4.4 E-commerce order lifecycle, complete state machine

# file: tests/test_stateful_orders.py
#
# AUTO-GENERATED from spec: OrderSystem

from hypothesis import settings, HealthCheck
from hypothesis import strategies as st
from hypothesis.stateful import (
    RuleBasedStateMachine,
    Bundle,
    rule,
    invariant,
    precondition,
    initialize,
    multiple,
)
import httpx

BASE_URL = "http://localhost:8000"
client = httpx.Client(base_url=BASE_URL, timeout=10.0)

SEED_PRODUCTS = [
    {"id": "prod-001", "name": "Widget", "price_cents": 999},
    {"id": "prod-002", "name": "Gadget", "price_cents": 2499},
    {"id": "prod-003", "name": "Doohickey", "price_cents": 50},
]
CUSTOMER_ID = "cust-001"

order_lines = st.lists(
    st.fixed_dictionaries({
        "product": st.sampled_from([p["id"] for p in SEED_PRODUCTS]),
        "quantity": st.integers(min_value=1, max_value=3),
        "unit_price_cents": st.sampled_from([p["price_cents"] for p in SEED_PRODUCTS]),
    }),
    min_size=1,
    max_size=3,
)


class OrderModel:
    """Model of an order, mirrors the spec entity."""
    def __init__(self, id: str, customer: str, lines: list, status: str, total_cents: int):
        self.id = id
        self.customer = customer
        self.lines = lines
        self.status = status
        self.total_cents = total_cents


class OrderSystemStateMachine(RuleBasedStateMachine):
    """
    State machine for e-commerce order lifecycle.
    Tracks orders and inventory, verifies all spec ensures and invariants.

    State transitions:
        pending -> confirmed -> shipped -> delivered
        pending -> cancelled
        confirmed -> cancelled
    """

    def __init__(self):
        super().__init__()
        self.orders: dict[str, OrderModel] = {}
        self.inventory: dict[str, int] = {}

    # Bundles for tracking orders by status
    pending_ids = Bundle("pending_ids")
    confirmed_ids = Bundle("confirmed_ids")
    shipped_ids = Bundle("shipped_ids")

    @initialize()
    def reset_and_seed(self):
        resp = client.post("/test/reset")
        assert resp.status_code == 200
        self.orders = {}
        self.inventory = {}

        # Seed product catalog
        for p in SEED_PRODUCTS:
            client.put(f"/products/{p['id']}", json={
                "name": p["name"],
                "price_cents": p["price_cents"],
            })
            client.put(f"/products/{p['id']}/inventory", json={"quantity": 100})
            self.inventory[p["id"]] = 100

    # -------------------------------------------------------------------
    # OPERATION: PlaceOrder
    # -------------------------------------------------------------------
    @rule(target=pending_ids, lines=order_lines)
    def place_order(self, lines):
        # Check precondition: sufficient inventory
        needed = {}
        for line in lines:
            needed[line["product"]] = needed.get(line["product"], 0) + line["quantity"]
        for pid, qty in needed.items():
            if self.inventory.get(pid, 0) < qty:
                # Insufficient inventory; expect rejection
                resp = client.post("/orders", json={
                    "customer": CUSTOMER_ID, "lines": lines,
                })
                assert resp.status_code in (409, 422)
                return multiple()

        count_before = len(self.orders)

        resp = client.post("/orders", json={
            "customer": CUSTOMER_ID, "lines": lines,
        })
        assert resp.status_code == 201

        data = resp.json()
        order_id = data["id"]

        # ensures: order.id not in pre(orders)
        assert order_id not in self.orders

        # ensures: order.status = OrderStatus.pending
        assert data["status"] == "pending"

        # ensures: order.total.cents = sum(...)
        expected_total = sum(l["quantity"] * l["unit_price_cents"] for l in lines)
        assert data["total_cents"] == expected_total

        # Update model: store order
        self.orders[order_id] = OrderModel(
            id=order_id,
            customer=CUSTOMER_ID,
            lines=lines,
            status="pending",
            total_cents=expected_total,
        )

        # Update model: decrement inventory
        for line in lines:
            self.inventory[line["product"]] -= line["quantity"]

        # ensures: #orders' = #orders + 1
        assert len(self.orders) == count_before + 1

        return order_id

    # -------------------------------------------------------------------
    # OPERATION: ConfirmOrder
    # requires: orders[id].status = OrderStatus.pending
    # ensures: orders'[id].status = OrderStatus.confirmed
    # -------------------------------------------------------------------
    @rule(target=confirmed_ids, order_id=pending_ids)
    def confirm_order(self, order_id):
        if order_id not in self.orders:
            return multiple()
        if self.orders[order_id].status != "pending":
            return multiple()

        resp = client.patch(f"/orders/{order_id}/confirm")
        assert resp.status_code == 200
        assert resp.json()["status"] == "confirmed"

        self.orders[order_id].status = "confirmed"
        return order_id

    # -------------------------------------------------------------------
    # OPERATION: ShipOrder (implied by spec state machine)
    # requires: orders[id].status = OrderStatus.confirmed
    # ensures: orders'[id].status = OrderStatus.shipped
    # -------------------------------------------------------------------
    @rule(target=shipped_ids, order_id=confirmed_ids)
    def ship_order(self, order_id):
        if order_id not in self.orders:
            return multiple()
        if self.orders[order_id].status != "confirmed":
            return multiple()

        resp = client.patch(f"/orders/{order_id}/ship")
        assert resp.status_code == 200
        assert resp.json()["status"] == "shipped"

        self.orders[order_id].status = "shipped"
        return order_id

    # -------------------------------------------------------------------
    # OPERATION: CancelOrder
    # requires: orders[id].status in {pending, confirmed}
    # ensures: orders'[id].status = OrderStatus.cancelled
    # ensures: inventory restored
    # -------------------------------------------------------------------
    @rule(order_id=pending_ids)
    def cancel_pending_order(self, order_id):
        if order_id not in self.orders:
            return
        if self.orders[order_id].status != "pending":
            return

        resp = client.patch(f"/orders/{order_id}/cancel")
        assert resp.status_code == 200
        assert resp.json()["status"] == "cancelled"

        # Restore inventory in model
        for line in self.orders[order_id].lines:
            self.inventory[line["product"]] += line["quantity"]

        self.orders[order_id].status = "cancelled"

    @rule(order_id=confirmed_ids)
    def cancel_confirmed_order(self, order_id):
        if order_id not in self.orders:
            return
        if self.orders[order_id].status != "confirmed":
            return

        resp = client.patch(f"/orders/{order_id}/cancel")
        assert resp.status_code == 200
        assert resp.json()["status"] == "cancelled"

        # Restore inventory in model
        for line in self.orders[order_id].lines:
            self.inventory[line["product"]] += line["quantity"]

        self.orders[order_id].status = "cancelled"

    # -------------------------------------------------------------------
    # NEGATIVE: Cannot cancel shipped orders
    # -------------------------------------------------------------------
    @rule(order_id=shipped_ids)
    def cancel_shipped_fails(self, order_id):
        if order_id not in self.orders:
            return
        if self.orders[order_id].status != "shipped":
            return
        resp = client.patch(f"/orders/{order_id}/cancel")
        assert resp.status_code in (409, 422)
        # Verify status unchanged
        assert self.orders[order_id].status == "shipped"

    # -------------------------------------------------------------------
    # INVARIANTS
    # -------------------------------------------------------------------
    @invariant()
    def all_totals_non_negative(self):
        """invariant: all o in orders | orders[o].total.cents >= 0"""
        for order in self.orders.values():
            assert order.total_cents >= 0, (
                f"Order {order.id} has negative total: {order.total_cents}"
            )

    @invariant()
    def all_orders_have_lines(self):
        """invariant: all o in orders | #orders[o].lines >= 1"""
        for order in self.orders.values():
            assert len(order.lines) >= 1, (
                f"Order {order.id} has no lines"
            )

    @invariant()
    def inventory_non_negative(self):
        """invariant: all p in inventory | inventory[p] >= 0"""
        for pid, qty in self.inventory.items():
            assert qty >= 0, f"Product {pid} has negative inventory: {qty}"

    @invariant()
    def model_matches_service(self):
        """The service's order statuses must match the model."""
        resp = client.get("/orders")
        if resp.status_code != 200:
            return
        service_orders = {o["id"]: o["status"] for o in resp.json()}
        model_orders = {k: v.status for k, v in self.orders.items()}
        assert service_orders == model_orders, (
            f"Model/service order status mismatch.\n"
            f"  Model:   {model_orders}\n"
            f"  Service: {service_orders}"
        )


TestOrderSystem = OrderSystemStateMachine.TestCase
TestOrderSystem.settings = settings(
    max_examples=150,
    stateful_step_count=25,
    suppress_health_check=[HealthCheck.too_slow],
)

5. Custom Hypothesis strategies from spec types

Every entity in the spec with invariants generates a corresponding Hypothesis strategy that only produces values satisfying those invariants. This ensures that Hypothesis never wastes time on inputs the spec already declares invalid.

5.1 Strategy generation rules

Entity Invariant Pattern	Generated Strategy
`len(value) >= A and len(value) <= B`	`st.text(min_size=A, max_size=B)`
`value matches /^REGEX$/`	`st.from_regex(r"REGEX", fullmatch=True)`
`value >= A and value <= B` (Int)	`st.integers(min_value=A, max_value=B)`
`value >= A and value <= B` (Float)	`st.floats(min_value=A, max_value=B)`
`isValidURI(value)`	Custom URI strategy
`valid_email(value)`	Custom email strategy
`value in {a, b, c}` (enum)	`st.sampled_from([a, b, c])`

5.2 Shortcode (string with length and regex constraints)

entity ShortCode {
  value: String
  invariant: len(value) >= 6 and len(value) <= 10
  invariant: value matches /^[a-zA-Z0-9]+$/
}

# file: tests/strategies.py (auto-generated)

from hypothesis import strategies as st


@st.composite
def short_codes(draw):
    """
    Strategy for ShortCode entity.
    Derived from:
      invariant: len(value) >= 6 and len(value) <= 10
      invariant: value matches /^[a-zA-Z0-9]+$/
    """
    length = draw(st.integers(min_value=6, max_value=10))
    return draw(st.text(
        alphabet=st.characters(
            whitelist_categories=("Lu", "Ll", "Nd"),  # uppercase, lowercase, digits
        ),
        min_size=length,
        max_size=length,
    ))

5.3 Email addresses

entity Email {
  value: String
  invariant: value matches /^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$/
  invariant: len(value) <= 254
}

@st.composite
def emails(draw):
    """
    Strategy for Email entity.
    Generates RFC-compliant email addresses within the spec's constraints.
    """
    local_length = draw(st.integers(min_value=1, max_value=64))
    local_part = draw(st.text(
        alphabet=st.sampled_from(
            "abcdefghijklmnopqrstuvwxyz"
            "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
            "0123456789._%+-"
        ),
        min_size=local_length,
        max_size=local_length,
    ).filter(lambda s: not s.startswith(".") and not s.endswith(".")))

    domain_label = draw(st.text(
        alphabet=st.characters(whitelist_categories=("Ll", "Nd")),
        min_size=1,
        max_size=20,
    ))
    tld = draw(st.text(
        alphabet=st.characters(whitelist_categories=("Ll",)),
        min_size=2,
        max_size=6,
    ))

    email = f"{local_part}@{domain_label}.{tld}"
    # Enforce total length constraint
    if len(email) > 254:
        email = email[:254]
    return email

5.4 Urls (isvaliduri predicate)

entity LongURL {
  value: String
  invariant: isValidURI(value)
}

@st.composite
def valid_uris(draw):
    """
    Strategy for LongURL entity.
    Generates valid HTTP/HTTPS URIs.
    """
    scheme = draw(st.sampled_from(["http", "https"]))
    # Domain: 1-3 labels, each 3-12 lowercase chars
    num_labels = draw(st.integers(min_value=1, max_value=3))
    labels = [
        draw(st.text(
            alphabet=st.characters(whitelist_categories=("Ll",)),
            min_size=3,
            max_size=12,
        ))
        for _ in range(num_labels)
    ]
    tld = draw(st.sampled_from(["com", "org", "net", "io", "dev"]))
    domain = ".".join(labels) + "." + tld

    # Optional path segments
    num_segments = draw(st.integers(min_value=0, max_value=5))
    path = ""
    if num_segments > 0:
        segments = [
            draw(st.text(
                alphabet=st.characters(whitelist_categories=("Ll", "Nd")),
                min_size=1,
                max_size=20,
            ))
            for _ in range(num_segments)
        ]
        path = "/" + "/".join(segments)

    # Optional query string
    has_query = draw(st.booleans())
    query = ""
    if has_query:
        key = draw(st.text(
            alphabet=st.characters(whitelist_categories=("Ll",)),
            min_size=1,
            max_size=10,
        ))
        val = draw(st.text(
            alphabet=st.characters(whitelist_categories=("Ll", "Nd")),
            min_size=1,
            max_size=10,
        ))
        query = f"?{key}={val}"

    return f"{scheme}://{domain}{path}{query}"

5.5 Monetary amounts

entity Money {
  cents: Int
  invariant: cents >= 0
}

@st.composite
def money_amounts(draw):
    """
    Strategy for Money entity.
    Generates non-negative cent amounts. Biased towards realistic values
    while still exploring the boundary.
    """
    cents = draw(st.one_of(
        # Common prices: $0.01 - $999.99
        st.integers(min_value=1, max_value=99999),
        # Boundary: exactly $0.00
        st.just(0),
        # Large amounts: up to $1,000,000.00
        st.integers(min_value=100000, max_value=100000000),
    ))
    return cents

5.6 Date ranges

entity DateRange {
  start: Date
  end: Date
  invariant: start <= end
}

import datetime

@st.composite
def date_ranges(draw):
    """
    Strategy for DateRange entity.
    Generates (start, end) pairs where start <= end.
    """
    start = draw(st.dates(
        min_value=datetime.date(2020, 1, 1),
        max_value=datetime.date(2030, 12, 31),
    ))
    # End is at least start, at most 365 days later
    max_delta = draw(st.integers(min_value=0, max_value=365))
    end = start + datetime.timedelta(days=max_delta)
    return {"start": start.isoformat(), "end": end.isoformat()}

5.7 Enum / state machine states

enum OrderStatus { pending, confirmed, shipped, delivered, cancelled }

order_statuses = st.sampled_from(["pending", "confirmed", "shipped", "delivered", "cancelled"])

5.8 Nested objects

entity Address {
  street: String; invariant: len(street) >= 1
  city:   String; invariant: len(city) >= 1
  zip:    String; invariant: zip matches /^[0-9]{5}$/
  country: String; invariant: len(country) == 2
}

@st.composite
def addresses(draw):
    """Strategy for Address entity with nested constraints."""
    street = draw(st.text(
        alphabet=st.characters(whitelist_categories=("L", "N", "Z")),
        min_size=1,
        max_size=100,
    ))
    city = draw(st.text(
        alphabet=st.characters(whitelist_categories=("L",)),
        min_size=1,
        max_size=50,
    ))
    zip_code = draw(st.from_regex(r"[0-9]{5}", fullmatch=True))
    country = draw(st.text(
        alphabet=st.characters(whitelist_categories=("Lu",)),
        min_size=2,
        max_size=2,
    ))
    return {
        "street": street,
        "city": city,
        "zip": zip_code,
        "country": country,
    }

5.9 Relations with referential integrity

entity Order {
  customer: CustomerId        // foreign key
  lines: set OrderLine        // one-to-many
}

When generating an Order strategy, the compiler must ensure that customer refers to an existing customer. This is handled via Bundles in the stateful layer (the customer must be created first), or via a seeded database in the property layer:

@st.composite
def order_inputs(draw, existing_customer_ids: list[str], existing_product_ids: list[str]):
    """
    Strategy for Order creation input.
    Takes lists of existing IDs to ensure referential integrity.
    """
    customer = draw(st.sampled_from(existing_customer_ids))
    num_lines = draw(st.integers(min_value=1, max_value=10))
    lines = []
    for _ in range(num_lines):
        lines.append({
            "product": draw(st.sampled_from(existing_product_ids)),
            "quantity": draw(st.integers(min_value=1, max_value=20)),
        })
    return {"customer": customer, "lines": lines}

6. TLA+ trace validation (advanced)

For services with temporal properties, state machines, eventual consistency, or complex multi-entity invariants that span multiple operations, we generate TLA+ models and validate execution traces against them.

6.1 When TLA+ trace validation is warranted

Situation	Use TLA+?	Rationale
Simple CRUD with entity invariants	No	Hypothesis property tests suffice
State machine with simple transitions	No	Hypothesis stateful tests cover this
Multi-entity transactions (orders + inventory)	Maybe	If cancellation/rollback semantics matter
Concurrent operations on shared state	Yes	Need to verify linearizability
Eventual consistency / async workflows	Yes	Temporal properties cannot be checked synchronously
Saga / choreography patterns	Yes	Multi-step distributed transactions

6.2 The approach: Constrained model checking (kuppe et al., 2024)

The key insight from the MongoDB/TLA+ community is that trace validation can be reduced to constrained model checking:

Instrument the running service to emit a trace of state transitions (operation name, pre-state, post-state, timestamp).
Express the trace as a constrained TLA+ specification that fixes the sequence of actions to match the observed trace.
Use TLC (the TLA+ model checker) to verify that the constrained trace is a valid behavior of the unconstrained specification.
If TLC finds no valid assignment, the implementation violated the spec.

6.3 Concrete example: Order state machine

Step 1: The spec-generated TLA+ model

---------------------------- MODULE OrderStateMachine ----------------------------
EXTENDS Naturals, Sequences, FiniteSets

CONSTANTS OrderIds, ProductIds, MaxQuantity

VARIABLES orders, inventory

vars == <<orders, inventory>>

TypeOK ==
    /\ orders \in [OrderIds -> [status: {"pending", "confirmed", "shipped",
                                          "delivered", "cancelled"},
                                 lines: SUBSET (ProductIds \X (1..MaxQuantity)),
                                 total: Nat]]
    /\ inventory \in [ProductIds -> Nat]

Init ==
    /\ orders = [o \in {} |-> <<>>]
    /\ inventory \in [ProductIds -> 1..100]

\* ----- PlaceOrder -----
PlaceOrder(oid, lines) ==
    /\ oid \notin DOMAIN orders
    /\ lines /= {}
    /\ \A <<pid, qty>> \in lines : pid \in ProductIds /\ inventory[pid] >= qty
    /\ orders' = [orders EXCEPT ![oid] =
                    [status |-> "pending",
                     lines |-> lines,
                     total |-> SumLines(lines)]]
    /\ inventory' = [p \in ProductIds |->
                      inventory[p] - SumForProduct(p, lines)]

\* ----- ConfirmOrder -----
ConfirmOrder(oid) ==
    /\ oid \in DOMAIN orders
    /\ orders[oid].status = "pending"
    /\ orders' = [orders EXCEPT ![oid].status = "confirmed"]
    /\ UNCHANGED inventory

\* ----- ShipOrder -----
ShipOrder(oid) ==
    /\ oid \in DOMAIN orders
    /\ orders[oid].status = "confirmed"
    /\ orders' = [orders EXCEPT ![oid].status = "shipped"]
    /\ UNCHANGED inventory

\* ----- CancelOrder -----
CancelOrder(oid) ==
    /\ oid \in DOMAIN orders
    /\ orders[oid].status \in {"pending", "confirmed"}
    /\ orders' = [orders EXCEPT ![oid].status = "cancelled"]
    /\ inventory' = [p \in ProductIds |->
                      inventory[p] + SumForProduct(p, orders[oid].lines)]

\* ----- Next state -----
Next ==
    \/ \E oid \in OrderIds, lines \in SUBSET (ProductIds \X (1..MaxQuantity)) :
        PlaceOrder(oid, lines)
    \/ \E oid \in OrderIds : ConfirmOrder(oid)
    \/ \E oid \in OrderIds : ShipOrder(oid)
    \/ \E oid \in OrderIds : CancelOrder(oid)

\* ----- Invariants -----
InventoryNonNeg == \A p \in ProductIds : inventory[p] >= 0
TotalNonNeg == \A oid \in DOMAIN orders : orders[oid].total >= 0
NoEmptyOrders == \A oid \in DOMAIN orders : orders[oid].lines /= {}

\* ----- Temporal properties -----
Spec == Init /\ [][Next]_vars

\* An order that is shipped must eventually be delivered or remain shipped.
\* (This is a liveness property that cannot be checked by Hypothesis.)
ShippedEventuallyDelivered ==
    \A oid \in DOMAIN orders :
        orders[oid].status = "shipped" ~>
            orders[oid].status \in {"shipped", "delivered"}

=============================================================================

Step 2: Service instrumentation

The generated service includes a middleware that emits trace events:

# file: app/middleware/trace_logger.py (auto-generated)

import json
import time
from pathlib import Path

TRACE_FILE = Path("/tmp/service_trace.ndjson")


class TraceMiddleware:
    """
    Emits a trace event for every state-mutating operation.
    Used for TLA+ trace validation in exhaustive test mode.
    """

    def __init__(self, app):
        self.app = app

    async def __call__(self, scope, receive, send):
        if scope["type"] != "http":
            await self.app(scope, receive, send)
            return

        # Capture pre-state
        pre_state = await self._snapshot_state()

        # Process request
        response = await self.app(scope, receive, send)

        # Capture post-state
        post_state = await self._snapshot_state()

        # Emit trace event
        event = {
            "timestamp": time.time_ns(),
            "method": scope.get("method", ""),
            "path": scope.get("path", ""),
            "pre_orders": pre_state["orders"],
            "pre_inventory": pre_state["inventory"],
            "post_orders": post_state["orders"],
            "post_inventory": post_state["inventory"],
        }
        with TRACE_FILE.open("a") as f:
            f.write(json.dumps(event) + "\n")

        return response

    async def _snapshot_state(self):
        # Implementation depends on the storage backend
        # Returns {"orders": {...}, "inventory": {...}}
        pass

Step 3: Trace-to-TLA+ conversion

# file: tests/tla_trace_validator.py (auto-generated)

import json
import subprocess
from pathlib import Path


def convert_trace_to_tla(trace_file: Path, output_file: Path):
    """
    Convert an NDJSON trace file into a TLA+ trace specification
    that constrains the model checker to follow the observed behavior.
    """
    events = []
    with trace_file.open() as f:
        for line in f:
            events.append(json.loads(line))

    tla_lines = [
        "---- MODULE TraceValidation ----",
        "EXTENDS OrderStateMachine, TLC, Sequences",
        "",
        f"TraceLen == {len(events)}",
        "",
        "VARIABLE step",
        "",
        "TraceInit ==",
        "    /\\ Init",
        f"    /\\ step = 1",
        "",
        "TraceNext ==",
    ]

    for i, event in enumerate(events, start=1):
        action = _infer_action(event)
        tla_lines.append(f"    {'\\/' if i > 1 else '/\\'} (step = {i} /\\ {action} /\\ step' = {i + 1})")

    tla_lines.extend([
        "",
        "TraceSpec == TraceInit /\\ [][TraceNext]_<<vars, step>>",
        "",
        "TraceAccepted == <>(step = TraceLen + 1)",
        "",
        "====",
    ])

    output_file.write_text("\n".join(tla_lines))


def _infer_action(event: dict) -> str:
    """Map an HTTP trace event to the corresponding TLA+ action."""
    method = event["method"]
    path = event["path"]

    if method == "POST" and "/orders" in path:
        return "\\E oid \\in OrderIds, lines \\in SUBSET (ProductIds \\X (1..MaxQuantity)) : PlaceOrder(oid, lines)"
    elif method == "PATCH" and "/confirm" in path:
        oid = _extract_order_id(path)
        return f"ConfirmOrder(\"{oid}\")"
    elif method == "PATCH" and "/ship" in path:
        oid = _extract_order_id(path)
        return f"ShipOrder(\"{oid}\")"
    elif method == "PATCH" and "/cancel" in path:
        oid = _extract_order_id(path)
        return f"CancelOrder(\"{oid}\")"
    else:
        return "UNCHANGED vars"


def _extract_order_id(path: str) -> str:
    # /orders/{id}/confirm -> extract {id}
    parts = path.strip("/").split("/")
    if len(parts) >= 2:
        return parts[1]
    return ""


def validate_trace(trace_file: Path, tla_model_dir: Path) -> bool:
    """
    Run TLC model checker to validate the trace against the spec.
    Returns True if the trace is a valid behavior of the spec.
    """
    trace_tla = tla_model_dir / "TraceValidation.tla"
    convert_trace_to_tla(trace_file, trace_tla)

    result = subprocess.run(
        [
            "java", "-jar", str(tla_model_dir / "tla2tools.jar"),
            "-config", str(tla_model_dir / "TraceValidation.cfg"),
            str(trace_tla),
        ],
        capture_output=True,
        text=True,
        timeout=120,
    )

    if "Model checking completed. No error has been found." in result.stdout:
        return True
    else:
        print(f"TLA+ trace validation FAILED:\n{result.stdout}\n{result.stderr}")
        return False

Step 4: Integration with test runner

# file: tests/test_tla_trace.py (auto-generated, only for exhaustive profile)

import pytest
from pathlib import Path
from tests.tla_trace_validator import validate_trace

TRACE_FILE = Path("/tmp/service_trace.ndjson")
TLA_MODEL_DIR = Path(__file__).parent.parent / "tla"


@pytest.mark.skipif(
    not TLA_MODEL_DIR.exists(),
    reason="TLA+ model not generated (requires exhaustive profile)",
)
def test_execution_trace_conforms_to_spec():
    """
    After running the stateful tests, validate that the recorded trace
    is a valid behavior of the TLA+ specification.
    """
    assert TRACE_FILE.exists(), "No trace file found. Was TraceMiddleware enabled?"
    assert validate_trace(TRACE_FILE, TLA_MODEL_DIR), (
        "Execution trace does not conform to the TLA+ specification. "
        "See output above for the violating step."
    )

7. Conformance test runner architecture

7.1 The complete test execution pipeline

7.2 Generated test runner script

#!/usr/bin/env python3
# file: tests/run_conformance.py
#
# AUTO-GENERATED conformance test runner.
# Usage: python tests/run_conformance.py [smoke|thorough|exhaustive]

import subprocess
import sys
import time
import httpx

BASE_URL = "http://localhost:8000"
PROFILE = sys.argv[1] if len(sys.argv) > 1 else "thorough"


def wait_for_health(url: str, timeout: int = 60, interval: float = 1.0):
    """Wait for the service to become healthy."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        try:
            resp = httpx.get(f"{url}/health", timeout=2.0)
            if resp.status_code == 200:
                print(f"Service is healthy at {url}")
                return
        except httpx.ConnectError:
            pass
        time.sleep(interval)
    print(f"FATAL: Service at {url} did not become healthy within {timeout}s")
    sys.exit(1)


def run_phase(name: str, cmd: list[str]) -> bool:
    """Run a test phase and return True if it passed."""
    print(f"\n{'='*60}")
    print(f"PHASE: {name}")
    print(f"{'='*60}\n")

    result = subprocess.run(cmd, env={
        **dict(__import__("os").environ),
        "SPEC_TEST_PROFILE": PROFILE,
    })
    passed = result.returncode == 0
    status = "PASSED" if passed else "FAILED"
    print(f"\n{name}: {status}")
    return passed


def main():
    print(f"Running conformance tests (profile: {PROFILE})")

    # Step 1: Start services
    print("Starting services...")
    subprocess.run(["docker-compose", "up", "-d"], check=True)

    try:
        # Step 2: Wait for health
        wait_for_health(BASE_URL)

        # Step 3: Reset state
        httpx.post(f"{BASE_URL}/test/reset")

        results = {}

        # Step 4: Structural tests (Schemathesis)
        results["structural"] = run_phase(
            "Structural Tests (Schemathesis)",
            [
                "pytest", "tests/test_structural.py",
                "-v", "--tb=short",
                f"--junitxml=results/structural-{PROFILE}.xml",
            ],
        )

        # Step 5: Reset and run behavioral tests
        httpx.post(f"{BASE_URL}/test/reset")
        results["behavioral"] = run_phase(
            "Behavioral Tests (Hypothesis Properties)",
            [
                "pytest", "tests/test_behavioral_url_shortener.py",
                "-v", "--tb=short",
                f"--junitxml=results/behavioral-{PROFILE}.xml",
            ],
        )

        # Step 6: Reset and run stateful tests
        httpx.post(f"{BASE_URL}/test/reset")
        results["stateful"] = run_phase(
            "Stateful Tests (Hypothesis State Machine)",
            [
                "pytest", "tests/test_stateful_url_shortener.py",
                "-v", "--tb=long",
                f"--junitxml=results/stateful-{PROFILE}.xml",
            ],
        )

        # Step 7: TLA+ trace validation (exhaustive only)
        if PROFILE == "exhaustive":
            results["tla_trace"] = run_phase(
                "TLA+ Trace Validation",
                [
                    "pytest", "tests/test_tla_trace.py",
                    "-v", "--tb=long",
                    f"--junitxml=results/tla-trace-{PROFILE}.xml",
                ],
            )

        # Step 8: Report summary
        print(f"\n{'='*60}")
        print("CONFORMANCE TEST SUMMARY")
        print(f"{'='*60}")
        all_passed = True
        for phase, passed in results.items():
            status = "PASS" if passed else "FAIL"
            print(f"  {phase:20s} {status}")
            if not passed:
                all_passed = False

        print(f"\nOverall: {'ALL PASSED' if all_passed else 'FAILURES DETECTED'}")

        return 0 if all_passed else 1

    finally:
        # Step 9: Tear down
        print("\nStopping services...")
        subprocess.run(["docker-compose", "down"])


if __name__ == "__main__":
    sys.exit(main())

7.3 GitHub actions workflow

# file: .github/workflows/conformance.yml
#
# AUTO-GENERATED conformance test workflow.

name: Conformance Tests

on:
  push:
    branches: [main]
  pull_request:
    branches: [main]
  schedule:
    # Nightly exhaustive run at 2 AM UTC
    - cron: "0 2 * * *"

jobs:
  # ---------------------------------------------------------------------------
  # Structural tests (fast, run on every push)
  # ---------------------------------------------------------------------------
  structural:
    runs-on: ubuntu-latest
    timeout-minutes: 10
    steps:
      - uses: actions/checkout@v4

      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: "3.12"

      - name: Install dependencies
        run: pip install -r requirements-test.txt

      - name: Start service
        run: docker-compose up -d

      - name: Wait for health check
        run: |
          for i in $(seq 1 30); do
            if curl -s http://localhost:8000/health | grep -q ok; then
              echo "Service is healthy"
              exit 0
            fi
            sleep 2
          done
          echo "Service failed to start"
          exit 1

      - name: Run structural tests
        env:
          SPEC_TEST_PROFILE: thorough
        run: pytest tests/test_structural.py -v --tb=short --junitxml=results/structural.xml

      - name: Upload results
        if: always()
        uses: actions/upload-artifact@v4
        with:
          name: structural-results
          path: results/

      - name: Tear down
        if: always()
        run: docker-compose down

  # ---------------------------------------------------------------------------
  # Behavioral + stateful tests (medium, run on every push)
  # ---------------------------------------------------------------------------
  behavioral:
    runs-on: ubuntu-latest
    timeout-minutes: 20
    steps:
      - uses: actions/checkout@v4

      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: "3.12"

      - name: Install dependencies
        run: pip install -r requirements-test.txt

      - name: Start service
        run: docker-compose up -d

      - name: Wait for health check
        run: |
          for i in $(seq 1 30); do
            if curl -s http://localhost:8000/health | grep -q ok; then
              echo "Service is healthy"
              exit 0
            fi
            sleep 2
          done
          echo "Service failed to start"
          exit 1

      - name: Run behavioral tests
        env:
          SPEC_TEST_PROFILE: thorough
        run: |
          pytest tests/test_behavioral_url_shortener.py -v --tb=short --junitxml=results/behavioral.xml
          pytest tests/test_stateful_url_shortener.py -v --tb=long --junitxml=results/stateful.xml

      - name: Upload results
        if: always()
        uses: actions/upload-artifact@v4
        with:
          name: behavioral-results
          path: results/

      - name: Tear down
        if: always()
        run: docker-compose down

  # ---------------------------------------------------------------------------
  # Exhaustive tests (slow, nightly only)
  # ---------------------------------------------------------------------------
  exhaustive:
    runs-on: ubuntu-latest
    timeout-minutes: 60
    if: github.event_name == 'schedule'
    steps:
      - uses: actions/checkout@v4

      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: "3.12"

      - name: Install dependencies
        run: pip install -r requirements-test.txt

      - name: Start service with tracing
        run: docker-compose -f docker-compose.yml -f docker-compose.trace.yml up -d

      - name: Wait for health check
        run: |
          for i in $(seq 1 30); do
            if curl -s http://localhost:8000/health | grep -q ok; then
              echo "Service is healthy"
              exit 0
            fi
            sleep 2
          done
          echo "Service failed to start"
          exit 1

      - name: Run exhaustive conformance suite
        env:
          SPEC_TEST_PROFILE: exhaustive
        run: python tests/run_conformance.py exhaustive

      - name: Upload results
        if: always()
        uses: actions/upload-artifact@v4
        with:
          name: exhaustive-results
          path: results/

      - name: Tear down
        if: always()
        run: docker-compose down

  # ---------------------------------------------------------------------------
  # Report aggregation
  # ---------------------------------------------------------------------------
  report:
    needs: [structural, behavioral]
    runs-on: ubuntu-latest
    if: always()
    steps:
      - name: Download all results
        uses: actions/download-artifact@v4
        with:
          path: all-results

      - name: Publish test results
        uses: dorny/test-reporter@v1
        with:
          name: Conformance Test Results
          path: "all-results/**/*.xml"
          reporter: java-junit

7.4 Failure triage

When a test fails, the error message identifies which spec element was violated:

Test Name Pattern	Spec Element	Triage Action
`test__returns_fresh_`	`ensures: X not in pre(state)`	Check ID generation: is it using a collision-prone algorithm?
`test__stores_correct_`	`ensures: state'[k] = v`	Check database write: is it persisting correctly?
`test_*_adds_exactly_one`	`ensures: #state' = #state + 1`	Check for accidental double-writes or off-by-one
`test__does_not_mutate_`	`ensures: state' = state`	Check for unintended side effects in read operations
`test__rejects_invalid_`	`requires: predicate(input)`	Check input validation: is it too lenient?
`test__rejects_nonexistent_`	`requires: key in state`	Check 404 handling: is it returning wrong status?
`model_matches_service` invariant	Overall state consistency	Most serious: the service and spec have diverged

8. Spec-to-test mapping table

This is the comprehensive mapping from every possible spec element to the test artifacts it generates.

8.1 Entity-level elements

Spec Element	Test Type	What It Checks	Framework	Generated File
Entity string invariant (`len(v) >= A`)	Strategy constraint	Generated values satisfy invariant	Hypothesis strategy	`tests/strategies.py`
Entity regex invariant (`v matches R`)	Strategy constraint + property test	Values match pattern	Hypothesis strategy	`tests/strategies.py`
Entity numeric invariant (`v >= 0`)	Strategy constraint	Generated values in range	Hypothesis strategy	`tests/strategies.py`
Entity `isValidURI` predicate	Custom strategy	Only valid URIs generated	Hypothesis `@composite`	`tests/strategies.py`
Entity `valid_email` predicate	Custom strategy	Only valid emails generated	Hypothesis `@composite`	`tests/strategies.py`
Enum declaration	`st.sampled_from`	Only valid enum values	Hypothesis	`tests/strategies.py`
Entity with nested fields	`st.fixed_dictionaries`	All sub-constraints satisfied	Hypothesis `@composite`	`tests/strategies.py`

8.2 State-level elements

Spec Element	Test Type	What It Checks	Framework	Generated File
`state { store: K -> V }`	Model attribute	State machine tracks this collection	Hypothesis SM	`tests/test_stateful_*.py`
`state { store: K -> lone V }`	Property test	At most one V per K	pytest + Hypothesis	`tests/test_behavioral_*.py`
`state { items: set T }`	Model attribute	State machine tracks this set	Hypothesis SM	`tests/test_stateful_*.py`

8.3 Operation-level elements

Spec Element	Test Type	What It Checks	Framework	Generated File
`requires: key in state`	Negative test (404)	Returns 404 when key missing	pytest	`tests/test_behavioral_*.py`
`requires: predicate(input)`	Negative test (422)	Returns 422 for invalid input	pytest + Hypothesis	`tests/test_behavioral_*.py`
`requires: state[k].status = X`	Negative test (409/422)	Rejects wrong-state transitions	pytest + Hypothesis	`tests/test_behavioral_*.py`
`ensures: key not in pre(state)`	Freshness property test	New ID was not reused	Hypothesis `@given`	`tests/test_behavioral_*.py`
`ensures: state'[k] = v`	Storage correctness test	Value persisted correctly	Hypothesis `@given`	`tests/test_behavioral_*.py`
`ensures: #state' = #state + N`	Cardinality test	Exactly N items added	Hypothesis `@given`	`tests/test_behavioral_*.py`
`ensures: #state' = #state - N`	Cardinality test	Exactly N items removed	Hypothesis `@given`	`tests/test_behavioral_*.py`
`ensures: state' = state`	No-mutation test	Read ops don't write	Hypothesis `@given`	`tests/test_behavioral_*.py`
`ensures: output = f(input)`	Output correctness test	Return value matches formula	Hypothesis `@given`	`tests/test_behavioral_*.py`
`ensures: output.field = value`	Field correctness test	Specific field has expected value	Hypothesis `@given`	`tests/test_behavioral_*.py`
All `ensures` together	SM rule body assertions	All postconditions in sequence	Hypothesis SM	`tests/test_stateful_*.py`
All `requires` together	SM precondition guards	Rule only fires when valid	Hypothesis SM	`tests/test_stateful_*.py`

8.4 Invariant-level elements

Spec Element	Test Type	What It Checks	Framework	Generated File
`invariant: forall x in S \| P(x)`	Post-operation check	Property holds for all elements after every op	pytest + Hypothesis	`tests/test_behavioral_*.py`
`invariant: forall x in S \| P(x)`	SM `@invariant`	Property holds after every rule in sequence	Hypothesis SM	`tests/test_stateful_*.py`
`invariant: #state >= 0`	SM `@invariant`	Count never goes negative	Hypothesis SM	`tests/test_stateful_*.py`
`invariant: state[k].field = f(...)`	SM `@invariant`	Derived field stays consistent	Hypothesis SM	`tests/test_stateful_*.py`
Model-service consistency	SM `@invariant`	Model state matches service state	Hypothesis SM	`tests/test_stateful_*.py`

8.5 Convention-level elements

Spec Element	Test Type	What It Checks	Framework	Generated File
Operation -> HTTP method mapping	Schemathesis endpoint test	Correct HTTP method accepted	Schemathesis	`tests/test_structural.py`
Operation -> HTTP status mapping	Schemathesis response check	Correct status codes returned	Schemathesis	`tests/test_structural.py`
Entity -> JSON schema mapping	Schemathesis schema check	Response body matches schema	Schemathesis	`tests/test_structural.py`
`conventions { X.http_header }`	Schemathesis custom check	Header present with correct value	Schemathesis	`tests/test_structural.py`
OpenAPI Links (data flow)	Schemathesis stateful test	Multi-step sequences work	Schemathesis	`tests/test_structural.py`

8.6 Temporal properties (TLA+ only)

Spec Element	Test Type	What It Checks	Framework	Generated File
Liveness (`eventually X`)	TLA+ trace validation	Property eventually holds in trace	TLC model checker	`tests/test_tla_trace.py`
Safety (`always X`)	TLA+ trace validation	Property never violated in trace	TLC model checker	`tests/test_tla_trace.py`
Fairness (`leads-to`)	TLA+ trace validation	If X then eventually Y	TLC model checker	`tests/test_tla_trace.py`

9. Test quality metrics

9.1 Spec coverage

Definition. The percentage of spec elements that have at least one corresponding test.

spec_coverage = (tested_elements / total_elements) * 100

The compiler can compute this statically by counting how many spec elements appear in the generated test files:

Spec Element Type	Count Method
`ensures` clauses	Count clauses; each should have >= 1 `@given` test
`requires` clauses	Count clauses; each should have >= 1 negative test
`invariant` declarations	Count declarations; each should have >= 1 `@invariant`
Entity invariants	Count invariants; each should appear in strategy constraints
State declarations	Count fields; each should appear as SM model attribute
Operations	Count operations; each should be an SM `@rule`

Target: 100% spec coverage. Since all tests are generated from the spec, this should be achieved by construction. The metric is a self-check on the compiler.

9.2 Mutation testing

Goal. Verify that the generated tests would catch real bugs in the implementation.

Approach. Introduce controlled mutations into the generated service code and verify that at least one test fails for each mutation.

Mutation Type	What It Simulates	Test That Should Catch It
Remove a `store[key] = value` write	Forgetting to persist data	`test__stores_correct_`
Return wrong status code	Incorrect HTTP mapping	Schemathesis structural test
Skip input validation	Missing requires enforcement	`test__rejects_invalid_`
Off-by-one in count	Incorrect cardinality	`test_*_adds_exactly_one`
Allow invalid state transition	Missing precondition check	SM negative rule test
Fail to restore inventory on cancel	Incomplete rollback	`test_cancel_order_restores_inventory`
Return wrong field value	Incorrect output mapping	`test__returns_correct_`

Mutation testing tools: mutmut (Python) or cosmic-ray (Python) can be configured to run against the generated service code with the generated test suite as the test oracle.

# Example: run mutation testing with mutmut
mutmut run \
    --paths-to-mutate=app/ \
    --tests-dir=tests/ \
    --runner="pytest tests/test_behavioral_url_shortener.py -x --tb=no -q"

Target metric. Mutation score >= 90%. A mutation that is not caught indicates either a gap in the spec (the spec does not constrain the mutated behavior) or a gap in the test generator.

9.3 Test effectiveness comparison

To compare the generated test suite against hand-written alternatives:

Metric	How to Measure	What It Tells You
Bug detection rate	Inject known bugs, measure detection %	How many bugs the tests catch
Time to write	Wall-clock time: hand-written vs generated	Developer productivity gain
Maintenance cost	Lines changed when spec changes	Cost of keeping tests current
False positive rate	Tests that fail on correct code	Noise level
Shrinking quality	Size of minimal counterexample	Debuggability of failures

Expected advantages of generated tests

Zero time to write, tests are generated from the spec.
Zero maintenance when spec changes, regenerate tests from updated spec.
No missed spec elements, 100% spec coverage by construction.
Better shrinking, Hypothesis's shrinking finds minimal counterexamples.
Broader input coverage, property tests explore more inputs than example tests.

Expected disadvantages

Slower execution, property tests run many examples per test function.
Harder to debug, generated test names are less intuitive than hand-written ones.
Cannot test unstated requirements, only tests what the spec declares.
Requires a running service, integration tests, rather than unit tests.

10. Comparison with existing testing approaches

10.1 Comparison matrix

Approach	Structural	Behavioral	Stateful	From Formal Spec	Auto-Generated	Shrinking
Our approach	Yes (Schemathesis)	Yes (Hypothesis props)	Yes (Hypothesis SM)	Yes	Yes	Yes
Schemathesis alone	Yes	No	Partial (links only)	From OpenAPI only	Yes	Partial
RESTler	Partial	No	Yes (fuzzing)	From OpenAPI only	Yes	No
EvoMaster	Yes	No	Yes (evolutionary)	From OpenAPI only	Yes	No
Dredd	Yes	No	No	From OpenAPI/Blueprint	Yes	No
Pact	No	Partial (contracts)	No	No (consumer-driven)	Partial	No
QuickCheck SM	No	Yes	Yes	From model	No (hand-written model)	Yes
Spec Explorer	No	Yes	Yes	From C# model	Yes	No
Hand-written pytest	Depends	Depends	Depends	No	No	No

10.2 What each approach misses

Schemathesis alone

Checks structural conformance (schemas, status codes, content types)
Cannot verify behavioral postconditions (e.g., "the returned code was fresh")
Stateful testing is limited to OpenAPI Links (data flow only, no model comparison)
No invariant checking beyond schema validation
What we add. Behavioral property tests that check ensures clauses, stateful tests that maintain a model and compare it against the service, invariant checks after every operation

Restler (microsoft research)

Excellent at finding security bugs (500 errors, resource leaks)
Infers producer-consumer dependencies from OpenAPI
Does not check postconditions or invariants
Fuzzing is unguided by a behavioral specification
What we add. Spec-guided testing that checks not just "does it crash?" but "does it satisfy the postconditions?" and "do invariants hold?"

Evomaster

Evolutionary search for inputs that maximize code coverage
White-box: instruments the service to guide search
Good at achieving high line/branch coverage
Does not check behavioral correctness (only crashes and 500s)
What we add. An oracle. EvoMaster finds inputs; we check outputs against the spec. The two approaches are complementary.

Dredd (archived)

One request per documented endpoint, check response matches schema
No randomization, no edge cases, no stateful sequences
What we add. Everything beyond "does the happy path return the documented shape?"

Pact (consumer-driven contracts)

Verifies that a provider satisfies consumer expectations
Consumer writes the contract, rather than the spec author
Does not test internal invariants or state transitions
What we add. Provider-side behavioral verification derived from the authoritative specification, rather than from consumer expectations

QuickCheck state machine testing (erlang/haskell)

The closest conceptual match to our stateful testing layer
Requires manually writing the state machine model in Erlang/Haskell
Excellent at finding bugs in stateful systems (used at Volvo, Ericsson)
No HTTP/REST awareness, no structural testing
What we add. Automatic generation of the state machine model from the spec, HTTP client integration, structural testing via Schemathesis, entity-level strategy generation

Spec explorer (microsoft)

Model-based testing from C# model programs
Explored state graphs, generated covering test suites
Saved 50 person-years at Microsoft
Visual Studio-only, rather than maintained, no REST awareness
What we add. REST-native, Python ecosystem, alive, spec-driven rather than code-driven

10.3 What our approach cannot do (honest limitations)

Limitation	Why	Mitigation
Cannot test unstated requirements	Tests derive from spec; if the spec omits a requirement, no test is generated	Supplement with hand-written tests for domain-specific edge cases
Cannot test performance/latency	Spec does not declare performance requirements	Separate performance testing (e.g., Locust, k6)
Cannot test concurrent race conditions	Hypothesis SM is sequential by default	Use Hypothesis's `@given` with `st.runner()` for limited parallelism; use TLA+ for full concurrency analysis
Cannot test UI/frontend	Spec covers API behavior only	Separate frontend testing
Cannot test third-party integrations	Spec describes the service's own behavior	Mock external dependencies in test fixtures
Slower than unit tests	Integration tests require a running service	Use `smoke` profile for fast feedback, `exhaustive` for release gates

10.4 Positioning on the testing spectrum

                    Structural                    Behavioral
                    conformance                   conformance
                    |                             |
  Dredd ---------->|                              |
  Schemathesis --->|------->                      |
  RESTler -------->|----------->                  |
  EvoMaster ------>|--------------->              |
  Pact ----------->|                  |           |
  QuickCheck SM    |                  |---------->|
  Spec Explorer    |                  |---------->|
  Our approach --->|--------------------------------->|
                   ^                                  ^
                   Least                              Most
                   coverage                           coverage

Our approach is the only one that covers the full spectrum from structural conformance (does the API match its schema?) through behavioral conformance (does each operation satisfy its postconditions?) to stateful conformance (do invariants hold across arbitrary operation sequences?).

Appendix A: Requirements file

# file: requirements-test.txt
#
# AUTO-GENERATED test dependencies.

pytest>=8.0
hypothesis>=6.100
schemathesis~=4.15
httpx>=0.27

Appendix B: Generated file manifest

Every spec-to-REST compilation with --tests produces these files:

tests/
  conftest.py                        # Shared fixtures, helpers, strategies
  strategies.py                      # Hypothesis strategies from entity types
  test_structural.py                 # Schemathesis structural tests
  test_behavioral_{service}.py       # Hypothesis property tests per service
  test_stateful_{service}.py         # Hypothesis state machine tests per service
  test_tla_trace.py                  # TLA+ trace validation (exhaustive only)
  run_conformance.py                 # Test runner orchestrator
tla/
  {service}.tla                      # TLA+ model (exhaustive only)
  TraceValidation.tla                # Trace validation spec (generated at runtime)
  TraceValidation.cfg                # TLC configuration
.github/
  workflows/
    conformance.yml                  # CI workflow
results/                             # JUnit XML output directory
requirements-test.txt                # Test dependencies
Makefile                             # Test targets (smoke, thorough, exhaustive)

Test Generation Pipeline

On this page