Coordinator¶
The coordinator owns admission and queuing policy for executions. It enforces per-user limits to prevent any single user from monopolizing the system and manages the priority queue for scheduling. Resource limits (CPU, memory) are enforced by Kubernetes via pod manifests, not by the coordinator.
```mermaid
graph LR
    Kafka[(Kafka)] --> Coord[Coordinator]
    Coord --> Queue[Priority Queue]
    Coord --> Accepted[Accepted Events]
    Accepted --> Kafka
```
How it works¶
When an ExecutionRequested event arrives, the coordinator checks:
- Is the queue full? (max 10,000 pending)
- Has this user exceeded their limit? (max 100 concurrent)
If both checks pass, the execution is queued and an ExecutionAccepted event is published. Scheduling is reactive: whenever an execution reaches the front of the queue, or an active execution completes, fails, or is cancelled, the coordinator immediately pops the next item and publishes a CreatePodCommand. A dedup guard prevents double-publishing for the same execution.
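The admission flow above can be sketched as follows. This is a minimal illustration, not the coordinator's actual internals: class and method names (`admit`, `schedule_next`) and the in-memory counters are assumptions.

```python
from dataclasses import dataclass, field

MAX_QUEUE_SIZE = 10_000   # max pending executions (see Configuration)
MAX_PER_USER = 100        # per-user concurrency limit

@dataclass
class CoordinatorSketch:
    pending: int = 0
    per_user: dict = field(default_factory=dict)
    published: set = field(default_factory=set)  # dedup guard

    def admit(self, user_id: str) -> bool:
        """Run the two admission checks; queue the execution if both pass."""
        if self.pending >= MAX_QUEUE_SIZE:
            return False  # queue full -> reject
        if self.per_user.get(user_id, 0) >= MAX_PER_USER:
            return False  # user at concurrency limit -> reject
        self.pending += 1
        self.per_user[user_id] = self.per_user.get(user_id, 0) + 1
        return True  # caller publishes ExecutionAccepted

    def should_publish_pod_command(self, execution_id: str) -> bool:
        """Dedup guard: allow CreatePodCommand at most once per execution."""
        if execution_id in self.published:
            return False
        self.published.add(execution_id)
        return True
```

The dedup guard matters because the reactive trigger can fire twice for the same execution (e.g. an item reaching the queue front at the same moment a slot frees up).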
Priority queue¶
Executions are processed in priority order; lower numeric priority values are dequeued first.
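These semantics map directly onto a binary heap. A minimal sketch using Python's `heapq` (the class name and FIFO tie-breaking counter are illustrative, not the service's actual implementation):

```python
import heapq
import itertools

class PriorityQueueSketch:
    """Min-heap: lower numeric priority pops first; ties break FIFO."""

    def __init__(self):
        self._heap = []
        self._seq = itertools.count()  # insertion order for stable ties

    def push(self, priority: int, execution_id: str) -> None:
        heapq.heappush(self._heap, (priority, next(self._seq), execution_id))

    def pop(self) -> str:
        _, _, execution_id = heapq.heappop(self._heap)
        return execution_id

    def __len__(self) -> int:
        return len(self._heap)
```

The sequence counter prevents `heapq` from comparing execution IDs when two items share a priority, and gives first-in-first-out ordering within a priority level.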
Configuration¶
| Parameter | Default | Description |
|---|---|---|
| `max_queue_size` | 10,000 | Maximum pending executions |
| `max_executions_per_user` | 100 | Per-user concurrency limit |
| `stale_timeout_seconds` | 3,600 | Stale execution timeout |
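One plausible way these parameters reach the service is via environment variables. This is a hedged sketch: the variable names and the env-based mechanism are assumptions, not the service's documented settings keys.

```python
import os

def load_coordinator_config() -> dict:
    """Read the parameters from the table above, falling back to defaults.

    Environment variable names here are hypothetical.
    """
    return {
        "max_queue_size": int(os.environ.get("COORD_MAX_QUEUE_SIZE", "10000")),
        "max_executions_per_user": int(
            os.environ.get("COORD_MAX_EXECUTIONS_PER_USER", "100")
        ),
        "stale_timeout_seconds": int(
            os.environ.get("COORD_STALE_TIMEOUT_SECONDS", "3600")
        ),
    }
```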
Topics¶
- Consumes: `execution_events` (requested, completed, failed, cancelled)
- Produces: `execution_events` (accepted)
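The consumed event types split into two behaviours: `requested` goes through admission, while terminal events free capacity and trigger reactive scheduling. A dispatch sketch (handler names are illustrative assumptions):

```python
def dispatch(coordinator, event: dict) -> None:
    """Route a consumed execution_events message by its type field."""
    etype = event["type"]
    if etype == "requested":
        # admission checks; publish ExecutionAccepted if they pass
        coordinator.handle_requested(event)
    elif etype in ("completed", "failed", "cancelled"):
        # a finished execution frees a slot and triggers reactive scheduling
        coordinator.handle_finished(event)
```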
Key files¶
| File | Purpose |
|---|---|
run_coordinator.py |
Entry point |
coordinator.py |
Coordinator service with integrated priority queue |
Deployment¶
```yaml
coordinator:
  image: ghcr.io/hardmax71/integr8scode/backend:${IMAGE_TAG:-latest}
  command: ["python", "workers/run_coordinator.py"]
```
Usually runs as a single replica. Leader election via Redis is available if scaling is needed.
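If multiple replicas are run, Redis leader election typically follows the lease pattern: `SET key value NX EX ttl` acquires the lease only when no leader holds it, and the leader periodically renews. A sketch using the redis-py API; the key name and TTL are assumptions, not the service's actual values.

```python
LEADER_KEY = "coordinator:leader"  # assumed key name
LEASE_SECONDS = 10                 # assumed lease TTL

def try_acquire_leadership(redis_client, instance_id: str) -> bool:
    """Become leader iff no live leader exists (atomic SET NX EX)."""
    return bool(
        redis_client.set(LEADER_KEY, instance_id, nx=True, ex=LEASE_SECONDS)
    )

def renew_leadership(redis_client, instance_id: str) -> bool:
    """Extend the lease, but only if this instance still holds it."""
    if redis_client.get(LEADER_KEY) == instance_id:
        redis_client.expire(LEADER_KEY, LEASE_SECONDS)
        return True
    return False
```

Note the renew path has a read-then-expire race; production implementations usually make it atomic with a Lua script, as in the Redlock-style lock recipes.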