Coordinator¶
The coordinator owns admission and queuing policy for executions. It enforces per-user limits to prevent any single user from monopolizing the system and manages the priority queue for scheduling. Resource limits (CPU, memory) are enforced by Kubernetes via pod manifests, not by the coordinator.
```mermaid
graph LR
    Kafka[(Kafka)] --> Coord[Coordinator]
    Coord --> Queue[Priority Queue]
    Coord --> Accepted[Accepted Events]
    Accepted --> Kafka
```
How it works¶
When an ExecutionRequested event arrives, the coordinator checks:
- Is the queue full? (max 10,000 pending)
- Has this user exceeded their limit? (max 100 concurrent)
If both checks pass, the execution is queued and an ExecutionAccepted event is published. Scheduling is reactive: whenever an execution reaches the front of the queue, or an active execution completes, fails, or is cancelled, the coordinator immediately pops the next item and publishes a CreatePodCommand. A dedup guard prevents double-publishing for the same execution.
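The admission flow above can be sketched as follows. This is a minimal illustration, not the coordinator's actual internals: class and method names (`admit`, `schedule_next`) and the in-memory counters are assumptions.

```python
from dataclasses import dataclass, field

MAX_QUEUE_SIZE = 10_000   # max pending executions (see Configuration)
MAX_PER_USER = 100        # per-user concurrency limit

@dataclass
class CoordinatorSketch:
    pending: int = 0
    per_user: dict = field(default_factory=dict)
    published: set = field(default_factory=set)  # dedup guard

    def admit(self, user_id: str) -> bool:
        """Run the two admission checks; queue the execution if both pass."""
        if self.pending >= MAX_QUEUE_SIZE:
            return False  # queue full -> reject
        if self.per_user.get(user_id, 0) >= MAX_PER_USER:
            return False  # user at concurrency limit -> reject
        self.pending += 1
        self.per_user[user_id] = self.per_user.get(user_id, 0) + 1
        return True  # caller publishes ExecutionAccepted

    def should_publish_pod_command(self, execution_id: str) -> bool:
        """Dedup guard: allow CreatePodCommand at most once per execution."""
        if execution_id in self.published:
            return False
        self.published.add(execution_id)
        return True
```

The dedup guard matters because the reactive trigger can fire twice for the same execution (e.g. an item reaching the queue front at the same moment a slot frees up).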
Priority queue¶
Executions are processed in priority order; lower numeric priority values are dequeued first.
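These semantics map directly onto a binary heap. A minimal sketch using Python's `heapq` (the class name and FIFO tie-breaking counter are illustrative, not the service's actual implementation):

```python
import heapq
import itertools

class PriorityQueueSketch:
    """Min-heap: lower numeric priority pops first; ties break FIFO."""

    def __init__(self):
        self._heap = []
        self._seq = itertools.count()  # insertion order for stable ties

    def push(self, priority: int, execution_id: str) -> None:
        heapq.heappush(self._heap, (priority, next(self._seq), execution_id))

    def pop(self) -> str:
        _, _, execution_id = heapq.heappop(self._heap)
        return execution_id

    def __len__(self) -> int:
        return len(self._heap)
```

The sequence counter prevents `heapq` from comparing execution IDs when two items share a priority, and gives first-in-first-out ordering within a priority level.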
Configuration¶
| Parameter | Default | Description |
|---|---|---|
| `max_queue_size` | 10,000 | Maximum pending executions |
| `max_executions_per_user` | 100 | Per-user concurrency limit |
| `stale_timeout_seconds` | 3,600 | Stale execution timeout |
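One plausible way these parameters reach the service is via environment variables. This is a hedged sketch: the variable names and the env-based mechanism are assumptions, not the service's documented settings keys.

```python
import os

def load_coordinator_config() -> dict:
    """Read the parameters from the table above, falling back to defaults.

    Environment variable names here are hypothetical.
    """
    return {
        "max_queue_size": int(os.environ.get("COORD_MAX_QUEUE_SIZE", "10000")),
        "max_executions_per_user": int(
            os.environ.get("COORD_MAX_EXECUTIONS_PER_USER", "100")
        ),
        "stale_timeout_seconds": int(
            os.environ.get("COORD_STALE_TIMEOUT_SECONDS", "3600")
        ),
    }
```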
Topics¶
- Consumes: `execution_events` (requested, completed, failed, cancelled)
- Produces: `execution_events` (accepted)
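The consumed event types split into two behaviours: `requested` goes through admission, while terminal events free capacity and trigger reactive scheduling. A dispatch sketch (handler names are illustrative assumptions):

```python
def dispatch(coordinator, event: dict) -> None:
    """Route a consumed execution_events message by its type field."""
    etype = event["type"]
    if etype == "requested":
        # admission checks; publish ExecutionAccepted if they pass
        coordinator.handle_requested(event)
    elif etype in ("completed", "failed", "cancelled"):
        # a finished execution frees a slot and triggers reactive scheduling
        coordinator.handle_finished(event)
```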
Key files¶
| File | Purpose |
|---|---|
run_coordinator.py |
Entry point |
coordinator.py |
Coordinator service with integrated priority queue |
Deployment¶
```yaml
coordinator:
  image: ghcr.io/hardmax71/integr8scode/backend:${IMAGE_TAG:-latest}
  command: ["python", "workers/run_coordinator.py"]
```
Usually runs as a single replica. Leader election via Redis is available if scaling is needed.
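If multiple replicas are run, Redis leader election typically follows the lease pattern: `SET key value NX EX ttl` acquires the lease only when no leader holds it, and the leader periodically renews. A sketch using the redis-py API; the key name and TTL are assumptions, not the service's actual values.

```python
LEADER_KEY = "coordinator:leader"  # assumed key name
LEASE_SECONDS = 10                 # assumed lease TTL

def try_acquire_leadership(redis_client, instance_id: str) -> bool:
    """Become leader iff no live leader exists (atomic SET NX EX)."""
    return bool(
        redis_client.set(LEADER_KEY, instance_id, nx=True, ex=LEASE_SECONDS)
    )

def renew_leadership(redis_client, instance_id: str) -> bool:
    """Extend the lease, but only if this instance still holds it."""
    if redis_client.get(LEADER_KEY) == instance_id:
        redis_client.expire(LEADER_KEY, LEASE_SECONDS)
        return True
    return False
```

Note the renew path has a read-then-expire race; production implementations usually make it atomic with a Lua script, as in the Redlock-style lock recipes.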