# Workers
The execution pipeline is split across seven background workers, each running as a separate container. This separation keeps concerns isolated - the API doesn't block waiting for pods to finish, the saga orchestrator doesn't care how pods are created, and the result processor doesn't know anything about Kubernetes.
Workers communicate through Kafka: each publishes events when it completes work and subscribes to events from the workers upstream of it. MongoDB and Redis provide shared state where needed.
```mermaid
graph LR
    API[Backend API] -->|execution_requested| Kafka[(Kafka)]
    Kafka --> Coord[Coordinator]
    Coord -->|execution_accepted| Kafka
    Kafka --> Saga[Saga Orchestrator]
    Saga -->|create_pod_command| Kafka
    Kafka --> K8s[K8s Worker]
    K8s -->|pod_created| Kafka
    K8s --> Pod[Kubernetes Pod]
    Pod --> PodMon[Pod Monitor]
    PodMon -->|execution_completed| Kafka
    Kafka --> Result[Result Processor]
    Result --> Mongo[(MongoDB)]
```
## The workers
| Worker | What it does | Entry point |
|---|---|---|
| Coordinator | Admits executions, manages the priority queue | run_coordinator.py |
| Saga Orchestrator | Drives the execution state machine, issues pod commands | run_saga_orchestrator.py |
| K8s Worker | Creates ConfigMaps and Pods with security hardening | run_k8s_worker.py |
| Pod Monitor | Watches pods, translates K8s events to domain events | run_pod_monitor.py |
| Result Processor | Persists execution results, cleans up resources | run_result_processor.py |
| Event Replay | Re-emits historical events for debugging | run_event_replay.py |
| DLQ Processor | Retries failed messages from the dead letter queue | run_dlq_processor.py |
All entry points live in backend/workers/.
## Running locally
Docker Compose starts everything:
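A minimal sketch, assuming the compose file at the repository root defines a service per worker alongside Kafka, MongoDB, and Redis:

```bash
# Build images and start the API, all seven workers, and their dependencies
docker compose up --build
```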
For debugging a specific worker, run it directly:
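For example, to run the Coordinator on its own against dependencies already started by Compose (assuming the entry points are plain Python scripts and connection settings come from environment variables):

```bash
# Keep Kafka, MongoDB, and Redis running via Compose, then start one worker directly
python backend/workers/run_coordinator.py
```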
## Scaling
Most workers can run as single replicas. The stateful ones (Coordinator, Saga Orchestrator) use event sourcing to recover after restarts. The stateless ones (K8s Worker, Pod Monitor, Result Processor) can scale horizontally if throughput becomes an issue.
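With Docker Compose, horizontal scaling is just a replica count per service; the service name below is illustrative, not taken from the actual compose file:

```bash
# Run three replicas of a (hypothetical) k8s-worker service
docker compose up -d --scale k8s-worker=3
```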