Deployment¶
Integr8sCode uses Docker Compose for deployment. All services — backend, frontend, workers, and infrastructure —
run as containers orchestrated by a single docker-compose.yaml. Workers reuse the backend image with different
`command:` overrides, so there is only one application image to build. Kubernetes is used only for executor pods
(running user code); workers just need a kubeconfig to talk to the K8s API.
Architecture¶
```mermaid
flowchart TB
    Script[deploy.sh] --> DC
    subgraph Images["Container Images"]
        Base[Dockerfile.base] --> Backend[Backend Image]
    end
    subgraph DC["Docker Compose"]
        Compose[docker-compose.yaml] --> Containers
    end
    Images --> Containers
```
Deployment script¶
The deploy.sh script wraps Docker Compose:
```bash
show_help() {
    echo "Integr8sCode Deployment Script"
    echo ""
    echo "Usage: ./deploy.sh <command> [options]"
    echo ""
    echo "Commands:"
    echo "  dev [options]        Start full stack (docker-compose)"
    echo "    --build            Rebuild images locally"
    echo "    --no-build         Use pre-built images only (no build fallback)"
    echo "    --wait             Wait for services to be healthy"
    echo "    --timeout <secs>   Health check timeout (default: 300)"
    echo "    --observability    Include Grafana, Jaeger, etc."
    echo "    --debug            Include observability + Kafdrop"
    echo "  infra [options]      Start infrastructure only (mongo, redis, kafka, etc.)"
    echo "    --wait             Wait for services to be healthy"
    echo "    --timeout <secs>   Health check timeout (default: 120)"
    echo "  down                 Stop all services"
    echo "  check                Run quality checks (ruff, mypy, bandit)"
    echo "  test                 Run full test suite"
    echo "  logs [service]       View logs (defaults to all services)"
    echo "  status               Show status of running services"
    echo "  openapi [path]       Generate OpenAPI spec (default: docs/reference/openapi.json)"
    echo "  types                Generate TypeScript types for frontend from OpenAPI spec"
    echo "  help                 Show this help message"
    echo ""
    echo "Configuration:"
    echo "  All settings come from backend/config.toml (single source of truth)"
    echo "  For CI/tests: cp backend/config.test.toml backend/config.toml"
    echo ""
    echo "Examples:"
    echo "  ./deploy.sh dev           # Start dev environment"
    echo "  ./deploy.sh dev --build   # Rebuild and start"
    echo "  ./deploy.sh dev --wait    # Start and wait for healthy"
    echo "  ./deploy.sh logs backend  # View backend logs"
}
```
The script wraps Docker Compose with convenience commands for building, starting, stopping, and running tests.
Local development¶
Local development uses Docker Compose to spin up the entire stack on your machine. The compose file defines all services with health checks and dependency ordering, so containers start in the correct sequence.
This brings up MongoDB, Redis, Kafka (KRaft mode), all six workers, the backend API, and the
frontend. One initialization container runs automatically: `user-seed` populates the database with default user accounts.
Kafka topics are created on demand via `auto.create.topics.enable` when producers first publish or consumers subscribe.
Once the stack is running, you can access the services at their default ports.
| Service | URL |
|---|---|
| Frontend | https://localhost:5001 |
| Backend API | https://localhost:443 |
| Kafdrop (Kafka UI) | http://localhost:9000 |
| Jaeger (Tracing) | http://localhost:16686 |
| Grafana | http://localhost:3000 |
The default credentials created by the seed job are user / user123 for a regular account and admin / admin123
for an administrator. You can override these via environment variables if needed.
Hot reloading works for the backend since the source directory is mounted into the container; changes to Python files trigger an automatic server reload. The frontend runs its own dev server with similar behavior.
Docker build strategy¶
The backend uses a multi-stage build with a shared base image to keep images small and rebuilds fast:
```mermaid
flowchart LR
    subgraph Base["Dockerfile.base"]
        B1[python:3.12-slim]
        B2[system deps]
        B3[uv sync --locked]
    end
    subgraph Services["Service Images"]
        S1[Backend + Workers]
    end
    Base --> S1
```
The base image installs all production dependencies:
```dockerfile
# Shared base image for all backend services
# Contains: Python, system deps, uv, and all Python dependencies
# Multi-stage build: gcc + dev headers only in builder, not in final image
FROM python:3.12-slim AS builder

WORKDIR /app

# Install build-time dependencies (gcc + dev headers for C extensions)
RUN apt-get update \
    && apt-get install -y --no-install-recommends \
        gcc \
        libsnappy-dev \
        liblzma-dev \
    && rm -rf /var/lib/apt/lists/*

# Install uv (using Docker Hub mirror - ghcr.io has rate limiting issues)
COPY --from=astral/uv:latest /uv /uvx /bin/

# Pre-compile bytecode for faster startup; copy mode avoids symlink issues with cache mounts
ENV UV_COMPILE_BYTECODE=1 UV_LINK_MODE=copy

# Copy dependency files
COPY pyproject.toml uv.lock ./

# Install Python dependencies with BuildKit cache mount for faster rebuilds
RUN --mount=type=cache,target=/root/.cache/uv \
    uv sync --locked --no-dev --no-install-project

FROM python:3.12-slim

WORKDIR /app

# Install only runtime dependencies (shared libs, no -dev headers, no gcc)
RUN apt-get update && apt-get upgrade -y \
    && apt-get install -y --no-install-recommends \
        curl \
        libsnappy1v5 \
        liblzma5 \
    && rm -rf /var/lib/apt/lists/*

# Copy uv from builder (avoids second Docker Hub pull, guarantees version consistency)
COPY --from=builder /bin/uv /bin/uvx /bin/

# Copy pre-built virtual environment from builder stage
COPY --from=builder /app/.venv /app/.venv

# Copy dependency files (needed for uv to recognize the project)
COPY pyproject.toml uv.lock ./

# Set paths: PYTHONPATH for imports, PATH for venv binaries (no uv run needed at runtime)
ENV PYTHONPATH=/app
ENV PATH="/app/.venv/bin:$PATH"
ENV KUBECONFIG=/app/kubeconfig.yaml
```
Each service image extends the base and copies only application code. Since dependencies rarely change, Docker's layer caching means most builds only rebuild the thin application layer.
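As a sketch of that pattern (the image name, paths, and command below are illustrative, not taken from the repo):

```dockerfile
# Hypothetical service image extending the shared base.
# Only this thin layer changes between rebuilds; the dependency
# layers come from the cached base image.
FROM integr8scode-base:latest

WORKDIR /app

# Copy application code only; dependencies already live in /app/.venv
COPY app/ ./app/

# Workers override this command in docker-compose.yaml
CMD ["gunicorn", "app.main:app"]
```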
For local development, the compose file mounts source directories into the service containers and declares named volumes for persistent data:

```yaml
volumes:
  mongo_data:
  redis_data:
  grafana_data:
  victoria_metrics_data:
  shared_ca:
  loki_data:
  kafka_data:
  kafka_logs:
```
This preserves the container's `.venv` while allowing live code changes. Gunicorn watches for file changes and reloads
automatically. The design means `git clone` followed by `docker compose up` just works — no local Python environment
needed.
To stop everything and clean up volumes:
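Assuming the Compose defaults, teardown goes through either the script's `down` command or Docker Compose directly, with `-v` to remove the named volumes:

```shell
# Stop all containers via the wrapper script
./deploy.sh down

# Or stop containers and delete named volumes (mongo_data, kafka_data, etc.)
docker compose down -v --remove-orphans
```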
Running tests locally¶
The test command runs the full unit and E2E test suite:
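For reference, the invocation is just the script's `test` command:

```shell
./deploy.sh test
```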
This builds images, starts services, waits for the backend health endpoint using curl's built-in retry mechanism, runs
pytest with coverage reporting, then tears down the stack. The curl retry approach is cleaner than shell loops and
avoids issues with Docker Compose's --wait flag (which fails on init containers that exit after completion). Key
services define healthchecks in docker-compose.yaml:
| Service | Healthcheck |
|---|---|
| MongoDB | mongosh ping |
| Redis | redis-cli ping |
| Backend | curl /api/v1/health/live |
| Kafka | kafka-broker-api-versions |
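As an illustrative sketch (not the exact stanza from docker-compose.yaml; intervals and flags are assumptions), a backend healthcheck matching the table might look like:

```yaml
backend:
  healthcheck:
    test: ["CMD", "curl", "-kfsS", "https://localhost:443/api/v1/health/live"]
    interval: 10s
    timeout: 5s
    retries: 10
```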
Services without explicit healthchecks (workers, Grafana, Kafdrop) are considered "started" when their container is running. The test suite doesn't require worker containers since tests instantiate worker classes directly.
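The wait-for-healthy step described above can be approximated with curl's built-in retry flags (the exact flags used in deploy.sh may differ):

```shell
# Retry the liveness endpoint until it answers, up to ~60s total
curl --silent --show-error --insecure \
     --retry 30 --retry-delay 2 --retry-all-errors \
     --fail https://localhost:443/api/v1/health/live
```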
Container resource limits¶
Every long-running service has a mem_limit in docker-compose.yaml to prevent any single container from starving the host. The budget targets a 7.7 GB server with the observability profile enabled, leaving ~2 GB for the OS and page cache.
| Service | mem_limit | Internal cap | Notes |
|---|---|---|---|
| MongoDB | 1024m | wiredTiger 0.4 GB | `--wiredTigerCacheSizeGB 0.4` prevents default 50%-of-RAM behavior |
| Redis | 300m | 256 MB maxmemory | LRU eviction, persistence disabled |
| Kafka | 1280m | JVM `-Xms256m -Xmx1g` | Single-broker KRaft mode, low-throughput workload |
| Backend API | 768m | 2 gunicorn workers | Controlled by `WEB_CONCURRENCY` env var |
| Frontend | 128m | nginx serving static assets | |
| Each worker (x6) | 160m | Single-process Python | k8s-worker, pod-monitor, result-processor, saga-orchestrator, event-replay, dlq-processor |
| Grafana | 192m | | Observability profile |
| Jaeger | 256m | All-in-one, in-memory storage | Observability profile |
| Victoria Metrics | 256m | 30-day retention | Observability profile |
| OTel Collector | 192m | `limit_mib: 150` in memory_limiter processor; includes kafkametrics receiver | Observability profile |
All long-running services — core infrastructure (MongoDB, Redis, Kafka, backend, frontend), all six workers (k8s-worker, pod-monitor, result-processor, saga-orchestrator, event-replay, dlq-processor), and observability components (Grafana, victoria-metrics, otel-collector) — have restart: unless-stopped so they recover automatically after an OOM kill or crash.
Monitoring¶
Check service status using the deploy script or Docker Compose directly.
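Either route surfaces the same information:

```shell
# Via the wrapper script
./deploy.sh status

# Or directly: container state plus health status
docker compose ps
```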
Troubleshooting¶
| Issue | Cause | Solution |
|---|---|---|
| Unknown topic errors | Kafka not ready or wrong prefix | Check docker compose logs kafka |
| MongoDB auth errors | Password mismatch | Verify MONGO_USER/MONGO_PASSWORD env vars match MongoDB init |
| Worker crash loop | Config file missing | Ensure config.<worker>.toml exists |
Kafka topic debugging¶
```bash
docker compose logs kafka
docker compose exec kafka kafka-topics --list --bootstrap-server localhost:29092
```
Topics are auto-created on first use. Each topic name is the event type value with an environment prefix (e.g., `dev_execution_requested`).
k3s crash loop after VPN or IP change¶
Symptoms:
- `systemctl status k3s` shows `Active: activating (auto-restart) (Result: exit-code)`
- k3s repeatedly crashes with `status=1/FAILURE`
- `kubectl` commands fail with `connection refused` or `ServiceUnavailable`
- API intermittently responds, then stops
Root cause:
When the host IP changes (VPN on/off, network switch, DHCP renewal), k3s stores stale IP references in two locations:
- SQLite database (`/var/lib/rancher/k3s/server/db/`) — contains cluster state with the old IP
- TLS certificates (`/var/lib/rancher/k3s/server/tls/`) — generated with the old IP in the SAN field
k3s detects the mismatch between the config (`node-ip` in `/etc/rancher/k3s/config.yaml`) and stored data, causing the crash loop.
Solution:
WARNING: DATA LOSS — the steps below will permanently delete all cluster state, including:

- All deployed workloads (pods, deployments, services)
- All cluster configuration (namespaces, RBAC, ConfigMaps, Secrets)
- All PersistentVolume data stored in the default local-path provisioner

Before proceeding, back up:

- etcd snapshots: `sudo k3s etcd-snapshot save`
- kubeconfig files
- Application manifests
- Any critical PersistentVolume data

Confirm backups are complete before continuing.
```bash
# 1. Stop k3s
sudo systemctl stop k3s

# 2. Delete corrupted database (k3s will rebuild it)
sudo rm -rf /var/lib/rancher/k3s/server/db/

# 3. Delete old TLS certificates (k3s will regenerate them)
sudo rm -rf /var/lib/rancher/k3s/server/tls/

# 4. Start k3s with clean state
sudo systemctl start k3s
```
After k3s restarts, regenerate the application kubeconfig:
```bash
# Regenerate kubeconfig with fresh ServiceAccount token
docker compose restart cert-generator

# Restart workers to pick up new kubeconfig
docker compose restart k8s-worker pod-monitor
```
Verification:
```bash
# Check k3s is running
systemctl status k3s  # Should show "active (running)"

# Test API access
KUBECONFIG=/path/to/backend/kubeconfig.yaml kubectl get namespaces

# Check workers connected
docker logs k8s-worker 2>&1 | tail -5
docker logs pod-monitor 2>&1 | tail -5
```
VPN-specific notes:
When using VPN (e.g., NordVPN with WireGuard/NordLynx):
- LAN Discovery must be enabled: `nordvpn set lan-discovery enabled`
- VPN can interfere with Docker's `host` network mode and k3s flannel networking
- Consider using bridge networking for containers that need to reach k3s
Pre-built images¶
The CI pipeline automatically builds and pushes images to GitHub Container Registry on every merge to main. To use
pre-built images instead of building locally, set `IMAGE_TAG`:
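`IMAGE_TAG` selects the image tag at pull/run time, for example:

```shell
# Pull and run pre-built images from GHCR instead of building locally
IMAGE_TAG=latest docker compose pull
IMAGE_TAG=latest docker compose up -d
```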
Available tags¶
| Tag | Description |
|---|---|
| `latest` | Most recent build from main branch |
| `sha-abc1234` | Specific commit SHA |
| `2026.2.0` | CalVer release version |
Production deployment¶
Merges to main trigger automatic deployment to the production server via the
Release & Deploy workflow. The full pipeline chain is:
1. Stack Tests — unit tests, image build, E2E tests
2. Docker Scan & Promote — Trivy vulnerability scan, promote `sha-xxx` to `latest`
3. Release & Deploy — create CalVer tag + GitHub Release, SSH deploy to production
The deploy step pulls the latest images on the server and recreates containers with zero-downtime health checks. No manual intervention is required for normal merges.
Rollback¶
To roll back to a previous release, use a specific CalVer or SHA tag:
```bash
# On the production server
IMAGE_TAG=2026.2.0 docker compose pull
IMAGE_TAG=2026.2.0 docker compose up -d --remove-orphans
```
Or trigger the Release & Deploy workflow manually with `skip_deploy` enabled to create a release without deploying,
then deploy a specific version via SSH.
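With the GitHub CLI, a manual run might look like this (the `skip_deploy` input name follows the text above; the exact workflow inputs are defined in release-deploy.yml):

```shell
# Trigger the Release & Deploy workflow without deploying
gh workflow run release-deploy.yml -f skip_deploy=true
```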
First-time setup¶
To configure the production server and GitHub Secrets, follow the Required secrets section
in the CI/CD docs. You will need to generate an SSH key pair, create a GitHub PAT with `read:packages` scope, and add
all four secrets (`DEPLOY_HOST`, `DEPLOY_USER`, `DEPLOY_SSH_KEY`, `DEPLOY_GHCR_TOKEN`) to the repository settings.
Key files¶
| File | Purpose |
|---|---|
| `deploy.sh` | Deployment script |
| `docker-compose.yaml` | Full stack definition |
| `backend/Dockerfile.base` | Shared base image with deps |
| `.github/workflows/docker.yml` | CI/CD image build pipeline |
| `.github/workflows/release-deploy.yml` | Release + deploy pipeline |