Log Aggregation (Loki)¶
Loki collects structured JSON logs from the backend and all workers so you can search, filter and correlate them with traces in Grafana. Without it the logs only live on container stdout — useful for docker logs but not for querying across services or time ranges.
Architecture¶
flowchart LR
Backend[Backend / Workers] -->|structlog JSON| Stdlib[stdlib logging]
Stdlib -->|stdout| Docker[Container stdout]
Stdlib -->|OTLP gRPC| Collector[OTel Collector :4317]
Collector -->|otlphttp| Loki[Loki :3100]
Loki --> Grafana
Grafana -->|trace_id link| Jaeger
Each Python process (the FastAPI backend and all six workers) emits structured JSON logs via structlog. The setup_log_exporter function attaches an OpenTelemetry LoggingHandler to stdlib logging, which forwards log records over OTLP gRPC to the OTel Collector. The collector then pushes them to Loki via its native OTLP HTTP endpoint (/otlp/v1/logs). Logs still go to stdout as before — the OTLP path is additive.
How it's wired¶
Python side¶
The log exporter is initialized once per process, right after setup_logger. The backend calls it in create_app, and each worker calls it in its main():
def setup_logger(log_level: str) -> structlog.stdlib.BoundLogger:
"""Configure structlog and return a bound logger for the application.
Called by DI with Settings.LOG_LEVEL and also directly by main.py/lifespan.
"""
structlog.configure(
processors=[
structlog.contextvars.merge_contextvars,
structlog.stdlib.filter_by_level,
structlog.stdlib.add_logger_name,
structlog.stdlib.add_log_level,
structlog.processors.TimeStamper(fmt="iso"),
add_otel_context,
structlog.processors.StackInfoRenderer(),
structlog.processors.format_exc_info,
sanitize_sensitive_data,
structlog.processors.JSONRenderer(),
],
wrapper_class=structlog.stdlib.BoundLogger,
context_class=dict,
logger_factory=structlog.stdlib.LoggerFactory(),
cache_logger_on_first_use=True,
)
logging.basicConfig(format="%(message)s", handlers=[logging.StreamHandler()])
logging.getLogger().setLevel(log_level.upper())
logger: structlog.stdlib.BoundLogger = structlog.get_logger("integr8scode")
return logger
The exporter follows the same guard pattern as setup_metrics: skipped when TESTING is true or when OTEL_EXPORTER_OTLP_ENDPOINT is empty. In those cases stdlib logging still works, it just doesn't ship anywhere.
OTel Collector¶
The collector receives OTLP logs on :4317 (same endpoint used for traces and metrics), runs them through the resource and attributes processors to attach service.name and service.namespace labels, then exports to Loki:
Loki 3.x natively accepts OTLP at /otlp/v1/logs. Resource attributes (service.name, service.namespace) become index labels. OTLP trace_id and span_id become structured metadata, which is what enables the log-to-trace linking in Grafana.
Loki¶
Loki runs as a single instance with filesystem storage:
auth_enabled: false
server:
http_listen_port: 3100
grpc_listen_port: 9096
log_level: warn
common:
ring:
instance_addr: 127.0.0.1
kvstore:
store: inmemory
replication_factor: 1
path_prefix: /loki
schema_config:
configs:
- from: "2026-01-01"
store: tsdb
object_store: filesystem
schema: v13
index:
prefix: index_
period: 24h
storage_config:
filesystem:
directory: /loki/chunks
limits_config:
retention_period: 720h
allow_structured_metadata: true
ruler:
enable_api: false
storage:
type: local
local:
directory: /loki/rules
compactor:
working_directory: /loki/compactor
compaction_interval: 10m
retention_enabled: true
delete_request_store: filesystem
retention_delete_delay: 2h
retention_delete_worker_count: 150
Key settings:
| Setting | Value | Why |
|---|---|---|
allow_structured_metadata |
true |
Required for OTLP ingestion — stores trace_id, span_id, service.name as structured metadata |
schema: v13 + store: tsdb |
— | Modern Loki 3.x defaults, required for structured metadata support |
retention_period |
720h (30 days) |
Matches Victoria Metrics retention |
auth_enabled |
false |
Internal Docker network only |
Grafana datasource¶
Loki is provisioned as a datasource with a derived field that parses trace_id from the JSON log body and links to Jaeger:
- name: Loki
type: loki
uid: loki
access: proxy
url: http://loki:3100
editable: true
jsonData:
derivedFields:
- datasourceUid: jaeger
matcherRegex: '"trace_id":"(\w+)"'
name: TraceID
url: '$${__value.raw}'
This means any log line containing a trace_id field gets a clickable link to the corresponding Jaeger trace.
Practical use¶
Open Grafana Explore (http://localhost:3000/explore), select the Loki datasource, and run LogQL queries:
# All logs from the saga orchestrator
{service_name="integr8scode-saga-orchestrator"}
# Errors across all services
{service_namespace="integr8scode"} |= `"level":"error"`
# Logs for a specific execution
{service_namespace="integr8scode"} | json | execution_id="abc123"
# Logs correlated with a trace
{service_namespace="integr8scode"} | json | trace_id="0af7651916cd43dd8448eb211c80319c"
When you find an interesting log line, click the TraceID link to jump to the full distributed trace in Jaeger. This works in the other direction too — from a Jaeger trace you can switch to Loki and filter by the same trace_id to see all logs emitted during that trace.
Docker Compose¶
Loki runs under the observability profile alongside Grafana, Jaeger, and Victoria Metrics:
The container has a 256 MiB memory limit and stores data in the loki_data volume. The OTel Collector and Grafana both depend on Loki being healthy before they start.
| Port | Binding | Purpose |
|---|---|---|
| 3100 | 127.0.0.1:3100 |
HTTP API and OTLP ingestion |
Key files¶
| File | Purpose |
|---|---|
loki/loki-config.yaml |
Loki server configuration |
otel-collector-config.yaml |
Collector pipeline with otlphttp/loki exporter |
grafana/provisioning/datasources/datasources.yml |
Loki datasource with trace linking |
core/logging.py |
setup_log_exporter — OTLP log export initialization |