Logging¶
This backend uses structured JSON logging with automatic correlation IDs, trace context injection, and sensitive data sanitization. The goal is logs that are both secure against injection attacks and easy to query in aggregation systems like Elasticsearch or Loki.
Architecture¶
flowchart LR
Code[Application Code] --> Logger
Logger --> CF[CorrelationFilter]
CF --> TF[TracingFilter]
TF --> JF[JSONFormatter]
JF --> Output[JSON stdout]
How it's wired¶
The logger is created once during application startup via dependency injection. The
setup_logger function configures a
JSON formatter and attaches filters for correlation IDs and OpenTelemetry trace context:
def setup_logger(log_level: str) -> logging.Logger:
"""Create and configure the application logger. Called by DI with Settings.LOG_LEVEL."""
new_logger = logging.getLogger("integr8scode")
new_logger.handlers.clear()
console_handler = logging.StreamHandler()
formatter = JSONFormatter()
console_handler.setFormatter(formatter)
correlation_filter = CorrelationFilter()
console_handler.addFilter(correlation_filter)
class TracingFilter(logging.Filter):
def filter(self, record: logging.LogRecord) -> bool:
# Inline minimal helpers to avoid circular import on tracing.utils
span = trace.get_current_span()
trace_id = None
span_id = None
if span and span.is_recording():
span_context = span.get_span_context()
if span_context.is_valid:
trace_id = format(span_context.trace_id, "032x")
span_id = format(span_context.span_id, "016x")
if trace_id:
record.trace_id = trace_id
if span_id:
record.span_id = span_id
return True
console_handler.addFilter(TracingFilter())
new_logger.addHandler(console_handler)
level = LOG_LEVELS.get(log_level.upper(), logging.DEBUG)
new_logger.setLevel(level)
return new_logger
The JSON formatter does two things beyond basic formatting. First, it injects context that would be tedious to pass manually—the correlation ID from the current request, the trace and span IDs from OpenTelemetry, and request metadata like method and path. Second, it sanitizes sensitive data by pattern-matching things like API keys, JWT tokens, and database URLs:
def _sanitize_sensitive_data(self, data: str) -> str:
"""Remove or mask sensitive information from log data."""
# Mask API keys, tokens, and similar sensitive data
patterns = [
# API keys and tokens
(
r'(["\']?(?:api[_-]?)?(?:key|token|secret|password|passwd|pwd)["\']?\s*[:=]\s*["\']?)([^"\']+)(["\']?)',
r"\1***API_KEY_OR_TOKEN_REDACTED***\3",
),
# Bearer tokens
(r"(Bearer\s+)([A-Za-z0-9\-_]+)", r"\1***BEARER_TOKEN_REDACTED***"),
# JWT tokens
(r"(eyJ[A-Za-z0-9\-_]+\.eyJ[A-Za-z0-9\-_]+\.[A-Za-z0-9\-_]+)", r"***JWT_REDACTED***"),
# MongoDB URLs with credentials
(r"(mongodb(?:\+srv)?://[^:]+:)([^@]+)(@)", r"\1***MONGODB_REDACTED***\3"),
# Generic URLs with credentials
(r"(https?://[^:]+:)([^@]+)(@)", r"\1***URL_CREDS_REDACTED***\3"),
# Email addresses (optional - uncomment if needed)
(r"([a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,})", r"***EMAIL_REDACTED***"),
]
for pattern, replacement in patterns:
data = re.sub(pattern, replacement, data, flags=re.IGNORECASE)
return data
Structured logging¶
All log calls use the extra parameter to pass structured data rather than interpolating values into the message string. The message itself is a static string that describes what happened; the details go in extra where they become separate JSON fields.
# This is how logging looks throughout the codebase
self.logger.info(
"Event deleted by admin",
extra={
"event_id": event_id,
"admin_email": admin.email,
"event_type": result.event_type,
},
)
The reason for this pattern is partly about queryability - log aggregators can index the event_id field separately and let you filter on it - but mostly about security. When you interpolate user-controlled data into a log message, you open the door to log injection attacks.
Log injection¶
Log injection is what happens when an attacker crafts input that corrupts your log output. The classic attack looks like this: a user submits an event ID containing a newline and a fake log entry.
# Attacker submits this as event_id
event_id = "abc123\n[CRITICAL] System compromised - contact security@evil.com"
# If you log it directly in the message...
logger.warning(f"Processing event {event_id}")
# Your log output now contains a forged critical alert
The fix is to keep user data out of the message string entirely. When you put it in extra, the JSON formatter escapes special characters, and the malicious content becomes a harmless string value rather than a log line injection.
The codebase treats these as user-controlled and keeps them in extra: path parameters like execution_id or saga_id,
query parameters, request body fields, Kafka message content, database results derived from user input, and exception
messages (which often contain user data).
What gets logged¶
Correlation and trace IDs are injected automatically by filters:
| Field | Source | Purpose |
|---|---|---|
correlation_id |
Request header or generated | Track request across services |
trace_id |
OpenTelemetry | Link to distributed traces |
span_id |
OpenTelemetry | Link to specific span |
request_method |
HTTP request | GET, POST, etc. |
request_path |
HTTP request | API endpoint path |
client_host |
HTTP request | Client IP address |
For domain-specific context, developers add fields to extra based on what operation they're logging. The pattern is
consistent: the message says what happened, extra says to what and by whom.
Practical use¶
When something goes wrong, start by filtering logs by correlation_id to see everything that happened during that
request. If you need to correlate with traces, use the trace_id to jump to Jaeger.
| Log Level | Use case |
|---|---|
| DEBUG | Detailed diagnostics (noisy, local debugging only) |
| INFO | Normal operations (started, completed, processed) |
| WARNING | Recoverable issues |
| ERROR | Failures requiring attention |
The log level is controlled by the LOG_LEVEL environment variable.
Key files¶
| File | Purpose |
|---|---|
core/logging.py |
Logger setup, filters, JSON formatter |
core/correlation.py |
Correlation ID middleware |