Logging

This backend uses structured JSON logging with automatic correlation IDs, trace context injection, and sensitive data sanitization. The goal is logs that are both secure against injection attacks and easy to query in aggregation systems like Elasticsearch or Loki.

Architecture

flowchart LR
    Code[Application Code] --> Logger
    Logger --> CF[CorrelationFilter]
    CF --> TF[TracingFilter]
    TF --> JF[JSONFormatter]
    JF --> Output[JSON stdout]

How it's wired

The logger is created once during application startup via dependency injection. The setup_logger function configures a JSON formatter and attaches filters for correlation IDs and OpenTelemetry trace context:

import logging

from opentelemetry import trace

# LOG_LEVELS, JSONFormatter, and CorrelationFilter are defined alongside
# this function in core/logging.py.


def setup_logger(log_level: str) -> logging.Logger:
    """Create and configure the application logger. Called by DI with Settings.LOG_LEVEL."""
    new_logger = logging.getLogger("integr8scode")
    new_logger.handlers.clear()

    console_handler = logging.StreamHandler()
    formatter = JSONFormatter()

    console_handler.setFormatter(formatter)

    correlation_filter = CorrelationFilter()
    console_handler.addFilter(correlation_filter)

    class TracingFilter(logging.Filter):
        def filter(self, record: logging.LogRecord) -> bool:
            # Inline minimal helpers to avoid circular import on tracing.utils
            span = trace.get_current_span()
            trace_id = None
            span_id = None
            if span and span.is_recording():
                span_context = span.get_span_context()
                if span_context.is_valid:
                    trace_id = format(span_context.trace_id, "032x")
                    span_id = format(span_context.span_id, "016x")
            if trace_id:
                record.trace_id = trace_id
            if span_id:
                record.span_id = span_id
            return True

    console_handler.addFilter(TracingFilter())

    new_logger.addHandler(console_handler)

    level = LOG_LEVELS.get(log_level.upper(), logging.DEBUG)
    new_logger.setLevel(level)

    return new_logger
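
In the app this is invoked by the DI container with Settings.LOG_LEVEL, but calling it by hand works the same way (a sketch; Settings here stands in for the real settings object):

settings = Settings()  # exposes LOG_LEVEL, e.g. "INFO"
logger = setup_logger(settings.LOG_LEVEL)
logger.info("Logger configured", extra={"configured_level": settings.LOG_LEVEL})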

The JSON formatter does two things beyond basic formatting. First, it injects context that would be tedious to pass manually: the correlation ID from the current request, the trace and span IDs from OpenTelemetry, and request metadata like method and path. Second, it sanitizes sensitive data by pattern-matching API keys, passwords, bearer and JWT tokens, URLs with embedded credentials, and email addresses:

    def _sanitize_sensitive_data(self, data: str) -> str:
        """Remove or mask sensitive information from log data."""
        # Mask API keys, tokens, and similar sensitive data
        patterns = [
            # API keys and tokens
            (
                r'(["\']?(?:api[_-]?)?(?:key|token|secret|password|passwd|pwd)["\']?\s*[:=]\s*["\']?)([^"\']+)(["\']?)',
                r"\1***API_KEY_OR_TOKEN_REDACTED***\3",
            ),
            # Bearer tokens
            (r"(Bearer\s+)([A-Za-z0-9\-_]+)", r"\1***BEARER_TOKEN_REDACTED***"),
            # JWT tokens
            (r"(eyJ[A-Za-z0-9\-_]+\.eyJ[A-Za-z0-9\-_]+\.[A-Za-z0-9\-_]+)", r"***JWT_REDACTED***"),
            # MongoDB URLs with credentials
            (r"(mongodb(?:\+srv)?://[^:]+:)([^@]+)(@)", r"\1***MONGODB_REDACTED***\3"),
            # Generic URLs with credentials
            (r"(https?://[^:]+:)([^@]+)(@)", r"\1***URL_CREDS_REDACTED***\3"),
            # Email addresses (optional; remove this pattern if addresses should be kept)
            (r"([a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,})", r"***EMAIL_REDACTED***"),
        ]

        for pattern, replacement in patterns:
            data = re.sub(pattern, replacement, data, flags=re.IGNORECASE)

        return data
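
The format method itself is not excerpted here, but its shape follows from the pieces above: build a dict, copy over whatever the filters attached to the record, serialize, then sanitize. A minimal sketch, assuming field names beyond those documented in the tables below:

import json
import logging
from datetime import datetime, timezone

class JSONFormatter(logging.Formatter):
    # Attributes present on every LogRecord; anything else was attached by
    # a filter or passed via extra={...}
    _STANDARD_ATTRS = set(
        logging.LogRecord("", 0, "", 0, "", None, None).__dict__
    ) | {"message", "asctime"}

    def format(self, record: logging.LogRecord) -> str:
        entry = {
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # correlation_id, trace_id, span_id, request metadata, and any
        # extra={...} fields all arrive as non-standard record attributes
        for key, value in record.__dict__.items():
            if key not in self._STANDARD_ATTRS:
                entry[key] = value
        # _sanitize_sensitive_data is the method shown above; serialize
        # first and sanitize second, so secrets inside extra fields are
        # masked along with the message
        return self._sanitize_sensitive_data(json.dumps(entry, default=str))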

Structured logging

All log calls use the extra parameter to pass structured data rather than interpolating values into the message string. The message itself is a static string that describes what happened; the details go in extra where they become separate JSON fields.

# This is how logging looks throughout the codebase
self.logger.info(
    "Event deleted by admin",
    extra={
        "event_id": event_id,
        "admin_email": admin.email,
        "event_type": result.event_type,
    },
)
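
With the formatter in place, that call is emitted as a single JSON line along these lines (values are illustrative; note that the sanitizer masks the admin's email address):

{"timestamp": "2024-01-15T10:23:45.120120+00:00", "level": "INFO", "logger": "integr8scode", "message": "Event deleted by admin", "correlation_id": "3f2a9c1e", "event_id": "abc123", "admin_email": "***EMAIL_REDACTED***", "event_type": "execution_completed"}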

The reason for this pattern is partly about queryability - log aggregators can index the event_id field separately and let you filter on it - but mostly about security. When you interpolate user-controlled data into a log message, you open the door to log injection attacks.

Log injection

Log injection is what happens when an attacker crafts input that corrupts your log output. The classic attack looks like this: a user submits an event ID containing a newline and a fake log entry.

# Attacker submits this as event_id
event_id = "abc123\n[CRITICAL] System compromised - contact security@evil.com"

# If you log it directly in the message...
logger.warning(f"Processing event {event_id}")

# Your log output now contains a forged critical alert

The fix is to keep user data out of the message string entirely. When you put it in extra, the JSON formatter escapes special characters, and the malicious content becomes a harmless string value rather than a log line injection.
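
Concretely, the safe version of the example above:

# Same attacker-controlled event_id, kept out of the message string
logger.warning("Processing event", extra={"event_id": event_id})

# json.dumps escapes the newline, so the output stays a single line:
# {"level": "WARNING", "message": "Processing event",
#  "event_id": "abc123\n[CRITICAL] System compromised - ..."}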

The codebase treats these as user-controlled and keeps them in extra: path parameters like execution_id or saga_id, query parameters, request body fields, Kafka message content, database results derived from user input, and exception messages (which often contain user data).
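
The exception case is the easiest to miss, since error text often echoes back the input that caused it. A sketch of the pattern (the service and exception names here are hypothetical):

try:
    await saga_service.cancel(saga_id)
except SagaCancellationError as exc:
    # str(exc) may echo the user-supplied saga_id or other request data,
    # so it goes in extra rather than into the message string
    logger.error(
        "Saga cancellation failed",
        extra={"saga_id": saga_id, "error": str(exc)},
    )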

What gets logged

Correlation and trace IDs are injected automatically by filters:

Field            Source                        Purpose
correlation_id   Request header or generated   Track request across services
trace_id         OpenTelemetry                 Link to distributed traces
span_id          OpenTelemetry                 Link to specific span
request_method   HTTP request                  GET, POST, etc.
request_path     HTTP request                  API endpoint path
client_host      HTTP request                  Client IP address

For domain-specific context, developers add fields to extra based on what operation they're logging. The pattern is consistent: the message says what happened, extra says to what and by whom.

Practical use

When something goes wrong, start by filtering logs by correlation_id to see everything that happened during that request. If you need to correlate with traces, use the trace_id to jump to Jaeger.
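
In Loki, for example, that first filter is a one-liner (the app label is an assumption about how log shipping is configured):

{app="integr8scode"} | json | correlation_id="3f2a9c1e"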

Log level   Use case
DEBUG       Detailed diagnostics (noisy; local debugging only)
INFO        Normal operations (started, completed, processed)
WARNING     Recoverable issues
ERROR       Failures requiring attention

The log level is controlled by the LOG_LEVEL environment variable.

Key files

File                  Purpose
core/logging.py       Logger setup, filters, JSON formatter
core/correlation.py   Correlation ID middleware