Logging

This backend uses structured JSON logging with automatic correlation IDs, trace context injection, and sensitive data sanitization. The goal is logs that are both secure against injection attacks and easy to query in aggregation systems like Elasticsearch or Loki.

Architecture

flowchart LR
    Code[Application Code] --> Logger
    Logger --> CF[CorrelationFilter]
    CF --> TF[TracingFilter]
    TF --> JF[JSONFormatter]
    JF --> Output[JSON stdout]

How it's wired

The logger is created once during application startup via dependency injection. The setup_logger function configures a JSON formatter and attaches filters for correlation IDs and OpenTelemetry trace context:

import logging

from opentelemetry import trace

# LOG_LEVELS, JSONFormatter, and CorrelationFilter are defined alongside
# this function in core/logging.py.


def setup_logger(log_level: str) -> logging.Logger:
    """Create and configure the application logger. Called by DI with Settings.LOG_LEVEL."""
    new_logger = logging.getLogger("integr8scode")
    new_logger.handlers.clear()

    console_handler = logging.StreamHandler()
    formatter = JSONFormatter()

    console_handler.setFormatter(formatter)

    correlation_filter = CorrelationFilter()
    console_handler.addFilter(correlation_filter)

    class TracingFilter(logging.Filter):
        def filter(self, record: logging.LogRecord) -> bool:
            # Inline minimal helpers to avoid circular import on tracing.utils
            span = trace.get_current_span()
            trace_id = None
            span_id = None
            if span and span.is_recording():
                span_context = span.get_span_context()
                if span_context.is_valid:
                    trace_id = format(span_context.trace_id, "032x")
                    span_id = format(span_context.span_id, "016x")
            if trace_id:
                record.trace_id = trace_id
            if span_id:
                record.span_id = span_id
            return True

    console_handler.addFilter(TracingFilter())

    new_logger.addHandler(console_handler)

    level = LOG_LEVELS.get(log_level.upper(), logging.DEBUG)
    new_logger.setLevel(level)

    return new_logger
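
In the app this is invoked by the DI container with Settings.LOG_LEVEL, but calling it by hand works the same way (a sketch; Settings here stands in for the real settings object):

settings = Settings()  # exposes LOG_LEVEL, e.g. "INFO"
logger = setup_logger(settings.LOG_LEVEL)
logger.info("Logger configured", extra={"configured_level": settings.LOG_LEVEL})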

The JSON formatter does two things beyond basic formatting. First, it injects context that would be tedious to pass manually: the correlation ID from the current request, the trace and span IDs from OpenTelemetry, and request metadata like method and path. Second, it sanitizes sensitive data by pattern-matching API keys, passwords, bearer and JWT tokens, URLs with embedded credentials, and email addresses:

    def _sanitize_sensitive_data(self, data: str) -> str:
        """Remove or mask sensitive information from log data."""
        # Mask API keys, tokens, and similar sensitive data
        patterns = [
            # API keys and tokens
            (
                r'(["\']?(?:api[_-]?)?(?:key|token|secret|password|passwd|pwd)["\']?\s*[:=]\s*["\']?)([^"\']+)(["\']?)',
                r"\1***API_KEY_OR_TOKEN_REDACTED***\3",
            ),
            # Bearer tokens
            (r"(Bearer\s+)([A-Za-z0-9\-_]+)", r"\1***BEARER_TOKEN_REDACTED***"),
            # JWT tokens
            (r"(eyJ[A-Za-z0-9\-_]+\.eyJ[A-Za-z0-9\-_]+\.[A-Za-z0-9\-_]+)", r"***JWT_REDACTED***"),
            # MongoDB URLs with credentials
            (r"(mongodb(?:\+srv)?://[^:]+:)([^@]+)(@)", r"\1***MONGODB_REDACTED***\3"),
            # Generic URLs with credentials
            (r"(https?://[^:]+:)([^@]+)(@)", r"\1***URL_CREDS_REDACTED***\3"),
            # Email addresses (optional; remove this pattern if addresses should be kept)
            (r"([a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,})", r"***EMAIL_REDACTED***"),
        ]

        for pattern, replacement in patterns:
            data = re.sub(pattern, replacement, data, flags=re.IGNORECASE)

        return data
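
The format method itself is not excerpted here, but its shape follows from the pieces above: build a dict, copy over whatever the filters attached to the record, serialize, then sanitize. A minimal sketch, assuming field names beyond those documented in the tables below:

import json
import logging
from datetime import datetime, timezone

class JSONFormatter(logging.Formatter):
    # Attributes present on every LogRecord; anything else was attached by
    # a filter or passed via extra={...}
    _STANDARD_ATTRS = set(
        logging.LogRecord("", 0, "", 0, "", None, None).__dict__
    ) | {"message", "asctime"}

    def format(self, record: logging.LogRecord) -> str:
        entry = {
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # correlation_id, trace_id, span_id, request metadata, and any
        # extra={...} fields all arrive as non-standard record attributes
        for key, value in record.__dict__.items():
            if key not in self._STANDARD_ATTRS:
                entry[key] = value
        # _sanitize_sensitive_data is the method shown above; serialize
        # first and sanitize second, so secrets inside extra fields are
        # masked along with the message
        return self._sanitize_sensitive_data(json.dumps(entry, default=str))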

Structured logging

All log calls use the extra parameter to pass structured data rather than interpolating values into the message string. The message itself is a static string that describes what happened; the details go in extra where they become separate JSON fields.

# This is how logging looks throughout the codebase
self.logger.info(
    "Event deleted by admin",
    extra={
        "event_id": event_id,
        "admin_email": admin.email,
        "event_type": result.event_type,
    },
)
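
With the formatter in place, that call is emitted as a single JSON line along these lines (values are illustrative; note that the sanitizer masks the admin's email address):

{"timestamp": "2024-01-15T10:23:45.120120+00:00", "level": "INFO", "logger": "integr8scode", "message": "Event deleted by admin", "correlation_id": "3f2a9c1e", "event_id": "abc123", "admin_email": "***EMAIL_REDACTED***", "event_type": "execution_completed"}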

The reason for this pattern is partly about queryability - log aggregators can index the event_id field separately and let you filter on it - but mostly about security. When you interpolate user-controlled data into a log message, you open the door to log injection attacks.

Log injection

Log injection is what happens when an attacker crafts input that corrupts your log output. The classic attack looks like this: a user submits an event ID containing a newline and a fake log entry.

# Attacker submits this as event_id
event_id = "abc123\n[CRITICAL] System compromised - contact security@evil.com"

# If you log it directly in the message...
logger.warning(f"Processing event {event_id}")

# Your log output now contains a forged critical alert

The fix is to keep user data out of the message string entirely. When you put it in extra, the JSON formatter escapes special characters, and the malicious content becomes a harmless string value rather than a log line injection.
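
Concretely, the safe version of the example above:

# Same attacker-controlled event_id, kept out of the message string
logger.warning("Processing event", extra={"event_id": event_id})

# json.dumps escapes the newline, so the output stays a single line:
# {"level": "WARNING", "message": "Processing event",
#  "event_id": "abc123\n[CRITICAL] System compromised - ..."}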

The codebase treats these as user-controlled and keeps them in extra: path parameters like execution_id or saga_id, query parameters, request body fields, Kafka message content, database results derived from user input, and exception messages (which often contain user data).
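
The exception case is the easiest to miss, since error text often echoes back the input that caused it. A sketch of the pattern (the service and exception names here are hypothetical):

try:
    await saga_service.cancel(saga_id)
except SagaCancellationError as exc:
    # str(exc) may echo the user-supplied saga_id or other request data,
    # so it goes in extra rather than into the message string
    logger.error(
        "Saga cancellation failed",
        extra={"saga_id": saga_id, "error": str(exc)},
    )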

What gets logged

Correlation and trace IDs are injected automatically by filters:

Field            Source                        Purpose
correlation_id   Request header or generated   Track request across services
trace_id         OpenTelemetry                 Link to distributed traces
span_id          OpenTelemetry                 Link to specific span
request_method   HTTP request                  GET, POST, etc.
request_path     HTTP request                  API endpoint path
client_host      HTTP request                  Client IP address

For domain-specific context, developers add fields to extra based on what operation they're logging. The pattern is consistent: the message says what happened, extra says to what and by whom.

Practical use

When something goes wrong, start by filtering logs by correlation_id to see everything that happened during that request. If you need to correlate with traces, use the trace_id to jump to Jaeger.
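
In Loki, for example, that first filter is a one-liner (the app label is an assumption about how log shipping is configured):

{app="integr8scode"} | json | correlation_id="3f2a9c1e"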

Log level   Use case
DEBUG       Detailed diagnostics (noisy; local debugging only)
INFO        Normal operations (started, completed, processed)
WARNING     Recoverable issues
ERROR       Failures requiring attention

The log level is controlled by the LOG_LEVEL environment variable.

Key files

File                  Purpose
core/logging.py       Logger setup, filters, JSON formatter
core/correlation.py   Correlation ID middleware