DLQ Processor¶
The DLQ (Dead Letter Queue) processor drains and retries dead-lettered messages with exponential backoff. When a message fails processing too many times, it ends up in the DLQ for manual inspection and retry.
For the full picture on how dead-lettering works, see Dead Letter Queue.
graph LR
DLQ[(Dead Letter Queue)] --> Processor[DLQ Processor]
Processor --> Original[(Original Topic)]
Processor --> Archive[(Archive)]
How it works¶
When a Kafka consumer fails to process a message after exhausting its retries, the message is sent to the dead letter queue topic. The DLQ processor picks up these messages and applies a retry policy:
- Check if the message has exceeded max retry attempts
- If not, wait for the backoff delay
- Republish to the original topic
- If successful, remove from DLQ
- If max attempts exceeded, archive the message
The processor uses exponential backoff — each retry waits longer than the previous one, up to a maximum delay. This prevents overwhelming downstream services during outages.
Configuration¶
| Variable | Default | Description |
|---|---|---|
DLQ_RETRY_MAX_ATTEMPTS |
5 | Maximum retry attempts |
DLQ_RETRY_BASE_DELAY_SECONDS |
60 | Base retry delay |
DLQ_RETRY_MAX_DELAY_SECONDS |
3,600 | Maximum retry delay (1 hour) |
DLQ_RETENTION_DAYS |
7 | Message retention |
DLQ_WARNING_THRESHOLD |
100 | Threshold for warning alerts |
DLQ_CRITICAL_THRESHOLD |
1,000 | Threshold for critical alerts |
Monitoring¶
The DLQ can be monitored via the admin API:
GET /api/v1/dlq/stats— DLQ statistics by status, topic, event typeGET /api/v1/dlq/messages— List DLQ messages with filteringGET /api/v1/dlq/messages/{event_id}— Retrieve a specific message by IDGET /api/v1/dlq/topics— List all topics with DLQ messagesPOST /api/v1/dlq/retry— Manually retry messagesPOST /api/v1/dlq/retry-policy— Configure retry policy for a topicDELETE /api/v1/dlq/messages/{event_id}— Discard a message
Key files¶
| File | Purpose |
|---|---|
run_dlq_processor.py |
Entry point |
manager.py |
DLQ management logic |
dlq.py |
Admin API routes |
Deployment¶
dlq-processor:
image: ghcr.io/hardmax71/integr8scode/backend:${IMAGE_TAG:-latest}
command: ["python", "workers/run_dlq_processor.py"]
Usually runs as a single replica. The processor is designed to handle periodic retries, not real-time processing.