Rate Limiting¶
The platform uses Redis-backed rate limiting with per-user and per-endpoint controls. Two algorithms are available—sliding window for precise time-based limits and token bucket for bursty workloads. Authenticated users are tracked by user ID; anonymous requests fall back to IP-based limiting.
Architecture¶
```mermaid
flowchart TB
    subgraph "Request Flow"
        REQ[Incoming Request] --> MW[RateLimitMiddleware]
        MW --> AUTH{Authenticated?}
        AUTH -->|Yes| UID[User ID]
        AUTH -->|No| IP[IP Address]
        UID --> CHECK[Check Rate Limit]
        IP --> CHECK
    end
    subgraph "Rate Limit Service"
        CHECK --> CONFIG[Load Config from Redis]
        CONFIG --> MATCH[Match Endpoint Rule]
        MATCH --> ALGO{Algorithm}
        ALGO -->|Sliding Window| SW[ZSET Counter]
        ALGO -->|Token Bucket| TB[Token State]
        SW --> RESULT[RateLimitStatus]
        TB --> RESULT
    end
    subgraph "Response"
        RESULT --> ALLOWED{Allowed?}
        ALLOWED -->|Yes| HEADERS[Add Rate Limit Headers]
        ALLOWED -->|No| REJECT[429 Too Many Requests]
        HEADERS --> APP[Application]
    end
```
Algorithms¶
The rate limiter supports two algorithms, selectable per rule.
Sliding window tracks requests in a Redis sorted set, with timestamps as scores. Each request adds an entry; stale entries outside the window are pruned. This provides precise limiting but uses more memory for high-traffic endpoints.
Token bucket maintains a bucket of tokens that refill at a constant rate. Each request consumes one token. When the bucket is empty, requests are rejected until tokens refill. The `burst_multiplier` controls how many extra tokens can accumulate beyond the base limit, allowing controlled bursts.
```python
from enum import Enum

class RateLimitAlgorithm(str, Enum):
    SLIDING_WINDOW = "sliding_window"
    TOKEN_BUCKET = "token_bucket"
    FIXED_WINDOW = "fixed_window"   # defined but not currently selectable
    LEAKY_BUCKET = "leaky_bucket"   # defined but not currently selectable
```
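To make the token bucket arithmetic concrete, here is a toy in-memory sketch of the refill step, assuming tokens accrue at `limit / window` per second (this is illustrative only; the Redis-backed implementation appears later on this page):

```python
def refill(tokens: float, elapsed_seconds: float, limit: int,
           window_seconds: int, burst_multiplier: float) -> float:
    """Toy refill math: tokens accrue at limit/window per second,
    capped at limit * burst_multiplier."""
    refill_rate = limit / window_seconds
    max_tokens = limit * burst_multiplier
    return min(max_tokens, tokens + elapsed_seconds * refill_rate)

# With limit=10, window=60s, burst_multiplier=1.5, a full bucket holds
# 15 tokens: a client can burst 15 requests, then is throttled to one
# request every 6 seconds as tokens trickle back.
print(refill(0.0, 30.0, limit=10, window_seconds=60, burst_multiplier=1.5))  # 5.0
```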
Default Rules¶
The platform ships with default rate limits organized by endpoint group. Higher priority rules match first:
| Pattern | Group | Limit | Window | Priority |
|---|---|---|---|---|
| `^/api/v1/execute` | execution | 10 req | 60s | 10 |
| `^/api/v1/auth/.*` | auth | 20 req | 60s | 7 |
| `^/api/v1/admin/.*` | admin | 100 req | 60s | 5 |
| `^/api/v1/events/.*` | sse | 5 req | 60s | 3 |
| `^/api/v1/ws` | websocket | 5 req | 60s | 3 |
| `^/api/v1/.*` | api | 60 req | 60s | 1 |
Execution endpoints have the strictest limits since they spawn Kubernetes pods. The catch-all API rule (priority 1) applies to any endpoint not matching a more specific pattern.
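Expressed as code, the highest-priority rule above might look like the following sketch. The exact `RateLimitRule` field names are not shown on this page, so the ones below are assumptions inferred from the table:

```python
from dataclasses import dataclass

# Field names here are assumptions inferred from the table above;
# the real RateLimitRule model may differ.
@dataclass
class RateLimitRule:
    pattern: str                     # regex matched against the request path
    group: str                       # endpoint group label
    requests: int                    # max requests per window
    window_seconds: int
    priority: int                    # higher priority rules match first
    algorithm: RateLimitAlgorithm = RateLimitAlgorithm.SLIDING_WINDOW

EXECUTION_RULE = RateLimitRule(
    pattern=r"^/api/v1/execute",
    group="execution",
    requests=10,
    window_seconds=60,
    priority=10,
)
```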
WebSocket rule
The `/api/v1/ws` pattern is reserved for future WebSocket support. The platform currently uses Server-Sent Events (SSE) for real-time updates via `/api/v1/events/*`.
Middleware Integration¶
The `RateLimitMiddleware` intercepts all HTTP requests, extracts the user identifier, and checks against the configured limits:
```python
class RateLimitMiddleware:
    """
    Middleware for rate limiting API requests.

    Features:
    - User-based limits for authenticated requests
    - IP-based limits for anonymous requests
    - Dynamic configuration via Redis
    - Graceful degradation on errors
    """

    # Paths exempt from rate limiting
    EXCLUDED_PATHS = frozenset(
        {
            "/health",
            "/metrics",
            "/docs",
            "/openapi.json",
            "/favicon.ico",
            "/api/v1/auth/login",  # Auth endpoints handle their own limits
            "/api/v1/auth/register",
            "/api/v1/auth/logout",
        }
    )
```
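A minimal sketch of the request path, assuming a Starlette-style `BaseHTTPMiddleware` for brevity (the real middleware is a raw ASGI middleware, and helper names like `check_rate_limit` and `_rate_limit_headers` below are illustrative, not the project's actual API):

```python
from starlette.middleware.base import BaseHTTPMiddleware
from starlette.requests import Request
from starlette.responses import JSONResponse

class RateLimitMiddlewareSketch(BaseHTTPMiddleware):
    """Illustrative only; mirrors the flow described above."""

    async def dispatch(self, request: Request, call_next):
        # Skip exempt paths entirely
        if request.url.path in RateLimitMiddleware.EXCLUDED_PATHS:
            return await call_next(request)

        user_id = self._extract_user_id(request)
        # `check_rate_limit` is a hypothetical service method returning
        # a RateLimitStatus with `allowed` and header values.
        status = await self.rate_limit_service.check_rate_limit(user_id, request.url.path)

        if not status.allowed:
            return JSONResponse(
                {"detail": "Rate limit exceeded"},
                status_code=429,
                headers=self._rate_limit_headers(status),
            )

        response = await call_next(request)
        response.headers.update(self._rate_limit_headers(status))
        return response
```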
For authenticated requests, the middleware uses the user ID from the request state. Anonymous requests are identified by client IP address:
```python
def _extract_user_id(self, request: Request) -> str:
    """Prefer the authenticated user's ID; fall back to the client IP."""
    # getattr reads the attribute-style State object safely
    user: User | None = getattr(request.state, "user", None)
    if user:
        return str(user.user_id)
    return f"ip:{get_client_ip(request)}"
```
Response Headers¶
Every response includes rate limit headers so clients can implement backoff logic:
| Header | Description |
|---|---|
| `X-RateLimit-Limit` | Maximum requests allowed in the window |
| `X-RateLimit-Remaining` | Requests remaining in the current window |
| `X-RateLimit-Reset` | Unix timestamp when the window resets |
| `Retry-After` | Seconds to wait before retrying (429 responses only) |
When a request is rejected, the middleware returns a 429 response with these headers plus a JSON body describing the error, along these lines (the exact payload shape is illustrative):
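```json
{
  "detail": "Rate limit exceeded. Please retry later."
}
```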
Usage Statistics¶
Administrators can query current rate limit usage for any user via the admin API. The response follows the IETF RateLimit headers draft convention, using a single `remaining` field that represents available capacity regardless of the underlying algorithm:
```python
from dataclasses import dataclass

@dataclass
class EndpointUsageStats:
    """Usage statistics for a single endpoint (IETF RateLimit-style)."""

    algorithm: RateLimitAlgorithm
    remaining: int
```
For sliding window, `remaining` is calculated as `limit - requests_in_window`. For token bucket, it's the current token count. This unified representation lets clients implement backoff logic without caring which algorithm is in use.
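A hypothetical helper showing that calculation (the function and parameter names are illustrative, not the service's actual API):

```python
def compute_remaining(
    algorithm: RateLimitAlgorithm,
    limit: int,
    requests_in_window: int = 0,
    tokens: float = 0.0,
) -> int:
    """Unify 'remaining capacity' across algorithms (illustrative)."""
    if algorithm is RateLimitAlgorithm.SLIDING_WINDOW:
        return max(0, limit - requests_in_window)
    # Token bucket: remaining capacity is simply the current token count.
    return int(tokens)
```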
Per-User Overrides¶
Administrators can customize limits for specific users through the admin API. User overrides support:
- Bypass: Completely disable rate limiting for the user
- Global multiplier: Scale all limits up or down (e.g., 2.0 doubles the limit)
- Custom rules: Add user-specific rules that take priority over defaults
```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class UserRateLimit:
    user_id: str
    rules: list[RateLimitRule] = field(default_factory=list)
    global_multiplier: float = 1.0
    bypass_rate_limit: bool = False
    created_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))
    updated_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))
    notes: str | None = None
```
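For example, doubling every limit for a specific user might look like this (the user ID is a placeholder):

```python
# Hypothetical example: scale all limits 2x for one user.
override = UserRateLimit(
    user_id="user-123",          # placeholder ID
    global_multiplier=2.0,       # 2.0 doubles every matching limit
    notes="Approved for elevated limits",
)
```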
Redis Storage¶
Rate limit state is stored in Redis with automatic TTL expiration. The sliding window algorithm uses sorted sets:
```python
now = time.time()
window_start = now - window_seconds

pipe = self.redis.pipeline()
pipe.zremrangebyscore(key, 0, window_start)  # prune entries older than the window
pipe.zadd(key, {str(now): now})              # record this request, timestamp as score
pipe.zcard(key)                              # count requests currently in the window
pipe.expire(key, window_seconds * 2)         # let idle keys expire automatically
results = await pipe.execute()
```
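Queuing all four commands on one pipeline keeps the hot path to a single Redis round trip per request.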
Token bucket state is stored as JSON with the current token count and last refill time:
```python
bucket_data = await self.redis.get(key)
if bucket_data:
    # Lazily refill based on the time elapsed since the last request
    bucket = json.loads(bucket_data)
    tokens = bucket["tokens"]
    last_refill = bucket["last_refill"]
    time_passed = now - last_refill
    tokens_to_add = time_passed * refill_rate
    # Cap at max_tokens, the burst ceiling set by burst_multiplier
    tokens = min(max_tokens, tokens + tokens_to_add)
else:
    # First request for this key: start with a full bucket
    tokens = max_tokens

if tokens >= 1:
    tokens -= 1  # consume one token for this request

await self.redis.setex(
    key,
    window_seconds * 2,
    json.dumps({"tokens": tokens, "last_refill": now}),
)
```
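Note the lazy refill: the token count is recomputed from elapsed time on each request rather than by a background job, so idle buckets cost nothing beyond their TTL-bounded key.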
Configuration is cached in Redis for 5 minutes to reduce database load while allowing dynamic updates.
Configuration¶
Rate limiting is controlled by environment variables:
| Variable | Default | Description |
|---|---|---|
| `RATE_LIMIT_REDIS_PREFIX` | `rate_limit:` | Redis key prefix for isolation |
| `RATE_LIMIT_ALGORITHM` | `sliding_window` | Algorithm to use (`sliding_window` or `token_bucket`) |
| `RATE_LIMIT_DEFAULT_REQUESTS` | `100` | Default request limit |
| `RATE_LIMIT_DEFAULT_WINDOW` | `60` | Default window in seconds |
| `RATE_LIMIT_BURST_MULTIPLIER` | `1.5` | Burst multiplier for token bucket |
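A sketch of how these variables might be read; the project's actual settings loader is not shown here, and the defaults below simply mirror the table above:

```python
import os

# Illustrative settings loading; defaults mirror the table above.
RATE_LIMIT_REDIS_PREFIX = os.environ.get("RATE_LIMIT_REDIS_PREFIX", "rate_limit:")
RATE_LIMIT_ALGORITHM = os.environ.get("RATE_LIMIT_ALGORITHM", "sliding_window")
RATE_LIMIT_DEFAULT_REQUESTS = int(os.environ.get("RATE_LIMIT_DEFAULT_REQUESTS", "100"))
RATE_LIMIT_DEFAULT_WINDOW = int(os.environ.get("RATE_LIMIT_DEFAULT_WINDOW", "60"))
RATE_LIMIT_BURST_MULTIPLIER = float(os.environ.get("RATE_LIMIT_BURST_MULTIPLIER", "1.5"))
```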
The system gracefully degrades when Redis is unavailable—requests are allowed through rather than failing closed.
Key Files¶
| File | Purpose |
|---|---|
| `services/rate_limit_service.py` | Rate limit algorithms and Redis operations |
| `core/middlewares/rate_limit.py` | ASGI middleware for request interception |
| `domain/rate_limit/` | Domain models and default configuration |