
Rate Limiting

The platform uses Redis-backed rate limiting with per-user and per-endpoint controls. Two algorithms are available—sliding window for precise time-based limits and token bucket for bursty workloads. Authenticated users are tracked by user ID; anonymous requests fall back to IP-based limiting.

Architecture

flowchart TB
    subgraph Request Flow
        REQ[Incoming Request] --> MW[RateLimitMiddleware]
        MW --> AUTH{Authenticated?}
        AUTH -->|Yes| UID[User ID]
        AUTH -->|No| IP[IP Address]
        UID --> CHECK[Check Rate Limit]
        IP --> CHECK
    end

    subgraph Rate Limit Service
        CHECK --> CONFIG[Load Config from Redis]
        CONFIG --> MATCH[Match Endpoint Rule]
        MATCH --> ALGO{Algorithm}
        ALGO -->|Sliding Window| SW[ZSET Counter]
        ALGO -->|Token Bucket| TB[Token State]
        SW --> RESULT[RateLimitStatus]
        TB --> RESULT
    end

    subgraph Response
        RESULT --> ALLOWED{Allowed?}
        ALLOWED -->|Yes| HEADERS[Add Rate Limit Headers]
        ALLOWED -->|No| REJECT[429 Too Many Requests]
        HEADERS --> APP[Application]
    end

Algorithms

The rate limiter supports two algorithms, selectable per rule.

Sliding window tracks requests in a Redis sorted set, with timestamps as scores. Each request adds an entry; stale entries outside the window are pruned. This provides precise limiting but uses more memory for high-traffic endpoints.

Token bucket maintains a bucket of tokens that refill at a constant rate. Each request consumes one token. When empty, requests are rejected until tokens refill. The burst_multiplier controls how many extra tokens can accumulate beyond the base limit, allowing controlled bursts.
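The refill arithmetic can be sketched as follows; `burst_multiplier` follows the text above, while the class itself is illustrative rather than the platform's Redis/Lua implementation:

```python
class TokenBucket:
    """Illustrative token bucket; not the platform's Redis/Lua implementation."""

    def __init__(self, limit: int, window_seconds: float,
                 burst_multiplier: float = 1.5, now: float = 0.0) -> None:
        self.refill_rate = limit / window_seconds   # tokens added per second
        self.max_tokens = limit * burst_multiplier  # burst ceiling
        self.tokens = float(limit)                  # start full
        self.last_refill = now

    def allow(self, now: float) -> bool:
        # Refill based on elapsed time, capped at the burst ceiling
        elapsed = max(0.0, now - self.last_refill)
        self.tokens = min(self.max_tokens, self.tokens + elapsed * self.refill_rate)
        self.last_refill = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


bucket = TokenBucket(limit=2, window_seconds=10)
print([bucket.allow(0), bucket.allow(0), bucket.allow(0), bucket.allow(5)])
# [True, True, False, True]
```

Note how the fourth call succeeds: five seconds of refill at 0.2 tokens/second restores a full token, which is exactly the "rejected until tokens refill" behavior described above.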

Endpoint groups are modeled as a string enum:

class EndpointGroup(StringEnum):
    EXECUTION = "execution"
    ADMIN = "admin"
    SSE = "sse"
    WEBSOCKET = "websocket"
    AUTH = "auth"
    PUBLIC = "public"
    API = "api"

Default Rules

The platform ships with default rate limits organized by endpoint group; higher-priority rules match first:

Pattern             Group      Limit    Window  Priority
^/api/v1/execute    execution  10 req   60s     10
^/api/v1/auth/.*    auth       20 req   60s     7
^/api/v1/admin/.*   admin      100 req  60s     5
^/api/v1/events/.*  sse        5 req    60s     3
^/api/v1/ws         websocket  5 req    60s     3
^/api/v1/.*         api        60 req   60s     1

Execution endpoints have the strictest limits since they spawn Kubernetes pods. The catch-all API rule (priority 1) applies to any endpoint not matching a more specific pattern.

WebSocket rule

The /api/v1/ws pattern is reserved for future WebSocket support. The platform currently uses Server-Sent Events (SSE) for real-time updates via /api/v1/events/*.

Middleware Integration

The RateLimitMiddleware intercepts every HTTP request except a set of excluded paths (health checks, auth endpoints, static assets):

class RateLimitMiddleware:
    """
    Middleware for rate limiting API requests.

    Features:
    - User-based limits for authenticated requests
    - IP-based limits for anonymous requests
    - Dynamic configuration via Redis
    - Graceful degradation on errors
    """

    # Paths exempt from rate limiting
    EXCLUDED_PATHS = frozenset(
        {
            "/health",
            "/metrics",
            "/docs",
            "/openapi.json",
            "/favicon.ico",
            "/api/v1/auth/login",  # Auth endpoints handle their own limits
            "/api/v1/auth/register",
            "/api/v1/auth/logout",
        }
    )
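The hot path of the middleware might look roughly like the sketch below. The `RateLimitStatus` fields, the `check` callable, and the dict-shaped responses are simplifying assumptions, not the platform's actual interfaces:

```python
from dataclasses import dataclass
from typing import Awaitable, Callable


@dataclass
class RateLimitStatus:
    allowed: bool
    limit: int
    remaining: int
    reset_at: int  # unix timestamp of the window reset


async def rate_limit_dispatch(
    path: str,
    check: Callable[[str], Awaitable[RateLimitStatus]],
    call_next: Callable[[], Awaitable[dict]],
    excluded_paths: frozenset[str],
) -> dict:
    """Sketch of the middleware hot path; responses are plain dicts for brevity."""
    if path in excluded_paths:
        return await call_next()  # exempt: health checks, docs, auth endpoints

    status = await check(path)
    if not status.allowed:
        return {"status": 429, "detail": "Rate limit exceeded"}

    response = await call_next()
    response.setdefault("headers", {}).update(
        {
            "X-RateLimit-Limit": str(status.limit),
            "X-RateLimit-Remaining": str(status.remaining),
            "X-RateLimit-Reset": str(status.reset_at),
        }
    )
    return response
```

Excluded paths skip the check entirely, so a failing Redis can never block `/health` probes.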

For authenticated requests, the middleware uses the user ID placed on the request state by earlier session handling; anonymous requests fall back to the client IP. The helper below shows the IP fallback and documents why unverified JWT claims are never used as bucket keys:

    @staticmethod
    def _extract_user_id(request: Request) -> str:
        """Extract rate-limit bucket key from client IP.

        Middleware runs before route-level auth, so no verified identity is
        available here. Using unverified JWT claims would let an attacker
        craft arbitrary bucket keys to bypass IP-based limits.
        """
        return f"ip:{get_client_ip(request)}"

Response Headers

Every response includes rate limit headers so clients can implement backoff logic:

Header                 Description
X-RateLimit-Limit      Maximum requests allowed in the window
X-RateLimit-Remaining  Requests remaining in the current window
X-RateLimit-Reset      Unix timestamp when the window resets
Retry-After            Seconds to wait before retrying (429 responses only)

When a request is rejected, the middleware returns a 429 response with these headers plus a JSON body:

{
  "detail": "Rate limit exceeded",
  "retry_after": 45,
  "reset_at": "2024-01-15T10:30:00+00:00"
}
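On the client side, these headers and the 429 body can drive a simple wait-time decision. A sketch (the fallback values are illustrative policy, not part of the platform's contract):

```python
def backoff_seconds(status_code: int, headers: dict[str, str], body: dict) -> float:
    """Decide how long a client should wait before the next request.

    Prefers the Retry-After header on 429s, then the retry_after field in
    the JSON body; returns 0 when capacity remains.
    """
    if status_code != 429:
        remaining = int(headers.get("X-RateLimit-Remaining", "1"))
        return 0.0 if remaining > 0 else 1.0  # light pacing when nearly exhausted
    if "Retry-After" in headers:
        return float(headers["Retry-After"])
    if "retry_after" in body:
        return float(body["retry_after"])
    return 1.0  # conservative fallback when the server gave no hint


print(backoff_seconds(429, {"Retry-After": "45"}, {}))            # 45.0
print(backoff_seconds(200, {"X-RateLimit-Remaining": "12"}, {}))  # 0.0
```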

Usage Statistics

Administrators can query current rate limit usage for any user via the admin API. The response follows the IETF RateLimit headers draft convention, using a single remaining field that represents available capacity regardless of the underlying algorithm:

@dataclass
class EndpointUsageStats:
    """Usage statistics for a single endpoint (IETF RateLimit-style)."""

    algorithm: RateLimitAlgorithm
    remaining: int

For sliding window, remaining is calculated as limit - requests_in_window. For token bucket, it's the current token count. This unified representation lets clients implement backoff logic without caring which algorithm is in use.
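The unified computation can be sketched as (the function name and keyword arguments are illustrative):

```python
def remaining_capacity(
    algorithm: str,
    *,
    limit: int = 0,
    requests_in_window: int = 0,
    tokens: float = 0.0,
) -> int:
    """Unified 'remaining' value, per the IETF RateLimit-style convention."""
    if algorithm == "sliding_window":
        return max(0, limit - requests_in_window)
    if algorithm == "token_bucket":
        return max(0, int(tokens))  # whole tokens currently available
    raise ValueError(f"unknown algorithm: {algorithm}")


print(remaining_capacity("sliding_window", limit=60, requests_in_window=42))  # 18
print(remaining_capacity("token_bucket", tokens=7.9))                         # 7
```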

Per-User Overrides

Administrators can customize limits for specific users through the admin API. User overrides support:

  • Bypass: Completely disable rate limiting for the user
  • Global multiplier: Scale all limits up or down (e.g., 2.0 doubles the limit)
  • Custom rules: Add user-specific rules that take priority over defaults

@dataclass
class UserRateLimit:
    user_id: str
    rules: list[RateLimitRule] = field(default_factory=list)
    global_multiplier: float = 1.0
    bypass_rate_limit: bool = False
    created_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))
    updated_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))
    notes: str | None = None

Redis Storage

Rate limit state is stored in Redis with automatic TTL expiration. The sliding window algorithm uses sorted sets:

        # Prune, record, count, and refresh the TTL in one round trip
        pipe = self.redis.pipeline()
        pipe.zremrangebyscore(key, 0, window_start)  # drop entries older than the window
        pipe.zadd(key, {str(now): now})              # member and score are both the timestamp
        pipe.zcard(key)                              # request count inside the window
        pipe.expire(key, window_seconds * 2)         # safety TTL so idle keys expire
        results = await pipe.execute()

Token bucket state (the current token count and last refill time) is stored as JSON and updated atomically by a Lua script, which prevents lost updates under concurrent requests:

        result = await self.redis.eval(  # type: ignore[misc]
            self._TOKEN_BUCKET_LUA, 1, key, max_tokens, refill_rate, now, window_seconds * 2,
        )
        allowed = bool(result[0])  # 1 if a token was consumed
        tokens = float(result[1])  # tokens left after this request

Configuration is cached in Redis for 5 minutes to reduce database load while allowing dynamic updates.

Configuration

Rate limiting is controlled by TOML settings:

Variable                     Default         Description
RATE_LIMIT_REDIS_PREFIX      rate_limit:     Redis key prefix for isolation
RATE_LIMIT_ALGORITHM         sliding_window  Algorithm to use (sliding_window or token_bucket)
RATE_LIMIT_DEFAULT_REQUESTS  100             Default request limit
RATE_LIMIT_DEFAULT_WINDOW    60              Default window in seconds
RATE_LIMIT_BURST_MULTIPLIER  1.5             Burst multiplier for token bucket
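A configuration fragment using these settings might look like the following; the section name and key casing are assumptions about the platform's TOML layout, not a documented schema:

```toml
# Hypothetical layout; the actual section and key names may differ.
[rate_limit]
redis_prefix = "rate_limit:"
algorithm = "sliding_window"  # or "token_bucket"
default_requests = 100
default_window = 60           # seconds
burst_multiplier = 1.5        # token bucket only
```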

The system gracefully degrades when Redis is unavailable—requests are allowed through rather than failing closed.

Key Files

File                            Purpose
services/rate_limit_service.py  Rate limit algorithms and Redis operations
core/middlewares/rate_limit.py  ASGI middleware for request interception
domain/rate_limit/              Domain models and default configuration