Rate Limiting

The platform uses Redis-backed rate limiting with per-user and per-endpoint controls. Two algorithms are available—sliding window for precise time-based limits and token bucket for bursty workloads. Authenticated users are tracked by user ID; anonymous requests fall back to IP-based limiting.

Architecture

flowchart TB
    subgraph Request Flow
        REQ[Incoming Request] --> MW[RateLimitMiddleware]
        MW --> AUTH{Authenticated?}
        AUTH -->|Yes| UID[User ID]
        AUTH -->|No| IP[IP Address]
        UID --> CHECK[Check Rate Limit]
        IP --> CHECK
    end

    subgraph Rate Limit Service
        CHECK --> CONFIG[Load Config from Redis]
        CONFIG --> MATCH[Match Endpoint Rule]
        MATCH --> ALGO{Algorithm}
        ALGO -->|Sliding Window| SW[ZSET Counter]
        ALGO -->|Token Bucket| TB[Token State]
        SW --> RESULT[RateLimitStatus]
        TB --> RESULT
    end

    subgraph Response
        RESULT --> ALLOWED{Allowed?}
        ALLOWED -->|Yes| HEADERS[Add Rate Limit Headers]
        ALLOWED -->|No| REJECT[429 Too Many Requests]
        HEADERS --> APP[Application]
    end

Algorithms

The rate limiter supports two algorithms, selectable per rule.

Sliding window tracks requests in a Redis sorted set, with timestamps as scores. Each request adds an entry; stale entries outside the window are pruned. This provides precise limiting but uses more memory for high-traffic endpoints.

Token bucket maintains a bucket of tokens that refill at a constant rate. Each request consumes one token. When empty, requests are rejected until tokens refill. The burst_multiplier controls how many extra tokens can accumulate beyond the base limit, allowing controlled bursts.
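As an illustration of how these parameters relate (the helper below is a sketch, not the platform's code), the rule's base limit and window imply a refill rate, and burst_multiplier scales the bucket's capacity:

def token_bucket_params(
    requests: int, window_seconds: int, burst_multiplier: float = 1.5
) -> tuple[float, float]:
    """Illustrative only: derive token bucket parameters from a rule."""
    refill_rate = requests / window_seconds   # tokens replenished per second
    max_tokens = requests * burst_multiplier  # capacity, including burst headroom
    return refill_rate, max_tokens

# Example: a 10 req / 60 s rule with the default 1.5 multiplier gives
# a refill rate of ~0.167 tokens/s and a capacity of 15 tokens.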

The available values are defined in the RateLimitAlgorithm enum:

from enum import Enum

class RateLimitAlgorithm(str, Enum):
    SLIDING_WINDOW = "sliding_window"
    TOKEN_BUCKET = "token_bucket"
    # Defined but not covered in this document
    FIXED_WINDOW = "fixed_window"
    LEAKY_BUCKET = "leaky_bucket"

Default Rules

The platform ships with default rate limits organized by endpoint group. Higher priority rules match first:

Pattern             Group      Limit    Window  Priority
^/api/v1/execute    execution  10 req   60s     10
^/api/v1/auth/.*    auth       20 req   60s     7
^/api/v1/admin/.*   admin      100 req  60s     5
^/api/v1/events/.*  sse        5 req    60s     3
^/api/v1/ws         websocket  5 req    60s     3
^/api/v1/.*         api        60 req   60s     1

Execution endpoints have the strictest limits since they spawn Kubernetes pods. The catch-all API rule (priority 1) applies to any endpoint not matching a more specific pattern.
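The matching logic amounts to sorting by priority and taking the first regex hit. A minimal sketch, using a simplified stand-in for the platform's RateLimitRule model:

import re
from dataclasses import dataclass

@dataclass
class Rule:
    """Simplified stand-in for RateLimitRule."""
    pattern: str
    priority: int

def match_rule(path: str, rules: list[Rule]) -> Rule | None:
    # Try higher-priority rules first; the first regex match wins
    for rule in sorted(rules, key=lambda r: r.priority, reverse=True):
        if re.match(rule.pattern, path):
            return rule
    return None

With the defaults above, match_rule("/api/v1/execute", rules) returns the execution rule (priority 10) rather than the catch-all (priority 1).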

WebSocket rule

The /api/v1/ws pattern is reserved for future WebSocket support. The platform currently uses Server-Sent Events (SSE) for real-time updates via /api/v1/events/*.

Middleware Integration

The RateLimitMiddleware intercepts all HTTP requests, extracts the user identifier, and checks against the configured limits:

class RateLimitMiddleware:
    """
    Middleware for rate limiting API requests.

    Features:
    - User-based limits for authenticated requests
    - IP-based limits for anonymous requests
    - Dynamic configuration via Redis
    - Graceful degradation on errors
    """

    # Paths exempt from rate limiting
    EXCLUDED_PATHS = frozenset(
        {
            "/health",
            "/metrics",
            "/docs",
            "/openapi.json",
            "/favicon.ico",
            "/api/v1/auth/login",  # Auth endpoints handle their own limits
            "/api/v1/auth/register",
            "/api/v1/auth/logout",
        }
    )
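In the request path, excluded paths short-circuit before any Redis lookup. A sketch of that check, assuming a Starlette-style dispatch method (the actual entry point may differ):

    async def dispatch(self, request: Request, call_next):
        # Exempt paths bypass rate limiting entirely
        if request.url.path in self.EXCLUDED_PATHS:
            return await call_next(request)
        ...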

For authenticated requests, the middleware uses the user ID from the request state. Anonymous requests are identified by client IP address:

    def _extract_user_id(self, request: Request) -> str:
        # Authenticated requests carry a User object on request.state
        user: User | None = getattr(request.state, "user", None)
        if user:
            return str(user.user_id)
        # Anonymous requests fall back to the client IP
        return f"ip:{get_client_ip(request)}"

Response Headers

Every response includes rate limit headers so clients can implement backoff logic:

Header                 Description
X-RateLimit-Limit      Maximum requests allowed in the window
X-RateLimit-Remaining  Requests remaining in the current window
X-RateLimit-Reset      Unix timestamp when the window resets
Retry-After            Seconds to wait before retrying (429 responses only)

When a request is rejected, the middleware returns a 429 response with these headers plus a JSON body:

{
  "detail": "Rate limit exceeded",
  "retry_after": 45,
  "reset_at": "2024-01-15T10:30:00+00:00"
}
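A client-side sketch of honoring these signals, using httpx here purely for illustration (any HTTP client works):

import time
import httpx

def get_with_backoff(url: str, max_retries: int = 3) -> httpx.Response:
    """Retry on 429, honoring Retry-After or the body's retry_after field."""
    for _ in range(max_retries):
        resp = httpx.get(url)
        if resp.status_code != 429:
            return resp
        # Prefer the Retry-After header; fall back to the JSON body
        wait = int(resp.headers.get("Retry-After") or resp.json().get("retry_after", 1))
        time.sleep(wait)
    return resp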

Usage Statistics

Administrators can query current rate limit usage for any user via the admin API. The response follows the IETF RateLimit headers draft convention, using a single remaining field that represents available capacity regardless of the underlying algorithm:

class EndpointUsageStats:
    """Usage statistics for a single endpoint (IETF RateLimit-style)."""

    algorithm: RateLimitAlgorithm
    remaining: int

For sliding window, remaining is calculated as limit - requests_in_window. For token bucket, it's the current token count. This unified representation lets clients implement backoff logic without caring which algorithm is in use.
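A sketch of that normalization (hypothetical helper; the parameter names mirror the prose above):

def remaining_capacity(
    algorithm: RateLimitAlgorithm,
    limit: int,
    requests_in_window: int = 0,
    tokens: float = 0.0,
) -> int:
    # Sliding window: what is left of the per-window budget
    if algorithm is RateLimitAlgorithm.SLIDING_WINDOW:
        return max(0, limit - requests_in_window)
    # Token bucket: whole tokens currently available
    return int(tokens)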

Per-User Overrides

Administrators can customize limits for specific users through the admin API. User overrides support:

  • Bypass: Completely disable rate limiting for the user
  • Global multiplier: Scale all limits up or down (e.g., 2.0 doubles the limit)
  • Custom rules: Add user-specific rules that take priority over defaults

from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class UserRateLimit:
    user_id: str
    rules: list[RateLimitRule] = field(default_factory=list)
    global_multiplier: float = 1.0
    bypass_rate_limit: bool = False
    created_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))
    updated_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))
    notes: str | None = None
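A sketch of how bypass and the multiplier could combine when resolving a user's effective limit (hypothetical helper; the actual service logic may differ):

def effective_limit(base_limit: int, override: UserRateLimit | None) -> int | None:
    """Return the request limit to enforce, or None when bypassed."""
    if override is None:
        return base_limit
    if override.bypass_rate_limit:
        return None  # rate limiting disabled for this user
    # User-specific rules (override.rules) would be matched before this point
    return int(base_limit * override.global_multiplier)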

Redis Storage

Rate limit state is stored in Redis with automatic TTL expiration. The sliding window algorithm uses sorted sets:

        # One round trip: prune entries older than the window, record this
        # request, count what remains, and refresh the key's TTL
        pipe.zremrangebyscore(key, 0, window_start)
        pipe.zadd(key, {str(now): now})
        pipe.zcard(key)
        pipe.expire(key, window_seconds * 2)
        results = await pipe.execute()
        request_count = results[2]  # zcard result: requests in the current window

Token bucket state is stored as JSON with the current token count and last refill time:

        if bucket_data:
            # Refill based on time elapsed since the last request
            bucket = json.loads(bucket_data)
            tokens = bucket["tokens"]
            last_refill = bucket["last_refill"]
            time_passed = now - last_refill
            tokens_to_add = time_passed * refill_rate
            tokens = min(max_tokens, tokens + tokens_to_add)
        else:
            # First request for this key: start with a full bucket
            tokens = max_tokens

        if tokens >= 1:
            # Consume a token and persist the new state with a TTL
            tokens -= 1
            await self.redis.setex(
                key,
                window_seconds * 2,
                json.dumps({"tokens": tokens, "last_refill": now}),
            )

Configuration is cached in Redis for 5 minutes to reduce database load while allowing dynamic updates.
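A sketch of that read-through cache; the key name and loader callable here are assumptions:

import json

CONFIG_CACHE_TTL = 300  # seconds; "cached for 5 minutes"

async def get_rate_limit_config(redis, load_from_db):
    """Serve config from Redis; reload from the database on a cache miss."""
    cached = await redis.get("rate_limit:config")
    if cached:
        return json.loads(cached)
    config = await load_from_db()
    await redis.setex("rate_limit:config", CONFIG_CACHE_TTL, json.dumps(config))
    return config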

Configuration

Rate limiting is controlled by environment variables:

Variable                     Default         Description
RATE_LIMIT_REDIS_PREFIX      rate_limit:     Redis key prefix for isolation
RATE_LIMIT_ALGORITHM         sliding_window  Algorithm to use (sliding_window or token_bucket)
RATE_LIMIT_DEFAULT_REQUESTS  100             Default request limit
RATE_LIMIT_DEFAULT_WINDOW    60              Default window in seconds
RATE_LIMIT_BURST_MULTIPLIER  1.5             Burst multiplier for the token bucket

The system gracefully degrades when Redis is unavailable—requests are allowed through rather than failing closed.
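A minimal sketch of the fail-open path, assuming a check coroutine that raises redis.exceptions.RedisError on outages:

import logging
from redis.exceptions import RedisError

logger = logging.getLogger(__name__)

async def check_or_allow(check_rate_limit) -> bool:
    try:
        return await check_rate_limit()
    except RedisError:
        # Fail open: a Redis outage must not take the API down with it
        logger.warning("rate limit check failed; allowing request")
        return True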

Key Files

File                            Purpose
services/rate_limit_service.py  Rate limit algorithms and Redis operations
core/middlewares/rate_limit.py  ASGI middleware for request interception
domain/rate_limit/              Domain models and default configuration