PRD-008 — Authentication & Authorization¶
| Field | Value |
|---|---|
| Document ID | PRD-008 |
| Version | 1.0 |
| Status | DRAFT |
| Date | March 2026 |
| Author | Engineering Team |
| Parent | PRD-001 |
| Related Docs | PRD-003 (job endpoints), PRD-006 (input validation), PRD-007 (tooling standards) |
Detailed specs: Auth Implementation Spec — fills 10 implementation gaps including the missing
POST /auth/refreshhandler, Pydantic model definitions, complete env var reference, GitHub token lifecycle, Content-Security-Policy, conditional cookie Secure flag, OAuth scope guidance, library maintenance status, and JWT clock-skew leeway.
Overview¶
Why This PRD Exists¶
PRD-003 specifies five authenticated endpoints — POST /jobs, GET /jobs/{id}/stream,
POST /jobs/{id}/answer, POST /jobs/{id}/pause, DELETE /jobs/{id} — with no authentication
mechanism defined. In the current spec:
- Any actor who knows a job UUID can read another user's output stream
- Any actor can answer another user's
interrupt(), injecting arbitrary text into a running agent - Any actor can kill any job
- Any actor can create jobs, consuming operator-paid LLM API credits and GitHub API quota
This is not an oversight to defer — it is a design gap that affects the data model
(BugTriageState must carry owner_id), the API layer (every endpoint needs a dependency),
and the infrastructure (job registry, token store).
Scope¶
This PRD specifies:
- GitHub OAuth 2.0 as the sole identity provider (natural fit: product is GitHub-native)
- JWT-based session tokens for all API calls
- The authentication strategy for SSE streams (non-trivial; see SSE Endpoint Authentication)
- Per-job authorization (ownership model)
- Secure storage of GitHub OAuth tokens for repo operations
- Inter-service authentication for ARQ worker → LangServe agent calls (see Inter-Service Authentication)
- Token lifecycle and the v1/v2 simplification boundary
Threat Model¶
Assets Worth Protecting¶
| Asset | Why It Matters |
|---|---|
| Job output stream | May contain proprietary codebase content, security findings |
| Human-in-the-loop interrupt | Injecting a malicious answer steers the agent toward attacker-controlled output |
| Job lifecycle control (kill/pause) | Denial of service against a legitimate user |
| GitHub OAuth token | Repo read/write access; stored server-side |
| LLM API budget | Unlimited unauthenticated job creation burns operator credits |
LangServe agent endpoints (/agents/*) |
Unauthenticated invocation burns LLM credits and bypasses all job ownership controls |
In-Scope Threats¶
| Threat | Vector | Mitigation in This PRD |
|---|---|---|
| Unauthenticated job creation | No auth on POST /jobs |
Depends(get_current_user) on all /jobs/* |
| Cross-job access (IDOR) | Knowing or guessing a job UUID | Job ownership check on every per-job endpoint |
| UUID enumeration | Job UUIDs are v4 random (122 bits); guessing is computationally infeasible | Defense-in-depth only — ownership check is the primary control |
| Token theft via XSS | JWT in localStorage |
JWT in sessionStorage (v1) / memory + HttpOnly refresh cookie (v2) |
| CSRF on state-mutating endpoints | Cross-origin form/fetch triggering POST /jobs/{id}/answer |
SameSite=Strict on refresh cookie; Bearer token on all REST calls (CSRF requires stolen token, not just cookies) |
| GitHub OAuth token exfiltration | Token in JWT payload (base64-readable) | Token stored encrypted in Redis; only referenced by user ID |
| SSE stream hijack | Attacker opens EventSource to stream URL without auth | Auth required on SSE endpoint; see SSE Endpoint Authentication |
| Unauthenticated agent invocation | Direct POST to /agents/*/invoke ports (8001–8005) without going through the job pipeline |
Network isolation (primary) + shared secret header (defense-in-depth); see Inter-Service Authentication |
Out-of-Scope Threats (not in v1)¶
- Compromised server / database — infrastructure-layer concern
- Brute-force of GitHub OAuth state parameter — mitigated by GitHub's own rate limiting
- Stolen refresh tokens — addressed in v2 (refresh token rotation)
- LLM prompt injection via authenticated user — separate concern (content moderation)
Identity Provider: GitHub OAuth 2.0¶
Why GitHub OAuth¶
- The product is GitHub-native: users connect GitHub repos, submit GitHub issues, and write back GitHub comments
- No need for a separate user database — GitHub identity (
github_id,login) is the authoritative user record - Users already have GitHub accounts by definition (PRD-001 §Assumptions, Constraints, Dependencies)
- GitHub's OAuth flow is standard, well-documented, and handles MFA transparently
OAuth Application Scopes¶
Request the minimum scopes at authorization time:
| Scope | Why Required |
|---|---|
read:user |
Get user identity: id, login, name, avatar_url |
repo |
Read issues and write comments on private and public repos |
If the target repository is always public and write-back is not required,
repocan be dropped to reduce the permission surface. v1 always requestsrepofor full functionality.
OAuth Web Application Flow¶
sequenceDiagram
actor Browser
participant Backend
participant GitHub
Browser->>Backend: GET /auth/login
Backend-->>Browser: 302 → github.com/login/oauth/authorize
Note over Backend,GitHub: scope=read:user+repo, state=<csrf_nonce>
Browser->>GitHub: GET /login/oauth/authorize
GitHub-->>Browser: 302 → /auth/callback?code=...&state=...
Browser->>Backend: GET /auth/callback?code=...&state=...
Backend->>GitHub: POST /login/oauth/access_token (code + client_secret)
GitHub-->>Backend: github_access_token
Backend->>GitHub: GET /user (Bearer github_access_token)
GitHub-->>Backend: {id, login, name, avatar_url}
Note over Backend: encrypt & store github_token in Redis
Note over Backend: issue one-time auth_code (30s TTL) → Redis
Backend-->>Browser: 302 → frontend /auth/callback?code=<auth_code>
Browser->>Backend: POST /auth/token {code: auth_code}
Note over Backend: validate & atomically delete auth_code (single-use)
Backend-->>Browser: {access_token, token_type: bearer}
Note over Backend,Browser: Set-Cookie: refresh_token (HttpOnly; Secure; SameSite=Strict)
Why the one-time auth code step?
The backend OAuth callback issues a short-lived (30-second, single-use) opaque auth_code stored in
Redis, then redirects the browser to the frontend with ?code=<auth_code>. The frontend immediately
POSTs this code to POST /auth/token and receives the JWT. This ensures the JWT itself never appears
in a URL (not in browser history, not in server access logs, not as a URL fragment). The auth_code
is single-use and 30-second TTL, limiting its attack window even if a redirect is intercepted.
CSRF State Parameter¶
The state parameter sent to GitHub's authorization endpoint is a securely generated 32-byte random
nonce, stored in the user's session (Redis, 10-minute TTL). On callback, the backend validates that
the returned state matches the stored nonce before processing the code. This prevents CSRF attacks
on the OAuth flow itself.
Session & JWT Model¶
Token Architecture¶
Two tokens exist per authenticated session:
| Token | Type | Expiry | Storage | Purpose |
|---|---|---|---|---|
| Access token | Signed JWT | 15 minutes | React in-memory (useRef / context) |
Authenticates all API calls |
| Refresh token | Opaque UUID | 7 days | HttpOnly; Secure; SameSite=Strict cookie |
Silently renews access tokens |
v1 simplification: If the refresh flow adds too much complexity for the initial build, v1 may use
an 8-hour JWT stored in sessionStorage (survives page refresh within the tab, cleared on tab close).
This is explicitly acknowledged as an XSS risk tradeoff acceptable for a developer-facing internal
tool. The full two-token architecture is the target for v1.1.
JWT Claims¶
{
"sub": "1234567", // GitHub user ID (stable integer, string-encoded)
"login": "alex-dev", // GitHub username (for display; not used for authz)
"iat": 1741305600, // issued-at (UTC epoch)
"exp": 1741306500, // expiry (iat + 900 seconds = 15 minutes)
"jti": "uuid-v4" // JWT ID — used for access token revocation if needed
}
What is NOT in the JWT payload:
- GitHub OAuth token — stored encrypted in Redis (JWT payload is base64-encoded, not encrypted)
- Email — not needed; sub (GitHub user ID) is the authoritative identity
- Permissions/roles — v1 has no RBAC; all authenticated users have equal access to their own jobs
Signing Algorithm¶
v1: HS256 (HMAC-SHA256 with a shared secret). Simpler key management — one JWT_SECRET env var.
v2: RS256 (RSA-SHA256, asymmetric). Private key signs on backend; public key exposed at
GET /auth/.well-known/jwks.json for potential third-party verification. Migrate when the system
scales to multiple backend services that need to verify tokens independently.
# Environment variables (managed via pydantic-settings — PRD-006)
JWT_SECRET=<64-byte random hex> # v1
JWT_ALGORITHM=HS256
ACCESS_TOKEN_EXPIRE_SECONDS=900 # 15 minutes
REFRESH_TOKEN_EXPIRE_SECONDS=604800 # 7 days
GITHUB_CLIENT_ID=<from GitHub App settings>
GITHUB_CLIENT_SECRET=<from GitHub App settings>
GITHUB_TOKEN_ENCRYPTION_KEY=<Fernet.generate_key() — URL-safe base64, 32 bytes>
INTERNAL_SERVICE_SECRET=<64-byte random hex> # shared by ARQ worker + all agent services; see Inter-Service Authentication section
REST API Authentication¶
Protected Endpoints¶
Every /jobs/* endpoint requires authentication. The Depends(get_current_user) FastAPI dependency
is injected at the router level, not per-endpoint, so new endpoints cannot accidentally be added
without auth.
# routes/jobs.py
from fastapi import APIRouter, Depends
from auth.dependencies import get_current_user
from auth.models import AuthenticatedUser
router = APIRouter(prefix="/jobs", dependencies=[Depends(get_current_user)])
@router.post("/")
async def create_job(
body: CreateJobRequest,
current_user: Annotated[AuthenticatedUser, Depends(get_current_user)],
) -> JobResponse:
...
@router.get("/{job_id}/stream")
async def stream_job(
job_id: str,
current_user: Annotated[AuthenticatedUser, Depends(get_current_user)],
) -> StreamingResponse:
...
get_current_user Dependency¶
from fastapi import Depends, HTTPException, Request, status
import jwt
from pydantic import BaseModel
class AuthenticatedUser(BaseModel):
"""Resolved identity from a validated JWT."""
github_id: str
github_login: str
jti: str
async def get_current_user(
request: Request,
settings: Annotated[Settings, Depends(get_settings)],
) -> AuthenticatedUser:
"""Validate Bearer JWT and return the resolved user.
Args:
request: Incoming HTTP request; Authorization header is read directly.
settings: Application settings (JWT_SECRET, algorithm).
Returns:
Resolved authenticated user from JWT claims.
Raises:
HTTPException: 401 if token is missing, expired, or invalid.
"""
credentials_exception = HTTPException(
status_code=status.HTTP_401_UNAUTHORIZED,
detail="Invalid or expired token",
headers={"WWW-Authenticate": "Bearer"},
)
authorization = request.headers.get("Authorization", "")
if not authorization.startswith("Bearer "):
raise credentials_exception
token = authorization.removeprefix("Bearer ")
try:
payload = jwt.decode(
token,
settings.JWT_SECRET,
algorithms=[settings.JWT_ALGORITHM],
)
github_id: str | None = payload.get("sub")
github_login: str | None = payload.get("login")
jti: str | None = payload.get("jti")
if github_id is None or jti is None:
raise credentials_exception
except jwt.ExpiredSignatureError:
raise HTTPException(
status_code=status.HTTP_401_UNAUTHORIZED,
detail="Token has expired",
headers={"WWW-Authenticate": "Bearer"},
)
except jwt.InvalidTokenError:
raise credentials_exception
return AuthenticatedUser(github_id=github_id, github_login=github_login or "", jti=jti)
Auth Endpoints (not under /jobs/*)¶
GET /auth/login → redirect to GitHub OAuth authorize URL
GET /auth/callback → exchange GitHub code, issue auth_code, redirect to frontend
POST /auth/token → exchange auth_code for {access_token, token_type}
POST /auth/refresh → exchange refresh cookie for new access_token
POST /auth/logout → clear refresh cookie, invalidate refresh token in Redis
DELETE /auth/github-token → delete stored GitHub OAuth token from Redis; user stays logged in
GET /auth/me → return current user info (requires Bearer)
SSE Endpoint Authentication¶
The Problem¶
The browser's native EventSource API (new EventSource(url)) does not support setting custom
request headers. This is a hard W3C specification constraint — not a browser quirk. Sending
Authorization: Bearer <token> via the EventSource constructor is impossible.
Three approaches exist; this PRD selects one and documents the rejected alternatives:
| Approach | Security | Complexity | Selected |
|---|---|---|---|
Fetch-based EventSource (@microsoft/fetch-event-source) |
Same as REST Bearer | Low — uniform auth | ✅ Yes |
Short-lived stream ticket (?t=<token>) |
Token in URL (logs, history) | Medium — extra endpoint | Fallback |
| HttpOnly session cookie | XSS-safe for SSE | High — CSRF on REST, two auth mechanisms | No |
Selected Approach: Fetch-Based EventSource¶
The React frontend uses @microsoft/fetch-event-source instead of the native EventSource. This
library implements the SSE protocol on top of the Fetch API, which supports arbitrary headers.
Frontend (React):
import { fetchEventSource } from "@microsoft/fetch-event-source";
// Sentinel used to signal "token was refreshed, retry immediately".
// Identity comparison in onerror avoids string matching.
const TOKEN_REFRESHED = new Error("token_refreshed");
async function streamJob(jobId: string, accessToken: string): Promise<void> {
// Mutable record — fetchEventSource passes this same reference to every
// fetch() attempt in its retry loop, so updating it in onopen takes effect
// on the next attempt automatically.
const headers: Record<string, string> = {
Authorization: `Bearer ${accessToken}`,
};
await fetchEventSource(`/jobs/${jobId}/stream`, {
method: "GET",
headers,
// onopen receives the Response — the only place HTTP status is available.
async onopen(response) {
if (response.status === 401) {
headers.Authorization = `Bearer ${await refreshAccessToken()}`;
throw TOKEN_REFRESHED;
}
if (!response.ok) throw new Error(`Stream error: ${response.status}`);
},
onmessage(event) {
dispatch(handleStreamEvent(JSON.parse(event.data)));
},
// onerror receives an Error (never a Response); it is synchronous.
// Returning 0 retries immediately; throwing stops the loop.
onerror(err) {
if (err !== TOKEN_REFRESHED) throw err;
return 0; // retry immediately — headers already carry the refreshed token
},
});
}
Backend: The GET /jobs/{id}/stream endpoint authenticates via Depends(get_current_user)
identically to all other endpoints. No special handling is needed on the server side.
Fallback: Short-Lived Stream Ticket¶
If @microsoft/fetch-event-source introduces a compatibility issue (e.g., proxy that buffers Fetch
responses but passes EventSource connections), the fallback is a stream ticket:
POST /jobs/{id}/stream-token (requires Bearer auth)
→ {"stream_token": "<opaque>", "expires_in": 60}
GET /jobs/{id}/stream?t=<stream_token> (single-use, 60s window)
The stream token is a UUID stored in Redis with a 60-second TTL. It is validated and immediately deleted when the stream connection is established (single-use). Once the SSE connection is open, it is authenticated by the open TCP socket — the token is only used for the handshake.
Fallback is not default — it is documented here only for operational contingency.
Token Expiry During an Active Stream¶
Access tokens expire in 15 minutes. An ongoing stream may outlast the token. The frontend handles
this by detecting the 401 status in onopen, calling POST /auth/refresh to get a new access
token, and re-opening the stream — see the streamJob pattern in Selected Approach: Fetch-Based EventSource. The Last-Event-ID header
is sent automatically by fetchEventSource on reconnect, but the backend does not guarantee
resume from the last event in v1: Redis Pub/Sub has no message history, so events emitted during
the reconnect window are lost. v1 reconnects within 2 seconds; missed events are not replayed.
Event history and gapless resume are a v2 concern (requires a persistent event log, e.g.,
Redis Streams or a DB-backed event table).
Job Ownership & Per-Resource Authorization¶
The IDOR Risk¶
Job UUIDs are v4 random (122 bits of entropy). Guessing a UUID is computationally infeasible, but relying on UUID unpredictability as the sole access control mechanism is security through obscurity — not authorization. The correct control is: only the user who created a job can operate on it.
Job Registry¶
A lightweight job registry is maintained in Redis alongside the ARQ job queue. On job creation:
# POST /jobs — after enqueuing the ARQ job
await redis.hset(
f"job_registry:{job_id}",
mapping={
"owner_id": current_user.github_id,
"created_at": datetime.utcnow().isoformat(),
"status": "queued",
},
)
await redis.expire(f"job_registry:{job_id}", 7 * 24 * 3600) # 7-day TTL
This allows O(1) ownership checks without loading the full LangGraph checkpoint state from Postgres.
Ownership Check Dependency¶
async def get_job_and_verify_owner(
job_id: str,
current_user: Annotated[AuthenticatedUser, Depends(get_current_user)],
redis: Annotated[Redis, Depends(get_redis)],
) -> str:
"""Fetch job from registry and verify current user is the owner.
Args:
job_id: Job UUID from path parameter.
current_user: Resolved authenticated user.
redis: Redis connection.
Returns:
The validated job_id.
Raises:
HTTPException: 404 if job not found or caller does not own the job (both cases return 404 to avoid revealing existence to non-owners).
"""
registry = await redis.hgetall(f"job_registry:{job_id}")
if not registry:
raise HTTPException(status_code=404, detail="Job not found")
if registry["owner_id"] != current_user.github_id:
# Return 404, not 403 — do not reveal that the job exists to non-owners
raise HTTPException(status_code=404, detail="Job not found")
return job_id
Why 404 instead of 403: Returning 403 confirms to an attacker that a job with that UUID exists. Returning 404 reveals nothing about existence.
BugTriageState Schema Change¶
BugTriageState (PRD-003 §Shared State Schema) requires one new field:
This field is checkpointed with all other state, providing an immutable ownership record in the LangGraph state history for audit purposes.
Endpoint Authorization Matrix¶
| Endpoint | Auth | Ownership check | Notes |
|---|---|---|---|
POST /jobs |
✅ get_current_user |
N/A — creates new job | Sets owner_id in state + registry |
GET /jobs/{id}/stream |
✅ get_current_user |
✅ get_job_and_verify_owner |
Read-only but contains private data |
POST /jobs/{id}/answer |
✅ get_current_user |
✅ get_job_and_verify_owner |
Critical: injects text into running agent |
POST /jobs/{id}/pause |
✅ get_current_user |
✅ get_job_and_verify_owner |
|
DELETE /jobs/{id} |
✅ get_current_user |
✅ get_job_and_verify_owner |
|
GET /auth/me |
✅ get_current_user |
N/A | Returns caller's own identity |
POST /auth/refresh |
✅ refresh cookie | N/A | No Bearer needed — cookie carries identity |
POST /auth/logout |
✅ refresh cookie | N/A |
GitHub OAuth Token Management¶
What Needs Storing¶
After the GitHub OAuth callback, the backend holds a GitHub access token (gho_...). This token is
used by LangGraph worker nodes when calling the GitHub API (reading issues, writing comments). It
must be accessible at job execution time, which may be hours after the initial login.
Storage: Encrypted in Redis¶
The GitHub OAuth token is never stored in the JWT payload (JWTs are signed, not encrypted — the payload is base64-decodable). It is stored encrypted in Redis:
import os
from cryptography.fernet import Fernet
fernet = Fernet(settings.GITHUB_TOKEN_ENCRYPTION_KEY)
async def store_github_token(github_user_id: str, github_token: str, redis: Redis) -> None:
"""Encrypt and store GitHub OAuth token in Redis."""
encrypted = fernet.encrypt(github_token.encode())
await redis.setex(
f"github_token:{github_user_id}",
365 * 24 * 3600, # 365-day TTL — re-stored on every OAuth login; see PRD-008-1 §5
encrypted,
)
async def get_github_token(github_user_id: str, redis: Redis) -> str | None:
"""Retrieve and decrypt GitHub OAuth token from Redis."""
encrypted = await redis.get(f"github_token:{github_user_id}")
if encrypted is None:
return None
return fernet.decrypt(encrypted).decode()
Token Availability in Worker Nodes¶
The ARQ worker context does not carry the JWT. writer_node does not post to GitHub — it only
prepares the draft. Actual posting is user-initiated via POST /jobs/{id}/post-comment (see
PRD-002 §GitHub Write-Back Flow). The GitHub token is fetched at that point, not inside the
graph node:
async def writer_node(state: BugTriageState, ctx: dict) -> dict:
github_comment_draft = _build_comment_draft(state)
ticket_draft = _build_ticket_draft(state)
return {
"github_comment_draft": github_comment_draft,
"ticket_draft": ticket_draft,
"status": "completed",
}
The get_github_token helper (Storage: Encrypted in Redis) is called only from the write-back endpoint handler, which
runs in the FastAPI request context after the user explicitly clicks "Post Comment":
@router.post("/jobs/{job_id}/post-comment")
async def post_comment(
job_id: Annotated[str, Depends(get_job_and_verify_owner)],
body: PostCommentRequest,
current_user: Annotated[AuthenticatedUser, Depends(get_current_user)],
redis: Annotated[Redis, Depends(get_redis)],
) -> PostCommentResponse:
github_token = await get_github_token(current_user.github_id, redis)
if github_token is None:
raise HTTPException(status_code=401, detail="GitHub session expired — re-authenticate")
# ... post comment via GitHub API ...
Token Expiry¶
GitHub OAuth tokens for OAuth Apps do not expire by default. However, GitHub automatically
revokes any OAuth token unused for 1 year of inactivity. Additionally, if a user revokes
the app's authorization via GitHub Settings → Authorized OAuth Apps, the stored token is
immediately invalidated. See PRD-008-1 §5
for the full lifecycle spec, recommended TTL (365 days), and the GitHub API 401 error path in
POST /jobs/{id}/post-comment.
GitHub App user tokens expire after 8 hours (with refresh tokens). v1 uses an OAuth App (non-expiring tokens subject to 1-year inactivity policy). v2 migration to GitHub Apps requires implementing the token refresh cycle.
Token Revocation¶
Users can disconnect their GitHub account without logging out. DELETE /auth/github-token
deletes the Fernet-encrypted token from Redis; the user's JWT and refresh token remain valid.
@router.delete("/github-token", status_code=204)
async def revoke_github_token(
current_user: Annotated[AuthenticatedUser, Depends(get_current_user)],
redis: Annotated[Redis, Depends(get_redis)],
) -> None:
"""Delete stored GitHub OAuth token. Session (JWT + refresh token) remains active."""
await redis.delete(f"github_token:{current_user.github_id}")
After revocation, POST /jobs/{id}/post-comment returns
401 GitHub session expired — re-authenticate. The user must complete a new OAuth flow to
re-authorize write-back operations. This is surfaced in the UI via the Settings page
(see PRD-002 §Settings Page).
Inter-Service Authentication¶
The Gap¶
The ARQ worker calls LangServe agent endpoints with a plain httpx.post and no auth header:
response = await client.post(
"http://localhost:8001/agents/investigator/invoke",
json={"input": {...}},
)
The /agents/* services (ports 8001–8005, see PRD-004 §Service Registry) have no auth
guard. get_current_user does not protect them — it applies only to the public /jobs/* router.
Anyone who can reach those ports can invoke agents directly, burning LLM API credits and bypassing
all job ownership checks.
Primary Control: Network Isolation¶
Agent services must not be reachable from the public internet. This is non-negotiable:
- Bind each agent service to
127.0.0.1(loopback) or an internal-only network interface — never0.0.0.0without a firewall rule blocking external access - The public-facing reverse proxy (nginx / Caddy) must not route any
/agents/*paths - In production, deploy agent services in the same VPC / private subnet as the ARQ workers with no public IP and no public security group ingress rule
Network isolation alone prevents external abuse. Everything below is defense-in-depth.
Defense-in-Depth: Shared Secret Header¶
Network isolation can fail (misconfigured proxy, SSRF pivot from another internal service). A shared secret header provides a second independent layer.
INTERNAL_SERVICE_SECRET is set in all ARQ worker and agent service environments (see Signing Algorithm for env vars). The worker sends it on every agent call:
async with httpx.AsyncClient() as client:
response = await client.post(
"http://localhost:8001/agents/investigator/invoke",
json={"input": {...}},
headers={"X-Internal-Token": settings.INTERNAL_SERVICE_SECRET},
)
Each LangServe agent service validates it via a FastAPI dependency applied at the router level:
async def verify_internal_token(
x_internal_token: Annotated[str | None, Header()] = None,
settings: Annotated[Settings, Depends(get_settings)],
) -> None:
"""Reject requests that do not carry the correct internal service secret.
The header is declared optional (str | None) so FastAPI does not return 422
on a missing header — both absent and wrong-value cases are handled here as 403.
Raises:
HTTPException: 403 if the header is missing or does not match.
"""
if x_internal_token is None or not secrets.compare_digest(
x_internal_token, settings.INTERNAL_SERVICE_SECRET
):
raise HTTPException(status_code=403, detail="Forbidden")
router = APIRouter(
prefix="/agents",
dependencies=[Depends(verify_internal_token)],
)
secrets.compare_digest prevents timing attacks. The secret is rotated by updating the env var
and redeploying all affected services — no token store or expiry mechanism required.
What Is Not Used Here¶
| Rejected option | Reason |
|---|---|
get_current_user (user JWT) on /agents/* |
Agent calls originate from the ARQ worker — no user session exists in that context |
Re-routing under /jobs/{id}/agents/* |
Couples LangServe microservices to the job lifecycle; intentionally avoided (PRD-004 §Architecture Decision) |
| mTLS | Correct for zero-trust infrastructure; operationally complex for v1 single-operator deployment |
| Internal JWT signed by a service account | Appropriate for multi-team microservice meshes; shared secret is simpler and sufficient for v1 |
Token Lifecycle¶
Access Token¶
| Property | Value |
|---|---|
| Type | Signed JWT (HS256) |
| Expiry | 15 minutes |
| Storage | React in-memory (useRef in auth context) |
| Lost on | Page refresh (triggers silent refresh via refresh cookie) |
| Revocation | Not supported in v1 (15-min window is the effective revocation delay) |
Refresh Token¶
| Property | Value |
|---|---|
| Type | Opaque UUID v4 |
| Expiry | 7 days |
| Storage | HttpOnly; Secure; SameSite=Strict cookie |
| Redis key | refresh_token:{uuid} → "{github_id}:{github_login}" colon-separated string (e.g. "1234567:octocat") |
| Revocation | Delete Redis key → immediate revocation |
| Rotation | Not in v1; add on v1.1 (new refresh token issued on every /auth/refresh call) |
Silent Token Refresh Flow¶
On page load and on every 401 response from the API, React calls POST /auth/refresh. The
browser automatically sends the HttpOnly refresh cookie; the backend validates it, issues a new
access JWT, and returns it in the response body. React stores the new access token in memory and
retries the failed request.
// src/api/client.ts
const api = axios.create({ baseURL: "/api", withCredentials: true });
api.interceptors.response.use(
(response) => response,
async (error) => {
if (error.response?.status === 401 && !error.config._retry) {
error.config._retry = true;
const { data } = await axios.post("/auth/refresh", {}, { withCredentials: true });
setAccessToken(data.access_token);
error.config.headers.Authorization = `Bearer ${data.access_token}`;
return api(error.config);
}
return Promise.reject(error);
}
);
Logout¶
POST /auth/logout:
1. Delete the refresh token from Redis (immediate revocation)
2. Clear the refresh_token cookie (Set-Cookie: refresh_token=; Max-Age=0; ...)
3. React clears the in-memory access token
The 15-minute access token window remains valid after logout if the token is extracted, but this is an accepted tradeoff (no server-side access token revocation in v1).
CORS & Security Headers¶
CORS Configuration¶
from fastapi.middleware.cors import CORSMiddleware
app.add_middleware(
CORSMiddleware,
allow_origins=[settings.FRONTEND_ORIGIN], # e.g. "http://localhost:5173" in dev
allow_credentials=True, # required for refresh token cookie
allow_methods=["GET", "POST", "DELETE"],
allow_headers=["Authorization", "Content-Type"],
)
allow_origins is a strict allowlist — never ["*"] with allow_credentials=True (browsers
reject this combination, and it would negate cookie security).
Security Headers¶
from starlette.middleware.base import BaseHTTPMiddleware
class SecurityHeadersMiddleware(BaseHTTPMiddleware):
"""Add security headers to all responses."""
async def dispatch(self, request, call_next):
response = await call_next(request)
response.headers["Strict-Transport-Security"] = "max-age=63072000; includeSubDomains"
response.headers["X-Content-Type-Options"] = "nosniff"
response.headers["Referrer-Policy"] = "strict-origin-when-cross-origin"
response.headers["X-Frame-Options"] = "DENY"
return response
Strict-Transport-Security is only meaningful in production (behind HTTPS). In development it
is harmless.
Gap:
Content-Security-Policyis absent from this middleware. CSP is the primary XSS mitigation for in-memory JWT storage. See PRD-008-1 §6 for the complete middleware including CSP and the Swagger UI carve-out strategy.
HTTPS¶
All production traffic terminates TLS at the reverse proxy (nginx / Caddy). The FastAPI app itself
runs on HTTP internally. The Secure cookie flag and SameSite=Strict on the refresh token
cookie are meaningless without HTTPS, so production deployment must enforce it.
Dependencies & Libraries¶
Backend (add to [project.dependencies] in pyproject.toml)¶
"PyJWT>=2.8", # JWT encode/decode (HS256/RS256)
"cryptography>=42.0", # Fernet (AES-128-CBC + HMAC-SHA256) for GitHub token encryption
"httpx>=0.27", # GitHub API calls in OAuth callback (already in deps)
No python-jose — it is effectively unmaintained. PyJWT is the actively maintained standard.
Frontend (add to package.json)¶
@microsoft/fetch-event-source (MIT) — Fetch-based SSE client supporting custom headers.
Maintained by Microsoft, used in Azure and VS Code. Drop-in replacement for EventSource
with a hook-based API compatible with React.
Implementation Patterns¶
Auth Router¶
# src/auth/router.py
from __future__ import annotations
from datetime import datetime, timedelta, timezone
from urllib.parse import urlencode
from uuid import uuid4
from fastapi import APIRouter, Depends, HTTPException, Response, Request
from fastapi.responses import RedirectResponse
import secrets
import httpx
import jwt
router = APIRouter(prefix="/auth")
GITHUB_AUTHORIZE_URL = "https://github.com/login/oauth/authorize"
GITHUB_TOKEN_URL = "https://github.com/login/oauth/access_token"
GITHUB_USER_URL = "https://api.github.com/user"
@router.get("/login")
async def login(
request: Request,
redis: Annotated[Redis, Depends(get_redis)],
settings: Annotated[Settings, Depends(get_settings)],
) -> RedirectResponse:
"""Initiate GitHub OAuth flow."""
state = secrets.token_urlsafe(32)
await redis.setex(f"oauth_state:{state}", 600, "valid") # 10-minute window
params = {
"client_id": settings.GITHUB_CLIENT_ID,
"redirect_uri": settings.GITHUB_REDIRECT_URI,
"scope": "read:user repo",
"state": state,
}
url = GITHUB_AUTHORIZE_URL + "?" + urlencode(params)
return RedirectResponse(url)
@router.get("/callback")
async def callback(
code: str,
state: str,
redis: Annotated[Redis, Depends(get_redis)],
settings: Annotated[Settings, Depends(get_settings)],
) -> RedirectResponse:
"""Handle GitHub OAuth callback; issue one-time auth_code."""
stored = await redis.get(f"oauth_state:{state}")
if stored is None:
raise HTTPException(status_code=400, detail="Invalid or expired OAuth state")
await redis.delete(f"oauth_state:{state}")
async with httpx.AsyncClient() as client:
token_resp = await client.post(
GITHUB_TOKEN_URL,
data={
"client_id": settings.GITHUB_CLIENT_ID,
"client_secret": settings.GITHUB_CLIENT_SECRET,
"code": code,
},
headers={"Accept": "application/json"},
)
github_token = token_resp.json().get("access_token")
if not github_token:
raise HTTPException(status_code=400, detail="GitHub token exchange failed")
async with httpx.AsyncClient() as client:
user_resp = await client.get(
GITHUB_USER_URL,
headers={"Authorization": f"Bearer {github_token}"},
)
github_user = user_resp.json()
await store_github_token(str(github_user["id"]), github_token, redis)
auth_code = secrets.token_urlsafe(32)
await redis.setex(
f"auth_code:{auth_code}",
30, # 30-second single-use window
f"{github_user['id']}:{github_user['login']}",
)
return RedirectResponse(f"{settings.FRONTEND_ORIGIN}/auth/callback?code={auth_code}")
@router.post("/token")
async def exchange_token(
body: AuthCodeRequest,
response: Response,
redis: Annotated[Redis, Depends(get_redis)],
settings: Annotated[Settings, Depends(get_settings)],
) -> AccessTokenResponse:
"""Exchange one-time auth_code for access + refresh tokens."""
value = await redis.getdel(f"auth_code:{body.code}") # getdel: atomic fetch-and-delete guarantees single-use
if value is None:
raise HTTPException(status_code=400, detail="Invalid or expired auth code")
github_id, github_login = value.split(":", 1)
jti = str(uuid4())
access_token = jwt.encode(
{
"sub": github_id,
"login": github_login,
"jti": jti,
"iat": datetime.now(timezone.utc),
"exp": datetime.now(timezone.utc) + timedelta(seconds=settings.ACCESS_TOKEN_EXPIRE_SECONDS),
},
settings.JWT_SECRET,
algorithm=settings.JWT_ALGORITHM,
)
refresh_token = str(uuid4())
await redis.setex(
f"refresh_token:{refresh_token}",
settings.REFRESH_TOKEN_EXPIRE_SECONDS,
github_id,
)
response.set_cookie(
key="refresh_token",
value=refresh_token,
httponly=True,
secure=True,
samesite="strict",
max_age=settings.REFRESH_TOKEN_EXPIRE_SECONDS,
path="/auth", # cookie only sent to /auth endpoints
)
return AccessTokenResponse(access_token=access_token, token_type="bearer")
Protecting a Job Endpoint (Full Example)¶
@router.get("/{job_id}/stream")
async def stream_job(
job_id: Annotated[str, Depends(get_job_and_verify_owner)],
redis: Annotated[Redis, Depends(get_redis)],
) -> StreamingResponse:
"""Stream live agent events for a job.
Authenticated via Bearer token (Authorization header).
Ownership verified via job registry in Redis.
Uses @microsoft/fetch-event-source on the frontend — not native EventSource.
"""
channel = f"jobs:{job_id}:events"
async def event_generator() -> AsyncGenerator[str, None]:
pubsub = redis.pubsub()
await pubsub.subscribe(channel)
seq = 0
try:
async for message in pubsub.listen():
if message["type"] != "message":
continue
yield f"id: {seq}\ndata: {message['data']}\n\n"
seq += 1
if json.loads(message["data"]).get("type") == "job.done":
break
finally:
await pubsub.unsubscribe(channel)
await pubsub.close()
return StreamingResponse(
event_generator(),
media_type="text/event-stream",
headers={"Cache-Control": "no-cache", "X-Accel-Buffering": "no"},
)
Out of Scope (v1)¶
| Feature | Rationale for Deferral |
|---|---|
| RBAC / roles | Single-user-per-job model; no shared access or team permissions in v1 |
| Refresh token rotation | Adds client complexity; 15-min access token + 7-day non-rotating refresh is acceptable for v1 |
| Access token revocation | Requires a Redis token blocklist; 15-min expiry limits the damage window |
| GitHub App (vs OAuth App) | GitHub Apps have expiring tokens + finer permission granularity; migration path documented in Token Expiry |
| Multi-user job sharing | Share a job stream with a colleague — v2 feature requiring explicit ACL |
| Rate limiting per user | Prevent a single user from creating thousands of jobs; separate infrastructure concern (API gateway / Redis rate limiter) |
| Audit log | Record of who did what to which job; v2 compliance feature |
| RS256 JWT signing | Asymmetric keys useful when multiple services verify tokens; overkill for v1 monolith |