Roadmap v1 — Precise Delivery Plan

Field	Value
Version	v1.0
Status	DRAFT
Date	March 2026
Parent	PRD-001 Master Overview

This document is the authoritative, week-precise delivery plan for AgentOps Dashboard v1.0. Each phase lists concrete deliverables, binary acceptance criteria per deliverable, and an exit gate — the set of conditions that must all pass before the next phase begins.

Phase 1 — Core Loop

Target: Weeks 1–3

Deliverables

D1.1 Project scaffolding
pyproject.toml with uv workspace, ruff, ty, pytest configured
Docker Compose file with placeholder services
GitHub Actions CI running lint + test on push
README.md with quickstart skeleton
D1.2 Single Investigator LCEL chain
InvestigatorChain: ChatPromptTemplate | ChatOpenAI.with_structured_output(InvestigatorOutput)
Output model: InvestigatorOutput(summary: str, hypotheses: list[str], files_to_check: list[str])
Unit tests covering happy path and malformed LLM output fallback
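The malformed-output fallback in D1.2 could be sketched roughly as follows. This is a framework-free illustration using only the standard library; the real chain would presumably rely on the Pydantic model and `with_structured_output`. The helper `parse_investigator_output` and the choice to keep the raw text as the summary are assumptions, not part of the plan.

```python
import json
from dataclasses import dataclass, field


@dataclass
class InvestigatorOutput:
    """Stand-in for the Pydantic model named in D1.2 (same three fields)."""
    summary: str
    hypotheses: list[str] = field(default_factory=list)
    files_to_check: list[str] = field(default_factory=list)


def _is_str_list(value: object) -> bool:
    return isinstance(value, list) and all(isinstance(v, str) for v in value)


def parse_investigator_output(raw: str) -> InvestigatorOutput:
    """Parse raw LLM text; fall back to a minimal result if it is malformed.

    Hypothetical helper: the fallback keeps the raw text as the summary so
    the job can continue instead of crashing on a bad completion.
    """
    fallback = InvestigatorOutput(summary=raw.strip())
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return fallback
    if not isinstance(data, dict) or not isinstance(data.get("summary"), str):
        return fallback
    hypotheses = data.get("hypotheses", [])
    files = data.get("files_to_check", [])
    if not (_is_str_list(hypotheses) and _is_str_list(files)):
        return fallback
    return InvestigatorOutput(data["summary"], hypotheses, files)
```

A unit test for the fallback case can then feed plain prose or wrongly typed JSON and assert that the lists come back empty.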
D1.3 LangGraph skeleton (single-agent)
StateGraph with nodes: supervisor, investigator, writer, END
AgentState TypedDict with: issue_url, messages, agent_outputs, status
Supervisor routing logic (stub: always routes to investigator then writer)
SQLite checkpointer configured
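The state shape and stub routing in D1.3 might look like the sketch below. It deliberately avoids importing LangGraph: `END` is a plain-string placeholder for the library's sentinel, the field value types are assumptions, and `run_stub` is a hypothetical driver with fake workers that only demonstrates the visit order.

```python
from typing import Any, TypedDict


class AgentState(TypedDict):
    """Fields listed in D1.3; the value types are assumptions."""
    issue_url: str
    messages: list[dict[str, Any]]
    agent_outputs: dict[str, Any]
    status: str


END = "END"  # placeholder for LangGraph's END sentinel


def stub_supervisor(state: AgentState) -> str:
    """Phase 1 stub: investigator first, then writer, then stop."""
    outputs = state["agent_outputs"]
    if "investigator" not in outputs:
        return "investigator"
    if "writer" not in outputs:
        return "writer"
    return END


def run_stub(issue_url: str) -> tuple[list[str], AgentState]:
    """Drive the stub loop with fake workers and record the visit order."""
    state: AgentState = {
        "issue_url": issue_url,
        "messages": [],
        "agent_outputs": {},
        "status": "running",
    }
    visited: list[str] = []
    while (node := stub_supervisor(state)) != END:
        visited.append(node)
        state["agent_outputs"][node] = f"<{node} output placeholder>"
    state["status"] = "done"
    return visited, state
```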
D1.4 LangSmith tracing active
LANGCHAIN_TRACING_V2=true in .env.example
Every chain call produces a trace with project name agent-ops-v1
Trace has named spans for prompt, LLM call, and parser
D1.5 Minimal FastAPI job endpoint
POST /jobs — creates a job record (SQLite), enqueues via ARQ, returns {id, status}
GET /jobs/{id} — returns job record with current status
Pydantic v2 request model: CreateJobRequest(issue_url: HttpUrl)

Acceptance Criteria

Deliverable	Criterion
D1.1	`ruff check .` exits 0; `pytest` passes; `docker compose up` starts without error
D1.2	`pytest tests/unit/test_investigator_chain.py` passes all cases including output validation
D1.3	Running `python -m src.graph.run --issue-url <url>` completes and prints `AgentState.agent_outputs`
D1.4	LangSmith UI shows a trace with ≥3 nested spans after running D1.3
D1.5	`curl -X POST /jobs -H 'Content-Type: application/json' -d '{"issue_url":"..."}'` returns `200` with `id` and `status: queued`

Exit Gate (Phase 1 → Phase 2)

All of the following must be true:

[ ] D1.1–D1.5 acceptance criteria all pass
[ ] A real GitHub issue (not a fixture) produces a non-empty InvestigatorOutput
[ ] LangSmith trace is visible for the real-issue run
[ ] CI is green on main

Phase 2 — Multi-Agent

Target: Weeks 4–5

Deliverables

D2.1 Full agent suite (5 LCEL chains: the Phase 1 InvestigatorChain plus the four below)
CodebaseSearchChain: semantic search over embedded repo (Chroma + text-embedding-3-small)
WebSearchChain: Tavily API tool integrated into LCEL chain
CriticChain: reviews prior agent outputs, outputs CriticOutput(issues: list[str], verdict: Literal["pass","revise"])
WriterChain: produces WriterOutput(severity, root_cause, relevant_files, draft_comment, ticket_draft)
D2.2 LangServe endpoints (one per chain)
POST /agents/investigator/invoke
POST /agents/codebase-search/invoke
POST /agents/web-search/invoke
POST /agents/critic/invoke
POST /agents/writer/invoke
Each endpoint has /health returning 200 {"status": "ok"}
Each agent service has its own Dockerfile and entry in docker-compose.yml
D2.3 Supervisor routing logic (full)
Supervisor reads AgentState and decides next agent via LLM-powered routing
Routing respects dependency order: investigator and codebase-search before critic; critic before writer
max_iterations guard: terminates after 10 supervisor decisions
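One way to enforce the dependency order and the iteration guard from D2.3 is to wrap the LLM's routing proposal in a deterministic check. The sketch below is an assumption about how that could be done, not the plan's design: `propose` stands in for the LLM call, each agent runs at most once (so a critic "revise" loop is not modelled), and web search is left without prerequisites because the plan does not constrain it.

```python
from typing import Callable

# Dependency order from D2.3. Web search is left unconstrained (assumption).
PREREQUISITES: dict[str, set[str]] = {
    "investigator": set(),
    "codebase-search": set(),
    "web-search": set(),
    "critic": {"investigator", "codebase-search"},
    "writer": {"critic"},
}
MAX_ITERATIONS = 10


def allowed_agents(completed: set[str]) -> list[str]:
    """Agents not yet run whose prerequisites are all complete."""
    return [
        name
        for name, deps in PREREQUISITES.items()
        if name not in completed and deps <= completed
    ]


def run_supervisor(propose: Callable[[set[str]], str]) -> list[str]:
    """Call `propose` (the LLM stand-in) until writer runs or the guard trips."""
    completed: set[str] = set()
    order: list[str] = []
    for _ in range(MAX_ITERATIONS):
        candidates = allowed_agents(completed)
        if not candidates:
            break
        choice = propose(completed)
        if choice not in candidates:
            choice = candidates[0]  # reject an out-of-order proposal
        completed.add(choice)
        order.append(choice)
        if choice == "writer":
            break
    return order
```

Even a proposer that always answers "writer" is forced through the prerequisites first, which is the property the D2.3 acceptance criterion checks.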
D2.4 Shared state flowing through all nodes
AgentState carries accumulated agent_outputs across all nodes
Each worker reads relevant prior outputs from state before calling its chain
Final writer node has access to all prior outputs
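The accumulation described in D2.4 amounts to merging each worker's result into `agent_outputs` without discarding earlier entries. A minimal sketch, assuming outputs are keyed by agent name; `merge_outputs` and `writer_context` are hypothetical names, and in LangGraph a function like `merge_outputs` could plausibly be attached to the state field as a reducer.

```python
from typing import Any


def merge_outputs(current: dict[str, Any], update: dict[str, Any]) -> dict[str, Any]:
    """Return a new dict with `update` layered over `current` (no mutation)."""
    return {**current, **update}


def writer_context(agent_outputs: dict[str, Any]) -> str:
    """Flatten all prior outputs into one prompt section for the writer."""
    return "\n".join(f"[{agent}] {output}" for agent, output in agent_outputs.items())
```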

Acceptance Criteria

Deliverable	Criterion
D2.1	`pytest tests/unit/test_*_chain.py` passes for all 5 chains
D2.2	`curl /agents/investigator/health` → 200; `curl /agents/writer/invoke` with fixture input → valid `WriterOutput` JSON
D2.3	Running the full graph on a fixture issue routes through ≥3 distinct agents before reaching writer
D2.4	`WriterOutput.relevant_files` contains files identified by `CodebaseSearchChain` in the same run

Exit Gate (Phase 2 → Phase 3)

All of the following must be true:

[ ] D2.1–D2.4 acceptance criteria all pass
[ ] Full 5-agent run completes end-to-end in < 3 minutes on a real GitHub issue
[ ] LangSmith trace shows all 5 agent spans under the supervisor span
[ ] All 5 Docker services start cleanly via docker compose up
[ ] CI green on main

Phase 3 — Human-in-the-Loop

Target: Week 6

Deliverables

D3.1 interrupt() nodes in LangGraph
human_input node added to the graph with interrupt_before configuration
Supervisor can route to human_input when confidence is below threshold
Interrupt stores the question in AgentState.pending_question
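The confidence-based diversion in D3.1 can be illustrated with a small routing function. The threshold value, the `confidence` input, and the function name are all assumptions; the plan only says that the supervisor routes to `human_input` below some threshold and stores the question in `pending_question`.

```python
from typing import Any, Optional

CONFIDENCE_THRESHOLD = 0.6  # illustrative value; the plan does not specify one


def route_with_hitl(
    state: dict[str, Any],
    next_agent: str,
    confidence: float,
    question: Optional[str] = None,
) -> str:
    """Return the next node, diverting to human_input on low confidence."""
    if confidence < CONFIDENCE_THRESHOLD and question:
        state["pending_question"] = question
        state["status"] = "waiting"
        return "human_input"
    state["pending_question"] = None
    return next_agent
```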
D3.2 Job control endpoints
POST /jobs/{id}/answer — writes the user's answer into the checkpoint via graph.update_state(), then re-invokes the graph on the same thread to resume
POST /jobs/{id}/pause — sets Redis flag checked by worker before each node
POST /jobs/{id}/resume — clears pause flag
POST /jobs/{id}/redirect — injects instruction into supervisor's next routing context
POST /jobs/{id}/kill — calls Job.abort() on the ARQ job, sets status cancelled
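The control endpoints in D3.2 imply a small status state machine. The table below is a hedged guess at it: the status names come from this plan, but which transitions are legal is an assumption made for illustration. Redirect is omitted because it changes the routing context rather than the status.

```python
# Allowed status changes per control action (assumed, not specified by the plan).
TRANSITIONS: dict[str, dict[str, str]] = {
    "pause": {"running": "paused"},
    "resume": {"paused": "running"},
    "answer": {"waiting": "running"},
    "kill": {
        "queued": "cancelled",
        "running": "cancelled",
        "waiting": "cancelled",
        "paused": "cancelled",
    },
}


def can_apply(status: str, action: str) -> bool:
    """True when `action` is defined for a job currently in `status`."""
    return status in TRANSITIONS.get(action, {})


def apply_action(status: str, action: str) -> str:
    """Return the new status, or raise ValueError if the action is not allowed."""
    if not can_apply(status, action):
        raise ValueError(f"cannot {action} a job in status {status!r}")
    return TRANSITIONS[action][status]
```

Centralising the table like this would let each endpoint reject an invalid action with a structured error instead of silently ignoring it.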
D3.3 Checkpoint persistence (Postgres)
Migrate checkpointer from SQLite to Postgres for production durability
AsyncPostgresSaver configured in graph builder
Graph state survives a worker process restart mid-job
D3.4 Timeout handling
Jobs in waiting status for > 10 minutes automatically transition to timed_out
ARQ scheduled task runs every minute to enforce timeout
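The timeout sweep in D3.4 can be written as a pure function that receives the current time, which makes the mocked-clock acceptance test straightforward. This is a sketch over in-memory records: the `Job` dataclass and its `waiting_since` field are hypothetical, and the real task would query Postgres and be scheduled by ARQ.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional

WAITING_TIMEOUT = timedelta(minutes=10)


@dataclass
class Job:
    """Minimal in-memory job record; `waiting_since` is a hypothetical field."""
    id: str
    status: str
    waiting_since: Optional[datetime] = None


def sweep_timeouts(jobs: list[Job], now: datetime) -> list[str]:
    """Mark jobs waiting longer than the timeout; return the ids changed."""
    timed_out: list[str] = []
    for job in jobs:
        if job.status != "waiting" or job.waiting_since is None:
            continue
        if now - job.waiting_since > WAITING_TIMEOUT:
            job.status = "timed_out"
            timed_out.append(job.id)
    return timed_out
```

Because the sweep runs once a minute, a job may sit in `waiting` for up to roughly eleven minutes before it is marked `timed_out`.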

Acceptance Criteria

Deliverable	Criterion
D3.1	Running fixture `FIXTURE_HITL` issue causes job status to reach `waiting` with non-empty `pending_question`
D3.2	`pytest tests/integration/test_job_control.py` passes all 5 control action tests
D3.3	Killing and restarting the worker mid-job, then calling `POST /jobs/{id}/resume`, completes the job successfully
D3.4	A job left in `waiting` for 10+ minutes (mocked clock) transitions to `timed_out`

Exit Gate (Phase 3 → Phase 4)

All of the following must be true:

[ ] D3.1–D3.4 acceptance criteria all pass
[ ] A human-in-the-loop round-trip (question → answer → completion) works on a real issue
[ ] Pause → resume preserves all agent outputs accumulated before the pause
[ ] Kill leaves no orphaned ARQ jobs (confirmed by inspecting the queue, e.g. ArqRedis.queued_jobs(), and checking that Job.status() no longer reports the job as in progress)

Phase 4 — Backend API

Target: Weeks 7–8

Deliverables

D4.1 SSE streaming endpoint
GET /jobs/{id}/stream returns Content-Type: text/event-stream
Events: {"type": "agent_output", "agent": "...", "chunk": "..."} and {"type": "status_change", "status": "..."}
Redis Pub/Sub used for fanout (worker publishes → API subscribes → SSE to client)
After a dropped connection, the client reconnects within ~2 s. Missed events are not replayed because Pub/Sub keeps no history; gapless resume via Last-Event-ID is a v2 concern that requires Redis Streams or a DB-backed event log
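On the wire, each event in D4.1 is one SSE frame: a `data:` line carrying the JSON payload, terminated by a blank line. A minimal sketch of the framing, with hypothetical helper names; serialising through `json.dumps` keeps the payload on a single line even when a chunk contains newlines.

```python
import json
from typing import Any


def format_sse(payload: dict[str, Any]) -> str:
    """Serialize one event as an SSE frame: a `data:` line plus a blank line."""
    return f"data: {json.dumps(payload, separators=(',', ':'))}\n\n"


def agent_output_event(agent: str, chunk: str) -> str:
    """Frame for a streamed chunk of agent output."""
    return format_sse({"type": "agent_output", "agent": agent, "chunk": chunk})


def status_change_event(status: str) -> str:
    """Frame announcing a job status change."""
    return format_sse({"type": "status_change", "status": status})
```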
D4.2 Job persistence layer
Postgres table jobs: id, issue_url, status, created_at, updated_at, output, langsmith_url
SQLAlchemy async ORM models
Alembic migrations for schema
D4.3 GitHub API integration
GET /repos/{owner}/{repo}/issues/{number} called on job creation to fetch issue body, labels, author
Issue data stored in AgentState.issue_context
GitHub OAuth token passed via Authorization: Bearer header on API calls
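The lookup in D4.3 boils down to translating the submitted issue URL into a REST path and attaching the token. The sketch below only builds the request object and does not send it; `build_issue_request` is a hypothetical helper, and the `Accept` header is an addition beyond what the plan lists.

```python
from urllib.parse import urlparse
from urllib.request import Request

GITHUB_API = "https://api.github.com"


def build_issue_request(issue_url: str, token: str) -> Request:
    """Translate a github.com issue URL into an authenticated API request."""
    parts = urlparse(issue_url).path.strip("/").split("/")
    if len(parts) != 4 or parts[2] != "issues" or not parts[3].isdigit():
        raise ValueError(f"not a GitHub issue URL: {issue_url}")
    owner, repo, _, number = parts
    return Request(
        f"{GITHUB_API}/repos/{owner}/{repo}/issues/{number}",
        headers={
            "Authorization": f"Bearer {token}",
            "Accept": "application/vnd.github+json",
        },
    )
```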
D4.4 API hardening
Rate limiting: max 10 concurrent jobs per user
Input validation: issue_url must match github.com/{owner}/{repo}/issues/{number} pattern
OpenAPI docs auto-generated and accessible at /docs
All endpoints return structured error responses {"error": {"code": ..., "message": ...}}
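The URL check and error envelope from D4.4 could be combined as in the sketch below. The regular expression is one plausible reading of the `github.com/{owner}/{repo}/issues/{number}` pattern (it assumes an `https` scheme), and the error code `invalid_issue_url` is a made-up example value.

```python
import re
from typing import Any

ISSUE_URL_RE = re.compile(
    r"^https://github\.com/(?P<owner>[A-Za-z0-9-]+)/(?P<repo>[A-Za-z0-9._-]+)"
    r"/issues/(?P<number>[1-9]\d*)/?$"
)


def validate_issue_url(url: str) -> dict[str, Any]:
    """Return parsed parts on success or a structured error body on failure."""
    match = ISSUE_URL_RE.match(url)
    if match is None:
        return {
            "error": {
                "code": "invalid_issue_url",  # hypothetical error code
                "message": "issue_url must look like "
                "https://github.com/{owner}/{repo}/issues/{number}",
            }
        }
    return {
        "owner": match["owner"],
        "repo": match["repo"],
        "number": int(match["number"]),
    }
```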

Acceptance Criteria

Deliverable	Criterion
D4.1	`curl -N /jobs/{id}/stream` outputs ≥5 SSE events during a real job run; after a connection drop, the client reconnects within 2 s and the stream resumes from the live position (no historical replay)
D4.2	`alembic upgrade head` runs cleanly; job survives an API server restart (fetched from Postgres)
D4.3	Job state contains `issue_context.body` and `issue_context.labels` populated from the real GitHub API
D4.4	`curl -X POST /jobs -H 'Content-Type: application/json' -d '{"issue_url":"https://not-github.com/x"}'` returns `422`; `/docs` loads without error

Exit Gate (Phase 4 → Phase 5)

All of the following must be true:

[ ] D4.1–D4.4 acceptance criteria all pass
[ ] pytest tests/integration/test_api.py fully green
[ ] SSE stream tested with a real browser EventSource connection (manual check)
[ ] Alembic migration is idempotent (upgrade head run twice produces no error)

Phase 5 — React UI

Target: Weeks 9–11

Deliverables

D5.1 Job queue panel (Jira-style)
Left sidebar: list of job cards with issue_url title, status badge, creation time
Status badge colors: grey (Queued), blue (Running), amber (Waiting), purple (Paused), green (Done), red (Cancelled), dark-red (Timed Out)
"New Job" button opens a modal with issue_url input and Submit
D5.2 Live workspace panel
Center panel opens when a job card is selected
Streaming agent output rendered in real time via EventSource
Per-agent section headers with agent name and status indicator
Auto-scrolls to latest output; user can scroll up to read history
D5.3 Question cards (human-in-the-loop UI)
When job status is waiting, a question card appears above the workspace
Card shows the agent's question and a free-text answer input
Submitting the answer calls POST /jobs/{id}/answer and dismisses the card
Card is visually distinct (amber border, "Agent needs your input" label)
D5.4 Output panel
Right panel shows the final WriterOutput when job status is done
Fields: Severity badge, Root Cause, Relevant Files (file links), Draft Comment (editable), Ticket Draft (editable)
"Post to GitHub" button (calls write-back endpoint — stubbed in Phase 5, live in Phase 6)
D5.5 Agent control bar
Appears at the top of the workspace when a job is running or paused
Buttons: Pause / Resume, Redirect (opens text input), Kill
Each button calls the respective control endpoint and updates the UI state

Acceptance Criteria

Deliverable	Criterion
D5.1	Submitting a job via the UI creates a card that appears in the queue without page refresh
D5.2	Agent output chunks appear in the workspace < 2s after backend emits them (measure with browser DevTools)
D5.3	Question card appears when job reaches `waiting`; submitting answer causes card to disappear and job to resume
D5.4	Output panel renders all 5 fields when job reaches `done`; edits to Draft Comment persist in component state
D5.5	Clicking Pause → badge changes to `paused`; clicking Resume → badge returns to `running`

Exit Gate (Phase 5 → Phase 6)

All of the following must be true:

[ ] D5.1–D5.5 acceptance criteria all pass
[ ] Playwright E2E tests pass: npx playwright test tests/e2e/
[ ] UI tested in Chrome and Firefox (latest versions)
[ ] Lighthouse accessibility score ≥ 80 on the main dashboard page
[ ] No console errors during a full job lifecycle in the browser

Phase 6 — Polish

Target: Weeks 12–13

Deliverables

D6.1 GitHub write-back
"Post to GitHub" button calls POST /jobs/{id}/publish
Backend calls POST /repos/{owner}/{repo}/issues/{number}/comments with draft_comment
Backend calls POST /repos/{owner}/{repo}/issues/{number}/labels to add the severity label (a PATCH on the issue itself would replace the existing label set)
Result URL is stored on the job and displayed in the output panel
D6.2 LangSmith evaluation suite
Eval dataset bug_triage_v1 created in LangSmith with ≥ 10 reference examples
Evaluator script scripts/run_evals.py runs the full graph on dataset and submits results
Metrics tracked: helpfulness (LLM judge, 1–5), file_relevance (overlap with reference), severity_match (exact match)
Eval results visible in LangSmith Experiments tab
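The two deterministic metrics in D6.2 are simple enough to sketch directly. The plan does not define "overlap with reference", so `file_relevance` below assumes it means the fraction of reference files recovered (recall); the function names and the empty-reference convention are also assumptions. The LLM-judged `helpfulness` score is not modelled.

```python
def file_relevance(predicted: list[str], reference: list[str]) -> float:
    """Fraction of reference files that appear in the prediction (recall)."""
    reference_set = set(reference)
    if not reference_set:
        return 1.0  # nothing to find; treated as a perfect score (assumption)
    return len(set(predicted) & reference_set) / len(reference_set)


def severity_match(predicted: str, reference: str) -> float:
    """1.0 when the severities are identical strings, else 0.0."""
    return float(predicted == reference)


def aggregate(scores: list[float]) -> float:
    """Mean score across the dataset; 0.0 for an empty run."""
    return sum(scores) / len(scores) if scores else 0.0
```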
D6.3 Codebase vector index
On job creation, if repo not already indexed: clone repo, chunk Python files, embed with text-embedding-3-small, store in Chroma
Index stored persistently in data/chroma/{owner}-{repo}/
CodebaseSearchChain queries the index with k=5 results
Re-indexing triggered if repo has new commits since last index (checked via GitHub API)
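The caching rule in D6.3 reduces to two small decisions: where a repo's index lives and whether it is stale. The sketch below assumes staleness is detected by comparing the commit SHA recorded at index time with the current head SHA reported by the GitHub API; both helper names are hypothetical.

```python
from pathlib import PurePosixPath
from typing import Optional


def index_dir(owner: str, repo: str, root: str = "data/chroma") -> str:
    """Directory for a repo's persisted index, following the D6.3 layout."""
    return str(PurePosixPath(root) / f"{owner}-{repo}")


def needs_reindex(indexed_sha: Optional[str], head_sha: str) -> bool:
    """Re-index when nothing is indexed yet or the head commit has moved."""
    return indexed_sha is None or indexed_sha != head_sha
```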
D6.4 LangFlow configuration UI
LangFlow running as a Docker service at /langflow
Flow files for all 5 agents checked into langflow/flows/
Dashboard "Configure Agents" nav item opens the LangFlow UI
Non-technical user can change the system prompt of any agent and save (flow re-exported to langflow/flows/)

Acceptance Criteria

Deliverable	Criterion
D6.1	Clicking "Post to GitHub" on a completed job creates a real comment on the GitHub issue (verified manually on a test repo)
D6.2	`python scripts/run_evals.py --dataset bug_triage_v1` exits 0 and prints an aggregate `helpfulness` score ≥ 4.0 / 5.0
D6.3	Second job on the same repo uses cached index (confirmed: no re-clone in logs); `CodebaseSearchChain` returns ≥1 relevant file for a known fixture issue
D6.4	LangFlow UI accessible at `/langflow`; modifying a system prompt in LangFlow and running the graph produces output reflecting the change

Exit Gate (Phase 6 = v1.0 Release)

All of the following must be true:

[ ] D6.1–D6.4 acceptance criteria all pass
[ ] All Phase 1–5 exit gates remain satisfied (run full regression)
[ ] mkdocs build exits 0 with no warnings
[ ] docker compose up starts all services in < 60 seconds on a fresh clone with only .env configured
[ ] LangSmith eval score ≥ 4.0 / 5.0 on bug_triage_v1 dataset
[ ] README is complete with architecture diagram, quickstart, and links to deployed docs
[ ] All PRD documents published and linked from the MkDocs nav