Roadmap v1 — Precise Delivery Plan
| Field | Value |
|---|---|
| Version | v1.0 |
| Status | DRAFT |
| Date | March 2026 |
| Parent | PRD-001 Master Overview |
This document is the authoritative, week-precise delivery plan for AgentOps Dashboard v1.0. Each phase lists concrete deliverables, binary acceptance criteria per deliverable, and an exit gate — the set of conditions that must all pass before the next phase begins.
Phase 1 — Core Loop
Target: Weeks 1–3
Deliverables
- D1.1 Project scaffolding
    - `pyproject.toml` with `uv` workspace, `ruff`, `ty`, `pytest` configured
    - Docker Compose file with placeholder services
    - GitHub Actions CI running lint + test on push
    - `README.md` with quickstart skeleton
- D1.2 Single Investigator LCEL chain
    - `InvestigatorChain`: `ChatPromptTemplate | ChatOpenAI.with_structured_output(InvestigatorOutput)`
    - Output model: `InvestigatorOutput(summary: str, hypotheses: list[str], files_to_check: list[str])`
    - Unit tests covering happy path and malformed LLM output fallback
- D1.3 LangGraph skeleton (single-agent)
    - `StateGraph` with nodes: `supervisor`, `investigator`, `writer`, `END`
    - `AgentState` TypedDict with: `issue_url`, `messages`, `agent_outputs`, `status`
    - Supervisor routing logic (stub: always routes to investigator then writer)
    - SQLite checkpointer configured
- D1.4 LangSmith tracing active
    - `LANGCHAIN_TRACING_V2=true` in `.env.example`
    - Every chain call produces a trace with project name `agent-ops-v1`
    - Trace has named spans for prompt, LLM call, and parser
- D1.5 Minimal FastAPI job endpoint
    - `POST /jobs`: creates a job record (SQLite), enqueues via ARQ, returns `{id, status}`
    - `GET /jobs/{id}`: returns job record with current status
    - Pydantic v2 request model: `CreateJobRequest(issue_url: HttpUrl)`
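
The "malformed LLM output fallback" in D1.2 can be pictured with a stdlib-only sketch. It assumes the fallback keeps the raw text as the summary and leaves the lists empty; the plan does not define the fallback behavior. `parse_investigator_output` is a hypothetical helper, and the dataclass is a stand-in for the Pydantic model that `with_structured_output` would use in the real chain.

```python
import json
from dataclasses import dataclass, field


@dataclass
class InvestigatorOutput:
    """Stdlib stand-in for the output model named in D1.2."""
    summary: str
    hypotheses: list[str] = field(default_factory=list)
    files_to_check: list[str] = field(default_factory=list)


def _is_str_list(value: object) -> bool:
    return isinstance(value, list) and all(isinstance(v, str) for v in value)


def parse_investigator_output(raw: str) -> InvestigatorOutput:
    """Parse raw LLM text; fall back to a summary-only result if malformed.

    Assumption: keeping the raw text as the summary is an illustrative
    fallback policy, not one specified by the plan.
    """
    fallback = InvestigatorOutput(summary=raw.strip())
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return fallback
    if not isinstance(data, dict):
        return fallback
    summary = data.get("summary")
    hypotheses = data.get("hypotheses", [])
    files = data.get("files_to_check", [])
    if not isinstance(summary, str):
        return fallback
    if not _is_str_list(hypotheses) or not _is_str_list(files):
        return fallback
    return InvestigatorOutput(summary, hypotheses, files)
```

A unit test for the fallback case would then feed non-JSON text and assert that the result still has the `InvestigatorOutput` shape.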
Acceptance Criteria
| Deliverable | Criterion |
|---|---|
| D1.1 | ruff check . exits 0; pytest passes; docker compose up starts without error |
| D1.2 | pytest tests/unit/test_investigator_chain.py passes all cases including output validation |
| D1.3 | Running python -m src.graph.run --issue-url <url> completes and prints AgentState.agent_outputs |
| D1.4 | LangSmith UI shows a trace with ≥3 nested spans after running D1.3 |
| D1.5 | curl -X POST /jobs -d '{"issue_url":"..."}' returns 200 with id and status: queued |
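
The stub routing behind the D1.3 criterion (investigator, then writer, then end) can be sketched without LangGraph. This is a minimal illustration under assumptions: `route` and `run_stub` are hypothetical names, `agent_outputs` is assumed to be a dict keyed by agent name, and the string `"END"` stands in for LangGraph's `END` sentinel. The actual graph would register these as nodes and conditional edges on a `StateGraph`.

```python
from typing import TypedDict


class AgentState(TypedDict):
    issue_url: str
    messages: list[str]
    agent_outputs: dict[str, str]  # assumed shape: agent name -> output text
    status: str


def route(state: AgentState) -> str:
    """Stub supervisor: investigator first, then writer, then END."""
    outputs = state["agent_outputs"]
    if "investigator" not in outputs:
        return "investigator"
    if "writer" not in outputs:
        return "writer"
    return "END"


def run_stub(issue_url: str) -> AgentState:
    state: AgentState = {
        "issue_url": issue_url,
        "messages": [],
        "agent_outputs": {},
        "status": "running",
    }
    while (agent := route(state)) != "END":
        # Placeholder worker: a real node would invoke the agent's chain.
        state["agent_outputs"][agent] = f"{agent} output for {issue_url}"
        state["messages"].append(f"supervisor -> {agent}")
    state["status"] = "done"
    return state
```

Printing `run_stub(url)["agent_outputs"]` mirrors what the D1.3 criterion expects the CLI run to show.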
Exit Gate (Phase 1 → Phase 2)
All of the following must be true:
- [ ] D1.1–D1.5 acceptance criteria all pass
- [ ] A real GitHub issue (not a fixture) produces a non-empty `InvestigatorOutput`
- [ ] LangSmith trace is visible for the real-issue run
- [ ] CI is green on `main`
Phase 2 — Multi-Agent
Target: Weeks 4–5
Deliverables
- D2.1 Full agent suite (5 LCEL chains)
    - `CodebaseSearchChain`: semantic search over embedded repo (Chroma + `text-embedding-3-small`)
    - `WebSearchChain`: Tavily API tool integrated into LCEL chain
    - `CriticChain`: reviews prior agent outputs, outputs `CriticOutput(issues: list[str], verdict: Literal["pass","revise"])`
    - `WriterChain`: produces `WriterOutput(severity, root_cause, relevant_files, draft_comment, ticket_draft)`
- D2.2 LangServe endpoints (one per chain)
    - `POST /agents/investigator/invoke`
    - `POST /agents/codebase-search/invoke`
    - `POST /agents/web-search/invoke`
    - `POST /agents/critic/invoke`
    - `POST /agents/writer/invoke`
    - Each endpoint has `/health` returning `200 {"status": "ok"}`
    - Each agent service has its own `Dockerfile` and entry in `docker-compose.yml`
- D2.3 Supervisor routing logic (full)
    - Supervisor reads `AgentState` and decides next agent via LLM-powered routing
    - Routing respects dependency order: investigator and codebase-search before critic; critic before writer
    - `max_iterations` guard: terminates after 10 supervisor decisions
- D2.4 Shared state flowing through all nodes
    - `AgentState` carries accumulated `agent_outputs` across all nodes
    - Each worker reads relevant prior outputs from state before calling its chain
    - Final writer node has access to all prior outputs
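
The dependency order and the `max_iterations` guard from D2.3 can be sketched as a check wrapped around whatever the router proposes. This is an illustration under assumptions: the `choose` callback stands in for the LLM-powered decision, the underscore agent names are hypothetical, and counting a rejected proposal toward the limit is a design choice the plan does not specify.

```python
from typing import Callable

MAX_ITERATIONS = 10  # limit stated in D2.3

# Prerequisites per agent, as stated in D2.3; web_search has none listed.
DEPENDS_ON: dict[str, set[str]] = {
    "investigator": set(),
    "codebase_search": set(),
    "web_search": set(),
    "critic": {"investigator", "codebase_search"},
    "writer": {"critic"},
}


def allowed_agents(completed: set[str]) -> list[str]:
    """Agents not yet run whose prerequisites are all complete."""
    return [
        name
        for name, deps in DEPENDS_ON.items()
        if name not in completed and deps <= completed
    ]


def run_supervisor(choose: Callable[[list[str]], str]) -> tuple[list[str], str]:
    """Drive routing until the writer runs or the iteration limit is hit."""
    completed: set[str] = set()
    order: list[str] = []
    for _ in range(MAX_ITERATIONS):
        options = allowed_agents(completed)
        pick = choose(options)
        if pick not in options:
            continue  # rejected proposal still counts as a decision
        completed.add(pick)
        order.append(pick)
        if pick == "writer":
            return order, "done"
    return order, "max_iterations"
```

A router that always proposes `writer` is rejected ten times and the run ends with `max_iterations`, which is the failure mode the guard exists to bound.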
Acceptance Criteria
| Deliverable | Criterion |
|---|---|
| D2.1 | pytest tests/unit/test_*_chain.py passes for all 5 chains |
| D2.2 | curl /agents/investigator/health → 200; curl /agents/writer/invoke with fixture input → valid WriterOutput JSON |
| D2.3 | Running the full graph on a fixture issue routes through ≥3 distinct agents before reaching writer |
| D2.4 | WriterOutput.relevant_files contains files identified by CodebaseSearchChain in the same run |
Exit Gate (Phase 2 → Phase 3)
All of the following must be true:
- [ ] D2.1–D2.4 acceptance criteria all pass
- [ ] Full 5-agent run completes end-to-end in < 3 minutes on a real GitHub issue
- [ ] LangSmith trace shows all 5 agent spans under the supervisor span
- [ ] All 5 Docker services start cleanly via `docker compose up`
- [ ] CI green on `main`
Phase 3 — Human-in-the-Loop
Target: Week 6
Deliverables
- D3.1 `interrupt()` nodes in LangGraph
    - `human_input` node added to the graph with `interrupt_before` configuration
    - Supervisor can route to `human_input` when confidence is below threshold
    - Interrupt stores the question in `AgentState.pending_question`
- D3.2 Job control endpoints
    - `POST /jobs/{id}/answer`: injects user answer and calls `graph.update_state()` to resume
    - `POST /jobs/{id}/pause`: sets Redis flag checked by worker before each node
    - `POST /jobs/{id}/resume`: clears pause flag
    - `POST /jobs/{id}/redirect`: injects instruction into supervisor's next routing context
    - `POST /jobs/{id}/kill`: calls `Job.abort()` on the ARQ job, sets status `cancelled`
- D3.3 Checkpoint persistence (Postgres)
    - Migrate checkpointer from SQLite to Postgres for production durability
    - `AsyncPostgresSaver` configured in graph builder
    - Graph state survives a worker process restart mid-job
- D3.4 Timeout handling
    - Jobs in `waiting` status for > 10 minutes automatically transition to `timed_out`
    - ARQ scheduled task runs every minute to enforce timeout
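
The timeout rule in D3.4 reduces to a pure function that takes the current time as an argument, which is also what makes the mocked-clock test straightforward. A minimal sketch, assuming a hypothetical `waiting_since` timestamp on each job (the plan does not name the field) and a hypothetical `sweep_timeouts` helper that the scheduled task would call once a minute:

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

WAITING_TIMEOUT = timedelta(minutes=10)  # threshold stated in D3.4


@dataclass
class Job:
    id: str
    status: str
    waiting_since: Optional[datetime] = None  # hypothetical field


def sweep_timeouts(jobs: list[Job], now: datetime) -> list[str]:
    """Mark jobs that have waited longer than the timeout; return their ids."""
    timed_out: list[str] = []
    for job in jobs:
        if job.status != "waiting" or job.waiting_since is None:
            continue
        if now - job.waiting_since > WAITING_TIMEOUT:
            job.status = "timed_out"
            timed_out.append(job.id)
    return timed_out
```

Passing `now` explicitly, rather than calling `datetime.now()` inside the function, lets a test advance the clock by eleven minutes without sleeping.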
Acceptance Criteria
| Deliverable | Criterion |
|---|---|
| D3.1 | Running fixture FIXTURE_HITL issue causes job status to reach waiting with non-empty pending_question |
| D3.2 | pytest tests/integration/test_job_control.py passes all 5 control action tests |
| D3.3 | Killing and restarting the worker mid-job, then calling /resume, completes the job successfully |
| D3.4 | A job left in waiting for 10+ minutes (mocked clock) transitions to timed_out |
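
The pause behavior tested under D3.2 (a flag checked by the worker before each node) can be sketched as a loop that stops at a node boundary and reports where to resume. This is an illustration only: the `paused_jobs` set is an in-memory stand-in for the Redis flag, `run_nodes` is a hypothetical helper, and in the real system the LangGraph checkpointer, not a returned index, would carry the resume position.

```python
from typing import Callable

Node = Callable[[dict[str, str]], None]


def run_nodes(
    job_id: str,
    nodes: list[Node],
    outputs: dict[str, str],
    paused_jobs: set[str],
    start: int = 0,
) -> int:
    """Run nodes from `start`; stop before the next node if the job is paused.

    Returns the index of the next node to run, so a later call can resume
    without re-running nodes or losing accumulated outputs.
    """
    for index in range(start, len(nodes)):
        if job_id in paused_jobs:
            return index
        nodes[index](outputs)
    return len(nodes)
```

Because the check happens only between nodes, a pause never interrupts a node mid-call, and every output written before the pause is still in `outputs` on resume.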
Exit Gate (Phase 3 → Phase 4)
All of the following must be true:
- [ ] D3.1–D3.4 acceptance criteria all pass
- [ ] A human-in-the-loop round-trip (question → answer → completion) works on a real issue
- [ ] Pause → resume preserves all agent outputs accumulated before the pause
- [ ] Kill leaves no orphaned ARQ jobs (confirmed via ARQ dashboard or `arq.inspect`)
Phase 4 — Backend API
Target: Weeks 7–8
Deliverables
- D4.1 SSE streaming endpoint
    - `GET /jobs/{id}/stream` returns `Content-Type: text/event-stream`
    - Events: `{"type": "agent_output", "agent": "...", "chunk": "..."}` and `{"type": "status_change", "status": "..."}`
    - Redis Pub/Sub used for fanout (worker publishes → API subscribes → SSE to client)
    - Connection drops reconnect within ~2 s; missed events are not replayed because Pub/Sub has no history (gapless resume via `Last-Event-ID` is a v2 concern requiring Redis Streams or a DB-backed event log)
- D4.2 Job persistence layer
    - Postgres table `jobs`: `id`, `issue_url`, `status`, `created_at`, `updated_at`, `output`, `langsmith_url`
    - SQLAlchemy async ORM models
    - Alembic migrations for schema
- D4.3 GitHub API integration
    - `GET /repos/{owner}/{repo}/issues/{number}` called on job creation to fetch issue body, labels, author
    - Issue data stored in `AgentState.issue_context`
    - GitHub OAuth token passed via `Authorization: Bearer` header on API calls
- D4.4 API hardening
    - Rate limiting: max 10 concurrent jobs per user
    - Input validation: `issue_url` must match `github.com/{owner}/{repo}/issues/{number}` pattern
    - OpenAPI docs auto-generated and accessible at `/docs`
    - All endpoints return structured error responses `{"error": {"code": ..., "message": ...}}`
Acceptance Criteria
| Deliverable | Criterion |
|---|---|
| D4.1 | curl -N /jobs/{id}/stream outputs ≥5 SSE events during a real job run; connection drop reconnects within 2 s and stream resumes from the live position (no historical replay) |
| D4.2 | alembic upgrade head runs cleanly; job survives an API server restart (fetched from Postgres) |
| D4.3 | Job state contains issue_context.body and issue_context.labels populated from the real GitHub API |
| D4.4 | curl -X POST /jobs -d '{"issue_url":"https://not-github.com/x"}' returns 422; /docs loads without error |
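
The D4.4 input check and error envelope can be sketched with a regular expression. Several details here are assumptions rather than requirements from the plan: accepting only `https` URLs, the allowed characters in owner and repo names, the hypothetical `invalid_issue_url` code, and the `parse_issue_url` helper itself. In the real API this logic would more likely live in a Pydantic validator that yields the 422 response.

```python
import re

# Assumption: https only, github.com host, optional trailing slash.
ISSUE_URL_RE = re.compile(
    r"^https://github\.com/"
    r"(?P<owner>[A-Za-z0-9-]+)/"
    r"(?P<repo>[A-Za-z0-9._-]+)/"
    r"issues/(?P<number>[1-9][0-9]*)/?$"
)


def parse_issue_url(url: str) -> dict:
    """Return owner/repo/number, or a structured error body."""
    match = ISSUE_URL_RE.match(url)
    if match is None:
        return {
            "error": {
                "code": "invalid_issue_url",  # hypothetical error code
                "message": (
                    "issue_url must match "
                    "github.com/{owner}/{repo}/issues/{number}"
                ),
            }
        }
    return {
        "owner": match["owner"],
        "repo": match["repo"],
        "number": int(match["number"]),
    }
```

The parsed `owner`, `repo`, and `number` are the same values the D4.3 GitHub call needs, so validating and extracting in one step avoids parsing the URL twice.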
Exit Gate (Phase 4 → Phase 5)
All of the following must be true:
- [ ] D4.1–D4.4 acceptance criteria all pass
- [ ] `pytest tests/integration/test_api.py` fully green
- [ ] SSE stream tested with a real browser `EventSource` connection (manual check)
- [ ] Alembic migration is idempotent (`upgrade head` run twice produces no error)
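
For the manual `EventSource` check, it helps to know what each frame looks like on the wire. The sketch below frames the two D4.1 event payloads as Server-Sent Events `data:` lines. The function names are hypothetical, and no `id:` field is emitted, since the plan defers `Last-Event-ID` replay to v2.

```python
import json


def format_sse(payload: dict) -> str:
    """Encode one JSON payload as a single SSE frame.

    json.dumps escapes newlines inside strings, so the payload always fits
    on one `data:` line; the blank line terminates the event.
    """
    data = json.dumps(payload, separators=(",", ":"))
    return f"data: {data}\n\n"


def agent_output_event(agent: str, chunk: str) -> str:
    return format_sse({"type": "agent_output", "agent": agent, "chunk": chunk})


def status_change_event(status: str) -> str:
    return format_sse({"type": "status_change", "status": status})
```

On the browser side, each frame arrives as one `message` event whose `data` can be passed straight to `JSON.parse`.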
Phase 5 — React UI
Target: Weeks 9–11
Deliverables
- D5.1 Job queue panel (Jira-style)
    - Left sidebar: list of job cards with `issue_url` title, status badge, creation time
    - Status badge colors: grey (Queued), blue (Running), amber (Waiting), purple (Paused), green (Done), red (Cancelled), dark-red (Timed Out)
    - "New Job" button opens a modal with `issue_url` input and Submit
- D5.2 Live workspace panel
    - Center panel opens when a job card is selected
    - Streaming agent output rendered in real time via `EventSource`
    - Per-agent section headers with agent name and status indicator
    - Auto-scrolls to latest output; user can scroll up to read history
- D5.3 Question cards (human-in-the-loop UI)
    - When job status is `waiting`, a question card appears above the workspace
    - Card shows the agent's question and a free-text answer input
    - Submitting the answer calls `POST /jobs/{id}/answer` and dismisses the card
    - Card is visually distinct (amber border, "Agent needs your input" label)
- D5.4 Output panel
    - Right panel shows the final `WriterOutput` when job status is `done`
    - Fields: Severity badge, Root Cause, Relevant Files (file links), Draft Comment (editable), Ticket Draft (editable)
    - "Post to GitHub" button (calls write-back endpoint; stubbed in Phase 5, live in Phase 6)
- D5.5 Agent control bar
    - Appears at the top of the workspace when a job is `running` or `paused`
    - Buttons: Pause / Resume, Redirect (opens text input), Kill
    - Each button calls the respective control endpoint and updates the UI state
Acceptance Criteria
| Deliverable | Criterion |
|---|---|
| D5.1 | Submitting a job via the UI creates a card that appears in the queue without page refresh |
| D5.2 | Agent output chunks appear in the workspace < 2s after backend emits them (measure with browser DevTools) |
| D5.3 | Question card appears when job reaches waiting; submitting answer causes card to disappear and job to resume |
| D5.4 | Output panel renders all 5 fields when job reaches done; edits to Draft Comment persist in component state |
| D5.5 | Clicking Pause → badge changes to paused; clicking Resume → badge returns to running |
Exit Gate (Phase 5 → Phase 6)
All of the following must be true:
- [ ] D5.1–D5.5 acceptance criteria all pass
- [ ] Playwright E2E tests pass: `npx playwright test tests/e2e/`
- [ ] UI tested in Chrome and Firefox (latest versions)
- [ ] Lighthouse accessibility score ≥ 80 on the main dashboard page
- [ ] No console errors during a full job lifecycle in the browser
Phase 6 — Polish
Target: Weeks 12–13
Deliverables
- D6.1 GitHub write-back
    - "Post to GitHub" button calls `POST /jobs/{id}/publish`
    - Backend calls `POST /repos/{owner}/{repo}/issues/{number}/comments` with `draft_comment`
    - Backend calls `PATCH /repos/{owner}/{repo}/issues/{number}` to add severity label
    - Result URL is stored on the job and displayed in the output panel
- D6.2 LangSmith evaluation suite
    - Eval dataset `bug_triage_v1` created in LangSmith with ≥ 10 reference examples
    - Evaluator script `scripts/run_evals.py` runs the full graph on dataset and submits results
    - Metrics tracked: `helpfulness` (LLM judge, 1–5), `file_relevance` (overlap with reference), `severity_match` (exact match)
    - Eval results visible in LangSmith Experiments tab
- D6.3 Codebase vector index
    - On job creation, if repo not already indexed: clone repo, chunk Python files, embed with `text-embedding-3-small`, store in Chroma
    - Index stored persistently in `data/chroma/{owner}-{repo}/`
    - `CodebaseSearchChain` queries the index with `k=5` results
    - Re-indexing triggered if repo has new commits since last index (checked via GitHub API)
- D6.4 LangFlow configuration UI
    - LangFlow running as a Docker service at `/langflow`
    - Flow files for all 5 agents checked into `langflow/flows/`
    - Dashboard "Configure Agents" nav item opens the LangFlow UI
    - Non-technical user can change the system prompt of any agent and save (flow re-exported to `langflow/flows/`)
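
The two deterministic metrics in D6.2 can be sketched as plain functions. The plan says only "overlap with reference" for `file_relevance`, so the formula below (the fraction of reference files that appear in the prediction) is an assumption; a Jaccard score would be an equally plausible reading. The `mean` helper is hypothetical, and `helpfulness` is omitted because it depends on an LLM judge.

```python
def file_relevance(predicted: list[str], reference: list[str]) -> float:
    """Fraction of reference files present in the prediction (assumed formula)."""
    ref = set(reference)
    if not ref:
        return 1.0  # nothing to find, so nothing was missed
    return len(ref & set(predicted)) / len(ref)


def severity_match(predicted: str, reference: str) -> int:
    """1 if the severities are identical strings, else 0 (exact match)."""
    return int(predicted == reference)


def mean(scores: list[float]) -> float:
    """Aggregate per-example scores; 0.0 for an empty dataset."""
    return sum(scores) / len(scores) if scores else 0.0
```

Exact matching is case-sensitive here, so `"high"` and `"High"` score 0; normalizing severities before comparison would be a separate decision.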
Acceptance Criteria
| Deliverable | Criterion |
|---|---|
| D6.1 | Clicking "Post to GitHub" on a completed job creates a real comment on the GitHub issue (verified manually on a test repo) |
| D6.2 | python scripts/run_evals.py --dataset bug_triage_v1 exits 0 and prints an aggregate helpfulness score ≥ 4.0 / 5.0 |
| D6.3 | Second job on the same repo uses cached index (confirmed: no re-clone in logs); CodebaseSearchChain returns ≥1 relevant file for a known fixture issue |
| D6.4 | LangFlow UI accessible at /langflow; modifying a system prompt in LangFlow and running the graph produces output reflecting the change |
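
The cache decision behind the D6.3 criterion (no re-clone on the second job) comes down to an index path and a commit comparison. A minimal sketch, assuming the index records the commit SHA it was built from and that the current head SHA is fetched from the GitHub API; the plan does not say how the last-indexed commit is stored, and both function names are hypothetical.

```python
from pathlib import PurePosixPath
from typing import Optional


def index_dir(owner: str, repo: str) -> PurePosixPath:
    """Index location following the data/chroma/{owner}-{repo}/ layout."""
    return PurePosixPath("data/chroma") / f"{owner}-{repo}"


def needs_reindex(indexed_sha: Optional[str], head_sha: str) -> bool:
    """True if the repo was never indexed or has moved since the last index.

    Assumption: `indexed_sha` is None when no index exists yet.
    """
    return indexed_sha is None or indexed_sha != head_sha
```

When `needs_reindex` returns `False`, the job can skip the clone and embed steps entirely and query the existing index.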
Exit Gate (Phase 6 = v1.0 Release)
All of the following must be true:
- [ ] D6.1–D6.4 acceptance criteria all pass
- [ ] All Phase 1–5 exit gates remain satisfied (run full regression)
- [ ] `mkdocs build` exits 0 with no warnings
- [ ] `docker compose up` starts all services in < 60 seconds on a fresh clone with only `.env` configured
- [ ] LangSmith eval score ≥ 4.0 / 5.0 on `bug_triage_v1` dataset
- [ ] README is complete with architecture diagram, quickstart, and links to deployed docs
- [ ] All PRD documents published and linked from the MkDocs nav