PRD-001-1 — Goals & Definition of Done¶
| Field | Value |
|---|---|
| Document ID | PRD-001-1 |
| Version | 1.0 |
| Status | DRAFT |
| Date | March 2026 |
| Parent | PRD-001 Master Overview |
Goals¶
Each goal below includes a Definition of Done (binary, pass/fail criteria) and a Verification step (exact command, UI step, or check to confirm the DoD is met).
Goal 1 — Working multi-agent bug triage system (full LangX stack)¶
Deliver a functional end-to-end system: a GitHub issue URL goes in, a structured triage report comes out, powered by the full LangChain ecosystem (LCEL, LangGraph, LangServe, LangSmith, LangFlow).
Definition of Done
- [ ] A GitHub issue URL submitted via
POST /jobsproduces a job record with statusqueued - [ ] The LangGraph supervisor starts within 5 seconds and routes to at least one worker agent
- [ ] All 5 worker agents (Investigator, Codebase Searcher, Web Searcher, Critic, Writer) execute successfully on a sample issue
- [ ] The final output contains:
severity,root_cause,relevant_files(≥1),draft_comment, andticket_draftfields - [ ] End-to-end wall-clock time is < 3 minutes for a standard GitHub issue
Verification
# Submit a job and poll until done
JOB_ID=$(curl -s -X POST http://localhost:8000/jobs \
-H "Content-Type: application/json" \
-d '{"issue_url":"https://github.com/owner/repo/issues/1"}' | jq -r .id)
# Poll status
curl -s http://localhost:8000/jobs/$JOB_ID | jq '{status, output}'
# Assert output fields exist
curl -s http://localhost:8000/jobs/$JOB_ID | jq '
.output | has("severity") and has("root_cause") and
has("relevant_files") and has("draft_comment") and has("ticket_draft")
'
# Expected: true
Goal 2 — Jira-inspired real-time dashboard (streaming tickets)¶
The frontend presents submitted jobs as ticket cards. Each ticket updates in real time as agents stream their output — no full-page refresh required.
Definition of Done
- [ ] The job queue panel renders submitted jobs as Jira-style cards with status badges (
Queued,Running,Waiting,Done) - [ ] Selecting a card opens the live workspace panel with streaming agent output
- [ ] New agent output appears in the UI within 2 seconds of the backend emitting it (SSE latency)
- [ ] Status badge transitions correctly through all states without a page reload
- [ ] UI is functional in Chrome (latest) and Firefox (latest)
Verification
- Start the backend and frontend (
uvicorn main:app+npm run dev) - Submit a job via the UI
- Open Chrome DevTools → Network → filter
text/event-stream - Confirm SSE events arrive; verify the workspace panel updates each time an event fires
- Run Playwright smoke test:
npx playwright test tests/e2e/streaming.spec.ts - Test asserts that the status badge changes from
Queued→Running→Doneduring a real job - Test asserts at least 3 streaming chunks appear in the workspace panel before completion
Goal 3 — Bidirectional human-in-the-loop (agent questions block graph)¶
An agent can stop mid-execution, surface a clarifying question to the user, and the graph stays paused until the user answers. The answer is injected back and execution resumes from the exact checkpoint.
Definition of Done
- [ ] When the supervisor emits a
human_inputnode interrupt, job status changes towaiting - [ ] A question card appears in the UI within 2 seconds of the interrupt
- [ ] Submitting an answer via
POST /jobs/{id}/answerresumes graph execution from the checkpoint (no state is lost) - [ ] A job with 2 question rounds completes successfully with all prior context retained
- [ ] If the user does not answer within 30 minutes, the job transitions to
timed_out
Verification
# Trigger a known question-asking scenario (use a test fixture issue)
JOB_ID=$(curl -s -X POST http://localhost:8000/jobs \
-H "Content-Type: application/json" \
-d '{"issue_url":"https://github.com/owner/repo/issues/FIXTURE_HITL"}' | jq -r .id)
# Wait for status=waiting
until [ "$(curl -s http://localhost:8000/jobs/$JOB_ID | jq -r .status)" = "waiting" ]; do sleep 2; done
# Answer the question
curl -s -X POST http://localhost:8000/jobs/$JOB_ID/answer \
-H "Content-Type: application/json" \
-d '{"answer": "The issue reproduces on Python 3.11 only"}'
# Confirm status returns to running then done
curl -s http://localhost:8000/jobs/$JOB_ID | jq .status
# Expected: "done" (eventually)
UI check: Playwright test tests/e2e/hitl.spec.ts asserts question card appears and disappears after answer submission.
Goal 4 — Pause / redirect / kill any agent mid-execution¶
A user can interrupt a running job at any point: pause it (freeze state), redirect it (inject a new instruction into the supervisor), or kill it (terminate cleanly).
Definition of Done
- [ ]
POST /jobs/{id}/pausetransitions status topausedand the LangGraph worker stops processing new nodes - [ ]
POST /jobs/{id}/resumeresumes from the last checkpoint without restarting the graph - [ ]
POST /jobs/{id}/redirectwith a{"instruction": "..."}body injects the instruction into the supervisor's next routing decision - [ ]
POST /jobs/{id}/killterminates the ARQ job viaJob.abort()and sets status tocancelled - [ ] All control actions complete within 3 seconds of the API call
- [ ] The UI exposes Pause, Redirect, and Kill buttons that trigger the respective endpoints
Verification
# Start a job, immediately pause it
JOB_ID=$(curl -s -X POST http://localhost:8000/jobs -d '{"issue_url":"..."}' -H "Content-Type: application/json" | jq -r .id)
curl -s -X POST http://localhost:8000/jobs/$JOB_ID/pause | jq .status
# Expected: "paused"
# Resume
curl -s -X POST http://localhost:8000/jobs/$JOB_ID/resume | jq .status
# Expected: "running"
# Kill
curl -s -X POST http://localhost:8000/jobs/$JOB_ID/kill | jq .status
# Expected: "cancelled"
Run pytest tests/integration/test_job_control.py -v — all 6 control action tests must pass.
Goal 5 — LangSmith instrumentation (tracing, eval, cost)¶
Every agent call, LangGraph node transition, and LCEL chain execution is automatically traced in LangSmith. A LangSmith deep-link appears on each job card. Cost and latency metrics are queryable.
Definition of Done
- [ ]
LANGCHAIN_TRACING_V2=trueis set; every job produces a LangSmith trace with a uniquerun_id - [ ] The trace tree shows LCEL chain internals (prompt, LLM call, output parser) as child spans
- [ ] LangGraph node transitions appear as named spans in the trace
- [ ] Each job record in the DB contains a
langsmith_urlfield pointing to the trace - [ ] The UI job card renders a "View in LangSmith" link that opens the correct trace
- [ ] At least one eval dataset exists in LangSmith with ≥ 10 reference examples
- [ ] Running
langsmith evaluateon the dataset produces a score and stores it in LangSmith
Verification
# Confirm trace is created after a job
JOB=$(curl -s http://localhost:8000/jobs/$JOB_ID)
echo $JOB | jq .langsmith_url
# Expected: non-empty URL like https://smith.langchain.com/...
# Open the URL — verify trace tree shows child spans
# (Manual: open URL in browser, confirm LCEL and LangGraph spans are visible)
# Run evals
python scripts/run_evals.py --dataset bug_triage_v1
# Expected: prints eval results with score >= 4.0 / 5.0
Goal 6 — LangFlow as visual prototyping layer¶
Each agent chain is first designed in LangFlow before being committed to code. LangFlow serves as the live configuration UI for non-technical users to adjust agent prompts and model settings.
Definition of Done
- [ ] LangFlow is running at
http://localhost:7860(Docker orlangflow run) - [ ] Each of the 5 worker agent chains has a saved LangFlow flow file in
langflow/flows/ - [ ] Exporting a flow from LangFlow and running it produces the same output as the Python LCEL implementation (within LLM variance)
- [ ] LangFlow configuration UI is linked from the main dashboard ("Configure Agents" nav item)
Verification
# Start LangFlow
langflow run --port 7860
# Verify flow files exist
ls langflow/flows/
# Expected: investigator.json, codebase_search.json, web_search.json, critic.json, writer.json
# Run flow via LangFlow API and compare to LangServe endpoint
python scripts/compare_flow_vs_langserve.py --agent investigator --input tests/fixtures/sample_issue.json
# Expected: both outputs have the same top-level keys; no assertion errors
UI check: navigate to http://localhost:3000 → "Configure Agents" → verify LangFlow iframe or link is reachable.
Goal 7 — LangServe microservice deployment per agent chain¶
Each finalized agent chain is deployed as an independent LangServe HTTP endpoint. The LangGraph supervisor calls these endpoints as tools over HTTP.
Definition of Done
- [ ] All 5 LangServe endpoints are running and pass their
/healthchecks: POST /agents/investigator/invokePOST /agents/codebase-search/invokePOST /agents/web-search/invokePOST /agents/critic/invokePOST /agents/writer/invoke- [ ] Each endpoint returns a valid structured response within 60 seconds for a sample input
- [ ] The LangGraph supervisor successfully calls all 5 endpoints during a real job (confirmed via LangSmith trace)
- [ ] Each endpoint has its own Docker service in
docker-compose.yml
Verification
# Health checks — each agent runs on its own port (PRD-004 Service Registry)
declare -A AGENT_PORTS=([investigator]=8001 [codebase-search]=8002 [web-search]=8003 [critic]=8004 [writer]=8005)
for agent in investigator codebase-search web-search critic writer; do
PORT=${AGENT_PORTS[$agent]}
STATUS=$(curl -s -o /dev/null -w "%{http_code}" http://localhost:${PORT}/agents/$agent/health)
echo "$agent (port $PORT): $STATUS"
done
# Expected: all 200
# Invoke investigator with sample input
curl -s -X POST http://localhost:8001/agents/investigator/invoke \
-H "Content-Type: application/json" \
-d @tests/fixtures/sample_issue.json | jq .output
# Expected: non-empty object with investigation fields
# Docker services running
docker compose ps | grep agent
# Expected: 5 agent service rows, all "Up"
Goal 8 — Portfolio-grade production quality¶
The codebase demonstrates professional engineering standards: typed, tested, linted, documented, and deployable via Docker Compose with a one-command setup.
Definition of Done
- [ ] All Python files pass
ruff checkandty checkwith zero errors - [ ] Test coverage ≥ 80% (
pytest --cov) - [ ]
docker compose upstarts all services successfully with no manual steps beyond setting.env - [ ]
mkdocs buildcompletes with zero warnings - [ ] All PRD documents are published and navigable via MkDocs
- [ ] A
README.mdexists with: project description, architecture diagram, quickstart (≤5 steps), and links to docs - [ ] The repo has CI (GitHub Actions) running lint + tests on every PR
Verification
# Lint
ruff check . && echo "ruff: OK"
ty check . && echo "ty: OK"
# Tests
pytest --cov=src --cov-report=term-missing
# Expected: coverage >= 80%
# Docker
docker compose up -d
docker compose ps
# Expected: all services Up, no Exit codes
# Docs
cd docs && mkdocs build
# Expected: "INFO - Documentation built successfully" with 0 warnings
CI check: open the GitHub Actions tab on the PR — all checks must be green.
Non-Goals (v1.0)¶
The following are explicitly out of scope for v1.0 and are tracked in the v2 roadmap.
| Non-Goal | Reason deferred |
|---|---|
| Support for non-GitHub trackers (Jira, Linear, GitLab) | Adds auth/API complexity; GitHub covers the core demo use case |
| Fully autonomous operation (zero human oversight) | Safety and trust require human-in-the-loop for v1 |
| Non-software-dev domains | Domain-specific tooling and prompts need separate PRDs |
| Real-time multi-user collaborative sessions | Requires session isolation and conflict-resolution logic |
| Mobile interface | Desktop-first; mobile layout deferred post-MVP |
| Self-hosted LLM support (Ollama, etc.) | API compatibility and performance tuning are non-trivial |