
PRD-001 — AgentOps Dashboard

| Field | Value |
| --- | --- |
| Document ID | PRD-001 |
| Version | 1.0 |
| Status | DRAFT |
| Date | March 2026 |
| Author | Product & Engineering Team |
| Related Docs | PRD-002, PRD-003, PRD-004, PRD-005, PRD-006, PRD-007, PRD-008, PRD-009 |

Executive Summary

AgentOps Dashboard is a developer-facing platform for orchestrating, supervising, and debugging multi-agent AI workflows for software development tasks. The product combines the full LangChain ecosystem — LangChain, LCEL, LangGraph, LangSmith, LangServe, and LangFlow — into a coherent, production-grade system fronted by a Jira-inspired user interface.

The core insight driving this product is that multi-agent AI systems today suffer from a critical UX gap: models are capable enough to do real work, but there is no intuitive interface for a developer to supervise, steer, and trust a coordinated team of AI agents. AgentOps Dashboard fills this gap.

Elevator Pitch: AgentOps Dashboard is Jira for AI agents. Submit a GitHub issue, and watch a team of specialized AI agents investigate, reason, debate, and produce a full triage report — while you retain full control to pause, redirect, or override any agent at any moment. Agents can also stop and ask you questions when they need more context, just like a junior developer would.


Problem Statement

The Multi-Agent Visibility Gap

Modern LLM agents are capable of autonomous reasoning, tool use, and multi-step planning. However, when multiple agents collaborate on a complex task, developers currently face:

  • No real-time visibility into what each agent is doing and why
  • No ability to intervene mid-task when an agent heads in the wrong direction
  • No standardized way to evaluate whether agent outputs are correct or improving over time
  • No clean separation between agent logic (what agents do) and orchestration logic (how they coordinate)
  • No human-friendly interface — everything lives in terminal logs or raw LangSmith traces

The Software Dev Triage Pain Point

Bug triage is a high-value, time-consuming workflow with clear AI ROI. A typical senior engineer spends 30–45 minutes per complex issue: reading the issue, searching the codebase, checking similar past bugs, forming a hypothesis, and writing up findings. This is exactly the kind of structured, multi-step, tool-heavy workflow that multi-agent systems excel at — but only if a human can trust and steer the process.


Goals and Non-Goals

Full goal definitions, Definition of Done criteria, and verification steps are in PRD-001-1 — Goals & Definition of Done.

Goals (summary)

  1. Working multi-agent bug triage system (full LangX stack)
  2. Jira-inspired real-time dashboard (streaming tickets)
  3. Bidirectional human-in-the-loop (agent questions block graph)
  4. Pause / redirect / kill any agent mid-execution
  5. LangSmith instrumentation (tracing, eval, cost)
  6. LangFlow as visual prototyping layer
  7. LangServe microservice deployment per agent chain
  8. Portfolio-grade production quality

Non-Goals (v1.0, summary)

See PRD-001-1 Non-Goals for the full list. Items deferred to v2 include: non-GitHub trackers, non-software domains, self-hosted LLMs, multi-user sessions, and a mobile interface.


User Personas

| Persona | Role | Primary Goal | Key Pain Point |
| --- | --- | --- | --- |
| Alex | Senior Backend Engineer | Triage incoming GitHub bugs faster without losing context | Spends 30+ min per issue reading the codebase before forming a hypothesis |
| Jordan | Engineering Manager | Monitor AI agent quality and cost across team repos | Black-box AI feels untrustworthy — no visibility into agent reasoning |
| Sam | ML/AI Engineer | Iterate on agent prompts and chains quickly | Edit code → redeploy → re-test cycle is too slow for prompt iteration |
| Taylor | Tech Lead | Configure agent behavior per repo without touching internals | Every AI tool requires bespoke integration work |

Product Overview

What It Is

AgentOps Dashboard is a web application with a Python/FastAPI backend and a React frontend. Users connect a GitHub repository, submit an issue URL, and the system:

  1. Spins up a LangGraph supervisor agent that decomposes the triage task
  2. Spawns specialized worker agents (Investigator, Codebase Searcher, Web Searcher, Critic, Writer) — each backed by a LangServe endpoint containing an LCEL chain
  3. Streams real-time state updates to the frontend via Server-Sent Events
  4. Pauses whenever an agent needs human input, surfacing a question card in the UI
  5. Produces a final structured output: severity rating, root cause, relevant files, a drafted GitHub comment, and a ticket draft — all editable before posting
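The streaming contract in step 3 can be sketched with plain stdlib string framing. The event name `state_diff` and the payload fields are illustrative assumptions, not a committed wire format — PRD-002 owns the actual schema:

```python
import json

def sse_event(event: str, data: dict) -> str:
    """Frame one Server-Sent Event: an event name, a JSON data line,
    and the blank line the SSE wire format requires as a terminator."""
    return f"event: {event}\ndata: {json.dumps(data)}\n\n"

# A hypothetical state diff emitted while the investigator agent runs
frame = sse_event("state_diff", {"agent": "investigator", "status": "Running"})
print(frame)
```

On the FastAPI side, frames like this would be yielded from a `StreamingResponse` generator with `media_type="text/event-stream"`.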

The Jira Analogy

The product is intentionally modeled after Jira's mental model:

| Jira Concept | AgentOps Equivalent |
| --- | --- |
| Project | GitHub Repository |
| Issue / Ticket | Triage Job (submitted GitHub issue) |
| Assignee | Active Agent |
| Status column | Agent execution state (Queued / Running / Waiting / Done) |
| Comments thread | Agent reasoning + question/answer exchanges |
| Labels | Severity, root cause category, affected modules |
| Workflow transitions | LangGraph node transitions |
| Activity log | LangSmith trace (linked from each job) |

LangChain Ecosystem Map

Every tool in the LangX ecosystem plays a specific role in this product; none is included just to check a box.

| Tool | Layer | Exact Role |
| --- | --- | --- |
| LangChain + LCEL | Micro (agent internals) | Each worker agent is an LCEL chain: prompt \| llm \| output_parser. Handles streaming, async, parallel execution, and structured output parsing. |
| LangFlow | Prototyping | Visual canvas used to design and test each agent's LCEL chain before it's committed to code. Also serves as the agent configuration UI for non-technical users. |
| LangServe | Deployment | Each finalized agent chain is deployed as an independent HTTP endpoint (POST /agents/investigator/invoke). The LangGraph supervisor calls these endpoints as tools — clean microservice separation. |
| LangGraph | Macro (orchestration) | Supervisor pattern: manages which agent runs next, handles shared state flowing through all nodes, implements interrupt() for human-in-the-loop pauses, and persists state via checkpointing. |
| LangSmith | Observability | Instruments all layers automatically. Traces LCEL chain internals, LangGraph node transitions, and cross-agent job traces. Provides eval datasets, A/B prompt testing, and a "View in LangSmith" deep-link from every job in the UI. |
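The `prompt | llm | output_parser` composition that LCEL provides can be illustrated without LangChain at all. This pure-Python `Runnable` is a toy stand-in for `langchain_core`'s composition protocol — real LCEL adds streaming, async, batching, and tracing on top of the same `|` idea:

```python
class Runnable:
    """Minimal stand-in for LCEL composition: `a | b` builds a pipeline
    whose invoke() feeds a's output into b."""
    def __init__(self, fn):
        self.fn = fn

    def __or__(self, other: "Runnable") -> "Runnable":
        return Runnable(lambda x: other.fn(self.fn(x)))

    def invoke(self, x):
        return self.fn(x)

# prompt | llm | output_parser, with toy stages standing in for real ones
prompt = Runnable(lambda issue: f"Triage this issue: {issue}")
llm = Runnable(lambda text: text.upper())            # fake model call
output_parser = Runnable(lambda text: {"report": text})

chain = prompt | llm | output_parser
print(chain.invoke("login crashes"))  # → {'report': 'TRIAGE THIS ISSUE: LOGIN CRASHES'}
```

The payoff of this shape is that every worker agent exposes the same `invoke` surface, which is exactly what lets LangServe wrap each chain as an HTTP endpoint.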

High-Level Architecture

```mermaid
flowchart TD
    GH(["GitHub Issue URL"])

    subgraph BE["FastAPI Backend"]
        JOBS["POST /jobs — creates LangGraph thread"]
        STREAM["GET /jobs/{id}/stream — SSE state diffs"]
        ANSWER["POST /jobs/{id}/answer — resume after human input"]
    end

    subgraph ORCH["LangGraph Orchestration Layer"]
        SUP["supervisor"]
        WORKERS["investigator | codebase_search\nweb_search | critic | human_input"]
        WR["writer"]
        CP[("Checkpointer — SQLite / Postgres")]
        SUP -->|routes to| WORKERS
        WORKERS -->|returns to| SUP
        SUP -->|when done| WR
    end

    subgraph SVC["LangServe Agent Microservices"]
        S1["/agents/investigator — LCEL"]
        S2["/agents/codebase-search — LCEL + RAG"]
        S3["/agents/web-search — LCEL + Tavily"]
        S4["/agents/critic — LCEL"]
        S5["/agents/writer — LCEL"]
    end

    subgraph OBS["LangSmith (auto-instrumented)"]
        direction LR
        T["Traces"] --- E["Evals"] --- D["Dashboard"]
    end

    subgraph FE["React Frontend — Jira-style"]
        direction LR
        JQ["Job queue"] --- LW["Live workspace"] --- OP["Output panel"]
    end

    GH --> BE
    BE --> ORCH
    ORCH -->|"HTTP calls"| SVC
    BE -.->|instruments| OBS
    ORCH -.->|instruments| OBS
    SVC -.->|instruments| OBS
    BE -->|"SSE stream"| FE
```
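The supervisor loop in the orchestration layer can be sketched as a plain routing function. This is a toy policy mirroring the graph's edges, not LangGraph code — in the real system the routing decision is an LLM call inside a graph node, and the pause is `interrupt()` plus the checkpointer:

```python
def supervisor(state: dict) -> str:
    """Toy routing policy: pause on an open question, pick the next
    unfinished worker, hand off to the writer once all evidence is in."""
    if state.get("pending_question"):
        return "human_input"  # stands in for interrupt(): block until answered
    for worker in ("investigator", "codebase_search", "web_search", "critic"):
        if worker not in state["done"]:
            return worker
    return "writer"

state = {"done": {"investigator", "codebase_search"}, "pending_question": None}
print(supervisor(state))  # → web_search
```

The fixed worker order here is an assumption for the sketch; the actual supervisor chooses dynamically based on what the previous worker reported.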

Feature Summary

| Feature | Priority | PRD Reference |
| --- | --- | --- |
| Job queue with Jira-style ticket cards | P0 | PRD-002 |
| Real-time streaming agent output in workspace | P0 | PRD-002 |
| Agent question cards (human-in-the-loop) | P0 | PRD-003 |
| Pause / redirect / kill agent mid-execution | P0 | PRD-003 |
| Supervisor + 5 worker agents (LangGraph) | P0 | PRD-003 |
| LCEL agent chains deployed via LangServe | P0 | PRD-004 |
| LangFlow canvas for agent prototyping | P1 | PRD-004 |
| LangSmith trace deep-link per job | P0 | PRD-005 |
| Eval dataset + quality scoring | P1 | PRD-005 |
| GitHub write-back (comment + label) | P1 | PRD-002 |
| Codebase vector index (semantic search) | P1 | PRD-004 |
| Agent configuration UI (model, prompt, endpoint) | P2 | PRD-004 |
| Cost and latency analytics dashboard | P2 | PRD-005 |
| Input validation (issue_url, Pydantic v2) | P0 | PRD-006 |
| Python tooling (uv / ruff / ty / pyproject.toml) | P0 | PRD-007 |
| Authentication & authorization (GitHub OAuth, JWT) | P0 | PRD-008 |
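The `issue_url` validation feature uses Pydantic v2 per PRD-006; the rule itself can be sketched with a stdlib regex. The URL shape and field names below are assumptions for illustration, not the production validator:

```python
import re

# Sketch of the issue_url rule PRD-006 enforces with Pydantic v2.
ISSUE_URL = re.compile(
    r"^https://github\.com/(?P<owner>[\w.-]+)/(?P<repo>[\w.-]+)/issues/(?P<number>\d+)$"
)

def parse_issue_url(url: str) -> dict:
    """Validate a GitHub issue URL and pull out its components."""
    m = ISSUE_URL.match(url)
    if m is None:
        raise ValueError(f"not a GitHub issue URL: {url!r}")
    return {**m.groupdict(), "number": int(m.group("number"))}

print(parse_issue_url("https://github.com/langchain-ai/langgraph/issues/42"))
```

In the Pydantic version the same regex would back a field validator, so a malformed URL is rejected at the `POST /jobs` boundary with a 422 rather than deep inside a graph run.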

Success Metrics

| Metric | Target (v1.0) |
| --- | --- |
| End-to-end triage time (agent) | < 3 minutes per issue |
| Human triage agreement rate | ≥ 70% match with human-written triage on eval dataset |
| LangSmith eval score (helpfulness) | ≥ 4.0 / 5.0 |
| Agent question relevance (human rating) | ≥ 80% of agent-asked questions rated "useful" |
| Time-to-first streaming output in UI | < 5 seconds from job submission |
| System uptime | ≥ 99% during active sessions |

Release Roadmap

Detailed roadmaps are maintained in the Plans section:

  • Roadmap v1 (precise) — Week-by-week delivery plan with per-phase deliverables, acceptance criteria, and exit gates for all 6 phases (Weeks 1–13).
  • Roadmap v2 (general) — High-level backlog of post-v1 themes: multi-tracker support, non-software domains, self-hosted LLMs, multi-user sessions, mobile, expanded evals, and a plugin system.

v1 Phase Summary

| Phase | Scope | Target |
| --- | --- | --- |
| Phase 1 — Core Loop | Single investigator agent, LangGraph basics, LCEL chains, LangSmith tracing | Weeks 1–3 |
| Phase 2 — Multi-Agent | Supervisor + all 5 worker agents, LangServe endpoints, shared state | Weeks 4–5 |
| Phase 3 — Human-in-the-Loop | interrupt() nodes, pause/kill/redirect, checkpointing | Week 6 |
| Phase 4 — Backend API | FastAPI + SSE streaming, job persistence, answer endpoint | Weeks 7–8 |
| Phase 5 — React UI | Jira-style dashboard, live workspace, question cards, output panel | Weeks 9–11 |
| Phase 6 — Polish | GitHub write-back, LangSmith evals, codebase vector index, LangFlow config UI | Weeks 12–13 |

Risks and Mitigations

| Risk | Likelihood | Impact | Mitigation |
| --- | --- | --- | --- |
| Supervisor agent makes poor routing decisions | Medium | High | LangSmith evals + prompt iteration via LangFlow before production |
| Agent asks irrelevant or too many questions | Medium | Medium | Calibrate supervisor's question threshold in system prompt; rate-limit to 2 questions per job in v1 |
| Codebase search returns irrelevant results on large repos | High | Medium | Use semantic vector search (Chroma + embeddings) rather than keyword search |
| LangGraph state explosion on long jobs | Low | High | Set max node execution limits; add job timeout at 10 minutes |
| LangServe endpoint latency degrades overall job time | Medium | Medium | Add per-agent timeout of 60s; supervisor retries once before escalating to human |
| Cost overruns from GPT-4 calls | Medium | Medium | Default to GPT-4o-mini; allow model override per agent in config |
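The "timeout of 60s, retry once, then escalate" mitigation for slow LangServe endpoints reduces to a small retry wrapper. `call(timeout_s)` is a stand-in for the supervisor's HTTP call to an agent endpoint, and the numbers simply echo the table above:

```python
def call_with_retry(call, timeout_s: float = 60.0, retries: int = 1):
    """Retry an agent call once on timeout; re-raise if it still fails,
    at which point the supervisor escalates to the human via a question card."""
    for attempt in range(retries + 1):
        try:
            return call(timeout_s)
        except TimeoutError:
            if attempt == retries:
                raise

# A stub endpoint that times out once, then succeeds on the retry.
attempts = []
def flaky(timeout_s):
    attempts.append(timeout_s)
    if len(attempts) == 1:
        raise TimeoutError
    return "triage report"

print(call_with_retry(flaky))  # → triage report, on the second attempt
```

In production the timeout itself would come from the HTTP client (e.g. an `httpx` request timeout) rather than from the stub's argument.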

Assumptions, Constraints, Dependencies

Assumptions

  • Users have a GitHub account and are comfortable with GitHub issues/PRs
  • The target repository is accessible via the GitHub REST API (public or authenticated private)
  • Users have an OpenAI or Anthropic API key
  • LangSmith account is available (free tier is sufficient for v1)

Constraints

  • v1.0 supports only Python-based repositories for codebase analysis
  • Maximum job execution time: 10 minutes
  • Maximum codebase size for vector indexing: 500MB
  • Human-in-the-loop question limit per job: 2 questions (to prevent over-reliance on user)

External Dependencies

  • GitHub REST API — for reading issues, PRs, and writing comments
  • OpenAI / Anthropic API — LLM inference
  • Tavily API — web search tool for the Web Search agent
  • LangSmith API — observability and evaluation
  • LangChain, LangGraph, LangServe, LangFlow — open-source packages (MIT licensed)
  • Redis — job queue (ARQ), Pub/Sub event bus for SSE fanout, and job status storage
  • ARQ — async Redis queue for distributed LangGraph job execution across worker processes; provides Job.abort() for cross-process kill, built-in status tracking, and automatic requeue on worker crash