PRD-004 — Agent Layer: LangChain, LCEL, LangServe & LangFlow¶

Field	Value
Document ID	PRD-004
Version	1.0
Status	DRAFT
Date	March 2026
Parent Doc	PRD-001
Related Docs	PRD-003 (Orchestration), PRD-005 (Observability)

Overview¶

Detailed specs: Full LCEL Chain Specs · Codebase Vector Index

This document covers the individual agent layer — the internals of each of the five specialized worker agents. These agents are the leaves of the system: the LangGraph supervisor (PRD-003) calls them, but doesn't know or care what's inside.

Each agent follows the same lifecycle:

Designed and tested visually in LangFlow (prototyping)
Implemented as an LCEL chain in Python using LangChain primitives
Deployed as a LangServe endpoint — an independent HTTP microservice
Called by the LangGraph supervisor via HTTP during job execution

This separation is intentional. Agents are independently swappable, versioned, and testable without touching the orchestration layer.

LangChain + LCEL — Agent Internals¶

Why LCEL¶

Every agent's core logic is an LCEL (LangChain Expression Language) chain. LCEL is chosen over the legacy LLMChain approach because:

Streaming by default — every LCEL chain supports astream(), which LangServe exposes automatically
Async by default — critical for running multiple agents concurrently
Composable — complex agents (like Codebase Search with RAG) can be built by piping components cleanly
LangSmith integration — LCEL chains are automatically traced without any additional instrumentation code
Parallel execution — RunnableParallel lets agents fetch context from multiple sources simultaneously

Standard LCEL Chain Pattern¶

Every agent in this system follows this base pattern:

from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI

prompt = ChatPromptTemplate.from_messages([
    ("system", AGENT_SYSTEM_PROMPT),
    ("human", AGENT_HUMAN_TEMPLATE),
])

llm = ChatOpenAI(
    model="gpt-4o-mini",
    temperature=0,
    base_url=settings.model_gateway_url,  # e.g. http://litellm:4000
    api_key=settings.service_token,        # internal token, not the OpenAI key
)

chain = prompt | llm.with_structured_output(AgentOutputSchema)

with_structured_output() uses the model's native function-calling / JSON mode to enforce the schema. Do not use PydanticOutputParser — it requires explicit format instructions in the prompt and is fragile with complex nested schemas.

The chain object is what gets served via LangServe.

LCEL Features Used Per Agent¶

Feature	Agents	Purpose
`RunnableParallel`	Codebase Search	Fetches vector store results and issue body simultaneously
`RunnablePassthrough`	All	Passes through input fields that the prompt needs unchanged
`RunnableLambda`	Writer	Custom post-processing to format the GitHub comment markdown
`.with_retry()`	All	Automatic retry on LLM rate limits (max 3 attempts)
`.with_fallbacks()`	All	Falls back from GPT-4o to GPT-4o-mini on repeated failures
`.astream()`	All	Enables token-by-token streaming through LangServe

LangServe — Agent Microservices¶

Architecture Decision¶

LangServe status (March 2026): LangServe is in maintenance mode. The LangChain team recommends LangGraph Platform for new projects and will not accept new feature contributions to LangServe (source: github.com/langchain-ai/langserve README). This project continues to use LangServe because it provides a stable, simple REST interface for LCEL runnables and our use case does not require LangGraph Platform's additional features. Pin langserve>=0.3,<0.4 in pyproject.toml to prevent unintended upgrades.

Rather than importing agent functions directly into the LangGraph process, each agent is deployed as a standalone FastAPI + LangServe microservice. The LangGraph nodes call these services over HTTP.

Benefits:

Each agent is independently deployable and restartable
Agent prompts, models, and logic can be updated without redeploying the orchestration layer
Horizontal scaling: a heavy agent (like Codebase Search) can be scaled independently
Clear ownership boundary: a team member can "own" one agent service
The agent config UI (Zone 3 settings) just stores endpoint URLs — fully pluggable

Service Registry¶

Service Name	Default Port	Endpoint	Description
`agentops-investigator`	8001	`/agents/investigator`	Reads and interprets GitHub issues
`agentops-codebase-search`	8002	`/agents/codebase-search`	Semantic search over repository code
`agentops-web-search`	8003	`/agents/web-search`	Web search for errors and stack traces
`agentops-critic`	8004	`/agents/critic`	Reviews findings for correctness
`agentops-writer`	8005	`/agents/writer`	Produces final structured report

LangServe Setup Pattern¶

Each service uses the same setup pattern:

# agent_service/server.py
from functools import lru_cache
from pydantic_settings import BaseSettings
from fastapi import FastAPI
from langserve import add_routes
from .chain import chain


class Settings(BaseSettings):
    enable_playground: bool = False  # ENABLE_PLAYGROUND=true to opt in (dev only)
    model_gateway_url: str   # e.g. http://litellm:4000
    service_token: str       # unique per service, validated by gateway


@lru_cache
def get_settings() -> Settings:
    return Settings()


def create_app() -> FastAPI:
    settings = get_settings()
    app = FastAPI(title="AgentOps — Investigator Agent")

    # playground_type=None removes /playground from the router at registration time.
    # "default" re-enables it only when ENABLE_PLAYGROUND=true.
    add_routes(
        app,
        chain,
        path="/agents/investigator",
        playground_type="default" if settings.enable_playground else None,
        enable_feedback_endpoint=True,
    )
    return app


app = create_app()

LangServe automatically generates:

POST /agents/investigator/invoke — single synchronous call
POST /agents/investigator/stream — streaming response
POST /agents/investigator/batch — batch processing
GET /agents/investigator/playground — interactive testing UI

LangServe Playground¶

Each LangServe endpoint comes with a built-in /playground UI at no extra cost. This is used during development to manually test agent behavior with real GitHub issues before running full end-to-end jobs. This complements the LangFlow prototyping workflow.

Security requirement: The /playground route must be disabled in staging and production. LangServe exposes the playground by default; it is disabled by passing playground_type=None to add_routes(), which removes the route from the FastAPI router at registration time — no middleware or network rule required. The ENABLE_PLAYGROUND environment variable (default: false) controls this via a pydantic-settings BaseSettings class; the value is read inside create_app() so no module-level globals are introduced. ENABLE_PLAYGROUND=true must only appear in local development environments. In staging, if playground access is required for debugging, the service must be redeployed with ENABLE_PLAYGROUND=true behind an internal network or VPN — it must never be reachable from the public internet.

Model Gateway¶

Each LangServe service does not hold an OpenAI API key. Instead, all five services authenticate to a shared model gateway (LiteLLM proxy). The gateway is the only component that holds the real OPENAI_API_KEY.

Why: - Single key surface — rotate or revoke in one place; no per-service key management - Unified rate limiting and spend cap — LiteLLM enforces a single RPM/TPM budget across all agents - Audit log — every LLM call is logged centrally with the originating service token - Key isolation — a compromised agent service can only call the gateway with its own service token, which can be revoked without cycling the OpenAI key

Authentication flow:

Each service is issued a unique SERVICE_TOKEN environment variable. The gateway validates this token and forwards the request to OpenAI using the real API key.

Service	Service Token Env Var
`agentops-investigator`	`GATEWAY_TOKEN_INVESTIGATOR`
`agentops-codebase-search`	`GATEWAY_TOKEN_CODEBASE_SEARCH`
`agentops-web-search`	`GATEWAY_TOKEN_WEB_SEARCH`
`agentops-critic`	`GATEWAY_TOKEN_CRITIC`
`agentops-writer`	`GATEWAY_TOKEN_WRITER`

Configuration in each service:

class Settings(BaseSettings):
    enable_playground: bool = False
    model_gateway_url: str   # e.g. http://litellm:4000
    service_token: str       # unique per service, validated by gateway

The ChatOpenAI client in each LCEL chain is initialized with base_url=settings.model_gateway_url and api_key=settings.service_token. LiteLLM's OpenAI-compatible API surface means no other code changes are needed.

Gateway deployment: The LiteLLM proxy runs as a sixth Docker service (agentops-model-gateway, port 4000). It is the only container with OPENAI_API_KEY in its environment. It is not exposed externally — only reachable within the Docker network.

LangFlow — Prototyping Workflow¶

Role in Development Process¶

LangFlow is the first stop for any new agent or prompt change. The development workflow is:

1. DESIGN in LangFlow
   ↓  Drag components, connect them, test with real issue data
   ↓  Iterate on system prompt, try different models, test edge cases

2. EXPORT from LangFlow
   ↓  Export the flow as Python code (LangFlow supports LCEL export)
   ↓  Refine the exported code, add Pydantic output schemas

3. DEPLOY via LangServe
   ↓  Wrap the chain in a LangServe FastAPI app
   ↓  Run locally, validate via /playground

4. EVALUATE in LangSmith
   ↓  Run against golden dataset, check scores
   ↓  If failing: back to step 1 in LangFlow

LangFlow prevents wasted engineering time. A prompt that fails visually in LangFlow won't get deployed to LangServe.

Agent Configuration UI¶

In v1.1, the Settings page of AgentOps Dashboard will embed LangFlow's canvas (via iframe or the LangFlow API). This allows users to:

Visually modify an agent's prompt, model, or chain structure
Test the modified chain against a sample issue before saving
Push the updated chain to the LangServe endpoint

This is the product's equivalent of a "plugin editor" — power users can customize agent behavior without writing code.

What Gets Prototyped in LangFlow¶

Agent	LangFlow Prototype Focus
Investigator	System prompt calibration — how structured should the hypothesis be?
Codebase Search	RAG retriever settings — chunk size, k, similarity threshold
Web Search	Tavily query formulation — how to turn an issue into a good search query
Critic	Scoring rubric — what makes a finding "confident enough" vs "needs more investigation"
Writer	Output format — structured report vs. free-form markdown; GitHub comment tone

Agent Specifications¶

Investigator Agent¶

Purpose: First agent to run. Reads the full GitHub issue body and forms an initial hypothesis about the bug's nature, affected areas, and likely root cause.

Input:

class InvestigatorInput(BaseModel):
    issue_title: str
    issue_body: str
    prior_findings: list[dict] = []

Output:

class InvestigatorFinding(AgentFindingBase):
    hypothesis: str
    affected_areas: list[str]
    keywords_for_search: list[str]
    error_messages: list[str]

LCEL Chain:

chain = (
        investigator_prompt
        | llm.with_structured_output(InvestigatorFinding)
)

Model: GPT-4o-mini (sufficient for issue interpretation; cost-effective)

Tools: None — this agent only reads its inputs, no external tool calls

Codebase Search Agent¶

Purpose: Searches the repository's source code for files and code snippets relevant to the bug. Uses semantic vector search against a pre-built Chroma index of the repository.

Input:

class CodebaseSearchInput(BaseModel):
    repository: str
    keywords: list[str]
    hypothesis: str
    affected_areas: list[str]

Output:

class CodebaseFinding(AgentFindingBase):
    relevant_files: list[RelevantFile]
    root_cause_location: str | None

LCEL Chain (with RAG):

chain = (
        RunnableParallel({
            "context": codebase_retriever,  # Chroma vector store
            "hypothesis": RunnablePassthrough(),
            "keywords": RunnablePassthrough(),
        })
        | codebase_search_prompt
        | llm.with_structured_output(CodebaseFinding)
)

Model: GPT-4o (larger context needed for code comprehension)

Tools: Chroma vector store retriever (see Section 6)

Web Search Agent¶

Purpose: Searches the web for the error messages, stack traces, and library issues mentioned in the GitHub issue. Particularly useful for third-party library bugs and environment-specific errors.

Input:

class WebSearchInput(BaseModel):
    error_messages: list[str]  # from Investigator
    issue_title: str
    affected_areas: list[str]

Output:

class WebSearchFinding(AgentFindingBase):
    relevant_results: list[WebResult]
    external_root_cause: str | None

LCEL Chain (with tool):

from langchain_tavily import TavilySearch

tavily = TavilySearch(max_results=5)
llm_with_tools = llm.bind_tools([tavily])

chain = (
        web_search_prompt
        | llm_with_tools
        | tool_executor  # executes the Tavily search
        | web_result_prompt
        | llm.with_structured_output(WebSearchFinding)
)

Model: GPT-4o-mini

Tools: Tavily Search API

Critic Agent¶

Purpose: Reviews the accumulated findings from other agents and challenges weak hypotheses. Outputs a revised confidence score and flags gaps that need more investigation.

Input:

class CriticInput(BaseModel):
    all_findings: list[dict]  # all AgentFindings so far
    current_hypothesis: str
    codebase_evidence: list[str]  # code snippets supporting hypothesis

Output:

class CritiqueFinding(AgentFindingBase):
    verdict: Literal["CONFIRMED", "UNCERTAIN", "CHALLENGED"]
    confidence_adjustment: float
    gaps: list[str]
    revised_hypothesis: str | None
    ready_for_report: bool

LCEL Chain:

chain = (
        critic_prompt
        | llm.with_structured_output(CritiqueFinding)
)

Model: GPT-4o (adversarial reasoning benefits from a stronger model)

Tools: None

Writer Agent¶

Purpose: Takes all accumulated findings and produces the final structured outputs: triage report, GitHub comment draft, and ticket draft.

Input:

class WriterInput(BaseModel):
    all_findings: list[dict]
    human_exchanges: list[dict]  # Q&A with user incorporated into report
    issue_title: str
    issue_url: str
    repository: str

Output:

class WriterOutput(AgentFindingBase):
    report: TriageReport  # structured severity/root cause/files
    github_comment_markdown: str  # ready to post to GitHub
    ticket_title: str
    ticket_labels: list[str]
    ticket_assignee_suggestion: str
    ticket_effort_estimate: str  # XS / S / M / L / XL

LCEL Chain:

chain = (
        RunnableParallel({
            "report": report_chain,  # generates TriageReport
            "comment": comment_chain,  # generates GitHub markdown comment
            "ticket": ticket_chain,  # generates ticket fields
        })
        | merge_outputs_lambda  # RunnableLambda to merge into WriterOutput
)

Model: GPT-4o (quality of final output matters most here)

Tools: None — pure generation from accumulated context

Codebase Vector Index¶

Purpose¶

The Codebase Search Agent needs semantic search over the repository's source code. A keyword-based search is insufficient for finding the code path related to "JWT token expiry on UTC server" when the actual code says if token.exp < time.time().

Implementation¶

Repository Clone → Code Chunking → Embedding → Chroma Vector Store

Chunking: RecursiveCharacterTextSplitter with chunk_size=1000, chunk_overlap=200, splitting on Python/JS/TS syntax boundaries
Embedding model: text-embedding-3-small (OpenAI) — cost-effective for code
Vector store: Chroma (local, persistent) — one collection per repository
Retriever: VectorStoreRetriever with k=8, similarity_threshold=0.3

Index Lifecycle¶

Trigger	Action
First job on a repository	Full index build (clones repo, chunks, embeds, stores)
Repository updated (webhook or manual trigger)	Incremental re-index of changed files only
Index older than 24 hours on an active repo	Background refresh triggered before next job

Limitations (v1.0)¶

Maximum repository size: 500MB
Supported languages: Python, JavaScript, TypeScript, Go, Java
Binary files, generated code, and node_modules are excluded from indexing

Agent Configuration¶

Users can configure each agent via the Settings page (Zone 1 header → Settings → Agents):

Setting	Options	Default
LLM Provider	OpenAI, Anthropic	OpenAI
Model (per agent)	gpt-4o-mini, gpt-4o, claude-3-5-haiku, claude-3-5-sonnet	See per-agent spec
LangServe Endpoint URL	Any valid URL	localhost defaults
Model Gateway URL	Any valid internal URL	`http://litellm:4000`
System Prompt	Free text (advanced)	Built-in default
Max Tokens	500–4000	1500
Temperature	0.0–0.5	0.0

Configuration is stored per-repository in the backend database. Changes take effect on the next job submission.

Inter-Agent Contract¶

All agents share a common interface contract to ensure the LangGraph supervisor can treat them uniformly.

HTTP Contract (LangServe)¶

Every LangServe endpoint must accept:

POST /agents/{name}/invoke with JSON body { "input": { ...AgentInput fields } }
Return { "output": { ...AgentFinding fields } }

AgentFinding Base Fields¶

Every agent output must include these base fields regardless of agent-specific fields:

class AgentFindingBase(BaseModel):
    agent_name: str
    summary: str
    confidence: float  # 0.0–1.0
    reasoning: str
    error: str | None

The supervisor reads confidence and error to decide whether to accept the finding, retry, or escalate to human input.