Agent System Concepts & Architecture

Introduction

The OpsHub Agent System is a production-grade AI orchestration platform that powers intelligent automation across investment operations. Built on LangGraph and FastAPI with GPT-4 reasoning, it delivers 10 specialized domain agents with 62+ enterprise tools, enabling natural language interactions with complex financial workflows. This guide explains how the agent system works under the hood—from the LangGraph state machines that power agent reasoning to the AG-UI protocol that enables real-time bidirectional communication between frontend and backend.

Core Concepts

Multi-Agent Orchestration

OpsHub uses a domain-specialized multi-agent architecture rather than a single general-purpose chatbot. Each agent is an expert in a specific domain with:

Scoped database access - Limited to relevant schemas and tables
Specialized tool suite - Domain-specific automation capabilities
Optimized prompts - Fine-tuned for investment operations terminology
Role-based permissions - Aligned with organizational access controls

Available Agents:

Agent ID	Domain	Key Capabilities
`app`	OpsHub Orchestrator	Workspace context, task delegation, validation status
`dashboard`	Dashboard Architect	RG94-compliant dashboards, NAV variance tracking
`workbook`	Workbook Engineer	Spreadsheet automation, validation logic, formula translation
`analytics`	Investment Analytics Strategist	Performance attribution, scenario analysis, variance investigation
`workflow`	Workflow Director	Process automation, NAV certification, breach escalation
`integration`	Integration Specialist	Data connectivity, custodian integration, sync pipelines
`data-quality`	Data Quality Analyst	Exception handling, profiling, reconciliation
`fund-accountant`	Fund Accountant Assistant	Break investigation, NAV operations, certification
`portfolio-manager`	Portfolio Manager Copilot	Performance insights, exposure analysis, action planning
`compliance`	Compliance Sentinel	RG94 mapping, audit evidence, regulatory controls
`risk-analyst`	Risk Analyst Copilot	VaR calculations, stress testing, limit monitoring

Why Multi-Agent? Single large language models struggle with specialized domains due to:

Generic training data lacking domain expertise
Context windows insufficient for entire knowledge bases
Difficulty maintaining focus across diverse tasks

Multi-agent systems solve this by:

Domain specialization - Each agent masters a narrow domain
Scoped context - Only relevant data and tools in context
Superior accuracy - 30-40% better than general-purpose chatbots
Scalability - Add new specialists without retraining existing agents

State Management & Checkpointing

The agent system maintains persistent conversational state across sessions using LangGraph’s checkpointing mechanism:

interface AgentState {
  // Conversation history
  messages: Message[];

  // Current workspace context
  workspace: {
    surface?: string;           // Current app surface
    workbookId?: string;         // Active spreadsheet
    activeSheetId?: string;      // Current sheet tab
    activeView?: string;         // spreadsheet | dashboard | workflow
    selectedCells?: string;      // Cell range (e.g., "A1:C10")
    selectedDashboardId?: string;
    selectedWorkflowId?: string;
  };

  // Agent selection and delegation
  activeAgent: AgentId;
  delegationChain: Array<{
    fromAgent: AgentId;
    toAgent: AgentId;
    reason: string;
    timestamp: string;
  }>;

  // User-facing proposals awaiting approval
  drafts: Array<Draft>;

  // Real-time insights and recommendations
  insights: Array<Insight>;

  // Session metadata
  sessionId: string;
  userId: string;
  error?: string;
  pendingTools: string[];
  toolResults: Record<string, any>;
}

Checkpointing Benefits:

Session Continuity - Conversations persist across page reloads
Multi-Device Access - Resume conversations from any device
Error Recovery - Retry failed operations without losing context
Audit Trail - Complete history of agent interactions
Time Travel Debugging - Replay conversations from any checkpoint

State is persisted to Supabase in these tables:

agent.sessions - Session metadata and configuration
agent.messages - Full conversation history with tool calls
agent.drafts - User-approved/pending agent proposals
agent.insights - Real-time recommendations and alerts
agent.tool_calls - Detailed tool execution audit trail

Tool Execution Framework

Agents interact with the OpsHub platform through a comprehensive tool suite (62+ tools) organized by category: Core Automation Tools:

Spreadsheet Tools (5) - Cell manipulation, formula generation, range operations
Natural Language Query (2) - Convert English to SQL queries
Database Operations (4) - Schema inspection, query execution, data summaries
Draft Management (1) - Capture agent suggestions for user review

Advanced Intelligence Tools:

Breach Prediction (2) - ML-powered forecasting and risk scoring
Anomaly Detection (2) - Real-time outlier detection in holdings/transactions
Auto-Reconciliation (2) - Automated break analysis and resolution
Self-Healing Workflows (2) - Automatic error recovery

Governance & Compliance Tools:

Approval Management (3) - Auto-approval rules, workflow orchestration
Compliance Checks (6) - ASIC RG94 verification, audit evidence
Explainability & HITL (3) - Decision explanations, confidence scoring

Operational Productivity Tools:

Report Generation (3) - Automated regulatory and operational reports
Workflow Generation (3) - AI-generated workflow definitions
Scheduling Optimization (3) - Intelligent task scheduling
Smart Suggestions (5) - Context-aware automation recommendations
Agent Delegation (3) - Multi-agent task decomposition
Memory & Context (4) - User preference learning, case history recall
Document Extraction (2) - Structured data extraction from PDFs
Analytics & Performance (4) - Cross-agent performance analytics

Tool Execution Flow: Tenant Isolation: Every tool execution is wrapped with tenant context decorators:

Automatic tenant_id injection into database queries
Row-level security (RLS) policy enforcement
Audit logging to audit.audit_log
Quota tracking and enforcement

Natural Language Processing

The agent system uses GPT-4o (OpenAI) for natural language understanding with: Optimized Prompting:

Domain-specific system prompts for each agent
Investment operations terminology and abbreviations
ASIC RG94 compliance context
Example few-shot prompts for complex tasks

Context Loading:

Automatic workspace context detection from current page
Relevant catalog items (dashboards, workflows, datasets)
Recent conversation history (last 20 messages)
User preferences and memory (learned patterns)

Response Streaming:

Token-by-token streaming via Server-Sent Events (SSE)
Real-time tool execution updates
Progressive artifact rendering
Optimistic UI updates

Architecture Overview

High-Level System Architecture

Component Responsibilities: Frontend (Next.js 15)

React UI components for chat, artifacts, context
AG-UI client for real-time bidirectional state sync
Zustand stores for optimistic updates and local state
Server-Sent Events (SSE) for streaming responses

API Layer

Next.js API routes proxy requests to backend
Supabase authentication and JWT validation
Environment variable protection (no CORS exposure)

Backend (FastAPI + Python)

LangGraph workflow orchestration
GPT-4o reasoning and tool selection
Tool execution with tenant isolation
State checkpointing to Supabase

Data Layer (Supabase PostgreSQL)

14 specialized business schemas
Agent state persistence tables
Row-level security (RLS) policies
Real-time subscriptions for live updates

LangGraph Workflow Engine

OpsHub uses LangGraph (LangChain’s graph-based orchestration framework) for durable agent workflows:

# app/agent/graph.py
from langgraph.graph import StateGraph, END
from langgraph.checkpoint.memory import MemorySaver

def create_agent_graph():
    workflow = StateGraph(AgentState)

    # Add nodes
    workflow.add_node("agent", agent_node)  # LLM reasoning
    workflow.add_node("tools", tool_node)   # Tool execution

    # Set entry point
    workflow.set_entry_point("agent")

    # Conditional edge: tools or end?
    workflow.add_conditional_edges(
        "agent",
        should_continue,
        {"tools": "tools", "end": END}
    )

    # Loop back to agent after tools
    workflow.add_edge("tools", "agent")

    # Compile with checkpointing
    memory = MemorySaver()
    graph = workflow.compile(
        checkpointer=memory,
        debug=settings.DEBUG
    )

    return graph

Workflow Execution:

Start - User message enters workflow at agent node
Agent Node - GPT-4o analyzes message and decides:
- Generate text response (go to END)
- Execute tools (go to Tool Node)
Tool Node - Execute requested tools in parallel
Loop - Return results to agent node for next turn
End - Stream final response to user

Why LangGraph?

Durable Execution - Survives crashes, continues from checkpoints
Event Sourcing - Complete audit trail of all state transitions
Retry Policies - Automatic error recovery with exponential backoff
Human-in-Loop - Approval gates for high-risk actions
Observability - LangSmith tracing for debugging

Technology Stack

LangGraph Workflow Engine

Purpose: Orchestrate multi-turn agent conversations with state persistence Key Features:

State Machines - Define agent workflows as directed graphs
Checkpointing - Persist state at each node transition
Conditional Routing - Dynamic workflow paths based on LLM decisions
Tool Execution - Parallel tool calls with result aggregation
Error Handling - Retry policies and fallback strategies

Configuration:

# app/config.py
OPENAI_MODEL = "gpt-4o"
OPENAI_TEMPERATURE = 0.0  # Deterministic responses
MAX_TOOL_ITERATIONS = 10
AGENT_RECURSION_LIMIT = 25
ENABLE_CHECKPOINTING = True

FastAPI Backend

Purpose: High-performance Python API server for agent endpoints Key Endpoints:

POST /agent/stream - AG-UI protocol streaming endpoint (primary)
POST /api/chat - Simple chat API (alternative)
POST /api/agents - Multi-agent orchestration
GET /api/agents - List available agents
POST /api/agents/set-active - Switch active agent
GET /health - Service health monitoring

Middleware:

JWT authentication via Supabase
Tenant context injection
Request/response logging
Rate limiting (optional Redis)
CORS configuration

Supabase Integration

Database Schemas (14 total): Core Business Schemas:

investment - Portfolios, securities, holdings, transactions
validation - ASIC RG94 compliance rules and results
risk - Risk metrics and VaR calculations
compliance - Regulatory checks and audit evidence
performance - Attribution and TWR calculations
market_data - Time-series price data (TimescaleDB)
workflow - Daily pricing workflows (Temporal)
audit - Audit logs (partitioned by quarter)

Platform Schemas:

iam - Teams, users, roles, permissions
agent - Sessions, messages, drafts, insights, tool_calls
integration - Data sources, sync jobs, field mappings
vault - Secure credential storage
distribution - Report deliverables and recipients
analytics - Advanced analytics and reporting

Agent State Tables:

-- agent.sessions
CREATE TABLE agent.sessions (
    id UUID PRIMARY KEY,
    user_id UUID NOT NULL,
    active_agent_id TEXT,
    workspace_context JSONB,
    created_at TIMESTAMPTZ DEFAULT NOW(),
    updated_at TIMESTAMPTZ DEFAULT NOW()
);

-- agent.messages
CREATE TABLE agent.messages (
    id UUID PRIMARY KEY,
    session_id UUID REFERENCES agent.sessions(id),
    role TEXT NOT NULL, -- 'user' | 'assistant' | 'system'
    content TEXT NOT NULL,
    tool_calls JSONB,
    artifacts JSONB,
    timestamp TIMESTAMPTZ DEFAULT NOW()
);

-- agent.drafts
CREATE TABLE agent.drafts (
    id UUID PRIMARY KEY,
    session_id UUID REFERENCES agent.sessions(id),
    agent_id TEXT NOT NULL,
    title TEXT NOT NULL,
    summary TEXT,
    payload JSONB NOT NULL,
    status TEXT DEFAULT 'proposed', -- 'proposed' | 'pending-approval' | 'applied' | 'rejected'
    metadata JSONB,
    created_at TIMESTAMPTZ DEFAULT NOW()
);

Row-Level Security (RLS):

62%+ of tables have RLS enabled
IAM role-based access control
Team-scoped data visibility
Automatic tenant isolation

GPT-4 Reasoning

Model: gpt-4o (OpenAI) Configuration:

Temperature: 0.0 (deterministic responses)
Max Tokens: 4096 per response
Streaming: Token-by-token via SSE
Function Calling: Native tool schema support

System Prompt Structure:

def build_system_prompt(agent_id: str, workspace_context: dict):
    return f"""
You are the {agent_name} for OpsHub NAV, an investment operations platform.

ROLE: {agent_description}

DATABASE ACCESS: You have read/write access to these schemas:
{schema_list}

AVAILABLE TOOLS: {tool_count} specialized tools
{tool_categories}

WORKSPACE CONTEXT: {workspace_context}

RESPONSE GUIDELINES:
- Use domain-specific terminology (NAV, RG94, attribution, VaR)
- Cite specific data sources and calculations
- Propose drafts for high-risk actions (require user approval)
- Emit insights for proactive recommendations
- Delegate to specialist agents when needed
"""

Data Flow

Request/Response Flow

Step-by-Step Breakdown:

User Input
- User types message in chat interface
- Frontend captures workspace context (active workbook, sheet, dashboard)
- Context includes page surface, selected cells, catalog items
Frontend Processing
- AG-UI client prepares RunAgentInput with messages and context
- Zustand stores handle optimistic UI updates
- SSE connection established for streaming
API Proxy
- Next.js API route validates Supabase JWT
- Adds authentication headers (Bearer token)
- Proxies request to FastAPI backend
Backend Orchestration
- FastAPI receives request, validates tenant
- Loads agent configuration and system prompt
- Invokes LangGraph workflow with current state
Agent Reasoning
- LangGraph agent node sends prompt to GPT-4o
- Model analyzes message and workspace context
- Decides to execute tools or generate text response
Tool Execution
- Tool node receives tool call requests
- Injects tenant context into each tool
- Executes tools in parallel where possible
- Returns results to agent node
Response Streaming
- Agent generates response tokens
- Streamed via SSE to frontend
- Artifacts extracted and rendered separately
- Insights and drafts displayed in real-time
State Persistence
- LangGraph saves checkpoint to Supabase
- Messages, drafts, insights written to agent tables
- Audit log entry created for compliance

State Synchronization Flow

The AG-UI protocol enables bidirectional state synchronization between frontend and backend: State Sync Features:

Workspace Awareness - Agents see current page, workbook, dashboard
Real-Time Updates - UI responds instantly to agent actions
Multi-Agent Coordination - Delegation chains tracked with full context
Draft System - AI proposals require explicit user approval
Offline Resilience - State cached locally, syncs when online
Audit Trail - Every state change logged for compliance

Security & Permissions

Authentication Flow

Authentication Helpers:

// lib/api/backend-auth.ts
import { createClient } from '@/lib/supabase/server';

export async function getAuthHeaders(): Promise<HeadersInit> {
  const supabase = await createClient();
  const { data: { session }, error } = await supabase.auth.getSession();

  if (error || !session) {
    throw new Error('Unauthorized: User must be authenticated');
  }

  return {
    'Content-Type': 'application/json',
    'Authorization': `Bearer ${session.access_token}`,
  };
}

Row-Level Security (RLS)

PostgreSQL RLS policies enforce fine-grained access control: Example: Portfolio Access Policy

-- Users can only see portfolios in their teams
CREATE POLICY portfolio_team_access ON investment.portfolios
  FOR SELECT
  USING (
    team_id IN (
      SELECT team_id FROM iam.team_members
      WHERE user_id = auth.uid()
    )
  );

-- Portfolio managers can modify their portfolios
CREATE POLICY portfolio_manager_update ON investment.portfolios
  FOR UPDATE
  USING (
    portfolio_manager_id = auth.uid()
    AND EXISTS (
      SELECT 1 FROM iam.user_roles
      WHERE user_id = auth.uid()
      AND role_code = 'PORTFOLIO_MANAGER'
    )
  );

IAM Roles:

Role	Scope	Permissions
`ADMIN`	GLOBAL	Full system access
`FUND_MANAGER`	ORGANIZATION	Manages funds and strategies
`PORTFOLIO_MANAGER`	TEAM	Manages portfolios
`OPERATIONS_LEAD`	ORGANIZATION	Operational activities
`COMPLIANCE_OFFICER`	ORGANIZATION	Compliance and audits
`VIEWER`	TEAM	Read-only access

Tenant Isolation

Every agent tool execution is wrapped with tenant context:

# app/tenant/isolation.py
from functools import wraps

def with_tenant_context(func):
    @wraps(func)
    async def wrapper(*args, **kwargs):
        tenant_id = get_current_tenant_id()

        # Inject tenant_id into database queries
        async with db.session() as session:
            await session.execute(
                "SET LOCAL app.current_tenant_id = :tenant_id",
                {"tenant_id": tenant_id}
            )

            # Execute tool
            result = await func(*args, **kwargs)

            # Audit log
            await log_tool_execution(
                tenant_id=tenant_id,
                tool_name=func.__name__,
                params=kwargs,
                result=result
            )

            return result

    return wrapper

Performance Considerations

Response Time Optimization

Target Latencies:

Initial response: < 2 seconds
Tool execution: < 5 seconds
Stream first token: < 500ms
Database queries: < 100ms

Optimization Strategies:

Parallel Tool Execution - LangGraph executes independent tools concurrently
Database Indexing - All foreign keys and frequently queried columns indexed
Materialized Views - Pre-aggregated data for dashboard queries
Query Caching - Redis cache for read-heavy endpoints
Connection Pooling - Persistent database connections (Supabase Supavisor)

Scalability

Horizontal Scaling:

FastAPI backend deployed on Fly.io with auto-scaling
Multiple worker instances handle concurrent requests
Stateless design (all state in Supabase)

Vertical Scaling:

Supabase Pro plan: 8GB RAM, 4 vCPU
Connection pool: 100 concurrent connections
TimescaleDB optimizations for time-series data

Cost Optimization:

Infrastructure: ~$65/month (Fly.io + Supabase)
OpenAI API: ~$0.01-0.03 per conversation
Total: Less than $100/month for 1,000 conversations

Monitoring

Health Checks:

@app.get("/health")
async def health_check():
    return {
        "status": "healthy",
        "agents": len(agent_registry.get_all_agents()),
        "database": await check_database_connection(),
        "openai": await check_openai_connection(),
        "version": settings.VERSION
    }

Metrics Tracked:

Agent response times (p50, p95, p99)
Tool execution success rates
Database query performance
Token usage and costs
Error rates by agent/tool

Learn More

Deeper Dives

Agent Integration Patterns - Learn about CopilotKit, AG-UI, SSE streaming, and workspace sync patterns
Backend Fact Sheet - Complete technical architecture and tool suite documentation
Architecture Decision Record - Why we chose LangGraph + Python over TypeScript alternatives
Database Schema - Explore the 14 business schemas and agent state tables

User Guides - Step-by-step tutorials for building with agents
API Reference - Complete agent API documentation
Tool Development - Create custom tools for your agents
Deployment - Production deployment guide for Fly.io

Summary

The OpsHub Agent System delivers intelligent automation for investment operations through:

Multi-Agent Orchestration - 10 specialized domain experts
LangGraph Workflows - Durable execution with checkpointing
Comprehensive Tool Suite - 62+ enterprise automation tools
Real-Time Bidirectional Sync - AG-UI protocol integration
GPT-4 Reasoning - Natural language understanding
Enterprise Security - RLS policies, tenant isolation, audit trails
Production-Ready - Less than $100/month infrastructure, auto-scaling

Key Technical Decisions:

Decision	Rationale
LangGraph over TypeScript	Battle-tested Python ecosystem, superior state management
FastAPI backend	High performance, async support, OpenAPI documentation
Supabase PostgreSQL	Real-time subscriptions, RLS policies, TimescaleDB extensions
AG-UI protocol	Framework-agnostic bidirectional state sync
GPT-4o model	Best balance of performance, cost, and reasoning capability

Next Steps:

Explore the Agent Integration Patterns guide
Review the Backend Fact Sheet for complete tool documentation
Build your first agent integration using the API Reference

Agent System Overview

Available Agents

Integration Patterns

Agent Tools & Capabilities

Agent System Concepts & Architecture

Introduction

Core Concepts

Multi-Agent Orchestration

State Management & Checkpointing

Tool Execution Framework

Natural Language Processing

Architecture Overview

High-Level System Architecture

LangGraph Workflow Engine

Technology Stack

LangGraph Workflow Engine

FastAPI Backend

Supabase Integration

GPT-4 Reasoning

Data Flow

Request/Response Flow

State Synchronization Flow

Security & Permissions

Authentication Flow

Row-Level Security (RLS)

Tenant Isolation

Performance Considerations

Response Time Optimization

Scalability

Monitoring

Learn More

Deeper Dives

Summary

Agent System Overview

Available Agents

Integration Patterns

Agent Tools & Capabilities

​Introduction

​Core Concepts

​Multi-Agent Orchestration

​State Management & Checkpointing

​Tool Execution Framework

​Natural Language Processing

​Architecture Overview

​High-Level System Architecture

​LangGraph Workflow Engine

​Technology Stack

​LangGraph Workflow Engine

​FastAPI Backend

​Supabase Integration

​GPT-4 Reasoning

​Data Flow

​Request/Response Flow

​State Synchronization Flow

​Security & Permissions

​Authentication Flow

​Row-Level Security (RLS)

​Tenant Isolation

​Performance Considerations

​Response Time Optimization

​Scalability

​Monitoring

​Learn More

​Deeper Dives

​Related Topics

​Summary

Introduction

Core Concepts

Multi-Agent Orchestration

State Management & Checkpointing

Tool Execution Framework

Natural Language Processing

Architecture Overview

High-Level System Architecture

LangGraph Workflow Engine

Technology Stack

LangGraph Workflow Engine

FastAPI Backend

Supabase Integration

GPT-4 Reasoning

Data Flow

Request/Response Flow

State Synchronization Flow

Security & Permissions

Authentication Flow

Row-Level Security (RLS)

Tenant Isolation

Performance Considerations

Response Time Optimization

Scalability

Monitoring

Learn More

Deeper Dives

Related Topics

Summary