Skip to main content

Introduction

The OpsHub Agent System is a production-grade AI orchestration platform that powers intelligent automation across investment operations. Built on LangGraph and FastAPI with GPT-4 reasoning, it delivers 10 specialized domain agents with 62+ enterprise tools, enabling natural language interactions with complex financial workflows. This guide explains how the agent system works under the hood—from the LangGraph state machines that power agent reasoning to the AG-UI protocol that enables real-time bidirectional communication between frontend and backend.

Core Concepts

Multi-Agent Orchestration

OpsHub uses a domain-specialized multi-agent architecture rather than a single general-purpose chatbot. Each agent is an expert in a specific domain with:
  • Scoped database access - Limited to relevant schemas and tables
  • Specialized tool suite - Domain-specific automation capabilities
  • Optimized prompts - Fine-tuned for investment operations terminology
  • Role-based permissions - Aligned with organizational access controls
Available Agents:
Agent IDDomainKey Capabilities
appOpsHub OrchestratorWorkspace context, task delegation, validation status
dashboardDashboard ArchitectRG94-compliant dashboards, NAV variance tracking
workbookWorkbook EngineerSpreadsheet automation, validation logic, formula translation
analyticsInvestment Analytics StrategistPerformance attribution, scenario analysis, variance investigation
workflowWorkflow DirectorProcess automation, NAV certification, breach escalation
integrationIntegration SpecialistData connectivity, custodian integration, sync pipelines
data-qualityData Quality AnalystException handling, profiling, reconciliation
fund-accountantFund Accountant AssistantBreak investigation, NAV operations, certification
portfolio-managerPortfolio Manager CopilotPerformance insights, exposure analysis, action planning
complianceCompliance SentinelRG94 mapping, audit evidence, regulatory controls
risk-analystRisk Analyst CopilotVaR calculations, stress testing, limit monitoring
Why Multi-Agent? Single large language models struggle with specialized domains due to:
  • Generic training data lacking domain expertise
  • Context windows insufficient for entire knowledge bases
  • Difficulty maintaining focus across diverse tasks
Multi-agent systems solve this by:
  • Domain specialization - Each agent masters a narrow domain
  • Scoped context - Only relevant data and tools in context
  • Superior accuracy - 30-40% better than general-purpose chatbots
  • Scalability - Add new specialists without retraining existing agents

State Management & Checkpointing

The agent system maintains persistent conversational state across sessions using LangGraph’s checkpointing mechanism:
interface AgentState {
  // Conversation history
  messages: Message[];

  // Current workspace context
  workspace: {
    surface?: string;           // Current app surface
    workbookId?: string;         // Active spreadsheet
    activeSheetId?: string;      // Current sheet tab
    activeView?: string;         // spreadsheet | dashboard | workflow
    selectedCells?: string;      // Cell range (e.g., "A1:C10")
    selectedDashboardId?: string;
    selectedWorkflowId?: string;
  };

  // Agent selection and delegation
  activeAgent: AgentId;
  delegationChain: Array<{
    fromAgent: AgentId;
    toAgent: AgentId;
    reason: string;
    timestamp: string;
  }>;

  // User-facing proposals awaiting approval
  drafts: Array<Draft>;

  // Real-time insights and recommendations
  insights: Array<Insight>;

  // Session metadata
  sessionId: string;
  userId: string;
  error?: string;
  pendingTools: string[];
  toolResults: Record<string, any>;
}
Checkpointing Benefits:
  1. Session Continuity - Conversations persist across page reloads
  2. Multi-Device Access - Resume conversations from any device
  3. Error Recovery - Retry failed operations without losing context
  4. Audit Trail - Complete history of agent interactions
  5. Time Travel Debugging - Replay conversations from any checkpoint
State is persisted to Supabase in these tables:
  • agent.sessions - Session metadata and configuration
  • agent.messages - Full conversation history with tool calls
  • agent.drafts - User-approved/pending agent proposals
  • agent.insights - Real-time recommendations and alerts
  • agent.tool_calls - Detailed tool execution audit trail

Tool Execution Framework

Agents interact with the OpsHub platform through a comprehensive tool suite (62+ tools) organized by category: Core Automation Tools:
  • Spreadsheet Tools (5) - Cell manipulation, formula generation, range operations
  • Natural Language Query (2) - Convert English to SQL queries
  • Database Operations (4) - Schema inspection, query execution, data summaries
  • Draft Management (1) - Capture agent suggestions for user review
Advanced Intelligence Tools:
  • Breach Prediction (2) - ML-powered forecasting and risk scoring
  • Anomaly Detection (2) - Real-time outlier detection in holdings/transactions
  • Auto-Reconciliation (2) - Automated break analysis and resolution
  • Self-Healing Workflows (2) - Automatic error recovery
Governance & Compliance Tools:
  • Approval Management (3) - Auto-approval rules, workflow orchestration
  • Compliance Checks (6) - ASIC RG94 verification, audit evidence
  • Explainability & HITL (3) - Decision explanations, confidence scoring
Operational Productivity Tools:
  • Report Generation (3) - Automated regulatory and operational reports
  • Workflow Generation (3) - AI-generated workflow definitions
  • Scheduling Optimization (3) - Intelligent task scheduling
  • Smart Suggestions (5) - Context-aware automation recommendations
  • Agent Delegation (3) - Multi-agent task decomposition
  • Memory & Context (4) - User preference learning, case history recall
  • Document Extraction (2) - Structured data extraction from PDFs
  • Analytics & Performance (4) - Cross-agent performance analytics
Tool Execution Flow: Tenant Isolation: Every tool execution is wrapped with tenant context decorators:
  • Automatic tenant_id injection into database queries
  • Row-level security (RLS) policy enforcement
  • Audit logging to audit.audit_log
  • Quota tracking and enforcement

Natural Language Processing

The agent system uses GPT-4o (OpenAI) for natural language understanding with: Optimized Prompting:
  • Domain-specific system prompts for each agent
  • Investment operations terminology and abbreviations
  • ASIC RG94 compliance context
  • Example few-shot prompts for complex tasks
Context Loading:
  • Automatic workspace context detection from current page
  • Relevant catalog items (dashboards, workflows, datasets)
  • Recent conversation history (last 20 messages)
  • User preferences and memory (learned patterns)
Response Streaming:
  • Token-by-token streaming via Server-Sent Events (SSE)
  • Real-time tool execution updates
  • Progressive artifact rendering
  • Optimistic UI updates

Architecture Overview

High-Level System Architecture

Component Responsibilities: Frontend (Next.js 15)
  • React UI components for chat, artifacts, context
  • AG-UI client for real-time bidirectional state sync
  • Zustand stores for optimistic updates and local state
  • Server-Sent Events (SSE) for streaming responses
API Layer
  • Next.js API routes proxy requests to backend
  • Supabase authentication and JWT validation
  • Environment variable protection (no CORS exposure)
Backend (FastAPI + Python)
  • LangGraph workflow orchestration
  • GPT-4o reasoning and tool selection
  • Tool execution with tenant isolation
  • State checkpointing to Supabase
Data Layer (Supabase PostgreSQL)
  • 14 specialized business schemas
  • Agent state persistence tables
  • Row-level security (RLS) policies
  • Real-time subscriptions for live updates

LangGraph Workflow Engine

OpsHub uses LangGraph (LangChain’s graph-based orchestration framework) for durable agent workflows:
# app/agent/graph.py
from langgraph.graph import StateGraph, END
from langgraph.checkpoint.memory import MemorySaver

def create_agent_graph():
    workflow = StateGraph(AgentState)

    # Add nodes
    workflow.add_node("agent", agent_node)  # LLM reasoning
    workflow.add_node("tools", tool_node)   # Tool execution

    # Set entry point
    workflow.set_entry_point("agent")

    # Conditional edge: tools or end?
    workflow.add_conditional_edges(
        "agent",
        should_continue,
        {"tools": "tools", "end": END}
    )

    # Loop back to agent after tools
    workflow.add_edge("tools", "agent")

    # Compile with checkpointing
    memory = MemorySaver()
    graph = workflow.compile(
        checkpointer=memory,
        debug=settings.DEBUG
    )

    return graph
Workflow Execution:
  1. Start - User message enters workflow at agent node
  2. Agent Node - GPT-4o analyzes message and decides:
    • Generate text response (go to END)
    • Execute tools (go to Tool Node)
  3. Tool Node - Execute requested tools in parallel
  4. Loop - Return results to agent node for next turn
  5. End - Stream final response to user
Why LangGraph?
  • Durable Execution - Survives crashes, continues from checkpoints
  • Event Sourcing - Complete audit trail of all state transitions
  • Retry Policies - Automatic error recovery with exponential backoff
  • Human-in-Loop - Approval gates for high-risk actions
  • Observability - LangSmith tracing for debugging

Technology Stack

LangGraph Workflow Engine

Purpose: Orchestrate multi-turn agent conversations with state persistence Key Features:
  • State Machines - Define agent workflows as directed graphs
  • Checkpointing - Persist state at each node transition
  • Conditional Routing - Dynamic workflow paths based on LLM decisions
  • Tool Execution - Parallel tool calls with result aggregation
  • Error Handling - Retry policies and fallback strategies
Configuration:
# app/config.py
OPENAI_MODEL = "gpt-4o"
OPENAI_TEMPERATURE = 0.0  # Deterministic responses
MAX_TOOL_ITERATIONS = 10
AGENT_RECURSION_LIMIT = 25
ENABLE_CHECKPOINTING = True

FastAPI Backend

Purpose: High-performance Python API server for agent endpoints Key Endpoints:
  • POST /agent/stream - AG-UI protocol streaming endpoint (primary)
  • POST /api/chat - Simple chat API (alternative)
  • POST /api/agents - Multi-agent orchestration
  • GET /api/agents - List available agents
  • POST /api/agents/set-active - Switch active agent
  • GET /health - Service health monitoring
Middleware:
  • JWT authentication via Supabase
  • Tenant context injection
  • Request/response logging
  • Rate limiting (optional Redis)
  • CORS configuration

Supabase Integration

Database Schemas (14 total): Core Business Schemas:
  • investment - Portfolios, securities, holdings, transactions
  • validation - ASIC RG94 compliance rules and results
  • risk - Risk metrics and VaR calculations
  • compliance - Regulatory checks and audit evidence
  • performance - Attribution and TWR calculations
  • market_data - Time-series price data (TimescaleDB)
  • workflow - Daily pricing workflows (Temporal)
  • audit - Audit logs (partitioned by quarter)
Platform Schemas:
  • iam - Teams, users, roles, permissions
  • agent - Sessions, messages, drafts, insights, tool_calls
  • integration - Data sources, sync jobs, field mappings
  • vault - Secure credential storage
  • distribution - Report deliverables and recipients
  • analytics - Advanced analytics and reporting
Agent State Tables:
-- agent.sessions
CREATE TABLE agent.sessions (
    id UUID PRIMARY KEY,
    user_id UUID NOT NULL,
    active_agent_id TEXT,
    workspace_context JSONB,
    created_at TIMESTAMPTZ DEFAULT NOW(),
    updated_at TIMESTAMPTZ DEFAULT NOW()
);

-- agent.messages
CREATE TABLE agent.messages (
    id UUID PRIMARY KEY,
    session_id UUID REFERENCES agent.sessions(id),
    role TEXT NOT NULL, -- 'user' | 'assistant' | 'system'
    content TEXT NOT NULL,
    tool_calls JSONB,
    artifacts JSONB,
    timestamp TIMESTAMPTZ DEFAULT NOW()
);

-- agent.drafts
CREATE TABLE agent.drafts (
    id UUID PRIMARY KEY,
    session_id UUID REFERENCES agent.sessions(id),
    agent_id TEXT NOT NULL,
    title TEXT NOT NULL,
    summary TEXT,
    payload JSONB NOT NULL,
    status TEXT DEFAULT 'proposed', -- 'proposed' | 'pending-approval' | 'applied' | 'rejected'
    metadata JSONB,
    created_at TIMESTAMPTZ DEFAULT NOW()
);
Row-Level Security (RLS):
  • 62%+ of tables have RLS enabled
  • IAM role-based access control
  • Team-scoped data visibility
  • Automatic tenant isolation

GPT-4 Reasoning

Model: gpt-4o (OpenAI) Configuration:
  • Temperature: 0.0 (deterministic responses)
  • Max Tokens: 4096 per response
  • Streaming: Token-by-token via SSE
  • Function Calling: Native tool schema support
System Prompt Structure:
def build_system_prompt(agent_id: str, workspace_context: dict):
    return f"""
You are the {agent_name} for OpsHub NAV, an investment operations platform.

ROLE: {agent_description}

DATABASE ACCESS: You have read/write access to these schemas:
{schema_list}

AVAILABLE TOOLS: {tool_count} specialized tools
{tool_categories}

WORKSPACE CONTEXT: {workspace_context}

RESPONSE GUIDELINES:
- Use domain-specific terminology (NAV, RG94, attribution, VaR)
- Cite specific data sources and calculations
- Propose drafts for high-risk actions (require user approval)
- Emit insights for proactive recommendations
- Delegate to specialist agents when needed
"""

Data Flow

Request/Response Flow

Step-by-Step Breakdown:
  1. User Input
    • User types message in chat interface
    • Frontend captures workspace context (active workbook, sheet, dashboard)
    • Context includes page surface, selected cells, catalog items
  2. Frontend Processing
    • AG-UI client prepares RunAgentInput with messages and context
    • Zustand stores handle optimistic UI updates
    • SSE connection established for streaming
  3. API Proxy
    • Next.js API route validates Supabase JWT
    • Adds authentication headers (Bearer token)
    • Proxies request to FastAPI backend
  4. Backend Orchestration
    • FastAPI receives request, validates tenant
    • Loads agent configuration and system prompt
    • Invokes LangGraph workflow with current state
  5. Agent Reasoning
    • LangGraph agent node sends prompt to GPT-4o
    • Model analyzes message and workspace context
    • Decides to execute tools or generate text response
  6. Tool Execution
    • Tool node receives tool call requests
    • Injects tenant context into each tool
    • Executes tools in parallel where possible
    • Returns results to agent node
  7. Response Streaming
    • Agent generates response tokens
    • Streamed via SSE to frontend
    • Artifacts extracted and rendered separately
    • Insights and drafts displayed in real-time
  8. State Persistence
    • LangGraph saves checkpoint to Supabase
    • Messages, drafts, insights written to agent tables
    • Audit log entry created for compliance

State Synchronization Flow

The AG-UI protocol enables bidirectional state synchronization between frontend and backend: State Sync Features:
  • Workspace Awareness - Agents see current page, workbook, dashboard
  • Real-Time Updates - UI responds instantly to agent actions
  • Multi-Agent Coordination - Delegation chains tracked with full context
  • Draft System - AI proposals require explicit user approval
  • Offline Resilience - State cached locally, syncs when online
  • Audit Trail - Every state change logged for compliance

Security & Permissions

Authentication Flow

Authentication Helpers:
// lib/api/backend-auth.ts
import { createClient } from '@/lib/supabase/server';

export async function getAuthHeaders(): Promise<HeadersInit> {
  const supabase = await createClient();
  const { data: { session }, error } = await supabase.auth.getSession();

  if (error || !session) {
    throw new Error('Unauthorized: User must be authenticated');
  }

  return {
    'Content-Type': 'application/json',
    'Authorization': `Bearer ${session.access_token}`,
  };
}

Row-Level Security (RLS)

PostgreSQL RLS policies enforce fine-grained access control: Example: Portfolio Access Policy
-- Users can only see portfolios in their teams
CREATE POLICY portfolio_team_access ON investment.portfolios
  FOR SELECT
  USING (
    team_id IN (
      SELECT team_id FROM iam.team_members
      WHERE user_id = auth.uid()
    )
  );

-- Portfolio managers can modify their portfolios
CREATE POLICY portfolio_manager_update ON investment.portfolios
  FOR UPDATE
  USING (
    portfolio_manager_id = auth.uid()
    AND EXISTS (
      SELECT 1 FROM iam.user_roles
      WHERE user_id = auth.uid()
      AND role_code = 'PORTFOLIO_MANAGER'
    )
  );
IAM Roles:
RoleScopePermissions
ADMINGLOBALFull system access
FUND_MANAGERORGANIZATIONManages funds and strategies
PORTFOLIO_MANAGERTEAMManages portfolios
OPERATIONS_LEADORGANIZATIONOperational activities
COMPLIANCE_OFFICERORGANIZATIONCompliance and audits
VIEWERTEAMRead-only access

Tenant Isolation

Every agent tool execution is wrapped with tenant context:
# app/tenant/isolation.py
from functools import wraps

def with_tenant_context(func):
    @wraps(func)
    async def wrapper(*args, **kwargs):
        tenant_id = get_current_tenant_id()

        # Inject tenant_id into database queries
        async with db.session() as session:
            await session.execute(
                "SET LOCAL app.current_tenant_id = :tenant_id",
                {"tenant_id": tenant_id}
            )

            # Execute tool
            result = await func(*args, **kwargs)

            # Audit log
            await log_tool_execution(
                tenant_id=tenant_id,
                tool_name=func.__name__,
                params=kwargs,
                result=result
            )

            return result

    return wrapper

Performance Considerations

Response Time Optimization

Target Latencies:
  • Initial response: < 2 seconds
  • Tool execution: < 5 seconds
  • Stream first token: < 500ms
  • Database queries: < 100ms
Optimization Strategies:
  1. Parallel Tool Execution - LangGraph executes independent tools concurrently
  2. Database Indexing - All foreign keys and frequently queried columns indexed
  3. Materialized Views - Pre-aggregated data for dashboard queries
  4. Query Caching - Redis cache for read-heavy endpoints
  5. Connection Pooling - Persistent database connections (Supabase Supavisor)

Scalability

Horizontal Scaling:
  • FastAPI backend deployed on Fly.io with auto-scaling
  • Multiple worker instances handle concurrent requests
  • Stateless design (all state in Supabase)
Vertical Scaling:
  • Supabase Pro plan: 8GB RAM, 4 vCPU
  • Connection pool: 100 concurrent connections
  • TimescaleDB optimizations for time-series data
Cost Optimization:
  • Infrastructure: ~$65/month (Fly.io + Supabase)
  • OpenAI API: ~$0.01-0.03 per conversation
  • Total: Less than $100/month for 1,000 conversations

Monitoring

Health Checks:
@app.get("/health")
async def health_check():
    return {
        "status": "healthy",
        "agents": len(agent_registry.get_all_agents()),
        "database": await check_database_connection(),
        "openai": await check_openai_connection(),
        "version": settings.VERSION
    }
Metrics Tracked:
  • Agent response times (p50, p95, p99)
  • Tool execution success rates
  • Database query performance
  • Token usage and costs
  • Error rates by agent/tool

Learn More

Deeper Dives

  • Agent Integration Patterns - Learn about CopilotKit, AG-UI, SSE streaming, and workspace sync patterns
  • Backend Fact Sheet - Complete technical architecture and tool suite documentation
  • Architecture Decision Record - Why we chose LangGraph + Python over TypeScript alternatives
  • Database Schema - Explore the 14 business schemas and agent state tables
  • User Guides - Step-by-step tutorials for building with agents
  • API Reference - Complete agent API documentation
  • Tool Development - Create custom tools for your agents
  • Deployment - Production deployment guide for Fly.io

Summary

The OpsHub Agent System delivers intelligent automation for investment operations through:
  • Multi-Agent Orchestration - 10 specialized domain experts
  • LangGraph Workflows - Durable execution with checkpointing
  • Comprehensive Tool Suite - 62+ enterprise automation tools
  • Real-Time Bidirectional Sync - AG-UI protocol integration
  • GPT-4 Reasoning - Natural language understanding
  • Enterprise Security - RLS policies, tenant isolation, audit trails
  • Production-Ready - Less than $100/month infrastructure, auto-scaling
Key Technical Decisions:
DecisionRationale
LangGraph over TypeScriptBattle-tested Python ecosystem, superior state management
FastAPI backendHigh performance, async support, OpenAPI documentation
Supabase PostgreSQLReal-time subscriptions, RLS policies, TimescaleDB extensions
AG-UI protocolFramework-agnostic bidirectional state sync
GPT-4o modelBest balance of performance, cost, and reasoning capability
Next Steps:
  1. Explore the Agent Integration Patterns guide
  2. Review the Backend Fact Sheet for complete tool documentation
  3. Build your first agent integration using the API Reference