The RaptorB Reference Architecture for Production AI

Most reference architectures for enterprise AI look impressive in a slide deck and collapse the moment they meet a real procurement review, a real security team, or a real CFO. They get the boxes right — agents, runtimes, models, data — and miss the layers that actually decide whether AI ships, scales, and survives its first audit.

This is the architecture we use at RaptorB. It's vendor-agnostic on purpose. It treats security and cost as first-class pillars, not shared services. And it assumes you will, at some point, swap a provider — so it's designed for that day.

The four layers that matter

A working production AI stack has four layers that have to be designed together, not bolted onto each other:

Agentic systems — where the actual work happens
Runtime & interfaces — how systems talk to models, tools, and each other
Shared services — identity, evaluation, observability, cost telemetry
Foundation — compute, data, models, network

Most reference diagrams stop at describing these. The interesting part is what runs across them: the security plane, the cost plane, and the policy plane. We'll come back to those.

Layer 1 — Agentic systems

This is the layer business stakeholders care about, because it's where outputs come from. It has four common shapes:

Enterprise productivity agents — horizontal capabilities everyone in the company uses (summarization, search, drafting). High volume, low complexity per interaction.
Purpose-fit agents — built around specific business workflows (claims triage, KYC review, contract analysis). Tight integration with domain logic.
Highly custom agents — bespoke systems built on lower-level frameworks for problems that don't fit any pattern. The most expensive to maintain.
Workflow automation — agents as a step in a larger deterministic process. The agent is the variable; the process around it is fixed.

The mistake most teams make at this layer is treating everything as a custom agent. Custom is expensive. Most enterprise AI value comes from purpose-fit agents wired into existing workflows, not from reinventing those workflows.

Layer 2 — Runtime & interfaces

This is where things break first when you scale. The architecture has three gateways and one fallback:

API Gateway — the front door for agent-to-system communication. Authn, authz, rate limiting, request shaping.
MCP Gateway — for tool integrations using Model Context Protocol. The emerging standard for how agents discover and use enterprise capabilities.
LLM Gateway — the abstraction layer between your agents and the model providers. This is where vendor independence is actually enforced.
Direct API calls — the escape hatch for cases that don't fit a gateway pattern.

The LLM Gateway is the single most underrated component in the stack. Without it, you're hardcoded to one provider's SDK in dozens of services, and switching becomes a six-month rewrite. With it, you can route by use case, fall back when a provider degrades, run A/B tests across models, and enforce cost ceilings centrally.
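
To make the routing, fallback, and cost-ceiling ideas concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: the names Provider, LLMGateway, ProviderError, and BudgetExceeded are hypothetical, the flat cost_per_call is a simplification of real token pricing, and the two providers are stand-in functions rather than real vendor SDKs.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


class ProviderError(Exception):
    """Raised by a provider adapter when a call fails or the provider is degraded."""


class BudgetExceeded(Exception):
    """Raised when a use case has already spent its configured ceiling."""


@dataclass
class Provider:
    name: str
    complete: Callable[[str], str]  # thin adapter around one vendor's SDK (hypothetical)
    cost_per_call: float            # flat price per call, a deliberate simplification


@dataclass
class LLMGateway:
    routes: Dict[str, List[Provider]]  # use case -> providers in priority order
    ceilings: Dict[str, float]         # use case -> maximum spend
    spent: Dict[str, float] = field(default_factory=dict)

    def complete(self, use_case: str, prompt: str) -> str:
        # Enforce the cost ceiling centrally, before any provider is called.
        if self.spent.get(use_case, 0.0) >= self.ceilings[use_case]:
            raise BudgetExceeded(use_case)
        last_error = None
        for provider in self.routes[use_case]:
            try:
                text = provider.complete(prompt)
            except ProviderError as exc:
                last_error = exc
                continue  # fall back to the next provider in the route
            self.spent[use_case] = self.spent.get(use_case, 0.0) + provider.cost_per_call
            return text
        raise ProviderError(f"all providers failed for {use_case}") from last_error


def degraded_provider(prompt: str) -> str:
    raise ProviderError("provider is degraded")


def healthy_provider(prompt: str) -> str:
    return f"echo: {prompt}"


gateway = LLMGateway(
    routes={"summarize": [Provider("primary", degraded_provider, 0.5),
                          Provider("backup", healthy_provider, 0.25)]},
    ceilings={"summarize": 0.5},
)
print(gateway.complete("summarize", "hello"))  # prints "echo: hello" via the backup
```

The agent only ever names a use case. Which provider answers, and whether the budget allows the call at all, is decided in one place, so swapping a provider means editing the route table rather than the calling services.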

The MCP Gateway is the second most underrated. As MCP adoption grows, every agent will need to discover, authenticate against, and call dozens of tools. Doing this without a gateway means every agent re-implements the same logic, badly.
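
A rough sketch of what "discover, authenticate against, and call" looks like when it lives in one place. This is not the MCP SDK or its wire protocol; Tool, ToolGateway, and the scope strings are hypothetical names chosen for illustration, and a real gateway would validate tokens rather than accept a bare set of scopes.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Set


@dataclass(frozen=True)
class Tool:
    name: str
    description: str      # text an agent can reason about when choosing tools
    required_scope: str   # permission an agent needs to see and call the tool
    handler: Callable[..., object]


class ToolGateway:
    """Single place for tool discovery, authorization, and invocation."""

    def __init__(self) -> None:
        self._tools: Dict[str, Tool] = {}

    def register(self, tool: Tool) -> None:
        self._tools[tool.name] = tool

    def discover(self, agent_scopes: Set[str]) -> List[str]:
        # Agents only see tools they are allowed to call.
        return sorted(t.name for t in self._tools.values()
                      if t.required_scope in agent_scopes)

    def call(self, agent_scopes: Set[str], tool_name: str, **kwargs: object) -> object:
        tool = self._tools[tool_name]
        if tool.required_scope not in agent_scopes:
            raise PermissionError(f"missing scope {tool.required_scope}")
        return tool.handler(**kwargs)


tool_gateway = ToolGateway()
tool_gateway.register(Tool("lookup_customer", "Fetch a customer record by id",
                           "crm:read", lambda customer_id: {"id": customer_id}))
tool_gateway.register(Tool("issue_refund", "Refund an order",
                           "payments:write", lambda order_id: f"refunded {order_id}"))

support_agent_scopes = {"crm:read"}
print(tool_gateway.discover(support_agent_scopes))  # ['lookup_customer']
```

Because the scope check sits in the gateway, no individual agent has to re-implement it, which is the point of the paragraph above.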

Layer 3 — Shared services

This is the layer where most reference architectures get hand-wavy. Each of these deserves to be a real product, not a TODO:

Agent & Workflow Registry & Discovery — where agents live, who owns them, what they can do.
Tool Registry & Discovery — the inventory of capabilities agents can use, with descriptions agents can reason about.
Logging & Observability — request traces, model calls, tool invocations, latencies. Without this, debugging production issues is guessing.
Evaluations — automated quality checks on agent outputs against representative inputs. Run continuously, not just at release.
Identity & Access Management — for both humans and agents. Token issuance, scoped permissions, audit trails. This is the service most teams treat as optional and regret within twelve months.
Tuning, Training Data, Feedback — the loop that turns operational data into model improvement.
Compliance, Risk, Control Outputs — the policy plane. What can the agent do, when, and with what data.

These aren't add-ons. They're how a prototype becomes a system that 500+ people can rely on without it breaking weekly.
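
As one example of what "a real product, not a TODO" can mean, here is a small sketch of the Evaluations service: run an agent over representative inputs and gate on the pass rate. EvalCase, run_evals, the toy agent, and the 0.9 threshold are all assumptions made up for this illustration. A production version would run on a schedule and write results to the observability stack.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class EvalCase:
    prompt: str
    check: Callable[[str], bool]  # returns True if the agent output is acceptable


def run_evals(agent: Callable[[str], str],
              cases: List[EvalCase],
              threshold: float) -> Tuple[float, bool]:
    """Return (pass rate, whether the pass rate meets the threshold)."""
    passed = sum(1 for case in cases if case.check(agent(case.prompt)))
    rate = passed / len(cases)
    return rate, rate >= threshold


def toy_agent(prompt: str) -> str:
    # Stand-in for a real agent: it just upper-cases the prompt.
    return prompt.upper()


cases = [
    EvalCase("refund policy", lambda out: "REFUND" in out),
    EvalCase("kyc review", lambda out: out.startswith("KYC")),
    EvalCase("contract", lambda out: out.islower()),   # fails for this toy agent
    EvalCase("claims triage", lambda out: len(out) > 0),
]

rate, ok = run_evals(toy_agent, cases, threshold=0.9)
print(rate, ok)  # 0.75 False
```

Running the same function on a schedule rather than only at release is what turns it from a test suite into a continuous evaluation.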

Layer 4 — Foundation

The boring layer, and the one most teams over-invest in early:

Automation & IaC — every component deployable from code
Foundational models — the LLMs themselves, owned or rented
Data processing & catalog — where the inputs live and how they're discovered
Compute & network — the substrate
Memory management — vector stores, conversation history, agent state
MLOps — the discipline for keeping the models honest over time

Most companies have most of this already. The mistake is rebuilding it for AI specifically when the existing stack would work fine.

What most reference architectures miss

Three things, consistently:

Security is not a shared service

Treating identity, access, and policy enforcement as just another service in Layer 3 is how breaches happen. In our architecture, security is a plane that cuts across all four layers — every component is configured to authenticate, authorize, and audit by default. It's not opt-in.

This matters most in regulated industries (financial services, healthcare, critical infrastructure) where a single agent making an unauthorized data access can derail an entire AI rollout in a compliance review.
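
One way to picture "authenticate, authorize, and audit by default" is a wrapper that every component entry point must go through. The sketch below is an assumption-laden illustration: Caller, secured, AUDIT_LOG, and the scope names are hypothetical, and an in-memory list stands in for a real audit store.

```python
import functools
from dataclasses import dataclass
from typing import Callable, FrozenSet, List, Tuple


@dataclass(frozen=True)
class Caller:
    identity: str            # human or agent identity; empty means unauthenticated
    scopes: FrozenSet[str]   # permissions granted to this caller


# (identity, action, outcome) records; a stand-in for a durable audit store.
AUDIT_LOG: List[Tuple[str, str, str]] = []


def secured(required_scope: str) -> Callable:
    """Wrap an entry point so every call is authenticated, authorized, and audited."""
    def decorator(func: Callable) -> Callable:
        @functools.wraps(func)
        def wrapper(caller: Caller, *args, **kwargs):
            allowed = bool(caller.identity) and required_scope in caller.scopes
            # Denied calls are audited too, not just successful ones.
            AUDIT_LOG.append((caller.identity, func.__name__,
                              "allowed" if allowed else "denied"))
            if not allowed:
                raise PermissionError(f"{func.__name__} requires {required_scope}")
            return func(caller, *args, **kwargs)
        return wrapper
    return decorator


@secured("claims:read")
def read_claim(caller: Caller, claim_id: str) -> dict:
    return {"claim_id": claim_id, "status": "open"}


triage_agent = Caller("agent-7", frozenset({"claims:read"}))
print(read_claim(triage_agent, "C-1"))  # {'claim_id': 'C-1', 'status': 'open'}
```

The design choice that matters is that the check is applied where the function is defined, so a component cannot be deployed without it.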

Cost is not an afterthought

The cost of running production AI compounds in invisible ways. Tool calls that multiply as conversations get longer. Embedding storage growing untracked. Evaluation pipelines burning tokens silently. Retry loops that double real spend.

Cost telemetry should be a first-class pillar, instrumented at every gateway and aggregated by use case, team, and model. Without this, your CFO finds out about the AI bill from the invoice, not from your dashboard.
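
A minimal sketch of that aggregation, assuming each gateway emits one usage event per model call. The UsageEvent fields, the model names, and the prices per 1,000 tokens are invented for the example and do not reflect any real provider's pricing.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class UsageEvent:
    use_case: str
    team: str
    model: str
    input_tokens: int
    output_tokens: int
    is_retry: bool = False


# Hypothetical prices per 1,000 tokens: (input, output).
PRICES = {"model-a": (0.5, 1.5), "model-b": (0.25, 0.5)}


def event_cost(event: UsageEvent) -> float:
    price_in, price_out = PRICES[event.model]
    return (event.input_tokens * price_in + event.output_tokens * price_out) / 1000


def aggregate(events: Iterable[UsageEvent], key: str) -> Dict[str, float]:
    """Total cost grouped by one dimension: 'use_case', 'team', or 'model'."""
    totals: Dict[str, float] = defaultdict(float)
    for event in events:
        totals[getattr(event, key)] += event_cost(event)
    return dict(totals)


events = [
    UsageEvent("claims-triage", "ops", "model-a", 2000, 1000),
    UsageEvent("claims-triage", "ops", "model-a", 2000, 1000, is_retry=True),
    UsageEvent("search", "platform", "model-b", 4000, 2000),
]

print(aggregate(events, "use_case"))  # {'claims-triage': 5.0, 'search': 2.0}
retry_spend = sum(event_cost(e) for e in events if e.is_retry)
print(retry_spend)  # 2.5
```

In this toy data a single retry doubles the spend on the claims-triage call, which is the kind of thing that stays invisible unless retries are tagged at the gateway.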

Vendor-agnostic isn't optional

Every reference architecture from a Big Three consultancy assumes you've already committed to a cloud provider or a model partner. Ours assumes you haven't, and shouldn't have to. Open-source frameworks, model abstraction, modular gateways — designed so that the day a new model is meaningfully better, you can swap to it without rewriting your application code.

This isn't ideology. It's risk management. Provider markets in AI are moving fast enough that locking into a single vendor on a three-year contract could be one of the most expensive decisions an enterprise makes this decade.

How to use this

If you're starting an AI rollout: build Layer 2 (the gateways) before Layer 1 (the agents). Most teams do the opposite: they ship an agent prototype first, then discover around month nine that they need a gateway. Building the gateways first makes every agent that follows cheaper to ship.

If you have an AI rollout already in production: audit Layer 3. Most teams are missing identity, evaluations, or cost telemetry. Pick the most painful gap and close it before the next agent ships.

If your AI rollout stalled in compliance review: the problem is almost always missing audit trails, missing data residency controls, or missing explainability — all Layer 3 problems that look optional until they aren't.

—

If any of this sounds familiar, our AI Production Readiness Audit is designed to surface exactly where your architecture is exposed — in two weeks, with a concrete remediation plan. Or start a conversation and we'll figure out the shape together.