Flagship case study
One ERP AI layer
Three production AI systems for a single enterprise ERP client (Operations Copilot, NL2SQL Data Agent, and Knowledge Assistant), built on one shared safety and evaluation practice. Each is deployed independently, and they are wired together only where a real dependency exists.
The problem
An ERP holds the data a business actually runs on — inventory, orders, suppliers, money. Bolting a chatbot onto that is easy; building one that can act on it without corrupting state or inventing numbers is the hard part. The bar for shipping was concrete: every write passes through human approval, every analytical query is safe by construction, and every answer carries grounding you can audit.
Architecture
Operations Copilot is the centerpiece. It routes each request to a role-scoped domain specialist and calls governed tools over MCP (Model Context Protocol), including the optional NL2SQL Data Agent service. Knowledge Assistant is a sibling system in the same ERP AI layer; the copilot does not orchestrate it.
[Architecture diagram. Legend: * marks optional / read-only paths.]
How it works
Operations Copilot
A FastAPI service running a DeepAgents harness. A router classifies each request and hands it to a role-scoped specialist that receives only the tools it needs, selected from a static tool catalog by tag rather than by prompt guesswork. Each answer is then labeled authoritative, derived, or unverified based on the captured tool traces.
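The two mechanisms above, tag-based tool scoping and trace-based answer labels, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the copilot's code: the catalog entries, tags, role names, and the `ToolSpec`, `Claim`, `tools_for`, and `label_answer` helpers are all hypothetical, and the labeling rule (verbatim trace match means authoritative, computed only from trace values means derived, anything else unverified) is one plausible reading of the three labels.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolSpec:
    name: str
    tags: frozenset[str]
    writes: bool = False  # write tools can only propose; execution needs human approval


# Hypothetical static catalog: names and tags are illustrative, not the real server's.
CATALOG = (
    ToolSpec("get_inventory_level", frozenset({"inventory", "read"})),
    ToolSpec("list_open_orders", frozenset({"orders", "read"})),
    ToolSpec("get_supplier", frozenset({"suppliers", "read"})),
    ToolSpec("propose_purchase_order", frozenset({"orders", "suppliers"}), writes=True),
)

# Hypothetical role -> allowed-tag mapping, applied after the router picks a specialist.
ROLE_TAGS = {
    "inventory_specialist": {"inventory"},
    "procurement_specialist": {"orders", "suppliers"},
}


def tools_for(role: str) -> list[str]:
    """Return the catalog tools whose tags overlap the role's allowed tags."""
    allowed = ROLE_TAGS[role]
    return [tool.name for tool in CATALOG if tool.tags & allowed]


@dataclass(frozen=True)
class Claim:
    value: str                          # a fact stated in the answer, e.g. "42"
    derived_from: tuple[str, ...] = ()  # trace values it was computed from, if any


def label_answer(claims: list[Claim], trace_values: set[str]) -> str:
    """Label an answer by checking each claim against values seen in tool outputs.

    authoritative: every claim appears verbatim in the trace.
    derived: every claim is verbatim or computed only from trace values.
    unverified: some claim lacks trace support, or there is nothing to check.
    """
    if not claims or not trace_values:
        return "unverified"
    label = "authoritative"
    for claim in claims:
        if claim.value in trace_values:
            continue
        if claim.derived_from and all(v in trace_values for v in claim.derived_from):
            label = "derived"
            continue
        return "unverified"
    return label
```

Because the specialist's tool list comes from a lookup rather than from the model, a procurement request can never see inventory-only tools, however the prompt is phrased.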
Governed tools & approval boundary
A Spring AI / Java 21 MCP server owns the MySQL business data and exposes 10 read and 4 write tools. Approval is deliberately not an agent tool: the model can propose a write, but the write executes only through a human-controlled REST path, against an approval that is single-use, bound to a hash of the payload, and expires after a TTL.
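The approval properties described above (bound to actor, session, and tool; tied to a payload hash; TTL'd; single-use) can be illustrated with an HMAC-based token. This is a Python sketch of the idea only; the real server is Java, and the `ApprovalStore` class, its method names, the in-memory store, and the choice of HMAC-SHA256 over canonical JSON are assumptions for illustration.

```python
import hashlib
import hmac
import json
import secrets
import time


class ApprovalStore:
    """Hypothetical in-memory approval store; a real deployment would persist this."""

    def __init__(self, secret: bytes, ttl_seconds: float = 300.0, clock=time.monotonic):
        self._secret = secret
        self._ttl = ttl_seconds
        self._clock = clock
        self._pending: dict[str, float] = {}  # approval_id -> expiry time

    @staticmethod
    def payload_hash(payload: dict) -> str:
        # Canonical JSON so the same logical payload always hashes the same way.
        canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
        return hashlib.sha256(canonical.encode()).hexdigest()

    def _mac(self, approval_id: str, actor: str, session: str, tool: str, payload: dict) -> str:
        # The token binds the approval id, actor, session, tool, and payload hash together.
        msg = "|".join([approval_id, actor, session, tool, self.payload_hash(payload)])
        return hmac.new(self._secret, msg.encode(), hashlib.sha256).hexdigest()

    def issue(self, actor: str, session: str, tool: str, payload: dict) -> tuple[str, str]:
        """Called when a human approves a proposed write; returns (approval_id, token)."""
        approval_id = secrets.token_hex(16)
        self._pending[approval_id] = self._clock() + self._ttl
        return approval_id, self._mac(approval_id, actor, session, tool, payload)

    def consume(self, approval_id: str, token: str, actor: str, session: str,
                tool: str, payload: dict) -> bool:
        """Validate binding, payload hash, expiry, and one-time use before executing."""
        # pop() burns the approval on any attempt, so even a failed check cannot be retried.
        expires_at = self._pending.pop(approval_id, None)
        if expires_at is None or self._clock() > expires_at:
            return False
        expected = self._mac(approval_id, actor, session, tool, payload)
        return hmac.compare_digest(expected, token)
```

Burning the approval on a failed check is a deliberately conservative choice here: a tampered payload forces a fresh human approval rather than allowing repeated guesses.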
NL2SQL Data Agent (analytics)
Self-service analytics over the warehouse, which Operations Copilot can reach through an optional MCP connection. Generated SQL passes through a deterministic SQLGlot guard (SELECT-only, scope/fanout checks, auto-LIMIT), and a Qdrant semantic layer cuts prompt context by ~73% compared with full-schema dumps. SQL repair is bounded, and result-equivalence regression evals run across DuckDB and ClickHouse.
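A result-equivalence check compares what a generated query returns with what a reference ("gold") query returns, instead of comparing SQL text. The sketch below assumes equivalence means the same multiset of rows (order ignored unless `ordered=True`, floats rounded, columns compared by position); Python's built-in `sqlite3` stands in for DuckDB and ClickHouse, and `result_equivalent` is a hypothetical helper, not the project's harness.

```python
import sqlite3
from collections import Counter


def _normalize(row: tuple, ndigits: int) -> tuple:
    # Round floats so harmless precision differences do not count as mismatches.
    return tuple(round(v, ndigits) if isinstance(v, float) else v for v in row)


def result_equivalent(conn: sqlite3.Connection, gold_sql: str, candidate_sql: str,
                      ordered: bool = False, ndigits: int = 6) -> bool:
    """True if both queries return the same rows (as a multiset unless ordered=True)."""
    gold = [_normalize(r, ndigits) for r in conn.execute(gold_sql).fetchall()]
    cand = [_normalize(r, ndigits) for r in conn.execute(candidate_sql).fetchall()]
    if ordered:
        return gold == cand
    # Counter keeps duplicate rows, so a query that drops or repeats rows still fails.
    return Counter(gold) == Counter(cand)
```

For example, `SELECT region, SUM(amount) FROM orders GROUP BY region` and the same query with an alias and `ORDER BY total DESC` compare as equivalent, while a candidate that uses `COUNT(*)` instead of `SUM(amount)` does not. Comparing results rather than SQL strings lets differently written but correct queries pass.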
Knowledge Assistant (RAG)
A standalone LangGraph-orchestrated RAG system over enterprise documents: Milvus hybrid retrieval with intent-routed strategies, citation grounding with strict-evidence refusal, RBAC, and LLM-judge / citation evals with per-query observability.
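Citation grounding with strict-evidence refusal can be read as a post-generation check: return the answer only if every sentence cites at least one chunk that was actually retrieved for this user, and refuse otherwise. This is a minimal sketch under that assumption; the `[n]` citation format, the naive sentence splitter, the refusal text, and `ground_or_refuse` itself are hypothetical, not the Knowledge Assistant's implementation.

```python
import re

CITATION = re.compile(r"\[(\d+)\]")            # assumed inline citation format: [1], [2], ...
SENTENCE_END = re.compile(r"(?<=[.!?])\s+")    # naive splitter, good enough for a sketch
REFUSAL = "I can't answer that from the documents available to you."


def ground_or_refuse(answer: str, retrieved_ids: set[int]) -> str:
    """Return the answer only if every sentence cites at least one retrieved chunk.

    retrieved_ids is assumed to be RBAC-filtered already, so a citation to a chunk
    the user cannot see counts as unsupported. Any unsupported sentence means refusal.
    """
    sentences = [s.strip() for s in SENTENCE_END.split(answer) if s.strip()]
    if not sentences:
        return REFUSAL
    for sentence in sentences:
        cited = {int(n) for n in CITATION.findall(sentence)}
        if not cited or not cited <= retrieved_ids:
            return REFUSAL
    return answer
```

Refusing on the first unsupported sentence is stricter than trimming that sentence, which matches the "strict-evidence" framing: a partly grounded answer is treated as ungrounded.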
Outcomes & evidence
- ~73% prompt-context reduction via the Qdrant semantic layer vs full-schema dumps.
- Deterministic SQLGlot guard (SELECT-only, scope/fanout checks, auto-LIMIT): non-SELECT or out-of-scope SQL is blocked before it reaches the warehouse.
- Single-use, cryptographically bound approval on every write, validated before execution for actor/session/tool binding, payload hash, expiry, and one-time use.
- Trace-grounded answers marked authoritative / derived / unverified, with source evidence from tool calls.
- Eval harnesses for routing, tool choice, grounding, result equivalence, and live smoke tests, so agent behavior can be measured before shipping.
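A routing eval of the kind listed above can be as simple as running labeled requests through the router, reporting overall and per-role accuracy, and gating release on a threshold. In this sketch the keyword router, the cases, the role names, and the `evaluate_router` / `passes_gate` helpers are hypothetical stand-ins for the real classifier and eval set.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class RoutingCase:
    request: str
    expected_role: str


def evaluate_router(route: Callable[[str], str], cases: list[RoutingCase]) -> dict:
    """Run every labeled case through the router; report overall and per-role accuracy."""
    totals, hits, failures = Counter(), Counter(), []
    for case in cases:
        got = route(case.request)
        totals[case.expected_role] += 1
        if got == case.expected_role:
            hits[case.expected_role] += 1
        else:
            failures.append((case.request, case.expected_role, got))
    n = len(cases)
    return {
        "accuracy": (n - len(failures)) / n if n else 0.0,
        "per_role": {role: hits[role] / count for role, count in totals.items()},
        "failures": failures,
    }


def passes_gate(report: dict, min_accuracy: float) -> bool:
    """Ship gate: block the release if routing accuracy falls below the threshold."""
    return report["accuracy"] >= min_accuracy


# Hypothetical keyword router, standing in for the real request classifier.
def keyword_router(text: str) -> str:
    t = text.lower()
    if "stock" in t or "inventory" in t:
        return "inventory_specialist"
    if "supplier" in t or "purchase order" in t:
        return "procurement_specialist"
    return "general"


CASES = [
    RoutingCase("How much stock of SKU A1 is left?", "inventory_specialist"),
    RoutingCase("Which supplier ships bolts fastest?", "procurement_specialist"),
    RoutingCase("Draft a purchase order for 500 bolts", "procurement_specialist"),
    RoutingCase("What was Q3 revenue by region?", "analytics_specialist"),
]
```

Running `evaluate_router(keyword_router, CASES)` yields 0.75 accuracy with one failure (the revenue question falls through to `general`), so the run would fail a 0.9 gate. The per-role breakdown points straight at the missing analytics route.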