An approval boundary agents can't bypass
The scariest moment in an agent demo is the first time it says “done — I've updated the order.” Once a language model can write to your business data, the usual safety story — a careful system prompt, a polite “only do this if you're sure” — stops being a control. The model is a text generator that can be wrong, jailbroken, or injected. You can't prompt your way out of that.
So in the ERP operations copilot, I drew a hard line: the model can
propose a write, but it can never approve one. Approval is not in the
agent's toolset at all. The agent has a request_approval tool that
only creates a pending request; a human approves it through a separate
REST path the agent has no access to; and execution happens against a record the
model never controls.
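A minimal sketch of that split, to make the shape concrete. Everything here except the request_approval tool name is an assumption for illustration: the class name, the in-memory map, and the approveAsHuman method (standing in for the handler behind the separate REST path) are hypothetical, not the copilot's actual code.

```java
import java.util.Map;
import java.util.Set;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

public class ApprovalSurface {
    public enum Status { PENDING, APPROVED }

    /** Hypothetical request shape; a real record would carry more fields. */
    public record Request(String toolName, String payload, Status status) {}

    // In-memory stand-in for whatever store holds approval requests.
    private final Map<String, Request> requests = new ConcurrentHashMap<>();

    /** The tool names the agent can see. There is deliberately no "approve" entry. */
    public Set<String> agentTools() {
        return Set.of("request_approval");
    }

    /** Agent-facing tool: records a PENDING request and returns its id. */
    public String requestApproval(String toolName, String payload) {
        String id = UUID.randomUUID().toString();
        requests.put(id, new Request(toolName, payload, Status.PENDING));
        return id;
    }

    /** Human-facing: assumed to be wired only to the REST path, never registered as a tool. */
    public boolean approveAsHuman(String id) {
        Request r = requests.get(id);
        if (r == null || r.status() != Status.PENDING) return false;
        Request approved = new Request(r.toolName(), r.payload(), Status.APPROVED);
        // Atomic swap: succeeds only if the stored request is still the one we read.
        return requests.replace(id, r, approved);
    }

    public Status statusOf(String id) {
        Request r = requests.get(id);
        return r == null ? null : r.status();
    }
}
```

The point of the sketch is structural: the only state the agent-facing method can produce is PENDING, so no sequence of tool calls can move a request to APPROVED.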
The chain that runs before a write
The interesting part is execution. When an approved write finally runs, the Java MCP server re-validates the whole chain — not because it trusts the agent, but because it trusts nothing:
// Approval is NOT an agent tool. The model requests; a human approves.
// Execution re-validates the entire chain before the write runs:
if (!APPROVED.equals(record.getStatus())) reject("not approved");
if (record.isExpired()) reject("expired"); // 15-min TTL
if (!record.getPayloadHash().equals(hash(payload))) reject("hash mismatch"); // integrity
if (!record.getToolName().equals(op.toolName())) reject("binding"); // no reuse
if (!record.getOperationType().equals(op.type())) reject("binding");
if (!markConsumed(record)) reject("already used"); // single-use
revalidatePreconditions(op); // stale world -> reject
execute(op); // only now, in a transaction

Each line closes a specific door:
- Status & expiry. The approval has to be in the APPROVED state and inside a 15-minute TTL. An approval that's been sitting around, or was never granted, is worthless.
- Payload integrity. The payload must be valid JSON whose hash matches what was approved. You can't get a human to approve a small change and then execute a large one: the bytes are pinned.
- Binding. The toolName and operation type must match the record. An approval minted for order_update can't be redirected into purchase_order_create. It's also bound to the actor and session that approved it.
- Single use. Execution marks the record consumed in the same step. Replaying the call does nothing: the second attempt finds it spent.
- Freshness. Preconditions are re-checked at execution time; if the world moved since approval, the write is rejected and a fresh approval is required.
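The checks above leave hash(payload) and markConsumed undefined, so here is one way they could look. This is a sketch under stated assumptions, not the server's implementation: SHA-256 over the raw UTF-8 bytes, an AtomicBoolean for the single-use flag, and the ApprovalRecord field names are all illustrative choices. Status, operation type, actor/session binding, and precondition re-checks are left out to keep it short.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.time.Duration;
import java.time.Instant;
import java.util.concurrent.atomic.AtomicBoolean;

public class ApprovalChecks {
    static final Duration TTL = Duration.ofMinutes(15);

    /** Hypothetical record shape; field names are illustrative. */
    public static final class ApprovalRecord {
        final String toolName;
        final String payloadHash;
        final Instant approvedAt;
        final AtomicBoolean consumed = new AtomicBoolean(false);

        public ApprovalRecord(String toolName, String approvedPayload, Instant approvedAt) {
            this.toolName = toolName;
            this.payloadHash = hash(approvedPayload); // pin the exact bytes the human saw
            this.approvedAt = approvedAt;
        }
    }

    /** Assumed hash: lowercase hex SHA-256 of the payload's UTF-8 bytes. */
    public static String hash(String payload) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            byte[] digest = md.digest(payload.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) hex.append(String.format("%02x", b));
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e); // SHA-256 is mandatory on the Java platform
        }
    }

    /** Returns null if the call may proceed, otherwise the rejection reason. */
    public static String check(ApprovalRecord r, String toolName, String payload, Instant now) {
        if (now.isAfter(r.approvedAt.plus(TTL))) return "expired";
        if (!r.payloadHash.equals(hash(payload))) return "hash mismatch";
        if (!r.toolName.equals(toolName)) return "binding";
        // compareAndSet flips false -> true for exactly one caller, even under races.
        if (!r.consumed.compareAndSet(false, true)) return "already used";
        return null;
    }
}
```

Because the hash covers raw bytes, two JSON documents that differ only in whitespace or key order would not match; that strictness is consistent with pinning exactly what was approved, but a system that re-serializes payloads would need a canonical form first.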
Why split it this way
The model's job is to be useful; the boundary's job is to be paranoid. Keeping those in different processes — Python agent on one side, Java approval executor on the other — means the two concerns can't blur. A prompt injection that convinces the agent to “just run the refund” still hits a wall: it can request approval, but a human never clicked, so nothing executes. A hallucinated tool call produces an approval ID that doesn't validate. A replayed request is already consumed.
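Those three failure cases can be traced through a toy version of the executor. This is a deliberately reduced sketch: the State enum, the in-memory map, and the rejection strings are hypothetical, and the hash, binding, and expiry checks are omitted so that only the approval lifecycle is visible.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class BoundaryScenarios {
    public enum State { PENDING, APPROVED, CONSUMED }

    // Stand-in for the approval store on the executor side.
    private final Map<String, State> approvals = new ConcurrentHashMap<>();

    public void put(String approvalId, State state) {
        approvals.put(approvalId, state);
    }

    /** Returns "executed" or the reason the write was refused. */
    public String execute(String approvalId) {
        State s = approvals.get(approvalId);
        if (s == null) return "unknown approval";      // hallucinated or forged id
        if (s == State.PENDING) return "not approved"; // requested, but no human clicked
        // Atomic APPROVED -> CONSUMED transition; a replay finds CONSUMED and fails here.
        if (!approvals.replace(approvalId, State.APPROVED, State.CONSUMED)) return "already used";
        return "executed";
    }
}
```

An injected "just run the refund" ends at "not approved", an invented ID ends at "unknown approval", and a second call with a spent ID ends at "already used"; only an approval in the APPROVED state executes, and only once.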
“Be careful” is a vibe. A signed, hashed, expiring, single-use, actor-bound approval is a control — and it's the difference between a demo and something you'd let touch production.