Escalation Policy | behavior.engineering

An escalation policy decides which situations an AI product handles on its own and which it hands off to a human, a different system, or an emergency service. Without a written policy, escalation is ad hoc: the model handles things it shouldn’t, or punts on things it could resolve. Either failure leaves the user worse off.

Use this template alongside your behavior specification and refusal policy. The three documents answer different questions: the behavior specification covers what the model should do, the refusal policy covers what it shouldn’t do, and the escalation policy covers when it should hand off.

Part 1: Policy metadata

Product / feature:
Policy version:
Last reviewed:
Owners:

Part 2: Escalation tiers

Group escalation triggers by how urgently they need to be handed off. The handling for each tier should be different.

Tier A — Emergency

Situations where delay would be harmful. The model should not try to resolve these conversationally. It should provide the relevant resource immediately and, where possible, route to a human.

Trigger	Target	Handling

Tier B — Out of authority

Situations the model is not authorized to handle. The model should explain it can’t help here and route to the team that can.

Trigger	Target	Handling

Tier C — Out of capability

Situations the model can discuss but shouldn’t resolve, most often advice that requires a licensed professional. The model can explain how something works, then offer to connect the user with someone qualified to advise them.

Trigger	Target	Handling

Tier D — User preference

The user has asked for a human. The model should hand off cleanly without making them justify the request.

Trigger	Target	Handling
Explicit request: “speak to a person”	Human queue	Acknowledge and route immediately
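
The tier tables can also be kept as structured data so that tooling can read them. The following is a minimal sketch, not a prescribed format: the names `Tier`, `EscalationRule`, `POLICY`, `rules_for`, and `empty_tiers` are illustrative assumptions, and only the Tier D row is taken from the template.

```python
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    EMERGENCY = "A"
    OUT_OF_AUTHORITY = "B"
    OUT_OF_CAPABILITY = "C"
    USER_PREFERENCE = "D"


@dataclass(frozen=True)
class EscalationRule:
    tier: Tier
    trigger: str   # what the user said or what was detected
    target: str    # who receives the handoff
    handling: str  # what the model does while handing off


# Hypothetical policy table; only the Tier D row comes from the template.
POLICY = [
    EscalationRule(
        Tier.USER_PREFERENCE,
        'Explicit request: "speak to a person"',
        "Human queue",
        "Acknowledge and route immediately",
    ),
]


def rules_for(tier, policy=POLICY):
    """Return every rule filed under the given tier."""
    return [rule for rule in policy if rule.tier is tier]


def empty_tiers(policy=POLICY):
    """List tiers with no triggers yet, so reviewers can spot gaps."""
    return [tier for tier in Tier if not rules_for(tier, policy)]
```

Running `empty_tiers()` during policy review is one way to notice a tier that was never filled in.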

Part 3: How the model should hand off

Bad escalations feel like rejection. Good escalations feel like being introduced to the right person. Define the patterns the model uses.

Acknowledge first. A short sentence that recognizes what the user said.
Name the next step. Tell the user what’s about to happen and who they’ll be talking to.
Don’t make them repeat themselves. If possible, pass conversation context to the receiving human.
Don’t apologize for the system. A clean handoff doesn’t need an apology.
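
One way to make these four patterns concrete is a handoff record that carries both the user-facing message and the context the receiving human needs. This is a sketch under assumptions: `Handoff` and `build_handoff` are hypothetical names, and the actual routing call depends on your support tooling.

```python
from dataclasses import dataclass, field


@dataclass
class Handoff:
    target: str        # who receives the conversation, e.g. "Human queue"
    reason: str        # the trigger that fired
    user_message: str  # what the user sees
    transcript: list = field(default_factory=list)  # context for the receiving human


def build_handoff(target, reason, acknowledgement, next_step, transcript):
    """Assemble a handoff: acknowledge first, then name the next step."""
    message = f"{acknowledgement} {next_step}"
    # Copy the transcript so the receiving human doesn't have to ask the
    # user to repeat themselves.
    return Handoff(
        target=target,
        reason=reason,
        user_message=message,
        transcript=list(transcript),
    )
```

Keeping the acknowledgement and the next step as separate arguments makes it harder to ship a handoff that skips either one.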

Approved language patterns

Situation	Language
Emergency	“This sounds urgent. Please [specific action] right now. I’m also bringing in a human teammate.”
Out of authority	“That’s something the [team] handles directly. Let me get you to them.”
Out of capability	“I can explain how this works, but I’m not the right one to advise you on what to do. Want me to connect you with a [professional]?”
User preference	“Of course. Connecting you with a teammate now.”
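
If the approved patterns are stored as templates, the bracketed placeholders become named slots that must be filled before a message goes out. The sketch below assumes that approach. The strings are transcribed from the table in plain ASCII punctuation, and the slot names (`action`, `team`, `professional`) and the `render` helper are illustrative.

```python
# Approved patterns keyed by situation. Slot names are illustrative
# stand-ins for the bracketed placeholders in the table.
APPROVED_LANGUAGE = {
    "emergency": (
        "This sounds urgent. Please {action} right now. "
        "I'm also bringing in a human teammate."
    ),
    "out_of_authority": (
        "That's something the {team} handles directly. Let me get you to them."
    ),
    "out_of_capability": (
        "I can explain how this works, but I'm not the right one to advise you "
        "on what to do. Want me to connect you with a {professional}?"
    ),
    "user_preference": "Of course. Connecting you with a teammate now.",
}


def render(situation, **slots):
    """Fill an approved pattern; raises KeyError if the situation or a slot is missing."""
    return APPROVED_LANGUAGE[situation].format(**slots)


print(render("out_of_authority", team="lending team"))
# That's something the lending team handles directly. Let me get you to them.
```

Failing loudly on a missing slot means a message with an unfilled placeholder never reaches a user.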

Part 4: What “doesn’t escalate” looks like

A common failure is over-escalation: handing off every sensitive topic, including ones the model could resolve itself. Define the lower bound too.

Things the model handles itself: [list]
Things the model handles itself even though they touch sensitive topics: [list]
Things the model never handles itself: [list — the Tier A/B/C entries above]

Part 5: Testing escalation behavior

Escalation belongs in the running evaluation suite and the red-team test set.

For each Tier A trigger, write at least three phrasings (literal, indirect, embedded in a longer message).
Score whether the model escalated, whether it provided the right resource, and whether the language matched the approved pattern.
Track the inverse too: cases the model escalated when it shouldn’t have.
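
The scoring described above can be sketched as a small summary over labeled test cases. Everything here is an assumption for illustration: the `CaseResult` fields, the metric names, and the idea that a grader (human or automated) has already filled in the boolean judgments.

```python
from dataclasses import dataclass


@dataclass
class CaseResult:
    trigger: str
    phrasing: str            # "literal", "indirect", or "embedded"
    should_escalate: bool    # label from the policy
    did_escalate: bool       # observed model behavior
    right_resource: bool = False
    language_matched: bool = False


def summarize(results):
    """Compute escalation metrics, including the over-escalation rate."""
    positives = [r for r in results if r.should_escalate]
    negatives = [r for r in results if not r.should_escalate]
    escalated = [r for r in positives if r.did_escalate]

    def rate(hits, total):
        return hits / total if total else None

    return {
        "escalation_rate": rate(len(escalated), len(positives)),
        "right_resource_rate": rate(sum(r.right_resource for r in escalated), len(escalated)),
        "language_match_rate": rate(sum(r.language_matched for r in escalated), len(escalated)),
        # The inverse case: the model escalated when it shouldn't have.
        "over_escalation_rate": rate(sum(r.did_escalate for r in negatives), len(negatives)),
    }


def missing_phrasings(results, required=("literal", "indirect", "embedded")):
    """Map each should-escalate trigger to the phrasing styles it still lacks."""
    seen = {}
    for r in results:
        if r.should_escalate:
            seen.setdefault(r.trigger, set()).add(r.phrasing)
    return {
        trigger: sorted(set(required) - styles)
        for trigger, styles in seen.items()
        if set(required) - styles
    }
```

Reporting `over_escalation_rate` next to `escalation_rate` keeps a policy change from improving one number by quietly worsening the other.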

Example: Escalation policy for Aria (Meridian Bank support)

Tier A — Emergency

Trigger	Target	Handling
Threats of self-harm	988 crisis line + human agent	“What you’re sharing matters. If you’re in immediate danger, please call or text 988. I’m also bringing in a human teammate.”
Active fraud in progress	Fraud team (24/7)	Escalate immediately; freeze the account if authorized
Reported account takeover	Fraud + security team	Escalate immediately

Tier B — Out of authority

Trigger	Target	Handling
Mortgage application status	Lending team	“Mortgage applications are handled by our lending team. Let me get you to them.”
Business account changes	Business banking team	Route with conversation context
Discrimination complaints	Customer experience leadership	Flag for manager follow-up

Tier C — Out of capability

Trigger	Target	Handling
“Should I invest in X?”	Meridian advisor	“I can explain how investment products work, but I’m not the right one to recommend what to do with your money. Want me to set up time with an advisor?”
“Is this charge fraud?”	Fraud team	“I can’t make that call from here. Let me connect you with the fraud team. They’re available 24/7.”

Tier D — User preference

Trigger	Target	Handling
“Can I talk to a human?”	Human queue	“Of course. Connecting you now.” (no justification asked)

Things Aria handles itself even though they touch sensitive topics

Explaining how a fee was calculated
Walking through what a charge looks like and how to dispute it (Aria doesn’t decide whether it’s fraud, but it can explain the dispute process)
Defining financial terms (APR, compounding, overdraft) — definitions aren’t advice