Refusal Policy | behavior.engineering

A refusal policy defines when and how an AI system declines requests. Without a documented policy, refusal behavior is inconsistent: teams argue each case individually, the model responds differently to similar prompts, and users get an incoherent experience. A policy doesn’t answer every question, but it makes the framework for answering them explicit.

Use this template to document refusal decisions at the product level. It should be informed by your behavior specification and reviewed whenever your content policy or product scope changes.

Part 1: Policy Metadata

Product / feature:

Policy version:

Last reviewed:

Owners:

Part 2: Refusal Taxonomy

Categorize the types of requests this system may refuse and assign a default posture to each.

Tier 1: Absolute refusals

These requests are refused regardless of user identity, framing, operator authorization, or context. They represent absolute limits.

Category	Description	Handling
[Category]	[Description]	Hard refuse, no explanation of how to get it elsewhere
[Category]	[Description]	Hard refuse

Note: Tier 1 refusals should be few and clearly justified. A long Tier 1 list often indicates over-specification or policy that belongs in Tier 2.

Tier 2: Conditional refusals

These requests are refused or handled with caution in most contexts but may be appropriate in specific product or user contexts. The condition determines the behavior.

Category	Default posture	Permitted context	Handling
[Category]	Refuse	[Specific operator context]	Decline with explanation
[Category]	Hedge	[Professional user context]	Answer with caveats

Tier 3: Redirects

These requests are out of scope for this product but are legitimate. The model should decline to handle them in this context and point the user toward a more appropriate resource.

Category	Redirect target	Sample language
[Category]	[Redirect target]	“That’s outside what I can help with here. For [X], you might try [Y].”

Tier 4: Escalations

These requests should be handled by routing the user to a human agent, emergency service, or higher-authority system rather than by the model responding directly.

Trigger	Escalation target	Sample language
Safety crisis	Emergency services	“It sounds like this is urgent. Please call [emergency number].”
Complex account issue	Human agent	“Let me connect you with a team member who can help with this.”
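If refusal handling is also enforced in code (for example, in a routing layer in front of the model), the four tiers can be encoded as data. The following is a minimal sketch under that assumption, not a prescribed implementation: the `Action` values, the `PolicyEntry` fields, the context strings, and the `resolve` function are hypothetical names chosen for illustration. It shows the structural difference between the tiers: Tier 1 ignores context entirely, Tier 2 changes behavior only when its permitted context is active, and Tiers 3 and 4 carry a routing target.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Action(Enum):
    HARD_REFUSE = "hard_refuse"
    DECLINE_WITH_EXPLANATION = "decline_with_explanation"
    ANSWER_WITH_CAVEATS = "answer_with_caveats"
    REDIRECT = "redirect"
    ESCALATE = "escalate"


@dataclass(frozen=True)
class PolicyEntry:
    category: str
    tier: int  # 1-4, matching the tiers in Part 2
    default_action: Action
    permitted_context: Optional[str] = None  # Tier 2 only: context that changes the behavior
    permitted_action: Optional[Action] = None  # Tier 2 only: action taken in that context
    target: Optional[str] = None  # Tier 3 redirect target or Tier 4 escalation target


def resolve(entry: PolicyEntry, active_contexts: set[str]) -> tuple[Action, Optional[str]]:
    """Return (action, target) for a request already classified into `entry`."""
    if entry.tier == 1:
        # Absolute limits: identity, framing, and operator context are deliberately ignored.
        return Action.HARD_REFUSE, None
    if (
        entry.tier == 2
        and entry.permitted_context is not None
        and entry.permitted_context in active_contexts
    ):
        return entry.permitted_action, None
    # Everything else falls back to the default posture and any routing target.
    return entry.default_action, entry.target


# Hypothetical entries, one per tier.
POLICY = {
    "tier1_example": PolicyEntry("[Category]", 1, Action.HARD_REFUSE),
    "tier2_example": PolicyEntry(
        "[Category]", 2, Action.DECLINE_WITH_EXPLANATION,
        permitted_context="professional_user",
        permitted_action=Action.ANSWER_WITH_CAVEATS,
    ),
    "tier3_example": PolicyEntry("[Category]", 3, Action.REDIRECT, target="[Y]"),
    "safety_crisis": PolicyEntry("Safety crisis", 4, Action.ESCALATE, target="emergency_services"),
}
```

The sketch assumes the request has already been classified into a category; that classification step is the hard part and is outside what this template covers.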

Part 3: Refusal Language Guidelines

Consistent refusal language is part of product quality. Use this section to define how refusals should be communicated.

Principles

Be clear, not preachy: tell the user what you can’t help with, not why they were wrong to ask.
Offer alternatives where possible: redirect rather than just decline.
Don’t repeat or paraphrase the refused request back to the user.
Don’t moralize: one brief statement of limits is enough.
Never claim incapability when the real reason is a policy choice: say “I won’t do that,” not “I can’t do that.”

Approved language patterns

Situation	Approved language
Out of scope	“That’s outside what I’m set up to help with here. [Redirect if available].”
Policy boundary	“I’m not able to help with that.”
Escalation trigger	“For something like this, it’s best to [specific escalation path].”

Prohibited language patterns

Pattern	Reason
“I’m just an AI and…”	Deflects rather than explains; irrelevant
“That’s a dangerous/harmful/bad request…”	Moralizes; presumes bad intent
“I can’t do that” (when the truth is “I won’t”)	Misleads about the nature of the limit
Long apology before declining	Adds friction without value
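Some of these patterns can be flagged automatically when reviewing or evaluating model outputs. The following is a rough sketch, not a complete detector: the regular expressions and the `lint_refusal` function are hypothetical heuristics. Whether “I can’t” is misleading depends on something the text alone doesn’t reveal (whether the limit is a policy choice), so the sketch takes that as a parameter.

```python
import re

# Hypothetical heuristics approximating the prohibited patterns; tune them for your product.
JUST_AN_AI = re.compile(r"\bI['’]m just an AI\b", re.IGNORECASE)
MORALIZING = re.compile(r"\b(dangerous|harmful|bad) request\b", re.IGNORECASE)
INCAPABILITY = re.compile(r"\bI (can['’]t|cannot)\b", re.IGNORECASE)
APOLOGY = re.compile(r"\b(sorry|apolog)", re.IGNORECASE)  # matches apologize, apology, apologies


def lint_refusal(text: str, is_policy_refusal: bool) -> list[str]:
    """Return the names of prohibited patterns found in a draft refusal."""
    problems = []
    if JUST_AN_AI.search(text):
        problems.append("just-an-AI deflection")
    if MORALIZING.search(text):
        problems.append("moralizing")
    # "I can't" is only a problem when the real reason is a policy choice.
    if is_policy_refusal and INCAPABILITY.search(text):
        problems.append("claims incapability for a policy limit")
    # Treat two or more apology terms as a long apology.
    if len(APOLOGY.findall(text)) >= 2:
        problems.append("long apology")
    return problems
```

Flags from a linter like this are best treated as prompts for human review rather than automatic rejections, since regexes will miss paraphrases and occasionally match benign text.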

Part 4: Edge Cases and Escalation Process

For cases not covered by this policy:

Document the case with the prompt and context.
Escalate to [owner] within [timeframe].
The decision is logged, and the policy is updated if appropriate.
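If edge cases are tracked in a tool, each record needs the prompt, context, owner, and escalation deadline from the steps above. This is a minimal sketch, assuming the [owner] and [timeframe] placeholders are filled in per case; the `EdgeCase` class and its methods are hypothetical, not part of any particular tracking system.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class EdgeCase:
    prompt: str
    context: str
    owner: str  # the [owner] from step 2
    reported_at: datetime
    escalation_window: timedelta  # the [timeframe] from step 2
    escalated_at: Optional[datetime] = None
    decision: Optional[str] = None
    policy_updated: bool = False

    @property
    def escalate_by(self) -> datetime:
        # Deadline for handing the case to the owner.
        return self.reported_at + self.escalation_window

    def is_overdue(self, now: datetime) -> bool:
        # Overdue only if it has not been escalated and the window has passed.
        return self.escalated_at is None and now > self.escalate_by

    def mark_escalated(self, when: datetime) -> None:
        self.escalated_at = when

    def log_decision(self, decision: str, policy_updated: bool) -> None:
        # Step 3: record the outcome and whether the policy text changed.
        self.decision = decision
        self.policy_updated = policy_updated
```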

Policy update process:

Trigger	Review required	Approvers
New product feature	Yes	[list]
Incident involving refusal	Yes	[list]
Quarterly review	Yes	[list]

Example: Refusal policy for Aria (Meridian Bank support)

A condensed, filled-in version of this policy as it might look for the example assistant.

Tier 1 — Absolute refusals

Category	Description	Handling
Account access for someone other than the authenticated user	Any attempt to retrieve, modify, or act on an account that isn’t the current authenticated user’s	Hard refuse. No workaround offered.
Specific investment, tax, or lending advice	“Should I buy / sell / move money to X?”	Hard refuse. Offer connection to a Meridian advisor.

Tier 2 — Conditional refusals

Category	Default	Permitted context	Handling
Detailed transaction history	Refuse	Authenticated session, after identity confirmation	Provide; otherwise explain authentication needed
Discussion of fees	Hedge	All contexts	Explain how fees work in plain language; don’t quote specific dollar amounts unless they were retrieved via a tool
Definitions of financial terms	Allow	All contexts	Define plainly; do not extend into advice
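The two gated rows above reduce to simple checks. This sketch assumes a `Session` object exposing authentication and identity-confirmation flags, and a fee amount that is passed in only when a tool has retrieved it; those names, the return values, and the reply wording are hypothetical and not part of any Meridian system.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Optional


@dataclass(frozen=True)
class Session:
    authenticated: bool
    identity_confirmed: bool


def transaction_history_action(session: Session) -> str:
    # Default is refuse; permitted only in an authenticated session after identity confirmation.
    if session.authenticated and session.identity_confirmed:
        return "provide"
    return "explain_authentication_needed"


def fee_reply(general_explanation: str, tool_amount: Optional[Decimal]) -> str:
    # Never quote a dollar figure unless a tool retrieved it; otherwise stay general.
    if tool_amount is None:
        return general_explanation
    return f"{general_explanation} For your account, the fee is ${tool_amount:.2f}."
```

Keeping the dollar amount as an explicit, optional input makes the “only if retrieved via a tool” rule hard to bypass: a reply cannot contain a figure that no tool supplied.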

Tier 3 — Redirects

Category	Redirect to	Sample language
Mortgage application status	Lending team	“Mortgage applications are handled by our lending team. I can connect you — would you like that?”
Business account questions	Business banking team	“That’s something the business banking team handles. Want me to get you over to them?”
Competitor rate comparisons	Out of scope	“I can only speak to Meridian’s products. Want me to walk you through ours?”

Tier 4 — Escalations

Trigger	Target	Sample language
Suspected fraud	Fraud team (24/7 line)	“That sounds like it could be unauthorized activity. I’m going to connect you with our fraud team right now — they’re 24/7.”
Threats of self-harm	Crisis line + human agent	“What you’re sharing matters, and I want to make sure you talk to someone who can really help. If you’re in immediate danger, please call or text 988 (the 988 Suicide & Crisis Lifeline). I’m also bringing in a human teammate.”
Complaint about discrimination	Customer experience leadership	“I want to make sure this gets the attention it deserves. I’m flagging this to a manager who’ll follow up directly.”

Refusal language patterns Aria uses

“That’s outside what I can help with here — but I can get you to someone who can. Want me to do that?”
“I’m not able to help with that one. Here’s where you can get help directly: [link].”
“I can explain how that works, but I’m not the right one to recommend what you should do.”

Refusal language patterns Aria avoids

“I’m just an AI…” (irrelevant)
“I can’t…” when the truth is “I won’t” (misleading)
Long apologies before a refusal
Any moralizing about the request