The Role 90-Day Plan Templates Failure Modes Case Studies Glossary

Model Behavior

Failure Modes

How model behavior breaks — and what to do about it. Each entry covers the definition, why it matters, an example, how to detect it, sample evaluation prompts, and concrete mitigations.

Factuality

High

Citation fabrication

The model writes a citation — an author, a paper, a URL — that doesn't exist or doesn't say what the model claims it says.

Medium

False certainty

The model states uncertain, estimated, or unknown information with unwarranted confidence — giving no signal that the answer might be wrong.

High

Hallucination

The model produces a confident, fluent answer that's wrong or made up.

Safety

High

Failure to escalate

The model handles something on its own that should have been passed to a human, a different system, or a higher authority.

High

Under-refusal

The model says yes to something it should have said no to — a request that violates safety policy, falls outside scope, or could cause real harm.

Boundary

High

Over-refusal

The model declines a safe, legitimate request because it pattern-matches to a harmful category instead of reasoning about actual risk.

Judgment

Medium

Context neglect

The model ignores information already in the conversation — producing responses that contradict or overlook what the user has shared.

High

Sycophancy

The model changes its answer to match what it thinks the user wants to hear — abandoning accuracy or honesty in favor of approval.

Tone

Medium

Persona drift

Over a long conversation, the model gradually slides off its defined tone, role, or rules — turning into something different by the end.

Agentic

High

Tool misuse

In an agentic system, the model calls a tool wrong — wrong tool, wrong arguments, wrong order, or an action the user never asked for.

© 2026 behavior.engineering