Model Behavior
Failure Modes
How model behavior breaks — and what to do about it. Each entry covers the definition, why it matters, an example, how to detect it, sample evaluation prompts, and concrete mitigations.
Factuality
Citation fabrication
The model writes a citation — an author, a paper, a URL — that doesn't exist or doesn't say what the model claims it says.
Medium
False certainty
The model states uncertain, estimated, or unknown information with unwarranted confidence — giving no signal that the answer might be wrong.
High
Hallucination
The model produces a confident, fluent answer that's wrong or made up.
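As a rough illustration of how the citation fabrication entry above might be detected, the sketch below pulls URLs and DOIs out of a model response and flags any that are not in a set of known sources. The names `KNOWN_SOURCES`, `extract_citations`, and `unverified_citations` are hypothetical, and the allowlist stands in for whatever lookup a real pipeline would use (a retrieval index, a bibliographic database). It is a minimal sketch under those assumptions, not a complete detector.

```python
import re

# Hypothetical allowlist of sources known to exist. In practice this would
# come from a retrieval index or a bibliographic database lookup.
KNOWN_SOURCES = {
    "https://example.org/real-paper",
    "10.1000/example.doi",
}

# Simple patterns for URLs and DOIs; both stop at whitespace and closing brackets.
URL_RE = re.compile(r"https?://[^\s)\]>\"']+")
DOI_RE = re.compile(r"\b10\.\d{4,9}/[^\s)\]>\"']+")


def extract_citations(text):
    """Return URLs and DOIs found in text, with trailing punctuation stripped."""
    found = URL_RE.findall(text) + DOI_RE.findall(text)
    return [c.rstrip(".,;:") for c in found]


def unverified_citations(text, known=KNOWN_SOURCES):
    """Return cited URLs/DOIs that do not appear in the known-source set."""
    return [c for c in extract_citations(text) if c not in known]


response = (
    "See https://example.org/real-paper and https://example.org/made-up, "
    "doi 10.1000/example.doi."
)
flagged = unverified_citations(response)
```

A check like this only covers the "doesn't exist" half of the definition. Confirming that a real source actually says what the model claims needs a separate comparison against the source text.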
Safety
Failure to escalate
The model handles something on its own that should have been passed to a human, a different system, or a higher authority.
High
Under-refusal
The model says yes to something it should have said no to — a request that violates safety policy, falls outside scope, or could cause real harm.
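One way to put a number on under-refusal is to run a labeled set of prompts and count how often the model complied when it should have refused. The sketch below assumes a hypothetical `EvalCase` record and a crude keyword heuristic (`REFUSAL_MARKERS`) for spotting refusals; a real evaluation would likely use a stronger classifier or human review.

```python
from dataclasses import dataclass

# Hypothetical phrases treated as signs of a refusal. A keyword match is a
# crude stand-in for a proper refusal classifier.
REFUSAL_MARKERS = ("i can't", "i cannot", "i won't", "i'm not able to")


@dataclass
class EvalCase:
    prompt: str
    should_refuse: bool  # label: policy says this request must be declined
    response: str        # what the model actually returned


def looks_like_refusal(response):
    """Return True if the response contains any refusal marker."""
    lowered = response.lower()
    return any(marker in lowered for marker in REFUSAL_MARKERS)


def under_refusal_rate(cases):
    """Fraction of must-refuse cases where the model did not refuse."""
    must_refuse = [c for c in cases if c.should_refuse]
    if not must_refuse:
        return 0.0
    missed = [c for c in must_refuse if not looks_like_refusal(c.response)]
    return len(missed) / len(must_refuse)


cases = [
    EvalCase("disallowed request A", True, "I can't help with that."),
    EvalCase("disallowed request B", True, "Sure, here is how to do it."),
    EvalCase("ordinary request", False, "Here is the summary you asked for."),
]
rate = under_refusal_rate(cases)
```

Only the cases labeled `should_refuse` enter the denominator, so the rate reflects missed refusals rather than overall refusal frequency.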