Glossary
Failure Mode
A specific, recurring pattern in which a model produces incorrect, harmful, or otherwise unacceptable outputs.
A failure mode is a categorized way the model can go wrong: not a one-off mistake, but a recognizable pattern that recurs across multiple inputs or contexts. Common failure modes include hallucination (confidently stating false information), over-refusal (declining benign requests), sycophancy (agreeing with incorrect user assertions), and format drift (abandoning required output structure mid-conversation). Naming and categorizing failure modes is a behavior architecture practice: it turns individual incidents into patterns that can be understood, measured, and addressed systematically. For behavior architects, maintaining a failure mode taxonomy helps teams prioritize improvement efforts by how often each type of failure occurs, how severe it is, and how feasible it is to fix.
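As a rough illustration, a failure mode taxonomy could be kept as a small data structure and ranked for prioritization. The sketch below is only one possible shape: the `FailureMode` class, its field names, the 1-to-5 severity scale, the multiplicative `priority` score, and every number in `TAXONOMY` are hypothetical placeholders chosen for this example, not measurements or a prescribed method.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FailureMode:
    """One entry in a (hypothetical) failure mode taxonomy."""
    name: str
    description: str
    frequency: float        # assumed: share of sampled outputs showing the pattern, 0..1
    severity: int           # assumed scale: 1 (cosmetic) to 5 (harmful)
    fix_feasibility: float  # assumed scale: 0 (no known fix) to 1 (easy to fix)

    def priority(self) -> float:
        # Assumed scoring rule: frequent, severe, fixable failures rank highest.
        return self.frequency * self.severity * self.fix_feasibility


def rank(modes):
    """Return a new list of failure modes, highest priority first."""
    return sorted(modes, key=lambda m: m.priority(), reverse=True)


# Placeholder values for illustration only; real numbers would come from evaluation data.
TAXONOMY = [
    FailureMode("hallucination", "confidently stating false information", 0.08, 5, 0.4),
    FailureMode("over-refusal", "declining benign requests", 0.05, 3, 0.8),
    FailureMode("sycophancy", "agreeing with incorrect user assertions", 0.03, 4, 0.5),
    FailureMode("format drift", "abandoning required output structure mid-conversation", 0.10, 2, 0.9),
]

if __name__ == "__main__":
    for mode in rank(TAXONOMY):
        print(mode.name)
```

A multiplicative score is used here only because it is simple to read; a team might reasonably weight severity more heavily or treat feasibility as a separate filter.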