Glossary
Failure Taxonomy
A structured classification of the different ways a model can fail, enabling systematic tracking and prioritization.
A failure taxonomy is the vocabulary your team uses to talk about how the model goes wrong: a consistent set of categories such as “hallucination,” “over-refusal,” “format drift,” “policy violation,” or “sycophancy” that can be applied to individual failures across log review, red-teaming, and evaluation. A shared taxonomy turns incident review from informal storytelling (“the model said something weird”) into structured analysis (“we’re seeing a 15% increase in over-refusal on medical topics”). It also enables prioritization: once failures are categorized, you can measure frequency and severity by type and direct resources toward the problems with the greatest impact. For behavior architects, building and maintaining a failure taxonomy is one of the most valuable organizational contributions they can make.
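As a rough illustration of measuring frequency and severity by type, the sketch below tags each logged failure with a taxonomy category and ranks the categories. Everything in it is an assumption made for the example: the `FailureType` enum reuses the category names mentioned above, while the `Failure` record, the 1 to 5 severity scale, the `prioritize` helper, and the sample records are hypothetical and not drawn from any real dataset or tool.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class FailureType(Enum):
    # Categories taken from the examples in the entry above
    HALLUCINATION = "hallucination"
    OVER_REFUSAL = "over-refusal"
    FORMAT_DRIFT = "format drift"
    POLICY_VIOLATION = "policy violation"
    SYCOPHANCY = "sycophancy"


@dataclass(frozen=True)
class Failure:
    failure_type: FailureType
    severity: int  # hypothetical scale: 1 (minor) to 5 (critical)
    topic: str


def prioritize(failures):
    """Return (type, count, total_severity) rows, highest total severity first."""
    counts = Counter(f.failure_type for f in failures)
    severity = Counter()
    for f in failures:
        severity[f.failure_type] += f.severity
    rows = [(t, counts[t], severity[t]) for t in counts]
    # Rank by total severity, breaking ties by frequency
    return sorted(rows, key=lambda r: (r[2], r[1]), reverse=True)


# Made-up records, for illustration only
sample = [
    Failure(FailureType.OVER_REFUSAL, 2, "medical"),
    Failure(FailureType.OVER_REFUSAL, 2, "medical"),
    Failure(FailureType.HALLUCINATION, 4, "legal"),
    Failure(FailureType.FORMAT_DRIFT, 1, "coding"),
    Failure(FailureType.OVER_REFUSAL, 3, "medical"),
]

for failure_type, count, total in prioritize(sample):
    print(f"{failure_type.value}: count={count}, total_severity={total}")
```

Summing severity is only one possible ranking rule. A team might instead weight by user impact or track the rate of change per category, depending on what it wants to prioritize.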