Glossary
Model Judgment
A model's ability to reason through ambiguous or novel situations and arrive at contextually appropriate decisions without explicit instructions for every case.
Model judgment refers to the capacity a model develops, through training and alignment, to handle situations its creators didn’t specifically anticipate and still make reasonable, principled decisions. A model with good judgment, when faced with an ambiguous request, considers the likely intent, weighs potential harms, and responds in a way that a thoughtful person would recognize as sensible. This is distinct from rule-following: a purely rule-following model can reliably handle only the situations its instructions explicitly cover, while a model with good judgment can generalize to new cases. For behavior architects, cultivating good judgment in a model is one of the most ambitious goals, because it requires not just specifying rules, but helping the model internalize the values behind them.