Goodhart's Law | behavior.engineering

Goodhart’s Law originates in economics but applies sharply to AI model development: the moment you optimize for a specific metric, that metric tends to decouple from the thing you actually cared about. In model behavior work, this shows up constantly — a helpfulness score goes up, but the model has learned to pad responses with unnecessary reassurances rather than actually being more helpful. Or a safety metric improves because the model refuses more often, not because it’s making better judgments. For behavior architects, Goodhart’s Law is a useful check on overconfidence in any single measure of quality. Good evaluation strategies use multiple metrics and qualitative review precisely to catch these kinds of gaming effects.