Glossary
Quantitative Metrics
Numerical measurements used to track model performance across dimensions like accuracy, refusal rate, response length, or user satisfaction.
Quantitative metrics give behavior teams a shared, objective language for discussing model performance: accuracy rate on factual questions, percentage of appropriate refusals, average response length, or thumbs-up rate in the product. They’re essential for tracking improvement over time, comparing model versions, and making data-driven decisions. The risk is overreliance. Metrics are always proxies for what you actually care about, and under Goodhart’s Law a metric that becomes an optimization target can stop tracking its underlying goal. For behavior architects, the discipline is selecting metrics that genuinely reflect user experience, tracking multiple metrics simultaneously to catch tradeoffs, and pairing quantitative monitoring with qualitative review to understand what the numbers actually mean.
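As a rough illustration of tracking several metrics side by side, the sketch below computes the four example metrics from a batch of logged interactions and reports how each one moved between two model versions. The `Interaction` record, its field names, and the reading of "appropriate refusals" as the share of refusals that were warranted are all assumptions made for this example, not an established schema.

```python
from dataclasses import dataclass


@dataclass
class Interaction:
    # Hypothetical log record; every field name here is an illustrative assumption.
    correct: bool         # was the factual answer right?
    refused: bool         # did the model refuse the request?
    should_refuse: bool   # would a refusal have been appropriate?
    response_tokens: int  # response length in tokens
    thumbs_up: bool       # did the user give a thumbs-up?


def summarize(logs):
    """Compute the example metrics for one model version's logs."""
    n = len(logs)
    refusals = [r for r in logs if r.refused]
    # Assumed definition: share of refusals that were actually warranted.
    appropriate = (
        sum(r.should_refuse for r in refusals) / len(refusals) if refusals else 0.0
    )
    return {
        "accuracy": sum(r.correct for r in logs) / n,
        "appropriate_refusal_rate": appropriate,
        "avg_response_tokens": sum(r.response_tokens for r in logs) / n,
        "thumbs_up_rate": sum(r.thumbs_up for r in logs) / n,
    }


def compare(old, new):
    """Return the per-metric change (new minus old) between two summaries."""
    return {name: new[name] - old[name] for name in old}
```

Reading the deltas together is the point: a version whose thumbs-up rate rises while accuracy falls shows up as mixed signs, which is a cue for qualitative review, not a verdict in itself.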