Glossary
Regression Testing
Running a consistent set of test cases after a change to verify that previously working behavior hasn't broken.
Regression testing is borrowed from software engineering: after any change — a new model version, an updated system prompt, a modified finetuning dataset — you run a fixed battery of tests to make sure nothing that was working before has broken. In model behavior work, regressions are especially insidious because they can be subtle: the model might improve on one dimension while quietly degrading on another. A good regression suite covers a range of behavioral scenarios and flags any statistically meaningful decline. For behavior architects, regression testing is non-negotiable infrastructure — without it, you can’t know whether a change was genuinely an improvement or just a shuffle of which things work and which don’t.