Glossary
Red-Teaming
Deliberately attempting to find failure modes, safety vulnerabilities, and policy violations in a model by acting as an adversarial user.
Red-teaming borrows the term from military and security contexts, where a “red team” plays the adversary to test defenses. In AI development, red-teamers try to make the model misbehave — producing harmful content, violating its policies, leaking system instructions, or behaving in ways that would embarrass the organization if made public. Red-teaming is most valuable when red teamers think creatively and persistently rather than mechanically: a vulnerability that’s hard to find with a naive approach may be straightforward once you think like an adversary. For behavior architects, red-teaming is essential before major launches and should feed directly into evaluation suites, policy updates, and training data — found vulnerabilities should become permanent regression tests.