Glossary
AI Safety
The field concerned with ensuring that AI systems behave in ways that are safe, controllable, and aligned with human values — especially as they become more capable.
AI safety is a research and engineering discipline focused on the risks that arise from advanced AI systems, ranging from near-term problems such as harmful outputs and misuse to longer-term concerns about AI systems that pursue goals humans didn’t intend. In practical product work, AI safety shows up in content moderation, refusal calibration, red-teaming, and the development of behavioral specifications that constrain harmful behavior. For behavior architects, AI safety is the broader field in which their work sits: much of what they do daily (evaluating for harm, writing policy, designing refusal behavior) is applied AI safety, even if it’s not always labeled that way. Staying connected to the broader safety research community helps practitioners anticipate problems before they become urgent.