Glossary
Constitutional AI
An approach developed by Anthropic in which a model is trained to critique and revise its own outputs based on a written set of principles.
Constitutional AI treats model behavior as something that can be grounded in explicit, human-readable principles (a “constitution”) rather than inferred solely from individual human ratings. The model is trained to check whether a given response violates any of these principles and to revise the response if it does. This makes the training process more transparent and consistent: instead of hoping that thousands of individual rater decisions happen to encode the right values, you can write down what you believe and test whether the model actually follows it. For behavior architects, it’s a concrete example of how behavioral intent can be operationalized through documented principles rather than vibes.
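The critique-and-revise idea can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not Anthropic's implementation: `Principle`, `critique_and_revise`, `toy_critic`, and `toy_reviser` are hypothetical names, and the keyword-matching critic stands in for what would be a language-model call in a real system. As generally described, the revised responses would then serve as training data rather than being produced at answer time.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass(frozen=True)
class Principle:
    """One written rule from the (hypothetical) constitution."""
    name: str
    text: str


# Hypothetical signatures; in practice each would wrap a model call.
Critic = Callable[[str, Principle], Optional[str]]  # critique text, or None if no violation
Reviser = Callable[[str, Principle, str], str]      # returns a revised response


def critique_and_revise(
    response: str,
    principles: List[Principle],
    critic: Critic,
    reviser: Reviser,
    max_rounds: int = 3,
) -> str:
    """Check the response against every principle and revise on each violation.

    Stops when a full pass finds no violations or after max_rounds passes.
    """
    for _ in range(max_rounds):
        revised = False
        for principle in principles:
            critique = critic(response, principle)
            if critique is not None:
                response = reviser(response, principle, critique)
                revised = True
        if not revised:
            break
    return response


# Toy stand-ins (assumptions for illustration only).
def toy_critic(response: str, principle: Principle) -> Optional[str]:
    if principle.name == "no_insults" and "stupid" in response:
        return "The response insults the user."
    return None


def toy_reviser(response: str, principle: Principle, critique: str) -> str:
    return response.replace("stupid", "mistaken")


constitution = [Principle("no_insults", "Do not insult the user.")]
result = critique_and_revise(
    "That is a stupid idea.", constitution, toy_critic, toy_reviser
)
print(result)
```

Because the principles are plain data, they can be read, reviewed, and changed independently of the loop that applies them, which is the transparency benefit the entry describes.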