The discipline
of model
behavior.
Model Behavior Architecture is the practice of specifying, testing, governing, and improving how AI systems act in real-world products. This is the place to learn the role, use the tools, study examples, and build the artifacts that make model behavior reliable.
Why model behavior needs architecture
Most people experience an AI product through one thing: how it behaves. What it says, what it refuses, how it handles being wrong, what tone it takes. When that behavior is inconsistent or surprising, the product feels broken — even when the underlying model is capable.
Prompting is one piece of the answer, but it isn't the whole answer. Real products need a written description of how the model should behave, tests that show whether it does, clear lines around what it shouldn't do, and a way to keep watching once real people are using it. Model Behavior Architecture is the work of putting those pieces together on purpose, instead of leaving them to chance.
What model behavior architects work on
Behavior Specification
Write down what the model should and shouldn't do — turning product goals, user needs, and safety limits into clear, testable rules.
Prompt Architecture
Translate those rules into the system instructions, examples, and context that actually produce the behavior you want.
Evaluation
Test whether the model behaves the way you said it should — across normal use, edge cases, and the situations someone will try to break.
Safety and Governance
Turn policy into real behavior: when to refuse, when to escalate, what to log, and how to keep behavior in line as the product changes.
Failure Mode Analysis
Study how model behavior breaks — hallucination, over-refusal, sycophancy, tool misuse — and build the checks that catch each pattern.
Human-AI Interaction
Shape tone, uncertainty, clarification, and recovery — the moment-by-moment behavior the user actually feels.
The behavior stack
Model behavior isn't shaped by the prompt alone. Each of these layers contributes — and a model behavior architect works across all of them.
- 01
Product intent
What the product is for and the outcome it owes the user.
- 02
User context
What the user is trying to do, who they are, and what they already know.
- 03
Behavioral principles
The values and priorities the model should follow when things conflict.
- 04
System instructions
The prompt and examples that tell the model how to behave.
- 05
Tools and retrieval
The tools, data, and documents the model can reach for.
- 06
Safety boundaries
The lines the model shouldn't cross, and what it does instead.
- 07
Interface design
How the response is shown to the user — phrasing, formatting, controls, recovery.
- 08
Evaluation
How you check whether the model is actually behaving the way you said.
- 09
Monitoring and governance
How you keep watching once real users are involved, and how behavior gets changed on purpose.
Start here
New to model behavior?
What is a Model Behavior Architect?
Understand the role, what it owns, and how it differs from prompt engineering.
Want to learn the discipline?
The 90-Day Plan
A structured path from vocabulary to artifacts to evaluation practice.
Need working artifacts?
Templates and Frameworks
Behavior specs, evaluation rubrics, system prompt architectures, refusal policies, and more.
Building an AI product?
Case Studies
Behavior architecture applied to real product contexts — fintech support, education, and more.
Diagnosing a problem?
Failure Mode Library
How model behavior breaks — definitions, examples, how to detect each pattern, and how to fix it.
Learning the vocabulary?
Glossary
The key terms in model behavior architecture, defined for practitioners at every level.
A field guide for the discipline
- 6 practice areas
- 9 layers in the behavior stack
- 12 templates
- 10 failure modes
- 5 case studies
- 184+ glossary terms
- 90 day learning plan
Who this is for
Aspiring model behavior architects
Learn the discipline, build the core artifacts, and develop a portfolio of behavior work you can show.
AI product teams
Use specs, evaluations, and failure modes to make your product's behavior more consistent and easier to improve.
Responsible AI and safety teams
Turn policy into the actual responses, refusals, and escalations users see — and check that it holds.
UX and conversation designers
Shape tone, uncertainty, and recovery — the behavior the user feels at the moment of interaction.
How model behavior breaks
Full library →Factuality
Citation fabrication
The model writes a citation — an author, a paper, a URL — that doesn't exist or doesn't say what the model claims it says.
Safety
Failure to escalate
The model handles something on its own that should have been passed to a human, a different system, or a higher authority.
Factuality
Hallucination
The model produces a confident, fluent answer that's wrong or made up.
Boundary
Over-refusal
The model declines a safe, legitimate request because it pattern-matches to a harmful category instead of reasoning about actual risk.
Judgment
Sycophancy
The model changes its answer to match what it thinks the user wants to hear — abandoning accuracy or honesty in favor of approval.
Agentic
Tool misuse
In an agentic system, the model calls a tool wrong — wrong tool, wrong arguments, wrong order, or an action the user never asked for.
Core artifacts
All templates →specification
Model Behavior Specification
A structured template for defining what an AI system should and should not do — the foundational artifact of model behavior architecture.
evaluation
Evaluation Rubric
A template for scoring model responses against defined behavioral criteria — the core instrument for systematic AI behavior testing.
testing
Red-Team Test Set
A structured template for building adversarial test cases that probe the safety and robustness of AI behavior — covering jailbreaks, manipulation, edge cases, and policy boundary tests.
governance
Refusal Policy
A template for documenting what an AI system refuses, why it refuses, and how it communicates refusals — making the hardest behavioral decisions explicit and consistent.
Key terms
Full glossary →Model Behavior
The observable outputs, responses, and actions of an AI model as experienced by users and systems interacting with it.
Behavioral Specification
A written document or set of guidelines that defines how a model is expected to behave across different situations.
System Prompt
Instructions provided to a model at the start of a session, before any user input, that establish its role, behavior, and constraints.
Refusal Behavior
The patterns and decisions behind when and how a model declines to fulfill a request.
Failure Mode
A specific, recurring pattern in which a model produces incorrect, harmful, or otherwise unacceptable outputs.
Red-Teaming
Deliberately attempting to find failure modes, safety vulnerabilities, and policy violations in a model by acting as an adversarial user.