Glossary | behavior.engineering

A

A/B Testing: Comparing two versions of a model or prompt by splitting traffic between them and measuring outcomes on real users.
Related: Regression Testing , Evaluation Pipeline , Preference-Based Evaluation , Model Comparison , Experimental Design
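To make the "measuring outcomes" step concrete, here is a minimal sketch of one common way to compare two variants: a two-proportion z-test on success counts. The function name, the choice of test, and the idea of a binary "success" outcome (for example, a thumbs-up) are assumptions for illustration, not something this glossary prescribes.

```python
import math


def two_proportion_z_test(successes_a, n_a, successes_b, n_b):
    """Return (z, two-sided p-value) for the difference in success rates.

    Hypothetical helper: assumes each user produced a binary outcome
    and that the two groups were assigned at random.
    """
    p_a = successes_a / n_a
    p_b = successes_b / n_b
    # Pooled rate under the null hypothesis that both variants perform equally.
    pooled = (successes_a + successes_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        raise ValueError("no variation in outcomes; the test is undefined")
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


z, p = two_proportion_z_test(400, 1000, 450, 1000)
print(f"z = {z:.2f}, p = {p:.3f}")
```

With 400 of 1,000 successes for variant A and 450 of 1,000 for variant B, the difference is significant at the conventional 0.05 level; whether that threshold suits a given product decision is a separate judgment call.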
Ablation Study: An experiment that systematically removes or varies individual components of a system to understand each component's contribution to overall performance.
Related: Experimental Design , Hypothesis Testing , Empirical Research , Prompt Engineering , Evaluation Pipeline
Adversarial Examples: Inputs carefully designed to cause a model to fail or produce unintended outputs, often by exploiting specific vulnerabilities in its training or architecture.
Related: Adversarial Prompting , Edge Case Construction , Red-Teaming , Jailbreaking , Prompt Robustness
Adversarial Prompting: Crafting inputs specifically designed to cause a model to behave in unintended, harmful, or policy-violating ways.
Related: Prompt Injection , Jailbreaking , Red-Teaming , Prompt Robustness , Edge Case Testing
AI Ethics: The study and application of ethical principles — fairness, accountability, transparency, harm avoidance — to the design and deployment of AI systems.
Related: AI Safety , Responsible AI , Fairness , Value Alignment , Normative Ethics
AI Governance: The frameworks, policies, processes, and oversight mechanisms that guide how AI is developed, deployed, and monitored within an organization or across society.
Related: Responsible AI , AI Ethics , Content Policy , Usage Policy , Model Card
AI Observability: The ability to monitor, inspect, and understand what an AI system is doing in production — including inputs, outputs, errors, and behavioral patterns over time.
Related: Monitoring , Log Analysis , Production Data , Behavioral Audit , Data Pipeline
AI Safety: The field concerned with ensuring that AI systems behave in ways that are safe, controllable, and aligned with human values — especially as they become more capable.
Related: Value Alignment , Harm Avoidance , Red-Teaming , AI Ethics , HHH Framework
Alignment Science: The research field focused on developing methods, theories, and techniques for making AI systems reliably pursue intended goals and values.
Related: Value Alignment , AI Safety , RLHF (Reinforcement Learning from Human Feedback) , Constitutional AI , Responsible AI
Alignment Tax: The performance costs a model may incur when trained to be safer or more value-aligned, such as reduced capability or increased refusals.
Related: RLHF (Reinforcement Learning from Human Feedback) , Value Alignment , Harm Avoidance , Refusal Behavior , Helpfulness
Annotation: The process of adding labels, ratings, or structured information to data so a model can learn from it.
Related: Labeling , Human Feedback , Data Quality , Inter-Annotator Agreement , Annotation Platform
Annotation Platform: Software designed to help teams collect, manage, and quality-control labeled data from human raters.
Related: Annotation , Labeling , Data Quality , Inter-Annotator Agreement , Label Studio
API Integration: Connecting an AI model to a product or system through a programming interface, allowing it to send requests and receive model responses programmatically.
Related: Inference Infrastructure , Eval Platform , Data Pipeline , Model Registry , Latency Optimization
Applied Ethics: The application of ethical theory and principles to specific real-world domains and practical decisions.
Related: Normative Ethics , AI Ethics , Moral Philosophy , Ethical Judgment , Responsible AI
Applied Finetuning: The practice of finetuning models in a product or business context to improve behavior for a specific use case, distinct from academic or research finetuning.
Related: Finetuning , Supervised Finetuning , Data Quality , Annotation , Behavioral Specification
Automated Evaluation: Using software, scripts, or models to score outputs without requiring human review for each individual case.
Related: LLM-as-Judge , Eval Framework , Evaluation Pipeline , Semi-Automated Evaluation , Regression Testing

B

Batch Testing: Running a large set of prompts through a model simultaneously rather than one at a time, to evaluate behavior at scale.
Related: Evaluation Pipeline , Automated Evaluation , Regression Testing , Eval Platform , Experiment Tracking
Behavior Design: The practice of intentionally defining and shaping how an AI model acts across a range of situations, rather than leaving behavior to emerge by default.
Related: Behavioral Specification , Model Behavior , Steerability , Value Alignment
Behavioral Audit: A systematic review of a model's behavior across a defined set of scenarios to assess whether it meets expected standards.
Related: Evaluation Pipeline , Log Analysis , Failure Mode Analysis , Behavioral Specification , AI Governance
Behavioral Consistency: The degree to which a model produces similar outputs for similar inputs across different sessions, users, or contexts.
Related: Model Quality , Character Consistency , Behavioral Drift , Output Distribution , Regression Testing
Behavioral Drift: A gradual, unintended change in how a model behaves over time, often across model updates, prompt changes, or accumulating context.
Related: Behavioral Regression , Behavioral Consistency , Regression Testing , KL Divergence , Model Behavior
Behavioral Economics: A field that combines psychology and economics to study how real people make decisions, often in ways that deviate from purely rational models.
Related: Cognitive Bias , Moral Psychology , Decision-Making Frameworks , User Feedback , Human Evaluation
Behavioral Regression: When a model update or change causes behavior that was previously working correctly to degrade or break.
Related: Behavioral Drift , Regression Testing , Regression Identification , Behavioral Consistency , Evaluation Pipeline
Behavioral Specification: A written document or set of guidelines that defines how a model is expected to behave across different situations.
Related: Model Behavior Specification , Behavior Design , Content Policy , Model Behavior , Value Alignment , Policy Writing , The Role
Behavioral Taxonomy: A hierarchical classification system that organizes model behaviors into categories and subcategories for analysis, evaluation, and communication.
Related: Failure Taxonomy , Behavioral Specification , Coverage Matrix , Behavior Design , Model Behavior
Benchmark: A standardized test or dataset used to compare model performance across versions or across different models.
Related: Eval Suite , Evaluation Dataset , Automated Evaluation , Model Comparison , Capability Evaluation
Bias Detection: The process of identifying systematic patterns in model behavior that produce unfair or unequal outcomes across groups of people.
Related: Fairness , Responsible AI , AI Ethics , Evaluation Pipeline , Human Evaluation
Boundary Exploration: The systematic practice of probing the edges of a model's behavioral constraints to understand where and how its policies apply.
Related: Edge Case Construction , Edge Case Testing , Red-Teaming , Boundary Setting , Behavioral Specification
Boundary Setting: The practice of defining and communicating the limits of what a model will do — what it considers out of scope, inappropriate, or harmful.
Related: Content Policy , Refusal Behavior , Behavioral Specification , Harm Avoidance , Model Judgment
Braintrust: An evaluation and observability platform for AI products that focuses on running experiments, logging traces, and comparing prompt or model variants.
Related: Eval Platform , Evaluation Pipeline , LangSmith , Promptfoo , AI Observability

C

Calibration: The alignment between a model's expressed confidence and its actual accuracy — a well-calibrated model is appropriately uncertain when it might be wrong.
Related: Confidence , Honesty , Hallucination , Hedging , Model Quality
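One common way to put a number on calibration is expected calibration error (ECE): group predictions into confidence bins and compare each bin's average confidence with its accuracy. The sketch below assumes confidences are available as numbers in [0, 1] and uses ten equal-width bins; both are illustrative choices.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Weighted average gap between stated confidence and observed accuracy.

    confidences: floats in [0, 1]; correct: matching booleans.
    """
    n = len(confidences)
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        # A confidence of exactly 1.0 goes into the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))

    ece = 0.0
    for items in bins:
        if not items:
            continue
        avg_conf = sum(c for c, _ in items) / len(items)
        accuracy = sum(1 for _, ok in items if ok) / len(items)
        # Each bin contributes in proportion to how many predictions it holds.
        ece += (len(items) / n) * abs(avg_conf - accuracy)
    return ece


# A model that says "100% sure" but is right only half the time.
print(expected_calibration_error([1.0, 1.0], [True, False]))  # 0.5
```

A value near 0 means stated confidence tracks accuracy closely; larger values indicate over- or under-confidence.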
Capability Evaluation: Assessment of what a model is able to do — the range and level of tasks it can perform successfully under defined conditions.
Related: Benchmark , Model Comparison , Domain Evaluation , Evaluation Pipeline , Model Quality
Categorical Thinking: Reasoning by placing things into defined categories with clear rules, rather than reasoning about each case on its individual merits.
Related: Decision-Making Frameworks , Content Policy , Behavioral Specification , Model Judgment , Ethical Judgment
Chain-of-Thought Prompting: A prompting technique that encourages a model to work through a problem step by step before producing a final answer.
Related: Few-Shot Prompting , Reasoning Chain , Prompt Engineering , In-Context Learning , Task Decomposition
ChainForge: An open-source visual tool for testing and comparing LLM prompts and responses across models and configurations.
Related: Eval Platform , Promptfoo , Model Playground , Evaluation Pipeline , A/B Testing
Character Consistency: The degree to which a model maintains a stable persona, voice, and set of values across different conversations and contexts.
Related: Behavioral Consistency , Tone , Behavioral Specification , Model Behavior , Behavior Design
Chatbot Arena: A public platform developed by LMSYS where users compare responses from anonymous AI models side by side and vote for the better one, generating a crowd-sourced Elo leaderboard.
Related: Elo Ranking , Preference-Based Evaluation , Benchmark , Model Comparison , Human Evaluation
Cognitive Bias: Systematic patterns in human thinking that lead to errors in judgment, often unconsciously — including biases that can affect annotation, evaluation, and behavioral design.
Related: Moral Psychology , Behavioral Economics , Inter-Annotator Agreement , Human Evaluation , Sycophancy
Confabulation: A more precise term for hallucination, emphasizing that the model generates plausible-sounding but false information to fill gaps rather than intentionally lying.
Related: Hallucination , Calibration , Honesty , Output Quality , Failure Mode
Confidence: The degree of certainty a model expresses — or implies — when producing an output.
Related: Calibration , Hedging , Hallucination , Honesty , Output Quality
Consequentialism: A moral framework that judges actions by their outcomes — the right action is the one that produces the best overall results.
Related: Moral Philosophy , Normative Ethics , Deontological Ethics , Virtue Ethics , Harm Avoidance
Constitutional AI: An approach developed by Anthropic where a model is trained to critique and revise its own outputs based on a written set of principles.
Related: RLAIF (Reinforcement Learning from AI Feedback) , RLHF (Reinforcement Learning from Human Feedback) , Value Alignment , Harm Avoidance , Behavioral Specification
Content Policy: A documented set of rules governing what types of content a model will and will not produce.
Related: Usage Policy , Behavioral Specification , Harm Avoidance , Refusal Behavior , Sensitive Topics
Context Engineering: The broader practice of designing what information a model has access to at inference time — including instructions, memory, tools, and retrieved content.
Related: Prompt Engineering , System Prompt , Context Window , Context Strategy , In-Context Learning
Context Strategy: A deliberate plan for what information to include in a model's context window, how to structure it, and what to exclude given space and quality constraints.
Related: Context Engineering , Context Window , Prompt Engineering , Few-Shot Prompting , System Prompt
Context Window: The maximum amount of text — measured in tokens — that a model can read and reason over in a single interaction.
Related: Context Engineering , Context Strategy , Few-Shot Prompting , In-Context Learning , Prompt Engineering
Conversation Transcripts: Records of full multi-turn interactions between users and a model, used to analyze behavior in context.
Related: Production Data , Log Analysis , Qualitative Analysis , Failure Mode Analysis , Behavioral Audit
Cost Benchmarking: Measuring and comparing the financial cost of running a model across different providers, configurations, or usage patterns.
Related: Latency Benchmarking , Benchmark , Inference Infrastructure , Model Comparison , Model Quality
Coverage Matrix: A structured map that shows which behavioral scenarios, use cases, or risk categories are represented in an evaluation suite, and which are missing.
Related: Eval Suite , Evaluation Dataset , Behavioral Taxonomy , Edge Case Construction , Behavioral Specification
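As a rough illustration, a coverage matrix can be built by counting test cases per combination of two dimensions and listing the empty cells. The field names `category` and `risk` and the example values are hypothetical.

```python
from collections import Counter


def coverage_matrix(test_cases, categories, risk_levels):
    """Count test cases per (category, risk) cell and list the empty cells."""
    counts = Counter((case["category"], case["risk"]) for case in test_cases)
    matrix = {
        cat: {risk: counts.get((cat, risk), 0) for risk in risk_levels}
        for cat in categories
    }
    gaps = [
        (cat, risk)
        for cat in categories
        for risk in risk_levels
        if matrix[cat][risk] == 0
    ]
    return matrix, gaps


cases = [
    {"category": "medical", "risk": "high"},
    {"category": "medical", "risk": "low"},
    {"category": "legal", "risk": "low"},
]
matrix, gaps = coverage_matrix(cases, ["medical", "legal"], ["low", "high"])
print(gaps)  # [('legal', 'high')]
```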
Cross-Functional Collaboration: Working across organizational functions — engineering, design, policy, research, safety — to align on behavioral goals and coordinate the work needed to achieve them.
Related: Product Design Collaboration , Stakeholder Communication , Knowledge Sharing , Model Strategy , Alignment Science

D

Data Generation Pipelines: Automated systems for producing, filtering, and formatting training or evaluation data at scale.
Related: Synthetic Data , Data Quality , Annotation , Data Pipeline , Evaluation Dataset
Data Guideline Authorship: Writing clear, detailed instructions for annotators that define what good and bad model responses look like and how to evaluate them consistently.
Related: Annotation , Labeling , Data Quality , Inter-Annotator Agreement , Policy Writing
Data Pipeline: A series of automated steps that collect, process, transform, and deliver data from one system to another.
Related: Data Generation Pipelines , Data Quality , Production Data , AI Observability , Annotation
Data Quality: The degree to which training or evaluation data is accurate, consistent, relevant, and representative of the desired behavior.
Related: Annotation , Labeling , Ground Truth , Inter-Annotator Agreement , Data Generation Pipelines
Data Versioning: The practice of tracking changes to datasets over time so that training and evaluation runs are reproducible and changes can be traced.
Related: Experiment Tracking , Data Pipeline , Evaluation Dataset , Data Quality , Model Registry
Decision-Making Frameworks: Structured approaches for reasoning through complex choices, especially when values conflict or outcomes are uncertain.
Related: Normative Ethics , Ethical Judgment , Behavioral Economics , Moral Philosophy , Behavioral Specification
Deontological Ethics: A moral framework that judges actions as right or wrong based on rules and duties, regardless of their consequences.
Related: Moral Philosophy , Normative Ethics , Consequentialism , Virtue Ethics , Applied Ethics
Discourse Structure: The organization and coherence of language across multiple sentences or turns — how ideas connect, flow, and build on each other in extended text or conversation.
Related: Linguistic Analysis , Pragmatics , Conversation Transcripts , Verbosity , Format Adherence
Domain Evaluation: Evaluating a model's performance specifically within a defined subject area or application context, rather than in general.
Related: Capability Evaluation , Benchmark , Evaluation Dataset , Model Comparison , Eval Suite
Dual-Use Risk: The risk that a capability or piece of information provided by a model could be used both for legitimate purposes and to cause harm.
Related: Harm Avoidance , Content Policy , Ethical Judgment , Sensitive Topics , Red-Teaming

E

Edge Case Behavior: How a model responds to unusual, ambiguous, or boundary-pushing inputs that fall outside the common range of expected use.
Related: Edge Case Testing , Failure Mode , Edge Case Construction , Behavioral Specification , Model Judgment
Edge Case Construction: The deliberate process of designing inputs that test a model's behavior at the boundaries of expected usage.
Related: Edge Case Testing , Edge Case Behavior , Adversarial Examples , Red-Teaming , Behavioral Taxonomy
Edge Case Testing: Evaluating model behavior on unusual, extreme, or boundary-pushing inputs that are unlikely but consequential when they occur.
Related: Edge Case Behavior , Failure Mode Analysis , Red-Teaming , Adversarial Prompting , Edge Case Construction
Elo Ranking: A system, borrowed from competitive chess, for rating models or outputs from the results of head-to-head comparisons; each result shifts both ratings by an amount that depends on how surprising the outcome was.
Related: Preference-Based Evaluation , Human Evaluation , Chatbot Arena , Model Comparison , Preference Learning
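The classic chess-style Elo update can be sketched as follows. The scale of 400 and the step size `k=32` are conventional values assumed here for illustration; a given leaderboard may use different constants or a different fitting method entirely.

```python
def expected_score(rating_a, rating_b):
    """Probability that A beats B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))


def update_elo(rating_a, rating_b, score_a, k=32):
    """Return new (rating_a, rating_b) after one comparison.

    score_a is 1.0 if A won, 0.0 if A lost, 0.5 for a tie.
    """
    exp_a = expected_score(rating_a, rating_b)
    delta = k * (score_a - exp_a)
    # Whatever A gains, B loses, so the total rating is conserved.
    return rating_a + delta, rating_b - delta


print(update_elo(1000, 1000, 1.0))  # (1016.0, 984.0)
```

An upset (a low-rated model beating a high-rated one) moves both ratings more than an expected win does.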
Empirical Research: Research based on observation and evidence rather than theory alone — drawing conclusions from data collected through experiments or systematic observation.
Related: Hypothesis Testing , Experimental Design , Qualitative Research , Quantitative Research , Ablation Study
Ethical Judgment: The capacity to reason through situations where values conflict, weigh competing interests, and arrive at a principled decision about what is right.
Related: Moral Philosophy , Moral Psychology , Model Judgment , Value Alignment , Applied Ethics
Eval Framework: A structured approach or set of tools for designing, running, and interpreting model evaluations.
Related: Evaluation Pipeline , Eval Suite , Benchmark , Eval Platform , Automated Evaluation
Eval Platform: Software infrastructure for running, tracking, and comparing model evaluations systematically.
Related: Evaluation Pipeline , Eval Framework , LangSmith , Braintrust , Promptfoo
Eval Suite: A comprehensive, organized collection of evaluation datasets and test cases that together cover the full range of behavioral requirements for a model.
Related: Evaluation Dataset , Eval Framework , Evaluation Pipeline , Benchmark , Regression Testing
Evaluation Dataset: A curated collection of inputs and expected outputs used to measure model performance in a consistent and repeatable way.
Related: Eval Suite , Benchmark , Ground Truth , Regression Testing , Evaluation Pipeline
Evaluation Pipeline: An end-to-end system for consistently measuring model behavior across a defined set of inputs and criteria.
Related: Eval Framework , Eval Suite , Automated Evaluation , Regression Testing , Eval Platform
Experiment Tracking: Recording the configuration, inputs, and outcomes of model experiments so results can be compared and reproduced.
Related: Data Versioning , Evaluation Pipeline , Model Registry , Ablation Study , A/B Testing
Experimental Design: The planning of a study or test — defining variables, controls, sample sizes, and measurement methods — so that results are valid and interpretable.
Related: Hypothesis Testing , Ablation Study , A/B Testing , Evaluation Pipeline , Quantitative Research

F

Failure Mode: A specific, recurring pattern in which a model produces incorrect, harmful, or otherwise unacceptable outputs.
Related: Hallucination , Over-refusal , Sycophancy , Failure Mode Report , Failure Mode Analysis , Failure Taxonomy , Edge Case Behavior , Refusal Behavior
Failure Mode Analysis: A systematic process for identifying, categorizing, and understanding the ways a model can behave incorrectly or harmfully.
Related: Failure Mode , Failure Taxonomy , Root Cause Analysis , Edge Case Testing , Red-Teaming
Failure Taxonomy: An organized classification system that categorizes the different ways a model can fail, enabling systematic tracking and prioritization.
Related: Failure Mode , Failure Mode Analysis , Root Cause Analysis , Behavioral Taxonomy , Behavioral Audit
Fairness: The property of a model treating different individuals and groups equitably and without unjustified discrimination.
Related: Bias Detection , Responsible AI , AI Ethics , Value Alignment , Evaluation Pipeline
Few-Shot Prompting: Providing a model with a small number of examples of the desired input-output pattern before asking it to complete a new task.
Related: Zero-Shot Prompting , In-Context Learning , Prompt Engineering , Chain-of-Thought Prompting , Context Strategy
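A minimal sketch of assembling a few-shot prompt as plain text. The `Input:`/`Output:` layout is one possible convention, assumed here for illustration; it is not a required format.

```python
def build_few_shot_prompt(examples, new_input, instruction=""):
    """Join worked examples and a new input into a single prompt string.

    examples: list of (input_text, output_text) pairs.
    """
    parts = [instruction] if instruction else []
    for example_input, example_output in examples:
        parts.append(f"Input: {example_input}\nOutput: {example_output}")
    # Leave the final output blank so the model completes it.
    parts.append(f"Input: {new_input}\nOutput:")
    return "\n\n".join(parts)


prompt = build_few_shot_prompt(
    examples=[("I loved it", "positive"), ("Waste of money", "negative")],
    new_input="Better than expected",
    instruction="Classify the sentiment of each review.",
)
print(prompt)
```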
Finetuning: Further training a model on a specific dataset to adjust its behavior, style, or knowledge for a particular purpose.
Related: Supervised Finetuning , RLHF (Reinforcement Learning from Human Feedback) , Training Signal , Applied Finetuning , Model Behavior
Format Adherence: A model's ability to consistently follow specified output formats, such as JSON, markdown, bullet lists, or length constraints.
Related: Instruction Following , Output Quality , Behavioral Specification , Prompt Robustness , Verbosity
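For JSON output, format adherence can be checked mechanically. The sketch below, using only Python's standard `json` module, treats a response as adherent if it parses as a JSON object containing a set of required keys; the helper names and the key list are hypothetical.

```python
import json


def is_valid_json_object(output_text, required_keys):
    """True if the text parses as a JSON object with all required keys."""
    try:
        parsed = json.loads(output_text)
    except json.JSONDecodeError:
        return False
    return isinstance(parsed, dict) and all(k in parsed for k in required_keys)


def adherence_rate(outputs, required_keys):
    """Fraction of outputs that pass the format check."""
    if not outputs:
        return 0.0
    passed = sum(is_valid_json_object(o, required_keys) for o in outputs)
    return passed / len(outputs)


outputs = [
    '{"label": "positive", "score": 0.9}',
    'Sure! Here is the JSON: {"label": "positive"}',  # extra prose breaks parsing
    '{"label": "negative"}',                          # missing "score"
    '["positive", 0.9]',                              # valid JSON, but not an object
]
print(adherence_rate(outputs, ["label", "score"]))  # 0.25
```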
Freeplay: An AI product development platform designed for testing prompts, managing model configurations, and iterating on AI features collaboratively.
Related: Eval Platform , Model Playground , Evaluation Pipeline , LangSmith , Braintrust

G

Goodhart's Law: The principle that when a measure becomes a target, it ceases to be a good measure.
Related: Reward Hacking , Reward Modeling , Quality Metrics , Alignment Tax , Training Signal
Ground Truth: The accepted correct answer or label for a data example, used as the standard against which model outputs are measured.
Related: Labeling , Annotation , Data Quality , Evaluation Dataset , Benchmark

H

Hallucination: When a model generates information that sounds plausible but is factually incorrect or entirely fabricated.
Related: Confabulation , Calibration , Honesty , Output Quality , Failure Mode
Harm Avoidance: The practice of designing model behavior to minimize the risk of producing outputs that cause physical, psychological, social, or financial harm.
Related: Content Policy , Harmlessness , Refusal Behavior , AI Safety , Dual-Use Risk
Harmlessness: A model's disposition to avoid producing outputs that could cause physical, psychological, social, or other harm to users or third parties.
Related: HHH Framework , Helpfulness , Honesty , Harm Avoidance , Content Policy
Hedging: The use of qualifying language — like "it depends," "I'm not sure," or "you may want to consult an expert" — to soften or add uncertainty to a model's response.
Related: Calibration , Confidence , Refusal Behavior , Honesty , Alignment Tax
Helpfulness: A model's ability to genuinely assist users in accomplishing their goals in a way that's accurate, clear, and appropriately complete.
Related: HHH Framework , Harmlessness , Honesty , Output Quality , Alignment Tax
HHH Framework: A framework developed by Anthropic that identifies Helpful, Harmless, and Honest as the three core properties a well-aligned AI assistant should have.
Related: Helpfulness , Harmlessness , Honesty , Value Alignment , Constitutional AI
Honesty: A model's disposition to tell the truth, accurately represent its uncertainty, and avoid creating false impressions in users' minds.
Related: HHH Framework , Calibration , Sycophancy , Hallucination , Harmlessness
Human Evaluation: Assessment of model outputs by people, used to measure quality dimensions that automated systems can't reliably capture.
Related: Preference-Based Evaluation , Inter-Annotator Agreement , Semi-Automated Evaluation , Annotation , LLM-as-Judge
Human Feedback: Ratings, comparisons, or corrections provided by people that are used to guide model training and improve behavior.
Related: RLHF (Reinforcement Learning from Human Feedback) , Preference Learning , Annotation , Labeling , Inter-Annotator Agreement
Hypothesis Testing: A research method for evaluating whether observed data supports a specific claim, by defining a hypothesis and testing it against evidence.
Related: Experimental Design , Empirical Research , Ablation Study , Quantitative Research , A/B Testing

I

In-Context Learning: A model's ability to adapt its behavior or improve at a task based on examples and information provided in the prompt, without any change to its underlying weights.
Related: Few-Shot Prompting , Context Strategy , Context Window , Prompt Engineering , Chain-of-Thought Prompting
Inference Infrastructure: The systems and compute resources that host a deployed model and serve its responses to users in production.
Related: Latency Optimization , Latency Benchmarking , API Integration , Model Registry , Model Rollout
Instruction Ambiguity: The quality of an instruction or prompt that allows for multiple reasonable interpretations, potentially leading to inconsistent or incorrect model responses.
Related: Pragmatics , Semantics , Prompt Engineering , Instruction Following , Prompt Sensitivity
Instruction Following: A model's ability to accurately understand and comply with the directions given to it in a prompt.
Related: Prompt Engineering , System Prompt , Behavioral Specification , Prompt Robustness , Steerability
Inter-Annotator Agreement: A measure of how consistently different human raters label or evaluate the same data, used to assess annotation reliability.
Related: Annotation , Labeling , Data Quality , Human Evaluation , Ground Truth
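One widely used agreement statistic for two raters is Cohen's kappa, which corrects raw agreement for the agreement expected by chance. The sketch below is a plain-Python version under the assumption that both raters labeled the same items in the same order; other statistics apply when there are more than two raters.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two raters labeling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both raters must label the same non-empty item list")
    n = len(labels_a)
    # Observed agreement: share of items where the raters match.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of each rater's label frequencies.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (observed - expected) / (1 - expected)


rater_1 = ["good", "good", "bad", "bad"]
rater_2 = ["good", "bad", "bad", "bad"]
print(cohens_kappa(rater_1, rater_2))  # 0.5
```

Here the raters agree on 3 of 4 items (0.75), but chance alone would give 0.5, so kappa is 0.5.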
Issue Reproduction: The process of reliably recreating a reported behavioral failure so it can be analyzed and fixed.
Related: Root Cause Analysis , Failure Mode Analysis , Edge Case Testing , Log Analysis , Regression Testing

J

Jailbreaking: Techniques users employ to get a model to bypass its safety guidelines and produce outputs it has been trained or instructed not to produce.
Related: Adversarial Prompting , Prompt Injection , Red-Teaming , Harm Avoidance , Refusal Behavior

K

KL Divergence: A measure of how different one probability distribution is from another, used in model training to keep updated behavior from drifting too far from the original.
Related: Reinforcement Learning , RLHF (Reinforcement Learning from Human Feedback) , Policy Gradient , Behavioral Drift , Reward Hacking
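For discrete distributions, KL divergence is the sum of p(x) · log(p(x) / q(x)) over all outcomes x. The sketch below computes it for two probability lists over the same outcomes; in a training setting, p would typically stand for the updated model's output distribution and q for the original's, though that mapping is an assumption here.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) in nats for two discrete distributions over the same outcomes."""
    total = 0.0
    for p_i, q_i in zip(p, q):
        if p_i == 0:
            continue  # terms with p(x) = 0 contribute nothing
        if q_i == 0:
            return math.inf  # p puts mass where q has none
        total += p_i * math.log(p_i / q_i)
    return total


original = [0.5, 0.5]
updated = [0.9, 0.1]
print(kl_divergence(updated, original))
print(kl_divergence(original, updated))  # a different value: KL is not symmetric
```

The measure is zero only when the two distributions are identical, and swapping the arguments generally changes the result.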
Knowledge Sharing: The practice of systematically documenting and distributing behavioral insights, findings, and lessons learned across teams and disciplines.
Related: Cross-Functional Collaboration , Stakeholder Communication , Behavioral Specification , Qualitative Analysis

L

Label Studio: An open-source data labeling platform that supports annotation for text, images, audio, and other modalities used in AI training.
Related: Annotation Platform , Annotation , Labeling , Data Quality , Eval Platform
Labeling: Assigning categories, tags, or classifications to data examples to indicate what they represent or how a model should treat them.
Related: Annotation , Data Quality , Ground Truth , Human Feedback , Evaluation Dataset
LangSmith: A platform from LangChain for tracing, evaluating, and monitoring LLM applications in development and production.
Related: Eval Platform , Evaluation Pipeline , AI Observability , Promptfoo , Braintrust
Latency Benchmarking: Measuring how quickly a model produces responses under various conditions to evaluate its suitability for real-time use.
Related: Cost Benchmarking , Benchmark , Inference Infrastructure , Latency Optimization , Model Quality
Latency Optimization: Techniques and engineering practices that reduce the time it takes for a model to return a response.
Related: Inference Infrastructure , Latency Benchmarking , Cost Benchmarking , Context Window , API Integration
Linguistic Analysis: The systematic study of language features in model outputs — such as vocabulary, syntax, tone, and discourse structure — to understand behavioral patterns.
Related: Pragmatics , Semantics , Discourse Structure , Qualitative Analysis , Behavioral Specification
Literature Review: A systematic survey of existing research and writing on a topic to understand what is already known before designing new work.
Related: Empirical Research , Alignment Science , Qualitative Research , Research Iteration , Knowledge Sharing
LLM-as-Judge: Using a language model to evaluate the quality of another model's outputs, often as a scalable alternative to human review.
Related: Automated Evaluation , RLAIF (Reinforcement Learning from AI Feedback) , Preference-Based Evaluation , Semi-Automated Evaluation , Eval Framework
Log Analysis: Reviewing records of model interactions to identify patterns, failures, and opportunities for improvement.
Related: Production Data , Conversation Transcripts , AI Observability , Root Cause Analysis , Qualitative Analysis

M

Meta-Prompt: A prompt designed to generate or improve other prompts, rather than directly produce the final task output.
Related: Prompt Engineering , System Prompt , Prompt Chaining , Context Engineering
Model Behavior: The observable outputs, responses, and actions of an AI model as experienced by users and systems interacting with it.
Related: System Prompt , Behavioral Specification , Model Behavior Specification , The Role
Model Card: A standardized document that describes a model's intended use cases, known limitations, evaluation results, and potential risks.
Related: Responsible AI , AI Governance , Behavioral Specification , Model Quality , Bias Detection
Model Comparison: The systematic evaluation of two or more models against the same criteria to understand their relative strengths, weaknesses, and behavioral tradeoffs.
Related: Benchmark , Evaluation Pipeline , A/B Testing , Elo Ranking , Preference-Based Evaluation
Model Judgment: A model's ability to reason through ambiguous or novel situations and arrive at contextually appropriate decisions without explicit instructions for every case.
Related: Ethical Judgment , Behavioral Specification , Steerability , Value Alignment , Model Behavior
Model Launch Support: The behavioral work required to prepare a model for public release — including pre-launch evaluation, policy review, red-teaming, and documentation.
Related: Model Rollout , Red-Teaming , Behavioral Audit , Model Card , Model Strategy
Model Playground: An interactive interface for experimenting with model prompts, settings, and outputs without building a full application.
Related: Eval Platform , Promptfoo , ChainForge , Prompt Engineering , A/B Testing
Model Quality: The overall degree to which a model meets the behavioral, accuracy, and user experience standards required for its intended use.
Related: Quality Metrics , Output Quality , Behavioral Consistency , Evaluation Pipeline , Benchmark
Model Registry: A centralized repository that stores, versions, and tracks deployed model artifacts and their associated metadata.
Related: Data Versioning , Experiment Tracking , Model Rollout , Inference Infrastructure , Model Card
Model Rollout: The process of gradually deploying a new model version or behavioral update to users, typically in staged phases to monitor for unexpected issues.
Related: Model Launch Support , Model Strategy , Monitoring , Regression Identification , A/B Testing
Model Strategy: The deliberate plan for which AI models to use, how to configure them, and how to sequence improvements to achieve product and organizational goals.
Related: Model Rollout , Cross-Functional Collaboration , Behavioral Specification , Model Registry , Model Quality
Monitoring: Ongoing observation of a deployed model's behavior over time to detect problems, measure quality, and track changes.
Related: AI Observability , Production Data , Behavioral Drift , Regression Identification , Behavioral Audit
Moral Philosophy: The branch of philosophy concerned with questions about what is right and wrong, what we owe each other, and how to live a good life.
Related: Normative Ethics , Deontological Ethics , Consequentialism , Virtue Ethics , Applied Ethics
Moral Psychology: The scientific study of how people actually form moral judgments, make ethical decisions, and reason about right and wrong.
Related: Moral Philosophy , Ethical Judgment , Cognitive Bias , Behavioral Economics , Normative Ethics

N

Normative Ethics: The branch of moral philosophy concerned with establishing principles and frameworks for determining right and wrong action.
Related: Moral Philosophy , Deontological Ethics , Consequentialism , Virtue Ethics , Applied Ethics

O

Output Distribution: The range and relative frequency of different types of responses a model produces across a given set of inputs.
Related: Behavioral Consistency , Model Quality , Evaluation Pipeline , Steerability , Prompt Sensitivity
Output Quality: How well a model's response meets the requirements of accuracy, helpfulness, appropriateness, and format for a given task.
Related: Model Quality , Quality Metrics , Evaluation Pipeline , Helpfulness , Format Adherence

P

Policy Gradient: A family of reinforcement learning algorithms that improve a model's behavior by directly adjusting its policy parameters to raise the probability of actions that lead to higher rewards.
Related: Reinforcement Learning , RLHF (Reinforcement Learning from Human Feedback) , Reward Modeling , KL Divergence
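Example: a minimal sketch of one policy gradient method, REINFORCE, on a toy two-armed bandit. The bandit setup, reward probabilities, learning rate, and step count are assumptions made for illustration; training a language model this way involves far more machinery.

```python
import math
import random

def softmax(logits):
    """Convert logits into action probabilities."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def train_bandit(reward_probs, steps=2000, lr=0.1, seed=0):
    """Run REINFORCE on a bandit; return the final action probabilities."""
    rng = random.Random(seed)
    logits = [0.0] * len(reward_probs)
    for _ in range(steps):
        probs = softmax(logits)
        action = rng.choices(range(len(probs)), weights=probs)[0]
        reward = 1.0 if rng.random() < reward_probs[action] else 0.0
        # Gradient of log pi(action) w.r.t. logit i is 1[i == action] - probs[i].
        for i in range(len(logits)):
            grad = (1.0 if i == action else 0.0) - probs[i]
            logits[i] += lr * reward * grad
    return softmax(logits)

# Hypothetical bandit: arm 1 pays out more often than arm 0.
final_probs = train_bandit([0.2, 0.8])
print(final_probs)
```

Rewarded actions have their probability nudged upward in proportion to the reward, so over many steps the policy should come to favor the arm that pays out more often.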
Policy Writing: The craft of authoring clear, principled documents that define what a model will and won't do, and why — serving as guidance for training, evaluation, and deployment decisions.
Related: Content Policy , Usage Policy , Behavioral Specification , Data Guideline Authorship , AI Governance
Pragmatics: The branch of linguistics concerned with how context shapes meaning — what people communicate beyond the literal content of their words.
Related: Semantics , Linguistic Analysis , Instruction Ambiguity , Discourse Structure , Model Judgment
Preference Learning: A training approach where models learn from comparisons between outputs rather than from single labeled examples.
Related: RLHF (Reinforcement Learning from Human Feedback) , Reward Modeling , Human Feedback , Preference-Based Evaluation , Human Evaluation
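Example: a sketch of one common way to turn a comparison into a training loss, the Bradley-Terry formulation, in which the loss is the negative log-probability that the preferred output scores higher. The glossary does not prescribe a specific loss; the function name and scalar scores here are illustrative assumptions.

```python
import math

def preference_loss(score_chosen, score_rejected):
    """Negative log-likelihood that the chosen output beats the rejected one."""
    margin = score_chosen - score_rejected
    # -log(sigmoid(margin)), split by sign for numerical stability.
    if margin > 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))

# The loss is small when the chosen output already scores higher,
# and large when the ranking is reversed.
print(preference_loss(2.0, 0.0))
print(preference_loss(0.0, 2.0))
```

Minimizing this loss over many labeled pairs pushes the scoring function to rank preferred outputs above rejected ones, without ever needing an absolute label for a single output.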
Preference-Based Evaluation: An evaluation method where raters compare outputs and indicate which they prefer, rather than scoring each output independently.
Related: Elo Ranking , Human Evaluation , Preference Learning , LLM-as-Judge , Semi-Automated Evaluation
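Example: a sketch of how pairwise preferences can be aggregated into ratings with the standard Elo update. The starting ratings and the K-factor of 32 are conventional but arbitrary choices, assumed here for illustration.

```python
def expected_score(rating_a, rating_b):
    """Probability that A is preferred over B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))

def update_elo(rating_a, rating_b, a_wins, k=32):
    """Return updated (rating_a, rating_b) after one comparison."""
    score_a = 1.0 if a_wins else 0.0
    delta = k * (score_a - expected_score(rating_a, rating_b))
    return rating_a + delta, rating_b - delta

# Two equally rated models; a rater prefers model A's output.
print(update_elo(1000, 1000, a_wins=True))  # → (1016.0, 984.0)
```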
Product Design Collaboration: The working relationship between behavior architects and product designers to ensure that model behavior and user experience are designed in alignment with each other.
Related: Cross-Functional Collaboration , Stakeholder Communication , Behavioral Specification , Behavior Design , Model Strategy
Production Data: Real data generated from actual user interactions with a deployed model, as opposed to synthetic or evaluation data.
Related: Log Analysis , Conversation Transcripts , User Feedback , AI Observability , Monitoring
Prompt Chaining: A technique where the output of one model call is fed as input into a subsequent call, breaking a complex task into sequential steps.
Related: Prompt Engineering , Context Strategy , Task Decomposition , Reasoning Chain , Context Engineering
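Example: a minimal sketch of a two-step chain in which the first call's output becomes part of the second call's prompt. `fake_model` is a hypothetical stand-in that returns canned text so the example runs without any real model API.

```python
def fake_model(prompt):
    """Hypothetical stand-in for a model call; returns canned text."""
    if prompt.startswith("Summarize:"):
        return "Summary of: " + prompt[len("Summarize:"):].strip()
    if prompt.startswith("Translate to French:"):
        return "[FR] " + prompt[len("Translate to French:"):].strip()
    return prompt

def run_chain(text, model=fake_model):
    """Step 1 summarizes; step 2 translates the summary from step 1."""
    summary = model("Summarize: " + text)
    return model("Translate to French: " + summary)

result = run_chain("Long report text")
print(result)  # → [FR] Summary of: Long report text
```

Keeping each step as a separate call makes it possible to inspect or test the intermediate output before it is passed along.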
Prompt Engineering: The practice of crafting and refining the text given to a model — instructions, examples, context — to reliably produce desired outputs.
Related: Context Engineering , System Prompt , Few-Shot Prompting , Chain-of-Thought Prompting , Prompt Robustness
Prompt Injection: An attack where malicious instructions embedded in user input or external content override a model's intended behavior.
Related: Adversarial Prompting , Jailbreaking , System Prompt , Prompt Robustness , Red-Teaming
Prompt Robustness: The degree to which a prompt continues to produce reliable, appropriate outputs even when inputs vary, are ambiguous, or are adversarial.
Related: Prompt Sensitivity , Prompt Engineering , Adversarial Prompting , Behavioral Consistency , Instruction Following
Prompt Sensitivity: The degree to which small changes in wording, format, or phrasing affect model outputs in significant ways.
Related: Prompt Robustness , Prompt Engineering , Behavioral Consistency , Instruction Following , Output Distribution
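Example: a rough sketch of one way to quantify prompt sensitivity, as the fraction of reworded prompts whose output differs from the baseline output. Exact string comparison is a crude assumption; a real measurement would likely use a semantic or task-level comparison. `toy_model` is a hypothetical stand-in for a model call.

```python
def sensitivity_rate(model, base_prompt, variants):
    """Fraction of variants whose output differs from the base prompt's output."""
    if not variants:
        return 0.0
    base_output = model(base_prompt)
    changed = sum(1 for v in variants if model(v) != base_output)
    return changed / len(variants)

def toy_model(prompt):
    # Hypothetical behavior: responds differently when the prompt is all caps.
    return "terse answer" if prompt == prompt.upper() else "calm answer"

rate = sensitivity_rate(
    toy_model,
    "Is this email spam?",
    ["Is this e-mail spam?", "is this email spam?", "IS THIS EMAIL SPAM?"],
)
print(rate)
```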
Promptfoo: An open-source tool for testing and evaluating LLM outputs, focused on comparing prompts and detecting regressions.
Related: Eval Platform , Evaluation Pipeline , Braintrust , LangSmith , Regression Testing
Prototyping: Building quick, low-fidelity versions of a behavioral design to test assumptions and learn before committing to a full implementation.
Related: Research Iteration , Experimental Design , Model Playground , Prompt Engineering , Behavior Design

Q

Qualitative Analysis: Close, interpretive review of model outputs and interactions to understand nuanced behavioral patterns that numbers alone don't capture.
Related: Quantitative Metrics , Log Analysis , Conversation Transcripts , Human Evaluation , Qualitative Research
Qualitative Research: A research approach that seeks to understand phenomena through detailed, interpretive analysis rather than numerical measurement.
Related: Qualitative Analysis , Quantitative Research , Empirical Research , Human Evaluation , Behavioral Audit
Quality Metrics: Specific, measurable indicators used to assess how well a model is performing across dimensions that matter for a given use case.
Related: Model Quality , Evaluation Pipeline , Benchmark , Goodhart's Law , Quantitative Metrics
Quantitative Metrics: Numerical measurements used to track model performance across dimensions like accuracy, refusal rate, response length, or user satisfaction.
Related: Quality Metrics , Qualitative Analysis , Evaluation Pipeline , Monitoring , Benchmark
Quantitative Research: A research approach that measures phenomena numerically and uses statistical analysis to identify patterns and test hypotheses.
Related: Quantitative Metrics , Qualitative Research , Hypothesis Testing , Evaluation Pipeline , Experimental Design

R

Reasoning Chain: The sequence of intermediate steps a model uses to work through a problem before arriving at a final answer.
Related: Chain-of-Thought Prompting , Task Decomposition , Model Judgment , Prompt Chaining , Calibration
Red-Teaming: Deliberately attempting to find failure modes, safety vulnerabilities, and policy violations in a model by acting as an adversarial user.
Related: Red-Team Test Set , Adversarial Prompting , Jailbreaking , Edge Case Testing , Failure Mode Analysis , AI Safety
Refusal Behavior: The patterns and decisions behind when and how a model declines to fulfill a request.
Related: Refusal Policy , Over-refusal , Under-refusal , Harm Avoidance , Alignment Tax , Content Policy , Helpfulness , Boundary Setting
Regression Identification: The process of detecting when a change to a model or system has caused previously acceptable behavior to degrade.
Related: Regression Testing , Behavioral Regression , Behavioral Drift , Evaluation Pipeline , Monitoring
Regression Testing: Running a consistent set of test cases after a change to verify that previously working behavior hasn't broken.
Related: Evaluation Pipeline , Eval Suite , Behavioral Regression , Regression Identification , A/B Testing
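Example: a minimal sketch of a regression suite, assuming each test case is a name, a prompt, and a check function applied to the output. The two model functions are hypothetical stand-ins for the versions before and after a change.

```python
def run_regression_suite(model, cases):
    """Return the names of cases whose check fails for this model."""
    failures = []
    for name, prompt, check in cases:
        if not check(model(prompt)):
            failures.append(name)
    return failures

# The same fixed cases are run against every version.
cases = [
    ("greets_politely", "Say hello", lambda out: "hello" in out.lower()),
    ("stays_short", "Say hello", lambda out: len(out) <= 40),
]

def model_v1(prompt):
    return "Hello there!"

def model_v2(prompt):
    return "Hello, and welcome! I am delighted to greet you today."

print(run_regression_suite(model_v1, cases))  # → []
print(run_regression_suite(model_v2, cases))  # → ['stays_short']
```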
Reinforcement Learning: A machine learning approach where a model learns by receiving rewards or penalties based on the quality of its actions.
Related: RLHF (Reinforcement Learning from Human Feedback) , RLAIF (Reinforcement Learning from AI Feedback) , Reward Modeling , Policy Gradient , Training Signal
Research Iteration: The cyclical process of formulating questions, running experiments, analyzing results, and using findings to inform the next round of investigation.
Related: Empirical Research , Experimental Design , Hypothesis Testing , Prototyping , Ablation Study
Responsible AI: A framework and practice for developing and deploying AI systems in ways that are safe, fair, transparent, and accountable.
Related: AI Ethics , AI Safety , AI Governance , Fairness , Model Card
Reward Hacking: When a model finds ways to score well on a reward signal without actually achieving the underlying goal the reward was meant to measure.
Related: Reward Modeling , RLHF (Reinforcement Learning from Human Feedback) , Goodhart's Law , Alignment Tax , Training Signal
Reward Modeling: Training a separate model to predict human preferences so it can be used to score outputs during reinforcement learning.
Related: RLHF (Reinforcement Learning from Human Feedback) , Preference Learning , Reward Hacking , Training Signal , Reinforcement Learning
RLAIF (Reinforcement Learning from AI Feedback): A variation of RLHF where another AI model provides the preference judgments instead of human raters.
Related: RLHF (Reinforcement Learning from Human Feedback) , Reward Modeling , Preference Learning , Constitutional AI , Synthetic Data
RLHF (Reinforcement Learning from Human Feedback): A way of training AI models by having humans rate or compare outputs, then using those ratings to reinforce better behavior over time.
Related: RLAIF (Reinforcement Learning from AI Feedback) , Reward Modeling , Preference Learning , Human Feedback , Reinforcement Learning
Root Cause Analysis: A structured investigation to identify the underlying reason a failure occurred, rather than treating only its surface-level symptoms.
Related: Failure Mode Analysis , Failure Taxonomy , Issue Reproduction , Log Analysis , Behavioral Audit

S

Semantics: The study of meaning in language — what words, phrases, and sentences refer to and signify.
Related: Pragmatics , Linguistic Analysis , Instruction Ambiguity , Discourse Structure
Semi-Automated Evaluation: An evaluation approach that combines automated scoring with human review at key decision points.
Related: Automated Evaluation , Human Evaluation , LLM-as-Judge , Evaluation Pipeline , Eval Framework
Sensitive Topics: Subject areas that require extra care in handling because of their potential to cause harm, offense, or controversy — such as mental health, politics, religion, and violence.
Related: Content Policy , Harm Avoidance , Dual-Use Risk , Refusal Behavior , Ethical Judgment
Signal-to-Noise Analysis: The practice of separating meaningful patterns or indicators of behavioral problems from irrelevant or random variation in data.
Related: Log Analysis , Qualitative Analysis , Root Cause Analysis , User Feedback , Production Data
Stakeholder Communication: Translating behavioral findings, tradeoffs, and recommendations into clear language for audiences outside the behavior team, including leadership, engineering, legal, and product.
Related: Cross-Functional Collaboration , Knowledge Sharing , Model Strategy , Policy Writing , Behavioral Specification
Steerability: How easily and reliably a model's behavior can be adjusted through prompts, instructions, or system-level configuration.
Related: Instruction Following , Behavioral Specification , Behavior Design , System Prompt , Value Alignment
Supervised Finetuning: A type of finetuning where the model learns from a dataset of input-output pairs that represent the desired behavior.
Related: Finetuning , RLHF (Reinforcement Learning from Human Feedback) , Annotation , Ground Truth , Training Signal
Sycophancy: A tendency in AI models to agree with users, validate their views, or shift answers toward what the user seems to want to hear, rather than providing accurate or honest responses.
Related: RLHF (Reinforcement Learning from Human Feedback) , Honesty , Calibration , Helpfulness , Reward Hacking
Synthetic Data: Training or evaluation data generated by a model rather than collected from real human interactions.
Related: RLAIF (Reinforcement Learning from AI Feedback) , Data Generation Pipelines , Data Quality , Supervised Finetuning , Evaluation Dataset
Synthetic Testing Environment: A controlled, artificially constructed environment for evaluating model behavior, separate from real production usage.
Related: Evaluation Pipeline , Synthetic Data , Eval Suite , Edge Case Testing , Batch Testing
System Prompt: Instructions provided to a model at the start of a session, before any user input, that establish its role, behavior, and constraints.
Related: System Prompt Architecture , Prompt Engineering , Context Engineering , Behavioral Specification , Instruction Following , Meta-Prompt

T

Task Decomposition: Breaking a complex task into smaller, more manageable subtasks that can be addressed sequentially or in parallel.
Related: Prompt Chaining , Reasoning Chain , Chain-of-Thought Prompting , Context Strategy , Experimental Design
Test-Driven Development (for AI): An approach to AI product development where evaluation criteria and test cases are defined before prompts or models are changed, so success can be measured objectively.
Related: Evaluation Pipeline , Eval Suite , Regression Testing , Experimental Design , Behavioral Specification
Tone: The emotional register and relational quality of a model's responses — whether it comes across as warm, formal, playful, cautious, authoritative, and so on.
Related: Character Consistency , Verbosity , Output Quality , Behavioral Specification , Behavior Design
Tool Prompt: Instructions that describe available tools or functions to a model, telling it when and how to use them.
Related: System Prompt , Prompt Engineering , Context Engineering , Instruction Following
Training Signal: Any information fed back to a model during training to indicate whether its behavior is on the right track.
Related: RLHF (Reinforcement Learning from Human Feedback) , Reward Modeling , Human Feedback , Reinforcement Learning , Supervised Finetuning
Trust and Safety: The organizational function responsible for protecting users and the platform from harm — including abuse, policy violations, and misuse of AI capabilities.
Related: Content Policy , Harm Avoidance , Red-Teaming , Trust and Safety Team , Responsible AI
Trust and Safety Team: The organizational team responsible for detecting, preventing, and responding to harmful or policy-violating uses of an AI product.
Related: Trust and Safety , Content Policy , Red-Teaming , Harm Avoidance , AI Governance

U

Usage Policy: A set of rules, broader than a content policy, governing how a model or AI product may and may not be used, often focused on prohibited applications rather than individual outputs.
Related: Content Policy , Behavioral Specification , AI Governance , Responsible AI , Harm Avoidance
User Feedback: Explicit or implicit signals from users that indicate whether they found a model response helpful, harmful, or otherwise notable.
Related: Production Data , Human Feedback , Qualitative Analysis , Monitoring , Behavioral Audit

V

Value Alignment: The degree to which a model's behavior reflects human values, intentions, and goals rather than optimizing for narrow objectives that miss the point.
Related: HHH Framework , Constitutional AI , RLHF (Reinforcement Learning from Human Feedback) , Behavioral Specification , AI Safety
Verbosity: The tendency of a model to produce responses that are longer than necessary for the task at hand.
Related: Output Quality , Format Adherence , Sycophancy , Tone , Behavioral Specification
Virtue Ethics: A moral framework that focuses on character rather than rules or outcomes — asking what a person of good character would do.
Related: Moral Philosophy , Normative Ethics , Deontological Ethics , Consequentialism , Value Alignment

Z

Zero-Shot Prompting: Asking a model to complete a task using only instructions, with no examples provided.
Related: Few-Shot Prompting , In-Context Learning , Prompt Engineering , Instruction Following
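Example: a sketch contrasting a zero-shot prompt with a few-shot prompt for the same task. The prompt layout ("Text:" and "Label:" fields) is an illustrative assumption, not a required format.

```python
def zero_shot_prompt(instruction, text):
    """Instructions only: no worked examples are included."""
    return f"{instruction}\n\nText: {text}\nLabel:"

def few_shot_prompt(instruction, examples, text):
    """Same task, but with labeled examples placed before the new input."""
    shots = "".join(f"Text: {t}\nLabel: {l}\n\n" for t, l in examples)
    return f"{instruction}\n\n{shots}Text: {text}\nLabel:"

instruction = "Classify the sentiment as positive or negative."
print(zero_shot_prompt(instruction, "Great service!"))
print(few_shot_prompt(instruction, [("I loved it.", "positive")], "Great service!"))
```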