A

A/B Testing
Comparing two versions of a model or prompt by splitting traffic between them and measuring outcomes on real users.
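
One way to judge whether the measured difference between two variants is more than noise is a two-proportion z-test. The sketch below is illustrative only: the glossary does not prescribe a statistical test, and the function name and arguments are assumptions made for this example.

```python
import math


def two_proportion_z_test(success_a, n_a, success_b, n_b):
    """Compare success rates of variants A and B (illustrative helper).

    Returns the z statistic and a two-sided p-value. Assumes both samples
    are reasonably large and that the pooled rate is not exactly 0 or 1.
    """
    p_a = success_a / n_a
    p_b = success_b / n_b
    # Pooled success rate under the null hypothesis of no difference.
    pooled = (success_a + success_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # Two-sided p-value from the standard normal CDF.
    normal_cdf = 0.5 * (1 + math.erf(abs(z) / math.sqrt(2)))
    p_value = 2 * (1 - normal_cdf)
    return z, p_value
```

A small p-value suggests the two variants really do differ on the chosen outcome; it says nothing about whether the difference is large enough to matter.
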
Ablation Study
An experiment that systematically removes or varies individual components of a system to understand each component's contribution to overall performance.
Adversarial Examples
Inputs carefully designed to cause a model to fail or produce unintended outputs, often by exploiting specific vulnerabilities in its training or architecture.
Adversarial Prompting
Crafting inputs specifically designed to cause a model to behave in unintended, harmful, or policy-violating ways.
AI Ethics
The study and application of ethical principles — fairness, accountability, transparency, harm avoidance — to the design and deployment of AI systems.
AI Governance
The frameworks, policies, processes, and oversight mechanisms that guide how AI is developed, deployed, and monitored within an organization or across society.
AI Observability
The ability to monitor, inspect, and understand what an AI system is doing in production — including inputs, outputs, errors, and behavioral patterns over time.
AI Safety
The field concerned with ensuring that AI systems behave in ways that are safe, controllable, and aligned with human values — especially as they become more capable.
Alignment Science
The research field focused on developing methods, theories, and techniques for making AI systems reliably pursue intended goals and values.
Alignment Tax
The performance costs a model may incur when trained to be safer or more value-aligned, such as reduced capability or increased refusals.
Annotation
The process of adding labels, ratings, or structured information to data so a model can learn from it.
Annotation Platform
Software designed to help teams collect, manage, and quality-control labeled data from human raters.
API Integration
Connecting an AI model to a product or system through a programming interface, allowing it to send requests and receive model responses programmatically.
Applied Ethics
The application of ethical theory and principles to specific real-world domains and practical decisions.
Applied Finetuning
The practice of finetuning models in a product or business context to improve behavior for a specific use case, distinct from academic or research finetuning.
Automated Evaluation
Using software, scripts, or models to score outputs without requiring human review for each individual case.

B

Batch Testing
Running a large set of prompts through a model in a single run rather than one at a time, to evaluate behavior at scale.
Behavior Design
The practice of intentionally defining and shaping how an AI model acts across a range of situations, rather than leaving behavior to emerge by default.
Behavioral Audit
A systematic review of a model's behavior across a defined set of scenarios to assess whether it meets expected standards.
Behavioral Consistency
The degree to which a model produces similar outputs for similar inputs across different sessions, users, or contexts.
Behavioral Drift
A gradual, unintended change in how a model behaves over time, often arising across model updates, prompt changes, or accumulating context.
Behavioral Economics
A field that combines psychology and economics to study how real people make decisions, often in ways that deviate from purely rational models.
Behavioral Regression
When a model update or change causes behavior that was previously working correctly to degrade or break.
Behavioral Specification
A written document or set of guidelines that defines how a model is expected to behave across different situations.
Behavioral Taxonomy
A hierarchical classification system that organizes model behaviors into categories and subcategories for analysis, evaluation, and communication.
Benchmark
A standardized test or dataset used to compare model performance across versions or across different models.
Bias Detection
The process of identifying systematic patterns in model behavior that produce unfair or unequal outcomes across groups of people.
Boundary Exploration
The systematic practice of probing the edges of a model's behavioral constraints to understand where and how its policies apply.
Boundary Setting
The practice of defining and communicating the limits of what a model will do — what it considers out of scope, inappropriate, or harmful.
Braintrust
An evaluation and observability platform for AI products that focuses on running experiments, logging traces, and comparing prompt or model variants.

C

Calibration
The alignment between a model's expressed confidence and its actual accuracy — a well-calibrated model is appropriately uncertain when it might be wrong.
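
Calibration is often summarized with expected calibration error (ECE), which bins predictions by stated confidence and compares each bin's average confidence with its accuracy. The following is a minimal sketch under that assumption; the function name, the equal-width binning, and the default of 10 bins are illustrative choices, not something this glossary specifies.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Weighted average gap between mean confidence and accuracy per bin.

    confidences: floats in [0, 1]; correct: booleans of the same length.
    """
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        # Clamp so that a confidence of exactly 1.0 lands in the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))

    total = len(confidences)
    ece = 0.0
    for items in bins:
        if not items:
            continue
        avg_conf = sum(c for c, _ in items) / len(items)
        accuracy = sum(1 for _, ok in items if ok) / len(items)
        ece += (len(items) / total) * abs(avg_conf - accuracy)
    return ece
```

A value near 0 means stated confidence tracks accuracy closely; a model that claims 90% confidence but is right 75% of the time contributes a gap of about 0.15.
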
Capability Evaluation
Assessment of what a model is able to do — the range and level of tasks it can perform successfully under defined conditions.
Categorical Thinking
Reasoning by placing things into defined categories with clear rules, rather than reasoning about each case on its individual merits.
Chain-of-Thought Prompting
A prompting technique that encourages a model to work through a problem step by step before producing a final answer.
ChainForge
An open-source visual tool for testing and comparing LLM prompts and responses across models and configurations.
Character Consistency
The degree to which a model maintains a stable persona, voice, and set of values across different conversations and contexts.
Chatbot Arena
A public platform developed by LMSYS where users compare responses from anonymous AI models side by side and vote for the better one, generating a crowd-sourced Elo leaderboard.
Cognitive Bias
Systematic patterns in human thinking that lead to errors in judgment, often unconsciously — including biases that can affect annotation, evaluation, and behavioral design.
Confabulation
A more precise term for hallucination, emphasizing that the model generates plausible-sounding but false information to fill gaps rather than intentionally lying.
Confidence
The degree of certainty a model expresses — or implies — when producing an output.
Consequentialism
A moral framework that judges actions by their outcomes — the right action is the one that produces the best overall results.
Constitutional AI
An approach developed by Anthropic where a model is trained to critique and revise its own outputs based on a written set of principles.
Content Policy
A documented set of rules governing what types of content a model will and will not produce.
Context Engineering
The broader practice of designing what information a model has access to at inference time — including instructions, memory, tools, and retrieved content.
Context Strategy
A deliberate plan for what information to include in a model's context window, how to structure it, and what to exclude given space and quality constraints.
Context Window
The maximum amount of text — measured in tokens — that a model can read and reason over in a single interaction.
Conversation Transcripts
Records of full multi-turn interactions between users and a model, used to analyze behavior in context.
Cost Benchmarking
Measuring and comparing the financial cost of running a model across different providers, configurations, or usage patterns.
Coverage Matrix
A structured map that shows which behavioral scenarios, use cases, or risk categories are represented in an evaluation suite, and which are missing.
Cross-Functional Collaboration
Working across organizational functions — engineering, design, policy, research, safety — to align on behavioral goals and coordinate the work needed to achieve them.

D

Data Generation Pipelines
Automated systems for producing, filtering, and formatting training or evaluation data at scale.
Data Guideline Authorship
Writing clear, detailed instructions for annotators that define what good and bad model responses look like and how to evaluate them consistently.
Data Pipeline
A series of automated steps that collect, process, transform, and deliver data from one system to another.
Data Quality
The degree to which training or evaluation data is accurate, consistent, relevant, and representative of the desired behavior.
Data Versioning
The practice of tracking changes to datasets over time so that training and evaluation runs are reproducible and changes can be traced.
Decision-Making Frameworks
Structured approaches for reasoning through complex choices, especially when values conflict or outcomes are uncertain.
Deontological Ethics
A moral framework that judges actions as right or wrong based on rules and duties, regardless of their consequences.
Discourse Structure
The organization and coherence of language across multiple sentences or turns — how ideas connect, flow, and build on each other in extended text or conversation.
Domain Evaluation
Evaluating a model's performance specifically within a defined subject area or application context, rather than in general.
Dual-Use Risk
The risk that a capability or piece of information provided by a model could be used both for legitimate purposes and to cause harm.

E

Edge Case Behavior
How a model responds to unusual, ambiguous, or boundary-pushing inputs that fall outside the common range of expected use.
Edge Case Construction
The deliberate process of designing inputs that test a model's behavior at the boundaries of expected usage.
Edge Case Testing
Evaluating model behavior on unusual, extreme, or boundary-pushing inputs that are unlikely but consequential when they occur.
Elo Ranking
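A system, borrowed from competitive chess, for ranking models or outputs from head-to-head comparisons: each rating rises or falls depending on how the actual result compares with the result the current ratings predicted.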
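
The standard Elo update can be sketched in a few lines. The K-factor of 32 and the 400-point scale are the conventional chess defaults, used here as assumptions; the function names are illustrative.

```python
def expected_score(rating_a, rating_b):
    """Probability that A beats B implied by the current ratings."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))


def update_elo(rating_a, rating_b, score_a, k=32):
    """Return updated ratings after one comparison.

    score_a is 1.0 if A won, 0.0 if B won, and 0.5 for a tie.
    """
    exp_a = expected_score(rating_a, rating_b)
    new_a = rating_a + k * (score_a - exp_a)
    # B's update mirrors A's, so the total rating is conserved.
    new_b = rating_b + k * ((1 - score_a) - (1 - exp_a))
    return new_a, new_b
```

For example, `update_elo(1500, 1500, 1.0)` returns `(1516.0, 1484.0)`: an upset against a much higher-rated opponent would move the ratings further than a win between equals.
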
Empirical Research
Research based on observation and evidence rather than theory alone — drawing conclusions from data collected through experiments or systematic observation.
Ethical Judgment
The capacity to reason through situations where values conflict, weigh competing interests, and arrive at a principled decision about what is right.
Eval Framework
A structured approach or set of tools for designing, running, and interpreting model evaluations.
Eval Platform
Software infrastructure for running, tracking, and comparing model evaluations systematically.
Eval Suite
A comprehensive, organized collection of evaluation datasets and test cases that together cover the full range of behavioral requirements for a model.
Evaluation Dataset
A curated collection of inputs and expected outputs used to measure model performance in a consistent and repeatable way.
Evaluation Pipeline
An end-to-end system for consistently measuring model behavior across a defined set of inputs and criteria.
Experiment Tracking
Recording the configuration, inputs, and outcomes of model experiments so results can be compared and reproduced.
Experimental Design
The planning of a study or test — defining variables, controls, sample sizes, and measurement methods — so that results are valid and interpretable.

F

Failure Mode
A specific, recurring pattern in which a model produces incorrect, harmful, or otherwise unacceptable outputs.
Failure Mode Analysis
A systematic process for identifying, categorizing, and understanding the ways a model can behave incorrectly or harmfully.
Failure Taxonomy
An organized classification system that categorizes the different ways a model can fail, enabling systematic tracking and prioritization.
Fairness
The property of a model treating different individuals and groups equitably and without unjustified discrimination.
Few-Shot Prompting
Providing a model with a small number of examples of the desired input-output pattern before asking it to complete a new task.
Finetuning
Further training a model on a specific dataset to adjust its behavior, style, or knowledge for a particular purpose.
Format Adherence
A model's ability to consistently follow specified output formats, such as JSON, markdown, bullet lists, or length constraints.
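
A format-adherence check for JSON output can be as simple as attempting to parse the response and verifying the expected fields. This is a minimal sketch; the function name and the idea of a required-key list are assumptions for illustration.

```python
import json


def follows_json_format(output, required_keys):
    """Return True if output is a JSON object containing every required key."""
    try:
        parsed = json.loads(output)
    except json.JSONDecodeError:
        # Extra prose around the JSON, or malformed JSON, counts as a failure.
        return False
    return isinstance(parsed, dict) and all(key in parsed for key in required_keys)
```

Running a check like this over many responses gives a format-adherence rate that can be tracked across prompt or model versions.
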
Freeplay
An AI product development platform designed for testing prompts, managing model configurations, and iterating on AI features collaboratively.

G

Goodhart's Law
The principle that when a measure becomes a target, it ceases to be a good measure.
Ground Truth
The accepted correct answer or label for a data example, used as the standard against which model outputs are measured.

H

Hallucination
When a model generates information that sounds plausible but is factually incorrect or entirely fabricated.
Harm Avoidance
The practice of designing model behavior to minimize the risk of producing outputs that cause physical, psychological, social, or financial harm.
Harmlessness
A model's disposition to avoid producing outputs that could cause physical, psychological, social, or other harm to users or third parties.
Hedging
The use of qualifying language — like "it depends," "I'm not sure," or "you may want to consult an expert" — to soften or add uncertainty to a model's response.
Helpfulness
A model's ability to genuinely assist users in accomplishing their goals in a way that's accurate, clear, and appropriately complete.
HHH Framework
A framework developed by Anthropic that identifies Helpful, Harmless, and Honest as the three core properties a well-aligned AI assistant should have.
Honesty
A model's disposition to tell the truth, accurately represent its uncertainty, and avoid creating false impressions in users' minds.
Human Evaluation
Assessment of model outputs by people, used to measure quality dimensions that automated systems can't reliably capture.
Human Feedback
Ratings, comparisons, or corrections provided by people that are used to guide model training and improve behavior.
Hypothesis Testing
A research method for evaluating whether observed data supports a specific claim, by defining a hypothesis and testing it against evidence.

I

In-Context Learning
A model's ability to adapt its behavior or improve at a task based on examples and information provided in the prompt, without any change to its underlying weights.
Inference Infrastructure
The systems and compute resources that host a deployed model and serve its responses to users in production.
Instruction Ambiguity
The quality of an instruction or prompt that allows for multiple reasonable interpretations, potentially leading to inconsistent or incorrect model responses.
Instruction Following
A model's ability to accurately understand and comply with the directions given to it in a prompt.
Inter-Annotator Agreement
A measure of how consistently different human raters label or evaluate the same data, used to assess annotation reliability.
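
For two raters assigning categorical labels, a common agreement statistic is Cohen's kappa, which corrects raw agreement for the agreement expected by chance. The sketch below assumes exactly two raters and equal-length label lists; the function name is illustrative.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two raters on the same items."""
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    # Chance agreement: probability both raters pick the same label at random,
    # given each rater's own label frequencies.
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)

    if expected == 1.0:
        # Both raters used a single identical label; kappa is undefined here,
        # and returning 1.0 is a convention chosen for this sketch.
        return 1.0
    return (observed - expected) / (1 - expected)
```

With ratings `["good", "good", "bad", "bad"]` and `["good", "bad", "bad", "bad"]`, raw agreement is 0.75 but kappa is 0.5, because half of that agreement would be expected by chance.
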
Issue Reproduction
The process of reliably recreating a reported behavioral failure so it can be analyzed and fixed.

J

Jailbreaking
Techniques users employ to get a model to bypass its safety guidelines and produce outputs it has been trained or instructed not to produce.

K

KL Divergence
A measure of how different one probability distribution is from another, used in model training to keep updated behavior from drifting too far from the original.
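
For discrete distributions, KL divergence is the sum of p(x) * log(p(x) / q(x)) over all outcomes x. A minimal sketch follows, assuming both inputs are probability lists over the same outcomes and that q is non-zero wherever p is non-zero; the function name is illustrative.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) in nats for two discrete distributions.

    Terms where p is 0 contribute nothing, by the usual convention.
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
```

The measure is not symmetric: `kl_divergence(p, q)` and `kl_divergence(q, p)` generally differ, so it matters which distribution is treated as the reference.
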
Knowledge Sharing
The practice of systematically documenting and distributing behavioral insights, findings, and lessons learned across teams and disciplines.

L

Label Studio
An open-source data labeling platform that supports annotation for text, images, audio, and other modalities used in AI training.
Labeling
Assigning categories, tags, or classifications to data examples to indicate what they represent or how a model should treat them.
LangSmith
A platform from LangChain for tracing, evaluating, and monitoring LLM applications in development and production.
Latency Benchmarking
Measuring how quickly a model produces responses under various conditions to evaluate its suitability for real-time use.
Latency Optimization
Techniques and engineering practices that reduce the time it takes for a model to return a response.
Linguistic Analysis
The systematic study of language features in model outputs — such as vocabulary, syntax, tone, and discourse structure — to understand behavioral patterns.
Literature Review
A systematic survey of existing research and writing on a topic to understand what is already known before designing new work.
LLM-as-Judge
Using a language model to evaluate the quality of another model's outputs, often as a scalable alternative to human review.
Log Analysis
Reviewing records of model interactions to identify patterns, failures, and opportunities for improvement.

M

Meta-Prompt
A prompt designed to generate or improve other prompts, rather than directly produce the final task output.
Model Behavior
The observable outputs, responses, and actions of an AI model as experienced by users and systems interacting with it.
Model Card
A standardized document that describes a model's intended use cases, known limitations, evaluation results, and potential risks.
Model Comparison
The systematic evaluation of two or more models against the same criteria to understand their relative strengths, weaknesses, and behavioral tradeoffs.
Model Judgment
A model's ability to reason through ambiguous or novel situations and arrive at contextually appropriate decisions without explicit instructions for every case.
Model Launch Support
The behavioral work required to prepare a model for public release — including pre-launch evaluation, policy review, red-teaming, and documentation.
Model Playground
An interactive interface for experimenting with model prompts, settings, and outputs without building a full application.
Model Quality
The overall degree to which a model meets the behavioral, accuracy, and user experience standards required for its intended use.
Model Registry
A centralized repository that stores, versions, and tracks trained model artifacts and their associated metadata, including which versions are deployed.
Model Rollout
The process of gradually deploying a new model version or behavioral update to users, typically in staged phases to monitor for unexpected issues.
Model Strategy
The deliberate plan for which AI models to use, how to configure them, and how to sequence improvements to achieve product and organizational goals.
Monitoring
Ongoing observation of a deployed model's behavior over time to detect problems, measure quality, and track changes.
Moral Philosophy
The branch of philosophy concerned with questions about what is right and wrong, what we owe each other, and how to live a good life.
Moral Psychology
The scientific study of how people actually form moral judgments, make ethical decisions, and reason about right and wrong.

N

Normative Ethics
The branch of moral philosophy concerned with establishing principles and frameworks for determining right and wrong action.

O

Output Distribution
The range and relative frequency of different types of responses a model produces across a given set of inputs.
Output Quality
How well a model's response meets the requirements of accuracy, helpfulness, appropriateness, and format for a given task.

P

Policy Gradient
A family of reinforcement learning algorithms that improve a model's behavior by increasing the probability of actions that lead to higher rewards and decreasing the probability of those that lead to lower rewards.
Policy Writing
The craft of authoring clear, principled documents that define what a model will and won't do, and why — serving as guidance for training, evaluation, and deployment decisions.
Pragmatics
The branch of linguistics concerned with how context shapes meaning — what people communicate beyond the literal content of their words.
Preference Learning
A training approach where models learn from comparisons between outputs rather than from single labeled examples.
Preference-Based Evaluation
An evaluation method where raters compare outputs and indicate which they prefer, rather than scoring each output independently.
Product Design Collaboration
The working relationship between behavior architects and product designers to ensure that model behavior and user experience are designed in alignment with each other.
Production Data
Real data generated from actual user interactions with a deployed model, as opposed to synthetic or evaluation data.
Prompt Chaining
A technique where the output of one model call is fed as input into a subsequent call, breaking a complex task into sequential steps.
Prompt Engineering
The practice of crafting and refining the text given to a model — instructions, examples, context — to reliably produce desired outputs.
Prompt Injection
An attack where malicious instructions embedded in user input or external content override a model's intended behavior.
Prompt Robustness
The degree to which a prompt continues to produce reliable, appropriate outputs even when inputs vary, are ambiguous, or are adversarial.
Prompt Sensitivity
The degree to which small changes in wording, format, or phrasing affect model outputs in significant ways.
Promptfoo
An open-source tool for testing and evaluating LLM outputs, focused on comparing prompts and detecting regressions.
Prototyping
Building quick, low-fidelity versions of a behavioral design to test assumptions and learn before committing to a full implementation.

Q

Qualitative Analysis
Close, interpretive review of model outputs and interactions to understand nuanced behavioral patterns that numbers alone don't capture.
Qualitative Research
A research approach that seeks to understand phenomena through detailed, interpretive analysis rather than numerical measurement.
Quality Metrics
Specific, measurable indicators used to assess how well a model is performing across dimensions that matter for a given use case.
Quantitative Metrics
Numerical measurements used to track model performance across dimensions like accuracy, refusal rate, response length, or user satisfaction.
Quantitative Research
A research approach that measures phenomena numerically and uses statistical analysis to identify patterns and test hypotheses.

R

Reasoning Chain
The sequence of intermediate steps a model uses to work through a problem before arriving at a final answer.
Red-Teaming
Deliberately attempting to find failure modes, safety vulnerabilities, and policy violations in a model by acting as an adversarial user.
Refusal Behavior
The patterns and decisions behind when and how a model declines to fulfill a request.
Regression Identification
The process of detecting when a change to a model or system has caused previously acceptable behavior to degrade.
Regression Testing
Running a consistent set of test cases after a change to verify that previously working behavior hasn't broken.
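
A regression suite can be sketched as a fixed list of prompts, each paired with a check that the output must pass. Everything named here is hypothetical: `run_regression_suite`, the case format, and `stub_model`, which stands in for a real model call.

```python
def run_regression_suite(model_fn, cases):
    """Return the cases whose current output no longer passes its check."""
    failures = []
    for case in cases:
        output = model_fn(case["prompt"])
        if not case["check"](output):
            failures.append({"prompt": case["prompt"], "output": output})
    return failures


# Hypothetical stand-in for a real model call, used only for illustration.
def stub_model(prompt):
    return "I can't help with that." if "password" in prompt else "Paris"


cases = [
    {"prompt": "What is the capital of France?", "check": lambda out: "Paris" in out},
    {"prompt": "Tell me my neighbor's password.", "check": lambda out: "can't" in out},
]
```

Calling `run_regression_suite(stub_model, cases)` before and after a change shows which previously passing cases have started to fail.
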
Reinforcement Learning
A machine learning approach where a model learns by receiving rewards or penalties based on the quality of its actions.
Research Iteration
The cyclical process of formulating questions, running experiments, analyzing results, and using findings to inform the next round of investigation.
Responsible AI
A framework and practice for developing and deploying AI systems in ways that are safe, fair, transparent, and accountable.
Reward Hacking
When a model finds ways to score well on a reward signal without actually achieving the underlying goal the reward was meant to measure.
Reward Modeling
Training a separate model to predict human preferences so it can be used to score outputs during reinforcement learning.
RLAIF (Reinforcement Learning from AI Feedback)
A variation of RLHF where another AI model provides the preference judgments instead of human raters.
RLHF (Reinforcement Learning from Human Feedback)
A way of training AI models by having humans rate or compare outputs, then using those ratings to reinforce better behavior over time.
Root Cause Analysis
A structured investigation to identify the underlying reason a failure occurred, rather than treating only its surface-level symptoms.

S

Semantics
The study of meaning in language — what words, phrases, and sentences refer to and signify.
Semi-Automated Evaluation
An evaluation approach that combines automated scoring with human review at key decision points.
Sensitive Topics
Subject areas that require extra care in handling because of their potential to cause harm, offense, or controversy — such as mental health, politics, religion, and violence.
Signal-to-Noise Analysis
The practice of separating meaningful patterns or indicators of behavioral problems from irrelevant or random variation in data.
Stakeholder Communication
Translating behavioral findings, tradeoffs, and recommendations into clear language for audiences outside the behavior team, including leadership, engineering, legal, and product.
Steerability
How easily and reliably a model's behavior can be adjusted through prompts, instructions, or system-level configuration.
Supervised Finetuning
A type of finetuning where the model learns from a dataset of input-output pairs that represent the desired behavior.
Sycophancy
A tendency in AI models to agree with users, validate their views, or shift their answers to match what they think the user wants to hear, rather than providing accurate or honest responses.
Synthetic Data
Training or evaluation data generated by a model rather than collected from real human interactions.
Synthetic Testing Environment
A controlled, artificially constructed environment for evaluating model behavior, separate from real production usage.
System Prompt
Instructions provided to a model at the start of a session, before any user input, that establish its role, behavior, and constraints.

T

Task Decomposition
Breaking a complex task into smaller, more manageable subtasks that can be addressed sequentially or in parallel.
Test-Driven Development (for AI)
An approach to AI product development where evaluation criteria and test cases are defined before prompts or models are changed, so success can be measured objectively.
Tone
The emotional register and relational quality of a model's responses — whether it comes across as warm, formal, playful, cautious, authoritative, and so on.
Tool Prompt
Instructions that describe available tools or functions to a model, telling it when and how to use them.
Training Signal
Any information fed back to a model during training to indicate whether its behavior is on the right track.
Trust and Safety
The organizational function responsible for protecting users and the platform from harm — including abuse, policy violations, and misuse of AI capabilities.
Trust and Safety Team
The organizational team responsible for detecting, preventing, and responding to harmful or policy-violating uses of an AI product.

U

Usage Policy
A set of rules, broader than a content policy, governing how a model or AI product may and may not be used, often focused on prohibited applications rather than individual outputs.
User Feedback
Explicit or implicit signals from users that indicate whether they found a model response helpful, harmful, or otherwise notable.

V

Value Alignment
The degree to which a model's behavior reflects human values, intentions, and goals rather than optimizing for narrow objectives that miss the point.
Verbosity
The tendency of a model to produce responses that are longer than necessary for the task at hand.
Virtue Ethics
A moral framework that focuses on character rather than rules or outcomes — asking what a person of good character would do.

Z

Zero-Shot Prompting
Asking a model to complete a task using only instructions, with no examples provided.