Glossary
Data Guideline Authorship
Writing clear, detailed instructions for annotators that define what good and bad model responses look like and how to evaluate them consistently.
Data guidelines are the operational layer between a behavioral specification and actual labeled data — they translate abstract goals (“be helpful but avoid harm”) into concrete, answerable annotation questions (“given this user message, rate this response from 1 to 5 on helpfulness, where 1 means…”). Writing effective guidelines requires anticipating how annotators will interpret instructions, providing worked examples for both clear cases and common ambiguities, and testing the guidelines in a calibration round before full annotation begins. For behavior architects, guideline authorship is often where the most nuanced behavioral thinking happens: the gaps and ambiguities that surface during guideline writing reveal genuine philosophical questions about what the model should value and how it should reason.