Glossary
Content Policy
A documented set of rules governing what types of content a model will and will not produce.
A content policy defines the guardrails of a model’s output: categories of content it won’t generate (such as sexual content involving minors, detailed instructions for mass-casualty weapons, or targeted harassment), categories it handles with care (such as adult content, medical advice, or legal guidance), and conditions under which normally restricted content may be appropriate (such as a medical platform asking about drug interactions). Well-written content policies are specific enough to be applied consistently but principled enough to guide decisions about cases the authors didn’t anticipate. For behavior architects, the content policy is a core document that shapes annotation guidelines, system prompt design, and evaluation criteria: it is the explicit articulation of where the behavioral lines are and why.
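The three tiers described above (hard lines, handle-with-care categories, and context-dependent exceptions) can be sketched as a small lookup structure. This is a minimal illustration, assuming a policy can be reduced to named content categories and named deployment contexts; the `ContentPolicy` class, the `Handling` levels, and every category and context name are hypothetical and not taken from any real policy or library.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum


class Handling(Enum):
    PROHIBITED = "prohibited"  # never generated, regardless of context
    SENSITIVE = "sensitive"    # generated with care (caveats, narrower scope)
    ALLOWED = "allowed"        # no special handling required


@dataclass
class ContentPolicy:
    # Hypothetical structure: real policies are prose documents, not lookup tables.
    prohibited: set[str] = field(default_factory=set)
    sensitive: set[str] = field(default_factory=set)
    # Maps a sensitive category to the deployment contexts that relax it.
    exceptions: dict[str, set[str]] = field(default_factory=dict)

    def classify(self, category: str, context: str | None = None) -> Handling:
        # Hard lines are checked first, so no context can unlock them.
        if category in self.prohibited:
            return Handling.PROHIBITED
        if category in self.sensitive:
            # A listed deployment context relaxes normally restricted content.
            if context is not None and context in self.exceptions.get(category, set()):
                return Handling.ALLOWED
            return Handling.SENSITIVE
        return Handling.ALLOWED


# Illustrative policy; all names are hypothetical placeholders.
policy = ContentPolicy(
    prohibited={"sexual_content_involving_minors", "mass_casualty_weapons", "targeted_harassment"},
    sensitive={"adult_content", "medical_advice", "legal_guidance"},
    exceptions={"medical_advice": {"medical_platform"}},
)

print(policy.classify("medical_advice"))                       # general deployment
print(policy.classify("medical_advice", "medical_platform"))   # relaxed by context
print(policy.classify("mass_casualty_weapons", "medical_platform"))
```

The design choice worth noting is ordering: prohibited categories are evaluated before any exception, so deployment context can only relax the handle-with-care tier, never a hard line. A real policy would also need the principled reasoning that lets reviewers decide categories the table never lists.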