Harmlessness | behavior.engineering

Harmlessness is the safety pillar of the HHH (helpful, honest, harmless) framework: a harmless model avoids producing content that could injure people, facilitate harmful actions, discriminate unfairly, or be weaponized against vulnerable users. The difficulty is that “harmless” is not a bright line. The same information can be benign in one context (a nurse asking about medication overdoses) and potentially harmful in another (an anonymous user asking the same question in a concerning way). Over-indexing on harmlessness incurs the alignment tax: a model so cautious that it is useless. For behavior architects, the goal is harm avoidance proportionate to actual risk, not reflexive rejection of anything that could conceivably be misused.