Output quality is an umbrella term for whether what the model produced is actually good — not just technically correct but genuinely useful, appropriately scoped, well-formatted, and fitting in tone. Quality is always relative to context: a great response for a casual creative writing prompt might be entirely wrong for a medical question requiring precision and caveats. For behavior architects, output quality is the ultimate measure of whether all the upstream work — the spec, the training data, the prompts, the evaluations — has actually produced a model that serves users well. Evaluating output quality rigorously, rather than relying on gut feel, is what separates systematic improvement from random iteration.