The context window is the model’s working memory for a given conversation. Everything the model can draw on when it responds (the system prompt, the earlier turns of the conversation, retrieved documents, tool outputs, and the current message) has to fit within a fixed token limit. Modern models have dramatically expanded that limit, from a few thousand tokens a few years ago to hundreds of thousands today, but longer contexts aren’t free. Models can lose track of information buried in the middle of a very long context, and longer inputs cost more to process. For behavior architects, context window design is therefore a key architectural decision: what to include, how to order it, and what to leave out when space is limited all significantly affect the quality and cost of model responses.
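To make the include, order, and omit decisions concrete, here is a minimal sketch of one way to assemble a context under a token budget. Everything in it is an assumption for illustration: the `Piece` type, the `estimate_tokens` and `assemble_context` helpers, and the rough four-characters-per-token estimate are all hypothetical, and a real system would count tokens with the model's own tokenizer. The sketch keeps pieces in priority order until the budget runs out, then arranges them so the highest-priority material sits at the start and end of the context rather than in the middle.

```python
from dataclasses import dataclass


@dataclass
class Piece:
    """One candidate chunk of context (hypothetical type for this sketch)."""
    name: str
    text: str
    priority: int  # lower number = more important


def estimate_tokens(text: str) -> int:
    # Assumption: roughly 4 characters per token. This is only a crude
    # stand-in for the model's real tokenizer.
    return max(1, len(text) // 4)


def assemble_context(pieces: list[Piece], budget: int) -> list[Piece]:
    """Select pieces by priority within a token budget, then order them so
    the most important ones land at the edges of the context."""
    kept: list[Piece] = []
    used = 0
    for piece in sorted(pieces, key=lambda p: p.priority):
        cost = estimate_tokens(piece.text)
        # Skip anything that would overflow the budget, but keep scanning:
        # a smaller, lower-priority piece may still fit.
        if used + cost <= budget:
            kept.append(piece)
            used += cost

    # `kept` is in priority order. Deal pieces alternately to the front and
    # the back, so the least important pieces end up in the middle.
    front: list[Piece] = []
    back: list[Piece] = []
    for i, piece in enumerate(kept):
        (front if i % 2 == 0 else back).append(piece)
    return front + back[::-1]
```

Given a system prompt at priority 0 and the user's question at priority 1, this ordering puts the system prompt first and the question last, with retrieved documents and older history in between. The greedy selection is a deliberate simplification: it never truncates or summarizes a piece that is too large, which a production design would likely want to do.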