Promptfoo | behavior.engineering

Promptfoo is an open-source evaluation tool built for engineers working with language models. Teams define prompts and test cases in configuration files, run them against multiple models or model versions in one evaluation, and get comparison reports showing where outputs differ. Its main strengths are prompt comparison and regression testing: catching cases where a prompt change breaks behavior that previously worked. Because it’s open-source and code-driven, it fits into CI/CD pipelines where automated tests run on every change. For behavior architects working closely with engineering teams, Promptfoo offers a lightweight entry point to systematic evaluation without requiring a full platform investment.
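To make the regression-testing idea concrete, here is a minimal Python sketch of the underlying pattern: run the same test cases against two prompt versions and flag cases that passed before but fail now. This is not Promptfoo's API or configuration format (Promptfoo itself is driven by configuration files). Every name here, including `fake_model`, `run_eval`, and `regressions`, is hypothetical, and the "model" is a deterministic stub standing in for a real LLM call.

```python
# Conceptual sketch only: illustrates prompt regression testing in general,
# not Promptfoo's actual API or config schema. All names are hypothetical.

FRENCH = {"Hello": "Bonjour", "Goodbye": "Au revoir"}


def fake_model(prompt: str) -> str:
    """Deterministic stand-in for an LLM call (assumption for illustration)."""
    text = prompt.split(": ", 1)[1]
    # The stub only "translates" when the prompt names the target language.
    if "French" in prompt:
        return FRENCH.get(text, text)
    return text


# Two prompt versions; v2 accidentally drops the target language.
PROMPTS = {
    "v1": "Translate to French: {text}",
    "v2": "Translate: {text}",
}

# Each test case supplies template variables and a simple substring check.
TEST_CASES = [
    {"vars": {"text": "Hello"}, "expect_contains": "Bonjour"},
    {"vars": {"text": "Goodbye"}, "expect_contains": "Au revoir"},
]


def run_eval(prompts, test_cases, model):
    """Return {prompt_name: [pass/fail per test case]}."""
    results = {}
    for name, template in prompts.items():
        results[name] = [
            case["expect_contains"] in model(template.format(**case["vars"]))
            for case in test_cases
        ]
    return results


def regressions(results, baseline, candidate):
    """Indices of test cases that pass on baseline but fail on candidate."""
    pairs = zip(results[baseline], results[candidate])
    return [i for i, (old, new) in enumerate(pairs) if old and not new]


results = run_eval(PROMPTS, TEST_CASES, fake_model)
print(results)
print("Regressed cases:", regressions(results, "v1", "v2"))
```

In a CI/CD setting, a non-empty regression list would typically fail the build, which is the role an evaluation tool plays when it runs on every change.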