Test with confidence, code with clarity

Your home for mastering Test-Driven Development through real-world katas and expert guidance.

Latest: Evals Are Tests Wearing a Lab Coat

Kata Catalog

Practice TDD with a curated set of real-world katas, each with clear requirements and test cases.

Browse Katas

Guides & References

Learn best practices, naming conventions, and advanced TDD techniques from industry experts.

Explore Guides

Interactive Learning

Discover the TDD Gears model and learn how to apply TDD principles in real-world scenarios.

Learn More

Why TDD Matters More in the AI Era

The bar for "good tests" just moved. Agents made the new bar non-optional.

  • An eval is a test: a named scenario, an assertion, a verdict. The lab coat is new. The structure is not. Treating evals as a separate discipline hides the fact that the team already knows how to do this.
  • 'No single correct output' is a test-design problem, already solved. Disciplined TDD stopped asserting exact strings years ago. Assert on properties, invariants, and domain shape. The non-determinism objection misreads what a good assertion is.
  • A separate eval suite rebuilds the BDD coordination tax. A second specification language, owned by a different role, drifting from the code. The industry deleted feature files and reinvented them as golden datasets.
  • The eval suite should be the test suite, in domain language. Same builders, same vocabulary, same repo, same commit. Model-backed behavior is just another behavior the suite specifies.
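As a rough illustration of asserting on properties rather than exact strings, here is a minimal Python sketch. Everything in it is hypothetical: `RefundSummary`, `summarize_refund`, and the specific properties checked are invented for this example, and the function is a deterministic stand-in for a model-backed call whose wording could vary between runs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RefundSummary:
    """Hypothetical domain type for a customer-facing refund message."""
    order_id: str
    amount_cents: int
    text: str


def summarize_refund(order_id: str, amount_cents: int) -> RefundSummary:
    """Stand-in for a model-backed call (assumption: real wording would vary)."""
    text = f"We have refunded ${amount_cents / 100:.2f} for order {order_id}."
    return RefundSummary(order_id, amount_cents, text)


def test_refund_summary_states_order_and_amount():
    summary = summarize_refund("A-1001", 2599)

    # Assert on properties and invariants, not on one exact sentence.
    assert "A-1001" in summary.text    # names the order
    assert "25.99" in summary.text     # states the refunded amount
    assert len(summary.text) <= 200    # stays short enough to display
    assert summary.amount_cents > 0    # domain invariant: refunds are positive
```

The test keeps passing if the wording changes, as long as the message still names the order, states the amount, and stays within the length limit. A plain test runner such as pytest would be enough to run it; no separate eval harness is assumed.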

Read the argument: Evals Are Tests Wearing a Lab Coat

The Bar Moved

Old bar

Tests exist and pass. Good enough for humans with context.

New bar

Scenario names · builders · domain types · ubiquitous language. The test suite is the interface agents operate against.
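One way the new bar might look in practice is sketched below in Python. The domain (`Order`, `Customer`, `Tier`), the `OrderBuilder`, and the shipping rule are all assumptions made up for illustration, not part of any real codebase. The point is only that the test names read as scenarios and the setup is written in domain vocabulary.

```python
from dataclasses import dataclass, replace
from enum import Enum


class Tier(Enum):
    STANDARD = "standard"
    GOLD = "gold"


@dataclass(frozen=True)
class Customer:
    name: str
    tier: Tier


@dataclass(frozen=True)
class Order:
    customer: Customer
    total_cents: int


class OrderBuilder:
    """Hypothetical test builder: sensible defaults, overrides in domain terms."""

    def __init__(self) -> None:
        self._customer = Customer("Any Customer", Tier.STANDARD)
        self._total_cents = 1000

    def for_gold_customer(self) -> "OrderBuilder":
        self._customer = replace(self._customer, tier=Tier.GOLD)
        return self

    def totalling(self, cents: int) -> "OrderBuilder":
        self._total_cents = cents
        return self

    def build(self) -> Order:
        return Order(self._customer, self._total_cents)


def shipping_cents(order: Order) -> int:
    """Assumed rule for this sketch: gold ships free, others pay a flat 500."""
    return 0 if order.customer.tier is Tier.GOLD else 500


def test_gold_customer_ships_free():
    order = OrderBuilder().for_gold_customer().totalling(4200).build()
    assert shipping_cents(order) == 0


def test_standard_customer_pays_flat_shipping():
    order = OrderBuilder().totalling(4200).build()
    assert shipping_cents(order) == 500
```

A reader, human or agent, can infer the rule from the scenario names and builder calls alone, without opening the implementation.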

TDD Gears

Shift gears based on context, not habit

[Figure: TDD Gears model showing Low, Medium, High, and Reverse gears for test-driven development]

Low Gear

New territory. Build context. Small steps. Learn the shape of the problem before solving it.

Medium Gear

Patterns emerge. Apply design principles. Let the tests guide you toward better abstractions.

High Gear

Known patterns. Follow existing architecture. Move fast because the structure is already proven.

Reverse Gear

Wrong direction. Back up. Delete the test. Try a different approach. This isn't failure — it's steering.