This is the first post in a three-part arc on TDD in the agentic era. It makes the craft case: what good tests actually look like when you’re doing TDD properly. The follow-ups are “BDD Was a Coordination Tax. AI Just Repriced It” (the org case) and “The Bar for TDD Just Moved” (what agentic coding expects from your tests).
BDD wasn’t a discovery. It was a rebrand.
The people doing TDD properly in 2004 were already writing behavior-driven tests. They had scenarios. They had ubiquitous language. They framed tests around behavior, not implementation. They talked to the business. They built tests that read like specifications because they were specifications.
Then the framework camp arrived. They took what good TDDers were already doing, gave it a new name, wrapped it in feature files and step definitions, and sold it back to the industry as a methodology.
The dialect stuck. The discipline didn’t.
What Good TDD Already Had
Read any TDD-era literature from before the Cucumber wave (Beck, Freeman and Pryce, Meszaros, Fowler) and the vocabulary is already there.
- Tests named as scenarios: should_charge_no_shipping_for_loyalty_members_over_fifty_dollars
- Arrange-act-assert as a structural cousin of given-when-then
- Ubiquitous language pulled directly from domain-driven design
- Behavior framing: test what the system does, not what it is made of
- Tests as executable specifications, not afterthought regression nets
None of this was new when BDD got its name. The ideas were mature. The best practitioners were already living them. What the BDD movement added was a label, a tooling ecosystem, and a story about translating between developers and stakeholders.
The label was fine. The tooling introduced a tax nobody needed.
Test Data Builders Are Scenarios
Here’s what a scenario looks like when disciplined developers write it:
var order = anOrder()
    .forCustomer(aLoyaltyMember())
    .containing(twoBooks(), aVinylRecord())
    .shippedTo(california())
    .paidWith(aStoredCard());
var receipt = checkout.process(order);
receipt.shipping.Should().Be(free);
receipt.tax.Should().Be(californiaSalesTax(order.subtotal));
Read that. Out loud.
That is given-when-then. That is a scenario. The customer is a loyalty member, the cart has two books and a record, it ships to California, payment is on file. When we check out, shipping is free and tax follows California rules.
No feature file. No regex. No step definition. No glue code. Just a fluent API that reads like English and compiles like code.
The builder is the scenario grammar. Method names are the ubiquitous language. The test body is the specification.
And unlike Gherkin, every identifier is real. Rename LoyaltyMember and the refactor sweeps the scenario. Add a required field to Order and the compiler tells you which scenarios need updating. The business language evolves in the same commit as the code that implements it.
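For readers who haven’t built one, here is a minimal sketch of the kind of builder that could sit behind anOrder(). Everything in it is an assumption made for illustration: the Customer, Item, and Order records, the Scenarios class, the prices, and the implicit conversion are hypothetical, not the code behind the example above. It also simplifies twoBooks() to repeated aBook() calls and leaves payment out.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical domain types, invented for this sketch.
public sealed record Customer(string Name, bool IsLoyaltyMember);
public sealed record Item(string Sku, decimal Price);
public sealed record Order(Customer Customer, IReadOnlyList<Item> Items, string ShipTo);

// Test data builder: safe defaults, fluent overrides, one Build() at the end.
public sealed class OrderBuilder
{
    private Customer _customer = new("Default Customer", IsLoyaltyMember: false);
    private readonly List<Item> _items = new();
    private string _shipTo = "CA";

    public OrderBuilder forCustomer(Customer customer) { _customer = customer; return this; }
    public OrderBuilder containing(params Item[] items) { _items.AddRange(items); return this; }
    public OrderBuilder shippedTo(string state) { _shipTo = state; return this; }

    public Order Build() => new(_customer, _items.ToArray(), _shipTo);

    // Lets a test hand the builder to anything that expects an Order.
    public static implicit operator Order(OrderBuilder builder) => builder.Build();
}

// Scenario vocabulary. Lower-case method names follow the article's examples.
public static class Scenarios
{
    public static OrderBuilder anOrder() => new();
    public static Customer aLoyaltyMember() => new("Loyal Lee", IsLoyaltyMember: true);
    public static Item aBook() => new("BOOK", 15m);
    public static Item aVinylRecord() => new("VINYL", 30m);
    public static string california() => "CA";
}

public static class BuilderDemo
{
    // The return type triggers the implicit OrderBuilder -> Order conversion.
    public static Order LoyaltyOrder() =>
        Scenarios.anOrder()
            .forCustomer(Scenarios.aLoyaltyMember())
            .containing(Scenarios.aBook(), Scenarios.aBook(), Scenarios.aVinylRecord())
            .shippedTo(Scenarios.california());
}
```

With a using static Scenarios; directive in the test file, the Scenarios. prefix drops away and the call site reads like the example above. Because every name is an ordinary method or type, an IDE rename reaches each scenario that uses it.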
Object Mothers, Builders, Factories
The sophisticated TDD crowd wasn’t stopping at single builders. They were composing worlds.
Growing Object-Oriented Software, Guided by Tests (Freeman and Pryce, 2009) documented test data builders in detail, building on the older Object Mother pattern, years before BDD frameworks went mainstream. The layering looks like this:
- Object Mothers produce canonical instances: Customers.standard(), Customers.loyaltyMember(), Customers.suspended()
- Builders take a mother and let the test tweak only what matters: aLoyaltyMember().withExpiredCard()
- Scenario factories compose full worlds. aCartReadyForCheckout() returns a customer, a cart, a payment method, and the delivery address, all preconfigured, coherent, named for intent
- Domain types sit under all of it: Money, Address, SKU, Quantity. No primitives pretending to be domain concepts.
Stack these and you get scenario composition that Gherkin can’t match. A test says:
var world = aCartReadyForCheckout()
    .withCustomer(aLoyaltyMember().inTheirFirstYear())
    .withPromotion(blackFridayDoubleDiscount());
Three lines. Eight preconditions. Fully typed. Completely refactorable. Every noun is a domain concept, every verb is a behavior, and the whole thing is code the compiler protects.
Try that in a feature file. You’ll end up with twenty Given steps and a step-definition file that nobody wants to own.
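To make the layering concrete, here is one way the four layers could stack. Treat it as a sketch under stated assumptions: the record shapes, the default cart total, the discount rate, and the Worlds and CheckoutWorld names are all hypothetical, chosen only so the three-line scenario above has something to compile against.

```csharp
using System;

// Domain type: amounts are Money, never a bare decimal.
public sealed record Money(decimal Amount, string Currency)
{
    public static Money Usd(decimal amount) => new(amount, "USD");

    public static Money operator +(Money a, Money b)
    {
        if (a.Currency != b.Currency) throw new InvalidOperationException("Currency mismatch");
        return new Money(a.Amount + b.Amount, a.Currency);
    }
}

public sealed record Customer(string Name, bool IsLoyaltyMember, int YearsEnrolled, bool CardExpired);

public sealed record Promotion(string Name, decimal DiscountRate)
{
    public static readonly Promotion None = new("none", 0m);
}

public sealed record CheckoutWorld(Customer Customer, Money CartTotal, Promotion Promotion);

// Object Mother: canonical, named instances.
public static class Customers
{
    public static Customer standard() => new("Sam Standard", false, 0, false);
    public static Customer loyaltyMember() => new("Loyal Lee", true, 3, false);
}

// Builder: starts from a mother's instance and tweaks only what the test cares about.
public sealed class CustomerBuilder
{
    private Customer _customer;
    public CustomerBuilder(Customer start) => _customer = start;

    public CustomerBuilder inTheirFirstYear() { _customer = _customer with { YearsEnrolled = 0 }; return this; }
    public CustomerBuilder withExpiredCard() { _customer = _customer with { CardExpired = true }; return this; }
    public Customer Build() => _customer;
}

// Scenario factory: a coherent world with defaults for everything not mentioned.
public sealed class CheckoutWorldBuilder
{
    private Customer _customer = Customers.standard();
    private Money _cartTotal = Money.Usd(60m);
    private Promotion _promotion = Promotion.None;

    public CheckoutWorldBuilder withCustomer(CustomerBuilder customer) { _customer = customer.Build(); return this; }
    public CheckoutWorldBuilder withPromotion(Promotion promotion) { _promotion = promotion; return this; }
    public CheckoutWorld Build() => new(_customer, _cartTotal, _promotion);
}

public static class Worlds
{
    public static CheckoutWorldBuilder aCartReadyForCheckout() => new();
    public static CustomerBuilder aLoyaltyMember() => new(Customers.loyaltyMember());
    public static Promotion blackFridayDoubleDiscount() => new("Black Friday", 0.20m);
}

public static class WorldDemo
{
    public static CheckoutWorld FirstYearLoyaltyOnBlackFriday() =>
        Worlds.aCartReadyForCheckout()
            .withCustomer(Worlds.aLoyaltyMember().inTheirFirstYear())
            .withPromotion(Worlds.blackFridayDoubleDiscount())
            .Build();
}
```

The test names only the customer and the promotion. The cart total comes from the factory’s defaults, which is the point of the layering: each scenario states what matters to it and inherits the rest.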
Mocks Were Collaboration Scenarios
The London-school TDDers had another piece of this nailed before BDD existed as a term: mocks as behavioral specifications.
var warehouse = Substitute.For<IWarehouse>();
var orders = new OrderService(warehouse, ...);
orders.Confirm(anOrder().containing(twoBooks()));
warehouse.Received().Reserve(Arg.Is<Reservation>(r =>
    r.Items.Count == 2 && r.Priority == Standard));
That test says: when an order is confirmed, the warehouse should be told to reserve the items at standard priority.
That’s a scenario, a behavioral specification, a collaboration contract between two objects, written in code and verified on every run.
BDD frameworks don’t do this well. They push toward end-to-end scenarios driven through a UI or an HTTP boundary because that’s what feature files are ergonomic for. The collaboration-level behavior, which is the design work, happens inside step definitions where nobody looks.
Mock-driven TDD put collaboration scenarios at the center of the design process. The BDD framework camp quietly moved them to the margins and made “outside-in” mean “Selenium-heavy.”
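The same collaboration contract can be sketched without a mocking library, which shows how little machinery the idea needs. This version swaps the NSubstitute substitute for a hand-rolled recording double. The Reservation shape, the Priority enum, and the string item identifiers are assumptions made for the sketch, not the types behind the example above.

```csharp
using System;
using System.Collections.Generic;

public enum Priority { Standard, Expedited }

// Hypothetical message passed between the two collaborators.
public sealed record Reservation(IReadOnlyList<string> Items, Priority Priority);

public interface IWarehouse
{
    void Reserve(Reservation reservation);
}

// Object under test: confirming an order tells the warehouse to reserve stock.
public sealed class OrderService
{
    private readonly IWarehouse _warehouse;
    public OrderService(IWarehouse warehouse) => _warehouse = warehouse;

    public void Confirm(IReadOnlyList<string> items) =>
        _warehouse.Reserve(new Reservation(items, Priority.Standard));
}

// Hand-rolled test double that records every call it receives.
public sealed class RecordingWarehouse : IWarehouse
{
    public List<Reservation> Received { get; } = new();
    public void Reserve(Reservation reservation) => Received.Add(reservation);
}

public static class CollaborationDemo
{
    // Arrange and act; the caller asserts on what the warehouse was told.
    public static RecordingWarehouse ConfirmTwoBooks()
    {
        var warehouse = new RecordingWarehouse();
        var orders = new OrderService(warehouse);
        orders.Confirm(new[] { "book-1", "book-2" });
        return warehouse;
    }
}
```

A test then asserts that Received holds exactly one reservation with two items at Priority.Standard. A mocking library removes the boilerplate of writing the double by hand, but the specification is the same: what one object must say to another, and when.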
What Gherkin Actually Adds
Strip Cucumber down to its contribution and here’s what’s left:
- A text file format for scenarios
- A regex engine that maps scenario steps to functions
- A reporting layer that restates the text file after the run
That’s it. None of those are behaviors. They’re translation layers.
Every step of that translation has a cost:
- Feature files live outside the type system, so the compiler can’t help. Rename a concept in code and the feature file drifts until someone runs it and it breaks.
- Step definitions become a parallel codebase of helpers, setup routines, and “just enough” logic. Nobody owns them. Nobody refactors them. They rot.
- Regex step matching turns naming conflicts into runtime mysteries: two scenarios that read nearly identically can bind to the same step or different ones depending on word order.
- World setup gets duplicated across steps because there’s no natural composition primitive, so you get “Given I have a customer with...” in fifteen variants, each with its own setup path.
The promise was that stakeholders would read and write the scenarios. In practice, they read some of them, once, during a kickoff meeting, and then developers maintain the feature files forever as a second specification that has to be kept in sync with the code.
You now have two codebases. One of them doesn’t compile.
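The regex-binding problem described above is easy to reproduce in miniature. The registry below is a toy, not how any particular framework is implemented: the StepRegistry class, its patterns, and its handler names are invented for the sketch, and real runners differ in how they resolve or report overlapping matches.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Toy step registry: maps regex patterns to handler names, roughly the way
// a Gherkin runner maps step text to step-definition methods.
public static class StepRegistry
{
    private static readonly (string Pattern, string Handler)[] Steps =
    {
        (@"^a cart containing (.*)$", "GivenACartContaining"),
        (@"^a cart containing (\d+) books$", "GivenACartContainingBooks"),
        (@"^a loyalty member$", "GivenALoyaltyMember"),
    };

    // Returns every handler whose pattern matches the step text.
    public static List<string> Match(string stepText) =>
        Steps.Where(step => Regex.IsMatch(stepText, step.Pattern))
             .Select(step => step.Handler)
             .ToList();
}
```

Here Match("a cart containing 2 books") returns both cart handlers, Match("a cart containing two books and a vinyl record") returns only the first, and Match("a rewards member") returns none because the step text was renamed and the pattern wasn’t. None of the three outcomes is visible to a compiler. They surface only when the scenarios run.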
Side-by-Side
Same scenario. Gherkin first:
Feature: Loyalty shipping
  Scenario: Loyalty members get free shipping over fifty dollars
    Given a loyalty member
    And a cart containing two books and a vinyl record
    And a shipping address in California
    And a stored payment method
    When the customer checks out
    Then shipping should be free
    And tax should equal California sales tax on the subtotal
Plus step definitions:
[Given(@"a loyalty member")]
public void GivenALoyaltyMember() { ... }
[Given(@"a cart containing (.*)")]
public void GivenACartContaining(string items) { ... }
// ... six more, each with its own world-setup plumbing
Builder-driven TDD:
[Fact]
public void Loyalty_members_get_free_shipping_over_fifty_dollars()
{
    var order = anOrder()
        .forCustomer(aLoyaltyMember())
        .containing(twoBooks(), aVinylRecord())
        .shippedTo(california())
        .paidWith(aStoredCard());
    var receipt = checkout.process(order);
    receipt.shipping.Should().Be(free);
    receipt.tax.Should().Be(californiaSalesTax(order.subtotal));
}
Same scenario. Half the code. Zero glue. Fully typed. Fully refactorable. And it reads cleaner because it doesn’t need a translation layer.
The Language Evolves With the Code
This is the piece the framework crowd never solved.
Builders, factories, mothers, and domain types live in the same repo as the code they describe. They’re refactored by the same tools. They’re versioned in the same commits. When the business renames “loyalty member” to “rewards member,” the domain type changes, the factory changes, and every scenario that touches it changes. They all land in the same pull request.
There is no feature-file drift. There is no step-definition graveyard. There is no meeting to bring the scenarios back in sync with the code.
The ubiquitous language is the code. The scenarios are tests. The tests are the specification. The specification evolves with the system because it’s made of the same stuff the system is made of.
This is what BDD promised. Disciplined TDD delivered it, without the frameworks.
But This Requires Product-Thinking Developers
Here’s the uncomfortable part.
BDD frameworks exist because the industry assumed developers couldn’t or wouldn’t think about behavior, domain, and intent. The framework was scaffolding built around a deficit: if developers won’t name things for business meaning, we’ll make them write scenarios in English first and map them down into code.
Builder-driven TDD assumes the opposite. It assumes developers care about the language of the business. It assumes they’ll name a factory aLoyaltyMemberInTheirFirstYear instead of customer1. It assumes they’ll treat the test API as a product surface, something readers encounter, learn from, and build mental models against. It assumes they’ll push back on domain terminology that’s vague, overloaded, or inconsistent, and work with product to fix it.
That’s product thinking, not test discipline.
If your team treats tests as a chore, no framework saves you. Cucumber will just give you two places where the chore happens.
If your team treats tests as design, as the place where the domain language gets pressure-tested before code gets written, no framework is needed. The tests will be richer, clearer, and more behavior-driven than any Gherkin file.
The Framework Was Scaffolding for a Deficit
That’s the whole story.
BDD frameworks were built for teams that didn’t trust their developers to think about behavior, didn’t invest in building expressive test APIs, and didn’t have the product literacy to name things well. The framework was a workaround: a parallel specification language for people who weren’t writing the real one properly.
For teams that do invest in those things, the framework is pure overhead. It adds a translation layer where none is needed, drifts from the code it’s supposed to describe, and pulls behavior specification away from the design activity it should be part of.
Proper TDD has always been behavior-driven. The builders are the scenarios. The factories are the worlds. The mocks are the collaborations. The domain types are the ubiquitous language. The test suite is the specification. The specification is the code.
You don’t need a BDD framework to do BDD. You need developers who think like product people, a test API treated as a product surface, and the discipline to let the language of the business live inside the language of the code.
That’s not a rebrand. That’s the job.
Next in this arc: BDD Was a Coordination Tax. AI Just Repriced It. Why the framework side of BDD is being repriced by role compression and AI, even as the craft case stands on its own.