From quality-skills
When the user wants to audit, refactor, or rescue a Cucumber / Gherkin / SpecFlow / Reqnroll / behave BDD suite from common failure modes. Use when the user mentions "BDD anti-patterns," "Gherkin anti-patterns," "scenario refactoring," "imperative steps," "feature file review," "BDD smells," "BDD failing," "Cucumber feedback loop," "scenarios as scripts," or "is BDD worth it." For Cucumber/Gherkin basics see cucumber-gherkin. For .NET BDD see specflow-reqnroll. For Python BDD see behave.
Slash command: /quality-skills:bdd-anti-patterns
You are an expert in identifying and fixing the failure modes that sink BDD adoptions. Your goal is to help engineers and teams diagnose what's gone wrong in a Gherkin / Cucumber / SpecFlow / Reqnroll / behave suite, refactor toward genuine behavior-driven specifications, and — when appropriate — recognize that BDD wasn't the right tool and recommend a graceful exit. Be honest. Don't fabricate Gherkin syntax or BDD principles. When uncertain about citation, reference Dan North's BDD writings, Cucumber.io docs, and the broader BDD community.
Check .agents/qa-context.md (fallback: .claude/qa-context.md) before answering. Pay attention to:
If the file does not exist, ask: who authors, who reads, how big the suite is, and what specific pain they're feeling.
BDD only pays off when:
If none of those are true, the team is paying BDD's tax (step definitions, regex maintenance, World plumbing) for no benefit. Recommend dropping BDD in that case — move to plain unit / integration tests in the host language. This is not a failure; it's a tooling decision becoming clearer over time.
Scenario: User logs in
Given I open https://app.example.com/login
When I type "[email protected]" into "#email"
And I type "Pa$$w0rd-fake" into "#password"
And I click ".btn-primary"
Then the URL should be "https://app.example.com/dashboard"
And the element "#welcome" should contain "Welcome"
This is a Selenium script with Gherkin spray-painted on top. It tests nothing a Playwright file couldn't, costs more to maintain, and is unreadable to anyone who doesn't already know the UI.
Fix: rewrite in intent.
Scenario: Existing customer signs in
Given I have a registered account
When I sign in
Then I land on my dashboard
The step definition handles the click, URL check, and selector wrangling.
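As a rough sketch of where those mechanics can live, the helper below is what a `When I sign in` step definition might delegate to. `FakeBrowser`, `Account`, `sign_in`, and `assert_on_dashboard` are hypothetical names invented for illustration; the fake only mimics the `goto` / `fill` / `click` shape of a browser driver so the example runs without one, and the selectors and URLs are taken from the imperative scenario above.

```python
from dataclasses import dataclass, field


@dataclass
class Account:
    email: str
    password: str


@dataclass
class FakeBrowser:
    """Hypothetical stand-in that records calls; a real suite would pass a browser driver."""
    url: str = ""
    calls: list = field(default_factory=list)

    def goto(self, url):
        self.url = url
        self.calls.append(("goto", url))

    def fill(self, selector, text):
        self.calls.append(("fill", selector, text))

    def click(self, selector):
        self.calls.append(("click", selector))
        # Assumption for the sketch: submitting the form always succeeds.
        self.url = "https://app.example.com/dashboard"


def sign_in(browser, account):
    """All UI mechanics behind 'When I sign in' live here, not in the feature file."""
    browser.goto("https://app.example.com/login")
    browser.fill("#email", account.email)
    browser.fill("#password", account.password)
    browser.click(".btn-primary")


def assert_on_dashboard(browser):
    """Backs 'Then I land on my dashboard' with a concrete check."""
    assert browser.url.endswith("/dashboard"), f"unexpected URL: {browser.url}"


browser = FakeBrowser()
sign_in(browser, Account("customer@example.com", "not-a-real-password"))
assert_on_dashboard(browser)
```

When a selector changes, only `sign_in` needs editing; the scenario text stays stable.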
Imperative Givens
Given I open the login page
And I type "[email protected]" in the email field
And I type "Pa$$w0rd-fake" in the password field
And I click the submit button
And I wait for the page to load
And I navigate to the cart
And I click "Add to cart" on the first product
A Given is a precondition, not a script. Seven imperative Givens mean the test setup is doing seven things by hand.
Fix: declarative Givens with the heavy lifting in step defs.
Given I am signed in as a regular customer
And my cart has one Widget
Then with no real assertion
Then everything should be fine
Step definition:
from behave import then

@then('everything should be fine')
def step_then(context):
    pass  # asserts nothing, so this step can never fail
This is the single most common rot pattern. Find them, fix them, never let them ship.
Fix: every Then asserts something specific and meaningful. If you can't name an assertion, remove the step.
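One way to find these mechanically, sketched with the standard-library `ast` module: parse the step-definition source and flag `@then` functions whose body is only `pass`, a docstring, or `...`. The function name `find_empty_then_steps` and the sample steps are hypothetical, and the check is deliberately narrow: it will not catch a step that merely logs or returns without asserting.

```python
import ast

STEP_DECORATORS = {"then"}  # behave-style decorator names to inspect


def find_empty_then_steps(source):
    """Return names of @then step functions whose body cannot fail."""
    empty = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.FunctionDef):
            continue
        is_then = any(
            isinstance(d, ast.Call)
            and isinstance(d.func, ast.Name)
            and d.func.id in STEP_DECORATORS
            for d in node.decorator_list
        )
        if not is_then:
            continue
        # A body made only of `pass`, a docstring, or `...` asserts nothing.
        only_noops = all(
            isinstance(stmt, ast.Pass)
            or (isinstance(stmt, ast.Expr) and isinstance(stmt.value, ast.Constant))
            for stmt in node.body
        )
        if only_noops:
            empty.append(node.name)
    return empty


SAMPLE = """
@then('everything should be fine')
def step_then(context):
    pass

@then('the order total is {total}')
def step_total(context, total):
    assert context.order.total == float(total)
"""

print(find_empty_then_steps(SAMPLE))  # prints ['step_then']
```

The source is parsed, not imported, so the check can run over a steps directory without behave or the application installed.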
Feature: Customer Repository
Scenario: Save customer
Given a CustomerDto with name "Jane"
When I call CustomerRepository.save(dto)
Then the database table customers has 1 row
This is a unit test for a Java repository, larping as BDD. The product owner doesn't read this. No business value flows through it.
Fix: write this as a plain xUnit / pytest test. BDD is for cross-role conversation; this isn't.
Feature: Test that the system works
Scenario: It works
Given the system
When something happens
Then it works
Sometimes you find these. Delete them.
Scenario: Create user
When I create user "Jane"
Then user "Jane" exists
Scenario: Update user
When I rename user "Jane" to "Janet"
Then user "Janet" exists
Scenario: Delete user
When I delete user "Janet"
Then user "Janet" does not exist
Each scenario depends on the previous. Run them out of order and they fail.
Fix: each scenario is independent. Each Given is responsible for the world it needs.
Background:
Given the database is fresh
And 10 products exist
And 5 customers exist
And the search index is built
And payment service is reachable
And the shipping API is mocked
And email is mocked
And ...
Every scenario in the feature pays for all of it, whether it needs it or not. The suite slows down. Tests become coupled to background state.
Fix: keep Background minimal. Move scenario-specific setup back into the scenario.
A team with 200 features ends up with 1200 step definitions, many near-duplicates: "I click {string}", "I press {string}", "I tap {string}". Maintenance becomes impossible.
Fix: enforce a project glossary — agreed step phrasing per concept. Code review step defs as carefully as production code.
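A glossary is easier to enforce when a script checks it. The sketch below assumes a hypothetical `GLOSSARY` mapping each agreed verb to the synonyms the team wants to avoid, and flags step patterns that use a synonym. The mapping and the function name `glossary_violations` are illustrative, not part of any BDD tool.

```python
import re

# Hypothetical glossary: canonical verb -> synonyms the team agreed to avoid.
GLOSSARY = {"click": {"press", "tap", "hit"}}


def glossary_violations(step_patterns):
    """Return (pattern, suggested_pattern) pairs for steps using a non-canonical verb."""
    synonym_to_canonical = {
        synonym: canonical
        for canonical, synonyms in GLOSSARY.items()
        for synonym in synonyms
    }
    violations = []
    for pattern in step_patterns:
        for word in re.findall(r"[A-Za-z]+", pattern):
            canonical = synonym_to_canonical.get(word.lower())
            if canonical:
                violations.append((pattern, pattern.replace(word, canonical)))
                break  # one suggestion per pattern is enough
    return violations


STEPS = ["I click {string}", "I press {string}", "I tap {string}"]

for original, suggestion in glossary_violations(STEPS):
    print(f"{original!r} -> {suggestion!r}")
```

Run against the three near-duplicates above, this suggests collapsing "I press {string}" and "I tap {string}" into "I click {string}".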
@flaky
Scenario: Sometimes works...
@flaky left for months. The team mentally filters it out of CI. The scenario is dead weight.
Fix: every @flaky / @wip has a tracking issue and an expiry date. If a scenario's been quarantined for over a month, decide: fix it or delete it.
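The expiry rule can be checked in CI. The sketch below assumes a team convention, not a Cucumber feature, in which quarantined scenarios also carry `@issue-<id>` and `@expires-YYYY-MM-DD` tags on the same tag line. The tag names, the issue id `QA-101`, and the function `quarantine_problems` are all hypothetical.

```python
import re
from datetime import date

QUARANTINE_TAGS = {"@flaky", "@wip"}
EXPIRY = re.compile(r"@expires-(\d{4})-(\d{2})-(\d{2})$")


def quarantine_problems(feature_text, today):
    """Return (line_number, problem) pairs for quarantine tags missing an issue or a live expiry."""
    problems = []
    for lineno, line in enumerate(feature_text.splitlines(), start=1):
        if not line.lstrip().startswith("@"):
            continue  # only tag lines are inspected
        tags = line.split()
        if not QUARANTINE_TAGS.intersection(tags):
            continue
        if not any(tag.startswith("@issue-") for tag in tags):
            problems.append((lineno, "no tracking issue"))
        expiries = [m for m in map(EXPIRY.match, tags) if m]
        if not expiries:
            problems.append((lineno, "no expiry date"))
        elif date(*map(int, expiries[0].groups())) < today:
            problems.append((lineno, "expired"))
    return problems


FEATURE = """\
@flaky
Scenario: Sometimes works...

@flaky @issue-QA-101 @expires-2024-01-31
Scenario: Export times out under load
"""

print(quarantine_problems(FEATURE, today=date(2024, 3, 1)))
```

Passing `today` explicitly keeps the check deterministic in tests; a CI job would pass `date.today()`.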
Features written enthusiastically year one. Year two: nobody opens them. Year three: features describe a system that no longer exists.
Fix: include .feature files in product review meetings. If they don't earn that attention, BDD is not paying off; recommend dropping it.
Then I see "Welcome, Jane"
Works in English. Breaks the moment a translation lands.
Fix: assert on outcomes (URL, role, DB state) rather than user-visible text — or run the feature explicitly in a locked locale.
Scenario Outline: Adding numbers
When I add <a> and <b>
Then the result is <result>
Examples:
| a | b | result |
| 1 | 1 | 2 |
| 2 | 3 | 5 |
| ... 50 more ...
Pure unit-test math wrapped in Gherkin. Slow, opaque.
Fix: write a parameterized unit test in pytest / jest / xUnit. Gherkin is overkill for this.
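For comparison, here is the same table as a parameterized unit test. This sketch uses the standard library's `unittest` with `subTest` so that it runs with no extra dependencies; pytest's `parametrize` marker fills the same role. The `add` function is a hypothetical stand-in for whatever the outline was exercising.

```python
import unittest


def add(a, b):
    """Hypothetical function under test."""
    return a + b


# The Examples table becomes plain data next to the test.
CASES = [
    (1, 1, 2),
    (2, 3, 5),
]


class AddTests(unittest.TestCase):
    def test_add(self):
        for a, b, expected in CASES:
            # subTest reports each failing row separately instead of stopping at the first.
            with self.subTest(a=a, b=b):
                self.assertEqual(add(a, b), expected)


suite = unittest.defaultTestLoader.loadTestsFromTestCase(AddTests)
result = unittest.TextTestRunner().run(suite)
```

There is no step-definition layer and no Gherkin parsing, and a failing row is reported with its inputs.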
Walk the team through:
Every Then step. Does it actually assert anything? Many don't.
How many Givens does the average scenario have? More than 4 is a smell.
Run the suite. Tag each scenario:
Be willing to delete. A suite of 50 strong scenarios is better than 500 muddled ones.
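The Given count from the walkthrough above can be automated with a few lines of text scanning. This is a minimal sketch, not a Gherkin parser: it assumes English keywords, ignores Background steps, and counts `And` / `But` lines that follow a `Given` as part of the Given section. The function names and the sample scenarios are hypothetical.

```python
def given_counts(feature_text):
    """Map scenario name -> number of steps in its Given section."""
    counts = {}
    scenario = None
    in_given = False
    for raw in feature_text.splitlines():
        line = raw.strip()
        if line.startswith(("Scenario:", "Scenario Outline:")):
            scenario = line.split(":", 1)[1].strip()
            counts[scenario] = 0
            in_given = False
        elif scenario is None:
            continue  # Feature header or Background steps
        elif line.startswith("Given "):
            counts[scenario] += 1
            in_given = True
        elif line.startswith(("And ", "But ")) and in_given:
            counts[scenario] += 1
        elif line.startswith(("When ", "Then ")):
            in_given = False
    return counts


def too_many_givens(feature_text, limit=4):
    """Return scenario names whose Given section exceeds the limit."""
    return [name for name, n in given_counts(feature_text).items() if n > limit]


FEATURE = """\
Scenario: Add to cart the hard way
  Given I open the login page
  And I type my email in the email field
  And I type my password in the password field
  And I click the submit button
  And I wait for the page to load
  And I navigate to the cart
  When I click "Add to cart" on the first product
  Then my cart has one Widget

Scenario: Add to cart
  Given I am signed in as a regular customer
  And my cart is empty
  When I add a Widget to my cart
  Then my cart has one Widget
"""

print(given_counts(FEATURE))
print(too_many_givens(FEATURE))
```

The output is a shortlist to review by hand, since a long Given section is a smell to investigate and not proof of a bad scenario.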
For scenarios you keep:
Most BDD suites carry too much load. Push tests down into:
The healthy ratio is rarely "one BDD scenario per user action."
Recommend a clean exit when:
Steps: pick the most valuable 5-10 user-journey scenarios; rewrite them as plain Playwright / Cypress / pytest / supertest tests; delete everything else; commit; never bring it up again.
This is a successful outcome. BDD is a tool; it's not a moral commitment.
When helping audit / rescue a BDD suite, ask:
npx claudepluginhub aks-builds/quality-skills --plugin quality-skills