From checkmate-bdd
Use this skill to execute BDD scenarios. It accepts a scenario filename, a feature group name, or "all". On failure, it classifies the error and attempts heal-and-retry using the upstream backend's own prompts and schemas (no API key needed). It reports results with a natural-language summary.
Slash command
/checkmate-bdd:run
Execute scenarios via checkmate. Requires `bdd:stack up` first.
- `bdd:run <scenario-name>`: single scenario (e.g. user-login-success)
- `bdd:run <feature-group>`: all scenarios in a group (e.g. auth)
- `bdd:run all`: full suite

Load tests/e2e/checkmate.config.json. Read:
- plugin_root → <PLUGIN_ROOT>
- bdd.run.max_retries (default: 2)
- bdd.heal.auto_apply_threshold (default: 0.85)
- checkmate.project_name, checkmate.url

curl -sf http://127.0.0.1:8000/health > /dev/null 2>&1 || echo "checkmate not running"
curl -sf http://127.0.0.1:8932/health > /dev/null 2>&1 || echo "playwright-http not running"
If either is not running: "Stack is not running. Run bdd:stack up first. Start it now? [Y/n]"
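The configuration read and the two health checks above can be sketched in Python. This is a minimal illustration, not the plugin's own code: the function names `read_settings`, `is_up`, and `services_down` are hypothetical, and the nested layout of the `bdd` keys is an assumption based on the dotted names in the list.

```python
import urllib.error
import urllib.request

# Service names and URLs are taken from the curl commands above.
SERVICES = {
    "checkmate": "http://127.0.0.1:8000/health",
    "playwright-http": "http://127.0.0.1:8932/health",
}


def read_settings(cfg):
    """Pull the values this skill needs out of the loaded config dict.

    Assumes dotted names such as bdd.run.max_retries map to nested objects.
    """
    bdd = cfg.get("bdd", {})
    return {
        "max_retries": bdd.get("run", {}).get("max_retries", 2),
        "auto_apply_threshold": bdd.get("heal", {}).get("auto_apply_threshold", 0.85),
        "project_name": cfg["checkmate"]["project_name"],
        "url": cfg["checkmate"]["url"],
    }


def is_up(url, timeout=2.0):
    """Return True if the URL answers with a 2xx status, roughly like `curl -sf`."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, timeout, HTTP error status, or malformed URL.
        return False


def services_down(services=SERVICES, probe=is_up):
    """Return the names of services whose health endpoint did not respond."""
    return [name for name, url in services.items() if not probe(url)]
```

If `services_down()` returns a non-empty list, the stack prompt above applies.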
Get test case list:
import json, urllib.request
cfg = json.load(open("tests/e2e/checkmate.config.json"))
base = cfg["checkmate"]["url"]
name = cfg["checkmate"]["project_name"]
projects = json.loads(urllib.request.urlopen(f"{base}/api/projects").read())
proj = next(p for p in projects if p["name"] == name)
cases = json.loads(urllib.request.urlopen(f"{base}/api/test-cases/project/{proj['id']}").read())
Single scenario name: Find cases entry where name == <scenario-name>.
Feature group: Find all .json files under tests/e2e/scenarios/<group>/; match by stem to cases.
all: All cases in the project.
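The three selection rules above could be combined into one helper. This is a sketch under assumptions: `select_cases` is a hypothetical name, each entry in `cases` is assumed to be a dict with a `name` key (as in the lookup code above), and checking for an exact scenario name before a group directory is a choice made here, not something the source specifies.

```python
from pathlib import Path


def select_cases(cases, target, scenarios_dir="tests/e2e/scenarios"):
    """Resolve a bdd:run argument to the list of matching test cases."""
    if target == "all":
        return list(cases)

    # Single scenario: exact match on the case name.
    exact = [c for c in cases if c["name"] == target]
    if exact:
        return exact

    # Feature group: match scenario file stems under the group directory.
    group_dir = Path(scenarios_dir) / target
    if group_dir.is_dir():
        stems = {p.stem for p in group_dir.rglob("*.json")}
        return [c for c in cases if c["name"] in stems]

    return []
```

An empty result means the argument matched neither a registered case nor a group directory.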
Register any unregistered scenarios first:
python3 <PLUGIN_ROOT>/scripts/run-suite.py \
--config tests/e2e/checkmate.config.json \
--scenarios tests/e2e/scenarios/ \
--register-only
For each test case ID, run this loop (max bdd.run.max_retries heal attempts):
Use the checkmate MCP server run_test tool:
- test_case_id: the case ID
- browser: chromium-headless (default)
- retry parameter: retry is handled client-side here, not by the backend

Collect SSE events. Print each step as it streams:
✓ navigate /login
✓ click Login button
✗ assert_text "Welcome" — Element not found: "Welcome"
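The streamed lines above could be produced by a small formatter. This is a sketch only: `format_step` is a hypothetical helper, and the event keys `status`, `action`, `target`, `value`, and `error` are assumptions about the SSE payload rather than the documented checkmate schema.

```python
def format_step(event):
    """Render one step event as a pass/fail line in the style shown above."""
    parts = [event["action"]]
    if event.get("target"):
        parts.append(event["target"])
    if event.get("value"):
        # Values are quoted, as in: assert_text "Welcome"
        parts.append(f'"{event["value"]}"')
    line = " ".join(parts)

    if event.get("status") == "passed":
        return f"✓ {line}"
    return f"✗ {line} — {event.get('error', 'failed')}"
```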
Read ~/.checkmate-bdd/checkmate/agent/nodes/failure_classifier.py.
Find the system prompt and the FailureClassification Pydantic schema.
Produce a FailureClassification from the SSE failure events (step action, target, value, error, screenshot if available) following the system prompt. The schema includes a retryable boolean and a category field.
If retryable == false or the retry budget is exhausted: treat the failure as non-retryable and skip to 4f (recording the test case as failed).
Read ~/.checkmate-bdd/checkmate/agent/nodes/healer.py.
Find the system prompt and the HealSuggestion / HealedStep Pydantic schema. Note the critical target/value format rules in the prompt — follow them exactly.
Produce a HealSuggestion from:
- test_case_name: scenario name
- natural_query: case's natural_query field
- base_url: from config
- original_steps: current steps from the test case
- failed_steps: the failing SSE step events
- page_elements: if a screenshot is available, describe visible elements

The schema includes healed_steps, changed_step_numbers, explanation, and confidence (0.0–1.0).
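The inputs listed above can be gathered into a single dict before producing the suggestion. This is an illustrative sketch: `build_heal_input` is a hypothetical helper, the `name`, `natural_query`, and `steps` keys on the case are assumptions, and `base_url` is passed in explicitly because the source does not say which config key holds it.

```python
def build_heal_input(case, base_url, failed_events, page_elements=None):
    """Collect the healer inputs named in the list above into one dict."""
    return {
        "test_case_name": case["name"],
        "natural_query": case["natural_query"],
        "base_url": base_url,
        "original_steps": case["steps"],
        "failed_steps": list(failed_events),
        # None when no screenshot was available to describe.
        "page_elements": page_elements,
    }
```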
If confidence >= bdd.heal.auto_apply_threshold:
Update the test case with healed steps:
import json, urllib.request
cfg = json.load(open("tests/e2e/checkmate.config.json"))
base = cfg["checkmate"]["url"]
# healed_steps: list of step dicts from the HealSuggestion
# case_id: ID of the test case being healed
healed_steps_json = json.dumps(healed_steps)
# The steps field is sent as a JSON-encoded string inside the request body
body = json.dumps({"steps": healed_steps_json}).encode()
req = urllib.request.Request(f"{base}/api/test-cases/{case_id}",
    data=body, headers={"Content-Type": "application/json"}, method="PUT")
urllib.request.urlopen(req)
Also update tests/e2e/scenarios/<group>/<name>.json with the healed steps.
Decrement the retry budget, then go back to 4a (re-run the test case).
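Writing the healed steps back to the scenario file might look like the following. This is a sketch under a stated assumption: the scenario JSON is taken to be an object with a top-level `steps` list, which the source does not confirm. Both function names are hypothetical.

```python
import json
from pathlib import Path


def with_healed_steps(scenario, healed_steps):
    """Return a copy of the scenario dict with its steps replaced."""
    updated = dict(scenario)
    updated["steps"] = list(healed_steps)  # assumed top-level key
    return updated


def save_healed_scenario(path, healed_steps):
    """Rewrite the scenario file at `path` with the healed steps."""
    path = Path(path)
    scenario = json.loads(path.read_text(encoding="utf-8"))
    updated = with_healed_steps(scenario, healed_steps)
    path.write_text(json.dumps(updated, indent=2) + "\n", encoding="utf-8")
```

Keeping the file and the backend copy in sync means the next run starts from the same healed steps.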
If confidence < bdd.heal.auto_apply_threshold:
Print the diff:
Heal suggestion (confidence: 0.62):
Step 3: assert_text "Welcome" → assert_text "Dashboard"
Reason: page shows "Dashboard" not "Welcome" after login
Apply? [y/n/edit]
If user says y: apply and re-run (go to 4a after updating).
If user says n: record as failure with the explanation.
If user says edit: show full healed UTML, let user edit, then apply and re-run.
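The two confidence branches above reduce to a threshold comparison plus a printed prompt. A minimal sketch follows; `heal_action` and `format_suggestion` are hypothetical names, and representing each change as a `(step_number, old, new)` tuple is an assumption made for illustration.

```python
def heal_action(confidence, threshold=0.85):
    """Decide whether a heal suggestion is applied automatically or shown to the user."""
    return "auto_apply" if confidence >= threshold else "ask_user"


def format_suggestion(confidence, changes, explanation):
    """Build the prompt text in the style of the example above.

    `changes` is a list of (step_number, old_step_text, new_step_text) tuples.
    """
    lines = [f"Heal suggestion (confidence: {confidence:.2f}):"]
    for step_number, old, new in changes:
        lines.append(f"Step {step_number}: {old} → {new}")
    lines.append(f"Reason: {explanation}")
    lines.append("Apply? [y/n/edit]")
    return "\n".join(lines)
```

With the default threshold, a confidence of exactly 0.85 is applied automatically, matching the `>=` comparison in the text.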
After all retries or non-retryable failure: record the test case as failed with the FailureClassification.
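Taken together, the run, classify, heal, and record steps form a bounded loop. The sketch below shows only the control flow. The four callables are hypothetical hooks standing in for the MCP tool call and the prompt-driven steps, and the `status` key on events is an assumption.

```python
def run_with_healing(case_id, run_test, classify, heal, apply_heal, max_retries=2):
    """Run one test case, healing and re-running until it passes or the budget is spent.

    run_test(case_id)       -> list of step event dicts
    classify(failures)      -> dict with at least a "retryable" boolean
    heal(failures)          -> heal suggestion
    apply_heal(case_id, s)  -> True if the suggestion was applied, False if declined
    """
    retries_left = max_retries
    while True:
        events = run_test(case_id)
        failures = [e for e in events if e.get("status") == "failed"]
        if not failures:
            return {"case_id": case_id, "passed": True}

        classification = classify(failures)
        if not classification["retryable"] or retries_left == 0:
            # Non-retryable, or budget exhausted: record the failure.
            return {"case_id": case_id, "passed": False, "classification": classification}

        suggestion = heal(failures)
        if not apply_heal(case_id, suggestion):
            # The user declined the suggestion: record the failure.
            return {"case_id": case_id, "passed": False, "classification": classification}

        retries_left -= 1
```

With `max_retries=2`, a case that never passes is run three times: the initial attempt plus two healed re-runs.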
Read ~/.checkmate-bdd/checkmate/agent/nodes/reporter.py.
Find the system prompt and output structure. Using the full run results (passed cases, failed cases, classification for each failure), produce a summary following that prompt.
Print the summary, followed by component-boundary hints for each failure:
| Failure category | Investigation hint |
|---|---|
| element_not_found | UI layer: check HTML structure, element labels, wait for render |
| assertion_failed | App state: check the API response, DB state, or page content |
| navigation_failed | Routing: check URL patterns, redirects, auth guards |
| network_error | HTTP API layer: check server logs, route handler, DB connection |
| timeout | Timing: add wait_for_page steps, check async loading |
| script_error | JavaScript error: check browser console, React error boundary |
| screenshot_required | Visual assertion: screenshot taken, review manually |
| unknown | Check full stack logs |
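The hint table maps directly onto a dictionary lookup. A minimal sketch, with `hint_for` as a hypothetical helper name. Falling back to the `unknown` hint for unlisted categories is a choice made here, not something the source states.

```python
# Categories and hint wording follow the table above.
HINTS = {
    "element_not_found": "UI layer: check HTML structure, element labels, wait for render",
    "assertion_failed": "App state: check the API response, DB state, or page content",
    "navigation_failed": "Routing: check URL patterns, redirects, auth guards",
    "network_error": "HTTP API layer: check server logs, route handler, DB connection",
    "timeout": "Timing: add wait_for_page steps, check async loading",
    "script_error": "JavaScript error: check browser console, React error boundary",
    "screenshot_required": "Visual assertion: screenshot taken, review manually",
    "unknown": "Check full stack logs",
}


def hint_for(category):
    """Return the investigation hint for a failure category."""
    return HINTS.get(category, HINTS["unknown"])
```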
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Results: 4 passed, 1 failed (18.2s)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
FAILED: user-login-success
Step 3: assert_text "Welcome" — Element not found
Category: element_not_found (non-retryable after 2 heal attempts)
Hint: UI layer — check HTML structure and element labels
Screenshot: tests/e2e/screenshots/user-login-success-step3.png
Exit with error signal if any scenario failed (so CI fails correctly).
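For a script-driven run, the results line and the failure signal could be derived as follows. This sketch assumes that the "error signal" means a non-zero process exit status and that each result is a dict with a boolean `passed` key. All three function names are hypothetical.

```python
import sys


def summarize(results, elapsed_seconds):
    """Build the one-line results header shown in the sample output."""
    passed = sum(1 for r in results if r["passed"])
    failed = len(results) - passed
    return f"Results: {passed} passed, {failed} failed ({elapsed_seconds:.1f}s)"


def exit_code(results):
    """Return 1 if any scenario failed, else 0, so CI can detect failures."""
    return 1 if any(not r["passed"] for r in results) else 0


def finish(results, elapsed_seconds):
    """Print the summary line and terminate with the matching exit status."""
    print(summarize(results, elapsed_seconds))
    sys.exit(exit_code(results))
```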
npx claudepluginhub joshmullikin/checkmate-bdd-plugin --plugin checkmate-bdd