From shesha-auto-testing
Author a new markdown test plan at test-plans/<folder>/<name>.md AND a paired Playwright spec at test-plans/<folder>/<name>.spec.ts with real selectors recorded live via MCP browser. Trigger phrases include "create a test", "create a test plan", "new test for X", "test plan for Y", "add a test that...". The user provides only a title and a numbered list of high-level steps. The skill pulls App URL, environment, and credentials from CLAUDE.md, expands each step into full action steps per test-plans/RULES.md, matches the style of existing plans, AND walks each step against the real app via Playwright MCP to capture the resolved locator before writing the spec. Steps that can't be located after 2 retries fall back to a `// TODO[selector]:` marker for AI-repair on first run. The plan + spec are executed later by the run-test skill — this skill authors only, it does NOT run.
How this skill is triggered — by the user, by Claude, or both
Slash command
/shesha-auto-testing:create-testThe summary Claude sees in its skill listing — used to decide when to auto-load this skill
Author a plan AND its paired Playwright spec with **real selectors recorded live**. Do not run the test as part of the run-test loop — this skill only authors.
Author a plan AND its paired Playwright spec with real selectors recorded live. Do not run the test as part of the run-test loop — this skill only authors.
Ask the user once, using exactly this format, then stop asking:
Title: <feature name, e.g. "Create Service Request">
Folder: <optional — leave blank to auto-pick>
Steps:
1. <high-level step>
2. <high-level step>
3. ...
If their first message already supplies a title + numbered steps, skip the prompt and generate immediately.
Never ask for App URL, credentials, environment, today's date, or file path.
test-plans/ (e.g. "Create Service Request" → service-requests/, "Login" → auth/ if it exists).Read CLAUDE.md → App URL, Environment, Credentials, today's date.
Read test-plans/RULES.md → prefixes, snapshot rule, assertion rules, hybrid execution model (§8).
Read the closest neighbour plan in the target folder. Match its style, depth, and section order exactly. If the folder is brand new, use test-plans/cases/create-case.md as the canonical example.
Verify Admin credentials in CLAUDE.md. The recorded login helper embeds the username and password directly, so empty or placeholder values (<TODO>, change-me, empty) produce a broken spec. If the Admin row is missing or placeholder, stop and tell the user:
No usable Admin credentials in
CLAUDE.md. Run/test-setup(which validates the credentials table), or add a row to the Credentials section manually:| Admin | <username> | <password> |Then re-run this skill.
Do not invent placeholder credentials and continue.
Probe Playwright MCP (headless). Call mcp__playwright__browser_navigate with url: "about:blank". If it errors, stop and tell the user:
Playwright MCP server is not reachable. The recording loop needs it to capture selectors live. Check
.mcp.jsonand ensure the Playwright MCP server is running, then re-run this skill.
Verify headless mode. Run claude mcp list 2>&1 and check the playwright line for the --headless flag. The recording loop must not pop a visible browser window — the whole point of this skill is silent, background recording. If the flag is absent, ask once:
The Playwright MCP is registered without
--headless, so recording will open a visible Chromium window. Re-register it now in headless mode? Runs:claude mcp remove playwright claude mcp add playwright -- npx -y @playwright/mcp@latest --headless
On yes, run both commands, then re-probe (mcp__playwright__browser_navigate with url: "about:blank") before continuing. On no, continue but warn in the finishing reply that the recording ran headed.
Expand the user's bullets into a TC-NN structure per test-plans/RULES.md:
TC-NN block with prefixed actions (NAVIGATE, CLICK, TYPE, SELECT, WAIT, SNAPSHOT, ASSERT, API, EXTRACT).CLAUDE.md.CLICK or TYPE.ASSERT is observable from a snapshot; exactly one (BLOCKING) assertion per critical TC.test-plans/<folder>/<kebab-title>.md. If the file exists, ask before overwriting.After the .md is written, drive the real app via Playwright MCP to capture every selector live, then emit the .spec.ts. This replaces the old "guess + TODO marker" scaffold.
Recording runs headless / in the background. Pre-flight step 6 verifies the Playwright MCP is registered with --headless so no Chromium window pops up while the loop walks the app. Do not call mcp__playwright__browser_resize or any tool that depends on a visible viewport — every selector resolution must work off the accessibility snapshot, not screen coordinates. Status updates to the user are limited to short text lines ("Recording TC-02 step 3 / 7 …") — never imply "watch the browser" in any prompt.
For each TC-NN in the plan, in plan order:
mcp__playwright__browser_navigate to APP_URL (or the URL the first NAVIGATE step specifies).mcp__playwright__browser_snapshot).role=textbox with accessible name matching /username|email|user/i. If multiple textboxes and none match by name, take the first textbox above the password field.mcp__playwright__browser_type the admin username from CLAUDE.md.role=textbox with name=/password/i (or a textbox whose type=password attribute is exposed in the snapshot).mcp__playwright__browser_type the admin password.role=button with name=/sign in|log in|submit/i.mcp__playwright__browser_click it.mcp__playwright__browser_wait_for until network is idle or a known logged-in element appears.loginAsAdmin helper.mcp__playwright__browser_close to reset state, then start the next TC.| Plan prefix | MCP recording step | Emitted spec line |
|---|---|---|
NAVIGATE <url> | _navigate to <url> | await page.goto('<url>'); |
SNAPSHOT — <desc> | none (Playwright auto-waits at runtime) | // SNAPSHOT: <desc> (comment only) |
CLICK <hint> | _snapshot, resolve locator per priority list, _click to verify | await page.<resolved-locator>.click(); |
TYPE <field> with \`` | _snapshot, resolve label/placeholder/textbox locator, _type with <value> to verify | await page.<resolved-locator>.fill('<value>'); |
SELECT <dropdown> — choose <option> | _snapshot, resolve trigger, _click, _snapshot menu, resolve option, _click | Two lines: trigger click then option click |
WAIT for <condition> | none (observable from later steps) | await expect(page.<resolved-locator>).toBeVisible(); if condition names a visible element, else await page.waitForLoadState('networkidle'); |
ASSERT <claim> | _snapshot, resolve, observe value | await expect(page.<resolved-locator>).<matcher>; |
ASSERT (BLOCKING) <claim> | same as ASSERT, but emit // ASSERT (BLOCKING): <claim> above the expect line | same as ASSERT |
API <method> <url> | none — emitted as a network call | await page.request.<method>('<url>'); |
EXTRACT <thing> | _snapshot, resolve, capture text | const <var> = await page.<resolved-locator>.textContent(); |
When the snapshot is in hand, resolve the failing line's element by walking this list until one matches:
role + accessible name matching the hint (case-insensitive substring). Emit page.getByRole('<role>', { name: '<exact accessible name>' }). This is the preferred form — stable across redesigns.label text matching the hint. Emit page.getByLabel('<exact label>'). Use for form inputs with <label for> associations.page.getByText('<exact text>'). Use for non-interactive verifications.data-testid on the element (or its nearest interactive ancestor). Emit page.getByTestId('<id>').[data-testid="grid"] >> button.add). Emit the locator AND a // FRAGILE: <reason> comment one line above. Cap the chain at 3 levels.For each candidate, call the corresponding MCP action to verify it works (_click for buttons/links, _type for fields). If the MCP call fails (ref stale, multiple matches, not visible), drop to the next strategy.
If 2 strategy retries fail to locate an element, emit the heuristic guess with a // TODO[selector]: <hint> marker one line above. Move on — don't block the rest of the recording. AI-repair will resolve the TODO on first /RunTest.
Cap the recording effort per TC at ~2 minutes wall-clock. If a TC takes longer (recording stuck on the same step), fall through to the heuristic emission for the remaining steps with TODO markers. Note the count of fallback markers in the finishing reply.
The emitted .spec.ts template — keep this shape exactly:
// AUTO-RECORDED from test-plans/<folder>/<kebab-title>.md
// The .md plan is canonical. AI-repair will patch failing lines in this file.
// Do not hand-edit unless you are also updating the .md plan.
import { test, expect, Page } from '@playwright/test';
const APP_URL = '<from CLAUDE.md>';
const ADMIN = { user: '<from CLAUDE.md>', password: '<from CLAUDE.md>' };
async function loginAsAdmin(page: Page) {
await page.goto(APP_URL);
// STEP login.1: <verbatim>
await page.<recorded-locator-for-username>.fill(ADMIN.user);
// STEP login.2: <verbatim>
await page.<recorded-locator-for-password>.fill(ADMIN.password);
// STEP login.3: <verbatim>
await page.<recorded-locator-for-submit>.click();
await page.waitForLoadState('networkidle');
}
test.describe('<Plan Title>', () => {
test('TC-01: <TC title>', async ({ page }) => {
// STEP 1: <verbatim plan step>
<action line>
// STEP 2: <verbatim plan step>
<action line>
// ASSERT (BLOCKING) <assertion>
await expect(<resolved-locator>).<matcher>;
});
test('TC-02: <TC title>', async ({ page }) => {
await loginAsAdmin(page);
// ... recorded steps ...
});
});
// STEP N: <verbatim plan step> comment immediately above it. AI-repair uses these as anchors when it has to patch a line.getByRole / getByLabel / getByText first; raw CSS only as the last-resort fallback with a // FRAGILE: comment above it.loginAsAdmin helper at the top is recorded once per spec and reused — do not re-record per TC.test-plans/<folder>/<kebab-title>.spec.ts. If the file exists, ask before overwriting (same prompt as the .md).One line, nothing else:
Created
test-plans/<folder>/<name>.md+<name>.spec.tswith recorded selectors, fallback TODO markers. Run with/RunTest <name>. (If K>0, expect AI-repair to resolve those K lines on first run.)
npx claudepluginhub boxfusion/boxfusion-plugins --plugin shesha-auto-testingProvides CDSS development patterns for drug interaction checking, dose validation, clinical scoring (NEWS2, qSOFA), and alert classification integrated into EMR workflows.