From ai-kit
Proves a test genuinely guards its stated behavior via targeted fault probes (reachability, sensitivity, oracle validity, reliability) instead of trusting coverage. Use during test development or review.
Slash command: /ai-kit:test-health-check (model: sonnet)
Produce compact PROOF that a specific test guards its stated contract — by running a few targeted fault probes around the code under test. This is not mutation testing and emits no mutation score. It answers a sharper question for active development: does this test pass on healthy behavior and reliably fail on a plausible, contract-related break?
A test can execute a line, or even cover an entire method, yet still fail to assert anything meaningful (a "pseudo-tested" method survives having its whole body deleted). And a test that kills mutants can still pin the wrong requirement if its oracle was copied from the implementation. So this skill verifies four independent properties: reachability, sensitivity, oracle validity, and reliability.
$ARGUMENTS may contain a target and flags. The target is a test name, a test/source file, a
diff/branch scope, or a described behavior. If no target is given, default to the affected cases
of the current branch (see Worktree mode).
| Flag | Effect |
|---|---|
| --worktree | Run the whole cycle in an isolated git worktree scoped to affected cases. Production tree is never mutated. Strongly preferred. |
| quick (default) | Current test or changed file: baseline → reachability probe → 2 sensitivity probes → restore. |
| focused | Function/class/endpoint/flow: baseline → contract extraction → 3–7 targeted probes → boundary/failure analysis → suggestions. |
| deep | Critical behavior: broader probe set, repeated runs, test-order variation, mock/fixture review, metamorphic checks, boundary verification. |
| exhaustive | Hand off to a real mutation runner for a true score. NOT the primary mode; only for a full module / CI gate / historical comparison. |
| --fix | After diagnosis, propose and verify a minimal test patch (off by default). |
… ORACLE_UNCLEAR).
… references/levels.md.
… ORACLE_UNCLEAR.
… FLAKY_OR_ORDER_DEPENDENT or surface the broken baseline; do not probe on top of a broken baseline.
… NOT_EXERCISING_TARGET. Restore.
… references/probes.md.
… references/diagnoses.md for the full enum, when each applies, and the proof matrix.

… --worktree)

Before mutating a file, snapshot its exact bytes. After the probe, write the snapshot back and
re-run the test to confirm green. If the file already had uncommitted changes, restore from the
snapshot (not git checkout, which would discard the user's edits). Prefer --worktree to make
this bulletproof.
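The snapshot-and-restore discipline above can be sketched in shell. This is a minimal illustration, not the skill's own implementation: the function name probe_with_restore and its three arguments (file, mutation command, test command) are hypothetical, and the sketch assumes the mutation and the test can each be expressed as a single shell command.

```shell
# Hypothetical helper: probe_with_restore FILE MUTATE_CMD TEST_CMD
# Prints "detected" if the test fails under the mutation, "survived" if it still passes.
probe_with_restore() {
  local file="$1" mutate_cmd="$2" test_cmd="$3"
  local snapshot result="" mutate_rc=0

  # Snapshot the exact bytes before touching the file.
  snapshot=$(mktemp) || return 2
  cp -- "$file" "$snapshot" || { rm -f -- "$snapshot"; return 2; }

  # Apply the temporary fault, then see whether the test notices it.
  bash -c "$mutate_cmd" || mutate_rc=$?
  if [ "$mutate_rc" -eq 0 ]; then
    if bash -c "$test_cmd" >/dev/null 2>&1; then
      result=survived
    else
      result=detected
    fi
  fi

  # Restore from the snapshot, never from git checkout, so uncommitted edits survive.
  if ! cp -- "$snapshot" "$file" || ! cmp -s "$snapshot" "$file"; then
    echo "restore failed; snapshot kept at $snapshot" >&2
    return 2
  fi
  rm -f -- "$snapshot"

  [ "$mutate_rc" -eq 0 ] || { echo "mutation command failed" >&2; return 2; }

  # Re-run the test on the restored file to confirm green.
  bash -c "$test_cmd" >/dev/null 2>&1 || { echo "test is not green after restore" >&2; return 2; }
  echo "$result"
}
```

The sketch does not survive an interrupt between mutation and restore, which is the gap --worktree closes.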
--worktree (and the default no-target invocation) scopes probing to the tests touched by, or
covering, the current branch's changes, and runs the whole cycle in a disposable git worktree —
so temporary production mutations never reach the real tree, even on interruption.
Use the bundled helper scripts/worktree.sh (base branch defaults to main):
# 1. Discover affected files (branch diff vs main + staged + unstaged + untracked)
bash scripts/worktree.sh affected
bash scripts/worktree.sh tests # same, filtered to likely test files
# 2. Create the isolated worktree (HEAD + uncommitted + untracked replayed); capture its path
WT=$(bash scripts/worktree.sh setup)
# 3. Run baseline + all probes inside "$WT" (cd into it for test + edits)
# 4. ALWAYS tear it down — even on failure
bash scripts/worktree.sh cleanup "$WT"
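Steps 2 to 4 above can be wrapped so that teardown runs even when a probe fails. The sketch below is one possible shape, under stated assumptions: with_worktree and the WORKTREE_HELPER override are hypothetical names, and only the setup and cleanup subcommands of scripts/worktree.sh shown above are used.

```shell
# Path to the bundled helper; WORKTREE_HELPER is a hypothetical override hook.
WORKTREE_HELPER="${WORKTREE_HELPER:-scripts/worktree.sh}"

# Hypothetical wrapper: with_worktree CMD [ARGS...]
# The body is a subshell "( ... )", so the cd and the trap stay local to this call.
with_worktree() (
  origin=$(pwd)
  # Resolve the helper to an absolute path before changing directory.
  helper_dir=$(cd "$(dirname "$WORKTREE_HELPER")" && pwd) || exit 2
  helper="$helper_dir/$(basename "$WORKTREE_HELPER")"

  wt=$(bash "$helper" setup) || exit 2
  # Tear the worktree down when the subshell exits, whether CMD passed or failed.
  trap 'cd "$origin" && bash "$helper" cleanup "$wt"' EXIT

  cd "$wt" || exit 2
  "$@"
  exit $?
)
```

A call such as `with_worktree bash -c '<cmd>'` would then run the baseline and probes inside the worktree and return the command's exit status after cleanup.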
Rules for this mode:
- scripts/worktree.sh tests surfaces obvious test files; do the semantic source→test mapping yourself.
- Always cleanup the worktree (wrap the run so cleanup happens even if a probe throws). Never leave orphaned worktrees; git worktree list should be clean afterward.
- --worktree composes with quick / focused / deep.

--fix (opt-in)

Default is diagnose-only. With --fix, propose a MINIMAL test patch that closes the proven gap,
then verify all three: (a) the new/changed test passes on original code, (b) it fails on the
relevant broken variant, (c) the existing suite stays green. This prevents writing a test
tailored to one syntactic mutant. Never auto-apply without these three checks passing.
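The three checks can be expressed as a small gate. This is a sketch under assumptions: verify_fix and its four command arguments are hypothetical, and the broken variant is assumed to be something a pair of shell commands can apply and undo.

```shell
# Hypothetical gate: verify_fix NEW_TEST_CMD APPLY_BREAK_CMD UNDO_BREAK_CMD SUITE_CMD
# Prints "accept" only when checks (a), (b) and (c) all hold.
verify_fix() {
  local new_test="$1" apply_break="$2" undo_break="$3" suite="$4"
  local broken_rc=0

  # (a) The new or changed test passes on the original code.
  bash -c "$new_test" >/dev/null 2>&1 || { echo "reject: fails on original code"; return 1; }

  # (b) It fails on the relevant broken variant. Always undo the break afterwards.
  bash -c "$apply_break" || return 2
  bash -c "$new_test" >/dev/null 2>&1 || broken_rc=$?
  bash -c "$undo_break" || { echo "error: could not undo the break" >&2; return 2; }
  [ "$broken_rc" -ne 0 ] || { echo "reject: passes on broken variant"; return 1; }

  # (c) The existing suite stays green with the patch in place.
  bash -c "$suite" >/dev/null 2>&1 || { echo "reject: existing suite fails"; return 1; }

  echo "accept"
}
```

Keeping (b) separate from (a) is what separates a test that guards the contract from one that merely runs.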
Report the contract, the baseline, each probe with its one-line change and result
(detected | survived), the diagnosis token with findings, and (if requested) the suggested
test change. Use a compact structured block:
test: cancels_unpaid_order
scope: integration
contract:
  description: "Cancelling an unpaid order persists CANCELLED and issues no payment"
  confidence: high
  sources: [test name, public service contract, domain state machine]
baseline: { status: passed, command: "<cmd>", runs: 2 }
probes:
  - { id: reachability, change: "throw at OrderService.cancel", result: detected }
  - { id: remove-persistence, change: "skip repository.save", result: survived }
  - { id: invert-payment-condition, change: "initiate payment for unpaid order", result: detected }
diagnosis:
  status: WEAK_ORACLE
  findings: ["verifies returned status but not persisted state"]
  suggested_test_change: ["reload the order from persistence and assert CANCELLED"]
  confidence: high
State the diagnosis honestly: prefer INCONCLUSIVE or ORACLE_UNCLEAR over a false HEALTHY.
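One way to guard against a false HEALTHY is a mechanical consistency check on the report before it is emitted. The sketch below is illustrative only: check_report is a hypothetical helper, and it assumes, beyond what the report format above states, that a HEALTHY status is never valid while any probe is marked survived.

```shell
# Hypothetical check: check_report FILE
# Flags a report whose status is HEALTHY although at least one probe survived.
check_report() {
  local report="$1"
  if grep -Eq '^[[:space:]]*status:[[:space:]]*HEALTHY[[:space:]]*$' "$report" &&
     grep -Eq 'result:[[:space:]]*survived' "$report"; then
    echo "inconsistent: HEALTHY with a surviving probe"
    return 1
  fi
  echo "consistent"
}
```

The baseline line also carries a status field, but its value is passed, so the HEALTHY pattern does not match it.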
Read these as needed (one level deep):
- references/probes.md: the probe catalog (effect removal, decision inversion, boundary, result error, failure-path, metamorphic) with language-agnostic examples. Read when choosing probes.
- references/levels.md: unit / integration / e2e specifics, dual observation for e2e, and which diagnoses apply per level. Read when determining what to probe at a given level.
- references/diagnoses.md: the full diagnosis enum, when each applies, and the proof matrix (healthy-code vs broken-code expectations). Read when classifying the result.

npx claudepluginhub ivklgn/ai-kit