Test-driven skill development toolkit for Claude Code
npx claudepluginhub patanet7/skillproof
Create, test, evaluate, and optimize skills through a 6-phase TDD process with pressure testing, output comparison, and description optimization.
A Claude Code plugin for building skills that actually work. Test-driven, with real evidence.
You tell Claude what skill you want. It runs agents without the skill, watches them fail, writes the skill to fix those failures, then proves the skill works by running agents again. No guessing.
# Clone and point Claude at it
git clone https://github.com/patanet7/skillproof.git
claude --plugin-dir /path/to/skillproof
All workspace output goes to /tmp/skillproof/, never inside your project repo. Skills create workspace directories on demand, with no hooks and no environment variables. On most systems the OS clears /tmp on reboot, so no manual cleanup is needed.
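To picture what "on demand" means here, a minimal sketch follows. Only the /tmp/skillproof/ root comes from this README; the function name `make_workspace` and the one-subdirectory-per-skill layout are assumptions for illustration, not the plugin's actual code.

```python
from pathlib import Path

# Root taken from the README; the layout beneath it is a hypothetical example.
DEFAULT_ROOT = Path("/tmp/skillproof")


def make_workspace(skill_name: str, root: Path = DEFAULT_ROOT) -> Path:
    """Create (if needed) and return a workspace directory for one skill."""
    workspace = root / skill_name
    # parents=True builds the root on first use; exist_ok=True makes reruns safe.
    workspace.mkdir(parents=True, exist_ok=True)
    return workspace
```

Because the directory is created at the moment it is first needed, nothing has to be set up ahead of time and nothing lands in the project repo.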
To allow Claude to read plugin reference files (e.g., references/best-practices.md, agents/grader.md) without prompting, add this to your ~/.claude/settings.json:
{
"permissions": {
"allow": [
"Read(~/.claude/plugins/cache/**)"
]
}
}
This is a one-time setup. Without it, Claude will prompt for permission each time a skill reads a reference file from the plugin cache.
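If you already have a settings file with other rules in it, the permission needs to be merged rather than pasted over the top. A small sketch of that merge, assuming the settings have been loaded into a plain dict; the helper name `allow_plugin_cache_reads` and the pre-existing rule in the example are hypothetical:

```python
import json

# The rule string is copied from the README snippet above.
RULE = "Read(~/.claude/plugins/cache/**)"


def allow_plugin_cache_reads(settings: dict) -> dict:
    """Return a copy of settings with the plugin-cache read rule added once."""
    merged = json.loads(json.dumps(settings))  # deep copy via JSON round-trip
    allow = merged.setdefault("permissions", {}).setdefault("allow", [])
    if RULE not in allow:  # idempotent: never add the rule twice
        allow.append(RULE)
    return merged


# Hypothetical existing settings with one unrelated rule already present.
existing = {"permissions": {"allow": ["Read(./docs/**)"]}}
print(json.dumps(allow_plugin_cache_reads(existing), indent=2))
```

To apply it for real you would load ~/.claude/settings.json with `json.load`, pass the result through the helper, and write it back with `json.dump`.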
Once loaded, you get six slash commands:
/skillproof:writing-skills # The main one. Start here.
/skillproof:skill-tdd # TDD methodology (RED-GREEN-REFACTOR)
/skillproof:skill-testing-discipline # Pressure testing for rule-enforcement skills
/skillproof:skill-testing-output # A/B output quality comparison
/skillproof:skill-compare # Head-to-head skill comparison
/skillproof:improve-skill # Evidence-based skill improvement
Tell Claude you want to create a skill:
I want to create a skill that teaches agents to always verify their work
before claiming it's done.
Claude loads writing-skills and walks through 6 phases automatically:
Phase 1 — Capture Intent. Claude asks you clarifying questions. What should the skill do? When should it trigger? What type is it? (Discipline skills enforce rules. Workflow skills teach techniques. Reference skills document APIs.)
Phase 2 — Baseline (RED). Claude spawns a subagent, gives it a task without the skill, and watches it fail. This is the critical part. You see exactly what agents do wrong before the skill exists. No hypothesizing — actual observed failures.
Phase 3 — Draft (GREEN). Claude writes a minimal skill that addresses the specific failures it documented. Not a wish list — just enough to fix what it saw break.
Phase 4 — Evaluate. Claude runs the same scenarios with the skill loaded. It grades the results with assertions. Opens a browser-based viewer so you can compare outputs side by side and leave feedback.
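Grading with assertions can be thought of roughly like this. The README does not show the grader's real format, so the `(label, predicate)` shape, the function names, and the sample outputs below are all assumptions made for the sake of a runnable sketch:

```python
from typing import Callable

# An assertion is a (label, predicate) pair; the predicate inspects agent output.
Assertion = tuple[str, Callable[[str], bool]]


def grade(output: str, assertions: list[Assertion]) -> dict[str, bool]:
    """Run every assertion against one agent output."""
    return {label: check(output) for label, check in assertions}


def pass_rate(results: dict[str, bool]) -> float:
    """Fraction of assertions that passed (0.0 when there are none)."""
    return sum(results.values()) / len(results) if results else 0.0


# Hypothetical assertions for a "verify before claiming done" skill.
assertions: list[Assertion] = [
    ("ran the tests", lambda out: "pytest" in out),
    ("reported evidence", lambda out: "passed" in out),
]

baseline = "I updated the function. Done."                     # without the skill
with_skill = "Ran pytest: 12 passed. The change is verified."  # with the skill

print(pass_rate(grade(baseline, assertions)))    # 0.0
print(pass_rate(grade(with_skill, assertions)))  # 1.0
```

Running the same assertions over the baseline and the with-skill output is what turns "it seems better" into a number you can compare.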
Phase 5 — Refine (REFACTOR). Based on your feedback, Claude tightens the skill. For discipline skills, this means capturing new rationalizations agents use to wiggle out of rules and adding explicit counters. For workflow skills, it means improving output quality. Repeat phases 4-5 until you're happy.
Phase 6 — Optimize & Deploy. Claude generates 20 test queries to evaluate whether the skill's description triggers correctly (fires when it should, stays quiet when it shouldn't). Runs an optimization loop, picks the best description by test score, validates the structure, and it's ready to ship.
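The "picks the best description by test score" step boils down to scoring each candidate against labelled queries and keeping the winner. The sketch below assumes a toy `keyword_fires` function as a stand-in for the real trigger decision (which Claude makes, not a keyword match); the descriptions, queries, and names are all hypothetical, and only three queries are shown instead of 20:

```python
from typing import Callable

# (query, should_trigger) pairs.
Query = tuple[str, bool]
Fires = Callable[[str, str], bool]


def trigger_accuracy(description: str, queries: list[Query], fires: Fires) -> float:
    """Share of queries where the trigger decision matches the expected label."""
    correct = sum(fires(description, q) == expected for q, expected in queries)
    return correct / len(queries)


def best_description(candidates: list[str], queries: list[Query], fires: Fires) -> str:
    """Pick the candidate with the highest trigger accuracy."""
    return max(candidates, key=lambda d: trigger_accuracy(d, queries, fires))


def keyword_fires(description: str, query: str) -> bool:
    """Toy stand-in: fire if the query shares any 5+ letter word with the description."""
    words = {w for w in description.lower().split() if len(w) >= 5}
    return any(w in words for w in query.lower().split())


queries: list[Query] = [
    ("verify that the migration ran", True),
    ("rename this variable", False),
    ("confirm results before merging", True),
]
candidates = [
    "Use when finishing tasks",
    "Use before claiming work is done to verify results",
]

print(best_description(candidates, queries, keyword_fires))
```

A good description scores on both kinds of query: it fires on the `True` cases and stays quiet on the `False` ones.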
When Claude loads writing-skills, it pulls in the companion skills as needed:
- skill-testing-discipline for pressure testing
- skill-testing-output for A/B comparison
- skill-compare for head-to-head matchups
- skill-tdd for the RED-GREEN-REFACTOR cycle

You don't need to invoke these manually. They compose.
writing-skills — Orchestrator
The main entry point. Guides the full 6-phase flow and routes to the right testing strategy based on skill type.
Comes with:
skill-tdd — TDD methodology
The testing discipline itself. Maps TDD concepts to skill development:
| TDD | Skill creation |
|---|---|
| Write failing test | Run scenario WITHOUT skill, watch agent fail |
| Write minimal code | Write skill addressing those specific failures |
| Watch it pass | Run scenario WITH skill, verify improvement |
| Refactor | Close loopholes, tighten wording, re-test |
Iron law: no skill without a failing test first. Write skill before testing? Delete it. Start over.
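The iron law is easy to express as a guard. This is only an illustration of the rule, not how the plugin enforces it; `SkillRecord` and `write_skill` are hypothetical names:

```python
from dataclasses import dataclass, field


@dataclass
class SkillRecord:
    name: str
    # Failures observed while running the scenario WITHOUT the skill (RED).
    baseline_failures: list[str] = field(default_factory=list)
    body: str = ""


def write_skill(record: SkillRecord, body: str) -> SkillRecord:
    """Refuse to draft a skill until at least one baseline failure is recorded."""
    if not record.baseline_failures:
        raise RuntimeError("Iron law: run the scenario WITHOUT the skill first")
    record.body = body
    return record
```

Calling `write_skill` on a fresh record raises; once a baseline failure has been appended to `baseline_failures`, the same call succeeds.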