By patanet7
Test-driven skill development toolkit for Claude Code. Create, test, evaluate, and optimize skills through a 6-phase TDD process with pressure testing, output comparison, and description optimization.
Use when reviewing, auditing, or improving an existing skill — whether it's not triggering correctly, producing poor outputs, too verbose, or you want to systematically evaluate and enhance its quality
Use when comparing two skills head-to-head on the same tasks to determine which produces better results — for choosing between skill versions, competing approaches, or validating that a rewrite improves on the original
Use when creating, editing, or validating a SKILL.md before deployment using a test-first cycle — covers new skill authorship, updates to existing skills, and pre-deployment validation of any skill artifact
Use when testing skills that enforce rules, require compliance, or have discipline requirements that agents might rationalize away under pressure
Use when evaluating whether a skill measurably improves outputs by running controlled comparisons — with-skill vs without-skill, old version vs new version, or A/B tests across skill variants — where quality is graded on produced artifacts like code, docs, migrations, or commit messages
Own this plugin?
Verify ownership to unlock analytics, metadata editing, and a verified badge. GitHub access is read-only (username + org membership).
Sign in to claimOwn this plugin?
Verify ownership to unlock analytics, metadata editing, and a verified badge. GitHub access is read-only (username + org membership).
Sign in to claimBased on adoption, maintenance, documentation, and repository signals. Not a security audit or endorsement.
A Claude Code plugin for building skills that actually work. Test-driven, with real evidence.
You tell Claude what skill you want. It runs agents without the skill, watches them fail, writes the skill to fix those failures, then proves the skill works by running agents again. No guessing.
# Clone and point Claude at it
git clone https://github.com/patanet7/skillproof.git
claude --plugin-dir /path/to/skillproof
All workspace output goes to /tmp/skillproof/, never inside your project repo. Skills create workspace directories on demand — no hooks, no environment variables. /tmp is cleaned automatically by the OS on reboot.
To allow Claude to read plugin reference files (e.g., references/best-practices.md, agents/grader.md) without prompting, add this to your ~/.claude/settings.json:
{
"permissions": {
"allow": [
"Read(~/.claude/plugins/cache/**)"
]
}
}
This is a one-time setup. Without it, Claude will prompt for permission each time a skill reads a reference file from the plugin cache.
Once loaded, you get six slash commands:
/skillproof:writing-skills # The main one. Start here.
/skillproof:skill-tdd # TDD methodology (RED-GREEN-REFACTOR)
/skillproof:skill-testing-discipline # Pressure testing for rule-enforcement skills
/skillproof:skill-testing-output # A/B output quality comparison
/skillproof:skill-compare # Head-to-head skill comparison
/skillproof:improve-skill # Evidence-based skill improvement
Tell Claude you want to create a skill:
I want to create a skill that teaches agents to always verify their work
before claiming it's done.
Claude loads writing-skills and walks through 6 phases automatically:
Phase 1 — Capture Intent. Claude asks you clarifying questions. What should the skill do? When should it trigger? What type is it? (Discipline skills enforce rules. Workflow skills teach techniques. Reference skills document APIs.)
Phase 2 — Baseline (RED). Claude spawns a subagent, gives it a task without the skill, and watches it fail. This is the critical part. You see exactly what agents do wrong before the skill exists. No hypothesizing — actual observed failures.
Phase 3 — Draft (GREEN). Claude writes a minimal skill that addresses the specific failures it documented. Not a wish list — just enough to fix what it saw break.
Phase 4 — Evaluate. Claude runs the same scenarios with the skill loaded. It grades the results with assertions. Opens a browser-based viewer so you can compare outputs side by side and leave feedback.
Phase 5 — Refine (REFACTOR). Based on your feedback, Claude tightens the skill. For discipline skills, this means capturing new rationalizations agents use to wiggle out of rules and adding explicit counters. For workflow skills, it means improving output quality. Repeat phases 4-5 until you're happy.
Phase 6 — Optimize & Deploy. Claude generates 20 test queries to evaluate whether the skill's description triggers correctly (fires when it should, stays quiet when it shouldn't). Runs an optimization loop, picks the best description by test score, validates the structure, and it's ready to ship.
When Claude loads writing-skills, it pulls in the companion skills as needed:
skill-testing-discipline for pressure testingskill-testing-output for A/B comparisonskill-compare for head-to-head matchupskill-tdd for the RED-GREEN-REFACTOR cycleYou don't need to invoke these manually. They compose.
writing-skills — OrchestratorThe main entry point. Guides the full 6-phase flow and routes to the right testing strategy based on skill type.
Comes with:
skill-tdd — TDD methodologyThe testing discipline itself. Maps TDD concepts to skill development:
| TDD | Skill creation |
|---|---|
| Write failing test | Run scenario WITHOUT skill, watch agent fail |
| Write minimal code | Write skill addressing those specific failures |
| Watch it pass | Run scenario WITH skill, verify improvement |
| Refactor | Close loopholes, tighten wording, re-test |
Iron law: no skill without a failing test first. Write skill before testing? Delete it. Start over.
npx claudepluginhub patanet7/skillproof --plugin skillproofProfessional skill creation with TDD workflow. Features dual-mode (fast/full), behavioral validation, and automated quality gates for 9.0/10+ scores.
建立新技能、修改和改進現有技能、衡量技能效能。用於從零開始建立技能、編輯或優化現有技能、執行評估測試、基準測試效能分析、或優化技能描述以提升觸發準確度
Self-evolving skill engine for Claude Code. Creates, scores, repairs, and hardens skills autonomously through recursive improvement cycles.
Ultimate Claude Code skill creator. Design, scaffold, build, review, evolve, and publish production-grade AI agent skills following the Agent Skills open standard and 3-layer architecture.
Create and manage Claude Code skills, plugins, subagents, and hooks. Use when building new skills, validating existing skills, testing skills empirically, creating plugins, converting projects to plugins, creating hooks, or managing plugin automation. Includes /skills-toolkit:skill-composer, /skills-toolkit:skill-refiner, /skills-toolkit:skill-tester, /skills-toolkit:plugin-creator, /skills-toolkit:subagent-creator, /skills-toolkit:hook-creator, and /skills-toolkit:ask-user-question skills.
Agent Skills for improving SKILL.md files: mine repeated workflows from history, personalize and audit existing skills, or generalize personal skills for publication.