skillproof

Test-driven skill development toolkit for Claude Code

npx claudepluginhub patanet7/skillproof

Plugins: 1

skillproof (v1.1.1, by patanet7). Updated Mar 22, 2026.

Test-driven skill development toolkit for Claude Code. Create, test, evaluate, and optimize skills through a 6-phase TDD process with pressure testing, output comparison, and description optimization.

skillproof

A Claude Code plugin for building skills that actually work. Test-driven, with real evidence.

You tell Claude what skill you want. It runs agents without the skill, watches them fail, writes the skill to fix those failures, then proves the skill works by running agents again. No guessing.

Install

# Clone the repo, then point Claude Code at the cloned directory
git clone https://github.com/patanet7/skillproof.git
claude --plugin-dir ./skillproof

Setup

All workspace output goes to /tmp/skillproof/, never inside your project repo. Skills create workspace directories on demand, with no hooks or environment variables to configure. Most operating systems clear /tmp automatically, either on reboot or on a periodic schedule.
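To make the on-demand behaviour concrete, here is a minimal Python sketch. Only the /tmp/skillproof/ root comes from the text above; the `workspace_for` helper and the one-directory-per-skill layout are assumptions for illustration, not the plugin's actual structure.

```python
from pathlib import Path

# Root taken from the text above; everything below it is a hypothetical layout.
WORKSPACE_ROOT = Path("/tmp/skillproof")


def workspace_for(skill_name: str, root: Path = WORKSPACE_ROOT) -> Path:
    """Create (if needed) and return a workspace directory for one skill."""
    workspace = root / skill_name
    # parents=True creates the root on first use; exist_ok=True makes repeat calls safe.
    workspace.mkdir(parents=True, exist_ok=True)
    return workspace
```

Because the directory is created at the moment it is first needed, nothing has to be set up ahead of time, which matches the "no hooks, no environment variables" claim.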

To allow Claude to read plugin reference files (e.g., references/best-practices.md, agents/grader.md) without prompting, add this to your ~/.claude/settings.json:

{
  "permissions": {
    "allow": [
      "Read(~/.claude/plugins/cache/**)"
    ]
  }
}

This is a one-time setup. Without it, Claude will prompt for permission each time a skill reads a reference file from the plugin cache.
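If you would rather not edit the file by hand, the merge can be scripted. The following is a sketch only: the `allow_plugin_cache_reads` helper is hypothetical, it assumes the settings file is either absent or valid JSON, and it rewrites the file with 2-space indentation. Back up your settings before trying anything like it.

```python
import json
from pathlib import Path

# The rule string is copied from the JSON snippet above.
RULE = "Read(~/.claude/plugins/cache/**)"


def allow_plugin_cache_reads(settings_path: Path) -> dict:
    """Add the read rule to permissions.allow, keeping all existing settings."""
    settings = {}
    if settings_path.exists():
        settings = json.loads(settings_path.read_text())
    allow = settings.setdefault("permissions", {}).setdefault("allow", [])
    if RULE not in allow:  # idempotent: never add the rule twice
        allow.append(RULE)
    settings_path.parent.mkdir(parents=True, exist_ok=True)
    settings_path.write_text(json.dumps(settings, indent=2) + "\n")
    return settings
```

A call such as `allow_plugin_cache_reads(Path.home() / ".claude" / "settings.json")` would target the file named above.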

Once loaded, you get six slash commands:

/skillproof:writing-skills              # The main one. Start here.
/skillproof:skill-tdd                   # TDD methodology (RED-GREEN-REFACTOR)
/skillproof:skill-testing-discipline    # Pressure testing for rule-enforcement skills
/skillproof:skill-testing-output        # A/B output quality comparison
/skillproof:skill-compare               # Head-to-head skill comparison
/skillproof:improve-skill               # Evidence-based skill improvement

How it works

You say what you want. Claude does the rest.

Tell Claude you want to create a skill:

I want to create a skill that teaches agents to always verify their work
before claiming it's done.

Claude loads writing-skills and walks through 6 phases automatically:

Phase 1 — Capture Intent. Claude asks you clarifying questions. What should the skill do? When should it trigger? What type is it? (Discipline skills enforce rules. Workflow skills teach techniques. Reference skills document APIs.)

Phase 2 — Baseline (RED). Claude spawns a subagent, gives it a task without the skill, and watches it fail. This is the critical part. You see exactly what agents do wrong before the skill exists. No hypothesizing — actual observed failures.

Phase 3 — Draft (GREEN). Claude writes a minimal skill that addresses the specific failures it documented. Not a wish list — just enough to fix what it saw break.

Phase 4 — Evaluate. Claude runs the same scenarios with the skill loaded. It grades the results with assertions. Opens a browser-based viewer so you can compare outputs side by side and leave feedback.
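The idea of grading runs with assertions can be sketched in a few lines of Python. This is a simplification under stated assumptions: in the plugin, grading is done by a grader agent, whereas here each assertion is a plain string check, and the `Assertion`, `grade`, and `pass_rate` names are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Assertion:
    name: str
    check: Callable[[str], bool]  # True when the agent output satisfies the assertion


def grade(output: str, assertions: list[Assertion]) -> dict[str, bool]:
    """Evaluate every assertion against one agent output."""
    return {a.name: a.check(output) for a in assertions}


def pass_rate(results: dict[str, bool]) -> float:
    """Fraction of assertions that passed (0.0 when there are none)."""
    return sum(results.values()) / len(results) if results else 0.0


# Illustrative assertions for a "verify your work" skill (made-up transcripts).
assertions = [
    Assertion("ran_tests", lambda out: "pytest" in out.lower()),
    Assertion("reported_result", lambda out: "passed" in out.lower()),
]
baseline = grade("Done! The fix is in.", assertions)
with_skill = grade("Ran pytest: 12 passed. Done.", assertions)
print(pass_rate(baseline), pass_rate(with_skill))
```

Running the same assertions against the baseline and the with-skill output is what turns "the skill seems better" into a comparable number.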

Phase 5 — Refine (REFACTOR). Based on your feedback, Claude tightens the skill. For discipline skills, this means capturing new rationalizations agents use to wiggle out of rules and adding explicit counters. For workflow skills, it means improving output quality. Repeat phases 4-5 until you're happy.

Phase 6 — Optimize & Deploy. Claude generates 20 test queries to evaluate whether the skill's description triggers correctly (fires when it should, stays quiet when it shouldn't). Runs an optimization loop, picks the best description by test score, validates the structure, and it's ready to ship.
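Scoring a description against test queries could look roughly like the Python sketch below. It is an illustration, not the plugin's optimization script: the real check involves running Claude to see whether the skill triggers, while here `fires` is a stand-in predicate, and `trigger_score` and `pick_best` are hypothetical names.

```python
from typing import Callable

# (query, should_trigger) pairs; the text above uses 20 such queries.
TriggerCase = tuple[str, bool]


def trigger_score(fires: Callable[[str], bool], cases: list[TriggerCase]) -> float:
    """Fraction of queries where the skill fired exactly when it should."""
    if not cases:
        raise ValueError("need at least one test query")
    correct = sum(fires(query) == expected for query, expected in cases)
    return correct / len(cases)


def pick_best(candidates: dict[str, Callable[[str], bool]],
              cases: list[TriggerCase]) -> str:
    """Return the name of the candidate description with the highest score."""
    return max(candidates, key=lambda name: trigger_score(candidates[name], cases))
```

Note that a description that fires on every query still scores well on the "should trigger" cases, which is why the test set needs queries where the skill should stay quiet.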

What triggers automatically

When Claude loads writing-skills, it pulls in the companion skills as needed:

Discipline skill detected → loads skill-testing-discipline for pressure testing
Workflow/technique skill detected → loads skill-testing-output for A/B comparison
Comparing two skills → loads skill-compare for head-to-head matchup
Any skill creation → loads skill-tdd for the RED-GREEN-REFACTOR cycle

You don't need to invoke these manually. They compose.
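The routing rules listed above can be written out as a small Python function. This is purely illustrative: in practice Claude follows the skill instructions rather than executing code like this, and the `companion_skills` name and its parameters are assumptions.

```python
def companion_skills(skill_type: str, comparing_two: bool = False) -> list[str]:
    """Return the companion skills loaded for a given skill type."""
    companions = ["skill-tdd"]  # loaded for any skill creation
    if skill_type == "discipline":
        companions.append("skill-testing-discipline")  # pressure testing
    elif skill_type in ("workflow", "technique"):
        companions.append("skill-testing-output")  # A/B comparison
    if comparing_two:
        companions.append("skill-compare")  # head-to-head matchup
    return companions
```

The text above lists no dedicated testing companion for reference skills, so in this sketch they get only `skill-tdd`.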

The six skills

`writing-skills` — Orchestrator

The main entry point. Guides the full 6-phase flow and routes to the right testing strategy based on skill type.

Comes with:

Scripts — description optimization loop, trigger evaluation, benchmark aggregation, structure validation, skill packaging
Agents — grader (assertion evaluation), comparator (blind A/B), analyzer (post-comparison)
Eval viewer — browser-based review UI with side-by-side output comparison and feedback collection
References — authoring best practices, description optimization guide, JSON schemas

`skill-tdd` — TDD methodology

The testing discipline itself. Maps TDD concepts to skill development:

| TDD | Skill creation |
| --- | --- |
| Write failing test | Run scenario WITHOUT skill, watch agent fail |
| Write minimal code | Write skill addressing those specific failures |
| Watch it pass | Run scenario WITH skill, verify improvement |
| Refactor | Close loopholes, tighten wording, re-test |

Iron law: no skill without a failing test first. Write skill before testing? Delete it. Start over.
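One way to picture the iron law is as a gate that refuses a draft until a failing baseline has been recorded. The Python sketch below is a hypothetical model of that rule; the `SkillCycle` class is not part of the plugin.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SkillCycle:
    baseline_failures: list[str] = field(default_factory=list)
    draft: Optional[str] = None

    def record_baseline(self, failures: list[str]) -> None:
        """RED: store the failures observed when running WITHOUT the skill."""
        self.baseline_failures = list(failures)

    def write_skill(self, text: str) -> None:
        """GREEN: only allowed once at least one baseline failure is on record."""
        if not self.baseline_failures:
            raise RuntimeError(
                "no failing baseline recorded; run the scenario without the skill first"
            )
        self.draft = text
```

Calling `write_skill` on a fresh `SkillCycle` raises, which mirrors the "delete it, start over" rule: the draft has nothing to justify it until a failure has been observed.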
