skillproof

Name: skillproof
Author: patanet7

Test-driven skill development toolkit for Claude Code. Create, test, evaluate, and optimize skills through a 6-phase TDD process with pressure testing, output comparison, and description optimization.

npx claudepluginhub patanet7/skillproof --plugin skillproof

Popularity

Stars: median 0, average 285
Installs: median 0, average 1

What's Inside

Skills: 6

improve-skill

/improve-skill

Use when reviewing, auditing, or improving an existing skill — whether it's not triggering correctly, producing poor outputs, too verbose, or you want to systematically evaluate and enhance its quality

skill-compare

/skill-compare

Use when comparing two skills head-to-head on the same tasks to determine which produces better results — for choosing between skill versions, competing approaches, or validating that a rewrite improves on the original

skill-tdd

/skill-tdd

Use when creating, editing, or validating a SKILL.md before deployment using a test-first cycle — covers new skill authorship, updates to existing skills, and pre-deployment validation of any skill artifact

skill-testing-discipline

/skill-testing-discipline

Use when testing skills that enforce rules, require compliance, or have discipline requirements that agents might rationalize away under pressure

skill-testing-output

/skill-testing-output

Use when evaluating whether a skill measurably improves outputs by running controlled comparisons — with-skill vs without-skill, old version vs new version, or A/B tests across skill variants — where quality is graded on produced artifacts like code, docs, migrations, or commit messages

Stats

Version: 1.1.1
Language: Python
Stars: 0
Maintenance: Excellent
License: MIT
Last Commit: Mar 3, 2026
Added: Mar 22, 2026


README

skillproof

A Claude Code plugin for building skills that actually work. Test-driven, with real evidence.

You tell Claude what skill you want. It runs agents without the skill, watches them fail, writes the skill to fix those failures, then proves the skill works by running agents again. No guessing.

Install

# Clone and point Claude at it
git clone https://github.com/patanet7/skillproof.git
claude --plugin-dir /path/to/skillproof

Setup

All workspace output goes to /tmp/skillproof/, never inside your project repo. Skills create workspace directories on demand: no hooks, no environment variables. On most systems the OS clears /tmp on reboot, so old workspaces do not accumulate.
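As a rough illustration of "on demand" directory creation, here is a minimal Python sketch. The helper name `ensure_workspace` and the one-directory-per-skill layout are assumptions made for this example; the plugin's actual layout under /tmp/skillproof/ is not documented here.

```python
from pathlib import Path

# Root taken from the README; everything below it is a hypothetical layout.
DEFAULT_ROOT = Path("/tmp/skillproof")


def ensure_workspace(skill_name: str, root: Path = DEFAULT_ROOT) -> Path:
    """Create the workspace for one skill if it is missing, then return it."""
    workspace = root / skill_name
    # parents=True builds the root as well; exist_ok=True makes repeat calls safe.
    workspace.mkdir(parents=True, exist_ok=True)
    return workspace
```

Calling `ensure_workspace("verify-before-done")` twice returns the same path and never touches the project repo.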

To allow Claude to read plugin reference files (e.g., references/best-practices.md, agents/grader.md) without prompting, add this to your ~/.claude/settings.json:

{
  "permissions": {
    "allow": [
      "Read(~/.claude/plugins/cache/**)"
    ]
  }
}

This is a one-time setup. Without it, Claude will prompt for permission each time a skill reads a reference file from the plugin cache.
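If you already have a settings file, you probably want to merge the rule in rather than overwrite the file. A minimal sketch, assuming the file is valid JSON (or absent); the helper name `add_read_rule` is made up for this example and is not part of the plugin:

```python
import json
from pathlib import Path

# The permission rule quoted in the README.
RULE = "Read(~/.claude/plugins/cache/**)"


def add_read_rule(settings_path: Path, rule: str = RULE) -> dict:
    """Add `rule` to permissions.allow while keeping every other setting."""
    if settings_path.exists():
        settings = json.loads(settings_path.read_text())
    else:
        settings = {}
    allow = settings.setdefault("permissions", {}).setdefault("allow", [])
    if rule not in allow:  # idempotent: a second run changes nothing
        allow.append(rule)
    settings_path.parent.mkdir(parents=True, exist_ok=True)
    settings_path.write_text(json.dumps(settings, indent=2) + "\n")
    return settings
```

Usage would look like `add_read_rule(Path.home() / ".claude" / "settings.json")`. Back up the file first if it holds settings you care about.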

Once loaded, you get six slash commands:

/skillproof:writing-skills              # The main one. Start here.
/skillproof:skill-tdd                   # TDD methodology (RED-GREEN-REFACTOR)
/skillproof:skill-testing-discipline    # Pressure testing for rule-enforcement skills
/skillproof:skill-testing-output        # A/B output quality comparison
/skillproof:skill-compare               # Head-to-head skill comparison
/skillproof:improve-skill               # Evidence-based skill improvement

How it works

You say what you want. Claude does the rest.

Tell Claude you want to create a skill:

I want to create a skill that teaches agents to always verify their work
before claiming it's done.

Claude loads writing-skills and walks through 6 phases automatically:

Phase 1 — Capture Intent. Claude asks you clarifying questions. What should the skill do? When should it trigger? What type is it? (Discipline skills enforce rules. Workflow skills teach techniques. Reference skills document APIs.)

Phase 2 — Baseline (RED). Claude spawns a subagent, gives it a task without the skill, and watches it fail. This is the critical part. You see exactly what agents do wrong before the skill exists. No hypothesizing — actual observed failures.

Phase 3 — Draft (GREEN). Claude writes a minimal skill that addresses the specific failures it documented. Not a wish list — just enough to fix what it saw break.

Phase 4 — Evaluate. Claude runs the same scenarios with the skill loaded. It grades the results with assertions. Opens a browser-based viewer so you can compare outputs side by side and leave feedback.
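To make "grades the results with assertions" concrete, here is a small Python sketch of grading a baseline run and a with-skill run against the same checks. The `Assertion` class, the string checks, and the two transcripts are all invented for illustration; the plugin's grader agent and its schemas may work quite differently.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Assertion:
    """A named pass/fail check applied to an agent transcript (hypothetical)."""
    name: str
    check: Callable[[str], bool]


def grade(transcript: str, assertions: list[Assertion]) -> dict[str, bool]:
    """Run every assertion against one transcript."""
    return {a.name: a.check(transcript) for a in assertions}


def pass_rate(results: dict[str, bool]) -> float:
    """Fraction of assertions that passed (0.0 when there are none)."""
    return sum(results.values()) / len(results) if results else 0.0


ASSERTIONS = [
    Assertion("ran_tests", lambda t: "pytest" in t),
    Assertion("no_unverified_claim", lambda t: "should work" not in t.lower()),
]

# Made-up transcripts standing in for subagent output.
baseline = "I updated the function. It should work now."
with_skill = "I updated the function, ran pytest (12 passed), then reported done."

print(pass_rate(grade(baseline, ASSERTIONS)))    # 0.0
print(pass_rate(grade(with_skill, ASSERTIONS)))  # 1.0
```

The point of running identical assertions on both transcripts is that the difference in pass rate is attributable to the skill, not to a change in the test.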

Phase 5 — Refine (REFACTOR). Based on your feedback, Claude tightens the skill. For discipline skills, this means capturing new rationalizations agents use to wiggle out of rules and adding explicit counters. For workflow skills, it means improving output quality. Repeat phases 4-5 until you're happy.

Phase 6 — Optimize & Deploy. Claude generates 20 test queries to evaluate whether the skill's description triggers correctly (fires when it should, stays quiet when it shouldn't). Runs an optimization loop, picks the best description by test score, validates the structure, and it's ready to ship.
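The idea of picking a description by test score can be sketched in a few lines of Python. Everything here is a stand-in: `would_trigger` uses naive keyword overlap where the real loop would ask the model, and four labelled queries replace the 20 the README mentions.

```python
def _words(text: str) -> set[str]:
    """Lowercased words with surrounding punctuation stripped."""
    return {w.strip(".,?!").lower() for w in text.split()}


def would_trigger(description: str, query: str) -> bool:
    """Stand-in for the model: fire if a long description word appears in the query."""
    keywords = {w for w in _words(description) if len(w) > 4}
    return bool(keywords & _words(query))


def score(description: str, queries: list[tuple[str, bool]]) -> float:
    """Fraction of queries where the trigger decision matches the label."""
    hits = sum(would_trigger(description, q) == expected for q, expected in queries)
    return hits / len(queries)


# (query, should_trigger) pairs, invented for this example.
QUERIES = [
    ("Please verify the migration before you say it is finished", True),
    ("Double-check your work on the parser fix", True),
    ("What is the capital of France?", False),
    ("Rename this variable", False),
]

CANDIDATES = [
    "Use when writing code",
    "Use when asked to verify or double-check work before claiming completion",
]

best = max(CANDIDATES, key=lambda d: score(d, QUERIES))
print(best)
```

Including queries that should not trigger matters as much as the positive ones; a description that fires on everything would otherwise score perfectly.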

What triggers automatically

When Claude loads writing-skills, it pulls in the companion skills as needed:

Discipline skill detected → loads skill-testing-discipline for pressure testing
Workflow/technique skill detected → loads skill-testing-output for A/B comparison
Comparing two skills → loads skill-compare for head-to-head matchup
Any skill creation → loads skill-tdd for the RED-GREEN-REFACTOR cycle

You don't need to invoke these manually. They compose.
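The routing rules above amount to a small lookup. A Python sketch, assuming skill types are passed as plain strings; the function name `companions_for` is hypothetical, and the README lists no extra companion for reference skills, so they get only `skill-tdd` here:

```python
# Mapping taken from the list above; the string keys are an assumption.
COMPANIONS = {
    "discipline": "skill-testing-discipline",
    "workflow": "skill-testing-output",
    "technique": "skill-testing-output",
}


def companions_for(skill_type: str, comparing: bool = False) -> list[str]:
    """Return the companion skills to load for a given skill type."""
    loaded = ["skill-tdd"]  # any skill creation loads the TDD cycle
    if skill_type in COMPANIONS:
        loaded.append(COMPANIONS[skill_type])
    if comparing:
        loaded.append("skill-compare")
    return loaded


print(companions_for("discipline"))
# ['skill-tdd', 'skill-testing-discipline']
```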

The six skills

`writing-skills` — Orchestrator

The main entry point. Guides the full 6-phase flow and routes to the right testing strategy based on skill type.

Comes with:

Scripts — description optimization loop, trigger evaluation, benchmark aggregation, structure validation, skill packaging
Agents — grader (assertion evaluation), comparator (blind A/B), analyzer (post-comparison)
Eval viewer — browser-based review UI with side-by-side output comparison and feedback collection
References — authoring best practices, description optimization guide, JSON schemas
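For readers unfamiliar with blind A/B comparison, the core trick is hiding which output came from which variant until after judging. A minimal Python sketch of that idea follows; the function names and the dict-based labelling are assumptions, not the comparator agent's actual interface.

```python
import random


def blind_pair(with_skill: str, without_skill: str, rng: random.Random):
    """Shuffle two outputs behind neutral labels.

    Returns (labelled, key): `labelled` maps "A"/"B" to output text for the
    judge, and `key` maps "A"/"B" back to the source, kept away from the judge.
    """
    entries = [("with_skill", with_skill), ("without_skill", without_skill)]
    rng.shuffle(entries)
    labelled = {label: text for label, (_, text) in zip("AB", entries)}
    key = {label: source for label, (source, _) in zip("AB", entries)}
    return labelled, key


def unblind(winner_label: str, key: dict[str, str]) -> str:
    """Translate the judge's pick ("A" or "B") back to its source."""
    return key[winner_label]
```

A judge sees only `labelled`, picks "A" or "B", and `unblind` reveals whether the skill actually won. Passing a seeded `random.Random` keeps runs reproducible.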

`skill-tdd` — TDD methodology

The testing discipline itself. Maps TDD concepts to skill development:

| TDD | Skill creation |
| --- | --- |
| Write failing test | Run scenario WITHOUT skill, watch agent fail |
| Write minimal code | Write skill addressing those specific failures |
| Watch it pass | Run scenario WITH skill, verify improvement |
| Refactor | Close loopholes, tighten wording, re-test |

Iron law: no skill without a failing test first. Write skill before testing? Delete it. Start over.
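One way to picture the iron law is as a guard that refuses to accept a skill draft until a baseline failure has been recorded. The class below is a hypothetical Python sketch of that ordering constraint, not code from the plugin:

```python
class SkillTddCycle:
    """Tracks the RED and GREEN steps and enforces their order (illustrative)."""

    def __init__(self) -> None:
        self.baseline_failures: list[str] = []
        self.skill_text: str | None = None

    def record_baseline(self, failures: list[str]) -> None:
        """RED: store failures observed when running WITHOUT the skill."""
        if not failures:
            raise ValueError("RED needs at least one observed failure")
        self.baseline_failures = list(failures)

    def write_skill(self, text: str) -> None:
        """GREEN: accept a draft only after a failing baseline exists."""
        if not self.baseline_failures:
            raise RuntimeError("Iron law: no skill without a failing test first")
        self.skill_text = text


cycle = SkillTddCycle()
cycle.record_baseline(["claimed done without running tests"])
cycle.write_skill("Run the tests before reporting done.")
```

Calling `write_skill` on a fresh `SkillTddCycle()` raises `RuntimeError`, which mirrors the "delete it, start over" rule.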
