skill-bench

Name: skill-bench
Author: yiminnn

Interactive skill authoring bench — create, test, and refine Claude Code skills through conversation

npx claudepluginhub yiminnn/skill-bench-plugin --plugin skill-bench

Popularity

Stars

Above avg

Med: 0·Avg: 285

Installs

Med: 0·Avg: 1

What's Inside

Agents (4)

consistency-tester

/consistency-tester

Use when validating a skill draft through multirun testing. Takes a test case library, runs the skill-tester agent N times per case, summarizes consistency across runs, collects user pass/fail judgments with annotations, analyzes failure patterns across cases, proposes targeted skill edits, and manages the re-run loop until the user declares validation complete.

skill-explorer

/skill-explorer

Use when you need to browse, list, or summarize existing skill drafts and their test history. Scans the drafts directory, reads frontmatter, and reports on draft status.

skill-refiner

/skill-refiner

Use when a skill draft has been tested against multiple cases and needs refinement based on test results and user feedback. Takes skill content, N test results with thinking traces, and user annotations identifying which results failed and why. Analyzes failure patterns across runs and proposes targeted skill edits.

skill-tester

/skill-tester

Use when testing a skill draft via simulated execution. Takes a skill's SKILL.md content + sample input, reasons through what the skill would produce, and returns a structured evaluation with pass/fail status, issues, thinking trace, and suggested next test cases.

Skills (1)

skill-bench

/skill-bench

Use when authoring new Claude Code skills or refining existing ones. New skills: 5-phase workflow (Design → Plan → Build → Validate → Finalize). Existing skills: enter at Refine Mode for validation and targeted refinement.

Stats

Version: 0.4.0

Stars: 2

Maintenance: Excellent

License: MIT

Last Commit: Mar 26, 2026

Added: Mar 27, 2026

Available In

skill-bench-marketplace

Safety Signals

Caution

Uses power tools

Uses Bash, Write, or Edit tools

README

Skill Bench

Interactive skill authoring for Claude Code — create, test, and refine skills through structured conversation.

Install

claude plugin marketplace add https://github.com/Yiminnn/skill-bench-plugin
claude plugin install skill-bench

Quick Start

/skill-bench

| Phase | What Happens | Powered By |
|---|---|---|
| 1. Design | Brainstorm approaches, produce design spec | `superpowers:brainstorming` |
| 2. Plan | Generate implementation tasks with TDD steps | `superpowers:writing-plans` |
| 3. Build & Test | Build skill, eval with baseline comparison, iterate | `skill-creator` |
| 4. Validate | Multirun consistency testing, user judgment, refinement | `consistency-tester` + `skill-refiner` |
| 5. Finalize | Lint, validate references, promote | built-in |

Refine an existing skill

Already have a skill? Skip straight to validation:

/skill-bench
> refine path/to/my-skill/SKILL.md

Quick Reference

| You want to... | Say... |
|---|---|
| Create a new skill | `/skill-bench` |
| Refine an existing skill | `/skill-bench` then `refine path/to/skill` |
| Approve a step | `y` or `looks good` |
| Edit the draft yourself | Edit in your editor, then say `I edited it` |
| Run quick test | `yes` (when offered sample run) |
| Run thorough testing | `full validation` |
| Mark a run as failed | `run 3 failed — [what went wrong]` |
| Approve proposed fixes | `approve all` or `approve fix 1 and 3` |
| Finish testing | `validation complete` |
| Check existing drafts | `show me my skill drafts` |

Components

| Component | Type | Model | Purpose |
|---|---|---|---|
| `skill-bench` | Skill | — | 5-phase workflow orchestrator |
| `skill-tester` | Agent | Opus | Simulates skill execution, returns structured eval with thinking trace |
| `consistency-tester` | Agent | Opus | Multirun validation: run N times, compare, collect judgment, refine |
| `skill-refiner` | Agent | Opus | Dual-lens failure analysis (cross-run + per-run), proposes targeted edits |
| `skill-explorer` | Agent | Haiku | Read-only scanner for drafts and test history |
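
To make the multirun idea behind `consistency-tester` more concrete, here is a minimal Python sketch of summarizing consistency across N runs of a single test case. The `RunResult` record, its fields, and `summarize_consistency` are hypothetical names chosen for illustration. The plugin's agents are prompt-driven, and their actual output format is not described in this README, so treat this as one possible reading and not the plugin's implementation.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class RunResult:
    """One simulated run of a skill against a test case (hypothetical shape)."""
    run: int
    passed: bool
    output: str


def summarize_consistency(results: list[RunResult]) -> dict:
    """Report the pass rate and how often the runs agreed on the same output."""
    if not results:
        raise ValueError("need at least one run")
    outputs = Counter(r.output for r in results)
    # Size of the largest group of runs that produced an identical output.
    _, top_count = outputs.most_common(1)[0]
    return {
        "runs": len(results),
        "pass_rate": sum(r.passed for r in results) / len(results),
        "agreement": top_count / len(results),
        "distinct_outputs": len(outputs),
        "failed_runs": [r.run for r in results if not r.passed],
    }


# Four runs of one case: run 3 diverges and is judged a failure.
runs = [
    RunResult(1, True, "A"),
    RunResult(2, True, "A"),
    RunResult(3, False, "B"),
    RunResult(4, True, "A"),
]
summary = summarize_consistency(runs)
print(summary)
```

A low `agreement` value with a high `pass_rate` would suggest the skill is correct but under-specified, which is the kind of pattern a refinement pass might target.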

Configuration

On first use, skill-bench creates `.skillbench/config.json` in your project:

{
  "drafts_dir": "skills/drafts",
  "evals_dir": ".skillbench/evals",
  "test_model": "claude-opus-4-6",
  "context_files": []
}
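
If you want to read this file from your own tooling, the following Python sketch shows one way to load it with fallbacks. The key names come from the example above. The `load_config` helper, the use of the example values as defaults, and the rejection of unknown keys are assumptions for illustration, not documented plugin behavior.

```python
import copy
import json
import tempfile
from pathlib import Path

# Defaults mirror the example config above. Whether the plugin falls back
# to these exact values is an assumption.
DEFAULTS = {
    "drafts_dir": "skills/drafts",
    "evals_dir": ".skillbench/evals",
    "test_model": "claude-opus-4-6",
    "context_files": [],
}


def load_config(project_root: str = ".") -> dict:
    """Merge .skillbench/config.json (if present) over the defaults."""
    path = Path(project_root) / ".skillbench" / "config.json"
    config = copy.deepcopy(DEFAULTS)
    if path.exists():
        config.update(json.loads(path.read_text(encoding="utf-8")))
    unknown = set(config) - set(DEFAULTS)
    if unknown:
        raise ValueError(f"unknown config keys: {sorted(unknown)}")
    if not isinstance(config["context_files"], list):
        raise TypeError("context_files must be a list")
    return config


# Usage: a throwaway project that overrides only drafts_dir.
with tempfile.TemporaryDirectory() as root:
    cfg_dir = Path(root) / ".skillbench"
    cfg_dir.mkdir()
    (cfg_dir / "config.json").write_text(
        json.dumps({"drafts_dir": "my/drafts"}), encoding="utf-8"
    )
    config = load_config(root)

print(config["drafts_dir"])
print(config["test_model"])
```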

Artifacts

| Path | Purpose | Tracked? |
|---|---|---|
| `.skillbench/config.json` | Project settings | Yes |
| `.skillbench/specs/` | Design specs | Yes |
| `.skillbench/plans/` | Implementation plans | Yes |
| `.skillbench/evals/` | Eval definitions | Yes |
| `.skillbench/test-cases/` | Test case libraries | Yes |
| `.skillbench/workspace/` | Skill-creator iterations | No |
| `.skillbench/test-history/` | Test results and refinements | No |
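
Assuming "Tracked?" refers to version control with Git, the table above can be turned into ignore entries. The sketch below is illustrative only: `ARTIFACTS` and `gitignore_lines` are hypothetical names, and the README does not say whether the plugin writes a `.gitignore` itself.

```python
# Paths and "Tracked?" values copied from the Artifacts table above.
ARTIFACTS = {
    ".skillbench/config.json": True,
    ".skillbench/specs/": True,
    ".skillbench/plans/": True,
    ".skillbench/evals/": True,
    ".skillbench/test-cases/": True,
    ".skillbench/workspace/": False,
    ".skillbench/test-history/": False,
}


def gitignore_lines(artifacts: dict[str, bool]) -> list[str]:
    """Return the paths marked as untracked, in table order."""
    return [path for path, tracked in artifacts.items() if not tracked]


lines = gitignore_lines(ARTIFACTS)
print("\n".join(lines))
```

The printed lines could be appended to a project's `.gitignore` by hand if the untracked directories show up in `git status`.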

Prerequisites

Requires two dependencies (both auto-installed on first use):

- `superpowers`: Phases 1-2 (brainstorming + planning)
- `skill-creator`: Phase 3 (build + eval)

Manual install if needed:

claude plugin install claude-plugins-official/superpowers

License

MIT
