Interactive skill authoring for Claude Code
npx claudepluginhub yiminnn/skill-bench-plugin
Create, test, and refine Claude Code skills through structured conversation.
claude plugin marketplace add https://github.com/Yiminnn/skill-bench-plugin
claude plugin install skill-bench
/skill-bench
| Phase | What Happens | Powered By |
|---|---|---|
| 1. Design | Brainstorm approaches, produce design spec | superpowers:brainstorming |
| 2. Plan | Generate implementation tasks with TDD steps | superpowers:writing-plans |
| 3. Build & Test | Build skill, eval with baseline comparison, iterate | skill-creator |
| 4. Validate | Multirun consistency testing, user judgment, refinement | consistency-tester + skill-refiner |
| 5. Finalize | Lint, validate references, promote | built-in |
Already have a skill? Skip straight to validation:
/skill-bench
> refine path/to/my-skill/SKILL.md
| You want to... | Say... |
|---|---|
| Create a new skill | /skill-bench |
| Refine an existing skill | /skill-bench then refine path/to/skill |
| Approve a step | y or looks good |
| Edit the draft yourself | Edit in your editor, then say I edited it |
| Run quick test | yes (when offered sample run) |
| Run thorough testing | full validation |
| Mark a run as failed | run 3 failed — [what went wrong] |
| Approve proposed fixes | approve all or approve fix 1 and 3 |
| Finish testing | validation complete |
| Check existing drafts | show me my skill drafts |
| Component | Type | Model | Purpose |
|---|---|---|---|
| skill-bench | Skill | - | 5-phase workflow orchestrator |
| skill-tester | Agent | Opus | Simulates skill execution, returns structured eval with thinking trace |
| consistency-tester | Agent | Opus | Multirun validation: run N times, compare, collect judgment, refine |
| skill-refiner | Agent | Opus | Dual-lens failure analysis (cross-run + per-run), proposes targeted edits |
| skill-explorer | Agent | Haiku | Read-only scanner for drafts and test history |
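
To give a rough feel for what "run N times, compare" could mean in practice, here is a minimal sketch that scores how similar the outputs of several runs are to one another. The comparison method the consistency-tester agent actually uses is not documented here. The function name `consistency_score` and the use of `difflib` text similarity are assumptions made purely for illustration.

```python
from difflib import SequenceMatcher
from itertools import combinations


def consistency_score(outputs):
    """Return the mean pairwise text similarity (0.0 to 1.0) across run outputs.

    Hypothetical helper: a stand-in for whatever comparison the
    consistency-tester agent performs, which this README does not specify.
    """
    pairs = list(combinations(outputs, 2))
    if not pairs:
        # Zero or one run: nothing to disagree with.
        return 1.0
    total = sum(SequenceMatcher(None, a, b).ratio() for a, b in pairs)
    return total / len(pairs)


if __name__ == "__main__":
    runs = [
        "Created SKILL.md with 3 sections",
        "Created SKILL.md with 3 sections",
        "Created SKILL.md with 4 sections",
    ]
    print(f"consistency: {consistency_score(runs):.2f}")
```

A low score would only flag runs worth reading side by side. Whether a given run passed or failed is still a human call, which matches the "collect judgment" step in the table.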
On first use, skill-bench creates .skillbench/config.json in your project:
{
  "drafts_dir": "skills/drafts",
  "evals_dir": ".skillbench/evals",
  "test_model": "claude-opus-4-6",
  "context_files": []
}
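
If you want to read these settings from your own tooling, a minimal sketch might look like the following. The keys and default values are copied from the config shown above. The helper name `load_config` and the choice to fall back to defaults for missing keys are assumptions, not behavior documented by the plugin.

```python
import copy
import json
from pathlib import Path

# Defaults copied from the config.json shown above.
DEFAULTS = {
    "drafts_dir": "skills/drafts",
    "evals_dir": ".skillbench/evals",
    "test_model": "claude-opus-4-6",
    "context_files": [],
}


def load_config(project_root="."):
    """Read .skillbench/config.json, filling in DEFAULTS for missing keys.

    Hypothetical helper: the merge-with-defaults behavior is an assumption.
    """
    config = copy.deepcopy(DEFAULTS)
    path = Path(project_root) / ".skillbench" / "config.json"
    if path.is_file():
        config.update(json.loads(path.read_text(encoding="utf-8")))
    return config


if __name__ == "__main__":
    settings = load_config()
    print(settings["drafts_dir"], settings["test_model"])
```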
| Path | Purpose | Tracked? |
|---|---|---|
| .skillbench/config.json | Project settings | Yes |
| .skillbench/specs/ | Design specs | Yes |
| .skillbench/plans/ | Implementation plans | Yes |
| .skillbench/evals/ | Eval definitions | Yes |
| .skillbench/test-cases/ | Test case libraries | Yes |
| .skillbench/workspace/ | Skill-creator iterations | No |
| .skillbench/test-history/ | Test results and refinements | No |
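
The "Tracked?" column suggests which paths belong in version control. As a small sketch, the snippet below derives candidate .gitignore entries from that table. Whether the plugin writes a .gitignore itself is not stated here, and the names `LAYOUT` and `gitignore_lines` are hypothetical.

```python
# Path -> tracked flag, copied from the table above.
LAYOUT = {
    ".skillbench/config.json": True,
    ".skillbench/specs/": True,
    ".skillbench/plans/": True,
    ".skillbench/evals/": True,
    ".skillbench/test-cases/": True,
    ".skillbench/workspace/": False,
    ".skillbench/test-history/": False,
}


def gitignore_lines(layout=LAYOUT):
    """Return the paths marked as untracked, one per .gitignore line.

    Hypothetical helper for illustration; not part of the plugin.
    """
    return [path for path, tracked in layout.items() if not tracked]


if __name__ == "__main__":
    print("\n".join(gitignore_lines()))
```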
Requires two dependencies (both auto-installed on first use):
Manual install if needed:
claude plugin install claude-plugins-official/superpowers
MIT