🧰 skill-kit

The skill-testing workshop, packaged as a Claude Code plugin

Author, evaluate, security-review, and autonomously improve Claude Agent Skills — in any project, with evidence, not vibes.

The core idea: a skill is only as good as the evidence behind it. skill-kit brings the full evaluation stack from the skill-testing dev-bench to any project as an installable plugin — a 21-check static harness, a live behavioral check, empirical trigger measurement, a blind value-add baseline, and an autoresearch loop that iteratively tightens a SKILL.md and keeps only the changes that score better.

✨ What's inside

Layer	Tool	What it proves
🧹 Static	`check-skill`	The skill is well-formed — 21 checks: frontmatter, naming, trigger language, secret & security-smell scans
⚡ Behavioral	`behavioral-check`	The documented commands actually run against the real system today
🎯 Empirical trigger	`trigger-accuracy`	The skill really fires on its positive prompts and declines its negatives
⚖️ Value-add	`value-add-test`	The skill beats the cold model in a blind head-to-head
🔄 Autoresearch	`improving-skills`	Iterative modify → score → keep-or-revert loop that tightens an existing SKILL.md
⚓ Goal anchoring	`goal-new-skill` · `goal-improve-skill`	Skill work is driven by a measurable end state, not vibes

The harness is self-tested (9 pathological fixtures + 34 pytest tests) and CI enforces zero FAILs and zero WARNs on the shipped skills, on every push and PR.

🚀 Install

Inside Claude Code, on any machine:

/plugin marketplace add mjenkinsx9/mjenkins-toolbox
/plugin install skill-kit@mjenkins-toolbox

Update later with /plugin update skill-kit. Because a plugin's bin/ joins the Bash PATH, every helper is callable by bare name in any repo. skill-kit also installs on Copilot CLI, Codex, Cursor, and Gemini — see Multi-harness support.

# Lint a skill directory (or a SKILL.md directly) — exit 0 iff zero FAILs:
check-skill .claude/skills/my-skill

/skill-kit:improving-skills .claude/skills/my-skill/SKILL.md

📚 Documentation

Full guides live in docs/. Start with the overview, then dive into the section you need.

Doc	Description
Overview	What skill-kit is, the core idea, and the evaluation stack
Installation	Marketplace install, the `bin/`-on-`PATH` convention, requirements
Usage	The `bin/` commands, slash commands, and the improving-skills scoring loop
Multi-harness support	Per-harness status, manifests, and install notes
Repository layout	Map of manifests, helpers, commands, skills, and tests
Testing the tests	Self-tests + pytest, and the relationship to skill-testing

Full documentation map: docs/README.md

Skill Kit

Popularity

What's Inside

README

🧰 skill-kit

The skill-testing workshop, packaged as a Claude Code plugin

✨ What's inside

🚀 Install

📚 Documentation

📄 License

Confidence

Similar Plugins

ecc

octo

planning-with-files

fullstack-dev-skills

superpowers-plus

nextjs-expert

More by mjenkinsx9

autogit

anchor