By mjenkinsx9
Author, evaluate, security-review, and autonomously improve Claude Agent Skills. Bundles the skill-building workflow as the improving-skills autoresearch loop, goal-setting commands, reference protocols, and the static eval harness.
Interview the user and write a goal.md sidecar that anchors an improving-skills run against an existing SKILL.md with measurable targets
Interview the author and write a goal.md sidecar that anchors a new skill from manual practice through harness-clean SKILL.md
Author, evaluate, security-review, and autonomously improve Claude Agent Skills — in any project, with evidence, not vibes.
The core idea: a skill is only as good as the evidence behind it. skill-kit
brings the full evaluation stack from the
skill-testing dev-bench to any
project as an installable plugin — a 21-check static harness, a live behavioral
check, empirical trigger measurement, a blind value-add baseline, and an
autoresearch loop that iteratively tightens a SKILL.md and keeps only the
changes that score better.
| Layer | Tool | What it proves |
|---|---|---|
| 🧹 Static | check-skill | The skill is well-formed — 21 checks: frontmatter, naming, trigger language, secret & security-smell scans |
| ⚡ Behavioral | behavioral-check | The documented commands actually run against the real system today |
| 🎯 Empirical trigger | trigger-accuracy | The skill really fires on its positive prompts and declines its negatives |
| ⚖️ Value-add | value-add-test | The skill beats the cold model in a blind head-to-head |
| 🔄 Autoresearch | improving-skills | Iterative modify → score → keep-or-revert loop that tightens an existing SKILL.md |
| ⚓ Goal anchoring | goal-new-skill · goal-improve-skill | Skill work is driven by a measurable end state, not vibes |
The harness is self-tested (9 pathological fixtures + 34 pytest tests) and CI enforces zero FAILs and zero WARNs on the shipped skills, on every push and PR.
Inside Claude Code, on any machine:
/plugin marketplace add mjenkinsx9/mjenkins-toolbox
/plugin install skill-kit@mjenkins-toolbox
Update later with /plugin update skill-kit. Because a plugin's bin/ joins the
Bash PATH, every helper is callable by bare name in any repo.
skill-kit also installs on Copilot CLI, Codex, Cursor, and Gemini —
see Multi-harness support.
# Lint a skill directory (or a SKILL.md directly) — exit 0 iff zero FAILs:
check-skill .claude/skills/my-skill
/skill-kit:improving-skills .claude/skills/my-skill/SKILL.md
Full guides live in docs/. Start with the overview, then dive into the
section you need.
| Doc | Description |
|---|---|
| Overview | What skill-kit is, the core idea, and the evaluation stack |
| Installation | Marketplace install, the bin/-on-PATH convention, requirements |
| Usage | The bin/ commands, slash commands, and the improving-skills scoring loop |
| Multi-harness support | Per-harness status, manifests, and install notes |
| Repository layout | Map of manifests, helpers, commands, skills, and tests |
| Testing the tests | Self-tests + pytest, and the relationship to skill-testing |
Full documentation map: docs/README.md
MIT © 2026 Mike Jenkins · Security policy
Own this plugin?
Verify ownership to unlock analytics, metadata editing, and a verified badge. GitHub access is read-only (username + org membership).
Sign in to claimOwn this plugin?
Verify ownership to unlock analytics, metadata editing, and a verified badge. GitHub access is read-only (username + org membership).
Sign in to claimBased on adoption, maintenance, documentation, and repository signals. Not a security audit or endorsement.
npx claudepluginhub mjenkinsx9/mjenkins-toolbox --plugin skill-kitAuto stage → secrets-scan → commit → push after every agent turn, strictly per-repo opt-in (/autogit on). One-command undo, quiet batching, PR mode.
Personal code review: deterministic CLI gathers diff/context/learnings; the active Claude session reasons over them.
Harness-native ECC operator layer - 67 agents, 271 skills, 92 legacy command shims, reusable hooks, rules, selective install profiles, and production-ready workflows for Claude Code, Codex, OpenCode, Cursor, and related agent harnesses
v9.44.1 — Patch release for Gemini environment/version detection and qwen auth gating. Run /octo:setup.
Persistent file-based planning for AI coding agents. Crash-proof markdown plans (task_plan.md, findings.md, progress.md) that survive context loss and /clear, with an opt-in completion gate and multi-agent shared state. Manus-style. Works with Claude Code, Codex CLI, Cursor, Kiro, OpenCode and 60+ agents via the SKILL.md standard. Includes Arabic, German, Spanish, and Chinese (Simplified and Traditional).
Comprehensive skill pack with 66 specialized skills for full-stack developers: 12 language experts (Python, TypeScript, Go, Rust, C++, Swift, Kotlin, C#, PHP, Java, SQL, JavaScript), 10 backend frameworks, 6 frontend/mobile, plus infrastructure, DevOps, security, and testing. Features progressive disclosure architecture for 50% faster loading.
Superpowers Plus core skills library for Claude Code: planning, execution routing, TDD, debugging, and collaboration workflows
Next.js development expertise with skills for App Router, Server Components, Route Handlers, Server Actions, and authentication patterns