This skill should be used when the user asks to "test my skill", "run skill tests", "evaluate a skill", "run the test suite", "check skill quality", "/skill-unit", or mentions skill testing, skill evaluation, or running spec files. It provides a structured unit testing framework for AI agent skills with anti-bias evaluation.
ALWAYS use this skill when the user mentions writing, designing, creating, or adding test cases for any skill, even if they also describe specific behavior to test. Triggers on "write a test case", "write me a test case", "write test cases", "design tests", "create a spec file", "help me write tests", "add tests", "no tests yet", "/test-design", or any request that involves creating test cases, spec files, or test coverage for a skill. If the user says "write a test case for X that covers Y", this skill handles it, not the skill being tested.
A plugin that brings structured, reproducible unit testing to AI agent skills.
Skill Unit lets you write test specs for AI agent skills using a familiar unit-testing mental model — define prompts, declare expected outcomes, and get pass/fail results. It uses process-level isolation to ensure unbiased evaluation: each test prompt runs in a separate CLI session that has no access to expectations or any indication it is being tested.
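The isolation idea described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Skill Unit's actual implementation: `SpecCase`, `run_isolated`, `evaluate`, and the stand-in agent command are hypothetical names, and a small Python child process stands in for a real agent CLI session. The point it shows is that only the prompt crosses the process boundary, while the expectation stays in the parent.

```python
import subprocess
import sys
from dataclasses import dataclass


@dataclass
class SpecCase:
    """Hypothetical test case: the expectation never leaves the parent process."""
    name: str
    prompt: str
    expectation: str


# Assumption: a stand-in for a real agent CLI. It simply echoes whatever
# prompt it receives on stdin, so the sketch runs without any agent installed.
STAND_IN_AGENT = [
    sys.executable,
    "-c",
    "import sys; print('agent saw: ' + sys.stdin.read())",
]


def run_isolated(case: SpecCase, agent_cmd=STAND_IN_AGENT) -> str:
    """Run one prompt in a fresh child process and return its output.

    Only case.prompt is passed to the child; case.expectation is not.
    """
    result = subprocess.run(
        agent_cmd,
        input=case.prompt,
        capture_output=True,
        text=True,
        timeout=60,
        check=True,
    )
    return result.stdout.strip()


def evaluate(case: SpecCase, transcript: str) -> bool:
    """Naive substring check, a placeholder for a real evaluator."""
    return case.expectation in transcript
```

Because the evaluator runs in the parent after the child has exited, the session under test has no way to read the expected outcome or to tell that its output will be graded. A real runner would replace the stand-in command with the agent CLI and the substring check with a proper evaluation step.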
Spec files (*.spec.md): test cases written as prompts with expectations, grouped into suites with YAML frontmatter
skill-tests/ directory with *.spec.md files (see skills/skill-unit/templates/example.spec.md)
/skill-unit or ask your agent to "run skill tests"
Phase 1 (MVP): in development.
External network access
Connects to servers outside your machine
Uses power tools
Uses Bash, Write, or Edit tools
npx claudepluginhub dflor003/skill-unit --plugin skill-unit
Benchmark, evaluate, and optimize skills to ensure reliable performance across all LLMs
Agent and skill evaluation harness with MLflow integration
Professional skill and subagent creation with dual-mode workflow: 12-step fast mode and 15-step full mode with behavioral pressure testing and TDD integration.
Skill evaluation and benchmarking - test skill effectiveness with behavioral eval cases, grade results, and track quality improvements
SDK Usability Benchmark — generate, execute, judge, and analyze AI agent benchmark suites
Open-source testing and regression detection framework for AI agents. Golden baseline diffing, CI/CD integration, works with LangGraph, CrewAI, OpenAI, Anthropic Claude, HuggingFace, Ollama, and MCP.