From skill-creator
Create production-grade agent skills aligned with the 2026 AgentSkills.io spec and Anthropic best practices. Also validates existing skills against the Intent Solutions 100-point rubric. Use when building, testing, validating, or optimizing Claude Code skills. Trigger with "/skill-creator", "create a skill", "validate my skill", or "check skill quality". Make sure to use this skill whenever creating a new skill, slash command, or agent capability.
How this skill is triggered — by the user, by Claude, or both
Slash command
/skill-creator:skill-creatorinheritThis skill is limited to the following tools:
The summary Claude sees in its skill listing — used to decide when to auto-load this skill
Creates complete, spec-compliant skill packages following AgentSkills.io and Anthropic standards.
agents/analyzer.mdagents/comparator.mdagents/grader.mdassets/eval_review.htmleval-viewer/generate_review.pyeval-viewer/viewer.htmlreferences/advanced-eval-workflow.mdreferences/frontmatter-spec.mdreferences/output-patterns.mdreferences/schemas.mdreferences/validation-rules.mdreferences/workflows.mdscripts/__init__.pyscripts/aggregate_benchmark.pyscripts/generate_report.pyscripts/improve_description.pyscripts/package_skill.pyscripts/quick_validate.pyscripts/run_eval.pyscripts/run_loop.pyCreates complete, spec-compliant skill packages following AgentSkills.io and Anthropic standards. Supports both creation and validation workflows with 100-point marketplace grading.
Skill Creator solves the gap between writing ad-hoc agent skills and producing marketplace-ready
packages that score well on the Intent Solutions 100-point rubric. It enforces the 2026 spec
(top-level identity fields, ${CLAUDE_SKILL_DIR} paths, scored sections) and catches
contradictions that would cost marketplace points. Supports two modes: create new skills from
scratch with full validation, or grade/audit existing skills with actionable fix suggestions.
Determine user intent from their prompt:
Ask the user with AskUserQuestion:
Skill Identity:
processing-pdfs, analyzing-data)Execution Model:
/name? Or background knowledge only?$ARGUMENTS substitution)context: fork for subagent execution)disable-model-invocation: true — prevents auto-activation, requires /name)Required Tools:
Bash(git:*), Bash(npm:*), etc.ServerName:tool_nameComplexity:
scripts/)references/)templates/)Location:
~/.claude/skills/<skill-name>/.claude/skills/<skill-name>/Before writing, determine:
Degrees of Freedom:
| Level | When to Use |
|---|---|
| High | Creative/open-ended tasks (analysis, writing) |
| Medium | Defined workflow, flexible content (most skills) |
| Low | Strict output format (compliance, API calls, configs) |
Workflow Pattern (see ${CLAUDE_SKILL_DIR}/references/workflows.md):
Output Pattern (see ${CLAUDE_SKILL_DIR}/references/output-patterns.md):
Create the skill directory and files:
mkdir -p {location}/{skill-name}
mkdir -p {location}/{skill-name}/scripts # if needed
mkdir -p {location}/{skill-name}/references # if needed
mkdir -p {location}/{skill-name}/templates # if needed
mkdir -p {location}/{skill-name}/assets # if needed
mkdir -p {location}/{skill-name}/evals # for eval-driven development
Generate the SKILL.md using the template from ${CLAUDE_SKILL_DIR}/templates/skill-template.md.
Frontmatter rules (see ${CLAUDE_SKILL_DIR}/references/frontmatter-spec.md):
Required fields:
name: {skill-name} # Must match directory name
description: | # Third person, what + when + keywords
{What it does}. Use when {scenario}.
Trigger with "/{skill-name}" or "{natural phrase}".
Identity fields (top-level — marketplace validator scores these here):
version: 1.0.0
author: {name} <{email}>
license: MIT
IMPORTANT: version, author, license, tags, and compatible-with are TOP-LEVEL fields.
Do NOT nest them under metadata:. The marketplace 100-point validator checks them at top-level.
Recommended fields:
allowed-tools: "{scoped tools}"
model: inherit
Optional Claude Code extensions:
argument-hint: "[arg]" # If accepts $ARGUMENTS
context: fork # If needs isolated execution
agent: general-purpose # Subagent type (with context: fork)
disable-model-invocation: true # If explicit /name only (no auto-activation)
user-invocable: false # If background knowledge only
compatibility: "Python 3.10+" # If environment-specific
compatible-with: claude-code, codex # Platforms this works on
tags: [devops, ci] # Discovery tags
Description writing — maximize discoverability scoring:
Descriptions determine activation AND marketplace grade. "Use when"/"Trigger with" scoring is enterprise-tier only (marketplace grading). Standard tier does not penalize for missing these patterns. However, they remain best practices for discoverability regardless of tier.
# Good - scores +6 pts on enterprise marketplace grading
description: |
Analyze Python code for security vulnerabilities. Use when reviewing code
before deployment. Trigger with "/security-scan" or "scan for vulnerabilities".
# Acceptable at standard tier, but loses 6 pts at enterprise tier
description: |
Analyzes code for security issues.
Pattern (enterprise): "Use when [scenario]" (+3 pts) + "Trigger with [phrases]" (+3 pts) + "Make sure to use whenever..." for aggressive claiming.
Token budget awareness: All installed skill descriptions load at startup (~100 tokens each). The total skill list is capped at ~15,000 characters (SLASH_COMMAND_TOOL_CHAR_BUDGET). Keep descriptions impactful but efficient.
Body content guidelines — section recommendations:
Anthropic's spec places no format restrictions on body content. The sections below are enterprise-tier quality recommendations scored by the Intent Solutions marketplace rubric. At standard tier, these are not required but are still good practice:
## Overview (>50 chars content: +4 pts enterprise)
## Prerequisites (+2 pts enterprise)
## Instructions (numbered steps: +3 pts enterprise)
## Output (+2 pts enterprise)
## Error Handling (+2 pts enterprise)
## Examples (+2 pts enterprise)
## Resources (+1 pt enterprise)
5+ sections total: +2 pts bonus (enterprise)
Additional guidelines:
references/ if longer)${CLAUDE_SKILL_DIR}/ for internal file references in the skills you createString substitutions available:
$ARGUMENTS / $0, $1 - user-provided arguments${CLAUDE_SESSION_ID} - current session ID!`command` - dynamic context injectionScripts (scripts/):
chmod +x scripts/*.pyReferences (references/):
references/sub/dir/)Templates (templates/):
{{VARIABLE_NAME}})Assets (assets/):
Run validation (see ${CLAUDE_SKILL_DIR}/references/validation-rules.md):
python3 ${CLAUDE_SKILL_DIR}/scripts/validate-skill.py {skill-dir}/SKILL.md
python3 ${CLAUDE_SKILL_DIR}/scripts/validate-skill.py --grade {skill-dir}/SKILL.md
Standard tier is the default (no required fields, broad compatibility). Use --enterprise for full 100-point marketplace grading.
Validation checks:
${CLAUDE_SKILL_DIR}/ references existIf validation fails: fix issues and re-run. Common fixes:
Bash(git:*) not Bash${CLAUDE_SKILL_DIR}/Create evals/evals.json with minimum 3 scenarios: happy path, edge case, negative test.
[
{"name": "basic_usage", "prompt": "Trigger prompt", "assertions": ["Expected behavior"]},
{"name": "edge_case", "prompt": "Edge case prompt", "assertions": ["Expected handling"]},
{"name": "negative_test", "prompt": "Should NOT trigger", "assertions": ["Skill inactive"]}
]
Run parallel evaluation: Claude A with skill installed vs Claude B without. Compare outputs against assertions — the skill should produce meaningfully better results for its target use cases.
validate-skill.py --gradeCommon fixes: undertriggering -> pushier description, wrong format -> explicit output examples, over-constraining -> increase degrees of freedom.
Create 20 trigger evaluation queries (10 should-trigger, 10 should-not-trigger). Split into train (14) and test (6) sets. Iterate description until >90% accuracy on both sets.
Tips: front-load distinctive keywords, include specific file types/tools/domains, add "Use when...", "Trigger with...", "Make sure to use whenever..." patterns. Avoid generic terms that overlap with other skills.
Show the user:
SKILL CREATED
====================================
Location: {full path}
Files:
SKILL.md ({lines} lines)
scripts/{files}
references/{files}
templates/{files}
evals/evals.json
Validation: Enterprise tier
Errors: {count}
Warnings: {count}
Disclosure Score: {score}/6
Grade: {letter} ({points}/100)
Eval Results:
Scenarios: {count}
Passed: {count}/{count}
Description Accuracy: {percentage}%
Usage:
/{skill-name} {argument-hint}
or: "{natural language trigger}"
====================================
When the user wants to validate, grade, or audit an existing skill:
Ask for the SKILL.md path or detect from context. Common locations:
~/.claude/skills/<skill-name>/SKILL.md (global).claude/skills/<skill-name>/SKILL.md (project)python3 ${CLAUDE_SKILL_DIR}/scripts/validate-skill.py --grade {path}/SKILL.md
100-point rubric across 5 pillars:
| Pillar | Max | What It Measures |
|---|---|---|
| Progressive Disclosure | 30 | Token economy, layered structure, navigation |
| Ease of Use | 25 | Metadata, discoverability, workflow clarity |
| Utility | 20 | Problem solving, examples, feedback loops |
| Spec Compliance | 15 | Frontmatter, naming, description quality |
| Writing Style | 10 | Voice, objectivity, conciseness |
| Modifiers | +/-5 | Bonuses/penalties for patterns |
Grade scale: A (90+), B (80-89), C (70-79), D (60-69), F (<60)
See ${CLAUDE_SKILL_DIR}/references/validation-rules.md for detailed sub-criteria.
Present the grade report with specific fix recommendations. Prioritize fixes by point value (highest first).
If the user says "fix it" or "auto-fix", apply the suggested improvements:
${CLAUDE_SKILL_DIR}/User: Create a skill called "code-review" that reviews code quality
Creates:
~/.claude/skills/code-review/
├── SKILL.md
└── evals/
└── evals.json
Frontmatter:
---
name: code-review
description: |
Make sure to use this skill whenever reviewing code for quality, security
vulnerabilities, and best practices. Use when doing code reviews, PR analysis,
or checking code quality. Trigger with "/code-review" or "review this code".
allowed-tools: "Read,Glob,Grep"
version: 1.0.0
author: Jeremy Longshore <[email protected]>
license: MIT
model: inherit
---
User: Create a skill that generates release notes from git history
Creates:
~/.claude/skills/generating-release-notes/
├── SKILL.md (argument-hint: "[version-tag]")
├── scripts/
│ └── parse-commits.py
├── references/
│ └── commit-conventions.md
├── templates/
│ └── release-template.md
└── evals/
└── evals.json
Uses $ARGUMENTS[0] for version tag.
Uses context: fork for isolated execution.
User: Grade my skill at ~/.claude/skills/code-review/SKILL.md
Runs: python3 ${CLAUDE_SKILL_DIR}/scripts/validate-skill.py --grade ~/.claude/skills/code-review/SKILL.md
Output:
Grade: B (84/100)
Improvements:
- Add "Trigger with" to description (+3 pts)
- Add ## Output section (+2 pts)
- Add ## Prerequisites section (+2 pts)
$ARGUMENTS, handle the empty caseBash, always scope itinherit, only override with good reason| Error | Cause | Solution |
|---|---|---|
| Name exists | Directory already present | Choose different name or confirm overwrite |
| Invalid name | Not kebab-case or >64 chars | Fix to lowercase-with-hyphens |
| Validation fails | Missing fields or anti-patterns | Run validator, fix reported issues |
| Resource missing | ${CLAUDE_SKILL_DIR}/ ref points to nonexistent file | Create the file or fix the reference |
| Undertriggering | Description too passive | Add "Make sure to use whenever..." phrasing |
| Eval failures | Skill not producing expected output | Iterate on instructions and re-test |
| Low grade | Missing scored sections or fields | Add Overview, Prerequisites, Output sections |
References: ${CLAUDE_SKILL_DIR}/references/
source-of-truth.md — Canonical spec | frontmatter-spec.md — Field reference | validation-rules.md — 100-point rubricworkflows.md — Workflow patterns | output-patterns.md — Output formats | schemas.md — JSON schemas (evals, grading, benchmarks)anthropic-comparison.md — Gap analysis | advanced-eval-workflow.md — Eval, iteration, optimization, platform notesAgents (read when spawning subagents): ${CLAUDE_SKILL_DIR}/agents/
grader.md — Assertion evaluation | comparator.md — Blind A/B comparison | analyzer.md — Benchmark analysisScripts: ${CLAUDE_SKILL_DIR}/scripts/
validate-skill.py — 100-point rubric grading | quick_validate.py — Lightweight validationaggregate_benchmark.py — Benchmark stats | run_eval.py — Trigger accuracy testingrun_loop.py — Description optimization loop | improve_description.py — LLM-powered rewritinggenerate_report.py — HTML reports | package_skill.py — .skill packaging | utils.py — Shared utilitiesEval Viewer: ${CLAUDE_SKILL_DIR}/eval-viewer/ — generate_review.py + viewer.html (interactive output comparison)
Assets: ${CLAUDE_SKILL_DIR}/assets/eval_review.html (trigger eval set editor)
Templates: ${CLAUDE_SKILL_DIR}/templates/skill-template.md (SKILL.md skeleton)
For detailed empirical eval workflow (Steps E1-E5), read ${CLAUDE_SKILL_DIR}/references/advanced-eval-workflow.md.
Quick summary: Spawn with-skill and baseline subagents in parallel -> draft assertions while running -> capture timing data from task notifications -> grade with ${CLAUDE_SKILL_DIR}/agents/grader.md -> aggregate with scripts/aggregate_benchmark.py -> launch eval-viewer/generate_review.py for interactive human review -> read feedback.json.
For iteration loop details, read ${CLAUDE_SKILL_DIR}/references/advanced-eval-workflow.md (section "Improving the Skill").
Key principles: Generalize from feedback (don't overfit), keep prompts lean, explain the why behind rules (not just prescriptions), and bundle repeated helper scripts.
For the full pipeline (Steps D1-D4), read ${CLAUDE_SKILL_DIR}/references/advanced-eval-workflow.md (section "Description Optimization"). Quick summary: generate 20 realistic trigger eval queries -> review with user via ${CLAUDE_SKILL_DIR}/assets/eval_review.html -> run python -m scripts.run_loop (60/40 train/test, 3 runs/query, up to 5 iterations) -> apply best_description.
For A/B testing between skill versions, read ${CLAUDE_SKILL_DIR}/agents/comparator.md and ${CLAUDE_SKILL_DIR}/agents/analyzer.md. Optional; most users won't need it.
python -m scripts.package_skill <path/to/skill-folder> [output-directory] — Creates distributable .skill zip after validation.
See ${CLAUDE_SKILL_DIR}/references/advanced-eval-workflow.md (section "Platform-Specific Notes").
--static for eval viewer. Generate viewer BEFORE self-evaluation.npx claudepluginhub nickloveinvesting/nick-love-plugins --plugin skill-creatorCreates and validates Claude Code skills per AgentSkills.io 2026 spec and 100-point rubric. Use for building new skills or auditing existing ones.
Creates, refines, and benchmarks Claude Code agent skills. Drafts content, generates test prompts, runs evals with Python scripts, analyzes results, and iterates on feedback.
Creates new Claude Code skills, modifies existing ones, runs evaluations and benchmarks with variance analysis, and optimizes descriptions for better triggering.