Skill

eval-skill

Deep-evaluate a single skill with static analysis and qualitative issue detection, both individually and in context of the full setup. Use when the user wants to check if a specific skill is worth keeping, well-built, or redundant.

Invocation

How this skill is triggered — by the user, by Claude, or both

Slash command

/setup-eval:eval-skill

User invocable

Model invocable

Inline context

Default effort

Tool Access

This skill is limited to the following tools:

Bash, Read

Context Preview

The summary Claude sees in its skill listing — used to decide when to auto-load this skill

Deep-evaluate a single skill using lint (deterministic rules) and qualitative review, both individually and in context of the full setup.

Supporting Files

report-format.md
rubric/contextual-analysis.md
rubric/skills-rubric.md
scripts/run_skill_eval.py

SKILL.md

100 lines · ~930 tokens

Stats

Language: Python

Stars: 4

Maintenance: Excellent

Last Commit: Jun 18, 2026

Evaluate Skill

Deep-evaluate a single skill using lint (deterministic rules) and qualitative review, both individually and in context of the full setup.

Hard Rules

Never give a verdict without running the checks. Read the actual file content and check all rubric categories before assigning a verdict.
Every category must be checked. Both the individual rubric AND the contextual analysis must be fully evaluated.
Read before you judge. Read the actual SKILL.md content (and reference files if they exist).
Don't manufacture problems. If the skill is good, say so. Only report real issues.
Always end with a short summary.
Record the exact start time and compute the exact duration at the end.

Step 1: Ask Output Preference

Before doing anything else, ask the user:

Where should I present the results?

Terminal - print the report here in the conversation

File - write a markdown report to a file (you'll choose the path)

Wait for their answer before proceeding.

Step 2: Select the Skill

Determine the skill path. If the user gives a skill name, look for it at skills/<name>/SKILL.md.
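The lookup above could be sketched as follows. This is a minimal illustration, not the plugin's own code: the function name `resolve_skill_path` and the `root` parameter are hypothetical, and the only convention taken from the text is the skills/<name>/SKILL.md layout.

```python
from pathlib import Path
from typing import Optional


def resolve_skill_path(name_or_path: str, root: Path = Path(".")) -> Optional[Path]:
    """Return the SKILL.md path for a skill name or explicit path, or None if not found.

    Hypothetical helper: only the skills/<name>/SKILL.md layout comes from the text.
    """
    candidate = Path(name_or_path)
    # An explicit path to an existing SKILL.md file is used as-is.
    if candidate.name == "SKILL.md" and candidate.is_file():
        return candidate
    # Otherwise treat the input as a skill name under skills/<name>/SKILL.md.
    by_name = root / "skills" / name_or_path / "SKILL.md"
    return by_name if by_name.is_file() else None
```

Returning None instead of raising lets the caller ask the user for a corrected name rather than failing outright.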

Step 3: Run Lint (Static Analysis)

Determine the setup context path (usually the current working directory).

uv run python skills/eval-skill/scripts/run_skill_eval.py <skill-path> <context-path> recommended

If there is no context path, pass - as the second argument.

Read the JSON output. It contains diagnostics, token count, and contextual findings.
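As a rough sketch, the command and the handling of its output might look like this in Python. The argument order mirrors the command shown above. The JSON key names (`diagnostics`, `token_count`, `contextual_findings`) are assumptions based only on the description of the output, and both function names are hypothetical.

```python
import json
from typing import List, Optional


def build_lint_command(skill_path: str, context_path: Optional[str] = None) -> List[str]:
    """Build the argument list for the lint command shown above."""
    return [
        "uv", "run", "python",
        "skills/eval-skill/scripts/run_skill_eval.py",
        skill_path,
        context_path or "-",  # "-" stands in for a missing context path
        "recommended",
    ]


def summarize_lint_output(raw: str) -> dict:
    """Summarize the script's JSON output.

    Key names are assumed, not confirmed by the script; missing keys
    are treated as empty rather than as errors.
    """
    data = json.loads(raw)
    return {
        "diagnostic_count": len(data.get("diagnostics", [])),
        "token_count": data.get("token_count"),
        "contextual_count": len(data.get("contextual_findings", [])),
    }
```

The command list can be passed to `subprocess.run(..., capture_output=True, text=True)` and the captured stdout handed to `summarize_lint_output`.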

Step 4: Read Actual Files

Read the skill's actual content:

1. The SKILL.md file
2. All files in the skill's subdirectories (reference files). Check the COMBINED content.
3. The skill's guidelines.md (if it exists)

Also read for context (don't check these, they're context for evaluating the target skill):

4. All OTHER skill SKILL.md files in the workspace
5. CLAUDE.md
6. Hooks in .claude/settings.json

Step 5: Individual Rubric (Qualitative Review)

Read rubric/skills-rubric.md for the issue categories and what to flag.

Check the skill against all categories. For each issue found, cite specific evidence from the content.

Verdict: KEEP (no issues or minor only), REVIEW (multiple issues), REMOVE (fundamentally broken/redundant)
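One way to make the verdict rule concrete is the sketch below. The severity labels ("minor", "major", "fatal") and the function name are hypothetical. The rubric file may define its own categories, and treating any non-minor issue as REVIEW is an assumption about where "multiple issues" begins.

```python
from typing import List


def choose_verdict(severities: List[str]) -> str:
    """Map issue severities to KEEP, REVIEW, or REMOVE.

    The labels "minor", "major", and "fatal" are illustrative assumptions.
    """
    if "fatal" in severities:
        return "REMOVE"  # fundamentally broken or redundant
    if any(s != "minor" for s in severities):
        return "REVIEW"  # at least one issue beyond minor (assumed threshold)
    return "KEEP"  # no issues, or minor issues only


print(choose_verdict(["minor", "major"]))  # prints "REVIEW"
```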

Step 6: Contextual Analysis

Read rubric/contextual-analysis.md and evaluate all 5 contextual dimensions.

Check redundancy against three sources:

Claude's default behavior (generic advice = redundant)
Other skills in the workspace (overlap = partially redundant)
CLAUDE.md content (duplication = wasted tokens)

Step 7: Produce the Report

Read report-format.md for the full report structure.

The report must include:

Lint results: each failed rule with WHY it failed
Qualitative review issues (by category)
Contextual analysis
+/!/x sections (good, improve, broken)
Final verdict with suggestions

At the very end:

Evaluated with: setup-eval v{version} (claude-code-plugin)
Duration: [X minutes Y seconds]

Get {version} by running: uv run python -c "import importlib.metadata; print(importlib.metadata.version('setup-eval'))"
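The footer could be assembled along these lines. This is a sketch under stated assumptions: the function names are hypothetical, the "unknown" fallback for a missing package is an addition not described in the skill, and the start and end values are assumed to come from `time.monotonic()` readings taken at the beginning and end of the evaluation.

```python
from importlib import metadata


def format_duration(seconds: float) -> str:
    """Render elapsed seconds in the "X minutes Y seconds" form used by the footer."""
    minutes, secs = divmod(round(seconds), 60)
    return f"{minutes} minutes {secs} seconds"


def report_footer(start: float, end: float, package: str = "setup-eval") -> str:
    """Build the two footer lines from recorded start and end times."""
    try:
        version = metadata.version(package)
    except metadata.PackageNotFoundError:
        version = "unknown"  # assumed fallback when the package is not installed
    return (
        f"Evaluated with: {package} v{version} (claude-code-plugin)\n"
        f"Duration: {format_duration(end - start)}"
    )


print(format_duration(125))  # prints "2 minutes 5 seconds"
```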

If the user chose terminal: print the report in the conversation.

If the user chose file: write the report as markdown to the path they specified (or suggest eval-skill-report.md in the current directory). Tell them the file path when done.
