From Setup Eval
Deep-evaluate a single skill with static analysis and qualitative issue detection, both individually and in context of the full setup. Use when the user wants to check if a specific skill is worth keeping, well-built, or redundant.
How this skill is triggered — by the user, by Claude, or both
Slash command
/setup-eval:eval-skillThis skill is limited to the following tools:
The summary Claude sees in its skill listing — used to decide when to auto-load this skill
Deep-evaluate a single skill using lint (deterministic rules) and qualitative review, both individually and in context of the full setup.
Deep-evaluate a single skill using lint (deterministic rules) and qualitative review, both individually and in context of the full setup.
Before doing anything else, ask the user:
Where should i present the results?
- Terminal - print the report here in the conversation
- File - write a markdown report to a file (you'll choose the path)
Wait for their answer before proceeding.
Determine the skill path. If the user says a skill name, find it under skills/<name>/SKILL.md.
Determine the setup context path (usually the current working directory).
uv run python skills/eval-skill/scripts/run_skill_eval.py <skill-path> <context-path> recommended
If no context path, pass - as the second argument.
Read the JSON output. It contains diagnostics, token count, and contextual findings.
Read the skill's actual content:
Also read for context (don't check these, they're context for evaluating the target skill): 4. All OTHER skill SKILL.md files in the workspace 5. CLAUDE.md 6. Hooks in .claude/settings.json
Read rubric/skills-rubric.md for the issue categories and what to flag.
Check the skill against all categories. For each issue found, cite specific evidence from the content.
Verdict: KEEP (no issues or minor only), REVIEW (multiple issues), REMOVE (fundamentally broken/redundant)
Read rubric/contextual-analysis.md and evaluate all 5 contextual dimensions.
Check redundancy against three sources:
Read report-format.md for the full report structure.
The report must include:
At the very end:
Evaluated with: setup-eval v{version} (claude-code-plugin)
Duration: [X minutes Y seconds]
Get {version} by running: uv run python -c "import importlib.metadata; print(importlib.metadata.version('setup-eval'))"
If the user chose terminal: print the report in the conversation.
If the user chose file: write the report as markdown to the path they specified (or suggest eval-skill-report.md in the current directory). Tell them the file path when done.
npx claudepluginhub redhat-community-ai-tools/harness-eval-lab --plugin setup-evalProvides CDSS development patterns for drug interaction checking, dose validation, clinical scoring (NEWS2, qSOFA), and alert classification integrated into EMR workflows.