From better-skills
Review an agent skill by combining automated linting with structured semantic analysis. Runs hard-rule validation, evaluates contextual findings, then performs deep review against best practices (description quality, workflow design, runtime robustness, script conventions, UX patterns). Produces actionable improvement suggestions with before/after examples. This skill should be used when reviewing a skill, validating skill structure, improving skill quality, checking skill conventions, or when the user says 'review skill', 'validate skill', 'check skill', 'improve skill', 'iterate on skill', '走查技能', '验证技能', '检查 skill', '改进技能', '优化 skill'.
How this skill is triggered — by the user, by Claude, or both
Slash command
/better-skills:better-skill-reviewThe summary Claude sees in its skill listing — used to decide when to auto-load this skill
**Match user's language**: Respond in the same language the user uses.
Match user's language: Respond in the same language the user uses.
Review an agent skill through three layers: automated linting (hard rules), contextual finding evaluation (agent judges with context), and structured semantic review (deep analysis against best practices). Optionally fix issues and verify fixes through independent subagent validation.
Progress:
Accept the skill location as a directory path containing SKILL.md. Auto-detect if the current working directory contains one.
Run both tools, then evaluate.
2a. Automated linting:
python3 {SKILL_DIR}/scripts/validate.py run --path <skill-path>
Output: checks (hard-rule verdicts → linter grade) + findings (soft detections needing judgment).
2b. Profile extraction:
bash {SKILL_DIR}/scripts/analyze.sh analyze <skill-path>
Note skill level (l0/l0plus/l1) and feature flags.
2c. Contextual findings review:
Judge each finding using its context_hint. See references/semantic_dimensions.md § Finding Judgment Table for decision criteria. Promote real issues to warnings; dismiss the rest with a brief note.
2d. Semantic review:
Read the skill's SKILL.md fully, then score each dimension (0-3). Read references/semantic_dimensions.md for the full checklist per dimension:
| Dimension | What it evaluates |
|---|---|
| 5.1 Description Quality | Trigger phrases, length, voice, specificity |
| 5.2 Workflow Design | Steps, decision points, SKILL.md length |
| 5.3 Runtime Robustness | Preflight, degradation, troubleshooting (L0+/L1 only) |
| 5.4 Script Quality | JSON output, error handling, exit codes (scripts only) |
| 5.5 UX Practices | Language, checklist, completion report (applicability matrix) |
| 5.6 Setup Flow Integrity | Bootstrap safety, live validation, credential security |
Format the report:
[Skill Review] <skill-name>
═══ Linter ═══
Grade: <letter> (<pass>/<total> passed, <warn> warnings, <fail> failures)
Failures:
✗ <check_id>: <message> → Fix: <fix>
Warnings:
⚠ <check_id>: <message>
Findings (agent-reviewed):
✓ <finding_id>: dismissed — <reason>
⚠ <finding_id>: promoted to warning — <reason>
═══ Semantic Review ═══
5.1 Description Quality: <score>/3 <one-line assessment>
5.2 Workflow Design: <score>/3 <one-line assessment>
5.3 Runtime Robustness: <score>/3 <one-line assessment>
5.4 Script Quality: <score>/3 <one-line assessment>
5.5 UX Practices: <score>/3 <one-line assessment>
5.6 Setup Flow Integrity: <score>/3 <one-line assessment>
─────────
Semantic Score: <total>/18
═══ Improvement Suggestions ═══
For each dimension scoring < 3, provide:
1. What to change and why
2. Which file to edit
3. A concrete before/after example or specific instruction
4. Priority: High (functionality/UX) / Medium (convention) / Low (polish)
If linter grade is A and semantic score ≥ 15: congratulate and suggest publishing with better-skill-publish.
Ask the user:
After fixing, do NOT self-evaluate. Proceed directly to Step 5.
CRITICAL: Verification must be independent. The agent that fixed cannot judge its own work.
5a. Dispatch verification subagent:
Agent(subagent_type: "general-purpose", description: "Verify skill review", prompt: <template>)
Use the prompt template from references/semantic_dimensions.md § Verification Subagent Prompt Template. The verification subagent must:
validate.py and analyze.sh fresh (never trust cached results)references/semantic_dimensions.md for scoring criteria5b. Process verification result:
Anti-patterns (never do these):
| Anti-pattern | Why it's wrong | Correct approach |
|---|---|---|
| Self-evaluate after fixing | Blind to own mistakes | Always dispatch subagent |
| Reuse same subagent for re-verify | Already biased by prior assessment | New subagent each round |
| Fix without running tools first | Subjective judgment misses issues | Tools produce objective data |
| Ignore verification report | Discards independent evidence | Fix only reported FAIL items |
Grading (hard-rule checks only):
Check categories:
| Category | Checks |
|---|---|
| structure | SKILL.md exists, frontmatter, required fields, directory layout |
| naming | Kebab-case, length, no consecutive hyphens, matches directory |
| content | Description length, body length, heading structure |
| paths | Referenced files exist, scripts executable |
| security | No secrets, no template placeholders |
references/semantic_dimensions.md — Full checklist for each review dimension + finding judgment table + verification prompt templatereferences/validation_rules.md — Rationale for each linter checkreferences/improvement_patterns.md — Knowledge base of improvement patterns with examplesreferences/best_practices.md — Skill design conventions and quick referencenpx claudepluginhub psylch/better-skills --plugin better-skillsReviews and improves AI agent skills (SKILL.md files) against best practices, scoring 10 quality dimensions and rewriting problem areas.
Audits Claude Code skills for structure compliance, triggering accuracy, instruction quality, and best practices. Scores 0-100 with prioritized improvement recommendations.
Audits Claude skills against Anthropic prompting best practices including positive framing, motivation, and XML structure. Use after creation/modification, before release, or for inconsistent results.