From lythoskill
Analyzes SKILL.md files against Agent Skills best practices. Reviews body size, description quality, progressive disclosure, frontmatter usage, and context efficiency. Provides actionable optimization advice.
How this skill is triggered — by the user, by Claude, or both
Slash command
/lythoskill:lythoskill-coachWhen to use
Creating a new skill, reviewing SKILL.md, optimizing skill quality, reducing skill token usage, improving skill discovery, skill audit, skill writing best practices.
The summary Claude sees in its skill listing — used to decide when to auto-load this skill
You are a skill quality reviewer. When asked to analyze or improve a SKILL.md,
You are a skill quality reviewer. When asked to analyze or improve a SKILL.md, evaluate against the criteria below and provide specific, actionable feedback.
Target: <500 lines, <5000 tokens.
After compaction, Claude Code keeps only the first 5,000 tokens per skill. All re-attached skills share a combined 25,000-token budget. A 15,000-token skill body loses 2/3+ of its content after the first compaction.
Fix: Move reference material to references/ files. Keep only operational
instructions and gotchas in the body.
Target: Combined <1,536 characters (hard truncation by Claude Code).
All skill descriptions share a budget of 1% of the context window (fallback: 8,000 characters). With many skills, each gets very little space.
Community reality (cold pool sample, May 2026): 579 skills scanned. when_to_use
appears in 16 (2.8%) — all lythoskill's own. 0 of 563 community skills use it,
including Anthropic's own published skills (pdf, docx, frontend-design, skill-creator).
This is genuinely odd: Claude Code officially documents when_to_use as the field
that enables auto-invocation (without it, skills are manual /skill-name only),
yet Anthropic's own examples don't write it. Official docs are ahead of official
practice.
Our position (empirically validated):
when_to_use because the mechanism is real: Claude Code matches against
it for auto-invocation, and in practice it works — skills with populated
when_to_use (e.g. project-cortex, deck) trigger without the user typing /.
Anthropic documented it, we wrote it, and the agent does auto-invoke. That's
sufficient evidence regardless of adoption rate.Rules:
Formula: description = [What it does] + [Key capabilities]. when_to_use = [When to activate] + [Trigger phrases].
Anti-pattern: burying the core verb in clause depth. Front-load the action, not the problem.
Three tiers of skill content:
Check: is content at the right tier?
Exemption: Content under ~10 lines that is operationally essential (e.g. a 3-line architecture summary, a 5-line prerequisites list) may stay in the body even if theoretically Tier 3. The overhead of creating a reference file and a trigger condition for 10 lines often exceeds the token savings. Judge by net value, not dogma.
Each reference needs a clear trigger condition in the body:
Also applies to scripts/ and assets/: if body mentions them, state when to use.
Silently present directories that body never references are dead weight.
Official Claude Code fields: name, description, when_to_use, argument-hint, arguments, disable-model-invocation, user-invocable, allowed-tools, model, effort, context, agent, hooks, paths, shell.
Custom fields: use a consistent prefix (e.g. deck_). Custom fields are
parsed but not injected into context — zero token cost.
standard/flow distinction, proving the point)type: standard. The field is optional — absence is fineA skill should do one thing well. If it has 3+ unrelated responsibilities, split it. Exception: multiple topics sharing one operational workflow.
references/ or extract to an external packagebunx @lythos/skill-creator build compiles monorepo skill source → thin release directory (SKILL.md + scripts + references)A skill that perfectly follows all form rules but describes its own behavior incorrectly is worse than a messy but honest skill. Check:
Always verify before scoring. Form compliance without factual accuracy produces false confidence.
A skill has three surfaces that must stay in sync:
| Surface | Audience | Content |
|---|---|---|
| CLI --help | Human users, scripts | Commands, flags, examples |
| README.md | npm/bunx discoverers | What the package does, how to install/use |
| SKILL.md | Agent | When to invoke, workflow orchestration, gotchas |
SKILL.md documents a command that doesn't exist in the CLI.
generate but CLI only has template and promptSKILL.md implies output formats the tool doesn't produce.
render only outputs SVG, PNG needs a separate convertrender produces SVG. Use convert for PNG/WebP/JPG/AVIF."README.md is missing or stale.
SKILL.md does agent work that should be in the CLI, or vice versa.
prompt command that generates LLM prompt templates. Prompt engineering belongs in SKILL.md (agent layer), not in CLI (tool layer).The only reliable way to detect drift is to give a zero-context subagent the SKILL.md and a task:
"You have no prior knowledge of this project. Use the skills in the working set directory to [do X]. Read SKILL.md for instructions. Do not ask for help."
If the subagent fails because SKILL.md told it to use a non-existent command, you have a drift. Fix it.
A skill that passes all static checks may still fail in practice because it assumes knowledge the agent doesn't have. Test by mental simulation (or actual subagent dispatch):
Give a naive agent only this SKILL.md + a typical user request. Can it complete the task without guessing?
Common completeness gaps:
This is the most important dimension. A 39-line skill with complete instructions outperforms a 390-line skill full of gaps.
| Metric | Value | Source |
|---|---|---|
| SKILL.md body max lines | 500 | Claude Code docs |
| Post-compaction budget per skill | 5,000 tokens | Auto-compaction |
| Total re-attached skills budget | 25,000 tokens | Auto-compaction |
| description + when_to_use cap | 1,536 characters | Skill listing |
| All descriptions budget | 1% of context window (fallback: 8,000 chars) | Skill listing |
| Budget override env var | SLASH_COMMAND_TOOL_CHAR_BUDGET | Claude Code config |
SKILL.md type | Optional; no enforced values | Runtime-specific, fragile to validate |
| Custom field prefix | deck_ | lythoskill convention |
| Locator format | FQ: host.tld/owner/repo/skill | ADR-20260502012643244 |
"See references/ for more details" is a bibliography, not a dispatch table. Every reference entry needs a trigger condition: "Read X when Y happens."
** burying the core verb wastes description budget.** "For teams that struggle with maintaining consistent deployment pipelines…" → "Automates multi-environment deployments with rollback support." Front-load the solution, not the problem.
Don't paste reference content into the body "just in case." If the agent can always reach the reference, body bloat buys nothing. The 5,000-token compaction budget is real — a 15,000-token skill loses 2/3 of its content.
Reference community practice when rules conflict with reality. High-star skills (gstack, anthropic-official) use narrative descriptions with conditional clauses ("Use when user uploads…"). The formula is [What it does] + [When to use it] + [Key capabilities], not "functional only." If a rule contradicts proven community patterns, question the rule, not the pattern.
When reviewing a SKILL.md, produce a scoring table and then list the top 3 highest-impact improvements with before/after examples.
Before each review: read references/self-improvement-log.md for recent meta-lessons that may affect your scoring (e.g. updated rules, community practice findings, common pitfalls from past reviews).
See references/analysis-template.md for the exact table format and prioritization rules.
Read this only when producing the analysis table:
| When you need to… | Read |
|---|---|
| See the full scoring table template and improvement format | references/analysis-template.md |
npx claudepluginhub lythos-labs/lythoskill --plugin lythoskill-governanceCreates, edits, and optimizes skills for Claude Code, including drafting, evaluating with test prompts, iterating on performance, and improving skill descriptions for better triggering accuracy.