From skill-creator-x
Manual-invocation-only meta-skill for creating Skills. Adds a pre-flight responsibility judgment (Skill vs MCP vs Subagent vs .claude/rules/), an explicit MCP/Subagent confirmation step, and an Exa-MCP research phase on top of the official skill-creator loop. Invoke explicitly via /skill-creator-x or by naming the skill — it does not auto-trigger.
How this skill is triggered — by the user, by Claude, or both
Slash command
/skill-creator-x:skill-creator-xsonnetThe summary Claude sees in its skill listing — used to decide when to auto-load this skill
An evolved meta-skill for creating Skills. It keeps the proven `draft → test → review → improve → repeat` loop of the official `skill-creator`, and adds three stages in front:
agents/analyzer.mdagents/comparator.mdagents/grader.mdagents/researcher.mdassets/eval_review.htmleval-viewer/generate_review.pyeval-viewer/viewer.htmlreferences/frontmatter-cheatsheet.mdreferences/halt-template.mdreferences/judgment-framework.mdreferences/mcp-subagent-confirmation.mdreferences/research-phase.mdreferences/schemas.mdscripts/__init__.pyscripts/aggregate_benchmark.pyscripts/generate_report.pyscripts/improve_description.pyscripts/package_skill.pyscripts/quick_validate.pyscripts/run_eval.pyAn evolved meta-skill for creating Skills. It keeps the proven draft → test → review → improve → repeat loop of the official skill-creator, and adds three stages in front:
The rest of the loop (draft → test → grade → review → iterate → optimize → package) is the same as the official skill-creator and uses the bundled scripts under scripts/, the subagent prompts under agents/, and the viewer under eval-viewer/.
Heads-up about your reader: skill-creator users span deep technical and non-technical backgrounds. "evaluation" / "benchmark" are borderline OK; explain "JSON" or "assertion" briefly when in doubt. Match the user's vocabulary.
Stage 0 Responsibility judgment ──┐ halt → alternative plan
│
Stage 1 Capture intent │
Stage 1.5 Confirm MCP/Subagent │
Stage 2 Research (Exa, if needed) │
Stage 3 Draft SKILL.md │
Stage 4 Test cases & evals.json │ ← same as official
Stage 5 Run with-skill + baseline │ skill-creator from here
Stage 6 Grade, aggregate, review │
Stage 7 Iterate │
Stage 8 Description optimization │
Stage 9 Package ┘
Add these to your TodoList so they do not get skipped, especially Stage 0 and Stage 1.5. In Cowork specifically, also add "Create evals JSON and run eval-viewer/generate_review.py so human can review test cases".
Goal: before drafting anything, decide whether the requested capability belongs in a Skill, or in MCP / Subagent / .claude/rules/, or in a combination.
Read references/judgment-framework.md (the 4-step flow) and walk through it. Then reach one of three verdicts:
.claude/rules/) is appropriate → proceed to Stage 1, but record which additional components will be created or assumed, and reflect them in Stage 1.5.references/halt-template.md and stop. Do not write any skill files.Communicate the verdict explicitly to the user before continuing — they may have context that flips the answer. If the verdict is (3), present the alternative plan and wait for the user's explicit direction before doing anything else.
Worked judgment examples are in references/judgment-framework.md. The decisive question for the most common confusion is: "Does this need tool-permission scoping or independent context?" If yes, it's at least partly a Subagent — not a Skill.
Same as the official skill-creator. Get clear answers (or confirm what is already in the conversation) on:
If the current conversation already contains a workflow the user wants to capture ("turn this into a skill"), extract as much as possible from history first — tools used, sequence, corrections, formats — then have the user fill the gaps.
Proactively ask about edge cases, input/output formats, example files, success criteria, and dependencies. Do not jump to test prompts until this is solid.
Before drafting, explicitly agree with the user on the tooling shape of the new skill. Use references/mcp-subagent-confirmation.md for the question script; the short version:
brave-search, context7, exa, playwright configured here"). Ask which, if any, the new skill should rely on. Document this in the skill's description / compatibility so users without those MCPs are warned.tools / permissionMode each one should have..claude/rules/ companion files (only if the verdict in Stage 0 said so). Note their globs: so the user knows when they auto-load.allowed-tools frontmatter for the skill itself: which tools should be auto-approved when this skill is active? Default to the minimum set.Write the answers into a short summary block ("Tooling shape") and read it back to the user for confirmation before moving on.
When the skill will touch a library, API, or domain whose current state you are not confident about, do not guess. Use the exa MCP to research, then summarize evidence in the skill. See references/research-phase.md for the full protocol; the essentials:
agents/researcher.md) — it uses mcp__exa__web_search_exa (and mcp__exa__web_fetch_exa to pull specific URLs) and returns a short evidence-grounded summary with URLs.Skip Stage 2 when the skill is trivially within your training (e.g., "make a skill that always adds a smiley to the response").
Same structure as the official skill-creator. Decide the directory layout:
<skill-name>/
├── SKILL.md (required)
│ ├── YAML frontmatter (name, description required)
│ └── Markdown instructions
├── scripts/ (executable, decisive, deterministic)
├── references/ (loaded into context on demand)
└── assets/ (templates, icons, fonts used in output)
Frontmatter fields beyond name/description are often overlooked. See references/frontmatter-cheatsheet.md — pay attention to:
when_to_use: adds to description (combined 1,536-char cap)allowed-tools: tools auto-approved while this skill is activedisable-model-invocation: true: manual /<name> only — also blocks subagent preloadpaths: globs that trigger automatic load — overlaps with .claude/rules/ behaviormodel / effort / context: fork / agent: advanced overridesWriting style (from agent-skills + skill-creator):
description "pushy": Claude undertriggers skills, so include both what the skill does AND contexts where it must trigger, even when the user does not name the skill explicitly.references/ and scripts/. The 3-level model is metadata → SKILL.md body → bundled resources.After the draft, propose 2-3 realistic test prompts — the kind of thing a real user would actually type, with concrete file paths, column names, casual phrasing where appropriate. Save them to evals/evals.json. Do not write assertions yet — just the prompts. See references/schemas.md for the schema.
{
"skill_name": "example-skill",
"evals": [
{ "id": 1, "prompt": "...", "expected_output": "...", "files": [] }
]
}
Share the prompts with the user: "Here are a few test cases — do they look right, or do you want to add more?"
Use the workspace pattern: <skill-name>-workspace/iteration-N/eval-<ID>/{with_skill,without_skill,old_skill,new_skill}/outputs/.
Spawn all subagents in one turn. Do not run with-skill first and baselines later — launch both for every eval together.
without_skill).cp -r <skill> <workspace>/skill-snapshot/), baseline = snapshot (old_skill).Write eval_metadata.json per eval (assertions can be empty initially; give a descriptive eval_name):
{ "eval_id": 0, "eval_name": "renames-columns-on-xlsx", "prompt": "...", "assertions": [] }
While runs execute, draft the assertions. Good assertions are objectively verifiable with names that read cleanly in the viewer. Subjective skills (writing style, design) skip assertions — lean on human review.
As each task completes, immediately capture total_tokens and duration_ms from the completion notification into timing.json — this is the only chance to persist that data.
{ "total_tokens": 84852, "duration_ms": 23332, "total_duration_seconds": 23.3 }
agents/grader.md. Output grading.json per run. Field names must be exactly text / passed / evidence — the viewer depends on them. For programmatic checks, write a script rather than eyeballing.python -m scripts.aggregate_benchmark <workspace>/iteration-N --skill-name <name>
Produces benchmark.json + benchmark.md (pass_rate / time / tokens, mean ± stddev, delta).agents/analyzer.md — surface non-discriminating assertions, flaky evals, time/token tradeoffs. Add notes to the benchmark.nohup python <skill-creator-x>/eval-viewer/generate_review.py \
<workspace>/iteration-N \
--skill-name "<name>" \
--benchmark <workspace>/iteration-N/benchmark.json \
> /dev/null 2>&1 &
VIEWER_PID=$!
For iteration 2+, pass --previous-workspace <workspace>/iteration-<N-1>. For Cowork / headless, use --static <output_path> and proffer the link.When the user is done, read feedback.json. Empty feedback means "fine" — focus improvements on entries with specific complaints. Kill the viewer with kill $VIEWER_PID 2>/dev/null.
Iterate by:
MUST clauses.create_docx.py, that script belongs in scripts/ of the skill.After improvements: rerun all evals into a new iteration-N+1/, re-launch the viewer with --previous-workspace, wait, re-read feedback. Stop when the user is happy, all feedback is empty, or progress has plateaued.
After the skill itself is solid, offer to optimize the description for triggering accuracy. See references/schemas.md for the eval set format. Workflow:
assets/eval_review.html__EVAL_DATA_PLACEHOLDER__ / __SKILL_NAME_PLACEHOLDER__ / __SKILL_DESCRIPTION_PLACEHOLDER__/tmp/eval_review_<skill>.html, open it~/Downloads/eval_set.jsonpython -m scripts.run_loop \
--eval-set <path> --skill-path <path> \
--model <current-session-model-id> \
--max-iterations 5 --verbose
The script splits 60/40 train/test (stratified by should_trigger), runs each query 3× per iteration, asks Claude for a new description, re-evaluates, picks the iteration with the best test score.best_description back into SKILL.md frontmatter. Show before/after and report scores.python -m scripts.package_skill <path/to/skill-folder>
Validates the frontmatter, excludes evals/ at the skill root (development artefact), produces <skill-name>.skill (a ZIP). If present_files is available, present the file path to the user.
Claude.ai (no subagents):
claude -p → skip.Cowork (subagents available, no display):
--static to generate_review.py and offer the user a link.feedback.json (request access if needed), copy into the workspace.Updating an existing skill:
/tmp/<skill-name>/ first — install location may be read-only — and edit there.references/judgment-framework.md — Stage 0 4-step decision tree with worked examples.references/halt-template.md — output template when Stage 0 halts.references/mcp-subagent-confirmation.md — Stage 1.5 question script.references/research-phase.md — Stage 2 Exa-MCP research protocol.references/frontmatter-cheatsheet.md — full frontmatter field list, condensed.references/schemas.md — JSON shapes for evals / grading / benchmark / comparison / analysis (inherited from the official skill-creator).agents/researcher.md — Exa-driven research subagent (NEW).agents/grader.md — assertion-based grading subagent.agents/comparator.md — blind A/B comparator subagent.agents/analyzer.md — post-hoc + benchmark observation analyzer.Core loop one more time:
Stage 0 judgment → Stage 1 intent → Stage 1.5 confirm tooling → Stage 2 research → Stage 3 draft → Stage 4 evals → Stage 5 run → Stage 6 grade/review → Stage 7 iterate → Stage 8 optimize → Stage 9 package.
The first three stages are what skill-creator-x adds on top of the official loop. They are not optional — they exist because skipping them produces skills that should have been MCPs, agents that should have been rules, and descriptions that overfit to stale information.
Good luck.
Provides UI/UX resources: 50+ styles, color palettes, font pairings, guidelines, charts for web/mobile across React, Next.js, Vue, Svelte, Tailwind, React Native, Flutter. Aids planning, building, reviewing interfaces.
Fetches up-to-date documentation from Context7 for libraries and frameworks like React, Next.js, Prisma. Use for setup questions, API references, and code examples.
npx claudepluginhub tomoya-k31/personal-agent-plugins --plugin skill-creator-x