Skill

skill-creator

Create new skills, modify and improve existing skills, and measure skill performance. Use when users want to create a skill from scratch, edit or optimize an existing skill, test a skill against realistic prompts, or iterate on skill quality. Also use when someone says "turn this into a skill" or asks to capture a workflow as a reusable skill.

Invocation

How this skill is triggered — by the user, by Claude, or both

Slash command

/nsls2-skills:skill-creator

User invocable

Model invocable

Inline context

Default effort

Context Preview

The summary Claude sees in its skill listing — used to decide when to auto-load this skill

A skill for creating new skills and iteratively improving them.

SKILL.md

289 lines · ~3.2k tokens

Stats

Stars: 0

Maintenance: Good

Last Commit: Jun 9, 2026

Skill Creator

A skill for creating new skills and iteratively improving them.

The process of creating a skill:

Decide what the skill should do and roughly how
Write a draft of the skill
Create test prompts and run them (with and without the skill)
Evaluate the results with the user — both qualitative review and quantitative assertions
Rewrite the skill based on feedback
Repeat until satisfied
Expand the test set and try again at larger scale

Your job is to figure out where the user is in this process and help them progress. Maybe they want to make a skill from scratch — help them narrow scope, write a draft, test it, and iterate. Maybe they already have a draft — go straight to eval/iterate. Be flexible. If the user says "I don't need evaluations, just vibe with me," do that instead.

Communicating with the user

Skill creation attracts users across a wide range of technical familiarity. Pay attention to context cues to understand how to phrase your communication.

"evaluation" and "benchmark" are borderline but OK for most users
For "JSON" and "assertion," look for cues that the user knows what those are before using them without explanation
It's OK to briefly explain terms if you're in doubt

Creating a skill

Capture Intent

Start by understanding the user's intent. The current conversation might already contain a workflow the user wants to capture (e.g., they say "turn this into a skill"). If so, extract answers from the conversation history — the tools used, the sequence of steps, corrections the user made, input/output formats observed. The user may need to fill gaps and should confirm before proceeding.

What should this skill enable the agent to do?
When should this skill trigger? (what user phrases/contexts)
What's the expected output format?
Should we set up test cases to verify the skill works? Skills with objectively verifiable outputs (file transforms, data extraction, code generation, fixed workflow steps) benefit from test cases. Skills with subjective outputs (writing style, art) often don't. Suggest the appropriate default based on the skill type, but let the user decide.

Interview and Research

Proactively ask questions about edge cases, input/output formats, example files, success criteria, and dependencies. Wait to write test prompts until you've ironed this out.

If useful tools are available for research (searching docs, finding similar skills, looking up best practices), use them. Come prepared with context to reduce burden on the user.

Write the SKILL.md

Based on the user interview, fill in these components:

name: Skill identifier (lowercase, hyphenated)
description: When to trigger, what it does. This is the primary triggering mechanism — include both what the skill does AND specific contexts for when to use it. All "when to use" info goes here, not in the body. Descriptions should be somewhat "pushy" to combat undertriggering. For example, instead of "How to deploy to Azure," write "How to deploy to Azure. Use this skill whenever the user mentions Azure deployments, Bicep templates, container registries, or CI/CD pipelines targeting Azure, even if they don't explicitly ask for a 'deployment.'"
The rest of the skill body: The actual instructions, patterns, reference material
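As a rough illustration of the name and description components, the sketch below checks a candidate name and renders the two required frontmatter fields. The regular expression is only one assumed reading of "lowercase, hyphenated", and `is_valid_skill_name` and `build_frontmatter` are hypothetical helpers, not part of any skill tooling.

```python
import json
import re

# Assumed reading of "lowercase, hyphenated": lowercase letters or digits,
# with single hyphens between words. A real validator may differ.
NAME_PATTERN = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")


def is_valid_skill_name(name: str) -> bool:
    """Return True when the name is lowercase and hyphen-separated."""
    return NAME_PATTERN.fullmatch(name) is not None


def build_frontmatter(name: str, description: str) -> str:
    """Render the two required frontmatter fields as a YAML block."""
    if not is_valid_skill_name(name):
        raise ValueError(f"invalid skill name: {name!r}")
    # json.dumps yields a double-quoted string, which YAML also accepts,
    # so colons and quotes inside the description stay safe.
    return f"---\nname: {name}\ndescription: {json.dumps(description)}\n---\n"


print(build_frontmatter(
    "azure-deploy",
    "How to deploy to Azure. Use this skill whenever the user mentions "
    "Azure deployments or Bicep templates.",
))
```

Quoting the description through `json.dumps` is a convenience for this sketch; a YAML library would be the more general choice if one is available.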

Skill Writing Guide

Anatomy of a Skill

skill-name/
├── SKILL.md (required)
│   ├── YAML frontmatter (name, description required)
│   └── Markdown instructions
└── Bundled Resources (optional)
    ├── scripts/    - Executable code for deterministic/repetitive tasks
    ├── references/ - Docs loaded into context as needed
    └── assets/     - Files used in output (templates, icons, fonts)

Progressive Disclosure

Skills use a three-level loading system:

Metadata (name + description) — Always in context (~100 words)
SKILL.md body — In context whenever skill triggers (<500 lines ideal)
Bundled resources — As needed (unlimited, scripts can execute without loading)

Key patterns:

Keep SKILL.md under 500 lines. If approaching this limit, add hierarchy with clear pointers about where to look next.
Reference files clearly from SKILL.md with guidance on when to read them
For large reference files (>300 lines), include a table of contents
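The size guidance above could be checked mechanically. The following is a minimal sketch, assuming the file contents are already loaded as strings; `size_warnings` is a hypothetical helper, and its table-of-contents detection is a crude heuristic chosen for illustration.

```python
SKILL_MD_MAX_LINES = 500        # limit stated in the guidance above
REFERENCE_TOC_MIN_LINES = 300   # longer reference files should carry a TOC


def size_warnings(skill_md: str, references: dict[str, str]) -> list[str]:
    """Return human-readable warnings for oversized skill files.

    `references` maps a file name to its text. Looking for the phrase
    "table of contents" is an assumption of this sketch, not a rule.
    """
    warnings = []
    body_lines = len(skill_md.splitlines())
    if body_lines >= SKILL_MD_MAX_LINES:
        warnings.append(
            f"SKILL.md has {body_lines} lines; keep it under {SKILL_MD_MAX_LINES}"
        )
    for name, text in sorted(references.items()):
        ref_lines = len(text.splitlines())
        has_toc = "table of contents" in text.lower()
        if ref_lines > REFERENCE_TOC_MIN_LINES and not has_toc:
            warnings.append(f"{name} has {ref_lines} lines but no table of contents")
    return warnings


print(size_warnings("short body\n", {"aws.md": "detail\n" * 350}))
```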

Domain organization — when a skill supports multiple domains/frameworks, organize by variant:

cloud-deploy/
├── SKILL.md (workflow + selection)
└── references/
    ├── aws.md
    ├── gcp.md
    └── azure.md

The agent reads only the relevant reference file.

Writing Patterns

Prefer using the imperative form in instructions.

Defining output formats:

## Report structure
ALWAYS use this exact template:
# [Title]
## Executive summary
## Key findings
## Recommendations

Examples pattern:

## Commit message format
**Example 1:**
Input: Added user authentication with JWT tokens
Output: feat(auth): implement JWT-based authentication

Writing Style

Explain to the model why things are important rather than using heavy-handed MUSTs. Use theory of mind, and keep the skill general rather than tailoring it narrowly to specific examples.

If you find yourself writing ALWAYS or NEVER in all caps, or using super rigid structures, that's a yellow flag — reframe and explain the reasoning so the model understands why the thing you're asking for is important. That's a more effective approach than brute-force instruction.

Start by writing a draft, then look at it with fresh eyes and improve it.

Self-Improvement Section

Every skill should end with a self-improvement section that tells the agent to update the skill when it encounters gaps, failures, or new patterns. This ensures the skill gets better over time. Example:

## Self-Improvement

After using this skill:
1. **If something failed or was wrong**: update the relevant section or add a new gotcha.
2. **If a new pattern emerged**: add it or create a new section.
3. **If a workaround was needed**: document it inline where the original guidance was.

Test Cases

After writing the skill draft, come up with 2-3 realistic test prompts — the kind of thing a real user would actually say. Share them with the user: "Here are a few test cases I'd like to try. Do these look right, or do you want to add more?"

Save test cases to evals/evals.json in the skill's workspace directory:

{
  "skill_name": "example-skill",
  "evals": [
    {
      "id": 0,
      "prompt": "User's task prompt",
      "expected_output": "Description of expected result",
      "assertions": [],
      "files": []
    }
  ]
}
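One way to produce that file is sketched below. The field names mirror the JSON shown above; the helper names (`make_eval`, `build_evals_document`, `save_evals`) are hypothetical, and the sketch assumes the workspace directory is passed in by the caller.

```python
import json
from pathlib import Path


def make_eval(eval_id: int, prompt: str, expected_output: str) -> dict:
    """Build one test case; assertions and files start empty."""
    return {
        "id": eval_id,
        "prompt": prompt,
        "expected_output": expected_output,
        "assertions": [],
        "files": [],
    }


def build_evals_document(skill_name: str, evals: list[dict]) -> dict:
    """Wrap the test cases in the top-level structure shown above."""
    return {"skill_name": skill_name, "evals": evals}


def save_evals(workspace: Path, document: dict) -> Path:
    """Write evals/evals.json under the workspace, creating folders as needed."""
    path = workspace / "evals" / "evals.json"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(document, indent=2) + "\n", encoding="utf-8")
    return path
```

Keeping the document builder separate from the file write makes it easy to add assertions later and save again.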

Running and Evaluating Test Cases

Put results in <skill-name>-workspace/ as a sibling to the skill directory. Within the workspace, organize results by iteration (iteration-1/, iteration-2/, etc.) and within that, each test case gets a directory (eval-0/, eval-1/, etc.). Create directories as you go.

Step 1: Run all test cases (with-skill AND baseline)

For each test case, run two versions — one with the skill, one without. Launch them in parallel using your subagent/task tool if available. If subagents aren't available, run them sequentially.

With-skill run: Execute the task prompt while following the skill's instructions. Save outputs to <workspace>/iteration-N/eval-ID/with_skill/outputs/.

Baseline run (same prompt, without the new version of the skill):

Creating a new skill: Run without the skill at all. Save to without_skill/outputs/.
Improving an existing skill: Run with the old version. Before editing, snapshot the skill, then use the snapshot for the baseline. Save to old_skill/outputs/.
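The directory layout described so far can be expressed as a small path helper. This is a sketch under the naming shown above (iteration-N/, eval-ID/, and the three output folders); `outputs_dir` and `baseline_variant` are hypothetical names introduced for illustration.

```python
from pathlib import Path

# Folder names taken from the text above.
VARIANTS = {"with_skill", "without_skill", "old_skill"}


def outputs_dir(workspace: Path, iteration: int, eval_id: int, variant: str) -> Path:
    """Return <workspace>/iteration-N/eval-ID/<variant>/outputs."""
    if variant not in VARIANTS:
        raise ValueError(f"unknown variant: {variant!r}")
    return workspace / f"iteration-{iteration}" / f"eval-{eval_id}" / variant / "outputs"


def baseline_variant(skill_already_exists: bool) -> str:
    """Pick the baseline folder: old version if one exists, else no skill."""
    return "old_skill" if skill_already_exists else "without_skill"


workspace = Path("example-skill-workspace")
print(outputs_dir(workspace, 1, 0, "with_skill").as_posix())
print(outputs_dir(workspace, 1, 0, baseline_variant(False)).as_posix())
```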

Write an eval_metadata.json for each test case. Give each eval a descriptive name based on what it's testing.

{
  "eval_id": 0,
  "eval_name": "descriptive-name-here",
  "prompt": "The user's task prompt",
  "assertions": []
}

Step 2: Draft assertions while runs are in progress

Use the time productively — draft quantitative assertions for each test case and explain them to the user. Good assertions are objectively verifiable and have descriptive names. Subjective skills (writing style, design quality) are better evaluated qualitatively — don't force assertions onto things that need human judgment.

Update the eval_metadata.json files and evals/evals.json with the assertions.

Step 3: Capture timing and results

When each run completes, save timing data to timing.json in the run directory:

{
  "total_tokens": 84852,
  "duration_ms": 23332,
  "total_duration_seconds": 23.3
}
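The seconds field in that record is derived from the millisecond field. A minimal sketch, assuming the token count and duration are supplied by whatever ran the test case (the source does not say where they come from); `timing_record` is a hypothetical helper.

```python
import json


def timing_record(total_tokens: int, duration_ms: int) -> dict:
    """Build a timing.json payload; seconds are rounded to one decimal."""
    return {
        "total_tokens": total_tokens,
        "duration_ms": duration_ms,
        "total_duration_seconds": round(duration_ms / 1000, 1),
    }


print(json.dumps(timing_record(84852, 23332), indent=2))
```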

Step 4: Grade and present results

Once all runs are done:

Grade each run — evaluate each assertion against the outputs. For assertions that can be checked programmatically, write and run a script. Save results to grading.json:

{
  "eval_id": 0,
  "expectations": [
    {"text": "Output contains valid JSON", "passed": true, "evidence": "Parsed successfully"},
    {"text": "Includes all required fields", "passed": false, "evidence": "Missing 'description' field"}
  ]
}

Aggregate results — compute pass rates for with-skill vs baseline. Note time and token differences.
Present to the user — show a summary of each test case: the prompt, key outputs, assertion results, and any observations. Highlight patterns the aggregate stats might hide (assertions that always pass regardless of skill, high-variance results, time/token tradeoffs).
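The aggregation step might look like the sketch below. It assumes each grading.json has been loaded into a dict with the `expectations` list shown above; `pass_rate` and `compare_pass_rates` are hypothetical helpers, and averaging per-test-case rates (rather than pooling all assertions) is a design choice of this sketch.

```python
def pass_rate(grading: dict) -> float:
    """Fraction of expectations that passed for one graded run."""
    expectations = grading["expectations"]
    if not expectations:
        return 0.0
    passed = sum(1 for item in expectations if item["passed"])
    return passed / len(expectations)


def compare_pass_rates(with_skill: list[dict], baseline: list[dict]) -> dict:
    """Average the per-run pass rates for each variant and report the gap."""

    def mean_rate(gradings: list[dict]) -> float:
        if not gradings:
            return 0.0
        return sum(pass_rate(g) for g in gradings) / len(gradings)

    skill_rate = mean_rate(with_skill)
    baseline_rate = mean_rate(baseline)
    return {
        "with_skill": skill_rate,
        "baseline": baseline_rate,
        "delta": skill_rate - baseline_rate,
    }
```

An assertion that passes in both variants contributes nothing to `delta`, which is one way to spot assertions that always pass regardless of the skill.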

Step 5: Get feedback

Ask the user to review the results. Empty feedback means they thought it was fine. Focus improvements on test cases where the user had specific complaints.

Improving the skill

How to think about improvements

Generalize from the feedback. You're iterating on a few examples to move fast, but the skill will be used across many different prompts. Rather than putting in fiddly overfitty changes or oppressively constrictive MUSTs, try branching out with different metaphors or recommending different patterns.
Keep the prompt lean. Remove things that aren't pulling their weight. Read the transcripts, not just the final outputs — if the skill makes the model waste time on unproductive steps, trim those parts.
Explain the why. Try hard to explain the reasoning behind everything you're asking the model to do. Even if the feedback is terse, understand the task and transmit that understanding into the instructions.
Look for repeated work across test cases. If all test runs independently wrote similar helper scripts or took the same multi-step approach, the skill should bundle that script. Write it once, put it in scripts/, and reference it from the skill.

Take your time. Write a draft revision, then look at it fresh and improve it.

The iteration loop

Once you have settled on the improvements:

Apply them to the skill
Rerun all test cases, including baseline runs, into a new iteration directory (iteration-<N+1>/)
Present results to the user with comparison to previous iteration
Wait for feedback, improve again, repeat

Keep going until:

The user says they're happy
The feedback is all empty (everything looks good)
You're not making meaningful progress

NSLS2 Skills Repo Conventions

When adding a skill to the NSLS2 skills repo (NSLS2/skills):

Create skills/<skill-name>/SKILL.md

Add the skill path to .claude-plugin/marketplace.json:

{
  "skills": [
    "./skills/existing-skill",
    "./skills/new-skill"
  ]
}

Update README.md — add a row to the Available Skills table
Include a Self-Improvement section at the end of the skill
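The marketplace.json step can be sketched as a small, idempotent update. This assumes the `skills` list sits at the top level exactly as in the fragment above; the real file may carry other keys or nest the list differently, and `add_skill_path` is a hypothetical helper.

```python
import json


def add_skill_path(manifest: dict, skill_name: str) -> dict:
    """Return a copy of the manifest with ./skills/<skill_name> listed once."""
    entry = f"./skills/{skill_name}"
    skills = list(manifest.get("skills", []))
    if entry not in skills:
        skills.append(entry)
    # Other keys are carried over untouched.
    return {**manifest, "skills": skills}


manifest = {"skills": ["./skills/existing-skill"]}
print(json.dumps(add_skill_path(manifest, "new-skill"), indent=2))
```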

Self-Improvement

After using this skill to create or improve another skill:

If the process was confusing or inefficient: update the relevant section with clearer guidance.
If a new pattern for skill writing emerged: add it to the Writing Guide.
If the test/eval workflow had gaps: update the Running and Evaluating section.
If repo conventions changed: update the NSLS2 Skills Repo Conventions section.
