Scaffold prompt playgrounds; test variations across inputs, score outputs.
Slash command: /prompt-engineer:playground
A directory for evaluating and improving prompts against inputs.
<playground>/
├── config.toml              # Generation and composition settings
├── task.md                  # Goal, evaluation criteria, constraints (free-form markdown)
├── prompts/
│   └── <slot>/              # Named slot ("main", "system", "critic", …)
│       ├── config.toml      # { default = "base" }
│       └── <variation>.md   # Freeform name ("base", "concise", "v2")
├── inputs/
│   └── <case>.md            # One file per test case; stem = case name in outputs
└── outputs/
    └── <run-label>/         # Human-chosen ("baseline", "concise-v2", …)
        ├── run.toml         # Standalone invoke-llm TOML (see invoke-llm skill)
        └── <case>.md        # LLM output; YAML frontmatter for eval
Ask user for: directory path, task description, prompt slots (default: single main), initial inputs. Create config.toml, task.md, one variation per slot, and inputs. Do NOT create outputs/.
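The setup step above can be sketched in Python as follows. This is a minimal illustration, not the skill's own tooling: the `scaffold` function, its parameters, and the choice to name the first variation `base` are assumptions for the example. It writes the root config, the task file, one variation per slot, and the inputs, and it leaves `outputs/` uncreated.

```python
from pathlib import Path

# Root config template mirroring the single-message example; {parts} is filled in below.
ROOT_CONFIG = """\
[generation]
model = "claude-sonnet-4-6"
temperature = 1.0
max_tokens = 4096

[composition]
separator = "\\n\\n"
substitute = false
parts = [{parts}]
"""


def scaffold(root: Path, task: str, prompts: dict[str, str], inputs: dict[str, str]) -> None:
    """Create a playground skeleton (hypothetical helper, not part of the plugin).

    prompts maps slot name -> text of its first variation (saved as base.md).
    inputs maps case name -> input text.
    """
    root.mkdir(parents=True, exist_ok=True)

    # Single-message composition: every slot in order, then the current input.
    parts = ", ".join([f'"prompts/{slot}"' for slot in prompts] + ['"inputs"'])
    (root / "config.toml").write_text(ROOT_CONFIG.format(parts=parts), encoding="utf-8")
    (root / "task.md").write_text(task, encoding="utf-8")

    for slot, text in prompts.items():
        slot_dir = root / "prompts" / slot
        slot_dir.mkdir(parents=True, exist_ok=True)
        # Per-slot config selects the variation used when -v omits the slot.
        (slot_dir / "config.toml").write_text('default = "base"\n', encoding="utf-8")
        (slot_dir / "base.md").write_text(text, encoding="utf-8")

    inputs_dir = root / "inputs"
    inputs_dir.mkdir(exist_ok=True)
    for case, text in inputs.items():
        (inputs_dir / f"{case}.md").write_text(text, encoding="utf-8")
    # outputs/ is deliberately not created here.
```

A call such as `scaffold(Path("my-playground"), "Summarize in one line.\n", {"main": "Summarize the text.\n"}, {"short": "Hello.\n"})` would produce the default single-slot layout.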
[generation]
model = "claude-sonnet-4-6"
temperature = 1.0
max_tokens = 4096
[composition]
separator = "\n\n" # join between parts; inherited by each message
substitute = false # true → replace {{input}} in part text; "inputs" in parts becomes [vars]
# Single-message mode (cannot coexist with [[composition.messages]]):
parts = ["prompts/main", "inputs"]
# role = "user" # default
# Multi-message mode:
# [[composition.messages]]
# role = "system"
# parts = ["preamble.md", "prompts/main"]
#
# [[composition.messages]]
# role = "user"
# parts = ["prompts/main"]
# substitute = true
Paths resolve relative to playground root. Directory paths (e.g., "prompts/main") resolve to the selected variation at run time. "inputs" resolves to the current test case.
All .md files support optional TOML frontmatter (+++). Frontmatter is stripped before LLM calls.
- prompts/*/*.md, inputs/*.md: comments = "..." (free-form note).
- outputs/*/*.md use TOML frontmatter written by playground-run.py:
  - --json: model, input_tokens, output_tokens, latency_ms, stop_reason.
  - score (1–5, scale/criteria from task.md), comments.

scripts/playground-run.py <playground-dir> -l <run-label> [-v SLOT=VAR,...] [-i PATTERN] [--dry-run] [--json]
| Flag | Description |
|---|---|
| `<playground-dir>` | Positional. Path to playground root. |
| `-l` / `--label` | Run label. Required unless `--dry-run`. |
| `-v` / `--variation` | Slot variations. Repeatable. `-v main=base,concise` sweeps both. `-v main=*` sweeps all .md in slot. Omitted slots use default. |
| `-i` / `--inputs` | Filter inputs by name pattern. Default: all. |
| `--dry-run` | Print generated TOMLs to stdout, don't execute. |
| `--json` | Include metadata (tokens, latency) in output frontmatter. |
Output: Single combo → outputs/<label>/<case>.md + run.toml. Multiple combos → outputs/<label>/<combo-label>/ subdirs.
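To make the sweep and output layout concrete, here is a rough Python sketch of how `-v` values could expand into combinations and map to output directories. The function names are hypothetical, and the `slot-variation` combo-label format is an assumption: the text above does not say how `<combo-label>` is built, so the real script may name subdirectories differently.

```python
from itertools import product
from pathlib import Path


def expand_sweep(flags: list[str], available: dict[str, list[str]]) -> list[dict[str, str]]:
    """Turn repeated -v values into one {slot: variation} dict per combination.

    available maps slot -> variation names on disk, used only to expand "*".
    """
    sweep: dict[str, list[str]] = {}
    for flag in flags:
        slot, sep, names = flag.partition("=")
        if not sep:
            raise ValueError(f"expected SLOT=VAR, got {flag!r}")
        chosen = available[slot] if names == "*" else names.split(",")
        sweep.setdefault(slot, []).extend(chosen)

    slots = list(sweep)
    # Cartesian product across slots; no flags yields a single empty combo (all defaults).
    return [dict(zip(slots, combo)) for combo in product(*(sweep[s] for s in slots))]


def output_dir(root: Path, label: str, combo: dict[str, str], n_combos: int) -> Path:
    """Directory that would receive <case>.md and run.toml for one combination."""
    base = root / "outputs" / label
    if n_combos == 1:
        return base
    # Assumed naming scheme for <combo-label>; not confirmed by the source.
    combo_label = "_".join(f"{slot}-{var}" for slot, var in combo.items())
    return base / combo_label
```

Under these assumptions, `expand_sweep(["main=base,concise"], {})` yields two combinations, so each lands in its own subdirectory of `outputs/<label>/`, whereas a run with no `-v` flags yields one combination and writes directly into `outputs/<label>/`.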
Score outputs by editing their frontmatter (score, comments).
npx claudepluginhub 123jimin-llm/marketplace --plugin prompt-engineer