From grimoire
Creates structured, reusable prompt templates for reliable LLM outputs in production systems using chain-of-thought and few-shot examples.
How this skill is triggered — by the user, by Claude, or both
Slash command
/grimoire:design-prompt-templateThe summary Claude sees in its skill listing — used to decide when to auto-load this skill
Create structured, reusable prompt templates that produce reliable, consistent LLM outputs across diverse inputs in production systems.
Create structured, reusable prompt templates that produce reliable, consistent LLM outputs across diverse inputs in production systems.
Adopted by: Anthropic, OpenAI, Google (all publish prompt engineering guides); every production LLM product uses templated prompts, not ad-hoc strings Impact: Structured prompts reduce output variance by 40-70% (OpenAI cookbook benchmarks); chain-of-thought prompting improves reasoning accuracy by 18-57% on benchmark tasks (Wei et al. 2022) Why best: LLMs are sensitive to exact wording; unstructured prompts produce inconsistent outputs; templates enforce structure, enable testing, and make prompt evolution manageable
Sources: Wei et al. "Chain-of-Thought Prompting Elicits Reasoning in Large Language Models" NeurIPS (2022); OpenAI "Prompt Engineering" docs; Anthropic "Claude's Constitution" guidance
Define the task precisely — Write a one-sentence task definition: "Given X, produce Y in format Z." Vague tasks produce vague outputs. If you cannot state the task in one sentence, decompose it. The task definition becomes the core of your system prompt.
Design the system prompt — Assign a clear role: "You are a [role] that [does task]." Specify: output format (JSON schema, markdown, numbered list), tone and style, constraints ("respond only in English," "do not speculate"), and what to do when input is ambiguous or out of scope.
Structure the user prompt template — Use clear delimiters to separate variable content from instructions. XML tags work well for Claude: <document>{{content}}</document>. Markdown code fences work for code. Place variable content after instructions to prevent prompt injection via user data.
Apply chain-of-thought for reasoning tasks — For tasks requiring multi-step reasoning (analysis, math, classification), add: "Think step by step before giving your final answer" or use <thinking> tags to elicit reasoning before the answer. Chain-of-thought consistently improves accuracy on complex tasks.
Use few-shot examples for format enforcement — Include 2-5 representative input-output examples when output format is complex or unusual. Examples are more reliable than format instructions alone. Choose examples that cover edge cases and represent the expected distribution of inputs.
Specify output format explicitly — If returning structured data: provide a JSON schema example or a template with placeholder values. Request XML or JSON over free prose when downstream code parses the output. Validate output format in your application and retry on parse failure.
Handle edge cases in the prompt — Explicitly instruct the model on what to do with: ambiguous inputs, out-of-scope requests, insufficient information, and adversarial inputs. "If the input is not a valid X, respond with {error: 'invalid_input', reason: '...'}." Models without explicit edge case handling produce unpredictable outputs on edge inputs.
Parameterize and version the template — Store prompt templates as versioned files or in a prompt registry (not hardcoded strings). Use a templating system (Jinja2, Handlebars) for variable substitution. Treat prompt changes as code changes requiring review and testing.
Evaluate with a test set — Build a labeled test set of 50-200 input-output pairs covering normal cases, edge cases, and adversarial inputs. Run evaluations before deploying prompt changes. Define pass/fail criteria (ROUGE score, JSON validity, human rating threshold). Never ship prompt changes without evaluation.
Monitor output quality in production — Log a sample of production inputs and outputs. Implement output classifiers or LLM-as-judge to detect failures (format violations, refusals, off-topic responses). Alert on quality degradation. Model updates can break prompts without notice.
npx claudepluginhub jeffreytse/grimoire --plugin grimoireGuides structured prompt design for LLMs: role, context, task, format, constraints. Useful when outputs are inconsistent, too long, off-format, or incorrect.
Designs reusable, parameterized prompt templates for consistent LLM outputs. Covers anatomy, variables, patterns, composition, quality criteria, and artefacts.
Provides workflows to write, debug, and optimize LLM prompts using few-shot examples, chain-of-thought structuring, system prompts, and templates. Activates for prompt improvement requests.