Skill

gauntlet-author

Author Gauntlet Trial YAMLs from a product spec. Use this skill when the user wants to translate a verbose product specification, design doc, or natural-language description of an HTTP service into testable invariants packaged as Gauntlet Trials. Triggers include "author trials from this spec", "generate gauntlet trials", "propose trials for this API", "make trials from this design doc", "what should we test about this service".

Popularity

Stars

Invocation

How this skill is triggered — by the user, by Claude, or both

Slash command

/gauntlet:gauntlet-author

User invocable

Model invocable

Inline context

Default effort

Context Preview

The summary Claude sees in its skill listing — used to decide when to auto-load this skill

This skill takes a product spec — markdown, plain text, or a path to one — and produces a `.gauntlet/trials/` directory of Trial YAMLs ready to feed the Gauntlet hardening loop. The skill is the translation layer between "what the system is supposed to do" (a human-authored spec) and "what externally observable invariants we will check" (Gauntlet Trials).

SKILL.md

114 lines · ~1.8k tokens

Stats

LanguagePython

Stars1

MaintenanceExcellent

Last CommitJun 6, 2026

Actions

View Source View Plugin View on GitHub View README

Stats

Actions

Gauntlet Author

This skill takes a product spec — markdown, plain text, or a path to one — and produces a .gauntlet/trials/ directory of Trial YAMLs ready to feed the Gauntlet hardening loop. The skill is the translation layer between "what the system is supposed to do" (a human-authored spec) and "what externally observable invariants we will check" (Gauntlet Trials).

You are reasoning about invariants, not endpoints. The Trial describes the property that must hold; the host's Attacker subagent will figure out how to exercise the API surface.

Critical: Trials own the train/test split

Each Trial has two halves:

description — the attack surface, given to the Attacker. Phrased as "the API enforces X." The Attacker uses this to compose probes.
blockers — the falsifiable acceptance criteria, withheld from the Attacker and given only to the HoldoutEvaluator. Phrased as "A request that does X is rejected with status 403."

If both halves describe the same thing, the holdout adds no information beyond what the Attacker already has. They must be different angles on the same invariant. The description is a generality; each blocker is a concrete, observable check.

If you write a description that gives away the blockers, you have collapsed the train/test split before the run even starts. Re-write.

What to do when invoked

Step 1 — Get the spec

The host gives you one of:

a file path (read it via Read),
inline spec text in the dispatch prompt,
a directory containing related docs (read each via Read).

If the spec is missing or empty, surface that and stop. Do not invent invariants from no source.

Step 2 — Identify testable invariants

Read the spec end-to-end before writing anything. Look for properties that:

can be falsified by an external HTTP observation (status code, response body shape, side-effect on subsequent reads),
have a clear "should" or "must" in the spec, or are implicit in domain language ("only the owner can ...", "the cart must total ..."),
generalize across endpoints (one invariant tested at many endpoints is more valuable than one endpoint tested for many invariants).

Common invariant categories — use this as a checklist, not a script:

Category	Example invariants
Authorization	"Users cannot read or modify other users' resources."
Ownership	"A resource is mutable only by its creator."
Input validation	"Required fields are required. Type mismatches are rejected."
State transitions	"A completed order cannot be cancelled. A draft cannot be billed."
Read isolation	"Listing resources returns only those the caller can read."
Idempotency	"Re-submitting the same operation does not duplicate side effects."
Error boundaries	"Internal failures surface as 5xx with a stable error shape, not 200."

If the spec mentions a specific failure mode ("must reject malformed JSON with 400"), that is a blocker, not an invariant on its own — fold it into the relevant category.

If you find five candidate invariants, prefer five over fifty. Each trial adds 4+ iteration cycles of cost. A small, sharp arsenal that surfaces the first wave of regressions is more valuable than a sprawling one that exhausts the budget on shallow checks.

Step 3 — Write each Trial as YAML

Authoring location: .gauntlet/trials/<slug>.yaml. One file per trial. Slug is snake_case derived from the invariant.

Required fields, per the Trial Pydantic model in gauntlet/models.py:

id: resource_ownership_write_isolation     # snake_case, stable across runs
title: Users cannot modify each other's resources
description: >
  The API must enforce resource ownership across every write endpoint
  (PATCH, PUT, DELETE). A user who did not create a resource must not be
  able to mutate it through any documented route, including bulk operations.
blockers:
  - A PATCH request by a non-owner is rejected with 403 or 404
  - A DELETE request by a non-owner is rejected with 403 or 404
  - The resource body is unchanged after an unauthorized write attempt
  - A GET by the owner after a failed unauthorized write returns the original data

Authoring rules:

id is snake_case, required, and stable. The risk-report assembler keys findings by id; if you rename the id between runs, history breaks.
title is human-readable, sentence-case, no period at the end.
description is what the Attacker sees. It describes the surface and the invariant — not the acceptance criteria.
Each blocker is a single falsifiable statement with an expected status code where applicable. "Returns 403" is testable; "is secure" is not.
Do not include sample requests or expected responses in the YAML. That is the Attacker's job to compose at run time.

Self-checks before saving each trial:

A blocker shorter than ~20 chars is too vague; add the expected behaviour.
No HTTP status codes anywhere in the blockers usually means you're describing intent ("auth check happens") instead of behaviour ("returns 403").

Step 4 — Summarize

Return to the user (or orchestrator):

Number of trials authored.
One-line summary per trial (id + title).
Suggested next steps: run the gauntlet skill.

What this skill does not do

It does not run the adversarial loop. Authoring and running are separate concerns; the host composes them.
It does not score the authored trials for coverage. (That's a future feature; for now, human review.)
It does not invoke any LLM other than itself — no API calls, no separate Anthropic/OpenAI clients.
It does not modify trials that already exist in .gauntlet/trials/. If a trial with the same id is already there, surface the conflict and let the user decide; don't overwrite.

Common mistakes to avoid

Description leaks blockers. "The API rejects non-owner PATCH with 403" in the description means the Attacker reads the blocker for free. Re-write the description to "The API enforces resource ownership on writes."
Blocker is implementation, not behaviour. "Goes through the auth middleware" is not externally observable; "returns 403" is.
One trial per endpoint instead of per invariant. If "POST /tasks" and "POST /projects" both need the same auth check, that's one trial describing the invariant ("Writes require authentication"), not two.
Authoring fifty trials from a five-page spec. Start with the five sharpest invariants. The user can ask for more if those land.
Editing existing trials mid-run. Each trial's blockers anchor the holdout. If they churn, the holdout is no longer a holdout.

gauntlet-author

Popularity

Invocation

Context Preview

SKILL.md

gauntlet-author

Popularity

Invocation

Context Preview

SKILL.md

Gauntlet Author

Critical: Trials own the train/test split

What to do when invoked

Step 1 — Get the spec

Step 2 — Identify testable invariants

Step 3 — Write each Trial as YAML

Step 4 — Summarize

What this skill does not do

Common mistakes to avoid

Similar Skills

Gauntlet Author

Critical: Trials own the train/test split

What to do when invoked

Step 1 — Get the spec

Step 2 — Identify testable invariants

Step 3 — Write each Trial as YAML

Step 4 — Summarize

What this skill does not do

Common mistakes to avoid

Similar Skills