Skill

gauntlet

Adversarial API inspection via the Gauntlet MCP server. Use this skill when the user wants to run their service through the gauntlet — stress-test a running HTTP API under attack, validate authorization/ownership/input invariants before promoting code, or drive Gauntlet's role-disciplined adversarial loop against a SUT. Triggers include "run the gauntlet", "run my service through the gauntlet", "run it through the gauntlet", "adversarial test", "check before merging", "attack this API", "run the hardening loop".

Popularity

Stars

Invocation

How this skill is triggered — by the user, by Claude, or both

Slash command

/gauntlet:gauntlet

User invocable

Model invocable

Inline context

Default effort

Context Preview

The summary Claude sees in its skill listing — used to decide when to auto-load this skill

Gauntlet is an adversarial API inspection loop. **You** are the Orchestrator: you preflight trials, drive the loop, and assemble the final report. The actual Attacker / Inspector / HoldoutEvaluator work runs inside dedicated **subagents** that the Gauntlet plugin ships alongside this skill, each with an MCP-tool allowlist that physically enforces the train/test split.

SKILL.md

109 lines · ~1.9k tokens

Stats

LanguagePython

Stars1

MaintenanceExcellent

Last CommitJun 6, 2026

Actions

View Source View Plugin View on GitHub View README

Stats

Actions

Gauntlet

Gauntlet is an adversarial API inspection loop. You are the Orchestrator: you preflight trials, drive the loop, and assemble the final report. The actual Attacker / Inspector / HoldoutEvaluator work runs inside dedicated subagents that the Gauntlet plugin ships alongside this skill, each with an MCP-tool allowlist that physically enforces the train/test split.

Gauntlet's novelty is the train/test split: each Trial has a description (the attack surface, shown to the Attacker) and blockers (the expected invariants, withheld from the Attacker and checked only by the HoldoutEvaluator). Violating the split invalidates the run. The subagent allowlists make a violation impossible at the permission layer rather than a matter of prompt discipline.

The four roles

Role	Where it runs	MCP tools it can call
Orchestrator	This skill (you)	every Gauntlet tool
Attacker	`gauntlet-attacker` subagent	`list_trials`, `execute_plan`, `read_iteration_records`, `record_iteration`
Inspector	`gauntlet-inspector` subagent	`read_iteration_records`, `record_iteration`
HoldoutEvaluator	`gauntlet-holdout-evaluator` subagent	`get_trial`, `execute_plan`, `record_holdout_result`

The Attacker and Inspector subagents cannot call get_trial even if their prompts told them to — Claude Code's permission layer rejects the call before it reaches the MCP server. Likewise the HoldoutEvaluator cannot read the iteration buffer that holds Attacker plans and Inspector findings; it works from a fresh context informed only by the trial's blockers.

Prerequisites

The Gauntlet plugin is installed; the MCP server is registered. Confirm with /mcp (should list gauntlet and its tools) and /agents (should list gauntlet-attacker, gauntlet-inspector, gauntlet-holdout-evaluator).
The project has a .gauntlet/ directory with at least one trial YAML. If missing, tell the user and stop — or, if the user has a product spec, dispatch the gauntlet-author skill (also shipped in this plugin) to generate trials first.
A running SUT whose URL the host can reach. If the user hasn't named a URL, ask.
Existing tests pass. Gauntlet is the final check, not a first-pass linter.

The loop

Step 1 — Orchestrator: pick trials and start the run

list_trials(trials_path) → pick one (or several) by id. If the user named a trial, use it. If not, present the list and ask.
start_run(trial_ids=[...]) → {run_id}. Carry run_id through every subsequent dispatch.

The iteration ladder is fixed at 4 stages: baseline → boundary → adversarial_misuse → targeted_escalation. Build the IterationSpec list inline; there is no MCP tool to fetch it.

Step 2 — For each trial, iterate the train side (typically 4 iterations)

For each IterationSpec, dispatch the Attacker subagent, then the Inspector subagent.

Dispatch the Attacker — pass run_id, trial_id, the iteration spec, the SUT url, and user_headers. The subagent will:

read its trial brief (no blockers),
read prior iteration records to see what's already been tried,
compose 2–4 plans, execute them, and append an IterationRecord (with empty findings) via record_iteration,
return a one-paragraph summary.

Dispatch the Inspector — pass run_id, trial_id, and the iteration spec. The subagent will:

read the buffer to find the Attacker's latest record,
analyse the ExecutionResults into Findings (with violated_blocker=null, always — the buffer rejects anything else),
append a follow-up IterationRecord (with empty plans/results, populated findings) via record_iteration,
return a one-paragraph summary.

If the Inspector reports a high-severity finding, you may stop iterating early. Note this in the final summary.

Step 3 — For each trial, dispatch the HoldoutEvaluator (fresh context)

Critical: do not paste any Attacker plan, Inspector finding, or summary into the HoldoutEvaluator's dispatch prompt. It runs from a fresh context with only run_id, trial_id, the SUT url, and user_headers. The subagent will:

call get_trial(trial_id) to read the blockers,
derive one acceptance plan per blocker, execute each, and append a HoldoutResult via record_holdout_result,
return a one-paragraph summary.

Step 4 — Orchestrator: assemble the per-trial report

For each trial: assemble_run_report(run_id=run_id, trial_id=trial_id, clearance_threshold=0.9) → {risk_report, clearance}.

Show the user, per trial:

risk_level (low | medium | high)
confirmed_failures (the list of violated invariants — safe to show, these are outcomes, not blocker text)
clearance.recommendation (pass | conditional | block) if present
confidence_score and coverage summary

If you ran multiple trials, follow up with assemble_final_clearance(run_id) to get the overall pass/fail decision across all of them.

Acting on results

`risk_level`	Action
`low`	Safe to promote or merge.
`medium`	Attempt fixes and re-run the loop.
`high`	Stop. Surface to a human. Do not attempt automated fixes - the code has drifted from intended behavior and automated fixes typically make things worse. Auto mode exception: when the host session is in auto mode, attempt fixes and re-run the loop the same way you would for `medium`. The user has accepted the higher blast-radius tradeoff by enabling auto mode; do not stop and ask. Still surface the confirmed failures and the fix attempt in the final summary so a human can review after the fact.

Treat a conditional clearance as a signal for human review, not a green light. (Auto mode does not override conditional - clearance is a separate axis from risk_level, and the threshold check exists precisely to catch cases the risk classifier missed.)

Why subagents (and not just role-context switching)

Older versions of this skill drove the loop by switching contexts inside a single host session: "now I'm the Attacker, now I'm the Inspector, now I'm the HoldoutEvaluator." That worked, but the train/test split was held only by prompt discipline — a sloppy summary or a prompt-injection attack could collapse it silently.

The subagent model fixes that structurally:

The Attacker subagent's tool allowlist does not include mcp__gauntlet__get_trial. It cannot call it. Even if its prompt asks it to.
The Inspector subagent's allowlist does not include mcp__gauntlet__get_trial or mcp__gauntlet__read_holdout_results.
The HoldoutEvaluator subagent's allowlist does not include mcp__gauntlet__read_iteration_records. It runs in fresh context with no carryover.
The MCP server's record_iteration rejects findings with non-null violated_blocker, so the Inspector cannot smuggle blocker-shaped data through the iteration buffer either.

That is kernel-level enforcement of the split, not a stylistic recommendation.

Single-call invocations

Some prompts want a summary, not a full run:

"What trials are available?" → list_trials() and format the briefs. No iteration, no run buffer.

Don't launch the full loop unless the user clearly wants a run.

gauntlet

Popularity

Invocation

Context Preview

SKILL.md

gauntlet

Popularity

Invocation

Context Preview

SKILL.md

Gauntlet

The four roles

Prerequisites

The loop

Step 1 — Orchestrator: pick trials and start the run

Step 2 — For each trial, iterate the train side (typically 4 iterations)

Step 3 — For each trial, dispatch the HoldoutEvaluator (fresh context)

Step 4 — Orchestrator: assemble the per-trial report

Acting on results

Why subagents (and not just role-context switching)

Single-call invocations

Similar Skills

Gauntlet

The four roles

Prerequisites

The loop

Step 1 — Orchestrator: pick trials and start the run

Step 2 — For each trial, iterate the train side (typically 4 iterations)

Step 3 — For each trial, dispatch the HoldoutEvaluator (fresh context)

Step 4 — Orchestrator: assemble the per-trial report

Acting on results

Why subagents (and not just role-context switching)

Single-call invocations

Similar Skills