Adversarial API inspection loop. MCP server + host skill that runs inside a Claude Code session: the host plays Attacker and Inspector, Gauntlet executes plans against the SUT and assembles a risk report. Ships a train/test split so the host never sees the blockers it is testing.
Adversarial Attacker role for one Gauntlet trial iteration. Reads attacker-safe trial briefs, composes plans, executes them against the SUT, and appends the iteration to the run buffer. Never reads blocker text.
Holdout evaluator for one Gauntlet trial. Reads the trial's blockers, derives one acceptance plan per blocker, executes them against the SUT, and appends each HoldoutResult to the run buffer. Runs in fresh context — no Attacker or Inspector traces carry in.
Adversarial Inspector role for one Gauntlet trial iteration. Reads execution results from the run buffer, produces Findings, and appends them back. Never reads blocker text or holdout results.
Author Gauntlet Trial YAMLs from a product spec. Use this skill when the user wants to translate a verbose product specification, design doc, or natural-language description of an HTTP service into testable invariants packaged as Gauntlet Trials. Triggers include "author trials from this spec", "generate gauntlet trials", "propose trials for this API", "make trials from this design doc", "what should we test about this service".
Adversarial API inspection via the Gauntlet MCP server. Use this skill when the user wants to run their service through the gauntlet — stress-test a running HTTP API under attack, validate authorization/ownership/input invariants before promoting code, or drive Gauntlet's role-disciplined adversarial loop against a SUT. Triggers include "run the gauntlet", "run my service through the gauntlet", "run it through the gauntlet", "adversarial test", "check before merging", "attack this API", "run the hardening loop".
Two-role adversarial MCP server that infers software correctness by observing how code behaves under sustained, targeted attack. Quality control for dark-factory environments where code is written by bots and verified by attack.
Run your service through the gauntlet. Point a host Claude Code agent at a running service, hand it the trial set, and the gauntlet is what the service survives. The host plays Attacker and Inspector; Gauntlet provides the deterministic tools (config loading, plan execution, risk-report assembly).
AI-written code can look correct while hiding behavioral failures. Traditional tests can miss these failures when the same agent wrote both the code and the tests. Gauntlet's Attacker context assumes the code is broken, and each Trial's blockers never load into that context, which preserves a train/test split.
An Attacker uses a Trial aimed at a Target to generate Plans. Gauntlet's Drone executes those Plans as a User. An Inspector watches and surfaces Findings. Hidden Vitals are checked independently to produce a Clearance.
See docs/architecture.md for the model, docs/usage.md for the runbook, docs/development.md for dev setup.
Gauntlet ships as a Claude Code plugin bundling the MCP server and the host skill:
claude plugin marketplace add coilysiren/gauntlet
claude plugin install gauntlet@coilysiren-gauntlet
Restart Claude Code so the skill, MCP server, and subagents register, then confirm with /mcp and "run gauntlet". No Anthropic credentials are needed; the host session is already authenticated. For local development: git clone ... && claude --plugin-dir path/to/gauntlet.
The plugin delivers the MCP server, the gauntlet skill (orchestrator loop), gauntlet-author skill (spec to trial YAMLs), and gauntlet-attacker / -inspector / -holdout-evaluator subagents whose MCP allowlists enforce the train/test split. The Attacker subagent literally cannot call get_trial. The full MCP tool surface is listed in docs/FEATURES.md.
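As a rough illustration of how a per-role allowlist can enforce the split, the sketch below gates tool calls by role. It is not Gauntlet's implementation: the `ROLE_ALLOWLISTS` table and the `can_call` helper are hypothetical, and the allowlist contents are assumptions built only from the two tool names mentioned in this README (`get_trial` and `execute_plan`). The real tool surface is in docs/FEATURES.md.

```python
# Hypothetical allowlist table; the actual per-subagent allowlists live in
# the plugin's subagent definitions and may contain different tools.
ROLE_ALLOWLISTS: dict[str, frozenset[str]] = {
    # Assumption: the Attacker may execute plans but never read trials.
    "gauntlet-attacker": frozenset({"execute_plan"}),
    # Assumption: the holdout evaluator reads blockers and executes plans.
    "gauntlet-holdout-evaluator": frozenset({"get_trial", "execute_plan"}),
}


def can_call(role: str, tool: str) -> bool:
    """Return True only if the tool is on the role's allowlist.

    Unknown roles get an empty allowlist, so they are denied by default.
    """
    return tool in ROLE_ALLOWLISTS.get(role, frozenset())


# The Attacker is denied get_trial, so blocker text cannot reach it.
attacker_sees_trial = can_call("gauntlet-attacker", "get_trial")
evaluator_sees_trial = can_call("gauntlet-holdout-evaluator", "get_trial")
```

Denying by default matters here: a role that is missing from the table gets no tools, so it cannot leak blockers by accident.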
your-project/
└── .gauntlet/
    └── trials/
        ├── task_ownership.yaml
        └── ...
Trials define reusable attack strategies. A trial's blockers are externally observable truths about expected behavior, and they are never loaded into the Attacker context:
title: Users cannot modify each other's tasks
description: >
  The task API must enforce resource ownership.
blockers:
  - A PATCH request by a non-owner is rejected with 403
  - The task body is unchanged after an unauthorized PATCH attempt
  - A GET by the owner after an unauthorized PATCH returns the original data
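One way to picture the split for the trial above is a projection that copies only the attacker-safe fields and drops the blockers. This is a minimal sketch, not Gauntlet's code: the `Trial` and `AttackerBrief` classes and the `attacker_brief` function are hypothetical names, and it assumes the trial has already been parsed from YAML into Python values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Trial:
    """Full trial as authored, including the held-out blockers."""
    title: str
    description: str
    blockers: tuple[str, ...] = ()


@dataclass(frozen=True)
class AttackerBrief:
    """Attacker-safe view of a trial; it has no blockers field at all."""
    title: str
    description: str


def attacker_brief(trial: Trial) -> AttackerBrief:
    # Copy only the fields the Attacker may see; blocker text is dropped.
    return AttackerBrief(title=trial.title, description=trial.description)


trial = Trial(
    title="Users cannot modify each other's tasks",
    description="The task API must enforce resource ownership.",
    blockers=(
        "A PATCH request by a non-owner is rejected with 403",
        "The task body is unchanged after an unauthorized PATCH attempt",
        "A GET by the owner after an unauthorized PATCH returns the original data",
    ),
)
brief = attacker_brief(trial)
```

Using a separate type for the brief, rather than blanking the `blockers` field, means code that only handles `AttackerBrief` values has no field through which blocker text could arrive.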
If the SUT requires auth, the orchestrator passes user_headers to execute_plan (a dict[str, dict[str, str]] mapping user names to headers). Users without an entry fall back to X-User: <name>.
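The fallback rule can be sketched as a small lookup. The `resolve_headers` helper below is hypothetical and only illustrates the behavior described above; how `execute_plan` applies the mapping internally is not shown in this README. The sketch assumes that a user with an entry, even an empty one, uses that entry as-is.

```python
from typing import Optional


def resolve_headers(
    user: str,
    user_headers: Optional[dict[str, dict[str, str]]] = None,
) -> dict[str, str]:
    """Pick the request headers for one named user.

    Users with an entry in user_headers get a copy of that entry;
    everyone else falls back to a single X-User header.
    """
    if user_headers is not None and user in user_headers:
        # Copy so callers cannot mutate the shared mapping.
        return dict(user_headers[user])
    return {"X-User": user}


# Example mapping; the token value is a placeholder, not a real credential.
user_headers = {"alice": {"Authorization": "Bearer alice-token"}}

alice = resolve_headers("alice", user_headers)
bob = resolve_headers("bob", user_headers)
```

Here `alice` carries the configured Authorization header, while `bob` has no entry and is identified by `X-User: bob`.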
docs/architecture.md. Cross-reference convention from coilysiren/agentic-os#59.
npx claudepluginhub coilyco-flight-deck/gauntlet --plugin gauntlet