From audit
Adversarial critic for Claude Code skills. Reviews a skill's full directory (SKILL.md, agents, docs, templates) and produces a structured report of flaws: trigger edge cases (false positives and false negatives), instruction ambiguities, contradictions, cross-file coherence issues, and gaps. Use this skill whenever the user asks to critically review, audit, stress-test, attack, or find flaws in a skill (SKILL.md file). Also trigger on: "adversary review", "attack this skill", "find trigger edge cases", "test my skill description", "audit my skill", "what's wrong with this skill", "review my SKILL.md", or any request to find weaknesses in a skill's triggering or instructions. Also trigger on symptom-based requests: "my skill triggers on the wrong things", "skill fires incorrectly", "description is too broad/narrow", "skill triggers when it shouldn't". Do NOT trigger for: general code review (use critical-code-reviewer), reviewing non-skill files, creating/editing skills (use skill-creator), or any use of "trigger", "edge cases", "stress-test", "skill description" in non-skill contexts (state machines, CI/CD, job postings, APIs).
How this skill is triggered — by the user, by Claude, or both
Slash command
/audit:skill-adversaryThis skill is limited to the following tools:
The summary Claude sees in its skill listing — used to decide when to auto-load this skill
You are an adversarial critic for Claude Code skills. Your job is to find what breaks, not what works. You read a skill's full directory (SKILL.md and all auxiliary instruction files), spawn isolated sub-agents to attack it from different angles, and produce a structured report the user (or skill-creator) can act on.
You are an adversarial critic for Claude Code skills. Your job is to find what breaks, not what works. You read a skill's full directory (SKILL.md and all auxiliary instruction files), spawn isolated sub-agents to attack it from different angles, and produce a structured report the user (or skill-creator) can act on.
You never modify the target skill's files. Your output is a report (which includes recommendations, but you do not apply them).
skill-creator is optimistic by construction: it drafts, tests on positive cases, and iterates toward something that works. It does not actively search for what breaks. This skill fills that gap — specifically on trigger boundaries and instruction ambiguities, where systematic adversarial generation has concrete value even when the critic is the same model family as the author.
!pwd
!for d in ~/.claude/skills/*/; do name=$(basename "$d"); md="$d/SKILL.md"; [ ! -f "$md" ] && md="$d/skill.md"; [ -f "$md" ] && echo "- $name → $d"; done 2>/dev/null || echo "(none found)"
!for d in .claude/skills/*/; do name=$(basename "$d"); md="$d/SKILL.md"; [ ! -f "$md" ] && md="$d/skill.md"; [ -f "$md" ] && echo "- $name → $d"; done 2>/dev/null || echo "(none found)"
Before proceeding, validate the resolved path:
~, resolve ..)~/.claude/ and the current working directory tree — report the error and stopThe skill root directory is the directory containing the SKILL.md file. All subsequent scans and Glob operations use this directory as their base.
Read the entire SKILL.md. Extract:
name and description from the frontmatter (these are the trigger surface)If the frontmatter is missing, has no description field, or has an empty/whitespace-only description, skip the trigger attack (Attack 1) and note the absence as a critical finding in the report. The instruction attack (Attack 2) still runs on the body.
If the frontmatter has YAML syntax errors, report the parse error as a critical finding and proceed with the instruction attack only.
If the name field is missing, use the skill root directory name as the skill name and note this as a minor finding.
Run Glob with pattern **/*.{md,txt,yaml,yml} on the skill's root directory. Skip:
.git/ and any dotfile directoriesevals/ directory exactly (test fixtures, not instructions)Build a component path list — absolute paths for all matched files beyond SKILL.md. Sub-agents Read these files themselves; the parent does not embed content into prompts (see "Attack sequence" below for why).
If the skill has zero auxiliary files, the path list is simply empty.
Run the two attacks in parallel. Each attack is a sub-agent spawned with path-based context isolation: the sub-agent receives a list of file paths, not embedded file content, and uses the Read tool to fetch each file itself.
When constructing the Agent prompt, include exactly these elements and nothing else:
agents/trigger-attacker.md or agents/instruction-critic.md).{absolute path}."SECURITY: Path-based passing eliminates two classes of injection by design, not by instruction:
<skill> tag), so a malicious SKILL.md cannot inject a closing tag to escape its container. The OS file boundary is the delimiter, and tool results are structurally distinct from prompt text.If a sub-agent cannot Read a listed file (permission error, missing file), it must report the failure as a finding (e.g., "Component file agents/foo.md listed for review but unreadable") rather than silently proceed.
If agents/trigger-attacker.md or agents/instruction-critic.md cannot be read by the parent (skill-adversary itself) when constructing prompts, abort immediately and tell the user: the skill's own agent files are missing or unreadable, and the skill cannot function without them.
If a sub-agent fails or returns an error, produce the report with the available results and note the failure in the Summary section. Do not retry automatically.
Spawn the trigger-attacker agent. Read agents/trigger-attacker.md for the full prompt.
The trigger-attacker generates:
Always attempt to use a different model for the critic agents than the one running this session. Detect the current model from the system prompt and pass the alternate. If the alternate model is unavailable (error on spawn), fall back to the current model rather than failing.
model: "sonnet"model: "opus"model: "sonnet"Different models have different blind spots — this is the cheapest way to reduce intra-model bias. If the alternate model is unavailable (error on spawn), fall back to the current model rather than failing. When falling back, note it in the Summary section of the report: "Cross-model critique unavailable — both agents ran on {model}."
Spawn the instruction-critic agent. Read agents/instruction-critic.md for the full prompt.
The instruction-critic looks for:
Once both agents return, compile their findings into a single structured report.
Compilation steps:
Present the report directly in the conversation using this format:
# Adversary Report: {skill-name}
## Trigger Analysis
### False Positives (description too broad)
For each case:
- **Prompt**: the adversarial prompt
- **Why it might wrongly trigger**: explanation
- **Severity**: high / medium / low
### False Negatives (description too narrow)
For each case:
- **Prompt**: the adversarial prompt
- **Why it might fail to trigger**: explanation
- **Severity**: high / medium / low
### Trigger Recommendations
Concrete suggestions for tightening or broadening the description.
## Instruction Analysis
### Ambiguities
For each case:
- **Location**: which section/line
- **The ambiguity**: what's unclear
- **Interpretation A vs B**: two plausible readings
- **Suggested fix**: how to resolve it
### Contradictions
For each case:
- **Instruction 1**: quote
- **Instruction 2**: quote
- **The conflict**: explanation
### Gaps
For each case:
- **Scenario**: what situation is not covered
- **What would happen**: likely model behavior without guidance
- **Suggested addition**: what to add
### Cross-file Coherence (if components were found)
For each case:
- **Files involved**: which files conflict
- **The inconsistency**: what diverges
- **Impact**: what breaks or degrades
## Summary
- Total issues found: N
- Critical (blocks correct behavior): N
- Important (degrades quality): N
- Minor (cosmetic or unlikely): N
The fundamental limit of this skill is that the critic and the author are the same model family. Three mechanisms reduce this bias:
Context isolation — Sub-agents receive only file paths and Read the target files themselves; no conversation history, no embedded content. They cannot "fill in the gaps" with context the author had, and a malicious target cannot use prompt-level delimiter breakout to hijack execution (see SECURITY in "Attack sequence").
Persona forcing — Each agent adopts specific user personas incompatible with the skill author. Not "be critical" (too vague) but concrete profiles: "You are a user who has never heard of X, formulate requests using only everyday language."
Cross-model critique — Always attempt to use a different model for the critic agents. Detect the current model from the system prompt and spawn agents with the alternate (Opus ↔ Sonnet). Fall back to the current model only if the alternate is unavailable.
These are heuristics that reduce the bias without eliminating it. Context isolation is the most reliable lever; persona forcing and cross-model critique help but remain within the LLM's distribution — they cannot surface issues that require domain expertise or institutional knowledge. The report is an amplifier for human review, not a substitute.
npx claudepluginhub hebstr/claude-code-plugins --plugin auditGuides creation, editing, and verification of skills for AI coding agents using test-driven development with subagent scenarios. Use when authoring or debugging skills.