Skill

prompt-clash-ensemble

Fans out 4 parallel prompt-clash defend agents at staggered time budgets (1min, 2min, 3min, 4min), then ensembles/synthesizes the best elements of each into a single hardened prompt. Use when the user wants the strongest possible defense prompt and has ~5 minutes. Trigger phrases include "prompt clash ensemble", "ensemble defense", "fan out prompt clash", "best defense possible", "ensemble prompt", "multi-budget defense".

Invocation

How this skill is triggered — by the user, by Claude, or both

Slash command

/claude-skills:prompt-clash-ensemble

User invocable

Model invocable

Inline context

Default effort

Context Preview

The summary Claude sees in its skill listing — used to decide when to auto-load this skill

Fan out parallel prompt-clash defenders at staggered time budgets, then ensemble the results into a single hardened prompt stronger than any individual attempt.

Supporting Files

aggregate_ensemble.py
references/advanced-techniques.md
references/attack-playbook.md
references/defense-templates.md
references/multi-model-adversarial.md
references/secure-code-prompting.md
references/tournament-strategy.md

SKILL.md

228 lines · ~3.4k tokens

Stats

Language: Python

Stars: 3

Maintenance: Excellent

Last Commit: Jun 2, 2026

Prompt Clash Ensemble

Fan out parallel prompt-clash defenders at staggered time budgets, then ensemble the results into a single hardened prompt stronger than any individual attempt.

Why Ensemble

Different time budgets produce structurally different prompts:

| Budget | What it optimizes for | Typical structure |
|--------|-----------------------|-------------------|
| 1min | Token efficiency, minimal-nudge | Persona + 3-5 terse security fixes. Often scores highest on token efficiency. |
| 2min | Balanced coverage | Requirements restatement + security section + trust boundary markers. |
| 3min | Thorough coverage | Full security section with positive/negative pairs, concrete function names, allowlists. |
| 4min | Self-attack hardened | Everything above + self-attack iteration. Most robust against adversarial probing. |

No single budget dominates all scoring dimensions. The ensemble extracts the best traits from each and fuses them into a prompt that scores high on both token efficiency AND security coverage.

Execution Model

Phase 1: Fan-Out (parallel, ~4min wall clock)

Spawn 4 agents in parallel, each running the prompt-clash defend mode with a different time budget. All agents receive identical challenge input.

| Agent ID | Budget | Model | Spawn mechanism |
|----------|--------|-------|-----------------|
| `defender_1m` | 60s | `sonnet` | Task tool, `run_in_background=true` |
| `defender_2m` | 120s | `sonnet` | Task tool, `run_in_background=true` |
| `defender_3m` | 180s | `sonnet` | Task tool, `run_in_background=true` |
| `defender_4m` | 240s | `sonnet` | Task tool, `run_in_background=true` |

Each agent writes its output to /tmp/prompt-clash-ensemble-{run_id}/defender_{budget}m.md.
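
The fan-out plan above can be sketched as plain data before any agent is spawned. This is a minimal illustration, not the skill's actual code: `DefenderSpec` and `plan_fan_out` are hypothetical names, and the sketch only derives agent IDs, budgets, and output paths from the table and path pattern given here. Spawning via the Task tool is left out.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class DefenderSpec:
    agent_id: str        # e.g. "defender_1m"
    budget_seconds: int  # time budget handed to the agent
    model: str           # model alias from the table above
    output_path: Path    # file the agent writes its prompt to


def plan_fan_out(run_id: str, budgets_min=(1, 2, 3, 4), base_dir: str = "/tmp") -> list[DefenderSpec]:
    """Build one spec per budget; all defenders share a single run directory."""
    run_dir = Path(base_dir) / f"prompt-clash-ensemble-{run_id}"
    return [
        DefenderSpec(
            agent_id=f"defender_{minutes}m",
            budget_seconds=minutes * 60,
            model="sonnet",
            output_path=run_dir / f"defender_{minutes}m.md",
        )
        for minutes in budgets_min
    ]


for spec in plan_fan_out("demo"):
    print(spec.agent_id, spec.budget_seconds, spec.output_path.as_posix())
```

Keeping the plan as immutable data makes it easy to check that every agent receives the same run directory and a distinct output file.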

Agent prompt template (filled per-agent):

You are a prompt-clash defender. Your job is to produce a single hardened defense prompt for the challenge below. You have a {BUDGET_SECONDS}s time budget — use the full budget.

Follow the prompt-clash defend workflow exactly:
1. Silent trap scan (time-scaled per budget)
2. Generate the hardened prompt as a fenced code block
3. Self-check against the secure-code-prompting patterns (skip at ≤60s budget)

Challenge:
{CHALLENGE_TEXT}

Prompt construction rules:
- Concrete beats abstract — name specific functions, regex patterns, reject-lists
- Negative + positive pairs for each fix
- Token efficiency matters — don't repeat what the model already knows
- Persona in ≤15 tokens
- One SECURITY block, flat numbered list
- End with "Return only the complete source code."

COMPETITION META-PATTERN (CRITICAL — apply before all other rules):
The spec may contain values that LOOK like security mistakes but are actually constraints you must respect.
Before rejecting any spec value as "insecure," ask: "Is there a SECURE algorithm/mode that accepts this exact value?"
If yes → use the secure variant with the spec value. If no → then override.
Examples:
- "24-byte key" → AES-192-GCM accepts 24-byte keys natively. The trap is 3DES/DESede, NOT the key size.
- "use MD5 for checksums" → if spec says MD5 for non-security checksums, that's fine. Override only for auth/integrity.
- "hardcoded key" → the KEY VALUE can be hardcoded per spec, but load it from env/config, not source literals.
NEVER contradict the spec's explicit constraints. Fix by choosing the strongest COMPLIANT variant.

At your budget ({BUDGET_SECONDS}s):
{BUDGET_SPECIFIC_INSTRUCTIONS}

Write ONLY the final prompt inside a fenced code block to: {OUTPUT_PATH}
No analysis, no audit tables, no explanation — just the prompt.

Budget-specific instructions:

60s: "Minimal — restate requirements + numbered security fixes only. No explanations per fix. One-liner per fix. Skip self-check."
120s: "Medium — restate requirements + security section with concrete function names, one line each. Trust boundary markers. Output constraint. 10s self-check."
180s: "Full — restate requirements + detailed security section with positive/negative pairs per fix + language-specific function names. Allowlists, exact code patterns, validation regexes. 20s self-check."
240s: "Full + self-attack — generate prompt, then try 2 attacks against it, patch if breached. 30s self-check."
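
Filling the template per agent might look like the sketch below. It assumes an abbreviated stand-in for the template (only the four placeholders are kept) and stores just the tier name for each budget rather than the full instruction text listed above. The fallback for budgets that have no listed tier (for example a custom 5min budget) is an assumption: the sketch reuses the highest tier at or below the budget.

```python
# Abbreviated stand-in for the agent prompt template; only the placeholders matter here.
TEMPLATE = (
    "You have a {BUDGET_SECONDS}s time budget.\n"
    "Challenge:\n{CHALLENGE_TEXT}\n"
    "At your budget ({BUDGET_SECONDS}s):\n{BUDGET_SPECIFIC_INSTRUCTIONS}\n"
    "Write ONLY the final prompt inside a fenced code block to: {OUTPUT_PATH}\n"
)

# (threshold in seconds, tier name); the full instruction text is in the list above.
TIERS = [(60, "Minimal"), (120, "Medium"), (180, "Full"), (240, "Full + self-attack")]


def tier_for(budget_seconds: int) -> str:
    """Pick the highest tier whose threshold the budget reaches (assumed fallback rule)."""
    chosen = TIERS[0][1]
    for threshold, name in TIERS:
        if budget_seconds >= threshold:
            chosen = name
    return chosen


def fill_prompt(budget_seconds: int, challenge_text: str, output_path: str) -> str:
    """Substitute the per-agent values; braces inside the challenge text are left untouched."""
    return TEMPLATE.format(
        BUDGET_SECONDS=budget_seconds,
        CHALLENGE_TEXT=challenge_text,
        BUDGET_SPECIFIC_INSTRUCTIONS=tier_for(budget_seconds),
        OUTPUT_PATH=output_path,
    )


print(fill_prompt(120, "Write a file uploader.", "/tmp/out.md"))
```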

Phase 2: Drain & Collect (~30s)

1. Wait for all 4 agents (timeout: 300s total).
2. Read each output file and parse the fenced code block from each.
3. If any agent timed out or produced empty output, proceed with the remaining agents (minimum 2 required).
4. Write the collected prompts to /tmp/prompt-clash-ensemble-{run_id}/collected.md with one header per budget.
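
The collect step can be sketched as follows. `extract_prompt` and `collect` are hypothetical helper names; the sketch assumes each defender file contains at most one fenced block of interest and takes the first one, which may not hold if a defender nests fences inside its prompt.

```python
import re
from pathlib import Path

FENCE = "`" * 3  # built programmatically so the example can itself sit inside a fence
FENCE_RE = re.compile(re.escape(FENCE) + r"[^\n]*\n(.*?)" + re.escape(FENCE), re.DOTALL)


def extract_prompt(markdown_text: str):
    """Return the body of the first fenced code block, or None if missing or empty."""
    match = FENCE_RE.search(markdown_text)
    if match is None:
        return None
    body = match.group(1).strip()
    return body or None


def collect(run_dir: Path, budgets_min=(1, 2, 3, 4), minimum: int = 2) -> dict[int, str]:
    """Gather usable defender prompts and write collected.md with one header per budget."""
    prompts: dict[int, str] = {}
    for minutes in budgets_min:
        path = run_dir / f"defender_{minutes}m.md"
        if not path.exists():
            continue  # agent timed out or never wrote its file
        prompt = extract_prompt(path.read_text(encoding="utf-8"))
        if prompt:
            prompts[minutes] = prompt
    if len(prompts) < minimum:
        raise RuntimeError("insufficient_inputs")
    sections = [f"## {m}min\n\n{p}\n" for m, p in sorted(prompts.items())]
    (run_dir / "collected.md").write_text("\n".join(sections), encoding="utf-8")
    return prompts
```

Raising on fewer than two usable prompts mirrors the minimum stated above; how the coordinator reacts to that error is left to the caller.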

Phase 3: Inline Synthesis (coordinator does this — NO agent spawn)

The coordinator synthesizes directly. Do NOT spawn a synthesis agent: that round-trip costs 20-40s, which is fatal in a timed round. The coordinator already has every collected prompt in context and performs the fusion inline.

Synthesis algorithm (executed by coordinator immediately after reading all outputs):

1. Anchor on the 1min prompt. It is the most compressed and most token-efficient. Start from its structure.
2. Scan for the Competition Meta-Pattern. Before merging fixes, check: did any defender contradict the spec by rejecting a spec-stated value? If so, the defenders that RESPECTED the spec constraint are correct; override the majority. This is the single highest-value step in the ensemble.
3. Union security fixes. Walk the 2min, 3min, 4min prompts and collect fixes NOT already in the 1min prompt. For each new fix:
   - If it addresses a vulnerability the 1min prompt missed → add it, using the most compressed phrasing across all prompts that mentioned it
   - If it's a more specific version of a fix already present → upgrade the existing fix
   - If it's pure verbosity (same vulnerability, more words) → skip
4. Cherry-pick high-value additions from longer budgets only:
   - From 2min+: `UNTRUSTED:` trust boundary marker (one line); add if it names a concrete untrusted input
   - From 3min+: `stdlib only` constraint; a powerful compressor, add if applicable
   - From 4min: self-attack patches (atomic file writes, redirect enforcement, input size guards, auth tag propagation); add any that aren't already covered
5. Compress the result:
   - Combine related fixes on one line (e.g., "SHA-256 not MD5; TLS not plain socket; env vars not hardcoded")
   - Cut filler words
   - Target: ≤150% of the 1min prompt's token count with the 4min prompt's security coverage
6. Output immediately as a fenced code block. No coverage table, no explanation. The user is in a timed round.

Write the synthesis to /tmp/prompt-clash-ensemble-{run_id}/synthesis.md AND output it directly to the user in the same turn.
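
The coordinator does this fusion by reading the prompts, not by running code, but the union and size-cap steps can be illustrated mechanically. The sketch below assumes each prompt's fixes have already been reduced to a `{vulnerability_id: phrasing}` mapping (a hypothetical representation) and approximates tokens by whitespace-separated words. The "upgrade to a more specific fix" step needs judgment and is not modeled.

```python
def approx_tokens(text: str) -> int:
    """Rough token estimate: whitespace-separated words (an assumption, not a tokenizer)."""
    return len(text.split())


def union_fixes(fixes_by_budget: dict[int, dict[str, str]]) -> dict[str, str]:
    """Start from the shortest budget's fixes and add the ones it missed.

    For a fix the anchor lacks, keep the shortest phrasing seen in any longer prompt.
    """
    anchor_budget = min(fixes_by_budget)
    anchor = fixes_by_budget[anchor_budget]
    merged = dict(anchor)
    for budget in sorted(fixes_by_budget):
        if budget == anchor_budget:
            continue
        for vuln, phrasing in fixes_by_budget[budget].items():
            if vuln not in merged:
                merged[vuln] = phrasing
            elif vuln not in anchor and approx_tokens(phrasing) < approx_tokens(merged[vuln]):
                merged[vuln] = phrasing  # same vulnerability, tighter wording
    return merged


def within_cap(anchor_prompt: str, fused_prompt: str, ratio: float = 1.5) -> bool:
    """Check the target: fused prompt is at most 150% of the anchor's size."""
    return approx_tokens(fused_prompt) <= ratio * approx_tokens(anchor_prompt)


fixes = {
    1: {"weak_hash": "SHA-256 not MD5"},
    2: {"weak_hash": "Use SHA-256 instead of MD5", "plain_socket": "Wrap the socket in TLS, never plain TCP"},
    4: {"plain_socket": "TLS not plain socket", "atomic_write": "Write files atomically"},
}
print(union_fixes(fixes))
```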

Phase 4: Arena Validation (optional, ~2min)

Unless --no-arena is set, run a quick 2-round arena against the synthesized prompt:

1. Spawn 3 attack agents (one per model family if available, else Claude-only with different attack tiers).
2. Each agent generates 2 attacks against the synthesized prompt.
3. Test each attack and judge it breach or held.
4. If any attack breaches, patch the synthesized prompt and re-output it.
5. Write results to /tmp/prompt-clash-ensemble-{run_id}/arena.md.

Skip conditions:

--no-arena flag
No OpenAI-compatible endpoint configured → skip with label arena_skipped_no_endpoint
Fewer than 2 defender outputs collected → skip with label arena_skipped_insufficient_inputs
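
The skip conditions reduce to a small decision function. In this sketch the function name and the order of the checks are assumptions, and the label returned for the `--no-arena` case is hypothetical, since no label is named for it here.

```python
from typing import Optional


def arena_skip_label(no_arena_flag: bool, endpoint_configured: bool, defender_outputs: int) -> Optional[str]:
    """Return a skip label, or None when the arena should run."""
    if no_arena_flag:
        return "arena_skipped_flag"  # hypothetical label; the skill names none for --no-arena
    if not endpoint_configured:
        return "arena_skipped_no_endpoint"
    if defender_outputs < 2:
        return "arena_skipped_insufficient_inputs"
    return None


print(arena_skip_label(no_arena_flag=False, endpoint_configured=False, defender_outputs=4))
```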

Phase 5: Final Output

Read the final prompt (post-arena if arena ran, post-synthesis otherwise)
Output to user as a fenced code block — ready to copy-paste
Below the prompt, show a compact comparison:

## Ensemble Summary

| Metric | 1min | 2min | 3min | 4min | Ensemble |
|--------|------|------|------|------|----------|
| Token count | {n} | {n} | {n} | {n} | {n} |
| Security fixes | {n} | {n} | {n} | {n} | {n} |
| Unique fixes | {n} | {n} | {n} | {n} | — |
| Arena result | — | — | — | — | {held/breached+patched/skipped} |

Synthesis: {1-line description of what the ensemble added beyond any single prompt}

Write full run artifacts to /tmp/prompt-clash-ensemble-{run_id}/report.md
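
Rendering the comparison table from collected metrics could look like the sketch below. `render_summary` and its parameters are hypothetical; the metric values must be supplied by the caller, and the sketch only reproduces the layout of the template above for whichever budgets actually completed.

```python
NA = "\u2014"  # same placeholder character the template uses for empty cells


def render_summary(budgets, tokens, fixes, unique, arena_result, note):
    """Render the summary; `budgets` are column labels such as "1min"."""
    headers = ["Metric", *budgets, "Ensemble"]

    def row(cells):
        return "| " + " | ".join(str(c) for c in cells) + " |"

    lines = [
        "## Ensemble Summary",
        "",
        row(headers),
        "|" + "|".join("-" * (len(h) + 2) for h in headers) + "|",
        row(["Token count", *(tokens[c] for c in headers[1:])]),
        row(["Security fixes", *(fixes[c] for c in headers[1:])]),
        row(["Unique fixes", *(unique[b] for b in budgets), NA]),
        row(["Arena result", *(NA for _ in budgets), arena_result]),
        "",
        f"Synthesis: {note}",
    ]
    return "\n".join(lines)


print(render_summary(
    ["1min", "2min"],
    {"1min": 90, "2min": 140, "Ensemble": 120},
    {"1min": 4, "2min": 6, "Ensemble": 7},
    {"1min": 1, "2min": 3},
    "held",
    "added trust boundary marker to the 1min structure",
))
```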

Customization

Custom budgets

--budgets 1,3,5 spawns 3 agents at 1min, 3min, 5min. Minimum 2 budgets required.

Budget presets

| Preset | Budgets | Use case |
|--------|---------|----------|
| `--budgets fast` | 1,2 | Quick ensemble, ~2min wall clock |
| `--budgets standard` | 1,2,3,4 | Default, ~5min wall clock |
| `--budgets thorough` | 1,2,3,4,5 | Extra self-attack budget, ~6min wall clock |
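
Resolving the `--budgets` value might be sketched as below. `parse_budgets` is a hypothetical helper; deduplicating and sorting custom budgets, and rejecting non-positive values, are assumptions beyond the stated minimum of 2 budgets.

```python
PRESETS = {
    "fast": (1, 2),
    "standard": (1, 2, 3, 4),
    "thorough": (1, 2, 3, 4, 5),
}


def parse_budgets(value: str = "standard") -> tuple[int, ...]:
    """Turn a preset name or a comma-separated list of minutes into a budget tuple."""
    if value in PRESETS:
        return PRESETS[value]
    try:
        budgets = tuple(sorted({int(part) for part in value.split(",")}))
    except ValueError:
        raise ValueError(f"unrecognized --budgets value: {value!r}") from None
    if any(b <= 0 for b in budgets):
        raise ValueError("budgets must be positive whole minutes")
    if len(budgets) < 2:
        raise ValueError("at least 2 budgets are required")
    return budgets


print(parse_budgets("1,3,5"))
print(parse_budgets("fast"))
```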

State & Artifacts

All artifacts written to /tmp/prompt-clash-ensemble-{run_id}/:

| File | Contents |
|------|----------|
| `defender_1m.md` | 1-minute budget prompt |
| `defender_2m.md` | 2-minute budget prompt |
| `defender_3m.md` | 3-minute budget prompt |
| `defender_4m.md` | 4-minute budget prompt |
| `collected.md` | All prompts collected with headers |
| `synthesis.md` | Fused prompt from Phase 3 (no coverage table) |
| `arena.md` | Arena attack/defense results (if run) |
| `report.md` | Full run report with comparison table |

Termination Labels

| Label | When |
|-------|------|
| `ensemble_complete` | All phases finished, prompt delivered |
| `ensemble_complete_no_arena` | Synthesis done, arena skipped |
| `partial_ensemble` | 2-3 defenders completed, synthesis ran on partial set |
| `synthesis_failed` | Inline synthesis failed; fall back to best individual prompt (longest budget that succeeded) |
| `insufficient_inputs` | Fewer than 2 defenders completed; cannot ensemble, return best single prompt |
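
Choosing a label at the end of a run can be sketched as a short chain of checks. The precedence used here is an assumption: the table does not say which label wins when, for example, a partial ensemble also skips the arena. `termination_label` and `best_single_prompt` are hypothetical names.

```python
def termination_label(completed: int, spawned: int, synthesis_ok: bool, arena_ran: bool) -> str:
    """Map run outcomes to one label; failure labels take precedence (assumed ordering)."""
    if completed < 2:
        return "insufficient_inputs"
    if not synthesis_ok:
        return "synthesis_failed"
    if completed < spawned:
        return "partial_ensemble"
    if not arena_ran:
        return "ensemble_complete_no_arena"
    return "ensemble_complete"


def best_single_prompt(prompts_by_budget: dict[int, str]) -> str:
    """Fallback for synthesis_failed: the prompt from the longest budget that succeeded."""
    return prompts_by_budget[max(prompts_by_budget)]


print(termination_label(completed=3, spawned=4, synthesis_ok=True, arena_ran=True))
print(best_single_prompt({1: "short prompt", 3: "longer prompt"}))
```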

Golden Rules

1. The ensemble must be strictly better. If the synthesis drops a security fix that any individual prompt caught, the synthesis has failed. Coverage must never decrease relative to any single defender.
2. Token efficiency is a real constraint. A 500-token ensemble that covers 12 fixes loses to a 150-token ensemble that covers 10 fixes, because the token penalty outweighs the marginal security gain. Compress aggressively.
3. Self-attack patches are gold. The 4min prompt's self-attack findings are high-signal: they represent actual breaches the prompt was vulnerable to. Always include these patches.
4. Diversity is the point. The ensemble works because different budgets produce different structural choices. If all 4 prompts are nearly identical, the challenge probably has a single dominant strategy; in that case, prefer the most compressed version.
5. Time wins tournaments. The entire ensemble must complete in ~5min wall clock (agents are parallel). If the user is in a timed round, they need the result fast. NEVER spawn a synthesis agent; do it inline.
6. The 1min prompt is the anchor, not the 4min. Start from the most compressed prompt and selectively add high-value fixes from longer budgets. Don't try to compress a verbose prompt down; that's slower and produces worse token efficiency.
7. Majority vote is WRONG for spec-compliance. If 3 of 4 defenders contradict the spec and 1 respects it, the 1 is correct. The Competition Meta-Pattern override takes precedence over majority consensus. Always check: "did any defender respect the spec constraint while the others rejected it?"
8. Never add a round-trip when you can act inline. Every agent spawn in a timed round costs 15-40s. The coordinator has all the information it needs after Phase 2: synthesize immediately, output immediately. The collected.md file is for the audit trail, not a required input to another agent.
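
The spec-compliance override can be made concrete with a small selection function. This is only an illustration under assumptions: `choose_answer` is a hypothetical name, judging whether a proposal respects the spec is taken as an input, and falling back to a plain majority when no defender is compliant is an assumed behavior.

```python
from collections import Counter


def choose_answer(proposals: dict[str, str], spec_compliant: dict[str, bool]) -> str:
    """Prefer proposals that respect the spec; vote only among those.

    proposals: defender id -> proposed choice (e.g. a cipher name)
    spec_compliant: defender id -> whether that choice respects the spec
    """
    compliant = [choice for d, choice in proposals.items() if spec_compliant.get(d, False)]
    pool = compliant if compliant else list(proposals.values())  # assumed fallback
    return Counter(pool).most_common(1)[0][0]


proposals = {
    "defender_1m": "AES-256-GCM",
    "defender_2m": "AES-256-GCM",
    "defender_3m": "AES-256-GCM",
    "defender_4m": "AES-192-GCM",
}
compliance = {"defender_4m": True}  # only this defender kept the spec's 24-byte key
print(choose_answer(proposals, compliance))
```

A plain majority vote over the same inputs would return AES-256-GCM, which is the failure this rule guards against.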

Anti-Patterns (learned from competition losses)

| Anti-pattern | What happened | Fix |
|--------------|---------------|-----|
| Overcorrecting past the spec | Spec said "24-byte key" → 3 of 4 defenders insisted on 32-byte AES-256, contradicting the spec. AES-192-GCM with 24-byte key was the correct answer. | Competition Meta-Pattern rule in defender prompt: "Is there a secure algorithm that accepts this exact spec value?" |
| Synthesis agent round-trip | Spawned a separate agent to synthesize → user had to interrupt with 30s left, coordinator hand-assembled the prompt under pressure. | Inline synthesis by coordinator. No extra agent spawn. |
| Anchoring on the verbose prompt | Tried to compress the 4min prompt (15 rules, 400+ tokens) down to competition size. Slow and produces mediocre compression. | Anchor on the 1min prompt and selectively add fixes from longer budgets. |
| Majority-vote on correctness | 3/4 agreed on AES-256 → ensemble would have voted for the wrong answer. | Spec-compliance check overrides majority vote. The minority defender that respects the spec wins. |
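
The "secure algorithm that accepts this exact spec value" question from the first row can be illustrated for the key-length case. AES accepts 16-, 24-, and 32-byte keys, and all three work with GCM, so a lookup by key length is enough for this sketch. `resolve_cipher` and its fallback default are hypothetical and cover only this one kind of spec value.

```python
# AES key lengths in bytes; each is valid for GCM mode.
AES_GCM_BY_KEY_BYTES = {16: "AES-128-GCM", 24: "AES-192-GCM", 32: "AES-256-GCM"}


def resolve_cipher(spec_key_bytes: int, fallback: str = "AES-256-GCM") -> tuple[str, bool]:
    """Return (cipher, overridden).

    If a secure cipher accepts the spec's key length, use it and keep the spec value.
    Otherwise override with the fallback and report that the spec was overridden.
    """
    cipher = AES_GCM_BY_KEY_BYTES.get(spec_key_bytes)
    if cipher is not None:
        return cipher, False
    return fallback, True


print(resolve_cipher(24))  # the 24-byte case from the table: spec value kept
print(resolve_cipher(8))   # no AES variant takes an 8-byte key: override
```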
