From silver-bullet
Runs repeatable benchmark and adversarial evaluation workflows across agents, models, providers, prompts, or implementation approaches. Defines task fixtures, scoring rubrics, and produces decision reports.
How this skill is triggered — by the user, by Claude, or both
Slash command
/silver-bullet:silver-benchmark <benchmark task> [--providers <list>] [--rounds N]
The summary Claude sees in its skill listing, used to decide when to auto-load this skill
SB-owned benchmark workflow for repeatable evaluation. External providers may enrich the run only when installed and requested; SB owns the fixture, scoring, evidence, and final decision.
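As a rough illustration of what owning the scoring could look like, the sketch below averages rubric-weighted scores over several rounds and ranks the candidates. This is not Silver Bullet's actual implementation: `Criterion`, `weighted_score`, `rank_candidates`, and the 0-to-1 rating scale are hypothetical names and assumptions chosen for the example.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class Criterion:
    """One rubric line (hypothetical shape): a name and a relative weight."""
    name: str
    weight: float


def weighted_score(rubric, ratings):
    """Return the weight-normalized score for one round.

    `ratings` maps criterion name -> rating in [0, 1] (assumed scale).
    """
    total_weight = sum(c.weight for c in rubric)
    if total_weight <= 0:
        raise ValueError("rubric weights must sum to a positive number")
    missing = [c.name for c in rubric if c.name not in ratings]
    if missing:
        raise ValueError(f"ratings missing for criteria: {missing}")
    return sum(c.weight * ratings[c.name] for c in rubric) / total_weight


def rank_candidates(rubric, rounds_by_candidate):
    """Average each candidate's per-round scores and sort best-first."""
    averages = {
        candidate: mean(weighted_score(rubric, r) for r in rounds)
        for candidate, rounds in rounds_by_candidate.items()
    }
    return sorted(averages.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    rubric = [Criterion("correctness", 3.0), Criterion("clarity", 1.0)]
    rounds = {
        "provider-a": [{"correctness": 1.0, "clarity": 1.0},
                       {"correctness": 1.0, "clarity": 0.0}],
        "provider-b": [{"correctness": 0.0, "clarity": 1.0}],
    }
    for candidate, score in rank_candidates(rubric, rounds):
        print(f"{candidate}: {score:.3f}")
```

Keeping the rubric as data separate from the per-round ratings means the same rubric can be re-applied to a new provider's raw ratings without changing the scoring code.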
Write or update .planning/BENCHMARK.md.
The report must include:
SILVER BULLET > BENCHMARK
silver:domain-audit --pack benchmark-eval
silver:review or silver:research when the benchmark drives implementation.
A benchmark result is valid only when the fixture, rubric, raw evidence, and decision rationale are sufficient for another session to reproduce the comparison.
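The validity rule above names four parts a result needs: fixture, rubric, raw evidence, and decision rationale. A minimal sketch of a presence check follows, assuming a benchmark record is held as a plain dictionary. The key names and the helpers `missing_parts` and `is_reproducible` are hypothetical. The check only confirms that each part is present and non-empty; whether the content is actually sufficient for another session still needs human or agent review.

```python
# Hypothetical key names for the four parts named in the validity rule.
REQUIRED_PARTS = ("fixture", "rubric", "raw_evidence", "decision_rationale")


def missing_parts(record: dict) -> list[str]:
    """Return the required parts that are absent or empty in `record`."""
    return [part for part in REQUIRED_PARTS if not record.get(part)]


def is_reproducible(record: dict) -> bool:
    """True when every required part is present and non-empty.

    This is a necessary check, not a sufficient one: it cannot judge
    whether the content is detailed enough to rerun the comparison.
    """
    return not missing_parts(record)


if __name__ == "__main__":
    draft = {
        "fixture": "tasks/summarize-changelog.md",
        "rubric": {"correctness": 3, "clarity": 1},
        "raw_evidence": [],  # no transcripts captured yet
        "decision_rationale": "",
    }
    print(is_reproducible(draft))
    print(missing_parts(draft))
```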
npx claudepluginhub alo-exp/silver-bullet --plugin silver-bullet