By raddue
Enforce a systematic software development pipeline with 52 agent skills for TDD, code review, debugging, adversarial testing, quality gates, design exploration, and retrospective analysis—all coordinated from a single Claude session.
Stagnation judge for Crucible quality-gate — mechanical cross-round comparison of finding sets. Pinned to Sonnet (cheap mechanical check). Dispatched via disk-mediated dispatch.
Mechanical structural checker for Crucible quality-gate — fix verification and the persistence checker. Pinned to Sonnet (cheap structural check). Dispatched via disk-mediated dispatch.
Adversarial reviewer (Devil's Advocate) for Crucible quality-gate and red-team. Pinned to Opus because recall-critical adversarial review degrades on weaker models. Dispatched via disk-mediated dispatch.
Fix agent / Plan Writer for Crucible quality-gate — applies fixes for red-team findings (main fix loop, re-reviewed each round) and the post-pass minor quick-fix. Inherits the session model. Dispatched via disk-mediated dispatch.
Use after completing implementation to find unknown failure modes. Reads implementation diff and writes up to 5 tests designed to make it break. Triggers on 'break it', 'adversarial test', 'stress test implementation', 'find weaknesses', or any task seeking to expose unknown failure modes.
Recon-informed approach evaluator. Weighs competing options against codebase constraints and returns structured recommendations with confidence scoring, kill criteria, and evidence grounding. Consumes recon briefs or caller context. Used by design, spec, migrate. Triggers on /assay, 'evaluate approaches', 'which option', 'compare alternatives'.
Reconcile the Crucible calibration ledger — walk merged fix/hotfix branches to falsify the originating gating-verdicts, compute per-skill Brier calibration scores, and append a falsification log. Triggers on "/calibration-reconcile", "reconcile ledger", "reconcile calibration", "falsify verdicts", "brier score", "calibration reconcile", "compute brier".
Use when exploring unfamiliar code and want to persist what you learn, when starting a task and want to consult known codebase structure, or when collaborators need module-specific context for implementation or review
Shadow git checkpoint system for pipeline rollback. Creates working directory snapshots without modifying the project's git history. Invoked by build, quality-gate, and debugging orchestrators at pipeline boundaries.
Uses power tools
Uses Bash, Write, or Edit tools
Own this plugin?
Verify ownership to unlock analytics, metadata editing, and a verified badge. GitHub access is read-only (username + org membership).
Sign in to claimOwn this plugin?
Verify ownership to unlock analytics, metadata editing, and a verified badge. GitHub access is read-only (username + org membership).
Sign in to claimBased on adoption, maintenance, documentation, and repository signals. Not a security audit or endorsement.
A collection of agent skills for systematic software development. Works with Claude Code, Cursor, OpenAI Codex, Amp, Cline, and any platform that supports the SKILL.md format.
52 skills across the full development lifecycle — design, planning, TDD implementation, code review, debugging, adversarial testing, and quality gates. 12 core skills are eval-tested with measured A/B deltas: +23% on execution evals (52 evals, 475 assertions, graded blind on Claude Opus 4.8) and +31% on sequence/ordering evals (18 evals, Opus 4.6 — not yet re-run on 4.8).
Skill value scales inversely with model capability: the same execution suite moved +29% on Opus 4.6 → +23% on Opus 4.8. The methodology is scaffolding that keeps a model on track, so the stronger the base model, the less lift it adds — one datapoint consistent with that thesis (not a controlled comparison — the two runs differ in more than the base model, since the methodology was also corrected between them, so the move can't be cleanly attributed to capability alone; see the eval caveats). On weaker models (Sonnet, Haiku, or non-Anthropic models in Cursor/Codex), we expect the deltas to widen — a prediction, not yet measured.
Originally forked from obra/superpowers, now independently maintained and significantly diverged. Pipeline checkpoint system, auto skill extraction, structured context compression, and trajectory capture inspired by NousResearch/hermes-agent.
| Platform | Status |
|---|---|
| Claude Code | Pending review |
| skills.sh | npx skills add raddue/crucible |
| Cursor / Codex / Amp / Cline | Compatible — same SKILL.md format |
quality-gate loops until clean (red-team → fix → fresh re-review), with weighted stagnation detection. Measured +55% delta (93% with vs 38% without — the largest delta in the suite even on Opus 4.8; see eval results).build chains design → plan → execute → complete with shadow-git checkpoints, crash recovery, and structured compaction-state blocks. One command, idea to merged PR.adversarial-tester writes tests designed to break the implementation; inquisitor runs 5 parallel attack dimensions against the full feature diff; siege runs 6 attacker-perspective security agents until zero Critical/High./tmp, only ~100-token pointers enter orchestrator context. A full build saves 73-131K tokens of fossilized context./calibration-reconcile walks merged fix/* branches to falsify past verdicts and score each skill's Brier calibration, /ledger renders the honest "caught N silent bugs" headline with a rolling-median inflation detector, and the Book of Grudges surfaces past regressions on the files you're about to touch. Receipts between orchestrator and subagents are checked by a runtime linter (scripts/rcpt_verify.py), so a fabricated verdict can't pass unwitnessed.replay resumes interrupted pipelines from the last phase boundary (10 minutes, not 90); recall queries a searchable session activity index that survives compaction.temper runs fresh-eyes review loops on PRs from any platform (GitHub, GitLab, Bitbucket, self-hosted) or raw <base>..<head> ranges. Multi-round convergence with stagnation detection; optional external-model second opinion via MCP. Renamed from /code-review to avoid collision with Claude Code's built-in /review.See docs/architecture.md for how the orchestrator skills compose, and docs/skills.md for the full catalog.
git clone [email protected]:raddue/crucible.git ~/repos/crucible
ln -sf ~/repos/crucible/skills/* ~/.claude/skills/
mkdir -p ~/.claude/agents
ln -sf ~/repos/crucible/agents/* ~/.claude/agents/
Or, when available on the marketplace: claude plugin install raddue/crucible. Note: the per-role model pins are currently delivered by the symlink install above — under a plugin install the agent types are namespaced (crucible:…) and bare-name dispatch resolution is unconfirmed (see harness-adapter §8).
npx claudepluginhub raddue/cruciblePersona-driven AI development team: orchestrator, team agents, review agents, skills, slash commands, and advisory hooks for Claude Code
Verification-first engineering toolkit for Claude Code. 15 skills across a 5-phase spine (Investigate → Design → Implement → Verify → Ship), 8 specialist agents, an interactive setup wizard. Every skill has rationalizations + evidence requirements. Built for senior ICs and tech leads.
Helder's personal SDLC toolbelt for AI coding agents — from PRD to ship. Bundles the tracer-bullet workflow alongside TDD, code review, audits, and shipping skills.
Code transformation: Dev SDLC orchestrator (code-shipping pipeline), plan, assert, audit, review, test, refactor, debug, for-sure. Hosts engineering agents.
PLAN/ACT/EVAL workflow with auto-detection, specialist agents, and reusable skills for systematic TDD development
Core skills library for Claude Code: TDD, debugging, collaboration patterns, and proven techniques