Name: crucible
Author: raddue

Stats

Actions

Available In

Tags

Crucible

A collection of agent skills for systematic software development. Works with Claude Code, Cursor, OpenAI Codex, Amp, Cline, and any platform that supports the SKILL.md format.

52 skills across the full development lifecycle — design, planning, TDD implementation, code review, debugging, adversarial testing, and quality gates. 12 core skills are eval-tested with measured A/B deltas: +23% on execution evals (52 evals, 475 assertions, graded blind on Claude Opus 4.8) and +31% on sequence/ordering evals (18 evals, Opus 4.6 — not yet re-run on 4.8).

Skill value scales inversely with model capability: the same execution suite moved +29% on Opus 4.6 → +23% on Opus 4.8. The methodology is scaffolding that keeps a model on track, so the stronger the base model, the less lift it adds — one datapoint consistent with that thesis (not a controlled comparison — the two runs differ in more than the base model, since the methodology was also corrected between them, so the move can't be cleanly attributed to capability alone; see the eval caveats). On weaker models (Sonnet, Haiku, or non-Anthropic models in Cursor/Codex), we expect the deltas to widen — a prediction, not yet measured.

Originally forked from obra/superpowers, now independently maintained and significantly diverged. Pipeline checkpoint system, auto skill extraction, structured context compression, and trajectory capture inspired by NousResearch/hermes-agent.

Marketplace

Platform

Status

Claude Code

Pending review

skills.sh

npx skills add raddue/crucible

Cursor / Codex / Amp / Cline

Compatible — same SKILL.md format

What you get

Iterative quality gates, not single-pass review — quality-gate loops until clean (red-team → fix → fresh re-review), with weighted stagnation detection. Measured +55% delta (93% with vs 38% without — the largest delta in the suite even on Opus 4.8; see eval results).

Full pipeline orchestration — build chains design → plan → execute → complete with shadow-git checkpoints, crash recovery, and structured compaction-state blocks. One command, idea to merged PR.

Adversarial testing at every level — adversarial-tester writes tests designed to break the implementation; inquisitor runs 5 parallel attack dimensions against the full feature diff; siege runs 6 attacker-perspective security agents until zero Critical/High.

Token-efficient by design — orchestrators use disk-mediated dispatch: full subagent prompts go to /tmp, only ~100-token pointers enter orchestrator context. A full build saves 73-131K tokens of fossilized context.

Self-calibrating, not self-certifying — gate verdicts append to a machine-local calibration ledger; /calibration-reconcile walks merged fix/* branches to falsify past verdicts and score each skill's Brier calibration, /ledger renders the honest "caught N silent bugs" headline with a rolling-median inflation detector, and the Book of Grudges surfaces past regressions on the files you're about to touch. Receipts between orchestrator and subagents are checked by a runtime linter (scripts/rcpt_verify.py), so a fabricated verdict can't pass unwitnessed.

Crash-resilient and session-continuous — replay resumes interrupted pipelines from the last phase boundary (10 minutes, not 90); recall queries a searchable session activity index that survives compaction.

Forge-agnostic iterative code review — temper runs fresh-eyes review loops on PRs from any platform (GitHub, GitLab, Bitbucket, self-hosted) or raw <base>..<head> ranges. Multi-round convergence with stagnation detection; optional external-model second opinion via MCP. Renamed from /code-review to avoid collision with Claude Code's built-in /review.

See docs/architecture.md for how the orchestrator skills compose, and docs/skills.md for the full catalog.

Install

Claude Code

git clone [email protected]:raddue/crucible.git ~/repos/crucible ln -sf ~/repos/crucible/skills/* ~/.claude/skills/ mkdir -p ~/.claude/agents ln -sf ~/repos/crucible/agents/* ~/.claude/agents/

Or, when available on the marketplace: claude plugin install raddue/crucible. Note: the per-role model pins are currently delivered by the symlink install above — under a plugin install the agent types are namespaced (crucible:…) and bare-name dispatch resolution is unconfirmed (see harness-adapter §8).

Crucible

A collection of agent skills for systematic software development. Works with Claude Code, Cursor, OpenAI Codex, Amp, Cline, and any platform that supports the SKILL.md format.

Marketplace

Platform	Status
Claude Code	Pending review
skills.sh	`npx skills add raddue/crucible`
Cursor / Codex / Amp / Cline	Compatible — same SKILL.md format

What you get

Iterative quality gates, not single-pass review — quality-gate loops until clean (red-team → fix → fresh re-review), with weighted stagnation detection. Measured +55% delta (93% with vs 38% without — the largest delta in the suite even on Opus 4.8; see eval results).
Full pipeline orchestration — build chains design → plan → execute → complete with shadow-git checkpoints, crash recovery, and structured compaction-state blocks. One command, idea to merged PR.
Adversarial testing at every level — adversarial-tester writes tests designed to break the implementation; inquisitor runs 5 parallel attack dimensions against the full feature diff; siege runs 6 attacker-perspective security agents until zero Critical/High.
Token-efficient by design — orchestrators use disk-mediated dispatch: full subagent prompts go to /tmp, only ~100-token pointers enter orchestrator context. A full build saves 73-131K tokens of fossilized context.
Self-calibrating, not self-certifying — gate verdicts append to a machine-local calibration ledger; /calibration-reconcile walks merged fix/* branches to falsify past verdicts and score each skill's Brier calibration, /ledger renders the honest "caught N silent bugs" headline with a rolling-median inflation detector, and the Book of Grudges surfaces past regressions on the files you're about to touch. Receipts between orchestrator and subagents are checked by a runtime linter (scripts/rcpt_verify.py), so a fabricated verdict can't pass unwitnessed.
Crash-resilient and session-continuous — replay resumes interrupted pipelines from the last phase boundary (10 minutes, not 90); recall queries a searchable session activity index that survives compaction.
Forge-agnostic iterative code review — temper runs fresh-eyes review loops on PRs from any platform (GitHub, GitLab, Bitbucket, self-hosted) or raw <base>..<head> ranges. Multi-round convergence with stagnation detection; optional external-model second opinion via MCP. Renamed from /code-review to avoid collision with Claude Code's built-in /review.

See docs/architecture.md for how the orchestrator skills compose, and docs/skills.md for the full catalog.

Install

Claude Code

git clone [email protected]:raddue/crucible.git ~/repos/crucible
ln -sf ~/repos/crucible/skills/* ~/.claude/skills/
mkdir -p ~/.claude/agents
ln -sf ~/repos/crucible/agents/* ~/.claude/agents/

crucible

Popularity

What's Inside

Confidence

README

Crucible

Marketplace

What you get

Install

Claude Code

Similar Plugins

agentic-dev-team

claudekit

hb

aidd-dev

codingbuddy

superpowers

Crucible

Marketplace

What you get

Install

Claude Code

Popularity

Health & Quality

Similar Plugins

agentic-dev-team

claudekit

hb

aidd-dev

codingbuddy

superpowers