By mwarger
Evidence-to-artifact pipeline with autonomous improvement loops: spec intake, adaptive clarification, adversarial review, self-replicating autoresearch cycles (doer/judge/arbiter/strategist), and canonical readiness gates
Stamp a self-replicating four-bead autoresearch loop (doer/judge/arbiter/strategist) for autonomous iterative improvement of any artifact. Use after interactive intake when you have a program and want to run an autonomous improvement loop with blind scoring. Triggers on: autoresearch, research loop, autonomous loop, overnight loop, iterative improvement.
Bootstrap Forge in the current project. Detects project type, configures validation commands, installs pre-commit hook, and vendors ralph-loop. Use this when setting up a new project for Forge spec runs and bead execution.
Create a subject-named specification from any evidence source using a reducer-based Forge workflow. Use this when the user wants a planning-ready spec, a clean-room reverse spec, or an evidence-first feature spec with sub-agent fanout, provenance tracking, adaptive clarification, speculative variants, and a canonical readiness contract.
Stress-test a subject spec for ambiguity, gaps, contradictions, and untestable claims using dynamic agent teams. Use this when the spec has passed completeness and synthesis-review gates and needs adversarial validation before readiness promotion.
Decompose an implementation plan into br beads with dependency wiring, epic grouping, and provenance labels. Use this after spec-plan-handoff when the user accepts the beads generation prompt.
Turn evidence into implementation-ready artifacts through autonomous iterative refinement.
Forge is the third generation of a single idea. Each project explored part of the problem. Forge reconnects them.
Super-Ralph, the first system, was a three-phase SDLC framework: REVERSE (input → spec), DECOMPOSE (spec → beads), FORWARD (beads → code). It proved that you can encode an entire development methodology (intake interrogation, spec generation, task decomposition, implementation) as structured bead packs with fat descriptions. The AI agent doesn't need to understand the methodology; it just executes the bead it's given. Strategy lives in the bead descriptions, not in the runner.
Super-Ralph separated strategy from execution. It defined what to do (the three-phase loop, the skill-specific question banks, the completion signals) and delegated how to run it to ralph-tui. Any compatible bead runner could execute Super-Ralph's process graphs without knowing the underlying methodology.
Key insight: the full SDLC is a loop — reverse/decompose/forward — and each phase can run autonomously if you encode the methodology in the task descriptions.
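To make the separation concrete, here is a minimal Python sketch of a methodology-agnostic runner, assuming a bead is just an ID, a description, and a list of dependency IDs. The `Bead` class, `run_beads`, and the `execute` callback are hypothetical stand-ins, not ralph-tui's API: the runner only looks at dependency wiring and hands each ready bead's description to an agent.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Bead:
    id: str
    description: str  # the methodology lives here, not in the runner
    deps: list[str] = field(default_factory=list)
    closed: bool = False


def run_beads(beads: list[Bead], execute: Callable[[str], None]) -> list[str]:
    """Run each bead once all of its dependencies are closed; return the run order.

    The runner knows nothing about reverse/decompose/forward. It reads only the
    dependency wiring and hands each description to `execute` (the agent).
    """
    by_id = {b.id: b for b in beads}
    order: list[str] = []
    while True:
        ready = [
            b for b in beads
            if not b.closed and all(d in by_id and by_id[d].closed for d in b.deps)
        ]
        if not ready:
            break
        bead = ready[0]
        execute(bead.description)
        bead.closed = True
        order.append(bead.id)
    if any(not b.closed for b in beads):
        raise RuntimeError("some beads can never run: dependency cycle or missing dependency")
    return order


beads = [
    Bead("forward", "Implement the beads against the spec", deps=["decompose"]),
    Bead("reverse", "Interrogate the inputs and write the spec"),
    Bead("decompose", "Split the spec into beads", deps=["reverse"]),
]
print(run_beads(beads, execute=lambda description: None))  # ['reverse', 'decompose', 'forward']
```

Because the phase order is encoded only in `deps`, swapping in a different methodology means writing different beads, not changing the runner.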
Trace, the second system, took Super-Ralph's REVERSE phase and went deep. Instead of a single spec-generation pass, Trace built a 12-phase pipeline with formal structure: evidence classification, provenance tracking, a single-reducer merge protocol, a 12-dimension scoring ontology, adversarial review by dynamic agent teams, and a readiness state machine with blocker rules. The spec doesn't become planning-ready until adversarial agents find nothing wrong with it.
Where Super-Ralph's reverse phase produced specs through iterative interrogation (interactive or autonomous), Trace added rigor: every canonical claim must trace back to evidence, readiness gates enforce coverage thresholds, and the pipeline won't hand off a spec with unresolved blockers — even if the scores look good.
Key insight: readiness is a state machine with blocker rules, not a score threshold. Adversarial stress-testing before handoff catches what scoring alone misses.
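A rough sketch of that idea in Python, assuming a spec's state can be summarized as per-dimension scores plus lists of open blockers. The state names, the two blocker kinds (unsourced claims and open adversarial findings), and the 0.8 threshold are illustrative assumptions, not Trace's actual rules, and only the gate decision is modeled, not the full transition graph. The point is the ordering: blocker rules are checked before scores, so a high-scoring spec with one unsourced claim still cannot be promoted.

```python
from dataclasses import dataclass, field
from enum import Enum


class Readiness(Enum):
    DRAFT = "draft"
    BLOCKED = "blocked"
    PLANNING_READY = "planning_ready"


@dataclass
class SpecState:
    scores: dict[str, float]  # dimension -> score in [0, 1]
    claims_without_evidence: list[str] = field(default_factory=list)
    open_adversarial_findings: list[str] = field(default_factory=list)


def blockers(spec: SpecState) -> list[str]:
    found = [f"unsourced claim: {c}" for c in spec.claims_without_evidence]
    found += [f"adversarial finding: {f}" for f in spec.open_adversarial_findings]
    return found


def readiness(spec: SpecState, threshold: float = 0.8) -> Readiness:
    # Blocker rules come first: any blocker wins, whatever the scores say.
    if blockers(spec):
        return Readiness.BLOCKED
    # Only then is every scored dimension checked against the threshold.
    if min(spec.scores.values(), default=0.0) < threshold:
        return Readiness.DRAFT
    return Readiness.PLANNING_READY


spec = SpecState(
    scores={"coverage": 0.95, "testability": 0.9},
    claims_without_evidence=["p95 latency under 50 ms"],
)
print(readiness(spec))  # Readiness.BLOCKED
```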
Both systems converge on the same primitive: hypothesis → act → evaluate → keep/discard → repeat. Super-Ralph applies it as reverse/decompose/forward phases. Trace applies it to spec quality through iterative evidence processing. Forge makes the loop first-class and self-replicating.
Forge reconnects Trace's refined spec pipeline with an autonomous improvement loop inspired by Karpathy's autoresearch pattern: a four-bead cycle (doer/judge/arbiter/strategist) that self-replicates, running overnight if needed, with blind scoring that can't be gamed. After intake, you choose: work interactively (Trace's adaptive clarification), or hand it off to an autoresearch loop. Either path converges at the same readiness gate. The same loop pattern works for spec refinement, code implementation, or any artifact that can be scored.
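One way to make a judge blind, sketched in Python under the assumption of a pairwise judge that compares two unlabeled versions: randomize which slot the candidate occupies so the judge cannot tell the new version from the baseline, then let an arbiter step un-blind the verdict. `blind_compare` and the `Judge` signature are hypothetical; Forge's judge may score against a rubric rather than comparing pairs.

```python
import random
from typing import Callable

# A judge sees two unlabeled versions and answers "A" or "B".
Judge = Callable[[str, str], str]


def blind_compare(baseline: str, candidate: str, judge: Judge, rng: random.Random) -> bool:
    """Return True if the judge preferred the candidate without knowing which slot it was in."""
    flipped = rng.random() < 0.5  # True: the candidate goes in slot A
    a, b = (candidate, baseline) if flipped else (baseline, candidate)
    verdict = judge(a, b)
    if verdict not in ("A", "B"):
        raise ValueError(f"judge must answer 'A' or 'B', got {verdict!r}")
    # Arbiter step: map the anonymous verdict back to candidate vs. baseline.
    return (verdict == "A") == flipped


def prefer_longer(a: str, b: str) -> str:
    return "A" if len(a) >= len(b) else "B"


print(blind_compare("short spec", "a longer, more complete spec", prefer_longer, random.Random(0)))  # True
```

A judge that simply favors slot "A" ends up picking the candidate only about half the time, so position bias turns into noise instead of a systematic win for the doer.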
Key insight: the spec pipeline and the execution engine are the same loop at different scales. Make it self-replicating and you can walk away.
Everything in Forge reduces to this:
hypothesis → act → evaluate → keep/discard → repeat
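As a minimal Python sketch, assuming the artifact can be scored with a single number and a candidate is kept only when it strictly beats the best so far (a greedy acceptance rule chosen here for simplicity; `improve` is a hypothetical name):

```python
from typing import Callable, Tuple, TypeVar

T = TypeVar("T")


def improve(
    artifact: T,
    propose: Callable[[T], T],    # hypothesis + act: derive a modified candidate
    score: Callable[[T], float],  # evaluate
    iterations: int,
) -> Tuple[T, float]:
    best, best_score = artifact, score(artifact)
    for _ in range(iterations):  # repeat
        candidate = propose(best)
        candidate_score = score(candidate)
        if candidate_score > best_score:  # keep
            best, best_score = candidate, candidate_score
        # otherwise discard: the next round starts again from the last kept artifact
    return best, best_score


# Toy run: nudge an integer toward 10 in steps of 3.
print(improve(0, lambda x: x + 3, lambda x: -abs(x - 10), iterations=5))  # (9, -1)
```

In Forge these steps are split across separate beads rather than run in one process; `propose` plays roughly the doer's role and the keep/discard comparison the arbiter's.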
Super-Ralph's three phases are this loop at the SDLC scale. Trace's pipeline is this loop at the spec-quality scale. Forge's autoresearch cycle is this loop at the iteration scale — concrete, mechanical, self-replicating:
doer-N → judge-N → arbiter-N → strategist-N → (stamps N+1)
Each bead is a fresh agent with no prior context. The bead description IS the context. No context rot. No accumulated confusion. Every iteration starts clean.
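A sketch of the stamping step in Python, assuming each bead is an ID, a description, and dependency IDs. `stamp_iteration`, the description layout, and the `artifact_path` parameter are hypothetical. It shows two properties described above: the four roles are chained doer → judge → arbiter → strategist, and each description carries everything a fresh agent needs. The judge's description deliberately omits the hypothesis, one assumed way to keep its scoring blind.

```python
from dataclasses import dataclass, field

ROLES = ("doer", "judge", "arbiter", "strategist")


@dataclass
class Bead:
    id: str
    description: str
    deps: list[str] = field(default_factory=list)


def stamp_iteration(n: int, program: str, artifact_path: str, hypothesis: str) -> list[Bead]:
    """Create iteration n's four beads, wired doer -> judge -> arbiter -> strategist.

    Each description restates what a fresh agent needs, since no chat
    history carries over from one bead to the next.
    """
    full_context = (
        f"Program: {program}\n"
        f"Artifact: {artifact_path}\n"
        f"Hypothesis for this iteration: {hypothesis}\n"
    )
    # Assumption: the judge sees the program and artifact but not the hypothesis,
    # so it scores the result without knowing what the doer was trying to do.
    blind_context = f"Program: {program}\nArtifact: {artifact_path}\n"

    beads: list[Bead] = []
    for i, role in enumerate(ROLES):
        deps = [beads[i - 1].id] if i > 0 else []
        context = blind_context if role == "judge" else full_context
        beads.append(Bead(id=f"{role}-{n}", description=f"Role: {role}\n{context}", deps=deps))
    return beads


beads = stamp_iteration(4, "Tighten the auth spec", "specs/auth.md", "Add error-path examples")
print([(b.id, b.deps) for b in beads])
# [('doer-4', []), ('judge-4', ['doer-4']), ('arbiter-4', ['judge-4']), ('strategist-4', ['arbiter-4'])]
```

Under this sketch, strategist-N would call `stamp_iteration(N + 1, ...)` with an updated hypothesis, which is what makes the loop self-replicating.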
After spec intake, Forge presents a choice:
A) Evidence-first loop — interactive spec-loop with adaptive clarification.
Best when you're available for questions.
B) Autoresearch loop — autonomous iterative improvement with blind scoring.
Best for overnight runs or well-defined programs.
Both paths converge at the same READINESS_GATE. The spec doesn't care how it got there.
In the evidence-first loop (path A), you stay in the conversation. The pipeline asks clarifying questions mapped to critical decision buckets, processes evidence units, drafts spec sections, and loops until readiness gates pass. Then come adversarial review, plan handoff, and optional beads. This is the full 12-phase pipeline, unchanged from Trace.
Best for: new features, sparse prompts, anything where the critical decisions haven't been made yet.
npx claudepluginhub mwarger/forge --plugin forge
QRDS-PI cycle: interview-driven SDLC on bd bead state machines
Spec-from-evidence pipeline with autoresearch loops: build subject-named specs with sub-agent fanout, provenance tracking, adaptive clarification, blind scoring, self-replicating bead cycles, and a canonical readiness contract
Ultra-compressed communication mode. Cuts ~75% of tokens while keeping full technical accuracy by speaking like a caveman.
Memory compression system for Claude Code - persist context across sessions
Multi-model consensus engine integrating OpenAI Codex CLI, Gemini CLI, and Claude CLI for collaborative code review and problem-solving.
Curate auto-memory, promote learnings to CLAUDE.md and rules, extract proven patterns into reusable skills.
Editorial "Web Designer" bundle for Claude Code from Antigravity Awesome Skills.
Core skills library for Claude Code: TDD, debugging, collaboration patterns, and proven techniques