forge

Name: forge
Author: mwarger

Evidence-to-artifact pipeline with autonomous improvement loops: spec intake, adaptive clarification, adversarial review, self-replicating autoresearch cycles (doer/judge/arbiter/strategist), and canonical readiness gates

npx claudepluginhub mwarger/forge --plugin forge

Popularity

Stars: Med: 0 · Avg: 285

Installs: Med: 0 · Avg: 1

What's Inside

Skills: 11

autoresearch-loop

/autoresearch-loop

Stamp a self-replicating four-bead autoresearch loop (doer/judge/arbiter/strategist) for autonomous iterative improvement of any artifact. Use after interactive intake when you have a program and want to run an autonomous improvement loop with blind scoring. Triggers on: autoresearch, research loop, autonomous loop, overnight loop, iterative improvement.

forge-init

/forge-init

Bootstrap Forge in the current project. Detects project type, configures validation commands, installs pre-commit hook, and vendors ralph-loop. Use this when setting up a new project for Forge spec runs and bead execution.

forge-orchestrator

/forge-orchestrator

Create a subject-named specification from any evidence source using a reducer-based Forge workflow. Use this when the user wants a planning-ready spec, a clean-room reverse spec, or an evidence-first feature spec with sub-agent fanout, provenance tracking, adaptive clarification, speculative variants, and a canonical readiness contract.

spec-adversarial-review

/spec-adversarial-review

Stress-test a subject spec for ambiguity, gaps, contradictions, and untestable claims using dynamic agent teams. Use this when the spec has passed completeness and synthesis-review gates and needs adversarial validation before readiness promotion.

spec-beads-generate

/spec-beads-generate

Decompose an implementation plan into br beads with dependency wiring, epic grouping, and provenance labels. Use this after spec-plan-handoff when the user accepts the beads generation prompt.

Hooks: 1

Event Hooks

1 hook across 1 event

Stats

Version: 0.5.0

Language: Shell

Stars: 0

Maintenance: Excellent

License: MIT

Last Commit: Apr 2, 2026

Added: Apr 1, 2026

README

Forge

Turn evidence into implementation-ready artifacts through autonomous iterative refinement.

Lineage

Forge is the third generation of a single idea. Each project explored part of the problem. Forge reconnects them.

Super-Ralph (the methodology)

The first system. A three-phase SDLC framework: REVERSE (input → spec), DECOMPOSE (spec → beads), FORWARD (beads → code). Super-Ralph proved that you can encode an entire development methodology — intake interrogation, spec generation, task decomposition, implementation — as structured bead packs with fat descriptions. The AI agent doesn't need to understand the methodology; it just executes the bead it's given. Strategy lives in the bead descriptions, not in the runner.

Super-Ralph separated strategy from execution. It defined what to do (the three-phase loop, the skill-specific question banks, the completion signals) and delegated how to run it to ralph-tui. Any compatible bead runner could execute Super-Ralph's process graphs without knowing the underlying methodology.

Key insight: the full SDLC is a loop — reverse/decompose/forward — and each phase can run autonomously if you encode the methodology in the task descriptions.

Trace (the specification engine)

The second system. Trace took Super-Ralph's REVERSE phase and went deep. Instead of a single spec-generation pass, Trace built a 12-phase pipeline with formal structure: evidence classification, provenance tracking, a single-reducer merge protocol, a 12-dimension scoring ontology, adversarial review by dynamic agent teams, and a readiness state machine with blocker rules. The spec doesn't become planning-ready until adversarial agents find nothing wrong with it.

Where Super-Ralph's reverse phase produced specs through iterative interrogation (interactive or autonomous), Trace added rigor: every canonical claim must trace back to evidence, readiness gates enforce coverage thresholds, and the pipeline won't hand off a spec with unresolved blockers — even if the scores look good.

Key insight: readiness is a state machine with blocker rules, not a score threshold. Adversarial stress-testing before handoff catches what scoring alone misses.
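
The difference between a blocker-driven state machine and a plain score threshold can be sketched in a few lines of Python. This is a minimal illustration, not Trace's or Forge's actual implementation: the state names, the Blocker and Spec types, and the 0.8 threshold are all hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum


class Readiness(Enum):
    # Hypothetical state names; the real pipeline's states are not listed in this README.
    DRAFT = "draft"
    BLOCKED = "blocked"
    READY = "ready"


@dataclass
class Blocker:
    reason: str
    resolved: bool = False


@dataclass
class Spec:
    scores: dict[str, float]
    blockers: list[Blocker] = field(default_factory=list)


def readiness(spec: Spec, threshold: float = 0.8) -> Readiness:
    """Blockers are checked first, so good scores can never override them."""
    if any(not b.resolved for b in spec.blockers):
        return Readiness.BLOCKED
    if spec.scores and all(s >= threshold for s in spec.scores.values()):
        return Readiness.READY
    return Readiness.DRAFT


spec = Spec(
    scores={"coverage": 0.95, "testability": 0.9},
    blockers=[Blocker("claim 4 has no supporting evidence")],
)
print(readiness(spec))  # Readiness.BLOCKED, despite high scores
spec.blockers[0].resolved = True
print(readiness(spec))  # Readiness.READY
```

Because the blocker check runs before any score is consulted, an unresolved blocker holds the spec back no matter how well it scores.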

Forge (the synthesis)

Both systems converge on the same primitive: hypothesis → act → evaluate → keep/discard → repeat. Super-Ralph applies it as reverse/decompose/forward phases. Trace applies it to spec quality through iterative evidence processing. Forge makes the loop first-class and self-replicating.

Forge reconnects Trace's refined spec pipeline with an autonomous improvement loop inspired by Karpathy's autoresearch pattern: a four-bead cycle (doer/judge/arbiter/strategist) that self-replicates, running overnight if needed, with blind scoring that can't be gamed. After intake, you choose: work interactively (Trace's adaptive clarification), or hand it off to an autoresearch loop. Either path converges at the same readiness gate. The same loop pattern works for spec refinement, code implementation, or any artifact that can be scored.

Key insight: the spec pipeline and the execution engine are the same loop at different scales. Make it self-replicating and you can walk away.

The core loop

Everything in Forge reduces to this:

hypothesis → act → evaluate → keep/discard → repeat
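
As a rough sketch, the loop amounts to greedy keep-or-discard search over an artifact. The Python below assumes a numeric score where higher is better; the function name improve and its propose and score parameters are hypothetical and do not come from Forge.

```python
import random
from typing import Callable, TypeVar

T = TypeVar("T")


def improve(
    artifact: T,
    propose: Callable[[T], T],
    score: Callable[[T], float],
    iterations: int,
) -> T:
    """Keep a candidate only when it scores strictly higher than the best so far."""
    best, best_score = artifact, score(artifact)
    for _ in range(iterations):
        candidate = propose(best)            # hypothesis + act
        candidate_score = score(candidate)   # evaluate
        if candidate_score > best_score:     # keep
            best, best_score = candidate, candidate_score
        # otherwise the candidate is discarded and the loop repeats
    return best


# Toy usage: nudge a number toward 10 with random steps.
rng = random.Random(0)
result = improve(
    0.0,
    propose=lambda x: x + rng.uniform(-1, 1),
    score=lambda x: -abs(x - 10),
    iterations=200,
)
print(round(result, 2))
```

The kept artifact never gets worse under the given score, which is what makes it safe to leave the loop unattended, provided the score itself is trustworthy.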

Super-Ralph's three phases are this loop at the SDLC scale. Trace's pipeline is this loop at the spec-quality scale. Forge's autoresearch cycle is this loop at the iteration scale — concrete, mechanical, self-replicating:

doer-N → judge-N → arbiter-N → strategist-N → (stamps N+1)

Each bead is a fresh agent with no prior context. The bead description IS the context. No context rot. No accumulated confusion. Every iteration starts clean.
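
A minimal Python sketch of that self-replication, under stated assumptions: the Bead type, stamp_cycle, run, and the max_iterations stopping rule are hypothetical names for illustration, and the agent call is stubbed out. Forge's real beads are executed by a bead runner, not by this code.

```python
from dataclasses import dataclass

ROLES = ("doer", "judge", "arbiter", "strategist")


@dataclass(frozen=True)
class Bead:
    name: str         # e.g. "doer-1"
    description: str  # the only context the agent receives


def stamp_cycle(n: int, program: str) -> list[Bead]:
    """Create the four beads for iteration n, each carrying its own full context."""
    return [Bead(f"{role}-{n}", f"[{role}] iteration {n}\n{program}") for role in ROLES]


def run(program: str, max_iterations: int) -> list[str]:
    """Execute beads in order; the strategist of cycle n stamps cycle n + 1."""
    executed: list[str] = []
    queue = stamp_cycle(1, program)
    while queue:
        bead = queue.pop(0)
        # A real runner would hand bead.description to a fresh agent here.
        executed.append(bead.name)
        role, n = bead.name.rsplit("-", 1)
        if role == "strategist" and int(n) < max_iterations:
            queue.extend(stamp_cycle(int(n) + 1, program))
    return executed


print(run("improve the spec", max_iterations=2))
```

With max_iterations=2 this runs doer-1 through strategist-1, then doer-2 through strategist-2, and stops. No state passes between beads except what is written into each description, which mirrors the "fresh agent, no prior context" rule.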

Two paths to readiness

After spec intake, Forge presents a choice:

A) Evidence-first loop — interactive spec-loop with adaptive clarification.
   Best when you're available for questions.

B) Autoresearch loop — autonomous iterative improvement with blind scoring.
   Best for overnight runs or well-defined programs.

Both paths converge at the same READINESS_GATE. The spec doesn't care how it got there.

Path A: Interactive (Forge's spec-loop)

You stay in the conversation. The pipeline asks clarifying questions mapped to critical decision buckets, processes evidence units, drafts spec sections, and loops until readiness gates pass. Then adversarial review, plan handoff, optional beads. This is the full 12-phase pipeline, unchanged from Trace.

Best for: new features, sparse prompts, anything where the critical decisions haven't been made yet.

Path B: Autoresearch loop

View full README on GitHub

Similar Plugins

caveman

claude-mem

llm-council-plugin

self-improving-agent

antigravity-bundle-web-designer

superpowers

More by mwarger

ralph-crispies

forge
