From forge-protocol
Use when the user mentions "forge", "forge protocol", "/forge-protocol", wants to set up a FORGE project, create a TASKSPEC, run an audit, generate session prompts, execute session gates, do scar loading, or manage multi-session agent-delegated development. Also trigger when user asks about spec-driven agent workflows, verification gates, regression enforcement, scar-based failure injection, or session dependency DAGs. NEVER trigger for generic project setup or test writing outside of FORGE context.
How this skill is triggered — by the user, by Claude, or both
Slash command
/forge-protocol:forge-protocolThis skill is limited to the following tools:
The summary Claude sees in its skill listing — used to decide when to auto-load this skill
*Shape it once. Strike until it holds.*
Shape it once. Strike until it holds.
Multi-session agent delegation with spec-driven execution, autonomous gate verification, and invisible workflow artifacts.
Solves: context decay (spec loaded every session), compounding rot (gates block progression), regression blindness (test suite catches breakage), spec-drift (discovery grounds the spec in reality before execution begins).
FORGE guarantees derivation integrity, not spec correctness. Wrong spec = wrong build, done correctly. Discovery exists to catch the "wrong spec" case before agents waste sessions on it.
SPEC (.forge/TASKSPEC.md) <- ground truth, append-only
|
DISCOVERY (.forge/DISCOVERY.md) <- verify spec against actual codebase
|
AUDIT (.forge/AUDIT.md) <- risk assessment + scar extraction
|
SESSION PROMPTS <- spec + discovery + scars -> scoped work + gates
|
AUTONOMOUS EXECUTION <- agent builds, runs gates, reports
|
HUMAN REVIEW <- reads verdict -> proceed / review / blocked
| Actor | Does | Does NOT |
|---|---|---|
| Orchestrator | Plans DAG, runs discovery, generates prompts, delegates to executor agents, manages state | Write project code |
| Executor(s) | Build, test, run all gates, commit, produce session reports | Decide to proceed to next session |
| Human | Reads session reports, go/no-go decisions, spec corrections when needed | Run gates, type commands, manage branches |
Branches -- derived from session titles in Build Order:
feat/data-layer, feat/signal-engine, fix/auth-timeout.
Parallel sessions on parallel branches. Merge to dev after approval. Never direct to main.
Tags -- semantic versions applied after merge to dev:
v0.1.0, v0.2.0. Session-to-version mapping defined in Build Order.
Commits -- conventional format referencing the feature:
feat: add dual-lane signal engine with redis caching.
No session numbers. No FORGE terms. No AI attribution. No session references in messages.
Artifacts -- everything FORGE-related lives in .forge/ (gitignored):
.forge/
TASKSPEC.md # canonical spec
DISCOVERY.md # spec-vs-reality verification
AUDIT.md # audit or risk report
AUDIT-SCARS.md # archived scars (large projects)
state.json # DAG progress
sessions/
01-data-layer/
prompt.md
output.md
02-signal-engine/
prompt.md
output.md
Only CLAUDE.md stays in project root (standard Claude Code file, not FORGE-specific).
.forge/TASKSPEC.md -- the project constitution. Contains: mission, stack, directory structure, data model, features with acceptance criteria, failure handling, env vars, and Build Order.
Build Order defines each session:
feat/data-layer)feat/data-layer)v0.1.0)Append-only during execution. Corrections via addendum + version increment (v1.0 -> v1.1). Original text stays frozen.
Session count heuristic: N ~ ceil(total_features / (context_budget * 0.6))
Runs after the spec is written, before the audit. The orchestrator reads the actual codebase and checks every assumption the spec makes. Discovery catches spec-drift -- the gap between what the requirements doc says and what the code actually does.
Discovery checks:
| Check | What to verify | How |
|---|---|---|
| File collisions | Does a file the spec says "create" already exist? | ls / find every target path |
| Package manager | Does the spec assume the right one? | Check for package-lock.json (npm), pnpm-lock.yaml (pnpm), yarn.lock (yarn), bun.lockb (bun) |
| Naming conventions | Do new models/files/routes match existing patterns? | Read the last 3-5 similar entities in the codebase |
| CI environment | Are env vars set to what the spec assumes? | Read the CI workflow file |
| Migration format | Does the migration naming match the project convention? | ls the migrations directory |
| Schema conventions | Do new DB models match existing ORM patterns? | Read the last model added to the schema |
| Dependency state | Are assumed packages already installed? | grep the lockfile or package.json |
| Script naming | Do new scripts match existing package.json conventions? | Read the scripts block |
Output: .forge/DISCOVERY.md -- a list of every spec assumption that was verified, with corrections for any that don't match reality. If corrections are found, the orchestrator amends the TASKSPEC (version increment) before proceeding to audit.
When to skip: Pure greenfield projects with no existing codebase. If the .forge/TASKSPEC.md Provenance field says greenfield and no code exists yet, skip discovery.
Why this exists: Requirements from external systems (Linear, Jira, PRDs, conversations) are written by humans who may not know the current state of the codebase. "Use pnpm" in a ticket doesn't make pnpm the package manager. "Create AI_PLATFORM_GUIDE.md" doesn't mean the file doesn't already exist. Discovery is the step that turns claims into verified facts.
Brownfield (existing code): Agent reads codebase, produces per-module verdicts: KEEP / PATCH / REWRITE / DELETE / UNCERTAIN. Extracts structured scars from discovered failures.
Greenfield (new project): Agent reads spec, projects risk per module: SIMPLE / MODERATE / COMPLEX / RISKY. Extracts scar seeds from projected failure modes.
Hybrid (new features on existing infrastructure): Treat as brownfield. Read the modules the spec touches, produce verdicts for those, project risk for new modules. Discovery must have already run.
Concrete failure injection into session prompts. Each scar:
| Field | Format |
|---|---|
| ID | S{session}-{N}, A{N} (audit), D{N} (discovery), or R{N} (risk) |
| Category | DATA-LOSS / SILENT-FAILURE / PERFORMANCE / CORRECTNESS / SECURITY / INTEGRATION / BUILD / SPEC-DRIFT |
| Description | Concrete past or projected failure -- never abstract advice |
| Severity | CRITICAL / HIGH / MEDIUM / LOW |
SPEC-DRIFT -- the spec assumed something about the codebase that isn't true. Examples: wrong package manager, file already exists at target path, naming convention mismatch, CI env var set to a dummy value, migration format doesn't match project convention. Discovery produces these; they're injected into session prompts so executors don't repeat the same mistakes.
Loading priority: CRITICAL always included. HIGH if relevant to this session's modules. MEDIUM from last two sessions only. LOW archived.
Global scars: Cross-project scars in references/global-scars.md are loaded into EVERY session prompt alongside project-specific scars. These encode failure patterns that recur across projects (binary file misclassification, env-var silent failures, etc.). Add new global scars when a failure pattern is project-agnostic.
Pruning (>6 sessions): Retain last two sessions + all CRITICAL. Archive rest to .forge/AUDIT-SCARS.md.
Self-contained execution cartridges carrying:
v1.2)Sessions declare depends_on in Build Order. Independent sessions run concurrently.
S1 -> S2a (parallel) -> S3 (merge) -> S4
S2b (parallel) /
Parallel dispatch: Orchestrator spawns one executor Agent per independent session, each in its own worktree (branch isolation). All execute concurrently.
Merge sessions: After parallel tracks complete, a dedicated merge session integrates branches into dev. The merge session runs the full test suite as regression. Conflicts resolved during merge.
Each executor agent, without human intervention:
devVerification gates -- discrete runnable commands testing this session's deliverables. Defined in the spec at planning time, not invented during execution.
Quality gates -- two tiers:
Regression -- run the project's full test suite. Passing suite = all prior sessions verified. First session also establishes test infrastructure and runner.
Every session produces a structured output report: status, deliverables completed, gate results (table), confidence per deliverable (HIGH/MEDIUM/LOW), discoveries, deviations from spec, new scars from failures encountered.
The report ends with a Human Action block:
## Human Action
VERDICT: PROCEED / REVIEW / BLOCKED
[PROCEED]
All gates passed. Confidence HIGH across deliverables.
-> Merge feat/[name] to dev, tag v[X.Y.Z].
-> Next: [session title]. Orchestrator can generate prompt.
[REVIEW]
Gates passed but attention needed:
- [ ] Review: [specific area or concern]
- [ ] Decide: [question requiring human judgment]
Estimated review: ~N minutes.
[BLOCKED]
Cannot proceed until resolved:
- [ ] [issue with root cause analysis]
- [ ] [what must happen before retry]
The human reads the verdict. PROCEED = move on. REVIEW = check flagged items then move on. BLOCKED = fix something.
Gate failure: Agent attempts fix in current session, re-runs all gates. If fix-forward fails twice -> report BLOCKED with root cause.
Test regression: If this session caused it -> agent fixes forward. If pre-existing cause -> report BLOCKED with diagnosis.
Spec is wrong: Human adds addendum to TASKSPEC.md: original assumption, what reality revealed, corrected assumption, affected sessions. Version increments. Orchestrator regenerates affected prompts.
forge init project-name scaffolds .forge/ structure with TASKSPEC template.forge/TASKSPEC.md with Build Order (titles, branches, tags, DAG).forge/DISCOVERY.md. If corrections found, amends TASKSPEC (version increment) before proceeding. Skip for pure greenfield.| Failure | Mitigation |
|---|---|
| Wrong spec | Discovery verifies spec assumptions against codebase before audit. Human reviews Discovery corrections. |
| Spec-drift from requirements | Discovery checks every "create file" target, package manager assumption, naming convention, CI env var. SPEC-DRIFT scars injected into sessions. |
| Audit false KEEP | UNCERTAIN verdict required for partially-read files |
| Scar bloat | Priority loading + archival pruning |
| Prompt drifts from spec | Prompts reference exact spec version |
| Spec-rot | Addendum protocol with version increment |
| Regression | Test suite catches it; recovery protocol handles it |
| Context too large | Budget estimate in prompt; split oversized sessions |
| Parallel merge conflict | Dedicated merge session with full test regression |
| Action | How |
|---|---|
| Start project | forge init my-project |
| Write spec | Edit .forge/TASKSPEC.md |
| Verify spec against codebase | forge discover |
| Audit existing code | forge audit |
| Assess greenfield risk | forge risk |
| Generate session prompt | forge prompt 1 |
| Run gates locally | forge gate 1 |
| Merge and tag session | forge merge 1 |
| Check project status | forge status |
| Add a scar | forge scar add "description" |
| Validate spec structure | forge validate |
For all templates, see references/templates.md.
Creates, edits, and optimizes skills for Claude Code, including drafting, evaluating with test prompts, iterating on performance, and improving skill descriptions for better triggering accuracy.
npx claudepluginhub phj6688/claude-marketplace --plugin forge-protocol