Skill

factory-spec

Display Dark Factory pipeline node graphs including spec-review pipelines, factory gates, node types, edge conditions, handler mappings, and pipeline selection decision table.

developer-tools

Popularity

Stars

Forks

Invocation

How this skill is triggered — by the user, by Claude, or both

Slash command

/claude-commands:factory-spec

User invocable

Model invocable

Inline context

Default effort

Context Preview

The summary Claude sees in its skill listing — used to decide when to auto-load this skill

Dark Factory runs via the **`dark-factory` binary**, not `python -m runner` from

SKILL.md

283 lines · ~3.4k tokens

Stats

LanguagePython

Stars27

Forks4

MaintenanceExcellent

Last CommitJun 2, 2026

Actions

View Source View Plugin View on GitHub View README

/factory_spec — Dark Factory Spec Node Graph Reference

Install vs source

Dark Factory runs via the dark-factory binary, not python -m runner from a raw checkout:

~/projects/dark-factory/install.sh
export DARK_FACTORY_HOME=~/projects/dark-factory
export PATH="$HOME/.local/bin:$PATH"

/fs / /factory-spec — graph reference + Step 0 classification + pipeline pick.
/f / /factory — run pipelines via dark-factory (see dark-factory skill).

Pipelines and prompts resolve from $DARK_FACTORY_HOME; implementation work happens in the caller's cwd (--workdir defaults to cwd).

Pipeline selection (mandatory before `/f` or `/factory`)

Full decision table: ~/projects/dark-factory/docs/pipeline-selection.md

Do not default every run to one .dot. If the user did not pass --pipeline, classify the goal (Step 0 below) and pick from this quick guide:

Task	Pipeline
Smoke / wiring	`pipelines/factory/hello.dot`
New feature (full loop)	`pipelines/slim/minimal_feature.dot`
PR iteration (no holdout)	`pipelines/slim/minimal_pr.dot`
Validate diff + holdout	`pipelines/factory/gates.dot`
PR gates only	`pipelines/factory/pr_gates.dot`
Spec review slim	`benchmarks/attractor-spec-review/pipelines/review_slim.dot`
Spec review full	`benchmarks/attractor-spec-review/pipelines/review_full.dot`
Brownfield replace/delete	custom goal + delete-first rules; often `minimal_feature.dot` or custom `.dot`

Short names for --pipeline: gates, hello, pr_gates, minimal_pr, minimal_feature, review_slim, review_full.

Execution command (from target repo cwd):

export PATH="$HOME/.local/bin:$PATH"
dark-factory --pipeline pipelines/slim/minimal_pr.dot --goal "..." --backend claude

Purpose

Show the factory pipeline graph structure at a glance — node types, edges, conditions, and handler mappings — without running a pipeline. Use this when you need to remember what nodes exist, what the wiring looks like, or which pipeline to pick for a given goal.

Step 0 (MANDATORY): classify the task — Brownfield vs Greenfield

Before picking a pipeline or writing the spec/goal, decide which kind of change this is. Getting this wrong is the #1 cause of a factory run that reaches exit/success while certifying the wrong thing. (Real failure 2026-05-30: a replace-the-backend-override task was run with a greenfield additive pipeline → the override was never deleted, a parallel mechanism was bolted on top, net LOC was +2507/−54, and an unwired Pydantic model passed test_e2e as dead code.)

Greenfield — new behavior from scratch (net-additive)

Use the standard pipelines (hello.dot, minimal_feature.dot, review_*.dot): plan → implement → test → review → holdout → gates. Net-positive LOC is expected.

Brownfield — refactor / replace / delete existing behavior

The task removes or replaces a code path that already runs in production. A greenfield pipeline is wrong here. Encode these rules in the spec, the --goal, and the DAG:

DELETE-FIRST ordering (not delete-last). The implement node must remove or replace the old path as part of the build, and the behavior/test_llm node then proves the new path works with the old one already gone. Never "add new alongside old → prove → delete later": the deletion gets orphaned (no node executes it) and the proof is confounded (the old path's fallback/passthrough can mask a weak new path, so green proves nothing).
Deletion needs an executor node — never a conditional. Do NOT write "(N) only AFTER the proof, delete X" in a goal: nothing in the DAG runs a post-proof deletion, and resume pipelines (test-onward) have no implement node at all. Make the deletion part of implement/fix, or add an explicit node.
Net-LOC ≤ 0 guard. For replace/delete milestones, add a gate/acceptance check that FAILS if net production LOC > 0 (the new code must displace at least as much as it adds). See deletion-milestone skill.
Dead-code gate. Add a check that FAILS if a newly-added module/symbol is defined but unreferenced in runtime (non-test). Otherwise a passing unit test makes test_e2e green while the code is dead (e.g. a model class only instantiated in its own test).
Replace at the same call site. The new mechanism must be wired into the same call site the old one used, and the old one removed in the same change — not added at a new site while the old site still runs.
Prove against the post-deletion tree. The real-LLM / behavior test must execute with the old path already deleted, so a green verdict cannot be produced by the old fallback.

Quick test: "If this milestone succeeds, should git diff show deletions of production code?" If yes → brownfield → apply rules 1–6. If the planned diff is all additions for a replace/delete goal, the pipeline is mis-shaped — STOP and re-architect delete-first.

Spec-Review Pipelines (Primary)

These are the Attractor-style spec-validation pipelines under benchmarks/attractor-spec-review/pipelines/. The key innovation: an independent cold reviewer (codex exec --yolo) that numbers every spec line and returns strict JSON line-by-line findings.

1. `review_slim.dot` — Line-Aware Spec Review (Slim)

start ──▶ plan ──▶ implement ──▶ acceptance ──(success)──▶ review ──(success)──▶ exit
                                    │                       │
                                    └──(fail)──▶ fix ◀──────┘ (max 3 visits)
                                                      │
                                                      └──▶ implement (loop)

Node	Type	Command / Prompt	What it does
`start`	built-in	—	Entry point
`plan`	`codergen`	`@benchmarks/attractor-spec-review/prompts/plan.md`	Plan from spec
`implement`	`codergen`	`@benchmarks/attractor-spec-review/prompts/implement.md`	Write code
`acceptance`	`tool`	`python scripts/validate_spec.py --spec spec/feature.md --report spec_review/validation_report.json`	Line-numbered spec validation
`review`	`tool`	`benchmarks/attractor-spec-review/scripts/review_with_codex.sh . spec/feature.md spec_review/independent_reviewer.json`	Independent cold reviewer via `codex exec --yolo`; returns JSON verdict
`fix`	`codergen`	`@benchmarks/attractor-spec-review/prompts/fix.md`, `max_visits=3`	Fix failures, loops back to implement
`exit`	built-in	—	Terminal node

Use when: validating a spec implementation against line-numbered acceptance

independent reviewer, without full-stack smoke.

2. `review_full.dot` — Line-Aware Spec Review (Full)

start ──▶ plan ──▶ implement ──▶ acceptance ──(success)──▶ stack_smoke ──(success)──▶ review ──(success)──▶ exit
                                    │                       │                       │
                                    └──(fail)──▶ fix ◀──────┘                       └──(fail)──▶ fix
                                                      │
                                                      └──▶ implement (loop)

Same as slim, plus a stack_smoke node between acceptance and review:

Node	Type	Command	What it does
`stack_smoke`	`tool`	`bash scripts/fullstack_smoke.sh`	Full-stack smoke test before reviewer

Use when: you need end-to-end runtime verification before the independent reviewer checks spec conformance.

Key node: Independent Reviewer

The review node runs review_with_codex.sh which:

Numbers every spec line (0001: ..., 0002: ...)
Shells out to codex exec --yolo — separate process, cold reviewer
Returns strict JSON: line-by-line findings + verdict (pass/fail)
goal_gate=true + retry_target="fix" — failure routes to fix loop

This is the Attractor guarantee: the reviewer has never seen the implementation prompt, only the spec and the code diff.

Factory Pipelines (Validation Gates)

3. `gates.dot` — 4-Gate Validation

start ──▶ holdout_eval ──(success)──▶ gate_es ──(success)──▶ gate_er ──(success)──▶ gate_cs ──▶ exit
                 │                      │                      │
                 └──(fail)──▶ exit      └──(fail)──▶ exit     └──(fail)──▶ exit

Node	Type	Handler	What it does
`holdout`	`holdout_eval`	Sealed evaluator at `$DARK_FACTORY_HOLDOUTS/evaluator/run.py`	Runs hidden test scenarios against the diff
`gate_es`	`gate_es`	`claude --print /es`	Evidence standards check
`gate_er`	`gate_er`	`claude --print /er`	Evidence review check
`gate_cs`	`gate_code_standards`	`claude --print /code_standards`	ZFC + leveling + root-cause-first

Use when: already-implemented diff needs Attractor-style 4-gate validation (requires holdout).

3.5 `pr_gates.dot` — 3-Gate PR Validation (No Holdout)

start ──▶ gate_es ──(success)──▶ gate_er ──(success)──▶ gate_cs ──▶ exit
             │                      │
             └──(fail)──▶ exit      └──(fail)──▶ exit

Node	Type	Handler	What it does
`gate_es`	`gate_es`	`claude --print /es`	Evidence standards check
`gate_er`	`gate_er`	`claude --print /er`	Evidence review check
`gate_cs`	`gate_code_standards`	`claude --print /code_standards`	ZFC + leveling + root-cause-first

Use when: validating an in-flight PR diff (like gates.dot but bypasses holdout features).

4. `hello.dot` — Plan/Implement/Fix Loop

start ──▶ plan ──▶ implement ──▶ holdout_eval ──(success)──▶ exit
                                      │
                                      └──(fail)──▶ fix ──▶ holdout_eval (loop, max 3 visits)

Use when: adding a new feature from scratch with a holdout scenario.

5. `minimal_feature.dot` — Full Feature Factory (Slim)

start ──▶ plan ──▶ implement ──▶ test ──(success)──▶ review ──(success)──▶ holdout ──(success)──▶ gate_es ──(success)──▶ gate_er ──(success)──▶ exit
                                    │                  │                  │                  │
                                    └──(fail)──▶ fix ◀─┘                  │                  │
                                                          └──(fail)──▶ fix ┘                  │
                                                                               └──(fail)──▶ fix ┘

fix ──▶ test (loop)

Use when: full production pipeline from scratch: test → review → holdout → evidence gates.

6. `minimal_pr.dot` — Slim PR Iteration Factory (No Holdout)

start ──▶ plan ──▶ implement ──▶ test ──(success)──▶ review ──(success)──▶ gate_es ──(success)──▶ gate_er ──(success)──▶ exit
                                    │                  │                  │                  │
                                    └──(fail)──▶ fix ◀─┘                  │                  │
                                                          └──(fail)──▶ fix ┘                  │
                                                                               └──(fail)──▶ fix ┘

fix ──▶ test (loop)

Use when: in-flight PR iteration loop with parameterized test commands (--state slim.test_command="...") and evidence checks, bypassing behavioral holdout scenarios.

Handler type registry

Node `type` attr	Handler function	Behavior
`codergen`	`_codergen`	Render prompt template, dispatch to backend (claude/codex/ao/agy)
`tool`	`_tool`	Shell out to `command="..."` attribute
`holdout_eval`	`_holdout_eval`	Run sealed evaluator from `$DARK_FACTORY_HOLDOUTS`
`gate_es`	`_gate_es`	Shell out to `claude --print /es`
`gate_er`	`_gate_er`	Shell out to `claude --print /er`
`gate_code_standards`	`_gate_code_standards`	Shell out to `claude --print /code_standards`
`human_gate`	`_human_gate`	Block on stdin or use `ctx.state["<node>.outcome"]`
`conditional`	`_conditional`	Hexagon decision node; outcome from `ctx.state[decision_key]`

Shape-based handler fallback

DOT shape	Handler
`Mdiamond`	`start`
`Msquare`	`exit`
`hexagon`	`conditional`

Edge condition syntax

condition="key=value" — matches ctx.state[key] == value
condition="key!=value" — matches ctx.state[key] != value
No condition — unconditional (fallback)
Conditional edges tried before unconditional ones

Backend routing

`--backend` flag	Per-node `backend` attr	CLI invoked
`echo`	—	Deterministic mock, no LLM
`claude`	`claude`	`claude --print --dangerously-skip-permissions`
`codex`	`codex`	`codex exec --yolo`
`ao`	`ao`	Agent Orchestrator `ao spawn`
`agy`	`agy`	`agy --print --dangerously-skip-permissions`

Source files

Spec-review pipelines: benchmarks/attractor-spec-review/pipelines/
Spec-review scripts: benchmarks/attractor-spec-review/scripts/
Factory pipelines: pipelines/factory/ and pipelines/slim/
Handlers: runner/handlers.py
Engine: runner/engine.py
Parser: runner/parser.py

factory-spec

Popularity

Invocation

Context Preview

SKILL.md

factory-spec

Popularity

Invocation

Context Preview

SKILL.md

/factory_spec — Dark Factory Spec Node Graph Reference

Install vs source

Pipeline selection (mandatory before /f or /factory)

Purpose

Step 0 (MANDATORY): classify the task — Brownfield vs Greenfield

Greenfield — new behavior from scratch (net-additive)

Brownfield — refactor / replace / delete existing behavior

Spec-Review Pipelines (Primary)

1. review_slim.dot — Line-Aware Spec Review (Slim)

2. review_full.dot — Line-Aware Spec Review (Full)

Key node: Independent Reviewer

Factory Pipelines (Validation Gates)

3. gates.dot — 4-Gate Validation

3.5 pr_gates.dot — 3-Gate PR Validation (No Holdout)

4. hello.dot — Plan/Implement/Fix Loop

5. minimal_feature.dot — Full Feature Factory (Slim)

6. minimal_pr.dot — Slim PR Iteration Factory (No Holdout)

Handler type registry

Shape-based handler fallback

Edge condition syntax

Backend routing

Source files

Similar Skills

/factory_spec — Dark Factory Spec Node Graph Reference

Install vs source

Pipeline selection (mandatory before /f or /factory)

Purpose

Step 0 (MANDATORY): classify the task — Brownfield vs Greenfield

Greenfield — new behavior from scratch (net-additive)

Brownfield — refactor / replace / delete existing behavior

Spec-Review Pipelines (Primary)

1. review_slim.dot — Line-Aware Spec Review (Slim)

2. review_full.dot — Line-Aware Spec Review (Full)

Key node: Independent Reviewer

Factory Pipelines (Validation Gates)

3. gates.dot — 4-Gate Validation

3.5 pr_gates.dot — 3-Gate PR Validation (No Holdout)

4. hello.dot — Plan/Implement/Fix Loop

5. minimal_feature.dot — Full Feature Factory (Slim)

6. minimal_pr.dot — Slim PR Iteration Factory (No Holdout)

Handler type registry

Shape-based handler fallback

Edge condition syntax

Backend routing

Source files

Similar Skills

Pipeline selection (mandatory before `/f` or `/factory`)

1. `review_slim.dot` — Line-Aware Spec Review (Slim)

2. `review_full.dot` — Line-Aware Spec Review (Full)

3. `gates.dot` — 4-Gate Validation

3.5 `pr_gates.dot` — 3-Gate PR Validation (No Holdout)

4. `hello.dot` — Plan/Implement/Fix Loop

5. `minimal_feature.dot` — Full Feature Factory (Slim)

6. `minimal_pr.dot` — Slim PR Iteration Factory (No Holdout)

Pipeline selection (mandatory before `/f` or `/factory`)

1. `review_slim.dot` — Line-Aware Spec Review (Slim)

2. `review_full.dot` — Line-Aware Spec Review (Full)

3. `gates.dot` — 4-Gate Validation

3.5 `pr_gates.dot` — 3-Gate PR Validation (No Holdout)

4. `hello.dot` — Plan/Implement/Fix Loop

5. `minimal_feature.dot` — Full Feature Factory (Slim)

6. `minimal_pr.dot` — Slim PR Iteration Factory (No Holdout)