Display Dark Factory pipeline node graphs including spec-review pipelines, factory gates, node types, edge conditions, handler mappings, and pipeline selection decision table.
Slash command: /claude-commands:factory-spec
Dark Factory runs via the **`dark-factory` binary**, not `python -m runner` from
a raw checkout:
~/projects/dark-factory/install.sh
export DARK_FACTORY_HOME=~/projects/dark-factory
export PATH="$HOME/.local/bin:$PATH"
- /fs / /factory-spec: graph reference + Step 0 classification + pipeline pick.
- /f / /factory: run pipelines via dark-factory (see dark-factory skill).

Pipelines and prompts resolve from $DARK_FACTORY_HOME; implementation work
happens in the caller's cwd (--workdir defaults to cwd).
/f or /factory)

Full decision table:
~/projects/dark-factory/docs/pipeline-selection.md
Do not default every run to one .dot. If the user did not pass --pipeline,
classify the goal (Step 0 below) and pick from this quick guide:
| Task | Pipeline |
|---|---|
| Smoke / wiring | pipelines/factory/hello.dot |
| New feature (full loop) | pipelines/slim/minimal_feature.dot |
| PR iteration (no holdout) | pipelines/slim/minimal_pr.dot |
| Validate diff + holdout | pipelines/factory/gates.dot |
| PR gates only | pipelines/factory/pr_gates.dot |
| Spec review slim | benchmarks/attractor-spec-review/pipelines/review_slim.dot |
| Spec review full | benchmarks/attractor-spec-review/pipelines/review_full.dot |
| Brownfield replace/delete | custom goal + delete-first rules; often minimal_feature.dot or custom .dot |
Short names for --pipeline: gates, hello, pr_gates, minimal_pr,
minimal_feature, review_slim, review_full.
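The short names above can be read as a lookup into the quick-guide table. The following is a minimal sketch of that lookup, not the runner's actual resolution code: `SHORT_NAMES` and `resolve_pipeline` are hypothetical names, and passing explicit `.dot` paths through unchanged is an assumption.

```python
from pathlib import PurePosixPath

# Paths copied from the quick guide; the dict and function names are illustrative.
SHORT_NAMES = {
    "hello": "pipelines/factory/hello.dot",
    "gates": "pipelines/factory/gates.dot",
    "pr_gates": "pipelines/factory/pr_gates.dot",
    "minimal_feature": "pipelines/slim/minimal_feature.dot",
    "minimal_pr": "pipelines/slim/minimal_pr.dot",
    "review_slim": "benchmarks/attractor-spec-review/pipelines/review_slim.dot",
    "review_full": "benchmarks/attractor-spec-review/pipelines/review_full.dot",
}


def resolve_pipeline(value: str) -> str:
    """Return a .dot path for a short name, or pass an explicit .dot path through."""
    if value in SHORT_NAMES:
        return SHORT_NAMES[value]
    if PurePosixPath(value).suffix == ".dot":
        # Assumption: anything ending in .dot is treated as an explicit path.
        return value
    raise ValueError(f"unknown pipeline: {value!r}")
```

Raising on an unknown name, rather than falling back to a default, mirrors the rule above about not defaulting every run to one .dot.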
Execution command (from target repo cwd):
export PATH="$HOME/.local/bin:$PATH"
dark-factory --pipeline pipelines/slim/minimal_pr.dot --goal "..." --backend claude
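For scripted runs, the same command line can be assembled programmatically. This is a sketch under stated assumptions: `build_command` is a hypothetical helper, the goal string is made up, and repeating `--state key=value` once per entry is an assumption about how the CLI accepts several state values. The flags themselves (`--pipeline`, `--goal`, `--backend`, `--workdir`, `--state`) are the ones named in this document.

```python
import shlex


def build_command(pipeline, goal, backend="claude", workdir=None, state=None):
    """Assemble an argv list for the dark-factory binary (hypothetical helper)."""
    argv = ["dark-factory", "--pipeline", pipeline, "--goal", goal, "--backend", backend]
    if workdir is not None:
        # Omitted by default because --workdir defaults to the caller's cwd.
        argv += ["--workdir", workdir]
    for key, value in (state or {}).items():
        # Assumption: one --state flag per key=value pair.
        argv += ["--state", f"{key}={value}"]
    return argv


argv = build_command("pipelines/slim/minimal_pr.dot", "fix flaky login test")
print(shlex.join(argv))  # shell-quoted form, suitable for logging
```

Building a list instead of a single string avoids quoting problems when the goal contains spaces or quotes; the list can be handed to `subprocess.run` as-is.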
Show the factory pipeline graph structure at a glance — node types, edges, conditions, and handler mappings — without running a pipeline. Use this when you need to remember what nodes exist, what the wiring looks like, or which pipeline to pick for a given goal.
Before picking a pipeline or writing the spec/goal, decide which kind of change this is.
Getting this wrong is the #1 cause of a factory run that reaches exit/success while
certifying the wrong thing. (Real failure 2026-05-30: a replace-the-backend-override
task was run with a greenfield additive pipeline → the override was never deleted, a parallel
mechanism was bolted on top, net LOC was +2507/−54, and an unwired Pydantic model passed
test_e2e as dead code.)
Use the standard pipelines (hello.dot, minimal_feature.dot, review_*.dot):
plan → implement → test → review → holdout → gates. Net-positive LOC is expected.
The task removes or replaces a code path that already runs in production. A greenfield
pipeline is wrong here. Encode these rules in the spec, the --goal, and the DAG:
implement node must remove or replace
the old path as part of the build, and the behavior/test_llm node then proves the new
path works with the old one already gone. Never "add new alongside old → prove → delete
later": the deletion gets orphaned (no node executes it) and the proof is confounded
(the old path's fallback/passthrough can mask a weak new path, so green proves nothing).
implement node at all. Make the deletion part of
implement/fix, or add an explicit node.
deletion-milestone skill.
test_e2e green
while the code is dead (e.g. a model class only instantiated in its own test).

Quick test: "If this milestone succeeds, should git diff show deletions of production
code?" If yes → brownfield → apply rules 1–6. If the planned diff is all additions for a
replace/delete goal, the pipeline is mis-shaped: STOP and re-architect delete-first.
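The quick test can be approximated mechanically from `git diff --numstat` output, which prints one `added<TAB>deleted<TAB>path` line per file and uses `-` for binary files. The sketch below is an illustration, not part of Dark Factory: `diff_totals` and `looks_misshaped` are hypothetical names, and the check does not distinguish production files from tests, which the real rule does.

```python
def diff_totals(numstat: str) -> tuple[int, int]:
    """Sum added and deleted line counts from `git diff --numstat` output."""
    added = deleted = 0
    for line in numstat.splitlines():
        parts = line.split("\t")
        if len(parts) < 3 or parts[0] == "-":
            # Skip blank lines and binary files (reported as "-\t-\tpath").
            continue
        added += int(parts[0])
        deleted += int(parts[1])
    return added, deleted


def looks_misshaped(numstat: str, expects_deletions: bool) -> bool:
    """True when a replace/delete goal produced a diff with no deletions at all."""
    _, deleted = diff_totals(numstat)
    return expects_deletions and deleted == 0


sample = "120\t0\tsrc/new_backend.py\n35\t0\ttests/test_new_backend.py\n"
print(diff_totals(sample), looks_misshaped(sample, expects_deletions=True))
```

A diff that passes this check can still be wrong (deletions confined to tests, for example), so it works as an early tripwire rather than as proof that the old path is gone.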
These are the Attractor-style spec-validation pipelines under
benchmarks/attractor-spec-review/pipelines/. The key innovation: an
independent cold reviewer (codex exec --yolo) that numbers every spec
line and returns strict JSON line-by-line findings.
review_slim.dot: Line-Aware Spec Review (Slim)

start ──▶ plan ──▶ implement ──▶ acceptance ──(success)──▶ review ──(success)──▶ exit
                                     │                       │
                                     └──(fail)──▶ fix ◀──────┘  (max 3 visits)
                                                   │
                                                   └──▶ implement (loop)
| Node | Type | Command / Prompt | What it does |
|---|---|---|---|
| start | built-in | - | Entry point |
| plan | codergen | @benchmarks/attractor-spec-review/prompts/plan.md | Plan from spec |
| implement | codergen | @benchmarks/attractor-spec-review/prompts/implement.md | Write code |
| acceptance | tool | python scripts/validate_spec.py --spec spec/feature.md --report spec_review/validation_report.json | Line-numbered spec validation |
| review | tool | benchmarks/attractor-spec-review/scripts/review_with_codex.sh . spec/feature.md spec_review/independent_reviewer.json | Independent cold reviewer via codex exec --yolo; returns JSON verdict |
| fix | codergen | @benchmarks/attractor-spec-review/prompts/fix.md, max_visits=3 | Fix failures, loops back to implement |
| exit | built-in | - | Terminal node |
Use when: validating a spec implementation against line-numbered acceptance
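The fix loop in review_slim.dot can be traced with a small simulation. This is a sketch of the graph's shape only: `run_slim` is a hypothetical function, the two callables stand in for the acceptance and review nodes, and ending the run with a failure once `fix` has been visited three times is an assumption about what the runner does when max_visits is exhausted.

```python
def run_slim(acceptance, review, max_fix_visits=3):
    """Walk the slim graph; acceptance/review are callables returning True on success."""
    trail = ["start", "plan"]
    fix_visits = 0
    while True:
        trail += ["implement", "acceptance"]
        if acceptance():
            trail.append("review")
            if review():
                trail.append("exit")
                return "success", trail
        # Either acceptance or review failed: route to fix if visits remain.
        if fix_visits == max_fix_visits:
            return "fail", trail  # assumption: the run stops here
        fix_visits += 1
        trail.append("fix")


# Acceptance fails twice, then passes; review passes on first try.
outcomes = iter([False, False, True])
status, trail = run_slim(lambda: next(outcomes), lambda: True)
print(status, trail.count("fix"))
```

Because `fix` loops back to `implement` rather than straight to `acceptance`, every retry re-enters the code-writing step before validation runs again.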
review_full.dot: Line-Aware Spec Review (Full)

start ──▶ plan ──▶ implement ──▶ acceptance ──(success)──▶ stack_smoke ──(success)──▶ review ──(success)──▶ exit
                                     │                       │                          │
                                     └──(fail)──▶ fix ◀──────┘                          └──(fail)──▶ fix
                                                   │
                                                   └──▶ implement (loop)
Same as slim, plus a stack_smoke node between acceptance and review:
| Node | Type | Command | What it does |
|---|---|---|---|
| stack_smoke | tool | bash scripts/fullstack_smoke.sh | Full-stack smoke test before reviewer |
Use when: you need end-to-end runtime verification before the independent reviewer checks spec conformance.
The review node runs review_with_codex.sh which:
0001: ..., 0002: ...)
codex exec --yolo: separate process, cold reviewer
pass/fail)
goal_gate=true + retry_target="fix": failure routes to fix loop

This is the Attractor guarantee: the reviewer has never seen the implementation prompt, only the spec and the code diff.
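The line-numbering step is easy to picture in code. The sketch below reproduces only the `0001: ...` prefix format mentioned above; `number_spec_lines` is a hypothetical name, and how review_with_codex.sh actually performs the numbering is not shown in this document.

```python
def number_spec_lines(spec_text: str) -> str:
    """Prefix every spec line with a zero-padded, 1-based, four-digit index."""
    return "\n".join(
        f"{index:04d}: {line}"
        for index, line in enumerate(spec_text.splitlines(), start=1)
    )


print(number_spec_lines("# Feature\nThe API must return 404 for unknown ids."))
```

Stable line numbers give the reviewer's findings something unambiguous to point at, which is what makes a line-by-line JSON report checkable.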
gates.dot: 4-Gate Validation

start ──▶ holdout_eval ──(success)──▶ gate_es ──(success)──▶ gate_er ──(success)──▶ gate_cs ──▶ exit
              │                         │                      │
              └──(fail)──▶ exit         └──(fail)──▶ exit      └──(fail)──▶ exit
| Node | Type | Handler | What it does |
|---|---|---|---|
| holdout | holdout_eval | Sealed evaluator at $DARK_FACTORY_HOLDOUTS/evaluator/run.py | Runs hidden test scenarios against the diff |
| gate_es | gate_es | claude --print /es | Evidence standards check |
| gate_er | gate_er | claude --print /er | Evidence review check |
| gate_cs | gate_code_standards | claude --print /code_standards | ZFC + leveling + root-cause-first |
Use when: already-implemented diff needs Attractor-style 4-gate validation (requires holdout).
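The gates.dot graph is a short-circuiting chain: each gate runs only if the previous one succeeded. A minimal sketch of that control flow follows, with `run_gates` and the `check` callable as hypothetical stand-ins for the real handlers. The diagram shows no separate fail edge for gate_cs, so treating its failure like the other gates is an assumption.

```python
GATES_DOT_ORDER = ["holdout_eval", "gate_es", "gate_er", "gate_cs"]


def run_gates(order, check):
    """Run gates in order; return (status, first_failed_gate_or_None)."""
    for name in order:
        if not check(name):
            # First failure routes straight to exit; later gates never run.
            return "fail", name
    return "success", None


# Example: the evidence-review gate rejects the diff.
print(run_gates(GATES_DOT_ORDER, lambda name: name != "gate_er"))
```

Stopping at the first failure keeps the more expensive later gates from running on a diff that has already been rejected.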
pr_gates.dot: 3-Gate PR Validation (No Holdout)

start ──▶ gate_es ──(success)──▶ gate_er ──(success)──▶ gate_cs ──▶ exit
            │                      │
            └──(fail)──▶ exit      └──(fail)──▶ exit
| Node | Type | Handler | What it does |
|---|---|---|---|
| gate_es | gate_es | claude --print /es | Evidence standards check |
| gate_er | gate_er | claude --print /er | Evidence review check |
| gate_cs | gate_code_standards | claude --print /code_standards | ZFC + leveling + root-cause-first |
Use when: validating an in-flight PR diff (like gates.dot but bypasses holdout features).
hello.dot: Plan/Implement/Fix Loop

start ──▶ plan ──▶ implement ──▶ holdout_eval ──(success)──▶ exit
                                     │
                                     └──(fail)──▶ fix ──▶ holdout_eval (loop, max 3 visits)
Use when: adding a new feature from scratch with a holdout scenario.
minimal_feature.dot: Full Feature Factory (Slim)

start ──▶ plan ──▶ implement ──▶ test ──(success)──▶ review ──(success)──▶ holdout ──(success)──▶ gate_es ──(success)──▶ gate_er ──(success)──▶ exit
                                   │                  │                      │                      │
                                   └──(fail)──▶ fix ◀─┘                      │                      │
                                                            └──(fail)──▶ fix ┘                      │
                                                                                   └──(fail)──▶ fix ┘
fix ──▶ test (loop)
Use when: full production pipeline from scratch: test → review → holdout → evidence gates.
minimal_pr.dot: Slim PR Iteration Factory (No Holdout)

start ──▶ plan ──▶ implement ──▶ test ──(success)──▶ review ──(success)──▶ gate_es ──(success)──▶ gate_er ──(success)──▶ exit
                                   │                  │                      │                      │
                                   └──(fail)──▶ fix ◀─┘                      │                      │
                                                            └──(fail)──▶ fix ┘                      │
                                                                                   └──(fail)──▶ fix ┘
fix ──▶ test (loop)
Use when: in-flight PR iteration loop with parameterized test commands (--state slim.test_command="...") and evidence checks, bypassing behavioral holdout scenarios.
| Node type attr | Handler function | Behavior |
|---|---|---|
| codergen | _codergen | Render prompt template, dispatch to backend (claude/codex/ao/agy) |
| tool | _tool | Shell out to command="..." attribute |
| holdout_eval | _holdout_eval | Run sealed evaluator from $DARK_FACTORY_HOLDOUTS |
| gate_es | _gate_es | Shell out to claude --print /es |
| gate_er | _gate_er | Shell out to claude --print /er |
| gate_code_standards | _gate_code_standards | Shell out to claude --print /code_standards |
| human_gate | _human_gate | Block on stdin or use ctx.state["<node>.outcome"] |
| conditional | _conditional | Hexagon decision node; outcome from ctx.state[decision_key] |
| DOT shape | Handler |
|---|---|
| Mdiamond | start |
| Msquare | exit |
| hexagon | conditional |
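Taken together, the two tables above describe a dispatch step: work out a node's type, then call the matching handler. The sketch below illustrates that idea with stub handlers. It is not the code in runner/handlers.py: `HANDLERS`, `resolve_type`, and `dispatch` are hypothetical, the node dictionary layout is invented, and letting an explicit type attribute take precedence over the DOT shape is an assumption.

```python
NODE_TYPES = [
    "codergen", "tool", "holdout_eval", "gate_es",
    "gate_er", "gate_code_standards", "human_gate", "conditional",
]
SHAPE_TO_TYPE = {"Mdiamond": "start", "Msquare": "exit", "hexagon": "conditional"}


def _make_stub(handler_name):
    """Build a placeholder that only reports which handler would run."""
    def handler(node, ctx):
        return f"{handler_name}({node['id']})"
    return handler


# Handler names follow the table: type "tool" maps to "_tool", and so on.
HANDLERS = {t: _make_stub(f"_{t}") for t in NODE_TYPES}


def resolve_type(attrs):
    """Pick the node type from its attributes (assumed precedence: type, then shape)."""
    if "type" in attrs:
        return attrs["type"]
    shape = attrs.get("shape")
    if shape in SHAPE_TO_TYPE:
        return SHAPE_TO_TYPE[shape]
    raise ValueError(f"cannot determine node type from {attrs!r}")


def dispatch(node, ctx):
    node_type = resolve_type(node["attrs"])
    if node_type in ("start", "exit"):
        return node_type  # built-in nodes, no handler function in the table
    return HANDLERS[node_type](node, ctx)


print(dispatch({"id": "plan", "attrs": {"type": "codergen"}}, {}))
```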
- condition="key=value": matches ctx.state[key] == value
- condition="key!=value": matches ctx.state[key] != value

| --backend flag | Per-node backend attr | CLI invoked |
|---|---|---|
| echo | - | Deterministic mock, no LLM |
| claude | claude | claude --print --dangerously-skip-permissions |
| codex | codex | codex exec --yolo |
| ao | ao | Agent Orchestrator ao spawn |
| agy | agy | agy --print --dangerously-skip-permissions |
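The two reference items above, edge conditions and backend selection, can be sketched together. Everything here is illustrative: `edge_matches` and `backend_argv` are hypothetical names, treating a missing state key as "not equal" is an assumption, and so is letting a per-node backend attribute override the `--backend` flag. The CLI argument lists are copied from the table.

```python
BACKEND_CLI = {
    "echo": [],  # deterministic mock, no external CLI
    "claude": ["claude", "--print", "--dangerously-skip-permissions"],
    "codex": ["codex", "exec", "--yolo"],
    "ao": ["ao", "spawn"],
    "agy": ["agy", "--print", "--dangerously-skip-permissions"],
}


def edge_matches(condition: str, state: dict) -> bool:
    """Evaluate condition="key=value" or condition="key!=value" against state."""
    if "!=" in condition:  # check "!=" first, since it also contains "="
        key, value = condition.split("!=", 1)
        return state.get(key.strip()) != value.strip()
    key, value = condition.split("=", 1)
    return state.get(key.strip()) == value.strip()


def backend_argv(node_attrs: dict, cli_backend: str) -> list:
    """Choose the CLI for a node (assumed: node attr overrides the --backend flag)."""
    name = node_attrs.get("backend", cli_backend)
    return list(BACKEND_CLI[name])


print(edge_matches("outcome=success", {"outcome": "success"}))
print(backend_argv({"backend": "codex"}, cli_backend="claude"))
```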
- benchmarks/attractor-spec-review/pipelines/
- benchmarks/attractor-spec-review/scripts/
- pipelines/factory/ and pipelines/slim/
- runner/handlers.py
- runner/engine.py
- runner/parser.py