From factory
(internal) Recover a factory pipeline run that `factory run resume` cannot untangle — a crashed/suspended session left tasks STUCK mid-stage, or a terminal partial run has recoverable drops to retry. Resets the resettable tasks via `factory rescue apply`, reopens a terminal run, then hands off to resume. v1 reconciles RUN STATE only.
How this skill is triggered — by the user, by Claude, or both
Slash command
/factory:rescue-protocolThe summary Claude sees in its skill listing — used to decide when to auto-load this skill
You are the rescue orchestrator. `factory run resume` only re-checks the quota gate — it
You are the rescue orchestrator. factory run resume only re-checks the quota gate — it
never touches task state. When a crashed or suspended session left tasks stuck mid-stage
(so a re-drive would deadlock), or a terminal partial run has blocked-environmental drops
worth retrying, resume alone cannot recover it. Rescue is that missing seam: it resets the
resettable tasks, reopens a terminal run, then hands back to resume + the run loop.
You are Model A — the in-session orchestrator. The factory CLI is the deterministic
brain: factory rescue scan is a read-only REPORTER, factory rescue apply is the only
WRITER. The CLI never spawns agents; you spawn the read-only rescue-diagnostic agent for
ambiguous dead-ends. Never edit state.json by hand.
factory rescue apply. Never hand-edit state.json.scan is read-only; apply is the only mutation. Scan first, reason, then apply.apply resets only stuck ∪ recoverable
(blocked-environmental) tasks. A spec-defect/capability-budget drop is reset ONLY on
an explicit assertion the root cause is fixed (--include-dead-ends, or --task <id>).done task. It would un-ship merged work; apply makes it a LOUD error.rescue-diagnostic is read-only and advisory. Its decision drives whether you issue a
--task reset; it mutates nothing itself.reference/disposition-taxonomy.md./factory:rescue passes run=<id-or-empty> tasks=<csv-or-empty> include-dead-ends=<bool> dry-run=<bool>:
run → thread as --run <id> into every scan/apply (empty = default to runs/current).dry-run=true → run steps 1–3 only: scan + report, then stop. No apply, no resume.tasks (csv, non-empty) → skip the default/diagnostic path; reset exactly those ids with
--task <id> (repeated). A named dead-end is reset without --include-dead-ends.include-dead-ends=true → in step 5, reset all dead-ends (--include-dead-ends) instead of
diagnosing per task (the human has asserted the upstream root cause is fixed).Resolve the target run. Use --run <id> if given; otherwise the active run
(runs/current). If neither resolves, stop with no run to rescue. (Both scan and
apply default to runs/current themselves — pass --run only to override.)
Scan.
factory rescue scan [--run <id>]
Emits the read-only RescueScan (see reference/disposition-taxonomy.md):
{
"run_id", "run_status",
"counts": { "total", "shipped", "runnable", "stuck", "recoverable", "dead_end" },
"resettable": ["<task-id>", ...], // stuck ∪ recoverable — default apply resets these
"dead_ends": ["<task-id>", ...], // reset only on explicit assertion
"needs_rescue", "would_deadlock", "summary",
"tasks": [ { "task_id", "status", "disposition", "failure_class?", "failure_reason?", "branch?", "pr_number?" }, ... ]
}
Short-circuit if clean. If needs_rescue is false AND would_deadlock is false,
there is nothing for rescue to reset. Skip to step 7 (resume) if the run is non-terminal;
otherwise report summary (it may note dead-ends that need a fix + --include-dead-ends)
and stop. If dry-run=true, report the scan and stop here regardless — never apply.
If tasks was given (non-empty): skip the default + diagnostic paths entirely — apply
exactly those ids and go to step 7:
factory rescue apply [--run <id>] --task <id> [--task <id> ...]
Apply the default (safe) set. Resets stuck (crashed in-flight) + recoverable
(blocked-environmental) tasks, and reopens a terminal run that had work to reset:
factory rescue apply [--run <id>]
Emits { run_id, run_status, reset: [...], reopened, skipped: [...] }. run_status is
running when a terminal run was reopened. This is safe to auto-apply — it never resets a
dead-end and never touches done work.
Decide on dead-ends (only if any). For each id in scan.dead_ends, decide whether the
root cause has cleared. Two paths:
Diagnostic-gated (autonomous). Spawn the read-only rescue-diagnostic agent — one
Agent() per dead-end, in a single message — passing each task's scan line + the ground
truth you can gather (worktree_path, review_files, ci_logs_path, the durable
spec_path). See reference/diagnostic-agent-contract.md. Harvest each agent's final
message (its decision JSON). For every decision: "reset", reset that one:
factory rescue apply [--run <id>] --task <task-id> [--task <task-id> ...]
(Naming a task IS the assertion the cause is fixed, so --task resets a dead-end without
--include-dead-ends.) leave-dropped / no-action → leave it; the run finalizes
partial, which is the correct loud outcome.
Human-asserted. If a human has confirmed the upstream root cause is fixed for the whole set (e.g. the spec was amended, a stronger model is now available), reset them all:
factory rescue apply [--run <id>] --include-dead-ends
When run interactively and dead-ends exist, confirm with one AskUserQuestion
(reset-all-dead-ends / diagnose-each / leave-dropped) before resetting — resetting a
determined failure burns a full pipeline cycle.
Re-scan to confirm (optional). A second scan should show the reset tasks now
runnable and would_deadlock: false. apply is idempotent, so a re-run is a safe no-op.
Hand off to resume. Invoke the orchestrator skill directly (the autonomous path — no human round-trip):
Skill(pipeline-orchestrator) # then run its resume entry: factory run resume [--run <id>]
{ kind: "resumed", run } → continue the Phase 3 run loop; the driver now picks up the
reset (and reopened) tasks.{ kind: "still-blocked", run_id, status, reason, resets_at_epoch? } → the quota window
has not recovered. Report reason (+ resets_at_epoch if present) and stop; the reset
state is durable and a later resume continues from it.Do not tell the user to type /factory:run resume themselves — calling the skill directly
is the autonomous path. The slash command is only the manual-narration fallback.
factory run resume, no rescue needed.reference/disposition-taxonomy.md — the five dispositions, apply semantics, and the
GitHub-side drift that v1 explicitly defers.reference/diagnostic-agent-contract.md — the rescue-diagnostic input/output contract.scan/apply exit non-zero → surface the stderr verbatim; no state changes on a scan
failure (it is read-only), and apply mutates under a lock so a failed apply leaves the run
consistent.apply --task <id> on a done task → LOUD error (would un-ship); fix the id and retry.rescue-diagnostic agent that errors or returns unparseable JSON → treat as no-action
(leave the task dropped); never reset on a guess.apply already done stays
(it is idempotent and safe).npx claudepluginhub jfa94/factory --plugin factoryProvides behavioral guidelines to reduce common LLM coding mistakes, focusing on simplicity, surgical changes, assumption surfacing, and verifiable success criteria.
Searches, retrieves, and installs Agent Skills from prompts.chat registry using MCP tools like search_skills and get_skill. Activates for finding skills, browsing catalogs, or extending Claude.
Creates, edits, and optimizes skills for Claude Code, including drafting, evaluating with test prompts, iterating on performance, and improving skill descriptions for better triggering accuracy.