From epistemic-skills
Forces a decision (KILL, PIVOT, RECOMMIT, REFINE, SHIP) on a hypothesis using repository evidence. Use at falsification completion, adversary verdict, or cost cap breach.
Slash command: `/epistemic-skills:kill-or-ship`
> **Related skills:** `/skill:research-question`, `/skill:experiment-execution`, `/skill:falsification-review`, `/skill:surprise-triage`, `/skill:verification-before-publication`
This is the decision phase.
Not the coping phase.
Not the "one more run" phase.
Not the place where sunk cost gets a vote.
You must choose exactly one branch:
- KILL: the current claim dies.
- PIVOT: the current claim dies, but the failure teaches a different claim worth registering as a new hypothesis.
- RECOMMIT: same claim, same method, bounded extra budget or time under a written override.
- REFINE: same claim, changed method, written override, explicit refinement count, then rerun from execution.
- SHIP: the claim survived the gates and is ready for publication verification.

Two distinctions are non-negotiable:

- PIVOT is still a kill of the old hypothesis.
- REFINE is not RECOMMIT. RECOMMIT keeps the method. REFINE changes it.

COST_OVERRUN is not a sixth branch.
It is the LessonEntry.outcome you write when budget pressure forced the decision.
Current repo reality matters:
- `src/state/repo.ts` is the canonical state surface.
- `src/adversary/dispatch.ts` is the adversary entrypoint.
- `src/index.ts` still shows `registerKillCriteria(...)` as planned, not active.

So no live gate is going to save you from a sentimental decision. This skill is the gate.
| Branch | Same claim? | Same method? | Required writes | Lesson outcome |
|---|---|---|---|---|
| KILL | No future work on this claim | n/a | `HYPOTHESES.md` -> `KILLED`, `killReason`, `experiments/{id}/KILLED.md` | `"KILLED"` or `"COST_OVERRUN"` |
| PIVOT | No | No | old entry `KILLED`, kill reason points to new id, `experiments/{id}/KILLED.md`, new hypothesis entry | `"PIVOT"` |
| RECOMMIT | Yes | Yes | `OVERRIDES.md`, possible cap or window change, status stays `RUNNING` | `"COST_OVERRUN"` only if budget forced it |
| REFINE | Yes | No | `OVERRIDES.md`, increment `Refinement count`, rerun | none |
| SHIP | Yes | Yes | confirmed result on disk, status `CONFIRMED` | none |
5:1 kill-to-ship is normal.
Pivots count as kills of the old claim.
Cost already spent still does not vote.
Most ideas should die. Some should pivot. A few should survive long enough to ship. Anything softer becomes zombie research.
Use this skill when:
- `falsified-or-unreproducible`
- `costCap`
- `SHIP` might be available
- `KILLED.md`, or write an override

Use it especially when the decision feels awkward. That usually means emotion is trying to outvote evidence.
Do not use this skill:
- `experiments/{id}/prereg.md` exists
- `/skill:falsification-review`
- `/skill:surprise-triage`
- `smokes/`
- `KILLED` record
- `judge.lock`, stale baselines, or missing falsifier files

If the idea itself is changing before the evidence exists, use `/skill:research-question` or `/skill:preregistration` instead.
Did the claim actually fail?
├─ yes
│  ├─ Ask first: "What does this failure teach us that we didn't know before?"
│  ├─ Concrete new claim, new contract, new id? -> PIVOT
│  └─ No concrete new claim? -> KILL
└─ no
   ├─ Same claim, same method, bounded extra budget or time? -> RECOMMIT
   ├─ Same claim, changed method? -> REFINE
   └─ All gates clean, confirmed result on disk, no unresolved overrun? -> SHIP
REFINE is not a loophole after a real falsifier kill.
If the claim died, kill it or pivot it.
REFINE is for the same claim when the method changed and the claim itself is still live.
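The tree above can be sketched as a pure function. This is a minimal illustration, not part of the repo: `DecisionInput`, its field names, and `chooseBranch` are hypothetical, and returning `null` when no branch is open is an assumption, since the tree does not define that case.

```typescript
type Branch = "KILL" | "PIVOT" | "RECOMMIT" | "REFINE" | "SHIP";

// Hypothetical input shape: the caller answers these questions from the artifacts.
interface DecisionInput {
  claimFailed: boolean;           // a real falsifier hit on the current claim
  hasConcreteNewClaim: boolean;   // new claim, new contract, new id ready to register
  methodChanged: boolean;         // the method differs from the one that was run
  needsMoreBudgetOrTime: boolean; // a bounded extension is being requested
  allGatesClean: boolean;         // gates clean, confirmed result on disk, no unresolved overrun
}

function chooseBranch(input: DecisionInput): Branch | null {
  if (input.claimFailed) {
    // REFINE and RECOMMIT are deliberately unreachable once the claim has died.
    return input.hasConcreteNewClaim ? "PIVOT" : "KILL";
  }
  if (input.methodChanged) return "REFINE";
  if (input.needsMoreBudgetOrTime) return "RECOMMIT";
  if (input.allGatesClean) return "SHIP";
  return null; // assumption: no branch is open yet, so reread the artifacts
}

// A changed method cannot rescue a claim that already failed.
const afterFalsifierHit = chooseBranch({
  claimFailed: true,
  hasConcreteNewClaim: false,
  methodChanged: true,
  needsMoreBudgetOrTime: false,
  allGatesClean: false,
});
console.log(afterFalsifierHit); // "KILL"
```

Checking `claimFailed` first is the point of the sketch: the method and budget questions are only asked about a claim that is still live.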
Read the actual repo state before deciding anything.
| Surface | Why it matters |
|---|---|
| `HYPOTHESES.md` | canonical branch record, `killReason`, new hypothesis entry, refinement counter |
| `.epistemic/cost-ledger.jsonl` | total spend and spend composition |
| `.epistemic/lessons.jsonl` | cross-run memory via `appendLesson()` |
| `OVERRIDES.md` | mandatory authorization for RECOMMIT and REFINE |
| `experiments/{id}/prereg.md` | SHIP eligibility and method contract |
| `experiments/{id}/judge.lock` | proof the judge did not drift |
| `experiments/{id}/smokes/` | provisional evidence only |
| `experiments/{id}/RESULTS.md` | confirmed result required for SHIP |
| `experiments/{id}/KILLED.md` | terminal artifact for KILL and the old side of a PIVOT |
| `experiments/{id}/falsifiers/` | why the claim survived or died |
| `BASELINES.md` and `experiments/repro_{name}/prereg.md` | freshness and reproduction for comparison claims |
| `src/state/repo.ts` | canonical helpers and types |
| `src/adversary/dispatch.ts` | adversary verdict source |
| `src/index.ts` | proves the kill gate is still planned, not enforced |
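Several rows in this table bear directly on whether SHIP is open. A minimal sketch of that checklist follows, assuming the caller has already gathered the facts from disk. `ShipSurfaces`, its fields, and `shipBlockers` are hypothetical names for illustration and do not exist in `src/state/repo.ts`.

```typescript
// Hypothetical snapshot of the surfaces that matter for SHIP.
interface ShipSurfaces {
  preregExists: boolean;                // experiments/{id}/prereg.md
  judgeLockHash: string | null;         // hash recorded in judge.lock, null if the file is missing
  currentJudgeHash: string;             // freshly computed hash of the judge
  resultsExists: boolean;               // experiments/{id}/RESULTS.md
  dependsOnSmokes: boolean;             // claim still leans on smokes/
  baselinesFreshAndReproduced: boolean; // BASELINES.md and the repro prereg
  unresolvedFalsifierVerdicts: number;  // open verdicts under falsifiers/
  overrunUnresolved: boolean;           // cost overrun never explicitly resolved
}

// Returns every reason SHIP is closed; an empty array means the gates are clean.
function shipBlockers(s: ShipSurfaces): string[] {
  const blockers: string[] = [];
  if (!s.preregExists) blockers.push("prereg.md is missing");
  if (s.judgeLockHash === null) {
    blockers.push("judge.lock is missing");
  } else if (s.judgeLockHash !== s.currentJudgeHash) {
    blockers.push("judge.lock does not match the current judge");
  }
  if (!s.resultsExists) blockers.push("RESULTS.md is missing");
  if (s.dependsOnSmokes) blockers.push("claim still depends on smokes/");
  if (!s.baselinesFreshAndReproduced) blockers.push("baseline is stale or unreproduced");
  if (s.unresolvedFalsifierVerdicts > 0) blockers.push("falsifier verdicts are unresolved");
  if (s.overrunUnresolved) blockers.push("cost overrun was never resolved");
  return blockers;
}

const cleanSurfaces: ShipSurfaces = {
  preregExists: true,
  judgeLockHash: "abc123", // placeholder hash for illustration
  currentJudgeHash: "abc123",
  resultsExists: true,
  dependsOnSmokes: false,
  baselinesFreshAndReproduced: true,
  unresolvedFalsifierVerdicts: 0,
  overrunUnresolved: false,
};
console.log(shipBlockers(cleanSurfaces).length === 0); // true
```

Returning the full list of blockers, rather than a single boolean, keeps the reason for a closed gate visible in whatever gets written next.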
State helpers you will actually use here:
- `loadRepoState(cwd)`
- `loadHypotheses(cwd)`, `getActiveHypothesis(entries)`, `parseHypotheses(content)`
- `hypothesisToMarkdown(entry)`, `saveHypotheses(cwd, entries)`, `updateHypothesisStatus(cwd, id, status)`
- `fileExists(path)`
- `getHypothesisSpend(cwd, id)`, `getHypothesisSpendByCategory(cwd, id)`, `getAllHypothesisSpends(cwd)`
- `loadBaselines(cwd)`, `getBaselineAgeDays(entry)`
- `getJudgeLock(cwd, id)`, `computeJudgeHash(judgeRef, id)`
- `appendLesson(cwd, lesson)`
- `runFalsificationAdversary({ claim, context, cwd })` if the decision depends on missing or stale adversary output

Current repo reality:

- `HypothesisEntry` supports `killReason`.
- `LessonEntry.outcome` supports `"KILLED"`, `"PIVOT"`, `"COST_OVERRUN"`, and `"UNREPRODUCIBLE_BASELINE"`.
- `HypothesisEntry` does not currently carry a refinement counter.

So REFINE needs a visible `- **Refinement count:** N` line in the hypothesis block.
Preserve it deliberately.
Do not assume saveHypotheses(...) will keep unknown fields.
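Because the counter lives only in the markdown, one way to preserve it is to edit the hypothesis block as text. The sketch below assumes you already hold the raw block as a string and that a first refinement starts the count at 1; `bumpRefinementCount` is a hypothetical helper, not something `src/state/repo.ts` provides.

```typescript
// Matches a line such as "- **Refinement count:** 2" anywhere in the block.
const REFINEMENT_LINE = /^- \*\*Refinement count:\*\* (\d+)[ \t]*$/m;

// Increments the counter line if present, otherwise appends one starting at 1.
function bumpRefinementCount(block: string): string {
  const match = REFINEMENT_LINE.exec(block);
  if (match) {
    const next = Number(match[1]) + 1;
    return block.replace(REFINEMENT_LINE, `- **Refinement count:** ${next}`);
  }
  // Assumption: no counter line means this is the first refinement.
  return `${block.replace(/\s+$/, "")}\n- **Refinement count:** 1\n`;
}

// Illustrative block; the real hypothesis block layout may differ.
const exampleBlock = "## h-024\n- **Status:** RUNNING\n- **Refinement count:** 1\n";
console.log(bumpRefinementCount(exampleBlock));
```

Working on the text directly sidesteps the question of whether a parse and save round trip keeps the line, at the cost of relying on the exact line format.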
- `loadRepoState(cwd)` for the top-level scaffold.
- `loadHypotheses(cwd)` and identify the active hypothesis.
- `id` explicitly.
- `HypothesisEntry` closely enough to answer:

- `experiments/{id}/`.
- `prereg.md`, `judge.lock`, `RESULTS.md`, and `KILLED.md` with `fileExists(...)`.
- `smokes/`, `falsifiers/`, and `baselines/`.

- `getHypothesisSpend(cwd, id)`.
- `getHypothesisSpendByCategory(cwd, id)`.
- `getAllHypothesisSpends(cwd)`.

- `getHypothesisSpendByCategory(cwd, id)`.
- `llm` and `compute`.
- `$10` on LLM and `$200` on Modal is not failing the same way as one that spent `$180` on judge calls and `$5` on compute.
- `llm >> compute` often means the hypothesis, judge, prompt, or search loop consumed the budget.
- `compute >> llm` often means the substrate, orchestration path, or execution economics consumed the budget.

- `costCap`.
- `1.5 × costCap`, treat it as a forced decision point.
- `SHIP` is closed until the overrun is explicitly resolved.
- `outcome: "COST_OVERRUN"`.

- `SHIP` is closed if `experiments/{id}/prereg.md` is missing.
- `SHIP` is closed if `judge.lock` is missing or does not match `computeJudgeHash(h.judgeRef, id)`.
- `SHIP` is closed if the claim still depends on `smokes/`.
- `SHIP` is closed if comparison language depends on a stale or unreproduced baseline.
- `SHIP` is closed if the falsifier files show unresolved `falsified-or-unreproducible` or `cannot-audit` verdicts.
- `SHIP` is closed if cost overrun was never explicitly resolved.
- `RECOMMIT` is closed if the claim changed.
- `RECOMMIT` is closed if the method changed.
- `REFINE` is closed if the claim changed.
- `REFINE` is closed if you cannot describe the old method, the new method, and why the claim itself still deserves to live.
- `PIVOT` is closed if you do not have a concrete new hypothesis.

`falsified`, ask the pivot question first

Treat any `falsified-or-unreproducible` verdict as a real falsifier hit for this phase.
Ask this exact question before you even think about KILL:
What does this failure teach us that we didn't know before?
Then decide honestly:
- `PIVOT`.
- `KILL`.
- `same claim, but we need a different method`, that is only `REFINE` when the claim itself survived and only the method is changing.
- `REFINE` is not available.

`PIVOT` comes before `KILL` in this branch because learning is the only honest rescue.

`RECOMMIT` from `REFINE`

This is where people lie to themselves.
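Choose RECOMMIT only when all of these are true:

- the claim is the same
- the method is the same
- the extra budget or time is bounded
- the override is written in `OVERRIDES.md`

Choose REFINE only when all of these are true:

- the claim is the same
- the method changed
- the override is written in `OVERRIDES.md`
- the refinement count is recorded in the hypothesis block
- the work reruns from execution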
If the claim changed, it is not RECOMMIT.
If the claim changed, it is not REFINE.
It is either PIVOT or KILL.
`KILL`

`KILL` means the current claim is dead and there is no concrete better claim to register right now.

- `updateHypothesisStatus(cwd, id, "KILLED")`.
- `loadHypotheses(cwd)`.
- `killReason`.
- `saveHypotheses(cwd, entries)`.
- `experiments/{id}/KILLED.md`.
- `llm`, `compute`)
- `KILL`
- `smokes/`, falsifier files, and ledger history.

`PIVOT`

`PIVOT` means the old claim died.

- `KILLED`.
- `killReason` so it points to the new hypothesis id and the lesson learned.
- `experiments/{oldId}/KILLED.md`.
- `loadHypotheses(cwd)`, append the new `HypothesisEntry`, then persist with `saveHypotheses(cwd, entries)`.
- `id`, `claim`, `falsifier`, `bestCaseConclusion`, `n`, `judgeRef`, `baselineRef`, `costCap`, `computeTarget`, `status: OPEN`, and `timestamp`.
- `/skill:research-question` instead of faking specificity.

`RECOMMIT`

`RECOMMIT` is same claim, same method, tighter remaining work.

- `OVERRIDES.md`.
- `costCap`
- `RUNNING`
- `COST_OVERRUN` lesson.

`REFINE`

`REFINE` keeps the claim and changes the method.

- `OVERRIDES.md`.
- `- **Refinement count:** N` line in that hypothesis block.
- `REFINE`.
- `PIVOT`.
- `/skill:experiment-execution`.
- `REFINE` straight to publication.

`SHIP`

`SHIP` is the rare branch.

- `judge.lock` matches `computeJudgeHash(...)`
- `experiments/{id}/RESULTS.md`
- `smokes/`
- `SHIP` is not available.
- `updateHypothesisStatus(cwd, id, "CONFIRMED")`.
- `SHIP` does not skip publication verification.

On KILL, PIVOT, or budget-driven overrun decisions, append a `LessonEntry` through `appendLesson()` from `src/state/repo.ts`.
Do not hand-edit .epistemic/lessons.jsonl.
Use the real fields:
- `hypothesisId`
- `outcome`
- `summary`
- `costSpent`
- `rootCause`

Canonical shape:
await appendLesson(cwd, {
  timestamp: new Date().toISOString(),
  hypothesisId: id,
  outcome,
  summary,
  costSpent: totalSpend,
  rootCause,
});
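The `outcome` passed to that call can be derived from the branch instead of typed by hand. A minimal sketch, following the Lesson outcome column of the branch table: `lessonOutcomeFor`, `draftLesson`, and `LessonDraft` are hypothetical names, and the draft only builds the object; it does not call `appendLesson()`.

```typescript
type Branch = "KILL" | "PIVOT" | "RECOMMIT" | "REFINE" | "SHIP";
// "UNREPRODUCIBLE_BASELINE" is a supported outcome but is not produced by this mapping.
type LessonOutcome = "KILLED" | "PIVOT" | "COST_OVERRUN" | "UNREPRODUCIBLE_BASELINE";

// Returns null when the branch writes no lesson.
function lessonOutcomeFor(branch: Branch, budgetForced: boolean): LessonOutcome | null {
  if (branch === "PIVOT") return "PIVOT";
  if (branch === "KILL") return budgetForced ? "COST_OVERRUN" : "KILLED";
  if (branch === "RECOMMIT" && budgetForced) return "COST_OVERRUN";
  return null; // REFINE, SHIP, and a RECOMMIT that budget did not force
}

// Hypothetical draft type mirroring the fields in the canonical shape.
interface LessonDraft {
  timestamp: string;
  hypothesisId: string;
  outcome: LessonOutcome;
  summary: string;
  costSpent: number;
  rootCause: string;
}

function draftLesson(
  branch: Branch,
  budgetForced: boolean,
  fields: { hypothesisId: string; summary: string; costSpent: number; rootCause: string },
): LessonDraft | null {
  const outcome = lessonOutcomeFor(branch, budgetForced);
  if (outcome === null) return null;
  return { timestamp: new Date().toISOString(), outcome, ...fields };
}

const budgetKill = draftLesson("KILL", true, {
  hypothesisId: "h-031",
  summary: "Killed after compute burn exceeded the budget without stable improvement.",
  costSpent: 210.14,
  rootCause: "Compute spend dominated the run while the locked-judge win rate stayed flat.",
});
console.log(budgetKill?.outcome); // "COST_OVERRUN"
```

Keeping `budgetForced` as an explicit argument makes the KILLED versus COST_OVERRUN choice a recorded judgment rather than a side effect.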
Decision-to-lesson mapping:
- `KILL` -> `outcome: "KILLED"` unless budget pressure was the forcing reason
- `PIVOT` -> `outcome: "PIVOT"`
- budget-driven `KILL` or `RECOMMIT` -> `outcome: "COST_OVERRUN"`

Write the lesson like an adult:

- `summary` says what was learned or why the line stopped
- `rootCause` names the mechanism, not the mood
- `costSpent` is the real total from `getHypothesisSpend(...)`

Good `rootCause`:

- Modal compute burn dominated the run and no stable gain survived the locked judge.
- Falsification showed the gain existed only on long-context tasks, so the general claim died.

Bad `rootCause`:

- Not feeling it
- Maybe later
- Too messy

The decision is not done until the repository tells one story without you present.
After KILL or PIVOT:
- `HYPOTHESES.md` says `KILLED`
- `killReason` is present
- `experiments/{id}/KILLED.md` exists
- `.epistemic/lessons.jsonl` has the lesson row

After RECOMMIT:

- `OVERRIDES.md` exists
- `COST_OVERRUN` lesson
- `RUNNING` for a real reason, not habit

After REFINE:

- `OVERRIDES.md` exists
- `Refinement count` incremented
- `/skill:experiment-execution`

After SHIP:

- `experiments/{id}/RESULTS.md` is the authoritative artifact
- `CONFIRMED`
- `smokes/`

If the files disagree, the decision is not finished.
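The per-branch checks above can be sketched as one function that reports where the files disagree. This assumes the caller has already read the artifacts into a snapshot; `RepoSnapshot`, its fields, and `storyMismatches` are hypothetical names, and treating the RECOMMIT lesson as required only when budget forced the decision follows the branch table.

```typescript
type Branch = "KILL" | "PIVOT" | "RECOMMIT" | "REFINE" | "SHIP";

// Hypothetical snapshot of what is on disk after the decision was written.
interface RepoSnapshot {
  status: string;                      // status recorded in HYPOTHESES.md
  killReasonPresent: boolean;
  killedMdExists: boolean;             // experiments/{id}/KILLED.md
  resultsMdExists: boolean;            // experiments/{id}/RESULTS.md
  overrideWritten: boolean;            // entry in OVERRIDES.md
  lessonRowWritten: boolean;           // row in .epistemic/lessons.jsonl
  refinementCountIncremented: boolean;
  dependsOnSmokes: boolean;            // claim still leans on smokes/
}

// Returns one message per disagreement; an empty array means one consistent story.
function storyMismatches(branch: Branch, snap: RepoSnapshot, budgetForced: boolean): string[] {
  const problems: string[] = [];
  const need = (ok: boolean, message: string): void => {
    if (!ok) problems.push(message);
  };

  if (branch === "KILL" || branch === "PIVOT") {
    need(snap.status === "KILLED", "HYPOTHESES.md does not say KILLED");
    need(snap.killReasonPresent, "killReason is missing");
    need(snap.killedMdExists, "KILLED.md is missing");
    need(snap.lessonRowWritten, "lesson row is missing");
  } else if (branch === "RECOMMIT") {
    need(snap.overrideWritten, "OVERRIDES.md entry is missing");
    need(snap.status === "RUNNING", "status is not RUNNING");
    need(!budgetForced || snap.lessonRowWritten, "COST_OVERRUN lesson is missing");
  } else if (branch === "REFINE") {
    need(snap.overrideWritten, "OVERRIDES.md entry is missing");
    need(snap.refinementCountIncremented, "Refinement count was not incremented");
  } else {
    need(snap.resultsMdExists, "RESULTS.md is missing");
    need(snap.status === "CONFIRMED", "status is not CONFIRMED");
    need(!snap.dependsOnSmokes, "claim still depends on smokes/");
  }
  return problems;
}

// A kill that was recorded in HYPOTHESES.md but never written up.
const halfFinishedKill: RepoSnapshot = {
  status: "KILLED",
  killReasonPresent: true,
  killedMdExists: false,
  resultsMdExists: false,
  overrideWritten: false,
  lessonRowWritten: false,
  refinementCountIncremented: false,
  dependsOnSmokes: false,
};
console.log(storyMismatches("KILL", halfFinishedKill, false));
// ["KILLED.md is missing", "lesson row is missing"]
```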
| Excuse | Reality |
|---|---|
| We already spent too much to stop now. | Prior spend is not evidence. It is exactly why you need a decision. |
| Pivot is basically the same as keeping it alive. | No. PIVOT kills the old claim and creates a new id. |
| Falsified means kill immediately. | First ask what the failure taught. PIVOT comes before KILL when the evidence supports a new claim. |
| Refine and recommit are basically the same. | No. RECOMMIT keeps the method. REFINE changes it. |
| The total cost is enough. | No. Read the split. $10 LLM + $200 Modal is a different failure mode from $180 of judge calls. |
| We can write the lesson later. | Unwritten lessons are forgotten failures. Use `appendLesson()` now. |
| We can reopen the killed hypothesis if the new idea works. | Silent revival is method fraud. New id required. |
| The smokes look great, so ship is fine. | `smokes/` is provisional. It does not authorize SHIP. |
| The override can be one sentence. | Short excuses are why the 50-character minimum exists. |
| Refinement count is bookkeeping. | It is churn accounting. If the same claim needed three method rewrites, that matters. |
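"Read the split" can be made concrete with a small classifier. This is a sketch under stated assumptions: the text only says `llm >> compute` or `compute >> llm`, so the 3x `DOMINANCE_RATIO` is a hypothetical threshold, `readSplit` and `isForcedDecisionPoint` are hypothetical names, and reading the `1.5 × costCap` rule as strictly "exceeds" is also an assumption.

```typescript
interface SpendByCategory {
  llm: number;     // dollars spent on LLM calls, including judge calls
  compute: number; // dollars spent on compute
}

type SplitReading = "llm-dominated" | "compute-dominated" | "mixed";

// Hypothetical threshold for ">>"; the source does not give a number.
const DOMINANCE_RATIO = 3;

function readSplit(spend: SpendByCategory): SplitReading {
  if (spend.llm + spend.compute <= 0) return "mixed"; // nothing spent, nothing to read
  if (spend.llm >= DOMINANCE_RATIO * spend.compute) return "llm-dominated";
  if (spend.compute >= DOMINANCE_RATIO * spend.llm) return "compute-dominated";
  return "mixed";
}

// Assumption: spend strictly above 1.5 x costCap forces a decision.
function isForcedDecisionPoint(totalSpend: number, costCap: number): boolean {
  return totalSpend > 1.5 * costCap;
}

console.log(readSplit({ llm: 10, compute: 200 })); // "compute-dominated"
console.log(readSplit({ llm: 180, compute: 5 }));  // "llm-dominated"
```

The two calls use the dollar figures from the table row: the same order of total spend, two different failure modes.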
Stop and restart the decision if:
- `smokes/`
- `COST_OVERRUN` like a branch instead of a lesson label
- `.epistemic/lessons.jsonl` because the failure feels embarrassing
- `HYPOTHESES.md`, `KILLED.md`, `RESULTS.md`, and `OVERRIDES.md` tell different stories

All of those mean the same thing: stop, reread the artifacts, and let the repository win.
# KILLED
- Hypothesis ID: h-017
- Claim: Router A improves answer quality over Router B across the full eval set.
- Decision: PIVOT
- Why old claim died: Falsification showed the gain vanished on short-context tasks under the locked judge.
- What we learned: The effect appears limited to long-context routing.
- Successor hypothesis: h-044
- Spend: $210.14 total ($12.08 llm, $198.06 compute)
Good because the old claim is dead, the lesson is explicit, and the new claim is narrower.
- Status: RUNNING
- Note: same idea, just with a slightly smarter framing
Bad because nothing died, nothing was learned, and the new contract is hidden.
## 2026-05-31 — Refine h-024
- Reason: The claim is unchanged, but the extraction parser was dropping valid answers and contaminating the score. We are keeping the same claim, comparator, metric, and judge, updating only the parser, and rerunning the full preregistered sample.
- Method change: parser v1 -> parser v2
- Hypothesis entry: Refinement count 2
Good because the claim stayed put, the method change is explicit, and the churn is counted.
## Override h-024
- Reason: Want a few more runs and some evaluation cleanup
Bad because evaluation cleanup is method change disguised as budget extension.
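The 50-character minimum on overrides can be checked mechanically. The sketch below assumes the minimum applies to the trimmed text of the `Reason` field; where the repo actually enforces it is not shown in this skill, and `overrideReasonLongEnough` is a hypothetical name.

```typescript
// The minimum named in the excuses table; applying it to the Reason text is an assumption.
const MIN_OVERRIDE_REASON_LENGTH = 50;

function overrideReasonLongEnough(reason: string): boolean {
  return reason.trim().length >= MIN_OVERRIDE_REASON_LENGTH;
}

const badReason = "Want a few more runs and some evaluation cleanup";
const goodReason =
  "The claim is unchanged, but the extraction parser was dropping valid answers and contaminating the score.";

console.log(overrideReasonLongEnough(badReason));  // false
console.log(overrideReasonLongEnough(goodReason)); // true
```

A length check only filters out the shortest excuses. It cannot tell whether a long reason is hiding a method change inside a budget extension, so the RECOMMIT versus REFINE judgment still has to be made by reading the text.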
await appendLesson(cwd, {
  timestamp: "2026-05-31T18:04:11.233Z",
  hypothesisId: "h-031",
  outcome: "COST_OVERRUN",
  summary: "Killed after compute burn exceeded the budget without stable improvement.",
  costSpent: 210.14,
  rootCause: "Compute spend on Modal dominated the run while the locked-judge win rate stayed flat.",
});
Good because the lesson says why the budget mattered, not just that the number was large.
# Maybe dead
Spent a lot.
Might revisit later.
Bad because it preserves deniability instead of recording a decision.
After SHIP, the next required skill is /skill:verification-before-publication.
npx claudepluginhub atomicstrata/epistemic --plugin epistemic-skills