Skill

lab-journal-discipline

Use when designing a new experiment (read `experiments/NOTES.md` first to avoid repeating known-failed approaches), when a cell has finished and its metrics are committed (append an entry), or when correcting a wrong number in a prior entry. In-place fixes are allowed and often required, because a wrong number left on the page poisons future context; preserve the trail with strikethrough or a correction note when feasible, but accuracy of current evidence comes first. Enforces read-before-design, append-only-after-completion, every number traceable to an actual run (no estimates, no assumptions, no projected numbers), and a root-cause hypothesis for every failed variant.

Invocation

How this skill is triggered — by the user, by Claude, or both

Slash command

/lab-notebook-skills:lab-journal-discipline

User invocable

Model invocable

Inline context

Default effort

Supporting Files

examples/NOTES.md

SKILL.md

185 lines · ~2.9k tokens

Stats

Stars: 0

Maintenance: Excellent

Last commit: Apr 29, 2026

Lab Journal Discipline

Overview

experiments/NOTES.md is the project's running journal of completed runs — the lab notebook of what has actually been tried. Three operations: read it before designing a new experiment, append to it after a run finishes and its metrics are committed, and correct prior numbers in place when they turn out to be wrong against the actual run artifacts. Future-you treats it as primary evidence about what the project has learned, so it must reflect what the runs actually produced — not plans, not guesses, not retroactive reinterpretations of hypotheses or conclusions.

Violating the letter of the rules is violating the spirit of the rules.

When to Use

Designing a new experiment — read experiments/NOTES.md before drafting the design
A cell has finished and its metrics are committed — append a new entry
Correcting a wrong number in a prior entry — fix it in place against the run artifact (see "Correcting a Wrong Number" below); preserve the trail with strikethrough or a correction note when feasible

When to Read It

Before drafting any new experiment design — this skill's "designing a new experiment" trigger — open experiments/NOTES.md and skim:

the running summary table of pipeline / metric evolution (if present)
prior experiments on adjacent topics
explicitly-recorded failed variants and their root causes

The point is to avoid re-running an experiment whose result is already on the page (especially failures). If you're about to propose variant X and NOTES.md already has a section explaining why X regressed and what the root cause was, surface that to the user and propose the next step from there instead.
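Part of the skim can be mechanized. A minimal sketch, assuming (hypothetically) that each entry in experiments/NOTES.md starts with a level-2 markdown heading; `split_entries` and `find_prior_mentions` are illustrative names, not part of any tool this skill ships:

```python
from __future__ import annotations

import re


def split_entries(notes_text: str) -> list[tuple[str, str]]:
    """Split NOTES.md into (heading, body) pairs.

    Hypothetical convention: every entry begins with "## <heading>".
    Text before the first entry is returned under "(preamble)".
    """
    entries: list[tuple[str, str]] = []
    heading, body = "(preamble)", []
    for line in notes_text.splitlines():
        if line.startswith("## "):
            entries.append((heading, "\n".join(body)))
            heading, body = line[3:].strip(), []
        else:
            body.append(line)
    entries.append((heading, "\n".join(body)))
    return entries


def find_prior_mentions(notes_text: str, keyword: str) -> dict[str, list[str]]:
    """Map each entry heading to the body lines that mention keyword (case-insensitive)."""
    pattern = re.compile(re.escape(keyword), re.IGNORECASE)
    hits: dict[str, list[str]] = {}
    for heading, body in split_entries(notes_text):
        lines = [ln.strip() for ln in body.splitlines() if pattern.search(ln)]
        if lines:
            hits[heading] = lines
    return hits
```

For example, `find_prior_mentions(Path("experiments/NOTES.md").read_text(), "rerank")` would list every entry body line mentioning a (hypothetical) reranker variant. A hit is only a pointer: read the whole entry, including its root-cause hypothesis, before proposing the variant again.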

When to Append

Append a new entry only after a cell or experiment has actually run and its metrics are committed. Every number in the entry must trace to an actual run — no estimates, no assumptions, no projected numbers, no "approximately" fill-ins. Do not pre-register expected results, do not write the entry while the run is in flight, do not write up an experiment that exists only as a plan.

If a cell crashed before producing metrics, that is also a recordable result — write the entry with what actually happened (e.g. "OOM at step 3000, did not complete") rather than projected numbers.
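The append gate reduces to one check per cell: either the run finished with committed metrics, or it finished and the entry states what actually happened. A sketch with a hypothetical `CellRun` record; how you populate it (from result JSONs, logs, or a job scheduler) is project-specific:

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class CellRun:
    name: str                          # cell dir name (hypothetical, e.g. "e07a")
    finished: bool                     # the run has ended, successfully or not
    metrics_committed: bool            # its result JSON is committed
    failure_note: str | None = None    # what actually happened if it crashed
    metrics: dict[str, float] = field(default_factory=dict)


def append_blockers(cells: list[CellRun]) -> list[str]:
    """Return reasons the NOTES.md entry must wait; an empty list means append."""
    problems: list[str] = []
    for c in cells:
        if not c.finished:
            problems.append(f"{c.name}: still running, do not write the entry yet")
        elif c.failure_note:
            continue  # a crash is a recordable result when the note says what happened
        elif not c.metrics_committed or not c.metrics:
            problems.append(f"{c.name}: finished but metrics are not committed")
    return problems
```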

Correcting a Wrong Number

Corrections are allowed and often required. A number that turns out to be wrong (typo, re-run produced different value, judge / metric definition changed, data bug discovered later) must be corrected — leaving a known-wrong number on the page poisons every future read of the journal, because the next designer (human or agent) will treat it as evidence and build the next experiment on top of it.

In-place editing is permitted. Accuracy of current evidence beats strict immutability.

Preferred form, when the trail is useful to preserve:

- accuracy: ~~0.823~~ → 0.847 *(corrected 2026-04-29: original was the
  pre-fix run; see `expNN_*/eNNx/results_v2.json`)*

The strikethrough keeps the original visible, the arrow gives the corrected value, and the parenthetical names the date, reason, and the artifact the new number comes from.
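A small helper can produce that form and refuse estimate-looking values or a missing artifact reference before anything reaches the page. This is a sketch; `correction_line` and its estimate-marker regex are hypothetical and deliberately conservative:

```python
from __future__ import annotations

import re
from datetime import date

# Characters or words suggesting the "new" value is not a measured number.
_ESTIMATE_MARKERS = re.compile(r"[~≈?]|approx|tbd", re.IGNORECASE)


def correction_line(metric: str, old: str, new: str, when: date,
                    reason: str, artifact: str) -> str:
    """Render a strikethrough correction in the preferred NOTES.md form."""
    if _ESTIMATE_MARKERS.search(new):
        raise ValueError(f"corrected value {new!r} looks like an estimate")
    float(new)  # raises ValueError unless new is a plain number
    if not artifact.strip():
        raise ValueError("a correction must name the artifact the new number comes from")
    return (f"- {metric}: ~~{old}~~ → {new} "
            f"*(corrected {when.isoformat()}: {reason}; see `{artifact}`)*")
```

Called with the values from the example above, it reproduces that line on a single line.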

Acceptable when context-pollution is the dominant concern (e.g. a headline metric in the running summary table that downstream readers will skim first): replace the number outright and add a brief footnote or appended "Corrections" entry noting what changed and why.

What is not acceptable in either form:

Replacing a wrong number with another estimate, "approximately", or placeholder
Silent edits with no indication that a correction occurred
Deleting an entry to make a wrong number disappear

The constraint is evidence, not immutability: every number must trace to an actual run artifact, and every correction must name the artifact the new number comes from. The audit trail is a nice-to-have — accuracy of the current state is the must-have.
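Naming the artifact only helps if the number actually matches it. A sketch of that check, assuming (hypothetically) the result JSON is a flat `{metric: value}` map; real result files may nest metrics and need a different lookup:

```python
from __future__ import annotations

import json
from pathlib import Path


def matches_artifact(artifact: Path, metric: str, written: str) -> bool:
    """True if `written` equals the artifact's value at the precision written."""
    value = json.loads(artifact.read_text(encoding="utf-8"))[metric]
    # Count the decimals actually written so display rounding is compared fairly.
    decimals = len(written.split(".")[1]) if "." in written else 0
    return f"{float(value):.{decimals}f}" == written
```

Comparing at the written precision tolerates conventional display rounding (0.8472 recorded as 0.847) while still rejecting a number that came from a different run.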

Entry Schema

Each entry covers one experiment (one expNN_* dir) and includes:

| Section | Content |
|---|---|
| Goal | What question this experiment was trying to answer. |
| Hypothesis | What you expected to see and why (1–3 sentences). |
| Method | What was actually run: which cells, which knobs varied, which were fixed. Reference the cell dirs by name. |
| Results table | One row per cell, columns for the metrics you care about, with a marked baseline / control row for comparison. |
| Key findings | The 2–5 things that would change someone's design choices going forward. |
| Failed variants | Any cell that regressed vs control. Include a root-cause hypothesis for each failure ("why this failed"), not just the numbers. This is the highest-value content in the journal: it is what prevents re-running the same losing experiment. |
| Conclusion / shipped? | Whether anything from this experiment was integrated into the root pipeline, and which cell won. |
| Files | Pointers to the cell dirs and the result JSONs. |

A worked example of a notebook with multiple entries lives at examples/NOTES.md.
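A quick schema check can catch an entry that skipped a section, especially Failed variants. A sketch, assuming (hypothetically) that each section begins a line as a markdown heading or a bold label; the section names follow the schema table:

```python
from __future__ import annotations

import re

REQUIRED_SECTIONS = [
    "Goal", "Hypothesis", "Method", "Results", "Key findings",
    "Failed variants", "Conclusion", "Files",
]


def missing_sections(entry_text: str) -> list[str]:
    """Return the required sections absent from one NOTES.md entry.

    A section counts as present if a line starts with its name, optionally
    preceded by heading hashes ("### Goal") or bold markers ("**Goal**").
    """
    missing: list[str] = []
    for name in REQUIRED_SECTIONS:
        pattern = rf"^\s*(?:#+\s*|\*\*)?{re.escape(name)}\b"
        if not re.search(pattern, entry_text, re.IGNORECASE | re.MULTILINE):
            missing.append(name)
    return missing
```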

Format

Markdown. New entries appended at the bottom (or grouped by experiment number — pick one convention per project and keep it). Prior entries are edited only to correct numbers against the actual run artifacts, never to reinterpret, reword hypotheses, or polish conclusions. Tables are encouraged for results — they're easy to skim and easy to diff. A single running summary table at the top of the file ("Pipeline Evolution" or similar) that captures the headline metric of each experiment is very useful for spotting trends across experiments.
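Building the results table from the result JSONs, rather than retyping it, helps keep every figure traceable. A sketch; the row format and `results_table` are hypothetical, and a missing metric raises an error instead of being filled with a placeholder:

```python
from __future__ import annotations


def results_table(rows: list[dict[str, object]], metrics: list[str],
                  baseline: str) -> str:
    """Render one markdown row per cell, marking the baseline / control row."""
    names = [str(r["cell"]) for r in rows]
    if baseline not in names:
        raise ValueError(f"baseline {baseline!r} has no row; every table needs a control")
    header = "| cell | " + " | ".join(metrics) + " |"
    separator = "|" + "---|" * (len(metrics) + 1)
    lines = [header, separator]
    for name, row in zip(names, rows):
        label = f"{name} (baseline)" if name == baseline else name
        # row[m] raises KeyError for a missing metric: a crashed cell belongs in
        # the entry's prose ("OOM at step 3000"), not as a placeholder value.
        values = [str(row[m]) for m in metrics]
        lines.append(f"| {label} | " + " | ".join(values) + " |")
    return "\n".join(lines)
```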

What NOT to do

Don't leave a known-wrong number sitting in a prior entry. Fix it (in place is fine — see "Correcting a Wrong Number" above), and name the run artifact the corrected number comes from.
Don't edit prior entries for any reason other than correcting a number to match actual evidence. No retroactive reinterpretation, no rewriting hypotheses to look smarter, no polishing language that would change the meaning.
Don't delete entries for failed experiments. Failures are the most valuable content for the next designer.
Don't record speculation as result. A line like "v6 will probably improve correctness by ~0.05" before v6 has run is forbidden. Wait, run it, then write what happened.
Don't leave estimated, rounded-to-look-nice, or "approximately" numbers in an entry. Every figure must come from a specific run artifact you can point at.
Don't treat NOTES.md as a replacement for the per-cell artifacts (logs, result JSONs, configs). NOTES.md summarizes; the cell dirs are the primary evidence.
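Several of these rules can be linted before NOTES.md is committed. A sketch whose patterns are illustrative, not exhaustive: it flags tilde or ≈ numbers, "approx" wording, TBD/TODO/XXX placeholders, and prediction phrases such as "will probably". The strikethrough in the preferred correction form (~~0.823~~) is deliberately not flagged:

```python
from __future__ import annotations

import re

RED_FLAG_PATTERNS = {
    # A single "~" or "≈" before a digit; "~~" strikethrough is excluded.
    "tilde number": re.compile(r"(?<!~)~(?!~)\s*\d|≈\s*\d"),
    "approximately": re.compile(r"\bapprox", re.IGNORECASE),
    "placeholder": re.compile(r"\b(?:TBD|TODO|XXX)\b"),
    "prediction": re.compile(r"\b(?:will probably|projected)\b", re.IGNORECASE),
}


def lint_notes(text: str) -> list[tuple[int, str, str]]:
    """Return (line number, rule name, line) for each suspicious line."""
    findings: list[tuple[int, str, str]] = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for rule, pattern in RED_FLAG_PATTERNS.items():
            if pattern.search(line):
                findings.append((lineno, rule, line.strip()))
    return findings
```

A hit is a prompt to go back to the run artifact, not an automatic rejection; a Hypothesis section may legitimately use predictive wording.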

Common Mistakes & Rationalizations

| Excuse / mistake | Reality |
|---|---|
| "I'll write the NOTES.md entry now while I set up the run, and fill in the numbers later" | Forbidden. NOTES.md is append-only-after-completion. A draft entry that gets numbers patched in later quietly turns into "what I expected" rather than "what I observed." Wait until the run finishes. |
| "v6 will probably hit ~0.05 better, let me note that" | Speculation never goes in NOTES.md. Hypotheses go in the experiment's own README before running; observed results go in NOTES.md after. |
| "The prior number is wrong but I'll leave it and just append a correction below" | Half-right. Append-only is the fallback, not the goal. A wrong number in a results table will be picked up by the next reader who skims; fix it in place (strikethrough → corrected, with date and artifact reference), and only fall back to an appended note when in-place editing would be misleading or destructive. |
| "I don't have the exact number handy, I'll put ~0.82 and update later" | Forbidden. Every number traces to a specific run artifact. If you don't have the artifact in front of you, don't write the entry yet. |
| "The prior hypothesis sounds dumb in hindsight, let me reword it" | Forbidden. Corrections are for numbers that turn out to be wrong against actual evidence, not for retroactively making yourself look smarter. The hypothesis stays. |
| "I'll skip reading NOTES.md, I already know what I want to try" | Read it anyway. The whole point of the journal is that prior failures aren't always intuitive. Five minutes of reading saves a wasted run. |

Red Flags — STOP and Reset

If you find yourself about to do any of these, STOP, surface the situation to the user, and propose a non-destructive alternative:

Writing to experiments/NOTES.md before the experiment has actually run and produced metrics
Writing a number you cannot point at a specific run artifact for (no estimates, no "approximately", no placeholders)
Editing a prior entry for any reason other than correcting a number against fresh evidence (no retroactive hypothesis polish, no reinterpretation, no deletion of failed entries)
Leaving a known-wrong number in a prior entry "to preserve history" — fix it (in place is fine, see "Correcting a Wrong Number") and name the artifact the new number comes from
Designing a new experiment without first reading experiments/NOTES.md
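The third red flag (editing a prior entry for anything but a number) can be checked by comparing the old and new text with numbers masked and correction markup removed. A coarse sketch, assuming the strikethrough, arrow, and "(corrected ...)" form from "Correcting a Wrong Number"; it will not notice a number changed inside prose (e.g. "3 cells" to "4 cells"), and a correction reason containing ")" would need a real parser:

```python
from __future__ import annotations

import re

NUMBER = re.compile(r"-?\d+(?:\.\d+)?")
# Permitted correction markup: the struck-out old value plus its arrow...
STRUCK = re.compile(r"~~[^~]*~~\s*→\s*")
# ...and the italic "(corrected <date>: <reason>; see <artifact>)" note.
NOTE = re.compile(r"\s*\*\(corrected [^)]*\)\*")


def skeleton(text: str) -> str:
    """Strip correction markup and mask numbers, leaving only the prose."""
    text = STRUCK.sub("", text)
    text = NOTE.sub("", text)
    return NUMBER.sub("<N>", text)


def only_numbers_changed(old_entry: str, new_entry: str) -> bool:
    """True if an edit touched nothing but numbers and correction notes."""
    return skeleton(old_entry) == skeleton(new_entry)
```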

Why These Rules Exist

The rules look pedantic. They exist because every weakening creates a path to non-reproducible results.

Append-only-after-completion rule: A draft entry written before the run finishes silently turns into "what I expected to see" rather than "what I observed" — and once the entry is on the page, the next designer treats it as evidence. Pre-registering numbers and leaving estimates as placeholders both corrupt the journal's role as the project's source of truth about what has been tried.
Evidence-over-immutability rule: Numbers can and do turn out to be wrong. When that happens, the wrong number must be corrected: leaving it on the page poisons the next read of the journal, because the next designer will pick the headline number off a table without scrolling down to find an appended correction. In-place edits are therefore permitted, with strikethrough or a footnote where the trail is worth preserving. The rule is "every number traces to a real run artifact," not "no number ever changes."
NOTES.md read-before-design rule: The most expensive failure mode in experimental work is re-running a variant that was already tried and documented as losing. The journal exists specifically to make that mistake catchable in the planning step instead of the result step.
