Skill

pipeline-scaffold — emit an experiment codebase from a slug or spec

Scaffold a self-contained experiment codebase (config, data loader, model, training/inference, eval, multi-seed runner, README) for a CS/ML/NLP study. Three task types are supported: classification (HF Trainer + accuracy), generation (Seq2SeqTrainer + BLEU/ROUGE), and prompt-eval (HF causal LM + exact-match, no training). Output goes to research/code/<slug>/. Use this skill after experiment-design has produced a spec, and pass that spec via --from-spec. For literature use lit-scan; for paper digests use lit-digest; for analyzing results use the planned result-analyze skill.

Invocation

How this skill is triggered — by the user, by Claude, or both

Slash command

/research-helper:pipeline-scaffold

Supporting Files

SKILL.md

128 lines · ~1.4k tokens

Stats

Language: Python
Parent stars: 0
Maintenance: Good
Last Commit: Jun 1, 2026

Purpose

Given a task type and either a slug or an experiment-design spec, drop a runnable starter codebase into research/code/<slug>/. The user installs deps, edits the config (data path, model name), and runs.

When to invoke

Triggers:

"Scaffold experiment code"
"Set up training code for X"
"Make me a starter codebase for this study"
"I have a spec, now make the code"
User runs /research-helper:pipeline-scaffold <task>

Do NOT invoke for:

"Fix this bug" / "debug my training" — not scaffolding
"What does this code do?" — read it directly, don't overwrite
"Run my experiment" — scaffold emits code; doesn't execute it
"Make me an inference server" — out of scope

Workflow

1. Pick the right `--task`

| If the user says | Use `--task` |
| --- | --- |
| "fine-tune BERT", "train a classifier", "GLUE / SST-2" | `classification` |
| "summarization", "translation", "T5 / BART fine-tune", "causal LM training" | `generation` |
| "prompt comparison", "in-context learning", "few-shot", "RAG eval", "no training" | `prompt-eval` |
| Unclear | Ask the user |
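The table can be read as a keyword lookup with an explicit "ask the user" fallback. The sketch below is only an illustration of that reading: the hint strings are copied from the table, while `TASK_HINTS` and `suggest_task` are hypothetical names, and the substring matching is an assumption rather than anything the skill itself prescribes.

```python
from typing import Optional

# Hint phrases copied from the table; the matching strategy is illustrative only.
TASK_HINTS = {
    "classification": ("fine-tune bert", "classifier", "glue", "sst-2"),
    "generation": ("summarization", "translation", "t5", "bart", "causal lm training"),
    "prompt-eval": ("prompt comparison", "in-context learning", "few-shot",
                    "rag eval", "no training"),
}


def suggest_task(request: str) -> Optional[str]:
    """Return the one task whose hints appear in the request, else None.

    None covers both "nothing matched" and "several tasks matched"; in either
    case the table's last row applies and the user should be asked.
    """
    text = request.lower()
    matches = [task for task, hints in TASK_HINTS.items()
               if any(hint in text for hint in hints)]
    return matches[0] if len(matches) == 1 else None
```

A request that mentions both summarization and few-shot prompting hits two rows, so the sketch returns `None` instead of guessing.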

2. Look for an existing experiment-design spec

Check research/experiments/. If a matching spec exists, pass it via --from-spec. Doing so:

- Reuses the spec's slug for the scaffold's directory
- Parses **N seeds:** and pre-fills config.yaml
- Parses **Primary** (...): and pre-fills the primary metric field
- Adds a relative link to the spec in the scaffold's README

If multiple specs match, list them and ask the user which.
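One way to carry out the lookup is sketched below, assuming specs are named `<slug>-<date>.md` as in the `--from-spec` example in step 4. `matching_specs` and `pick_spec` are hypothetical helpers, not part of scaffold.py.

```python
from pathlib import Path


def matching_specs(slug, root="research/experiments"):
    """Return spec paths under root whose names start with '<slug>-', sorted."""
    # Note: this loose glob also matches longer slugs that share the prefix.
    return sorted(Path(root).glob(f"{slug}-*.md"))


def pick_spec(slug, root="research/experiments"):
    """Return the single matching spec, None if there is none.

    Raises LookupError when several specs match, so the caller can list
    them and ask the user which one to use.
    """
    specs = matching_specs(slug, root)
    if len(specs) > 1:
        names = ", ".join(p.name for p in specs)
        raise LookupError(f"multiple specs match {slug!r}: {names}")
    return specs[0] if specs else None
```

Raising on ambiguity keeps the choice with the user, rather than silently taking the newest file.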

3. Confirm the slug + output dir

Default output is research/code/<slug>/. The user can override via --output. Confirm before running so we don't surprise them.

4. Call the script

With a spec:

python skills/pipeline-scaffold/scripts/scaffold.py \
  --task <task> \
  --from-spec research/experiments/<slug>-<date>.md

Without:

python skills/pipeline-scaffold/scripts/scaffold.py --task <task> --slug <slug>

If research/code/<slug>/ exists, the script refuses to overwrite. Pass --force to wipe-and-replace (confirm with the user first).
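The refuse-unless-forced behavior can be pictured roughly as follows. This is a sketch of the described behavior, not the script's actual code: `prepare_output_dir` is a hypothetical name, and it assumes the existing path is a directory.

```python
import shutil
from pathlib import Path


def prepare_output_dir(path, force=False):
    """Create an empty output directory, refusing to clobber an existing one.

    With force=True an existing directory is deleted first (wipe-and-replace).
    """
    out = Path(path)
    if out.exists():
        if not force:
            raise FileExistsError(f"{out} exists; pass --force to overwrite")
        shutil.rmtree(out)  # destructive: everything under `out` is removed
    out.mkdir(parents=True)
    return out
```

Because the forced path deletes everything under the directory, confirming with the user first matters.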

5. Walk the user through the generated README

Open research/code/<slug>/README.md. Surface in this order:

- Setup: python -m venv venv, activate, pip install -r requirements.txt. For prompt-eval with a gated model: HF_TOKEN env var.
- Configure: the config.yaml has placeholders that MUST be set:
  - model_name (default is a small reasonable choice)
  - dataset (default is a small reasonable choice)
  - For classification: text_column / label_column must match the dataset
  - For generation: input_column / target_column
  - For prompt-eval: prompt_template_id and possibly extra templates in prompt.py
- Run single seed: python main.py --seed 42 --config config.yaml
- Run all seeds: bash run_all_seeds.sh

Don't dump the file tree into chat — point at the README.
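The Configure step amounts to a per-task checklist. The sketch below uses the field names from the list above; `REQUIRED_FIELDS` and `missing_fields` are hypothetical. It assumes config.yaml has already been loaded into a plain dict (for example with a YAML parser) and treats empty values as unset.

```python
# Field names come from the Configure list; the grouping per task is an assumption.
REQUIRED_FIELDS = {
    "classification": ("model_name", "dataset", "text_column", "label_column"),
    "generation": ("model_name", "dataset", "input_column", "target_column"),
    "prompt-eval": ("model_name", "dataset", "prompt_template_id"),
}


def missing_fields(task, config):
    """Return the required keys that are absent or empty in the loaded config."""
    return [key for key in REQUIRED_FIELDS[task] if not config.get(key)]


if __name__ == "__main__":
    cfg = {"model_name": "m", "dataset": "d", "text_column": "text"}
    print(missing_fields("classification", cfg))
```

An empty list means every field listed for that task has a value. It does not check that the column names actually exist in the dataset.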

6. Don't auto-run training

The user runs the code. Surface the commands; stop there. Only run if the user explicitly says "yes, run it now."
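To surface commands without executing anything, it is enough to build the command strings and print them. The sketch below reuses the single-seed command from step 5. `seed_commands` is a hypothetical helper, and it makes no claim about what run_all_seeds.sh contains.

```python
import shlex


def seed_commands(seeds, config="config.yaml"):
    """Build one shell-ready command string per seed; nothing is executed."""
    return [
        shlex.join(["python", "main.py", "--seed", str(seed), "--config", config])
        for seed in seeds
    ]


if __name__ == "__main__":
    for command in seed_commands([42, 43]):
        print(command)
```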

7. Pre-run sanity checks to mention

Disk space for model + checkpoint + cache (10-50 GB for medium models)
GPU available (script auto-detects via torch.cuda.is_available())
For gated models (Llama, Mistral): huggingface-cli login or HF_TOKEN
If --from-spec was used: spot-check that the seeds in config.yaml match what the spec said. If parse_seeds warned to stderr, the seeds may have defaulted.
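The first three checks can be gathered into one helper that returns warnings instead of failing. The sketch below is hypothetical: the name `preflight` and the 10 GB default (the low end of the range above) are assumptions. The torch import is optional, because torch is only present once the emitted requirements are installed.

```python
import os
import shutil


def preflight(path=".", min_free_gb=10.0):
    """Return human-readable warnings; an empty list means nothing was flagged."""
    warnings = []

    free_gb = shutil.disk_usage(path).free / 1e9
    if free_gb < min_free_gb:
        warnings.append(f"only {free_gb:.1f} GB free at {path}")

    try:
        import torch  # optional dependency of the emitted code, not of this check
        if not torch.cuda.is_available():
            warnings.append("no CUDA GPU detected")
    except ImportError:
        warnings.append("torch not installed; GPU check skipped")

    if not os.environ.get("HF_TOKEN"):
        warnings.append("HF_TOKEN not set (only matters for gated models)")

    return warnings
```

Returning warnings keeps the decision with the user, in line with step 6.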

8. Where it lives

research/code/<slug>/. Tell the user the path; don't paste the file contents unless asked.

Failure modes

Output dir exists — script refuses; pass --force (confirm with user). If iterating, prefer a new --slug over force-overwriting.

Spec not found — exit 1 with "spec not found: <path>". Ask the user to verify the path.

Spec filename doesn't match <slug>-YYYY-MM-DD.md — parsing still works (seeds, primary metric); only the slug can't be derived from the filename. Pass --slug explicitly.
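The filename convention can be checked with a single regular expression. The sketch below assumes the date is always the last three hyphen-separated numeric groups before `.md`. `SPEC_NAME` and `slug_from_spec` are hypothetical names.

```python
import re
from pathlib import Path

# <slug>-YYYY-MM-DD.md, where the slug itself may contain hyphens.
SPEC_NAME = re.compile(r"^(?P<slug>.+)-\d{4}-\d{2}-\d{2}\.md$")


def slug_from_spec(path):
    """Return the slug encoded in a spec filename, or None if it does not match."""
    match = SPEC_NAME.match(Path(path).name)
    return match.group("slug") if match else None
```

A `None` result corresponds to the case above where `--slug` has to be passed explicitly.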

Spec parse warnings — script prints pipeline-scaffold: <warning> to stderr but doesn't fail. Note in handoff to the user that defaults were used for whatever didn't parse.

--task invalid — argparse rejects.

Neither --slug nor --from-spec — argparse error with helpful message.

Setup notes

Stdlib only for the scaffold itself. The EMITTED code has per-task requirements.txt files the user installs after scaffolding.

Defaults summary

| Knob | Default | When to change |
| --- | --- | --- |
| `--task` | (required) | n/a |
| `--output` | `research/code/<slug>/` | Override if the user asks |
| `--from-spec` | (none) | Use when a matching spec exists in `research/experiments/` |
| `--force` | off | Turn on only when intentionally overwriting |
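For orientation, the flags in this table and in the failure modes could be declared with argparse roughly as below. This is a reconstruction from the documented behavior, not the contents of scaffold.py. The function names and error wording are assumptions, and the default output is filled in only when `--slug` is given.

```python
import argparse


def build_parser():
    """Declare the documented flags; argparse rejects an invalid --task itself."""
    parser = argparse.ArgumentParser(prog="scaffold.py")
    parser.add_argument("--task", required=True,
                        choices=["classification", "generation", "prompt-eval"])
    parser.add_argument("--slug")
    parser.add_argument("--from-spec", dest="from_spec")
    parser.add_argument("--output")
    parser.add_argument("--force", action="store_true")
    return parser


def parse_args(argv=None):
    """Parse argv and apply the 'slug or spec' rule and the default output dir."""
    parser = build_parser()
    args = parser.parse_args(argv)
    if not (args.slug or args.from_spec):
        parser.error("provide --slug or --from-spec")  # exits with status 2
    if args.output is None and args.slug:
        args.output = f"research/code/{args.slug}/"
    return args
```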
