From research-helper
Scaffold a self-contained experiment codebase (config, data loader, model, training/inference, eval, multi-seed runner, README) for a CS/ML/NLP study. Three task types: classification (HF Trainer + accuracy), generation (Seq2SeqTrainer + BLEU/ROUGE), prompt-eval (HF causal LM + exact-match, no training). Output: research/code/<slug>/. Use after experiment-design has produced a spec; pass it via --from-spec. For literature use lit-scan; for paper digests use lit-digest; for analyzing results use the future result-analyze.
Slash command: /research-helper:pipeline-scaffold

Files in this skill:
- scripts/requirements.txt
- scripts/scaffold.py
- scripts/test_scaffold.py
- templates/classification/README.md.tmpl
- templates/classification/config.yaml.tmpl
- templates/classification/data.py
- templates/classification/eval.py
- templates/classification/main.py
- templates/classification/model.py
- templates/classification/requirements.txt
- templates/classification/run_all_seeds.sh
- templates/classification/train.py
- templates/generation/README.md.tmpl
- templates/generation/config.yaml.tmpl
- templates/generation/data.py
- templates/generation/eval.py
- templates/generation/main.py
- templates/generation/model.py
- templates/generation/requirements.txt
- templates/generation/run_all_seeds.sh

Given a task type and either a slug or an experiment-design spec, drop a
runnable starter codebase into research/code/<slug>/. The user installs
deps, edits the config (data path, model name), and runs.
Triggers:
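/research-helper:pipeline-scaffold <task>
Do NOT invoke for: literature searches (use lit-scan), paper digests (use lit-digest), or result analysis (use the future result-analyze).
Choose --task:
| If the user says | Use --task |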
|---|---|
| "fine-tune BERT", "train a classifier", "GLUE / SST-2" | classification |
| "summarization", "translation", "T5 / BART fine-tune", "causal LM training" | generation |
| "prompt comparison", "in-context learning", "few-shot", "RAG eval", "no training" | prompt-eval |
| Unclear | Ask the user |
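
The table above can be read as a simple keyword lookup. The following is a minimal sketch of that reading, not the skill's actual logic: `suggest_task` and `TASK_KEYWORDS` are hypothetical names, and case-insensitive substring matching is an assumption.

```python
from typing import Optional

# Keywords copied from the table above; the matching strategy is an assumption.
TASK_KEYWORDS = {
    "classification": ["fine-tune bert", "train a classifier", "glue", "sst-2"],
    "generation": ["summarization", "translation", "t5", "bart", "causal lm training"],
    "prompt-eval": ["prompt comparison", "in-context learning", "few-shot",
                    "rag eval", "no training"],
}


def suggest_task(request: str) -> Optional[str]:
    """Return the one matching task type, or None when the request is unclear."""
    text = request.lower()
    matches = [
        task
        for task, keywords in TASK_KEYWORDS.items()
        if any(keyword in text for keyword in keywords)
    ]
    # Zero matches or several matches both count as "Unclear": ask the user.
    return matches[0] if len(matches) == 1 else None
```

Returning `None` for ambiguous requests mirrors the last table row: when the phrasing does not point at exactly one task, ask the user.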
Check research/experiments/. If a matching spec exists, pass it via --from-spec. This:
- reads the spec's **N seeds:** line and pre-fills config.yaml
- reads the spec's **Primary** (...): line and pre-fills the primary metric field

If multiple specs match, list them and ask the user which one to use.
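
To illustrate what pre-filling from a spec might involve, here is a hedged sketch of a parser for the two spec lines. The exact line shapes (`**3 seeds:** 42, 43, 44` and `**Primary** (note): accuracy`), the default seed list, and the function name `parse_spec_sketch` are all assumptions; the real scaffold.py may differ.

```python
import re
import sys
from typing import List, Optional, Tuple

DEFAULT_SEEDS = [42, 43, 44]  # assumed default; the real default is not documented here


def warn(message: str) -> None:
    # Warnings go to stderr with the skill's prefix and never abort the run.
    print(f"pipeline-scaffold: {message}", file=sys.stderr)


def parse_spec_sketch(text: str) -> Tuple[List[int], Optional[str]]:
    """Extract (seeds, primary_metric) from spec text, falling back to defaults."""
    seeds = list(DEFAULT_SEEDS)
    # Assumed line shape: "**3 seeds:** 42, 43, 44"
    seed_line = re.search(r"\*\*\d+ seeds:\*\*\s*(.+)", text)
    found = re.findall(r"\d+", seed_line.group(1)) if seed_line else []
    if found:
        seeds = [int(value) for value in found]
    else:
        warn("could not parse seeds; using defaults")

    # Assumed line shape: "**Primary** (some note): accuracy"
    primary = re.search(r"\*\*Primary\*\*\s*\([^)]*\):\s*(\S+)", text)
    metric = primary.group(1) if primary else None
    if metric is None:
        warn("could not parse primary metric; leaving the default")
    return seeds, metric
```

Because a failed parse only warns, it is worth comparing the resulting config.yaml against the spec after scaffolding.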
Default output is research/code/<slug>/. The user can override via --output. Confirm before running so we don't surprise them.
With a spec:
python skills/pipeline-scaffold/scripts/scaffold.py \
--task <task> \
--from-spec research/experiments/<slug>-<date>.md
Without:
python skills/pipeline-scaffold/scripts/scaffold.py --task <task> --slug <slug>
If research/code/<slug>/ exists, the script refuses to overwrite. Pass --force to wipe-and-replace (confirm with the user first).
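
The refuse-unless-forced behavior can be sketched as follows. This is an illustration under assumptions, not the script's code: `prepare_output_dir` is a hypothetical name, and raising `FileExistsError` stands in for whatever the real script does when it refuses.

```python
import shutil
from pathlib import Path


def prepare_output_dir(out_dir: Path, force: bool = False) -> Path:
    """Create out_dir, refusing to touch an existing one unless force is set."""
    if out_dir.exists():
        if not force:
            raise FileExistsError(f"{out_dir} exists; pass --force to overwrite")
        # Wipe-and-replace: everything under out_dir is deleted.
        shutil.rmtree(out_dir)
    out_dir.mkdir(parents=True)
    return out_dir
```

Since `--force` deletes the whole directory, including any edits the user made to the generated code, a fresh `--slug` is usually the safer choice when iterating.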
Open research/code/<slug>/README.md. Surface in this order:
1. Install: python -m venv venv, activate, pip install -r requirements.txt. For prompt-eval with a gated model, set the HF_TOKEN env var.
2. Edit config: config.yaml has placeholders that MUST be set:
   - model_name (default is a small reasonable choice)
   - dataset (default is a small reasonable choice)
   - classification: text_column / label_column must match the dataset
   - generation: input_column / target_column
   - prompt-eval: prompt_template_id and possibly extra templates in prompt.py
3. Single-seed run: python main.py --seed 42 --config config.yaml
4. All seeds: bash run_all_seeds.sh

Don't dump the file tree into chat; point at the README.
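
A quick way to confirm the config placeholders were filled in is to check the loaded config for empty fields. The sketch below assumes that text_column / label_column belong to classification, input_column / target_column to generation, and prompt_template_id to prompt-eval; `missing_config_keys` is a hypothetical helper, and the config is taken as an already-loaded dict so the example stays stdlib-only.

```python
from typing import Dict, List

COMMON_KEYS = ["model_name", "dataset"]
# Assumed mapping of task type to task-specific config keys.
TASK_KEYS = {
    "classification": ["text_column", "label_column"],
    "generation": ["input_column", "target_column"],
    "prompt-eval": ["prompt_template_id"],
}


def missing_config_keys(task: str, config: Dict[str, object]) -> List[str]:
    """List required keys that are absent or empty for the given task."""
    required = COMMON_KEYS + TASK_KEYS[task]
    return [key for key in required if config.get(key) in (None, "")]
```

For example, a classification config that sets only model_name and dataset would report text_column and label_column as missing.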
The user runs the code. Surface the commands; stop there. Only run if the user explicitly says "yes, run it now."
- GPU availability (torch.cuda.is_available())
- Gated-model access: huggingface-cli login or HF_TOKEN
- If --from-spec was used: spot-check that the seeds in config.yaml match what the spec said. If parse_seeds warned to stderr, the seeds may have defaulted.
- Output location: research/code/<slug>/. Tell the user the path; don't paste the file contents unless asked.
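
The environment checks mentioned above could be gathered into one small pre-flight function. This is a sketch with a hypothetical name (`preflight`); it only looks at the HF_TOKEN environment variable and does not detect a token cached by huggingface-cli login.

```python
import importlib.util
import os
from typing import Dict


def preflight() -> Dict[str, bool]:
    """Report GPU availability and whether HF_TOKEN is set."""
    gpu_available = False
    # Import torch only if it is installed, so the check works before pip install.
    if importlib.util.find_spec("torch") is not None:
        import torch

        gpu_available = bool(torch.cuda.is_available())
    return {
        "gpu_available": gpu_available,
        "hf_token_set": bool(os.environ.get("HF_TOKEN")),
    }
```

A `False` for `hf_token_set` matters only for gated models; public checkpoints load without a token.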
Output dir exists — script refuses; pass --force (confirm with user). If iterating, prefer a new --slug over force-overwriting.
Spec not found — exit 1 with "spec not found: <path>". Ask the user to verify the path.
Spec filename doesn't match <slug>-YYYY-MM-DD.md — parsing still works (seeds, primary metric); only the slug can't be derived from the filename. Pass --slug explicitly.
Spec parse warnings — script prints pipeline-scaffold: <warning> to stderr but doesn't fail. Note in handoff to the user that defaults were used for whatever didn't parse.
--task invalid — argparse rejects.
Neither --slug nor --from-spec — argparse error with helpful message.
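
The argument-related error cases above map naturally onto argparse. The sketch below is an assumption about how such a CLI could be wired, not a copy of scaffold.py: the flag names come from this document, while `parse_args`, `slug_from_spec`, and the error wording are hypothetical. The spec-not-found check is left out for brevity.

```python
import argparse
import re
from pathlib import Path
from typing import List, Optional

TASKS = ["classification", "generation", "prompt-eval"]
# Spec filenames are expected to look like <slug>-YYYY-MM-DD.md.
SPEC_NAME = re.compile(r"^(?P<slug>.+)-\d{4}-\d{2}-\d{2}\.md$")


def slug_from_spec(path: str) -> Optional[str]:
    """Derive the slug from a spec filename, or None if the name does not match."""
    match = SPEC_NAME.match(Path(path).name)
    return match.group("slug") if match else None


def parse_args(argv: List[str]) -> argparse.Namespace:
    parser = argparse.ArgumentParser(prog="scaffold.py")
    # An invalid --task is rejected by argparse through `choices`.
    parser.add_argument("--task", required=True, choices=TASKS)
    parser.add_argument("--slug")
    parser.add_argument("--from-spec", dest="from_spec")
    parser.add_argument("--output")
    parser.add_argument("--force", action="store_true")
    args = parser.parse_args(argv)

    if args.slug is None and args.from_spec is None:
        parser.error("provide --slug or --from-spec")
    if args.slug is None:
        args.slug = slug_from_spec(args.from_spec)
        if args.slug is None:
            parser.error("cannot derive slug from spec filename; pass --slug")
    return args
```

`parser.error` prints the usage line and exits with status 2, which keeps all argument problems on the same failure path.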
Stdlib only for the scaffold itself. The EMITTED code has per-task requirements.txt files the user installs after scaffolding.
| Knob | Default | When to change |
|---|---|---|
| --task | (required) | - |
| --output | research/code/<slug>/ | Override if the user asks |
| --from-spec | (none) | Use when a matching spec exists in research/experiments/ |
| --force | off | Turn on only when intentionally overwriting |
npx claudepluginhub mhburg/research-helper --plugin research-helper