From research-helper
Paired statistical analysis of multi-seed experiment outputs from pipeline-scaffold. Use when the user has results.json files across variants and wants confidence intervals, paired t / Wilcoxon, and a primary-metric callout.
Slash command: `/research-helper:result-analyze`
Reads `outputs/seed-*/results.json` files across one or more variants and produces a paired statistical analysis markdown + JSON sidecar at `research/analysis/<slug>-<date>.md`.
Triggers: "analyze my results", "compare baseline vs treatment", "run statistical tests on these experiments", "is the difference significant?", "what do these results tell me?"
Anti-triggers:
- … `pipeline-scaffold`

Identify the variants. From the user's description, figure out what's being compared. Common patterns:
Look for an experiment-design spec. Check research/experiments/<slug>-<date>.md. If one matches, pass it as --from-spec so the primary metric is inherited.
Build the glob patterns. The default pipeline-scaffold layout is `outputs/seed-<S>/results.json` inside the experiment directory. Cross-experiment comparisons require explicit paths:

```bash
--variant baseline:'research/code/<slug>/outputs/seed-*/results.json' \
--variant treatment:'research/code/<slug>-v2/outputs/seed-*/results.json'
```
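A minimal sketch of how `NAME:GLOB` variant arguments could be expanded and paired by seed, assuming the seed number is the `seed-<S>` parent directory of each `results.json`. The single quotes in the commands keep the shell from expanding the globs, so the pattern reaches the script intact. The helper names (`parse_variant`, `load_variant`, `pair_by_seed`) are hypothetical; this is not the actual code of `analyze.py`.

```python
import glob
import json
import os
import re
from typing import Dict, List, Tuple

# Hypothetical helpers; the real analyze.py may parse --variant differently.
SEED_DIR = re.compile(r"seed-(\d+)")


def parse_variant(spec: str) -> Tuple[str, str]:
    """Split 'NAME:GLOB' on the first colon into (name, glob pattern)."""
    name, sep, pattern = spec.partition(":")
    if not sep or not name or not pattern:
        raise ValueError(f"expected NAME:GLOB, got {spec!r}")
    return name, pattern


def load_variant(pattern: str) -> Dict[int, dict]:
    """Map seed number -> parsed results.json for every file the glob matches."""
    runs: Dict[int, dict] = {}
    for path in sorted(glob.glob(pattern)):
        # The seed comes from the parent directory name, e.g. .../seed-3/results.json.
        match = SEED_DIR.fullmatch(os.path.basename(os.path.dirname(path)))
        if match is None:
            continue  # not inside a seed-<S> directory; ignore it
        with open(path) as fh:
            runs[int(match.group(1))] = json.load(fh)
    return runs


def pair_by_seed(specs: List[str]) -> Tuple[List[str], List[int], Dict[str, Dict[int, dict]]]:
    """Load every variant and return (names, seeds shared by all variants, data)."""
    names: List[str] = []
    data: Dict[str, Dict[int, dict]] = {}
    for spec in specs:
        name, pattern = parse_variant(spec)
        names.append(name)
        data[name] = load_variant(pattern)
    if not data:
        return names, [], data
    shared = set.intersection(*(set(runs) for runs in data.values()))
    return names, sorted(shared), data
```

Only seeds present in every variant are kept, since a paired test needs one observation per variant for each seed.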
Ordering determines the sign of Δ. Pairs are computed as (first variant) − (second variant), so list the baseline second when you want positive Δ to mean "the alternative improved over baseline" (for a metric where higher is better). E.g., --variant treatment:... --variant baseline:... yields Δ = treatment − baseline.
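The sign convention can be made concrete with a small stdlib-only sketch (hypothetical helper names, made-up per-seed values): each seed contributes Δ_s = first[s] − second[s], and a paired t statistic is mean(Δ) divided by its standard error.

```python
import math
import statistics
from typing import Dict, List

# Hypothetical helpers illustrating the sign convention; not analyze.py's code.


def paired_deltas(first: Dict[int, float], second: Dict[int, float]) -> List[float]:
    """Return Δ_s = first[s] − second[s] for each seed s present in both variants."""
    shared = sorted(set(first) & set(second))
    return [first[s] - second[s] for s in shared]


def paired_t(deltas: List[float]) -> float:
    """Paired t statistic: mean(Δ) / (sample sd(Δ) / sqrt(n))."""
    if len(deltas) < 2:
        raise ValueError("need at least two paired seeds")
    sd = statistics.stdev(deltas)  # sample standard deviation (n − 1 denominator)
    if sd == 0:
        raise ValueError("all deltas are identical; t is undefined")
    return statistics.mean(deltas) / (sd / math.sqrt(len(deltas)))


# Per-seed primary-metric values (made-up numbers for illustration).
treatment = {0: 3.0, 1: 5.0, 2: 4.0}
baseline = {0: 1.0, 1: 2.0, 2: 4.0}

# First variant minus second: positive Δ means treatment scored higher.
print(paired_deltas(treatment, baseline))  # [2.0, 3.0, 0.0]
print(paired_deltas(baseline, treatment))  # [-2.0, -3.0, 0.0]
```

Swapping the argument order flips every Δ and the sign of t, which is exactly what reordering the `--variant` flags does. For p-values, SciPy's `scipy.stats.ttest_rel` and `scipy.stats.wilcoxon` implement the paired t and Wilcoxon signed-rank tests; whether `analyze.py` uses them is not stated here.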
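Call the script. With baseline listed first as below, Δ = baseline − treatment; swap the two --variant flags if you want Δ = treatment − baseline.

```bash
python skills/result-analyze/scripts/analyze.py \
  --variant baseline:'research/code/<slug>/outputs/seed-*/results.json' \
  --variant treatment:'research/code/<slug>-v2/outputs/seed-*/results.json' \
  --from-spec research/experiments/<slug>-<date>.md
```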
Options:
- `--ci-level 0.95` (default)
- `--bootstrap-iterations 10000` (default)
- `--correction holm` (default; or `bonferroni`, `none`)
- `--primary-metric NAME` (overrides `--from-spec`)
- `--output PATH` (overrides the default location)
- `--force` (overwrite an existing report)

Walk the user through the report. Surface:
Don't over-interpret. The report describes what the numbers show. Avoid calling a result "definitive" when it rests on a small n (few seeds).
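Two of the defaults in the options list bear directly on this. The sketch below shows a percentile bootstrap CI for the mean paired Δ (parameters mirroring `--ci-level` and `--bootstrap-iterations`) and Holm's step-down p-value adjustment (the `--correction holm` default). It assumes a plain percentile bootstrap, which this document does not confirm for `analyze.py`; the function names are hypothetical. With only a handful of seeds the interval comes out wide, which is why a small n should not be called definitive.

```python
import random
import statistics
from typing import List, Tuple

# Hypothetical sketches; analyze.py's exact bootstrap variant is not documented here.


def bootstrap_ci(deltas: List[float], ci_level: float = 0.95,
                 iterations: int = 10_000, seed: int = 0) -> Tuple[float, float]:
    """Percentile bootstrap CI for the mean of the paired differences."""
    if not deltas:
        raise ValueError("no paired deltas")
    rng = random.Random(seed)  # fixed seed so the interval is reproducible
    n = len(deltas)
    # Resample seeds with replacement and record the mean Δ of each resample.
    means = sorted(statistics.fmean(rng.choices(deltas, k=n)) for _ in range(iterations))
    alpha = 1.0 - ci_level
    lo = means[int(alpha / 2 * (iterations - 1))]
    hi = means[int((1 - alpha / 2) * (iterations - 1))]
    return lo, hi


def holm_adjust(pvalues: List[float]) -> List[float]:
    """Holm step-down adjusted p-values, returned in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The p-value at 0-based rank r is multiplied by (m − r), capped at 1.
        running_max = max(running_max, min(1.0, (m - rank) * pvalues[idx]))
        adjusted[idx] = running_max  # keep adjusted values monotone across ranks
    return adjusted


few_seeds = [0.0, 1.0, 2.0]
many_seeds = few_seeds * 10  # same spread, ten times as many paired seeds
print(bootstrap_ci(few_seeds))   # wide interval
print(bootstrap_ci(many_seeds))  # noticeably narrower
print(holm_adjust([0.01, 0.04, 0.03]))
```

Holm controls the family-wise error rate like Bonferroni but is uniformly less conservative, which is a plausible reason for it being the default.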
Is `num_examples` consistent across runs? A variant evaluated on fewer examples is not directly comparable on raw accuracy.

If no primary metric was specified and result-analyze defaulted to the alphabetically first metric (warning emitted), call this out.

Output: `research/analysis/<slug>-<date>.md` plus the sidecar `<slug>-<date>.json`. The JSON sidecar is the machine-readable version (same data, no prose); point downstream tools at it.
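The `num_examples` check lends itself to a quick pre-flight script before running the analysis. A sketch, assuming each `results.json` has a top-level `num_examples` key (an assumption; the schema is not specified here) and that runs are already grouped as variant → seed → parsed JSON. `num_examples_mismatches` is a hypothetical helper, not part of `analyze.py`.

```python
from collections import defaultdict
from typing import Dict, List

# Hypothetical pre-flight check; the "num_examples" key location is an assumption.


def num_examples_mismatches(runs_by_variant: Dict[str, Dict[int, dict]],
                            key: str = "num_examples") -> List[str]:
    """Return one line per distinct eval-set size if runs disagree, else []."""
    seen: Dict[object, List[str]] = defaultdict(list)
    for variant, runs in runs_by_variant.items():
        for seed, results in sorted(runs.items()):
            # A missing key is grouped under None, so it is flagged when other runs report a size.
            seen[results.get(key)].append(f"{variant}/seed-{seed}")
    if len(seen) <= 1:
        return []
    return [f"{key}={value}: {', '.join(where)}" for value, where in seen.items()]


runs = {
    "baseline": {0: {"num_examples": 500}, 1: {"num_examples": 500}},
    "treatment": {0: {"num_examples": 500}, 1: {"num_examples": 480}},
}
for line in num_examples_mismatches(runs):
    print(line)
```

An empty result means every run saw the same number of eval examples; any output lists which seeds disagree, so the mismatch can be reported alongside the analysis.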
npx claudepluginhub mhburg/research-helper --plugin research-helper