From sdrf-skills
Plans SDRF metadata annotation for proteomics experiments by guiding users through experimental design, template selection, and reference dataset search.
Slash command
/sdrf-skills:sdrf-brainstorm [experiment description]
You are helping the user plan their SDRF annotation BEFORE creating the file. This is a collaborative thinking session, not a file generation task.
Ask the user about (if not already provided):
Use the sdrf:templates decision tree to select the right combination. Reference the 5 template layers:
1. ms-proteomics or affinity-proteomics
2. human, vertebrates, invertebrates, or plants
3. dia-acquisition, cell-lines, single-cell, immunopeptidomics, crosslinking
4. clinical-metadata, oncology-metadata
5. metaproteomics + child (human-gut, soil, water)
Read spec/sdrf-proteomics/sdrf-templates/templates.yaml to confirm template names and current versions.
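As a rough illustration, the five layers listed above can be held in a small lookup table and used to sanity-check a proposed combination before presenting it. This is only a sketch: the keys `layer_1` to `layer_5` and the function `group_by_layer` are hypothetical names, and the "exactly one of the first layer" rule is an assumption read from the word "or" in that layer. The authoritative names and versions remain in templates.yaml.

```python
# Template names copied from the layer list above.
# The layer keys are hypothetical placeholders, not names from the SDRF spec.
TEMPLATE_LAYERS = {
    "layer_1": {"ms-proteomics", "affinity-proteomics"},
    "layer_2": {"human", "vertebrates", "invertebrates", "plants"},
    "layer_3": {"dia-acquisition", "cell-lines", "single-cell",
                "immunopeptidomics", "crosslinking"},
    "layer_4": {"clinical-metadata", "oncology-metadata"},
    "layer_5": {"metaproteomics", "human-gut", "soil", "water"},
}


def group_by_layer(selected):
    """Group selected template names by layer; reject unknown names."""
    chosen = set(selected)
    known = set().union(*TEMPLATE_LAYERS.values())
    unknown = sorted(chosen - known)
    if unknown:
        raise ValueError(f"Unknown template names: {unknown}")

    grouped = {}
    for layer, names in TEMPLATE_LAYERS.items():
        hits = sorted(names & chosen)
        if hits:
            grouped[layer] = hits

    # Assumption: the first layer is an either/or choice, so exactly one is expected.
    if len(grouped.get("layer_1", [])) != 1:
        raise ValueError("Pick exactly one of ms-proteomics or affinity-proteomics")
    return grouped


selection = ["ms-proteomics", "human", "clinical-metadata", "oncology-metadata"]
grouped = group_by_layer(selection)
for layer, names in grouped.items():
    print(layer, "->", ", ".join(names))
```

A misspelled template name raises an error here rather than silently dropping out of the recommendation.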
Present the recommendation:
Your experiment: [description]
Recommended templates:
1. ms-proteomics (required — mass spectrometry experiment)
2. human (organism is Homo sapiens)
3. clinical-metadata (patient samples with treatment data)
4. oncology-metadata (cancer study — adds tumor staging)
This combination requires these columns: [read from TERMS.tsv, filter by template names in usage]
And recommends these additional columns: [read from template YAMLs for recommended columns]
Read spec/sdrf-proteomics/TERMS.tsv and filter by the selected template names to list the columns.
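The filtering step might look roughly like the sketch below. The layout of TERMS.tsv is not described here, so the column headers `term` and `usage`, the comma-separated format of the usage field, and the sample rows are all assumptions made for illustration. Check the real file's header before adapting it.

```python
import csv
import io


def columns_for_templates(tsv_text, templates,
                          name_field="term", usage_field="usage"):
    """Return column names whose usage field mentions any selected template.

    name_field and usage_field are hypothetical header names; adjust them
    to match the actual TERMS.tsv header.
    """
    wanted = set(templates)
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    columns = []
    for row in reader:
        # Assumed format: template names separated by commas.
        used_by = {name.strip() for name in row[usage_field].split(",")}
        if used_by & wanted:
            columns.append(row[name_field])
    return columns


# Made-up rows for illustration only; NOT the contents of the real TERMS.tsv.
sample_tsv = (
    "term\tusage\n"
    "characteristics[organism]\tms-proteomics, affinity-proteomics\n"
    "characteristics[tumor stage]\toncology-metadata\n"
    "characteristics[soil type]\tsoil\n"
)

selected = ["ms-proteomics", "oncology-metadata"]
matching = columns_for_templates(sample_tsv, selected)
print(matching)
```

To run this against the real file, read its text with `open(path, encoding="utf-8").read()` and pass it in place of `sample_tsv`.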
Find reference datasets to learn from:
Search PRIDE for similar experiments:
mcp PRIDE → search_extensive(query="<keywords>")
Search publications for standard experimental designs:
mcp PubMed → search_articles(query="<keywords> AND proteomics")
Search bioRxiv for recent preprints:
mcp bioRxiv → search_preprints(category="biochemistry" or "cell biology", recent_days=180)
Present findings:
Similar datasets found:
- PXD012345: TMT phosphoproteomics of breast cancer (24 samples, 12 fractions)
- PXD023456: Label-free DIA of liver cancer tissue (30 patients)
Common design patterns in this field:
- Typical sample size: 10-30 per group
- Common labels: TMT, label-free DIA, SILAC
- Standard fractionation: 12-24 high-pH RP fractions
- Most include: age, sex, disease staging
Present a complete column plan organized by importance:
Columns required by the selected template combination. For each: explain what it is, what ontology to use, and give examples.
Columns that 70%+ of similar experiments include. For each: explain why it adds value.
Columns that would increase reusability and findability. For each: explain the benefit.
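One way to make the "70%+ of similar experiments" criterion concrete is to count how often each column appears across the reference datasets found earlier. The sketch below assumes the column headers of each reference SDRF have already been collected into lists. The function name `common_columns` and the example column sets are hypothetical.

```python
from collections import Counter


def common_columns(reference_column_sets, threshold=0.70):
    """Return columns present in at least `threshold` of the reference datasets."""
    total = len(reference_column_sets)
    if total == 0:
        return []
    # Count each column once per dataset, even if a header is repeated.
    counts = Counter(col for cols in reference_column_sets for col in set(cols))
    return sorted(col for col, n in counts.items() if n / total >= threshold)


# Hypothetical column sets from four reference datasets (illustrative only).
references = [
    ["characteristics[age]", "characteristics[sex]", "characteristics[disease]"],
    ["characteristics[age]", "characteristics[sex]"],
    ["characteristics[sex]", "characteristics[disease]"],
    ["characteristics[age]", "characteristics[sex]", "characteristics[body mass index]"],
]

frequent = common_columns(references)
print(frequent)
```

With four references, a column must appear in at least three of them to clear the 70% threshold.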
Discuss what the experimental comparison is:
Raise potential issues proactively:
Help the user understand the scale:
Your SDRF will have:
Rows: [samples] × [fractions] × [label channels] × [technical replicates]
Example: 20 patients × 12 fractions × 1 (label-free) × 1 replicate = 240 rows
Example: 20 patients × 12 fractions × 10 (TMT10plex) × 1 replicate = 2,400 rows
Columns: ~15 required + ~8 recommended + factor values = ~25 columns
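The row estimate above is a plain product, which a small helper can reproduce for any design. This is a sketch of that arithmetic only. The function name `estimate_rows` and its parameter names are hypothetical.

```python
def estimate_rows(samples, fractions=1, label_channels=1, technical_replicates=1):
    """Estimate SDRF rows as samples x fractions x label channels x replicates."""
    factors = {
        "samples": samples,
        "fractions": fractions,
        "label_channels": label_channels,
        "technical_replicates": technical_replicates,
    }
    for name, value in factors.items():
        if value < 1:
            raise ValueError(f"{name} must be at least 1, got {value}")
    return samples * fractions * label_channels * technical_replicates


# The two examples from the text above.
label_free_rows = estimate_rows(samples=20, fractions=12)
tmt_rows = estimate_rows(samples=20, fractions=12, label_channels=10)
print(label_free_rows, tmt_rows)
```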
Create a clear annotation plan the user can follow:
## SDRF Annotation Plan for [experiment]
Templates: ms-proteomics + human + oncology-metadata
Rows: ~240 (20 patients × 12 fractions)
Columns: 26
Required metadata to collect:
- Patient demographics: age, sex (from clinical records)
- Diagnosis: specific cancer subtype (from pathology)
- Tumor staging: TNM stage, grade (from clinical records)
- Tissue type: primary tumor vs adjacent normal
Technical metadata (from instrument):
- Instrument model, fragmentation method
- Mass tolerances, collision energy
- Label type and channel assignments
Factor values: disease (tumor vs normal)
Next step: Run /sdrf:annotate to create the file
npx claudepluginhub bigbio/sdrf-skills