User has a merged or analysis-ready tabular file and wants to generate a BIDS-style JSON data dictionary that annotates every variable. Use this skill whenever the user asks to document, annotate, or describe the variables in a dataframe or TSV/CSV file, wants to create a data dictionary or codebook, or needs BIDS-compliant variable metadata for a merged dataset.
Slash command: /stat-analysis:gen-data-dict
Inspect a tabular file (typically the output of /merge-data), annotate each
variable with a BIDS-style description, and write a JSON data dictionary. The
input file is read-only. Run this skill after you are satisfied with the merged
dataframe — not before, since column names and types may still change.
If $ARGUMENTS contains a path, use it.
Otherwise, look for a merged.tsv in the current directory or ask:
"What file should I build the dictionary for? (Usually merged.tsv from /merge-data)"
Confirm the file exists and is a supported tabular format (.tsv, .csv,
.parquet, .feather, .xlsx). If not, report the issue clearly.
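The input-resolution steps above can be sketched in Python as follows. This is a minimal illustration, not the skill's actual implementation: `resolve_input` and `SUPPORTED` are hypothetical names, and returning `None` stands in for "ask the user".

```python
from pathlib import Path
from typing import Optional

# Extensions copied from the supported-format list above.
SUPPORTED = {".tsv", ".csv", ".parquet", ".feather", ".xlsx"}


def resolve_input(argument: Optional[str], cwd: Path) -> Optional[Path]:
    """Return the file to document, or None when the user has to be asked."""
    if argument and argument.strip():
        candidate = Path(argument.strip()).expanduser()
        if not candidate.is_absolute():
            candidate = cwd / candidate
    else:
        # No path given: fall back to merged.tsv in the working directory.
        candidate = cwd / "merged.tsv"
        if not candidate.is_file():
            return None
    if not candidate.is_file():
        raise FileNotFoundError(f"No such file: {candidate}")
    if candidate.suffix.lower() not in SUPPORTED:
        raise ValueError(
            f"Unsupported format '{candidate.suffix}'; "
            f"expected one of {sorted(SUPPORTED)}"
        )
    return candidate
```

Raising distinct exceptions for a missing file and an unsupported extension makes it straightforward to report the issue clearly, as the step requires.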
Read the file and collect the following for each column:
For text formats, detect delimiter and encoding before reading. For binary formats, read the schema first, then a sample.
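One way to detect delimiter and encoding for text formats, using only the standard library, is sketched below. The function names, the 64 KiB sample size, the candidate delimiter set, and the latin-1 fallback are all assumptions made for illustration; the skill itself does not prescribe a method.

```python
import codecs
import csv
from pathlib import Path


def detect_encoding(raw: bytes) -> str:
    """Guess an encoding: UTF-8 with BOM, plain UTF-8, else latin-1 (assumed fallback)."""
    if raw.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    try:
        # Incremental decoding tolerates a multi-byte character cut off at the
        # end of the sample, which a plain bytes.decode() would reject.
        codecs.getincrementaldecoder("utf-8")().decode(raw, final=False)
        return "utf-8"
    except UnicodeDecodeError:
        return "latin-1"


def sniff_text_format(path, sample_bytes: int = 65536):
    """Return (encoding, delimiter) from a sample at the start of the file."""
    path = Path(path)
    with path.open("rb") as handle:
        raw = handle.read(sample_bytes)
    encoding = detect_encoding(raw)
    text = raw.decode(encoding, errors="replace")
    try:
        delimiter = csv.Sniffer().sniff(text, delimiters="\t,;|").delimiter
    except csv.Error:
        # Sniffing failed (e.g. empty or single-column file): use the extension.
        delimiter = "\t" if path.suffix.lower() == ".tsv" else ","
    return encoding, delimiter
```

The file is opened in binary mode and only read, so the input stays untouched.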
For each column, draft a proposed annotation using these rules:

Description:
- Expand recognized abbreviations, for example:
  - fa → fractional anisotropy, md → mean diffusivity, ad → axial diffusivity, rd → radial diffusivity, gm → grey matter, wm → white matter, csf → cerebrospinal fluid, roi → region of interest
  - bprs → Brief Psychiatric Rating Scale, panss → Positive and Negative Syndrome Scale, iq → intelligence quotient, rt → reaction time, hit → hit rate, fa (in cognition context) → false alarm
  - dob → date of birth, ses → socioeconomic status (verify context)
- If a column name cannot be decoded (e.g. x7b_v2), leave Description as "" and flag it to ask the user.

Units:
- Use short unit strings, e.g. "years", "mm³", "ms", "z-score", "unitless (0–1)", "mm", "Hz".
- Use "n/a" for identifiers, categoricals, booleans, and free-text columns.

Levels (for categorical / boolean columns only):
- M / F → "Male" / "Female"
- HC / SZ / BD → "Healthy control" / "Schizophrenia spectrum" / "Bipolar disorder"
- 0 / 1 for a passed_qc column → "Failed QC" / "Passed QC"
- If the meaning of a level is unclear, leave it as "" and flag for the user.

LongName (optional):
- e.g. bprs_total → "Brief Psychiatric Rating Scale, total score"

Show the user the proposed dictionary as a readable summary, not raw JSON. Use a compact table format:
| Column | Dtype | Proposed Description | Units | Levels / Notes |
|---|---|---|---|---|
| subject_id | object | Participant identifier | n/a | — |
| age | Int64 | Age at time of assessment | years | — |
| diagnosis | category | Clinical diagnosis group | n/a | HC, SZ, BD |
| passed_qc | bool | Whether scan passed QC | n/a | true/false |
| fa_l_af | float64 | (flagged — see below) | — | — |
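A summary table in the layout above could be produced with a small helper like this one. `render_summary` and the `-` placeholder for empty cells are illustrative choices, not part of the skill.

```python
HEADER = ["Column", "Dtype", "Proposed Description", "Units", "Levels / Notes"]


def render_summary(rows, placeholder="-"):
    """Render a list of per-column dicts as a compact markdown table."""
    lines = ["| " + " | ".join(HEADER) + " |", "|" + "---|" * len(HEADER)]
    for row in rows:
        # Missing or empty cells get the placeholder; literal pipes are
        # escaped so they cannot break the table layout.
        cells = [str(row.get(key) or placeholder).replace("|", "\\|") for key in HEADER]
        lines.append("| " + " | ".join(cells) + " |")
    return "\n".join(lines)


print(render_summary([
    {"Column": "age", "Dtype": "Int64",
     "Proposed Description": "Age at time of assessment", "Units": "years"},
]))
```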
List flagged columns separately:
- Columns whose Description is "" (opaque names you couldn't decode)
- Columns with a level set to "" (coded values whose meaning is unclear)

For each flagged column, ask a targeted question. Examples:
- "What does fa_l_af measure? Is it fractional anisotropy of the left arcuate fasciculus, or something else?"
- "For the site column, what do the codes A, B, C stand for?"
- "Is age measured in years at scan, years at consent, or something else?"

Incorporate the user's answers before writing the file.
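The draft-then-flag logic can be approximated with a token lookup, sketched below. This is a deliberately conservative simplification: the abbreviation table is copied from the rules in this document, the function names are hypothetical, and any column with an unrecognized token is left blank and turned into a question. Context-dependent cases (such as fa meaning "false alarm" in a cognition context) still need judgment that this sketch does not attempt.

```python
# Abbreviations taken from the Description rules in this document.
ABBREVIATIONS = {
    "fa": "fractional anisotropy", "md": "mean diffusivity",
    "ad": "axial diffusivity", "rd": "radial diffusivity",
    "gm": "grey matter", "wm": "white matter",
    "csf": "cerebrospinal fluid", "roi": "region of interest",
    "bprs": "Brief Psychiatric Rating Scale",
    "panss": "Positive and Negative Syndrome Scale",
    "iq": "intelligence quotient", "rt": "reaction time",
    "dob": "date of birth", "ses": "socioeconomic status",
}


def draft_description(column):
    """Return (description, unresolved_tokens) for one column name."""
    tokens = [t for t in column.lower().split("_") if t]
    unresolved = [t for t in tokens if t not in ABBREVIATIONS]
    if unresolved:
        return "", unresolved  # leave blank so the column gets flagged
    return ", ".join(ABBREVIATIONS[t] for t in tokens), []


def flag_columns(columns):
    """Draft every column and collect one question per flagged column."""
    drafts, questions = {}, []
    for col in columns:
        description, unresolved = draft_description(col)
        drafts[col] = {"Description": description}
        if not description:
            questions.append(
                f'What does {col} measure? (could not decode: {", ".join(unresolved)})'
            )
    return drafts, questions
```

With this sketch, `fa_l_af` is flagged because `l` and `af` are not in the table, matching the flagged row in the example summary.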
Write a JSON file with one top-level key per column. Use this structure:
{
  "subject_id": {
    "Description": "Participant identifier",
    "Units": "n/a"
  },
  "age": {
    "Description": "Age at time of assessment",
    "Units": "years"
  },
  "diagnosis": {
    "LongName": "Clinical diagnosis group",
    "Description": "Diagnostic category assigned at study intake",
    "Levels": {
      "HC": "Healthy control",
      "SZ": "Schizophrenia spectrum disorder",
      "BD": "Bipolar disorder"
    }
  },
  "passed_qc": {
    "Description": "Whether the scan passed quality control review",
    "Units": "n/a",
    "Levels": {
      "true": "Passed QC",
      "false": "Failed QC or manually excluded"
    }
  },
  "fa_l_af": {
    "LongName": "Fractional anisotropy, left arcuate fasciculus",
    "Description": "Mean FA along the left arcuate fasciculus tract, derived from probabilistic tractography",
    "Units": "unitless (0–1)"
  }
}
Name the file <input_stem>_data_dictionary.json and place it in the same
directory as the input file. For example, merged.tsv → merged_data_dictionary.json.
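The naming rule and the write step might look like this in Python. The helper names are hypothetical, and `blank_fields` is an added convenience for listing entries still set to "" so they can be reported to the user.

```python
import json
from pathlib import Path


def dictionary_path(input_path):
    """merged.tsv -> merged_data_dictionary.json, in the same directory."""
    source = Path(input_path)
    return source.with_name(f"{source.stem}_data_dictionary.json")


def write_dictionary(input_path, dictionary):
    """Write the dictionary next to the input file and return the output path."""
    out = dictionary_path(input_path)
    # ensure_ascii=False keeps units such as "mm³" readable in the file.
    text = json.dumps(dictionary, indent=2, ensure_ascii=False) + "\n"
    out.write_text(text, encoding="utf-8")
    return out


def blank_fields(dictionary):
    """List (column, field) pairs whose value is still the empty string."""
    blanks = []
    for column, entry in dictionary.items():
        for field, value in entry.items():
            if value == "":
                blanks.append((column, field))
            elif isinstance(value, dict):
                # Levels: report each level whose label is still blank.
                blanks += [(column, f"{field}.{k}") for k, v in value.items() if v == ""]
    return blanks
```

Only the new JSON file is written; the input file is never opened for writing.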
After writing the file, tell the user:
- Which fields are still set to "" that the user should fill in manually before submitting or sharing the dataset
- To re-run /gen-data-dict if you modify the merged file and column names or types change

Install with: npx claudepluginhub bcmcpher/my-skills --plugin stat-analysis