From medsci-project
Maps research variables to literature-backed operational definitions using data dictionaries, preventing ad-hoc phenotype definitions that invite reviewer rejection.
Slash command: /medsci-project:define-variables
Every observational study operationalizes abstract constructs (MASLD, CKD, emphysema, obesity, incidentaloma) into concrete rules against the available data dictionary. When that operationalization is invented ad-hoc from the dictionary alone, reviewers reject on construct validity regardless of downstream statistics.
This skill forces a literature-first pass: each variable is mapped to a canonical guideline/consensus definition, cross-checked against prior operationalizations in comparable cohorts, then mapped to available DB variables. Ad-hoc deviations are flagged explicitly and justified, not hidden.
Use it when:
Call after /design-study, before /write-protocol.
variable_operationalization.md in the project root (or path the user specifies).
Missing inputs → ask once, then proceed.
Trigger: project has a project.yaml::db.dictionary_path field pointing to a machine-readable codebook (xlsx/csv/markdown), OR user supplied a dictionary path in inputs. If neither, skip to Tier 1.
For every candidate DB variable — before touching literature — open the dictionary and record, verbatim, the sheet name, row number, and code→meaning mapping. This prevents the single most common observational-study error: assuming a column code (status == 0, grade == 4) means what it intuitively reads like, when the codebook says otherwise.
Concrete procedure per variable:
Dict. sheet & row + Dict. verbatim columns of the operationalization table.
Empirical checks (value distributions, cross-tabs with related columns) are useful for sanity testing after the verbatim codebook meaning is recorded, never as a substitute for it.
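The verbatim lookup described for Tier 0 could be sketched as follows. This is a minimal illustration, not the skill's own implementation: the CSV layout (`sheet`, `row`, `column`, `code`, `meaning`), the inline codebook contents, and the helper name `lookup_verbatim` are all assumptions made for the example.

```python
import csv
import io

# Hypothetical codebook excerpt. A real run would read the file named in
# project.yaml::db.dictionary_path instead of an inline string.
CODEBOOK_CSV = """sheet,row,column,code,meaning
labs,12,status,0,deceased
labs,13,status,1,alive
"""


def lookup_verbatim(codebook_text, column):
    """Return (sheet & row, verbatim code->meaning string) for one DB column."""
    reader = csv.DictReader(io.StringIO(codebook_text))
    rows = [r for r in reader if r["column"] == column]
    if not rows:
        # Refuse to guess: an unlisted column has no documented coding.
        raise KeyError(f"{column!r} not found in codebook; do not guess its coding")
    location = "; ".join(f"{r['sheet']} r{r['row']}" for r in rows)
    verbatim = "; ".join(f"{r['code']}={r['meaning']}" for r in rows)
    return location, verbatim


location, verbatim = lookup_verbatim(CODEBOOK_CSV, "status")
print(location)   # value for the "Dict. sheet & row" column
print(verbatim)   # value for the "Dict. verbatim" column
```

In this made-up codebook, `status == 0` means "deceased" rather than "no event", which is exactly the kind of surprise the verbatim record is meant to surface before any analysis code is written.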
Project-level binding (recommended): commit a DICTIONARY_FIRST_POLICY.md at the project root (or shared-config path) capturing the canonical dictionary path + escalation contact. Cross-project rule template: ~/.claude/rules/dictionary-first.md.
Exit gate: check_dictionary_citations.py (or equivalent) PASS on the operationalization table before running Tier 1.
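The contents of check_dictionary_citations.py are not shown here, so the following is only a hypothetical stand-in for an "equivalent" check. It assumes the operationalization table is a pipe-delimited markdown table whose cells contain no unescaped pipe characters, and it fails when either dictionary column is empty for any row.

```python
REQUIRED = ("Dict. sheet & row", "Dict. verbatim")


def parse_markdown_table(text):
    """Parse a pipe-delimited markdown table into a list of dicts keyed by header."""
    lines = [ln.strip() for ln in text.strip().splitlines() if ln.strip().startswith("|")]

    def split(line):
        # Drop the empty strings produced by the leading and trailing pipes.
        return [cell.strip() for cell in line.split("|")[1:-1]]

    header = split(lines[0])
    # Skip separator lines such as |---|---|.
    body = [ln for ln in lines[1:] if not set(ln) <= set("|-: ")]
    return [dict(zip(header, split(ln))) for ln in body]


def missing_dictionary_citations(table_text):
    """Return (variable, column) pairs where a required dictionary cell is empty."""
    problems = []
    for row in parse_markdown_table(table_text):
        for col in REQUIRED:
            if not row.get(col, "").strip():
                problems.append((row.get("Variable", "?"), col))
    return problems


# Reduced, hypothetical table: only the columns this check needs.
TABLE = """
| Variable | Dict. sheet & row | Dict. verbatim |
|---|---|---|
| obesity | exam r4 | bmi: kg/m2 |
| smoking | | |
"""

problems = missing_dictionary_citations(TABLE)
print("PASS" if not problems else f"FAIL: {problems}")
```

A real gate would also need to handle escaped pipes inside the Implementation column and compare the quoted strings against the dictionary itself; this sketch only checks that the cells are filled in.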
Check references/common_definitions.md (shipped with skill) for the variable. Covers high-frequency constructs:
If the variable hits Tier 1, record: guideline, year, canonical cutoff, BibTeX key. Done — no /search-lit call.
/search-lit (focused queries only)
For variables NOT in Tier 1, OR when subgroup justification is needed (Asian-specific cutoff, pediatric, young-adult, pregnancy, etc.), call /search-lit with one query per variable, not a general sweep. Query pattern:
"{construct} definition {cohort type} {subgroup qualifier}"
e.g., "obstructive sleep apnea prevalence Korean health screening cohort"
Cap: 5 queries per session. Stop early if first 1-2 papers converge on the same definition.
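One way to keep queries focused and capped is to build them from the pattern before calling /search-lit. The sketch below is illustrative only: `build_queries` and its return shape are assumptions, and it does not call /search-lit itself.

```python
MAX_QUERIES = 5  # per-session cap stated above


def build_queries(variables, cohort_type, cap=MAX_QUERIES):
    """Build one query per (construct, subgroup) pair; return (to_run, deferred)."""
    queries = []
    for construct, subgroup in variables:
        # Pattern: "{construct} definition {cohort type} {subgroup qualifier}"
        parts = [construct, "definition", cohort_type, subgroup]
        queries.append(" ".join(p for p in parts if p))
    return queries[:cap], queries[cap:]


to_run, deferred = build_queries(
    [("obstructive sleep apnea", "Korean"), ("emphysema", "")],
    cohort_type="health screening cohort",
)
for q in to_run:
    print(q)
```

Returning the deferred queries separately makes it visible when the cap was hit, so variables without a literature pass can be flagged rather than silently dropped.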
Before finalizing, run /verify-refs on the accumulated BibTeX to confirm every citation exists in PubMed/CrossRef. Ad-hoc choices (no canonical source found) must be flagged Ad-hoc: yes and justified with 1-2 sentences — never hidden.
Write to {project_root}/variable_operationalization.md using templates/variable_operationalization.md. Required structure:
Header: research question, cohort type, date, author
Operationalization table — one row per variable:
| Variable | Role | Dict. sheet & row | Dict. verbatim | Canonical source | Definition | Cutoff | DB vars | Implementation | Ad-hoc? |
Role: exposure / outcome / covariate / eligibility
Dict. sheet & row: e.g. 5-1.복부초음파 r12 (mandatory if a DB dictionary exists)
Dict. verbatim: full code→meaning string copied from the dictionary (mandatory under the same condition)
Canonical source: BibTeX key (e.g., @rinella2023_aasld_masld)
Definition: one line, verbatim from guideline where possible
Cutoff: numeric + units
DB vars: exact dictionary column names used
Implementation: SQL/pandas-style pseudocode (e.g., bmi>=25 & (b_tg>=150 | b_hdl<40))
Ad-hoc?: yes/no. If yes, justification below table
Ad-hoc justifications: one for each yes row
Mapping gaps — variables in the protocol with no DB equivalent; list proxy / omit / request decisions
References — BibTeX block
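As a rough illustration of the table structure above, the sketch below renders one row in the required column order. The helper `render_row` and every cell value in the example row are placeholders invented for the illustration; none of them is a real definition or citation.

```python
COLUMNS = [
    "Variable", "Role", "Dict. sheet & row", "Dict. verbatim", "Canonical source",
    "Definition", "Cutoff", "DB vars", "Implementation", "Ad-hoc?",
]


def render_row(row):
    """Render one operationalization row as a markdown table line."""
    missing = [c for c in COLUMNS if c not in row]
    if missing:
        raise ValueError(f"row is missing columns: {missing}")
    # Escape pipes so expressions like `a | b` do not split the cell.
    cells = [str(row[c]).replace("|", "\\|") for c in COLUMNS]
    return "| " + " | ".join(cells) + " |"


# Placeholder values only; a real row quotes the dictionary and the guideline.
example = {
    "Variable": "example_construct",
    "Role": "exposure",
    "Dict. sheet & row": "sheet_name r12",
    "Dict. verbatim": "0=no; 1=yes",
    "Canonical source": "@example_key",
    "Definition": "placeholder definition",
    "Cutoff": "25 kg/m2",
    "DB vars": "bmi, b_tg, b_hdl",
    "Implementation": "bmi>=25 & (b_tg>=150 | b_hdl<40)",
    "Ad-hoc?": "no",
}

header = "| " + " | ".join(COLUMNS) + " |"
line = render_row(example)
print(header)
print(line)
```

Escaping the pipe matters here because the Implementation column routinely contains boolean OR expressions, which would otherwise break the table layout.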
/analyze-stats
/write-paper
/clean-data
/calc-sample-size
intake-project → design-study → search-lit → define-variables → write-protocol → analyze-stats → write-paper
                                             ^^^^^^^^^^^^^^^^
/orchestrate should insert this skill between /search-lit and /write-protocol for any observational cohort or registry study.
Every variable definition, cutoff, and era anchor must be grounded in a verified source — a clinical guideline, a peer-reviewed paper with DOI, or an established registry data dictionary. Never invent a phenotype threshold from the model's prior; if the source is unknown, mark the row Ad-hoc: yes and require user confirmation before it propagates into /write-protocol or /analyze-stats. When citing papers to justify a cutoff, verify the citation via /search-lit or /verify-refs — do not carry references from memory alone. The output table must carry explicit source, year, and guideline_version columns so downstream skills can re-verify.
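The propagation rule above can be read as a simple gate. The sketch below is one possible reading, not the skill's actual logic: the row keys (`ad_hoc`, `user_confirmed`) and the function name are assumptions, while `source`, `year`, and `guideline_version` follow the column names mentioned in the text.

```python
SOURCE_FIELDS = ("source", "year", "guideline_version")


def rows_blocked_from_downstream(rows):
    """Return variables that should not yet reach /write-protocol or /analyze-stats."""
    blocked = []
    for row in rows:
        if row.get("ad_hoc"):
            # Ad-hoc rows need explicit user confirmation before they propagate.
            if not row.get("user_confirmed"):
                blocked.append(row["variable"])
        elif not all(row.get(field) for field in SOURCE_FIELDS):
            # Unsourced rows must be marked ad-hoc instead of passing silently.
            blocked.append(row["variable"])
    return blocked


# Placeholder rows; the source values are not real citations.
rows = [
    {"variable": "var_a", "source": "@example_key", "year": 2023, "guideline_version": "v1"},
    {"variable": "var_b", "ad_hoc": True, "user_confirmed": False},
    {"variable": "var_c", "source": "", "year": None, "guideline_version": ""},
    {"variable": "var_d", "ad_hoc": True, "user_confirmed": True},
]

blocked = rows_blocked_from_downstream(rows)
print(blocked)
```

Citation existence is a separate question that /verify-refs answers; this gate only checks that the fields are present and that ad-hoc rows were confirmed.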
status == 0, grade == 4) by its surface reading without consulting the codebook. Tier 0 exists specifically to prevent this. Distinguish from Failure #1: Tier 0 says "once you've picked the DB column, quote the codebook verbatim before using its values." Failure #1 says "don't pick DB columns before picking definitions from literature." Both rules co-exist.
Ad-hoc: yes flag.
Role = covariate and Implementation = "IF status == 'never' THEN dose = 0 ELSE measured_value", and adjust on the categorical status variable, reserving the continuous dose for an exposed-only secondary analysis. /clean-data (categorical-implied-zero flag) and /analyze-stats ("Covariate Pitfalls") enforce this downstream.
npx claudepluginhub aperivue/medsci-skills --plugin medsci-project