From claudecode-research-harness-workflow
Read-only audit of raw data files: variable inventory, missingness, IDs, units, merge keys, feasibility. Produces a structured data_audit_report.md.
How this skill is triggered — by the user, by Claude, or both
Slash command
/claudecode-research-harness-workflow:research-harness-audit [--file PATH] [--all]
This skill is limited to the following tools:
The summary Claude sees in its skill listing — used to decide when to auto-load this skill
Perform a read-only audit of raw data files. No data file is modified. The only output is a structured audit report and a log.
This skill runs after /research-harness-setup and before /research-harness-clean.
| Input | Action |
|---|---|
| `/research-harness-audit` | Audit all files listed in `study_spec.md` §3 |
| `/research-harness-audit --file data/raw/X.csv` | Audit a specific file |
| `/research-harness-audit --all` | Audit everything found under `data/raw/` |
Before starting:
- Read `study_spec.md`. If it does not exist, stop and tell the user to run `/research-harness-setup` first.
- Confirm that each file listed in `study_spec.md` §3 exists. If it does not, report the missing path and stop.
- Start the log at `logs/audit_YYYYMMDD.log`.

For each file under `data/raw/` (or the file specified by `--file`):
- Use lightweight inspection (`wc -l`, `head`, column-sniffing) where possible

Log each file to `logs/audit_YYYYMMDD.log`.
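The lightweight inspection described above can be sketched in Python as follows. This is a minimal illustration, not the skill's own implementation: the function name `sniff_file`, the 64 KiB sample size, and the comma fallback are all assumptions, and it only handles delimited text files.

```python
import csv
from pathlib import Path


def sniff_file(path: Path, sample_bytes: int = 64 * 1024) -> dict:
    """Inspect a delimited text file without parsing every row."""
    # Read only a small sample to guess the delimiter and the header row.
    with path.open("r", newline="", encoding="utf-8", errors="replace") as fh:
        sample = fh.read(sample_bytes)
    try:
        delimiter = csv.Sniffer().sniff(sample, delimiters=",;\t|").delimiter
    except csv.Error:
        delimiter = ","  # assumption: fall back to comma when sniffing fails

    header = next(csv.reader(sample.splitlines(), delimiter=delimiter), [])

    # Stream the file once to count lines, similar in spirit to `wc -l`.
    with path.open("rb") as fh:
        n_lines = sum(1 for _ in fh)

    return {
        "file": str(path),
        "size_bytes": path.stat().st_size,
        "n_lines": n_lines,
        "delimiter": delimiter,
        "columns": header,  # header names recorded exactly as found
    }
```

The file is opened read-only in both passes, so the sketch stays consistent with the rule that no data file is modified.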
For each file, record:
| Variable | Inferred type | Non-missing count | Missing count | Missing % | Min | Max | Sample values |
|---|---|---|---|---|---|---|---|
Use the actual variable names from the file headers. Do not rename or interpret variable names — record them as-is. If a variable name is ambiguous, note it in the audit report under §6 Open Issues; do not infer its meaning from the name alone.
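One way to fill the inventory table is sketched below using only the standard library. The set of missing-value tokens, the three type labels, and the function names are assumptions made for illustration; a real audit should take missing-value codes from the data dictionary. The sketch keeps all non-missing values in memory, so it suits modest file sizes only.

```python
import csv
from pathlib import Path

# Assumption: tokens treated as missing; adjust to the data dictionary.
MISSING_TOKENS = {"", "NA", "NaN", "."}


def infer_type(values):
    """Return 'integer', 'float', 'string', or 'unknown' for non-missing strings."""
    def all_parse(cast):
        try:
            for v in values:
                cast(v)
            return True
        except ValueError:
            return False

    if not values:
        return "unknown"
    if all_parse(int):
        return "integer"
    if all_parse(float):
        return "float"
    return "string"


def variable_inventory(path, delimiter=",", n_samples=3):
    """Build one inventory row per column, keeping header names as-is."""
    with Path(path).open(newline="", encoding="utf-8") as fh:
        reader = csv.DictReader(fh, delimiter=delimiter)
        columns = {name: [] for name in (reader.fieldnames or [])}
        n_rows = 0
        for row in reader:
            n_rows += 1
            for name in columns:
                value = (row.get(name) or "").strip()
                if value not in MISSING_TOKENS:
                    columns[name].append(value)

    report = []
    for name, values in columns.items():
        kind = infer_type(values)
        if kind in ("integer", "float"):
            nums = [float(v) for v in values]
            lo, hi = min(nums), max(nums)
        else:
            lo = hi = None  # min/max are only reported for numeric columns
        missing = n_rows - len(values)
        report.append({
            "variable": name,
            "type": kind,
            "non_missing": len(values),
            "missing": missing,
            "missing_pct": round(100 * missing / n_rows, 1) if n_rows else 0.0,
            "min": lo,
            "max": hi,
            "samples": values[:n_samples],
        })
    return report
```

Each dictionary maps onto one row of the table; variable names are never renamed or interpreted.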
For each file:
- Identify candidate ID variables (e.g., `id`, `hhid`, `person_id`, `pid`, any variable ending in `_id` or `_code`)

Do not assume that two variables with similar names are the same ID. Report the candidate match and leave it as `unknown` if not confirmed by the data dictionary.
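A possible sketch of the candidate-ID scan follows. The name pattern mirrors the examples listed above; matching case-insensitively, the function name `candidate_ids`, and the output field names are assumptions. Cross-file identity is deliberately left as `unknown`, since only the data dictionary can confirm it.

```python
import csv
import re
from pathlib import Path

# Names from the examples above; case-insensitive matching is an assumption.
ID_NAME = re.compile(r"^(id|hhid|person_id|pid)$|(_id|_code)$", re.IGNORECASE)


def candidate_ids(path, delimiter=","):
    """List candidate ID columns with uniqueness and missingness counts."""
    with Path(path).open(newline="", encoding="utf-8") as fh:
        reader = csv.DictReader(fh, delimiter=delimiter)
        names = [n for n in (reader.fieldnames or []) if ID_NAME.search(n)]
        seen = {n: set() for n in names}
        n_missing = {n: 0 for n in names}
        n_rows = 0
        for row in reader:
            n_rows += 1
            for n in names:
                value = (row.get(n) or "").strip()
                if value:
                    seen[n].add(value)
                else:
                    n_missing[n] += 1

    return [
        {
            "variable": n,
            "n_rows": n_rows,
            "n_unique": len(seen[n]),
            "n_missing": n_missing[n],
            # True only if every row has a distinct, non-empty value.
            "unique_per_row": n_rows > 0
            and n_missing[n] == 0
            and len(seen[n]) == n_rows,
            # Never inferred from the name; confirm against the data dictionary.
            "same_as_other_file": "unknown",
        }
        for n in names
    ]
```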
Identify candidate time variables (e.g., year, wave, date, month). For each:
`high missingness`

For each file:
List pairs of files that appear to share a common ID variable. For each pair:
- Classify the shared variable as `likely merge key`, `possible merge key`, or `unclear`

Do not perform any merge in this step.
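Evidence for a merge key can be gathered without merging anything, for example by comparing the sets of ID values in two files. The sketch below is one such approach; the helper names and the reported ratios are assumptions, and it intentionally does not assign the `likely` / `possible` / `unclear` label, since the source does not define thresholds for them.

```python
import csv
from pathlib import Path


def id_values(path, column, delimiter=","):
    """Collect the distinct non-empty values of one column."""
    with Path(path).open(newline="", encoding="utf-8") as fh:
        values = {
            (row.get(column) or "").strip()
            for row in csv.DictReader(fh, delimiter=delimiter)
        }
    return values - {""}


def overlap_summary(path_a, col_a, path_b, col_b):
    """Summarise how many ID values two files share; no rows are joined."""
    a = id_values(path_a, col_a)
    b = id_values(path_b, col_b)
    shared = a & b
    return {
        "n_a": len(a),
        "n_b": len(b),
        "n_shared": len(shared),
        "share_of_a": round(len(shared) / len(a), 3) if a else 0.0,
        "share_of_b": round(len(shared) / len(b), 3) if b else 0.0,
    }
```

A high overlap in both directions supports a stronger label, but the classification itself remains a judgment recorded in the report.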
Compare the raw data to study_spec.md:
Record each check as feasible, partially feasible, or infeasible with a one-line reason.
Infeasibility gate: If any required element is infeasible, the overall assessment is infeasible. Tell the user what is missing and that study_spec.md must be revised before cleaning can begin. Do not proceed to cleaning with an infeasible design.
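The gate can be expressed as a small aggregation over the individual checks. In the sketch below, the `Check` structure and its `required` flag are hypothetical, and the rule for arriving at `partially feasible` is an assumption; only the rule that any infeasible required element makes the whole design infeasible comes from the text above.

```python
from typing import NamedTuple

VALID_STATUSES = {"feasible", "partially feasible", "infeasible"}


class Check(NamedTuple):
    element: str    # what was checked, e.g. an outcome variable
    required: bool  # hypothetical flag: is this element required by the spec?
    status: str     # one of VALID_STATUSES
    reason: str     # one-line reason, as recorded in the report


def overall_verdict(checks):
    """Combine per-element checks into one overall assessment."""
    for c in checks:
        if c.status not in VALID_STATUSES:
            raise ValueError(f"unknown status for {c.element!r}: {c.status!r}")
    # From the gate: any infeasible required element blocks cleaning.
    if any(c.required and c.status == "infeasible" for c in checks):
        return "infeasible"
    # Assumption: anything else short of fully feasible is 'partially feasible'.
    if any(c.status != "feasible" for c in checks):
        return "partially feasible"
    return "feasible"
```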
Copy templates/data_audit_report.md to reports/data_audit_report.md and fill in all sections from Steps 1–9.
Save logs/audit_YYYYMMDD.log.
Before finishing, confirm that:
- `reports/data_audit_report.md` exists and all sections are populated
- `logs/audit_YYYYMMDD.log` exists
- No files under `data/raw/` were modified (verify with `git status data/raw/` or a file-size check)
- The report states a `feasible` / `infeasible` verdict

Tell the user:
Audit complete. Review `reports/data_audit_report.md`.

If the feasibility verdict is `infeasible`: revise `study_spec.md` before continuing.

If `feasible` or `partially feasible`: fill in `templates/data_cleaning_plan.md`, save it as `reports/data_cleaning_plan.md`, review it, then run `/research-harness-clean`.
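The read-only guarantee for `data/raw/` can also be checked mechanically where `git status` is not available. The sketch below records file sizes plus SHA-256 digests before and after the audit; the digest goes beyond the file-size check mentioned in this document and is an addition of this example, as are the function names.

```python
import hashlib
from pathlib import Path


def snapshot(root):
    """Map each file under root to (size in bytes, SHA-256 hex digest)."""
    root = Path(root)
    result = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            digest = hashlib.sha256()
            with path.open("rb") as fh:
                # Hash in 1 MiB chunks so large files are never fully loaded.
                for chunk in iter(lambda: fh.read(1 << 20), b""):
                    digest.update(chunk)
            key = str(path.relative_to(root))
            result[key] = (path.stat().st_size, digest.hexdigest())
    return result


def changed_files(before, after):
    """Return paths that were added, removed, or altered between snapshots."""
    keys = before.keys() | after.keys()
    return sorted(k for k in keys if before.get(k) != after.get(k))
```

Taking one snapshot before the first step and one after the report is written, an empty result from `changed_files` indicates that nothing under the directory was touched.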
npx claudepluginhub maxwell2732/claudecode-research-harness-workflow --plugin claudecode-research-harness-workflow