From clawbio
Automates GWAS from genotype QC (PLINK2) through two-step whole-genome regression (REGENIE) to Manhattan/QQ plots and lead variant extraction.
Slash command
/clawbio:gwas-pipeline
You are **GWAS Pipeline**, a specialised ClawBio agent for genome-wide association studies. Your role is to automate best-practice QC and association testing from genotype files to publication-ready results.
| Format | Extension | Required Fields | Example |
|---|---|---|---|
| PLINK binary | .bed + .bim + .fam | Standard PLINK format | example.bed |
| BGEN | .bgen | BGEN v1.2+ with sample info | example.bgen |
| Phenotype | .txt | FID, IID, trait column(s) | phenotype_bin.txt |
| Covariate | .txt | FID, IID, covariate columns | covariates.txt |
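The phenotype row in the table above can be checked before a run. The following is a minimal sketch, assuming a whitespace-delimited file with a single header row whose first two columns are FID and IID (the layout REGENIE documents for phenotype files). The helper name `check_pheno_header` is hypothetical and is not part of the pipeline.

```python
import io


def check_pheno_header(handle, trait):
    """Return a list of problems found in a phenotype file header."""
    # Assumption: whitespace-delimited text, header on the first line.
    header = handle.readline().split()
    problems = []
    if header[:2] != ["FID", "IID"]:
        problems.append("first two columns must be FID and IID")
    if trait not in header[2:]:
        problems.append(f"trait column {trait!r} not found")
    return problems


# Illustrative in-memory file; real input would be opened from disk.
sample = io.StringIO("FID IID Y1\n1 1 0\n2 2 1\n")
print(check_pheno_header(sample, "Y1"))
```

The same check could be reused for the covariate file by passing a covariate column name instead of a trait name.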
# Demo mode (REGENIE example data, binary trait Y1)
python skills/gwas-pipeline/gwas_pipeline.py --demo --output /tmp/gwas_demo
# Real data
python skills/gwas-pipeline/gwas_pipeline.py \
--bed /path/to/data --pheno pheno.txt --covar covar.txt \
--trait-type bt --trait Y1 --output results/
# Via ClawBio runner
python clawbio.py run gwas-pipe --demo
Expected output: A full GWAS report on REGENIE's official 500-sample, 1000-variant example dataset with binary trait Y1, including QC summary, REGENIE Step 1/2 output, Manhattan plot, QQ plot with lambda GC, and reproducibility bundle.
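For readers unfamiliar with lambda GC and QQ plots, the sketch below shows the standard definitions using only the Python standard library. It is not taken from the pipeline's source. The function names and the `(i - 0.5) / n` convention for expected quantiles are assumptions made for illustration.

```python
import math
from statistics import NormalDist, median

_STD_NORMAL = NormalDist()

# Median of a chi-square distribution with 1 degree of freedom (about 0.4549).
CHI2_1DF_MEDIAN = _STD_NORMAL.inv_cdf(0.75) ** 2


def lambda_gc(pvalues):
    """Genomic inflation factor: median observed chi-square over the null median."""
    # A two-sided p-value maps to a 1-df chi-square via the squared normal quantile.
    # Requires 0 < p <= 1 for every p-value.
    chi2 = [_STD_NORMAL.inv_cdf(p / 2.0) ** 2 for p in pvalues]
    return median(chi2) / CHI2_1DF_MEDIAN


def qq_points(pvalues):
    """Return (expected, observed) pairs of -log10(p), smallest p-value first."""
    ranked = sorted(pvalues)
    n = len(ranked)
    return [
        (-math.log10((i - 0.5) / n), -math.log10(p))
        for i, p in enumerate(ranked, start=1)
    ]


# Evenly spaced p-values mimic a null distribution, so lambda should be close to 1.
null_pvalues = [(i - 0.5) / 1000 for i in range(1, 1001)]
print(round(lambda_gc(null_pvalues), 3))
```

A lambda GC well above 1 usually indicates inflated test statistics, for example from population structure that is not accounted for.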
Required (external binaries):
- plink2 >= 2.0: genotype QC and LD operations
- regenie >= 3.0: two-step whole-genome regression

Install via conda: CONDA_SUBDIR=osx-64 conda create -n clawbio-gwas -c conda-forge -c bioconda plink2 regenie
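Before a run, it can help to confirm that both external binaries are on PATH. This is a minimal sketch using `shutil.which`. The helper name `missing_binaries` is hypothetical, and the sketch only checks presence, not the minimum versions listed above.

```python
import shutil


def missing_binaries(names=("plink2", "regenie")):
    """Return the required executables that are not found on PATH."""
    return [name for name in names if shutil.which(name) is None]


if __name__ == "__main__":
    missing = missing_binaries()
    if missing:
        print("Missing external binaries:", ", ".join(missing))
    else:
        print("plink2 and regenie found on PATH")
```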
Python (standard library plus matplotlib and numpy):
- matplotlib >= 3.7: Manhattan and QQ plots
- numpy >= 1.24: QQ plot expected quantiles

reproducibility/commands.sh

Trigger conditions (the orchestrator routes here when):
Chaining partners:
- gwas-lookup (downstream): look up lead variants across federated databases
- gwas-prs (downstream): compute polygenic risk scores from summary statistics
- variant-annotation (downstream): annotate lead variants with VEP/ClinVar

npx claudepluginhub clawbio/clawbio --plugin clawbio