From clawbio
Runs ancestry decomposition PCA by merging a study cohort VCF with the Simons Genome Diversity Project reference panel, producing multi-panel figures and a markdown report with population assignments.
Slash command
/clawbio:claw-ancestry-pca
Place your study cohort in global genetic context by computing a joint PCA against the Simons Genome Diversity Project (SGDP) — 345 samples from 164 populations spanning every inhabited continent.
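A joint PCA of this kind can only use sites genotyped in both the cohort and the reference panel. As a rough illustration of that intersection step, the sketch below keeps biallelic SNPs from two VCF line streams and returns the shared sites. It is not taken from `ancestry_pca.py`: the function names (`open_vcf`, `biallelic_snps`, `shared_variants`) are hypothetical, and the matching rule (same chromosome, position, REF and ALT, with any `chr` prefix ignored) is an assumption.

```python
import gzip
from typing import Iterable, Iterator, Set, TextIO, Tuple

# Hypothetical key for matching sites across files: (chrom, pos, ref, alt).
VariantKey = Tuple[str, int, str, str]


def open_vcf(path: str) -> TextIO:
    """Open a plain or gzip-compressed VCF as text."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "rt")


def biallelic_snps(lines: Iterable[str]) -> Iterator[VariantKey]:
    """Yield one key per biallelic SNP record, skipping headers,
    indels, multi-allelic sites, and missing or symbolic ALT alleles."""
    for line in lines:
        if line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t", 5)
        if len(fields) < 5:
            continue
        chrom, pos, _id, ref, alt = fields[:5]
        ref, alt = ref.upper(), alt.upper()
        if len(ref) == 1 and len(alt) == 1 and ref in "ACGT" and alt in "ACGT":
            # Assumption: "chr1" and "1" name the same contig.
            if chrom.startswith("chr"):
                chrom = chrom[3:]
            yield (chrom, int(pos), ref, alt)


def shared_variants(cohort: Iterable[str], reference: Iterable[str]) -> Set[VariantKey]:
    """Return the biallelic SNP sites present in both inputs."""
    return set(biallelic_snps(cohort)) & set(biallelic_snps(reference))
```

With real files, the two arguments would be handles from `open_vcf("your_cohort.vcf.gz")` and the reference panel VCF. Matching on REF and ALT as well as position is a conservative choice: it drops sites where the two files disagree on alleles instead of trying to reconcile them.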
If you ask ChatGPT to "run a PCA against a global reference panel," it will:
This skill encodes the correct methodological decisions:
The skill bundles the SGDP v4 dataset (Mallick et al., 2016, Nature):
python ancestry_pca.py \
  --vcf your_cohort.vcf.gz \
  --pop-map your_populations.tsv \
  --output ancestry_report
python ancestry_pca.py --demo --output demo_report
The demo uses pre-computed PCA results from the Peruvian Genome Project (736 samples, 28 populations) and generates the full 4-panel figure instantly.
Ancestry Decomposition PCA
==========================
Cohort: 736 samples, 28 populations
Reference: SGDP (345 samples, 164 populations)
Common variants: 42,831 biallelic SNPs
Variance explained:
  PC1: 51.44%  PC2: 21.70%  PC3: 6.70%
Panel D — Global Context:
  Cohort samples cluster between European and East Asian
  reference populations, with Amazonian groups showing
  distinct positioning from Highland and Coastal groups.
Figures saved to: ancestry_report/
  Figure3_PCA_composite.png (300 dpi)
  Figure3_PCA_composite.pdf (vector)
Reproducibility:
  commands.sh | environment.yml | checksums.sha256
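The reproducibility bundle lists a `checksums.sha256` file. A minimal sketch of how such a manifest might be checked is shown below. It assumes the file uses the common `sha256sum` layout (one `<hex digest>  <filename>` pair per line, paths relative to the manifest's directory); the skill's actual manifest format is not documented here, and the helper names `sha256_of` and `verify_checksums` are hypothetical.

```python
import hashlib
from pathlib import Path
from typing import Dict


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large figures are not loaded at once."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_checksums(manifest: Path) -> Dict[str, bool]:
    """Map each listed file name to True if its digest matches.

    Assumes sha256sum-style lines: "<hex digest>  <file name>".
    Missing files are reported as False.
    """
    results: Dict[str, bool] = {}
    for line in manifest.read_text().splitlines():
        if not line.strip():
            continue
        expected, name = line.split(maxsplit=1)
        # sha256sum marks binary mode with a leading "*" on the name.
        name = name.strip().lstrip("*")
        target = manifest.parent / name
        results[name] = target.is_file() and sha256_of(target) == expected.lower()
    return results
```

Calling `verify_checksums(Path("ancestry_report/checksums.sha256"))` would then return a dictionary such as `{"commands.sh": True, ...}`, assuming the manifest sits in the report directory alongside the files it lists.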
If you use this skill in a publication, please cite:
npx claudepluginhub clawbio/clawbio --plugin clawbio