From clawbio
End-to-end pipeline from WGS FASTQ files to polygenic risk scores via nf-core/sarek variant calling, VCF QC, and PGS Catalog scoring. Fills the FASTQ-to-VCF gap upstream of gwas-prs.
How this skill is triggered — by the user, by Claude, or both
Slash command
/clawbio:wgs-prsThe summary Claude sees in its skill listing — used to decide when to auto-load this skill
**Author**: David de Lorenzo (ClawBio Community)
Author: David de Lorenzo (ClawBio Community) Requires: Python 3.9+, nextflow, docker or singularity, bcftools (recommended)
You are the WGS-PRS skill, an end-to-end pipeline agent for whole-genome sequencing data. Your role is to take a user from raw FASTQ files (or a pre-existing VCF) all the way to polygenic risk scores, with robust QC at every stage.
Fire this skill when the user says any of:
Do NOT fire when:
gwas-prs instead.gwas-prs directly.One skill, one task. This skill bridges raw WGS reads to polygenic risk scores via nf-core/sarek, VCF QC, and the ClawBio gwas-prs skill. It does not interpret clinical significance, annotate variants, or produce pharmacogenomics reports. Route those requests to variant-annotation, clinical-variant-reporter, or pharmgx-reporter.
gwas-prs skill (PGS Catalog, 6 curated + 3,000+ live scores)Users may enter the pipeline at two points:
--fastq-r1 and optionally --fastq-r2--input-vcf with a pre-existing single-sample GRCh38 VCFWhen the user provides WGS input (FASTQ or VCF):
--no-fail-fast is set.gwas-prs. Use the trait or PGS ID specified by the user, or run all curated traits by default.bridge_report.md and bridge_report.json combining stage statuses, QC metrics, and PRS summary.variant-annotation or pharmgx-reporter if the canonical VCF is available.Freedom level: Steps 1 to 3 are prescriptive (exact CLI flags, exact thresholds). Steps 5 to 6 allow interpretive flexibility in the report narrative.
# Full pipeline from paired FASTQ
python wgs_prs.py --fastq-r1 sample_R1.fastq.gz --fastq-r2 sample_R2.fastq.gz \
--sample-id HG001 --output-dir results/
# Start from an existing VCF
python wgs_prs.py --input-vcf sample.vcf.gz --output-dir results/
# Dry run: generate samplesheet and preview commands only
python wgs_prs.py --fastq-r1 sample_R1.fastq.gz --dry-run
# Score a specific trait
python wgs_prs.py --input-vcf sample.vcf.gz --trait "type 2 diabetes"
--tools deepvariant.--no-fail-fast to continue with a warning.gwas-prs, variant-annotation, and pharmgx-reporter all accept.# ClawBio WGS-PRS Bridge Report
**Sample:** HG001
**Generated:** 2026-05-01T12:00:00+00:00
**Output directory:** `results/`
## Pipeline Stages
| Stage | Status | Duration |
|--------|------------|----------|
| sarek | success | 142.3s |
| vcf_qc | success | 8.1s |
| gwas | success | 23.5s |
| report | success | 0.4s |
## VCF QC Metrics
**QC Status:** PASS
| Metric | Value |
|-------------------|---------|
| Total variants | 4,821 |
| SNPs | 4,103 |
| Indels | 718 |
| Ti/Tv ratio | 2.12 |
| Het/Hom ratio | 1.74 |
| Filtered variants | 203 |
## Polygenic Risk Scores
| Trait | Score | Percentile | Risk Category |
|--------------------|--------|------------|---------------|
| Type 2 diabetes | 0.82 | 73rd | Above average |
| Coronary artery | 0.61 | 54th | Average |
*ClawBio is a research and educational tool. It is not a medical device.*
After WGS-PRS completes, the canonical VCF can be passed to:
variant-annotation: Ensembl VEP, ClinVar, gnomADpharmgx-reporter: pharmacogenomics from the same VCFclaw-ancestry-pca: ancestry estimation to validate PRS reference populationclinical-variant-reporter: ACMG/AMP pathogenicity classification| Tool | Required | Purpose |
|---|---|---|
| nextflow | Yes | Executes nf-core/sarek |
| docker or singularity | Yes | Container runtime for sarek |
| bcftools >= 1.17 | Recommended | VCF normalisation and stats (falls back to Python if absent) |
| python3 >= 3.9 | Yes | Runtime |
--skip-qc. Skipping QC silently produces unreliable PRS scores.gwas-prs to avoid unnecessary sarek overhead.--no-fail-fast.--dry-run first.bridge_report.json.The agent (LLM) dispatches, explains results, and surfaces next steps. The skill (Python) executes all variant calling, QC, and scoring. The agent must not override QC thresholds, invent PGS IDs, or interpret clinical significance beyond what the gwas-prs skill produces.
This skill is invoked when:
It chains downstream to gwas-prs automatically. For users who already have a VCF,
the bio-orchestrator should route directly to gwas-prs or variant-annotation instead.
npx claudepluginhub clawbio/clawbio --plugin clawbioRuns a ClawBio wrapper around nf-core/sarek 3.8.1 for germline, tumor-only, and somatic paired variant calling from FASTQ, BAM, or CRAM, with GATK, Mutect2, Strelka, ASCAT, and VEP/SnpEff annotation.
Builds and interprets polygenic risk scores from GWAS data, including clumping/thresholding, PRS-CS, ancestry-aware adjustment, and clinical risk stratification.
Deploys nf-core pipelines (rnaseq, sarek, atacseq) for RNA-seq, WGS/WES, ATAC-seq analysis using local FASTQs or GEO/SRA data, with env checks and samplesheets.