casetrack

Lifecycle data management for computational biology pipelines on HPC.

Answers two questions about a multi-patient, multi-specimen, multi-assay cohort: "is this analysis complete?" and "is this sample usable?"

New to casetrack? Start with the 15-minute Quickstart — clone → init → register a 3-patient demo cohort → record an analysis → query for pending work. No prior knowledge assumed.

Storage layers, one CLI:

v0.10 (current, alpha): register-cohort loads patients + specimens + assays from one schema-native wide sample sheet in a single transaction (proposal 0012). Builds on the additive sibling-table layers added since v0.6: cohort-level artifacts (joint VCFs / PoNs / matrices, proposal 0009), versioned reference artifacts with downstream staleness (proposal 0010), and artifact-to-artifact lineage with transitive derived_stale (proposal 0011). The three-level core is untouched by all of these.
v0.6: identity layer on top of v0.4. Every project gets a project_id slug at init, persisted in TOML + project_meta SQLite table + ~/.casetrack/registry.json, so commands can address a project by name (casetrack --project hgsoc-2026 query "...") instead of a fragile path. Hierarchy IDs (patient_id, specimen_id, assay_id) are now validated against an ASCII regex at insert time — typos in samplesheets fail loudly at register, not silently downstream. Per-level escape hatches via [levels.<level>] id_pattern for legacy LIMS IDs. See proposal 0005.
v0.4: QC / censoring / consent subsystem. Every read path (status, rerun, export, query, dashboard) filters out QC-failed and consent-revoked entities by default. SLURM summary TSVs auto-flag via qc_pass / qc_fail_reason / qc_warn columns. Paired-design readiness via casetrack cohort --pair-by.
v0.3 (project mode): SQLite-backed project directory with normalized patient → specimen → assay tables, enforced foreign keys, typed columns, and DuckDB-powered SQL queries. Survives DB corruption — everything is regenerable from casetrack.toml + provenance.jsonl.
v0.2 (flat mode — deprecated): one TSV manifest per project, one row per sample. Still works, loud deprecation warning, removed in v1.0.
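The insert-time ID check described for v0.6 can be pictured roughly as follows. This is a minimal sketch, not casetrack's actual code: the default pattern, the `LEVEL_PATTERNS` dict (standing in for `[levels.<level>] id_pattern`), and the `validate_id` helper are all hypothetical names chosen for illustration.

```python
import re

# Hypothetical default: ASCII letters, digits, dot, underscore, hyphen.
# The pattern casetrack actually ships is not stated in this README.
DEFAULT_ID_PATTERN = r"[A-Za-z0-9][A-Za-z0-9._-]*"

# Hypothetical per-level override, standing in for [levels.<level>] id_pattern.
LEVEL_PATTERNS = {"specimen": r"[A-Z]{2}-\d{4}/\d{2}"}


def validate_id(level: str, value: str) -> str:
    """Return value unchanged if it matches the pattern for its level, else raise."""
    pattern = LEVEL_PATTERNS.get(level, DEFAULT_ID_PATTERN)
    # re.ASCII keeps \d and \w from matching non-ASCII characters.
    if re.fullmatch(pattern, value, flags=re.ASCII) is None:
        raise ValueError(f"invalid {level}_id {value!r}: does not match {pattern!r}")
    return value
```

Raising at registration time is what turns a samplesheet typo such as `P 001` into an immediate error rather than an orphaned row discovered later.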

Upgrade paths: v0.2 → v0.3 via casetrack migrate (guide). v0.3 → v0.4 via casetrack migrate-qc (guide). v0.4 → v0.6 is automatic for new projects (init writes project_id); legacy projects continue to work without one until v0.6 final ships casetrack migrate-project-id.

cohort_v3/
├── casetrack.toml       # declared schema — git-tracked source of truth
├── casetrack.db         # SQLite, WAL + busy_timeout=30000 + FK enforcement
├── provenance.jsonl     # append-only audit log (git-trackable)
├── .gitignore           # excludes casetrack.db, casetrack.db-wal/-shm, exports/
└── sandbox/             # preserved source TSVs (migration artifact)
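The connection settings noted next to casetrack.db (WAL, busy_timeout=30000, FK enforcement) and the patient → specimen → assay hierarchy can be illustrated with Python's standard sqlite3 module. This is a sketch under assumptions: the table definitions below are a simplified guess at the shape of the schema, and `open_project_db` is a hypothetical helper, not part of casetrack.

```python
import sqlite3

# Simplified, assumed schema: the real tables are declared in casetrack.toml
# and carry more columns than the three IDs shown here.
SCHEMA = """
CREATE TABLE IF NOT EXISTS patient (
    patient_id TEXT PRIMARY KEY
);
CREATE TABLE IF NOT EXISTS specimen (
    specimen_id TEXT PRIMARY KEY,
    patient_id  TEXT NOT NULL REFERENCES patient(patient_id)
);
CREATE TABLE IF NOT EXISTS assay (
    assay_id    TEXT PRIMARY KEY,
    specimen_id TEXT NOT NULL REFERENCES specimen(specimen_id)
);
"""


def open_project_db(path: str) -> sqlite3.Connection:
    """Open a SQLite file with the three settings listed in the tree above."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA journal_mode=WAL")    # readers do not block the writer
    conn.execute("PRAGMA busy_timeout=30000")  # wait up to 30 s on a locked DB
    conn.execute("PRAGMA foreign_keys=ON")     # off by default in SQLite
    return conn
```

With foreign keys switched on, inserting a specimen whose patient_id has no matching patient row raises `sqlite3.IntegrityError` instead of silently creating an orphan. The busy timeout matters on HPC, where many jobs may try to write to the same database at once.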

How people actually use this

casetrack is a CLI that wraps a SQLite DB. It's installed once (globally or per-env) and used against many projects. Three layers — keep them separate:

| Layer | Where it lives | How many | What it is |
|---|---|---|---|
| 1. `casetrack` package | Wherever pip put it | One per env | The CLI itself; install once with `pip install casetrack` |
| 2. Casetrack projects | Your data filesystem (`/data1/.../cohort_X/`) | Many per user, one per cohort | A directory with `casetrack.toml`, `casetrack.db`, `provenance.jsonl` |
| 3. Your pipeline code | Your own git repo (Snakemake / Nextflow / bash / etc.) | Many per user, one per pipeline | Orchestration + summary scripts; each job ends with `casetrack append --project-dir ...` |

Users do not clone this repo to use casetrack — they install it once, create project directories wherever their data lives, and call it from their own pipeline code. The examples/giab_chr21/ directory is a demo and reference for the three-phase SLURM pattern; it's not a template you need to copy wholesale.
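To make the layer separation concrete, here is one way pipeline code (layer 3) might call the installed CLI (layer 1) against a project directory (layer 2). It is a sketch only: `build_append_command` and `record_results` are hypothetical helpers, and `extra_args` is a placeholder for whatever arguments `casetrack append` takes beyond `--project-dir`, which this README leaves elided.

```python
import subprocess
from pathlib import Path
from typing import List, Sequence


def build_append_command(project_dir: Path, extra_args: Sequence[str] = ()) -> List[str]:
    """Assemble the CLI call; only `append` and `--project-dir` come from the README."""
    return ["casetrack", "append", "--project-dir", str(project_dir), *extra_args]


def record_results(project_dir: Path, extra_args: Sequence[str] = ()) -> None:
    """Run the command as the last step of a job; raises if casetrack exits non-zero."""
    subprocess.run(build_append_command(project_dir, extra_args), check=True)
```

The pipeline repo holds no casetrack state of its own. It only needs the project directory path, so the same script can serve several cohorts.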

Three recommended patterns by user shape:
