Evaluates ENCODE experiment quality using QC metrics (FRiP, NSC, RSC, NRF, IDR, TSS enrichment, fragment size) and audit flags. For data quality checks, filtering, comparisons, and usability decisions.
Slash command: /encode-toolkit:quality-assessment
- User asks about data quality, QC metrics, or whether an experiment is reliable
Help the user evaluate whether ENCODE experiments meet quality standards for their analysis. Quality assessment is not a single-metric exercise — it requires integrating multiple orthogonal measures in the context of the specific assay, biological system, and analytical goals.
| # | Reference | Key Contribution |
|---|---|---|
| 1 | Landt et al. 2012, Genome Res, DOI:10.1101/gr.136184.111 (~3,500 cit) | ENCODE/modENCODE ChIP-seq guidelines; defined NSC, RSC, NRF, FRiP thresholds |
| 2 | ENCODE Project Consortium 2020, Nature, DOI:10.1038/s41586-020-2493-4 (~1,656 cit) | ENCODE Phase 3; expanded quality standards to new assays, defined cCRE registry |
| 3 | Buenrostro et al. 2013, Nat Methods, DOI:10.1038/nmeth.2688 (~7,000 cit) | Introduced ATAC-seq; established fragment size and TSS enrichment as key QC |
| 4 | Ou et al. 2018, BMC Genomics, DOI:10.1186/s12864-018-4559-3 | ATACseqQC R package; systematic quality metrics for ATAC-seq |
| 5 | Conesa et al. 2016, Genome Biol, DOI:10.1186/s13059-016-0881-8 (~2,363 cit) | RNA-seq best practices survey; defined mapping rate, rRNA, gene body coverage |
| 6 | Foox et al. 2021, Genome Biol, DOI:10.1186/s13059-021-02529-2 | SEQC2 EpiQC consortium; multi-platform WGBS benchmarking |
| 7 | Yardimci et al. 2019, Genome Biol, DOI:10.1186/s13059-019-1658-7 | Hi-C quality measures; cis/trans ratio, distance-dependent decay, resolution |
| 8 | Skene & Henikoff 2017, eLife, DOI:10.7554/eLife.21856 (~1,800 cit) | CUT&RUN method; established spike-in normalization and low-background QC |
| 9 | Kaya-Okur et al. 2019, Nat Commun, DOI:10.1038/s41467-019-09982-5 (~1,200 cit) | CUT&Tag method; tagmentation-based profiling with distinct QC profile |
| 10 | Li et al. 2011, Ann Appl Stat, DOI:10.1214/11-AOAS466 (~1,500 cit) | Irreproducible Discovery Rate (IDR); principled replicate concordance |
| 11 | Hitz et al. 2023, Nucleic Acids Res, DOI:10.1093/nar/gkad243 | ENCODE uniform processing pipelines; standardized QC across all assays |
| 12 | Nordin et al. 2023, Genome Biol, DOI:10.1186/s13059-023-03027-3 | CUT&RUN suspect list; identified artifact-prone regions specific to CUT&RUN/CUT&Tag |
| 13 | Amemiya et al. 2019, Sci Rep, DOI:10.1038/s41598-019-45839-z (~1,372 cit) | ENCODE Blacklist v2; artifact regions to exclude from all analyses |
Use encode_get_experiment with the accession to get full metadata including:
encode_get_experiment(accession="ENCSR...")
For batch assessment across multiple experiments:
encode_search_experiments(assay_title="...", organ="...", limit=50)
# Then iterate through results checking audit flags
ENCODE audits are generated by automated validators during the ENCODE uniform processing pipeline (Hitz et al. 2023). They flag experiments by severity:
| Level | Meaning | Action |
|---|---|---|
| ERROR | Critical issues — data may be unreliable | Avoid using unless no alternative exists. Document thoroughly if used. |
| NOT_COMPLIANT | Does not meet current ENCODE standards | Usable with caveats. Check which specific standard is violated. |
| WARNING | Minor issues detected | Generally safe. Document the specific warning. |
| INTERNAL_ACTION | DCC processing notes | Usually not a concern for external users. |
Common audit categories and what they mean:
| Audit Category | What It Checks |
|---|---|
| replicate concordance | IDR or correlation between biological replicates |
| library complexity | NRF, PBC1, PBC2: whether library is saturated |
| read depth | Whether minimum depth thresholds are met |
| control quality | Whether input/IgG control is adequate |
| mapping quality | Alignment rate and uniquely mapped fraction |
| peak calling | Whether peaks were called successfully, FRiP |
| antibody validation | Whether antibody meets ENCODE standards |
Present every audit flag to the user and explain each one. A single ERROR audit does not automatically disqualify an experiment — context matters.
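As a rough illustration of presenting flags most-severe-first, the sketch below assumes the audit summary arrives as a dictionary of counts keyed by level, as in the example outputs later in this document. The name triage_audits and the shortened action strings are illustrative and not part of the toolkit.

```python
# Severity order and suggested actions, condensed from the audit-level table above.
AUDIT_ACTIONS = {
    "ERROR": "Avoid unless no alternative exists; document thoroughly if used",
    "NOT_COMPLIANT": "Usable with caveats; check which standard is violated",
    "WARNING": "Generally safe; document the specific warning",
    "INTERNAL_ACTION": "Usually not a concern for external users",
}


def triage_audits(audit_counts):
    """Return (level, count, action) tuples for non-zero levels, most severe first.

    `audit_counts` is assumed to look like {"ERROR": 0, "WARNING": 1}.
    """
    return [
        (level, audit_counts[level], action)
        for level, action in AUDIT_ACTIONS.items()
        if audit_counts.get(level, 0) > 0
    ]


report = triage_audits({"ERROR": 0, "NOT_COMPLIANT": 2, "WARNING": 1})
for level, count, action in report:
    print(f"{level} x{count}: {action}")
```

The function only orders and annotates the flags; deciding whether a given ERROR is disqualifying still depends on context.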
The ENCODE ChIP-seq guidelines (Landt et al. 2012) established the foundational metrics still used today. These were developed from analysis of hundreds of ChIP-seq experiments and reflect empirically-derived thresholds.
| Metric | Threshold | Concern | What It Measures | Why It Matters |
|---|---|---|---|---|
| FRiP | ≥1% (TF), ≥5% (histone) | Below threshold | Fraction of reads in peaks | Signal enrichment. Very low FRiP means most reads are background. TF ChIP typically has lower FRiP than broad histone marks. |
| NSC | >1.05 | ≤1.05 | Normalized strand cross-correlation | Signal-to-noise ratio. Computed from strand shift analysis. Values near 1.0 indicate no enrichment. |
| RSC | >0.8 | ≤0.8 | Relative strand cross-correlation | Signal relative to phantom peak. More robust than NSC for shallow libraries. |
| NRF | ≥0.8 | <0.8 | Non-redundant fraction (unique/total) | Library complexity. Low NRF = excessive PCR duplication = wasted sequencing. |
| PBC1 | ≥0.8 | <0.5 | PCR bottleneck coefficient 1 | N1/Nd: fraction of locations with exactly 1 read. More sensitive than NRF at high depth. |
| PBC2 | ≥3 | <1 | PCR bottleneck coefficient 2 | N1/N2: ratio of 1-read to 2-read locations. <1 indicates severe bottleneck. |
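The three complexity metrics in the table can be sketched directly from their definitions. This is a minimal illustration, assuming each mapped read has been reduced to a hashable position key such as (chrom, start, strand); the function name library_complexity and the toy reads are hypothetical, and the ENCODE pipeline computes these values from filtered alignments rather than from a Python list.

```python
from collections import Counter


def library_complexity(read_positions):
    """Compute NRF, PBC1 and PBC2 from one position key per mapped read.

    Assumes at least one read is supplied.
    """
    per_position = Counter(read_positions)   # number of reads at each distinct position
    total_reads = sum(per_position.values())
    distinct = len(per_position)             # Nd: distinct positions
    n1 = sum(1 for c in per_position.values() if c == 1)   # positions with exactly 1 read
    n2 = sum(1 for c in per_position.values() if c == 2)   # positions with exactly 2 reads
    return {
        "NRF": distinct / total_reads,
        "PBC1": n1 / distinct,
        "PBC2": n1 / n2 if n2 else float("inf"),
    }


reads = [
    ("chr1", 100, "+"), ("chr1", 100, "+"),   # one duplicated position
    ("chr1", 250, "-"), ("chr2", 40, "+"), ("chr2", 90, "-"),
]
metrics = library_complexity(reads)
print(metrics)
```

For the toy input there are 5 reads at 4 distinct positions, 3 of which carry a single read and 1 of which carries two.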
| Target Type | Minimum per Replicate | Recommended | Notes |
|---|---|---|---|
| Transcription factor | 10M uniquely mapped | 20M | Narrow peaks, need depth for detection |
| Broad histone mark (H3K27me3, H3K9me3, H3K36me3) | 20M uniquely mapped | 45M | Broad domains require more reads |
| Narrow histone mark (H3K4me3, H3K27ac) | 20M uniquely mapped | 20M | Sharp peaks, similar to TF |
| Input/IgG control | 10M uniquely mapped | Match IP depth | Should match or exceed IP library depth |
The Irreproducible Discovery Rate provides principled assessment of replicate concordance:
| IDR Comparison | Expected | Concern | Interpretation |
|---|---|---|---|
| Nt (true replicates) | ≥50% of Np | <50% Np | Low concordance between biological replicates |
| Np (pooled pseudoreplicates) | Reference set | — | Represents total discoverable peaks |
| Self-consistency (Ns) | ≥50% of Np | <50% Np | Individual replicate quality |
| Rescue ratio (Np/max(Nt,Ns)) | <2 | >2 | High ratio = one replicate much weaker |
Key insight: IDR thresholded peaks represent peaks passing replicate concordance analysis. Pseudoreplicated peaks = single-replicate fallback (lower confidence). Optimal IDR peaks from pooled data = most complete peak set.
ENCODE requires characterization for every antibody:
- antibody_lot_reviews field in experiment metadata

ATAC-seq has a distinct quality profile driven by the transposase insertion mechanism.
| Metric | Good | Concern | What It Measures |
|---|---|---|---|
| TSS enrichment | ≥5 GRCh38 / ≥6 hg19 / ≥10 mm10 (ENCODE data standards) | <4 | Signal enrichment at transcription start sites. The single most informative ATAC-seq QC metric. |
| FRiP | ≥20% | <10% | Higher expected FRiP than ChIP-seq because accessible chromatin = true signal |
| Fragment size distribution | Clear nucleosomal ladder | Monotonic decay | Should show peaks at <150bp (NFR), ~200bp (mono-nuc), ~400bp (di-nuc), ~600bp (tri-nuc) |
| NFR ratio | >2× mono-nucleosomal | <1× | Ratio of sub-nucleosomal to mono-nucleosomal fragments |
| Mitochondrial reads | <20% (after filtering) | >50% | Mitochondrial DNA is highly accessible; excessive = poor nuclear enrichment |
| Duplicate rate | <30% | >50% | PCR duplication. Omni-ATAC protocol reduces this. |
| NRF | ≥0.7 | <0.5 | Library complexity, same concept as ChIP-seq |
| Sample Type | Minimum | Recommended |
|---|---|---|
| Bulk ATAC-seq | 25M uniquely mapped (post-dedup, post-mito filter) | 50M |
| Single-cell ATAC-seq | 25K unique fragments per cell | 50K per cell |
The fragment size distribution is the signature QC plot for ATAC-seq:
A clean ATAC-seq library shows a clear nucleosomal ladder. Monotonic decay (no peaks) suggests dead cells, over-transposition, or excessive DNA damage.
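The NFR ratio from the metrics table can be approximated from a list of fragment lengths. The sketch below is illustrative only: the bin edges (below 150 bp for nucleosome-free, 150 to 299 bp for mono-nucleosomal) are assumptions chosen to bracket the peaks described above, and nfr_ratio is a hypothetical helper name.

```python
def nfr_ratio(fragment_lengths, nfr_max=150, mono_max=300):
    """Ratio of sub-nucleosomal to mono-nucleosomal fragment counts.

    Bin edges are assumptions: fragments shorter than `nfr_max` count as
    nucleosome-free, and those from `nfr_max` up to (not including)
    `mono_max` count as mono-nucleosomal.
    """
    nfr = sum(1 for n in fragment_lengths if n < nfr_max)
    mono = sum(1 for n in fragment_lengths if nfr_max <= n < mono_max)
    if mono == 0:
        return float("inf") if nfr else 0.0
    return nfr / mono


lengths = [60, 80, 95, 110, 120, 140, 190, 200, 210, 400]
ratio = nfr_ratio(lengths)
print(f"NFR ratio: {ratio:.1f}")
```

A ratio alone does not confirm a ladder; the histogram of fragment lengths should still be inspected for distinct mono- and di-nucleosomal peaks.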
| Metric | Good | Concern | What It Measures |
|---|---|---|---|
| Mapping rate | 70-90% uniquely mapped | <70% | Alignment success. Low rate = contamination, adapter issues, or wrong reference |
| rRNA contamination | <10% | >20% | Ribosomal RNA depletion efficiency. High = failed ribo-depletion |
| Gene body coverage | Uniform 5'→3' | Strong 3' bias | Even coverage across gene bodies. 3' bias = degraded RNA or poly-A capture bias |
| Duplication rate | <50% | >70% | PCR amplification artifacts |
| Replicate correlation | Spearman ≥0.9 (same condition) | <0.8 | Concordance between replicates |
| Exonic reads | >60% of mapped | <40% | Reads mapping to annotated exons vs intergenic |
| Intergenic reads | <10% | >20% | Reads mapping between genes — may indicate genomic DNA contamination |
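Replicate correlation in the table is a Spearman coefficient, which is the Pearson correlation of the rank-transformed values. The following dependency-free sketch shows that calculation on two hypothetical vectors of per-gene expression values; in practice a library routine such as scipy.stats.spearmanr would normally be used instead.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the value at sorted position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1          # sorted positions i..j share this rank
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical expression values for the same five genes in two replicates.
rep1 = [0.0, 5.2, 13.1, 120.4, 48.0]
rep2 = [0.3, 4.1, 15.8, 98.7, 51.2]
rho = spearman(rep1, rep2)
print(f"Spearman rho = {rho:.3f}")
```

The sketch assumes both vectors have the same length and are not constant; a constant vector would give zero variance and a division by zero.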
| Application | Minimum | Recommended | Notes |
|---|---|---|---|
| Gene-level quantification | 10M mapped | 30M mapped | Standard bulk RNA-seq |
| Transcript-level quantification | 30M mapped | 60M mapped | Isoform detection requires more depth |
| Differential expression | 10M per sample, ≥3 bio reps | 20M per sample | Statistical power depends more on replicates than depth |
| Rare transcript detection | 50M+ mapped | 100M mapped | Long-tail of expression distribution |
| Total RNA-seq | 50M+ mapped | 100M mapped | Includes non-coding RNA, intergenic transcripts |
ENCODE RNA-seq data may be stranded or unstranded:
- run_type and library_strand_specificity in metadata.

| Metric | Good | Concern | What It Measures |
|---|---|---|---|
| Bisulfite conversion rate | ≥98% | <98% | Efficiency of C→U conversion of unmethylated cytosines. Measured from spike-in controls (lambda phage DNA). |
| CpG coverage | >80% of CpGs at ≥1× | <50% | Fraction of CpG sites covered by at least one read |
| Mean CpG coverage | ≥10× for DMR analysis | <5× | Average sequencing depth at CpG sites. 10× needed for reliable methylation calls. |
| Mapping rate | >60% unique | <40% | Lower than standard WGS due to reduced complexity after bisulfite conversion |
| Duplication rate | <30% | >50% | PCR duplicates |
| CpG methylation distribution | Bimodal (near 0% and near 100%) | Unimodal | Healthy cells show bimodal: most CpGs are either fully methylated or unmethylated |
| Lambda/pUC19 conversion | ≥98% conversion rate | <98% | Spike-in controls for bisulfite conversion efficiency |
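The conversion rate from an unmethylated spike-in reduces to simple arithmetic: every cytosine in the spike-in should read as converted, so any cytosine still called as C is a conversion failure. The sketch below assumes the two counts have already been extracted from reads aligned to the spike-in; the function name and the example counts are hypothetical.

```python
def conversion_rate(converted_c, unconverted_c):
    """Bisulfite conversion rate from cytosine calls on an unmethylated spike-in.

    `converted_c`: spike-in cytosines read as T (successfully converted).
    `unconverted_c`: spike-in cytosines still read as C (conversion failures).
    """
    total = converted_c + unconverted_c
    if total == 0:
        raise ValueError("no cytosine calls on the spike-in")
    return converted_c / total


rate = conversion_rate(converted_c=99_200, unconverted_c=800)
passes = rate >= 0.98      # threshold from the table above
print(f"conversion rate = {rate:.3%}, passes = {passes}")
```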
The SEQC2 EpiQC benchmark found significant platform effects:
| Application | Minimum CpG Coverage | Recommended |
|---|---|---|
| Methylation landscape | 1× | 5× |
| Differentially methylated regions | 5× per sample | 10× per sample |
| Allele-specific methylation | 15× | 30× |
| Single CpG resolution | 10× | 30× |
| Metric | Good | Concern | What It Measures |
|---|---|---|---|
| Cis/trans ratio | >60% cis | <40% cis | Fraction of contacts within same chromosome. Low cis = random ligation = poor quality |
| Long-range cis (>20kb) | >40% of cis | <15% | True 3D interactions vs random proximity. Short-range contacts are noise-enriched |
| Unique valid pairs | >50% of total | <25% | Pairs surviving all filters (mapping, dedup, chimera removal) |
| Duplicate rate | <40% | >60% | PCR duplicates in Hi-C are especially problematic because they inflate contact frequencies |
| Contact distance decay | Smooth P(s)∝s^-1 curve | Irregular/plateau | Expected power-law decay with genomic distance |
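The first two metrics in the table can be sketched from a list of valid pairs. This is an illustration under assumptions: pairs are given as (chrom1, pos1, chrom2, pos2) tuples after filtering and deduplication, and contact_stats is a hypothetical helper rather than anything provided by ENCODE tooling.

```python
def contact_stats(pairs, long_range_bp=20_000):
    """Summarise valid pairs given as (chrom1, pos1, chrom2, pos2) tuples."""
    if not pairs:
        raise ValueError("no valid pairs supplied")
    # Cis contacts have both ends on the same chromosome.
    cis = [(p1, p2) for c1, p1, c2, p2 in pairs if c1 == c2]
    long_cis = sum(1 for p1, p2 in cis if abs(p2 - p1) > long_range_bp)
    return {
        "cis_fraction": len(cis) / len(pairs),
        "long_range_fraction_of_cis": long_cis / len(cis) if cis else 0.0,
    }


pairs = [
    ("chr1", 10_000, "chr1", 12_000),    # short-range cis
    ("chr1", 10_000, "chr1", 500_000),   # long-range cis
    ("chr2", 5_000, "chr2", 90_000),     # long-range cis
    ("chr1", 70_000, "chr3", 8_000),     # trans
]
stats = contact_stats(pairs)
print(stats)
```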
| Resolution Target | Minimum Valid Pairs | Recommended |
|---|---|---|
| Compartment-level (100kb) | 50M | 100M |
| TAD-level (40kb) | 200M | 500M |
| Loop-level (5-10kb) | 500M | 1B+ |
| Sub-TAD (1kb) | 2B+ | 5B+ |
Note: Hi-C resolution is not just about read depth — it also depends on restriction enzyme site density, ligation efficiency, and fragment size distribution. In situ Hi-C (Rao et al. 2014) generally produces cleaner data than dilution Hi-C.
These newer profiling methods have distinct quality profiles from ChIP-seq.
| Feature | ChIP-seq | CUT&RUN / CUT&Tag |
|---|---|---|
| Background | High (requires input control) | Low (targeted cleavage) |
| Required depth | 10-45M | 3-8M sufficient |
| FRiP | >1-5% | >20% typical |
| Input control | Required | IgG control recommended but lower priority |
| Fragment size | Size-selected ~200-600bp | Variable; CUT&RUN releases <120bp fragments |
| Spike-in | Not standard | Recommended (E. coli carry-over or added spike-in) |
| Metric | Good | Concern |
|---|---|---|
| FRiP | >20% | <5% |
| Fragment size | Peak at <120bp (released fragments) | Only large fragments |
| Read depth | 3-8M unique mapped | <1M |
| Spike-in ratio | Consistent across conditions | >5× variation |
| Duplicate rate | <30% | >60% |
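One possible way to check the spike-in consistency row is to compare the spike-in read fraction across samples. The sketch below assumes per-sample counts of reads mapped to the spike-in genome and to the target genome are available; the sample names, the counts, and the helper spike_in_variation are all hypothetical.

```python
def spike_in_variation(samples):
    """Fold difference between the highest and lowest spike-in fractions.

    `samples` maps a sample name to (spike_in_reads, target_genome_reads).
    Assumes every sample has at least one spike-in read.
    """
    fractions = {
        name: spike / (spike + target)
        for name, (spike, target) in samples.items()
    }
    fold = max(fractions.values()) / min(fractions.values())
    return fold, fractions


samples = {
    "control": (20_000, 4_980_000),
    "treated": (50_000, 4_950_000),
}
fold, fractions = spike_in_variation(samples)
concern = fold > 5          # ">5x variation" row in the table above
print(f"fold variation = {fold:.2f}, concern = {concern}")
```

A large fold difference is not always a technical failure, since a global change in target abundance also shifts the spike-in fraction, so the value is best read alongside the experimental design.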
CUT&RUN and CUT&Tag generate artifacts at specific genomic regions (distinct from the ENCODE Blacklist). These are regions with apparent enrichment that is not target-specific:
ENCODE includes scRNA-seq and scATAC-seq experiments (primarily 10X Chromium platform). Single-cell data has distinct quality metrics from bulk assays, focused on per-cell quality rather than per-experiment signal-to-noise.
| Metric | Acceptable Range | Red Flag | Notes |
|---|---|---|---|
| Genes per cell (median) | 1,500–4,000 (10X) / 4,000–8,000 (Smart-seq2) | <500 | Tissue-dependent; immune cells typically lower than epithelial |
| UMIs per cell (median) | 3,000–15,000 (10X) | <1,000 | N/A for Smart-seq2 (no UMIs) |
| Mitochondrial % (median) | <10–15% | >25% | High mito% indicates cell stress or lysis; tissue-dependent thresholds |
| Doublet rate (estimated) | 2–8% (10X, cell-count dependent) / <2% (plate-based) | >10% | Increases with cell loading density; use Scrublet or DoubletFinder |
| Mapping rate | >80% | <60% | Low mapping suggests contamination or mismapping |
| Sequencing saturation | >40% | <20% | Low saturation may miss rare transcripts |
| Cell count vs expected | Within 50–150% of expected | <30% or >200% | Very low = failed capture; very high = doublets or debris |
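Because several rows of the table are defined on per-cell medians, a first-pass check can compute those medians and compare them with the red-flag column. The sketch assumes per-cell summaries are available as dictionaries with the hypothetical keys genes, umis, and mito_pct; tissue-dependent thresholds would need to be adjusted by hand.

```python
from statistics import median

# Red-flag limits from the table above: (direction, limit) applied to the per-cell median.
RED_FLAGS = {
    "genes": ("below", 500),
    "umis": ("below", 1_000),
    "mito_pct": ("above", 25.0),
}


def median_red_flags(cells):
    """Return (medians, flagged_metrics) for a list of per-cell summary dicts."""
    medians = {key: median(cell[key] for cell in cells) for key in RED_FLAGS}
    flagged = [
        key
        for key, (direction, limit) in RED_FLAGS.items()
        if (medians[key] < limit if direction == "below" else medians[key] > limit)
    ]
    return medians, flagged


cells = [
    {"genes": 2100, "umis": 6000, "mito_pct": 4.0},
    {"genes": 450, "umis": 700, "mito_pct": 31.0},   # one poor cell does not move the median much
    {"genes": 1800, "umis": 5200, "mito_pct": 7.5},
]
medians, flagged = median_red_flags(cells)
print(medians, flagged)
```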
| Metric | Acceptable Range | Red Flag | Notes |
|---|---|---|---|
| Unique fragments per cell | >3,000 | <1,000 | Sparse data below threshold makes peak calling unreliable |
| TSS enrichment per cell | >5 | <2 | Low TSS enrichment indicates that Tn5 insertions are not concentrated at accessible promoters (poor signal-to-noise) |
| Fraction in peaks (FRiP) | >20% | <10% | Measures signal-to-noise at single-cell level |
| Fraction of mitochondrial reads | <5% | >10% | Dead/dying cells captured |
| Duplicate rate | <40% | >60% | High duplication indicates low library complexity |
Single-cell experiments in ENCODE include cell-level quality summaries in their metadata. Check:
encode_get_experiment(accession="ENCSR...")
Look for audit flags — ENCODE applies automated QC checks including minimum cell counts, minimum genes per cell, and maximum doublet rates. Also check the replicates section for library preparation details.
ENCODE requires minimum 2 independent biological replicates for released data.
| Replicate Type | Definition | Use Case |
|---|---|---|
| Biological | Independent biological samples | Gold standard — captures biological variation |
| Technical | Same sample, different library prep | Assesses technical reproducibility |
| Isogenic | Same genotype, different growth/collection | Common for cell lines (e.g., K562, GM12878) |
| Anisogenic | Different genotypes/donors | Common for tissue samples |
Use encode_list_files to check for replicated peak files:
| File Output Type | What It Means | Confidence |
|---|---|---|
| IDR thresholded peaks | Passed replicate concordance analysis (Li et al. 2011) | Highest |
| Optimal IDR peaks | Peaks from pooled data, thresholded by IDR | Complete set |
| Conservative IDR peaks | Stricter IDR threshold | Most conservative |
| Pseudoreplicated peaks | IDR on pseudoreplicates from pooled data | Single-replicate fallback |
| Replicated peaks | Found in multiple replicates (non-IDR method) | Moderate |
For quantitative data (RNA-seq, signal tracks):
Before interpreting any peak-based quality metric, confirm that the ENCODE Blacklist has been applied:
- hg38-blacklist.v2.bed.gz (910 regions)
- mm10-blacklist.v2.bed.gz

Failure to remove blacklisted regions will inflate FRiP, create false peaks, and confound enrichment analyses. If analyzing CUT&RUN/CUT&Tag, also apply the CUT&RUN suspect list (Nordin et al. 2023).
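Blacklist filtering amounts to dropping any peak that overlaps a blacklisted interval. The sketch below shows the overlap test on in-memory tuples, assuming half-open BED coordinates; the coordinates are made up for illustration and the helper names are hypothetical. For real peak files, a dedicated tool such as bedtools intersect with the -v option is the more usual route.

```python
def overlaps_blacklist(peak, blacklist):
    """True if a (chrom, start, end) peak overlaps any blacklist interval.

    Intervals are treated as half-open BED coordinates: [start, end).
    """
    chrom, start, end = peak
    return any(
        chrom == b_chrom and start < b_end and b_start < end
        for b_chrom, b_start, b_end in blacklist
    )


def remove_blacklisted(peaks, blacklist):
    """Keep only peaks with no blacklist overlap (linear scan, fine for small sets)."""
    return [p for p in peaks if not overlaps_blacklist(p, blacklist)]


# Made-up coordinates, not taken from the real blacklist files.
blacklist = [("chr1", 1000, 2000)]
peaks = [
    ("chr1", 500, 900),     # upstream of the interval: kept
    ("chr1", 1900, 2100),   # overlaps the interval: removed
    ("chr1", 2000, 2500),   # starts exactly at the interval end: kept (half-open)
    ("chr2", 1200, 1500),   # different chromosome: kept
]
kept = remove_blacklisted(peaks, blacklist)
print(kept)
```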
When listing files with encode_list_files, use quality-informed selection:
# Get preferred default files (ENCODE's recommendation)
encode_list_files(experiment_accession="ENCSR...", preferred_default=True)
# Get IDR thresholded peaks (gold standard for ChIP-seq)
encode_list_files(experiment_accession="ENCSR...", output_type="IDR thresholded peaks", assembly="GRCh38")
# Get signal tracks for visualization
encode_list_files(experiment_accession="ENCSR...", output_type="fold change over control", assembly="GRCh38")
| Priority | File Type | When to Use |
|---|---|---|
| 1 | preferred_default=True | ENCODE's recommended files — start here |
| 2 | IDR thresholded peaks | Gold standard for ChIP-seq peak calls |
| 3 | Fold change over control | Normalized signal for visualization and quantitative comparison |
| 4 | Signal of unique reads | Clean signal tracks (unnormalized) |
| 5 | Pseudoreplicated peaks | Fallback when IDR fails or only 1 replicate available |
| 6 | Unfiltered alignments | Only for custom re-analysis |
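The priority table can be read as a simple selection rule. The sketch below assumes file records are dictionaries carrying an output_type string and, optionally, a preferred_default boolean; those key names mirror the call parameters used above but the exact shape of the returned records is an assumption, and the accessions are placeholders.

```python
# Output types in the order given by the priority table above (rows 2 to 6).
OUTPUT_TYPE_PRIORITY = [
    "IDR thresholded peaks",
    "fold change over control",
    "signal of unique reads",
    "pseudoreplicated peaks",
    "unfiltered alignments",
]


def pick_file(files):
    """Choose one file record: preferred defaults first, then by table priority."""
    preferred = [f for f in files if f.get("preferred_default")]
    if preferred:
        return preferred[0]
    rank = {name.lower(): i for i, name in enumerate(OUTPUT_TYPE_PRIORITY)}
    candidates = [f for f in files if f.get("output_type", "").lower() in rank]
    if not candidates:
        return None
    return min(candidates, key=lambda f: rank[f["output_type"].lower()])


# Placeholder records; FILE_A and FILE_B are not real accessions.
files = [
    {"accession": "FILE_A", "output_type": "pseudoreplicated peaks"},
    {"accession": "FILE_B", "output_type": "IDR thresholded peaks"},
]
chosen = pick_file(files)
print(chosen["accession"])
```

In practice peak files and signal tracks serve different purposes, so the rule is most useful when applied within one file category at a time.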
Provide a structured quality assessment:
| Tier | Criteria | Recommendation |
|---|---|---|
| High quality | No ERROR/NOT_COMPLIANT audits, all metrics above thresholds, ≥2 biological replicates, IDR peaks available | Use confidently. Ideal for primary analysis. |
| Usable with caveats | WARNING-level audits or borderline metrics (within 20% of threshold), good replication | Usable. Document specific limitations in methods. |
| Use with caution | NOT_COMPLIANT flags, one metric below threshold, or single-replicate | Use only if no better alternative. Document all issues. Flag in results. |
| Not recommended | ERROR flags, multiple metrics below threshold, poor replication, no IDR peaks | Avoid. Seek alternative experiments or datasets. |
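The tier table can be turned into a first-pass label, with the caveat that any such rule is one reading of the criteria rather than an official ENCODE classification. The sketch makes two assumptions that are not stated in the table: missing IDR peaks on their own are treated as a caution-level signal (single-replicate experiments cannot have them), and the counts of failed and borderline metrics are supplied by the caller. The function and parameter names are hypothetical.

```python
def quality_tier(audits, n_failed_metrics, n_borderline_metrics,
                 n_bio_replicates, has_idr_peaks):
    """Assign a first-pass tier label; the result still needs human review."""
    if audits.get("ERROR", 0) > 0 or n_failed_metrics >= 2:
        return "Not recommended"
    if (audits.get("NOT_COMPLIANT", 0) > 0
            or n_failed_metrics == 1
            or n_bio_replicates < 2
            or not has_idr_peaks):
        return "Use with caution"
    if audits.get("WARNING", 0) > 0 or n_borderline_metrics > 0:
        return "Usable with caveats"
    return "High quality"


tier = quality_tier(
    audits={"ERROR": 0, "NOT_COMPLIANT": 0, "WARNING": 1},
    n_failed_metrics=0,
    n_borderline_metrics=0,
    n_bio_replicates=2,
    has_idr_peaks=True,
)
print(tier)
```

Because a single ERROR audit does not automatically disqualify an experiment, a "Not recommended" label from this rule is a prompt to read the specific audit, not a final verdict.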
For each experiment assessed, provide:
Single-metric decisions: No single metric captures quality. FRiP alone can be misleading — some TF ChIP-seq with biological signal has low FRiP due to focal binding patterns. Always evaluate collectively.
Comparing across assays: Do NOT compare ChIP-seq metrics to ATAC-seq metrics to CUT&RUN metrics. Each assay has its own quality profile and thresholds.
Ignoring batch effects: Experiments from different labs, dates, or platforms may have systematic quality differences. When combining data, check for batch-correlated quality variation.
Assembly mismatch: Quality metrics computed on different assemblies (hg19 vs GRCh38) may differ slightly. Always verify the assembly of quality metrics matches your analysis assembly.
Antibody lot variation: The same antibody target can show different enrichment across lots. Check antibody_lot_reviews in ENCODE metadata.
Read depth ≠ quality: A deeply sequenced bad library is still a bad library. Check NRF/PBC first — if complexity is exhausted, more sequencing wastes resources.
Control quality matters: An IP library is only as good as its control. Poor input/IgG control undermines all downstream peak-based metrics.
Newer assays, different rules: CUT&RUN and CUT&Tag have inherently different quality profiles from ChIP-seq. Applying ChIP-seq thresholds to CUT&RUN will incorrectly flag high-quality data.
Goal: Evaluate the quality of ENCODE ChIP-seq experiments against ENCODE consortium standards before including them in downstream analysis.
Context: Not all ENCODE experiments meet the highest quality standards. Quality assessment prevents garbage-in-garbage-out in aggregation and integration analyses.
encode_get_experiment(accession="ENCSR000AKA")
Expected output:
{
  "accession": "ENCSR000AKA",
  "assay_title": "Histone ChIP-seq",
  "target": "H3K27ac",
  "biosample_summary": "GM12878",
  "replicates": 2,
  "status": "released",
  "audit": {"ERROR": 0, "NOT_COMPLIANT": 0, "WARNING": 1}
}
Interpretation: 0 ERRORs and 0 NOT_COMPLIANT = experiment meets ENCODE standards. 1 WARNING is acceptable.
encode_list_files(accession="ENCSR000AKA", file_format="bed", output_type="IDR thresholded peaks", assembly="GRCh38")
Key ChIP-seq quality thresholds (Landt et al. 2012):
| Metric | Threshold | Meaning |
|---|---|---|
| FRiP | >= 1% | Signal enrichment over background |
| NSC | > 1.05 | Strand cross-correlation signal |
| RSC | > 0.8 | Relative strand correlation |
| NRF | >= 0.8 | Library complexity |
| IDR | < 0.05 | Reproducibility between replicates |
encode_track_experiment(accession="ENCSR000AKA", notes="QC PASSED: FRiP=3.2%, NSC=1.12, RSC=0.95, 0 audit errors")
encode_get_experiment(accession="ENCSR000AKA")
Expected output:
{
  "accession": "ENCSR000AKA",
  "audit": {"ERROR": 0, "NOT_COMPLIANT": 0, "WARNING": 1}
}
encode_list_files(accession="ENCSR000AKA", file_format="bed", assembly="GRCh38")
Expected output:
{
  "files": [
    {"accession": "ENCFF001ABC", "output_type": "IDR thresholded peaks", "file_size_mb": 1.1}
  ]
}
encode_get_file_info(accession="ENCFF001ABC")
Expected output:
{
  "accession": "ENCFF001ABC",
  "file_format": "bed narrowPeak",
  "output_type": "IDR thresholded peaks",
  "assembly": "GRCh38",
  "quality_metrics": {"frip": 0.032, "nsc": 1.12, "rsc": 0.95}
}
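A quality_metrics block like the one above can be checked against the walkthrough thresholds and turned into a tracking note. The sketch assumes the metrics arrive as a dictionary with the keys frip, nsc, and rsc as shown in the example output; qc_note is a hypothetical helper, and the thresholds are copied from the walkthrough table rather than chosen per target type.

```python
def qc_note(quality_metrics, audit_errors):
    """Build a one-line QC note from metrics and the number of ERROR audits."""
    checks = {
        "frip": quality_metrics["frip"] >= 0.01,   # FRiP >= 1%
        "nsc": quality_metrics["nsc"] > 1.05,      # NSC > 1.05
        "rsc": quality_metrics["rsc"] > 0.8,       # RSC > 0.8
    }
    verdict = "PASSED" if all(checks.values()) and audit_errors == 0 else "FAILED"
    return (
        f"QC {verdict}: FRiP={quality_metrics['frip']:.1%}, "
        f"NSC={quality_metrics['nsc']}, RSC={quality_metrics['rsc']}, "
        f"{audit_errors} audit errors"
    )


note = qc_note({"frip": 0.032, "nsc": 1.12, "rsc": 0.95}, audit_errors=0)
print(note)
```

The resulting string has the same shape as the notes argument passed to encode_track_experiment earlier in this walkthrough.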
| This skill produces... | Feed into... | Purpose |
|---|---|---|
| Quality-verified experiments | histone-aggregation | Only aggregate high-quality data |
| QC pass/fail decisions | search-encode | Filter search results by quality |
| Quality metric reports | data-provenance | Document QC criteria used |
| Audit interpretation | pipeline-guide | Guide reprocessing decisions |
| QC documentation | scientific-writing | Methods section QC reporting |
| Quality thresholds | publication-trust | Verify QC threshold citations |
| Validated experiment lists | batch-analysis | Process only quality-approved experiments |
| QC-filtered peaks | regulatory-elements | High-confidence regulatory element maps |
npx claudepluginhub ammawla/encode-toolkit