Skill

data-provenance

Tracks exact provenance for every operation on ENCODE data — tool versions, references, parameters, and timestamps — and auto-generates publication-ready methods sections from the log.

Python

Bash

documentation

Invocation

How this skill is triggered — by the user, by Claude, or both

Slash command

/encode-toolkit:data-provenance

User invocable

Model invocable

Inline context

Default effort

Context Preview

The summary Claude sees in its skill listing — used to decide when to auto-load this skill

- User wants to track the full analysis chain from ENCODE download through processing to publication figure

Supporting Files

references/literature.md

SKILL.md

647 lines · ~6.2k tokens (exceeds 5k compaction limit)

Stats

Language: Python

Stars: 35

Forks: 5

Maintenance: Excellent

Last Commit: Jun 14, 2026

Field	Example	Source
Accession	ENCSR133RZO	ENCODE portal
Assay	Histone ChIP-seq	encode_get_experiment
Target	H3K27ac	encode_get_experiment
Biosample	pancreas tissue	encode_get_experiment
Lab	Bing Ren, UCSD	encode_get_experiment
Replicates	2 biological	encode_get_experiment
Sequencer	Illumina HiSeq 4000	encode_get_experiment
Read length	76bp PE	encode_get_experiment
Read count	42.3M per rep	File metadata
Library	TruSeq ChIP	encode_get_experiment
Batch/date	2019-06-15	encode_get_experiment
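
The fields in the table above could be captured as one structured record per source experiment before anything is downloaded. The following is a minimal sketch, not the skill's actual log schema: the class name `SourceExperiment`, the helper `to_log_entry`, the snake_case field names, and the `logged_at` key are all assumptions made for illustration. The values are taken from the example column of the table.

```python
import json
from dataclasses import dataclass, asdict
from datetime import datetime, timezone


@dataclass
class SourceExperiment:
    """One tracked ENCODE experiment (field names are hypothetical)."""
    accession: str
    assay: str
    target: str
    biosample: str
    lab: str
    replicates: str
    sequencer: str
    read_length: str
    read_count: str
    library: str
    batch_date: str


def to_log_entry(exp: SourceExperiment) -> dict:
    """Convert the record to a dict and stamp it with the current UTC time."""
    entry = asdict(exp)
    entry["logged_at"] = datetime.now(timezone.utc).isoformat()
    return entry


# Values copied from the example column of the table above.
example = SourceExperiment(
    accession="ENCSR133RZO",
    assay="Histone ChIP-seq",
    target="H3K27ac",
    biosample="pancreas tissue",
    lab="Bing Ren, UCSD",
    replicates="2 biological",
    sequencer="Illumina HiSeq 4000",
    read_length="76bp PE",
    read_count="42.3M per rep",
    library="TruSeq ChIP",
    batch_date="2019-06-15",
)

entry = to_log_entry(example)
print(json.dumps(entry, indent=2))
```

Keeping the record as a plain dataclass means it serializes directly to JSON, so it could be appended to a file such as experiment_log.json without extra conversion code.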

Tool	Citation
bedtools	Quinlan & Hall 2010, Bioinformatics
samtools	Li et al. 2009, Bioinformatics
STAR	Dobin et al. 2013, Bioinformatics
featureCounts	Liao et al. 2014, Bioinformatics
edgeR	Robinson et al. 2010, Bioinformatics
MACS2	Zhang et al. 2008, Genome Biology
DESeq2	Love et al. 2014, Genome Biology
Seurat	Stuart et al. 2019, Cell
SCTransform	Hafemeister & Satija 2019, Genome Biology
CellRanger	10x Genomics (cite version used)
Scanpy	Wolf et al. 2018, Genome Biology
ChromHMM	Ernst & Kellis 2012, Nature Methods
liftOver	Kent et al. 2002, Genome Research
HOMER	Heinz et al. 2010, Molecular Cell
deepTools	Ramírez et al. 2016, Nucleic Acids Research
Harmony	Korsunsky et al. 2019, Nature Methods
IDR	Li et al. 2011, Annals of Applied Statistics
WGCNA	Langfelder & Horvath 2008, BMC Bioinformatics
CIBERSORTx	Newman et al. 2019, Nature Biotechnology
GSEA	Subramanian et al. 2005, PNAS
Gviz	Hahne & Ivanek 2016, Methods in Molecular Biology
GraphPad Prism	GraphPad Software (cite version)
DAVID	Huang et al. 2009, Nature Protocols
Enrichr	Kuleshov et al. 2016, Nucleic Acids Research
DEGAS	Johnson et al. 2022, Genome Medicine
RRHO	Plaisier et al. 2010, Nucleic Acids Research
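
A citation table like the one above can be paired with logged tool versions to draft methods sentences. The sketch below shows one possible way to do that; the function names `cite_tool` and `methods_sentence` are hypothetical, the sentence wording is an assumption, and the version string is a placeholder rather than a recommendation. Only three rows of the table are included to keep the example short.

```python
# Subset of the tool-to-citation table above (author and year only).
CITATIONS = {
    "bedtools": "Quinlan & Hall 2010",
    "samtools": "Li et al. 2009",
    "MACS2": "Zhang et al. 2008",
}


def cite_tool(tool: str, version: str) -> str:
    """Return 'tool vX.Y (citation)', failing loudly if no citation is on record."""
    if tool not in CITATIONS:
        raise KeyError(f"no citation on record for {tool!r}")
    return f"{tool} v{version} ({CITATIONS[tool]})"


def methods_sentence(action: str, tool: str, version: str) -> str:
    """Build a single methods sentence from a logged step."""
    return f"{action} using {cite_tool(tool, version)}."


# The version below is a placeholder; a real log would supply the recorded one.
sentence = methods_sentence("Peaks were called", "MACS2", "2.2.9.1")
print(sentence)
```

Raising an error for an unknown tool, instead of emitting a sentence without a citation, keeps gaps in the citation list visible before the text reaches a manuscript.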

Software	Version	Citation
R	4.3.2	R Core Team 2023
Bioconductor	3.18	Huber et al. 2015
DESeq2	1.42.0	Love et al. 2014
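
The table above shows an R-based environment. For the Python side of an analysis, equivalent Software/Version rows could be collected with the standard library alone. This is a sketch under assumptions: the helper names `package_version` and `environment_rows` are hypothetical, and the package list passed in is only an example.

```python
import platform
from importlib import metadata


def package_version(name: str) -> str:
    """Return the installed version of a distribution, or a marker if absent."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return "not installed"


def environment_rows(packages):
    """Build (software, version) rows, starting with the interpreter itself."""
    rows = [("Python", platform.python_version())]
    rows += [(name, package_version(name)) for name in packages]
    return rows


# Example package list; substitute the packages the analysis actually imports.
rows = environment_rows(["numpy", "pandas"])
for software, version in rows:
    print(f"{software}\t{version}")
```

Recording "not installed" explicitly, instead of skipping the row, makes it clear that a package was checked and found missing.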

This skill produces...	Feed into...	Using tool/skill
Provenance chain (accession → derived files)	Methods section generation	scientific-writing skill
Logged analysis steps with parameters	Reproducibility audit	publication-trust skill
MD5-verified file records	Data availability statement	cite-encode skill
Sequential script numbering	Pipeline documentation	pipeline-guide skill
Complete tool + version records	Tool citation list	cite-encode → BibTeX export
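
The "MD5-verified file records" row above implies comparing each downloaded file's checksum against the value published for it. A minimal sketch of that check follows. The function names `md5sum` and `file_record` and the record keys are hypothetical, and the demo writes a small temporary file instead of downloading real ENCODE data.

```python
import hashlib
import tempfile
from pathlib import Path


def md5sum(path, chunk_size=1 << 20):
    """Compute a file's MD5 hex digest, reading in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def file_record(path, expected_md5):
    """Return a log record stating whether the observed checksum matches."""
    observed = md5sum(path)
    return {
        "file": Path(path).name,
        "expected_md5": expected_md5,
        "observed_md5": observed,
        "verified": observed == expected_md5,
    }


# Demo on a throwaway file; in practice expected_md5 would come from file metadata.
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / "demo.bed"
    content = b"chr1\t100\t200\n"
    demo.write_bytes(content)
    record = file_record(demo, hashlib.md5(content).hexdigest())

print(record["file"], record["verified"])
```

Storing both the expected and the observed digest, not just a pass/fail flag, lets a later audit see exactly what was compared.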

Exact Provenance Tracking and Methods Writing

When to Use

Scientific Rationale

Comprehensive Provenance Standard

Why This Level of Detail Matters

Step 1: Initialize Experiment Log

Log Structure

Experiment Log Format (experiment_log.json)

Track Source ENCODE Experiments

Step 2: Log Every Operation

Operation Log Entry Format

Common Operations to Log

Downloading ENCODE Files

Genome Coordinate Liftover

Peak Filtering

Merging/Union Operations

R/Bioconductor Analysis

Python Analysis

Step 3: Record Software Environment

R Environment

Python Environment

Command-Line Tools

System Information

Step 4: Store Scripts

Naming Convention

Script Header Template

Step 5: Log Derived Files to ENCODE Tracker

Step 6: Version Control and Experiment Branching

When the User Runs Multiple Versions

Example Version Log

Step 7: Auto-Generate Methods Sections

Methods Template Structure

Scientific Documentation Standards

Citation Format for Tools

Step 8: Supplementary Data Tables

Table S1: ENCODE Experiments Used

Table S2: Files Selected

Table S3: Processing Steps

Table S4: Software Environment

Pitfalls and Edge Cases

Tool Version Drift

Reference File Versioning

Incomplete Provenance

Multi-User Environments

Containerization for Exact Reproduction

Walkthrough: Building a Complete Provenance Trail for an ENCODE Analysis

Step 1: Log data acquisition

Step 2: Log file downloads with MD5 verification

Step 3: Log derived analysis outputs

Step 4: View the complete provenance chain

Step 5: Generate provenance summary for publication

Integration with downstream skills

Code Examples

1. Track an experiment for provenance

2. Log a derived analysis file

3. View provenance chain

Integration

Related Skills

Presenting Results

For the request: "$ARGUMENTS"
