From encode-toolkit
Queries UCSC Genome Browser REST API for regulatory tracks, DNA sequences, cCRE annotations, TF binding clusters, and track schemas in genomic regions. Use for regulatory element lookups, ENCODE cCRE queries, or TF binding data retrieval.
Slash command: /encode-toolkit:ucsc-browser
- User wants to query the UCSC Genome Browser REST API for tracks, sequences, or cCRE annotations
Retrieve regulatory annotations, DNA sequences, TF binding data, and ENCODE-hosted tracks from the UCSC Genome Browser programmatic interface.
The question: "What regulatory annotations exist at this genomic locus, and what is the underlying sequence?"
The UCSC Genome Browser hosts one of the largest collections of genome annotations. These include ENCODE cCREs (926,535 in human), TF rPeak clusters (21.8M from 912 factors across 1,152 biosamples), DNase clusters, conservation scores, and gene models. The REST API at api.genome.ucsc.edu provides programmatic access without authentication.
The ENCODE Portal (encodeproject.org) provides experiment-level data — individual ChIP-seq peaks, BAM files, quality metrics. UCSC provides aggregated, cross-experiment annotations: which cCREs overlap your region, which TFs bind there across all ENCODE biosamples, and what the underlying DNA sequence is. Together they answer: "What did ENCODE find at this locus?" (UCSC) and "What are the specific experiments behind it?" (ENCODE Portal).
Key tracks: ENCODE cCREs are in the encodeCcreCombined track, and ENCODE 4 TF rPeak clusters are in the TFrPeakClusters track on UCSC.
Base URL: https://api.genome.ucsc.edu
No authentication is required. A rate of about 1 request per second is recommended. Separate parameters with semicolons (;); the ampersand (&) is also accepted.
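Coordinate system: half-open, with a 0-based start (matches BED format). start=1000000;end=1000100 returns 100 bases at 0-based positions 1,000,000 through 1,000,099. In the browser's 1-based display, these are positions chr1:1,000,001-1,000,100.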
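Browser positions are 1-based and fully closed. To convert one for the API, subtract 1 from the start and leave the end unchanged. The following is a minimal Python sketch. The helper names are hypothetical and are not part of any UCSC library.

```python
def to_api_coords(chrom: str, start_1based: int, end_1based: int) -> dict:
    """Convert a 1-based, fully closed browser position into the
    0-based, half-open start/end used by the UCSC REST API."""
    if start_1based < 1 or end_1based < start_1based:
        raise ValueError("expected 1 <= start <= end")
    # Only the start shifts; the half-open end equals the closed 1-based end.
    return {"chrom": chrom, "start": start_1based - 1, "end": end_1based}


def parse_position(position: str) -> dict:
    """Parse a browser-style position string such as 'chr1:1,000,001-1,000,100'."""
    chrom, span = position.replace(",", "").split(":")
    start, end = (int(x) for x in span.split("-"))
    return to_api_coords(chrom, start, end)


coords = parse_position("chr1:1,000,001-1,000,100")
print(coords)                           # {'chrom': 'chr1', 'start': 1000000, 'end': 1000100}
print(coords["end"] - coords["start"])  # 100
```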
Before querying data, check what tracks exist for your assembly:
# List all tracks for hg38
curl "https://api.genome.ucsc.edu/list/tracks?genome=hg38"
# Search for ENCODE-specific tracks
curl "https://api.genome.ucsc.edu/search?search=encode+regulation&genome=hg38&categories=trackDb"
# Get schema (field definitions) for a track
curl "https://api.genome.ucsc.edu/list/schema?genome=hg38;track=encodeCcreCombined"
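The same discovery calls can be scripted instead of typed as curl commands. The sketch below uses only the Python standard library. It assumes the endpoints and parameters shown in the curl lines. The one difference is that spaces in the search term are percent-encoded (%20) rather than written as +.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

API_BASE = "https://api.genome.ucsc.edu"


def ucsc_url(endpoint: str, **params) -> str:
    """Build a UCSC REST API URL with parameters joined by ';'."""
    query = ";".join(f"{key}={quote(str(value), safe='')}" for key, value in params.items())
    return f"{API_BASE}/{endpoint.strip('/')}?{query}"


def fetch_json(url: str, timeout: float = 30.0) -> dict:
    """GET a URL and decode its JSON body (this makes a network request)."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)


# Equivalents of the three curl discovery queries above.
tracks_url = ucsc_url("list/tracks", genome="hg38")
search_url = ucsc_url("search", search="encode regulation", genome="hg38", categories="trackDb")
schema_url = ucsc_url("list/schema", genome="hg38", track="encodeCcreCombined")
print(schema_url)
# https://api.genome.ucsc.edu/list/schema?genome=hg38;track=encodeCcreCombined
```

Calling fetch_json(schema_url) returns the parsed schema. Pause about a second between calls to respect the rate guidance.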
| Track ID | Description | Data Type | Source |
|---|---|---|---|
| encodeCcreCombined | 926,535 candidate cis-regulatory elements (V3) | bigBed 9+ | ENCODE Phase 3 |
| TFrPeakClusters | 21.8M TF rPeak clusters, 912 factors, 1,152 biosamples | bigBed 12+ | ENCODE 4 |
| wgEncodeRegDnaseClustered | 2.1M+ DNase clusters across 95 cell types | MySQL table | ENCODE 2/3 |
| wgEncodeRegTfbsClustered | TF binding site clusters (legacy) | MySQL table | ENCODE 2/3 |
| Track ID | Description | Use Case |
|---|---|---|
| cpgIslandExt | CpG islands | Promoter identification |
| rmsk | RepeatMasker | Filter repetitive elements |
| snp155 | dbSNP 155 with ClinVar | Variant annotation |
| phastCons100way | Conservation scores (100 vertebrates) | Evolutionary constraint |
| phyloP100way | Per-base conservation (100 vertebrates) | Variant impact |
The most common use case — what regulatory elements does ENCODE predict at this region?
# Get all cCREs in a 100kb window
curl "https://api.genome.ucsc.edu/getData/track?genome=hg38;track=encodeCcreCombined;chrom=chr1;start=1000000;end=1100000"
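# jsonOutputArrays=1 returns each row as an array of values instead of an object with named fields (more compact; map positions to names with /list/schema)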
curl "https://api.genome.ucsc.edu/getData/track?genome=hg38;track=encodeCcreCombined;chrom=chr1;start=1000000;end=1100000;jsonOutputArrays=1"
| Field | Description | Example |
|---|---|---|
| chrom | Chromosome | chr1 |
| chromStart | Start (0-based) | 999856 |
| chromEnd | End | 1000009 |
| name | ENCODE accession | EH38E1310344 |
| score | Signal strength (0-1000) | 312 |
| encodeLabel | cCRE class | PLS, pELS, dELS, CTCF-only |
| zScore | Max DNase Z-score | 3.1283 |
| ccre | Full classification | PLS,CTCF-bound |
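The sketch below tallies cCRE classes in a getData/track response and accepts either row shape. Our reading of UCSC's documentation is that default output gives one object per row with named fields, and that jsonOutputArrays=1 gives positional arrays. Treat that reading as an assumption. For array rows you must supply the column names, for example from /list/schema. Real rows carry every schema field in order, so the short column list below is a hypothetical subset used only for the demo.

```python
from collections import Counter


def normalize_rows(rows, columns=None):
    """Return track rows as dicts.

    Object rows are used as-is. Array rows are zipped with `columns`
    by position, so `columns` must follow the schema's field order.
    """
    records = []
    for row in rows:
        if isinstance(row, dict):
            records.append(row)
        elif columns is None:
            raise ValueError("array-style rows need column names")
        else:
            records.append(dict(zip(columns, row)))
    return records


def count_ccre_classes(response: dict, columns=None, track: str = "encodeCcreCombined") -> Counter:
    """Tally encodeLabel values (PLS, pELS, dELS, ...) in a getData/track response."""
    return Counter(row["encodeLabel"] for row in normalize_rows(response.get(track, []), columns))


# Hypothetical, truncated responses reusing the example values from the field table.
object_style = {"encodeCcreCombined": [
    {"chrom": "chr1", "chromStart": 999856, "chromEnd": 1000009,
     "name": "EH38E1310344", "encodeLabel": "PLS"},
]}
columns = ["chrom", "chromStart", "chromEnd", "name", "encodeLabel"]  # assumed subset of the schema
array_style = {"encodeCcreCombined": [["chr1", 999856, 1000009, "EH38E1310344", "PLS"]]}

print(count_ccre_classes(object_style))          # Counter({'PLS': 1})
print(count_ccre_classes(array_style, columns))  # Counter({'PLS': 1})
```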
| Class | Full Name | Biochemical Signature |
|---|---|---|
| PLS | Promoter-like signature | DNase+ H3K4me3+ within 200bp of TSS |
| pELS | Proximal enhancer-like | DNase+ H3K27ac+ within 2kb of TSS |
| dELS | Distal enhancer-like | DNase+ H3K27ac+ >2kb from TSS |
| CTCF-only | CTCF-only | DNase+ CTCF+ (no H3K4me3/H3K27ac) |
| DNase-H3K4me3 | DNase-H3K4me3 | DNase+ H3K4me3+ >200bp from TSS |
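To make the table's decision order concrete, here is a deliberately simplified Python sketch. It assumes the element is already DNase-high. It treats each mark as a yes/no call and takes the distance to the nearest annotated TSS. The 200 bp PLS cutoff follows the DNase-H3K4me3 row. ENCODE's actual pipeline works from signal Z-scores, so use this only as a reading aid. The "unclassified" fallback is a hypothetical label, not an ENCODE class.

```python
def classify_ccre(distance_to_tss: int, h3k4me3: bool, h3k27ac: bool, ctcf: bool) -> str:
    """Simplified cCRE class for a DNase-high element, checked in the table's order.

    distance_to_tss is the distance in bp to the nearest annotated TSS.
    """
    d = abs(distance_to_tss)
    if h3k4me3 and d <= 200:
        return "PLS"                              # promoter-like: H3K4me3 at a TSS
    if h3k27ac:
        return "pELS" if d <= 2000 else "dELS"   # enhancer-like, split at 2 kb
    if h3k4me3:
        return "DNase-H3K4me3"                    # H3K4me3 away from a TSS, no H3K27ac
    if ctcf:
        return "CTCF-only"                        # CTCF without either histone mark
    return "unclassified"                         # hypothetical fallback label


print(classify_ccre(150, h3k4me3=True, h3k27ac=True, ctcf=False))     # PLS
print(classify_ccre(1500, h3k4me3=False, h3k27ac=True, ctcf=False))   # pELS
print(classify_ccre(25000, h3k4me3=False, h3k27ac=False, ctcf=True))  # CTCF-only
```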
Which transcription factors bind at your region across all ENCODE biosamples?
# Get TF rPeak clusters in a region
curl "https://api.genome.ucsc.edu/getData/track?genome=hg38;track=TFrPeakClusters;chrom=chr1;start=1000000;end=1100000;jsonOutputArrays=1"
| Field | Description |
|---|---|
| factor | Transcription factor name (e.g., CTCF, POLR2A) |
| ubiquity | Fraction of experiments showing binding (0-1) |
| cCRE | Overlapping cCRE accession |
| exp | ENCODE experiment accessions (links to Portal) |
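A common first pass is to keep only factors whose clusters are widely reproduced. The sketch below assumes rows are dicts carrying the factor and ubiquity fields from the table. The 0.5 default cutoff and the sample rows are hypothetical.

```python
def bound_factors(rows, min_ubiquity: float = 0.5):
    """Return (factor, ubiquity) pairs with ubiquity >= min_ubiquity, highest first.

    If a factor has several clusters in the window, its highest ubiquity is kept.
    """
    best = {}
    for row in rows:
        factor = row["factor"]
        ubiquity = float(row["ubiquity"])
        best[factor] = max(ubiquity, best.get(factor, 0.0))
    kept = [(f, u) for f, u in best.items() if u >= min_ubiquity]
    return sorted(kept, key=lambda pair: (-pair[1], pair[0]))


rows = [  # hypothetical rows; real ones carry more fields
    {"factor": "CTCF", "ubiquity": 0.92},
    {"factor": "POLR2A", "ubiquity": 0.41},
    {"factor": "CTCF", "ubiquity": 0.35},
    {"factor": "YY1", "ubiquity": 0.60},
]
print(bound_factors(rows))  # [('CTCF', 0.92), ('YY1', 0.6)]
```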
Cross-reference with ENCODE Portal: The exp field contains ENCODE experiment accessions. Use encode_get_experiment to get full metadata:
encode_get_experiment(accession="ENCSR...")
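Before calling encode_get_experiment, deduplicate the accessions across all clusters in the region. This sketch matches the standard ENCODE experiment accession shape (ENCSR, three digits, three uppercase letters). We have not confirmed whether the exp field arrives as a delimited string or a list, so the sketch accepts both. The accessions in the sample rows are placeholders.

```python
import re

# ENCODE experiment accessions: "ENCSR" + 3 digits + 3 uppercase letters.
ENCSR_PATTERN = re.compile(r"ENCSR\d{3}[A-Z]{3}")


def experiment_accessions(rows):
    """Collect unique ENCSR accessions from the exp field, in first-seen order."""
    seen, ordered = set(), []
    for row in rows:
        exp = row.get("exp", "")
        text = ",".join(exp) if isinstance(exp, list) else str(exp)
        for accession in ENCSR_PATTERN.findall(text):
            if accession not in seen:
                seen.add(accession)
                ordered.append(accession)
    return ordered


rows = [{"exp": "ENCSR000AAA,ENCSR000BBB"}, {"exp": ["ENCSR000AAA"]}]  # placeholder accessions
for accession in experiment_accessions(rows):
    print(f'encode_get_experiment(accession="{accession}")')
```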
Get the underlying DNA sequence for regulatory elements:
# Get sequence for a region
curl "https://api.genome.ucsc.edu/getData/sequence?genome=hg38;chrom=chr1;start=1000000;end=1000500"
# Get reverse complement
curl "https://api.genome.ucsc.edu/getData/sequence?genome=hg38;chrom=chr1;start=1000000;end=1000500;revComp=1"
Response includes a dna field with the nucleotide sequence.
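The revComp=1 parameter reverse-complements the sequence on the server. Doing it locally saves a second request when you already hold the forward strand. The sketch below also computes GC content. It assumes the response body is a dict with the dna field. Lowercase bases are preserved, because UCSC sequence may be soft-masked (repeats in lowercase).

```python
# Translation table; lowercase is kept so soft-masked (repeat) bases stay visible.
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Reverse-complement a DNA string, preserving case."""
    return seq.translate(COMPLEMENT)[::-1]


def gc_fraction(seq: str) -> float:
    """Fraction of G/C among unambiguous A/C/G/T bases (N is ignored)."""
    s = seq.upper()
    acgt = sum(s.count(base) for base in "ACGT")
    return (s.count("G") + s.count("C")) / acgt if acgt else 0.0


response = {"dna": "ACGTacgNN"}  # hypothetical getData/sequence body
dna = response["dna"]
print(reverse_complement(dna))      # NNcgtACGT
print(round(gc_fraction(dna), 3))  # 0.571
```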
Use cases for sequence retrieval:
# DNase clusters (95 cell types)
curl "https://api.genome.ucsc.edu/getData/track?genome=hg38;track=wgEncodeRegDnaseClustered;chrom=chr1;start=1000000;end=1100000;jsonOutputArrays=1"
The sourceCount field tells you how many of the 95 cell types show accessibility at each site — a measure of how constitutive vs tissue-specific the element is.
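One way to turn sourceCount into a label is to divide by the number of cell types. The sketch below assumes the 95-cell-type denominator given for this track. The 0.75 and 0.10 cutoffs are illustrative choices, not ENCODE definitions, and the sample rows are hypothetical.

```python
def accessibility_breadth(source_count: int, total_cell_types: int = 95,
                          constitutive_at: float = 0.75, specific_at: float = 0.10) -> str:
    """Label a DNase cluster by the fraction of cell types in which it is accessible.

    The cutoffs are illustrative defaults, not ENCODE-defined thresholds.
    """
    fraction = source_count / total_cell_types
    if fraction >= constitutive_at:
        return "constitutive"
    if fraction <= specific_at:
        return "tissue-specific"
    return "intermediate"


rows = [{"name": "site_a", "sourceCount": 90}, {"name": "site_b", "sourceCount": 5}]  # hypothetical
for row in rows:
    print(row["name"], accessibility_breadth(row["sourceCount"]))
# site_a constitutive
# site_b tissue-specific
```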
For genome-wide queries, use UCSC command-line utilities instead of the REST API (which caps at 1M items):
# Download UCSC tools (macOS example)
# Available at: https://hgdownload.gi.ucsc.edu/admin/exe/
# Extract ENCODE cCREs for a region from hosted bigBed
bigBedToBed https://hgdownload.gi.ucsc.edu/gbdb/hg38/encode3/encodeCcreCombined.bb \
-chrom=chr1 -start=1000000 -end=2000000 stdout
# Extract TF rPeak clusters
bigBedToBed https://hgdownload.gi.ucsc.edu/gbdb/hg38/bbi/ENCODE4/TFrPeakClusters.bb \
-chrom=chr1 -start=1000000 -end=2000000 stdout
# Summarize bigWig signal over regions
bigWigSummary http://path/to/signal.bw chr1 1000000 1100000 10
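To use bigBedToBed output in a script, run the same command through subprocess and split the tab-separated lines. This sketch assumes the bigBedToBed binary is on PATH and uses the argument order shown above. The column names are supplied by the caller, and the sample line is hypothetical.

```python
import subprocess


def bigbed_region(bb_url: str, chrom: str, start: int, end: int) -> str:
    """Run bigBedToBed for one region and return its BED text (needs the UCSC binary on PATH)."""
    cmd = ["bigBedToBed", bb_url, f"-chrom={chrom}", f"-start={start}", f"-end={end}", "stdout"]
    return subprocess.run(cmd, check=True, capture_output=True, text=True).stdout


def parse_bed(text: str, columns):
    """Split tab-separated BED lines into dicts keyed by the given column names.

    zip() stops at the shorter sequence, so trailing fields beyond `columns` are dropped.
    """
    records = []
    for line in text.splitlines():
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        record = dict(zip(columns, line.split("\t")))
        record["chromStart"] = int(record["chromStart"])
        record["chromEnd"] = int(record["chromEnd"])
        records.append(record)
    return records


sample = "chr1\t999856\t1000009\tEH38E1310344\t312\n"  # hypothetical one-line output
print(parse_bed(sample, ["chrom", "chromStart", "chromEnd", "name", "score"]))
```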
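Legacy ENCODE 2/3 tracks (wgEncode prefix) are stored as MySQL tables and can also be queried on UCSC's public MySQL server:
mysql --user=genome --host=genome-mysql.gi.ucsc.edu -A -P 3306 -D hg38 \
-e "SELECT * FROM wgEncodeRegDnaseClustered WHERE chrom='chr1' AND chromStart >= 1000000 AND chromEnd <= 1100000;"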
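The SQL above keeps only elements fully contained in the window, so an element straddling a window edge is dropped. For half-open intervals, two intervals overlap when each starts before the other ends. The sketch below shows both tests and builds the overlap version of the query. The helper names are hypothetical. Identifiers are checked against a simple pattern rather than escaped.

```python
import re


def overlaps(start_a: int, end_a: int, start_b: int, end_b: int) -> bool:
    """Half-open intervals [start, end) overlap iff each starts before the other ends."""
    return start_a < end_b and start_b < end_a


def contained(start_a: int, end_a: int, start_b: int, end_b: int) -> bool:
    """True if [start_a, end_a) lies entirely inside [start_b, end_b)."""
    return start_a >= start_b and end_a <= end_b


def overlap_sql(table: str, chrom: str, start: int, end: int) -> str:
    """SQL selecting rows that overlap the window (names validated, not escaped)."""
    if not re.fullmatch(r"\w+", table) or not re.fullmatch(r"\w+", chrom):
        raise ValueError("unexpected table or chromosome name")
    return (f"SELECT * FROM {table} WHERE chrom='{chrom}' "
            f"AND chromStart < {int(end)} AND chromEnd > {int(start)};")


# The example cCRE from the field table straddles the window start.
print(overlaps(999856, 1000009, 1000000, 1100000))   # True
print(contained(999856, 1000009, 1000000, 1100000))  # False
print(overlap_sql("wgEncodeRegDnaseClustered", "chr1", 1000000, 1100000))
```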
A typical regulatory analysis workflow combining both:
1. Search ENCODE for tissue-specific experiments:
encode_search_experiments(assay_title="Histone ChIP-seq", target="H3K27ac", organ="pancreas")
2. Get experiment details and download peaks:
encode_list_files(experiment_accession="ENCSR...", output_type="IDR thresholded peaks", assembly="GRCh38")
3. For each peak region, query UCSC for regulatory context:
curl "https://api.genome.ucsc.edu/getData/track?genome=hg38;track=encodeCcreCombined;chrom=CHR;start=START;end=END;jsonOutputArrays=1"
4. Check which TFs bind at each peak:
curl "https://api.genome.ucsc.edu/getData/track?genome=hg38;track=TFrPeakClusters;chrom=CHR;start=START;end=END;jsonOutputArrays=1"
5. Get DNA sequence for motif analysis:
curl "https://api.genome.ucsc.edu/getData/sequence?genome=hg38;chrom=CHR;start=START;end=END"
6. Track and log provenance:
encode_track_experiment(accession="ENCSR...", notes="Pancreas H3K27ac - UCSC cCRE overlap analysis")
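Steps 3 to 5 repeat the same three UCSC calls for every peak, so they are usually scripted with a pause between requests. The sketch below builds the URLs exactly as in the steps, using the first three columns of each peak line. It then fetches the URLs sequentially at roughly one request per second. The sample peak line is hypothetical. The fetch argument exists so the loop can be tested without network access.

```python
import json
import time
from urllib.parse import quote
from urllib.request import urlopen

API_BASE = "https://api.genome.ucsc.edu"


def api_url(endpoint: str, **params) -> str:
    """Build a UCSC REST API URL with ';'-separated parameters."""
    query = ";".join(f"{k}={quote(str(v), safe='')}" for k, v in params.items())
    return f"{API_BASE}/{endpoint}?{query}"


def narrowpeak_regions(text: str):
    """Yield (chrom, start, end) from the first three columns of narrowPeak/BED lines."""
    for line in text.splitlines():
        if line.strip() and not line.startswith(("#", "track", "browser")):
            chrom, start, end = line.split("\t")[:3]
            yield chrom, int(start), int(end)


def peak_context_urls(peaks, genome: str = "hg38"):
    """Yield the step 3-5 URLs (cCREs, TF clusters, sequence) for each peak."""
    for chrom, start, end in peaks:
        region = {"chrom": chrom, "start": start, "end": end}
        yield {
            "ccre": api_url("getData/track", genome=genome, track="encodeCcreCombined",
                            **region, jsonOutputArrays=1),
            "tf": api_url("getData/track", genome=genome, track="TFrPeakClusters",
                          **region, jsonOutputArrays=1),
            "sequence": api_url("getData/sequence", genome=genome, **region),
        }


def fetch_politely(urls, delay_s: float = 1.0, fetch=None):
    """Fetch URLs one at a time, sleeping delay_s seconds between requests."""
    fetch = fetch or (lambda url: json.load(urlopen(url, timeout=30)))
    results = []
    for i, url in enumerate(urls):
        if i:
            time.sleep(delay_s)
        results.append(fetch(url))
    return results


peaks_text = "chr1\t1000000\t1000500\tpeak1\t0\t.\t5.1\t-1\t3.2\t250\n"  # hypothetical peak
urls = list(peak_context_urls(narrowpeak_regions(peaks_text)))
print(urls[0]["sequence"])
# https://api.genome.ucsc.edu/getData/sequence?genome=hg38;chrom=chr1;start=1000000;end=1000500
```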
- Tracks with the wgEncode prefix are from ENCODE 2/3 and stored in MySQL tables. ENCODE 4 data uses bigBed files accessed via the REST API, so the access methods differ.
- The encodeCcreCombined track contains the V3 cCREs (926,535). The expanded V4 registry (2.35M cCREs, Moore et al. 2024) may not yet be reflected on UCSC; check the SCREEN portal for the latest release.
- Use api.genome.ucsc.edu (US), genome-euro.ucsc.edu (EU), or genome-asia.ucsc.edu (Asia), depending on your location.

Goal: Use the UCSC Genome Browser REST API to retrieve candidate cis-regulatory elements (cCREs) and custom track data that complement ENCODE experiments, enabling genome-wide regulatory visualization.
Context: The UCSC Genome Browser hosts ENCODE-derived cCRE tracks and provides REST API access to sequence, annotations, and track data.
encode_search_experiments(assay_title="ATAC-seq", organ="heart", organism="Homo sapiens", limit=5)
Expected output:
{
"total": 18,
"results": [
{"accession": "ENCSR100HRT", "assay_title": "ATAC-seq", "biosample_summary": "heart left ventricle", "status": "released"}
]
}
encode_get_file_info(accession="ENCFF200BW")
Expected output:
{
"accession": "ENCFF200BW",
"file_format": "bigWig",
"output_type": "fold change over control",
"href": "https://www.encodeproject.org/files/ENCFF200BW/@@download/ENCFF200BW.bigWig",
"assembly": "GRCh38",
"file_size_mb": 45.2
}
Interpretation: Use the bigWig download URL directly in a UCSC custom track or track hub for visualization.
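A remote bigWig is attached through a custom-track line that uses UCSC's standard type, name, description, visibility, and bigDataUrl attributes. The sketch below builds such a line from the file metadata shown above. The helper name is hypothetical. Paste the resulting line into the browser's Custom Tracks page.

```python
def bigwig_track_line(name: str, big_data_url: str, description: str = "",
                      visibility: str = "full") -> str:
    """Build a UCSC custom-track line that points at a remote bigWig file."""
    if '"' in name or '"' in description:
        raise ValueError("double quotes are not allowed inside attribute values")
    description = description or name
    return (f'track type=bigWig name="{name}" description="{description}" '
            f"visibility={visibility} bigDataUrl={big_data_url}")


line = bigwig_track_line(
    "ENCFF200BW",
    "https://www.encodeproject.org/files/ENCFF200BW/@@download/ENCFF200BW.bigWig",
    description="fold change over control",
)
print(line)
```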
Using UCSC REST API (via skill guidance):
GET https://api.genome.ucsc.edu/getData/track?genome=hg38&track=encodeCcreCombined&chrom=chr1&start=1000000&end=1100000
Expected response (key fields):
{
"encodeCcreCombined": [
{"chrom": "chr1", "chromStart": 1020500, "chromEnd": 1021200, "name": "EH38E1234567", "encodeLabel": "pELS"},
{"chrom": "chr1", "chromStart": 1050800, "chromEnd": 1051500, "name": "EH38E1234568", "encodeLabel": "dELS"}
]
}
Interpretation: pELS = proximal enhancer-like signature, dELS = distal enhancer-like signature. These cCRE classifications are derived from ENCODE data and provide standardized regulatory element annotations.
GET https://api.genome.ucsc.edu/getData/sequence?genome=hg38&chrom=chr1&start=1020500&end=1021200
Use the retrieved sequence for downstream motif scanning with the jaspar-motifs skill.
If you have hg19 coordinates that need conversion:
# Use liftover-coordinates skill for assembly conversion
# Then query UCSC API with GRCh38 coordinates
encode_search_experiments(
assay_title="ATAC-seq",
organ="brain"
)
Expected output:
{
"total": 24,
"experiments": [
{
"accession": "ENCSR789XYZ",
"assay_title": "ATAC-seq",
"biosample_summary": "brain tissue female adult (53 years)"
}
]
}
encode_list_files(
accession="ENCSR789XYZ",
file_format="bigWig",
assembly="GRCh38"
)
Expected output:
{
"total": 4,
"files": [
{
"accession": "ENCFF456DEF",
"file_format": "bigWig",
"output_type": "fold change over control",
"assembly": "GRCh38",
"file_size_mb": 125.3,
"href": "/files/ENCFF456DEF/@@download/ENCFF456DEF.bigWig"
}
]
}
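The file listing returns a portal-relative href, whereas encode_get_file_info returned an absolute one. A UCSC custom track needs the absolute form. The sketch below filters a listing shaped like the expected output above and resolves each href against the portal root. The result shape is assumed from that example.

```python
from urllib.parse import urljoin

PORTAL = "https://www.encodeproject.org"


def bigwig_download_urls(listing: dict, output_type: str = "fold change over control",
                         assembly: str = "GRCh38") -> list:
    """Absolute download URLs for matching bigWig files in a file-listing result."""
    urls = []
    for f in listing.get("files", []):
        if (f.get("file_format") == "bigWig" and f.get("output_type") == output_type
                and f.get("assembly") == assembly):
            # urljoin prefixes relative hrefs and leaves absolute ones unchanged.
            urls.append(urljoin(PORTAL, f["href"]))
    return urls


listing = {"files": [{"accession": "ENCFF456DEF", "file_format": "bigWig",
                      "output_type": "fold change over control", "assembly": "GRCh38",
                      "href": "/files/ENCFF456DEF/@@download/ENCFF456DEF.bigWig"}]}
print(bigwig_download_urls(listing))
# ['https://www.encodeproject.org/files/ENCFF456DEF/@@download/ENCFF456DEF.bigWig']
```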
| This skill produces... | Feed into... | Using tool/skill |
|---|---|---|
| cCRE annotations | Regulatory element classification | regulatory-elements skill |
| DNA sequences from peak regions | Motif analysis | motif-analysis → HOMER/MEME |
| Conservation scores | Variant prioritization | variant-annotation skill |
| Track hub configuration | Visualization | visualization-workflow skill |
| Repeat masker annotations | Peak filtering | peak-annotation skill |
| Skill | When to Use Instead/Additionally |
|---|---|
| regulatory-elements | Comprehensive cCRE classification and chromatin state analysis |
| variant-annotation | Annotating GWAS/eQTL variants with ENCODE functional data |
| search-encode | Finding specific ENCODE experiments by assay, tissue, target |
| integrative-analysis | Multi-mark integration for regulatory element characterization |
| epigenome-profiling | Full histone mark profiling workflow |
| data-provenance | Logging derived files from UCSC+ENCODE combined analyses |
| geo-connector | Cross-referencing ENCODE experiments with GEO accessions |
| gnomad-variants | Population frequency and constraint data for variants in UCSC regions |
| ensembl-annotation | VEP annotation and Regulatory Build overlap for UCSC-retrieved regions |
| publication-trust | Verify literature claims backing analytical decisions |