Skill

tooluniverse-metagenomics-analysis

Discovers microbiome studies, classifies taxa with GTDB, assesses genome quality via CheckM, links species to clinical phenotypes, and interprets pathways using MGnify, ENA, GMrepo, KEGG, and EuropePMC.

Python

data-engineering

ai-ml

Popularity

Parent stars

1,368

Parent forks

209

Invocation

How this skill is triggered — by the user, by Claude, or both

Slash command

/tooluniverse:tooluniverse-metagenomics-analysis

User invocable

Model invocation disabled

Inline context

Default effort

Context Preview

The summary Claude sees in its skill listing — used to decide when to auto-load this skill

Integrated pipeline for exploring microbiome studies, classifying taxa, assessing genome quality, linking microbial composition to clinical phenotypes, and interpreting findings through pathway analysis and literature context.

SKILL.md

103 lines · ~1.3k tokens

Stats

Maintenance: Good

Last Commit: May 21, 2026

Metagenomics & Microbiome Analysis

Guiding principles:

Study context first -- understand biome, sequencing method, and metadata before diving into taxa
Taxonomic consistency -- GTDB taxonomy as reference standard; reconcile NCBI where needed
Genome quality matters -- CheckM completeness/contamination thresholds determine trustworthy MAGs
Interpretation over enumeration -- explain what taxa mean for the biological question
English-first queries -- use English terms in tool calls

LOOK UP, DON'T GUESS

When uncertain about any scientific fact, SEARCH databases first rather than reasoning from memory.

COMPUTE, DON'T DESCRIBE

When analysis requires computation (statistics, data processing, scoring, enrichment), write and run Python code via Bash. Don't describe what you would do — execute it and report actual results. Use ToolUniverse tools to retrieve data, then Python (pandas, scipy, statsmodels, matplotlib) to analyze it.
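
As a small illustration of computing rather than describing, the sketch below calculates a Shannon diversity index from taxon read counts using only the standard library. The function name and the counts are made up for illustration; in practice the counts would come from a ToolUniverse retrieval, and pandas or scipy could replace the hand-rolled arithmetic.

```python
import math


def shannon_index(counts):
    """Shannon diversity H = -sum(p * ln p) over taxa with non-zero counts."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must contain at least one positive value")
    # Zero counts contribute nothing to H, so they are skipped.
    proportions = [c / total for c in counts if c > 0]
    return -sum(p * math.log(p) for p in proportions)


# Illustrative counts only, not real data.
example_counts = [120, 80, 40, 10]
print(f"Shannon index: {shannon_index(example_counts):.3f}")
```

The point is to report the computed number itself, not a description of how it could be obtained.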

Core Databases

Database	Best For
MGnify	Processed metagenomics studies, taxonomic/functional results
GTDB	Standardized bacterial/archaeal taxonomy, species-level resolution
GMrepo	Gut species-to-human-health phenotype associations
ENA	Raw sequencing datasets and study metadata
KEGG	Pathway mapping for microbial functional annotations
PubMed/EuropePMC	Published microbiome-disease studies
CTD	Chemical-microbiome-disease relationships

Workflow

Phase 0: Parse query → organism, biome, phenotype, or accession
Phase 1: Study Discovery → MGnify_search_studies, ENAPortal_search_studies
Phase 2: Taxonomic Classification → GTDB_search_genomes, GTDB_get_species, GTDB_search_taxon
Phase 3: Genome Quality → MGnify_search_genomes, MGnify_get_genome (CheckM metrics)
Phase 4: Functional Annotation → MGnify GO terms + KEGG pathway mapping
Phase 5: Clinical Associations → GMrepo species-phenotype links
Phase 6: Literature → PubMed/EuropePMC + CTD gene-disease
Phase 7: Interpretation & Report Synthesis
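
Phase 0 can be partly mechanized: accessions have recognizable shapes, while organism, biome, and phenotype terms need judgment or a lookup. The sketch below only covers the accession case. The pattern table and the function name accession_kind are assumptions for illustration and are not part of ToolUniverse; the patterns are deliberately loose about digit counts.

```python
import re

# Assumed accession shapes; extend or tighten as needed.
ACCESSION_PATTERNS = {
    "mgnify_study": re.compile(r"^MGYS\d+$"),
    "mgnify_genome": re.compile(r"^MGYG\d+$"),
    "insdc_project": re.compile(r"^PRJ(?:EB|NA|DB)\d+$"),
    "insdc_study": re.compile(r"^[EDS]RP\d+$"),
}


def accession_kind(query):
    """Return a label if the query looks like an accession, else None."""
    token = query.strip().upper()
    for kind, pattern in ACCESSION_PATTERNS.items():
        if pattern.match(token):
            return kind
    # Not an accession: treat as organism, biome, or phenotype text.
    return None
```

A None result means the query should be routed to the free-text branches of Phase 1 and Phase 2 rather than to a direct accession lookup.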

Key Phase Notes

Phase 1: ENA requires structured queries (e.g., study_title="*IBD*"), not free text. If ENA fails, fall back to MGnify.
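
A structured ENA query string of the form shown above can be built with a tiny helper. This is a sketch: ena_title_query is a hypothetical name, and how the resulting string is passed to ENAPortal_search_studies is not shown because the source does not specify that tool's parameters.

```python
def ena_title_query(keyword):
    """Build a wildcard study_title filter, e.g. study_title="*IBD*"."""
    # Drop embedded double quotes so the filter stays well-formed.
    cleaned = keyword.strip().replace('"', "")
    if not cleaned:
        raise ValueError("keyword must not be empty")
    return f'study_title="*{cleaned}*"'


print(ena_title_query("IBD"))
```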

Phase 2: GTDB uses its own naming (e.g., s__Bacteroides_A fragilis vs NCBI Bacteroides fragilis). Always note discrepancies. Use GTDB_search_taxon(operation="search_taxon", query=name).
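
One way to surface such discrepancies is to strip GTDB's rank prefix and alphabetic placeholder suffixes, then compare against the NCBI name. The sketch below assumes the usual GTDB conventions (prefixes such as s__ and g__, suffixes such as _A); the function names are hypothetical, and the stripped string is only a candidate NCBI name, not an authoritative mapping.

```python
import re

_RANK_PREFIX = re.compile(r"^[dpcofgs]__")
_PLACEHOLDER_SUFFIX = re.compile(r"_[A-Z]+\b")


def strip_rank_prefix(gtdb_name):
    """Remove a leading rank prefix such as 's__' or 'g__'."""
    return _RANK_PREFIX.sub("", gtdb_name.strip())


def candidate_ncbi_name(gtdb_name):
    """Drop placeholder suffixes like '_A' to guess the NCBI-style name."""
    return _PLACEHOLDER_SUFFIX.sub("", strip_rank_prefix(gtdb_name))


def names_differ(gtdb_name, ncbi_name):
    """True when the GTDB name, minus its rank prefix, differs from NCBI's."""
    return strip_rank_prefix(gtdb_name) != ncbi_name.strip()


gtdb = "s__Bacteroides_A fragilis"
print(candidate_ncbi_name(gtdb), names_differ(gtdb, "Bacteroides fragilis"))
```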

Phase 3 - Quality tiers (MIMAG):

High: >= 90% complete, <= 5% contamination, 5S/16S/23S rRNA genes + >= 18 tRNAs
Medium: >= 50% complete, <= 10% contamination
Low: below medium -- flag but don't exclude
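
The tiers above translate directly into a small classifier. This is a minimal sketch: the GenomeQuality fields are hypothetical names, and mapping them from an MGnify_get_genome response is left out because the source does not describe the response layout.

```python
from dataclasses import dataclass


@dataclass
class GenomeQuality:
    completeness: float      # CheckM completeness, percent
    contamination: float     # CheckM contamination, percent
    has_rrna: bool = False   # rRNA genes detected
    trna_count: int = 0      # number of distinct tRNAs found


def mimag_tier(q):
    """Assign a quality tier using the thresholds listed above."""
    if (q.completeness >= 90 and q.contamination <= 5
            and q.has_rrna and q.trna_count >= 18):
        return "high"
    if q.completeness >= 50 and q.contamination <= 10:
        return "medium"
    # Low-tier genomes are flagged, not excluded.
    return "low"
```

A genome that meets the completeness and contamination cutoffs for the high tier but lacks rRNA or tRNA evidence falls to medium, which matches the tier definitions as written.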

Phase 4 - Functional interpretation: Don't just list GO terms. Connect to biology:

Functional Category	Key KEGG Pathways	Significance
SCFA production	map00650, map00640	Gut barrier, anti-inflammatory
LPS biosynthesis	map00540	Pro-inflammatory, endotoxemia
Bile acid metabolism	map00120	Fat absorption, FXR signaling
Tryptophan metabolism	map00380	Serotonin, AhR, immune
Vitamin biosynthesis	map00730, map00740, map00760	Host nutritional contribution

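Use kegg_search_pathway(keyword=...); the parameter is keyword, NOT query. Pathway IDs passed to tools need an organism or KO prefix (hsa, ko, eco), NOT the bare map prefix, so the map IDs in the table above become, for example, ko00650.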
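
Swapping the prefix is mechanical, so it can be done in code before any lookup. The helper below is a sketch with a hypothetical name; it defaults to the ko prefix as an assumption and leaves already-prefixed or unrecognized IDs unchanged.

```python
import re

# Matches a bare five-digit pathway number, with or without 'map'.
_MAP_ID = re.compile(r"^(?:map)?(\d{5})$")


def with_prefix(pathway_id, prefix="ko"):
    """Rewrite map00650 or 00650 as <prefix>00650; pass other IDs through."""
    pid = pathway_id.strip()
    match = _MAP_ID.match(pid)
    if match is None:
        return pid
    return f"{prefix}{match.group(1)}"


for pid in ("map00650", "map00640", "map00540"):
    print(pid, "->", with_prefix(pid))
```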

Phase 5: GMrepo uses MeSH terms: "Crohn Disease" not "IBD", "Colitis, Ulcerative" not "UC", "Colorectal Neoplasms" not "colorectal cancer". Try NCBI taxon IDs if species name fails.
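
The substitutions named above can be kept in a small lookup so that informal terms are normalized before a GMrepo call. This is a sketch: the table holds only the three pairs given in this note, and to_mesh is a hypothetical helper, not a ToolUniverse function.

```python
# Informal term (lower-case) -> MeSH heading, taken from the note above.
MESH_TERMS = {
    "ibd": "Crohn Disease",
    "uc": "Colitis, Ulcerative",
    "colorectal cancer": "Colorectal Neoplasms",
}


def to_mesh(term):
    """Return the MeSH heading for a known informal term, else the input."""
    cleaned = term.strip()
    return MESH_TERMS.get(cleaned.lower(), cleaned)


print(to_mesh("IBD"))
```

Terms that are not in the table pass through unchanged, so a zero-result query is still a cue to look up the formal MeSH heading or try an NCBI taxon ID.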

Phase 6 - Evidence grading:

Strong: Meta-analysis or >5 studies, consistent direction
Moderate: 2-5 studies consistent, or 1 large cohort
Preliminary: Single study or conflicting
Mechanistic only: In vitro/animal, no human epidemiology
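
The grading rules above can be written as a function so that grades are applied the same way across taxa. This is a sketch under stated assumptions: the parameter names are hypothetical, a meta-analysis is treated as sufficient for a strong grade on its own, and the order in which the rules are checked is an interpretation of the list, not something the source specifies.

```python
def grade_evidence(n_human_studies, consistent,
                   has_meta_analysis=False, has_large_cohort=False):
    """Map study counts and consistency onto the four evidence grades."""
    if n_human_studies == 0:
        # Only in vitro or animal work is available.
        return "mechanistic only"
    if has_meta_analysis or (n_human_studies > 5 and consistent):
        return "strong"
    if (2 <= n_human_studies <= 5 and consistent) or has_large_cohort:
        return "moderate"
    # Single small study, or studies that conflict.
    return "preliminary"
```

For example, six studies with conflicting directions grade as preliminary here, because neither the strong nor the moderate rule applies without consistency.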

Phase 7 - Report: Executive summary, study landscape, GTDB taxonomy, functional interpretation (not GO term lists), clinical relevance with evidence grades, mechanistic model, genome catalog with quality tiers, data gaps.

Edge Cases & Fallbacks

Taxon not in GTDB: Try partial search or fall back to MGnify (NCBI taxonomy)
No GMrepo data: Normal for non-gut organisms; use literature
GMrepo 0 results: Use formal MeSH terms or NCBI taxon IDs
No KEGG match: Check MetaCyc or literature

Limitations

GMrepo: Gut-only
GTDB: Bacteria/Archaea only
ENA: Raw data only, strict query syntax
No sequence analysis: Queries databases, not raw FASTQ/FASTA
