From tooluniverse
Discovers microbiome studies, classifies taxa with GTDB, assesses genome quality via CheckM, links species to clinical phenotypes, and interprets pathways using MGnify, ENA, GMrepo, KEGG, and EuropePMC.
Slash command
/tooluniverse:tooluniverse-metagenomics-analysis
Integrated pipeline for exploring microbiome studies, classifying taxa, assessing genome quality, linking microbial composition to clinical phenotypes, and interpreting findings through pathway analysis and literature context.
Guiding principles:
When uncertain about any scientific fact, SEARCH databases first rather than reasoning from memory.
When analysis requires computation (statistics, data processing, scoring, enrichment), write and run Python code via Bash. Don't describe what you would do — execute it and report actual results. Use ToolUniverse tools to retrieve data, then Python (pandas, scipy, statsmodels, matplotlib) to analyze it.
| Database | Best For |
|---|---|
| MGnify | Processed metagenomics studies, taxonomic/functional results |
| GTDB | Standardized bacterial/archaeal taxonomy, species-level resolution |
| GMrepo | Gut species-to-human-health phenotype associations |
| ENA | Raw sequencing datasets and study metadata |
| KEGG | Pathway mapping for microbial functional annotations |
| PubMed/EuropePMC | Published microbiome-disease studies |
| CTD | Chemical-microbiome-disease relationships |
Phase 0: Parse query → organism, biome, phenotype, or accession
Phase 1: Study Discovery → MGnify_search_studies, ENAPortal_search_studies
Phase 2: Taxonomic Classification → GTDB_search_genomes, GTDB_get_species, GTDB_search_taxon
Phase 3: Genome Quality → MGnify_search_genomes, MGnify_get_genome (CheckM metrics)
Phase 4: Functional Annotation → MGnify GO terms + KEGG pathway mapping
Phase 5: Clinical Associations → GMrepo species-phenotype links
Phase 6: Literature → PubMed/EuropePMC + CTD gene-disease
Phase 7: Interpretation & Report Synthesis
Phase 1: ENA requires structured queries (e.g., study_title="*IBD*"), not free text. If ENA fails, fall back to MGnify.
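As a minimal sketch of the structured-query form described for Phase 1, the helper below wraps a free-text term in the `field="*term*"` wildcard pattern. The function name `ena_title_query` is hypothetical and not part of ToolUniverse; only the `study_title="*IBD*"` shape comes from the text above.

```python
def ena_title_query(term: str, field: str = "study_title") -> str:
    """Wrap a free-text term in the field="*term*" wildcard form.

    Hypothetical helper: it only builds the query string and does not call ENA.
    """
    # Drop surrounding whitespace and any wildcards the caller already added.
    cleaned = term.strip().strip("*")
    if not cleaned:
        raise ValueError("search term must not be empty")
    if '"' in cleaned:
        raise ValueError("search term must not contain double quotes")
    return f'{field}="*{cleaned}*"'


if __name__ == "__main__":
    print(ena_title_query("IBD"))
```

The resulting string is what would be passed as the query to the ENA study search; if that search fails, the same plain term can be reused for the MGnify fallback.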
Phase 2: GTDB uses its own naming (e.g., s__Bacteroides_A fragilis vs NCBI Bacteroides fragilis). Always note discrepancies. Use GTDB_search_taxon(operation="search_taxon", query=name).
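One way to surface GTDB versus NCBI naming discrepancies is to strip the GTDB rank prefix and placeholder suffix before comparing. This is a heuristic sketch: the helper names are hypothetical, and it assumes GTDB placeholder suffixes take the form of an underscore followed by capital letters, as in the `s__Bacteroides_A fragilis` example above.

```python
import re

# Rank prefixes such as "s__" (species) or "g__" (genus).
_RANK_PREFIX = re.compile(r"^[dpcofgs]__")
# Placeholder suffixes such as "_A" appended to a genus or species word (assumed form).
_PLACEHOLDER_SUFFIX = re.compile(r"_[A-Z]+$")


def gtdb_to_plain_name(gtdb_name: str) -> str:
    """Return the GTDB name without rank prefix or placeholder suffixes."""
    name = _RANK_PREFIX.sub("", gtdb_name.strip())
    words = [_PLACEHOLDER_SUFFIX.sub("", word) for word in name.split()]
    return " ".join(words)


def naming_discrepancy(gtdb_name: str, ncbi_name: str) -> dict:
    """Summarize how a GTDB name relates to an NCBI name (string comparison only)."""
    without_prefix = _RANK_PREFIX.sub("", gtdb_name.strip())
    plain = gtdb_to_plain_name(gtdb_name)
    return {
        "gtdb": gtdb_name,
        "ncbi": ncbi_name,
        "same_base_name": plain.lower() == ncbi_name.strip().lower(),
        "has_gtdb_suffix": without_prefix != plain,
    }


if __name__ == "__main__":
    print(naming_discrepancy("s__Bacteroides_A fragilis", "Bacteroides fragilis"))
```

A matching base name with `has_gtdb_suffix` set to true is exactly the case worth noting in a report: the two databases use the same words but GTDB has split the taxon, so the names should not be treated as interchangeable.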
Phase 3 - Quality tiers (MIMAG):
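A minimal sketch of tier assignment from CheckM completeness and contamination, assuming the cut-offs of the published MIMAG standard (high: >90% complete and <5% contaminated; medium: >=50% and <10%; low: <50% and <10%). The function name and the `below-standard` label are hypothetical. MIMAG additionally requires rRNA and tRNA genes for the high-quality tier, which these two metrics cannot confirm.

```python
def mimag_tier(completeness: float, contamination: float) -> str:
    """Assign a MIMAG-style tier from CheckM percentages.

    Assumed thresholds (MIMAG standard); "high" here is based on
    completeness and contamination only, not on rRNA/tRNA content.
    """
    if not 0 <= completeness <= 100:
        raise ValueError("completeness must be between 0 and 100")
    if contamination < 0:
        raise ValueError("contamination must not be negative")

    if completeness > 90 and contamination < 5:
        return "high"
    if completeness >= 50 and contamination < 10:
        return "medium"
    if contamination < 10:
        return "low"
    # Contamination of 10% or more falls outside all three tiers.
    return "below-standard"


if __name__ == "__main__":
    for metrics in [(95.0, 2.0), (70.0, 8.0), (40.0, 3.0), (95.0, 12.0)]:
        print(metrics, mimag_tier(*metrics))
```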
Phase 4 - Functional interpretation: Don't just list GO terms. Connect to biology:
| Functional Category | Key KEGG Pathways | Significance |
|---|---|---|
| SCFA production | map00650, map00640 | Gut barrier, anti-inflammatory |
| LPS biosynthesis | map00540 | Pro-inflammatory, endotoxemia |
| Bile acid metabolism | map00120 | Fat absorption, FXR signaling |
| Tryptophan metabolism | map00380 | Serotonin, AhR, immune |
| Vitamin biosynthesis | map00730, map00740, map00760 | Host nutritional contribution |
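Use kegg_search_pathway(keyword=...); the parameter is keyword, NOT query. Pathway IDs passed to KEGG tools need an organism or KO prefix (hsa, ko, eco), NOT the bare map prefix, so convert the reference IDs in the table above before querying (e.g., map00650 becomes ko00650).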
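Because the table lists reference `map` IDs while lookups need a prefixed ID, a small conversion step can help. The sketch below swaps the `map` prefix for `ko` or an organism code; `with_prefix` and `FUNCTIONAL_PATHWAYS` are hypothetical names, and the dictionary simply restates the table.

```python
import re

# Reference map IDs copied from the functional-category table.
FUNCTIONAL_PATHWAYS = {
    "SCFA production": ["map00650", "map00640"],
    "LPS biosynthesis": ["map00540"],
    "Bile acid metabolism": ["map00120"],
    "Tryptophan metabolism": ["map00380"],
    "Vitamin biosynthesis": ["map00730", "map00740", "map00760"],
}

_MAP_ID = re.compile(r"^map(\d{5})$")


def with_prefix(map_id: str, prefix: str = "ko") -> str:
    """Replace the bare "map" prefix with "ko" or an organism code such as "eco"."""
    match = _MAP_ID.match(map_id.strip())
    if match is None:
        raise ValueError(f"not a reference map ID: {map_id!r}")
    if not re.fullmatch(r"[a-z]{2,4}", prefix):
        raise ValueError(f"unexpected prefix: {prefix!r}")
    return prefix + match.group(1)


if __name__ == "__main__":
    for category, ids in FUNCTIONAL_PATHWAYS.items():
        print(category, [with_prefix(i) for i in ids])
```

Note that an organism-prefixed ID is only meaningful if KEGG actually defines that pathway for the organism, so the `ko` form is the safer default for community-level data.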
Phase 5: GMrepo uses MeSH terms: "Crohn Disease" not "IBD", "Colitis, Ulcerative" not "UC", "Colorectal Neoplasms" not "colorectal cancer". Try NCBI taxon IDs if species name fails.
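The term substitutions listed for Phase 5 can be applied before querying with a small lookup. This sketch contains only the three mappings given above; `MESH_ALIASES` and `to_mesh` are hypothetical names, and unknown terms are passed through unchanged rather than guessed.

```python
# Informal term -> MeSH heading, restating the examples from the text.
MESH_ALIASES = {
    "ibd": "Crohn Disease",
    "uc": "Colitis, Ulcerative",
    "colorectal cancer": "Colorectal Neoplasms",
}


def to_mesh(term: str) -> str:
    """Return the MeSH heading for a known informal term, else the term itself."""
    # Normalize case and collapse repeated whitespace before the lookup.
    key = " ".join(term.lower().split())
    return MESH_ALIASES.get(key, term.strip())


if __name__ == "__main__":
    for raw in ["IBD", "UC", "colorectal cancer", "Obesity"]:
        print(raw, "->", to_mesh(raw))
```

If a phenotype lookup still returns nothing after this substitution, the fallback described above applies: retry the species side of the query with an NCBI taxon ID instead of a name.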
Phase 6 - Evidence grading:
Phase 7 - Report: Executive summary, study landscape, GTDB taxonomy, functional interpretation (not GO term lists), clinical relevance with evidence grades, mechanistic model, genome catalog with quality tiers, data gaps.
npx claudepluginhub mims-harvard/tooluniverse --plugin tooluniverse