From bioinfo-skills
Sets up and executes BioGeoBEARS phylogenetic biogeographic analyses: validates input files, generates RMarkdown workflows, and produces ancestral range visualizations.
Slash command: `/bioinfo-skills:biogeobears`
BioGeoBEARS (BioGeography with Bayesian and Likelihood Evolutionary Analysis in R Scripts) performs probabilistic inference of ancestral geographic ranges on phylogenetic trees. This skill helps set up complete biogeographic analyses by:
Use this skill when users request:
The skill triggers when users mention phylogenetic biogeography, ancestral area reconstruction, or provide tree + distribution data.
Users must provide:
Phylogenetic tree (Newick format, .nwk, .tre, or .tree file)
Geographic distribution data (any tabular format)
When a user requests a BioGeoBEARS analysis, ask for:
Input file paths:
Analysis parameters (if not specified):
Use the AskUserQuestion tool to gather this information efficiently:
Example questions:
- "Maximum range size" - options based on number of areas (e.g., for 4 areas: "All 4 areas", "3 areas", "2 areas")
- "Models to compare" - options: "All 6 models (recommended)", "Only base models (DEC, DIVALIKE, BAYAREALIKE)", "Only +J models", "Custom selection"
- "Visualization type" - options: "Pie charts (show probabilities)", "Text labels (show most likely states)", "Both"
Use the Read tool to check the tree file:
```r
# In R, basic validation:
library(ape)
tr <- read.tree("path/to/tree.nwk")
print(paste("Tips:", length(tr$tip.label)))
print(paste("Rooted:", is.rooted(tr)))
print(tr$tip.label)  # Check species names
```
Verify:
Use scripts/validate_geography_file.py to validate or reformat the geography file.
If file is already in PHYLIP format (starts with numbers):
```bash
python scripts/validate_geography_file.py path/to/geography.txt --validate --tree path/to/tree.nwk
```
This checks:
If file is in CSV/TSV format (needs reformatting):
```bash
python scripts/validate_geography_file.py path/to/distribution.csv --reformat -o geography.data --delimiter ","
```
Or for tab-delimited:
```bash
python scripts/validate_geography_file.py path/to/distribution.txt --reformat -o geography.data --delimiter tab
```
The script will:
Always validate the reformatted file before proceeding:
```bash
python scripts/validate_geography_file.py geography.data --validate --tree path/to/tree.nwk
```
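To make the validation step concrete, here is a minimal sketch of the kind of checks a PHYLIP-style geography file needs. It is not the code of `validate_geography_file.py`. It assumes the usual BioGeoBEARS layout: a header line `<n_species> <n_areas> (A B C)` followed by one `name<TAB>binary_code` line per species. The function name `parse_geography` is hypothetical.

```python
import re


def parse_geography(text):
    """Parse a PHYLIP-style geography file into (areas, {species: code}).

    Assumed layout (not taken from the skill's script):
        3	4 (K O M H)
        sp_a	1000
    Raises ValueError on the first inconsistency found.
    """
    lines = [ln for ln in text.splitlines() if ln.strip()]
    header = re.match(r"^\s*(\d+)\s+(\d+)\s*\(([^)]*)\)\s*$", lines[0])
    if header is None:
        raise ValueError("header must look like '<n_species> <n_areas> (A B C)'")
    n_species, n_areas = int(header.group(1)), int(header.group(2))
    areas = header.group(3).split()
    if len(areas) != n_areas:
        raise ValueError(f"header declares {n_areas} areas but names {len(areas)}")

    ranges = {}
    for ln in lines[1:]:
        parts = ln.split()
        if len(parts) != 2:
            raise ValueError(f"expected 'name<TAB>code', got {ln!r}")
        name, code = parts
        # Each code needs exactly one 0/1 digit per area.
        if len(code) != n_areas or set(code) - {"0", "1"}:
            raise ValueError(f"{name}: {code!r} is not {n_areas} binary digits")
        if name in ranges:
            raise ValueError(f"duplicate species: {name}")
        ranges[name] = code

    if len(ranges) != n_species:
        raise ValueError(f"header declares {n_species} species, found {len(ranges)}")
    return areas, ranges


example = "3\t4 (K O M H)\nsp_a\t1000\nsp_b\t0110\nsp_c\t0001\n"
areas, ranges = parse_geography(example)
print(areas)
print(ranges)
```

The sketch stops at the first problem. A real validator would more likely collect all problems and report them together.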
Create an organized directory for the analysis:
```
biogeobears_analysis/
├── input/
│   ├── tree.nwk                  # Original or copied tree
│   ├── geography.data            # Validated/reformatted geography file
│   └── original_data/            # Original input files
│       ├── original_tree.nwk
│       └── original_distribution.csv
├── scripts/
│   └── run_biogeobears.Rmd       # Generated RMarkdown script
├── results/                      # Created by analysis (output directory)
│   ├── [MODEL]_result.Rdata      # Saved model results
│   └── plots/                    # Visualization outputs
│       ├── [MODEL]_pie.pdf
│       └── [MODEL]_text.pdf
└── README.md                     # Analysis documentation
```
Create this structure programmatically:
```bash
mkdir -p biogeobears_analysis/input/original_data
mkdir -p biogeobears_analysis/scripts
mkdir -p biogeobears_analysis/results/plots

# Copy files (original_files is a placeholder for the user's untouched inputs)
cp path/to/tree.nwk biogeobears_analysis/input/
cp geography.data biogeobears_analysis/input/
cp original_files biogeobears_analysis/input/original_data/
```
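If the setup is driven from Python rather than the shell, the same layout can be built with `pathlib`. This is a sketch under assumptions: the function name `setup_analysis_dir` is hypothetical, and the demo uses placeholder input files in a temporary directory.

```python
import shutil
import tempfile
from pathlib import Path

SUBDIRS = ("input/original_data", "scripts", "results/plots")


def setup_analysis_dir(root, tree_file, geog_file):
    """Create the analysis layout and copy the two input files into it."""
    root = Path(root)
    for sub in SUBDIRS:
        (root / sub).mkdir(parents=True, exist_ok=True)  # same effect as `mkdir -p`
    # Working copies use the fixed names from the layout above.
    shutil.copy2(tree_file, root / "input" / "tree.nwk")
    shutil.copy2(geog_file, root / "input" / "geography.data")
    # Keep an untouched copy of the tree under its original file name.
    shutil.copy2(tree_file, root / "input" / "original_data" / Path(tree_file).name)
    return root


# Demo in a throwaway directory with placeholder input files.
with tempfile.TemporaryDirectory() as tmp:
    tmp = Path(tmp)
    (tmp / "my_tree.nwk").write_text("(sp_a:1,sp_b:1);\n")
    (tmp / "geog.txt").write_text("2\t2 (A B)\nsp_a\t10\nsp_b\t01\n")
    root = setup_analysis_dir(
        tmp / "biogeobears_analysis", tmp / "my_tree.nwk", tmp / "geog.txt"
    )
    created = sorted(p.relative_to(root).as_posix() for p in root.rglob("*"))
    print("\n".join(created))
```

Because `exist_ok=True` is set, rerunning the function on an existing directory does not fail. It does overwrite the working copies in `input/`.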
Use the template at scripts/biogeobears_analysis_template.Rmd and customize it with user parameters.
Copy and customize the template:
```bash
cp scripts/biogeobears_analysis_template.Rmd biogeobears_analysis/scripts/run_biogeobears.Rmd
```
Create a parameter file or modify the YAML header in the Rmd to use the user's specific settings:
Example customization via R code:
```r
# Edit YAML parameters programmatically or provide as params when rendering
rmarkdown::render(
  "biogeobears_analysis/scripts/run_biogeobears.Rmd",
  params = list(
    tree_file = "../input/tree.nwk",
    geog_file = "../input/geography.data",
    max_range_size = 4,
    models = "DEC,DEC+J,DIVALIKE,DIVALIKE+J,BAYAREALIKE,BAYAREALIKE+J",
    output_dir = "../results"
  ),
  output_file = "../results/biogeobears_report.html"
)
```
Or create a run script:
```bash
#!/bin/bash
# biogeobears_analysis/run_analysis.sh
# Run from scripts/ so the relative paths in params resolve correctly.
cd "$(dirname "$0")/scripts"
R -e "rmarkdown::render('run_biogeobears.Rmd', params = list(
  tree_file = '../input/tree.nwk',
  geog_file = '../input/geography.data',
  max_range_size = 4,
  models = 'DEC,DEC+J,DIVALIKE,DIVALIKE+J,BAYAREALIKE,BAYAREALIKE+J',
  output_dir = '../results'
), output_file = '../results/biogeobears_report.html')"
```
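When the parameters come from user answers, the render call can be assembled programmatically instead of hand-editing quoted strings. The sketch below builds the R expression and an `Rscript -e` argument list from a Python dict. The helper names `r_literal` and `render_call` are hypothetical, and only strings, numbers and booleans are handled.

```python
def r_literal(value):
    """Format a Python value as an R literal (strings, numbers, booleans only)."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "TRUE" if value else "FALSE"
    if isinstance(value, (int, float)):
        return str(value)
    escaped = str(value).replace("\\", "\\\\").replace("'", "\\'")
    return f"'{escaped}'"


def render_call(rmd, params, output_file):
    """Build the rmarkdown::render(...) expression as a single string."""
    args = ", ".join(f"{key} = {r_literal(val)}" for key, val in params.items())
    return (
        f"rmarkdown::render({r_literal(rmd)}, params = list({args}), "
        f"output_file = {r_literal(output_file)})"
    )


params = {
    "tree_file": "../input/tree.nwk",
    "geog_file": "../input/geography.data",
    "max_range_size": 4,
    "models": "DEC,DEC+J,DIVALIKE,DIVALIKE+J,BAYAREALIKE,BAYAREALIKE+J",
    "output_dir": "../results",
}
call = render_call("run_biogeobears.Rmd", params, "../results/biogeobears_report.html")
command = ["Rscript", "-e", call]  # argument list, so no shell quoting is involved
print(call)
```

The list in `command` could then be passed to `subprocess.run(command, cwd="biogeobears_analysis/scripts", check=True)`, which corresponds to the `cd` in the run script.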
Generate a README.md in the analysis directory explaining:
Example:
# BioGeoBEARS Analysis
## Overview
Biogeographic analysis of [NUMBER] species across [NUMBER] geographic areas.
## Input Data
- **Tree**: `input/tree.nwk` ([NUMBER] tips)
- **Geography**: `input/geography.data` ([NUMBER] species × [NUMBER] areas)
- **Areas**: [A, B, C, ...]
## Parameters
- Maximum range size: [NUMBER]
- Models tested: [LIST]
## Running the Analysis
### Option 1: Using RMarkdown directly
```r
library(rmarkdown)
render("scripts/run_biogeobears.Rmd",
       output_file = "../results/biogeobears_report.html")
```

### Option 2: Using the run script
```bash
bash run_analysis.sh
```

Results will be saved in `results/`:
- `biogeobears_report.html` - Full analysis report with visualizations
- `[MODEL]_result.Rdata` - Saved R objects for each model
- `plots/[MODEL]_pie.pdf` - Ancestral range reconstructions (pie charts)
- `plots/[MODEL]_text.pdf` - Ancestral range reconstructions (text labels)

The HTML report includes:
See references/biogeobears_details.md for detailed model descriptions.
```r
# Install BioGeoBEARS
install.packages("rexpokit")
install.packages("cladoRcpp")
install.packages("devtools")  # provides install_github()
library(devtools)
devtools::install_github(repo="nmatzke/BioGeoBEARS")

# Other packages
install.packages(c("ape", "rmarkdown", "knitr", "kableExtra"))
```
### Step 6: Provide User Instructions
After setting up the analysis, provide clear instructions to the user:
Analysis Setup Complete!
Directory structure created at: biogeobears_analysis/
📁 Files created:
✓ input/tree.nwk - Phylogenetic tree ([N] tips)
✓ input/geography.data - Geographic distribution data (validated)
✓ scripts/run_biogeobears.Rmd - RMarkdown analysis script
✓ README.md - Documentation and instructions
✓ run_analysis.sh - Convenience script to run analysis
📋 Next steps:
1. Review the README.md for analysis details
2. Install BioGeoBEARS if not already installed:
       install.packages("rexpokit")
       install.packages("cladoRcpp")
       install.packages("devtools")
       library(devtools)
       devtools::install_github(repo="nmatzke/BioGeoBEARS")
3. Run the analysis:
       cd biogeobears_analysis
       bash run_analysis.sh
   Or in R:
       setwd("biogeobears_analysis")
       rmarkdown::render("scripts/run_biogeobears.Rmd",
                         output_file = "../results/biogeobears_report.html")
4. View results:
⏱️ Expected runtime: [ESTIMATE based on tree size]
💡 The HTML report includes model comparison, parameter estimates, and visualization of ancestral ranges on your phylogeny.
## Analysis Parameter Guidance
When users ask for guidance on parameters, consult `references/biogeobears_details.md` and provide recommendations:
### Maximum Range Size
**Ask**: "What's the maximum number of areas a species in your group can realistically occupy?"
Common approaches:
- **Conservative**: Number of areas - 1 (prevents unrealistic cosmopolitan ancestral ranges)
- **Permissive**: All areas (if biologically plausible)
- **Data-driven**: Maximum observed in extant species
**Impact**: Larger values increase the number of possible ranges (states) combinatorially, so computation time rises steeply
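To show the user what a given maximum range size costs, the number of candidate ranges can be counted directly: it is the number of ways to choose between 1 and `max_range_size` areas out of the total. The sketch below assumes the empty (null) range is also counted as a state, which is an assumption about the run configuration; set `include_null=False` otherwise. The function name `n_ranges` is hypothetical.

```python
from math import comb


def n_ranges(n_areas, max_range_size, include_null=True):
    """Count candidate geographic ranges for a given maximum range size."""
    if not 1 <= max_range_size <= n_areas:
        raise ValueError("max_range_size must be between 1 and n_areas")
    total = sum(comb(n_areas, k) for k in range(1, max_range_size + 1))
    return total + (1 if include_null else 0)


# State counts for an 8-area problem at each possible maximum range size.
for size in range(1, 9):
    print(f"max_range_size={size}: {n_ranges(8, size)} states")
```

With 8 areas, capping ranges at 2 areas gives 37 states, while allowing all 8 gives 256. The transition matrix has one row and column per state, so the work per likelihood evaluation grows considerably faster than the state count itself.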
### Model Selection
**Default recommendation**: Run all 6 models for comprehensive comparison
- DEC, DIVALIKE, BAYAREALIKE (base models)
- DEC+J, DIVALIKE+J, BAYAREALIKE+J (+J variants)
**Rationale**:
- Model comparison is key to inference
- +J parameter is often significant
- Small additional computational cost
If computation is a concern, suggest starting with DEC and DEC+J.
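Model comparison is usually summarized with AIC and Akaike weights. The sketch below shows the arithmetic only. The log-likelihoods are hypothetical placeholder values, not results from any dataset, and the parameter counts assume the common setup in which base models have two free parameters (d, e) and +J variants have three (d, e, j).

```python
from math import exp


def aic(lnl, n_params):
    """Akaike information criterion: 2k - 2 ln(L)."""
    return 2 * n_params - 2 * lnl


def akaike_weights(aic_by_model):
    """Relative support for each model, normalized to sum to 1."""
    best = min(aic_by_model.values())
    rel = {m: exp(-0.5 * (a - best)) for m, a in aic_by_model.items()}
    total = sum(rel.values())
    return {m: r / total for m, r in rel.items()}


# HYPOTHETICAL (log-likelihood, free parameters) pairs, for illustration only.
fits = {"DEC": (-40.0, 2), "DEC+J": (-35.0, 3)}

aics = {model: aic(lnl, k) for model, (lnl, k) in fits.items()}
weights = akaike_weights(aics)
for model in sorted(aics, key=aics.get):
    print(f"{model}: AIC={aics[model]:.1f}, weight={weights[model]:.3f}")
```

For small trees, the corrected criterion AICc is often preferred. It adds a penalty term that depends on the sample size.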
### Visualization Options
**Pie charts** (`plotwhat = "pie"`):
- Show probability distributions across all possible states
- Better for conveying uncertainty
- Can be cluttered with many areas
**Text labels** (`plotwhat = "text"`):
- Show only maximum likelihood state
- Cleaner, easier to read
- Doesn't show uncertainty
**Recommendation**: Generate both in the analysis (template does this automatically)
## Common Issues and Troubleshooting
### Species Name Mismatches
**Symptom**: Error about species in tree not in geography file (or vice versa)
**Solution**: Use the validation script with `--tree` option to identify mismatches, then either:
1. Edit the geography file to match tree tip labels
2. Edit tree tip labels to match geography file
3. Remove species that aren't in both
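The mismatch check itself is a set comparison between tree tip labels and geography file names. The sketch below is not the validation script's code. It pulls tip labels out of a Newick string with a regular expression that assumes simple unquoted labels, and the function names are hypothetical. A tree library such as `ape` in R is more reliable for real files.

```python
import re


def newick_tips(newick):
    """Extract tip labels: tokens that directly follow '(' or ','.

    Assumes unquoted labels. Internal node labels follow ')' and are skipped.
    """
    return re.findall(r"[(,]\s*([^\s():,;\[\]']+)", newick)


def name_mismatches(tree_tips, geog_names):
    """Return (names only in the tree, names only in the geography file)."""
    tree_set, geog_set = set(tree_tips), set(geog_names)
    return sorted(tree_set - geog_set), sorted(geog_set - tree_set)


tips = newick_tips("((sp_a:1.0,sp_b:1.0):0.5,sp_c:1.5);")
only_tree, only_geog = name_mismatches(tips, ["sp_a", "sp_b", "sp_C"])
print("In tree only:", only_tree)
print("In geography only:", only_geog)
```

In this example the two lists differ only by letter case (`sp_c` versus `sp_C`), a typical mismatch that is fixed by editing one of the files.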
### Tree Not Rooted
**Symptom**: Error about unrooted tree
**Solution**:
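```r
library(ape)
tr <- read.tree("tree.nwk")
# resolve.root = TRUE gives a bifurcating root, so is.rooted(tr) returns TRUE
tr <- root(tr, outgroup = "outgroup_species_name", resolve.root = TRUE)
write.tree(tr, "tree_rooted.nwk")
```
Ask the user which species to use as the outgroup.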
**Symptom**: Validation errors about tabs, spaces, or binary codes
**Solution**: Use the reformat option:
```bash
python scripts/validate_geography_file.py input.csv --reformat -o geography.data
```
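For reference, the reformatting amounts to turning a presence/absence table into the header-plus-binary-codes layout. The sketch below is not the script's code. It assumes the first column holds species names and each remaining column is one area coded strictly as 0 or 1. Replacing spaces in names with underscores is also an assumption, and the function name `csv_to_geography` is hypothetical.

```python
import csv
import io


def csv_to_geography(csv_text, delimiter=","):
    """Convert a species-by-area 0/1 table to PHYLIP-style geography text."""
    rows = [r for r in csv.reader(io.StringIO(csv_text), delimiter=delimiter) if r]
    areas = [a.strip() for a in rows[0][1:]]
    out = [f"{len(rows) - 1}\t{len(areas)} ({' '.join(areas)})"]
    for row in rows[1:]:
        name = row[0].strip().replace(" ", "_")  # assumption: no spaces in names
        cells = [c.strip() for c in row[1:]]
        if len(cells) != len(areas) or any(c not in ("0", "1") for c in cells):
            raise ValueError(f"{name}: expected {len(areas)} cells of 0/1, got {cells}")
        out.append(f"{name}\t{''.join(cells)}")
    return "\n".join(out) + "\n"


table = "species,K,O\nsp a,1,0\nsp_b,0,1\n"
geog_text = csv_to_geography(table)
print(geog_text, end="")
```

For tab-delimited input, pass `delimiter="\t"`.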
**Symptom**: NA values in parameter estimates or very negative log-likelihoods
**Possible causes**:
**Solution**: Check input data quality and try a simpler model first (DEC only)
**Causes**:
**Solutions**:
- `force_sparse = TRUE` in run object

This skill includes:
- `validate_geography_file.py` - Validates and reformats geography files (usage: `python validate_geography_file.py --help`)
- `biogeobears_analysis_template.Rmd` - RMarkdown template for complete analysis
Load this reference when:
Always validate input files before analysis - saves time debugging later
Organize analysis in a dedicated directory - keeps everything together and reproducible
Run all 6 models by default - model comparison is crucial for biogeographic inference
Document parameters and decisions - analysis README helps with reproducibility
Generate both visualization types - pie charts for uncertainty, text labels for clarity
Save intermediate results - the RMarkdown template does this automatically
Check parameter estimates - unrealistic values suggest data or model issues
Provide context with visualizations - explain what dispersal/extinction rates mean for the user's system
When presenting results to users, explain:
User: "I have a phylogeny of 30 bird species and their distributions across 5 islands. Can you help me figure out where their ancestors lived?"
Claude (using this skill):
1. Ask for tree and distribution file paths
2. Validate tree file (check 30 tips, rooted)
3. Validate/reformat geography file (5 areas)
4. Ask about max_range_size (suggest 4 areas)
5. Ask about models (suggest all 6)
6. Set up biogeobears_analysis/ directory structure
7. Copy template RMarkdown script with parameters
8. Generate README.md and run_analysis.sh
9. Provide clear instructions to run analysis
10. Explain expected outputs and how to interpret them
Result: User has complete, ready-to-run analysis with documentation
This skill was created based on:
Time estimate for skill execution:
Analysis runtime (separate from skill execution):
Installation requirements (user must have):
When to consult references/:
`biogeobears_details.md` when users need detailed explanations of models, parameters, or interpretation