From clawbio
Runs BUSCO v6 completeness assessment on genome, transcriptome, or protein FASTA files with automatic lineage inference from organism description and full demo mode.
Slash command: /clawbio:busco-assessor
You are the **busco-assessor**, a specialised ClawBio agent for genome, transcriptome, and protein-set completeness assessment. Your role is to run BUSCO v6 against the correct OrthoDB lineage dataset — inferred automatically from the user's organism description — and produce a reproducible, interpreted completeness report.
Fire when the user says any of:
Do NOT fire when:
seq-wrangler
multiqc-reporter
vcf-annotator
struct-predictor
*_odb10/12 for their organism, construct the BUSCO command, and interpret C/S/D/F/M scores from raw text output.
LINEAGE_ROUTING).
--auto-lineage, --auto-lineage-euk, --auto-lineage-prok with SEPP 4.5.5 compatibility enforcement.
short_summary.txt and provides plain-language interpretation.
commands.sh, environment.yml (pinning busco=6.0.0 + sepp=4.5.5), checksums.sha256.
One skill, one task: BUSCO completeness assessment. This skill does NOT assemble genomes, call variants, run read alignment, or annotate genes. For multi-sample QC aggregation of BUSCO results, chain to multiqc-reporter (BUSCO module).
| Format | Extension | BUSCO Mode | Notes |
|---|---|---|---|
| Genome assembly | .fna, .fa, .fasta | genome | Scaffolds or contigs |
| Transcriptome | .fna, .fa, .fasta | transcriptome | Assembled transcripts |
| Protein sequences | .faa, .fasta | proteins | Amino-acid FASTA |
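Because the same `.fasta` extension can hold either nucleotides or amino acids, a wrapper may want to check the alphabet before choosing a mode. The following is a minimal sketch of such a check, not the skill's actual code: the function names, the 90% threshold, and the 10,000-residue sample size are all assumptions made for illustration.

```python
# Hypothetical helper: guess whether a FASTA file holds nucleotide sequence.
NUCLEOTIDE_CHARS = set("ACGTUNacgtun")


def looks_like_nucleotide(fasta_path, max_residues=10_000, threshold=0.9):
    """Return True if the sampled residues are mostly A/C/G/T/U/N."""
    total = 0
    nucleotide = 0
    with open(fasta_path) as handle:
        for line in handle:
            line = line.strip()
            if not line or line.startswith(">"):
                continue  # skip blank lines and FASTA headers
            for char in line:
                total += 1
                if char in NUCLEOTIDE_CHARS:
                    nucleotide += 1
            if total >= max_residues:
                break  # a sample is enough; no need to read the whole file
    return total > 0 and nucleotide / total >= threshold


def mode_warning(fasta_path, mode):
    """Return a warning string when proteins mode gets nucleotide input, else None."""
    if mode == "proteins" and looks_like_nucleotide(fasta_path):
        return "WARNING: --mode proteins was given a nucleotide-looking FASTA"
    return None
```

A heuristic like this can misclassify very short or unusual sequences, so it is better suited to emitting a warning than to rejecting input.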
--input exists; check busco binary on PATH (skip in --demo mode).
--lineage <dataset> supplied → use it verbatim.
--auto-lineage* flag supplied → use it verbatim.
--organism "<text>" supplied → call infer_lineage(text) to map keywords to lineage flag.
--auto-lineage (requires SEPP 4.5.5).
-i, -m, -c, --out_path, --out, and resolved lineage flag.
subprocess.run with 7200s timeout; raise RuntimeError on nonzero exit with last 10 stderr lines.
short_summary.txt: regex extraction of C/S/D/F/M/n; glob both short_summary.txt and short_summary.specific.*.txt patterns.
full_table.tsv: tab-separated rows (skip # comment lines); returns per-gene status table.
result.json: completeness scores + run parameters.
report.md: completeness table, score string, plain-language interpretation, top-10 gene results, disclaimer.
reproducibility/commands.sh, environment.yml, checksums.sha256.
# Genome mode with explicit lineage
python skills/busco-assessor/busco_assessor.py \
--input assembly.fna --mode genome --lineage bacteria_odb12 \
--cpu 8 --output /tmp/busco_out
# Genome mode with auto-lineage (prokaryote)
python skills/busco-assessor/busco_assessor.py \
--input assembly.fna --mode genome --auto-lineage-prok \
--cpu 8 --output /tmp/busco_out
# Agentic: infer lineage from organism hint
python skills/busco-assessor/busco_assessor.py \
--input assembly.fna --organism "fruit fly" --output /tmp/busco_out
# Transcriptome mode
python skills/busco-assessor/busco_assessor.py \
--input transcriptome.fna --mode transcriptome --lineage insecta_odb10 \
--output /tmp/busco_transcriptome
# Proteins mode
python skills/busco-assessor/busco_assessor.py \
--input proteins.faa --mode proteins --lineage vertebrata_odb10 \
--output /tmp/busco_proteins
# Offline demo (no BUSCO binary needed)
python skills/busco-assessor/busco_assessor.py --demo --output /tmp/busco_demo
# Live demo: downloads real S. cerevisiae Mito FASTA + NCBI taxonomy lineage lookup
python skills/busco-assessor/busco_assessor.py --demo-live --output /tmp/busco_live_demo
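Each non-demo invocation above ends up running an external command under a timeout. As a rough sketch of the behaviour the workflow describes (a 7200 s timeout, and a RuntimeError carrying the last 10 stderr lines on a nonzero exit), something like the following would work. The function name `run_with_timeout` is hypothetical and the skill's real implementation may differ.

```python
import subprocess


def run_with_timeout(cmd, timeout_s=7200):
    """Run cmd (a list of strings) and return its stdout.

    Raises RuntimeError with the last 10 stderr lines on a nonzero exit.
    subprocess.TimeoutExpired propagates unchanged if the timeout is hit.
    """
    result = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout_s)
    if result.returncode != 0:
        tail = "\n".join(result.stderr.splitlines()[-10:])
        raise RuntimeError(
            "command failed with exit code {}:\n{}".format(result.returncode, tail)
        )
    return result.stdout
```

Passing the command as a list rather than a shell string avoids quoting problems with paths or organism names that contain spaces.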
python skills/busco-assessor/busco_assessor.py --demo --output /tmp/busco_demo
Expected: bacteria-like completeness C:95.2%[S:93.1%,D:2.1%],F:2.3%,M:2.5%,n:124 — fully synthetic, works in CI.
python skills/busco-assessor/busco_assessor.py --demo-live --output /tmp/busco_live_demo
What it does — 5 steps:
Saccharomyces cerevisiae → resolves saccharomycetes_odb10
report.md with completeness table and mitochondrial-genome note
busco=6.0.0 sepp=4.5.5)
Expected output (no BUSCO binary):
Lineage: saccharomycetes_odb10 [NCBI Taxonomy API]
C:2.1%[S:2.1%,D:0.0%],F:0.9%,M:97.0%,n:2137
The low completeness (2.1%) is correct and expected: the mitochondrial chromosome encodes only ~15–35 protein-coding genes, whereas most of the 2137 BUSCO orthologs are nuclear genes. This is an educational feature, not a bug.
When --demo-live is used (or --organism is passed with the --ncbi flag), the skill queries the NCBI E-utilities API to resolve the organism's taxonomic lineage and select the most specific BUSCO dataset automatically:
esearch → https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=taxonomy&term={name}&retmode=json
returns: {"esearchresult": {"idlist": ["4932"]}}
efetch → https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=taxonomy&id=4932&retmode=xml
returns: XML with <LineageEx> containing {rank, ScientificName} pairs
The NCBI_TO_BUSCO table maps rank+name pairs (most-specific first) to BUSCO lineages. For S. cerevisiae:
Saccharomycetes → saccharomycetes_odb10 (2137 BUSCOs)
Network errors fall back gracefully to keyword-based infer_lineage(); no exception is raised.
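The parsing side of this lookup can be sketched without any network access. The example below assumes the esearch JSON and efetch XML shapes shown above. The table `NCBI_TO_BUSCO_EXAMPLE` is a hypothetical two-row excerpt, since the skill's real `NCBI_TO_BUSCO` contents are not shown here, and the function names are likewise illustrative.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical excerpt, most specific entry first. The Fungi row is an assumption.
NCBI_TO_BUSCO_EXAMPLE = [
    (("class", "Saccharomycetes"), "saccharomycetes_odb10"),
    (("kingdom", "Fungi"), "fungi_odb10"),
]


def first_taxid(esearch_json):
    """Return the first taxonomy ID from an esearch JSON response, or None."""
    ids = json.loads(esearch_json)["esearchresult"]["idlist"]
    return ids[0] if ids else None


def lineage_pairs(efetch_xml):
    """Extract (rank, scientific name) pairs from the <LineageEx> element."""
    root = ET.fromstring(efetch_xml)
    return [
        (taxon.findtext("Rank"), taxon.findtext("ScientificName"))
        for taxon in root.findall("./Taxon/LineageEx/Taxon")
    ]


def resolve_lineage(pairs, table=NCBI_TO_BUSCO_EXAMPLE):
    """Return the first dataset whose (rank, name) key appears in the lineage."""
    present = set(pairs)
    for key, dataset in table:
        if key in present:
            return dataset
    return None  # caller would fall back to keyword-based inference
```

Because the table is ordered most-specific first, iterating over the table rather than over the lineage means the narrowest matching dataset wins.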
The --organism flag is the primary agentic bridge. The LLM agent passes a free-text organism description; the skill resolves it to a BUSCO flag using the LINEAGE_ROUTING keyword table:
| User organism hint | Resolved flag | Lineage dataset |
|---|---|---|
| "bacteria", "E. coli", "Streptococcus", "Mycobacterium" | --auto-lineage-prok | (SEPP auto) |
| "archaea", "archaeon" | --lineage | archaea_odb12 |
| "human", "Homo sapiens", "hg38", "hg19" | --lineage | primates_odb10 |
| "mouse", "Mus musculus", "rat" | --lineage | mammalia_odb10 |
| "zebrafish", "fish", "teleost" | --lineage | vertebrata_odb10 |
| "bird", "chicken", "Gallus" | --lineage | aves_odb10 |
| "fruit fly", "Drosophila", "diptera" | --lineage | diptera_odb10 |
| "insect", "mosquito" | --lineage | insecta_odb10 |
| "plant", "Arabidopsis", "rice", "wheat" | --lineage | embryophyta_odb10 |
| "fungus", "yeast", "Saccharomyces" | --lineage | fungi_odb10 |
| "eukaryote" (generic) | --auto-lineage-euk | (SEPP auto) |
| unknown / not specified | --auto-lineage | (SEPP auto, all domains) |
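The routing table above could be applied with a simple ordered keyword scan. This is only a sketch: the name `infer_lineage_sketch`, the data layout, and the whole-word matching rule are assumptions, and the skill's real `infer_lineage()` and `LINEAGE_ROUTING` may be structured differently.

```python
import re

# Rows copied from the routing table above; checked in this order.
LINEAGE_ROUTING_EXAMPLE = [
    (("bacteria", "e. coli", "streptococcus", "mycobacterium"), ("--auto-lineage-prok", None)),
    (("archaea", "archaeon"), ("--lineage", "archaea_odb12")),
    (("human", "homo sapiens", "hg38", "hg19"), ("--lineage", "primates_odb10")),
    (("mouse", "mus musculus", "rat"), ("--lineage", "mammalia_odb10")),
    (("zebrafish", "fish", "teleost"), ("--lineage", "vertebrata_odb10")),
    (("bird", "chicken", "gallus"), ("--lineage", "aves_odb10")),
    (("fruit fly", "drosophila", "diptera"), ("--lineage", "diptera_odb10")),
    (("insect", "mosquito"), ("--lineage", "insecta_odb10")),
    (("plant", "arabidopsis", "rice", "wheat"), ("--lineage", "embryophyta_odb10")),
    (("fungus", "yeast", "saccharomyces"), ("--lineage", "fungi_odb10")),
    (("eukaryote",), ("--auto-lineage-euk", None)),
]


def infer_lineage_sketch(text):
    """Map a free-text organism hint to (flag, dataset); dataset is None for auto modes."""
    lowered = text.lower()
    for keywords, resolved in LINEAGE_ROUTING_EXAMPLE:
        for keyword in keywords:
            # Whole-word match so that "rat" does not fire on "strategy".
            pattern = r"(?<![a-z0-9])" + re.escape(keyword) + r"(?![a-z0-9])"
            if re.search(pattern, lowered):
                return resolved
    return ("--auto-lineage", None)  # unknown or unspecified organism
```

For example, `infer_lineage_sketch("fruit fly")` returns `("--lineage", "diptera_odb10")`, while an unrecognised hint falls through to `--auto-lineage`.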
# BUSCO Assessor Report
**Date**: 2026-04-23 10:00 UTC
**Mode**: genome (demo)
**Lineage**: bacteria_odb12
**Input**: demo_assembly.fna (5 sequences)
## Completeness Summary
| Metric | Count | Percentage |
|--------|-------|-----------|
| Complete (C) | 118 | 95.2% |
| Single-copy (S) | 115 | 93.1% |
| Duplicated (D) | 3 | 2.1% |
| Fragmented (F) | 3 | 2.3% |
| Missing (M) | 3 | 2.5% |
| Total searched (n) | 124 | — |
**Score string**: `C:95.2%[S:93.1%,D:2.1%],F:2.3%,M:2.5%,n:124`
## Interpretation
High completeness (95.2% C) indicates a near-complete assembly for this lineage.
Duplication rate of 2.1% is within expected range.
## Top Gene Results (first 10)
| BUSCO ID | Status | Sequence | Score | Length |
|----------|--------|----------|-------|--------|
| 1098at2 | Complete | seq1 | 742.3 | 312 |
| 1099at2 | Complete | seq1 | 698.1 | 287 |
| 1103at2 | Fragmented | seq2 | 341.2 | 98 |
| 1104at2 | Missing | N/A | 0.0 | 0 |
*ClawBio is a research and educational tool. It is not a medical device...*
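The "Top Gene Results" rows in the sample report come from the per-gene table. A minimal sketch of reading such a table is shown below, assuming only what the workflow states: tab-separated rows, with `#` comment lines skipped. Only the first three columns (ID, status, sequence) are named here. The layout of the remaining columns varies and is kept as an untyped list, and the function names are hypothetical.

```python
from collections import Counter


def parse_full_table(text):
    """Parse full_table.tsv content into a list of per-gene dicts."""
    rows = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header comments
        fields = line.split("\t")
        rows.append({
            "busco_id": fields[0],
            "status": fields[1] if len(fields) > 1 else "",
            # Missing genes typically have no sequence column.
            "sequence": fields[2] if len(fields) > 2 else "N/A",
            "extra": fields[3:],  # remaining columns, layout not assumed
        })
    return rows


def status_counts(rows):
    """Count genes per status (Complete, Duplicated, Fragmented, Missing, ...)."""
    return Counter(row["status"] for row in rows)
```

The counts from `status_counts` offer a cheap cross-check against the figures parsed from the short summary.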
output_dir/
├── report.md # PRIMARY: completeness report
├── result.json # scores, lineage, mode, run parameters
├── busco_run/
│ ├── short_summary.txt # BUSCO score summary (raw BUSCO format)
│ ├── short_summary.json # Structured score summary
│ └── full_table.tsv # Per-gene completeness table
└── reproducibility/
├── commands.sh # Exact replay command
├── environment.yml # Pins busco=6.0.0, sepp=4.5.5
└── checksums.sha256 # SHA-256 of all output files
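One plausible way to produce a `checksums.sha256` manifest like the one listed above is sketched here. It assumes the common `sha256sum` line format (hex digest, two spaces, relative path). The function names and the choice to exclude the manifest itself are assumptions rather than the skill's documented behaviour.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_checksums(output_dir, manifest_name="checksums.sha256"):
    """Write reproducibility/<manifest_name> covering every other file in output_dir."""
    output_dir = Path(output_dir)
    manifest = output_dir / "reproducibility" / manifest_name
    manifest.parent.mkdir(parents=True, exist_ok=True)
    lines = []
    for path in sorted(output_dir.rglob("*")):
        if path.is_file() and path != manifest:
            rel = path.relative_to(output_dir).as_posix()
            lines.append("{}  {}".format(sha256_of(path), rel))
    manifest.write_text("\n".join(lines) + "\n")
    return manifest
```

With this format, the manifest can be verified by running `sha256sum -c reproducibility/checksums.sha256` from inside the output directory.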
Required (runtime; not needed for --demo)
| Tool | Version | Purpose |
|---|---|---|
| busco | ≥6.0.0 | Core completeness analysis engine |
| hmmer | ≥3.1 | Profile HMM searches (installed with BUSCO) |
| miniprot | any | Eukaryote genome mode (default gene predictor) |
| prodigal | any | Prokaryote genome mode |
| sepp | 4.5.5 exactly | Auto-lineage placement (v4.5.6 is broken) |
| tblastn | ≥2.10.1 | Transcriptome mode (v2.4–2.10.0 have CPU bugs) |
Optional
| Tool | Purpose |
|---|---|
| augustus | Alternative eukaryote gene predictor (--augustus flag) |
| metaeuk | Alternative eukaryote gene predictor |
Install (conda — recommended):
conda create -n busco_env -c conda-forge -c bioconda busco=6.0.0 sepp=4.5.5
conda activate busco_env
SEPP version must be exactly 4.5.5. SEPP v4.5.6 is incompatible with BUSCO auto-lineage files and produces wrong lineage assignments silently. Always pin sepp=4.5.5 in environment.yml.
Do NOT mix OrthoDB10 and OrthoDB12 lineage suffixes. Eukaryote lineages use _odb10; prokaryote/archaea lineages use _odb12. Passing bacteria_odb10 (non-existent) fails; passing primates_odb12 (non-existent) fails. The lineage suffix must match the domain.
BUSCO v6 changed the short_summary filename. Depending on the BUSCO version and configuration, the file may be named short_summary.txt or short_summary.specific.<lineage>.<run>.txt. Always glob for both patterns — never hardcode the filename.
Demo mode must never invoke the BUSCO binary. run_demo() generates all output files synthetically in Python. Do not add BUSCO subprocess calls to the demo path; it must work in CI environments without any bioinformatics tools installed.
Proteins mode with a nucleotide FASTA returns zero hits silently. If --mode proteins is specified with a .fna/.fa file, BUSCO will complete successfully but report 0% completeness. The script emits a WARNING in this case; always use .faa (amino-acid FASTA) for proteins mode.
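The filename gotcha above (glob for both summary patterns, then extract the score string) can be sketched as follows. This is an illustrative outline, not the skill's parser: the function names are hypothetical, and the regex assumes the score-string layout shown in the examples in this document.

```python
import re
from pathlib import Path

# Matches e.g. C:95.2%[S:93.1%,D:2.1%],F:2.3%,M:2.5%,n:124
SCORE_RE = re.compile(
    r"C:(?P<C>[\d.]+)%\[S:(?P<S>[\d.]+)%,D:(?P<D>[\d.]+)%\],"
    r"F:(?P<F>[\d.]+)%,M:(?P<M>[\d.]+)%,n:(?P<n>\d+)"
)


def find_short_summary(run_dir):
    """Return the first summary file matching either naming pattern, or None."""
    run_dir = Path(run_dir)
    candidates = sorted(run_dir.glob("short_summary.txt"))
    candidates += sorted(run_dir.glob("short_summary.specific.*.txt"))
    return candidates[0] if candidates else None


def parse_score_string(text):
    """Extract C/S/D/F/M percentages and n from summary text, or return None."""
    match = SCORE_RE.search(text)
    if match is None:
        return None
    scores = {key: float(match.group(key)) for key in "CSDFM"}
    scores["n"] = int(match.group("n"))
    return scores
```

Using `search` rather than `fullmatch` lets the pattern tolerate surrounding whitespace or extra trailing fields on the score line.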
--download_path is specified.
report.md ends with: "ClawBio is a research and educational tool. It is not a medical device and does not provide clinical diagnoses. Consult a healthcare professional before making any medical decisions."
--mode, --organism (free-text hint), optional explicit --lineage or --auto-lineage* flags.
Trigger conditions for routing here:
Chaining partners:
| Upstream | Handoff | Downstream |
|---|---|---|
| seq-wrangler | Assembled genome FASTA | busco-assessor |
| busco-assessor | busco_run/ directory with short_summary.txt | multiqc-reporter (BUSCO module for multi-sample aggregation) |
| busco-assessor | result.json completeness scores | profile-report (unified genomic profile) |
Output is chainable: result.json is machine-readable JSON; busco_run/short_summary.txt is directly readable by MultiQC's BUSCO module.
_odb13 datasets released; SEPP constraint changes.
skills/_deprecated/busco-assessor/ if BUSCO v7 introduces breaking CLI changes that require a full rewrite.