Skill

gi-annotation

Predicts gene and transcript structure (intervals, exons, strand) from a DNA sequence using the Genomic Intelligence API. For de novo annotation without external databases.

developer-tools

Popularity

Stars

981

Forks

201

Invocation

How this skill is triggered — by the user, by Claude, or both

Slash command

/clawbio:gi-annotation

User invocable

Model invocable

Inline context

Default effort

Supporting Files

api.py
example_data/annotation_tp53.fa
gi_annotation.py
tests/__init__.py
tests/test_gi_annotation.py

SKILL.md

185 lines · ~1.7k tokens

Stats

Language: Python

Maintenance: Excellent

Last Commit: Jun 18, 2026

📜 gi-annotation

You are gi-annotation, a ClawBio agent that calls the Genomic Intelligence DNA annotation pipeline. Given the DNA sequence of a genomic region, the pipeline predicts gene boundaries → intervals → transcripts from sequence alone (no external annotation database).

⚠️ Remote inference — opt-in required. Unlike most ClawBio skills, this skill uploads your FASTA sequence to the hosted Genomic Intelligence API at https://api.genomicintelligence.ai. Prefer a browser? The same models run interactively at https://genomicintelligence.ai. Do not submit identifiable patient data without an appropriate data-use agreement. Key setup: see Authentication below.

Trigger

Fire this skill when the user says any of:

"annotate this DNA sequence"
"predict genes / transcripts in this region"
"what genes are encoded here?" (from sequence, not coordinates)
"de novo gene prediction"
"gi-annotation"

Do NOT fire when:

The user has a VCF and wants variant consequences → variant-annotation (VEP)
The user wants known gene records by coordinate → external NCBI / Ensembl lookup

Why This Exists

Without it: Running AUGUSTUS / Helixer locally requires species models + dependency setup.
With it: One CLI call → predicted transcript structures, in ~20 s for ~20 kbp.
Why ClawBio: Hosted private weights (ModernBERT-based) plus ClawBio's reproducibility bundle and progress streaming for long jobs.

API Backed

POST https://api.genomicintelligence.ai/v1/tasks/annotation/predict with Prefer: respond-async — annotation is async-only. The pipeline streams progress through GET /v1/tasks/jobs/{job_id} (typically: load → gene-boundaries → gene-intervals → transcripts).

Workflow

1. Parse: single-record FASTA.
2. Submit async: POST /v1/tasks/annotation/predict with Prefer: respond-async → 202 + job_id.
3. Poll: stream progress (percent, message) until terminal.
4. Render: report.md (transcripts table) + result.json (full response) + reproducibility/.
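The submit and poll steps can be sketched roughly as follows. This is an illustration, not the skill's actual client: the endpoint paths and the Prefer: respond-async header come from the text above, while the JSON body shape ({"sequence": ...}), the Bearer authorization scheme, the "status" field, and the terminal state names are assumptions made for the example.

```python
import json
import time
import urllib.request

API_BASE = "https://api.genomicintelligence.ai"

# Assumption: the real API may use different terminal state names.
TERMINAL_STATES = {"succeeded", "failed", "cancelled"}


def build_submit_request(sequence: str, api_key: str) -> urllib.request.Request:
    """Build the async submit request. The JSON body shape is an assumption."""
    body = json.dumps({"sequence": sequence}).encode("utf-8")
    return urllib.request.Request(
        f"{API_BASE}/v1/tasks/annotation/predict",
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Prefer": "respond-async",  # documented: annotation is async-only
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
        },
    )


def job_url(job_id: str) -> str:
    """URL of the documented job-status endpoint."""
    return f"{API_BASE}/v1/tasks/jobs/{job_id}"


def poll_until_terminal(fetch_status, interval=2.0, max_polls=150, sleep=time.sleep):
    """Call fetch_status() until it reports an (assumed) terminal state.

    fetch_status is any callable returning a dict; 'percent' and 'message'
    are the progress fields named in the workflow, 'status' is hypothetical.
    """
    for _ in range(max_polls):
        status = fetch_status()
        print(f"{status.get('percent', '?')}% {status.get('message', '')}")
        if status.get("status") in TERMINAL_STATES:
            return status
        sleep(interval)
    raise TimeoutError("job did not reach a terminal state")
```

Passing fetch_status as a callable keeps the polling loop independent of the transport, so it can be exercised with canned responses; in real use it would wrap a GET on job_url(job_id) with the same authorization header.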

CLI Reference

# Demo — bundled TP53 region (~20 s)
python skills/gi-annotation/gi_annotation.py --demo --output /tmp/gi-annotation-demo

# Your own FASTA
python skills/gi-annotation/gi_annotation.py --input my_region.fa --output report_dir

# Via ClawBio runner
python clawbio.py run gi-annotation --demo

Authentication

The skill requires a Genomic Intelligence partner key, supplied either on the command line or through the GI_API_KEY environment variable. Resolution order:

1. --api-key <value> CLI flag (explicit override).
2. GI_API_KEY environment variable.
3. Otherwise: the skill raises a RuntimeError pointing here.
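A minimal sketch of that resolution order, assuming the CLI value has already been parsed. The function name resolve_api_key and the error wording are hypothetical; only the precedence (flag, then GI_API_KEY, then RuntimeError) comes from the list above.

```python
import os
from typing import Mapping, Optional


def resolve_api_key(cli_value: Optional[str] = None,
                    env: Optional[Mapping[str, str]] = None) -> str:
    """Return the key from the CLI flag, else GI_API_KEY, else raise."""
    env = os.environ if env is None else env
    # 1. An explicit --api-key value wins.
    if cli_value:
        return cli_value
    # 2. Fall back to the environment variable (blank values are ignored).
    env_value = env.get("GI_API_KEY", "").strip()
    if env_value:
        return env_value
    # 3. No key anywhere: fail loudly and point at the docs.
    raise RuntimeError(
        "No Genomic Intelligence key found: pass --api-key or set GI_API_KEY "
        "(see Authentication)."
    )
```

Taking the environment as a parameter makes the precedence easy to check without touching the real process environment.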

Quick start — ClawBio hackathon key

A shared hackathon-tier key ships in .env.example at the repo root (50 concurrent / 120 rpm, opt-in only). From wherever the ClawBio files live on your machine:

# Repo root (git clone) — or ~/.claude/plugins/cache/clawbio/clawbio/<version>/ for plugin installs
cp .env.example .env
set -a && source .env && set +a

Production / heavier use

Request an individual key at [email protected], then:

export GI_API_KEY=gi_yourkeyhere

Demo

python clawbio.py run gi-annotation --demo

Bundled fixture is the TP53 locus (19 kbp). Expect ~5 transcripts (TP53 has multiple annotated isoforms) and a ~20 s wall time.

Gotchas

Async-only. Don't expect a sync response. The runner handles polling automatically.
Long input is normal. The model handles tens-to-hundreds of kbp; longer regions take proportionally more time.
First-call cold-start. The annotation pipeline is the heaviest GI model — first request after a cold service takes ~30+ s; subsequent calls are warm.
The model is trained on human + a few other vertebrates. Bacterial / fungal / plant predictions are out of distribution.
Hackathon key is shared. Async jobs count toward concurrent caps too — under heavy hackathon load, you may queue.
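Several of these gotchas can be caught before anything is uploaded. The following pre-flight check is a sketch only: the skill's own parser may behave differently, and the function names are hypothetical. It enforces the single-record FASTA requirement from the workflow and reports the region length, which is a rough guide to how long the job will take.

```python
from typing import Tuple


def read_single_record_fasta(text: str) -> Tuple[str, str]:
    """Parse FASTA text and return (header, sequence); reject multi-record input."""
    header = None
    chunks = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                raise ValueError("expected a single-record FASTA, found a second '>' header")
            header = line[1:].strip()
        else:
            if header is None:
                raise ValueError("sequence data found before any '>' header")
            chunks.append(line.upper())
    if header is None or not chunks:
        raise ValueError("no FASTA record found")
    return header, "".join(chunks)


def length_kbp(sequence: str) -> float:
    """Sequence length in kbp, for a rough runtime expectation."""
    return len(sequence) / 1000.0
```

A check like this does not detect out-of-distribution organisms; that still depends on knowing where the sequence came from.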

Output Structure

output_dir/
├── report.md
├── result.json
└── reproducibility/
    ├── command.sh
    └── environment.json
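When chaining this skill into a larger pipeline, it can help to confirm that a run produced the full layout before reading result.json. A small sketch, using only the file names shown in the tree above (the helper name missing_outputs is hypothetical):

```python
from pathlib import Path

# Relative paths taken from the output tree above.
EXPECTED_FILES = (
    "report.md",
    "result.json",
    "reproducibility/command.sh",
    "reproducibility/environment.json",
)


def missing_outputs(output_dir) -> list:
    """Return the expected files that are absent from output_dir."""
    root = Path(output_dir)
    return [rel for rel in EXPECTED_FILES if not (root / rel).is_file()]
```

An empty list means the run directory is complete; anything else names the files to investigate.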

Integration with Bio Orchestrator

Routes here on: "annotate sequence", "predict genes", "gene structure", "de novo annotation".

Chains with: gi-promoter (validate predicted TSSes), gi-splice (cross-check predicted exon boundaries against splice-site calls), gi-expression (predict expression for each predicted transcript by extracting its TSS-centered window).
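Extracting a TSS-centered window for the gi-expression hand-off could look roughly like this. Everything specific here is an assumption: the result.json schema is not documented above, so the sketch takes plain start, end, and strand values, treats coordinates as 0-based half-open, and leaves the flank size to the caller because the window length gi-expression expects is not stated.

```python
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Reverse-complement a DNA string (A/C/G/T/N, either case)."""
    return seq.translate(_COMPLEMENT)[::-1]


def tss_window(sequence: str, start: int, end: int, strand: str, flank: int):
    """Return (window_start, window_end, window_seq) centered on the TSS.

    Assumes 0-based, half-open [start, end) transcript coordinates on the
    submitted sequence. The window is clipped at the sequence edges, and
    minus-strand windows are returned reverse-complemented.
    """
    if strand not in ("+", "-"):
        raise ValueError("strand must be '+' or '-'")
    # The TSS is the first transcribed base: the leftmost base on '+',
    # the rightmost base on '-'.
    tss = start if strand == "+" else end - 1
    lo = max(0, tss - flank)
    hi = min(len(sequence), tss + flank + 1)
    window = sequence[lo:hi]
    if strand == "-":
        window = reverse_complement(window)
    return lo, hi, window
```

Because the window is clipped rather than padded, transcripts that start near the edge of the submitted region yield shorter windows; whether a downstream model accepts those is something to confirm against that skill's documentation.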

Safety

Research tool. Not a clinical assay. Predicted gene structures are model outputs, not curated reference annotations — for clinical interpretation, anchor to RefSeq / Ensembl.
