Biobank Agent
Local-first autonomous research agent for biobank, genomics, and biomedical discovery workflows

Natural-language planning, data-aware tool use, WGS analysis, cited research, runtime audit, replay, and review-gated self-evolution in one CLI.
What It Does
Biobank Agent is an LLM-powered scientific workflow agent for population-scale biobank research. It runs from a local interactive shell, discovers registered analysis skills, drafts and repairs multi-step plans, executes local tools under permission controls, records session trajectories, and produces auditable reports.
The current repository is a v3 runtime-oriented build with:
- 105 registered skills discovered from biobank_agent.skills, including cohort analysis, modelling, WGS/VCF workflows, literature research, report writing, external review, and self-evolution support.
- 71 slash commands in the v3 command registry, including /plan, /plan-diagnose, /plan-retry, /plan-use, /research, /doctor, /tools, /resume, /audit, /harness, /replay, /learn, and /evolve.
- Runtime-backed sessions with event logs, action graph references, plan state, trajectory replay, audit reports, and resume support.
- VirtualCell/WGS support for local VCF discovery, WGS dependency checks, exploratory VCF QC, PCA, kinship, association, burden testing, annotation, pathway enrichment, and WGS report polishing.
- OpenAI-compatible providers configured through .env, with multi-model planning and review routes controlled by settings.
The project is local-first: data paths, reports, memory, sessions, tool calls, and audit artifacts remain on the workstation unless a configured skill or provider explicitly uses a network service.
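As a rough picture of what local VCF discovery involves, the sketch below scans a directory tree for VCF-like files. It is an illustration only: `discover_vcfs` and the suffix list are hypothetical and are not taken from biobank_agent, whose discovery logic may differ.

```python
from pathlib import Path

# Hypothetical helper for illustration; not part of biobank_agent.
# The suffix list is an assumption about which files count as VCF-like.
VCF_SUFFIXES = (".vcf", ".vcf.gz", ".bcf")


def discover_vcfs(root: str) -> list[Path]:
    """Return VCF-like files under root, sorted for stable output."""
    base = Path(root)
    if not base.is_dir():
        return []
    return sorted(
        p for p in base.rglob("*")
        if p.is_file() and p.name.lower().endswith(VCF_SUFFIXES)
    )


if __name__ == "__main__":
    for path in discover_vcfs("data/vc_wgs_vcf"):
        print(path)
```

Nothing in this scan leaves the workstation, which matches the local-first behaviour described above.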
Quick Start
git clone https://github.com/cpa2001/BioBank-Agent.git
cd BioBank-Agent
# Optional but recommended
conda create -n biobank-agent python=3.11 -y
conda activate biobank-agent
pip install -e ".[all,dev]"
cp .env.example .env
Edit .env:
LLM_BASE_URL=https://openrouter.ai/api/v1
LLM_API_KEY=<your-api-key>
LLM_MODEL=deepseek/deepseek-v4-pro
DATA_DIR=./data
RAW_DIR=./raw
REPORTS_DIR=./reports
PLANS_DIR=./plans
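Before starting the shell, it can help to confirm that no key is missing or still set to a placeholder such as <your-api-key>. The following is a minimal sketch using only the standard library; `parse_env` and `missing_keys` are hypothetical helpers, and treating every key in the example as required is an assumption.

```python
from pathlib import Path

# Keys taken from the .env example above; treating all of them as required
# is an assumption made for this sketch.
REQUIRED_KEYS = (
    "LLM_BASE_URL", "LLM_API_KEY", "LLM_MODEL",
    "DATA_DIR", "RAW_DIR", "REPORTS_DIR", "PLANS_DIR",
)


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(values: dict[str, str]) -> list[str]:
    """Return required keys that are absent, empty, or still a <placeholder>."""
    problems = []
    for key in REQUIRED_KEYS:
        value = values.get(key, "")
        if not value or (value.startswith("<") and value.endswith(">")):
            problems.append(key)
    return problems


if __name__ == "__main__":
    env_path = Path(".env")
    text = env_path.read_text() if env_path.exists() else ""
    print("Missing or placeholder keys:", missing_keys(parse_env(text)))
```

An empty list means every key in the example has a non-placeholder value; it does not prove that the values are valid.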
Start the interactive shell:
biobank
Run the first checks:
biobank > /doctor
biobank > /tools
biobank > /skills
For a complete runnable setup path, use docs/guides/QUICK_START.md. For the full Agent + WGS workflow, use docs/guides/END_TO_END_TUTORIAL.md.
API Connectivity Smoke Test
After editing .env, verify provider connectivity before running a long plan:
python - <<'PY'
from biobank_agent.config import get_settings
from biobank_agent.llm import LLMClient

# Load provider settings from .env.
settings = get_settings()
client = LLMClient(
    base_url=settings.llm_base_url,
    api_key=settings.llm_api_key,
    model=settings.llm_model,
)

# Send one minimal prompt, then print the reply text and token usage.
response = client.chat([{"role": "user", "content": "Reply with exactly: API_OK"}])
print(response.text)
print(response.usage)
PY
Expected result: the text contains API_OK. If the call fails, fix LLM_BASE_URL, LLM_API_KEY, or LLM_MODEL before testing agent workflows.
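If the smoke test fails and it is unclear whether the provider or the agent configuration is at fault, the provider can be queried without biobank_agent. The sketch below assumes the provider exposes the usual OpenAI-compatible `/chat/completions` route under LLM_BASE_URL and that the three .env values have been exported into the environment; `build_chat_request` and `main` are hypothetical names.

```python
import json
import os
import urllib.request


def build_chat_request(base_url: str, api_key: str, model: str) -> urllib.request.Request:
    """Build a minimal chat-completions request for an OpenAI-compatible API."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": "Reply with exactly: API_OK"}],
    }
    return urllib.request.Request(
        base_url.rstrip("/") + "/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def main() -> None:
    # Assumes LLM_BASE_URL, LLM_API_KEY, and LLM_MODEL are set in the environment.
    request = build_chat_request(
        os.environ["LLM_BASE_URL"],
        os.environ["LLM_API_KEY"],
        os.environ["LLM_MODEL"],
    )
    with urllib.request.urlopen(request, timeout=30) as reply:
        body = json.load(reply)
    # Response layout follows the OpenAI-compatible chat-completions schema.
    print(body["choices"][0]["message"]["content"])
```

Calling `main()` sends one request. An HTTP 401 usually points at LLM_API_KEY, while a 404 usually points at LLM_BASE_URL or LLM_MODEL.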
Core Workflows
Interactive Planning
Use /plan for multi-step workflows that need visible structure, approvals, repair, and provenance.
biobank > /plan Compare vitiligo cases and controls using the available WGS VCF files, then write an auditable report.
biobank > /plan-approve
If execution is blocked or a step fails, stay in the same shell:
biobank > what is the problem?
biobank > /plan-diagnose
biobank > /plan-use vcf_dir=data/vc_wgs_vcf
biobank > /plan-use workflow_mode=exploratory
biobank > continue
Useful plan commands:
| Command | Use |
|---|---|
| /plan <task> | Draft a structured plan from a natural-language objective. |
| /plan-approve | Approve and execute the active draft. |
| /plan-edit <feedback> | Modify the draft or active plan with natural-language feedback. |
| /plan-diagnose | Explain why the current plan is blocked or failed. |
| /plan-use key=value | Add repair context such as vcf_dir=data/vc_wgs_vcf. |
| /plan-retry [step_id] | Retry a failed or named plan step. |
| /plan-resume | Resume a paused or repaired plan. |
| /plan-skip <step_id> | Record a step id to skip (advisory: does not yet alter execution; use /plan-edit to change the plan). |
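The key=value form accepted by /plan-use can be pictured as a split on the first equals sign. This is a sketch of that idea only, not the agent's actual parser; `parse_plan_use` is a hypothetical name, and rejecting empty keys or values is an assumption.

```python
def parse_plan_use(argument: str) -> tuple[str, str]:
    """Split 'key=value' on the first '=', so values may themselves contain '='."""
    key, sep, value = argument.partition("=")
    key, value = key.strip(), value.strip()
    if not sep or not key or not value:
        raise ValueError(f"expected key=value, got {argument!r}")
    return key, value


# Collect the two repair hints used in the example session above.
context = dict(
    parse_plan_use(arg)
    for arg in ("vcf_dir=data/vc_wgs_vcf", "workflow_mode=exploratory")
)
print(context)
# → {'vcf_dir': 'data/vc_wgs_vcf', 'workflow_mode': 'exploratory'}
```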
WGS and VirtualCell
The repository can discover local VCFs under data/vc_wgs_vcf. The WGS skills include: