Biobank Agent

Local-first autonomous research agent for biobank, genomics, and biomedical discovery workflows

Natural-language planning, data-aware tool use, WGS analysis, cited research, runtime audit, replay, and review-gated self-evolution in one CLI.

What It Does

Biobank Agent is an LLM-powered scientific workflow agent for population-scale biobank research. It runs from a local interactive shell, discovers registered analysis skills, drafts and repairs multi-step plans, executes local tools under permission controls, records session trajectories, and produces auditable reports.
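Skill discovery combined with permission-controlled execution can be pictured as a small registry. This is a hypothetical sketch: `SkillRegistry`, `Skill`, the toy `count_samples` skill, and the permission names are illustrative and are not taken from biobank_agent.skills.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, FrozenSet, List, Set


@dataclass
class Skill:
    name: str
    func: Callable[..., object]
    permissions: Set[str] = field(default_factory=set)  # e.g. {"read_data"}


class SkillRegistry:
    """Hypothetical registry: maps skill names to callables plus required permissions."""

    def __init__(self) -> None:
        self._skills: Dict[str, Skill] = {}

    def register(self, name: str, permissions: FrozenSet[str] = frozenset()):
        """Decorator that records a function under a unique skill name."""
        def decorator(func: Callable[..., object]) -> Callable[..., object]:
            if name in self._skills:
                raise ValueError(f"duplicate skill: {name}")
            self._skills[name] = Skill(name, func, set(permissions))
            return func
        return decorator

    def names(self) -> List[str]:
        return sorted(self._skills)

    def run(self, name: str, granted: Set[str], **kwargs: object) -> object:
        """Execute a skill only if every permission it declares was granted."""
        skill = self._skills[name]
        missing = skill.permissions - granted
        if missing:
            raise PermissionError(f"{name} needs: {sorted(missing)}")
        return skill.func(**kwargs)


registry = SkillRegistry()


@registry.register("count_samples", permissions=frozenset({"read_data"}))
def count_samples(sample_ids: List[str]) -> int:
    # Toy skill: number of distinct sample identifiers.
    return len(set(sample_ids))
```

For example, `registry.run("count_samples", granted={"read_data"}, sample_ids=["s1", "s2", "s1"])` returns 2, while the same call with `granted=set()` raises `PermissionError`. Registering through a decorator keeps each skill's metadata next to its code, which is one common way to let a CLI enumerate skills without a hand-maintained list.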

The current repository is a v3 runtime-oriented build with:

105 registered skills discovered from biobank_agent.skills, including cohort analysis, modelling, WGS/VCF workflows, literature research, report writing, external review, and self-evolution support.
71 slash commands in the v3 command registry, including /plan, /plan-diagnose, /plan-retry, /plan-use, /research, /doctor, /tools, /resume, /audit, /harness, /replay, /learn, and /evolve.
Runtime-backed sessions with event logs, action graph references, plan state, trajectory replay, audit reports, and resume support.
VirtualCell/WGS support for local VCF discovery, WGS dependency checks, exploratory VCF QC, PCA, kinship, association, burden testing, annotation, pathway enrichment, and WGS report polishing.
OpenAI-compatible providers configured through .env, with multi-model planning and review routes controlled by settings.
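Exploratory VCF QC usually starts with a few summary numbers. The sketch below is not the project's WGS skill; it is a stdlib-only illustration that assumes tab-delimited VCF records and computes the record count, the fraction of records whose FILTER is PASS, and the transition/transversion (Ti/Tv) ratio over biallelic SNVs. `discover_vcfs`, `open_vcf`, and `vcf_qc` are hypothetical names.

```python
import gzip
from pathlib import Path
from typing import Iterable, List

# A<->G and C<->T are transitions; every other base change is a transversion.
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def discover_vcfs(root: Path) -> List[Path]:
    # Matches *.vcf and *.vcf.gz anywhere below root.
    return sorted(p for p in root.rglob("*") if p.name.endswith((".vcf", ".vcf.gz")))


def open_vcf(path: Path):
    # gzip-compressed files are opened in text mode; everything else as plain text.
    if path.suffix == ".gz":
        return gzip.open(path, "rt")
    return path.open("r")


def vcf_qc(lines: Iterable[str]) -> dict:
    records = passed = ti = tv = 0
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip meta-information, header, and blank lines
        fields = line.rstrip("\n").split("\t")
        ref, alt, filt = fields[3].upper(), fields[4].upper(), fields[6]
        records += 1
        if filt == "PASS":
            passed += 1
        # Only biallelic single-nucleotide variants contribute to Ti/Tv.
        if len(ref) == 1 and len(alt) == 1 and ref in "ACGT" and alt in "ACGT" and ref != alt:
            if (ref, alt) in TRANSITIONS:
                ti += 1
            else:
                tv += 1
    return {
        "records": records,
        "pass_fraction": passed / records if records else 0.0,
        "ti_tv": ti / tv if tv else float("nan"),
    }
```

For example, looping over `discover_vcfs(Path("data/vc_wgs_vcf"))` and calling `vcf_qc` on each `open_vcf(path)` handle prints one summary per file. Ti/Tv is a useful sanity check because random errors tend to drag it down: among all possible single-base substitutions, transversions outnumber transitions two to one.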

The project is local-first: data paths, reports, memory, sessions, tool calls, and audit artifacts remain on the workstation unless a configured skill or provider explicitly uses a network service.
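Keeping session trajectories and tool calls on the workstation can be as simple as an append-only JSON Lines file that is read back in order for replay or audit. This is a hypothetical sketch: `SessionLog`, the file path, and the event fields are illustrative and do not describe the project's actual event-log format.

```python
import json
import time
from pathlib import Path
from typing import Iterator, Optional


class SessionLog:
    """Hypothetical append-only event log stored as one JSON object per line."""

    def __init__(self, path: Path) -> None:
        self.path = path
        self.path.parent.mkdir(parents=True, exist_ok=True)

    def record(self, kind: str, **data: object) -> None:
        # Appending never rewrites earlier events, so the file doubles as an audit trail.
        event = {"ts": time.time(), "kind": kind, **data}
        with self.path.open("a", encoding="utf-8") as fh:
            fh.write(json.dumps(event) + "\n")

    def replay(self, kind: Optional[str] = None) -> Iterator[dict]:
        """Yield events in the order they were written, optionally filtered by kind."""
        if not self.path.exists():
            return
        with self.path.open(encoding="utf-8") as fh:
            for line in fh:
                event = json.loads(line)
                if kind is None or event["kind"] == kind:
                    yield event
```

Because each line is an independent JSON object, a partially written session can still be replayed up to its last complete event.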

Quick Start

git clone https://github.com/cpa2001/BioBank-Agent.git
cd BioBank-Agent

# Optional but recommended
conda create -n biobank-agent python=3.11 -y
conda activate biobank-agent

pip install -e ".[all,dev]"
cp .env.example .env

Edit .env:

LLM_BASE_URL=https://openrouter.ai/api/v1
LLM_API_KEY=<your-api-key>
LLM_MODEL=deepseek/deepseek-v4-pro

DATA_DIR=./data
RAW_DIR=./raw
REPORTS_DIR=./reports
PLANS_DIR=./plans
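A quick pre-flight check can catch an unedited placeholder or a missing key before the first run. This is a minimal sketch that assumes plain `KEY=VALUE` lines with `#` comments; `parse_env`, `check_env`, and `missing_dirs` are hypothetical helpers, and biobank_agent's own settings loader may accept more syntax than this parser does.

```python
from pathlib import Path
from typing import Dict, List

REQUIRED_KEYS = ("LLM_BASE_URL", "LLM_API_KEY", "LLM_MODEL")
DIR_KEYS = ("DATA_DIR", "RAW_DIR", "REPORTS_DIR", "PLANS_DIR")


def parse_env(text: str) -> Dict[str, str]:
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blanks, comments, and malformed lines
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def check_env(values: Dict[str, str]) -> List[str]:
    """Return human-readable problems; an empty list means the file looks usable."""
    problems = [f"missing {key}" for key in REQUIRED_KEYS if not values.get(key)]
    if values.get("LLM_API_KEY", "").startswith("<"):
        problems.append("LLM_API_KEY still holds the placeholder value")
    url = values.get("LLM_BASE_URL", "")
    if url and not url.startswith(("http://", "https://")):
        problems.append("LLM_BASE_URL should start with http:// or https://")
    return problems


def missing_dirs(values: Dict[str, str], base: Path) -> List[str]:
    # Directory settings are resolved relative to `base`, e.g. the repository root.
    return [key for key in DIR_KEYS if key in values and not (base / values[key]).is_dir()]
```

For example, `check_env(parse_env(Path(".env").read_text()))` on an unedited copy of the template would report the placeholder API key.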

Start the interactive shell:

biobank

Run the first checks:

biobank > /doctor
biobank > /tools
biobank > /skills

For a complete, runnable setup walkthrough, see docs/guides/QUICK_START.md. For the full Agent + WGS workflow, see docs/guides/END_TO_END_TUTORIAL.md.

API Connectivity Smoke Test

After editing .env, verify provider connectivity before running a long plan:

python - <<'PY'
from biobank_agent.config import get_settings
from biobank_agent.llm import LLMClient

settings = get_settings()
client = LLMClient(
    base_url=settings.llm_base_url,
    api_key=settings.llm_api_key,
    model=settings.llm_model,
)
response = client.chat([{"role": "user", "content": "Reply with exactly: API_OK"}])
print(response.text)
print(response.usage)
PY

Expected result: the text contains API_OK. If the call fails, fix LLM_BASE_URL, LLM_API_KEY, or LLM_MODEL before testing agent workflows.
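If the smoke test fails inside the project's client, calling the provider directly can help isolate the problem. The sketch below assumes the provider implements the OpenAI-compatible `POST /chat/completions` endpoint with Bearer authentication; it uses only the standard library, and `build_request`, `reply_ok`, and `smoke_test` are hypothetical names that bypass `LLMClient`.

```python
import json
import os
import urllib.request


def build_request(base_url: str, api_key: str, model: str) -> urllib.request.Request:
    body = {
        "model": model,
        "messages": [{"role": "user", "content": "Reply with exactly: API_OK"}],
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={"Authorization": f"Bearer {api_key}", "Content-Type": "application/json"},
        method="POST",
    )


def reply_ok(payload: dict) -> bool:
    # OpenAI-style responses place the reply text under choices[0].message.content.
    content = payload["choices"][0]["message"]["content"] or ""
    return "API_OK" in content


def smoke_test(timeout: float = 60.0) -> bool:
    # Reads the same LLM_* values as .env, but from the process environment.
    req = build_request(os.environ["LLM_BASE_URL"], os.environ["LLM_API_KEY"], os.environ["LLM_MODEL"])
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        payload = json.load(resp)
    return reply_ok(payload)
```

Export the three `LLM_*` values first (in bash, `set -a; source .env; set +a` works for simple `KEY=VALUE` lines), then call `smoke_test()`; `True` means the reply contained API_OK. An HTTP 401 error usually points at LLM_API_KEY.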

Core Workflows

Interactive Planning

Use /plan for multi-step workflows that need visible structure, approvals, repair, and provenance.

biobank > /plan Compare vitiligo cases and controls using the available WGS VCF files, then write an auditable report.
biobank > /plan-approve

If execution is blocked or a step fails, stay in the same shell:

biobank > what is the problem?
biobank > /plan-diagnose
biobank > /plan-use vcf_dir=data/vc_wgs_vcf
biobank > /plan-use workflow_mode=exploratory
biobank > continue
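The `/plan-use key=value` convention amounts to accumulating a small dictionary of repair context. As a hypothetical sketch (`parse_plan_use` and `merge_context` are illustrative names, and the real handler in biobank_agent may validate or store values differently), parsing and merging could look like this:

```python
import shlex
from typing import Dict


def parse_plan_use(argument: str) -> Dict[str, str]:
    context = {}
    for token in shlex.split(argument):  # shlex keeps quoted values with spaces intact
        key, sep, value = token.partition("=")
        if not sep or not key:
            raise ValueError(f"expected key=value, got {token!r}")
        context[key] = value
    return context


def merge_context(existing: Dict[str, str], update: Dict[str, str]) -> Dict[str, str]:
    # Later /plan-use calls override earlier values for the same key.
    return {**existing, **update}
```

Applied to the two repair commands in the session above, the merged context would be `{"vcf_dir": "data/vc_wgs_vcf", "workflow_mode": "exploratory"}`.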

Useful plan commands:

Command	Use
`/plan <task>`	Draft a structured plan from a natural-language objective.
`/plan-approve`	Approve and execute the active draft.
`/plan-edit <feedback>`	Modify the draft or active plan with natural-language feedback.
`/plan-diagnose`	Explain why the current plan is blocked or failed.
`/plan-use key=value`	Add repair context such as `vcf_dir=data/vc_wgs_vcf`.
`/plan-retry [step_id]`	Retry a failed or named plan step.
`/plan-resume`	Resume a paused or repaired plan.
`/plan-skip <step_id>`	Record a step id to skip (advisory only: it does not yet alter execution; use `/plan-edit` to change the plan).
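Read together, the commands in the table imply a small plan lifecycle. The state machine below is a hypothetical model: `PlanState`, `apply_command`, and the state names and transitions are assumptions inferred from the table's descriptions, not read from the biobank_agent source. `/plan-diagnose`, `/plan-use`, and `/plan-skip` are modelled as leaving the state unchanged, matching their explanatory, context-adding, and advisory roles.

```python
from enum import Enum


class PlanState(Enum):
    DRAFT = "draft"
    RUNNING = "running"
    BLOCKED = "blocked"
    FAILED = "failed"
    PAUSED = "paused"
    COMPLETED = "completed"


# (current state, command) -> next state. Moves out of RUNNING (to BLOCKED,
# FAILED, PAUSED or COMPLETED) come from execution events, not commands,
# so they are not listed here.
TRANSITIONS = {
    (PlanState.DRAFT, "/plan-edit"): PlanState.DRAFT,
    (PlanState.RUNNING, "/plan-edit"): PlanState.RUNNING,
    (PlanState.DRAFT, "/plan-approve"): PlanState.RUNNING,
    (PlanState.FAILED, "/plan-retry"): PlanState.RUNNING,
    (PlanState.FAILED, "/plan-resume"): PlanState.RUNNING,
    (PlanState.BLOCKED, "/plan-resume"): PlanState.RUNNING,
    (PlanState.PAUSED, "/plan-resume"): PlanState.RUNNING,
}

# Commands that inform or annotate the plan without moving it.
NON_TRANSITIONING = {"/plan-diagnose", "/plan-use", "/plan-skip"}


def apply_command(state: PlanState, command: str) -> PlanState:
    if command in NON_TRANSITIONING:
        return state
    try:
        return TRANSITIONS[(state, command)]
    except KeyError:
        raise ValueError(f"{command} is not valid while the plan is {state.value}") from None
```

Making invalid commands raise, rather than silently ignoring them, mirrors the idea that `/plan-diagnose` should be able to explain why a plan cannot move.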

WGS and VirtualCell

The repository can discover local VCFs under data/vc_wgs_vcf. The WGS skills include:
