Skill

vdb

This skill should be used when the user asks to "query genomic datasets", "list available datasets", "explore yeast data", "query harbison", "query callingcards", "find TF targets", "describe a dataset", "run a SQL query on genomic data", "what datasets are available", "get column metadata", or mentions labretriever, VirtualDB, or the yeast resources MCP. Use to orient the user and guide effective use of the labretriever VirtualDB MCP tools.

Invocation

How this skill is triggered — by the user, by Claude, or both

Slash command

/labretriever:vdb

User invocable

Model invocable

Inline context

Default effort

Context Preview

The summary Claude sees in its skill listing — used to decide when to auto-load this skill

labretriever exposes genomic and transcriptomic datasets hosted on HuggingFace

Supporting Files

references/query_patterns.md

SKILL.md

142 lines · ~1.3k tokens

Stats

Language: Python
Stars: 2
Maintenance: Excellent
Last Commit: Jun 16, 2026

labretriever

labretriever exposes genomic and transcriptomic datasets hosted on HuggingFace as DuckDB SQL views, accessible through a set of MCP tools. This skill orients the user to those tools and guides effective query workflows.

Prerequisites

The labretriever MCP server must be connected. Check /plugins and confirm "labretriever MCP - connected". If not connected:

1. Ensure labretriever is installed: pip install labretriever
2. Run pyenv rehash if using pyenv
3. Install the plugin: /plugin marketplace add cmatKhan/labretriever, then /plugin install labretriever@labretriever
4. Provide the path to a VirtualDB YAML config when prompted

For the BrentLab yeast collection, download the ready-to-use config from:

https://github.com/BrentLab/tfbpshiny/blob/main/tfbpshiny/brentlab_yeast_collection.yaml

Standard Query Workflow

Always follow this order when starting a new analysis:

1. Discover - call list_datasets to see all registered view names
2. Inspect schema - call describe_dataset("{name}") and describe_dataset("{name}_meta") to see columns and types for both the data view and its sample metadata view
3. Understand semantics - call get_column_metadata("{name}") to learn which columns are measurement values, condition labels, identifiers, etc.
4. Check provenance - call get_tags("{name}") to see assay type, publication, and other annotations for the dataset
5. Query - use query(sql) with DuckDB SQL; pass return_data=True only when you need actual rows returned (omit it for shape/count checks)
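
The order above can be written down as a small checklist generator. This is only a sketch: `workflow_steps` is a hypothetical helper, not part of labretriever. It formats the tool calls named in the list as text so the sequence is easy to review, and it never contacts the MCP server.

```python
def workflow_steps(name, sql):
    """Return the MCP tool calls for one dataset, in the recommended order.

    Hypothetical helper: it only formats call strings and does not
    talk to the MCP server.
    """
    return [
        "list_datasets()",
        f'describe_dataset("{name}")',
        f'describe_dataset("{name}_meta")',
        f'get_column_metadata("{name}")',
        f'get_tags("{name}")',
        f"query({sql!r})",
    ]


for step in workflow_steps("harbison", "SELECT COUNT(*) FROM harbison"):
    print(step)
```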

Available MCP Tools

Tool	When to use
`list_datasets`	Always call first; returns all registered view names
`describe_dataset`	Get column names and types for `{name}` or `{name}_meta`
`get_column_metadata`	Get semantic roles and condition definitions
`get_tags`	Get assay type, publication, and provenance tags
`get_common_fields`	Find columns shared across all `_meta` views for joins
`query`	Execute DuckDB SQL against any registered view
`get_config_path`	Return the config file path (call before writing Python)

Writing Effective Queries

Each dataset has two views:

{name} - the measurement data (e.g., p-values, fold changes, scores)
{name}_meta - sample/condition metadata

Always inspect both before querying. Use describe_dataset to get the exact column names rather than guessing them.

Shape check before fetching data

-- Count matching rows without loading them
SELECT COUNT(*) FROM harbison WHERE pvalue < 0.001

Call query(sql) without return_data=True to get shape only. Use return_data=True only for the final result you need to analyze.
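
One way to apply the same check to any final query is to wrap it in a COUNT(*) subquery first. The sketch below assumes a hypothetical helper, `count_query`, which is not part of labretriever; it only rewrites the SQL string, and the column names in the example are illustrative.

```python
def count_query(sql):
    """Wrap a SELECT statement so it returns only the number of rows."""
    inner = sql.strip().rstrip(";")  # a trailing semicolon would break the subquery
    return f"SELECT COUNT(*) AS n_rows FROM ({inner}) AS sub"


final_sql = (
    "SELECT regulator_symbol, target_symbol "
    "FROM harbison WHERE pvalue < 0.001"
)
print(count_query(final_sql))
```

Run the wrapped statement through query(sql) first, then run the original statement with return_data=True once the row count looks reasonable.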

Cross-dataset analysis

Use get_common_fields to find shared columns across _meta views, then join:

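-- Column names are illustrative; confirm them with describe_dataset first
SELECT h.regulator_symbol, h.target_symbol, c.score
FROM harbison h
JOIN callingcards c
  ON h.regulator_symbol = c.regulator_symbol
 AND h.target_symbol = c.target_symbol
WHERE h.condition = 'GAL' AND h.pvalue < 0.001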
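
When both views carry a regulator column, the choice of join keys changes the row count: joining on the target alone pairs each regulator's row with every other regulator's row for that target. The sketch below illustrates this with Python's built-in sqlite3 as a stand-in for DuckDB, so it runs without labretriever. The tables, column names, and rows are invented toy data, not the real harbison or callingcards contents.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE harbison (regulator_symbol TEXT, target_symbol TEXT, pvalue REAL);
CREATE TABLE callingcards (regulator_symbol TEXT, target_symbol TEXT, score REAL);
-- Toy rows, invented for illustration only
INSERT INTO harbison VALUES ('TF_A', 'GENE_1', 0.0001), ('TF_B', 'GENE_1', 0.0002);
INSERT INTO callingcards VALUES ('TF_A', 'GENE_1', 12.0), ('TF_B', 'GENE_1', 3.0);
""")

# Join on the target only: every regulator pairs with every regulator.
target_only = con.execute("""
    SELECT COUNT(*) FROM harbison h
    JOIN callingcards c ON h.target_symbol = c.target_symbol
""").fetchone()[0]

# Join on regulator and target: one row per matching pair.
both_keys = con.execute("""
    SELECT COUNT(*) FROM harbison h
    JOIN callingcards c
      ON h.regulator_symbol = c.regulator_symbol
     AND h.target_symbol = c.target_symbol
""").fetchone()[0]

print(target_only, both_keys)  # prints "4 2"
```

A count check like this before fetching rows is a cheap way to catch an unintended fan-out.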

Filtering by condition

Condition values are dataset-specific. Before filtering, use get_column_metadata to learn how the conditions are defined, then list the values actually present in the metadata view:

SELECT DISTINCT condition FROM harbison_meta
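
Once the valid values are known, a filter can be checked against them before it reaches the query. This is a minimal sketch with a hypothetical helper, `condition_filter`, that is not part of labretriever; the set of valid values is a placeholder for whatever the DISTINCT query above returns.

```python
def condition_filter(value, valid_values, column="condition"):
    """Build a WHERE fragment, refusing values the dataset does not contain."""
    if value not in valid_values:
        raise ValueError(f"{value!r} is not one of {sorted(valid_values)}")
    escaped = value.replace("'", "''")  # standard SQL escaping for single quotes
    return f"{column} = '{escaped}'"


valid = {"GAL", "YPD"}  # placeholder values; use the ones the dataset reports
print(condition_filter("GAL", valid))
```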

Reproducing MCP Results in Python

When asked for Python code to reproduce a query result, first call get_config_path() to retrieve the actual config file path from the running MCP server, then embed that literal path in the snippet. Do not use a placeholder or os.environ for the path.

from labretriever.virtual_db import VirtualDB

vdb = VirtualDB("/actual/path/returned/by/get_config_path")

results = vdb.query(
    "SELECT ...",  # paste the SQL from the MCP query here
    return_data=True,
)
print(results)

vdb.query() returns a pandas DataFrame when return_data=True. If private repos are in use, pass token=os.environ.get("HF_TOKEN") as well.

Error Handling

If a tool returns an error about LABRETRIEVER_CONFIG not being set, the MCP server started without a config. Use /plugin to check configuration and re-enable the plugin with a valid config path.

If a query touches a private HuggingFace repository and returns an access error, the plugin needs an HF_TOKEN. Re-enable the plugin and provide a token with access to the relevant repository.
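
When reproducing results outside the plugin, the same two problems can be checked up front. The sketch below assumes LABRETRIEVER_CONFIG and HF_TOKEN are ordinary environment variables visible to the process running the check; `preflight` is a hypothetical helper, not part of labretriever, and it cannot see the environment of an already-running MCP server.

```python
import os


def preflight(env=None):
    """Return a list of readable problems; an empty list means none were found."""
    env = os.environ if env is None else env
    problems = []
    config = env.get("LABRETRIEVER_CONFIG")
    if not config:
        problems.append("LABRETRIEVER_CONFIG is not set")
    elif not os.path.isfile(config):
        problems.append(f"config file not found: {config}")
    if not env.get("HF_TOKEN"):
        problems.append("HF_TOKEN is not set (only needed for private repositories)")
    return problems


for problem in preflight():
    print(problem)
```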

Additional Resources

references/query_patterns.md - Common SQL patterns and Python API usage (how to reproduce MCP results in a notebook with VirtualDB.query())
Full docs: https://cmatkhan.github.io/labretriever/
VirtualDB config format: https://cmatkhan.github.io/labretriever/virtual_db_configuration/
