From wandb-driven-dev
Use this skill for querying and analyzing Weights & Biases projects through the W&B SDK and the local `wandb_helpers.py` query helpers. Covers run discovery, filtered run-table queries, selected summary metrics, config comparisons, exact run counts, artifacts, sweeps, reports, and bounded history scans. This is the preferred way to query W&B from the wandb-driven-dev plugin.
Slash command: /wandb-driven-dev:wbagent
This skill is the W&B query surface for `wandb-driven-dev`. Use it when a task
needs live W&B data, run metadata, summary metrics, config values, histories,
artifacts, sweeps, or Reports SDK details.
Do not write broad ad hoc W&B loops first. Start with
scripts/wandb_helpers.py, then use direct W&B SDK calls only when the helper
does not cover the query shape.
- scripts/wandb_helpers.py: primary helper module for W&B querying.
- references/WANDB_SDK.md: W&B SDK usage notes and query patterns.
- references/WANDB_CONCEPTS.md: entity/project/run/config/history concepts.
- references/REPORTS.md: W&B Reports authoring guide (recipe, runnable skeleton, Runset filters, RunComparer, and rendering gotchas).

Use this import pattern in one-off scripts:
import sys
sys.path.insert(0, "skills/wbagent/scripts")
from wandb_helpers import (
    get_api,
    probe_project,
    build_filters,
    fetch_runs,
    fetch_run_summaries,
    count_runs,
    runs_to_dataframe,
    compare_configs,
    scan_history,
    scan_history_until_step,
    compare_runs_at_step,
)
Create clients with get_api() rather than bare wandb.Api() unless there is
a specific reason. It sets a larger timeout for real projects.
- Filter on state, display_name, tags, config.KEY, summary_metrics.KEY, created_at, group, and job_type.
- Use fetch_runs() for run tables. It uses selected GraphQL summaryMetrics(keys=...) and avoids materializing wide SDK run objects.
- Use count_runs() for exact counts. It uses lazy api.runs(..., per_page=1) and does not load all runs.
- Use fetch_run_summaries() for a small set of known run IDs when setup or review needs selected summary keys.
- Pass keys=[...] to history scans. Never call run.history() or run.scan_history() without keys on large projects.
- Use scan_history_until_step() or compare_runs_at_step() for at-budget comparisons so scans stop once the selected step key passes the target.
- Run probe_project(api, "entity/project") before guessing metric names.

api = get_api()
path = "entity/project"
total = count_runs(api, path)
finished = count_runs(api, path, {"state": "finished"})
crashed = count_runs(api, path, {"state": "crashed"})
print({"total": total, "finished": finished, "crashed": crashed})
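The lazy counting pattern can be sketched as follows. This is a minimal illustration, not the actual body of count_runs(): it assumes that calling len() on the collection returned by api.runs(...) reports the server-side total without iterating every run. The names count_runs_lazily, _FakeApi, and _FakeRuns are hypothetical; the fakes exist only so the sketch runs offline.

```python
def count_runs_lazily(api, path, filters=None):
    """Return the number of runs matching `filters` without iterating them."""
    # per_page=1 keeps the single fetched page tiny; len() is assumed to read
    # the server-reported total rather than walking every run.
    runs = api.runs(path, filters=filters or {}, per_page=1)
    return len(runs)


# Offline stand-ins (hypothetical) so the sketch needs no network or API key.
class _FakeRuns:
    def __init__(self, total):
        self._total = total

    def __len__(self):
        return self._total


class _FakeApi:
    def __init__(self, totals):
        self._totals = totals  # maps a state (None means "all runs") to a count
        self.calls = []

    def runs(self, path, filters=None, per_page=50):
        self.calls.append((path, filters, per_page))
        state = (filters or {}).get("state")
        return _FakeRuns(self._totals[state])


fake_api = _FakeApi({None: 12, "finished": 9})
counts = {
    "total": count_runs_lazily(fake_api, "entity/project"),
    "finished": count_runs_lazily(fake_api, "entity/project", {"state": "finished"}),
}
print(counts)
```

With a real client, pass the object returned by get_api() instead of the fake.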
api = get_api()
rows = fetch_runs(
    api,
    "entity/project",
    metric_keys=["val/loss", "train/global_step"],
    filters={
        "state": "finished",
        "config.model.name": "baseline",
        "summary_metrics.val/loss": {"$lt": 0.2},
    },
    config_keys=["model.name", "lr", "batch_size"],
    order="+summary_metrics.val/loss",
    limit=20,
)
fetch_runs() returns flat rows with id, name, display_name, state,
created_at, requested metrics, and requested config values such as
config.model.name.
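To make the row shape concrete, here is a rough sketch of how a nested run record could be flattened into such a row. It is an illustration under assumptions, not the helper's code: the input record layout, the function names flatten and to_flat_row, and the choice to key metric columns by the bare metric name are all hypothetical.

```python
def flatten(prefix, value, out):
    """Collect nested dict leaves into `out` under dotted keys."""
    if isinstance(value, dict):
        for key, child in value.items():
            flatten(f"{prefix}.{key}", child, out)
    else:
        out[prefix] = value
    return out


def to_flat_row(record, metric_keys, config_keys):
    """Build one flat row holding only the requested metrics and config keys."""
    flat_config = flatten("config", record.get("config", {}), {})
    base_fields = ("id", "name", "display_name", "state", "created_at")
    row = {field: record.get(field) for field in base_fields}
    for key in metric_keys:
        # Assumed column naming: the metric name as requested.
        row[key] = record.get("summary", {}).get(key)
    for key in config_keys:
        column = f"config.{key}"
        row[column] = flat_config.get(column)
    return row


# Hypothetical record; real rows come from fetch_runs().
record = {
    "id": "abc123",
    "name": "abc123",
    "display_name": "baseline-lr3e-4",
    "state": "finished",
    "created_at": "2026-05-02T10:00:00Z",
    "summary": {"val/loss": 0.18, "train/global_step": 20000, "val/acc": 0.91},
    "config": {"model": {"name": "baseline", "depth": 12}, "lr": 3e-4, "batch_size": 64},
}
row = to_flat_row(record, ["val/loss", "train/global_step"], ["model.name", "lr", "batch_size"])
print(row)
```

Unrequested values such as val/acc and config.model.depth are left out, which is what keeps the table narrow.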
filters = build_filters(
    [
        "config.max_steps=20000",
        "summary_metrics.val/loss<0.2",
        "created_at>=2026-05-01",
    ],
    default_state="finished",
)
rows = fetch_runs(api, "entity/project", metric_keys=["val/loss"], filters=filters)
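One plausible way such expression strings could map onto the MongoDB-style filter dict shown earlier is sketched below. This is a guess at the idea, not the implementation of build_filters(): the function sketch_build_filters, the operator table, the value coercion rules, and the flat output shape are all assumptions, and the real helper may produce a different structure.

```python
import re

# Assumed operator mapping; "=" means plain equality.
_OPS = {"=": None, "!=": "$ne", "<": "$lt", "<=": "$lte", ">": "$gt", ">=": "$gte"}
_EXPR = re.compile(r"^(?P<key>[^<>=!]+)(?P<op>>=|<=|!=|=|<|>)(?P<value>.+)$")


def _coerce(text):
    """Turn numeric-looking text into int or float; leave everything else as str."""
    for cast in (int, float):
        try:
            return cast(text)
        except ValueError:
            continue
    return text


def sketch_build_filters(expressions, default_state=None):
    """Translate 'key<op>value' strings into a flat filter dict (hypothetical)."""
    filters = {}
    for expr in expressions:
        match = _EXPR.match(expr.strip())
        if match is None:
            raise ValueError(f"cannot parse filter expression: {expr!r}")
        key = match["key"].strip()
        value = _coerce(match["value"].strip())
        mongo_op = _OPS[match["op"]]
        if mongo_op is None:
            filters[key] = value
        else:
            bucket = filters.setdefault(key, {})
            if not isinstance(bucket, dict):
                raise ValueError(f"conflicting conditions for {key!r}")
            bucket[mongo_op] = value
    # Apply the default only when no expression already constrains the state.
    if default_state is not None and "state" not in filters:
        filters["state"] = default_state
    return filters


sketched = sketch_build_filters(
    [
        "config.max_steps=20000",
        "summary_metrics.val/loss<0.2",
        "created_at>=2026-05-01",
    ],
    default_state="finished",
)
print(sketched)
```

Dates stay as strings in this sketch because they do not parse as numbers.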
rows = fetch_run_summaries(
    api,
    "entity/project",
    run_ids=["abc123", "def456"],
    summary_keys=["val/loss", "train/global_step"],
)

comparison = compare_runs_at_step(
    api,
    "entity/project",
    run_ids=["abc123", "def456"],
    step=10_000,
    step_key="train/global_step",
    metrics=["train/loss", "val/loss"],
)
This groups metrics by namespace and retries sparse metrics one by one when a combined scan misses rows.
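The early-stopping idea behind at-budget comparisons can be sketched like this. It is a simplified illustration, not the logic of compare_runs_at_step(): it assumes history rows arrive in increasing step order, it ignores the namespace grouping and sparse-metric retries described above, and the names last_row_at_or_before and fake_history are hypothetical.

```python
def last_row_at_or_before(rows, step_key, target_step):
    """Return the latest row whose step is <= target, stopping the scan early."""
    best = None
    for row in rows:
        step = row.get(step_key)
        if step is None:
            continue  # row did not log the step key
        if step > target_step:
            break  # past the budget: stop consuming history
        best = row
    return best


consumed = []  # records which steps the scan actually pulled


def fake_history():
    """Hypothetical stand-in for a lazily fetched run history."""
    for step in range(0, 50_000, 2_500):
        consumed.append(step)
        yield {"train/global_step": step, "train/loss": round(1.0 / (1 + step / 10_000), 4)}


row_at_budget = last_row_at_or_before(fake_history(), "train/global_step", 10_000)
print(row_at_budget, "rows consumed:", len(consumed))
```

Only the rows up to the first one past the target are pulled from the generator; the rest of the history is never requested.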
run = api.run("entity/project/abc123")
rows = scan_history(
    run,
    keys=["_step", "train/global_step", "train/loss"],
    max_rows=10_000,
)
scan_history() automatically uses beta_scan_history() for large runs when
available.
When answering a W&B data question, report:
For large analyses, save reusable query scripts near the experiment context when useful, but keep final answers concise and number-backed.
- WANDB_SDK.md for SDK method signatures, filters, history scans, artifacts, and sweeps.
- WANDB_CONCEPTS.md when entity/project/run/history/summary semantics matter.
- REPORTS.md to author or update a W&B Report (recipe, skeleton, Runset filters, RunComparer, rendering gotchas).

Do not read all references by default. Use only the one needed for the current question.
npx claudepluginhub tcapelle/wandb-driven-dev --plugin wandb-driven-dev