ade-bench

Name: ade-bench
Author: typedef-ai


Generate ADE-Bench benchmark tasks from your own dbt project. Scans your models, proposes realistic bug-injection scenarios, and writes the task scaffolding (config, patches, scripts, custom assertion tests) ready to run against AI agents.

npx claudepluginhub typedef-ai/ade-bench-plugin --plugin ade-bench

Popularity

Stars

Med: 0·Avg: 285

Installs

Med: 0·Avg: 1

What's Inside

Slash Commands (3)

Create Task

/create-task

Automatically scan a dbt project and generate ADE-Bench benchmark tasks via pattern matching

Plan Tasks

/plan-tasks

Interactively plan and generate ADE-Bench benchmark tasks from a dbt project (recommended)

Setup

/setup

Install or verify the ADE-Bench harness at ~/.ade-bench (clones repo, installs CLI, downloads bundled DuckDB databases)

Agents (1)

discover-project

/discover-project

Scans a dbt project directory and returns a structured analysis of its models, sources, dependencies, database type, and complexity. Read-only — does not modify any files.

Skills (3)

create-task

/create-task

Scan a dbt project and generate ADE-Bench benchmark tasks. Analyzes models, proposes bug-injection scenarios by difficulty, and outputs complete task scaffolding (task.yaml, patches, setup/solution scripts, test SQL). Use when you want to benchmark AI agents against your own dbt project and data.

plan-tasks

/plan-tasks

Interactively plan and generate ADE-Bench benchmark tasks from a dbt project. Explores the codebase with the user in a pair-planning loop — understanding the domain, reasoning about what makes good benchmarks, and building up a task plan incrementally. Use when you want high-quality, tailored benchmark tasks for your own dbt project.

setup

/setup

Install or verify the ade-bench harness, the Python project that actually runs the benchmark tasks this plugin generates. Use when ade-bench isn't yet on the user's machine, when a generated task fails because the harness is missing, or when the user explicitly asks to set up ade-bench. Also invoked automatically by plan-tasks and create-task when they detect ade-bench is missing.

Stats

Version: 0.1.0

Language: Shell

Stars: 0

Maintenance: Excellent

License: MIT

Last Commit: May 8, 2026

Added: May 8, 2026


Available In

ade-bench-marketplace

Safety Signals

Caution

Uses power tools

Uses Bash, Write, or Edit tools

README

ade-bench-plugin

A Claude Code plugin that generates ADE-Bench benchmark tasks from your own dbt project.

What it does

Instead of relying on ADE-Bench's built-in sample projects, this plugin lets you benchmark AI agents against your own dbt models and data. It provides two skills:

- plan-tasks (recommended): Interactive pair-planning. Explores your project with you, reasons about what makes good benchmarks, and builds a task plan collaboratively before generating anything.
- create-task: Automated pipeline. Scans your project, matches models against a pattern catalog, and generates tasks. Faster, but less tailored.

Prerequisites

- Claude Code installed
- git, uv, and Docker available on your machine (for running the generated tasks)
- A dbt project with either:
  - A DuckDB database file, or
  - A Snowflake account with credentials configured

The plugin will install ADE-Bench for you on first use — plan-tasks and create-task detect when it's missing and offer to set it up at ~/.ade-bench. You can also install it explicitly:

/ade-bench:setup

Install

From a local clone (recommended while iterating)

git clone https://github.com/typedef-ai/ade-bench-plugin
claude plugin marketplace add ./ade-bench-plugin
claude plugin install ade-bench@ade-bench-marketplace

marketplace add accepts a path to any directory containing .claude-plugin/marketplace.json — the one in this repo registers the plugin as ade-bench@ade-bench-marketplace.

Verify:

claude plugin list

From GitHub directly

claude plugin marketplace add typedef-ai/ade-bench-plugin
claude plugin install ade-bench@ade-bench-marketplace

Session-only (no install)

claude --plugin-dir /path/to/ade-bench-plugin

Loads the plugin for that one session without registering it globally — useful for one-off testing.

Usage

Interactive planning (recommended)

/ade-bench:plan-tasks /path/to/my-dbt-project

With custom instructions:

/ade-bench:plan-tasks /path/to/my-dbt-project focus on the revenue pipeline, I want hard tasks that test Snowflake skills

The skill will:

1. Do a quick project scan, then ask you intake questions (how many tasks, difficulty, focus area)
2. Enter a planning loop: explore models, update a task plan file, ask you questions
3. Present the converged plan for approval
4. Generate all task files after you approve

Automated generation

/ade-bench:create-task /path/to/my-dbt-project

With options:

/ade-bench:create-task /path/to/my-dbt-project --output-dir ./my-benchmarks --db-name my_warehouse --db-path /path/to/duckdb/files

How it works

ADE-Bench tasks follow a setup/solve/verify pattern:

1. A setup patch introduces a realistic bug into a working dbt model (e.g., swapping LEFT JOIN for INNER JOIN)
2. An agent prompt describes the symptom without revealing the cause (e.g., "some customer records are missing from the mart")
3. The agent investigates and attempts a fix
4. dbt tests and table comparison against seed CSVs determine pass/fail
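The pattern above can be illustrated with a small, self-contained sketch. It is not how ADE-Bench itself is implemented: it uses Python's built-in sqlite3 instead of dbt and DuckDB, and the table names, the `build_mart` and `matches_seed` helpers, and the inline `SEED_CSV` are all hypothetical. It only shows why swapping LEFT JOIN for INNER JOIN produces the "missing customer records" symptom and how a comparison against expected seed rows would catch it.

```python
import csv
import io
import sqlite3

# Hypothetical source tables; the names are illustrative, not from ADE-Bench.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER, name TEXT);
    CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'Ada'), (2, 'Grace'), (3, 'Edsger');
    INSERT INTO orders VALUES (10, 1, 25.0), (11, 1, 15.0), (12, 2, 40.0);
""")

# A stand-in for a mart model; {join} is where the bug gets injected.
MART_SQL = """
    SELECT c.customer_id, c.name, COALESCE(SUM(o.amount), 0) AS total_spent
    FROM customers AS c
    {join} orders AS o ON o.customer_id = c.customer_id
    GROUP BY c.customer_id, c.name
    ORDER BY c.customer_id
"""

# Expected mart contents, standing in for a seed CSV file (assumed layout).
SEED_CSV = "customer_id,name,total_spent\n1,Ada,40.0\n2,Grace,40.0\n3,Edsger,0.0\n"


def build_mart(join: str) -> list[tuple]:
    """Run the mart query with the given join keyword."""
    return conn.execute(MART_SQL.format(join=join)).fetchall()


def load_seed(text: str) -> list[tuple]:
    """Parse the seed CSV into typed rows."""
    reader = csv.DictReader(io.StringIO(text))
    return [(int(r["customer_id"]), r["name"], float(r["total_spent"])) for r in reader]


def matches_seed(rows: list[tuple], seed_text: str) -> bool:
    """Order-insensitive comparison of query output against the seed rows."""
    return sorted(rows) == sorted(load_seed(seed_text))


working = build_mart("LEFT JOIN")   # the original model
buggy = build_mart("INNER JOIN")    # the model after the setup patch

print("working matches seed:", matches_seed(working, SEED_CSV))
print("buggy matches seed:", matches_seed(buggy, SEED_CSV))
print("customers missing from buggy mart:", len(working) - len(buggy))
```

The customer with no orders disappears under the inner join, which is exactly the kind of symptom an agent prompt can describe without naming the join as the cause.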

Both skills automate steps 1-2. The difference is how they choose bugs:

| | `plan-tasks` | `create-task` |
|---|---|---|
| Approach | Reads models, understands business logic, reasons about what would be meaningful | Scans for SQL tokens, matches against pattern catalog |
| User input | Conversational: intake questions, iterative refinement | Pick from a ranked list |
| Task quality | Higher: bugs are semantically motivated | Template-level: correct but more generic |
| Speed | Slower (5-15 min with interaction) | Faster (2-5 min) |
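To make the `create-task` approach more concrete, here is a rough sketch of what "scan for SQL tokens and match against a pattern catalog" could look like. The plugin's real catalog and matching logic are not shown in this README, so the `BugPattern` class, the `CATALOG` entries, and the `propose` function are assumptions made purely for illustration; the difficulty labels follow the bug categories listed in this README.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class BugPattern:
    """One hypothetical catalog entry: a bug idea and the SQL token that enables it."""
    name: str
    difficulty: str
    token: re.Pattern


# Hypothetical catalog; the plugin's actual catalog may look quite different.
CATALOG = [
    BugPattern("removed_where_filter", "easy", re.compile(r"\bWHERE\b", re.I)),
    BugPattern("wrong_join_type", "medium", re.compile(r"\bLEFT\s+JOIN\b", re.I)),
    BugPattern("incorrect_aggregation", "medium", re.compile(r"\b(SUM|COUNT|AVG)\s*\(", re.I)),
    BugPattern("broken_case_expression", "medium", re.compile(r"\bCASE\b", re.I)),
]


def propose(model_sql: str) -> list[str]:
    """Return the names of catalog patterns whose token appears in the model SQL."""
    return [p.name for p in CATALOG if p.token.search(model_sql)]


# Illustrative model SQL, not taken from any real project.
example_model = """
select c.customer_id, sum(o.amount) as total
from customers c
left join orders o on o.customer_id = c.customer_id
where c.is_active
group by 1
"""

print(propose(example_model))
```

Token matching of this kind is fast and predictable, but it has no notion of business meaning, which is consistent with the trade-off described in the table above.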

Bug categories

| Difficulty | Examples |
|---|---|
| Easy | Missing column, wrong alias, removed WHERE filter |
| Medium | Wrong join type, incorrect aggregation, broken CASE expression |
| Hard | Semantic aggregation error, broken incremental logic, upstream bug manifesting downstream |

Snowflake-specific bugs are also supported: IFF() logic flips, QUALIFY filter errors, FLATTEN scope issues, TRY_TO_* removal, BOOLOR/BOOLAND_AGG swaps, and more.
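As an illustration of what a setup patch for an Easy bug might contain, the sketch below removes a WHERE filter from a model and renders the change as a unified diff using Python's standard `difflib`. The model SQL, the file path, and the `inject_removed_filter` and `make_patch` helpers are hypothetical, and the patch format the plugin actually writes may differ.

```python
import difflib

# Hypothetical working dbt model; not taken from any real project.
WORKING_MODEL = """\
select order_id, customer_id, amount
from {{ ref('stg_orders') }}
where status = 'completed'
"""


def inject_removed_filter(sql: str) -> str:
    """Drop every line that starts with WHERE (a deliberately naive bug injector)."""
    kept = [
        line
        for line in sql.splitlines(keepends=True)
        if not line.lstrip().lower().startswith("where ")
    ]
    return "".join(kept)


def make_patch(before: str, after: str, path: str) -> str:
    """Render the difference between two versions of a file as a unified diff."""
    diff = difflib.unified_diff(
        before.splitlines(keepends=True),
        after.splitlines(keepends=True),
        fromfile=f"a/{path}",
        tofile=f"b/{path}",
    )
    return "".join(diff)


buggy_model = inject_removed_filter(WORKING_MODEL)
patch = make_patch(WORKING_MODEL, buggy_model, "models/marts/fct_orders.sql")
print(patch)
```

Reversing the two arguments to `make_patch` would give the corresponding fix, which is roughly the role a solution script plays relative to the setup patch.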

Running generated tasks

After generating tasks, use ADE-Bench to run them:

# Generate seeds (DuckDB only, first run)
ade run my_project001 --db duckdb --project-type dbt --agent sage --seed \
  --tasks-dir ./my-benchmarks/tasks

# Validate with sage agent
ade run my_project001 --db duckdb --project-type dbt --agent sage \
  --tasks-dir ./my-benchmarks/tasks

# Test with a real agent
ade run my_project001 --db duckdb --project-type dbt --agent claude \
  --tasks-dir ./my-benchmarks/tasks
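If you run the same sequence for several agents, the three invocations above can be assembled programmatically. The following is only a convenience sketch: `ade_run_command` and `run_plan` are hypothetical helpers, and they use only the flags that appear in the commands above. The script prints the commands instead of executing them.

```python
import shlex


def ade_run_command(task_id: str, agent: str, tasks_dir: str, seed: bool = False) -> list[str]:
    """Build one `ade run` argv using only the flags shown in this README."""
    argv = ["ade", "run", task_id, "--db", "duckdb",
            "--project-type", "dbt", "--agent", agent]
    if seed:
        argv.append("--seed")  # seed generation: DuckDB only, first run
    argv += ["--tasks-dir", tasks_dir]
    return argv


def run_plan(task_id: str, tasks_dir: str, agents: list[str]) -> list[list[str]]:
    """Seed with sage, validate with sage, then test each listed agent."""
    plan = [
        ade_run_command(task_id, "sage", tasks_dir, seed=True),
        ade_run_command(task_id, "sage", tasks_dir),
    ]
    plan += [ade_run_command(task_id, agent, tasks_dir) for agent in agents]
    return plan


plan = run_plan("my_project001", "./my-benchmarks/tasks", ["claude"])
for argv in plan:
    print(shlex.join(argv))
```

To actually execute the plan, each argv list could be passed to `subprocess.run(argv, check=True)` so that a failed seed or validation step stops the sequence before a real agent is tested.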

For Snowflake projects:

View full README on GitHub
