By typedef-ai
Generate ADE-Bench benchmark tasks from your own dbt project. Scans your models, proposes realistic bug-injection scenarios, and writes the task scaffolding (config, patches, scripts, custom assertion tests) ready to run against AI agents.
Automatically scan a dbt project and generate ADE-Bench benchmark tasks via pattern matching
Interactively plan and generate ADE-Bench benchmark tasks from a dbt project (recommended)
Install or verify the ADE-Bench harness at ~/.ade-bench (clones repo, installs CLI, downloads bundled DuckDB databases)
Scan a dbt project and generate ADE-Bench benchmark tasks. Analyzes models, proposes bug-injection scenarios by difficulty, and outputs complete task scaffolding (task.yaml, patches, setup/solution scripts, test SQL). Use when you want to benchmark AI agents against your own dbt project and data.
Interactively plan and generate ADE-Bench benchmark tasks from a dbt project. Explores the codebase with the user in a pair-planning loop — understanding the domain, reasoning about what makes good benchmarks, and building up a task plan incrementally. Use when you want high-quality, tailored benchmark tasks for your own dbt project.
Install or verify the ade-bench harness, the Python project that actually runs the benchmark tasks this plugin generates. Use when ade-bench isn't yet on the user's machine, when a generated task fails because the harness is missing, or when the user explicitly asks to set up ade-bench. Also invoked automatically by plan-tasks and create-task when they detect ade-bench is missing.
Uses power tools
Uses Bash, Write, or Edit tools
A Claude Code plugin that generates ADE-Bench benchmark tasks from your own dbt project.
Instead of relying on ADE-Bench's built-in sample projects, this plugin lets you benchmark AI agents against your own dbt models and data. It provides two skills:
- plan-tasks (recommended): Interactive pair-planning. Explores your project with you, reasons about what makes good benchmarks, and builds a task plan collaboratively before generating anything.
- create-task: Automated pipeline. Scans your project, matches models against a pattern catalog, and generates tasks. Faster, but less tailored.

Prerequisites: git, uv, and Docker available on your machine (for running the generated tasks).

The plugin will install ADE-Bench for you on first use: plan-tasks and create-task detect when it's missing and offer to set it up at ~/.ade-bench. You can also install it explicitly:
/ade-bench:setup
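Before running setup, it can help to confirm that the three prerequisites named above are on your PATH. The following is a minimal sketch, not part of the plugin; the helper name `missing_tools` is hypothetical, and it only checks that each command exists, not that its version is suitable or that the Docker daemon is running.

```shell
#!/bin/sh
# Hypothetical helper: print every tool from the argument list that is not on PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Check the three prerequisites listed in this README.
missing=$(missing_tools git uv docker)
if [ -n "$missing" ]; then
    # Unquoted on purpose so the missing names print on one line.
    echo "Missing prerequisites:" $missing
else
    echo "git, uv, and docker are all available"
fi
```

If anything is reported missing, install it before generating or running tasks.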
git clone https://github.com/typedef-ai/ade-bench-plugin
claude plugin marketplace add ./ade-bench-plugin
claude plugin install ade-bench@ade-bench-marketplace
marketplace add accepts a path to any directory containing .claude-plugin/marketplace.json — the one in this repo registers the plugin as ade-bench@ade-bench-marketplace.
Verify:
claude plugin list
claude plugin marketplace add typedef-ai/ade-bench-plugin
claude plugin install ade-bench@ade-bench-marketplace
claude --plugin-dir /path/to/ade-bench-plugin
Loads the plugin for that one session without registering it globally — useful for one-off testing.
/ade-bench:plan-tasks /path/to/my-dbt-project
With custom instructions:
/ade-bench:plan-tasks /path/to/my-dbt-project focus on the revenue pipeline, I want hard tasks that test Snowflake skills
The skill will:
/ade-bench:create-task /path/to/my-dbt-project
With options:
/ade-bench:create-task /path/to/my-dbt-project --output-dir ./my-benchmarks --db-name my_warehouse --db-path /path/to/duckdb/files
ADE-Bench tasks follow a setup/solve/verify pattern:
LEFT JOIN for INNER JOIN)

Both skills automate steps 1-2. The difference is how they choose bugs:
| | plan-tasks | create-task |
|---|---|---|
| Approach | Reads models, understands business logic, reasons about what would be meaningful | Scans for SQL tokens, matches against pattern catalog |
| User input | Conversational — intake questions, iterative refinement | Pick from a ranked list |
| Task quality | Higher — bugs are semantically motivated | Template-level — correct but more generic |
| Speed | Slower (5-15 min with interaction) | Faster (2-5 min) |
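To give a feel for the token-scanning approach in the create-task column, here is a rough sketch of matching model files against a small catalog. It is an illustration under stated assumptions, not the plugin's implementation: the `catalog` entries, the `scan_models` function, and the demo model are all hypothetical, and the real pattern catalog is presumably richer.

```shell
#!/bin/sh
# Hypothetical catalog: "SQL token|candidate bug". Illustrative only,
# not the plugin's real pattern catalog.
catalog='left join|wrong join type
case when|broken CASE expression
where|removed WHERE filter'

# Print "model<TAB>candidate bug" for each catalog token found in a .sql file.
scan_models() {
    models_dir=$1
    printf '%s\n' "$catalog" | while IFS='|' read -r token bug; do
        # Case-insensitive, fixed-string search across all .sql files.
        find "$models_dir" -name '*.sql' -exec grep -ilF -- "$token" {} + |
        while IFS= read -r model; do
            printf '%s\t%s\n' "$model" "$bug"
        done
    done
}

# Demo on a throwaway model that contains only a LEFT JOIN.
demo_dir=$(mktemp -d)
printf 'select * from a\nLEFT JOIN b on a.id = b.id\n' > "$demo_dir/orders.sql"
scan_models "$demo_dir"
```

The demo prints the path of `orders.sql` paired with "wrong join type". A scan like this finds where a bug could be placed, but it cannot judge whether the bug would be meaningful for the business logic, which is the gap plan-tasks is meant to fill.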
| Difficulty | Examples |
|---|---|
| Easy | Missing column, wrong alias, removed WHERE filter |
| Medium | Wrong join type, incorrect aggregation, broken CASE expression |
| Hard | Semantic aggregation error, broken incremental logic, upstream bug manifesting downstream |
Snowflake-specific bugs are also supported: IFF() logic flips, QUALIFY filter errors, FLATTEN scope issues, TRY_TO_* removal, BOOLOR/BOOLAND_AGG swaps, and more.
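To make the "wrong join type" category concrete, the sketch below injects that bug by hand into a throwaway model and prints the resulting diff. Everything in it is hypothetical (the model, the file names, the `sed` substitution); it shows the idea of a bug-injection patch, not the files the plugin actually writes.

```shell
#!/bin/sh
# Hypothetical illustration: hand-inject a "wrong join type" bug into a toy model.
workdir=$(mktemp -d)

# A made-up dbt model; customers without orders should still appear in the output.
cat > "$workdir/customer_orders.sql" <<'SQL'
select c.customer_id, count(o.order_id) as order_count
from {{ ref('customers') }} as c
left join {{ ref('orders') }} as o on o.customer_id = c.customer_id
group by c.customer_id
SQL

# Inject the bug: an inner join silently drops customers with no orders.
sed 's/^left join /inner join /' "$workdir/customer_orders.sql" \
    > "$workdir/customer_orders.buggy.sql"

# Show the change as a unified diff; diff exits 1 when the files differ.
diff -u "$workdir/customer_orders.sql" "$workdir/customer_orders.buggy.sql" || true
```

The buggy model still compiles and runs, so an agent can only catch it by checking the data: a test that counts customers would fail because those with zero orders disappear.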
After generating tasks, use ADE-Bench to run them:
# Generate seeds (DuckDB only, first run)
ade run my_project001 --db duckdb --project-type dbt --agent sage --seed \
--tasks-dir ./my-benchmarks/tasks
# Validate with sage agent
ade run my_project001 --db duckdb --project-type dbt --agent sage \
--tasks-dir ./my-benchmarks/tasks
# Test with a real agent
ade run my_project001 --db duckdb --project-type dbt --agent claude \
--tasks-dir ./my-benchmarks/tasks
For Snowflake projects: