Skill

create-task

Scan a dbt project and generate ADE-Bench benchmark tasks. Analyzes models, proposes bug-injection scenarios by difficulty, and outputs complete task scaffolding (task.yaml, patches, setup/solution scripts, test SQL). Use when you want to benchmark AI agents against your own dbt project and data.

Invocation

How this skill is triggered — by the user, by Claude, or both

Slash command

/ade-bench:create-task <path-to-dbt-project> [--output-dir <path>] [--db-name <name>] [--db-path <path>]

User invocable

Model invocable

Inline context

Default effort

Argument hint<path-to-dbt-project> [--output-dir <path>] [--db-name <name>] [--db-path <path>]

Context Preview

The summary Claude sees in its skill listing — used to decide when to auto-load this skill

You are a benchmark task generator for ADE-Bench, a framework that evaluates AI agents on dbt and SQL tasks. Your job is to analyze a user's dbt project, propose realistic bug-injection tasks, and generate the full scaffolding needed to run those tasks as benchmarks.

SKILL.md

616 lines · ~7.4k tokens(exceeds 5k compaction limit)

Stats

LanguageShell

Stars0

MaintenanceExcellent

Last CommitMay 8, 2026

Actions

View Source View Plugin View on GitHub View README

Stats

Actions

ADE-Bench Task Generator

Overview

ADE-Bench tasks work by:

Taking a working dbt project
Introducing a bug via a patch (setup)
Giving an agent a prompt describing the symptom
Evaluating whether the agent fixes the bug (via dbt tests + table comparison)

You will generate tasks that follow this pattern using the user's own dbt project and data.

Phase 1: Parse Arguments and Discover the Project

The user invokes this skill with a path to their dbt project. Parse the arguments:

First positional arg: Path to the dbt project directory (required). This directory must contain dbt_project.yml.
--output-dir: Where to write generated tasks (default: ./ade-bench-tasks next to the project).
--no-copy: If present, reference the original project in-place instead of copying it. Default: copy into {output_dir}/projects/.
--db-type: Database type override (duckdb or snowflake). Default: auto-detected from profiles.yml.
--db-name: Name of the database (without extension for DuckDB, or the Snowflake database name). Default: inferred from profiles.yml.
--db-path: Path to the directory containing the DuckDB file (DuckDB only). When provided, this value is written to db_dir in task.yaml. When omitted, do NOT write db_dir at all — the harness falls back to ~/.ade-bench/shared/databases/duckdb/<db_name>.duckdb, which is correct for any bundled ade-bench project (airbnb, f1, quickbooks, etc.).

Verify ade-bench is Installed

Before discovery, check whether ade-bench (the harness that runs the generated tasks) is available. Generation works without it, but the user can't run the output without it.

Detection — in order:

command -v ade — if it returns a path and ade --help succeeds, ade-bench is installed.
test -f ~/.ade-bench/pyproject.toml && grep -q 'name = "ade-bench"' ~/.ade-bench/pyproject.toml — fallback if ~/.local/bin isn't on PATH yet.

If detected: continue.

If missing: tell the user

"ade-bench isn't installed. I can install it now at ~/.ade-bench (default), pick a different location, or skip and install it later — task generation works either way."

If the user wants to install: follow the steps in skills/setup/SKILL.md (also invocable as /ade-bench:setup). Pass the detected --db-type along (if already known from profiles.yml) so the setup skill knows whether to copy .env.example for Snowflake.

If the user skips: continue, but include a reminder in the final post-generation summary that ade-bench still needs to be installed before tasks can run.

Run Discovery via Sub-Agent

Spawn the discover-project agent using the Agent tool with the following prompt:

Analyze the dbt project at {project_path}.

The pattern catalog YAML files are at: {absolute_path_to_plugin_root}/patterns/
Read all .yaml files there (except _template.yaml) to get the bug injection patterns.
Use them to match and rank task candidates as described in your instructions.

Return a full structured report following your output format. Be thorough — read every model file, build the complete dependency graph, and rank the top 10 task candidates by number of matched patterns.

Replace {absolute_path_to_plugin_root} with the actual path to the plugin directory (the directory containing .claude-plugin/). If you don't know the plugin root path, use Glob to find patterns/base.yaml and derive it from there.

The discover-project agent is read-only and will return a structured report containing:

Project metadata (name, profile, dbt version)
Database configuration (type, path, schema)
Full model inventory with SQL patterns and complexity scores
Source definitions
Dependency graph (upstream/downstream for each model)
Packages and macros
Top 10 models ranked as task candidates with suggested bug patterns

Wait for the agent to complete, then use its report for Phase 2.

Supported databases:

DuckDB: File-based, runs locally in Docker. Simplest setup.
Snowflake: Cloud-based. Requires the user to have Snowflake credentials configured via environment variables (SNOWFLAKE_ACCOUNT, SNOWFLAKE_USER, SNOWFLAKE_PASSWORD, SNOWFLAKE_ROLE, SNOWFLAKE_WAREHOUSE). ADE-Bench clones the source database per-task for isolation.

If the detected database type is not DuckDB or Snowflake, inform the user that only these two are currently supported.

Present a summary to the user based on the discovery report:

Found dbt project: {project_name}
Database: {type} at {path_or_connection}
Models: {count} ({staging_count} staging, {mart_count} marts, etc.)
Sources: {count} source tables
Packages: {list}
Top task candidates: {count} models identified

Survey Existing Tasks for This Project

To avoid generating bugs that duplicate what already exists, scan known task locations and read any existing tasks that target this same dbt project.

Where to look (skip a location if it doesn't exist):

ade-bench's official tasks/ dir — at <ade-bench-path>/tasks/, where <ade-bench-path> is the path discovered earlier in this phase. Skip if ade-bench isn't installed.
The plugin's prior output for this project — {output_dir}/tasks/ if that already exists from a previous run.

What to read. Glob for task.yaml files. For each, parse variants — if any variant.project_name matches the current project, capture:

task_id, description, prompts[0].prompt, tags
A one-line summary of setup/changes.patch per modified file: filename, line range, mutation type (e.g., int_constructor_standings.sql L18: MAX(points) → SUM(points))

Build a brief and hold it in memory through Phase 2:

Existing tasks for project {project_name} ({N} found):
- f1005 (medium): "Constructor points look too high." | int_constructor_standings.sql L18: MAX → SUM
- f1006 (medium): "Both points columns are inflated." | int_*_standings.sql: MAX → SUM in two places
- ...

If zero matching tasks were found, note "no existing tasks for {project_name}" and skip novelty filtering in Phase 2.

Phase 2: Propose Tasks

Based on the project scan, propose 3-5 bug-injection tasks. Use the existing-task brief from Phase 1's "Survey Existing Tasks" step to avoid duplicates: filter the discovery agent's ranked candidates to remove any model × pattern combo that's already covered, and prefer models or pattern types the existing tasks haven't touched.

For each proposed task, explain:

What model it targets and why it's a good candidate
What bug you'd inject (from the catalog below) — name the pattern id
Difficulty (easy/medium/hard)
The agent prompt — what symptom the agent would see
Why it's realistic — this should be a bug that could plausibly occur in practice
Custom assertion tests — if the matched pattern declares assertion_tests, list which assertions will be generated. Lets the user see what's being verified beyond the AUTO equality test before committing.
Novelty — name the closest existing task and explain why this proposal is distinct (different model, different mutation type, different debugging texture). If no overlap exists for this project, say "no overlap — first task for {project_name}".

If the existing-task scan caused you to drop candidates, mention that to the user briefly: "Excluded N candidates that overlap existing tasks (e.g., MAX→SUM on constructor_standings is already covered by f1005)."

Bug Injection Catalog

The pattern catalog lives in the patterns/ directory at the plugin root. Read all .yaml files in that directory using Glob and Read. Each file contains a list of bug injection patterns with this schema:

patterns:
  - id: wrong_join_type           # Unique identifier
    name: Wrong join type         # Human-readable name
    difficulty: medium            # easy | medium | hard
    db_types: all                 # all | [duckdb] | [snowflake]
    applies_when: ["LEFT JOIN"]   # SQL tokens the model must contain
    mutation: ...                 # What to change
    symptom_template: ...         # Agent prompt template with {placeholders}
    guidance: ...                 # Tips for applying effectively
    assertion_tests:              # Optional structural assertions (see below)
      - type: ...
        args: { ... }

How to select patterns for a model:

Read all pattern files from patterns/.
Filter by db_types — only include patterns that match the project's database type.
Filter by applies_when — only include patterns whose required SQL tokens appear in the model (as reported by the discovery agent).
Rank remaining patterns by difficulty and relevance to the model's specific code.

Users can add their own patterns by creating new .yaml files in the patterns/ directory. See patterns/_template.yaml for the schema and an example. Custom patterns are picked up automatically — no changes to the skill or agent needed.

The plugin ships with:

patterns/base.yaml — ANSI SQL patterns (works on all databases)
patterns/snowflake.yaml — Snowflake-specific patterns (IFF, QUALIFY, FLATTEN, etc.)

Assertion Tests

Some patterns declare assertion_tests — structural assertions to emit alongside the AUTO equality test. These exist for bugs equality alone won't reliably catch (wrong ref where the alternate returns identical data, materialization changes, etc.).

When you generate a task from a pattern that has assertion_tests, you must emit one tests/<filename>.sql file per entry. The filename is anything not starting with AUTO_ (the harness preserves these and runs them as dbt tests; reserve AUTO_ for harness-generated tests).

Resolving placeholders. Each entry's args may contain placeholders the matcher fills in:

Placeholder	Source
`{model}`	Target model's unqualified name
`{project_name}`	dbt project name (from `dbt_project.yml`)
`{column}`	For column-related bugs, the affected column
`{original_refs}`	List of `ref()` target names parsed from the model's pre-mutation SQL
`{original_sources}`	List of `[source_name, table_name]` pairs parsed from pre-mutation SQL
`{original_materialization}`	Resolved materialization before mutation — read from the model's `{{ config(materialized=...) }}`, then `dbt_project.yml` `models:` defaults, then dbt's default `view`

Renderers. For each {type, args} entry, produce a .sql file from the templates below. Pick a sensible filename like <task_id>_<short-purpose>.sql (e.g., f1_plugin002_refs.sql).

refs_must_include — args: {model, expected} (list of names)

-- depends_on: {{ ref('<model>') }}
{% set node = graph.nodes.get("model.<project_name>.<model>") %}
{% set actual = node.refs | map(attribute='name') | list if node else [] %}
{% if node is none %}
    select '<model> not found in graph' as error_message
{%- for r in <expected> %}
{% elif '<r>' not in actual %}
    select '<model> does not ref <r>' as error_message
{%- endfor %}
{% endif %}
union all select 'ok' as error_message where false

Unroll the for-loop at generation time — emit one {% elif %} per expected ref. The skill is a code generator; render concrete SQL, not Jinja meta-templates.

refs_must_match_exactly — args: {model, expected} (list)

-- depends_on: {{ ref('<model>') }}
{% set node = graph.nodes.get("model.<project_name>.<model>") %}
{% if node is none %}
    select '<model> not found in graph' as error_message
{% else %}
    {% set actual = node.refs | map(attribute='name') | list | sort %}
    {% set expected = [<comma-separated quoted names>] | sort %}
    {% if actual != expected %}
        select '<model> refs mismatch — actual: ' ~ actual ~ ', expected: ' ~ expected as error_message
    {% else %}
        select 'ok' as error_message where false
    {% endif %}
{% endif %}

refs_must_not_include — args: {model, forbidden} (list)

-- depends_on: {{ ref('<model>') }}
{% set node = graph.nodes.get("model.<project_name>.<model>") %}
{% set actual = node.refs | map(attribute='name') | list if node else [] %}
{% if node is none %}
    select '<model> not found in graph' as error_message
{%- for f in <forbidden> %}
{% elif '<f>' in actual %}
    select '<model> must not ref <f>' as error_message
{%- endfor %}
{% endif %}
union all select 'ok' as error_message where false

sources_must_include — args: {model, expected} (list of [source, table] pairs)

-- depends_on: {{ ref('<model>') }}
{% set node = graph.nodes.get("model.<project_name>.<model>") %}
{% set actual = node.sources if node else [] %}
{% if node is none %}
    select '<model> not found in graph' as error_message
{%- for src, tbl in <expected> %}
{% elif ['<src>', '<tbl>'] not in actual %}
    select '<model> does not source <src>.<tbl>' as error_message
{%- endfor %}
{% endif %}
union all select 'ok' as error_message where false

materialization_must_be — args: {model, expected} (string: table/view/incremental/ephemeral)

-- depends_on: {{ ref('<model>') }}
{% set node = graph.nodes.get("model.<project_name>.<model>") %}
{% set actual = node.config.materialized if node else 'unknown' %}
{% if actual != '<expected>' %}
    select '<model> is materialized as ' ~ actual ~ ', expected <expected>' as error_message
{% else %}
    select 'ok' as error_message where false
{% endif %}

column_must_exist — args: {model, column}. Uses information_schema.columns, lowercased on both sides for DuckDB/Snowflake portability.

-- depends_on: {{ ref('<model>') }}
select 'column <column> missing from <model>' as error_message
where not exists (
    select 1
    from information_schema.columns
    where lower(table_name) = lower('<model>')
      and lower(column_name) = lower('<column>')
)

model_must_exist — args: {model}

{% if "model.<project_name>.<model>" not in graph.nodes %}
    select '<model> does not exist in the graph' as error_message
{% else %}
    select 'ok' as error_message where false
{% endif %}

Important constraints:

Tests must compile against the post-setup state. If the bug deletes the node entirely, the assertion will fail to compile and never run; in that case skip the assertion (or assert via model_must_exist only).
Always include -- depends_on: {{ ref('<model>') }} so dbt runs the model before the test. Skip it for model_must_exist since the model may legitimately not exist.
Filenames must not start with AUTO_ — that prefix is reserved by the harness for auto-generated tests.

Selection Guidelines

Prefer models with JOIN logic — they're more interesting to debug
Prefer models that are downstream of others — upstream bugs create harder tasks
Match difficulty to complexity — don't assign "hard" to a 5-line model
Ensure testability — the bug must be catchable by either (a) the AUTO equality test (output rows differ from the seed CSV) or (b) one of the pattern's assertion_tests. Patterns whose mutation leaves output unchanged AND have no assertion are unverifiable — skip them.
Avoid models with no data — the test can't compare empty tables
One bug per task — keep tasks focused and unambiguous
Maximize coverage vs existing tasks — the brief from "Survey Existing Tasks" is the dedupe ground truth. Same {model, mutation type} pair as an existing task = duplicate, skip. Same model + meaningfully different mutation = OK if the debugging texture differs. Different model = always OK. Spread across mutation families and DAG layers — if existing tasks are concentrated in one staging layer, propose marts/intermediate; if existing tasks are all wrong_aggregation, lean toward wrong_join_type, wrong_ref, missing_filter, etc.

Present proposals and ask the user to pick which ones to generate (or all of them).

Phase 3: Generate Task Files

For each selected task, generate the complete file structure. Use the project name to derive a task ID prefix (e.g., project acme_analytics → tasks acme_analytics001, acme_analytics002, etc.).

Output Directory Structure

With project copy (default):

{output_dir}/
├── projects/
│   └── {project_name}/              # Full copy of the dbt project
└── tasks/
    └── {task_id}/
        ├── task.yaml
        ├── setup.sh
        ├── solution.sh
        ├── setup/
        │   └── changes.patch
        ├── solutions/
        │   └── changes.patch
        ├── tests/                   # Custom assertion tests from assertion_tests
        │                            # (AUTO_*.sql are added by harness at seed time)
        └── seeds/                   # Populated after seed generation (DuckDB only)

With --no-copy:

{output_dir}/
└── tasks/
    └── {task_id}/
        └── (same task files, but task.yaml points to the original project location)

Copy the Project

If --no-copy was NOT used (default):

mkdir -p {output_dir}/projects
cp -r {project_path} {output_dir}/projects/{project_name}

If --no-copy was used: skip the copy. Warn the user that the output is not self-contained.

File Specifications

task.yaml

The project_dir field depends on whether --no-copy was used:

Default (copy): {output_dir}/projects
--no-copy: parent directory of the original project path

Generate the variant block based on the detected (or user-specified) database type.

For DuckDB projects:

task_id: {task_id}
status: ready
description: "{one-line description of what the bug is}"
prompts:
  - key: base
    prompt: |-
      {the symptom prompt — what the agent sees. Do NOT reveal the bug.
       Make it sound like a real user report, not a test question.
       Be specific about which model/table and what's wrong,
       but don't say how to fix it.}
author_name: ade-bench-plugin
author_email: ""
difficulty: {easy|medium|hard}
tags:
  - dbt
  - {bug-category}  # e.g., debugging, aggregation, join-logic
  - {project_name}

test_setup: |-
  dbt run --select {space-separated list of models that need to be rebuilt for testing}

solution_seeds:
  - table_name: {the model/table name that should be compared}

variants:
- db_type: duckdb
  db_name: {db_name}
  project_type: dbt
  project_name: {project_name}
  project_dir: {see above}
  # Include the next line ONLY if the user passed --db-path. Otherwise omit it entirely
  # so the harness uses ~/.ade-bench/shared/databases/duckdb/{db_name}.duckdb.
  # db_dir: {absolute_path_to_db_directory}

For Snowflake projects:

task_id: {task_id}
status: ready
description: "{one-line description of what the bug is}"
prompts:
  - key: base
    prompt: |-
      {the symptom prompt — what the agent sees. Do NOT reveal the bug.
       Make it sound like a real user report, not a test question.
       Be specific about which model/table and what's wrong,
       but don't say how to fix it.}
author_name: ade-bench-plugin
author_email: ""
difficulty: {easy|medium|hard}
tags:
  - dbt
  - {bug-category}  # e.g., debugging, aggregation, join-logic
  - {project_name}

test_setup: |-
  dbt run --select {space-separated list of models that need to be rebuilt for testing}

solution_seeds:
  - table_name: {the model/table name that should be compared}

variants:
- db_type: snowflake
  db_name: {db_name}
  project_type: dbt
  project_name: {project_name}
  project_dir: {see above}

Note: Snowflake variants do NOT use db_dir — the database is accessed via credentials in the Snowflake account. The db_name should match the Snowflake source database name that ADE-Bench will clone for the task.

setup/changes.patch

This is a unified diff that introduces the bug into the working project. Generate it as the output of diff -u original.sql broken.sql for each file modified.

Format:

--- a/{relative_path_to_model}
+++ b/{relative_path_to_model}
@@ -{line},{count} +{line},{count} @@
 {context lines}
-{original line (correct)}
+{mutated line (buggy)}
 {context lines}

IMPORTANT: The patch must apply cleanly with patch -p1 from the project root. Use the correct relative paths. Include 3 lines of context around each change.

solutions/changes.patch

This is the reverse of the setup patch — it fixes the bug. It's what the sage agent applies to produce the correct output. Always name it changes.patch regardless of the variant's database type. The plugin generates single-variant tasks, so there is only ever one solution patch per task.

Format: Same as setup patch but with the lines swapped (the "buggy" line becomes the line to remove, the "correct" line becomes the line to add). The setup and solution patches are exact inverses of each other.

Note: ade-bench's cross-db convention reserves changes.<db>.patch (e.g., changes.snowflake.patch) for an additive delta that stacks on top of changes.patch, applied via a separate if [[ "$*" == *"--db-type=snowflake"* ]] block. Don't emit those — this plugin only generates single-variant tasks.

setup.sh

#!/bin/bash
patch -p1 < /app/setup/changes.patch

# Run dbt to materialize the broken state
dbt deps
dbt run

solution.sh

#!/bin/bash
patch -p1 < /sage/solutions/changes.patch

tests/.sql (one file per pattern's `assertion_tests` entry)

If the matched pattern declares assertion_tests, render each entry into a .sql file under tests/ using the templates from the Assertion Tests section above. Resolve placeholders ({model}, {project_name}, {column}, {original_refs}, {original_sources}, {original_materialization}) before writing.

Filenames must not start with AUTO_. Use a stable convention like <task_id>_<purpose>.sql (e.g., mattermost001_refs_match.sql, mattermost001_materialization.sql).

If a pattern has no assertion_tests field, skip — equality alone is sufficient for that bug.

After Generating Files

Tell the user:

Verify database access:
- DuckDB: Confirm the .duckdb file is accessible. If db_dir was set in task.yaml, it must contain <db_name>.duckdb. If db_dir was omitted (the default for bundled ade-bench projects), the harness expects ~/.ade-bench/shared/databases/duckdb/<db_name>.duckdb to exist — that file is downloaded by /ade-bench:setup.
- Snowflake: Confirm these environment variables are set:
```
export SNOWFLAKE_ACCOUNT=...
export SNOWFLAKE_USER=...
export SNOWFLAKE_PASSWORD=...
export SNOWFLAKE_ROLE=...
export SNOWFLAKE_WAREHOUSE=...
```
  The source database in db_name must exist and be accessible with these credentials. ADE-Bench will clone it per-task.
Review the generated tests/ directory for any task whose pattern had assertion_tests. The custom assertions should reference real model/column names from the user's project. Filenames must not start with AUTO_.
Generate seeds and tests by running:

DuckDB:
```
ade run {task_id} --db duckdb --project-type dbt --agent sage --seed \
  --tasks-dir {output_dir}/tasks
```
Snowflake:
```
ade run {task_id} --db snowflake --project-type dbt --agent sage \
  --tasks-dir {output_dir}/tasks
```
Note: --seed export is only supported on DuckDB. For Snowflake tasks, run without --seed and write test SQL manually, or use DuckDB for seed generation if a DuckDB extract is available.

Validate by running again without --seed:

ade run {task_id} --db {db_type} --project-type dbt --agent sage \
  --tasks-dir {output_dir}/tasks

Test with a real agent:

ade run {task_id} --db {db_type} --project-type dbt --agent claude --model claude-sonnet-4-5 \
  --tasks-dir {output_dir}/tasks

Phase 4: Validate Patches (mandatory)

This step is mandatory — run it for every generated task before presenting results to the user.

For each task, using the project at the configured project_dir:

Create a temp working copy:

cp -r {project_dir}/{project_name} /tmp/ade-bench-validate-{task_id}

Apply the setup patch (introduces the bug):
```
cd /tmp/ade-bench-validate-{task_id} && patch -p1 < {task_dir}/setup/changes.patch
```
If this fails, the patch has wrong context lines or paths. Fix the patch and retry.
Verify the mutation happened — read the patched file and confirm the buggy line is present.

Apply the solution patch (fixes the bug):

cd /tmp/ade-bench-validate-{task_id} && patch -p1 < {task_dir}/solutions/changes.patch

If this fails, fix and retry.

Verify the file is restored — diff against the original:
```
diff {project_dir}/{project_name}/{model_path} /tmp/ade-bench-validate-{task_id}/{model_path}
```
If there's a diff, the patches are not exact inverses. Fix and retry.

Clean up:

rm -rf /tmp/ade-bench-validate-{task_id}

Verify model names in solution_seeds and test_setup match actual dbt model names in the project.

Report results:

Patch validation:
  {task_id}: setup ✓ solution ✓ roundtrip ✓

If any task fails, fix the patches before presenting to the user. Do not present broken tasks.

Phase 5: Harness Validation (optional, recommended)

After patch validation passes, ask the user:

"All patches validated. Want me to run the full ADE-Bench harness to verify the tasks end-to-end? This requires Docker running and ADE-Bench installed."

If the user accepts, for each task run:

DuckDB:

ade run {task_id} --db duckdb --project-type dbt --agent sage --seed \
  --tasks-dir {output_dir}/tasks

Snowflake:

ade run {task_id} --db snowflake --project-type dbt --agent sage \
  --tasks-dir {output_dir}/tasks

Check results:

PASS: Task is valid, seeds generated.
FAIL — setup error: dbt can't run in the broken state. May be expected for some bug types (e.g., TRY_TO_* removal). Verify the error is the intended symptom.
FAIL — solution error: The fix doesn't work. Investigate and fix the patches.
FAIL — test failure (AUTO equality): Seeds may need regeneration, or the solution doesn't fully restore the expected output.
FAIL — test failure (custom assertion): The custom assertion file returned rows after the solution was applied. Re-read the assertion against the post-solution state — likely the placeholders were resolved against the wrong values (e.g., {original_refs} parsed incorrectly), or the assertion was written against the pre-mutation graph rather than the restored one.

If a task fails, report the error and ask whether to fix or skip it.

For DuckDB tasks with --seed, run once more without --seed to confirm tests pass against the generated seeds.

Important Guidelines

Never reveal the bug in the agent prompt. The prompt describes a symptom, not a diagnosis. A real user would say "revenue numbers look wrong", not "you used SUM instead of MAX".
One bug per task. Don't combine multiple mutations — each task should have exactly one thing to fix.
Prefer minimal patches. Change as few lines as possible. A 1-2 line diff is ideal.
Use real model and column names. Everything must reference actual objects in the user's project.
The setup patch introduces the bug. The solution patch fixes the bug. Don't mix them up.
Supported databases: DuckDB and Snowflake. Generate variants matching the detected database type. Don't generate PostgreSQL or other variants.
For Snowflake projects, prefer bug patterns that use Snowflake-specific SQL when the model uses it — this tests the agent's Snowflake fluency. But also include some ANSI SQL bugs for variety.
Don't modify the user's project. All mutations happen via patches applied inside Docker containers at runtime.
Ask before generating. Always present proposals and get user confirmation before writing files.
If a pattern declares assertion_tests, emit them. Patterns ship them because equality alone is insufficient or ambiguous for that bug class. Skipping them leaves bugs that can pass the harness while still being broken.
Don't duplicate existing tasks. If a generated task ends up being the same {model, mutation type} as an existing task surfaced in the Phase 1 survey, you've wasted the user's time and inflated their benchmark with redundant signal. Aim for coverage, not volume.

create-task

Invocation

Context Preview

SKILL.md

create-task

Invocation

Context Preview

SKILL.md

ADE-Bench Task Generator

Overview

Phase 1: Parse Arguments and Discover the Project

Verify ade-bench is Installed

Run Discovery via Sub-Agent

Survey Existing Tasks for This Project

Phase 2: Propose Tasks

Bug Injection Catalog

Assertion Tests

Selection Guidelines

Phase 3: Generate Task Files

Output Directory Structure

Copy the Project

File Specifications

task.yaml

setup/changes.patch

solutions/changes.patch

setup.sh

solution.sh

tests/.sql (one file per pattern's assertion_tests entry)

After Generating Files

Phase 4: Validate Patches (mandatory)

Phase 5: Harness Validation (optional, recommended)

Important Guidelines

Similar Skills

ADE-Bench Task Generator

Overview

Phase 1: Parse Arguments and Discover the Project

Verify ade-bench is Installed

Run Discovery via Sub-Agent

Survey Existing Tasks for This Project

Phase 2: Propose Tasks

Bug Injection Catalog

Assertion Tests

Selection Guidelines

Phase 3: Generate Task Files

Output Directory Structure

Copy the Project

File Specifications

task.yaml

setup/changes.patch

solutions/changes.patch

setup.sh

solution.sh

tests/.sql (one file per pattern's assertion_tests entry)

After Generating Files

Phase 4: Validate Patches (mandatory)

Phase 5: Harness Validation (optional, recommended)

Important Guidelines

Similar Skills

tests/.sql (one file per pattern's `assertion_tests` entry)

tests/.sql (one file per pattern's `assertion_tests` entry)