Skill

zombie-build-datasets

Design dataset schemas, run analytics queries, and build reports against Zombie Brains datasets. Use whenever the user is designing a dataset schema, querying dataset data, building a report, computing aggregates / KPIs / rollups, comparing two time windows, materializing results, scheduling a recurring report, doing bulk JSONL ingest, setting up saved views, or asking which dataset op fits a reporting question. Covers register_dataset / register_dataset_table, dataset_query / count / get / insert / upsert / update / delete / aggregate / compare / describe / export_aggregate, declared + ad-hoc joins, saved views, and the bulk REST routes.

Invocation

How this skill is triggered — by the user, by Claude, or both

Slash command

/zombie-brains:zombie-build-datasets

User invocable

Model invocable

Inline context

Default effort

Context Preview

The summary Claude sees in its skill listing — used to decide when to auto-load this skill

Datasets are the analytics-grade tabular surface. Memories are for unstructured knowledge; **datasets are for structured rows you'll filter, group, and roll up.** When the user says "build me a report", "show me top N by X", "current vs prior week", "ingest this CSV" — that is a dataset job.

SKILL.md

628 lines · ~7.3k tokens(exceeds 5k compaction limit)

Stats

LanguageShell

Stars0

MaintenanceExcellent

Last CommitJun 1, 2026

Actions

View Source View Plugin View on GitHub View README

Stats

Actions

Building with Zombie Brains — Datasets

Datasets are the analytics-grade tabular surface. Memories are for unstructured knowledge; datasets are for structured rows you'll filter, group, and roll up. When the user says "build me a report", "show me top N by X", "current vs prior week", "ingest this CSV" — that is a dataset job.

What's a dataset?

A dataset is a workspace-scoped (account_owner_id) container — Airtable's "base". A dataset_table is one typed table inside a dataset — Airtable's "tab". Each dataset_table has a typed schema, a dedicated Postgres physical table (with generated columns + B-tree indexes for indexed fields), and is addressed as (dataset_slug, table_name) across the V8 surface (env.dataset.<slug>.<table>.<op>(...)), REST surface (/v1/datasets/<slug>/tables/<table>/...), and the manage tool (dataset_* actions).

Datasets are different from Phase-1 datastores (brain-scoped, single-table). The Phase-1 surface still works for legacy code but the Phase-1 datastore registry is in deprecation — favor datasets for all new work.

Permissions live on the dataset, not the table: read / write / admin. Highest grant wins across permission sets. Workspace members are admin on every dataset their workspace owns.

Consultant tree-walked access

When a human consultant (a user whose workspace has child workspaces via parent_account_owner_id) calls a read or query action — list_datasets, list_dataset_tables, dataset_count, dataset_aggregate, dataset_query, dataset_compare, dataset_describe, dataset_export_aggregate, save_view, list_views — the dataset lookup tree-walks parent_account_owner_id. The parent can address datasets in any child workspace by dataset_slug or dataset_id directly; no workspace-switch dance required.

Agents do not tree-walk. Agent callers (tools, routines, anything resolving through an agent's owner) are strictly workspace-bound — they only see datasets in their own workspace. The consultant convenience is a human-only ergonomic.

Schema design

A schema is { fields: [...], compound_indexes?: [...], relationships?: [...] }. Each field is:

{ "name": "spend", "type": "number", "required": false, "indexed": true, "unique": false }

Field types (these are exhaustive — pick the right one upfront):

Type	Use for	Indexable	Aggregatable
`string`	short identifiers, codes	yes	`min`/`max`/`count_distinct` only
`text`	long free-form text	yes (B-tree on prefix)	`count_distinct`
`number`	money, ratios, percentages	yes	`sum`/`avg`/`min`/`max`
`integer`	counts, IDs	yes	`sum`/`avg`/`min`/`max`
`boolean`	flags	yes	`count`/`count_distinct`
`date`	calendar days (no TZ)	yes	`min`/`max`
`timestamp`	event times (with TZ)	yes	`min`/`max`
`enum` (+ `values: [...]`)	constrained string set	yes	`count_distinct`
`string_array`	tags, categories	yes (GIN)	not in metrics
`number_array`	embeddings, ranges	yes (GIN)	not in metrics
`json`	unstructured blobs	no (covered by GIN on `data`)	not in metrics

Type discipline matters. Storing a date as string blocks date math, min/max ordering, and dataset_describe's date-coverage probe. Storing an integer as text blocks sum/avg. Always pick the most specific type. You can extend a schema additively (add fields, add indexes) but you cannot change a field's type — declare it correctly the first time.

Indexing decision tree

A field becomes an indexed Postgres generated column iff indexed: true. Index a field when:

You will filter, group, or sort by it (where, group_by, order_by)
Its cardinality is meaningful (more than a handful of values, not millions of nulls)
Always index date/timestamp fields — date filters are the most common where clause in any report

Do not over-index. Each generated column slows down every write. Skip indexed: true for fields you only display, not filter on. Non-indexed fields are still queryable via JSONB extraction — they just hit (data->>'field') casts instead of B-tree lookups, which is fine for low-cardinality where clauses on small tables.

unique: true creates a scoped unique index (per workspace) and is the only way to enable on_conflict: update / upsert on a field.

Compound indexes

"compound_indexes": [{ "fields": ["account_id", "report_date"] }]

Use for multi-column filters that always come together (e.g. "filings by account on a given date"). Each field must already be declared and must be a scalar (no json/array). Compound indexes are additive — add when you see the access pattern, not preemptively.

Relationships

Declare same-dataset joins on schema.relationships[]:

"relationships": [
  {
    "name": "campaign",
    "target_table": "campaigns",
    "local_field": "campaign_id",
    "target_field": "external_id",
    "kind": "many_to_one"
  }
]

kind is one_to_one / one_to_many / many_to_one. It's a semantic hint, not SQL — the generated join is identical for all kinds. Use it for UI/tooling clarity.

v1 enforces same-dataset only (target_table is a sibling). Cross-dataset joins are deferred. The target's schema is validated at join-time, not at declaration time — so you can declare a relationship pointing at a table you haven't created yet.

The dataset_* ops decision tree

Pick the smallest op that answers the question. Reaching for dataset_query and doing math in Claude is almost always wrong.

Question shape	Op	Why
"Give me these specific rows"	`dataset_query`	with `where`/`order_by`/`limit`
"How many rows match X?"	`dataset_count`	no rows returned, just a number
"Fetch one row by id"	`dataset_get`	indexed lookup
"What's in this table?"	`dataset_describe`	schema + row count + date coverage, kills field-name guessing
"Sum spend by campaign"	`dataset_aggregate`	server-side GROUP BY + metrics
"Top N by ACOS"	`dataset_aggregate` + `calculated`	calc fields participate in order_by + having
"This week vs last week"	`dataset_compare`	two-window deltas
"Materialize aggregate to a new table"	`dataset_export_aggregate`	bypass 10K result cap
"Insert/update/delete one or a batch"	`dataset_insert` / `_upsert` / `_update` / `_delete`	transactional row work
"Bulk import a JSONL file"	REST `/v1/datasets/:slug/tables/:t/upsert-jsonl`	200 MB body, auto-batched

Always call dataset_describe first when you don't know the schema. It returns declared fields with types, row count, and the date coverage range — exactly what you need to compose an aggregate without guessing whether the column is spend or cost or total_cost.

Column projection — the `select` arg on dataset_query / dataset_get

By default dataset_query returns the entire row's data blob for every match. On tables that store embeddings (1,536 floats per row) or multi-KB text fields, a naive limit: 100 can be megabytes of wire payload before you even start reading.

Narrow the read with select:

manage(dataset_query, {
  dataset_slug: 'kdp-ads-performance-warehouse',
  table_name: 'sp_performance_rows',
  where: [{ field: 'campaign_status', operator: '=', value: 'enabled' }],
  order_by: [{ field: 'created_at', direction: 'desc' }],
  limit: 50,
  select: ['campaign_id', 'campaign_name', 'spend', 'acos']
})

Server-side projection via jsonb_build_object — the wire payload contains only those keys plus id / created_at / updated_at (always returned). Names are whitelist-validated against schema.fields; an unknown name throws with the valid list. Omit or pass [] for the legacy full-row behavior.

Same arg works on dataset_get for single-row reads.

When NOT to project: when you need the row for downstream dataset_update, you usually want the full blob so the round-trip preserves untouched fields. Projection is for read-only flows (display, comparison, filtering, slicing) where you already know the few columns that matter.

dataset_aggregate deep-dive

{
  "action": "dataset_aggregate",
  "dataset_slug": "kdp-ads",
  "table_name": "ads_daily",
  "where": [
    { "field": "report_date", "operator": "gte", "value": "2026-05-21" }
  ],
  "group_by": ["campaign_id"],
  "metrics": [
    { "op": "sum", "field": "spend", "as": "total_spend" },
    { "op": "sum", "field": "sales_14d", "as": "total_sales" }
  ],
  "calculated": [
    { "as": "acos", "expr": "total_spend / total_sales" }
  ],
  "order_by": [{ "field": "acos", "direction": "asc" }],
  "having": [{ "field": "total_spend", "operator": "gt", "value": 10 }],
  "limit": 20
}

Metric ops

sum, avg — require number / integer field. Strings won't auto-coerce — the schema check rejects them upfront.
min, max — any scalar field (works on dates and strings too).
count — always COUNT(*). field is ignored.
count_distinct — COUNT(DISTINCT <field>).

Each metric gets an alias (as) — defaults to <op>_<field> (or count for op: count). All aliases must be unique across the metrics list AND must not collide with any group_by field name.

Calculated formulas

Calculated metrics run in JS, after the SQL group-by. The formula language is intentionally tiny:

Operators: +, -, *, /, parentheses
Identifiers: refs to metric aliases, group_by aliases, or earlier calculated aliases
Numeric literals: 2.5, 0.30, 100
No function calls. No globals. No string ops. Divide-by-zero produces null + a warning in metadata.

Calculated entries evaluate in declaration order, so chain them:

"calculated": [
  { "as": "acos", "expr": "total_spend / total_sales" },
  { "as": "acos_pct", "expr": "acos * 100" }
]

having + order_by on calculated aliases

You can filter and sort by calculated values:

"having": [{ "field": "acos", "operator": "lt", "value": 0.30 }],
"order_by": [{ "field": "acos", "direction": "asc" }]

Under the hood: clauses targeting metric or group_by aliases go to SQL HAVING / ORDER BY. Clauses targeting calculated aliases are deferred to JS. When any calc-field clause is present, the platform widens the internal SQL limit to 10K so the JS pass has enough candidates to honor "top N by ACOS". metadata.widened_limit: true and candidates_before_post_filter flag this when it happens.

Limits

Default: 100 rows
Max (inline): 10,000 rows — dataset_aggregate will reject limit higher than this
Max (export): 1,000,000 rows — only via dataset_export_aggregate
Max metrics: 32 per call
Max calculated: 32 per call
Max group_by dimensions: 8

If you need more than 10K result rows: tighten the where, OR use dataset_export_aggregate to materialize, then query the new table.

dataset_compare

Two-window comparison — same group_by + metrics, run twice with different where clauses, joined on group keys, with delta_<alias> and delta_pct_<alias> columns computed in JS.

{
  "action": "dataset_compare",
  "dataset_slug": "kdp-ads",
  "table_name": "ads_daily",
  "group_by": ["campaign_id"],
  "metrics": [{ "op": "sum", "field": "spend", "as": "total_spend" }],
  "calculated": [
    { "as": "acos", "expr": "total_spend / total_sales_14d" }
  ],
  "current": {
    "where": [
      { "field": "report_date", "operator": "gte", "value": "2026-05-21" },
      { "field": "report_date", "operator": "lte", "value": "2026-05-27" }
    ]
  },
  "previous": {
    "where": [
      { "field": "report_date", "operator": "gte", "value": "2026-05-14" },
      { "field": "report_date", "operator": "lte", "value": "2026-05-20" }
    ]
  },
  "order_by": [{ "field": "delta_total_spend", "direction": "desc" }],
  "limit": 50
}

v1 takes explicit date ranges. There is no "current": "trailing_7d" sugar yet — compute the dates client-side or in your routine.

Each result row has:

The group_by fields at top level
current: { <alias>: ... } and previous: { <alias>: ... } for every metric + calculated
delta_<alias> and delta_pct_<alias> flat keys (so you can sort on them)

metadata.only_in_current / only_in_previous count new and dropped groups — useful for "what campaigns are new this week" reports.

Sort field paths in order_by:

A group_by field name (campaign_id)
current.<alias> or previous.<alias>
delta_<alias> or delta_pct_<alias>

Default sort is by sum of the first metric across both periods desc (biggest movers regardless of direction).

metrics and group_by are shared across both windows. If you need per-period overrides, run two dataset_aggregate calls and join client-side.

dataset_export_aggregate

Materialize an aggregate result back into a new dataset_table. Use when:

The result exceeds 10K rows
You want a downstream report to query the rollup
A scheduled routine needs a stable artifact to ship

{
  "action": "dataset_export_aggregate",
  "dataset_slug": "kdp-ads",
  "table_name": "ads_daily",
  "group_by": ["report_date", "campaign_id"],
  "metrics": [
    { "op": "sum", "field": "spend", "as": "total_spend" },
    { "op": "sum", "field": "sales_14d", "as": "total_sales" }
  ],
  "calculated": [
    { "as": "acos", "expr": "total_spend / total_sales" }
  ],
  "target_table_name": "ads_daily_campaign_rollup",
  "target_display_name": "Campaign daily rollup",
  "target_description": "Materialized from ads_daily, rebuilt nightly."
}

Target schema is auto-derived from the aliases: group_by fields keep their source type + indexed flag; sum/avg/min/max → number; count/count_distinct → integer; calculated → number.
Requires admin on the source dataset (creating a new table is privileged).
Same-dataset target only in v1. The target table is created in the source's dataset.
if_exists: 'fail' only — pick a new name or delete the old table first.
Cap: 1,000,000 rows.

After the export, the materialized table is queryable like any other (dataset_query, dataset_aggregate, etc.). Pattern: a daily routine rebuilds ads_daily_campaign_rollup, and a saved view queries it for the dashboard.

Joins (declared + ad-hoc, multi-hop)

Joins extend dataset_aggregate / dataset_query / dataset_compare with sibling-table lookups. Two modes:

Declared (preferred)

"join": [{ "relationship": "campaign" }]

Uses the relationship declared on schema.relationships[]. The resolver pulls target_table + local_field + target_field from the declaration. Cleanest for repeated reports.

Ad-hoc (no schema change needed)

"join": [{
  "target_table": "campaigns",
  "local_field": "campaign_id",
  "target_field": "external_id"
}]

Useful for one-offs or exploratory queries where declaring a relationship isn't worth it.

Multi-hop chains

Chain via from: '<earlier_alias>'. Aliases default to the relationship name (declared) or the target table name (ad-hoc):

"join": [
  { "relationship": "campaign" },
  { "relationship": "brand", "from": "campaign" }
]
"group_by": ["campaign.brand_name"]

Field reference shape

Inside a joined query, all field-name spots (group_by, metrics.field, where.field, having.field, order_by.field, calc formula identifiers' base values):

Bare name (spend) → base table (alias t0)
Qualified (campaign.brand_name) → joined table by alias

In the output, qualified group_by columns flatten dots to underscores (campaigns.brand → result key campaigns_brand). Calc formulas reference that flattened alias.

Join semantics

type: 'left' (default) — base rows show up even with no match
type: 'inner' — base rows without a match are dropped
Max 8 joins per query
v1: same-dataset only, no FULL OUTER / CROSS, single-field pairs only

Worked example: ads → campaign → brand

{
  "action": "dataset_aggregate",
  "dataset_slug": "kdp-ads",
  "table_name": "ads_daily",
  "join": [
    { "relationship": "campaign" },
    { "relationship": "brand", "from": "campaign" }
  ],
  "where": [
    { "field": "report_date", "operator": "gte", "value": "2026-05-21" }
  ],
  "group_by": ["brand.name"],
  "metrics": [
    { "op": "sum", "field": "spend", "as": "total_spend" }
  ],
  "order_by": [{ "field": "total_spend", "direction": "desc" }],
  "limit": 20
}

Note spend is on the base table (ads_daily), and the group key resolves to a joined table's field. Mixing freely is fine.

Saved views

Canonicalize a query you'll run repeatedly:

{
  "action": "save_view",
  "view_name": "kdp_ads_daily_performance",
  "dataset_slug": "kdp-ads",
  "table_name": "ads_daily",
  "query_kind": "aggregate",
  "query": {
    "where": [{ "field": "report_date", "operator": "gte", "value": "2026-05-21" }],
    "group_by": ["campaign_id"],
    "metrics": [{ "op": "sum", "field": "spend", "as": "total_spend" }],
    "calculated": [{ "as": "acos", "expr": "total_spend / total_sales_14d" }],
    "order_by": [{ "field": "acos", "direction": "asc" }],
    "limit": 50
  }
}

Then later:

{ "action": "run_view", "view_name": "kdp_ads_daily_performance" }

query_kind: 'aggregate' or 'compare'. export_aggregate is NOT recallable as a saved view in v1.
No parameter substitution in v1. Each view is a complete, frozen query — including any date literals.
run_view re-resolves permissions per call. Revoking dataset access takes effect immediately.
Agents resolve to their owner's namespace when calling run_view — the human saves the view, the agent runs it. Cross-user view access is not supported.
Companion actions: list_views, get_view, update_view, delete_view.

The canonical use case: a daily report. Save the view once; the routine runs it on cron and ships the output to Slack/email.

Bulk JSONL REST routes (large-data ingest)

When you have more than a few thousand rows, use the REST routes — not manage(dataset_upsert) in a loop:

POST /v1/datasets/:slug/tables/:table/insert-jsonl
POST /v1/datasets/:slug/tables/:table/upsert-jsonl
POST /v1/datasets/:slug/tables/:table/update-jsonl
POST /v1/datasets/:slug/tables/:table/delete-jsonl

200 MB body, gzip transparent (Content-Encoding: gzip)
Auto-batched at 1000 records/call (the dispatcher's per-call cap)
Continue-on-error — a single bad row never aborts a million-row import; failures surface in per_batch[].error with line ranges
Same role_datasets permission walk, same quota gating, same schema validation as MCP / V8 paths

When to use REST vs MCP vs `env.dataset`

Path	Use for
REST `/v1/datasets/...`	Bulk imports (1M+ rows), CSV migrations, gzipped uploads
`manage(dataset_*)` via MCP	Interactive analytics, one-off queries, saved view CRUD
`env.dataset.<slug>.<table>.<op>()` inside V8 tool	Transactional row work from a tool / routine

Headers

Authorization: Bearer <api_key> — same key used for /mcp/:apiKey
X-On-Conflict: update | ignore | error — insert behavior on uniqueness collision
X-Conflict-On: <field> — required for upsert; the field must be unique: true in the schema
X-Batch-Size: <1..1000> — override the default 1000

Copy-pastable example

ZB_API_KEY=cm_xxx
DATASET=kdp-acquisition-diligence
TABLE=filings

curl -sS -X POST \
  "https://mcp.zombie.codes/v1/datasets/$DATASET/tables/$TABLE/upsert-jsonl" \
  -H "Authorization: Bearer $ZB_API_KEY" \
  -H "Content-Type: application/x-ndjson" \
  -H "X-Conflict-On: external_id" \
  --data-binary @chunk_001.jsonl \
  | jq '{batches_succeeded, batches_failed, totals, duration_ms}'

Body shape per line:

Op	Each JSONL line is …
insert	a record object: `{...fields...}`
upsert	a record object: `{...fields...}`
update	`{"id": "<uuid>", "fields": {...patch...}}`
delete	a bare `"<uuid>"` string OR `{"id": "<uuid>"}`

Common gotchas

name vs table_name aliasing. The dataset_* row-level actions accept either name or table_name for the table — they're aliased. register_dataset_table / update_dataset_table use name. Pick one and stick with it within a session.
Field type matters for aggregation. sum and avg reject anything that isn't number or integer. Strings stored as numbers won't be auto-coerced. If you're aggregating and getting "field type X" errors, you stored the wrong type — fix the schema (additive: declare a new typed field, backfill, leave the bad field) before retrying.
Calculated formulas are numeric-only. No if(), no string concat, no function calls. If you need conditional logic, do it in SQL via where clauses splitting the dataset, or do post-processing in your tool/routine code.
In joined queries, calc identifiers always reference numeric aliases. Don't try to reference a joined string column (campaign.name) from a formula — it'll parse but eval to null. Use string columns in group_by, numeric columns in formulas.
"limit too high" — use export. If you legitimately need >10K result rows, dataset_export_aggregate is the answer. Materialize the rollup, then query it. Trying to bypass via a wider where is the wrong direction for actual large-result reports.
Don't try to push calc-field filters into SQL HAVING. The platform handles the routing transparently — you write having: [{ field: "acos", operator: "lt", value: 0.30 }] and the platform decides whether it's a SQL clause or a post-side JS filter. Don't pre-optimize.
Indexed date fields are load-bearing. A where { report_date >= ... } against a non-indexed date field will scan the JSONB blob row-by-row. On a small table that's fine; on the daily-rollup of a year of ads data, it's painful. Index every date you'll filter on.
Schema changes are additive-only. You can add fields, add indexes on existing fields, add compound indexes, grow an enum's values list. You cannot remove a field, change a type, remove an index, or shrink an enum. For destructive changes: register a new dataset_table with the desired shape and migrate the data.
Reserved column names. id, brain_id, data, created_at, updated_at are reserved — you can't declare them as fields, and they're silently dropped from update patches.
dataset_describe is your friend. Before composing any aggregate against a table you didn't just define, call describe. It gives you: declared fields with types, row count, and the date-coverage range. Three seconds saves a guess-and-retry loop.
Quota counts against the dataset's owning workspace, not the caller's. When a consultant in workspace A writes to a dataset owned by child workspace B, the rows count against B's tier bucket — the handler resolves workspaceOwnerId from resolved.account_owner_id (the dataset's owner) before checking quota, not from the caller. Translation: parent writes to a child's dataset are billed to the child. Plan child-workspace tiers with that in mind; the parent's headroom doesn't subsidize them.
dataset_id (UUID) bypasses slug lookup — useful escape hatch. Slug lookup runs through the workspace filter (with the consultant tree-walk above). If a slug lookup ever fails — we hit this transiently with the tree walk during a deploy — passing dataset_id instead of dataset_slug fetches the row by primary key and skips the workspace-scoped slug query entirely. checkDatasetAccess still validates tree-walked access against the resolved row, so this is not a permissions bypass — just a lookup escape hatch when the slug path is misbehaving. Keep it in your back pocket for debugging.

Worked example — end-to-end Amazon ads daily report

1. Define the schema

{
  "action": "register_dataset_table",
  "dataset_slug": "kdp-ads",
  "name": "ads_daily",
  "display_name": "Amazon Ads daily rollup",
  "schema": {
    "fields": [
      { "name": "report_date", "type": "date", "required": true, "indexed": true },
      { "name": "campaign_id", "type": "string", "required": true, "indexed": true },
      { "name": "ad_group_id", "type": "string", "indexed": true },
      { "name": "asin", "type": "string", "indexed": true },
      { "name": "impressions", "type": "integer" },
      { "name": "clicks", "type": "integer" },
      { "name": "spend", "type": "number" },
      { "name": "sales_14d", "type": "number" },
      { "name": "orders_14d", "type": "integer" }
    ],
    "compound_indexes": [
      { "fields": ["report_date", "campaign_id"] }
    ],
    "relationships": [
      {
        "name": "campaign",
        "target_table": "campaigns",
        "local_field": "campaign_id",
        "target_field": "external_id",
        "kind": "many_to_one"
      }
    ]
  }
}

2. Aggregate: top 20 campaigns by spend, last 7 days, with ACOS

{
  "action": "dataset_aggregate",
  "dataset_slug": "kdp-ads",
  "table_name": "ads_daily",
  "where": [
    { "field": "report_date", "operator": "gte", "value": "2026-05-21" }
  ],
  "group_by": ["campaign_id"],
  "metrics": [
    { "op": "sum", "field": "spend", "as": "total_spend" },
    { "op": "sum", "field": "sales_14d", "as": "total_sales" },
    { "op": "sum", "field": "orders_14d", "as": "total_orders" }
  ],
  "calculated": [
    { "as": "acos", "expr": "total_spend / total_sales" }
  ],
  "having": [{ "field": "total_spend", "operator": "gt", "value": 50 }],
  "order_by": [{ "field": "total_spend", "direction": "desc" }],
  "limit": 20
}

3. Compare: trailing 7 vs prior 7

{
  "action": "dataset_compare",
  "dataset_slug": "kdp-ads",
  "table_name": "ads_daily",
  "group_by": ["campaign_id"],
  "metrics": [
    { "op": "sum", "field": "spend", "as": "total_spend" },
    { "op": "sum", "field": "sales_14d", "as": "total_sales" }
  ],
  "calculated": [
    { "as": "acos", "expr": "total_spend / total_sales" }
  ],
  "current": {
    "where": [
      { "field": "report_date", "operator": "gte", "value": "2026-05-21" },
      { "field": "report_date", "operator": "lte", "value": "2026-05-27" }
    ]
  },
  "previous": {
    "where": [
      { "field": "report_date", "operator": "gte", "value": "2026-05-14" },
      { "field": "report_date", "operator": "lte", "value": "2026-05-20" }
    ]
  },
  "order_by": [{ "field": "delta_total_spend", "direction": "desc" }],
  "limit": 50
}

4. Save it as a daily view + schedule

{
  "action": "save_view",
  "view_name": "kdp_ads_top_campaigns_7d",
  "dataset_slug": "kdp-ads",
  "table_name": "ads_daily",
  "query_kind": "aggregate",
  "query": { /* the aggregate body from step 2 */ }
}

Then a routine (see zombie-build-routines) runs manage(run_view, view_name: 'kdp_ads_top_campaigns_7d') on a cron and ships the result.

5. Multi-hop join: roll up by brand

{
  "action": "dataset_aggregate",
  "dataset_slug": "kdp-ads",
  "table_name": "ads_daily",
  "join": [
    { "relationship": "campaign" },
    { "relationship": "brand", "from": "campaign" }
  ],
  "where": [
    { "field": "report_date", "operator": "gte", "value": "2026-05-01" }
  ],
  "group_by": ["brand.name"],
  "metrics": [
    { "op": "sum", "field": "spend", "as": "total_spend" },
    { "op": "sum", "field": "sales_14d", "as": "total_sales" }
  ],
  "calculated": [
    { "as": "acos", "expr": "total_spend / total_sales" }
  ],
  "order_by": [{ "field": "total_spend", "direction": "desc" }],
  "limit": 20
}

spend and sales_14d are on the base table; the group key resolves through the join chain to brand.name.

Cross-references

zombie-build-tools — for env.dataset.<slug>.<table>.<op>(...) usage inside a V8 serverless tool. Same ops (CRUD + analytics — aggregate / compare / describe / export_aggregate), same arg shapes, different transport. The runner Proxy normalizes positional V8 args (get(id), delete(ids), insert(records, opts), upsert(records, opts), update(id, fields)) into the structured-arg wire format; analytics ops take the same single-args-object you'd pass through manage(dataset_*).
zombie-build-routines — for cron-scheduling a saved view, building daily/weekly reports.
zombie-build-permissions — for granting dataset access via permission sets (attach_dataset_to_role).

zombie-build-datasets

Invocation

Context Preview

SKILL.md

zombie-build-datasets

Invocation

Context Preview

SKILL.md

Building with Zombie Brains — Datasets

What's a dataset?

Consultant tree-walked access

Schema design

Indexing decision tree

Compound indexes

Relationships

The dataset_* ops decision tree

Column projection — the select arg on dataset_query / dataset_get

dataset_aggregate deep-dive

Metric ops

Calculated formulas

having + order_by on calculated aliases

Limits

dataset_compare

dataset_export_aggregate

Joins (declared + ad-hoc, multi-hop)

Declared (preferred)

Ad-hoc (no schema change needed)

Multi-hop chains

Field reference shape

Join semantics

Worked example: ads → campaign → brand

Saved views

Bulk JSONL REST routes (large-data ingest)

When to use REST vs MCP vs env.dataset

Headers

Copy-pastable example

Common gotchas

Worked example — end-to-end Amazon ads daily report

1. Define the schema

2. Aggregate: top 20 campaigns by spend, last 7 days, with ACOS

3. Compare: trailing 7 vs prior 7

4. Save it as a daily view + schedule

5. Multi-hop join: roll up by brand

Cross-references

Similar Skills

Building with Zombie Brains — Datasets

What's a dataset?

Consultant tree-walked access

Schema design

Indexing decision tree

Compound indexes

Relationships

The dataset_* ops decision tree

Column projection — the select arg on dataset_query / dataset_get

dataset_aggregate deep-dive

Metric ops

Calculated formulas

having + order_by on calculated aliases

Limits

dataset_compare

dataset_export_aggregate

Joins (declared + ad-hoc, multi-hop)

Declared (preferred)

Ad-hoc (no schema change needed)

Multi-hop chains

Field reference shape

Join semantics

Worked example: ads → campaign → brand

Saved views

Bulk JSONL REST routes (large-data ingest)

When to use REST vs MCP vs env.dataset

Headers

Copy-pastable example

Common gotchas

Worked example — end-to-end Amazon ads daily report

1. Define the schema

2. Aggregate: top 20 campaigns by spend, last 7 days, with ACOS

3. Compare: trailing 7 vs prior 7

4. Save it as a daily view + schedule

5. Multi-hop join: roll up by brand

Column projection — the `select` arg on dataset_query / dataset_get

When to use REST vs MCP vs `env.dataset`

Column projection — the `select` arg on dataset_query / dataset_get

When to use REST vs MCP vs `env.dataset`