From zombie-brains
Design dataset schemas, run analytics queries, and build reports against Zombie Brains datasets. Use whenever the user is designing a dataset schema, querying dataset data, building a report, computing aggregates / KPIs / rollups, comparing two time windows, materializing results, scheduling a recurring report, doing bulk JSONL ingest, setting up saved views, or asking which dataset op fits a reporting question. Covers register_dataset / register_dataset_table, dataset_query / count / get / insert / upsert / update / delete / aggregate / compare / describe / export_aggregate, declared + ad-hoc joins, saved views, and the bulk REST routes.
How this skill is triggered — by the user, by Claude, or both
Slash command
/zombie-brains:zombie-build-datasetsThe summary Claude sees in its skill listing — used to decide when to auto-load this skill
Datasets are the analytics-grade tabular surface. Memories are for unstructured knowledge; **datasets are for structured rows you'll filter, group, and roll up.** When the user says "build me a report", "show me top N by X", "current vs prior week", "ingest this CSV" — that is a dataset job.
Datasets are the analytics-grade tabular surface. Memories are for unstructured knowledge; datasets are for structured rows you'll filter, group, and roll up. When the user says "build me a report", "show me top N by X", "current vs prior week", "ingest this CSV" — that is a dataset job.
A dataset is a workspace-scoped (account_owner_id) container — Airtable's "base". A dataset_table is one typed table inside a dataset — Airtable's "tab". Each dataset_table has a typed schema, a dedicated Postgres physical table (with generated columns + B-tree indexes for indexed fields), and is addressed as (dataset_slug, table_name) across the V8 surface (env.dataset.<slug>.<table>.<op>(...)), REST surface (/v1/datasets/<slug>/tables/<table>/...), and the manage tool (dataset_* actions).
Datasets are different from Phase-1 datastores (brain-scoped, single-table). The Phase-1 surface still works for legacy code but the Phase-1 datastore registry is in deprecation — favor datasets for all new work.
Permissions live on the dataset, not the table: read / write / admin. Highest grant wins across permission sets. Workspace members are admin on every dataset their workspace owns.
When a human consultant (a user whose workspace has child workspaces via parent_account_owner_id) calls a read or query action — list_datasets, list_dataset_tables, dataset_count, dataset_aggregate, dataset_query, dataset_compare, dataset_describe, dataset_export_aggregate, save_view, list_views — the dataset lookup tree-walks parent_account_owner_id. The parent can address datasets in any child workspace by dataset_slug or dataset_id directly; no workspace-switch dance required.
Agents do not tree-walk. Agent callers (tools, routines, anything resolving through an agent's owner) are strictly workspace-bound — they only see datasets in their own workspace. The consultant convenience is a human-only ergonomic.
A schema is { fields: [...], compound_indexes?: [...], relationships?: [...] }. Each field is:
{ "name": "spend", "type": "number", "required": false, "indexed": true, "unique": false }
Field types (these are exhaustive — pick the right one upfront):
| Type | Use for | Indexable | Aggregatable |
|---|---|---|---|
string | short identifiers, codes | yes | min/max/count_distinct only |
text | long free-form text | yes (B-tree on prefix) | count_distinct |
number | money, ratios, percentages | yes | sum/avg/min/max |
integer | counts, IDs | yes | sum/avg/min/max |
boolean | flags | yes | count/count_distinct |
date | calendar days (no TZ) | yes | min/max |
timestamp | event times (with TZ) | yes | min/max |
enum (+ values: [...]) | constrained string set | yes | count_distinct |
string_array | tags, categories | yes (GIN) | not in metrics |
number_array | embeddings, ranges | yes (GIN) | not in metrics |
json | unstructured blobs | no (covered by GIN on data) | not in metrics |
Type discipline matters. Storing a date as string blocks date math, min/max ordering, and dataset_describe's date-coverage probe. Storing an integer as text blocks sum/avg. Always pick the most specific type. You can extend a schema additively (add fields, add indexes) but you cannot change a field's type — declare it correctly the first time.
A field becomes an indexed Postgres generated column iff indexed: true. Index a field when:
where, group_by, order_by)Do not over-index. Each generated column slows down every write. Skip indexed: true for fields you only display, not filter on. Non-indexed fields are still queryable via JSONB extraction — they just hit (data->>'field') casts instead of B-tree lookups, which is fine for low-cardinality where clauses on small tables.
unique: true creates a scoped unique index (per workspace) and is the only way to enable on_conflict: update / upsert on a field.
"compound_indexes": [{ "fields": ["account_id", "report_date"] }]
Use for multi-column filters that always come together (e.g. "filings by account on a given date"). Each field must already be declared and must be a scalar (no json/array). Compound indexes are additive — add when you see the access pattern, not preemptively.
Declare same-dataset joins on schema.relationships[]:
"relationships": [
{
"name": "campaign",
"target_table": "campaigns",
"local_field": "campaign_id",
"target_field": "external_id",
"kind": "many_to_one"
}
]
kind is one_to_one / one_to_many / many_to_one. It's a semantic hint, not SQL — the generated join is identical for all kinds. Use it for UI/tooling clarity.
v1 enforces same-dataset only (target_table is a sibling). Cross-dataset joins are deferred. The target's schema is validated at join-time, not at declaration time — so you can declare a relationship pointing at a table you haven't created yet.
Pick the smallest op that answers the question. Reaching for dataset_query and doing math in Claude is almost always wrong.
| Question shape | Op | Why |
|---|---|---|
| "Give me these specific rows" | dataset_query | with where/order_by/limit |
| "How many rows match X?" | dataset_count | no rows returned, just a number |
| "Fetch one row by id" | dataset_get | indexed lookup |
| "What's in this table?" | dataset_describe | schema + row count + date coverage, kills field-name guessing |
| "Sum spend by campaign" | dataset_aggregate | server-side GROUP BY + metrics |
| "Top N by ACOS" | dataset_aggregate + calculated | calc fields participate in order_by + having |
| "This week vs last week" | dataset_compare | two-window deltas |
| "Materialize aggregate to a new table" | dataset_export_aggregate | bypass 10K result cap |
| "Insert/update/delete one or a batch" | dataset_insert / _upsert / _update / _delete | transactional row work |
| "Bulk import a JSONL file" | REST /v1/datasets/:slug/tables/:t/upsert-jsonl | 200 MB body, auto-batched |
Always call dataset_describe first when you don't know the schema. It returns declared fields with types, row count, and the date coverage range — exactly what you need to compose an aggregate without guessing whether the column is spend or cost or total_cost.
select arg on dataset_query / dataset_getBy default dataset_query returns the entire row's data blob for every match. On tables that store embeddings (1,536 floats per row) or multi-KB text fields, a naive limit: 100 can be megabytes of wire payload before you even start reading.
Narrow the read with select:
manage(dataset_query, {
dataset_slug: 'kdp-ads-performance-warehouse',
table_name: 'sp_performance_rows',
where: [{ field: 'campaign_status', operator: '=', value: 'enabled' }],
order_by: [{ field: 'created_at', direction: 'desc' }],
limit: 50,
select: ['campaign_id', 'campaign_name', 'spend', 'acos']
})
Server-side projection via jsonb_build_object — the wire payload contains only those keys plus id / created_at / updated_at (always returned). Names are whitelist-validated against schema.fields; an unknown name throws with the valid list. Omit or pass [] for the legacy full-row behavior.
Same arg works on dataset_get for single-row reads.
When NOT to project: when you need the row for downstream dataset_update, you usually want the full blob so the round-trip preserves untouched fields. Projection is for read-only flows (display, comparison, filtering, slicing) where you already know the few columns that matter.
{
"action": "dataset_aggregate",
"dataset_slug": "kdp-ads",
"table_name": "ads_daily",
"where": [
{ "field": "report_date", "operator": "gte", "value": "2026-05-21" }
],
"group_by": ["campaign_id"],
"metrics": [
{ "op": "sum", "field": "spend", "as": "total_spend" },
{ "op": "sum", "field": "sales_14d", "as": "total_sales" }
],
"calculated": [
{ "as": "acos", "expr": "total_spend / total_sales" }
],
"order_by": [{ "field": "acos", "direction": "asc" }],
"having": [{ "field": "total_spend", "operator": "gt", "value": 10 }],
"limit": 20
}
sum, avg — require number / integer field. Strings won't auto-coerce — the schema check rejects them upfront.min, max — any scalar field (works on dates and strings too).count — always COUNT(*). field is ignored.count_distinct — COUNT(DISTINCT <field>).Each metric gets an alias (as) — defaults to <op>_<field> (or count for op: count). All aliases must be unique across the metrics list AND must not collide with any group_by field name.
Calculated metrics run in JS, after the SQL group-by. The formula language is intentionally tiny:
+, -, *, /, parentheses2.5, 0.30, 100null + a warning in metadata.Calculated entries evaluate in declaration order, so chain them:
"calculated": [
{ "as": "acos", "expr": "total_spend / total_sales" },
{ "as": "acos_pct", "expr": "acos * 100" }
]
You can filter and sort by calculated values:
"having": [{ "field": "acos", "operator": "lt", "value": 0.30 }],
"order_by": [{ "field": "acos", "direction": "asc" }]
Under the hood: clauses targeting metric or group_by aliases go to SQL HAVING / ORDER BY. Clauses targeting calculated aliases are deferred to JS. When any calc-field clause is present, the platform widens the internal SQL limit to 10K so the JS pass has enough candidates to honor "top N by ACOS". metadata.widened_limit: true and candidates_before_post_filter flag this when it happens.
dataset_aggregate will reject limit higher than thisdataset_export_aggregateIf you need more than 10K result rows: tighten the where, OR use dataset_export_aggregate to materialize, then query the new table.
Two-window comparison — same group_by + metrics, run twice with different where clauses, joined on group keys, with delta_<alias> and delta_pct_<alias> columns computed in JS.
{
"action": "dataset_compare",
"dataset_slug": "kdp-ads",
"table_name": "ads_daily",
"group_by": ["campaign_id"],
"metrics": [{ "op": "sum", "field": "spend", "as": "total_spend" }],
"calculated": [
{ "as": "acos", "expr": "total_spend / total_sales_14d" }
],
"current": {
"where": [
{ "field": "report_date", "operator": "gte", "value": "2026-05-21" },
{ "field": "report_date", "operator": "lte", "value": "2026-05-27" }
]
},
"previous": {
"where": [
{ "field": "report_date", "operator": "gte", "value": "2026-05-14" },
{ "field": "report_date", "operator": "lte", "value": "2026-05-20" }
]
},
"order_by": [{ "field": "delta_total_spend", "direction": "desc" }],
"limit": 50
}
v1 takes explicit date ranges. There is no "current": "trailing_7d" sugar yet — compute the dates client-side or in your routine.
Each result row has:
current: { <alias>: ... } and previous: { <alias>: ... } for every metric + calculateddelta_<alias> and delta_pct_<alias> flat keys (so you can sort on them)metadata.only_in_current / only_in_previous count new and dropped groups — useful for "what campaigns are new this week" reports.
Sort field paths in order_by:
campaign_id)current.<alias> or previous.<alias>delta_<alias> or delta_pct_<alias>Default sort is by sum of the first metric across both periods desc (biggest movers regardless of direction).
metrics and group_by are shared across both windows. If you need per-period overrides, run two dataset_aggregate calls and join client-side.
Materialize an aggregate result back into a new dataset_table. Use when:
{
"action": "dataset_export_aggregate",
"dataset_slug": "kdp-ads",
"table_name": "ads_daily",
"group_by": ["report_date", "campaign_id"],
"metrics": [
{ "op": "sum", "field": "spend", "as": "total_spend" },
{ "op": "sum", "field": "sales_14d", "as": "total_sales" }
],
"calculated": [
{ "as": "acos", "expr": "total_spend / total_sales" }
],
"target_table_name": "ads_daily_campaign_rollup",
"target_display_name": "Campaign daily rollup",
"target_description": "Materialized from ads_daily, rebuilt nightly."
}
number; count/count_distinct → integer; calculated → number.if_exists: 'fail' only — pick a new name or delete the old table first.After the export, the materialized table is queryable like any other (dataset_query, dataset_aggregate, etc.). Pattern: a daily routine rebuilds ads_daily_campaign_rollup, and a saved view queries it for the dashboard.
Joins extend dataset_aggregate / dataset_query / dataset_compare with sibling-table lookups. Two modes:
"join": [{ "relationship": "campaign" }]
Uses the relationship declared on schema.relationships[]. The resolver pulls target_table + local_field + target_field from the declaration. Cleanest for repeated reports.
"join": [{
"target_table": "campaigns",
"local_field": "campaign_id",
"target_field": "external_id"
}]
Useful for one-offs or exploratory queries where declaring a relationship isn't worth it.
Chain via from: '<earlier_alias>'. Aliases default to the relationship name (declared) or the target table name (ad-hoc):
"join": [
{ "relationship": "campaign" },
{ "relationship": "brand", "from": "campaign" }
]
"group_by": ["campaign.brand_name"]
Inside a joined query, all field-name spots (group_by, metrics.field, where.field, having.field, order_by.field, calc formula identifiers' base values):
spend) → base table (alias t0)campaign.brand_name) → joined table by aliasIn the output, qualified group_by columns flatten dots to underscores (campaigns.brand → result key campaigns_brand). Calc formulas reference that flattened alias.
type: 'left' (default) — base rows show up even with no matchtype: 'inner' — base rows without a match are dropped{
"action": "dataset_aggregate",
"dataset_slug": "kdp-ads",
"table_name": "ads_daily",
"join": [
{ "relationship": "campaign" },
{ "relationship": "brand", "from": "campaign" }
],
"where": [
{ "field": "report_date", "operator": "gte", "value": "2026-05-21" }
],
"group_by": ["brand.name"],
"metrics": [
{ "op": "sum", "field": "spend", "as": "total_spend" }
],
"order_by": [{ "field": "total_spend", "direction": "desc" }],
"limit": 20
}
Note spend is on the base table (ads_daily), and the group key resolves to a joined table's field. Mixing freely is fine.
Canonicalize a query you'll run repeatedly:
{
"action": "save_view",
"view_name": "kdp_ads_daily_performance",
"dataset_slug": "kdp-ads",
"table_name": "ads_daily",
"query_kind": "aggregate",
"query": {
"where": [{ "field": "report_date", "operator": "gte", "value": "2026-05-21" }],
"group_by": ["campaign_id"],
"metrics": [{ "op": "sum", "field": "spend", "as": "total_spend" }],
"calculated": [{ "as": "acos", "expr": "total_spend / total_sales_14d" }],
"order_by": [{ "field": "acos", "direction": "asc" }],
"limit": 50
}
}
Then later:
{ "action": "run_view", "view_name": "kdp_ads_daily_performance" }
query_kind: 'aggregate' or 'compare'. export_aggregate is NOT recallable as a saved view in v1.run_view re-resolves permissions per call. Revoking dataset access takes effect immediately.run_view — the human saves the view, the agent runs it. Cross-user view access is not supported.list_views, get_view, update_view, delete_view.The canonical use case: a daily report. Save the view once; the routine runs it on cron and ships the output to Slack/email.
When you have more than a few thousand rows, use the REST routes — not manage(dataset_upsert) in a loop:
POST /v1/datasets/:slug/tables/:table/insert-jsonl
POST /v1/datasets/:slug/tables/:table/upsert-jsonl
POST /v1/datasets/:slug/tables/:table/update-jsonl
POST /v1/datasets/:slug/tables/:table/delete-jsonl
Content-Encoding: gzip)per_batch[].error with line rangesrole_datasets permission walk, same quota gating, same schema validation as MCP / V8 pathsenv.dataset| Path | Use for |
|---|---|
REST /v1/datasets/... | Bulk imports (1M+ rows), CSV migrations, gzipped uploads |
manage(dataset_*) via MCP | Interactive analytics, one-off queries, saved view CRUD |
env.dataset.<slug>.<table>.<op>() inside V8 tool | Transactional row work from a tool / routine |
Authorization: Bearer <api_key> — same key used for /mcp/:apiKeyX-On-Conflict: update | ignore | error — insert behavior on uniqueness collisionX-Conflict-On: <field> — required for upsert; the field must be unique: true in the schemaX-Batch-Size: <1..1000> — override the default 1000ZB_API_KEY=cm_xxx
DATASET=kdp-acquisition-diligence
TABLE=filings
curl -sS -X POST \
"https://mcp.zombie.codes/v1/datasets/$DATASET/tables/$TABLE/upsert-jsonl" \
-H "Authorization: Bearer $ZB_API_KEY" \
-H "Content-Type: application/x-ndjson" \
-H "X-Conflict-On: external_id" \
--data-binary @chunk_001.jsonl \
| jq '{batches_succeeded, batches_failed, totals, duration_ms}'
Body shape per line:
| Op | Each JSONL line is … |
|---|---|
| insert | a record object: {...fields...} |
| upsert | a record object: {...fields...} |
| update | {"id": "<uuid>", "fields": {...patch...}} |
| delete | a bare "<uuid>" string OR {"id": "<uuid>"} |
name vs table_name aliasing. The dataset_* row-level actions accept either name or table_name for the table — they're aliased. register_dataset_table / update_dataset_table use name. Pick one and stick with it within a session.
Field type matters for aggregation. sum and avg reject anything that isn't number or integer. Strings stored as numbers won't be auto-coerced. If you're aggregating and getting "field type X" errors, you stored the wrong type — fix the schema (additive: declare a new typed field, backfill, leave the bad field) before retrying.
Calculated formulas are numeric-only. No if(), no string concat, no function calls. If you need conditional logic, do it in SQL via where clauses splitting the dataset, or do post-processing in your tool/routine code.
In joined queries, calc identifiers always reference numeric aliases. Don't try to reference a joined string column (campaign.name) from a formula — it'll parse but eval to null. Use string columns in group_by, numeric columns in formulas.
"limit too high" — use export. If you legitimately need >10K result rows, dataset_export_aggregate is the answer. Materialize the rollup, then query it. Trying to bypass via a wider where is the wrong direction for actual large-result reports.
Don't try to push calc-field filters into SQL HAVING. The platform handles the routing transparently — you write having: [{ field: "acos", operator: "lt", value: 0.30 }] and the platform decides whether it's a SQL clause or a post-side JS filter. Don't pre-optimize.
Indexed date fields are load-bearing. A where { report_date >= ... } against a non-indexed date field will scan the JSONB blob row-by-row. On a small table that's fine; on the daily-rollup of a year of ads data, it's painful. Index every date you'll filter on.
Schema changes are additive-only. You can add fields, add indexes on existing fields, add compound indexes, grow an enum's values list. You cannot remove a field, change a type, remove an index, or shrink an enum. For destructive changes: register a new dataset_table with the desired shape and migrate the data.
Reserved column names. id, brain_id, data, created_at, updated_at are reserved — you can't declare them as fields, and they're silently dropped from update patches.
dataset_describe is your friend. Before composing any aggregate against a table you didn't just define, call describe. It gives you: declared fields with types, row count, and the date-coverage range. Three seconds saves a guess-and-retry loop.
Quota counts against the dataset's owning workspace, not the caller's. When a consultant in workspace A writes to a dataset owned by child workspace B, the rows count against B's tier bucket — the handler resolves workspaceOwnerId from resolved.account_owner_id (the dataset's owner) before checking quota, not from the caller. Translation: parent writes to a child's dataset are billed to the child. Plan child-workspace tiers with that in mind; the parent's headroom doesn't subsidize them.
dataset_id (UUID) bypasses slug lookup — useful escape hatch. Slug lookup runs through the workspace filter (with the consultant tree-walk above). If a slug lookup ever fails — we hit this transiently with the tree walk during a deploy — passing dataset_id instead of dataset_slug fetches the row by primary key and skips the workspace-scoped slug query entirely. checkDatasetAccess still validates tree-walked access against the resolved row, so this is not a permissions bypass — just a lookup escape hatch when the slug path is misbehaving. Keep it in your back pocket for debugging.
{
"action": "register_dataset_table",
"dataset_slug": "kdp-ads",
"name": "ads_daily",
"display_name": "Amazon Ads daily rollup",
"schema": {
"fields": [
{ "name": "report_date", "type": "date", "required": true, "indexed": true },
{ "name": "campaign_id", "type": "string", "required": true, "indexed": true },
{ "name": "ad_group_id", "type": "string", "indexed": true },
{ "name": "asin", "type": "string", "indexed": true },
{ "name": "impressions", "type": "integer" },
{ "name": "clicks", "type": "integer" },
{ "name": "spend", "type": "number" },
{ "name": "sales_14d", "type": "number" },
{ "name": "orders_14d", "type": "integer" }
],
"compound_indexes": [
{ "fields": ["report_date", "campaign_id"] }
],
"relationships": [
{
"name": "campaign",
"target_table": "campaigns",
"local_field": "campaign_id",
"target_field": "external_id",
"kind": "many_to_one"
}
]
}
}
{
"action": "dataset_aggregate",
"dataset_slug": "kdp-ads",
"table_name": "ads_daily",
"where": [
{ "field": "report_date", "operator": "gte", "value": "2026-05-21" }
],
"group_by": ["campaign_id"],
"metrics": [
{ "op": "sum", "field": "spend", "as": "total_spend" },
{ "op": "sum", "field": "sales_14d", "as": "total_sales" },
{ "op": "sum", "field": "orders_14d", "as": "total_orders" }
],
"calculated": [
{ "as": "acos", "expr": "total_spend / total_sales" }
],
"having": [{ "field": "total_spend", "operator": "gt", "value": 50 }],
"order_by": [{ "field": "total_spend", "direction": "desc" }],
"limit": 20
}
{
"action": "dataset_compare",
"dataset_slug": "kdp-ads",
"table_name": "ads_daily",
"group_by": ["campaign_id"],
"metrics": [
{ "op": "sum", "field": "spend", "as": "total_spend" },
{ "op": "sum", "field": "sales_14d", "as": "total_sales" }
],
"calculated": [
{ "as": "acos", "expr": "total_spend / total_sales" }
],
"current": {
"where": [
{ "field": "report_date", "operator": "gte", "value": "2026-05-21" },
{ "field": "report_date", "operator": "lte", "value": "2026-05-27" }
]
},
"previous": {
"where": [
{ "field": "report_date", "operator": "gte", "value": "2026-05-14" },
{ "field": "report_date", "operator": "lte", "value": "2026-05-20" }
]
},
"order_by": [{ "field": "delta_total_spend", "direction": "desc" }],
"limit": 50
}
{
"action": "save_view",
"view_name": "kdp_ads_top_campaigns_7d",
"dataset_slug": "kdp-ads",
"table_name": "ads_daily",
"query_kind": "aggregate",
"query": { /* the aggregate body from step 2 */ }
}
Then a routine (see zombie-build-routines) runs manage(run_view, view_name: 'kdp_ads_top_campaigns_7d') on a cron and ships the result.
{
"action": "dataset_aggregate",
"dataset_slug": "kdp-ads",
"table_name": "ads_daily",
"join": [
{ "relationship": "campaign" },
{ "relationship": "brand", "from": "campaign" }
],
"where": [
{ "field": "report_date", "operator": "gte", "value": "2026-05-01" }
],
"group_by": ["brand.name"],
"metrics": [
{ "op": "sum", "field": "spend", "as": "total_spend" },
{ "op": "sum", "field": "sales_14d", "as": "total_sales" }
],
"calculated": [
{ "as": "acos", "expr": "total_spend / total_sales" }
],
"order_by": [{ "field": "total_spend", "direction": "desc" }],
"limit": 20
}
spend and sales_14d are on the base table; the group key resolves through the join chain to brand.name.
zombie-build-tools — for env.dataset.<slug>.<table>.<op>(...) usage inside a V8 serverless tool. Same ops (CRUD + analytics — aggregate / compare / describe / export_aggregate), same arg shapes, different transport. The runner Proxy normalizes positional V8 args (get(id), delete(ids), insert(records, opts), upsert(records, opts), update(id, fields)) into the structured-arg wire format; analytics ops take the same single-args-object you'd pass through manage(dataset_*).zombie-build-routines — for cron-scheduling a saved view, building daily/weekly reports.zombie-build-permissions — for granting dataset access via permission sets (attach_dataset_to_role).npx claudepluginhub zombie-brains/zombie-brains --plugin zombie-brainsProvides UI/UX resources: 50+ styles, color palettes, font pairings, guidelines, charts for web/mobile across React, Next.js, Vue, Svelte, Tailwind, React Native, Flutter. Aids planning, building, reviewing interfaces.
Searches MemPalace before answering questions about past work, people, projects, or prior decisions. Returns verbatim stored content instead of guessing from model memory.