Skill

dremio

Working with Dremio Cloud, including ingesting, transforming, and querying data. Use when the user asks about data, tables, schemas, SQL queries, datasets, analytics, or anything involving structured data.

Invocation

How this skill is triggered — by the user, by Claude, or both

Slash command

/dremio:dremio

User invocable

Model invocable

Inline context

Default effort

Context Preview

The summary Claude sees in its skill listing — used to decide when to auto-load this skill

A project contains **namespaces** (managed Iceberg storage, full DDL/DML) and **sources** (connections to external systems). Sources can be object storage (S3, ADLS), databases (Oracle, Postgres, MongoDB), or other catalogs (Unity Catalog, Glue). Tables in SQL: `"Namespace"."table"` or `"Namespace"."Folder"."table"`.

SKILL.md

274 lines · ~2.9k tokens

Stats

Parent stars0

MaintenanceGood

Last CommitMar 3, 2026

Actions

View Source View Plugin View on GitHub View README

Stats

Actions

Dremio Cloud

A project contains namespaces (managed Iceberg storage, full DDL/DML) and sources (connections to external systems). Sources can be object storage (S3, ADLS), databases (Oracle, Postgres, MongoDB), or other catalogs (Unity Catalog, Glue). Tables in SQL: "Namespace"."table" or "Namespace"."Folder"."table".

REST API

Use the REST API for all Dremio interactions. Base URL: https://api.dremio.cloud/v0/projects/${DREMIO_PROJECT_ID}.

Auth setup: DREMIO_PAT and DREMIO_PROJECT_ID must be set. The .env file should use export and single quotes (PATs often contain +, /, = which can be mangled by the shell):

export DREMIO_PAT='your_pat_here'
export DREMIO_PROJECT_ID='your_project_id_here'

If the .env doesn't exist, ask the user to create one with their PAT and project ID.

zsh + special characters: PATs often contain +, /, = which zsh can silently corrupt. Either regenerate the PAT until you get one without +, or wrap REST API calls in bash -c '. /path/to/.env && curl ...'.

Use jq (not python) to parse JSON responses.

Search API

Find tables, views, and other entities by keyword:

curl -s -X POST "https://api.dremio.cloud/v0/projects/${DREMIO_PROJECT_ID}/search" \
  -H "Authorization: Bearer $DREMIO_PAT" -H "Content-Type: application/json" \
  -d '{"query": "search terms", "filter": "category in [\"TABLE\", \"VIEW\"]", "maxResults": 10}'

Filter categories: TABLE, VIEW, FOLDER, SPACE, SOURCE, REFLECTION, UDF, SCRIPT, JOB. Results include paths, columns, wiki descriptions, and labels.

Important — project name in search paths: The Search API sometimes includes the project name (e.g., "lakehouse-project") as the first element of the path array. This is a Dremio Cloud bug. Never use the project name in SQL paths. When constructing SQL from search results, strip the first path element if it matches a project name rather than a namespace or source. For example, if the search returns ["my-project", "MyNamespace", "MyTable"], the correct SQL reference is MyNamespace."MyTable", not "my-project".MyNamespace."MyTable".

Workflow

Find tables. Use the Search API with keywords. Fallback: INFORMATION_SCHEMA."TABLES" with TABLE_TYPE != 'SYSTEM_TABLE'.
Understand schema. Search API returns columns inline. For more detail, query INFORMATION_SCHEMA."COLUMNS".
Query. Use the SQL API for all queries (SELECT, DDL, DML).
Present results as markdown tables. Summarize findings. Suggest follow-ups.

Data Ingestion

CTAS (CREATE TABLE AS SELECT) — one step, but only works on data already queryable in Dremio.
COPY INTO — loads external files (CSV, JSON, Parquet) into a namespace table. Table must exist first.

For new data in S3: prefer CREATE TABLE + COPY INTO over promoting raw files (Iceberg is more efficient).

-- CTAS (from existing data)
CREATE TABLE Namespace.summary AS SELECT state, SUM(amount) FROM Namespace.raw GROUP BY state;

-- COPY INTO (from external files — two steps)
CREATE TABLE Namespace.my_table (id INT, name VARCHAR, created_at TIMESTAMP);
COPY INTO Namespace.my_table FROM '@source/path/to/data/' FILE_FORMAT 'parquet';

COPY INTO — Critical Syntax Rules

Path format: Always use @source with forward slashes for paths. NEVER use dot-notation.

-- CORRECT
COPY INTO my_table FROM '@datasets/bucket-name/folder/' FILE_FORMAT 'parquet';

-- WRONG — dot-notation causes "invalid source type" errors
COPY INTO my_table FROM '@datasets."bucket"."folder"' FILE_FORMAT 'parquet';
COPY INTO my_table FROM '@source."path"."to"' FILE_FORMAT 'parquet';

Source names: Do NOT quote the source name after @. Quoting causes "non-existent source" errors.

-- CORRECT
COPY INTO my_table FROM '@datasets/path/to/data/' FILE_FORMAT 'parquet';

-- WRONG — quoted source name fails
COPY INTO my_table FROM '@"datasets"/path/to/data/' FILE_FORMAT 'parquet';

FILE_FORMAT: Just the format name in quotes. No 'type' = prefix.

-- CORRECT
FILE_FORMAT 'parquet'
FILE_FORMAT 'csv'
FILE_FORMAT 'json'

-- WRONG
FILE_FORMAT 'type' = 'parquet'    -- parser error

Mixed schemas in a folder: If a folder contains files with different schemas, point to the specific file — otherwise COPY INTO will fail when it encounters a file whose columns don't match the target table:

-- Folder has sales.parquet and customers.parquet with different schemas
-- WRONG — will fail on schema mismatch
COPY INTO my_sales FROM '@datasets/bucket/data/' FILE_FORMAT 'parquet';

-- CORRECT — point to the specific file
COPY INTO my_sales FROM '@datasets/bucket/data/sales.parquet' FILE_FORMAT 'parquet';

Namespace & Folder Management

To create a new namespace (top-level container for Iceberg tables and views), use CREATE FOLDER via the REST SQL API. Do NOT use the catalog REST API to create spaces — spaces cannot hold Iceberg tables.

CREATE FOLDER my_namespace                    -- create a new namespace
CREATE FOLDER my_namespace.staging            -- subfolder within a namespace
CREATE FOLDER my_namespace.staging.temp       -- nested subfolder
CREATE FOLDER IF NOT EXISTS my_namespace      -- idempotent create
DROP FOLDER my_namespace.staging.temp         -- must drop children first
DROP FOLDER my_namespace.staging              -- cannot drop non-empty folders

DROP FOLDER IF EXISTS is not supported — omit IF EXISTS.

Important: Never use AT BRANCH syntax with CREATE/DROP FOLDER — it is not supported and will cause a parser error.

Dremio SQL Reference

Reserved Keywords (must quote as aliases)

day, month, year, hour, minute, second, count, value, rows, timestamp, date, time, covar_pop, table, order, group, key, type, user, name, status

Use AS "day" not AS day.

General Rules

Fully qualify table names: "Namespace"."table" or "Namespace"."Folder"."table"
LIMIT results to 100 by default unless aggregating
Handle NULLs with COALESCE or IS NOT NULL
Use aliases for computed columns

Function Syntax (Dremio-specific)

Function	Correct Syntax	Common Mistake
ILIKE	`ILIKE(col, pattern)`	`col ILIKE pattern`
PERCENTILE_CONT	`PERCENTILE_CONT(0.9) WITHIN GROUP (ORDER BY col)`	Other variants
DATEDIFF	`DATEDIFF(start, end)` — 2 args, returns days	Unit argument
TIMESTAMPDIFF	`TIMESTAMPDIFF(unit, start, end)`	DATEDIFF with units
TIMESTAMPADD	`TIMESTAMPADD(unit, count, ts)`
EXTRACT	`EXTRACT(DAY FROM ts)`
JSON access	`CONVERT_FROM(col, 'JSON')."field"`	`JSON_EXTRACT_SCALAR`, `->>`
STRUCT access	`struct_col."field"`	`col['field']`
ANY_VALUE	Not supported — use `MIN(col)` or `MAX(col)`	`ANY_VALUE(col)`
FLATTEN	`FLATTEN(col)` or `CROSS JOIN UNNEST(col)`	`LATERAL FLATTEN`
ARRAY_AGG	Not supported for STRUCT/MAP/LIST; 32KB limit
Intervals	`INTERVAL '1' HOUR` (unquoted unit)	Quoting the unit
Large intervals	`TIMESTAMPADD(DAY, -365, CURRENT_DATE)`	`INTERVAL '365' DAY`

Unstructured Data & AI_GENERATE

Use AI_GENERATE + LIST_FILES to extract structured data from images, PDFs, and documents in object storage.

Basic pattern

CREATE TABLE Namespace.Results AS
WITH
  files AS (
    SELECT file, file['path'] AS file_path
    FROM TABLE(LIST_FILES('@source/path/to/files/'))
  ),
  extracted AS (
    SELECT file_path,
      AI_GENERATE('model-name',   -- e.g. 'FieldOrg', 'claude.claude-sonnet-4-6'
        ('Your extraction prompt.', file)
        WITH SCHEMA ROW(field1 VARCHAR, field2 DOUBLE)
      ) AS data
    FROM files
  )
SELECT data['field1'] AS field1, data['field2'] AS field2, file_path
FROM extracted;

Variable-length extraction (JSON array → FLATTEN)

For documents with variable numbers of items (e.g., invoice line items):

Extract with line_items_json VARCHAR in the schema — prompt the LLM to return a JSON array
Parse and flatten:

CREATE VIEW Namespace.LineItems AS
WITH parsed AS (
  SELECT *, TRY_CONVERT_FROM(
    line_items_json AS ARRAY(ROW(field_a VARCHAR, amount DOUBLE))
  ) AS items
  FROM Namespace.Documents
),
flattened AS (
  SELECT doc_id, FLATTEN(items) AS item FROM parsed
)
SELECT doc_id, item['field_a'] AS field_a, item['amount'] AS amount
FROM flattened;

Key points

Pass file column from LIST_FILES as second tuple element for images/PDFs
Access fields: data['field_name']
Use CREATE TABLE (not view) to materialize AI results — avoids re-running LLM on each query
For variable-length data: extract as VARCHAR JSON array, then TRY_CONVERT_FROM + FLATTEN

Job Status API

When submitting DDL/DML via the REST API (CTAS, COPY INTO, CREATE FOLDER, etc.), the response returns a job ID. Poll for completion:

# Submit a SQL job
JOB_ID=$(curl -s -X POST "https://api.dremio.cloud/v0/projects/${DREMIO_PROJECT_ID}/sql" \
  -H "Authorization: Bearer $DREMIO_PAT" -H "Content-Type: application/json" \
  -d '{"sql": "CREATE TABLE ..."}' | jq -r '.id')

# Poll job status
curl -s "https://api.dremio.cloud/v0/projects/${DREMIO_PROJECT_ID}/job/${JOB_ID}" \
  -H "Authorization: Bearer ${DREMIO_PAT}" | jq '{jobState, rowCount, errorMessage}'

Job states: PENDING, METADATA_RETRIEVAL, PLANNING, QUEUED, ENGINE_START, RUNNING, COMPLETED, FAILED, CANCELED

Polling tips:

CTAS on large tables (100M+ rows) can take 1-3 minutes
COPY INTO from parquet is typically faster than CTAS
CREATE FOLDER / CREATE TABLE (empty) complete in seconds
Check errorMessage on FAILED — common causes: syntax errors, schema mismatches, source permissions

Discovering Data & System Analysis

-- Browse tables/views
SELECT TABLE_SCHEMA, TABLE_NAME, TABLE_TYPE FROM INFORMATION_SCHEMA."TABLES"
WHERE TABLE_TYPE != 'SYSTEM_TABLE' ORDER BY TABLE_SCHEMA, TABLE_NAME

-- Recent jobs
SELECT * FROM sys.project.jobs_recent ORDER BY submitted_ts DESC LIMIT 20

-- Engine status / Org users
SELECT * FROM sys.project.engines
SELECT * FROM sys.organization.users

Documentation

Dremio Cloud docs: https://docs.dremio.com/dremio-cloud/

Look up syntax with WebFetch when unsure:

Topic	URL
SQL Functions	https://docs.dremio.com/dremio-cloud/sql/sql-functions/functions/
SQL Reference	https://docs.dremio.com/dremio-cloud/sql/
COPY INTO	https://docs.dremio.com/dremio-cloud/sql/commands/copy-into-table/
CREATE TABLE	https://docs.dremio.com/dremio-cloud/sql/commands/tables/create-table/
Data Types	https://docs.dremio.com/dremio-cloud/sql/data-types/
REST API	https://docs.dremio.com/dremio-cloud/api/

Best Practices

If ambiguous which table to use, show what's available and ask.
On query errors, check: unquoted reserved words, wrong table path, column typos.
Aggregate first, then drill down. Don't dump hundreds of raw rows.
Verify column types match before joins.
Suggest follow-up queries based on results.

dremio

Invocation

Context Preview

SKILL.md

dremio

Invocation

Context Preview

SKILL.md

Dremio Cloud

REST API

Search API

Workflow

Data Ingestion

COPY INTO — Critical Syntax Rules

Namespace & Folder Management

Dremio SQL Reference

Reserved Keywords (must quote as aliases)

General Rules

Function Syntax (Dremio-specific)

Unstructured Data & AI_GENERATE

Basic pattern

Variable-length extraction (JSON array → FLATTEN)

Key points

Job Status API

Discovering Data & System Analysis

Documentation

Best Practices

Similar Skills

Dremio Cloud

REST API

Search API

Workflow

Data Ingestion

COPY INTO — Critical Syntax Rules

Namespace & Folder Management

Dremio SQL Reference

Reserved Keywords (must quote as aliases)

General Rules

Function Syntax (Dremio-specific)

Unstructured Data & AI_GENERATE

Basic pattern

Variable-length extraction (JSON array → FLATTEN)

Key points

Job Status API

Discovering Data & System Analysis

Documentation

Best Practices

Similar Skills