From dremio
Working with Dremio Cloud, including ingesting, transforming, and querying data. Use when the user asks about data, tables, schemas, SQL queries, datasets, analytics, or anything involving structured data.
How this skill is triggered — by the user, by Claude, or both
Slash command
/dremio:dremioThe summary Claude sees in its skill listing — used to decide when to auto-load this skill
A project contains **namespaces** (managed Iceberg storage, full DDL/DML) and **sources** (connections to external systems). Sources can be object storage (S3, ADLS), databases (Oracle, Postgres, MongoDB), or other catalogs (Unity Catalog, Glue). Tables in SQL: `"Namespace"."table"` or `"Namespace"."Folder"."table"`.
A project contains namespaces (managed Iceberg storage, full DDL/DML) and sources (connections to external systems). Sources can be object storage (S3, ADLS), databases (Oracle, Postgres, MongoDB), or other catalogs (Unity Catalog, Glue). Tables in SQL: "Namespace"."table" or "Namespace"."Folder"."table".
Use the REST API for all Dremio interactions. Base URL: https://api.dremio.cloud/v0/projects/${DREMIO_PROJECT_ID}.
Auth setup: DREMIO_PAT and DREMIO_PROJECT_ID must be set. The .env file should use export and single quotes (PATs often contain +, /, = which can be mangled by the shell):
export DREMIO_PAT='your_pat_here'
export DREMIO_PROJECT_ID='your_project_id_here'
If the .env doesn't exist, ask the user to create one with their PAT and project ID.
zsh + special characters: PATs often contain +, /, = which zsh can silently corrupt. Either regenerate the PAT until you get one without +, or wrap REST API calls in bash -c '. /path/to/.env && curl ...'.
Use jq (not python) to parse JSON responses.
Find tables, views, and other entities by keyword:
curl -s -X POST "https://api.dremio.cloud/v0/projects/${DREMIO_PROJECT_ID}/search" \
-H "Authorization: Bearer $DREMIO_PAT" -H "Content-Type: application/json" \
-d '{"query": "search terms", "filter": "category in [\"TABLE\", \"VIEW\"]", "maxResults": 10}'
Filter categories: TABLE, VIEW, FOLDER, SPACE, SOURCE, REFLECTION, UDF, SCRIPT, JOB. Results include paths, columns, wiki descriptions, and labels.
Important — project name in search paths: The Search API sometimes includes the project name (e.g., "lakehouse-project") as the first element of the path array. This is a Dremio Cloud bug. Never use the project name in SQL paths. When constructing SQL from search results, strip the first path element if it matches a project name rather than a namespace or source. For example, if the search returns ["my-project", "MyNamespace", "MyTable"], the correct SQL reference is MyNamespace."MyTable", not "my-project".MyNamespace."MyTable".
INFORMATION_SCHEMA."TABLES" with TABLE_TYPE != 'SYSTEM_TABLE'.INFORMATION_SCHEMA."COLUMNS".CREATE TABLE AS SELECT) — one step, but only works on data already queryable in Dremio.For new data in S3: prefer CREATE TABLE + COPY INTO over promoting raw files (Iceberg is more efficient).
-- CTAS (from existing data)
CREATE TABLE Namespace.summary AS SELECT state, SUM(amount) FROM Namespace.raw GROUP BY state;
-- COPY INTO (from external files — two steps)
CREATE TABLE Namespace.my_table (id INT, name VARCHAR, created_at TIMESTAMP);
COPY INTO Namespace.my_table FROM '@source/path/to/data/' FILE_FORMAT 'parquet';
Path format: Always use @source with forward slashes for paths. NEVER use dot-notation.
-- CORRECT
COPY INTO my_table FROM '@datasets/bucket-name/folder/' FILE_FORMAT 'parquet';
-- WRONG — dot-notation causes "invalid source type" errors
COPY INTO my_table FROM '@datasets."bucket"."folder"' FILE_FORMAT 'parquet';
COPY INTO my_table FROM '@source."path"."to"' FILE_FORMAT 'parquet';
Source names: Do NOT quote the source name after @. Quoting causes "non-existent source" errors.
-- CORRECT
COPY INTO my_table FROM '@datasets/path/to/data/' FILE_FORMAT 'parquet';
-- WRONG — quoted source name fails
COPY INTO my_table FROM '@"datasets"/path/to/data/' FILE_FORMAT 'parquet';
FILE_FORMAT: Just the format name in quotes. No 'type' = prefix.
-- CORRECT
FILE_FORMAT 'parquet'
FILE_FORMAT 'csv'
FILE_FORMAT 'json'
-- WRONG
FILE_FORMAT 'type' = 'parquet' -- parser error
Mixed schemas in a folder: If a folder contains files with different schemas, point to the specific file — otherwise COPY INTO will fail when it encounters a file whose columns don't match the target table:
-- Folder has sales.parquet and customers.parquet with different schemas
-- WRONG — will fail on schema mismatch
COPY INTO my_sales FROM '@datasets/bucket/data/' FILE_FORMAT 'parquet';
-- CORRECT — point to the specific file
COPY INTO my_sales FROM '@datasets/bucket/data/sales.parquet' FILE_FORMAT 'parquet';
To create a new namespace (top-level container for Iceberg tables and views), use CREATE FOLDER via the REST SQL API. Do NOT use the catalog REST API to create spaces — spaces cannot hold Iceberg tables.
CREATE FOLDER my_namespace -- create a new namespace
CREATE FOLDER my_namespace.staging -- subfolder within a namespace
CREATE FOLDER my_namespace.staging.temp -- nested subfolder
CREATE FOLDER IF NOT EXISTS my_namespace -- idempotent create
DROP FOLDER my_namespace.staging.temp -- must drop children first
DROP FOLDER my_namespace.staging -- cannot drop non-empty folders
DROP FOLDER IF EXISTS is not supported — omit IF EXISTS.
Important: Never use AT BRANCH syntax with CREATE/DROP FOLDER — it is not supported and will cause a parser error.
day, month, year, hour, minute, second, count, value, rows, timestamp, date, time, covar_pop, table, order, group, key, type, user, name, status
Use AS "day" not AS day.
"Namespace"."table" or "Namespace"."Folder"."table"| Function | Correct Syntax | Common Mistake |
|---|---|---|
| ILIKE | ILIKE(col, pattern) | col ILIKE pattern |
| PERCENTILE_CONT | PERCENTILE_CONT(0.9) WITHIN GROUP (ORDER BY col) | Other variants |
| DATEDIFF | DATEDIFF(start, end) — 2 args, returns days | Unit argument |
| TIMESTAMPDIFF | TIMESTAMPDIFF(unit, start, end) | DATEDIFF with units |
| TIMESTAMPADD | TIMESTAMPADD(unit, count, ts) | |
| EXTRACT | EXTRACT(DAY FROM ts) | |
| JSON access | CONVERT_FROM(col, 'JSON')."field" | JSON_EXTRACT_SCALAR, ->> |
| STRUCT access | struct_col."field" | col['field'] |
| ANY_VALUE | Not supported — use MIN(col) or MAX(col) | ANY_VALUE(col) |
| FLATTEN | FLATTEN(col) or CROSS JOIN UNNEST(col) | LATERAL FLATTEN |
| ARRAY_AGG | Not supported for STRUCT/MAP/LIST; 32KB limit | |
| Intervals | INTERVAL '1' HOUR (unquoted unit) | Quoting the unit |
| Large intervals | TIMESTAMPADD(DAY, -365, CURRENT_DATE) | INTERVAL '365' DAY |
Use AI_GENERATE + LIST_FILES to extract structured data from images, PDFs, and documents in object storage.
CREATE TABLE Namespace.Results AS
WITH
files AS (
SELECT file, file['path'] AS file_path
FROM TABLE(LIST_FILES('@source/path/to/files/'))
),
extracted AS (
SELECT file_path,
AI_GENERATE('model-name', -- e.g. 'FieldOrg', 'claude.claude-sonnet-4-6'
('Your extraction prompt.', file)
WITH SCHEMA ROW(field1 VARCHAR, field2 DOUBLE)
) AS data
FROM files
)
SELECT data['field1'] AS field1, data['field2'] AS field2, file_path
FROM extracted;
For documents with variable numbers of items (e.g., invoice line items):
line_items_json VARCHAR in the schema — prompt the LLM to return a JSON arrayCREATE VIEW Namespace.LineItems AS
WITH parsed AS (
SELECT *, TRY_CONVERT_FROM(
line_items_json AS ARRAY(ROW(field_a VARCHAR, amount DOUBLE))
) AS items
FROM Namespace.Documents
),
flattened AS (
SELECT doc_id, FLATTEN(items) AS item FROM parsed
)
SELECT doc_id, item['field_a'] AS field_a, item['amount'] AS amount
FROM flattened;
file column from LIST_FILES as second tuple element for images/PDFsdata['field_name']TRY_CONVERT_FROM + FLATTENWhen submitting DDL/DML via the REST API (CTAS, COPY INTO, CREATE FOLDER, etc.), the response returns a job ID. Poll for completion:
# Submit a SQL job
JOB_ID=$(curl -s -X POST "https://api.dremio.cloud/v0/projects/${DREMIO_PROJECT_ID}/sql" \
-H "Authorization: Bearer $DREMIO_PAT" -H "Content-Type: application/json" \
-d '{"sql": "CREATE TABLE ..."}' | jq -r '.id')
# Poll job status
curl -s "https://api.dremio.cloud/v0/projects/${DREMIO_PROJECT_ID}/job/${JOB_ID}" \
-H "Authorization: Bearer ${DREMIO_PAT}" | jq '{jobState, rowCount, errorMessage}'
Job states: PENDING, METADATA_RETRIEVAL, PLANNING, QUEUED, ENGINE_START, RUNNING, COMPLETED, FAILED, CANCELED
Polling tips:
errorMessage on FAILED — common causes: syntax errors, schema mismatches, source permissions-- Browse tables/views
SELECT TABLE_SCHEMA, TABLE_NAME, TABLE_TYPE FROM INFORMATION_SCHEMA."TABLES"
WHERE TABLE_TYPE != 'SYSTEM_TABLE' ORDER BY TABLE_SCHEMA, TABLE_NAME
-- Recent jobs
SELECT * FROM sys.project.jobs_recent ORDER BY submitted_ts DESC LIMIT 20
-- Engine status / Org users
SELECT * FROM sys.project.engines
SELECT * FROM sys.organization.users
Dremio Cloud docs: https://docs.dremio.com/dremio-cloud/
Look up syntax with WebFetch when unsure:
npx claudepluginhub dremio/claude-plugins --plugin dremioExecutes and manages Athena SQL queries across Glue, S3 Tables, and Redshift catalogs with workgroup selection, asset resolution, statement classification, error recovery, and cost reporting.
Analyze lakehouse data interactively via Fabric Lakehouse Livy API sessions using PySpark/Spark SQL for DataFrames, cross-lakehouse joins, Delta time-travel, and unstructured/JSON data.
Executes read-only T-SQL queries against Fabric Data Warehouse, Lakehouse SQL Endpoints, and Mirrored Databases via CLI. Counts rows, discovers schemas, exports to CSV/JSON, and monitors query performance.