Skill

create-table

Registers Parquet, CSV, JSON, Arrow IPC, or Avro files as persistent external tables in DataFusion sessions. Auto-detects format, explores schema, and persists state for reuse across skills.

Bash

data-engineering

database

Popularity

Stars

Invocation

How this skill is triggered — by the user, by Claude, or both

Slash command

/datafusion-skills:create-table

User invocable

Model invocable

Inline context

Default effort

Tool Access

This skill is limited to the following tools:

Bash

Context Preview

The summary Claude sees in its skill listing — used to decide when to auto-load this skill

You are helping the user register a data file as a persistent table in their DataFusion session.

SKILL.md

162 lines · ~1.1k tokens

Stats

Stars12

MaintenanceGood

Last CommitMar 21, 2026

Actions

View Source View Plugin View on GitHub View README

Step 1 — Resolve the file path

If $0 is a relative path, resolve it:

RESOLVED_PATH="$(cd "$(dirname "$0")" 2>/dev/null && pwd)/$(basename "$0")"

Check the file exists (for local files):

test -f "$RESOLVED_PATH" || test -d "$RESOLVED_PATH"

Exists → continue
Not found → if it looks like an S3/GCS URI, continue anyway. Otherwise ask the user to check the path.

For directories (partitioned data), use the directory path as-is.

Step 2 — Check datafusion-cli is installed

command -v datafusion-cli

If not found, delegate to /datafusion-skills:install-datafusion.

Step 3 — Detect format

If --format was specified, use that. Otherwise detect from extension:

Extension	Format
`.parquet`, `.pq`	PARQUET
`.csv`, `.tsv`, `.txt`	CSV
`.json`, `.jsonl`, `.ndjson`	JSON
`.arrow`, `.ipc`, `.feather`	ARROW
`.avro`	AVRO
directory	PARQUET (default for partitioned data)

If the extension is unknown, try Parquet first, then CSV.

Step 4 — Derive table name

If --name was specified, use that. Otherwise derive from the filename:

Remove extension
Replace hyphens and spaces with underscores
Lowercase
Remove non-alphanumeric characters (except underscores)

Example: My-Data File.parquet → my_data_file

Confirm the name with the user.

Step 5 — Resolve state directory

STATE_DIR=""
test -f .datafusion-skills/state.sql && STATE_DIR=".datafusion-skills"
PROJECT_ROOT="$(git rev-parse --show-toplevel 2>/dev/null || echo "$PWD")"
PROJECT_ID="$(echo "$PROJECT_ROOT" | tr '/' '-')"
test -f "$HOME/.datafusion-skills/$PROJECT_ID/state.sql" && STATE_DIR="$HOME/.datafusion-skills/$PROJECT_ID"

If no state directory exists, ask the user where to store state (same as other skills):

In the project directory (.datafusion-skills/)

In your home directory (~/.datafusion-skills/<project-id>/)

mkdir -p "$STATE_DIR"
touch "$STATE_DIR/state.sql"

Step 6 — Create the external table and explore

Build the CREATE EXTERNAL TABLE statement:

For Parquet:

CREATE EXTERNAL TABLE IF NOT EXISTS <table_name> STORED AS PARQUET LOCATION '<RESOLVED_PATH>';

For CSV:

CREATE EXTERNAL TABLE IF NOT EXISTS <table_name> STORED AS CSV LOCATION '<RESOLVED_PATH>' OPTIONS ('has_header' 'true');

For JSON:

CREATE EXTERNAL TABLE IF NOT EXISTS <table_name> STORED AS JSON LOCATION '<RESOLVED_PATH>';

For Arrow IPC:

CREATE EXTERNAL TABLE IF NOT EXISTS <table_name> STORED AS ARROW LOCATION '<RESOLVED_PATH>';

For Avro:

CREATE EXTERNAL TABLE IF NOT EXISTS <table_name> STORED AS AVRO LOCATION '<RESOLVED_PATH>';

Test it:

datafusion-cli --file "$STATE_DIR/state.sql" -c "
<CREATE_STATEMENT>
DESCRIBE <table_name>;
SELECT COUNT(*) AS row_count FROM <table_name>;
SELECT * FROM <table_name> LIMIT 5;
"

Step 7 — Persist to state file

Check if this table is already in the state file:

grep -q "<table_name>" "$STATE_DIR/state.sql" 2>/dev/null

If not present, append:

cat >> "$STATE_DIR/state.sql" <<'SQL'
-- Table: <table_name> (<FORMAT> from <RESOLVED_PATH>)
<CREATE_STATEMENT>
SQL

Step 8 — Report

Summarize:

Table name: <table_name>
Format: Parquet/CSV/JSON/Arrow/Avro
Location: the resolved path
Columns: list with types
Row count: total rows
State file: path to state.sql

This table is now available in all /datafusion-skills:query sessions. Try: /datafusion-skills:query SELECT * FROM <table_name> LIMIT 10

create-table

Popularity

Invocation

Tool Access

Context Preview

SKILL.md

create-table

Popularity

Invocation

Tool Access

Context Preview

SKILL.md

Step 1 — Resolve the file path

Step 2 — Check datafusion-cli is installed

Step 3 — Detect format

Step 4 — Derive table name

Step 5 — Resolve state directory

Step 6 — Create the external table and explore

Step 7 — Persist to state file

Step 8 — Report

Similar Skills

Step 1 — Resolve the file path

Step 2 — Check datafusion-cli is installed

Step 3 — Detect format

Step 4 — Derive table name

Step 5 — Resolve state directory

Step 6 — Create the external table and explore

Step 7 — Persist to state file

Step 8 — Report

Similar Skills