Registers Parquet, CSV, JSON, Arrow IPC, or Avro files as persistent external tables in DataFusion sessions. Auto-detects format, explores schema, and persists state for reuse across skills.
Invoked with the slash command /datafusion-skills:create-table.
You are helping the user register a data file as a persistent table in their DataFusion session.
File path given: $0
Additional arguments: ${1:-}
Follow these steps in order.
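Resolve $0 to an absolute path so that the LOCATION stored in the state file does not depend on the current working directory. This command works for both relative and absolute paths: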
RESOLVED_PATH="$(cd "$(dirname "$0")" 2>/dev/null && pwd)/$(basename "$0")"
Check the file exists (for local files):
test -f "$RESOLVED_PATH" || test -d "$RESOLVED_PATH"
For directories (partitioned data), use the directory path as-is.
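Path resolution and the existence check can be wrapped in one function that fails loudly instead of silently producing a path like /file.parquet when the parent directory is missing (the cd fails, so the command substitution is empty). This is a sketch for local paths only; `resolve_data_path` is a hypothetical name, not part of datafusion-skills.

```shell
# Hypothetical helper (not part of datafusion-skills): make a local path absolute
# and confirm it points at a file or a directory of partitioned data.
resolve_data_path() {
  local input="$1" dir resolved
  # cd runs inside a command substitution (a subshell), so the caller's cwd is untouched.
  dir="$(cd "$(dirname "$input")" 2>/dev/null && pwd)" || {
    echo "Directory not found: $(dirname "$input")" >&2
    return 1
  }
  resolved="$dir/$(basename "$input")"
  if [ -f "$resolved" ] || [ -d "$resolved" ]; then
    printf '%s\n' "$resolved"
  else
    echo "Not found: $resolved" >&2
    return 1
  fi
}
```

On success it prints the absolute path; on failure it prints a message to stderr and returns 1, so a caller can stop before building any SQL.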
Check that datafusion-cli is installed:
command -v datafusion-cli
If not found, delegate to /datafusion-skills:install-datafusion.
If --format was specified, use that. Otherwise detect from extension:
| Extension | Format |
|---|---|
| .parquet, .pq | PARQUET |
| .csv, .tsv, .txt | CSV |
| .json, .jsonl, .ndjson | JSON |
| .arrow, .ipc, .feather | ARROW |
| .avro | AVRO |
| directory | PARQUET (default for partitioned data) |
If the extension is unknown, try Parquet first, then CSV.
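The extension table can be expressed as a single case statement. This is a sketch, assuming a POSIX-like shell; `detect_format` is a hypothetical helper, it does not handle the --format override (check that first), and it returns UNKNOWN so the caller can apply the Parquet-then-CSV fallback.

```shell
# Hypothetical helper (not part of datafusion-skills): map a path to a
# STORED AS keyword, following the extension table above.
detect_format() {
  local path="$1" ext
  if [ -d "$path" ]; then
    echo PARQUET            # directories default to partitioned Parquet
    return 0
  fi
  # Take the text after the last dot and lowercase it.
  ext="$(printf '%s' "${path##*.}" | tr '[:upper:]' '[:lower:]')"
  case "$ext" in
    parquet|pq)        echo PARQUET ;;
    csv|tsv|txt)       echo CSV ;;
    json|jsonl|ndjson) echo JSON ;;
    arrow|ipc|feather) echo ARROW ;;
    avro)              echo AVRO ;;
    *)                 echo UNKNOWN ;;   # caller tries PARQUET, then CSV
  esac
}

detect_format "events.NDJSON"   # prints: JSON
```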
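If --name was specified, use that. Otherwise derive the name from the filename: drop the extension, lowercase the rest, and replace spaces, hyphens, and other non-alphanumeric characters with underscores.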
Example: My-Data File.parquet → my_data_file
Confirm the name with the user.
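The naming rule can be written as a small shell function. This is a sketch, assuming `sed -E` is available (GNU and BSD sed both accept it); `derive_table_name` is a hypothetical name, and collapsing runs of separators and trimming leading or trailing underscores are extra choices beyond what the example shows.

```shell
# Hypothetical helper (not part of datafusion-skills): derive a table name from a path.
derive_table_name() {
  local base
  base="$(basename "$1")"
  base="${base%.*}"   # drop the final extension, e.g. ".parquet"
  # Lowercase, turn each run of non-alphanumerics into "_", trim edge underscores.
  printf '%s\n' "$base" \
    | tr '[:upper:]' '[:lower:]' \
    | sed -E 's/[^a-z0-9]+/_/g; s/^_+//; s/_+$//'
}

derive_table_name "My-Data File.parquet"   # prints: my_data_file
```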
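Look for an existing state directory. The home-directory location is checked last, so it wins if both exist:
STATE_DIR=""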
test -f .datafusion-skills/state.sql && STATE_DIR=".datafusion-skills"
PROJECT_ROOT="$(git rev-parse --show-toplevel 2>/dev/null || echo "$PWD")"
PROJECT_ID="$(echo "$PROJECT_ROOT" | tr '/' '-')"
test -f "$HOME/.datafusion-skills/$PROJECT_ID/state.sql" && STATE_DIR="$HOME/.datafusion-skills/$PROJECT_ID"
If no state directory exists, ask the user where to store state (the same choice the other skills offer):
- In the project directory (.datafusion-skills/)
- In your home directory (~/.datafusion-skills/<project-id>/)
Set STATE_DIR to the chosen location, then create it:
mkdir -p "$STATE_DIR"
touch "$STATE_DIR/state.sql"
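The lookup and creation steps can be combined into one function. This is a sketch; `find_or_create_state_dir` is a hypothetical name, and its `project`/`home` argument stands in for the user's answer, which the skill collects by asking rather than by a parameter.

```shell
# Hypothetical helper (not part of datafusion-skills): return the state directory,
# creating it (with an empty state.sql) when none exists yet.
# $1 is the user's choice for a new directory: "project" (default) or "home".
find_or_create_state_dir() {
  local choice="${1:-project}" root project_id dir=""
  root="$(git rev-parse --show-toplevel 2>/dev/null || echo "$PWD")"
  project_id="$(printf '%s' "$root" | tr '/' '-')"
  # Same order as the lookup above: the home-directory location overrides the project one.
  [ -f .datafusion-skills/state.sql ] && dir=".datafusion-skills"
  [ -f "$HOME/.datafusion-skills/$project_id/state.sql" ] && dir="$HOME/.datafusion-skills/$project_id"
  if [ -z "$dir" ]; then
    case "$choice" in
      home) dir="$HOME/.datafusion-skills/$project_id" ;;
      *)    dir=".datafusion-skills" ;;
    esac
    mkdir -p "$dir" && touch "$dir/state.sql"
  fi
  printf '%s\n' "$dir"
}
```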
Build the CREATE EXTERNAL TABLE statement:
For Parquet:
CREATE EXTERNAL TABLE IF NOT EXISTS <table_name> STORED AS PARQUET LOCATION '<RESOLVED_PATH>';
For CSV:
CREATE EXTERNAL TABLE IF NOT EXISTS <table_name> STORED AS CSV LOCATION '<RESOLVED_PATH>' OPTIONS ('has_header' 'true');
For JSON (DataFusion's JSON reader expects newline-delimited JSON, one object per line):
CREATE EXTERNAL TABLE IF NOT EXISTS <table_name> STORED AS JSON LOCATION '<RESOLVED_PATH>';
For Arrow IPC:
CREATE EXTERNAL TABLE IF NOT EXISTS <table_name> STORED AS ARROW LOCATION '<RESOLVED_PATH>';
For Avro:
CREATE EXTERNAL TABLE IF NOT EXISTS <table_name> STORED AS AVRO LOCATION '<RESOLVED_PATH>';
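The five statements differ only in the format keyword and, for CSV, the header option, so one function can generate them. This is a sketch; `build_create_statement` is a hypothetical helper, and doubling single quotes in the path (the standard SQL escape inside a string literal) is an extra safeguard the steps above do not mention.

```shell
# Hypothetical helper (not part of datafusion-skills): print the CREATE EXTERNAL TABLE
# statement for a table name, a STORED AS keyword, and an absolute path.
build_create_statement() {
  local name="$1" format="$2" path options=""
  # Escape single quotes for the SQL string literal by doubling them.
  path="$(printf '%s' "$3" | sed "s/'/''/g")"
  # CSV is the only format above that needs an extra option: the header row.
  if [ "$format" = "CSV" ]; then
    options=" OPTIONS ('has_header' 'true')"
  fi
  printf "CREATE EXTERNAL TABLE IF NOT EXISTS %s STORED AS %s LOCATION '%s'%s;\n" \
    "$name" "$format" "$path" "$options"
}

build_create_statement sales PARQUET "/data/sales.parquet"
# prints: CREATE EXTERNAL TABLE IF NOT EXISTS sales STORED AS PARQUET LOCATION '/data/sales.parquet';
```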
Test it:
datafusion-cli --file "$STATE_DIR/state.sql" -c "
<CREATE_STATEMENT>
DESCRIBE <table_name>;
SELECT COUNT(*) AS row_count FROM <table_name>;
SELECT * FROM <table_name> LIMIT 5;
"
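Check whether this table is already in the state file. Match the comment header that is appended below rather than the bare name, so that a table named sales is not mistaken for sales_2024:
grep -qF -- "-- Table: <table_name> (" "$STATE_DIR/state.sql" 2>/dev/null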
If not present, append:
cat >> "$STATE_DIR/state.sql" <<'SQL'
-- Table: <table_name> (<FORMAT> from <RESOLVED_PATH>)
<CREATE_STATEMENT>
SQL
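The check-then-append step can be made idempotent in one function. This is a sketch; `register_table` is a hypothetical helper that writes the same "-- Table:" header as the heredoc above and matches that header exactly, so registering sales after sales_2024 (or the reverse) is not treated as a duplicate.

```shell
# Hypothetical helper (not part of datafusion-skills): append a table definition
# to state.sql unless a "-- Table: <name> (" header for it already exists.
register_table() {
  local state_file="$1" name="$2" format="$3" path="$4" statement="$5"
  touch "$state_file"
  # -F: fixed string; --: the pattern itself starts with "-".
  if grep -qF -- "-- Table: $name (" "$state_file"; then
    echo "$name is already registered in $state_file"
    return 0
  fi
  printf '%s\n%s\n' "-- Table: $name ($format from $path)" "$statement" >> "$state_file"
}
```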
Summarize:
<table_name>
This table is now available in all /datafusion-skills:query sessions. Try:
/datafusion-skills:query SELECT * FROM <table_name> LIMIT 10