Query local or cloud (S3/GCS) Parquet, CSV, JSON, Arrow, and Avro files with SQL using DataFusion in Claude Code sessions. Register persistent external tables, create and refresh materialized views, visualize and optimize execution plans, inspect schemas, and search DataFusion documentation.
Register a data file as a persistent external table in the DataFusion session. Supports Parquet, CSV, JSON, Arrow IPC, and Avro files. Explores the schema and writes to the session state file for reuse across skills.
Search Apache DataFusion documentation, user guide, and API reference. Returns relevant documentation for a question or keyword. Searches the official DataFusion repository and website.
Visualize and analyze DataFusion query execution plans. Shows logical and physical plans, identifies performance bottlenecks, and suggests optimizations. Supports EXPLAIN and EXPLAIN ANALYZE.
Install or update datafusion-cli. Supports installation via cargo install, Homebrew, or pre-built binaries. Checks the current version and offers to upgrade if outdated.
Create and manage materialized views using DataFusion. Persist SQL query results as Parquet files for fast repeated access. Track source dependencies and refresh when data changes. Powered by datafusion-cli's COPY TO.
A Claude Code plugin that adds Apache DataFusion-powered skills for data exploration, querying, and materialized views.
Add the repository as a plugin source and install:
/plugin marketplace add datafusion-contrib/datafusion-skills
/plugin install datafusion-skills@datafusion-skills
This registers the GitHub repo as a marketplace and installs the plugin. Skills will be available as /datafusion-skills:<skill-name> in all future sessions.
/plugin marketplace update datafusion-skills
/plugin update datafusion-skills@datafusion-skills
query
Run SQL queries against registered tables or ad hoc against files. Accepts raw SQL or natural-language questions. Supports Parquet, CSV, JSON, Arrow IPC, and Avro.
/datafusion-skills:query SELECT * FROM 'trades.parquet' WHERE symbol = 'AAPL' LIMIT 10
/datafusion-skills:query "what are the top 5 symbols by volume?"
/datafusion-skills:query FROM sales WHERE amount > 100
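As a rough illustration of the ad-hoc case, the sketch below shows how the first example could map onto a direct datafusion-cli call. The wrapper logic is an assumption for illustration, not the plugin's actual implementation; it only runs the query when datafusion-cli and the example file are present.

```shell
# Hypothetical sketch: the SQL comes from the example above; the wrapper is an assumption.
sql="SELECT * FROM 'trades.parquet' WHERE symbol = 'AAPL' LIMIT 10;"

if command -v datafusion-cli >/dev/null 2>&1 && [ -f trades.parquet ]; then
  # -c executes the given SQL command and exits
  datafusion-cli -c "$sql"
else
  # Nothing to query here, so just show the command that would be run
  echo "would run: datafusion-cli -c \"$sql\""
fi
```

Quoting a file path in the FROM clause lets datafusion-cli read the file without registering a table first.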
read-file
Read and explore any data file (Parquet, CSV, JSON, Arrow IPC, or Avro), locally or from S3/GCS. Auto-detects the format from the file extension.
/datafusion-skills:read-file trades.parquet what columns does it have?
/datafusion-skills:read-file s3://my-bucket/data.parquet describe the schema
/datafusion-skills:read-file metrics.csv how many rows?
create-table
Register a data file as a persistent external table. Explores the schema and persists the registration so all other skills can access the table automatically.
/datafusion-skills:create-table trades.parquet
/datafusion-skills:create-table data.csv --name sales --format csv
materialized-view
Create and manage materialized views: persist SQL query results as Parquet files for fast repeated access. Track source dependencies and refresh when data changes.
/datafusion-skills:materialized-view "create a daily summary of trades grouped by symbol"
/datafusion-skills:materialized-view refresh trades_daily
/datafusion-skills:materialized-view status
/datafusion-skills:materialized-view list
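The persist-then-register idea behind a materialized view can be sketched with DataFusion's COPY ... TO and CREATE EXTERNAL TABLE statements. This is a minimal sketch under assumptions: the view name matches the refresh example above, but the columns, the output directory, and the generated SQL are hypothetical and may differ from what the skill actually writes. The STORED AS clause on COPY assumes a reasonably recent DataFusion release.

```shell
# Hypothetical layout: a scratch directory holding the view's SQL and its Parquet output.
view_dir=$(mktemp -d)
view_sql="$view_dir/trades_daily.sql"

cat > "$view_sql" <<EOF
-- Materialize the query result as a Parquet file (column names are illustrative)
COPY (
  SELECT symbol, COUNT(*) AS trade_count
  FROM trades
  GROUP BY symbol
) TO '$view_dir/trades_daily.parquet' STORED AS PARQUET;

-- Register the result so later queries can read it by name
CREATE EXTERNAL TABLE IF NOT EXISTS trades_daily
STORED AS PARQUET
LOCATION '$view_dir/trades_daily.parquet';
EOF

echo "wrote $view_sql"
```

Under this sketch, a refresh amounts to re-running the COPY statement so the Parquet file is rewritten from the current source data.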
explain-plan
Visualize and analyze query execution plans. Identifies performance bottlenecks and suggests optimizations.
/datafusion-skills:explain-plan SELECT * FROM trades WHERE date > '2024-01-01'
/datafusion-skills:explain-plan --analyze SELECT COUNT(*) FROM large_table GROUP BY category
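For orientation, the two modes correspond to DataFusion's EXPLAIN and EXPLAIN ANALYZE statements. The sketch below only builds the statements for the second example; how the skill maps --analyze onto them internally is an assumption.

```shell
# Query taken from the example above
query="SELECT COUNT(*) FROM large_table GROUP BY category"

# EXPLAIN shows the logical and physical plans without running the query
plan_sql="EXPLAIN $query;"

# EXPLAIN ANALYZE runs the query and annotates the plan with runtime metrics
analyze_sql="EXPLAIN ANALYZE $query;"

printf '%s\n' "$plan_sql" "$analyze_sql"
```

Because EXPLAIN ANALYZE actually executes the query, it can be slow on large tables; plain EXPLAIN is the cheaper first step.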
datafusion-docs
Search Apache DataFusion documentation: user guide, SQL reference, and API docs. Returns relevant documentation for a question or keyword.
/datafusion-skills:datafusion-docs window functions
/datafusion-skills:datafusion-docs "how do I create an external table?"
/datafusion-skills:datafusion-docs APPROX_PERCENTILE_CONT
install-datafusion
Install or update datafusion-cli. Supports Homebrew, cargo install, and pre-built binaries.
/datafusion-skills:install-datafusion
/datafusion-skills:install-datafusion --update
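A manual equivalent might look like the sketch below. The function name suggest_install is hypothetical, and the order in which methods are tried is an assumption rather than the skill's documented behavior. It only prints a suggestion and installs nothing.

```shell
# Hypothetical helper: report the installed version or suggest an install command.
suggest_install() {
  if command -v datafusion-cli >/dev/null 2>&1; then
    echo "installed: $(datafusion-cli --version)"
  elif command -v brew >/dev/null 2>&1; then
    echo "try: brew install datafusion"
  elif command -v cargo >/dev/null 2>&1; then
    echo "try: cargo install datafusion-cli"
  else
    echo "no brew or cargo found; use a pre-built binary"
  fi
}

suggest_install
```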
All skills share a single state.sql file per project — a plain SQL file containing CREATE EXTERNAL TABLE statements and configuration. When state is first needed, you'll be asked where to store it:
.datafusion-skills/state.sql: colocated with the project, optionally gitignored
~/.datafusion-skills/<project>/state.sql: keeps the repo clean
Any skill restores the session via datafusion-cli --file state.sql.
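To make the format concrete, here is a sketch of what such a state file could contain. The table names and paths reuse the create-table examples; the configuration line and the CSV header option are assumptions, and the option syntax can vary between DataFusion versions. The sketch writes to a scratch directory rather than a real project.

```shell
# Hypothetical location: a scratch directory standing in for the project root.
state_dir="$(mktemp -d)/.datafusion-skills"
mkdir -p "$state_dir"

cat > "$state_dir/state.sql" <<'EOF'
-- Illustrative configuration setting
SET datafusion.execution.batch_size = 8192;

-- One statement per registered table
CREATE EXTERNAL TABLE trades
STORED AS PARQUET
LOCATION 'trades.parquet';

CREATE EXTERNAL TABLE sales
STORED AS CSV
LOCATION 'data.csv'
OPTIONS ('has_header' 'true');
EOF

# Replay the file only when the CLI and the example data files are available
if command -v datafusion-cli >/dev/null 2>&1 && [ -f trades.parquet ] && [ -f data.csv ]; then
  datafusion-cli --file "$state_dir/state.sql"
fi
```

Keeping the state as plain SQL means it can be read, diffed, and edited by hand like any other project file.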
Skills reference each other where it makes sense:
read-file suggests query for follow-up exploration and create-table for persisting data
query uses session state from create-table automatically
materialized-view creates persistent Parquet files registered via create-table
explain-plan helps optimize queries from query
Skills use datafusion-docs to troubleshoot DataFusion errors automatically
Apache DataFusion is a fast, extensible query engine built in Rust on top of Apache Arrow.
# Clone the repo
git clone https://github.com/datafusion-contrib/datafusion-skills.git
cd datafusion-skills
# Launch Claude Code with the local plugin directory
claude --plugin-dir .
Test individual skills: