Skill

data-describe

Generates data dictionary, dataset description, and semantic tags for CSV/TSV/Excel files using LLM-powered qsv_describegpt after profiling with qsv_stats.

Rust

data-engineering

ai-ml

Popularity

Parent stars

3,643

Parent forks

105

Invocation

How this skill is triggered — by the user, by Claude, or both

Slash command

/qsv-data-wrangling:data-describe

User invocable

Model invocable

Inline context

Default effort

Tool Access

This skill is limited to the following tools:

mcp__qsv__qsv_sniff
mcp__qsv__qsv_count
mcp__qsv__qsv_headers
mcp__qsv__qsv_index
mcp__qsv__qsv_stats
mcp__qsv__qsv_describegpt
mcp__qsv__qsv_list_files
mcp__qsv__qsv_get_working_dir
mcp__qsv__qsv_set_working_dir

Context Preview

The summary Claude sees in its skill listing — used to decide when to auto-load this skill

Generate AI-powered documentation for a tabular data file using `describegpt`. Produces a Data Dictionary (column labels, descriptions, types), a natural-language Description of the dataset, and semantic Tags — all via the connected LLM (no API key needed in MCP mode).

SKILL.md

62 lines · ~722 tokens

Stats

Language: Rust

Parent stars: 3,643

Parent forks: 105

Maintenance: Excellent

Last Commit: Apr 6, 2026

Data Describe

Generate AI-powered documentation for a tabular data file using describegpt. Produces a Data Dictionary (column labels, descriptions, types), a natural-language Description of the dataset, and semantic Tags — all via the connected LLM (no API key needed in MCP mode).

Cowork note: If relative paths don't resolve, call qsv_get_working_dir and qsv_set_working_dir to sync the working directory.

Steps

1. Index: Run `qsv_index` on the file for fast random access.
2. Profile: Run `qsv_stats` with `cardinality: true, stats_jsonl: true` to generate the stats cache. `describegpt` reads this cache for column metadata, so it must exist first.
3. Describe: Run `qsv_describegpt` with the requested options (recommend `all: true` for comprehensive output). At least one inference option (`dictionary`, `description`, `tags`, or `all`) is required. Output defaults to `<filestem>.describegpt.md`.
4. Present: Display the generated Data Dictionary table, Description, and Tags to the user.
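
The Index, Profile, and Describe steps can be sketched as an ordered list of tool calls. This is a minimal illustration in Python, not the skill's implementation: the tool names and the `cardinality`, `stats_jsonl`, and `all` parameters come from the steps above, while `plan_describe` and the `input_file` parameter name are hypothetical and may not match the MCP server's actual schema.

```python
from typing import Any

# Inference options named in the Describe step; at least one must be set.
INFERENCE_OPTIONS = ("dictionary", "description", "tags", "all")


def plan_describe(input_file: str, **options: Any) -> list[tuple[str, dict[str, Any]]]:
    """Return the ordered tool calls for the Index -> Profile -> Describe steps.

    `input_file` is an assumed parameter name used only for illustration.
    """
    if not any(options.get(name) for name in INFERENCE_OPTIONS):
        raise ValueError("set at least one of: " + ", ".join(INFERENCE_OPTIONS))
    return [
        # Step 1: index the file for fast random access.
        ("qsv_index", {"input_file": input_file}),
        # Step 2: build the stats cache that describegpt reads.
        ("qsv_stats", {"input_file": input_file, "cardinality": True, "stats_jsonl": True}),
        # Step 3: generate the documentation with the requested options.
        ("qsv_describegpt", {"input_file": input_file, **options}),
    ]


plan = plan_describe("sales.csv", all=True)
for tool, args in plan:
    print(tool, args)
```

Building the plan as plain data keeps the ordering constraint visible (the stats call always precedes the describe call) and rejects a request with no inference option before any tool is invoked.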

Options

Option	Effect
`--all` (recommended)	Generate Dictionary + Description + Tags in one pass
`--dictionary`	Data Dictionary only — column labels, descriptions, types
`--description`	Natural-language dataset Description only
`--tags`	Semantic Tags only
`--format`	Output format: `Markdown` (default), `JSON`, `TSV`, `TOON`
`--language`	Generate output in a non-English language (e.g. `Spanish`, `French`)
`--addl-cols-list`	Enrich the dictionary with extra columns (e.g. `"everything"`, `"moar!"`)
`--tag-vocab`	Constrain tags to a controlled vocabulary (comma-separated)
`--num-tags`	Number of tags to generate (default: 5)
`--num-examples`	Number of example values per column in the dictionary
`--enum-threshold`	Max cardinality to treat a column as an enum in the dictionary
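
To show how the options in the table combine, here is a small Python sketch that turns an options mapping into a `describegpt` argument list. The flag names are taken from the table; the helper `describegpt_args`, the underscore-to-hyphen convention, and the placement of the input file as the last argument are assumptions for illustration, so check the command's own help output before relying on the exact shape.

```python
def describegpt_args(input_file: str, options: dict[str, object]) -> list[str]:
    """Build an argument list for describegpt from an options mapping.

    Hypothetical helper: boolean True becomes a bare flag, False/None is
    skipped, and any other value is passed as the flag's argument.
    """
    args = ["qsv", "describegpt"]
    for name, value in options.items():
        flag = "--" + name.replace("_", "-")
        if value is True:
            args.append(flag)
        elif value is not False and value is not None:
            args.extend([flag, str(value)])
    args.append(input_file)
    return args


argv = describegpt_args(
    "sales.csv",
    {"all": True, "format": "JSON", "num_tags": 8, "tag_vocab": "finance,retail,sales"},
)
print(" ".join(argv))
```

Here `--all` requests all three outputs, `--format JSON` makes the result machine-readable, and `--num-tags` with `--tag-vocab` raises the tag count above the default of 5 while restricting tags to the given vocabulary.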

Notes

- No API key is needed in MCP mode: the connected LLM is used automatically via MCP sampling.
- The stats cache must exist first for best results (step 2 creates it).
- Output defaults to `<filestem>.describegpt.md`.
- For Excel/JSONL files, the MCP server auto-converts to CSV first.
- Use `--format JSON` when you need machine-readable output for downstream processing.
- Use `--language` to generate documentation in the user's preferred language.
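
The default output name can be derived from the input path as in the sketch below. Only the `<filestem>.describegpt.md` pattern comes from the notes above; the helper `default_output_path` is hypothetical, and writing the file next to the input is an assumption.

```python
from pathlib import Path


def default_output_path(input_file: str) -> Path:
    """Return <filestem>.describegpt.md for the given input file.

    Assumption: the output sits in the same directory as the input.
    """
    source = Path(input_file)
    # Path.stem drops only the final suffix, e.g. "sales.csv" -> "sales".
    return source.with_name(source.stem + ".describegpt.md")


print(default_output_path("data/sales.csv"))
```

A helper like this is handy in the Present step, where the generated Markdown has to be located before it can be shown to the user.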
