From qsv-data-wrangling
Generates data dictionary, dataset description, and semantic tags for CSV/TSV/Excel files using LLM-powered qsv_describegpt after profiling with qsv_stats.
Slash command: `/qsv-data-wrangling:data-describe`
Generate AI-powered documentation for a tabular data file using `describegpt`. Produces a Data Dictionary (column labels, descriptions, types), a natural-language Description of the dataset, and semantic Tags — all via the connected LLM (no API key needed in MCP mode).
Cowork note: If relative paths don't resolve, call `qsv_get_working_dir` and `qsv_set_working_dir` to sync the working directory.
1. Index: Run `qsv_index` on the file for fast random access.
2. Profile: Run `qsv_stats` with `cardinality: true, stats_jsonl: true` to generate the stats cache. `describegpt` reads this cache for column metadata, so it must exist first.
3. Describe: Run `qsv_describegpt` with the requested options (`all: true` is recommended for comprehensive output). At least one inference option (`dictionary`, `description`, `tags`, or `all`) is required. Output defaults to `<filestem>.describegpt.md`.
4. Present: Display the generated Data Dictionary table, Description, and Tags to the user.
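The ordering above matters because `describegpt` depends on the stats cache. As a rough illustration, the sketch below lays out the first three steps as an ordered plan of tool calls and derives the default output name. It only builds data and does not invoke any tool. The `input_file` argument key and both helper names are hypothetical, chosen for illustration; the tool names and the `cardinality`, `stats_jsonl`, and `all` options come from the steps above.

```python
from pathlib import Path


def plan_describe(input_file: str) -> list[tuple[str, dict]]:
    """Return the ordered (tool, arguments) pairs for the workflow.

    Nothing is executed here; "input_file" is a hypothetical argument key.
    """
    return [
        # 1. Index first, for fast random access.
        ("qsv_index", {"input_file": input_file}),
        # 2. Profile next: this writes the stats cache describegpt reads.
        ("qsv_stats", {"input_file": input_file,
                       "cardinality": True,
                       "stats_jsonl": True}),
        # 3. Describe last, with all inference options enabled.
        ("qsv_describegpt", {"input_file": input_file, "all": True}),
    ]


def default_output_name(input_file: str) -> str:
    """Default output file name: <filestem>.describegpt.md."""
    return f"{Path(input_file).stem}.describegpt.md"


if __name__ == "__main__":
    for tool, args in plan_describe("sales.csv"):
        print(tool, args)
    print(default_output_name("sales.csv"))
```

Keeping the plan as plain data makes the dependency explicit: the `qsv_stats` entry always precedes the `qsv_describegpt` entry.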
| Option | Effect |
|---|---|
| `--all` (recommended) | Generate Dictionary + Description + Tags in one pass |
| `--dictionary` | Data Dictionary only: column labels, descriptions, types |
| `--description` | Natural-language dataset Description only |
| `--tags` | Semantic Tags only |
| `--format` | Output format: Markdown (default), JSON, TSV, TOON |
| `--language` | Generate output in a non-English language (e.g. Spanish, French) |
| `--addl-cols-list` | Enrich the dictionary with extra columns (e.g. "everything", "moar!") |
| `--tag-vocab` | Constrain tags to a controlled vocabulary (comma-separated) |
| `--num-tags` | Number of tags to generate (default: 5) |
| `--num-examples` | Number of example values per column in the dictionary |
| `--enum-threshold` | Max cardinality to treat a column as an enum in the dictionary |
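The rule that at least one inference option is required can be checked before any call is made. The sketch below is a hypothetical helper, not part of qsv: it turns keyword options into a flag list using the option names from the table, and rejects unknown options or a request with no inference option. Passing each value as a separate argument after its flag is an assumption about the command-line form.

```python
# Option names taken from the table above (underscores stand in for hyphens).
INFERENCE_OPTIONS = ("all", "dictionary", "description", "tags")
KNOWN_OPTIONS = set(INFERENCE_OPTIONS) | {
    "format", "language", "addl_cols_list", "tag_vocab",
    "num_tags", "num_examples", "enum_threshold",
}


def build_describegpt_flags(**options) -> list[str]:
    """Build a describegpt flag list from keyword options (illustrative only)."""
    unknown = set(options) - KNOWN_OPTIONS
    if unknown:
        raise ValueError(f"unknown option(s): {sorted(unknown)}")
    if not any(options.get(name) for name in INFERENCE_OPTIONS):
        raise ValueError(
            "at least one of all, dictionary, description, tags is required"
        )
    flags: list[str] = []
    for name, value in options.items():
        if value is None or value is False:
            continue  # option not requested
        flags.append("--" + name.replace("_", "-"))
        if value is not True:
            # Assumption: the value follows its flag as a separate argument.
            flags.append(str(value))
    return flags


if __name__ == "__main__":
    print(build_describegpt_flags(dictionary=True, tags=True, num_tags=8))
```

Validating up front gives a clear error before the LLM is contacted, instead of a failure partway through the run.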
- Default output file: `<filestem>.describegpt.md`
- Use `--format JSON` when you need machine-readable output for downstream processing
- Use `--language` to generate documentation in the user's preferred language

Install: `npx claudepluginhub dathere/qsv --plugin qsv-data-wrangling`