From qsv-data-wrangling
Generates publication-quality charts from CSV/TSV/Excel files using Python and qsv for profiling, stats, frequencies, and queries.
Slash command: /qsv-data-wrangling:data-viz
Create publication-quality data visualizations from tabular data files. Uses qsv to profile and prepare data, then generates Python charts with best practices for clarity, accuracy, and design.
Cowork note: If relative paths don't resolve, call qsv_get_working_dir and qsv_set_working_dir to sync the working directory.
Determine:
a. Index and detect: Run qsv_index, then qsv_sniff to detect format and encoding.
b. Understand structure: Run qsv_headers and qsv_count to get column names and row count.
c. Profile columns: Run qsv_stats with cardinality: true, stats_jsonl: true to understand types, ranges, and distributions. Read .stats.csv to inform chart design:
- type → choose the appropriate axis type (numeric, categorical, date)
- min/max → set axis ranges
- cardinality → determine whether the column is categorical (low) or continuous (high)
- nullcount → note missing data that could affect the chart
d. Check distributions: Run qsv_frequency with limit: 20 on columns you plan to plot. This reveals the actual values and whether grouping or filtering is needed.
e. Run moarstats for visualization hints: Run qsv_moarstats with advanced: true. Read the enriched .stats.csv for chart design decisions:
| Stats Column | Visualization Hint |
|---|---|
| skewness / pearson_skewness | If \|skewness\| > 1, use log scale or split view; histogram will be lopsided on linear scale |
| bimodality_coefficient | If >= 0.555, data is bimodal: overlay two distributions or use separate panels per group |
| kurtosis | If > 3, heavy tails: add outlier annotations or use box plot alongside histogram |
| outliers_percentage | If > 5%, annotate outliers in scatter plots; if > 10%, consider separate outlier panel |
| q1, q3, iqr | Set box plot boundaries; whiskers at inner fences (q1 - 1.5*iqr, q3 + 1.5*iqr) |
| cv | If CV > 100%, data is highly variable relative to mean: use normalized/percentage scale |
| sparsity | If > 0.5, too many nulls to visualize meaningfully: warn user or show completeness bar |
| mode, mode_count | If mode dominates (> 50% of rows), bar chart of top-N values is more informative than histogram |
f. Preview data: Run qsv_slice with len: 5 to see actual values and formats.
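The thresholds in the moarstats table above can be turned into a small lookup. The following is only a sketch: `viz_hints` and `inner_fences` are hypothetical helper names, the input is assumed to be one row of the enriched .stats.csv already read into a dict keyed by the column names in the table, and `outliers_percentage` and `cv` are assumed to be expressed in percent units.

```python
def viz_hints(stats):
    """Map one stats row (dict of column name -> value) to chart hints.

    Hypothetical helper: thresholds are copied from the table above;
    missing or non-numeric values are skipped.
    """
    def num(key):
        try:
            return float(stats.get(key))
        except (TypeError, ValueError):
            return None  # column absent or empty for this row

    hints = []
    skew = num("skewness")
    if skew is not None and abs(skew) > 1:
        hints.append("log scale or split view")
    bimodality = num("bimodality_coefficient")
    if bimodality is not None and bimodality >= 0.555:
        hints.append("overlay two distributions or use panels per group")
    kurtosis = num("kurtosis")
    if kurtosis is not None and kurtosis > 3:
        hints.append("annotate outliers or pair histogram with box plot")
    outliers = num("outliers_percentage")  # assumed to be in percent units
    if outliers is not None and outliers > 5:
        hints.append("annotate outliers")
        if outliers > 10:
            hints.append("separate outlier panel")
    cv = num("cv")  # assumed to be in percent units
    if cv is not None and cv > 100:
        hints.append("normalized or percentage scale")
    sparsity = num("sparsity")
    if sparsity is not None and sparsity > 0.5:
        hints.append("warn about nulls or show completeness bar")
    return hints


def inner_fences(q1, q3):
    """Return the whisker limits (q1 - 1.5*iqr, q3 + 1.5*iqr)."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr
```

Calling `viz_hints` once per column row gives a short checklist to review before choosing scales and annotations.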
Use qsv to prepare visualization-ready data:
- qsv_search or qsv_sqlp to subset rows
- qsv_sqlp for GROUP BY, window functions, computed columns
- qsv_select to keep only what's needed
- qsv_sqlp with ORDER BY for ordered categories or time series
Export the prepared data to a CSV file for Python to read.
If the user didn't specify, recommend based on data and question:
| Data Relationship | Recommended Chart | How qsv Helps Choose |
|---|---|---|
| Trend over time | Line chart | stats shows Date/DateTime type |
| Comparison across categories | Bar chart (horizontal if many) | frequency shows category counts; cardinality < 20 |
| Part-to-whole composition | Stacked bar or area chart | frequency shows proportions; avoid pie unless < 6 categories |
| Distribution of values | Histogram or box plot | stats shows min/max/mean/stddev; moarstats shows kurtosis |
| Correlation between two variables | Scatter plot | stats shows two numeric columns |
| Ranking | Horizontal bar chart | frequency with --limit for top-N |
| Matrix of relationships | Heatmap | Two categorical columns with low cardinality |
| Two-variable comparison over time | Dual-axis line or grouped bar | Two numeric columns + one Date column |
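Part of the chart-selection table above can be expressed as a simple rule function. This is a rough sketch, not the skill's actual logic: `recommend_chart` is a hypothetical name, the type strings (Integer, Float, String, Date, DateTime) are assumed to match what the stats output reports, and the fallback for high-cardinality categories is an interpretation of the "Ranking" row.

```python
NUMERIC = {"Integer", "Float"}   # assumed numeric type labels
TEMPORAL = {"Date", "DateTime"}  # type labels named in the table


def recommend_chart(x_type, y_type=None, x_cardinality=None):
    """Suggest a chart from column types and the x column's cardinality."""
    if x_type in TEMPORAL and y_type in NUMERIC:
        return "line chart"
    if x_type in NUMERIC and y_type in NUMERIC:
        return "scatter plot"
    if x_type in NUMERIC and y_type is None:
        return "histogram or box plot"
    if x_type == "String" and y_type == "String":
        return "heatmap"
    if x_type == "String":
        # The table uses cardinality < 20 as the cutoff for a plain bar chart.
        if x_cardinality is not None and x_cardinality < 20:
            return "bar chart"
        return "horizontal bar chart of top-N values"
    return "no recommendation"
```

A rule function like this only covers the cases that types and cardinality can decide; part-to-whole and dual-axis choices still depend on the question being asked.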
Write Python code using matplotlib + seaborn (default) or plotly (if interactive requested):
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
# Load the prepared CSV
df = pd.read_csv('prepared_data.csv')
# Set professional style
plt.style.use('seaborn-v0_8-whitegrid')
sns.set_palette("husl")
# Create figure with appropriate size
fig, ax = plt.subplots(figsize=(10, 6))
# [chart-specific code]
# Always include:
ax.set_title('Clear, Descriptive Title', fontsize=14, fontweight='bold')
ax.set_xlabel('X-Axis Label', fontsize=11)
ax.set_ylabel('Y-Axis Label', fontsize=11)
# Format numbers appropriately
# - Percentages: '45.2%' not '0.452'
# - Currency: '$1.2M' not '1200000'
# - Large numbers: '2.3K' or '1.5M' not '2300' or '1500000'
# Remove chart junk
ax.spines['top'].set_visible(False)
ax.spines['right'].set_visible(False)
plt.tight_layout()
plt.savefig('chart_name.png', dpi=150, bbox_inches='tight')
plt.show()
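The number-formatting comments in the template above leave the formatter itself unspecified. One possible sketch, using hypothetical helper names `human_format` and `percent_format` and one decimal place as an assumed default:

```python
def human_format(value, prefix=""):
    """Abbreviate large numbers, e.g. 2300 -> '2.3K', 1500000 -> '1.5M'.

    `prefix` is an optional currency symbol such as '$'.
    """
    sign = "-" if value < 0 else ""
    magnitude = abs(value)
    for threshold, suffix in ((1e9, "B"), (1e6, "M"), (1e3, "K")):
        if magnitude >= threshold:
            return f"{sign}{prefix}{magnitude / threshold:.1f}{suffix}"
    # Below 1,000: print the number without an abbreviation.
    return f"{sign}{prefix}{magnitude:g}"


def percent_format(fraction):
    """Render a fraction as a percentage, e.g. 0.452 -> '45.2%'."""
    return f"{fraction * 100:.1f}%"
```

To apply it to an axis, the function can be wrapped in matplotlib's tick formatter, for example `ax.yaxis.set_major_formatter(matplotlib.ticker.FuncFormatter(lambda v, pos: human_format(v)))`.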
Color:
Typography:
Layout:
Accuracy:
qsv_sqlp: SELECT date_col, SUM(value) as total
FROM data GROUP BY date_col ORDER BY date_col
qsv_frequency: --select category_col --limit 10
Or for aggregated values:
qsv_sqlp: SELECT category, SUM(amount) as total
FROM data GROUP BY category ORDER BY total DESC LIMIT 10
qsv_stats: Check min, max, mean, stddev, cardinality
qsv_moarstats: --advanced for kurtosis, bimodality
qsv_sqlp: SELECT FLOOR(value/10)*10 as bin, COUNT(*) as cnt
FROM data GROUP BY bin ORDER BY bin
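The binning expression in the query above, FLOOR(value/10)*10, assigns each value to the lower edge of a fixed-width bin. The same arithmetic in Python, as a small illustration only (`bin_counts` is a hypothetical name and the width of 10 is taken from the query):

```python
import math
from collections import Counter


def bin_counts(values, width=10):
    """Count values per fixed-width bin, keyed by each bin's lower edge."""
    counts = Counter(math.floor(v / width) * width for v in values)
    return sorted(counts.items())  # ordered by bin, like ORDER BY bin
```

For the values 3, 7, 12, 19, 25 this yields bins 0, 10, and 20 with counts 2, 2, and 1. For large files the aggregation is still better done in qsv_sqlp, with only the binned result passed to Python.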
qsv_select: Pick the two numeric columns
qsv_stats: Verify both are numeric types with reasonable ranges
qsv_sqlp: SELECT group_col, AVG(metric) as avg_metric, COUNT(*) as n
FROM data GROUP BY group_col ORDER BY avg_metric DESC
- stats and frequency reveal the right chart type and catch data issues before plotting
- Use qsv_sqlp to aggregate before passing to Python; don't load millions of rows into pandas
- Run /data-clean before visualizing
npx claudepluginhub dathere/qsv --plugin qsv-data-wrangling