By rappdw
Synthetic data toolkit — schema-driven generation, Excel extraction, dataset extension, anonymization, and MCP serving. Domain-agnostic engine with 10+ starter templates covering HR, e-commerce, SaaS metrics, healthcare, finance, security, logs, IoT, CRM, and surveys. Uses YAML schemas with Faker, distributions (normal/lognormal/zipf/poisson), FK integrity, behavioral profiles, temporal event generation, and multi-format writers (xlsx/csv/json/sql/parquet).
Replace real PII in a dataset with realistic synthetic equivalents while preserving row counts, column types, and statistical distributions. Detects names, emails, phone numbers, SSNs, addresses, credit card numbers, and other user-identifying columns using column-name heuristics and value patterns. Use this skill when the user wants to "anonymize this dataset", "scrub PII", "make this data safe to share", "de-identify real data", "create a synthetic copy", or needs a sharable version of production data without exposing individuals.
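A minimal sketch of how detection by column name plus value patterns could work. It assumes a column is flagged when its name is on a hint list, or when at least 80% of its non-empty values match a regex. The hint lists, patterns, threshold, and `detect_pii` name are all hypothetical and are not the skill's actual rules.

```python
import re

# Hypothetical whole-column-name hints; the skill's real heuristics may differ.
NAME_HINTS = {
    "person_name": {"name", "first_name", "last_name", "full_name"},
    "email": {"email", "email_address"},
    "phone": {"phone", "phone_number", "mobile"},
    "ssn": {"ssn", "social_security_number"},
}

# Hypothetical value patterns. Order matters: SSNs and card numbers would also
# match the loose phone pattern, so they are checked before it.
VALUE_PATTERNS = {
    "email": re.compile(r"^[^@\s]+@[^@\s]+\.[A-Za-z]{2,}$"),
    "ssn": re.compile(r"^\d{3}-\d{2}-\d{4}$"),
    "credit_card": re.compile(r"^(?:\d[ -]?){13,19}$"),
    "phone": re.compile(r"^\+?[\d\s().-]{7,}$"),
}


def detect_pii(column_name, values, threshold=0.8):
    """Return the PII kind detected for a column, or None."""
    # 1. Column-name heuristic.
    lowered = column_name.strip().lower()
    for kind, hints in NAME_HINTS.items():
        if lowered in hints:
            return kind
    # 2. Value patterns: flag the column if most non-empty values match.
    sample = [str(v).strip() for v in values if v not in (None, "")]
    if not sample:
        return None
    for kind, pattern in VALUE_PATTERNS.items():
        matches = sum(1 for v in sample if pattern.match(v))
        if matches / len(sample) >= threshold:
            return kind
    return None
```

Once a column is flagged, replacement would typically map each distinct original value to one Faker value (for example `Faker().name()`) and reuse that mapping. Repeated values, and joins on them, then stay consistent.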
Compute derived, aggregated, or transformed tables from existing datasets. Use this skill when the user needs to "compute monthly scores", "aggregate by month", "create a summary table", "derive risk scores", "compute percentile ranks", "roll up events", "create benchmarks from raw data", "add a computed column", or bridge the gap between raw generated tables and downstream analytics. Works on xlsx, csv, or json input. Claude writes the computation logic; the script handles data I/O.
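As an illustration of the computation logic Claude might write for this skill, here is a stdlib-only sketch. It rolls events up per user and month, then attaches a percentile rank to each monthly score. The field names (`user_id`, `ts`, `severity`) and function names are hypothetical. Percentile rank is defined here as the percentage of rows with a strictly lower score, which is one of several common definitions.

```python
from collections import defaultdict
from datetime import datetime


def monthly_rollup(events):
    """Sum a hypothetical `severity` field per (user_id, YYYY-MM) bucket."""
    totals = defaultdict(float)
    for event in events:
        month = datetime.fromisoformat(event["ts"]).strftime("%Y-%m")
        totals[(event["user_id"], month)] += event["severity"]
    return [
        {"user_id": user, "month": month, "score": score}
        for (user, month), score in sorted(totals.items())
    ]


def add_percentile_rank(rows, key="score"):
    """Attach pct_rank: the percentage of rows whose value is strictly lower."""
    values = [row[key] for row in rows]
    n = len(values)
    for row in rows:
        below = sum(1 for v in values if v < row[key])
        row["pct_rank"] = 100.0 * below / n if n else 0.0
    return rows
```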
Extend an existing synthetic dataset by adding more rows or new columns while preserving FK integrity, ID continuity, and column distributions. Use this skill when the user wants to "add more rows", "append data", "extend this dataset", "add a new column", "grow my dataset", or needs a larger version of an existing synthetic dataset without regenerating from scratch.
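ID continuity, FK integrity, and distribution preservation when appending rows can be sketched as three small helpers. This assumes IDs follow a prefix plus zero-padded counter convention (for example `U0001`), that new child rows draw foreign keys only from parent IDs that already exist, and that categorical columns are resampled with their observed frequencies. All function names are hypothetical.

```python
import random
import re
from collections import Counter


def next_ids(existing_ids, prefix, width, count):
    """Continue a prefix + zero-padded counter sequence after the largest existing ID."""
    pattern = re.compile(rf"^{re.escape(prefix)}(\d+)$")
    numbers = []
    for existing in existing_ids:
        match = pattern.match(existing)
        if match:
            numbers.append(int(match.group(1)))
    start = max(numbers, default=0) + 1
    return [f"{prefix}{n:0{width}d}" for n in range(start, start + count)]


def sample_foreign_keys(parent_ids, count, rng):
    """Draw FK values only from parent IDs that already exist, so no orphans appear."""
    return [rng.choice(parent_ids) for _ in range(count)]


def sample_like(observed, count, rng):
    """Sample new categorical values using the observed frequencies as weights."""
    counts = Counter(observed)
    values = list(counts)
    return rng.choices(values, weights=[counts[v] for v in values], k=count)
```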
Extract tabular data from Excel workbooks (.xlsx) to JSON files, one per sheet. Auto-detects whether a sheet has a title-banner row above the headers (synthdata-generate convention) or starts with headers directly. Use this skill when the user wants to convert an Excel file to JSON, extract spreadsheet data, parse an xlsx file, prepare data for downstream analysis tools that don't read Excel natively, or set up a dataset for the other synthdata skills. Also trigger on "extract the data", "parse this spreadsheet", "convert to JSON", or "read this xlsx file".
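One plausible way to implement the title-banner check, sketched on plain lists of row values. The banner rule is an assumption, not the skill's documented logic: the first row has exactly one filled cell and the second row has several. The function names and the `column_N` fallback for blank headers are also hypothetical.

```python
import json


def _non_empty(row):
    return [cell for cell in row if cell not in (None, "")]


def split_header(rows):
    """Return (headers, data_rows), skipping a one-cell title banner if present."""
    if not rows:
        return [], []
    start = 0
    # Assumed banner rule: exactly one filled cell, followed by a row with several.
    if len(rows) > 1 and len(_non_empty(rows[0])) == 1 and len(_non_empty(rows[1])) > 1:
        start = 1
    headers = [
        str(h) if h not in (None, "") else f"column_{i + 1}"
        for i, h in enumerate(rows[start])
    ]
    return headers, list(rows[start + 1:])


def sheet_to_json(rows):
    """Serialize one sheet as a JSON array of objects keyed by header, skipping blank rows."""
    headers, data = split_header(rows)
    records = [dict(zip(headers, row)) for row in data if _non_empty(row)]
    return json.dumps(records, default=str, indent=2)
```

With openpyxl, `rows` for each worksheet could come from `list(ws.iter_rows(values_only=True))`, writing one JSON file per sheet.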
Generate synthetic tabular datasets from YAML schemas. Use this skill when the user wants to create sample data, mock data, test data, synthetic datasets, or demo data for any domain — HR directories, e-commerce orders, SaaS metrics, healthcare records, financial transactions, security events, application logs, IoT sensor readings, CRM pipelines, survey responses, or custom schemas. Ships with 10+ domain templates and supports custom YAML schemas with Faker-backed fields, statistical distributions (normal/lognormal/zipf/poisson), foreign-key integrity, behavioral profiles, and temporal event generation. Also trigger when user says "generate synthetic data", "create fake data", "mock dataset", "test data", or names a specific domain like "e-commerce data" or "HR data".
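The `foreign_key` (zipfian) and `rows_per_parent` (Poisson) settings in the schema example later on this page hint at how child tables are shaped. Below is a stdlib-only sketch of both samplers, under three assumptions. Zipfian is taken to mean the parent at rank k is picked with weight 1/k^alpha. Poisson counts are drawn with Knuth's multiplication method. The engine may well use numpy's generators instead, and how it combines the two settings is not described here. The function names are hypothetical.

```python
import math
import random


def zipf_parent_choice(parent_ids, n_children, alpha, rng):
    """Pick a parent for each child row; the parent at rank k gets weight 1 / k**alpha."""
    weights = [1.0 / (rank ** alpha) for rank in range(1, len(parent_ids) + 1)]
    return rng.choices(parent_ids, weights=weights, k=n_children)


def poisson_count(lam, rng):
    """Draw a Poisson(lam) count with Knuth's multiplication method (fine for small lam)."""
    threshold = math.exp(-lam)
    k, product = 0, rng.random()
    while product > threshold:
        k += 1
        product *= rng.random()
    return k
```

For `rows_per_parent`, one count would be drawn per parent, for example `[poisson_count(5, rng) for _ in parent_ids]`.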
Own this plugin?
Verify ownership to unlock analytics, metadata editing, and a verified badge. GitHub access is read-only (username + org membership).
Sign in to claimOwn this plugin?
Verify ownership to unlock analytics, metadata editing, and a verified badge. GitHub access is read-only (username + org membership).
Sign in to claimBased on adoption, maintenance, documentation, and repository signals. Not a security audit or endorsement.
A general-purpose Claude Code plugin for synthetic data generation across any tabular domain.
Synthdata turns a YAML schema (or one of 12 built-in templates) into realistic synthetic datasets — with Faker-backed fields, statistical distributions, foreign-key integrity, behavioral profiles, and temporal event generation. Outputs xlsx, csv, json, sql, or parquet.
| Skill | What it does |
|---|---|
| synthdata-generate | Pick a template (HR, e-commerce, SaaS, healthcare, finance, security, IoT, CRM, logs, surveys, or blank) or design a custom schema through an interview, then generate a synthetic dataset |
| synthdata-extract | Extract tabular data from Excel workbooks to JSON (auto-detects title rows and headers) |
| synthdata-extend | Add rows or new columns to an existing dataset while preserving FK integrity and profile distributions |
| synthdata-anonymize | Transform a real dataset into a synthetic equivalent — detects PII, replaces with Faker values, preserves shape and distributions |
| synthdata-compute | Derive aggregated, scored, or transformed tables from existing data — monthly rollups, composite scores, percentile ranks, segment summaries |
| synthdata-serve | Spin up a read-only MCP server from a dataset — auto-generates tools for querying, filtering, sampling, and statistics |
| synthdata-prompt-builder | Plan multi-step generation workflows — identify raw vs derived tables, match to templates, output a sequenced set of prompts |
| synthdata-tutorial | Guided interactive walkthrough of the synthdata skills |
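Sequencing generation steps so that parent tables come before children, and raw tables before the tables derived from them, is a topological sort. A minimal sketch with Python's standard-library `graphlib` follows. It assumes dependencies are supplied as a mapping from each table to the tables it reads from. The `generation_order` name and the table names are hypothetical, not the prompt builder's actual interface.

```python
from graphlib import TopologicalSorter


def generation_order(dependencies):
    """Order tables so every table comes after the tables it depends on.

    `dependencies` maps a table name to the set of tables it reads from
    (FK parents for raw tables, source tables for derived ones).
    Raises graphlib.CycleError if the dependencies are circular.
    """
    return list(TopologicalSorter(dependencies).static_order())
```

For example, `{"users": set(), "events": {"users"}, "monthly_scores": {"events"}}` orders `users` before `events` and `events` before `monthly_scores`.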
Install from the plugin marketplace:

```
/plugin marketplace add rappdw/synthdata
/plugin install synthdata@synthdata-marketplace
```
Or reference it from another marketplace's `marketplace.json`:

```json
{
  "name": "synthdata",
  "source": {
    "source": "github",
    "repo": "rappdw/synthdata"
  }
}
```
Or load it from a local clone:

```bash
claude --plugin-dir /path/to/synthdata
```
Or copy the skills into `~/.claude/skills/`:

```bash
cp -r skills/* ~/.claude/skills/
# or use the installer:
./install.sh
```
To package it for Cowork:

```bash
./package.sh  # produces dist/synthdata-v0.3.0.plugin
# then in Cowork: Customize > Plugins > Upload custom plugin
```
Python dependencies:

```bash
pip install openpyxl faker numpy pandas pyyaml mcp --break-system-packages
```
Example prompts:

```
> Generate me a synthetic HR directory with 500 employees
> Create an e-commerce orders dataset
> Build a custom dataset for my app — I'll describe the tables
> Extract this spreadsheet to JSON
> Anonymize this customer export
> Compute monthly risk scores from my event data
> Help me plan what data I need for a fraud detection demo
> Serve this dataset as an MCP server so Claude can query it
```
12 domain starters ship with synthdata-generate. Pick one to get going fast, or start from blank-slate for a custom schema.
| Template | Entities |
|---|---|
| hr-directory | employees, departments |
| ecommerce-orders | customers, products, orders, order_items |
| saas-metrics | accounts, users, events, subscriptions |
| healthcare-patients | patients, providers, encounters, claims |
| financial-transactions | accounts, customers, transactions |
| security-events | users, devices, alerts, incidents |
| log-events | services, requests, errors |
| iot-sensors | devices, readings, events |
| crm-pipeline | contacts, companies, deals, activities |
| survey-responses | respondents, questions, responses |
| healthcare-hrm-security | users, threat events, phishing sims, training, DLP, abuse mailbox |
| blank-slate | minimal starter for custom schemas |
A custom schema looks like this:

```yaml
name: my-dataset
tables:
  - name: users
    rows: { quick: 50, medium: 1000, thorough: 5000 }
    columns:
      - { name: user_id, type: id, prefix: "U", width: 4 }
      - { name: name, type: faker, method: name }
      - { name: department, type: choice, values: [Sales, Eng, Ops], weights: [0.4, 0.4, 0.2] }
      - { name: salary, type: float, distribution: lognormal, mean: 75000, sigma: 0.4, min: 30000 }
    profiles:
      - { name: high_risk, weight: 0.05, overrides: { risk_multiplier: 3.0 } }
  - name: events
    foreign_key: { column: user_id, references: users.user_id, distribution: zipfian, alpha: 1.5 }
    rows_per_parent: { distribution: poisson, lam: 5 }
    columns:
      - { name: event_type, type: choice, values: [login, click, error] }
      - { name: ts, type: timestamp, start: "2025-01-01", end: "2025-12-31" }
writers: [xlsx, json]
```
See `skills/synthdata-generate/references/schema-spec.md` for the complete spec.
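To make the `users` column types concrete, here is a rough stdlib sketch of how `id`, weighted `choice`, and `lognormal` columns could be generated. Three points are assumptions, and `schema-spec.md` is authoritative. First, `width` zero-pads the counter. Second, the lognormal `mean` is the arithmetic mean of the distribution, so the underlying normal has mu = ln(mean) - sigma^2/2. Third, `min` clamps low values rather than resampling them. The function names are hypothetical.

```python
import math
import random


def id_column(prefix, width, n):
    """Sequential IDs such as U0001, U0002, ... (zero-padding to `width` is assumed)."""
    return [f"{prefix}{i:0{width}d}" for i in range(1, n + 1)]


def choice_column(values, weights, n, rng):
    """Categorical column drawn with the given relative weights."""
    return rng.choices(values, weights=weights, k=n)


def lognormal_column(mean, sigma, minimum, n, rng):
    """Lognormal values with arithmetic mean `mean`, clamped below at `minimum`.

    Clamping nudges the realized mean slightly above `mean`.
    """
    mu = math.log(mean) - sigma ** 2 / 2  # because E[X] = exp(mu + sigma^2 / 2)
    return [max(minimum, rng.lognormvariate(mu, sigma)) for _ in range(n)]
```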
Any generated (or existing) dataset can be exposed as a read-only MCP server that Claude can query directly.
```bash
python3 skills/synthdata-serve/scripts/serve.py --inspect --input ./hr.xlsx
```
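The kinds of tools the server exposes (filtering, sampling, statistics) can be pictured as plain read-only functions over a table's rows. This is a hypothetical sketch of that logic only. It does not reproduce `serve.py`, and it omits the wiring that registers these functions as MCP tools. The function names are illustrative.

```python
import random
import statistics


def filter_rows(rows, filters):
    """Return rows whose columns equal every value in the `filters` dict."""
    return [row for row in rows if all(row.get(k) == v for k, v in filters.items())]


def sample_rows(rows, n, seed=None):
    """Return up to n rows chosen without replacement; the input is not modified."""
    return random.Random(seed).sample(rows, min(n, len(rows)))


def column_stats(rows, column):
    """Basic summary statistics for a numeric column, ignoring missing values."""
    values = [row[column] for row in rows if row.get(column) is not None]
    return {
        "count": len(values),
        "mean": statistics.fmean(values) if values else None,
        "min": min(values, default=None),
        "max": max(values, default=None),
    }
```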
npx claudepluginhub rappdw/synthdataPersonal Knowledge Assistance — four skills that turn any folder into a local, AI-powered knowledge system replacing Obsidian/Notion/Tana
Your AI team and personal knowledge base — Chief of Staff, hiring pipeline, and SQLite-backed search for any project
Thinking tools for AI — structured exploration, strategic debate, council deliberation, CISO review, discovery scaffolding, codebase mapping, specification generation, and meeting notes
Generate realistic test data including users, products, orders, and custom schemas for comprehensive testing
DevsForge mock data generator with Faker.js integration, realistic test data, custom generators, and fixture creation
Synthetic data generation — composable blocks and YAML-defined flows for building LLM training datasets
Write SQL, explore datasets, and generate insights faster. Build visualizations and dashboards, and turn raw data into clear stories for stakeholders.
Agent skills for building on, analyzing, and managing Microsoft Dataverse — with Dataverse MCP, PAC CLI, and Python SDK.
The most comprehensive SAP Datasphere plugin for Claude. 18 specialized skills covering exploration, data modeling, integration, BW Bridge migration, security architecture, CLI automation, business content activation, catalog governance, performance optimization, and troubleshooting — all through natural language. Powered by 45 MCP tools with enterprise-grade security.