synthdata

Name: synthdata
Author: rappdw

Synthetic data toolkit — schema-driven generation, Excel extraction, dataset extension, anonymization, and MCP serving. Domain-agnostic engine with 10+ starter templates covering HR, e-commerce, SaaS metrics, healthcare, finance, security, logs, IoT, CRM, and surveys. Uses YAML schemas with Faker, distributions (normal/lognormal/zipf/poisson), FK integrity, behavioral profiles, temporal event generation, and multi-format writers (xlsx/csv/json/sql/parquet).

npx claudepluginhub rappdw/synthdata

What's Inside

Skills (8)

synthdata-anonymize

/synthdata-anonymize

Replace real PII in a dataset with realistic synthetic equivalents while preserving row counts, column types, and statistical distributions. Detects names, emails, phones, SSNs, addresses, credit cards, and user-identifying columns via column-name heuristics and value patterns. Use this skill when the user wants to "anonymize this dataset", "scrub PII", "make this data safe to share", "de-identify real data", "create a synthetic copy", or needs a shareable version of production data without exposing individuals.
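A minimal sketch of the detection step, assuming a column is flagged either because its normalized name exactly matches a known PII column name or because most of its non-empty values match a PII pattern. The hint table, the regexes, the 0.8 threshold, and `detect_pii` are hypothetical illustrations, not the skill's actual rules; real detectors need broader, locale-aware patterns.

```python
import re
from typing import List, Optional

# Hypothetical name hints: exact matches on the normalized column name.
NAME_HINTS = {
    "email": {"email", "email_address"},
    "phone": {"phone", "phone_number", "mobile"},
    "ssn": {"ssn", "social_security_number"},
    "name": {"name", "full_name", "first_name", "last_name"},
    "address": {"address", "street_address"},
    "credit_card": {"credit_card", "card_number", "cc_number"},
}

# Hypothetical value patterns (simple and US-centric, for illustration only).
VALUE_PATTERNS = {
    "email": re.compile(r"^[^@\s]+@[^@\s]+\.[A-Za-z]{2,}$"),
    "ssn": re.compile(r"^\d{3}-\d{2}-\d{4}$"),
    "phone": re.compile(r"^\+?1?[\s.-]?\(?\d{3}\)?[\s.-]?\d{3}[\s.-]?\d{4}$"),
    "credit_card": re.compile(r"^(?:\d[ -]?){13,19}$"),
}


def detect_pii(column: str, values: List[str], min_match: float = 0.8) -> Optional[str]:
    """Return the detected PII type for a column, or None."""
    normalized = column.strip().lower().replace(" ", "_")
    # 1) Name heuristic: an exact match on the column name is enough.
    for pii_type, names in NAME_HINTS.items():
        if normalized in names:
            return pii_type
    # 2) Value heuristic: at least `min_match` of non-empty values must match one pattern.
    non_empty = [v.strip() for v in values if v and v.strip()]
    if not non_empty:
        return None
    for pii_type, pattern in VALUE_PATTERNS.items():
        hits = sum(1 for v in non_empty if pattern.match(v))
        if hits / len(non_empty) >= min_match:
            return pii_type
    return None
```

For example, `detect_pii("contact", ["alice@example.com", "bob@example.org"])` returns `"email"` even though the column name gives no hint, which is why the value check matters.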

synthdata-compute

/synthdata-compute

Compute derived, aggregated, or transformed tables from existing datasets. Use this skill when the user needs to "compute monthly scores", "aggregate by month", "create a summary table", "derive risk scores", "compute percentile ranks", "roll up events", "create benchmarks from raw data", "add a computed column", or bridge the gap between raw generated tables and downstream analytics. Works on xlsx, csv, or json input. Claude writes the computation logic; the script handles data I/O.
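As an illustration of the kind of logic Claude might write for "aggregate by month" and "compute percentile ranks", here is a plain-Python sketch over rows already loaded into memory. The function names and the `(user_id, date)` row shape are hypothetical; the percentile definition used here (the share of values less than or equal to a given value) is one common convention among several.

```python
from bisect import bisect_right
from collections import defaultdict
from datetime import date
from typing import Dict, Iterable, Tuple


def monthly_counts(events: Iterable[Tuple[str, date]]) -> Dict[Tuple[str, str], int]:
    """Roll up (user_id, event_date) pairs into event counts per user per YYYY-MM."""
    counts: Dict[Tuple[str, str], int] = defaultdict(int)
    for user_id, day in events:
        counts[(user_id, day.strftime("%Y-%m"))] += 1
    return dict(counts)


def percentile_ranks(scores: Dict[str, float]) -> Dict[str, float]:
    """Percentile rank = share of scores <= this score, scaled to 0-100."""
    ordered = sorted(scores.values())
    n = len(ordered)
    return {key: 100.0 * bisect_right(ordered, value) / n for key, value in scores.items()}


if __name__ == "__main__":
    events = [
        ("U0001", date(2025, 1, 3)),
        ("U0001", date(2025, 1, 20)),
        ("U0002", date(2025, 1, 5)),
        ("U0001", date(2025, 2, 1)),
    ]
    rollup = monthly_counts(events)
    january = {user: count for (user, month), count in rollup.items() if month == "2025-01"}
    print(rollup)
    print(percentile_ranks(january))  # {'U0001': 100.0, 'U0002': 50.0}
```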

synthdata-extend

/synthdata-extend

Extend an existing synthetic dataset by adding more rows or new columns while preserving FK integrity, ID continuity, and column distributions. Use this skill when the user wants to "add more rows", "append data", "extend this dataset", "add a new column", "grow my dataset", or needs a larger version of an existing synthetic dataset without regenerating from scratch.
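One way to picture the ID-continuity and FK-integrity guarantees when appending rows: continue the prefixed ID series past its current maximum, and draw the new rows' foreign keys by resampling existing keys that still point at valid parents. `next_ids`, `extend_fk`, and the resampling approach are hypothetical; the skill may preserve distributions differently.

```python
import random
import re
from typing import List, Sequence


def next_ids(existing: Sequence[str], prefix: str, width: int, count: int) -> List[str]:
    """Continue a prefixed, zero-padded ID series after its current maximum."""
    pattern = re.compile(rf"^{re.escape(prefix)}(\d+)$")
    numbers = [int(m.group(1)) for m in map(pattern.match, existing) if m]
    start = max(numbers, default=0) + 1
    return [f"{prefix}{n:0{width}d}" for n in range(start, start + count)]


def extend_fk(existing_fk: Sequence[str], parent_ids: Sequence[str],
              count: int, rng: random.Random) -> List[str]:
    """Draw FK values for new child rows by resampling the existing ones.

    Resampling roughly preserves each parent's share of child rows; stale
    values that no longer match a parent are dropped from the pool, so every
    new key references an existing parent.
    """
    valid = set(parent_ids)
    if not valid:
        raise ValueError("no parent rows to reference")
    pool = [v for v in existing_fk if v in valid] or sorted(valid)
    return [rng.choice(pool) for _ in range(count)]
```

Picking up from the current maximum (rather than the row count) avoids collisions when earlier rows were deleted and the series has gaps.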

synthdata-extract

/synthdata-extract

Extract tabular data from Excel workbooks (.xlsx) to JSON files, one per sheet. Auto-detects whether a sheet has a title-banner row above the headers (synthdata-generate convention) or starts with headers directly. Use this skill when the user wants to convert an Excel file to JSON, extract spreadsheet data, parse an xlsx file, prepare data for downstream analysis tools that don't read Excel natively, or set up a dataset for the other synthdata skills. Also trigger on "extract the data", "parse this spreadsheet", "convert to JSON", or "read this xlsx file".
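The banner-versus-headers check could be approximated with a simple heuristic over a sheet's rows, for example the tuples produced by openpyxl's `Worksheet.iter_rows(values_only=True)`. The rule below (a single non-empty cell sitting above a row of string labels counts as a title banner), the `column_N` fallback names, and the function names are assumptions for illustration, not the skill's implementation.

```python
import json
from typing import Any, Dict, List, Sequence


def _filled(row: Sequence[Any]) -> List[Any]:
    """Non-empty cells of a row (None and "" count as empty)."""
    return [c for c in row if c not in (None, "")]


def detect_header_row(rows: Sequence[Sequence[Any]]) -> int:
    """Return 1 if row 0 looks like a title banner above the headers, else 0."""
    if len(rows) < 2:
        return 0
    first, second = _filled(rows[0]), _filled(rows[1])
    is_banner = len(first) == 1 and len(second) > 1 and all(isinstance(c, str) for c in second)
    return 1 if is_banner else 0


def sheet_to_records(rows: Sequence[Sequence[Any]]) -> List[Dict[str, Any]]:
    """Turn sheet rows into dicts keyed by header, skipping fully blank rows."""
    if not rows:
        return []
    h = detect_header_row(rows)
    headers = [str(c) if c not in (None, "") else f"column_{i}"
               for i, c in enumerate(rows[h], 1)]
    return [dict(zip(headers, row)) for row in rows[h + 1:] if _filled(row)]


if __name__ == "__main__":
    sheet = [
        ("Employee Directory", None, None),  # title banner
        ("id", "name", "dept"),              # headers
        ("E001", "Ann", "Eng"),
    ]
    print(json.dumps(sheet_to_records(sheet)))
```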

synthdata-generate

/synthdata-generate

Generate synthetic tabular datasets from YAML schemas. Use this skill when the user wants to create sample data, mock data, test data, synthetic datasets, or demo data for any domain — HR directories, e-commerce orders, SaaS metrics, healthcare records, financial transactions, security events, application logs, IoT sensor readings, CRM pipelines, survey responses, or custom schemas. Ships with 10+ domain templates and supports custom YAML schemas with Faker-backed fields, statistical distributions (normal/lognormal/zipf/poisson), foreign-key integrity, behavioral profiles, and temporal event generation. Also trigger when user says "generate synthetic data", "create fake data", "mock dataset", "test data", or names a specific domain like "e-commerce data" or "HR data".
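The README's schema example pairs a Zipfian foreign key (`distribution: zipfian, alpha: 1.5`) with Poisson child counts (`rows_per_parent: { distribution: poisson, lam: 5 }`). How synthdata combines the two is not stated here, so this sketch assumes the Poisson draws fix the total number of child rows and the Zipfian weights decide which parent each row references. Because every key is drawn from the existing parent IDs, FK integrity holds by construction. All names are hypothetical, and Knuth's Poisson method stands in for whatever sampler the tool uses (with NumPy installed, `numpy.random.Generator.poisson` is the usual choice).

```python
import math
import random
from typing import Dict, List, Sequence


def poisson(lam: float, rng: random.Random) -> int:
    """Poisson draw via Knuth's multiplication method (fine for small lam)."""
    threshold = math.exp(-lam)
    k, product = 0, 1.0
    while True:
        product *= rng.random()
        if product <= threshold:
            return k
        k += 1


def zipf_weights(n: int, alpha: float) -> List[float]:
    """Weight of the parent at rank r is proportional to 1 / r**alpha."""
    return [1.0 / rank ** alpha for rank in range(1, n + 1)]


def generate_children(parent_ids: Sequence[str], lam: float, alpha: float,
                      rng: random.Random) -> List[Dict[str, str]]:
    # Assumption: one Poisson(lam) draw per parent sets the total child-row count...
    total = sum(poisson(lam, rng) for _ in parent_ids)
    # ...and the Zipfian weights choose which parent each child row references.
    weights = zipf_weights(len(parent_ids), alpha)
    fks = rng.choices(list(parent_ids), weights=weights, k=total)
    return [{"user_id": fk} for fk in fks]
```

With `alpha: 1.5`, the first-ranked parent receives a large share of all child rows, which is the "few heavy users, long tail" shape a Zipfian key is meant to produce.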

Stats

Version: 0.3.0
Released: Apr 10, 2026
Language: Python
Stars: 0
Maintenance: Good
License: MIT
Last commit: Apr 9, 2026
Added: Apr 6, 2026

README

Synthdata Plugin

A general-purpose Claude Code plugin for synthetic data generation across any tabular domain.

Synthdata turns a YAML schema (or one of 12 built-in templates) into realistic synthetic datasets — with Faker-backed fields, statistical distributions, foreign-key integrity, behavioral profiles, and temporal event generation. Outputs xlsx, csv, json, sql, or parquet.

Skills

Skill	What it does
synthdata-generate	Pick a template (HR, e-commerce, SaaS, healthcare, finance, security, IoT, CRM, logs, surveys, plus blank-slate) or design a custom schema through an interview, then generate a synthetic dataset
synthdata-extract	Extract tabular data from Excel workbooks to JSON (auto-detects title rows and headers)
synthdata-extend	Add rows or new columns to an existing dataset while preserving FK integrity and profile distributions
synthdata-anonymize	Transform a real dataset into a synthetic equivalent — detects PII, replaces with Faker values, preserves shape and distributions
synthdata-compute	Derive aggregated, scored, or transformed tables from existing data — monthly rollups, composite scores, percentile ranks, segment summaries
synthdata-serve	Spin up a read-only MCP server from a dataset — auto-generates tools for querying, filtering, sampling, and statistics
synthdata-prompt-builder	Plan multi-step generation workflows — identify raw vs derived tables, match to templates, output a sequenced set of prompts
synthdata-tutorial	Guided interactive walkthrough of the synthdata skills
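The prompt builder's job of separating raw from derived tables and sequencing the work is, at its core, a dependency ordering. A minimal sketch using the standard library's `graphlib.TopologicalSorter` (Python 3.9+), assuming a hypothetical plan in which raw tables go to synthdata-generate and derived tables go to synthdata-compute; `plan_steps` and the example tables are illustrative only.

```python
from graphlib import TopologicalSorter  # Python 3.9+
from typing import Dict, List, Set, Tuple


def plan_steps(dependencies: Dict[str, Set[str]], derived: Set[str]) -> List[Tuple[str, str]]:
    """Order tables so each comes after the tables it depends on, and route
    raw tables to generation and derived tables to computation."""
    order = TopologicalSorter(dependencies).static_order()
    return [(table, "synthdata-compute" if table in derived else "synthdata-generate")
            for table in order]


if __name__ == "__main__":
    # Hypothetical fraud-detection demo: table -> tables it is built from.
    deps = {
        "users": set(),
        "transactions": {"users"},
        "monthly_risk_scores": {"transactions"},
    }
    for step, (table, skill) in enumerate(plan_steps(deps, {"monthly_risk_scores"}), 1):
        print(f"{step}. /{skill} -> {table}")
```

`static_order()` raises `graphlib.CycleError` when two tables depend on each other, which is a useful early signal that the plan needs another look.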

Installation

Option 1: GitHub marketplace (recommended)

/plugin marketplace add rappdw/synthdata
/plugin install synthdata@synthdata-marketplace

Option 2: Reference from another marketplace

In another marketplace's marketplace.json:

{
  "name": "synthdata",
  "source": {
    "source": "github",
    "repo": "rappdw/synthdata"
  }
}

Option 3: Plugin directory

claude --plugin-dir /path/to/synthdata

Option 4: Manual skill copy

cp -r skills/* ~/.claude/skills/
# or use the installer:
./install.sh

Option 5: Cowork upload

./package.sh                              # produces dist/synthdata-v0.3.0.plugin
# Cowork > Customize > Plugins > Upload custom plugin

Prerequisites

pip install openpyxl faker numpy pandas pyyaml mcp --break-system-packages
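The `--break-system-packages` flag lets pip install into a Python environment marked as externally managed (PEP 668); installing into a virtual environment avoids needing it. To confirm the prerequisites are importable afterwards, a small check like the following can help. It is a hypothetical helper, not part of the plugin; note that pyyaml is imported as `yaml`.

```python
import importlib.util
from typing import Dict, List

# pip package name -> module name it provides.
REQUIRED = {
    "openpyxl": "openpyxl",
    "faker": "faker",
    "numpy": "numpy",
    "pandas": "pandas",
    "pyyaml": "yaml",
    "mcp": "mcp",
}


def missing_packages(required: Dict[str, str] = REQUIRED) -> List[str]:
    """Return the pip names whose modules cannot be found on the import path."""
    return [pkg for pkg, module in required.items()
            if importlib.util.find_spec(module) is None]


if __name__ == "__main__":
    missing = missing_packages()
    print("All prerequisites found." if not missing else "Missing: " + " ".join(missing))
```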

Quick Start

> Generate me a synthetic HR directory with 500 employees
> Create an e-commerce orders dataset
> Build a custom dataset for my app — I'll describe the tables
> Extract this spreadsheet to JSON
> Anonymize this customer export
> Compute monthly risk scores from my event data
> Help me plan what data I need for a fraud detection demo
> Serve this dataset as an MCP server so Claude can query it

Templates

12 starter templates ship with synthdata-generate. Pick one to get going fast, or start from blank-slate to build a custom schema.

Template	Entities
hr-directory	employees, departments
ecommerce-orders	customers, products, orders, order_items
saas-metrics	accounts, users, events, subscriptions
healthcare-patients	patients, providers, encounters, claims
financial-transactions	accounts, customers, transactions
security-events	users, devices, alerts, incidents
log-events	services, requests, errors
iot-sensors	devices, readings, events
crm-pipeline	contacts, companies, deals, activities
survey-responses	respondents, questions, responses
healthcare-hrm-security	users, threat events, phishing sims, training, DLP, abuse mailbox
blank-slate	minimal starter for custom schemas

Schema Format

name: my-dataset
tables:
  - name: users
    rows: { quick: 50, medium: 1000, thorough: 5000 }
    columns:
      - { name: user_id, type: id, prefix: "U", width: 4 }
      - { name: name, type: faker, method: name }
      - { name: department, type: choice, values: [Sales, Eng, Ops], weights: [0.4, 0.4, 0.2] }
      - { name: salary, type: float, distribution: lognormal, mean: 75000, sigma: 0.4, min: 30000 }
    profiles:
      - { name: high_risk, weight: 0.05, overrides: { risk_multiplier: 3.0 } }
  - name: events
    foreign_key: { column: user_id, references: users.user_id, distribution: zipfian, alpha: 1.5 }
    rows_per_parent: { distribution: poisson, lam: 5 }
    columns:
      - { name: event_type, type: choice, values: [login, click, error] }
      - { name: ts, type: timestamp, start: "2025-01-01", end: "2025-12-31" }
writers: [xlsx, json]
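A stdlib-only sketch of how the `users` table in this schema might be interpreted: row-count tiers, prefixed IDs, a weighted choice, a bounded lognormal, and a weighted behavioral profile. Several details are assumptions, not taken from the synthdata spec: `mean` is treated as the arithmetic mean (so mu = ln(mean) - sigma^2/2), `min` is enforced by clamping, non-profile rows get a hypothetical "default" profile with multiplier 1.0, and the Faker `name` column is omitted to avoid the third-party dependency (with Faker installed, `Faker().name()` would fill it).

```python
import math
import random
from typing import Dict, List

ROW_TIERS = {"quick": 50, "medium": 1000, "thorough": 5000}


def make_id(index: int, prefix: str = "U", width: int = 4) -> str:
    return f"{prefix}{index:0{width}d}"  # 1 -> "U0001"


def bounded_lognormal(rng: random.Random, mean: float, sigma: float, minimum: float) -> float:
    # Assumption: `mean` is the arithmetic mean, since E[X] = exp(mu + sigma^2 / 2).
    mu = math.log(mean) - sigma ** 2 / 2
    # Assumption: `min` is enforced by clamping (resampling is an alternative).
    return max(minimum, rng.lognormvariate(mu, sigma))


def generate_users(tier: str, seed: int = 0) -> List[Dict[str, object]]:
    rng = random.Random(seed)
    rows = []
    for i in range(1, ROW_TIERS[tier] + 1):
        high_risk = rng.random() < 0.05  # profile weight 0.05
        rows.append({
            "user_id": make_id(i),
            "department": rng.choices(["Sales", "Eng", "Ops"], weights=[0.4, 0.4, 0.2])[0],
            "salary": round(bounded_lognormal(rng, 75000, 0.4, 30000), 2),
            "profile": "high_risk" if high_risk else "default",
            "risk_multiplier": 3.0 if high_risk else 1.0,  # profile override
        })
    return rows
```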

See skills/synthdata-generate/references/schema-spec.md for the complete spec.

Serving Data as an MCP Server

Any generated (or existing) dataset can be exposed as a read-only MCP server that Claude can query directly.

Inspect first

python3 skills/synthdata-serve/scripts/serve.py --inspect --input ./hr.xlsx
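The tool surface the serve skill describes (querying, filtering, sampling, statistics) can be pictured as a few read-only functions over the loaded rows. This is a hypothetical stdlib sketch: the function names and the list-of-dicts row shape are assumptions, and it does not show how serve.py actually builds its tools.

```python
import random
import statistics
from typing import Any, Dict, List, Optional

Row = Dict[str, Any]


def filter_rows(rows: List[Row], column: str, value: Any, limit: int = 100) -> List[Row]:
    """Rows where `column == value`, returned as copies so the source stays read-only."""
    matches = [r for r in rows if r.get(column) == value]
    return [dict(r) for r in matches[:limit]]


def sample_rows(rows: List[Row], n: int = 5, seed: Optional[int] = None) -> List[Row]:
    """Up to n random rows without replacement (copies)."""
    rng = random.Random(seed)
    return [dict(r) for r in rng.sample(rows, min(n, len(rows)))]


def column_stats(rows: List[Row], column: str) -> Dict[str, Any]:
    """Count/min/max/mean over numeric (non-bool) values of one column."""
    values = [r[column] for r in rows
              if isinstance(r.get(column), (int, float)) and not isinstance(r.get(column), bool)]
    if not values:
        return {"count": 0}
    return {"count": len(values), "min": min(values), "max": max(values),
            "mean": statistics.fmean(values)}
```

In an actual MCP server, functions like these would be registered as tools, for example with the official `mcp` Python SDK's `FastMCP` server and its `@mcp.tool()` decorator; that wiring is left out so the sketch runs without the SDK.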
