Syntherklaas: interactive synthetic data generator

[Image: Sinterklaas waving to the crowd during an intocht, wearing his red and gold mitre and purple gloves]

Real data is the fastest way to prototype.
GDPR is the fastest way to get blocked.

A good synthetic dataset is a gift — and you don't have to wait until December.

The problem

On most projects the first question is: what data do you have? And the answer is usually "none", "not enough", or "we have it but GDPR makes it off-limits".

syntherklaas skips the input-data step entirely: you have a short conversation about the shape you need — tables, columns, foreign keys, volumes, distributions — and it generates a coherent synthetic dataset from scratch.

Built on Faker (locale-aware) plus NL-locked providers for BSN (11-proof, the Dutch "elfproef"), IBAN (mod-97), postcode, and phone formats, so the fake values still pass real validators. Packaged as a Claude Code skill: the dialog runs in chat, the schema is captured in a YAML file, and a small Python generator turns that YAML into CSV, XLSX, SQLite, or a SQL dump.
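To make "pass real validators" concrete, here is a minimal sketch of the two checksum rules named above. It is not syntherklaas's own code: the function names `is_valid_bsn` and `is_valid_iban` are hypothetical, and the IBAN check covers only the generic mod-97 rule, not country-specific lengths or bank codes. The BSN is the one from the sample session further down; `NL91ABNA0417164300` is a widely used documentation example, not a value taken from this project.

```python
def is_valid_bsn(bsn: str) -> bool:
    """Return True if `bsn` is 9 digits and passes the 11-proof (elfproef)."""
    if len(bsn) != 9 or not (bsn.isascii() and bsn.isdigit()):
        return False
    if bsn == "000000000":
        return False
    # Weights 9..2 for the first eight digits, -1 for the last one.
    weights = (9, 8, 7, 6, 5, 4, 3, 2, -1)
    total = sum(w * int(d) for w, d in zip(weights, bsn))
    return total % 11 == 0


def is_valid_iban(iban: str) -> bool:
    """Return True if `iban` passes the generic mod-97 check.

    Assumption: country-specific length and format rules are out of scope here.
    """
    compact = iban.replace(" ", "").upper()
    if len(compact) < 5 or not (compact.isascii() and compact.isalnum()):
        return False
    # Move the country code and check digits to the end.
    rearranged = compact[4:] + compact[:4]
    # Map letters to numbers (A=10 ... Z=35); digits map to themselves.
    digits = "".join(str(int(ch, 36)) for ch in rearranged)
    return int(digits) % 97 == 1


print(is_valid_bsn("391171823"))            # True
print(is_valid_iban("NL91ABNA0417164300"))  # True
```

Running a generated column through checks like these is a cheap way to confirm that downstream form validation will accept the synthetic values.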

Watch the 2-minute intro

▶ Watch on Veed — a quick tour: define two related tables, see the data model, pick volume distributions, and pick an output format.

Installation

Only tested with Claude Code.

npx skills@latest add baswenneker/syntherklaas

Restart Claude Code (or open a new session). The skill registers via .claude-plugin/plugin.json.

What a conversation looks like

The dialog is the product. You name the tables and columns; Claude picks the providers, infers the foreign keys, renders the model, and asks the right follow-ups. Here's a session — slightly compressed, but faithful to the real flow.

You: /syntherklaas

Claude: Which table do we start with? You can paste sample data, define columns together, or just give me a table name and I'll propose one.

You: users — user_id, first_name, last_name, bsn, email

Claude: Inferred for users:

| col        | provider         | constraint | example           |
| user_id    | sequential       | PK         | 1, 2, 3, ...      |
| first_name | faker.first_name |            | Pieter            |
| last_name  | faker.last_name  |            | Jansen            |
| bsn        | nl.bsn           | UNIQUE     | 391171823         |
| email      | faker.email      |            | [email protected] |

Look right? (ok / change <col>)

You: ok. Next: invoices — invoice_id, user_id, description, amount, iban

Claude: Inferred for invoices:

| col         | provider                | constraint | example          |
| invoice_id  | sequential              | PK         | 1, 2, 3, ...     |
| user_id     | fk → users.user_id      | NOT NULL   | 1..1000          |
| description | faker.text              |            | "Lorem ipsum..." |
| amount      | numeric_range (uniform) |            | 137.42           |
| iban        | nl.iban                 |            | NL34RABO0382...  |

Model so far:
┌─────────────────────────┐         ┌─────────────────────────┐
│ users                   │ 1     * │ invoices                │
├─────────────────────────┤─────────┤─────────────────────────┤
│ user_id (PK)   INT      │         │ invoice_id (PK)  INT    │
│ first_name     STR      │         │ user_id (FK)     INT ───┤
│ last_name      STR      │         │ description      STR    │
│ bsn (UQ)       STR      │         │ amount           FLOAT  │
│ email          STR      │         │ iban             STR    │
└─────────────────────────┘         └─────────────────────────┘
Another table, or done?

You: done.

Claude: Volumes per table — pick a distribution:

Fixed — exact row count

Poisson λ — natural variation around λ (e.g. "around 5 invoices per user")

Normal μ ± σ — bell-shaped

Uniform [min, max] — or just describe it in words

How many users? And how many invoices per user?

You: 1000 users; around 5 invoices per user.

Claude: Preview (10 rows per table)...
users (1000 total)
| user_id | first_name | last_name        | bsn       | email                  |
| 1       | Ali        | Schellekens      | 391171823 | leonholthuis@example…  |
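
The session above settles on 1000 users and "around 5 invoices per user". As a rough illustration of what that means for the generated rows, the sketch below draws a Poisson count per parent and writes both tables to SQLite with the foreign key enforced. It is an assumption-laden sketch using only the standard library, not the skill's actual generator: the names `poisson`, `generate`, and `to_sqlite`, the amount range, and the reduced column set are all hypothetical.

```python
import math
import random
import sqlite3


def poisson(lam: float, rng: random.Random) -> int:
    """Draw one Poisson(lam) sample using Knuth's multiplication method."""
    threshold = math.exp(-lam)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def generate(n_users: int = 1000, invoices_per_user: float = 5.0, seed: int = 42):
    """Create parent rows first, then children that reference existing parents only."""
    rng = random.Random(seed)
    users = [(user_id,) for user_id in range(1, n_users + 1)]
    invoices = []
    for (user_id,) in users:
        for _ in range(poisson(invoices_per_user, rng)):
            invoice_id = len(invoices) + 1
            # The 10..500 range is an assumption made for this example.
            amount = round(rng.uniform(10.0, 500.0), 2)
            invoices.append((invoice_id, user_id, amount))
    return users, invoices


def to_sqlite(users, invoices, path: str = ":memory:") -> sqlite3.Connection:
    """Write both tables to SQLite with the foreign key constraint switched on."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA foreign_keys = ON")
    conn.execute("CREATE TABLE users (user_id INTEGER PRIMARY KEY)")
    conn.execute(
        "CREATE TABLE invoices ("
        "invoice_id INTEGER PRIMARY KEY, "
        "user_id INTEGER NOT NULL REFERENCES users(user_id), "
        "amount REAL)"
    )
    conn.executemany("INSERT INTO users VALUES (?)", users)
    conn.executemany("INSERT INTO invoices VALUES (?, ?, ?)", invoices)
    conn.commit()
    return conn


users, invoices = generate()
conn = to_sqlite(users, invoices)
print(len(users), "users;", len(invoices), "invoices")
```

Generating parents before children keeps every `user_id` in `invoices` resolvable, and the fixed seed makes a run reproducible. Some users end up with zero invoices, which is expected under a Poisson distribution.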
