Syntherklaas: interactive synthetic data generator
Real data is the fastest way to prototype.
GDPR is the fastest way to get blocked.
A good synthetic dataset is a gift — and you don't have to wait until December.
The problem
On most projects the first question is: what data do you have? And the answer is usually "none", "not enough", or "we have it but GDPR makes it off-limits".
syntherklaas skips the input-data step entirely. In a short conversation you describe the shape you need (tables, columns, foreign keys, volumes, distributions), and it generates a coherent synthetic dataset from scratch, with foreign keys that resolve across tables.
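To make "coherent" concrete, here is a minimal, stdlib-only sketch of schema-driven generation in which child rows only point at parent keys that already exist. The `SCHEMA` dict, its provider strings, and the `generate`/`to_sqlite` helpers are hypothetical illustrations. They are not syntherklaas's YAML format or generator, and the name list and amount range are placeholder values.

```python
import random
import sqlite3

# Hypothetical schema shape (NOT syntherklaas's real YAML format), loosely
# mirroring the users/invoices example. Parents must be declared first.
SCHEMA = {
    "users": {
        "rows": 5,
        "columns": {"user_id": "pk", "first_name": "first_name"},
    },
    "invoices": {
        "rows": 12,
        "columns": {
            "invoice_id": "pk",
            "user_id": "fk:users.user_id",
            "amount": "amount",
        },
    },
}

FIRST_NAMES = ["Pieter", "Ali", "Sanne", "Daan"]  # placeholder values


def generate(schema, seed=0):
    """Generate tables in declaration order so parent keys exist before children."""
    rng = random.Random(seed)
    data = {}
    for table, spec in schema.items():
        rows = []
        for i in range(1, spec["rows"] + 1):
            row = {}
            for col, provider in spec["columns"].items():
                if provider == "pk":
                    row[col] = i  # sequential primary key: 1, 2, 3, ...
                elif provider.startswith("fk:"):
                    parent, parent_col = provider[3:].split(".")
                    # Sample an existing parent key, so every FK resolves.
                    row[col] = rng.choice(data[parent])[parent_col]
                elif provider == "first_name":
                    row[col] = rng.choice(FIRST_NAMES)
                elif provider == "amount":
                    row[col] = round(rng.uniform(10, 500), 2)
                else:
                    raise ValueError(f"unknown provider: {provider}")
            rows.append(row)
        data[table] = rows
    return data


def to_sqlite(data, path=":memory:"):
    """Load generated rows into SQLite (untyped columns, for brevity)."""
    conn = sqlite3.connect(path)
    for table, rows in data.items():
        cols = list(rows[0])
        conn.execute(f'CREATE TABLE "{table}" ({", ".join(cols)})')
        marks = ", ".join("?" for _ in cols)
        conn.executemany(
            f'INSERT INTO "{table}" VALUES ({marks})',
            [tuple(row[c] for c in cols) for row in rows],
        )
    conn.commit()
    return conn


conn = to_sqlite(generate(SCHEMA))
orphans = conn.execute(
    "SELECT COUNT(*) FROM invoices WHERE user_id NOT IN (SELECT user_id FROM users)"
).fetchone()[0]
print(orphans)  # 0: every invoice points at an existing user
sql_dump = "\n".join(conn.iterdump())  # a plain SQL dump of the same data
```

Generating parents before children is what makes the integrity guarantee cheap. A foreign key is simply a draw from keys that already exist, so no repair pass is needed afterwards.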
Built on Faker (locale-aware) plus Dutch (NL) providers for BSN (11-proof), IBAN (mod-97), postcode, and phone formats, so the fake values still pass real validators. Packaged as a Claude Code skill: the dialog runs in chat, the schema is captured as YAML, and a small Python generator turns that YAML into CSV, XLSX, SQLite, or a SQL dump.
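Both checksums are simple enough to sketch. Below is a stdlib-only illustration of the BSN 11-proof and the IBAN mod-97 check (ISO 7064 MOD 97-10, as used by ISO 13616), together with generators that construct values passing each check. This is not syntherklaas's provider code. The function names and the default `RABO` bank code are assumptions for illustration, and `NL91ABNA0417164300` is a widely used sample IBAN.

```python
import random

# BSN "elfproef": weights 9..2 for the first eight digits, -1 for the last.
BSN_WEIGHTS = (9, 8, 7, 6, 5, 4, 3, 2, -1)


def is_valid_bsn(bsn: str) -> bool:
    """True if a 9-digit string passes the BSN 11-proof."""
    if len(bsn) != 9 or not bsn.isdigit():
        return False
    return sum(int(d) * w for d, w in zip(bsn, BSN_WEIGHTS)) % 11 == 0


def fake_bsn(rng: random.Random) -> str:
    """Random 8-digit prefix plus the one check digit that satisfies the 11-proof."""
    while True:
        head = [rng.randint(0, 9) for _ in range(8)]
        check = sum(d * w for d, w in zip(head, BSN_WEIGHTS)) % 11
        if check <= 9:  # a required check digit of 10 is impossible: redraw
            return "".join(map(str, head)) + str(check)


def _to_digits(s: str) -> str:
    # IBAN letter conversion: A=10, B=11, ..., Z=35 (digits stay as-is).
    return "".join(str(int(ch, 36)) for ch in s)


def is_valid_iban(iban: str) -> bool:
    """Mod-97 check: move the first four characters to the end; remainder must be 1."""
    iban = iban.replace(" ", "").upper()
    return int(_to_digits(iban[4:] + iban[:4])) % 97 == 1


def fake_nl_iban(rng: random.Random, bank_code: str = "RABO") -> str:
    """'NL' + two check digits + 4-letter bank code + 10-digit account number."""
    bban = bank_code + "".join(str(rng.randint(0, 9)) for _ in range(10))
    check = 98 - int(_to_digits(bban + "NL00")) % 97
    return f"NL{check:02d}{bban}"


rng = random.Random(42)
print(is_valid_bsn("391171823"))            # True (the BSN from the example session)
print(is_valid_iban("NL91ABNA0417164300"))  # True
print(is_valid_bsn(fake_bsn(rng)), is_valid_iban(fake_nl_iban(rng)))  # True True
```

Generating values by construction is cheaper than generate-and-test. The BSN check digit and the IBAN check digits are computed directly from the random part, and only the rare BSN prefix that would need a check digit of 10 is redrawn.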
Watch the 2-minute intro
▶ Watch on Veed: a quick tour that defines two related tables, shows the data model, picks volume distributions, and selects an output format.
Installation
Only tested with Claude Code.
npx skills@latest add baswenneker/syntherklaas
Restart Claude Code (or open a new session). The skill registers via .claude-plugin/plugin.json.
What a conversation looks like
The dialog is the product. You name the tables and columns; Claude picks the providers, infers the foreign keys, renders the model, and asks the right follow-ups. Here's a session — slightly compressed, but faithful to the real flow.
You: /syntherklaas
Claude: Which table do we start with? You can paste sample data, define columns together, or just give me a table name and I'll propose one.
You: users — user_id, first_name, last_name, bsn, email
Claude: Inferred for users:
| col | provider | constraint | example |
|---|---|---|---|
| user_id | sequential | PK | 1, 2, 3, ... |
| first_name | faker.first_name | | Pieter |
| last_name | faker.last_name | | Jansen |
| bsn | nl.bsn | UNIQUE | 391171823 |
| email | faker.email | | [email protected] |
Look right? (ok / change <col>)
You: ok. Next: invoices — invoice_id, user_id, description, amount, iban
Claude: Inferred for invoices:
| col | provider | constraint | example |
|---|---|---|---|
| invoice_id | sequential | PK | 1, 2, 3, ... |
| user_id | fk → users.user_id | NOT NULL | 1..1000 |
| description | faker.text | | "Lorem ipsum..." |
| amount | numeric_range (uniform) | | 137.42 |
| iban | nl.iban | | NL34RABO0382... |
Model so far:
┌─────────────────────────┐         ┌─────────────────────────┐
│ users                   │ 1     * │ invoices                │
├─────────────────────────┤─────────┤─────────────────────────┤
│ user_id (PK)     INT    │         │ invoice_id (PK)  INT    │
│ first_name       STR    │         │ user_id (FK)     INT ───┤
│ last_name        STR    │         │ description      STR    │
│ bsn (UQ)         STR    │         │ amount           FLOAT  │
│ email            STR    │         │ iban             STR    │
└─────────────────────────┘         └─────────────────────────┘
Another table, or done?
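Claude does this inference in conversation, not with fixed rules. The mapping in the two tables above can still be approximated by a toy, name-based heuristic. The first `*_id` column becomes a sequential PK, a column named after an earlier table's PK becomes a foreign key, and known column names map to providers. The rules, the `infer` function, and the `"ask user"` fallback are all hypothetical.

```python
# Toy, hypothetical rules; the real skill infers providers conversationally.
NAME_PROVIDERS = {
    "first_name": "faker.first_name",
    "last_name": "faker.last_name",
    "email": "faker.email",
    "description": "faker.text",
    "amount": "numeric_range (uniform)",
    "bsn": "nl.bsn",
    "iban": "nl.iban",
}
UNIQUE_COLUMNS = {"bsn"}  # identifiers that must not repeat


def infer(table, columns, pk_owners):
    """Return {column: (provider, constraint)} for one table.

    pk_owners maps PK column names to their table, e.g. {"user_id": "users"},
    and is updated in place so later tables can reference earlier ones.
    """
    result = {}
    for i, col in enumerate(columns):
        if i == 0 and col.endswith("_id"):
            result[col] = ("sequential", "PK")
            pk_owners.setdefault(col, table)
        elif pk_owners.get(col, table) != table:
            # Column shares its name with another table's PK: treat it as an FK.
            result[col] = (f"fk → {pk_owners[col]}.{col}", "NOT NULL")
        else:
            constraint = "UNIQUE" if col in UNIQUE_COLUMNS else ""
            result[col] = (NAME_PROVIDERS.get(col, "ask user"), constraint)
    return result


pk_owners = {}
users = infer("users", ["user_id", "first_name", "last_name", "bsn", "email"], pk_owners)
invoices = infer("invoices", ["invoice_id", "user_id", "description", "amount", "iban"], pk_owners)
print(invoices["user_id"])  # ('fk → users.user_id', 'NOT NULL')
```

A name-matching rule like this breaks down quickly (for example `customer_ref` pointing at `users.user_id`), which is one reason the skill asks follow-up questions instead of trusting column names alone.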
You: done.
Claude: Volumes per table — pick a distribution:
- Fixed — exact row count
- Poisson λ — natural variation around λ (e.g. "around 5 invoices per user")
- Normal μ ± σ — bell-shaped
- Uniform [min, max] — or just describe it in words
How many users? And how many invoices per user?
You: 1000 users; around 5 invoices per user.
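"Around 5 invoices per user" maps naturally onto a Poisson distribution with λ = 5, drawn once per parent row. A stdlib-only sketch follows. Python's `random` module has no Poisson sampler, so it uses Knuth's multiplication method, which is fine for small λ. The `child_counts` helper and its parameter names are hypothetical, not the skill's generator API.

```python
import math
import random


def poisson(rng: random.Random, lam: float) -> int:
    """Knuth's method: count uniforms multiplied before the product drops below e^-λ."""
    limit = math.exp(-lam)
    k, product = 0, rng.random()
    while product > limit:
        k += 1
        product *= rng.random()
    return k


def child_counts(rng, n_parents, dist, **p):
    """One child-row count per parent row (hypothetical helper).

    "Fixed" needs no sampling: it is just an exact row count.
    """
    if dist == "poisson":
        return [poisson(rng, p["lam"]) for _ in range(n_parents)]
    if dist == "normal":
        # Round to whole rows and clip at 0: a user can't have -1 invoices.
        return [max(0, round(rng.gauss(p["mu"], p["sigma"]))) for _ in range(n_parents)]
    if dist == "uniform":
        return [rng.randint(p["min"], p["max"]) for _ in range(n_parents)]  # inclusive
    raise ValueError(f"unknown distribution: {dist}")


rng = random.Random(7)
counts = child_counts(rng, 1000, "poisson", lam=5)  # 1000 users, ~5 invoices each
# Expand counts into the user_id of each invoice; FKs stay in 1..1000.
invoice_user_ids = [uid for uid, n in enumerate(counts, start=1) for _ in range(n)]
print(len(invoice_user_ids))  # close to 1000 × 5 = 5000; varies with the seed
```

A Poisson draw also lets a few users end up with no invoices at all. The expected share is e^-5 ≈ 0.7%, roughly 7 of 1000 users, which tends to look more realistic than giving every user exactly five.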
Claude: Preview (10 rows per table)...
users (1000 total)
| user_id | first_name | last_name | bsn | email |
|---|---|---|---|---|
| 1 | Ali | Schellekens | 391171823 | leonholthuis@example… |