Skill

synthdata-compute

Compute derived, aggregated, or transformed tables from existing datasets. Use this skill when the user needs to "compute monthly scores", "aggregate by month", "create a summary table", "derive risk scores", "compute percentile ranks", "roll up events", "create benchmarks from raw data", "add a computed column", or bridge the gap between raw generated tables and downstream analytics. Works on xlsx, csv, or json input. Claude writes the computation logic; the script handles data I/O.

Invocation

How this skill is triggered — by the user, by Claude, or both

Slash command

/synthdata:synthdata-compute

User invocable

Model invocable

Inline context

Default effort

Tool Access

This skill is limited to the following tools:

Read Bash Glob Write

Context Preview

The summary Claude sees in its skill listing — used to decide when to auto-load this skill

Derive aggregated, scored, or transformed tables from existing datasets. This skill bridges the

Supporting Files

scripts/compute.py

SKILL.md

206 lines · ~1.6k tokens

Stats

LanguagePython

Stars0

MaintenanceGood

Last CommitApr 9, 2026

Actions

View Source View Plugin View on GitHub View README

Stats

Actions

Synthdata Compute

Derive aggregated, scored, or transformed tables from existing datasets. This skill bridges the gap between raw synthetic data (produced by synthdata-generate) and the derived metrics that downstream tools expect — monthly rollups, composite scores, percentile ranks, segment summaries.

Prerequisites

pip install openpyxl pandas numpy --break-system-packages

Workflow

Step 1: Identify input and desired output

Ask the user:

Input: What file(s)? (xlsx workbook, csv directory, json files)
Computation: What do you need? (natural language — e.g., "monthly risk scores from event tables")
Output: Where should results go? (default: adds sheets/files alongside input)

Step 2: Inspect the data

Run the script in inspect mode to understand what we're working with:

python3 scripts/compute.py --inspect --input data.xlsx

This prints each table's name, column names with dtypes, row count, and a 3-row sample. Use this to understand the schema before writing computation logic.

Step 3: Write the computation

Write a Python file that operates on pre-loaded DataFrames. The script provides:

tables — dict mapping table names to pandas DataFrames (all input tables pre-loaded)
pd — pandas
np — numpy
result — empty dict; assign {name: DataFrame} pairs for each output table

Example computation file (monthly_risk.py):

import pandas as pd
import numpy as np

users = tables['users']
threats = tables['threat_events']
sims = tables['phishing_sims']
training = tables['training']
dlp = tables['dlp_events']
abuse = tables['abuse_mailbox']

# Extract month from event dates
threats['month'] = pd.to_datetime(threats['event_date']).dt.to_period('M')
sims['month'] = pd.to_datetime(sims['sim_date']).dt.to_period('M') if 'sim_date' in sims else ...

# Group and aggregate per user per month
# ... (Claude writes the full logic based on the user's requirements)

result['monthly_risk'] = monthly_risk_df

Save the computation to a .py file, then execute it.

Step 4: Execute

python3 scripts/compute.py --input data.xlsx --code monthly_risk.py --output data_with_scores.xlsx

CLI flags:

Flag	Description
`--input`	Source dataset (xlsx/csv-dir/json)
`--output`	Output path (default: `<input>_computed.<ext>`)
`--inspect`	Print table schemas and exit
`--code`	Path to a Python file with computation logic
`--expr`	Inline Python expression (for simple one-liners)
`--append`	Add computed tables to a copy of the input file (xlsx only)

Inline mode for simple computations:

python3 scripts/compute.py --input data.xlsx --output scored.xlsx \
    --expr "result['summary'] = tables['events'].groupby('category').size().reset_index(name='count')"

Step 5: Validate

After execution, the script prints:

Each output table's name, row count, column names
A 5-row sample of each output table
Confirmation of output path

Review the output with the user before proceeding.

Common Computation Recipes

Use these patterns as starting points when writing computation logic.

Monthly event rollup

events = tables['threat_events']
events['month'] = pd.to_datetime(events['event_date']).dt.to_period('M').astype(str)
monthly = events.groupby(['user_id', 'month']).agg(
    threat_events=('event_id', 'count'),
    real_clicks=('clicked', 'sum'),
    credentials_entered=('credentials_entered', 'sum'),
).reset_index()
result['monthly_threats'] = monthly

Composite score from weighted sub-scores

# Weights must sum to 1.0
WEIGHTS = {'threat': 0.35, 'sim': 0.25, 'dlp': 0.20, 'training': 0.10, 'reporting': 0.10}

df['composite'] = (
    df['threat_score'] * WEIGHTS['threat']
    + df['sim_score'] * WEIGHTS['sim']
    + df['dlp_score'] * WEIGHTS['dlp']
    + df['training_score'] * WEIGHTS['training']
    + df['reporting_score'] * WEIGHTS['reporting']
)

Risk tier assignment

def assign_tier(score):
    if score >= 75: return 'Critical'
    if score >= 50: return 'High'
    if score >= 25: return 'Medium'
    return 'Low'

df['risk_tier'] = df['composite_risk_score'].apply(assign_tier)

Percentile rank

df['percentile'] = df['composite_risk_score'].rank(pct=True) * 100

Segment summary (peer benchmarks)

peers = tables['peers_detail']
metrics = ['sim_click_rate', 'training_completion', 'composite_risk_score']
rows = []
for segment_col in ['segment_healthcare', 'segment_geo', 'segment_size']:
    cohort = peers[peers[segment_col] == 1]
    for m in metrics:
        rows.append({
            'segment': segment_col, 'metric': m,
            'mean': cohort[m].mean(), 'median': cohort[m].median(),
            'p25': cohort[m].quantile(0.25), 'p75': cohort[m].quantile(0.75),
        })
result['segment_summaries'] = pd.DataFrame(rows)

Department breakdown

users = tables['users']
risk = tables['monthly_risk']
merged = risk.merge(users[['user_id', 'department']], on='user_id')
dept = merged.groupby('department').agg(
    avg_risk=('composite_risk_score', 'mean'),
    critical_count=('risk_tier', lambda x: (x == 'Critical').sum()),
    user_count=('user_id', 'nunique'),
).sort_values('avg_risk', ascending=False).reset_index()
result['department_risk'] = dept

When to use this skill vs others

Need	Skill
Generate new rows from a schema/template	`synthdata-generate`
Extract Excel → JSON	`synthdata-extract`
Add rows/columns to existing data	`synthdata-extend`
Replace PII with fake values	`synthdata-anonymize`
Aggregate, score, rank, or derive new tables from existing data	synthdata-compute

synthdata-compute

Invocation

Tool Access

Context Preview

Supporting Files

SKILL.md

synthdata-compute

Invocation

Tool Access

Context Preview

Supporting Files

SKILL.md

Synthdata Compute

Prerequisites

Workflow

Step 1: Identify input and desired output

Step 2: Inspect the data

Step 3: Write the computation

Step 4: Execute

Step 5: Validate

Common Computation Recipes

Monthly event rollup

Composite score from weighted sub-scores

Risk tier assignment

Percentile rank

Segment summary (peer benchmarks)

Department breakdown

When to use this skill vs others

Similar Skills

Synthdata Compute

Prerequisites

Workflow

Step 1: Identify input and desired output

Step 2: Inspect the data

Step 3: Write the computation

Step 4: Execute

Step 5: Validate

Common Computation Recipes

Monthly event rollup

Composite score from weighted sub-scores

Risk tier assignment

Percentile rank

Segment summary (peer benchmarks)

Department breakdown

When to use this skill vs others

Similar Skills