From synthdata
Compute derived, aggregated, or transformed tables from existing datasets. Use this skill when the user needs to "compute monthly scores", "aggregate by month", "create a summary table", "derive risk scores", "compute percentile ranks", "roll up events", "create benchmarks from raw data", "add a computed column", or bridge the gap between raw generated tables and downstream analytics. Works on xlsx, csv, or json input. Claude writes the computation logic; the script handles data I/O.
How this skill is triggered — by the user, by Claude, or both
Slash command
/synthdata:synthdata-computeThis skill is limited to the following tools:
The summary Claude sees in its skill listing — used to decide when to auto-load this skill
Derive aggregated, scored, or transformed tables from existing datasets. This skill bridges the
Derive aggregated, scored, or transformed tables from existing datasets. This skill bridges the
gap between raw synthetic data (produced by synthdata-generate) and the derived metrics that
downstream tools expect — monthly rollups, composite scores, percentile ranks, segment summaries.
pip install openpyxl pandas numpy --break-system-packages
Ask the user:
Run the script in inspect mode to understand what we're working with:
python3 scripts/compute.py --inspect --input data.xlsx
This prints each table's name, column names with dtypes, row count, and a 3-row sample. Use this to understand the schema before writing computation logic.
Write a Python file that operates on pre-loaded DataFrames. The script provides:
tables — dict mapping table names to pandas DataFrames (all input tables pre-loaded)pd — pandasnp — numpyresult — empty dict; assign {name: DataFrame} pairs for each output tableExample computation file (monthly_risk.py):
import pandas as pd
import numpy as np
users = tables['users']
threats = tables['threat_events']
sims = tables['phishing_sims']
training = tables['training']
dlp = tables['dlp_events']
abuse = tables['abuse_mailbox']
# Extract month from event dates
threats['month'] = pd.to_datetime(threats['event_date']).dt.to_period('M')
sims['month'] = pd.to_datetime(sims['sim_date']).dt.to_period('M') if 'sim_date' in sims else ...
# Group and aggregate per user per month
# ... (Claude writes the full logic based on the user's requirements)
result['monthly_risk'] = monthly_risk_df
Save the computation to a .py file, then execute it.
python3 scripts/compute.py --input data.xlsx --code monthly_risk.py --output data_with_scores.xlsx
CLI flags:
| Flag | Description |
|---|---|
--input | Source dataset (xlsx/csv-dir/json) |
--output | Output path (default: <input>_computed.<ext>) |
--inspect | Print table schemas and exit |
--code | Path to a Python file with computation logic |
--expr | Inline Python expression (for simple one-liners) |
--append | Add computed tables to a copy of the input file (xlsx only) |
Inline mode for simple computations:
python3 scripts/compute.py --input data.xlsx --output scored.xlsx \
--expr "result['summary'] = tables['events'].groupby('category').size().reset_index(name='count')"
After execution, the script prints:
Review the output with the user before proceeding.
Use these patterns as starting points when writing computation logic.
events = tables['threat_events']
events['month'] = pd.to_datetime(events['event_date']).dt.to_period('M').astype(str)
monthly = events.groupby(['user_id', 'month']).agg(
threat_events=('event_id', 'count'),
real_clicks=('clicked', 'sum'),
credentials_entered=('credentials_entered', 'sum'),
).reset_index()
result['monthly_threats'] = monthly
# Weights must sum to 1.0
WEIGHTS = {'threat': 0.35, 'sim': 0.25, 'dlp': 0.20, 'training': 0.10, 'reporting': 0.10}
df['composite'] = (
df['threat_score'] * WEIGHTS['threat']
+ df['sim_score'] * WEIGHTS['sim']
+ df['dlp_score'] * WEIGHTS['dlp']
+ df['training_score'] * WEIGHTS['training']
+ df['reporting_score'] * WEIGHTS['reporting']
)
def assign_tier(score):
if score >= 75: return 'Critical'
if score >= 50: return 'High'
if score >= 25: return 'Medium'
return 'Low'
df['risk_tier'] = df['composite_risk_score'].apply(assign_tier)
df['percentile'] = df['composite_risk_score'].rank(pct=True) * 100
peers = tables['peers_detail']
metrics = ['sim_click_rate', 'training_completion', 'composite_risk_score']
rows = []
for segment_col in ['segment_healthcare', 'segment_geo', 'segment_size']:
cohort = peers[peers[segment_col] == 1]
for m in metrics:
rows.append({
'segment': segment_col, 'metric': m,
'mean': cohort[m].mean(), 'median': cohort[m].median(),
'p25': cohort[m].quantile(0.25), 'p75': cohort[m].quantile(0.75),
})
result['segment_summaries'] = pd.DataFrame(rows)
users = tables['users']
risk = tables['monthly_risk']
merged = risk.merge(users[['user_id', 'department']], on='user_id')
dept = merged.groupby('department').agg(
avg_risk=('composite_risk_score', 'mean'),
critical_count=('risk_tier', lambda x: (x == 'Critical').sum()),
user_count=('user_id', 'nunique'),
).sort_values('avg_risk', ascending=False).reset_index()
result['department_risk'] = dept
| Need | Skill |
|---|---|
| Generate new rows from a schema/template | synthdata-generate |
| Extract Excel → JSON | synthdata-extract |
| Add rows/columns to existing data | synthdata-extend |
| Replace PII with fake values | synthdata-anonymize |
| Aggregate, score, rank, or derive new tables from existing data | synthdata-compute |
npx claudepluginhub rappdw/synthdataProvides CDSS development patterns for drug interaction checking, dose validation, clinical scoring (NEWS2, qSOFA), and alert classification integrated into EMR workflows.