From compound-ml
Run an end-to-end analysis on a dataset: profile, cluster, detect anomalies, and generate a plain-language report. Use when the user wants a comprehensive analysis, says 'analyze this data', 'find patterns', 'what can you tell me about this dataset', or wants to go from raw data to actionable insights.
Slash command: /compound-ml:ml-analyze
The flagship analysis skill. Takes a dataset and a natural language objective, automatically selects and runs the appropriate analysis methods, reviews the results for quality, and generates a comprehensive plain-language report.
This skill orchestrates the same techniques available in ml-explore, ml-cluster, and ml-anomalies, but in a single automated pipeline. Use the individual skills for more focused, interactive work.
If no objective is provided, run a general analysis: profile, cluster, and check for anomalies.
Check for the core packages (pandas, sklearn). If any are missing, report which ones and stop.
Check for the optional packages (umap, hdbscan, sentence-transformers, matplotlib) and note which are available, since this determines which analysis methods can run.
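A minimal sketch of the package check, assuming the skill runs Python in the user's environment. It uses `importlib.util.find_spec`, so nothing is actually imported. The names are import names rather than pip names (scikit-learn imports as `sklearn`, sentence-transformers as `sentence_transformers`). The function names are illustrative and not part of compound-ml.

```python
import importlib.util

# Import names, not pip names (assumption: top-level modules only).
CORE_PACKAGES = ("pandas", "sklearn")
OPTIONAL_PACKAGES = ("umap", "hdbscan", "sentence_transformers", "matplotlib")


def is_installed(name: str) -> bool:
    """Return True if a top-level module can be located without importing it."""
    return importlib.util.find_spec(name) is not None


def check_packages(core=CORE_PACKAGES, optional=OPTIONAL_PACKAGES):
    """Return (missing core packages, availability map for optional packages)."""
    missing_core = [name for name in core if not is_installed(name)]
    available_optional = {name: is_installed(name) for name in optional}
    return missing_core, available_optional
```

Call `check_packages()` first. If the first list is non-empty, report it and stop. Otherwise, carry the availability map into planning so that, for example, embedding is skipped when `sentence_transformers` is absent.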
Load and profile the data:
Write profile to checkpoint: .ml-checkpoints/ml-analyze/<timestamp>/profile.json
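One way to lay out the checkpoint directory, sketched under two assumptions. First, the `<timestamp>` uses a `%Y%m%d-%H%M%S` format, which the skill does not specify. Second, the profile is a dict with hypothetical `n_rows`, `n_cols`, and `notes` keys computed elsewhere (for example, from `df.shape` in pandas). The summary helper produces the user-facing message described in the next step.

```python
import json
from datetime import datetime
from pathlib import Path
from typing import Optional

CHECKPOINT_ROOT = Path(".ml-checkpoints") / "ml-analyze"


def new_run_dir(root: Path = CHECKPOINT_ROOT, now: Optional[datetime] = None) -> Path:
    """Create <root>/<timestamp>/ and return it (timestamp format is an assumption)."""
    stamp = (now or datetime.now()).strftime("%Y%m%d-%H%M%S")
    run_dir = Path(root) / stamp
    run_dir.mkdir(parents=True, exist_ok=True)
    return run_dir


def write_profile(run_dir: Path, profile: dict) -> Path:
    """Write the profile to profile.json; default=str covers dtypes and dates."""
    path = Path(run_dir) / "profile.json"
    path.write_text(json.dumps(profile, indent=2, default=str))
    return path


def profile_summary(filename: str, profile: dict) -> str:
    """Format the brief summary shown to the user before planning."""
    head = f"Loaded {filename}: {profile['n_rows']} rows x {profile['n_cols']} columns."
    notes = profile.get("notes", "").strip()  # hypothetical free-text key
    return " ".join(part for part in (head, notes, "Planning analysis approach...") if part)
```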
Report a brief profile summary to the user before proceeding:
"Loaded [filename]: [N] rows x [M] columns. [Brief description of data type and notable features]. Planning analysis approach..."
Based on the data profile and the user's objective, decide which analyses to run. This is inline reasoning, not a separate agent call.
Decision framework:
| Data type | Objective signals | Analysis to run |
|---|---|---|
| Text-heavy | "topics", "themes", "categories", "group" | Clustering on text embeddings |
| Text-heavy | "unusual", "outlier", "suspicious", "different" | Anomaly detection on text embeddings |
| Text-heavy | No specific objective | Clustering + anomaly detection |
| Numeric | "segments", "groups", "types" | Clustering on features |
| Numeric | "unusual", "outlier", "anomaly" | Anomaly detection on features |
| Numeric | No specific objective | Clustering + anomaly detection |
| Mixed | Any | Embed text + scale numeric, run both |
| Any | "explore", "profile", "understand" | Extended profiling (skip clustering/anomalies) |
Always run profiling. Add clustering, anomaly detection, or both based on the table above.
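The decision table, along with the plan message in the next step, can be encoded as a small lookup. This is a sketch, not the skill's actual logic. Keyword matching is a plain substring check, and the data-type labels `"text"`, `"numeric"`, and `"mixed"` are hypothetical. Two precedence rules are also assumptions, because the table leaves them open: an explore-style objective overrides the data type, and an objective matching both clustering and anomaly keywords runs both analyses.

```python
from typing import List, Optional

# Keywords copied from the decision table; matching is a plain substring check.
EXPLORE_SIGNALS = ("explore", "profile", "understand")
SIGNALS = {
    "text": {
        "cluster": ("topics", "themes", "categories", "group"),
        "anomaly": ("unusual", "outlier", "suspicious", "different"),
    },
    "numeric": {
        "cluster": ("segments", "groups", "types"),
        "anomaly": ("unusual", "outlier", "anomaly"),
    },
}
STEP_DESCRIPTIONS = {
    "profile": "Profile the data",
    "cluster": "Find natural groups using clustering",
    "anomaly": "Flag unusual items",
}


def select_analyses(data_type: str, objective: Optional[str] = None) -> List[str]:
    """Map a data type and an optional objective to an ordered list of steps."""
    text = (objective or "").lower()
    plan = ["profile"]  # profiling always runs
    if any(word in text for word in EXPLORE_SIGNALS):
        return plan  # extended profiling only (assumed to override data type)
    if data_type == "mixed":
        return plan + ["cluster", "anomaly"]
    signals = SIGNALS[data_type]
    wants_cluster = any(word in text for word in signals["cluster"])
    wants_anomaly = any(word in text for word in signals["anomaly"])
    if not (wants_cluster or wants_anomaly):
        wants_cluster = wants_anomaly = True  # no specific objective: run both
    if wants_cluster:
        plan.append("cluster")
    if wants_anomaly:
        plan.append("anomaly")
    return plan


def plan_message(plan: List[str]) -> str:
    """Render a plan such as ["profile", "cluster"] as the user-facing message."""
    steps = ", ".join(f"[{i}] {STEP_DESCRIPTIONS[s]}" for i, s in enumerate(plan, start=1))
    return f"Analysis plan: {steps}. Proceeding..."
```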
Report the plan:
"Analysis plan: [1] Profile the data, [2] Find natural groups using clustering, [3] Flag unusual items. Proceeding..."
Follow the same embedding/representation logic as ml-cluster Phase 2:
Write to checkpoint: .ml-checkpoints/ml-analyze/<timestamp>/representations.npy
Use timeout: 600000 (10 minutes, in milliseconds) for embedding generation on large datasets.
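For numeric and mixed data, the representation step can be sketched as follows. This assumes the text embeddings were already produced, for example with sentence-transformers' `SentenceTransformer(...).encode(texts)` when that optional package is available. Numeric columns are z-scored so that no single feature dominates distances. The helper names are illustrative and may not match ml-cluster Phase 2.

```python
from pathlib import Path
from typing import Optional

import numpy as np


def scale_numeric(features: np.ndarray) -> np.ndarray:
    """Z-score each column; constant columns map to 0 instead of dividing by 0."""
    features = np.asarray(features, dtype=float)
    mean = features.mean(axis=0)
    std = features.std(axis=0)
    std[std == 0] = 1.0
    return (features - mean) / std


def build_representation(numeric: Optional[np.ndarray] = None,
                         text_embeddings: Optional[np.ndarray] = None) -> np.ndarray:
    """Concatenate scaled numeric features and text embeddings column-wise."""
    parts = []
    if numeric is not None:
        parts.append(scale_numeric(numeric))
    if text_embeddings is not None:
        parts.append(np.asarray(text_embeddings, dtype=float))
    if not parts:
        raise ValueError("need numeric features, text embeddings, or both")
    return np.hstack(parts)


def save_representation(run_dir, matrix: np.ndarray) -> Path:
    """Save the matrix as representations.npy in the run directory."""
    path = Path(run_dir) / "representations.npy"
    np.save(path, matrix)
    return path
```

Plain concatenation is the simplest reading of "embed text + scale numeric". With high-dimensional embeddings, though, the text block will outweigh a handful of numeric columns in distance calculations, so consider reweighting if that matters for the objective.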
Run each selected analysis method:
Clustering (if selected):
Follow ml-cluster Phases 3-5:
Write to checkpoint: .ml-checkpoints/ml-analyze/<timestamp>/clusters.json
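Below is a self-contained stand-in for the clustering step that uses only the core packages. It runs KMeans over a range of cluster counts and keeps the labels with the best silhouette score. This is not necessarily what ml-cluster Phases 3-5 do, since UMAP and HDBSCAN may be used when they are installed. The k range, the seed, and the layout of clusters.json are assumptions.

```python
import json
from pathlib import Path

import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score


def cluster_with_best_k(X: np.ndarray, k_values=range(2, 9), seed: int = 0):
    """Fit KMeans for each k; return (k, silhouette, labels) for the best score."""
    best = None
    for k in k_values:
        if k >= len(X):  # silhouette_score needs 2 <= k <= n_samples - 1
            break
        labels = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(X)
        score = silhouette_score(X, labels)
        if best is None or score > best[1]:
            best = (k, score, labels)
    if best is None:
        raise ValueError("not enough rows to cluster")
    return best


def write_clusters(run_dir, labels, score) -> Path:
    """Write each cluster's size and member row indices, plus the silhouette score."""
    members = {}
    for row, label in enumerate(labels):
        members.setdefault(int(label), []).append(row)
    payload = {
        "silhouette": float(score),
        "clusters": [
            {"id": cid, "size": len(rows), "rows": rows}
            for cid, rows in sorted(members.items())
        ],
    }
    path = Path(run_dir) / "clusters.json"
    path.write_text(json.dumps(payload, indent=2))
    return path
```

The silhouette score is a convenient automatic choice of k, but it favors compact, well-separated groups. That is one reason a density-based method such as HDBSCAN may be preferable when it is available.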
Anomaly detection (if selected):
Follow ml-anomalies Phases 3-4:
Write to checkpoint: .ml-checkpoints/ml-analyze/<timestamp>/anomalies.json
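Similarly, here is a stand-in for the anomaly step using scikit-learn's `IsolationForest`. Its `score_samples` method returns lower scores for more abnormal rows. ml-anomalies may combine several detectors, so the single detector, the `top_n` cutoff, and the anomalies.json layout here are assumptions.

```python
import json
from pathlib import Path

import numpy as np
from sklearn.ensemble import IsolationForest


def top_anomalies(X: np.ndarray, top_n: int = 10, seed: int = 0):
    """Return (row_index, score) pairs for the most anomalous rows, most unusual first.

    IsolationForest.score_samples gives lower values to more abnormal rows.
    """
    scores = IsolationForest(random_state=seed).fit(X).score_samples(X)
    order = np.argsort(scores)[:top_n]
    return [(int(i), float(scores[i])) for i in order]


def write_anomalies(run_dir, anomalies) -> Path:
    """Write the ranked anomalies to anomalies.json in the run directory."""
    path = Path(run_dir) / "anomalies.json"
    payload = [{"row": row, "score": score} for row, score in anomalies]
    path.write_text(json.dumps(payload, indent=2))
    return path
```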
Use timeout: 600000 (10 minutes, in milliseconds) for UMAP and clustering on large datasets.
Invoke the compound-ml:review:ml-output-reviewer agent to check results for quality issues:
If the reviewer flags issues, adjust and re-run the affected analysis step with different parameters. If issues persist after one retry, include the quality concerns in the report.
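The review-and-retry rule amounts to a single bounded retry. In this sketch, `run_step`, `review`, and `adjust` are hypothetical callables that stand in for the analysis step, the ml-output-reviewer agent, and the parameter change. In the actual skill, the reviewer is an agent invocation, not a Python function.

```python
from typing import Any, Callable, Dict, List, Tuple

Params = Dict[str, Any]


def run_with_review(run_step: Callable[[Params], Any],
                    review: Callable[[Any], List[str]],
                    adjust: Callable[[Params, List[str]], Params],
                    params: Params) -> Tuple[Any, List[str]]:
    """Run a step, review it, and retry once with adjusted params if issues are flagged.

    Returns (result, remaining_issues); remaining issues belong in the report.
    """
    result = run_step(params)
    issues = review(result)
    if not issues:
        return result, []
    result = run_step(adjust(params, issues))  # exactly one retry
    return result, review(result)
```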
Produce a comprehensive markdown report. The report must be readable by someone with no ML background, with all findings explained in plain language.
Report structure:
# Analysis Report: [filename]
**Date:** [YYYY-MM-DD]
**Objective:** [user's objective or "General analysis"]
**Data:** [N] rows x [M] columns
## Executive Summary
[3-5 sentences capturing the most important findings. Lead with actionable insights, not methodology.]
## Data Overview
[Brief profile: what the data contains, data quality notes, any sampling applied]
## Findings
### Groups Discovered
[If clustering was run, describe each group with its label, size, description, and representative examples. Use the same format as ml-cluster Phase 6.]
### Unusual Items
[If anomaly detection was run, describe the top anomalies with explanations. Use the same format as ml-anomalies Phase 6.]
## Methodology
[Brief plain-language description of which methods were used and why. Mention the embedding type, clustering algorithm, and anomaly detectors. Keep this to 2-3 sentences; it is here for reproducibility, not as the main event.]
## Recommended Next Steps
[Actionable suggestions based on findings:
- What to investigate further
- Which groups or anomalies deserve attention
- What additional data might help
- Which individual skills to use for deeper dives]
Write the report to .ml-checkpoints/ml-analyze/<timestamp>/report.md and also display it directly to the user.
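Here is a minimal sketch of assembling and saving the report. It assumes the section bodies have already been written as plain-language markdown. The function and parameter names are illustrative, and the header lines follow the report template above. Blocks are separated by blank lines so that each metadata line renders as its own paragraph.

```python
from datetime import date
from pathlib import Path
from typing import Dict, Optional


def render_report(filename: str, n_rows: int, n_cols: int, sections: Dict[str, str],
                  objective: Optional[str] = None, today: Optional[date] = None) -> str:
    """Build the report markdown; `sections` maps each ## heading to its body text."""
    blocks = [
        f"# Analysis Report: {filename}",
        f"**Date:** {(today or date.today()).isoformat()}",
        f"**Objective:** {objective or 'General analysis'}",
        f"**Data:** {n_rows} rows x {n_cols} columns",
    ]
    # Dicts keep insertion order, so sections appear in the order they were given.
    blocks += [f"## {heading}\n{body}" for heading, body in sections.items()]
    return "\n\n".join(blocks) + "\n"


def write_report(run_dir, markdown: str) -> Path:
    """Save report.md in the run directory and also display it to the user."""
    path = Path(run_dir) / "report.md"
    path.write_text(markdown, encoding="utf-8")
    print(markdown)
    return path
```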
On start, check for recent checkpoints (<24h) in .ml-checkpoints/ml-analyze/:
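Here is a sketch of the start-up check. It assumes a run directory's modification time is a good enough proxy for its age; parsing the `<timestamp>` directory name would also work. The list of checkpoint files comes from the phases above. What to do with a recent run is left to the skill, and the helper names are hypothetical.

```python
import time
from pathlib import Path
from typing import List, Optional

MAX_AGE_SECONDS = 24 * 60 * 60  # "recent" means less than 24 hours old
CHECKPOINT_FILES = ["profile.json", "representations.npy", "clusters.json",
                    "anomalies.json", "report.md"]


def recent_runs(root=Path(".ml-checkpoints") / "ml-analyze",
                now: Optional[float] = None) -> List[Path]:
    """Return run directories modified within the last 24 hours, newest first."""
    root = Path(root)
    if not root.is_dir():
        return []
    now = time.time() if now is None else now
    runs = [p for p in root.iterdir()
            if p.is_dir() and now - p.stat().st_mtime < MAX_AGE_SECONDS]
    return sorted(runs, key=lambda p: p.stat().st_mtime, reverse=True)


def completed_steps(run_dir) -> List[str]:
    """List the checkpoint files that already exist in a run directory."""
    return [name for name in CHECKPOINT_FILES if (Path(run_dir) / name).exists()]
```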
references/workflow-guide.md: overview of the analysis pipeline for curious users