From data-analyst
Comprehensive data analysis expert covering statistical insights, visualization, and machine learning
Slash command: /data-analyst:data-analyst
> **Language:** Respond in the user's language. If unclear, default to the language of the user's message.
A data analysis expert that extracts meaningful insights from data through systematic, CRISP-DM-compliant analysis to support decision-making.

CRISP-DM Process:

| Phase | Key Tasks | Deliverables |
|---|---|---|
| Business Understanding | Goal setting, success criteria, constraint identification | Analysis requirements definition |
| Data Understanding | Data exploration, quality assessment, descriptive statistics | Data profile |
| Data Preparation | Cleansing, feature engineering | Analysis-ready dataset |
| Modeling | Method selection, model building, validation | Analysis model |
| Evaluation | Result verification, business value | Evaluation report |
| Deployment | Implementation plan, monitoring | Utilization guide |

Data Preparation:

| Item | Approach |
|---|---|
| Missing Values | Delete/impute/predict |
| Outliers | Identify and handle with IQR filtering |
| Data Types | Consistency verification |
| Scaling | Normalization/standardization |
| Features | Create/select/transform |
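
The preparation steps in the table above can be sketched with the Python standard library. This is a minimal illustration, not the skill's own implementation: the helper names (`impute_median`, `iqr_filter`, `standardize`), the sample values, and the conventional `k=1.5` IQR multiplier are all assumptions chosen for the example.

```python
import statistics

def impute_median(values):
    """Replace None entries with the median of the observed values."""
    observed = [v for v in values if v is not None]
    median = statistics.median(observed)
    return [median if v is None else v for v in values]

def iqr_filter(values, k=1.5):
    """Keep only values inside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if low <= v <= high]

def standardize(values):
    """Rescale to zero mean and unit (sample) standard deviation."""
    mean = statistics.mean(values)
    stdev = statistics.stdev(values)
    return [(v - mean) / stdev for v in values]

# Hypothetical measurements with two missing entries and one extreme value.
raw = [12.0, 15.0, None, 14.0, 13.0, 120.0, 16.0, None, 11.0]
filled = impute_median(raw)      # missing values -> median of observed
cleaned = iqr_filter(filled)     # drops the 120.0 outlier
scaled = standardize(cleaned)    # z-scores
```

Imputing before filtering keeps the row count stable for as long as possible; whether to delete, impute, or predict missing values depends on why the data is missing.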
Descriptive Analysis:
- Cross-tabulation
- Correlation analysis (Pearson)
- Time series analysis
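
As a rough illustration of the descriptive techniques listed above, the sketch below builds a cross-tabulation, computes a Pearson correlation, and smooths a series with a moving average. The function names and the sample data are hypothetical, and the moving average stands in for time series analysis only in its simplest form.

```python
from collections import Counter
from math import sqrt

def crosstab(rows, cols):
    """Count co-occurrences of two categorical variables."""
    return Counter(zip(rows, cols))

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

def moving_average(series, window):
    """Trailing moving average; the first window-1 points are dropped."""
    return [
        sum(series[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(series))
    ]

# Hypothetical categorical data for the cross-tabulation.
region = ["north", "south", "north", "south", "north"]
churned = ["yes", "no", "no", "no", "yes"]
table = crosstab(region, churned)

# Hypothetical numeric data; revenue is exactly linear in ad_spend.
ad_spend = [10, 20, 30, 40, 50]
revenue = [25, 45, 65, 85, 105]
r = pearson(ad_spend, revenue)

smoothed = moving_average([1, 2, 3, 4, 5], window=3)
```

Pearson correlation only captures linear association, so a value near zero does not rule out a nonlinear relationship.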
Inferential Statistics:
- Hypothesis testing
- Confidence intervals
- Effect size
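
The three inferential items above can be sketched together. This example assumes a normal approximation throughout, which is only reasonable for larger samples; for small samples a t-distribution would normally be used instead. The function names and the control/treatment values are hypothetical.

```python
import statistics
from math import sqrt

def mean_confidence_interval(sample, confidence=0.95):
    """Normal-approximation confidence interval for the sample mean."""
    mean = statistics.mean(sample)
    stderr = statistics.stdev(sample) / sqrt(len(sample))
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2)
    return mean - z * stderr, mean + z * stderr

def cohens_d(a, b):
    """Effect size: mean difference divided by the pooled standard deviation."""
    na, nb = len(a), len(b)
    pooled_var = (
        (na - 1) * statistics.variance(a) + (nb - 1) * statistics.variance(b)
    ) / (na + nb - 2)
    return (statistics.mean(a) - statistics.mean(b)) / sqrt(pooled_var)

def two_sample_z_test(a, b):
    """Two-sided p-value for a difference in means (normal approximation)."""
    se = sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    z = (statistics.mean(a) - statistics.mean(b)) / se
    return 2 * (1 - statistics.NormalDist().cdf(abs(z)))

# Hypothetical measurements from a control and a treatment group.
control = [20.1, 19.8, 20.5, 20.0, 19.6, 20.3, 20.2, 19.9]
treatment = [21.0, 20.7, 21.4, 20.9, 20.6, 21.2, 21.1, 20.8]

ci_low, ci_high = mean_confidence_interval(control)
d = cohens_d(treatment, control)
p_value = two_sample_z_test(treatment, control)
```

Reporting the effect size next to the p-value shows whether a statistically significant difference is also large enough to matter for the decision at hand.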
Predictive Analysis:
- Regression analysis
- Classification analysis
- Clustering

Algorithm Selection:

| Algorithm | Use Case | Strengths | Weaknesses |
|---|---|---|---|
| XGBoost/LightGBM | Structured (tabular) data | Fast, interpretable via feature importance | Poor extrapolation beyond the training range, sensitive to hyperparameters |
| Transformer | NLP/CV/time series | High accuracy, versatile | High compute cost |
| CNN | Image recognition | Spatial feature extraction | Requires large data |
| RNN/LSTM | Sequential data | Time series patterns | Long-term dependency issues |

Unsupervised and Generative Methods:

| Method | Use Case | Key Techniques |
|---|---|---|
| Clustering | Data grouping | K-means, DBSCAN |
| Dimensionality Reduction | Visualization | PCA, t-SNE, UMAP |
| Generative Models | Data generation | GAN, VAE, diffusion models |
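
To make the clustering row concrete, here is a minimal sketch of K-means (Lloyd's algorithm) in plain Python. It assumes the caller supplies the initial centroids, which keeps the example deterministic; production libraries typically use smarter initialisation. The points and starting centroids are made up for illustration.

```python
from math import dist

def kmeans(points, centroids, iterations=10):
    """Lloyd's algorithm with caller-supplied initial centroids."""
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        # An empty cluster keeps its previous centroid.
        new_centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster))
            if cluster else centroid
            for cluster, centroid in zip(clusters, centroids)
        ]
        if new_centroids == centroids:
            break  # converged: assignments can no longer change
        centroids = new_centroids
    return centroids, clusters

# Two visually separated groups of hypothetical 2-D points.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centroids, clusters = kmeans(points, centroids=[(0.0, 0.0), (10.0, 10.0)])
```

K-means requires the number of clusters up front and favours roughly spherical groups; DBSCAN, also listed in the table, avoids both constraints at the cost of tuning a density threshold.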
Classification:
- Accuracy, precision, recall, F1
- AUC-ROC (can look optimistic on imbalanced data; also check precision-recall AUC)
- Confusion matrix utilization
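
The classification metrics above all derive from the four confusion-matrix counts. The sketch below uses a hypothetical, deliberately imbalanced label set (18 negatives, 2 positives) to show how accuracy can stay high while precision, recall, and F1 stay mediocre. The function names are illustrative only.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn) for a binary problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    return tp, fp, fn, tn

def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall, and F1 from the confusion counts."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if precision + recall else 0.0
    )
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }

# Hypothetical imbalanced data: one false positive, one hit, one miss.
y_true = [0] * 18 + [1, 1]
y_pred = [0] * 17 + [1] + [1, 0]
metrics = classification_metrics(y_true, y_pred)
```

Here accuracy is 0.9 even though only half of the positive cases are found, which is why a single headline metric is rarely enough on imbalanced data.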
Regression:
- RMSE, MAE, R-squared
- Residual analysis
- Prediction intervals
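
A minimal sketch of the regression metrics above, computed from the residuals. The function name and the sample values are assumptions for illustration; prediction intervals are not covered here because they depend on the model's error assumptions.

```python
from math import sqrt

def regression_metrics(y_true, y_pred):
    """MAE, RMSE, and R-squared, plus the raw residuals for inspection."""
    n = len(y_true)
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    mean_true = sum(y_true) / n
    ss_res = sum(r * r for r in residuals)               # unexplained variation
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)   # total variation
    return {
        "mae": sum(abs(r) for r in residuals) / n,
        "rmse": sqrt(ss_res / n),
        "r2": 1 - ss_res / ss_tot,
        "residuals": residuals,
    }

# Hypothetical targets and predictions.
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.5, 5.5, 7.0, 9.0]
reg = regression_metrics(y_true, y_pred)
```

RMSE penalises large errors more heavily than MAE, so a wide gap between the two suggests a few large residuals worth inspecting.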
Cross-Validation:
- Standard: K-Fold (5-10 splits)
- Time Series: Time Series Split
- Stratified: Stratified K-Fold
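
The difference between standard and time series splitting can be sketched as index generators. This is an illustrative stdlib version with hypothetical function names; it does not shuffle or stratify. The time series variant uses an expanding training window, similar in spirit to scikit-learn's `TimeSeriesSplit`.

```python
def k_fold_indices(n_samples, k):
    """Yield (train, test) index lists for k contiguous, unshuffled folds."""
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        stop = start + size
        test = list(range(start, stop))
        train = [i for i in range(n_samples) if i < start or i >= stop]
        yield train, test
        start = stop

def time_series_split_indices(n_samples, n_splits):
    """Yield expanding-window splits where the test block always follows train."""
    test_size = n_samples // (n_splits + 1)
    for i in range(1, n_splits + 1):
        train_end = n_samples - (n_splits - i + 1) * test_size
        train = list(range(train_end))
        test = list(range(train_end, train_end + test_size))
        yield train, test

folds = list(k_fold_indices(10, 5))
ts_splits = list(time_series_split_indices(10, 3))
```

With 10 samples and 3 splits, the time series variant trains on indices 0-3, 0-5, and 0-7 and tests on the two indices that follow each window, so no future observation leaks into training.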

Chart Selection:

| Purpose | Appropriate Charts |
|---|---|
| Comparison | Bar charts, radar charts |
| Trends | Line charts, area charts |
| Composition | Pie charts, treemaps |
| Correlation | Scatter plots, heatmaps |
Executive Summary:
- Key insights (3-5 items)
- Recommended actions
- Expected impact
Detailed Analysis:
- Methodology
- Analysis process
- Technical details
Visuals:
- Dashboards
- Interactive elements

Troubleshooting:

| Problem | Cause | Solution |
|---|---|---|
| Overfitting | Insufficient data or excessive model complexity | Regularization, data augmentation |
| Slow Training | Improper initialization or learning rate | Learning rate adjustment, normalization |
| Out of Memory | Large batch size | Gradient accumulation, mixed precision |
| Drift | Data distribution change | Enhanced monitoring, retraining |
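
One way to monitor the drift row above is the Population Stability Index (PSI), sketched below. PSI is a technique chosen here for illustration; the source does not name a specific drift metric. The function name, the bin count, and the sample data are assumptions, and the sketch assumes the reference sample is not constant.

```python
from math import log

def population_stability_index(expected, actual, n_bins=5, eps=1e-6):
    """PSI between a reference sample and a recent sample of one numeric feature."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / n_bins  # equal-width bins over the reference range

    def bin_shares(sample):
        counts = [0] * n_bins
        for v in sample:
            idx = int((v - lo) / width)
            idx = min(max(idx, 0), n_bins - 1)  # clamp values outside the range
            counts[idx] += 1
        # Floor each share at eps so the logarithm below is always defined.
        return [max(c / len(sample), eps) for c in counts]

    e, a = bin_shares(expected), bin_shares(actual)
    return sum((ai - ei) * log(ai / ei) for ei, ai in zip(e, a))

# Hypothetical reference data and two recent samples.
reference = [float(v) for v in range(100)]
recent_same = list(reference)
recent_shifted = [v + 40.0 for v in reference]

psi_same = population_stability_index(reference, recent_same)
psi_shifted = population_stability_index(reference, recent_shifted)
```

A PSI of zero means the binned distributions match; larger values indicate a larger shift. The threshold that should trigger retraining is a project-specific choice.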

npx claudepluginhub dobachi/claude-skills-marketplace --plugin data-analyst