Slash Command

/compare-models

Compares multiple ML models on a shared test dataset, evaluating metrics, statistical significance, inference performance, cost, and robustness, then generates a report with tables, rankings, and recommendations.

Hugging Face

Popularity

Parent stars

1,716

Parent forks

535

Invocation

How this command is triggered — by the user, by Claude, or both

Slash command

/model-evaluator:compare-models

Model invocable

No pre-commands

Context Preview

The summary Claude sees in its command listing — used to decide when to auto-load this command

# /compare-models - Compare ML Models

Compare multiple ML models to select the best performer.

## Steps

1. Ask the user for the models to compare and the evaluation dataset
2. Load all models and verify they accept the same input format
3. Run inference with each model on the identical test dataset
4. Calculate the same metrics for all models for fair comparison
5. Create a side-by-side comparison table with all metrics
6. Perform statistical significance testing between model pairs (McNemar, paired t-test)
7. Compare inference performance: latency, throughput, memory footprint
8. Calcul...

Command Content

29 lines · ~357 tokens

Stats

Language: JavaScript

Maintenance: Fair

Last Commit: Feb 4, 2026

/compare-models - Compare ML Models

Compare multiple ML models to select the best performer.

Steps

1. Ask the user for the models to compare and the evaluation dataset
2. Load all models and verify they accept the same input format
3. Run inference with each model on the identical test dataset
4. Calculate the same metrics for all models for fair comparison
5. Create a side-by-side comparison table with all metrics
6. Perform statistical significance testing between model pairs (McNemar, paired t-test)
7. Compare inference performance: latency, throughput, memory footprint
8. Calculate the cost-performance trade-off: accuracy vs compute cost
9. Identify which model performs best on specific data subsets
10. Evaluate robustness: test with noisy or adversarial inputs
11. Create a recommendation based on the use case priorities (accuracy vs speed vs cost)
12. Generate a comparison report with tables, rankings, and the recommended model
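
The significance-testing step names McNemar's test for comparing model pairs. A minimal sketch of the exact (binomial) variant follows, assuming two classifiers whose predictions on the shared test set are available as equal-length lists. The function name `mcnemar_exact` and its return shape are illustrative choices, not something the command defines, and only the Python standard library is used.

```python
from math import comb


def mcnemar_exact(y_true, pred_a, pred_b):
    """Two-sided exact McNemar test on paired predictions.

    Returns (b, c, p_value), where b counts examples only model A
    classified correctly and c counts examples only model B
    classified correctly. Hypothetical helper for illustration.
    """
    if not (len(y_true) == len(pred_a) == len(pred_b)):
        raise ValueError("all inputs must have the same length")

    b = c = 0
    for truth, a, m in zip(y_true, pred_a, pred_b):
        a_ok, m_ok = (a == truth), (m == truth)
        if a_ok and not m_ok:
            b += 1
        elif m_ok and not a_ok:
            c += 1

    n = b + c  # discordant pairs; concordant pairs carry no information
    if n == 0:
        return b, c, 1.0

    # Under the null hypothesis, each discordant pair favors either
    # model with probability 0.5, so min(b, c) ~ Binomial(n, 0.5).
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return b, c, min(1.0, 2 * tail)
```

For example, if model A is right on six examples where model B is wrong and B never beats A, the call returns `(6, 0, 0.03125)`. The paired t-test mentioned in the same step applies instead when the per-example quantity is continuous (for instance per-fold scores or per-example losses).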

Rules

Use the exact same test data and preprocessing for all models
Apply statistical significance tests; do not rely on point estimates alone
Consider practical significance, not just statistical significance
Include model size and inference cost in the comparison
Test edge cases that differentiate the models
Report the evaluation methodology for reproducibility
Consider deployment constraints (model size, latency requirements) in recommendations
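
One way the rules on practical significance and deployment constraints could feed into a recommendation is sketched below. It is an assumption-laden illustration, not the command's actual logic: the `ModelResult` fields, the `recommend` function, and the `min_gain` threshold are all hypothetical names introduced here. Models that violate the latency or size limits are dropped first; among the rest, accuracy gaps smaller than `min_gain` are treated as practically equivalent and the cheaper model wins.

```python
from dataclasses import dataclass


@dataclass
class ModelResult:
    # All fields are hypothetical; adapt them to the metrics collected.
    name: str
    accuracy: float        # fraction correct on the shared test set
    p95_latency_ms: float  # 95th-percentile inference latency
    size_mb: float         # model size on disk
    cost_per_1k: float     # compute cost per 1,000 predictions


def recommend(results, max_latency_ms, max_size_mb, min_gain=0.01):
    """Pick a model under deployment constraints.

    Returns None when no model satisfies the constraints.
    """
    feasible = [
        r for r in results
        if r.p95_latency_ms <= max_latency_ms and r.size_mb <= max_size_mb
    ]
    if not feasible:
        return None

    best_acc = max(r.accuracy for r in feasible)
    # Accuracy gaps below min_gain are treated as not practically
    # meaningful, so those models stay in contention.
    contenders = [r for r in feasible if best_acc - r.accuracy < min_gain]
    # Among practically equivalent models, prefer lower cost, then
    # higher accuracy as a tie-breaker.
    return min(contenders, key=lambda r: (r.cost_per_1k, -r.accuracy))
```

The `min_gain` default of one accuracy point is arbitrary; a real threshold should come from the use case priorities gathered from the user, and it complements rather than replaces the statistical significance tests.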
