From morning-ai
Track AI model leaderboard rankings over time and detect rank/score changes between snapshots.
Slash command
/morning-ai:leaderboard
Track AI model leaderboard rankings over time using SQLite snapshots. Detect new models, removed models, rank changes, and score changes between dates.
| Leaderboard | URL | Modality |
|---|---|---|
| LMSYS Chatbot Arena | https://lmsys.org | Text, Vision |
| LMArena | https://lmarena.ai | Text, Vision |
| HuggingFace Open LLM | https://huggingface.co/spaces/open-llm-leaderboard | Text |
| Artificial Analysis | https://artificialanalysis.ai | Text, Image, Video |
| Scale AI SEAL | https://scale.com/leaderboard | Text |
cd {SKILL_DIR} && python3 skills/leaderboard/scripts/leaderboard_snapshot.py save \
--leaderboard "chatbot-arena" \
--date 2026-04-14 \
--data '[{"model": "claude-4-opus", "rank": 1, "score": 1350}]'
Prints the diff against the previous snapshot (new models, rank changes, score changes).
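The comparison behind that diff can be illustrated with a small sketch. This is not the script's actual code: `diff_entries`, the `Entry` type, and the shape of the returned dictionary are hypothetical names chosen for illustration, and the model names are placeholders. It assumes each snapshot is a list of entries shaped like the `--data` payload above.

```python
from typing import Dict, List, TypedDict


class Entry(TypedDict):
    model: str
    rank: int
    score: float


def diff_entries(previous: List[Entry], current: List[Entry]) -> Dict[str, list]:
    """Compare two snapshots of the same leaderboard, keyed by model name."""
    prev = {e["model"]: e for e in previous}
    curr = {e["model"]: e for e in current}

    changes: Dict[str, list] = {
        "new": sorted(curr.keys() - prev.keys()),      # only in the newer snapshot
        "removed": sorted(prev.keys() - curr.keys()),  # only in the older snapshot
        "rank_changes": [],
        "score_changes": [],
    }
    # Models present in both snapshots can change rank, score, or both.
    for model in sorted(curr.keys() & prev.keys()):
        old, new = prev[model], curr[model]
        if old["rank"] != new["rank"]:
            changes["rank_changes"].append((model, old["rank"], new["rank"]))
        if old["score"] != new["score"]:
            changes["score_changes"].append((model, old["score"], new["score"]))
    return changes


# Placeholder data, not real leaderboard results.
previous = [
    {"model": "model-a", "rank": 1, "score": 1300},
    {"model": "model-b", "rank": 2, "score": 1290},
    {"model": "model-c", "rank": 3, "score": 1200},
]
current = [
    {"model": "model-b", "rank": 1, "score": 1310},
    {"model": "model-a", "rank": 2, "score": 1300},
    {"model": "model-d", "rank": 3, "score": 1250},
]
result = diff_entries(previous, current)
```

Keying both snapshots by model name turns the four categories into set operations: new and removed models are set differences, and rank or score changes are checked only on the intersection.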
cd {SKILL_DIR} && python3 skills/leaderboard/scripts/leaderboard_snapshot.py latest \
--leaderboard "chatbot-arena"
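A `latest` lookup of this kind could be implemented with a single SQLite query. The sketch below is an assumption about how that might look, not the script's real schema or code: the `snapshots` table name, its primary key, and the `latest_snapshot` helper are hypothetical, and the rows are placeholder data. It uses an in-memory database so nothing on disk is touched.

```python
import sqlite3
from typing import List, Tuple

# Hypothetical schema: one row per (leaderboard, model, snapshot date).
SCHEMA = """
CREATE TABLE IF NOT EXISTS snapshots (
    leaderboard   TEXT NOT NULL,
    model         TEXT NOT NULL,
    rank          INTEGER NOT NULL,
    score         REAL,
    snapshot_date TEXT NOT NULL,
    PRIMARY KEY (leaderboard, model, snapshot_date)
)
"""


def latest_snapshot(
    conn: sqlite3.Connection, leaderboard: str
) -> List[Tuple[str, int, float]]:
    """Return (model, rank, score) rows from the most recent snapshot date."""
    return conn.execute(
        """
        SELECT model, rank, score
        FROM snapshots
        WHERE leaderboard = ?
          AND snapshot_date = (
              SELECT MAX(snapshot_date) FROM snapshots WHERE leaderboard = ?
          )
        ORDER BY rank
        """,
        (leaderboard, leaderboard),
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
conn.executemany(
    "INSERT INTO snapshots VALUES (?, ?, ?, ?, ?)",
    [
        ("chatbot-arena", "model-a", 1, 1340.0, "2026-04-13"),
        ("chatbot-arena", "model-a", 2, 1341.0, "2026-04-14"),
        ("chatbot-arena", "model-b", 1, 1350.0, "2026-04-14"),
    ],
)
rows = latest_snapshot(conn, "chatbot-arena")
```

Storing dates as ISO 8601 text (`YYYY-MM-DD`) means `MAX(snapshot_date)` picks the most recent date with a plain string comparison.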
Snapshots are stored in ~/.cache/morning-ai/leaderboard.db (SQLite). Each entry has:
- leaderboard: leaderboard identifier
- model: model name
- rank: position on the leaderboard
- score: numeric score (Elo, accuracy, etc.)
- snapshot_date: date of the snapshot

This skill is currently a standalone utility. It can be integrated into the main morning-ai workflow as a Benchmark data source:
- TrackerItem entries for rank changes

To integrate, a collector module (lib/leaderboard_collector.py) would:

- save_snapshot() to persist
- diff_snapshot() to detect changes
- TrackerItem objects with Benchmark type

npx claudepluginhub octo-patch/morningai --plugin morning-ai
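The collector steps described above could be wired together roughly as follows. Everything here is an assumption made for illustration: the `TrackerItem` fields, the `collect` function, and the argument and return shapes of `save_snapshot()` and `diff_snapshot()` are hypothetical, since the source names these functions but does not show their signatures. The two functions are passed in as callables so the sketch runs without the real modules.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class TrackerItem:  # hypothetical stand-in for morning-ai's own TrackerItem
    title: str
    item_type: str


def collect(
    leaderboard: str,
    date: str,
    entries: List[dict],
    save_snapshot: Callable[[str, str, List[dict]], None],
    diff_snapshot: Callable[[str, str], List[Dict]],
) -> List[TrackerItem]:
    """Persist a snapshot, diff it, and wrap each rank change as a TrackerItem."""
    save_snapshot(leaderboard, date, entries)
    items = []
    for change in diff_snapshot(leaderboard, date):
        title = (
            f"{change['model']}: rank {change['old_rank']} -> "
            f"{change['new_rank']} on {leaderboard}"
        )
        items.append(TrackerItem(title=title, item_type="Benchmark"))
    return items


# Stub callables standing in for the real persistence and diff functions.
saved = []
items = collect(
    "chatbot-arena",
    "2026-04-14",
    [{"model": "model-a", "rank": 1, "score": 1350}],
    save_snapshot=lambda lb, d, e: saved.append((lb, d, len(e))),
    diff_snapshot=lambda lb, d: [{"model": "model-a", "old_rank": 2, "new_rank": 1}],
)
```

Injecting the two functions keeps the collector testable in isolation; a real integration would import them from the skill's script instead.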