By rohitg00
Evaluate a single ML model or compare several on test datasets across classification, regression, NLP, and generative tasks. Compute metrics, statistical significance, inference performance, and cost; run robustness and bias checks; and generate visual reports with confusion matrices, performance profiles, tables, rankings, and recommendations.
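This page does not show the plugin's own implementation. As a hedged illustration of the comparison described above, here is a minimal stdlib-only Python sketch. It assumes two classifiers scored on the same labelled test set, builds a confusion matrix, and runs a two-sided exact McNemar test on the examples where the models disagree. The function names (`confusion_matrix`, `accuracy`, `mcnemar_exact_p`) are hypothetical and are not the plugin's API.

```python
from math import comb


def confusion_matrix(y_true, y_pred, labels):
    """Count predictions: rows are true labels, columns are predicted labels."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix


def accuracy(y_true, y_pred):
    """Fraction of examples where the prediction matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def mcnemar_exact_p(y_true, pred_a, pred_b):
    """Two-sided exact McNemar p-value for paired predictions of models A and B.

    Only discordant pairs matter: b = A right and B wrong, c = A wrong and B right.
    Under the null hypothesis b ~ Binomial(b + c, 0.5).
    """
    rows = list(zip(y_true, pred_a, pred_b))
    b = sum(pa == t and pb != t for t, pa, pb in rows)
    c = sum(pa != t and pb == t for t, pa, pb in rows)
    n = b + c
    if n == 0:
        return 1.0  # the models never disagree, so there is no evidence of a difference
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Toy data for illustration only (not results from the plugin).
y_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]
pred_a = [1, 0, 1, 1, 0, 1, 0, 1, 1, 1]
pred_b = [1, 1, 0, 1, 0, 0, 0, 0, 1, 0]

print(confusion_matrix(y_true, pred_a, labels=[0, 1]))  # [[3, 1], [0, 6]]
print(accuracy(y_true, pred_a), accuracy(y_true, pred_b))  # 0.9 0.6
print(mcnemar_exact_p(y_true, pred_a, pred_b))  # 0.375
```

The exact binomial form suits small test sets. With many discordant pairs, a chi-squared approximation is the usual alternative. Even with 90% versus 60% accuracy, ten examples give p = 0.375. This is why a significance check belongs next to the raw metrics.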
npx claudepluginhub rohitg00/awesome-claude-code-toolkit --plugin model-evaluator
Persistent memory for AI coding agents: captures tool usage, compresses it via an LLM, and injects context into future sessions. 12 hooks, 41 MCP tools, 4 skills, real-time viewer.
Complete AI coding workflow system. Self-correcting memory + persistent FTS5-indexed research wikis + auto-research loop + multi-LLM council on a single SQLite store. 33 skills, 8 agents, 22 commands, 37 hook scripts across 24 events. Cross-agent via SkillKit.
Complete developer toolkit for Claude Code
GitHub issue triage, creation, and management
Google Cloud Platform service configuration and deployment
Comprehensive model evaluation with multiple metrics
ML experiment tracking with metrics logging and run comparison
ML/perf investigation skills: topic, plan, judge, run, sweep
ML engineering plugin: Give your AI coding agent ML engineering superpowers.
Agent Skills for AI/ML tasks including dataset creation, model training, evaluation, and research paper publishing on Hugging Face Hub
Agent and skill evaluation harness with MLflow integration