ClaudePluginHub

Community directory for discovering and installing Claude Code plugins.


© 2025 ClaudePluginHub

Community Maintained · Not affiliated with Anthropic

Skill

mlx-inference-optimizer

From mlx-optimizer

Optimize Python MLX inference and generation loops with warmup, batching, cache handling, synchronization, quantization, and memory checks.

Invocation

How this skill is triggered — by the user, by Claude, or both

Slash command

/mlx-optimizer:mlx-inference-optimizer

User invocable

Model invocable

Inline context

Default effort

Context Preview

The summary Claude sees in its skill listing — used to decide when to auto-load this skill

Use this skill for MLX inference, generation, serving loops, batch scoring,

SKILL.md

39 lines · ~315 tokens

Stats

Language: Python

Parent stars: 0

Maintenance: Excellent

Last Commit: Jun 15, 2026


MLX Inference Optimizer

Use this skill for MLX inference, generation, serving loops, batch scoring, streaming output, or latency/throughput questions.

Python Environment

Before any Python execution, use the target repo's .venv. Never install Python packages globally.
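
One way to guard against accidentally running outside the repo's environment is a small check at the top of a benchmark script. This is a minimal sketch using only the standard library; the helper name `in_virtualenv` is hypothetical and not part of the skill.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the interpreter is running inside a virtual environment."""
    # Inside a venv, sys.prefix points at the environment, while
    # sys.base_prefix points at the interpreter it was created from.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    if not in_virtualenv():
        raise SystemExit("Activate the target repo's .venv before running this script.")
```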

Required References

../../references/inference-patterns.md
../../references/eval-and-synchronization.md
../../references/memory-and-dtypes.md

Workflow

Identify the exact inference entry point and representative inputs.
Separate first-output latency from steady-state throughput.
Inspect warmup, batching, scalar extraction, streaming sync, cache growth, quantized model loading, and dtype policy.
Confirm the benchmark forces completion before stopping timers.
Verify output correctness or equivalence before accepting speedups.
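
The timing steps above can be sketched as a small harness. This is an illustrative outline under stated assumptions, not the skill's own implementation: the function name `benchmark`, its parameters, and the returned keys are all hypothetical. It records the first call separately from steady-state runs and calls a caller-supplied `sync` function before each timer stops.

```python
import statistics
import time
from typing import Any, Callable, Dict


def benchmark(
    fn: Callable[[Any], Any],
    example_input: Any,
    sync: Callable[[Any], Any] = lambda out: out,
    warmup: int = 3,
    runs: int = 10,
) -> Dict[str, float]:
    """Time fn(example_input), forcing completion before each timer stops."""

    def timed_call() -> float:
        start = time.perf_counter()
        out = fn(example_input)
        sync(out)  # force any lazy work to finish before reading the clock
        return time.perf_counter() - start

    # The very first call includes one-time costs, so it is reported separately.
    first_call_s = timed_call()

    # Additional warmup runs are executed but not recorded.
    for _ in range(warmup):
        timed_call()

    # Steady-state samples.
    samples = [timed_call() for _ in range(runs)]
    return {
        "first_call_s": first_call_s,
        "median_s": statistics.median(samples),
        "min_s": min(samples),
        "max_s": max(samples),
    }
```

With MLX, which evaluates arrays lazily, a reasonable choice is to pass `sync=mx.eval` (after `import mlx.core as mx`) so the computation has actually run before the timer stops. The default `sync` here does nothing and is only suitable for eager code.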

Evidence To Capture

Prompt/input shape and batch size.
Warmup and measured run counts.
Synchronization boundary.
Median and range of wall time.
Memory telemetry.
Correctness or output-equivalence rule.

$ npx claudepluginhub sealad886/mlx-optimizer-plugin --plugin mlx-optimizer
