From model-shelf
Resolves Hugging Face models locally via `model-shelf` (GGUF, MLX, safetensors) instead of direct download. Triggers on load/run/use requests for local LLMs.
Slash command
/model-shelf:resolve
When the user wants to load, run, or use a Hugging Face model — **always**
go through model-shelf. Do not invoke huggingface-cli download, hf download, snapshot_download, or any other direct download command.
The user does not need to give you an exact org/repo id. Loose
descriptions ("qwen 3 4b mlx 4-bit", "the latest llama 3.1") are normal
and expected. Do not push back on whether a model exists — your training
data is stale and Model Shelf can search the live Hub.
1. Decide whether to search first.
   - If the user gave an exact org/repo string (e.g. Qwen/Qwen3-14B-GGUF),
     skip to step 2.
   - Otherwise, search the Hub:
       model-shelf find "<user's words>" [--format gguf|mlx|safetensors] --json --limit 5
     Use any format hint from the user (mlx, gguf, safetensors). Pick the
     top result that matches the user's format/quant intent. Use its repo_id
     as the input to step 2. If find returns nothing, tell the user no
     matching model was found; do not invent a repo id.
2. Resolve the repo to a local path.
       model-shelf resolve <repo_id> [--format gguf|mlx|safetensors] [--quant <QUANT>] --json
   --format is auto-detected from repo_id if omitted:
   - *-GGUF (case-insensitive) → gguf
   - mlx-community/* or *-mlx → mlx
   - anything else → safetensors
   --quant is required for gguf (e.g. Q4_K_M); ignored otherwise.
3. Use the returned path with the user's runtime:
   - gguf: llama.cpp / llama-server / Ollama / LM Studio
   - mlx: mlx_lm.generate / mlx_lm.server (Apple Silicon)
   - safetensors: transformers / vllm

Error handling:
- If the result has status == "missing", downloads are disabled in the
  user's config. Surface that to the user and stop.
- If model-shelf exits non-zero with a message on stderr, surface
  the error verbatim and stop. Do not work around it: don't fall
  back to huggingface-cli, don't change paths, don't retry. Common causes:
  - The shelf has not been set up yet. Ask the user to run model-shelf init.
    Don't run it for them unless they explicitly ask; the curated shelf
    is a deliberate one-time setup the user owns.

Loose user input, search first:
User: "fetch qwen 3 4b in mlx 4-bit"
You: model-shelf find "qwen3 4b 4-bit" --format mlx --json --limit 5
# pick top result, e.g. mlx-community/Qwen3-4B-4bit
model-shelf resolve "mlx-community/Qwen3-4B-4bit" --json
Explicit repo — resolve directly:
User: "load Qwen/Qwen3-14B-GGUF with Q4_K_M"
You: model-shelf resolve "Qwen/Qwen3-14B-GGUF" --quant Q4_K_M --json
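
The resolve step and the error-handling rules above can be sketched as a small Python wrapper. This is an illustration only, not part of model-shelf: the helper names (`detect_format`, `build_resolve_command`, `interpret_result`, `resolve`) are hypothetical, the `"path"` key in the JSON output is an assumption, and so are the safetensors fallback and the case-sensitive matching of the mlx patterns. Only the `status == "missing"` check and the command-line flags come from the text above.

```python
import json
import subprocess


def detect_format(repo_id: str) -> str:
    """Guess the format from a repo id, following the rule listed in step 2."""
    # Only the GGUF suffix is documented as case-insensitive.
    if repo_id.lower().endswith("-gguf"):
        return "gguf"
    # Assumption: the mlx patterns are matched case-sensitively.
    if repo_id.startswith("mlx-community/") or repo_id.endswith("-mlx"):
        return "mlx"
    # Assumption: anything unmatched falls back to safetensors.
    return "safetensors"


def build_resolve_command(repo_id, fmt=None, quant=None):
    """Build the argv for `model-shelf resolve <repo_id> ... --json`."""
    effective = fmt or detect_format(repo_id)
    if effective == "gguf" and not quant:
        raise ValueError("--quant is required for gguf (e.g. Q4_K_M)")
    cmd = ["model-shelf", "resolve", repo_id]
    if fmt:
        cmd += ["--format", fmt]
    if effective == "gguf":
        # --quant is ignored for other formats, so it is only passed here.
        cmd += ["--quant", quant]
    return cmd + ["--json"]


def interpret_result(returncode, stdout, stderr):
    """Apply the error-handling rules and return the local path."""
    if returncode != 0:
        # Surface stderr verbatim and stop: no fallback, no retry.
        raise RuntimeError(stderr)
    result = json.loads(stdout)
    if result.get("status") == "missing":
        raise RuntimeError(
            "model is not on the shelf and downloads are disabled in the config"
        )
    # Assumption: the JSON payload exposes the local path under a "path" key.
    return result["path"]


def resolve(repo_id, fmt=None, quant=None):
    """Run model-shelf once and hand back the local path or raise."""
    proc = subprocess.run(
        build_resolve_command(repo_id, fmt, quant),
        capture_output=True,
        text=True,
    )
    return interpret_result(proc.returncode, proc.stdout, proc.stderr)
```

Keeping command construction and result interpretation separate from the `subprocess.run` call means both rules can be checked without model-shelf installed; `resolve` itself deliberately has no retry or fallback path.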
npx claudepluginhub alexziskind1/model-shelf