From ollama
Guidance on selecting and prompting open-weight Ollama models for review and rescue tasks
How this skill is triggered — by the user, by Claude, or both
Slash command
/ollama:ollama-model-promptingThe summary Claude sees in its skill listing — used to decide when to auto-load this skill
Reference this skill when deciding which model to use and how to shape prompts for open-weight models.
Reference this skill when deciding which model to use and how to shape prompts for open-weight models.
Empirically battle-tested. See docs/MODELS.md for the full results table, finding-count caveat, and reproducer.
| Use case | First choice | Notes |
|---|---|---|
| Cloud, anything | qwen3-coder-next:cloud | Fastest tested anywhere (6–9 s per command). |
| Cloud, alt | glm-5.1:cloud | Reliable structured output; clean rescue. |
| Local all-rounder | gpt-oss:20b | Fastest local; reliable JSON; ~14 GB. |
| Local rescue | gemma4:26b | Most resourceful when apply_patch rejects. |
| VRAM-constrained | qwen3.5:9b | 6.6 GB; works on every command. |
| Stop-review gate only | qwen3.5:9b or gpt-oss:20b | Either handles the single-line ALLOW/BLOCK format. |
Select via --model <name> on any companion command. Falls back to OLLAMA_PLUGIN_DEFAULT_MODEL if set, otherwise the companion will error and prompt you to run /ollama:setup.
Models known to drift on the review JSON schema (use rescue-only or avoid): qwen3.6:27b-coding-nvfp4, batiai/qwen3.6-27b:q6, kimi-k2.6:cloud (review only — adversarial works).
Tool calling is required for the agentic rescue flow (default). Use --emit-patch to force patch-emit mode, which works without tool calling.
| Model family | Tool calling | Notes |
|---|---|---|
| Llama 3.1+ / Llama 4+ | Reliable | Native tool-call support since 3.1 |
| Llama 3.2 3B/1B | Unreliable | Too small; output format degrades |
| Qwen 2.5 / Qwen 3+ | Reliable | Solid tool-call format across sizes |
| Qwen 2.5/3 Coder | Reliable | Same base; code context does not hurt tool calls |
| DeepSeek-Coder-V2+ / DeepSeek-V2+ | Reliable | Strong reasoning; good tool adherence |
| DeepSeek-R1 (distills) | Unreliable | Thinking tokens interfere with JSON/tool output |
| Mistral 7B | Unreliable | v0.2 and earlier lack native tool-call format |
| Mistral Large / Nemo / Small | Reliable | Larger Mistral variants support tool calls |
| GPT-OSS (20B/120B) | Reliable | OpenAI open-weight; native tool-call format |
| Gemma 3+ | Reliable | Tool-call support added from Gemma 3 onward |
| Gemma 2 9B/27B | Partial | Tool-call-like output but not standard format |
| GLM 4+ | Reliable | Strong instruction-following and tool format |
| Kimi K2+ | Reliable | Cloud-hosted via Ollama; reliable tool calls |
| Command-R / Command-R+ | Reliable | Cohere; designed for tool use and RAG |
| Granite 3 | Reliable | IBM; native tool-call schema |
| Phi-3 / Phi-4 | Unreliable | Small; JSON adherence inconsistent |
When in doubt: test with the stop-review gate first (simple ALLOW/BLOCK output). If that fails, the model is not ready for structured JSON tasks.
Most local models have effective context windows of 8k–32k tokens, regardless of their advertised maximum.
adversarial-review, include only the changed files plus their direct dependencies — not the full repo context.adversarial-review.md costs ~400 tokens. This is intentional: embedding the schema reduces hallucinated field names.Ollama supports two JSON enforcement modes:
format: "json" — Constrains output to valid JSON but does not enforce a specific shape. Use as a fallback for any model when you need valid JSON but cannot use schema mode.
format: <schema> (Ollama ≥ 0.5) — Constrained decoding against a JSON Schema. Significantly more reliable for structured output. Use this for review and adversarial-review with the schema from schemas/review-output.schema.json.
When to fall back:
format: "json" and post-validate against the schema.ollama-result-handling skill). Do not guess or fill in missing fields.Open models need more explicit direction than GPT-class models. Follow these rules when shaping prompts:
Explicit beats implicit. State the output format requirement at the start AND at the end of every prompt. Repeating the constraint is not redundant — small models drift in long contexts.
Short beats long. Each additional 500 tokens of instruction increases the chance the model ignores an earlier constraint. Cut every section that does not carry load-bearing information.
Examples beat instructions. Where possible, show a concrete example of the expected output shape rather than only describing it. Even a partial example anchors the model's output format.
Repeat critical constraints. The JSON-only reminder appears twice in adversarial-review.md (before the schema and after). Keep both. Do not merge them into one.
Avoid deep chain-of-thought for small models. Multi-step reasoning prompts ("first consider X, then evaluate Y, then synthesize Z") work well for 70B+ models. For 7B–16B models they cause drift — the model exhausts its output on reasoning tokens before producing structured output. Front-load the decision; keep reasoning sections short.
Use the existing pseudo-XML tag style. The <role>, <task>, <output_format> tag structure is established in this project's prompts. Open models trained on instruction-following data handle this well. Do not switch to plain prose for consistency.
Creates, edits, and optimizes skills for Claude Code, including drafting, evaluating with test prompts, iterating on performance, and improving skill descriptions for better triggering accuracy.
npx claudepluginhub darrylmorley/ollama-plugin-cc --plugin ollama