Skill

ollama-model-prompting

From ollama

Guidance on selecting and prompting open-weight Ollama models for review and rescue tasks

Popularity

Parent stars

Invocation

How this skill is triggered — by the user, by Claude, or both

Slash command

/ollama:ollama-model-prompting

User invocable

Model invocable

Inline context

Default effort

Context Preview

The summary Claude sees in its skill listing — used to decide when to auto-load this skill

Reference this skill when deciding which model to use and how to shape prompts for open-weight models.

SKILL.md

100 lines · ~1.5k tokens

Stats

LanguageJavaScript

Parent stars2

MaintenanceExcellent

Last CommitMay 1, 2026

Actions

View Source View Plugin View on GitHub View README

Stats

Actions

Ollama Model Prompting

Reference this skill when deciding which model to use and how to shape prompts for open-weight models.

Recommended Models Per Use Case

Empirically battle-tested. See docs/MODELS.md for the full results table, finding-count caveat, and reproducer.

Use case	First choice	Notes
Cloud, anything	`qwen3-coder-next:cloud`	Fastest tested anywhere (6–9 s per command).
Cloud, alt	`glm-5.1:cloud`	Reliable structured output; clean rescue.
Local all-rounder	`gpt-oss:20b`	Fastest local; reliable JSON; ~14 GB.
Local rescue	`gemma4:26b`	Most resourceful when `apply_patch` rejects.
VRAM-constrained	`qwen3.5:9b`	6.6 GB; works on every command.
Stop-review gate only	`qwen3.5:9b` or `gpt-oss:20b`	Either handles the single-line ALLOW/BLOCK format.

Select via --model <name> on any companion command. Falls back to OLLAMA_PLUGIN_DEFAULT_MODEL if set, otherwise the companion will error and prompt you to run /ollama:setup.

Models known to drift on the review JSON schema (use rescue-only or avoid): qwen3.6:27b-coding-nvfp4, batiai/qwen3.6-27b:q6, kimi-k2.6:cloud (review only — adversarial works).

Tool-Calling Support Matrix

Tool calling is required for the agentic rescue flow (default). Use --emit-patch to force patch-emit mode, which works without tool calling.

Model family	Tool calling	Notes
Llama 3.1+ / Llama 4+	Reliable	Native tool-call support since 3.1
Llama 3.2 3B/1B	Unreliable	Too small; output format degrades
Qwen 2.5 / Qwen 3+	Reliable	Solid tool-call format across sizes
Qwen 2.5/3 Coder	Reliable	Same base; code context does not hurt tool calls
DeepSeek-Coder-V2+ / DeepSeek-V2+	Reliable	Strong reasoning; good tool adherence
DeepSeek-R1 (distills)	Unreliable	Thinking tokens interfere with JSON/tool output
Mistral 7B	Unreliable	v0.2 and earlier lack native tool-call format
Mistral Large / Nemo / Small	Reliable	Larger Mistral variants support tool calls
GPT-OSS (20B/120B)	Reliable	OpenAI open-weight; native tool-call format
Gemma 3+	Reliable	Tool-call support added from Gemma 3 onward
Gemma 2 9B/27B	Partial	Tool-call-like output but not standard format
GLM 4+	Reliable	Strong instruction-following and tool format
Kimi K2+	Reliable	Cloud-hosted via Ollama; reliable tool calls
Command-R / Command-R+	Reliable	Cohere; designed for tool use and RAG
Granite 3	Reliable	IBM; native tool-call schema
Phi-3 / Phi-4	Unreliable	Small; JSON adherence inconsistent

When in doubt: test with the stop-review gate first (simple ALLOW/BLOCK output). If that fails, the model is not ready for structured JSON tasks.

Context Window Tradeoffs

Most local models have effective context windows of 8k–32k tokens, regardless of their advertised maximum.

Keep git diffs trimmed. The companion script chunks large diffs automatically, but oversized context degrades output quality more than it adds information.
Aim for diffs under 4k tokens for 7B–8B models. Up to 16k for 14B–16B models.
For adversarial-review, include only the changed files plus their direct dependencies — not the full repo context.
The schema inline in adversarial-review.md costs ~400 tokens. This is intentional: embedding the schema reduces hallucinated field names.

JSON-Mode Reliability

Ollama supports two JSON enforcement modes:

format: "json" — Constrains output to valid JSON but does not enforce a specific shape. Use as a fallback for any model when you need valid JSON but cannot use schema mode.

format: <schema> (Ollama ≥ 0.5) — Constrained decoding against a JSON Schema. Significantly more reliable for structured output. Use this for review and adversarial-review with the schema from schemas/review-output.schema.json.

When to fall back:

If the model is older or the Ollama version is < 0.5, use format: "json" and post-validate against the schema.
If post-validation fails, retry once with a stricter prompt: add "Your previous response was missing required fields. Respond again with ALL required fields." at the top.
If the second attempt also fails, surface the raw response with a clear error (see ollama-result-handling skill). Do not guess or fill in missing fields.

Prompting Style for Open Models

Open models need more explicit direction than GPT-class models. Follow these rules when shaping prompts:

Explicit beats implicit. State the output format requirement at the start AND at the end of every prompt. Repeating the constraint is not redundant — small models drift in long contexts.

Short beats long. Each additional 500 tokens of instruction increases the chance the model ignores an earlier constraint. Cut every section that does not carry load-bearing information.

Examples beat instructions. Where possible, show a concrete example of the expected output shape rather than only describing it. Even a partial example anchors the model's output format.

Repeat critical constraints. The JSON-only reminder appears twice in adversarial-review.md (before the schema and after). Keep both. Do not merge them into one.

Avoid deep chain-of-thought for small models. Multi-step reasoning prompts ("first consider X, then evaluate Y, then synthesize Z") work well for 70B+ models. For 7B–16B models they cause drift — the model exhausts its output on reasoning tokens before producing structured output. Front-load the decision; keep reasoning sections short.

Use the existing pseudo-XML tag style. The <role>, <task>, <output_format> tag structure is established in this project's prompts. Open models trained on instruction-following data handle this well. Do not switch to plain prose for consistency.

ollama-model-prompting

Popularity

Invocation

Context Preview

SKILL.md

ollama-model-prompting

Popularity

Invocation

Context Preview

SKILL.md

Ollama Model Prompting

Recommended Models Per Use Case

Tool-Calling Support Matrix

Context Window Tradeoffs

JSON-Mode Reliability

Prompting Style for Open Models

Similar Skills

Ollama Model Prompting

Recommended Models Per Use Case

Tool-Calling Support Matrix

Context Window Tradeoffs

JSON-Mode Reliability

Prompting Style for Open Models

Similar Skills