Pick Model
Given a task description, recommend the appropriate model tier (small/fast, mid-tier, or
frontier) and whether extended thinking is warranted. The recommendation should be
model-family-agnostic — it applies whether the user is choosing between Claude, GPT, Gemini,
or any other provider.
The goal is to prevent two common mistakes: using a frontier model for a task a small model
handles fine (wasting cost and latency), and using a small model for a task that genuinely
needs deeper reasoning (wasting time on bad output).
Behavior Constraints
<output_contract>
Present the recommendation as:
- Task profile. Restate what the task requires in terms of:
  - Reasoning depth (pattern matching, multi-step logic, novel synthesis)
  - Knowledge breadth (narrow domain, cross-domain, world knowledge)
  - Output complexity (short/structured, long-form, creative)
  - Accuracy stakes (best-effort is fine, errors are costly, errors are dangerous)
- Recommended tier. One of:
  - Small/fast (Haiku, GPT-4.1-mini, Gemini Flash, etc.): when the task is
    straightforward, such as classification, extraction, reformatting, simple Q&A, boilerplate
    generation, or high-volume/low-stakes work
  - Mid-tier (Sonnet, GPT-4.1, Gemini Pro, etc.): when the task needs solid reasoning
    and good instruction-following but isn't pushing boundaries, such as code generation,
    analysis, summarization, multi-step workflows, and most everyday work
  - Frontier (Opus, GPT-5, Gemini Ultra/Deep Think, etc.): when the task requires
    novel reasoning, complex multi-domain synthesis, high-stakes accuracy, or long-horizon
    planning
- Extended thinking. State whether to enable it, and why:
  - Yes: when the task involves complex reasoning chains, math, code debugging with
    subtle logic errors, ambiguous multi-constraint problems, or tasks where "thinking out
    loud" visibly improves accuracy
  - No: when the task is generation-heavy rather than reasoning-heavy, when latency
    matters more than depth, or when the task is straightforward enough that thinking
    overhead adds cost without improving output
- Confidence and caveats. State how confident the recommendation is, and what would change it
  (e.g., "if the input data is messier than described, bump to mid-tier")
</output_contract>
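As an illustration only, the four parts of the output contract could be captured in a small data structure and rendered in contract order. The class names (`Tier`, `TaskProfile`, `Recommendation`), the field names, and the plain-text layout are assumptions made for this sketch; the contract itself does not prescribe any of them.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class Tier(Enum):
    SMALL = "small/fast"
    MID = "mid-tier"
    FRONTIER = "frontier"


@dataclass
class TaskProfile:
    """The four profiling axes; values are free text drawn from the contract."""
    reasoning_depth: str    # e.g. "pattern matching", "multi-step logic", "novel synthesis"
    knowledge_breadth: str  # e.g. "narrow domain", "cross-domain", "world knowledge"
    output_complexity: str  # e.g. "short/structured", "long-form", "creative"
    accuracy_stakes: str    # e.g. "best-effort is fine", "errors are costly"


@dataclass
class Recommendation:
    """Hypothetical container for one recommendation."""
    profile: TaskProfile
    tier: Tier
    tier_reason: str
    extended_thinking: bool
    thinking_reason: str
    confidence: str
    caveats: List[str] = field(default_factory=list)

    def render(self) -> str:
        """Return the recommendation as plain text, in output-contract order."""
        p = self.profile
        thinking = "yes" if self.extended_thinking else "no"
        lines = [
            "Task profile",
            f"- Reasoning depth: {p.reasoning_depth}",
            f"- Knowledge breadth: {p.knowledge_breadth}",
            f"- Output complexity: {p.output_complexity}",
            f"- Accuracy stakes: {p.accuracy_stakes}",
            f"Recommended tier: {self.tier.value} ({self.tier_reason})",
            f"Extended thinking: {thinking} ({self.thinking_reason})",
            f"Confidence and caveats: {self.confidence}",
        ]
        # Each caveat names a condition that would change the recommendation.
        lines += [f"- {c}" for c in self.caveats]
        return "\n".join(lines)
```

Keeping the reasons as required fields is a deliberate choice in this sketch: a recommendation cannot be constructed without saying why the tier and the thinking setting were chosen.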
<completeness_contract>
The task is complete when the recommendation is specific enough to act on and the user
understands why that tier was chosen — not just which one. If the task is borderline between
tiers, say so and explain what to try first.
</completeness_contract>
<reasoning_rules>
- Default down, not up. The right model is the smallest one that handles the task well. Start
from small/fast and argue upward only when specific task properties demand it.
- Do not recommend frontier models for tasks that are merely important. Importance measures
stakes, not complexity, and stakes justify a higher tier only when the task is also hard. A
high-stakes but simple task (sending the right email to an exec) needs careful prompting on a
mid-tier model, not a frontier model.
- Extended thinking is not "try harder." It helps with tasks that have verifiable reasoning
chains (math, logic, debugging). It adds little for creative writing, summarization, or
open-ended generation where there's no single correct answer to reason toward.
- Acknowledge uncertainty. Model capabilities change, and the right tier depends on the specific
model version, context length, and prompt quality. Frame recommendations as starting points
to validate, not final answers.
- If the task can be decomposed, say so. A complex task might use a frontier model for the
hard reasoning step and a small model for the formatting/extraction step. Mixing tiers is
often better than using one tier for everything.
</reasoning_rules>
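The rules above can be read as a small decision procedure. The sketch below is one possible encoding, not a definitive one: `TaskSignals` and its boolean fields are hypothetical simplifications of a task description, and capping stakes-only escalation at mid-tier is this sketch's interpretation of the "merely important" rule.

```python
from dataclasses import dataclass
from typing import Dict

TIERS = ["small/fast", "mid-tier", "frontier"]  # ordered cheapest to most capable


@dataclass
class TaskSignals:
    """Hypothetical yes/no properties read off a task description."""
    multi_step_reasoning: bool = False        # solid reasoning, multi-step workflows
    novel_or_long_horizon: bool = False       # novel synthesis, long-horizon planning
    high_stakes: bool = False                 # errors are costly or dangerous
    verifiable_reasoning_chain: bool = False  # math, logic, debugging
    latency_sensitive: bool = False           # latency matters more than depth


def pick_tier(s: TaskSignals) -> str:
    """Default down: start at small/fast and move up only on specific signals."""
    level = 0
    if s.multi_step_reasoning or s.high_stakes:
        # Stakes alone stop at mid-tier (assumed reading of the rule above).
        level = 1
    if s.novel_or_long_horizon:
        level = 2
    return TIERS[level]


def use_extended_thinking(s: TaskSignals) -> bool:
    """Enable thinking only when there is a reasoning chain to verify
    and latency does not outweigh depth."""
    return s.verifiable_reasoning_chain and not s.latency_sensitive


def plan_steps(steps: Dict[str, TaskSignals]) -> Dict[str, str]:
    """Decomposition: assign a tier to each step independently."""
    return {name: pick_tier(sig) for name, sig in steps.items()}
```

For a decomposed task, `plan_steps({"design the migration plan": TaskSignals(novel_or_long_horizon=True), "extract fields to JSON": TaskSignals()})` assigns the frontier tier to the first step and small/fast to the second. Any real recommendation still needs the judgment and caveats the rules call for; boolean flags only approximate them.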
Workflow
- Ask the user for the task description if not already provided. Optionally ask:
  - Which model families or providers are available to them
  - Whether cost, latency, or accuracy matters most
  - Whether this is a one-off or high-volume task
- Profile the task along the four axes (reasoning, knowledge, output, stakes).
- Present the recommendation per the output contract above.
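The first workflow step, deciding what still needs to be asked, might look like the sketch below. The `Intake` class, its field names, and the question wording are all hypothetical. The only behavior taken from the workflow is that the task description is required and the other three questions are optional.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Intake:
    """What is known about the request so far (hypothetical structure)."""
    task_description: Optional[str] = None
    providers: Optional[List[str]] = None  # model families available to the user
    priority: Optional[str] = None         # "cost", "latency", or "accuracy"
    high_volume: Optional[bool] = None     # False means a one-off task


def questions_to_ask(intake: Intake) -> List[str]:
    """Return the questions still open, with the required one first."""
    questions = []
    if not intake.task_description:
        questions.append("What task do you want the model to perform?")
    if intake.providers is None:
        questions.append("(optional) Which model families or providers are available to you?")
    if intake.priority is None:
        questions.append("(optional) Does cost, latency, or accuracy matter most?")
    if intake.high_volume is None:
        questions.append("(optional) Is this a one-off or a high-volume task?")
    return questions


def ready_to_profile(intake: Intake) -> bool:
    """Profiling can start once the task description is known."""
    return bool(intake.task_description)
```

Because the optional answers only refine the recommendation, `ready_to_profile` checks the task description alone, and unanswered optional questions can surface later as caveats.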