Skill

llm-install

Install a model from a URL into local Ollama with a practical quantization for this host, verify availability for API/OpenWebUI, and sync Continue config to all currently installed models. Trigger with `/llm-install <MODEL_URL>`.

Invocation

How this skill is triggered — by the user, by Claude, or both

Slash command

/buymeagoat-skills:llm-install

User invocable

Model invocable

Inline context

Default effort

Context Preview

The summary Claude sees in its skill listing — used to decide when to auto-load this skill

Use this skill when user asks to install an LLM from a URL (Hugging Face or Ollama) and wants it ready for local use.

SKILL.md

86 lines · ~771 tokens

Stats

Stars0

MaintenanceGood

Last CommitMay 14, 2026

Actions

View Source View Plugin View on GitHub View README

Stats

Actions

Invocation

/llm-install <URL>
Examples:
- /llm-install https://huggingface.co/HauhauCS/Qwen3.5-4B-Uncensored-HauhauCS-Aggressive
- /llm-install https://huggingface.co/TrevorJS/gemma-4-E2B-it-uncensored-GGUF
- /llm-install https://ollama.com/library/qwen2.5-coder

Environment Defaults

Ollama endpoint: http://172.31.192.1:11434
Local hardware target: single-user local coding workflow on RTX 3060 12GB VRAM.
Optimize for stability + quality per GB, not maximum precision.

Decision Policy (Most Logical Variant)

Given a repository with multiple GGUF quantizations, choose in this order:

Prefer Q6_K / Q6_K_P when available (best quality/speed balance for local use).
Else use Q5_K_M / Q5_K_S.
Else use Q4_K_M (safe fallback for constrained disk/VRAM).
Use Q8_0 only when user explicitly asks for maximum quality and space allows.
Avoid BF16 by default on this host (large footprint).
Avoid Q2 / Q3 unless they are the only options.

For repos without GGUF files:

Try direct pull if Ollama supports repo format.
If repo is Safetensors-only and fails, find a GGUF mirror of the same base model.
Tell user exactly which mirror was used and why.

Auth Policy (HF Token)

First try public pull with no token.
If pull fails due to auth/private/gated access:
1. Read token from .env keys in this order: HF_TOKEN, HUGGINGFACEHUB_API_TOKEN, HUGGING_FACE_HUB_TOKEN.
2. Retry with token in request/command environment.
3. If token missing or invalid, ask user for intervention.

Install + Verify Steps

Resolve target model name and chosen quant.
Pull model to Ollama.
Verify model appears in /api/tags.
Run a minimal health call (/api/chat non-stream) to confirm runtime usability.
Confirm model is available to endpoint clients (API/OpenWebUI use same Ollama tags).

Continue Config Sync (Required)

After each successful install, sync both files to current installed models:

~/.continue/config.yaml
.continue/config.yaml (workspace)

Sync rules:

Include all models currently returned by Ollama /api/tags.
Assign roles:
- model name contains embed -> embed
- qwen2.5-coder:7b-Fallback_Coding -> autocomplete
- everything else -> chat
Keep tabAutocompleteModel set to qwen2.5-coder:7b-Fallback_Coding when present.
Preserve endpoint as http://172.31.192.1:11434.

Output Contract

Always return:

URL provided
Selected quant + reason
Installed model tag(s)
Verification result (/api/tags and chat probe)
Continue config sync result
Any fallback/mirror used
Any required user intervention

llm-install

Invocation

Context Preview

SKILL.md

llm-install

Invocation

Context Preview

SKILL.md

Invocation

Environment Defaults

Decision Policy (Most Logical Variant)

Auth Policy (HF Token)

Install + Verify Steps

Continue Config Sync (Required)

Output Contract

Similar Skills

Invocation

Environment Defaults

Decision Policy (Most Logical Variant)

Auth Policy (HF Token)

Install + Verify Steps

Continue Config Sync (Required)

Output Contract

Similar Skills