Skill

delegate-to-local-model

Delegate a self-contained subtask to the local Gemma 4 instance (llama-server on :8080) by writing a markdown task file under ~/tmp and dispatching it in the background via the bundled helper. Use when offloading deterministic, well-specified work — structured reformatting, summarisation, classification/extraction with explicit rules, single-file code transforms, translation, JSON/TSV/markdown conversion, coarse image tasks (dominant colour, rough layout, short caption, image classification), and rough transcription of clean speech audio (experimental) — to free the main model for orchestration. Do not use for tasks requiring up-to-date facts, multi-tool orchestration, nuanced judgment, open-ended reasoning under ambiguity, accurate OCR/fine image detail, or high-fidelity transcription of noisy/synthetic audio.

Invocation

How this skill is triggered — by the user, by Claude, or both

Slash command

/huntse-agent-skills:delegate-to-local-model

Supporting Files

scripts/delegate.sh

SKILL.md

123 lines · ~1.9k tokens

Stats

Language: Python

Stars: 0

Maintenance: Excellent

Last commit: Jun 10, 2026

Delegate to local model

Offload a focused subtask to the local Gemma 4 E4B-IT running under llama-server on :8080, freeing the main model to keep orchestrating.

Quick start

Write the subtask as a markdown file under ~/tmp/. The whole file becomes the user message.

```
cat > ~/tmp/extract-emails.md <<'EOF'
# Task

Extract all email addresses from the text below. Output one per line, lowercased, no duplicates, no commentary.

---
{paste source text here}
EOF
```

Dispatch it (returns immediately — the request runs in the background):
```
~/.claude/skills/delegate-to-local-model/scripts/delegate.sh ~/tmp/extract-emails.md
```
The helper prints `slug=`, `status=`, `response=`, `raw=`, and `pid=` fields for the dispatch.
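To use those fields from a script, something like the following may help. It is a minimal sketch that assumes each field is printed on its own line as `key=value`; the exact layout is not specified here, and `get_field` is a hypothetical helper name, not part of the skill.

```shell
# Sketch: pull one field out of the helper's captured output.
# Assumption: one key=value pair per line; adjust if the real layout differs.
get_field() {
  # $1 = field name (e.g. response); stdin = captured helper output
  sed -n "s/^$1=//p" | head -n 1
}

# Usage (not run here):
#   out="$(~/.claude/skills/delegate-to-local-model/scripts/delegate.sh ~/tmp/extract-emails.md)"
#   response_path="$(printf '%s\n' "$out" | get_field response)"
```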

To attach images, pass them as trailing arguments — they're base64'd into the same user message alongside the markdown text:
```
~/.claude/skills/delegate-to-local-model/scripts/delegate.sh ~/tmp/describe.md ~/shot.png
```
To attach audio, pass the clip the same way — any common format is accepted and transcoded to the 16 kHz mono WAV the server requires (needs ffmpeg):
```
~/.claude/skills/delegate-to-local-model/scripts/delegate.sh ~/tmp/transcribe.md ~/memo.m4a
```

Check back later:

```
cat ~/tmp/extract-emails.status        # running | done | failed
cat ~/tmp/extract-emails.response.md   # the model's reply once status is done
```
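Rather than checking by hand, a script can poll the status file. The sketch below assumes the file contains exactly one of the three words shown above; `wait_for_dispatch` and its return codes are hypothetical conventions chosen for illustration.

```shell
# Sketch: block until a dispatch finishes or a time limit passes.
# Assumption: the status file holds exactly "running", "done", or "failed".
wait_for_dispatch() {
  # $1 = status file, $2 = max seconds to wait (default 600), $3 = poll interval (default 2)
  local status_file="$1"
  local limit="${2:-600}"
  local interval="${3:-2}"
  local waited=0
  local state
  while [ "$waited" -lt "$limit" ]; do
    state="$(cat "$status_file" 2>/dev/null)"
    case "$state" in
      "done")   return 0 ;;
      "failed") return 1 ;;
    esac
    sleep "$interval"
    waited=$((waited + interval))
  done
  return 2   # gave up while the dispatch was still running
}

# Usage (not run here):
#   wait_for_dispatch ~/tmp/extract-emails.status && cat ~/tmp/extract-emails.response.md
```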

Knowing when to use it

The local model is Gemma 4 E4B-IT, Q4_K_M quantised (~7.5 B params, ~5.3 GB on disk), 128 K-trained context, 4 concurrent slots. The server is vision-capable (attach images, below) and audio-capable (experimental — attach a clip, below), with the bf16 mmproj loaded. Strong at mechanical transforms and multilingual work; weak at open-ended reasoning and anything that needs current facts.

Good fits: reformatting/restructuring, summarisation of supplied text, classification with explicit rules, single-file code edits with a clear spec, translation, "convert this output to JSON/TSV/markdown", regex-style extraction with a sample, and coarse image tasks — dominant colour, rough layout, "is there a table/chart/face", a short caption, bucketing an image into one of a few categories.

Bad fits: anything that needs the web or current dates, multi-step plans involving tool calls, judgment calls or design decisions, debugging with poorly-specified symptoms, tasks you couldn't write down in under 2 K tokens of unambiguous spec, and fine-grained vision — accurate OCR/transcription, reading small or dense text, counting many objects, anything where a wrong detail is costly. It is a 4 B Q4 vision model: treat its image answers as impressions, not measurements.

Attaching images

Pass image files as trailing arguments to the helper; they ride in the same user message as the markdown text:

```
cat > ~/tmp/classify.md <<'EOF'
Classify the attached image as exactly one of: screenshot, photo, diagram, document. Reply with just the label.
EOF
~/.claude/skills/delegate-to-local-model/scripts/delegate.sh ~/tmp/classify.md ~/Pictures/foo.png
```

Supported image types: png, jpg/jpeg, gif, webp, bmp. You can attach more than one. With no media arguments, the helper sends a plain-text user message.
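A caller can screen files against that list before dispatching. This is a sketch only: `is_supported_image` is a hypothetical function, and it assumes the helper decides by file extension and ignores case, neither of which is confirmed here.

```shell
# Sketch: accept only the image extensions listed above.
# Assumption: matching is by extension, case-insensitively.
is_supported_image() {
  # $1 = path to a candidate image file
  case "$1" in
    *.*) ;;            # has an extension, keep checking
    *)   return 1 ;;   # no extension at all
  esac
  local ext
  ext="$(printf '%s' "${1##*.}" | tr '[:upper:]' '[:lower:]')"
  case "$ext" in
    png|jpg|jpeg|gif|webp|bmp) return 0 ;;
    *) return 1 ;;
  esac
}

# Usage (not run here):
#   is_supported_image ~/shot.png && delegate.sh ~/tmp/describe.md ~/shot.png
```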

Attaching audio

Audio support is experimental and works only if the clip is fed in exactly the right shape: the server accepts audio only as a 16 kHz mono 16-bit PCM WAV in an input_audio block. Anything else is silently dropped, and the model answers as though no audio were attached. The helper handles this for you: pass any common audio file (wav, mp3, m4a, flac, ogg, aac, opus) and it transcodes to the required WAV via ffmpeg before sending. ffmpeg must be on PATH.

```
cat > ~/tmp/transcribe.md <<'EOF'
Transcribe the speech in the attached audio exactly. Output only the transcript, no commentary.
EOF
~/.claude/skills/delegate-to-local-model/scripts/delegate.sh ~/tmp/transcribe.md ~/memo.m4a
```
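For reference, a manual conversion to the required format could look like the sketch below. The helper's actual ffmpeg invocation is not shown here, so treat this as an equivalent under assumptions; `to_wav16k` and the `FFMPEG` override variable are hypothetical names.

```shell
# Sketch: convert any clip ffmpeg can read into 16 kHz mono 16-bit PCM WAV.
# Assumption: this mirrors what the helper does; its real flags may differ.
to_wav16k() {
  # $1 = input clip, $2 = output .wav path
  # -ar 16000 sets the sample rate, -ac 1 downmixes to mono,
  # -c:a pcm_s16le selects 16-bit little-endian PCM, -y overwrites the output.
  "${FFMPEG:-ffmpeg}" -nostdin -y -i "$1" -ar 16000 -ac 1 -c:a pcm_s16le "$2"
}

# Usage (not run here):
#   to_wav16k ~/memo.m4a ~/tmp/memo.wav
```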

Quality caveats. Transcription is decent on clean human speech and degrades on synthetic, noisy, or overlapping audio — treat transcripts as a draft, not ground truth, and re-verify anything that matters. As with vision, give it room: the helper sends max_tokens 1024 by default (override with LLAMA_MAX_TOKENS) so the answer isn't truncated by the model's "thinking".

If the spec needs more than a couple of paragraphs of nuance, the main model is probably the right worker.

Writing a good task file

The file content becomes the user message verbatim — treat it like a focused prompt:

- **Goal in the first line.** "Extract …", "Rewrite …", "Translate …", "Classify …".
- **Inputs inline.** Paste the text or code into the file. The model has no filesystem or web access.
- **Output spec.** Say exactly what shape you want back ("JSON array", "one per line", "diff format", "markdown table with columns X, Y, Z"). Be explicit about what NOT to include (preamble, code fences, explanation).
- **Examples beat adjectives.** One worked example pins down the format better than three sentences of description.
- **Don't ask for reasoning unless you need it.** The system prompt already tells it to be terse.
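Put together, a task file following these points might be generated as in the sketch below. The ticket categories, the file name, and the `write_ticket_task` function are all hypothetical, chosen only to show the goal line, the output spec, one worked example, and the inline input.

```shell
# Sketch: write a task file with goal first, explicit output spec,
# one worked example, and the input pasted inline.
# Assumption: categories and file name are illustrative, not part of the skill.
write_ticket_task() {
  # $1 = directory for the task file (the skill expects ~/tmp), $2 = ticket text
  mkdir -p "$1"
  cat > "$1/classify-ticket.md" <<EOF
Classify the support ticket below as exactly one of: billing, bug, feature-request, other.
Reply with just the label: no preamble, no code fences, no explanation.

Example
Ticket: "I was charged twice this month."
Label: billing

---
$2
EOF
  # Print the path so the caller can hand it to the helper.
  printf '%s\n' "$1/classify-ticket.md"
}

# Usage (not run here):
#   task="$(write_ticket_task ~/tmp "The export button crashes the app.")"
#   ~/.claude/skills/delegate-to-local-model/scripts/delegate.sh "$task"
```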

Concurrency

The server has 4 slots, so you can fire up to 4 dispatches in parallel without queueing; a 5th waits for a free slot. Each slot can take the full 128 K context, so size isn't a worry on this hardware.
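When dispatching a large batch, a script can hold back until fewer than 4 dispatches are running. The sketch below counts status files that still read `running`; both function names are hypothetical, and a stale `running` file left by a crashed dispatch would be counted too.

```shell
# Sketch: throttle dispatches to the number of server slots.
# Assumption: every in-flight dispatch has a *.status file containing "running".
count_running() {
  # $1 = directory holding the *.status files
  local n=0
  local f
  for f in "$1"/*.status; do
    [ -f "$f" ] || continue
    if [ "$(cat "$f")" = "running" ]; then
      n=$((n + 1))
    fi
  done
  printf '%s\n' "$n"
}

wait_for_free_slot() {
  # $1 = directory, $2 = slot count (default 4), $3 = poll interval in seconds (default 2)
  while [ "$(count_running "$1")" -ge "${2:-4}" ]; do
    sleep "${3:-2}"
  done
}

# Usage (not run here):
#   for task in ~/tmp/batch-*.md; do
#     wait_for_free_slot ~/tmp
#     ~/.claude/skills/delegate-to-local-model/scripts/delegate.sh "$task"
#   done
```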

Environment overrides

The helper reads optional env vars:

- `LLAMA_SERVER` (default `http://localhost:8080`)
- `LLAMA_MODEL` (default `gemma-4-E4B-it-Q4_K_M.gguf`)
- `LLAMA_TIMEOUT` (default 600 seconds for the curl)
- `LLAMA_MAX_TOKENS` (default 1024; applied to multimodal requests so the answer isn't truncated by the model's "thinking")
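The sketch below shows how these defaults would resolve if the helper uses ordinary `${VAR:-default}` fallbacks. That is an assumption, since the script source is not reproduced here, and `show_delegate_config` is a hypothetical function for inspecting the effective values.

```shell
# Sketch: print the settings a dispatch would use, applying the documented defaults.
# Assumption: the helper falls back with ${VAR:-default}; its real logic may differ.
show_delegate_config() {
  printf 'server=%s\n'     "${LLAMA_SERVER:-http://localhost:8080}"
  printf 'model=%s\n'      "${LLAMA_MODEL:-gemma-4-E4B-it-Q4_K_M.gguf}"
  printf 'timeout=%s\n'    "${LLAMA_TIMEOUT:-600}"
  printf 'max_tokens=%s\n' "${LLAMA_MAX_TOKENS:-1024}"
}

# Usage (not run here): override per invocation by prefixing the command.
#   LLAMA_TIMEOUT=120 LLAMA_MAX_TOKENS=2048 \
#     ~/.claude/skills/delegate-to-local-model/scripts/delegate.sh ~/tmp/describe.md ~/shot.png
```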

Cleanup

~/tmp/ is ephemeral by convention: periodically run `rm ~/tmp/*.{status,response.md,raw.json,md}` once you no longer need the artefacts. Note that the `md` pattern removes every markdown file in the directory, task files included. Don't put anything you need to keep there.
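To remove the artefacts of a single dispatch and leave everything else in place, a narrower cleanup can be sketched as follows. `cleanup_slug` is a hypothetical function, and the `.raw.json` file name is inferred from the glob above rather than confirmed.

```shell
# Sketch: delete the files belonging to one dispatch only.
# Assumption: artefacts are named <slug>.md, .status, .response.md, and .raw.json.
cleanup_slug() {
  # $1 = slug (task file name without .md), $2 = directory (default ~/tmp)
  local dir="${2:-$HOME/tmp}"
  rm -f -- "$dir/$1.md" "$dir/$1.status" "$dir/$1.response.md" "$dir/$1.raw.json"
}

# Usage (not run here):
#   cleanup_slug extract-emails
```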

Anti-patterns

- **Delegating ambiguous tasks.** If the spec leaves the model room to interpret, it will guess wrong. Tighten the spec, or do it in the main model.
- **Treating the response as authoritative.** Spot-check: this is a 4B-param Q4 model, not a frontier model. Hallucinations on facts are routine.
- **Putting secrets in ~/tmp.** The directory is shared with everything else you cat-into-ephemera; treat it as untrusted.
- **Chaining dispatches by hand.** If the next step depends on the previous result, just script it; don't ask the model to "remember", because each call is stateless.
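Because each call is stateless, a scripted chain has to carry the previous result forward itself. One way to do that, sketched under the assumption that the earlier response is already on disk, is to build the next task file from it; `make_followup_task` is a hypothetical function.

```shell
# Sketch: build the next task file from the previous step's response.
# Assumption: the previous response has already been written and its status is done.
make_followup_task() {
  # $1 = previous response file, $2 = new task file, $3 = instruction for the next step
  {
    printf '%s\n\n---\n' "$3"
    cat "$1"
  } > "$2"
}

# Usage (not run here):
#   make_followup_task ~/tmp/extract-emails.response.md ~/tmp/sort-emails.md \
#     "Sort the lines below alphabetically. Output only the sorted lines."
#   ~/.claude/skills/delegate-to-local-model/scripts/delegate.sh ~/tmp/sort-emails.md
```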
