From huntse-agent-skills
Delegate a self-contained subtask to the local Gemma 4 instance (llama-server on :8080) by writing a markdown task file under ~/tmp and dispatching it in the background via the bundled helper. Use when offloading deterministic, well-specified work — structured reformatting, summarisation, classification/extraction with explicit rules, single-file code transforms, translation, JSON/TSV/markdown conversion, coarse image tasks (dominant colour, rough layout, short caption, image classification), and rough transcription of clean speech audio (experimental) — to free the main model for orchestration. Do not use for tasks requiring up-to-date facts, multi-tool orchestration, nuanced judgment, open-ended reasoning under ambiguity, accurate OCR/fine image detail, or high-fidelity transcription of noisy/synthetic audio.
Slash command: /huntse-agent-skills:delegate-to-local-model
Offload a focused subtask to the local **Gemma 4 E4B-IT** running under `llama-server` on `:8080`, freeing the main model to keep orchestrating.
Write the subtask as a markdown file under ~/tmp/. The whole file becomes the user message.
cat > ~/tmp/extract-emails.md <<'EOF'
# Task
Extract all email addresses from the text below. Output one per line, lowercased, no duplicates, no commentary.
---
{paste source text here}
EOF
Dispatch it (returns immediately — the request runs in the background):
~/.claude/skills/delegate-to-local-model/scripts/delegate.sh ~/tmp/extract-emails.md
Prints slug=, status=, response=, raw=, pid=.
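If a script needs to consume those fields, a small function can pull them apart. This is a minimal sketch that assumes the helper prints one `key=value` pair per line; the text above only names the fields, so the exact layout is an assumption, and `delegate_field` is a hypothetical name.

```shell
# Hypothetical helper: extract one field from the dispatch output.
# Assumes one key=value pair per line (layout not confirmed by the skill text).
delegate_field() {
  # $1 = field name (slug, status, response, raw, pid); output to parse on stdin
  sed -n "s/^$1=//p" | head -n 1
}

# Usage sketch:
#   out=$(~/.claude/skills/delegate-to-local-model/scripts/delegate.sh ~/tmp/extract-emails.md)
#   slug=$(printf '%s\n' "$out" | delegate_field slug)
```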
To attach images, pass them as trailing arguments — they're base64'd into the same user message alongside the markdown text:
~/.claude/skills/delegate-to-local-model/scripts/delegate.sh ~/tmp/describe.md ~/shot.png
To attach audio, pass the clip the same way — any common format is accepted
and transcoded to the 16 kHz mono WAV the server requires (needs ffmpeg):
~/.claude/skills/delegate-to-local-model/scripts/delegate.sh ~/tmp/transcribe.md ~/memo.m4a
Check back later:
cat ~/tmp/extract-emails.status # running | done | failed
cat ~/tmp/extract-emails.response.md # the model's reply once status is done
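Rather than checking by hand, the status file can be polled. The following is a sketch, assuming the status file contains exactly one of the three words shown above; `wait_for_delegate` and the `DELEGATE_DIR` override are hypothetical names introduced for illustration, not part of the skill.

```shell
# Hypothetical helper: block until a dispatched task finishes, then print its reply.
# Assumes <slug>.status holds exactly one word: running, done, or failed.
wait_for_delegate() {
  local slug="$1"
  local timeout="${2:-600}"                  # seconds to wait before giving up
  local dir="${DELEGATE_DIR:-$HOME/tmp}"     # DELEGATE_DIR is an assumed override
  local waited=0 state
  while [ "$waited" -lt "$timeout" ]; do
    state=$(cat "$dir/$slug.status" 2>/dev/null)
    case "$state" in
      done)   cat "$dir/$slug.response.md"; return 0 ;;
      failed) echo "delegate $slug failed" >&2; return 1 ;;
    esac
    sleep 2
    waited=$((waited + 2))
  done
  echo "delegate $slug timed out after ${timeout}s" >&2
  return 2
}
```

For example, `wait_for_delegate extract-emails 120` would print the reply once the status turns to done, or return non-zero on failure or after two minutes.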
The local model is Gemma 4 E4B-IT, Q4_K_M quantised (~7.5 B params, ~5.3 GB on disk), 128 K-trained context, 4 concurrent slots. The server is vision-capable (attach images, below) and audio-capable (experimental — attach a clip, below), with the bf16 mmproj loaded. Strong at mechanical transforms and multilingual work; weak at open-ended reasoning and anything that needs current facts.
Good fits: reformatting/restructuring, summarisation of supplied text, classification with explicit rules, single-file code edits with a clear spec, translation, "convert this output to JSON/TSV/markdown", regex-style extraction with a sample, and coarse image tasks — dominant colour, rough layout, "is there a table/chart/face", a short caption, bucketing an image into one of a few categories.
Bad fits: anything that needs the web or current dates, multi-step plans involving tool calls, judgment calls or design decisions, debugging with poorly-specified symptoms, tasks you couldn't write down in under 2 K tokens of unambiguous spec, and fine-grained vision — accurate OCR/transcription, reading small or dense text, counting many objects, anything where a wrong detail is costly. It is a 4 B Q4 vision model: treat its image answers as impressions, not measurements.
Pass image files as trailing arguments to the helper; they ride in the same user message as the markdown text:
cat > ~/tmp/classify.md <<'EOF'
Classify the attached image as exactly one of: screenshot, photo, diagram, document. Reply with just the label.
EOF
~/.claude/skills/delegate-to-local-model/scripts/delegate.sh ~/tmp/classify.md ~/Pictures/foo.png
Supported image types: png, jpg/jpeg, gif, webp, bmp. You can attach more than one. With no media arguments the helper behaves exactly as before (plain-text user message).
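To make "base64'd into the user message" concrete, here is a sketch of the usual way an image is inlined as a data URI. Whether the helper encodes attachments exactly like this is an assumption; `image_data_uri` is a hypothetical name, and the MIME types are the standard ones for the extensions listed above.

```shell
# Sketch: encode an image file as a base64 data URI.
# image_data_uri is hypothetical; the helper's real encoding may differ.
image_data_uri() {
  local file="$1" mime
  case "${file##*.}" in
    png)      mime=image/png ;;
    jpg|jpeg) mime=image/jpeg ;;
    gif)      mime=image/gif ;;
    webp)     mime=image/webp ;;
    bmp)      mime=image/bmp ;;
    *) echo "unsupported image type: $file" >&2; return 1 ;;
  esac
  # tr strips the line wrapping that some base64 implementations add
  printf 'data:%s;base64,%s\n' "$mime" "$(base64 < "$file" | tr -d '\n')"
}
```

This sketch matches lowercase extensions only and rejects anything outside the supported list.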
Audio works, but it is experimental and succeeds only if the clip is fed in exactly the right shape: the server accepts audio only as a 16 kHz mono 16-bit PCM WAV in an input_audio block; anything else is silently dropped and the model answers as though no audio were attached. The helper handles this for you: pass any common audio file (wav, mp3, m4a, flac, ogg, aac, opus) and it transcodes to the required WAV via ffmpeg before sending. ffmpeg must be on PATH.
cat > ~/tmp/transcribe.md <<'EOF'
Transcribe the speech in the attached audio exactly. Output only the transcript, no commentary.
EOF
~/.claude/skills/delegate-to-local-model/scripts/delegate.sh ~/tmp/transcribe.md ~/memo.m4a
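For reference, the conversion to 16 kHz mono 16-bit PCM WAV corresponds to an ffmpeg invocation along the following lines. This is a sketch of the standard flags for that format, not the helper's actual command, which may differ; `to_wav16k` is a hypothetical name.

```shell
# Sketch: convert any audio file ffmpeg can read to 16 kHz mono 16-bit PCM WAV.
# to_wav16k is hypothetical; the helper's real invocation is not shown in the skill.
to_wav16k() {
  local src="$1" dst="$2"
  command -v ffmpeg >/dev/null 2>&1 || { echo "ffmpeg not found on PATH" >&2; return 127; }
  # -ar 16000: 16 kHz sample rate; -ac 1: mono; pcm_s16le: 16-bit little-endian PCM
  ffmpeg -nostdin -loglevel error -y -i "$src" -ar 16000 -ac 1 -c:a pcm_s16le "$dst"
}
```

Running `to_wav16k ~/memo.m4a ~/tmp/memo.wav` by hand is a quick way to confirm a clip decodes at all before blaming the model for an empty transcript.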
Quality caveats. Transcription is decent on clean human speech and degrades on synthetic, noisy, or overlapping audio — treat transcripts as a draft, not ground truth, and re-verify anything that matters. As with vision, give it room: the helper sends max_tokens 1024 by default (override with LLAMA_MAX_TOKENS) so the answer isn't truncated by the model's "thinking".
If the spec needs more than a couple of paragraphs of nuance, the main model is probably the right worker.
The file content becomes the user message verbatim, so treat it like a focused prompt.
The server has 4 slots → fire up to 4 dispatches in parallel without queueing. The 5th waits. Each slot can take the full 128 K context, so size isn't a worry on this hardware.
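When dispatching a larger batch, the number in flight can be capped at four on the client side. The sketch below assumes the helper has written `running` to the status file by the time it returns, which the skill text does not confirm; `running_count`, `dispatch_batch`, and `DELEGATE_DIR` are hypothetical names.

```shell
# Sketch: count tasks whose status file still reads "running".
# DELEGATE_DIR is an assumed override; the skill itself always uses ~/tmp.
running_count() {
  local dir="${DELEGATE_DIR:-$HOME/tmp}" n=0 f
  for f in "$dir"/*.status; do
    [ -f "$f" ] || continue
    if [ "$(cat "$f")" = running ]; then
      n=$((n + 1))
    fi
  done
  echo "$n"
}

# Dispatch every task file given, holding back while 4 are already in flight.
dispatch_batch() {
  local helper="$HOME/.claude/skills/delegate-to-local-model/scripts/delegate.sh" task
  for task in "$@"; do
    while [ "$(running_count)" -ge 4 ]; do sleep 2; done
    "$helper" "$task"
  done
}
```

Because the server queues a fifth request on its own, this throttle is optional. Its main benefit is keeping a waiting request from spending its curl timeout in the queue.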
The helper reads optional env vars:
- LLAMA_SERVER (default http://localhost:8080)
- LLAMA_MODEL (default gemma-4-E4B-it-Q4_K_M.gguf)
- LLAMA_TIMEOUT (default 600 seconds for the curl)
- LLAMA_MAX_TOKENS (default 1024; applied to multimodal requests so the answer isn't truncated by the model's "thinking")

~/tmp/ is ephemeral by convention: periodically rm ~/tmp/*.{status,response.md,raw.json,md} once you no longer need the artefacts. Don't put anything you need to keep there.
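Because the glob above also matches unrelated markdown files in ~/tmp, a narrower cleanup that removes only one task's artefacts may be safer. This is a sketch: `clean_delegate` and `DELEGATE_DIR` are hypothetical names, and the file suffixes are taken from the glob above.

```shell
# Sketch: delete the artefacts of one finished task instead of globbing all of ~/tmp.
# clean_delegate is hypothetical; DELEGATE_DIR is an assumed override for ~/tmp.
clean_delegate() {
  local slug="$1" dir="${DELEGATE_DIR:-$HOME/tmp}" ext
  for ext in status response.md raw.json md; do
    rm -f -- "$dir/$slug.$ext"
  done
}
```

For example, `clean_delegate extract-emails` removes the task file, status, response, and raw output for that slug and leaves everything else in the directory alone.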
~/tmp. The directory is shared with everything else you cat-into-ephemera; treat it as untrusted.