LlamaStash

A fast, terminal-native app (TUI) and CLI, with an init wizard, for launching local LLMs via llama.cpp with zero overhead. See benchmarks.
Why
Heavy abstractions (Ollama, LM Studio) hide llama.cpp; raw llama-server use is tedious. LlamaStash is a fast, transparent launcher that is also a first-class shell-tool surface for agents — one binary, daemon on demand, same primitives in the TUI and the CLI.
AI agents installing this for a user: jump to INSTALL.md § For AI agents. The non-interactive install-and-verify contract and the exit-code branching live there.

Install
Pick whichever channel you prefer — all install the same binary. Full per-platform notes, troubleshooting, and the agent-friendly non-interactive path live in INSTALL.md.
# macOS + Linux, one-shot
curl -fsSL https://llamastash.cli.rs/install.sh | sh
# Homebrew (macOS + Linuxbrew)
brew install llamastash/llamastash/llamastash
# From crates.io (any platform with a Rust toolchain)
cargo install llamastash --locked
Then run llamastash init — the interactive wizard installs llama-server for your hardware, downloads a starter GGUF, writes a tuned config, and smoke-launches it.
Quickstart
# Open the TUI. Scans default caches; daemon auto-spawns on demand.
llamastash
# List discovered models. TTY → padded + table; piped or
# `--no-colors` → TSV bytes. `--json` is the agent contract.
llamastash list
llamastash list --json | jq
# Launch a model by name, name substring, path, or canonical id.
llamastash start qwen-coder --ctx 16384 --reasoning on
# Drive a smoke-test request against the running endpoint.
curl -s http://127.0.0.1:41100/v1/chat/completions \
-H 'Content-Type: application/json' \
-d '{"model": "qwen-coder", "messages": [{"role": "user", "content": "hi"}]}'
# Stop it.
llamastash stop qwen-coder
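When the steps above run from a script rather than by hand, the smoke-test request can race the model load if `llamastash start` returns before the endpoint answers (an assumption; this README does not say whether `start` blocks until ready). A minimal sketch of a polling helper follows. `wait_for_endpoint` is a hypothetical name, not part of LlamaStash, and the URL to poll is whatever your setup actually exposes:

```shell
# wait_for_endpoint URL [TRIES] [DELAY_SECONDS]
# Hypothetical helper (not shipped with LlamaStash): polls URL with curl until
# it returns a successful HTTP status, or gives up after TRIES attempts.
wait_for_endpoint() {
  url=$1
  tries=${2:-30}
  delay=${3:-1}
  i=0
  while [ "$i" -lt "$tries" ]; do
    # -s: quiet, -f: fail on HTTP errors (>= 400), -o: discard the body
    if curl -sf -o /dev/null "$url"; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}
```

A possible use, assuming the endpoint serves the OpenAI-style `/v1/models` route (not confirmed by this README): `llamastash start qwen-coder && wait_for_endpoint http://127.0.0.1:41100/v1/models` before sending the chat request.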
Tip — mouse focus. Mouse capture is off by default so the terminal keeps native click-and-drag text selection. To opt in on every TUI run, alias the binary in your shell rc:
# bash / zsh
alias llamastash='llamastash --mouse-focus'
# fish
alias llamastash 'llamastash --mouse-focus'
Or set it permanently in config.yaml:
mouse_focus: true
Either setting turns on click-to-focus for the Models list, the right pane, and the tab labels (Settings/Logs/Chat/Embed/Rerank). Most terminals still expose a bypass modifier (Shift on iTerm2 / Alacritty / foot / wezterm, Option on Apple Terminal) so ad-hoc selection stays reachable.
Full subcommand reference: docs/usage.md. Proxy client setup (including an OpenCode example): docs/usage.md#opencode-setup. Prefer a Vulkan llama-server build on AMD/NVIDIA hosts: docs/usage.md#preferring-a-vulkan-llama-server-build. Architecture and IPC contract: docs/architecture.md. When things go wrong: docs/troubleshooting.md.
Agent Skills
The CLI ships with an Agent Skills manifest so supported agents can load repo-specific instructions for using llamastash as a local model-management CLI.
Claude Code plugin marketplace: add the repo as a marketplace, install the bundled plugin (which carries the skill), then reload plugins:
/plugin marketplace add llamastash/llamastash
/plugin install llamastash@llamastash
/reload-plugins
Manual install examples:
# OpenClaw
mkdir -p ~/.openclaw/skills && cp -r skills/llamastash ~/.openclaw/skills/
# OpenCode
mkdir -p ~/.config/opencode/skills && cp -r skills/llamastash ~/.config/opencode/skills/
The skill teaches agents to prefer --json, branch on LlamaStash's documented exit codes, reuse exact discovered model names, and read status --json proxy.listen before configuring an OpenAI-compatible client.
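As a rough illustration of that pattern, here is a sketch of how an agent-side script might resolve the client base URL. It assumes `llamastash status --json` emits an object whose `proxy.listen` field is a `host:port` string and that the API lives under `/v1`, as in the Quickstart curl call; the exact JSON shape is not documented here, and `proxy_base_url` is a hypothetical helper name:

```shell
# proxy_base_url: print an OpenAI-compatible base URL for the running proxy.
# Assumption: `llamastash status --json` returns {"proxy": {"listen": "host:port"}}.
proxy_base_url() {
  # Propagate the CLI's own exit code so callers can branch on it.
  status_json=$(llamastash status --json) || return "$?"
  # jq -e exits non-zero when the field is missing or null.
  listen=$(printf '%s' "$status_json" | jq -e -r '.proxy.listen') || return 1
  printf 'http://%s/v1\n' "$listen"
}
```

A caller can then branch on the result, for example `if base=$(proxy_base_url); then export OPENAI_BASE_URL=$base; else echo "status failed: $?" >&2; fi`, mapping non-zero codes to the meanings documented in docs/usage.md rather than guessing them. The `OPENAI_BASE_URL` variable name is only an example of what a client might read.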
Features
Full detail for each feature lives in FEATURES.md, including trade-offs, contracts, and links into docs/usage.md.