Name: llamastash
Author: llamastash

LlamaStash

A fast terminal native app (TUI) and CLI with init wizard for launching local LLMs via llama.cpp with zero overhead. See benchmarks.

Why

Heavy abstractions (Ollama, LM Studio) hide llama.cpp; raw llama-server use is tedious. LlamaStash is a fast, transparent launcher that is also a first-class shell-tool surface for agents — one binary, daemon on demand, same primitives in the TUI and the CLI.

AI agents installing this for a user: jump to INSTALL.md § For AI agents. The non-interactive install + verify contract, and exit-code branching live there.

TUI Gif

Install

Pick whichever channel you prefer — all install the same binary. Full per-platform notes, troubleshooting, and the agent-friendly non-interactive path live in INSTALL.md.

# macOS + Linux, one-shot
curl -fsSL https://llamastash.cli.rs/install.sh | sh

# Homebrew (macOS + Linuxbrew)
brew install llamastash/llamastash/llamastash

# From crates.io (any platform with a Rust toolchain)
cargo install llamastash --locked

Then run llamastash init — the interactive wizard installs llama-server for your hardware, downloads a starter GGUF, writes a tuned config, and smoke-launches it.

Quickstart

# Open the TUI. Scans default caches; daemon auto-spawns on demand.
llamastash

# List discovered models. TTY → padded + table; piped or
# `--no-colors` → TSV bytes. `--json` is the agent contract.
llamastash list
llamastash list --json | jq

# Launch a model by name, name substring, path, or canonical id.
llamastash start qwen-coder --ctx 16384 --reasoning on

# Drive a smoke-test request against the running endpoint.
curl -s http://127.0.0.1:41100/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -d '{"model": "qwen-coder", "messages": [{"role": "user", "content": "hi"}]}'

# Stop it.
llamastash stop qwen-coder

Tip — mouse focus. Mouse capture is off by default so the terminal keeps native click-and-drag text selection. To opt in on every TUI run, alias the binary in your shell rc:

# bash / zsh
alias llamastash='llamastash --mouse-focus'

# fish
alias llamastash 'llamastash --mouse-focus'

Or set it permanently in config.yaml:

mouse_focus: true

Either source flips on click-to-focus for the Models list, the right pane, and the tab labels (Settings/Logs/Chat/Embed/Rerank). Most terminals still expose a bypass modifier (Shift on iTerm2 / Alacritty / foot / wezterm, Option on Apple Terminal) so ad-hoc selection stays reachable.

Full subcommand reference: docs/usage.md. Proxy client setup (including an OpenCode example): docs/usage.md#opencode-setup. Prefer a Vulkan llama-server build on AMD/NVIDIA hosts: docs/usage.md#preferring-a-vulkan-llama-server-build. Architecture and IPC contract: docs/architecture.md. When things go wrong: docs/troubleshooting.md.

Agent Skills

The CLI ships with an Agent Skills manifest so supported agents can load repo-specific instructions for using llamastash as a local model-management CLI.

Canonical skill bundle: skills/llamastash/

Claude Code plugin marketplace: install the repo as a plugin, then install the bundled skill:

/plugin marketplace add llamastash/llamastash
/plugin install llamastash@llamastash
/reload-plugins

Manual install examples:

# OpenClaw
mkdir -p ~/.openclaw/skills && cp -r skills/llamastash ~/.openclaw/skills/

# OpenCode
mkdir -p ~/.config/opencode/skills && cp -r skills/llamastash ~/.config/opencode/skills/

The skill teaches agents to prefer --json, branch on LlamaStash's documented exit codes, reuse exact discovered model names, and read status --json proxy.listen before configuring an OpenAI-compatible client.

Features

Full detail per feature in FEATURES.md — including trade-offs, contracts, and links into docs/usage.md.

Zero-to-chat in one command

llamastash init — first-run wizard that detects hardware, installs llama-server, picks + downloads a starter GGUF, writes a tuned config, and smoke-launches.
Hardware-aware model recommender with a VRAM-fit filter + composite ranking over a CI-refreshed benchmark snapshot.
llamastash doctor — typed, agent-branchable findings; always exits 0.

llamastash

Popularity

What's Inside

README

LlamaStash

Why

Install

Quickstart

Agent Skills

Features

Zero-to-chat in one command

Discovers what you already have

Confidence

Similar Plugins

caveman

frontend-design

ui-design

claude-mem

Popularity

Health & Quality

Similar Plugins

caveman

frontend-design

ui-design

claude-mem

marketing-skills

nanobanana