From cc-senses-plugin
Capture the screen, run a local vision-language model (Moondream2 by default via llama-cpp-python), and inject a short text description into Claude's context for Claude to act on. ~120 tokens per call vs ~1,600 if you sent the raw image to Claude's vision API. **No external daemon** — model runs in-process via llama-cpp-python.
How this skill is triggered — by the user, by Claude, or both
Slash command
/cc-senses-plugin:seeThe summary Claude sees in its skill listing — used to decide when to auto-load this skill
Capture the screen, run a local vision-language model (Moondream2 by default via llama-cpp-python), and inject a short text description into Claude's context for Claude to act on. ~120 tokens per call vs ~1,600 if you sent the raw image to Claude's vision API. **No external daemon** — model runs in-process via llama-cpp-python.
Capture the screen, run a local vision-language model (Moondream2 by default via llama-cpp-python), and inject a short text description into Claude's context for Claude to act on. ~120 tokens per call vs ~1,600 if you sent the raw image to Claude's vision API. No external daemon — model runs in-process via llama-cpp-python.
make setup_see # default in-process: Moondream2 F16 (~3.75 GB)
make setup_see_qwen25 # in-process alt: Qwen2.5-VL-3B Q4_K_M (~3.27 GB)
make setup_see_qwen3vl # llamaserver opt-in: Qwen3-VL-2B Q4_K_M (~1.55 GB)
make setup_see_smolvlm # llamaserver opt-in: SmolVLM-500M Q8 (546 MB)
URLs, filenames, and [vlm] snippet shape all live in src/cc_vlm/models.toml; the
Make recipes delegate to python -m cc_vlm.setup_models <key>. Each target installs
--extra see deps, downloads the GGUF + mmproj into ~/.cache/cc-senses-plugin/models/,
prints the engine-specific runtime install hint (llama-cpp-python wheel for in-process;
llama-server binary for the llamaserver tiers), and prints the [vlm] snippet to paste
into .cc-senses.toml. See .cc-senses.example.toml
for the full [vlm] schema.
Two engines are available; auto-detect picks the first one that's configured.
| Engine | When to use | Tradeoff |
|---|---|---|
llamacpp (in-process) | You have model_path + mmproj_path set and the appropriate llama-cpp-python chat handler exists for your model family | No daemon; reloads the model on every /see call (each invocation is a fresh Python process) |
llamaserver (HTTP) | You want a hot model across /see calls, or you're using SmolVLM2 / Qwen3-VL where the in-process handler doesn't exist yet | User runs llama-server separately; ~1-2 GB RAM while alive; warm response within 200-500 ms |
Three operating modes, picked by config. Useful for SmolVLM2-2.2B, Qwen3-VL-2B, or any GGUF llama.cpp supports but llama-cpp-python doesn't yet.
Prerequisite: llama-server must be installed and on your PATH (install via your distro's llama.cpp package or build from source).
cc-senses-plugin spawns llama-server on first /see, then re-uses the warm process for subsequent calls. Set in .cc-senses.toml:
[vlm]
engine = "llamaserver"
server_url = "http://localhost:8080"
model_path = "/path/to/SmolVLM2.gguf"
mmproj_path = "/path/to/SmolVLM2-mmproj.gguf"
# auto_spawn defaults to true
First /see call takes up to ~30 s (model load). Subsequent calls are warm (~200-500 ms). Inspect/stop the spawned server:
make vlm_server_status # is it running?
make vlm_server_logs # tail the boot log
make vlm_server_stop # SIGTERM the spawned process
You start llama-server yourself; cc-senses-plugin just POSTs to it. Useful for shared servers, GPU rigs, or remote hosts.
llama-server -m /path/to/model.gguf --mmproj /path/to/mmproj.gguf --port 8080
[vlm]
engine = "llamaserver"
server_url = "http://localhost:8080"
auto_spawn = false # cc-senses-plugin will NOT try to spawn
For remote hosts (server_url = "http://my-rig.local:8080"), auto_spawn is implicitly disabled — only localhost / 127.0.0.1 / ::1 are eligible for auto-spawn.
Spawns llama-server at Claude Code session start so even the first /see is hot. Set in .cc-senses.toml:
[vlm]
engine = "llamaserver"
model_path = "/path/to/model.gguf"
mmproj_path = "/path/to/mmproj.gguf"
preload = true
When the CC session starts, the SessionStart hook (cc-vlm-preload) spawns llama-server in a detached background process. By the time you type /see for the first time, the model is already loaded and the first call is warm (~200-500 ms). Trade-off: ~1-2 GB RAM is held for the entire session, even if /see is never used. Turn preload off for sessions where you don't expect to use vision.
make vlm_server_status # verify the server came up
make vlm_server_logs # inspect boot output if it didn't
make vlm_server_stop # manually stop the daemon mid-session
The SessionEnd hook (cc-vlm-shutdown) terminates the spawned server automatically when the CC session closes. If CC crashes, the daemon may persist — make vlm_server_status catches that, make vlm_server_stop cleans it up.
/see invocations during the very first call (when the server is being spawned) may produce a transient EADDRINUSE on the losing process's llama-server spawn. The system self-heals on the next invocation — pid_is_alive detects the dead PID and re-spawns. No file lock for now (YAGNI for interactive CLI use).available() can time out waiting for /health. Check make vlm_server_logs for boot errors./see — capture full screen, describe using the configured template (default: generic)/see --template terminal — constrained to terminal output (most recent command, exit status, errors)/see --template editor — code editor state (filename, cursor line, diagnostics)/see --template browser — page title, main heading, banners/see --template gui — active window, focused element, dialog text/see --template generic — free-form ≤100-word description/see --no-cache — bypass frame-hash cache; always call the VLM/see --save-only — capture + save JPEG, print the path, don't call the VLM/see --monitor N — capture specific monitor (default 1 = primary)/see --image-file PATH — describe a pre-captured image instead of capturing the screen. Use for saved screenshots, CI/headless runs, or environments where mss cannot access a display.Two workflows for dev loops:
Direct module invocation (fastest, bypasses CC):
make see TEMPLATE=terminal # capture + describe via live VLM
make see_file FILE=shot.jpg # describe a pre-captured image (no display needed)
make see_save_only # capture + save JPEG, print path (no VLM call)
make smoke # imports + --help + full test suite
Plugin installed into Claude Code (full integration, uses CC slash commands):
make plugin_validate # sanity-check the manifest first
make plugin_install_local # registers local marketplace + installs cc-senses-plugin (project scope)
make run_cc # starts claude; then type /see in the session
make plugin_uninstall # removes plugin + marketplace when done
python -m cc_vlm $ARGUMENTS
See .cc-senses.example.toml [vlm] section for the full schema and CC_VLM_* env overrides. Supported handler_name values are listed in docs/architecture.md.
For the in-process-VLM-vs-Claude-Vision token comparison and the rationale for picking in-process llama-cpp-python over external daemons (Ollama, llama-server), see docs/architecture.md and docs/adr/0003-vlm-screen-sharing.md.
For an end-to-end user flow (feeding screen content to Claude so it can debug or suggest fixes), see docs/UserStory.md Flow B.
/see/see is stateless — each call is independent and leaves nothing persistent except the temp JPEG it writes and any cached model files you downloaded for the VLM. To fully remove:
| Artifact | Removal |
|---|---|
Per-session cache (in-memory DescribeCache) | Exits with the Python process. Nothing to clean. |
Temp JPEGs (/tmp/tmp*.jpg) | make clean_see_artifacts |
Downloaded GGUF + mmproj (~/.cache/cc-senses-plugin/models/) | make clean_models |
| Python venv + pytest/ruff caches | make clean |
| All of the above at once | make clean_all |
Plugin installation (if done via make plugin_install_local) | make plugin_uninstall |
llama-cpp-python wheel | uv pip uninstall llama-cpp-python (manual since it's not in [see] extras) |
There is no undo for past descriptions that were injected into a Claude Code conversation — once text is in the conversation it stays in the conversation history. Only future behavior is controllable (via config or by not calling /see again).
Development — functional MVP. Ships both LlamaCppVLMEngine (in-process, default)
and LlamaServerVLMEngine (mainline llama-server, opt-in via setup_see_qwen3vl /
setup_see_smolvlm). Follow-ups (Claude Vision opt-in, focused-window crop,
auto-template detection, persistent cache) are tracked in the roadmap. See
docs/adr/0003-vlm-screen-sharing.md.
npx claudepluginhub qte77/cc-senses-plugin --plugin cc-senses-pluginCreates, edits, and optimizes skills for Claude Code, including drafting, evaluating with test prompts, iterating on performance, and improving skill descriptions for better triggering accuracy.