By adiakys
Lets Claude Code see — screen captures and webcam frames analyzed by a dedicated subagent, returning a compact textual report.
Uses power tools
Uses Bash, Write, or Edit tools
Own this plugin?
Verify ownership to unlock analytics, metadata editing, and a verified badge. GitHub access is read-only (username + org membership).
Sign in to claimOwn this plugin?
Verify ownership to unlock analytics, metadata editing, and a verified badge. GitHub access is read-only (username + org membership).
Sign in to claimBased on adoption, maintenance, documentation, and repository signals. Not a security audit or endorsement.
A Claude Code skill that lets the assistant see what's on your screen — single screenshots for static questions, short videos for interactions and animations. Cross-platform (Linux X11, macOS, Windows, GNOME Wayland) with pip-only dependencies.
The main conversation context stays clean: the visual pipeline runs inside a dedicated subagent and returns only a compact textual report.
screen-vision auto-activates when Claude recognizes a visual request
("what's on my screen?", "puoi vedermi?", "does this header look right?", …).frame-analyst subagent with the visual question.screen (UI, pages, terminal) or webcam (face, object, room).snapshot (default, one frame) or video (multiple frames when
motion matters).claude_vision CLI, reads the frames, and produces a
compact textual report — no image data leaks into the main-agent context.Stop hook garbage-collects any leftover session directories./plugin marketplace add Adiakys/claude-vision
/plugin install claude-vision@claude-vision
At the next session start, the plugin bootstraps itself: it runs
pip install --target to place mss, Pillow, imageio and
opencv-python-headless into ~/.local/state/claude-vision/lib/ — isolated,
no venv, no global site-packages pollution, no tools to install besides
python3 (3.10+) and pip, both available by default on every supported OS.
It also writes a self-contained wrapper script at
~/.local/state/claude-vision/bin/claude-vision that the subagent invokes
directly; this path survives plugin updates and doesn't depend on any
Claude-Code-specific environment variable.
First session start takes ~10 seconds while pip downloads; subsequent
starts are instant (a hash marker skips reinstall until pyproject.toml
or the available system libraries change).
The interactive region picker prefers stdlib tkinter when present. On
distributions that strip it out (Debian/Ubuntu without python3-tk), the
bootstrap transparently pulls pygame-ce (~25 MB) as a fallback — no
manual install needed.
Supported environments: X11, macOS, Windows, GNOME Wayland. Not supported: KDE/Sway/Hyprland Wayland sessions.
Check your Linux session type with echo $XDG_SESSION_TYPE.
The Python package also works standalone if you want the CLI outside Claude Code — for debugging, scripts, or CI:
git clone https://github.com/Adiakys/claude-vision
cd claude-vision
python3 -m venv .venv
source .venv/bin/activate
pip install -e ".[wayland,webcam]"
python -m claude_vision --help
# Single screenshot (preferred when temporal info is not needed)
python -m claude_vision screenshot --scale-width 1568
# Video: 5 seconds at 1 fps, frames resized to 1568px wide
python -m claude_vision capture --duration 5 --fps 1 --scale-width 1568
# Full native resolution (no resize)
python -m claude_vision capture --duration 3 --scale-width 0
# Non-primary monitor
python -m claude_vision capture --duration 5 --monitor 1
# Capture only a region: interactive picker (drag a rectangle)
python -m claude_vision screenshot --region interactive
# Capture only a region: explicit pixel coords (X,Y,W,H)
python -m claude_vision capture --duration 3 --region 100,200,800,600
# Keep every frame even if nothing moves (default drops near-duplicates)
python -m claude_vision capture --duration 5 --fps 2 --no-dedupe
# Single webcam photo (preferred for static questions)
python -m claude_vision webcam-snapshot --scale-width 1568
# Short webcam video (3s at 2fps)
python -m claude_vision webcam-capture --duration 3 --fps 2 --scale-width 1568
# Non-default webcam (e.g., external USB at index 1)
python -m claude_vision webcam-snapshot --device 1
# Start an open-ended background watch (fps 0.5 = one frame every 2 sec)
python -m claude_vision watch-start --fps 0.5
# Check it's running
python -m claude_vision watch-status
# Mid-watch: get the last 5s of frames + one fresh frame right now
python -m claude_vision watch-query
# Mid-watch: widen the window to the last minute, exclude frames you've
# already looked at
python -m claude_vision watch-query --since-seconds 60 --only-unseen
# Tell the watch "I've analyzed these frames — don't return them again"
python -m claude_vision watch-mark-seen /tmp/.../frame_XXX.png
# Stop the watch (does NOT produce a summary)
python -m claude_vision watch-stop
In the Claude Code skill, the subagent handles all of this automatically:
"guarda cosa faccio" starts a watch, subsequent questions are answered
from the live session, "basta" stops it, and "riepilogami" triggers
the summary.
npx claudepluginhub adiakys/claude-vision --plugin claude-visionUltra-compressed communication mode. Cuts ~75% of tokens while keeping full technical accuracy by speaking like a caveman.
Multi-model consensus engine integrating OpenAI Codex CLI, Gemini CLI, and Claude CLI for collaborative code review and problem-solving.
Curate auto-memory, promote learnings to CLAUDE.md and rules, extract proven patterns into reusable skills.
Comprehensive UI/UX design plugin for mobile (iOS, Android, React Native) and web applications with design systems, accessibility, and modern patterns
Memory compression system for Claude Code - persist context across sessions