claude-vision

Name: claude-vision
Author: adiakys


Lets Claude Code see — screen captures and webcam frames analyzed by a dedicated subagent, returning a compact textual report.

npx claudepluginhub adiakys/claude-vision --plugin claude-vision


What's Inside

Agents (1)

frame-analyst

/frame-analyst

Owns the full visual-capture pipeline. Given a visual question (and optional hints about source, mode, duration, fps, resolution, monitor, device), captures either the user's screen or their webcam via the claude_vision CLI, reads the resulting frames, analyzes them to answer the question, cleans up artifacts, and returns a compact textual report. Use for any task that requires visual inspection of the user's screen OR webcam.

Skills (1)

screen-vision

/screen-vision

Activate whenever Claude needs VISUAL INFORMATION to make progress — in ANY language. Trigger on INTENT, not keywords. Use this skill when (a) the user asks anything that requires inspecting the screen, a screen region, the webcam, or their physical environment, OR (b) Claude itself realizes mid-task that it cannot solve the problem without actually seeing what is happening (e.g., debugging a CSS animation, verifying a rendered UI, diagnosing why an element looks wrong, confirming the user's physical context). In case (b), proactively propose the skill to the user rather than guessing. Italian and English are first-class — examples EN: "what's on my screen?", "this button looks off", "can you see me?", "watch what I do"; examples IT: "cosa c'è sullo schermo?", "guarda la pagina", "puoi vedermi?", "tienimi d'occhio". For any other language, activate on the semantically equivalent intent regardless of exact phrasing. The skill dispatches the frame-analyst subagent which picks source (screen / screen region / webcam), mode (single snapshot / short video / continuous background watch with live queries), captures or reads from the ongoing watch, analyzes the frames, and returns a compact textual report — the main-agent context stays clean.

Hooks (1)

Event Hooks

2 hooks across 2 events


Stats

Version: 0.6.2

Released: Apr 20, 2026

Language: Python

Stars: 0

Maintenance: Excellent

License: MIT

Last Commit: May 2, 2026

Added: Apr 18, 2026


Safety Signals

Caution

Uses power tools

Uses Bash, Write, or Edit tools

claude-vision

A Claude Code skill that lets the assistant see what's on your screen or webcam: single snapshots for static questions, short videos for interactions and animations. Cross-platform (Linux X11, macOS, Windows, GNOME Wayland) with pip-only dependencies.

The main conversation context stays clean: the visual pipeline runs inside a dedicated subagent and returns only a compact textual report.

How it works

1. Skill screen-vision auto-activates when Claude recognizes a visual request ("what's on my screen?", "puoi vedermi?", "does this header look right?", …).
2. Claude dispatches the frame-analyst subagent with the visual question.
3. The subagent picks source and mode:
   - Source: screen (UI, pages, terminal) or webcam (face, object, room).
   - Mode: snapshot (default, one frame) or video (multiple frames when motion matters).
4. The subagent runs the claude_vision CLI, reads the frames, and produces a compact textual report; no image data leaks into the main-agent context.
5. A Stop hook garbage-collects any leftover session directories.
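The source/mode choice described above maps onto the CLI subcommands documented further down. A minimal sketch of that mapping, assuming a hypothetical helper named `build_command` (it is not part of the plugin) and using only subcommands and flags that appear in this README:

```python
from __future__ import annotations

import sys

# Subcommand names come from the README's CLI section; the mapping itself is
# an illustrative assumption about how a caller might choose between them.
SUBCOMMANDS = {
    ("screen", "snapshot"): "screenshot",
    ("screen", "video"): "capture",
    ("webcam", "snapshot"): "webcam-snapshot",
    ("webcam", "video"): "webcam-capture",
}


def build_command(source: str, mode: str = "snapshot", *,
                  duration: float | None = None,
                  fps: float | None = None,
                  scale_width: int = 1568) -> list[str]:
    """Return an argv list for the claude_vision CLI (hypothetical helper)."""
    try:
        subcommand = SUBCOMMANDS[(source, mode)]
    except KeyError:
        raise ValueError(f"unsupported source/mode: {source}/{mode}") from None

    argv = [sys.executable, "-m", "claude_vision", subcommand,
            "--scale-width", str(scale_width)]
    if mode == "video":
        # The README only shows --duration and --fps on the video subcommands.
        if duration is not None:
            argv += ["--duration", str(duration)]
        if fps is not None:
            argv += ["--fps", str(fps)]
    return argv
```

The returned list can be handed to `subprocess.run(argv, check=True)`. Building the argument list separately from running it keeps the choice of source and mode easy to inspect without triggering a capture.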

Install as a Claude Code plugin

/plugin marketplace add Adiakys/claude-vision
/plugin install claude-vision@claude-vision

At the next session start, the plugin bootstraps itself: it runs pip install --target to place mss, Pillow, imageio and opencv-python-headless into ~/.local/state/claude-vision/lib/. The install is isolated: no venv, no global site-packages pollution, and no tools to install besides python3 (3.10+) and pip, both available by default on every supported OS.

It also writes a self-contained wrapper script at ~/.local/state/claude-vision/bin/claude-vision that the subagent invokes directly; this path survives plugin updates and doesn't depend on any Claude-Code-specific environment variable.

First session start takes ~10 seconds while pip downloads; subsequent starts are instant (a hash marker skips reinstall until pyproject.toml or the available system libraries change).
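The hash-marker idea can be illustrated as follows. This is a sketch under stated assumptions, not the plugin's actual bootstrap code: the function names and marker layout are hypothetical, and it hashes only a manifest file such as pyproject.toml, whereas the README says the real marker also reacts to changes in the available system libraries.

```python
import hashlib
from pathlib import Path


def _digest(manifest: Path) -> str:
    """SHA-256 of the manifest's bytes, as a hex string."""
    return hashlib.sha256(manifest.read_bytes()).hexdigest()


def needs_reinstall(manifest: Path, marker: Path) -> bool:
    """True when no marker exists or the stored hash no longer matches."""
    if not marker.exists():
        return True
    return marker.read_text().strip() != _digest(manifest)


def record_install(manifest: Path, marker: Path) -> None:
    """Store the manifest hash after a successful install (hypothetical step)."""
    marker.parent.mkdir(parents=True, exist_ok=True)
    marker.write_text(_digest(manifest))
```

A bootstrap built this way would call `needs_reinstall` at session start, run the pip step only when it returns True, and then call `record_install`.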

The interactive region picker prefers stdlib tkinter when present. On distributions that strip it out (Debian/Ubuntu without python3-tk), the bootstrap transparently pulls pygame-ce (~25 MB) as a fallback — no manual install needed.
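One common way to implement this kind of preference order is to try importing each candidate module in turn. The sketch below assumes that approach; `pick_picker_backend` is a hypothetical name, and the plugin may detect tkinter differently. pygame-ce is imported under the module name `pygame`.

```python
import importlib


def pick_picker_backend(candidates=("tkinter", "pygame")):
    """Return the first importable module name, or None if none import.

    The candidate order mirrors the README: stdlib tkinter first, then the
    pygame-ce fallback (imported as ``pygame``).
    """
    for name in candidates:
        try:
            importlib.import_module(name)
        except ImportError:
            # Covers both a missing package and a missing _tkinter extension.
            continue
        return name
    return None
```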

Supported environments: X11, macOS, Windows, GNOME Wayland. Not supported: KDE/Sway/Hyprland Wayland sessions.

Check your Linux session type with echo $XDG_SESSION_TYPE.
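The support matrix above can be expressed as a small check. This is a sketch, not the plugin's detection logic: `is_supported` is a hypothetical function, and reading the desktop name from `XDG_CURRENT_DESKTOP` to recognise GNOME is an assumption.

```python
import os
import sys


def is_supported(platform: str, session_type: str = "", desktop: str = "") -> bool:
    """Apply the README's support matrix to the given environment values."""
    if platform.startswith("win") or platform == "darwin":
        return True
    if platform.startswith("linux"):
        if session_type == "x11":
            return True
        if session_type == "wayland":
            # Assumption: GNOME sessions report "GNOME" (e.g. "ubuntu:GNOME").
            return "gnome" in desktop.lower()
    return False


def current_environment_supported() -> bool:
    """Check the running process's own platform and session variables."""
    return is_supported(
        sys.platform,
        os.environ.get("XDG_SESSION_TYPE", ""),
        os.environ.get("XDG_CURRENT_DESKTOP", ""),
    )
```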

Standalone CLI (optional)

The Python package also works standalone if you want the CLI outside Claude Code — for debugging, scripts, or CI:

git clone https://github.com/Adiakys/claude-vision
cd claude-vision
python3 -m venv .venv
source .venv/bin/activate
pip install -e ".[wayland,webcam]"
python -m claude_vision --help

CLI

Screen

# Single screenshot (preferred when temporal info is not needed)
python -m claude_vision screenshot --scale-width 1568

# Video: 5 seconds at 1 fps, frames resized to 1568px wide
python -m claude_vision capture --duration 5 --fps 1 --scale-width 1568

# Full native resolution (no resize)
python -m claude_vision capture --duration 3 --scale-width 0

# Non-primary monitor
python -m claude_vision capture --duration 5 --monitor 1

# Capture only a region: interactive picker (drag a rectangle)
python -m claude_vision screenshot --region interactive

# Capture only a region: explicit pixel coords (X,Y,W,H)
python -m claude_vision capture --duration 3 --region 100,200,800,600

# Keep every frame even if nothing moves (default drops near-duplicates)
python -m claude_vision capture --duration 5 --fps 2 --no-dedupe
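
The README does not say how near-duplicate frames are detected. As an illustration only, the sketch below uses a mean absolute pixel difference against the last kept frame; the function names, the threshold, and the flat grayscale frame representation are all assumptions, not the plugin's algorithm.

```python
def mean_abs_diff(a, b):
    """Average absolute difference between two equally sized pixel sequences."""
    if not a or len(a) != len(b):
        raise ValueError("frames must be non-empty and the same size")
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def drop_near_duplicates(frames, threshold=2.0):
    """Keep a frame only if it differs enough from the last kept frame."""
    kept = []
    for frame in frames:
        if not kept or mean_abs_diff(kept[-1], frame) > threshold:
            kept.append(frame)
    return kept
```

Comparing against the last kept frame, rather than the immediately preceding one, keeps slow drift from slipping through: many tiny changes eventually add up to a difference above the threshold.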

Webcam

# Single webcam photo (preferred for static questions)
python -m claude_vision webcam-snapshot --scale-width 1568

# Short webcam video (3s at 2fps)
python -m claude_vision webcam-capture --duration 3 --fps 2 --scale-width 1568

# Non-default webcam (e.g., external USB at index 1)
python -m claude_vision webcam-snapshot --device 1

Continuous watch (background vision with live queries)

# Start an open-ended background watch (fps 0.5 = one frame every 2 sec)
python -m claude_vision watch-start --fps 0.5

# Check it's running
python -m claude_vision watch-status

# Mid-watch: get the last 5s of frames + one fresh frame right now
python -m claude_vision watch-query

# Mid-watch: widen the window to the last minute, exclude frames you've
# already looked at
python -m claude_vision watch-query --since-seconds 60 --only-unseen

# Tell the watch "I've analyzed these frames — don't return them again"
python -m claude_vision watch-mark-seen /tmp/.../frame_XXX.png

# Stop the watch (does NOT produce a summary)
python -m claude_vision watch-stop

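In the Claude Code skill, the subagent handles all of this automatically: "guarda cosa faccio" ("watch what I do") starts a watch, subsequent questions are answered from the live session, "basta" ("enough") stops it, and "riepilogami" ("give me a summary") triggers a summary. The summary comes from the subagent, since watch-stop itself does not produce one.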
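
The interplay of the query window, `--only-unseen`, and `watch-mark-seen` can be modelled in a few lines. This is a toy in-memory model with hypothetical names, not the plugin's implementation; the 5-second default mirrors the "last 5s of frames" comment above, and the extra fresh frame that `watch-query` captures is not modelled.

```python
from dataclasses import dataclass, field


@dataclass
class WatchModel:
    """Toy model of a watch session (hypothetical, for illustration only)."""

    frames: list = field(default_factory=list)  # (timestamp_seconds, path) pairs
    seen: set = field(default_factory=set)      # paths already analyzed

    def add(self, timestamp: float, path: str) -> None:
        """Record a captured frame."""
        self.frames.append((timestamp, path))

    def query(self, now: float, since_seconds: float = 5.0,
              only_unseen: bool = False) -> list:
        """Return paths captured in the window [now - since_seconds, now]."""
        cutoff = now - since_seconds
        return [
            path for timestamp, path in self.frames
            if cutoff <= timestamp <= now
            and not (only_unseen and path in self.seen)
        ]

    def mark_seen(self, *paths: str) -> None:
        """Exclude these paths from later only_unseen queries."""
        self.seen.update(paths)
```

At `--fps 0.5` the watch stores one frame every 2 seconds, so a 60-second window holds about 30 frames; marking frames as seen keeps repeated queries from returning the same images.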
