From screen-vision
Give Claude the ability to see your screen, watch it in real-time with audio transcription, analyze video files, and extract text via OCR. 10 tools for screen capture, video analysis, image analysis, and OCR. Use when the user wants to show something on their screen, record a demo, analyze a video, or extract text.
How this skill is triggered — by the user, by Claude, or both
Slash command
/screen-vision:screen-visionThe summary Claude sees in its skill listing — used to decide when to auto-load this skill
Use this skill when the user wants Claude to see what's on their screen, watch their screen over time, analyze videos, or extract text.
Use this skill when the user wants Claude to see what's on their screen, watch their screen over time, analyze videos, or extract text.
capture_screen(delay_seconds, monitor, scale) — Full screen capture. Use delay for window switching.capture_region(x, y, width, height, scale) — Capture a specific rectangular area.capture_window(window_title, scale) — Capture a window by title (e.g., "Chrome", "Terminal").list_monitors() — List all displays with resolutions and positions.get_active_context() — Lightweight window/cursor/monitor info without a screenshot.read_screen_text(region) — OCR on screen or region. Optional region as "x,y,width,height".understand_screen(prompt) — AI-powered screen analysis via GenAI Gateway.analyze_image(file_path, prompt) — Analyze any image file (e.g., AirDropped photo).watch_screen(duration_seconds, interval_seconds, include_audio, max_frames) — Watch screen with frame sampling + optional audio transcription via Whisper.analyze_video(file_path, start_time, end_time, max_frames) — Extract keyframes from local video files.Screen Vision includes security controls:
~/.screen-vision/audit.logThese are not required — tools gracefully degrade without them:
brew install tesseractbrew install ffmpegUser: "What's on my screen?"
Claude: [calls capture_screen()]
User: "Watch my screen while I demo this feature"
Claude: [calls watch_screen(duration_seconds=60, include_audio=true)]
User: "Analyze the video at ~/Downloads/demo.mp4"
Claude: [calls analyze_video(file_path="~/Downloads/demo.mp4")]
User: "Read the text on my screen"
Claude: [calls read_screen_text()]
User: "What window am I in?"
Claude: [calls get_active_context()]
capture_screen with delay for window switching: "take a screenshot in 5 seconds"watch_screen with audio for demos where user is talkingcapture_region for specific UI elementsread_screen_text when user wants text extraction only (faster than full capture)get_active_context for lightweight queries (no image processing)npx claudepluginhub avicuna/claude-plugins --plugin screen-visionCaptures desktop screenshots (full screen, app/window, region) on macOS/Linux via Python/Bash scripts. Use for explicit requests or when tool-specific capture unavailable.
Captures screenshots and videos of running macOS app windows via osascript, screencapture, and ffmpeg for UI verification, mockups, and visual comparisons in agent workflows.
Downloads videos from YouTube, Instagram, X/Twitter, Vimeo, TikTok, or local paths, extracts frames and transcripts (via captions or on-device mlx-whisper), and lets Claude answer questions about the video content.