Slash command: `/computer-vision:cv-help`
Show available Computer Vision tools and usage examples.
| Tool | Description |
|---|---|
| `cv_list_windows` | List all visible windows with HWND, title, process, rect |
| `cv_screenshot_window` | Capture a specific window by HWND |
| `cv_screenshot_desktop` | Capture the entire desktop (all monitors) |
| `cv_screenshot_region` | Capture a rectangular region of the screen |
| `cv_focus_window` | Bring a window to the foreground |
| `cv_mouse_click` | Click at screen coordinates (left/right/double/middle/drag) |
| `cv_type_text` | Type text into the foreground window |
| `cv_send_keys` | Send key combinations (Ctrl+S, Alt+Tab, etc.) |
| `cv_move_window` | Move/resize a window or maximize/minimize/restore |
| `cv_ocr` | Extract text from a window or region with bounding boxes and confidence |
| `cv_find` | Find elements by natural language query (UIA + OCR fuzzy search) |
| `cv_get_text` | Extract all visible text from a window (UIA primary, OCR fallback) |
| `cv_list_monitors` | List all monitors with resolution, DPI, and position |
| `cv_read_ui` | Read the UI accessibility tree of a window |
| `cv_wait_for_window` | Wait for a window matching a title pattern to appear |
| `cv_wait` | Simple delay (max 30 seconds) |
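
These tools are invoked by Claude rather than imported as a Python library, so the following is only a sketch of how records shaped like the `cv_list_windows` output might be filtered to find an HWND by title. The `WindowInfo` record, its field layout, and the `pick_window` helper are hypothetical names introduced for illustration; the plugin's real output format may differ.

```python
from __future__ import annotations

import fnmatch
from dataclasses import dataclass
from typing import Optional


@dataclass
class WindowInfo:
    # Hypothetical record shape; the real cv_list_windows output may differ.
    hwnd: int
    title: str
    process: str
    rect: tuple[int, int, int, int]  # assumed (left, top, right, bottom)


def pick_window(windows: list[WindowInfo], title_pattern: str) -> Optional[WindowInfo]:
    """Return the first window whose title matches a glob pattern, ignoring case."""
    pattern = title_pattern.lower()
    for win in windows:
        if fnmatch.fnmatchcase(win.title.lower(), pattern):
            return win
    return None


# Made-up sample data standing in for a window listing.
windows = [
    WindowInfo(0x1A2B, "Untitled - Notepad", "notepad.exe", (0, 0, 800, 600)),
    WindowInfo(0x3C4D, "Inbox - Mail", "mail.exe", (100, 100, 1100, 900)),
]

match = pick_window(windows, "*notepad*")
print(hex(match.hwnd) if match else "no match")
```

The HWND selected this way is the value the other tools in the table accept as their `hwnd` argument.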
Find and click an element by description:
1. `cv_find(query="Submit button", hwnd=<HWND>)`: finds matching elements
2. `cv_mouse_click` on the element it returns

Extract text from any app:
- `cv_get_text(hwnd=<HWND>)`: UIA for native apps, OCR fallback for Chrome/Electron

List windows and take a screenshot:
1. `cv_list_windows` to see all open windows
2. `cv_screenshot_window` with the HWND of the window you want

Click a button in an app:
1. `cv_screenshot_window` to see the current state
2. `cv_mouse_click` at the button's coordinates

Drag and drop (works with WebView, UWP, Electron, WPF apps):
- `cv_mouse_click(x=<END_X>, y=<END_Y>, start_x=<START_X>, start_y=<START_Y>, hwnd=<HWND>)`: drag from start to end
- `drag_duration_ms` sets the drag duration (default 300 ms)

OCR with bounding boxes:
- `cv_ocr(hwnd=<HWND>)`: extract text with word-level bounding boxes and confidence scores

Background mode (work without disturbing the user):
- `cv_mouse_click(x=100, y=200, hwnd=<HWND>, background=True)`: click without moving the cursor
- `cv_type_text(text="hello", hwnd=<HWND>, background=True)`: type without stealing focus
- `cv_send_keys(keys="ctrl+s", hwnd=<HWND>, background=True)`: send keys in the background

Grid overlay for precise clicking:
1. `cv_screenshot_window(hwnd=<HWND>, grid=True)`: screenshot with coordinate grid overlay
2. `cv_mouse_click(x=300, y=200, hwnd=<HWND>, coordinate_space="window_capture")`: click at exact grid position

`npx claudepluginhub southlab-ai/claude-plugin-marketplace --plugin computer-vision`
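
The OCR and grid-overlay recipes both come down to turning a position inside a window capture into a click target. As a rough illustration of that arithmetic, here is a minimal sketch, assuming a bounding box given as left/top/width/height in window-capture pixels and a window rect given as (left, top, right, bottom) in screen pixels. The `Box` type and both helper functions are hypothetical and not part of the plugin. The plugin presumably handles this conversion itself when `coordinate_space="window_capture"` is passed, and DPI scaling is ignored here.

```python
from __future__ import annotations

from typing import NamedTuple


class Box(NamedTuple):
    # Assumed bounding-box layout, in window-capture pixels.
    left: int
    top: int
    width: int
    height: int


def box_center(box: Box) -> tuple[int, int]:
    """Centre of a bounding box, a natural click target."""
    return box.left + box.width // 2, box.top + box.height // 2


def window_to_screen(
    x: int, y: int, window_rect: tuple[int, int, int, int]
) -> tuple[int, int]:
    """Offset window-relative coordinates by the window's top-left corner.

    Assumes window_rect is (left, top, right, bottom) in screen pixels and
    that no DPI scaling applies between the capture and the screen.
    """
    left, top, _right, _bottom = window_rect
    return left + x, top + y


# Made-up values: a 40x20 "Submit" box inside a window whose corner is at (100, 50).
submit = Box(left=280, top=190, width=40, height=20)
cx, cy = box_center(submit)
print((cx, cy))                                        # (300, 200)
print(window_to_screen(cx, cy, (100, 50, 900, 650)))   # (400, 250)
```

With these sample numbers the box centre matches the `x=300, y=200` used in the grid-overlay example.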