From ec-watch
Watch a video (URL or local path). Downloads with yt-dlp, extracts auto-scaled frames with ffmpeg, pulls the transcript from captions (or local faster-whisper fallback), and hands the result to Claude so it can answer questions about what's in the video.
How this skill is triggered — by the user, by Claude, or both
Slash command
/ec-watch:ec-watch <video-url-or-path> [question]<video-url-or-path> [question]This skill is limited to the following tools:
The summary Claude sees in its skill listing — used to decide when to auto-load this skill
You don't have a video input; this skill gives you one. A Python script downloads the video, extracts frames as JPEGs, gets a timestamped transcript (native captions first, then local faster-whisper fallback), and prints frame paths. You then analyze each frame path using the MiniMax understanding_image MCP tool to see the images and combine them with the transcript to answer the user.
You don't have a video input; this skill gives you one. A Python script downloads the video, extracts frames as JPEGs, gets a timestamped transcript (native captions first, then local faster-whisper fallback), and prints frame paths. You then analyze each frame path using the MiniMax understanding_image MCP tool to see the images and combine them with the transcript to answer the user.
The very first time /ec-watch is invoked, setup runs automatically and sets up faster-whisper, verifies binaries, and marks setup complete. After that, Step 0 is skipped entirely — subsequent invocations go straight to Step 1.
To re-run setup manually (only if needed):
python "C:/Users/Emir/.claude/skills/ec-watch/scripts/setup.py"
.mp4, .mov, .mkv, .webm, etc.) and asks about it./ec-watch <url-or-path> [question].Step 1 — parse the user input. Separate the video source (URL or path) from any question the user asked. Example: /ec-watch https://youtu.be/abc what language is this in? → source = https://youtu.be/abc, question = what language is this in?.
Step 2 — run the watch script. Use python on Windows:
python "C:/Users/Emir/.claude/skills/ec-watch/scripts/watch.py" "<source>"
Optional flags:
--start T / --end T — focus on a section (SS, MM:SS, or HH:MM:SS)--max-frames N — lower the cap (default 80, max 100)--resolution W — frame width in px (default 512)--fps F — override auto-fps (clamped to 2 fps max)--out-dir DIR — working directory (default: tmp)--whisper local — force local faster-whisper (default, no API key needed)--no-whisper — disable Whisper fallback (frames-only)Examples:
# Last 10 seconds
python "C:/Users/Emir/.claude/skills/ec-watch/scripts/watch.py" video.mp4 --start 50 --end 60
# Zoom into 2:15 → 2:45
python "C:/Users/Emir/.claude/skills/ec-watch/scripts/watch.py" "$URL" --start 2:15 --end 2:45
# From 1h12m to end
python "C:/Users/Emir/.claude/skills/ec-watch/scripts/watch.py" "$URL" --start 1:12:00
Step 3 — Analyze frames with MiniMax MCP.
After frames are extracted, use the MiniMax understanding_image MCP tool to analyze each frame. The tool name is mcp__minimax__understanding_image and it accepts:
prompt: "Describe what's on screen, any UI elements, text visible, and the overall context."image_url: The file path to the JPEG frameProcess frames in batches of 5-10 for efficiency. Combine the visual analysis with the transcript to answer the user.
Step 4 — answer the user. You now have:
whisper (local) or captions)If the user asked a specific question, answer citing timestamps. If they didn't ask, summarize the video — key moments, visuals, spoken content.
Step 5 — clean up. Delete the working directory if user won't ask follow-ups:
rm -rf "C:/Users/Emir/AppData/Local/Temp/watch-xxx"
The script gets a timestamped transcript in one of two ways:
No API key needed for local transcription.
python "C:/Users/Emir/.claude/skills/ec-watch/scripts/setup.py"--start/--endIf user asks a follow-up, do not re-run the script — you already have frames and transcript in context.
Creates, edits, and optimizes skills for Claude Code, including drafting, evaluating with test prompts, iterating on performance, and improving skill descriptions for better triggering accuracy.
npx claudepluginhub emir-can-tr/ec-watch --plugin watch