From google
This skill should be used when the user asks to "analyze video", "summarize video", "extract video transcript", "understand video content", "video to text", "describe video", "ask questions about video", or "what happens in this video". Analyzes videos using Google Gemini API with local file upload or YouTube URL input.
Slash command: /google:video-understanding
Analyze video content using Google Gemini 3 Flash API. Extract summaries, transcripts, timestamps, and answer questions about video content from local files or YouTube URLs.
# Install SDK
uv pip install google-genai
# Set API key
export GEMINI_API_KEY="your-api-key"
For files >100MB or videos >1 minute, use Files API for reliable upload.
import os
from google import genai
client = genai.Client(api_key=os.getenv("GEMINI_API_KEY"))
# Upload video file
video_file = client.files.upload(file="/path/to/video.mp4")
# Wait for processing
import time
while video_file.state.name == "PROCESSING":
    time.sleep(5)
    video_file = client.files.get(name=video_file.name)
if video_file.state.name == "FAILED":
    raise ValueError("Video processing failed")
# Analyze
response = client.models.generate_content(
    model="gemini-3-flash-preview",
    contents=[video_file, "Summarize this video"]
)
print(response.text)
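The polling loop above waits indefinitely if a file never leaves the PROCESSING state. A minimal sketch of the same loop with a timeout follows. The helper name `wait_for_processing` and its `poll_seconds` and `timeout_seconds` parameters are assumptions made for illustration; only `client.files.get` and the `state.name` values come from the snippet above.

```python
import time


def wait_for_processing(client, video_file, poll_seconds=5.0, timeout_seconds=600.0):
    """Poll until the upload leaves PROCESSING, or raise after timeout_seconds.

    Hypothetical helper: the name and both timing parameters are illustrative.
    """
    deadline = time.monotonic() + timeout_seconds
    while video_file.state.name == "PROCESSING":
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"{video_file.name} still processing after {timeout_seconds}s"
            )
        time.sleep(poll_seconds)
        # Refresh the file record to pick up the latest state
        video_file = client.files.get(name=video_file.name)
    if video_file.state.name == "FAILED":
        raise ValueError("Video processing failed")
    return video_file
```

Usage would look like `video_file = wait_for_processing(client, video_file)` in place of the bare `while` loop.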
import base64
with open("/path/to/short_video.mp4", "rb") as f:
    video_data = base64.standard_b64encode(f.read()).decode("utf-8")
response = client.models.generate_content(
    model="gemini-3-flash-preview",
    contents=[
        {"inline_data": {"mime_type": "video/mp4", "data": video_data}},
        "What is happening in this video?"
    ]
)
from google.genai import types
response = client.models.generate_content(
    model="gemini-3-flash-preview",
    contents=types.Content(
        parts=[
            types.Part(
                file_data=types.FileData(
                    file_uri="https://www.youtube.com/watch?v=VIDEO_ID"
                )
            ),
            types.Part(text="Summarize this video with key timestamps")
        ]
    )
)
YouTube Limits:
prompt = "Provide a comprehensive summary of this video including main topics, key points, and conclusions."
prompt = """List all important moments with timestamps in MM:SS format:
- Scene changes
- Key topics discussed
- Notable events"""
prompt = "Transcribe all spoken dialogue in this video with speaker identification where possible."
prompt = "Describe the visual content: settings, people, objects, actions, and any on-screen text."
# Timestamp-specific question
prompt = "What is being demonstrated at 01:30?"
# Content-specific question
prompt = "What tools are used in this tutorial?"
Analyze specific segments only:
from google.genai import types
response = client.models.generate_content(
    model="gemini-3-flash-preview",
    contents=[
        types.Part(
            file_data=types.FileData(file_uri=video_file.uri),
            video_metadata=types.VideoMetadata(
                start_offset="60s",  # Start at 1 minute
                end_offset="180s"    # End at 3 minutes
            )
        ),
        "Summarize this segment"
    ]
)
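The timestamp prompts in this skill use MM:SS, while the clipping offsets are written as seconds with an `s` suffix (`"60s"`, `"180s"`). A small conversion sketch can bridge the two. The function name `to_offset` is hypothetical, and accepting `HH:MM:SS` as well is an assumption added for convenience.

```python
def to_offset(timestamp: str) -> str:
    """Convert "SS", "MM:SS", or "HH:MM:SS" into an offset string such as "90s".

    Hypothetical helper, not part of the google-genai SDK.
    """
    parts = [int(p) for p in timestamp.split(":")]
    if not 1 <= len(parts) <= 3 or any(p < 0 for p in parts):
        raise ValueError(f"Unsupported timestamp: {timestamp!r}")
    seconds = 0
    for part in parts:
        # Each step promotes the running total by one base-60 unit
        seconds = seconds * 60 + part
    return f"{seconds}s"
```

For example, `to_offset("01:30")` yields `"90s"`, which could be passed as `start_offset` in the `types.VideoMetadata` call above.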
Adjust sampling rate for different content types:
# Default: 1 FPS
# Static content (presentations): lower FPS saves tokens
# Fast action (sports): higher FPS captures more detail
video_metadata=types.VideoMetadata(fps=0.5) # 1 frame per 2 seconds
Reduce token usage with lower resolution:
config = types.GenerateContentConfig(
    # 66 tokens/frame vs 258 default
    media_resolution=types.MediaResolution.MEDIA_RESOLUTION_LOW
)
response = client.models.generate_content(
    model="gemini-3-flash-preview",
    contents=[video_file, prompt],
    config=config
)
Understanding token costs for capacity planning:
| Component | Tokens per Second |
|---|---|
| Video frames (default) | 258 |
| Video frames (low res) | 66 |
| Audio | 32 |
| Total (default) | ~300 |
| Total (low res) | ~100 |
Example: a 10-minute video at default resolution uses roughly 600 s × 300 tokens/s ≈ 180,000 tokens.
| Context Window | Default Resolution | Low Resolution |
|---|---|---|
| 1M tokens | ~1 hour | ~3 hours |
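For capacity planning, the per-second figures in the token table can be turned into a rough estimator. This is a sketch under stated assumptions: the function names are hypothetical, the constants are copied from the table above, and scaling frame tokens linearly with `fps` is an assumption based on the 1 FPS default rather than a documented formula.

```python
# Per-frame and per-second figures taken from the token table above
FRAME_TOKENS = {"default": 258, "low": 66}
AUDIO_TOKENS_PER_SECOND = 32


def tokens_per_second(resolution="default", fps=1.0):
    """Approximate tokens consumed per second of video (assumes linear fps scaling)."""
    return FRAME_TOKENS[resolution] * fps + AUDIO_TOKENS_PER_SECOND


def estimate_video_tokens(duration_seconds, resolution="default", fps=1.0):
    """Approximate total tokens for a video of the given length."""
    return round(duration_seconds * tokens_per_second(resolution, fps))


def max_duration_seconds(context_tokens=1_000_000, resolution="default", fps=1.0):
    """Approximate longest video, in seconds, that fits in the context window."""
    return int(context_tokens // tokens_per_second(resolution, fps))
```

The estimator uses the exact table values (258 + 32 = 290 tokens per second), so its results come out slightly below the rounded ~300 figure. It ignores prompt and response tokens, so real budgets should leave headroom.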
Multi-video: Gemini 2.5+ supports up to 10 videos per request.
video/mp4, video/mpeg, video/mov, video/avi, video/x-flv, video/mpg, video/webm, video/wmv, video/3gpp
Determine input method:
Choose analysis type based on user request
Configure optimization:
Execute and iterate based on results
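The first workflow step can be sketched as a small decision function. The thresholds follow the guidance earlier in this skill (Files API for files >100MB or videos >1 minute). Everything else is an assumption made for illustration: the function name, the returned labels, the YouTube host list, reading 100MB as 100 × 1024 × 1024 bytes, and falling back to the Files API when size or duration is unknown.

```python
from urllib.parse import urlparse

INLINE_MAX_BYTES = 100 * 1024 * 1024  # assumed reading of "100MB"
INLINE_MAX_SECONDS = 60               # "videos >1 minute" go through the Files API
YOUTUBE_HOSTS = {"youtube.com", "www.youtube.com", "m.youtube.com", "youtu.be"}


def choose_input_method(source, size_bytes=None, duration_seconds=None):
    """Return "youtube_url", "files_api", or "inline" for a video source.

    Hypothetical helper; the labels are illustrative, not SDK values.
    """
    host = urlparse(source).netloc.lower()
    if host in YOUTUBE_HOSTS:
        return "youtube_url"
    if size_bytes is None or duration_seconds is None:
        # Unknown size or length: take the conservative upload path
        return "files_api"
    if size_bytes > INLINE_MAX_BYTES or duration_seconds > INLINE_MAX_SECONDS:
        return "files_api"
    return "inline"
```

The returned label would then select between the upload, inline, and YouTube snippets shown earlier.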
Delete uploaded files after use:
client.files.delete(name=video_file.name)
List all uploaded files:
for f in client.files.list():
    print(f"{f.name}: {f.state.name}")
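To make cleanup harder to forget, the upload and delete calls can be paired in a context manager. This is a minimal sketch, not part of the SDK: the name `uploaded_video` is hypothetical, and it relies only on the `client.files.upload` and `client.files.delete` calls shown in this skill.

```python
from contextlib import contextmanager


@contextmanager
def uploaded_video(client, path):
    """Upload a video and delete it on exit, even if analysis raises.

    Hypothetical helper wrapping client.files.upload / client.files.delete.
    """
    video_file = client.files.upload(file=path)
    try:
        yield video_file
    finally:
        # Runs on normal exit and on exceptions alike
        client.files.delete(name=video_file.name)
```

A caller would write `with uploaded_video(client, "/path/to/video.mp4") as video_file:` and run the processing wait and `generate_content` call inside the block.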
try:
    response = client.models.generate_content(...)
except Exception as e:
    if "PERMISSION_DENIED" in str(e):
        # Check API key or quota
        pass
    elif "INVALID_ARGUMENT" in str(e):
        # Check video format or size
        pass
    raise
For detailed API documentation and advanced use cases:
Utility scripts for common operations:
npx claudepluginhub jongwony/cc-plugin --plugin google