Skill

ai-video-understanding

Multimodal AI patterns for extracting structured information from video using Gemini, GPT-4o, and Claude vision. Frame sampling strategies, structured output with Pydantic, temporal reasoning, and cost optimization. Use for video analysis, report generation from video, and visual inspection.
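
As a rough illustration of the frame sampling and cost optimization mentioned above, the sketch below picks a capped number of evenly spaced timestamps from a video, so a frame-based model receives a bounded number of images. The function name `uniform_frame_timestamps` and its parameters are hypothetical and do not come from the skill itself; actual frame decoding (for example with a video library) is left out.

```python
def uniform_frame_timestamps(duration_s: float, max_frames: int) -> list[float]:
    """Return up to max_frames timestamps (in seconds), evenly spaced.

    Hypothetical helper: the video is split into max_frames equal segments
    and the midpoint of each segment is sampled, so the first and last
    frames (often black or title cards) are avoided.
    """
    if duration_s <= 0 or max_frames <= 0:
        return []
    step = duration_s / max_frames
    # Midpoint of segment i is (i + 0.5) * step; round to milliseconds.
    return [round((i + 0.5) * step, 3) for i in range(max_frames)]


if __name__ == "__main__":
    # A 10-second clip capped at 4 frames.
    print(uniform_frame_timestamps(10.0, 4))
```

Capping `max_frames` is the cost lever here: each sampled frame becomes one image input, so the cap bounds the per-video request size regardless of video length.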

Invocation

How this skill is triggered — by the user, by Claude, or both

Slash command

/aiee-skills:ai-video-understanding

User invocable

Model invocable

Inline context

Default effort

Tool Access

This skill is limited to the following tools:

Read, Grep, Glob

Context Preview

The summary Claude sees in its skill listing — used to decide when to auto-load this skill

Use this skill when building pipelines that extract structured information from video using multimodal LLMs, or when choosing between native video input (Gemini) and frame-based approaches (GPT-4o, Claude).

Supporting Files

examples.md, reference.md

SKILL.md

42 lines · ~534 tokens

Stats

Language: JavaScript

Stars: 3

Maintenance: Excellent

Last Commit: Jun 17, 2026
