From aiee-skills
Multimodal AI patterns for extracting structured information from video using Gemini, GPT-4o, and Claude vision. Frame sampling strategies, structured output with Pydantic, temporal reasoning, and cost optimization. Use for video analysis, report generation from video, and visual inspection.
Slash command
/aiee-skills:ai-video-understanding
Use this skill when building pipelines that extract structured information from video using multimodal LLMs, or when choosing between native video input (Gemini) and frame-based approaches (GPT-4o, Claude).
| Model | Video Input | Max Duration/Images | Input $/1M tokens | Best For |
|---|---|---|---|---|
| Gemini 2.5 Pro | Native | 1 hour, 10 videos | $4.00 | Long-form, temporal reasoning, cached Q&A |
| Gemini 2.5 Flash | Native | Similar to Pro | ~$0.50 | High-volume, cost-sensitive |
| GPT-4o | Frame-based | Hundreds of frames | $2.50 | Structured extraction, schema adherence |
| Claude Sonnet 4.6 | Frame-based | 600 images/request | $3.00 | High frame-count, multi-turn analysis |
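To compare the input prices in the table above for a given workload, a rough estimate can be sketched as follows. This is a minimal sketch, not part of the skill itself: the dictionary keys are hypothetical labels (not official API model IDs), and `tokens_per_frame` is an assumption the caller must supply, since token cost per frame depends on the provider and image resolution.

```python
# Input prices in USD per 1M tokens, copied from the table above.
# Keys are hypothetical labels for illustration, not official model IDs.
INPUT_PRICE_PER_1M = {
    "gemini-2.5-pro": 4.00,
    "gemini-2.5-flash": 0.50,  # listed as approximate in the table
    "gpt-4o": 2.50,
    "claude-sonnet-4.6": 3.00,
}


def estimate_input_cost(
    model: str,
    num_frames: int,
    tokens_per_frame: int,
    prompt_tokens: int = 0,
) -> float:
    """Estimate input cost in USD for one request.

    tokens_per_frame is an assumed value supplied by the caller;
    measure it for your provider and resolution before relying on it.
    """
    if num_frames < 0 or tokens_per_frame < 0 or prompt_tokens < 0:
        raise ValueError("counts must be non-negative")
    total_tokens = num_frames * tokens_per_frame + prompt_tokens
    return total_tokens / 1_000_000 * INPUT_PRICE_PER_1M[model]


if __name__ == "__main__":
    # Example: 600 frames at an assumed 1,000 tokens per frame.
    for name in INPUT_PRICE_PER_1M:
        print(name, round(estimate_input_cost(name, 600, 1000), 2))
```

Output tokens are not covered here because the table lists input prices only.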

Recommended frame sampling rate by task type:

| Task Type | Recommended FPS | Accuracy |
|---|---|---|
| General overview | 0.5-1 FPS | ~85% |
| Action detection | 2-4 FPS | ~95% |
| Fine-grained motion | 8-16 FPS | ~98%+ |
| Long-form (budget) | Grid: 48 frames/image | ~70-80% |
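The sampling rates in the table above translate into frame timestamps and, for the grid approach, a number of composite images. The sketch below shows one way to compute both. The function names are hypothetical, uniform spacing is an assumption (the skill's reference material may use a different strategy), and actual frame decoding (for example with OpenCV or ffmpeg) is deliberately left out.

```python
import math


def sample_timestamps(duration_s: float, fps: float) -> list[float]:
    """Return uniformly spaced timestamps (in seconds) for a sampling rate.

    Sampling starts at t=0 and stops before the end of the video.
    """
    if duration_s <= 0 or fps <= 0:
        raise ValueError("duration_s and fps must be positive")
    step = 1.0 / fps
    count = math.floor(duration_s * fps)
    return [round(i * step, 3) for i in range(count)]


def grid_image_count(num_frames: int, frames_per_grid: int = 48) -> int:
    """Number of grid images needed when tiling frames_per_grid frames per image.

    The default of 48 matches the "Grid: 48 frames/image" row in the table.
    """
    if num_frames < 0 or frames_per_grid <= 0:
        raise ValueError("invalid frame counts")
    return math.ceil(num_frames / frames_per_grid)


if __name__ == "__main__":
    # General overview of a 10-second clip at 0.5 FPS.
    print(sample_timestamps(10, 0.5))
    # A 5-minute video at 2 FPS, packed into 48-frame grids.
    frames = len(sample_timestamps(300, 2))
    print(frames, grid_image_count(frames))
```

Packing frames into grids lowers the image count per request at the cost of per-frame resolution, which is consistent with the lower accuracy range the table lists for the grid row.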
See reference.md for detailed patterns and architecture decisions. See examples.md for production code examples.
npx claudepluginhub ai-enhanced-engineer/aiee-team --plugin aiee-team