By lattifai
Automate AI-powered media workflows: download videos/audio/captions from YouTube/1000+ platforms, transcribe to timestamped markdown with speakers/chapters via Gemini, translate SRT/VTT/ASS to bilingual output via Claude/Gemini, force-align word-level timings, and batch-convert 30+ caption formats.
Use when the user needs precise caption timing or wants to align captions with audio/video using forced alignment. Corrects caption timing to match the actual speech. Uses the LattifAI Lattice-1 model.
Use when converting between caption formats (SRT, VTT, ASS, TTML, Gemini MD, etc.). Supports 30+ caption formats.
Use when downloading videos, audio, or captions from YouTube and other video platforms. Supports quality selection.
Use when transcribing audio/video to text with timestamps, speaker labels, and chapters. Supports YouTube URLs and local files. Produces structured markdown output.
Use when translating captions to another language. Supports bilingual output and context-aware translation. Uses Claude natively by default; the Gemini API is optional.
Captions Made Easy — Claude Code Caption Skills
"I need bilingual captions for this Fireship vibe coding video https://youtube.com/watch?v=Tw18-4U7mts"
One sentence. Claude handles the download, transcription, and translation.
npx skills add https://github.com/lattifai/omni-captions-skills
Claude Code Plugin System:
/plugin marketplace add lattifai/omni-captions-skills
/plugin install omnicaptions@lattifai-omni-captions-skills
Local Development:
git clone https://github.com/lattifai/omni-captions-skills.git
claude --plugin-dir ./omni-captions-skills
❯ Make bilingual captions for this Fireship vibe coding video https://youtube.com/watch?v=Tw18-4U7mts
1
00:00:00,000 --> 00:00:03,200
Mass hysteria satisfies a deep human need.
群体性癔症满足了人类某种深层需求。
2
00:00:03,200 --> 00:00:07,440
Vibe coding is programming without actually writing any code yourself.
Vibe coding 就是不用自己写代码的编程方式。
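The bilingual output above keeps the standard SRT cue layout (index, start --> end time range, text) and stacks the translated line under the original one. As a rough illustration of that layout only, the POSIX sh helpers below print one such cue from millisecond timings. The names ms_to_srt and bilingual_cue are hypothetical and not part of omnicaptions.

```sh
# Hypothetical helpers, not part of omnicaptions: print one bilingual SRT cue
# (index, time range, original line, translated line) like the sample above.

# Convert milliseconds to an SRT timestamp, HH:MM:SS,mmm.
ms_to_srt() {
  printf '%02d:%02d:%02d,%03d' \
    $(($1 / 3600000)) $(($1 / 60000 % 60)) $(($1 / 1000 % 60)) $(($1 % 1000))
}

# Usage: bilingual_cue INDEX START_MS END_MS ORIGINAL TRANSLATION
bilingual_cue() {
  printf '%s\n%s --> %s\n%s\n%s\n\n' \
    "$1" "$(ms_to_srt "$2")" "$(ms_to_srt "$3")" "$4" "$5"
}

# Reproduces cue 2 of the sample output.
bilingual_cue 2 3200 7440 \
  "Vibe coding is programming without actually writing any code yourself." \
  "Vibe coding 就是不用自己写代码的编程方式。"
```

The trailing blank line in the printf format is what separates consecutive cues in an SRT file.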
| Skill | Description |
|---|---|
| transcribe | YouTube/video → Markdown with timestamps |
| translate | Translate captions, bilingual output supported |
| convert | Convert between 30+ caption formats |
| download | Download YouTube video/audio/captions |
| LaiCut | Forced alignment, word-level timing accuracy |
Invoke a skill via its slash command, e.g. /omnicaptions:transcribe or /omnicaptions-transcribe.
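To show what a caption format conversion involves at its very simplest, here is a hypothetical SRT → WebVTT sketch in POSIX sh. It is not how the convert skill is implemented: it only adds the WEBVTT header and swaps the comma before the milliseconds for a dot on timing lines, and it ignores styling, positioning, and malformed input that a real converter must handle.

```sh
# Hypothetical sketch, not the omnicaptions converter: minimal SRT -> WebVTT.
# Handles plain-text cues only; styling, positioning, and malformed input are ignored.
srt_to_vtt() {
  printf 'WEBVTT\n\n'   # a WebVTT file must start with this header
  # Drop Windows carriage returns, then, on timing lines only ("-->"),
  # turn the comma before the milliseconds into a dot.
  tr -d '\r' < "$1" |
    sed '/-->/ s/\([0-9][0-9]:[0-9][0-9]:[0-9][0-9]\),\([0-9][0-9][0-9]\)/\1.\2/g'
}
```

For example, srt_to_vtt video_LaiCut.srt > video_LaiCut.vtt. Commas inside caption text are left alone because the substitution only runs on lines containing "-->".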
Standard transcription gives only approximate timestamps. LaiCut uses the LattifAI Lattice-1 model to match the text precisely to the audio waveform, achieving word-level accuracy.
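One way to see what alignment changed is to compare cue start times before and after LaiCut. The hypothetical helpers below (cue_starts and start_shift, not omnicaptions features) read SRT or VTT timing lines and print each cue's start time, or its shift, in milliseconds. They assume both files contain the same cues in the same order, which may not hold if alignment re-segments the text.

```sh
# Hypothetical check, not part of omnicaptions.
# Shared awk function: start time of a "-->" line in milliseconds.
# Accepts HH:MM:SS,mmm (SRT), HH:MM:SS.mmm and MM:SS.mmm (VTT).
CUE_MS='
function start_ms(line,   n, t) {
  sub(/ -->.*/, "", line)               # keep only the start timestamp
  n = split(line, t, /[:,.]/)
  if (n == 3) return (t[1] * 60 + t[2]) * 1000 + t[3]    # no hours field
  return ((t[1] * 60 + t[2]) * 60 + t[3]) * 1000 + t[4]
}'

# Print every cue start time of FILE, one per line.
cue_starts() {
  awk "$CUE_MS"' /-->/ { print start_ms($0) }' "$1"
}

# Print per-cue start shift (second file minus first), assuming 1:1 cues.
start_shift() {
  awk "$CUE_MS"'
    FNR == NR && /-->/ { a[++i] = start_ms($0); next }
    /-->/ { print start_ms($0) - a[++j] }' "$1" "$2"
}
```

For example, start_shift video.en.vtt video_LaiCut.srt prints a positive number where alignment moved a cue later and a negative one where it moved it earlier.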
Install LaiCut:
# Using uv (recommended, auto-configures package index)
uv pip install "omni-captions-skills[laicut]" --extra-index-url https://lattifai.github.io/pypi/simple/
# Using pip
pip install "omni-captions-skills[laicut]" --extra-index-url https://lattifai.github.io/pypi/simple/
Supported languages: English, Chinese, German, and mixed-language speech
Recommended workflow: align before translating, because translated text no longer matches the original audio.
| Feature | API Key | Note |
|---|---|---|
| Translation | None required | Uses Claude by default, works out of the box |
| Transcription | Gemini API | Optional, only needed for transcription |
| LaiCut alignment | LattifAI API | Optional, only needed for precise alignment |
Gemini is only used for video transcription. If a video has no captions, you'll be asked whether to transcribe it, and you can configure the Gemini API key at that point. Translation uses Claude by default and works out of the box.
You are prompted for API keys automatically, and they are saved to ~/.config/omnicaptions/config.json
# With captions: download → align → translate
omnicaptions download "https://youtube.com/watch?v=xxx"
omnicaptions LaiCut video.mp4 video.en.vtt -o video_LaiCut.srt
omnicaptions translate video_LaiCut.srt -l zh --bilingual
# Without captions: transcribe → align → translate
omnicaptions transcribe video.mp4
omnicaptions LaiCut video.mp4 video_GeminiUnd.md -o video_LaiCut.srt
omnicaptions translate video_LaiCut.srt -l zh --bilingual
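The two pipelines above differ only in where the caption text comes from. The POSIX sh sketch below picks between them for a local video, using only the subcommands and flags shown above. caption_pipeline is a hypothetical wrapper, not an omnicaptions command, and the file names it derives (video_GeminiUnd.md, video_LaiCut.srt) are assumptions taken from the examples; actual output names may differ.

```sh
# Hypothetical wrapper (not an omnicaptions command) around the two pipelines
# above. Output file names follow the examples and are assumptions.
# Usage: caption_pipeline VIDEO [CAPTIONS] [LANG]
caption_pipeline() {
  video=$1
  captions=${2:-}
  lang=${3:-zh}
  base=${video%.*}                      # video.mp4 -> video
  if [ -z "$captions" ]; then
    # No captions: transcribe first (needs a Gemini API key).
    omnicaptions transcribe "$video"
    captions="${base}_GeminiUnd.md"     # name assumed from the example above
  fi
  # Align before translating so timings come from the original-language text.
  omnicaptions LaiCut "$video" "$captions" -o "${base}_LaiCut.srt"
  omnicaptions translate "${base}_LaiCut.srt" -l "$lang" --bilingual
}
```

caption_pipeline video.mp4 video.en.vtt runs the first pipeline; caption_pipeline video.mp4 runs the second. Keeping LaiCut ahead of translate follows the recommended align-before-translate order.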
Credits: @dotey for the transcription prompt | Built on lattifai-captions
npx claudepluginhub lattifai/omni-captions-skills --plugin omnicaptions
Audio-text alignment, transcription, translation, karaoke, and subtitle toolkit. Built on the Agent Skills standard: works in Claude Code, Codex CLI, Gemini CLI, and any agent that loads SKILL.md files. Powered by the LattifAI Lattice-1 forced-alignment model.