Generate AI voice content and videos from text via CLI: synthesize speech in 200+ voices across 40+ languages, create multi-speaker podcasts, transcribe audio/video with word-level timestamps, dub videos from SRT, translate videos end-to-end, and turn articles into vertical card videos or shareable card images.
Use when the user wants to turn text content into a set of polished, shareable visual CARD IMAGES or narrated card VIDEOS — knowledge cards, quote cards, 小红书图文, carousel cards, poster cards — rendered as HTML/CSS and exported via Playwright at ratios like 1:1 / 3:4 / 9:16; optionally produces a narrated MP4 video from those cards via `voxflow card render` (per-card TTS + FFmpeg static-image clips with optional subtitle bar / intro+outro cards / BGM mix). Triggers: card / 卡片 / 知识卡 / 文字卡片 / 金句卡 / 图文卡片 / 卡片生成 / make cards / card video / 卡片视频. For article → Slice-themed card VIDEO use voxflow:slice; for short videos / AI clips use voxflow:video; for podcasts use voxflow:podcast.
Use when the user wants to read text aloud (TTS), search VoxFlow voices, sample AI stories, or set up VoxFlow install/auth/quota — the entry-point voice toolkit. For podcasts use voxflow:podcast; for short videos / AI clips use voxflow:video; for article-to-card reels (Slice) use voxflow:slice; for shareable card images or narrated card videos use voxflow:card; for transcription / dubbing / subtitle translation use voxflow:transcribe.
Use when the user wants to produce a multi-speaker AI podcast from a topic, URL, or script — covers the full CLI workflow from LLM dialogue generation to per-speaker TTS synthesis and final audio export (MP3/WAV).
Use when the user wants to turn a long article / note / report into a vertical 1080×1920 card video — VoxFlow Slice. 13 themes — paper-slide (纸面), editorial-mag (编辑刊), bold-poster (大字海报), notion-card (Notion 卡), brutalist (粗野), glass-dark (玻璃夜), editorial-stencil (编辑·海报), broadsheet (财经刊), blueprint (蓝晒图), daisy-pastel (雏菊), showa-catalog (昭和目录), photo-feature (摄影刊), atmospheric (深夜刊). Triggers — Slice / slice video / 切片视频 / 文章转视频 / 知识卡片视频 / 抖音知识号 / 小红书图文转视频 / 知乎长文转视频 / 公众号转视频 / PaperSlide / paperslide / paper-slide (legacy name).
Use when the user wants to transcribe audio/video (including 30-min+ files with word-level timestamps via Azure Batch), translate subtitles, dub a video from SRT, run end-to-end video translation, summarize spoken content, or publish a finished translated video for Skill/agent orchestration. Covers asr, asr-jobs, translate, dub, video-translate, summarize, and publish CLI commands.
Skill files in this repo are auto-overwritten on each release. For feedback or fixes, please open an issue instead of submitting a PR.

Voice in your AI workflow. Six skills that let any AI coding agent (Claude Code · Cursor · Codex · Gemini CLI · Cline) speak in 200+ voices, generate podcasts, dub videos, transcribe audio, and turn text into card images or narrated card videos — through one CLI.
Why VoxFlow over a raw TTS API? One CLI handles auth, voice search, multi-speaker dialogue, video pipelines, and quota. The skills layer makes it native to whichever agent you already use, so there is no extra tool to switch to.
The simplest path — let the VoxFlow CLI auto-detect your agent and run the right command:
npm install -g voxflow
voxflow skills install
It detects Claude Code / Cursor / Codex / Gemini / WorkBuddy / OpenClaw on your $PATH, picks the right install command, asks for confirmation, runs it, and prints next steps. Use --all to install for every detected agent, or --for <agent> to force one.
If you'd rather run the install command directly — one command for every agent:
npx -y skills add VoxFlowStudio/skills --all --yes --global
The skills npm package detects every AI agent on your machine (Claude Code,
Cursor, Codex CLI, Gemini CLI, Cline, Amp, Antigravity, CodeBuddy, OpenClaw…)
and writes the 6 VoxFlow skills (hub, podcast, transcribe, video,
slice, card) to each agent's standard skill location in a single shot.
npm install -g voxflow
voxflow login # one-time browser auth
Six focused skills, each loaded on demand:
| Skill | Invoked as | What it covers |
|---|---|---|
| hub | voxflow:hub | say · narrate · story · voices · auth · quota · feedback |
| podcast | voxflow:podcast | Multi-speaker AI podcast from topic / URL / script |
| transcribe | voxflow:transcribe | asr · asr-jobs · translate · dub · video-translate · summarize · publish |
| video | voxflow:video | picstory · present · slides · explain · image |
| slice | voxflow:slice | Article → vertical card video (1080×1920); 13 editorial / poster / magazine themes |
| card | voxflow:card | Text → shareable card images (HTML/CSS + Playwright); 1:1 / 3:4 / 9:16, editorial design system. Optional narrated MP4 video via voxflow card render (TTS + FFmpeg, in-project output) |
Use the hub skill as the starting point — it routes to the others automatically.
| You say | Agent runs |
|---|---|
| "Read this README out loud" | voxflow narrate README.md -o readme.mp3 |
| "Make a 5-min podcast on AI agents" | voxflow podcast "AI agents" --length short |
| "Dub this tutorial into Japanese" | voxflow video-translate tutorial.mp4 --to ja |
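| "把这段话合成语音" (Synthesize this text as speech) | voxflow say "..." -o output.mp3 |
| "生成一个 AI 播客" (Generate an AI podcast) | voxflow podcast "topic" --length medium |
| "把这个视频翻译成日语" (Translate this video into Japanese) | voxflow video-translate video.mp4 --to ja |
| "转录这段录音" (Transcribe this recording) | voxflow asr recording.mp3 |
| "做一个 AI 知识短视频" (Make an AI explainer short video) | voxflow picstory "topic" --style sketchnote |
| "生成一套演示幻灯片" (Generate a set of presentation slides) | voxflow slides "topic" --slides 8 |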
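
The narrate command in the table above can be looped over a folder of documents. The following is a minimal sketch, assuming only the `voxflow narrate <file> -o <output>` form shown in the table; the function name `narrate_dir` and the `VOXFLOW_BIN` override are hypothetical and exist only so the loop can be dry-run without calling the real CLI.

```shell
# Hypothetical helper: narrate every Markdown file in a directory, one MP3 per file.
# VOXFLOW_BIN is an assumed override (not a VoxFlow feature) for dry runs, e.g. VOXFLOW_BIN=echo.
narrate_dir() {
  local dir=${1:-.} bin=${VOXFLOW_BIN:-voxflow} f
  for f in "$dir"/*.md; do
    [ -e "$f" ] || continue          # skip when the glob matches nothing
    "$bin" narrate "$f" -o "${f%.md}.mp3"
  done
}
```

Running `VOXFLOW_BIN=echo narrate_dir docs` prints the commands that would run, which is a cheap way to check the output paths before spending quota.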
voxflow/
  hub/SKILL.md          # TTS, voice search, auth, quota, feedback
  podcast/SKILL.md      # AI dialogue podcast
  transcribe/SKILL.md   # ASR, translation, dubbing
  video/SKILL.md        # AI short video, slides, images
  slice/SKILL.md        # Article → vertical card video (13 themes)
  card/SKILL.md         # Text → shareable card images (1:1 / 3:4 / 9:16)
registry.json           # VoxFlow CLI add-on recipes index
voxflow/
  dub-anime-jp-zh/      # Anime fan-dub voice preset (JP→ZH)
Install a recipe:
voxflow add dub-anime-jp-zh
Free tier: 10,000 / month. Check before large jobs:
voxflow status
| Operation | Cost |
|---|---|
| say (1 call) | ~100 |
| narrate (per segment) | ~100 |
| podcast (medium) | ~5,000 |
| picstory (5 scenes) | ~3,100 |
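
The approximate costs in the table can be added up before a large batch. This is a rough sketch under stated assumptions: it uses the table's approximate figures and the 10,000 per month free tier, and the function names `estimate_quota` and `fits_free_tier` are hypothetical helpers, not VoxFlow commands. Actual consumption may differ, so `voxflow status` remains the authoritative check.

```shell
# Hypothetical helper: rough quota estimate from the approximate per-operation costs.
# Arguments: <say calls> <narrate segments> <medium podcasts> <5-scene picstory runs>
estimate_quota() {
  local say=${1:-0} narrate=${2:-0} podcast=${3:-0} picstory=${4:-0}
  echo $(( say * 100 + narrate * 100 + podcast * 5000 + picstory * 3100 ))
}

# Succeeds when the estimate fits within the free-tier monthly allowance (10,000).
fits_free_tier() {
  local total
  total=$(estimate_quota "$@")
  [ "$total" -le 10000 ]
}
```

For example, `estimate_quota 10 20 1 0` prints `8000` (10 say calls, 20 narrate segments, one medium podcast), which fits in the free tier, while one podcast plus two picstory runs comes to about 11,200 and does not.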
Use voxflow login for interactive auth, or set the VOXFLOW_TOKEN environment variable for CI.
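
In CI it can help to fail fast when the token is missing, before a job reaches its first synthesis call. This is a minimal sketch, assuming only that the CLI reads `VOXFLOW_TOKEN` from the environment as described above; the function name `require_voxflow_token` is hypothetical.

```shell
# Hypothetical guard: stop early if no token is available in the environment.
require_voxflow_token() {
  if [ -z "${VOXFLOW_TOKEN:-}" ]; then
    echo "VOXFLOW_TOKEN is not set; run 'voxflow login' locally or export the token in CI." >&2
    return 1
  fi
}
```

A CI step could then run `require_voxflow_token && voxflow status`, so the quota check only happens once a token is present.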