Generates audio narration of blog posts using Google Gemini TTS with summary, full read-aloud, and two-speaker podcast modes. Outputs MP3 with HTML5 embed code.
Usage: `/smart-blog-skills:audio [generate|voices|setup] [file-or-text] [--mode summary|full|dialogue] [--voice name]`
Generate professional audio narration of blog content using Google's Gemini TTS. Three modes: summary (200-300 word spoken overview), full article read-aloud, or two-speaker podcast dialogue. 30 voices, 80+ languages, HTML5 embed output.
| Command | What it does |
|---|---|
| `/smart-blog-skills:audio generate <file>` | Generate audio narration of a blog post |
| `/smart-blog-skills:audio voices` | Show available voices and their characteristics |
| `/smart-blog-skills:audio setup` | Check or configure the API key for Gemini TTS |
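The slash-command arguments follow a small grammar. As an illustration only, a hypothetical `parse_audio_args` helper built on Python's standard `argparse` module could mirror it. Claude reads these arguments itself, so this is not the skill's actual parser; leaving `--mode` unset corresponds to asking the user which mode to use.

```python
import argparse
from typing import List

def parse_audio_args(argv: List[str]) -> argparse.Namespace:
    """Hypothetical parser mirroring the documented usage line."""
    parser = argparse.ArgumentParser(prog="/smart-blog-skills:audio")
    # First positional: which action to run.
    parser.add_argument("subcommand", choices=["generate", "voices", "setup"])
    # Optional second positional: a blog post path or literal text.
    parser.add_argument("source", nargs="?", default=None)
    # None means "ask the user which mode to use".
    parser.add_argument("--mode", choices=["summary", "full", "dialogue"], default=None)
    parser.add_argument("--voice", default=None)
    return parser.parse_args(argv)
```

For example, `parse_audio_args(["generate", "post.md", "--mode", "dialogue"])` yields `subcommand="generate"`, `source="post.md"`, `mode="dialogue"`, while an invalid `--mode` value makes `argparse` exit with a usage error.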
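Requirements: Python 3 (scripts are run through `scripts/run.py`) and the `GOOGLE_AI_API_KEY` environment variable (the same key used by `/smart-blog-skills:image`). Always invoke scripts through `run.py`; calling a script directly fails without the venv:

```bash
# CORRECT:
python3 scripts/run.py generate_audio.py --text "..." --voice Charon --json
# WRONG:
python3 scripts/generate_audio.py --text "..."  # Fails without venv
```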
Before generating audio, check for the API key:

```bash
echo $GOOGLE_AI_API_KEY
```

If it is empty, set it:

```bash
export GOOGLE_AI_API_KEY=your-key
```
This is the same key used by `/smart-blog-skills:image`, so if image generation works, audio works too.

For `/smart-blog-skills:audio setup`:
1. Check that `GOOGLE_AI_API_KEY` is set in the environment.
2. Run a dry run:

```bash
python3 scripts/run.py generate_audio.py --text "Test" --dry-run --json
```

For `/smart-blog-skills:audio voices`:
Ask the user which voice they prefer, or recommend based on content type:
For /smart-blog-skills:audio generate <file>:
Read the file and extract:
Ask the user (or auto-select if they specified --mode):
| Mode | When to use | Output |
|---|---|---|
| Summary | Quick audio overview (1-2 min) | 200-300 word spoken summary |
| Full | Complete read-aloud (5-15 min) | Full article as natural speech |
| Dialogue | Podcast-style (3-8 min) | Two-person conversation about the article |
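The word counts and durations in the table are consistent with a narration pace of roughly 150 words per minute (200 words is about 1.3 minutes, 300 words is 2 minutes). That pace is an assumption, not a documented Gemini TTS figure. A sketch for checking a prepared summary against the 200-300 word target and estimating its spoken length:

```python
# Assumed average narration pace; not a documented Gemini TTS value.
WORDS_PER_MINUTE = 150

def word_count(text: str) -> int:
    """Count whitespace-separated words in the prepared text."""
    return len(text.split())

def estimate_minutes(text: str, wpm: int = WORDS_PER_MINUTE) -> float:
    """Rough spoken duration in minutes at the assumed pace."""
    return word_count(text) / wpm

def summary_in_range(text: str, low: int = 200, high: int = 300) -> bool:
    """True if the summary falls within the 200-300 word target."""
    return low <= word_count(text) <= high
```

For example, a 250-word summary passes the range check and estimates to about 1.7 minutes.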
CRITICAL: Claude prepares the text. The script does TTS only.
Summary mode: Write a 200-300 word spoken summary of the article. Rules:
Full mode: Strip the markdown content to clean spoken text:
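A minimal regex-based sketch of this stripping step, assuming standard Markdown input. The rules here are illustrative, not the skill's own: it simply drops fenced code blocks and images, keeps link text without URLs, and removes heading, list, quote, and emphasis markers.

```python
import re

def strip_markdown(md: str) -> str:
    """Convert Markdown to plain text suitable for read-aloud (illustrative rules)."""
    text = re.sub(r"```.*?```", "", md, flags=re.S)                        # drop fenced code blocks
    text = re.sub(r"!\[[^\]]*\]\([^)]*\)", "", text)                        # drop images
    text = re.sub(r"\[([^\]]+)\]\([^)]*\)", r"\1", text)                    # keep link text, drop URL
    text = re.sub(r"`([^`]+)`", r"\1", text)                                # unwrap inline code
    text = re.sub(r"^#{1,6}[ \t]+", "", text, flags=re.M)                   # heading markers
    text = re.sub(r"^[ \t]*(?:[-*+]|\d+\.)[ \t]+", "", text, flags=re.M)    # list markers
    text = re.sub(r"^[ \t]*>[ \t]?", "", text, flags=re.M)                  # blockquote markers
    text = re.sub(r"\*{1,3}([^*\n]+)\*{1,3}", r"\1", text)                  # bold/italic asterisks
    text = re.sub(r"<[^>]+>", "", text)                                     # inline HTML tags
    text = re.sub(r"\n{3,}", "\n\n", text)                                  # collapse blank-line runs
    return text.strip()
```

List markers are removed before emphasis so that a leading `*` bullet is not mistaken for an opening asterisk.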
Dialogue mode: Write a 2-person conversation script about the article:
[Speaker1] What's the key takeaway here?

If the user chose a voice, use it. Otherwise, recommend one based on the mode:
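A sketch of this fallback logic, using the voices from the example commands in the next step as assumed defaults (`Charon` for summary and full modes, `Puck` and `Kore` for dialogue). These are placeholders, not the skill's documented recommendations, and `pick_voices` is a hypothetical helper.

```python
from typing import Optional, Tuple

# Hypothetical fallbacks, taken from the example commands in this document.
FALLBACK_VOICES = {
    "summary": ("Charon",),
    "full": ("Charon",),
    "dialogue": ("Puck", "Kore"),
}

def pick_voices(mode: str, voice: Optional[str] = None,
                voice2: Optional[str] = None) -> Tuple[str, ...]:
    """Return the user's choice where given, otherwise the fallback for the mode."""
    if mode not in FALLBACK_VOICES:
        raise ValueError(f"unknown mode: {mode!r}")
    fallback = FALLBACK_VOICES[mode]
    if mode == "dialogue":
        # Dialogue needs two voices; fill whichever one the user did not pick.
        return (voice or fallback[0], voice2 or fallback[1])
    return (voice or fallback[0],)
```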
Write the prepared text to a temp file, then call:
```bash
# Single voice (summary or full mode)
python3 scripts/run.py generate_audio.py \
  --text-file /tmp/blog_audio_prepared.txt \
  --voice Charon \
  --model flash \
  --output /path/to/audio/post-slug.mp3 \
  --json

# Two voices (dialogue mode)
python3 scripts/run.py generate_audio.py \
  --text-file /tmp/blog_audio_dialogue.txt \
  --voice Puck \
  --voice2 Kore \
  --model pro \
  --output /path/to/audio/post-slug-dialogue.mp3 \
  --json
```
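The two invocations differ only in `--voice2` and the model. A hypothetical `build_audio_command` helper that assembles the same argument list, for callers driving the script from Python, could look like this. It uses only the flags shown above and leaves the `--json` output uninterpreted, since its fields are not described in this section.

```python
from typing import List, Optional

def build_audio_command(text_file: str, output: str, mode: str, voice: str,
                        voice2: Optional[str] = None,
                        model: str = "flash") -> List[str]:
    """Assemble the run.py invocation shown above (hypothetical helper)."""
    if model not in ("flash", "pro"):
        raise ValueError(f"unknown model: {model!r}")
    cmd = ["python3", "scripts/run.py", "generate_audio.py",
           "--text-file", text_file, "--voice", voice]
    if mode == "dialogue":
        # Dialogue mode uses a second speaker voice.
        if not voice2:
            raise ValueError("dialogue mode needs a second voice (--voice2)")
        cmd += ["--voice2", voice2]
    cmd += ["--model", model, "--output", output, "--json"]
    return cmd
```

The resulting list can be passed to `subprocess.run(cmd, capture_output=True, text=True, check=True)` from the plugin's root directory.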
Model selection:

- `flash` (default): fast and cheap. Good for summaries and standard narration.
- `pro`: higher quality. Use for dialogue mode or premium content.

Present the result to the user:
Standard HTML:

```html
<audio controls preload="metadata">
  <source src="audio/post-slug.mp3" type="audio/mpeg">
  Your browser does not support the audio element.
</audio>
```

Self-closing variant (for JSX/MDX-style templates):

```html
<audio controls preload="metadata">
  <source src="/audio/post-slug.mp3" type="audio/mpeg" />
</audio>
```

Shortcode (WordPress `[audio]`):

```
[audio src="audio/post-slug.mp3"]
```
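The standard and shortcode snippets differ only in the file path. A hypothetical `audio_embed` helper that renders either one for a given path, escaping it with the standard library's `html.escape` so a quote in the filename cannot break the attribute:

```python
import html

def audio_embed(src: str, style: str = "html") -> str:
    """Render the audio player snippet for `src` (hypothetical helper)."""
    safe = html.escape(src, quote=True)  # escape &, <, >, and quotes
    if style == "html":
        return ('<audio controls preload="metadata">\n'
                f'  <source src="{safe}" type="audio/mpeg">\n'
                '  Your browser does not support the audio element.\n'
                '</audio>')
    if style == "shortcode":
        return f'[audio src="{safe}"]'
    raise ValueError(f"unknown style: {style!r}")
```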
Insert the audio player after the introduction (below the first H2) or at the very top of the article with a label: "Listen to this article" or "Audio version".
| Error | Resolution |
|---|---|
| `GOOGLE_AI_API_KEY` not set | Get a key at https://aistudio.google.com/apikey |
| FFmpeg not found | Install it (`sudo apt install ffmpeg`). Without FFmpeg, output falls back to WAV. |
| Rate limited | Wait and retry. |
| Text too long (>32k tokens) | Split into sections and generate each separately. |
| Unknown voice name | Run `/smart-blog-skills:audio voices` to see valid options. |
| API key missing (internal call) | Return silently; the writing workflow continues. |
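Two of these errors can be caught before calling the API. A minimal preflight sketch (a hypothetical `preflight` helper, not part of the skill's scripts) that checks for the key and for FFmpeg, reporting WAV as the expected format when FFmpeg is missing. When audio is generated as an internal step of the writing workflow, a failed check would mean skipping audio silently.

```python
import os
import shutil
from typing import Mapping, Optional

def preflight(env: Optional[Mapping[str, str]] = None) -> dict:
    """Check prerequisites for generate_audio.py (hypothetical helper)."""
    env = os.environ if env is None else env
    problems = []
    if not env.get("GOOGLE_AI_API_KEY"):
        problems.append("GOOGLE_AI_API_KEY not set: get a key at "
                        "https://aistudio.google.com/apikey")
    # Per the error table, the script falls back to WAV without FFmpeg.
    output_format = "mp3" if shutil.which("ffmpeg") else "wav"
    return {"ok": not problems, "problems": problems,
            "output_format": output_format}
```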
Install: `npx claudepluginhub rainday/smart-blog-skills --plugin smart-blog-skills`