Skill

audio

Generates audio narration of blog posts using Google Gemini TTS with summary, full read-aloud, and two-speaker podcast modes. Outputs MP3 with HTML5 embed code.

Python

FFmpeg

documentation

Invocation

How this skill is triggered — by the user, by Claude, or both

Slash command

/smart-blog-skills:audio [generate|voices|setup] [file-or-text] [--mode summary|full|dialogue] [--voice name]

Argument hint: [generate|voices|setup] [file-or-text] [--mode summary|full|dialogue] [--voice name]

Context Preview

The summary Claude sees in its skill listing — used to decide when to auto-load this skill

Generate professional audio narration of blog content using Google's Gemini TTS.

SKILL.md

215 lines · ~1.8k tokens

Stats

Language: Python

Stars: 35

Forks: 3

Maintenance: Excellent

Last Commit: May 30, 2026

Blog Audio -- Gemini TTS Narration for Blog Posts

Generate professional audio narration of blog content using Google's Gemini TTS. Three modes: summary (200-300 word spoken overview), full article read-aloud, or two-speaker podcast dialogue. 30 voices, 80+ languages, HTML5 embed output.

Quick Reference

Command	What it does
`/smart-blog-skills:audio generate <file>`	Generate audio narration of a blog post
`/smart-blog-skills:audio voices`	Show available voices with characteristics
`/smart-blog-skills:audio setup`	Check/configure API key for Gemini TTS

Prerequisites

Python 3.11+ (venv managed automatically by run.py)
GOOGLE_AI_API_KEY environment variable (the same key used by /smart-blog-skills:image)
FFmpeg (for WAV-to-MP3 conversion; falls back to WAV if missing)

Always Use run.py Wrapper

# CORRECT:
python3 scripts/run.py generate_audio.py --text "..." --voice Charon --json

# WRONG:
python3 scripts/generate_audio.py --text "..."  # Fails without venv

API Key Check (Gate Pattern)

Before generating audio, check for the API key:

# Report whether the key is set without printing its value
[ -n "$GOOGLE_AI_API_KEY" ] && echo "set" || echo "not set"

If set: proceed with generation
If not set: guide the user: "Audio generation requires a Google AI API key. Get one free at https://aistudio.google.com/apikey, then set it with: export GOOGLE_AI_API_KEY=your-key. This is the same key used by /smart-blog-skills:image, so if image generation works, audio works too."
When called internally (from write): return silently if key is missing. Never block the writing workflow.
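
The gate above can be sketched in Python as follows. This is a minimal illustration, not code from the skill's scripts: the function name `check_api_key` and the `internal_call` flag are hypothetical, and only the environment variable name and the guidance text come from this document.

```python
import os


def check_api_key(internal_call=False, env=None):
    """Return (ok, message); message is None when nothing should be shown.

    Hypothetical helper: the name and signature are assumptions.
    """
    env = os.environ if env is None else env
    if env.get("GOOGLE_AI_API_KEY", "").strip():
        return True, None
    if internal_call:
        # Internal call from another workflow: skip audio silently, never block.
        return False, None
    return False, (
        "Audio generation requires a Google AI API key. "
        "Get one free at https://aistudio.google.com/apikey "
        "Then set it: export GOOGLE_AI_API_KEY=your-key"
    )
```

A caller would proceed to generation only when the first element is true, and show the message to the user only when it is not None.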

Setup

For /smart-blog-skills:audio setup:

Check if GOOGLE_AI_API_KEY is set in environment
If /smart-blog-skills:image is already configured, the same key is available and no further setup is needed
If not, guide user to https://aistudio.google.com/apikey
Verify with a dry run: python3 scripts/run.py generate_audio.py --text "Test" --dry-run --json

Voice Selection

For /smart-blog-skills:audio voices:

Ask the user which voice they prefer, or recommend based on content type:

Article narration: Charon (Informative) or Sadaltager (Knowledgeable)
Tutorial/how-to: Achird (Friendly) or Sulafat (Warm)
News/analysis: Rasalgethi (Informative) or Schedar (Even)
Lifestyle/wellness: Aoede (Breezy) or Vindemiatrix (Gentle)
Dialogue host: Puck (Upbeat) or Laomedeia (Upbeat)
Dialogue expert: Kore (Firm) or Charon (Informative)
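
The recommendations above could be held in a small lookup table. The sketch below is illustrative only: the dictionary keys (`article`, `tutorial`, and so on) and the function `recommend_voice` are hypothetical names chosen for this example, while the voice names are the ones listed above.

```python
# Content type -> (first choice, alternative). Keys are hypothetical labels.
VOICE_RECOMMENDATIONS = {
    "article": ("Charon", "Sadaltager"),
    "tutorial": ("Achird", "Sulafat"),
    "news": ("Rasalgethi", "Schedar"),
    "lifestyle": ("Aoede", "Vindemiatrix"),
    "dialogue_host": ("Puck", "Laomedeia"),
    "dialogue_expert": ("Kore", "Charon"),
}


def recommend_voice(content_type, user_choice=None):
    """Prefer the user's explicit choice; otherwise return the first recommendation."""
    if user_choice:
        return user_choice
    # Unknown content types fall back to the article narration voices.
    first, _alternative = VOICE_RECOMMENDATIONS.get(
        content_type, VOICE_RECOMMENDATIONS["article"]
    )
    return first
```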

Generation Workflow

For /smart-blog-skills:audio generate <file>:

Step 1: Read the Blog Post

Read the file and extract:

Title (from H1 or frontmatter)
Full content (markdown body)
Approximate word count
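
As a rough sketch of this extraction step, assuming YAML-style frontmatter delimited by `---` lines and a `title:` key (both assumptions; the skill does not specify the frontmatter format), the three values could be pulled out like this. The function name `read_post` is hypothetical.

```python
import re


def read_post(markdown_text):
    """Extract title, body, and an approximate word count from a markdown post."""
    title = None
    body = markdown_text
    # Frontmatter is assumed to be a '---' delimited block at the very top.
    fm = re.match(r"^---\s*\n(.*?)\n---\s*\n", markdown_text, flags=re.DOTALL)
    if fm:
        body = markdown_text[fm.end():]
        t = re.search(r"^title:\s*(.+)$", fm.group(1), flags=re.MULTILINE)
        if t:
            title = t.group(1).strip().strip("\"'")
    if title is None:
        # Fall back to the first H1 in the body.
        h1 = re.search(r"^#\s+(.+)$", body, flags=re.MULTILINE)
        if h1:
            title = h1.group(1).strip()
    # Whitespace split: approximate, since markdown markers count as words.
    word_count = len(body.split())
    return {"title": title, "body": body, "word_count": word_count}
```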

Step 2: Choose Mode

Ask the user (or auto-select if they specified --mode):

Mode	When to use	Output
Summary	Quick audio overview (1-2 min)	200-300 word spoken summary
Full	Complete read-aloud (5-15 min)	Full article as natural speech
Dialogue	Podcast-style (3-8 min)	Two-person conversation about the article

Step 3: Prepare Text

CRITICAL: Claude prepares the text. The script does TTS only.

Summary mode: Write a 200-300 word spoken summary of the article. Rules:

Write as natural speech, not written text
Open with the article's key finding or answer
Cover 3-5 main takeaways
Close with actionable advice
No markdown, no "In this article...", no meta-commentary
Use conversational transitions ("Here's what matters...", "The key finding is...")
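
Some of these rules can be checked mechanically before the text is sent to TTS. The helper below is a hypothetical sketch (the name `check_summary` and the exact heuristics are assumptions); it covers only the word count, stray markdown characters, and the "In this article" opener, and cannot judge tone or structure.

```python
import re


def check_summary(text):
    """Return a list of rule violations; an empty list means no problems were found."""
    problems = []
    n = len(text.split())
    if not 200 <= n <= 300:
        problems.append(f"word count {n} outside 200-300")
    # Heuristic: these characters usually indicate leftover markdown.
    if re.search(r"[#*`\[\]]", text):
        problems.append("contains markdown characters")
    if re.search(r"\bin this article\b", text, flags=re.IGNORECASE):
        problems.append("contains meta-commentary")
    return problems
```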

Full mode: Strip the markdown content to clean spoken text:

Headings become natural transitions ("Next, let's look at...")
Links become plain text (remove URLs, keep anchor text)
Images and charts: omit or briefly describe ("As the data shows...")
Code blocks: describe verbally ("The code uses a for-loop to...")
Lists: convert to natural sentences
Remove frontmatter, schema markup, HTML tags
Add brief intro: "This is [title], published on [date]."
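
The mechanical parts of this cleanup can be approximated with regular expressions. The sketch below is an assumption-laden illustration, not the skill's implementation: `strip_markdown` is a hypothetical name, it handles only frontmatter, images, links, HTML tags, headings, and emphasis markers, and it leaves code blocks, lists, and the intro line to be rewritten by hand as described above.

```python
import re


def strip_markdown(body):
    """Convert a subset of markdown to plain spoken text (heuristic)."""
    text = re.sub(r"^---\s*\n.*?\n---\s*\n", "", body, flags=re.DOTALL)  # frontmatter
    text = re.sub(r"!\[[^\]]*\]\([^)]*\)", "", text)      # images: omit
    text = re.sub(r"\[([^\]]+)\]\([^)]*\)", r"\1", text)  # links: keep anchor text
    text = re.sub(r"<[^>]+>", "", text)                    # HTML tags
    # Headings become a simple spoken transition.
    text = re.sub(
        r"^#{1,6}\s+(.+)$", r"Next, let's look at \1.", text, flags=re.MULTILINE
    )
    text = re.sub(r"(\*\*|__|\*|`)", "", text)             # emphasis, inline code markers
    text = re.sub(r"\n{3,}", "\n\n", text)                 # collapse blank-line runs
    return text.strip()
```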

Dialogue mode: Write a 2-person conversation script about the article:

Speaker1 = Host (curious, asks good questions)
Speaker2 = Expert (knowledgeable, gives clear answers)
Format each line as: [Speaker1] What's the key takeaway here?
Cover the article's main points conversationally
15-25 exchanges (produces ~3-8 minutes)
Natural, not stilted
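
Before handing a dialogue script to the TTS step, it may help to confirm that every line follows the `[Speaker1]` / `[Speaker2]` format. The parser below is a hypothetical sketch; `parse_dialogue` is not part of the skill's scripts, and treating blank lines as separators is an assumption.

```python
import re

# One spoken line: a speaker tag, whitespace, then the text.
LINE_RE = re.compile(r"^\[(Speaker[12])\]\s+(.+)$")


def parse_dialogue(script):
    """Return a list of (speaker, text) tuples; raise ValueError on a malformed line."""
    turns = []
    for lineno, raw in enumerate(script.splitlines(), start=1):
        line = raw.strip()
        if not line:
            continue  # blank lines between turns are ignored
        m = LINE_RE.match(line)
        if not m:
            raise ValueError(
                f"line {lineno}: expected '[Speaker1] ...' or '[Speaker2] ...'"
            )
        turns.append((m.group(1), m.group(2)))
    return turns
```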

Step 4: Select Voice

If the user chose a voice, use it. Otherwise, recommend based on mode:

Summary/Full: default to Charon (Informative)
Dialogue: default to Puck (Host) + Kore (Expert)

Step 5: Generate Audio

Write the prepared text to a temp file, then call:

# Single voice (summary or full mode)
python3 scripts/run.py generate_audio.py \
  --text-file /tmp/blog_audio_prepared.txt \
  --voice Charon \
  --model flash \
  --output /path/to/audio/post-slug.mp3 \
  --json

# Two voices (dialogue mode)
python3 scripts/run.py generate_audio.py \
  --text-file /tmp/blog_audio_dialogue.txt \
  --voice Puck \
  --voice2 Kore \
  --model pro \
  --output /path/to/audio/post-slug-dialogue.mp3 \
  --json

Model selection:

flash (default): Fast, cheap. Good for summaries and standard narration.
pro: Higher quality. Use for dialogue mode or premium content.
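
If the call is driven from Python rather than typed by hand, the argument list can be assembled as below. The flags are the ones shown in the commands above; the helper names `build_command` and `run_generation` are hypothetical, and defaulting to `pro` whenever a second voice is given is an assumption based on the model guidance. The format of the script's `--json` output is not documented here, so the sketch returns raw stdout without parsing it.

```python
import subprocess


def build_command(text_file, output, voice="Charon", voice2=None, model=None):
    """Build the run.py argument list for single-voice or dialogue generation."""
    if model is None:
        # Assumption: dialogue (two voices) uses pro, everything else flash.
        model = "pro" if voice2 else "flash"
    cmd = [
        "python3", "scripts/run.py", "generate_audio.py",
        "--text-file", text_file,
        "--voice", voice,
    ]
    if voice2:
        cmd += ["--voice2", voice2]
    cmd += ["--model", model, "--output", output, "--json"]
    return cmd


def run_generation(cmd):
    """Run the command and return its stdout unparsed."""
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout
```

Passing the arguments as a list avoids shell quoting problems with paths that contain spaces.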

Step 6: Deliver

Present the result to the user:

File path -- where the audio was saved
Duration -- human-readable (e.g., "3:42")
Embed code -- ready-to-paste HTML5 audio tag
Cost -- estimated API cost
Placement suggestion -- where to insert the embed in the blog post
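
Two of these items are easy to derive programmatically. The sketch below formats a duration in seconds as minutes and seconds and builds the standard HTML5 embed; both function names are hypothetical, and it assumes the duration is already known in seconds.

```python
from html import escape


def format_duration(seconds):
    """Format a duration in seconds as M:SS, e.g. 222 -> '3:42'."""
    total = int(round(seconds))
    minutes, secs = divmod(total, 60)
    return f"{minutes}:{secs:02d}"


def embed_code(src):
    """Return a ready-to-paste HTML5 audio tag for an MP3 at the given path."""
    return (
        '<audio controls preload="metadata">\n'
        f'  <source src="{escape(src, quote=True)}" type="audio/mpeg">\n'
        "  Your browser does not support the audio element.\n"
        "</audio>"
    )
```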

Embedding Guide

Standard HTML (Hugo, Jekyll, static sites)

<audio controls preload="metadata">
  <source src="audio/post-slug.mp3" type="audio/mpeg">
  Your browser does not support the audio element.
</audio>

MDX (Next.js, Gatsby)

<audio controls preload="metadata">
  <source src="/audio/post-slug.mp3" type="audio/mpeg" />
</audio>

WordPress

[audio src="audio/post-slug.mp3"]

Placement

Insert the audio player after the introduction (below the first H2) or at the very top of the article with a label: "Listen to this article" or "Audio version".

Error Handling

Error	Resolution
GOOGLE_AI_API_KEY not set	Get key at https://aistudio.google.com/apikey
FFmpeg not found	Install FFmpeg (on Debian/Ubuntu: `sudo apt install ffmpeg`). Until it is installed, output falls back to WAV.
Rate limited	Wait and retry.
Text too long (>32k tokens)	Split into sections, generate separately
Unknown voice name	Run `/smart-blog-skills:audio voices` to see valid options
API key missing (internal call)	Return silently -- writing workflow continues
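
For the "Text too long" case, one possible way to split prepared text is on paragraph boundaries with a size budget. The sketch below is an assumption, not the skill's behavior: `split_into_sections` is a hypothetical name, and `max_chars` is a rough character-based stand-in for the token limit, since exact token counts depend on the model's tokenizer.

```python
def split_into_sections(text, max_chars=8000):
    """Group paragraphs into chunks of at most max_chars characters.

    A single paragraph longer than max_chars is kept whole as its own chunk.
    """
    chunks, current = [], ""
    for para in text.split("\n\n"):
        para = para.strip()
        if not para:
            continue
        candidate = f"{current}\n\n{para}" if current else para
        if len(candidate) > max_chars and current:
            # Adding this paragraph would exceed the budget: start a new chunk.
            chunks.append(current)
            current = para
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```

Each chunk can then be generated separately and the resulting audio files joined afterwards.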
