From ai-business-skills
Voice cloning, podcast, audiobook, and voiceover production using ElevenLabs, Murf, and PlayHT. Supports short clips, 30-60 min podcasts, and 1-to-10 repurposing (one podcast into ten clips).
Slash command: `/ai-business-skills:25-voice-clone-podcast-global`
> **This skill focuses on audio AI**: voice clone, podcast, audiobook, voiceover. Pairs with `24-ai-avatar-production-global` (video); combine both for full content stack coverage.
Audio AI is the technology behind synthetic voices that sound nearly human. From a sample of your voice, a model learns to produce a synthetic copy (a voice clone). You then write text and the clone reads it back (text-to-speech, TTS).
Differences vs video AI:
| Situation | Pick audio AI | Pick video AI |
|---|---|---|
| Long-form content (>10 min) | YES — podcast format | NO — too long for video |
| Don't want to be on camera | YES | NO |
| Need volume content fast | YES — 1 podcast = 10 shorts | YES but more expensive |
| Audience listens while driving / at gym | YES | NO |
| Need visuals to demo | NO | YES |
| Personal brand thought leader | YES — podcast = authority | YES — if face brand exists |
| Task | Time | Cost (USD/mo) |
|---|---|---|
| Voice clone setup | 30-60 min | $5-22 (ElevenLabs Starter/Pro) |
| 60s voiceover (TikTok) | 5-10 min | $5-22 |
| 30 min solo podcast | 1-2 hrs | $22-99 (ElevenLabs + Riverside) |
| Audiobook chapter (15 min) | 30-45 min | $22-99 |
| 1 podcast -> 10 clips | 1-2 hrs | $0-30 (Descript/Opus Clip) |
Ask up to 4 questions before starting:
Based on the answers, pick the appropriate use case + tool stack.
| Criterion | Minimum | Optimal |
|---|---|---|
| Length | 1 min (Free tier) | 3-5 min (Pro tier) |
| Room | Quiet, no echo | Acoustic treatment, rugs, curtains |
| Mic | Phone + headset mic | Condenser mic (AT2020, $80-100) |
| Distance | 20-30 cm | 15-20 cm with pop filter |
| Format | MP3 128 kbps | WAV 44.1 kHz |
| Content | One pre-written passage | Three passages: business / casual / emotional |
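A recording can be checked against the length and format rows of the table before uploading it. The sketch below is a minimal example using only Python's standard `wave` module; the function name `check_voice_sample` and the returned keys are illustrative assumptions, and it only handles WAV files (room, mic, and distance still need a human ear).

```python
import wave

# Thresholds taken from the table above. The function name and the shape
# of the returned dict are illustrative assumptions, not any tool's API.
MIN_SECONDS = 60          # "Minimum" length: 1 min
OPTIMAL_SECONDS = 180     # "Optimal" length starts at 3 min
OPTIMAL_RATE_HZ = 44_100  # "Optimal" format: WAV 44.1 kHz


def check_voice_sample(path: str) -> dict:
    """Report duration, sample rate, and the length tier a WAV file meets."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        seconds = wav.getnframes() / rate

    if seconds >= OPTIMAL_SECONDS:
        tier = "optimal"
    elif seconds >= MIN_SECONDS:
        tier = "minimum"
    else:
        tier = "too short"

    return {
        "seconds": round(seconds, 1),
        "sample_rate_hz": rate,
        "length_tier": tier,
        "rate_ok": rate >= OPTIMAL_RATE_HZ,
    }
```

Calling `check_voice_sample("my_sample.wav")` (a hypothetical file name) returns a small report you can read before deciding whether to re-record.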
Full reference: `references/voice-clone-prompts-global.md`, with sample scripts across English variants (US/UK/AU/SG/IN) and 3 topics (business / lifestyle / educational).
| Tool | English clone quality | Price/mo | Setup time | Best for |
|---|---|---|---|---|
| ElevenLabs Pro | Excellent (10/10) | $22 | 30 min | Multilingual, content creator |
| HeyGen Voice | Good (8/10) | Bundled with avatar | 15 min | Combo with video AI |
| Murf | Excellent (9/10) | $29-79 | 30 min | Corporate voiceover, e-learning |
| PlayHT | Excellent (9.5/10) | $39-99 | 30 min | API-driven, instant clone |
| Descript Overdub | Good (8/10) | $24 (Hobbyist) | 30 min | Podcast editing |
| Resemble.ai | Excellent (9/10) | $30-99 | 1 hr | Brand custom voice, emotion control |
Recommendations:
VOICE CLONE LICENSE AGREEMENT
I, [Full name], ID/passport: [number], grant [Brand/Company]:
1. Permission to use samples of my voice to create an AI voice clone.
2. Use of the voice clone in [scope: internal / advertising / podcast / etc.].
3. Term: from [DD/MM/YYYY] to [DD/MM/YYYY].
4. Right of withdrawal: I may request deletion of the voice clone at any time
in writing; the brand has 7 days to fully remove it.
5. Disclosure: the brand commits to disclose "AI-generated voice" wherever
required by applicable law (FTC, EU AI Act, etc.).
Signed: ____________ Date: ____________
Spec:
Script template (30s):
[HOOK 0-3s] "Did you know [shocking stat]?"
[PROBLEM 3-10s] "Most people are still stuck in [wrong loop]"
[SOLUTION 10-22s] "I tried [method], and here are 3 things..."
[PAYOFF 22-27s] "Result: [specific number]"
[CTA 27-30s] "Comment 'YES' to get the full breakdown"
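To keep a draft inside the 30-second template, it can help to turn each time slot into a rough word budget. The sketch below assumes a speaking rate of 150 words per minute, which is an assumption rather than a figure from this skill; adjust `WORDS_PER_MINUTE` to your own delivery. The segment names and timings mirror the template above.

```python
# Segment boundaries (in seconds) copied from the 30s script template.
SEGMENTS = [
    ("HOOK", 0, 3),
    ("PROBLEM", 3, 10),
    ("SOLUTION", 10, 22),
    ("PAYOFF", 22, 27),
    ("CTA", 27, 30),
]

WORDS_PER_MINUTE = 150  # assumed speaking rate; tune to your own voice


def word_budgets(segments=SEGMENTS, wpm=WORDS_PER_MINUTE):
    """Check that segments are contiguous and return a word cap per segment."""
    budgets = {}
    expected_start = 0
    for name, start, end in segments:
        if start != expected_start or end <= start:
            raise ValueError(f"{name} does not follow the previous segment")
        # Round down so the spoken line stays inside its time slot.
        budgets[name] = int((end - start) * wpm / 60)
        expected_start = end
    return budgets
```

At the assumed rate, the hook gets about 7 words and the solution about 30, which is a quick sanity check before generating audio.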
Voice settings (ElevenLabs):
Structure:
Pacing:
Sound design:
Voice settings (ElevenLabs):
Structure:
Pacing:
Consistency check (most important):
Voice settings (ElevenLabs):
| Tool | Price/mo | English quality | Multilingual | Setup | Pros | Cons | Best for |
|---|---|---|---|---|---|---|---|
| ElevenLabs | $5-99 | 10/10 | 30+ langs | 30 min | Best clone, multilingual | Pricier high tiers | Multilingual creator |
| HeyGen Voice | Bundle w/ avatar | 8/10 | 40+ langs | 15 min | Combo with avatar | Voice clone less expressive | Combo with video |
| Descript | $24-30 | 9/10 | EN focus | 30 min | Audio editing first | Multilingual weaker | Podcast editing |
| Riverside | $19-29 | n/a (recording) | n/a | 5 min | Studio recording | Not TTS | Live podcast |
| Murf | $29-79 | 9/10 | 20+ langs | 30 min | 120+ voice library | Voice clone limited tier | Corporate voiceover |
| PlayHT | $39-99 | 9.5/10 | 100+ langs | 30 min | Strong API, instant clone | UI dense | Developer/API |
| Resemble.ai | $30-99 | 9/10 | 60+ langs | 1 hr | Custom emotion control | Steep learning curve | Brand custom voice |
Recommended combos 2025-2026:
Use case: a solo podcaster who wants a conversational format but can't find a co-host. The AI co-host is a second AI voice that asks questions while you answer.
Step 1: Define the AI co-host's personality
Name: [AI co-host name]
Personality: curious, asks deep follow-ups, occasionally light humor
Role: asks the host questions, doesn't talk too much
Speaking style: casual, natural, addresses the host by first name
Knowledge level: average — asks questions like a listener would
Catchphrases: "Wow, that's wild." / "What does that mean exactly?" / "Can you go deeper?"
Step 2: Create a separate voice clone for the AI co-host
Step 3: Tool stack
[INTRO]
Host: Hey everyone, today [AI co-host] and I are diving into...
AI co-host: Hi all, I'm [name]. Today I want to dig into [topic] from [host]'s
point of view. Let's go!
[BODY — 5-7 Q&A pairs]
AI co-host: [Broad opening question]
Host: [Answers 2-3 minutes]
AI co-host: [Deeper follow-up]
Host: [Answers with a concrete example]
... repeat 5-7 times ...
[OUTRO]
AI co-host: Thanks [host] for sharing. The biggest thing I learned was...
Host: Thanks [AI co-host]. If you have questions, drop them in the comments...
Tip: pre-write 7-10 AI co-host questions in a doc and record the host responses in one go. Then generate the AI co-host audio in ElevenLabs and splice it in with Descript.
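Before splicing, it may help to write down the order in which the clips go on the timeline. The sketch below only builds that ordered list by alternating question and answer clips; it does not call ElevenLabs or Descript, and the function name and file names are hypothetical.

```python
from itertools import chain


def build_splice_plan(cohost_clips, host_clips):
    """Alternate co-host question clips with host answer clips, question first."""
    if len(cohost_clips) != len(host_clips):
        raise ValueError("need exactly one host answer per co-host question")
    # zip pairs each question with its answer; chain flattens the pairs.
    return list(chain.from_iterable(zip(cohost_clips, host_clips)))


# Hypothetical file names for two Q&A pairs.
plan = build_splice_plan(
    ["cohost_q1.mp3", "cohost_q2.mp3"],
    ["host_a1.wav", "host_a2.wav"],
)
```

The resulting `plan` lists the clips in timeline order, which you can follow when arranging them in the editor.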
[1] Record 60-min podcast (Riverside)
v
[2] Auto-transcript (Descript / Riverside)
v
[3] Identify hooks (10-15 quotable lines)
v
[4] Cut 30-60s clips per quote (Opus Clip / Descript)
v
[5] Add captions (auto-caption)
v
[6] Distribute across 4 platforms
Find moments in the transcript with these traits:
Target: 10-15 hooks per 60-min podcast. Pick the 10 best.
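Once the candidate hooks are marked in the transcript, narrowing them to the best 10 is a simple ranking step. The sketch below assumes each candidate already has a score, assigned by hand or by a model; the `Hook` type and its scoring scale are hypothetical. The selected hooks come back in episode order so the cuts can be made in one pass.

```python
from typing import NamedTuple


class Hook(NamedTuple):
    start_s: float  # where the quote starts in the episode, in seconds
    quote: str
    score: float    # rating on whatever scale you choose (hypothetical)


def pick_best_hooks(candidates, keep=10):
    """Keep the `keep` highest-scoring hooks, returned in episode order."""
    best = sorted(candidates, key=lambda h: h.score, reverse=True)[:keep]
    return sorted(best, key=lambda h: h.start_s)
```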
| Platform | Format | Length | Caption | Bonus |
|---|---|---|---|---|
| TikTok | 9:16 (1080×1920) | 30-60s | Bold caption on top | Trend audio overlay (low volume) |
| Instagram Reels | 9:16 | 15-90s | Clean subtitle, sans-serif font | Strong cover image |
| YouTube Shorts | 9:16 | <60s | Auto-caption | Title with target keyword |
| LinkedIn audio | 1:1 (square video w/ audio) | 60-120s | Subtitle below | Long-form thread (carousel) |
Pro tip: target each clip at one platform, with platform-specific captions and cover image, to maximize reach.
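The length limits in the platform table can be encoded as data to see where a given clip fits before choosing its target. This is a minimal sketch: the dictionary keys and the `platforms_for_clip` helper are illustrative names, and the limits are the ones listed in the table rather than values fetched from the platforms.

```python
# Length limits copied from the platform table. YouTube Shorts is listed
# as "<60s", so its upper bound is exclusive.
PLATFORM_SPECS = {
    "tiktok": {"aspect": "9:16", "min_s": 30, "max_s": 60},
    "instagram_reels": {"aspect": "9:16", "min_s": 15, "max_s": 90},
    "youtube_shorts": {"aspect": "9:16", "min_s": 0, "max_s": 60,
                       "max_exclusive": True},
    "linkedin": {"aspect": "1:1", "min_s": 60, "max_s": 120},
}


def platforms_for_clip(duration_s):
    """Return the platforms whose length limits a clip of this duration meets."""
    matches = []
    for name, spec in PLATFORM_SPECS.items():
        if spec.get("max_exclusive"):
            upper_ok = duration_s < spec["max_s"]
        else:
            upper_ok = duration_s <= spec["max_s"]
        if spec["min_s"] <= duration_s and upper_ok:
            matches.append(name)
    return matches
```

A 45-second clip fits TikTok, Reels, and Shorts, so you would pick one of those and tailor the caption and cover to it.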
Pass: 40+/50. Below 40 = re-render or re-record.
| Situation | Disclosure | Placement |
|---|---|---|
| Commercial advertising | REQUIRED | Caption + end of audio ("This audio uses an AI voice clone") |
| Personal brand podcast | RECOMMENDED — transparency | Episode description |
| Fiction audiobook | OPTIONAL | Optional — credits at end |
| News/educational | REQUIRED | Beginning of audio + caption |
| Internal corporate content | NOT REQUIRED | n/a |
Disclosure caption template:
This audio uses AI voice cloning technology
(ElevenLabs / Murf / [tool name]). Content was written and reviewed by [Name].
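The disclosure table and caption template above can be kept as a small lookup so every episode gets the same wording. The sketch below mirrors the table rows; the situation keys and function names are illustrative assumptions, and the requirement levels are those stated in this skill, not a reading of any regulation.

```python
# Requirement level and placement per situation, copied from the table.
DISCLOSURE = {
    "commercial_advertising": ("REQUIRED", "Caption + end of audio"),
    "personal_brand_podcast": ("RECOMMENDED", "Episode description"),
    "fiction_audiobook": ("OPTIONAL", "Credits at end"),
    "news_educational": ("REQUIRED", "Beginning of audio + caption"),
    "internal_corporate": ("NOT REQUIRED", None),
}


def disclosure_required(situation):
    """True when the table marks disclosure as REQUIRED for this situation."""
    level, _placement = DISCLOSURE[situation]
    return level == "REQUIRED"


def disclosure_caption(tool, reviewer):
    """Fill the caption template with the tool name and the human reviewer."""
    return (
        f"This audio uses AI voice cloning technology ({tool}). "
        f"Content was written and reviewed by {reviewer}."
    )
```

For example, `disclosure_caption("ElevenLabs", "Jane Doe")` (a placeholder name) produces the caption text for an episode description.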
Full reference: `references/ai-video-disclosure-global.md`, covering FTC, EU AI Act, FCC, and OFCOM requirements, a 3-tier disclosure framework, and situational templates (also applies to audio).
Before publishing audio:
Skill 25 (Global) | v1.0.0
Install: `npx claudepluginhub minhnv0807/ai-business-skills --plugin ai-business-skills`