From pika
Generates a 60-second two-host podcast video from a URL or free-form topic, with 4 acts of multi-shot dialogue and optional voice cloning. Use when the user asks to make a podcast, review a URL, or create an interview-style clip.
Slash command
/pika:podcast <url-or-topic> [bg_img=] [host_a_img=] [host_b_img=] [voice_a=] [voice_b=] [use_avatar] [aspect_ratio=16:9]
4 acts × 15s each = 60s. Host A always LEFT, Host B always RIGHT. Accepts a URL **or** a free-form topic / brief.
| Param | Default | Notes |
|---|---|---|
| input | required | URL to review or free-form topic / brief (e.g. "I and Elon Musk talk about Mars") |
| bg_img | auto-generated | Podcast studio background |
| host_a_img | auto-generated | Host A portrait (see Real-person handling below) |
| host_b_img | auto-generated | Host B portrait (see Real-person handling below) |
| voice_a | 876341503281471517 | Kling preset or cloned voice ID for Host A |
| voice_b | 829837252279803904 | Kling preset or cloned voice ID for Host B |
| use_avatar | off | Clone user's identity voice as Host A via clone_voice |
| aspect_ratio | 16:9 | Output aspect ratio |
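The defaults in the table can be pictured as a small record. The sketch below is illustrative only: the `PodcastParams` class and its `missing_assets` helper are hypothetical names, not part of the skill. Only the field names and default values are taken from the table.

```python
from dataclasses import dataclass
from typing import List, Optional

# Default Kling preset voice IDs, copied from the parameter table.
DEFAULT_VOICE_A = "876341503281471517"
DEFAULT_VOICE_B = "829837252279803904"


@dataclass
class PodcastParams:
    """Hypothetical container for resolved /pika:podcast parameters."""

    input: str                          # required: URL or free-form topic
    bg_img: Optional[str] = None        # None stands for "auto-generate"
    host_a_img: Optional[str] = None
    host_b_img: Optional[str] = None
    voice_a: str = DEFAULT_VOICE_A
    voice_b: str = DEFAULT_VOICE_B
    use_avatar: bool = False            # "off" in the table
    aspect_ratio: str = "16:9"

    def missing_assets(self) -> List[str]:
        """Return the image parameters that still need to be generated."""
        names = ("bg_img", "host_a_img", "host_b_img")
        return [name for name in names if getattr(self, name) is None]
```

Modelling "auto-generated" as `None` keeps the rule "generate only what's not provided" a one-line check.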
voice_a defaults to the Kling preset 876341503281471517 and voice_b to 829837252279803904. Do not ask "which voice?" or "should I clone yours?" before firing; only honor explicit overrides (voice_a=, voice_b=, use_avatar).
The --yes flag is accepted as a no-op for backward compatibility.
Claude Desktop can't pass inline-pasted images to MCP tools yet (Anthropic-side limitation). If the user pastes a photo inline, or mentions a local file they want as host_a_img / host_b_img, pause Step 1 and send them a short note, something like:
Heads up — pasted images don't reach MCP tools on Claude Desktop yet (Anthropic limitation). Two easy options for your photo:
- Paste a URL if it's already hosted (Imgur, S3, your site) — fastest
- Attach the image file so I can upload it before generation.
When a local file arrives, convert it to a public URL with upload_asset and use the returned public_url as the parameter before Step 1. Already-hosted https://... URLs work as-is and skip this entirely.
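The image-handling rule above could be sketched as follows. This is a minimal illustration, not the skill's implementation: `resolve_image_param` is a hypothetical helper, and the `upload` callable is a stand-in for the upload_asset tool, assumed here to return the public_url as a string.

```python
from typing import Callable, Optional


def resolve_image_param(value: Optional[str],
                        upload: Callable[[str], str]) -> Optional[str]:
    """Turn an image parameter into something the generation step can use.

    `upload` is a stand-in for upload_asset (assumed to return public_url).
    """
    if value is None:
        return None                 # nothing provided: asset is auto-generated
    if value.startswith("https://"):
        return value                # already hosted: used as-is, no upload
    return upload(value)            # local file: convert to a public URL first
```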
If the user names a real public figure without attaching anything, do NOT auto-generate their likeness — Step 4 (Real-person handling) uses an archetype portrait instead.
Strip flags (--yes, --no-captions, etc.) and key=value parameters from $ARGUMENTS. If what remains is empty or whitespace-only, print this menu verbatim as your full response, then stop and wait for the user's next message — do NOT call any tool, do NOT proceed to Step 1, do NOT invent a topic or URL. If the stripped input is non-empty (a URL or any prose), skip this step silently and proceed to Step 1.
What would you like a podcast about? I can take any of:
- A website URL (product page, docs site, launch page), e.g. https://pika.art
- A GitHub repo, e.g. https://github.com/anthropics/claude-code
- A blog post / article URL, e.g. a recent piece you'd like discussed
- A free-form topic or brief — e.g. "I and Elon Musk talk about Mars" or "two scientists debate AGI"
Reply with your choice and I'll generate a 1-minute two-host podcast video (4 acts × ~15s).
Tip: you don't need to type /pika:podcast. Just say things like "make a podcast about <topic>", "podcast review of <url>", or "I and <person> talk about <topic>" and I'll fire this skill automatically.
When the user replies, treat their reply as the resolved input (URL or topic) and proceed to Step 1. Do not re-prompt.
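The empty-input check can be sketched as below. The function names and regular expressions are hypothetical, and treating a bare `use_avatar` token as a parameter is an assumption based on the usage line. The source only says to strip flags and key=value parameters and to show the menu when nothing remains.

```python
import re
from typing import Dict, List, Tuple

_FLAG = re.compile(r"^--[\w-]+$")          # e.g. --yes, --no-captions
_KEY_VALUE = re.compile(r"^(\w+)=(.*)$")   # e.g. voice_a=123, bg_img=


def strip_arguments(arguments: str) -> Tuple[str, Dict[str, str], List[str]]:
    """Split raw $ARGUMENTS into (remaining input, key=value params, flags)."""
    flags: List[str] = []
    params: Dict[str, str] = {}
    rest: List[str] = []
    for token in arguments.split():
        key_value = _KEY_VALUE.match(token)
        if _FLAG.match(token):
            flags.append(token)
        elif key_value:
            params[key_value.group(1)] = key_value.group(2)
        elif token == "use_avatar":        # assumption: bare switch, no value
            params["use_avatar"] = "on"
        else:
            rest.append(token)             # part of the URL or topic prose
    return " ".join(rest), params, flags


def needs_menu(arguments: str) -> bool:
    """True when nothing but flags and parameters was supplied."""
    remaining, _, _ = strip_arguments(arguments)
    return remaining.strip() == ""
```

URLs with query strings are not mistaken for parameters, because the key pattern only allows word characters before the `=`.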
Generate only what's not provided. Default archetype prompts:
- bg_img: modern podcast studio, two chairs, warm lighting, no people, 16:9
- host_a_img: enthusiastic host, studio portrait, left-side framing, 1:1
- host_b_img: pragmatic skeptic host, studio portrait, right-side framing, 1:1

If the input mentions specific personas (Step 3), tune the archetype to match the persona vibe (see Real-person handling below).
Voice cloning (only if use_avatar is set):
- Call identity_voice_info → { voice_id, platform, sample_url }
- If sample_url is present: call clone_voice(voice_url=sample_url, voice_name="host_a_voice") → set voice_a to the returned Kling voice ID

Strip flags (--yes, --no-captions, etc.) and key=value parameters from $ARGUMENTS. Inspect what remains.
URL mode (input contains a https?:// URL):
- Call capture_website on the URL.

Topic mode (input is free-form prose, no URL):
- If the brief casts the user as a host (e.g. "I and X talk about ..."), Host A = the user (use_avatar flow if not already, or default avatar) and Host B = X.

If the parsed input names a specific real public figure as a host (e.g. "Elon Musk", "Taylor Swift", "Joe Rogan"):
- If the user passed host_a_img=<url> or host_b_img=<url>, use the provided image as-is. The user takes responsibility for likeness rights.
- Otherwise, do not auto-generate the person's likeness; use an archetype portrait tuned to the persona vibe.
- Keep the default voices unless the user passes an explicit voice ID (voice_a= / voice_b=) or invokes use_avatar (which clones the user's own voice for Host A).

This guardrail keeps the skill creative ("I want a podcast where I argue with a tech CEO about Mars") without auto-generating deepfakes of named real people.
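The URL-versus-topic decision made on the stripped input can be sketched as a single regex search. `detect_mode` is a hypothetical helper. The `https?://` pattern comes from the text, while trimming trailing punctuation is an added assumption.

```python
import re
from typing import Optional, Tuple

_URL = re.compile(r"https?://\S+")


def detect_mode(stripped_input: str) -> Tuple[str, Optional[str]]:
    """Return ("url", first_url) if the input contains a URL, else ("topic", None)."""
    match = _URL.search(stripped_input)
    if match:
        # Assumption: drop punctuation that commonly trails a URL in prose.
        return "url", match.group(0).rstrip(".,)")
    return "topic", None
```

Because the search scans the whole string, a URL embedded in a longer brief still selects URL mode.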
Write 4 acts × 2 lines (HOST_A / HOST_B). Each line ~10–12s of spoken dialogue.
Required (Matan rules — apply to both URL and topic modes):
Acts: Hook → Feature deep-dive → The Turn → Verdict (In topic mode the analogue: Hook → Substance → The Pivot → Verdict.)
Delegate to a subagent with all resolved assets and the script. The subagent runs acts 1→2→3→4 sequentially — do NOT parallelize.
Each act: one generate_reference_video call (kling-v3-omni, duration=15, sound=true). Pass reference_images=[bg_img, host_a_img, host_b_img] and voice_ids=[voice_a, voice_b]. Optional knobs (added by pika-mcp-server BACK-339, 2026-05-10): quality_mode: "pro" for higher-fidelity kling output (longer wall-clock; reserve for high-stakes renders), and kling_model to pin a specific kling family member if you need reproducibility across runs. Three shots:
- <<<voice_1>>> '<HOST_A line>'
- <<<voice_2>>> '<HOST_B line>'

Emotional beats per act:
After act 4, subagent calls edit_concat([act1, act2, act3, act4]) and returns the final video URL.
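The sequential render-then-concat flow can be sketched as below, with the tool calls injected as plain callables. `generate` and `concat` are stand-ins for generate_reference_video and edit_concat. The `model` and `prompt` key names are assumptions; the remaining keys and values come from the text above.

```python
from typing import Callable, Dict, List, Tuple


def build_act_request(host_a_line: str, host_b_line: str,
                      bg_img: str, host_a_img: str, host_b_img: str,
                      voice_a: str, voice_b: str) -> Dict:
    """Assemble one act's request (key names `model`/`prompt` are assumed)."""
    return {
        "model": "kling-v3-omni",
        "duration": 15,
        "sound": True,
        # Order matters: image_1 = background, image_2 = Host A, image_3 = Host B.
        "reference_images": [bg_img, host_a_img, host_b_img],
        "voice_ids": [voice_a, voice_b],
        "prompt": f"<<<voice_1>>> '{host_a_line}'\n<<<voice_2>>> '{host_b_line}'",
    }


def render_podcast(script: List[Tuple[str, str]],
                   bg_img: str, host_a_img: str, host_b_img: str,
                   voice_a: str, voice_b: str,
                   generate: Callable[[Dict], str],
                   concat: Callable[[List[str]], str]) -> str:
    """Render acts 1 to 4 strictly in order, then concatenate them."""
    if len(script) != 4:
        raise ValueError("expected exactly 4 acts of (HOST_A, HOST_B) lines")
    act_urls: List[str] = []
    for host_a_line, host_b_line in script:   # sequential on purpose
        request = build_act_request(host_a_line, host_b_line, bg_img,
                                    host_a_img, host_b_img, voice_a, voice_b)
        act_urls.append(generate(request))
    return concat(act_urls)
```

A plain loop rather than concurrent calls mirrors the instruction not to parallelize the acts.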
Return the final video URL and a one-sentence verdict. Do not call add_captions — Whisper auto-transcription is unreliable on the domain-specific terms typical of podcast dialogue (product names, persona names, technical jargon). Native Kling Omni audio is the deliverable.
Rules:
- voice_ids must be valid Kling voice IDs; never use name-style strings like Calm_Man
- Host A always LEFT (<<<image_2>>>), Host B always RIGHT (<<<image_3>>>); never swapped

These anchors keep the podcast output coherent across URL and topic modes:
| Phrase | Where | Why load-bearing |
|---|---|---|
| Host A always LEFT, Host B always RIGHT | Layout and shot prompts | Prevents host identity swapping across the four separate act renders. |
| 4 acts × 15s each | Overall structure | Keeps the concat predictable and avoids uneven act pacing. |
| Hook → Feature deep-dive → The Turn → Verdict | Script structure | Gives the episode a conversational arc instead of four disconnected reactions. |
| wait, actually... skeptic-flip moment | Script requirements | Creates the pivot that makes the podcast feel like a real exchange. |
| Do not call add_captions | Output rule | Avoids low-quality burned captions on fast two-host dialogue with names and jargon. |
Use Kling v3-omni for the four acts because it supports native dialogue with two reference hosts and voice tokens in a single shot plan. The tradeoff is that acts run sequentially for consistency and can take longer than pure edit/composite flows. Do not add a separate caption or music layer by default; the value of this skill is the native spoken exchange.
Typical wall-clock is 8-18 minutes:
| Step | Wall clock | Notes |
|---|---|---|
| Missing asset generation | 30-90s | Skipped for provided background/host refs |
| URL/topic parse + script | 1-3 min | URL mode depends on page fetch quality |
| Four Kling acts | 6-14 min | Runs sequentially to reduce host/voice drift |
| Concat + return | 30-90s | Final URL only; captions skipped by default |
URL mode (review a website / repo / blog):
/pika:podcast https://pika.art
/pika:podcast https://github.com/anthropics/claude-code
/pika:podcast https://cursor.com use_avatar
Topic mode (free-form brief):
/pika:podcast Two AI researchers debate whether AGI arrives before 2030
/pika:podcast I and a Mars-obsessed tech CEO talk about colonization timelines
/pika:podcast interview with a seed-stage VC about what kills most startups
/pika:podcast podcast about quantum computing breakthroughs in 2026
Mixed (URL inside a topic prompt — agent prefers URL mode if a valid URL is found):
/pika:podcast podcast about https://pika.art with skeptical investor energy
npx claudepluginhub pika-labs/pika-plugins --plugin pika