Generate consistent virtual characters (face, outfit, voice, pose) as 8-second 9:16 video clips with native voiceover and lip-sync using Google's Veo 3.1 via the Gemini API. Covers operator setup (Google AI Studio account, API key, quota), reference-image preparation, prompt structure for character consistency, voiceover prompting, RAI-filter handling, and validating outputs. Use whenever a content pipeline needs the same virtual character to appear across multiple shots without drifting in appearance or voice.
Slash command: `/pepe-multi-channel-content-pipelines:virtual-character-veo-3-1`
Veo 3.1 generates **synchronized video + audio (including dialogue with on-screen lip-sync) in one shot** from a text prompt and up to two reference images. That property is what makes it usable as a "virtual character actor": same face, same voice, repeated shots, no separate TTS or lip-sync pass.
This skill is the foundation of any multi-channel pipeline that ships a persistent virtual character (a brand avatar, an AI host, a fictional spokesperson). Once it works, the publishing skills downstream (publishing-instagram, publishing-x, publishing-blog) consume its output.
Each command is a discrete procedure the agent runs. The first command runs once per operator during onboarding and the second once per character; the remaining commands run per shot.
Command 1: operator setup (one-time). Audience: the human operator. The agent prompts the operator to complete each step below in order and confirms each one before proceeding to Command 2.
Create a Google AI Studio account. Visit https://aistudio.google.com/, sign in with a Google account (a fresh, project-dedicated Google account is recommended so the quota and the key don't collide with personal use), and accept the terms.
Create a Google Cloud project. From AI Studio, follow the "Get API key" flow — it creates a Google Cloud project automatically if none exists. Note the project ID (<your-project>-XXXX).
Enable the Veo 3.1 model. Veo 3.1 is a preview model — Google gates it behind a per-account opt-in. From AI Studio's model picker, select veo-3.1-generate-preview. If it appears greyed out, your account isn't in the preview group yet; request access from the model page.
Generate an API key. AI Studio → API keys → "Create API key". Copy the key (it shows once). It will look like AIzaSy... (~39 chars).
Store the key. Persist it in a credentials directory the agent can read, never in source control:
```bash
mkdir -p ~/.openclaw/credentials/gemini
printf '%s' '<your-key>' > ~/.openclaw/credentials/gemini/api-key
chmod 600 ~/.openclaw/credentials/gemini/api-key
```
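Smoke-test the key. The agent runs (`models.list` returns 50 models per page by default, so request a large page to make sure the Veo entries are not cut off):
```bash
curl -sf "https://generativelanguage.googleapis.com/v1beta/models?pageSize=1000&key=$(cat ~/.openclaw/credentials/gemini/api-key)" \
  | jq -r '.models[].name' | grep -i veo
```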
This must list `models/veo-3.1-generate-preview` (or a newer preview). If the key works but Veo isn't listed, the account isn't enrolled in the preview yet: return to step 3 (enable the Veo 3.1 model).
Verify quota. Veo has a tight per-minute quota (≈ 1 long-running operation in flight at a time on the free tier; preview quota varies). The agent runs Command 2 with a one-line "test render" prompt and checks that the operation completes within ~3 min. If HTTP 429 (rate-limit) errors persist, the operator must request a quota bump from the Veo product page or upgrade billing.
The operator confirms once: "Setup complete." From this point on the agent runs the remaining commands autonomously.
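The agent-side checks in this setup can be sketched in Python, the language of the `submit.py` working script. This is a minimal sketch under stated assumptions: the function names are hypothetical, the permission check assumes a POSIX filesystem, and the response shape `{"models": [{"name": ...}]}` is inferred from the `jq -r '.models[].name'` filter in the smoke test.

```python
import stat
from pathlib import Path

# Path mirrors the credentials convention above; helper names are hypothetical.
KEY_PATH = Path.home() / ".openclaw" / "credentials" / "gemini" / "api-key"


def load_api_key(path: Path = KEY_PATH) -> str:
    """Read the stored Gemini key, refusing files that group/others can access."""
    mode = path.stat().st_mode
    if mode & (stat.S_IRWXG | stat.S_IRWXO):
        raise PermissionError(f"{path} should be chmod 600")
    key = path.read_text(encoding="utf-8").strip()
    if not key:
        raise ValueError(f"{path} is empty")
    return key


def veo_models(models_list: dict) -> list:
    """Model names containing 'veo' in a v1beta models.list JSON response."""
    return [
        m["name"]
        for m in models_list.get("models", [])
        if "veo" in m.get("name", "").lower()
    ]
```

Applied to the parsed `curl` output, `veo_models` should include `models/veo-3.1-generate-preview` once the account is enrolled.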
Command 2: build the character reference sheet. Veo's character consistency comes from reference-image conditioning: you pass 1-2 reference images per call, and Veo holds appearance close to those references across shots. So before generating any real content, build a canonical reference set for the character.
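Loading the reference set can be sketched in Python as follows, assuming (hypothetically) that the reference JPEGs and `character-contract.md` live together in one directory. The function enforces the 1-2 image limit stated in this skill before any request is built; the name `load_reference_set` is hypothetical.

```python
from pathlib import Path

MAX_REFERENCE_IMAGES = 2  # per-call limit as stated in this skill


def load_reference_set(directory: Path, images: list) -> tuple:
    """Return (appearance contract text, list of raw image bytes in order)."""
    if not 1 <= len(images) <= MAX_REFERENCE_IMAGES:
        raise ValueError(
            f"expected 1-{MAX_REFERENCE_IMAGES} reference images, got {len(images)}"
        )
    contract = (directory / "character-contract.md").read_text(encoding="utf-8").strip()
    if not contract:
        raise ValueError("character-contract.md is empty")
    return contract, [(directory / name).read_bytes() for name in images]
```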
- Character reference: `character_1024.jpg`.
- Optional co-star reference: `costar_1024.jpg`. Veo accepts up to 2 reference images per call.
- Appearance contract: a short text description of the character, saved as `character-contract.md` alongside the reference images. Example for a frog-monk:

"Pepe Arturo is a small calm frog monk, brown rope-belt habit, sitting cross-legged. Calm, grounded posture. When he speaks, soft male voice in English, slow cadence, no Italian, no filler."

- Test render: run a minimal prompt (for example "<character> sitting still, neutral expression, no dialogue, ambient outdoor sound"). Inspect the result: the face matches the reference, nothing morphs across the 8 s, and no unwanted accents appear in the visual. If the face drifts, the reference image is too low-res or too cluttered; re-crop tighter on the face and retry.

The reference sheet is durable. It only changes when the operator deliberately rebrands the character.
Command 3: generate a shot. This is the per-piece command. The agent runs it once for every new shot in the content calendar.
Construct the prompt. A Veo shot prompt has four parts, in this order: (a) the appearance contract from Command 2; (b) the scene (location, framing, action); (c) the dialogue cue, either `Off-screen male voice in English calmly narrates: "<line>"` for voiceover or "the character speaks the line in English with clear lip-sync" for on-screen speech; (d) audio ambience hints.
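The four-part ordering can be sketched as a small Python function. The VO cue reuses the template above verbatim; the on-screen cue wording and the function name `build_shot_prompt` are assumptions for illustration.

```python
from typing import Optional


def build_shot_prompt(contract: str, scene: str, line: Optional[str],
                      on_screen: bool, ambience: str) -> str:
    """Join (a) contract, (b) scene, (c) dialogue cue, (d) ambience, in that order."""
    parts = [contract.strip(), scene.strip()]
    if line:
        if on_screen:
            # Assumed phrasing for the on-screen (lip-sync) cue.
            parts.append(f'The character speaks in English with clear lip-sync: "{line}"')
        else:
            # VO cue template taken from the skill text.
            parts.append(f'Off-screen male voice in English calmly narrates: "{line}"')
    parts.append(ambience.strip())
    return " ".join(p for p in parts if p)
```

Passing `line=None` yields a silent shot, which is the shape of the Command 2 test render.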
Compose the request. POST to:
`https://generativelanguage.googleapis.com/v1beta/models/veo-3.1-generate-preview:predictLongRunning?key=<KEY>`
Body schema (JSON):
```json
{
  "instances": [{
    "prompt": "<full prompt from step 1>",
    "referenceImages": [
      {"image": {"bytesBase64Encoded": "<base64 character_1024.jpg>"}},
      {"image": {"bytesBase64Encoded": "<base64 costar_1024.jpg>"}}
    ]
  }],
  "parameters": {
    "aspectRatio": "9:16",
    "durationSeconds": 8,
    "personGeneration": "allow_adult"
  }
}
```
The response carries a long-running operation name (operations/...).
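A minimal Python sketch of composing and submitting the request with only the standard library. `build_body` mirrors the JSON body schema above field for field and adds nothing the schema does not show; `submit` returns the operation's `name` field. Both function names are hypothetical, and the key travels as the `?key=` query parameter, as in the URL above.

```python
import base64
import json
import urllib.request

ENDPOINT = (
    "https://generativelanguage.googleapis.com/v1beta/models/"
    "veo-3.1-generate-preview:predictLongRunning"
)


def build_body(prompt: str, reference_images: list) -> dict:
    """Mirror the body schema above; reference_images holds raw JPEG bytes."""
    return {
        "instances": [{
            "prompt": prompt,
            "referenceImages": [
                {"image": {"bytesBase64Encoded": base64.b64encode(img).decode("ascii")}}
                for img in reference_images
            ],
        }],
        "parameters": {
            "aspectRatio": "9:16",
            "durationSeconds": 8,
            "personGeneration": "allow_adult",
        },
    }


def submit(body: dict, api_key: str) -> str:
    """POST the body and return the long-running operation name."""
    req = urllib.request.Request(
        f"{ENDPOINT}?key={api_key}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=60) as resp:
        return json.load(resp)["name"]
```

A quota rejection surfaces as `urllib.error.HTTPError` (status 429) from `urlopen`; the caller can catch it and retry later rather than crash.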
Poll until done. `GET https://generativelanguage.googleapis.com/v1beta/<operation-name>?key=<KEY>` every 10 seconds. When `done` is `true`, the response holds the result video as either a `videoUri` (signed URL) or a `bytesBase64Encoded` blob. Typical wall-clock time: 60-180 s per 8 s shot.
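A minimal Python sketch of the polling step, assuming the caller supplies a `fetch` function (hypothetical) that performs the GET above and returns the parsed JSON. Because this skill does not pin down where the video sits inside the finished operation, `find_video` walks the JSON and returns the first `videoUri`, `uri`, or `bytesBase64Encoded` string it finds; the extra `uri` key is an assumption. The 600 s timeout is also an assumption, set well above the typical 60-180 s.

```python
import time
from typing import Callable, Optional


def poll_operation(fetch: Callable[[], dict], interval_s: float = 10.0,
                   timeout_s: float = 600.0,
                   sleep: Callable[[float], None] = time.sleep) -> dict:
    """Call fetch() every interval_s seconds until the operation reports done."""
    waited = 0.0
    while True:
        op = fetch()
        if op.get("done"):
            if "error" in op:
                raise RuntimeError(f"operation failed: {op['error']}")
            return op
        if waited >= timeout_s:
            raise TimeoutError("operation still running after timeout")
        sleep(interval_s)
        waited += interval_s


def find_video(node) -> Optional[tuple]:
    """Depth-first search for the first video URI or inline base64 payload."""
    if isinstance(node, dict):
        for key in ("videoUri", "uri", "bytesBase64Encoded"):
            if isinstance(node.get(key), str):
                return key, node[key]
        children = list(node.values())
    elif isinstance(node, list):
        children = node
    else:
        return None
    for child in children:
        hit = find_video(child)
        if hit:
            return hit
    return None
```

Injecting `sleep` keeps the loop testable without real waiting; in production the default `time.sleep` applies.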
Download to a working dir. Convention: /tmp/veo-<character>-take<N>/ containing submit.py, character_1024.jpg, costar_1024.jpg if used, op.txt (operation name), op-result.json, raw.mp4.
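A sketch of the working-directory convention using Python's `pathlib`: `take_dir` builds `/tmp/veo-<character>-take<N>/`, `record_operation` writes `op.txt` and `op-result.json`, and `write_raw_video` handles the inline `bytesBase64Encoded` case. Downloading a `videoUri` result is left out because it needs network access. All function names are hypothetical.

```python
import base64
import json
from pathlib import Path


def take_dir(character: str, n: int, root: Path = Path("/tmp")) -> Path:
    """Create (if needed) and return the veo-<character>-take<N> directory."""
    d = root / f"veo-{character}-take{n}"
    d.mkdir(parents=True, exist_ok=True)
    return d


def record_operation(workdir: Path, op_name: str, op_result: dict) -> None:
    """Write the operation name and the final operation JSON."""
    (workdir / "op.txt").write_text(op_name + "\n", encoding="utf-8")
    (workdir / "op-result.json").write_text(json.dumps(op_result, indent=2), encoding="utf-8")


def write_raw_video(workdir: Path, b64_video: str) -> Path:
    """Decode an inline bytesBase64Encoded result into raw.mp4."""
    raw = workdir / "raw.mp4"
    raw.write_bytes(base64.b64decode(b64_video))
    return raw
```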
(Optional) Re-encode for portability. Veo's raw output is H.264 in an MP4 container tagged `format_tags=encoder Google`. If the downstream channel (some legacy IG flows, some browsers) chokes on the raw container, re-encode with `ffmpeg -i raw.mp4 -c:v libx264 -preset slow -crf 18 -c:a aac -b:a 192k take<N>.mp4`. The new file reports encoder `Lavf<NN>.<NN>` and is the canonical artefact for the publishing skills.
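Scripted from Python, the re-encode is a single `subprocess.run` call. This sketch reproduces the flags above exactly, with one labeled addition: `-y`, so a re-run overwrites an existing `take<N>.mp4` instead of stopping at ffmpeg's overwrite prompt. The helper names are hypothetical.

```python
import subprocess
from pathlib import Path


def reencode_cmd(raw: Path, out: Path) -> list:
    """argv for the re-encode command above, plus -y."""
    return [
        "ffmpeg",
        "-y",  # addition: overwrite take<N>.mp4 on re-runs instead of prompting
        "-i", str(raw),
        "-c:v", "libx264", "-preset", "slow", "-crf", "18",
        "-c:a", "aac", "-b:a", "192k",
        str(out),
    ]


def reencode(raw: Path, out: Path) -> None:
    """Run ffmpeg; raises CalledProcessError if it exits non-zero."""
    subprocess.run(reencode_cmd(raw, out), check=True)
```

Passing an argv list (no shell) avoids quoting problems with paths such as `/Volumes/My Shared Files/`.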
Stage to canonical storage. Move take<N>.mp4 to the operator's canonical content store (a host directory, an S3 bucket, an Obsidian vault assets folder, whatever the operator's content-strategy skill defined). The publishing skills read from there.
Command 4: validate the output. The agent has no view of what Veo rendered until it inspects the file, so validate every take before queueing it for publish.
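A minimal Python sketch of the two ffprobe checks listed for this command. It asks ffprobe for JSON (`-of json`) rather than the default text output and assumes the usual JSON shape (`format.tags.encoder`, and `streams[]` entries with `codec_name` and `duration`). The ±0.5 s audio tolerance and all function names are assumptions; frame-by-frame visual comparison is not covered.

```python
import json
import subprocess


def ffprobe_json(path: str, entries: str, audio_only: bool = False) -> dict:
    """Run ffprobe with JSON output for the given -show_entries spec."""
    cmd = ["ffprobe", "-v", "error"]
    if audio_only:
        cmd += ["-select_streams", "a"]
    cmd += ["-show_entries", entries, "-of", "json", path]
    out = subprocess.run(cmd, check=True, capture_output=True, text=True).stdout
    return json.loads(out)


def provenance(probe: dict) -> str:
    """'raw' for Veo output, 'reencoded' after ffmpeg, else 'unknown'."""
    encoder = probe.get("format", {}).get("tags", {}).get("encoder", "")
    if encoder == "Google":
        return "raw"
    if encoder.startswith("Lavf"):
        return "reencoded"
    return "unknown"


def audio_ok(probe: dict, expected_s: float = 8.0, tolerance_s: float = 0.5) -> bool:
    """True if some audio stream lasts about expected_s seconds."""
    for stream in probe.get("streams", []):
        try:
            duration = float(stream.get("duration", "nan"))
        except ValueError:  # ffprobe may report "N/A"
            continue
        if stream.get("codec_name") and abs(duration - expected_s) <= tolerance_s:
            return True
    return False
```

For example, `provenance(ffprobe_json("raw.mp4", "format_tags=encoder"))` should return `"raw"` straight after download, and `audio_ok(ffprobe_json("raw.mp4", "stream=codec_name,duration", audio_only=True))` returning `False` flags the missing-audio symptom.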
- Provenance: `ffprobe -v error -show_entries format_tags=encoder raw.mp4` must return `Google` (proof the file came from Veo, not from a previous step or a swap). After re-encoding the value flips to `Lavf...`; treat both as expected states.
- Visual consistency: extract a few frames across the clip (`ffmpeg -ss <t> -i raw.mp4 -frames:v 1 frame-<i>.png`). The agent opens each frame and confirms the character matches the reference sheet: face shape, outfit, posture. For a real-person co-star, confirm there is no morphing.
- Audio: `ffprobe -v error -select_streams a -show_entries stream=codec_name,duration raw.mp4` must show an audio stream of about 8 s. Missing audio means Veo's audio safety filter likely tripped on the dialogue text (see Command 5).

Command 5: handle RAI-filter rejections. Veo rejects prompts containing certain content (literal brand names embroidered on clothing, real-person names attached to risky activities, etc.), and the rejection mode is silent video corruption rather than a clean HTTP error. Procedure:
For a content calendar that ships >1 shot per week, run Command 3 in batched sweeps rather than ad-hoc.
- Persist each operation name to a state file (`state/veo-queue/<id>.op`) the moment the API returns it. If the agent crashes mid-poll, the next run reads the state file and resumes polling; it never re-submits and never loses the result.

The Pepe Arturo brand uses this skill verbatim, with:
- Reference images: `pepe_1024.jpg` (frog-monk avatar derived from `avatars/pepe-arturo.png`) + `helmut_1024.jpg` (Helmut Hoffer von Ankershoffen as recurring real co-star, from `state/content-source/helmut-snowflake-profile.jpg`).
- Canonical storage: `/Volumes/My Shared Files/Pepe Reels/` on Helmut's macOS host (the host↔VM share).
- Provenance check: `raw.mp4` verified as `format_tags=encoder Google` immediately after download, then re-encoded for IG before publish (`take<N>.mp4` carries `Lavf...`).
- Working directory: `/tmp/veo-reel-take<N>/`, retained for 7 days for re-runs.
- Trigger: a shot is generated when its calendar entry's `next_due_at` falls in the past, not by a wall-clock cron entry. See `content-strategy-planning-optimization` for the calendar contract.
- Publishing: `publishing-instagram` / `publishing-x` / `publishing-blog`.
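The crash-safe batching rule (persist each operation name to `state/veo-queue/<id>.op` as soon as the API returns it) applies to this setup as well. A minimal Python sketch, assuming a caller-supplied `submit` callable (for example a wrapper around the `predictLongRunning` POST); `submit_once` is a hypothetical name, and the temp-file-then-rename write is a choice made here, not something this skill specifies.

```python
import os
from pathlib import Path
from typing import Callable


def submit_once(shot_id: str, submit: Callable[[], str],
                queue_dir: Path = Path("state/veo-queue")) -> str:
    """Return the operation name for shot_id, calling submit() only if none is stored."""
    queue_dir.mkdir(parents=True, exist_ok=True)
    state = queue_dir / f"{shot_id}.op"
    if state.exists():
        return state.read_text(encoding="utf-8").strip()  # resume: never re-submit
    op_name = submit()
    # Write to a temp file, then rename, so a crash never leaves a half-written .op.
    # A crash between submit() returning and the rename can still lose the name.
    tmp = queue_dir / f"{shot_id}.op.tmp"
    tmp.write_text(op_name, encoding="utf-8")
    os.replace(tmp, state)
    return op_name
```

The returned name then feeds the polling loop; deleting the `.op` file after the result is staged would be the natural cleanup point.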
npx claudepluginhub helmut-hoffer-von-ankershoffen/helmguild-plugins --plugin pepe-multi-channel-content-pipelines