From sanzaru
Loads Sora2 prompting guide with tool references for video/image/audio generation, model selection, and reference image best practices. Useful for effective media prompting.
Slash command
/sanzaru:prompt-guidance
| Category | Tool | Pattern | Description |
|---|---|---|---|
| Video | create_video | async | Create Sora video (returns job ID, poll for completion) |
| | get_video_status | poll | Check video generation progress (0-100%) |
| | download_video | sync | Download completed video/thumbnail/spritesheet |
| | list_videos | sync | List video jobs with pagination |
| | list_local_videos | sync | List downloaded video files |
| | delete_video | sync | Permanently delete a video from OpenAI |
| | remix_video | async | Create new video by remixing an existing one |
| Image | generate_image | sync | Images API: returns immediately, no polling (RECOMMENDED) |
| | edit_image | sync | Edit/compose images (up to 16 inputs) |
| | create_image | async | Responses API: for iterative refinement chains |
| | get_image_status | poll | Check image generation status |
| | download_image | sync | Download completed image |
| Reference | list_reference_images | sync | List available images for Sora |
| | prepare_reference_image | sync | Resize image to exact Sora dimensions |
| Audio | create_audio | sync | Text-to-speech (10 voices, any length) |
| | transcribe_audio | sync | Whisper transcription |
| | chat_with_audio | sync | GPT-4o audio analysis |
| | list_audio_files | sync | List and filter audio files |
- sora-2 (default): Faster, cheaper, good for iteration
- sora-2-pro: Higher quality, supports larger resolutions (1024x1792, 1792x1024)

| Tool | API | Best For |
|---|---|---|
| generate_image | Images API | New generation: synchronous, no polling (RECOMMENDED) |
| edit_image | Images API | Editing existing images, composition |
| create_image | Responses API | Iterative refinement with previous_response_id |
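The image tool table can be read as a small decision rule. The sketch below is only an illustration of that reading: the function name `choose_image_tool` and its two flags are hypothetical, and the precedence (editing first, then refinement chains, then plain generation) is an assumption rather than something the guide states.

```python
def choose_image_tool(has_input_images: bool = False,
                      iterative_refinement: bool = False) -> str:
    """Return the name of the image tool suggested by the table above.

    Hypothetical helper: the flags and their precedence are assumptions.
    """
    if has_input_images:
        # Editing or composing existing images goes through edit_image.
        return "edit_image"
    if iterative_refinement:
        # Refinement chains rely on previous_response_id (Responses API).
        return "create_image"
    # Default: synchronous generation, no polling needed.
    return "generate_image"


print(choose_image_tool())
print(choose_image_tool(has_input_images=True))
print(choose_image_tool(iterative_refinement=True))
```

Defaulting to `generate_image` mirrors the RECOMMENDED note in the table: the async path is only chosen when a refinement chain is explicitly requested.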
CRITICAL: When using input_reference_filename with Sora, describe motion/action ONLY. Do NOT re-describe what's already in the image.
The reference image already contains: character, setting, framing, style, lighting. Your prompt should only describe: what happens next, motion, camera movement.
BAD — re-describing the image:
create_video(
prompt="A pilot in orange suit in cockpit with glowing instruments...",
input_reference_filename="pilot.png"
)
GOOD — motion only:
create_video(
prompt="The pilot glances up, takes a breath, then returns focus to the instruments.",
input_reference_filename="pilot.png"
)
Write prompts in this order for best results:
| Weak | Strong |
|---|---|
| "A beautiful street at night" | "Wet asphalt, neon signs reflecting in puddles, steam from grate" |
| "Person moves quickly" | "Cyclist pedals three times, brakes, stops at crosswalk" |
| "Cinematic look" | "Anamorphic 2.0x lens, shallow DOF, volumetric light" |
Duration tips: 4s clips have best instruction following. Use 8s for simple scenes. 12s only for slow, ambient shots.
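The duration tips can be captured as a lookup. This is a minimal sketch: the scene labels (`"complex"`, `"simple"`, `"ambient"`) and the helper name `pick_seconds` are hypothetical, and treating 4s as the fallback for anything else is an assumption. The values are strings because the guide requires `seconds` to be passed as a string.

```python
# Hypothetical scene labels mapped to the durations from the tips above.
DURATION_BY_SCENE = {
    "complex": "4",   # 4s clips follow instructions best
    "simple": "8",    # 8s is fine for simple scenes
    "ambient": "12",  # 12s only for slow, ambient shots
}


def pick_seconds(scene: str) -> str:
    """Return a duration string for create_video's seconds argument.

    Unknown labels fall back to "4" (an assumption: the shortest clip
    is the safest choice for instruction following).
    """
    return DURATION_BY_SCENE.get(scene, "4")


print(pick_seconds("ambient"))
```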
# Video: create → poll → download
video = create_video(prompt="...", size="1280x720")
status = get_video_status(video.id) # Poll until "completed"
download_video(video.id, filename="output.mp4")
# Image (Responses API): create → poll → download
resp = create_image(prompt="...")
status = get_image_status(resp.id) # Poll until "completed"
download_image(resp.id, filename="output.png")
# Image (Images API): SYNCHRONOUS — no polling!
result = generate_image(prompt="...") # Returns immediately
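The "poll until completed" step in the async workflows can be sketched as a generic loop. Everything here is an assumption layered on the snippets above: the helper name `wait_until_completed`, the `"failed"` terminal status, and the default interval and timeout are hypothetical, and the caller supplies a function that returns the current status as a string.

```python
import time
from typing import Callable


def wait_until_completed(get_status: Callable[[], str],
                         interval_s: float = 5.0,
                         timeout_s: float = 600.0,
                         sleep: Callable[[float], None] = time.sleep) -> str:
    """Poll get_status() until it reports a terminal status.

    Returns "completed" or "failed" (the latter is an assumed status name).
    Raises TimeoutError if no terminal status arrives within timeout_s.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        status = get_status()
        if status in ("completed", "failed"):
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"still {status!r} after {timeout_s} seconds")
        sleep(interval_s)


# Demo with a fake status source standing in for get_video_status.
fake_statuses = iter(["queued", "in_progress", "completed"])
final = wait_until_completed(lambda: next(fake_statuses), sleep=lambda _: None)
print(final)
```

In practice the callable would wrap `get_video_status` or `get_image_status`; how the status string is read from their return value is not specified in this guide, so that wrapper is left to the caller.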
- Don't reach for create_image when generate_image is simpler: most cases don't need async polling.
- Use prepare_reference_image to resize reference images to the exact Sora dimensions.
- seconds must be a string: "8" not 8.
- create_video and create_image are async; always poll status before downloading.
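Two of these pitfalls can be caught before a request is sent. The sketch below is a hypothetical pre-flight check, not part of sanzaru: the name `check_video_args` is invented, and it assumes the two larger resolutions listed for sora-2-pro are not available on sora-2, which the guide implies but does not state outright.

```python
from typing import List

# Assumption: these sizes require sora-2-pro, per the model notes in this guide.
PRO_ONLY_SIZES = {"1024x1792", "1792x1024"}


def check_video_args(model: str, size: str, seconds) -> List[str]:
    """Return a list of human-readable problems (empty if none found)."""
    problems = []
    if not isinstance(seconds, str):
        # The guide requires seconds as a string: "8", not 8.
        problems.append(f"seconds must be a string, got {type(seconds).__name__}")
    if size in PRO_ONLY_SIZES and model != "sora-2-pro":
        problems.append(f"size {size} assumes sora-2-pro, got {model}")
    return problems


print(check_video_args("sora-2", "1280x720", "8"))
print(check_video_args("sora-2", "1792x1024", 8))
```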
npx claudepluginhub tjc-lp/sanzaru --plugin sanzaru