Add MotionPNGTuber / MotionPNGTuber_UI style talking characters and Japanese TTS narration to Remotion or HyperFrames videos. Use when Codex needs to generate dialogue audio with VOICEVOX or AivisSpeech, place the audio on a Remotion or HyperFrames timeline, and render a PNGTuber character using a mouthless video or frame sequence, mouth_track.json, and mouth sprites; fix mouth alignment, green-screened assets, lip-sync timing, or render issues involving MotionPNGTuber in Remotion or HyperFrames.
Slash command: /remotion-motionpngtuber:remotion-motionpngtuber
Implement MotionPNGTuber as a frame-driven canvas overlay, not as pre-baked mouth overlay images unless the user explicitly asks for baked assets.
The MotionPNGTuber_UI browser player depends on runtime HTMLVideoElement.currentTime, requestVideoFrameCallback, WebAudio volume analysis, and DOM resize state. Do not paste that class directly into Remotion or HyperFrames. Port the important rendering model instead:
- Draw the mouth on a <canvas> with the same source coordinate system as mouth_track.json.
- Drive timing from Remotion's useCurrentFrame() or HyperFrames' seeked GSAP timeline time.

Choose the runtime from the target project or the user's request.
If the body source is an animated mouthless video, the render must keep that body animation. Do not replace it with a single still frame as an optimization. If direct video rendering is too slow or unstable, extract the video into a frame sequence and render the frame matching the same trackFrameIndex used for the mouth canvas.
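A minimal sketch of sharing one track frame index between the body frame sequence and the mouth canvas, using the Remotion-side mapping this skill specifies (`Math.floor((loopFrame / compositionFps) * track.fps) % track.frames.length`). The function names, the `frames` directory, and the 1-based zero-padded file names are hypothetical; adjust them to however the frame sequence was actually extracted.

```typescript
// Only the timing fields of mouth_track.json are needed for this sketch.
interface TrackTiming {
  fps: number; // mouth_track.json fps
  frameCount: number; // mouth_track.json frames.length
}

// Map a composition frame to a track frame index, looping over the track length.
function trackFrameIndex(
  loopFrame: number,
  compositionFps: number,
  track: TrackTiming,
): number {
  const seconds = loopFrame / compositionFps;
  return Math.floor(seconds * track.fps) % track.frameCount;
}

// Hypothetical naming scheme: frames/frame-0001.png holds track frame 0.
function bodyFramePath(index: number, dir: string = "frames"): string {
  let digits = String(index + 1);
  while (digits.length < 4) {
    digits = "0" + digits;
  }
  return dir + "/frame-" + digits + ".png";
}
```

Computing the index once per rendered frame and passing it to both the body image and the mouth canvas keeps the two layers on the same tracked frame, which is the point of the frame-sequence fallback.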
Also handle narration audio generation when the user provides a VOICEVOX-compatible engine. VOICEVOX and AivisSpeech should be treated as local HTTP TTS engines with the same basic flow: inspect /speakers, create an audio query with /audio_query, then synthesize WAV with /synthesis.
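The three-call flow above can be sketched as plain request descriptors. This is an illustration under assumptions, not the skill's own implementation: the `TtsRequest` type and the function names are hypothetical, the `speaker` query parameter is assumed to carry the style ID found via /speakers, and `speedScale` is assumed to be a field of the audio query JSON returned by /audio_query.

```typescript
// Plain description of one HTTP call; send it with any HTTP client.
interface TtsRequest {
  method: "GET" | "POST";
  url: string;
  headers: { [name: string]: string };
  body?: string;
}

// Step 1: list the available speakers and their styles.
function speakersRequest(baseUrl: string): TtsRequest {
  return { method: "GET", url: baseUrl + "/speakers", headers: {} };
}

// Step 2: create an audio query for the text; the response is a JSON object.
function audioQueryRequest(baseUrl: string, text: string, speaker: number): TtsRequest {
  const query = "text=" + encodeURIComponent(text) + "&speaker=" + speaker;
  return { method: "POST", url: baseUrl + "/audio_query?" + query, headers: {} };
}

// Step 3: post the (optionally adjusted) audio query back to receive WAV bytes.
function synthesisRequest(
  baseUrl: string,
  audioQuery: { [key: string]: unknown },
  speaker: number,
  speedScale?: number,
): TtsRequest {
  // Override speedScale only when the caller asked for a specific rate.
  const adjusted =
    speedScale === undefined ? audioQuery : { ...audioQuery, speedScale: speedScale };
  return {
    method: "POST",
    url: baseUrl + "/synthesis?speaker=" + speaker,
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(adjusted),
  };
}
```

Each descriptor maps directly onto a fetch call (URL plus method, headers, and body). The JSON returned by step 2 is what gets passed as `audioQuery` in step 3, and the step 3 response body is the WAV file to save.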
Confirm or extract required inputs:
- Runtime: remotion or hyperframes, chosen from the existing project unless the user specifies one.
- MotionPNGTuber assets: mouth_track.json, mouth/*.png, and a mouthless body video or frame sequence.
- Default assets: ../../assets/default-pngtuber/nike_loop_fix relative to this SKILL.md. It contains mouth_track.json, mouth/closed.png, mouth/half.png, mouth/open.png, and loop_mouthless_h264.mp4.
- For Remotion, copy the assets to public/pngtuber/nike_loop_fix, and reference that copy in the composition.
- For HyperFrames, copy the assets to pngtuber/nike_loop_fix, and reference that copy from the HTML composition.
- TTS engine: voicevox or aivisspeech.
- Engine URL: http://localhost:50021 or http://localhost:10101.
- Speaker: check /speakers when the user provides only a model name or says there is one model.
- Speech rate: speedScale.

Inspect the MotionPNGTuber assets:
- mouth_track.json: note fps, width, height, frames[].quad, calibration, and calibrationApplied.
- Mouth sprites: mouth/closed.png and mouth/open.png; use half.png if present.

Generate TTS audio:
- Follow references/tts-generation.md.
- Name output files like voice-001.wav.
- Measure audio durations with ffprobe or Remotion/Mediabunny and convert them to frame counts.

Preserve coordinate systems:
- The canvas width and height must match mouth_track.json source dimensions.
- Give the canvas the same left, top, width, height, scale, clip, and crop as the body source.
- When cropping, apply the same clipPath: inset(...) or equivalent crop to both the body and the canvas.

Drive synchronization from the chosen runtime:
- Remotion: trackFrameIndex = Math.floor((loopFrame / compositionFps) * track.fps) % track.frames.length.
- Use the same trackFrameIndex for the body frame sequence when body frames are extracted from the same source.
- Loop over mouthTrack.frames.length, not over the composition duration. This preserves the original MotionPNGTuber motion and keeps mouth tracking aligned.
- HyperFrames: trackFrameIndex = Math.floor((timelineTime % loopDurationSeconds) * track.fps) % track.frames.length.
- Do not use requestAnimationFrame, Date.now(), WebAudio realtime analysis, or a manually played video element during render.

Choose mouth state deterministically:
- Define mouth cues with start, end, and state.
- Do not use mouthTrack.fps to decide how often the mouth opens; it only maps timeline time to tracked mouth/body coordinates.
- Use the closed, half, open fallback order.

Validate visually and aurally:
- For HyperFrames, run hyperframes lint, hyperframes validate, and a draft hyperframes render, then inspect the rendered MP4 or extracted frames.

For reusable implementation details, read only what is needed:
- references/canvas-overlay-pattern.md
- references/hyperframes-canvas-overlay-pattern.md
- references/tts-generation.md

- Do not use <image> clipping as the primary implementation; it can fail in Remotion output depending on asset loading and alpha handling.
- Do not use pre-baked mouth-frames unless explicitly requested or needed as an optimization after the canvas version is correct.
- Choose the speaker from /speakers and the user's stated model/style.
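The deterministic mouth-state step in the workflow above can be sketched as a pure lookup over timed cues. Several details here are assumptions rather than the skill's documented behavior: cue times are taken to be in seconds with a half-open interval [start, end), the mouth is closed when no cue applies, and the closed, half, open fallback order is read as "use the first sprite that exists when the requested one is missing". The type and function names are hypothetical.

```typescript
type MouthState = "closed" | "half" | "open";

interface MouthCue {
  start: number; // inclusive, assumed to be seconds
  end: number; // exclusive, same unit as start
  state: MouthState;
}

// Return the state of the first cue covering `time`, or "closed" when none does.
function mouthStateAt(cues: MouthCue[], time: number): MouthState {
  for (const cue of cues) {
    if (time >= cue.start && time < cue.end) {
      return cue.state;
    }
  }
  return "closed";
}

// Assumed fallback: if the requested sprite is missing, try closed, half, open in order.
function pickSprite(state: MouthState, available: MouthState[]): MouthState | undefined {
  if (available.indexOf(state) !== -1) {
    return state;
  }
  const order: MouthState[] = ["closed", "half", "open"];
  for (const candidate of order) {
    if (available.indexOf(candidate) !== -1) {
      return candidate;
    }
  }
  return undefined;
}
```

Because both functions depend only on their arguments, the same timeline time always yields the same sprite, which keeps seeked or parallel renders consistent.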
npx claudepluginhub tegnike/remotion-motionpngtuber --plugin remotion-motionpngtuber