From claude-mods
Probe-first ffmpeg/ffprobe media processing: transcode, trim, concatenate, color grade, normalize audio, scene/silence detection, HLS packaging, hardware encoding, and VMAF quality gates with 30+ commands and 8 scripts.
How this skill is triggered — by the user, by Claude, or both
Slash command
/claude-mods:ffmpeg-opsWhen to use
Use for ANY ffmpeg/ffprobe invocation or media task - converting, cutting, grading, packaging, or preparing audio for STT - BEFORE hand-writing a command; the cookbook and scripts encode the footguns (seek accuracy, keyframe snapping, quoting, pix_fmt) that silently ruin output.
This skill is limited to the following tools:
The summary Claude sees in its skill listing — used to decide when to auto-load this skill
Operational expertise for ffmpeg/ffprobe: the ~30 commands that cover most real work,
assets/edl-schema.jsonassets/encoding-presets.jsonassets/hls-ladder.jsonreferences/analysis-validation.mdreferences/audio.mdreferences/capture-devices.mdreferences/color-grading.mdreferences/color-hdr.mdreferences/edit-as-code.mdreferences/encoding.mdreferences/error-decoder.mdreferences/filtergraph.mdreferences/hardware-accel.mdreferences/images-gif.mdreferences/look-recipes.mdreferences/quality-metrics.mdreferences/restoration.mdreferences/streaming-hls.mdreferences/stt-whisper.mdreferences/subtitles.mdOperational expertise for ffmpeg/ffprobe: the ~30 commands that cover most real work, the footguns that silently ruin output, EDL-driven editing (edit-as-code), and eight scripts that replace the logic an agent would otherwise re-derive every task.
Never transcode, cut, or filter blind. Every media task starts by probing the input — codec, duration, frame rate (constant or variable?), pixel format, rotation, stream layout. Half of all "ffmpeg did something weird" reports are a property of the input the command never checked.
python skills/ffmpeg-ops/scripts/probe-media.py input.mp4 # human summary
python skills/ffmpeg-ops/scripts/probe-media.py --doctor input.mp4 # TRIAGE: hazards + exact fixes
python skills/ffmpeg-ops/scripts/probe-media.py --json input.mp4 | jq '.data.streams'
python skills/ffmpeg-ops/scripts/probe-media.py --keyframes-near 92.5 input.mp4
--doctor makes the doctrine self-enforcing: VFR, HDR transfer, rotation
metadata, interlacing, non-yuv420p delivery, and moov-at-EOF each come back as a
finding with the exact fix command, and exit 10 means "fix before processing".
The --keyframes-near form answers "can I stream-copy a cut at 92.5s?" — it
reports the nearest keyframes so you know whether a copy cut will snap (see
Footguns). When a command fails with a cryptic message, decode it:
references/error-decoder.md.
Before recommending an encoder, verify the build has it. Installed ffmpeg builds vary wildly (especially hardware encoders — listed ≠ working):
bash skills/ffmpeg-ops/scripts/capability-scan.sh # full: proof-encodes each hw encoder
bash skills/ffmpeg-ops/scripts/capability-scan.sh --quick # list-only, no GPU touch
Commands are bash-form; they run unchanged in PowerShell except where the
Windows notes say otherwise. Replace -y/-n (overwrite/never)
consciously — never leave an agent-run command interactive.
# Web-compatible H.264 — THE default delivery encode. yuv420p + faststart are not
# optional: without them Safari/QuickTime/old devices show black video, and the
# moov atom sits at EOF so browsers can't start playback until fully downloaded.
ffmpeg -i in.mov -c:v libx264 -crf 20 -preset slow -pix_fmt yuv420p \
-c:a aac -b:a 192k -movflags +faststart out.mp4
# H.265/HEVC — ~40% smaller at same quality, slower encode, less universal playback.
# -tag:v hvc1 is required for Apple players to recognize the stream.
ffmpeg -i in.mp4 -c:v libx265 -crf 24 -preset slow -tag:v hvc1 \
-c:a copy -movflags +faststart out.mp4
# AV1 via SVT-AV1 (libaom is 10-50x slower; only use it for research-grade encodes).
# preset 0-13: lower = slower/better; 6 is the quality/speed sweet spot.
ffmpeg -i in.mp4 -c:v libsvtav1 -crf 32 -preset 6 -c:a libopus -b:a 128k out.webm
# Remux only — change container, zero quality loss, near-instant. Try this FIRST
# when the ask is "make this .mkv play in X": often the codecs are fine.
ffmpeg -i in.mkv -c copy -movflags +faststart out.mp4
# Normalize a problem source (HEVC/VFR phone footage, Zoom/Loom exports) before ANY
# downstream editing. VFR breaks cut math, concat sync, and Remotion/player seeking.
ffmpeg -i in.mov -c:v libx264 -crf 18 -preset fast -pix_fmt yuv420p \
-fps_mode cfr -r 30 -c:a aac -b:a 192k normalized.mp4
# Archival master — FFV1 lossless in MKV (the preservation standard).
ffmpeg -i in.mp4 -c:v ffv1 -level 3 -g 1 -slicecrc 1 -c:a flac archive.mkv
# "Make it fit in 25MB" — computed two-pass bitrate, auto audio/downscale, VERIFIED:
python skills/ffmpeg-ops/scripts/smart-compress.py --target 25MB video.mp4
Codec choice, CRF/preset matrices, two-pass bitrate targeting, per-platform social targets: references/encoding.md + assets/encoding-presets.json.
# Fast lossless trim (stream copy). -ss/-to BEFORE -i = input seek, absolute times.
# CAVEAT: with -c copy the start snaps to the previous keyframe — can be seconds
# early, or give frozen/black lead-in. Check first with probe-media.py --keyframes-near.
ffmpeg -ss 00:01:30 -to 00:02:00 -i in.mp4 -c copy -avoid_negative_ts make_zero cut.mp4
# Frame-accurate trim (re-encode). Input-side -ss IS frame-accurate when re-encoding
# (ffmpeg decodes from the prior keyframe and discards) — fast AND exact. The old
# "put -ss after -i for accuracy" advice costs a full decode from 0:00 for nothing.
ffmpeg -ss 00:01:30 -to 00:02:00 -i in.mp4 -c:v libx264 -crf 18 -c:a aac cut.mp4
# Join files with IDENTICAL codec/params — concat demuxer, no re-encode.
printf "file '%s'\n" seg1.mp4 seg2.mp4 seg3.mp4 > concat.txt
ffmpeg -f concat -safe 0 -i concat.txt -c copy joined.mp4
# Join files with DIFFERENT codecs/sizes — concat filter, re-encodes.
ffmpeg -i a.mp4 -i b.mov -filter_complex \
"[0:v][0:a][1:v][1:a]concat=n=2:v=1:a=1[v][a]" \
-map "[v]" -map "[a]" -c:v libx264 -crf 20 -c:a aac joined.mp4
# Remove a middle segment (keep 0-60s and 120s-end): cut both keeps, then concat.
# For multi-cut edits, write an EDL and use cut-from-edl.py instead (see EDL workflow).
-ss semantics in full, keyframe theory, concat ×3 (demuxer/filter/protocol),
edit-decision-list editing: references/trim-concat.md
and references/edit-as-code.md.
# Resize to width, keep aspect. ALWAYS -2 (not -1): yuv420p needs even dimensions.
ffmpeg -i in.mp4 -vf "scale=1280:-2" -c:a copy out.mp4
# Crop (w:h:x:y from top-left); cropdetect finds black bars for you:
ffmpeg -i in.mp4 -vf cropdetect -frames:v 120 -f null - 2>&1 | rg crop=
ffmpeg -i in.mp4 -vf "crop=1920:800:0:140" -c:a copy out.mp4
# Vertical 9:16 from landscape — blurred-pad pattern (social standard):
ffmpeg -i in.mp4 -filter_complex \
"[0:v]scale=1080:1920:force_original_aspect_ratio=increase,crop=1080:1920,boxblur=20[bg];
[0:v]scale=1080:-2[fg];[bg][fg]overlay=(W-w)/2:(H-h)/2" -c:a copy vertical.mp4
# Rotate: fix metadata only (instant) vs bake pixels (re-encode).
ffmpeg -display_rotation 90 -i in.mp4 -c copy out.mp4 # metadata flip (ffmpeg 6+)
ffmpeg -i in.mp4 -vf "transpose=1" -c:a copy out.mp4 # transpose=1: 90° clockwise
# Frame-rate change (drops/dups frames; for smooth slow-mo see minterpolate below)
ffmpeg -i in.mp4 -vf "fps=30" -c:a copy out.mp4
# 2x speed-up: video PTS halved + audio atempo (atempo accepts 0.5-100; chain
# atempo=0.5,atempo=0.5 for 0.25x). -map ordering keeps streams paired.
ffmpeg -i in.mp4 -filter_complex \
"[0:v]setpts=0.5*PTS[v];[0:a]atempo=2.0[a]" -map "[v]" -map "[a]" fast.mp4
# Interpolated slow-mo (synthesizes in-between frames — slow but smooth):
ffmpeg -i in.mp4 -vf "minterpolate=fps=60:mi_mode=mci:mc_mode=aobmc,setpts=2*PTS" -an slow.mp4
# Timelapse from photos (and the reverse: video -> frames, under Images below)
ffmpeg -framerate 24 -pattern_type glob -i 'photos/*.jpg' \
-c:v libx264 -crf 20 -pix_fmt yuv420p timelapse.mp4
Filtergraph syntax (labels, chains, split), speed ramps, full filter cookbook: references/filtergraph.md.
# Watermark bottom-right with 24px margin (W/H = video, w/h = overlay dims):
ffmpeg -i in.mp4 -i logo.png -filter_complex \
"overlay=W-w-24:H-h-24:format=auto" -c:a copy out.mp4
# Burn a running timecode (note %{pts\:hms} — the colon must be escaped INSIDE
# the drawtext argument; see Windows notes for fontfile paths):
ffmpeg -i in.mp4 -vf \
"drawtext=text='%{pts\:hms}':fontsize=48:fontcolor=white:box=1:[email protected]:x=24:y=24" \
-c:a copy out.mp4
# Burn-in subtitles (hard subs; needs libass). Pragmatic path rule: cd to the
# subtitle's directory and use a bare relative filename — the filter's path
# escaping is the single worst quoting trap in ffmpeg, especially on Windows.
ffmpeg -i in.mp4 -vf "subtitles=subs.srt" -c:a copy burned.mp4
# Soft subtitles (toggleable, instant — no re-encode):
ffmpeg -i in.mp4 -i subs.srt -map 0 -map 1 -c copy -c:s mov_text soft.mp4 # mp4
ffmpeg -i in.mkv -i subs.srt -map 0 -map 1 -c copy -c:s srt soft.mkv # mkv
Styling (ASS force_style), extraction, format conversion, STT round-trip: references/subtitles.md.
# Extract audio without re-encoding (copy the stream as-is; pick the container
# matching the codec — probe first: aac->.m4a, opus->.opus/.ogg, mp3->.mp3):
ffmpeg -i in.mp4 -vn -c:a copy out.m4a
# Extract + transcode to Opus (best codec per bit: voice 24-32k mono, music 96-128k):
ffmpeg -i in.mp4 -vn -c:a libopus -b:a 128k out.opus
# Replace a video's audio track (keep video untouched):
ffmpeg -i video.mp4 -i music.m4a -map 0:v -map 1:a -c:v copy -c:a aac -shortest out.mp4
# Mix two audio inputs (normalize=0 stops amix halving the volume of each input):
ffmpeg -i voice.wav -i music.mp3 -filter_complex \
"[1:a]volume=0.25[m];[0:a][m]amix=inputs=2:duration=first:normalize=0[a]" \
-map "[a]" -c:a aac mixed.m4a
# Loudness-normalize, one-pass (quick; DYNAMIC mode — fine for drafts).
# Two-pass linear mode is measurably better: use loudnorm-scan.py (Scripts below).
# loudnorm internally upsamples to 192kHz — the -ar 48000 puts it back.
ffmpeg -i in.mp4 -af "loudnorm=I=-16:TP=-1.5:LRA=11" -ar 48000 -c:v copy out.mp4
# Trim leading/trailing silence:
ffmpeg -i in.wav -af \
"silenceremove=start_periods=1:start_threshold=-40dB:detection=peak,areverse,silenceremove=start_periods=1:start_threshold=-40dB:detection=peak,areverse" \
trimmed.wav
Targets: -14 LUFS streaming platforms, -16 podcasts, -23 EBU R128 broadcast. Channel mapping, multi-track, restoration filters: references/audio.md.
# THE canonical STT extraction — 16 kHz mono 16-bit PCM (what whisper.cpp /
# faster-whisper actually resample to; doing it here is faster and deterministic):
ffmpeg -i in.mp4 -vn -ac 1 -ar 16000 -c:a pcm_s16le stt.wav
# Pipe raw PCM straight to whisper.cpp — no temp file:
ffmpeg -v error -i in.mp4 -vn -ac 1 -ar 16000 -f s16le - | whisper-cli -m model.bin -f -
# Chunk long audio ON SILENCE BOUNDARIES (never mid-word) for parallel transcription:
python skills/ffmpeg-ops/scripts/detect-segments.py --silence --json in.mp4 \
| jq '.data.speech[]'
Pre-STT cleanup (when afftdn/highpass help vs hurt accuracy), WhisperX word-level
alignment (±50 ms), transcript JSON shape, the summarisation pipeline:
references/stt-whisper.md.
# Thumbnail at a timestamp (input-side -ss: instant even at 2h offsets):
ffmpeg -ss 00:00:05 -i in.mp4 -frames:v 1 -q:v 2 thumb.jpg
# Contact sheet: 1 frame every 10s, tiled 4x3 (visual summary / scrub preview):
ffmpeg -i in.mp4 -vf "fps=1/10,scale=320:-2,tile=4x3" -frames:v 1 sheet.png
# High-quality GIF — palettegen/paletteuse is THE difference between a 256-color
# dithered mess and a clean GIF. Single pass via split:
ffmpeg -ss 5 -to 8 -i in.mp4 -filter_complex \
"fps=12,scale=480:-1:flags=lanczos,split[s0][s1];[s0]palettegen=max_colors=128[p];[s1][p]paletteuse=dither=bayer:bayer_scale=4" \
out.gif
# Embedded chapters from scene/silence detection (or YouTube description text):
python skills/ffmpeg-ops/scripts/make-chapters.py --from-scenes --media talk.mp4 \
--min-gap 30 --write chaptered.mp4
python skills/ffmpeg-ops/scripts/make-chapters.py --from-silence --media lecture.mp4 \
--format youtube
# Frames for ML datasets — fixed fps, model-square crop:
ffmpeg -i in.mp4 -vf "fps=1,scale=512:512:force_original_aspect_ratio=increase,crop=512:512" \
frames/%06d.png
# Image sequence -> video:
ffmpeg -framerate 24 -i frames/%06d.png -c:v libx264 -crf 18 -pix_fmt yuv420p out.mp4
# Player scrub-preview sprites + the WebVTT thumbnail track that maps them:
python skills/ffmpeg-ops/scripts/make-sprites.py --interval 5 video.mp4
Sprite sheets for web players, AVIF/WebP stills, dataset prep patterns: references/images-gif.md.
# Corruption / decode-error check (exit code is NOT the signal — the log is):
ffmpeg -v error -i in.mp4 -f null - 2> errors.log && [ ! -s errors.log ] && echo CLEAN
# Per-frame hashes — prove two pipelines produce identical frames:
ffmpeg -i in.mp4 -map 0:v -f framemd5 -
# Strip ALL metadata (GPS, device info — privacy before sharing phone video).
# -map_metadata -1 keeps rotation side-data; verify orientation after.
ffmpeg -i in.mp4 -map_metadata -1 -c copy clean.mp4
# Quick probes (machine-readable; prefer probe-media.py for the full picture):
ffprobe -v error -show_entries format=duration -of default=nw=1:nk=1 in.mp4
ffprobe -v error -select_streams v:0 -show_entries stream=codec_name,width,height,r_frame_rate -of csv=p=0 in.mp4
Safe re-encode of untrusted uploads, scene-change detection, integrity in CI: references/analysis-validation.md.
yt-dlp embeds ffmpeg for merge/remux; these are the post-download patterns:
# Prefer h264+m4a at download time (avoids a transcode entirely):
yt-dlp -S "res:1080,vcodec:h264,acodec:m4a" --remux-video mp4 URL
# Clip a section AT download (server-side range requests; much faster than full DL):
yt-dlp --download-sections "*10:00-12:30" -S "res:1080,vcodec:h264" URL
# Audio-only for STT/summarisation:
yt-dlp -x --audio-format opus URL
# Already downloaded a VP9/AV1 .webm that needs to be H.264 .mp4: that is a normal
# transcode — use the web-compatible H.264 recipe above, NOT --recode-video.
# Synthetic video+audio — fixtures, pipeline tests, alignment checks (no real media
# needed; this is how tests/run.sh builds its fixtures):
ffmpeg -f lavfi -i testsrc2=duration=2:size=640x360:rate=30 \
-f lavfi -i "sine=frequency=440:duration=2" \
-c:v libx264 -pix_fmt yuv420p -c:a aac fixture.mp4
Audio-reactive visuals (showwaves/showspectrum), podcast audiograms: references/visualization.md.
The table that pays this skill's rent. Each row is a class of silent failure.
| Footgun | The trap | The rule |
|---|---|---|
-ss + -c copy | Cut starts seconds early or with frozen/black lead-in (snapped to prior keyframe) | Copy cuts snap. Check probe-media.py --keyframes-near; re-encode when exact |
Output-side -to after input-side -ss | Timestamps reset at the seek point, so -to silently becomes a duration | Keep -ss/-to on the same side of -i (both input-side is fast and absolute) |
Missing -pix_fmt yuv420p | Encode "works" but Safari/QuickTime/TVs show black or refuse to play (defaulted to yuv444p/yuv422p from a high-quality source) | Always set it for delivery H.264/H.265 |
Missing -movflags +faststart | Browser can't start playback until the whole file downloads (moov at EOF) | Always set it for web-served MP4 |
| Default stream selection | ffmpeg picks ONE stream per type (highest-res video, most-channels audio) — extra audio tracks and all subs are silently dropped | -map 0 to keep everything, explicit -map otherwise |
-vf + -c:v copy together | Hard error — filters require decoding | Filtering implies re-encode; pick one |
| VFR source (phone/Zoom/Loom/screen-rec) | Cut math drifts, concat desyncs, players stutter | Normalize first: -fps_mode cfr -r 30 + re-encode (cookbook) |
-vsync (deprecated) | Old flag, removed direction | Use -fps_mode (cfr/vfr/passthrough) |
scale=W:-1 | Odd height → encoder error with yuv420p | Always -2 |
| concat demuxer on mismatched inputs | "Works" then glitches/desyncs at boundaries (codec/timebase mismatch) | Demuxer = identical params only; else concat filter with re-encode |
| amix default | Each input's volume halved (normalize defaults on) | amix=...:normalize=0 + explicit volume= |
| One-pass loudnorm | Dynamic mode pumps quiet passages; output silently 192 kHz | Two-pass linear via loudnorm-scan.py; add -ar 48000 |
-shortest absent on audio-replace | Output runs as long as the LONGEST input (silence or frozen frame tail) | Add -shortest when muxing separate A/V |
| BT.601/709 colour shift | Slightly wrong colours after scaling SD↔HD (matrix guessed from resolution) | Tag explicitly when it matters: see references/color-hdr.md |
| drawtext/subtitles path escaping | Filter args re-parse : and \ — Windows paths like C:\x explode inside filter strings | cd to the asset's dir and use bare relative names; or escape as C\:/path |
| Interactive overwrite prompt | Agent-run command hangs forever on "File exists. Overwrite? [y/N]" | Always pass -y or -n explicitly |
% in cmd.exe | %06d patterns and %{pts} get mangled by cmd variable expansion | Use PowerShell or bash; in .bat double to %% |
Platform-agnostic commands, but when running on Windows:
-vf 'scale=1280:-2,fps=30' needs no escaping. Double quotes only
interpolate $ and backtick — filtergraphs rarely contain either.NUL not /dev/null for two-pass logs: -passlogfile defaults are fine, but
ffmpeg ... -f null NUL (PowerShell also accepts -f null -, which is portable —
prefer it).fontfile='C\:/Windows/Fonts/arial.ttf' — forward
slashes, escaped drive colon, inside the filter string.-f null - and relative paths to sidestep both quoting tables at once.Codec — H.264 (libx264): default; universal playback, fast, good per-bit at
-crf 18..23. → H.265 (libx265): same quality ~40% smaller; slower; needs
-tag:v hvc1 for Apple; fine for storage/modern devices. → AV1 (libsvtav1): best
compression, royalty-free, web-first (YouTube/Netflix path); encode cost highest;
playback on older hardware is software-only. → VP9: only when a pipeline demands
webm and AV1 is unavailable. → FFV1: archival masters only.
Cut method — Need exact frames OR applying any filter → re-encode (input-side
-ss, -crf 18). Cut points happen to sit on keyframes (verify with
--keyframes-near) OR a ±2s slop is acceptable → stream copy with
-avoid_negative_ts make_zero. Many cuts from one source → EDL workflow below.
CPU vs hardware encode — Hardware (NVENC/QSV/AMF/VideoToolbox) is 5-20× faster
but worse quality per bit than libx264/x265 at slow presets. Use hardware for:
speed-critical batch work, live/streaming, drafts, "good enough" deliveries (bump
bitrate ~30% to compensate). Use CPU for: final masters, size-constrained targets,
quality comparisons. Always capability-scan.sh first — listed encoders fail at
runtime on driver mismatches. Details: references/hardware-accel.md.
For any multi-cut edit, do not fire ad-hoc trim commands. Write an edit decision list — a JSON file naming every clip, time range, and why — then cut from it. The edit becomes reviewable (rationale is written down), rerunnable (regenerate the output any time), and diffable (versions of the edit are git history).
# 1. Find candidate cut points (silence = clean speech boundaries):
python skills/ffmpeg-ops/scripts/detect-segments.py --silence --json take3.mp4
# 2. Author the EDL (schema: assets/edl-schema.json) with per-scene rationale.
# 3. Dry-run prints every ffmpeg command it would run (default — nothing executes):
python skills/ffmpeg-ops/scripts/cut-from-edl.py edit.json
# 4. Execute: cuts + concat -> final. Re-encodes by default for frame accuracy;
# --copy for keyframe-aligned EDLs.
python skills/ffmpeg-ops/scripts/cut-from-edl.py edit.json --execute -o final.mp4
Rules that make this work (from the Fable launch-video pipeline): cuts must land in silence; the model reasons over transcripts, not frames; after cutting, re-transcribe the output to verify (no filler words survived, no words clipped). Full architecture, EDL schema, verification loop: references/edit-as-code.md.
# Apply a .cube LUT (tetrahedral = highest quality interpolation):
ffmpeg -i in.mp4 -vf "lut3d=file=grade.cube:interp=tetrahedral" \
-c:v libx264 -crf 18 -c:a copy graded.mp4
# Generate a family of grade candidates + an HTML still-chooser:
python skills/ffmpeg-ops/scripts/gen-luts.py --variants all --out-dir work/luts \
--previews in.mp4
The human picks the grade. Generate variants, render preview stills, present a chooser — never auto-select a look. Grading is a taste call; the agent's job is the lattice math and the apply command. LUT format, log-footage normalization (S-Log3/V-Log → Rec.709), curves/eq safe ranges, checking work with ffmpeg's built-in scopes (waveform/vectorscope): references/color-grading.md. The 25-look recipe catalog — film stocks (Kodachrome, CineStill halation, Technicolor, Eterna), signature grades (Mad Max, Fincher, Matrix, BR2049, Amélie…), era/genre moods, Sin City selective color — every chain build-validated, plus the Hald-CLUT match-any-look workflow and scope-matching ladder: references/look-recipes.md. Pipeline correctness (pix_fmt, HDR→SDR tonemapping, range/matrix tagging): references/color-hdr.md.
# VMAF/SSIM/PSNR verdict on an encode (exit 10 = below threshold -> branch on it):
python skills/ffmpeg-ops/scripts/quality-compare.py reference.mp4 encoded.mp4 \
--metrics ssim,psnr
python skills/ffmpeg-ops/scripts/quality-compare.py reference.mp4 encoded.mp4 \
--metrics vmaf --min-vmaf 90 --json | jq '.data.vmaf'
VMAF ≥ 93 at 1080p ≈ visually transparent; 80-93 = noticeable on inspection.
Side-by-side visual A/B (hstack), metric interpretation, encode-ladder tuning:
references/quality-metrics.md.
All eleven follow the Skill Resource Protocol:
--help with examples, stdout = data only, --json envelopes
(claude-mods.ffmpeg-ops.*/v1), semantic exit codes (0 ok, 2 usage, 3 input
missing, 4 invalid input, 5 missing dependency, 7 ffmpeg unavailable,
10 domain finding).
| Script | Job | Worked invocation |
|---|---|---|
probe-media.py | Normalized inspection, keyframe proximity, --doctor triage (hazard → fix command, exit 10) | probe-media.py --doctor in.mp4 |
capability-scan.sh | What can THIS ffmpeg build do (proof-encodes hw encoders; --quick skips) | capability-scan.sh --json | jq '.data.encoders' — exit 10 = a listed encoder failed verification |
quality-compare.py | VMAF/SSIM/PSNR gate | quality-compare.py ref.mp4 enc.mp4 --min-vmaf 90 — exit 10 = below threshold |
loudnorm-scan.py | Two-pass loudnorm: measures pass 1, emits exact pass-2 filter | loudnorm-scan.py -I -16 in.mp4 --json | jq -r '.data.pass2_filter' |
detect-segments.py | Silence/scene boundaries as JSON segments (STT chunking, dead-air cuts, shot splits) | detect-segments.py --scenes --json in.mp4 | jq '.data.segments' |
cut-from-edl.py | EDL JSON → validated cuts + concat (dry-run by default) | cut-from-edl.py edit.json --execute -o final.mp4 |
make-chapters.py | Scene/silence points (or explicit JSON) → embedded chapters / YouTube text / WebVTT | make-chapters.py --from-scenes --media talk.mp4 --write chaptered.mp4 |
smart-compress.py | Fit a size cap: computed two-pass bitrate, auto audio/downscale, size-verified (exit 10 = still over) | smart-compress.py --target 25MB video.mp4 |
make-sprites.py | Scrub-preview sprite sheets + WebVTT thumbnail track (#xywh) | make-sprites.py --interval 5 video.mp4 |
gen-luts.py | Emit .cube grade variants (+ --previews still chooser) | gen-luts.py --variants warm_filmic,punchy --out-dir luts/ |
verify-commands.sh | Staleness verifier: --offline structural (CI), --live checks docs against the installed build | verify-commands.sh --live — exit 10 = doc drift, 7 = no ffmpeg |
Load on demand — one concept per file:
| Reference | Load when |
|---|---|
| encoding.md | Choosing codec/CRF/preset, two-pass, social platform targets, archival |
| hardware-accel.md | NVENC/QSV/AMF/VideoToolbox/VAAPI flags, quality caveats, detection |
| filtergraph.md | Any -filter_complex, labels/chains/split, speed ramps, xstack |
| trim-concat.md | Cut accuracy, keyframes, concat selection, segment removal |
| edit-as-code.md | Multi-cut edits, EDL schema, transcript-driven editing, verify loop |
| audio.md | Loudness, mixing, channel layout, audio repair |
| stt-whisper.md | Whisper/WhisperX prep, chunking, transcript JSON, summarisation pipeline |
| subtitles.md | Burn vs soft, styling, extraction, format conversion |
| color-grading.md | LUTs, .cube format, log normalization, scopes, grade workflow |
| look-recipes.md | 25-look catalog (film stocks, signature movie grades, era/genre moods), Hald-CLUT extraction, scope-matching |
| color-hdr.md | pix_fmt, HDR→SDR tonemap, BT.601/709 tagging, 10-bit |
| quality-metrics.md | VMAF/SSIM interpretation, visual A/B, ladder tuning |
| streaming-hls.md | HLS/DASH packaging, ABR ladders, live restream |
| images-gif.md | GIF quality, sprite sheets, dataset frame extraction |
| restoration.md | Deinterlace, denoise, deband, stabilize, audio cleanup |
| analysis-validation.md | Corruption checks, hashing, metadata stripping, untrusted uploads |
| capture-devices.md | Screen/webcam capture per OS (gdigrab/dshow, avfoundation, x11grab) |
| error-decoder.md | An ffmpeg command failed with a cryptic message — symptom → cause → fix |
| visualization.md | Waveform/spectrogram videos, audiograms, comparison grids |
Assets: encoding-presets.json (recipe data incl. date-stamped social targets), hls-ladder.json (ABR ladder), edl-schema.json (the cut-from-edl.py contract).
bash skills/ffmpeg-ops/tests/run.sh # offline suite; synthesizes fixtures via lavfi
Structural assertions always run; media round-trips run only when ffmpeg is on PATH (loud skip otherwise — never a silent false-clean).
npx claudepluginhub 0xdarkmatter/claude-mods --plugin claude-modsProvides FFmpeg recipes for video post-production: format conversion, scaling, trimming, concatenation, compression, and AI post-processing with correct filter order.
Translates natural language descriptions of video/audio operations into correct FFmpeg commands for transcoding, trimming, filtering, and more.
FFmpeg CLI reference for video/audio processing: format conversion, resizing/cropping/trimming, audio mixing/extraction, overlays/text/subtitles, GIFs/thumbnails, GPU encoding, ffprobe inspection.