From erm — filler-word remover
Installs and runs the erm CLI to remove filler words (um, uh, etc.) from spoken-audio recordings. Activates when cleaning up podcasts, voiceovers, or audio with disfluencies.
Slash command: /erm:erm
`erm` strips disfluencies from English speech audio. It transcribes with
faster-whisper, runs extra audio-domain detectors for fillers Whisper hides,
and splices with ffmpeg (energy-snapped, crossfaded, room-tone-matched).
When you need authoritative detail, resolve it in this order (each works in more environments than the last):
1. erm --help and erm validate --help: definitive flags and defaults; works once installed.
2. The docs pages: usage, recipes, troubleshooting, etc.
3. ${CLAUDE_PLUGIN_ROOT}/docs/*.md and the source of truth for flag defaults, ${CLAUDE_PLUGIN_ROOT}/src/erm/cli.py.
Never guess flag names or defaults; read one of the above.
erm needs Python 3.11+ and ffmpeg/ffprobe on PATH.
1. Check ffmpeg with ffmpeg -version. If missing, suggest the OS install (brew install ffmpeg, apt install ffmpeg, choco install ffmpeg).
2. Tier 1 (uv available). If uv --version succeeds, run erm straight from PyPI with uvx erm …; there is no install step. uv fetches and caches the environment on first run, so later runs are fast. Pin a version with uvx erm@<version> … when needed. Verify: uvx erm --help.
3. Tier 2 (no uv on PATH). Create an isolated env and install from PyPI:
    python3 -m venv .venv
    source .venv/bin/activate
    pip install erm
    erm --help # verify
Launcher convention. In the commands throughout this skill, erm means the
launcher you resolved above: prefix with uvx under tier 1
(e.g. uvx erm INPUT.wav --dry-run), or use plain erm after activating the
venv under tier 2.
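The prerequisite check and the launcher choice described above can be sketched as two small shell functions. This is a minimal sketch under stated assumptions, not something erm ships: the names check_ffmpeg and resolve_erm_launcher are hypothetical, and a POSIX-compatible shell is assumed.

```shell
# Hypothetical helpers (not part of erm); assumes a POSIX-compatible shell.

# Succeeds only if both ffmpeg and ffprobe are found on PATH.
check_ffmpeg() {
  for tool in ffmpeg ffprobe; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool (install ffmpeg via brew, apt, or choco)" >&2
      return 1
    fi
  done
}

# Prints the launcher to use: "uvx erm" under tier 1, plain "erm" under tier 2.
resolve_erm_launcher() {
  if uv --version >/dev/null 2>&1; then
    echo "uvx erm"
  elif command -v erm >/dev/null 2>&1; then
    echo "erm"
  else
    echo "no launcher found: install uv, or create a venv and pip install erm" >&2
    return 1
  fi
}
```

A caller might run check_ffmpeg first, then store the result of resolve_erm_launcher and prefix every later command with it.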
Transcription runs on CPU by default (no setup). GPU is optional and needs the
CUDA runtime libs; --device auto falls back to CPU. Add the CUDA wheels to the
same environment — uvx --with nvidia-cublas-cu12 --with nvidia-cudnn-cu12 erm …
under tier 1, or pip install nvidia-cublas-cu12 nvidia-cudnn-cu12 into the venv
under tier 2. See the transcription docs page for details.
erm's behavior forks on a couple of choices. Use AskUserQuestion to settle these only when they aren't already clear from the request, then proceed:
- Mode: --mode remove (default; excises fillers, timeline shrinks) vs --mode silence (mutes in place, duration preserved; required for video sync and multitrack stems).
- Video input: erm emits the cleaned audio only (.wav) by default (the "pull the audio out" case). Add --video to render the picture too; the container is inferred from the input and A/V stays in sync by construction. With --video: --mode silence stream-copies the picture losslessly (caption/lip-sync safe), --video-splice {crossfade,cut} picks the splice style, and --vcodec/--crf/--preset tune the re-encode. See the video doc.
If the user already implied the answers (e.g. "clean my podcast"), don't ask; pick the sensible default and say what you chose.
Then read the recipes doc and use the matching copy-paste command.
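Once the two choices are settled, mapping them to flags is mechanical. The sketch below shows one way to do that; erm_choice_flags is a hypothetical helper (not part of erm), and it uses only the --mode and --video flags named above. Confirm them with erm --help before relying on it.

```shell
# Hypothetical helper (not part of erm): turn the two answers into erm flags.
# $1 = "yes" if duration must be preserved (video sync, multitrack stems)
# $2 = "yes" if the picture should be rendered from a video input
erm_choice_flags() {
  preserve_duration=${1:-no}
  render_video=${2:-no}

  if [ "$preserve_duration" = "yes" ]; then
    flags="--mode silence"   # mutes in place, duration preserved
  else
    flags="--mode remove"    # the default: excises fillers, timeline shrinks
  fi

  if [ "$render_video" = "yes" ]; then
    flags="$flags --video"   # render the picture as well as the audio
  fi

  echo "$flags"
}
```

For example, erm_choice_flags yes yes prints the flags for a lip-sync-safe video render, which can then be appended to the erm command for the input file.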
1. erm INPUT.wav --dry-run prints/writes the cut-list JSON (*-cuts-*.json) and renders nothing. Review what it intends to cut.
2. erm INPUT.wav writes INPUT-cleaned-<timestamp>.wav next to the input.
3. erm validate INPUT.wav OUTPUT.wav re-transcribes the output and asserts no fillers survive, plus container/duration sanity. Exit 0 = pass.
Useful flags (confirm with erm --help): -o/--output, --json, --model, --device, --fillers, --video (render the picture from a video input). The full usage doc explains the workflow in depth.
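The dry-run, render, validate sequence can be chained so that a failure at any step stops the run. The following is a sketch only: clean_and_validate is a hypothetical wrapper, it assumes erm is callable as plain erm (for tier 1, define a shell function or alias that forwards to uvx erm), and it assumes -o takes the output path, which should be confirmed with erm --help.

```shell
# Hypothetical wrapper (not part of erm): dry run, render, then validate.
# Assumes "erm" is callable and that -o accepts an output path (unverified).
# $1 = input file, $2 = output file
clean_and_validate() {
  input=$1
  output=$2

  # Step 1: emit the cut list only; nothing is rendered.
  erm "$input" --dry-run || return 1

  # Step 2: render the cleaned audio to a known path.
  erm "$input" -o "$output" || return 1

  # Step 3: re-transcribe the output; exit status 0 means pass.
  erm validate "$input" "$output"
}
```

In interactive use it is usually better to stop after the dry run and review the cut list before rendering; the wrapper suits batch runs where the validate exit status is the gate.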
Adjusting the word list. If the user wants to strip an extra word (e.g.
"also remove 'basically' / 'like'"), prefer --add-fillers "basically,like" —
it keeps the built-in defaults and unions the new words on top. Use
--remove-fillers WORD to drop a default that over-matches their voice. Reach
for --fillers only to replace the whole set, since it requires re-typing every
stem. Custom words match verbatim (no automatic elongation). See the
recipes doc → "Custom filler vocabulary".
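The word-list guidance above amounts to adding flags only when the user asked for a change. A minimal sketch follows; erm_with_word_list is a hypothetical helper, and it assumes that --add-fillers and --remove-fillers may be combined in one invocation, which the text above does not confirm (check erm --help).

```shell
# Hypothetical helper (not part of erm): run erm with word-list adjustments.
# Assumes --add-fillers and --remove-fillers can be combined (unverified).
# $1 = input file
# $2 = comma-separated words to add on top of the defaults ("" for none)
# $3 = one default word to drop ("" for none)
erm_with_word_list() {
  input=$1
  add=${2:-}
  drop=${3:-}

  # Rebuild the positional parameters as the erm argument list.
  set -- "$input"
  if [ -n "$add" ]; then
    set -- "$@" --add-fillers "$add"      # keeps defaults, unions new words
  fi
  if [ -n "$drop" ]; then
    set -- "$@" --remove-fillers "$drop"  # drops a default that over-matches
  fi

  erm "$@"
}
```

Calling erm_with_word_list INPUT.wav "basically,like" "" corresponds to the --add-fillers case described above; --fillers is deliberately left out because it replaces the whole set.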
If fillers remain, real words get clipped, splices click/smear, the noise floor pumps, or words run together — hand off to the erm-tune skill, which maps each symptom to the right knob.