Skill

erm

Installs and runs the erm CLI to remove filler words (um, uh, etc.) from spoken-audio recordings. Activates when cleaning up podcasts, voiceovers, or audio with disfluencies.

Python

cli-tools

Popularity

Stars

Forks

Invocation

How this skill is triggered — by the user, by Claude, or both

Slash command

/erm:erm

User invocable

Model invocable

Inline context

Default effort

Tool Access

This skill is limited to the following tools:

BashReadAskUserQuestion

Context Preview

The summary Claude sees in its skill listing — used to decide when to auto-load this skill

`erm` strips disfluencies from English speech audio. It transcribes with

SKILL.md

110 lines · ~1.3k tokens

Stats

LanguagePython

Stars93

Forks2

MaintenanceExcellent

Last CommitJun 14, 2026

Actions

View Source View Plugin View on GitHub View README

erm — install and use

erm strips disfluencies from English speech audio. It transcribes with faster-whisper, runs extra audio-domain detectors for fillers Whisper hides, and splices with ffmpeg (energy-snapped, crossfaded, room-tone-matched).

Resolving documentation

When you need authoritative detail, resolve it in this order (each works in more environments than the last):

erm --help and erm validate --help — definitive flags and defaults; works once installed.
Public docs: https://dougcalobrisi.github.io/erm/ — usage, recipes, troubleshooting, etc.
Bundled docs (Claude Code/Cowork plugin only): ${CLAUDE_PLUGIN_ROOT}/docs/*.md and the source of truth for flag defaults, ${CLAUDE_PLUGIN_ROOT}/src/erm/cli.py.

Never guess flag names or defaults — read one of the above.

1. Install / run

erm needs Python 3.11+ and ffmpeg/ffprobe on PATH.

Check ffmpeg: ffmpeg -version. If missing, suggest the OS install (brew install ffmpeg, apt install ffmpeg, choco install ffmpeg).
Resolve a launcher — prefer uv (broadest, no persistent install):
- Tier 1 — uvx (preferred). If uv --version succeeds, run erm straight from PyPI with uvx erm … — no install step; uv fetches and caches the environment on first run, so later runs are fast. Pin a version with uvx erm@<version> … when needed. Verify: uvx erm --help.
- Tier 2 — venv fallback (no uv on PATH). Create an isolated env and install from PyPI:
```
python3 -m venv .venv
source .venv/bin/activate
pip install erm
erm --help   # verify
```

Launcher convention. In the commands throughout this skill, erm means the launcher you resolved above: prefix with uvx under tier 1 (e.g. uvx erm INPUT.wav --dry-run), or use plain erm after activating the venv under tier 2.

Transcription runs on CPU by default (no setup). GPU is optional and needs the CUDA runtime libs; --device auto falls back to CPU. Add the CUDA wheels to the same environment — uvx --with nvidia-cublas-cu12 --with nvidia-cudnn-cu12 erm … under tier 1, or pip install nvidia-cublas-cu12 nvidia-cudnn-cu12 into the venv under tier 2. See the transcription docs page for details.

2. Ask before choosing a command

erm's behavior forks on a couple of choices. Use AskUserQuestion to settle these only when they aren't already clear from the request, then proceed:

What kind of audio? podcast/interview · video (caption-timed or A/V sync) · multitrack stem · already-clean studio. This selects the recipe.
Render mode? --mode remove (default — excises fillers, timeline shrinks) vs --mode silence (mutes in place, duration preserved — required for video sync and multitrack stems).
Video input — audio or picture? For a video file, erm emits the cleaned audio only (.wav) by default (the "pull the audio out" case). Add --video to render the picture too — container inferred from the input, A/V in sync by construction. With --video: --mode silence stream-copies the picture losslessly (caption/lip-sync safe), --video-splice {crossfade,cut} picks the splice style, --vcodec/--crf/--preset tune the re-encode. See the video doc.

If the user already implied the answers (e.g. "clean my podcast"), don't ask — pick the sensible default and say what you chose.

Then read the recipes doc and use the matching copy-paste command.

3. Core workflow (the iterate loop)

Inspect first: erm INPUT.wav --dry-run — prints/writes the cut-list JSON (*-cuts-*.json); renders nothing. Review what it intends to cut.
Render: erm INPUT.wav — writes INPUT-cleaned-<timestamp>.wav next to the input.
Validate: erm validate INPUT.wav OUTPUT.wav — re-transcribes the output and asserts no fillers survive, plus container/duration sanity. Exit 0 = pass.

Useful flags (confirm with erm --help): -o/--output, --json, --model, --device, --fillers, --video (render the picture from a video input). The full usage doc explains the workflow in depth.

Adjusting the word list. If the user wants to strip an extra word (e.g. "also remove 'basically' / 'like'"), prefer --add-fillers "basically,like" — it keeps the built-in defaults and unions the new words on top. Use --remove-fillers WORD to drop a default that over-matches their voice. Reach for --fillers only to replace the whole set, since it requires re-typing every stem. Custom words match verbatim (no automatic elongation). See the recipes doc → "Custom filler vocabulary".

4. When results aren't perfect

If fillers remain, real words get clipped, splices click/smear, the noise floor pumps, or words run together — hand off to the erm-tune skill, which maps each symptom to the right knob.

erm

Popularity

Invocation

Tool Access

Context Preview

SKILL.md

erm

Popularity

Invocation

Tool Access

Context Preview

SKILL.md

erm — install and use

Resolving documentation

1. Install / run

2. Ask before choosing a command

3. Core workflow (the iterate loop)

4. When results aren't perfect

Similar Skills

erm — install and use

Resolving documentation

1. Install / run

2. Ask before choosing a command

3. Core workflow (the iterate loop)

4. When results aren't perfect

Similar Skills