Transcribe a meeting recording locally with speaker diarization. On macOS (Apple Silicon) uses mlx-whisper + FluidAudio (CoreML, ANE) for ~30× realtime speed; on other platforms falls back to whisply (faster-whisper + pyannote). Use when the user asks to transcribe a recording, generate a transcript, or identify speakers from audio/video.
Slash command
/transcribe:transcribe <path-to-audio-or-video-file>
Transcribe an audio or video recording locally:
The skill bootstraps and maintains its own global Python venv at ~/.local/share/transcribe-skill/.venv/ and (on Mac) builds FluidAudio into ~/.local/share/transcribe-skill/FluidAudio/. No per-project setup needed.
The input file path is provided as $ARGUMENTS.
macOS: no token needed. Verify Swift is available:
command -v swift >/dev/null && echo OK || echo "run: xcode-select --install"
Other platforms: verify HF_TOKEN is set (env, project .env, or ~/.config/transcribe-skill/.env):
grep -h HF_TOKEN .env ~/.config/transcribe-skill/.env 2>/dev/null | head -1
If missing, read ${CLAUDE_SKILL_DIR}/references/local-setup.md and walk the user through setup.
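The token lookup order described above (environment, project .env, then ~/.config/transcribe-skill/.env) can be sketched in Python. This is a minimal illustration, not the skill's actual code: the function name `find_hf_token` and the .env parsing rules (optional `export ` prefix, optional quotes) are assumptions.

```python
import os
from pathlib import Path


def find_hf_token(env=None, env_files=None):
    """Return the first HF_TOKEN found, or None (hypothetical helper)."""
    env = os.environ if env is None else env
    # 1. The process environment wins if it holds a non-empty token.
    if env.get("HF_TOKEN"):
        return env["HF_TOKEN"]

    # 2. Otherwise check the project .env, then the per-user config file.
    if env_files is None:
        env_files = [
            Path(".env"),
            Path.home() / ".config" / "transcribe-skill" / ".env",
        ]

    for path in env_files:
        path = Path(path)
        if not path.is_file():
            continue
        for line in path.read_text(encoding="utf-8").splitlines():
            line = line.strip()
            # Assumed syntax: tolerate a leading "export ".
            if line.startswith("export "):
                line = line[len("export "):].lstrip()
            if line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            if key.strip() == "HF_TOKEN":
                value = value.strip().strip("'\"")
                if value:
                    return value
    return None
```

A `None` result corresponds to the "If missing" case, where the setup reference should be read.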
No venv setup needed. run.sh auto-creates everything on first invocation:
bash ${CLAUDE_SKILL_DIR}/scripts/run.sh $ARGUMENTS
The wrapper will produce:
- <file>.transcribe.json (reused on subsequent runs, no recompute)
- <file>.transcript.md with generic speaker labels (Speaker 0, Speaker 1, ...)
If <file>.transcribe.json already exists, the script skips compute and reuses it. Use --force to re-run. Legacy <file>.whisply.json caches from older skill versions are also honored.
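The cache behaviour described above can be sketched as a small lookup. This is an illustration under assumptions, not the real transcribe.py: the helper name `find_cached_result` is hypothetical, the suffix is assumed to be appended to the full input filename, and the current cache is assumed to take priority over the legacy one.

```python
from pathlib import Path


def find_cached_result(input_path, force=False):
    """Return the cache file to reuse, or None if the pipeline must run."""
    # --force bypasses any existing cache.
    if force:
        return None

    input_path = Path(input_path)
    # Assumed naming: the suffix is appended to the whole filename,
    # e.g. meeting.mp4 -> meeting.mp4.transcribe.json
    candidates = [
        input_path.with_name(input_path.name + ".transcribe.json"),
        input_path.with_name(input_path.name + ".whisply.json"),  # legacy cache
    ]
    for candidate in candidates:
        if candidate.is_file():
            return candidate
    return None
```

Returning the path rather than a boolean lets the caller load whichever cache was found without repeating the naming logic.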
Flags (passed through to transcribe.py):
- --model <name>: override Whisper model. Mac default: mlx-community/whisper-large-v3-turbo. Fallback default: large-v3-turbo. Use large-v3 for max accuracy.
- --language <code>: force language (default auto-detect; e.g. ru, en, de)
- --num-speakers <N>: fix speaker count if auto-detect splits wrong
- --output <path>: output .md path (default: <input>.transcript.md)
- --force: bypass JSON cache and re-run pipeline
After Step 1 finishes, immediately read the generated .transcript.md file and analyze the conversation to identify speakers. Do not just dump the transcript and stop: the auto-labeling step is part of every transcribe invocation.
Check available project context for names (.assistant/, CLAUDE.md, project README, attendee lists).
Build a proposed mapping like:
Speaker 0 → Alex Smith (host, leads the discussion)
Speaker 1 → Maria Jones (engineer, presents the design)
Present the proposed mapping to the user and ask for confirmation or corrections using AskUserQuestion. Show a few representative quotes from each speaker to help the user verify.
Once the user confirms the speaker mapping, run the relabel command:
bash ${CLAUDE_SKILL_DIR}/scripts/run.sh $ARGUMENTS --relabel '{"Speaker 0": "Alex Smith", "Speaker 1": "Maria Jones"}'
This rewrites .transcript.md with real names. The cached .transcribe.json is reused — no recompute.
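What a relabel pass might do with the JSON mapping can be sketched as follows. This is not the skill's implementation: the function name `relabel` is hypothetical, and it assumes the generic labels appear in the transcript only as speaker labels, never inside spoken text.

```python
import json
import re


def relabel(transcript_text, mapping_json):
    """Replace generic speaker labels with real names in a single pass."""
    mapping = json.loads(mapping_json)
    if not mapping:
        return transcript_text

    # Longest labels first, plus word boundaries, so that "Speaker 1"
    # never matches inside "Speaker 10".
    labels = sorted(mapping, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(label) for label in labels) + r")\b"
    )
    # One regex pass means a replacement is never itself re-replaced,
    # so even swapping two labels works.
    return pattern.sub(lambda m: mapping[m.group(1)], transcript_text)
```

A chain of plain `str.replace` calls would be simpler, but it could rewrite text that an earlier replacement had just inserted.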
Ask the user where to save the final transcript. Suggest a sensible default based on the project structure (e.g., a transcripts/ directory or project root).
Move the final .transcript.md to the chosen location with a descriptive filename.
Report the final file path to the user.
- Downloaded models go to ~/.cache/huggingface/, and FluidAudio models (~50-100 MB) to ~/Library/Application Support/FluidAudio/Models/
- .transcribe.json contains raw STT segments (with word-level timestamps) plus diarization segments; add *.transcribe.json to .gitignore
- Venv location: ~/.local/share/transcribe-skill/.venv/ (override with TRANSCRIBE_VENV env var)
- FluidAudio CLI binary: ~/.local/share/transcribe-skill/FluidAudio/.build/release/fluidaudiocli (override with FLUIDAUDIO_BIN)
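The two override variables mentioned above could be resolved along these lines. This is a sketch only: the helper name `resolve_paths` and the rule that an empty variable falls back to the default are assumptions. The default paths and variable names come from the notes.

```python
import os
from pathlib import Path


def resolve_paths(env=None):
    """Resolve venv and FluidAudio binary locations (hypothetical helper)."""
    env = os.environ if env is None else env
    base = Path.home() / ".local" / "share" / "transcribe-skill"
    default_venv = base / ".venv"
    default_bin = base / "FluidAudio" / ".build" / "release" / "fluidaudiocli"
    return {
        # An unset or empty variable falls back to the documented default.
        "venv": Path(env.get("TRANSCRIBE_VENV") or default_venv).expanduser(),
        "fluidaudio_bin": Path(env.get("FLUIDAUDIO_BIN") or default_bin).expanduser(),
    }
```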
npx claudepluginhub ayusavin/skills --plugin transcribe