Converse

Non-blocking, interruptible voice interface for Claude Code.

Talk to Claude Code while it works. Claude reads its responses aloud, and you can interrupt at any time just by starting to speak — the listener detects your voice, kills the TTS playback, and hands Claude your new input.

What makes it different

Most "voice for the terminal" setups are turn-based: Claude speaks, you wait, you record, you send. Converse doesn't block. The mic stays open the whole session, Whisper transcribes continuously, and barge-in is handled purely by audio energy — no wake word, no push-to-talk, no modal recording prompt.

Requirements

Linux (uses flock(1) and XDG_RUNTIME_DIR)
Python 3 with pyaudio, numpy, requests
A running Whisper server (OpenAI-compatible API) on localhost:2022
A running Kokoro TTS server (OpenAI-compatible API) on localhost:8880

The easiest way to run both is voicemode, which installs and manages Whisper + Kokoro locally.

Installation

/plugin install converse@nafg/claude-converse

Required setup

1. whisper-server needs --no-context.

Without the flag, whisper-server's decoder carries tokens across HTTP requests, so a single bad transcription contaminates every subsequent one until you restart the server. Edit ~/.voicemode/services/whisper/bin/start-whisper-server.sh, add --no-context to the exec line, then run voicemode service restart whisper. Full details are in CLAUDE.md.

2. Statusline integration (optional but recommended).

A small line in your statusline shows live transcriptions while you speak and a mic indicator when voice mode is active. Add the one-liner for your shell to your statusline command — see CLAUDE.md for the exact snippet.

Usage

/converse on     # start voice mode
/converse off    # stop voice mode

Once on, just talk. Claude will:

Echo your transcription as a blockquote
Respond in text, which the Stop hook speaks via Kokoro
Accept interruption: speak over the TTS and it cuts off; the new transcription goes to Claude as normal input
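
Because responses are spoken, markdown has to be removed and the text split into sentence-sized pieces before it reaches Kokoro. The sketch below is one rough way to do that preprocessing. The function names and regexes are illustrative assumptions, not the actual code in speak.py.

```python
import re

def strip_markdown(text: str) -> str:
    """Remove common markdown syntax so the TTS engine does not read it aloud."""
    text = re.sub(r"```.*?```", " ", text, flags=re.DOTALL)    # drop fenced code blocks
    text = re.sub(r"`([^`]*)`", r"\1", text)                   # inline code -> plain text
    text = re.sub(r"\[([^\]]+)\]\([^)]*\)", r"\1", text)       # [label](url) -> label
    # leading heading, blockquote, and bullet markers
    text = re.sub(r"^\s{0,3}(#{1,6}|>|[-*+])\s+", "", text, flags=re.MULTILINE)
    text = re.sub(r"\*{1,2}([^*]+)\*{1,2}", r"\1", text)       # *italic* / **bold**
    return re.sub(r"\s+", " ", text).strip()                   # collapse whitespace

def chunk_sentences(text: str) -> list[str]:
    """Split on whitespace that follows sentence-ending punctuation."""
    return [part for part in re.split(r"(?<=[.!?])\s+", text) if part]

if __name__ == "__main__":
    reply = "## Done\n\nI **fixed** the `parse()` bug. See [docs](http://x.y)!"
    for sentence in chunk_sentences(strip_markdown(reply)):
        print(sentence)
```

Sending one sentence at a time keeps each TTS request short, so playback can start sooner and an interruption wastes less synthesized audio.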

How it works

┌──────────┐   energy VAD    ┌─────────┐   HTTP    ┌─────────┐
│   mic    ├────────────────►│ listener├──────────►│ whisper │
└──────────┘                 └────┬────┘           └─────────┘
                                  │ stdout
                                  ▼
                             ┌─────────┐
                             │ Claude  │
                             └────┬────┘
                                  │ Stop hook
                                  ▼
┌──────────┐   PyAudio       ┌─────────┐   HTTP    ┌─────────┐
│ speakers │◄────────────────│  speak  │◄──────────│ kokoro  │
└──────────┘                 └─────────┘           └─────────┘

listener.py runs persistently via Monitor; does energy-based VAD, sends utterances to Whisper, streams transcriptions to Claude.
speak.py runs as a Stop hook; strips markdown, chunks by sentence, streams WAV from Kokoro, plays via PyAudio.
Barge-in: listener detects sustained loud audio, sends SIGTERM to the TTS process group via a PID file.
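
The barge-in step can be sketched roughly as follows. This is a minimal illustration, not the code in listener.py: the function names and the PID file path are hypothetical, and it assumes the TTS process runs in its own process group (otherwise the signal would also reach the listener).

```python
import os
import signal
from pathlib import Path

def should_barge_in(energies, threshold=300.0, mult=2.0, frames=6):
    """Return True once `frames` consecutive energies exceed threshold * mult."""
    bar = threshold * mult
    run = 0
    for energy in energies:
        run = run + 1 if energy > bar else 0   # any quiet frame resets the count
        if run >= frames:
            return True
    return False

def stop_tts(pid_file: Path) -> bool:
    """Send SIGTERM to the process group of the PID recorded in pid_file."""
    try:
        pid = int(pid_file.read_text().strip())
        os.killpg(os.getpgid(pid), signal.SIGTERM)
        return True
    except (FileNotFoundError, ValueError, ProcessLookupError):
        return False   # nothing playing, or a stale PID file

if __name__ == "__main__":
    # Hypothetical location; the real plugin may store its PID file elsewhere.
    runtime_dir = Path(os.environ.get("XDG_RUNTIME_DIR", "/tmp"))
    if should_barge_in([700.0] * 6):
        stop_tts(runtime_dir / "converse-tts.pid")
```

Requiring several consecutive loud frames, at a bar higher than the normal speech threshold, is what keeps a cough or the TTS audio leaking into the mic from cutting playback off.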

Customization

Everything below is configurable via environment variables:

Variable	Default	Purpose
`VAD_THRESHOLD`	300	RMS energy bar for "is this speech?"
`VAD_SPEECH_START_FRAMES`	3	Consecutive speech frames before an utterance starts
`VAD_CHUNK_SILENCE_FRAMES`	20	Short silence triggers a statusline update (~600ms)
`VAD_UTTERANCE_END_FRAMES`	50	Long silence ends the utterance (~1500ms)
`VAD_MIN_UTTERANCE_FRAMES`	10	Shorter utterances are discarded (~300ms)
`VAD_BARGE_IN_ENERGY_MULT`	2.0	Barge-in energy bar = `VAD_THRESHOLD` × this
`VAD_BARGE_IN_FRAMES`	6	Consecutive loud frames required for barge-in
`VAD_PRE_BUFFER_FRAMES`	10	Frames kept before speech trigger to avoid clipping
`KOKORO_URL`	`http://localhost:8880/v1/audio/speech`	TTS endpoint
`KOKORO_VOICE`	`af_heart`	Kokoro voice name
`KOKORO_MODEL`	`kokoro`	Kokoro model name
`WHISPER_URL`	`http://localhost:2022/v1/audio/transcriptions`	STT endpoint
`WHISPER_INITIAL_PROMPT`	(empty)	Primes Whisper's decoder with domain vocabulary — a short phrase listing technologies or jargon in play. Dramatically reduces mis-hears on technical terms.
`CONVERSE_STATUS_WINDOW`	30	Seconds of transcription history to show in statusline
`CONVERSE_STATUS_PREFIX`	`🎤`	Leading label for statusline output
