cc-gc-stts

Talk to Claude Code, Gemini CLI, or Antigravity CLI (agy) and hear them talk back. This project adds Speech-to-Text (STT) and Text-to-Speech (TTS) capabilities to these tools through a Model Context Protocol (MCP) server.
📰 Read the story: True voice mode for Claude Code

✨ Features
- 🎙️ Speech-to-Text (STT): Dictate your prompts instead of typing.
- 🔊 Text-to-Speech (TTS): Hear the model's responses read aloud.
- 🔄 Conversational Loop: Use the /stts command for a continuous voice-driven session.
- 🚀 Persistent Daemon: Fast startup using a reusable Chrome window.
- 🛠️ Cross-Platform: Works with Claude Code, Gemini CLI, and Antigravity CLI.
- 🕘 History: Recall past prompts and responses from a dropdown above each panel, or with Alt+↑ / Alt+↓.
🏗️ How It Works
stts uses a background daemon to manage a persistent Chrome/Chromium window:
- MCP Server: Exposes stt and tts tools to the AI model. Talks to the daemon over plain HTTP: one short request per call, with no polling and no per-call subprocess spawn.
- Daemon: A local HTTP + WebSocket server on port 15986 that controls a Chrome instance in "app mode". It stores its profile under $TMPDIR/cc-gc-stts-user-data-dir.
- Browser UI ↔ Daemon: A single persistent WebSocket at /ws carries every per-turn message. The daemon pushes a request frame the moment the model calls stt or tts; the page pushes back complete / cancel / close when the user is done.
- Browser UI: Uses the native Web Speech API for recognition and synthesis, so it costs nothing to use. Note that on Linux, Chrome routes recognition audio through Google's servers, so this is not a fully offline pipeline.
- Smart Auto-Advance: In the /stts voice loop, if you simply listen through the response without touching anything, the loop advances automatically the moment speech ends. Only if you press Stop or Play (or say "stop it" / "play it") does the page wait for a manual Got it! so you stay in control of replays.
- Automatic Lifecycle: The daemon starts on demand and shuts down when the Chrome window is closed.
- Port-collision aware: If port 15986 is held by a non-stts process, the launcher fails fast with a clear error instead of timing out.
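
The Smart Auto-Advance rule described above can be sketched as a small state function. This is an illustrative TypeScript sketch, not the project's actual code: the event names, the PlaybackState shape, and the frame objects are assumptions made for the example. Only the complete / cancel / close message names come from the description above.

```typescript
// Hypothetical names throughout; this README does not document the real frame fields.
type PageFrame = { type: "complete" } | { type: "cancel" } | { type: "close" };
type PlaybackEvent = "speech-end" | "stop" | "play" | "got-it";

interface PlaybackState {
  // True once the user pressed Stop or Play (or said "stop it" / "play it").
  manualControlUsed: boolean;
}

function initialPlaybackState(): PlaybackState {
  return { manualControlUsed: false };
}

// Returns the next state and the frame to push over the WebSocket, if any.
function onPlaybackEvent(
  state: PlaybackState,
  event: PlaybackEvent,
): { state: PlaybackState; frame: PageFrame | null } {
  switch (event) {
    case "stop":
    case "play":
      // Manual control: from now on, wait for an explicit Got it!
      return { state: { manualControlUsed: true }, frame: null };
    case "speech-end":
      // Auto-advance only if the user never touched Stop or Play.
      return state.manualControlUsed
        ? { state, frame: null }
        : { state, frame: { type: "complete" } };
    case "got-it":
      return { state, frame: { type: "complete" } };
  }
}

// Replays a sequence of events and returns the frame produced by the last one.
function runEvents(events: PlaybackEvent[]): PageFrame | null {
  let state = initialPlaybackState();
  let last: PageFrame | null = null;
  for (const event of events) {
    const result = onPlaybackEvent(state, event);
    state = result.state;
    last = result.frame;
  }
  return last;
}
```

In this sketch, runEvents(["speech-end"]) yields a complete frame, while runEvents(["stop", "speech-end"]) yields no frame until a "got-it" event arrives. Keeping the decision in a pure function makes the rule easy to test without a browser.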
🚀 Quick Start
1. Build the project
    npm install
    npm run build
2. Install
Claude Code
    claude plugins marketplace add https://github.com/sandipchitale/cc-gc-stts.git
    claude plugin install stts
Gemini CLI
    gemini extensions install --consent https://github.com/sandipchitale/cc-gc-stts.git
Antigravity CLI
    agy plugin install --consent https://github.com/sandipchitale/cc-gc-stts.git
⌨️ Usage
Conversational Loop
Run the voice-driven loop where you speak, the model processes, and the response is read back to you:
- Claude Code: /stts
- Gemini CLI: /stts
- Antigravity CLI: /stts
Direct Tool Usage
You can also ask the model to "use the stt tool" or "speak this using tts" directly in your prompts.
🗣️ Voice Commands & Shortcuts
Both STT and TTS modes support voice-activated commands for a hands-free experience.
Popular Commands
| Command | Action |
|---|---|
| send prompt | Submits your dictated text |
| cancel prompt | Aborts the current recording |
| new paragraph | Inserts a line break |
| got it | (TTS mode) Acknowledges the response and continues. Only required if you used Stop or Play during playback; otherwise the loop auto-advances |
| stop it | (TTS mode) Stops the current playback (after this, Got it! is required to advance) |
| play it | (TTS mode) Replays the response (after this, Got it! is required to advance) |
Note: Many more punctuation and formatting commands are supported (e.g., insert comma, select all, undo it). Toggle the side panel to see the full list.
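
One way dictation commands like these could be recognised is by matching phrases in the recognised transcript. The TypeScript sketch below is hypothetical and covers only send prompt, cancel prompt, and new paragraph: the function name, the result shape, and the matching rules (commands at the end of the transcript, case-insensitive) are assumptions, and the project's real matcher may work differently.

```typescript
// Hypothetical matcher; only three dictation commands from the table are modelled.
type DictationResult =
  | { action: "send"; text: string }
  | { action: "cancel" }
  | { action: "continue"; text: string };

function applyVoiceCommands(transcript: string): DictationResult {
  // Replace every spoken "new paragraph" (and surrounding spaces) with a line break.
  const withBreaks = transcript.replace(/\s*\bnew paragraph\b\s*/gi, "\n");
  const trimmed = withBreaks.trim();

  // Assumption: a command only counts when it ends the transcript.
  if (/\bcancel prompt$/i.test(trimmed)) {
    return { action: "cancel" };
  }

  const send = trimmed.match(/^([\s\S]*?)\s*\bsend prompt$/i);
  if (send) {
    // Everything before the command is the prompt text to submit.
    return { action: "send", text: send[1] ?? "" };
  }

  // No terminating command yet: keep dictating.
  return { action: "continue", text: withBreaks };
}
```

For example, applyVoiceCommands("first line new paragraph second line send prompt") returns a send action whose text is the two lines separated by a line break.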
Keyboard Shortcuts:
- Ctrl+R: Toggle recording/playback side panel.
- Enter: Send prompt (Talk side).
- Escape: Stop recording or close the commands panel.
- Alt+↑ / Alt+↓: Cycle through prompt or response history when the textarea is focused.

🕘 Prompt & Response History
Each panel has a History bar above its textarea: