stfu
Near-silent execution mode for coding agents.
Inspired by caveman, but aimed at a different problem: not funny compression, operational silence.
stfu is for users who do not want narration, restated requests, status theater, or filler. The agent should do the work and report only externally useful state.
Core idea: portable.
- the skill text itself should work in any agent that supports user-installable skills or prompt modules
- command toggles like
/stfu on and /stfu off are agent-specific adapters
- this repo ships adapters for Claude Code and Codex-style plugin layouts
- explicit invocation like
$stfu should apply the rules immediately without an activation speech
Commands
$stfu - portable explicit invocation where the agent supports direct skill calls
/stfu on - enable stfu mode for the current session
/stfu off - disable stfu mode for the current session
Install
stfu.skill - packaged single-skill archive for skill installers
dist/stfu.skill - release copy of the portable archive
dist/stfu-claude-plugin.zip - Claude Code plugin bundle
dist/stfu-codex-plugin.zip - Codex plugin bundle
.claude-plugin/ - Claude Code plugin metadata and hooks
plugins/stfu/ - Codex-style plugin layout
stfu/SKILL.md - raw portable skill source for any other agent
Release
Requirements:
python3
zip
pip install -r requirements.txt
Build all distributable artifacts:
./build-release.sh
This rebuilds:
stfu.skill
dist/stfu.skill
dist/stfu-claude-plugin.zip
dist/stfu-codex-plugin.zip
License
MIT. See LICENSE.txt.
Session scope is binary:
- new session = off
/stfu on = on until /stfu off
/stfu off = back to normal
Contract
When stfu is on, the agent should remove:
- greetings
- thanks
- request restatements
- activation chatter
- obvious progress narration
- unasked-for summaries
- next-step menus the user did not ask for
- filler like "let me check", "I am going to", "here is a quick update"
When stfu is on, the agent must still show:
- changed files or produced outputs
- test/build/lint status when relevant
- URLs, branches, commits, or PRs when relevant
- blockers
- one short question when a choice is required
- irreversible or security risk warnings
Silence is not opacity. If the agent says less, it must show more evidence.
Host limitation: the skill can compress assistant-authored natural language. It cannot guarantee silence from the host UI itself, including tool logs or status panes.
Examples
Normal:
Sure - I updated the auth flow and ran the tests for you. The fix was in src/auth.ts and I also touched tests/auth.test.ts. Everything passed.
stfu:
Changed: `src/auth.ts`, `tests/auth.test.ts`
Tests: `pnpm test auth` OK
Blocked:
Need one choice: `postgres` or `sqlite`?
Repo overview:
Terminal-first coding agent with a runtime/adapter split.
- `apps/cli`: terminal UI
- `apps/native`: runtime core
- `apps/api`: server/auth/billing layer
- `apps/web`: site/docs
Destructive action:
Warning: dropping `users` is irreversible.
Confirm before running it.
Benchmarks
Latest local run:
- CLI:
codex-cli 0.118.0
- Model:
gpt-5.4
- Scenarios:
9
- Method:
baseline vs terse control vs stfu
| Arm | Output tokens | Savings vs baseline | Savings vs terse | Utility checks |
|---|
| baseline | 587 | 0% | -27% | 28/31 |
| terse | 464 | 21% | 0% | 29/31 |
| stfu | 292 | 50% | 37% | 31/31 |
Takeaway:
- plain terse prompting saved
21%
stfu saved 50% vs baseline
stfu saved 37% on top of the terse control
- current heuristic utility checks passed
31/31 without trading the output down to useless one-word replies
Reproduce:
STFU_EVAL_CLI=codex python evals/llm_run.py
uv run --with tiktoken python evals/measure.py
Detailed breakdown:
evals/snapshots/benchmark.md
Repo layout
stfu/SKILL.md - source of truth
skills/stfu/SKILL.md - skill mirror for packaging
plugins/stfu/skills/stfu/SKILL.md - plugin mirror for Codex-style packaging
.claude-plugin/plugin.json - Claude Code plugin manifest
hooks/ - session-only mode tracking for /stfu on and /stfu off
evals/ - benchmark harness for output length and minimum utility checks
Evals
The eval harness keeps the same honest control as caveman:
__baseline__ - no system prompt
__terse__ - Answer concisely.
stfu - Answer concisely. plus the stfu skill
Unlike caveman, token count alone is not enough. stfu can only win if it stays short without hiding the useful result. The harness also checks for: