watchmen
Watchmen turns every coding session into the skill bundles you’d never sit down and write yourself.
Install ·
What it does ·
Mission control ·
How it works ·
Cost & privacy
Reusable skills and workspace briefs, built from what you actually do, carried
across Claude Code, Codex, and pi.dev. You install it once and never change how
you work.
The manual fix is writing a CLAUDE.md or AGENTS.md by hand, but that's you
doing the learning on behalf of the agent. That's backwards.
watchmen sits behind Claude Code, Codex, and pi.dev. It silently
watches your sessions, mines what you actually do, and writes skill bundles +
workspace briefs (CLAUDE.md / AGENTS.md) so your next session is smarter
than your last. Same skills follow you across agents — switch between Claude
Code and Codex on the same repo and they pick up where the other left off.
Why it matters
- Fewer tokens burned re-explaining yourself. Your agent stops
rediscovering the same procedures every session. The skill is already on
disk, so context that used to be spent re-deriving it isn't.
- Fewer tool errors per session. watchmen's own impact card tracks median
tool errors before and after a project's skills land, on a 16-week curve.
- Unified context layer across every agent. Switch from Claude Code to
Codex in the middle of development and the skills you need will follow. No
need to re-onboard the new agent.
Local storage, cross-agent, continuous.
watchmen stores transcripts, metrics, analyses, and generated bundles on your
machine. Analysis runs send selected session excerpts to OpenRouter using your
API key; nothing is uploaded outside those explicit LLM calls.
What watchmen actually does
While you work, it:
- 🤖 Captures sessions from every agent — Claude Code (
~/.claude/projects/), Codex (~/.codex/sessions/), pi.dev. One corpus across tools.
- 📚 Auto-curates skills — recurring procedures get turned into runnable skill bundles your agent can call:
SKILL.md + scripts + references.
- ✍️ Auto-writes CLAUDE.md + AGENTS.md — workspace brief, identical content for both, refreshed continuously.
- 📈 Surfaces what's working — mission control web UI, per-project impact tracking, friction signals, action queue.
- 💡 Retrospective skill hints — when you could've used an existing skill, the next statusLine update tells you. Never modifies your agent's context. Never blocks you.
You install it. It runs. Your agents get better every day you use them.
The CLI
watchmen init
Six steps. Most run in seconds; the LLM passes (analyze + curate) are the only slow ones — you see the cost estimate before they run. Stop at any confirmation gate; nothing partial is left behind.
Mission control
A local web dashboard at http://127.0.0.1:8979 — no hosted account or remote
dashboard. Top-of-page tells you:
- Skill calls this week vs last week — are your curated skills being invoked?
- Tool errors per session — is friction going up or down?
- Active repos — what's getting work this week?
- Skill leaderboard — which repo's skills are firing most
- Status tiles — traffic-light health per project (healthy / stale / uncurated)
- Next actions — ranked queue, e.g. "kai-bench has 28 prompts to analyze · Run"
Per-project impact
Drill into any tracked repo and you get a before/after view scoped to that project. 16-week chart of tool errors per session with a dashed annotation at the date the curator first landed. Pre/post comparison table: sessions, median tool errors, median prompts, median cost. Honest empty states when there isn't enough post-treatment data yet — never silently disappears.
Subtitle reads "Correlation only — not a controlled experiment." We don't oversell the signal.
Three themes
Light comic-pulp newsprint by default. Doomsday noir mode for the dark-mode crowd. Rorschach sepia-typewriter for diary-mode fans. Switch instantly at /settings — picker persists per browser via localStorage, no reload.
Dashboard + impact-card screenshots ship with v0.6 — generated against a mock corpus so no real project data leaks into the docs.
Install
Three commands and a wizard. Total wall time ~10 min + 30–90 min per project of LLM passes.
git clone https://github.com/firstbatchxyz/watchmen.git
cd watchmen
uv sync && uv tool install --editable .
watchmen init