From yao-skills
Use when session context exceeds ~100k tokens, when switching between unrelated tasks, before starting long-running loops (ralph, autopilot, /loop), when usage costs spike, or when user mentions "context", "compact", "clear", "handover", "session太長", or "usage太貴". Symptoms include slow responses, accumulated tool-result bloat, 4hr+ sessions, and subagent-heavy workflows.
How this skill is triggered — by the user, by Claude, or both
Slash command
/yao-skills:context-hygieneThe summary Claude sees in its skill listing — used to decide when to auto-load this skill
Session context grows monotonically — every tool call, file read, and response adds tokens. Even with prompt caching, every turn at high context pays real cost: cached input is **0.1x base (not zero)**, output is **never cached** and costs **5x input**. A 180k-token session costs ~1.8x more per turn than a 30k-token session even with perfect cache hits, and that compounds over a 50-turn session.
Session context grows monotonically — every tool call, file read, and response adds tokens. Even with prompt caching, every turn at high context pays real cost: cached input is 0.1x base (not zero), output is never cached and costs 5x input. A 180k-token session costs ~1.8x more per turn than a 30k-token session even with perfect cache hits, and that compounds over a 50-turn session.
The fix is not "cache harder" — it's reset context at the right moments via /compact (mid-task) or handover-doc + /clear (task boundary).
digraph context_decision {
"Context size?" [shape=diamond];
"Task boundary?" [shape=diamond];
"Continue" [shape=box];
"/compact with focus" [shape=box];
"Handover doc + /clear" [shape=box];
"Context size?" -> "Continue" [label="< 80k"];
"Context size?" -> "Task boundary?" [label="80-150k"];
"Context size?" -> "Handover doc + /clear" [label="> 150k"];
"Task boundary?" -> "Handover doc + /clear" [label="yes"];
"Task boundary?" -> "/compact with focus" [label="no, same task"];
}
Triggers:
/context output)/loop)Skip when:
Common (wrong) belief: "Long context with cache hits is cheap because cache reads are 0.1x."
Why it's wrong:
Per-turn cost (input-token-equivalents, assuming 3k output per turn):
| Context | Cached input cost | Output cost | Per-turn total |
|---|---|---|---|
| 30k + 1k new | 30k × 0.1 = 3k | 3k × 5 = 15k | ~19k |
| 90k + 1k new | 90k × 0.1 = 9k | 3k × 5 = 15k | ~25k |
| 180k + 1k new | 180k × 0.1 = 18k | 3k × 5 = 15k | ~34k |
180k vs 30k = 1.8x per turn. Over 50 turns: ~750k extra tokens, even with perfect cache hits.
Compact from 180k → 40k drops per-turn cost from ~34k to ~21k = 38% reduction every turn thereafter.
Use when you want to keep working on the same thing but drop accumulated junk (file dumps, old tool outputs, exploration).
Always provide a focus prompt — without it, the model picks what to drop and often drops the wrong thing:
/compact focus on the current plan, files touched, code paths under change, and open TODOs. drop full file contents, exploratory tool outputs, and resolved sub-questions.
What /compact preserves: high-level summary of the conversation, your stated plan, recent decisions.
What it drops: full file contents, intermediate tool result bodies, abandoned threads.
Typical reduction: 180k → 30-50k. Resume from where you were.
When switching tasks or session is too far gone, write a handover doc to disk BEFORE /clear. The doc becomes the new session's seed context.
Where to write:
docs/HANDOVER.md (committed, survives across sessions).omc/notepad.md (OMC-managed scratch)docs/HANDOVER_<branch>.mdNever write the handover doc only in the conversation — that's exactly the context you're about to clear.
# HANDOVER — 2026-05-19 — <branch / topic>
## Goal
<one sentence: what we're trying to accomplish>
## Status
- [x] Done: <list>
- [~] In progress: <what + file:line>
- [ ] Next: <list>
## Key files / functions
- `path/to/file.py:42` — <why this matters>
- `path/to/test.py:88` — <failing test name>
## Open questions / decisions pending
- <question> — <current leaning>
## Watch out for
- <gotcha>, <prior failure mode>, <constraint from spec/ADR>
## Last commit
<sha + subject>
## How to resume
1. Read this doc
2. <next concrete action>
After /clear, the first thing the new session does is read this file.
Long-running loops are the 8+ hour sessions that quietly eat budget. Three mitigations:
.omc/state/ or docs/PROGRESS.md. The loop reads the checkpoint on restart, not conversation history./clear, restart from checkpoint. Loops should be designed to resume from disk, not memory.codex, search/long-scan → gemini-cli, simple/cheap work → secondary LLM endpoint.Loop session that's been running 6 hours and "feels fine" is almost certainly accumulating. Check /context and usage.
Subagent-heavy sessions are a separate cost driver — each subagent runs its own LLM calls in its own context. Mitigations:
For one-off subagent work (not full PR workflows — see research-toolkit:workflow-routing for that):
| Task shape | Route to | Why |
|---|---|---|
| Architecture / hard trade-off / SPEC-level | Claude Opus | Heavy reasoning, deep context |
| Routine implementation / refactor / well-spec'd code | Claude Sonnet | ~5x cheaper than Opus, fast, clean impl |
| Complex implementation / algorithmic / multi-file | codex (codex:codex-rescue) | gpt-5.5 high-effort; OAuth-free if ChatGPT Plus quota available |
| Search / long-context scan / retrieval across many files | gemini-cli | Long context window, cheap, good at "find references / summarize across N files" |
| Simple / trivial / classification / cheap bulk work | Secondary LLM endpoint (e.g. a cheaper provider via CLAUDE_CONFIG_DIR) | $0 or near-$0 marginal; saves Claude quota for triage-worthy work |
Default split when in doubt: Sonnet for impl, Opus only for architecture decisions. Don't reflex-pick Opus for every subagent — that's where the 30%-from-subagents cost lives.
For PR-scale routing (plan / implement / review chains across A / B / C / D / Mini workflows), use research-toolkit:workflow-routing instead — it has the full matrix, elasticity gates, and parallelism hygiene rules.
See also superpowers:dispatching-parallel-agents for when subagents pay off vs inline.
| Situation | Action |
|---|---|
/context > 100k, same task, mid-flow | /compact <focus prompt> |
| Switching to a new feature/area | Write handover → /clear → re-read handover |
| Session > 4 hours OR > 150k tokens | Same as above, no exceptions |
Starting ralph / autopilot / /loop | Fresh session — never inherit conversation |
| Loop running > 4 hours | Checkpoint to disk → kill → restart fresh |
"Should I /clear now?" | If asking, yes — but write handover first |
| User says "session太長" / "usage太貴" | This skill applies; offer compact or handover |
/clear/context and usageresearch-toolkit:workflow-routing — PR-scale plan/implement/review routing (A/B/C/D/Mini matrix + elasticity gates)superpowers:writing-plans — plan files (separate concern, but a plan doc is often the seed of a handover)superpowers:dispatching-parallel-agents — when subagents are worth the costcodex:codex-rescue — delegating to codex (gpt-5.5) for complex implnpx claudepluginhub solitude6060/yao-skills --plugin yao-skillsGuides creation, editing, and verification of skills for AI coding agents using test-driven development with subagent scenarios. Use when authoring or debugging skills.