From adk-sessions-memory
Use this skill to enable ADK 2.0 context caching and compression so long-running agent sessions don't blow the model's context window or burn tokens on repeated prefixes. Triggers on: "ADK context cache", "compress ADK history", "ADK long conversation memory", "prevent context overflow ADK", "ADK token budget", "summarize old turns ADK", "ADK conversation compaction". Generates configuration for cached prefixes and a compression callback that summarizes old events.
Slash command: /adk-sessions-memory:context-cache-compress
Long sessions hit the model's context window. ADK 2.0 supports two mitigations: prompt caching (Gemini-side) and event compression (ADK-side).
Cache long static prefixes (system prompt + few-shot examples) so subsequent calls reuse them at lower cost/latency.
from google.adk.agents import LlmAgent

# Large static prefix (~50KB of few-shot examples), read once at import time.
with open("./few_shot_examples.md", encoding="utf-8") as f:
    LARGE_INSTRUCTION = f.read()

root_agent = LlmAgent(
    name="cached_agent",
    model="gemini-2.5-flash",
    instruction=LARGE_INSTRUCTION,
    cache_config={
        "cache_instruction": True,
        "cache_ttl_seconds": 3600,
    },
)
ADK creates an explicit Vertex cache resource and reuses it across invocations.
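Caching a prefix only pays off while that prefix stays identical between calls. A minimal sketch of one way to notice accidental drift in the instruction text, using only the standard library; the helper names `prefix_fingerprint` and `prefix_changed` are hypothetical and not part of ADK:

```python
import hashlib


def prefix_fingerprint(instruction: str) -> str:
    """Return a short, stable digest of the prefix text (hypothetical helper)."""
    return hashlib.sha256(instruction.encode("utf-8")).hexdigest()[:16]


def prefix_changed(previous: str, instruction: str) -> bool:
    """True when the instruction no longer matches the recorded fingerprint."""
    return prefix_fingerprint(instruction) != previous
```

Recording the fingerprint at deploy time and comparing it at startup is one way to catch a templated timestamp or a reordered example that would otherwise quietly change the prefix.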
Summarize old events when total tokens exceed a threshold:
from google.adk.callbacks import on_before_model_call

@on_before_model_call
async def compress_history(ctx, request):
    if ctx.session.token_count > 100_000:
        # Drop oldest 50% of events, replace with a summary event
        old = ctx.session.events[: len(ctx.session.events) // 2]
        summary = await summarize_events(old)
        ctx.session.events = [summary, *ctx.session.events[len(old):]]
    return request
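The callback above relies on a `summarize_events` helper that this skill does not define. A minimal sketch of one possible shape, assuming events expose `author` and `text` attributes (the `Event` class here is a hypothetical stand-in, not the ADK event type) and that the actual summarization is delegated to an injected async function, which in practice would likely be a model call:

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable, Sequence


@dataclass
class Event:
    """Hypothetical stand-in for a session event; not the ADK class."""
    author: str
    text: str


async def truncating_summarizer(transcript: str) -> str:
    """Placeholder summarizer: keeps the first 200 characters.

    A real implementation would call a model here instead.
    """
    return transcript[:200]


async def summarize_events(
    events: Sequence[Event],
    summarizer: Callable[[str], Awaitable[str]] = truncating_summarizer,
) -> Event:
    """Collapse a run of events into one synthetic summary event."""
    transcript = "\n".join(f"{e.author}: {e.text}" for e in events)
    summary = await summarizer(transcript)
    return Event(
        author="system",
        text=f"[Summary of {len(events)} earlier events] {summary}",
    )


if __name__ == "__main__":
    history = [Event("user", "hi"), Event("assistant", "hello")]
    print(asyncio.run(summarize_events(history)).text)
```

Injecting the summarizer keeps the helper testable without a model and makes it easy to swap in a cheaper model for compaction than the one serving the conversation.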
Cap to last N turns:
@on_before_model_call
async def sliding_window(ctx, request):
    MAX_TURNS = 20
    if len(ctx.session.events) > MAX_TURNS * 2:  # user+assistant pairs
        ctx.session.events = ctx.session.events[-MAX_TURNS * 2:]
    return request
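The `MAX_TURNS * 2` cut assumes every turn is exactly one user event plus one assistant event. If a turn can produce more events than that (tool calls, for example), a fixed slice may start in the middle of a turn. A sketch of an alternative that cuts on turn boundaries instead, assuming each user message opens a turn; the `(author, text)` tuple is a hypothetical stand-in for a session event, and `last_n_turns` is not an ADK function:

```python
from typing import List, Tuple

Event = Tuple[str, str]  # (author, text); hypothetical stand-in for a session event


def last_n_turns(events: List[Event], max_turns: int) -> List[Event]:
    """Keep the events belonging to the most recent `max_turns` user turns."""
    if max_turns <= 0:
        return []
    # Indices where a new turn begins (each user message opens a turn).
    starts = [i for i, (author, _) in enumerate(events) if author == "user"]
    if len(starts) <= max_turns:
        return list(events)  # already within budget
    return events[starts[-max_turns]:]
```

With three turns of uneven length, `last_n_turns(events, 2)` returns everything from the second user message onward, so the kept window always begins with a user message.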
Keep recent verbatim, summarize middle, archive oldest:
@on_before_model_call
async def hierarchical(ctx, request):
    events = ctx.session.events
    if len(events) > 60:
        recent = events[-20:]
        middle_summary = await summarize_events(events[-60:-20])
        archive_summary = ctx.session.state.get("archive_summary", "")
        new_archive = await summarize_events(events[:-60])
        ctx.session.state["archive_summary"] = archive_summary + "\n" + new_archive
        ctx.session.events = [middle_summary, *recent]
    return request
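The three tiers are easier to check as a pure function over plain lists. The callback above uses the result of `summarize_events` both as an event (in the event list) and as a string (appended to the archive); this sketch sidesteps that by treating events as plain strings. All names here (`compact`, `count_summarizer`, the `recent_n` and `window_n` parameters) are hypothetical, and the placeholder summarizer only reports how many events it received:

```python
from typing import Callable, List, Tuple

Summarizer = Callable[[List[str]], str]


def count_summarizer(texts: List[str]) -> str:
    """Placeholder summarizer; a real one would call a model."""
    return f"<{len(texts)} events summarized>"


def compact(
    events: List[str],
    archive: str,
    summarize: Summarizer = count_summarizer,
    recent_n: int = 20,
    window_n: int = 60,
) -> Tuple[List[str], str]:
    """Return (new_events, new_archive) using three tiers."""
    if len(events) <= window_n:
        return events, archive  # nothing to compact yet

    recent = events[-recent_n:]            # tier 1: kept verbatim
    middle = events[-window_n:-recent_n]   # tier 2: replaced by one summary
    oldest = events[:-window_n]            # tier 3: folded into the archive

    # Skip empty parts so a fresh archive does not start with a blank line.
    new_archive = "\n".join(p for p in (archive, summarize(oldest)) if p)
    return [summarize(middle), *recent], new_archive


if __name__ == "__main__":
    events = [f"event {i}" for i in range(75)]
    new_events, archive = compact(events, archive="")
    print(len(new_events), archive)
```

After one pass the list holds one summary plus the 20 most recent events, so the function is a no-op until the history grows past `window_n` again.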
ctx.session.token_count)
session-rewind-checkpoint if you need to revert compressions
npx claudepluginhub healthcare-ai-consulting-llc/adk-2-toolkit --plugin adk-sessions-memory