From qa-cache-testing
Pure-reference catalog of cache-stampede (thundering-herd) phenomena and mitigations. Defines the stampede (massively-parallel cache misses on key expiry trigger simultaneous recomputation, often congestion-collapse) and the three canonical mitigation families: locking (one writer recomputes, others wait or serve stale), external recomputation (a separate process refreshes on schedule or near-expiry), and probabilistic early expiration via XFetch (each requester independently decides to refresh with probability rising toward expiry; uses formula `(time() - delta * beta * log(rand(0,1))) >= expiry`). Use when designing cache-refresh strategy or diagnosing a known stampede incident. Composes cache-coherence-patterns-reference + stale-while-revalidate-reference.
How this skill is triggered — by the user, by Claude, or both
Slash command
/qa-cache-testing:cache-stampede-referenceThe summary Claude sees in its skill listing — used to decide when to auto-load this skill
A cache stampede (also "dog-piling," "thundering herd at cache
A cache stampede (also "dog-piling," "thundering herd at cache miss") occurs when a cached value expires under high load - many requesters simultaneously detect the miss, all recompute, all write back. Per en.wikipedia.org/wiki/Cache_stampede: "massively parallel computing systems with caching mechanisms come under a very high load" and "multiple threads of execution will all attempt to render the content of that page simultaneously." The pathological state: "congestion collapse, preventing the resource from being recached and maintaining zero cache hit rates."
This skill is a pure reference consumed by per-tier test skills.
| Signal | Interpretation |
|---|---|
| DB or upstream service latency spikes at cache-key TTL boundaries | Stampede on key expiry |
| Cache hit rate drops near zero, recovers slowly | Stampede causing congestion collapse |
| Periodic load spikes synchronised with cron / scheduled jobs | Multiple processes invalidating + recomputing |
| Recompute-cost-vs-traffic ratio > 0.1 | Hot key - stampede risk |
Per Wikipedia's cache-stampede article:
Upon cache miss, processes attempt to acquire a lock for that key. Only the lock holder recomputes; others either wait, return "not found," or use a stale value.
def get(key):
val = cache.get(key)
if val is not None and not val.stale:
return val
if cache.acquire_lock(key, ttl=30):
try:
val = recompute(key)
cache.set(key, val, ttl=300)
return val
finally:
cache.release_lock(key)
else:
# Another process is recomputing; serve stale or wait
return val or wait_then_get(key)
Drawbacks per Wikipedia: "complex implementation handling edge cases like process failures and race conditions." Lock holder crashing → cache empty for the lock TTL.
Mitigation: short-TTL locks with periodic refresh while recomputing.
A separate process recomputes the cache periodically or near expiry, decoupled from the request path. Per Wikipedia: "triggered when values approach expiration, periodically, or on cache miss."
# Cron / scheduled job
def refresh_hot_keys():
for key in HOT_KEYS:
val = recompute(key)
cache.set(key, val, ttl=600)
When it fits: static cache keys ("homepage data," "top-10 products"). Hot keys are knowable in advance. The recompute schedule overlaps the cache TTL.
Drawback: doesn't help with unknown / user-specific hot keys; needs separate infrastructure.
Each requester independently decides - with rising probability as the value ages - to refresh before formal expiry. Per Wikipedia, the canonical formula:
if (!value || (time() - delta * beta * log(rand(0,1))) >= expiry)
recompute_and_cache(key)
else
return value
Where:
| Variable | Meaning |
|---|---|
delta | Time to recompute the value (scales the probability distribution) |
beta | Tuning parameter (default 1; >1 favours earlier refresh) |
log(rand(0,1)) | Always negative; magnitude controls the early-refresh probability |
time() | Wall-clock or monotonic time |
expiry | Absolute expiry time stored alongside the value |
The "exponential distribution" of refresh decisions means most requesters use the cached value; only a few do early refresh. Per Wikipedia: "setting beta=1 works well in practice."
Implementation:
import math, random, time
def get_xfetch(key):
entry = cache.get(key) # contains {value, expiry, delta}
if not entry:
val, delta = measure_recompute(key)
expiry = time.time() + 300
cache.set(key, {"value": val, "expiry": expiry, "delta": delta}, ttl=300)
return val
now = time.time()
rand = max(random.random(), 1e-10)
if now - entry["delta"] * 1.0 * math.log(rand) >= entry["expiry"]:
# Early refresh
val, delta = measure_recompute(key)
expiry = now + 300
cache.set(key, {"value": val, "expiry": expiry, "delta": delta}, ttl=300)
return val
return entry["value"]
The delta (recompute cost) is measured during refresh and
stored. Expensive-to-recompute values get earlier refresh
attempts.
The strongest setups combine:
| Layer | Strategy |
|---|---|
| Cache backend | TTL + stale-while-revalidate per stale-while-revalidate-reference |
| App logic | XFetch on read for hot keys |
| Operations | External recompute for known-hot keys |
| Safety net | Distributed lock (Redis SET NX EX) as a backstop |
The point of layering: XFetch handles unknown hot keys gracefully; external recompute handles known hot keys; locks catch the few that slip through.
| Behaviour | Test |
|---|---|
| Lock holds under contention | N concurrent gets on missing key → 1 recompute, N-1 waits/stale |
| XFetch probability rises near expiry | Statistical test: many runs, fraction refreshing before expiry within target band |
| External recompute fires before TTL | E2E: write source-of-truth → wait → assert cache reflects new value |
| Stampede absent under load | Load test: N=1000 concurrent on cold key → upstream sees ≤N recomputes (target: 1-5) |
Refresh-cost (delta) updated on each refresh | Inspect cached metadata |
| Anti-pattern | Why it fails | Fix |
|---|---|---|
| No mitigation at all | Stampede inevitable under traffic | Pick at least one strategy |
| Lock without TTL | Lock holder crash → deadlock | TTL on locks; refresh while recomputing |
XFetch with beta very high (10+) | All requesters refresh constantly | beta=1; tune via load test |
| External recompute without monitoring | Cron job fails silently; stampedes return | Alarm on cache-miss rate spike |
| Cache TTL = stale-while-revalidate window | RFC 5861 SWR window depends on Cache-Control: stale-while-revalidate=N being separate | See stale-while-revalidate-reference |
| Stampede-mitigation tested only under low load | Pass at 10 RPS; fail at 1000 | Test at production-equivalent concurrency |
Hot key with must-revalidate | Forced re-validation = forced stampede on TTL | Use SWR or grace mode |
| Logging the stampede instead of measuring it | Logs swamped during incident; no recovery signal | Metric on cache-miss rate; alarm |
| Trusting client-side retries to "thin" the herd | Retries can amplify | Server-side rate limit on the recompute path |
delta measurement to use
e.g., p95 not mean.cache-coherence-patterns-reference,
stale-while-revalidate-reference.redis-cache-tests,
cdn-cache-purge-tests,
varnish-test-vtc-syntax,
cache-key-collision-detector.npx claudepluginhub testland/qa --plugin qa-cache-testingProvides UI/UX resources: 50+ styles, color palettes, font pairings, guidelines, charts for web/mobile across React, Next.js, Vue, Svelte, Tailwind, React Native, Flutter. Aids planning, building, reviewing interfaces.
Searches MemPalace before answering questions about past work, people, projects, or prior decisions. Returns verbatim stored content instead of guessing from model memory.