From quoth
Hierarchical Thompson sampling with cluster-level posteriors + 10% exploration + SNIPS counterfactual updates. Use when building retrieval/recommendation systems with implicit feedback that need to balance exploitation with exploration at scale (10k+ items).
Slash command
/quoth:contextual-bandits
- Large item catalog (10k+) where per-item Beta(α,β) is infeasible as sole signal
Per-item LinTS stores an O(d²) matrix per arm: at d=1024 with 8-byte floats that is ~8MB per arm, so 100k items ≈ 800GB. Infeasible.
Hierarchical decomposition:
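Memory at 100k items, K=316 clusters (≈ √100k): ~5KB of cluster stats (two 8-byte parameters, α and β, per cluster). K here counts clusters; the K=3 in the algorithm input is the number of slots.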
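The two memory figures above can be checked with a few lines of arithmetic. This is only a sketch: the 8-byte (float64) storage size is an assumption inferred from the quoted totals, and the function names are hypothetical.

```javascript
// Assumption: every number is stored as a float64 (8 bytes).
const BYTES_PER_FLOAT = 8;

// Per-item LinTS: one d x d matrix per arm.
function linTsBytes(d, numItems) {
  return d * d * BYTES_PER_FLOAT * numItems;
}

// Cluster-level Beta posteriors: two parameters (alpha, beta) per cluster.
function clusterBetaBytes(numClusters) {
  return 2 * BYTES_PER_FLOAT * numClusters;
}

console.log((linTsBytes(1024, 100000) / 1e9).toFixed(0) + " GB"); // 839 GB
console.log(clusterBetaBytes(316) + " bytes"); // 5056 bytes
```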
Input: candidates (pre-filtered via HNSW top-N), clusterMap, K=3, queryEmbedding
1. Group candidates by cluster_id
2. For each cluster c: sample s_c ~ Beta(α_c, β_c)
3. Sort clusters by s_c desc
4. From each cluster (top-sampled first), rank items by:
   score = 0.6·cosine(query, item.embedding) + 0.4·(α_i/(α_i+β_i))
5. Take top items until K slots are filled; record each item's cluster-level and within-cluster propensities
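The five steps above could be sketched as below. Several details are assumptions, not taken from the source: the field names (`clusterId`, `embedding`, `alpha`, `beta`), the Beta(1,1) fallback for clusters without stats, the injected `sampleBeta` function, and the reading of step 5 as "exhaust the top-sampled cluster before moving to the next".

```javascript
// Cosine similarity between two equal-length numeric arrays.
function cosine(a, b) {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return na === 0 || nb === 0 ? 0 : dot / Math.sqrt(na * nb);
}

// candidates: [{ id, clusterId, embedding, alpha, beta }] (hypothetical shape)
// clusterStats: Map of clusterId -> { alpha, beta }
// sampleBeta: (alpha, beta) => number in (0, 1), injected by the caller
function selectTopK(candidates, clusterStats, queryEmbedding, K, sampleBeta) {
  // 1. Group candidates by cluster.
  const groups = new Map();
  for (const c of candidates) {
    if (!groups.has(c.clusterId)) groups.set(c.clusterId, []);
    groups.get(c.clusterId).push(c);
  }
  // 2. One Thompson sample per cluster (Beta(1,1) fallback is an assumption).
  const sampled = [...groups.keys()].map((clusterId) => {
    const { alpha, beta } = clusterStats.get(clusterId) ?? { alpha: 1, beta: 1 };
    return { clusterId, s: sampleBeta(alpha, beta) };
  });
  // 3. Highest sample first.
  sampled.sort((x, y) => y.s - x.s);
  // 4-5. Rank within each cluster and fill the K slots.
  const selected = [];
  for (const { clusterId, s } of sampled) {
    const ranked = groups.get(clusterId)
      .map((item) => ({
        item,
        score: 0.6 * cosine(queryEmbedding, item.embedding) +
               0.4 * (item.alpha / (item.alpha + item.beta)),
      }))
      .sort((x, y) => y.score - x.score);
    for (let r = 0; r < ranked.length && selected.length < K; r++) {
      selected.push({
        id: ranked[r].item.id,
        clusterId,
        clusterSample: s,      // needed later for the propensity
        rankWithin: r + 1,     // 1-based rank inside the cluster
        clusterSize: ranked.length,
        score: ranked[r].score,
      });
    }
    if (selected.length >= K) break;
  }
  return selected;
}
```

Passing `sampleBeta` in as a parameter keeps the selection logic testable with a deterministic stand-in, such as the posterior mean `(a, b) => a / (a + b)`.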
The propensity θ_i of each selected item is critical for counterfactual updates (SNIPS):
  θ_i ≈ (s_{c(i)} / Σ_c s_c) × (1 / (rank_within × |cluster|))
  clip θ_i ≥ 0.01 to prevent weight explosion (this caps the SNIPS weight 1/θ_i at 100)
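A minimal sketch of the propensity approximation with its 0.01 floor. The function and parameter names are hypothetical; `sumClusterSamples` is assumed to be the sum of the sampled values over all clusters considered for the query.

```javascript
// Approximate propensity of one selected item, floored to avoid huge weights.
function slotPropensity(clusterSample, sumClusterSamples, rankWithin, clusterSize, floor = 0.01) {
  const clusterShare = clusterSample / sumClusterSamples; // s_c / sum of s
  const withinShare = 1 / (rankWithin * clusterSize);     // 1 / (rank x |cluster|)
  return Math.max(floor, clusterShare * withinShare);
}

// Top item of a 4-item cluster holding 40% of the sampled mass: about 0.1.
console.log(slotPropensity(0.8, 2.0, 1, 4));
// Second item of a 50-item cluster: raw value 0.004 is raised to the floor, 0.01.
console.log(slotPropensity(0.8, 2.0, 2, 50));
```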
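Beta sampling from two gamma draws (sampleGamma uses the Marsaglia-Tsang method):
function sampleBeta(α, β) {
  // If X ~ Gamma(α, 1) and Y ~ Gamma(β, 1), then X / (X + Y) ~ Beta(α, β).
  const g1 = sampleGamma(α), g2 = sampleGamma(β);
  return g1 / (g1 + g2);
}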
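The snippet above relies on a `sampleGamma` helper that is not shown. One possible self-contained version follows, as a sketch: the Box-Muller normal sampler, the `rand` parameter for injecting a random source, and the shape < 1 boosting step are choices made here for illustration, not details confirmed by the source.

```javascript
// Standard normal draw via the Box-Muller transform.
function sampleNormal(rand = Math.random) {
  let u = 0;
  while (u === 0) u = rand(); // avoid log(0)
  const v = rand();
  return Math.sqrt(-2 * Math.log(u)) * Math.cos(2 * Math.PI * v);
}

// Gamma(shape, 1) draw using the Marsaglia-Tsang squeeze method.
function sampleGamma(shape, rand = Math.random) {
  if (shape < 1) {
    // Boost for shape < 1: Gamma(a) = Gamma(a + 1) * U^(1/a).
    let u = 0;
    while (u === 0) u = rand();
    return sampleGamma(shape + 1, rand) * Math.pow(u, 1 / shape);
  }
  const d = shape - 1 / 3;
  const c = 1 / Math.sqrt(9 * d);
  for (;;) {
    let x, v;
    do {
      x = sampleNormal(rand);
      v = 1 + c * x;
    } while (v <= 0);
    v = v * v * v;
    const u = rand();
    // Cheap squeeze test first, then the exact acceptance test.
    if (u < 1 - 0.0331 * x * x * x * x) return d * v;
    if (Math.log(u) < 0.5 * x * x + d * (1 - v + Math.log(v))) return d * v;
  }
}

// Beta(alpha, beta) draw from two independent gamma draws.
function sampleBeta(alpha, beta, rand = Math.random) {
  const g1 = sampleGamma(alpha, rand);
  const g2 = sampleGamma(beta, rand);
  return g1 / (g1 + g2);
}
```

Averaging many draws of `sampleBeta(2, 6)` should land near the analytic mean 2 / (2 + 6) = 0.25, which is a quick sanity check on the sampler.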
Why: without exploration, the system converges on whatever was initially popular. Exploration creates clean counterfactual data for unbiased SNIPS updates.
Mechanism: with probability ε=0.10, replace one of the K=3 ranked slots with a uniformly random candidate from the pool (excluding already-selected).
IF random() < ε:
    available = pool - selected
    slot = random_int(0, K-1)
    replacement = uniform_random_from(available)
    selected[slot] = replacement   # mark is_exploration=true
    propensity = ε / |available|
Why this matters for SNIPS: without exploration, the probability of picking an item outside the exploited set approaches 0, making SNIPS weights (1/θ) unbounded. Exploration guarantees θ_i ≥ ε / pool_size, capping SNIPS weights at pool_size / ε, which at ε=0.10 is roughly 100-1000 for pools of 10-100 candidates.
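The exploration step described above might look like this in JavaScript. It is a sketch under assumptions: items are plain objects with an `id` field, the `isExploration` and `propensity` field names are hypothetical, and `rand` is injectable so the behavior can be tested deterministically.

```javascript
// With probability epsilon, replace one ranked slot with a uniform random
// candidate that is not already selected. Returns a new array.
function injectExploration(selected, pool, epsilon = 0.10, rand = Math.random) {
  const result = selected.map((s) => ({ ...s, isExploration: false }));
  if (rand() >= epsilon) return result; // exploit: keep the ranking as is

  const chosen = new Set(result.map((s) => s.id));
  const available = pool.filter((p) => !chosen.has(p.id));
  if (available.length === 0 || result.length === 0) return result;

  const slot = Math.floor(rand() * result.length);
  const replacement = available[Math.floor(rand() * available.length)];
  result[slot] = {
    ...replacement,
    isExploration: true,
    propensity: epsilon / available.length, // epsilon / |available|
  };
  return result;
}
```

The function copies `selected` instead of mutating it, so the caller can still log the original ranking alongside the explored one.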
At injection time, persist per-slot:
INSERT INTO injection_log (session_id, pattern_id, cluster_id, rank, propensity, is_exploration, query_text, injected_at)
VALUES (?, ?, ?, ?, ?, ?, ?, CURRENT_TIMESTAMP);
Critical for offline SNIPS evaluation — DO NOT drop this log.
Problem: we log injections with propensities θ_i and observe rewards r_i. Naive IPS, (1/N) Σ_i r_i / θ_i, is unbiased, but its variance blows up as any θ_i approaches 0.
SNIPS (Swaminathan & Joachims 2015):
r̂(cluster) = Σ_i (w_i · r_i) / Σ_i w_i where w_i = clip(1/θ_i, cap)
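Self-normalization keeps r̂ within the range of the observed rewards and sharply reduces variance. The estimator is slightly biased but consistent; clipping the weights adds a further small bias in exchange for bounded variance. Production-dominant at Netflix/Spotify.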
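A minimal sketch of the clipped SNIPS estimate for one cluster. The observation shape (`reward`, `propensity`) and the default cap of 100 are assumptions; the cap was chosen to match a propensity floor of 0.01.

```javascript
// Self-normalized IPS with clipped weights.
// observations: [{ reward, propensity }] logged for one cluster.
function snips(observations, cap = 100) {
  let num = 0;
  let den = 0;
  for (const { reward, propensity } of observations) {
    const w = Math.min(1 / propensity, cap); // clipped importance weight
    num += w * reward;
    den += w;
  }
  return den === 0 ? null : num / den; // null when nothing was logged
}

const logged = [
  { reward: 1, propensity: 0.5 },   // w = 2
  { reward: 0, propensity: 0.1 },   // w = 10
  { reward: 1, propensity: 0.001 }, // w = 1000, clipped to 100
];
console.log(snips(logged)); // 102 / 112, about 0.911
```

Because the result is a weighted average of rewards, it always stays between the smallest and largest observed reward, unlike naive IPS.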
Given n observations and SNIPS estimate r̂:
α_new = α_old + n · r̂
β_new = β_old + n · (1 - r̂)
Cap n ≤ 10 per batch to prevent overshoot from correlated samples.
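The posterior update with the batch cap can be sketched as below. The function name and the `{ alpha, beta }` object shape are hypothetical; the cap of 10 comes from the text above.

```javascript
// Convert a SNIPS estimate into Beta pseudo-counts, capping the batch size.
function updateBeta(prior, n, rHat, maxBatch = 10) {
  const nEff = Math.min(n, maxBatch); // cap n to limit overshoot
  return {
    alpha: prior.alpha + nEff * rHat,
    beta: prior.beta + nEff * (1 - rHat),
  };
}

// 25 observations with r-hat = 0.8 count as only 10 pseudo-observations:
// alpha goes from 1 to about 9, beta from 1 to about 3.
console.log(updateBeta({ alpha: 1, beta: 1 }, 25, 0.8));
```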
ESS = (Σw)² / Σw²
If ESS << n, the weights are concentrated (a few observations dominate), so the confidence interval on r̂ is wider than n alone would suggest.
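The ESS formula translates directly to code. A small sketch, with a hypothetical function name, showing how one dominant weight collapses the effective sample size:

```javascript
// Effective sample size of a set of importance weights: (sum w)^2 / sum w^2.
function effectiveSampleSize(weights) {
  let sum = 0;
  let sumSq = 0;
  for (const w of weights) {
    sum += w;
    sumSq += w * w;
  }
  return sumSq === 0 ? 0 : (sum * sum) / sumSq;
}

console.log(effectiveSampleSize([1, 1, 1, 1]));   // 4: equal weights, ESS = n
console.log(effectiveSampleSize([100, 1, 1, 1])); // about 1.06: one weight dominates
```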
npx claudepluginhub montinou/quoth