From Strata
Evaluates retrieval/search changes with a committed golden set, reporting recall@k and MRR locally. Run before/after reranking, reweighting, or scope changes to detect regressions.
Slash command
/strata:eval
A regression guard + measuring stick for vault recall. It runs a small,
committed set of query → expected notes cases through the same retrieval the
recall tool uses, and reports recall@k and MRR.
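As a rough illustration of the two metrics, here is a minimal Python sketch using the standard definitions of recall@k and MRR. It is not Strata's implementation: the function names are hypothetical, and cutting the reciprocal rank off at k is an assumption.

```python
from typing import Iterable, Sequence, Tuple


def recall_at_k(ranked: Sequence[str], expected: Sequence[str], k: int) -> float:
    """Fraction of the expected notes that appear in the top-k results."""
    wanted = set(expected)
    if not wanted:
        return 0.0
    return len(wanted & set(ranked[:k])) / len(wanted)


def reciprocal_rank(ranked: Sequence[str], expected: Sequence[str], k: int) -> float:
    """1/position of the first expected note in the top k, or 0.0 if none appears.

    Assumption: hits below rank k count as misses.
    """
    wanted = set(expected)
    for position, note in enumerate(ranked[:k], start=1):
        if note in wanted:
            return 1.0 / position
    return 0.0


def evaluate(
    results: Iterable[Tuple[Sequence[str], Sequence[str]]], k: int
) -> Tuple[float, float]:
    """Average both metrics over (ranked results, expected notes) pairs."""
    recalls, ranks = [], []
    for ranked, expected in results:
        recalls.append(recall_at_k(ranked, expected, k))
        ranks.append(reciprocal_rank(ranked, expected, k))
    if not recalls:
        return 0.0, 0.0
    return sum(recalls) / len(recalls), sum(ranks) / len(ranks)
```

Each pair holds what retrieval returned for one query, in ranked order, and what the golden case expected. The two averages correspond to the recall@k and MRR figures described above.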
The golden set lives at <vault>/<repo>/.eval/golden.json (commit it; it's
versioned with the vault):
{
  "cases": [
    {"query": "rate limiting policy",
     "expected": ["decisions/2026-05-21-token-bucket.md"],
     "scope": "decisions"}
  ]
}
scope is optional (null/omitted = all scopes). 20–50 hand-picked cases is
plenty. Seed them from queries you actually run, or from the usage ledger's
top-recalled notes.
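To catch malformed cases before committing, a golden file of this shape could be checked with a short Python sketch like the one below. The field names come from the example above. The helper names (`parse_golden`, `load_golden`) and the validation rules are assumptions, not part of Strata.

```python
import json
from pathlib import Path
from typing import List, Optional, TypedDict


class Case(TypedDict):
    query: str
    expected: List[str]
    scope: Optional[str]  # None means "all scopes"


def parse_golden(text: str) -> List[Case]:
    """Parse golden-set JSON and check each case has a query and expected notes."""
    data = json.loads(text)
    cases: List[Case] = []
    for index, raw in enumerate(data["cases"]):
        query = raw.get("query")
        expected = raw.get("expected")
        if not isinstance(query, str) or not query.strip():
            raise ValueError(f"case {index}: 'query' must be a non-empty string")
        if (
            not isinstance(expected, list)
            or not expected
            or not all(isinstance(path, str) for path in expected)
        ):
            raise ValueError(f"case {index}: 'expected' must be a non-empty list of paths")
        # An omitted scope is normalised to None, matching "null/omitted = all scopes".
        cases.append({"query": query, "expected": expected, "scope": raw.get("scope")})
    return cases


def load_golden(path: Path) -> List[Case]:
    """Read and validate a golden file from disk."""
    return parse_golden(path.read_text(encoding="utf-8"))
```

Calling `load_golden(Path(".eval/golden.json"))` from the repo directory inside the vault would return the validated cases, or raise `ValueError` naming the first bad case.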
Run the golden set at k=5:
"${CLAUDE_PLUGIN_ROOT}/bin/strata" eval -k 5
Compare the pipeline with and without the cross-encoder rerank:
"${CLAUDE_PLUGIN_ROOT}/bin/strata" eval -k 5 --sweep
--sweep runs the golden set with rerank off, then with rerank on, and prints
both rows plus the lift, so the decision rests on a number. If the lift is zero
or negative on your set, leave rerank off (it's off by default); there is no
reason to pay the per-call model load for no gain.
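The decision rule can be sketched in Python as follows. The source does not say how Strata computes or formats the lift, so the `Row` type, the absolute-difference definition of lift, and the rule "no metric regresses and at least one improves" are all assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class Row:
    label: str
    recall_at_k: float
    mrr: float


def lift(off: Row, on: Row) -> Row:
    """Absolute change in each metric when rerank is switched on (assumed definition)."""
    return Row("lift", on.recall_at_k - off.recall_at_k, on.mrr - off.mrr)


def should_enable_rerank(off: Row, on: Row) -> bool:
    """Enable rerank only if nothing regresses and at least one metric improves."""
    delta = lift(off, on)
    no_regression = delta.recall_at_k >= 0 and delta.mrr >= 0
    some_gain = delta.recall_at_k > 0 or delta.mrr > 0
    return no_regression and some_gain
```

With equal rows for rerank off and on, `should_enable_rerank` returns `False`, which matches the guidance to leave rerank off when the lift is zero.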
npx claudepluginhub gideondk/strata --plugin strata