Skill

strata:eval

Evaluates retrieval/search changes with a committed golden set, reporting recall@k and MRR locally. Run before/after reranking, reweighting, or scope changes to detect regressions.

automation

Invocation

Slash command: /strata:eval

User invocable

Model invocation disabled

SKILL.md

55 lines · ~435 tokens

Stats

Language: Python

Stars: 5

Forks: 1

Maintenance: Excellent

Last Commit: Jun 8, 2026

strata:eval

A regression guard and measuring stick for vault recall. It runs a small, committed set of query → expected-notes cases through the same retrieval pipeline the recall tool uses, and reports recall@k and MRR (mean reciprocal rank).
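The exact metric definitions strata uses aren't spelled out here. As a minimal sketch, assuming recall@k is the fraction of a case's expected notes found in the top k results, and the reciprocal rank is 1/rank of the first expected note within the top k (0 when none appears), both averaged across cases, the scoring could look like this. The function names are hypothetical, not strata's API.

```python
from typing import Sequence


def recall_at_k(ranked: Sequence[str], expected: Sequence[str], k: int) -> float:
    """Fraction of expected notes that appear in the top-k results."""
    if not expected:
        return 0.0
    top_k = set(ranked[:k])
    return sum(1 for note in expected if note in top_k) / len(expected)


def reciprocal_rank(ranked: Sequence[str], expected: Sequence[str], k: int) -> float:
    """1/rank of the first expected note within the top k, or 0.0 if none appears."""
    wanted = set(expected)
    for rank, note in enumerate(ranked[:k], start=1):
        if note in wanted:
            return 1.0 / rank
    return 0.0


def score(results: list[tuple[list[str], list[str]]], k: int) -> dict[str, float]:
    """Average both metrics over (ranked, expected) pairs, one pair per golden case."""
    n = len(results)
    if n == 0:
        return {"recall@k": 0.0, "mrr": 0.0}
    return {
        "recall@k": sum(recall_at_k(r, e, k) for r, e in results) / n,
        "mrr": sum(reciprocal_rank(r, e, k) for r, e in results) / n,
    }
```

For example, one case whose expected note comes back at rank 2 and one case that misses entirely give recall@k = 0.5 and MRR = (0.5 + 0) / 2 = 0.25.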

The golden set

It lives at <vault>/<repo>/.eval/golden.json. Commit it, since it's versioned with the vault:

{
  "cases": [
    {"query": "rate limiting policy",
     "expected": ["decisions/2026-05-21-token-bucket.md"],
     "scope": "decisions"}
  ]
}

scope is optional; null or omitted means all scopes. Twenty to fifty hand-picked cases are plenty. Seed them from queries you actually run, or from the usage ledger's top-recalled notes.
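A malformed case is easier to catch before a run than after. Below is a minimal loader for the schema shown above; Case, parse_golden and load_golden are hypothetical names, and it checks only the fields in the example, so strata's own validation may differ.

```python
import json
from dataclasses import dataclass
from pathlib import Path
from typing import Optional


@dataclass
class Case:
    query: str
    expected: list[str]
    scope: Optional[str] = None  # None means "search all scopes"


def parse_golden(text: str) -> list[Case]:
    """Parse and validate golden.json content; raise ValueError on a bad case."""
    data = json.loads(text)
    cases = []
    for i, raw in enumerate(data.get("cases", [])):
        query = raw.get("query")
        expected = raw.get("expected")
        scope = raw.get("scope")  # optional: omitted and null both mean all scopes
        if not isinstance(query, str) or not query.strip():
            raise ValueError(f"case {i}: 'query' must be a non-empty string")
        if (not isinstance(expected, list) or not expected
                or not all(isinstance(p, str) for p in expected)):
            raise ValueError(f"case {i}: 'expected' must be a non-empty list of note paths")
        if scope is not None and not isinstance(scope, str):
            raise ValueError(f"case {i}: 'scope' must be a string or null")
        cases.append(Case(query=query, expected=expected, scope=scope))
    return cases


def load_golden(path: Path) -> list[Case]:
    """Read a golden.json file from disk and parse it."""
    return parse_golden(path.read_text(encoding="utf-8"))
```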

Run it

"${CLAUDE_PLUGIN_ROOT}/bin/strata" eval -k 5

Measure the rerank lift

Compare the pipeline with and without the cross-encoder rerank:

"${CLAUDE_PLUGIN_ROOT}/bin/strata" eval -k 5 --sweep

--sweep runs the golden set with rerank off, then with rerank on, and prints both rows plus the lift, so the decision comes down to a number. If the lift is zero or negative on your set, leave rerank off (the default); there's no point paying the per-call model load for no gain.
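The rule above can be written as a small decision function. This is a sketch, assuming each sweep row has already been turned into a dict of metric name to score; the keys recall@5 and mrr and both function names are hypothetical, since the --sweep output format isn't specified here. It enables rerank only if some metric improves and none gets worse, so zero or negative lift keeps it off.

```python
def rerank_lift(off: dict[str, float], on: dict[str, float]) -> dict[str, float]:
    """Per-metric lift: the rerank-ON score minus the rerank-OFF score."""
    return {metric: on[metric] - off[metric] for metric in off}


def should_enable_rerank(off: dict[str, float], on: dict[str, float]) -> bool:
    """Enable rerank only if some metric improves and none gets worse."""
    deltas = rerank_lift(off, on).values()  # dict view, safe to iterate twice
    return any(d > 0 for d in deltas) and all(d >= 0 for d in deltas)
```

Requiring no metric to regress is a deliberately conservative choice: a rerank that lifts MRR but costs recall still pays the model load while dropping notes from the top k.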

When to run

Before and after any retrieval change (rerank, RRF weighting, a new scope).
Periodically, as a regression guard: a drop means recall quality slipped.
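For the periodic guard, one option is to keep the last accepted scores as a baseline and flag any metric that falls by more than a small tolerance, which keeps noise from a single flipped case from tripping the check. A sketch, assuming the metrics are already available as dicts; check_regression and its 0.02 default tolerance are hypothetical choices, not part of strata.

```python
def check_regression(
    baseline: dict[str, float],
    current: dict[str, float],
    tolerance: float = 0.02,
) -> list[str]:
    """Return one message per metric that dropped by more than `tolerance`."""
    failures = []
    for metric, base in baseline.items():
        now = current.get(metric)
        if now is None:
            # A metric that disappeared is treated as a failure, not a pass.
            failures.append(f"{metric}: missing from current run")
        elif base - now > tolerance:
            failures.append(f"{metric}: {base:.3f} -> {now:.3f} (drop {base - now:.3f})")
    return failures
```

An empty list means no slip beyond the tolerance; anything else is worth investigating before shipping the retrieval change.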
