Skill

mastery-assessment

Probes a learner's ability to recall module claims from memory, tags with Bloom levels, assigns Dreyfus mastery bands, and schedules spaced-repetition reviews.

developer-tools

Popularity

Stars

121

Forks

Invocation

How this skill is triggered — by the user, by Claude, or both

Slash command

/thinking-frameworks-skills:mastery-assessment

User invocable

Model invocable

Inline context

Default effort

Context Preview

The summary Claude sees in its skill listing — used to decide when to auto-load this skill

This skill checks whether the learner *holds* a module's claims — not whether they recognize them. Recognition is cheap and treacherous: a learner reads "heritability is a property of a population, not a trait," nods, and feels they know it. Retrieval is the test. This skill asks the learner to produce the claim, the mechanism, and an application from memory **before** any prompt is shown, then...

SKILL.md

122 lines · ~2.7k tokens

Stats

Stars121

Forks16

MaintenanceExcellent

Last CommitJun 15, 2026

Actions

View Source View Plugin View on GitHub View README

Mastery assessment

This skill checks whether the learner holds a module's claims — not whether they recognize them. Recognition is cheap and treacherous: a learner reads "heritability is a property of a population, not a trait," nods, and feels they know it. Retrieval is the test. This skill asks the learner to produce the claim, the mechanism, and an application from memory before any prompt is shown, then probes exactly where production broke. The retrieval effort is the point, not a side effect — see [[learning-system|LEARNING_SYSTEM.md]] and the desirable-difficulty principle in conventions.md §8.

It composes three skills: [[evaluation-rubrics]] supplies the band descriptors and scoring scale, memory-retrieval-learning supplies the spacing schedule and the case for testing-as-encoding, and socratic-teaching-scaffolds supplies the question ladder used to probe a gap without lecturing the answer back in.

This is an assessment, not a tutoring session. The job is to measure and schedule, not to teach. When a gap is found, probe it once or twice to characterize it, then record it — do not slide into a lecture. The reading loop handles teaching; this loop handles measurement.

Inputs

The module being assessed (pNmM) and its question-bank note in assessments/question-banks/.
The evergreen notes tagged with that module (the core claims under test).
Each note's current mastery band and review-due date.
The last assessment-session note for this module, if one exists (for trend and to recall prior misconceptions).

The closed-book-first protocol

Run every probe through four moves, in order. Never collapse them.

Ask. Pose the question cold. No hint, no multiple choice, no restating the claim. "What does the breeder's equation predict, and what are its terms?" The learner answers from memory only.
Wait. Give the learner room to produce a full answer, including "I don't know" or a partial one. Do not fill the silence with the answer. A struggled retrieval that eventually lands is worth more, for retention, than a smooth recognition.
Probe the gap. Where the answer is incomplete, wrong, or memorized-but-hollow, ask one or two scaffolded follow-ups (the socratic-teaching-scaffolds ladder: from a concrete instance up to the principle, or a "what would happen if…" counterfactual). The probe diagnoses whether the structure is there, not whether they can be led to the words.
Confirm. Only now reveal the canonical claim from the evergreen note. State plainly whether the learner had it, partly had it, or missed it, and name the specific gap. The learner sees the correct answer last, against their own attempt — this contrast is itself an encoding event.

A hollow answer (correct words, no mechanism behind them) scores below a struggled answer that reconstructs the mechanism in the learner's own terms. Production over recognition, always.

Bloom-tagged probes

Tag each probe with the Bloom level it exercises (conventions.md §8). A module assessed only at remember/understand has not been assessed. Push to apply/analyze; a capstone-adjacent module reaches create.

Bloom level	What the probe demands	Genomics example
remember	recall a definition or term	"What does GEBV stand for?"
understand	restate a claim in own words + mechanism	"Why is heritability a property of a population, not a trait?"
apply	use the idea on a fresh case	"Selection intensity doubles; what happens to predicted response, holding h² and σ_P fixed?"
analyze	decompose, compare, find the flaw	"A model's accuracy rose with markers then fell. Decompose the two forces in tension."
evaluate	judge a claim or a design	"Is it sound to validate genomic prediction by random CV when families are structured? Defend."
create	design or simulate something new	"Sketch a simulation that would let you see response shrink as the population inbreeds."

Cover at least three Bloom levels per session, including one at apply or higher. Record which levels were probed in the session note's bloom-levels field.

Dreyfus mastery bands

Place the learner in one band per core claim, then take the module's band as the modal (typical) claim band, noting the spread. Bands are Dreyfus-flavored (conventions.md §8); a module is mastered at proficient+ on its core claims.

Band	Behavioral signature under closed-book probing
unfamiliar	cannot produce the claim or recognize it when shown
aware	recognizes the claim when shown; cannot reconstruct it cold
functional	reconstructs the claim and its mechanism cold, follows the standard steps, but needs prompting on edge cases
proficient	applies the claim to a fresh case unprompted, connects it to neighboring claims, sees when it breaks
fluent	reasons with the claim instinctively, teaches it, recognizes its limits and the assumptions it rides on

Borrow the descriptor-per-level discipline of [[evaluation-rubrics]]: each band is a behavioral descriptor you can match against a transcript, not a vibe. If you cannot point at the move in the answer that earns a band, you have over-rated it.

Misconception detection

A wrong answer is data; a structured wrong answer is gold. Watch for the recurring genomics misconceptions and log each one explicitly, because a misconception that survives is what the next review must target.

Heritability as destiny / as a trait property — treating h² as fixed and trait-intrinsic rather than population- and environment-specific.
Correlation read as cause in GWAS — a significant marker is the causal variant, ignoring linkage disequilibrium and the fact the SNP is usually a tag.
"More markers must mean more accuracy" — ignoring the bias–variance tradeoff and that informative content saturates with LD.
Confusing additive with total genetic variance — folding dominance/epistasis into "the genetics" when response to selection rides on additive variance.
Leakage in prediction CV — random cross-validation across related individuals inflates accuracy; the honest test holds out families/environments.

For each misconception caught, record the claim it attaches to, the exact wrong reasoning the learner produced, and the probe that surfaced it. This goes in the session note and steers the next surfacing.

Scoring

Score each probe 0 / 0.5 / 1: miss / partial-or-hollow / clean cold reconstruction. A hollow recognition caps at 0.5.
Session score is the sum over total, reported as n/m (e.g. 7/10), and matched to the module band.
Map score band to mastery band as a prior, then override from behavior if the transcript shows otherwise (a learner can score 8/10 yet reveal a load-bearing misconception that pins them at functional). Behavior wins over arithmetic.
The module is clear to advance only at proficient+ on its core claims with no live misconception on a core claim.

Spaced-repetition scheduling

Each evergreen note and module carries a review-due date. Schedule the next surfacing on expanding intervals: 1d → 3d → 7d → 16d → 35d, then continue stretching (~2.2×) for fluent material. The interval expands on success (the spacing effect: a recall that's just hard enough cements the trace) and contracts on a miss.

Per claim, track its position on the ladder:

Clean reconstruction (score 1): advance one rung — 7d becomes 16d.
Partial/hollow (0.5): repeat the current rung — 7d stays 7d.
Miss, or a live misconception (0): pull back down the ladder — drop to 3d (or 1d if it was a core claim at an early rung). Pulling the date earlier is the whole point: missed material must come back fast.

Worked example. The claim "response to selection scales with additive variance, not total genetic variance" sits at the 7d rung, last reviewed today, 2026-05-30.

Clean cold reconstruction → next rung 16d → review-due: 2026-06-15, band nudges toward proficient.
Hollow answer (named the equation, couldn't say why additive) → repeat 7d → review-due: 2026-06-06, band holds at functional, misconception "additive vs total variance" logged.
Missed, asserted total variance drives response → contract to 3d → review-due: 2026-06-02, band drops to aware, misconception logged as the target of the next session.

Set the module's review-due to the earliest review-due among its core claims, so the soonest-failing claim pulls the whole module back.

Outputs (everything proposes; the learner approves)

This skill never writes to the vault autonomously (conventions.md §10). It proposes two artifacts and waits for go-ahead.

1. An assessment-session note at assessments/log/pNmM-assessment-YYYY-MM-DD.md, frontmatter exactly per conventions.md §6:

---
type: assessment-session
module: p2m3
date: 2026-05-30
bloom-levels: [understand, apply, analyze]
mastery-band: functional
score: "7/10"
next-review: 2026-06-06
tags: [assessment]
---

Body: a per-claim table (claim link [[slug|Title]] · Bloom level · score · band · the gap) using the piped link form (conventions.md §3), a misconceptions section quoting the learner's own wrong reasoning, the verdict on advancement, and the per-claim schedule moves.

2. A proposed review-due (and, where it changed, mastery) update on each evergreen note touched — shown as a diff per note, never applied without approval. Format the proposal so the learner can accept the whole batch or cherry-pick.

End the session with one retrieval the learner should attempt unaided before the next surfacing — the weakest core claim, posed as a cold question, not a summary of how they did. Leaving the loop open on an effortful question is itself spaced encoding.

mastery-assessment

Popularity

Invocation

Context Preview

SKILL.md

mastery-assessment

Popularity

Invocation

Context Preview

SKILL.md

Mastery assessment

Inputs

The closed-book-first protocol

Bloom-tagged probes

Dreyfus mastery bands

Misconception detection

Scoring

Spaced-repetition scheduling

Outputs (everything proposes; the learner approves)

Similar Skills

Mastery assessment

Inputs

The closed-book-first protocol

Bloom-tagged probes

Dreyfus mastery bands

Misconception detection

Scoring

Spaced-repetition scheduling

Outputs (everything proposes; the learner approves)

Similar Skills