From easy-cheese
Gates code sharing by requiring the author to explain a diff's causal logic, graded against the SOLO Taxonomy by a fresh-context judge. Mitigates epistemic debt before PRs.
How this skill is triggered — by the user, by Claude, or both
Slash command
/easy-cheese:hard-cheese
The summary Claude sees in its skill listing, used to decide when to auto-load this skill
Use this skill when the human author needs to demonstrate they understand the causal logic of a diff before it is shared for review or a pull request is opened. The gate exists to mitigate **epistemic debt** — the failure mode where AI-scaffolded code passes review, type-checks, and tests green while the author cannot explain it to a reviewer.
Do not use it for code review (/age covers eight orthogonal review dimensions), for test hardening (/press), or for fix application (/cure). Those skills review the artifact; hard-cheese reviews the human's understanding of the artifact, which is an orthogonal axis.
/hard-cheese [<slug>] [--socratic-cap N=3] [--no-judge]
Arguments:
- <slug>: optional. Identifies the artifact at .cheese/hard/<slug>.md. When omitted, fall back to the git short SHA of HEAD. An explicit slug always wins.
- --socratic-cap N: max retry attempts before the gate marks the artifact FAILED and exits non-zero. Default 3. Vibecheck does not cap; easy-cheese does to avoid infinite loops.
- --no-judge: log-only mode. Capture the user's explanation, write the artifact with status: LOGGED, skip the judge sub-agent spawn. Mirrors vibecheck's optional JSONL telemetry mode.

| Mode | How it fires | Where the gate sits |
|---|---|---|
| standalone | User runs /hard-cheese <slug> directly before opening a pull request. | Outside the pipeline. No upstream skill required. |
| propagated | /cure invokes /hard-cheese <slug> when --hard is in scope and the user selects the share-for-review option at cure's handoff (or, under --auto --hard, at the end of cure's final auto pass). | At the cure → share for review boundary — the moment code escapes the local machine. |
--hard propagates through /cheese → /mold → /cook → /press → /age → /cure. Upstream skills only pass the flag along; /cure is the only skill that actually invokes /hard-cheese. See references/composition.md for the full matrix.
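The skill is a set of instructions for an agent rather than a compiled CLI, so the following is only an illustrative sketch of the argument contract described above: an optional slug, a `--socratic-cap` defaulting to 3, a `--no-judge` switch, and the fallback to the short SHA of HEAD. The function names `build_parser`, `head_short_sha`, and `resolve_slug` are hypothetical and do not come from the skill.

```python
import argparse
import subprocess


def build_parser() -> argparse.ArgumentParser:
    """Parser mirroring `/hard-cheese [<slug>] [--socratic-cap N=3] [--no-judge]`."""
    parser = argparse.ArgumentParser(prog="hard-cheese")
    parser.add_argument("slug", nargs="?", default=None)
    parser.add_argument("--socratic-cap", type=int, default=3)
    parser.add_argument("--no-judge", action="store_true")
    return parser


def head_short_sha() -> str:
    """Ask git for the short SHA of HEAD (only works inside a repository)."""
    result = subprocess.run(
        ["git", "rev-parse", "--short", "HEAD"],
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout.strip()


def resolve_slug(explicit_slug, fallback=head_short_sha) -> str:
    """An explicit slug always wins; otherwise fall back to the HEAD short SHA."""
    return explicit_slug if explicit_slug else fallback()


# Example invocation: explicit slug, cap lowered to 2, judge left enabled.
args = build_parser().parse_args(["my-feature", "--socratic-cap", "2"])
slug = resolve_slug(args.slug)
```

Passing the fallback as a parameter keeps the git call out of the way when a slug is supplied, and makes the fallback easy to stub in tests.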
1. Resolve scope.
   - diff_base = origin/main, diff_head = <short-sha of HEAD>.
   - If .cheese/specs/<slug>.md exists, load it as the intent reference (optional; the diff is the ground truth).
   - If there is no diff against origin/main, exit 0 with "nothing to gate on" and write no artifact.
2. Freshness check.
   - If .cheese/hard/<slug>.md exists and its diff_head matches the current HEAD short SHA, print "previously passed at attempt <N>" and exit 0.
   - If diff_head differs (or status is not PASS/LOGGED), start a fresh attempt sequence. Prior attempts are not erased: they stay in the same file as historical context, and the new sequence appends below.
3. Compose the vibecheck prompt (faithful to Sankaranarayanan 2026, generalised to "share for review" so the gate stays implementation-agnostic):
   > Before this is shared for review, explain its causal logic in your own words. How does it work? Why does it produce the desired behavior? What state, control flow, or invariants does it rely on?
Render a diff summary alongside the prompt: files changed, key hunks. Cap the diff excerpt at roughly 80 lines so the user can still see what they are explaining without scrolling. The spec excerpt (if loaded) is shown above the diff summary.
4. Capture the user's explanation as free text. No coaching, no autosuggest, no example answers: the explanation is the artifact under test.
5. Spawn the judge sub-agent in fresh context (same pattern /ultracook uses for adversarial review). Same-context self-judging is biased and not useful. The judge:
   - Loads references/judge-prompt.md as its system prompt.
   - Returns JSON of the shape {score, level, pass, feedback, socratic_qs}.

   See references/judge-prompt.md for the full system prompt and output shape.

   Skip this step when --no-judge is set: mark the attempt status: LOGGED, write the artifact, exit 0.
6. On judge result:
   - score >= 3 → append PASS attempt to artifact, set status: PASS, exit 0.
   - score < 3 → append FAIL attempt, render the Socratic questions inline for the user, loop back to step 4 if attempts < --socratic-cap.
   - Judge error → append ERROR attempt, print a warning, exit 0 (FAIL OPEN; see ## Divergence from the paper).
7. On cap exhaustion: set the artifact status: FAILED, print the path, exit non-zero. Downstream chains must not proceed.
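The retry behaviour in the steps above can be sketched as a small loop. This is a minimal illustration under stated assumptions, not the skill's actual implementation: `run_gate`, `GateResult`, and the `ask_user` and `judge` callables are hypothetical names, the non-zero exit code is assumed to be 1, and the cap is read as a limit on total attempts (matching "loop back to step 4 if attempts < --socratic-cap").

```python
from dataclasses import dataclass, field

PASS_THRESHOLD = 3  # SOLO score required to pass, as stated in the steps above


@dataclass
class GateResult:
    status: str                 # "PASS", "FAILED", or "ERROR"
    exit_code: int              # 0 lets downstream chains proceed
    attempts: list = field(default_factory=list)


def run_gate(ask_user, judge, socratic_cap: int = 3) -> GateResult:
    """Ask for an explanation, grade it, and retry until pass or cap exhaustion."""
    attempts = []
    questions = []  # Socratic questions from the previous failed attempt
    while len(attempts) < socratic_cap:
        explanation = ask_user(questions)
        try:
            verdict = judge(explanation)
            score = int(verdict["score"])
        except Exception as exc:
            # Fail open: any judge crash, timeout, or unusable reply ends the
            # gate with an ERROR attempt and exit code 0.
            attempts.append({"outcome": "ERROR", "detail": str(exc)})
            return GateResult("ERROR", 0, attempts)
        if score >= PASS_THRESHOLD:
            attempts.append({"outcome": "PASS", "score": score})
            return GateResult("PASS", 0, attempts)
        attempts.append({"outcome": "FAIL", "score": score})
        questions = verdict.get("socratic_qs", [])
    # Cap exhausted: non-zero exit so downstream chains do not proceed.
    return GateResult("FAILED", 1, attempts)
```

Keeping the judge behind a plain callable means the same loop works whether the judge is a sub-agent spawn or a stub in a test.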
.cheese/hard/<slug>.md is the audit trail. The directory is gitignored by repo convention (.gitignore already ignores .cheese/), so the trail stays local — matching vibecheck's local-only stance on telemetry.
---
slug: <slug>
attribution: |
  Sankaranarayanan, S. (2026). Mitigating 'Epistemic Debt' in
  Generative AI-Scaffolded Novice Programming using Metacognitive
  Scripts. Proceedings of the 13th ACM Conference on Learning at
  Scale. https://arxiv.org/abs/2602.20206
  Implementation reference:
  https://github.com/sreecharansankaranarayanan/vibecheck
rubric: SOLO Taxonomy (1-5), pass threshold = 3 (Multistructural-or-higher; the paper labels this the 'Relational' pass condition)
divergence: fail-open on judge error (vibecheck fails closed)
diff_base: <sha>
diff_head: <short-sha>
files_changed: <count>
status: PASS | FAIL | FAILED | LOGGED
attempts: <n>
---
## Attempt 1 (FAIL — SOLO 2 Unistructural)
git: <sha at attempt time>
> <user explanation verbatim>
**Judge feedback**: <one-paragraph critique>
**Socratic questions**:
- <q1>
- <q2>
## Attempt 2 (PASS — SOLO 4 Relational)
git: <sha at attempt time>
> <user explanation verbatim>
**Judge feedback**: <one-paragraph critique>
Attempts append; nothing is overwritten within a single invocation. If a re-invocation finds the artifact stale (HEAD moved), the new attempt sequence is appended below the prior one rather than replacing it — the trail is cumulative.
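The freshness check that decides whether a re-invocation can reuse the existing artifact can be sketched by reading the frontmatter shown above. This is a rough illustration that assumes the frontmatter is well-formed, with block-scalar lines indented. The helpers `read_frontmatter` and `is_fresh` are hypothetical, and the parser is deliberately naive (top-level `key: value` pairs only) to stay within the standard library.

```python
def read_frontmatter(artifact_text: str) -> dict:
    """Collect top-level `key: value` pairs between the opening and closing `---`."""
    lines = artifact_text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    fields = {}
    for line in lines[1:]:
        if line.strip() == "---":
            break
        # Indented lines belong to a block scalar such as `attribution: |`.
        if line.startswith((" ", "\t")) or ":" not in line:
            continue
        key, _, value = line.partition(":")
        fields[key.strip()] = value.strip()
    return fields


def is_fresh(artifact_text: str, head_short_sha: str) -> bool:
    """True when the artifact covers the current HEAD and ended in PASS or LOGGED."""
    fields = read_frontmatter(artifact_text)
    return (
        fields.get("diff_head") == head_short_sha
        and fields.get("status") in {"PASS", "LOGGED"}
    )
```

When `is_fresh` returns False, the new attempt sequence would be appended below the existing text rather than replacing it, which keeps the trail cumulative.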
The judge is a fresh-context full-peer sub-agent, the same pattern /ultracook uses. Rules:
- Use subagent_type: "general-purpose" with references/judge-prompt.md as the system prompt. Model inherits from the parent; do not pass haiku or any other tier downgrade.
- If the judge crashes, times out, or returns malformed JSON, the attempt is recorded as ERROR and the gate fails open (see ## Divergence from the paper).

If the host harness has no sub-agent primitive, /hard-cheese is the wrong skill: the gate cannot run without a fresh judge. Recommend /hard-cheese --no-judge for users who still want the explanation captured as telemetry without the grading step.
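Since a malformed judge reply is one of the error cases, it can help to validate the reply before acting on it. The sketch below checks for the five keys named earlier, {score, level, pass, feedback, socratic_qs}, and for a score in the 1 to 5 SOLO range. The field types are assumptions, since the authoritative output shape lives in references/judge-prompt.md, and `parse_verdict` and `JudgeError` are hypothetical names.

```python
import json

REQUIRED_KEYS = {"score", "level", "pass", "feedback", "socratic_qs"}


class JudgeError(Exception):
    """Raised when the judge reply cannot be used as a verdict."""


def parse_verdict(raw_reply: str) -> dict:
    """Parse the judge's JSON reply, raising JudgeError on anything unusable."""
    try:
        verdict = json.loads(raw_reply)
    except json.JSONDecodeError as exc:
        raise JudgeError(f"judge returned malformed JSON: {exc}") from exc
    if not isinstance(verdict, dict):
        raise JudgeError("judge reply is not a JSON object")
    missing = REQUIRED_KEYS - set(verdict)
    if missing:
        raise JudgeError(f"judge reply is missing keys: {sorted(missing)}")
    score = verdict["score"]
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(score, bool) or not isinstance(score, int) or not 1 <= score <= 5:
        raise JudgeError(f"score must be an integer from 1 to 5, got {score!r}")
    return verdict
```

A caller would treat `JudgeError` the same way as a crash or timeout: record an ERROR attempt and fail open.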
This skill implements the metacognitive-script mechanism described in:
Sankaranarayanan, S. (2026). Mitigating 'Epistemic Debt' in Generative AI-Scaffolded Novice Programming using Metacognitive Scripts. Proceedings of the 13th ACM Conference on Learning at Scale. https://arxiv.org/abs/2602.20206
The implementation reference (intercept-at-acceptance, SOLO rubric, Socratic retry) is the open-source VS Code extension by the paper's author:
https://github.com/sreecharansankaranarayanan/vibecheck
The attribution appears in this SKILL.md, in references/judge-prompt.md, and in every .cheese/hard/<slug>.md artifact's frontmatter so the citation travels with the audit trail.
## Divergence from the paper
Hard-cheese departs from vibecheck in exactly one place, and the divergence is called out explicitly so it stays legible:
Vibecheck fails closed on judge error. If the Judge LLM cannot produce a verdict, the modal blocks code application until the judge recovers or the user retries with a different model.
Hard-cheese fails open on judge error. If the fresh-context judge sub-agent crashes, times out, or returns malformed JSON, the gate writes an ERROR attempt, prints a clear warning, and exits 0 — the user is allowed to proceed.
Rationale: judge invocation is per-PR-attempt and per-retry, and a strict fail-closed policy creates a worse experience under API hiccups than the epistemic-debt cost it averts. This is the only departure from the paper's mechanism. New divergences must be added to this section so each one stays legible.
--hard and --auto may coexist. The gate is the only point at which --hard punctures --auto. Everywhere else, auto's skip-handoff semantics apply.
Concretely, under /cure --auto --hard --stake medium+:
- Auto runs cook → press → age → cure per --auto's normal contract.
- At the end of cure's final auto pass, /hard-cheese <slug> fires once.
- On PASS, the gate prints "gate passed → ready to share for review".
- On judge error, the gate exits 0 with a warning (the fail-open divergence).

Non-TTY guard: if the gate detects it is running without a human (no interactive input is available), it fails closed and aborts. A vacuous "auto-pass" with no human in the loop would defeat the entire mechanism.
/cure --auto alone (no --hard) is unchanged — the gate never fires. The single puncture point is documented in references/composition.md and in skills/cure/SKILL.md.
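The skill does not say how the non-TTY guard detects a missing human. One plausible approach, shown here as an assumption rather than the skill's method, is to check whether standard input is attached to a terminal. The function name `require_interactive` is hypothetical.

```python
import sys


def require_interactive(stdin=None) -> None:
    """Abort (fail closed) when no interactive input stream is available."""
    stream = sys.stdin if stdin is None else stdin
    # sys.stdin can be None in some embedded or detached environments.
    if stream is None or not stream.isatty():
        # SystemExit with a message prints it to stderr and exits with status 1.
        raise SystemExit(
            "hard-cheese: no interactive input available; refusing to auto-pass"
        )
```

Calling this before prompting for the explanation means an unattended `--auto --hard` run stops at the gate instead of recording an empty pass.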
When the gate ends, print:
Hard-cheese artifact: .cheese/hard/<slug>.md
Status: PASS | FAILED | LOGGED | ERROR
Attempts: <n>
Followed by:
- PASS → Ready to share for review.
- FAILED → Cap exhausted. Improve understanding of the change before sharing.
- LOGGED → Telemetry only — judge skipped via --no-judge.
- ERROR → Fail-open divergence active — gate exited 0; you may share for review at your discretion.

| Need | Prefer | Fallback |
|---|---|---|
| Diff inspection for the user-facing summary | delta | git diff --unified=3 |
| Reading the spec (when present) | cheez-read | host file read |
| Spawning the judge | host sub-agent primitive (Agent() or harness equivalent) | none — without sub-agent spawn, run --no-judge mode and tell the user the judge is unavailable |
| GitHub / PR context (out of scope here) | n/a | n/a |
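Whichever tool produces the diff, the user-facing excerpt is meant to stay at roughly 80 lines so the author can see what they are explaining without scrolling. A minimal sketch of that cap follows. The function name `cap_diff_excerpt` and the wording of the trailing marker line are illustrative assumptions.

```python
def cap_diff_excerpt(diff_text: str, max_lines: int = 80) -> str:
    """Return the diff unchanged if short, else the first max_lines plus a marker."""
    lines = diff_text.splitlines()
    if len(lines) <= max_lines:
        return diff_text
    omitted = len(lines) - max_lines
    kept = lines[:max_lines]
    # Tell the reader that the excerpt is truncated and by how much.
    kept.append(f"... ({omitted} more diff lines omitted)")
    return "\n".join(kept)
```

The input would typically be the text produced by delta or by `git diff --unified=3`. The cap itself is independent of which one is used.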
- Do not depend on /gh or any specific PR-creation tool. The gate's contract is "before code is shared for review", which is implementation-agnostic.
- Follow the voice guide (skills/age/references/voice.md in this repo): say what the gate result was, flag residual risk as certain | speculating | don't know, do not soften FAILED into "almost passing".

References:
- references/judge-prompt.md: SOLO Taxonomy rubric, judge sub-agent system prompt, JSON output shape.
- references/composition.md: the full --hard / --auto matrix and the single puncture point.

npx claudepluginhub paulnsorensen/easy-cheese