Skill

calibrate

Scores past predictions against actual sprint outcomes, creates calibration claims, and computes accuracy scorecards by evidence tier and claim type. Useful for closing the feedback loop after recommendations have been implemented.

developer-tools

Invocation

How this skill is triggered — by the user, by Claude, or both

Slash command

/grainulator:calibrate

User invocable

Model invocable

Inline context

Default effort

Context Preview

The summary Claude sees in its skill listing — used to decide when to auto-load this skill

The user wants to check what actually happened after a sprint's recommendations were implemented.

SKILL.md

58 lines · ~473 tokens

Stats

Language: HTML

Stars: 85

Forks: 6

Maintenance: Excellent

Last Commit: Apr 19, 2026

/calibrate -- Score predictions vs outcomes

The user wants to check what actually happened after a sprint's recommendations were implemented.

Arguments

$ARGUMENTS

Expected format: /calibrate --outcome "what happened" or /calibrate <claim_id> "actual result"
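
The two argument forms above could be told apart along these lines. This is only a minimal sketch: `CalibrateArgs` and `parse_calibrate_args` are hypothetical names, the claim ID `e012` in the usage note is made up for illustration, and the skill itself does not prescribe any particular parser.

```python
import shlex
from dataclasses import dataclass
from typing import Optional


@dataclass
class CalibrateArgs:
    """Parsed /calibrate arguments (hypothetical structure, for illustration)."""
    claim_id: Optional[str]  # None when the free-text --outcome form is used
    outcome: str             # what actually happened, as given by the user


def parse_calibrate_args(raw: str) -> CalibrateArgs:
    """Split the raw argument string into one of the two expected forms."""
    tokens = shlex.split(raw)  # honours the double quotes around the outcome text
    if len(tokens) == 2 and tokens[0] == "--outcome":
        return CalibrateArgs(claim_id=None, outcome=tokens[1])
    if len(tokens) == 2 and not tokens[0].startswith("--"):
        return CalibrateArgs(claim_id=tokens[0], outcome=tokens[1])
    raise ValueError(
        'expected: --outcome "what happened" or <claim_id> "actual result"'
    )
```

For example, `parse_calibrate_args('e012 "took 3 weeks"')` yields a claim-specific result, while `parse_calibrate_args('--outcome "latency dropped"')` yields a free-text outcome with no claim ID.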

Instructions

1. Parse the outcome: the user provides outcome data as free text or as claim-specific results.
2. Match outcomes to predictions: use wheat_search to find the original estimate, recommendation, or risk claims that predicted something, then compare each prediction to the actual outcome.
3. Create calibration claims as cal### claims with evidence tier production (these are real outcomes):
   - If the prediction was accurate: factual claim noting the match
   - If the prediction was wrong: factual claim noting the delta (predicted X, actual Y)
   - If the prediction was partially right: estimate claim with the refined numbers
4. Compute the accuracy scorecard:
   - Group by evidence tier: what % of stated vs web vs documented vs tested claims were accurate?
   - Group by claim type: are estimates less accurate than factual claims?
   - This validates whether the evidence tier system is predictive
5. Run wheat_compile.
6. Print the scorecard:

Calibration results:
Predictions scored: <N>
Accurate: <N> (<percent>)
Partially accurate: <N>
Wrong: <N>

Accuracy by evidence tier:
  stated: <percent>
  web: <percent>
  documented: <percent>
  tested: <percent>

Next steps:
  /brief              -- recompile with calibrated data
  /research <topic>   -- investigate where predictions went wrong
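
The grouping and printing steps can be sketched as follows. This is an illustrative sketch, not the plugin's implementation: `ScoredPrediction`, `accuracy_by`, `calibration_claim_type`, and `format_scorecard` are hypothetical names, and it assumes that only fully accurate predictions count toward the accuracy percentage (partially accurate ones are reported separately).

```python
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass
class ScoredPrediction:
    """One prediction after comparison with its outcome (hypothetical structure)."""
    claim_id: str
    claim_type: str     # e.g. "estimate", "recommendation", "risk"
    evidence_tier: str  # "stated", "web", "documented", or "tested"
    verdict: str        # "accurate", "partial", or "wrong"


def calibration_claim_type(verdict: str) -> str:
    """Accurate and wrong outcomes become factual claims; partial ones become estimates."""
    return "estimate" if verdict == "partial" else "factual"


def accuracy_by(scored, key):
    """Percent of fully accurate predictions per group (key is an attribute name)."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for p in scored:
        group = getattr(p, key)
        totals[group] += 1
        if p.verdict == "accurate":  # assumption: partial does not count as a hit
            hits[group] += 1
    return {g: 100.0 * hits[g] / totals[g] for g in totals}


def format_scorecard(scored) -> str:
    """Render the first two sections of the scorecard template."""
    verdicts = Counter(p.verdict for p in scored)
    n = len(scored)
    pct = 100.0 * verdicts["accurate"] / n if n else 0.0
    by_tier = accuracy_by(scored, "evidence_tier")
    lines = [
        "Calibration results:",
        f"Predictions scored: {n}",
        f"Accurate: {verdicts['accurate']} ({pct:.0f}%)",
        f"Partially accurate: {verdicts['partial']}",
        f"Wrong: {verdicts['wrong']}",
        "",
        "Accuracy by evidence tier:",
    ]
    for tier in ("stated", "web", "documented", "tested"):
        # Tiers with no scored predictions are shown as n/a rather than 0%.
        value = f"{by_tier[tier]:.0f}%" if tier in by_tier else "n/a"
        lines.append(f"  {tier}: {value}")
    return "\n".join(lines)
```

Calling `accuracy_by(scored, "claim_type")` on the same list gives the second grouping, which is what answers whether estimates are less accurate than factual claims.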
