From grimoire
Facilitates cross-manager calibration sessions to align performance ratings across teams, reduce bias, and ensure fair compensation decisions.
How this skill is triggered — by the user, by Claude, or both
Slash command
/grimoire:run-performance-calibrationThe summary Claude sees in its skill listing — used to decide when to auto-load this skill
Facilitate a cross-manager calibration session where each manager's ratings are reviewed against a shared standard — to correct for individual rater bias, surface promotion candidates, and produce fair performance outcomes across the team.
Facilitate a cross-manager calibration session where each manager's ratings are reviewed against a shared standard — to correct for individual rater bias, surface promotion candidates, and produce fair performance outcomes across the team.
Adopted by: Google, Amazon, Microsoft, Meta, and Salesforce all run formal calibration sessions as part of their annual performance cycle; Google's re:Work program documents calibration as a required step before performance ratings are finalized; SHRM's performance management best practice guide identifies calibration as the primary mechanism for reducing rater bias at scale; the US federal government mandates performance calibration under OPM guidelines for Senior Executive Service ratings Impact: A 2019 Deloitte human capital research study found that organizations using structured calibration sessions reduced rating distribution skew by 35% compared to organizations relying on individual manager ratings alone; Google's internal research (published in re:Work) found that calibrated ratings produced 28% less variance in performance scores for comparable performers across different managers; uncalibrated ratings systematically disadvantage employees whose managers are lenient raters, those working in less visible roles, and those on teams with harsher managers Why best: Without calibration, performance ratings measure both the employee's performance and the manager's rating tendencies — a harsh rater's strong performers receive the same ratings as a lenient rater's average performers; this inequity flows directly into compensation and promotion decisions, creating retention risk among the best performers who are rated fairly only in relative terms, and advancement inequity among employees in harsher rating environments
Sources: Google re:Work "Manager Training: Performance Calibration"; Deloitte "Performance Management: Playing a Winning Hand" (2019); SHRM "Calibration Meetings: Best Practices" (shrm.org); Buckingham & Goodall "Reinventing Performance Management" (Harvard Business Review, 2015)
Calibration sessions without a clear objective drift into individual performance debates. Define upfront what decisions this calibration is informing:
Calibration objective: Align performance ratings for the [team/department]
before [compensation decisions / promotion decisions / review finalization].
Specific decisions:
- Confirm final rating distribution (if applicable)
- Surface promotion candidates for [role/level]
- Flag cases requiring additional management discussion
Distribute the objective in writing before the session so managers can prepare evidence, not just opinions.
Before the session, each manager submits for each of their direct reports:
Evidence-first submission forces managers to anchor on observed behavior, not impressions. It also surfaces managers who cannot articulate specific evidence for their ratings — a leading indicator of rater bias.
Submit ratings before the session, not during it. In-session rating assignment is dominated by the loudest voice in the room.
Before the session, compile all ratings into a single distribution view:
Rating distribution across [team/department]:
Outstanding: X employees (Y%)
Exceeds: X employees (Y%)
Meets: X employees (Y%)
Partially meets: X employees (Y%)
Does not meet: X employees (Y%)
Outlier flags:
- [Manager A]: 80% of team rated Outstanding (above team average of 20%)
- [Manager B]: 0% rated Outstanding (below team average)
The distribution view surfaces rater bias immediately. Individual ratings are defended; distributions are harder to rationalize.
Calibration sessions run 2–3 hours for groups of 5–10 managers. Larger groups require breakout calibration groups with a synthesis session.
Facilitation protocol:
Open with calibration anchors (15 min): Align on what each rating level means using behavioral anchors, not just label definitions:
"Outstanding": demonstrated exceptional impact significantly above role expectations,
with specific outcomes that would not have happened without this person's contribution.
"Meets expectations": delivered on all core responsibilities, achieved agreed goals,
no significant gaps in execution or behavior.
If managers are using "Meets" to mean different things, ratings are incomparable. Anchor alignment is the most important 15 minutes of the session.
Manager-by-manager review (60–90 min): For each manager, briefly review their team's distribution:
Focus the group discussion on: ratings at the boundaries (the Outstanding/Exceeds boundary; the Meets/Partially Meets boundary), and employees being considered for promotion or performance action.
Discussion protocol for disputed ratings:
Manager presents: "I rated [name] Outstanding because [3 specific evidence points]."
Group can challenge: "That level of impact — can you compare it to someone we all
agreed was Outstanding last cycle?"
Manager responds with evidence or adjusts.
Never allow discussion about the person's personality, potential, or anecdotes — only observable performance evidence in the review period.
Bias check questions (ask throughout):
Promotion discussion (30 min, if applicable): Surface employees nominated for promotion. For each:
After the session:
Managers communicate ratings to employees individually. They do not disclose the calibration process details — they share the final rating and the evidence: "Your rating is [X] because [specific evidence]."
Calibration produces organizational data, not just individual ratings:
npx claudepluginhub jeffreytse/grimoire --plugin grimoireGenerates templates for self-assessments, manager performance reviews, and calibration prep. Useful for review cycles, feedback structuring, and goal setting.
Designs a structured performance review system including cadence, rating dimensions, calibration, and automation. Useful for building or redesigning employee evaluation frameworks.