Skill

run-performance-calibration

Facilitates cross-manager calibration sessions to align performance ratings across teams, reduce bias, and ensure fair compensation decisions.

developer-tools

Invocation

How this skill is triggered — by the user, by Claude, or both

Slash command

/grimoire:run-performance-calibration

Context Preview

The summary Claude sees in its skill listing — used to decide when to auto-load this skill

Facilitate a cross-manager calibration session where each manager's ratings are reviewed against a shared standard — to correct for individual rater bias, surface promotion candidates, and produce fair performance outcomes across the team.

SKILL.md

151 lines · ~2.6k tokens

Stats

Language: Shell

Stars: 12

Forks: 1

Maintenance: Excellent

Last Commit: Jun 17, 2026

Run Performance Calibration

Why This Is Best Practice

Adopted by: Google, Amazon, Microsoft, Meta, and Salesforce all run formal calibration sessions as part of their annual performance cycle. Google's re:Work program documents calibration as a required step before performance ratings are finalized. SHRM's performance management best practice guide identifies calibration as the primary mechanism for reducing rater bias at scale. The US federal government mandates performance calibration under OPM guidelines for Senior Executive Service ratings.

Impact: A 2019 Deloitte human capital research study found that organizations using structured calibration sessions reduced rating distribution skew by 35% compared to organizations relying on individual manager ratings alone. Google's internal research (published in re:Work) found that calibrated ratings produced 28% less variance in performance scores for comparable performers across different managers. Uncalibrated ratings systematically disadvantage employees working in less visible roles and those whose managers are harsher raters.

Why best: Without calibration, performance ratings measure both the employee's performance and the manager's rating tendencies: a harsh rater's strong performers receive the same ratings as a lenient rater's average performers. This inequity flows directly into compensation and promotion decisions. It creates retention risk among the best performers, who are rated fairly only in relative terms, and advancement inequity among employees in harsher rating environments.

Sources: Google re:Work "Manager Training: Performance Calibration"; Deloitte "Performance Management: Playing a Winning Hand" (2019); SHRM "Calibration Meetings: Best Practices" (shrm.org); Buckingham & Goodall "Reinventing Performance Management" (Harvard Business Review, 2015)

Steps

1. Set the calibration objective before the session

Calibration sessions without a clear objective drift into individual performance debates. Define upfront what decisions this calibration is informing:

Calibration objective: Align performance ratings for the [team/department] 
before [compensation decisions / promotion decisions / review finalization].
Specific decisions:
- Confirm final rating distribution (if applicable)
- Surface promotion candidates for [role/level]
- Flag cases requiring additional management discussion

Distribute the objective in writing before the session so managers can prepare evidence, not just opinions.

2. Pre-calibration: managers submit ratings with evidence summaries

Before the session, each manager submits for each of their direct reports:

- Proposed performance rating (using the agreed scale)
- 3–5 bullet points of specific evidence supporting the rating
- Any applicable flag: "promotion consideration," "improvement needed," or "exceptional case"

Evidence-first submission forces managers to anchor on observed behavior, not impressions. It also surfaces managers who cannot articulate specific evidence for their ratings — a leading indicator of rater bias.

Submit ratings before the session, not during it. In-session rating assignment is dominated by the loudest voice in the room.
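
The submission requirements above can be checked mechanically before the session. The following is a minimal sketch, not part of the original skill: the `Submission` record, the `validate` helper, and the exact scale and flag strings are hypothetical names taken from the examples in this document, and your organization's scale may differ.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Assumed rating scale and flags, copied from the examples in this document.
RATING_SCALE = ("Outstanding", "Exceeds", "Meets", "Partially meets", "Does not meet")
ALLOWED_FLAGS = ("promotion consideration", "improvement needed", "exceptional case")


@dataclass
class Submission:
    """One manager's pre-session submission for one direct report (hypothetical shape)."""
    manager: str
    employee: str
    rating: str
    evidence: List[str] = field(default_factory=list)
    flag: Optional[str] = None


def validate(sub: Submission) -> List[str]:
    """Return a list of problems; an empty list means the submission is complete."""
    problems = []
    if sub.rating not in RATING_SCALE:
        problems.append(f"unknown rating: {sub.rating!r}")
    # Count only non-blank evidence bullets.
    bullets = [e for e in sub.evidence if e.strip()]
    if not 3 <= len(bullets) <= 5:
        problems.append(f"expected 3-5 evidence bullets, got {len(bullets)}")
    if sub.flag is not None and sub.flag not in ALLOWED_FLAGS:
        problems.append(f"unknown flag: {sub.flag!r}")
    return problems
```

A check like this only confirms that evidence was supplied in the expected amount. Whether the evidence is specific and behavioral still has to be judged by a person.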

3. Prepare the calibration view

Before the session, compile all ratings into a single distribution view:

Rating distribution across [team/department]:
Outstanding:       X employees (Y%)
Exceeds:           X employees (Y%)
Meets:             X employees (Y%)
Partially meets:   X employees (Y%)
Does not meet:     X employees (Y%)

Outlier flags:
- [Manager A]: 80% of team rated Outstanding (above team average of 20%)
- [Manager B]: 0% rated Outstanding (below team average)

The distribution view surfaces rater bias immediately. Individual ratings are defended; distributions are harder to rationalize.
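
One way to compile the distribution view and the outlier flags is sketched below. It assumes the five-level scale shown above and input as `(manager, rating)` pairs. The function names and the 25-percentage-point flagging threshold are illustrative assumptions, not values prescribed by this skill.

```python
from collections import Counter, defaultdict

# Assumed scale, matching the distribution template in this document.
RATING_SCALE = ["Outstanding", "Exceeds", "Meets", "Partially meets", "Does not meet"]


def distribution(ratings):
    """Map each rating level to (count, percent of total), in scale order."""
    counts = Counter(ratings)
    total = len(ratings)
    result = {}
    for level in RATING_SCALE:
        pct = round(100 * counts[level] / total, 1) if total else 0.0
        result[level] = (counts[level], pct)
    return result


def outlier_flags(rows, level="Outstanding", threshold_pts=25.0):
    """Flag managers whose share of `level` differs from the overall share
    by more than `threshold_pts` percentage points (threshold is an assumption)."""
    by_manager = defaultdict(list)
    for manager, rating in rows:
        by_manager[manager].append(rating)
    overall = distribution([rating for _, rating in rows])[level][1]
    flags = []
    for manager, ratings in sorted(by_manager.items()):
        share = distribution(ratings)[level][1]
        if abs(share - overall) > threshold_pts:
            direction = "above" if share > overall else "below"
            flags.append(
                f"{manager}: {share}% rated {level} "
                f"({direction} overall share of {overall}%)"
            )
    return flags
```

A flag here is a prompt for discussion, not a verdict: a manager with a small or unusually strong team can legitimately differ from the overall distribution.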

4. Run the calibration session

Calibration sessions run 2–3 hours for groups of 5–10 managers. Larger groups require breakout calibration groups with a synthesis session.
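
For larger groups, the breakout split can be sketched as below. This assumes the upper bound of 10 managers per group from the guidance above and splits as evenly as possible. The `breakout_groups` helper is hypothetical, and in practice groups are usually formed around comparable roles or functions rather than by list order.

```python
import math


def breakout_groups(managers, max_size=10):
    """Split managers into the fewest groups of at most `max_size`,
    keeping group sizes as even as possible."""
    if not managers:
        return []
    n_groups = math.ceil(len(managers) / max_size)
    base, extra = divmod(len(managers), n_groups)
    groups, start = [], 0
    for i in range(n_groups):
        # The first `extra` groups take one additional manager.
        size = base + (1 if i < extra else 0)
        groups.append(managers[start:start + size])
        start += size
    return groups
```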

Facilitation protocol:

Open with calibration anchors (15 min): Align on what each rating level means using behavioral anchors, not just label definitions:

"Outstanding": demonstrated exceptional impact significantly above role expectations, 
with specific outcomes that would not have happened without this person's contribution.

"Meets expectations": delivered on all core responsibilities, achieved agreed goals, 
no significant gaps in execution or behavior.

If managers are using "Meets" to mean different things, ratings are incomparable. Anchor alignment is the most important 15 minutes of the session.

Manager-by-manager review (60–90 min): For each manager, briefly review their team's distribution:

- Does the distribution look consistent with what other managers in similar contexts produced?
- Are there outliers in either direction that require discussion?

Focus the group discussion on ratings at the boundaries (the Outstanding/Exceeds boundary and the Meets/Partially meets boundary) and on employees being considered for promotion or performance action.

Discussion protocol for disputed ratings:

Manager presents: "I rated [name] Outstanding because [3 specific evidence points]."
Group can challenge: "That level of impact — can you compare it to someone we all 
agreed was Outstanding last cycle?"
Manager responds with evidence or adjusts.

Keep discussion to observable performance evidence from the review period. Do not allow arguments based on the person's personality, potential, or anecdotes.

Bias check questions (ask throughout):

- "Is this rating based on this review period's performance or on career trajectory?"
- "Would we rate this the same if the employee worked on [another team]?"
- "Is there a recency effect: do the last 6 weeks of the review period dominate the rating?"
- "Have we applied the same standard to comparable employees across different managers?"

Promotion discussion (30 min, if applicable): Surface employees nominated for promotion. For each:

- Presenting manager makes the case with specific evidence
- Group calibrates: "Is this readiness we're seeing from this person, or is this manager advocacy?"
- Group reaches a recommendation: promote, develop for 1 cycle, or not ready

5. Document outcomes and communicate back

After the session:

- Finalize rating decisions and record who made each adjustment and why
- Document promotion recommendations and next steps
- Share the calibrated distribution with HR before any communication to employees

Managers communicate ratings to employees individually. They do not disclose the calibration process details — they share the final rating and the evidence: "Your rating is [X] because [specific evidence]."

6. Use calibration data to improve future cycles

Calibration produces organizational data, not just individual ratings:

- Which managers consistently rate above or below the distribution? Flag for coaching or rater training.
- Are women or underrepresented groups rated differently than comparable performers? Run demographic analysis on the final distribution.
- What did top performers across all managers have in common? Use this to sharpen the success profile for the role.
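
The first two questions can be given a first pass with a simple grouped comparison, sketched below. It rests on several assumptions: ratings are mapped to a 1–5 numeric score, each record is a dict with hypothetical `manager`, `group`, and `rating` keys, and the output is a descriptive gap from the overall mean rather than a statistical test. Small groups will show large gaps by chance, so treat the result as a prompt for closer review.

```python
from collections import defaultdict
from statistics import mean

# Assumed numeric mapping for the five-level scale used in this document.
SCORE = {
    "Outstanding": 5,
    "Exceeds": 4,
    "Meets": 3,
    "Partially meets": 2,
    "Does not meet": 1,
}


def mean_gap_by(records, key):
    """Group records by `key` and return each group's mean score
    minus the overall mean score (positive = rated above average)."""
    overall = mean(SCORE[r["rating"]] for r in records)
    groups = defaultdict(list)
    for r in records:
        groups[r[key]].append(SCORE[r["rating"]])
    return {g: round(mean(scores) - overall, 2) for g, scores in groups.items()}
```

Calling `mean_gap_by(records, "manager")` gives a rough view of rater leniency or harshness, and `mean_gap_by(records, "group")` gives the same view across demographic groups on the final calibrated ratings.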

Rules

- Submit ratings and evidence before the session: in-session rating assignment is dominated by social dynamics, not evidence.
- Anchor on behavioral evidence only: personality observations, potential assessments, and likability are not performance evidence.
- Review the distribution view before individual debates: rater consistency problems are visible at the distribution level, so address them there before defending individual ratings.
- Calibration adjustments are normal, not shameful: the goal is accuracy, and a manager whose rating is adjusted upward or downward has received useful calibration data, not a critique.
- Keep deliberations confidential: employees should not know how their rating was debated in calibration; share only the final rating and the evidence.

Common Mistakes

- Calibrating without behavioral anchors: "Meets expectations" means wildly different things to different managers; anchor alignment at the start is mandatory, not optional.
- Reviewing every employee equally: 90% of calibration time should go to boundary cases and promotion/action decisions; reviewing solid mid-range performers adds no value and consumes the time budget.
- Letting the loudest manager dominate: a single persuasive advocate can shift the group's calibration away from evidence; the facilitator must enforce evidence-only discussion.
- Running calibration as a rubber stamp: if ratings never change in calibration, the session is performative; calibration's value is in the adjustments it produces.
- Not doing demographic analysis: if the calibrated distribution shows systematic underrating of a demographic group, the calibration failed its purpose; run the analysis as a standard closing step.

When NOT to Use

- For teams of fewer than 5 employees reporting to a single manager: calibration requires cross-manager comparison; a single-manager review is a one-on-one discussion with HR, not a calibration.
- When there is no agreed rating scale or shared performance expectations: calibration aligns application of a standard; it cannot create the standard, so define the performance framework first.
