From skills-for-humanity
Tests whether stated confidence levels match available evidence, catching overconfidence and underconfidence. Guides users through evidence auditing, failure mode analysis, and frequency testing to produce calibrated confidence estimates.
How this skill is triggered — by the user, by Claude, or both
Slash command
/skills-for-humanity:s4h-probability-confidence-calibration
The summary Claude sees in its skill listing, used to decide when to auto-load this skill
Overconfidence is the most documented and costly bias in judgment. People who say they are 90% confident are right far less than 90% of the time. But underconfidence is also costly — excessive hedging prevents commitment and action when evidence is actually sufficient. Calibration is not about being less confident; it is about having confidence levels that match the available evidence.
Step 1: State the Claim and Current Confidence
Name the specific claim (a specific, falsifiable statement, not a vague domain) and state the current confidence level as a percentage.
Framing check: Confirm the specific claim and confidence level before continuing. State what you've identified (the actual claim being evaluated and its stated confidence percentage) in one sentence, then use AskUserQuestion:
Step 2: Audit Supporting Evidence
List the evidence supporting the claim. For each piece, classify its type (observation, inference, anecdote, or assumption) and its strength.
Step 3: List Counter-Evidence and Gaps
What evidence exists against the claim? What would you expect to see if the claim were true that you do not see? What have you not checked that bears on the claim?
Step 4: Identify the Most Likely Failure Mode
If this claim is wrong, what is the most probable reason? Is that failure mode being appropriately weighted in the current confidence assessment, or is it being minimized?
Step 5: Apply the Frequency Test
If you made many similar judgments at this confidence level, you would expect to be right X% of the time, where X is the stated confidence. Does that hit rate seem plausible given the quality of the evidence? This frequentist reframing often corrects overconfidence.
Step 6: State Calibrated Confidence
Adjust the confidence level to reflect the evidence quality, gaps, and failure mode weight. State the direction of the adjustment and the reason for it.
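The frequency test in Step 5 can be made concrete when a record of past judgments exists. The following is a minimal Python sketch, not part of the skill itself: the function names `hit_rates` and `calibration_label`, the 5-point `tolerance`, and the sample `history` are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def hit_rates(judgments):
    """Return the observed hit rate for each stated confidence level.

    judgments: iterable of (stated_confidence_percent, was_correct) pairs.
    """
    tally = defaultdict(lambda: [0, 0])  # confidence -> [correct, total]
    for confidence, was_correct in judgments:
        tally[confidence][1] += 1
        if was_correct:
            tally[confidence][0] += 1
    return {c: correct / total for c, (correct, total) in sorted(tally.items())}


def calibration_label(stated_percent, observed_rate, tolerance=5.0):
    """Compare stated confidence with the observed hit rate.

    tolerance (in percentage points) is an assumed threshold, not a standard.
    """
    gap = stated_percent - observed_rate * 100
    if gap > tolerance:
        return "overconfident"
    if gap < -tolerance:
        return "underconfident"
    return "roughly calibrated"


# Hypothetical track record: ten claims stated at 90%, five stated at 60%.
history = [(90, True)] * 7 + [(90, False)] * 3 + [(60, True)] * 3 + [(60, False)] * 2

for stated, rate in hit_rates(history).items():
    print(f"stated {stated}% -> observed {rate:.0%} ({calibration_label(stated, rate)})")
```

With only a handful of judgments per confidence level the observed rates are noisy, so a result like this is a prompt for Step 6 rather than a verdict.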
Before proceeding, use the AskUserQuestion tool. State your interpretation of the situation in 1–2 sentences — what is being analyzed and what the core question is — then ask:
Proceed based on their selection. If the user reframes, incorporate the correction before running any analysis.
Claim: [specific, falsifiable statement]
Original Confidence: [%]
Evidence Quality Audit
| Evidence | Type (observation/inference/anecdote/assumption) | Strength |
|---|---|---|
Counter-Evidence and Gaps: [what works against the claim or has not been checked]
Most Likely Failure Mode: [if wrong, why — and is it being weighted correctly?]
Calibrated Confidence: [%] — [direction: raised/lowered/unchanged + one-sentence rationale]
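If the report is assembled programmatically, the template above might be filled in along these lines. This is an illustrative Python sketch only: the `Evidence` and `CalibrationReport` classes, their field names, and the sample claim are hypothetical, and the skill does not prescribe any data model. The final line is rendered with parentheses as a simplification of the template's wording.

```python
from dataclasses import dataclass, field


@dataclass
class Evidence:
    description: str
    kind: str      # observation, inference, anecdote, or assumption
    strength: str  # free-text label; the scale used here is an assumption


@dataclass
class CalibrationReport:
    claim: str
    original: int      # original confidence, in percent
    calibrated: int    # calibrated confidence, in percent
    rationale: str
    gaps: str
    failure_mode: str
    evidence: list = field(default_factory=list)

    def direction(self):
        """Derive the direction of adjustment from the two percentages."""
        if self.calibrated > self.original:
            return "raised"
        if self.calibrated < self.original:
            return "lowered"
        return "unchanged"

    def render(self):
        """Fill in the output template as plain text."""
        rows = [f"| {e.description} | {e.kind} | {e.strength} |" for e in self.evidence]
        lines = [
            f"Claim: {self.claim}",
            f"Original Confidence: {self.original}%",
            "Evidence Quality Audit",
            "| Evidence | Type (observation/inference/anecdote/assumption) | Strength |",
            "|---|---|---|",
            *rows,
            f"Counter-Evidence and Gaps: {self.gaps}",
            f"Most Likely Failure Mode: {self.failure_mode}",
            f"Calibrated Confidence: {self.calibrated}% ({self.direction()}: {self.rationale})",
        ]
        return "\n".join(lines)


# Hypothetical example values, for illustration only.
report = CalibrationReport(
    claim="The release will ship by the end of the quarter",
    original=90,
    calibrated=70,
    rationale="most supporting evidence is inference, and one dependency is unverified",
    gaps="vendor delivery date has not been confirmed",
    failure_mode="a late dependency blocks final testing",
    evidence=[Evidence("All planned features are merged", "observation", "strong")],
)
print(report.render())
```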
A well-calibrated person is not one who is always uncertain — they are confident when evidence is strong and uncertain when it is weak. The goal is accuracy of confidence, not uniformly lower confidence.
After delivering this output, use AskUserQuestion to offer the next move:
/s4h-probability-scenario-weighting: Weight scenarios with calibrated confidence levels
/s4h-decision-premortem-analysis: Stress-test with calibrated risk estimates
/s4h-decision-reversibility-analysis: Assess how reversibility changes given this uncertainty level
npx claudepluginhub human-avatar/skills-for-humanity
Routes probabilistic thinking to the right skill: base-rate anchoring, confidence calibration, expected value, or scenario weighting. Activates on queries about probability, likelihood, and uncertainty.
Detects and removes cognitive biases from reasoning using Julia Galef's Scout Mindset framework. Provides reversal tests, scope sensitivity checks, status quo bias tests, confidence interval audits, and full bias audits.
Calibrates AI confidence to evidence, flagging uncertainty and limitations before presenting conclusions. Useful when accuracy matters or knowledge is partial.