From decision-analytics-toolkit
Close the loop on past decisions: record the assumptions, predicted ranges, and tripwires at decision time, then later review whether they held up and whether the reasoning was sound. Use this skill whenever someone wants to track a decision over time, journal a decision and its rationale, follow up on an earlier call, check whether forecasts or estimates came true, run a post-mortem or after-action review, separate a good decision from a lucky outcome, or measure how well-calibrated their predictions are. Trigger on phrases like "did my estimate hold up," "review that decision," "track this over time," "decision journal," "follow up in six months," "was I right," "post-mortem," "after-action review," "calibration," "how good are my forecasts," or any request to capture a decision now for later scoring, or to score a decision made earlier — including the predictions, assumptions, and Monte Carlo / scenario outputs from the other toolkit skills.
Slash command: /decision-analytics-toolkit:decision-review
Most decision tools stop at the moment of choice. This one covers what comes after: did the estimates and assumptions actually hold, and — separately — was the reasoning sound? That distinction is the whole point. A good decision can have a bad outcome (you took a sensible bet and got unlucky) and a bad decision can have a good outcome (you got lucky). Judging decisions only by results ("resulting") teaches the wrong lessons. Reviewing the process, the assumptions, and the predicted ranges teaches the right ones — and tracked across many decisions, it tells you whether your confidence is honest (calibration).
This skill closes the loop for the whole toolkit: the ranges from a Monte Carlo run, the probabilities from a scenario analysis, the assumptions from a Key Assumptions Check, and the tripwires from an Indicators & Warnings list all become things you can score later.
Create a record with scripts/decision_record.py --new --out <file>.json, then fill it in
(or build it directly from another skill's output). Store records together in one folder
(e.g. decisions/) so calibration can aggregate them later.
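A minimal sketch of what a record might hold, assuming a JSON layout like the one below. The skill text only names review_schedule, resolves_on, actual values, and review entries; every other field (title, decided_on, choice, low, high, confidence, probability, and so on) is a hypothetical stand-in, not the schema that scripts/decision_record.py actually writes.

```python
import json
import os
from dataclasses import asdict, dataclass, field
from typing import List, Optional, Union


@dataclass
class Prediction:
    # Hypothetical fields; only resolves_on and the actual value come from the skill text.
    id: str
    statement: str
    resolves_on: str                     # ISO date when the claim can be judged
    low: Optional[float] = None          # interval bounds, e.g. a Monte Carlo range
    high: Optional[float] = None
    confidence: Optional[float] = None   # 0.80 for an 80% interval
    probability: Optional[float] = None  # for yes/no claims
    actual: Optional[Union[float, bool]] = None  # filled in at review time


@dataclass
class DecisionRecord:
    title: str
    decided_on: str
    choice: str
    predictions: List[Prediction] = field(default_factory=list)
    assumptions: List[str] = field(default_factory=list)      # each phrased so it can later be judged held/broke
    review_schedule: List[str] = field(default_factory=list)  # ISO review dates
    reviews: List[dict] = field(default_factory=list)         # appended at each review


def save(record: DecisionRecord, path: str) -> None:
    """Write the record as JSON, creating the folder (e.g. decisions/) if needed."""
    os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
    with open(path, "w", encoding="utf-8") as f:
        json.dump(asdict(record), f, indent=2)
```

Saving every record under one folder, for example save(record, "decisions/portfolio.json"), keeps them where a later calibration pass can find them.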
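What makes a record reviewable later is falsifiable predictions and assumptions, each with a resolution date and a clear test for "did this hold?" Pull these straight from the other skills' outputs: Monte Carlo ranges, scenario probabilities, Key Assumptions Check assumptions, and Indicators & Warnings tripwires.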
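One way to turn a Monte Carlo run into scoreable predictions, sketched under the assumption that the simulation hands back a plain list of sampled outcomes. An 80% interval is read off as the 10th to 90th percentile of the samples, and a probability claim as the share of samples above a threshold. The function names are hypothetical and are not part of the toolkit's scripts.

```python
def percentile(sorted_xs, q):
    """Linear-interpolated quantile (q in [0, 1]) of an already sorted list."""
    pos = q * (len(sorted_xs) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_xs) - 1)
    return sorted_xs[lo] + (sorted_xs[hi] - sorted_xs[lo]) * (pos - lo)


def interval_prediction(samples, confidence=0.80):
    """Central interval holding `confidence` of the simulated outcomes."""
    xs = sorted(samples)
    tail = (1 - confidence) / 2  # 0.10 on each side for an 80% interval
    return {"low": percentile(xs, tail),
            "high": percentile(xs, 1 - tail),
            "confidence": confidence}


def probability_prediction(samples, threshold):
    """Share of simulated outcomes above `threshold` (e.g. beating inflation)."""
    return sum(x > threshold for x in samples) / len(samples)
```

Store either result with a resolves_on date and it becomes a prediction that can be scored later.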
Good predictions are specific, measurable, and dated. "The market will do fine" can't be
scored; "10-year value between $61k and $129k (80% interval)" and "≥83% chance of beating
3% inflation by 2036" can. Validate a finished record with
scripts/decision_record.py <file>.json, which also lists which predictions are due.
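The kind of check a validator performs can be sketched as follows. This is an illustration, not the logic of scripts/decision_record.py: it assumes hypothetical field names (low, high, probability, actual) alongside resolves_on, and treats a prediction as due once its resolves_on date has passed without an actual value.

```python
from datetime import date


def check_prediction(p):
    """Return a list of problems that would make this prediction unscoreable."""
    problems = []
    try:
        date.fromisoformat(p.get("resolves_on", ""))
    except (TypeError, ValueError):
        problems.append("missing or invalid resolves_on date")
    has_interval = p.get("low") is not None and p.get("high") is not None
    has_prob = p.get("probability") is not None
    if not (has_interval or has_prob):
        problems.append("needs an interval (low/high) or a probability")
    if has_interval and p["low"] > p["high"]:
        problems.append("low bound exceeds high bound")
    if has_prob and not 0 <= p["probability"] <= 1:
        problems.append("probability must be between 0 and 1")
    return problems


def due_predictions(record, today):
    """IDs of valid predictions whose resolution date has passed and lack an actual."""
    return [
        p["id"] for p in record.get("predictions", [])
        if not check_prediction(p)
        and date.fromisoformat(p["resolves_on"]) <= today
        and p.get("actual") is None
    ]
```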
The review dates live in the record's review_schedule. Resurface them in either or both of
two ways: an in-app reminder, or a calendar event on the review date. For a calendar event, use
find_free_time / availability where useful so the review lands on a real working block. If no
calendar is connected, suggest connecting one, or fall back to the in-app reminder.
When a decision has natural milestone dates (a contract renewal, a launch, a forecast
horizon), schedule reviews to those dates rather than arbitrary intervals; the record's
resolves_on fields are the natural anchors.
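Anchoring reviews to milestones can be sketched as collecting the distinct resolves_on dates from a record, plus any extra check-ins, into a sorted schedule. The function name and the extra_dates parameter are hypothetical; the result is the kind of list you might store in review_schedule.

```python
from datetime import date


def review_schedule_from_milestones(record, extra_dates=()):
    """Sorted, de-duplicated ISO review dates anchored to each resolves_on."""
    dates = {
        date.fromisoformat(p["resolves_on"])
        for p in record.get("predictions", [])
        if p.get("resolves_on")
    }
    # Optional interim check-ins (e.g. before a contract renewal) join the same set.
    dates.update(date.fromisoformat(d) for d in extra_dates)
    return [d.isoformat() for d in sorted(dates)]
```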
At the review date, fill in the actual value for each prediction and add a review entry
(the script prints the template), then run scripts/score_review.py <file>.json, which
reports the scored results for that decision.
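A minimal sketch of per-decision scoring, assuming hypothetical field names (low, high, confidence, probability, actual) rather than the real output of scripts/score_review.py. An interval prediction scores a hit when the actual value falls inside [low, high]; a probability prediction gets a Brier score, (p - o)^2 where o is 1 if the event happened and 0 if not, so 0 is perfect and 1 is the worst possible.

```python
def score_prediction(p):
    """Score one resolved prediction; returns None if it has no actual yet."""
    if p.get("actual") is None:
        return None
    if p.get("probability") is not None:
        outcome = 1.0 if p["actual"] else 0.0  # actual is True/False for yes/no claims
        return {"id": p["id"], "brier": (p["probability"] - outcome) ** 2}
    hit = p["low"] <= p["actual"] <= p["high"]
    return {"id": p["id"], "hit": hit, "confidence": p.get("confidence")}


def score_decision(record):
    """Per-prediction scores plus this decision's hit-rate and mean Brier score."""
    scores = [s for s in map(score_prediction, record.get("predictions", [])) if s]
    intervals = [s for s in scores if "hit" in s]
    briers = [s["brier"] for s in scores if "brier" in s]
    return {
        "scores": scores,
        "interval_hit_rate": sum(s["hit"] for s in intervals) / len(intervals) if intervals else None,
        "mean_brier": sum(briers) / len(briers) if briers else None,
    }
```

A single decision yields only a handful of scores, so treat these numbers as inputs to the calibration pass rather than a verdict on the decision.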
Then make the judgment the script can't: grade the process separately from the outcome. Ask "given only what I knew then, was this a reasonable choice?" Record that grade, the lessons, and concrete actions (rebalance, revise a model input, change a weight, update a belief). The aim is to improve future decisions, not to assign blame for variance.
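The script cannot make the process judgment, but a review entry can keep the two grades apart. A minimal sketch, assuming hypothetical field names and quadrant labels: the caller supplies process_sound and outcome_good as human judgments, and the entry records which of the four combinations applies, so a lucky win is flagged for scrutiny rather than celebrated.

```python
from datetime import date

# Hypothetical labels for the four process/outcome combinations.
QUADRANTS = {
    (True, True): "sound process, good outcome",
    (True, False): "sound process, bad outcome (unlucky)",
    (False, True): "weak process, good outcome (lucky; scrutinize)",
    (False, False): "weak process, bad outcome",
}


def review_entry(process_sound, outcome_good, lessons, actions, on=None):
    """Build a review entry that stores the process and outcome grades separately.

    The grades are human judgments; this function only records them.
    """
    return {
        "reviewed_on": (on or date.today()).isoformat(),
        "process_sound": bool(process_sound),
        "outcome_good": bool(outcome_good),
        "quadrant": QUADRANTS[(bool(process_sound), bool(outcome_good))],
        "lessons": list(lessons),
        "actions": list(actions),
    }
```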
Point scripts/score_review.py --calibrate <folder>/ at a folder of reviewed records to
aggregate: overall interval hit-rate (is ~80% of reality landing in your 80% intervals?),
mean Brier score, and a calibration table bucketing your stated probabilities against how
often those predictions came true. The classic finding is overconfidence — intervals too
narrow, high-confidence calls missing more than they should. If you see it, widen your
ranges and discount your certainty; that single correction improves every future decision
the toolkit helps you make.
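The aggregation can be sketched like this, assuming a hypothetical record layout (low/high/confidence for intervals, probability plus a true/false actual for yes/no claims); it is not the implementation behind --calibrate. Probabilities are bucketed into 10% bins, and each table row pairs a bin with how many predictions fell in it and how often they came true.

```python
import json
from collections import defaultdict
from pathlib import Path


def load_records(folder):
    """Read every *.json decision record in a folder (e.g. decisions/)."""
    return [json.loads(p.read_text(encoding="utf-8"))
            for p in sorted(Path(folder).glob("*.json"))]


def calibrate(records, bins=10):
    """Aggregate interval hit-rate, mean Brier score, and a calibration table."""
    hits, confs, briers = [], [], []
    table = defaultdict(lambda: [0, 0])  # bin index -> [count, came true]
    for rec in records:
        for p in rec.get("predictions", []):
            if p.get("actual") is None:
                continue  # unresolved predictions can't be scored yet
            if p.get("probability") is not None:
                prob = p["probability"]
                outcome = 1 if p["actual"] else 0
                briers.append((prob - outcome) ** 2)
                b = min(int(prob * bins), bins - 1)  # 1.0 goes in the top bin
                table[b][0] += 1
                table[b][1] += outcome
            elif p.get("low") is not None and p.get("high") is not None:
                hits.append(p["low"] <= p["actual"] <= p["high"])
                if p.get("confidence") is not None:
                    confs.append(p["confidence"])
    rows = [(f"{b * 100 // bins}-{(b + 1) * 100 // bins}%", n, came_true / n)
            for b, (n, came_true) in sorted(table.items())]
    return {
        "interval_hit_rate": sum(hits) / len(hits) if hits else None,
        "mean_interval_confidence": sum(confs) / len(confs) if confs else None,
        "mean_brier": sum(briers) / len(briers) if briers else None,
        "calibration_table": rows,
    }
```

Overconfidence shows up as an interval hit-rate below the mean stated confidence, or as rows whose observed frequency sits below the bin's range (say, 0.6 in the 90-100% row).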
Resulting is the enemy. Always separate decision quality from outcome quality. Reward sound process even when the result disappointed; scrutinize lucky wins.
Only falsifiable claims teach. If a prediction or assumption can't be clearly judged true or false later, rewrite it until it can, or drop it.
Update, don't rationalize. A broken assumption or a missed interval is information, not failure. The value is in changing your model and your calibration, not in defending the original call.
Keep records together and revisit on cadence. Calibration only emerges across many decisions, so store records in one place and actually return to them.
For a capture, deliver the saved record and offer to schedule the review (reminder and/or calendar event). For a review, deliver the scored results, the process-vs-outcome judgment, and a short lessons-and-actions list — and offer to save it back into the record and to update any downstream model (e.g. re-run the Monte Carlo with corrected assumptions). For a calibration pass, deliver the aggregate hit-rate, Brier score, and calibration table with a plain-language read of where confidence needs adjusting.
npx claudepluginhub jdstanhope/decision-analytics-marketplace --plugin decision-analytics-toolkit