From tailored-reviewer
This skill should be used when the user asks to "backtest review skills", "test detection rate", "バックテスト", "レビュースキルをテスト", "検出率を測定", "過去のバグで検証", or wants to verify that generated review skills can detect known bugs by replaying historical states.
Slash command
/tailored-reviewer:backtest
Test generated review skills against historical bugs by replaying the codebase state at the time each bug was introduced. Measures both recall (did we catch known bugs?) and precision (were our findings validated by subsequent fixes?).
Prerequisites:
When invoked with --add-case or when the user wants to add a test case:
Prompt for:
Append to backtest/test-cases.md:
### Case [N]: [brief title]
- **Source**: [PR/JIRA/Sentry/postmortem reference]
- **Bug commit**: [hash]
- **Fix commit**: [hash] (if available)
- **Description**: [what the bug was]
- **Expected perspective**: [which perspective should detect it]
- **Added**: [date]
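The append step above can be sketched in Python as follows. This is only an illustration of the entry format: the function names (`next_case_number`, `format_test_case`, `append_test_case`) are hypothetical and not part of the plugin, and numbering cases by the highest existing `### Case N` header is an assumption.

```python
import re
from datetime import date
from pathlib import Path

CASE_HEADER = re.compile(r"^### Case (\d+):", re.MULTILINE)


def next_case_number(existing_text: str) -> int:
    """Return one more than the highest case number already in the file."""
    numbers = [int(n) for n in CASE_HEADER.findall(existing_text)]
    return max(numbers, default=0) + 1


def format_test_case(number, title, source, bug_commit, description,
                     expected_perspective, fix_commit=None, added=None):
    """Render one entry in the test-cases.md format shown above."""
    added = added or date.today().isoformat()
    lines = [
        f"### Case {number}: {title}",
        f"- **Source**: {source}",
        f"- **Bug commit**: {bug_commit}",
    ]
    if fix_commit:  # the fix commit is optional ("if available")
        lines.append(f"- **Fix commit**: {fix_commit}")
    lines += [
        f"- **Description**: {description}",
        f"- **Expected perspective**: {expected_perspective}",
        f"- **Added**: {added}",
    ]
    return "\n".join(lines) + "\n"


def append_test_case(path, **fields) -> int:
    """Append a new case to the file at `path` and return its number."""
    path = Path(path)
    existing = path.read_text(encoding="utf-8") if path.exists() else ""
    number = next_case_number(existing)
    entry = format_test_case(number, **fields)
    # Keep exactly one blank line between entries.
    prefix = existing.rstrip("\n") + "\n\n" if existing.strip() else ""
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(prefix + entry, encoding="utf-8")
    return number
```

Calling `append_test_case("backtest/test-cases.md", title=..., source=..., bug_commit=..., description=..., expected_perspective=...)` would add the next numbered case; the placeholder values come from the prompts.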
When importing from interview data:
For each test case in backtest/test-cases.md:
1. Checkout: cd workspace && git checkout {bug_commit} (the state WITH the bug)
2. Generate diff: git diff {bug_commit}~1 {bug_commit} (the buggy change)
3. Execute review: Run the review entry point (review-* skill in .claude/skills/) against this diff, with the --backtest context flag. The consolidation step will save the review to reviews/ as usual; backtest does NOT change the review output location.
4. Evaluate detection (recall): Did any finding match the known bug?
5. Evaluate precision (forward validation): For findings that DON'T match the known bug:
   git log {bug_commit}..{default_branch} -- {file_path}
6. Restore: cd workspace && git checkout {default_branch}
IMPORTANT: In step 1, check out {bug_commit} (NOT {bug_commit}~1). The workspace must contain the buggy code so that the orchestrator's Phase 1.5 fact-check (workspace verification) can confirm the bug exists. If the workspace were at {bug_commit}~1, the buggy code would not exist in the workspace and all findings would be falsely dropped.
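As a rough sketch, the checkout, diff, and restore sequence above could be driven from Python like this. The `review` callable is a hypothetical stand-in for the review entry point, and `git_runner`, `replay_case`, and `later_commits` are illustrative names, not part of the plugin. The git commands are the ones given in the steps, with `--oneline` added to `git log` for compact output.

```python
import subprocess
from typing import Callable, List, Sequence

Runner = Callable[[Sequence[str]], str]


def git_runner(workspace: str) -> Runner:
    """Return a function that runs `git <args>` inside the workspace."""
    def run(args: Sequence[str]) -> str:
        result = subprocess.run(
            ["git", *args], cwd=workspace,
            check=True, capture_output=True, text=True,
        )
        return result.stdout
    return run


def replay_case(bug_commit: str, default_branch: str,
                run: Runner, review: Callable[[str], object]):
    """Check out the buggy state, review the buggy diff, then restore."""
    # Check out bug_commit itself (not its parent) so the workspace
    # contains the buggy code during the review.
    run(["checkout", bug_commit])
    try:
        diff = run(["diff", f"{bug_commit}~1", bug_commit])
        return review(diff)  # hypothetical hook for the review entry point
    finally:
        # Restore the workspace even if the review raises.
        run(["checkout", default_branch])


def later_commits(run: Runner, bug_commit: str, default_branch: str,
                  file_path: str) -> List[str]:
    """List commits after bug_commit that touched file_path."""
    out = run(["log", "--oneline", f"{bug_commit}..{default_branch}",
               "--", file_path])
    return [line for line in out.splitlines() if line.strip()]
```

A later commit touching the same file is only a candidate for validation; whether it actually fixes the finding still has to be judged from the commit itself.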
The review itself is saved to reviews/ by the orchestrator (same as any normal review).
The backtest evaluation (recall/precision analysis) is written separately to backtest/results/YYYY-MM-DD-{target}.md:
# Backtest Results: [date]
## Summary
- Test cases: N
- Detected (recall): N/N (X%)
- Partial: N (X%)
- Missed: N (X%)
- Additional findings: N
- Validated by subsequent fixes: N (X%)
- Unvalidated: N
## Per-Case Results
### Case [N]: [title]
- **Known bug result**: detected / partial / missed
- **Detecting perspective**: [which perspective found it, if any]
- **Finding**: [the relevant finding, if any]
- **Notes**: [why it was missed, if applicable]
- **Additional findings**: N
- Validated: [list findings that were later fixed, with fix commit hash]
- Unvalidated: [list findings with no subsequent fix]
## Analysis
### Recall (known bug detection)
[Perspectives or bug types with low detection rate]
### Precision (forward validation)
[Rate of findings validated by subsequent fixes]
[High validation rate = review is finding real issues]
[Low validation rate = review may be producing noise]
### Recommendations
[Specific suggestions for skill improvement based on misses and validation rates]
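The Summary figures in the template above are simple counts and ratios. A minimal sketch of how they could be computed, assuming each case has already been classified by hand or by the reviewer; `CaseResult` and `summarize` are hypothetical names, and rounding percentages to whole numbers is an assumption:

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class CaseResult:
    title: str
    outcome: str  # "detected", "partial", or "missed"
    validated: List[str] = field(default_factory=list)    # later fixed
    unvalidated: List[str] = field(default_factory=list)  # no later fix


def pct(part: int, whole: int) -> int:
    """Whole-number percentage; 0 when there is nothing to divide by."""
    return 0 if whole == 0 else round(100 * part / whole)


def summarize(results: List[CaseResult]) -> str:
    """Render the Summary bullet list from per-case results."""
    total = len(results)
    detected = sum(r.outcome == "detected" for r in results)
    partial = sum(r.outcome == "partial" for r in results)
    missed = sum(r.outcome == "missed" for r in results)
    validated = sum(len(r.validated) for r in results)
    unvalidated = sum(len(r.unvalidated) for r in results)
    additional = validated + unvalidated
    return "\n".join([
        f"- Test cases: {total}",
        f"- Detected (recall): {detected}/{total} ({pct(detected, total)}%)",
        f"- Partial: {partial} ({pct(partial, total)}%)",
        f"- Missed: {missed} ({pct(missed, total)}%)",
        f"- Additional findings: {additional}",
        f"- Validated by subsequent fixes: {validated} ({pct(validated, additional)}%)",
        f"- Unvalidated: {unvalidated}",
    ])
```

Recall is computed over test cases, while the validation rate is computed over additional findings only, so the two percentages have different denominators.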
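Analyze the MISS and Partial results from the backtest and append them to backtest/learnings.md in structured form.
This file is read by build-skills and update-skills and is reflected in skill generation.
For each MISS or Partial match:
Root cause analysis: Why was it not detected?
Pattern extraction: Convert it into a reusable detection rule
Append: Add an entry to backtest/learnings.md in the following format
### Learning [N]: [pattern name]
- **Source**: backtest [date], Case [N] (MISS/Partial)
- **Bug**: [what happened]
- **Root cause**: [why it was not detected]
- **Check to add**: [what specifically should be checked]
- **Target perspective**: [which perspective it should be added to]
- **Pattern type**: code-symmetry / state-transition / boundary-check / ...
- **Added**: [date]
Do not append an entry that duplicates an existing learning.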
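The append-unless-duplicate rule above might look like the following sketch. The source does not define what counts as a duplicate, so matching on the pattern name (ignoring case and extra whitespace) is purely an assumption, and `append_learning` is a hypothetical helper, not part of the plugin.

```python
import re
from typing import List

LEARNING_HEADER = re.compile(r"^### Learning (\d+): (.+)$", re.MULTILINE)


def normalize(name: str) -> str:
    """Lowercase and collapse whitespace for a loose name comparison."""
    return " ".join(name.lower().split())


def append_learning(existing: str, pattern_name: str,
                    body_lines: List[str]) -> str:
    """Return the learnings text with a new entry, unless it is a duplicate."""
    headers = LEARNING_HEADER.findall(existing)
    # Assumption: two learnings are duplicates when their names match.
    if any(normalize(name) == normalize(pattern_name) for _, name in headers):
        return existing
    number = max((int(n) for n, _ in headers), default=0) + 1
    entry = "\n".join([f"### Learning {number}: {pattern_name}", *body_lines])
    prefix = existing.rstrip("\n") + "\n\n" if existing.strip() else ""
    return prefix + entry + "\n"
```

In practice a duplicate may describe the same check under a different name, so a name match is only a first filter before a manual comparison.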
A review system with high recall AND high precision is genuinely useful — it catches known bugs and also surfaces issues that developers independently recognized and fixed.
Compare with previous backtest results to track improvement over time.
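One way to track that change over time is to read the recall line back out of two result files. This sketch assumes the reports follow the Summary format shown earlier, in particular a line of the form `- Detected (recall): N/N (X%)`; `recall_from_report` and `recall_delta` are hypothetical names.

```python
import re
from typing import Optional

RECALL_LINE = re.compile(r"^- Detected \(recall\): (\d+)/(\d+)", re.MULTILINE)


def recall_from_report(text: str) -> Optional[float]:
    """Extract recall as a fraction, or None if the line is absent."""
    match = RECALL_LINE.search(text)
    if match is None:
        return None
    detected, total = int(match.group(1)), int(match.group(2))
    return detected / total if total else None


def recall_delta(previous: str, current: str) -> Optional[float]:
    """Change in recall between two reports (positive = improvement)."""
    before = recall_from_report(previous)
    after = recall_from_report(current)
    if before is None or after is None:
        return None
    return after - before
```

The comparison is only meaningful when both runs used the same set of test cases; if cases were added in between, the change in recall mixes skill improvement with a change in the test set.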
npx claudepluginhub suzuki0keiichi/claude-plugins-suzuki0keiichi --plugin tailored-reviewer