From tribunal
Intelligent routing of tasks to the best CLI tool based on historical performance data, weighted decay scoring, and failure pattern matching.
Slash command: /tribunal:agent-selection
When multiple CLI tools (Gemini CLI, OpenAI Codex, Claude Code, etc.) are available, this skill determines which tool is most likely to succeed for a given task type and tag set. It replaces static ordering with data-driven selection that adapts over time.
Scoring parameters are read from tribunal.yaml under the scoring key:
scoring:
  min_samples: 3              # Minimum observations before using data-driven scoring
  decay_rate: 0.1             # Exponential decay rate (lambda) per day
  failure_penalty_window: 14  # Days to look back for failure patterns
  failure_penalty_step: 0.15  # Score reduction per matching failure
  failure_penalty_floor: 0.3  # Minimum penalty multiplier
  static_priority:
    - gemini
    - codex
    - claude
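A minimal sketch of how the parsed config could be resolved before scoring. It assumes the values shown above also serve as defaults and that the YAML has already been parsed into a plain object by some YAML library. `resolveScoringConfig` and `DEFAULTS` are hypothetical names, not part of tribunal.

```javascript
// ASSUMPTION: the values in the sample config double as defaults.
const DEFAULTS = {
  min_samples: 3,
  decay_rate: 0.1,
  failure_penalty_window: 14,
  failure_penalty_step: 0.15,
  failure_penalty_floor: 0.3,
};

// Hypothetical helper: merge the `scoring` section over the defaults and
// reject values that would make the formulas meaningless.
function resolveScoringConfig(parsed) {
  const cfg = { ...DEFAULTS, ...(parsed && parsed.scoring) };
  if (!Number.isInteger(cfg.min_samples) || cfg.min_samples < 0) {
    throw new RangeError('min_samples must be a non-negative integer');
  }
  if (!(cfg.decay_rate >= 0)) {
    throw new RangeError('decay_rate must be >= 0');
  }
  if (!(cfg.failure_penalty_floor >= 0 && cfg.failure_penalty_floor <= 1)) {
    throw new RangeError('failure_penalty_floor must be between 0 and 1');
  }
  return cfg;
}
```

Validating up front keeps a typo in tribunal.yaml from silently producing NaN scores later.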
For tools with at least min_samples observations:
raw_score = sum(outcome_i * e^(-lambda * age_i)) / sum(e^(-lambda * age_i))
confidence = weight_total / (weight_total + 1)
score = confidence * raw_score + (1 - confidence) * 0.5
Where:
- outcome_i is 1 for success, 0 for failure
- age_i is days since the observation timestamp
- lambda is decay_rate from config
After computing the base score, apply a penalty for recent failures matching the current task tags:
- count is the number of matching failures within the last failure_penalty_window days
- the penalty multiplier is max(failure_penalty_floor, 1.0 - count * failure_penalty_step)
When a tool has fewer than min_samples observations, its score is derived from its position in static_priority:
score = (total_tools - position_index) / total_tools
Position 0 (first in list) gets the highest score.
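The three formulas above can be sketched as small pure functions. This is an illustration under stated assumptions, not the code in lib/agent-scorer.js: the function names are hypothetical, weight_total is assumed to be the sum of the decay weights, a failure is assumed to "match" when it shares at least one tag with the task, and total_tools is assumed to be the length of static_priority.

```javascript
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Age of an observation in (fractional) days relative to `now`.
function ageInDays(obs, now) {
  return (now.getTime() - Date.parse(obs.timestamp)) / MS_PER_DAY;
}

// Weighted decay score, shrunk toward 0.5 when the evidence is thin.
// ASSUMPTION: weight_total = sum of e^(-lambda * age_i).
function decayScore(observations, now, decayRate) {
  let weightedOutcomes = 0;
  let weightTotal = 0;
  for (const obs of observations) {
    const weight = Math.exp(-decayRate * ageInDays(obs, now));
    weightedOutcomes += (obs.success ? 1 : 0) * weight;
    weightTotal += weight;
  }
  if (weightTotal === 0) return 0.5; // no evidence: confidence is 0
  const rawScore = weightedOutcomes / weightTotal;
  const confidence = weightTotal / (weightTotal + 1);
  return confidence * rawScore + (1 - confidence) * 0.5;
}

// Penalty multiplier for recent failures that match the task tags.
// ASSUMPTION: "matching" means sharing at least one tag.
function failurePenalty(observations, taskTags, now, cfg) {
  const count = observations.filter((obs) =>
    !obs.success &&
    ageInDays(obs, now) <= cfg.failure_penalty_window &&
    (obs.tags || []).some((tag) => taskTags.includes(tag))
  ).length;
  return Math.max(cfg.failure_penalty_floor, 1.0 - count * cfg.failure_penalty_step);
}

// Static fallback score from the position in static_priority.
function staticScore(tool, staticPriority) {
  const index = staticPriority.indexOf(tool);
  if (index === -1) return 0; // ASSUMPTION: unlisted tools rank last
  return (staticPriority.length - index) / staticPriority.length;
}
```

As a worked case: three successes observed at the scoring instant each have weight 1, so raw_score = 1, confidence = 3 / 4, and score = 0.75 * 1 + 0.25 * 0.5 = 0.875. With the sample config, two matching failures give a multiplier of 1.0 - 2 * 0.15 = 0.7, and five or more are clamped to the floor of 0.3.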
Task arrives with type + tags
        |
        v
Load tribunal-stats.jsonl
        |
        v
For each available tool:
        |
        +-- samples < min_samples?  --> Use static priority score
        |
        +-- samples >= min_samples? --> Compute weighted decay score
        |                                       |
        |                                       v
        |                               Apply failure pattern penalty
        |                                       |
        v                                       v
Collect all scores
        |
        v
Sort descending by score
        |
        v
Return ranked list
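The flow above can be sketched as a single function. The scoring functions are injected so that only the control flow is shown. `rankTools` and the `scorers` object are hypothetical, and the sketch assumes that a tool's samples are the stats entries matching both the tool and the task type, and that the penalty is applied as a multiplier on the decay score.

```javascript
// Hypothetical sketch of the selection flow; not the scoreTools() in
// lib/agent-scorer.js. `scorers` supplies staticScore(tool),
// decayScore(samples) and failurePenalty(samples).
function rankTools(stats, taskType, available, minSamples, scorers) {
  const ranking = available.map((tool) => {
    // ASSUMPTION: samples are filtered by tool and task type.
    const samples = stats.filter(
      (entry) => entry.tool === tool && entry.task_type === taskType
    );
    if (samples.length < minSamples) {
      // Too little data: fall back to the static priority score.
      return {
        tool,
        score: scorers.staticScore(tool),
        basis: 'static',
        samples: samples.length,
      };
    }
    // Enough data: weighted decay score, then the failure penalty.
    const score = scorers.decayScore(samples) * scorers.failurePenalty(samples);
    return { tool, score, basis: 'weighted_decay', samples: samples.length };
  });
  // Highest score first.
  return ranking.sort((a, b) => b.score - a.score);
}
```

Because the static branch skips the penalty, a rarely used tool can outrank a well-sampled tool that has recently failed on similar tags.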
Each task execution appends a JSONL entry to tribunal-stats.jsonl:
{
  "timestamp": "2026-03-08T12:00:00Z",
  "tool": "codex",
  "task_type": "implementation",
  "success": true,
  "tags": ["database", "migration"],
  "failure_reason": null,
  "failure_category": null,
  "tool_self_report": "completed in 45s, all tests pass"
}
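Fields:
- timestamp: ISO 8601 time of the observation, used to compute age_i
- tool: the CLI tool that ran the task
- task_type: the type of task that was run
- success: true or false, mapped to outcome_i
- tags: the task's tags, used for failure pattern matching
- failure_reason, failure_category: null when the task succeeded
- tool_self_report: the tool's own free-text summary of the run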
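Appending and reading such entries can be sketched with Node's built-in fs module. In the file itself each entry occupies a single line; the multi-line form above is only for readability. `appendObservation` and `readObservations` are hypothetical helpers and are not the `loadStats` exported by lib/agent-scorer.js. Skipping malformed lines is an assumption about desirable behavior.

```javascript
const fs = require('fs');

// Hypothetical helper: append one observation as a single JSON line.
function appendObservation(path, entry) {
  fs.appendFileSync(path, JSON.stringify(entry) + '\n');
}

// Hypothetical reader: parse every non-empty line of the stats file.
// ASSUMPTION: malformed lines are skipped rather than aborting the load.
function readObservations(path) {
  if (!fs.existsSync(path)) return [];
  return fs
    .readFileSync(path, 'utf8')
    .split('\n')
    .filter((line) => line.trim() !== '')
    .flatMap((line) => {
      try {
        return [JSON.parse(line)];
      } catch {
        return [];
      }
    });
}
```

Appending one line per run keeps writes cheap and lets a partially written last line be dropped without losing earlier history.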
node lib/agent-scorer.js \
--stats tribunal-stats.jsonl \
--task-type implementation \
--available "gemini,codex,claude" \
--static-priority "gemini,codex,claude" \
--min-samples 3 \
--decay-rate 0.1 \
--task-tags "database,async"
{
  "ranking": [
    { "tool": "gemini", "score": 0.92, "basis": "weighted_decay", "samples": 12 },
    { "tool": "codex", "score": 0.71, "basis": "weighted_decay", "samples": 8 },
    { "tool": "claude", "score": 0.67, "basis": "static", "samples": 1 }
  ],
  "task_type": "implementation",
  "timestamp": "2026-03-08T12:00:00.000Z"
}
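A caller might consume this JSON roughly as follows. This is a sketch assuming the output shape shown above; `toolOrder` is a hypothetical helper, and trying tools in ranked order as fallbacks is one possible policy, not something the scorer prescribes.

```javascript
// Hypothetical consumer of the scorer's JSON output: returns tool names
// in the order they should be tried, best first.
function toolOrder(outputJson) {
  const { ranking } = JSON.parse(outputJson);
  if (!Array.isArray(ranking)) return [];
  // The ranking is already sorted descending; copy and re-sort defensively.
  return [...ranking].sort((a, b) => b.score - a.score).map((r) => r.tool);
}
```

The first element is the tool to run; the rest can serve as fallbacks if it fails.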
const { scoreTools, loadStats } = require('./lib/agent-scorer');

const stats = loadStats('tribunal-stats.jsonl');
const ranking = scoreTools(
  stats,
  'implementation',               // task type
  ['gemini', 'codex', 'claude'],  // available tools
  ['gemini', 'codex', 'claude'],  // static priority
  3,                              // minSamples
  0.1,                            // decayRate
  ['database', 'async']           // taskTags
);
npx claudepluginhub jpeggdev/buildwithjpegg --plugin tribunal