From tp-sadd
Execute tasks with meta-judge verification: single-task, sequential-steps, parallel-targets, or competitive generation with quality gates
Slash command
/tp-sadd:sadd-execute Task description [--mode single|steps|parallel|competitive] [--files f1,f2]
When to use
When user says 'execute this', 'implement with verification', 'meta-judge this', 'run the task', 'build this with quality gates', 'implement [something]'. IMMEDIATELY when user asks to implement anything that needs independent verification. FIRST when task requires parallel implementation, sequential steps, or competitive generation. DO NOT use when the task is a simple one-liner needing only basic implementation — use a basic sub-agent dispatch instead. DO NOT use when you need to delegate work to a sub-agent without quality verification — use sadd-dispatch instead.
IF single self-contained task needs implementation with quality verification → single mode: parallel meta-judge + implementor, then judge with retry loop
IF task decomposes into ordered steps where each depends on previous → sequential mode: decompose into steps with per-step meta-judge + judge
IF multiple independent targets that can execute simultaneously → parallel mode: independence validation, requirement grouping, parallel dispatch, isolated retries
IF high-stakes best-of-N where quality matters more than speed → competitive mode: 3 generators + meta-judge in parallel, 3 judges, adaptive strategy
IF task is trivial (no verification needed) → use a simple subagent dispatch without verification overhead
IF retries exceed max without passing any mode → escalate to user with failure analysis
Execute tasks using the meta-judge → implement → judge → retry pattern. The orchestrator dispatches, never implements. Four execution modes share this core loop but differ in decomposition strategy, concurrency model, and retry policy.
The orchestrator dispatches, never implements. Reading files, writing code, or running tools directly violates separation of concerns. The orchestrator's job is coordination, not execution. Quality verification uses independent judges with fresh context to prevent confirmation bias.
See the meta-judge evaluation pattern documentation for the core loop, YAML specification structure, threshold scoring, and critical constraints.
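The core loop described above can be sketched roughly as follows. This is a minimal illustration, not the skill's actual implementation: the function name `run_with_verification`, the `Verdict` type, the pass threshold of 4.0, and the retry budget of 3 are all assumptions made for the example, and the three callables stand in for sub-agent dispatches.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Verdict:
    score: float   # judge score for one attempt
    feedback: str  # issues the next attempt should address


def run_with_verification(
    task: str,
    meta_judge: Callable[[str], str],
    implement: Callable[[str, str], str],
    judge: Callable[[str, str], Verdict],
    threshold: float = 4.0,  # hypothetical pass threshold
    max_retries: int = 3,    # hypothetical retry budget
) -> dict:
    # The evaluation spec is generated once and reused for every attempt.
    spec = meta_judge(task)
    feedback = ""
    attempt = 0
    # One initial attempt plus up to max_retries retries.
    for attempt in range(1, max_retries + 2):
        work = implement(task, feedback)
        verdict = judge(work, spec)
        if verdict.score >= threshold:
            return {"status": "passed", "attempts": attempt, "work": work}
        feedback = verdict.feedback
    # Retries exhausted: hand the failure analysis back to the user.
    return {"status": "escalate", "attempts": attempt, "feedback": feedback}
```

The orchestrator in this sketch only coordinates: it never produces `work` itself, and the judge only ever sees the work and the spec.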
| Profile | Model |
|---|---|
| Complex reasoning (architecture, design, critical decisions) | Opus |
| Medium complexity, patterned work | Sonnet |
| Simple transformations | Haiku |
| Default (uncertain) | Opus |
For parallel modes (parallel, competitive), use the same model tier for all concurrent agents.
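As an illustration of the table above, model selection could be expressed as a simple lookup with Opus as the fallback. The profile keys (`"complex"`, `"medium"`, `"simple"`) and both function names are hypothetical labels chosen for this sketch; the skill does not define them.

```python
from typing import List, Optional

# Hypothetical profile keys mapped to the model tiers from the table.
MODEL_BY_PROFILE = {
    "complex": "opus",    # architecture, design, critical decisions
    "medium": "sonnet",   # medium complexity, patterned work
    "simple": "haiku",    # simple transformations
}


def select_model(profile: Optional[str]) -> str:
    # Unknown or missing profiles fall back to the default tier.
    return MODEL_BY_PROFILE.get(profile or "", "opus")


def models_for_concurrent_agents(profile: Optional[str], agent_count: int) -> List[str]:
    # Parallel and competitive modes use one tier for all concurrent agents.
    return [select_model(profile)] * agent_count
```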
Execute one self-contained task with parallel meta-judge + implementor dispatch, then judge verification and retry loop.
When to use: Single, self-contained task that produces a coherent output. Not for multi-step workflows or parallel targets.
Process:
Decompose a complex task into ordered, dependent subtasks with per-step meta-judge + judge and context passing between steps.
When to use: Complex tasks with natural decomposition boundaries and dependency ordering. Not for branching or parallel paths.
Decomposition patterns:
| Task Type | Decomposition |
|---|---|
| Interface change | Interface → Implementation → Consumers → Tests |
| Feature addition | Core logic → Integration → API layer |
| Refactoring | Extract/modify core → Internal references → External references |
| Multi-layer change | Data layer → Business layer → API layer → Client layer |
Process:
Task Decomposition — Identify natural boundaries and dependencies. Output a decomposition table with step number, description, dependencies, complexity, type, and expected output. Include a dependency graph showing sequencing.
Per-Step Execution — For each step in order:
Final Summary — Report task, total steps, per-step results (model, judge score, retries, status), files modified, key decisions, verification summary. Judge reports are written to .specs/reports/.
Error handling: When a step reaches max retries, stop and escalate with failure analysis. When context is missing, re-examine the previous step's output or dispatch a clarification sub-agent. When steps conflict, stop and check whether the decomposition is correct; options include re-ordering steps, combining them, or adding a reconciliation step.
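One way to turn a decomposition table into an execution order is a topological sort over the declared dependencies. The sketch below uses Python's standard `graphlib` module; the step names mirror the "Interface change" pattern, and `execution_order` is a hypothetical helper, not part of the skill.

```python
from graphlib import CycleError, TopologicalSorter


def execution_order(dependencies: dict[str, list[str]]) -> list[str]:
    """Return steps in an order that respects each step's dependencies."""
    try:
        # Each key maps to the steps that must finish before it starts.
        return list(TopologicalSorter(dependencies).static_order())
    except CycleError as exc:
        # A cycle means the decomposition itself needs rework.
        raise ValueError(f"decomposition has a dependency cycle: {exc.args[1]}") from exc


# Interface → Implementation → Consumers → Tests
steps = {
    "interface": [],
    "implementation": ["interface"],
    "consumers": ["implementation"],
    "tests": ["consumers"],
}
order = execution_order(steps)
```

A cycle surfaces as an error before any step is dispatched, which corresponds to the "analyze decomposition correctness" branch of the error handling.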
Execute multiple independent tasks simultaneously with requirement grouping to minimize agent count, plus isolated retries per target.
When to use: Multiple independent targets with no shared files or state. Not for interdependent tasks.
Process:
Target Identification — Extract targets from --files, --targets, or infer from task description.
Independence Validation — Verify: no shared files between targets, no target reads another's output, no shared mutable state, execution order does not matter. If any check fails, inform user and recommend sequential mode.
Requirement Grouping — Group tasks to minimize agent count:
| Grouping | When | Meta-Judges | Judges |
|---|---|---|---|
| REPEATABLE | Same task applied to different targets | ONE shared (reusable spec, generic language) | One per target, SAME reusable spec |
| SHARED | Interdependent tasks reviewed together | ONE combined (covers all tasks) | ONE for entire group, combined spec |
| INDEPENDENT | Fully separate, no grouping benefit | One per task | One per task |
Decision rule: Default to INDEPENDENT when uncertain. Over-grouping risks incorrect evaluation specs. Implementation agents are always isolated — one per task, never shared.
Meta-Judge Dispatch (ALL in parallel) — Launch one meta-judge per group/independent task. Launch each implementor immediately after its meta-judge completes (do not wait for all meta-judges).
Parallel Implementation — Launch ALL implementation agents in a single message. Each with CoT prefix, task body (target-specific), and self-critique suffix. Each ends with Summary section.
Judge Verification — After ALL implementors complete, dispatch judges per grouping. Include "Pre-existing or Expected Parallel Changes" section. Parse only structured headers.
Retry Loop — Isolated retries per target (max 3). For SHARED groups: re-launch only the failing implementor(s), then re-launch shared judge against ALL changes (passed + retried). Failed targets are isolated and do not affect other targets.
Output Summary — Per-target results (grouping, model, judge score, retries, status), overall completion stats, files modified, any failed targets with options.
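The independence validation step in the process above could be approximated with set intersections over each target's declared files. This sketch covers only the first two checks (shared files and reading another target's output); shared mutable state and order sensitivity are not detectable this way. The input shape, with `"reads"` and `"writes"` sets per target, is an assumption made for the example.

```python
from itertools import combinations


def independence_violations(targets: dict[str, dict[str, set[str]]]) -> list[str]:
    """List reasons the targets cannot safely run in parallel."""
    problems = []
    for (a, ta), (b, tb) in combinations(targets.items(), 2):
        shared = ta["writes"] & tb["writes"]
        if shared:
            problems.append(f"{a} and {b} both modify {sorted(shared)}")
        if ta["reads"] & tb["writes"]:
            problems.append(f"{a} reads files written by {b}")
        if tb["reads"] & ta["writes"]:
            problems.append(f"{b} reads files written by {a}")
    return problems


# Hypothetical targets: two services that touch disjoint files.
targets = {
    "auth": {"reads": {"config.yaml"}, "writes": {"auth.py"}},
    "billing": {"reads": {"config.yaml"}, "writes": {"billing.py"}},
}
violations = independence_violations(targets)
```

An empty list means these two checks passed; any entry is a reason to inform the user and recommend sequential mode instead.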
Generate 3 competing solutions in parallel, evaluate with 3 judges, then adaptively select the best strategy for final output.
When to use: High-stakes tasks where quality matters more than speed or cost. Not when a single good solution suffices.
Process:
Competitive Generation + Meta-Judge — Launch 4 agents in a single message. Meta-judge first in dispatch order:
Multi-Judge Evaluation — Launch 3 judges in parallel. Each receives ALL candidate solution paths and the EXACT meta-judge specification YAML. Each produces structured report with: VOTE (preferred solution), SCORES per solution, CRITERIA scores, and evidence-based justification. Parse only structured headers.
Adaptive Strategy Selection — Analyze judge vote headers:
| Condition | Strategy | Action |
|---|---|---|
| Unanimous vote (all 3 prefer same) | SELECT_AND_POLISH | Polish winner with targeted improvements from judge feedback. Cherry-pick 1-2 elements from runners-up if praised. |
| All avg scores < 3.0 | REDESIGN | Analyze failure modes across all solutions, extract lessons learned, regenerate with new constraints and guidance |
| Split decision (no unanimous vote, scores >= 3.0) | FULL_SYNTHESIS | Proceed to the Synthesis phase |
Synthesis (FULL_SYNTHESIS only) — Launch one synthesis agent receiving ALL candidate solutions and ALL evaluation reports. Copies superior sections when one solution wins, combines approaches when hybrid is better, fixes identified issues. Documents every decision with rationale. Must create something new, not rewrite entirely.
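The strategy table above can be read as a small decision function. The sketch below checks the rows in table order, which is an assumption: the source does not say which rule wins when a vote is unanimous but every average score is below 3.0, nor how to treat a split vote where only some averages fall below 3.0 (treated here as FULL_SYNTHESIS). `select_strategy` and its argument shapes are hypothetical.

```python
def select_strategy(votes: list[str], avg_scores: dict[str, float]) -> str:
    """Pick a strategy from judge votes and per-solution average scores."""
    # Row 1: every judge voted for the same solution.
    if votes and len(set(votes)) == 1:
        return "SELECT_AND_POLISH"
    # Row 2: no solution reached an average of 3.0.
    if all(score < 3.0 for score in avg_scores.values()):
        return "REDESIGN"
    # Row 3: split decision with at least one viable solution.
    return "FULL_SYNTHESIS"


strategy = select_strategy(
    votes=["a", "b", "a"],
    avg_scores={"a": 4.1, "b": 3.8, "c": 3.2},
)
```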
Output artifacts:
| Artifact | Location |
|---|---|
| Candidate solutions | {solution}.[a\|b\|c].{ext} |
| Evaluation reports | .specs/reports/{name}-{date}.[1\|2\|3].md |
| Final solution | {output_path} |
The meta-judge only needs the task description — not the implementation output — to generate evaluation criteria. Running both in parallel saves one round-trip per task or step without sacrificing judge quality. This compounds across multiple steps or targets.
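The saved round-trip can be illustrated with `asyncio.gather`: because the meta-judge takes only the task description as input, it can start at the same time as the implementor. The two coroutines below are stand-ins for sub-agent dispatches, with a short sleep simulating latency; their names and return values are invented for the sketch.

```python
import asyncio


async def meta_judge(task: str) -> str:
    await asyncio.sleep(0.01)  # stand-in for a sub-agent round-trip
    return f"spec for: {task}"


async def implementor(task: str) -> str:
    await asyncio.sleep(0.01)  # stand-in for a sub-agent round-trip
    return f"work for: {task}"


async def dispatch(task: str) -> tuple[str, str]:
    # Both agents depend only on the task, so they run concurrently.
    spec, work = await asyncio.gather(meta_judge(task), implementor(task))
    return spec, work


spec, work = asyncio.run(dispatch("add retry logic"))
```

The judge would then be dispatched with `work` and `spec` once both results are available.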
The evaluation criteria for a task are invariant. If the criteria changed between attempts, scores would be incomparable and the retry agent would aim at a moving target. Reusing the same spec ensures consistent measurement across all attempts.
The judge must evaluate the work without being influenced by the reasoning that produced it. A fresh sub-agent with only the work and criteria provides unbiased assessment, catching blind spots the implementation agent's self-critique missed.
Three is the minimum for meaningful diversity and tie-breaking. Two solutions can differ without indicating which approach is better. Three provides enough variety to cover different solution-space regions while keeping agent count manageable. Five shows diminishing returns.
Full synthesis is the most expensive phase. When judges unanimously agree, synthesis wastes cost and risks degrading a superior solution by blending in inferior elements. Adaptive selection saves ~15-20% on average.
A repeatable task applied to 5 targets needs 1 meta-judge, not 5. A shared group of 2 interdependent tasks needs 1 judge, not 2. Grouping reduces total agent count by 30-60% without reducing evaluation quality.
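The agent counts in this paragraph follow directly from the grouping rules. A small helper makes the arithmetic explicit; `agent_counts` is a hypothetical function written for this example, and it counts one implementor per task because implementation agents are never shared.

```python
def agent_counts(grouping: str, task_count: int) -> dict[str, int]:
    """Count agents needed for one group of tasks under a grouping rule."""
    if grouping == "REPEATABLE":
        meta_judges, judges = 1, task_count   # one shared spec, one judge per target
    elif grouping == "SHARED":
        meta_judges, judges = 1, 1            # one combined spec, one judge for the group
    elif grouping == "INDEPENDENT":
        meta_judges, judges = task_count, task_count
    else:
        raise ValueError(f"unknown grouping: {grouping}")
    return {
        "meta_judges": meta_judges,
        "judges": judges,
        "implementors": task_count,  # always isolated, one per task
    }


repeatable = agent_counts("REPEATABLE", 5)
shared = agent_counts("SHARED", 2)
```

For five targets, REPEATABLE needs 11 agents in total against 15 under INDEPENDENT; for two tasks, SHARED needs 4 against 6.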
Each step evaluates different criteria: interfaces check correctness and completeness; callers check consistency and lack of regressions. A single specification cannot capture these different requirements.
Sharing an implementation agent between targets would require it to hold multiple file contexts in a single window, defeating context isolation. Meta-judges and judges can be shared because they evaluate, not implement — their contexts are about criteria, not files.
In parallel execution, one failing target should not delay or block other targets. They are independent by design. Isolated failures mean other targets complete and verify normally while only the failed target is retried or escalated.
Implementation agents produce detailed internal reasoning that is irrelevant to downstream steps. Passing only what is needed (interfaces, file paths, decisions) keeps sub-agent contexts clean and focused. Downstream agents can read files directly if they need implementation details.
npx claudepluginhub git-fg/taches-principled --plugin tp-sadd