Skill

assay

Evaluates competing approaches against codebase constraints, returning structured recommendations with confidence scoring, kill criteria, and evidence grounding. Used by design, spec, and migrate skills.

developer-tools

Invocation

Slash command: /crucible:assay

Supporting Files

assay-evaluator-prompt.md

SKILL.md

238 lines · ~2.5k tokens

Stats

Language: Python

Stars: 10

Forks: 2

Maintenance: Excellent

Last Commit: Jun 17, 2026

Assay

Overview

All subagent dispatches use disk-mediated dispatch. See shared/dispatch-convention.md for the full protocol.

Evaluate competing approaches against codebase constraints. Returns a structured Assay Report with a recommendation, alternatives with kill criteria, and confidence scoring. Evidence-grounded — recommendations cite specific file:line references, not generic best practices.

Skill type: Rigid — follow exactly, no shortcuts.

Models:

Evaluator agent: Opus (synthesis/judgment work needs the best model)
Orchestrator: runs on whatever model the session uses

Announce at start: "I'm using the assay skill to evaluate competing approaches."

Name origin: In metallurgy, an assay tests raw material to determine its quality and composition before committing it to the forge.

Invocation API

/assay
  question: "How should the auth middleware handle token refresh?"
  context: { ... }
  decision_type: "architecture"
  approaches: [...]
  cascading_decisions: [...]

Parameters

question (required) — The decision or question to evaluate. One clear sentence.

context (required) — Evidence for the evaluator to reason against. Accepts different shapes depending on the caller:

Caller	Context Shape	Key Fields
`/design`	Recon brief + agent findings	`project_structure`, `existing_patterns`, `scope_boundaries`, `prior_art`
`/spec`	Recon brief + agent findings (autonomous)	`project_structure`, `existing_patterns`, `scope_boundaries`, `prior_art`
`/migrate`	Recon brief + migration analysis	`project_structure`, `migration_target`, `breaking_changes`, `blast_radius`
Generic caller	Freeform evidence	`description` (string) — unstructured context, lower confidence

When context contains unrecognized keys, the evaluator treats them as additional evidence. When context is a bare string, it is treated as { "description": context }.
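The string-wrapping rule above can be sketched in a few lines. This is a minimal illustration, not the skill's actual code; the function name `normalize_context` is hypothetical.

```python
from typing import Any, Union


def normalize_context(context: Union[str, dict[str, Any]]) -> dict[str, Any]:
    """Return the context as a dict, wrapping a bare string under `description`."""
    if isinstance(context, str):
        return {"description": context}
    # Unrecognized keys are kept as-is so the evaluator can treat them as evidence.
    return dict(context)
```

Copying the dict rather than returning it directly is a design choice of this sketch, so later steps cannot mutate the caller's object.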

decision_type (optional) — architecture | strategy | diagnosis | optimization. Auto-detected from the question if omitted. Defaults to architecture when ambiguous.

approaches (optional) — Array of { name, description } candidates to evaluate. When omitted, the evaluator generates 2-4 candidates from the question and context.

cascading_decisions (optional) — Array of { decision, reasoning } representing prior decisions. Treated as hard constraints — the evaluator cannot modify or challenge them. Conflicts are reported in prior_decision_conflicts.

The Process

Phase 1: Input Validation

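Verify question is present and non-empty; if it is missing, return an error immediately without dispatching
Verify context is present (object or string); if it is missing, return an error immediately without dispatching
If decision_type is provided, validate that it is one of the four recognized values; if it is not, warn and default to architecture
If approaches is provided, verify that it is an array with at least 2 entries, each having name and description; if it has fewer than 2 entries, ignore it and let the evaluator generate candidates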
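A rough sketch of these checks, combined with the fallbacks listed under Error Handling, might look like the following. The function name, the `warnings` key, and the choice to raise `ValueError` for missing required fields are assumptions of this sketch. Treating malformed `approaches` entries the same as "fewer than 2 entries" is also an assumption, since the skill only documents the latter.

```python
from typing import Any

DECISION_TYPES = {"architecture", "strategy", "diagnosis", "optimization"}


def validate_input(params: dict[str, Any]) -> dict[str, Any]:
    """Check required fields and apply the documented fallbacks for optional ones."""
    question = params.get("question")
    if not isinstance(question, str) or not question.strip():
        raise ValueError("question is required and must be non-empty")
    context = params.get("context")
    if not isinstance(context, (dict, str)):
        raise ValueError("context is required (object or string)")

    warnings: list[str] = []
    decision_type = params.get("decision_type")
    if decision_type is not None and (
        not isinstance(decision_type, str) or decision_type not in DECISION_TYPES
    ):
        warnings.append("invalid decision_type; defaulting to architecture")
        decision_type = "architecture"

    approaches = params.get("approaches")
    if approaches is not None:
        well_formed = (
            isinstance(approaches, list)
            and len(approaches) >= 2
            and all(
                isinstance(a, dict) and "name" in a and "description" in a
                for a in approaches
            )
        )
        if not well_formed:
            # Assumption: malformed entries are handled like "fewer than 2 entries".
            warnings.append("approaches ignored; evaluator will generate candidates")
            approaches = None

    return {
        "question": question,
        "context": context,
        "decision_type": decision_type,  # None means auto-detect
        "approaches": approaches,        # None means the evaluator generates them
        "cascading_decisions": params.get("cascading_decisions"),
        "warnings": warnings,
    }
```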

Phase 2: Dispatch Evaluator

Dispatch a single Opus agent using skills/assay/assay-evaluator-prompt.md.

Fill template placeholders before writing the dispatch file:

{{QUESTION}} — the decision question
{{CONTEXT}} — the full context object/string
{{DECISION_TYPE}} — the decision type (provided or "auto-detect")
{{APPROACHES}} — the approaches array (or "Generate 2-4 candidates")
{{CASCADING_DECISIONS}} — cascading decisions array (or "None")
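As an illustration, the placeholder substitution above could be done in a single pass so that text inserted for one placeholder is never rescanned for another. The helper names and the choice to serialize objects as indented JSON are assumptions of this sketch; the real template lives in skills/assay/assay-evaluator-prompt.md and is not reproduced here.

```python
import json
import re
from typing import Any

PLACEHOLDER = re.compile(r"\{\{([A-Z_]+)\}\}")


def render(value: Any, fallback: str) -> str:
    """Render a parameter as text, using the documented fallback when it is absent."""
    if value is None:
        return fallback
    if isinstance(value, str):
        return value
    return json.dumps(value, indent=2)


def fill_template(template: str, params: dict[str, Any]) -> str:
    """Replace each known {{PLACEHOLDER}} once; unknown placeholders are left alone."""
    values = {
        "QUESTION": render(params.get("question"), ""),
        "CONTEXT": render(params.get("context"), ""),
        "DECISION_TYPE": render(params.get("decision_type"), "auto-detect"),
        "APPROACHES": render(params.get("approaches"), "Generate 2-4 candidates"),
        "CASCADING_DECISIONS": render(params.get("cascading_decisions"), "None"),
    }
    return PLACEHOLDER.sub(lambda m: values.get(m.group(1), m.group(0)), template)
```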

Phase 3: Validate Output

Parse the evaluator's response as JSON. Validate:

All required fields present: decision_type, confidence, missing_information, recommended, alternatives, prior_decision_conflicts
recommended has: name, rationale, evidence, risks, kill_criteria, constraint_fit
Each alternative has: name, constraint_fit, pros, cons, would_recommend_if
constraint_fit objects have: pattern_alignment, scope_fit, reversibility, integration_risk
confidence is one of: high, medium, low
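The field checks above can be expressed as a small validator that returns a list of error messages, with an empty list meaning the report passed. The function names and message wording are hypothetical; only the field names and confidence levels come from this document.

```python
from typing import Any

REPORT_FIELDS = ["decision_type", "confidence", "missing_information",
                 "recommended", "alternatives", "prior_decision_conflicts"]
RECOMMENDED_FIELDS = ["name", "rationale", "evidence", "risks",
                      "kill_criteria", "constraint_fit"]
ALTERNATIVE_FIELDS = ["name", "constraint_fit", "pros", "cons", "would_recommend_if"]
CONSTRAINT_FIT_FIELDS = ["pattern_alignment", "scope_fit",
                         "reversibility", "integration_risk"]
CONFIDENCE_LEVELS = ("high", "medium", "low")


def missing(obj: Any, fields: list[str], where: str) -> list[str]:
    """List the required fields absent from obj, or flag obj if it is not an object."""
    if not isinstance(obj, dict):
        return [f"{where} must be an object"]
    return [f"{where} is missing '{name}'" for name in fields if name not in obj]


def validate_report(report: Any) -> list[str]:
    """Return validation errors for an Assay Report; an empty list means valid."""
    errors = missing(report, REPORT_FIELDS, "report")
    if not isinstance(report, dict):
        return errors
    if "confidence" in report and report["confidence"] not in CONFIDENCE_LEVELS:
        errors.append("confidence must be one of: high, medium, low")
    if "recommended" in report:
        rec = report["recommended"]
        errors += missing(rec, RECOMMENDED_FIELDS, "recommended")
        if isinstance(rec, dict) and "constraint_fit" in rec:
            errors += missing(rec["constraint_fit"], CONSTRAINT_FIT_FIELDS,
                              "recommended.constraint_fit")
    alternatives = report.get("alternatives", [])
    if not isinstance(alternatives, list):
        errors.append("alternatives must be an array")
        alternatives = []
    for i, alt in enumerate(alternatives):
        where = f"alternatives[{i}]"
        errors += missing(alt, ALTERNATIVE_FIELDS, where)
        if isinstance(alt, dict) and "constraint_fit" in alt:
            errors += missing(alt["constraint_fit"], CONSTRAINT_FIT_FIELDS,
                              f"{where}.constraint_fit")
    return errors
```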

On validation failure: Retry once with the validation errors as feedback. On second failure, return:

{ "error": "Evaluator produced invalid output after retry", "raw_output": "..." }
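The retry-once behavior can be sketched as a small wrapper. Both `dispatch` and `validate` are hypothetical callables supplied by the caller: `dispatch` stands in for the disk-mediated evaluator dispatch (whose protocol is defined elsewhere) and receives the previous attempt's errors as feedback, while `validate` returns a list of error strings.

```python
import json
from typing import Any, Callable, Optional


def run_with_retry(
    dispatch: Callable[[Optional[list[str]]], str],
    validate: Callable[[Any], list[str]],
) -> Any:
    """Dispatch the evaluator, retrying once with validation errors as feedback."""
    feedback: Optional[list[str]] = None
    raw = ""
    for _ in range(2):  # first attempt plus exactly one retry
        raw = dispatch(feedback)
        try:
            report = json.loads(raw)
            errors = validate(report)
        except json.JSONDecodeError as exc:
            errors = [f"invalid JSON: {exc}"]
        if not errors:
            return report
        feedback = errors
    return {
        "error": "Evaluator produced invalid output after retry",
        "raw_output": raw,
    }
```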

Phase 4: Return Report

Return the validated Assay Report to the caller.

Decision Type Adaptation

The evaluator adapts scoring weights based on decision type:

Type	Primary Weight	Secondary Weight
`architecture`	Reversibility, constraint fit	Long-term cost, extensibility
`strategy`	Risk, phasing	Blast radius, team capacity
`diagnosis`	Evidence strength, testability	Explanation coverage, simplicity
`optimization`	Measurable improvement	Disruption cost, reversibility
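For callers that want the table above in machine-readable form, one possible encoding is a plain lookup. The document gives no numeric weights, so this sketch records only which dimensions are primary and secondary; the names `SCORING_FOCUS` and `scoring_focus` are hypothetical, and the fallback to `architecture` mirrors the documented default.

```python
SCORING_FOCUS = {
    "architecture": {
        "primary": ["reversibility", "constraint fit"],
        "secondary": ["long-term cost", "extensibility"],
    },
    "strategy": {
        "primary": ["risk", "phasing"],
        "secondary": ["blast radius", "team capacity"],
    },
    "diagnosis": {
        "primary": ["evidence strength", "testability"],
        "secondary": ["explanation coverage", "simplicity"],
    },
    "optimization": {
        "primary": ["measurable improvement"],
        "secondary": ["disruption cost", "reversibility"],
    },
}


def scoring_focus(decision_type: str) -> dict[str, list[str]]:
    """Look up the weighted dimensions, falling back to architecture when unknown."""
    return SCORING_FOCUS.get(decision_type, SCORING_FOCUS["architecture"])
```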

Output: Assay Report

{
  "decision_type": "architecture",
  "confidence": "high",
  "missing_information": [],
  "recommended": {
    "name": "Event-driven via message bus",
    "rationale": "Aligns with existing src/events/bus.ts pattern...",
    "evidence": ["src/events/bus.ts:14 — existing event dispatch"],
    "risks": ["Adds async complexity to currently synchronous flow"],
    "kill_criteria": "Switch away if latency requirements exceed 50ms p99",
    "constraint_fit": {
      "pattern_alignment": "high",
      "scope_fit": "high",
      "reversibility": "two-way door",
      "integration_risk": "low"
    }
  },
  "alternatives": [
    {
      "name": "Direct service calls",
      "constraint_fit": {
        "pattern_alignment": "medium",
        "scope_fit": "high",
        "reversibility": "one-way door",
        "integration_risk": "medium"
      },
      "pros": ["Simpler mental model", "Synchronous"],
      "cons": ["Tight coupling", "Requires shared deployment"],
      "would_recommend_if": "Latency is critical or team prefers simplicity"
    }
  ],
  "prior_decision_conflicts": []
}
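A caller written in Python could describe this report shape with `TypedDict`. This is a sketch inferred from the example above, with some assumptions: the `constraint_fit` values are typed as plain strings because their full set of allowed values is not listed here, `missing_information` is assumed to hold strings, and `prior_decision_conflicts` is left as an untyped list because the example shows it empty.

```python
from typing import Literal, TypedDict

DecisionType = Literal["architecture", "strategy", "diagnosis", "optimization"]
Confidence = Literal["high", "medium", "low"]


class ConstraintFit(TypedDict):
    pattern_alignment: str
    scope_fit: str
    reversibility: str
    integration_risk: str


class Recommended(TypedDict):
    name: str
    rationale: str
    evidence: list[str]
    risks: list[str]
    kill_criteria: str
    constraint_fit: ConstraintFit


class Alternative(TypedDict):
    name: str
    constraint_fit: ConstraintFit
    pros: list[str]
    cons: list[str]
    would_recommend_if: str


class AssayReport(TypedDict):
    decision_type: DecisionType
    confidence: Confidence
    missing_information: list[str]  # assumption: plain strings
    recommended: Recommended
    alternatives: list[Alternative]
    prior_decision_conflicts: list  # element shape is not shown in the example
```

These types are only checked by a static type checker; they do not validate a parsed report at runtime.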

Confidence Scoring

Level	Criteria
`high`	One approach clearly dominates on all weighted dimensions
`medium`	Two viable options with trade-offs that depend on priority
`low`	Need more information — `missing_information` lists what would help

Evidence Grounding

Every recommendation must cite specific evidence from the context:

File:line references from recon briefs
Specific pattern names from the codebase
Concrete constraint violations or alignments

"This is the industry standard approach" is NOT evidence. "This aligns with how src/api/routes/users.ts already handles it" IS evidence.

Without a recon brief, evidence can only cite the caller's context, so confidence scores skew lower.
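A caller that wants to spot-check grounding could flag evidence strings that contain no file reference. This is a heuristic sketch only: the regular expression and function names are hypothetical, and a flagged entry is not necessarily invalid, since pattern names and constraint alignments also count as evidence.

```python
import re

# A path with at least one "/" and a file extension, optionally followed by :line.
FILE_REF = re.compile(r"[\w.-]+(?:/[\w.-]+)+\.\w+(?::\d+)?")


def cites_file(evidence: str) -> bool:
    """Return True when the evidence string mentions a file path or file:line."""
    return FILE_REF.search(evidence) is not None


def evidence_to_review(report: dict) -> list[str]:
    """List evidence entries on the recommendation that cite no file at all."""
    return [e for e in report["recommended"]["evidence"] if not cites_file(e)]
```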

Kill Criteria

kill_criteria on recommended approach: condition that would flip the recommendation
would_recommend_if on each alternative: condition that would make it the recommendation

These make decisions revisitable without re-running the full analysis.
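For example, a caller could pull the revisit conditions out of a stored report and check them later without dispatching the evaluator again. The helper name and the "abandon if" / "adopt if" prefixes are illustrative choices, not part of the skill.

```python
def revisit_conditions(report: dict) -> dict[str, str]:
    """Map each approach name to the condition that would change the outcome."""
    recommended = report["recommended"]
    conditions = {recommended["name"]: "abandon if: " + recommended["kill_criteria"]}
    for alternative in report["alternatives"]:
        conditions[alternative["name"]] = "adopt if: " + alternative["would_recommend_if"]
    return conditions
```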

Error Handling

Failure	Behavior
Missing `question` or `context`	Return error immediately — no dispatch
Evaluator returns invalid JSON or a report that fails validation	Retry once with the validation errors as feedback. Second failure returns `{ "error": ... }`
Evaluator timeout	Return `{ "error": "Evaluator timed out" }`
Invalid `decision_type`	Warn and default to `architecture`
`approaches` has fewer than 2 entries	Ignore provided approaches, let evaluator generate candidates

Integration

Called by

Skill	Decision Type	Context Source	Approaches
`/design`	`architecture`	Recon brief + cascading decisions	Evaluator generates
`/spec`	`architecture`	Recon brief + cascading decisions (autonomous — confidence routing)	Evaluator generates
`/migrate`	`strategy`	Recon brief + migration analysis	Evaluator generates

Not called by (investigated, not a fit): /debugging, whose hypothesis evaluation uses quality-gate rather than assay, and /prospector, whose own competing-design evaluation is more sophisticated than assay for that use case. See #147 for rationale.

Consumer Dispatch Examples

From /design:

/assay
  question: "How should components communicate in the new auth module?"
  context: { recon brief with project_structure, existing_patterns }
  decision_type: "architecture"
  cascading_decisions: [{ decision: "Using Redis for session store", reasoning: "..." }]

From /spec:

/assay
  question: "How should the auth middleware handle token refresh?"
  context: { recon brief + investigation findings }
  decision_type: "architecture"
  cascading_decisions: [{ decision: "Using Redis for session store", reasoning: "..." }]

/spec consumes the assay output autonomously and routes on confidence: high = accept, medium = terminal alert, low = block alert.
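The confidence routing used by /spec amounts to a three-way lookup, sketched below. The function name and the returned strings are placeholders; what a terminal alert or block alert actually does is defined by /spec, not by this skill.

```python
ROUTES = {
    "high": "accept",
    "medium": "terminal alert",
    "low": "block alert",
}


def route_by_confidence(report: dict) -> str:
    """Return the routing action for an Assay Report based on its confidence."""
    confidence = report.get("confidence")
    if not isinstance(confidence, str) or confidence not in ROUTES:
        raise ValueError(f"unexpected confidence value: {confidence!r}")
    return ROUTES[confidence]
```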

From /migrate:

/assay
  question: "What migration strategy minimizes risk for the React 18→19 upgrade?"
  context: { recon brief + migration_target: "React 19", breaking_changes: [...] }
  decision_type: "strategy"

Standalone Usage

/assay question: "Should we use PostgreSQL or SQLite for this project?"
  context: "Small team, <10K users, read-heavy workload, deployed on single server"

Dispatches

Evaluator agent (Opus) via skills/assay/assay-evaluator-prompt.md

Does NOT

Investigate the codebase (that's /recon)
Challenge prior decisions (that's /design's Challenger agent)
Make the decision for the user (it recommends; the caller decides)
Iterate or loop (one dispatch, one report)
