From eou-foundry
Audits value_invocations in EOU run traces against Rule 97 and F14-F17 failure taxonomy. Detects citation theater, hallucinated value IDs, and judgment drift.
Slash command
/eou-foundry:audit-judgment TARGET_EOU_ID_OR_PATH
Argument: target
Audit the value_invocations of `$target` — an EOU with `classification.judgment_authorized:true` — against Rule 97 and the F14–F17 failure taxonomy.
Performs the fourth audit layer per D4.1 (post-ECP-0019). Audits agentic-judgment correctness, not output validity (that's $eou-validate), run correctness (run-trace validation), or EOU design (that's $eou-audit). The four layers compose; this skill is the agentic-judgment specialist.
$target (required): EOU ID resolved to foundry/eous/{id}.yml or foundry/meta-eous/{id}.yml, or a direct file path. Must have classification.judgment_authorized:true.
foundry/captured-workflows/cw-*.yml (the app's constitution); foundry/runs/{target}/ (run traces); optionally foundry/audits/eou-audits/{target}.audit.yml (prior $eou-audit findings).
target_eou_spec (the EOU under audit)
foundry/captured-workflows/ (find the approved captured_workflow for the app)
rules/97-value-invocation.md
engine/failure-taxonomy.yml (F14–F17 definitions)
engine/maturity-model.yml (judgment_maturity J0–J4 axis)
schemas/run-trace.schema.yml (value_invocations entry shape)
engine/meta-eous/audit-judgment.yml
Halt and report without running checks if:
judgment_authorized is false or absent (Rule 97 does not apply).
No complete human_approval exists for the app (judgment_authorized:true would itself be a validator failure per ECP-0018).
lifecycle_stage is candidate or draft (Rule 97 enforcement begins at simulated per the exemption clause).
No run traces exist in foundry/runs/{target}/ (judgment_maturity is J0 or J1; nothing to audit yet). Emit "judgment_maturity:J1_INVOCATION_NAIVE — no runs to audit" status.
Load target_eou_spec. Verify judgment_authorized:true. If false → "rule does not apply"; return.
Load the app's captured_workflow (longest-prefix-match path per Rule 96 multi-tenant resolution). Verify that human_approval is complete at all four gates and that the domain_values count is ≥3. If not → emit a precondition failure (matching the ECP-0018 validator).
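The precondition check above could be sketched roughly as follows. This is only an illustration: the shape of human_approval (a mapping from gate name to a boolean), the gate names, and the function name check_preconditions are assumptions, not taken from the ECP-0018 validator.

```python
def check_preconditions(captured_workflow, min_domain_values=3, required_gates=4):
    """Return a list of precondition failures; an empty list means checks may proceed."""
    failures = []
    # Assumed shape: human_approval maps each gate name to True once approved.
    gates = captured_workflow.get("human_approval") or {}
    approved = [name for name, done in gates.items() if done]
    if len(approved) < required_gates:
        failures.append(
            f"human_approval complete at {len(approved)} of {required_gates} gates"
        )
    values = captured_workflow.get("domain_values") or []
    if len(values) < min_domain_values:
        failures.append(
            f"domain_values count {len(values)} is below {min_domain_values}"
        )
    return failures


# Hypothetical captured_workflow with one open gate and too few values.
workflow = {
    "human_approval": {"gate_1": True, "gate_2": True, "gate_3": True, "gate_4": False},
    "domain_values": [{"id": "dv-safety"}, {"id": "dv-clarity"}],
}
precondition_failures = check_preconditions(workflow)
```

Returning every failure rather than stopping at the first lets the audit report all unmet preconditions in one pass.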
For each value_invocations entry across all run traces: verify domain_value_id resolves to an id in the captured_workflow's current domain_values list. Mismatched ids → F17 finding (severity blocking).
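A minimal sketch of this F17 check, assuming each invocation is a dict carrying invocation_id and domain_value_id and each domain value is a dict with an id. The function name find_f17 and the sample ids are hypothetical.

```python
def find_f17(invocations, domain_values):
    """Return one blocking F17 finding per invocation that cites an unknown id."""
    known_ids = {dv["id"] for dv in domain_values}
    findings = []
    for inv in invocations:
        cited = inv.get("domain_value_id")
        if cited not in known_ids:
            findings.append({
                "severity": "blocking",
                "invocation_id": inv.get("invocation_id"),
                "description": f"domain_value_id {cited!r} is not in the current domain_values list",
            })
    return findings


# Hypothetical data: inv-2 cites a value the captured_workflow does not declare.
domain_values = [{"id": "dv-safety"}, {"id": "dv-clarity"}, {"id": "dv-speed"}]
invocations = [
    {"invocation_id": "inv-1", "domain_value_id": "dv-safety"},
    {"invocation_id": "inv-2", "domain_value_id": "dv-thrift"},
]
f17_findings = find_f17(invocations, domain_values)
```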
For each invocation: examine rule_conflict and priority_at_invocation. If a higher-priority domain_value would have resolved the same conflict differently (cross-reference the captured_workflow's decides_when blocks), flag F15 (severity high).
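One way to picture the F15 cross-reference, under loudly stated assumptions: decides_when is modeled here as a mapping from a conflict label to the resolution that value would choose, and each invocation records a resolution field. Neither shape is confirmed by the run-trace schema, and find_f15 is a hypothetical name.

```python
def find_f15(invocations, domain_values):
    """Flag invocations that a higher-priority value would have resolved differently."""
    known_ids = {dv["id"] for dv in domain_values}
    findings = []
    for inv in invocations:
        if inv["domain_value_id"] not in known_ids:
            continue  # unresolved ids belong to the F17 check
        for dv in domain_values:
            # Priority 1 is highest, so "higher priority" means a smaller number.
            if dv["priority"] >= inv["priority_at_invocation"]:
                continue
            # Assumed shape: decides_when maps conflict label -> resolution.
            alternative = dv.get("decides_when", {}).get(inv["rule_conflict"])
            if alternative is not None and alternative != inv["resolution"]:
                findings.append({
                    "severity": "high",
                    "invocation_id": inv["invocation_id"],
                    "description": f"{dv['id']} (priority {dv['priority']}) would resolve "
                                   f"{inv['rule_conflict']} as {alternative}",
                })
                break  # one finding per invocation is enough
    return findings


# Hypothetical values and invocations for illustration only.
domain_values = [
    {"id": "dv-safety", "priority": 1, "decides_when": {"speed_vs_review": "require_review"}},
    {"id": "dv-speed", "priority": 3, "decides_when": {"speed_vs_review": "skip_review"}},
]
invocations = [
    {"invocation_id": "inv-1", "domain_value_id": "dv-speed", "priority_at_invocation": 3,
     "rule_conflict": "speed_vs_review", "resolution": "skip_review"},
    {"invocation_id": "inv-2", "domain_value_id": "dv-safety", "priority_at_invocation": 1,
     "rule_conflict": "speed_vs_review", "resolution": "require_review"},
]
f15_findings = find_f15(invocations, domain_values)
```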
Across ≥3 runs, compute invocation distribution per domain_value_id. Compare actual invocation frequency to declared priority weights (priority 1 = highest, expected highest invocation frequency on contested cases that it governs). Deviations >20% on top-three values → F16 finding (severity high). Skip with "drift not yet evaluable" if <3 runs.
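The drift comparison might look like the sketch below. The text does not say how declared priority weights translate into an expected frequency, so this example assumes an expected share proportional to 1/priority and reads "deviation >20%" as an absolute gap of more than 0.20 between observed and expected share. Both choices, and the name find_f16, are assumptions.

```python
from collections import Counter


def find_f16(runs, domain_values, threshold=0.20, min_runs=3):
    """Return (status, findings) for the judgment-drift check."""
    if len(runs) < min_runs:
        return "skip", []  # drift not yet evaluable
    counts = Counter(
        inv["domain_value_id"] for run in runs for inv in run["value_invocations"]
    )
    total = sum(counts.values())
    if total == 0:
        return "skip", []  # no invocations, nothing to measure
    # Assumption: expected share is proportional to 1 / priority.
    weight_sum = sum(1 / dv["priority"] for dv in domain_values)
    top_three = sorted(domain_values, key=lambda dv: dv["priority"])[:3]
    findings = []
    for dv in top_three:
        expected = (1 / dv["priority"]) / weight_sum
        observed = counts[dv["id"]] / total
        if abs(observed - expected) > threshold:
            findings.append({
                "severity": "high",
                "domain_value_id": dv["id"],
                "description": f"observed share {observed:.2f}, expected {expected:.2f}",
            })
    return ("fail" if findings else "pass"), findings


def _inv(value_id):
    return {"domain_value_id": value_id}


# Hypothetical data: 10 invocations over 3 runs (3 safety, 4 clarity, 3 speed).
domain_values = [
    {"id": "dv-safety", "priority": 1},
    {"id": "dv-clarity", "priority": 2},
    {"id": "dv-speed", "priority": 3},
]
runs = [
    {"run_id": "r1", "value_invocations": [_inv("dv-safety"), _inv("dv-clarity"),
                                           _inv("dv-clarity"), _inv("dv-speed")]},
    {"run_id": "r2", "value_invocations": [_inv("dv-safety"), _inv("dv-clarity"), _inv("dv-speed")]},
    {"run_id": "r3", "value_invocations": [_inv("dv-safety"), _inv("dv-clarity"), _inv("dv-speed")]},
]
f16_status, f16_findings = find_f16(runs, domain_values)
```

With this data only dv-safety drifts past the threshold: it is invoked in 30% of cases against an expected share of roughly 55%.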
Scan run traces for execution steps that indicate contested cases (entries in decision_points flagged as contested or matching contested-case heuristics) but lack a corresponding value_invocations entry AND lack an escalations_triggered entry referencing the case. Each unrecorded contested case → F14 finding (severity blocking at pilot+; high at draft).
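The scan for unrecorded contested cases can be sketched as below, assuming that decision_points, value_invocations and escalations_triggered entries share a case_id field and that contested cases carry a boolean contested flag. Those field names and find_f14 are hypothetical. The caller passes in whether the EOU is at pilot or later, because the full ordering of lifecycle stages is not given here.

```python
def find_f14(trace, pilot_or_later):
    """Flag contested decision points with neither an invocation nor an escalation."""
    recorded = {inv["case_id"] for inv in trace.get("value_invocations", [])}
    escalated = {esc["case_id"] for esc in trace.get("escalations_triggered", [])}
    # Severity rule from the text: blocking at pilot+, high otherwise.
    severity = "blocking" if pilot_or_later else "high"
    findings = []
    for point in trace.get("decision_points", []):
        if not point.get("contested"):
            continue
        case_id = point["case_id"]
        if case_id not in recorded and case_id not in escalated:
            findings.append({
                "severity": severity,
                "invocation_id": None,  # silent case: there is no invocation to cite
                "description": f"contested case {case_id} has no value_invocations "
                               f"or escalations_triggered entry",
            })
    return findings


# Hypothetical trace: c2 was escalated, c3 went unrecorded.
trace = {
    "decision_points": [
        {"case_id": "c1", "contested": True},
        {"case_id": "c2", "contested": True},
        {"case_id": "c3", "contested": True},
        {"case_id": "c4", "contested": False},
    ],
    "value_invocations": [{"case_id": "c1", "domain_value_id": "dv-safety"}],
    "escalations_triggered": [{"case_id": "c2"}],
}
f14_findings = find_f14(trace, pilot_or_later=True)
```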
For up to 5 sampled invocations (configurable via generation_budget.max_swap_tests):
foundry/runs/audit-judgment/swap-tests/{run_id}/ and never affect production).
Write to foundry/audits/judgment-audits/{target}.judgment-audit.yml:
audit_date:
target_eou:
target_eou_version:
captured_workflow_id:
captured_workflow_version:
runs_analyzed:  # list of run_ids
checks:
  - check_name: F17 | F15 | F16 | F14 | counterfactual_swap
    status: pass | fail | skip
    findings:
      - severity: blocking | high | medium | low
        invocation_id:  # for F14-F17 (or null for F14 silent case)
        description:
        required_fix:
counterfactual_swap_audit:
  swap_tests_run:
  swap_tests_with_output_change:
  baseline_variance_runs:
  verdict: PASS | FAIL
summary:
  total_findings:
  by_severity: {blocking: 0, high: 0, medium: 0, low: 0}
  verdict: PASS | FAIL | CONDITIONAL_PASS
  judgment_maturity_recommendation: J1 | J2 | J3 | J4
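The total_findings and by_severity fields of the summary can be tallied from the checks list, as in this small sketch. It deliberately leaves verdict and judgment_maturity_recommendation out, since the rules for deriving them are not stated here. The function name summarize and the sample findings are hypothetical.

```python
from collections import Counter

SEVERITIES = ("blocking", "high", "medium", "low")


def summarize(checks):
    """Tally findings across all checks into the summary counters."""
    counts = Counter(
        finding["severity"]
        for check in checks
        for finding in check.get("findings", [])
    )
    return {
        "total_findings": sum(counts.values()),
        # Always emit all four keys, even when a severity has no findings.
        "by_severity": {level: counts[level] for level in SEVERITIES},
    }


# Hypothetical checks list in the shape of the report template.
checks = [
    {"check_name": "F17", "status": "fail", "findings": [{"severity": "blocking"}]},
    {"check_name": "F15", "status": "fail", "findings": [{"severity": "high"}, {"severity": "high"}]},
    {"check_name": "F16", "status": "skip", "findings": []},
]
summary = summarize(checks)
```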
Record the audit run trace at foundry/runs/audit-judgment/{run_id}.yml per ECP-0014 trace obligation.
judgment_authorized:false on its own classification.
$eou-promote per D5.2 gate evidence.
foundry/audits/judgment-audits/, foundry/runs/audit-judgment/).
foundry/runs/audit-judgment/swap-tests/: never write to production directories under swap conditions.
Upstream: the fourth audit layer per D4.1 (post-ECP-0019). Receives an EOU with judgment_authorized:true and its app's captured_workflow.
Downstream: writes foundry/audits/judgment-audits/{target}.judgment-audit.yml. The report is consumed by $eou-promote for judgment_maturity promotion evidence, by $foundry-audit for portfolio-level judgment health, and by human reviewers for sophisticated-theater detection that mechanical checks cannot catch (per V2).
Related: $eou-audit (third audit layer — design); $eou-validate (first audit layer — schema); $foundry-audit (portfolio audit including judgment-audit aggregation).
Pipeline: EOU at judgment_authorized:true → accumulates value_invocations in run traces → audit-judgment audits invocations and runs counterfactual-swap → judgment-audit report → $eou-promote for judgment_maturity promotion
Rule 96 vs Rule 97: Rule 96 governs static spec discipline (specs must cite domain_values in success_criteria.must_pass). Rule 97 governs runtime invocation discipline (runs must invoke values for contested cases; invocations must respect priority; counterfactual-swap must produce changes). $eou-audit enforces Rule 96; $audit-judgment enforces Rule 97.
npx claudepluginhub xiaolai/eou-foundry --plugin eou-foundry