judge-codex — Codex-as-judge for the `plan` cycle ecosystem

An autonomous, rigorous orthogonal-LLM jury for the `plan` cycle pipeline. It breaks the Claude-only monoculture by adding GPT-Codex as an independent reviewer that re-validates each cycle artifact against its golden-rule contract.

Why this exists

`plan` already runs `/review` with 5–7 specialized sub-agents in parallel (architecture, tests, wiring, cross-validation, domain). They are all Claude: same model family, same training, same blind spots. Concurrency hazards, side-channel risks, and certain classes of subtle bugs get systematically under-weighted because every reviewer shares the same priors.

judge-codex adds a second LLM family (GPT-Codex via the official `codex` CLI) as an orthogonal jury that consumes the same artifact and emits an independent verdict in `plan`'s canonical verdict vocabulary. When Claude and Codex disagree, the disagreement itself is the signal, and the loop halts for human adjudication.

What you get

Slash commands (one per cycle stage):

Command	When	What it judges
`/judge-codex:discover <slug>`	After `/discover-plan` produces a blueprint	≥2-source evidence rule, `fabricated_citation`, empty corners, cross-cutting comparison rigor
`/judge-codex:plan <slug>`	After `/to-plan` (typically also after `/plan-confidence`)	Coverage Matrix semantic completeness, Goal SMART quality, ADR alternative honesty, TDD discipline in bug-fixes, citation fabrication
`/judge-codex:implementation <slug>`	After `/implement` emits `IMPLEMENTATION_COMPLETE`	Wiring triad depth (caller + integration test + runtime metric), TDD RED→GREEN→REFACTOR audit-trail, dead code in production paths
`/judge-codex:final <slug>`	After `/review` emits the consolidated report	Review-of-review: does the consolidated finding-set itself hold up? Catches the `consolidate_findings.py` YAML-bug class of meta-defects
`/judge-codex:auto <slug>`	End-to-end	Runs all 4 above sequentially against artifacts produced by a single slice
`/judge-codex:setup`	Once per environment	Verifies `codex` CLI install + login; offers to install if missing
`/judge-codex:status`	Anytime	Lists background jobs
`/judge-codex:help`	Anytime	Shows commands + their golden-rule mapping
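`/judge-codex:auto` chains the four stage judges. Below is a minimal Python sketch of that sequencing, assuming each judge can be modeled as a callable that takes a slug and returns a verdict string. The `run_auto` function, the `Judge` type, and the choice of which verdicts count as blocking (read off the "block downstream" and "structural integrity broken" entries in the verdict vocabulary) are illustrative assumptions, not the plugin's code. All four stages always run, as the table says; the sketch only reports whether any result should gate the next cycle step.

```python
from typing import Callable, Dict, List, Tuple

# Stage order taken from the command table.
STAGES: List[str] = ["discover", "plan", "implementation", "final"]

# Assumption: verdicts that should gate the next cycle step.
BLOCKING = {"FAIL_HARD", "INVALID"}

Judge = Callable[[str], str]  # hypothetical: takes a slug, returns a verdict string


def run_auto(slug: str, judges: Dict[str, Judge]) -> Tuple[List[Tuple[str, str]], bool]:
    """Run every stage judge, in order, against one slice's artifacts.

    Returns the per-stage verdicts plus a flag telling the caller whether any
    verdict should block downstream work. All four stages always run.
    """
    results = [(stage, judges[stage](slug)) for stage in STAGES]
    blocked = any(verdict in BLOCKING for _, verdict in results)
    return results, blocked


# Demo with stand-in judges (no Codex call involved).
stub_judges: Dict[str, Judge] = {
    "discover": lambda slug: "SHIPPABLE",
    "plan": lambda slug: "FAIL_HARD",
    "implementation": lambda slug: "SHIPPABLE_WITH_CAVEATS",
    "final": lambda slug: "SHIPPABLE",
}
verdicts, blocked = run_auto("my-slug", stub_judges)
print(verdicts)
print(blocked)  # True: the plan stage returned FAIL_HARD
```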

Sub-agent:

Name	Role
`judge-codex-jury`	Thin forwarding wrapper that hands a cycle artifact + golden-rule contract to Codex via `codex-companion-judge.mjs`. Returns Codex stdout verbatim.
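The forwarding contract of `judge-codex-jury` (artifact plus golden-rule contract in, Codex stdout out, untouched) can be sketched as a subprocess call. This is a minimal Python sketch: the payload layout and the `command` argv are assumptions, and it deliberately does not guess at the real arguments of `codex-companion-judge.mjs`. The demo swaps in a stand-in process that echoes its input, so the example runs without Codex installed.

```python
import subprocess
import sys
import tempfile
from pathlib import Path
from typing import Sequence


def forward_to_judge(artifact: Path, contract: Path, command: Sequence[str]) -> str:
    """Hand an artifact plus its golden-rule contract to a judge process.

    `command` is the argv of the judging backend; it is a placeholder here,
    not the real interface of codex-companion-judge.mjs. The payload goes to
    the child's stdin and its stdout comes back unmodified.
    """
    payload = (
        "## Golden-rule contract\n"
        + contract.read_text(encoding="utf-8")
        + "\n\n## Artifact under review\n"
        + artifact.read_text(encoding="utf-8")
    )
    completed = subprocess.run(
        list(command),
        input=payload,
        capture_output=True,
        text=True,
        check=True,  # a failing judge raises CalledProcessError instead of passing silently
    )
    return completed.stdout  # verbatim: no parsing, trimming, or re-scoring


# Demo: a stand-in "judge" that echoes its stdin.
ECHO_JUDGE = [sys.executable, "-c", "import sys; sys.stdout.write(sys.stdin.read())"]

with tempfile.TemporaryDirectory() as tmp:
    artifact_path = Path(tmp, "plan.md")
    contract_path = Path(tmp, "contract.md")
    artifact_path.write_text("placeholder artifact text", encoding="utf-8")
    contract_path.write_text("placeholder golden rules", encoding="utf-8")
    echoed = forward_to_judge(artifact_path, contract_path, ECHO_JUDGE)

print(echoed)
```

Keeping the wrapper free of parsing mirrors the "returns Codex stdout verbatim" rule: any interpretation of the verdict happens downstream, not in the forwarder.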

Verdict vocabulary (locked, mirrors `plan`)

SHIPPABLE              — 90–100  green
SHIPPABLE_WITH_CAVEATS — 70–89   caveats logged
NEEDS_REVISION         — 50–69   loop back
FAIL_SOFT              — 49      soft cap blew
FAIL_HARD              — 49      hard cap blew → block downstream
INVALID                — 0       structural integrity broken
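One way to read the table: the top three verdicts are score bands, while 49 and 0 are clamped scores applied when a soft cap, a hard cap, or a structural check trips. Below is a minimal Python sketch under that reading; the flag names, the hard-before-soft precedence, and the treatment of an uncapped score below 50 are assumptions, not taken from `plan`.

```python
from typing import Tuple

CAP_SCORE = 49  # the table lists 49 for both FAIL_SOFT and FAIL_HARD


def to_verdict(
    score: int,
    *,
    soft_cap_blown: bool = False,
    hard_cap_blown: bool = False,
    structural_break: bool = False,
) -> Tuple[str, int]:
    """Return (verdict, effective score) for a 0-100 judge score."""
    if not 0 <= score <= 100:
        raise ValueError(f"score must be in 0..100, got {score}")
    if structural_break:
        return "INVALID", 0
    if hard_cap_blown:  # checked before the soft cap: the more severe outcome wins
        return "FAIL_HARD", min(score, CAP_SCORE)
    if soft_cap_blown:
        return "FAIL_SOFT", min(score, CAP_SCORE)
    if score >= 90:
        return "SHIPPABLE", score
    if score >= 70:
        return "SHIPPABLE_WITH_CAVEATS", score
    if score >= 50:
        return "NEEDS_REVISION", score
    # Assumption: the table does not cover an uncapped score below 50; treat it as FAIL_SOFT.
    return "FAIL_SOFT", score


print(to_verdict(92))                         # ('SHIPPABLE', 92)
print(to_verdict(88, soft_cap_blown=True))    # ('FAIL_SOFT', 49)
print(to_verdict(88, structural_break=True))  # ('INVALID', 0)
```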

Different from codex-plugin-cc: that plugin uses a binary approve / needs-attention schema (general-purpose code review). judge-codex outputs structured findings keyed by plan's cycle-specific golden rules.
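The contrast can be sketched as data: a binary schema keeps one label, while a structured report keeps each finding filed under the golden rule it violates. In the minimal Python sketch below, the `Finding` fields, the severity labels, and the `two_source_evidence` key are hypothetical; only `fabricated_citation` appears in the rule list above.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Finding:
    rule: str      # golden-rule key the finding is filed under
    severity: str  # hypothetical severity label
    location: str  # where in the artifact the judge looked


def group_by_rule(findings: List[Finding]) -> Dict[str, List[Finding]]:
    """Structured view: findings bucketed by the golden rule they violate."""
    grouped: Dict[str, List[Finding]] = defaultdict(list)
    for finding in findings:
        grouped[finding.rule].append(finding)
    return dict(grouped)


def binary_label(findings: List[Finding]) -> str:
    """Binary view: everything collapses to one of two labels."""
    return "needs-attention" if findings else "approve"


# Illustrative input; "two_source_evidence" is a made-up key for the ≥2-source rule.
sample = [
    Finding("fabricated_citation", "high", "blueprint section 2"),
    Finding("two_source_evidence", "medium", "blueprint section 3"),
    Finding("fabricated_citation", "high", "blueprint section 5"),
]
print(binary_label(sample))  # needs-attention
for rule, items in group_by_rule(sample).items():
    print(rule, len(items))
```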

Quick start

# 1. Install Codex CLI (one-time)
npm install -g @openai/codex
codex login

# 2. Install this plugin
/plugin marketplace add paulohenriquevn/judge-codex-plugin-cc
/plugin install judge-codex@judge-codex
/reload-plugins

# 3. Verify
/judge-codex:setup

# 4. Use after any plan cycle
/judge-codex:discover my-slug
/judge-codex:plan my-slug
/judge-codex:implementation my-slug
/judge-codex:final my-slug

# OR end-to-end after a slice completes
/judge-codex:auto my-slug

How it composes with `plan`

plan's existing /review (5-7 Claude sub-agents):  ───────────────────┐
  ├── architecture-reviewer                                          │
  ├── test-auditor                                                   │
  ├── wiring-validator                                               │
  ├── cross-validation                                               │
  └── domain-specific (1-3)                                          │
                                                                     │
NEW: /judge-codex (Codex orthogonal jury):  ─────────────────────────┤
  ├── /judge-codex:discover    (blueprint judge)                     │
  ├── /judge-codex:plan        (plan judge)                          ├── verdict
  ├── /judge-codex:implementation                                    │
  └── /judge-codex:final       (review-of-review)                    │
                                                                     │
  When Claude and Codex agree → confidence ↑                         │
  When they disagree         → loop halts for human                  │
                                                                     ┘
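The agree/disagree branch at the bottom of the diagram is a small gate. Below is a minimal Python sketch, assuming "agree" means both juries returned the identical verdict string; the plugin may compare at a coarser grain (for instance, pass versus block), and the `Decision` type is hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

# Verdict vocabulary from the table above.
VERDICTS = frozenset({
    "SHIPPABLE", "SHIPPABLE_WITH_CAVEATS", "NEEDS_REVISION",
    "FAIL_SOFT", "FAIL_HARD", "INVALID",
})


@dataclass(frozen=True)
class Decision:
    halt_for_human: bool
    verdict: Optional[str]  # the shared verdict, or None when a human must adjudicate


def reconcile(claude_verdict: str, codex_verdict: str) -> Decision:
    """Agreement passes the verdict through; any disagreement halts the loop."""
    for verdict in (claude_verdict, codex_verdict):
        if verdict not in VERDICTS:
            raise ValueError(f"unknown verdict: {verdict!r}")
    if claude_verdict == codex_verdict:
        return Decision(halt_for_human=False, verdict=claude_verdict)
    return Decision(halt_for_human=True, verdict=None)


print(reconcile("SHIPPABLE", "SHIPPABLE"))       # Decision(halt_for_human=False, verdict='SHIPPABLE')
print(reconcile("SHIPPABLE", "NEEDS_REVISION"))  # Decision(halt_for_human=True, verdict=None)
```

Returning `None` rather than picking the stricter verdict keeps the disagreement visible, which matches the idea that the disagreement itself is the signal.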

Cycle-aware vs generic Codex review
