evaluator

Name: evaluator
Author: artmin96

By ArtMin96

Evaluator-optimizer pattern: adversarial review, verification, static analysis gates, postmortems

npx claudepluginhub artmin96/forge-studio --plugin evaluator

What's Inside

Agents (1)

adversarial-reviewer

/adversarial-reviewer

Reviews code with a skeptical eye. Asks hard questions about edge cases, failure modes, and hidden assumptions. Use for security-sensitive or complex code.

Skills (7)

challenge

/challenge

Draft Verification critique: self-review + git history comparison. Run before marking any non-trivial task complete.

devils-advocate

/devils-advocate

Argue against a design decision or implementation approach. Forces consideration of alternatives before committing. Use when evaluating architecture or design choices.

gate-report

/gate-report

Aggregate all quality warnings from the current session. Use before committing to see a summary of all hook-generated warnings.

grill-me

/grill-me

Interview the user relentlessly about a plan or design until reaching shared understanding, resolving each branch of the decision tree. Use when the user is in plan mode, planning an implementation, wants to stress-test a plan, get grilled on their design, or mentions "grill me".

healthcheck

/healthcheck

Run a full project health check. Auto-detects PHP and/or JS/TS projects and runs the appropriate quality pipeline. Use before committing or when you want a quality snapshot.

Hooks (1)

Event Hooks

Bash

File writes

3 hooks across 2 events

Stats

Version: 1.0.0

Language: Shell

Stars: 2

Maintenance: Good

Last Commit: Apr 3, 2026

Added: Apr 4, 2026

Available In

forge-studio (2)

Safety Signals

Caution

Executes bash commands

Hook triggers when Bash tool is used

Modifies files

Hook triggers on file write and edit operations

Uses power tools

Uses Bash, Write, or Edit tools

README

Forge Studio

Agent = Model + Harness. The harness is everything except the model: behavioral steering, context management, memory, evaluation, orchestration, and multi-agent decomposition. Research shows changing only the harness can produce a 6x performance gap.

Forge Studio implements harness principles as composable Claude Code plugins.

9 plugins. 35 skills. 22 hooks. 4 agents.

Install

# Add the marketplace
/plugin marketplace add ArtMin96/forge-studio

# Install by layer — pick what you need

# Behavioral Steering (recommended: start here)
/plugin install behavioral-core@forge-studio

# Context Management
/plugin install context-engine@forge-studio

# Memory Architecture
/plugin install memory@forge-studio

# Evaluation & Quality Gates
/plugin install evaluator@forge-studio

# Orchestration
/plugin install workflow@forge-studio

# Multi-Agent Decomposition
/plugin install agents@forge-studio

# Reference & Tips
/plugin install reference@forge-studio

# Execution Trace Collection
/plugin install traces@forge-studio

# Token-Optimized Output (always-on compressed communication)
/plugin install caveman@forge-studio

After installing, start a new session for plugins to load.

Recommended CLAUDE.md

A lean CLAUDE.md template is included at templates/CLAUDE.md. It is designed to work with forge-studio plugins and covers personality, judgment, context management, self-evaluation, and project config without repeating what the hooks already enforce.

cp templates/CLAUDE.md ./CLAUDE.md
# Edit the Project Config and Conventions sections for your project

Recommended settings.json

A power-user settings.json template is included at templates/settings.json. Enables extended thinking, maximum effort, LSP tools, and bypass permissions with a deny list for destructive commands.

# Copy to your global Claude Code config
cp templates/settings.json ~/.claude/settings.json

Key choices:

Bypass permissions + deny list — allows everything except destructive commands. Two safety layers: the deny list here and behavioral-core hooks.
No co-authored-by — removes the "Co-Authored-By: Claude" trailer from commits
Always thinking + high effort — maximizes reasoning quality at the cost of more tokens
LSP + tool search — enables IDE-level code navigation and on-demand tool loading
Auto-compact at 75% — compacts context earlier than the default 95%, preventing quality decay
90-day transcript retention — extends the default 30-day cleanup period
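
The "allow everything except a deny list" idea from the first key choice can be illustrated with a minimal Python sketch. This is not how Claude Code evaluates its permission rules, and the patterns below are hypothetical examples rather than the entries shipped in templates/settings.json; the function name `is_allowed` is also an assumption made for illustration.

```python
from fnmatch import fnmatchcase

# Hypothetical deny patterns, for illustration only.
# They are NOT the actual entries in templates/settings.json.
DENY_PATTERNS = [
    "rm -rf *",
    "git push --force*",
    "git reset --hard*",
]


def is_allowed(command: str, deny_patterns=DENY_PATTERNS) -> bool:
    """Allow by default; block only commands that match a deny pattern."""
    # Collapse runs of whitespace so "rm   -rf  /" is matched like "rm -rf /".
    normalized = " ".join(command.split())
    return not any(fnmatchcase(normalized, pattern) for pattern in deny_patterns)


if __name__ == "__main__":
    for cmd in ["ls -la", "rm -rf /tmp/build", "git push origin main"]:
        print(cmd, "->", "allowed" if is_allowed(cmd) else "denied")
```

A default-allow policy is only as strong as its deny list, which is presumably why the README pairs it with the behavioral-core hooks as a second layer.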

See Settings Best Practices for detailed documentation.

Architecture

┌─────────────────────────────────────────────┐
│                User / IDE                   │
├─────────────────────────────────────────────┤
│           Harness (Forge Studio)            │
│                                             │
│  behavioral-core ──── Steering & discipline │
│  context-engine ───── Context window mgmt   │
│  memory ───────────── Cross-session recall  │
│  evaluator ────────── Quality gates & review│
│  workflow ─────────── Orchestration patterns│
│  agents ───────────── Multi-agent triad     │
│  reference ────────── Power-user tips       │
│  caveman ──────────── Token-optimized output│
│                                             │
├─────────────────────────────────────────────┤
│              Claude Model                   │
└─────────────────────────────────────────────┘

See docs/architecture.md for the full design rationale.

Plugin Reference

behavioral-core — Behavioral Steering

Re-injects behavioral rules on every message to prevent drift in long sessions. Rules live in rules.d/ as individual files, priority-ordered by numeric prefix (10-no-sycophancy, 20-no-filler, etc.). Add, remove, or reorder rules by managing files. Hooks enforce ~100% compliance where system prompt instructions degrade to ~80%.

Skill	Purpose
`/rules-audit`	Audit session for sycophancy, apologies, scope creep, filler
`/scope <task>`	Define task boundaries and acceptance criteria
`/timebox [N]`	Set a message budget (default 15) for the current task
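
The numeric-prefix ordering of rules.d/ described above can be sketched in a few lines of Python. This is an illustration only: the plugin itself is written in shell, and the function names (`ordered_rules`, `build_injection`), the `.md` extension, and the choice to sort unprefixed files last are all assumptions of this sketch. Sorting on the parsed integer rather than the raw filename matters once prefixes have different widths, since a plain string sort would put `100-...` before `20-...`.

```python
import re
from pathlib import Path
from tempfile import TemporaryDirectory


def ordered_rules(rules_dir: Path) -> list:
    """Return rule files sorted by their leading numeric prefix, lowest first."""

    def priority(path: Path):
        match = re.match(r"(\d+)-", path.name)
        # Assumption of this sketch: files without a numeric prefix sort last.
        number = int(match.group(1)) if match else 10**9
        return (number, path.name)

    return sorted((p for p in rules_dir.iterdir() if p.is_file()), key=priority)


def build_injection(rules_dir: Path) -> str:
    """Concatenate rule bodies in priority order into one block of text."""
    return "\n\n".join(p.read_text().strip() for p in ordered_rules(rules_dir))


if __name__ == "__main__":
    with TemporaryDirectory() as tmp:
        root = Path(tmp)
        # Rule names follow the README's examples; the contents are placeholders.
        (root / "20-no-filler.md").write_text("Skip filler phrases.\n")
        (root / "10-no-sycophancy.md").write_text("Do not flatter the user.\n")
        print(build_injection(root))
```

Under this scheme, adding, removing, or reordering a rule is a file operation: renaming a file to a lower prefix moves it earlier in the injected text.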

context-engine — Context Management

Progressive 5-stage context pressure tracking replaces fixed message-count thresholds. Warns at ~50% (re-read files), ~65% (consider compact), ~75% (recommend compact), ~85% (recommend handoff), ~92% (critical — handoff now). Automatically uses actual context percentage when Claude Code exposes it.
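
The five warning stages listed above amount to a threshold lookup, sketched below in Python. The thresholds and advice strings come from the description; treating the approximate percentages as exact, inclusive cutoffs is an assumption, as are the names `pressure_stage` and `effective_pct`. The plugin's actual shell implementation may differ.

```python
from bisect import bisect_right
from typing import Optional, Tuple

# (threshold percent, advice) pairs taken from the five stages in the README.
# The README gives these as approximate values; exact cutoffs are an assumption.
STAGES = [
    (50, "re-read files"),
    (65, "consider compact"),
    (75, "recommend compact"),
    (85, "recommend handoff"),
    (92, "critical: handoff now"),
]
THRESHOLDS = [threshold for threshold, _ in STAGES]


def pressure_stage(context_pct: float) -> Tuple[int, Optional[str]]:
    """Return (stage number, advice); stage 0 means no warning yet."""
    stage = bisect_right(THRESHOLDS, context_pct)
    return stage, (STAGES[stage - 1][1] if stage else None)


def effective_pct(actual_pct: Optional[float], estimated_pct: float) -> float:
    """Prefer the real context percentage when available, else fall back to an estimate."""
    return actual_pct if actual_pct is not None else estimated_pct


if __name__ == "__main__":
    for pct in (40, 50, 70, 88, 95):
        print(pct, pressure_stage(pct))
```

Keeping the stages in one ordered table means a threshold can be adjusted without touching the lookup logic.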
