Skill

bare-eval

From ork

Runs isolated claude --bare calls for skill evaluation, output grading, trigger testing, and prompt benchmarking, free of plugin interference.

Invocation

How this skill is triggered — by the user, by Claude, or both

Slash command

/ork:bare-eval

User invocable

Model invocable

Inline context

Default effort

Context Preview

The summary Claude sees in its skill listing — used to decide when to auto-load this skill

Run `claude -p --bare` for fast, clean eval/grading without plugin overhead.

Supporting Files

references/grading-schemas.md
references/invocation-patterns.md
references/troubleshooting.md
rules/_sections.md
rules/bare-grading-only.md
rules/bare-plugin-conflict.md
rules/bare-requires-api-key.md
test-cases.json

SKILL.md

105 lines · ~770 tokens

Stats

Language: TypeScript
Parent stars: 172
Parent forks: 15
Maintenance: Excellent
Last commit: Mar 29, 2026

Bare Eval — Isolated Evaluation Calls

Run claude -p --bare for fast, clean eval/grading without plugin overhead.

Requires Claude Code (CC) 2.1.81 or later. The --bare flag skips hooks, LSP, plugin sync, and skill directory walks.

When to Use

Grading skill outputs against assertions
Trigger classification (which skill matches a prompt)
Description optimization iterations
Any scripted -p call that doesn't need plugins

When NOT to Use

Testing skill routing (needs --plugin-dir)
Testing agent orchestration (needs full plugin context)
Interactive sessions

Prerequisites

# --bare requires ANTHROPIC_API_KEY (OAuth/keychain disabled)
export ANTHROPIC_API_KEY="sk-ant-..."

# Verify CC version
claude --version  # Must be >= 2.1.81
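
The two prerequisites above can be checked in one preflight step before a batch of bare calls. The following is a minimal sketch, not part of the skill itself: the helper names (`version_ge`, `extract_version`, `preflight`) are hypothetical, and it assumes `claude --version` prints a line that starts with an X.Y.Z version number.

```shell
# Preflight sketch for --bare calls. All helper names are hypothetical.
MIN_CC_VERSION="2.1.81"

# Succeeds when dotted version $1 is >= dotted version $2.
# Compares the first three numeric fields; missing fields count as 0.
version_ge() {
  awk -v have="$1" -v want="$2" 'BEGIN {
    split(have, h, "."); split(want, w, ".")
    for (i = 1; i <= 3; i++) {
      if (h[i] + 0 > w[i] + 0) exit 0
      if (h[i] + 0 < w[i] + 0) exit 1
    }
    exit 0
  }'
}

# Prints the leading X.Y.Z token of a version string (assumed output format).
extract_version() {
  printf '%s\n' "$1" | sed -nE 's/^[^0-9]*([0-9]+\.[0-9]+\.[0-9]+).*/\1/p' | head -n 1
}

# Returns non-zero with a message when either prerequisite is not met.
preflight() {
  if [ -z "${ANTHROPIC_API_KEY:-}" ]; then
    echo "ANTHROPIC_API_KEY is not set; --bare cannot authenticate" >&2
    return 1
  fi
  found=$(extract_version "$(claude --version)")
  if ! version_ge "$found" "$MIN_CC_VERSION"; then
    echo "Claude Code ${found:-unknown} is older than $MIN_CC_VERSION" >&2
    return 1
  fi
}
```

Calling `preflight || exit 1` at the top of an eval script stops early rather than failing on the first bare call.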

Quick Reference

| Call Type | Command Pattern |
|---|---|
| Grading | `claude -p "$prompt" --bare --max-turns 1 --output-format text` |
| Trigger | `claude -p "$prompt" --bare --json-schema "$schema" --output-format json` |
| Optimize | `echo "$prompt" \| claude -p --bare --max-turns 1 --output-format text` |
| Force-skill | `claude -p "$prompt" --bare --print --append-system-prompt "$content"` |
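
As an illustration of the Grading pattern, the sketch below wraps one bare call in a function that maps the reply to an exit code. The prompt layout and the PASS/FAIL first-line convention are assumptions made for this example; they are not taken from references/grading-schemas.md, and the function names are hypothetical.

```shell
# Hypothetical helper: builds a grading prompt from an assertion and an output.
build_grading_prompt() {
  printf 'Assertion: %s\n\nOutput to grade:\n%s\n\nReply with PASS or FAIL on the first line.\n' "$1" "$2"
}

# Runs one isolated grading call using the Grading command pattern.
# Returns 0 on PASS, 1 on FAIL, 2 when the verdict cannot be parsed.
grade_output() {
  prompt=$(build_grading_prompt "$1" "$2")
  # Only the first line of the reply is treated as the verdict.
  verdict=$(claude -p "$prompt" --bare --max-turns 1 --output-format text | head -n 1)
  case "$verdict" in
    PASS*) return 0 ;;
    FAIL*) return 1 ;;
    *)     echo "unparseable verdict: $verdict" >&2; return 2 ;;
  esac
}
```

A caller might write `grade_output "mentions the --bare flag" "$skill_output" && echo "assertion passed"`. Keeping the unparseable case separate from FAIL avoids counting a malformed reply as a failed assertion.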

Invocation Patterns

Load detailed patterns and examples:

Read("${CLAUDE_SKILL_DIR}/references/invocation-patterns.md")

Grading Schemas

JSON schemas for structured eval output:

Read("${CLAUDE_SKILL_DIR}/references/grading-schemas.md")

Pipeline Integration

OrchestKit's eval scripts (npm run eval:skill) auto-detect bare mode:

# eval-common.sh detects ANTHROPIC_API_KEY → sets BARE_MODE=true
# Scripts add --bare to all non-plugin calls automatically

Bare calls: trigger classification, force-skill, baseline, and all grading.
Never bare: run_with_skill (it needs plugin context for routing tests).
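
The detection described above could look roughly like the following. This is a sketch under stated assumptions, not the actual contents of eval-common.sh: the function names and the call-type labels (`trigger`, `force-skill`, `baseline`, `grading`) are hypothetical identifiers chosen to mirror the list above.

```shell
# Sets BARE_MODE based on whether an API key is available.
detect_bare_mode() {
  if [ -n "${ANTHROPIC_API_KEY:-}" ]; then
    BARE_MODE=true
  else
    BARE_MODE=false
  fi
}

# Prints "--bare" for call types that can run without plugins, nothing otherwise.
# run_with_skill never gets the flag because it needs plugin context for routing.
bare_flag_for() {
  call_type=$1
  [ "${BARE_MODE:-false}" = true ] || return 0
  case "$call_type" in
    trigger|force-skill|baseline|grading) echo "--bare" ;;
    *) ;;  # run_with_skill and unknown call types stay in full plugin mode
  esac
}
```

A call site could then read `claude -p "$prompt" $(bare_flag_for grading) --max-turns 1 --output-format text`, with the substitution left unquoted so that an empty result adds no argument.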

Performance

| Scenario | Without --bare | With --bare | Savings |
|---|---|---|---|
| Single grading call | ~3-5s startup | ~0.5-1s | 2-4x |
| Trigger (per prompt) | ~3-5s | ~0.5-1s | 2-4x |
| Full eval (50 calls) | ~150-250s overhead | ~25-50s | 3-5x |
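
The full-eval row follows from multiplying the per-call startup range by the number of calls. A small sketch of that arithmetic, using a hypothetical `estimate_overhead` helper:

```shell
# Prints the startup-overhead range for N calls, given the per-call
# minimum and maximum startup time in seconds.
estimate_overhead() {
  awk -v n="$1" -v lo="$2" -v hi="$3" 'BEGIN { printf "%g-%gs\n", n * lo, n * hi }'
}

estimate_overhead 50 3 5      # prints 150-250s (without --bare)
estimate_overhead 50 0.5 1    # prints 25-50s (with --bare)
```

The same helper gives a rough estimate for other batch sizes, assuming per-call startup stays in the ranges from the table.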

Rules

Read("${CLAUDE_SKILL_DIR}/rules/_sections.md")

Troubleshooting

Read("${CLAUDE_SKILL_DIR}/references/troubleshooting.md")

Related

eval:skill npm script — unified skill evaluation runner
eval:trigger — trigger accuracy testing
eval:quality — A/B quality comparison
optimize-description.sh — iterative description improvement
Version compatibility: doctor/references/version-compatibility.md
