From hyperagents
Create and configure domain-specific evaluation harnesses for the HyperAgents evolution loop. Defines how tasks are loaded, agents are invoked, predictions are collected, and scores are computed. Triggers when setting up evaluation domains or creating custom fitness functions.
How this skill is triggered — by the user, by Claude, or both
Slash command
/hyperagents:domain-harnessThe summary Claude sees in its skill listing — used to decide when to auto-load this skill
The harness is the bridge between the HyperAgents evolution loop and domain-specific evaluation. It defines how to load tasks, run the agent, collect predictions, and compute fitness scores.
The harness is the bridge between the HyperAgents evolution loop and domain-specific evaluation. It defines how to load tasks, run the agent, collect predictions, and compute fitness scores.
┌──────────────┐ ┌─────────────┐ ┌──────────────┐
│ Task List │────>│ Harness │────>│ Predictions │
│ (input) │ │ (executor) │ │ (output) │
└──────────────┘ └──────┬──────┘ └──────┬───────┘
│ │
┌──────▼──────┐ ┌──────▼───────┐
│ Task Agent │ │ Reporter │
│ (modified) │ │ (scorer) │
└─────────────┘ └──────┬───────┘
│
┌──────▼───────┐
│ report.json │
│ (fitness) │
└──────────────┘
Create a task list JSON file with evaluation items:
[
{
"question_id": "task_001",
"input": "Write a function that reverses a string",
"expected": "def reverse_string(s): return s[::-1]"
}
]
Place at .hyperagents/domains/<domain>/harness.sh:
#!/bin/bash
# Domain harness script
# Args: --task-list <path> --agent-path <path> --output-dir <path> --num-samples <n>
TASK_LIST=$2
AGENT_PATH=$4
OUTPUT_DIR=$6
NUM_SAMPLES=$8
# Load tasks
# Run agent on each task
# Collect predictions
# Write predictions.csv to OUTPUT_DIR
Place at .hyperagents/domains/<domain>/report.sh:
#!/bin/bash
# Domain reporter script
# Args: --output-dir <path>
OUTPUT_DIR=$2
# Read predictions.csv
# Compare to expected outputs
# Compute score
# Write report.json
Add to .hyperagents/config.json:
{
"domains": {
"my_domain": {
"harness": ".hyperagents/domains/my_domain/harness.sh",
"reporter": ".hyperagents/domains/my_domain/report.sh",
"score_key": "accuracy",
"splits": ["train", "val"],
"can_ensemble": true,
"staged_eval_fraction": 0.1,
"staged_eval_samples": 10
}
}
}
claude-skill — Skill QualityEvaluates a Claude Code skill by:
claude-agent — Agent EffectivenessEvaluates a Claude Code agent by:
claude-hook — Hook ReliabilityEvaluates a Claude Code hook by:
code-quality — General Code QualityCombines multiple signals:
| Field | Type | Description |
|---|---|---|
harness | string | Path to harness script |
reporter | string | Path to reporter script |
score_key | string | JSON key in report.json for the fitness score |
splits | string[] | Evaluation splits: train, val, test |
can_ensemble | boolean | Whether ensemble evaluation makes sense |
staged_eval_fraction | number | Fraction of samples for staged eval |
staged_eval_samples | number | Absolute number of samples for staged eval |
eval_timeout | number | Timeout in seconds per task |
max_workers | number | Parallel evaluation workers |
HyperAgents supports evolving against multiple domains simultaneously:
Creates, edits, and optimizes skills for Claude Code, including drafting, evaluating with test prompts, iterating on performance, and improving skill descriptions for better triggering accuracy.
npx claudepluginhub zpankz/hyperagents-plugin