Skill

agent-eval-trend

Persist Salesforce agent evaluation scores over time and surface regressions per agent and per axis (factuality, completeness, tone, refusal-correctness, action-correctness). This is the agent-eval counterpart to /argo:coverage-trend.

Invocation

How this skill is triggered — by the user, by Claude, or both

Slash command

/argo:agent-eval-trend

User invocable

Model invocable

Inline context

Default effort

Context Preview

The summary Claude sees in its skill listing — used to decide when to auto-load this skill

You are tracking **agent evaluation scores** over time. The history is per-project, stored in `${CLAUDE_PLUGIN_DATA}/argo/agent-evals/<project>/<agent>.jsonl` (one JSON line per `/argo:agent-test` run).
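
The one-JSON-line-per-run history described above could be sketched roughly as follows. This is a minimal illustration, not the skill's actual implementation: the record shape (a `scores` dict keyed by axis), the helper names, and the `tolerance` parameter are all assumptions made for the example. Only the path layout and the axis names come from the description.

```python
import json
import os
from pathlib import Path
from typing import Dict, List, Optional

# Axes named in the skill description.
AXES = ("factuality", "completeness", "tone",
        "refusal-correctness", "action-correctness")


def history_path(project: str, agent: str, root: Optional[str] = None) -> Path:
    """Build <root>/argo/agent-evals/<project>/<agent>.jsonl.

    `root` defaults to the CLAUDE_PLUGIN_DATA environment variable.
    """
    root = root or os.environ.get("CLAUDE_PLUGIN_DATA", ".")
    return Path(root) / "argo" / "agent-evals" / project / f"{agent}.jsonl"


def append_run(path: Path, record: dict) -> None:
    """Append one run as a single JSON line (hypothetical record schema)."""
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record, sort_keys=True) + "\n")


def load_runs(path: Path) -> List[dict]:
    """Read every non-empty line back as a dict, oldest first."""
    if not path.exists():
        return []
    with path.open(encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]


def regressions(runs: List[dict], tolerance: float = 0.0) -> Dict[str, float]:
    """Return axes whose score dropped by more than `tolerance`
    between the two most recent runs (assumed comparison rule)."""
    if len(runs) < 2:
        return {}
    prev, curr = runs[-2]["scores"], runs[-1]["scores"]
    drops: Dict[str, float] = {}
    for axis in AXES:
        if axis in prev and axis in curr:
            delta = curr[axis] - prev[axis]
            if delta < -tolerance:
                drops[axis] = round(delta, 4)
    return drops
```

Appending rather than rewriting the file keeps each run independent, so a corrupted or partial line affects only that run. The comparison here looks at the last two runs only; a real trend view would likely consider a longer window.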

SKILL.md

121 lines · ~1.1k tokens

Stats

Language: Shell

Stars: 0

Forks: 1

Maintenance: Excellent

Last Commit: May 30, 2026

Read Project Config First

Input

Steps

`show [--last N] [<agent>]`

`diff [--vs <ref>] [<agent>]`

`pr [<agent>]`

`record <agent> <result-json>`

Exit codes

Rules

Consumers

Similar Skills
