Skill

Run Evaluation (PPAPI)

Runs evaluations on Copilot Studio draft agents via Power Platform Evaluation API. Lists test sets, starts/polls runs, fetches results, proposes YAML fixes. Use to test changes without publishing.

Bash

Node

testing

Invocation

How this skill is triggered — by the user, by Claude, or both

Slash command

/copilot-studio:run-eval

Not user invocable

Model invocable

Forked subagent

Default effort

Configuration

Agent: copilot-studio-test

Tool Access

This skill is limited to the following tools:

Bash(node *eval-api.bundle.js *)
Bash(node *manage-agent.bundle.js push *)
Bash(node *manage-agent.bundle.js pull *)
Read
Glob
Grep
Edit

Context Preview

The summary Claude sees in its skill listing — used to decide when to auto-load this skill

Run evaluations against a Copilot Studio agent's **draft** — no publish needed.

SKILL.md

90 lines · ~922 tokens

Stats

Language: JavaScript

Stars: 245

Forks: 55

Maintenance: Excellent

Last Commit: May 13, 2026

Run Evaluation (PPAPI)

Run evaluations against a Copilot Studio agent's draft — no publish needed.

The caller (test agent) must provide --client-id and --workspace. If you don't have the client ID, return immediately and tell the caller to run test-auth first.

All eval-api commands run in the foreground. NEVER use run_in_background.

Step 1: List test sets and let the user choose

node ${CLAUDE_SKILL_DIR}/../../scripts/eval-api.bundle.js list-testsets --workspace <path> --client-id <id>

- **No test sets found:** Tell the user to create one in Copilot Studio (Evaluate tab > New evaluation). Stop.
- **One test set:** Tell the user which one you're using and proceed.
- **Multiple test sets:** Show them all and ask the user to pick. Do not proceed until they answer.
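The three branches above can be sketched as a small decision helper. This is only an illustration: `chooseTestSet` is a hypothetical name, and the `{ id, name }` shape of each test set is an assumption, since the actual `list-testsets` output format is not shown here.

```javascript
// Hypothetical helper. `testSets` is assumed to be an array of { id, name }
// objects parsed from the list-testsets output; the real shape may differ.
function chooseTestSet(testSets) {
  if (testSets.length === 0) {
    // Nothing to run: the user has to create a test set first.
    return {
      action: "stop",
      message: "No test sets found. Create one in Copilot Studio (Evaluate tab > New evaluation).",
    };
  }
  if (testSets.length === 1) {
    // Only one candidate: announce it and continue.
    return {
      action: "proceed",
      testSet: testSets[0],
      message: `Using test set: ${testSets[0].name}`,
    };
  }
  // Several candidates: the caller must wait for the user's choice.
  return {
    action: "ask",
    options: testSets,
    message: "Multiple test sets found. Which one should be used?",
  };
}
```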

Step 2: Ask about authenticated execution — MANDATORY, do not skip

You MUST ask this question and wait for the user's answer before starting the run.

Ask the user:

Does your agent use authenticated knowledge sources or connector actions (tools) that require user identity? If so, you'll need to provide a connection ID — without it, the eval runs anonymously and tools and knowledge sources will not be used.

How to obtain the connection ID:

1. Go to https://make.powerautomate.com
2. Open Connections from the side menu
3. Select the relevant Microsoft Copilot Studio connection
4. Copy the connection ID from the URL (the GUID segment after /connections/)

If your agent doesn't use authenticated knowledge or tools, you can skip this.

Do not proceed to Step 3 until the user responds.
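If the user pastes the whole connection URL rather than the bare ID, the GUID can be pulled out mechanically. The sketch below assumes the ID is a standard 8-4-4-4-12 hexadecimal GUID appearing as its own path segment somewhere after `/connections/`. The function name `extractConnectionId` is hypothetical and the exact URL layout is an assumption.

```javascript
// Matches a standard 8-4-4-4-12 hexadecimal GUID (assumed ID format).
const GUID = /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;

// Hypothetical helper: returns the first GUID-shaped path segment found
// after "/connections/", or null when there is none.
function extractConnectionId(url) {
  const marker = "/connections/";
  const start = url.indexOf(marker);
  if (start === -1) return null;
  // Split the remainder on path, query, and fragment separators.
  const segments = url.slice(start + marker.length).split(/[/?#]/);
  return segments.find((segment) => GUID.test(segment)) ?? null;
}
```

Returning `null` instead of throwing lets the caller fall back to asking the user for the ID directly.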

Step 3: Start the run

node ${CLAUDE_SKILL_DIR}/../../scripts/eval-api.bundle.js start-run --workspace <path> --client-id <id> --testset-id <id> --run-name "Draft eval <date>"

Add --connection-id <id> if the user provided a connection ID in Step 2.

Add --published only if the user explicitly asked for published-bot testing.
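The optional-flag rules above can be expressed as an argument builder. This is a sketch, not part of the skill's scripts: `buildStartRunArgs` and its option names are hypothetical, and only the flags named in this step are used.

```javascript
// Hypothetical helper: assembles the start-run arguments described in Step 3.
function buildStartRunArgs({ workspace, clientId, testsetId, runName, connectionId, published = false }) {
  const args = [
    "start-run",
    "--workspace", workspace,
    "--client-id", clientId,
    "--testset-id", testsetId,
    "--run-name", runName,
  ];
  // Only pass a connection ID when the user supplied one in Step 2.
  if (connectionId) args.push("--connection-id", connectionId);
  // Draft is the default target; --published is strictly opt-in.
  if (published) args.push("--published");
  return args;
}
```

Keeping the arguments as an array avoids shell-quoting problems with run names that contain spaces, such as "Draft eval <date>".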

Step 4: Poll until complete

node ${CLAUDE_SKILL_DIR}/../../scripts/eval-api.bundle.js get-run --workspace <path> --client-id <id> --run-id <runId>

Poll every 15–30 seconds. Report progress: "Processing: 3/10 test cases..."

Stop when state is Completed, Failed, Abandoned, or Cancelled.
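The polling rule can be sketched as a loop that stops on any of the four terminal states. `pollUntilTerminal` is a hypothetical name, and `getRun` is an injected function assumed to resolve to an object with a `state` field. How `get-run` output is parsed into that object is left out.

```javascript
// Terminal states listed in Step 4.
const TERMINAL_STATES = new Set(["Completed", "Failed", "Abandoned", "Cancelled"]);

// Hypothetical helper: calls getRun() repeatedly until the run reaches a
// terminal state, reporting progress and waiting intervalMs between polls.
async function pollUntilTerminal(getRun, { intervalMs = 15000, onProgress = () => {} } = {}) {
  for (;;) {
    const run = await getRun();
    if (TERMINAL_STATES.has(run.state)) return run;
    onProgress(run); // e.g. print "Processing: 3/10 test cases..."
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
}
```

The default of 15000 ms sits at the low end of the 15–30 second range. Each poll is awaited in sequence, so the loop stays in the foreground as required above.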

Step 5: Fetch and analyze results

node ${CLAUDE_SKILL_DIR}/../../scripts/eval-api.bundle.js get-results --workspace <path> --client-id <id> --run-id <runId>

Present a summary table (total, passed, failed, errors). For failures:

| Metric | What to check |
| --- | --- |
| `GeneralQuality` Fail | Which of relevance/completeness/groundedness/abstention failed |
| `ExactMatch` Fail | Score 0.0–1.0 |
| `CapabilityUse` Fail | `missingInvocationSteps` |
| `Error` status | `errorReason` (often a test set config issue, not a YAML issue) |
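The counts for the summary table can be derived with a single pass over the results. This sketch assumes each result carries a `status` of `"Pass"`, `"Fail"`, or `"Error"`. That shape and the name `summarizeResults` are hypothetical, since the actual `get-results` output format is not shown here.

```javascript
// Hypothetical helper: `results` is assumed to be an array of objects with a
// `status` field of "Pass", "Fail", or "Error"; the real output may differ.
function summarizeResults(results) {
  const summary = { total: results.length, passed: 0, failed: 0, errors: 0 };
  for (const result of results) {
    if (result.status === "Pass") summary.passed += 1;
    else if (result.status === "Fail") summary.failed += 1;
    else if (result.status === "Error") summary.errors += 1;
    // Any other status is counted in `total` only.
  }
  return summary;
}
```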

Step 6: Propose fixes (if failures found)

For YAML authoring failures: find the relevant topic, read it, propose specific edits. Wait for user approval before applying.

After applying: offer to push and re-run (go back to Step 3).
