From copilot-studio
Runs evaluations on Copilot Studio draft agents via the Power Platform Evaluation API. Lists test sets, starts and polls runs, fetches results, and proposes YAML fixes. Use it to test changes without publishing.
Slash command: /copilot-studio:run-eval
Agent: copilot-studio-test
Run evaluations against a Copilot Studio agent's **draft** — no publish needed.
The caller (test agent) must provide --client-id and --workspace. If you don't have the client ID, return immediately and tell the caller to run test-auth first.
All eval-api commands run in the foreground. NEVER use run_in_background.
Step 1: List test sets
node ${CLAUDE_SKILL_DIR}/../../scripts/eval-api.bundle.js list-testsets --workspace <path> --client-id <id>
Step 2: Ask about authenticated resources
You MUST ask this question and wait for the user's answer before starting the run.
Ask the user:
Does your agent use authenticated knowledge sources or connector actions (tools) that require user identity? If so, you'll need to provide a connection ID — without it, the eval runs anonymously and tools and knowledge sources will not be used.
How to obtain the connection ID:
- Go to https://make.powerautomate.com
- Open Connections from the side menu
- Select the relevant Microsoft Copilot Studio connection
- Copy the connection ID from the URL (the GUID segment after /connections/)
If your agent doesn't use authenticated knowledge or tools, you can skip this.
Do not proceed to Step 3 until the user responds.
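The "copy the GUID segment after /connections/" step can be sketched in Python as below. This is a minimal illustration, not part of the skill: the function name `extract_connection_id` is hypothetical, and it assumes the connection ID is a standard hyphenated 8-4-4-4-12 GUID appearing somewhere after `/connections/` in the URL.

```python
import re
from typing import Optional

# Assumption: the connection ID is a standard hyphenated GUID (8-4-4-4-12 hex digits).
GUID_RE = re.compile(
    r"[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}"
)


def extract_connection_id(url: str) -> Optional[str]:
    """Return the first GUID found after '/connections/' in the URL, or None."""
    # partition() splits on the first occurrence of the marker;
    # `found` is empty when the marker is absent.
    _, found, tail = url.partition("/connections/")
    if not found:
        return None
    match = GUID_RE.search(tail)
    return match.group(0) if match else None
```

If the function returns None, the safest fallback is to ask the user to paste the ID directly rather than guess.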
Step 3: Start the run
node ${CLAUDE_SKILL_DIR}/../../scripts/eval-api.bundle.js start-run --workspace <path> --client-id <id> --testset-id <id> --run-name "Draft eval <date>"
Add --connection-id <id> if the user provided a connection ID in Step 2.
Add --published only if the user explicitly asked for published-bot testing.
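The optional-flag rules above can be sketched as a small argument builder. This is an illustrative sketch only: `build_start_run_args` is a hypothetical helper, and the script path is passed in by the caller rather than resolved from `${CLAUDE_SKILL_DIR}`. The flag names are the ones shown in the command above.

```python
from typing import List, Optional


def build_start_run_args(
    script: str,
    workspace: str,
    client_id: str,
    testset_id: str,
    run_name: str,
    connection_id: Optional[str] = None,
    published: bool = False,
) -> List[str]:
    """Build the argv list for the start-run command."""
    args = [
        "node", script, "start-run",
        "--workspace", workspace,
        "--client-id", client_id,
        "--testset-id", testset_id,
        "--run-name", run_name,
    ]
    # Only pass a connection ID when the user supplied one.
    if connection_id:
        args += ["--connection-id", connection_id]
    # Draft testing is the default; --published is strictly opt-in.
    if published:
        args.append("--published")
    return args
```

Passing the resulting list to something like `subprocess.run(args, check=True)` keeps the command in the foreground and avoids shell-quoting problems with the run name.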
Step 4: Poll the run
node ${CLAUDE_SKILL_DIR}/../../scripts/eval-api.bundle.js get-run --workspace <path> --client-id <id> --run-id <runId>
Poll every 15-30 seconds. Report progress: "Processing: 3/10 test cases..."
Stop when state is Completed, Failed, Abandoned, or Cancelled.
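The polling rule can be sketched as a loop that stops on any of the four terminal states. This is a sketch under assumptions: `poll_until_done` and `fetch_run` are hypothetical names, and the run is assumed to be available as a dict with a `state` key. The actual shape of the get-run output is not specified here.

```python
import time
from typing import Any, Callable, Dict, Optional

# Terminal states listed in the instructions above.
TERMINAL_STATES = {"Completed", "Failed", "Abandoned", "Cancelled"}


def poll_until_done(
    fetch_run: Callable[[], Dict[str, Any]],
    interval_seconds: float = 15.0,
    sleep: Callable[[float], None] = time.sleep,
    max_polls: Optional[int] = None,
) -> Dict[str, Any]:
    """Call fetch_run() repeatedly until the run reaches a terminal state."""
    polls = 0
    while True:
        run = fetch_run()  # hypothetical: wraps the get-run command
        polls += 1
        if run.get("state") in TERMINAL_STATES:
            return run
        # Optional safety cap so a stuck run cannot loop forever.
        if max_polls is not None and polls >= max_polls:
            raise TimeoutError(f"run still not finished after {polls} polls")
        sleep(interval_seconds)
```

The `sleep` parameter is injectable so the loop can be exercised without real delays. In normal use the default `time.sleep` applies, with an interval in the 15-30 second range.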
Step 5: Fetch and present results
node ${CLAUDE_SKILL_DIR}/../../scripts/eval-api.bundle.js get-results --workspace <path> --client-id <id> --run-id <runId>
Present a summary table (total, passed, failed, errors). For failures:
| Metric | What to check |
|---|---|
| GeneralQuality Fail | Which of relevance/completeness/groundedness/abstention failed |
| ExactMatch Fail | Score 0.0–1.0 |
| CapabilityUse Fail | missingInvocationSteps |
| Error status | errorReason (often a test set config issue, not a YAML issue) |
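The summary counts and the per-metric triage from the table can be sketched as follows. The record shape is an assumption made for illustration: each result is treated as a dict with a `status` of `"Pass"`, `"Fail"`, or `"Error"` and an optional `metric` name. The real get-results output may differ, and `summarize` and `triage_hint` are hypothetical helpers.

```python
from collections import Counter
from typing import Any, Dict, Iterable, Mapping

# What to inspect per failing metric, copied from the table above.
WHAT_TO_CHECK = {
    "GeneralQuality": "which of relevance/completeness/groundedness/abstention failed",
    "ExactMatch": "score (0.0-1.0)",
    "CapabilityUse": "missingInvocationSteps",
}


def summarize(results: Iterable[Mapping[str, Any]]) -> Dict[str, int]:
    """Count total, passed, failed, and errored test cases."""
    counts = Counter(r["status"] for r in results)  # assumed 'status' key
    return {
        "total": sum(counts.values()),
        "passed": counts["Pass"],
        "failed": counts["Fail"],
        "errors": counts["Error"],
    }


def triage_hint(result: Mapping[str, Any]) -> str:
    """Return what to look at for a non-passing result, or '' for a pass."""
    if result["status"] == "Error":
        return "errorReason (often a test set config issue, not a YAML issue)"
    if result["status"] == "Fail":
        return WHAT_TO_CHECK.get(result.get("metric", ""), "unrecognized metric")
    return ""
```

Routing Error results to `errorReason` first keeps test set configuration problems from being mistaken for YAML authoring failures.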
For YAML authoring failures: find the relevant topic, read it, propose specific edits. Wait for user approval before applying.
After applying: offer to push and re-run (go back to Step 3).
npx claudepluginhub microsoft/skills-for-copilot-studio --plugin copilot-studio