Name: agentops-toolkit
Author: azure

AgentOps Toolkit

A CLI, local Cockpit, and agent skills that help teams operationalize AI agents on Microsoft Foundry with standardized evaluation, observability, tracing, and operational practices.

Overview

AgentOps Toolkit combines a CLI, a local Cockpit, and agent skills to help teams move Microsoft Foundry agents from demo or proof of concept to production, using standardized evaluation, CI/CD gates, readiness diagnostics, release evidence, and trace-driven regression loops.

The project enables:

Consistent local and CI execution of agent evaluations

Automatic evaluator selection based on dataset shape (RAG, agent-with-tools, model quality)

Stable machine-readable outputs for automation

Human-readable reports for PR reviews and quality gates

Baseline comparison to detect regressions across runs

Doctor readiness analysis for repo, CI/CD, telemetry, AI Landing Zone, production release evidence, and Foundry configuration

Release evidence packs (evidence.json + evidence.md) that summarize whether a candidate is ready, warning-only, or blocked

Trace-to-dataset promotion so reviewed production conversations become future regression rows

A local Cockpit that brings AgentOps artifacts together with Foundry and Azure Monitor navigation
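To make the baseline comparison idea concrete, here is a minimal sketch of how a regression check between two runs could work. It is not the toolkit's implementation: the function name `find_regressions`, the metric names, the flat name-to-score dictionaries, and the `tolerance` parameter are all assumptions made for illustration, and it assumes higher scores are better.

```python
from typing import Dict, List


def find_regressions(
    baseline: Dict[str, float],
    current: Dict[str, float],
    tolerance: float = 0.0,
) -> List[str]:
    """Return the metrics whose score dropped by more than `tolerance`.

    Hypothetical helper: assumes each run is a flat mapping of metric
    name to score and that higher is better.
    """
    regressions = []
    for name, base_score in baseline.items():
        # Metrics absent from the current run are skipped in this sketch
        # rather than reported as regressions.
        if name not in current:
            continue
        if base_score - current[name] > tolerance:
            regressions.append(name)
    return sorted(regressions)


# Illustrative scores only; these are not real evaluation results.
baseline_run = {"groundedness": 4.2, "relevance": 4.5}
current_run = {"groundedness": 3.9, "relevance": 4.5}
print(find_regressions(baseline_run, current_run, tolerance=0.1))
```

A tolerance keeps small run-to-run noise from being flagged; a real gate would also need a policy for metrics that appear in only one of the two runs.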

How AgentOps complements Microsoft Foundry

AgentOps is not a replacement for Microsoft Foundry. Foundry remains the system of record for hosted agents, cloud evaluations, traces, runtime monitoring, red teaming, datasets, operations, and Azure resource posture. AgentOps adds the repo-side developer workflow around those capabilities: configuration, repeatable eval gates, CI/CD wiring, local artifacts, Doctor diagnostics, and a Cockpit that points back to the right Foundry or Azure Monitor surface.

| Surface | Microsoft Foundry provides | AgentOps Toolkit provides |
| --- | --- | --- |
| Agent runtime | Hosted agents, model deployments, traces, monitor views | Target resolution, local/CI eval invocation, normalized run artifacts |
| Evaluations | Cloud eval execution, Foundry evaluation reports, dataset assets | Source-controlled eval config, threshold gates, PR reports, baseline comparison |
| Observability | Foundry Monitor, App Insights, traces, operations views | Telemetry wiring, CI eval spans, Doctor finding spans, Cockpit deep links |
| Operations | Active alerts, red teaming, runtime health, Azure resources | Readiness checklist, workflow generation, repo/CI hygiene checks, release evidence |
| Continuous improvement | Traces, datasets, online evaluation signals | Reviewed trace-to-dataset candidates and regression gates |
| Developer workflow | Portal experience and Azure platform services | CLI-first automation that teams can run locally and in CI |

The design goal is simple: AgentOps accelerates adoption of Foundry by making the developer workflow repeatable, observable, and CI-friendly; it does not duplicate Foundry's portal experience.

Core outputs:

results.json (machine-readable)

report.md (human-readable)

evidence.json / evidence.md (release-readiness evidence, when generated by agentops doctor --evidence-pack)

Exit code contract:

0: execution succeeded and all thresholds passed

2: execution succeeded but one or more thresholds failed

1: runtime or configuration error
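The contract above lets a CI wrapper tell a failed quality gate apart from a broken run. The sketch below shows one way a script might act on it. The helper names `interpret_exit_code` and `run_gate` and the outcome labels are assumptions for illustration, and the demo uses a stand-in Python command rather than a real toolkit invocation.

```python
import subprocess
import sys
from typing import List

# Mapping taken from the exit code contract above; the labels are hypothetical.
OUTCOMES = {
    0: "passed",             # execution succeeded, all thresholds passed
    2: "thresholds_failed",  # execution succeeded, one or more thresholds failed
    1: "error",              # runtime or configuration error
}


def interpret_exit_code(code: int) -> str:
    """Translate a process exit code into an outcome label."""
    return OUTCOMES.get(code, "unknown")


def run_gate(command: List[str]) -> str:
    """Run a command and classify its result by exit code."""
    # check=False so a non-zero exit code is returned instead of raising.
    completed = subprocess.run(command, check=False)
    return interpret_exit_code(completed.returncode)


if __name__ == "__main__":
    # Stand-in command that exits with code 2; replace it with the actual
    # evaluation command used in your pipeline.
    demo_command = [sys.executable, "-c", "import sys; sys.exit(2)"]
    print(run_gate(demo_command))
```

Keeping threshold failures (2) separate from errors (1) means a pipeline can block a merge on a quality regression while handling a misconfiguration differently, for example by retrying or alerting.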

Quickstart

1) Install