By raskip
10 example skills for diagnosing and remediating Azure Virtual Machine issues (Linux + Windows).
Verification procedure for Azure VM backup health and compliance. Use when checking if VMs are properly backed up, verifying recovery points, auditing backup compliance, investigating failed backup alerts, or assessing backup coverage across a resource group or subscription.
Expand VM disks (OS or data) when disk space is running low. Covers both Linux and Windows VMs. Use when a disk space alert fires, a user reports a full disk, or monitoring shows disk usage above 90%. Handles Azure managed disk resize, partition expansion, and filesystem growth. Always snapshots before changes.
Investigation procedure for disk IOPS and throughput throttling on Azure Virtual Machines. Covers both Linux and Windows VMs, Premium SSD, Standard SSD, and Ultra Disk configurations. Use when a VM shows slow disk performance, high IO wait, disk latency spikes, IOPS or throughput throttling, or when an application is slow but CPU and memory appear normal.
Troubleshooting procedure for high CPU usage on Azure Virtual Machines. Covers both Linux and Windows VMs. Use when a VM shows sustained high CPU, a CPU alert fires, or a user reports slow VM performance related to CPU.
Troubleshooting procedure for high memory usage and OOM (Out of Memory) conditions on Azure Virtual Machines. Covers both Linux and Windows VMs. Use when a VM shows high memory utilization, OOM kills occur, swap pressure is elevated, or a user reports slow VM performance related to memory exhaustion.
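As a rough illustration of the kind of check the disk-expansion skill describes (disk usage above 90%), here is a minimal shell sketch. It is not taken from the repo: the function name `fs_over_threshold` is hypothetical, and it assumes POSIX-format `df -P` output with mount points that contain no spaces.

```shell
#!/usr/bin/env bash
# Hypothetical helper (an assumption, not part of the repo): read `df -P`
# output on stdin and print every mount point whose usage is above the
# given percentage (default 90).
fs_over_threshold() {
  local threshold="${1:-90}"
  # Row 1 is the header. Column 5 is the capacity (e.g. "93%"),
  # column 6 is the mount point.
  awk -v t="$threshold" 'NR > 1 {
    use = $5
    sub(/%$/, "", use)
    if (use + 0 > t) print $6 " " use "%"
  }'
}
```

On a Linux VM this could be run as `df -P | fs_over_threshold 90`. A real skill would go further (snapshot, resize the managed disk, grow the partition and filesystem); this only covers the detection step.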
Last verified against Azure SRE Agent documentation: May 2026
A community starter kit for Azure SRE Agent: 10 VM troubleshooting skills, 18 governance hooks, Bicep deployment templates, marketplace install metadata, and step-by-step guides to get started quickly.
⚠️ Example content — all skills and hooks in this repo were created with GitHub Copilot CLI as starting points. Test and customize for your environment before production use. Use at your own risk.
Azure SRE Agent is an AI-powered operations assistant that diagnoses, triages, and remediates Azure infrastructure issues. It's genuinely powerful on its own — it can reason through problems iteratively, learn from past incidents, run tools in parallel, and adapt its investigation depth to the complexity of the issue.
So why this repo? Because the agent's built-in intelligence works best when combined with your team's specific knowledge. Skills encode your proven troubleshooting procedures, safety requirements, and organizational standards. Hooks enforce governance guardrails. Together, they turn a capable general-purpose agent into a team member who knows how you operate. See Why and When to Use Skills and Why and When to Use Hooks for the full picture.
This repo gives you example skills and hooks you can use as starting points — test, customize, and extend them for your environment. The new .github/plugin/marketplace.json manifest (mirrored at .claude-plugin/marketplace.json) makes the repo installable in one click from the Azure SRE Agent portal.
Looking for more plugins? Microsoft maintains an official set at Azure/sre-agent-plugins (Datadog, Dynatrace, PagerDuty, Elasticsearch, AWS, Azure Managed Grafana, Atlassian Rovo) — installable the same way through the SRE Agent portal's plugin marketplace.
Scope: This repo focuses on skills and hooks — it's not a comprehensive Azure SRE Agent guide. For full documentation on agent setup, connectors, memory, run modes, and more, see the official docs. The repo may expand to cover additional topics over time.

💡 Tip: This entire repo — every skill, hook, Bicep template, and doc — was built with GitHub Copilot. I strongly recommend using it to create your own skills and hooks, customize deployments, and work with the Azure SRE Agent in general. It makes the process dramatically faster. See our guide: Creating Skills with Copilot →
The agent can reason through problems without any skills. But skills add critical value for production operations:
| Benefit | What it means |
|---|---|
| Consistency | Same diagnostic procedure every time — not variable reasoning across engineers or shifts |
| Your knowledge encoded | Your thresholds, escalation paths, naming conventions, and architecture details — baked in |
| Execution + safety | Skills attach tools that actually run commands, with prescribed safety checks before any changes |
| Structured output | Standardized reports with evidence, severity, and recommendations — ready for handoff or audit |
When to create a skill: You have a proven procedure that recurs, involves safety-critical steps, or needs consistent output. When to skip: The problem is novel, one-off, or simple enough for the agent to handle ad-hoc.
📖 Full guide: Why and When to Use Skills →
Get your first skill running in 5 steps. This uses high-cpu-vm-troubleshooting as an example — swap in any skill from the skills table below.
git clone https://github.com/raskip/azure-sre-agent-stuff.git
cd azure-sre-agent-stuff
# Or pull latest if you already have it
git pull origin main
npx claudepluginhub raskip/azure-sre-agent-stuff --plugin vm-sre-skills