Investigate and mitigate Google Cloud incidents using SRE playbooks, anomaly detection on time-series metrics, log analysis, infrastructure discovery, and structured post-mortem generation.
🐉 Detects anomalies in time-series data from various sources.
🐉 Skill for interacting with and analyzing Google Cloud Logging and Error Reporting. Use this when you need to process large JSON logs from GCP or convert them to Apache format for easier analysis.
🐉 Skill for interacting with Google Cloud Monitoring (CM) via APIs to avoid large context bloat. Produces short synoptic "gists" of graphs.
🐉 Fetches and parses time-series data from various sources.
🐉 [SRE] Discover and map GCP infrastructure architecture including compute, networking, storage, and service dependencies.
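As an illustration of the log conversion mentioned in the Cloud Logging skill above, here is a minimal sketch of turning one JSON log entry into an Apache combined-format line. It is not the plugin's own code: the function name to_apache_line and the sample entry are hypothetical, and it assumes the entry carries the standard timestamp and httpRequest fields of a Cloud Logging LogEntry.

```python
import json
import re
from datetime import datetime
from urllib.parse import urlsplit


def to_apache_line(entry: dict) -> str:
    """Render one Cloud Logging entry as an Apache combined-format line.

    Hypothetical helper for illustration; missing fields become "-".
    """
    req = entry.get("httpRequest", {})
    # Drop fractional seconds and map the trailing "Z" to an explicit offset
    # so that datetime.fromisoformat accepts the value on older Pythons too.
    raw_ts = re.sub(r"\.\d+", "", entry["timestamp"]).replace("Z", "+00:00")
    ts = datetime.fromisoformat(raw_ts).strftime("%d/%b/%Y:%H:%M:%S %z")
    # Apache logs the path and query string, not the full URL.
    url = urlsplit(req.get("requestUrl", "-"))
    path = url.path or "-"
    if url.query:
        path += "?" + url.query
    request_line = f'{req.get("requestMethod", "-")} {path} {req.get("protocol", "-")}'
    return (
        f'{req.get("remoteIp", "-")} - - [{ts}] "{request_line}" '
        f'{req.get("status", "-")} {req.get("responseSize", "-")} '
        f'"{req.get("referer", "-")}" "{req.get("userAgent", "-")}"'
    )


# Made-up sample entry, shaped like a Cloud Logging LogEntry.
sample = json.loads("""
{
  "timestamp": "2024-05-01T12:00:00.123456789Z",
  "httpRequest": {
    "requestMethod": "GET",
    "requestUrl": "https://example.com/api/items?id=1",
    "status": 500,
    "responseSize": "1234",
    "remoteIp": "203.0.113.7",
    "userAgent": "curl/8.0",
    "protocol": "HTTP/1.1"
  }
}
""")
print(to_apache_line(sample))
```

One line per request is far smaller than the original JSON, which is the point of converting before handing logs to an agent with a limited context window.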
Note: Given the recent deprecation of Gemini CLI, this Extension is also fully functional as a Plugin for agy CLI, Claude Code, and Codex.
The SRE Gemini CLI Extension is a dedicated toolkit comprising specialized Skills designed to augment Site Reliability Engineers (SREs). By integrating deeply with the Gemini CLI, this extension empowers SREs to investigate outages, configure MCP servers, formulate mitigations, and detect anomalies more rapidly.
Watch the SRE Extension in action as it performs a live production outage investigation and generates a detailed PostMortem:
For detailed installation and configuration instructions across all CLI environments, please refer to the Installation Guide (INSTALL.md).
If you have the just command runner (a modern make clone) installed, you can quickly set up the extension. If you don't have it yet, install it via brew install just or sudo apt-get install just (or see casey/just for more options).
Once installed:
# Google Antigravity CLI (agy)
just install-agy
# Google Gemini CLI (deprecated)
just install-gemini
# Claude Code
just install-claude
You also need Python and uv installed.
- investigation-entrypoint: Primary entrypoint for investigating production outages, orchestrating SRE response, and mitigating incidents. Start here when an incident occurs!
- gcp-architecture-discovery: Discover and map GCP infrastructure architecture including compute, networking, storage, and service dependencies.
- gcp-playbooks: Follows established SRE playbooks for GCP/GKE investigations, including infrastructure discovery and common mitigation steps.
- gcp-mcp-setup: Automates enabling services and Google Managed MCP (OneMCP) servers, generating API keys, and configuring ~/.gemini/settings.json.
- gcp-slo-management: Discover Monitoring Services, list existing SLOs, or create new SLOs (Availability/Latency) via the REST API.
- postmortem-generator: Generates a PostMortem given enough context about a resolved incident or outage.
- cloud-build-investigation: Expert-level SRE skill for Google Cloud Build (GCB) and Cloud Run investigations. Correlates git commits with build failures and analyzes logs.
- cloud-logging: Skill for interacting with and analyzing Google Cloud Logging and Error Reporting. Processes large JSON logs or converts them to Apache format.
- cloud-monitoring: Interacts with Google Cloud Monitoring via APIs to avoid large context bloat. Exports time-series data and helps set up SLOs.
- generic-mitigations: High-level classification logic and actuation plan for generic mitigations.
- monitoring-graphs: Generates high-quality, annotated incident graphs for post-mortems using Python to visualize outages and error rates (nice graphs visible here).
- anomaly-detection: Detects anomalies in time-series data from various sources (Isolation Forest, KNN, Z-score).
- data-ingestion: Fetches and parses time-series data from various sources for downstream analysis.
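The anomaly-detection skill names Z-score as one of its methods. The following is a minimal sketch of that technique using only the Python standard library. It is an illustration under stated assumptions, not the skill's actual implementation: the function name zscore_anomalies, the threshold of 3.0, and the latency samples are all hypothetical.

```python
from statistics import mean, pstdev


def zscore_anomalies(values, threshold=3.0):
    """Return (index, value, z) for points whose |z-score| exceeds threshold.

    Hypothetical helper: z is computed against the mean and population
    standard deviation of the whole series.
    """
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A perfectly flat series has no outliers (and would divide by zero).
        return []
    return [
        (i, v, (v - mu) / sigma)
        for i, v in enumerate(values)
        if abs(v - mu) / sigma > threshold
    ]


# Made-up request latencies in milliseconds with one obvious spike at the end.
latency_ms = [120] * 19 + [900]
for index, value, z in zscore_anomalies(latency_ms):
    print(f"index={index} value={value} z={z:.2f}")
```

A global Z-score is easy to reason about but is skewed by the very outliers it looks for, and it ignores seasonality. That is a plausible reason to pair it with methods such as Isolation Forest or KNN, which the skill also lists.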
npx claudepluginhub gemini-cli-extensions/sre --plugin sre-extension