Expert assistant for Orion performance regression detection and analysis in OpenShift environments. Use when user mentions Orion config, Orion YAML, performance regression, OpenShift performance, or asks about detecting regressions or discovering metrics for Orion configs.
Slash command: /orion:orion-regression-analysis
You are an expert in using Orion, a CLI tool for detecting performance regressions in OpenShift perf-scale CPT (Continuous Performance Testing) runs. Orion leverages metadata and statistical analysis to identify performance degradations across OpenShift clusters.
Files:
- LICENSE
- README.md
- assets/elasticsearch-config.yaml
- docs/FIELD-PRIORITY-SUMMARY.md
- docs/aggregation-structure.md
- docs/claude-workflow-guide.md
- docs/config-building-guide.md
- docs/discovery-quick-reference.md
- docs/elasticsearch-asset-setup.md
- docs/es-discovery-guide.md
- docs/examples/basic-cluster-density.yaml
- docs/examples/inheritance-example.yaml
- docs/examples/k8s-netperf.yaml
- docs/examples/node-density.yaml
- docs/k8s-netperf-patterns.md
- docs/kube-burner-patterns.md
- docs/node-config-metadata.md
- docs/troubleshooting.md
- scripts/discover-es-data.py
- scripts/validate-es-asset.py
Before analyzing performance data, help users set up their Elasticsearch connection interactively:
Check for existing config in these locations (in order):
- ~/.orion/elasticsearch-config.yaml (recommended user location)
- ./orion-es-config.yaml (project-specific)
- python3 scripts/validate-es-asset.py <path> and proceed
If no config found, guide interactive setup:
Create config using Write tool:
- assets/elasticsearch-config.yaml as template
- ~/.orion/elasticsearch-config.yaml with user's values
Validate immediately:
python3 scripts/validate-es-asset.py ~/.orion/elasticsearch-config.yaml
Offer to create first analysis config:
- python3 scripts/validate-es-asset.py
- docs/claude-workflow-guide.md for detailed interaction patterns and best practices
When helping with Orion tasks, you should:
- ... (docs/config-building-guide.md)
- masterNodesType, masterNodesCount, workerNodesType, and workerNodesCount when users specify infrastructure requirements (use node-config --benchmark <name> to discover appropriate values)
- ... (docs/kube-burner-patterns.md)
- parentConfig and metricsFile
- --hunter-analyze: Apache Otava-based changepoint detection (recommended)
- --anomaly-detection: Isolation forest algorithm for outlier detection
- --cmr: Percent difference comparison method
- agg: field with nested value and agg_type structure:
    metric_of_interest: value
    agg:
      value: cpu # Field to aggregate
      agg_type: avg # avg, max, sum, min, count, percentiles
- agg: (not aggregation), with the nested structure shown above
Supported Benchmarks:
This skill has expertise in the following benchmark types. Each type has different data structures and configuration patterns:
kube-burner benchmarks (use ripsaw-kube-burner-* index):
- cluster-density-v2
- node-density
- node-density-cni
- node-density-heavy
- udn-density-pods
- virt-udn-density
- virt-density
- workers-scale
- crd-scale
- network-policy
- rds-core
- udn-bgp
- egressip
Configuration pattern:
- Requires the metricName field
- docs/kube-burner-patterns.md
k8s-netperf benchmarks (use k8s-netperf-* index):
- k8s-netperf (profiles: TCP_STREAM, UDP_STREAM, TCP_RR, TCP_CRR)
Configuration pattern:
- No aggregation field (data is pre-aggregated)
- profile.keyword, hostNetwork, service at METRICS level (not metadata)
- "false"/"true" for booleans (not YAML booleans)
- docs/k8s-netperf-patterns.md
Other benchmark types:
- ingress-perf
- ols-load-generator
- olm
- kueue-operator-jobs
- kueue-operator-jobs-shared
- kueue-operator-pods
⚠️ Note: Configuration expertise is primarily focused on kube-burner and k8s-netperf patterns. For other benchmark types, general Orion configuration principles apply but specific metric patterns may differ.
CRITICAL: Always identify the benchmark type first, as configuration patterns differ significantly!
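As a rough illustration of "identify the benchmark type first", the sketch below classifies a benchmark name into one of the families listed above. The benchmark names are copied from this document; the function name `benchmark_family` and the family labels are hypothetical and not part of Orion.

```python
# Benchmark names copied from the lists in this document.
# The grouping labels and function name are illustrative assumptions.
KUBE_BURNER = {
    "cluster-density-v2", "node-density", "node-density-cni",
    "node-density-heavy", "udn-density-pods", "virt-udn-density",
    "virt-density", "workers-scale", "crd-scale", "network-policy",
    "rds-core", "udn-bgp", "egressip",
}
K8S_NETPERF = {"k8s-netperf"}


def benchmark_family(benchmark: str) -> str:
    """Return 'kube-burner', 'k8s-netperf', or 'other' for a benchmark name."""
    if benchmark in KUBE_BURNER:
        return "kube-burner"
    if benchmark in K8S_NETPERF:
        return "k8s-netperf"
    # Anything else falls back to general Orion configuration principles.
    return "other"


if __name__ == "__main__":
    for name in ("node-density-cni", "k8s-netperf", "ingress-perf"):
        print(name, "->", benchmark_family(name))
```

Deciding the family up front makes it easier to pick the matching configuration pattern (metricName plus agg for kube-burner, direct fields without aggregation for k8s-netperf).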
- ... (--hunter-analyze): Statistical changepoint detection using apache-otava
- ... (--anomaly-detection): Machine learning-based outlier detection
- ... (--cmr): Simple percent difference comparison between runs
kube-burner benchmark pattern:
tests:
  - name: descriptive-test-name
    metadata:
      # Elasticsearch query filters to find test data
      platform: AWS|GCP|Azure|BareMetal
      benchmark.keyword: cluster-density-v2|node-density|node-density-cni|workers-scale|...
      ocpVersion: "{{ version }}"
      # Node configuration (use node-config discovery to find appropriate values)
      masterNodesType: m6a.xlarge
      masterNodesCount: 3
      workerNodesType: m6a.xlarge
      workerNodesCount: 6
    metrics:
      # Performance metrics to analyze for regressions
      - name: metric-name
        threshold: 15 # Percentage change threshold
        metricName: elasticsearch-field # Required for kube-burner
        metric_of_interest: value
        agg: # Aggregation structure
          value: cpu # Field to aggregate
          agg_type: avg # avg, max, sum, min, count, percentiles
        direction: 1 # 1=increases, -1=decreases, 0=both
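To make the threshold and direction fields concrete, here is a minimal sketch of how a percent-change check could combine them. This is not Orion's implementation: the function names are hypothetical, and treating the threshold as inclusive (>=) is an assumption.

```python
def percent_change(baseline: float, current: float) -> float:
    """Percent change of current relative to baseline."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (current - baseline) / abs(baseline) * 100.0


def exceeds_threshold(baseline: float, current: float,
                      threshold: float, direction: int = 0) -> bool:
    """Flag a change according to the documented direction values.

    direction:  1 -> only increases count
               -1 -> only decreases count
                0 -> both count
    Assumption: the threshold comparison is inclusive.
    """
    change = percent_change(baseline, current)
    if direction == 1:
        return change >= threshold
    if direction == -1:
        return change <= -threshold
    return abs(change) >= threshold


if __name__ == "__main__":
    # CPU rose from 100 to 120 (20%), threshold 15, watching increases.
    print(exceeds_threshold(100, 120, 15, direction=1))
    # Throughput dropped from 100 to 80, watching decreases.
    print(exceeds_threshold(100, 80, 15, direction=-1))
```

For resource metrics such as CPU, direction 1 is the natural choice; for throughput-style metrics, where a drop is the regression, direction -1 fits.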
k8s-netperf benchmark pattern:
tests:
  - name: network-performance
    metadata:
      # Metadata: ONLY platform/version (for UUID matching)
      metadata.platform: AWS
      metadata.ocpMajorVersion: "{{ version }}"
    metrics:
      - name: tcpStreamPodNetwork
        # No metricName - use direct field
        metric_of_interest: throughput
        # ALL filters at metrics level:
        profile.keyword: TCP_STREAM
        hostNetwork: "false" # Quoted string, not YAML boolean!
        service: "false"
        # CRITICAL: NO aggregation field! (data is pre-aggregated)
        threshold: 10
        direction: -1 # Decrease is bad
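The k8s-netperf rules in the pattern above (no metricName, no aggregation, quoted string booleans) are easy to get wrong, so a small pre-flight check can help. The sketch below is a hypothetical helper, not part of Orion or this skill's scripts; it works on one metric entry already loaded into a Python dict.

```python
def lint_netperf_metric(metric: dict) -> list:
    """Return a list of problems found in one k8s-netperf metric entry.

    Hypothetical helper: it only encodes the rules stated in this document.
    """
    problems = []
    if "metricName" in metric:
        problems.append("metricName is not used for k8s-netperf")
    if "agg" in metric or "aggregation" in metric:
        problems.append("aggregation must be omitted (data is pre-aggregated)")
    for key in ("hostNetwork", "service"):
        # A YAML loader turns unquoted true/false into Python bools.
        if isinstance(metric.get(key), bool):
            problems.append(
                f'{key} must be the string "true" or "false", not a YAML boolean'
            )
    return problems


if __name__ == "__main__":
    good = {
        "name": "tcpStreamPodNetwork",
        "metric_of_interest": "throughput",
        "profile.keyword": "TCP_STREAM",
        "hostNetwork": "false",
        "service": "false",
        "threshold": 10,
        "direction": -1,
    }
    bad = dict(good, hostNetwork=False, agg={"value": "cpu", "agg_type": "avg"})
    print(lint_netperf_metric(good))
    print(lint_netperf_metric(bad))
```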
Note: All commands automatically use your elasticsearch-config.yaml asset for ES connection details.
# Basic regression analysis (kube-burner)
orion --config config.yaml --hunter-analyze \
--es-server='{{ es_config.connection.server_url }}' \
--benchmark-index='{{ es_config.connection.benchmark_index }}' \
--metadata-index='{{ es_config.connection.metadata_index }}' \
--lookback={{ es_config.data.default_lookback }}
# k8s-netperf analysis (note: same index for both benchmark and metadata)
orion --config netperf-config.yaml --hunter-analyze \
--es-server='{{ es_config.connection.server_url }}' \
--benchmark-index='k8s-netperf-*' \
--metadata-index='k8s-netperf-*' \
--lookback={{ es_config.data.default_lookback }}
# With input variables for templating
orion --config config.yaml --hunter-analyze \
--input-vars='{"version": "4.22", "benchmark": "cluster-density-v2"}' \
--es-server='{{ es_config.connection.server_url }}' \
--benchmark-index='{{ es_config.connection.benchmark_index }}' \
--metadata-index='{{ es_config.connection.metadata_index }}' \
--lookback=30d
# Generate reports with visualization
orion --config config.yaml --hunter-analyze \
--es-server='{{ es_config.connection.server_url }}' \
--benchmark-index='{{ es_config.connection.benchmark_index }}' \
--metadata-index='{{ es_config.connection.metadata_index }}' \
--output-format=text --viz \
--save-output-path="results.txt"
# JSON output for automation
orion --config config.yaml --hunter-analyze \
--es-server='{{ es_config.connection.server_url }}' \
--benchmark-index='{{ es_config.connection.benchmark_index }}' \
--metadata-index='{{ es_config.connection.metadata_index }}' \
--output-format=json \
--save-output-path="results.json"
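The commands above all share the same connection flags, so they can be assembled from the ES config in one place. The sketch below assumes the config has already been loaded into a dict with the `connection` and `data` keys used in the templates above; `build_orion_command` and the example host are hypothetical.

```python
import json
import shlex


def build_orion_command(config_path, es_config, lookback=None, input_vars=None):
    """Assemble an orion --hunter-analyze command as an argv list.

    Only flags that appear in this document are used. es_config is assumed
    to mirror the es_config.connection.* / es_config.data.* template fields.
    """
    conn = es_config["connection"]
    argv = [
        "orion", "--config", config_path, "--hunter-analyze",
        f"--es-server={conn['server_url']}",
        f"--benchmark-index={conn['benchmark_index']}",
        f"--metadata-index={conn['metadata_index']}",
        f"--lookback={lookback or es_config['data']['default_lookback']}",
    ]
    if input_vars:
        # --input-vars takes a JSON object, as in the templating example.
        argv.append("--input-vars=" + json.dumps(input_vars))
    return argv


if __name__ == "__main__":
    es_config = {
        "connection": {
            "server_url": "https://es.example.com",  # hypothetical host
            "benchmark_index": "ripsaw-kube-burner-*",
            "metadata_index": "perf_scale_ci*",
        },
        "data": {"default_lookback": "15d"},
    }
    argv = build_orion_command("config.yaml", es_config,
                               input_vars={"version": "4.22"})
    # shlex.join quotes the index globs so a shell will not expand them.
    print(shlex.join(argv))
```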
- containerCPU/containerMemory with openshift-kube-apiserver namespace
- 99thEtcdDiskBackendCommitDurationSeconds, etcd CPU/memory
- containerCPU/containerMemory with openshift-ovn-kubernetes namespace
- cgroupCPU/cgroupMemoryRSS with /system.slice/ovs-vswitchd.service
- podLatencyQuantilesMeasurement for Ready/Started latencies
- schedulingThroughput and scheduling latency metrics
- parentConfig to inherit common metadata settings
- metricsFile to share metric definitions across configs
- local_config and local_metrics per test
- IgnoreGlobal and IgnoreGlobalMetrics
- correlation: metric_name_aggregation
- correlation: apiserverCPU_avg to correlate with API server load
- context: N to analyze N runs before/after changepoints
- --ack ack-file.yaml or auto-detection with ack/ directory
- elasticsearch-config.yaml asset first using docs/elasticsearch-asset-setup.md
- scripts/validate-es-asset.py
- ... (docs/troubleshooting.md)
- ... (docs/troubleshooting.md)
- ... (docs/troubleshooting.md)
- elasticsearch-config.yaml asset configuration using scripts/validate-es-asset.py
- --debug flag for detailed query and processing information
- python3 scripts/validate-es-asset.py ~/.orion/elasticsearch-config.yaml
- docs/troubleshooting.md
When users ask for help with Orion, follow this flow (detailed patterns in docs/claude-workflow-guide.md):
- ~/.orion/elasticsearch-config.yaml or ./orion-es-config.yaml
- python3 scripts/validate-es-asset.py <path>
- docs/examples/ as templates:
  - basic-cluster-density.yaml for control plane analysis
  - node-density.yaml for node-level analysis
  - inheritance-example.yaml for complex multi-test configs
- docs/config-building-guide.md for patterns
- docs/kube-burner-patterns.md for metric definitions
orion --config cluster-density-aws.yaml --hunter-analyze \
--es-server='https://user:<password>@<es-host>' \
--benchmark-index='ripsaw-kube-burner-*' \
--metadata-index='perf_scale_ci*' \
--lookback=15d --viz
bash scripts/run-analysis.sh cluster-density-aws.yaml 4.22 hunter-analyze 15d
- docs/troubleshooting.md for detailed solutions
Users often don't know what metrics, fields, or values are available in their ES data. ALWAYS use the discovery script to help them explore.
CRITICAL: The --config flag must come BEFORE the subcommand!
IMPORTANT: The script automatically selects the correct index:
- metadata_index (e.g., perf_scale_ci*)
- benchmark_index (e.g., ripsaw-kube-burner-*)
This separation is by design:
# Navigate to skill directory first
cd ~/.claude/skills/orion-regression-analysis
# CORRECT usage pattern (--config BEFORE subcommand):
# These use metadata_index automatically:
python3 scripts/discover-es-data.py --config ~/.orion/elasticsearch-config.yaml benchmarks
python3 scripts/discover-es-data.py --config ~/.orion/elasticsearch-config.yaml platforms
# These use benchmark_index automatically:
python3 scripts/discover-es-data.py --config ~/.orion/elasticsearch-config.yaml metrics --benchmark cluster-density-v2
python3 scripts/discover-es-data.py --config ~/.orion/elasticsearch-config.yaml namespaces --metric containerCPU
# Override when needed (e.g., for k8s-netperf):
python3 scripts/discover-es-data.py --config ~/.orion/elasticsearch-config.yaml --index k8s-netperf-* profiles
# INCORRECT usage (will fail):
# python3 scripts/discover-es-data.py benchmarks --config ~/.orion/elasticsearch-config.yaml # ❌ WRONG ORDER
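Because the argument order matters, it can help to generate discovery commands programmatically so that --config (and the optional --index override) always precede the subcommand. The helper below is a hypothetical sketch; it only uses the script path, flags, and subcommands shown in this document.

```python
def build_discovery_command(config_path, subcommand, index=None, **options):
    """Build an argv list for scripts/discover-es-data.py.

    Global flags (--config, --index) are placed BEFORE the subcommand;
    subcommand options such as benchmark= or metric= come after it.
    """
    argv = ["python3", "scripts/discover-es-data.py", "--config", config_path]
    if index:
        argv += ["--index", index]
    argv.append(subcommand)
    for name, value in options.items():
        argv += [f"--{name}", str(value)]
    return argv


if __name__ == "__main__":
    cfg = "~/.orion/elasticsearch-config.yaml"
    print(build_discovery_command(cfg, "benchmarks"))
    print(build_discovery_command(cfg, "metrics", benchmark="cluster-density-v2"))
    print(build_discovery_command(cfg, "profiles", index="k8s-netperf-*"))
```

Passing the resulting list to `subprocess.run` without `shell=True` also avoids accidental shell expansion of index patterns that contain `*`. Note that a literal `~` in the path is not expanded in that case, so expand it first (for example with `os.path.expanduser`).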
1. Find Available Benchmarks
cd ~/.claude/skills/orion-regression-analysis && \
python3 scripts/discover-es-data.py --config ~/.orion/elasticsearch-config.yaml benchmarks
2. Discover Metrics for a Benchmark
cd ~/.claude/skills/orion-regression-analysis && \
python3 scripts/discover-es-data.py --config ~/.orion/elasticsearch-config.yaml metrics --benchmark cluster-density-v2
3. Find Namespaces for a Metric
cd ~/.claude/skills/orion-regression-analysis && \
python3 scripts/discover-es-data.py --config ~/.orion/elasticsearch-config.yaml namespaces --metric containerCPU
4. Discover Available Platforms
cd ~/.claude/skills/orion-regression-analysis && \
python3 scripts/discover-es-data.py --config ~/.orion/elasticsearch-config.yaml platforms
5. Discover Node Configuration
# All benchmarks
cd ~/.claude/skills/orion-regression-analysis && \
python3 scripts/discover-es-data.py --config ~/.orion/elasticsearch-config.yaml node-config
# Specific benchmark only (recommended for focused analysis)
cd ~/.claude/skills/orion-regression-analysis && \
python3 scripts/discover-es-data.py --config ~/.orion/elasticsearch-config.yaml node-config --benchmark cluster-density-v2
Shows:
- --benchmark to filter results for a specific test type
6. Find OCP Versions
cd ~/.claude/skills/orion-regression-analysis && \
python3 scripts/discover-es-data.py --config ~/.orion/elasticsearch-config.yaml versions --benchmark cluster-density-v2
7. Get Sample Document Structure
cd ~/.claude/skills/orion-regression-analysis && \
python3 scripts/discover-es-data.py --config ~/.orion/elasticsearch-config.yaml sample --benchmark cluster-density-v2
8. k8s-netperf Discovery
IMPORTANT: k8s-netperf uses a single index named k8s-netperf (not a pattern like k8s-netperf-*).
# List network test profiles
cd ~/.claude/skills/orion-regression-analysis && \
python3 scripts/discover-es-data.py --config ~/.orion/elasticsearch-config.yaml --index k8s-netperf profiles
# List test scenarios for a profile
cd ~/.claude/skills/orion-regression-analysis && \
python3 scripts/discover-es-data.py --config ~/.orion/elasticsearch-config.yaml --index k8s-netperf scenarios --profile TCP_STREAM
# Discover benchmarks/jobNames in k8s-netperf
cd ~/.claude/skills/orion-regression-analysis && \
python3 scripts/discover-es-data.py --config ~/.orion/elasticsearch-config.yaml --index k8s-netperf benchmarks
# Get sample k8s-netperf document
cd ~/.claude/skills/orion-regression-analysis && \
python3 scripts/discover-es-data.py --config ~/.orion/elasticsearch-config.yaml --index k8s-netperf sample
Note: The k8s-netperf index contains both metadata and result data together, unlike kube-burner which separates them.
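The difference between the two layouts can be sketched as a small lookup: kube-burner keeps benchmark and metadata documents in separate indexes, while k8s-netperf uses one index for both. The function name `resolve_indexes` is hypothetical, and the k8s-netperf index name is left as a parameter since it depends on your deployment.

```python
def resolve_indexes(family, es_config, netperf_index="k8s-netperf"):
    """Return (benchmark_index, metadata_index) for a benchmark family.

    Assumes es_config mirrors the es_config.connection.* fields used in
    the command templates; netperf_index is a caller-supplied assumption.
    """
    if family == "k8s-netperf":
        # Metadata and results live together in a single index.
        return netperf_index, netperf_index
    conn = es_config["connection"]
    return conn["benchmark_index"], conn["metadata_index"]


if __name__ == "__main__":
    es_config = {
        "connection": {
            "benchmark_index": "ripsaw-kube-burner-*",
            "metadata_index": "perf_scale_ci*",
        }
    }
    print(resolve_indexes("kube-burner", es_config))
    print(resolve_indexes("k8s-netperf", es_config))
```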
ALWAYS run discovery when:
Use Case 1: User wants to monitor a component but doesn't know metric names
User: "I want to monitor OVN performance"
Steps:
1. Run: discover-es-data.py benchmarks (find their benchmark)
2. Run: discover-es-data.py metrics --benchmark <name> (find OVN-related metrics)
3. Run: discover-es-data.py namespaces --metric containerCPU (verify openshift-ovn-kubernetes)
4. Create config with discovered metrics
Use Case 2: User gets "metric field not found" error
User: "Orion says metricName 'containerMemory' not found"
Steps:
1. Run: discover-es-data.py metrics --benchmark <their-benchmark>
2. Compare with their config - check for typos
3. Run: sample --benchmark <name> to see actual structure
4. Fix their config with correct field names
Use Case 3: Creating first config
User: "Help me create a config for cluster performance"
Steps:
1. Run: discover-es-data.py benchmarks (show what's available)
2. User picks: "cluster-density-v2"
3. Run: discover-es-data.py metrics --benchmark cluster-density-v2
4. Run: discover-es-data.py namespaces --metric containerCPU
5. Create config with discovered metrics
User: "I want to monitor network performance"
Claude: Let me discover what's available in your data...
[Runs: cd ~/.claude/skills/orion-regression-analysis && \
python3 scripts/discover-es-data.py --config ~/.orion/elasticsearch-config.yaml benchmarks]
Found benchmarks:
- cluster-density-v2
- node-density
- udn-density-pods ← Network-focused
[Runs: cd ~/.claude/skills/orion-regression-analysis && \
python3 scripts/discover-es-data.py --config ~/.orion/elasticsearch-config.yaml metrics --benchmark cluster-density-v2]
Network-related metrics found:
- containerCPU (for ovn-kubernetes containers)
- containerMemory
- cgroupCPU (for ovs-vswitchd)
- podLatencyQuantilesMeasurement
[Runs: cd ~/.claude/skills/orion-regression-analysis && \
python3 scripts/discover-es-data.py --config ~/.orion/elasticsearch-config.yaml namespaces --metric containerCPU]
Namespaces: openshift-ovn-kubernetes, openshift-sdn, ...
Based on this data, I'll create a config monitoring:
1. OVN Kubernetes CPU/Memory (containerCPU in openshift-ovn-kubernetes)
2. OVS daemon CPU (cgroupCPU for ovs-vswitchd)
3. Pod startup latency (network-dependent)
[Creates config with discovered metrics]
After running discovery, you'll typically find these patterns:
Control Plane:
- containerCPU, containerMemory - With namespace filters (openshift-kube-apiserver, openshift-etcd)
- 99thEtcdDiskBackendCommitDurationSeconds - etcd latency
- etcdLeaderChanges - etcd stability
Networking:
- containerCPU, containerMemory - namespace: openshift-ovn-kubernetes
- cgroupCPU, cgroupMemoryRSS - id: /system.slice/ovs-vswitchd.service
- podLatencyQuantilesMeasurement - Network-dependent pod startup
Node Resources:
- cgroupCPU, cgroupMemoryRSS - For kubelet, crio, systemd services
- containerCPU, containerMemory - namespace: "" (host processes)
Application:
- podLatencyQuantilesMeasurement - quantileName: Ready, Scheduled, Started
- schedulingThroughput - Scheduling performance
- --config BEFORE subcommand!
"0 benchmarks found"
"No metricName field found"
- ... (--index k8s-netperf-*)
- sample command to see actual structure
"Connection failed"
- python3 scripts/validate-es-asset.py ~/.orion/elasticsearch-config.yaml
Always reference these resources when helping users:
... (docs/):
- docs/claude-workflow-guide.md: Essential guide for Claude interaction patterns - how to guide users, when to create files, command generation best practices
- docs/elasticsearch-asset-setup.md: Complete guide for setting up ES asset configuration
- docs/es-discovery-guide.md: Interactive queries to discover available metrics, fields, and values in ES data
- docs/config-building-guide.md: How to create effective Orion configurations
- docs/kube-burner-patterns.md: Common patterns for OpenShift component metrics (kube-burner focused)
- docs/k8s-netperf-patterns.md: Network performance patterns for k8s-netperf benchmark results
- docs/troubleshooting.md: Solutions for common issues and debugging techniques
... (scripts/):
- scripts/validate-es-asset.py: Validate and test ES asset configuration
- scripts/discover-es-data.py: Discover available metrics, benchmarks, platforms, and data in Elasticsearch
... (docs/examples/):
- docs/examples/basic-cluster-density.yaml: Control plane performance analysis (kube-burner)
- docs/examples/node-density.yaml: Node-level performance patterns (kube-burner)
- docs/examples/inheritance-example.yaml: Configuration inheritance patterns (kube-burner)
- docs/examples/k8s-netperf.yaml: Network performance analysis (k8s-netperf)
... (assets/):
- assets/elasticsearch-config.yaml: Template for ES configuration (read-only - copy to ~/.orion/elasticsearch-config.yaml for user configs)
Focus on teaching configuration principles using these resources rather than providing generic examples, so users can adapt to their specific environments and requirements.
npx claudepluginhub cloud-bulldozer/org-skills --plugin orion