From microshift-dev
Investigate MicroShift runtime problems from SOS report
Slash command: /microshift-dev:analyze-sos-report
```bash
/microshift-dev:analyze-sos-report <sos-report-path> [log.html-url]
```
The analyze-sos-report command investigates MicroShift runtime problems by analyzing journal logs, Pod logs, YAML manifests, and configuration from an SOS report. Optionally, it can cross-reference the findings with a Robot Framework test log.
This command focuses on:
First argument (sos-report-path): Path or URL to the SOS report - Required
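- Local path: an extracted SOS report directory (e.g. /tmp/sosreport-hostname-2025-01-15)
- URL: a .tar.xz file that will be downloaded and extracted to /tmp
- Either way, the resulting directory must contain a sos_commands/microshift subdirectory.

Second argument (log.html-url): URL to a Robot Framework log.html file - Optional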
Example: https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/.../log.html

SOS report layout:

- sos_commands/microshift/ - MicroShift specific data
  - journalctl_--no-pager_--unit_microshift - MicroShift service logs
  - journalctl_--no-pager_--unit_microshift-etcd.scope - etcd logs
  - microshift_version - MicroShift version
  - microshift_show-config_-m_effective - Effective configuration
  - systemctl_status_microshift - Service status
  - event-filter.html - Kubernetes events viewer
- sos_commands/microshift/namespaces/<NAMESPACE>/
  - <namespace>.yaml - Namespace definition
  - core/pods.yaml - Pod definitions
  - core/events.yaml - Namespace events
  - core/configmaps.yaml - ConfigMaps
  - core/services.yaml - Services
  - apps/deployments.yaml - Deployments
  - apps/daemonsets.yaml - DaemonSets
  - pods/<POD>/<POD>.yaml - Individual Pod YAML
  - pods/<POD>/<CONTAINER>/<CONTAINER>/<CONTAINER>/logs/current.log - Container logs
  - pods/<POD>/<CONTAINER>/<CONTAINER>/<CONTAINER>/logs/previous.log - Previous container logs
- sos_commands/microshift/cluster-scoped-resources/
  - core/nodes.yaml - Node information
  - storage.k8s.io/storageclasses.yaml - Storage classes
- sos_commands/crio/ - CRI-O container runtime
- sos_commands/logs/ - System journal logs
- sos_commands/microshift_ovn/ - OVN networking
- sos_commands/openvswitch/ - Open vSwitch
- sos_commands/networking/ - Network configuration
- etc/microshift/ - MicroShift configuration files

When a log.html URL is provided, the file contains Robot Framework test execution results:
- Test sources: test/ directory of the MicroShift repository

Goal: Determine whether the input is a URL or a local path, and prepare the SOS report directory.
Actions:
Check whether the argument starts with http:// or https://.
For Remote URLs:
Create an isolated working directory and download the archive:
WORKDIR="$(mktemp -d /tmp/sosreport-analyze.XXXXXX)"
ARCHIVE="$WORKDIR/sosreport-download.tar.xz"
curl -fL -o "$ARCHIVE" "<url>"
Extract the archive into the working directory:
tar -xf "$ARCHIVE" -C "$WORKDIR" --one-top-level=extracted
Find the extracted directory:
ls -dt "$WORKDIR"/extracted/sosreport-*/ 2>/dev/null | head -1
Set the extracted directory as the working path
Clean up the downloaded archive:
rm -f "$ARCHIVE"
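The remote-URL steps above and the local-path check that follows can be combined into a single Bash function. This is a minimal sketch. The function name prepare_sosreport is hypothetical, and the sketch assumes that curl and GNU tar (for --one-top-level) are installed.

```bash
# Hypothetical helper: resolve the sos-report-path argument (URL or local path)
# to a directory that contains sos_commands/microshift/, and print that path.
prepare_sosreport() {
    local input="$1" dir=""
    if [[ "$input" == http://* || "$input" == https://* ]]; then
        local workdir archive
        workdir="$(mktemp -d /tmp/sosreport-analyze.XXXXXX)" || return 1
        archive="$workdir/sosreport-download.tar.xz"
        curl -fL -o "$archive" "$input" || return 1
        tar -xf "$archive" -C "$workdir" --one-top-level=extracted || return 1
        rm -f "$archive"
        # Most recently modified sosreport-* directory inside the extraction root
        dir="$(ls -dt "$workdir"/extracted/sosreport-*/ 2>/dev/null | head -1)"
    else
        dir="$input"
    fi
    dir="${dir%/}"
    # Local and remote inputs go through the same sanity check
    if [[ -z "$dir" || ! -d "$dir/sos_commands/microshift" ]]; then
        echo "error: no sos_commands/microshift/ under '${dir:-$input}'" >&2
        return 1
    fi
    printf '%s\n' "$dir"
}

# Usage (SOS is a hypothetical variable reused by the later sketches):
# SOS="$(prepare_sosreport "$1")" || exit 1
```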
For Local Paths:
Verify that the directory exists and contains the sos_commands/microshift/ subdirectory.

Goal: Find errors and problems in MicroShift service logs.
Actions:
Read MicroShift journal logs:
cat <sos-report-path>/sos_commands/microshift/journalctl_--no-pager_--unit_microshift
Search for errors, warnings, and failures - look for patterns like:
- error, fail, fatal, panic
- timeout, refused, denied

Goal: Check embedded etcd health.
Actions:
Read etcd journal logs:
cat <sos-report-path>/sos_commands/microshift/journalctl_--no-pager_--unit_microshift-etcd.scope
Look for etcd-specific issues:
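The MicroShift and etcd journal scans in the two steps above can share one helper. This is a sketch only. The function name scan_journal is hypothetical. Its default keywords are the ones listed for the MicroShift journal. The etcd-oriented pattern in the usage comment is an assumption and is not taken from this document.

```bash
# Hypothetical helper: summarize failure keywords in one journal file.
# Default keywords are the ones listed in the MicroShift journal step.
DEFAULT_JOURNAL_PATTERN='error|fail|fatal|panic|timeout|refused|denied'

scan_journal() {
    local file="$1" max="${2:-20}" pattern="${3:-$DEFAULT_JOURNAL_PATTERN}"
    if [[ ! -f "$file" ]]; then
        echo "missing: $file" >&2
        return 1
    fi
    # -i: treat Error, ERROR and error the same; -E: extended regex alternation
    echo "$(grep -ciE "$pattern" "$file") matching lines in $file"
    # Show the first $max matches with line numbers, for the report's evidence
    grep -inE "$pattern" "$file" | head -n "$max"
    return 0
}

# Usage, with SOS pointing at the extracted report:
# scan_journal "$SOS/sos_commands/microshift/journalctl_--no-pager_--unit_microshift"
# scan_journal "$SOS/sos_commands/microshift/journalctl_--no-pager_--unit_microshift-etcd.scope" 20 'took too long|leader|error'
```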
Goal: Check container runtime issues.
Actions:
Read CRI-O journal logs (if present):
cat <sos-report-path>/sos_commands/crio/journalctl_--no-pager_--unit_crio
Check container status:
cat <sos-report-path>/sos_commands/crio/crictl_ps_-a
Look for crashed or errored containers
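Crashed or errored containers can be picked out of the crictl_ps_-a capture by dropping every row whose STATE is Running. This is a sketch. It assumes crictl's default table output, with one header row followed by whitespace-separated columns. The function name is hypothetical.

```bash
# Hypothetical helper: print rows of "crictl ps -a" output whose STATE is not Running.
# Assumes crictl's default table: one header row, then whitespace-separated columns.
non_running_containers() {
    local file="$1"
    [[ -f "$file" ]] || { echo "missing: $file" >&2; return 1; }
    # Skip the header, then drop rows that have "Running" as a standalone field
    tail -n +2 "$file" | grep -vE '(^|[[:space:]])Running([[:space:]]|$)'
    return 0
}

# Usage:
# non_running_containers "$SOS/sos_commands/crio/crictl_ps_-a"
```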
Goal: Check Pod health and Kubernetes events.
Actions:
Check Pod status in each namespace:
cat <sos-report-path>/sos_commands/microshift/namespaces/*/core/pods.yaml
Look for Pods not in Running state (Pending, CrashLoopBackOff, Error, etc.)
Check events for problems:
cat <sos-report-path>/sos_commands/microshift/namespaces/*/core/events.yaml
Look for warning events indicating issues
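The Pod and event checks can be run over every namespace with plain grep, so the YAML does not need to be parsed. This is a rough sketch. The field names (status.phase, restartCount, waiting reason, Event type) come from the Kubernetes API, and the directory layout is the one listed earlier. The chosen set of failure reasons and the function name are assumptions.

```bash
# Hypothetical helper: flag unhealthy Pods and Warning events in the per-namespace dumps.
check_pods_and_events() {
    local ns_root="$1/sos_commands/microshift/namespaces"
    echo "== Pods in a bad phase or with failure reasons =="
    grep -HnE 'phase: (Pending|Failed|Unknown)|reason: (CrashLoopBackOff|Error|OOMKilled|ImagePullBackOff|ErrImagePull)' \
        "$ns_root"/*/core/pods.yaml 2>/dev/null
    echo "== Containers with restarts =="
    # Keep only restartCount values greater than zero
    grep -Hn 'restartCount:' "$ns_root"/*/core/pods.yaml 2>/dev/null \
        | awk -F'restartCount: ' '$2 + 0 > 0'
    echo "== Warning events (count per namespace) =="
    grep -cH 'type: Warning' "$ns_root"/*/core/events.yaml 2>/dev/null | grep -v ':0$'
    return 0
}

# Usage:
# check_pods_and_events "$SOS"
```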
Goal: Check individual container logs for errors.
Actions:
Find container logs:
find <sos-report-path>/sos_commands/microshift/namespaces -name "current.log" -o -name "previous.log"
Search for errors in container logs:
grep -rE "error|fail|panic|fatal" <sos-report-path>/sos_commands/microshift/namespaces/*/pods/*/
Check previous.log files for containers that restarted
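The container-log steps above can be condensed into a per-file summary. The summary gives the log kind (current or previous), the number of error-like lines, and the path. This is a sketch. The keyword list matches the grep above, and the function name is hypothetical. Following the step above, a previous.log file is treated as evidence of a restart.

```bash
# Hypothetical helper: per-log counts of error-like lines, tagged current/previous.
summarize_container_logs() {
    local ns_root="$1/sos_commands/microshift/namespaces"
    local log count kind
    # NUL-delimited find output keeps paths with spaces intact
    while IFS= read -r -d '' log; do
        count="$(grep -ciE 'error|fail|panic|fatal' "$log")"
        [[ "$count" -gt 0 ]] || continue
        kind=current
        [[ "$log" == */previous.log ]] && kind=previous
        printf '%s\t%s\t%s\n' "$kind" "$count" "$log"
    done < <(find "$ns_root" \( -name current.log -o -name previous.log \) -print0 2>/dev/null)
    return 0
}

# Usage (highest counts first):
# summarize_container_logs "$SOS" | sort -t$'\t' -k2,2nr
```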
Goal: Check resource configurations for issues.
Actions:
Check Deployments and DaemonSets:
cat <sos-report-path>/sos_commands/microshift/namespaces/*/apps/deployments.yaml
cat <sos-report-path>/sos_commands/microshift/namespaces/*/apps/daemonsets.yaml
Look for:
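One concrete check for this step is to find workloads that report unavailable replicas. This is a sketch. It relies on the Kubernetes status fields unavailableReplicas (Deployment) and numberUnavailable (DaemonSet), which are normally omitted when zero. The function name is hypothetical.

```bash
# Hypothetical helper: list Deployments/DaemonSets reporting unavailable replicas.
check_workloads() {
    local ns_root="$1/sos_commands/microshift/namespaces"
    # [1-9] matches any non-zero count (1, 2, ..., 10, ...)
    grep -HnE '(unavailableReplicas|numberUnavailable): [1-9]' \
        "$ns_root"/*/apps/deployments.yaml "$ns_root"/*/apps/daemonsets.yaml 2>/dev/null
    return 0
}

# Usage:
# check_workloads "$SOS"
```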
Goal: Check for configuration issues.
Actions:
Read effective MicroShift config:
cat <sos-report-path>/sos_commands/microshift/microshift_show-config_-m_effective
Check config files if present:
cat <sos-report-path>/etc/microshift/config.yaml
Check for common misconfigurations
Goal: Check networking issues.
Actions:
Check OVN status if present:
cat <sos-report-path>/sos_commands/microshift_ovn/*
Check OVN-related pods in the openshift-ovn-kubernetes namespace
Look for network connectivity issues in logs
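A quick first pass for network connectivity issues is to count common network-failure strings. The scan covers the MicroShift journal and the OVN pod logs. This is a sketch. The phrase list is an assumption (typical Go and Linux network error messages) and is not taken from this document. The function name is hypothetical.

```bash
# Hypothetical helper: per-file counts of network-failure strings (assumed list).
NET_PATTERN='connection refused|no route to host|i/o timeout|connection reset by peer'

scan_network_errors() {
    local sos="$1"
    local journal="$sos/sos_commands/microshift/journalctl_--no-pager_--unit_microshift"
    local ovn_pods="$sos/sos_commands/microshift/namespaces/openshift-ovn-kubernetes/pods"
    # -H -c prints "path:count" even for a single file
    [[ -f "$journal" ]] && grep -HciE "$NET_PATTERN" "$journal"
    # Recurse into OVN pod logs and hide files with zero matches
    [[ -d "$ovn_pods" ]] && grep -rHciE "$NET_PATTERN" "$ovn_pods" | grep -v ':0$'
    return 0
}

# Usage:
# scan_network_errors "$SOS"
```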
Goal: Extract test results from log.html and correlate with SOS report findings.
Actions:
Fetch and parse the log.html file using WebFetch
Extract key information:
Cross-reference with SOS report:
Test source reference:
test/ directory of the MicroShift repository

Goal: Compile findings into a focused problem analysis.
Report Structure:
# MicroShift Runtime Problem Analysis
## Summary
<Brief 1-2 sentence summary of the main issue found>
## Test Results (if log.html provided)
| Test Name | Status | Duration | Error |
|-----------|--------|----------|-------|
| ... | PASS/FAIL | ... | ... |
### Failed Tests Analysis
<For each failed test, provide details and correlation with system logs>
### Test-System Event Correlation
<Timeline showing test execution alongside system events, noting timezone differences>
## Identified Problems
### Problem 1: <Problem Title>
**Severity**: Critical/Warning/Info
**Component**: MicroShift/CRI-O/etcd/Pod/OVN/etc.
**Evidence**:
<Relevant log excerpts>
**Root Cause Analysis**:
<Explanation of what caused this issue>
**Recommendation**:
<How to fix or investigate further>
## Affected Pods/Containers
| Namespace | Pod | Status | Restarts | Issue |
|-----------|-----|--------|----------|-------|
| ... | ... | ... | ... | ... |
## Relevant Log Excerpts
<Key error messages from journals>
## Configuration Issues
<Any misconfigurations found>
## Next Steps
1. <Recommended action 1>
2. <Recommended action 2>
/microshift-dev:analyze-sos-report /tmp/sosreport-microshift-host-2025-01-15-abcdef
/microshift-dev:analyze-sos-report https://example.com/sosreport-edge-device-01-2025-01-15.tar.xz
/microshift-dev:analyze-sos-report /tmp/sosreport-el96-host-2025-01-15 https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/test-platform-results/pr-logs/pull/openshift_microshift/5870/pull-ci-openshift-microshift-main-e2e-aws-tests-bootc-arm/1997885403058671616/artifacts/e2e-aws-tests-bootc-arm/openshift-microshift-e2e-metal-tests/artifacts/scenario-info/el96-src@optional/log.html
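- The SOS report can be given as a local directory or as a .tar.xz URL
- Remote archives are downloaded and extracted under /tmp automatically

SOS reports from CI environments may contain logs from multiple MicroShift restarts. This is expected because: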
When analyzing SOS reports:
When MicroShift restarts, the API server becomes temporarily unavailable. Ideally, pods should:
Any pod restarts due to API server unavailability should be reported as a concern. Even if the pod eventually recovers, frequent restarts during MicroShift transitions may indicate:
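To see how many MicroShift restarts a report spans, one option is to list the systemd start and stop messages for the unit in the MicroShift journal. Pod restarts can then be lined up against those timestamps. This is a sketch. It assumes the journal contains lines such as `systemd[1]: Started MicroShift.`, so confirm the exact wording in the report before relying on the count. The function name is hypothetical.

```bash
# Hypothetical helper: show systemd start/stop lines for MicroShift and count starts.
# Assumes messages like "systemd[1]: Started MicroShift." appear in the unit journal.
list_microshift_restarts() {
    local journal="$1/sos_commands/microshift/journalctl_--no-pager_--unit_microshift"
    grep -nE 'systemd\[1\]: (Starting|Started|Stopping|Stopped) .*MicroShift' "$journal"
    echo "starts: $(grep -cE 'systemd\[1\]: Started .*MicroShift' "$journal")"
}

# Usage:
# list_microshift_restarts "$SOS"
```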
npx claudepluginhub openshift-eng/edge-tooling --plugin microshift-dev