Parse and interpret trace logs from a running system to diagnose latency, errors, and bottlenecks. Use this skill when asked to "analyze these logs", "find the bottleneck", "debug this trace", or "interpret these spans".
Slash command: /observability:analyze-traces
Parse structured trace logs or span data and produce a diagnostic report.
Read the trace input and identify its format:
- trace_id, span_id, parent_span_id
- timestamp, level, event

Extract a list of spans. For each span, record:

- span_id, parent_span_id, component (service/function name)
- start_time, end_time, duration_ms
- status (OK / ERROR), error_message if present

Build a tree of spans using parent_span_id links. Print the tree to show call nesting:
[root] process_request 200ms OK
  [1] validate_input 5ms OK
  [2] fetch_context 80ms OK
    [2a] cache_lookup 2ms MISS
    [2b] api_fetch 78ms OK
  [3] call_llm 110ms OK
  [4] format_response 5ms OK
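The tree-building step can be sketched in a few lines of Python. This is a minimal illustration, not the skill's actual implementation: the `spans` list is hypothetical sample data mirroring the example tree, and `build_children` and `render_tree` are names invented for this sketch. It assumes each span is a dict carrying the fields listed above.

```python
from collections import defaultdict

# Hypothetical sample spans, mirroring the example tree above.
spans = [
    {"span_id": "root", "parent_span_id": None, "component": "process_request", "duration_ms": 200, "status": "OK"},
    {"span_id": "1", "parent_span_id": "root", "component": "validate_input", "duration_ms": 5, "status": "OK"},
    {"span_id": "2", "parent_span_id": "root", "component": "fetch_context", "duration_ms": 80, "status": "OK"},
    {"span_id": "2a", "parent_span_id": "2", "component": "cache_lookup", "duration_ms": 2, "status": "MISS"},
    {"span_id": "2b", "parent_span_id": "2", "component": "api_fetch", "duration_ms": 78, "status": "OK"},
    {"span_id": "3", "parent_span_id": "root", "component": "call_llm", "duration_ms": 110, "status": "OK"},
    {"span_id": "4", "parent_span_id": "root", "component": "format_response", "duration_ms": 5, "status": "OK"},
]


def build_children(spans):
    """Map each parent_span_id to its child spans, preserving input order."""
    children = defaultdict(list)
    for span in spans:
        children[span["parent_span_id"]].append(span)
    return children


def render_tree(spans):
    """Return the indented tree as a list of lines, one per span."""
    children = build_children(spans)
    known_ids = {s["span_id"] for s in spans}
    # Spans whose parent is absent from the input are treated as roots.
    roots = [s for s in spans if s["parent_span_id"] not in known_ids]
    lines = []

    def walk(span, depth):
        indent = "  " * depth
        lines.append(
            f"{indent}[{span['span_id']}] {span['component']} "
            f"{span['duration_ms']}ms {span['status']}"
        )
        for child in children[span["span_id"]]:
            walk(child, depth + 1)

    for root in roots:
        walk(root, 0)
    return lines


print("\n".join(render_tree(spans)))
```

Treating spans with an unknown parent as roots keeps partial traces printable instead of silently dropping orphaned spans.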
Identify the critical path: the chain of sequentially dependent spans (no parallel overlap) that determines total latency.
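One common way to find the critical path is to walk backwards from a span's end time, repeatedly picking the child that finished last before the current cursor. The sketch below assumes spans carry numeric `start_ms` and `end_ms` offsets (hypothetical field names standing in for start_time and end_time). The `load_profile` span is invented purely to show how a parallel span is excluded.

```python
from collections import defaultdict

# Hypothetical spans with start/end offsets in ms. load_profile is an invented
# span that overlaps fetch_context to show how parallel work is skipped.
SPANS = [
    {"span_id": "root", "parent_span_id": None, "component": "process_request", "start_ms": 0, "end_ms": 200},
    {"span_id": "1", "parent_span_id": "root", "component": "validate_input", "start_ms": 0, "end_ms": 5},
    {"span_id": "2", "parent_span_id": "root", "component": "fetch_context", "start_ms": 5, "end_ms": 85},
    {"span_id": "2a", "parent_span_id": "2", "component": "cache_lookup", "start_ms": 5, "end_ms": 7},
    {"span_id": "2b", "parent_span_id": "2", "component": "api_fetch", "start_ms": 7, "end_ms": 85},
    {"span_id": "p", "parent_span_id": "root", "component": "load_profile", "start_ms": 5, "end_ms": 40},
    {"span_id": "3", "parent_span_id": "root", "component": "call_llm", "start_ms": 85, "end_ms": 195},
    {"span_id": "4", "parent_span_id": "root", "component": "format_response", "start_ms": 195, "end_ms": 200},
]


def critical_path(span, children):
    """Return the spans on the critical path under `span`, in start order.

    Walk backwards from the span's end: take the child that finishes last,
    then the latest-finishing child that ended before that one started, and
    so on. Children overlapped by a later-finishing sibling are skipped.
    """
    cursor = span["end_ms"]
    selected = []
    by_end_desc = sorted(
        children.get(span["span_id"], []), key=lambda c: c["end_ms"], reverse=True
    )
    for child in by_end_desc:
        if child["end_ms"] <= cursor:
            selected.append(child)
            cursor = child["start_ms"]
    path = [span]
    for child in reversed(selected):
        path.extend(critical_path(child, children))
    return path


children = defaultdict(list)
for s in SPANS:
    children[s["parent_span_id"]].append(s)

root = children[None][0]
path = [s["component"] for s in critical_path(root, children)]
print(" -> ".join(path))
```

Real traces often have clock skew or small gaps between spans, so a production version would likely need a tolerance when comparing `end_ms` against the cursor.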
Sort spans by duration_ms descending. Flag any span that:
Report hotspots with:
Scan for spans with status=ERROR or log records with level=ERROR or level=CRITICAL:
If multiple trace samples are provided:
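The duration sort and the error scan described above could look roughly like this for a single trace. The sample spans and log records are hypothetical, as are the helper names `rank_hotspots` and `scan_errors`. Grouping log records under an `"unknown"` component is an assumption for records that do not name one.

```python
from collections import Counter


def rank_hotspots(spans, total_ms):
    """Sort spans by duration_ms descending and attach each share of total_ms."""
    ranked = sorted(spans, key=lambda s: s["duration_ms"], reverse=True)
    return [
        {
            "rank": rank,
            "component": s["component"],
            "duration_ms": s["duration_ms"],
            "pct_of_total": round(100 * s["duration_ms"] / total_ms),
        }
        for rank, s in enumerate(ranked, start=1)
    ]


def scan_errors(spans, log_records):
    """Count ERROR spans and ERROR/CRITICAL log records, keeping one sample message each."""
    counts = Counter()
    samples = {}
    for span in spans:
        if span.get("status") == "ERROR":
            counts[span["component"]] += 1
            samples.setdefault(span["component"], span.get("error_message", ""))
    for record in log_records:
        if record.get("level") in ("ERROR", "CRITICAL"):
            # Log records may not name a component; "unknown" is a placeholder.
            component = record.get("component", "unknown")
            counts[component] += 1
            samples.setdefault(component, record.get("event", ""))
    return {c: {"count": n, "sample": samples[c]} for c, n in counts.items()}


# Hypothetical input: child spans of a 200 ms request plus two log records.
spans = [
    {"component": "fetch_context", "duration_ms": 80, "status": "OK"},
    {"component": "api_fetch", "duration_ms": 78, "status": "ERROR", "error_message": "timeout after 5s"},
    {"component": "call_llm", "duration_ms": 110, "status": "OK"},
]
logs = [
    {"timestamp": "2024-01-01T00:00:00Z", "level": "INFO", "event": "request started"},
    {"timestamp": "2024-01-01T00:00:01Z", "level": "CRITICAL", "event": "upstream unavailable"},
]

hotspots = rank_hotspots(spans, total_ms=200)
errors = scan_errors(spans, logs)
print(hotspots[0])
print(errors)
```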
Produce a Markdown report:
## Trace Analysis Report
**Traces analyzed**: <count>
**Time range**: <start> to <end>
**Total duration (median)**: <Xms>
### Critical Path
<tree diagram>
### Top Latency Hotspots
| Rank | Component | Avg Duration | % of Total | On Critical Path |
|------|-----------|-------------|------------|-----------------|
| 1 | call_llm | 110ms | 55% | YES |
### Errors Detected
| Component | Count | Message (sample) |
|-----------|-------|-----------------|
| api_fetch | 3 | timeout after 5s |
### Recommendations
1. <highest impact action>
2. <second action>
3. <third action>
Based on gaps found during analysis, list any spans or log points that are missing
and would improve future diagnosis. Use /instrument-code to add them.
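When several traces are available, the median total duration and the Top Latency Hotspots rows in the report template could be derived along these lines. This is a sketch under stated assumptions: the `traces` structure, the `critical` set, and the helpers `summarize` and `hotspot_table` are all hypothetical, and the percentage is computed against the median total, which is one reasonable choice among several.

```python
from collections import defaultdict
from statistics import mean, median


def summarize(traces):
    """Return (median total duration, [(component, avg duration)]) sorted by avg descending."""
    per_component = defaultdict(list)
    for trace in traces:
        for span in trace["spans"]:
            per_component[span["component"]].append(span["duration_ms"])
    median_total = median(t["total_ms"] for t in traces)
    rows = sorted(
        ((component, mean(durations)) for component, durations in per_component.items()),
        key=lambda row: row[1],
        reverse=True,
    )
    return median_total, rows


def hotspot_table(median_total, rows, critical):
    """Render the hotspot rows as a Markdown table."""
    lines = [
        "| Rank | Component | Avg Duration | % of Total | On Critical Path |",
        "|------|-----------|--------------|------------|------------------|",
    ]
    for rank, (component, avg) in enumerate(rows, start=1):
        pct = round(100 * avg / median_total)
        on_path = "YES" if component in critical else "NO"
        lines.append(f"| {rank} | {component} | {avg:.0f}ms | {pct}% | {on_path} |")
    return "\n".join(lines)


# Hypothetical input: three sampled traces of the same request type.
traces = [
    {"total_ms": 190, "spans": [{"component": "call_llm", "duration_ms": 100},
                                {"component": "fetch_context", "duration_ms": 70}]},
    {"total_ms": 200, "spans": [{"component": "call_llm", "duration_ms": 110},
                                {"component": "fetch_context", "duration_ms": 80}]},
    {"total_ms": 230, "spans": [{"component": "call_llm", "duration_ms": 120},
                                {"component": "fetch_context", "duration_ms": 90}]},
]

median_total, rows = summarize(traces)
table = hotspot_table(median_total, rows, critical={"call_llm", "fetch_context"})
print(f"**Total duration (median)**: {median_total}ms")
print(table)
```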
npx claudepluginhub ats-kinoshita-iso/agent-workshop --plugin observability