Grafana Tempo distributed tracing backend reference covering TraceQL query language, architecture, YAML configuration, ingestion (OTLP/Jaeger/Zipkin), deployment modes, metrics-from-traces, and Grafana integrations.
Grafana Tempo is an open-source, high-scale distributed tracing backend. It is:
A trace represents the lifecycle of a request as it passes through multiple services. It consists of:
Traces enable:
Applications
    |
    | (OTLP 4317/4318, Jaeger 14250/14268, Zipkin 9411)
    v
[Distributor] ---- hashes traceID, routes to N partitions
    |
 [Kafka]
    |---> [Live Stores] (storage of recent data)
    |
    |---> [Block Builders] (Parquet block assembly, flush to object storage)
    |
    |---> [Metrics Generator] (optional: derives RED metrics -> Prometheus)

Query path:
Grafana --> [Query Frontend] (shards queries)
                  |
            [Querier pool]
              /        \
  [Live Stores]    [Object Storage]
    (recent)       (historical blocks)
| Component | Role | Default Ports |
|---|---|---|
| Distributor | Receives spans, routes by traceID hash | 4317 (gRPC), 4318 (HTTP) |
| Live Store | Buffers recent data on local disk and serves queries | - |
| Query Frontend | Query orchestrator, shards across queriers | 3200 (HTTP) |
| Querier | Executes search jobs against storage | - |
| Compactor | Merges blocks, enforces retention | - |
| Block Builder | Creates the final parquet blocks and flushes to object storage | - |
| Metrics Generator | Derives RED metrics from spans | - |
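The Distributor row says spans are routed by a hash of the trace ID. The sketch below illustrates only that idea: it assumes a 32-bit FNV-1a hash and a plain modulo over a hypothetical partition count. Tempo's own ring and partition assignment logic is more involved, so treat `fnv1a_32` and `partition_for` as illustrative names, not Tempo functions.

```python
def fnv1a_32(data: bytes) -> int:
    """32-bit FNV-1a hash (an illustrative choice, not necessarily Tempo's)."""
    h = 0x811C9DC5  # FNV offset basis
    for byte in data:
        h ^= byte
        h = (h * 0x01000193) & 0xFFFFFFFF  # multiply by FNV prime, keep 32 bits
    return h


def partition_for(trace_id_hex: str, num_partitions: int) -> int:
    """Map a hex trace ID to a partition index in [0, num_partitions)."""
    if num_partitions <= 0:
        raise ValueError("num_partitions must be positive")
    return fnv1a_32(bytes.fromhex(trace_id_hex)) % num_partitions


trace_id = "5B8EFFF798038103D269B633813FC700"
# Every span carrying this trace ID maps to the same partition index.
print(partition_for(trace_id, 8))
```

Because the partition depends only on the trace ID, all spans of one trace end up together regardless of which service emitted them.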
TraceQL queries filter traces by span properties. Structure: { filters } | pipeline
span.http.status_code # span-level attribute
resource.service.name # resource-level attribute (from SDK)
event.name # event-level attribute
name # intrinsic: span operation name
status # intrinsic: ok | error | unset
duration # intrinsic: span duration
kind # intrinsic: server | client | producer | consumer | internal
traceDuration # intrinsic: entire trace duration
rootServiceName # intrinsic: service of the root span
rootName # intrinsic: operation name of the root span
= != > < >= <= # comparison
=~ !~ # regex match (Go RE2)
&& || ! # logical
# All errors
{ status = error }
# Slow requests from a service
{ resource.service.name = "frontend" && duration > 1s }
# HTTP 5xx errors
{ span.http.status_code >= 500 }
# Traces with at least 2 error spans
{ status = error } | count() >= 2
# Select specific fields
{ status = error } | select(span.http.url, duration, resource.service.name)
# Structural: server span with downstream error
{ kind = server } >> { status = error }
# Both conditions present (any relationship)
{ span.db.system = "redis" } && { span.db.system = "postgresql" }
# Find most recent (deterministic)
{ resource.service.name = "api" } with (most_recent=true)
# Error rate per service
{ status = error } | rate() by (resource.service.name)
# P99 latency
{ kind = server } | quantile_over_time(duration, .99) by (resource.service.name)
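The queries above all follow the `{ filters } | pipeline` shape. When queries are generated from code, a small helper can assemble that shape. This is a sketch only: `traceql` and `quoted` are hypothetical helper names, and the backslash escaping of string literals is an assumption to check against the TraceQL documentation.

```python
def quoted(value: str) -> str:
    """Quote a string literal, escaping backslashes and double quotes (assumed rules)."""
    return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'


def traceql(conditions, pipeline=None):
    """Join span conditions with && inside braces, then append pipeline stages."""
    spanset = "{ " + " && ".join(conditions) + " }" if conditions else "{ }"
    stages = [spanset] + list(pipeline or [])
    return " | ".join(stages)


slow_frontend = traceql([
    f"resource.service.name = {quoted('frontend')}",
    "duration > 1s",
])
print(slow_frontend)
# { resource.service.name = "frontend" && duration > 1s }

error_rate = traceql(["status = error"], ["rate() by (resource.service.name)"])
print(error_rate)
# { status = error } | rate() by (resource.service.name)
```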
git clone https://github.com/grafana/tempo.git
cd tempo/example/docker-compose/local
mkdir tempo-data
docker compose up -d
# Grafana at http://localhost:3000, Tempo API at http://localhost:3200
helm repo add grafana https://grafana.github.io/helm-charts
helm install tempo grafana/tempo-distributed \
--version 1.61.3 \
--set storage.trace.backend=s3 \
--set storage.trace.s3.bucket=my-tempo-bucket \
--set storage.trace.s3.region=us-east-1
// alloy.river
otelcol.receiver.otlp "default" {
  grpc { endpoint = "0.0.0.0:4317" }
  http { endpoint = "0.0.0.0:4318" }
  output {
    traces = [otelcol.exporter.otlp.tempo.input]
  }
}
otelcol.exporter.otlp "tempo" {
  client {
    endpoint = "tempo:4317"
    tls { insecure = true }
  }
}
exporters:
  otlp:
    endpoint: tempo:4317
    tls:
      insecure: true
    # For multi-tenancy:
    headers:
      x-scope-orgid: my-tenant
service:
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [otlp]
curl -X POST -H 'Content-Type: application/json' \
http://localhost:4318/v1/traces \
-d '{"resourceSpans": [{"resource": {"attributes": [{"key": "service.name", "value": {"stringValue": "my-service"}}]}, "scopeSpans": [{"spans": [{"traceId": "5B8EFFF798038103D269B633813FC700", "spanId": "EEE19B7EC3C1B100", "name": "my-op", "startTimeUnixNano": 1689969302000000000, "endTimeUnixNano": 1689969302500000000, "kind": 2}]}]}]}'
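The same test span can be produced from a script, which avoids hand-editing IDs and timestamps. The sketch below builds an equivalent OTLP/JSON payload using only the standard library. `build_otlp_payload` and `send_trace` are hypothetical helper names, and the default endpoint assumes the local setup above with OTLP/HTTP on port 4318.

```python
import json
import secrets
import time
import urllib.request


def build_otlp_payload(service_name: str, span_name: str, duration_ms: int = 500) -> dict:
    """Build a one-span OTLP/JSON trace payload with random IDs."""
    end_ns = time.time_ns()
    start_ns = end_ns - duration_ms * 1_000_000
    return {
        "resourceSpans": [{
            "resource": {"attributes": [
                {"key": "service.name", "value": {"stringValue": service_name}},
            ]},
            "scopeSpans": [{"spans": [{
                "traceId": secrets.token_hex(16),  # 16 bytes -> 32 hex chars
                "spanId": secrets.token_hex(8),    # 8 bytes -> 16 hex chars
                "name": span_name,
                # 64-bit integers are written as strings in proto3 JSON
                "startTimeUnixNano": str(start_ns),
                "endTimeUnixNano": str(end_ns),
                "kind": 2,  # SPAN_KIND_SERVER
            }]}],
        }]
    }


def send_trace(payload: dict, endpoint: str = "http://localhost:4318/v1/traces") -> int:
    """POST the payload to an OTLP/HTTP endpoint and return the HTTP status code."""
    request = urllib.request.Request(
        endpoint,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=5) as response:
        return response.status


payload = build_otlp_payload("my-service", "my-op")
print(json.dumps(payload, indent=2))
# send_trace(payload)  # uncomment once Tempo is listening on port 4318
```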
metrics_generator:
  storage:
    path: /var/tempo/generator/wal
    remote_write:
      - url: http://prometheus:9090/api/v1/write
        send_exemplars: true
overrides:
  defaults:
    metrics_generator:
      processors: [service-graphs, span-metrics]
- Service Graphs: visualizes service topology and latency. Metrics: traces_service_graph_request_total, traces_service_graph_request_failed_total, duration histograms
- Span Metrics: RED metrics per span. Metrics: traces_spanmetrics_calls_total, traces_spanmetrics_duration_seconds_*
- Local Blocks: enables TraceQL metrics queries on recent data
# Enable in Tempo config
multitenancy_enabled: true
With multi-tenancy enabled, every request must include the X-Scope-OrgID header.
# OpenTelemetry Collector
exporters:
  otlp:
    headers:
      x-scope-orgid: tenant-id
# Grafana datasource
jsonData:
  httpHeaderName1: "X-Scope-OrgID"
secureJsonData:
  httpHeaderValue1: "tenant-id"
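Scripts that call Tempo directly need the same tenant header as the Collector and the Grafana datasource. A minimal sketch, assuming Tempo's HTTP API is reachable at `http://localhost:3200`; `tenant_request` is a hypothetical helper name.

```python
from typing import Optional
import urllib.parse
import urllib.request


def tenant_request(base_url: str, path: str, tenant_id: str,
                   params: Optional[dict] = None) -> urllib.request.Request:
    """Build a GET request to Tempo that carries the tenant header."""
    url = base_url.rstrip("/") + path
    if params:
        url += "?" + urllib.parse.urlencode(params)
    return urllib.request.Request(url, headers={"X-Scope-OrgID": tenant_id})


request = tenant_request("http://localhost:3200", "/api/search/tags", "tenant-id")
print(request.full_url)
# urllib normalizes the header name's capitalization; HTTP header names are case-insensitive.
print(request.header_items())
# with urllib.request.urlopen(request, timeout=5) as resp: ...  # needs a running Tempo
```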
datasources:
  - name: Tempo
    type: tempo
    url: http://tempo:3200
    jsonData:
      # Link traces to logs
      tracesToLogsV2:
        datasourceUid: loki-uid
        filterByTraceID: true
        tags: [{key: "service.name", value: "app"}]
      # Link traces to metrics
      tracesToMetrics:
        datasourceUid: prometheus-uid
        tags: [{key: "service.name", value: "service"}]
        queries:
          - name: Error Rate
            query: 'sum(rate(traces_spanmetrics_calls_total{$$__tags, status_code="STATUS_CODE_ERROR"}[5m]))'
      # Link traces to profiles (Pyroscope)
      tracesToProfiles:
        datasourceUid: pyroscope-uid
        tags: [{key: "service.name", value: "service_name"}]
      # Service map from span metrics
      serviceMap:
        datasourceUid: prometheus-uid
/a/grafana-exploretraces-app - no TraceQL required

# Search traces
GET /api/search?q={status=error}&limit=20&start=<unix>&end=<unix>
# Get trace by ID
GET /api/traces/<traceID>
GET /api/v2/traces/<traceID>
# List all tag names
GET /api/search/tags
# Get values for a tag
GET /api/search/tag/service.name/values
# TraceQL metrics (time series)
GET /api/metrics/query_range?q={status=error}|rate()&start=...&end=...&step=60
# Health check
GET /ready
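TraceQL contains characters such as `{`, `=`, and `|` that must be URL-encoded in the `q` parameter. The sketch below builds the search and metrics URLs shown above with the standard library. `search_url` and `query_range_url` are hypothetical helper names, and the base URL assumes the local setup with the API on port 3200.

```python
import time
import urllib.parse


def search_url(base_url: str, traceql: str, start: int, end: int, limit: int = 20) -> str:
    """URL for /api/search with an encoded TraceQL query and a unix-seconds window."""
    query = urllib.parse.urlencode(
        {"q": traceql, "limit": limit, "start": start, "end": end}
    )
    return f"{base_url.rstrip('/')}/api/search?{query}"


def query_range_url(base_url: str, traceql: str, start: int, end: int, step: int = 60) -> str:
    """URL for /api/metrics/query_range (TraceQL metrics as a time series)."""
    query = urllib.parse.urlencode(
        {"q": traceql, "start": start, "end": end, "step": step}
    )
    return f"{base_url.rstrip('/')}/api/metrics/query_range?{query}"


now = int(time.time())
hour_ago = now - 3600
print(search_url("http://localhost:3200", "{status=error}", hour_ago, now))
print(query_range_url("http://localhost:3200", "{status=error}|rate()", hour_ago, now))
```

Passing an explicit `start` and `end` keeps the search bounded to the window you care about.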
| Problem | Solution |
|---|---|
| Slow searches | Scale queriers horizontally; scale compactors to reduce block count |
| High memory on queriers | Reduce max_concurrent_queries; lower target_bytes_per_job |
| High memory on ingesters | Reduce max_block_bytes; lower per-tenant trace limits |
| Slow attribute queries | Add dedicated Parquet columns for frequent attributes |
| Cache miss rate high | Increase cache size; tune cache_min_compaction_level |
| Rate limited (429) | Raise max_outstanding_per_tenant or increase per-tenant ingestion limits |
| Memcached connection errors | Increase memcached connection limit (-c 4096) |
- Use the span. prefix for span attributes, resource. for process context
- Monitor tempo_ingester_live_traces to detect memory pressure early
- Set a time range (start/end) to limit search scope
- Use attribute != nil for existence checks
- Use with (most_recent=true) when you need deterministic recent results

| Port | Protocol | Purpose |
|---|---|---|
| 3200 | HTTP | Tempo API (queries, search, health) |
| 9095 | gRPC | Internal component communication |
| 4317 | gRPC | OTLP trace ingestion |
| 4318 | HTTP | OTLP trace ingestion |
| 14268 | HTTP | Jaeger Thrift HTTP ingestion |
| 14250 | gRPC | Jaeger gRPC ingestion |
| 6831 | UDP | Jaeger Thrift Compact |
| 6832 | UDP | Jaeger Thrift Binary |
| 9411 | HTTP | Zipkin ingestion |
| 7946 | TCP/UDP | Memberlist gossip |