Wraps ghz, the gRPC load testing tool, for throughput and latency benchmarking. Covers test invocation (--proto + --call + host:port; or --protoset for compiled descriptors), load parameters (-n total requests, -c concurrency, -r RPS rate limit, -z duration), output formats (json/csv/html/influx-summary for CI consumption), the metrics reported (RPS achieved, latency p50/p95/p99, status-code distribution, errors), and CI integration patterns for regression gating. Use when benchmarking a gRPC service's throughput or detecting latency regressions in CI. Composes grpc-status-code-mapping-reference for interpreting non-OK responses.
Slash command: /qa-grpc:ghz-load
Per ghz.sh/docs/usage, ghz accepts a .proto (or compiled protoset),
a method, a host:port, and load parameters, and emits per-request
metrics plus a summary. This skill wraps ghz for two use cases:
ad-hoc throughput measurement and CI regression gating.
Per ghz.sh/docs/install:
# Homebrew
brew install ghz
# Go install
go install github.com/bojand/ghz/cmd/ghz@latest
Verify:
ghz --version
Per ghz docs, --config=path reads a JSON or TOML config file. For
reproducibility, commit it to the repo (the commands below assume it
is saved as ghz.config.json):
{
  "proto": "./proto/user.proto",
  "import-paths": ["./proto", "./vendor"],
  "call": "user.v1.UserService/GetUser",
  "host": "localhost:8080",
  "insecure": true,
  "total": 10000,
  "concurrency": 50,
  "rps": 0,
  "data": {
    "id": "user-1"
  },
  "format": "json",
  "output": "ghz-report.json",
  "skipFirst": 100
}
skipFirst: 100 excludes the first 100 requests (cold-cache / warmup)
from the reported results. Tune the value per service.
ghz --proto=./proto/user.proto \
  --call=user.v1.UserService/GetUser \
  --insecure \
  -n 10000 -c 50 \
  -d '{"id": "user-1"}' \
  localhost:8080
Per ghz.sh/docs/usage:
| Flag | Meaning |
|---|---|
| --proto | Path to the .proto file |
| --protoset | Path to a compiled descriptor set (alternative to --proto) |
| --import-paths | Comma-separated proto import paths |
| --call | package.Service/Method |
| --insecure | "Use plaintext and insecure connection" |
| -n, --total=N | "Number of requests to run. Default is 200" |
| -c, --concurrency=N | "Number of request workers to run concurrently" |
| -r, --rps=N | "Requests per second (RPS) rate limit"; 0 = unlimited |
| -z, --duration=N | Total duration (30s, 5m); alternative to -n |
| -t, --timeout=N | Per-request timeout (default 20s) |
| -d | JSON message payload |
| -D | Path to a JSON file containing the payload |
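When ghz is driven from a script rather than a committed config file, the flags in the table can be assembled programmatically. The following is a minimal sketch: `build_ghz_args` is a hypothetical helper (not part of ghz), and rejecting a call that sets both a request count and a duration is a choice made for this sketch, not documented ghz behavior.

```python
import json
import shlex


def build_ghz_args(proto, call, host, payload, total=None, duration=None,
                   concurrency=50, rps=0, insecure=True):
    """Assemble a ghz argv list from the flags listed in the table."""
    # -n and -z are alternatives, so this sketch insists on exactly one.
    if (total is None) == (duration is None):
        raise ValueError("set exactly one of total (-n) or duration (-z)")
    args = ["ghz", f"--proto={proto}", f"--call={call}"]
    if insecure:
        args.append("--insecure")
    if total is not None:
        args += ["-n", str(total)]
    else:
        args += ["-z", duration]
    args += ["-c", str(concurrency)]
    if rps > 0:  # 0 means unlimited, so the flag is simply omitted
        args += ["-r", str(rps)]
    # The payload is serialized to JSON for -d; host:port goes last.
    args += ["-d", json.dumps(payload), host]
    return args


if __name__ == "__main__":
    argv = build_ghz_args(
        "./proto/user.proto",
        "user.v1.UserService/GetUser",
        "localhost:8080",
        {"id": "user-1"},
        total=10000,
    )
    # Print a copy-pasteable shell command; pass argv to subprocess.run to execute.
    print(shlex.join(argv))
```

Returning a list rather than a string keeps the payload safe from shell quoting problems when the list is handed to `subprocess.run`.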
ghz --proto=./proto/user.proto \
  --call=user.v1.UserService/GetUser \
  --insecure \
  -c 50 -z 60s -r 200 \
  -d '{"id":"user-1"}' \
  localhost:8080
Run for 60s, 50 concurrent workers, capped at 200 RPS. Useful for confirming the service can sustain the target rate.
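Whether a given -c can actually reach the -r cap depends on latency. A rough check uses Little's law (requests in flight = arrival rate × time in system). The sketch below assumes each worker keeps at most one request in flight at a time; the function names and the latency figures are illustrative, not taken from ghz.

```python
def workers_needed(target_rps, avg_latency_ms):
    """Little's law: average requests in flight at the target rate."""
    return target_rps * (avg_latency_ms / 1000.0)


def max_sustainable_rps(concurrency, avg_latency_ms):
    """Upper bound on throughput when each worker issues calls back to back."""
    return concurrency / (avg_latency_ms / 1000.0)


if __name__ == "__main__":
    # At 200 RPS and roughly 10 ms average latency only about two requests
    # are in flight at once, so -c 50 leaves ample headroom for -r 200.
    print(workers_needed(200, 10.0))
    # The same 50 workers can no longer reach 200 RPS once average
    # latency climbs past 250 ms.
    print(max_sustainable_rps(50, 250.0))
```

If the achieved RPS in the report falls short of the -r value, comparing it against this bound helps separate a slow service from an undersized worker pool.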
ghz -z 5m -c 100 ...
For stability / soak testing, prefer -z over -n: the test ends
after the set duration regardless of throughput.
This skill targets unary RPCs: each worker issues one unary call
per request. For streaming load see grpc-streaming-test-author.
Example output (summary format):
Summary:
  Count: 10000
  Total: 20.45 s
  Slowest: 120.34 ms
  Fastest: 2.15 ms
  Average: 10.23 ms
  Requests/sec: 488.94
Status code distribution:
  [OK] 9983 responses
  [DeadlineExceeded] 17 responses
Per grpc-status-code-mapping-reference, any non-OK status is a flag
for investigation.
ghz --config=ghz.config.json
# Writes ghz-report.json
Schema highlights:
{
  "count": 10000,
  "total": 20450000000,
  "average": 10230000,
  "fastest": 2150000,
  "slowest": 120340000,
  "rps": 488.94,
  "latencyDistribution": [
    {"percentage": 50, "latency": 8000000},
    {"percentage": 95, "latency": 25000000},
    {"percentage": 99, "latency": 80000000}
  ],
  "statusCodeDistribution": {"OK": 9983, "DeadlineExceeded": 17},
  "errorDistribution": {}
}
Latencies are in nanoseconds.
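Because the report stores nanoseconds, a small conversion step makes the numbers easier to read in CI logs. The following sketch assumes only the fields shown in the schema highlights; `summarize` is a hypothetical helper name, and the sample dict repeats the illustrative values from the report above.

```python
NS_PER_MS = 1_000_000


def summarize(report):
    """Convert a ghz JSON report to milliseconds and compute the non-OK rate."""
    # Index the percentile entries by percentage for direct lookup.
    percentiles = {
        entry["percentage"]: entry["latency"] / NS_PER_MS
        for entry in report.get("latencyDistribution") or []
    }
    codes = report.get("statusCodeDistribution") or {}
    total = sum(codes.values())
    non_ok = total - codes.get("OK", 0)
    return {
        "rps": report["rps"],
        "average_ms": report["average"] / NS_PER_MS,
        "p50_ms": percentiles.get(50),
        "p95_ms": percentiles.get(95),
        "p99_ms": percentiles.get(99),
        "non_ok_rate": non_ok / total if total else 0.0,
    }


if __name__ == "__main__":
    sample = {
        "rps": 488.94,
        "average": 10230000,
        "latencyDistribution": [
            {"percentage": 50, "latency": 8000000},
            {"percentage": 95, "latency": 25000000},
            {"percentage": 99, "latency": 80000000},
        ],
        "statusCodeDistribution": {"OK": 9983, "DeadlineExceeded": 17},
    }
    for key, value in summarize(sample).items():
        print(f"{key}: {value}")
```

For a real run, the dict would come from `json.load` on the file named in the config's output field.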
Per ghz.sh/docs/usage, --format
options: summary (default), csv, json, pretty, html,
influx-summary, influx-details. Use html for shareable
single-file reports; influx-* to ship metrics to InfluxDB.
# .github/workflows/grpc-perf.yml
name: grpc-perf
on:
  pull_request:
    paths:
      - "service/**"
      - "proto/**"
jobs:
  ghz-baseline:
    runs-on: ubuntu-latest
    services:
      service-under-test:
        image: my-grpc-service:pr-${{ github.event.pull_request.number }}
        ports:
          # Map to the same host port so ghz can reach localhost:8080
          - "8080:8080"
    steps:
      - uses: actions/checkout@v5
      - name: Install ghz
        run: |
          curl -L https://github.com/bojand/ghz/releases/download/v0.120.0/ghz-linux-x86_64.tar.gz | tar xz
          sudo mv ghz /usr/local/bin/
      - name: Warm + load
        run: |
          ghz --config=tests/perf/ghz.config.json
      - name: Restore baseline
        uses: actions/cache@v4
        with:
          path: baseline-ghz-report.json
          key: ghz-baseline-${{ github.base_ref }}
      - name: Compare
        run: python tests/perf/compare-ghz.py baseline-ghz-report.json ghz-report.json
compare-ghz.py checks that p99 latency is no more than 10% above
the baseline and exits non-zero otherwise:
import json
import sys

# Usage: compare-ghz.py <baseline-report.json> <current-report.json>
with open(sys.argv[1]) as f:
    baseline = json.load(f)
with open(sys.argv[2]) as f:
    current = json.load(f)


def p99(report):
    """Return the p99 latency in nanoseconds; exit if the report has none."""
    for entry in report["latencyDistribution"]:
        if entry["percentage"] == 99:
            return entry["latency"]
    sys.exit("report has no p99 entry in latencyDistribution")


p99_baseline = p99(baseline) / 1_000_000  # ns -> ms
p99_current = p99(current) / 1_000_000
delta = (p99_current - p99_baseline) / p99_baseline

if delta > 0.10:
    print(f"❌ p99 regressed: {p99_baseline:.1f}ms → {p99_current:.1f}ms ({delta*100:.1f}%)")
    sys.exit(1)
print(f"✅ p99: {p99_baseline:.1f}ms → {p99_current:.1f}ms ({delta*100:+.1f}%)")
ghz --config=ghz.config.json --format=html --output=ghz-report.html
Generate a single HTML report attached to the PR for human review of distribution shape (long tail, bimodal, etc.).
| Anti-pattern | Why it fails | Fix |
|---|---|---|
| -n 100 for a "load test" | Sample size too small; metrics noisy | At least -n 5000 or -z 30s |
| No skipFirst | Cold cache / JIT warmup inflates latencies | skipFirst: ~5-10% of total |
| Unbounded --concurrency | Tests the load generator, not the service | Match -c to expected production concurrency |
| Single-payload load test | Misses cache / branch-prediction noise | Vary -d payloads via -D <file> |
| Compare summary across runs without statistical context | Single-run noise → false regressions | Run N=3 times; compare distributions, not single numbers |
| --insecure against TLS-required services | Connection failure dominates results | Match prod TLS config |
| Treating non-OK as transport failure | Status codes have meaning per grpc-status-code-mapping-reference | Inspect distribution; classify per AIP-194 |
| Load-testing on shared CI runner | Other jobs perturb CPU; noisy | Dedicated runner or isolate via Docker resource limits |
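The "run N=3 times" advice in the table can be applied to the p99 gate by comparing medians across runs instead of single reports. This is a minimal sketch under that assumption: the helper names, the use of the median as the aggregate, and the sample values are all illustrative and not part of ghz or of compare-ghz.py.

```python
import statistics


def p99_ms(report):
    """Return the p99 latency of one ghz JSON report in milliseconds."""
    for entry in report.get("latencyDistribution") or []:
        if entry["percentage"] == 99:
            return entry["latency"] / 1_000_000
    raise ValueError("report has no p99 entry")


def median_p99_regression(baseline_reports, current_reports, threshold=0.10):
    """Compare the median p99 of several runs; return (delta, regressed)."""
    base = statistics.median(p99_ms(r) for r in baseline_reports)
    cur = statistics.median(p99_ms(r) for r in current_reports)
    delta = (cur - base) / base
    return delta, delta > threshold


def fake_report(p99_milliseconds):
    """Build a stand-in report holding only a p99 entry (demo data)."""
    latency_ns = int(p99_milliseconds * 1_000_000)
    return {"latencyDistribution": [{"percentage": 99, "latency": latency_ns}]}


if __name__ == "__main__":
    baseline = [fake_report(ms) for ms in (80, 82, 79)]
    # One noisy run (120 ms) does not move the median much.
    current = [fake_report(ms) for ms in (81, 120, 83)]
    delta, regressed = median_p99_regression(baseline, current)
    print(f"median p99 delta: {delta * 100:+.1f}%  regressed: {regressed}")
```

The median is used here because a single outlier run leaves it nearly unchanged, whereas a mean or a single-run comparison would flag the 120 ms run as a regression.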
Limitations: streaming load is covered by grpc-streaming-test-author; ghz runs one --call per process.
Related skills: grpc-status-code-mapping-reference, grpc-streaming-test-author, qa-load-testing/k6-load-testing, qa-load-testing/jmeter-load-testing.
Install the plugin:
npx claudepluginhub testland/qa --plugin qa-grpc