From grimoire
Executes structured load tests for validating system performance before launches, capacity planning, code changes, or SLO budget breaches. Covers tools like k6, JMeter, Locust, and Gatling.
Slash command
/grimoire:run-load-test
Execute structured load tests to measure system throughput, latency, and failure characteristics under simulated production traffic before issues affect real users.
Adopted by: Google (load testing as part of production readiness review), Amazon, Netflix (load testing before every major launch), and engineering organizations with SLO commitments
Impact: Load testing before launch catches 80% of performance regressions that would otherwise become production incidents (Google SRE data); fixing performance issues post-launch costs 100x more than fixing them pre-launch; capacity headroom discovered in load testing prevents 3 AM pages
Why best: Traffic in ordinary functional testing never reproduces real concurrency behavior; load testing is the only way to discover how a system behaves under real-world concurrent load before real users experience it
Sources: Gregg "Systems Performance" 2nd ed. Pearson (2020); Google "Production Readiness Review" SRE practice; k6 documentation; Fowler "Load Testing" martinfowler.com
Define load test objectives — What do you want to learn? Maximum throughput (requests per second), latency at target throughput (SLO validation), breaking point (at what load does the system degrade), or recovery behavior (does the system recover after overload)? Different objectives require different test shapes.
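As a rough sketch of how objectives map to test shapes, the Python below describes each shape as a list of stages and interpolates the target load at a given minute. The baseline, ramp, and sustained numbers are taken from the steps in this guide; the `recovery` shape, the `Stage` class, and the `load_at` helper are illustrative assumptions, not part of any tool's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    duration_min: float  # how long the stage lasts
    start_pct: float     # load at the start of the stage, as % of expected peak
    end_pct: float       # load at the end of the stage, as % of expected peak


# One shape per objective. The "recovery" numbers are assumed for illustration.
TEST_SHAPES = {
    "baseline": [Stage(5, 10, 10)],
    "breaking_point": [Stage(45, 0, 150)],
    "sustained": [Stage(75, 80, 80)],
    "recovery": [Stage(10, 100, 100), Stage(5, 200, 200), Stage(10, 100, 100)],
}


def load_at(stages, minute):
    """Return the target load (% of peak) at `minute`, interpolating linearly
    within a stage. After the last stage, the final end_pct is returned."""
    elapsed = 0.0
    for stage in stages:
        if minute < elapsed + stage.duration_min:
            frac = (minute - elapsed) / stage.duration_min
            return stage.start_pct + frac * (stage.end_pct - stage.start_pct)
        elapsed += stage.duration_min
    return stages[-1].end_pct
```

For example, `load_at(TEST_SHAPES["breaking_point"], 15)` is about 50, one third of the way up the ramp.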
Model production traffic — Analyze production access logs (last 30 days) to understand: request mix (% GET vs POST, endpoint distribution), concurrency (peak concurrent users), think time (time between user requests), and data distribution (most popular items, p50/p95/p99 request sizes). Model your load test to match this distribution.
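A minimal sketch of deriving the request mix, assuming the access logs have already been parsed into `(method, endpoint)` tuples (the parsing step and the sample endpoints are hypothetical and depend on your log format):

```python
from collections import Counter


def request_mix(records):
    """records: iterable of (method, endpoint) tuples.
    Returns {(method, endpoint): percent of total}, most frequent first."""
    counts = Counter(records)
    total = sum(counts.values())
    return {key: 100.0 * n / total for key, n in counts.most_common()}


# Hypothetical parsed log entries.
sample = [("GET", "/items")] * 3 + [("POST", "/orders")]
mix = request_mix(sample)
```

The resulting percentages can be used directly as scenario weights in whichever load testing tool you choose.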
Choose a load testing tool — k6 (JavaScript, open source, excellent for API testing, integrates with CI/CD), Apache JMeter (Java, GUI, high enterprise adoption), Locust (Python, programmatic, good for complex scenarios), Gatling (Scala, high performance, good reports). k6 is a common default for API load testing; JMeter suits complex enterprise scenarios.
Write representative test scripts — Implement the top 5-10 user journeys covering 80% of production traffic. Parameterize test data (users, IDs, search terms) to avoid cache-hit-only scenarios. Include authentication flows. Add realistic think time (0.5-2 s between requests to simulate human behavior). Hardcoded test data produces artificially optimistic cache performance.
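The parameterization and think-time advice can be sketched as follows. This is a tool-agnostic illustration: `make_request_plan` and its field names are hypothetical, and in practice the same idea would live inside a k6, Locust, JMeter, or Gatling script.

```python
import random


def make_request_plan(user_ids, search_terms, n, seed=None):
    """Build n request descriptions with varied test data and a human-like
    think time of 0.5 to 2 seconds before each request."""
    rng = random.Random(seed)  # a seed makes a run reproducible
    plan = []
    for _ in range(n):
        plan.append({
            "user_id": rng.choice(user_ids),    # avoid hitting one cached user
            "term": rng.choice(search_terms),   # avoid one cached search result
            "think_time_s": rng.uniform(0.5, 2.0),
        })
    return plan


plan = make_request_plan(["u1", "u2", "u3"], ["shoes", "hats"], 50, seed=1)
```

Drawing from a pool that is large relative to the cache size matters more than the exact sampling scheme; uniform sampling is an assumption here, and a skewed distribution taken from the traffic model may be more realistic.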
Start with a baseline test — Run with 10% of expected peak load for 5 minutes. Verify the test itself works correctly (no script errors, responses as expected). Establish baseline metrics: throughput, p50/p95/p99 latency, error rate. This baseline is the reference for all subsequent tests.
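One way to compute the baseline metrics from raw samples is sketched below, assuming each sample is a `(latency_ms, ok)` tuple and using the nearest-rank percentile definition (load testing tools may interpolate differently, so small discrepancies against their reports are expected).

```python
import math


def percentile(sorted_values, p):
    """Nearest-rank percentile of an ascending list; p is in (0, 100]."""
    rank = math.ceil(p * len(sorted_values) / 100)
    return sorted_values[max(rank, 1) - 1]


def summarize(samples, duration_s):
    """samples: list of (latency_ms, ok) tuples collected over duration_s."""
    latencies = sorted(latency for latency, _ in samples)
    errors = sum(1 for _, ok in samples if not ok)
    return {
        "throughput_rps": len(samples) / duration_s,
        "p50_ms": percentile(latencies, 50),
        "p95_ms": percentile(latencies, 95),
        "p99_ms": percentile(latencies, 99),
        "error_rate": errors / len(samples),
    }


# 100 synthetic samples with latencies 1..100 ms; only the slowest one failed.
baseline = summarize([(i, i != 100) for i in range(1, 101)], duration_s=10)
```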
Run a ramp-up load test — Gradually increase load from 0 to 150% of expected peak over 30-60 minutes. Observe: at what load does latency begin to increase (the knee of the curve)? At what load does the error rate exceed the SLO threshold? At what load does the system enter saturation? Record the throughput at each threshold.
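A simple way to locate these thresholds from per-step ramp results is sketched here. The knee is approximated as the first step whose p95 latency exceeds the baseline by a factor of 1.5; that factor, the field names, and the sample numbers are assumptions for illustration.

```python
def first_crossing(points, limit):
    """points: list of (load, value) in ramp order.
    Returns the first load whose value exceeds limit, or None."""
    for load, value in points:
        if value > limit:
            return load
    return None


def ramp_thresholds(steps, baseline_p95_ms, slo_error_rate, knee_factor=1.5):
    """steps: list of dicts with 'rps', 'p95_ms', and 'error_rate' keys."""
    latency = [(s["rps"], s["p95_ms"]) for s in steps]
    errors = [(s["rps"], s["error_rate"]) for s in steps]
    return {
        "knee_rps": first_crossing(latency, knee_factor * baseline_p95_ms),
        "slo_breach_rps": first_crossing(errors, slo_error_rate),
    }


# Hypothetical ramp results.
steps = [
    {"rps": 100, "p95_ms": 40, "error_rate": 0.0},
    {"rps": 200, "p95_ms": 42, "error_rate": 0.0},
    {"rps": 300, "p95_ms": 70, "error_rate": 0.001},
    {"rps": 400, "p95_ms": 180, "error_rate": 0.02},
]
found = ramp_thresholds(steps, baseline_p95_ms=40, slo_error_rate=0.01)
```

With this data the knee is at 300 requests per second and the SLO breach at 400, so the usable headroom lies between the two.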
Run a sustained load test — Apply 80% of peak load for 60-90 minutes. Watch for: memory leaks (growing heap over time), connection pool exhaustion, database connection limits, and log disk space exhaustion. Short tests miss time-based degradation; sustained tests reveal it.
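Growth over time can be flagged with a least-squares slope over periodic samples, as in this sketch. It assumes heap usage is sampled as `(minute, megabytes)` pairs under constant load; the growth limit is something you would choose for your own service.

```python
def slope_per_minute(samples):
    """Ordinary least-squares slope of (minute, value) samples.
    Needs at least two samples taken at different minutes."""
    n = len(samples)
    mean_x = sum(x for x, _ in samples) / n
    mean_y = sum(y for _, y in samples) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in samples)
    den = sum((x - mean_x) ** 2 for x, _ in samples)
    return num / den


def looks_like_leak(samples, max_growth_per_min):
    """True when the fitted growth rate exceeds the allowed rate."""
    return slope_per_minute(samples) > max_growth_per_min


# Hypothetical heap samples every 10 minutes: one growing 2 MB/min, one flat.
growing = [(m, 500 + 2 * m) for m in range(0, 61, 10)]
flat = [(m, 500) for m in range(0, 61, 10)]
```

A steady positive slope under constant load is only a hint, since caches warming up can look similar early in a run; discarding the first few minutes of samples is a reasonable precaution.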
Observe all system layers during the test — CPU, memory, disk I/O, and network utilization on application servers; database query times and connection count; cache hit rates; downstream service response times; thread pool and connection pool metrics. Load testing without observing infrastructure metrics is incomplete; the bottleneck may be anywhere in the stack.
Analyze bottlenecks — Identify the first resource to saturate under load. Common bottlenecks: CPU (insufficient compute), memory (GC pressure, leaks), database connections (pool exhaustion), I/O (disk or network bandwidth), and downstream dependencies (slow third-party APIs). Fix the identified bottleneck and re-run — there will always be another.
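Finding the first resource to saturate can be sketched as a scan over per-resource utilization recorded at each ramp step. The 0.9 saturation threshold, the resource names, and the sample values are assumptions; real thresholds differ per resource.

```python
def first_saturated(utilization, threshold=0.9):
    """utilization: {resource: [utilization fraction at each ramp step]}.
    Returns (resource, step_index) for the earliest crossing, or None."""
    best = None
    for resource, series in utilization.items():
        for i, value in enumerate(series):
            if value >= threshold:
                if best is None or i < best[1]:
                    best = (resource, i)
                break  # only the first crossing per resource matters
    return best


# Hypothetical utilization at four ramp steps.
observed = {
    "cpu": [0.30, 0.50, 0.70, 0.95],
    "db_connections": [0.40, 0.70, 0.92, 1.00],
    "memory": [0.50, 0.50, 0.60, 0.60],
}
bottleneck = first_saturated(observed)
```

Here the database connection pool crosses the threshold one step before CPU, so it is the bottleneck to fix first; after the fix, the same scan on the next run shows what saturates next.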
Document results and thresholds — Write a load test report: test scenario, traffic model, peak throughput achieved, latency at peak, breaking point, identified bottlenecks, and recommendations. Define pass/fail criteria for future tests (SLO thresholds). Automate load test execution in CI/CD with pass/fail gates for regression detection.
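A pass/fail gate for CI can be as small as the sketch below. It assumes the load test results have been exported as a flat mapping of metric names to numbers and that every threshold is an upper bound; the metric names are hypothetical, and a throughput floor would need the comparison reversed.

```python
def evaluate(results, thresholds):
    """Compare results against upper-bound thresholds.
    Returns a list of failure messages; an empty list means the gate passes."""
    failures = []
    for metric, limit in thresholds.items():
        actual = results.get(metric)
        if actual is None:
            failures.append(f"{metric}: missing from results")
        elif actual > limit:
            failures.append(f"{metric}: {actual} exceeds limit {limit}")
    return failures


thresholds = {"p95_ms": 200, "error_rate": 0.01}
passing = evaluate({"p95_ms": 180, "error_rate": 0.002}, thresholds)
failing = evaluate({"p95_ms": 250}, thresholds)
```

In a pipeline, the job would print the failures and exit non-zero when the list is non-empty, for example with `sys.exit(1 if failures else 0)`.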
npx claudepluginhub jeffreytse/grimoire --plugin grimoire