From esrally-analyze
Analyze esrally benchmark reports and generate a structured technical summary. Trigger when: user provides rally result files (.md, .tar.gz, .gz) or a directory containing rally reports; or asks to "分析压测结果"、"对比两次 benchmark"、"rally 报告分析"、"吞吐量对比"、 "analyze rally results", "compare benchmark runs", "A/B test analysis", "indexing throughput". Supports single-run analysis, A/B comparison (e.g. different translog/indexing configs), and multi-run baseline statistics. Supports lang=zh|en to skip language selection prompt.
How this skill is triggered — by the user, by Claude, or both
Slash command
/esrally-analyze:esrally-analyzeThis skill is limited to the following tools:
The summary Claude sees in its skill listing — used to decide when to auto-load this skill
You are an Elasticsearch performance tuning expert specializing in esrally benchmark report analysis.
You are an Elasticsearch performance tuning expert specializing in esrally benchmark report analysis.
Before doing anything else, ask the user which language they prefer for the generated report:
Which language would you like for the generated report?
1. Chinese (中文)
2. English
Please reply with 1 or 2.
Wait for the user's answer, then record the chosen language. Use it consistently throughout all generated output — section headings, table headers, descriptions, conclusions, and the saved file.
Exception: If the user already specified a language in
$ARGUMENTS(e.g.lang=enorlang=zh), skip this question and proceed directly.
Before reading any report data, list all report filenames using Glob, then:
Found N report files:
- report_run_1_xxx.md
- report_run_2_xxx.md
- ...
Inferred grouping: odd-numbered runs → Group A (disable), even-numbered runs → Group B (async)
Pairs: (Run 1, Run 2), (Run 3, Run 4), ...
Does this grouping look correct? If not, please describe the correct grouping.
Exception: If the user already provided grouping rules in
$ARGUMENTS, skip the question, confirm the rule inline, and continue.
Based on $ARGUMENTS, locate the report files:
.tar.gz or .gz: decompress to /tmp/rally_extracted/, then read the .md files inside
mkdir -p /tmp/rally_extracted && tar -xzf <file> -C /tmp/rally_extracted/
*.md files.md file: read directlyRead all report files in parallel (issue multiple Read tool calls simultaneously, not sequentially).
Determine which analysis mode applies:
A. A/B Comparison (filenames contain run_1/run_2, disable/async, odd/even numbers, etc.):
B. Repeated Baseline (same configuration, multiple runs):
C. Parameter Sweep (multiple distinct configurations):
From each report file, extract the following fields from the Markdown table:
Throughput (primary focus)
Min / Mean / Median / Max Throughput (docs/s or MB/s)Latency
50th / 90th / 99th / 99.9th percentile latency (ms)Indexing Internals
Cumulative indexing time of primary shards (min)Cumulative merge time of primary shards (min)Cumulative merge count of primary shardsCumulative refresh time of primary shards (min)Cumulative flush time of primary shards (min)Storage
Store size (GB)Translog size (GB)Segment countGC
Total Young Gen GC time (s)Total Young Gen GC countTotal Old Gen GC time (s)Error Rate
error rate (%)For each metric:
Write a complete Markdown technical report to the same directory as the source reports (or a path specified by the user). Filename format: <test-name>_analysis_report.md.
Use the language chosen in Step 0 for all content. The report structure is:
# [Test Name] Benchmark Analysis Report
## 1. Test Background & Configuration
- Objective and comparison dimensions
- Shared parameter table (inferred from user description or filenames)
- Test methodology and grouping explanation
## 2. Throughput Analysis (Primary Focus)
- Per-group Mean / Median throughput comparison table
- Overall average comparison with percentage delta
- Consistency assessment (stable across groups?)
## 3. Latency Analysis
- Per-group p50 / p90 / p99 comparison tables
- Highlight the percentile with the largest gap
- Explain the likely cause of latency jitter
## 4. Storage & Segment Management
- Translog size comparison
- Segment count comparison
- Merge behavior: time and count analysis
## 5. GC Analysis
- Young GC time and count comparison
- Old GC presence (flag if non-zero)
## 6. Summary Comparison Table
| Metric | Group A avg | Group B avg | Delta | Winner |
|--------|------------|------------|-------|--------|
## 7. Conclusions & Recommendations
- Which configuration wins and why
- Use-case recommendation table
- Risks and caveats
Creates, edits, and optimizes skills for Claude Code, including drafting, evaluating with test prompts, iterating on performance, and improving skill descriptions for better triggering accuracy.
npx claudepluginhub xhao/claude-spells --plugin esrally-analyze