Skill

esrally-analyze

Analyze esrally benchmark reports and generate a structured technical summary. Trigger when: user provides rally result files (.md, .tar.gz, .gz) or a directory containing rally reports; or asks to "分析压测结果"、"对比两次 benchmark"、"rally 报告分析"、"吞吐量对比"、 "analyze rally results", "compare benchmark runs", "A/B test analysis", "indexing throughput". Supports single-run analysis, A/B comparison (e.g. different translog/indexing configs), and multi-run baseline statistics. Supports lang=zh|en to skip language selection prompt.

Invocation

How this skill is triggered — by the user, by Claude, or both

Slash command

/esrally-analyze:esrally-analyze

User invocable

Model invocable

Inline context

Default effort

Tool Access

This skill is limited to the following tools:

BashReadGlobGrepWrite

Context Preview

The summary Claude sees in its skill listing — used to decide when to auto-load this skill

You are an Elasticsearch performance tuning expert specializing in esrally benchmark report analysis.

SKILL.md

189 lines · ~1.6k tokens

Stats

Parent stars0

MaintenanceGood

Last CommitMar 18, 2026

Actions

View Source View Plugin View on GitHub View README

Stats

Actions

Step 0: Ask for Output Language

Before doing anything else, ask the user which language they prefer for the generated report:

Which language would you like for the generated report?
  1. Chinese (中文)
  2. English

Please reply with 1 or 2.

Wait for the user's answer, then record the chosen language. Use it consistently throughout all generated output — section headings, table headers, descriptions, conclusions, and the saved file.

Exception: If the user already specified a language in $ARGUMENTS (e.g. lang=en or lang=zh), skip this question and proceed directly.

Step 1: Confirm Grouping Rules

Before reading any report data, list all report filenames using Glob, then:

Infer the grouping logic from filename patterns (odd/even numbering, directory names, prefixes, etc.)
Present the inferred result to the user in this format:

Found N report files:
- report_run_1_xxx.md
- report_run_2_xxx.md
- ...

Inferred grouping: odd-numbered runs → Group A (disable), even-numbered runs → Group B (async)
Pairs: (Run 1, Run 2), (Run 3, Run 4), ...

Does this grouping look correct? If not, please describe the correct grouping.

Wait for user confirmation before proceeding. Once confirmed or corrected, record the grouping rule and move to Step 2.

Exception: If the user already provided grouping rules in $ARGUMENTS, skip the question, confirm the rule inline, and continue.

Step 2: Locate and Load Report Files

Based on $ARGUMENTS, locate the report files:

.tar.gz or .gz: decompress to /tmp/rally_extracted/, then read the .md files inside
```
mkdir -p /tmp/rally_extracted && tar -xzf <file> -C /tmp/rally_extracted/
```
Directory: use Glob to match *.md files
Single .md file: read directly

Read all report files in parallel (issue multiple Read tool calls simultaneously, not sequentially).

Step 3: Identify Test Scenario

Determine which analysis mode applies:

A. A/B Comparison (filenames contain run_1/run_2, disable/async, odd/even numbers, etc.):

Apply the confirmed grouping rule
For each metric: compute per-pair delta and percentage change, then average across all pairs

B. Repeated Baseline (same configuration, multiple runs):

Compute mean, max, min, and variance for each metric

C. Parameter Sweep (multiple distinct configurations):

Group by configuration dimension, then compare across groups

Step 4: Extract Key Metrics

From each report file, extract the following fields from the Markdown table:

Throughput (primary focus)

Min / Mean / Median / Max Throughput (docs/s or MB/s)

Latency

50th / 90th / 99th / 99.9th percentile latency (ms)

Indexing Internals

Cumulative indexing time of primary shards (min)
Cumulative merge time of primary shards (min)
Cumulative merge count of primary shards
Cumulative refresh time of primary shards (min)
Cumulative flush time of primary shards (min)

Storage

Store size (GB)
Translog size (GB)
Segment count

Total Young Gen GC time (s)
Total Young Gen GC count
Total Old Gen GC time (s)

Error Rate

error rate (%)

Step 5: Compute Statistics

For each metric:

A/B comparison: per-pair delta, percentage change, and average delta across all pairs
Repeated baseline: mean, max, min
Round all values to 1 decimal place; round percentages to 1 decimal place
If a metric differs by less than 2%, label it as "No significant difference"

Step 6: Generate Technical Report

Write a complete Markdown technical report to the same directory as the source reports (or a path specified by the user). Filename format: <test-name>_analysis_report.md.

Use the language chosen in Step 0 for all content. The report structure is:

# [Test Name] Benchmark Analysis Report

## 1. Test Background & Configuration
- Objective and comparison dimensions
- Shared parameter table (inferred from user description or filenames)
- Test methodology and grouping explanation

## 2. Throughput Analysis (Primary Focus)
- Per-group Mean / Median throughput comparison table
- Overall average comparison with percentage delta
- Consistency assessment (stable across groups?)

## 3. Latency Analysis
- Per-group p50 / p90 / p99 comparison tables
- Highlight the percentile with the largest gap
- Explain the likely cause of latency jitter

## 4. Storage & Segment Management
- Translog size comparison
- Segment count comparison
- Merge behavior: time and count analysis

## 5. GC Analysis
- Young GC time and count comparison
- Old GC presence (flag if non-zero)

## 6. Summary Comparison Table

| Metric | Group A avg | Group B avg | Delta | Winner |
|--------|------------|------------|-------|--------|

## 7. Conclusions & Recommendations
- Which configuration wins and why
- Use-case recommendation table
- Risks and caveats

General Notes

Throughput is the primary metric — always lead with it and use bold or tables to highlight the result
Flag any anomalous values (e.g. one run's Translog size significantly larger than others) with a dedicated note
If all error rates are 0%, state "Error rate: 0% across all runs" without a table
If grouping was auto-inferred, state the inference basis at the top of the report

esrally-analyze

Invocation

Tool Access

Context Preview

SKILL.md

esrally-analyze

Invocation

Tool Access

Context Preview

SKILL.md

Step 0: Ask for Output Language

Step 1: Confirm Grouping Rules

Step 2: Locate and Load Report Files

Step 3: Identify Test Scenario

Step 4: Extract Key Metrics

Step 5: Compute Statistics

Step 6: Generate Technical Report

General Notes

Similar Skills

Step 0: Ask for Output Language

Step 1: Confirm Grouping Rules

Step 2: Locate and Load Report Files

Step 3: Identify Test Scenario

Step 4: Extract Key Metrics

Step 5: Compute Statistics

Step 6: Generate Technical Report

General Notes

Similar Skills