Skill

build-data-report

Build a comprehensive multi-section data analysis report for battery cells whose measurements live on the Ionworks platform. Use when the user asks to "build a report", "summarize the data", "characterize cells", "create a data overview", "make an analysis PDF" for one or more cell designs — or when they hand over a dataset and ask "what's in it / what do we have / what does the BOL performance look like / how does it age". Produces a markdown + PDF with rate capability, DCIR, OCV, GITT/entropic, aging, and gap-analysis sections. Strongly prefer this skill whenever there is platform-resident measurement data and the user wants either a full report or any subset of these characterization sections.

Invocation


Slash command: `/ionworks:build-data-report`

User invocable · Model invocable


Supporting Files

- references/plot_patterns.md
- references/prod_data.md
- references/section_templates.md
- references/step_filtering.md

SKILL.md

262 lines · ~4.1k tokens

Stats

Language: Shell

Stars: 3

Maintenance: Good

Last commit: Jun 11, 2026


Battery Data Analysis Report

What this skill does

You're building a reproducible analysis report for cells whose measurements are already on the Ionworks platform. The report fetches everything live from the platform at build time — no local data files. Output is a markdown file with embedded plots, rendered to PDF.

The report is structured by protocol family (rate capability, HPPC, GITT, etc.). Each section follows the same shape: pick a representative measurement, show its raw signal in detail, then summarize statistics across all measurements in that family.

Workflow

1. Discover what's on the platform

Before writing any plot code, find out what protocol families exist for the cells in question. This determines which sections the report will have.

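```python
from collections import defaultdict

from ionworks import Ionworks, Navigator, set_dataframe_backend

set_dataframe_backend("pandas")
nav = Navigator(Ionworks())

# cell_specs: the cell-spec names the user gave, e.g. ("CellA", "CellB")
# proto_family(m, inst): report helper (references/prod_data.md) that reads m.protocol
families = defaultdict(lambda: defaultdict(int))
for cell in cell_specs:
    for inst in nav.instances(cell):
        for m in nav.measurements(inst.id):
            families[cell][proto_family(m, inst)] += 1

# Print the inventory so it can be shared with the user before choosing sections
for cell, counts in families.items():
    print(cell)
    for family, n in sorted(counts.items(), key=lambda kv: str(kv[0])):
        print(f"  {family}: {n}")
```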

Print this — share what's available with the user before deciding the section list. Don't assume; some cells have rich BOL data and no aging, others have heavy aging and minimal BOL. Some families (Pre_cycle, Precon_DCR) are operational overhead and not worth their own section.

2. Use metadata, never names

Critical rule: every filter, group, and selection uses `measurement.protocol.*` fields:

- `protocol.family`: `Rated_Discharge`, `HPPC`, `GITT`, `Fast_charge`, `Drive_cycle`, `Entropic`, `CC_Cycling`, etc.
- `protocol.ambient_temperature_degc`: float, may be None
- `protocol.c_rate`: float, may be None
- `protocol.mode`: `"charge"` / `"discharge"` / `"profile"` / `"calendar"` / `"mixed"`
- `protocol.soc_pct`: int, may be None

measurement.protocol is the single source of truth for what kind of test this is and under what conditions. Every section of the report — what to include in it, how to group within it, which measurements to compare — comes from this dict. Reach for m.name, file paths, or instance names only when you've confirmed protocol cannot answer the question.
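Here is a sketch of grouping on metadata alone. The helper below buckets measurements by a tuple of `protocol` fields. It assumes each measurement exposes `protocol` as a dict, possibly None. `group_by_protocol` is a hypothetical name and is not part of the SDK. A `None` inside a group key is a visible sign that a field needs populating upstream.

```python
from collections import defaultdict
from typing import Any, Iterable


def group_by_protocol(measurements: Iterable[Any], fields: tuple[str, ...]) -> dict[tuple, list]:
    """Group measurements by the values of `fields` in m.protocol, never by m.name."""
    groups: dict[tuple, list] = defaultdict(list)
    for m in measurements:
        proto = m.protocol or {}  # tolerate a missing protocol dict
        key = tuple(proto.get(field) for field in fields)
        groups[key].append(m)
    return dict(groups)
```

For example, pass `("soc_pct",)` for Entropic measurements. Pass `("ambient_temperature_degc", "c_rate")` for a rate-capability cohort.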

If protocol fields are missing or wrong, fix the protocol — don't work around it. Stop building the report, go back to the processing-pipeline step that classifies protocols, and make family / ambient_temperature_degc / c_rate / mode / soc_pct explicit on every affected measurement. Then re-upload (or patch in place) and resume. Once you've worked around a missing field in analysis code, every future report inherits the workaround and the underlying data stays wrong.

Concrete examples of the fix-the-protocol mindset:

- Two measurements with `family=HPPC` have different pulse widths and shouldn't be compared together. Don't filter by `m.name`. Add a `pulse_duration_s` field (or whatever the quantity actually is) to `protocol` and split the report by it.
- A measurement was taken at 25 °C ambient, but its `ambient_temperature_degc` is None. Don't infer the temperature from the folder name. Populate the field at processing time.
- In a "rate capability" cohort, some measurements have `c_rate=0.5` and others have it unset. Don't guess. Populate the field.

Some families legitimately don't have all fields — Entropic measurements typically have soc_pct set but no ambient_temperature_degc (the protocol varies SOC at a fixed ambient). Adapt grouping accordingly (group Entropic by SOC, not temperature). The rule is "every field that makes a measurement distinguishable from another measurement of the same family must be populated" — not "every field must always be set."
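A pre-flight audit can enforce that rule before any plotting starts. The sketch below flags two kinds of problem: measurements whose family is unset, and measurements that lack a field needed to tell them apart from siblings in the same family. The `DISTINGUISHING_FIELDS` mapping is an illustrative assumption drawn from the examples above. Adjust it per dataset.

```python
from typing import Any, Iterable

# Illustrative assumption: which protocol fields distinguish measurements within each family
DISTINGUISHING_FIELDS = {
    "Rated_Discharge": ("ambient_temperature_degc", "c_rate"),
    "Rated_Charge": ("ambient_temperature_degc", "c_rate"),
    "HPPC": ("ambient_temperature_degc",),
    "GITT": ("ambient_temperature_degc",),
    "Entropic": ("soc_pct",),  # Entropic varies SOC at a fixed ambient
}


def missing_protocol_fields(measurements: Iterable[Any],
                            required: dict = DISTINGUISHING_FIELDS) -> list[tuple]:
    """Return [(m, missing_field_names), ...] for measurements that need fixing upstream."""
    problems = []
    for m in measurements:
        proto = m.protocol or {}
        family = proto.get("family")
        if family is None:
            problems.append((m, ("family",)))  # the classifier never ran on this one
            continue
        missing = tuple(f for f in required.get(family, ()) if proto.get(f) is None)
        if missing:
            problems.append((m, missing))
    return problems
```

A non-empty result means the report build should stop and the processing pipeline should be fixed, as described above.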

3. Build the section list

The report is a selection from a library of section types, not a fixed 12-section template. Pick whatever the inventory in step 1 says is available — a small dataset might be 3 sections; a comprehensive one might be 12+. There's no "complete" report; there's a report that matches the data on hand.

Common section types, in the typical order they appear:

| Type | Source family/families | Skip when |
|---|---|---|
| Data Inventory | all | never (always section 1) |
| Executive Summary | computed | never (always section 2) |
| BOL Discharge Rate Cap | `Rated_Discharge`, `Cap_rated`, `Dis_cap_rated` | no discharge ladder at varying rate |
| BOL Charge Rate Cap | `Rated_Charge` | no charge ladder |
| Fast Charge | `Fast_charge` | no `Fast_charge` measurements |
| Drive Cycle | `Drive_cycle` | no real-load-profile data |
| Pulse Resistance / DCIR | `HPPC` | no HPPC |
| Open Circuit Voltage | `Rated_Discharge` (C/20) ± `HPPC` rest ± `GITT` rest | no slow discharge and no rest-based reconstruction |
| GITT | `GITT` | no GITT |
| Entropic Coefficient | `Entropic` | no entropic measurements |
| Cycle Life | `CC_Cycling`, `Profile_Cycling`, `Calendar` | no aging campaign |
| Data Quality & Gaps | computed | never (always last) |

You may also need section types not in this list — e.g., formation cycles, abuse tests, EIS spectra, calorimetry. Treat the list as a starting palette, not a contract; the section-types listed are the ones most commonly seen on the platform.

Ordering principles: inventory and exec summary first. Then group by purpose — BOL performance (discharge → charge → application tests like fast charge & drive cycle), then characterisation (DCIR → OCV → GITT → Entropic in order from practical to model-oriented), then aging, then gaps. Application tests follow rate capability because they live in the same conceptual lane (what does the cell do at the terminals); characterisation follows because it goes deeper into structure/physics.

Numbering: clean integers, no 3b, 3c, 5b, 5c. If you find yourself reaching for a letter suffix, renumber instead — a few edits to renumber later sections cost less than confusing the reader.
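The ordering and numbering rules can be sketched as a small selection function over the family inventory. `SECTION_LIBRARY` and `build_section_list` are hypothetical names. The sketch makes one simplification: a section is included whenever any of its source families is present. This stands in for the table's finer "skip when" conditions; for example, OCV really needs a slow discharge or a rest-based reconstruction. Treat the output as a draft list to review with the user.

```python
# Section types from the table above, in report order (inventory, summary, and gaps are fixed)
SECTION_LIBRARY = [
    ("BOL Discharge Rate Cap", ("Rated_Discharge", "Cap_rated", "Dis_cap_rated")),
    ("BOL Charge Rate Cap", ("Rated_Charge",)),
    ("Fast Charge", ("Fast_charge",)),
    ("Drive Cycle", ("Drive_cycle",)),
    ("Pulse Resistance / DCIR", ("HPPC",)),
    ("Open Circuit Voltage", ("Rated_Discharge", "HPPC", "GITT")),
    ("GITT", ("GITT",)),
    ("Entropic Coefficient", ("Entropic",)),
    ("Cycle Life", ("CC_Cycling", "Profile_Cycling", "Calendar")),
]


def build_section_list(inventory: dict) -> list[str]:
    """Return numbered section titles for the families present in `inventory`.

    `inventory` maps protocol family -> measurement count. Families not in the
    library (e.g. Pre_cycle, Precon_DCR) are ignored.
    """
    present = {family for family, n in inventory.items() if n > 0}
    middle = [title for title, sources in SECTION_LIBRARY if present & set(sources)]
    titles = ["Data Inventory", "Executive Summary", *middle, "Data Quality & Gaps"]
    # Clean integer numbering: every section gets the next integer, no letter suffixes
    return [f"{i}. {title}" for i, title in enumerate(titles, start=1)]
```

To use the per-cell inventory from step 1, first sum the counts across cells. For example, call `collections.Counter.update` once per cell's counts.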

Cell count agnostic: every section pattern works for 1, 2, or N cells. Single-cell reports skip the comparison subsections; multi-cell reports add one subsection per cell. The report shape doesn't change based on how many cells are in scope.

4. Per-section pattern

Every section follows the same shape:

(a) Brief prose explaining what the protocol measures and why it matters for the cell engineer.

(b) One representative measurement shown in detail: raw V vs t, V vs Q, or a 2×2 grid across temperatures/SOCs.

(c) A summary across all measurements in that family, either as a multi-line overlay plot or as a markdown table.

The representative + summary pattern is what makes the report useful to skim and still drillable: a reader sees one example trace, understands the protocol, then sees the across-cells statistics for the same family.

5. Implement plot functions with a consistent signature

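```python
from pathlib import Path

import matplotlib.pyplot as plt
from ionworks import Navigator


# `...` stands for section-specific parameters (temperature, C-rate, SOC, etc.)
def plot_X(nav: Navigator, spec_name: str, ..., out_path: Path) -> bool:
    """..."""
    inst, m = find_one(nav, spec_name, "FamilyName", temperature_C=...)
    if m is None:
        return False  # no matching measurement for this cell
    try:
        ts = nav.time_series(m.id)
    except Exception:
        return False  # fetch failed: skip the section rather than abort the report
    fig, ax = plt.subplots()
    # ... build plot from ts ...
    fig.savefig(out_path)
    plt.close(fig)  # free the figure; reports create many of them
    return True
```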

Returning bool lets write_report() skip the section when data is missing. This is preferable to raising — every cell is different and we expect some sections to be empty for some cells.

Same shape for summary functions:

```python
def summarise_X(nav: Navigator, spec_name: str) -> pd.DataFrame: ...
```

6. Wire into main() and write_report()

main() calls every plot/summary function once and stuffs results into a ctx dict. write_report(ctx, path) walks the sections and renders markdown — referencing ctx["plot_name"] (a bool) and ctx["summary_name"] (a DataFrame). Keep the two stages separate; it lets you regenerate the markdown without re-fetching every plot, and lets you test plot functions in isolation.

```python
ctx = {
    "rate_plot_tcell": rate_plot_tcell,   # bool returned by the plot function
    "rate_tcell": rate_tcell_df,          # raw DataFrame for the executive summary
    "fast_charge_summary": fc_df,         # summary DataFrame
    # ...
}
write_report(ctx, OUT / "data_report.md")
```
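Below is a minimal sketch of the rendering stage. It assumes each plot entry in `ctx` is the bool returned by a plot function, and each summary entry is a pandas DataFrame or None. The `SECTIONS` entries, the figure filenames, and every ctx key other than `fast_charge_summary` are hypothetical. Headings are numbered from the count of sections actually written, so they stay clean integers when sections drop out.

```python
from pathlib import Path

# (heading, plot key in ctx, figure file, summary key in ctx); illustrative entries only
SECTIONS = [
    ("Fast Charge", "fast_charge_plot", "fast_charge.png", "fast_charge_summary"),
    ("Pulse Resistance / DCIR", "hppc_plot", "hppc.png", "hppc_summary"),
]


def _markdown_table(df) -> list[str]:
    """Render a pandas DataFrame as markdown table lines (no tabulate dependency)."""
    header = "| " + " | ".join(map(str, df.columns)) + " |"
    rule = "|" + "---|" * len(df.columns)
    rows = ["| " + " | ".join(map(str, row)) + " |" for row in df.itertuples(index=False)]
    return [header, rule, *rows]


def write_report(ctx: dict, path: Path, sections=SECTIONS) -> list[str]:
    """Write markdown from ctx, skipping sections with neither a plot nor a summary."""
    lines = ["# Battery Data Analysis Report", ""]
    written = []
    for heading, plot_key, fig_name, summary_key in sections:
        has_plot = bool(ctx.get(plot_key))
        summary = ctx.get(summary_key)
        has_summary = summary is not None and not summary.empty
        if not (has_plot or has_summary):
            continue  # no data for this cell: drop the section instead of failing
        written.append(heading)
        lines += [f"## {len(written)}. {heading}", ""]
        if has_plot:
            lines += [f"![{heading}]({fig_name})", ""]
        if has_summary:
            lines += [*_markdown_table(summary), ""]
    path.write_text("\n".join(lines))
    return written
```

Because `write_report` only reads `ctx`, it can be rerun to tweak prose without re-fetching anything from the platform.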

7. Render to PDF

Use Chrome headless or weasyprint. A separate render_pdf.py script (not part of build_report.py) keeps the data-fetching path independent from the rendering path. Both should be runnable independently with uv run python ....
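Here is a sketch of a standalone `render_pdf.py` using headless Chrome. It assumes two things:

- The third-party `markdown` package (Python-Markdown) handles the markdown-to-HTML step.
- A Chrome binary named `google-chrome` is on PATH. On other systems it may be `chromium` or a full application path.

The HTML is written next to the markdown so relative figure paths still resolve. A weasyprint variant would replace only the final print step.

```python
"""render_pdf.py: a minimal sketch of the separate rendering step."""
import subprocess
import sys
from pathlib import Path


def chrome_print_command(html_path: Path, pdf_path: Path,
                         chrome: str = "google-chrome") -> list[str]:
    """Build the argv for a headless-Chrome print-to-PDF run."""
    return [
        chrome,
        "--headless",
        "--disable-gpu",
        f"--print-to-pdf={pdf_path.resolve()}",
        html_path.resolve().as_uri(),  # Chrome loads the page by file:// URL
    ]


def render(md_path: Path, chrome: str = "google-chrome") -> Path:
    """Convert the report markdown to HTML, then print it to PDF with Chrome."""
    import markdown  # third-party Python-Markdown, imported only when rendering

    body = markdown.markdown(md_path.read_text(), extensions=["tables", "fenced_code"])
    html_path = md_path.with_suffix(".html")
    html_path.write_text(f"<!doctype html>\n<meta charset='utf-8'>\n{body}")
    pdf_path = md_path.with_suffix(".pdf")
    subprocess.run(chrome_print_command(html_path, pdf_path, chrome), check=True)
    return pdf_path


if __name__ == "__main__" and len(sys.argv) > 1:
    print(render(Path(sys.argv[1])))
```

Usage would be along the lines of `uv run python render_pdf.py out/data_report.md`.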

Key code patterns

These patterns recur across most plot functions — extract them as helpers in your script.

Navigator (cached SDK accessor)

Use `ionworks.Navigator`. It memoises specs, instances, measurements, steps, and time series, paginates automatically, and returns listings sorted by name. Set the dataframe backend to pandas once at the top of the script.

```python
from ionworks import Ionworks, Navigator, set_dataframe_backend

set_dataframe_backend("pandas")
nav = Navigator(Ionworks())
```

See references/prod_data.md for the report-specific helpers built on top.

find_one / find_measurements

```python
def find_one(nav, spec_name, family, temperature_C=None, instance_filter=None):
    """Return first (inst, m) matching the family + optional temp."""

def find_measurements(nav, spec_name, families: tuple[str, ...]):
    """Return all [(inst, m), ...] in the given families."""
```

find_one is for representative-plot selection; find_measurements is for summary functions.
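A sketch of how these two helpers might be implemented follows. The canonical versions are in references/prod_data.md. The sketch makes these assumptions:

- `nav.instances(spec_name)` and `nav.measurements(inst.id)` behave as in step 1.
- `m.protocol` is a dict or None.
- `instance_filter` is a predicate on instances.
- The temperature tolerance `tol_C` is a hypothetical extra parameter, added because ambient temperatures are floats.

```python
from typing import Any, Callable, Optional


def _proto(m) -> dict:
    return m.protocol or {}  # treat a missing protocol as empty metadata


def find_measurements(nav, spec_name: str, families: tuple[str, ...]) -> list[tuple[Any, Any]]:
    """Return all [(inst, m), ...] whose protocol.family is in `families`."""
    return [
        (inst, m)
        for inst in nav.instances(spec_name)
        for m in nav.measurements(inst.id)
        if _proto(m).get("family") in families
    ]


def find_one(nav, spec_name: str, family: str, temperature_C: Optional[float] = None,
             instance_filter: Optional[Callable[[Any], bool]] = None, tol_C: float = 1.0):
    """Return the first (inst, m) matching family + optional ambient temperature, else (None, None)."""
    for inst, m in find_measurements(nav, spec_name, (family,)):
        if instance_filter is not None and not instance_filter(inst):
            continue
        if temperature_C is not None:
            T = _proto(m).get("ambient_temperature_degc")
            if T is None or abs(T - temperature_C) > tol_C:
                continue  # unset temperatures never match a temperature filter
        return inst, m
    return None, None
```

Returning `(None, None)` keeps the `inst, m = find_one(...)` / `if m is None: return False` pattern in the plot functions working. Note that a measurement with an unset temperature is silently skipped by any temperature-filtered call.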

Step filtering for "full" CC charge/discharge

To isolate a complete CC discharge or charge (excluding partial steps, rest steps, CV tails), filter the steps DataFrame:

Discharge: dV < −1 V AND End V < 2.7 V AND capacity in (2.5, 6.5) Ah AND duration > 500 s AND |I| > 0.3 A

Charge: dV > 1 V AND Start V < 3.5 V AND End V > 4.0 V AND capacity in (2.5, 6.5) Ah AND duration > 200 s AND |I| > 0.3 A

The voltage limits, capacity range, and duration thresholds depend on the cell; the values above are for a 5 Ah, 2.5–4.2 V cell. For another cell, set the lower capacity bound to ~50% of rated capacity and the voltage limits to that cell's cutoffs.

See references/step_filtering.md for the canonical implementation and the rationale for each threshold.
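The thresholds above can be expressed as row predicates. This sketch has two assumptions:

- The column names (`dV [V]`, `Capacity [A.h]`, and so on) are hypothetical stand-ins for whatever the steps DataFrame actually uses.
- The 1.3× rated upper capacity bound is chosen only because it reproduces 6.5 Ah for a 5 Ah cell.

references/step_filtering.md remains the canonical version.

```python
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class StepLimits:
    # Defaults reproduce the thresholds above for a 5 Ah, 2.5-4.2 V cell
    cap_min_ah: float = 2.5
    cap_max_ah: float = 6.5
    discharge_end_v_max: float = 2.7
    charge_start_v_max: float = 3.5
    charge_end_v_min: float = 4.0
    min_dv_v: float = 1.0
    min_current_a: float = 0.3
    discharge_min_s: float = 500.0
    charge_min_s: float = 200.0

    @classmethod
    def for_rated_capacity(cls, rated_ah: float, **overrides) -> "StepLimits":
        # Lower bound ~50% of rated (per the text); 1.3x upper bound is an assumption
        return cls(cap_min_ah=0.5 * rated_ah, cap_max_ah=1.3 * rated_ah, **overrides)


# Hypothetical column names for one row of the steps DataFrame
DV, V_START, V_END = "dV [V]", "Start voltage [V]", "End voltage [V]"
CAP, DUR, I_MEAN = "Capacity [A.h]", "Duration [s]", "Mean current [A]"


def _common(row: Mapping, lim: StepLimits) -> bool:
    # Full-capacity step at a meaningful current (excludes partial steps and CV tails)
    return lim.cap_min_ah < row[CAP] < lim.cap_max_ah and abs(row[I_MEAN]) > lim.min_current_a


def is_full_cc_discharge(row: Mapping, lim: StepLimits = StepLimits()) -> bool:
    return (row[DV] < -lim.min_dv_v and row[V_END] < lim.discharge_end_v_max
            and row[DUR] > lim.discharge_min_s and _common(row, lim))


def is_full_cc_charge(row: Mapping, lim: StepLimits = StepLimits()) -> bool:
    return (row[DV] > lim.min_dv_v and row[V_START] < lim.charge_start_v_max
            and row[V_END] > lim.charge_end_v_min and row[DUR] > lim.charge_min_s
            and _common(row, lim))
```

With pandas, `steps[steps.apply(is_full_cc_discharge, axis=1)]` keeps only the qualifying rows.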

Representative per C-rate bucket

When you have many measurements at varying rates and want one curve per C-rate bin, bucket by nearest standard rate ([0.05, 0.1, 0.2, 0.33, 0.5, 1.0, 1.5, 2.0, 3.0]) within ±15%, and keep the first record per bucket.

Family priority within a bucket matters. If a standard family (Rated_Charge, Cap_rated) and an auxiliary family (Fast_charge) both have a step in the same C-rate bucket, prefer the standard family — the auxiliary may have a different voltage cutoff or protocol structure that makes the trace look weird. Sort records by (family_priority, c_rate) before bucketing.

```python
_FAMILY_PRIORITY = {"Rated_Charge": 0, "Rated_Discharge": 0, "Cap_rated": 0,
                    "Dis_cap_rated": 0, "Fast_charge": 1}
```

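The bucketing rule can be sketched as follows. Its assumptions:

- Each record is a dict with `family` and `c_rate` keys; this record shape is hypothetical.
- "Within ±15%" is read as relative to the standard rate, and the nearest rate is picked by relative distance.
- Unknown families get a low default priority.

```python
STANDARD_RATES = [0.05, 0.1, 0.2, 0.33, 0.5, 1.0, 1.5, 2.0, 3.0]
_FAMILY_PRIORITY = {"Rated_Charge": 0, "Rated_Discharge": 0, "Cap_rated": 0,
                    "Dis_cap_rated": 0, "Fast_charge": 1}


def nearest_rate_bucket(c_rate, rates=STANDARD_RATES, rel_tol=0.15):
    """Return the standard rate within ±rel_tol (relative) of c_rate, else None."""
    if c_rate is None or c_rate <= 0:
        return None
    best = min(rates, key=lambda r: abs(c_rate - r) / r)
    return best if abs(c_rate - best) / best <= rel_tol else None


def representative_per_rate(records, priority=_FAMILY_PRIORITY, default_priority=99):
    """Keep one record per C-rate bucket, preferring standard families over auxiliary ones."""
    bucketed = [(nearest_rate_bucket(r["c_rate"]), r) for r in records]
    bucketed = [(b, r) for b, r in bucketed if b is not None]  # drop off-ladder rates
    # Sort by (family priority, c_rate); list.sort is stable, so ties keep input order
    bucketed.sort(key=lambda br: (priority.get(br[1]["family"], default_priority),
                                  br[1]["c_rate"]))
    chosen = {}
    for bucket, rec in bucketed:
        chosen.setdefault(bucket, rec)  # first record per bucket wins
    return dict(sorted(chosen.items()))
```

A `Fast_charge` step at 1.02C therefore loses to a `Rated_Charge` step at 0.98C in the 1C bucket. A step at 0.75C falls outside every bucket and is dropped.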
Temperature column picking

Single-thermocouple cyclers (Arbin) write Temperature [degC]. Multi-channel cyclers (Maccor) write Temperature 1 [degC] ... Temperature N [degC] — preserve each as a separate column at processing time; don't average. For analysis plots that need a single trace, pick the canonical name or fall back to Temperature 1 [degC]:

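```python
def _pick_temperature_col(columns):
    """Return the canonical temperature column, else the first numbered channel, else None."""
    if "Temperature [degC]" in columns:
        return "Temperature [degC]"  # single-thermocouple cyclers (e.g. Arbin)
    for n in (1, 2, 3):
        col = f"Temperature {n} [degC]"  # multi-channel cyclers (e.g. Maccor)
        if col in columns:
            return col
    return None
```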

Surface multi-channel layouts in the report — show all channels overlaid on one example measurement so the reader knows what's there, then note which channel the analysis plots use.
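To surface every channel before picking one, a small helper can list all temperature columns in numeric channel order, so that `Temperature 10 [degC]` sorts after `Temperature 2 [degC]`. The helper name is hypothetical. It relies only on the column naming convention described above.

```python
import re

# Matches "Temperature [degC]" and "Temperature N [degC]"
_TEMP_COL = re.compile(r"^Temperature(?: (\d+))? \[degC\]$")


def temperature_channels(columns) -> list[str]:
    """Return all temperature columns, canonical first, then channels in numeric order."""
    found = []
    for col in columns:
        match = _TEMP_COL.match(col)
        if match:
            # The canonical column has no channel number; give it 0 so it sorts first
            found.append((int(match.group(1) or 0), col))
    return [col for _, col in sorted(found)]
```

Each returned column can then be drawn on one axis with its own label. The caption can name the channel that `_pick_temperature_col` selects for the analysis plots.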

Multi-temperature subplot grid

For techniques like GITT and Entropic where you want one panel per temperature (or per SOC), use a 2×2 layout with preferred values plus fallback:

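```python
# Assumes nav, spec_name, and family are in scope; proto_temp(m, inst) is a report
# helper that returns the measurement's ambient temperature in °C, or None.
preferred = [0, 10, 30, 40]

# Keep the first measurement seen at each temperature
available = {}
for inst, m in find_measurements(nav, spec_name, (family,)):
    T = proto_temp(m, inst)
    if T is not None and T not in available:
        available[T] = (inst, m)

# Preferred temperatures first, in the preferred order
chosen = []
for T in preferred:
    if T in available:
        chosen.append((T, *available[T]))

# Fill the remaining panels (up to 4) with other temperatures, coolest first
for T, pair in sorted(available.items()):
    if len(chosen) >= 4:
        break
    if T not in [c[0] for c in chosen]:
        chosen.append((T, *pair))
```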

This preferred-then-fallback selection avoids hard-coding temperatures that may not exist for every cell.

Iteration

Build the report incrementally. After every new section:

1. Run `uv run python scripts/analysis/build_report.py` end-to-end.
2. Open the PDF and look at every figure.
3. Inspect anything that looks anomalous. Is the representative measurement actually representative, or is it an outlier that the filter happened to pick first?
4. Fix root causes, not symptoms. If a C-rate trace looks wrong, find the underlying step and check why it was selected.

The fastest debug loop is a small diagnostic Python snippet that calls one helper (_select_full_steps_at_temp, _representative_discharge_per_rate) and prints what gets chosen for the suspicious case. Inline uv run python -c "..." works well — no need to write throwaway files.

Common pitfalls

- **Silently dropped columns.** Cycler readers in `ionworksdata` return canonical columns only. The raw file may have aux thermocouples (Maccor Temp 2..4, Arbin aux probes). If your report's temperature plots are empty, the reader probably dropped those channels at processing time. Fix this in the processing pipeline, not in the analysis layer: re-read the source for those channels and `join_asof`-merge them in.
- **Missing protocol metadata.** A measurement without `ambient_temperature_degc` is silently excluded from temperature-filtered selections. If a section is sparse, check whether the classifier ran on the affected cohort. The giveaway is `proto_family(m, inst)` returning None.
- **Anomalous representative traces.** When `_representative_discharge_per_rate` picks "the wrong" measurement, the cause is usually iteration order: instances sort alphabetically, so the first one wins. The fix is a stable sort with explicit priority, not a hack to filter out the offending measurement.
- **Section ordering drift.** It's tempting to add new sections with lettered suffixes (3b, 3c) to avoid renumbering. Resist: a few renumber edits now are cheaper than confusion later. If the new section logically belongs between 3 and 4, renumber 4 onward.
- **Mixing protocol families in one plot.** `Rated_Charge` (cutoff 4.20 V) and `Fast_charge` (cutoff 4.12 V) look similar at first glance but produce different traces, so keep them in separate sections. When deciding whether two families can be combined, look at the protocol intent:
  - Combine them when the intent is the same. For example, `Rated_Discharge` and `Cap_rated` both measure rate capability.
  - Keep them separate when the intent differs, as with rate capability vs. a multi-cycle stress test.

References

- references/section_templates.md: detailed markdown templates for each of the 12 common section types, with the prose conventions and figure-caption style
- references/plot_patterns.md: recipes for the recurring plot types (V vs Q multi-rate, T vs Q multi-rate, rate-capability scatter, twin-axes V+T vs time, multi-temperature grid)
- references/step_filtering.md: detailed step-filter heuristics with the rationale for each threshold
- references/prod_data.md: report-specific platform helpers (`find_one`, `find_measurements`, `proto_family`, etc.); the cached accessor itself lives in the SDK as `ionworks.Navigator`
