nsys-ai

AI-powered analysis for NVIDIA Nsight Systems profiles

Navigate GPU kernel timelines, diff two runs, and diagnose performance bottlenecks with an evidence-first agent — from your browser or terminal.

Mission: Build an agent that understands GPU performance from first principles — one that can identify pipeline bubbles, calculate MFU, assess arithmetic intensity, and diagnose the root causes that cost millions of GPU hours, turning months of expert debugging into minutes.

nsys-ai reads .nsys-rep or .sqlite exports from NVIDIA Nsight Systems and turns them into something you can navigate and reason about: a web timeline, terminal viewers, a before/after diff that reports whether a change actually helped, and a set of deterministic analysis skills an LLM agent can drive. .nsys-rep files are opened directly — nsys-ai exports them to SQLite for you on first use.

Installation

pip install nsys-ai

Analyzing a profile requires neither CUDA nor an Nsight installation, only Python 3.10 or newer. (Capturing a new .nsys-rep, or converting one to SQLite, needs the nsys CLI on your machine; analyzing an existing .sqlite does not.)

Quick start

1. Capture a profile

For ML training, capture a few representative iterations rather than the whole run — it keeps the profile small and the profiler overhead low. Mark the region with the CUDA profiler API and trace CUDA plus NVTX:

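import torch

WARMUP_STEPS = 10   # placeholder value: enough steps to get past startup effects
PROFILE_STEPS = 3   # number of iterations to capture

for _ in range(WARMUP_STEPS):     # warm up outside the capture range
    train_step()                  # train_step() is your own training-step function
torch.cuda.synchronize()          # drain pending GPU work before capture starts
torch.cuda.cudart().cudaProfilerStart()
for _ in range(PROFILE_STEPS):    # profile these iterations
    train_step()
torch.cuda.synchronize()          # wait for the profiled kernels to finish
torch.cuda.cudart().cudaProfilerStop()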

nsys profile --capture-range=cudaProfilerApi --trace=cuda,nvtx \
  -o my_training python train.py
# -> my_training.nsys-rep

--trace=cuda is what every skill relies on (GPU kernels, memory copies, CUDA API). nvtx adds the annotation hierarchy that drives the iteration, region, and layer views. To use the iteration tools (iters, diff --iteration), annotate each step with a consistent NVTX marker — see Focused Profiling and NVTX Annotations.
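
As a minimal sketch of such an annotation, the loop below wraps every step in an NVTX range with the same name, using PyTorch's torch.cuda.nvtx.range_push and range_pop. The range name "iteration" and the helpers nvtx_range and run_annotated are assumptions made for illustration; check Focused Profiling and NVTX Annotations for the marker nsys-ai actually expects. The helper falls back to a no-op when PyTorch or CUDA is unavailable, so the loop also runs on a machine without a GPU.

```python
from contextlib import contextmanager

try:
    import torch
    _NVTX_OK = torch.cuda.is_available()
except ImportError:  # PyTorch not installed: ranges become no-ops
    torch = None
    _NVTX_OK = False


@contextmanager
def nvtx_range(name):
    """Push an NVTX range on entry and pop it on exit (no-op without CUDA)."""
    if _NVTX_OK:
        torch.cuda.nvtx.range_push(name)
    try:
        yield
    finally:
        if _NVTX_OK:
            torch.cuda.nvtx.range_pop()


def run_annotated(train_step, num_steps, marker="iteration"):
    """Run num_steps steps, each wrapped in a range with the same name.

    "iteration" is a hypothetical marker name, not one taken from nsys-ai.
    """
    for _ in range(num_steps):
        with nvtx_range(marker):
            train_step()
```

Using one fixed name for every step, rather than something like f"step_{i}", is what makes the marker consistent across iterations.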

No workload handy? Download an example profile:

cd examples/example-20-megatron-distca && python download_data.py
# -> output/megatron_distca.nsys-rep

2. Open it

# Default: open the web timeline in your browser
nsys-ai my_training.nsys-rep

# Metadata and GPU info
nsys-ai info my_training.nsys-rep

# GPU kernel summary
nsys-ai summary my_training.nsys-rep --gpu 0

Prefer the terminal? The TUIs work the same way:

nsys-ai timeline my_training.nsys-rep --gpu 0   # Perfetto-style horizontal timeline
nsys-ai tui my_training.nsys-rep --gpu 0        # NVTX tree browser
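
For a sense of the data a kernel summary draws on, here is a rough sketch that queries a SQLite export directly with Python's sqlite3 module. It assumes the table layout commonly found in Nsight Systems exports (a CUPTI_ACTIVITY_KIND_KERNEL table whose shortName column points into StringIds); it is not how nsys-ai implements its summary command, and kernel_summary and build_demo_db are hypothetical names. The demo database is a tiny in-memory stand-in so the example runs without a real profile.

```python
import sqlite3


def kernel_summary(conn, device_id=0, limit=10):
    """Return (name, launches, total_ns) rows, largest total time first."""
    query = """
        SELECT s.value                  AS name,
               COUNT(*)                 AS launches,
               SUM(k."end" - k."start") AS total_ns
        FROM CUPTI_ACTIVITY_KIND_KERNEL AS k
        JOIN StringIds AS s ON s.id = k.shortName
        WHERE k.deviceId = ?
        GROUP BY s.value
        ORDER BY total_ns DESC
        LIMIT ?
    """
    return conn.execute(query, (device_id, limit)).fetchall()


def build_demo_db():
    """Tiny in-memory stand-in for an export (assumed schema, made-up rows)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE StringIds (id INTEGER PRIMARY KEY, value TEXT);
        CREATE TABLE CUPTI_ACTIVITY_KIND_KERNEL (
            "start" INTEGER, "end" INTEGER, deviceId INTEGER, shortName INTEGER);
        INSERT INTO StringIds VALUES (1, 'gemm_kernel'), (2, 'softmax_kernel');
        INSERT INTO CUPTI_ACTIVITY_KIND_KERNEL VALUES
            (0, 500, 0, 1), (600, 900, 0, 1), (1000, 1100, 0, 2), (0, 50, 1, 2);
    """)
    return conn


rows = kernel_summary(build_demo_db(), device_id=0)
for name, launches, total_ns in rows:
    print(f"{name:20s} {launches:4d} launches {total_ns / 1e3:8.1f} us")
```

To try it on a real profile, pass sqlite3.connect(...) with the path to your own export in place of build_demo_db(), and verify the table and column names against that file first.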

3. Compare two runs

nsys-ai diff before.sqlite after.sqlite
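
Conceptually, a before/after comparison comes down to lining up the same kernels in both runs and checking which way the totals moved. The sketch below illustrates that idea on plain dictionaries of per-kernel GPU time; it is not nsys-ai's diff algorithm, and the function names, input shape, and sample numbers are all hypothetical.

```python
def diff_kernel_totals(before, after):
    """Compare per-kernel total GPU time (ns) between two runs.

    before / after map kernel name -> total nanoseconds.
    Returns (name, before_ns, after_ns, delta_ns) rows, biggest change first.
    """
    rows = []
    for name in sorted(set(before) | set(after)):
        b = before.get(name, 0)   # a kernel missing from one run counts as 0 ns
        a = after.get(name, 0)
        rows.append((name, b, a, a - b))
    rows.sort(key=lambda r: abs(r[3]), reverse=True)
    return rows


def verdict(before, after):
    """Say whether total kernel time went down, up, or stayed the same."""
    total_before = sum(before.values())
    total_after = sum(after.values())
    if total_after < total_before:
        return "improved"
    if total_after > total_before:
        return "regressed"
    return "unchanged"


# Made-up totals for illustration only.
before = {"gemm_kernel": 800, "softmax_kernel": 100, "copy_kernel": 40}
after = {"gemm_kernel": 600, "softmax_kernel": 110}
report = diff_kernel_totals(before, after)
for name, b, a, delta in report:
    print(f"{name:16s} {b:6d} -> {a:6d} ns ({delta:+d})")
print(verdict(before, after))
```

Sorting by absolute change puts the kernels that moved the most at the top, whether they got faster or slower.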

Web timeline

A browser-based multi-GPU viewer with progressive rendering — no --trim required. This is the default view when you run nsys-ai <profile>.

nsys-ai my_training.nsys-rep                       # opens in your browser
nsys-ai timeline-web my_training.nsys-rep --gpu 0 1 2 3

Multi-GPU stacked view with color-coded separators
Progressive rendering — pre-builds the NVTX tree at startup, then serves tiles in about a millisecond each
NVTX hierarchy bars (L0-L5) per GPU
AI chat sidebar (press a) and kernel search (press /)
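
The idea of building an index once and then answering small time-window requests quickly can be sketched as follows. This illustrates tile-style lookup over a start-sorted kernel list using bisect; it is an assumption about the general technique, not nsys-ai's server code, and TileIndex and the (start_ns, end_ns, name) tuple layout are hypothetical.

```python
from bisect import bisect_left


class TileIndex:
    """Answer 'which kernels overlap [t0, t1)?' from data sorted once up front."""

    def __init__(self, kernels):
        # kernels: iterable of (start_ns, end_ns, name) tuples
        self.kernels = sorted(kernels)
        self.starts = [k[0] for k in self.kernels]
        # The longest kernel bounds how far before t0 an overlapping kernel can start.
        self.max_dur = max((e - s for s, e, _ in self.kernels), default=0)

    def tile(self, t0, t1):
        """Return kernels that overlap the half-open window [t0, t1)."""
        lo = bisect_left(self.starts, t0 - self.max_dur)
        hi = bisect_left(self.starts, t1)          # first kernel starting at or after t1
        return [k for k in self.kernels[lo:hi] if k[1] > t0]


# Made-up kernels for illustration only.
index = TileIndex([(0, 100, "a"), (50, 400, "b"), (500, 600, "c"), (900, 1000, "d")])
visible = index.tile(300, 550)
print(visible)
```

Sorting happens once in the constructor; each tile request is then two binary searches plus a scan of the few candidates in range.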

Input	Action
Swipe / `h` `l` / arrows	Pan through time
Swipe up-down / `j` `k`	Select stream
Pinch / `Shift+scroll` / `+` `-`	Zoom
`f` or `0`	Fit full time range
`Tab`	Next kernel
`/`	Search kernels
`n`	Toggle NVTX
`a`	AI chat
`?`	Help overlay

Timeline TUI

A Perfetto-style horizontal viewer with per-stream kernels, NVTX hierarchy bars, and a time-cursor navigation model.

Key	Action
arrows	Pan time / select stream
`Shift+arrows`	Page pan (quarter viewport)
`Tab`	Snap to next kernel
`+` `-`	Zoom
`/`	Filter kernels by name
`m`	Minimum-duration threshold
`d`	Toggle demangled names
`B`	Save bookmark (with kernel + NVTX context)
`C`	Config panel (stream rows, tick density, NVTX depth)
`h`	Full help overlay
