Skill

compile-trace-dynamo

Debug PyTorch Dynamo stage - bytecode capture, FX graph construction, graph breaks, and pre-grad passes. Covers TORCH_LOGS for dynamo/graph_breaks/pre_grad_graphs, interpreting FX graph files, understanding graph break reasons, and pre-grad fusion patterns (Conv-BN, split-cat). Load after compile-bisect indicates backend='eager'.

Last commit: Jun 4, 2026

Tracing Dynamo Stage - FX Graph Capture

How to trace and debug Dynamo bytecode capture, graph breaks, and pre-grad FX passes.

Stage Overview
What to Trace
Logging Setup
Output Files
Pre-Grad FX Passes
Debugging Workflows
Common Issues

Stage Overview

Dynamo Stage = Python bytecode capture → FX graph (torch-level ops) → Pre-grad passes

What it does:

Intercepts Python execution via PEP 523 frame evaluation
Symbolically executes bytecode to build FX graphs
Identifies graph breaks (untraceable code)
Runs pre-grad optimization passes

For detailed Dynamo mechanics: See pytorch-dynamo skill

Pipeline Position:

Python → Dynamo (FX + torch ops) → [Pre-Grad Passes] → AOT (lowers to aten ops) → Inductor

What to Trace

Trace When...

Graph breaks:

Multiple small compiled graphs instead of one large graph
Performance degradation from breaks
Understanding why code isn't fully traced

Unsupported operations:

Compilation failures
Silent fallbacks to eager mode
Unexpected behavior after compilation

Pre-grad optimizations:

Conv-BN fusion not happening
Split-cat patterns not optimizing
Understanding FX-level transformations

Logging Setup

Basic Logging

Minimal (graph breaks only):

TORCH_LOGS="graph_breaks" python script.py

Standard (Dynamo + breaks):

TORCH_LOGS="dynamo,graph_breaks" python script.py

Comprehensive (including FX graphs):

TORCH_LOGS="dynamo,graph_breaks,graph_code,pre_grad_graphs" python script.py

Available Loggers

Logger	What It Shows	When to Use
`dynamo`	Dynamo log output at INFO level (use `+dynamo` for DEBUG)	General debugging
`graph_breaks`	Why and where breaks occur	Minimizing graph breaks
`graph_code`	Readable FX graph structure	Understanding captured graph
`guards`	Generated guards	Dynamic shape issues
`recompiles`	Recompilation reasons	Cache misses, performance
`pre_grad_graphs`	FX graph as it enters the pre-grad passes	Checking pre-grad pass input
`bytecode`	Bytecode transformations	Deep Dynamo debugging

Programmatic Logging

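import os
# TORCH_LOGS is read when torch is first imported, so set it before that import
os.environ['TORCH_LOGS'] = 'dynamo,graph_breaks'

import torch._inductor.config as config
config.trace.enabled = True  # Enable debug file output (same as TORCH_COMPILE_DEBUG=1)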

Output Files

Location

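Default: ./torch_compile_debug/run_<timestamp>/ under the current working directory, written when TORCH_COMPILE_DEBUG=1 is set
Custom: Set TORCH_COMPILE_DEBUG_DIR=/path/to/dir (the torch_compile_debug directory is created there)
Note: /tmp/torchinductor_$USER/ is Inductor's compile cache, not the debug output directory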

Generated Files

1. FX Graph Files

fx_graph_runnable.py         # Standalone reproduction script
fx_graph_readable.py         # Human-readable graph as received by Inductor
fx_graph_transformed.py      # Same graph after Inductor's FX passes

How to use:

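# Pick the directory of the first compiled graph in the debug output
GRAPH_DIR=$(dirname "$(find torch_compile_debug -name fx_graph_runnable.py | head -n 1)")

# Run reproduction script
python "$GRAPH_DIR/fx_graph_runnable.py"

# Compare before/after
diff "$GRAPH_DIR/fx_graph_readable.py" "$GRAPH_DIR/fx_graph_transformed.py"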

2. Graph Break Logs

Console output looks roughly like this (schematic; the exact wording varies by PyTorch version):

Graph break: print(y)
  Reason: call_function print in skip list
  User code: /path/to/file.py:5 in fn
  Graph Count: 2 (compilation split into multiple graphs)
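When a run produces many breaks, it can help to collect them into records instead of reading the console. A minimal sketch using only the standard library, assuming the log follows the schematic layout shown above (real PyTorch output is formatted differently across versions, so the patterns would need adjusting); parse_graph_breaks and SAMPLE_LOG are hypothetical names:

```python
import re

# Patterns match the schematic layout above, not any specific PyTorch release.
BREAK_RE = re.compile(r"^Graph break: (?P<op>.+)$")
FIELD_RE = re.compile(r"^\s+(?P<key>Reason|User code): (?P<value>.+)$")

SAMPLE_LOG = """\
Graph break: print(y)
  Reason: call_function print in skip list
  User code: /path/to/file.py:5 in fn
  Graph Count: 2 (compilation split into multiple graphs)
"""


def parse_graph_breaks(log_text):
    """Return one dict per graph break with 'op', 'reason', and 'user_code' keys."""
    breaks = []
    for line in log_text.splitlines():
        header = BREAK_RE.match(line)
        if header:
            breaks.append({"op": header.group("op")})
            continue
        field = FIELD_RE.match(line)
        if field and breaks:
            key = field.group("key").lower().replace(" ", "_")
            breaks[-1][key] = field.group("value")
    return breaks


for record in parse_graph_breaks(SAMPLE_LOG):
    print(record["user_code"], "->", record["reason"])
```

Grouping the records by user_code then shows which source lines account for most of the breaks.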

Interpreting FX Graphs

Structure:

graph():
    %x : torch.Tensor = placeholder[target=x]  # Input
    %relu : torch.Tensor = call_function[      # Operation
        target=torch.relu
    ](args = (%x,))
    %add : torch.Tensor = call_function[
        target=operator.add
    ](args = (%relu, 1))
    return add                                  # Output

Node types:

placeholder - Function inputs
call_function - torch functions and Python operators (most operations)
call_method - Tensor method calls such as x.sum()
call_module - nn.Module calls
get_attr - Parameter/buffer access
output - Return values

What to look for:

Number of nodes (complexity)
Ops used (lowered to ATen later, by AOT Autograd)
Graph structure (dependencies, parallelism)
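The node types and the "what to look for" checks above can also be inspected in-process by passing a custom backend to torch.compile. A minimal sketch, assuming a PyTorch 2.x install; inspecting_backend, node_kinds, and fn are hypothetical names, and the backend runs the captured graph unmodified:

```python
from collections import Counter

import torch

node_kinds = Counter()  # node.op -> count, accumulated over all captured graphs


def inspecting_backend(gm: torch.fx.GraphModule, example_inputs):
    """Custom torch.compile backend: record node kinds, then run the graph as-is."""
    node_kinds.update(node.op for node in gm.graph.nodes)
    print(gm.graph)  # textual form of the captured FX graph
    return gm.forward


def fn(x):
    return torch.relu(x) + 1


compiled_fn = torch.compile(fn, backend=inspecting_backend)
x = torch.randn(4)
out = compiled_fn(x)
print(dict(node_kinds))
```

Because the backend is called once per captured graph, the number of calls also gives a quick view of how many graphs a function was split into.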

Pre-Grad FX Passes

When They Run

After: Dynamo capture
Before: AOT Autograd (which runs for both training and inference)

Purpose: Optimize the FX graph at the torch-op level, before AOT Autograd lowers it to aten ops

Logging Pre-Grad Passes

TORCH_LOGS="pre_grad_graphs" python script.py

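Shows:

The FX graph as it enters the pre-grad passes

The effect of the passes is visible by comparing this graph with the graph Inductor receives afterwards (fx_graph_readable.py in the debug directory).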

Common Passes

Pass	What It Does	How to Verify
Conv-BN Fusion	Folds BatchNorm into Conv weights	Check if `batch_norm` node removed
Split-Cat Elimination	Removes redundant split/cat	Check if `split`/`cat` pair eliminated
Normalization	NumPy compatibility rewrites	Compare before/after graph
Group Batch Fusion	Batches operations together	Look for combined ops

Verifying Pass Effects

Before pre-grad passes (schematic, as seen in the pre_grad_graphs log):

%conv : Tensor = call_function[target=torch.conv2d](args = (%x, %weight))
%bn : Tensor = call_function[target=torch.nn.functional.batch_norm](args = (%conv, ...))

After pre-grad passes (schematic):

%fused_conv : Tensor = call_function[target=torch.conv2d](
    args = (%x, %fused_weight, %fused_bias)  # BN folded in
)
# batch_norm node eliminated
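The "BN folded in" step is plain arithmetic on the conv parameters. A minimal per-channel sketch in standard-library Python, treating a 1x1 single-channel conv as `weight * x + bias` and BatchNorm in eval mode with fixed running statistics; the function names are hypothetical and this illustrates the algebra only, not Inductor's implementation:

```python
import math


def conv_then_bn(x, weight, bias, mean, var, gamma, beta, eps=1e-5):
    """Reference path: conv output followed by eval-mode BatchNorm."""
    y = weight * x + bias
    return (y - mean) / math.sqrt(var + eps) * gamma + beta


def fold_bn_into_conv(weight, bias, mean, var, gamma, beta, eps=1e-5):
    """Return (fused_weight, fused_bias) so that one conv reproduces conv + BN."""
    scale = gamma / math.sqrt(var + eps)
    return weight * scale, (bias - mean) * scale + beta


weight, bias = 0.5, 0.1
stats = dict(mean=0.2, var=4.0, gamma=1.5, beta=-0.3)
fused_weight, fused_bias = fold_bn_into_conv(weight, bias, **stats)

x = 2.0
fused_out = fused_weight * x + fused_bias
reference_out = conv_then_bn(x, weight, bias, **stats)
print(math.isclose(fused_out, reference_out))
```

The fold relies on the running statistics being constant, which is why it applies to eval-mode BatchNorm only.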

Debugging Workflows

Workflow 1: Minimize Graph Breaks

Goal: Reduce number of compiled graphs

Steps:

Enable logging:

TORCH_LOGS="graph_breaks" python script.py

Identify break locations:

Graph break: <operation>
  Reason: <why it broke>
  User code: <file:line>

Fix each break:
- Dynamic control: Replace with torch.cond()
- I/O ops: Move outside compiled region
- Custom ops: Use torch._dynamo.allow_in_graph()

Verify with fullgraph mode:

@torch.compile(fullgraph=True)  # Errors if any breaks
def fn(x):
    ...

Workflow 2: Debug Unsupported Operation

Symptom: torch._dynamo.exc.Unsupported error or silent graph break

Steps:

Enable verbose logging:

TORCH_LOGS="dynamo,graph_breaks" python script.py

Locate unsupported op in graph break message
Check if op should be supported:
- Look in torch/_dynamo/trace_rules.py (skipfiles.py in older releases) for skip lists
- Check torch/_dynamo/variables/ for handler
Solutions:
- Rewrite using supported ops
- Allow via torch._dynamo.allow_in_graph(fn)
- Skip via explicit torch._dynamo.graph_break()

Workflow 3: Verify Pre-Grad Optimization

Goal: Confirm expected optimization happened

Steps:

Enable pre-grad logging and debug file output, capturing stderr:

TORCH_COMPILE_DEBUG=1 TORCH_LOGS="pre_grad_graphs" python script.py 2> pre_grad.log

Check the graph before pre-grad passes (logged by pre_grad_graphs):

grep "batch_norm\|conv" pre_grad.log

Check the graph Inductor receives after pre-grad passes and AOT Autograd (fx_graph_readable.py in the debug directory):

grep "batch_norm\|conv" fx_graph_readable.py

Verify pattern eliminated:
- Conv-BN: batch_norm should be gone
- Split-Cat: split/cat pair should be gone
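The before/after comparison in this workflow can be scripted so it runs as a regression check. A minimal standard-library sketch that mirrors the grep step with substring matching; eliminated_ops and the two sample graph strings are hypothetical, and in practice the strings would be read from the captured log and graph file:

```python
def ops_in_graph(graph_text, op_names):
    """Return the op names that occur anywhere in a graph dump (grep-style match)."""
    return {name for name in op_names if name in graph_text}


def eliminated_ops(before_text, after_text, op_names):
    """Ops present in the 'before' dump that no longer appear in the 'after' dump."""
    return ops_in_graph(before_text, op_names) - ops_in_graph(after_text, op_names)


BEFORE = """
%conv : Tensor = call_function[target=conv2d](args = (%x, %weight))
%bn : Tensor = call_function[target=batch_norm](args = (%conv,))
"""
AFTER = """
%fused_conv : Tensor = call_function[target=conv2d](args = (%x, %fused_weight, %fused_bias))
"""

gone = eliminated_ops(BEFORE, AFTER, ["conv2d", "batch_norm", "split", "cat"])
print(sorted(gone))
```

Substring matching is deliberately loose so that decomposed op names containing batch_norm are still caught; short names such as cat can match unrelated text and may need a stricter pattern.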

Workflow 4: Debug Dynamic Shapes

Symptom: Guards failing, recompilation, or wrong shapes

Steps:

Enable guard logging:

TORCH_LOGS="guards,recompiles" python script.py

Check generated guards:

Guard: tensor 'x' shape[0] == 10
Guard: tensor 'x' shape[1] == 20

Identify problematic guards:
- Too specific → causes recompilation
- Too loose → wrong specialization

Fix with shape hints:

x = torch.randn(10, 20)
torch._dynamo.mark_dynamic(x, 0)  # Dimension 0 is dynamic

Common Issues

Issue: Too Many Graph Breaks

Symptom: Many small graphs, slow performance

Debug:

TORCH_LOGS="graph_breaks" python script.py 2>&1 | grep -c "Graph break"

Solutions:

Refactor to minimize dynamic control flow
Use torch.cond() for conditional execution
Move non-traceable code outside @torch.compile
Mark functions as traceable: torch._dynamo.allow_in_graph(fn)

Issue: Slow Compilation

Symptom: Long wait on first execution

Debug:

TORCH_COMPILE_DYNAMO_PROFILER=1 python script.py

Solutions:

Use compilation cache (enabled by default)
Reduce graph complexity
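Keep the default mode while iterating; mode="max-autotune" lengthens compilation, and mode="reduce-overhead" targets runtime overhead (CUDA graphs) rather than compile time:
```
torch.compile(fn)  # default mode
```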

Issue: Conv-BN Not Fusing

Symptom: Expected fusion didn't happen

Debug:

TORCH_LOGS="pre_grad_graphs" python script.py
# Check if both conv and batch_norm still in transformed graph

Common causes:

Model in training mode (use model.eval())
Conv and BN not consecutive in graph
Unsupported conv or BN variant

Fix:

model.eval()  # Pre-grad conv-bn fusion only in eval mode

Issue: Wrong Output After Compilation

Symptom: Compiled function produces incorrect result

Debug Steps:

Compare with eager backend:

@torch.compile(backend="eager")
def fn(x):
    ...

Check FX graph structure:

TORCH_LOGS="graph_code" python script.py
# Verify captured graph matches expectations

Verify guards:
```
TORCH_LOGS="guards" python script.py
```
Check for mutations:
- In-place ops may not be properly tracked
- Verify with @torch.compile(fullgraph=True)

Quick Reference

Essential Commands

# Basic tracing
TORCH_LOGS="dynamo,graph_breaks" python script.py

# With FX graphs
TORCH_LOGS="dynamo,graph_breaks,graph_code" python script.py

# Include pre-grad passes
TORCH_LOGS="dynamo,graph_breaks,pre_grad_graphs" python script.py

# Full debug
TORCH_LOGS="dynamo,graph_breaks,graph_code,guards,recompiles,pre_grad_graphs" python script.py

Output Files

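# Pick the directory of the first compiled graph (requires TORCH_COMPILE_DEBUG=1)
GRAPH_DIR=$(dirname "$(find torch_compile_debug -name fx_graph_runnable.py | head -n 1)")

# View FX graph
cat "$GRAPH_DIR/fx_graph_readable.py"

# Run reproduction script
python "$GRAPH_DIR/fx_graph_runnable.py"

# Compare before/after passes
diff "$GRAPH_DIR"/fx_graph_{readable,transformed}.py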

Programmatic Debugging

# Count compilations
from torch._dynamo.testing import CompileCounter
cnt = CompileCounter()
compiled_fn = torch.compile(fn, backend=cnt)

# Explicit graph break
torch._dynamo.graph_break()

# Force fullgraph (error on break)
@torch.compile(fullgraph=True)
def fn(x): ...

# Compile-time breakpoint
from torch._dynamo.comptime import comptime
comptime.breakpoint()  # Drops into pdb during compile

Next Stage

After Dynamo Stage: Load compile-trace-aot skill - Tracing AOT Autograd transformations

Or: Load compile-trace-inductor skill - Skip to Inductor stage

Reference: See compile-overview skill for complete pipeline context.
