Debug PyTorch Dynamo stage - bytecode capture, FX graph construction, graph breaks, and pre-grad passes. Covers TORCH_LOGS for dynamo/graph_breaks/pre_grad_graphs, interpreting FX graph files, understanding graph break reasons, and pre-grad fusion patterns (Conv-BN, split-cat). Load after compile-bisect indicates backend='eager'.
Slash command: /torch-compile:compile-trace-dynamo
How to trace and debug Dynamo bytecode capture, graph breaks, and pre-grad FX passes.
Dynamo Stage = Python bytecode capture → FX graph (torch-level ops) → Pre-grad passes
What it does:
For detailed Dynamo mechanics: See pytorch-dynamo skill
Pipeline Position:
Python → Dynamo (FX graph of torch-level ops) → [Pre-Grad Passes] → AOT Autograd (lowers to ATen ops) → Inductor
Graph breaks:
Unsupported operations:
Pre-grad optimizations:
Minimal (graph breaks only):
TORCH_LOGS="graph_breaks" python script.py
Standard (Dynamo + breaks):
TORCH_LOGS="dynamo,graph_breaks" python script.py
Comprehensive (including FX graphs):
TORCH_LOGS="dynamo,graph_breaks,graph_code,pre_grad_graphs" python script.py
| Logger | What It Shows | When to Use |
|---|---|---|
| dynamo | All Dynamo debug output | General debugging |
| graph_breaks | Why and where breaks occur | Minimizing graph breaks |
| graph_code | Readable FX graph structure | Understanding captured graph |
| guards | Generated guards | Dynamic shape issues |
| recompiles | Recompilation reasons | Cache misses, performance |
| pre_grad_graphs | FX graphs before/after pre-grad | Pre-grad pass effects |
| bytecode | Bytecode transformations | Deep Dynamo debugging |
import os
os.environ['TORCH_LOGS'] = 'dynamo,graph_breaks'  # Must be set before torch is first imported
import torch._inductor.config as config
config.trace.enabled = True  # Write debug files (same effect as TORCH_COMPILE_DEBUG=1)
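Default: ./torch_compile_debug/ in the current working directory (/tmp/torchinductor_$USER/ is Inductor's cache of generated code, not the debug trace)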
Custom: Set TORCH_COMPILE_DEBUG_DIR=/path/to/dir
1. FX Graph Files
fx_graph_runnable.py # Standalone reproduction script
fx_graph_readable.py # Human-readable ATen graph Inductor receives (already after pre-grad passes and AOT Autograd)
fx_graph_transformed.py # Same graph after Inductor's post-grad passes
How to use:
# Run reproduction script (each compiled graph gets its own subdirectory)
python torch_compile_debug/<run_dir>/torchinductor/<graph_dir>/fx_graph_runnable.py
# Compare before/after post-grad passes
diff fx_graph_readable.py fx_graph_transformed.py
2. Graph Break Logs
Console output looks roughly like this (exact wording varies across PyTorch versions):
Graph break: print(y)
Reason: call_function print in skip list
User code: /path/to/file.py:5 in fn
Graph Count: 2 (compilation split into multiple graphs)
Structure:
graph():
%x : torch.Tensor = placeholder[target=x] # Input
%relu : torch.Tensor = call_function[ # Operation
target=torch.ops.aten.relu.default
](args = (%x,))
%add : torch.Tensor = call_function[
target=torch.ops.aten.add.Tensor
](args = (%relu, 1))
return add # Output
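The graph above uses ATen targets, as in fx_graph_readable.py. To look at Dynamo's own capture, one option is a custom backend that walks gm.graph.nodes. This is a sketch under that assumption; inspect_backend is a hypothetical name. In this torch-level graph, expect targets such as torch.relu and operator.add rather than aten ops.

```python
import torch
import torch._dynamo
from torch import fx

seen = []  # (node.op, node.target) pairs recorded by the backend


def inspect_backend(gm: fx.GraphModule, example_inputs):
    # Dynamo passes its captured FX graph here; record every node's kind and target.
    for node in gm.graph.nodes:
        seen.append((node.op, node.target))
    return gm.forward  # run the captured graph unchanged


def fn(x):
    return torch.relu(x) + 1


torch._dynamo.reset()
out = torch.compile(fn, backend=inspect_backend)(torch.randn(4))
for op, target in seen:
    print(f"{op:14} {target}")
```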
Node types:
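placeholder - Function inputs
call_function - Function calls (ATen ops in post-AOT graphs such as fx_graph_readable.py; torch functions and Python operators in Dynamo's own graph)
call_module - nn.Module calls
get_attr - Parameter/buffer access
output - Return values
What to look for: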
Runs after: Dynamo capture. Runs before: AOT Autograd (for both training and inference graphs), and therefore before Inductor.
Purpose: Optimize the FX graph at the torch-op level, before it is lowered to ATen ops
TORCH_LOGS="pre_grad_graphs" python script.py
Shows:
| Pass | What It Does | How to Verify |
|---|---|---|
| Conv-BN Fusion | Folds BatchNorm into Conv weights | Check if batch_norm node removed |
| Split-Cat Elimination | Removes redundant split/cat | Check if split/cat pair eliminated |
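| Normalization | Canonicalizes op calls (e.g., split/cat arguments) so later patterns can match | Compare before/after graph |
| Group Batch Fusion | Batches independent ops of the same kind (e.g., parallel linear layers) into one op | Look for combined ops |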
Before Pre-Grad (pre_grad_graphs log):
%conv : Tensor = call_function[target=torch.conv2d](args = (%x, %weight))
%bn : Tensor = call_function[target=torch.nn.functional.batch_norm](args = (%conv, ...))
After Pre-Grad (graph handed on to AOT Autograd):
%fused_conv : Tensor = call_function[target=torch.conv2d](
args = (%x, %fused_weight, %fused_bias) # BN folded in
)
# batch_norm node eliminated
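Folding works because, with frozen (eval-mode) statistics, BatchNorm is a per-channel affine map, and an affine map composed with the conv is still a conv. The sketch below checks the standard folding identity on a single channel, treating a 1x1 conv as a scalar multiply-add. It shows the math only and is not Inductor's implementation.

```python
import math


def fold_bn_into_conv(w, b, gamma, beta, mean, var, eps=1e-5):
    """Fold one output channel's eval-mode BatchNorm into its conv weight/bias."""
    scale = gamma / math.sqrt(var + eps)
    return w * scale, (b - mean) * scale + beta


def conv_then_bn(x, w, b, gamma, beta, mean, var, eps=1e-5):
    y = w * x + b  # 1x1 "conv" on a single channel is just an affine map
    return (y - mean) / math.sqrt(var + eps) * gamma + beta


params = dict(gamma=1.5, beta=0.2, mean=0.3, var=4.0)
w_f, b_f = fold_bn_into_conv(2.0, 0.5, **params)
print(f"fused weight {w_f:.4f}, fused bias {b_f:.4f}")
```

The folded pair reproduces conv followed by BN for any input, so the batch_norm node can be dropped.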
Goal: Reduce number of compiled graphs
Steps:
Enable logging:
TORCH_LOGS="graph_breaks" python script.py
Identify break locations:
Graph break: <operation>
Reason: <why it broke>
User code: <file:line>
Fix each break:
torch.cond()
torch._dynamo.allow_in_graph()
Verify with fullgraph mode:
@torch.compile(fullgraph=True) # Errors if any breaks
def fn(x):
...
Symptom: torch._dynamo.exc.Unsupported error (under fullgraph=True) or a silent graph break
Steps:
Enable verbose logging:
TORCH_LOGS="dynamo,graph_breaks" python script.py
Locate unsupported op in graph break message
Check if op should be supported:
torch/_dynamo/trace_rules.py for skip lists (skipfiles.py in older releases)
torch/_dynamo/variables/ for handlers
Solutions:
torch._dynamo.allow_in_graph(fn)
torch._dynamo.graph_break()
Goal: Confirm expected optimization happened
Steps:
Enable pre-grad logging:
TORCH_LOGS="pre_grad_graphs" python script.py
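Check the before graph (printed by the pre_grad_graphs logger on stderr):
TORCH_LOGS="pre_grad_graphs" python script.py 2>&1 | grep -E "batch_norm|conv2d"
Check the graph Inductor received (fx_graph_readable.py, written with TORCH_COMPILE_DEBUG=1), which already reflects pre-grad passes:
grep -E "batch_norm|convolution" fx_graph_readable.py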
Verify pattern eliminated:
batch_norm should be gone
split/cat pair should be gone
Symptom: Guards failing, recompilation, or wrong shapes
Steps:
Enable guard logging:
TORCH_LOGS="guards,recompiles" python script.py
Check generated guards:
Guard: tensor 'x' shape[0] == 10
Guard: tensor 'x' shape[1] == 20
Identify problematic guards:
Fix with shape hints:
x = torch.randn(10, 20)
torch._dynamo.mark_dynamic(x, 0) # Dimension 0 is dynamic
Symptom: Many small graphs, slow performance
Debug:
TORCH_LOGS="graph_breaks" python script.py 2>&1 | grep "Graph break" | wc -l
Solutions:
torch.cond() for conditional execution
@torch.compile
torch._dynamo.allow_in_graph(fn)
Symptom: Long wait on first execution
Debug:
TORCH_COMPILE_CPROFILE=1 python script.py # cProfile Dynamo's compilation
Solutions:
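Avoid mode="max-autotune" (autotuning adds compile time). mode="reduce-overhead" cuts runtime launch overhead with CUDA graphs and does not speed up compilation. The default mode compiles fastest:
torch.compile(fn) # default mode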
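A quick way to see how much of the first-call wait is compilation is to time the first and second calls. This is a sketch with a hypothetical fn. It uses backend="eager", so only Dynamo's capture is measured; switch to the default Inductor backend to include codegen time.

```python
import time

import torch
import torch._dynamo


def fn(x):
    for _ in range(20):  # the loop is unrolled during tracing, making capture cost visible
        x = torch.sin(x) + 1
    return x


torch._dynamo.reset()
compiled = torch.compile(fn, backend="eager")
x = torch.randn(1000)

t0 = time.perf_counter()
compiled(x)  # first call: Dynamo capture (+ backend compile)
first = time.perf_counter() - t0

t0 = time.perf_counter()
compiled(x)  # later calls: guard check + cached graph
second = time.perf_counter() - t0
print(f"first call {first:.3f}s, second call {second:.3f}s")
```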
Symptom: Expected fusion didn't happen
Debug:
TORCH_LOGS="pre_grad_graphs" python script.py
# Check if both conv and batch_norm still in transformed graph
Common causes:
Model still in training mode (model.eval() not called)
Fix:
model.eval() # Pre-grad conv-bn fusion only in eval mode
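The eval-mode requirement exists because training-mode BatchNorm normalizes with per-batch statistics, which cannot be baked into fixed weights. PyTorch's eager utility torch.nn.utils.fusion.fuse_conv_bn_eval shows the same constraint. It is not the Inductor pre-grad pass itself, only an illustration: it refuses to fuse modules in training mode and, after eval(), matches the unfused output.

```python
import torch
from torch import nn
from torch.nn.utils.fusion import fuse_conv_bn_eval

conv = nn.Conv2d(3, 8, kernel_size=3)
bn = nn.BatchNorm2d(8)

# Give BN non-trivial running statistics so the comparison is meaningful.
with torch.no_grad():
    bn(conv(torch.randn(16, 3, 12, 12)))  # train-mode forward updates running_mean/var

try:
    fuse_conv_bn_eval(conv, bn)  # modules are still in training mode
    fused_in_train = True
except AssertionError:
    fused_in_train = False

conv.eval()
bn.eval()
fused = fuse_conv_bn_eval(conv, bn)  # folds BN into a copy of the conv's weight and bias

x = torch.randn(2, 3, 12, 12)
with torch.no_grad():
    reference = bn(conv(x))
    folded = fused(x)
```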
Symptom: Compiled function produces incorrect result
Debug Steps:
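Compare with eager backend (runs Dynamo capture only; if the result is still wrong, the bug is in Dynamo, otherwise look downstream in AOT Autograd or Inductor):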
@torch.compile(backend="eager")
def fn(x):
...
Check FX graph structure:
TORCH_LOGS="graph_code" python script.py
# Verify captured graph matches expectations
Verify guards:
TORCH_LOGS="guards" python script.py
Check for mutations:
@torch.compile(fullgraph=True)
# Basic tracing
TORCH_LOGS="dynamo,graph_breaks" python script.py
# With FX graphs
TORCH_LOGS="dynamo,graph_breaks,graph_code" python script.py
# Include pre-grad passes
TORCH_LOGS="dynamo,graph_breaks,pre_grad_graphs" python script.py
# Full debug
TORCH_LOGS="dynamo,graph_breaks,graph_code,guards,recompiles,pre_grad_graphs" python script.py
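# View FX graph (written under torch_compile_debug/ with TORCH_COMPILE_DEBUG=1)
cat torch_compile_debug/<run_dir>/torchinductor/<graph_dir>/fx_graph_readable.py
# Run reproduction script
python torch_compile_debug/<run_dir>/torchinductor/<graph_dir>/fx_graph_runnable.py
# Compare before/after post-grad passes
diff torch_compile_debug/<run_dir>/torchinductor/<graph_dir>/fx_graph_{readable,transformed}.py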
# Count compilations
import torch
from torch._dynamo.testing import CompileCounter
cnt = CompileCounter()
compiled_fn = torch.compile(fn, backend=cnt)
# After calling compiled_fn: cnt.frame_count == number of graphs compiled
# Explicit graph break
torch._dynamo.graph_break()
# Force fullgraph (error on break)
@torch.compile(fullgraph=True)
def fn(x): ...
# Compile-time breakpoint
from torch._dynamo.comptime import comptime
comptime.breakpoint() # Call inside the compiled function; drops into pdb during compile
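The eager-backend comparison from the wrong-results workflow can be extended stage by stage. In this sketch, "eager" runs only Dynamo's capture and "aot_eager" adds AOT Autograd. If "eager" already disagrees with plain eager execution, suspect Dynamo; if only "aot_eager" disagrees, suspect AOT Autograd. The function is a hypothetical stand-in for the failing code, and "inductor" is left out because it needs a working compiler toolchain.

```python
import torch
import torch._dynamo


def fn(x):
    return torch.nn.functional.gelu(x) * x.mean()


x = torch.randn(64)
expected = fn(x)  # plain eager reference

results = {}
for backend in ("eager", "aot_eager"):  # add "inductor" once these agree
    torch._dynamo.reset()
    out = torch.compile(fn, backend=backend)(x)
    results[backend] = torch.allclose(out, expected, atol=1e-5)
    print(f"{backend:10} matches eager: {results[backend]}")
```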
After Dynamo Stage: Load compile-trace-aot skill - Tracing AOT Autograd transformations
Or: Load compile-trace-inductor skill - Skip to Inductor stage
Reference: See compile-overview skill for complete pipeline context.
Install: npx claudepluginhub torchedhat/ai-marketplace --plugin torch-compile