
research-debugging

Debugging guidance for research code — triggers on: error, exception, failed, crash, NaN, OOM, CUDA, traceback, bug, broken


Invocation


Slash command: /claude-eureka:research-debugging

User invocable

Model invocable



SKILL.md

51 lines · ~405 tokens

Stats

Language: Shell

Stars: 1

Maintenance: Good

Last commit: Mar 13, 2026


Debugging Protocol

When encountering errors in research code, follow this order strictly.

1. Understand before fixing

Read the full error message and traceback, not just the last line.
Identify the exact line and operation that failed.
Do NOT guess-and-patch. Reproduce the error first if possible.
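A minimal sketch of capturing the whole traceback from a small reproduction, using only the standard-library `traceback` module. `reproduce` is a hypothetical stand-in that deliberately divides by zero. Replace its body with the smallest call that still fails, for example one batch through the model with a fixed seed.

```python
import traceback


def reproduce():
    # Hypothetical stand-in for the failing call; deliberately raises ZeroDivisionError.
    values = [1.0, 2.0, 0.0]
    return [1.0 / v for v in values]


def run_and_report(fn):
    """Run fn(); on failure, print and return the full traceback text, else return None."""
    try:
        fn()
    except Exception:
        report = traceback.format_exc()  # every frame, not just the final error line
        print(report)
        return report
    return None


if __name__ == "__main__":
    run_and_report(reproduce)
```

Read the report from the top. The first frames inside your own code show which line and operation actually failed.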

2. ML debugging order

Check in this sequence — most bugs are data bugs:

Data: shapes, dtypes, NaN/Inf values, loader output, augmentation correctness
Model: parameter shapes, forward pass with dummy input, gradient flow
Training loop: loss computation, optimizer step, scheduler, logging
Infrastructure: GPU memory, CUDA version, library compatibility
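The Data and Model checks can be sketched in PyTorch as follows. This is a minimal sketch that assumes a standard `torch.nn.Module`. `find_nonfinite` and `check_gradient_flow` are hypothetical helper names, and the small `nn.Sequential` model and random batch exist only for illustration.

```python
import torch
import torch.nn as nn


def find_nonfinite(**tensors: torch.Tensor) -> list:
    """Print shape/dtype/device of each tensor; return names of float tensors with NaN or Inf."""
    bad = []
    for name, t in tensors.items():
        print(f"{name}: shape={tuple(t.shape)} dtype={t.dtype} device={t.device}")
        if t.is_floating_point() and not torch.isfinite(t).all():
            bad.append(name)
    return bad


def check_gradient_flow(model: nn.Module, dummy_input: torch.Tensor) -> list:
    """Forward and backward a dummy input; return trainable parameters that got no gradient."""
    model.zero_grad(set_to_none=True)
    out = model(dummy_input)
    out.sum().backward()  # any scalar works: we only check that gradients reach every parameter
    missing = []
    for name, p in model.named_parameters():
        if not p.requires_grad:
            continue
        if p.grad is None:
            missing.append(name)
        else:
            print(f"{name}: shape={tuple(p.shape)} grad_norm={p.grad.norm().item():.3e}")
    return missing


if __name__ == "__main__":
    torch.manual_seed(0)
    model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 2))
    x, y = torch.randn(4, 8), torch.randint(0, 2, (4,))
    print("non-finite tensors:", find_nonfinite(x=x, y=y))
    print("parameters without gradient:", check_gradient_flow(model, x))
```

An empty list from `check_gradient_flow` means every trainable parameter received a gradient. A name in that list usually points to a layer that is defined but never used in `forward`, or to a branch that was detached.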

3. Check recent changes

git diff HEAD~3 --stat
git log --oneline -5

If it worked before, the bug is in the diff.

4. Quick-check lists

NaN values:

Check learning rate (too high?)
Check loss function inputs (log of zero? division by zero?)
Check data normalization
Insert torch.autograd.set_detect_anomaly(True) temporarily (anomaly detection slows autograd down, so remove it once the NaN source is found)
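A minimal sketch of what anomaly detection reports, using a deliberately buggy loss term. `entropy_term` is a hypothetical example, not from the original. The expression `p * log(p)` is NaN when `p` contains an exact zero, and the backward pass then produces NaN in the log's gradient. Here `torch.autograd.set_detect_anomaly(True)` is used as a context manager, so it is switched off again afterwards. The `clamp_min` fix at the end is one common remedy, assumed here for illustration.

```python
import torch


def entropy_term(p: torch.Tensor) -> torch.Tensor:
    # Deliberate bug: log(0) = -inf, and 0 * -inf = NaN.
    return (p * torch.log(p)).sum()


def backward_with_anomaly_detection(p: torch.Tensor) -> str:
    """Run backward with anomaly mode on; return its error message, or '' if none."""
    # Anomaly mode slows autograd down, so scope it to the debugging run only.
    with torch.autograd.set_detect_anomaly(True):
        loss = entropy_term(p)
        try:
            loss.backward()
        except RuntimeError as err:
            return str(err)  # names the backward function that produced NaN
    return ""


if __name__ == "__main__":
    p = torch.tensor([0.0, 0.5], requires_grad=True)
    print("loss:", entropy_term(p).item())
    print("anomaly report:", backward_with_anomaly_detection(p))

    # One common fix: clamp before the log so it never sees an exact zero.
    q = torch.tensor([0.0, 0.5], requires_grad=True)
    (q * torch.log(q.clamp_min(1e-12))).sum().backward()
    print("finite gradient after fix:", torch.isfinite(q.grad).all().item())
```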

OOM (Out of Memory):

Reduce batch size first
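Check for tensor accumulation in loops (e.g., adding the loss tensor to a running total instead of loss.item(), a missing .detach(), or evaluation not wrapped in with torch.no_grad())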
Profile with torch.cuda.memory_summary()
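A minimal sketch of the accumulation pattern behind many OOMs, together with its fix. `running_loss` and `evaluate` are hypothetical helpers, and the tiny linear model is illustrative only. Adding the loss tensor itself to a running total keeps extending an autograd history across iterations. Calling `.item()` stores a plain Python float instead, and `torch.no_grad()` stops evaluation from recording a graph at all.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def running_loss(model: nn.Module, batches, buggy: bool = False):
    """Sum per-batch losses the way a logging loop typically does."""
    total = 0.0
    for x, y in batches:
        loss = F.mse_loss(model(x), y)
        loss.backward()
        if buggy:
            # Bug: `total` becomes a tensor whose autograd history grows every step.
            total = total + loss
        else:
            # Fix: a Python float keeps no reference to the graph.
            total += loss.item()
    return total


@torch.no_grad()  # no graph is recorded, so forward activations are not kept for backward
def evaluate(model: nn.Module, batches) -> float:
    return sum(F.mse_loss(model(x), y).item() for x, y in batches) / len(batches)


if __name__ == "__main__":
    torch.manual_seed(0)
    model = nn.Linear(4, 1)
    batches = [(torch.randn(8, 4), torch.randn(8, 1)) for _ in range(3)]
    print("buggy total has grad_fn:", running_loss(model, batches, buggy=True).grad_fn)
    print("fixed total:", running_loss(model, batches))
    print("eval loss:", evaluate(model, batches))
    if torch.cuda.is_available():
        print(torch.cuda.memory_summary())  # allocator statistics for the current device
```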

CUDA errors:

Device mismatch: print .device for all tensors involved
Shape mismatch: print .shape at each step
Driver issue: check nvidia-smi and torch.cuda.is_available()
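A minimal sketch of the device and shape checks listed above. `describe` and `model_devices` are hypothetical helper names. The script falls back to the CPU when `torch.cuda.is_available()` is false, so it also runs on machines without a GPU, where no device mismatch can actually occur.

```python
import torch
import torch.nn as nn


def describe(**tensors: torch.Tensor) -> dict:
    """Print and return (shape, dtype, device) per tensor so mismatches stand out."""
    info = {name: (tuple(t.shape), t.dtype, str(t.device)) for name, t in tensors.items()}
    for name, (shape, dtype, device) in info.items():
        print(f"{name}: shape={shape} dtype={dtype} device={device}")
    return info


def model_devices(model: nn.Module) -> set:
    """Devices holding the model's parameters and buffers; more than one is usually a bug."""
    return {str(t.device) for t in [*model.parameters(), *model.buffers()]}


if __name__ == "__main__":
    print("CUDA available:", torch.cuda.is_available())
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    model = nn.Linear(8, 2).to(device)
    x = torch.randn(4, 8)  # created on the CPU: a mismatch whenever device is "cuda"
    describe(x=x, weight=model.weight)
    print("model devices:", model_devices(model))
    x = x.to(device)  # fix: move inputs to the model's device before the forward pass
    print("output shape:", tuple(model(x).shape))
```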
