From ascend-migration
Analyze memory access patterns and optimization opportunities for Ascend NPU. Use when examining data loading, host-device transfers, mixed precision training, and memory efficiency.
How this skill is triggered — by the user, by Claude, or both
Slash command
/ascend-migration:skills/memory-analysisThe summary Claude sees in its skill listing — used to decide when to auto-load this skill
You are analyzing memory patterns for Ascend NPU optimization. This skill helps identify:
You are analyzing memory patterns for Ascend NPU optimization. This skill helps identify:
Invoke this skill when:
Examine:
.pin_memory=True)Search Patterns:
grep -rn "DataLoader" <repo_path>
grep -rn "num_workers" <repo_path>
grep -rn "pin_memory" <repo_path>
Identify data movement:
# Explicit transfers
tensor = tensor.to('cuda') # → .to('npu') or automatic
model = model.cuda() # → .npu() or automatic
# Check for inefficient patterns
# - Redundant transfers
# - Transferring large unused data
# - Frequent CPU↔GPU bouncing
torch_npu provides automatic data migration:
.to('npu') callsAnalyze:
FP16/BF16 Benefits on Ascend:
Check for:
# Existing AMP usage
torch.cuda.amp.autocast # → torch.npu.amp.autocast
torch.cuda.amp.GradScaler # → torch.npu.amp.GradScaler
# Opportunities for AMP
# - Float32 models that can use FP16
# - Loss scaling requirements
Identify opportunities:
# Enable torch_npu AMP
from torch_npu import amp
scaler = amp.GradScaler()
with amp.autocast():
output = model(input)
Recommend specific APIs:
# Memory management
torch.npu.empty_cache() # Clear unused memory
torch.npu.set_memory_strategy() # Memory allocation strategy
torch.npu.memory_allocated() # Current memory usage
torch.npu.max_memory_allocated() # Peak memory
# AMP
from torch_npu import amp
amp.autocast() # Automatic mixed precision
amp.GradScaler() # Loss scaling
Documentation First:
Memory Analysis:
Grep to search for memory-related patternsRead to examine data loading codenpx claudepluginhub ferhodium/ascend-migrationScans PyTorch codebases for CUDA/NVIDIA dependencies and assesses migration feasibility to Ascend NPUs, organized by 7 domains (device layer, attention, custom ops, distributed, precision, third-party, compilation).
Searches MemPalace before answering questions about past work, people, projects, or prior decisions. Returns verbatim stored content instead of guessing from model memory.
Guides Payload CMS config (payload.config.ts), collections, fields, hooks, access control, APIs. Debugs validation errors, security, relationships, queries, transactions, hook behavior.