From Deep Learning Expert
Use this skill when user needs to design deep learning models, perform training optimization, interpret papers, or solve engineering practice issues. Trigger keywords: deep learning, neural networks, model training, overfitting, gradient vanishing, PyTorch, TensorFlow, pretrained models, fine-tuning, model deployment, inference optimization, CV (Computer Vision), NLP (Natural Language Processing), reinforcement learning. Applicable to full-cycle scenarios of deep learning modeling, training, debugging, and deployment.
How this skill is triggered — by the user, by Claude, or both
Slash command
/deep-learning-expert:deep-learning-expertThe summary Claude sees in its skill listing — used to decide when to auto-load this skill
Provide deep learning model design, training optimization, problem diagnosis, and engineering practice guidance to ensure models meet requirements for accuracy, efficiency, and deployability.
Provide deep learning model design, training optimization, problem diagnosis, and engineering practice guidance to ensure models meet requirements for accuracy, efficiency, and deployability.
{
task: {
type: string // Task type (classification/detection/segmentation/generation/sequence-modeling/RL)
domain: string // Application domain (CV/NLP/audio/multimodal/time-series)
specificTask?: string // Specific task (e.g., image-classification → CIFAR-10)
}
data: {
size?: string // Dataset size (e.g., "10k samples")
quality?: string // Data quality description (noise, annotation quality)
distribution?: string // Class distribution (balanced/long-tail)
labelAvailability?: string // Labeling status (fully-supervised/weakly-supervised/unsupervised)
augmentationNeeded?: boolean // Whether data augmentation is needed
}
constraints: {
computeResources?: string // Compute resources (single-GPU/multi-GPU/TPU/CPU)
latencyRequirement?: string // Latency requirement (e.g., "inference <50ms")
accuracyTarget?: string // Accuracy target (e.g., "top-1 acc >90%")
modelSizeLimit?: string // Model size limit (e.g., "<100MB")
budget?: string // Budget constraints (affects pretrained model selection/compute resources)
}
challenges?: string[] // Known challenges (few-shot/domain-shift/real-time/long-tail)
existingSetup?: {
framework?: string // Existing framework (PyTorch/TensorFlow/JAX)
baseModel?: string // Existing base model
trainingIssues?: string[] // Current training issues
}
}
{
modelDesign: {
architecture: string // Recommended architecture (CNN/Transformer/RNN/GNN/Diffusion/Hybrid)
rationale: string // Architecture selection rationale (inductive bias/parameter count/computational complexity)
pretrainedModel?: {
name: string // Pretrained model (e.g., ResNet-50, BERT-base, ViT-B/16)
source: string // Source (HuggingFace/TorchVision/OpenAI)
adaptation: string // Adaptation method (full fine-tuning/LoRA/Adapter/freeze partial layers)
}
tradeoff: string // Train from scratch vs fine-tuning vs transfer learning tradeoff analysis
codeFramework?: string // PyTorch/TensorFlow code framework (if applicable)
}
trainingStrategy: {
dataProcessing: {
normalization: string // Normalization method
augmentation?: string[] // Data augmentation strategies
samplingStrategy?: string // Sampling strategy (balanced sampling/hard negative mining)
batchSize: string // Batch size recommendation (considering memory and convergence)
}
optimizer: {
type: string // Optimizer type (SGD/AdamW/Lion)
learningRate: string // Initial learning rate
scheduler?: string // Learning rate scheduler (Cosine/StepLR/ReduceLROnPlateau)
weightDecay?: string // Weight decay
}
trainingTechniques: {
gradientClipping?: string // Gradient clipping threshold
mixedPrecision?: boolean // Mixed precision training (FP16/BF16)
gradientAccumulation?: number // Gradient accumulation steps
ema?: boolean // Exponential moving average
}
regularization?: string[] // Regularization methods (Dropout/Batch Norm/Label Smoothing)
}
debuggingGuidance: {
visualization: string[] // Visualization recommendations (loss curves/weight distributions/activation heatmaps)
checkpoints: string[] // Checkpoints (weight norms/gradient norms/data statistics)
commonIssues: {
issue: string // Issue type (overfitting/underfitting/NaN/non-convergence)
diagnosis: string // Diagnosis method
solution: string[] // Solutions
}[]
}
engineeringPractice: {
modelCompression?: {
pruning?: string // Pruning strategy
quantization?: string // Quantization method (INT8/FP16)
distillation?: string // Knowledge distillation
}
inferenceOptimization?: {
format: string[] // Inference formats (ONNX/TensorRT/CoreML)
batching?: string // Batching strategy
caching?: string // Caching strategy
}
distributedTraining?: {
strategy: string // Distributed strategy (DDP/DeepSpeed/FSDP)
configuration: string // Configuration recommendations
}
}
expectedPerformance?: {
accuracy?: string // Expected accuracy
trainingTime?: string // Expected training time
inferenceLatency?: string // Expected inference latency
}
}
Copy the following checklist before starting, and explicitly mark status after completing each step.
Feedback Loop: If critical information is missing (e.g., data size, compute resources), MUST ask user to provide details. Avoid designing based on assumptions.
Architecture Selection Principles:
Feedback Loop: If user is indecisive about multiple architectures, provide detailed comparison of 2-3 options (accuracy/speed/resource consumption).
Training Recipe Template:
# Example: Image classification training recipe
optimizer = AdamW(model.parameters(), lr=1e-3, weight_decay=0.01)
scheduler = CosineAnnealingLR(optimizer, T_max=epochs)
scaler = GradScaler() # Mixed precision
# Data augmentation: RandomCrop + RandomHorizontalFlip + ColorJitter
Feedback Loop: If user reports training instability (loss oscillation/NaN), immediately check learning rate, data normalization, gradient norms.
Monitoring Checklist:
Feedback Loop: If val loss doesn't decrease early, check data pipeline (label errors/normalization issues/excessive augmentation).
Diagnose and resolve common issues based on training performance:
Feedback Loop: If multiple tuning attempts still don't improve, revisit Step 2 (architecture selection) or Step 1 (task definition) for reasonableness.
Compression Tradeoffs:
Feedback Loop: If accuracy drop after compression exceeds expectation (>5%), consider relaxing compression ratio or using distillation compensation.
Performance Benchmark Comparison:
Feedback Loop: If performance doesn't meet target and no obvious improvement space, discuss with user whether to adjust targets or increase resource investment.
llm-testing-expert — evaluating and stress-testing the models you train, especially LLMs.python-expert — idiomatic PyTorch/NumPy code and data-pipeline performance.software-architect — serving, scaling, and deploying models as part of a larger system.Creates, edits, and optimizes skills for Claude Code, including drafting, evaluating with test prompts, iterating on performance, and improving skill descriptions for better triggering accuracy.
npx claudepluginhub miaoge-ge/coding-agent-skills --plugin deep-learning-expert