Guides performance optimization decisions: measuring bottlenecks, choosing algorithms/data structures, reducing allocations, parallelizing, and improving cache efficiency. References tools like perf, flamegraph, criterion.
Slash command
/rust-skills:m10-performance
> **Layer 2: Design Choices**
What's the bottleneck, and is optimization worth it?
| Goal | Design Choice | Implementation |
|---|---|---|
| Reduce allocations | Pre-allocate, reuse | with_capacity, object pools |
| Improve cache | Contiguous data | Vec, SmallVec |
| Parallelize | Data parallelism | rayon, threads |
| Avoid copies | Zero-copy | References, Cow<T> |
| Reduce indirection | Inline data | smallvec, arrays |
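The "Avoid copies" row can be illustrated with `std::borrow::Cow`. This is a minimal sketch, assuming a text-cleanup step that usually has nothing to change; `normalize_tabs` is a hypothetical helper invented for the example, not something the skill defines.

```rust
use std::borrow::Cow;

/// Replaces tabs with single spaces, allocating only when a tab is present.
/// `normalize_tabs` is a hypothetical helper used for illustration.
fn normalize_tabs(input: &str) -> Cow<'_, str> {
    if input.contains('\t') {
        // Slow path: a change is needed, so build an owned String.
        Cow::Owned(input.replace('\t', " "))
    } else {
        // Fast path: hand back the caller's data without copying.
        Cow::Borrowed(input)
    }
}

fn main() {
    let clean = normalize_tabs("no tabs here");
    let fixed = normalize_tabs("a\tb");
    assert!(matches!(clean, Cow::Borrowed(_)));
    assert!(matches!(fixed, Cow::Owned(_)));
    println!("{clean} / {fixed}");
}
```

Callers treat both variants as `&str` through deref, so the allocation is paid only on inputs that actually need rewriting.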
Before optimizing:
Have you measured?
What's the priority?
What's the trade-off?
To domain constraints (Layer 3):
"How fast does this need to be?"
↑ Ask: What's the performance SLA?
↑ Check: domain-* (latency requirements)
↑ Check: Business requirements (acceptable response time)
| Question | Trace To | Ask |
|---|---|---|
| Latency requirements | domain-* | What's acceptable response time? |
| Throughput needs | domain-* | How many requests per second? |
| Memory constraints | domain-* | What's the memory budget? |
To implementation (Layer 1):
"Need to reduce allocations"
↓ m01-ownership: Use references, avoid clone
↓ m02-resource: Pre-allocate with_capacity
"Need to parallelize"
↓ m07-concurrency: Choose rayon or threads
↓ m07-concurrency: Consider async for I/O-bound
"Need cache efficiency"
↓ Data layout: Prefer Vec over HashMap when possible
↓ Access patterns: Sequential over random access
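The "Need to parallelize" and "Sequential over random access" traces can be combined in one sketch: split a contiguous slice into chunks and sum each chunk on its own thread. This version assumes plain `std::thread::scope` instead of rayon so that it has no dependencies; `parallel_sum` and `workers` are hypothetical names.

```rust
use std::thread;

/// Sums a slice by splitting it into contiguous chunks, one scoped thread each.
/// `parallel_sum` and `workers` are hypothetical names for this sketch.
fn parallel_sum(data: &[u64], workers: usize) -> u64 {
    let workers = workers.max(1);
    // Ceiling division so every element lands in exactly one chunk;
    // `.max(1)` keeps `chunks` from panicking on empty input.
    let chunk_len = ((data.len() + workers - 1) / workers).max(1);
    thread::scope(|scope| {
        // Spawn every worker before joining any, so the chunks run concurrently.
        let handles: Vec<_> = data
            .chunks(chunk_len)
            .map(|chunk| scope.spawn(move || chunk.iter().sum::<u64>()))
            .collect();
        handles
            .into_iter()
            .map(|handle| handle.join().expect("worker panicked"))
            .sum::<u64>()
    })
}

fn main() {
    let data: Vec<u64> = (1..=1_000).collect();
    println!("sum = {}", parallel_sum(&data, 4));
}
```

Each thread walks its chunk sequentially, which keeps the access pattern cache-friendly. For small inputs the thread start-up cost will likely outweigh the gain, so this is worth measuring before adopting.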
| Tool | Purpose |
|---|---|
| cargo bench | Micro-benchmarks |
| criterion | Statistical benchmarks |
| perf / flamegraph | CPU profiling |
| heaptrack | Allocation tracking |
| valgrind / cachegrind | Cache analysis |
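Before reaching for the tools above, a quick timing check is sometimes enough to decide whether a code path deserves a proper benchmark. The sketch below uses only `std::time::Instant` and `std::hint::black_box`; `time_it` is a hypothetical helper and is not part of criterion or cargo bench. It does no warm-up or outlier handling, so treat its numbers as rough and run it with `--release`.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Runs `f` `iters` times and returns the last result with the mean time per call.
/// `time_it` is a hypothetical helper, not part of criterion or cargo bench.
fn time_it<T>(iters: u32, mut f: impl FnMut() -> T) -> (T, Duration) {
    assert!(iters > 0, "need at least one iteration");
    let start = Instant::now();
    // `black_box` discourages the optimizer from deleting the measured work.
    let mut last = black_box(f());
    for _ in 1..iters {
        last = black_box(f());
    }
    (last, start.elapsed() / iters)
}

fn main() {
    let data: Vec<u64> = (0..100_000).collect();
    let (sum, per_call) = time_it(100, || black_box(&data).iter().sum::<u64>());
    println!("sum = {sum}, mean = {per_call:?} per call");
}
```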
1. Algorithm choice (10x - 1000x)
2. Data structure (2x - 10x)
3. Allocation reduction (2x - 5x)
4. Cache optimization (1.5x - 3x)
5. SIMD/Parallelism (2x - 8x)
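Algorithm choice sits at the top of the list because it changes how the cost grows with input size. As an illustrative sketch (the function names are hypothetical), here is the same duplicate check written as a pairwise O(n^2) scan and as a single O(n) expected-time pass with a `HashSet`:

```rust
use std::collections::HashSet;

/// O(n^2): compares every pair of elements.
fn has_duplicate_quadratic(items: &[u32]) -> bool {
    for i in 0..items.len() {
        for j in (i + 1)..items.len() {
            if items[i] == items[j] {
                return true;
            }
        }
    }
    false
}

/// O(n) expected: one pass, remembering what has been seen.
fn has_duplicate_hashed(items: &[u32]) -> bool {
    let mut seen = HashSet::with_capacity(items.len());
    // `insert` returns false when the value was already present.
    items.iter().any(|item| !seen.insert(*item))
}

fn main() {
    let items = [3, 1, 4, 1, 5];
    println!("{}", has_duplicate_quadratic(&items));
    println!("{}", has_duplicate_hashed(&items));
}
```

For very short inputs the quadratic version may still win because it allocates nothing, which is one reason to measure both on realistic sizes.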
| Technique | When | How |
|---|---|---|
| Pre-allocation | Known size | Vec::with_capacity(n) |
| Avoid cloning | Hot paths | Use references or Cow<T> |
| Batch operations | Many small ops | Collect then process |
| SmallVec | Usually small | smallvec::SmallVec<[T; N]> |
| Inline buffers | Fixed-size data | Arrays over Vec |
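The "Pre-allocation" and "Batch operations" rows can be sketched with the standard library alone. Both functions below are hypothetical examples: the first sizes a `Vec` once up front, the second reuses one scratch buffer across batches instead of allocating a new one per batch.

```rust
/// Builds the squares of 0..n with a single up-front allocation.
fn squares(n: usize) -> Vec<u64> {
    let mut out = Vec::with_capacity(n);
    for i in 0..n as u64 {
        out.push(i * i); // never reallocates: capacity is already >= n
    }
    out
}

/// Doubles each batch into a reused scratch buffer and returns the per-batch sums.
fn doubled_sums(batches: &[&[u32]]) -> Vec<u32> {
    let mut scratch: Vec<u32> = Vec::new();
    let mut sums = Vec::with_capacity(batches.len());
    for batch in batches {
        // `clear` drops the elements but keeps the allocation for the next batch.
        scratch.clear();
        scratch.extend(batch.iter().map(|x| x * 2));
        sums.push(scratch.iter().sum::<u32>());
    }
    sums
}

fn main() {
    let a = [1u32, 2];
    let b = [3u32];
    println!("{:?}", squares(5));
    println!("{:?}", doubled_sums(&[&a[..], &b[..]]));
}
```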
| Mistake | Why Wrong | Better |
|---|---|---|
| Optimize without profiling | Wrong target | Profile first |
| Benchmark in debug mode | Meaningless | Always --release |
| Use LinkedList | Cache unfriendly | Vec or VecDeque |
| Hidden .clone() | Unnecessary allocs | Use references |
| Premature optimization | Wasted effort | Make it work first |
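The "Hidden .clone()" row often comes down to a function signature that demands ownership it does not need. A minimal sketch with hypothetical function names:

```rust
/// Takes ownership, so callers that still need the data must clone it first.
fn total_len_owned(words: Vec<String>) -> usize {
    words.iter().map(|w| w.len()).sum()
}

/// Borrows a slice: no clone, and it accepts a Vec, an array, or a sub-slice.
fn total_len_borrowed(words: &[String]) -> usize {
    words.iter().map(String::len).sum()
}

fn main() {
    let words = vec!["perf".to_string(), "flamegraph".to_string()];
    // Hidden cost: clones the Vec and every String inside it.
    let a = total_len_owned(words.clone());
    // No allocation: the function only reads through the reference.
    let b = total_len_borrowed(&words);
    assert_eq!(a, b);
    println!("{a} {b}");
}
```

Changing the parameter from `Vec<String>` to `&[String]` removes the clone at every call site without changing the result.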
| Anti-Pattern | Why Bad | Better |
|---|---|---|
| Clone to avoid lifetimes | Performance cost | Proper ownership |
| Box everything | Indirection cost | Stack when possible |
| HashMap for small sets | Overhead | Vec with linear search |
| String concat in loop | Repeated reallocation and copying, up to O(n^2) | String::with_capacity plus push_str into one buffer |
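Two rows of this table lend themselves to a short sketch: building a string in one pre-sized buffer, and using a `Vec` of pairs with linear search for a small lookup set. Both helpers are hypothetical, and the point at which a `HashMap` becomes faster depends on key type and size, so it should be measured.

```rust
/// Joins parts with commas into a single pre-sized buffer.
fn join_csv(parts: &[&str]) -> String {
    // Exact size: all bytes plus one comma between neighbors.
    let bytes: usize = parts.iter().map(|p| p.len()).sum();
    let total = bytes + parts.len().saturating_sub(1);
    let mut out = String::with_capacity(total);
    for (i, part) in parts.iter().enumerate() {
        if i > 0 {
            out.push(',');
        }
        out.push_str(part);
    }
    out
}

/// Small lookup table as a slice of pairs, searched linearly.
fn lookup(table: &[(&str, u16)], key: &str) -> Option<u16> {
    table.iter().find(|(k, _)| *k == key).map(|(_, v)| *v)
}

fn main() {
    let ports = [("http", 80), ("https", 443)];
    println!("{}", join_csv(&["a", "b", "c"]));
    println!("{:?}", lookup(&ports, "https"));
}
```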
| When | See |
|---|---|
| Reducing clones | m01-ownership |
| Concurrency options | m07-concurrency |
| Smart pointer choice | m02-resource |
| Domain requirements | domain-* |
npx claudepluginhub actionbook/rust-skills --plugin rust-skills