m10-performance

zhanghandong/rust-skills · updated Apr 8, 2026

$ npx skills add https://github.com/zhanghandong/rust-skills --skill m10-performance
Summary

Systematic approach to identifying and eliminating performance bottlenecks through measurement and targeted optimization.

  • Emphasizes profiling first (flamegraph, perf, criterion) before optimizing; includes decision table mapping goals (reduce allocations, improve cache, parallelize) to specific implementation patterns
  • Prioritizes optimization by impact: algorithm choice (10x–1000x), data structure (2x–10x), allocation reduction (2x–5x), cache optimization (1.5x–3x)
  • Covers common techniques (pre-allocation, avoiding clones, batching), common mistakes (benchmarking in debug mode, hidden .clone() calls), and anti-patterns (string concatenation in a loop) with better alternatives
SKILL.md

Performance Optimization

Layer 2: Design Choices

Core Question

What's the bottleneck, and is optimization worth it?

Before optimizing:

  • Have you measured? (Don't guess)
  • What's the acceptable performance?
  • Will optimization add complexity?

Performance Decision → Implementation

| Goal | Design Choice | Implementation |
|---|---|---|
| Reduce allocations | Pre-allocate, reuse | with_capacity, object pools |
| Improve cache | Contiguous data | Vec, SmallVec |
| Parallelize | Data parallelism | rayon, threads |
| Avoid copies | Zero-copy | References, Cow<T> |
| Reduce indirection | Inline data | smallvec, arrays |
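The zero-copy row above can be sketched with Cow<str>. Here `normalize` is a hypothetical function: it returns the borrowed variant on the common path and allocates an owned String only when the input actually needs modification.

```rust
use std::borrow::Cow;

// Return the input as-is (borrowed, zero-copy) unless it needs fixing.
fn normalize(input: &str) -> Cow<'_, str> {
    if input.contains('\t') {
        Cow::Owned(input.replace('\t', " ")) // slow path: allocate
    } else {
        Cow::Borrowed(input) // fast path: no allocation
    }
}

fn main() {
    assert!(matches!(normalize("already clean"), Cow::Borrowed(_)));
    assert_eq!(normalize("has\ttabs"), "has tabs");
}
```

Callers that only read the result never pay for an allocation; callers that need ownership can call `.into_owned()` once, at the boundary.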

Thinking Prompt

Before optimizing:

  1. Have you measured?

    • Profile first → flamegraph, perf
    • Benchmark → criterion, cargo bench
    • Identify actual hotspots
  2. What's the priority?

    • Algorithm (10x-1000x improvement)
    • Data structure (2x-10x)
    • Allocation (2x-5x)
    • Cache (1.5x-3x)
  3. What's the trade-off?

    • Complexity vs speed
    • Memory vs CPU
    • Latency vs throughput

Trace Up ↑

To domain constraints (Layer 3):

"How fast does this need to be?"
    ↑ Ask: What's the performance SLA?
    ↑ Check: domain-* (latency requirements)
    ↑ Check: Business requirements (acceptable response time)
| Question | Trace To | Ask |
|---|---|---|
| Latency requirements | domain-* | What's acceptable response time? |
| Throughput needs | domain-* | How many requests per second? |
| Memory constraints | domain-* | What's the memory budget? |

Trace Down ↓

To implementation (Layer 1):

"Need to reduce allocations"
    ↓ m01-ownership: Use references, avoid clone
    ↓ m02-resource: Pre-allocate with_capacity

"Need to parallelize"
    ↓ m07-concurrency: Choose rayon or threads
    ↓ m07-concurrency: Consider async for I/O-bound

"Need cache efficiency"
    ↓ Data layout: Prefer Vec over HashMap when possible
    ↓ Access patterns: Sequential over random access

Quick Reference

| Tool | Purpose |
|---|---|
| cargo bench | Micro-benchmarks |
| criterion | Statistical benchmarks |
| perf / flamegraph | CPU profiling |
| heaptrack | Allocation tracking |
| valgrind / cachegrind | Cache analysis |

Optimization Priority

1. Algorithm choice     (10x - 1000x)
2. Data structure       (2x - 10x)
3. Allocation reduction (2x - 5x)
4. Cache optimization   (1.5x - 3x)
5. SIMD/Parallelism     (2x - 8x)
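As a dependency-free sketch of the parallelism item (rayon is the usual choice in practice), std's scoped threads (Rust 1.63+) can split a slice across two workers without Arc or cloning. `parallel_sum` is a hypothetical helper:

```rust
use std::thread;

// Sum a slice on two threads; scoped threads may borrow the slice directly.
fn parallel_sum(data: &[u64]) -> u64 {
    let (left, right) = data.split_at(data.len() / 2);
    thread::scope(|s| {
        let l = s.spawn(|| left.iter().sum::<u64>());
        let r = s.spawn(|| right.iter().sum::<u64>());
        l.join().unwrap() + r.join().unwrap()
    })
}

fn main() {
    let data: Vec<u64> = (0..1000).collect();
    assert_eq!(parallel_sum(&data), 499_500);
}
```

Note that for a workload this small, thread spawn overhead would dominate; measure before parallelizing.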

Common Techniques

| Technique | When | How |
|---|---|---|
| Pre-allocation | Known size | Vec::with_capacity(n) |
| Avoid cloning | Hot paths | Use references or Cow<T> |
| Batch operations | Many small ops | Collect then process |
| SmallVec | Usually small | smallvec::SmallVec<[T; N]> |
| Inline buffers | Fixed-size data | Arrays over Vec |
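The inline-buffer row can be illustrated with a plain stack array (`checksum` is a hypothetical helper); for data that is usually small but occasionally large, SmallVec applies the same idea with a heap fallback:

```rust
// A fixed-size buffer lives on the stack: no heap allocation at all.
fn checksum(buf: &[u8]) -> u32 {
    buf.iter().map(|&b| u32::from(b)).sum()
}

fn main() {
    let mut buf = [0u8; 16]; // inline, stack-allocated
    for (i, b) in buf.iter_mut().enumerate() {
        *b = i as u8;
    }
    assert_eq!(checksum(&buf), 120); // 0 + 1 + ... + 15
}
```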

Common Mistakes

| Mistake | Why Wrong | Better |
|---|---|---|
| Optimize without profiling | Wrong target | Profile first |
| Benchmark in debug mode | Meaningless | Always --release |
| Use LinkedList | Cache unfriendly | Vec or VecDeque |
| Hidden .clone() | Unnecessary allocs | Use references |
| Premature optimization | Wasted effort | Make it work first |
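The hidden-.clone() row can be sketched like this: `greet` is a hypothetical function, and taking &str means callers borrow instead of cloning an owned String at every call site.

```rust
// Borrowing &str instead of taking String means callers never
// need name.clone() just to satisfy the signature.
fn greet(name: &str) -> String {
    format!("hello, {name}")
}

fn main() {
    let name = String::from("world");
    let a = greet(&name); // borrow, no clone
    let b = greet(&name); // borrow again; name is still usable
    assert_eq!(a, b);
    assert_eq!(a, "hello, world");
}
```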

Anti-Patterns

| Anti-Pattern | Why Bad | Better |
|---|---|---|
| Clone to avoid lifetimes | Performance cost | Proper ownership |
| Box everything | Indirection cost | Stack when possible |
| HashMap for small sets | Overhead | Vec with linear search |
| String concat in loop | O(n^2) | String::with_capacity or format! |
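The string-concatenation anti-pattern's fix can be sketched as follows (`join_parts` is a hypothetical helper): pre-sizing with String::with_capacity makes each push_str an amortized O(1) append instead of repeated reallocate-and-copy.

```rust
// O(n) total: size the buffer once, then append in place.
fn join_parts(parts: &[&str]) -> String {
    let total: usize = parts.iter().map(|p| p.len()).sum();
    let mut s = String::with_capacity(total);
    for p in parts {
        s.push_str(p); // no reallocation: capacity was reserved up front
    }
    s
}

fn main() {
    assert_eq!(join_parts(&["a", "b", "c", "d"]), "abcd");
}
```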

Related Skills

| When | See |
|---|---|
| Reducing clones | m01-ownership |
| Concurrency options | m07-concurrency |
| Smart pointer choice | m02-resource |
| Domain requirements | domain-* |


Ratings

4.6 · 43 reviews
  • Liam Bhatia · Dec 28, 2024

    Solid pick for teams standardizing on skills: m10-performance is focused, and the summary matches what you get after install.

  • Nia Diallo · Dec 24, 2024

    I recommend m10-performance for anyone iterating fast on agent tooling; clear intent and a small, reviewable surface area.

  • Charlotte Desai · Dec 24, 2024

    We added m10-performance from the explainx registry; install was straightforward and the SKILL.md answered most questions upfront.

  • Shikha Mishra · Dec 16, 2024

    Registry listing for m10-performance matched our evaluation — installs cleanly and behaves as described in the markdown.

  • Ganesh Mohane · Dec 12, 2024

    m10-performance has been reliable in day-to-day use. Documentation quality is above average for community skills.

  • Omar Martinez · Nov 19, 2024

    m10-performance is among the better-maintained entries we tried; worth keeping pinned for repeat workflows.

  • Charlotte Taylor · Nov 15, 2024

    Keeps context tight: m10-performance is the kind of skill you can hand to a new teammate without a long onboarding doc.

  • Min Perez · Nov 15, 2024

    m10-performance fits our agent workflows well — practical, well scoped, and easy to wire into existing repos.

  • Yash Thakker · Nov 7, 2024

    m10-performance reduced setup friction for our internal harness; good balance of opinion and flexibility.

Showing 1-10 of 43 reviews.