Optimize vector index performance for latency, recall, and memory. Use when tuning HNSW parameters, selecting quantization strategies, or scaling vector search infrastructure.
Slash command: /llm-application-dev:vector-index-tuning
Guide to optimizing vector indexes for production performance.
| Data size | Recommended index |
|---|---|
| < 10K vectors | Flat (exact search) |
| 10K - 1M | HNSW |
| 1M - 100M | HNSW + Quantization |
| > 100M | IVF + PQ or DiskANN |
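The size thresholds above can be encoded as a small lookup, which is handy when index choice is driven by configuration. The following is a minimal sketch: the function name `recommend_index` is hypothetical, and because the table's ranges share their endpoints (1M and 100M), the sketch assumes a boundary value falls into the smaller tier.

```python
def recommend_index(num_vectors: int) -> str:
    """Map a vector count to the index family suggested by the size table.

    Assumption: boundary values (exactly 1M or 100M) go to the smaller tier,
    since the table does not say which side they belong to.
    """
    if num_vectors < 0:
        raise ValueError("num_vectors must be non-negative")
    if num_vectors < 10_000:
        return "Flat (exact search)"
    if num_vectors <= 1_000_000:
        return "HNSW"
    if num_vectors <= 100_000_000:
        return "HNSW + Quantization"
    return "IVF + PQ or DiskANN"


if __name__ == "__main__":
    for n in (5_000, 250_000, 20_000_000, 2_000_000_000):
        print(f"{n:>13,} vectors -> {recommend_index(n)}")
```

Treat the result as a starting point only; recall targets, memory budget, and update rate can all justify a different choice at the same size.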
| Parameter | Default | Effect |
|---|---|---|
| M | 16 | Connections per node, ↑ = better recall, more memory |
| efConstruction | 100 | Build quality, ↑ = better index, slower build |
| efSearch | 50 | Search quality, ↑ = better recall, slower search |
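To make the memory side of the `M` trade-off concrete, the sketch below estimates the graph-link overhead of an HNSW index. It rests on assumptions not stated in the table: neighbor ids are 4 bytes, the base layer stores up to `2 * M` links per node (a common implementation convention), and the sparse upper layers are ignored. `HnswParams` and both helper functions are hypothetical names used for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HnswParams:
    """The three knobs from the table, with the table's defaults."""
    m: int = 16
    ef_construction: int = 100
    ef_search: int = 50


def estimate_link_bytes(num_vectors: int, params: HnswParams, id_bytes: int = 4) -> int:
    """Rough graph overhead, excluding the vectors themselves.

    Assumption: each node keeps up to 2 * M neighbor ids on the base layer;
    upper layers hold few nodes and are left out of the estimate.
    """
    return num_vectors * 2 * params.m * id_bytes


def check_ef_search(params: HnswParams, k: int) -> None:
    """The search candidate list must hold at least k results."""
    if params.ef_search < k:
        raise ValueError(f"efSearch={params.ef_search} is smaller than k={k}")


if __name__ == "__main__":
    n = 1_000_000
    for m in (16, 32, 64):
        mb = estimate_link_bytes(n, HnswParams(m=m)) / 1e6
        print(f"M={m:>2}: ~{mb:.0f} MB of links for {n:,} vectors")
```

Under these assumptions, doubling `M` doubles the link overhead, which is worth weighing against the recall gain before raising it.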
Full Precision (FP32): 4 bytes × dimensions
Half Precision (FP16): 2 bytes × dimensions
INT8 Scalar: 1 byte × dimensions
Product Quantization: ~32-64 bytes total
Binary: dimensions/8 bytes
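The per-vector sizes above translate directly into a storage estimate. The sketch below is illustrative: the function names and scheme labels are hypothetical, the product quantization case assumes one 8-bit code per subvector (so 32 to 64 subvectors give the 32 to 64 bytes listed), and codebook and index overhead are not counted.

```python
def bytes_per_vector(dim: int, scheme: str, pq_subvectors: int = 32) -> int:
    """Storage for one vector under each representation in the list.

    Assumption: PQ uses one byte (8-bit code) per subvector; codebooks
    and graph or index overhead are excluded.
    """
    if scheme == "fp32":
        return 4 * dim
    if scheme == "fp16":
        return 2 * dim
    if scheme == "int8":
        return dim
    if scheme == "pq":
        return pq_subvectors
    if scheme == "binary":
        return (dim + 7) // 8  # one bit per dimension, rounded up to whole bytes
    raise ValueError(f"unknown scheme: {scheme}")


def total_gigabytes(num_vectors: int, dim: int, scheme: str, pq_subvectors: int = 32) -> float:
    """Raw vector storage in GB (10^9 bytes)."""
    return num_vectors * bytes_per_vector(dim, scheme, pq_subvectors) / 1e9


if __name__ == "__main__":
    for scheme in ("fp32", "fp16", "int8", "pq", "binary"):
        gb = total_gigabytes(10_000_000, 768, scheme)
        print(f"{scheme:>6}: {gb:.2f} GB for 10M x 768-dim vectors")
```

For 768-dimensional vectors this gives 3072 bytes at FP32 versus 96 bytes for binary codes, a 32x reduction before any recall loss is considered.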
Full template library and detailed worked examples live in references/details.md. Read that file when you need the concrete templates.
npx claudepluginhub wshobson/agents --plugin llm-application-dev