This skill should be used when the user is new to a codebase and wants to understand its structure, architecture, or main components. Trigger phrases include "help me understand this codebase", "I'm new to this project", "give me an overview", "what are the main components", "how is this structured", or "onboard me". Use for initial exploration, not for specific questions.
This skill should be used when the user wants to create, update, delete, or troubleshoot semantic search indexes. Trigger phrases include "index this codebase", "update the index", "refresh the index", "delete the collection", "check index status", "semantic search isn't working", or "why isn't it finding". Use for index administration, not for searching.
This skill should be used when the user asks to understand code behavior, explore how something works, find where functionality is implemented, or asks questions like "how does X work", "where is Y handled", "what does Z do", "show me the code that", "find the implementation of", or "trace the flow of". Also use for exploring unfamiliar codebases or when grep/glob returns too many irrelevant results. Use BEFORE or ALONGSIDE keyword search.
Admin access level
Server config contains admin-level keywords
A Claude Code plugin for GPU-accelerated semantic code search. Self-hosted embedding, reranking, and vector search — accessible locally or remotely over Tailscale.
Three containerized services work together:
Embedding Service — Jina Code Embeddings 0.5B running on HuggingFace's Text Embeddings Inference (TEI), with custom builds for SM 12.0 (Blackwell) and CUDA 13.1. Includes performance-optimized Flash Attention and link-time optimization (LTO).
Reranking Proxy — Jina Reranker v3 with a smart proxy layer that intercepts Qdrant search requests. Fetches top 100 vector results, then reranks with the cross-encoder before returning the top N. Uses TorchAO int4 quantization, Flash Attention 2, and listwise reranking architecture for throughput.
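The fetch-then-rerank flow described above can be sketched roughly as follows. This is a minimal illustration, not the plugin's actual proxy code: `search_with_rerank`, `vector_search`, and `rerank_scores` are hypothetical names, and the candidate records are assumed to be dicts with a `text` field.

```python
from typing import Callable, Sequence


def search_with_rerank(
    query: str,
    vector_search: Callable[[str, int], Sequence[dict]],
    rerank_scores: Callable[[str, Sequence[str]], Sequence[float]],
    top_n: int = 10,
    candidate_limit: int = 100,
) -> list[dict]:
    """Two-stage retrieval: over-fetch by vector similarity, then rerank.

    vector_search and rerank_scores are hypothetical callables standing in
    for the vector database and the reranker model.
    """
    # Stage 1: fetch more candidates than the caller asked for.
    candidates = list(vector_search(query, max(candidate_limit, top_n)))
    if not candidates:
        return []

    # Stage 2: score every candidate against the query with the reranker.
    scores = rerank_scores(query, [c["text"] for c in candidates])

    # Sort by reranker score only (highest first) and keep the top N.
    ranked = sorted(zip(scores, candidates), key=lambda pair: pair[0], reverse=True)
    return [dict(candidate, rerank_score=score) for score, candidate in ranked[:top_n]]
```

Over-fetching gives the reranker a wider pool to reorder, so a result that the embedding model ranked 60th can still reach the final top N.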
Vector Database — Qdrant for persistent vector storage with incremental indexing support.
- cpuset isolates embedding and reranking to separate core groups to prevent contention
- PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True for efficient CUDA memory allocation
- Unlimited locked memory (ulimits.memlock: -1) and 64MB stack for GPU workloads
- pid: host and ipc: host for shared memory access between GPU processes
- OMP_NUM_THREADS and MKL_NUM_THREADS pinned to match cpuset allocation
- restart: unless-stopped for automatic recovery from CUDA context corruption under WSL2

docker compose -f docker-compose.semantic-search.yaml --profile semantic-search up -d
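The pairing of cpuset groups with pinned thread counts can be sanity-checked with a small helper like the one below. It is only a sketch: the function names are hypothetical, and the core ranges in the usage note are made-up examples rather than values from the project's compose file.

```python
def parse_cpuset(spec: str) -> set[int]:
    """Expand a cpuset string such as "0-3,8" into a set of core indices."""
    cores: set[int] = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            low, high = part.split("-", 1)
            cores.update(range(int(low), int(high) + 1))
        else:
            cores.add(int(part))
    return cores


def thread_env_for(spec: str) -> dict[str, str]:
    """Thread-count variables sized to the number of cores in the cpuset."""
    count = str(len(parse_cpuset(spec)))
    return {"OMP_NUM_THREADS": count, "MKL_NUM_THREADS": count}


def overlapping_cores(first: str, second: str) -> set[int]:
    """Cores claimed by both cpusets; an empty set means the groups are isolated."""
    return parse_cpuset(first) & parse_cpuset(second)
```

For example, with hypothetical groups `"0-7"` for embedding and `"8-15"` for reranking, `overlapping_cores("0-7", "8-15")` is empty and `thread_env_for("0-7")` sets both variables to `"8"`.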
The plugin registers as a Claude Code MCP server, providing tools for semantic search, indexing, and collection management.
Install Tailscale on the GPU host and any remote machines. Then set environment variables on remote machines:
export QDRANT_URL="http://<tailscale-hostname>:6333"
export EMBEDDING_URL="http://<tailscale-hostname>:1335"
The plugin reads these at startup, falling back to localhost when unset.
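The fallback behavior might look something like this sketch. It is written in Python for illustration only and is not the plugin's own code. `resolve_endpoints` is a hypothetical name, the localhost defaults assume the same ports as the remote examples above (6333 and 1335), and treating an empty value as unset is also an assumption.

```python
import os
from typing import Mapping

# Assumed defaults: localhost with the same ports used in the remote examples.
DEFAULT_QDRANT_URL = "http://localhost:6333"
DEFAULT_EMBEDDING_URL = "http://localhost:1335"


def resolve_endpoints(env: Mapping[str, str] = os.environ) -> dict[str, str]:
    """Read service URLs from the environment, falling back to localhost.

    An unset or empty variable is treated as "use the default" (assumption).
    """
    return {
        "qdrant_url": env.get("QDRANT_URL") or DEFAULT_QDRANT_URL,
        "embedding_url": env.get("EMBEDDING_URL") or DEFAULT_EMBEDDING_URL,
    }
```

Passing the environment as a parameter keeps the lookup easy to test: `resolve_endpoints({})` yields the localhost defaults, while a mapping with `QDRANT_URL` set overrides only that entry.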
| Component | VRAM |
|---|---|
| Jina Code Embeddings 0.5B (float16) | ~1 GB |
| Jina Reranker v3 (int4 quantized) | ~6 GB |
| Total | ~7-8 GB |
Batch sizes are configurable via the MAX_CLIENT_BATCH_SIZE and MAX_BATCH_TOKENS environment variables for constrained hardware.
npx claudepluginhub adilasif/local-semantic-search-claude-code-plugin --plugin local-semantic-search
Engineering lifecycle orchestrator for Claude Code: OODA-loop-structured dev cycle with rigor profiles, phase handover contracts, and per-project adapters. Built on superpowers.
Precise local semantic code search via MCP. Indexes your codebase with Go AST parsing, embeds with Ollama or LM Studio, and exposes vector search to Claude through an MCP server — no cloud, no npm.
Beacon — semantic code search for Claude Code
AST-based semantic code search via the ccc CLI. Bundles the ccc skill so coding agents handle init, indexing, and search automatically.
Semantic code search powered by ColBERT. Replaces grep/ripgrep with natural language understanding for smarter code navigation.
Optimized file search, semantic indexing, and persistent memory for Claude Code — with optional sync to a self-hosted web dashboard
A vector-powered CLI for semantic search over files (Vexor skill bundle).