From api-vector-db-chroma
Chroma vector database -- collection management, automatic embedding, metadata filtering, document storage, query patterns
How this skill is triggered — by the user, by Claude, or both
Slash command
/api-vector-db-chroma:api-vector-db-chromaThe summary Claude sees in its skill listing — used to decide when to auto-load this skill
> **Quick Guide:** Use `chromadb` (v3.x) with `@chroma-core/default-embed` for automatic embedding. Chroma auto-embeds documents if no embeddings are provided -- just pass `documents` and `ids` to `collection.add()`. Use `where` for metadata filtering and `whereDocument` for document content filtering (`$contains`, `$regex`). Default distance metric is `l2` (Euclidean); use `cosine` for most em...
Quick Guide: Use
chromadb(v3.x) with@chroma-core/default-embedfor automatic embedding. Chroma auto-embeds documents if no embeddings are provided -- just passdocumentsandidstocollection.add(). Usewherefor metadata filtering andwhereDocumentfor document content filtering ($contains,$regex). Default distance metric isl2(Euclidean); usecosinefor most embedding models viaconfiguration: { hnsw: { space: "cosine" } }. Query results return nested arrays (ids: string[][]) because queries are batched -- always accessresults.ids[0]for a single query. Include only the fields you need via theincludeparameter to reduce payload size.
<critical_requirements>
All code must follow project conventions in CLAUDE.md (kebab-case, named exports, import ordering,
import type, named constants)
(You MUST install @chroma-core/default-embed alongside chromadb -- the default embedding function ships as a separate package since v3)
(You MUST access query results as nested arrays -- results.ids[0], results.documents[0] -- because Chroma batches queries and returns string[][] not string[])
(You MUST use the configuration parameter for HNSW settings -- the legacy metadata: { "hnsw:space": "cosine" } approach is deprecated)
(You MUST use flat metadata values only (string, number, boolean, typed arrays) -- nested objects are not supported and will be rejected)
</critical_requirements>
Additional resources:
Auto-detection: Chroma, chromadb, ChromaClient, CloudClient, createCollection, getOrCreateCollection, collection.add, collection.query, collection.get, collection.upsert, queryTexts, queryEmbeddings, nResults, whereDocument, $contains, @chroma-core/default-embed, @chroma-core/openai, EmbeddingFunction, vector database, semantic search, embedding, RAG retrieval, hnsw:space
When to use:
$contains and $regexKey patterns covered:
where) with comparison, set, array, and logical operatorswhereDocument) with $contains and $regexWhen NOT to use:
Chroma is a lightweight, developer-friendly embedding database designed for rapid prototyping and production RAG applications. The core principle: pass documents in, get relevant results out -- Chroma handles embedding automatically.
Core principles:
where for structured metadata filters and whereDocument for full-text content filters. Both can be combined in a single query.all-MiniLM-L6-v2 via @chroma-core/default-embed) works out of the box for English text. Swap to OpenAI, Cohere, or any provider with a single package change.string[][]), even for single queries. Always access [0] for the first query's results.Always pass the server URL explicitly from an environment variable -- never rely on the implicit http://localhost:8000 default. See examples/core.md for HTTP, Cloud, and token-authenticated client examples.
const chromaUrl = process.env.CHROMA_URL;
if (!chromaUrl) throw new Error("CHROMA_URL environment variable is required");
return new ChromaClient({ path: chromaUrl });
Use the configuration parameter for HNSW settings -- never the deprecated metadata: { "hnsw:space": "cosine" } approach. See examples/core.md.
const collection = await client.createCollection({
name: COLLECTION_NAME,
configuration: { hnsw: { space: "cosine" } },
});
Pass documents and ids -- Chroma embeds automatically. Either documents or embeddings must be provided; metadata alone is insufficient. See examples/core.md.
await collection.add({
ids: articles.map((a) => a.id),
documents: articles.map((a) => a.text),
metadatas: articles.map((a) => ({ category: a.category })),
});
Combine where (metadata) and whereDocument (content) filters. Results are nested arrays -- always access [0] for single-query results. See examples/metadata-filtering.md for all operators.
const results = await collection.query({
queryTexts: ["machine learning fundamentals"],
nResults: N_RESULTS,
where: {
$and: [{ category: { $eq: "tutorial" } }, { year: { $gte: 2023 } }],
},
whereDocument: { $contains: "neural network" },
include: ["documents", "metadatas", "distances"],
});
// results.ids[0] -- nested array, access [0] for first query
Use get() with limit/offset for non-similarity retrieval. Unlike query(), get() returns flat arrays. See examples/core.md.
const results = await collection.get({
where: { category: { $eq: category } },
limit: PAGE_SIZE,
offset,
include: ["documents", "metadatas"],
});
Use upsert() instead of add() for create-or-update semantics -- safer for pipelines that run multiple times. See examples/core.md.
await collection.upsert({
ids: docs.map((d) => d.id),
documents: docs.map((d) => d.text),
metadatas: docs.map((d) => d.metadata),
});
<decision_framework>
Which distance metric should I use?
|-- Using embeddings from a language model? -> cosine (normalized, most common)
|-- Need dot product similarity? -> ip (inner product)
|-- Comparing raw feature vectors? -> l2 (Euclidean, Chroma default)
'-- Unsure? -> cosine (safe default for most embedding models)
Which embedding function should I use?
|-- Quick prototyping, English text? -> @chroma-core/default-embed (all-MiniLM-L6-v2, runs locally)
|-- Need high-quality embeddings? -> @chroma-core/openai (text-embedding-3-small)
|-- Have your own embedding pipeline? -> Pass embeddings directly (skip embedding function)
|-- Need custom model? -> Implement EmbeddingFunction interface
'-- Want all providers? -> npm install @chroma-core/all
How should I filter results?
|-- Structured attributes (category, year, status)? -> where (metadata filter)
|-- Full-text content search? -> whereDocument ($contains, $regex)
|-- Both? -> Combine where + whereDocument in same query
'-- Need exact match on specific IDs? -> get({ ids: [...] })
Which client should I use?
|-- Local development or self-hosted? -> ChromaClient({ path: "http://localhost:8000" })
|-- Chroma Cloud (managed)? -> CloudClient({ apiKey, tenant, database })
|-- Docker deployment? -> ChromaClient with Docker host URL
'-- Testing? -> ChromaClient against local Docker container
</decision_framework>
<red_flags>
High Priority Issues:
results.ids is string[][], not string[]; always use results.ids[0] for single-query results@chroma-core/default-embed package -- since v3, the default embedding function ships separately; npm install chromadb @chroma-core/default-embedmetadata: { "hnsw:space": "cosine" } for HNSW config -- use configuration: { hnsw: { space: "cosine" } } insteadMedium Priority Issues:
include in queries -- default includes vary (query returns documents, metadatas, distances; get returns documents, metadatas); explicitly set include for clarity and to control payload sizel2 (default) when cosine is appropriate -- most embedding models are normalized for cosine similarity; l2 may produce worse resultsadd() without documents or embeddings -- at least one must be provided; metadata alone is insufficientresults.ids[0] may be an empty array; check length before processingCommon Mistakes:
queryEmbeddings AND queryTexts together -- use one or the other, not bothupdate() to create missing records -- update() silently ignores non-existent IDs; use upsert() for create-or-update semanticsdelete() with no arguments -- deletes nothing (not everything); pass ids or where to target specific records$gt/$lt on string metadata -- comparison operators only work on numeric values (int or float)Gotchas & Edge Cases:
space, ef_construction, max_neighbors) cannot be changed after collection creation -- you must delete and recreate the collectioncollection.count() returns total records in the collection, not filtered counts -- there is no filtered count APIpeek() returns the first limit items (default 10) in insertion order, not by relevance -- useful for debugging, not querying$contains in whereDocument is case-sensitive -- searching for "Neural" will not match "neural"$regex in whereDocument uses full regex syntax but can be slow on large collectionsstring[], number[]) must be homogeneous -- mixing types within an array is rejectedCategory and category are different fieldsnResults default is 10 if not specified in query()</red_flags>
<critical_reminders>
All code must follow project conventions in CLAUDE.md (kebab-case, named exports, import ordering,
import type, named constants)
(You MUST install @chroma-core/default-embed alongside chromadb -- the default embedding function ships as a separate package since v3)
(You MUST access query results as nested arrays -- results.ids[0], results.documents[0] -- because Chroma batches queries and returns string[][] not string[])
(You MUST use the configuration parameter for HNSW settings -- the legacy metadata: { "hnsw:space": "cosine" } approach is deprecated)
(You MUST use flat metadata values only (string, number, boolean, typed arrays) -- nested objects are not supported and will be rejected)
Failure to follow these rules will cause embedding failures, incorrect result access, deprecated configuration warnings, and rejected metadata.
</critical_reminders>
Searches MemPalace before answering questions about past work, people, projects, or prior decisions. Returns verbatim stored content instead of guessing from model memory.
Guides Payload CMS config (payload.config.ts), collections, fields, hooks, access control, APIs. Debugs validation errors, security, relationships, queries, transactions, hook behavior.
Implements vector databases with Pinecone, Weaviate, Qdrant, Milvus, pgvector for semantic search, RAG, recommendations, and similarity systems. Optimizes embeddings, indexing, and hybrid search.
npx claudepluginhub agents-inc/skills --plugin api-vector-db-chroma