Pinecone serverless vector database -- index management, vector operations, metadata filtering, namespaces, hybrid search, inference API
> **Quick Guide:** Use `@pinecone-database/pinecone` (v7.x) for serverless vector database operations. Target indexes by host (`pc.index({ host })`), not by name. Use namespaces for multi-tenant isolation (physically separate, cheaper queries). Batch upserts at 200 records (max 1,000 or 2 MB). Metadata is limited to 40 KB per record with flat key-value pairs only (no nested objects). Pinecone is eventually consistent -- vectors may not appear in queries immediately after upsert. Use `describeIndexStats()` to verify indexing progress. For hybrid search, use the `dotproduct` metric with sparse+dense vectors in a single index.
<critical_requirements>
All code must follow project conventions in CLAUDE.md (kebab-case, named exports, import ordering, `import type`, named constants)
(You MUST target indexes by host URL, not by name -- pc.index({ host }) is the v7 API; pc.index('name') is deprecated)
(You MUST batch upserts to max 1,000 records or 2 MB per request -- exceeding either limit causes a 400 error)
(You MUST use flat key-value metadata only -- nested objects, null values, and keys starting with $ are rejected by Pinecone)
(You MUST handle eventual consistency -- vectors are not queryable immediately after upsert; use describeIndexStats() or retry logic for freshness-critical flows)
</critical_requirements>
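The eventual-consistency requirement above can be handled with a small polling loop. The following is a minimal sketch, not an SDK feature: `waitForRecordCount`, `PollOptions`, and the default values are hypothetical names chosen for illustration, and the count source is injected as a callback so the helper does not assume a particular response shape.

```typescript
// Hypothetical helper: poll a record count until it reaches the expected value
// or the attempts run out. Not part of the Pinecone SDK.
interface PollOptions {
  maxAttempts: number;
  delayMs: number;
}

// Illustrative defaults; tune them for your own workload.
const DEFAULT_POLL_OPTIONS: PollOptions = { maxAttempts: 10, delayMs: 500 };

function sleep(ms: number): Promise<void> {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

// getRecordCount is injected by the caller. With the Pinecone client it would
// typically wrap describeIndexStats() and read the count for one namespace.
async function waitForRecordCount(
  getRecordCount: () => Promise<number>,
  expectedCount: number,
  options: PollOptions = DEFAULT_POLL_OPTIONS,
): Promise<boolean> {
  for (let attempt = 1; attempt <= options.maxAttempts; attempt++) {
    const count = await getRecordCount();
    if (count >= expectedCount) {
      return true;
    }
    // Skip the final sleep: there is no further attempt to wait for.
    if (attempt < options.maxAttempts) {
      await sleep(options.delayMs);
    }
  }
  // The caller decides whether to fail or continue with possibly stale data.
  return false;
}
```

A caller might pass a callback that runs `describeIndexStats()` and returns the count for the target namespace; the exact field name on the stats response should be checked against the installed SDK version. Because reported counts are approximate, treat a `true` result as a heuristic rather than a guarantee.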
Additional resources:
Auto-detection: Pinecone, @pinecone-database/pinecone, createIndex, createIndexForModel, upsert, query, topK, includeMetadata, sparseValues, namespace, describeIndexStats, vector database, similarity search, embedding, cosine, dotproduct, euclidean, RAG retrieval, semantic search, pinecone-sparse-english, rerank, searchRecords, upsertRecords, fetchByMetadata
When to use:
Key patterns covered:
When NOT to use:
Pinecone is a managed serverless vector database purpose-built for similarity search at scale. The core principle: store embeddings and metadata, query by vector similarity, filter by metadata.
Core principles:
Eventual consistency: vectors may not be queryable immediately after upsert, so check describeIndexStats() before querying.
Create a Pinecone client from an API key. See examples/core.md for full examples.
// Good Example
import { Pinecone } from "@pinecone-database/pinecone";

function createPineconeClient(): Pinecone {
  const apiKey = process.env.PINECONE_API_KEY;
  if (!apiKey) {
    throw new Error("PINECONE_API_KEY environment variable is required");
  }
  return new Pinecone({ apiKey });
}

export { createPineconeClient };
Why good: API key from environment variable, validation before construction, named export
// Bad Example
import { Pinecone } from "@pinecone-database/pinecone";
const pc = new Pinecone({ apiKey: "sk-abc123..." });
// Hardcoded key leaks in version control
Why bad: Hardcoded API key is a security risk, no validation
Always target an index by its host URL, not its name. See examples/core.md.
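// Good Example -- target by host
// Example value only; must match the output size of your embedding model
const EMBEDDING_DIMENSION = 1536;

const indexModel = await pc.createIndex({
  name: "products",
  dimension: EMBEDDING_DIMENSION,
  metric: "cosine",
  spec: { serverless: { cloud: "aws", region: "us-east-1" } },
});
// Reuse the host returned at creation time instead of resolving it by name
const index = pc.index({ host: indexModel.host });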
Why good: pc.index({ host }) is the v7 API, avoids an extra API call to resolve the name to a host
// Bad Example -- target by name (deprecated)
const index = pc.index("products");
// Triggers an extra describeIndex call to resolve the host URL
Why bad: Targeting by name requires an extra network call and is deprecated in v7
Upsert vectors with flat metadata for filtering. See examples/core.md for typed metadata.
// Good Example
interface DocumentMetadata {
  title: string;
  category: string;
  createdAt: number; // Unix timestamp (numbers only, no Date objects)
}

const NAMESPACE = "articles";

await index.namespace(NAMESPACE).upsert({
  records: [
    {
      id: "doc-1",
      values: embedding, // number[] matching index dimension
      metadata: { title: "Guide", category: "tutorial", createdAt: 1710000000 },
    },
  ],
});
Why good: Typed metadata interface, flat key-value pairs, numeric timestamp (not Date), namespace isolation
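The batching rule (200 records per request as a working size, 1,000 as the hard cap) can be sketched as a chunking helper. This is a minimal illustration under stated assumptions: `chunkRecords`, `upsertInBatches`, and `UpsertTarget` are hypothetical names, and the target is described structurally as anything with the `upsert({ records })` shape shown above. It enforces the record-count limit only; the 2 MB payload limit is not measured here.

```typescript
const UPSERT_BATCH_SIZE = 200; // working batch size from the Quick Guide
const MAX_RECORDS_PER_REQUEST = 1000; // hard per-request record limit

// Split records into consecutive batches of at most batchSize items.
function chunkRecords<T>(
  records: readonly T[],
  batchSize: number = UPSERT_BATCH_SIZE,
): T[][] {
  const isWholeNumber = Math.floor(batchSize) === batchSize;
  if (!isWholeNumber || batchSize < 1 || batchSize > MAX_RECORDS_PER_REQUEST) {
    throw new RangeError(
      `batchSize must be an integer between 1 and ${MAX_RECORDS_PER_REQUEST}`,
    );
  }
  const batches: T[][] = [];
  for (let start = 0; start < records.length; start += batchSize) {
    batches.push(records.slice(start, start + batchSize));
  }
  return batches;
}

// Structural stand-in (assumption) for a namespace handle with upsert({ records }).
interface UpsertTarget<T> {
  upsert(request: { records: T[] }): Promise<unknown>;
}

// Upsert sequentially, one batch per request; returns the number of requests made.
async function upsertInBatches<T>(
  target: UpsertTarget<T>,
  records: readonly T[],
  batchSize: number = UPSERT_BATCH_SIZE,
): Promise<number> {
  const batches = chunkRecords(records, batchSize);
  for (const batch of batches) {
    await target.upsert({ records: batch });
  }
  return batches.length;
}
```

Sequential requests keep the sketch simple and avoid bursts; running a bounded number of batches in parallel is a reasonable variation when throughput matters.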
Query for similar vectors with metadata filtering. See examples/metadata-filtering.md for all operators.
// Good Example
const TOP_K = 10;

const results = await index.namespace(NAMESPACE).query({
  vector: queryEmbedding,
  topK: TOP_K,
  includeMetadata: true,
  filter: {
    $and: [
      { category: { $eq: "tutorial" } },
      { createdAt: { $gte: 1700000000 } },
    ],
  },
});

for (const match of results.matches) {
  console.log(match.id, match.score, match.metadata);
}
Why good: Named constant for topK, structured filter with $and, includes metadata in response
// Bad Example
const results = await index.query({
  vector: queryEmbedding,
  topK: 100,
  includeMetadata: true,
  filter: { tags: ["a", "b"] }, // INVALID: arrays are not valid filter values
});
Why bad: Missing namespace (queries default namespace), array filter syntax is invalid (use $in), no named constant for topK
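One way to avoid the invalid array syntax is to build filters through a small typed helper that always emits `$in` for lists. The sketch below is illustrative only: `buildArticleFilter`, `ArticleFilterInput`, and the type aliases are hypothetical, the field names (`category`, `tags`, `createdAt`) are taken from the examples above, and only the three operators used in this document are modeled.

```typescript
type FilterValue = string | number | boolean;

// Subset of operators used in this document: $eq, $gte, $in.
type FieldCondition =
  | { $eq: FilterValue }
  | { $gte: number }
  | { $in: FilterValue[] };

type FieldFilter = { [field: string]: FieldCondition };
type MetadataFilter = FieldFilter | { $and: FieldFilter[] };

// Hypothetical input shape for the article examples in this document.
interface ArticleFilterInput {
  category?: string;
  tags?: string[];
  createdAfter?: number; // Unix timestamp
}

function buildArticleFilter(input: ArticleFilterInput): MetadataFilter | undefined {
  const conditions: FieldFilter[] = [];
  if (input.category !== undefined) {
    conditions.push({ category: { $eq: input.category } });
  }
  // Lists always go through $in, never a bare array literal.
  if (input.tags !== undefined && input.tags.length > 0) {
    conditions.push({ tags: { $in: input.tags } });
  }
  if (input.createdAfter !== undefined) {
    conditions.push({ createdAt: { $gte: input.createdAfter } });
  }
  if (conditions.length === 0) {
    return undefined; // no filter at all
  }
  if (conditions.length === 1) {
    return conditions[0]; // a single condition needs no $and wrapper
  }
  return { $and: conditions };
}
```

The result can be passed as the `filter` field of a query; returning `undefined` for an empty input lets the caller omit the filter rather than send an empty object.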
Use namespaces for tenant isolation. See examples/namespaces.md.
// Good Example -- physically isolated tenant data
function getTenantIndex(pc: Pinecone, host: string, tenantId: string) {
  return pc.index({ host }).namespace(`tenant-${tenantId}`);
}

// Each tenant's queries scan only their namespace
const tenantIndex = getTenantIndex(pc, INDEX_HOST, "acme-corp");
const results = await tenantIndex.query({ vector: embedding, topK: TOP_K });
Why good: Physical isolation per tenant, queries scan only the target namespace (lower cost and latency)
// Bad Example -- metadata filtering for multi-tenancy
await index.query({
  vector: embedding,
  topK: 10,
  filter: { tenantId: { $eq: "acme-corp" } },
  // Scans the whole shared namespace (every tenant's data), then filters -- expensive at scale
});
Why bad: Metadata filtering scans the full namespace regardless of filter selectivity, so cost scales with total data, not tenant data
Generate embeddings and rerank results. See examples/inference.md.
// Good Example -- embed text
const embedResult = await pc.inference.embed({
  model: "multilingual-e5-large",
  inputs: [{ text: "What is machine learning?" }],
  parameters: { inputType: "query", truncate: "END" },
});
const queryVector = embedResult.data[0].values;
Why good: Specifies inputType (query vs passage), handles truncation for long inputs
// Good Example -- rerank results
const rerankResult = await pc.inference.rerank({
  model: "pinecone-rerank-v0",
  query: "machine learning basics",
  documents: results.matches.map((m) => ({
    id: m.id,
    text: m.metadata?.content as string,
  })),
  topN: 5,
  returnDocuments: true,
});
Why good: Reranks query results for better relevance, limits output with topN
<decision_framework>
Which Pinecone index type should I use?
|-- Serverless? (recommended for most use cases)
|   |-- Variable or unpredictable traffic? -> Serverless (auto-scales, pay-per-use)
|   |-- Starting a new project? -> Serverless (simpler, no capacity planning)
|   '-- Need hybrid sparse-dense search? -> Serverless with dotproduct metric
|
'-- Pod-based? (legacy, specific needs)
    |-- Need guaranteed low latency SLAs? -> Pod-based (dedicated compute)
    '-- Using collections for snapshots? -> Pod-based (collections are pod-only)

Which distance metric should I use?
|-- Using embeddings from a language model? -> cosine (normalized, most common)
|-- Need hybrid search (sparse + dense)? -> dotproduct (REQUIRED for hybrid)
|-- Comparing raw feature vectors? -> euclidean (absolute distance matters)
'-- Unsure? -> cosine (safe default for most embedding models)

How should I isolate tenant data?
|-- Strict data isolation required? -> Namespaces (physical separation)
|-- Need to query across tenants? -> Metadata filtering (logical separation)
|-- Cost-sensitive at scale? -> Namespaces (query cost = tenant size, not total)
|-- Few tenants (< 10)? -> Either approach works
'-- Many tenants (100+)? -> Namespaces (metadata filtering scans everything)

How should I generate embeddings?
|-- Want simplest architecture? -> Integrated inference (createIndexForModel)
|   (Pinecone handles embedding automatically on upsert/query)
|
|-- Need a specific embedding model not hosted by Pinecone? -> External
|   (Generate embeddings yourself, upsert raw vectors)
|
|-- Need hybrid search with sparse vectors? -> External sparse model
|   (Use pinecone-sparse-english-v0 via inference API + your dense model)
|
'-- Need full control over embedding pipeline? -> External
    (Custom preprocessing, chunking, model selection)
</decision_framework>
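For the hybrid branch of the framework, a common way to balance dense and sparse signals under the `dotproduct` metric is to scale both vectors by a weight before querying. The sketch below illustrates that idea and is not taken from this skill: `weightHybridQuery`, `alpha`, and the interface names are hypothetical, and the `sparseVector` field name on the query request is an assumption to verify against the installed SDK version.

```typescript
interface SparseValues {
  indices: number[];
  values: number[];
}

interface HybridQueryVectors {
  vector: number[];
  sparseVector: SparseValues;
}

// alpha = 1 -> purely dense (semantic); alpha = 0 -> purely sparse (keyword).
// Because dot product is linear, scaling the query vectors scales each
// component's contribution to the final score by the same factor.
function weightHybridQuery(
  dense: number[],
  sparse: SparseValues,
  alpha: number,
): HybridQueryVectors {
  if (!(alpha >= 0 && alpha <= 1)) {
    throw new RangeError("alpha must be between 0 and 1");
  }
  if (sparse.indices.length !== sparse.values.length) {
    throw new Error("sparse indices and values must have the same length");
  }
  return {
    vector: dense.map((v) => v * alpha),
    sparseVector: {
      indices: sparse.indices.slice(),
      values: sparse.values.map((v) => v * (1 - alpha)),
    },
  };
}
```

The returned pair can be spread into a query alongside `topK`. Only the query side is weighted here; stored vectors are left untouched, so the weight can be tuned per request.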
<red_flags>
High Priority Issues:
- `pc.index("name")` is deprecated in v7; use `pc.index({ host })` to avoid an extra API call

Medium Priority Issues:
- Omitting `includeMetadata: true` in queries -- metadata is NOT included by default; omitting this returns only IDs and scores
- `Date` objects in metadata -- Pinecone metadata supports strings, numbers, booleans, and string arrays only; convert dates to Unix timestamps
- `createIndex()` readiness -- index creation is async; the index is not ready for operations immediately after `createIndex()` returns

Common Mistakes:
- `topK` > 1,000 with `includeMetadata: true` -- the max `topK` is 1,000 when including metadata or values; without them, max is 10,000
- Array literals as filter values (`{ tags: ["a", "b"] }`) -- use the `$in` operator instead: `{ tags: { $in: ["a", "b"] } }`
- `deleteAll()` without a namespace deletes from the default namespace only, not the entire index

Gotchas & Edge Cases:
- `describeIndexStats()` returns approximate counts -- record counts are not exact in real-time, especially after recent upserts or deletes
- Filter values are type-sensitive -- `$eq: "42"` does not match numeric `42`
- `listPaginated()` returns vector IDs only (no values or metadata) -- use `fetch()` to get full vector data
- `$in` and `$nin` operators accept a maximum of 10,000 values each
- Metadata keys cannot start with `$` (reserved for operators)
- `upsert` is an upsert, not an insert -- upserting with an existing ID overwrites the previous vector and metadata entirely (no partial merge)
- `update()` merges metadata by default -- updating metadata replaces only the fields you specify, not the entire metadata object
- `upsertRecords` (integrated inference) accepts a direct array, not `{ records: [...] }` -- this differs from the regular `upsert` method, which uses `{ records: [...] }`
</red_flags>
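Several of the metadata pitfalls listed above (nested objects, null values, `Date` objects, keys starting with `$`) can be caught before an upsert with a local check. The following is a minimal sketch under stated assumptions: `validateMetadata` is a hypothetical helper, it encodes only the rules stated in this document, and it does not measure the 40 KB per-record size limit.

```typescript
// Returns a list of human-readable problems; an empty list means the metadata
// passes the rules stated in this document (flat values, no null, no Date,
// no keys starting with "$", arrays of strings only).
function validateMetadata(metadata: { [key: string]: unknown }): string[] {
  const problems: string[] = [];
  for (const key of Object.keys(metadata)) {
    const value = metadata[key];
    if (key.charAt(0) === "$") {
      problems.push(`key "${key}" starts with $ (reserved for operators)`);
    }
    if (value === null || value === undefined) {
      problems.push(`"${key}" is null or undefined`);
    } else if (value instanceof Date) {
      problems.push(`"${key}" is a Date; store a Unix timestamp instead`);
    } else if (Array.isArray(value)) {
      const allStrings = value.every((item) => typeof item === "string");
      if (!allStrings) {
        problems.push(`"${key}" must be an array of strings`);
      }
    } else if (typeof value === "object") {
      problems.push(`"${key}" is a nested object; metadata must be flat`);
    } else if (
      typeof value !== "string" &&
      typeof value !== "number" &&
      typeof value !== "boolean"
    ) {
      problems.push(`"${key}" has unsupported type ${typeof value}`);
    }
  }
  return problems;
}
```

Running this check in the ingestion path turns a rejected request into a local, descriptive error; whether to throw or drop the offending fields is left to the caller.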
<critical_reminders>
All code must follow project conventions in CLAUDE.md (kebab-case, named exports, import ordering, `import type`, named constants)
(You MUST target indexes by host URL, not by name -- pc.index({ host }) is the v7 API; pc.index('name') is deprecated)
(You MUST batch upserts to max 1,000 records or 2 MB per request -- exceeding either limit causes a 400 error)
(You MUST use flat key-value metadata only -- nested objects, null values, and keys starting with $ are rejected by Pinecone)
(You MUST handle eventual consistency -- vectors are not queryable immediately after upsert; use describeIndexStats() or retry logic for freshness-critical flows)
Failure to follow these rules will cause index creation failures, rejected upserts, empty query results, and degraded multi-tenant performance.
</critical_reminders>