Use when building vector search, RAG pipelines, hybrid search, time-series pattern matching, or managing tables in KDB.AI. Also use when asked about kdbai_client, similarity search, reranking, KDB.AI filters, or CAGRA GPU indexes.
Slash command: `/kdbai-knowledge:kdbai`
KDB.AI is a vector database for AI applications. Supports similarity search, hybrid search (dense+BM25), time-series similarity (TSS), dynamic time warping (DTW), and reranking.
For the full Python client API, CAGRA GPU details, and REST endpoints, see reference.md.
Filters are `(operator, column, value)` tuples, operator first:

```python
# CORRECT: (operator, column, value)
filter=[("=", "fiscal_year", 2024)]
filter=[("within", "price", [50, 100])]

# WRONG: agents often get this backwards
filter=[("fiscal_year", "=", 2024)]  # WRONG ORDER!
```
`search()` takes `vectors` as a dict keyed by index name:

```python
# CORRECT
results = table.search(vectors={"myIndex": [[1.0, 0.0, 1.0]]}, n=10)

# WRONG: vectors must be a dict
results = table.search(vectors=[[1.0, 0.0, 1.0]], n=10)
```
Schema and indexes are two separate arguments to `create_table()`:

```python
# CORRECT: two separate arguments
schema = [
    {"name": "id", "type": "str"},
    {"name": "text", "type": "str"},
    {"name": "vector", "type": "float32s"},
]
indexes = [
    {"name": "vec_idx", "type": "hnsw", "column": "vector",
     "params": {"dims": 1024, "metric": "CS", "M": 16, "efConstruction": 64}},
]
table = db.create_table("docs", schema=schema, indexes=indexes)

# WRONG: do NOT nest index config inside schema columns
schema = [{"name": "vector", "type": "float32s", "vectorIndex": {...}}]  # WRONG!
```
TSS and DTW are selected with `type=` in `search()`, not by creating an index:

```python
# CORRECT: no index needed; use a SCALAR numeric column (not a list type)
schema = [{"name": "price", "type": "float64"}]  # scalar, not float32s
indexes = []  # NO index for non-transformed TSS/DTW
table = db.create_table("ts", schema=schema, indexes=indexes)
# vectors key = column name (not index name)
results = table.search(vectors={"price": [[0, 1, 2, 3, 4]]}, n=5, type="tss")

# WRONG: there is no TSS or DTW index type
indexes = [{"name": "idx", "type": "tss", ...}]  # WRONG! TSS is not an index
```
Sparse (BM25) vectors are `{term_id: frequency}` dicts, not raw text:

```python
# CORRECT: a sparse vector is a {term_id: frequency} dict
sparse_data = [{0: 2, 5: 1, 12: 3}]  # term IDs to frequencies

# WRONG
sparse_data = ["raw text goes here"]  # NOT raw text!
```
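To see why the column holds term frequencies rather than text, here is a minimal sketch that turns text into `{term_id: frequency}` dicts and scores them with textbook Okapi BM25, using the `k(1.25)` and `b(0.75)` defaults listed for `bm25` in the index table. The vocabulary and whitespace tokenizer (`VOCAB`, `to_sparse`) are hypothetical stand-ins for a real tokenizer, and KDB.AI's exact BM25 variant (for example its IDF formula) is an assumption, not something this page documents.

```python
import math
from collections import Counter

# Hypothetical vocabulary; a real pipeline maps tokens to IDs with its own tokenizer.
VOCAB = {"revenue": 0, "grew": 1, "in": 2, "q3": 3, "profit": 4, "fell": 5}


def to_sparse(text):
    """Map text to a {term_id: frequency} dict, dropping out-of-vocabulary words."""
    ids = [VOCAB[w] for w in text.lower().split() if w in VOCAB]
    return dict(Counter(ids))


def bm25_score(query, doc, doc_freq, n_docs, avg_len, k=1.25, b=0.75):
    """Textbook Okapi BM25 score of a sparse query against one sparse document."""
    doc_len = sum(doc.values())
    score = 0.0
    for term in query:
        tf = doc.get(term, 0)
        if tf == 0:
            continue  # a term absent from this document contributes nothing
        df = doc_freq[term]
        idf = math.log((n_docs - df + 0.5) / (df + 0.5) + 1)
        score += idf * tf * (k + 1) / (tf + k * (1 - b + b * doc_len / avg_len))
    return score


docs = [to_sparse("revenue grew in Q3 revenue"), to_sparse("profit fell in Q3")]
doc_freq = Counter(term for d in docs for term in d)  # number of docs containing each term
avg_len = sum(sum(d.values()) for d in docs) / len(docs)
query = to_sparse("revenue")
scores = [bm25_score(query, d, doc_freq, len(docs), avg_len) for d in docs]
```

Only the first document contains term 0 ("revenue"), so only it gets a non-zero score; the frequency 2 is what BM25 saturates with `k` and length-normalizes with `b`.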
Connect and open a database (use one of the `Session` lines):

```python
import kdbai_client as kdbai

session = kdbai.Session(endpoint="http://localhost:8082")  # Local (qIPC, default)
session = kdbai.Session(endpoint="http://localhost:8081", mode="rest")  # Local (REST)
session = kdbai.Session(api_key="key", endpoint="https://...")  # Cloud
db = session.database("default")
```
Create a table with a dense (HNSW) and a sparse (BM25) index, then manage it:

```python
schema = [
    {"name": "id", "type": "str"},
    {"name": "text", "type": "str"},
    {"name": "vector", "type": "float32s"},
    {"name": "sparse", "type": "general"},  # BM25 sparse vectors
    {"name": "document_date", "type": "datetime64[ns]"},
]
indexes = [
    {"name": "dense_idx", "type": "hnsw", "column": "vector",
     "params": {"dims": 1024, "metric": "CS", "M": 16, "efConstruction": 64}},
    {"name": "sparse_idx", "type": "bm25", "column": "sparse"},
]
table = db.create_table("docs", schema=schema, indexes=indexes)
# or, partitioned by a date column:
table = db.create_table("docs", schema=schema, indexes=indexes,
                        partition_column="document_date")

db.tables                 # List table names
table = db.table("docs")  # Get an existing table
table.drop()              # Delete the table (irreversible)
```
| Type | Required Params | Optional (defaults) | Notes |
|---|---|---|---|
| flat | dims, metric | -- | Exact, 100% recall |
| qFlat | dims, metric | -- | On-disk, supports range search |
| hnsw | dims | M(8), efConstruction(8), metric(L2) | Balanced speed/recall |
| qHnsw | dims | M(8), efConstruction(8), metric(L2), mmapLevel(1) | On-disk |
| ivf | -- | nclusters(8), metric(L2) | Requires table.train() before insert |
| ivfpq | -- | nclusters(8), nbits(8), nsplits(8), metric(L2) | Compressed, requires training |
| bm25 | -- | k(1.25), b(0.75) | Sparse keyword search, column type general |
| cagra | metric | See reference.md | GPU only, do NOT pass dims |
Metrics: L2 (Euclidean, default), CS (Cosine), IP (Inner Product).
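The three metrics can rank the same vectors differently. A minimal pure-Python sketch of the underlying math (not the server's implementation): a vector and a scaled copy of it are identical under CS, apart under L2, and rewarded for length under IP. Whether KDB.AI reports CS as a similarity or as a distance is not stated here, so check the returned score column before sorting on it.

```python
import math


def l2(a, b):
    """Euclidean distance: smaller means more similar."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def ip(a, b):
    """Inner product: larger means more similar; grows with vector length."""
    return sum(x * y for x, y in zip(a, b))


def cs(a, b):
    """Cosine similarity: inner product of the length-normalized vectors."""
    return ip(a, b) / (math.sqrt(ip(a, a)) * math.sqrt(ip(b, b)))


q = [1.0, 0.0, 1.0]
v = [2.0, 0.0, 2.0]  # same direction as q, twice the length
```

This is why CS is the usual choice for embeddings that are not unit-normalized: it ignores length and compares direction only.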
```python
table.insert(df)                                         # Insert a DataFrame
table.update_data(columns={"year": 2025}, filter=[...])  # Update rows
table.train(df)                                          # Train IVF/IVFPQ (before insert)
table.update_indexes(indexes=["idx"], parts=[1, 2])      # Rebuild indexes on partitions
table.delete_data(filter=[("=", "year", 2023)])          # Delete (no-index, flat, or qFlat tables only)
# WARNING: calling delete with no filter deletes ALL data
```
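Because an unfiltered delete wipes the table, a thin guard can help. This is a hypothetical helper (`safe_delete` is not part of `kdbai_client`); it only assumes the `delete_data(filter=...)` call shown above.

```python
def safe_delete(table, filters):
    """Hypothetical guard: call table.delete_data() only with a non-empty filter.

    An empty or missing filter would delete ALL rows, so refuse it outright.
    """
    if not filters:
        raise ValueError("refusing to delete without a filter (would delete ALL data)")
    for clause in filters:
        # Each clause must be a 3-item (operator, column, value) tuple or list.
        if not isinstance(clause, (tuple, list)) or len(clause) != 3:
            raise ValueError(f"bad filter clause {clause!r}; expected (operator, column, value)")
    return table.delete_data(filter=filters)
```

Any object with a `delete_data(filter=...)` method works, so the guard can be unit-tested with a stub instead of a live table.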
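Basic, batch, and range search:

```python
# emb, e1, e2: query embeddings, each a list of floats
results = table.search(vectors={"idx": [emb]}, n=10)       # Basic
results = table.search(vectors={"idx": [e1, e2]}, n=5)     # Batch (several queries)
results = table.search(vectors={"idx": [emb]}, range=0.5)  # Range (qFlat only)
```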
Hybrid search (dense + BM25) sets per-index weights in `index_params`:

```python
results = table.search(
    vectors={"dense_idx": [dense_emb], "sparse_idx": [{1: 2, 3: 1}]},
    n=10,
    index_params={
        "dense_idx": {"weight": 0.6},
        "sparse_idx": {"weight": 0.4, "k": 1.5, "b": 0.8},
    },
)
# Fusion: score = w_sparse / (1 + sparse_rank) + w_dense / (1 + dense_rank)

# WRONG: there is no weights= parameter
# results = table.search(..., weights={"dense": 0.6, "sparse": 0.4})  # WRONG!
```
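The fusion rule in the comment above can be sketched in plain Python. This assumes ranks are 0-based (the top hit contributes its full weight) and that a document missing from one result list gets nothing from that list; both are assumptions about how KDB.AI applies the formula, and `fuse` is a hypothetical helper.

```python
def fuse(dense_ids, sparse_ids, w_dense=0.6, w_sparse=0.4):
    """Hypothetical weighted rank fusion:
    score = w_sparse / (1 + sparse_rank) + w_dense / (1 + dense_rank).

    Ranks are 0-based; a document absent from one list gets 0 from that list.
    """
    scores = {}
    for rank, doc in enumerate(dense_ids):
        scores[doc] = scores.get(doc, 0.0) + w_dense / (1 + rank)
    for rank, doc in enumerate(sparse_ids):
        scores[doc] = scores.get(doc, 0.0) + w_sparse / (1 + rank)
    # Highest fused score first.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


ranked = fuse(["a", "b", "c"], ["b", "a", "d"])
```

Here `a` (0.6 + 0.4/2 = 0.8) beats `b` (0.6/2 + 0.4 = 0.7): the heavier dense weight favors the dense list's top hit, and only rank positions matter, not raw scores.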
Non-transformed TSS needs no index and works on scalar numeric columns (float64, float32, int64, etc.).

```python
query = [1.2, 1.5, 1.8, 2.1, 1.9, 1.6]
# vectors key = column name (there is no index to name)
results = table.search(vectors={"price": [query]}, n=5, type="tss",
                       options={"returnMatches": True, "normalize": True})
# Options: normalize (default True), returnMatches, force, overlap (0-1)

# Grouped search (parallelized per group)
results = table.search(vectors={"price": [query]}, n=3, type="tss",
                       search_by="sym",
                       options={"force": True})  # force: search even if a partition has fewer rows

# Outlier detection: negative n returns the MOST DISSIMILAR matches
results = table.search(vectors={"price": [query]}, n=-3, type="tss")
```
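To build intuition for what `type="tss"` computes, here is a brute-force sketch: slide a window the length of the query across the series, z-normalize both (mirroring the `normalize` option), and rank windows by Euclidean distance, with negative `n` returning the most dissimilar windows. The z-normalized Euclidean distance is an assumption; KDB.AI's actual distance and its `overlap`/`force` handling are not modeled, and `tss` here is a hypothetical helper.

```python
import math


def znorm(xs):
    """Z-normalize a window (mean 0, std 1); a flat window becomes all zeros."""
    m = sum(xs) / len(xs)
    sd = math.sqrt(sum((x - m) ** 2 for x in xs) / len(xs))
    return [0.0] * len(xs) if sd == 0 else [(x - m) / sd for x in xs]


def tss(series, query, n, normalize=True):
    """Brute-force sliding-window search returning (distance, start) pairs.

    Positive n: the n closest windows. Negative n: the |n| most dissimilar.
    """
    w = len(query)
    q = znorm(query) if normalize else list(query)
    hits = []
    for start in range(len(series) - w + 1):
        win = series[start:start + w]
        if normalize:
            win = znorm(win)
        dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(win, q)))
        hits.append((dist, start))
    hits.sort()
    return hits[:n] if n >= 0 else hits[::-1][:-n]


series = [0, 1, 2, 3, 4, 3, 2, 1, 0, 1, 2, 3, 4]
query = [0, 1, 2, 3, 4]
best = tss(series, query, n=2)    # the two rising ramps
worst = tss(series, query, n=-1)  # the falling ramp
```

Normalization is what makes a ramp from 0 to 4 match a ramp from 100 to 104: shape is compared, not level or scale.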
Transformed TSS reduces dimensionality and uses a vector index (HNSW, IVF, or Flat; avoid IVFPQ):

```python
table = db.create_table("ts", schema=schema, indexes=indexes,
                        embedding_configurations={"price": {"dims": 8, "type": "tsc",
                                                            "on_insert_error": "skip_row"}})  # or "reject_all"
# dims: 8 (slow-moving data), 12 (medium), 20+ (fast). The column must contain vectors, not scalars.
```
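The `tsc` transform itself is not described here. As a stand-in, piecewise aggregate approximation (PAA), a common time-series reduction and not necessarily what `tsc` does, shows the idea behind `dims`: each stored window vector is summarized by `dims` numbers, so more dimensions preserve more of a fast-moving shape. `paa` and the sample window are hypothetical.

```python
def paa(window, dims):
    """Piecewise aggregate approximation: the mean of each of `dims`
    near-equal segments of the window."""
    n = len(window)
    if not 0 < dims <= n:
        raise ValueError("dims must be between 1 and len(window)")
    out = []
    for i in range(dims):
        lo, hi = i * n // dims, (i + 1) * n // dims
        segment = window[lo:hi]
        out.append(sum(segment) / len(segment))
    return out


window = list(range(16))  # one stored window vector (hypothetical data)
reduced = paa(window, 8)  # 16 values -> 8 dims
```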
DTW also needs no index and matches patterns that unfold at different speeds:

```python
results = table.search(vectors={"price": [query]}, n=5, type="dtw",
                       options={"RR": 0.1, "cutOff": 5.0, "returnMatches": True})
# RR: warping radius (0-1), cutOff: max distance threshold
```
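A minimal DTW sketch, assuming `RR` sets a Sakoe-Chiba band whose radius is that fraction of the longer series and `cutOff` discards matches whose distance exceeds it; both mappings are assumptions about the option semantics, and `dtw` is a hypothetical helper. Unlike Euclidean distance, DTW can align a pattern with a slowed-down copy of itself.

```python
import math


def dtw(a, b, rr=0.1, cutoff=math.inf):
    """DTW distance with a Sakoe-Chiba band; returns math.inf above `cutoff`.

    Assumption: band radius = ceil(rr * longer length), widened to at least
    the length difference so the final cell stays reachable.
    """
    n, m = len(a), len(b)
    r = max(math.ceil(rr * max(n, m)), abs(n - m))
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        # Only cells within the band around the diagonal are filled.
        for j in range(max(1, i - r), min(m, i + r) + 1):
            d = (a[i - 1] - b[j - 1]) ** 2
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    dist = math.sqrt(cost[n][m])
    return dist if dist <= cutoff else math.inf
```

A wider band (larger `rr`) allows more warping and never increases the distance; `rr=0` with equal lengths reduces to plain Euclidean distance.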
Reranking uses the built-in `search_and_rerank()`; do NOT rerank manually with external libraries.

```python
from kdbai_client.rerankers import CohereReranker

reranker = CohereReranker(api_key="...", model="rerank-english-v3.0",
                          overfetch_factor=2)  # default 2: retrieves 2*n, returns n
results = table.search_and_rerank(
    vectors={"idx": [emb]}, n=10, reranker=reranker,
    queries=["revenue trend?"], text_column="text")
# Providers: CohereReranker, JinaAIReranker, VoyageAIReranker
```
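The `overfetch_factor` behavior described above amounts to: fetch `overfetch_factor * n` candidates, rescore them, keep `n`. This sketch only explains why overfetching helps; in real code use `search_and_rerank()` as this section says. `rerank_sketch`, `fetch`, and `overlap` are hypothetical stand-ins for the vector search and the reranking model.

```python
def rerank_sketch(fetch, score, query, n, overfetch_factor=2):
    """Fetch overfetch_factor * n candidates, rescore them, keep the best n."""
    candidates = fetch(overfetch_factor * n)
    return sorted(candidates, key=lambda doc: score(query, doc), reverse=True)[:n]


# Hypothetical vector-search order and a toy word-overlap "reranker".
BY_VECTOR = ["cats and dogs", "revenue fell", "revenue trend rose", "weather today"]


def fetch(k):
    return BY_VECTOR[:k]


def overlap(query, doc):
    return len(set(query.split()) & set(doc.split()))


with_overfetch = rerank_sketch(fetch, overlap, "revenue trend", n=2)
without = rerank_sketch(fetch, overlap, "revenue trend", n=2, overfetch_factor=1)
```

Without overfetching, the best match ("revenue trend rose", third by vector similarity) never reaches the reranker.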
Plain (non-vector) queries support filters, aggregations, grouping, and sorting:

```python
results = table.query(
    filter=[(">=", "fiscal_year", 2024)],
    aggs={"price": "avg", "volume": "sum"},
    group_by=["sector"], sort_columns=["sector"], limit=100)
```
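As a reading aid, here is what each `query()` argument above does, written as plain Python over hypothetical rows. The output shape (one dict per group, aggregates keyed by the source column) is an assumption; check the exact `aggs` format and result column names against reference.md.

```python
from collections import defaultdict

rows = [  # hypothetical sample rows
    {"sector": "tech", "fiscal_year": 2024, "price": 100.0, "volume": 10},
    {"sector": "tech", "fiscal_year": 2025, "price": 120.0, "volume": 5},
    {"sector": "energy", "fiscal_year": 2024, "price": 50.0, "volume": 7},
    {"sector": "energy", "fiscal_year": 2023, "price": 40.0, "volume": 99},
]


def query_sketch(rows, min_year, limit=100):
    groups = defaultdict(list)
    for r in rows:
        if r["fiscal_year"] >= min_year:   # filter=[(">=", "fiscal_year", 2024)]
            groups[r["sector"]].append(r)  # group_by=["sector"]
    out = []
    for sector in sorted(groups):          # sort_columns=["sector"]
        g = groups[sector]
        out.append({
            "sector": sector,
            "price": sum(r["price"] for r in g) / len(g),  # "avg"
            "volume": sum(r["volume"] for r in g),         # "sum"
        })
    return out[:limit]                     # limit=100


result = query_sketch(rows, 2024)
```

The 2023 energy row is filtered out before aggregation, so its large volume never reaches the sum.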
| Operator | Example | Column types |
|---|---|---|
| `=` | `("=", "year", 2024)` | Numeric, string |
| `<>` | `("<>", "status", "draft")` | Any |
| `>`, `<`, `>=`, `<=` | `(">=", "score", 0.8)` | Numeric |
| `in` | `("in", "quarter", [1, 2, 3])` | String, numeric |
| `like` | `("like", "source", "*report*")` | String |
| `within` | `("within", "price", [50, 100])` | Numeric, datetime |
| `fuzzy` | `("fuzzy", "name", [["Microsft", 2]])` | String, symbol |
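A small evaluator makes the operator semantics concrete and catches the reversed `(column, operator, value)` mistake. This sketch assumes clauses in a list are ANDed and that `within` is inclusive on both ends (as q's `within` is); `like` is approximated with Python's `fnmatchcase` glob matching, and `fuzzy` is not modeled. `matches` is hypothetical, not a client function.

```python
from fnmatch import fnmatchcase

OPS = {
    "=": lambda v, x: v == x,
    "<>": lambda v, x: v != x,
    ">": lambda v, x: v > x,
    "<": lambda v, x: v < x,
    ">=": lambda v, x: v >= x,
    "<=": lambda v, x: v <= x,
    "in": lambda v, x: v in x,
    "like": lambda v, x: fnmatchcase(v, x),    # glob approximation of q's like
    "within": lambda v, x: x[0] <= v <= x[1],  # inclusive range
}


def matches(row, filters):
    """True if a row dict satisfies every (operator, column, value) clause."""
    for op, col, val in filters:
        if op not in OPS:
            raise ValueError(f"unknown operator {op!r}; is the clause in (operator, column, value) order?")
        if not OPS[op](row[col], val):
            return False
    return True


row = {"year": 2024, "source": "annual_report.pdf", "price": 75}
```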
| Error | Fix |
|---|---|
| "Index not found" | vectors key must match exact index name |
| Filter not working | Operator FIRST: ("=", "col", val) not ("col", "=", val) |
| Low HNSW recall | Raise `efSearch`: `index_params={"idx": {"efSearch": 100}}` |
| "missing arguments: dims" | HNSW/Flat need dims. CAGRA rejects it. |
| IVF returns empty | Must table.train(df) before insert |
| Delete fails | Only works on no-index, flat, qFlat tables |
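Several errors in this table can be caught before calling `create_table()`. The following is a hypothetical pre-flight check (`check_index` is not part of the client) built from the required-parameter column of the index-type table, the metric list, and the rule that CAGRA rejects `dims`.

```python
# Required params per index type, taken from the index-type table.
REQUIRED = {
    "flat": {"dims", "metric"},
    "qFlat": {"dims", "metric"},
    "hnsw": {"dims"},
    "qHnsw": {"dims"},
    "ivf": set(),
    "ivfpq": set(),
    "bm25": set(),
    "cagra": {"metric"},
}
METRICS = {"L2", "CS", "IP"}


def check_index(spec):
    """Return a list of problems found in one index dict (empty if none)."""
    kind = spec.get("type")
    params = spec.get("params", {})
    if kind not in REQUIRED:
        return [f"unknown index type {kind!r}"]  # e.g. "tss" is not an index
    problems = []
    missing = REQUIRED[kind] - set(params)
    if missing:
        problems.append(f"{kind}: missing params {sorted(missing)}")
    if kind == "cagra" and "dims" in params:
        problems.append("cagra: do not pass dims")
    if "metric" in params and params["metric"] not in METRICS:
        problems.append(f"unknown metric {params['metric']!r}")
    return problems
```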
MCP tools:

- `mcp__qmcp__connect_to_q`: connect to a KDB.AI/q server
- `mcp__qmcp__query_q`: execute q queries

Related: `/q` for the q language, `/pykx` for PyKX, `/kdbx` for KDB-X AI libraries.
npx claudepluginhub kxsystems/kx-skills --plugin kdbai-knowledge