Add AI and LLM capabilities to Spice — tools, NSQL (text-to-SQL), memory, model routing/workers, and evals. Use this skill whenever the user wants to enable LLM tools (SQL, search, memory, MCP, web search), set up text-to-SQL via /v1/nsql, add persistent conversational memory, configure model routing with workers (load balancing, fallback, weighted distribution), set up evals, or use the OpenAI-compatible chat API. This skill covers AI features and orchestration. For configuring individual model providers (OpenAI, Anthropic, etc.), see spice-models.
Slash command
/skills:spice-ai
Spice integrates AI as a first-class runtime capability. Connect to hosted LLM providers or serve models locally, with an OpenAI-compatible API, tool use, text-to-SQL, and model routing — all configured in YAML.
models:
  - from: <provider>:<model_id>
    name: <model_name>
    params:
      <provider>_api_key: ${ secrets:API_KEY }
      tools: auto # optional: enable runtime tools
      system_prompt: | # optional: default system prompt
        You are a helpful assistant.
| Provider | From Format | Status |
|---|---|---|
| OpenAI (or compatible) | openai:gpt-4o | Stable |
| Anthropic | anthropic:claude-sonnet-4-5 | Alpha |
| Azure OpenAI | azure:my-deployment | Alpha |
| Google AI | google:gemini-pro | Alpha |
| xAI | xai:grok-beta | Alpha |
| Perplexity | perplexity:sonar-pro | Alpha |
| Amazon Bedrock | bedrock:anthropic.claude-3 | Alpha |
| Databricks | databricks:llama-3-70b | Alpha |
| Spice.ai | spiceai:llama3 | Release Candidate |
| HuggingFace | hf:meta-llama/Llama-3-8B-Instruct | Release Candidate |
| Local file | file:./models/llama.gguf | Release Candidate |
Existing applications built on OpenAI SDKs can point their base URL at Spice without other code changes:
curl http://localhost:8090/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt4",
    "messages": [{"role": "user", "content": "Hello"}]
  }'
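For applications that do not use an OpenAI SDK, the same request can be built with Python's standard library. This is a minimal sketch, assuming the runtime listens on `http://localhost:8090` as in the curl call and that a model named `gpt4` is defined in the Spicepod. The helper names `build_chat_request` and `send` are illustrative and not part of Spice.

```python
import json
import urllib.request


def build_chat_request(prompt, model="gpt4", base_url="http://localhost:8090"):
    """Build a POST request for the OpenAI-compatible chat completions endpoint."""
    payload = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request, timeout=30):
    """Send the request and decode the JSON response (needs a running runtime)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Usage, assuming the response follows the OpenAI chat completions shape:
#   reply = send(build_chat_request("Hello"))
#   print(reply["choices"][0]["message"]["content"])
```

Keeping request construction separate from sending makes the payload easy to inspect or log before any network call is made.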
spice chat
chat> How many orders were placed last month?
The /v1/nsql endpoint converts natural language to SQL and executes it. Spice uses tools like table_schema, random_sample, and sample_distinct_columns to help models write accurate SQL:
curl -XPOST "http://localhost:8090/v1/nsql" \
  -H "Content-Type: application/json" \
  -d '{"query": "What was the highest tip any passenger gave?"}'
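The NSQL call can be issued from Python in the same way. The sketch below assumes the same local endpoint as the curl call and sends only the `query` field shown there. The helper name `build_nsql_request` is hypothetical, and the response is returned as raw text because its exact format is not described here.

```python
import json
import urllib.request


def build_nsql_request(question, base_url="http://localhost:8090"):
    """Build a POST request that asks /v1/nsql to answer a natural-language question."""
    return urllib.request.Request(
        f"{base_url}/v1/nsql",
        data=json.dumps({"query": question}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(question, timeout=60):
    """Send the question and return the raw response body (needs a running runtime)."""
    with urllib.request.urlopen(build_nsql_request(question), timeout=timeout) as resp:
        return resp.read().decode("utf-8")


# Usage:
#   print(ask("What was the highest tip any passenger gave?"))
```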
Tools extend LLM capabilities with runtime functions:
| Tool | Description | Group |
|---|---|---|
| list_datasets | List available datasets | auto |
| sql | Execute SQL queries | auto |
| table_schema | Get table schema | auto |
| search | Vector similarity search | auto |
| sample_distinct_columns | Sample distinct column values | auto |
| random_sample | Random row sampling | auto |
| top_n_sample | Top N rows by ordering | auto |
| memory:load | Load stored memories | memory |
| memory:store | Store new memories | memory |
| websearch | Search the web | - |
# Enable built-in tools for a model.
models:
  - from: openai:gpt-4o
    name: analyst
    params:
      openai_api_key: ${ secrets:OPENAI_API_KEY }
      tools: auto # all default tools
      # tools: sql, search # or specific tools only
# Persistent memory: a read_write memory:store dataset plus the memory tool group.
datasets:
  - from: memory:store
    name: llm_memory
    access: read_write
models:
  - from: openai:gpt-4o
    name: assistant
    params:
      tools: auto, memory
# Web search: define a websearch tool, then list it in the model's tools.
tools:
  - name: web
    from: websearch
    description: 'Search the web for information.'
    params:
      engine: perplexity
      perplexity_auth_token: ${ secrets:PERPLEXITY_TOKEN }
models:
  - from: openai:gpt-4o
    name: researcher
    params:
      tools: auto, web
# MCP: expose tools from an external MCP server.
tools:
  - name: external_tools
    from: mcp
    params:
      mcp_endpoint: http://localhost:3000/mcp
# Cap tool-call recursion depth for a model.
models:
  - from: openai:gpt-4o
    name: my_model
    params:
      tool_recursion_limit: 3 # default: 10
Workers coordinate traffic across multiple models for load balancing, fallback, and weighted routing. Workers are called with the same API as models.
workers:
  - name: balanced
    type: load_balance
    description: Distribute requests evenly.
    load_balance:
      routing:
        - from: model_a
        - from: model_b
workers:
  - name: fallback
    type: load_balance
    description: Try GPT-4o first, fall back to Claude.
    load_balance:
      routing:
        - from: gpt4
          order: 1
        - from: claude
          order: 2
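The idea behind ordered fallback can be illustrated in a few lines of Python. This is a conceptual sketch only, not Spice's routing implementation: routes are tried by ascending `order`, and the first one that succeeds wins. The functions `gpt4` and `claude` are hypothetical stand-ins for model calls, and `call_with_fallback` is an illustrative name.

```python
def call_with_fallback(routes, prompt):
    """Try each (order, call) route by ascending order; return the first success."""
    errors = []
    for order, call in sorted(routes, key=lambda route: route[0]):
        try:
            return call(prompt)
        except Exception as exc:  # a real client would catch narrower errors
            errors.append((order, exc))
    raise RuntimeError(f"all routes failed: {errors}")


def gpt4(prompt):
    """Hypothetical primary model that is currently unavailable."""
    raise TimeoutError("primary model unavailable")


def claude(prompt):
    """Hypothetical secondary model that answers normally."""
    return f"claude answered: {prompt}"


# Routes may be listed in any order; the order number decides who is tried first.
print(call_with_fallback([(2, claude), (1, gpt4)], "Hello"))  # claude answered: Hello
```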
workers:
  - name: weighted
    type: load_balance
    description: Route 80% to fast model.
    load_balance:
      routing:
        - from: fast_model
          weight: 4 # 80%
        - from: slow_model
          weight: 1 # 20%
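The percentages in the comments follow from dividing each weight by the sum of all weights: 4 / (4 + 1) = 80% and 1 / (4 + 1) = 20%. The Python sketch below shows that arithmetic and a simple weighted random pick. It illustrates the math only and makes no claim about how Spice selects a route internally; `traffic_shares` and `pick_model` are illustrative names.

```python
import random
from collections import Counter


def traffic_shares(weights):
    """Convert routing weights into the fraction of traffic each model receives."""
    total = sum(weights.values())
    return {name: weight / total for name, weight in weights.items()}


def pick_model(weights, rng=random):
    """Pick one model at random, proportionally to its weight."""
    names = list(weights)
    return rng.choices(names, weights=[weights[n] for n in names], k=1)[0]


weights = {"fast_model": 4, "slow_model": 1}
print(traffic_shares(weights))  # {'fast_model': 0.8, 'slow_model': 0.2}

# Simulate 10,000 requests; the counts should land near an 80/20 split.
rng = random.Random(0)
print(Counter(pick_model(weights, rng) for _ in range(10_000)))
```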
# OpenAI model with the default tools enabled.
models:
  - from: openai:gpt-4o
    name: gpt4
    params:
      openai_api_key: ${ secrets:OPENAI_API_KEY }
      tools: auto
# OpenAI-compatible provider (Groq) through a custom endpoint.
models:
  - from: openai:llama3-groq-70b-8192-tool-use-preview
    name: groq-llama
    params:
      endpoint: https://api.groq.com/openai/v1
      openai_api_key: ${ secrets:GROQ_API_KEY }
# Custom system prompt with OpenAI parameter overrides.
models:
  - from: openai:gpt-4o
    name: pirate_haikus
    params:
      system_prompt: |
        Write everything in Haiku like a pirate.
      openai_temperature: 0.1
      openai_response_format: "{ 'type': 'json_object' }"
# Local GGUF model file.
models:
  - from: file:./models/llama-3.gguf
    name: local_llama
Evaluate model performance:
evals:
  - name: accuracy_test
    description: Verify model understands the data.
    dataset: test_data
    scorers:
      - Match
npx claudepluginhub spiceai/skills --plugin skills