Runs Together AI inference for chat completions, streaming, images, and embeddings using Python or Node.js OpenAI-compatible clients. Use it to test open-source LLMs such as Llama.
Slash command: `/together-pack:together-hello-world`
Run chat completions with open-source models via Together AI's OpenAI-compatible API. Supports Llama, Mixtral, Qwen, and 100+ models. Key endpoints: `/v1/chat/completions`, `/v1/completions`, `/v1/embeddings`, `/v1/images/generations`.
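Because the API is OpenAI-compatible, a chat request is an ordinary JSON POST. The following is a minimal sketch using only the Python standard library; it assumes the base URL that appears in the Node.js example and the usual Bearer-token header. `build_chat_request` is a hypothetical helper for illustration, not part of any SDK, and it only builds the request without sending it.

```python
import json
import os
import urllib.request

# Assumption: same base URL as the Node.js example in this document
BASE_URL = "https://api.together.xyz/v1"


def build_chat_request(prompt, model="meta-llama/Llama-3.3-70B-Instruct-Turbo", api_key=None):
    """Build (but do not send) a POST request for the chat completions endpoint."""
    api_key = api_key or os.environ.get("TOGETHER_API_KEY", "")
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 100,
    }
    return urllib.request.Request(
        url=f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    req = build_chat_request("Hello!")
    print(req.get_method(), req.full_url)
    # To actually send it (needs a valid key and network access):
    # with urllib.request.urlopen(req) as resp:
    #     print(json.load(resp)["choices"][0]["message"]["content"])
```

In practice the SDK clients shown in this skill handle these details; the sketch is only meant to make the wire format visible.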
# Chat completion with the Together Python SDK (pip install together)
from together import Together

# Together() reads the API key from the TOGETHER_API_KEY environment variable
client = Together()

response = client.chat.completions.create(
    model="meta-llama/Llama-3.3-70B-Instruct-Turbo",
    messages=[
        {"role": "system", "content": "You are a helpful coding assistant."},
        {"role": "user", "content": "Write a Python function to calculate fibonacci numbers"},
    ],
    max_tokens=500,
    temperature=0.7,
    top_p=0.9,
)
print(response.choices[0].message.content)
print(f"Tokens: {response.usage.prompt_tokens} in, {response.usage.completion_tokens} out")
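
# Streaming: print tokens as they arrive (reuses `client` from above)
stream = client.chat.completions.create(
    model="meta-llama/Llama-3.3-70B-Instruct-Turbo",
    messages=[{"role": "user", "content": "Explain quantum computing"}],
    stream=True,
    max_tokens=200,
)
for chunk in stream:
    # Some chunks may carry no choices or no text, so check before printing
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)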
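If the full text is needed after streaming (for logging or post-processing), the deltas can be accumulated instead of only printed. This is a minimal sketch that assumes chunks expose `choices[0].delta.content` as in the loop above; `collect_stream` and `fake_chunk` are hypothetical helpers, and the fake chunks exist only so the example runs offline.

```python
from types import SimpleNamespace


def collect_stream(stream):
    """Concatenate the text deltas from an iterable of chat-completion chunks."""
    parts = []
    for chunk in stream:
        if not chunk.choices:  # defensively skip chunks without choices
            continue
        delta = chunk.choices[0].delta.content
        if delta:  # some chunks carry no text
            parts.append(delta)
    return "".join(parts)


def fake_chunk(text):
    """Hypothetical stand-in for an SDK chunk, used only for this offline demo."""
    return SimpleNamespace(choices=[SimpleNamespace(delta=SimpleNamespace(content=text))])


demo = [fake_chunk("Quantum "), fake_chunk(None), fake_chunk("computing")]
print(collect_stream(demo))  # prints: Quantum computing
```

With a real stream, `collect_stream(stream)` would replace the `for` loop; the trade-off is that nothing is shown until the stream ends unless the function also prints each delta.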

# Image generation (reuses `client` from above)
response = client.images.generate(
    model="black-forest-labs/FLUX.1-schnell-Free",
    prompt="A sunset over mountains, digital art style",
    width=1024,
    height=768,
    n=1,
)
print(f"Image URL: {response.data[0].url}")

# Embeddings: one vector is returned per input string
response = client.embeddings.create(
    model="togethercomputer/m2-bert-80M-8k-retrieval",
    input=["Hello world", "Together AI is great"],
)
print(f"Embedding dim: {len(response.data[0].embedding)}")
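A common next step with embeddings is to compare two vectors by cosine similarity. The sketch below is plain Python and makes no API calls; `cosine_similarity` is a hypothetical helper, and the short vectors are made-up stand-ins for `response.data[i].embedding`.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_a * norm_b)


# Made-up vectors standing in for real embeddings
print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))  # identical direction
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # orthogonal
```

With real output, the call would look like `cosine_similarity(response.data[0].embedding, response.data[1].embedding)`.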

// Node.js: the OpenAI SDK pointed at Together's OpenAI-compatible base URL
// (top-level await requires an ES module)
import OpenAI from 'openai';

const together = new OpenAI({
  apiKey: process.env.TOGETHER_API_KEY,
  baseURL: 'https://api.together.xyz/v1',
});

const chat = await together.chat.completions.create({
  model: 'meta-llama/Llama-3.3-70B-Instruct-Turbo',
  messages: [{ role: 'user', content: 'Hello!' }],
});
console.log(chat.choices[0].message.content);

Example output from the Python chat completion (model text will vary):

def fibonacci(n):
    if n <= 1:
        return n
    return fibonacci(n-1) + fibonacci(n-2)

Tokens: 28 in, 45 out

| Error | Cause | Solution |
|---|---|---|
| Model not found | Wrong model ID | Check docs.together.ai/docs/inference-models |
| Empty response | max_tokens too low | Increase max_tokens |
| 429 rate limit | Too many requests | Implement backoff |
| Slow response | Large model | Try Turbo variant or smaller model |
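
The "Implement backoff" advice for 429 errors can be sketched as a small retry wrapper with exponential delays and jitter. This is one possible approach, not the SDK's own mechanism; `with_backoff`, `FakeRateLimit`, and `flaky` are hypothetical names, and the fake exception stands in for whatever rate-limit error the client library raises.

```python
import random
import time


def with_backoff(call, is_retryable, max_retries=5, base_delay=1.0,
                 max_delay=30.0, sleep=time.sleep):
    """Run call(); on a retryable error, wait exponentially longer and retry."""
    for attempt in range(max_retries + 1):
        try:
            return call()
        except Exception as exc:
            # Give up on the last attempt or on errors that are not retryable
            if attempt == max_retries or not is_retryable(exc):
                raise
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Add up to 10% jitter so parallel clients do not retry in lockstep
            sleep(delay + random.uniform(0, delay * 0.1))


class FakeRateLimit(Exception):
    """Hypothetical stand-in for the client's 429 error."""


calls = {"n": 0}


def flaky():
    """Fail twice with a fake 429, then succeed."""
    calls["n"] += 1
    if calls["n"] < 3:
        raise FakeRateLimit("429 Too Many Requests")
    return "ok"


delays = []  # record delays instead of sleeping so the demo runs instantly
result = with_backoff(flaky, lambda e: isinstance(e, FakeRateLimit), sleep=delays.append)
print(result, len(delays))  # prints: ok 2
```

In real use, `call` would wrap the `client.chat.completions.create(...)` invocation and `is_retryable` would check for the library's rate-limit exception; the exact exception class depends on the client library and version, so verify it before relying on this.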
Proceed to together-local-dev-loop for development workflow.
npx claudepluginhub jeremylongshore/claude-code-plugins-plus-skills --plugin together-pack