From togetherai-skills
Deploys and manages single-tenant GPU endpoints on Together AI for fine-tuned or uploaded models with autoscaling, hardware sizing, and lifecycle control. Use for predictable always-on production inference hosting.
Slash command
/togetherai-skills:together-dedicated-endpoints
Use dedicated endpoints for managed single-tenant model hosting with predictable performance and no shared serverless pool.
Typical fits:
Related skills for other cases:
together-chat-completions for serverless chat inference
together-dedicated-containers for custom runtimes or nonstandard inference pipelines
together-gpu-clusters for raw infrastructure or cluster orchestration
SDK requirement: together>=2.0.0. If the user is on an older version, they must upgrade first: uv pip install --upgrade "together>=2.0.0".
Install: npx claudepluginhub zainhas/skills
Manages single-tenant GPU endpoints on Together AI with autoscaling and no rate limits. Deploys fine-tuned or uploaded models, sizes hardware, and handles endpoint lifecycle.
Generates a Jupyter notebook to deploy LoRA fine-tuned models from SageMaker Serverless Model Customization to SageMaker endpoints or Bedrock, handling Nova vs OSS pathways.
Guides Together AI API integration for inference, fine-tuning, and model deployment using OpenAI-compatible clients and Python SDK.
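The SDK requirement stated for this skill (together>=2.0.0) can be checked before any endpoint work begins. The following is a minimal sketch that uses only the Python standard library. The helper names (parse_version, sdk_is_recent_enough, check_together_sdk) are hypothetical and are not part of the Together SDK or of this skill. The version parsing is deliberately simplified and ignores pre-release suffixes such as "rc1".

```python
from importlib.metadata import PackageNotFoundError, version

# Minimum version taken from the skill's stated requirement (together>=2.0.0).
MIN_VERSION = (2, 0, 0)


def parse_version(text):
    """Turn a string such as '2.1.0' into a tuple of three ints.

    Simplified on purpose: only the leading digits of each of the first
    three dot-separated pieces are used, so '2.0.0rc1' becomes (2, 0, 0).
    """
    parts = []
    for piece in text.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits) if digits else 0)
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def sdk_is_recent_enough(installed, minimum=MIN_VERSION):
    """Return True when the installed version meets the minimum."""
    return parse_version(installed) >= minimum


def check_together_sdk():
    """Return a short status message about the installed 'together' package."""
    try:
        installed = version("together")
    except PackageNotFoundError:
        return 'together is not installed; run: uv pip install "together>=2.0.0"'
    if sdk_is_recent_enough(installed):
        return f"together {installed} meets the requirement"
    return (
        f"together {installed} is too old; "
        'run: uv pip install --upgrade "together>=2.0.0"'
    )


if __name__ == "__main__":
    print(check_together_sdk())
```

Running the script prints one status line. If it reports an older version, the upgrade command from the skill description applies.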