From togetherai-skills
Manages single-tenant GPU endpoints on Together AI with autoscaling and no rate limits. Deploys fine-tuned or uploaded models, sizes hardware, and handles endpoint lifecycle.
Slash command
/togetherai-skills:together-dedicated-endpoints
Use dedicated endpoints for managed single-tenant model hosting with predictable performance and no shared serverless pool.
Typical fits:
Related skills for adjacent needs:
together-chat-completions for serverless chat inference
together-dedicated-containers for custom runtimes or nonstandard inference pipelines
together-gpu-clusters for raw infrastructure or cluster orchestration
Requires the Together Python SDK (together>=2.0.0). If the user is on an older version, they must upgrade first: uv pip install --upgrade "together>=2.0.0".
Install the plugin with: npx claudepluginhub togethercomputer/skills --plugin togetherai-skills
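The together>=2.0.0 requirement can be checked before any endpoint work starts. The following is a minimal sketch that uses only the Python standard library. The helper names (parse_version, installed_version, needs_upgrade) are hypothetical and not part of the Together SDK, and the version parsing is deliberately simplified: it looks only at the leading numeric release segments, so a pre-release such as 2.0.0rc1 is treated as 2.0.0.

```python
import re
from importlib import metadata

# Minimum version stated in the skill text.
MIN_VERSION = "2.0.0"


def parse_version(text):
    """Return the leading numeric release segments, padded to three parts.

    Simplification (assumption): suffixes such as 'rc1' or '.post1' are ignored.
    """
    match = re.match(r"\d+(?:\.\d+)*", text.strip())
    if match is None:
        raise ValueError(f"unrecognised version string: {text!r}")
    parts = tuple(int(p) for p in match.group().split("."))
    # Pad short versions like '2.0' so tuple comparison behaves as expected.
    return parts + (0,) * (3 - len(parts))


def installed_version(package="together"):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def needs_upgrade(installed, minimum=MIN_VERSION):
    """True when the package is missing or older than the minimum."""
    if installed is None:
        return True
    return parse_version(installed) < parse_version(minimum)


if __name__ == "__main__":
    current = installed_version()
    if needs_upgrade(current):
        print(f'together {current or "not installed"}: run '
              f'uv pip install --upgrade "together>={MIN_VERSION}"')
    else:
        print(f"together {current} satisfies >={MIN_VERSION}")
```

Running the script prints either the upgrade command quoted above or a confirmation that the installed version is sufficient. For strict PEP 440 ordering, a dedicated version parser would be a more robust choice than this regex.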