© 2025 ClaudePluginHub

Community Maintained · Not affiliated with Anthropic

together-dedicated-endpoints | togetherai-skills


Skill

together-dedicated-endpoints

From togetherai-skills

Manages single-tenant GPU endpoints on Together AI with autoscaling and no rate limits. Deploys fine-tuned or uploaded models, sizes hardware, and handles endpoint lifecycle.

Popularity

Stars

32

Forks

4

Invocation

How this skill is triggered — by the user, by Claude, or both

Slash command

/togetherai-skills:together-dedicated-endpoints

User invocable

Model invocable

Inline context

Default effort

Context Preview

The summary Claude sees in its skill listing — used to decide when to auto-load this skill

Use dedicated endpoints for managed single-tenant model hosting with predictable performance and

Supporting Files

agents/openai.yaml
references/api-reference.md
references/dedicated-models.md
references/hardware-options.md
scripts/deploy_finetuned.py
scripts/manage_endpoint.py
scripts/manage_endpoint.ts
scripts/upload_custom_model.py

SKILL.md

81 lines · ~1k tokens

Stats

Language: Python

Stars: 32

Forks: 4

Maintenance: Excellent

Last Commit: Jun 10, 2026


Tags

model-deployment

dedicated-hosting


Together Dedicated Endpoints

Overview

Use dedicated endpoints for managed, single-tenant model hosting with predictable performance, without sharing the serverless pool.

Typical fits:

production inference with stable latency
fine-tuned model hosting
uploaded custom model hosting
autoscaled model APIs

When This Skill Wins

The user needs always-on or single-tenant hosting
The model is supported for dedicated deployment
Fine-tuned or uploaded models must be served as endpoints
Hardware, scaling, or idle-time settings need explicit control

Hand Off To Another Skill

Use together-chat-completions for serverless chat inference
Use together-dedicated-containers for custom runtimes or nonstandard inference pipelines
Use together-gpu-clusters for raw infrastructure or cluster orchestration
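The routing criteria in the two lists above can be condensed into a small decision function. This is a hypothetical sketch: the Workload fields and the choose_skill helper are illustrative names, not part of the skill or of any Together API; only the skill names come from this page. The precedence order (clusters, then containers, then dedicated endpoints, then serverless) is an assumption, and model eligibility is left out because it is checked separately before the endpoint is created.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    """Hypothetical description of what the user is asking for."""
    needs_raw_cluster: bool = False             # raw infrastructure or cluster orchestration
    needs_custom_runtime: bool = False          # custom runtime or nonstandard inference pipeline
    needs_dedicated_hosting: bool = False       # always-on or single-tenant hosting
    serves_finetuned_or_uploaded: bool = False  # fine-tuned or uploaded model served as an endpoint
    needs_explicit_hw_control: bool = False     # hardware, scaling, or idle-time settings


def choose_skill(w: Workload) -> str:
    """Map a workload to the skill that should handle it (illustrative only)."""
    # Infrastructure-level requests are out of scope for managed endpoints.
    if w.needs_raw_cluster:
        return "together-gpu-clusters"
    # Custom runtimes call for containers rather than a managed model endpoint.
    if w.needs_custom_runtime:
        return "together-dedicated-containers"
    # Any "when this skill wins" signal points to dedicated endpoints.
    if (w.needs_dedicated_hosting
            or w.serves_finetuned_or_uploaded
            or w.needs_explicit_hw_control):
        return "together-dedicated-endpoints"
    # Otherwise, plain serverless chat inference is the default.
    return "together-chat-completions"
```

For example, choose_skill(Workload(serves_finetuned_or_uploaded=True)) routes to together-dedicated-endpoints, while an empty Workload() falls through to serverless chat completions.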

Quick Routing

Create and manage a standard endpoint
- Start with scripts/manage_endpoint.py or scripts/manage_endpoint.ts
- Read references/api-reference.md
Lifecycle tuning or troubleshooting
- Read references/api-reference.md
Deploy a fine-tuned model
- Start with scripts/deploy_finetuned.py
- Read references/dedicated-models.md
Upload and deploy a custom model
- Start with scripts/upload_custom_model.py
- Read references/dedicated-models.md
Hardware and sizing choices
- Read references/hardware-options.md

Workflow

1. Confirm that the task needs dedicated hosting instead of serverless or containers.
2. Verify model eligibility and inspect available hardware.
3. Create the endpoint with explicit scaling and timeout settings.
4. Wait for readiness before sending inference traffic.
5. Stop or delete the endpoint when the workload no longer needs to run.
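The "wait for readiness" step is a polling loop keyed on the endpoint ID. The sketch below is hypothetical: get_state is a caller-supplied function (for example, a thin wrapper around whatever lookup scripts/manage_endpoint.py performs), and the state names "STARTED", "ERROR", and "FAILED" are assumptions to confirm against references/api-reference.md. The sleep and clock parameters are injectable so the loop can be tested without real waiting.

```python
import time
from typing import Callable, Iterable


class EndpointNotReady(Exception):
    """Raised when an endpoint fails or does not become ready in time."""


def wait_until_ready(
    get_state: Callable[[str], str],
    endpoint_id: str,
    ready_states: Iterable[str] = ("STARTED",),          # assumed ready state name
    failed_states: Iterable[str] = ("ERROR", "FAILED"),  # assumed failure state names
    timeout_s: float = 1800.0,
    poll_interval_s: float = 15.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Poll get_state(endpoint_id) until a ready state, a failed state, or timeout."""
    ready = {s.upper() for s in ready_states}
    failed = {s.upper() for s in failed_states}
    deadline = clock() + timeout_s
    while True:
        state = get_state(endpoint_id).upper()
        if state in ready:
            return state
        if state in failed:
            raise EndpointNotReady(f"{endpoint_id} entered state {state}")
        if clock() >= deadline:
            raise EndpointNotReady(f"{endpoint_id} still {state} after {timeout_s}s")
        sleep(poll_interval_s)


if __name__ == "__main__":
    # Simulated state sequence standing in for real API responses.
    fake_states = iter(["PENDING", "STARTING", "STARTED"])
    print(wait_until_ready(lambda _id: next(fake_states), "endpoint-id-123",
                           sleep=lambda _s: None))  # prints STARTED
```

Only after this returns should inference traffic be sent, using the endpoint name rather than the ID as the model. Wrapping that traffic in a try/finally that calls your own stop or delete routine keeps step 5 from being forgotten when a job fails partway.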

High-Signal Rules

Python scripts require the Together v2 SDK (together>=2.0.0). If the user is on an older version, they must upgrade first: uv pip install --upgrade "together>=2.0.0".
Model eligibility and hardware availability are gating constraints; check them early.
Endpoint management calls use the endpoint ID, while inference requests usually pass the endpoint name as the model parameter.
Autoscaling, auto-shutdown, prompt caching, and speculative decoding materially affect operations and cost.
For custom or fine-tuned models, do not skip the intermediate verification steps before deployment.
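The SDK version rule can be enforced before any script runs. This minimal sketch uses only the standard library (importlib.metadata); check_together_v2 and major_version are hypothetical helper names. It compares the major version only, which is enough for a >=2.0.0 floor but ignores pre-release tags such as 2.0.0rc1; packaging.version.Version is a stricter option if that package is available.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional

UPGRADE_HINT = 'uv pip install --upgrade "together>=2.0.0"'


def major_version(ver: str) -> int:
    """Return the leading integer of a version string such as '2.1.0'."""
    head = ver.split(".", 1)[0]
    digits = ""
    for ch in head:
        if not ch.isdigit():
            break
        digits += ch
    if not digits:
        raise ValueError(f"unparseable version: {ver!r}")
    return int(digits)


def check_together_v2(installed: Optional[str] = None) -> None:
    """Raise RuntimeError unless the together SDK is version 2 or later."""
    if installed is None:
        try:
            installed = version("together")  # reads installed distribution metadata
        except PackageNotFoundError:
            raise RuntimeError(f"together is not installed; run: {UPGRADE_HINT}") from None
    if major_version(installed) < 2:
        raise RuntimeError(f"together {installed} is too old; run: {UPGRADE_HINT}")
```

Calling check_together_v2() at the top of a script turns a confusing attribute error from an old SDK into an explicit upgrade instruction.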

Resource Map

API reference: references/api-reference.md
Operational controls and troubleshooting: references/api-reference.md
Dedicated model guide: references/dedicated-models.md
Hardware guide: references/hardware-options.md
Python endpoint lifecycle: scripts/manage_endpoint.py
TypeScript endpoint lifecycle: scripts/manage_endpoint.ts
Fine-tuned deployment: scripts/deploy_finetuned.py
Custom model upload and deployment: scripts/upload_custom_model.py

Official Docs

Dedicated Endpoints
Endpoints API
Upload and Deploy Custom Models

$ npx claudepluginhub togethercomputer/skills --plugin togetherai-skills

Similar Skills

model-deployment

748

Generates a Jupyter notebook to deploy LoRA fine-tuned models from SageMaker Serverless Model Customization to SageMaker endpoints or Bedrock, handling Nova vs OSS pathways.

9 files


fal-serverless-guide

37

Deploys custom ML models on fal.ai serverless infrastructure using fal.App class, GPU selection (T4/A10G/A100/H100), setup for model loading, @fal.endpoint decorators, scaling config, secrets, persistent volumes, and fal deploy/run commands.


hugging-face-jobs

41.0k

Runs Python workloads on Hugging Face managed infrastructure (CPUs, GPUs, TPUs) with Hub persistence. For batch inference, data processing, experiments, or any job without local GPU setup.

8 files

antigravity-awesome-skills
