From timelord
This skill should be used when the user asks about "Temporal sizing", "history shards", "cluster capacity", "Temporal resources", "scale Temporal", "Temporal performance", "how many shards", or needs guidance on capacity planning for Temporal clusters.
Slash command: /timelord:cluster-sizing
Guidance for sizing Temporal clusters based on workload requirements.
| Factor | Impact | Fixed After Creation? |
|---|---|---|
| History Shards | Workflow parallelism | Yes (set at creation) |
| History Replicas | Throughput, availability | No |
| Matching Replicas | Task dispatch rate | No |
| Frontend Replicas | API request rate | No |
| Database Size | History storage | No |
Critical: History shards cannot be changed after cluster creation.
Shards determine maximum workflow parallelism. Each workflow belongs to one shard.
| Concurrent Workflows | Recommended Shards |
|---|---|
| < 10,000 | 128 |
| 10,000 - 100,000 | 256 |
| 100,000 - 500,000 | 512 |
| 500,000 - 2,000,000 | 1024 |
| > 2,000,000 | 2048 or 4096 |

    shards = ceil(max_concurrent_workflows / 1000) * safety_factor
    # Round up to the next power of 2
    # safety_factor = 2-4x for growth

Example: Expecting 50,000 concurrent workflows with 3x growth:

    base = 50,000 / 1000 = 50
    with_growth = 50 * 3 = 150
    next_power_of_2 = 256 shards

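
The formula above can be sketched in Python. This is a minimal illustration, not an official Temporal tool: the function name `recommended_shards` and the `workflows_per_shard` parameter are hypothetical, and the 1,000 workflows-per-shard divisor is taken directly from the formula.

```python
import math


def recommended_shards(max_concurrent_workflows: int,
                       safety_factor: float = 2.0,
                       workflows_per_shard: int = 1000) -> int:
    """Apply ceil(workflows / 1000) * safety_factor, rounded up to a power of 2."""
    if max_concurrent_workflows <= 0 or safety_factor <= 0:
        raise ValueError("inputs must be positive")
    base = math.ceil(max_concurrent_workflows / workflows_per_shard)
    target = math.ceil(base * safety_factor)
    # Smallest power of two that is greater than or equal to target.
    return 1 << (target - 1).bit_length()


print(recommended_shards(50_000, safety_factor=3))  # 256
```

For small workloads the formula alone yields fewer shards than the 128 minimum in the table, so treat the table as the floor.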
Shards distribute across history service replicas:

    shards_per_replica = total_shards / history_replicas
    # Example: 512 shards, 4 replicas = 128 shards/replica

Adding history replicas lowers the number of shards each replica owns, which spreads the load and raises overall throughput.
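
When the shard count does not divide evenly, some replicas own one more shard than others. The sketch below assumes an idealized even split; the helper name `shards_per_replica` is hypothetical, and the real assignment in a running cluster may be less uniform.

```python
def shards_per_replica(total_shards: int, history_replicas: int) -> list[int]:
    """Split shards as evenly as possible across history replicas."""
    if total_shards <= 0 or history_replicas <= 0:
        raise ValueError("inputs must be positive")
    base, remainder = divmod(total_shards, history_replicas)
    # The first `remainder` replicas each take one extra shard.
    return [base + 1 if i < remainder else base for i in range(history_replicas)]


print(shards_per_replica(512, 4))  # [128, 128, 128, 128]
print(shards_per_replica(512, 6))  # [86, 86, 85, 85, 85, 85]
```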
Frontend service: handles API requests, authentication, and rate limiting.
| Load Level | Replicas | CPU | Memory |
|---|---|---|---|
| Low (<100 rps) | 1-2 | 500m | 1Gi |
| Medium (100-1000 rps) | 3 | 1 | 2Gi |
| High (1000-5000 rps) | 5 | 2 | 4Gi |
| Very High (>5000 rps) | 10+ | 4 | 8Gi |
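
The frontend table can be encoded as a simple lookup. This is a sketch only: the names `frontend_tier`, `TIER_UPPER_BOUNDS`, and `FRONTEND_TIERS` are hypothetical, and it assumes that a rate sitting exactly on a boundary (for example 1000 rps) belongs to the higher tier, which the table leaves ambiguous.

```python
from bisect import bisect_right

# Tier boundaries in requests per second, copied from the table above.
TIER_UPPER_BOUNDS = [100, 1000, 5000]
FRONTEND_TIERS = [
    {"load": "Low", "replicas": "1-2", "cpu": "500m", "memory": "1Gi"},
    {"load": "Medium", "replicas": "3", "cpu": "1", "memory": "2Gi"},
    {"load": "High", "replicas": "5", "cpu": "2", "memory": "4Gi"},
    {"load": "Very High", "replicas": "10+", "cpu": "4", "memory": "8Gi"},
]


def frontend_tier(requests_per_second: float) -> dict:
    """Return the table row matching the given request rate."""
    if requests_per_second < 0:
        raise ValueError("request rate must be non-negative")
    # Assumption: boundary values fall into the higher tier.
    return FRONTEND_TIERS[bisect_right(TIER_UPPER_BOUNDS, requests_per_second)]


print(frontend_tier(750)["replicas"])  # 3
```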
History service: manages workflow state and event history.
| Shards | Replicas | CPU/replica | Memory/replica |
|---|---|---|---|
| 128 | 2 | 1 | 2Gi |
| 256 | 3 | 2 | 4Gi |
| 512 | 4-6 | 2 | 4Gi |
| 1024 | 8-12 | 4 | 8Gi |
| 2048 | 16-24 | 4 | 8Gi |
Matching service: dispatches tasks to workers.
| Task Rate | Replicas | CPU | Memory |
|---|---|---|---|
| Low (<1000/s) | 2 | 500m | 1Gi |
| Medium (1000-10000/s) | 3 | 1 | 2Gi |
| High (>10000/s) | 5+ | 2 | 4Gi |
Worker service: handles internal system workflows. Scale with cluster size:
| Cluster Size | Replicas | CPU | Memory |
|---|---|---|---|
| Small | 1 | 200m | 256Mi |
| Medium | 1 | 500m | 512Mi |
| Large | 2 | 1 | 1Gi |

Database sizing by workflow volume:
| Workflow Volume | CPU | Memory | Storage | IOPS |
|---|---|---|---|---|
| < 100K workflows | 2 | 8GB | 100GB | 3000 |
| 100K-1M workflows | 4 | 16GB | 500GB | 6000 |
| 1M-10M workflows | 8 | 32GB | 1TB | 12000 |
| > 10M workflows | 16+ | 64GB+ | 2TB+ | 20000+ |

    storage_per_workflow = avg_history_events * event_size
                         = 100 events * 1KB = 100KB
    total_storage = workflows * storage_per_workflow * retention_multiplier
                  = 1,000,000 * 100KB * 1.5 = 150GB

Retention: Closed workflow histories are stored until the retention period expires, so set a retention period no longer than you need to keep storage in check.
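
The storage estimate above can be reproduced with a few lines of Python. This is a rough sketch using the same inputs; the function name `estimate_storage_gb` is hypothetical, the defaults (100 events, 1 KB per event, 1.5x multiplier) come from the worked example, and decimal units (1 GB = 1,000,000 KB) are assumed.

```python
def estimate_storage_gb(workflows: int,
                        avg_history_events: int = 100,
                        event_size_kb: float = 1.0,
                        retention_multiplier: float = 1.5) -> float:
    """Estimate history storage in GB from workflow count and average history size."""
    storage_per_workflow_kb = avg_history_events * event_size_kb
    total_kb = workflows * storage_per_workflow_kb * retention_multiplier
    # Assumption: decimal units, so 1 GB = 1,000,000 KB.
    return total_kb / 1_000_000


print(estimate_storage_gb(1_000_000))  # 150.0
```

Measure real average history length and event size from an existing workload where possible, since both vary widely between workflow types.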

Elasticsearch, for visibility queries (optional but recommended):
| Indexed Workflows | Nodes | CPU/node | Memory/node | Storage/node |
|---|---|---|---|---|
| < 1M | 3 | 1 | 2Gi | 50Gi |
| 1M-10M | 3 | 2 | 4Gi | 200Gi |
| > 10M | 5+ | 4 | 8Gi | 500Gi |

Example values for a 128-shard cluster with one replica per service:

    server:
      config:
        numHistoryShards: 128
      replicaCount:
        frontend: 1
        history: 1
        matching: 1
        worker: 1
      resources:
        frontend:
          requests: {cpu: "250m", memory: "512Mi"}
        history:
          requests: {cpu: "500m", memory: "1Gi"}
        matching:
          requests: {cpu: "250m", memory: "512Mi"}


Example values for a 256-shard cluster:

    server:
      config:
        numHistoryShards: 256
      replicaCount:
        frontend: 3
        history: 3
        matching: 3
        worker: 1
      resources:
        frontend:
          requests: {cpu: "500m", memory: "1Gi"}
          limits: {cpu: "2", memory: "4Gi"}
        history:
          requests: {cpu: "1", memory: "2Gi"}
          limits: {cpu: "4", memory: "8Gi"}
        matching:
          requests: {cpu: "500m", memory: "1Gi"}
          limits: {cpu: "2", memory: "4Gi"}


Example values for a 1024-shard cluster:

    server:
      config:
        numHistoryShards: 1024
      replicaCount:
        frontend: 5
        history: 10
        matching: 5
        worker: 2
      resources:
        frontend:
          requests: {cpu: "2", memory: "4Gi"}
          limits: {cpu: "4", memory: "8Gi"}
        history:
          requests: {cpu: "4", memory: "8Gi"}
          limits: {cpu: "8", memory: "16Gi"}
        matching:
          requests: {cpu: "2", memory: "4Gi"}
          limits: {cpu: "4", memory: "8Gi"}

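
To judge how much node capacity a configuration needs, it can help to total the resource requests across replicas. The sketch below does this for the 1024-shard example; the helper names (`parse_cpu`, `parse_memory_gib`, `total_requests`) are hypothetical, only the `m`, `Ki`, `Mi`, and `Gi` suffixes are handled, and the worker service is skipped because the example lists no resources for it.

```python
def parse_cpu(quantity: str) -> float:
    """Convert a Kubernetes CPU quantity such as "500m" or "2" to cores."""
    return int(quantity[:-1]) / 1000 if quantity.endswith("m") else float(quantity)


_MEMORY_FACTORS = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30}


def parse_memory_gib(quantity: str) -> float:
    """Convert a Kubernetes memory quantity such as "512Mi" or "8Gi" to GiB."""
    for suffix, factor in _MEMORY_FACTORS.items():
        if quantity.endswith(suffix):
            return float(quantity[:-2]) * factor / 2**30
    raise ValueError(f"unsupported memory quantity: {quantity}")


def total_requests(replicas: dict, requests: dict) -> tuple[float, float]:
    """Sum CPU cores and memory (GiB) requested across all replicas."""
    cpu = sum(n * parse_cpu(requests[s]["cpu"])
              for s, n in replicas.items() if s in requests)
    mem = sum(n * parse_memory_gib(requests[s]["memory"])
              for s, n in replicas.items() if s in requests)
    return cpu, mem


# Values copied from the 1024-shard example above.
replicas = {"frontend": 5, "history": 10, "matching": 5, "worker": 2}
requests = {
    "frontend": {"cpu": "2", "memory": "4Gi"},
    "history": {"cpu": "4", "memory": "8Gi"},
    "matching": {"cpu": "2", "memory": "4Gi"},
}
print(total_requests(replicas, requests))  # (60.0, 120.0)
```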
Scale replicas when:
Increase resources when:
Key metrics to watch:

    # History service load
    sum(rate(temporal_persistence_requests_total[5m])) by (operation)
    # Task latency (indicates matching capacity)
    histogram_quantile(0.99, sum(rate(temporal_schedule_to_start_latency_bucket[5m])) by (le))
    # Workflow throughput
    sum(rate(temporal_workflow_completed_total[5m]))
    # Shard distribution
    temporal_history_shard_count

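
These queries can be run against the Prometheus HTTP API, whose instant-query endpoint is `/api/v1/query`. The sketch below only builds the request URLs; the base URL `http://prometheus:9090` and the helper name `instant_query_url` are assumptions for illustration, and the metric names are copied from the list above without verifying them against a live cluster.

```python
from urllib.parse import urlencode

# Two of the queries listed above, keyed by a short label.
QUERIES = {
    "workflow_throughput": "sum(rate(temporal_workflow_completed_total[5m]))",
    "shard_count": "temporal_history_shard_count",
}


def instant_query_url(base_url: str, promql: str) -> str:
    """Build a Prometheus instant-query URL with the PromQL expression URL-encoded."""
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': promql})}"


for label, promql in QUERIES.items():
    # Hypothetical Prometheus address; replace with your own.
    print(label, instant_query_url("http://prometheus:9090", promql))
```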
| Mistake | Impact | Solution |
|---|---|---|
| Too few shards | Cannot scale later | Start with more shards |
| Undersized history | Latency spikes | Increase memory, replicas |
| Single frontend | Single point of failure | Minimum 2 for HA |
| No Elasticsearch | Slow visibility queries | Enable for production |
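
The checks in this table are easy to automate before a deployment. The following is a minimal sketch with a hypothetical function name and parameters; the 1,000 workflows-per-shard threshold is borrowed from the shard formula earlier in this guide, and history undersizing is left out because it needs live latency data.

```python
def sizing_warnings(num_history_shards: int,
                    frontend_replicas: int,
                    elasticsearch_enabled: bool,
                    expected_concurrent_workflows: int) -> list[str]:
    """Return a warning for each common sizing mistake the plan triggers."""
    warnings = []
    # Assumption: roughly 1,000 concurrent workflows per shard, as in the shard formula.
    if expected_concurrent_workflows / num_history_shards > 1000:
        warnings.append("Too few shards: cannot be changed later, start with more.")
    if frontend_replicas < 2:
        warnings.append("Single frontend: single point of failure, use at least 2.")
    if not elasticsearch_enabled:
        warnings.append("No Elasticsearch: visibility queries will be slow.")
    return warnings


for warning in sizing_warnings(128, 1, False, 500_000):
    print(warning)
```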
For detailed sizing calculations, consult:
- references/sizing-calculator.md - Detailed sizing formulas
- references/benchmark-results.md - Performance benchmark data