
cold-start-budget-reference

Pure-reference catalog of cold-start budgets across serverless runtimes. Covers AWS Lambda's three-phase cold start (Init: download+unzip+runtime-bootstrap; Init code: imports + module load; Invoke: handler execution), Cloudflare Workers' isolate model (millisecond-scale cold starts via V8 isolates per developers.cloudflare.com), Vercel Edge Runtime, Lambda SnapStart for JVM (snapshot-restore for Java), and provisioned-concurrency trade-offs. Includes per-runtime typical cold-start ranges and the testable behaviours each model creates. Use when designing latency budgets, choosing a runtime, or auditing cold-start variance in production.

Invocation


Slash command

/qa-serverless:cold-start-budget-reference

User invocable

Model invocable

Inline context

Default effort


SKILL.md

170 lines · ~1.9k tokens

Stats

Language: Python

Parent stars: 0

Maintenance: Excellent

Last Commit: Jun 4, 2026


cold-start-budget-reference

Overview

Per aws.amazon.com/blogs/compute on cold starts, Lambda's cold start has three phases:

1. Init - download deployment package, unzip, bootstrap the runtime (Node/Python/Java/etc.).
2. Init code - execute module-level imports + global setup.
3. Invoke - the actual handler call.

Phases 1 and 2 are the "cold" part and happen only when a new execution environment is created. Phase 3 runs on every invocation, cold or warm.
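Module scope executes during the init-code phase, once per execution environment, while the handler body runs on every invoke. That makes a module-level flag a simple way to tag cold invocations. The following is a minimal Python sketch. The `cold_start` key is a hypothetical field for your own metrics, not an AWS convention.

```python
# Module scope: runs once per execution environment, during the init-code phase.
_is_cold = True


def handler(event, context):
    """Report whether this is the first invocation served by this environment."""
    global _is_cold
    cold, _is_cold = _is_cold, False  # only the first invoke after init sees True
    return {"cold_start": cold}
```

Calling `handler` twice in the same process returns `{"cold_start": True}` and then `{"cold_start": False}`, which mirrors a cold invoke followed by a warm one. Emitting this flag next to request latency lets you split a p99 into its cold and warm populations.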

When to use

Designing a latency budget for a Lambda / Workers / Edge function.
Investigating "p95 is fine but p99 is 5s."
Choosing a runtime - Cloudflare's isolate model is qualitatively different from Lambda containers.
Auditing provisioned-concurrency / SnapStart configurations.

Per-runtime cold-start budgets

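Typical ranges compiled from AWS, Cloudflare and Vercel docs; treat them as indicative, not guaranteed. Larger packages skew higher. On Lambda, larger memory settings also allocate more CPU and tend to shorten init rather than lengthen it.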

Runtime	Typical cold start	Architecture
AWS Lambda Node.js (256MB)	200-700ms	Container (Firecracker microVM)
AWS Lambda Python (256MB)	250-800ms	Container
AWS Lambda Java 11 (512MB, no SnapStart)	1.5-6s	Container + JVM warmup
AWS Lambda Java 11 (512MB, SnapStart)	100-300ms	Snapshot-restore per docs.aws.amazon.com/lambda
AWS Lambda .NET (1GB)	1-3s	Container + .NET runtime
AWS Lambda Go (256MB)	100-300ms	Container; pre-compiled binary
AWS Lambda Rust (256MB)	50-200ms	Container; pre-compiled binary
Cloudflare Workers	0-5ms (V8 isolate spawn)	V8 isolate per developers.cloudflare.com/workers
Vercel Edge Runtime	5-30ms	V8 isolate (similar to Workers)
Vercel Node.js Functions	200-500ms (small) to 2-3s (large)	Lambda under the hood
Netlify Functions	300ms-2s	Lambda under the hood

The qualitative leap for Workers / Edge is the isolate model. Per developers.cloudflare.com, each Worker runs in its own V8 isolate inside an already-running runtime process. There is no container to create and no OS to boot, so an isolate starts in a few milliseconds.
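To use the table when designing a budget, one option is to compare the upper end of each runtime's range against the latency you can tolerate on a cold request. The following is a minimal Python sketch. The numbers are copied from the table above; Vercel Node.js uses its small-function range. They are rough typical values, and `fits_budget` is a hypothetical helper.

```python
# Upper end of each "typical cold start" range from the table, in milliseconds.
TYPICAL_COLD_START_MAX_MS = {
    "AWS Lambda Node.js (256MB)": 700,
    "AWS Lambda Python (256MB)": 800,
    "AWS Lambda Java 11 (512MB, no SnapStart)": 6000,
    "AWS Lambda Java 11 (512MB, SnapStart)": 300,
    "AWS Lambda .NET (1GB)": 3000,
    "AWS Lambda Go (256MB)": 300,
    "AWS Lambda Rust (256MB)": 200,
    "Cloudflare Workers": 5,
    "Vercel Edge Runtime": 30,
    "Vercel Node.js Functions (small)": 500,
    "Netlify Functions": 2000,
}


def fits_budget(cold_budget_ms: float) -> list[str]:
    """Runtimes whose worst typical cold start still fits the budget, fastest first."""
    fitting = [
        (max_ms, name)
        for name, max_ms in TYPICAL_COLD_START_MAX_MS.items()
        if max_ms <= cold_budget_ms
    ]
    return [name for _, name in sorted(fitting)]
```

For example, `fits_budget(250)` returns Cloudflare Workers, Vercel Edge Runtime and Rust on Lambda. Comparing against the upper end of each range keeps the check conservative.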

Mitigations

Provisioned concurrency (AWS Lambda)

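Per docs.aws.amazon.com/lambda: keeps N execution environments pre-initialised, which eliminates cold starts for up to N concurrent requests. Requests beyond N spill over to on-demand environments and can still cold-start. You pay for the keep-warm time whether or not it is used.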

Trade-off: cost. Keeping N=10 environments provisioned for 30 days costs roughly $30-300 in keep-warm charges, depending on memory size.
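The cost range follows from Lambda's per-GB-second charge for provisioned concurrency. The following is a back-of-envelope Python sketch. It assumes a rate of $0.0000041667 per GB-second, which is the us-east-1 x86 list price as understood here; check the Lambda pricing page for your region and architecture. It covers only the keep-warm charge. Request and duration charges for invocations served by those environments are billed on top.

```python
SECONDS_PER_DAY = 86_400

# Assumed rate (USD per GB-second of provisioned concurrency); verify against current pricing.
PC_USD_PER_GB_SECOND = 0.0000041667


def provisioned_concurrency_cost(
    environments: int,
    memory_mb: int,
    days: float,
    usd_per_gb_second: float = PC_USD_PER_GB_SECOND,
) -> float:
    """Keep-warm charge only: environments x memory (GB) x seconds x rate."""
    gb_seconds = environments * (memory_mb / 1024) * days * SECONDS_PER_DAY
    return gb_seconds * usd_per_gb_second
```

With N=10 for 30 days, 256 MB comes to about $27, 1,024 MB to about $108, and 3,072 MB to about $324. That roughly brackets the range above.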

Lambda SnapStart (Java / .NET)

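Per docs.aws.amazon.com/lambda/latest/dg/snapstart.html, Lambda runs the init phase when you publish a function version. It then snapshots the initialised execution environment (for Java, the initialised JVM) and resumes new environments from that snapshot instead of re-running init. This typically cuts Java cold starts from 1.5-6s to 100-300ms. SnapStart applies to published versions and aliases, not $LATEST.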

Caveats:

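The snapshot captures everything created during init, including open connections and random-number seeds. Every environment restored from it therefore starts with the same values, so anything that must be unique per environment cannot be generated at init and left frozen.
Runtime hooks (beforeCheckpoint / afterRestore in Java, via the CRaC Resource interface) let you close connections before the snapshot. After restore, you can use them to re-establish connections and regenerate unique state.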
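The uniqueness problem can be reproduced without Lambda. If state is created during init and the same snapshot is then restored into several environments, every environment starts with identical values. The following is a minimal Python simulation. `copy.deepcopy` stands in for snapshot-restore, and `after_restore` is a hypothetical hook playing the role of Java's `afterRestore` callback. The simulation uses no AWS or CRaC API.

```python
import copy
import uuid


class InitState:
    """State built during init and captured by the (simulated) snapshot."""

    def __init__(self) -> None:
        self.instance_id = uuid.uuid4().hex  # must be unique per environment

    def after_restore(self) -> None:
        # Hypothetical hook: regenerate anything that must not be shared.
        self.instance_id = uuid.uuid4().hex


def restore(snapshot: InitState, run_hook: bool) -> InitState:
    """Simulate resuming one execution environment from the snapshot."""
    env = copy.deepcopy(snapshot)
    if run_hook:
        env.after_restore()
    return env


snapshot = InitState()  # init runs once, when the version is published
```

Without the hook, two calls to `restore(snapshot, run_hook=False)` yield environments with the same `instance_id`. With `run_hook=True`, each environment gets its own.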

Package-size discipline

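Lambda cold start tends to grow with deployment-package size, mostly because larger packages mean more code to load during init. 50MB (zipped) is Lambda's quota for direct .zip upload, not a latency guarantee. It works as a build-time ceiling, and smaller is better. Packages well under it commonly land in the 200-800ms range from the table above, while larger packages can push cold starts into seconds.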
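A build-step guard for the size check takes a few lines of Python. This sketch assumes a .zip artifact. It uses Lambda's 50 MB zipped and 250 MB unzipped quotas as ceilings and interprets MB as MiB. `check_package_size` is a hypothetical helper name.

```python
import os
import zipfile

MIB = 1024 * 1024


def check_package_size(
    path: str, max_zipped_mb: int = 50, max_unzipped_mb: int = 250
) -> tuple[int, int]:
    """Raise if the .zip artifact exceeds either ceiling; return (zipped, unzipped) bytes."""
    zipped = os.path.getsize(path)
    with zipfile.ZipFile(path) as zf:
        # Sum of uncompressed member sizes, as recorded in the archive.
        unzipped = sum(info.file_size for info in zf.infolist())
    if zipped > max_zipped_mb * MIB:
        raise ValueError(f"{path}: {zipped / MIB:.1f} MiB zipped > {max_zipped_mb} MiB")
    if unzipped > max_unzipped_mb * MIB:
        raise ValueError(f"{path}: {unzipped / MIB:.1f} MiB unzipped > {max_unzipped_mb} MiB")
    return zipped, unzipped
```

Run it in CI right after packaging so an oversized artifact fails the build before deploy. You can also pass tighter ceilings than the quotas to enforce your own budget.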

Avoid heavy module-level imports

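Init code runs once per execution environment, which means it runs on every cold start. Heavy imports and large dependency trees inflate init time. Setup that nearly every request needs, such as SDK clients or database connections, is still usually best done at init so warm invocations can reuse it. What to avoid is paying at init for code that most requests never touch.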

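# heavy_lib is a placeholder for any dependency with a slow import.

# Bad: top-level import runs in the init phase, on every cold start
import heavy_lib            # e.g. ~2s import time
def handler(event, ctx):
    return heavy_lib.do_thing(event)

# Better: lazy import, paid only by the first invocation that needs it
def handler(event, ctx):
    import heavy_lib        # later calls find it in sys.modules
    return heavy_lib.do_thing(event)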

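After the first call, a repeated import is only a sys.modules lookup, so the steady-state cost on warm calls is small. The import cost moves from init to the first invocation that needs the module. This pays off when only some code paths use the module. If every request needs it, a lazy import just shifts the cost instead of removing it.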
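The same idea applies to expensive objects as well as imports: build them on first use and cache them at module scope so later warm invocations reuse them. The following is a minimal Python sketch. `ExpensiveClient` is a hypothetical stand-in for a slow-to-construct dependency used only on a rare path, and `functools.cache` handles the memoisation.

```python
import functools


class ExpensiveClient:
    """Hypothetical stand-in for a slow-to-construct dependency."""

    instances_created = 0

    def __init__(self) -> None:
        ExpensiveClient.instances_created += 1

    def do_thing(self, event: dict) -> dict:
        return {"handled": event.get("kind")}


@functools.cache
def get_client() -> ExpensiveClient:
    # Built on the first call that needs it, then reused for the environment's lifetime.
    return ExpensiveClient()


def handler(event, ctx):
    if event.get("kind") == "report":  # only the rare path pays the setup cost
        return get_client().do_thing(event)
    return {"handled": "fast-path"}
```

The fast path never constructs the client. The first `report` request builds it once, and later ones reuse it.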

Runtime choice

Natively compiled runtimes (Go, Rust) cold-start faster than interpreted ones (Python) and much faster than VM-based ones (Java, .NET) without SnapStart, per the table above. For latency-critical paths, runtime choice is a primary lever.

Testable behaviours

Behaviour	Test
Cold start within budget	Force cold (deploy or wait > idle-evict time); measure p95 first-invocation latency
Warm performance	Subsequent invocations (50+) → p95 well within prod budget
SnapStart effective	Pre/post SnapStart cold start delta
Provisioned concurrency keeps warm	Drive load at or below N concurrent requests for an hour; no cold-start spikes observed
Package size in budget	Build-step assertion: zipped artifact < 50MB
No heavy init-time imports	Profile init phase; assert < 500ms
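For the "cold start within budget" and "warm performance" rows, the assertion itself is simple once latencies are collected. The following Python sketch uses the nearest-rank percentile. The `assert_within_budget` helper and any sample values are hypothetical. Collecting the samples (forcing cold starts, invoking the deployed function) is left to your harness.

```python
import math


def percentile(samples: list[float], pct: float) -> float:
    """Nearest-rank percentile: smallest value with at least pct% of samples at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def assert_within_budget(samples_ms: list[float], budget_ms: float, pct: float = 95) -> float:
    """Fail the test if the chosen percentile exceeds the budget; return the observed value."""
    observed = percentile(samples_ms, pct)
    assert observed <= budget_ms, f"p{pct:g} {observed}ms exceeds budget {budget_ms}ms"
    return observed
```

Keep cold and warm samples in separate lists so that a handful of cold starts does not hide inside a warm p95.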

Anti-patterns

Anti-pattern	Why it fails	Fix
p99 latency surprise	Cold starts at the tail; not visible in p50/p95	Watch p99; monitor cold starts explicitly (Init Duration in Lambda REPORT log lines, or Lambda Insights' init_duration metric)
Large dependency tree on init path	Cold start inflated 2-5x	Audit imports; lazy-import non-critical
Java on Lambda without SnapStart	Multi-second cold starts (1.5-6s in the table above)	Enable SnapStart
Provisioned concurrency without size analysis	Pay for unused warm instances	Tune to actual concurrency p99
Cold-start test only on the local dev environment	Local doesn't simulate Lambda init	Deploy + test against AWS / Workers / Edge
Treat cold starts as "rare"	Bursty traffic → cold starts cluster	Account for both steady-state and burst patterns
Ignore module bundling (Node.js)	A bundled artifact (e.g. webpack) is smaller and needs fewer module-resolution lookups at init	Bundle production Lambdas
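On Lambda, init time for a cold invocation appears as an `Init Duration` field in that invocation's `REPORT` log line. The following Python sketch extracts that field from log text and computes a cold-start rate. The regex assumes the usual `REPORT ... Init Duration: <n> ms` layout. Function names are hypothetical.

```python
import re

# Matches e.g. "... Init Duration: 187.43 ms" on a Lambda REPORT line.
INIT_RE = re.compile(r"Init Duration:\s*([\d.]+)\s*ms")


def init_durations(log_lines: list[str]) -> list[float]:
    """Init Duration (ms) for each REPORT line that has one, i.e. each cold start."""
    durations = []
    for line in log_lines:
        if not line.startswith("REPORT"):
            continue
        match = INIT_RE.search(line)
        if match:
            durations.append(float(match.group(1)))
    return durations


def cold_start_rate(log_lines: list[str]) -> float:
    """Fraction of REPORT lines that carry an Init Duration."""
    reports = sum(1 for line in log_lines if line.startswith("REPORT"))
    return len(init_durations(log_lines)) / reports if reports else 0.0
```

Feeding the durations into a percentile check gives a cold-start p95 or p99 that does not depend on request-latency sampling.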

Limitations

Cold-start measurement is platform-side. On Lambda, the Init Duration in each cold invocation's REPORT log line is the canonical figure (Lambda Insights exposes it as init_duration); Workers / Edge expose their own.
SnapStart caveats are subtle. State that survives the snapshot may be wrong (random seeds, connection state).
Provisioned-concurrency is regional. Multi-region Lambdas need PC per region.
Workers / Edge are not free of all variance. The first request that lands on a fresh isolate in a given location still pays start-up time (0-5ms on Workers, 5-30ms on Vercel Edge, per the table above).
Doesn't address steady-state throughput. Cold start is one metric; concurrency-limit + duration are separate.

References

AWS Lambda cold-start blog: aws.amazon.com/blogs/compute.
Lambda SnapStart: docs.aws.amazon.com/lambda/latest/dg/snapstart.html.
Cloudflare Workers isolate model: developers.cloudflare.com/workers/learning/how-workers-works.
Vercel Edge Runtime: vercel.com/docs/functions/edge-runtime.
Companion catalog: lambda-timeout-budget-reference.
Consumed by: aws-sam-local-testing, lambda-test-tools-net, cloudflare-workers-miniflare, vercel-edge-runtime-testing, netlify-functions-test, serverless-framework-test-plugin, serverless-integration-test-builder.
