From EKS Auto Mode Skills
Guides users from zero to a running EKS Auto Mode cluster: covers concepts, example selection, deployment steps, and first-day troubleshooting.
Slash command: /eks-automode:eks-automode-onboard
Help users go from zero to a running EKS Auto Mode cluster using this repository. Cover concepts, deployment, example selection, and first-encounter troubleshooting.
EKS Auto Mode runs these components inside the AWS-managed control plane so you never install, configure, or upgrade them:
| Component | What it does |
|---|---|
| Karpenter | Provisions, scales, and consolidates EC2 compute |
| VPC CNI | Pod networking, IP allocation, security groups |
| EBS CSI driver | Persistent volume provisioning from PVCs |
| ALB controller | Creates ALBs/NLBs from Ingress and Service resources |
| CoreDNS | Cluster DNS (runs as a node-level service, not pods) |
| Pod Identity Agent | Fine-grained IAM for pods without manual IRSA |
| Node health monitor | Detects and replaces unhealthy nodes |
| AMI lifecycle | Picks correct AMI, patches weekly, remediates drift |
You interact via standard K8s APIs (NodePool, NodeClass, Ingress, StorageClass). AWS handles the operational lifecycle behind those APIs.
| Use case | Example directory | What it deploys |
|---|---|---|
| First deployment / validation | examples/graviton/ | 2048 game on ARM64 Graviton |
| Fault-tolerant batch / dev | examples/spot/ | 2048 game on Spot instances |
| ML inference (NVIDIA GPU) | examples/gpu/ | Qwen 3 on GPU |
| ML inference (Inferentia2) | examples/neuron/ | DeepSeek-R1-Qwen3-8B via vLLM |
| OD + Spot mix with headroom | examples/cost-optimization/ | Weighted priority pools + pause-pod |
| Pin to reserved capacity | examples/capacity-reservation/ | ODCR-targeted NodePool |
| Fixed always-on fleet | examples/static-capacity/ | spec.replicas nodes, no consolidation |
| Protect long-running jobs | examples/batch-jobs/ | do-not-disrupt annotation pattern |
| Limit drain concurrency | examples/disruption-budgets/ | NodePool disruption budget config |
| CPU autoscaling + SQS-driven | examples/pod-autoscaling/ | HPA + KEDA ScaledObjects |
| Metrics, logs, tracing | examples/observability/ | CloudWatch Container Insights |
Start with examples/graviton/ if you just want to validate the cluster works.
git clone https://github.com/aws-samples/sample-aws-eks-auto-mode.git && cd sample-aws-eks-auto-mode
Edit terraform/terraform.tfvars (or pass -var flags). The important variables:
| Variable | Purpose | Default |
|---|---|---|
| name | Cluster and VPC name | automode-cluster |
| region | AWS region | us-west-2 |
| eks_cluster_version | K8s version (minimum 1.29 for Auto Mode) | 1.34 |
| tags | Tags applied everywhere via 5-layer pattern | {"auto-delete"="never"} |
| base_domain | Route53 zone for public HTTPS (leave empty for internal-only) | "" |
| enable_observability | CloudWatch Container Insights | false |
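As a rough sketch, the variables in the table could be written out as a tfvars file like this. The helper name write_tfvars is hypothetical and not part of the repository, the values are the defaults listed above, and the value types (for example, the version as a quoted string) are assumptions to check against terraform/variables.tf.

```shell
# Hypothetical helper: print a terraform.tfvars body built from the variables
# in the table above. $1 = cluster name (defaults to the documented default).
# Value types are assumptions; confirm them against the module's variables.
write_tfvars() {
  cluster_name="${1:-automode-cluster}"
  cat <<EOF
name                 = "${cluster_name}"
region               = "us-west-2"
eks_cluster_version  = "1.34"
tags                 = { "auto-delete" = "never" }
base_domain          = ""    # empty = internal-only, no public HTTPS
enable_observability = false
EOF
}

# Usage (from the repository root):
#   write_tfvars my-cluster > terraform/terraform.tfvars
```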
cd terraform && terraform init && terraform apply -auto-approve
Terraform creates the VPC, EKS cluster, IAM roles, NodePools, NodeClasses, StorageClasses, and IngressClasses. The apply takes 12-18 minutes.
$(terraform output -raw configure_kubectl)
kubectl get nodes # Should show zero nodes (no workloads yet)
kubectl get nodepools # Shows general-purpose + any custom pools
kubectl get nodeclasses # Shows default + any custom classes
kubectl get storageclass # Should list ebs-csi class
kubectl get ingressclass # Should list alb class
kubectl apply -f examples/graviton/
kubectl get pods -n game-2048 -w # Watch Auto Mode provision a Graviton node
A node appears within 60-90 seconds. The pod transitions to Running once the node joins and passes readiness checks.
| You manage | AWS manages |
|---|---|
| NodePool specs (instance families, AZs, taints) | Actual instance launches + termination |
| NodeClass specs (tags, storage config) | AMI selection, patching, drift remediation |
| Ingress / Service manifests | ALB/NLB creation, TLS termination, target registration |
| PVC manifests + StorageClass | EBS volume provisioning, attach, detach |
| Pod specs and scheduling constraints | Node health monitoring + auto-repair |
| Cluster version upgrades (EKS console/API) | Component version upgrades (Karpenter, CNI, CSI, CoreDNS) |
NodePool (what to launch)              NodeClass (how to launch)
- instance families                    - subnet discovery rules
- architectures (amd64/arm64)          - security group discovery
- capacity types (on-demand/spot)      - ephemeral storage config
- taints and labels                    - tags pushed to EC2/EBS/ENI
- disruption settings                  - IMDS settings
- weight (priority)                    - KMS key for storage
        |                                      |
        +-------- nodeClassRef.name -----------+
A NodePool references exactly one NodeClass. Multiple NodePools can share a NodeClass.
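To make the NodePool-to-NodeClass link concrete, here is a minimal sketch of a NodePool manifest whose nodeClassRef names a single NodeClass. The helper render_nodepool and the pool name in the usage line are hypothetical; this repository renders its NodePools through Terraform templates, so treat this as an illustration rather than the repo's own manifest.

```shell
# Hypothetical helper: print a NodePool that points at exactly one NodeClass.
# $1 = NodePool name, $2 = NodeClass name (defaults to "default").
render_nodepool() {
  pool_name="${1:?usage: render_nodepool <pool-name> [nodeclass-name]}"
  class_name="${2:-default}"
  cat <<EOF
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: ${pool_name}
spec:
  template:
    spec:
      nodeClassRef:
        group: eks.amazonaws.com
        kind: NodeClass
        name: ${class_name}
      requirements:
        - key: kubernetes.io/arch
          operator: In
          values: ["arm64"]
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["on-demand"]
EOF
}

# Usage (quick experiment outside the Terraform pipeline):
#   render_nodepool graviton-demo | kubectl apply -f -
```

Several pools can pass the same second argument, which is how multiple NodePools end up sharing one NodeClass.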
The default NodePool and default NodeClass are AWS-managed. Do not edit the
default NodeClass (changes revert silently within minutes). Create custom
NodeClasses for durable customization.
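A custom NodeClass might look roughly like the sketch below. The helper render_nodeclass is hypothetical, and so are the Name selector tags, the storage size, and the team tag; substitute whatever discovery tags your subnets and security groups actually carry.

```shell
# Hypothetical helper: print a named custom NodeClass.
# $1 = NodeClass name, $2 = node IAM role name,
# $3 = value of a placeholder "Name" tag used for subnet and SG discovery.
render_nodeclass() {
  class_name="${1:?usage: render_nodeclass <name> <node-role> <tag-value>}"
  node_role="${2:?node IAM role name required}"
  tag_value="${3:?discovery tag value required}"
  cat <<EOF
apiVersion: eks.amazonaws.com/v1
kind: NodeClass
metadata:
  name: ${class_name}
spec:
  role: ${node_role}
  subnetSelectorTerms:
    - tags:
        Name: ${tag_value}        # placeholder discovery tag
  securityGroupSelectorTerms:
    - tags:
        Name: ${tag_value}        # placeholder discovery tag
  ephemeralStorage:
    size: "100Gi"                 # illustrative size
  tags:
    team: platform                # custom tags need the IAM custom-tags policy
EOF
}

# Usage:
#   render_nodeclass custom-arm MyNodeRole my-cluster | kubectl apply -f -
```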
This repo uses Terraform's templatefile() to render K8s manifests:
nodepool-templates/*.yaml.tpl (source templates)
|
v terraform apply (templatefile + local_file)
|
nodepools/*.yaml (rendered manifests applied by kubectl_manifest)
If you edit a .tpl file, the rendered YAML does not update until you run
terraform apply. Never edit the rendered YAML directly; it will be overwritten.
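One way to catch a stale render before applying it is to compare file modification times. The sketch below is a heuristic only: find_stale_renders is a hypothetical helper, and it assumes each foo.yaml.tpl renders to foo.yaml, which the directory names above suggest but do not guarantee.

```shell
# Hypothetical helper: list templates that are newer than their rendered YAML
# (or that have no rendered file at all).
# $1 = template directory, $2 = rendered directory.
# Assumes foo.yaml.tpl renders to foo.yaml.
find_stale_renders() {
  tpl_dir="${1:-nodepool-templates}"
  out_dir="${2:-nodepools}"
  for tpl in "$tpl_dir"/*.yaml.tpl; do
    [ -e "$tpl" ] || continue                      # glob matched nothing
    rendered="$out_dir/$(basename "$tpl" .tpl)"    # strip the .tpl suffix
    if [ ! -e "$rendered" ] || [ "$tpl" -nt "$rendered" ]; then
      echo "$tpl"
    fi
  done
}

# Usage: any output means terraform apply should run before the YAML is used.
#   find_stale_renders nodepool-templates nodepools
```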
Pods stuck Pending -- check nodeSelector and node.kubernetes.io/instance-type
labels. Auto Mode only provisions nodes that match a NodePool. If no pool matches
your pod's constraints, it stays Pending forever.
EBS StorageClass provisioner name -- use ebs.csi.eks.amazonaws.com, NOT
ebs.csi.aws.com. The latter is the self-managed driver. Auto Mode uses a
different provisioner name.
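A minimal StorageClass using the Auto Mode provisioner could be sketched as follows. The helper render_storageclass and the class name are hypothetical, and the gp3 and encryption parameters are illustrative choices rather than settings taken from this repository.

```shell
# Hypothetical helper: print a StorageClass that uses the Auto Mode
# EBS provisioner. $1 = StorageClass name (illustrative default).
render_storageclass() {
  sc_name="${1:-auto-ebs-gp3}"
  cat <<EOF
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ${sc_name}
provisioner: ebs.csi.eks.amazonaws.com   # Auto Mode; not ebs.csi.aws.com
volumeBindingMode: WaitForFirstConsumer
parameters:
  type: gp3
  encrypted: "true"
EOF
}

# Usage:
#   render_storageclass | kubectl apply -f -
```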
Editing the default NodeClass -- your changes revert silently. Always create a named custom NodeClass.
Tags not landing on resources -- you need the IAM custom-tags policy
(enable_auto_mode_custom_tags=true in the module). Without it, any custom tag
key outside eks:*, kubernetes.io/*, karpenter.sh/* is silently denied.
No SSH/SSM access to nodes -- nodes are Bottlerocket and read-only. Use
kubectl debug node/<name> or the NodeDiagnostic resource for troubleshooting.
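For log collection, a NodeDiagnostic resource might be generated along these lines. This is a sketch under assumptions: render_node_diagnostic is a hypothetical helper, and the eks.amazonaws.com/v1alpha1 API version and the logCapture.destination field reflect my understanding of the AWS documentation, so verify both before relying on them. The destination is expected to be a pre-signed S3 PUT URL that you create yourself.

```shell
# Hypothetical helper: print a NodeDiagnostic that asks a node to upload logs.
# $1 = node name (the resource name must match the node),
# $2 = pre-signed S3 PUT URL (you generate this separately).
# API version and field names are assumptions; check the AWS docs.
render_node_diagnostic() {
  node_name="${1:?usage: render_node_diagnostic <node-name> <presigned-put-url>}"
  upload_url="${2:?pre-signed S3 PUT URL required}"
  cat <<EOF
apiVersion: eks.amazonaws.com/v1alpha1
kind: NodeDiagnostic
metadata:
  name: ${node_name}
spec:
  logCapture:
    destination: '${upload_url}'
EOF
}

# Usage:
#   render_node_diagnostic "$NODE_NAME" "$PRESIGNED_URL" | kubectl apply -f -
```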
Rendered YAML stale after template edit -- run terraform apply to
re-render. Applying stale YAML silently reverts your fix.
LB not provisioning -- subnets need kubernetes.io/role/elb: "1" (public)
or kubernetes.io/role/internal-elb: "1" (private) tags. This repo adds them
automatically, but custom VPCs may not.
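For a custom VPC, the missing tag can be added with the AWS CLI. The sketch below only prints the command so it can be reviewed first; print_subnet_tag_cmd is a hypothetical helper and the subnet ID in the usage line is a placeholder.

```shell
# Hypothetical helper: print (not run) the AWS CLI command that adds the
# load-balancer discovery tag to a subnet.
# $1 = public | private, $2 = subnet ID.
print_subnet_tag_cmd() {
  case "${1:-}" in
    public)  tag_key="kubernetes.io/role/elb" ;;
    private) tag_key="kubernetes.io/role/internal-elb" ;;
    *) echo "usage: print_subnet_tag_cmd public|private <subnet-id>" >&2
       return 1 ;;
  esac
  subnet_id="${2:?subnet ID required}"
  echo "aws ec2 create-tags --resources ${subnet_id} --tags Key=${tag_key},Value=1"
}

# Usage: review the printed command, then run it.
#   print_subnet_tag_cmd public subnet-0123456789abcdef0
```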
Reference files:
references/concepts.md
references/deployment-guide.md
references/troubleshooting.md
claude-md/TAGGING.md
claude-md/CLEANUP.md
Install:
npx claudepluginhub aws-samples/sample-aws-eks-auto-mode --plugin eks-automode