# How to Control Kubernetes Costs: Resource Limits, Autoscaling, and Spot Nodes

Reduce Kubernetes infrastructure costs by 40-65% with proper resource requests, cluster autoscaling, spot node pools, and namespace quotas.

Kubernetes makes it trivially easy to waste money. Every pod without resource limits is an open checkbook; every idle node is a bill you're paying for nothing. This guide shows you exactly how to bring costs under control.
## Step 1: Set Resource Requests and Limits on Every Pod

This is the single most important cost-control mechanism in Kubernetes. Without resource requests, the scheduler can't pack pods efficiently; without limits, a single pod can consume an entire node.
### 1.1 Determine Actual Usage

```bash
# Get CPU/memory usage for all pods in a namespace
kubectl top pods -n production --sort-by=cpu

# Get node-level utilization
kubectl top nodes

# For historical data, use Prometheus queries on series such as:
#   container_cpu_usage_seconds_total
#   container_memory_working_set_bytes
```
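Those Prometheus series can be turned into percentiles with `quantile_over_time`, which is useful for the sizing strategy in 1.2. A sketch, assuming a standard cAdvisor setup (the `namespace` and `container` label values are illustrative):

```promql
# P50 CPU usage (cores) for one container over the last 7 days,
# computed over a 5m-rate subquery
quantile_over_time(0.5,
  rate(container_cpu_usage_seconds_total{namespace="production", container="api"}[5m])[7d:5m]
)

# P99 working-set memory over the last 7 days
quantile_over_time(0.99,
  container_memory_working_set_bytes{namespace="production", container="api"}[7d]
)
```

The P50 result is a candidate for `requests`, the P99 result for `limits`.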
### 1.2 Apply Resource Specs

```yaml
# Deployment with proper resource management
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api-service
spec:
  replicas: 3
  selector:            # Required: must match the pod template labels
    matchLabels:
      app: api-service
  template:
    metadata:
      labels:
        app: api-service
    spec:
      containers:
      - name: api
        image: api:v2.1
        resources:
          requests:           # Used for scheduling
            cpu: "250m"       # 0.25 CPU cores
            memory: "512Mi"   # 512 MiB
          limits:             # Hard ceiling
            cpu: "1000m"      # 1 CPU core
            memory: "1Gi"     # 1 GiB
```
:::tip[Sizing Strategy]
Set requests to the P50 (median) usage and limits to the P99 (peak) usage. This keeps bin-packing efficient while making OOMKills rare.
:::
## Step 2: Implement Namespace Resource Quotas

Prevent any single team from consuming the entire cluster.
```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-alpha-quota
  namespace: team-alpha
spec:
  hard:
    requests.cpu: "16"
    requests.memory: "32Gi"
    limits.cpu: "32"
    limits.memory: "64Gi"
    pods: "50"
    persistentvolumeclaims: "20"
```

```yaml
# LimitRange sets defaults for pods that don't specify resources
apiVersion: v1
kind: LimitRange
metadata:
  name: default-limits
  namespace: team-alpha
spec:
  limits:
  - default:          # Default limits
      cpu: "500m"
      memory: "512Mi"
    defaultRequest:   # Default requests
      cpu: "100m"
      memory: "128Mi"
    type: Container
```
## Step 3: Enable Cluster Autoscaler

The Cluster Autoscaler automatically adjusts the number of nodes based on pending pods.

### 3.1 Azure AKS
```bash
az aks update \
  --resource-group myRG \
  --name myCluster \
  --enable-cluster-autoscaler \
  --min-count 2 \
  --max-count 20
```
### 3.2 AWS EKS

```yaml
# Cluster Autoscaler deployment (abbreviated; the official manifest also
# includes RBAC, a service account, selectors, and labels)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: cluster-autoscaler
  namespace: kube-system
spec:
  template:
    spec:
      containers:
      - name: cluster-autoscaler
        image: registry.k8s.io/autoscaling/cluster-autoscaler:v1.29.0
        command:
        - ./cluster-autoscaler
        - --cloud-provider=aws
        - --nodes=2:20:eks-nodegroup
        - --scale-down-delay-after-add=5m
        - --scale-down-unneeded-time=5m
        - --skip-nodes-with-local-storage=false
```
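When the autoscaler manages multiple node groups (for example the on-demand plus spot mix from Step 4), the expander flag decides which group gets the scale-up. These optional extra flags are a sketch; verify them against your autoscaler version's documentation:

```yaml
# Optional additions to the container command above
- --expander=least-waste              # prefer the group that leaves the least idle capacity
- --balance-similar-node-groups=true  # keep replicas spread across equivalent groups
```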
### 3.3 Horizontal Pod Autoscaler

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-service
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 65
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 75
```
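For cost purposes it can also help to tune how aggressively the HPA releases replicas, so it scales down promptly without flapping. A sketch of the optional `behavior` field in `autoscaling/v2` (the values are starting points, not recommendations):

```yaml
# Optional addition to the HPA spec above
behavior:
  scaleDown:
    stabilizationWindowSeconds: 300   # wait 5 min of sustained low load before scaling down
    policies:
    - type: Percent
      value: 50                       # remove at most 50% of replicas per period
      periodSeconds: 60
```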
## Step 4: Use Spot/Preemptible Node Pools

Spot nodes provide 60-90% savings for fault-tolerant workloads.

### 4.1 Create a Spot Node Pool
```bash
# AKS Spot Pool
az aks nodepool add \
  --resource-group myRG \
  --cluster-name myCluster \
  --name spotpool \
  --priority Spot \
  --eviction-policy Delete \
  --spot-max-price -1 \
  --min-count 0 \
  --max-count 15 \
  --node-vm-size Standard_D4s_v5

# EKS Spot Instances via managed node group
eksctl create nodegroup \
  --cluster myCluster \
  --name spot-workers \
  --instance-types m5.xlarge,m5a.xlarge,m5d.xlarge \
  --spot \
  --min-size 0 \
  --max-size 15
```
### 4.2 Schedule Tolerant Workloads on Spot

```yaml
spec:
  tolerations:
  - key: "kubernetes.azure.com/scalesetpriority"
    operator: "Equal"
    value: "spot"
    effect: "NoSchedule"
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: "kubernetes.azure.com/scalesetpriority"
            operator: In
            values: ["spot"]
```
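The labels above are Azure-specific. On EKS, managed spot node groups label their nodes `eks.amazonaws.com/capacityType: SPOT` and do not taint them by default, so node affinity alone is typically enough (add a toleration only if you taint the group yourself). A sketch:

```yaml
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: "eks.amazonaws.com/capacityType"
            operator: In
            values: ["SPOT"]
```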
## Step 5: Implement Pod Disruption Budgets

Protect critical services during node scale-down and spot evictions.
```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: api-pdb
spec:
  minAvailable: 2   # Always keep 2 pods running
  selector:
    matchLabels:
      app: api-service
```
## Step 6: Schedule Non-Production Shutdown

Dev/staging clusters don't need to run 24/7.
```bash
#!/bin/bash
# Cron: scale down dev at 8 PM, scale up at 7 AM

# Scale down
kubectl scale deployment --all --replicas=0 -n dev
kubectl scale deployment --all --replicas=0 -n staging

# Scale up (separate cron job)
# Note: this restores every deployment to 1 replica, not its original count;
# record the original replica counts first if they vary
kubectl scale deployment --all --replicas=1 -n dev
kubectl scale deployment --all --replicas=1 -n staging
```
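Wiring the script above into cron might look like the following; the schedule assumes weekdays only, and the paths are illustrative:

```cron
# Illustrative crontab entries; adjust paths, timezone, and days
0 20 * * 1-5 /opt/k8s/scale-down.sh
0 7  * * 1-5 /opt/k8s/scale-up.sh
```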
## Cost Optimization Checklist

- Resource requests/limits on every container
- Namespace ResourceQuotas for team budgets
- LimitRanges for default container limits
- Cluster Autoscaler enabled with appropriate min/max
- HPA for variable-traffic deployments
- Spot node pools for fault-tolerant workloads
- PodDisruptionBudgets on critical services
- Dev/staging scheduled shutdown
- Regular review of `kubectl top` data
:::note[Source]
This guide is derived from operational intelligence at Garnet Grid Consulting. For enterprise Kubernetes cost audits, visit garnetgrid.com.
:::