# How to Control Kubernetes Costs: Resource Limits, Autoscaling, and Spot Nodes

Reduce Kubernetes infrastructure costs by 40-65% with proper resource requests, cluster autoscaling, spot node pools, and namespace quotas.

Kubernetes makes it trivially easy to waste money. Every pod without resource limits is an open checkbook; every idle node is a bill you're paying for nothing. This guide shows you exactly how to bring costs under control.
## Step 1: Set Resource Requests and Limits on Every Pod

This is the single most important cost-control mechanism in Kubernetes. Without resource requests, the scheduler can't pack pods efficiently; without limits, a single pod can consume an entire node.
### 1.1 Determine Actual Usage

```bash
# Get CPU/memory usage for all pods in a namespace
kubectl top pods -n production --sort-by=cpu

# Get node-level utilization
kubectl top nodes

# For historical data, use Prometheus queries on series such as:
#   container_cpu_usage_seconds_total
#   container_memory_working_set_bytes
```
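Those Prometheus series can be turned into percentiles with `quantile_over_time`, which is useful for the sizing strategy in 1.2. A sketch, assuming a standard cAdvisor setup (the `namespace` and `container` label values are illustrative):

```promql
# P50 CPU usage (cores) for one container over the last 7 days,
# computed over a 5m-rate subquery
quantile_over_time(0.5,
  rate(container_cpu_usage_seconds_total{namespace="production", container="api"}[5m])[7d:5m]
)

# P99 working-set memory over the last 7 days
quantile_over_time(0.99,
  container_memory_working_set_bytes{namespace="production", container="api"}[7d]
)
```

The P50 result is a candidate for `requests`, the P99 result for `limits`.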
### 1.2 Apply Resource Specs

```yaml
# Deployment with proper resource management
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api-service
spec:
  replicas: 3
  selector:            # Required: must match the pod template labels
    matchLabels:
      app: api-service
  template:
    metadata:
      labels:
        app: api-service
    spec:
      containers:
      - name: api
        image: api:v2.1
        resources:
          requests:           # Used for scheduling
            cpu: "250m"       # 0.25 CPU cores
            memory: "512Mi"   # 512 MiB
          limits:             # Hard ceiling
            cpu: "1000m"      # 1 CPU core
            memory: "1Gi"     # 1 GiB
```
:::tip[Sizing Strategy]
Set requests to the P50 (median) usage and limits to the P99 (peak) usage. This keeps bin-packing efficient while making OOMKills rare.
:::
## Step 2: Implement Namespace Resource Quotas

Prevent any single team from consuming the entire cluster.
```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-alpha-quota
  namespace: team-alpha
spec:
  hard:
    requests.cpu: "16"
    requests.memory: "32Gi"
    limits.cpu: "32"
    limits.memory: "64Gi"
    pods: "50"
    persistentvolumeclaims: "20"
```

```yaml
# LimitRange sets defaults for pods that don't specify resources
apiVersion: v1
kind: LimitRange
metadata:
  name: default-limits
  namespace: team-alpha
spec:
  limits:
  - default:          # Default limits
      cpu: "500m"
      memory: "512Mi"
    defaultRequest:   # Default requests
      cpu: "100m"
      memory: "128Mi"
    type: Container
```
## Step 3: Enable Cluster Autoscaler

The Cluster Autoscaler automatically adjusts the number of nodes based on pending pods.

### 3.1 Azure AKS
```bash
az aks update \
  --resource-group myRG \
  --name myCluster \
  --enable-cluster-autoscaler \
  --min-count 2 \
  --max-count 20
```
### 3.2 AWS EKS

```yaml
# Cluster Autoscaler deployment (abbreviated; the official manifest also
# includes RBAC, a service account, selectors, and labels)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: cluster-autoscaler
  namespace: kube-system
spec:
  template:
    spec:
      containers:
      - name: cluster-autoscaler
        image: registry.k8s.io/autoscaling/cluster-autoscaler:v1.29.0
        command:
        - ./cluster-autoscaler
        - --cloud-provider=aws
        - --nodes=2:20:eks-nodegroup
        - --scale-down-delay-after-add=5m
        - --scale-down-unneeded-time=5m
        - --skip-nodes-with-local-storage=false
```
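When the autoscaler manages multiple node groups (for example the on-demand plus spot mix from Step 4), the expander flag decides which group gets the scale-up. These optional extra flags are a sketch; verify them against your autoscaler version's documentation:

```yaml
# Optional additions to the container command above
- --expander=least-waste              # prefer the group that leaves the least idle capacity
- --balance-similar-node-groups=true  # keep replicas spread across equivalent groups
```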
### 3.3 Horizontal Pod Autoscaler

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-service
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 65
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 75
```
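For cost purposes it can also help to tune how aggressively the HPA releases replicas, so it scales down promptly without flapping. A sketch of the optional `behavior` field in `autoscaling/v2` (the values are starting points, not recommendations):

```yaml
# Optional addition to the HPA spec above
behavior:
  scaleDown:
    stabilizationWindowSeconds: 300   # wait 5 min of sustained low load before scaling down
    policies:
    - type: Percent
      value: 50                       # remove at most 50% of replicas per period
      periodSeconds: 60
```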
## Step 4: Use Spot/Preemptible Node Pools

Spot nodes provide 60-90% savings for fault-tolerant workloads.

### 4.1 Create a Spot Node Pool
```bash
# AKS Spot Pool
az aks nodepool add \
  --resource-group myRG \
  --cluster-name myCluster \
  --name spotpool \
  --priority Spot \
  --eviction-policy Delete \
  --spot-max-price -1 \
  --min-count 0 \
  --max-count 15 \
  --node-vm-size Standard_D4s_v5

# EKS Spot Instances via managed node group
eksctl create nodegroup \
  --cluster myCluster \
  --name spot-workers \
  --instance-types m5.xlarge,m5a.xlarge,m5d.xlarge \
  --spot \
  --min-size 0 \
  --max-size 15
```
### 4.2 Schedule Tolerant Workloads on Spot

```yaml
spec:
  tolerations:
  - key: "kubernetes.azure.com/scalesetpriority"
    operator: "Equal"
    value: "spot"
    effect: "NoSchedule"
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: "kubernetes.azure.com/scalesetpriority"
            operator: In
            values: ["spot"]
```
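The labels above are Azure-specific. On EKS, managed spot node groups label their nodes `eks.amazonaws.com/capacityType: SPOT` and do not taint them by default, so node affinity alone is typically enough (add a toleration only if you taint the group yourself). A sketch:

```yaml
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: "eks.amazonaws.com/capacityType"
            operator: In
            values: ["SPOT"]
```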
## Step 5: Implement Pod Disruption Budgets

Protect critical services during node scale-down and spot evictions.
```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: api-pdb
spec:
  minAvailable: 2   # Always keep 2 pods running
  selector:
    matchLabels:
      app: api-service
```
## Step 6: Schedule Non-Production Shutdown

Dev/staging clusters don't need to run 24/7.
```bash
#!/bin/bash
# Cron: scale down dev at 8 PM, scale up at 7 AM

# Scale down
kubectl scale deployment --all --replicas=0 -n dev
kubectl scale deployment --all --replicas=0 -n staging

# Scale up (separate cron job)
# Note: this restores every deployment to 1 replica, not its original count;
# record the original replica counts first if they vary
kubectl scale deployment --all --replicas=1 -n dev
kubectl scale deployment --all --replicas=1 -n staging
```
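Wiring the script above into cron might look like the following; the schedule assumes weekdays only, and the paths are illustrative:

```cron
# Illustrative crontab entries; adjust paths, timezone, and days
0 20 * * 1-5 /opt/k8s/scale-down.sh
0 7  * * 1-5 /opt/k8s/scale-up.sh
```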
## Cost Optimization Checklist

- Resource requests/limits on every container
- Namespace ResourceQuotas for team budgets
- LimitRanges for default container limits
- Cluster Autoscaler enabled with appropriate min/max
- HPA for variable-traffic deployments
- Spot node pools for fault-tolerant workloads
- PodDisruptionBudgets on critical services
- Dev/staging scheduled shutdown
- Regular review of `kubectl top` data
:::note[Source]
This guide is derived from operational intelligence at Garnet Grid Consulting. For enterprise Kubernetes cost audits, visit garnetgrid.com.
:::