Overview

Traces the complete flow from a metric spike to a scaled Deployment — including the metrics pipeline (metrics-server or Prometheus Adapter), the HPA controller's calculation, and the resulting Deployment update.

HPA Architecture

                    ┌──────────────────────────────────────────────────────┐
                    │ HPA Controller (in kube-controller-manager)          │
                    │                                                      │
                    │ every 15 seconds (default):                          │
                    │  1. Fetch current metrics from the metrics APIs      │
                    │  2. Calculate desired replicas                       │
                    │  3. Update Deployment.spec.replicas (via /scale)     │
                    └──────────────────────────────────────────────────────┘
                         │
                         │ resource metrics: GET /apis/metrics.k8s.io
                         │ custom metrics:   GET /apis/custom.metrics.k8s.io
                         ▼
                    API Server (aggregation layer proxies each API group)
                         │                             │
                         ▼                             ▼
                    metrics-server                Prometheus Adapter
                    (scrapes kubelets every 60s   (queries Prometheus)
                     by default; keeps metrics
                     in memory, not in etcd)

Full HPA Sequence

metrics-server      API Server       HPA Controller    Deployment Ctrl    kubelet
      │                 │                  │                  │               │
      │  every 60s:     │                  │                  │               │
      │  scrape each    │                  │                  │               │
      │  kubelet; keep  │                  │                  │               │
      │  PodMetrics in  │                  │                  │               │
      │  memory (no     │                  │                  │               │
      │  etcd writes)   │                  │                  │               │
      │                 │                  │                  │               │
      │                 │                  │ every 15s:       │               │
      │                 │◄── GET metrics ──│                  │               │
      │                 │  /apis/metrics.k8s.io/v1beta1/      │               │
      │                 │  namespaces/production/pods         │               │
      │◄── proxy GET ───│                  │                  │               │
      │── PodMetrics ──►│                  │                  │               │
      │                 │── metrics ──────►│                  │               │
      │                 │   pod1: 450m CPU │                  │               │
      │                 │   pod2: 480m CPU │                  │               │
      │                 │   pod3: 490m CPU │                  │               │
      │                 │                  │                  │               │
      │                 │                  │ Calculate:       │               │
      │                 │                  │ avg = 473m       │               │
      │                 │                  │ target = 200m    │               │
      │                 │                  │ ratio = 473/200 = 2.37           │
      │                 │                  │ desired = ceil(3 × 2.37) = 8     │
      │                 │                  │ clamp to min=3, max=20 → 8       │
      │                 │                  │ (behavior rate limits omitted)   │
      │                 │                  │                  │               │
      │                 │◄── PUT /scale ───│                  │               │
      │                 │  Deployment      │                  │               │
      │                 │  replicas: 3→8   │                  │               │
      │                 │  (stored in etcd)│                  │               │
      │                 │                  │                  │               │
      │                 │──WATCH event ──────────────────────►│               │
      │                 │  (Deployment Modified)              │               │
      │                 │                                     │               │
      │                 │                  [Deployment controller]            │
      │                 │                  current ReplicaSet replicas: 3→8  │
      │                 │                  [ReplicaSet controller]            │
      │                 │                  creates 5 Pods; scheduler binds    │
      │                 │                  them to nodes; kubelet starts ─────►
      │                 │                                                     │
      │  [later: load drops; average CPU falls to 50m]                        │
      │                 │                  │                  │               │
      │                 │                  │ Calculate:       │               │
      │                 │                  │ avg = 50m        │               │
      │                 │                  │ ratio = 50/200 = 0.25            │
      │                 │                  │ desired = ceil(8 × 0.25) = 2     │
      │                 │                  │ clamp to min=3 → 3               │
      │                 │                  │ scale-down stabilization window: │
      │                 │                  │ 300s (default); replicas drop    │
      │                 │                  │ only once every recommendation   │
      │                 │                  │ in the last 300s is ≤ 3          │
      │                 │                  │                  │               │
      │                 │◄── PUT /scale ───│ (after window)   │               │
      │                 │  replicas: 8→3   │                  │               │

HPA Calculation Formula

desiredReplicas = ceil(currentReplicas × (currentMetric / desiredMetric))
(the result is then clamped to [minReplicas, maxReplicas])

Example — CPU:
  currentReplicas = 3
  currentCPU = 473m (average across all pods)
  targetCPU = 200m (from HPA spec)
  ratio = 473 / 200 = 2.365
  desired = ceil(3 × 2.365) = ceil(7.095) = 8

Tolerance band (default 10%, configurable with kube-controller-manager's
--horizontal-pod-autoscaler-tolerance flag):
  If the ratio is within [0.9, 1.1], no scaling action is taken
  → prevents thrashing on minor fluctuations
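A minimal Python sketch of the formula, the tolerance band and the min/max clamp, assuming a single metric and ignoring the adjustments the real controller also makes for unready pods and missing metrics. The function name `desired_replicas` and its parameters are illustrative, not a Kubernetes API.

```python
import math


def desired_replicas(current_replicas: int, current_metric: float, target_metric: float,
                     min_replicas: int, max_replicas: int, tolerance: float = 0.1) -> int:
    """Simplified HPA recommendation for one metric (hypothetical helper)."""
    ratio = current_metric / target_metric
    if abs(ratio - 1.0) <= tolerance:
        # Inside the tolerance band: keep the current replica count
        desired = current_replicas
    else:
        desired = math.ceil(current_replicas * ratio)
    # Clamp to the HPA's replica bounds
    return max(min_replicas, min(max_replicas, desired))


# Three pods at 450m, 480m and 490m CPU against a 200m target
avg_cpu = (450 + 480 + 490) / 3
print(desired_replicas(3, avg_cpu, 200, min_replicas=3, max_replicas=20))  # 8
print(desired_replicas(8, 50, 200, min_replicas=3, max_replicas=20))       # 3 (ceil gives 2, clamped to min)
print(desired_replicas(3, 210, 200, min_replicas=3, max_replicas=20))      # 3 (ratio 1.05 is within tolerance)
```

Working from the summed pod metrics gives the same answer: 1420m / 200m = 7.1, which rounds up to 8.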

HPA Spec — CPU and Custom Metrics

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: payments-api
  namespace: production
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: payments-api
  minReplicas: 3
  maxReplicas: 20

  metrics:
  # CPU utilization (requires resource requests to be set)
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 60    # 60% of CPU request

  # Custom metric — RPS per pod (via Prometheus Adapter)
  - type: Pods
    pods:
      metric:
        name: http_requests_per_second
      target:
        type: AverageValue
        averageValue: "1000"      # 1000 RPS per pod

  # External metric — SQS queue depth
  - type: External
    external:
      metric:
        name: sqs_queue_depth
        selector:
          matchLabels:
            queue: payments-jobs
      target:
        type: AverageValue
        averageValue: "100"       # 100 messages per pod

  behavior:
    scaleUp:
      stabilizationWindowSeconds: 0     # scale up immediately
      policies:
      - type: Pods
        value: 4
        periodSeconds: 60               # max +4 pods per minute
      selectPolicy: Max
    scaleDown:
      stabilizationWindowSeconds: 300   # wait 5 min before scaling down
      policies:
      - type: Percent
        value: 10
        periodSeconds: 60               # max -10% per minute
      selectPolicy: Min
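When several metrics are listed, the HPA computes a replica count for each one and acts on the largest; the behavior policies then bound how far a single step may move. Pods policies allow a fixed change, Percent policies a change relative to the replica count, and selectPolicy: Max picks whichever policy permits the larger change (Min the smaller). The Python sketch below is a simplified model of those rules, assuming no other scaling happened earlier in the policy period (so the period starts at the current replica count) and skipping the stabilisation window; the `Policy` class, the function names and the per-metric numbers are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class Policy:
    type: str            # "Pods" or "Percent", as in behavior.*.policies[].type
    value: int
    period_seconds: int  # unused: the sketch assumes no earlier scaling in the period


def scale_up_limit(current: int, policies: List[Policy], select_policy: str) -> int:
    limits = [current + p.value if p.type == "Pods"
              else math.ceil(current * (1 + p.value / 100))
              for p in policies]
    # Max = the most permissive policy (highest ceiling); Min = the most conservative
    return max(limits) if select_policy == "Max" else min(limits)


def scale_down_limit(current: int, policies: List[Policy], select_policy: str) -> int:
    limits = [current - p.value if p.type == "Pods"
              else math.floor(current * (1 - p.value / 100))
              for p in policies]
    # When shrinking, the most permissive policy is the one with the lowest floor
    return min(limits) if select_policy == "Max" else max(limits)


def next_replicas(current: int, per_metric: List[int], min_replicas: int, max_replicas: int,
                  up: List[Policy], up_select: str,
                  down: List[Policy], down_select: str) -> int:
    desired = max(per_metric)  # the largest per-metric recommendation wins
    upper = min(scale_up_limit(current, up, up_select), max_replicas)
    lower = max(scale_down_limit(current, down, down_select), min_replicas)
    return max(lower, min(upper, desired))


# Policies from the payments-api spec above; per-metric recommendations are hypothetical
UP = [Policy("Pods", 4, 60)]
DOWN = [Policy("Percent", 10, 60)]
print(next_replicas(3, [8, 5, 2], 3, 20, UP, "Max", DOWN, "Min"))  # 7: +4 pods per minute caps the step
print(next_replicas(8, [3, 2, 2], 3, 20, UP, "Max", DOWN, "Min"))  # 7: 10% of 8, rounded up, is one pod
```

With these policies, a jump from 3 to 8 replicas takes two periods, and scaling 8 down to 3 proceeds one pod at a time.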

Scale-Down Stabilisation Window

Without stabilisation window:
  t=0:  metric high → scale to 10
  t=1m: metric drops → scale to 2 immediately
  t=2m: metric spikes → scale to 10 again
  → thrashing

With stabilizationWindowSeconds=300:
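  HPA records every desiredReplicas recommendation and, when scaling down,
  uses the highest recommendation from the last 300 seconds
  It only scales down if that maximum is below the current replica count,
  and then only as far as that maximum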
  → prevents thrashing during transient metric drops
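The window logic can be sketched in Python. This is a simplified model under stated assumptions: one recommendation per sync, timestamps in seconds, a 300s scale-down window and a 0s scale-up window as in the spec above. Scale-down uses the highest recommendation inside the down window, scale-up the lowest inside the up window. The `Stabilizer` class and its method are hypothetical names.

```python
from typing import List, Tuple


class Stabilizer:
    """Simplified model of HPA recommendation stabilisation (hypothetical helper)."""

    def __init__(self, down_window_s: float = 300.0, up_window_s: float = 0.0):
        self.down_window_s = down_window_s
        self.up_window_s = up_window_s
        self.history: List[Tuple[float, int]] = []  # (timestamp, desired replicas)

    def stabilize(self, now: float, current: int, desired: int) -> int:
        up_rec = desired    # lowest recommendation inside the scale-up window
        down_rec = desired  # highest recommendation inside the scale-down window
        for ts, rec in self.history:
            if ts > now - self.up_window_s:
                up_rec = min(up_rec, rec)
            if ts > now - self.down_window_s:
                down_rec = max(down_rec, rec)
        self.history.append((now, desired))
        # Forget recommendations that have left both windows
        horizon = now - max(self.up_window_s, self.down_window_s)
        self.history = [(ts, rec) for ts, rec in self.history if ts > horizon]
        # Move towards the new recommendation only as far as both windows agree
        return min(max(current, up_rec), down_rec)


# A timeline modelled on the example above: 10 → 2 → 10, then a sustained 2
s = Stabilizer()
replicas = 10
for t, desired in [(0, 10), (60, 2), (120, 10), (400, 2), (500, 2)]:
    replicas = s.stabilize(t, replicas, desired)
    print(t, replicas)
# 0 10, 60 10, 120 10, 400 10, 500 2 (one pair per line)
```

The transient drop at t=60s never reaches the Deployment; the scale-down happens only at t=500s, once the 10 recorded at t=120s has aged out of the window.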

Prometheus Adapter — Custom Metrics

# prometheus-adapter ConfigMap — expose RPS as custom metric
apiVersion: v1
kind: ConfigMap
metadata:
  name: adapter-config
  namespace: monitoring
data:
  config.yaml: |
    rules:
    - seriesQuery: 'http_requests_total{namespace!="",pod!=""}'
      resources:
        overrides:
          namespace: {resource: "namespace"}
          pod: {resource: "pod"}
      name:
        matches: "^(.*)_total$"
        as: "${1}_per_second"
      metricsQuery: |
        sum by (<<.GroupBy>>) (
          rate(<<.Series>>{<<.LabelMatchers>>}[2m])
        )

# Verify custom metrics are available
kubectl get --raw /apis/custom.metrics.k8s.io/v1beta1 | jq .
kubectl get --raw \
  "/apis/custom.metrics.k8s.io/v1beta1/namespaces/production/pods/*/http_requests_per_second" \
  | jq '.items[].value'
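To see what the adapter rule produces, here is a Python sketch of its two pieces under simplifying assumptions: the `name` rewrite (Python's `\1` plays the role of the adapter's `${1}`) and a naive per-second rate over one pod's counter samples. The rate function only approximates PromQL `rate()`: it handles counter resets but omits Prometheus's extrapolation to the window edges. Function names and sample values are hypothetical.

```python
import re
from typing import List, Optional, Tuple

MATCHES = r"^(.*)_total$"  # name.matches from the adapter rule
AS = r"\1_per_second"      # Python spelling of name.as: "${1}_per_second"


def exposed_name(series: str) -> Optional[str]:
    """Custom-metric name the rule would expose for a Prometheus series, or None."""
    if re.match(MATCHES, series) is None:
        return None
    return re.sub(MATCHES, AS, series)


def naive_rate(samples: List[Tuple[float, float]]) -> float:
    """Per-second increase of a counter over (timestamp_s, value) samples."""
    increase = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        # A drop means the counter reset (e.g. pod restart): count from zero again
        increase += cur - prev if cur >= prev else cur
    elapsed = samples[-1][0] - samples[0][0]
    return increase / elapsed


print(exposed_name("http_requests_total"))            # http_requests_per_second
print(exposed_name("process_resident_memory_bytes"))  # None
# Hypothetical samples for one pod across a 2m window
print(naive_rate([(0, 1000), (60, 61000), (120, 121000)]))  # 1000.0
```

A pod at exactly 1000 requests per second sits on the `averageValue: "1000"` target in the HPA spec above, so that metric alone would recommend no change.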

HPA Status and Debugging

# Check HPA status (shows current vs target metrics)
kubectl get hpa payments-api -n production
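# Illustrative output for a 200m average-CPU target (TARGETS format and
# extra columns such as AGE vary by kubectl version):
# NAME           REFERENCE                   TARGETS          MINPODS  MAXPODS  REPLICAS
# payments-api   Deployment/payments-api     cpu: 473m/200m   3        20       8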

# Detailed conditions
kubectl describe hpa payments-api -n production
# Conditions:
#   AbleToScale: True (ReadyForNewScale)
#   ScalingActive: True (ValidMetricFound: the HPA was able to successfully calculate...)
#   ScalingLimited: False

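# Common conditions and causes:
# AbleToScale=False, FailedGetScale → controller cannot read the target's scale (wrong scaleTargetRef)
# ScalingActive=False, FailedGetResourceMetric → metrics-server not running, or pods lack CPU requests
# ScalingActive=False, FailedGetPodsMetric → custom metric not served (check Prometheus Adapter)
# ScalingActive=False, InvalidSelector → the target's label selector is empty or invalid
# ScalingLimited=True, TooManyReplicas → hit maxReplicas
# ScalingLimited=True, TooFewReplicas → hit minReplicas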

# Check if metrics-server is healthy
kubectl get pods -n kube-system -l k8s-app=metrics-server
kubectl top pods -n production    # should show CPU values
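The condition checks can be scripted. A minimal Python sketch, assuming the HPA is fetched as JSON (for example with `kubectl get hpa payments-api -n production -o json`) and relying only on the standard `status.conditions[]` fields `type`, `status` and `reason`. The hint table is a hand-picked, non-exhaustive selection, and `diagnose` is a hypothetical name.

```python
import json
from typing import Dict, List, Tuple

# (condition type, status, reason) -> likely cause
HINTS: Dict[Tuple[str, str, str], str] = {
    ("AbleToScale", "False", "FailedGetScale"): "cannot read the target's scale: check scaleTargetRef",
    ("ScalingActive", "False", "FailedGetResourceMetric"): "no CPU/memory metrics: check metrics-server and resource requests",
    ("ScalingActive", "False", "FailedGetPodsMetric"): "custom metric missing: check the metrics adapter",
    ("ScalingActive", "False", "InvalidSelector"): "the target's label selector is empty or invalid",
    ("ScalingLimited", "True", "TooManyReplicas"): "capped at maxReplicas",
    ("ScalingLimited", "True", "TooFewReplicas"): "held at minReplicas",
}


def diagnose(hpa_json: str) -> List[str]:
    """Return a hint for every recognised condition in an HPA's JSON status."""
    hpa = json.loads(hpa_json)
    hints = []
    for cond in hpa.get("status", {}).get("conditions", []):
        key = (cond.get("type"), cond.get("status"), cond.get("reason"))
        if key in HINTS:
            hints.append(f"{key[0]}={key[1]} ({key[2]}): {HINTS[key]}")
    return hints


# Hypothetical status excerpt, shaped like `kubectl get hpa ... -o json` output
example = json.dumps({"status": {"conditions": [
    {"type": "AbleToScale", "status": "True", "reason": "ReadyForNewScale"},
    {"type": "ScalingActive", "status": "False", "reason": "FailedGetResourceMetric"},
]}})
for line in diagnose(example):
    print(line)
```

Healthy conditions such as ReadyForNewScale are ignored, so an empty result means none of the listed failure modes is active.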

VPA vs HPA Interaction Rule

DO NOT use both VPA (CPU scaling) and HPA (CPU scaling) on the same Deployment.

Why: Both controllers will fight — HPA scales replicas based on CPU utilisation,
     while VPA adjusts CPU requests, changing the denominator of HPA's calculation.
     This causes oscillation.
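A short worked example of the coupling, in Python. CPU usage stays constant, but when VPA doubles the CPU request the measured utilisation halves, and the HPA's recommendation drops even though load did not change. The usage, request and replica numbers are hypothetical; the 60% target matches the HPA spec above, and the helper ignores the tolerance band and min/max clamping.

```python
import math


def cpu_utilization_pct(usage_m: float, request_m: float) -> float:
    # Utilization targets are measured against the pods' CPU *requests*
    return 100 * usage_m / request_m


def hpa_desired(current_replicas: int, utilization_pct: float, target_pct: float) -> int:
    # Simplified HPA formula: no tolerance band, no min/max clamping
    return math.ceil(current_replicas * utilization_pct / target_pct)


USAGE_M = 300    # hypothetical steady per-pod CPU usage, in millicores
TARGET_PCT = 60  # averageUtilization from the HPA spec above
REPLICAS = 4     # hypothetical current replica count

before = hpa_desired(REPLICAS, cpu_utilization_pct(USAGE_M, 250), TARGET_PCT)  # request 250m
after = hpa_desired(REPLICAS, cpu_utilization_pct(USAGE_M, 500), TARGET_PCT)   # VPA raised it to 500m
print(before, after)  # 8 4
```

The same 300m of real work yields 120% utilisation before the VPA change and 60% after, so the two controllers keep moving each other's inputs.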

Safe combinations:
  ✓ HPA (CPU/RPS) + VPA (memory only, via resourcePolicy controlledResources: ["memory"])
  ✓ HPA (custom RPS metric) + VPA (CPU and memory)
  ✓ VPA (all resources) + no HPA