HPA Flow
Overview
Traces the complete flow from a metric spike to a scaled Deployment — including the metrics pipeline (metrics-server or Prometheus Adapter), the HPA controller's calculation, and the resulting Deployment update.
HPA Architecture
┌───────────────────────────────────────────────────┐
│ HPA Controller (in kube-controller-manager)       │
│                                                   │
│ every 15 seconds (default):                       │
│  1. Fetch current metrics from metrics APIs       │
│  2. Calculate desired replicas                    │
│  3. Update Deployment.spec.replicas (via /scale)  │
└───────────────────────────────────────────────────┘
                          │  ▲
                          │  │  GET /apis/metrics.k8s.io         (CPU, memory)
                          │  │  GET /apis/custom.metrics.k8s.io  (custom metrics)
                          ▼  │
                      API Server  (aggregation layer proxies to the adapters)
                          │
          ┌───────────────┴───────────────┐
          ▼                               ▼
   metrics-server                  Prometheus Adapter
   (metrics.k8s.io)                (custom.metrics.k8s.io)
   scrapes kubelets                queries Prometheus
   every 60s (default)
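Step 1 boils down to reading a `PodMetricsList` (what `kubectl get --raw /apis/metrics.k8s.io/v1beta1/namespaces/production/pods` returns) and averaging CPU per pod. The Python sketch below shows only that arithmetic, under stated assumptions: the helper names are hypothetical, only the `n`, `u`, `m` and plain-core CPU suffixes are handled, and the real controller additionally relates usage to resource requests and treats unready or missing pods specially.

```python
# Hypothetical helpers: convert metrics.k8s.io CPU quantities and average them per pod.
_CPU_DIVISOR_TO_MILLICORES = {"n": 1_000_000, "u": 1_000, "m": 1}


def cpu_to_millicores(quantity: str) -> float:
    """Convert a CPU quantity such as '450m', '450000000n' or '2' to millicores."""
    suffix = quantity[-1]
    if suffix in _CPU_DIVISOR_TO_MILLICORES:
        return float(quantity[:-1]) / _CPU_DIVISOR_TO_MILLICORES[suffix]
    return float(quantity) * 1000  # plain cores


def average_pod_cpu_millicores(pod_metrics_list: dict) -> float:
    """Average CPU per pod; a pod's usage is the sum of its containers' usage."""
    per_pod = [
        sum(cpu_to_millicores(c["usage"]["cpu"]) for c in pod["containers"])
        for pod in pod_metrics_list["items"]
    ]
    return sum(per_pod) / len(per_pod)


# Trimmed PodMetricsList (timestamp/window fields omitted), shaped like the v1beta1 response
sample = {
    "kind": "PodMetricsList",
    "apiVersion": "metrics.k8s.io/v1beta1",
    "items": [
        {"metadata": {"name": "pod1"},
         "containers": [{"name": "app", "usage": {"cpu": "450000000n", "memory": "200Mi"}}]},
        {"metadata": {"name": "pod2"},
         "containers": [{"name": "app", "usage": {"cpu": "480m", "memory": "210Mi"}}]},
        {"metadata": {"name": "pod3"},
         "containers": [{"name": "app", "usage": {"cpu": "400m", "memory": "190Mi"}},
                        {"name": "sidecar", "usage": {"cpu": "90m", "memory": "20Mi"}}]},
    ],
}

print(round(average_pod_cpu_millicores(sample)))  # (450 + 480 + 490) / 3, prints 473
```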
Full HPA Sequence
metrics-server          API Server      HPA Controller    Deployment Ctrl     kubelet
       │                     │                 │                 │               │
       │ every 60s (default):│                 │                 │               │
       │ scrape kubelets     │                 │                 │               │
       │ keep latest samples │                 │                 │               │
       │ in memory (no etcd) │                 │                 │               │
       │                     │                 │                 │               │
       │                     │                 │ every 15s:      │               │
       │                     │◄── GET ─────────│                 │               │
       │                     │ /apis/metrics.k8s.io/v1beta1/
       │                     │ namespaces/production/pods
       │◄── proxied request ─│                 │                 │               │
       │── PodMetrics ──────►│                 │                 │               │
       │                     │── PodMetrics ──►│                 │               │
       │                     │ pod1: 450m CPU  │                 │               │
       │                     │ pod2: 480m CPU  │                 │               │
       │                     │ pod3: 490m CPU  │                 │               │
       │                     │                 │                 │               │
       │                     │                 │ Calculate:
       │                     │                 │ avg     = (450+480+490)/3 ≈ 473m
       │                     │                 │ target  = 200m (averageValue)
       │                     │                 │ ratio   = 473/200 = 2.365
       │                     │                 │ desired = ceil(3 × 2.365) = 8
       │                     │                 │ 8 ≤ maxReplicas (20): no clamp
       │                     │                 │                 │               │
       │                     │◄── UPDATE /scale│                 │               │
       │                     │ replicas: 3→8   │                 │               │
       │                     │ written to etcd │                 │               │
       │                     │                 │                 │               │
       │                     │── watch event ───────────────────►│               │
       │                     │                 │ (Deployment     │               │
       │                     │                 │  MODIFIED)      │               │
       │                     │                 │                 │               │
       │                     │                 │                 │ Deployment ctrl: scale current RS 3→8
       │                     │                 │                 │ ReplicaSet ctrl: create 5 Pods
       │                     │                 │                 │ scheduler: bind Pods to nodes
       │                     │                 │                 │── new Pods ──►│ start containers
       │                     │                 │                 │               │
       │                     │                 │ [later: load drops and stays low]
       │                     │                 │ Calculate (metrics fetched as above):
       │                     │                 │ avg     = 50m
       │                     │                 │ ratio   = 50/200 = 0.25
       │                     │                 │ desired = ceil(8 × 0.25) = 2
       │                     │                 │ clamped to minReplicas (3) → 3
       │                     │                 │ scale-down stabilization window
       │                     │                 │ (300s, default): use the highest
       │                     │                 │ recommendation of the last 300s,
       │                     │                 │ so replicas stay at 8 until the
       │                     │                 │ high recommendations age out
       │                     │                 │                 │               │
       │                     │◄── UPDATE /scale│                 │               │
       │                     │ replicas: 8→3   │                 │               │
HPA Calculation Formula
desiredReplicas = ceil(currentReplicas × (currentMetric / desiredMetric))
Example — CPU:
currentReplicas = 3
currentCPU = 473m (average across all pods)
targetCPU = 200m (from HPA spec)
ratio = 473 / 200 = 2.365
desired = ceil(3 × 2.365) = ceil(7.095) = 8
Tolerance band (default 10%):
If ratio is within [0.9, 1.1], no scaling action is taken
→ prevents thrashing on minor fluctuations
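The formula, tolerance band and min/max clamp fit in a few lines of Python. This is a simplified sketch (the function name is hypothetical): the real controller clamps only after applying `behavior` policies and stabilisation, and it also adjusts for unready pods and missing metrics.

```python
import math


def desired_replicas(current_replicas: int, current_metric: float, target_metric: float,
                     min_replicas: int, max_replicas: int, tolerance: float = 0.1) -> int:
    """desiredReplicas = ceil(currentReplicas × currentMetric / desiredMetric), then clamped."""
    ratio = current_metric / target_metric
    if abs(ratio - 1.0) <= tolerance:
        desired = current_replicas  # inside the tolerance band: keep the current count
    else:
        desired = math.ceil(current_replicas * ratio)
    # Clamp into [minReplicas, maxReplicas]
    return max(min_replicas, min(max_replicas, desired))


print(desired_replicas(3, 473, 200, min_replicas=3, max_replicas=20))  # ceil(7.095) = 8
print(desired_replicas(8, 50, 200, min_replicas=3, max_replicas=20))   # ceil(2.0) = 2, clamped to 3
print(desired_replicas(5, 210, 200, min_replicas=1, max_replicas=10))  # ratio 1.05: stays at 5
```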
HPA Spec — CPU and Custom Metrics
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: payments-api
  namespace: production
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: payments-api
  minReplicas: 3
  maxReplicas: 20
  metrics:
    # CPU utilization (requires resource requests to be set)
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 60        # 60% of CPU request
    # Custom metric: RPS per pod (via Prometheus Adapter)
    - type: Pods
      pods:
        metric:
          name: http_requests_per_second
        target:
          type: AverageValue
          averageValue: "1000"          # 1000 RPS per pod
    # External metric: SQS queue depth (needs an adapter serving external.metrics.k8s.io)
    - type: External
      external:
        metric:
          name: sqs_queue_depth
          selector:
            matchLabels:
              queue: payments-jobs
        target:
          type: AverageValue
          averageValue: "100"           # 100 messages per pod
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 0     # scale up immediately
      policies:
        - type: Pods
          value: 4
          periodSeconds: 60             # max +4 pods per minute
      selectPolicy: Max
    scaleDown:
      stabilizationWindowSeconds: 300   # wait 5 min before scaling down
      policies:
        - type: Percent
          value: 10
          periodSeconds: 60             # max -10% per minute
      selectPolicy: Min
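Two consequences of this spec are worth making concrete. With several entries under `metrics`, the HPA computes a replica count per metric and uses the largest, and `behavior` then limits how far a single period may move. A Python sketch with hypothetical helper names; it assumes no other scaling happened in the current 60s period, and rounding the 10% step down is an assumption of the sketch rather than a statement about the controller's exact rounding.

```python
def combine_metrics(per_metric_desired: dict) -> int:
    """With several metrics, take the largest per-metric recommendation."""
    return max(per_metric_desired.values())


def apply_behavior(current: int, desired: int, min_replicas: int, max_replicas: int,
                   scale_up_pods: int = 4, scale_down_percent: int = 10) -> int:
    """Limit one period's change to the spec's policies: +4 Pods up, -10% down."""
    if desired > current:
        desired = min(desired, current + scale_up_pods)
    elif desired < current:
        # Integer math avoids float surprises; rounding down is an assumption here
        lowest_allowed = current * (100 - scale_down_percent) // 100
        desired = max(desired, lowest_allowed)
    return max(min_replicas, min(max_replicas, desired))


wanted = combine_metrics({"cpu": 8, "http_requests_per_second": 5, "sqs_queue_depth": 2})
print(wanted)                            # 8 (the CPU metric wins)
print(apply_behavior(3, wanted, 3, 20))  # 3 + 4 = 7 in this period
print(apply_behavior(10, 2, 3, 20))      # 10 - 10% = 9 in this period
```

Because scale-up is capped at +4 pods per minute, reaching 8 replicas from 3 takes two periods even though the formula asks for 8 at once.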
Scale-Down Stabilisation Window
Without a stabilisation window (stabilizationWindowSeconds: 0):
t=0: metric high → scale to 10
t=1m: metric drops → scale to 2 immediately
t=2m: metric spikes → scale to 10 again
→ thrashing
With stabilizationWindowSeconds=300:
HPA tracks every desiredReplicas recommendation from the last 300 seconds
It scales down only to the highest of those, and only when that is below the current count
→ prevents thrashing during transient metric drops
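A minimal Python sketch of the scale-down window, replaying the thrashing timeline above. The class and method names are hypothetical; scale-up is passed through untouched (matching `stabilizationWindowSeconds: 0` for scaleUp), and minReplicas clamping is left out.

```python
from collections import deque


class ScaleDownStabilizer:
    """Hypothetical helper: remembers recommendations for window_seconds."""

    def __init__(self, window_seconds: int = 300):
        self.window_seconds = window_seconds
        self.history = deque()  # (timestamp_seconds, desired_replicas), oldest first

    def recommend(self, now: int, desired: int, current: int) -> int:
        self.history.append((now, desired))
        cutoff = now - self.window_seconds
        # Forget recommendations that have aged out of the window
        while self.history and self.history[0][0] <= cutoff:
            self.history.popleft()
        if desired >= current:
            return desired  # scale up (or hold) immediately in this sketch
        # Scale down no further than the highest recommendation still in the window
        highest = max(d for _, d in self.history)
        return min(current, highest)


s = ScaleDownStabilizer(300)
print(s.recommend(0, 10, 3))     # spike: 10
print(s.recommend(60, 2, 10))    # metric drops: held at 10
print(s.recommend(120, 10, 10))  # spike again: still 10, no thrashing
print(s.recommend(480, 2, 10))   # only low values in the last 300s: 2
```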
Prometheus Adapter — Custom Metrics
# prometheus-adapter ConfigMap: expose RPS as a custom metric
apiVersion: v1
kind: ConfigMap
metadata:
  name: adapter-config
  namespace: monitoring
data:
  config.yaml: |
    rules:
      - seriesQuery: 'http_requests_total{namespace!="",pod!=""}'
        resources:
          overrides:
            namespace: {resource: "namespace"}
            pod: {resource: "pod"}
        name:
          matches: "^(.*)_total$"
          as: "${1}_per_second"
        metricsQuery: |
          sum by (<<.GroupBy>>) (
            rate(<<.Series>>{<<.LabelMatchers>>}[2m])
          )
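What this rule does can be mimicked in Python: rename the series, take a per-second rate of the counter, then sum the rates per pod. This is a rough sketch with hypothetical function names; `simple_rate` handles counter resets but deliberately omits the extrapolation that Prometheus' `rate()` applies, so real values can differ slightly.

```python
import re


def adapter_metric_name(series_name: str) -> str:
    """Apply the rule's name mapping: matches "^(.*)_total$", as "${1}_per_second"."""
    return re.sub(r"^(.*)_total$", r"\1_per_second", series_name)


def simple_rate(samples: list) -> float:
    """Per-second increase of a counter over (timestamp_s, value) samples (no extrapolation)."""
    increase = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        increase += (cur - prev) if cur >= prev else cur  # a drop means the counter reset
    return increase / (samples[-1][0] - samples[0][0])


def rps_by_pod(series_by_pod: dict) -> dict:
    """sum by (pod) (rate(...[2m])): add the rates of every series of each pod."""
    return {pod: sum(simple_rate(s) for s in series) for pod, series in series_by_pod.items()}


print(adapter_metric_name("http_requests_total"))  # http_requests_per_second
# Two series (e.g. two status codes) for one pod over a 2-minute window
print(rps_by_pod({"pod1": [[(0, 0), (60, 30_000), (120, 60_000)],
                           [(0, 0), (60, 30_000), (120, 60_000)]]}))  # {'pod1': 1000.0}
```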
# Verify custom metrics are available
kubectl get --raw /apis/custom.metrics.k8s.io/v1beta1 | jq .
kubectl get --raw \
"/apis/custom.metrics.k8s.io/v1beta1/namespaces/production/pods/*/http_requests_per_second" \
| jq '.items[].value'
HPA Status and Debugging
# Check HPA status (shows current vs target metrics)
kubectl get hpa payments-api -n production
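# NAME           REFERENCE                 TARGETS           MINPODS   MAXPODS   REPLICAS
# payments-api   Deployment/payments-api   473m/200m (cpu)   3         20        8
# (illustrative; the TARGETS format varies by kubectl version, and Utilization targets appear as percentages)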
# Detailed conditions
kubectl describe hpa payments-api -n production
# Conditions:
# AbleToScale: True (ReadyForNewScale)
# ScalingActive: True (ValidMetricFound: the HPA was able to successfully calculate...)
# ScalingLimited: False
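# Common conditions and causes:
# AbleToScale=False, FailedGetScale              → scaleTargetRef wrong or target missing
# ScalingActive=False, FailedGetResourceMetric   → metrics-server not running, or pods lack resource requests
# ScalingActive=False, FailedGetPodsMetric / FailedGetExternalMetric → adapter not serving the metric
# ScalingActive=False, InvalidSelector           → target's label selector missing or invalid
# ScalingLimited=True, TooManyReplicas           → hit maxReplicas
# ScalingLimited=True, TooFewReplicas            → hit minReplicas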
# Check if metrics-server is healthy
kubectl get pods -n kube-system -l k8s-app=metrics-server
kubectl top pods -n production # should show CPU values
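When several HPAs need checking, the same triage can be scripted against `kubectl get hpa payments-api -n production -o json`, whose `status.conditions` entries carry `type`, `status` and `reason`. A Python sketch; the `diagnose` helper and its hint texts are hypothetical and only summarise the condition list above.

```python
# Hypothetical triage table: (type, status) pairs that indicate a problem.
PROBLEM_CONDITIONS = {
    ("AbleToScale", "False"): "cannot get/update the target's scale: check scaleTargetRef",
    ("ScalingActive", "False"): "no usable metric: check metrics-server, adapters, requests",
    ("ScalingLimited", "True"): "clamped by minReplicas/maxReplicas or a behavior policy",
}


def diagnose(hpa: dict) -> list:
    """Return one line per problematic condition in an HPA object (from -o json)."""
    findings = []
    for cond in hpa.get("status", {}).get("conditions", []):
        hint = PROBLEM_CONDITIONS.get((cond["type"], cond["status"]))
        if hint:
            findings.append(f"{cond['type']}={cond['status']} ({cond.get('reason', '?')}): {hint}")
    return findings


hpa = {"status": {"conditions": [
    {"type": "AbleToScale", "status": "True", "reason": "ReadyForNewScale"},
    {"type": "ScalingActive", "status": "False", "reason": "FailedGetResourceMetric"},
    {"type": "ScalingLimited", "status": "False", "reason": "DesiredWithinRange"},
]}}
for line in diagnose(hpa):
    print(line)
```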
VPA vs HPA Interaction Rule
DO NOT use both VPA (CPU scaling) and HPA (CPU scaling) on the same Deployment.
Why: Both controllers will fight — HPA scales replicas based on CPU utilisation,
while VPA adjusts CPU requests, changing the denominator of HPA's calculation.
This causes oscillation.
Safe combinations:
✓ HPA (CPU/RPS) + VPA (memory only: controlledResources: [memory], controlledValues: RequestsOnly)
✓ HPA (custom RPS metric) + VPA (CPU and memory)
✓ VPA (all resources) + no HPA
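The denominator effect is easy to see numerically. A minimal Python sketch with hypothetical helper names and illustrative numbers (three pods each using 240m CPU, a 60% target); rounding utilisation down to a whole percent is an assumption of the sketch. The load is identical in both calls, and only the request that VPA would adjust changes.

```python
import math


def cpu_utilization_percent(usage_millicores: list, request_millicores: int) -> int:
    """Average usage as a whole percentage of the per-pod CPU request (rounded down)."""
    return 100 * sum(usage_millicores) // (request_millicores * len(usage_millicores))


def hpa_desired(usage_millicores: list, request_millicores: int, target_percent: int,
                tolerance: float = 0.1) -> int:
    current = len(usage_millicores)
    ratio = cpu_utilization_percent(usage_millicores, request_millicores) / target_percent
    if abs(ratio - 1.0) <= tolerance:
        return current  # within tolerance: no change
    return math.ceil(current * ratio)


usage = [240, 240, 240]             # same load in both cases (3 pods)
print(hpa_desired(usage, 200, 60))  # request 200m: 120% utilisation, wants 6 pods
print(hpa_desired(usage, 400, 60))  # VPA raised request to 400m: 60%, stays at 3
```

Each controller's move shifts the other's input: VPA raising the request makes HPA see less load, while HPA adding replicas lowers per-pod usage, which VPA answers by shrinking the request again.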
Related
- 07 — VPA — VPA configuration
- 01 — Capacity Planning — HPA + Karpenter interaction
- 05 — Rolling Update Flow — how replicas change reaches pods