Horizontal Pod Autoscaler

Scale workload replicas automatically based on resource utilization and custom metrics

autoscaling/v2 GA 1.23 Platform Engineer

The Horizontal Pod Autoscaler (HPA) adjusts the replicas field of a scalable workload (Deployment, StatefulSet, ReplicaSet, or any resource implementing the scale subresource) in response to observed metrics. It operates as a control loop in kube-controller-manager, periodically comparing current metric values against targets and computing a new desired replica count. The autoscaling/v2 API (GA since 1.23) supports multiple metric sources, fine-grained scaling behavior, and per-container metrics.

Control Loop & Metrics Pipeline

HPA control loop (default sync period: 15s)

HPA Controller (kube-controller-manager)

  1. Fetch current metrics for each metric source:
     ├─ Resource metrics → metrics-server (metrics.k8s.io)
     ├─ Custom metrics   → Prometheus Adapter (custom.metrics.k8s.io)
     └─ External metrics → Adapter (external.metrics.k8s.io)

  2. For each metric source compute:
     desiredReplicas = ceil(currentReplicas × currentMetricValue / targetValue)

  3. Take the MAXIMUM desired replicas across all metrics
     (any metric can drive scale-up)

  4. Clamp to [minReplicas, maxReplicas]

  5. Apply scaling behavior constraints
     (stabilization window, rate limits)

  6. If desiredReplicas ≠ currentReplicas → update .replicas

metrics-server scrapes kubelet /metrics/resource at its --metric-resolution interval (60s flag default; often set to 15s)
Prometheus Adapter bridges Prometheus → custom.metrics.k8s.io
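Steps 2 to 4 of the loop can be sketched in a few lines of Python. This is an illustration only: the function names and sample numbers are hypothetical, and the real controller additionally applies a tolerance band, pod readiness handling, and the behavior constraints from step 5.

```python
import math


def desired_for_metric(current_replicas, current_value, target_value):
    """Step 2: one proposal per metric, ceil(currentReplicas * current / target)."""
    return math.ceil(current_replicas * current_value / target_value)


def reconcile(current_replicas, metrics, min_replicas, max_replicas):
    """Steps 2-4: propose per metric, keep the maximum, clamp to the bounds.

    metrics is a list of (current_value, target_value) pairs.
    """
    proposals = [
        desired_for_metric(current_replicas, current, target)
        for current, target in metrics
    ]
    return max(min_replicas, min(max_replicas, max(proposals)))


# 8 replicas today; CPU at 90% against a 70% target proposes 11,
# RPS at 400 against a 500 per-pod target proposes 7. The larger proposal wins.
print(reconcile(8, [(90, 70), (400, 500)], min_replicas=3, max_replicas=50))  # 11
```

Taking the maximum means a single hot metric is enough to scale out, while scale-in only happens once every metric agrees that fewer replicas suffice.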

Replica Calculation Formula

For a utilization target (percentage of requests):

desiredReplicas = ceil(currentReplicas × (currentUtilization / targetUtilization))

For an average value target:

desiredReplicas = ceil(currentReplicas × (totalMetricValue / (targetAverageValue × currentReplicas)))
                = ceil(totalMetricValue / targetAverageValue)
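A worked version of both formulas, as a minimal sketch. The function names are hypothetical. The sketch also includes the controller's tolerance band, which the formulas above omit: assuming the default of 0.1 (the kube-controller-manager flag --horizontal-pod-autoscaler-tolerance), a ratio within 10% of 1.0 leaves the replica count unchanged.

```python
import math

TOLERANCE = 0.1  # assumed default of --horizontal-pod-autoscaler-tolerance


def desired_from_utilization(current_replicas, current_utilization, target_utilization):
    """Utilization target: scale the current count by the utilization ratio."""
    ratio = current_utilization / target_utilization
    if abs(ratio - 1.0) <= TOLERANCE:
        return current_replicas  # close enough to target: no change
    return math.ceil(current_replicas * ratio)


def desired_from_average_value(current_replicas, total_metric_value, target_average_value):
    """AverageValue target: currentReplicas cancels out of the formula."""
    ratio = total_metric_value / (target_average_value * current_replicas)
    if abs(ratio - 1.0) <= TOLERANCE:
        return current_replicas
    return math.ceil(total_metric_value / target_average_value)


print(desired_from_utilization(8, 90, 70))       # 11: ceil(8 * 90 / 70)
print(desired_from_utilization(8, 72, 70))       # 8: ratio 1.03 is inside the tolerance
print(desired_from_average_value(4, 3000, 500))  # 6: ceil(3000 / 500)
```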

Pods in non-Ready state (Pending, Terminating, or recently started) are excluded from metric averaging to avoid false scale-ups during rollouts.

Missing metrics = conservative default
If metric data is unavailable for a pod (e.g., the pod just started or metrics-server is lagging), the controller recomputes the average conservatively: the pod is assumed to be consuming 100% of the target when the tentative decision is a scale-down, and 0% when it is a scale-up. This dampens the magnitude of the change in either direction and prevents premature scale-down during rollouts. If metrics are unavailable for all pods, the HPA takes no action (it neither scales up nor down).
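The handling of missing metrics can be sketched as follows. This is a simplified model, not the controller's code: the function name is hypothetical, it covers only an AverageValue-style metric, and it assumes the upstream behavior of filling missing pods with the target value when the tentative direction is down and with zero when it is up.

```python
import math

TOLERANCE = 0.1  # assumed default tolerance


def desired_with_missing(reported_values, missing_pods, target_average_value):
    """reported_values: per-pod metric values; missing_pods: pods with no data."""
    current_replicas = len(reported_values) + missing_pods
    if not reported_values:
        return current_replicas  # no data at all: take no action

    ratio = sum(reported_values) / len(reported_values) / target_average_value
    if missing_pods == 0:
        if abs(ratio - 1.0) <= TOLERANCE:
            return current_replicas
        return math.ceil(ratio * len(reported_values))

    # Fill the gaps so that they work against the tentative direction.
    if ratio < 1.0:
        fill = target_average_value  # scale-down: assume missing pods sit at target
    elif ratio > 1.0:
        fill = 0.0                   # scale-up: assume missing pods use nothing
    else:
        return current_replicas

    values = list(reported_values) + [fill] * missing_pods
    new_ratio = sum(values) / len(values) / target_average_value
    flipped = (ratio < 1.0 < new_ratio) or (ratio > 1.0 > new_ratio)
    if abs(new_ratio - 1.0) <= TOLERANCE or flipped:
        return current_replicas      # the evidence is too weak to act on
    return math.ceil(new_ratio * len(values))


print(desired_with_missing([900, 900, 900], 1, 500))  # 6: damped scale-up
print(desired_with_missing([600], 3, 500))            # 4: direction flipped, no change
```

In the first call the three reporting pods alone would suggest a ratio of 1.8; counting the silent pod as zero lowers it to 1.35, so the HPA scales to 6 rather than 8.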

HPA Spec — Full Reference

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api-server-hpa
  namespace: production
spec:
  # --- Target workload ---
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-server

  # --- Replica bounds ---
  minReplicas: 3          # Never scale below this (default: 1)
  maxReplicas: 50         # Hard ceiling

  # --- Metrics (evaluated independently; max desired wins) ---
  metrics:
    # 1. CPU utilization (% of requests)
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70     # Target 70% of CPU requests across all pods

    # 2. Memory — use AverageValue, not Utilization (memory doesn't release fast)
    - type: Resource
      resource:
        name: memory
        target:
          type: AverageValue
          averageValue: 800Mi        # Target 800Mi average per pod

    # 3. Per-container CPU (ContainerResource: alpha in 1.20, beta in 1.27, GA in 1.30)
    - type: ContainerResource
      containerResource:
        name: cpu
        container: proxy-sidecar    # Scale based on sidecar, not main container
        target:
          type: Utilization
          averageUtilization: 80

    # 4. Custom metric from Pods (per-pod, averaged across replicas)
    - type: Pods
      pods:
        metric:
          name: http_requests_per_second
        target:
          type: AverageValue
          averageValue: "500"        # 500 RPS per pod target

    # 5. Custom metric from Object (single source, e.g., Ingress)
    - type: Object
      object:
        metric:
          name: ingress_requests_per_second
        describedObject:
          apiVersion: networking.k8s.io/v1
          kind: Ingress
          name: api-ingress
        target:
          type: Value
          value: "10000"            # Total 10k RPS on this Ingress

    # 6. External metric (cloud queue, external system)
    - type: External
      external:
        metric:
          name: sqs_messages_visible
          selector:
            matchLabels:
              queue: api-job-queue
        target:
          type: AverageValue
          averageValue: "30"        # 30 messages per worker pod

  # --- Scaling behavior ---
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 0      # Scale up immediately (no dampening)
      policies:
        - type: Pods
          value: 4                        # Add at most 4 pods per period
          periodSeconds: 60
        - type: Percent
          value: 100                      # Or double replicas per period
          periodSeconds: 60
      selectPolicy: Max                   # Use whichever policy allows more pods

    scaleDown:
      stabilizationWindowSeconds: 300    # Wait 5 min of consistently low metrics
      policies:
        - type: Pods
          value: 2                        # Remove at most 2 pods per period
          periodSeconds: 60
        - type: Percent
          value: 10                       # Or 10% of replicas per period
          periodSeconds: 60
      selectPolicy: Min                   # Use whichever policy removes fewer pods

Metric Types in Detail

Resource Metrics

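Resource metrics use the metrics.k8s.io API provided by metrics-server. Metrics-server scrapes each kubelet's resource metrics endpoint (/metrics/resource) at a fixed interval (60 seconds by default, configurable with --metric-resolution) and serves the most recent values it has computed; CPU usage is a rate over the latest scrape window, not a long-running average.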

Target type	Formula	Best for
`Utilization`	`currentCPU / requests.cpu × 100`	CPU — pods must have CPU requests set
`AverageValue`	Total metric value across pods / replica count	Memory, custom per-pod metrics
`Value`	Raw single value (Object/External only)	Queue depth, global counters
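The Utilization row can be made concrete with a small sketch. It assumes, as the upstream controller does to the best of my understanding, that utilization is total usage divided by total requests across the pods that reported metrics, using integer arithmetic. The function name and the millicore figures are hypothetical.

```python
def cpu_utilization_percent(pods):
    """pods: list of (usage_millicores, request_millicores or None)."""
    total_usage = 0
    total_request = 0
    for usage, request in pods:
        if request is None:
            # Without a request there is no denominator for the percentage.
            raise ValueError("missing request for cpu")
        total_usage += usage
        total_request += request
    return 100 * total_usage // total_request


# Three pods, each requesting 500m, using 350m, 450m and 250m.
print(cpu_utilization_percent([(350, 500), (450, 500), (250, 500)]))  # 70
```

A single container without a CPU request is enough to make the percentage undefined, which is why the table lists requests as a precondition.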

Memory autoscaling anti-pattern
Using a Utilization target for memory is misleading: a JVM heap that is 90% allocated but not under GC pressure will trigger scale-up even if the application is healthy. Prefer AverageValue with generous headroom above the working set. Also note that managed runtimes (JVM, Go) rarely return memory to the OS once it is allocated, so per-pod memory can stay high after load drops until the pod is restarted, and adding replicas does not shrink the footprint of existing pods. The scale-down stabilization window (300s default) is critical for memory-based scaling.

ContainerResource (alpha 1.20, GA 1.30)

When pods run multiple containers, Resource metrics aggregate all containers. ContainerResource targets a specific container, enabling independent scaling decisions:

- type: ContainerResource
  containerResource:
    name: memory
    container: app               # Only consider the app container's memory
    target:
      type: AverageValue
      averageValue: 512Mi

This is essential when a heavyweight sidecar (Istio Envoy, Datadog agent) consumes disproportionate resources — you don't want sidecar resource usage to drive scaling of the main application.

Pods Metric

The Pods metric type reads from custom.metrics.k8s.io and averages the named metric across all pods selected by the HPA's target. The Prometheus Adapter or a custom adapter must serve this API.

- type: Pods
  pods:
    metric:
      name: http_requests_per_second
      selector:                       # Optional: filter by label on the metric series
        matchLabels:
          route: /api/v2
    target:
      type: AverageValue
      averageValue: "1000"           # 1000 req/s per pod

Object Metric

The Object type reads a single metric value from a specific Kubernetes object. Common use: total request rate on an Ingress, queue depth on a Kafka topic CRD.

- type: Object
  object:
    metric:
      name: nginx_ingress_requests_per_second
    describedObject:
      apiVersion: networking.k8s.io/v1
      kind: Ingress
      name: frontend-ingress
    target:
      type: Value
      value: "5000"          # Total RPS on the Ingress drives replica count
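For Object metrics the choice of target type changes the arithmetic. The sketch below compares the two; the function names are hypothetical and the tolerance band is left out for brevity.

```python
import math


def object_value_target(current_replicas, metric_value, target_value):
    """type: Value. The object's metric is compared to the target as a ratio,
    and that ratio scales the current replica count."""
    return math.ceil(current_replicas * metric_value / target_value)


def object_average_value_target(metric_value, target_average_value):
    """type: AverageValue. The metric is divided by the replica count before
    the comparison, so the result reduces to ceil(metric / target)."""
    return math.ceil(metric_value / target_average_value)


# An Ingress receiving 10000 RPS in front of 3 replicas.
print(object_value_target(3, 10000, 5000))       # 6: ratio 2.0 doubles the replicas
print(object_average_value_target(10000, 2000))  # 5: 2000 RPS per replica
```

With a Value target on a metric that does not fall as replicas are added (total RPS on an Ingress, for instance), the ratio stays above 1 on the next evaluation and the HPA keeps scaling out. AverageValue is usually the better fit when the intent is "N units of load per replica".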

External Metric

External metrics come from systems outside the cluster (cloud queues, monitoring systems). An adapter must bridge the external system to external.metrics.k8s.io.

- type: External
  external:
    metric:
      name: aws_sqs_approximate_number_of_messages_visible
      selector:
        matchLabels:
          queue_name: payment-jobs
    target:
      type: AverageValue
      averageValue: "10"     # 10 messages per worker pod ideal

Scaling Behavior

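The behavior block controls how fast scaling happens, independently for scale-up and scale-down. Without it, the built-in defaults apply: scale-up is immediate (no stabilization window) and may add the larger of 4 pods or 100% of current replicas every 15 seconds, while scale-down waits for the 300-second stabilization window and may then remove up to 100% of replicas every 15 seconds. Those defaults are aggressive for many workloads and can cause flapping.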

Scaling behavior policies (selectPolicy determines which applies):

scaleUp policies (selectPolicy: Max → most aggressive)
  Policy A: +4 pods per 60s
  Policy B: +100% pods per 60s
  selectPolicy: Max → uses B if B > A

scaleDown policies (selectPolicy: Min → most conservative)
  Policy A: -2 pods per 60s
  Policy B: -10% pods per 60s
  selectPolicy: Min → uses whichever removes fewer pods

stabilizationWindowSeconds:
  └─ HPA tracks desiredReplicas over this window and takes the MAX
     (for scale-down) of all computed values
     → prevents flapping on transient metric spikes/dips

Field	Default (scale-up)	Default (scale-down)	Effect
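`stabilizationWindowSeconds`	0	300	Seconds to look back (0 to 3600); scale-down acts on the highest desired replica count seen in the window, scale-up on the lowest
`selectPolicy`	`Max`	`Max`	Which policy applies when several are set: `Max` allows the largest change, `Min` the smallest, `Disabled` turns off scaling in that direction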
`policies[].type`	—	—	`Pods` (absolute) or `Percent` (relative)
`policies[].value`	—	—	Max change allowed per period
`policies[].periodSeconds`	—	—	Time window for the policy (max 1800s)
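How the policies and the stabilization window bound a single scaling step can be sketched like this. It is a simplification with hypothetical function names: the real controller measures each policy against the replica count at the start of its period, which this sketch approximates with the current count. The rounding (up for scale-up percentages, down for scale-down percentages) follows my reading of the upstream implementation.

```python
def stabilized_scale_down(recommendations_in_window):
    """Scale-down acts on the highest recommendation seen in the window."""
    return max(recommendations_in_window)


def scale_up_ceiling(current, policies, select_policy):
    """Highest replica count allowed this period. policies: [(type, value)]."""
    ceilings = []
    for kind, value in policies:
        if kind == "Pods":
            ceilings.append(current + value)
        else:  # "Percent", rounded up
            ceilings.append((current * (100 + value) + 99) // 100)
    return max(ceilings) if select_policy == "Max" else min(ceilings)


def scale_down_floor(current, policies, select_policy):
    """Lowest replica count allowed this period."""
    floors = []
    for kind, value in policies:
        if kind == "Pods":
            floors.append(current - value)
        else:  # "Percent", rounded down
            floors.append(current * (100 - value) // 100)
    # "Min" means the smallest change, which is the highest floor.
    return max(floors) if select_policy == "Min" else min(floors)


UP = [("Pods", 4), ("Percent", 100)]
DOWN = [("Pods", 2), ("Percent", 10)]
print(scale_up_ceiling(8, UP, "Max"))        # 16: +100% beats +4 pods
print(scale_down_floor(50, DOWN, "Min"))     # 48: -2 pods is gentler than -10%
print(stabilized_scale_down([50, 44, 47]))   # 50: one high reading blocks scale-down
```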

Disabling Scale-Down

behavior:
  scaleDown:
    selectPolicy: Disabled    # Never scale down (scale-up only HPA)

Useful for workloads where scale-down is disruptive (e.g., stateful-ish services with warm caches) or when you want manual control over scale-down.

Deployment spec.replicas and Server-Side Apply

A common misconfiguration: your CI pipeline applies the Deployment manifest on every deploy, overwriting spec.replicas back to the value in your Git repo (e.g., 3), undoing what HPA set (e.g., 15). This causes a momentary drop in replica count on every deployment, until the HPA's next sync restores it.

Omit spec.replicas when using HPA
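Remove spec.replicas from your Deployment manifest entirely (or set it only on first apply). Once HPA is managing replicas, it owns that field. If using Server-Side Apply (SSA), the HPA manager claims the replicas field; a subsequent kubectl apply --server-side from a different field manager that still sets replicas will report a conflict. Use kubectl apply --server-side --force-conflicts only if you intentionally want to reclaim ownership, not on every deploy. When you remove the field from a manifest that was previously applied client-side, expect a possible one-time drop to the default of 1 replica, which the HPA corrects on its next sync.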

# Check which manager owns spec.replicas
kubectl get deployment api-server -o json | \
  jq '.metadata.managedFields[] | select(.fieldsV1."f:spec"."f:replicas" != null) | .manager'

# Correct SSA-based workflow: strip replicas from manifests
# In your Deployment YAML:
spec:
  # replicas: 3   ← DELETE THIS LINE; HPA owns it
  selector:
    matchLabels:
      app: api-server

HPA Status

kubectl get hpa api-server-hpa -n production
# NAME             REFERENCE              TARGETS         MINPODS  MAXPODS  REPLICAS
# api-server-hpa   Deployment/api-server  72%/70%, 800Mi  3        50       8

kubectl describe hpa api-server-hpa -n production

Status field	Meaning
`status.currentReplicas`	Replicas currently managed by the target
`status.desiredReplicas`	Replicas computed by the last HPA evaluation
`status.currentMetrics`	Last observed value for each metric source
`status.lastScaleTime`	Timestamp of last replica change
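`status.conditions[].type: AbleToScale`	Can the HPA fetch and update the target's scale? (False if the scale subresource is unreachable or a backoff is active)
`status.conditions[].type: ScalingActive`	Can the HPA compute a desired replica count? (False if metrics are unavailable or the target is scaled to 0)
`status.conditions[].type: ScalingLimited`	Desired count was capped by minReplicas/maxReplicas or by a scaling rate limit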

# Watch HPA in real time
kubectl get hpa -n production -w

# Get conditions (why isn't it scaling?)
kubectl get hpa api-server-hpa -n production -o jsonpath='{.status.conditions}' | jq .

# Events show recent scale decisions
kubectl describe hpa api-server-hpa -n production | grep -A30 Events

Prometheus Adapter

To use application-level metrics (request rate, queue depth, error rate) as HPA targets, you need a metrics adapter that translates Prometheus queries into the custom.metrics.k8s.io API. The Prometheus Adapter is the most common open-source solution.

Prometheus Adapter architecture:

Application pods → expose /metrics
        │
        ▼
Prometheus scrapes /metrics every 15s
        │
        ▼
Prometheus Adapter (runs as Deployment)
  ├─ Queries Prometheus for registered metric rules
  ├─ Serves custom.metrics.k8s.io/v1beta1 API
  └─ Registered as APIService in kube-aggregator
        │
        ▼
HPA controller calls custom.metrics.k8s.io
  └─ Fetches e.g. http_requests_per_second{namespace="prod",pod=~"api-.*"}

# Prometheus Adapter ConfigMap — metric rules
apiVersion: v1
kind: ConfigMap
metadata:
  name: adapter-config
  namespace: monitoring
data:
  config.yaml: |
    rules:
      # HTTP requests per second per pod
      - seriesQuery: 'http_requests_total{namespace!="",pod!=""}'
        resources:
          overrides:
            namespace: {resource: "namespace"}
            pod: {resource: "pod"}
        name:
          matches: "^http_requests_total$"
          as: "http_requests_per_second"
        metricsQuery: |
          sum(rate(http_requests_total{<<.LabelMatchers>>}[2m])) by (<<.GroupBy>>)

      # Queue depth per pod (custom application metric)
      - seriesQuery: 'worker_queue_depth{namespace!="",pod!=""}'
        resources:
          overrides:
            namespace: {resource: "namespace"}
            pod: {resource: "pod"}
        name:
          matches: "^worker_queue_depth$"
          as: "worker_queue_depth"
        metricsQuery: 'avg(worker_queue_depth{<<.LabelMatchers>>}) by (<<.GroupBy>>)'

      # Ingress RPS (Object metric — scoped to Ingress)
      - seriesQuery: 'nginx_ingress_controller_requests{namespace!="",ingress!=""}'
        resources:
          overrides:
            namespace: {resource: "namespace"}
            ingress: {group: "networking.k8s.io", resource: "ingress"}
        name:
          as: "nginx_ingress_requests_per_second"
        metricsQuery: |
          sum(rate(nginx_ingress_controller_requests{<<.LabelMatchers>>}[2m])) by (<<.GroupBy>>)
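To see what the metricsQuery templates turn into, the sketch below mimics the substitution the adapter performs when the HPA asks for a per-pod metric. It is an approximation for illustration: the function name is hypothetical, and the exact matcher syntax and ordering the adapter emits may differ.

```python
def render_metrics_query(template, namespace, pod_names, resource_label="pod"):
    """Expand <<.LabelMatchers>> and <<.GroupBy>> for a set of pods."""
    pod_regex = "|".join(pod_names)
    label_matchers = f'namespace="{namespace}",{resource_label}=~"{pod_regex}"'
    return (
        template
        .replace("<<.LabelMatchers>>", label_matchers)
        .replace("<<.GroupBy>>", resource_label)
    )


TEMPLATE = "sum(rate(http_requests_total{<<.LabelMatchers>>}[2m])) by (<<.GroupBy>>)"
print(render_metrics_query(TEMPLATE, "production", ["api-1", "api-2"]))
```

The rendered query returns one series per pod, which is the shape a Pods metric needs: the HPA then averages those values across the pods of the target.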

# Verify custom metrics are available
kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1" | jq .

# Check a specific metric
kubectl get --raw \
  "/apis/custom.metrics.k8s.io/v1beta1/namespaces/production/pods/*/http_requests_per_second" \
  | jq .

# List all registered custom metrics
kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1" | jq '.resources[].name'

KEDA — Kubernetes Event-Driven Autoscaling

KEDA extends HPA with a rich library of built-in scalers and adds scale-to-zero capability. It deploys a metrics adapter and manages HPA objects on your behalf via the ScaledObject CRD.

KEDA architecture:

KEDA Operator (watches ScaledObjects)
  │
  ├─ Creates/manages HPA for the target workload
  ├─ Runs scaler goroutines that poll external sources
  └─ Feeds values into external.metrics.k8s.io (served by KEDA's metrics server)

External systems ──► KEDA scalers ──► HPA ──► Deployment replicas

Scale-to-zero: KEDA manages replicas=0 directly
  (HPA minimum is 1 unless the alpha HPAScaleToZero feature gate is enabled;
   KEDA bypasses HPA for the 0↔1 transition)

apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: api-worker-scaler
  namespace: production
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-worker

  # Replica bounds
  minReplicaCount: 0          # Scale to zero when idle
  maxReplicaCount: 100

  # Cooldown periods
  pollingInterval: 15         # Check trigger every 15s
  cooldownPeriod: 300         # Wait 300s before scaling to zero

  # Advanced scaling behavior (passed to managed HPA)
  advanced:
    horizontalPodAutoscalerConfig:
      behavior:
        scaleDown:
          stabilizationWindowSeconds: 60

  triggers:
    # SQS queue depth
    - type: aws-sqs-queue
      authenticationRef:
        name: keda-sqs-auth
      metadata:
        queueURL: https://sqs.us-east-1.amazonaws.com/123456789/api-jobs
        queueLength: "10"           # 10 messages per worker pod
        awsRegion: us-east-1

    # Kafka consumer lag
    - type: kafka
      metadata:
        bootstrapServers: kafka-svc.platform:9092
        consumerGroup: api-worker-group
        topic: api-events
        lagThreshold: "50"          # 50 messages lag per pod

    # Prometheus metric
    - type: prometheus
      metadata:
        serverAddress: http://prometheus-svc.monitoring:9090
        metricName: http_requests_per_second
        threshold: "500"
        query: |
          sum(rate(http_requests_total{app="api-worker"}[2m]))

    # Cron-based pre-scaling
    - type: cron
      metadata:
        timezone: America/New_York
        start: "0 8 * * 1-5"        # Scale up at 8am weekdays
        end: "0 20 * * 1-5"         # Scale down at 8pm weekdays
        desiredReplicas: "10"        # Pre-scale to 10 during business hours
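The division of labor between KEDA and the HPA it manages, using minReplicaCount and cooldownPeriod from the ScaledObject above, can be modeled roughly as follows. This is a conceptual sketch, not KEDA's implementation; the function name, its parameters, and the returned labels are hypothetical.

```python
def keda_replica_decision(any_trigger_active, idle_seconds, cooldown_period,
                          min_replica_count, current_replicas):
    """Return which component acts on the next polling interval.

    'activate'   -> KEDA scales 0 → 1 itself
    'deactivate' -> KEDA scales to 0 itself
    'hpa'        -> the managed HPA decides between 1 and maxReplicaCount
    'hold'       -> stay at 0
    """
    if current_replicas == 0:
        return "activate" if any_trigger_active else "hold"
    idle_long_enough = idle_seconds >= cooldown_period
    if min_replica_count == 0 and not any_trigger_active and idle_long_enough:
        return "deactivate"
    return "hpa"


print(keda_replica_decision(True, 0, 300, 0, 0))     # activate
print(keda_replica_decision(False, 120, 300, 0, 3))  # hpa: still inside the cooldown
print(keda_replica_decision(False, 301, 300, 0, 1))  # deactivate
```

In this model cooldownPeriod only governs the final step to zero; scaling between 1 and N is paced by the HPA behavior settings under advanced.horizontalPodAutoscalerConfig.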

KEDA TriggerAuthentication

apiVersion: keda.sh/v1alpha1
kind: TriggerAuthentication
metadata:
  name: keda-sqs-auth
  namespace: production
spec:
  podIdentity:
    provider: aws              # Use IRSA (IAM Roles for Service Accounts)
  # Or use secretTargetRef:
  # secretTargetRef:
  #   - parameter: awsAccessKeyID
  #     name: aws-credentials
  #     key: access-key-id
  #   - parameter: awsSecretAccessKey
  #     name: aws-credentials
  #     key: secret-access-key

Scale-to-Zero Considerations

Cold start latency on wake-up
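When scaling from 0→1, nothing serves the workload until the first pod is Ready. For queue-driven triggers the pending messages simply wait in the queue; for HTTP workloads, KEDA core does not buffer requests, so a component in the request path (for example the KEDA HTTP add-on) has to hold them. For workloads with slow startup (JVM, ML model loading), this can mean 30–120 seconds of latency for the first request after idle. Mitigations: (1) keep at least one replica during business hours, for example with the cron trigger or minReplicaCount: 1; (2) use fast-starting runtimes; (3) set a readinessProbe so traffic is only sent after the pod is truly ready.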

# Check KEDA ScaledObject status
kubectl get scaledobject api-worker-scaler -n production
kubectl describe scaledobject api-worker-scaler -n production

# View the HPA KEDA created
kubectl get hpa -n production -l "scaledobject.keda.sh/name=api-worker-scaler"

# Check KEDA operator logs
kubectl logs -n keda -l app=keda-operator --tail=50

HPA + VPA Interaction

HPA and VPA both modify pod-level resources but in different dimensions: HPA changes replica count, VPA changes container resource requests. Running both simultaneously on the same workload with the same metric (e.g., CPU) causes conflict:

Conflict scenario:

  VPA sees high CPU → recommends higher requests → restarts pods with new requests
  HPA sees high CPU utilization → scales out replicas
  VPA sees new higher requests → utilization drops → HPA scales back in
  VPA sees lower utilization → downsizes requests → cycle repeats

Recommended combinations:

  HPA on CPU/memory  + VPA in Off mode (advisor only)
  HPA on custom/ext  + VPA on CPU/memory (no conflict)
  VPA only           + no HPA

The safest production pattern when you need both: use HPA for custom/external metrics (request rate, queue depth) and VPA for right-sizing CPU/memory requests. Set VPA.spec.updatePolicy.updateMode: "Off" on workloads where HPA is managing replicas based on CPU/memory.

Common HPA Patterns

Pattern: Request-Rate Scaling

# Scale based on total RPS via Ingress Object metric
metrics:
  - type: Object
    object:
      metric:
        name: nginx_ingress_requests_per_second
      describedObject:
        apiVersion: networking.k8s.io/v1
        kind: Ingress
        name: api-ingress
      target:
        type: AverageValue
        averageValue: "2000"   # 2000 RPS per replica
# AverageValue divides the Ingress total by the replica count before comparing,
# so desiredReplicas = ceil(totalRPS / 2000)
# e.g., 10000 RPS → ceil(10000/2000) = 5 pods

Pattern: Queue-Depth Scaling (without KEDA)

# Queue depth exposed as an External metric (e.g., via Prometheus Adapter externalRules)
metrics:
  - type: External
    external:
      metric:
        name: redis_list_length
        selector:
          matchLabels:
            list_name: job-queue
      target:
        type: AverageValue
        averageValue: "20"     # 20 items per worker pod
behavior:
  scaleDown:
    stabilizationWindowSeconds: 600  # Don't scale down while queue drains
    policies:
      - type: Pods
        value: 1
        periodSeconds: 120           # Remove 1 pod every 2 min max

Pattern: Conservative Scale-Down for Stateful-ish Services

behavior:
  scaleUp:
    stabilizationWindowSeconds: 0
    policies:
      - type: Percent
        value: 50              # Scale up by 50% at most per period
        periodSeconds: 60
  scaleDown:
    stabilizationWindowSeconds: 900   # 15 min window before scaling down
    policies:
      - type: Percent
        value: 5               # Remove at most 5% of pods per period
        periodSeconds: 300     # Once every 5 minutes

Metrics

Metric	Labels	Use
`kube_horizontalpodautoscaler_status_current_replicas`	`horizontalpodautoscaler`, `namespace`	Current replica count managed by HPA
`kube_horizontalpodautoscaler_status_desired_replicas`	`horizontalpodautoscaler`, `namespace`	Desired replica count from last evaluation
`kube_horizontalpodautoscaler_spec_max_replicas`	`horizontalpodautoscaler`, `namespace`	Configured maxReplicas ceiling
`kube_horizontalpodautoscaler_spec_min_replicas`	`horizontalpodautoscaler`, `namespace`	Configured minReplicas floor
`kube_horizontalpodautoscaler_status_condition`	`horizontalpodautoscaler`, `namespace`, `condition`, `status`	HPA condition health (AbleToScale, ScalingActive, ScalingLimited)

Alerting Rules

groups:
  - name: hpa
    rules:
      # HPA at max replicas — may need to raise ceiling
      - alert: HPAAtMaxReplicas
        expr: |
          kube_horizontalpodautoscaler_status_current_replicas
          == kube_horizontalpodautoscaler_spec_max_replicas
        for: 15m
        labels:
          severity: warning
        annotations:
          summary: "HPA {{ $labels.namespace }}/{{ $labels.horizontalpodautoscaler }} at maxReplicas for 15m"
          description: "Consider raising maxReplicas or optimizing the workload"

      # HPA unable to scale (metric unavailable)
      - alert: HPAScalingInactive
        expr: |
          kube_horizontalpodautoscaler_status_condition{
            condition="ScalingActive",status="false"} == 1
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "HPA {{ $labels.horizontalpodautoscaler }} in {{ $labels.namespace }} is not scaling"

      # Desired replicas oscillating (flapping)
      - alert: HPAFlapping
        expr: |
          changes(kube_horizontalpodautoscaler_status_desired_replicas[30m]) > 5
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "HPA {{ $labels.horizontalpodautoscaler }} is flapping (>5 changes in 30m)"

      # HPA at minReplicas for extended period (possible over-provisioning)
      - alert: HPAAtMinReplicasLong
        expr: |
          kube_horizontalpodautoscaler_status_current_replicas
          == kube_horizontalpodautoscaler_spec_min_replicas
        for: 72h
        labels:
          severity: info
        annotations:
          summary: "HPA {{ $labels.horizontalpodautoscaler }} has been at minReplicas for 3 days; review minReplicas"

Runbooks

HPA Not Scaling Despite High Load

# 1. Check HPA conditions
kubectl describe hpa <name> -n <namespace> | grep -A5 Conditions

# 2. Verify metrics are available
kubectl get --raw "/apis/metrics.k8s.io/v1beta1/namespaces/<ns>/pods" | jq .
kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1" | jq .

# 3. Check metrics-server is running
kubectl get deployment metrics-server -n kube-system

# 4. Verify pods have resource requests (required for Utilization target)
kubectl get pods -n <namespace> -o json | \
  jq '.items[].spec.containers[].resources.requests'

# 5. Check if at maxReplicas ceiling
kubectl get hpa <name> -n <namespace> -o jsonpath='{.spec.maxReplicas} {.status.currentReplicas}'

HPA Flapping / Oscillating

# Check scale events
kubectl describe hpa <name> -n <namespace> | grep -A30 Events

# Increase stabilization window to reduce flapping
kubectl patch hpa <name> -n <namespace> --type=merge -p '{
  "spec": {
    "behavior": {
      "scaleDown": {"stabilizationWindowSeconds": 600},
      "scaleUp":   {"stabilizationWindowSeconds": 30}
    }
  }
}'

Custom Metrics Returning Errors

# Test custom metrics API directly
kubectl get --raw \
  "/apis/custom.metrics.k8s.io/v1beta1/namespaces/<ns>/pods/*/<metric-name>"

# Check Prometheus Adapter logs
kubectl logs -n monitoring -l app.kubernetes.io/name=prometheus-adapter --tail=100

# Verify Prometheus query returns data
curl -G http://prometheus-svc.monitoring:9090/api/v1/query \
  --data-urlencode 'query=sum(rate(http_requests_total[2m])) by (pod)'

HPA Spec.Replicas Conflict (GitOps Override)

# Check which field manager owns replicas
kubectl get deployment <name> -o json | \
  jq '.metadata.managedFields[] | {manager, fields: .fieldsV1."f:spec"."f:replicas"} |
      select(.fields != null)'

# Remove replicas from manifest (GitOps fix)
# In Deployment YAML, delete the spec.replicas line
# Then re-apply without spec.replicas

# Force-take ownership of conflicting fields if needed (SSA); do this deliberately, not on every deploy
kubectl apply --server-side --force-conflicts -f deployment.yaml

KEDA ScaledObject Not Scaling

# Check ScaledObject status
kubectl describe scaledobject <name> -n <namespace>

# Check KEDA operator logs
kubectl logs -n keda -l app=keda-operator --tail=100 | grep ERROR

# Verify trigger authentication
kubectl describe triggerauthentication <auth-name> -n <namespace>

# Check the HPA KEDA manages
kubectl get hpa -n <namespace> -l "scaledobject.keda.sh/name=<name>"
kubectl describe hpa -n <namespace> -l "scaledobject.keda.sh/name=<name>"

Best Practices

Always set CPU requests — HPA's Utilization target divides current CPU by requests.cpu. Without requests, utilization cannot be computed: the HPA reports a FailedGetResourceMetric event (missing request for cpu) and takes no scaling action on that metric.
Prefer custom/external metrics over CPU for latency-sensitive services — CPU utilization lags behind request rate spikes. A metric like http_requests_per_second reacts faster to traffic surges.
Remove spec.replicas from Deployment manifests managed by HPA — prevent GitOps pipelines from overriding the replica count on every deploy.
Set a meaningful minReplicas — minReplicas: 1 means a single point of failure during deployments. For HA, use at least 2; for critical paths, 3 (spread across zones).
Tune scale-down conservatively — default 300s stabilization window is often too short for services with warm caches or sticky connections. 600–900s is safer for stateful-ish services.
Use KEDA for scale-to-zero and event-driven workloads — native HPA minimum is 1 replica. KEDA handles the 0↔1 transition and provides richer trigger sources out of the box.
Test autoscaling in staging under realistic load — run load tests that ramp up and sustain traffic; verify the HPA correctly computes desired replicas, the behavior policies don't block needed scale-up, and scale-down doesn't happen during momentary lulls in the ramp.
Monitor ScalingLimited condition continuously — when HPA is constrained by maxReplicas under real traffic, it means your ceiling is too low or the workload needs right-sizing. Alert on this and review periodically.