Horizontal Pod Autoscaler

    Scale workload replicas automatically based on resource utilization and custom metrics

    autoscaling/v2 | GA 1.23 | Platform Engineer

    The Horizontal Pod Autoscaler (HPA) adjusts the replicas field of a scalable workload (Deployment, StatefulSet, ReplicaSet, or any resource implementing the scale subresource) in response to observed metrics. It operates as a control loop in kube-controller-manager, periodically comparing current metric values against targets and computing a new desired replica count. The autoscaling/v2 API (GA since 1.23) supports multiple metric sources, fine-grained scaling behavior, and per-container metrics.

    Control Loop & Metrics Pipeline

    HPA control loop (default sync period: 15s)

    ┌──────────────────────────────────────────────────────────────┐
    │ HPA Controller (kube-controller-manager)                     │
    │                                                              │
    │ 1. Fetch current metrics for each metric source:             │
    │    ├─ Resource metrics → metrics-server (metrics.k8s.io)     │
    │    ├─ Custom metrics   → Prometheus Adapter                  │
    │    │                     (custom.metrics.k8s.io)             │
    │    └─ External metrics → Adapter (external.metrics.k8s.io)   │
    │                                                              │
    │ 2. For each metric source compute:                           │
    │    desiredReplicas = ceil(currentReplicas                    │
    │                      × currentMetricValue / targetValue)     │
    │                                                              │
    │ 3. Take the MAXIMUM desired replicas across all metrics      │
    │    (any metric can drive scale-up)                           │
    │                                                              │
    │ 4. Clamp to [minReplicas, maxReplicas]                       │
    │                                                              │
    │ 5. Apply scaling behavior constraints                        │
    │    (stabilization window, rate limits)                       │
    │                                                              │
    │ 6. If desiredReplicas ≠ currentReplicas → update .replicas   │
    └──────────────────────────────────────────────────────────────┘

    metrics-server aggregates kubelet /metrics/resource every 60s
    Prometheus Adapter bridges Prometheus → custom.metrics.k8s.io
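
    Steps 2 to 4 of the loop can be sketched in a few lines of Python. This is an illustrative model only, not the controller's code: the function names are hypothetical, and the tolerance band, pod readiness handling, and behavior policies are left out.

```python
import math


def desired_for_metric(current_replicas, current_value, target_value):
    """Step 2: proportional estimate for a single metric."""
    return math.ceil(current_replicas * current_value / target_value)


def evaluate(current_replicas, metrics, min_replicas, max_replicas):
    """Steps 2-4: estimate per metric, keep the maximum, clamp to the bounds.

    `metrics` is a list of (current_value, target_value) pairs.
    """
    proposals = [
        desired_for_metric(current_replicas, current, target)
        for current, target in metrics
    ]
    return max(min_replicas, min(max_replicas, max(proposals)))


# 8 replicas; CPU at 72% against a 70% target, 900 RPS per pod against a 500 target.
# CPU alone proposes 9 replicas, RPS proposes 15, so the RPS metric wins.
print(evaluate(8, [(72, 70), (900, 500)], min_replicas=3, max_replicas=50))
```

    Taking the maximum means a single hot metric is enough to scale up, while scale-down requires every metric to agree.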

    Replica Calculation Formula

    For a utilization target (percentage of requests):

    desiredReplicas = ceil(currentReplicas × (currentUtilization / targetUtilization))

    For an average value target:

    desiredReplicas = ceil(currentReplicas × (totalMetricValue / (targetAverageValue × currentReplicas)))
                    = ceil(totalMetricValue / targetAverageValue)
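
    As a worked example, the two formulas can be written as small Python functions. The sketch assumes the controller's default tolerance of 10% (the --horizontal-pod-autoscaler-tolerance flag), inside which no scaling happens; the function names are hypothetical.

```python
import math

TOLERANCE = 0.1  # assumed default; ratios within 10% of 1.0 cause no change


def desired_from_utilization(current_replicas, current_utilization,
                             target_utilization, tolerance=TOLERANCE):
    """Utilization target: scale by the ratio of observed to target percentage."""
    ratio = current_utilization / target_utilization
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    return math.ceil(current_replicas * ratio)


def desired_from_average_value(current_replicas, total_metric_value,
                               target_average_value, tolerance=TOLERANCE):
    """AverageValue target: the replica count cancels out of the formula."""
    ratio = total_metric_value / (target_average_value * current_replicas)
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    return math.ceil(total_metric_value / target_average_value)


print(desired_from_utilization(4, 90, 60))       # ratio 1.5 -> 6 replicas
print(desired_from_utilization(8, 72, 70))       # within tolerance -> stays at 8
print(desired_from_average_value(5, 4000, 500))  # 4000 / 500 -> 8 replicas
```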

    Pods in non-Ready state (Pending, Terminating, or recently started) are excluded from metric averaging to avoid false scale-ups during rollouts.

    Missing metrics = conservative default
    If metric data is unavailable for a pod (e.g., pod just started, metrics-server lag), the controller assumes that pod is at 0% of the target when evaluating a scale-up and at 100% of the target when evaluating a scale-down. This dampens the size of any change in either direction and prevents premature scale-down during rollouts. If all pods' metrics are unavailable, the HPA takes no action (neither scales up nor down).
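
    The following Python sketch shows how filling in missing pods dampens a scaling decision for an AverageValue target. It is a simplified model of the controller's documented algorithm (fill with 0 when the reporting pods suggest a scale-up, fill with the target when they suggest a scale-down, and do nothing if the direction flips); the function name and the sample numbers are hypothetical.

```python
import math


def desired_replicas(pod_values, missing, target, tolerance=0.1):
    """Replica estimate when `missing` pods reported no metric data.

    pod_values: metric values from the pods that did report.
    """
    current = len(pod_values) + missing
    if not pod_values:
        return current  # no data at all: take no action
    ratio = sum(pod_values) / (len(pod_values) * target)
    if missing == 0:
        if abs(ratio - 1.0) <= tolerance:
            return current
        return math.ceil(sum(pod_values) / target)
    # Conservative fill: 0 for a scale-up signal, the target for a scale-down signal.
    fill = 0.0 if ratio > 1.0 else float(target)
    total = sum(pod_values) + fill * missing
    new_ratio = total / (current * target)
    direction_flipped = (ratio > 1.0) != (new_ratio > 1.0)
    if abs(new_ratio - 1.0) <= tolerance or direction_flipped:
        return current
    return math.ceil(total / target)


print(desired_replicas([900] * 4, 1, 500))   # scale-up dampened: 8 instead of 9
print(desired_replicas([100] * 4, 1, 500))   # scale-down dampened: 2 instead of 1
print(desired_replicas([600, 600], 3, 500))  # direction flips after filling: stays at 5
```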

    HPA Spec — Full Reference

    apiVersion: autoscaling/v2
    kind: HorizontalPodAutoscaler
    metadata:
      name: api-server-hpa
      namespace: production
    spec:
      # --- Target workload ---
      scaleTargetRef:
        apiVersion: apps/v1
        kind: Deployment
        name: api-server
    
      # --- Replica bounds ---
      minReplicas: 3          # Never scale below this (default: 1)
      maxReplicas: 50         # Hard ceiling
    
      # --- Metrics (evaluated independently; max desired wins) ---
      metrics:
        # 1. CPU utilization (% of requests)
        - type: Resource
          resource:
            name: cpu
            target:
              type: Utilization
              averageUtilization: 70     # Target 70% of CPU requests across all pods
    
        # 2. Memory — use AverageValue, not Utilization (memory doesn't release fast)
        - type: Resource
          resource:
            name: memory
            target:
              type: AverageValue
              averageValue: 800Mi        # Target 800Mi average per pod
    
        # 3. Per-container CPU (ContainerResource: alpha in 1.20, stable since 1.30)
        - type: ContainerResource
          containerResource:
            name: cpu
            container: proxy-sidecar    # Scale based on sidecar, not main container
            target:
              type: Utilization
              averageUtilization: 80
    
        # 4. Custom metric from Pods (per-pod, averaged across replicas)
        - type: Pods
          pods:
            metric:
              name: http_requests_per_second
            target:
              type: AverageValue
              averageValue: "500"        # 500 RPS per pod target
    
        # 5. Custom metric from Object (single source, e.g., Ingress)
        - type: Object
          object:
            metric:
              name: ingress_requests_per_second
            describedObject:
              apiVersion: networking.k8s.io/v1
              kind: Ingress
              name: api-ingress
            target:
              type: Value
              value: "10000"            # Total 10k RPS on this Ingress
    
        # 6. External metric (cloud queue, external system)
        - type: External
          external:
            metric:
              name: sqs_messages_visible
              selector:
                matchLabels:
                  queue: api-job-queue
            target:
              type: AverageValue
              averageValue: "30"        # 30 messages per worker pod
    
      # --- Scaling behavior ---
      behavior:
        scaleUp:
          stabilizationWindowSeconds: 0      # Scale up immediately (no dampening)
          policies:
            - type: Pods
              value: 4                        # Add at most 4 pods per period
              periodSeconds: 60
            - type: Percent
              value: 100                      # Or double replicas per period
              periodSeconds: 60
          selectPolicy: Max                   # Use whichever policy allows more pods
    
        scaleDown:
          stabilizationWindowSeconds: 300    # Wait 5 min of consistently low metrics
          policies:
            - type: Pods
              value: 2                        # Remove at most 2 pods per period
              periodSeconds: 60
            - type: Percent
              value: 10                       # Or 10% of replicas per period
              periodSeconds: 60
          selectPolicy: Min                   # Use whichever policy removes fewer pods

    Metric Types in Detail

    Resource Metrics

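    Resource metrics use the metrics.k8s.io API provided by metrics-server. Metrics-server scrapes each kubelet's /metrics/resource endpoint every 60 seconds (tunable with its --metric-resolution flag) and serves the latest CPU and memory usage per pod and container.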

    Target type    Formula                                          Best for
    -------------  -----------------------------------------------  --------------------------------------
    Utilization    currentCPU / requests.cpu × 100                  CPU (pods must have CPU requests set)
    AverageValue   Total metric value across pods / replica count   Memory, custom per-pod metrics
    Value          Raw single value (Object/External only)          Queue depth, global counters

    Memory autoscaling anti-pattern
    Using a Utilization target for memory is misleading: a JVM heap that is 90% allocated but not under GC pressure will trigger scale-up even if the application is healthy. Prefer AverageValue with generous headroom above the working set. Also note that runtimes such as the JVM and Go tend to hold on to memory they have already allocated, so per-pod usage may not fall when load drops; the memory is only returned when the pod restarts. This makes the scale-down stabilization window (300s default) critical for memory-based scaling.

    ContainerResource (alpha in 1.20, stable since 1.30)

    When pods run multiple containers, Resource metrics aggregate all containers. ContainerResource targets a specific container, enabling independent scaling decisions:

    - type: ContainerResource
      containerResource:
        name: memory
        container: app               # Only consider the app container's memory
        target:
          type: AverageValue
          averageValue: 512Mi

    This is essential when a heavyweight sidecar (Istio Envoy, Datadog agent) consumes disproportionate resources — you don't want sidecar resource usage to drive scaling of the main application.
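
    To make the difference concrete, the sketch below computes CPU utilization over all containers (what a Resource metric sees) and over a single container (what ContainerResource sees). The pod data, container names, and function name are hypothetical; utilization is taken as total usage divided by total requests across pods.

```python
def utilization(pods, container=None):
    """CPU utilization in percent across pods.

    pods: list of {container_name: (usage_millicores, request_millicores)}
    container: restrict the calculation to one container, or None for all.
    """
    usage = requests = 0
    for pod in pods:
        for name, (used_m, requested_m) in pod.items():
            if container is None or name == container:
                usage += used_m
                requests += requested_m
    return 100 * usage / requests


pods = [
    {"app": (200, 500), "istio-proxy": (450, 500)},
    {"app": (220, 500), "istio-proxy": (430, 500)},
]

print(utilization(pods))                 # whole pod: 65.0
print(utilization(pods, "app"))          # app container only: 42.0
print(utilization(pods, "istio-proxy"))  # sidecar only: 88.0
```

    With a 70% target, the pod-level number hides both facts: the app container is well under target and the sidecar is well over it.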

    Pods Metric

    The Pods metric type reads from custom.metrics.k8s.io and averages the named metric across all pods selected by the HPA's target. The Prometheus Adapter or a custom adapter must serve this API.

    - type: Pods
      pods:
        metric:
          name: http_requests_per_second
          selector:                       # Optional: filter by label on the metric series
            matchLabels:
              route: /api/v2
        target:
          type: AverageValue
          averageValue: "1000"           # 1000 req/s per pod

    Object Metric

    The Object type reads a single metric value from a specific Kubernetes object. Common use: total request rate on an Ingress, queue depth on a Kafka topic CRD.

    - type: Object
      object:
        metric:
          name: nginx_ingress_requests_per_second
        describedObject:
          apiVersion: networking.k8s.io/v1
          kind: Ingress
          name: frontend-ingress
        target:
          type: Value
          value: "5000"          # Total RPS on the Ingress drives replica count

    External Metric

    External metrics come from systems outside the cluster (cloud queues, monitoring systems). An adapter must bridge the external system to external.metrics.k8s.io.

    - type: External
      external:
        metric:
          name: aws_sqs_approximate_number_of_messages_visible
          selector:
            matchLabels:
              queue_name: payment-jobs
        target:
          type: AverageValue
          averageValue: "10"     # 10 messages per worker pod ideal

    Scaling Behavior

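    The behavior block controls how fast scaling happens, independently for scale-up and scale-down. When it is omitted, built-in defaults apply: scale-up has no stabilization window and may add the larger of 4 pods or 100% of current replicas every 15 seconds, while scale-down waits for a 300-second stabilization window and may then remove up to 100% of replicas in 15 seconds. These defaults react quickly, which can cause flapping on noisy metrics.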

    Scaling behavior policies: selectPolicy determines which one applies

    scaleUp policies (selectPolicy: Max → most aggressive)
    ┌────────────────────────────────────────┐
    │ Policy A: +4 pods per 60s              │
    │ Policy B: +100% pods per 60s           │
    │ selectPolicy: Max → uses B if B > A    │
    └────────────────────────────────────────┘

    scaleDown policies (selectPolicy: Min → most conservative)
    ┌────────────────────────────────────────┐
    │ Policy A: -2 pods per 60s              │
    │ Policy B: -10% pods per 60s            │
    │ selectPolicy: Min → uses whichever     │
    │ removes fewer pods                     │
    └────────────────────────────────────────┘

    stabilizationWindowSeconds:
    └─ HPA tracks desiredReplicas over this window and takes
       the MAX (for scale-down) of all computed values
       → prevents flapping on transient metric spikes/dips

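    Field                        Default (scale-up)  Default (scale-down)  Effect
    ---------------------------  ------------------  --------------------  ------------------------------------------------------
    stabilizationWindowSeconds   0                   300                   Seconds to look back; scale-down uses the highest
                                                                           recommendation in the window, scale-up the lowest
    selectPolicy                 Max                 Max                   Which policy to apply when multiple policies conflict
    policies[].type              (none)              (none)                Pods (absolute) or Percent (relative)
    policies[].value             (none)              (none)                Max change allowed per period
    policies[].periodSeconds     (none)              (none)                Time window for the policy (max 1800s)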
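
    The interaction between policies and selectPolicy can be sketched as follows. This is a simplified model: it assumes all policies share one period and that no scaling has happened earlier in that period (the real controller also accounts for recent scale events). Function names are hypothetical.

```python
def scale_up_limit(current, policies, select_policy="Max"):
    """Highest replica count a scale-up may reach in one period.

    policies: list of ("Pods" | "Percent", value) tuples.
    """
    if select_policy == "Disabled":
        return current
    limits = []
    for kind, value in policies:
        if kind == "Pods":
            limits.append(current + value)
        else:  # Percent, rounded up (integer ceiling division)
            limits.append(-(-current * (100 + value) // 100))
    return max(limits) if select_policy == "Max" else min(limits)


def scale_down_limit(current, policies, select_policy="Max"):
    """Lowest replica count a scale-down may reach in one period."""
    if select_policy == "Disabled":
        return current
    floors = []
    for kind, value in policies:
        if kind == "Pods":
            floors.append(max(0, current - value))
        else:  # Percent, rounded down
            floors.append(current * (100 - value) // 100)
    # Max allows the largest change (lowest floor); Min the smallest change.
    return min(floors) if select_policy == "Max" else max(floors)


up = [("Pods", 4), ("Percent", 100)]
down = [("Pods", 2), ("Percent", 10)]
print(scale_up_limit(10, up))             # max(14, 20) -> 20
print(scale_up_limit(3, up))              # max(7, 6)   -> 7
print(scale_down_limit(50, down, "Min"))  # max(48, 45) -> 48
```

    The desired replica count from the metrics is then capped by these limits: a scale-up never exceeds the upper limit, and a scale-down never goes below the floor within one period.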

    Disabling Scale-Down

    behavior:
      scaleDown:
        selectPolicy: Disabled    # Never scale down (scale-up only HPA)

    Useful for workloads where scale-down is disruptive (e.g., stateful-ish services with warm caches) or when you want manual control over scale-down.

    Deployment spec.replicas and Server-Side Apply

    A common misconfiguration: your CI pipeline applies the Deployment manifest on every deploy, overwriting spec.replicas back to the value in your Git repo (e.g., 3), undoing what HPA set (e.g., 15). Replicas then drop abruptly on every deployment until the HPA scales the workload back up.

    Omit spec.replicas when using HPA
    Remove spec.replicas from your Deployment manifest entirely (or set it only on first apply). Once HPA is managing replicas, it owns that field. If using Server-Side Apply (SSA), the HPA manager claims the replicas field; a subsequent kubectl apply from a different field manager will conflict. Use kubectl apply --server-side --force-conflicts only if you intentionally want to reclaim ownership — not on every deploy.
    # Check which manager owns spec.replicas
    kubectl get deployment api-server -o json | \
      jq '.metadata.managedFields[] | select(.fieldsV1."f:spec"."f:replicas" != null) | .manager'
    
    # Correct SSA-based workflow: strip replicas from manifests
    # In your Deployment YAML:
    spec:
      # replicas: 3   ← DELETE THIS LINE; HPA owns it
      selector:
        matchLabels:
          app: api-server
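
    The same ownership check as the jq filter can be done in Python on the JSON from kubectl get deployment -o json. The sketch below runs on a hand-written sample; the manager names and the layout of the sample entries are illustrative assumptions, not output captured from a cluster.

```python
def replicas_owners(deployment):
    """Return the field managers that own spec.replicas."""
    owners = []
    for entry in deployment.get("metadata", {}).get("managedFields", []):
        spec_fields = entry.get("fieldsV1", {}).get("f:spec", {})
        if "f:replicas" in spec_fields:
            owners.append(entry["manager"])
    return owners


# Hypothetical sample resembling `kubectl get deployment api-server -o json`.
deployment = {
    "metadata": {
        "managedFields": [
            {
                "manager": "kubectl-client-side-apply",
                "operation": "Update",
                "fieldsV1": {"f:spec": {"f:selector": {}, "f:template": {}}},
            },
            {
                "manager": "kube-controller-manager",
                "operation": "Update",
                "subresource": "scale",
                "fieldsV1": {"f:spec": {"f:replicas": {}}},
            },
        ]
    }
}

print(replicas_owners(deployment))  # ['kube-controller-manager']
```

    If a CI or GitOps manager shows up in this list alongside the controller, the manifest is still setting spec.replicas.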

    HPA Status

    kubectl get hpa api-server-hpa -n production
    # NAME             REFERENCE              TARGETS         MINPODS  MAXPODS  REPLICAS
    # api-server-hpa   Deployment/api-server  72%/70%, 800Mi  3        50       8
    
    kubectl describe hpa api-server-hpa -n production

    Status field                                Meaning
    ------------------------------------------  ------------------------------------------------------
    status.currentReplicas                      Replicas currently managed by the target
    status.desiredReplicas                      Replicas computed by the last HPA evaluation
    status.currentMetrics                       Last observed value for each metric source
    status.lastScaleTime                        Timestamp of last replica change
    status.conditions[].type: AbleToScale       Can the HPA currently scale? (False if backoff active)
    status.conditions[].type: ScalingActive     Is HPA actively watching metrics?
    status.conditions[].type: ScalingLimited    Desired exceeds maxReplicas or violates behavior policy

    # Watch HPA in real time
    kubectl get hpa -n production -w
    
    # Get conditions (why isn't it scaling?)
    kubectl get hpa api-server-hpa -n production -o jsonpath='{.status.conditions}' | jq .
    
    # Events show recent scale decisions
    kubectl describe hpa api-server-hpa -n production | grep -A30 Events
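
    When checking many HPAs, the conditions can be interpreted programmatically. The sketch below flags conditions that deviate from the healthy state (AbleToScale and ScalingActive should be True, ScalingLimited should be False). The sample object is hypothetical, and the function name is an assumption for illustration.

```python
HEALTHY = {"AbleToScale": "True", "ScalingActive": "True", "ScalingLimited": "False"}


def blocking_conditions(hpa):
    """Return short descriptions of conditions that indicate a problem."""
    problems = []
    for cond in hpa.get("status", {}).get("conditions", []):
        want = HEALTHY.get(cond["type"])
        if want is not None and cond["status"] != want:
            reason = cond.get("reason", "unknown")
            problems.append(f'{cond["type"]}={cond["status"]} ({reason})')
    return problems


# Hypothetical sample resembling `kubectl get hpa <name> -o json`.
hpa = {
    "status": {
        "conditions": [
            {"type": "AbleToScale", "status": "True", "reason": "ReadyForNewScale"},
            {"type": "ScalingActive", "status": "False", "reason": "FailedGetResourceMetric"},
            {"type": "ScalingLimited", "status": "True", "reason": "TooManyReplicas"},
        ]
    }
}

for problem in blocking_conditions(hpa):
    print(problem)
```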

    Prometheus Adapter

    To use application-level metrics (request rate, queue depth, error rate) as HPA targets, you need a metrics adapter that translates Prometheus queries into the custom.metrics.k8s.io API. The Prometheus Adapter is the most common open-source solution.

    Prometheus Adapter architecture:

    Application pods → expose /metrics
          │
          ▼
    Prometheus scrapes /metrics every 15s
          │
          ▼
    Prometheus Adapter (runs as Deployment)
      ├─ Queries Prometheus for registered metric rules
      ├─ Serves custom.metrics.k8s.io/v1beta1 API
      └─ Registered as APIService in kube-aggregator
          │
          ▼
    HPA controller calls custom.metrics.k8s.io
      └─ Fetches e.g. http_requests_per_second{namespace="prod",pod=~"api-.*"}

    # Prometheus Adapter ConfigMap — metric rules
    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: adapter-config
      namespace: monitoring
    data:
      config.yaml: |
        rules:
          # HTTP requests per second per pod
          - seriesQuery: 'http_requests_total{namespace!="",pod!=""}'
            resources:
              overrides:
                namespace: {resource: "namespace"}
                pod: {resource: "pod"}
            name:
              matches: "^http_requests_total$"
              as: "http_requests_per_second"
            metricsQuery: |
              sum(rate(http_requests_total{<<.LabelMatchers>>}[2m])) by (<<.GroupBy>>)
    
          # Queue depth per pod (custom application metric)
          - seriesQuery: 'worker_queue_depth{namespace!="",pod!=""}'
            resources:
              overrides:
                namespace: {resource: "namespace"}
                pod: {resource: "pod"}
            name:
              matches: "^worker_queue_depth$"
              as: "worker_queue_depth"
            metricsQuery: 'avg(worker_queue_depth{<<.LabelMatchers>>}) by (<<.GroupBy>>)'
    
          # Ingress RPS (Object metric — scoped to Ingress)
          - seriesQuery: 'nginx_ingress_controller_requests{namespace!="",ingress!=""}'
            resources:
              overrides:
                namespace: {resource: "namespace"}
                ingress: {group: "networking.k8s.io", resource: "ingress"}
            name:
              as: "nginx_ingress_requests_per_second"
            metricsQuery: |
              sum(rate(nginx_ingress_controller_requests{<<.LabelMatchers>>}[2m])) by (<<.GroupBy>>)
    # Verify custom metrics are available
    kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1" | jq .
    
    # Check a specific metric
    kubectl get --raw \
      "/apis/custom.metrics.k8s.io/v1beta1/namespaces/production/pods/*/http_requests_per_second" \
      | jq .
    
    # List all registered custom metrics
    kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1" | jq '.resources[].name'
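
    The metricsQuery templates in the adapter rules are filled in per request. The Python sketch below imitates that substitution to show roughly what PromQL reaches Prometheus; it is an approximation for illustration (the adapter itself uses Go templates), and the pod names are hypothetical.

```python
def expand_metrics_query(template, label_matchers, group_by):
    """Substitute the two placeholders used by the rules above.

    label_matchers: {label: (operator, value)}, e.g. {"pod": ("=~", "a|b")}
    group_by: list of label names to group the result by.
    """
    matchers = ",".join(
        f'{label}{op}"{value}"' for label, (op, value) in label_matchers.items()
    )
    return (
        template
        .replace("<<.LabelMatchers>>", matchers)
        .replace("<<.GroupBy>>", ",".join(group_by))
    )


template = "sum(rate(http_requests_total{<<.LabelMatchers>>}[2m])) by (<<.GroupBy>>)"
query = expand_metrics_query(
    template,
    {"namespace": ("=", "production"), "pod": ("=~", "api-server-abc|api-server-def")},
    ["pod"],
)
print(query)
```

    Running the expanded query by hand in Prometheus is a quick way to check whether an empty custom metric is a rule problem or a data problem.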

    KEDA — Kubernetes Event-Driven Autoscaling

    KEDA extends HPA with a rich library of built-in scalers and adds scale-to-zero capability. It deploys a metrics adapter and manages HPA objects on your behalf via the ScaledObject CRD.

    KEDA architecture:

    KEDA Operator (watches ScaledObjects)
      │
      ├─ Creates/manages HPA for the target workload
      ├─ Runs scaler goroutines that poll external sources
      └─ Feeds values into external.metrics.k8s.io

    External systems ──► KEDA scalers ──► HPA ──► Deployment replicas

    Scale-to-zero: KEDA manages replicas=0 directly
    (HPA minimum is 1; KEDA bypasses HPA for the 0↔1 transition)

    apiVersion: keda.sh/v1alpha1
    kind: ScaledObject
    metadata:
      name: api-worker-scaler
      namespace: production
    spec:
      scaleTargetRef:
        apiVersion: apps/v1
        kind: Deployment
        name: api-worker
    
      # Replica bounds
      minReplicaCount: 0          # Scale to zero when idle
      maxReplicaCount: 100
    
      # Cooldown periods
      pollingInterval: 15         # Check trigger every 15s
      cooldownPeriod: 300         # Wait 300s before scaling to zero
    
      # Advanced scaling behavior (passed to managed HPA)
      advanced:
        horizontalPodAutoscalerConfig:
          behavior:
            scaleDown:
              stabilizationWindowSeconds: 60
    
      triggers:
        # SQS queue depth
        - type: aws-sqs-queue
          authenticationRef:
            name: keda-sqs-auth
          metadata:
            queueURL: https://sqs.us-east-1.amazonaws.com/123456789/api-jobs
            queueLength: "10"           # 10 messages per worker pod
            awsRegion: us-east-1
    
        # Kafka consumer lag
        - type: kafka
          metadata:
            bootstrapServers: kafka-svc.platform:9092
            consumerGroup: api-worker-group
            topic: api-events
            lagThreshold: "50"          # 50 messages lag per pod
    
        # Prometheus metric
        - type: prometheus
          metadata:
            serverAddress: http://prometheus-svc.monitoring:9090
            metricName: http_requests_per_second
            threshold: "500"
            query: |
              sum(rate(http_requests_total{app="api-worker"}[2m]))
    
        # Cron-based pre-scaling
        - type: cron
          metadata:
            timezone: America/New_York
            start: "0 8 * * 1-5"        # Scale up at 8am weekdays
            end: "0 20 * * 1-5"         # Scale down at 8pm weekdays
            desiredReplicas: "10"        # Pre-scale to 10 during business hours
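
    For a queue-style trigger such as the SQS one above, the resulting replica count can be approximated as queue depth divided by the per-pod target, bounded by minReplicaCount and maxReplicaCount. The sketch below is a rough model with a hypothetical function name; it ignores cooldownPeriod, activation thresholds, and the other triggers (with several triggers, the largest proposal wins, as with any HPA).

```python
import math


def keda_desired(queue_depth, per_pod_target, min_count, max_count):
    """Approximate replica count for one queue-length trigger."""
    if queue_depth <= 0:
        # Idle: after cooldownPeriod the workload returns to minReplicaCount.
        return min_count
    wanted = math.ceil(queue_depth / per_pod_target)
    # Once active, at least one replica runs even if minReplicaCount is 0.
    return max(max(min_count, 1), min(max_count, wanted))


print(keda_desired(0, 10, 0, 100))     # idle -> 0
print(keda_desired(1, 10, 0, 100))     # first message wakes one pod -> 1
print(keda_desired(250, 10, 0, 100))   # 250 / 10 -> 25
print(keda_desired(5000, 10, 0, 100))  # capped at maxReplicaCount -> 100
```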

    KEDA TriggerAuthentication

    apiVersion: keda.sh/v1alpha1
    kind: TriggerAuthentication
    metadata:
      name: keda-sqs-auth
      namespace: production
    spec:
      podIdentity:
        provider: aws              # Use IRSA (IAM Roles for Service Accounts)
      # Or use secretTargetRef:
      # secretTargetRef:
      #   - parameter: awsAccessKeyID
      #     name: aws-credentials
      #     key: access-key-id
      #   - parameter: awsSecretAccessKey
      #     name: aws-credentials
      #     key: secret-access-key

    Scale-to-Zero Considerations

    Cold start latency on wake-up
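    When scaling from 0→1, work that arrives while the pod starts has to wait. With queue-based triggers (SQS, Kafka) messages simply stay in the queue; KEDA itself does not buffer HTTP requests, so HTTP workloads need the separate KEDA HTTP add-on or another proxy that can hold requests. For workloads with slow startup (JVM, ML model loading), this can mean 30–120 seconds of latency for the first request after idle. Mitigations: (1) keep at least one replica running during business hours, for example with a cron trigger; (2) use fast-starting runtimes; (3) set a readinessProbe so traffic is only sent after the pod is truly ready.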
    # Check KEDA ScaledObject status
    kubectl get scaledobject api-worker-scaler -n production
    kubectl describe scaledobject api-worker-scaler -n production
    
    # View the HPA KEDA created
    kubectl get hpa -n production -l "scaledobject.keda.sh/name=api-worker-scaler"
    
    # Check KEDA operator logs
    kubectl logs -n keda -l app=keda-operator --tail=50

    HPA + VPA Interaction

    HPA and VPA both modify pod-level resources but in different dimensions: HPA changes replica count, VPA changes container resource requests. Running both simultaneously on the same workload with the same metric (e.g., CPU) causes conflict:

    Conflict scenario:
      VPA sees high CPU → recommends higher requests → restarts pods with new requests
      HPA sees high CPU utilization → scales out replicas
      VPA sees new higher requests → utilization drops → HPA scales back in
      VPA sees lower utilization → downsizes requests → cycle repeats

    Recommended combinations:
    ┌──────────────────────────────────────────────────────┐
    │ HPA on CPU/memory + VPA in Off mode (advisor only)   │
    │ HPA on custom/ext + VPA on CPU/memory (no conflict)  │
    │ VPA only + no HPA                                    │
    └──────────────────────────────────────────────────────┘

    The safest production pattern when you need both: use HPA for custom/external metrics (request rate, queue depth) and VPA for right-sizing CPU/memory requests. Set VPA.spec.updatePolicy.updateMode: "Off" on workloads where HPA is managing replicas based on CPU/memory.
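
    A small calculation shows why the two controllers fight when both act on CPU. In this sketch the per-pod usage stays constant while VPA changes the request; the numbers and the function name are hypothetical, and the HPA tolerance band is ignored.

```python
import math


def hpa_desired(replicas, usage_m, request_m, target_pct):
    """Desired replicas for a CPU Utilization target (usage and request in millicores)."""
    # Equivalent to ceil(replicas * utilization / target), kept as one division.
    return math.ceil(replicas * usage_m * 100 / (request_m * target_pct))


# Before VPA acts: 450m used of a 500m request = 90% utilization, target 70%.
print(hpa_desired(10, 450, 500, 70))  # HPA wants 13 replicas

# VPA raises the request to 900m; the same 450m usage is now 50% utilization.
print(hpa_desired(10, 450, 900, 70))  # HPA now wants 8 replicas
```

    The load did not change between the two calls; only the denominator did, which is why a utilization-based HPA and an active VPA on the same resource destabilize each other.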

    Common HPA Patterns

    Pattern: Request-Rate Scaling

    # Scale based on total RPS via Ingress Object metric
    metrics:
      - type: Object
        object:
          metric:
            name: nginx_ingress_requests_per_second
          describedObject:
            apiVersion: networking.k8s.io/v1
            kind: Ingress
            name: api-ingress
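          target:
            type: AverageValue
            averageValue: "2000"   # Ingress total divided by replica count; aim for 2000 RPS per pod
    # With AverageValue, desiredReplicas = ceil(totalRPS / 2000), independent of current replicas
    # e.g., 10000 RPS → ceil(10000/2000) = 5 pods
    # (type: Value would instead compute ceil(currentReplicas × totalRPS / value))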
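
    For an Object metric the choice of target type changes the arithmetic. The sketch below contrasts the two, assuming 3 current replicas and 10000 RPS on the Ingress; function names are hypothetical and the tolerance band is ignored.

```python
import math


def desired_with_value(current_replicas, metric_value, target_value):
    """type: Value compares the raw metric to the target and scales proportionally."""
    return math.ceil(current_replicas * metric_value / target_value)


def desired_with_average_value(metric_value, target_average_value):
    """type: AverageValue divides the metric by the replica count first."""
    return math.ceil(metric_value / target_average_value)


print(desired_with_value(3, 10000, 2000))       # 3 * 5 -> 15 replicas
print(desired_with_average_value(10000, 2000))  # 10000 / 2000 -> 5 replicas
```

    Because Ingress traffic does not fall when replicas are added, a Value target on total RPS keeps multiplying the replica count on each sync until maxReplicas is reached; AverageValue settles at a per-pod share.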

    Pattern: Queue-Depth Scaling (without KEDA)

    # Queue depth exposed as an External metric (e.g., via Prometheus Adapter externalRules)
    metrics:
      - type: External
        external:
          metric:
            name: redis_list_length
            selector:
              matchLabels:
                list_name: job-queue
          target:
            type: AverageValue
            averageValue: "20"     # 20 items per worker pod
    behavior:
      scaleDown:
        stabilizationWindowSeconds: 600  # Don't scale down while queue drains
        policies:
          - type: Pods
            value: 1
            periodSeconds: 120           # Remove 1 pod every 2 min max

    Pattern: Conservative Scale-Down for Stateful-ish Services

    behavior:
      scaleUp:
        stabilizationWindowSeconds: 0
        policies:
          - type: Percent
            value: 50              # Scale up by 50% at most per period
            periodSeconds: 60
      scaleDown:
        stabilizationWindowSeconds: 900   # 15 min window before scaling down
        policies:
          - type: Percent
            value: 5               # Remove at most 5% of pods per period
            periodSeconds: 300     # Once every 5 minutes
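
    It helps to estimate how long such a conservative policy takes to shed capacity. The sketch below counts how many policy periods a percentage-based scale-down needs, assuming the metrics stay low the whole time and the limit is recomputed from the current replica count each period. The function name is hypothetical.

```python
def scale_down_periods(start, target, percent):
    """Number of policy periods needed to go from `start` to `target` replicas."""
    replicas, periods = start, 0
    while replicas > target:
        floor = replicas * (100 - percent) // 100  # percent limit, rounded down
        replicas = max(target, floor)
        periods += 1
    return periods


periods = scale_down_periods(40, 20, 5)
print(periods)  # 10 periods: 40 -> 38 -> 36 -> ... -> 20

# Rough upper estimate with periodSeconds: 300 and a 900s stabilization window.
print(900 + periods * 300)  # 3900 seconds, about 65 minutes
```

    If an hour of excess capacity after a traffic peak is too costly, raise the percentage or shorten the period rather than the stabilization window.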

    Metrics

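    Metric                                                  Labels                                                  Use
    ------------------------------------------------------  ------------------------------------------------------  -------------------------------------------
    kube_horizontalpodautoscaler_status_current_replicas    horizontalpodautoscaler, namespace                      Current replica count managed by HPA
    kube_horizontalpodautoscaler_status_desired_replicas    horizontalpodautoscaler, namespace                      Desired replica count from last evaluation
    kube_horizontalpodautoscaler_spec_max_replicas          horizontalpodautoscaler, namespace                      Configured maxReplicas ceiling
    kube_horizontalpodautoscaler_spec_min_replicas          horizontalpodautoscaler, namespace                      Configured minReplicas floor
    kube_horizontalpodautoscaler_status_condition           horizontalpodautoscaler, namespace, condition, status   HPA condition health (AbleToScale, ScalingActive, ScalingLimited)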

    Alerting Rules

    groups:
      - name: hpa
        rules:
          # HPA at max replicas — may need to raise ceiling
          - alert: HPAAtMaxReplicas
            expr: |
              kube_horizontalpodautoscaler_status_current_replicas
              == kube_horizontalpodautoscaler_spec_max_replicas
            for: 15m
            labels:
              severity: warning
            annotations:
              summary: "HPA {{ $labels.namespace }}/{{ $labels.horizontalpodautoscaler }} at maxReplicas for 15m"
              description: "Consider raising maxReplicas or optimizing the workload"
    
          # HPA unable to scale (metric unavailable)
          - alert: HPAScalingInactive
            expr: |
              kube_horizontalpodautoscaler_status_condition{
                condition="ScalingActive",status="false"} == 1
            for: 5m
            labels:
              severity: warning
            annotations:
              summary: "HPA {{ $labels.horizontalpodautoscaler }} in {{ $labels.namespace }} is not scaling"
    
          # Desired replicas oscillating (flapping)
          - alert: HPAFlapping
            expr: |
              changes(kube_horizontalpodautoscaler_status_desired_replicas[30m]) > 5
            for: 5m
            labels:
              severity: warning
            annotations:
              summary: "HPA {{ $labels.horizontalpodautoscaler }} is flapping (>5 changes in 30m)"
    
          # HPA at minReplicas for extended period (possible over-provisioning)
          - alert: HPAAtMinReplicasLong
            expr: |
              kube_horizontalpodautoscaler_status_current_replicas
              == kube_horizontalpodautoscaler_spec_min_replicas
            for: 72h
            labels:
              severity: info
            annotations:
              summary: "HPA {{ $labels.horizontalpodautoscaler }} has been at minReplicas for 3 days; review minReplicas"

    Runbooks

    HPA Not Scaling Despite High Load

    # 1. Check HPA conditions
    kubectl describe hpa <name> -n <namespace> | grep -A5 Conditions
    
    # 2. Verify metrics are available
    kubectl get --raw "/apis/metrics.k8s.io/v1beta1/namespaces/<ns>/pods" | jq .
    kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1" | jq .
    
    # 3. Check metrics-server is running
    kubectl get deployment metrics-server -n kube-system
    
    # 4. Verify pods have resource requests (required for Utilization target)
    kubectl get pods -n <namespace> -o json | \
      jq '.items[].spec.containers[].resources.requests'
    
    # 5. Check if at maxReplicas ceiling
    kubectl get hpa <name> -n <namespace> -o jsonpath='{.spec.maxReplicas} {.status.currentReplicas}'

    HPA Flapping / Oscillating

    # Check scale events
    kubectl describe hpa <name> -n <namespace> | grep -A30 Events
    
    # Increase stabilization window to reduce flapping
    kubectl patch hpa <name> -n <namespace> --type=merge -p '{
      "spec": {
        "behavior": {
          "scaleDown": {"stabilizationWindowSeconds": 600},
          "scaleUp":   {"stabilizationWindowSeconds": 30}
        }
      }
    }'
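
    The effect of a longer scale-down window can be seen in a small simulation. For scale-down, the controller acts on the highest recommendation seen within the window; the sketch below models the window as a number of sync iterations (windowSeconds divided by the 15s sync period) rather than wall-clock time, and the input series is made up.

```python
from collections import deque


def stabilized_scale_down(recommendations, window_syncs):
    """Replica count acted on at each sync: the max recommendation in the window."""
    history = deque(maxlen=window_syncs)
    applied = []
    for recommendation in recommendations:
        history.append(recommendation)
        applied.append(max(history))
    return applied


noisy = [10, 6, 10, 5, 9, 5, 5, 5]
print(stabilized_scale_down(noisy, 1))  # no window: follows every dip
print(stabilized_scale_down(noisy, 3))  # [10, 10, 10, 10, 10, 9, 9, 5]
```

    With the window in place, replicas only drop once every recommendation inside it is low, so brief dips no longer trigger a scale-down followed by an immediate scale-up.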

    Custom Metrics Returning Errors

    # Test custom metrics API directly
    kubectl get --raw \
      "/apis/custom.metrics.k8s.io/v1beta1/namespaces/<ns>/pods/*/<metric-name>"
    
    # Check Prometheus Adapter logs
    kubectl logs -n monitoring -l app.kubernetes.io/name=prometheus-adapter --tail=100
    
    # Verify Prometheus query returns data
    curl -G prometheus-svc.monitoring:9090/api/v1/query \
      --data-urlencode 'query=sum(rate(http_requests_total[2m])) by (pod)'

    HPA spec.replicas Conflict (GitOps Override)

    # Check which field manager owns replicas
    kubectl get deployment <name> -o json | \
      jq '.metadata.managedFields[] | {manager, fields: .fieldsV1."f:spec"."f:replicas"} |
          select(.fields != null)'
    
    # Remove replicas from manifest (GitOps fix)
    # In Deployment YAML, delete the spec.replicas line
    # Then re-apply without spec.replicas
    
    # Force-take ownership of conflicting fields if needed (SSA); if the manifest
    # still sets replicas, this overwrites the value the HPA chose
    kubectl apply --server-side --force-conflicts -f deployment.yaml

    KEDA ScaledObject Not Scaling

    # Check ScaledObject status
    kubectl describe scaledobject <name> -n <namespace>
    
    # Check KEDA operator logs
    kubectl logs -n keda -l app=keda-operator --tail=100 | grep -i error
    
    # Verify trigger authentication
    kubectl describe triggerauthentication <auth-name> -n <namespace>
    
    # Check the HPA KEDA manages
    kubectl get hpa -n <namespace> -l "scaledobject.keda.sh/name=<name>"
    kubectl describe hpa -n <namespace> -l "scaledobject.keda.sh/name=<name>"

    Best Practices

    1. Always set CPU requests: HPA's Utilization target divides current CPU usage by requests.cpu. If any container in the pod lacks a CPU request, utilization cannot be computed and the HPA takes no action on that metric.
    2. Prefer custom/external metrics over CPU for latency-sensitive services — CPU utilization lags behind request rate spikes. A metric like http_requests_per_second reacts faster to traffic surges.
    3. Remove spec.replicas from Deployment manifests managed by HPA — prevent GitOps pipelines from overriding the replica count on every deploy.
    4. Set a meaningful minReplicas: a value of 1 means a single point of failure during deployments. For HA, use at least 2; for critical paths, 3 (spread across zones).
    5. Tune scale-down conservatively — default 300s stabilization window is often too short for services with warm caches or sticky connections. 600–900s is safer for stateful-ish services.
    6. Use KEDA for scale-to-zero and event-driven workloads — native HPA minimum is 1 replica. KEDA handles the 0↔1 transition and provides richer trigger sources out of the box.
    7. Test autoscaling in staging under realistic load — run load tests that ramp up and sustain traffic; verify the HPA correctly computes desired replicas, the behavior policies don't block needed scale-up, and scale-down doesn't happen during momentary lulls in the ramp.
    8. Monitor ScalingLimited condition continuously — when HPA is constrained by maxReplicas under real traffic, it means your ceiling is too low or the workload needs right-sizing. Alert on this and review periodically.