Vertical Pod Autoscaler

    Right-size container resource requests automatically using historical usage data

    autoscaling.k8s.io/v1 | Add-on (not built-in) | Platform Engineer

    While HPA adds or removes pod replicas, the Vertical Pod Autoscaler (VPA) adjusts the resource requests and limits of existing containers. It observes actual resource usage over time, builds statistical recommendations, and optionally applies them — either at pod creation time or by evicting and replacing running pods with updated resource specs. VPA is an add-on installed separately from core Kubernetes; it is not part of the default control plane.

    VPA is not built in
    Install VPA from the kubernetes/autoscaler repo. It deploys three components into the cluster. Some managed Kubernetes offerings provide VPA as a managed feature (GKE and AKS, for example); on others, such as EKS, you install it yourself from the same repo.

    VPA Architecture

    VPA components and data flow:

      [1] VPA Recommender
       ├─ Watches all pods in cluster
       ├─ Reads metrics from metrics-server (CPU/mem usage)
       ├─ Reads OOMKill events from pod events
       ├─ Builds per-container histogram of CPU/mem usage
       └─ Writes recommendations → VPA object .status

      [2] VPA Updater
       ├─ Watches VPA objects with updateMode Recreate or Auto
       ├─ Compares current pod requests to recommendations
       ├─ Evicts pods that are significantly out of range
       └─ Respects PodDisruptionBudgets during eviction

      [3] VPA Admission Controller (webhook)
       ├─ Intercepts pod CREATE requests
       ├─ Looks up VPA recommendation for matching VPA obj
       └─ Mutates resource requests/limits before pod lands

    Data flow:

      metrics-server ──► Recommender ──► VPA.status.recommendation
                                                  │
                                      ┌───────────┴───────────┐
                                      ▼                       ▼
                                   Updater           Admission Controller
                                (evict pods)          (mutate new pods)

    The three components are decoupled — you can run Recommender alone (for advisory-only mode) without Updater or Admission Controller.

    VPA Object Spec

    apiVersion: autoscaling.k8s.io/v1
    kind: VerticalPodAutoscaler
    metadata:
      name: api-server-vpa
      namespace: production
    spec:
      # --- Target workload ---
      targetRef:
        apiVersion: apps/v1
        kind: Deployment
        name: api-server
    
      # --- Update policy ---
      updatePolicy:
        updateMode: Auto          # Off | Initial | Recreate | Auto
        minReplicas: 2            # Don't evict if fewer than 2 replicas running
    
      # --- Container-level policies ---
      resourcePolicy:
        containerPolicies:
          - containerName: app
            # Recommendation boundaries
            minAllowed:
              cpu: 100m
              memory: 128Mi
            maxAllowed:
              cpu: 4
              memory: 8Gi
            # Which resources to manage
            controlledResources: ["cpu", "memory"]
            # Whether to also set limits
            controlledValues: RequestsAndLimits  # or RequestsOnly
    
          - containerName: proxy-sidecar
            # Exclude sidecar from VPA management entirely
            mode: "Off"
    
          - containerName: init-container
            # Manage requests only and leave limits as authored
            controlledValues: RequestsOnly
            minAllowed:
              cpu: 50m
              memory: 64Mi
            maxAllowed:
              cpu: 500m
              memory: 512Mi

    Update Modes

    | Mode | Behavior | Pod restart? | Best for |
    |------|----------|--------------|----------|
    | Off | Recommendations computed and stored in VPA status; nothing applied | Never | Advisory only; Goldilocks; manual right-sizing workflow |
    | Initial | Recommendations applied only to new pods at creation time (via Admission Controller); running pods untouched | Only on natural restarts / rollouts | Jobs, batch workloads; any workload where live eviction is unacceptable |
    | Recreate | Recommendations applied to new pods AND Updater evicts out-of-range running pods | Yes: pods are evicted and recreated | Workloads that can tolerate occasional restarts; non-production |
    | Auto | Same as Recreate today; in future will use in-place updates when available and safe | Yes (currently same as Recreate) | Recommended default when in-place updates stabilize |

    Auto mode evicts pods today
    Despite the name, Auto mode currently behaves identically to Recreate: it evicts pods to apply new resource specs. In-place pod resize (beta in 1.33) is expected to make Auto non-disruptive for CPU changes; memory changes can still force a container restart, depending on the container's resizePolicy and on whether the kubelet can apply the change. Always set minReplicas and define PDBs to limit the eviction blast radius.

    Recommender Algorithm

    The Recommender builds a histogram of CPU and memory usage samples for each container, then derives a target value with safety margins applied.

    Recommender histogram-based approach:

    CPU recommendation:
     ├─ Collects 1-minute CPU usage samples over lookback window (default: 8 days)
     ├─ Builds weighted histogram (recent samples weighted higher)
     ├─ Target = p90 of histogram × safety margin (default: 1.15 = 15% headroom)
     └─ Lower bound = p50; Upper bound = p95 × safety margin

    Memory recommendation:
     ├─ Collects memory usage samples (RSS + page cache working set)
     ├─ OOM events → spike to current usage × OOM bump factor
     ├─ Target = p95 of histogram × safety margin (default: 1.15)
     └─ Upper bound = p99 × safety margin × OOM bump

    Safety margins (configurable via flags):
      --recommendation-margin-fraction=0.15        (15% overhead)
      --pod-recommendation-min-cpu-millicores=25
      --pod-recommendation-min-memory-mb=250

    The Recommender needs at least a few hours of data before its recommendations are reliable. Fresh workloads, or workloads with spiky or seasonal usage, may get recommendations that fit poorly because the histogram has not yet seen a full cycle. The default lookback is 8 days; when history is loaded from Prometheus, adjust it with the --history-length flag.
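To make the histogram arithmetic concrete, here is a minimal Python sketch of the CPU target as described above: a percentile over decay-weighted samples, multiplied by the safety margin and floored at the minimum. It is not the Recommender's actual implementation. The function names, the 24-hour half-life, and the sample data are illustrative assumptions.

```python
from typing import List, Sequence, Tuple


def weighted_percentile(samples: Sequence[Tuple[float, float]], q: float) -> float:
    """Return the smallest value whose cumulative weight reaches fraction q.

    samples holds (value, weight) pairs; q is a fraction in (0, 1].
    """
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    threshold = q * sum(weight for _, weight in ordered)
    running = 0.0
    for value, weight in ordered:
        running += weight
        if running >= threshold:
            return value
    return ordered[-1][0]


def decay_weight(age_hours: float, half_life_hours: float = 24.0) -> float:
    """Weight halves every half_life_hours (assumed value), so recent samples count more."""
    return 0.5 ** (age_hours / half_life_hours)


def cpu_target(
    samples_by_age: List[Tuple[float, float]],
    margin_fraction: float = 0.15,
    min_millicores: float = 25.0,
) -> float:
    """samples_by_age holds (age_hours, usage_millicores) pairs."""
    weighted = [(usage, decay_weight(age)) for age, usage in samples_by_age]
    p90 = weighted_percentile(weighted, 0.90)
    return max(p90 * (1.0 + margin_fraction), min_millicores)


if __name__ == "__main__":
    # Hypothetical samples: a steady 200m with one week-old burst to 900m.
    history = [(float(h), 200.0) for h in range(48)] + [(168.0, 900.0)]
    print(f"target: {cpu_target(history):.0f}m")
```

Because the week-old burst carries a very small weight, it barely moves the percentile; that is the effect of weighting recent samples higher.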

    OOM Kill Handling

    When a container is OOM-killed, the Recommender registers this as a "spike event" in the memory histogram. The spike is set to the memory in use at the time of the OOM (at or near the limit) multiplied by an OOM bump factor (default: 1.2). This prevents the Recommender from re-recommending a value that already caused an OOM.
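As a worked example of the bump, the sketch below computes the synthetic sample recorded after an OOM kill. It assumes only the multiplication described above; the function name and the 1Gi figure are illustrative.

```python
MIB = 1024 * 1024


def oom_spike_bytes(memory_at_oom_bytes: int, bump_ratio: float = 1.2) -> int:
    """Synthetic histogram sample after an OOM kill: memory at OOM times the bump ratio."""
    return int(round(memory_at_oom_bytes * bump_ratio))


if __name__ == "__main__":
    # Hypothetical container killed at a 1Gi limit.
    spike = oom_spike_bytes(1024 * MIB)
    print(f"spike sample: about {spike / MIB:.0f}Mi")
```

The next recommendation is then derived from a histogram that contains this higher sample, so it lands above the value that caused the kill.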

    # Inspect the recommendation bounds for the app container (after an OOM, expect a higher target and upperBound)
    kubectl get vpa api-server-vpa -n production -o yaml | \
      yq '.status.recommendation.containerRecommendations[] |
          select(.containerName == "app") |
          {"lowerBound": .lowerBound, "target": .target, "upperBound": .upperBound}'

    Reading VPA Recommendations

    # Summary view
    kubectl get vpa -n production
    # NAME             MODE   CPU    MEM      PROVIDED   AGE
    # api-server-vpa   Auto   220m   512Mi    True       3d
    
    # Full recommendation details
    kubectl describe vpa api-server-vpa -n production
    # VPA status.recommendation structure
    status:
      recommendation:
        containerRecommendations:
          - containerName: app
            lowerBound:              # Safe minimum (rarely go below this)
              cpu: 100m
              memory: 256Mi
            target:                  # Recommended value — apply this
              cpu: 220m
              memory: 512Mi
            uncappedTarget:          # What VPA would recommend without minAllowed/maxAllowed
              cpu: 220m              # Same as target here: neither bound clipped the recommendation
              memory: 512Mi
            upperBound:              # Should not exceed this (headroom for spikes)
              cpu: 1200m
              memory: 2Gi
      conditions:
        - type: RecommendationProvided
          status: "True"             # False if insufficient data
        - type: LowConfidence
          status: "False"            # True if < 1 hour of data
    lowerBound vs target vs upperBound
    target is the primary recommendation — this is what gets applied. lowerBound is the safe minimum; running below it risks OOM or throttling. upperBound is the maximum the VPA considers safe; setting requests above it wastes capacity. The Updater only evicts if current requests are outside the [lowerBound, upperBound] range.
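The range check can be sketched in a few lines of Python. The parser below handles only the common quantity suffixes and is an illustrative helper, not a Kubernetes library function; the bounds come from the status example above.

```python
from decimal import Decimal

# Binary suffixes are checked first so "512Mi" is not mistaken for a decimal "M".
_BINARY = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40}
_DECIMAL = {"m": Decimal("0.001"), "k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12}


def parse_quantity(text: str) -> Decimal:
    """Convert a quantity such as '220m' or '512Mi' to base units (cores or bytes)."""
    for table in (_BINARY, _DECIMAL):
        for suffix, factor in table.items():
            if text.endswith(suffix):
                return Decimal(text[: -len(suffix)]) * factor
    return Decimal(text)


def within_bounds(request: str, lower: str, upper: str) -> bool:
    """True when lowerBound <= request <= upperBound."""
    return parse_quantity(lower) <= parse_quantity(request) <= parse_quantity(upper)


if __name__ == "__main__":
    print(within_bounds("500m", "100m", "1200m"))   # CPU request inside the window
    print(within_bounds("4Gi", "256Mi", "2Gi"))     # memory request above upperBound
```

A pod whose requests fail this check for a managed resource is the kind of pod the Updater treats as an eviction candidate.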

    Updater Eviction Logic

    The Updater runs periodically and checks whether running pods' resource requests fall within the [lowerBound, upperBound] window. Pods outside this range are candidates for eviction.

    Updater eviction decision tree:

    For each pod in target workload:
     │
     ├─ Is updateMode Off or Initial?                        → skip
     ├─ Is pod in a VPA-excluded namespace?                  → skip
     ├─ Is current request within [lowerBound, upperBound]?  → skip
     ├─ Would eviction violate PDB?                          → skip (try later)
     ├─ Is replica count < minReplicas?                      → skip
     └─ Evict pod → Admission Controller applies recommendation on reschedule
    Updater can evict the only pod of a single-replica Deployment
    The Updater's --min-replicas flag defaults to 2, so out of the box it leaves single-replica workloads alone. If that flag or spec.updatePolicy.minReplicas is lowered to 1, the Updater will evict the sole pod of a single-replica Deployment to apply new resource specs, causing a complete outage until the replacement pod starts. Keep spec.updatePolicy.minReplicas at 2 or higher for production workloads, and ensure a matching PDB exists.
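The eviction checks can be read as an ordered series of guards. The Python sketch below mirrors that order; it is a simplified model, not the Updater's code. The PodState fields are hypothetical inputs, and the replica guard follows the "fewer than minReplicas" wording used in the spec comments.

```python
from dataclasses import dataclass


@dataclass
class PodState:
    update_mode: str           # "Off", "Initial", "Recreate", or "Auto"
    namespace_excluded: bool   # pod lives in a VPA-excluded namespace
    request_in_bounds: bool    # current request within [lowerBound, upperBound]
    pdb_allows_eviction: bool  # evicting now would not violate the PDB
    running_replicas: int
    min_replicas: int


def eviction_decision(pod: PodState) -> str:
    """Walk the guards in order; the first one that matches wins."""
    if pod.update_mode in ("Off", "Initial"):
        return "skip: update mode never evicts"
    if pod.namespace_excluded:
        return "skip: namespace excluded"
    if pod.request_in_bounds:
        return "skip: request already within bounds"
    if not pod.pdb_allows_eviction:
        return "skip: PDB would be violated (try later)"
    if pod.running_replicas < pod.min_replicas:
        return "skip: fewer replicas than minReplicas"
    return "evict"


if __name__ == "__main__":
    pod = PodState("Auto", False, False, True, running_replicas=1, min_replicas=2)
    print(eviction_decision(pod))
```

With one running replica and minReplicas of 2, the pod is skipped even though its requests are out of range.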
    # PDB to protect against VPA evictions
    apiVersion: policy/v1
    kind: PodDisruptionBudget
    metadata:
      name: api-server-pdb
      namespace: production
    spec:
      minAvailable: 2
      selector:
        matchLabels:
          app: api-server
    ---
    # VPA minReplicas guard
    spec:
      updatePolicy:
        updateMode: Auto
        minReplicas: 2      # VPA will not evict if fewer than 2 replicas are running

    evictionRequirements (1.25+)

    spec:
      updatePolicy:
        updateMode: Auto
        evictionRequirements:
          # Only evict when the recommendation is higher than the current request
          - resources: ["cpu", "memory"]
            changeRequirement: TargetHigherThanRequests  # or TargetLowerThanRequests
          # evictionRequirements has no time-window or percentage option. Separately, the
          # Updater's --eviction-tolerance flag (cluster-wide, not per-VPA) limits the
          # fraction of a workload's replicas that may be evicted

    controlledValues: RequestsOnly vs RequestsAndLimits

    | controlledValues | Effect on requests | Effect on limits | Use case |
    |------------------|--------------------|------------------|----------|
    | RequestsAndLimits (default) | Set to recommendation target | Scaled proportionally: limit = request × (original limit / original request) | Most workloads; preserves the original limit/request ratio |
    | RequestsOnly | Set to recommendation target | Unchanged (kept at original value or removed if none) | When limits are intentionally higher (burst allowance); or no limits set |

    Limit/request ratio drift with RequestsAndLimits
    If a container originally had requests.cpu: 100m, limits.cpu: 1000m (10× ratio), VPA will maintain this ratio. If VPA recommends 500m, it sets limits.cpu: 5000m — far more than needed. Consider RequestsOnly and manage limits separately via LimitRange defaults.
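The ratio arithmetic is easy to check by hand. This small Python sketch applies the proportional formula from the table above using integer millicores; the function name is illustrative.

```python
def scaled_limit_millicores(original_request: int, original_limit: int, new_request: int) -> int:
    """limit = new request × (original limit / original request), all in millicores."""
    if original_request <= 0:
        raise ValueError("original request must be positive")
    return new_request * original_limit // original_request


if __name__ == "__main__":
    # The 10x example from the text: 100m request, 1000m limit, VPA recommends 500m.
    print(scaled_limit_millicores(100, 1000, 500))
    # A tighter 2x ratio keeps the limit close to the request.
    print(scaled_limit_millicores(500, 1000, 800))
```

The wider the original ratio, the faster the limit grows with each recommendation, which is the case where RequestsOnly is usually the safer choice.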

    In-Place Pod Vertical Scaling (1.33 Beta)

    Traditionally, changing a pod's resource requests requires restarting the pod (evict + recreate). In-place pod resize, promoted to beta in Kubernetes 1.33, lets the kubelet adjust a running container's CPU and memory requests/limits without recreating the pod; whether the container itself restarts is governed by its resizePolicy.

    In-place resize flow (CPU, no restart needed):
      1. Update pod.spec.containers[i].resources.requests.cpu
      2. kubelet receives the update via the resize subresource
      3. kubelet adjusts the container's cgroup cpu.shares / cpu.quota
      4. Pod continues running: no eviction, no downtime

    Memory resize (restart required):
      Reducing memory limit below current usage → OOM risk
      Kernel cannot shrink RSS of a running process safely
      → kubelet marks resize as "InProgress" then "Infeasible"
      → VPA falls back to eviction for memory changes

    # resizePolicy on container spec (in-place resize control)
    spec:
      containers:
        - name: app
          resources:
            requests:
              cpu: 500m
              memory: 512Mi
            limits:
              cpu: 2
              memory: 1Gi
          resizePolicy:
            - resourceName: cpu
              restartPolicy: NotRequired    # CPU resize without restart
            - resourceName: memory
              restartPolicy: RestartContainer  # Memory resize requires restart
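    # Trigger an in-place resize manually (the default strategic merge patch matches
    # the container by name; a JSON merge patch would replace the whole containers list)
    kubectl patch pod api-server-abc123 --subresource=resize \
      -p '{"spec":{"containers":[{"name":"app","resources":{"requests":{"cpu":"800m"}}}]}}'
    
    # Check resize status
    kubectl get pod api-server-abc123 -o jsonpath='{.status.resize}'
    # Possible values: Proposed | InProgress | Deferred | Infeasible
    # From 1.33, the kubelet reports this via the PodResizePending and PodResizeInProgress pod conditions instead
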
    | Resize status | Meaning |
    |---------------|---------|
    | Proposed | Resize requested, kubelet hasn't processed yet |
    | InProgress | kubelet is applying the change |
    | Deferred | Not enough node resources now; will retry when available |
    | Infeasible | Cannot be applied (e.g., memory reduction below RSS); requires pod restart |
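How resizePolicy decides between an in-place change and a container restart can be sketched as a lookup. This Python model assumes the documented default of NotRequired for resources without an explicit entry; the function name and inputs are illustrative.

```python
from typing import Dict, Iterable, List


def restart_needed(resize_policy: List[Dict[str, str]], changed_resources: Iterable[str]) -> bool:
    """True if any changed resource has restartPolicy RestartContainer."""
    policy = {entry["resourceName"]: entry["restartPolicy"] for entry in resize_policy}
    # Resources without an entry fall back to NotRequired.
    return any(policy.get(name, "NotRequired") == "RestartContainer" for name in changed_resources)


if __name__ == "__main__":
    # Same policy as the container spec above.
    policy = [
        {"resourceName": "cpu", "restartPolicy": "NotRequired"},
        {"resourceName": "memory", "restartPolicy": "RestartContainer"},
    ]
    print(restart_needed(policy, ["cpu"]))            # CPU-only change
    print(restart_needed(policy, ["cpu", "memory"]))  # change touching memory
```

A resize that touches any resource marked RestartContainer restarts that container, even when the other resources in the same request could have been changed in place.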

    VPA + HPA Interaction

    See also: HPA page — VPA interaction section. The key conflict: both HPA (on CPU utilization) and VPA change the effective CPU utilization ratio — VPA by changing requests, HPA by changing replicas. This creates oscillation.

    | Configuration | Conflict? | Safe? |
    |---------------|-----------|-------|
    | HPA on CPU/memory + VPA Auto | Yes: feedback loop | No |
    | HPA on CPU/memory + VPA Off (advisor only) | None | Yes; use VPA recommendations manually |
    | HPA on custom/external metrics + VPA Auto on CPU/memory | None; different signals | Yes; recommended pattern |
    | VPA only (no HPA) on CPU/memory | None | Yes; for workloads where replica count is fixed |
    | HPA on CPU + VPA RequestsOnly on memory only | Marginal: VPA changes memory, HPA watches CPU | Usually safe with monitoring |

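The feedback loop in the first row follows from HPA's replica formula, desiredReplicas = ceil(currentReplicas × currentUtilization / targetUtilization), where utilization is usage divided by the request. The Python sketch below works one round of it with integer arithmetic. It ignores HPA's tolerance band and stabilization window, and the workload numbers are hypothetical.

```python
def hpa_desired_replicas(replicas: int, usage_m: int, request_m: int, target_percent: int) -> int:
    """ceil(replicas × (usage / request) / (target_percent / 100)) using integers."""
    numerator = replicas * usage_m * 100
    denominator = request_m * target_percent
    return -(-numerator // denominator)  # ceiling division


if __name__ == "__main__":
    # 4 replicas each using 400m, HPA target 50% CPU utilization.
    before = hpa_desired_replicas(4, usage_m=400, request_m=500, target_percent=50)
    # VPA raises the request to 1000m; usage per pod has not changed.
    after = hpa_desired_replicas(4, usage_m=400, request_m=1000, target_percent=50)
    print(before, after)
```

The same load yields a lower replica target purely because VPA changed the denominator. Fewer replicas then raise per-pod usage, which VPA sees as a reason to raise requests again.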
    # Safe pattern: HPA on RPS, VPA on CPU+memory
    # VPA (manages CPU and memory requests)
    apiVersion: autoscaling.k8s.io/v1
    kind: VerticalPodAutoscaler
    metadata:
      name: api-server-vpa
    spec:
      targetRef:
        apiVersion: apps/v1
        kind: Deployment
        name: api-server
      updatePolicy:
        updateMode: Auto
      resourcePolicy:
        containerPolicies:
          - containerName: app
            controlledResources: ["cpu", "memory"]
    ---
    # HPA (manages replicas based on RPS, not CPU)
    apiVersion: autoscaling/v2
    kind: HorizontalPodAutoscaler
    metadata:
      name: api-server-hpa
    spec:
      scaleTargetRef:
        apiVersion: apps/v1
        kind: Deployment
        name: api-server
      minReplicas: 3
      maxReplicas: 50
      metrics:
        - type: Pods
          pods:
            metric:
              name: http_requests_per_second   # Custom metric — no conflict with VPA
            target:
              type: AverageValue
              averageValue: "500"

    Goldilocks — VPA Advisor Dashboard

    Goldilocks (by Fairwinds) automates the Off-mode VPA workflow: it creates a VPA object in Off mode for every Deployment in labeled namespaces, then provides a dashboard showing current vs recommended resource requests and the estimated cost difference.

    Goldilocks workflow:
      1. Label namespace:
         kubectl label ns production goldilocks.fairwinds.com/enabled=true
      2. Goldilocks controller creates VPA (mode: Off) for every Deployment in namespace
      3. VPA Recommender populates recommendations in each VPA status
      4. Goldilocks dashboard reads VPA statuses and renders:
          ├─ Current requests vs recommendation
          ├─ Estimated monthly cost at current vs recommended
          └─ Copy-paste YAML snippet for the new resource block

    # Install Goldilocks via Helm
    helm repo add fairwinds-stable https://charts.fairwinds.com/stable
    helm upgrade --install goldilocks fairwinds-stable/goldilocks \
      --namespace goldilocks --create-namespace
    
    # Enable for a namespace
    kubectl label ns production goldilocks.fairwinds.com/enabled=true
    
    # Port-forward the dashboard
    kubectl port-forward -n goldilocks svc/goldilocks-dashboard 8080:80
    
    # List VPAs Goldilocks created
    kubectl get vpa -n production -l source=goldilocks

    VPA for JVM Workloads

    JVM applications manage their own heap independently of the container's memory limit. VPA observes the container's total RSS, not just heap — including metaspace, thread stacks, off-heap buffers (Netty, native libs). This can lead to confusing recommendations.

    JVM heap and VPA interaction
    If the JVM is configured with a fixed heap (-Xmx2g) and the container limit is 3Gi, VPA observes ~2.5Gi RSS and may recommend lowering the limit below the current RSS — triggering an OOM kill. Use -XX:MaxRAMPercentage=75 instead of -Xmx so the JVM heap automatically scales with the container limit VPA sets.
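A quick calculation shows why a percentage-based heap tracks the limit while a fixed one does not. This Python sketch compares the non-heap headroom left under a few limits VPA might set; the limits are hypothetical and the helper names are illustrative.

```python
def percentage_heap_mib(limit_mib: int, max_ram_percentage: float = 75.0) -> float:
    """Max heap when the JVM sizes it as a percentage of the container memory limit."""
    return limit_mib * max_ram_percentage / 100.0


def non_heap_headroom_mib(limit_mib: int, heap_mib: float) -> float:
    """Memory left for metaspace, thread stacks, and off-heap buffers."""
    return limit_mib - heap_mib


if __name__ == "__main__":
    fixed_heap = 2048  # -Xmx2g
    for limit in (3072, 2560, 2304):  # hypothetical limits set by VPA
        fixed = non_heap_headroom_mib(limit, fixed_heap)
        scaled = non_heap_headroom_mib(limit, percentage_heap_mib(limit))
        print(f"limit {limit}Mi: fixed-heap headroom {fixed:.0f}Mi, 75% heap headroom {scaled:.0f}Mi")
```

With a fixed heap, every reduction of the limit comes entirely out of non-heap headroom. With MaxRAMPercentage the heap shrinks along with the limit, so a quarter of the limit stays available for everything else.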
    # JVM container — use percentage-based heap, not fixed -Xmx
    containers:
      - name: java-app
        image: myapp:v2
        env:
          - name: JAVA_TOOL_OPTIONS
            value: >-
              -XX:MaxRAMPercentage=75
              -XX:InitialRAMPercentage=50
              -XX:+UseG1GC
              -XX:MaxGCPauseMillis=200
        resources:
          requests:
            cpu: 500m
            memory: 1Gi   # VPA will adjust this; JVM heap = 75% of limit
          limits:
            cpu: 2
            memory: 2Gi
    # VPA for JVM: wider bounds to account for metaspace variability
    spec:
      resourcePolicy:
        containerPolicies:
          - containerName: java-app
            minAllowed:
              cpu: 200m
              memory: 512Mi     # Absolute floor — JVM won't start below ~256Mi
            maxAllowed:
              cpu: 8
              memory: 16Gi
            controlledValues: RequestsAndLimits  # Scale limit with request (preserves ratio)

    VPA for Batch Jobs (Initial Mode)

    For batch Jobs where pods are short-lived, updateMode: Initial is ideal: VPA applies the recommendation when the pod is created (at the start of each job run) but never evicts running pods mid-job.

    apiVersion: autoscaling.k8s.io/v1
    kind: VerticalPodAutoscaler
    metadata:
      name: batch-etl-vpa
      namespace: platform
    spec:
      targetRef:
        apiVersion: batch/v1
        kind: CronJob
        name: nightly-etl
      updatePolicy:
        updateMode: Initial   # Apply on pod start, never evict running pod
      resourcePolicy:
        containerPolicies:
          - containerName: etl-worker
            minAllowed:
              cpu: 100m
              memory: 256Mi
            maxAllowed:
              cpu: 16
              memory: 32Gi    # Generous upper bound for large datasets
            controlledResources: ["cpu", "memory"]

    Operational Commands

    # List all VPAs and their modes
    kubectl get vpa -A -o custom-columns=\
    'NAMESPACE:.metadata.namespace,NAME:.metadata.name,MODE:.spec.updatePolicy.updateMode,READY:.status.conditions[0].status'
    
    # Get full recommendation for a VPA (the jsonpath template stays on one line:
    # inside single quotes a trailing backslash is literal, not a line continuation)
    kubectl get vpa <name> -n <namespace> -o jsonpath='{range .status.recommendation.containerRecommendations[*]}{.containerName}{"\n"}  target: {.target}{"\n"}  lowerBound: {.lowerBound}{"\n"}  upperBound: {.upperBound}{"\n\n"}{end}'
    
    # Show VPA conditions
    kubectl get vpa <name> -n <namespace> -o jsonpath='{.status.conditions}' | jq .
    
    # Check VPA Recommender logs (useful for diagnosing no-recommendation)
    kubectl logs -n kube-system -l app=vpa-recommender --tail=100
    
    # Check VPA Updater logs (see which pods were evicted and why)
    kubectl logs -n kube-system -l app=vpa-updater --tail=100
    
    # Check VPA Admission Controller logs
    kubectl logs -n kube-system -l app=vpa-admission-controller --tail=100
    
    # Temporarily disable VPA updates (flip to Off without deleting)
    kubectl patch vpa <name> -n <namespace> --type=merge \
      -p '{"spec":{"updatePolicy":{"updateMode":"Off"}}}'

    Metrics

    | Metric | Labels | Use |
    |--------|--------|-----|
    | vpa_recommender_recommendation_latency_seconds | namespace, vpa | Time to generate a recommendation |
    | vpa_updater_evictions_total | namespace | Total evictions triggered by VPA Updater |
    | vpa_admission_controller_admission_duration_seconds | | Latency of VPA webhook on pod creation |
    | vpa_recommender_memory_estimation_quality | namespace, vpa | Confidence score of memory recommendations |
    | kube_verticalpodautoscaler_status_recommendation_containerrecommendations_target | container, resource | Current target recommendation per container/resource |

    Alerting Rules

    groups:
      - name: vpa
        rules:
          # VPA not providing recommendations (insufficient data)
          - alert: VPANoRecommendation
            expr: |
              kube_verticalpodautoscaler_status_condition{
                condition="RecommendationProvided",status="False"} == 1
            for: 2h
            labels:
              severity: warning
            annotations:
              summary: "VPA {{ $labels.namespace }}/{{ $labels.vpa }} has no recommendation after 2h"
              description: "Ensure metrics-server is running and the workload has been running for >30min"
    
          # VPA Admission Controller webhook latency too high
          - alert: VPAAdmissionHighLatency
            expr: |
              histogram_quantile(0.99,
                rate(vpa_admission_controller_admission_duration_seconds_bucket[5m])) > 1
            for: 5m
            labels:
              severity: warning
            annotations:
              summary: "VPA Admission Controller p99 latency > 1s — pod starts may be slow"
    
          # VPA Updater evicting pods at high rate
          - alert: VPAHighEvictionRate
            expr: rate(vpa_updater_evictions_total[10m]) > 0.1
            for: 5m
            labels:
              severity: warning
            annotations:
              summary: "VPA Updater is evicting >6 pods/minute — check recommendation bounds"
    
          # Pod OOM killed (feeds into VPA Recommender — also indicates under-sizing)
          - alert: PodOOMKilled
            expr: |
              kube_pod_container_status_last_terminated_reason{reason="OOMKilled"} == 1
            for: 0m
            labels:
              severity: warning
            annotations:
              summary: "Pod {{ $labels.namespace }}/{{ $labels.pod }} container {{ $labels.container }} was OOM killed"

    Runbooks

    VPA Not Generating Recommendations

    # Check VPA conditions
    kubectl describe vpa <name> -n <namespace> | grep -A10 Conditions
    
    # Verify metrics-server is running
    kubectl get deployment metrics-server -n kube-system
    
    # Verify VPA can read metrics
    kubectl logs -n kube-system -l app=vpa-recommender | grep "Failed\|Error\|Warn"
    
    # Check if workload has been running long enough
    kubectl get pods -n <namespace> -l app=<app> \
      -o jsonpath='{.items[*].status.startTime}'
    
    # VPA needs ~1 hour of metrics before producing recommendations

    VPA Evicting Too Aggressively

    # Switch to Off mode immediately to stop evictions
    kubectl patch vpa <name> -n <namespace> --type=merge \
      -p '{"spec":{"updatePolicy":{"updateMode":"Off"}}}'
    
    # Check Updater logs for eviction reasoning
    kubectl logs -n kube-system -l app=vpa-updater | grep <namespace>
    
    # Add or raise minReplicas to prevent single-pod evictions
    kubectl patch vpa <name> -n <namespace> --type=merge \
      -p '{"spec":{"updatePolicy":{"minReplicas":2}}}'
    
    # Widen maxAllowed to reduce recommendation oscillation
    # (a merge patch replaces the whole containerPolicies list, so include every policy you want to keep)
    kubectl patch vpa <name> -n <namespace> --type=merge \
      -p '{"spec":{"resourcePolicy":{"containerPolicies":[{"containerName":"app","maxAllowed":{"cpu":"8","memory":"16Gi"}}]}}}'

    VPA Recommendation Seems Wrong (Too High or Too Low)

    # Compare actual usage to recommendation
    kubectl top pods -n <namespace> -l app=<app> --containers
    
    # Get VPA targets
    kubectl get vpa <name> -n <namespace> -o yaml | grep -A20 containerRecommendations
    
    # Check if OOM events are inflating memory recommendation
    kubectl get events -n <namespace> --field-selector reason=OOMKilling
    
    # If recommendation is stale (workload changed), delete and recreate VPA
    # to reset the history window
    kubectl delete vpa <name> -n <namespace>
    kubectl apply -f vpa.yaml

    VPA Admission Controller Not Applying Recommendations

    # Check if webhook is registered
    kubectl get mutatingwebhookconfigurations | grep vpa
    
    # Verify Admission Controller is running
    kubectl get deployment vpa-admission-controller -n kube-system
    
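    # Inspect a pod of the target workload created after the VPA existed; the webhook
    # records what it changed in the vpaUpdates annotation. (A bare pod from kubectl run
    # has no owning controller, so no VPA matches it and it cannot test the webhook.)
    kubectl get pod <pod-name> -n <namespace> \
      -o jsonpath='{.metadata.annotations.vpaUpdates}'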
    
    # Check logs
    kubectl logs -n kube-system -l app=vpa-admission-controller | tail -50

    In-Place Resize Stuck in Infeasible

    # Check resize status
    kubectl get pod <pod-name> -n <namespace> -o jsonpath='{.status.resize}'
    
    # Infeasible = requested change cannot be done in-place (e.g., memory reduction)
    # Solution: delete the pod and let controller recreate with new spec
    kubectl delete pod <pod-name> -n <namespace>
    
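    # Deferred = node lacks capacity right now; the kubelet retries when resources free up.
    # To move this pod sooner, delete it as above so the scheduler can place the replacement
    # on a node with room. Draining the node also works but evicts every pod on it:
    kubectl drain <node> --ignore-daemonsets --delete-emptydir-data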

    Best Practices

    1. Start with updateMode: Off (Goldilocks) — run VPA in advisory mode for 1–2 weeks before enabling updates. Validate that recommendations match intuition before allowing automatic evictions.
    2. Always set minReplicas: 2 in VPA updatePolicy — prevents the Updater from evicting the only pod of a single-replica Deployment. For critical services, set 3 or higher.
    3. Pair VPA with PDBs — VPA Updater respects PodDisruptionBudgets. Define a PDB with minAvailable: 2 alongside every VPA-managed Deployment to cap eviction impact.
    4. Use RequestsOnly for workloads with intentional burst limits — preserving the original limit/request ratio via RequestsAndLimits can produce unexpectedly large limits if the original ratio was wide.
    5. Set explicit minAllowed and maxAllowed — unbounded VPA can set CPU to 50m (causing throttling) or memory to 64Gi (blocking scheduling). Always bound recommendations to the realistic operational range.
    6. Use Initial mode for batch/Job workloads — running pods should not be evicted mid-job. Initial mode applies recommendations only at pod creation, which happens naturally on each job run.
    7. For JVM workloads, use -XX:MaxRAMPercentage instead of -Xmx — fixed heap sizes break VPA's ability to right-size memory without causing OOMs. Percentage-based heap automatically adjusts with the container limit.
    8. Don't use VPA Auto mode with HPA CPU/memory scaling — both controllers adjust the effective CPU utilization ratio through different levers, creating a feedback oscillation. Use HPA on custom/external metrics when VPA manages CPU/memory.