Vertical Pod Autoscaler

Right-size container resource requests automatically using historical usage data

API: autoscaling.k8s.io/v1 · Type: Add-on (not built-in) · Audience: Platform Engineer

While HPA adds or removes pod replicas, the Vertical Pod Autoscaler (VPA) adjusts the resource requests and limits of existing containers. It observes actual resource usage over time, builds statistical recommendations, and optionally applies them — either at pod creation time or by evicting and replacing running pods with updated resource specs. VPA is an add-on installed separately from core Kubernetes; it is not part of the default control plane.

VPA is not built in
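Install VPA from the kubernetes/autoscaler repo; the install deploys three components (Recommender, Updater, Admission Controller) into the cluster. Some managed Kubernetes offerings ship VPA as a managed feature (for example GKE, and AKS as an add-on); on others, such as EKS, you install it yourself.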

VPA Architecture

VPA components and data flow:

[1] VPA Recommender
 ├─ Watches all pods in cluster
 ├─ Reads metrics from metrics-server (CPU/mem usage)
 ├─ Reads OOMKill events from pod events
 ├─ Builds per-container histogram of CPU/mem usage
 └─ Writes recommendations → VPA object .status

[2] VPA Updater
 ├─ Watches VPA objects with updateMode != Off
 ├─ Compares current pod requests to recommendations
 ├─ Evicts pods that are significantly out of range
 └─ Respects PodDisruptionBudgets during eviction

[3] VPA Admission Controller (webhook)
 ├─ Intercepts pod CREATE requests
 ├─ Looks up VPA recommendation for matching VPA obj
 └─ Mutates resource requests/limits before pod lands

Data flow:

metrics-server ──► Recommender ──► VPA.status.recommendation
                                             │
                         ┌───────────────────┤
                         ▼                   ▼
                      Updater        Admission Controller
                   (evict pods)       (mutate new pods)

The three components are decoupled — you can run Recommender alone (for advisory-only mode) without Updater or Admission Controller.

VPA Object Spec

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: api-server-vpa
  namespace: production
spec:
  # --- Target workload ---
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-server

  # --- Update policy ---
  updatePolicy:
    updateMode: Auto          # Off | Initial | Recreate | Auto
    minReplicas: 2            # Don't evict if fewer than 2 replicas running

  # --- Container-level policies ---
  resourcePolicy:
    containerPolicies:
      - containerName: app
        # Recommendation boundaries
        minAllowed:
          cpu: 100m
          memory: 128Mi
        maxAllowed:
          cpu: 4
          memory: 8Gi
        # Which resources to manage
        controlledResources: ["cpu", "memory"]
        # Whether to also set limits
        controlledValues: RequestsAndLimits  # or RequestsOnly

      - containerName: proxy-sidecar
        # Exclude sidecar from VPA management entirely
        mode: "Off"

      - containerName: init-container
        # Init container: only set requests (limits rarely matter)
        controlledValues: RequestsOnly
        minAllowed:
          cpu: 50m
          memory: 64Mi
        maxAllowed:
          cpu: 500m
          memory: 512Mi

Update Modes

Mode	Behavior	Pod restart?	Best for
`Off`	Recommendations computed and stored in VPA status; nothing applied	Never	Advisory only; Goldilocks; manual right-sizing workflow
`Initial`	Recommendations applied only to new pods at creation time (via Admission Controller); running pods untouched	Only on natural restarts / rollouts	Jobs, batch workloads; any workload where live eviction is unacceptable
`Recreate`	Recommendations applied to new pods AND Updater evicts out-of-range running pods	Yes — pods are evicted and recreated	Workloads that can tolerate occasional restarts; non-production
`Auto`	Same as Recreate today; in future will use in-place updates when available and safe	Yes (currently same as Recreate)	Recommended default when in-place updates stabilize

Auto mode evicts pods today
Despite the name, Auto mode currently behaves identically to Recreate: it evicts pods to apply new resource specs. In-place pod resize (beta in 1.33) is intended to make Auto non-disruptive for CPU changes; memory changes may still restart the container, depending on its resizePolicy and on whether the limit is being lowered. Always set minReplicas and define PDBs to limit the eviction blast radius.

Recommender Algorithm

The Recommender builds a histogram of CPU and memory usage samples for each container, then derives a target value with safety margins applied.

Recommender histogram-based approach:

CPU recommendation:
 ├─ Collects 1-minute CPU usage samples over the lookback window (default: 8 days)
 ├─ Builds a weighted histogram (recent samples weighted higher)
 ├─ Target = p90 of histogram × safety margin (default: 1.15 = 15% headroom)
 └─ Lower bound = p50; Upper bound = p95 × safety margin

Memory recommendation:
 ├─ Collects memory usage samples (working set: RSS + active page cache)
 ├─ OOM events → spike to current usage × OOM bump factor
 ├─ Target = p90 of histogram × safety margin (upstream default percentile; margin default: 1.15)
 └─ Lower bound = p50; Upper bound = p95 × safety margin

Safety margins (configurable via Recommender flags):
  --recommendation-margin-fraction=0.15        (15% overhead)
  --pod-recommendation-min-cpu-millicores=25
  --pod-recommendation-min-memory-mb=250

The Recommender requires at least a few hours of data before producing reliable recommendations. Fresh workloads or those with spiky/seasonal patterns may receive under-fitted recommendations. The default lookback is 8 days — adjust with --history-length flag.
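
As a rough illustration of the percentile-plus-margin idea described above, the following Python sketch computes a CPU target from timestamped usage samples. It is not the Recommender's actual implementation (which works on bucketed, decaying histograms); the function names, the 24-hour half-life, and the sample data are assumptions made for this example.

```python
def weighted_percentile(samples, q, now_s, half_life_s=24 * 3600):
    """Return the smallest value whose cumulative weight reaches q of the total.

    samples: list of (timestamp_s, value); newer samples get exponentially more weight.
    """
    weighted = sorted(
        (value, 0.5 ** ((now_s - ts) / half_life_s)) for ts, value in samples
    )
    total = sum(weight for _, weight in weighted)
    running = 0.0
    for value, weight in weighted:
        running += weight
        if running >= q * total:
            return value
    return weighted[-1][0]


def cpu_target_millicores(samples, now_s, margin_fraction=0.15, floor_millicores=25):
    """p90 of the weighted samples plus the safety margin, never below the floor."""
    p90 = weighted_percentile(samples, 0.90, now_s)
    return max(floor_millicores, round(p90 * (1 + margin_fraction)))


now = 10 * 24 * 3600  # pretend "now" is day 10 (hypothetical data)
recent = [(now, 100)] * 9 + [(now, 400), (now, 1000)]
print(cpu_target_millicores(recent, now))  # 460: p90 is 400m, plus 15% margin
```

Because old samples decay, a burst from ten days ago barely moves the result, while the floor keeps near-idle containers from being recommended below 25m.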

OOM Kill Handling

When a container is OOM-killed, the Recommender registers this as a "spike event" in the memory histogram. The spike is set to the memory limit at time of OOM multiplied by an OOM bump factor (default: 1.2). This prevents the recommender from re-recommending a limit that already caused an OOM.

# Show the recommendation bounds for the app container
# (the memory target rises after OOM kills have been recorded)
kubectl get vpa api-server-vpa -n production -o yaml | \
  yq '.status.recommendation.containerRecommendations[] |
      select(.containerName == "app") |
      {"lowerBound": .lowerBound, "target": .target, "upperBound": .upperBound}'

Reading VPA Recommendations

# Summary view
kubectl get vpa -n production
# NAME             MODE   CPU    MEM      PROVIDED   AGE
# api-server-vpa   Auto   220m   512Mi    True       3d

# Full recommendation details
kubectl describe vpa api-server-vpa -n production

# VPA status.recommendation structure
status:
  recommendation:
    containerRecommendations:
      - containerName: app
        lowerBound:              # Safe minimum (rarely go below this)
          cpu: 100m
          memory: 256Mi
        target:                  # Recommended value — apply this
          cpu: 220m
          memory: 512Mi
        uncappedTarget:          # What VPA would recommend without minAllowed/maxAllowed
          cpu: 220m              # Equals target here because no cap applies
          memory: 512Mi
        upperBound:              # Should not exceed this (headroom for spikes)
          cpu: 1200m
          memory: 2Gi
  conditions:
    - type: RecommendationProvided
      status: "True"             # False if insufficient data
    - type: LowConfidence
      status: "False"            # True if < 1 hour of data

lowerBound vs target vs upperBound
target is the primary recommendation — this is what gets applied. lowerBound is the safe minimum; running below it risks OOM or throttling. upperBound is the maximum the VPA considers safe; setting requests above it wastes capacity. The Updater only evicts if current requests are outside the [lowerBound, upperBound] range.

Updater Eviction Logic

The Updater runs periodically and checks whether running pods' resource requests fall within the [lowerBound, upperBound] window. Pods outside this range are candidates for eviction.

Updater eviction decision tree:

For each pod in target workload:
 │
 ├─ Is updateMode Off or Initial? → skip
 ├─ Is pod in a VPA-excluded namespace? → skip
 ├─ Is current request within [lowerBound, upperBound]? → skip
 ├─ Would eviction violate PDB? → skip (try later)
 ├─ Is live replica count < minReplicas? → skip
 └─ Evict pod → Admission Controller applies recommendation when the replacement pod is created
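
The decision tree above can be sketched as a small Python function. This is a simplified model, not the Updater's code: the PodView type, its field names, and the return strings are hypothetical, and the replica check uses the "fewer than minReplicas" reading from the spec comment earlier on this page.

```python
from dataclasses import dataclass


@dataclass
class PodView:
    """Hypothetical snapshot of the facts the decision tree looks at."""
    request: float            # current request for one resource, e.g. CPU millicores
    lower_bound: float
    upper_bound: float
    live_replicas: int
    pdb_allows_eviction: bool
    namespace_excluded: bool = False


def eviction_decision(pod, update_mode, min_replicas=2):
    """Walk the checks in order and return 'evict' or the reason for skipping."""
    if update_mode in ("Off", "Initial"):
        return "skip: mode never evicts"
    if pod.namespace_excluded:
        return "skip: namespace excluded"
    if pod.lower_bound <= pod.request <= pod.upper_bound:
        return "skip: request within bounds"
    if not pod.pdb_allows_eviction:
        return "skip: PDB would be violated (retry later)"
    if pod.live_replicas < min_replicas:
        return "skip: too few replicas"
    return "evict"


pod = PodView(request=50, lower_bound=100, upper_bound=1200,
              live_replicas=3, pdb_allows_eviction=True)
print(eviction_decision(pod, "Auto"))     # evict
print(eviction_decision(pod, "Initial"))  # skip: mode never evicts
```

Ordering matters: the cheap mode and bounds checks come first, so most pods are skipped before the PDB and replica checks are evaluated.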

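Single-replica Deployments and minReplicas
The Updater's global --min-replicas flag defaults to 2, so out of the box it does not evict the sole pod of a single-replica Deployment; such workloads pick up new recommendations only when their pod restarts for another reason. If the threshold is lowered to 1 (via the flag or spec.updatePolicy.minReplicas), the Updater will evict that only pod to apply new resource specs, causing a complete outage until the replacement pod starts. Keep spec.updatePolicy.minReplicas at 2 or higher for production workloads, and ensure a matching PDB exists.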

# PDB to protect against VPA evictions
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: api-server-pdb
  namespace: production
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: api-server
---
# VPA minReplicas guard
spec:
  updatePolicy:
    updateMode: Auto
    minReplicas: 2      # VPA will not evict if fewer than 2 replicas are running

evictionRequirements (1.25+)

spec:
  updatePolicy:
    updateMode: Auto
    evictionRequirements:
      # Only evict when the recommendation is higher than the current requests
      # (scale-up only; scale-downs wait for a natural restart)
      - resources: ["cpu", "memory"]
        changeRequirement: TargetHigherThanRequests
      # Note: the Updater's --eviction-tolerance flag is global, not per-VPA; it bounds
      # the fraction of a workload's replicas that may be evicted at once

controlledValues: RequestsOnly vs RequestsAndLimits

controlledValues	Effect on requests	Effect on limits	Use case
`RequestsAndLimits` (default)	Set to recommendation target	Scaled proportionally: `limit = request × (original limit / original request)`	Most workloads — preserves the original limit/request ratio
`RequestsOnly`	Set to recommendation target	Unchanged (kept at original value or removed if none)	When limits are intentionally higher (burst allowance); or no limits set

Limit/request ratio drift with RequestsAndLimits
If a container originally had requests.cpu: 100m, limits.cpu: 1000m (10× ratio), VPA will maintain this ratio. If VPA recommends 500m, it sets limits.cpu: 5000m — far more than needed. Consider RequestsOnly and manage limits separately via LimitRange defaults.
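
The proportional scaling in the table above is simple arithmetic. A minimal Python sketch, assuming CPU values in millicores and a hypothetical helper name:

```python
def scaled_limit(original_request_m, original_limit_m, recommended_request_m):
    """Limit that keeps original_limit / original_request unchanged (millicores)."""
    return recommended_request_m * original_limit_m // original_request_m


print(scaled_limit(100, 1000, 500))  # 5000: a 10x ratio turns a 500m request into a 5-core limit
print(scaled_limit(250, 500, 400))   # 800: a 2x ratio stays modest
```

The wider the original ratio, the faster the limit grows with each recommendation, which is why RequestsOnly is often the safer choice for burst-style limits.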

In-Place Pod Vertical Scaling (1.33 Beta)

Traditionally, changing a pod's resource requests requires recreating the pod (evict + recreate). In-place pod resize, alpha in 1.27 and promoted to beta in 1.33, allows container requests and limits to be adjusted on a running pod; CPU changes typically apply without a container restart.

In-place resize flow (CPU, no restart needed):
 1. Update pod.spec.containers[i].resources.requests.cpu
 2. kubelet receives the update via the resize subresource
 3. kubelet adjusts the container's cgroup cpu.shares / cpu.quota
 4. Pod continues running: no eviction, no downtime

Memory resize (restart may be required):
 ├─ Reducing the memory limit below current usage → OOM risk
 ├─ Kernel cannot safely shrink the RSS of a running process
 ├─ kubelet marks the resize as "InProgress", then "Infeasible"
 └─ VPA falls back to eviction for memory changes

# resizePolicy on container spec (in-place resize control)
spec:
  containers:
    - name: app
      resources:
        requests:
          cpu: 500m
          memory: 512Mi
        limits:
          cpu: 2
          memory: 1Gi
      resizePolicy:
        - resourceName: cpu
          restartPolicy: NotRequired    # CPU resize without restart
        - resourceName: memory
          restartPolicy: RestartContainer  # Memory resize requires restart

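# Trigger an in-place resize manually (needs a kubectl version that supports --subresource=resize)
# Use the default strategic merge patch so the containers list is merged by name;
# a JSON merge patch (--type=merge) would replace the whole list
kubectl patch pod api-server-abc123 --subresource=resize \
  -p '{"spec":{"containers":[{"name":"app","resources":{"requests":{"cpu":"800m"}}}]}}'

# Check resize status
kubectl get pod api-server-abc123 -o jsonpath='{.status.resize}'
# Possible values: Proposed | InProgress | Deferred | Infeasible
# (newer releases report this via the PodResizePending / PodResizeInProgress pod conditions instead)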

Resize status	Meaning
`Proposed`	Resize requested, kubelet hasn't processed yet
`InProgress`	kubelet is applying the change
`Deferred`	Not enough node resources now; will retry when available
`Infeasible`	Cannot be applied (e.g., memory reduction below RSS); requires pod restart

VPA + HPA Interaction

See also: HPA page — VPA interaction section. The key conflict: both HPA (on CPU utilization) and VPA change the effective CPU utilization ratio — VPA by changing requests, HPA by changing replicas. This creates oscillation.

Configuration	Conflict?	Safe?
HPA on CPU/memory + VPA Auto	Yes — feedback loop	No
HPA on CPU/memory + VPA Off (advisor only)	None	Yes — use VPA recommendations manually
HPA on custom/external metrics + VPA Auto on CPU/memory	None — different signals	Yes — recommended pattern
VPA only (no HPA) on CPU/memory	None	Yes — for workloads where replica count is fixed
HPA on CPU + VPA `RequestsOnly` on memory only	Marginal — VPA changes memory, HPA watches CPU	Usually safe with monitoring
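
To see why the first row of the matrix is unsafe, the sketch below replays the interaction with the documented HPA formula, desiredReplicas = ceil(currentReplicas × currentUtilization / targetUtilization). The workload numbers and the sequence of VPA recommendations are hypothetical, total CPU demand is assumed constant, and the HPA's tolerance band is ignored for simplicity.

```python
def hpa_desired_replicas(current_replicas, usage_m, request_m, target_pct):
    """ceil(currentReplicas * currentUtilization / targetUtilization), in integer math."""
    numerator = current_replicas * usage_m * 100
    denominator = request_m * target_pct
    return -(-numerator // denominator)  # ceiling division


total_usage_m = 1600  # assumed constant total CPU demand across the Deployment
replicas = 4          # 400m used per pod; with a 500m request that is 80%, exactly on target

# Each time VPA lowers the request toward observed usage, utilization jumps
# and the HPA (target 80%) adds replicas, which lowers per-pod usage again.
for new_request_m in (500, 400, 320):  # hypothetical VPA recommendations over time
    per_pod_usage_m = total_usage_m // replicas
    replicas = hpa_desired_replicas(replicas, per_pod_usage_m, new_request_m, 80)
    print(new_request_m, replicas)
# prints:
# 500 4
# 400 5
# 320 7
```

Demand never changed, yet the replica count drifted from 4 to 7 because each controller reacted to the other's adjustment. Scaling the HPA on a metric that does not depend on requests breaks the loop.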

# Safe pattern: HPA on RPS, VPA on CPU+memory
# VPA (manages CPU and memory requests)
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: api-server-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-server
  updatePolicy:
    updateMode: Auto
  resourcePolicy:
    containerPolicies:
      - containerName: app
        controlledResources: ["cpu", "memory"]
---
# HPA (manages replicas based on RPS, not CPU)
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api-server-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-server
  minReplicas: 3
  maxReplicas: 50
  metrics:
    - type: Pods
      pods:
        metric:
          name: http_requests_per_second   # Custom metric — no conflict with VPA
        target:
          type: AverageValue
          averageValue: "500"

Goldilocks — VPA Advisor Dashboard

Goldilocks (by Fairwinds) automates the Off-mode VPA workflow: it creates a VPA object in Off mode for every Deployment in labeled namespaces, then provides a dashboard showing current vs recommended resource requests and the estimated cost difference.

Goldilocks workflow:
 1. Label namespace: kubectl label ns production goldilocks.fairwinds.com/enabled=true
 2. Goldilocks controller creates a VPA (mode: Off) for every Deployment in the namespace
 3. VPA Recommender populates recommendations in each VPA status
 4. Goldilocks dashboard reads VPA statuses and renders:
     ├─ Current requests vs recommendation
     ├─ Estimated monthly cost at current vs recommended
     └─ Copy-paste YAML snippet for the new resource block

# Install Goldilocks via Helm
helm repo add fairwinds-stable https://charts.fairwinds.com/stable
helm upgrade --install goldilocks fairwinds-stable/goldilocks \
  --namespace goldilocks --create-namespace

# Enable for a namespace
kubectl label ns production goldilocks.fairwinds.com/enabled=true

# Port-forward the dashboard
kubectl port-forward -n goldilocks svc/goldilocks-dashboard 8080:80

# List VPAs Goldilocks created (label keys may vary by Goldilocks version)
kubectl get vpa -n production -l "source=goldilocks"

VPA for JVM Workloads

JVM applications manage their own heap independently of the container's memory limit. VPA observes the container's total RSS, not just heap — including metaspace, thread stacks, off-heap buffers (Netty, native libs). This can lead to confusing recommendations.

JVM heap and VPA interaction
If the JVM is configured with a fixed heap (-Xmx2g) and the container limit is 3Gi, VPA observes ~2.5Gi RSS and may recommend lowering the limit below the current RSS — triggering an OOM kill. Use -XX:MaxRAMPercentage=75 instead of -Xmx so the JVM heap automatically scales with the container limit VPA sets.

# JVM container — use percentage-based heap, not fixed -Xmx
containers:
  - name: java-app
    image: myapp:v2
    env:
      - name: JAVA_TOOL_OPTIONS
        value: >-
          -XX:MaxRAMPercentage=75
          -XX:InitialRAMPercentage=50
          -XX:+UseG1GC
          -XX:MaxGCPauseMillis=200
    resources:
      requests:
        cpu: 500m
        memory: 1Gi   # VPA will adjust this; JVM heap = 75% of limit
      limits:
        cpu: 2
        memory: 2Gi
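
The effect of a percentage-based heap can be checked with a few lines of arithmetic. This Python sketch only approximates how the JVM derives its default maximum heap from the container limit; the helper name and the example limits are assumptions for illustration.

```python
GI = 1024 ** 3


def max_heap_bytes(container_limit_bytes, max_ram_percentage=75):
    """Approximate default max heap the JVM derives from the container memory limit."""
    return container_limit_bytes * max_ram_percentage // 100


for limit_gi in (2, 3, 4):  # limits VPA might set over time (hypothetical)
    limit = limit_gi * GI
    heap = max_heap_bytes(limit)
    print(f"limit {limit_gi}Gi -> heap {heap / GI:.2f}Gi, non-heap headroom {(limit - heap) / GI:.2f}Gi")
# prints:
# limit 2Gi -> heap 1.50Gi, non-heap headroom 0.50Gi
# limit 3Gi -> heap 2.25Gi, non-heap headroom 0.75Gi
# limit 4Gi -> heap 3.00Gi, non-heap headroom 1.00Gi
```

With a fixed -Xmx2g, by contrast, the heap stays at 2Gi regardless of the limit, so a lowered limit eats directly into the non-heap headroom.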

# VPA for JVM: wider bounds to account for metaspace variability
spec:
  resourcePolicy:
    containerPolicies:
      - containerName: java-app
        minAllowed:
          cpu: 200m
          memory: 512Mi     # Absolute floor — JVM won't start below ~256Mi
        maxAllowed:
          cpu: 8
          memory: 16Gi
        controlledValues: RequestsAndLimits  # Scale limit with request (preserves ratio)

VPA for Batch Jobs (Initial Mode)

For batch Jobs where pods are short-lived, updateMode: Initial is ideal: VPA applies the recommendation when the pod is created (at the start of each job run) but never evicts running pods mid-job.

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: batch-etl-vpa
  namespace: platform
spec:
  targetRef:
    apiVersion: batch/v1
    kind: CronJob
    name: nightly-etl
  updatePolicy:
    updateMode: Initial   # Apply on pod start, never evict running pod
  resourcePolicy:
    containerPolicies:
      - containerName: etl-worker
        minAllowed:
          cpu: 100m
          memory: 256Mi
        maxAllowed:
          cpu: 16
          memory: 32Gi    # Generous upper bound for large datasets
        controlledResources: ["cpu", "memory"]

Operational Commands

# List all VPAs and their modes
kubectl get vpa -A -o custom-columns=\
'NAMESPACE:.metadata.namespace,NAME:.metadata.name,MODE:.spec.updatePolicy.updateMode,READY:.status.conditions[0].status'

# Get full recommendation for a VPA
kubectl get vpa <name> -n <namespace> -o jsonpath=\
'{range .status.recommendation.containerRecommendations[*]}{.containerName}{"\n"}{"  target: "}{.target}{"\n"}{"  lowerBound: "}{.lowerBound}{"\n"}{"  upperBound: "}{.upperBound}{"\n\n"}{end}'

# Show VPA conditions
kubectl get vpa <name> -n <namespace> -o jsonpath='{.status.conditions}' | jq .

# Check VPA Recommender logs (useful for diagnosing no-recommendation)
kubectl logs -n kube-system -l app=vpa-recommender --tail=100

# Check VPA Updater logs (see which pods were evicted and why)
kubectl logs -n kube-system -l app=vpa-updater --tail=100

# Check VPA Admission Controller logs
kubectl logs -n kube-system -l app=vpa-admission-controller --tail=100

# Temporarily disable VPA updates (flip to Off without deleting)
kubectl patch vpa <name> -n <namespace> --type=merge \
  -p '{"spec":{"updatePolicy":{"updateMode":"Off"}}}'

Metrics

Metric	Labels	Use
`vpa_recommender_recommendation_latency_seconds`	`namespace`, `vpa`	Time to generate a recommendation
`vpa_updater_evictions_total`	`namespace`	Total evictions triggered by VPA Updater
`vpa_admission_controller_admission_duration_seconds`	—	Latency of VPA webhook on pod creation
`vpa_recommender_memory_estimation_quality`	`namespace`, `vpa`	Confidence score of memory recommendations
`kube_verticalpodautoscaler_status_recommendation_containerrecommendations_target`	`container`, `resource`	Current target recommendation per container/resource

Metric names and labels differ between VPA and kube-state-metrics releases (recent kube-state-metrics versions no longer export VerticalPodAutoscaler metrics by default); check each component's /metrics endpoint before relying on the names above.

Alerting Rules

groups:
  - name: vpa
    rules:
      # VPA not providing recommendations (insufficient data)
      - alert: VPANoRecommendation
        expr: |
          kube_verticalpodautoscaler_status_condition{
            condition="RecommendationProvided",status="False"} == 1
        for: 2h
        labels:
          severity: warning
        annotations:
          summary: "VPA {{ $labels.namespace }}/{{ $labels.vpa }} has no recommendation after 2h"
          description: "Ensure metrics-server is running and the workload has been running for >30min"

      # VPA Admission Controller webhook latency too high
      - alert: VPAAdmissionHighLatency
        expr: |
          histogram_quantile(0.99,
            rate(vpa_admission_controller_admission_duration_seconds_bucket[5m])) > 1
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "VPA Admission Controller p99 latency > 1s — pod starts may be slow"

      # VPA Updater evicting pods at high rate
      - alert: VPAHighEvictionRate
        expr: rate(vpa_updater_evictions_total[10m]) > 0.1
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "VPA Updater is evicting >6 pods/minute — check recommendation bounds"

      # Pod OOM killed (feeds into VPA Recommender — also indicates under-sizing)
      - alert: PodOOMKilled
        expr: |
          kube_pod_container_status_last_terminated_reason{reason="OOMKilled"} == 1
        for: 0m
        labels:
          severity: warning
        annotations:
          summary: "Pod {{ $labels.namespace }}/{{ $labels.pod }} container {{ $labels.container }} was OOM killed"

Runbooks

VPA Not Generating Recommendations

# Check VPA conditions
kubectl describe vpa <name> -n <namespace> | grep -A10 Conditions

# Verify metrics-server is running
kubectl get deployment metrics-server -n kube-system

# Verify VPA can read metrics
kubectl logs -n kube-system -l app=vpa-recommender | grep "Failed\|Error\|Warn"

# Check if workload has been running long enough
kubectl get pods -n <namespace> -l app=<app> \
  -o jsonpath='{.items[*].status.startTime}'

# VPA needs ~1 hour of metrics before producing recommendations

VPA Evicting Too Aggressively

# Switch to Off mode immediately to stop evictions
kubectl patch vpa <name> -n <namespace> --type=merge \
  -p '{"spec":{"updatePolicy":{"updateMode":"Off"}}}'

# Check Updater logs for eviction reasoning
kubectl logs -n kube-system -l app=vpa-updater | grep <namespace>

# Set or raise minReplicas to prevent single-pod evictions
kubectl patch vpa <name> -n <namespace> --type=merge \
  -p '{"spec":{"updatePolicy":{"minReplicas":2}}}'

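# Widen maxAllowed to reduce recommendation oscillation
# (a merge patch replaces the whole containerPolicies list, so include every entry you want to keep)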
kubectl patch vpa <name> -n <namespace> --type=merge \
  -p '{"spec":{"resourcePolicy":{"containerPolicies":[{"containerName":"app","maxAllowed":{"cpu":"8","memory":"16Gi"}}]}}}'

VPA Recommendation Seems Wrong (Too High or Too Low)

# Compare actual usage to recommendation
kubectl top pods -n <namespace> -l app=<app> --containers

# Get VPA targets
kubectl get vpa <name> -n <namespace> -o yaml | grep -A20 containerRecommendations

# Check if OOM events are inflating memory recommendation
# (OOMKilling events are recorded against the Node, so search all namespaces)
kubectl get events -A --field-selector reason=OOMKilling

# If recommendation is stale (workload changed), delete and recreate VPA
# to reset the history window
kubectl delete vpa <name> -n <namespace>
kubectl apply -f vpa.yaml

VPA Admission Controller Not Applying Recommendations

# Check if webhook is registered
kubectl get mutatingwebhookconfigurations | grep vpa

# Verify Admission Controller is running
kubectl get deployment vpa-admission-controller -n kube-system

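# Test the webhook: recreate one pod of the target workload and inspect what was injected
# (a bare `kubectl run` pod is not owned by the VPA's target, so the VPA would not match it)
kubectl delete pod <pod-name> -n <namespace>
kubectl get pods -n <namespace> -l app=<app> \
  -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.metadata.annotations.vpaUpdates}{"\n"}{end}'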

# Check logs
kubectl logs -n kube-system -l app=vpa-admission-controller | tail -50

In-Place Resize Stuck in Infeasible

# Check resize status
kubectl get pod <pod-name> -n <namespace> -o jsonpath='{.status.resize}'

# Infeasible = requested change cannot be done in-place (e.g., memory reduction)
# Solution: delete the pod and let controller recreate with new spec
kubectl delete pod <pod-name> -n <namespace>

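# Deferred = node lacks capacity right now; the kubelet retries when resources free up
# To apply the change sooner, delete the pod so the scheduler can place it on a node with room
# (draining the whole node would disrupt every other workload on it)
kubectl delete pod <pod-name> -n <namespace>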

Best Practices

Start with updateMode: Off (Goldilocks) — run VPA in advisory mode for 1–2 weeks before enabling updates. Validate that recommendations match intuition before allowing automatic evictions.
Keep minReplicas at 2 or higher in VPA updatePolicy: with a threshold of 1, the Updater can evict the only pod of a single-replica Deployment. For critical services, set 3 or higher.
Pair VPA with PDBs — VPA Updater respects PodDisruptionBudgets. Define a PDB with minAvailable: 2 alongside every VPA-managed Deployment to cap eviction impact.
Use RequestsOnly for workloads with intentional burst limits — preserving the original limit/request ratio via RequestsAndLimits can produce unexpectedly large limits if the original ratio was wide.
Set explicit minAllowed and maxAllowed — unbounded VPA can set CPU to 50m (causing throttling) or memory to 64Gi (blocking scheduling). Always bound recommendations to the realistic operational range.
Use Initial mode for batch/Job workloads — running pods should not be evicted mid-job. Initial mode applies recommendations only at pod creation, which happens naturally on each job run.
For JVM workloads, use -XX:MaxRAMPercentage instead of -Xmx — fixed heap sizes break VPA's ability to right-size memory without causing OOMs. Percentage-based heap automatically adjusts with the container limit.
Don't use VPA Auto mode with HPA CPU/memory scaling — both controllers adjust the effective CPU utilization ratio through different levers, creating a feedback oscillation. Use HPA on custom/external metrics when VPA manages CPU/memory.