Vertical Pod Autoscaler
Right-size container resource requests automatically using historical usage data
While HPA adds or removes pod replicas, the Vertical Pod Autoscaler (VPA) adjusts the resource requests and limits of existing containers. It observes actual resource usage over time, builds statistical recommendations, and optionally applies them — either at pod creation time or by evicting and replacing running pods with updated resource specs. VPA is an add-on installed separately from core Kubernetes; it is not part of the default control plane.
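Install VPA from the kubernetes/autoscaler repo. It deploys three components into the cluster: the Recommender, the Updater, and the Admission Controller. Some managed Kubernetes offerings (for example GKE and AKS) provide VPA as a managed feature or add-on; on others, such as EKS, you typically install it yourself, so check your provider's documentation.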
VPA Architecture
The three components are decoupled — you can run Recommender alone (for advisory-only mode) without Updater or Admission Controller.
VPA Object Spec
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: api-server-vpa
  namespace: production
spec:
  # --- Target workload ---
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-server
  # --- Update policy ---
  updatePolicy:
    updateMode: Auto   # Off | Initial | Recreate | Auto
    minReplicas: 2     # Don't evict if fewer than 2 replicas running
  # --- Container-level policies ---
  resourcePolicy:
    containerPolicies:
      - containerName: app
        # Recommendation boundaries
        minAllowed:
          cpu: 100m
          memory: 128Mi
        maxAllowed:
          cpu: 4
          memory: 8Gi
        # Which resources to manage
        controlledResources: ["cpu", "memory"]
        # Whether to also set limits
        controlledValues: RequestsAndLimits   # or RequestsOnly
      - containerName: proxy-sidecar
        # Exclude sidecar from VPA management entirely
        mode: "Off"
      - containerName: init-container
        # Init container: only set requests (limits rarely matter)
        controlledValues: RequestsOnly
        minAllowed:
          cpu: 50m
          memory: 64Mi
        maxAllowed:
          cpu: 500m
          memory: 512Mi
Update Modes
| Mode | Behavior | Pod restart? | Best for |
|---|---|---|---|
| Off | Recommendations computed and stored in VPA status; nothing applied | Never | Advisory only; Goldilocks; manual right-sizing workflow |
| Initial | Recommendations applied only to new pods at creation time (via Admission Controller); running pods untouched | Only on natural restarts / rollouts | Jobs, batch workloads; any workload where live eviction is unacceptable |
| Recreate | Recommendations applied to new pods AND Updater evicts out-of-range running pods | Yes: pods are evicted and recreated | Workloads that can tolerate occasional restarts; non-production |
| Auto | Same as Recreate today; in future will use in-place updates when available and safe | Yes (currently same as Recreate) | Recommended default when in-place updates stabilize |
Despite the name, Auto mode currently behaves identically to Recreate: it evicts pods to apply new resource specs. In-place pod resize (beta in Kubernetes 1.33) is expected to make Auto non-disruptive for changes that can be applied in place; whether a CPU or memory change also restarts the container depends on that container's resizePolicy. Always set minReplicas and respect PDBs to limit eviction blast radius.
Recommender Algorithm
The Recommender builds a histogram of CPU and memory usage samples for each container, then derives a target value with safety margins applied.
The Recommender requires at least a few hours of data before producing reliable recommendations. Fresh workloads, and those with spiky or seasonal patterns, may receive recommendations that underfit their real peaks. When the Recommender loads history from Prometheus, the default lookback is 8 days; adjust it with the --history-length flag.
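The histogram-plus-margin idea can be illustrated with a small sketch. This is not the Recommender's actual code: the 90th percentile, the 15% safety margin, and the 24-hour decay half-life are assumptions chosen for illustration, and the function names are hypothetical.

```python
def weighted_percentile(samples, percentile, half_life_hours=24.0):
    """Return the usage value at `percentile` (0..1) of a decayed sample set.

    samples: list of (age_hours, value) pairs. Older samples get
    exponentially less weight (assumed half-life: 24h).
    """
    if not samples:
        raise ValueError("no usage samples yet")
    # Pair each value with its decay weight and sort by value.
    weighted = sorted(
        (value, 0.5 ** (age / half_life_hours)) for age, value in samples
    )
    threshold = percentile * sum(weight for _, weight in weighted)
    running = 0.0
    for value, weight in weighted:
        running += weight
        if running >= threshold:
            return value
    return weighted[-1][0]


def recommend(samples, percentile=0.9, safety_margin=0.15):
    """Percentile of decayed usage plus a safety margin (both assumed values)."""
    return weighted_percentile(samples, percentile) * (1.0 + safety_margin)
```

With only a handful of samples the percentile is dominated by whatever happened to be observed, which is why young or seasonal workloads tend to get poorly fitted recommendations.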
OOM Kill Handling
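When a container is OOM-killed, the Recommender registers this as a "spike event" in the memory histogram. The spike sample is derived from the memory the container had at the time of the OOM, bumped up by an OOM bump factor (default: 1.2), with a minimum absolute bump (100 MiB by default in upstream builds) so that small containers still get a meaningful increase. This prevents the Recommender from re-recommending a value that already caused an OOM.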
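The bump arithmetic can be sketched as follows. The 1.2 ratio is the default quoted in the text; the 100 MiB minimum bump is an assumption drawn from the Recommender's upstream defaults and should be checked against your version. The function name is hypothetical.

```python
MIB = 1024 * 1024


def oom_bumped_sample(memory_at_oom_bytes, bump_ratio=1.2,
                      min_bump_bytes=100 * MIB):
    """Memory sample recorded after an OOM kill.

    Takes the larger of a proportional bump and an absolute bump, so that
    small containers are not bumped by only a few MiB.
    """
    by_ratio = memory_at_oom_bytes * bump_ratio
    by_floor = memory_at_oom_bytes + min_bump_bytes
    return int(max(by_ratio, by_floor))
```

For a container that OOMed at 1 GiB the proportional bump wins (about 1.2 GiB); at 256 MiB the absolute bump wins (356 MiB).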
# Show the recommendation bounds for the app container
# (an OOM-driven bump appears as a higher memory target and upperBound)
kubectl get vpa api-server-vpa -n production -o yaml | \
  yq '.status.recommendation.containerRecommendations[] |
      select(.containerName == "app") |
      {"lowerBound": .lowerBound, "target": .target, "upperBound": .upperBound}'
Reading VPA Recommendations
# Summary view
kubectl get vpa -n production
# NAME MODE CPU MEM PROVIDED AGE
# api-server-vpa Auto 220m 512Mi True 3d
# Full recommendation details
kubectl describe vpa api-server-vpa -n production
# VPA status.recommendation structure
status:
  recommendation:
    containerRecommendations:
      - containerName: app
        lowerBound:        # Safe minimum (rarely go below this)
          cpu: 100m
          memory: 256Mi
        target:            # Recommended value; apply this
          cpu: 220m
          memory: 512Mi
        uncappedTarget:    # What VPA would recommend without minAllowed/maxAllowed
          cpu: 220m        # Equal to target here because target lies within the allowed bounds
          memory: 512Mi
        upperBound:        # Should not exceed this (headroom for spikes)
          cpu: 1200m
          memory: 2Gi
  conditions:
    - type: RecommendationProvided
      status: "True"       # False if insufficient data
    - type: LowConfidence
      status: "False"      # True if < 1 hour of data
target is the primary recommendation: this is what gets applied. lowerBound is the safe minimum; running below it risks OOM or throttling. upperBound is the maximum the VPA considers safe; setting requests above it wastes capacity. The Updater only evicts if current requests are outside the [lowerBound, upperBound] range.
Updater Eviction Logic
The Updater runs periodically and checks whether running pods' resource requests fall within the [lowerBound, upperBound] window. Pods outside this range are candidates for eviction.
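The range check described above can be sketched in a few lines. This is a simplified illustration, not the Updater's implementation: it handles only the common Kubernetes quantity suffixes, and the helper names are hypothetical.

```python
from decimal import Decimal

# Multipliers for the most common Kubernetes quantity suffixes.
_SUFFIXES = {
    "m": Decimal("0.001"),
    "Ki": Decimal(1024), "Mi": Decimal(1024) ** 2, "Gi": Decimal(1024) ** 3,
    "k": Decimal(1000), "M": Decimal(1000) ** 2, "G": Decimal(1000) ** 3,
}


def parse_quantity(text):
    """Convert a quantity such as '220m' or '512Mi' to a Decimal base value."""
    # Try two-letter suffixes first so 'Mi' is not mistaken for 'M' or 'm'.
    for suffix in sorted(_SUFFIXES, key=len, reverse=True):
        if text.endswith(suffix):
            return Decimal(text[: -len(suffix)]) * _SUFFIXES[suffix]
    return Decimal(text)


def is_eviction_candidate(request, lower_bound, upper_bound):
    """True when the current request falls outside [lowerBound, upperBound]."""
    value = parse_quantity(request)
    return not (parse_quantity(lower_bound) <= value <= parse_quantity(upper_bound))
```

For example, a pod requesting 50m CPU against bounds of 100m to 1200m is a candidate, while one requesting 220m is left alone.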
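The Updater refuses to evict when the target has fewer live replicas than its minimum, which comes from spec.updatePolicy.minReplicas or, if that is unset, from the Updater's global --min-replicas flag (default: 2). If the effective minimum is 1, the Updater will evict the sole pod of a single-replica Deployment to apply new resource specs, causing a complete outage until the replacement pod starts. Set spec.updatePolicy.minReplicas to at least 2 explicitly for production workloads so the guard does not depend on cluster-wide flags, and ensure a matching PDB exists.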
# PDB to protect against VPA evictions
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: api-server-pdb
  namespace: production
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: api-server
---
# VPA minReplicas guard
spec:
  updatePolicy:
    updateMode: Auto
    minReplicas: 2   # VPA will not evict if fewer than 2 replicas are running
evictionRequirements (1.25+)
spec:
  updatePolicy:
    updateMode: Auto
    evictionRequirements:
      # Only evict when the recommendation is higher than the current requests
      # (pods are not evicted just to lower their requests)
      - resources: ["cpu", "memory"]
        changeRequirement: TargetHigherThanRequests
# Note: evictionRequirements is not time-based. The Updater's --eviction-tolerance
# flag is cluster-wide (not per-VPA) and sets the fraction of a workload's
# replicas that may be evicted at once.
controlledValues: RequestsOnly vs RequestsAndLimits
| controlledValues | Effect on requests | Effect on limits | Use case |
|---|---|---|---|
| RequestsAndLimits (default) | Set to recommendation target | Scaled proportionally: limit = request × (original limit / original request) | Most workloads; preserves the original limit/request ratio |
| RequestsOnly | Set to recommendation target | Unchanged (kept at the original value, or left unset if none was defined) | When limits are intentionally higher (burst allowance), or no limits are set |
If a container originally had requests.cpu: 100m, limits.cpu: 1000m (10× ratio), VPA will maintain this ratio. If VPA recommends 500m, it sets limits.cpu: 5000m, far more than needed. Consider RequestsOnly and manage limits separately via LimitRange defaults.
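The proportional scaling rule is simple enough to check by hand. A minimal sketch, working in integer millicores to avoid floating-point noise (the function name is hypothetical):

```python
def scaled_limit_millicores(orig_request_m, orig_limit_m, new_request_m):
    """Limit that preserves the original limit/request ratio.

    Implements: limit = new request x (original limit / original request).
    All values are CPU millicores.
    """
    if orig_request_m <= 0:
        raise ValueError("original request must be positive")
    return new_request_m * orig_limit_m // orig_request_m
```

With the numbers from the example, `scaled_limit_millicores(100, 1000, 500)` gives 5000, i.e. a 5-core limit for a container whose recommended request is half a core.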
In-Place Pod Vertical Scaling (1.33 Beta)
Traditionally, changing a pod's resource requests requires restarting the pod (evict + recreate). In-place pod resize, promoted to beta in Kubernetes 1.33, allows a running container's CPU and memory requests and limits to be changed without recreating the pod; each container's resizePolicy decides whether a given resource change also restarts that container.
# resizePolicy on container spec (in-place resize control)
spec:
  containers:
    - name: app
      resources:
        requests:
          cpu: 500m
          memory: 512Mi
        limits:
          cpu: 2
          memory: 1Gi
      resizePolicy:
        - resourceName: cpu
          restartPolicy: NotRequired        # CPU resize without restart
        - resourceName: memory
          restartPolicy: RestartContainer   # Memory resize requires restart
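# Trigger an in-place resize manually (needs a kubectl recent enough to support --subresource=resize).
# Use the default strategic merge patch: a JSON merge patch (--type=merge) would replace the whole containers list.
kubectl patch pod api-server-abc123 --subresource=resize \
  --patch '{"spec":{"containers":[{"name":"app","resources":{"requests":{"cpu":"800m"}}}]}}'
# Check resize status
kubectl get pod api-server-abc123 -o jsonpath='{.status.resize}'
# Possible values: Proposed | InProgress | Deferred | Infeasible
# (newer releases report this through the PodResizePending and PodResizeInProgress pod conditions instead)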
| Resize status | Meaning |
|---|---|
| Proposed | Resize requested, kubelet hasn't processed yet |
| InProgress | kubelet is applying the change |
| Deferred | Not enough node resources now; will retry when available |
| Infeasible | Cannot be applied in place on this node (e.g., the requested resources exceed what the node can provide); requires recreating the pod |
VPA + HPA Interaction
See also: HPA page — VPA interaction section. The key conflict: both HPA (on CPU utilization) and VPA change the effective CPU utilization ratio — VPA by changing requests, HPA by changing replicas. This creates oscillation.
| Configuration | Conflict? | Safe? |
|---|---|---|
| HPA on CPU/memory + VPA Auto | Yes — feedback loop | No |
| HPA on CPU/memory + VPA Off (advisor only) | None | Yes — use VPA recommendations manually |
| HPA on custom/external metrics + VPA Auto on CPU/memory | None — different signals | Yes — recommended pattern |
| VPA only (no HPA) on CPU/memory | None | Yes — for workloads where replica count is fixed |
| HPA on CPU + VPA RequestsOnly on memory only | Marginal: VPA changes memory, HPA watches CPU | Usually safe with monitoring |
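The feedback loop is easiest to see with numbers. The sketch below uses the standard HPA scaling rule (desired = ceil(current replicas × current utilization / target utilization)) and ignores HPA's tolerance band and stabilization windows for simplicity; the function name and inputs are hypothetical.

```python
import math
from fractions import Fraction


def hpa_desired_replicas(current_replicas, usage_m, request_m, target_percent):
    """Replica count HPA would ask for on a CPU utilization target.

    Utilization is usage divided by the request, so a VPA change to the
    request moves the result even when real load is unchanged.
    """
    utilization_percent = Fraction(usage_m * 100, request_m)
    return math.ceil(current_replicas * utilization_percent / target_percent)
```

Four replicas each using 400m against a 500m request sit exactly at an 80% target, so HPA keeps four. If VPA then raises the request to 800m, utilization drops to 50% and HPA wants three replicas although load has not changed; the remaining pods get busier, VPA sees higher usage, and the cycle repeats.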
# Safe pattern: HPA on RPS, VPA on CPU+memory
# VPA (manages CPU and memory requests)
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: api-server-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-server
  updatePolicy:
    updateMode: Auto
  resourcePolicy:
    containerPolicies:
      - containerName: app
        controlledResources: ["cpu", "memory"]
---
# HPA (manages replicas based on RPS, not CPU)
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api-server-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-server
  minReplicas: 3
  maxReplicas: 50
  metrics:
    - type: Pods
      pods:
        metric:
          name: http_requests_per_second   # Custom metric: no conflict with VPA
        target:
          type: AverageValue
          averageValue: "500"
Goldilocks — VPA Advisor Dashboard
Goldilocks (by Fairwinds) automates the Off-mode VPA workflow: it creates a VPA object in Off mode for every Deployment in labeled namespaces, then provides a dashboard showing current vs recommended resource requests and the estimated cost difference.
# Install Goldilocks via Helm
helm repo add fairwinds-stable https://charts.fairwinds.com/stable
helm upgrade --install goldilocks fairwinds-stable/goldilocks \
--namespace goldilocks --create-namespace
# Enable for a namespace
kubectl label ns production goldilocks.fairwinds.com/enabled=true
# Port-forward the dashboard
kubectl port-forward -n goldilocks svc/goldilocks-dashboard 8080:80
# List VPAs Goldilocks created
# (label assumed from Goldilocks defaults; confirm with: kubectl get vpa -n production --show-labels)
kubectl get vpa -n production -l source=goldilocks
VPA for JVM Workloads
JVM applications size their heap independently of the container's memory limit. VPA observes the container's total memory usage, not just the heap: metaspace, thread stacks, and off-heap buffers (Netty, native libraries) all count. This can lead to confusing recommendations.
If the JVM is configured with a fixed heap (-Xmx2g) and the container limit is 3Gi, VPA may observe usage well below that ceiling while the heap is lightly used and recommend a limit too small for a full 2g heap plus non-heap memory; once the heap grows, the container is OOM-killed. Use -XX:MaxRAMPercentage=75 instead of -Xmx so the JVM heap automatically scales with the container limit VPA sets.
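A quick calculation shows why a percentage-based heap is safer when the limit moves. This sketch only models the maximum heap versus the container limit; it does not model actual JVM behavior, and the function name is hypothetical.

```python
def non_heap_headroom_mib(limit_mib, fixed_xmx_mib=None, max_ram_percentage=None):
    """Memory left for metaspace, thread stacks and off-heap buffers.

    Pass either fixed_xmx_mib (models -Xmx) or max_ram_percentage
    (models -XX:MaxRAMPercentage), not both.
    """
    if (fixed_xmx_mib is None) == (max_ram_percentage is None):
        raise ValueError("pass exactly one of fixed_xmx_mib or max_ram_percentage")
    if fixed_xmx_mib is not None:
        max_heap = fixed_xmx_mib
    else:
        max_heap = limit_mib * max_ram_percentage / 100
    return limit_mib - max_heap
```

With -Xmx2g and a 3Gi limit there is 1024 MiB of non-heap headroom; if VPA lowers the limit to 2Gi the headroom drops to zero and a full heap cannot fit. With MaxRAMPercentage=75 the same 2Gi limit still leaves 512 MiB outside the heap.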
# JVM container — use percentage-based heap, not fixed -Xmx
containers:
  - name: java-app
    image: myapp:v2
    env:
      - name: JAVA_TOOL_OPTIONS
        value: >-
          -XX:MaxRAMPercentage=75
          -XX:InitialRAMPercentage=50
          -XX:+UseG1GC
          -XX:MaxGCPauseMillis=200
    resources:
      requests:
        cpu: 500m
        memory: 1Gi   # VPA will adjust this; JVM heap = 75% of limit
      limits:
        cpu: 2
        memory: 2Gi
# VPA for JVM: wider bounds to account for metaspace variability
spec:
  resourcePolicy:
    containerPolicies:
      - containerName: java-app
        minAllowed:
          cpu: 200m
          memory: 512Mi   # Absolute floor; JVM won't start below ~256Mi
        maxAllowed:
          cpu: 8
          memory: 16Gi
        controlledValues: RequestsAndLimits   # Scale limit with request (preserves ratio)
VPA for Batch Jobs (Initial Mode)
For batch Jobs where pods are short-lived, updateMode: Initial is ideal: VPA applies the recommendation when the pod is created (at the start of each job run) but never evicts running pods mid-job.
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: batch-etl-vpa
  namespace: platform
spec:
  targetRef:
    apiVersion: batch/v1
    kind: CronJob
    name: nightly-etl
  updatePolicy:
    updateMode: Initial   # Apply on pod start, never evict running pod
  resourcePolicy:
    containerPolicies:
      - containerName: etl-worker
        minAllowed:
          cpu: 100m
          memory: 256Mi
        maxAllowed:
          cpu: 16
          memory: 32Gi   # Generous upper bound for large datasets
        controlledResources: ["cpu", "memory"]
Operational Commands
# List all VPAs and their modes
kubectl get vpa -A -o custom-columns=\
'NAMESPACE:.metadata.namespace,NAME:.metadata.name,MODE:.spec.updatePolicy.updateMode,READY:.status.conditions[0].status'
# Get full recommendation for a VPA
# (the jsonpath template stays on one line: a backslash inside single quotes would be passed through literally)
kubectl get vpa <name> -n <namespace> -o jsonpath='{range .status.recommendation.containerRecommendations[*]}{.containerName}{"\n"}  target: {.target}{"\n"}  lowerBound: {.lowerBound}{"\n"}  upperBound: {.upperBound}{"\n\n"}{end}'
# Show VPA conditions (one-shot; add -w to a plain "kubectl get vpa" to watch for changes)
kubectl get vpa <name> -n <namespace> -o jsonpath='{.status.conditions}' | jq .
# Check VPA Recommender logs (useful for diagnosing no-recommendation)
kubectl logs -n kube-system -l app=vpa-recommender --tail=100
# Check VPA Updater logs (see which pods were evicted and why)
kubectl logs -n kube-system -l app=vpa-updater --tail=100
# Check VPA Admission Controller logs
kubectl logs -n kube-system -l app=vpa-admission-controller --tail=100
# Temporarily disable VPA updates (flip to Off without deleting)
kubectl patch vpa <name> -n <namespace> --type=merge \
-p '{"spec":{"updatePolicy":{"updateMode":"Off"}}}'
Metrics
| Metric | Labels | Use |
|---|---|---|
| vpa_recommender_recommendation_latency_seconds | namespace, vpa | Time to generate a recommendation |
| vpa_updater_evictions_total | namespace | Total evictions triggered by VPA Updater |
| vpa_admission_controller_admission_duration_seconds | (none) | Latency of VPA webhook on pod creation |
| vpa_recommender_memory_estimation_quality | namespace, vpa | Confidence score of memory recommendations |
| kube_verticalpodautoscaler_status_recommendation_containerrecommendations_target | container, resource | Current target recommendation per container/resource |
Alerting Rules
groups:
  - name: vpa
    rules:
      # VPA not providing recommendations (insufficient data)
      - alert: VPANoRecommendation
        expr: |
          kube_verticalpodautoscaler_status_condition{
            condition="RecommendationProvided",status="False"} == 1
        for: 2h
        labels:
          severity: warning
        annotations:
          summary: "VPA {{ $labels.namespace }}/{{ $labels.vpa }} has no recommendation after 2h"
          description: "Ensure metrics-server is running and the workload has been running for >30min"
      # VPA Admission Controller webhook latency too high
      - alert: VPAAdmissionHighLatency
        expr: |
          histogram_quantile(0.99,
            rate(vpa_admission_controller_admission_duration_seconds_bucket[5m])) > 1
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "VPA Admission Controller p99 latency > 1s; pod starts may be slow"
      # VPA Updater evicting pods at high rate
      - alert: VPAHighEvictionRate
        expr: rate(vpa_updater_evictions_total[10m]) > 0.1
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "VPA Updater is evicting >6 pods/minute; check recommendation bounds"
      # Pod OOM killed (feeds into VPA Recommender; also indicates under-sizing)
      - alert: PodOOMKilled
        expr: |
          kube_pod_container_status_last_terminated_reason{reason="OOMKilled"} == 1
        for: 0m
        labels:
          severity: warning
        annotations:
          summary: "Pod {{ $labels.namespace }}/{{ $labels.pod }} container {{ $labels.container }} was OOM killed"
Runbooks
VPA Not Generating Recommendations
# Check VPA conditions
kubectl describe vpa <name> -n <namespace> | grep -A10 Conditions
# Verify metrics-server is running
kubectl get deployment metrics-server -n kube-system
# Verify VPA can read metrics
kubectl logs -n kube-system -l app=vpa-recommender | grep "Failed\|Error\|Warn"
# Check if workload has been running long enough
kubectl get pods -n <namespace> -l app=<app> \
-o jsonpath='{.items[*].status.startTime}'
# VPA needs ~1 hour of metrics before producing recommendations
VPA Evicting Too Aggressively
# Switch to Off mode immediately to stop evictions
kubectl patch vpa <name> -n <namespace> --type=merge \
-p '{"spec":{"updatePolicy":{"updateMode":"Off"}}}'
# Check Updater logs for eviction reasoning
kubectl logs -n kube-system -l app=vpa-updater | grep <namespace>
# Set or raise minReplicas to prevent single-pod evictions
kubectl patch vpa <name> -n <namespace> --type=merge \
  -p '{"spec":{"updatePolicy":{"minReplicas":2}}}'
# Widen maxAllowed to reduce recommendation oscillation
# (a merge patch replaces the whole containerPolicies list, so include every container policy you want to keep)
kubectl patch vpa <name> -n <namespace> --type=merge \
  -p '{"spec":{"resourcePolicy":{"containerPolicies":[{"containerName":"app","maxAllowed":{"cpu":"8","memory":"16Gi"}}]}}}'
VPA Recommendation Seems Wrong (Too High or Too Low)
# Compare actual usage to recommendation
kubectl top pods -n <namespace> -l app=<app> --containers
# Get VPA targets
kubectl get vpa <name> -n <namespace> -o yaml | grep -A20 containerRecommendations
# Check if OOM events are inflating memory recommendation
kubectl get events -n <namespace> --field-selector reason=OOMKilling
# If recommendation is stale (workload changed), delete and recreate VPA
# to reset the history window
kubectl delete vpa <name> -n <namespace>
kubectl apply -f vpa.yaml
VPA Admission Controller Not Applying Recommendations
# Check if webhook is registered
kubectl get mutatingwebhookconfigurations | grep vpa
# Verify Admission Controller is running
kubectl get deployment vpa-admission-controller -n kube-system
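# Exercise the webhook with a server-side dry run and inspect the returned resources
# (a bare pod is likely not matched by any VPA target, so unchanged resources here do not prove the webhook is broken; check the logs below)
kubectl run vpa-test --image=nginx:alpine -n <namespace> --dry-run=server \
  -o yaml | grep -A10 resources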
# Check logs
kubectl logs -n kube-system -l app=vpa-admission-controller | tail -50
In-Place Resize Stuck in Infeasible
# Check resize status
kubectl get pod <pod-name> -n <namespace> -o jsonpath='{.status.resize}'
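# Infeasible = the kubelet cannot grant the change on this node (e.g., the request exceeds node capacity)
# Solution: delete the pod and let the controller recreate it with the new spec
kubectl delete pod <pod-name> -n <namespace>
# Deferred = node lacks capacity right now; the kubelet retries when resources free up.
# To force the change, move the pod to another node. Draining evicts every pod on the node,
# so prefer deleting just this pod when that is enough.
kubectl drain <node> --ignore-daemonsets --delete-emptydir-data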
Best Practices
- Start with updateMode: Off (Goldilocks). Run VPA in advisory mode for 1–2 weeks before enabling updates. Validate that recommendations match intuition before allowing automatic evictions.
- Always set minReplicas: 2 in VPA updatePolicy. This prevents the Updater from evicting the only pod of a single-replica Deployment. For critical services, set 3 or higher.
- Pair VPA with PDBs. VPA Updater respects PodDisruptionBudgets. Define a PDB with minAvailable: 2 alongside every VPA-managed Deployment to cap eviction impact.
- Use RequestsOnly for workloads with intentional burst limits. Preserving the original limit/request ratio via RequestsAndLimits can produce unexpectedly large limits if the original ratio was wide.
- Set explicit minAllowed and maxAllowed. Unbounded VPA can set CPU to 50m (causing throttling) or memory to 64Gi (blocking scheduling). Always bound recommendations to the realistic operational range.
- Use Initial mode for batch/Job workloads. Running pods should not be evicted mid-job. Initial mode applies recommendations only at pod creation, which happens naturally on each job run.
- For JVM workloads, use -XX:MaxRAMPercentage instead of -Xmx. Fixed heap sizes break VPA's ability to right-size memory without causing OOMs. Percentage-based heap automatically adjusts with the container limit.
- Don't use VPA Auto mode with HPA CPU/memory scaling. Both controllers adjust the effective CPU utilization ratio through different levers, creating a feedback oscillation. Use HPA on custom/external metrics when VPA manages CPU/memory.