Pod Disruption Budgets
Limit voluntary disruptions to maintain availability during drains, upgrades, and autoscaling
A PodDisruptionBudget (PDB) constrains how many pods of a workload can be simultaneously unavailable due to voluntary disruptions — node drains, cluster upgrades, VPA evictions, Cluster Autoscaler scale-down, and any other caller of the Eviction API. PDBs do not protect against involuntary disruptions (node hardware failure, kernel panic, OOM kill) — those are handled by replica counts, topology spread, and readiness probes.
Voluntary vs Involuntary Disruptions
| Disruption type | Category | PDB protects? | Examples |
|---|---|---|---|
| Node drain (kubectl drain) | Voluntary | Yes | Node maintenance, OS upgrade, cluster upgrade |
| Cluster Autoscaler scale-down | Voluntary | Yes | Removing underutilized nodes |
| VPA Updater eviction | Voluntary | Yes | Right-sizing resource requests |
| kubectl delete pod | Voluntary | No: direct deletion bypasses the Eviction API | Manual intervention |
| Deployment RollingUpdate | Voluntary | No: the Deployment controller deletes pods directly and is paced by the rollout strategy (maxUnavailable / maxSurge); pods made unavailable by the rollout still count against the budget | kubectl rollout |
| Node hardware failure | Involuntary | No | Power outage, NIC failure |
| OOM kill | Involuntary | No | Container exceeds memory limit |
| Kernel panic / node NotReady | Involuntary | No | OS crash, kubelet failure |
| Pod eviction for resource pressure | Involuntary | No | kubelet evicts BestEffort/Burstable pods |
PDB Spec
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: api-server-pdb
  namespace: production
spec:
  # --- Availability constraint (choose one: minAvailable OR maxUnavailable) ---
  minAvailable: 2            # At least 2 pods must remain healthy after an eviction
  # OR
  # maxUnavailable: 1        # At most 1 pod can be unavailable at a time
  # OR percentages:
  # minAvailable: "80%"      # At least 80% of pods must be healthy
  # maxUnavailable: "20%"
  # --- Pod selector ---
  selector:
    matchLabels:
      app: api-server
    # Optional: matchExpressions for complex selectors
    # matchExpressions:
    #   - key: environment
    #     operator: In
    #     values: [production, staging]
  # --- Unhealthy pod policy (1.26+) ---
  unhealthyPodEvictionPolicy: IfHealthyBudget  # IfHealthyBudget (default) | AlwaysAllow
You cannot set both minAvailable and maxUnavailable in the same PDB. Choose based on how you think about the constraint: minAvailable for quorum-based systems ("always keep N healthy"), maxUnavailable for rolling operations ("allow at most N down at once").
minAvailable vs maxUnavailable Semantics
| Constraint | Replicas=3 | Replicas=5 | Replicas=10 | Notes |
|---|---|---|---|---|
| minAvailable: 2 | 1 allowed | 3 allowed | 8 allowed | Absolute: allowed disruptions grow with replicas |
| minAvailable: "60%" | 1 allowed (3 - ceil(1.8)) | 2 allowed (5 - 3) | 4 allowed (10 - 6) | Percentage rounds minAvailable up |
| maxUnavailable: 1 | 1 allowed | 1 allowed | 1 allowed | Absolute: does not scale |
| maxUnavailable: "20%" | 1 allowed (ceil(0.6)) | 1 allowed (ceil(1)) | 2 allowed (ceil(2)) | Percentage rounds maxUnavailable up, so disruptions can exceed the stated percentage |
Percentages are rounded up for both fields. For minAvailable this is the conservative direction (more pods must stay healthy). For maxUnavailable it is the permissive direction: maxUnavailable: "20%" with 3 replicas allows 1 disruption (ceil(0.6) = 1), which is 33% of the workload. Verify behavior at your minimum replica count.
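The arithmetic in the table can be reproduced with a small shell sketch. It assumes that Kubernetes rounds percentage values up for both fields and that every replica is healthy; the function names `scaled_value` and `allowed_disruptions` are hypothetical helpers, not part of kubectl.

```shell
#!/bin/sh
# scaled_value VALUE REPLICAS: resolve an absolute number or "N%" against REPLICAS.
# Percentages are rounded up (assumption: PDB rounding applies to both fields).
scaled_value() {
  case "$1" in
    *%) pct=${1%\%}; echo $(( (pct * $2 + 99) / 100 )) ;;
    *)  echo "$1" ;;
  esac
}

# allowed_disruptions FIELD VALUE REPLICAS: evictions permitted when all replicas are healthy.
allowed_disruptions() {
  n=$(scaled_value "$2" "$3")
  case "$1" in
    minAvailable)   allowed=$(( $3 - n )) ;;
    maxUnavailable) allowed=$n ;;
    *) echo "unknown field: $1" >&2; return 1 ;;
  esac
  # The budget never goes negative and cannot exceed the replica count.
  if [ "$allowed" -lt 0 ]; then allowed=0; fi
  if [ "$allowed" -gt "$3" ]; then allowed=$3; fi
  echo "$allowed"
}

for r in 3 5 10; do
  echo "replicas=$r minAvailable=60% -> $(allowed_disruptions minAvailable 60% "$r")"
  echo "replicas=$r maxUnavailable=20% -> $(allowed_disruptions maxUnavailable 20% "$r")"
done
```

Running it against your own minimum and maximum replica counts is a quick way to spot a budget that is tighter or looser than intended.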
Eviction API
PDBs are enforced through the Eviction API (the pods/eviction subresource, which accepts a policy/v1 Eviction object), not through the Delete API. Any caller that wants to respect PDBs must use eviction rather than deletion.
# kubectl delete pod issues a plain DELETE and does NOT consult PDBs
kubectl delete pod <pod-name> --grace-period=30
# kubectl has no single-pod evict command; kubectl drain evicts through the Eviction API, or you can call the API directly
# Raw eviction API call
kubectl proxy &
curl -X POST \
"http://localhost:8001/api/v1/namespaces/production/pods/api-server-xyz/eviction" \
-H "Content-Type: application/json" \
-d '{"apiVersion":"policy/v1","kind":"Eviction","metadata":{"name":"api-server-xyz","namespace":"production"}}'
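The request body used in the raw call can be generated rather than hand-written. The sketch below is a minimal illustration; `eviction_body` is a hypothetical helper, and the optional dry-run variant relies on the `deleteOptions.dryRun` field of the Eviction object, which asks the API server to evaluate the request (including the PDB check) without deleting the pod.

```shell
#!/bin/sh
# eviction_body POD NAMESPACE [dry-run]: print a policy/v1 Eviction object as JSON.
eviction_body() {
  if [ "${3:-}" = "dry-run" ]; then
    # Ask the API server to validate the eviction without persisting it.
    extra=',"deleteOptions":{"dryRun":["All"]}'
  else
    extra=''
  fi
  printf '{"apiVersion":"policy/v1","kind":"Eviction","metadata":{"name":"%s","namespace":"%s"}%s}\n' \
    "$1" "$2" "$extra"
}

eviction_body api-server-xyz production dry-run

# Possible usage through kubectl proxy (not executed here):
#   curl -X POST -H "Content-Type: application/json" \
#     -d "$(eviction_body api-server-xyz production dry-run)" \
#     "http://localhost:8001/api/v1/namespaces/production/pods/api-server-xyz/eviction"
```

A 429 response to the dry-run request indicates that a PDB would currently refuse the eviction.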
kubectl drain and PDBs
kubectl drain cordons the node (marks it unschedulable) then evicts all pods via the Eviction API, respecting PDBs automatically. It retries on 429 responses until the PDB allows the eviction or a timeout is reached.
# Standard drain: respects PDBs, graceful termination
#   --ignore-daemonsets      skip DaemonSet-managed pods (the DaemonSet controller would recreate them on the same node)
#   --delete-emptydir-data   allow eviction of pods with emptyDir volumes (their data is lost)
#   --grace-period=60        override each pod's terminationGracePeriodSeconds
#   --timeout=600s           give up after 10 minutes total
kubectl drain node-1 \
  --ignore-daemonsets \
  --delete-emptydir-data \
  --grace-period=60 \
  --timeout=600s
# Check what would be drained (client-side dry run lists the pods but does not evaluate PDBs)
kubectl drain node-1 --ignore-daemonsets --dry-run=client
# Force drain — bypasses PDBs (DANGEROUS — use only for node replacement)
kubectl drain node-1 --ignore-daemonsets --force --disable-eviction
# --disable-eviction: uses DELETE instead of Eviction API (bypasses PDB checks)
# Use only when: node is already NotReady, or you accept the availability risk
--force allows deletion of pods not managed by a controller (orphan pods). --disable-eviction switches from the Eviction API to direct deletion, entirely bypassing PDB checks. Both flags can cause outages if used on healthy nodes with carefully configured PDBs. Reserve them for disaster recovery scenarios where the node is already failed.
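The retry-on-429 behavior described for kubectl drain can be pictured as a simple loop. This is a sketch of the idea, not kubectl's implementation: `retry_until_allowed` and `fake_evict` are hypothetical names, and a non-zero exit status stands in for an HTTP 429 from the Eviction API.

```shell
#!/bin/sh
# retry_until_allowed MAX_ATTEMPTS DELAY_SECONDS CMD...
# Runs CMD (a stand-in for one eviction request) until it succeeds or attempts run out.
retry_until_allowed() {
  max=$1; delay=$2; shift 2
  attempt=1
  while :; do
    if "$@"; then
      echo "evicted after $attempt attempt(s)"
      return 0
    fi
    # Stop once the attempt limit is reached (models the drain --timeout).
    if [ "$attempt" -ge "$max" ]; then break; fi
    attempt=$(( attempt + 1 ))
    sleep "$delay"
  done
  echo "gave up after $max attempts" >&2
  return 1
}

# Stub that is refused twice (budget exhausted) and then succeeds.
tries=0
fake_evict() { tries=$(( tries + 1 )); [ "$tries" -ge 3 ]; }

retry_until_allowed 5 0 fake_evict
```

In a real drain the budget frees up between attempts because replacement pods become Ready elsewhere, which is why a drain that never progresses usually points at unhealthy pods or an over-tight PDB.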
# Drain stuck due to PDB — diagnose before forcing
# See which PDB is blocking
kubectl get pdb -n production
kubectl describe pdb api-server-pdb -n production
# See which pods are selected
kubectl get pods -n production -l app=api-server
# Check if some pods are not Ready (contributing to low currentHealthy)
kubectl get pods -n production -l app=api-server \
  -o 'custom-columns=NAME:.metadata.name,READY:.status.conditions[?(@.type=="Ready")].status'
# If pods are stuck not-Ready, fix them first before draining
# Or use unhealthyPodEvictionPolicy: AlwaysAllow to allow eviction of unhealthy pods
PDB Status Fields
kubectl get pdb -n production
# NAME MIN AVAILABLE MAX UNAVAILABLE ALLOWED DISRUPTIONS AGE
# api-server-pdb 2 N/A 3 14d
kubectl describe pdb api-server-pdb -n production
| Status field | Meaning |
|---|---|
| status.currentHealthy | Number of pods matching selector that are currently Ready |
| status.desiredHealthy | Minimum number of pods that must be healthy (derived from minAvailable / maxUnavailable) |
| status.disruptionsAllowed | max(0, currentHealthy - desiredHealthy): how many evictions are currently permitted |
| status.expectedPods | Total pods matching the selector (whether healthy or not) |
| status.disruptedPods | Pods whose eviction the API server has accepted but whose deletion the PDB controller has not yet observed |
| status.observedGeneration | Generation of the PDB spec this status reflects |
| status.conditions[].type: DisruptionAllowed | True if at least 1 disruption is currently allowed |
# Watch PDB status in real time during drain
watch kubectl get pdb -n production
# Get disruptionsAllowed programmatically
kubectl get pdb api-server-pdb -n production \
-o jsonpath='{.status.disruptionsAllowed}'
# Get all PDB conditions
kubectl get pdb api-server-pdb -n production \
-o jsonpath='{.status.conditions}' | jq .
unhealthyPodEvictionPolicy (1.26+)
This field controls whether pods that are Running but not Ready can be evicted when a PDB covers them. Under the default policy, such pods can be evicted only while the workload still meets its healthy threshold, so a workload that is already degraded can block the drain of nodes that need maintenance.
| Policy | Behavior | Use case |
|---|---|---|
| IfHealthyBudget (default) | Unhealthy pods can be evicted only if currentHealthy >= desiredHealthy (the workload is not already disrupted). If the workload is below its threshold, no pod can be evicted. | Strict availability guarantee: never evict when below the healthy threshold |
| AlwaysAllow | Unhealthy pods can always be evicted regardless of budget state; healthy pods remain subject to the PDB | Allow node drain to proceed even when pods are already broken (e.g., stuck CrashLoopBackOff blocking maintenance) |
spec:
  minAvailable: 2
  unhealthyPodEvictionPolicy: AlwaysAllow # Unblock drains when pods are already broken
  selector:
    matchLabels:
      app: api-server
AlwaysAllow relaxes the budget only for pods that are not Ready; healthy pods remain protected by the PDB. The trade-off is that a pod that is slow to start, or about to recover, can be evicted before it becomes healthy, which may prolong an outage. Use AlwaysAllow when the alternative (nodes stuck in maintenance limbo) is worse than that risk.
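The two policies can be compared with a small decision function. This is a simplified sketch of the rule, assuming that under IfHealthyBudget an unhealthy pod is evictable when currentHealthy is at least desiredHealthy; `can_evict` is a hypothetical helper and ignores edge cases the API server handles (for example pods that are not Running).

```shell
#!/bin/sh
# can_evict POLICY POD_READY CURRENT_HEALTHY DESIRED_HEALTHY
# POLICY is IfHealthyBudget or AlwaysAllow; POD_READY is true or false.
can_evict() {
  policy=$1; ready=$2; current=$3; desired=$4
  if [ "$ready" = "false" ]; then
    case "$policy" in
      AlwaysAllow)
        echo allow; return 0 ;;
      IfHealthyBudget)
        # Evicting a broken pod is fine as long as the workload is not already disrupted.
        if [ "$current" -ge "$desired" ]; then echo allow; else echo deny; fi
        return 0 ;;
    esac
  fi
  # Healthy pods always need spare budget: currentHealthy must exceed desiredHealthy.
  if [ "$current" -gt "$desired" ]; then echo allow; else echo deny; fi
}

echo "AlwaysAllow,     broken pod,  1/2 healthy: $(can_evict AlwaysAllow false 1 2)"
echo "IfHealthyBudget, broken pod,  1/2 healthy: $(can_evict IfHealthyBudget false 1 2)"
echo "AlwaysAllow,     healthy pod, 2/2 healthy: $(can_evict AlwaysAllow true 2 2)"
```

The last case shows that AlwaysAllow does not weaken protection for healthy pods: with no spare budget they are still refused.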
PDB for Quorum Systems
Distributed consensus systems (etcd, Kafka, ZooKeeper, PostgreSQL with Patroni) require a strict quorum of healthy members. PDBs enforce this during maintenance.
etcd (3-node cluster — needs 2 for quorum)
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: etcd-pdb
  namespace: kube-system
spec:
  minAvailable: 2 # 2 of 3 = quorum maintained
  selector:
    matchLabels:
      component: etcd
Kafka (3 brokers — ISR-based availability)
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: kafka-pdb
  namespace: platform
spec:
  maxUnavailable: 1 # Only 1 broker down at a time
  selector:
    matchLabels:
      app.kubernetes.io/component: kafka
      app.kubernetes.io/instance: production
ZooKeeper (5-node — needs 3 for quorum)
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: zookeeper-pdb
  namespace: platform
spec:
  minAvailable: 3 # 3 of 5 = quorum (majority)
  selector:
    matchLabels:
      app: zookeeper
PostgreSQL with Patroni (1 primary + 2 replicas)
# Primary must never be disrupted alone: use minAvailable: 2 to ensure
# at least one replica is present before eviction is allowed
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: postgres-pdb
  namespace: databases
spec:
  minAvailable: 2 # 2 of 3: Patroni can fail over if the primary is drained
  selector:
    matchLabels:
      app: patroni-cluster
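The minAvailable values used for the consensus systems above follow the usual majority rule, floor(n/2) + 1. A minimal sketch, with `quorum_size` as a hypothetical helper name:

```shell
#!/bin/sh
# quorum_size MEMBERS: smallest majority of a MEMBERS-node consensus group.
quorum_size() {
  echo $(( $1 / 2 + 1 ))
}

for members in 3 5 7; do
  q=$(quorum_size "$members")
  # tolerated = how many members can be down at once without losing quorum
  echo "members=$members minAvailable=$q tolerated=$(( members - q ))"
done
```

Systems that do not rely on a strict majority (for example Kafka with ISR-based availability, as in the example above) may need a different rule, so treat this only as the default for majority-quorum systems.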
RollingUpdate and PDBs
A common misconception is that a PDB can deadlock a Deployment rollout. The Deployment controller does not use the Eviction API: it deletes old pods directly, paced by the strategy's own maxUnavailable and maxSurge, so a PDB never blocks a rolling update. Pods that are unavailable because of a rollout do still count against the budget, which means a drain that runs during a rollout may find disruptionsAllowed at 0. The real hazard of maxUnavailable: 0 is that it refuses every eviction, so node drains and autoscaler scale-down stall.
# WRONG: blocks every node drain and scale-down, even when all pods are healthy
spec:
  maxUnavailable: 0
  selector:
    matchLabels:
      app: api-server
# CORRECT: allows 1 eviction at a time while keeping 2 pods healthy
spec:
  minAvailable: 2 # For replicas=3: allows 1 disruption
  selector:
    matchLabels:
      app: api-server
Ensure minAvailable < currentReplicas (or equivalently, maxUnavailable >= 1) so that drains can always make progress. To limit unavailability during the rollout itself, tune the Deployment's strategy.rollingUpdate settings rather than the PDB.
Cluster Autoscaler & PDBs
The Cluster Autoscaler (CA) respects PDBs when deciding whether to scale down underutilized nodes. A node is safe to remove only if all pods on it can be evicted without violating their PDBs.
# Check why CA isn't scaling down a node
kubectl get nodes -o wide
kubectl describe node <node> | grep -i "scale-down\|autoscaler"
# Check CA logs
kubectl logs -n kube-system -l app=cluster-autoscaler --tail=100 | grep -i "pdb\|blocked\|not safe"
# List nodes that do not carry the scale-down-disabled annotation (CA may still consider these for removal)
kubectl get nodes -o json | \
jq '.items[] | select(.metadata.annotations["cluster-autoscaler.kubernetes.io/scale-down-disabled"] != "true") | .metadata.name'
If a pod is in CrashLoopBackOff, its PDB uses IfHealthyBudget, and the workload has fewer healthy pods than desiredHealthy, CA cannot remove the node hosting the broken pod: the eviction is refused because the budget is exhausted, and the pod will not recover on its own. Resolution: fix the pod, use unhealthyPodEvictionPolicy: AlwaysAllow, or temporarily annotate the node with cluster-autoscaler.kubernetes.io/scale-down-disabled: "true" and handle it manually.
Zero-Disruption: minAvailable: 100%
Setting minAvailable: "100%" means no pod can ever be voluntarily disrupted — all pods must remain healthy at all times. This is rarely the right choice but is appropriate for:
- Leader-elected singletons that cannot tolerate even brief downtime
- During critical business periods (e.g., no deploys/drains during Black Friday)
- Workloads with very long startup times where replacement would take too long
spec:
  minAvailable: "100%" # Zero disruptions allowed
  selector:
    matchLabels:
      app: payment-processor
With minAvailable: 100%, no node hosting a pod of this workload can be drained through the Eviction API. Scaling up does not help, because 100% of the new replica count is still required. To move pods you must bypass the PDB: manually delete a pod, or cordon the node and run a rolling restart so the workload controller recreates the pods elsewhere. Cluster upgrades, node replacements, and autoscaler activity will all be blocked. Only use this for true zero-tolerance workloads with a documented maintenance procedure.
Multi-Workload PDB Selectors
A single PDB can cover multiple Deployments/StatefulSets if they share a common label. This is useful for cross-workload availability guarantees (e.g., keep at least 3 cache nodes across multiple cache Deployments).
# PDB covering two Deployments sharing the tier=cache label
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: cache-tier-pdb
  namespace: production
spec:
  minAvailable: 3
  selector:
    matchLabels:
      tier: cache # Matches pods from both redis-primary and redis-replica Deployments
If a PDB selects 10 pods across multiple workloads with minAvailable: 8, only 2 disruptions are allowed across all 10 pods, even if the disruptions hit different workloads. This can be unexpectedly restrictive. Prefer per-workload PDBs for independent availability guarantees.
PDB + HPA Interaction
When HPA is active, the replica count fluctuates. A PDB with an absolute minAvailable value may become overly or insufficiently restrictive:
# With HPA: prefer percentage-based PDB to track replica changes
spec:
  minAvailable: "66%" # Always keep about 2/3 of current replicas healthy (rounded up)
  # With replicas=3: min=2; with replicas=9: min=6
  selector:
    matchLabels:
      app: api-server
# Avoid absolute minAvailable with HPA if replicas can scale down to or below minAvailable:
# minAvailable: 5 with HPA minReplicas=3 → disruptionsAllowed stays at 0 (it never goes negative) → evictions always blocked
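To see where an absolute minAvailable collides with an HPA range, the replica counts that leave no budget can be enumerated. A minimal sketch, assuming all pods are healthy; `blocked_replica_counts` is a hypothetical helper.

```shell
#!/bin/sh
# blocked_replica_counts MIN_AVAILABLE HPA_MIN HPA_MAX
# Prints every replica count in the HPA range at which an absolute
# minAvailable leaves zero allowed disruptions.
blocked_replica_counts() {
  r=$2
  while [ "$r" -le "$3" ]; do
    if [ $(( r - $1 )) -le 0 ]; then
      echo "$r"
    fi
    r=$(( r + 1 ))
  done
}

# minAvailable: 5 with an HPA range of 3..8 replicas
blocked_replica_counts 5 3 8
```

Any output means there are HPA states in which drains cannot proceed; an empty result means the absolute value always leaves at least one disruption available.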
Operational Commands
# List all PDBs across namespaces
kubectl get pdb -A
# Look for eviction-related events (a PDB-blocked eviction returns HTTP 429 to the caller and may not emit an event)
kubectl get events -n <namespace> | grep -i "pdb\|disruption\|eviction"
# Watch PDB during drain
watch -n2 "kubectl get pdb -n production && echo && kubectl get pods -n production -l app=api-server"
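# Test whether evictions from a node would succeed (server-side dry run sends real eviction requests with dryRun set;
# kubectl delete --dry-run does not consult PDBs)
kubectl drain <node> --ignore-daemonsets --dry-run=server --timeout=60s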
# Check disruptions allowed for all PDBs in a namespace
kubectl get pdb -n production -o jsonpath=\
'{range .items[*]}{.metadata.name}{" allowed:"}{.status.disruptionsAllowed}{"\n"}{end}'
# Temporarily increase PDB disruption budget (e.g., during planned maintenance)
kubectl patch pdb api-server-pdb -n production --type=merge \
-p '{"spec":{"minAvailable":1}}'
# Restore after maintenance:
kubectl patch pdb api-server-pdb -n production --type=merge \
-p '{"spec":{"minAvailable":2}}'
# Delete a PDB to unblock stuck drain (emergency only)
kubectl delete pdb api-server-pdb -n production
# After drain completes:
kubectl apply -f pdb.yaml
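When many PDBs exist, the tabular output of `kubectl get pdb -A` can be filtered for budgets that currently allow nothing. The sketch below reads that output on stdin; `blocked_pdbs` is a hypothetical helper, and the sample rows are made up for illustration so the block runs without a cluster.

```shell
#!/bin/sh
# blocked_pdbs: read `kubectl get pdb -A` output on stdin and print
# NAMESPACE/NAME for every PDB whose ALLOWED DISRUPTIONS column is 0.
blocked_pdbs() {
  # ALLOWED DISRUPTIONS is the second-to-last column; NR > 1 skips the header.
  awk 'NR > 1 && $(NF-1) == 0 { print $1 "/" $2 }'
}

# Illustrative input only; against a cluster use: kubectl get pdb -A | blocked_pdbs
blocked_pdbs <<'EOF'
NAMESPACE    NAME             MIN AVAILABLE   MAX UNAVAILABLE   ALLOWED DISRUPTIONS   AGE
production   api-server-pdb   2               N/A               3                     14d
platform     kafka-pdb        N/A             1                 0                     30d
EOF
```

Parsing table output is fragile across kubectl versions, so for automation the jsonpath query on status.disruptionsAllowed is the sturdier choice.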
Common Anti-patterns
| Anti-pattern | Problem | Fix |
|---|---|---|
| maxUnavailable: 0 on multi-replica Deployment | Refuses all voluntary evictions, so node drains and autoscaler scale-down stall | Use minAvailable: N-1 or maxUnavailable: 1 |
| Absolute minAvailable ≥ HPA minReplicas | When HPA scales down to minReplicas, PDB is immediately at 0 disruptions and blocks all drains | Use percentage-based PDB or ensure absolute value < HPA minReplicas |
| No PDB on StatefulSet with quorum | Node drain can remove multiple members simultaneously, breaking consensus | Add PDB with minAvailable: quorum_size |
| PDB selector matching 0 pods | PDB has no effect; evictions proceed unchecked | Verify selector with kubectl get pods -l <selector> |
| minAvailable: 100% without documented drain procedure | Node maintenance blocked with no escape hatch | Document the emergency drain procedure; consider N-1 (a percentage such as 99% still rounds up to every pod for workloads under 100 replicas) |
Metrics
| Metric | Labels | Use |
|---|---|---|
| kube_poddisruptionbudget_status_current_healthy | poddisruptionbudget, namespace | Currently healthy pods (vs desired) |
| kube_poddisruptionbudget_status_desired_healthy | poddisruptionbudget, namespace | Minimum required healthy pods |
| kube_poddisruptionbudget_status_pod_disruptions_allowed | poddisruptionbudget, namespace | Current eviction budget remaining (0 = blocked) |
| kube_poddisruptionbudget_status_expected_pods | poddisruptionbudget, namespace | Total pods selected by this PDB |
| kube_poddisruptionbudget_status_observed_generation | poddisruptionbudget, namespace | Reconciliation lag detection |
Alerting Rules
groups:
  - name: pdb
    rules:
      # PDB at zero disruptions for extended period (drain may be stuck)
      - alert: PDBNoDisruptionsAllowed
        expr: kube_poddisruptionbudget_status_pod_disruptions_allowed == 0
        for: 15m
        labels:
          severity: warning
        annotations:
          summary: "PDB {{ $labels.namespace }}/{{ $labels.poddisruptionbudget }} has 0 disruptions allowed"
          description: "Node drains, VPA evictions, and CA scale-down are blocked. Check pod health."
      # PDB below desired healthy count (below minimum!)
      - alert: PDBBelowDesiredHealthy
        expr: |
          kube_poddisruptionbudget_status_current_healthy
            < kube_poddisruptionbudget_status_desired_healthy
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "PDB {{ $labels.namespace }}/{{ $labels.poddisruptionbudget }} currentHealthy below desiredHealthy"
          description: "Workload is already below its minimum availability threshold. All evictions blocked."
      # PDB selecting no pods (misconfigured selector)
      - alert: PDBSelectsNoPods
        expr: kube_poddisruptionbudget_status_expected_pods == 0
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "PDB {{ $labels.namespace }}/{{ $labels.poddisruptionbudget }} selects 0 pods; check selector"
      # Workload consistently at minimum replicas: PDB may be too tight
      - alert: PDBAlwaysAtMinimum
        expr: |
          kube_poddisruptionbudget_status_pod_disruptions_allowed == 0
            and kube_poddisruptionbudget_status_current_healthy
            == kube_poddisruptionbudget_status_desired_healthy
        for: 1h
        labels:
          severity: info
        annotations:
          summary: "PDB {{ $labels.poddisruptionbudget }} has been at exactly minimum healthy for 1h; consider increasing replicas"
Runbooks
Node Drain Stuck Due to PDB
# 1. Identify which PDB is blocking
kubectl get pdb -n <namespace> -o wide
kubectl describe pdb <pdb-name> -n <namespace>
# 2. Check current pod health
kubectl get pods -n <namespace> -l <pdb-selector>
# 3. If pods are not Ready — fix them first
kubectl describe pods -n <namespace> -l <pdb-selector> | grep -A10 Events
# 4. If pods are healthy but budget is 0 (replicas == minAvailable):
# Option A: Scale up replicas temporarily
kubectl scale deployment <name> -n <namespace> --replicas=4
# Option B: Temporarily lower minAvailable
kubectl patch pdb <pdb> -n <namespace> --type=merge -p '{"spec":{"minAvailable":1}}'
kubectl drain <node> --ignore-daemonsets
kubectl patch pdb <pdb> -n <namespace> --type=merge -p '{"spec":{"minAvailable":2}}'
# Option C: Emergency — delete PDB, drain, re-apply (production risk)
kubectl delete pdb <pdb> -n <namespace>
kubectl drain <node> --ignore-daemonsets
kubectl apply -f pdb.yaml
PDB Selecting No Pods (Misconfigured)
# Verify selector
kubectl get pdb <name> -n <namespace> -o jsonpath='{.spec.selector}'
# Check pods with that selector
kubectl get pods -n <namespace> -l <label-key>=<label-value>
# Compare with Deployment selector
kubectl get deployment <name> -n <namespace> -o jsonpath='{.spec.selector.matchLabels}'
# Fix: update PDB selector to match Deployment labels
kubectl patch pdb <name> -n <namespace> --type=merge \
-p '{"spec":{"selector":{"matchLabels":{"app":"<correct-app>"}}}}'
Cluster Autoscaler Stuck on Node with PDB
# Check CA logs for PDB-related messages
kubectl logs -n kube-system -l app=cluster-autoscaler | grep -i "pdb\|disruption\|not safe"
# Check which pods on the node have PDBs
NODE=<node-name>
kubectl get pods --all-namespaces --field-selector spec.nodeName=$NODE \
-o jsonpath='{range .items[*]}{.metadata.namespace}/{.metadata.name}{"\n"}{end}'
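# List PDBs that currently allow no disruptions, then match them against the pods above
kubectl get pdb -A \
  -o jsonpath='{range .items[?(@.status.disruptionsAllowed==0)]}{.metadata.namespace}/{.metadata.name}{"\n"}{end}'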
# If pod is unhealthy and blocking: use AlwaysAllow policy
kubectl patch pdb <pdb> -n <namespace> --type=merge \
-p '{"spec":{"unhealthyPodEvictionPolicy":"AlwaysAllow"}}'
Rolling Update with a Blocked Drain
# Deployment rollouts are not gated by PDBs; a stalled rollout usually means new pods are failing readiness
# Check rollout status
kubectl rollout status deployment <name> -n <namespace>
# Check PDB status (pods made unavailable by the rollout count against the budget)
kubectl get pdb -n <namespace>
# If disruptionsAllowed=0 and a node drain is blocked while the rollout runs:
# Temporarily increase allowed disruptions
kubectl patch pdb <pdb> -n <namespace> --type=merge -p '{"spec":{"minAvailable":1}}'
# Wait for the rollout to finish
kubectl rollout status deployment <name> -n <namespace> --timeout=300s
# Restore after rollout
kubectl patch pdb <pdb> -n <namespace> --type=merge -p '{"spec":{"minAvailable":2}}'
Best Practices
- Every production workload with ≥ 2 replicas should have a PDB: without one, a node drain can evict all pods simultaneously if they happen to land on the same node.
- Use minAvailable: N-1 (not maxUnavailable: 0): maxUnavailable: 0 sounds like "zero downtime" but refuses every eviction, so node drains and autoscaler scale-down stall. minAvailable: N-1 keeps N-1 pods healthy while still allowing progress.
- Use percentage-based PDBs when HPA manages replicas: absolute values can create a permanently blocked budget if HPA scales down to the PDB's minimum. A percentage-based PDB scales with the current replica count.
- Set unhealthyPodEvictionPolicy: AlwaysAllow for workloads that can run with fewer instances: this prevents broken pods from permanently blocking node maintenance. Pair with good alerting so you know when a pod is unhealthy.
- For quorum systems, set minAvailable to exactly the quorum size: for a 3-node etcd, minAvailable: 2; for a 5-node ZooKeeper, minAvailable: 3. Going below quorum stops the cluster from accepting writes and risks data loss.
- Alert on disruptionsAllowed == 0 persisting for > 15 minutes: this means infrastructure operations are blocked. Either the workload is under-replicated or pods are unhealthy and need attention.
- Test PDB behavior before cluster upgrades: run kubectl drain <node> --dry-run=server on each node type before a maintenance window to discover which PDBs will block and plan accordingly.
- Document emergency drain procedures for minAvailable: 100% workloads: zero-disruption guarantees require a defined escape hatch. The procedure should be in a runbook, not in the head of a single engineer.