Pod Disruption Budgets

    Limit voluntary disruptions to maintain availability during drains, upgrades, and autoscaling

    policy/v1 | GA since 1.21 | Platform Engineer

    A PodDisruptionBudget (PDB) constrains how many pods of a workload can be simultaneously unavailable due to voluntary disruptions — node drains, cluster upgrades, VPA evictions, Cluster Autoscaler scale-down, and any other caller of the Eviction API. PDBs do not protect against involuntary disruptions (node hardware failure, kernel panic, OOM kill) — those are handled by replica counts, topology spread, and readiness probes.

    Voluntary vs Involuntary Disruptions

    Disruption type | Category | PDB protects? | Examples
    Node drain (kubectl drain) | Voluntary | Yes | Node maintenance, OS upgrade, cluster upgrade
    Cluster Autoscaler scale-down | Voluntary | Yes | Removing underutilized nodes
    VPA Updater eviction | Voluntary | Yes | Right-sizing resource requests
    kubectl delete pod | Voluntary | No: a direct DELETE bypasses the Eviction API and therefore the PDB | Manual intervention
    Deployment RollingUpdate | Voluntary | No: the controller replaces pods according to the Deployment's own maxUnavailable/maxSurge, not the Eviction API (pods that are unavailable mid-rollout still count against the budget) | kubectl rollout
    Node hardware failure | Involuntary | No | Power outage, NIC failure
    OOM kill | Involuntary | No | Container exceeds memory limit
    Kernel panic / node NotReady | Involuntary | No | OS crash, kubelet failure
    Pod eviction for resource pressure | Involuntary | No | kubelet evicts BestEffort/Burstable pods

    PDB Spec

    apiVersion: policy/v1
    kind: PodDisruptionBudget
    metadata:
      name: api-server-pdb
      namespace: production
    spec:
      # --- Availability constraint (choose one: minAvailable OR maxUnavailable) ---
    
      minAvailable: 2        # At least 2 pods must still be healthy after an eviction
      # OR
      # maxUnavailable: 1    # At most 1 pod can be unavailable at a time
      # OR percentages:
      # minAvailable: "80%"  # At least 80% of pods must be healthy
      # maxUnavailable: "20%"
    
      # --- Pod selector ---
      selector:
        matchLabels:
          app: api-server
        # Optional: matchExpressions for complex selectors
        # matchExpressions:
        #   - key: environment
        #     operator: In
        #     values: [production, staging]
    
      # --- Unhealthy pod policy (1.26+) ---
      unhealthyPodEvictionPolicy: IfHealthyBudget  # IfHealthyBudget (default) | AlwaysAllow
    minAvailable and maxUnavailable are mutually exclusive
    You cannot set both in the same PDB. Choose based on how you think about the constraint: minAvailable for quorum-based systems ("always keep N healthy"), maxUnavailable for rolling operations ("allow at most N down at once").

    minAvailable vs maxUnavailable Semantics

    Deployment: replicas=5, selector: app=api-server

    PDB: minAvailable: 3
    ├─ currentHealthy=5, disruptionsAllowed=2
    ├─ Evict pod-1 → currentHealthy=4, disruptionsAllowed=1
    ├─ Evict pod-2 → currentHealthy=3, disruptionsAllowed=0
    └─ Evict pod-3 → BLOCKED (429 Too Many Requests)

    PDB: maxUnavailable: 2
    ├─ currentHealthy=5, disruptionsAllowed=2 (same result for replicas=5)
    └─ If replicas scale to 10 → disruptionsAllowed=2 (absolute)
       vs minAvailable: "60%" → disruptionsAllowed=4 (percentage scales with replicas)

    Percentage example with replicas=10:
    ├─ minAvailable: "80%" → minAvailable=ceil(10×0.8)=8 → disruptionsAllowed=2
    └─ maxUnavailable: "20%" → maxUnavailable=ceil(10×0.2)=2 → disruptionsAllowed=2

    Constraint | Replicas=3 | Replicas=5 | Replicas=10 | Notes
    minAvailable: 2 | 1 allowed | 3 allowed | 8 allowed | Absolute floor: the allowance grows with replicas
    minAvailable: "60%" | 1 allowed (3 - ceil(1.8)) | 2 allowed (5 - 3) | 4 allowed (10 - 6) | Percentage rounds minAvailable up
    maxUnavailable: 1 | 1 allowed | 1 allowed | 1 allowed | Absolute: does not scale
    maxUnavailable: "20%" | 1 allowed (ceil(0.6)) | 1 allowed (ceil(1.0)) | 2 allowed (ceil(2.0)) | Percentage rounds maxUnavailable up (can exceed the stated percentage)
    Percentage rounding direction
    Both percentage forms round up to the next whole pod. For minAvailable this is the conservative direction (more pods must stay healthy). For maxUnavailable it is the permissive direction: the number of pods that may be disrupted is rounded up, so a disruption can exceed the percentage you wrote. With 3 replicas, maxUnavailable: "20%" allows 1 disruption (ceil(0.6) = 1), which is 33% of the pods, not 20%. Verify behavior at your minimum replica count.
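The budget arithmetic above can be sketched in a few lines of Python. This is a simplified model, not the disruption controller's code: the function names are made up for illustration, and `expected_pods` stands in for the replica count the controller reads from the owning workload.

```python
from typing import Optional, Union

IntOrPercent = Union[int, str]


def scaled_value(value: IntOrPercent, expected_pods: int) -> int:
    """Resolve an integer or a percentage string such as "20%" to a pod count.

    Percentages are rounded up for minAvailable and maxUnavailable alike.
    """
    if isinstance(value, str):
        percent = int(value.rstrip("%"))
        # Integer ceiling division avoids floating-point rounding surprises.
        return -(-percent * expected_pods // 100)
    return value


def disruptions_allowed(
    expected_pods: int,
    current_healthy: int,
    min_available: Optional[IntOrPercent] = None,
    max_unavailable: Optional[IntOrPercent] = None,
) -> int:
    """Return how many evictions the budget permits right now (never negative)."""
    if (min_available is None) == (max_unavailable is None):
        raise ValueError("set exactly one of min_available or max_unavailable")
    if min_available is not None:
        desired_healthy = scaled_value(min_available, expected_pods)
    else:
        desired_healthy = max(expected_pods - scaled_value(max_unavailable, expected_pods), 0)
    return max(current_healthy - desired_healthy, 0)
```

Calling `disruptions_allowed(3, 3, max_unavailable="20%")` with this model yields 1, which is a quick way to check a proposed PDB against the smallest replica count you expect to run.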

    Eviction API

    PDBs are enforced through the Eviction API, the eviction subresource of a pod (it accepts a policy/v1 Eviction object), not through the Delete API. Any caller that wants to respect PDBs must use eviction rather than deletion.

    Eviction API flow:

    Caller (kubectl drain, CA, VPA, etc.)
        │
        ▼
    POST /api/v1/namespaces/{ns}/pods/{pod}/eviction
        │
        ▼
    kube-apiserver looks up the PDB selecting this pod:
    ├─ Budget satisfied? → 200 OK (pod eviction proceeds)
    ├─ Budget would be violated? → 429 Too Many Requests
    │    body: {"code":429,"reason":"TooManyRequests",
    │           "message":"Cannot evict pod as it would violate the pod's disruption budget."}
    └─ More than one PDB selects the pod? → 500 Internal Server Error

    Caller must retry on 429 (the apiserver does not retry on its behalf)
    kubectl drain retries automatically at a fixed interval (5s)
    # kubectl delete pod sends a plain DELETE: it is NOT PDB-aware
    kubectl delete pod <pod-name> --grace-period=30
    
    # PDB-aware removal of a single pod: raw Eviction API call
    kubectl proxy &
    curl -X POST \
      "http://localhost:8001/api/v1/namespaces/production/pods/api-server-xyz/eviction" \
      -H "Content-Type: application/json" \
      -d '{"apiVersion":"policy/v1","kind":"Eviction","metadata":{"name":"api-server-xyz","namespace":"production"}}'
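A caller that respects PDBs has to handle the 429 response itself. The loop below is a minimal sketch of that behavior, assuming a hypothetical `send_eviction` callable that performs the POST shown above and returns the HTTP status code. The fixed delay mirrors how kubectl drain behaves, but the function name and parameters are illustrative only.

```python
import time
from typing import Callable


def evict_with_retry(
    send_eviction: Callable[[], int],
    max_attempts: int = 5,
    delay_seconds: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Return True once the eviction is accepted, False if the budget never opens."""
    for attempt in range(1, max_attempts + 1):
        status = send_eviction()
        if 200 <= status < 300:
            return True  # eviction accepted, pod deletion proceeds
        if status != 429:
            # Anything other than "budget exhausted" is treated as a hard failure.
            raise RuntimeError(f"eviction failed with HTTP {status}")
        if attempt < max_attempts:
            sleep(delay_seconds)  # budget exhausted: wait, then try again
    return False
```

Injecting `sleep` keeps the function testable. A real caller would also want an overall deadline, comparable to the `--timeout` flag of kubectl drain.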

    kubectl drain and PDBs

    kubectl drain cordons the node (marks it unschedulable) then evicts all pods via the Eviction API, respecting PDBs automatically. It retries on 429 responses until the PDB allows the eviction or a timeout is reached.

    # Standard drain: respects PDBs, graceful termination
    #   --ignore-daemonsets      skip DaemonSet-managed pods (the DaemonSet controller would recreate them on the node)
    #   --delete-emptydir-data   allow eviction of pods using emptyDir volumes (their data is lost)
    #   --grace-period=60        override each pod's terminationGracePeriodSeconds
    #   --timeout=600s           give up after 10 minutes total
    kubectl drain node-1 \
      --ignore-daemonsets \
      --delete-emptydir-data \
      --grace-period=60 \
      --timeout=600s
    
    # Check what would be drained (server-side dry run submits the evictions without persisting them)
    kubectl drain node-1 --ignore-daemonsets --dry-run=server
    
    # Drain that bypasses PDBs (DANGEROUS: use only for node replacement)
    kubectl drain node-1 --ignore-daemonsets --force --disable-eviction
    # --force: also removes pods that are not managed by a controller
    # --disable-eviction: uses DELETE instead of the Eviction API (bypasses PDB checks)
    # Use only when: node is already NotReady, or you accept the availability risk
    --disable-eviction bypasses PDBs; --force removes unmanaged pods
    --force allows deletion of pods not managed by a controller (orphan pods). --disable-eviction switches from the Eviction API to direct deletion, entirely bypassing PDB checks. Both flags can cause outages if used on healthy nodes with carefully configured PDBs. Reserve them for disaster recovery scenarios where the node is already failed.
    # Drain stuck due to PDB — diagnose before forcing
    # See which PDB is blocking
    kubectl get pdb -n production
    kubectl describe pdb api-server-pdb -n production
    
    # See which pods are selected
    kubectl get pods -n production -l app=api-server
    
    # Check if some pods are not Ready (contributing to low currentHealthy)
    kubectl get pods -n production -l app=api-server \
      -o 'custom-columns=NAME:.metadata.name,READY:.status.conditions[?(@.type=="Ready")].status'
    
    # If pods are stuck not-Ready, fix them first before draining
    # Or use unhealthyPodEvictionPolicy: AlwaysAllow to allow eviction of unhealthy pods

    PDB Status Fields

    kubectl get pdb -n production
    # NAME             MIN AVAILABLE   MAX UNAVAILABLE   ALLOWED DISRUPTIONS   AGE
    # api-server-pdb   2               N/A               3                     14d
    
    kubectl describe pdb api-server-pdb -n production
    Status field | Meaning
    status.currentHealthy | Number of pods matching the selector that are currently Ready
    status.desiredHealthy | Minimum number of pods that must be healthy (derived from minAvailable / maxUnavailable)
    status.disruptionsAllowed | max(currentHealthy - desiredHealthy, 0): how many evictions are currently permitted
    status.expectedPods | Total pods counted by this PDB (whether healthy or not)
    status.disruptedPods | Pods whose eviction the API server has accepted but whose deletion the PDB controller has not yet observed
    status.observedGeneration | Generation of the PDB spec this status reflects
    status.conditions[].type: DisruptionAllowed | True if at least 1 disruption is currently allowed
    # Watch PDB status in real time during drain
    watch kubectl get pdb -n production
    
    # Get disruptionsAllowed programmatically
    kubectl get pdb api-server-pdb -n production \
      -o jsonpath='{.status.disruptionsAllowed}'
    
    # Get all PDB conditions
    kubectl get pdb api-server-pdb -n production \
      -o jsonpath='{.status.conditions}' | jq .
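The status fields can also be read in bulk. The sketch below is a hypothetical helper, not part of kubectl. It takes the JSON text produced by `kubectl get pdb -A -o json` and reports every PDB whose budget is exhausted, with a rough reason derived from the fields in the table above.

```python
import json
from typing import List, Tuple


def blocked_pdbs(pdb_list_json: str) -> List[Tuple[str, str]]:
    """Return (namespace/name, reason) for every PDB with no disruptions allowed."""
    report = []
    for item in json.loads(pdb_list_json).get("items", []):
        status = item.get("status", {})
        if status.get("disruptionsAllowed", 0) > 0:
            continue  # this PDB still has budget left
        name = f'{item["metadata"]["namespace"]}/{item["metadata"]["name"]}'
        current = status.get("currentHealthy", 0)
        desired = status.get("desiredHealthy", 0)
        if status.get("expectedPods", 0) == 0:
            reason = "selects no pods"
        elif current < desired:
            reason = f"below minimum ({current}/{desired} healthy)"
        else:
            reason = f"at minimum ({current}/{desired} healthy)"
        report.append((name, reason))
    return report
```

"Below minimum" usually points at unhealthy pods, while "at minimum" points at a budget with no headroom, so the two cases call for different fixes.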

    unhealthyPodEvictionPolicy (1.26+)

    This field controls how evictions of running pods that are not Ready are handled when a PDB exists. Under the default behavior, evicting such a pod is still subject to the budget, so a workload that is already below its healthy threshold blocks the drain of any node hosting its broken pods.

    Policy | Behavior | Use case
    IfHealthyBudget (default) | Running pods that are not Ready can be evicted only if currentHealthy >= desiredHealthy (the application is not already disrupted). If the workload is below its healthy threshold, none can be evicted. | Strict availability guarantee: never evict when below the healthy threshold
    AlwaysAllow | Running pods that are not Ready can always be evicted, regardless of budget state | Allow node drain to proceed even when pods are already broken (e.g., stuck CrashLoopBackOff blocking maintenance)
    spec:
      minAvailable: 2
      unhealthyPodEvictionPolicy: AlwaysAllow  # Unblock drains when pods are already broken
      selector:
        matchLabels:
          app: api-server
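    AlwaysAllow can delay recovery
    AlwaysAllow only loosens the rule for pods that are not Ready; healthy pods remain protected by the budget. The risk is that pods which are merely slow to start, or about to recover, can be evicted before they ever become Ready, which prolongs an existing disruption. For example, if 3 of 5 pods are still starting and a drain keeps evicting them, the workload stays at 2 healthy pods for longer. Use AlwaysAllow when the alternative (nodes stuck in maintenance limbo) is worse than that risk.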
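The decision for a single eviction request can be modeled as a small function. This is a simplified sketch of the documented semantics for running pods, not the API server's implementation, and the function name is invented for illustration.

```python
def eviction_permitted(
    pod_ready: bool,
    current_healthy: int,
    desired_healthy: int,
    policy: str = "IfHealthyBudget",
) -> bool:
    """Decide whether evicting one running pod is allowed under the PDB."""
    if policy not in ("IfHealthyBudget", "AlwaysAllow"):
        raise ValueError(f"unknown unhealthyPodEvictionPolicy: {policy}")
    if pod_ready:
        # Healthy pods always consume budget: one disruption must be available.
        return current_healthy > desired_healthy
    if policy == "AlwaysAllow":
        # Pods that are not Ready may be evicted regardless of the budget.
        return True
    # IfHealthyBudget: allowed only while the application is not already disrupted.
    return current_healthy >= desired_healthy
```

With `minAvailable: 2` and 3 of 5 pods crash-looping, `eviction_permitted(False, 2, 2)` is True under either policy, while `eviction_permitted(True, 2, 2, "AlwaysAllow")` stays False. The policy never exposes healthy pods.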

    PDB for Quorum Systems

    Distributed systems that depend on a quorum or a minimum set of live replicas (etcd, Kafka, ZooKeeper, PostgreSQL with Patroni) must keep enough healthy members at all times. PDBs enforce this during maintenance.

    etcd (3-node cluster — needs 2 for quorum)

    apiVersion: policy/v1
    kind: PodDisruptionBudget
    metadata:
      name: etcd-pdb
      namespace: kube-system
    spec:
      minAvailable: 2       # 2 of 3 = quorum maintained
      selector:
        matchLabels:
          component: etcd

    Kafka (3 brokers — ISR-based availability)

    apiVersion: policy/v1
    kind: PodDisruptionBudget
    metadata:
      name: kafka-pdb
      namespace: platform
    spec:
      maxUnavailable: 1     # Only 1 broker down at a time
      selector:
        matchLabels:
          app.kubernetes.io/component: kafka
          app.kubernetes.io/instance: production

    ZooKeeper (5-node — needs 3 for quorum)

    apiVersion: policy/v1
    kind: PodDisruptionBudget
    metadata:
      name: zookeeper-pdb
      namespace: platform
    spec:
      minAvailable: 3       # 3 of 5 = quorum (majority)
      selector:
        matchLabels:
          app: zookeeper

    PostgreSQL with Patroni (1 primary + 2 replicas)

    # Primary must never be disrupted alone — use minAvailable: 2 to ensure
    # at least one replica is present before eviction is allowed
    apiVersion: policy/v1
    kind: PodDisruptionBudget
    metadata:
      name: postgres-pdb
      namespace: databases
    spec:
      minAvailable: 2       # 2 of 3 — Patroni can failover if primary is drained
      selector:
        matchLabels:
          app: patroni-cluster
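For majority-quorum systems such as etcd and ZooKeeper, the minAvailable values above follow from the member count. A minimal sketch, with function names chosen for illustration (it does not apply to Kafka's ISR model or to Patroni, where the right value depends on replication settings):

```python
def quorum_size(members: int) -> int:
    """Smallest majority of a cluster with the given number of voting members."""
    if members < 1:
        raise ValueError("a cluster needs at least one member")
    return members // 2 + 1


def tolerated_failures(members: int) -> int:
    """How many members can be down while a majority remains."""
    return members - quorum_size(members)
```

Setting `minAvailable` to `quorum_size(n)` leaves `tolerated_failures(n)` disruptions. Note that a 4-member cluster tolerates no more failures than a 3-member one.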

    RollingUpdate + PDB Deadlock

    A common misconfiguration: maxUnavailable: 0 in the PDB (or minAvailable equal to the replica count). The Deployment controller replaces pods by scaling ReplicaSets and deleting pods directly, so the rollout itself is governed by the Deployment's maxUnavailable/maxSurge and is not gated by the PDB. What deadlocks is every operation that goes through the Eviction API: node drains, Cluster Autoscaler scale-down, and VPA evictions. The two also interact: while a rollout is in progress, pods that are not yet Ready lower currentHealthy, so a drain that overlaps a rollout can stall until the rollout finishes.

    Deadlock scenario:

    Deployment: replicas=3, maxUnavailable=0, maxSurge=1
    PDB: maxUnavailable: 0

    RollingUpdate creates new pod (4 pods), waits for Ready, removes an old pod ✓ (not gated by the PDB)
    kubectl drain node-1 → Eviction API
    PDB: maxUnavailable=0, currentHealthy=3, disruptionsAllowed=0
    └─ BLOCKED: no pod can be evicted (429)
    Drain retries indefinitely

    Resolution options:
    1. Set PDB maxUnavailable: 1 (most common fix)
    2. Use minAvailable: N-1 instead of maxUnavailable: 0
    3. Keep maxSurge ≥ 1 with Deployment maxUnavailable: 0 so a rollout never drops currentHealthy below desiredHealthy
    4. Temporarily delete PDB during maintenance (not recommended for prod)

    # WRONG: blocks every eviction, even when all pods are healthy
    spec:
      maxUnavailable: 0
      selector:
        matchLabels:
          app: api-server
    
    # CORRECT: allows 1 eviction at a time while maintaining 2 healthy
    spec:
      minAvailable: 2        # For replicas=3: allows 1 disruption
      selector:
        matchLabels:
          app: api-server
    Rolling updates do not use the Eviction API
    Workload controllers such as Deployment and StatefulSet are not limited by PDBs during rolling updates; their pace is set by the update strategy in the workload spec. Pods that are unavailable during a rollout still count against the budget, however, so a tight PDB can block drains for the duration of a rollout. Ensure minAvailable < replicas (or equivalently, maxUnavailable >= 1) so eviction-based operations can always make progress.

    Cluster Autoscaler & PDBs

    The Cluster Autoscaler (CA) respects PDBs when deciding whether to scale down underutilized nodes. A node is safe to remove only if all pods on it can be evicted without violating their PDBs.

    CA scale-down with PDB:

    Node-3: underutilized (CPU < 50% for 10min)
    Pods on Node-3:
    ├─ api-server-pod-4 (PDB: minAvailable=2, currentHealthy=3) → eviction OK
    └─ db-pod-2 (PDB: minAvailable=2, currentHealthy=2) → eviction BLOCKED

    CA decision: Node-3 cannot be removed (db-pod-2 eviction blocked by PDB)
    CA marks Node-3 as "not safe for removal" → tries again later
    → If db-pod is stuck/unhealthy and PDB policy=IfHealthyBudget:
      CA is stuck permanently until db-pod recovers or PDB is relaxed
    # Check why CA isn't scaling down a node
    kubectl get nodes -o wide
    kubectl describe node <node> | grep -i "scale-down\|autoscaler"
    
    # Check CA logs
    kubectl logs -n kube-system -l app=cluster-autoscaler --tail=100 | grep -i "pdb\|blocked\|not safe"
    
    # List nodes that are not opted out of scale-down via annotation
    kubectl get nodes -o json | \
      jq '.items[] | select(.metadata.annotations["cluster-autoscaler.kubernetes.io/scale-down-disabled"] != "true") | .metadata.name'
    Stuck nodes due to PDB + unhealthy pods
    If a pod is in CrashLoopBackOff and its PDB uses IfHealthyBudget with all healthy copies at minimum, CA cannot remove the node hosting that broken pod. The broken pod prevents CA from evicting it (budget depleted), but the pod isn't self-healing. Resolution: fix the pod, use unhealthyPodEvictionPolicy: AlwaysAllow, or temporarily annotate the node with cluster-autoscaler.kubernetes.io/scale-down-disabled: "true" and handle it manually.
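The per-node check described above can be approximated as follows. This is a simplified sketch with invented names. It assumes each pod on the node is covered by its own PDB and ignores the other conditions Cluster Autoscaler evaluates before removing a node.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PodOnNode:
    name: str
    current_healthy: int  # status.currentHealthy of the PDB covering the pod
    desired_healthy: int  # status.desiredHealthy of the PDB covering the pod


def scale_down_blockers(pods: List[PodOnNode]) -> List[str]:
    """Return the pods whose PDB currently allows no disruption."""
    return [
        pod.name
        for pod in pods
        if max(pod.current_healthy - pod.desired_healthy, 0) < 1
    ]
```

An empty result means the PDBs do not stand in the way of removing the node. Any name in the list is a pod to investigate before expecting scale-down to proceed.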

    Zero-Disruption: minAvailable: 100%

    Setting minAvailable: "100%" means no pod can ever be voluntarily disrupted — all pods must remain healthy at all times. This is rarely the right choice but is appropriate for:

    • Leader-elected singletons that cannot tolerate even brief downtime
    • During critical business periods (e.g., no deploys/drains during Black Friday)
    • Workloads with very long startup times where replacement would take too long
    spec:
      minAvailable: "100%"    # Zero disruptions allowed
      selector:
        matchLabels:
          app: payment-processor
    100% minAvailable blocks all node maintenance
    With minAvailable: "100%", you cannot drain any node hosting a pod of this workload unless you first move the pod off the node by other means: cordon the node and run a rolling restart so the replacement schedules elsewhere, or manually delete the pod (bypassing the PDB). Scaling up does not help, because 100% of a larger replica count is still every pod. Cluster upgrades, node replacements, and autoscaler activity will all be blocked. Only use this for true zero-tolerance workloads with a documented maintenance procedure.

    Multi-Workload PDB Selectors

    A single PDB can cover multiple Deployments/StatefulSets if they share a common label and live in the PDB's namespace (a PDB never selects pods in other namespaces). This is useful for cross-workload availability guarantees (e.g., keep at least 3 cache nodes across multiple cache Deployments).

    # PDB covering two Deployments sharing the tier=cache label
    apiVersion: policy/v1
    kind: PodDisruptionBudget
    metadata:
      name: cache-tier-pdb
      namespace: production
    spec:
      minAvailable: 3
      selector:
        matchLabels:
          tier: cache        # Matches pods from both redis-primary and redis-replica Deployments
    Broad selectors reduce disruption allowance
    If a PDB selects 10 pods across multiple workloads with minAvailable: 8, only 2 disruptions are allowed across all 10 pods, even if the disruptions hit different workloads. This can be unexpectedly restrictive. Prefer per-workload PDBs for independent availability guarantees, and make sure no pod is selected by more than one PDB: the Eviction API returns 500 for such pods, which stalls drains.
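Overlapping selectors are easy to introduce when a tier-wide PDB sits next to per-workload PDBs. The sketch below models only `matchLabels` (not `matchExpressions`) and assumes all objects are in one namespace. The function names and the `redis-primary-pdb` object are hypothetical.

```python
from typing import Dict, List

Labels = Dict[str, str]


def selects(match_labels: Labels, pod_labels: Labels) -> bool:
    """True if every key/value in matchLabels is present on the pod."""
    return all(pod_labels.get(key) == value for key, value in match_labels.items())


def pods_with_multiple_pdbs(pdbs: Dict[str, Labels], pods: Dict[str, Labels]) -> Dict[str, List[str]]:
    """Map each pod selected by more than one PDB to the names of those PDBs."""
    overlaps = {}
    for pod_name, pod_labels in pods.items():
        matching = sorted(name for name, selector in pdbs.items() if selects(selector, pod_labels))
        if len(matching) > 1:
            overlaps[pod_name] = matching
    return overlaps
```

Running this against the labels of a namespace before adding a broad PDB shows which pods would end up covered twice.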

    PDB + HPA Interaction

    When HPA is active, the replica count fluctuates. A PDB with an absolute minAvailable value may become overly or insufficiently restrictive:

    # With HPA: prefer percentage-based PDB to track replica changes
    spec:
      minAvailable: "66%"    # Always keep 2/3 of current replicas healthy
                             # With replicas=3: min=2; with replicas=9: min=6
      selector:
        matchLabels:
          app: api-server
    
    # Avoid absolute minAvailable with HPA if replicas can scale down to minAvailable or below:
    # minAvailable: 5 with HPA minReplicas=3 → disruptionsAllowed stays 0 at 5 or fewer replicas → evictions blocked
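One way to check a PDB against an HPA range is to evaluate the budget at every replica count the HPA can choose. A minimal sketch, assuming all pods are healthy and using hypothetical function names:

```python
from typing import List, Union


def min_healthy(min_available: Union[int, str], replicas: int) -> int:
    """Resolve minAvailable to a pod count; percentages round up."""
    if isinstance(min_available, str):
        percent = int(min_available.rstrip("%"))
        return -(-percent * replicas // 100)  # integer ceiling division
    return min_available


def blocked_replica_counts(min_available: Union[int, str], hpa_min: int, hpa_max: int) -> List[int]:
    """Replica counts in the HPA range where even a fully healthy workload allows no eviction."""
    return [
        replicas
        for replicas in range(hpa_min, hpa_max + 1)
        if replicas - min_healthy(min_available, replicas) < 1
    ]
```

For example, `blocked_replica_counts(5, 3, 8)` returns the replica counts 3, 4, and 5, while `blocked_replica_counts("66%", 3, 9)` returns an empty list.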

    Operational Commands

    # List all PDBs across namespaces
    kubectl get pdb -A
    
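    # Find PDBs that are currently blocking evictions
    # (a blocked eviction returns 429 to the caller and may not emit an event)
    kubectl get pdb -A -o json | \
      jq -r '.items[] | select(.status.disruptionsAllowed == 0) | "\(.metadata.namespace)/\(.metadata.name)"'
    kubectl get events -n <namespace> | grep -i "pdb\|disruption\|eviction"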
    
    # Watch PDB during drain
    watch -n2 "kubectl get pdb -n production && echo && kubectl get pods -n production -l app=api-server"
    
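    # Test if a specific pod can be evicted: send a dry-run Eviction
    # (kubectl delete --dry-run does not consult PDBs; requires `kubectl proxy` on port 8001)
    curl -X POST \
      "http://localhost:8001/api/v1/namespaces/<namespace>/pods/<pod>/eviction" \
      -H "Content-Type: application/json" \
      -d '{"apiVersion":"policy/v1","kind":"Eviction","metadata":{"name":"<pod>","namespace":"<namespace>"},"deleteOptions":{"dryRun":["All"]}}'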
    
    # Check disruptions allowed for all PDBs in a namespace
    kubectl get pdb -n production -o jsonpath=\
    '{range .items[*]}{.metadata.name}{" allowed:"}{.status.disruptionsAllowed}{"\n"}{end}'
    
    # Temporarily increase PDB disruption budget (e.g., during planned maintenance)
    kubectl patch pdb api-server-pdb -n production --type=merge \
      -p '{"spec":{"minAvailable":1}}'
    # Restore after maintenance:
    kubectl patch pdb api-server-pdb -n production --type=merge \
      -p '{"spec":{"minAvailable":2}}'
    
    # Delete a PDB to unblock stuck drain (emergency only)
    kubectl delete pdb api-server-pdb -n production
    # After drain completes:
    kubectl apply -f pdb.yaml

    Common Anti-patterns

    Anti-pattern | Problem | Fix
    maxUnavailable: 0 on multi-replica Deployment | Blocks all voluntary evictions: node drains, autoscaler scale-down, and VPA updates stall | Use minAvailable: N-1 or maxUnavailable: 1
    Absolute minAvailable ≥ HPA minReplicas | When HPA scales down to minReplicas, PDB is immediately at 0 disruptions and blocks all drains | Use percentage-based PDB or ensure absolute value < HPA minReplicas
    No PDB on StatefulSet with quorum | Node drain can remove multiple members simultaneously, breaking consensus | Add PDB with minAvailable: quorum_size
    PDB selector matching 0 pods | PDB has no effect; evictions proceed unchecked | Verify selector with kubectl get pods -l <selector>
    minAvailable: 100% without documented drain procedure | Node maintenance permanently blocked with no escape hatch | Document the emergency drain procedure; consider minAvailable: N-1 (a 99% value still rounds up to every pod below 100 replicas)

    Metrics

    Metric | Labels | Use
    kube_poddisruptionbudget_status_current_healthy | poddisruptionbudget, namespace | Currently healthy pods (vs desired)
    kube_poddisruptionbudget_status_desired_healthy | poddisruptionbudget, namespace | Minimum required healthy pods
    kube_poddisruptionbudget_status_pod_disruptions_allowed | poddisruptionbudget, namespace | Current eviction budget remaining (0 = blocked)
    kube_poddisruptionbudget_status_expected_pods | poddisruptionbudget, namespace | Total pods selected by this PDB
    kube_poddisruptionbudget_status_observed_generation | poddisruptionbudget, namespace | Reconciliation lag detection

    Alerting Rules

    groups:
      - name: pdb
        rules:
          # PDB at zero disruptions for extended period (drain may be stuck)
          - alert: PDBNoDisruptionsAllowed
            expr: kube_poddisruptionbudget_status_pod_disruptions_allowed == 0
            for: 15m
            labels:
              severity: warning
            annotations:
              summary: "PDB {{ $labels.namespace }}/{{ $labels.poddisruptionbudget }} has 0 disruptions allowed"
              description: "Node drains, VPA evictions, and CA scale-down are blocked. Check pod health."
    
          # PDB below desired healthy count (below minimum!)
          - alert: PDBBelowDesiredHealthy
            expr: |
              kube_poddisruptionbudget_status_current_healthy
              < kube_poddisruptionbudget_status_desired_healthy
            for: 5m
            labels:
              severity: critical
            annotations:
              summary: "PDB {{ $labels.namespace }}/{{ $labels.poddisruptionbudget }} currentHealthy below desiredHealthy"
              description: "Workload is already below its minimum availability threshold. All evictions blocked."
    
          # PDB selecting no pods (misconfigured selector)
          - alert: PDBSelectsNoPods
            expr: kube_poddisruptionbudget_status_expected_pods == 0
            for: 5m
            labels:
              severity: warning
            annotations:
              summary: "PDB {{ $labels.namespace }}/{{ $labels.poddisruptionbudget }} selects 0 pods — check selector"
    
          # Workload consistently at minimum replicas — PDB may be too tight
          - alert: PDBAlwaysAtMinimum
            expr: |
              kube_poddisruptionbudget_status_pod_disruptions_allowed == 0
              and kube_poddisruptionbudget_status_current_healthy
                == kube_poddisruptionbudget_status_desired_healthy
            for: 1h
            labels:
              severity: info
            annotations:
              summary: "PDB {{ $labels.poddisruptionbudget }} has been at exactly minimum healthy for 1h — consider increasing replicas"

    Runbooks

    Node Drain Stuck Due to PDB

    # 1. Identify which PDB is blocking
    kubectl get pdb -n <namespace> -o wide
    kubectl describe pdb <pdb-name> -n <namespace>
    
    # 2. Check current pod health
    kubectl get pods -n <namespace> -l <pdb-selector>
    
    # 3. If pods are not Ready — fix them first
    kubectl describe pods -n <namespace> -l <pdb-selector> | grep -A10 Events
    
    # 4. If pods are healthy but budget is 0 (replicas == minAvailable):
    #    Option A: Scale up replicas temporarily
    kubectl scale deployment <name> -n <namespace> --replicas=4
    
    #    Option B: Temporarily lower minAvailable
    kubectl patch pdb <pdb> -n <namespace> --type=merge -p '{"spec":{"minAvailable":1}}'
    kubectl drain <node> --ignore-daemonsets
    kubectl patch pdb <pdb> -n <namespace> --type=merge -p '{"spec":{"minAvailable":2}}'
    
    #    Option C: Emergency — delete PDB, drain, re-apply (production risk)
    kubectl delete pdb <pdb> -n <namespace>
    kubectl drain <node> --ignore-daemonsets
    kubectl apply -f pdb.yaml

    PDB Selecting No Pods (Misconfigured)

    # Verify selector
    kubectl get pdb <name> -n <namespace> -o jsonpath='{.spec.selector}'
    
    # Check pods with that selector
    kubectl get pods -n <namespace> -l <label-key>=<label-value>
    
    # Compare with Deployment selector
    kubectl get deployment <name> -n <namespace> -o jsonpath='{.spec.selector.matchLabels}'
    
    # Fix: update PDB selector to match Deployment labels
    kubectl patch pdb <name> -n <namespace> --type=merge \
      -p '{"spec":{"selector":{"matchLabels":{"app":"<correct-app>"}}}}'

    Cluster Autoscaler Stuck on Node with PDB

    # Check CA logs for PDB-related messages
    kubectl logs -n kube-system -l app=cluster-autoscaler | grep -i "pdb\|disruption\|not safe"
    
    # Check which pods on the node have PDBs
    NODE=<node-name>
    kubectl get pods --all-namespaces --field-selector spec.nodeName=$NODE \
      -o jsonpath='{range .items[*]}{.metadata.namespace}/{.metadata.name}{"\n"}{end}'
    
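    # List PDBs that currently allow no disruptions, then match them to the pods above
    kubectl get pdb -A -o json | \
      jq -r '.items[] | select(.status.disruptionsAllowed == 0) | "\(.metadata.namespace)/\(.metadata.name)"'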
    
    # If pod is unhealthy and blocking: use AlwaysAllow policy
    kubectl patch pdb <pdb> -n <namespace> --type=merge \
      -p '{"spec":{"unhealthyPodEvictionPolicy":"AlwaysAllow"}}'

    Rolling Update Deadlock

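    # Check rollout status (a stalled rollout is usually waiting on readiness or
    # capacity; the PDB does not gate the Deployment controller)
    kubectl rollout status deployment <name> -n <namespace>
    
    # Check PDB status (PDB objects often carry no app label, so list them all)
    kubectl get pdb -n <namespace>
    
    # If disruptionsAllowed=0 and a drain is stuck behind the rollout:
    # let the rollout finish, or temporarily increase allowed disruptions
    kubectl patch pdb <pdb> -n <namespace> --type=merge -p '{"spec":{"minAvailable":1}}'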
    
    # Wait for rollout to progress
    kubectl rollout status deployment <name> -n <namespace> --timeout=300s
    
    # Restore after rollout
    kubectl patch pdb <pdb> -n <namespace> --type=merge -p '{"spec":{"minAvailable":2}}'

    Best Practices

    1. Every production workload with ≥ 2 replicas should have a PDB — without one, a node drain can evict all pods simultaneously if they happen to land on the same node.
    2. Use minAvailable: N-1 (not maxUnavailable: 0) so drains can make progress: maxUnavailable: 0 sounds like "zero downtime" but blocks every eviction, so node drains and autoscaler scale-down stall. minAvailable: N-1 still keeps all but one pod healthy and allows progress.
    3. Use percentage-based PDBs when HPA manages replicas — absolute values can create a permanently-blocked budget if HPA scales down to the PDB's minimum. A percentage-based PDB scales with the current replica count.
    4. Set unhealthyPodEvictionPolicy: AlwaysAllow for workloads that can run with fewer instances — prevents broken pods from permanently blocking node maintenance. Pair with good alerting so you know when a pod is unhealthy.
    5. For quorum systems, set minAvailable to exactly the quorum size: for a 3-node etcd, minAvailable: 2; for a 5-node ZooKeeper, minAvailable: 3. Going below quorum makes the cluster unavailable for writes and raises the risk of data loss.
    6. Alert on disruptionsAllowed == 0 persisting for > 15 minutes — this means infrastructure operations are blocked. Either the workload is under-replicated or pods are unhealthy and need attention.
    7. Test PDB behavior before cluster upgrades: run kubectl drain <node> --dry-run=server on each node type before a maintenance window to discover which PDBs will block and plan accordingly.
    8. Document emergency drain procedures for minAvailable: 100% workloads — zero-disruption guarantees require a defined escape hatch. The procedure should be in a runbook, not in the head of a single engineer.