Persistent Volumes

A complete deep dive into PersistentVolumes and PersistentVolumeClaims — the binding algorithm, finalizers, protection, StatefulSet volumeClaimTemplates, orphaned PVC cleanup, PV migration, and every edge case that causes data loss or stuck pods in production.

What This Page Covers
  • PV and PVC object model — cluster-scoped PV vs namespace-scoped PVC
  • Full PV spec reference — all fields with annotations
  • Full PVC spec reference — storageClassName, selector, volumeMode, volumeName
  • Binding algorithm — 5-step match order; best-fit selection logic
  • PV phases — Available/Bound/Released/Failed with state machine diagram
  • PVC phases — Pending/Bound/Lost with causes for each
  • PVC protection finalizer — kubernetes.io/pvc-protection; blocks PVC deletion while pod uses it
  • PV protection finalizer — kubernetes.io/pv-protection; blocks PV deletion while bound
  • Reclaim policy mechanics — Retain (claimRef, manual cleanup), Delete (external-provisioner GC)
  • Changing reclaim policy — patch on existing PV; StorageClass default doesn't retroactively apply
  • StatefulSet volumeClaimTemplates — per-replica PVC naming (data-pod-0), ordered provisioning, PVC not deleted on scale-down
  • Orphaned PVCs — PVCs left behind after StatefulSet deletion; manual cleanup required
  • PVC deletion order — safe StatefulSet teardown sequence
  • Label selectors on PVC — matchLabels/matchExpressions for static PV targeting
  • Capacity over-provisioning trap — PVC binds to larger PV than requested
  • Cross-namespace PV reuse — why it's blocked; how to work around
  • PV nodeAffinity — required for local PVs; optional for topology constraints on CSI PVs
  • PVC resizing — patch workflow; FileSystemResizePending condition; driver support matrix
  • PV cloning — dataSource: PVC; same namespace, same StorageClass
  • PV from snapshot restore — dataSource: VolumeSnapshot
  • dataSourceRef: cross-namespace restore from a VolumeSnapshot via the namespace field (1.26+, alpha)
  • StorageClass defaulting — annotation behavior; changing default SC; multiple defaults behavior
  • Kubernetes 1.26+ retroactive default StorageClass assignment
  • PV migration — in-tree to CSI; volume.kubernetes.io/storage-provisioner annotation; migrated-to annotation
  • Detach/force-detach scenarios — node failure; volumeattachment stuck; CSI migration impact
  • 6 metrics + 4 alerting rules + 5 troubleshooting runbooks
  • 8 best practices

    Object Model

    PersistentVolumes (PVs) and PersistentVolumeClaims (PVCs) implement a two-level indirection: administrators provision storage capacity (PVs), and developers request storage (PVCs). Kubernetes binds them together.

    CLUSTER SCOPE                         NAMESPACE SCOPE
    ─────────────────────────────────     ─────────────────────────────────
    PersistentVolume (PV)                 PersistentVolumeClaim (PVC)
      name: pv-prod-db-001                  name: data
      capacity: 100Gi                       namespace: production
      accessModes: [RWO]                    requests.storage: 100Gi
      storageClassName: gp3                 accessModes: [RWO]
      reclaimPolicy: Retain                 storageClassName: gp3
      status.phase: Bound          ◄────────status.phase: Bound
      spec.claimRef:                        spec.volumeName: pv-prod-db-001
        namespace: production
        name: data
        uid: abc-123
    
                      ↕ (1:1 binding)
    
    StorageClass (cluster-scoped)
      name: gp3
      provisioner: ebs.csi.aws.com
      volumeBindingMode: WaitForFirstConsumer
      reclaimPolicy: Delete
    

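    Key ownership rules:

      • A PV is cluster-scoped; a PVC lives in a namespace.
      • Binding is 1:1. The PV records the claim in spec.claimRef (namespace, name, uid), and the PVC records the volume in spec.volumeName.
      • The StorageClass is cluster-scoped and supplies the provisioner, binding mode, and reclaim policy for dynamically provisioned PVs.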

    PersistentVolume Spec — Full Reference

    apiVersion: v1
    kind: PersistentVolume
    metadata:
      name: pv-prod-db-001
      labels:
        type: ssd                   # used by PVC label selectors
        env: production
      annotations:
        pv.kubernetes.io/provisioned-by: ebs.csi.aws.com   # set by dynamic provisioner
        # (volume.kubernetes.io/storage-provisioner is a PVC annotation, not a PV one)
    spec:
      # ── Storage capacity ──────────────────────────────────────────
      capacity:
        storage: 100Gi              # reported capacity; CSI driver may report actual
    
      # ── Access modes ──────────────────────────────────────────────
      accessModes:
        - ReadWriteOnce             # RWO | ROX | RWX | RWOP (GA 1.29)
    
      # ── Volume mode ───────────────────────────────────────────────
      volumeMode: Filesystem        # Filesystem (default) | Block
    
      # ── Reclaim policy ────────────────────────────────────────────
      persistentVolumeReclaimPolicy: Retain  # Retain | Delete | Recycle(deprecated)
    
      # ── StorageClass association ───────────────────────────────────
      storageClassName: gp3         # must match PVC storageClassName to bind
                                    # empty string = no class; not same as unset
    
      # ── Mount options (passed to mount command) ───────────────────
      mountOptions:
        - noatime
        - discard
    
      # ── CSI source ────────────────────────────────────────────────
      csi:
        driver: ebs.csi.aws.com
        volumeHandle: vol-0abc123def456789   # unique ID on storage backend
        fsType: ext4
        readOnly: false
        volumeAttributes:
          throughput: "250"
          iops: "3000"
    
      # ── Topology constraint (CSI / local volumes) ─────────────────
      nodeAffinity:
        required:
          nodeSelectorTerms:
            - matchExpressions:
                - key: topology.ebs.csi.aws.com/zone
                  operator: In
                  values: [us-east-1a]
    
      # ── Claim reference (set by the PV controller on bind; set manually only to pre-bind or recover) ──
      claimRef:
        apiVersion: v1
        kind: PersistentVolumeClaim
        name: data
        namespace: production
        uid: abc-def-123            # UID prevents rebind to new PVC with same name
    ℹ️
    storageClassName: "" vs omitted
    An empty string (storageClassName: "") explicitly means "no StorageClass": the PV will only bind to PVCs that also have storageClassName: "". On a PVC, an omitted storageClassName field causes the default StorageClass to be filled in by the admission controller. These are different values with different binding behavior.

    PersistentVolumeClaim Spec — Full Reference

    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: data
      namespace: production
      annotations:
        volume.kubernetes.io/storage-provisioner: ebs.csi.aws.com  # set by the PV controller; volume.beta.* is the deprecated form
    spec:
      # ── Storage request ───────────────────────────────────────────
      resources:
        requests:
          storage: 100Gi           # minimum; may bind to larger PV (static binding)
    
      # ── Access modes ──────────────────────────────────────────────
      accessModes:
        - ReadWriteOnce
    
      # ── Volume mode ───────────────────────────────────────────────
      volumeMode: Filesystem       # must match PV volumeMode
    
      # ── StorageClass ──────────────────────────────────────────────
      storageClassName: gp3        # "" = static binding with no class
                                    # omit = default StorageClass
    
      # ── Static binding: target specific PV ───────────────────────
      volumeName: pv-prod-db-001   # skip binding algorithm; pin to this PV
    
      # ── Static binding: label selector ───────────────────────────
      selector:
        matchLabels:
          type: ssd
          env: production
        matchExpressions:
          - key: tier
            operator: In
            values: [fast, ultra-fast]
    
      # ── Data population sources ───────────────────────────────────
      dataSource:                  # clone from existing PVC or restore from snapshot
        kind: PersistentVolumeClaim
        name: source-pvc           # must be in same namespace, same StorageClass
    
      # dataSourceRef also accepts a namespace field for cross-namespace sources
      # (alpha in 1.26; needs the CrossNamespaceVolumeDataSource feature gate)
      # dataSourceRef:
      #   apiGroup: snapshot.storage.k8s.io
      #   kind: VolumeSnapshot
      #   name: snap-cross-ns
      #   namespace: backup-ns

    Binding Algorithm

    The PersistentVolume controller (inside kube-controller-manager) runs a continuous reconciliation loop. For each unbound PVC, it searches for the best-fit PV:

    1. storageClassName match — PV's storageClassName must equal the PVC's. Both empty strings match each other.
    2. accessModes check — The PV must support all modes the PVC requests (PV's set ⊇ PVC's set). A PV with RWO+ROX satisfies a PVC requesting only RWO.
    3. volumeMode check — Must match exactly (both Filesystem or both Block).
    4. capacity check — PV capacity ≥ PVC requested storage.
    5. label selector check — If the PVC has a selector, PV labels must satisfy it.

    Among all matching PVs, the controller selects the smallest capacity PV that satisfies the request (best-fit). This minimizes wasted capacity. If the PVC has volumeName, the algorithm is bypassed and that specific PV is targeted directly.
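    The five checks and the best-fit rule above can be sketched in a few lines of Python. This is an illustrative model under stated assumptions, not the controller's code: the PV and PVC classes, matches, and find_best_fit are hypothetical names, capacities are plain byte counts, and only matchLabels is modeled (matchExpressions is left out).

```python
from dataclasses import dataclass, field
from typing import Optional

GI = 1024 ** 3  # bytes per GiB


@dataclass
class PV:
    name: str
    capacity: int                      # bytes
    access_modes: frozenset            # e.g. frozenset({"RWO", "ROX"})
    storage_class: str = ""
    volume_mode: str = "Filesystem"
    labels: dict = field(default_factory=dict)


@dataclass
class PVC:
    request: int                       # bytes
    access_modes: frozenset
    storage_class: str = ""
    volume_mode: str = "Filesystem"
    match_labels: dict = field(default_factory=dict)  # matchExpressions not modeled


def matches(pv: PV, pvc: PVC) -> bool:
    """Apply the five checks in the order listed above."""
    return (
        pv.storage_class == pvc.storage_class            # 1. storageClassName
        and pvc.access_modes <= pv.access_modes          # 2. PV modes are a superset
        and pv.volume_mode == pvc.volume_mode            # 3. volumeMode
        and pv.capacity >= pvc.request                   # 4. capacity
        and all(pv.labels.get(k) == v                    # 5. label selector
                for k, v in pvc.match_labels.items())
    )


def find_best_fit(pvs: list, pvc: PVC) -> Optional[PV]:
    """Return the smallest PV that satisfies the claim, or None."""
    candidates = [pv for pv in pvs if matches(pv, pvc)]
    return min(candidates, key=lambda pv: pv.capacity, default=None)


pvs = [
    PV("big", 500 * GI, frozenset({"RWO"}), "gp3"),
    PV("mid", 100 * GI, frozenset({"RWO", "ROX"}), "gp3"),
    PV("small", 10 * GI, frozenset({"RWO"}), "gp3"),
]
claim = PVC(20 * GI, frozenset({"RWO"}), "gp3")
print(find_best_fit(pvs, claim).name)
```

    With these sample volumes the 20Gi claim lands on the 100Gi PV: the 10Gi PV fails the capacity check and the 500Gi PV loses the best-fit comparison.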

    ⚠️
    Capacity overshoot with static PVs
    If the smallest matching PV is 500Gi but the PVC requests 20Gi, the PVC binds to the 500Gi PV and the pod sees 500Gi, not 20Gi. You cannot shrink it afterwards. Dynamic provisioning (via StorageClass) provisions a volume sized to the request, so this only affects static binding.

    WaitForFirstConsumer and Topology-Aware Binding

    When the StorageClass uses volumeBindingMode: WaitForFirstConsumer, binding is deferred until a pod using the PVC is scheduled. The scheduler selects a node first; the provisioner then creates a PV (or the PV controller selects an existing one) that is usable from that node's topology, for example its zone.

    WaitForFirstConsumer binding sequence:
    
    1. PVC created → stays in Pending (no PV yet)
    2. Pod created referencing PVC
    3. Scheduler selects node N in zone us-east-1a
    4. Scheduler annotates PVC:
       volume.kubernetes.io/selected-node: node-N
    5. PVC bind controller sees the annotation
    6. Dynamic provisioner creates PV in us-east-1a
    7. Bind controller binds PVC → PV
    8. Kubelet on node-N mounts the volume
    9. Pod starts
    
    Without WaitForFirstConsumer (Immediate):
    1. PVC created → provisioner immediately creates PV (zone may be random)
    2. Pod scheduler must find node in same zone as PV
    3. If no schedulable node exists in that zone → Pod stays Pending until one does
    

    PV and PVC Phase State Machines

    PV Phases

    Phase     | Meaning                                                                  | Transitions To
    ----------|--------------------------------------------------------------------------|---------------
    Available | PV exists, not bound to any PVC                                          | Bound (when matching PVC found)
    Bound     | PV is bound 1:1 to a PVC                                                 | Released (when PVC is deleted)
    Released  | PVC deleted; PV retains data; claimRef still set                         | Available (admin removes claimRef), Failed (reclamation error), Deleted (Delete policy)
    Failed    | Automatic reclamation failed (e.g., cloud volume delete API error)       | Requires manual intervention
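
    The PV phase table can be read as a small state machine. The sketch below encodes it as a lookup table; PV_TRANSITIONS, can_transition, and phase_after_pvc_delete are hypothetical names, and "Deleted" stands for the PV object being removed rather than a real status.phase value.

```python
# Allowed PV phase transitions, taken from the table above.
# "Deleted" is not a real phase; it marks removal of the PV object.
PV_TRANSITIONS = {
    "Available": {"Bound"},
    "Bound": {"Released"},
    "Released": {"Available", "Failed", "Deleted"},
    "Failed": set(),  # requires manual intervention
}


def can_transition(current: str, target: str) -> bool:
    """True if the table allows moving from current to target."""
    return target in PV_TRANSITIONS.get(current, set())


def phase_after_pvc_delete(reclaim_policy: str, reclaim_succeeded: bool = True) -> str:
    """Outcome for a Bound PV once its PVC is deleted."""
    if reclaim_policy == "Retain":
        return "Released"                      # stays Released until an admin acts
    if reclaim_policy == "Delete":
        return "Deleted" if reclaim_succeeded else "Failed"
    raise ValueError(f"unsupported reclaim policy: {reclaim_policy}")


print(can_transition("Bound", "Released"))
print(phase_after_pvc_delete("Delete", reclaim_succeeded=False))
```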

    PVC Phases

    Phase   | Meaning                                      | Common Causes
    --------|----------------------------------------------|--------------
    Pending | PVC created but not yet bound to a PV        | No matching PV, provisioner error, WaitForFirstConsumer waiting for pod, quota exceeded
    Bound   | Bound to a PV; ready for pods to use         | (normal state)
    Lost    | Bound PV has been deleted or is unavailable  | Admin deleted PV manually while PVC was bound; node failure with local PV
    🔴
    PVC in Lost phase
    A pod referencing a Lost PVC will be stuck in ContainerCreating. The data may still exist on the storage backend; only the PV object was deleted. To recover: create a new PV with the same name as the PVC's spec.volumeName and the same volumeHandle pointing to the existing cloud volume, then set claimRef on the new PV to point to the lost PVC. The PVC will re-bind.

    Finalizers and Deletion Protection

    Kubernetes uses finalizers to prevent accidental deletion of PVs and PVCs while they are in use.

    PVC Protection Finalizer

    When a PVC is created, the admission controller adds kubernetes.io/pvc-protection as a finalizer. The PVCProtection controller in kube-controller-manager removes this finalizer only when no pod is actively using the PVC. If you run kubectl delete pvc <name> while a pod has it mounted:

    1. PVC moves to Terminating (deletion timestamp set)
    2. Finalizer prevents actual deletion
    3. Pod continues running with the volume
    4. When the pod is deleted, the PVC protection controller removes the finalizer
    5. PVC is deleted; PV moves to Released
    # Check finalizers on a PVC
    kubectl get pvc data -o jsonpath='{.metadata.finalizers}'
    # ["kubernetes.io/pvc-protection"]
    
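    # PVC stuck in Terminating? Check if any pod still references it
    # (prints namespace/name; any() avoids duplicate lines for multi-volume pods)
    kubectl get pods --all-namespaces -o json | \
      jq -r '.items[] | select(any(.spec.volumes[]?; .persistentVolumeClaim.claimName=="data")) | "\(.metadata.namespace)/\(.metadata.name)"'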
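
    The five-step deletion sequence above can be modeled as a toy function. This is a sketch of the finalizer semantics only, assuming a PVC is represented as a plain dict; request_delete and its arguments are hypothetical names, not a Kubernetes API.

```python
PVC_PROTECTION = "kubernetes.io/pvc-protection"


def request_delete(pvc: dict, pods_using_pvc: list) -> str:
    """Model `kubectl delete pvc`: the object only disappears once
    no finalizer is left on it."""
    # Steps 1-2: the deletion timestamp is set, the finalizer blocks removal.
    pvc["deletionTimestamp"] = pvc.get("deletionTimestamp") or "now"
    # Step 4: the protection finalizer is removed only when no pod uses the PVC.
    if not pods_using_pvc:
        pvc["finalizers"] = [f for f in pvc["finalizers"] if f != PVC_PROTECTION]
    # Step 5: with no finalizers left, the PVC is actually deleted.
    return "Terminating" if pvc["finalizers"] else "Deleted"


pvc = {"name": "data", "finalizers": [PVC_PROTECTION]}
print(request_delete(pvc, ["postgres-0"]))  # pod still mounts the volume
print(request_delete(pvc, []))              # pod is gone
```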

    PV Protection Finalizer

    Similarly, PVs have a kubernetes.io/pv-protection finalizer that prevents deletion while the PV is bound to a PVC. Deleting a bound PV moves it to Terminating; deletion completes only after the PVC bound to it has been deleted.

    # Force-remove a stuck finalizer (DANGEROUS — only if you are sure data is not needed)
    kubectl patch pv <name> -p '{"metadata":{"finalizers":null}}'
    🔴
    Never force-remove finalizers unless data is expendable
    Removing the pvc-protection finalizer allows immediate PVC deletion even while a pod is using it: the volume unmounts from under a running container. This will corrupt any database or stateful application using the volume. Only do this for test environments or when data loss is explicitly acceptable.

    Reclaim Policy Mechanics

    Retain

    When the PVC is deleted, the PV moves to Released. The underlying storage (cloud volume, NFS directory, etc.) is preserved. The PV's claimRef still points to the deleted PVC — preventing automatic rebinding. An administrator must manually intervene:

    # Option 1: Delete the PV and recreate it (cleanest)
    kubectl delete pv <name>
    # Manually create a new PV pointing to the same storage backend volumeHandle
    
    # Option 2: Remove claimRef to make the PV available again
    kubectl patch pv <name> --type=json \
      -p '[{"op":"remove","path":"/spec/claimRef"}]'
    # PV moves back to Available; can be claimed by a new PVC
    ⚠️
    Data from previous tenant
    If you remove the claimRef and allow a new PVC to bind, the new consumer gets a volume containing the previous tenant's data. For shared-namespace clusters, this is a data leak. Always verify the volume is clean before reuse, or use snapshots to make a clean clone.

    Delete

    When the PVC is deleted, the external-provisioner sidecar calls the CSI DeleteVolume RPC. The cloud volume is deleted immediately. The PV object is also deleted. This is the default for dynamically-provisioned cloud StorageClasses.

    Changing Reclaim Policy on an Existing PV

    The StorageClass's reclaimPolicy only applies at provisioning time. Dynamically created PVs inherit the StorageClass policy, but you can change it on individual PVs after creation:

    # Change a dynamically-provisioned PV from Delete to Retain
    kubectl patch pv <pv-name> \
      -p '{"spec":{"persistentVolumeReclaimPolicy":"Retain"}}'
    
    # Verify
    kubectl get pv <pv-name> -o jsonpath='{.spec.persistentVolumeReclaimPolicy}'
    💡
    Change to Retain before deleting a StatefulSet
    Before running kubectl delete statefulset or helm uninstall, patch all PVs used by the StatefulSet to Retain. This ensures the cloud volumes survive even if the PVCs are deleted. You can then inspect and manually clean up, or rebind to a new StatefulSet.

    StatefulSet volumeClaimTemplates

    StatefulSets are the primary consumer of PVCs in production. They use volumeClaimTemplates to automatically provision a dedicated PVC for each replica. The PVC names follow a deterministic pattern: <template-name>-<statefulset-name>-<ordinal>.

    apiVersion: apps/v1
    kind: StatefulSet
    metadata:
      name: postgres
      namespace: production
    spec:
      serviceName: postgres
      replicas: 3
      selector:
        matchLabels:
          app: postgres
      template:
        metadata:
          labels:
            app: postgres
        spec:
          containers:
          - name: postgres
            image: postgres:16
            volumeMounts:
            - name: data               # matches volumeClaimTemplates[0].metadata.name
              mountPath: /var/lib/postgresql/data
            - name: wal                # matches volumeClaimTemplates[1].metadata.name
              mountPath: /var/lib/postgresql/wal
      volumeClaimTemplates:
      - metadata:
          name: data
        spec:
          accessModes: [ReadWriteOnce]
          storageClassName: gp3-encrypted
          resources:
            requests:
              storage: 100Gi
      - metadata:
          name: wal
        spec:
          accessModes: [ReadWriteOnce]
          storageClassName: gp3-encrypted
          resources:
            requests:
              storage: 20Gi

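    This creates PVCs with names:

      • data-postgres-0, data-postgres-1, data-postgres-2
      • wal-postgres-0, wal-postgres-1, wal-postgres-2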
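
    The naming pattern <template-name>-<statefulset-name>-<ordinal> is simple enough to compute ahead of time, which helps when scripting cleanup or pre-creating PVCs. A minimal sketch, where statefulset_pvc_names is a hypothetical helper:

```python
def statefulset_pvc_names(statefulset: str, templates: list, replicas: int) -> list:
    """PVC names a StatefulSet creates: one per template per ordinal."""
    return [
        f"{template}-{statefulset}-{ordinal}"
        for ordinal in range(replicas)
        for template in templates
    ]


for name in statefulset_pvc_names("postgres", ["data", "wal"], 3):
    print(name)
```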

    Stable PVC Identity

    When a StatefulSet pod is deleted and rescheduled (on the same or different node), it reattaches to the same PVC with the same name. postgres-0 always uses data-postgres-0. This is the key property that makes StatefulSets suitable for databases — pod identity is coupled to storage identity.

    Orphaned PVCs — The StatefulSet Deletion Trap

    When you delete a StatefulSet, Kubernetes does not delete the PVCs created by volumeClaimTemplates. They are orphaned: no owner reference, no automatic cleanup. This is intentional (it prevents accidental data loss), but it means the PVCs and the cloud volumes behind them keep existing, and keep costing money, until you remove them manually.

    # Quick check: PVCs that are not Bound
    # (orphaned PVCs usually stay Bound, so this alone misses them)
    kubectl get pvc -n <ns> | grep -v Bound
    
    # More precise: find PVCs with no active pod consumer
    pods_json=$(kubectl get pods -n production -o json)   # fetch once, reuse per PVC
    for pvc in $(kubectl get pvc -n production -o name); do
      name=${pvc#*/}                                       # strip "persistentvolumeclaim/"
      pods=$(echo "$pods_json" | jq -r --arg PVC "$name" \
        '.items[] | select(any(.spec.volumes[]?; .persistentVolumeClaim.claimName==$PVC)) | .metadata.name')
      if [ -z "$pods" ]; then
        echo "ORPHANED: $name"
      fi
    done
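
    The same orphan check can be done offline on saved JSON, which avoids one jq pass per PVC. The sketch below assumes pvc_list and pod_list are the parsed output of kubectl get pvc -o json and kubectl get pods -o json for a single namespace; claimed_pvc_names and orphaned_pvcs are hypothetical helper names.

```python
def claimed_pvc_names(pod_list: dict) -> set:
    """Names of all PVCs referenced by any pod volume."""
    names = set()
    for pod in pod_list.get("items", []):
        for volume in pod.get("spec", {}).get("volumes") or []:
            claim = volume.get("persistentVolumeClaim")
            if claim:
                names.add(claim["claimName"])
    return names


def orphaned_pvcs(pvc_list: dict, pod_list: dict) -> list:
    """PVCs in the namespace that no pod references, sorted by name."""
    used = claimed_pvc_names(pod_list)
    return sorted(
        pvc["metadata"]["name"]
        for pvc in pvc_list.get("items", [])
        if pvc["metadata"]["name"] not in used
    )


pvc_list = {"items": [{"metadata": {"name": n}} for n in
                      ("data-postgres-0", "data-postgres-1", "data-postgres-2")]}
pod_list = {"items": [{
    "metadata": {"name": "postgres-0"},
    "spec": {"volumes": [
        {"name": "data", "persistentVolumeClaim": {"claimName": "data-postgres-0"}},
        {"name": "tmp", "emptyDir": {}},
    ]},
}]}
print(orphaned_pvcs(pvc_list, pod_list))
```

    A PVC reported here is only unreferenced at the moment the JSON was captured; a scaled-down StatefulSet may still want it back, so treat the output as a review list rather than a delete list.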

    Scale-Down Behavior

    Scaling a StatefulSet from 3 to 1 deletes postgres-2 and postgres-1 pods, but leaves data-postgres-2 and data-postgres-1 PVCs intact. Scaling back up to 3 reattaches to the existing PVCs. This is correct for databases where data must survive scale-down.

    If you explicitly want PVCs deleted on scale-down (e.g., ephemeral read replicas), use the whenDeleted and whenScaled fields in spec.persistentVolumeClaimRetentionPolicy (beta in 1.27, GA in 1.32):

    spec:
      persistentVolumeClaimRetentionPolicy:
        whenDeleted: Retain        # Retain | Delete — on StatefulSet delete
        whenScaled: Delete         # Retain | Delete — on scale-down
                                   # Delete removes PVCs as pods are scaled down
    ⚠️
    whenScaled: Delete is irreversible
    Setting whenScaled: Delete means scaling down from 3→1 will permanently delete data-postgres-2 and data-postgres-1 and, when the PVs use the Delete reclaim policy, their underlying cloud volumes. Only use this for stateless or easily re-seeded replicas (read replicas that can be resynced from primary).
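
    Before changing replicas on a StatefulSet with whenScaled: Delete, it is worth listing exactly which PVCs would go. A minimal sketch, assuming the naming pattern described earlier; pvcs_deleted_on_scale_down is a hypothetical helper, not part of any Kubernetes client.

```python
def pvcs_deleted_on_scale_down(statefulset: str, templates: list,
                               old_replicas: int, new_replicas: int,
                               when_scaled: str = "Retain") -> list:
    """PVCs removed by a scale-down, highest ordinal first.

    Returns an empty list for Retain or when the replica count does not drop.
    """
    if when_scaled != "Delete" or new_replicas >= old_replicas:
        return []
    return [
        f"{template}-{statefulset}-{ordinal}"
        for ordinal in range(old_replicas - 1, new_replicas - 1, -1)
        for template in templates
    ]


print(pvcs_deleted_on_scale_down("postgres", ["data"], 3, 1, "Delete"))
print(pvcs_deleted_on_scale_down("postgres", ["data"], 3, 1, "Retain"))
```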

    PVC Cloning

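    A PVC can be created as a clone of an existing PVC using dataSource. The CSI driver's CreateVolume call, given a data source, creates a new volume pre-populated with the source's data. Constraints:

      • The source PVC must be Bound and in the same namespace as the clone.
      • The clone uses the same StorageClass as the source.
      • The requested size must be at least the source's capacity.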

    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: postgres-data-clone
      namespace: production
    spec:
      dataSource:
        kind: PersistentVolumeClaim
        name: data-postgres-0        # source PVC (must be Bound, same namespace)
      accessModes: [ReadWriteOnce]
      storageClassName: gp3-encrypted
      resources:
        requests:
          storage: 100Gi             # must be ≥ source capacity
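
    The clone constraints noted in this section (same namespace, Bound source, same StorageClass, size at least the source capacity) can be checked before submitting the manifest. The sketch below uses simplified dicts with hypothetical keys (namespace, phase, storage_class, capacity_gi, request_gi) rather than real API objects.

```python
def clone_request_problems(source: dict, clone: dict) -> list:
    """Return human-readable reasons a clone request would be rejected."""
    problems = []
    if source["phase"] != "Bound":
        problems.append("source PVC is not Bound")
    if source["namespace"] != clone["namespace"]:
        problems.append("source and clone are in different namespaces")
    if source["storage_class"] != clone["storage_class"]:
        problems.append("StorageClass differs from the source")
    if clone["request_gi"] < source["capacity_gi"]:
        problems.append("requested size is smaller than the source capacity")
    return problems


source = {"namespace": "production", "phase": "Bound",
          "storage_class": "gp3-encrypted", "capacity_gi": 100}
clone = {"namespace": "production", "storage_class": "gp3-encrypted", "request_gi": 100}
print(clone_request_problems(source, clone))
```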

    Restore from Snapshot

    Restore a PVC from a VolumeSnapshot using dataSource with the snapshot as source. The snapshot must be in the same namespace:

    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: postgres-restored
      namespace: production
    spec:
      dataSource:
        apiGroup: snapshot.storage.k8s.io
        kind: VolumeSnapshot
        name: postgres-snap-2024-01-15    # VolumeSnapshot in same namespace
      accessModes: [ReadWriteOnce]
      storageClassName: gp3-encrypted
      resources:
        requests:
          storage: 100Gi

    dataSourceRef — Cross-Namespace Restore (1.26+)

    dataSourceRef extends the clone/restore model: with its namespace field, a PVC can reference a VolumeSnapshot in another namespace. This is alpha as of 1.26 and requires the CrossNamespaceVolumeDataSource feature gate plus a ReferenceGrant in the source namespace that permits the reference:

    spec:
      dataSourceRef:
        apiGroup: snapshot.storage.k8s.io
        kind: VolumeSnapshot
        name: shared-backup-snap        # VolumeSnapshot living in backup-ns
        namespace: backup-ns            # namespace of the VolumeSnapshot

    Default StorageClass

    A StorageClass can be designated as the default by adding the annotation storageclass.kubernetes.io/is-default-class: "true". When a PVC is created without specifying storageClassName, the default StorageClass is used.

    # Check which StorageClass is default (output abridged; the default is marked in the NAME column)
    kubectl get storageclass
    # NAME              PROVISIONER             RECLAIMPOLICY  VOLUMEBINDINGMODE
    # gp2               kubernetes.io/aws-ebs   Delete         Immediate
    # gp3 (default)     ebs.csi.aws.com         Delete         WaitForFirstConsumer
    
    # Change the default StorageClass
    kubectl patch storageclass gp2 -p '{"metadata":{"annotations":{"storageclass.kubernetes.io/is-default-class":"false"}}}'
    kubectl patch storageclass gp3 -p '{"metadata":{"annotations":{"storageclass.kubernetes.io/is-default-class":"true"}}}'
    ⚠️
    Multiple default StorageClasses
    Before Kubernetes 1.26, if multiple StorageClasses are marked default, PVC creation without an explicit storageClassName fails with an admission error. From 1.26 on, the most recently created default StorageClass is used instead. Either way, ensure exactly one default at a time when switching defaults.

    Retroactive Default Assignment (1.26+)

    In Kubernetes 1.26+, a PVC that was created without a storageClassName while no default StorageClass existed is updated retroactively: once a default StorageClass is set, it is assigned to the Pending PVC. A PVC with storageClassName explicitly set to "" is left alone. This allows fixing PVCs that are stuck Pending due to a missing default without recreating them.
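
    The defaulting rules in this section can be summarized in one function. This is a sketch under assumptions, not the admission plugin's code: StorageClasses are plain dicts with name, creationTimestamp (RFC 3339 UTC strings, which sort lexicographically), and annotations; effective_storage_class is a hypothetical name; and picking the newest of several defaults reflects 1.26+ behavior.

```python
DEFAULT_ANNOTATION = "storageclass.kubernetes.io/is-default-class"


def effective_storage_class(pvc_spec: dict, storage_classes: list):
    """Class a new PVC ends up with.

    Returns the explicit value if storageClassName is present (including ""),
    the newest default otherwise, or None when no default exists yet
    (the PVC then waits for retroactive assignment).
    """
    if "storageClassName" in pvc_spec:
        return pvc_spec["storageClassName"]   # "" means "no class", never defaulted
    defaults = [
        sc for sc in storage_classes
        if sc.get("annotations", {}).get(DEFAULT_ANNOTATION) == "true"
    ]
    if not defaults:
        return None
    return max(defaults, key=lambda sc: sc["creationTimestamp"])["name"]


classes = [
    {"name": "gp2", "creationTimestamp": "2022-01-01T00:00:00Z",
     "annotations": {DEFAULT_ANNOTATION: "true"}},
    {"name": "gp3", "creationTimestamp": "2024-01-01T00:00:00Z",
     "annotations": {DEFAULT_ANNOTATION: "true"}},
    {"name": "io2", "creationTimestamp": "2025-01-01T00:00:00Z"},
]
print(effective_storage_class({}, classes))
print(repr(effective_storage_class({"storageClassName": ""}, classes)))
```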

    In-Tree to CSI Migration

    Kubernetes has been migrating in-tree volume plugins to CSI drivers. Most major cloud providers completed migration by 1.27–1.28. When migration is active for a plugin, the kubelet transparently routes in-tree volume operations to the CSI driver.

    Migration Annotations

    # On a dynamically provisioned PV, these annotations track migration:
    annotations:
      pv.kubernetes.io/provisioned-by: kubernetes.io/aws-ebs    # original in-tree provisioner
      pv.kubernetes.io/migrated-to: ebs.csi.aws.com               # CSI driver taking over

    Migration Impact on Existing Clusters

    Scenario                                    | Behavior
    --------------------------------------------|---------
    Existing PVs created by in-tree plugin      | Transparently served by CSI driver after migration; no PV changes needed
    StorageClass with in-tree provisioner name  | Admission controller rewrites to CSI provisioner name (on clusters with migration enabled)
    In-tree plugin removed from Kubernetes      | Old PVs using that plugin type are invalid; must migrate before upgrading
    VolumeAttachment objects                    | New CSI-style VolumeAttachment created; old in-tree attachment may linger

    Detach and Force-Detach Scenarios

    A PV attached to a node is represented by a VolumeAttachment object. Under normal operation, kubelet unmounts volumes before pod deletion. Problems arise when a node becomes unreachable.

    Node Failure with Attached PV

    Node N fails (network partition / power loss)
        │
        ▼
    Node N's status becomes NotReady
        │
        ├─► After node-monitor-grace-period (default 40s):
        │   Node marked Unreachable
        │
        ├─► After pod-eviction-timeout (default 5min):
        │   Pods on N get deletion timestamp set
        │
        ├─► New pod scheduled on node M
        │   ↓
        │   Attach to node M is requested (CSI ControllerPublishVolume)
        │   ↓
        │   CSI driver tries to attach EBS volume to node M
        │   ↓
        │   BLOCKED: EBS volume still "attached" to node N in AWS
        │
        └─► Force-detach after max-wait:
            CSI driver calls ControllerUnpublishVolume
            (force-detach: works because EBS knows node N is gone)
            Volume attached to node M, pod starts
    

    Stuck VolumeAttachment

    # View VolumeAttachment objects
    kubectl get volumeattachment
    
    # Identify which node a PV is attached to
    kubectl get volumeattachment -o json | \
      jq -r '.items[] | select(.spec.source.persistentVolumeName=="pv-prod-db-001") |
      "\(.metadata.name) → node: \(.spec.nodeName) attached: \(.status.attached)"'
    
    # If node is truly gone and attachment is stuck, force delete the VolumeAttachment
    # This tells the CSI driver to release the attachment
    kubectl delete volumeattachment <name>
    
    # WARNING: Only do this if the node is confirmed gone or the volume is not mounted.
    # Force-deleting an attachment while a node is still mounting the volume
    # can cause data corruption if two nodes mount the volume simultaneously.

    PVC Resizing Deep Dive

    Volume expansion requires allowVolumeExpansion: true on the StorageClass. Expansion is a two-phase operation:

    1. Controller expand — CSI ControllerExpandVolume resizes the backing cloud storage (e.g., EBS volume goes from 100Gi → 200Gi). This happens while the pod may be running.
    2. Node expand — CSI NodeExpandVolume runs filesystem resize commands (resize2fs for ext4, xfs_growfs for xfs) on the node where the volume is mounted. This requires the volume to be mounted by a running pod.
    # Step 1: Edit the PVC
    kubectl patch pvc data-postgres-0 -n production \
      -p '{"spec":{"resources":{"requests":{"storage":"200Gi"}}}}'
    
    # Step 2: Watch status conditions
    kubectl get pvc data-postgres-0 -n production -w -o wide
    
    # Step 3: Check conditions if stuck
    kubectl describe pvc data-postgres-0 -n production | grep -A 4 Conditions:
    # Type                      Status
    # FileSystemResizePending   True
    #   → volume resized at cloud layer; waiting for pod to mount and run NodeExpandVolume
    
    # Step 4: Ensure the pod is running (not CrashLoopBackOff)
    # Kubelet runs NodeExpandVolume on the next mount cycle
    
    # Step 5: Verify from inside the pod
    kubectl exec -it postgres-0 -n production -- df -h /var/lib/postgresql/data

    Condition                        | Meaning
    ---------------------------------|--------
    FileSystemResizePending: True    | Cloud volume resized; filesystem resize pending (pod must be running and mounting the volume)
    Resizing: True                   | Resize in progress (controller expand still running)
    No conditions, capacity updated  | Resize complete
    ⚠️
    Online resize vs driver support
    Not all CSI drivers support online node expansion (while volume is mounted). Azure Disk and GCE PD support online expansion. Older EBS driver versions required the pod to be stopped first. Check your driver's release notes. If the driver doesn't support online node expansion, you must delete the pod (unmount), let the expansion complete, then redeploy.
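
    When automating a resize, the condition table above maps onto a small classifier. The sketch below works on a PVC parsed from kubectl get pvc -o json; resize_state and its return labels are hypothetical, and it compares quantity strings literally (for example "200Gi" against "200Gi") instead of parsing Kubernetes quantities.

```python
def resize_state(pvc: dict) -> str:
    """Classify where a PVC expansion currently stands."""
    requested = pvc["spec"]["resources"]["requests"]["storage"]
    status = pvc.get("status", {})
    actual = status.get("capacity", {}).get("storage")
    active = {
        cond["type"]
        for cond in status.get("conditions", [])
        if cond.get("status") == "True"
    }
    if "Resizing" in active:
        return "controller-expand-in-progress"
    if "FileSystemResizePending" in active:
        return "waiting-for-node-expand"      # needs a running pod mounting the volume
    # Assumption: literal string match; "0.2Ti" vs "200Gi" would need real parsing.
    return "complete" if actual == requested else "not-started"


pvc = {
    "spec": {"resources": {"requests": {"storage": "200Gi"}}},
    "status": {
        "capacity": {"storage": "100Gi"},
        "conditions": [{"type": "FileSystemResizePending", "status": "True"}],
    },
}
print(resize_state(pvc))
```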

    Metrics and Alerting

    Key Metrics

    Metric                                                                 | Source                  | Alert Threshold
    -----------------------------------------------------------------------|-------------------------|----------------
    kube_persistentvolumeclaim_status_phase                                | kube-state-metrics      | phase=Pending for >10m, phase=Lost for >1m
    kube_persistentvolume_status_phase                                     | kube-state-metrics      | phase=Failed
    kube_persistentvolumeclaim_resource_requests_storage_bytes             | kube-state-metrics      | Capacity planning: total PVC bytes per namespace
    kubelet_volume_stats_used_bytes / kubelet_volume_stats_capacity_bytes  | kubelet                 | >85% → warn; >95% → critical
    storage_operation_duration_seconds{operation_name="volume_attach"}     | kube-controller-manager | P99 > 60s
    attachdetach_controller_total_volumes                                  | kube-controller-manager | Unattached volumes growing over time
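
    The volume-usage thresholds in the table reduce to a ratio check. A minimal sketch, where usage_severity is a hypothetical helper and the two inputs correspond to kubelet_volume_stats_used_bytes and kubelet_volume_stats_capacity_bytes for one PVC:

```python
def usage_severity(used_bytes: int, capacity_bytes: int):
    """Map volume usage to the table's thresholds: >85% warn, >95% critical."""
    if capacity_bytes <= 0:
        raise ValueError("capacity_bytes must be positive")
    ratio = used_bytes / capacity_bytes
    if ratio > 0.95:
        return "critical"
    if ratio > 0.85:
        return "warning"
    return None


GI = 1024 ** 3
print(usage_severity(90 * GI, 100 * GI))
```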

    Alerting Rules

    groups:
    - name: persistent-volumes
      rules:
      - alert: PVCPendingTooLong
        expr: kube_persistentvolumeclaim_status_phase{phase="Pending"} == 1
        for: 10m
        labels: {severity: warning}
        annotations:
          summary: "PVC {{ $labels.namespace }}/{{ $labels.persistentvolumeclaim }} stuck Pending"
    
      - alert: PVCLost
        expr: kube_persistentvolumeclaim_status_phase{phase="Lost"} == 1
        for: 1m
        labels: {severity: critical}
        annotations:
          summary: "PVC {{ $labels.namespace }}/{{ $labels.persistentvolumeclaim }} is Lost — data may be inaccessible"
    
      - alert: PVFailed
        expr: kube_persistentvolume_status_phase{phase="Failed"} == 1
        for: 1m
        labels: {severity: critical}
        annotations:
          summary: "PV {{ $labels.persistentvolume }} in Failed state — reclamation error"
    
      - alert: OrphanedPVCsAccumulating
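        # "== 1" keeps only PVCs actually in the Bound phase; the inner count
        # de-duplicates PVCs per namespace; "or ... * 0" supplies a zero for
        # namespaces where no pod mounts any PVC, so they are not dropped
        expr: |
          count by (namespace) (kube_persistentvolumeclaim_status_phase{phase="Bound"} == 1)
          - (
              count by (namespace) (count by (namespace, persistentvolumeclaim) (kube_pod_spec_volumes_persistentvolumeclaims_info))
              or count by (namespace) (kube_persistentvolumeclaim_status_phase{phase="Bound"} == 1) * 0
            )
          > 5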
        for: 30m
        labels: {severity: warning}
        annotations:
          summary: "Namespace {{ $labels.namespace }} has many PVCs not mounted by any pod"

    Troubleshooting Runbooks

    Runbook: PVC Stuck in Pending — Provisioning Failure

    # 1. Describe PVC — check Events section
    kubectl describe pvc <name> -n <ns>
    # Common messages:
    # "no persistent volumes available" → no matching static PV
    # "waiting for a volume to be created, either by external provisioner..." → provisioner issue
    # "error setting quota" → ResourceQuota exceeded
    # "waiting for first consumer to be created before binding" → WaitForFirstConsumer, no pod yet
    
    # 2. Check CSI provisioner logs
    kubectl logs -n kube-system -l app=ebs-csi-controller -c csi-provisioner --tail=100 | grep -i error
    
    # 3. Check if ResourceQuota is blocking
    kubectl describe resourcequota -n <ns>
    
    # 4. Verify StorageClass exists and provisioner is running
    kubectl get storageclass <name>
    kubectl get pods -n kube-system | grep csi

    Runbook: PVC Stuck in Terminating

    # Check finalizers
    kubectl get pvc <name> -n <ns> -o jsonpath='{.metadata.finalizers}'
    # ["kubernetes.io/pvc-protection"]
    
    # Find which pod is using the PVC
    kubectl get pods -n <ns> -o json | jq -r \
      '.items[] | select(.spec.volumes[]?.persistentVolumeClaim.claimName=="<pvc-name>") | .metadata.name'
    
    # Delete the pod using the PVC first, then PVC will complete deletion
    kubectl delete pod <pod-name> -n <ns>
    
    # If pod is already gone but PVC still stuck (bug/finalizer stuck):
    kubectl patch pvc <name> -n <ns> -p '{"metadata":{"finalizers":null}}'

    Runbook: PVC in Lost Phase

    # The PV that was bound to this PVC no longer exists
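    kubectl get pvc <name> -n <ns> -o jsonpath='{.spec.volumeName}'   # → pv-prod-db-001
    
    # Check if PV exists
    kubectl get pv pv-prod-db-001   # → NotFound
    
    # Recovery: recreate the PV pointing to the same cloud volume
    # 1. Find the cloud volume ID from the deleted PV (check your monitoring / cloud console)
    # 2. Write a new PV manifest that reuses the old PV name (the PVC's spec.volumeName),
    #    sets csi.volumeHandle to that cloud volume ID, and sets claimRef to this PVC
    # 3. Apply it (the file name is a placeholder):
    kubectl apply -f <recovered-pv.yaml>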
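
    The replacement PV for a Lost PVC can be derived from the PVC itself plus the cloud volume ID. The sketch below is one way to build that manifest as a Python dict; replacement_pv is a hypothetical helper, the input is a PVC parsed from kubectl get pvc -o json, the driver and volume handle are values you must supply, and using the PVC's requested size as the PV capacity is an assumption.

```python
def replacement_pv(pvc: dict, volume_handle: str, driver: str,
                   fs_type: str = "ext4") -> dict:
    """Build a PV manifest that a Lost PVC can re-bind to."""
    spec = pvc["spec"]
    meta = pvc["metadata"]
    claim_ref = {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "namespace": meta["namespace"],
        "name": meta["name"],
    }
    if meta.get("uid"):
        claim_ref["uid"] = meta["uid"]        # pins the PV to this exact PVC
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolume",
        "metadata": {"name": spec["volumeName"]},   # must equal the PVC's volumeName
        "spec": {
            # Assumption: the old PV's capacity matched the PVC request.
            "capacity": {"storage": spec["resources"]["requests"]["storage"]},
            "accessModes": list(spec["accessModes"]),
            "volumeMode": spec.get("volumeMode", "Filesystem"),
            "storageClassName": spec.get("storageClassName", ""),
            "persistentVolumeReclaimPolicy": "Retain",   # keep data if this goes wrong
            "csi": {"driver": driver, "volumeHandle": volume_handle, "fsType": fs_type},
            "claimRef": claim_ref,
        },
    }


lost_pvc = {
    "metadata": {"name": "data", "namespace": "production", "uid": "abc-def-123"},
    "spec": {
        "volumeName": "pv-prod-db-001",
        "accessModes": ["ReadWriteOnce"],
        "storageClassName": "gp3",
        "resources": {"requests": {"storage": "100Gi"}},
    },
}
pv = replacement_pv(lost_pvc, "vol-0abc123def456789", "ebs.csi.aws.com")
print(pv["metadata"]["name"], pv["spec"]["claimRef"]["name"])
```

    Serialize the result with json.dumps and pipe it to kubectl apply -f -; kubectl accepts JSON manifests as well as YAML.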

    Runbook: Released PV Won't Rebind

    # PV is in Released state; new PVC won't bind even though sizes/classes match
    # Cause: claimRef still points to old PVC
    
    kubectl describe pv <name> | grep -A 5 "Claim:"
    # Claim: production/old-pvc   ← still points to deleted PVC
    
    # Remove claimRef to make PV Available
    kubectl patch pv <name> --type=json \
      -p '[{"op":"remove","path":"/spec/claimRef"}]'
    
    # PV is now Available; create new PVC to claim it

    Runbook: Orphaned PVCs After StatefulSet Deletion

    # List all PVCs not mounted by any pod in a namespace
    kubectl get pvc -n production --no-headers | awk '{print $1}' | while read pvc; do
      pods=$(kubectl get pods -n production -o json | \
        jq -r --arg PVC "$pvc" '.items[] | select(.spec.volumes[]?.persistentVolumeClaim.claimName==$PVC) | .metadata.name')
      [ -z "$pods" ] && echo "ORPHANED: $pvc"
    done
    
    # Safely delete orphaned PVCs (after confirming data is not needed)
    # First patch the bound PVs (the reclaim policy lives on the PV, not the PVC)
    # to Retain so the cloud volumes persist as a safety net
    for pv in $(kubectl get pvc -n production -o jsonpath='{.items[*].spec.volumeName}'); do
      kubectl patch pv "$pv" -p '{"spec":{"persistentVolumeReclaimPolicy":"Retain"}}'
    done
    # Then delete PVCs
    kubectl delete pvc -n production <orphaned-pvc-name>

    Best Practices

    1. Default StatefulSet PVs to Retain. Before any StatefulSet deletion or helm uninstall, patch all bound PVs to Retain. This is your last line of defence against accidental data loss.
    2. Use persistentVolumeClaimRetentionPolicy (beta in 1.27, GA in 1.32) in StatefulSets to explicitly declare whether PVCs should survive StatefulSet deletion and scale-down. Default is Retain for both, which is correct for databases but may cause cost accumulation for ephemeral replicas.
    3. Monitor for orphaned PVCs. Run the orphan-detection script or alert on PVCs not mounted by any pod for >48 hours. Orphaned PVCs = orphaned cloud volumes = unexpected charges.
    4. Never reuse a Released PV without inspecting data. Removing claimRef makes the PV bindable again with previous data intact. Verify the volume is sanitized or explicitly acceptable for the new consumer.
    5. Set volumeName for static binding, not just a label selector. Label selectors can match multiple PVs — volumeName is deterministic and prevents unexpected cross-binding in shared clusters.
    6. Use allowVolumeExpansion: true on all StorageClasses. It costs nothing. Enabling it retroactively on a StorageClass does not expand existing PVCs — it only allows future expansion requests.
    7. Take a PVC snapshot before any schema migration or data transformation. A snapshot (or a clone made from it) gives you a rollback path if the change goes wrong. See 05-volume-snapshots.html.
    8. Avoid manual PV creation in dynamic clusters. Static PVs require operator expertise to manage correctly (claimRef, reclaim, phase transitions). Use dynamic provisioning via StorageClass for all new workloads; reserve static binding for imports of existing cloud volumes.