Persistent Volumes

A deep dive into PersistentVolumes and PersistentVolumeClaims: the binding algorithm, finalizers and deletion protection, StatefulSet volumeClaimTemplates, orphaned PVC cleanup, PV migration, and the edge cases that most often cause data loss or stuck pods in production.


What This Page Covers

PV and PVC object model — cluster-scoped PV vs namespace-scoped PVC

Full PV spec reference — all fields with annotations

Full PVC spec reference — storageClassName, selector, volumeMode, volumeName

Binding algorithm — 5-step match order; best-fit selection logic

PV phases — Available/Bound/Released/Failed with state machine diagram

PVC phases — Pending/Bound/Lost with causes for each

PVC protection finalizer — kubernetes.io/pvc-protection; blocks PVC deletion while pod uses it

PV protection finalizer — kubernetes.io/pv-protection; blocks PV deletion while bound

Reclaim policy mechanics — Retain (claimRef, manual cleanup), Delete (external-provisioner GC)

Changing reclaim policy — patch on existing PV; StorageClass default doesn't retroactively apply

StatefulSet volumeClaimTemplates — per-replica PVC naming (data-pod-0), ordered provisioning, PVC not deleted on scale-down

Orphaned PVCs — PVCs left behind after StatefulSet deletion; manual cleanup required

PVC deletion order — safe StatefulSet teardown sequence

Label selectors on PVC — matchLabels/matchExpressions for static PV targeting

Capacity over-provisioning trap — PVC binds to larger PV than requested

Cross-namespace PV reuse — why it's blocked; how to work around

PV nodeAffinity — required for local PVs; optional for topology constraints on CSI PVs

PVC resizing — patch workflow; FileSystemResizePending condition; driver support matrix

PVC cloning — dataSource: PVC; same namespace, same CSI driver

PV from snapshot restore — dataSource: VolumeSnapshot

dataSourceRef — cross-namespace restore from a VolumeSnapshot (alpha in 1.26, feature-gated)

StorageClass defaulting — annotation behavior; changing default SC; multiple defaults behavior

Kubernetes 1.26+ retroactive default StorageClass assignment

PV migration — in-tree to CSI; volume.kubernetes.io/storage-provisioner annotation; migrated-to annotation

Detach/force-detach scenarios — node failure; volumeattachment stuck; CSI migration impact

6 metrics + 4 alerting rules + 5 troubleshooting runbooks

8 best practices

Object Model

PersistentVolumes (PVs) and PersistentVolumeClaims (PVCs) implement a two-level indirection: administrators provision storage capacity (PVs), and developers request storage (PVCs). Kubernetes binds them together.

CLUSTER SCOPE                         NAMESPACE SCOPE
─────────────────────────────────     ─────────────────────────────────
PersistentVolume (PV)                 PersistentVolumeClaim (PVC)
  name: pv-prod-db-001                  name: data
  capacity: 100Gi                       namespace: production
  accessModes: [RWO]                    requests.storage: 100Gi
  storageClassName: gp3                 accessModes: [RWO]
  reclaimPolicy: Retain                 storageClassName: gp3
  status.phase: Bound          ◄────────status.phase: Bound
  spec.claimRef:                        spec.volumeName: pv-prod-db-001
    namespace: production
    name: data
    uid: abc-123

                  ↕ (1:1 binding)

StorageClass (cluster-scoped)
  name: gp3
  provisioner: ebs.csi.aws.com
  volumeBindingMode: WaitForFirstConsumer
  reclaimPolicy: Delete

Key ownership rules:

One PV can be bound to exactly one PVC at a time.
One PVC is bound to exactly one PV.
PVs are cluster-scoped; PVCs are namespace-scoped. A PVC in namespace A cannot bind to a PV already claimed by namespace B.
The binding is recorded on both objects: the PV's spec.claimRef and the PVC's spec.volumeName (filled in by the controller at bind time if you did not set it yourself).

PersistentVolume Spec — Full Reference

apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv-prod-db-001
  labels:
    type: ssd                   # used by PVC label selectors
    env: production
  annotations:
    pv.kubernetes.io/provisioned-by: ebs.csi.aws.com   # set by dynamic provisioner
    volume.kubernetes.io/storage-provisioner: ebs.csi.aws.com
spec:
  # ── Storage capacity ──────────────────────────────────────────
  capacity:
    storage: 100Gi              # reported capacity; CSI driver may report actual

  # ── Access modes ──────────────────────────────────────────────
  accessModes:
    - ReadWriteOnce             # RWO | ROX | RWX | RWOP (GA 1.29)

  # ── Volume mode ───────────────────────────────────────────────
  volumeMode: Filesystem        # Filesystem (default) | Block

  # ── Reclaim policy ────────────────────────────────────────────
  persistentVolumeReclaimPolicy: Retain  # Retain | Delete | Recycle(deprecated)

  # ── StorageClass association ───────────────────────────────────
  storageClassName: gp3         # must match PVC storageClassName to bind
                                # "" or omitted on a PV = no class

  # ── Mount options (passed to mount command) ───────────────────
  mountOptions:
    - noatime
    - discard

  # ── CSI source ────────────────────────────────────────────────
  csi:
    driver: ebs.csi.aws.com
    volumeHandle: vol-0abc123def456789   # unique ID on storage backend
    fsType: ext4
    readOnly: false
    volumeAttributes:
      throughput: "250"
      iops: "3000"

  # ── Topology constraint (CSI / local volumes) ─────────────────
  nodeAffinity:
    required:
      nodeSelectorTerms:
        - matchExpressions:
            - key: topology.ebs.csi.aws.com/zone
              operator: In
              values: [us-east-1a]

  # ── Claim reference (set by the PV controller; set by hand only to pre-bind or recover) ──
  claimRef:
    apiVersion: v1
    kind: PersistentVolumeClaim
    name: data
    namespace: production
    uid: abc-def-123            # UID prevents rebind to new PVC with same name

ℹ️

storageClassName: "" vs omitted On a PVC, an empty string (storageClassName: "") explicitly means "no StorageClass": the claim binds only to PVs that have no class, and dynamic provisioning is skipped. Omitting the field on a PVC lets the DefaultStorageClass admission controller fill in the cluster default, if one exists. On a PV there is no defaulting, so an omitted storageClassName behaves like "". The two PVC forms therefore have different binding behavior.
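The difference between an omitted field and an empty string on a PVC can be sketched in a few lines of Python. This is only an illustration: `None` stands in for an omitted field, and the function name and parameters are hypothetical, not part of any Kubernetes API.

```python
from typing import Optional


def effective_storage_class(requested: Optional[str],
                            cluster_default: Optional[str]) -> Optional[str]:
    """Resolve the class a PVC ends up with.

    requested=None models an omitted storageClassName (defaulting applies);
    requested="" models an explicit "no class" (defaulting is skipped).
    """
    if requested is None:
        # Omitted: the default class is filled in, if the cluster has one.
        return cluster_default
    # Explicit value, including "": used exactly as written.
    return requested


print(repr(effective_storage_class(None, "gp3")))   # 'gp3'
print(repr(effective_storage_class("", "gp3")))     # ''
print(repr(effective_storage_class("io2", "gp3")))  # 'io2'
```

With no default class in the cluster, the omitted case resolves to `None`, which corresponds to a PVC that stays unset until a default appears.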

PersistentVolumeClaim Spec — Full Reference

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data
  namespace: production
  annotations:
    volume.beta.kubernetes.io/storage-provisioner: ebs.csi.aws.com  # set by the PV controller; tells the provisioner to act
spec:
  # ── Storage request ───────────────────────────────────────────
  resources:
    requests:
      storage: 100Gi           # minimum; may bind to larger PV (static binding)

  # ── Access modes ──────────────────────────────────────────────
  accessModes:
    - ReadWriteOnce

  # ── Volume mode ───────────────────────────────────────────────
  volumeMode: Filesystem       # must match PV volumeMode

  # ── StorageClass ──────────────────────────────────────────────
  storageClassName: gp3        # "" = static binding with no class
                                # omit = default StorageClass

  # ── Static binding: target specific PV ───────────────────────
  volumeName: pv-prod-db-001   # skip binding algorithm; pin to this PV

  # ── Static binding: label selector ───────────────────────────
  selector:
    matchLabels:
      type: ssd
      env: production
    matchExpressions:
      - key: tier
        operator: In
        values: [fast, ultra-fast]

  # ── Data population sources ───────────────────────────────────
  dataSource:                  # clone from existing PVC or restore from snapshot
    kind: PersistentVolumeClaim
    name: source-pvc           # must be in the same namespace as this PVC

  # dataSourceRef can also point at a VolumeSnapshot or PVC in another
  # namespace (alpha since 1.26; needs feature gates and a ReferenceGrant)
  # dataSourceRef:
  #   apiGroup: snapshot.storage.k8s.io
  #   kind: VolumeSnapshot
  #   name: snap-cross-ns
  #   namespace: backup-ns

Binding Algorithm

The PersistentVolume controller (inside kube-controller-manager) runs a continuous reconciliation loop. For each unbound PVC, it searches for the best-fit PV:

storageClassName match — PV's storageClassName must equal the PVC's. Both empty strings match each other.
accessModes check — The PV must support all modes the PVC requests (PV's set ⊇ PVC's set). A PV with RWO+ROX satisfies a PVC requesting only RWO.
volumeMode check — Must match exactly (both Filesystem or both Block).
capacity check — PV capacity ≥ PVC requested storage.
label selector check — If the PVC has a selector, PV labels must satisfy it.

Among all matching PVs, the controller selects the smallest-capacity PV that satisfies the request (best fit), which minimizes wasted capacity. If the PVC sets volumeName, the search is skipped and that specific PV is targeted directly; the PV must still be compatible with the claim (for example, large enough) for the bind to succeed.
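The five checks and the best-fit rule can be sketched as a small Python model. This is a simplified illustration, not the controller's code: the `Volume` and `Claim` classes are hypothetical, capacities are whole GiB, only matchLabels is modeled (no matchExpressions), and an extra `bound` flag stands in for "PV is Available".

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class Volume:
    name: str
    capacity_gi: int
    access_modes: Set[str]
    storage_class: str = ""
    volume_mode: str = "Filesystem"
    labels: Dict[str, str] = field(default_factory=dict)
    bound: bool = False


@dataclass
class Claim:
    request_gi: int
    access_modes: Set[str]
    storage_class: str = ""
    volume_mode: str = "Filesystem"
    match_labels: Dict[str, str] = field(default_factory=dict)


def matches(pv: Volume, pvc: Claim) -> bool:
    """Class, access modes, volume mode, capacity, then labels."""
    return (
        not pv.bound
        and pv.storage_class == pvc.storage_class
        and pvc.access_modes <= pv.access_modes      # PV set is a superset
        and pv.volume_mode == pvc.volume_mode
        and pv.capacity_gi >= pvc.request_gi
        and all(pv.labels.get(k) == v for k, v in pvc.match_labels.items())
    )


def best_fit(pvs: List[Volume], pvc: Claim) -> Optional[Volume]:
    """Return the smallest matching PV, or None if nothing matches."""
    candidates = [pv for pv in pvs if matches(pv, pvc)]
    return min(candidates, key=lambda pv: pv.capacity_gi, default=None)


pvs = [
    Volume("pv-large", 500, {"ReadWriteOnce"}, "gp3"),
    Volume("pv-small", 50, {"ReadWriteOnce", "ReadOnlyMany"}, "gp3"),
    Volume("pv-other-class", 20, {"ReadWriteOnce"}, "io2"),
]
chosen = best_fit(pvs, Claim(20, {"ReadWriteOnce"}, "gp3"))
print(chosen.name)  # pv-small
```

The 20Gi PV is the closest in size but belongs to a different class, so the 50Gi PV wins over the 500Gi one.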

⚠️

Capacity overshoot with static PVs If the smallest matching PV is 500Gi but the PVC requests 20Gi, the PVC binds to the 500Gi PV and the pod sees 500Gi, not 20Gi. The claim cannot be shrunk afterwards. Dynamic provisioning (via StorageClass) creates a volume sized for the request, rounded up only to whatever minimum or increment the driver enforces, so large overshoot is a static-binding problem.

WaitForFirstConsumer and Topology-Aware Binding

When the StorageClass uses volumeBindingMode: WaitForFirstConsumer, the binding algorithm is deferred until a pod using the PVC is scheduled. The scheduler selects a node first, then the bind controller creates (or selects) a PV in the zone matching that node's topology labels.

WaitForFirstConsumer binding sequence:

1. PVC created → stays in Pending (no PV yet)
2. Pod created referencing PVC
3. Scheduler selects node N in zone us-east-1a
4. Scheduler annotates PVC:
   volume.kubernetes.io/selected-node: node-N
5. PVC bind controller sees the annotation
6. Dynamic provisioner creates PV in us-east-1a
7. Bind controller binds PVC → PV
8. Kubelet on node-N mounts the volume
9. Pod starts

Without WaitForFirstConsumer (Immediate):
1. PVC created → provisioner immediately creates PV (zone may be random)
2. Pod scheduler must find node in same zone as PV
3. If no nodes in that zone → Pod stuck Pending forever

PV and PVC Phase State Machines

PV Phases

Phase	Meaning	Transitions To
`Available`	PV exists, not bound to any PVC	Bound (when matching PVC found)
`Bound`	PV is bound 1:1 to a PVC	Released (when PVC is deleted)
`Released`	PVC deleted; PV retains data; `claimRef` still set	Available (admin removes claimRef), Failed (reclamation error), Deleted (Delete policy)
`Failed`	Automatic reclamation failed (e.g., cloud volume delete API error)	Requires manual intervention
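The PV phase table can be read as a transition map. The Python sketch below encodes only the transitions listed in the table; "Deleted" is included as a terminal pseudo-state (the object is gone) rather than a real phase, and the names are illustrative.

```python
# Allowed PV phase transitions, taken directly from the table above.
PV_TRANSITIONS = {
    "Available": {"Bound"},
    "Bound": {"Released"},
    "Released": {"Available", "Failed", "Deleted"},
    "Failed": set(),  # no automatic exit; needs manual intervention
}


def can_transition(current: str, target: str) -> bool:
    """Return True if the table allows moving from current to target."""
    return target in PV_TRANSITIONS.get(current, set())


print(can_transition("Released", "Available"))  # True
print(can_transition("Bound", "Available"))     # False
```

A Bound PV never returns to Available directly; it has to pass through Released first.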

PVC Phases

Phase	Meaning	Common Causes
`Pending`	PVC created but not yet bound to a PV	No matching PV, provisioner error, WaitForFirstConsumer waiting for pod, quota exceeded
`Bound`	Bound to a PV; ready for pods to use	—
`Lost`	Bound PV has been deleted or is unavailable	PV object deleted while the PVC was still bound, for example after its pv-protection finalizer was force-removed

🔴

PVC in Lost phase A new pod referencing a Lost PVC cannot start. The data may still exist on the storage backend, because only the PV object was deleted. To recover: create a new PV with the same name as the PVC's spec.volumeName and the same volumeHandle as the existing cloud volume, and set claimRef on it to point to the lost PVC (including the PVC's uid). The PVC will re-bind.

Finalizers and Deletion Protection

Kubernetes uses finalizers to prevent accidental deletion of PVs and PVCs while they are in use.

PVC Protection Finalizer

When a PVC is created, the admission controller adds kubernetes.io/pvc-protection as a finalizer. The PVCProtection controller in kube-controller-manager removes this finalizer only when no pod is actively using the PVC. If you run kubectl delete pvc <name> while a pod has it mounted:

PVC moves to Terminating (deletion timestamp set)
Finalizer prevents actual deletion
Pod continues running with the volume
When the pod is deleted, the PVC protection controller removes the finalizer
PVC is deleted; PV moves to Released
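The sequence above can be modeled as a toy reconcile loop in Python. It is a sketch of the finalizer mechanics only: the PVC is a plain dict, and `request_delete` and `reconcile` are hypothetical helpers, not Kubernetes functions.

```python
FINALIZER = "kubernetes.io/pvc-protection"


def request_delete(pvc: dict) -> None:
    """Model `kubectl delete pvc`: only the deletion timestamp is set."""
    pvc["deletionTimestamp"] = "set"


def reconcile(pvc: dict, pods_using: list) -> str:
    """Model one pass of the protection controller and return the PVC state."""
    terminating = bool(pvc.get("deletionTimestamp"))
    if terminating and not pods_using:
        # No pod uses the claim any more: the finalizer can be dropped.
        pvc["finalizers"] = [f for f in pvc["finalizers"] if f != FINALIZER]
    if terminating and not pvc["finalizers"]:
        return "Deleted"
    return "Terminating" if terminating else "Active"


pvc = {"name": "data", "finalizers": [FINALIZER]}
print(reconcile(pvc, ["postgres-0"]))  # Active
request_delete(pvc)
print(reconcile(pvc, ["postgres-0"]))  # Terminating
print(reconcile(pvc, []))              # Deleted
```

The claim stays in Terminating for as long as any pod is listed as a consumer, which is the behavior seen with a stuck `kubectl delete pvc`.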

# Check finalizers on a PVC
kubectl get pvc data -o jsonpath='{.metadata.finalizers}'
# ["kubernetes.io/pvc-protection"]

# PVC stuck in Terminating? Check if any pod in its namespace still references it
kubectl get pods -n production -o json | \
  jq -r '.items[] | select(any(.spec.volumes[]?; .persistentVolumeClaim.claimName == "data")) | .metadata.name'

PV Protection Finalizer

Similarly, PVs have a kubernetes.io/pv-protection finalizer that prevents deletion while the PV is bound to a PVC. Deleting a bound PV moves it to Terminating; the deletion completes only after the bound PVC has been deleted.

# Force-remove a stuck finalizer (DANGEROUS — only if you are sure data is not needed)
kubectl patch pv <name> -p '{"metadata":{"finalizers":null}}'

🔴

Never force-remove finalizers unless data is expendable Removing the pvc-protection or pv-protection finalizer allows the object to be deleted immediately, even while a pod is using it. With a Delete reclaim policy, the backing volume can then be removed from under a running container, which will corrupt any database or stateful application using the volume. Only do this in test environments or when data loss is explicitly acceptable.

Reclaim Policy Mechanics

Retain

When the PVC is deleted, the PV moves to Released. The underlying storage (cloud volume, NFS directory, etc.) is preserved. The PV's claimRef still points to the deleted PVC — preventing automatic rebinding. An administrator must manually intervene:

# Option 1: Delete the PV and recreate it (cleanest)
kubectl delete pv <name>
# Manually create a new PV pointing to the same storage backend volumeHandle

# Option 2: Remove claimRef to make the PV available again
kubectl patch pv <name> --type=json \
  -p '[{"op":"remove","path":"/spec/claimRef"}]'
# PV moves back to Available; can be claimed by a new PVC

⚠️

Data from previous tenant If you remove the claimRef and allow a new PVC to bind, the new consumer gets a volume containing the previous tenant's data. For shared-namespace clusters, this is a data leak. Always verify the volume is clean before reuse, or use snapshots to make a clean clone.

Delete

When the PVC is deleted, the external-provisioner sidecar calls the CSI DeleteVolume RPC. The cloud volume is deleted immediately. The PV object is also deleted. This is the default for dynamically-provisioned cloud StorageClasses.

Changing Reclaim Policy on an Existing PV

The StorageClass's reclaimPolicy only applies at provisioning time. Dynamically created PVs inherit the StorageClass policy, but you can change it on individual PVs after creation:

# Change a dynamically-provisioned PV from Delete to Retain
kubectl patch pv <pv-name> \
  -p '{"spec":{"persistentVolumeReclaimPolicy":"Retain"}}'

# Verify
kubectl get pv <pv-name> -o jsonpath='{.spec.persistentVolumeReclaimPolicy}'

💡

Change to Retain before deleting a StatefulSet Before running kubectl delete statefulset or helm uninstall, patch all PVs used by the StatefulSet to Retain. This ensures the cloud volumes survive even if the PVCs are deleted. You can then inspect and manually clean up, or rebind to a new StatefulSet.

StatefulSet volumeClaimTemplates

StatefulSets are the primary consumer of PVCs in production. They use volumeClaimTemplates to automatically provision a dedicated PVC for each replica. The PVC names follow a deterministic pattern: <template-name>-<statefulset-name>-<ordinal>.

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: postgres
  namespace: production
spec:
  serviceName: postgres
  replicas: 3
  selector:
    matchLabels:
      app: postgres
  template:
    metadata:
      labels:
        app: postgres
    spec:
      containers:
      - name: postgres
        image: postgres:16
        volumeMounts:
        - name: data               # matches volumeClaimTemplates[0].metadata.name
          mountPath: /var/lib/postgresql/data
        - name: wal                # matches volumeClaimTemplates[1].metadata.name
          mountPath: /var/lib/postgresql/wal
  volumeClaimTemplates:
  - metadata:
      name: data
    spec:
      accessModes: [ReadWriteOnce]
      storageClassName: gp3-encrypted
      resources:
        requests:
          storage: 100Gi
  - metadata:
      name: wal
    spec:
      accessModes: [ReadWriteOnce]
      storageClassName: gp3-encrypted
      resources:
        requests:
          storage: 20Gi

This creates PVCs with names:

data-postgres-0, data-postgres-1, data-postgres-2
wal-postgres-0, wal-postgres-1, wal-postgres-2
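The naming pattern is simple enough to compute ahead of time, which is handy when scripting cleanup or pre-creating PVs. A minimal Python sketch, with a hypothetical helper name:

```python
from typing import List


def pvc_names(templates: List[str], statefulset: str, replicas: int) -> List[str]:
    """Expand <template-name>-<statefulset-name>-<ordinal> for every replica."""
    return [
        f"{template}-{statefulset}-{ordinal}"
        for template in templates
        for ordinal in range(replicas)
    ]


print(pvc_names(["data", "wal"], "postgres", 3))
# ['data-postgres-0', 'data-postgres-1', 'data-postgres-2',
#  'wal-postgres-0', 'wal-postgres-1', 'wal-postgres-2']
```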

Stable PVC Identity

When a StatefulSet pod is deleted and rescheduled (on the same or different node), it reattaches to the same PVC with the same name. postgres-0 always uses data-postgres-0. This is the key property that makes StatefulSets suitable for databases — pod identity is coupled to storage identity.

Orphaned PVCs — The StatefulSet Deletion Trap

When you delete a StatefulSet, Kubernetes does not delete the PVCs created by volumeClaimTemplates. They are orphaned — no owner reference, no automatic cleanup. This is intentional (prevents accidental data loss), but it means:

PVCs accumulate after CI/CD teardown, failed releases, or namespace cleanup
Cloud volumes continue to be billed even after the StatefulSet is gone
A reinstalled StatefulSet with the same name will reuse the existing PVCs (the exact same data)

# Quick check: PVCs that are not Bound (Pending or Lost).
# Orphaned PVCs normally stay Bound, so this alone does not find them.
kubectl get pvc -n <ns> | grep -v Bound

# More precise: find PVCs with no active pod consumer
for pvc in $(kubectl get pvc -n production -o name); do
  name=$(echo $pvc | cut -d/ -f2)
  pods=$(kubectl get pods -n production -o json | \
    jq -r --arg PVC "$name" '.items[] | select(.spec.volumes[]?.persistentVolumeClaim.claimName==$PVC) | .metadata.name')
  if [ -z "$pods" ]; then
    echo "ORPHANED: $name"
  fi
done

Scale-Down Behavior

Scaling a StatefulSet from 3 to 1 deletes postgres-2 and postgres-1 pods, but leaves data-postgres-2 and data-postgres-1 PVCs intact. Scaling back up to 3 reattaches to the existing PVCs. This is correct for databases where data must survive scale-down.

If you explicitly want PVCs deleted on scale-down (e.g., ephemeral read replicas), use the whenDeleted and whenScaled fields in spec.persistentVolumeClaimRetentionPolicy (beta and enabled by default since 1.27, GA in 1.32):

spec:
  persistentVolumeClaimRetentionPolicy:
    whenDeleted: Retain        # Retain | Delete — on StatefulSet delete
    whenScaled: Delete         # Retain | Delete — on scale-down
                               # Delete removes PVCs as pods are scaled down

⚠️

whenScaled: Delete is irreversible Setting whenScaled: Delete means scaling down from 3→1 will permanently delete the data-postgres-2 and data-postgres-1 PVCs and, when their PVs use the Delete reclaim policy, the underlying cloud volumes. Only use this for easily re-seeded replicas (for example, read replicas that can be resynced from the primary).
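Before changing replica counts, it can help to list exactly which claims a scale-down would remove. The Python sketch below is an illustration with a hypothetical helper; it only models the whenScaled setting and the naming pattern, not the controller itself.

```python
from typing import List


def pvcs_deleted_on_scale_down(templates: List[str], statefulset: str,
                               old_replicas: int, new_replicas: int,
                               when_scaled: str = "Retain") -> List[str]:
    """List the PVCs removed when scaling from old_replicas to new_replicas."""
    if when_scaled != "Delete" or new_replicas >= old_replicas:
        return []  # Retain (the default) or a scale-up removes nothing
    return [
        f"{template}-{statefulset}-{ordinal}"
        for ordinal in range(new_replicas, old_replicas)
        for template in templates
    ]


print(pvcs_deleted_on_scale_down(["data"], "postgres", 3, 1, "Delete"))
# ['data-postgres-1', 'data-postgres-2']
print(pvcs_deleted_on_scale_down(["data"], "postgres", 3, 1, "Retain"))
# []
```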

PVC Cloning

A PVC can be created as a clone of an existing PVC using dataSource. The CSI driver's CreateVolume with a data source creates a new volume pre-populated with the source's data. Constraints:

Source and destination PVC must be in the same namespace
Source and destination must be provisioned by the same CSI driver; the StorageClass itself may differ
Destination capacity ≥ source capacity
Source PVC must be Bound at the time of cloning
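Some of these constraints can be checked before submitting a clone. The Python sketch below covers only the namespace, phase, and capacity rules; driver and StorageClass compatibility are deliberately not modeled, and `PVCInfo` and `clone_problems` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PVCInfo:
    namespace: str
    capacity_gi: int
    phase: str = "Bound"


def clone_problems(source: PVCInfo, dest_namespace: str,
                   dest_request_gi: int) -> List[str]:
    """Return the violated clone constraints; an empty list means none found."""
    problems = []
    if source.namespace != dest_namespace:
        problems.append("source and destination namespaces differ")
    if source.phase != "Bound":
        problems.append("source PVC is not Bound")
    if dest_request_gi < source.capacity_gi:
        problems.append("destination request is smaller than source capacity")
    return problems


src = PVCInfo(namespace="production", capacity_gi=100)
print(clone_problems(src, "production", 100))  # []
print(clone_problems(src, "staging", 50))
# ['source and destination namespaces differ',
#  'destination request is smaller than source capacity']
```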

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: postgres-data-clone
  namespace: production
spec:
  dataSource:
    kind: PersistentVolumeClaim
    name: data-postgres-0        # source PVC (must be Bound, same namespace)
  accessModes: [ReadWriteOnce]
  storageClassName: gp3-encrypted
  resources:
    requests:
      storage: 100Gi             # must be ≥ source capacity

Restore from Snapshot

Restore a PVC from a VolumeSnapshot using dataSource with the snapshot as source. The snapshot must be in the same namespace:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: postgres-restored
  namespace: production
spec:
  dataSource:
    apiGroup: snapshot.storage.k8s.io
    kind: VolumeSnapshot
    name: postgres-snap-2024-01-15    # VolumeSnapshot in same namespace
  accessModes: [ReadWriteOnce]
  storageClassName: gp3-encrypted
  resources:
    requests:
      storage: 100Gi

dataSourceRef — Cross-Namespace Restore (1.26+)

dataSourceRef extends the clone/restore model. With the CrossNamespaceVolumeDataSource feature gate enabled (alpha since 1.26, alongside AnyVolumeDataSource), it accepts a namespace field, so a PVC can restore from a VolumeSnapshot or clone a PVC that lives in another namespace. The owner of the source namespace must allow the reference with a ReferenceGrant (a Gateway API resource):

spec:
  dataSourceRef:
    apiGroup: snapshot.storage.k8s.io
    kind: VolumeSnapshot            # namespaced; referenced across namespaces
    name: shared-backup-snap
    namespace: backup-ns            # namespace of the VolumeSnapshot

Default StorageClass

A StorageClass can be designated as the default by adding the annotation storageclass.kubernetes.io/is-default-class: "true". When a PVC is created without specifying storageClassName, the default StorageClass is used.

# Check which StorageClass is default
kubectl get storageclass
# NAME            PROVISIONER             RECLAIMPOLICY   VOLUMEBINDINGMODE
# gp2             kubernetes.io/aws-ebs   Delete          Immediate
# gp3 (default)   ebs.csi.aws.com         Delete          WaitForFirstConsumer

# Change the default StorageClass
kubectl patch storageclass gp2 -p '{"metadata":{"annotations":{"storageclass.kubernetes.io/is-default-class":"false"}}}'
kubectl patch storageclass gp3 -p '{"metadata":{"annotations":{"storageclass.kubernetes.io/is-default-class":"true"}}}'

⚠️

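Multiple default StorageClasses If more than one StorageClass is marked default, Kubernetes 1.26+ assigns the most recently created default to PVCs that omit storageClassName; earlier releases rejected such PVCs with an admission error. Either behavior is easy to trip over, so keep exactly one default and apply the two patches above back to back when switching.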

Retroactive Default Assignment (1.26+)

In Kubernetes 1.26+ (beta in 1.26, GA in 1.28), a PVC created with storageClassName omitted while no default StorageClass existed does not stay stuck: once a default StorageClass is set, the controller retroactively assigns it to the still-unbound PVC. PVCs that explicitly set storageClassName: "" are left untouched. This allows fixing PVCs that are stuck Pending due to a missing default without recreating them.

In-Tree to CSI Migration

Kubernetes has been migrating in-tree volume plugins to CSI drivers. Most major cloud providers completed migration by 1.27–1.28. When migration is active for a plugin, the kubelet transparently routes in-tree volume operations to the CSI driver.

Migration Annotations

# On a dynamically provisioned PV, these annotations track migration:
annotations:
  pv.kubernetes.io/provisioned-by: kubernetes.io/aws-ebs    # original in-tree provisioner
  pv.kubernetes.io/migrated-to: ebs.csi.aws.com             # CSI driver taking over

Migration Impact on Existing Clusters

Scenario	Behavior
Existing PVs created by in-tree plugin	Transparently served by CSI driver after migration; no PV changes needed
StorageClass with in-tree provisioner name	StorageClass object is left as-is; with migration enabled, provisioning requests are translated to the CSI driver at runtime
In-tree plugin removed from Kubernetes	Old PVs using that plugin type are invalid; must migrate before upgrading
VolumeAttachment objects	New CSI-style VolumeAttachment created; old in-tree attachment may linger

Detach and Force-Detach Scenarios

A PV attached to a node is represented by a VolumeAttachment object. Under normal operation, kubelet unmounts volumes before pod deletion. Problems arise when a node becomes unreachable.

Node Failure with Attached PV

Node N fails (network partition / power loss)
    │
    ▼
Node N's status becomes NotReady
    │
    ├─► After node-monitor-grace-period (default 40s):
    │   Node Ready condition becomes Unknown; node tainted unreachable
    │
    ├─► After the unreachable toleration expires (default 5min):
    │   Pods on N get deletion timestamp set
    │
    ├─► New pod scheduled on node M
    │   ↓
    │   Volume must be attached to node M (ControllerPublishVolume)
    │   before the kubelet on M can mount it
    │   ↓
    │   BLOCKED: EBS volume still "attached" to node N (Multi-Attach error)
    │
    └─► Force-detach after max-wait (6min by default):
        CSI driver calls ControllerUnpublishVolume for node N
        Volume attached to node M
        Kubelet on M runs NodeStageVolume / NodePublishVolume, pod starts

Stuck VolumeAttachment

# View VolumeAttachment objects
kubectl get volumeattachment

# Identify which node a PV is attached to
kubectl get volumeattachment -o json | \
  jq -r '.items[] | select(.spec.source.persistentVolumeName=="pv-prod-db-001") |
  "\(.metadata.name) → node: \(.spec.nodeName) attached: \(.status.attached)"'

# If node is truly gone and attachment is stuck, force delete the VolumeAttachment
# This tells the CSI driver to release the attachment
kubectl delete volumeattachment <name>

# WARNING: Only do this if the node is confirmed gone or the volume is not mounted.
# Force-deleting an attachment while a node is still mounting the volume
# can cause data corruption if two nodes mount the volume simultaneously.

PVC Resizing Deep Dive

Volume expansion requires allowVolumeExpansion: true on the StorageClass. Expansion is a two-phase operation:

Controller expand — CSI ControllerExpandVolume resizes the backing cloud storage (e.g., EBS volume goes from 100Gi → 200Gi). This happens while the pod may be running.
Node expand — CSI NodeExpandVolume runs filesystem resize commands (resize2fs for ext4, xfs_growfs for xfs) on the node where the volume is mounted. This requires the volume to be mounted by a running pod.

# Step 1: Edit the PVC
kubectl patch pvc data-postgres-0 -n production \
  -p '{"spec":{"resources":{"requests":{"storage":"200Gi"}}}}'

# Step 2: Watch status conditions
kubectl get pvc data-postgres-0 -n production -w -o wide

# Step 3: Check conditions if stuck
kubectl describe pvc data-postgres-0 -n production | grep -A 4 Conditions:
# Type                      Status
# FileSystemResizePending   True
#   → volume resized at cloud layer; waiting for pod to mount and run NodeExpandVolume

# Step 4: Ensure the pod is running (not CrashLoopBackOff)
# Kubelet runs NodeExpandVolume on the next mount cycle

# Step 5: Verify from inside the pod
kubectl exec -it postgres-0 -n production -- df -h /var/lib/postgresql/data

Condition	Meaning
`FileSystemResizePending: True`	Cloud volume resized; filesystem resize pending — pod must be running and mounting the volume
`Resizing: True`	Resize in progress (controller expand still running)
No conditions, capacity updated	Resize complete
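The condition table maps to a small decision function. The Python sketch below assumes the PVC's status.conditions has already been loaded as a list of dicts (for example from `kubectl get pvc -o json`); the function name and the returned strings are illustrative.

```python
from typing import Dict, List


def resize_state(conditions: List[Dict[str, str]],
                 requested_gi: int, status_capacity_gi: int) -> str:
    """Interpret PVC resize progress from conditions and capacity."""
    active = {c["type"] for c in conditions if c.get("status") == "True"}
    if "Resizing" in active:
        return "controller expansion in progress"
    if "FileSystemResizePending" in active:
        return "waiting for node filesystem resize"
    if status_capacity_gi >= requested_gi:
        return "complete"
    return "not started or unknown"


print(resize_state([{"type": "FileSystemResizePending", "status": "True"}], 200, 100))
# waiting for node filesystem resize
print(resize_state([], 200, 200))
# complete
```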

⚠️

Online resize vs driver support Not all CSI drivers support online node expansion (while volume is mounted). Azure Disk and GCE PD support online expansion. Older EBS driver versions required the pod to be stopped first. Check your driver's release notes. If the driver doesn't support online node expansion, you must delete the pod (unmount), let the expansion complete, then redeploy.

Metrics and Alerting

Key Metrics

Metric	Source	Alert Threshold
`kube_persistentvolumeclaim_status_phase`	kube-state-metrics	phase=Pending for >10m, phase=Lost for >1m
`kube_persistentvolume_status_phase`	kube-state-metrics	phase=Failed
`kube_persistentvolumeclaim_resource_requests_storage_bytes`	kube-state-metrics	Capacity planning: total PVC bytes per namespace
`kubelet_volume_stats_used_bytes / kubelet_volume_stats_capacity_bytes`	kubelet	>85% → warn; >95% → critical
`storage_operation_duration_seconds{operation_name="volume_attach"}`	kube-controller-manager	P99 > 60s
`attachdetach_controller_total_volumes`	kube-controller-manager	Unattached volumes growing over time
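The usage thresholds in the table translate into a simple classification. The Python sketch below assumes the two kubelet_volume_stats values have already been fetched as byte counts; the function name and severity labels are illustrative.

```python
def usage_severity(used_bytes: float, capacity_bytes: float) -> str:
    """Classify volume usage with the 85% / 95% thresholds from the table."""
    if capacity_bytes <= 0:
        raise ValueError("capacity_bytes must be positive")
    ratio = used_bytes / capacity_bytes
    if ratio > 0.95:
        return "critical"
    if ratio > 0.85:
        return "warning"
    return "ok"


GI = 1024 ** 3
print(usage_severity(90 * GI, 100 * GI))  # warning
print(usage_severity(96 * GI, 100 * GI))  # critical
print(usage_severity(40 * GI, 100 * GI))  # ok
```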

Alerting Rules

groups:
- name: persistent-volumes
  rules:
  - alert: PVCPendingTooLong
    expr: kube_persistentvolumeclaim_status_phase{phase="Pending"} == 1
    for: 10m
    labels: {severity: warning}
    annotations:
      summary: "PVC {{ $labels.namespace }}/{{ $labels.persistentvolumeclaim }} stuck Pending"

  - alert: PVCLost
    expr: kube_persistentvolumeclaim_status_phase{phase="Lost"} == 1
    for: 1m
    labels: {severity: critical}
    annotations:
      summary: "PVC {{ $labels.namespace }}/{{ $labels.persistentvolumeclaim }} is Lost — data may be inaccessible"

  - alert: PVFailed
    expr: kube_persistentvolume_status_phase{phase="Failed"} == 1
    for: 1m
    labels: {severity: critical}
    annotations:
      summary: "PV {{ $labels.persistentvolume }} in Failed state — reclamation error"

  - alert: OrphanedPVCsAccumulating
    expr: |
      count by (namespace) (
        kube_persistentvolumeclaim_status_phase{phase="Bound"} == 1
        unless on (namespace, persistentvolumeclaim)
        kube_pod_spec_volumes_persistentvolumeclaims_info
      ) > 5
    for: 30m
    labels: {severity: warning}
    annotations:
      summary: "Namespace {{ $labels.namespace }} has many PVCs not mounted by any pod"

Troubleshooting Runbooks

Runbook: PVC Stuck in Pending — Provisioning Failure

# 1. Describe PVC — check Events section
kubectl describe pvc <name> -n <ns>
# Common messages:
# "no persistent volumes available" → no matching static PV
# "waiting for a volume to be created, either by external provisioner..." → provisioner issue
# "error setting quota" → ResourceQuota exceeded
# "waiting for first consumer to be created before binding" → WaitForFirstConsumer, no pod yet

# 2. Check CSI provisioner logs
kubectl logs -n kube-system -l app=ebs-csi-controller -c csi-provisioner --tail=100 | grep -i error

# 3. Check if ResourceQuota is blocking
kubectl describe resourcequota -n <ns>

# 4. Verify StorageClass exists and provisioner is running
kubectl get storageclass <name>
kubectl get pods -n kube-system | grep csi

Runbook: PVC Stuck in Terminating

# Check finalizers
kubectl get pvc <name> -o jsonpath='{.metadata.finalizers}'
# ["kubernetes.io/pvc-protection"]

# Find which pod is using the PVC
kubectl get pods -n <ns> -o json | jq -r \
  '.items[] | select(.spec.volumes[]?.persistentVolumeClaim.claimName=="<pvc-name>") | .metadata.name'

# Delete the pod using the PVC first, then PVC will complete deletion
kubectl delete pod <pod-name> -n <ns>

# If pod is already gone but PVC still stuck (bug/finalizer stuck):
kubectl patch pvc <name> -n <ns> -p '{"metadata":{"finalizers":null}}'

Runbook: PVC in Lost Phase

# The PV that was bound to this PVC no longer exists
kubectl get pvc <name> -o jsonpath='{.spec.volumeName}'   # → pv-prod-db-001

# Check if PV exists
kubectl get pv pv-prod-db-001   # → NotFound

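# Recovery: recreate the PV pointing to the same cloud volume
# 1. Find the cloud volume ID from the deleted PV (check your monitoring / cloud console)
# 2. Create a new PV with the same name. The manifest is a sketch: <volume-id>,
#    <ns>, <name>, <pvc-uid>, the size, and the class are placeholders to fill in.
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: PersistentVolume
metadata: {name: pv-prod-db-001}
spec:
  capacity: {storage: 100Gi}
  accessModes: [ReadWriteOnce]
  persistentVolumeReclaimPolicy: Retain
  storageClassName: gp3
  csi: {driver: ebs.csi.aws.com, volumeHandle: <volume-id>, fsType: ext4}
  claimRef: {namespace: <ns>, name: <name>, uid: <pvc-uid>}
EOF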

Runbook: Released PV Won't Rebind

# PV is in Released state; new PVC won't bind even though sizes/classes match
# Cause: claimRef still points to old PVC

kubectl describe pv <name> | grep -A 5 "Claim:"
# Claim: production/old-pvc   ← still points to deleted PVC

# Remove claimRef to make PV Available
kubectl patch pv <name> --type=json \
  -p '[{"op":"remove","path":"/spec/claimRef"}]'

# PV is now Available; create new PVC to claim it

Runbook: Orphaned PVCs After StatefulSet Deletion

# List all PVCs not mounted by any pod in a namespace
kubectl get pvc -n production --no-headers | awk '{print $1}' | while read pvc; do
  pods=$(kubectl get pods -n production -o json | \
    jq -r --arg PVC "$pvc" '.items[] | select(.spec.volumes[]?.persistentVolumeClaim.claimName==$PVC) | .metadata.name')
  [ -z "$pods" ] && echo "ORPHANED: $pvc"
done

# Safely delete orphaned PVCs (after confirming data is not needed)
# First patch the bound PVs (not the PVCs) to Retain so the cloud volumes persist as a safety net
kubectl get pvc -n production -o jsonpath='{range .items[*]}{.spec.volumeName}{"\n"}{end}' | \
  grep . | xargs -I{} kubectl patch pv {} \
  -p '{"spec":{"persistentVolumeReclaimPolicy":"Retain"}}'
# Then delete PVCs
kubectl delete pvc -n production <orphaned-pvc-name>


Best Practices


Default StatefulSet PVs to Retain. Before any StatefulSet deletion or helm uninstall, patch all bound PVs to Retain. This is your last line of defence against accidental data loss.
Use persistentVolumeClaimRetentionPolicy (beta since 1.27, GA in 1.32) in StatefulSets to explicitly declare whether PVCs should survive StatefulSet deletion and scale-down. The default is Retain for both, which is correct for databases but may cause cost accumulation for ephemeral replicas.
Monitor for orphaned PVCs. Run the orphan-detection script or alert on PVCs not mounted by any pod for >48 hours. Orphaned PVCs = orphaned cloud volumes = unexpected charges.
Never reuse a Released PV without inspecting data. Removing claimRef makes the PV bindable again with previous data intact. Verify the volume is sanitized or explicitly acceptable for the new consumer.
Set volumeName for static binding, not just a label selector. Label selectors can match multiple PVs; volumeName is deterministic and prevents unexpected cross-binding in shared clusters.
Use allowVolumeExpansion: true on all StorageClasses. It costs nothing. Enabling it retroactively on a StorageClass does not expand existing PVCs; it only allows future expansion requests.
Take a volume snapshot before any schema migration or data transformation. A snapshot or PVC clone gives you a rollback path that does not depend on the application. See 05-volume-snapshots.html.
Avoid manual PV creation in dynamic clusters. Static PVs require operator expertise to manage correctly (claimRef, reclaim, phase transitions). Use dynamic provisioning via StorageClass for all new workloads; reserve static binding for imports of existing cloud volumes.




  