Persistent Volumes
A complete deep dive into PersistentVolumes and PersistentVolumeClaims — the binding algorithm, finalizers, protection, StatefulSet volumeClaimTemplates, orphaned PVC cleanup, PV migration, and every edge case that causes data loss or stuck pods in production.
Object Model
PersistentVolumes (PVs) and PersistentVolumeClaims (PVCs) implement a two-level indirection: administrators provision storage capacity (PVs), and developers request storage (PVCs). Kubernetes binds them together.
CLUSTER SCOPE                            NAMESPACE SCOPE
─────────────────────────────────        ─────────────────────────────────
PersistentVolume (PV)                    PersistentVolumeClaim (PVC)
  name: pv-prod-db-001                     name: data
  capacity: 100Gi                          namespace: production
  accessModes: [RWO]                       requests.storage: 100Gi
  storageClassName: gp3                    accessModes: [RWO]
  reclaimPolicy: Retain                    storageClassName: gp3
  status.phase: Bound ◄──────────────────  status.phase: Bound
  spec.claimRef:                           spec.volumeName: pv-prod-db-001
    namespace: production
    name: data
    uid: abc-123
                   ↕ (1:1 binding)
StorageClass (cluster-scoped)
  name: gp3
  provisioner: ebs.csi.aws.com
  volumeBindingMode: WaitForFirstConsumer
  reclaimPolicy: Delete
Key ownership rules:
- One PV can be bound to exactly one PVC at a time.
- One PVC is bound to exactly one PV.
- PVs are cluster-scoped; PVCs are namespace-scoped. Once a PVC in namespace B has claimed a PV, a PVC in namespace A cannot bind to it.
- The binding is recorded on both objects: the PV's spec.claimRef and the PVC's spec.volumeName. Once bound, both objects report status.phase: Bound.
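The claimRef rules can be modeled as a small decision function. What follows is a hypothetical POSIX shell sketch, not the PV controller's actual code: the function name claimref_allows and its argument order are invented for illustration, and it assumes every other binding check has already passed. It shows why the UID in claimRef matters: a PVC deleted and recreated under the same name gets a new UID and is refused.

```shell
#!/bin/sh
# Hypothetical model of the claimRef rule; not the PV controller's real code.
# Usage: claimref_allows REF_NS REF_NAME REF_UID PVC_NS PVC_NAME PVC_UID
# Pass "" for claimRef fields that are not set. Prints "yes" or "no".
claimref_allows() {
  ref_ns=$1 ref_name=$2 ref_uid=$3
  pvc_ns=$4 pvc_name=$5 pvc_uid=$6

  # No claimRef: the PV is Available to any PVC that passes the other checks.
  if [ -z "$ref_ns" ] && [ -z "$ref_name" ]; then
    echo yes
    return
  fi
  # claimRef names a different PVC: the PV is reserved for (or bound to) it.
  if [ "$ref_ns" != "$pvc_ns" ] || [ "$ref_name" != "$pvc_name" ]; then
    echo no
    return
  fi
  # Same namespace/name. An admin pre-bind may leave the UID empty; otherwise
  # the UID must match, so a deleted-and-recreated PVC does not inherit the PV.
  if [ -z "$ref_uid" ] || [ "$ref_uid" = "$pvc_uid" ]; then
    echo yes
  else
    echo no
  fi
}

claimref_allows production data abc-123 production data abc-123   # yes
claimref_allows production data abc-123 production data xyz-999   # no: same name, new UID
claimref_allows "" "" "" staging cache def-456                     # yes: unclaimed PV
```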
PersistentVolume Spec — Full Reference
apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv-prod-db-001
  labels:
    type: ssd                   # used by PVC label selectors
    env: production
  annotations:
    pv.kubernetes.io/provisioned-by: ebs.csi.aws.com    # set by dynamic provisioner
    volume.kubernetes.io/storage-provisioner: ebs.csi.aws.com
spec:
  # ── Storage capacity ──────────────────────────────────────────
  capacity:
    storage: 100Gi              # reported capacity; CSI driver may report actual
  # ── Access modes ──────────────────────────────────────────────
  accessModes:
    - ReadWriteOnce             # RWO | ROX | RWX | RWOP (GA 1.29)
  # ── Volume mode ───────────────────────────────────────────────
  volumeMode: Filesystem        # Filesystem (default) | Block
  # ── Reclaim policy ────────────────────────────────────────────
  persistentVolumeReclaimPolicy: Retain   # Retain | Delete | Recycle (deprecated)
  # ── StorageClass association ───────────────────────────────────
  storageClassName: gp3         # must match PVC storageClassName to bind
                                # empty string = no class; not same as unset
  # ── Mount options (passed to mount command) ───────────────────
  mountOptions:
    - noatime
    - discard
  # ── CSI source ────────────────────────────────────────────────
  csi:
    driver: ebs.csi.aws.com
    volumeHandle: vol-0abc123def456789   # unique ID on storage backend
    fsType: ext4
    readOnly: false
    volumeAttributes:
      throughput: "250"
      iops: "3000"
  # ── Topology constraint (CSI / local volumes) ─────────────────
  nodeAffinity:
    required:
      nodeSelectorTerms:
        - matchExpressions:
            - key: topology.ebs.csi.aws.com/zone
              operator: In
              values: [us-east-1a]
  # ── Claim reference (normally set by the bind controller; set by hand only to pre-bind or recover a PV) ──
  claimRef:
    apiVersion: v1
    kind: PersistentVolumeClaim
    name: data
    namespace: production
    uid: abc-def-123            # UID prevents rebind to new PVC with same name
Setting an empty string (storageClassName: "") explicitly means "no StorageClass": the PV will only bind to PVCs that also have storageClassName: "". On a PVC, omitting storageClassName causes the default StorageClass to be used (filled in by the DefaultStorageClass admission controller). These are different values with different binding behavior.
PersistentVolumeClaim Spec — Full Reference
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data
  namespace: production
  annotations:
    volume.beta.kubernetes.io/storage-provisioner: ebs.csi.aws.com   # set by provisioner
spec:
  # ── Storage request ───────────────────────────────────────────
  resources:
    requests:
      storage: 100Gi            # minimum; may bind to larger PV (static binding)
  # ── Access modes ──────────────────────────────────────────────
  accessModes:
    - ReadWriteOnce
  # ── Volume mode ───────────────────────────────────────────────
  volumeMode: Filesystem        # must match PV volumeMode
  # ── StorageClass ──────────────────────────────────────────────
  storageClassName: gp3         # "" = static binding with no class
                                # omit = default StorageClass
  # ── Static binding: target specific PV ───────────────────────
  volumeName: pv-prod-db-001    # skip binding algorithm; pin to this PV
  # ── Static binding: label selector ───────────────────────────
  selector:
    matchLabels:
      type: ssd
      env: production
    matchExpressions:
      - key: tier
        operator: In
        values: [fast, ultra-fast]
  # ── Data population sources ───────────────────────────────────
  dataSource:                   # clone from existing PVC or restore from snapshot
    kind: PersistentVolumeClaim
    name: source-pvc            # must be in same namespace, same StorageClass
  # dataSourceRef with a namespace field can reference a VolumeSnapshot in another
  # namespace (alpha since 1.26: needs the CrossNamespaceVolumeDataSource feature
  # gate plus a ReferenceGrant in the source namespace)
  # dataSourceRef:
  #   apiGroup: snapshot.storage.k8s.io
  #   kind: VolumeSnapshot
  #   name: shared-backup
  #   namespace: backup-ns
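The difference between an omitted storageClassName and an explicit empty string is easy to miss in the PVC spec above. A minimal POSIX shell sketch of the resolution rule follows; the function effective_class and the result labels unset and none are hypothetical, and it models only the defaulting step, not binding.

```shell
#!/bin/sh
# Hypothetical model of storageClassName resolution for a new PVC.
# Usage: effective_class DEFAULT_SC [PVC_CLASS]
#   DEFAULT_SC  name of the cluster's default StorageClass, or "" if none
#   PVC_CLASS   omit it to model an omitted field; pass "" for storageClassName: ""
effective_class() {
  default_sc=$1
  if [ "$#" -lt 2 ]; then
    # Field omitted: admission fills in the default StorageClass, if any.
    if [ -n "$default_sc" ]; then
      echo "$default_sc"
    else
      echo unset    # no default yet; 1.26+ can assign one retroactively
    fi
  elif [ -z "$2" ]; then
    echo none       # explicit "": never defaulted, binds only to class-less PVs
  else
    echo "$2"
  fi
}

effective_class gp3          # gp3
effective_class gp3 ""       # none
effective_class gp3 io2      # io2
```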
Binding Algorithm
The PersistentVolume controller (inside kube-controller-manager) runs a continuous reconciliation loop. For each unbound PVC, it searches for the best-fit PV:
- storageClassName match — the PV's storageClassName must equal the PVC's. Both empty strings match each other.
- accessModes check — The PV must support all modes the PVC requests (PV's set ⊇ PVC's set). A PV with RWO+ROX satisfies a PVC requesting only RWO.
- volumeMode check — Must match exactly (both Filesystem or both Block).
- capacity check — PV capacity ≥ PVC requested storage.
- label selector check — If the PVC has a selector, PV labels must satisfy it.
Among all matching PVs, the controller selects the smallest capacity PV that satisfies the request (best-fit). This minimizes wasted capacity. If the PVC has volumeName, the search is skipped and that specific PV is targeted directly; the PV must still pass the checks above, otherwise the PVC stays Pending.
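The checks and the best-fit rule combine into a simple filter-then-minimize loop. Below is a hypothetical POSIX shell model (best_fit_pv and its whitespace-separated input format are invented here); it assumes the candidates are already Available PVs, uses whole GiB for capacity, and leaves label selectors out.

```shell
#!/bin/sh
# Hypothetical sketch of the best-fit PV search; a model, not controller code.
# stdin: one Available PV per line:
#   NAME  STORAGECLASS  VOLUMEMODE  CAPACITY_GI  ACCESS_MODES
# ACCESS_MODES is comma-separated (e.g. RWO,ROX); "-" stands for an empty class.
# Usage: best_fit_pv CLASS VOLUMEMODE REQUEST_GI ACCESS_MODES < pv-list
best_fit_pv() {
  want_class=$1 want_vmode=$2 want_gi=$3 want_modes=$4
  best="" best_gi=""
  while read -r name class vmode gi modes; do
    [ -n "$name" ] || continue
    [ "$class" = "$want_class" ] || continue   # storageClassName must match
    [ "$vmode" = "$want_vmode" ] || continue   # volumeMode must match exactly
    [ "$gi" -ge "$want_gi" ] || continue       # capacity >= request
    ok=1                                       # PV must offer every requested mode
    for m in $(echo "$want_modes" | tr ',' ' '); do
      case ",$modes," in
        *",$m,"*) ;;
        *) ok=0 ;;
      esac
    done
    [ "$ok" -eq 1 ] || continue
    # Best fit: keep the smallest PV that satisfies the claim.
    if [ -z "$best" ] || [ "$gi" -lt "$best_gi" ]; then
      best=$name best_gi=$gi
    fi
  done
  if [ -n "$best" ]; then echo "$best"; fi
}

# Prints pv-b: pv-a is larger, pv-c too small, pv-d Block, pv-e another class.
best_fit_pv gp3 Filesystem 100 RWO <<'EOF'
pv-a  gp3  Filesystem  500  RWO
pv-b  gp3  Filesystem  120  RWO,ROX
pv-c  gp3  Filesystem  80   RWO
pv-d  gp3  Block       100  RWO
pv-e  gp2  Filesystem  100  RWO
EOF
```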
WaitForFirstConsumer and Topology-Aware Binding
When the StorageClass uses volumeBindingMode: WaitForFirstConsumer, the binding algorithm is deferred until a pod using the PVC is scheduled. The scheduler selects a node first, then a PV is provisioned (or a matching static PV selected) in the zone matching that node's topology labels.
WaitForFirstConsumer binding sequence:
1. PVC created → stays in Pending (no PV yet)
2. Pod created referencing PVC
3. Scheduler selects node N in zone us-east-1a
4. Scheduler annotates PVC: volume.kubernetes.io/selected-node: node-N
5. External provisioner sees the annotation
6. Dynamic provisioner creates PV in us-east-1a
7. Bind controller binds PVC → PV
8. Kubelet on node-N mounts the volume
9. Pod starts
Without WaitForFirstConsumer (Immediate):
1. PVC created → provisioner immediately creates PV (zone may be random)
2. Pod scheduler must find node in same zone as PV
3. If no nodes in that zone → Pod stuck Pending indefinitely
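The zone problem in the Immediate sequence reduces to one question: is there any node in the zone where the volume already lives? The hypothetical POSIX shell function below (pod_outcome is an invented name, and real scheduling weighs far more than zones) contrasts the two binding modes under that simplification.

```shell
#!/bin/sh
# Hypothetical model of the zone problem; real scheduling weighs far more.
# Usage: pod_outcome MODE PV_ZONE NODE_ZONE...
#   MODE     Immediate or WaitForFirstConsumer
#   PV_ZONE  zone the provisioner picked up front (ignored for WaitForFirstConsumer)
pod_outcome() {
  mode=$1 pv_zone=$2
  shift 2
  if [ "$#" -eq 0 ]; then echo Pending; return; fi   # no nodes at all
  if [ "$mode" = WaitForFirstConsumer ]; then
    echo Schedulable   # node is chosen first; the volume follows its zone
    return
  fi
  for zone in "$@"; do
    if [ "$zone" = "$pv_zone" ]; then echo Schedulable; return; fi
  done
  echo Pending         # Immediate: no node exists in the PV's zone
}

pod_outcome Immediate us-east-1c us-east-1a us-east-1b      # Pending
pod_outcome WaitForFirstConsumer - us-east-1a us-east-1b    # Schedulable
```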
PV and PVC Phase State Machines
PV Phases
| Phase | Meaning | Transitions To |
|---|---|---|
| Available | PV exists, not bound to any PVC | Bound (when matching PVC found) |
| Bound | PV is bound 1:1 to a PVC | Released (when PVC is deleted) |
| Released | PVC deleted; PV retains data; claimRef still set | Available (admin removes claimRef), Failed (reclamation error), Deleted (Delete policy) |
| Failed | Automatic reclamation failed (e.g., cloud volume delete API error) | Requires manual intervention |
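The PV table reads naturally as a state machine. Below it is encoded as a hypothetical POSIX shell transition function; the event names are invented for this sketch, and Deleted stands for the PV object being removed rather than a real phase.

```shell
#!/bin/sh
# Hypothetical encoding of the PV phase table as a transition function.
# Usage: pv_next_phase CURRENT_PHASE EVENT
# Event names are invented for this sketch: claim_bound, claim_deleted,
# claimref_removed, reclaim_failed, reclaim_deleted.
pv_next_phase() {
  case "$1:$2" in
    Available:claim_bound)     echo Bound ;;
    Bound:claim_deleted)       echo Released ;;
    Released:claimref_removed) echo Available ;;   # admin cleared spec.claimRef
    Released:reclaim_failed)   echo Failed ;;      # e.g. cloud delete API error
    Released:reclaim_deleted)  echo Deleted ;;     # Delete policy: PV object removed
    *)                         echo "$1" ;;        # no change; Failed needs manual action
  esac
}

# Walk a Retain-style lifecycle: bind, release, then manual claimRef removal.
phase=Available
for event in claim_bound claim_deleted claimref_removed; do
  phase=$(pv_next_phase "$phase" "$event")
  echo "$event -> $phase"
done
```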
PVC Phases
| Phase | Meaning | Common Causes |
|---|---|---|
| Pending | PVC created but not yet bound to a PV | No matching PV, provisioner error, WaitForFirstConsumer waiting for pod |
| Bound | Bound to a PV; ready for pods to use | — |
| Lost | Bound PV has been deleted or is unavailable | Admin deleted PV manually while PVC was bound; node failure with local PV |
Pods that use a Lost PVC cannot mount it and stay stuck (for example, in ContainerCreating). The data may still exist on the storage backend; only the PV object was deleted. To recover: create a new PV with the same volumeHandle pointing to the existing cloud volume, then manually set claimRef on the new PV to point to the lost PVC. The PVC will re-bind.
Finalizers and Deletion Protection
Kubernetes uses finalizers to prevent accidental deletion of PVs and PVCs while they are in use.
PVC Protection Finalizer
When a PVC is created, the StorageObjectInUseProtection admission plugin adds kubernetes.io/pvc-protection as a finalizer. The PVC protection controller in kube-controller-manager removes this finalizer only when no pod is actively using the PVC. If you run kubectl delete pvc <name> while a pod has it mounted:
- PVC moves to Terminating (deletion timestamp set)
- Finalizer prevents actual deletion
- Pod continues running with the volume
- When the pod is deleted, the PVC protection controller removes the finalizer
- PVC is deleted; PV moves to Released
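To make the protection sequence concrete, the hypothetical POSIX shell function below (pvc_state is an invented name) reduces it to its two inputs: whether deletion was requested and how many pods still use the claim. It assumes the PVC was Bound and ignores any other finalizers a PVC might carry.

```shell
#!/bin/sh
# Hypothetical model of the pvc-protection sequence (not controller code).
# Usage: pvc_state DELETE_REQUESTED PODS_USING_PVC
#   DELETE_REQUESTED  yes once `kubectl delete pvc` has run, otherwise no
#   PODS_USING_PVC    number of pods that still reference the claim
pvc_state() {
  if [ "$1" != yes ]; then
    echo Bound
  elif [ "$2" -gt 0 ]; then
    echo Terminating   # deletionTimestamp set; kubernetes.io/pvc-protection still present
  else
    echo Deleted       # controller removed the finalizer; the object disappears
  fi
}

pvc_state yes 1   # Terminating: pod still running with the volume
pvc_state yes 0   # Deleted: last pod gone, finalizer removed
```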
# Check finalizers on a PVC
kubectl get pvc data -n production -o jsonpath='{.metadata.finalizers}'
# ["kubernetes.io/pvc-protection"]
# PVC stuck in Terminating? Check if any pod in the same namespace is still mounting it
kubectl get pods -n production -o json | \
  jq -r '.items[] | select(any(.spec.volumes[]?; .persistentVolumeClaim.claimName == "data")) | .metadata.name'
PV Protection Finalizer
Similarly, PVs have a kubernetes.io/pv-protection finalizer that prevents deletion while the PV is bound to a PVC. Deleting a bound PV moves it to Terminating; it completes deletion only when the PVC is deleted first.
# Force-remove a stuck finalizer (DANGEROUS — only if you are sure data is not needed)
kubectl patch pv <name> -p '{"metadata":{"finalizers":null}}'
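Force-removing the pvc-protection finalizer allows immediate PVC deletion even while a pod is using it. The PV is released and, under a Delete reclaim policy, its backing volume is queued for deletion while the application may still be writing to it; the pod cannot remount its storage after a restart. This can corrupt or destroy any database or stateful application using the volume. Only do this for test environments or when data loss is explicitly acceptable.
Reclaim Policy Mechanics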
Retain
When the PVC is deleted, the PV moves to Released. The underlying storage (cloud volume, NFS directory, etc.) is preserved. The PV's claimRef still points to the deleted PVC — preventing automatic rebinding. An administrator must manually intervene:
# Option 1: Delete the PV and recreate it (cleanest)
kubectl delete pv <name>
# Manually create a new PV pointing to the same storage backend volumeHandle
# Option 2: Remove claimRef to make the PV available again
kubectl patch pv <name> --type=json \
-p '[{"op":"remove","path":"/spec/claimRef"}]'
# PV moves back to Available; can be claimed by a new PVC
Delete
When the PVC is deleted, the external-provisioner sidecar calls the CSI DeleteVolume RPC. The cloud volume is deleted immediately. The PV object is also deleted. This is the default for dynamically-provisioned cloud StorageClasses.
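The two policies differ only in what happens after the PVC goes away. A hypothetical POSIX shell summary (on_pvc_delete is an invented name; its output strings are shorthand, and only Released is a real PV phase) makes the contrast explicit and rejects unknown values.

```shell
#!/bin/sh
# Hypothetical summary of what deleting a bound PVC does, per reclaim policy.
# Usage: on_pvc_delete RECLAIM_POLICY
on_pvc_delete() {
  case "$1" in
    Retain)  echo "PV=Released storage=kept" ;;     # admin cleans up or removes claimRef
    Delete)  echo "PV=deleted storage=deleted" ;;   # provisioner calls CSI DeleteVolume
    Recycle) echo "deprecated: use Delete or Retain" ;;
    *)       echo "unknown policy: $1" >&2; return 1 ;;
  esac
}

on_pvc_delete Retain
on_pvc_delete Delete
```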
Changing Reclaim Policy on an Existing PV
The StorageClass's reclaimPolicy only applies at provisioning time. Dynamically created PVs inherit the StorageClass policy, but you can change it on individual PVs after creation:
# Change a dynamically-provisioned PV from Delete to Retain
kubectl patch pv <pv-name> \
-p '{"spec":{"persistentVolumeReclaimPolicy":"Retain"}}'
# Verify
kubectl get pv <pv-name> -o jsonpath='{.spec.persistentVolumeReclaimPolicy}'
Before running kubectl delete statefulset or helm uninstall, patch all PVs used by the StatefulSet to Retain. This ensures the cloud volumes survive even if the PVCs are deleted. You can then inspect and manually clean up, or rebind to a new StatefulSet.
StatefulSet volumeClaimTemplates
StatefulSets are the primary consumer of PVCs in production. They use volumeClaimTemplates to automatically provision a dedicated PVC for each replica. The PVC names follow a deterministic pattern: <template-name>-<statefulset-name>-<ordinal>.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: postgres
  namespace: production
spec:
  serviceName: postgres
  replicas: 3
  selector:
    matchLabels:
      app: postgres
  template:
    metadata:
      labels:
        app: postgres
    spec:
      containers:
        - name: postgres
          image: postgres:16
          volumeMounts:
            - name: data         # matches volumeClaimTemplates[0].metadata.name
              mountPath: /var/lib/postgresql/data
            - name: wal          # matches volumeClaimTemplates[1].metadata.name
              mountPath: /var/lib/postgresql/wal
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes: [ReadWriteOnce]
        storageClassName: gp3-encrypted
        resources:
          requests:
            storage: 100Gi
    - metadata:
        name: wal
      spec:
        accessModes: [ReadWriteOnce]
        storageClassName: gp3-encrypted
        resources:
          requests:
            storage: 20Gi
This creates PVCs with names:
data-postgres-0, data-postgres-1, data-postgres-2
wal-postgres-0, wal-postgres-1, wal-postgres-2
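The naming rule is deterministic enough to compute ahead of time, for example to pre-create PVCs or to find the ones a StatefulSet left behind. A minimal POSIX shell sketch (sts_pvc_names is a hypothetical helper) reproduces the six names above.

```shell
#!/bin/sh
# Sketch of the <template-name>-<statefulset-name>-<ordinal> naming rule.
# Usage: sts_pvc_names STATEFULSET REPLICAS TEMPLATE...
sts_pvc_names() {
  sts=$1 replicas=$2
  shift 2
  for tmpl in "$@"; do
    i=0
    while [ "$i" -lt "$replicas" ]; do
      echo "$tmpl-$sts-$i"
      i=$((i + 1))
    done
  done
}

sts_pvc_names postgres 3 data wal
```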
Stable PVC Identity
When a StatefulSet pod is deleted and rescheduled (on the same or different node), it reattaches to the same PVC with the same name. postgres-0 always uses data-postgres-0. This is the key property that makes StatefulSets suitable for databases — pod identity is coupled to storage identity.
Orphaned PVCs — The StatefulSet Deletion Trap
When you delete a StatefulSet, Kubernetes does not delete the PVCs created by volumeClaimTemplates. They are orphaned — no owner reference, no automatic cleanup. This is intentional (prevents accidental data loss), but it means:
- PVCs accumulate after CI/CD teardown, failed releases, or namespace cleanup
- Cloud volumes continue to be billed even after the StatefulSet is gone
- A reinstalled StatefulSet with the same name will reuse the existing PVCs (the exact same data)
# Quick check: list PVCs that are not Bound (Pending/Lost); this does NOT find Bound PVCs that no pod uses
kubectl get pvc -n <ns> | grep -v Bound
# More precise: find PVCs with no active pod consumer
for pvc in $(kubectl get pvc -n production -o name); do
name=$(echo $pvc | cut -d/ -f2)
pods=$(kubectl get pods -n production -o json | \
jq -r --arg PVC "$name" '.items[] | select(.spec.volumes[]?.persistentVolumeClaim.claimName==$PVC) | .metadata.name')
if [ -z "$pods" ]; then
echo "ORPHANED: $name"
fi
done
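The loop above runs kubectl get pods once per PVC. A cheaper variant fetches both lists once and compares them locally. Below is a hypothetical POSIX shell sketch of that comparison (orphaned_pvcs is an invented name); the kubectl calls appear only as comments so the function can be tried without a cluster, and they assume kubectl's default jsonpath behavior of printing nothing for pod volumes that are not PVCs.

```shell
#!/bin/sh
# Hypothetical helper: print every PVC name that no pod references.
# Usage: orphaned_pvcs "ALL_PVC_NAMES" "CLAIMED_NAMES"   (newline-separated lists)
orphaned_pvcs() {
  printf '%s\n' "$1" | while read -r pvc; do
    [ -n "$pvc" ] || continue
    # Exact whole-line match, so "data" does not match "data-postgres-0".
    if ! printf '%s\n' "$2" | grep -Fqx -- "$pvc"; then
      echo "ORPHANED: $pvc"
    fi
  done
}

# Against a cluster (one call per resource type instead of one per PVC):
#   all=$(kubectl get pvc -n production -o jsonpath='{range .items[*]}{.metadata.name}{"\n"}{end}')
#   used=$(kubectl get pods -n production -o jsonpath='{range .items[*]}{range .spec.volumes[*]}{.persistentVolumeClaim.claimName}{"\n"}{end}{end}')
#   orphaned_pvcs "$all" "$used"
orphaned_pvcs "$(printf 'data-postgres-0\ndata-postgres-1\nscratch')" "data-postgres-0"
```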
Scale-Down Behavior
Scaling a StatefulSet from 3 to 1 deletes postgres-2 and postgres-1 pods, but leaves data-postgres-2 and data-postgres-1 PVCs intact. Scaling back up to 3 reattaches to the existing PVCs. This is correct for databases where data must survive scale-down.
If you explicitly want PVCs deleted on scale-down (e.g., ephemeral read replicas), use the whenDeleted and whenScaled fields in spec.persistentVolumeClaimRetentionPolicy (beta and enabled by default since 1.27, GA in 1.32):
spec:
  persistentVolumeClaimRetentionPolicy:
    whenDeleted: Retain   # Retain | Delete — on StatefulSet delete
    whenScaled: Delete    # Retain | Delete — on scale-down
    # Delete removes PVCs as pods are scaled down
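With whenScaled: Delete, the PVCs that disappear are exactly those of the removed ordinals. A hypothetical POSIX shell sketch of that calculation (the function name is invented) lists them highest ordinal first, matching the default OrderedReady scale-down order; it only computes names and deletes nothing.

```shell
#!/bin/sh
# Hypothetical helper: PVC names that whenScaled: Delete would remove.
# Usage: pvcs_removed_on_scale_down STATEFULSET FROM_REPLICAS TO_REPLICAS TEMPLATE...
# Only prints names; it does not touch the cluster.
pvcs_removed_on_scale_down() {
  sts=$1 from=$2 to=$3
  shift 3
  i=$((from - 1))
  while [ "$i" -ge "$to" ]; do      # ordinals FROM-1 down to TO are removed
    for tmpl in "$@"; do
      echo "$tmpl-$sts-$i"
    done
    i=$((i - 1))
  done
}

pvcs_removed_on_scale_down postgres 3 1 data wal
```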
whenScaled: Delete means scaling down from 3→1 will permanently delete data-postgres-2 and data-postgres-1 (plus the matching wal-* PVCs) and, when their PVs use the Delete reclaim policy, their underlying cloud volumes. Only use this for stateless or easily re-seeded replicas (read replicas that can be resynced from primary).
PVC Cloning
A PVC can be created as a clone of an existing PVC using dataSource. The CSI driver's CreateVolume with a data source creates a new volume pre-populated with the source's data. Constraints:
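- Source and destination PVC must be in the same namespace
- Source and destination must use the same StorageClass
- Destination capacity ≥ source capacity
- Source PVC must be Bound at the time of cloning
- Source and destination must use the same volumeMode
- The CSI driver must implement cloning; not every driver does (the AWS EBS CSI driver, for example, does not, so use snapshot + restore there)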
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: postgres-data-clone
  namespace: production
spec:
  dataSource:
    kind: PersistentVolumeClaim
    name: data-postgres-0       # source PVC (must be Bound, same namespace)
  accessModes: [ReadWriteOnce]
  storageClassName: gp3-encrypted
  resources:
    requests:
      storage: 100Gi            # must be ≥ source capacity
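A subset of the constraints above (same namespace, Bound source, destination at least as large) can be checked before submitting the manifest. The POSIX shell function below is a hypothetical pre-flight sketch (can_clone and its arguments are invented); capacities are whole GiB, and StorageClass and CSI driver support are deliberately left out because they depend on the cluster.

```shell
#!/bin/sh
# Hypothetical pre-flight check for part of the clone constraints.
# Usage: can_clone SRC_NS SRC_PHASE SRC_GI DST_NS DST_GI
can_clone() {
  src_ns=$1 src_phase=$2 src_gi=$3 dst_ns=$4 dst_gi=$5
  if [ "$src_ns" != "$dst_ns" ]; then
    echo "no: different namespace"; return 1
  fi
  if [ "$src_phase" != Bound ]; then
    echo "no: source is $src_phase, not Bound"; return 1
  fi
  if [ "$dst_gi" -lt "$src_gi" ]; then
    echo "no: destination smaller than source"; return 1
  fi
  echo yes
}

can_clone production Bound 100 production 100   # yes
```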
Restore from Snapshot
Restore a PVC from a VolumeSnapshot using dataSource with the snapshot as source. The snapshot must be in the same namespace:
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: postgres-restored
  namespace: production
spec:
  dataSource:
    apiGroup: snapshot.storage.k8s.io
    kind: VolumeSnapshot
    name: postgres-snap-2024-01-15   # VolumeSnapshot in same namespace
  accessModes: [ReadWriteOnce]
  storageClassName: gp3-encrypted
  resources:
    requests:
      storage: 100Gi
dataSourceRef — Cross-Namespace Restore (1.26+)
dataSourceRef extends the clone/restore model with an optional namespace field, so a PVC can reference a VolumeSnapshot that lives in another namespace. This is an alpha feature (introduced in 1.26): it requires the CrossNamespaceVolumeDataSource feature gate on kube-apiserver, kube-controller-manager, and the CSI external-provisioner, plus a ReferenceGrant (gateway.networking.k8s.io API) in the source namespace that allows PVCs from the target namespace to reference the snapshot:
spec:
  dataSourceRef:
    apiGroup: snapshot.storage.k8s.io
    kind: VolumeSnapshot
    name: shared-backup
    namespace: backup-ns        # namespace that holds the VolumeSnapshot
Default StorageClass
A StorageClass can be designated as the default by adding the annotation storageclass.kubernetes.io/is-default-class: "true". When a PVC is created without specifying storageClassName, the default StorageClass is used.
# Check which StorageClass is default
kubectl get storageclass
# NAME            PROVISIONER             RECLAIMPOLICY   VOLUMEBINDINGMODE      ...
# gp2             kubernetes.io/aws-ebs   Delete          Immediate              ...
# gp3 (default)   ebs.csi.aws.com         Delete          WaitForFirstConsumer   ...
# Change the default StorageClass
kubectl patch storageclass gp2 -p '{"metadata":{"annotations":{"storageclass.kubernetes.io/is-default-class":"false"}}}'
kubectl patch storageclass gp3 -p '{"metadata":{"annotations":{"storageclass.kubernetes.io/is-default-class":"true"}}}'
Retroactive Default Assignment (1.26+)
In Kubernetes 1.26+, if a PVC was created with storageClassName unset (not set to "", which opts out of defaulting) while no default StorageClass existed, a newly designated default StorageClass is retroactively assigned to the Pending PVC. This allows fixing PVCs that are stuck Pending due to a missing default without recreating them.
In-Tree to CSI Migration
Kubernetes has been migrating in-tree volume plugins to CSI drivers. Most major cloud providers completed migration by 1.27–1.28. When migration is active for a plugin, kube-controller-manager and the kubelet transparently route in-tree volume operations to the CSI driver.
Migration Annotations
# On a dynamically provisioned PV, these annotations track migration:
annotations:
  pv.kubernetes.io/provisioned-by: kubernetes.io/aws-ebs   # original in-tree provisioner
  pv.kubernetes.io/migrated-to: ebs.csi.aws.com            # CSI driver taking over
Migration Impact on Existing Clusters
| Scenario | Behavior |
|---|---|
| Existing PVs created by in-tree plugin | Transparently served by CSI driver after migration; no PV changes needed |
| StorageClass with in-tree provisioner name | Not rewritten (the provisioner field is immutable); on clusters with migration enabled, provisioning requests are redirected to the CSI driver |
| In-tree plugin removed from Kubernetes | Old PVs using that plugin type are invalid; must migrate before upgrading |
| VolumeAttachment objects | New CSI-style VolumeAttachment created; old in-tree attachment may linger |
Detach and Force-Detach Scenarios
A PV attached to a node is represented by a VolumeAttachment object. Under normal operation, kubelet unmounts volumes before pod deletion. Problems arise when a node becomes unreachable.
Node Failure with Attached PV
Node N fails (network partition / power loss)
  │
  ▼
Node N's status becomes NotReady
  │
  ├─► After node-monitor-grace-period (default 40s):
  │     Node marked Unreachable
  │
  ├─► After 5min (default 300s toleration for the unreachable taint):
  │     Pods on N get deletion timestamp set
  │
  ├─► New pod scheduled on node M
  │     ↓
  │   Attach/detach controller asks the CSI driver to attach the volume to node M
  │     ↓
  │   BLOCKED: EBS volume still "attached" to node N in AWS
  │   (RWO volume: the new pod reports a Multi-Attach error)
  │
  └─► Force-detach after max-wait (default 6min):
        Attach/detach controller detaches the volume from N (ControllerUnpublishVolume)
        Volume attached to node M, pod starts
Stuck VolumeAttachment
# View VolumeAttachment objects
kubectl get volumeattachment
# Identify which node a PV is attached to
kubectl get volumeattachment -o json | \
jq -r '.items[] | select(.spec.source.persistentVolumeName=="pv-prod-db-001") |
"\(.metadata.name) → node: \(.spec.nodeName) attached: \(.status.attached)"'
# If node is truly gone and attachment is stuck, force delete the VolumeAttachment
# This tells the CSI driver to release the attachment
kubectl delete volumeattachment <name>
# WARNING: Only do this if the node is confirmed gone or the volume is not mounted.
# Force-deleting an attachment while a node is still mounting the volume
# can cause data corruption if two nodes mount the volume simultaneously.
PVC Resizing Deep Dive
Volume expansion requires allowVolumeExpansion: true on the StorageClass. Expansion is a two-phase operation:
- Controller expand — CSI ControllerExpandVolume resizes the backing cloud storage (e.g., EBS volume goes from 100Gi → 200Gi). This happens while the pod may be running.
- Node expand — CSI NodeExpandVolume runs filesystem resize commands (resize2fs for ext4, xfs_growfs for xfs) on the node where the volume is mounted. This requires the volume to be mounted by a running pod.
# Step 1: Edit the PVC
kubectl patch pvc data-postgres-0 -n production \
-p '{"spec":{"resources":{"requests":{"storage":"200Gi"}}}}'
# Step 2: Watch status conditions
kubectl get pvc data-postgres-0 -n production -w -o wide
# Step 3: Check conditions if stuck
kubectl describe pvc data-postgres-0 -n production | grep -A 4 Conditions:
# Type Status
# FileSystemResizePending True
# → volume resized at cloud layer; waiting for pod to mount and run NodeExpandVolume
# Step 4: Ensure the pod is running (not CrashLoopBackOff)
# Kubelet runs NodeExpandVolume on the next mount cycle
# Step 5: Verify from inside the pod
kubectl exec -it postgres-0 -n production -- df -h /var/lib/postgresql/data
| Condition | Meaning |
|---|---|
| FileSystemResizePending: True | Cloud volume resized; filesystem resize pending — pod must be running and mounting the volume |
| Resizing: True | Resize in progress (controller expand still running) |
| No conditions, capacity updated | Resize complete |
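The API server rejects some resize requests outright, so a quick local check can save a round trip. The hypothetical sketch below (resize_check is an invented name) encodes two rules: the StorageClass must have allowVolumeExpansion: true, and the requested size may not be smaller than the current one, since Kubernetes does not support shrinking a PVC. Sizes are whole GiB.

```shell
#!/bin/sh
# Hypothetical pre-check before patching spec.resources.requests.storage.
# Usage: resize_check ALLOW_EXPANSION CURRENT_GI NEW_GI
resize_check() {
  allow=$1 current_gi=$2 new_gi=$3
  if [ "$allow" != true ]; then
    echo "rejected: allowVolumeExpansion is not true"; return 1
  fi
  if [ "$new_gi" -lt "$current_gi" ]; then
    echo "rejected: shrinking is not supported"; return 1
  fi
  if [ "$new_gi" -eq "$current_gi" ]; then
    echo "no-op"; return 0
  fi
  # Typical sequence afterwards: Resizing -> FileSystemResizePending -> no conditions
  echo ok
}

resize_check true 100 200   # ok
```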
Metrics and Alerting
Key Metrics
| Metric | Source | Alert Threshold |
|---|---|---|
| kube_persistentvolumeclaim_status_phase | kube-state-metrics | phase=Pending for >10m, phase=Lost for >1m |
| kube_persistentvolume_status_phase | kube-state-metrics | phase=Failed |
| kube_persistentvolumeclaim_resource_requests_storage_bytes | kube-state-metrics | Capacity planning: total PVC bytes per namespace |
| kubelet_volume_stats_used_bytes / kubelet_volume_stats_capacity_bytes | kubelet | >85% → warn; >95% → critical |
| storage_operation_duration_seconds{operation_name="volume_attach"} | kube-controller-manager | P99 > 60s |
| attachdetach_controller_total_volumes | kube-controller-manager | Unattached volumes growing over time |
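The >85% / >95% thresholds in the table above come down to a ratio of two kubelet gauges. The POSIX shell sketch below (volume_usage_severity is a hypothetical name) applies those thresholds to raw byte counts using integer percentages rounded down; in Prometheus the same ratio would be kubelet_volume_stats_used_bytes / kubelet_volume_stats_capacity_bytes.

```shell
#!/bin/sh
# Hypothetical helper applying the >85% warn / >95% critical thresholds.
# Usage: volume_usage_severity USED_BYTES CAPACITY_BYTES
volume_usage_severity() {
  if [ "$2" -le 0 ]; then echo "unknown: capacity is 0"; return 1; fi
  pct=$(( $1 * 100 / $2 ))   # integer percent, rounded down
  if [ "$pct" -gt 95 ]; then
    echo "critical $pct%"
  elif [ "$pct" -gt 85 ]; then
    echo "warning $pct%"
  else
    echo "ok $pct%"
  fi
}

# 90Gi used on a 100Gi volume
volume_usage_severity 96636764160 107374182400   # warning 90%
```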
Alerting Rules
groups:
  - name: persistent-volumes
    rules:
      - alert: PVCPendingTooLong
        expr: kube_persistentvolumeclaim_status_phase{phase="Pending"} == 1
        for: 10m
        labels: {severity: warning}
        annotations:
          summary: "PVC {{ $labels.namespace }}/{{ $labels.persistentvolumeclaim }} stuck Pending"
      - alert: PVCLost
        expr: kube_persistentvolumeclaim_status_phase{phase="Lost"} == 1
        for: 1m
        labels: {severity: critical}
        annotations:
          summary: "PVC {{ $labels.namespace }}/{{ $labels.persistentvolumeclaim }} is Lost — data may be inaccessible"
      - alert: PVFailed
        expr: kube_persistentvolume_status_phase{phase="Failed"} == 1
        for: 1m
        labels: {severity: critical}
        annotations:
          summary: "PV {{ $labels.persistentvolume }} in Failed state — reclamation error"
      - alert: OrphanedPVCsAccumulating
        # Bound PVCs minus distinct PVCs referenced by any pod spec, per namespace.
        # sum() counts only series with value 1; the "or ... * 0" keeps namespaces with no pods.
        expr: |
          sum by (namespace) (kube_persistentvolumeclaim_status_phase{phase="Bound"})
            - (
                count by (namespace) (count by (namespace, persistentvolumeclaim) (kube_pod_spec_volumes_persistentvolumeclaims_info))
                or sum by (namespace) (kube_persistentvolumeclaim_status_phase{phase="Bound"}) * 0
              )
            > 5
        for: 30m
        labels: {severity: warning}
        annotations:
          summary: "Namespace {{ $labels.namespace }} has more than 5 Bound PVCs not referenced by any pod"
Troubleshooting Runbooks
Runbook: PVC Stuck in Pending — Provisioning Failure
# 1. Describe PVC — check Events section
kubectl describe pvc <name> -n <ns>
# Common messages:
# "no persistent volumes available" → no matching static PV
# "waiting for a volume to be created, either by external provisioner..." → provisioner issue
# "exceeded quota" → ResourceQuota exceeded (the create request itself is rejected; for StatefulSets, check its events)
# "waiting for first consumer to be created before binding" → WaitForFirstConsumer, no pod yet
# 2. Check CSI provisioner logs
kubectl logs -n kube-system -l app=ebs-csi-controller -c csi-provisioner --tail=100 | grep -i error
# 3. Check if ResourceQuota is blocking
kubectl describe resourcequota -n <ns>
# 4. Verify StorageClass exists and provisioner is running
kubectl get storageclass <name>
kubectl get pods -n kube-system | grep csi
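The message-to-cause mapping in step 1 can be scripted for quick triage. Below is a hypothetical POSIX shell helper (pending_hint and its hint texts are invented; the match strings are substrings of the messages quoted above, and *quota* is matched loosely).

```shell
#!/bin/sh
# Hypothetical triage helper mapping a PVC event message to a next step.
# Usage: pending_hint "EVENT MESSAGE"
pending_hint() {
  case "$1" in
    *"waiting for first consumer"*)
      echo "WaitForFirstConsumer: create a pod that uses the PVC" ;;
    *"no persistent volumes available"*)
      echo "no matching static PV: compare class, size, access modes, selector" ;;
    *"waiting for a volume to be created"*)
      echo "provisioning: check the CSI provisioner logs" ;;
    *quota*)
      echo "quota: kubectl describe resourcequota -n <ns>" ;;
    *)
      echo "unrecognized: read the full kubectl describe pvc output" ;;
  esac
}

pending_hint "waiting for first consumer to be created before binding"
```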
Runbook: PVC Stuck in Terminating
# Check finalizers
kubectl get pvc <name> -n <ns> -o jsonpath='{.metadata.finalizers}'
# ["kubernetes.io/pvc-protection"]
# Find which pod is using the PVC
kubectl get pods -n <ns> -o json | jq -r \
  '.items[] | select(any(.spec.volumes[]?; .persistentVolumeClaim.claimName == "<pvc-name>")) | .metadata.name'
# Delete the pod using the PVC first, then PVC will complete deletion
kubectl delete pod <pod-name> -n <ns>
# If pod is already gone but PVC still stuck (bug/finalizer stuck):
kubectl patch pvc <name> -n <ns> -p '{"metadata":{"finalizers":null}}'
Runbook: PVC in Lost Phase
# The PV that was bound to this PVC no longer exists
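kubectl get pvc <name> -n <ns> -o jsonpath='{.spec.volumeName}'   # → pv-prod-db-001
kubectl get pv pv-prod-db-001   # → NotFound: the PV object is gone
# Recovery: recreate the PV with the same name and cloud volume ID (from monitoring / cloud console), pre-bound to the PVC's namespace, name, and uid
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: PersistentVolume
metadata: {name: pv-prod-db-001}
spec:
  capacity: {storage: <pvc-size>}
  accessModes: [ReadWriteOnce]
  persistentVolumeReclaimPolicy: Retain
  storageClassName: <pvc-storage-class>
  csi: {driver: ebs.csi.aws.com, volumeHandle: <cloud-volume-id>, fsType: ext4}
  claimRef: {namespace: <ns>, name: <name>, uid: <pvc-uid>}
EOF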
Runbook: Released PV Won't Rebind
# PV is in Released state; new PVC won't bind even though sizes/classes match
# Cause: claimRef still points to old PVC
kubectl describe pv <name> | grep -A 5 "Claim:"
# Claim: production/old-pvc ← still points to deleted PVC
# Remove claimRef to make PV Available
kubectl patch pv <name> --type=json \
-p '[{"op":"remove","path":"/spec/claimRef"}]'
# PV is now Available; create new PVC to claim it
Runbook: Orphaned PVCs After StatefulSet Deletion
# List all PVCs not mounted by any pod in a namespace
kubectl get pvc -n production --no-headers | awk '{print $1}' | while read pvc; do
pods=$(kubectl get pods -n production -o json | \
jq -r --arg PVC "$pvc" '.items[] | select(.spec.volumes[]?.persistentVolumeClaim.claimName==$PVC) | .metadata.name')
[ -z "$pods" ] && echo "ORPHANED: $pvc"
done
# Safely delete orphaned PVCs (after confirming data is not needed)
# First switch the PVs behind these PVCs to Retain so the cloud volumes survive as a safety net
for pv in $(kubectl get pvc -n production -o jsonpath='{.items[*].spec.volumeName}'); do
  kubectl patch pv "$pv" -p '{"spec":{"persistentVolumeReclaimPolicy":"Retain"}}'
done
# Then delete PVCs
kubectl delete pvc -n production <orphaned-pvc-name>
Best Practices
- Default StatefulSet PVs to Retain. Before any StatefulSet deletion or helm uninstall, patch all bound PVs to Retain. This is your last line of defence against accidental data loss.
- Use persistentVolumeClaimRetentionPolicy (beta since 1.27, GA in 1.32) in StatefulSets to explicitly declare whether PVCs should survive scale-down. Default is Retain for both, which is correct for databases but may cause cost accumulation for ephemeral replicas.
- Monitor for orphaned PVCs. Run the orphan-detection script or alert on PVCs not mounted by any pod for >48 hours. Orphaned PVCs = orphaned cloud volumes = unexpected charges.
- Never reuse a Released PV without inspecting data. Removing claimRef makes the PV bindable again with previous data intact. Verify the volume is sanitized or explicitly acceptable for the new consumer.
- Set volumeName for static binding, not just a label selector. Label selectors can match multiple PVs; volumeName is deterministic and prevents unexpected cross-binding in shared clusters.
- Use allowVolumeExpansion: true on all StorageClasses whose CSI driver supports expansion. Enabling it retroactively on a StorageClass does not expand existing PVCs; it only allows future expansion requests.
- Take a VolumeSnapshot before any schema migration or data transformation. It gives you a rollback path if the change goes wrong. See 05-volume-snapshots.html.
- Avoid manual PV creation in dynamic clusters. Static PVs require operator expertise to manage correctly (claimRef, reclaim, phase transitions). Use dynamic provisioning via StorageClass for all new workloads; reserve static binding for imports of existing cloud volumes.