Storage Overview
The complete map of Kubernetes storage — from ephemeral scratch space to durable, replicated block and file volumes — with the primitives, lifecycle, and decision framework you need to choose the right storage type for every workload.
Storage Taxonomy
Kubernetes storage can be organized along two axes: lifetime (ephemeral vs persistent) and source (cluster-native vs external CSI driver). Where a volume type falls on these axes determines how it behaves when a pod is deleted or rescheduled, or when its node fails.
┌─────────────────────────────────────────────────────────────────────┐
│                     KUBERNETES STORAGE TAXONOMY                     │
│                                                                     │
│  EPHEMERAL (tied to pod lifetime)                                   │
│  ┌─────────────┬──────────────┬────────────┬─────────────────────┐  │
│  │ emptyDir    │ configMap    │ secret     │ downwardAPI         │  │
│  │ (scratch)   │ (config      │ (creds,    │ (pod metadata)      │  │
│  │             │  files)      │  tokens)   │                     │  │
│  ├─────────────┴──────────────┴────────────┴─────────────────────┤  │
│  │ projected (combines configMap+secret+downwardAPI+SAToken)     │  │
│  ├───────────────────────────────────────────────────────────────┤  │
│  │ generic ephemeral volume (PVC created/deleted with pod)       │  │
│  │ CSI ephemeral volume (inline CSI, no PVC object)              │  │
│  └───────────────────────────────────────────────────────────────┘  │
│                                                                     │
│  PERSISTENT (outlives pod)                                          │
│  ┌───────────────────────────────────────────────────────────────┐  │
│  │ PersistentVolume (PV) ←bound to→ PersistentVolumeClaim (PVC)  │  │
│  │                                                               │  │
│  │ Provisioned by:                                               │  │
│  │   • Static: admin creates PV manually                         │  │
│  │   • Dynamic: StorageClass triggers CSI driver                 │  │
│  │                                                               │  │
│  │ Backed by:                                                    │  │
│  │   Cloud block (EBS, Persistent Disk, Azure Disk)              │  │
│  │   Cloud file (EFS, Filestore, Azure Files)                    │  │
│  │   Network (NFS, iSCSI, Ceph RBD, CephFS, GlusterFS)           │  │
│  │   Local (local PV; node affinity required)                    │  │
│  └───────────────────────────────────────────────────────────────┘  │
└─────────────────────────────────────────────────────────────────────┘
Ephemeral Volume Types
emptyDir
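An empty directory created when the pod starts. Backed by node disk by default, or by RAM (tmpfs) with medium: Memory. Shared by all containers in the pod and preserved across container restarts, but deleted when the pod is removed from the node.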
configMap
ConfigMap keys projected as files. Changes to the ConfigMap propagate into the volume within roughly a minute (kubelet sync period plus cache delay), except for subPath mounts, which do not receive updates. Optional keys and subPath are available.
secret
Secret keys projected as files with mode 0644 by default. Same update semantics as configMap. Stored in tmpfs on node (not written to disk).
downwardAPI
Pod metadata (name, namespace, labels, annotations, resource limits) exposed as files. Useful for apps that need to know their own identity at runtime.
projected
Combines configMap + secret + downwardAPI + serviceAccountToken into a single mount directory. Token has a configurable expiry (expirationSeconds).
Generic Ephemeral
Pod spec includes an inline PVC template. The PVC is created with the pod and deleted with it. Supports any StorageClass — enables ephemeral use of cloud SSDs.
CSI Ephemeral
Inline CSI volume — no PVC/PV objects created. Driver must declare volumeLifecycleModes: Ephemeral. Used for secrets stores (Secrets Store CSI Driver), node-local caches.
PersistentVolume Lifecycle
Every persistent volume in Kubernetes goes through a defined lifecycle. Understanding each phase prevents data loss and leaking cloud resources.
Provision (static or dynamic) → Bind (PVC → PV matched) → PVC Bound → Pod mounts volume → PVC deleted, PV Released → Reclaim (Retain / Delete)
PV Phase Values
| Phase | Meaning | Next Action |
|---|---|---|
| Available | PV exists, not bound to any PVC | Wait for matching PVC |
| Bound | PV is bound to exactly one PVC | Pod can mount |
| Released | PVC was deleted; data still on storage backend | Admin reclaims or deletes |
| Failed | Automatic reclamation failed | Manual intervention |
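The phase table can be read as a small state machine. The sketch below is a simplified model, not the PV controller's actual code: the function names (`bind`, `claim_deleted`, `clear_claim_ref`) and the `backend_delete_ok` flag are hypothetical, and the brief Released step that precedes a successful Delete is collapsed into a single transition.

```python
from enum import Enum


class Phase(Enum):
    AVAILABLE = "Available"
    BOUND = "Bound"
    RELEASED = "Released"
    FAILED = "Failed"


def bind(phase):
    """A matching PVC claims the PV; only an Available PV can be bound."""
    if phase is not Phase.AVAILABLE:
        raise ValueError(f"cannot bind a PV in phase {phase.value}")
    return Phase.BOUND


def claim_deleted(phase, reclaim_policy, backend_delete_ok=True):
    """Next phase after the bound PVC is deleted.

    Returns None when the PV object itself is removed (successful Delete).
    """
    if phase is not Phase.BOUND:
        raise ValueError(f"no claim to delete in phase {phase.value}")
    if reclaim_policy == "Retain":
        return Phase.RELEASED
    if reclaim_policy == "Delete":
        return None if backend_delete_ok else Phase.FAILED
    raise ValueError(f"unknown reclaim policy {reclaim_policy!r}")


def clear_claim_ref(phase):
    """An admin removes spec.claimRef from a Released PV so it can bind again."""
    if phase is not Phase.RELEASED:
        raise ValueError(f"claimRef reset only applies to Released PVs, got {phase.value}")
    return Phase.AVAILABLE
```

Under this model a Retain volume only returns to Available through the manual `clear_claim_ref` step, which is why Released PVs accumulate unless someone cleans them up.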
A Released PV cannot be rebound to a new PVC automatically, because the claimRef field still points to the old PVC. To reuse a Retained PV, clear the claimRef manually: kubectl patch pv <name> -p '{"spec":{"claimRef":null}}'
PersistentVolume Anatomy
apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv-example
  annotations:
    pv.kubernetes.io/provisioned-by: ebs.csi.aws.com # set by dynamic provisioner
spec:
  capacity:
    storage: 20Gi # declared capacity; CSI may report actual
  accessModes:
    - ReadWriteOnce # RWO: single node at a time
  volumeMode: Filesystem # or Block for raw device
  persistentVolumeReclaimPolicy: Retain
  storageClassName: gp3-encrypted
  mountOptions:
    - noatime
    - discard
  csi:
    driver: ebs.csi.aws.com
    volumeHandle: vol-0abc123def456789
    fsType: ext4
    volumeAttributes:
      throughput: "250"
  nodeAffinity: # required for local PVs, optional for CSI
    required:
      nodeSelectorTerms:
        - matchExpressions:
            - key: topology.ebs.csi.aws.com/zone
              operator: In
              values: [us-east-1a]
Access Modes
| Mode | Short | Meaning | Typical Drivers |
|---|---|---|---|
| ReadWriteOnce | RWO | One node can mount read-write. Multiple pods on the same node can use it simultaneously. | EBS, Azure Disk, GCE PD, local |
| ReadOnlyMany | ROX | Many nodes can mount read-only. | EFS, NFS, CephFS, Azure Files |
| ReadWriteMany | RWX | Many nodes can mount read-write. | EFS, CephFS, GlusterFS, NFS, Azure Files (SMB/NFS) |
| ReadWriteOncePod | RWOP | Only a single pod cluster-wide can mount read-write. GA 1.29. Stronger than RWO. | Any CSI driver (enforced by Kubernetes, not driver) |
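To make the difference between the modes concrete, here is a minimal sketch of the admission question "may one more pod mount this volume?". The function `can_mount` and its arguments are hypothetical, and the model is deliberately coarse: it tracks only which nodes already have a pod using the volume and ignores the read-only flag that ROX mounts carry.

```python
def can_mount(access_mode, nodes_in_use, target_node):
    """Return True if a new pod on target_node may mount the volume.

    nodes_in_use lists the node name of every pod already using the volume
    (one entry per pod, so a node can appear more than once).
    """
    if access_mode == "ReadWriteOncePod":
        # Exactly one pod cluster-wide: any existing user blocks the mount.
        return len(nodes_in_use) == 0
    if access_mode == "ReadWriteOnce":
        # One node at a time: extra pods are fine if they share that node.
        return all(node == target_node for node in nodes_in_use)
    if access_mode in ("ReadOnlyMany", "ReadWriteMany"):
        # Many nodes may mount (read-only or read-write respectively).
        return True
    raise ValueError(f"unknown access mode {access_mode!r}")
```

For example, `can_mount("ReadWriteOnce", ["node-a"], "node-a")` is allowed while the same call with `"ReadWriteOncePod"` is not, which is the property that makes RWOP stronger than RWO.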
Volume Modes
| volumeMode | What the pod sees | Use case |
|---|---|---|
| Filesystem (default) | A formatted filesystem directory | The vast majority of workloads: databases, app data, logs |
| Block | A raw block device path (e.g. /dev/xvda) | Software that manages its own on-disk layout and can open a raw device directly (e.g. Ceph OSDs) |
With volumeMode: Block, the container spec uses volumeDevices instead of volumeMounts:
containers:
  - name: db
    image: postgres:16 # illustrative only; the application must be able to use a raw device
    volumeDevices:
      - name: data
        devicePath: /dev/xvda # raw block, no filesystem layer
volumes:
  - name: data
    persistentVolumeClaim:
      claimName: raw-block-pvc
PersistentVolumeClaim Anatomy
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: postgres-data
  namespace: production
spec:
  accessModes:
    - ReadWriteOnce
  volumeMode: Filesystem
  resources:
    requests:
      storage: 20Gi # minimum size; dynamic provisioners normally create this size (some round up)
  storageClassName: gp3-encrypted
  # selector: # for static binding only
  #   matchLabels:
  #     type: fast-ssd
  # volumeName: pv-specific # pin to a specific PV (static only)
Binding Logic
The bind controller matches a PVC to a PV by finding the smallest PV that satisfies:
- storageClassName matches (or both are empty)
- accessModes requested ⊆ accessModes available
- volumeMode matches
- capacity ≥ requested storage
- selector labels match (if specified)
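The matching rules above can be sketched as a filter followed by a smallest-fit pick. This is an illustrative model only: the `PV` and `PVC` dataclasses and `find_match` are hypothetical names, capacities are plain GiB integers rather than Kubernetes quantities, and selectors are limited to exact label matches.

```python
from dataclasses import dataclass, field


@dataclass
class PV:
    name: str
    capacity_gi: int
    access_modes: frozenset
    storage_class: str = ""
    volume_mode: str = "Filesystem"
    labels: dict = field(default_factory=dict)


@dataclass
class PVC:
    request_gi: int
    access_modes: frozenset
    storage_class: str = ""
    volume_mode: str = "Filesystem"
    selector: dict = field(default_factory=dict)


def find_match(pvc, pvs):
    """Return the smallest PV that satisfies every binding rule, or None."""
    candidates = [
        pv for pv in pvs
        if pv.storage_class == pvc.storage_class          # class matches (or both empty)
        and pvc.access_modes <= pv.access_modes           # requested modes are a subset
        and pv.volume_mode == pvc.volume_mode             # Filesystem vs Block
        and pv.capacity_gi >= pvc.request_gi              # large enough
        and all(pv.labels.get(k) == v for k, v in pvc.selector.items())
    ]
    return min(candidates, key=lambda pv: pv.capacity_gi, default=None)
```

Note that a 20Gi claim can bind to a 100Gi PV if nothing smaller qualifies; the claim then holds the whole volume, which is one source of the over-provisioning gotchas with static PVs.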
StorageClass
A StorageClass is the template used by the dynamic provisioner to create PVs on demand. It encodes all provisioner-specific parameters so consumers need only specify a class name.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: gp3-encrypted
  annotations:
    storageclass.kubernetes.io/is-default-class: "true"
provisioner: ebs.csi.aws.com
parameters:
  type: gp3
  iops: "3000" # absolute IOPS; iopsPerGB would instead scale with volume size
  throughput: "125"
  encrypted: "true"
  kmsKeyId: arn:aws:kms:us-east-1:123456789012:key/mrk-abc123
reclaimPolicy: Delete # Delete or Retain
volumeBindingMode: WaitForFirstConsumer # or Immediate
allowVolumeExpansion: true
mountOptions:
  - noatime
Reclaim Policies
| Policy | What Happens When PVC Deleted | Production Default |
|---|---|---|
| Delete | PV and underlying cloud storage deleted immediately | ✓ Cloud StorageClasses default to Delete |
| Retain | PV moves to Released state; data preserved; admin must clean up | Preferred for stateful data (safer) |
| Recycle | Deprecated in favor of dynamic provisioning. Ran rm -rf on the volume; only NFS and hostPath supported it. | ✗ Do not use |
Cloud StorageClasses default to reclaimPolicy: Delete. If you delete a PVC (e.g., from helm uninstall without volumeClaimTemplates protection), the data is gone. Use Retain for any production database, or protect it with a Velero backup policy.
Binding Modes
| Mode | When PV is Provisioned | Use Case |
|---|---|---|
| Immediate | When PVC is created; provisioner runs immediately regardless of pod scheduling | NFS, CephFS, EFS (topology-agnostic storage) |
| WaitForFirstConsumer | When a pod using the PVC is scheduled; provisioner knows the node's zone | EBS, Azure Disk, GCE PD (zonal block storage must land in pod's AZ) |
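The practical difference between the two modes shows up when the only node with room sits in a different zone from the one the provisioner picked. The sketch below is a toy scheduler, not kube-scheduler logic: `schedule`, the node tuples, and `immediate_zone` (the zone an Immediate provisioner happened to choose) are all hypothetical.

```python
def schedule(binding_mode, nodes, immediate_zone=None):
    """Pick a node for a pod that needs one zonal volume.

    nodes is a list of (name, zone, has_room) tuples.
    Returns (node_name, volume_zone), or None if the pod would stay Pending.
    """
    if binding_mode == "Immediate":
        # The volume already exists in immediate_zone, so only nodes there qualify.
        feasible = [n for n in nodes if n[1] == immediate_zone and n[2]]
    elif binding_mode == "WaitForFirstConsumer":
        # Any node with room qualifies; the volume is created in that node's zone.
        feasible = [n for n in nodes if n[2]]
    else:
        raise ValueError(f"unknown binding mode {binding_mode!r}")
    if not feasible:
        return None
    name, zone, _ = feasible[0]
    return name, zone


nodes = [("node-a1", "us-east-1a", False), ("node-b1", "us-east-1b", True)]
print(schedule("Immediate", nodes, immediate_zone="us-east-1a"))  # None: pod stays Pending
print(schedule("WaitForFirstConsumer", nodes))                    # ('node-b1', 'us-east-1b')
```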
With Immediate, the EBS volume might be provisioned in us-east-1a before the scheduler has considered the pod. Because EBS is zonal, the pod can then run only on nodes in us-east-1a; if none of them fit (capacity, affinity, taints), the pod gets stuck in Pending even though nodes in us-east-1b have room. WaitForFirstConsumer delays provisioning until the scheduler commits to a node, ensuring co-location.
Static vs Dynamic Provisioning
| Aspect | Static Provisioning | Dynamic Provisioning |
|---|---|---|
| PV creation | Admin creates PV manifests manually | StorageClass + provisioner create PV on demand |
| Workflow | Admin creates volume in cloud console → writes PV YAML → deploys | Developer creates PVC → done |
| Flexibility | Admin controls exact volume ID, size, IOPS | Parameters in StorageClass |
| Self-service | No — requires admin for each volume | Yes |
| Pre-existing data | Yes — can import existing cloud volumes | No — always creates new volume |
| Use case | Migration, production databases with specific volume IDs, local PVs | General-purpose — recommended default |
Static Provisioning Example (import existing EBS volume)
apiVersion: v1
kind: PersistentVolume
metadata:
  name: imported-prod-db
spec:
  capacity:
    storage: 500Gi
  accessModes: [ReadWriteOnce]
  persistentVolumeReclaimPolicy: Retain
  storageClassName: "" # empty = not managed by StorageClass
  csi:
    driver: ebs.csi.aws.com
    volumeHandle: vol-0existingvolumeid # existing EBS volume ID
    fsType: xfs
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: prod-db-claim
spec:
  accessModes: [ReadWriteOnce]
  resources:
    requests:
      storage: 500Gi
  storageClassName: ""
  volumeName: imported-prod-db # pin to specific PV
Volume Expansion
Expanding a PVC requires the StorageClass to have allowVolumeExpansion: true. You resize by editing the PVC's spec.resources.requests.storage to a larger value — you cannot shrink a PVC.
# Expand PVC from 20Gi to 50Gi
kubectl patch pvc postgres-data -p '{"spec":{"resources":{"requests":{"storage":"50Gi"}}}}'
# Watch status
kubectl get pvc postgres-data -w
# NAME STATUS VOLUME CAPACITY ACCESS MODES
# postgres-data Bound pvc-abc... 20Gi RWO # initially
# postgres-data Bound pvc-abc... 50Gi RWO # after expansion
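The grow-only rule can be checked before sending the patch. The following is a minimal sketch under stated assumptions: `parse_quantity` handles only integer values with binary suffixes (Ki through Pi), which is a subset of the Kubernetes quantity syntax, and `validate_resize` with its return strings is a hypothetical helper rather than an API server behavior.

```python
_BINARY_SUFFIXES = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40, "Pi": 2**50}


def parse_quantity(quantity):
    """Convert a quantity such as '20Gi' to bytes (integer binary suffixes only)."""
    for suffix, factor in _BINARY_SUFFIXES.items():
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * factor
    return int(quantity)  # plain byte count


def validate_resize(current, requested, allow_volume_expansion):
    """Classify a PVC storage change before it is sent to the API server."""
    cur, req = parse_quantity(current), parse_quantity(requested)
    if req < cur:
        return "rejected: a PVC cannot shrink"
    if req == cur:
        return "no-op"
    if not allow_volume_expansion:
        return "rejected: StorageClass lacks allowVolumeExpansion"
    return "accepted"
```

A pre-flight check like this is mainly useful in automation, where a rejected patch would otherwise surface only as an API error.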
Offline vs Online Expansion
| Type | Requires Pod Restart | Mechanism |
|---|---|---|
| Controller expansion (cloud volume resize) | No (CSI driver resizes cloud volume while pod runs) | CSI ControllerExpandVolume |
| Node expansion (filesystem resize) | Sometimes — driver-dependent | CSI NodeExpandVolume runs resize2fs/xfs_growfs |
| Offline resize (legacy) | Yes — must delete pod to unmount first | Required for plugins that don't support online node expansion |
The PVC condition FileSystemResizePending indicates the cloud volume has been resized but the filesystem on the node hasn't been expanded yet. This clears after the pod mounts the volume and the kubelet runs the filesystem resize.
kubectl describe pvc postgres-data | grep -A 3 Conditions
# Conditions:
# Type Status
# FileSystemResizePending True # waiting for pod to remount
CSI Architecture Overview
The Container Storage Interface (CSI) is the standard API between Kubernetes and storage drivers. Every production storage driver (EBS, Azure Disk, Ceph, etc.) implements CSI. The architecture separates the driver into two halves: a controller plugin (runs as a Deployment, calls cloud APIs) and a node plugin (runs as a DaemonSet, handles node-level mount/format/unmount).
┌─────────────────────────────────────────────────────────────────┐
│                        CSI ARCHITECTURE                         │
│                                                                 │
│  ┌──────────────────────────────────────────────────────────┐   │
│  │  CONTROLLER SIDE (Deployment)                            │   │
│  │                                                          │   │
│  │  ┌──────────────────────┐    ┌────────────────────────┐  │   │
│  │  │ CSI Driver           │    │ Kubernetes Sidecars    │  │   │
│  │  │ (controller plugin)  │◄───│ external-provisioner   │  │   │
│  │  │                      │    │ external-attacher      │  │   │
│  │  │ Implements:          │    │ external-resizer       │  │   │
│  │  │  CreateVolume        │    │ external-snapshotter   │  │   │
│  │  │  DeleteVolume        │    └────────────────────────┘  │   │
│  │  │  ControllerPublish   │                                │   │
│  │  │  ControllerExpand    │                                │   │
│  │  └──────────────────────┘                                │   │
│  └──────────────────────────────────────────────────────────┘   │
│                                                                 │
│  ┌──────────────────────────────────────────────────────────┐   │
│  │  NODE SIDE (DaemonSet, runs on every worker node)        │   │
│  │                                                          │   │
│  │  ┌──────────────────────┐    ┌────────────────────────┐  │   │
│  │  │ CSI Driver           │    │ node-driver-registrar  │  │   │
│  │  │ (node plugin)        │◄───│ (registers with        │  │   │
│  │  │                      │    │  kubelet plugin dir)   │  │   │
│  │  │ Implements:          │    └────────────────────────┘  │   │
│  │  │  NodeStageVolume     │                                │   │
│  │  │  NodePublishVolume   │ ← kubelet calls these          │   │
│  │  │  NodeExpandVolume    │   via gRPC Unix socket         │   │
│  │  │  NodeUnpublishVolume │                                │   │
│  │  └──────────────────────┘                                │   │
│  └──────────────────────────────────────────────────────────┘   │
│                                                                 │
│  CSIDriver object (cluster-scoped): declares driver capabilities│
│  (podInfoOnMount, attachRequired, volumeLifecycleModes, etc.)   │
└─────────────────────────────────────────────────────────────────┘
CSIDriver Object
apiVersion: storage.k8s.io/v1
kind: CSIDriver
metadata:
  name: ebs.csi.aws.com
spec:
  attachRequired: true # driver manages attach/detach (most block drivers)
  podInfoOnMount: true # kubelet passes pod name/namespace to NodePublish
  volumeLifecycleModes:
    - Persistent # supports PV/PVC
    # - Ephemeral # supports inline CSI ephemeral volumes
  fsGroupPolicy: File # how fsGroup chown works: None/File/ReadWriteOnceWithFSType
  tokenRequests: # request audience-specific tokens (e.g., for cloud auth)
    - audience: sts.amazonaws.com
      expirationSeconds: 86400
  requiresRepublish: false
Storage Capacity Tracking
GA in Kubernetes 1.24. When enabled, the CSI driver's external-provisioner creates CSIStorageCapacity objects that report available capacity per topology zone. The kube-scheduler uses this information to avoid scheduling pods on nodes where the required storage capacity isn't available.
# CSIStorageCapacity is created automatically by external-provisioner
apiVersion: storage.k8s.io/v1
kind: CSIStorageCapacity
metadata:
  name: capacity-us-east-1a-gp3
  namespace: kube-system
storageClassName: gp3-encrypted
nodeTopology:
  matchLabels:
    topology.ebs.csi.aws.com/zone: us-east-1a
capacity: 5Ti # available capacity in this zone
The scheduler consults CSIStorageCapacity only for StorageClasses with volumeBindingMode: WaitForFirstConsumer, and only when the driver's CSIDriver object sets storageCapacity: true. On the driver side, run the external-provisioner sidecar with --enable-capacity; topology-aware drivers also set --feature-gates=Topology=true so that capacity is reported per topology segment.
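The capacity check itself amounts to a filter over the published objects. The sketch below only illustrates that idea: `zones_with_capacity` is a hypothetical helper, and each dictionary is a flattened stand-in for a CSIStorageCapacity object (a `zone` key instead of a full nodeTopology selector, capacity already converted to bytes).

```python
def zones_with_capacity(capacities, storage_class, requested_bytes):
    """Return the zones whose reported capacity can hold the requested volume."""
    return sorted(
        c["zone"]
        for c in capacities
        if c["storageClassName"] == storage_class and c["capacity"] >= requested_bytes
    )


capacities = [
    {"storageClassName": "gp3-encrypted", "zone": "us-east-1a", "capacity": 5 * 2**40},
    {"storageClassName": "gp3-encrypted", "zone": "us-east-1b", "capacity": 10 * 2**30},
]
# A 20Gi request fits only where at least 20Gi is reported as available.
print(zones_with_capacity(capacities, "gp3-encrypted", 20 * 2**30))  # ['us-east-1a']
```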
Local Volumes
Local PVs expose a node-local disk, directory, or partition as a PV — with full RWO access and block device support. Unlike hostPath, local PVs track node affinity so the scheduler binds pods to the correct node. There is no dynamic provisioner for local PVs (the local static provisioner automates PV creation from a discovery directory).
apiVersion: v1
kind: PersistentVolume
metadata:
  name: local-ssd-node1
spec:
  capacity:
    storage: 400Gi
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain # always Retain for local PVs
  storageClassName: local-ssd
  volumeMode: Filesystem
  local:
    path: /mnt/disks/ssd1
  nodeAffinity: # REQUIRED: ties PV to a specific node
    required:
      nodeSelectorTerms:
        - matchExpressions:
            - key: kubernetes.io/hostname
              operator: In
              values: [worker-node-1]
---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: local-ssd
provisioner: kubernetes.io/no-provisioner # no dynamic provisioner
volumeBindingMode: WaitForFirstConsumer # REQUIRED for local PVs
If the node that hosts a local PV goes down, pods using that PV stay Pending until the node comes back. Local PVs have no replication: data is tied to a single node. Use a distributed storage system (Ceph, Longhorn) if you need durability without node affinity.
Encryption at Rest
Cloud KMS Integration
Most cloud CSI drivers support volume-level encryption via cloud KMS. Specify the KMS key in the StorageClass parameters:
# AWS EBS gp3 with KMS
parameters:
  encrypted: "true"
  kmsKeyId: arn:aws:kms:us-east-1:123456:key/mrk-abc123
# GCE PD with Cloud KMS
parameters:
  disk-encryption-kms-key: projects/my-proj/locations/us-east1/keyRings/my-ring/cryptoKeys/my-key
# Azure Disk with disk encryption set
parameters:
  diskEncryptionSetID: /subscriptions/.../diskEncryptionSets/my-des
etcd Encryption (Secrets)
Kubernetes Secrets stored in etcd are base64-encoded but not encrypted by default. Configure the API server with an EncryptionConfiguration to encrypt Secrets at rest using AES-GCM, AES-CBC, or KMS provider:
apiVersion: apiserver.config.k8s.io/v1
kind: EncryptionConfiguration
resources:
  - resources: [secrets]
    providers:
      - aescbc:
          keys:
            - name: key1
              secret: <base64-encoded-32-byte-key>
      - identity: {} # fallback: allows reading unencrypted secrets during rotation
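The `<base64-encoded-32-byte-key>` placeholder needs 32 random bytes encoded as base64. One way to produce such a value is sketched below; the function name `new_encryption_key` is hypothetical, and how the key is stored and rotated afterwards is outside this sketch.

```python
import base64
import secrets


def new_encryption_key():
    """Return 32 cryptographically random bytes, base64-encoded as a string."""
    return base64.b64encode(secrets.token_bytes(32)).decode("ascii")


if __name__ == "__main__":
    # Paste the printed value into the 'secret' field of the key entry.
    print(new_encryption_key())
```

Treat the output like any other credential: anyone who can read the EncryptionConfiguration file can decrypt the Secrets stored in etcd.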
Volume Snapshots Overview
Volume snapshots provide point-in-time copies of PVs for backup and restore. They are modeled by three custom resources (installed via the volume-snapshots CRDs, not included in core Kubernetes):
| Resource | Scope | Analogous To |
|---|---|---|
| VolumeSnapshotClass | Cluster | StorageClass: defines driver and deletion policy |
| VolumeSnapshot | Namespace | PVC: user requests a snapshot |
| VolumeSnapshotContent | Cluster | PV: actual snapshot on the backend |
Deep coverage including CSI snapshot controller, restore workflow, and cross-namespace snapshot cloning is in 05-volume-snapshots.html.
Storage Decision Matrix
| Workload Need | Recommended Type | AccessMode | Notes |
|---|---|---|---|
| Temporary scratch space (build cache, tmp files) | emptyDir | N/A | Use medium: Memory for RAM disk |
| App configuration / environment | configMap or secret volume | N/A | Auto-updates; don't use for binary data >1MiB |
| Database (MySQL, PostgreSQL, MongoDB) | PVC with block storage (gp3, Premium SSD) | RWO or RWOP | WaitForFirstConsumer; StatefulSet + volumeClaimTemplates |
| Shared read-write (NFS-style) | PVC with EFS/CephFS/Azure Files | RWX | Higher latency than block; avoid for random I/O DBs |
| Shared read-only (media, ML models) | PVC with ROX-capable driver | ROX | Or use OCI artifact / object storage + init container |
| High-performance local NVMe | Local PV | RWO | No portability; manual PV management; local-static-provisioner |
| Ephemeral cloud SSD (larger than node disk) | Generic ephemeral volume | RWO | PVC created/deleted with pod; uses any StorageClass |
| Secrets store (Vault, AWS SSM) | CSI ephemeral (Secrets Store CSI Driver) | N/A | No PV/PVC; mounts secrets from external vault as files |
Cloud Provider Support Matrix
| Driver | RWO | ROX | RWX | RWOP | Block | Snapshot | Expand |
|---|---|---|---|---|---|---|---|
| AWS EBS (ebs.csi.aws.com) | ✓ | ✗ | ✗ | ✓ | ✓ | ✓ | ✓ online |
| AWS EFS (efs.csi.aws.com) | ✓ | ✓ | ✓ | ✗ | ✗ | ✗ | ✓ auto |
| GCE PD (pd.csi.storage.gke.io) | ✓ | ✓ | ✗ | ✓ | ✓ | ✓ | ✓ online |
| GCE Filestore (filestore.csi.storage.gke.io) | ✓ | ✓ | ✓ | ✗ | ✗ | ✓ | ✓ |
| Azure Disk (disk.csi.azure.com) | ✓ | ✗ | ✗ | ✓ | ✓ | ✓ | ✓ online |
| Azure Files (file.csi.azure.com) | ✓ | ✓ | ✓ | ✗ | ✗ | ✓ | ✓ |
| Ceph RBD (rbd.csi.ceph.com) | ✓ | ✓ | ✗ | ✓ | ✓ | ✓ | ✓ online |
| CephFS (cephfs.csi.ceph.com) | ✓ | ✓ | ✓ | ✗ | ✗ | ✓ | ✓ |
| Local (kubernetes.io/no-provisioner) | ✓ | ✗ | ✗ | ✓ | ✓ | ✗ | ✗ |
Metrics and Alerting
Key Metrics
| Metric | Source | Alert Threshold |
|---|---|---|
| kubelet_volume_stats_used_bytes / kubelet_volume_stats_capacity_bytes | kubelet | > 85% → warn; > 95% → critical |
| kubelet_volume_stats_inodes_used / kubelet_volume_stats_inodes | kubelet | > 85% inode usage |
| kube_persistentvolumeclaim_status_phase | kube-state-metrics | phase != Bound for > 5m |
| kube_persistentvolume_status_phase | kube-state-metrics | phase = Failed |
| storage_operation_duration_seconds | kubelet | P99 mount time > 30s |
| csi_sidecar_operations_seconds | CSI driver (external sidecar metrics) | P99 provision time > 60s |
Alerting Rules
groups:
  - name: storage
    rules:
      - alert: PVCDiskAlmostFull
        expr: |
          (kubelet_volume_stats_used_bytes / kubelet_volume_stats_capacity_bytes) > 0.85
        for: 5m
        labels: {severity: warning}
        annotations:
          summary: "PVC {{ $labels.persistentvolumeclaim }} is {{ $value | humanizePercentage }} full"
      - alert: PVCDiskCritical
        expr: |
          (kubelet_volume_stats_used_bytes / kubelet_volume_stats_capacity_bytes) > 0.95
        for: 2m
        labels: {severity: critical}
      - alert: PVCNotBound
        expr: kube_persistentvolumeclaim_status_phase{phase!="Bound"} == 1
        for: 10m
        labels: {severity: warning}
        annotations:
          summary: "PVC {{ $labels.persistentvolumeclaim }} stuck in {{ $labels.phase }}"
      - alert: PVFailed
        expr: kube_persistentvolume_status_phase{phase="Failed"} == 1
        for: 1m
        labels: {severity: critical}
        annotations:
          summary: "PV {{ $labels.persistentvolume }} is in Failed state"
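The two fill-level rules reduce to a ratio compared against two thresholds. The sketch below mirrors that arithmetic for use in scripts or tests; `fill_severity` is a hypothetical helper, and it ignores the `for:` durations that Prometheus applies before an alert actually fires.

```python
def fill_severity(used_bytes, capacity_bytes, warn=0.85, crit=0.95):
    """Classify a volume's fill level using the same thresholds as the rules."""
    if capacity_bytes <= 0:
        raise ValueError("capacity_bytes must be positive")
    ratio = used_bytes / capacity_bytes
    if ratio > crit:
        return "critical"
    if ratio > warn:
        return "warning"
    return "ok"
```

The comparisons are strict, matching the `>` in the expressions: a volume at exactly 85% is still reported as ok.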
Troubleshooting Runbooks
Runbook: PVC Stuck in Pending
# 1. Describe the PVC
kubectl describe pvc <name> -n <ns>
# Events section will show binding failures
# 2. Common causes:
# "no persistent volumes available for this claim" → no matching static PV; wrong storageClassName
# "waiting for a volume to be created" → provisioner can't create volume (check events + provisioner logs)
# "waiting for first consumer" → WaitForFirstConsumer + no pod using PVC yet (expected)
# 3. Check CSI driver controller logs
kubectl logs -n kube-system -l app=ebs-csi-controller -c csi-provisioner --tail=50
# 4. Check StorageClass exists and is spelled correctly
kubectl get storageclass
kubectl describe storageclass <name>
Runbook: Pod Stuck in ContainerCreating (Volume Mount Failure)
# 1. Describe pod to see volume mount error
kubectl describe pod <name> -n <ns>
# Look for: "Unable to attach or mount volumes"
# 2. Common: previous pod crashed and volume is still attached to old node
# Check VolumeAttachment objects
kubectl get volumeattachment | grep <pv-name>
kubectl delete volumeattachment <stuck-attachment> # force detach (risky if node still up)
# 3. Check node-side CSI plugin (the EBS driver's container is named ebs-plugin)
kubectl logs -n kube-system -l app=ebs-csi-node -c ebs-plugin --tail=50
# 4. Check if node has reached attachment limit (AWS: m5.xlarge = 25 volumes)
# CSI drivers report their per-node limit on the CSINode object
kubectl get csinode <node> -o jsonpath='{.spec.drivers[*].allocatable.count}'
Runbook: PVC Expand Not Completing
# 1. Check PVC conditions
kubectl describe pvc <name> | grep -A 5 Conditions
# If FileSystemResizePending: pod must be running to trigger NodeExpandVolume
# 2. Ensure pod is running (not crashed) — kubelet triggers filesystem resize on mount
# 3. Check kubelet logs for resize errors
journalctl -u kubelet | grep -i "resize" | tail -20
# 4. If StorageClass does not have allowVolumeExpansion:true, the patch will be rejected
kubectl get storageclass <name> -o yaml | grep allowVolumeExpansion
Runbook: Released PV Cannot Rebind
# PV is Released (previous PVC deleted) but new PVC won't bind to it
# The PV still has claimRef pointing to the old PVC
kubectl get pv <name> -o yaml | grep -A 6 claimRef
# claimRef:
# name: old-pvc
# namespace: production
# uid: abc-123
# Remove claimRef to make PV Available again
kubectl patch pv <name> --type=json \
-p '[{"op":"remove","path":"/spec/claimRef"}]'
Runbook: Inode Exhaustion (disk usage looks fine but writes fail)
# Symptoms: pod gets ENOSPC but df shows plenty of disk space
# Cause: filesystem has no free inodes (too many small files)
# Check inode usage
kubectl exec -it <pod> -- df -i
# Filesystem Inodes IUsed IFree IUse% Mounted on
# /dev/xvda 6553600 6553600 0 100% /data ← full inodes!
# Monitor via kubelet_volume_stats_inodes_used metric
# Long-term fix: increase PVC size (more inodes on ext4/xfs) or clean up small files
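When many pods need checking, the `df -i` output can be parsed instead of read by eye. This is a minimal sketch: `full_inode_mounts` is a hypothetical helper, it assumes the six-column layout shown above, and it would misparse mount points that contain spaces.

```python
def full_inode_mounts(df_output, threshold=85):
    """Return (mount_point, percent_used) for mounts at or above threshold."""
    hits = []
    for line in df_output.strip().splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 6:
            continue
        use = fields[4]
        if not use.endswith("%"):
            continue  # some filesystems report "-" instead of a percentage
        percent = int(use[:-1])
        if percent >= threshold:
            hits.append((fields[5], percent))
    return hits


sample = """Filesystem Inodes IUsed IFree IUse% Mounted on
/dev/xvda 6553600 6553600 0 100% /data
tmpfs 1000 10 990 1% /tmp
"""
print(full_inode_mounts(sample))  # [('/data', 100)]
```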
Section Roadmap
This overview covers the framework. Each subsequent file goes deep on one topic:
- 04/01 01-volumes.html: All volume types in detail (emptyDir, configMap, secret, projected, hostPath, NFS, gitRepo (deprecated), CSI inline); subPath; volumeMounts lifecycle; init containers and shared volumes
- 04/02 02-persistent-volumes.html: PV/PVC deep dive (binding algorithm, volumeClaimTemplates in StatefulSets, orphaned PVCs, finalizers, PVC protection, label selectors, capacity over-provisioning gotchas)
- 04/03 03-storage-classes.html: StorageClass deep dive (all provisioner parameters for EBS/GCE/Azure/Ceph; default StorageClass; allowedTopologies; bindingMode edge cases; StorageClass migration)
- 04/04 04-csi-drivers.html: CSI deep dive (driver deployment patterns; external sidecar responsibilities; NodeStage vs NodePublish; volume health monitoring; driver upgrade strategy; writing a minimal CSI driver)
- 04/05 05-volume-snapshots.html: Snapshot lifecycle; snapshot controller + CRD installation; pre-provisioned snapshots; restore to new PVC; cross-namespace clone; VolumeGroupSnapshot (alpha); backup strategies
- 04/06 06-stateful-storage-patterns.html: StatefulSet storage patterns; volumeClaimTemplates; ordered provisioning; pod identity + stable storage; distributed DBs on Kubernetes (PostgreSQL, Cassandra, Kafka); Longhorn and Rook-Ceph
- 04/07 07-storage-capacity.html: CSIStorageCapacity objects; capacity-aware scheduling; topology constraints; multi-zone capacity planning; node volume limits (AWS instance type limits); capacity monitoring
Best Practices
- Use WaitForFirstConsumer for any zonal block storage (EBS, Azure Disk, GCE PD). Immediate binding can place the volume in a different AZ from the nodes that have room for the pod, leaving the pod Pending.
- Default to Retain for production data. Override back to Delete only for ephemeral/test namespaces. The cloud default (Delete) destroys the data as soon as the PVC is deleted, for example when helm uninstall removes a PVC that the chart owns.
- Set allowVolumeExpansion: true on every StorageClass. It costs nothing to enable and prevents emergency PV migrations when a database outgrows its disk.
- Alert on PVC fill >85%. CSI expansion is online but requires capacity — don't let a volume hit 100% before alerting fires.
- Use volumeClaimTemplates in StatefulSets, not a single shared PVC. Each pod instance gets its own isolated volume with a stable name (data-pod-0, data-pod-1).
- Never use hostPath for production workloads. It bypasses all Kubernetes storage accounting, has no access mode enforcement, and is a security risk. Use local PVs instead.
- Use RWOP instead of RWO for single-writer workloads (GA 1.29). It prevents split-brain from two pods accidentally claiming the same volume when a node is partitioned.
- Monitor inode usage separately from bytes. Workloads creating many small files (build caches, log aggregators, ML feature stores) exhaust inodes long before disk space.