Storage Overview

The complete map of Kubernetes storage — from ephemeral scratch space to durable, replicated block and file volumes — with the primitives, lifecycle, and decision framework you need to choose the right storage type for every workload.

Section 04 of 13 | File 1 of 8 | Platform Engineer

What This Page Covers

Kubernetes storage taxonomy — ephemeral vs persistent, inline vs external

All ephemeral volume types: emptyDir, configMap, secret, downwardAPI, projected, generic ephemeral, CSI ephemeral

Persistent volume lifecycle: PV → PVC → bind → mount → use → reclaim

PersistentVolume spec anatomy — capacity, accessModes, volumeMode, reclaim policy, storageClassName, nodeAffinity

PersistentVolumeClaim spec — resource requests, selector, volumeMode, volumeName, binding modes

StorageClass fields — provisioner, parameters, reclaimPolicy, volumeBindingMode, allowVolumeExpansion, mountOptions

Static vs dynamic provisioning — full compare with example manifests

Access modes in depth — RWO/ROX/RWX/RWOP definitions, CSI driver support matrix, cloud-provider reality

Volume modes — Filesystem vs Block, use cases for raw block devices

Reclaim policies — Retain (operator action required), Delete (cloud default), Recycle (deprecated)

Binding modes — Immediate vs WaitForFirstConsumer and why it matters for topology

Volume expansion — allowVolumeExpansion, online vs offline resize, FileSystemResizePending condition

CSI architecture overview — external provisioner/attacher/resizer/snapshotter sidecars, node plugin, driver registration

Storage capacity tracking (GA 1.24) — CSIStorageCapacity object, scheduler capacity awareness

Local volumes — local PV, nodeAffinity requirement, WaitForFirstConsumer, no dynamic provisioner

Encryption at rest — cloud KMS integration, etcd encryption for Secrets, CSI driver-level encryption

Volume snapshots overview — VolumeSnapshotClass, VolumeSnapshot, VolumeSnapshotContent (deep-dived in 05)

Cross-cutting storage decisions — access pattern matrix, cloud provider support table

Section roadmap — links and summaries for all 7 subsequent storage files

6 metrics + 4 alerting rules + 5 troubleshooting runbooks

8 best practices for production storage

Storage Taxonomy

Kubernetes storage breaks cleanly into two axes: lifetime (ephemeral vs persistent) and source (cluster-native vs external CSI driver). Understanding where a volume type falls on these axes determines how it behaves when a pod is deleted, rescheduled, or when the node fails.

┌─────────────────────────────────────────────────────────────────────┐
│                      KUBERNETES STORAGE TAXONOMY                    │
│                                                                     │
│   EPHEMERAL (tied to pod lifetime)                                  │
│   ┌─────────────┬──────────────┬────────────┬─────────────────────┐ │
│   │  emptyDir   │  configMap   │  secret    │  downwardAPI        │ │
│   │  (scratch)  │  (config     │  (creds,   │  (pod metadata)     │ │
│   │             │   files)     │   tokens)  │                     │ │
│   ├─────────────┴──────────────┴────────────┴─────────────────────┤ │
│   │  projected  (combines configMap+secret+downwardAPI+SAToken)   │ │
│   ├────────────────────────────────────────────────────────────────┤ │
│   │  generic ephemeral volume  (PVC created/deleted with pod)     │ │
│   │  CSI ephemeral volume      (inline CSI, no PVC object)        │ │
│   └────────────────────────────────────────────────────────────────┘ │
│                                                                     │
│   PERSISTENT (outlives pod)                                         │
│   ┌────────────────────────────────────────────────────────────────┐ │
│   │  PersistentVolume (PV)  ←bound to→  PersistentVolumeClaim     │ │
│   │                                         (PVC)                 │ │
│   │  Provisioned by:                                               │ │
│   │    • Static  — admin creates PV manually                      │ │
│   │    • Dynamic — StorageClass triggers CSI driver               │ │
│   │                                                               │ │
│   │  Backed by:                                                    │ │
│   │    Cloud block (EBS, Persistent Disk, Azure Disk)             │ │
│   │    Cloud file (EFS, Filestore, Azure Files)                   │ │
│   │    Network (NFS, iSCSI, Ceph RBD, CephFS, GlusterFS)         │ │
│   │    Local (local PV — node affinity required)                  │ │
│   └────────────────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────────────┘

Ephemeral Volume Types

Ephemeral

emptyDir

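An empty directory created when the pod is assigned to a node. Backed by node disk by default, or by tmpfs in host RAM with medium: Memory (which counts against the container's memory limit). Shared by all containers in the pod. Survives container restarts but is deleted when the pod is removed from the node.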

Ephemeral

configMap

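ConfigMap keys projected as files. Changes to the ConfigMap propagate into the volume within roughly a minute (kubelet sync period plus cache delay), except for files mounted via subPath, which never receive updates. Individual keys can be selected with items, and the volume can be marked optional.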

Ephemeral

secret

Secret keys projected as files with mode 0644 by default. Same update semantics as configMap. Stored in tmpfs on node (not written to disk).

Ephemeral

downwardAPI

Pod metadata (name, namespace, labels, annotations, resource limits) exposed as files. Useful for apps that need to know their own identity at runtime.

Ephemeral

projected

Combines configMap + secret + downwardAPI + serviceAccountToken into a single mount directory. Token has a configurable expiry (expirationSeconds).

Ephemeral

Generic Ephemeral

Pod spec includes an inline PVC template. The PVC is created with the pod and deleted with it. Supports any StorageClass — enables ephemeral use of cloud SSDs.

Ephemeral

CSI Ephemeral

Inline CSI volume — no PVC/PV objects created. Driver must declare volumeLifecycleModes: Ephemeral. Used for secrets stores (Secrets Store CSI Driver), node-local caches.
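
As a minimal sketch of how several of these types are declared side by side, the pod below mounts an emptyDir, a projected volume, and a generic ephemeral volume. The object names (ephemeral-demo, app-config), the busybox image, and the sizes are illustrative assumptions; gp3-encrypted is the StorageClass defined in the StorageClass section of this page.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: ephemeral-demo            # hypothetical name
spec:
  containers:
    - name: app
      image: busybox:1.36         # placeholder image
      command: ["sleep", "3600"]
      volumeMounts:
        - name: scratch
          mountPath: /scratch
        - name: identity
          mountPath: /var/run/identity
          readOnly: true
        - name: big-scratch
          mountPath: /data
  volumes:
    - name: scratch
      emptyDir:
        medium: Memory            # tmpfs; omit this field to use node disk
        sizeLimit: 256Mi
    - name: identity
      projected:
        sources:
          - configMap:
              name: app-config    # assumed to exist in the pod's namespace
          - downwardAPI:
              items:
                - path: labels
                  fieldRef:
                    fieldPath: metadata.labels
          - serviceAccountToken:
              path: token
              expirationSeconds: 3600
    - name: big-scratch
      ephemeral:                  # generic ephemeral volume
        volumeClaimTemplate:
          spec:
            accessModes: [ReadWriteOnce]
            storageClassName: gp3-encrypted
            resources:
              requests:
                storage: 50Gi
```

The PVC behind big-scratch is named after the pod and the volume (here ephemeral-demo-big-scratch) and is owned by the pod, so it is removed when the pod is deleted.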

PersistentVolume Lifecycle

Every persistent volume in Kubernetes goes through a defined lifecycle. Understanding each phase prevents data loss and leaking cloud resources.

Provision
static or dynamic

→

Bind
PVC → PV matched

→

Bound
PV and PVC both Bound

→

In Use
Pod mounts volume

→

Released
PVC deleted, PV Released

→

Reclaim
Retain / Delete

PV Phase Values

Phase	Meaning	Next Action
`Available`	PV exists, not bound to any PVC	Wait for matching PVC
`Bound`	PV is bound to exactly one PVC	Pod can mount
`Released`	PVC was deleted; data still on storage backend	Admin reclaims or deletes
`Failed`	Automatic reclamation failed	Manual intervention

⚠️

Released ≠ Available: a Released PV cannot be rebound to a new PVC automatically, because its claimRef field still points to the old PVC. To reuse a Retained PV, first confirm that the data left on it is safe to hand to a new claimant, then clear the claimRef manually: kubectl patch pv <name> -p '{"spec":{"claimRef":null}}'

PersistentVolume Anatomy

apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv-example
  annotations:
    pv.kubernetes.io/provisioned-by: ebs.csi.aws.com   # set by dynamic provisioner
spec:
  capacity:
    storage: 20Gi                     # declared capacity; CSI may report actual
  accessModes:
    - ReadWriteOnce                   # RWO: single node at a time
  volumeMode: Filesystem              # or Block for raw device
  persistentVolumeReclaimPolicy: Retain
  storageClassName: gp3-encrypted
  mountOptions:
    - noatime
    - discard
  csi:
    driver: ebs.csi.aws.com
    volumeHandle: vol-0abc123def456789
    fsType: ext4
    volumeAttributes:
      throughput: "250"
  nodeAffinity:                       # required for local PVs, optional for CSI
    required:
      nodeSelectorTerms:
        - matchExpressions:
            - key: topology.ebs.csi.aws.com/zone
              operator: In
              values: [us-east-1a]

Access Modes

Mode	Short	Meaning	Typical Drivers
ReadWriteOnce	RWO	One node can mount read-write. Multiple pods on the same node can use it simultaneously.	EBS, Azure Disk, GCE PD, local
ReadOnlyMany	ROX	Many nodes can mount read-only.	EFS, NFS, CephFS, Azure Files
ReadWriteMany	RWX	Many nodes can mount read-write.	EFS, CephFS, GlusterFS, NFS, Azure Files (SMB/NFS)
ReadWriteOncePod	RWOP	Only a single pod cluster-wide can mount read-write. GA 1.29. Stronger than RWO.	Any CSI driver (enforced by Kubernetes, not driver)

ℹ️

RWO vs RWOP: RWO allows multiple pods on the same node to share a volume. RWOP enforces single-pod access cluster-wide at the Kubernetes layer, so the CSI driver does not need to implement it. RWOP is the right choice for any workload where two concurrent writers would corrupt data.
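
A claim that asks for RWOP might look like the sketch below. The claim name and size are hypothetical, and the example assumes a CSI-backed StorageClass (gp3-encrypted, defined later on this page).

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: single-writer-data        # hypothetical name
spec:
  accessModes:
    - ReadWriteOncePod            # cannot be combined with other access modes
  storageClassName: gp3-encrypted
  resources:
    requests:
      storage: 20Gi
```

If a second pod references this claim while the first is still using it, the second pod fails scheduling and stays Pending instead of mounting the volume.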

Volume Modes

volumeMode	What the pod sees	Use case
`Filesystem` (default)	A formatted filesystem mounted at a directory	The vast majority of workloads: databases, app data, logs
`Block`	A raw block device path (e.g. `/dev/xvda`)	Software that manages its own on-disk layout and can consume an unformatted device (e.g. Ceph OSDs). PostgreSQL and most other databases need a filesystem

With volumeMode: Block, the container spec uses volumeDevices instead of volumeMounts:

containers:
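- name: block-consumer
  image: registry.example.com/block-consumer:1.0   # hypothetical image; the software must support raw block devices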
  volumeDevices:
    - name: data
      devicePath: /dev/xvda    # raw block — no filesystem layer
volumes:
- name: data
  persistentVolumeClaim:
    claimName: raw-block-pvc
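
The raw-block-pvc claim referenced above is not shown in the original. A minimal sketch of what it could look like follows; the size and StorageClass are assumptions, and the class's driver must support Block mode.

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: raw-block-pvc
spec:
  accessModes: [ReadWriteOnce]
  volumeMode: Block               # must match the volumeMode of the PV it binds to
  storageClassName: gp3-encrypted # assumption: any class whose driver supports Block
  resources:
    requests:
      storage: 100Gi              # illustrative size
```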

PersistentVolumeClaim Anatomy

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: postgres-data
  namespace: production
spec:
  accessModes:
    - ReadWriteOnce
  volumeMode: Filesystem
  resources:
    requests:
      storage: 20Gi             # minimum size; dynamic provisioners create this size (some drivers round up)
  storageClassName: gp3-encrypted
  # selector:                   # for static binding only
  #   matchLabels:
  #     type: fast-ssd
  # volumeName: pv-specific     # pin to a specific PV (static only)

Binding Logic

The bind controller matches a PVC to a PV by finding the smallest PV that satisfies:

storageClassName matches (or both are empty)
accessModes requested ⊆ accessModes available
volumeMode matches
capacity ≥ requested storage
selector labels match (if specified)

⚠️

Capacity over-provisioning: if the smallest matching PV is 100Gi and the PVC requests 20Gi, the PVC binds to the 100Gi PV and the pod sees the full 100Gi, not 20Gi. The other 80Gi cannot be claimed by anyone else, and you cannot resize down. Dynamic provisioning avoids this by creating a volume of the requested size.
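
The matching rules above can be sketched with a labelled static PV and a claim that selects it. All names, the label, and the volumeHandle are hypothetical placeholders.

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: fast-ssd-01               # hypothetical name
  labels:
    type: fast-ssd
spec:
  capacity:
    storage: 100Gi
  accessModes: [ReadWriteOnce]
  volumeMode: Filesystem
  persistentVolumeReclaimPolicy: Retain
  storageClassName: ""
  csi:
    driver: ebs.csi.aws.com
    volumeHandle: vol-0example    # placeholder, not a real volume ID
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: fast-claim                # hypothetical name
spec:
  accessModes: [ReadWriteOnce]    # subset of the PV's access modes
  volumeMode: Filesystem          # matches the PV
  storageClassName: ""            # matches the PV; "" also disables dynamic provisioning
  resources:
    requests:
      storage: 20Gi               # 20Gi <= 100Gi, so the capacity rule passes
  selector:
    matchLabels:
      type: fast-ssd              # restricts candidates to PVs carrying this label
```

Once bound, the claim reports 100Gi of capacity, which is the over-provisioning case described in the callout.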

StorageClass

A StorageClass is the template used by the dynamic provisioner to create PVs on demand. It encodes all provisioner-specific parameters so consumers need only specify a class name.

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: gp3-encrypted
  annotations:
    storageclass.kubernetes.io/is-default-class: "true"
provisioner: ebs.csi.aws.com
parameters:
  type: gp3
  iops: "3000"                 # absolute IOPS; iopsPerGB would be multiplied by the volume size
  throughput: "125"
  encrypted: "true"
  kmsKeyId: arn:aws:kms:us-east-1:123456789012:key/mrk-abc123
reclaimPolicy: Delete          # Delete or Retain
volumeBindingMode: WaitForFirstConsumer   # or Immediate
allowVolumeExpansion: true
mountOptions:
  - noatime

Reclaim Policies

Policy	What Happens When PVC Deleted	Production Default
`Delete`	PV and underlying cloud storage deleted immediately	✓ Cloud StorageClasses default to Delete
`Retain`	PV moves to Released state; data preserved; admin must clean up	Preferred for stateful data — safer
`Recycle`	Deprecated. Ran a basic scrub (`rm -rf /thevolume/*`) and made the PV Available again. Only NFS and hostPath ever supported it.	✗ Do not use

🔴

Default cloud StorageClasses use Delete: on EKS, GKE, and AKS, the default StorageClass uses reclaimPolicy: Delete. If a PVC is deleted (for example, by helm uninstall of a chart that defines the PVC directly rather than through volumeClaimTemplates), the PV and the backing cloud disk are deleted with it. Use Retain for any production database, and keep backups with a tool such as Velero.
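
One way to act on this is to offer a second class that differs only in its reclaim policy. The sketch below assumes the EBS CSI driver and a hypothetical class name.

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: gp3-encrypted-retain      # hypothetical name
provisioner: ebs.csi.aws.com
parameters:
  type: gp3
  encrypted: "true"
reclaimPolicy: Retain             # PVs created from this class survive PVC deletion
volumeBindingMode: WaitForFirstConsumer
allowVolumeExpansion: true
```

The class policy only applies to PVs provisioned from it afterwards. Existing PVs keep the policy they were created with, which can be changed per PV through spec.persistentVolumeReclaimPolicy.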

Binding Modes

Mode	When PV is Provisioned	Use Case
`Immediate`	When PVC is created — provisioner runs immediately regardless of pod scheduling	NFS, CephFS, EFS — topology-agnostic storage
`WaitForFirstConsumer`	When a pod using the PVC is scheduled — provisioner knows the node's zone	EBS, Azure Disk, GCE PD — zonal block storage must land in pod's AZ

ℹ️

Why WaitForFirstConsumer matters: with Immediate, the EBS volume might be provisioned in us-east-1a while the scheduler later places the pod on a node in us-east-1b. The pod then gets stuck in Pending because an EBS volume can only attach to nodes in its own zone. WaitForFirstConsumer delays provisioning until the scheduler has picked a node, so the volume is created in that node's zone.

Static vs Dynamic Provisioning

Aspect	Static Provisioning	Dynamic Provisioning
PV creation	Admin creates PV manifests manually	StorageClass + provisioner create PV on demand
Workflow	Admin creates volume in cloud console → writes PV YAML → deploys	Developer creates PVC → done
Flexibility	Admin controls exact volume ID, size, IOPS	Parameters in StorageClass
Self-service	No — requires admin for each volume	Yes
Pre-existing data	Yes — can import existing cloud volumes	No — always creates new volume
Use case	Migration, production databases with specific volume IDs, local PVs	General-purpose — recommended default

Static Provisioning Example (import existing EBS volume)

apiVersion: v1
kind: PersistentVolume
metadata:
  name: imported-prod-db
spec:
  capacity:
    storage: 500Gi
  accessModes: [ReadWriteOnce]
  persistentVolumeReclaimPolicy: Retain
  storageClassName: ""              # empty = not managed by StorageClass
  csi:
    driver: ebs.csi.aws.com
    volumeHandle: vol-0existingvolumeid   # existing EBS volume ID
    fsType: xfs
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: prod-db-claim
spec:
  accessModes: [ReadWriteOnce]
  resources:
    requests:
      storage: 500Gi
  storageClassName: ""
  volumeName: imported-prod-db      # pin to specific PV

Volume Expansion

Expanding a PVC requires the StorageClass to have allowVolumeExpansion: true. You resize by editing the PVC's spec.resources.requests.storage to a larger value — you cannot shrink a PVC.

# Expand PVC from 20Gi to 50Gi
kubectl patch pvc postgres-data -n production -p '{"spec":{"resources":{"requests":{"storage":"50Gi"}}}}'

# Watch status
kubectl get pvc postgres-data -n production -w
# NAME           STATUS   VOLUME              CAPACITY   ACCESS MODES
# postgres-data  Bound    pvc-abc...          20Gi       RWO       # initially
# postgres-data  Bound    pvc-abc...          50Gi       RWO       # after expansion

Offline vs Online Expansion

Type	Requires Pod Restart	Mechanism
Controller expansion (cloud volume resize)	No (CSI driver resizes cloud volume while pod runs)	CSI `ControllerExpandVolume`
Node expansion (filesystem resize)	Sometimes — driver-dependent	CSI `NodeExpandVolume` runs `resize2fs`/`xfs_growfs`
Offline resize (legacy)	Yes — must delete pod to unmount first	Required for plugins that don't support online node expansion

The PVC condition FileSystemResizePending indicates the cloud volume has been resized but the filesystem on the node hasn't been expanded yet. This clears after the pod mounts the volume and the kubelet runs the filesystem resize.

kubectl describe pvc postgres-data -n production | grep -A 3 Conditions
# Conditions:
#   Type                      Status
#   FileSystemResizePending   True    # waiting for pod to remount

CSI Architecture Overview

The Container Storage Interface (CSI) is the standard API between Kubernetes and storage drivers. Every production storage driver (EBS, Azure Disk, Ceph, etc.) implements CSI. The architecture separates the driver into two halves: a controller plugin (runs as a Deployment, calls cloud APIs) and a node plugin (runs as a DaemonSet, handles node-level mount/format/unmount).

┌─────────────────────────────────────────────────────────────────┐
│                     CSI ARCHITECTURE                            │
│                                                                 │
│  ┌──────────────────────────────────────────────────────────┐  │
│  │  CONTROLLER SIDE (Deployment)                            │  │
│  │                                                          │  │
│  │  ┌──────────────────────┐    ┌────────────────────────┐  │  │
│  │  │  CSI Driver          │    │  Kubernetes Sidecars   │  │  │
│  │  │  (controller plugin) │◄───│  external-provisioner  │  │  │
│  │  │                      │    │  external-attacher     │  │  │
│  │  │  Implements:         │    │  external-resizer      │  │  │
│  │  │  CreateVolume        │    │  external-snapshotter  │  │  │
│  │  │  DeleteVolume        │    └────────────────────────┘  │  │
│  │  │  ControllerPublish   │                                 │  │
│  │  │  ControllerExpand    │                                 │  │
│  │  └──────────────────────┘                                 │  │
│  └──────────────────────────────────────────────────────────┘  │
│                                                                 │
│  ┌──────────────────────────────────────────────────────────┐  │
│  │  NODE SIDE (DaemonSet — runs on every worker node)       │  │
│  │                                                          │  │
│  │  ┌──────────────────────┐    ┌────────────────────────┐  │  │
│  │  │  CSI Driver          │    │  node-driver-registrar │  │  │
│  │  │  (node plugin)       │◄───│  (registers with       │  │  │
│  │  │                      │    │   kubelet plugin dir)  │  │  │
│  │  │  Implements:         │    └────────────────────────┘  │  │
│  │  │  NodeStageVolume     │                                 │  │
│  │  │  NodePublishVolume   │    ← kubelet calls these        │  │
│  │  │  NodeExpandVolume    │      via gRPC Unix socket       │  │
│  │  │  NodeUnpublishVolume │                                 │  │
│  │  └──────────────────────┘                                 │  │
│  └──────────────────────────────────────────────────────────┘  │
│                                                                 │
│  CSIDriver object (cluster-scoped): declares driver capabilities│
│  (podInfoOnMount, attachRequired, volumeLifecycleModes, etc.)  │
└─────────────────────────────────────────────────────────────────┘

CSIDriver Object

apiVersion: storage.k8s.io/v1
kind: CSIDriver
metadata:
  name: ebs.csi.aws.com
spec:
  attachRequired: true              # driver manages attach/detach (most block drivers)
  podInfoOnMount: true              # kubelet passes pod name/namespace to NodePublish
  volumeLifecycleModes:
    - Persistent                    # supports PV/PVC
    # - Ephemeral                   # supports inline CSI ephemeral volumes
  fsGroupPolicy: File               # how fsGroup chown works: None/File/ReadWriteOnceWithFSType
  tokenRequests:                    # request audience-specific tokens (e.g., for cloud auth)
    - audience: sts.amazonaws.com
      expirationSeconds: 86400
  requiresRepublish: false

Storage Capacity Tracking

GA in Kubernetes 1.24. When enabled, the CSI driver's external-provisioner creates CSIStorageCapacity objects that report available capacity per topology zone. The kube-scheduler uses this information to avoid scheduling pods on nodes where the required storage capacity isn't available.

# CSIStorageCapacity is created automatically by external-provisioner
apiVersion: storage.k8s.io/v1
kind: CSIStorageCapacity
metadata:
  name: capacity-us-east-1a-gp3
  namespace: kube-system
storageClassName: gp3-encrypted
nodeTopology:
  matchLabels:
    topology.ebs.csi.aws.com/zone: us-east-1a
capacity: 5Ti                # available capacity in this zone

Capacity-aware scheduling needs three things in place. The StorageClass must use volumeBindingMode: WaitForFirstConsumer, because the scheduler only consults CSIStorageCapacity for late-binding volumes. The CSIDriver object must set storageCapacity: true. And the external-provisioner sidecar must run with --enable-capacity, alongside --feature-gates=Topology=true for topology-aware drivers.
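
A minimal sketch of the driver-side settings, assuming a hypothetical driver called example.csi.vendor.io. The second document is only a fragment of the controller Deployment's pod spec. It leaves out everything unrelated to capacity tracking, and the external-provisioner README lists further settings (such as owner references for the generated objects) that a real deployment needs.

```yaml
apiVersion: storage.k8s.io/v1
kind: CSIDriver
metadata:
  name: example.csi.vendor.io     # hypothetical driver name
spec:
  storageCapacity: true           # scheduler consults CSIStorageCapacity for this driver
---
# Fragment of the controller pod spec, not a complete manifest
containers:
  - name: csi-provisioner
    args:
      - --feature-gates=Topology=true   # topology-aware provisioning
      - --enable-capacity               # publish CSIStorageCapacity objects
```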

Local Volumes

Local PVs expose a node-local disk, directory, or partition as a PV — with full RWO access and block device support. Unlike hostPath, local PVs track node affinity so the scheduler binds pods to the correct node. There is no dynamic provisioner for local PVs (the local static provisioner automates PV creation from a discovery directory).

apiVersion: v1
kind: PersistentVolume
metadata:
  name: local-ssd-node1
spec:
  capacity:
    storage: 400Gi
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain   # always Retain for local PVs
  storageClassName: local-ssd
  volumeMode: Filesystem
  local:
    path: /mnt/disks/ssd1
  nodeAffinity:                 # REQUIRED — ties PV to a specific node
    required:
      nodeSelectorTerms:
        - matchExpressions:
            - key: kubernetes.io/hostname
              operator: In
              values: [worker-node-1]
---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: local-ssd
provisioner: kubernetes.io/no-provisioner   # no dynamic provisioner
volumeBindingMode: WaitForFirstConsumer     # REQUIRED for local PVs

⚠️

Node failure with local PVs: if the node holding the local PV fails, the pod using it cannot be rescheduled elsewhere, because the PV's nodeAffinity pins it to that node. The pod stays Pending until the node comes back. Local PVs have no replication, so the data lives on a single node. Use a distributed storage system (Ceph, Longhorn) if you need durability without node affinity.
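
To show how a workload consumes the local-ssd class defined above, here is a minimal sketch of a claim and a pod. The names, image, and mount path are hypothetical.

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: cache-data                # hypothetical name
spec:
  accessModes: [ReadWriteOnce]
  storageClassName: local-ssd     # class with kubernetes.io/no-provisioner
  resources:
    requests:
      storage: 400Gi
---
apiVersion: v1
kind: Pod
metadata:
  name: cache                     # hypothetical name
spec:
  containers:
    - name: cache
      image: busybox:1.36         # placeholder image
      command: ["sleep", "3600"]
      volumeMounts:
        - name: data
          mountPath: /var/cache/app
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: cache-data
```

The claim stays Pending until the pod is created. The scheduler then picks a node that satisfies both the pod's constraints and the nodeAffinity of a matching local PV, which is the reason WaitForFirstConsumer is used here.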

Encryption at Rest

Cloud KMS Integration

Most cloud CSI drivers support volume-level encryption via cloud KMS. Specify the KMS key in the StorageClass parameters:

# AWS EBS gp3 with KMS
parameters:
  encrypted: "true"
  kmsKeyId: arn:aws:kms:us-east-1:123456:key/mrk-abc123

# GCE PD with Cloud KMS
parameters:
  disk-encryption-kms-key: projects/my-proj/locations/us-east1/keyRings/my-ring/cryptoKeys/my-key

# Azure Disk with disk encryption set
parameters:
  diskEncryptionSetID: /subscriptions/.../diskEncryptionSets/my-des

etcd Encryption (Secrets)

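Kubernetes Secrets stored in etcd are base64-encoded but not encrypted by default. Configure the API server (via --encryption-provider-config) with an EncryptionConfiguration to encrypt Secrets at rest using AES-GCM, AES-CBC, or a KMS provider. The first provider in the list encrypts new writes, and the remaining providers are only tried when decrypting: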

apiVersion: apiserver.config.k8s.io/v1
kind: EncryptionConfiguration
resources:
  - resources: [secrets]
    providers:
      - aescbc:
          keys:
            - name: key1
              secret: <base64-encoded-32-byte-key>
      - identity: {}   # fallback — allows reading unencrypted secrets during rotation

Volume Snapshots Overview

Volume snapshots provide point-in-time copies of PVs for backup and restore. The feature is built on three custom resources, which are installed from the external-snapshotter CRDs rather than shipped with core Kubernetes:

Resource	Scope	Analogous To
`VolumeSnapshotClass`	Cluster	StorageClass — defines driver and deletion policy
`VolumeSnapshot`	Namespace	PVC — user requests a snapshot
`VolumeSnapshotContent`	Cluster	PV — actual snapshot on the backend

Deep coverage including CSI snapshot controller, restore workflow, and cross-namespace snapshot cloning is in 05-volume-snapshots.html.
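
As a brief sketch of how the first two resources fit together, the manifests below request a snapshot of the postgres-data claim used earlier on this page. The class and snapshot names are hypothetical, and the example assumes the snapshot CRDs and snapshot controller are already installed.

```yaml
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshotClass
metadata:
  name: ebs-snapshots             # hypothetical name
driver: ebs.csi.aws.com
deletionPolicy: Retain            # keep the backend snapshot if the content object is deleted
---
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: postgres-data-snap        # hypothetical name
  namespace: production
spec:
  volumeSnapshotClassName: ebs-snapshots
  source:
    persistentVolumeClaimName: postgres-data
```

The snapshot controller creates the matching VolumeSnapshotContent object, much as a dynamic provisioner creates a PV for a PVC.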

Storage Decision Matrix

Workload Need	Recommended Type	AccessMode	Notes
Temporary scratch space (build cache, tmp files)	emptyDir	N/A	Use `medium: Memory` for RAM disk
App configuration / environment	configMap or secret volume	N/A	Auto-updates; don't use for binary data >1MiB
Database (MySQL, PostgreSQL, MongoDB)	PVC with block storage (gp3, Premium SSD)	RWO or RWOP	WaitForFirstConsumer; StatefulSet + volumeClaimTemplates
Shared read-write (NFS-style)	PVC with EFS/CephFS/Azure Files	RWX	Higher latency than block; avoid for random I/O DBs
Shared read-only (media, ML models)	PVC with ROX-capable driver	ROX	Or use OCI artifact / object storage + init container
High-performance local NVMe	Local PV	RWO	No portability; manual PV management; local-static-provisioner
Ephemeral cloud SSD (larger than node disk)	Generic ephemeral volume	RWO	PVC created/deleted with pod; uses any StorageClass
Secrets store (Vault, AWS SSM)	CSI ephemeral (Secrets Store CSI Driver)	N/A	No PV/PVC; mounts secrets from external vault as files

Cloud Provider Support Matrix

Driver	RWO	ROX	RWX	RWOP	Block	Snapshot	Expand
AWS EBS (ebs.csi.aws.com)	✓	✗	✗	✓	✓	✓	✓ online
AWS EFS (efs.csi.aws.com)	✓	✓	✓	✗	✗	✗	✓ auto
GCE PD (pd.csi.storage.gke.io)	✓	✓	✗	✓	✓	✓	✓ online
GCE Filestore (filestore.csi.storage.gke.io)	✓	✓	✓	✗	✗	✓	✓
Azure Disk (disk.csi.azure.com)	✓	✗	✗	✓	✓	✓	✓ online
Azure Files (file.csi.azure.com)	✓	✓	✓	✗	✗	✓	✓
Ceph RBD (rbd.csi.ceph.com)	✓	✓	✗	✓	✓	✓	✓ online
CephFS (cephfs.csi.ceph.com)	✓	✓	✓	✗	✗	✓	✓
Local (kubernetes.io/no-provisioner)	✓	✗	✗	✓	✓	✗	✗

Metrics and Alerting

Key Metrics

Metric	Source	Alert Threshold
`kubelet_volume_stats_used_bytes / kubelet_volume_stats_capacity_bytes`	kubelet	> 85% → warn; > 95% → critical
`kubelet_volume_stats_inodes_used / kubelet_volume_stats_inodes`	kubelet	> 85% inode usage
`kube_persistentvolumeclaim_status_phase`	kube-state-metrics	phase != Bound for > 10m
`kube_persistentvolume_status_phase`	kube-state-metrics	phase = Failed
`storage_operation_duration_seconds`	kubelet	P99 mount time > 30s
`csi_sidecar_operations_seconds`	CSI driver (external sidecar metrics)	P99 provision time > 60s

Alerting Rules

groups:
- name: storage
  rules:
  - alert: PVCDiskAlmostFull
    expr: |
      (kubelet_volume_stats_used_bytes / kubelet_volume_stats_capacity_bytes) > 0.85
    for: 5m
    labels: {severity: warning}
    annotations:
      summary: "PVC {{ $labels.persistentvolumeclaim }} is {{ $value | humanizePercentage }} full"

  - alert: PVCDiskCritical
    expr: |
      (kubelet_volume_stats_used_bytes / kubelet_volume_stats_capacity_bytes) > 0.95
    for: 2m
    labels: {severity: critical}

  - alert: PVCNotBound
    expr: kube_persistentvolumeclaim_status_phase{phase!="Bound"} == 1
    for: 10m
    labels: {severity: warning}
    annotations:
      summary: "PVC {{ $labels.persistentvolumeclaim }} stuck in {{ $labels.phase }}"

  - alert: PVFailed
    expr: kube_persistentvolume_status_phase{phase="Failed"} == 1
    for: 1m
    labels: {severity: critical}
    annotations:
      summary: "PV {{ $labels.persistentvolume }} is in Failed state"
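
The inode metric in the table has no rule above. A possible fifth rule is sketched below, assuming the same 85% threshold and a hypothetical group and alert name.

```yaml
groups:
- name: storage-inodes            # hypothetical group name
  rules:
  - alert: PVCInodesAlmostExhausted
    expr: |
      (kubelet_volume_stats_inodes_used / kubelet_volume_stats_inodes) > 0.85
    for: 5m
    labels: {severity: warning}
    annotations:
      summary: "PVC {{ $labels.persistentvolumeclaim }} has used {{ $value | humanizePercentage }} of its inodes"
```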

Troubleshooting Runbooks

Runbook: PVC Stuck in Pending

# 1. Describe the PVC
kubectl describe pvc <name> -n <ns>
# Events section will show binding failures

# 2. Common causes:
# "no persistent volumes available for this claim" → no matching static PV; wrong storageClassName
# "waiting for a volume to be created" → provisioner can't create volume (check events + provisioner logs)
# "waiting for first consumer" → WaitForFirstConsumer + no pod using PVC yet (expected)

# 3. Check CSI driver controller logs
kubectl logs -n kube-system -l app=ebs-csi-controller -c csi-provisioner --tail=50

# 4. Check StorageClass exists and is spelled correctly
kubectl get storageclass
kubectl describe storageclass <name>

Runbook: Pod Stuck in ContainerCreating (Volume Mount Failure)

# 1. Describe pod to see volume mount error
kubectl describe pod <name> -n <ns>
# Look for: "Unable to attach or mount volumes"

# 2. Common: previous pod crashed and volume is still attached to old node
# Check VolumeAttachment objects
kubectl get volumeattachment | grep <pv-name>
kubectl delete volumeattachment <stuck-attachment>   # force detach (risky if node still up)

# 3. Check node-side CSI plugin
kubectl logs -n kube-system -l app=ebs-csi-node -c ebs-plugin --tail=50   # container name varies by driver and chart version

# 4. Check if node has reached attachment limit (AWS: m5.xlarge = 25 volumes)
# CSI drivers publish their per-node volume limit on the CSINode object
kubectl get csinode <node> -o jsonpath='{.spec.drivers[*].allocatable.count}'

Runbook: PVC Expand Not Completing

# 1. Check PVC conditions
kubectl describe pvc <name> | grep -A 5 Conditions
# If FileSystemResizePending: pod must be running to trigger NodeExpandVolume

# 2. Ensure pod is running (not crashed) — kubelet triggers filesystem resize on mount
# 3. Check kubelet logs for resize errors
journalctl -u kubelet | grep -i "resize" | tail -20

# 4. If StorageClass does not have allowVolumeExpansion:true, the patch will be rejected
kubectl get storageclass <name> -o yaml | grep allowVolumeExpansion

Runbook: Released PV Cannot Rebind

# PV is Released (previous PVC deleted) but new PVC won't bind to it
# The PV still has claimRef pointing to the old PVC

kubectl get pv <name> -o yaml | grep -A 6 claimRef
# claimRef:
#   name: old-pvc
#   namespace: production
#   uid: abc-123

# Remove claimRef to make PV Available again
kubectl patch pv <name> --type=json \
  -p '[{"op":"remove","path":"/spec/claimRef"}]'

Runbook: Inode Exhaustion (disk usage looks fine but writes fail)

# Symptoms: pod gets ENOSPC but df shows plenty of disk space
# Cause: filesystem has no free inodes (too many small files)

# Check inode usage
kubectl exec -it <pod> -- df -i
# Filesystem   Inodes   IUsed   IFree IUse% Mounted on
# /dev/xvda   6553600 6553600       0  100% /data   ← full inodes!

# Monitor via kubelet_volume_stats_inodes_used metric
# Long-term fix: increase PVC size (more inodes on ext4/xfs) or clean up small files

Section Roadmap

This overview covers the framework. Each subsequent file goes deep on one topic:

04/01
01-volumes.html All volume types in detail — emptyDir, configMap, secret, projected, hostPath, NFS, gitRepo (deprecated), CSI inline; subPath; volumeMounts lifecycle; init containers and shared volumes
04/02
02-persistent-volumes.html PV/PVC deep dive — binding algorithm, volumeClaimTemplates in StatefulSets, orphaned PVCs, finalizers, PVC protection, label selectors, capacity over-provisioning gotchas
04/03
03-storage-classes.html StorageClass deep dive — all provisioner parameters for EBS/GCE/Azure/Ceph; default StorageClass; allowedTopologies; bindingMode edge cases; StorageClass migration
04/04
04-csi-drivers.html CSI deep dive — driver deployment patterns; external sidecar responsibilities; NodeStage vs NodePublish; volume health monitoring; driver upgrade strategy; writing a minimal CSI driver
04/05
05-volume-snapshots.html Snapshot lifecycle; snapshot controller + CRD installation; pre-provisioned snapshots; restore to new PVC; cross-namespace clone; VolumeGroupSnapshot (alpha); backup strategies
04/06
06-stateful-storage-patterns.html StatefulSet storage patterns; volumeClaimTemplates; ordered provisioning; pod identity + stable storage; distributed DBs on Kubernetes (PostgreSQL, Cassandra, Kafka); Longhorn and Rook-Ceph
04/07
07-storage-capacity.html CSIStorageCapacity objects; capacity-aware scheduling; topology constraints; multi-zone capacity planning; node volume limits (AWS instance type limits); capacity monitoring

Best Practices

Use WaitForFirstConsumer for any zonal block storage (EBS, Azure Disk, GCE PD). Immediate binding creates cross-AZ attachment failures.
Default to Retain for production data. Override back to Delete only for ephemeral/test namespaces. With the cloud default (Delete), deleting a PVC, for example through helm uninstall of a chart that owns it, destroys the backing volume.
Set allowVolumeExpansion: true on every StorageClass. It costs nothing to enable and prevents emergency PV migrations when a database outgrows its disk.
Alert on PVC fill >85%. Online CSI expansion still takes time to complete, so the alert has to fire while there is headroom left, not when the volume is already at 100%.
Use volumeClaimTemplates in StatefulSets, not a single shared PVC. Each pod instance gets its own isolated volume with a stable name (data-pod-0, data-pod-1).
Never use hostPath for production workloads. It bypasses all Kubernetes storage accounting, has no access mode enforcement, and is a security risk. Use local PVs instead.
Use RWOP instead of RWO for single-writer workloads (GA 1.29). It prevents split-brain from two pods accidentally claiming the same volume when a node is partitioned.
Monitor inode usage separately from bytes. Workloads creating many small files (build caches, log aggregators, ML feature stores) exhaust inodes long before disk space.
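
Several of these practices can be combined in one manifest. The sketch below is a minimal StatefulSet with volumeClaimTemplates and RWOP. The names, image, and sizes are hypothetical, and a headless Service named app is assumed to exist.

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: app                       # hypothetical name
spec:
  serviceName: app                # headless Service assumed to exist
  replicas: 2
  selector:
    matchLabels:
      app: app
  template:
    metadata:
      labels:
        app: app
    spec:
      containers:
        - name: app
          image: busybox:1.36     # placeholder image
          command: ["sleep", "3600"]
          volumeMounts:
            - name: data
              mountPath: /data
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes: [ReadWriteOncePod]   # single writer per volume
        storageClassName: gp3-encrypted   # WaitForFirstConsumer, expansion enabled
        resources:
          requests:
            storage: 20Gi
```

Each replica gets its own claim, named data-app-0 and data-app-1, and keeps it across rescheduling.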
