Storage Overview

The complete map of Kubernetes storage — from ephemeral scratch space to durable, replicated block and file volumes — with the primitives, lifecycle, and decision framework you need to choose the right storage type for every workload.

Section 04 of 13 | File 1 of 8 | Platform Engineer
What This Page Covers
  • Kubernetes storage taxonomy — ephemeral vs persistent, inline vs external
  • All ephemeral volume types: emptyDir, configMap, secret, downwardAPI, projected, generic ephemeral, CSI ephemeral
  • Persistent volume lifecycle: PV → PVC → bind → mount → use → reclaim
  • PersistentVolume spec anatomy — capacity, accessModes, volumeMode, reclaim policy, storageClassName, nodeAffinity
  • PersistentVolumeClaim spec — resource requests, selector, volumeMode, volumeName, binding modes
  • StorageClass fields — provisioner, parameters, reclaimPolicy, volumeBindingMode, allowVolumeExpansion, mountOptions
  • Static vs dynamic provisioning — full comparison with example manifests
  • Access modes in depth — RWO/ROX/RWX/RWOP definitions, CSI driver support matrix, cloud-provider reality
  • Volume modes — Filesystem vs Block, use cases for raw block devices
  • Reclaim policies — Retain (operator action required), Delete (cloud default), Recycle (deprecated)
  • Binding modes — Immediate vs WaitForFirstConsumer and why it matters for topology
  • Volume expansion — allowVolumeExpansion, online vs offline resize, FileSystemResizePending condition
  • CSI architecture overview — external provisioner/attacher/resizer/snapshotter sidecars, node plugin, driver registration
  • Storage capacity tracking (GA 1.24) — CSIStorageCapacity object, scheduler capacity awareness
  • Local volumes — local PV, nodeAffinity requirement, WaitForFirstConsumer, no dynamic provisioner
  • Encryption at rest — cloud KMS integration, etcd encryption for Secrets, CSI driver-level encryption
  • Volume snapshots overview — VolumeSnapshotClass, VolumeSnapshot, VolumeSnapshotContent (deep-dived in 05)
  • Cross-cutting storage decisions — access pattern matrix, cloud provider support table
  • Section roadmap — links and summaries for all 7 subsequent storage files
  • 6 metrics + 4 alerting rules + 5 troubleshooting runbooks
  • 8 best practices for production storage

    Storage Taxonomy

    Kubernetes storage breaks down along two axes: lifetime (ephemeral vs persistent) and source (cluster-native vs external CSI driver). Where a volume type falls on these axes determines what happens to its data when a pod is deleted or rescheduled, or when its node fails.

    ┌─────────────────────────────────────────────────────────────────────┐
    │                     KUBERNETES STORAGE TAXONOMY                     │
    │                                                                     │
    │   EPHEMERAL (tied to pod lifetime)                                  │
    │   ┌─────────────┬──────────────┬────────────┬─────────────────────┐ │
    │   │  emptyDir   │  configMap   │  secret    │  downwardAPI        │ │
    │   │  (scratch)  │  (config     │  (creds,   │  (pod metadata)     │ │
    │   │             │   files)     │   tokens)  │                     │ │
    │   ├─────────────┴──────────────┴────────────┴─────────────────────┤ │
    │   │  projected  (combines configMap+secret+downwardAPI+SAToken)   │ │
    │   ├───────────────────────────────────────────────────────────────┤ │
    │   │  generic ephemeral volume  (PVC created/deleted with pod)     │ │
    │   │  CSI ephemeral volume      (inline CSI, no PVC object)        │ │
    │   └───────────────────────────────────────────────────────────────┘ │
    │                                                                     │
    │   PERSISTENT (outlives pod)                                         │
    │   ┌───────────────────────────────────────────────────────────────┐ │
    │   │  PersistentVolume (PV)  ←bound to→  PersistentVolumeClaim     │ │
    │   │                                         (PVC)                 │ │
    │   │  Provisioned by:                                              │ │
    │   │    • Static:  admin creates PV manually                       │ │
    │   │    • Dynamic: StorageClass triggers CSI driver                │ │
    │   │                                                               │ │
    │   │  Backed by:                                                   │ │
    │   │    Cloud block (EBS, Persistent Disk, Azure Disk)             │ │
    │   │    Cloud file (EFS, Filestore, Azure Files)                   │ │
    │   │    Network (NFS, iSCSI, Ceph RBD, CephFS, GlusterFS)          │ │
    │   │    Local (local PV, node affinity required)                   │ │
    │   └───────────────────────────────────────────────────────────────┘ │
    └─────────────────────────────────────────────────────────────────────┘
    

    Ephemeral Volume Types

    • emptyDir: An empty directory created when the pod is assigned to a node. It is backed by node disk by default. With medium: Memory it is backed by tmpfs (RAM) instead, and its usage counts against the container's memory limit. All containers in the pod share it. It survives container restarts but is deleted when the pod is removed from the node.
    • configMap: ConfigMap keys projected as files. Updates to the ConfigMap reach the mounted files eventually: the kubelet sync period (1 minute by default) plus cache propagation delay. Files mounted with subPath never receive updates. You can select specific keys with items, and you can mark the volume optional.
    • secret: Secret keys projected as files, with mode 0644 by default (set defaultMode to tighten it). Same update semantics as configMap. Backed by tmpfs on the node, so the data is never written to node disk.
    • downwardAPI: Pod metadata (name, namespace, labels, annotations) and container resource requests/limits exposed as files. Useful for apps that need to know their own identity at runtime.
    • projected: Combines configMap, secret, downwardAPI, and serviceAccountToken sources into a single mount directory. The service account token has a configurable lifetime (expirationSeconds), and the kubelet rotates it.
    • Generic ephemeral: The pod spec includes an inline PVC template. The PVC is created with the pod. Because the pod owns it, the PVC is garbage-collected when the pod is deleted. It works with any StorageClass that supports dynamic provisioning, which enables ephemeral use of cloud SSDs.
    • CSI ephemeral: An inline CSI volume; no PVC or PV objects are created. The driver must declare volumeLifecycleModes: Ephemeral in its CSIDriver object. Used for secrets stores (Secrets Store CSI Driver) and node-local caches.
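
    The sketch below combines two of the types listed above in one Pod: a memory-backed emptyDir with a size cap, and a generic ephemeral volume built from an inline claim template. The Pod name, image, mount paths, and sizes are hypothetical. gp3-encrypted is the StorageClass defined in the StorageClass section; any dynamically provisioning class would work.

    apiVersion: v1
    kind: Pod
    metadata:
      name: scratch-demo                 # hypothetical name
    spec:
      containers:
        - name: app
          image: busybox:1.36            # placeholder image
          command: ["sleep", "3600"]
          volumeMounts:
            - name: ram-scratch
              mountPath: /tmp/scratch
            - name: ssd-scratch
              mountPath: /scratch
      volumes:
        - name: ram-scratch
          emptyDir:
            medium: Memory               # tmpfs; usage counts against the container memory limit
            sizeLimit: 256Mi
        - name: ssd-scratch
          ephemeral:                     # generic ephemeral volume
            volumeClaimTemplate:
              spec:
                accessModes: [ReadWriteOnce]
                storageClassName: gp3-encrypted
                resources:
                  requests:
                    storage: 100Gi

    The generated PVC is named after the Pod and the volume (here scratch-demo-ssd-scratch). It is garbage-collected when the Pod is deleted.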

    PersistentVolume Lifecycle

    Every persistent volume in Kubernetes goes through a defined lifecycle. Understanding each phase prevents data loss and leaked cloud resources.

    1. Provision: static or dynamic
    2. Bind: PVC matched to PV (PV phase Available → Bound)
    3. In Use: pod mounts the volume
    4. Released: PVC deleted, PV phase Released
    5. Reclaim: Retain or Delete

    PV Phase Values

    Phase | Meaning | Next action
    Available | PV exists, not bound to any PVC | Wait for a matching PVC
    Bound | PV is bound to exactly one PVC | Pod can mount
    Released | PVC was deleted; data still on the storage backend | Admin reclaims or deletes
    Failed | Automatic reclamation failed | Manual intervention
    ⚠️ Released ≠ Available: A Released PV is not rebound to a new PVC automatically, because its claimRef still points to the deleted PVC. To reuse a Retained PV, clear the claimRef manually: kubectl patch pv <name> -p '{"spec":{"claimRef":null}}'. The previous claim's data is still on the volume, so wipe it first if the new consumer must not see it.

    PersistentVolume Anatomy

    apiVersion: v1
    kind: PersistentVolume
    metadata:
      name: pv-example
      annotations:
        pv.kubernetes.io/provisioned-by: ebs.csi.aws.com   # set by dynamic provisioner
    spec:
      capacity:
        storage: 20Gi                     # used for PVC matching; Kubernetes does not check it against the backend
      accessModes:
        - ReadWriteOnce                   # RWO: single node at a time
      volumeMode: Filesystem              # or Block for raw device
      persistentVolumeReclaimPolicy: Retain
      storageClassName: gp3-encrypted
      mountOptions:
        - noatime
        - discard
      csi:
        driver: ebs.csi.aws.com
        volumeHandle: vol-0abc123def456789
        fsType: ext4
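        # volumeAttributes holds driver-defined context passed to the node plugin;
        # putting a gp3 throughput value here does not change the EBS volume
      nodeAffinity:                       # set by topology-aware provisioners; required for local PVs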
        required:
          nodeSelectorTerms:
            - matchExpressions:
                - key: topology.ebs.csi.aws.com/zone
                  operator: In
                  values: [us-east-1a]

    Access Modes

    Mode | Short | Meaning | Typical drivers
    ReadWriteOnce | RWO | One node can mount read-write. Multiple pods on the same node can use it simultaneously. | EBS, Azure Disk, GCE PD, local
    ReadOnlyMany | ROX | Many nodes can mount read-only. | EFS, NFS, CephFS, Azure Files
    ReadWriteMany | RWX | Many nodes can mount read-write. | EFS, CephFS, GlusterFS, NFS, Azure Files (SMB/NFS)
    ReadWriteOncePod | RWOP | Only a single pod cluster-wide can mount read-write. GA in 1.29. Stronger than RWO. | Any CSI driver (enforced by Kubernetes, not the driver)
    ℹ️ RWO vs RWOP: RWO allows multiple pods on the same node to share a volume. RWOP enforces single-pod access cluster-wide at the Kubernetes layer, so the CSI driver doesn't need to implement it. RWOP is the right choice for any workload where shared access would cause data corruption.
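
    Requesting RWOP only changes the access mode on the claim. A minimal sketch, assuming the CSI-backed gp3-encrypted class from the StorageClass section and a hypothetical claim name:

    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: single-writer-data           # hypothetical name
    spec:
      accessModes:
        - ReadWriteOncePod               # at most one pod in the cluster may use this claim
      storageClassName: gp3-encrypted    # must be CSI-backed; RWOP is only supported for CSI volumes
      resources:
        requests:
          storage: 20Gi

    If a second pod references this claim while the first is still using it, the scheduler leaves the second pod Pending instead of letting it share the volume.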

    Volume Modes

    volumeMode | What the pod sees | Use case
    Filesystem (default) | A formatted filesystem mounted at a directory | Almost all workloads: databases, app data, logs
    Block | A raw block device at the devicePath set in the container spec (e.g. /dev/xvda) | Software that manages the device directly (Ceph OSDs with BlueStore, Oracle ASM, KubeVirt VM disks)

    With volumeMode: Block, the container spec uses volumeDevices instead of volumeMounts:

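    apiVersion: v1
    kind: Pod
    metadata:
      name: block-consumer               # hypothetical name
    spec:
      containers:
        - name: app
          image: busybox:1.36            # placeholder: the real workload must open the device itself
          command: ["sleep", "3600"]
          volumeDevices:                 # used instead of volumeMounts
            - name: data
              devicePath: /dev/xvda      # raw block device inside the container; no filesystem layer
      volumes:
        - name: data
          persistentVolumeClaim:
            claimName: raw-block-pvc     # must be a PVC with volumeMode: Block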
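
    The raw-block-pvc referenced above is an ordinary claim with volumeMode: Block. A sketch, assuming the gp3-encrypted class (EBS supports block mode); the size is hypothetical:

    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: raw-block-pvc
    spec:
      accessModes: [ReadWriteOnce]
      volumeMode: Block                  # no filesystem is created on the volume
      storageClassName: gp3-encrypted
      resources:
        requests:
          storage: 50Gi                  # hypothetical size

    The claim's volumeMode must match the PV's. A Block claim never binds to a Filesystem PV, and vice versa.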

    PersistentVolumeClaim Anatomy

    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: postgres-data
      namespace: production
    spec:
      accessModes:
        - ReadWriteOnce
      volumeMode: Filesystem
      resources:
        requests:
          storage: 20Gi             # minimum size; provisioners may round up to the backend's allocation unit
      storageClassName: gp3-encrypted
      # selector:                   # for static binding only
      #   matchLabels:
      #     type: fast-ssd
      # volumeName: pv-specific     # pin to a specific PV (static only)

    Binding Logic

    The PV controller in kube-controller-manager binds a PVC to the smallest PV that satisfies all of the following:

    1. storageClassName matches (or both are empty)
    2. accessModes requested ⊆ accessModes available
    3. volumeMode matches
    4. capacity ≥ requested storage
    5. selector labels match (if specified)
    ⚠️ Capacity over-provisioning: If the smallest matching PV is 100Gi and the PVC requests 20Gi, the PVC binds to the 100Gi PV. The claim then reports 100Gi of capacity and the pod can use all of it. The extra 80Gi is unavailable to any other claim, and PVs cannot be shrunk. Dynamic provisioning avoids this by creating a volume of the requested size.
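
    Selector-based matching (rule 5) lets a claim target a family of pre-created PVs without naming one. A sketch with hypothetical names, label, and NFS server. Both objects use storageClassName: "" so that dynamic provisioning and the default class stay out of the picture.

    apiVersion: v1
    kind: PersistentVolume
    metadata:
      name: fast-ssd-pv-01               # hypothetical
      labels:
        type: fast-ssd
    spec:
      capacity:
        storage: 100Gi
      accessModes: [ReadWriteMany]
      persistentVolumeReclaimPolicy: Retain
      storageClassName: ""
      nfs:
        server: nfs.example.internal     # hypothetical NFS server
        path: /exports/fast-01
    ---
    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: fast-ssd-claim
    spec:
      accessModes: [ReadWriteMany]       # rule 2: requested modes are offered by the PV
      storageClassName: ""               # rule 1: both sides empty
      selector:
        matchLabels:
          type: fast-ssd                 # rule 5: only PVs with this label are candidates
      resources:
        requests:
          storage: 20Gi                  # rule 4: 100Gi >= 20Gi, so the PV qualifies

    Because the only candidate is 100Gi, this claim binds to it and reports 100Gi of capacity. That is exactly the over-provisioning case in the warning above.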

    StorageClass

    A StorageClass is the template used by the dynamic provisioner to create PVs on demand. It encodes all provisioner-specific parameters so consumers need only specify a class name.

    apiVersion: storage.k8s.io/v1
    kind: StorageClass
    metadata:
      name: gp3-encrypted
      annotations:
        storageclass.kubernetes.io/is-default-class: "true"
    provisioner: ebs.csi.aws.com
    parameters:
      type: gp3
      iops: "3000"                 # fixed IOPS (3000 is the gp3 baseline); iopsPerGB would scale IOPS with size
      throughput: "125"
      encrypted: "true"
      kmsKeyId: arn:aws:kms:us-east-1:123456789012:key/mrk-abc123
    reclaimPolicy: Delete          # Delete or Retain
    volumeBindingMode: WaitForFirstConsumer   # or Immediate
    allowVolumeExpansion: true
    mountOptions:
      - noatime

    Reclaim Policies

    Policy | What happens when the PVC is deleted | Production guidance
    Delete | The PV and the underlying cloud volume are deleted automatically | ✓ Default for cloud StorageClasses
    Retain | PV moves to Released; data is preserved; an admin must clean up | Preferred for stateful data (safer)
    Recycle | Deprecated. Performs a basic scrub (rm -rf /thevolume/*) and makes the PV Available again; only NFS and hostPath support it | ✗ Do not use
    🔴 Default cloud StorageClasses use Delete: On EKS, GKE, and AKS, the default StorageClass uses reclaimPolicy: Delete. If a PVC is deleted, the volume and its data are deleted with it. A common way this happens is helm uninstall on a chart that creates the PVC directly rather than through StatefulSet volumeClaimTemplates. Use Retain for any production database, or back up with a tool such as Velero.
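
    One way to act on this is a second class that differs from gp3-encrypted mainly in its reclaim policy. A sketch; the class name gp3-retain is hypothetical, and the parameters mirror the StorageClass example above:

    apiVersion: storage.k8s.io/v1
    kind: StorageClass
    metadata:
      name: gp3-retain                   # hypothetical; not marked as the default class
    provisioner: ebs.csi.aws.com
    parameters:
      type: gp3
      encrypted: "true"
    reclaimPolicy: Retain                # PV and EBS volume survive PVC deletion
    volumeBindingMode: WaitForFirstConsumer
    allowVolumeExpansion: true

    Each PV copies reclaimPolicy from its class when it is provisioned. Existing PVs keep their policy and must be changed individually through spec.persistentVolumeReclaimPolicy.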

    Binding Modes

    Mode | When the PV is provisioned | Use case
    Immediate | When the PVC is created; the provisioner runs immediately, regardless of pod scheduling | NFS, CephFS, EFS: topology-agnostic storage
    WaitForFirstConsumer | When a pod using the PVC is scheduled; the provisioner knows the node's zone | EBS, Azure Disk, GCE PD: zonal block storage must land in the pod's AZ
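    ℹ️ Why WaitForFirstConsumer matters: With Immediate, the provisioner picks a zone (say us-east-1a) before the scheduler has considered the pod, and the PV's node affinity then pins the pod to us-east-1a. The pod's other constraints (node selectors, affinity rules, available CPU or memory) may only be satisfiable in us-east-1b. In that case the pod stays Pending, because an EBS volume cannot attach across zones. WaitForFirstConsumer delays provisioning until the scheduler has chosen a node, so the volume is created in that node's zone.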
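
    For topology-agnostic storage, Immediate is safe because the volume is reachable from any zone that has an EFS mount target. A sketch for EFS dynamic provisioning with access points; the class name is hypothetical and fileSystemId is a placeholder:

    apiVersion: storage.k8s.io/v1
    kind: StorageClass
    metadata:
      name: efs-shared                   # hypothetical
    provisioner: efs.csi.aws.com
    parameters:
      provisioningMode: efs-ap           # one EFS access point per PV
      fileSystemId: fs-0123456789abcdef0 # placeholder: an existing EFS file system
      directoryPerms: "700"
    volumeBindingMode: Immediate         # EFS is regional, so early provisioning cannot pick a wrong zone
    reclaimPolicy: Delete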

    Static vs Dynamic Provisioning

    Aspect | Static provisioning | Dynamic provisioning
    PV creation | Admin creates PV manifests manually | StorageClass + provisioner create PVs on demand
    Workflow | Admin creates the volume in the cloud console → writes PV YAML → applies it | Developer creates a PVC; done
    Flexibility | Admin controls the exact volume ID, size, and IOPS | Controlled by StorageClass parameters
    Self-service | No; requires an admin for each volume | Yes
    Pre-existing data | Yes; can import existing cloud volumes | Always creates a new volume, optionally pre-populated from a VolumeSnapshot or a PVC clone via dataSource
    Use case | Migration, production databases with specific volume IDs, local PVs | General purpose; the recommended default

    Static Provisioning Example (import existing EBS volume)

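    apiVersion: v1
    kind: PersistentVolume
    metadata:
      name: imported-prod-db
    spec:
      capacity:
        storage: 500Gi                  # should match the existing volume's size
      accessModes: [ReadWriteOnce]
      persistentVolumeReclaimPolicy: Retain
      storageClassName: ""              # empty = not managed by a StorageClass
      claimRef:                         # reserve this PV so no other claim can bind it first
        namespace: production
        name: prod-db-claim
      csi:
        driver: ebs.csi.aws.com
        volumeHandle: vol-0existingvolumeid   # existing EBS volume ID
        fsType: xfs                     # must match the filesystem already on the volume
      nodeAffinity:                     # EBS is zonal: pods must run in the volume's AZ
        required:
          nodeSelectorTerms:
            - matchExpressions:
                - key: topology.ebs.csi.aws.com/zone
                  operator: In
                  values: [us-east-1a]  # replace with the volume's actual AZ
    ---
    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: prod-db-claim
      namespace: production
    spec:
      accessModes: [ReadWriteOnce]
      resources:
        requests:
          storage: 500Gi
      storageClassName: ""
      volumeName: imported-prod-db      # pin to specific PV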

    Volume Expansion

    Expanding a PVC requires the StorageClass to have allowVolumeExpansion: true. You resize by editing the PVC's spec.resources.requests.storage to a larger value — you cannot shrink a PVC.

    # Expand PVC from 20Gi to 50Gi (postgres-data lives in the production namespace)
    kubectl patch pvc postgres-data -n production -p '{"spec":{"resources":{"requests":{"storage":"50Gi"}}}}'
    
    # Watch status
    kubectl get pvc postgres-data -n production -w
    # NAME           STATUS   VOLUME              CAPACITY   ACCESS MODES
    # postgres-data  Bound    pvc-abc...          20Gi       RWO       # initially
    # postgres-data  Bound    pvc-abc...          50Gi       RWO       # after expansion
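
    In a GitOps workflow, the same resize is usually made by editing the manifest rather than patching. Below is the postgres-data claim from the PVC anatomy section with the new size. Applying it with kubectl apply triggers the same expansion, provided the class has allowVolumeExpansion: true.

    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: postgres-data
      namespace: production
    spec:
      accessModes:
        - ReadWriteOnce
      volumeMode: Filesystem
      resources:
        requests:
          storage: 50Gi                  # was 20Gi; only increases are accepted
      storageClassName: gp3-encrypted

    The other spec fields must match the existing object, because most PVC spec fields cannot change after creation.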

    Offline vs Online Expansion

    Type | Requires pod restart | Mechanism
    Controller expansion (cloud volume resize) | No; the CSI driver resizes the cloud volume while the pod runs | CSI ControllerExpandVolume
    Node expansion (filesystem resize) | Sometimes; driver-dependent | CSI NodeExpandVolume runs resize2fs/xfs_growfs
    Offline resize (legacy) | Yes; the pod must be deleted to unmount the volume first | Required for plugins that don't support online node expansion

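    The PVC condition FileSystemResizePending indicates that the backend volume has been resized but the filesystem on the node hasn't been expanded yet. If the driver supports online expansion, the kubelet expands the filesystem of the mounted volume without a restart. Otherwise, the condition clears once a pod mounts the volume again and the kubelet runs the filesystem resize.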

    kubectl describe pvc postgres-data -n production | grep -A 3 Conditions
    # Conditions:
    #   Type                      Status
    #   FileSystemResizePending   True    # backend volume resized; filesystem resize still pending
    

    CSI Architecture Overview

    The Container Storage Interface (CSI) is the standard API between Kubernetes and storage drivers. Every production storage driver (EBS, Azure Disk, Ceph, etc.) implements CSI. The architecture separates the driver into two halves: a controller plugin (runs as a Deployment, calls cloud APIs) and a node plugin (runs as a DaemonSet, handles node-level mount/format/unmount).

    ┌────────────────────────────────────────────────────────────────┐
    │                        CSI ARCHITECTURE                        │
    │                                                                │
    │  ┌──────────────────────────────────────────────────────────┐  │
    │  │  CONTROLLER SIDE (Deployment)                            │  │
    │  │                                                          │  │
    │  │  ┌──────────────────────┐    ┌────────────────────────┐  │  │
    │  │  │  CSI Driver          │    │  Kubernetes Sidecars   │  │  │
    │  │  │  (controller plugin) │◄───│  external-provisioner  │  │  │
    │  │  │                      │    │  external-attacher     │  │  │
    │  │  │  Implements:         │    │  external-resizer      │  │  │
    │  │  │  CreateVolume        │    │  external-snapshotter  │  │  │
    │  │  │  DeleteVolume        │    └────────────────────────┘  │  │
    │  │  │  ControllerPublish   │                                │  │
    │  │  │  ControllerExpand    │                                │  │
    │  │  └──────────────────────┘                                │  │
    │  └──────────────────────────────────────────────────────────┘  │
    │                                                                │
    │  ┌──────────────────────────────────────────────────────────┐  │
    │  │  NODE SIDE (DaemonSet, runs on every worker node)        │  │
    │  │                                                          │  │
    │  │  ┌──────────────────────┐    ┌────────────────────────┐  │  │
    │  │  │  CSI Driver          │    │  node-driver-registrar │  │  │
    │  │  │  (node plugin)       │◄───│  (registers with       │  │  │
    │  │  │                      │    │   kubelet plugin dir)  │  │  │
    │  │  │  Implements:         │    └────────────────────────┘  │  │
    │  │  │  NodeStageVolume     │                                │  │
    │  │  │  NodePublishVolume   │    ← kubelet calls these       │  │
    │  │  │  NodeExpandVolume    │      via gRPC Unix socket      │  │
    │  │  │  NodeUnpublishVolume │                                │  │
    │  │  │  NodeUnstageVolume   │                                │  │
    │  │  └──────────────────────┘                                │  │
    │  └──────────────────────────────────────────────────────────┘  │
    │                                                                │
    │  CSIDriver (cluster-scoped object): declares capabilities      │
    │  (podInfoOnMount, attachRequired, volumeLifecycleModes, etc.)  │
    └────────────────────────────────────────────────────────────────┘
    

    CSIDriver Object

    apiVersion: storage.k8s.io/v1
    kind: CSIDriver
    metadata:
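      name: ebs.csi.aws.com
    # field values below illustrate the API; compare with: kubectl get csidriver ebs.csi.aws.com -o yaml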
    spec:
      attachRequired: true              # driver manages attach/detach (most block drivers)
      podInfoOnMount: true              # kubelet passes pod name/namespace to NodePublish
      volumeLifecycleModes:
        - Persistent                    # supports PV/PVC
        # - Ephemeral                   # supports inline CSI ephemeral volumes
      fsGroupPolicy: File               # how fsGroup chown works: None/File/ReadWriteOnceWithFSType
      tokenRequests:                    # request audience-specific tokens (e.g., for cloud auth)
        - audience: sts.amazonaws.com
          expirationSeconds: 86400
      requiresRepublish: false

    Storage Capacity Tracking

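    GA in Kubernetes 1.24. When enabled, the external-provisioner sidecar calls the driver's GetCapacity RPC. It then publishes CSIStorageCapacity objects that report available capacity per StorageClass and topology segment. The kube-scheduler uses this information to avoid placing pods on nodes where the required storage capacity isn't available. This matters most for drivers with finite per-node or per-pool capacity, such as local LVM-based drivers. Cloud block drivers typically have no meaningful capacity limit to report, so the EBS values below are illustrative only.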

    # CSIStorageCapacity is created automatically by external-provisioner
    apiVersion: storage.k8s.io/v1
    kind: CSIStorageCapacity
    metadata:
      name: capacity-us-east-1a-gp3
      namespace: kube-system
    storageClassName: gp3-encrypted
    nodeTopology:
      matchLabels:
        topology.ebs.csi.aws.com/zone: us-east-1a
    capacity: 5Ti                # available capacity in this zone

    Capacity tracking only affects scheduling for StorageClasses with volumeBindingMode: WaitForFirstConsumer. With Immediate binding, the volume is provisioned before any pod is scheduled. To turn tracking on, run the external-provisioner sidecar with --enable-capacity. Then set spec.storageCapacity: true in the driver's CSIDriver object so the scheduler consults the published objects.
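
    A minimal sketch of the driver-side switch, assuming a hypothetical driver named local.csi.example.com that implements GetCapacity. The external-provisioner sidecar in that driver's deployment must also run with --enable-capacity for any CSIStorageCapacity objects to exist.

    apiVersion: storage.k8s.io/v1
    kind: CSIDriver
    metadata:
      name: local.csi.example.com        # hypothetical driver name
    spec:
      attachRequired: false              # typical for node-local storage
      storageCapacity: true              # scheduler checks CSIStorageCapacity for this driver
      volumeLifecycleModes:
        - Persistent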

    Local Volumes

    Local PVs expose a node-local disk, partition, or directory as a PV, with RWO access and block-mode support. Unlike hostPath, local PVs carry node affinity, so the scheduler places pods that use them on the correct node. There is no dynamic provisioner for local PVs. The local static provisioner automates PV creation for disks found under a discovery directory.

    apiVersion: v1
    kind: PersistentVolume
    metadata:
      name: local-ssd-node1
    spec:
      capacity:
        storage: 400Gi
      accessModes:
        - ReadWriteOnce
      persistentVolumeReclaimPolicy: Retain   # Retain for hand-made PVs; the local static provisioner can also handle Delete
      storageClassName: local-ssd
      volumeMode: Filesystem
      local:
        path: /mnt/disks/ssd1
      nodeAffinity:                 # REQUIRED — ties PV to a specific node
        required:
          nodeSelectorTerms:
            - matchExpressions:
                - key: kubernetes.io/hostname
                  operator: In
                  values: [worker-node-1]
    ---
    apiVersion: storage.k8s.io/v1
    kind: StorageClass
    metadata:
      name: local-ssd
    provisioner: kubernetes.io/no-provisioner   # no dynamic provisioner
    volumeBindingMode: WaitForFirstConsumer     # REQUIRED for local PVs
    ⚠️ Node failure with local PVs: If the node holding a local PV fails, a pod using it stays Pending until the node comes back. Local PVs have no replication; the data is tied to a single node. Use a distributed storage system (Ceph, Longhorn) if you need durability without node affinity.

    Encryption at Rest

    Cloud KMS Integration

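    Most cloud CSI drivers support volume-level encryption with a customer-managed cloud KMS key, specified in the StorageClass parameters. GCE PD and Azure Disk already encrypt data at rest with provider-managed keys; these parameters select your own key. On AWS, EBS volumes are encrypted only when requested or when account-level encryption by default is enabled.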

    # AWS EBS gp3 with KMS
    parameters:
      encrypted: "true"
      kmsKeyId: arn:aws:kms:us-east-1:123456:key/mrk-abc123
    ---
    # GCE PD with Cloud KMS
    parameters:
      disk-encryption-kms-key: projects/my-proj/locations/us-east1/keyRings/my-ring/cryptoKeys/my-key
    ---
    # Azure Disk with disk encryption set
    parameters:
      diskEncryptionSetID: /subscriptions/.../diskEncryptionSets/my-des

    etcd Encryption (Secrets)

    Kubernetes Secrets are stored in etcd unencrypted by default; base64 is only how the API represents Secret data, not a form of protection. Pass an EncryptionConfiguration to the API server with --encryption-provider-config to encrypt Secrets at rest using AES-GCM, AES-CBC, secretbox, or a KMS provider:

    apiVersion: apiserver.config.k8s.io/v1
    kind: EncryptionConfiguration
    resources:
      - resources: [secrets]
        providers:
          - aescbc:
              keys:
                - name: key1
                  secret: <base64-encoded-32-byte-key>
          - identity: {}   # lets the API server read Secrets written before encryption was enabled
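
    For production clusters, the Kubernetes documentation favors a KMS provider over keys stored in the config file. A sketch of the equivalent configuration using KMS v2; the plugin name and socket path are hypothetical and depend on the KMS plugin you deploy:

    apiVersion: apiserver.config.k8s.io/v1
    kind: EncryptionConfiguration
    resources:
      - resources: [secrets]
        providers:
          - kms:
              apiVersion: v2
              name: my-kms-plugin                        # hypothetical plugin name
              endpoint: unix:///var/run/kms-plugin.sock  # hypothetical socket path
              timeout: 3s
          - identity: {}   # still needed to read Secrets written before encryption

    Enabling encryption only affects Secrets written afterwards. Existing Secrets are encrypted when they are rewritten, for example with kubectl get secrets --all-namespaces -o json | kubectl replace -f -.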

    Volume Snapshots Overview

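    Volume snapshots provide point-in-time copies of PVs for backup and restore. They use three custom resources that are not part of core Kubernetes. Their CRDs and the snapshot controller come from the Kubernetes CSI external-snapshotter project, and some managed distributions preinstall them.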

    Resource | Scope | Analogous to
    VolumeSnapshotClass | Cluster | StorageClass: defines the driver and deletion policy
    VolumeSnapshot | Namespace | PVC: a user's request for a snapshot
    VolumeSnapshotContent | Cluster | PV: the actual snapshot on the backend

    Deep coverage including CSI snapshot controller, restore workflow, and cross-namespace snapshot cloning is in 05-volume-snapshots.html.
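
    A minimal sketch of snapshotting the postgres-data claim and restoring it into a new claim, assuming the snapshot CRDs and controller are installed. The snapshot class and object names are hypothetical.

    apiVersion: snapshot.storage.k8s.io/v1
    kind: VolumeSnapshotClass
    metadata:
      name: ebs-snapclass                # hypothetical
    driver: ebs.csi.aws.com
    deletionPolicy: Retain               # keep the backend snapshot if the VolumeSnapshot is deleted
    ---
    apiVersion: snapshot.storage.k8s.io/v1
    kind: VolumeSnapshot
    metadata:
      name: postgres-data-snap
      namespace: production
    spec:
      volumeSnapshotClassName: ebs-snapclass
      source:
        persistentVolumeClaimName: postgres-data
    ---
    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: postgres-data-restored
      namespace: production
    spec:
      accessModes: [ReadWriteOnce]
      storageClassName: gp3-encrypted
      dataSource:                        # pre-populate the new volume from the snapshot
        apiGroup: snapshot.storage.k8s.io
        kind: VolumeSnapshot
        name: postgres-data-snap
      resources:
        requests:
          storage: 20Gi                  # at least the size of the snapshot's source volume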

    Storage Decision Matrix

    Workload need | Recommended type | Access mode | Notes
    Temporary scratch space (build cache, tmp files) | emptyDir | N/A | Use medium: Memory for a RAM disk
    App configuration / environment | configMap or secret volume | N/A | Auto-updates (except subPath mounts); each object is limited to 1 MiB
    Database (MySQL, PostgreSQL, MongoDB) | PVC with block storage (gp3, Premium SSD) | RWO or RWOP | WaitForFirstConsumer; StatefulSet + volumeClaimTemplates
    Shared read-write (NFS-style) | PVC with EFS/CephFS/Azure Files | RWX | Higher latency than block; avoid for random-I/O databases
    Shared read-only (media, ML models) | PVC with a ROX-capable driver | ROX | Or use an OCI artifact / object storage + init container
    High-performance local NVMe | Local PV | RWO | No portability; manual PV management or local-static-provisioner
    Ephemeral cloud SSD (larger than node disk) | Generic ephemeral volume | RWO | PVC created/deleted with the pod; works with any dynamically provisioning StorageClass
    Secrets store (Vault, AWS SSM) | CSI ephemeral (Secrets Store CSI Driver) | N/A | No PV/PVC; mounts secrets from an external store as files

    Cloud Provider Support Matrix

    DriverRWOROXRWXRWOPBlockSnapshotExpand
    AWS EBS (ebs.csi.aws.com)✓ online
    AWS EFS (efs.csi.aws.com)✓ auto
    GCE PD (pd.csi.storage.gke.io)✓ online
    GCE Filestore (filestore.csi.storage.gke.io)
    Azure Disk (disk.csi.azure.com)✓ online
    Azure Files (file.csi.azure.com)
    Ceph RBD (rbd.csi.ceph.com)✓ online
    CephFS (cephfs.csi.ceph.com)
    Local (kubernetes.io/no-provisioner)

    Metrics and Alerting

    Key Metrics

    Metric | Source | Alert threshold
    kubelet_volume_stats_used_bytes / kubelet_volume_stats_capacity_bytes | kubelet | > 85% → warn; > 95% → critical
    kubelet_volume_stats_inodes_used / kubelet_volume_stats_inodes | kubelet | > 85% inode usage
    kube_persistentvolumeclaim_status_phase | kube-state-metrics | phase != Bound for > 5m
    kube_persistentvolume_status_phase | kube-state-metrics | phase = Failed
    storage_operation_duration_seconds | kubelet | P99 mount time > 30s
    csi_operations_seconds | CSI driver (external sidecar metrics) | P99 provision time > 60s

    Alerting Rules

    groups:
    - name: storage
      rules:
      - alert: PVCDiskAlmostFull
        expr: |
          (kubelet_volume_stats_used_bytes / kubelet_volume_stats_capacity_bytes) > 0.85
        for: 5m
        labels: {severity: warning}
        annotations:
          summary: "PVC {{ $labels.persistentvolumeclaim }} is {{ $value | humanizePercentage }} full"
    
      - alert: PVCDiskCritical
        expr: |
          (kubelet_volume_stats_used_bytes / kubelet_volume_stats_capacity_bytes) > 0.95
        for: 2m
        labels: {severity: critical}
    
      - alert: PVCNotBound
        expr: kube_persistentvolumeclaim_status_phase{phase!="Bound"} == 1
        for: 10m
        labels: {severity: warning}
        annotations:
          summary: "PVC {{ $labels.persistentvolumeclaim }} stuck in {{ $labels.phase }}"
    
      - alert: PVFailed
        expr: kube_persistentvolume_status_phase{phase="Failed"} == 1
        for: 1m
        labels: {severity: critical}
        annotations:
          summary: "PV {{ $labels.persistentvolume }} is in Failed state"

    Troubleshooting Runbooks

    Runbook: PVC Stuck in Pending

    # 1. Describe the PVC
    kubectl describe pvc <name> -n <ns>
    # Events section will show binding failures
    
    # 2. Common causes:
    # "no persistent volumes available for this claim" → no matching static PV; wrong storageClassName
    # "waiting for a volume to be created" → provisioner can't create volume (check events + provisioner logs)
    # "waiting for first consumer" → WaitForFirstConsumer + no pod using PVC yet (expected)
    
    # 3. Check CSI driver controller logs
    kubectl logs -n kube-system -l app=ebs-csi-controller -c csi-provisioner --tail=50
    
    # 4. Check StorageClass exists and is spelled correctly
    kubectl get storageclass
    kubectl describe storageclass <name>

    Runbook: Pod Stuck in ContainerCreating (Volume Mount Failure)

    # 1. Describe pod to see volume mount error
    kubectl describe pod <name> -n <ns>
    # Look for: "Unable to attach or mount volumes" or "Multi-Attach error for volume"
    
    # 2. Common: previous pod crashed and volume is still attached to old node
    # Check VolumeAttachment objects
    kubectl get volumeattachment | grep <pv-name>
    kubectl delete volumeattachment <stuck-attachment>   # force detach (risky if node still up)
    
    # 3. Check node-side CSI plugin (the EBS node DaemonSet's driver container is ebs-plugin)
    kubectl logs -n kube-system -l app=ebs-csi-node -c ebs-plugin --tail=50
    
    # 4. Check whether the node has hit its attachment limit; CSI drivers report it in CSINode
    #    (on AWS Nitro instances, EBS volumes share attachment slots with network interfaces)
    kubectl get csinode <node> -o jsonpath='{.spec.drivers[?(@.name=="ebs.csi.aws.com")].allocatable.count}'

    Runbook: PVC Expand Not Completing

    # 1. Check PVC conditions
    kubectl describe pvc <name> | grep -A 5 Conditions
    # If FileSystemResizePending: pod must be running to trigger NodeExpandVolume
    
    # 2. Ensure pod is running (not crashed) — kubelet triggers filesystem resize on mount
    # 3. Check kubelet logs for resize errors
    journalctl -u kubelet | grep -i "resize" | tail -20   # run on the node hosting the pod
    
    # 4. If StorageClass does not have allowVolumeExpansion:true, the patch will be rejected
    kubectl get storageclass <name> -o yaml | grep allowVolumeExpansion

    Runbook: Released PV Cannot Rebind

    # PV is Released (previous PVC deleted) but new PVC won't bind to it
    # The PV still has claimRef pointing to the old PVC
    
    kubectl get pv <name> -o yaml | grep -A 6 claimRef
    # claimRef:
    #   name: old-pvc
    #   namespace: production
    #   uid: abc-123
    
    # Remove claimRef to make PV Available again
    kubectl patch pv <name> --type=json \
      -p '[{"op":"remove","path":"/spec/claimRef"}]'

    Runbook: Inode Exhaustion (disk usage looks fine but writes fail)

    # Symptoms: pod gets ENOSPC but df shows plenty of disk space
    # Cause: filesystem has no free inodes (too many small files)
    
    # Check inode usage
    kubectl exec -it <pod> -- df -i
    # Filesystem   Inodes   IUsed   IFree IUse% Mounted on
    # /dev/xvda   6553600 6553600       0  100% /data   ← full inodes!
    
    # Monitor via kubelet_volume_stats_inodes_used metric
    # Long-term fix: increase PVC size (more inodes on ext4/xfs) or clean up small files
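
    An inode alert in the same format as the alerting rules above could look like the sketch below. The group name, rule name, and 5m duration are assumptions; the 85% threshold matches the inode row in the metrics table.

    groups:
    - name: storage-inodes
      rules:
      - alert: PVCInodesAlmostExhausted
        expr: |
          (kubelet_volume_stats_inodes_used / kubelet_volume_stats_inodes) > 0.85
        for: 5m
        labels: {severity: warning}
        annotations:
          summary: "PVC {{ $labels.persistentvolumeclaim }} has used {{ $value | humanizePercentage }} of its inodes"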

    Section Roadmap

    This overview covers the framework. Each subsequent file goes deep on one topic:

    Best Practices

    1. Use WaitForFirstConsumer for any zonal block storage (EBS, Azure Disk, GCE PD). With Immediate binding, the volume can land in a zone where the pod cannot be scheduled.
    2. Default to Retain for production data. StorageClasses are cluster-scoped, so keep a separate Delete class for ephemeral or test workloads. With the cloud default (Delete), deleting a PVC destroys its data, for example through helm uninstall of a chart that owns the PVC.
    3. Set allowVolumeExpansion: true on every StorageClass whose driver supports expansion. It costs nothing to enable and prevents emergency PV migrations when a database outgrows its disk.
    4. Alert on PVC fill >85%. Online expansion still takes time, and AWS allows only one modification per EBS volume every six hours. Don't let a volume reach 100% before the alert fires.
    5. Use volumeClaimTemplates in StatefulSets, not a single shared PVC. Each pod gets its own volume with a stable name of the form <template>-<statefulset>-<ordinal> (data-postgres-0, data-postgres-1).
    6. Never use hostPath for production workloads. It bypasses all Kubernetes storage accounting, has no access mode enforcement, and is a security risk. Use local PVs instead.
    7. Use RWOP instead of RWO for single-writer workloads (GA 1.29). RWO still lets a second pod on the same node mount the volume, for example a new replica during a rollout. RWOP blocks any second pod cluster-wide.
    8. Monitor inode usage separately from bytes. Workloads creating many small files (build caches, log aggregators, ML feature stores) exhaust inodes long before disk space.
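
    Practices 1, 2, 5, and 7 come together in a StatefulSet. A sketch with hypothetical names; the Secret postgres-auth and the headless Service postgres are assumed to exist. gp3-retain stands in for a hypothetical class with reclaimPolicy: Retain and WaitForFirstConsumer binding.

    apiVersion: apps/v1
    kind: StatefulSet
    metadata:
      name: postgres
      namespace: production
    spec:
      serviceName: postgres              # headless Service, assumed to exist
      replicas: 2                        # each replica gets its own PVC
      selector:
        matchLabels:
          app: postgres
      template:
        metadata:
          labels:
            app: postgres
        spec:
          containers:
            - name: postgres
              image: postgres:16
              env:
                - name: POSTGRES_PASSWORD
                  valueFrom:
                    secretKeyRef:
                      name: postgres-auth        # hypothetical Secret
                      key: password
                - name: PGDATA                   # subdirectory avoids ext4's lost+found at the mount root
                  value: /var/lib/postgresql/data/pgdata
              volumeMounts:
                - name: data
                  mountPath: /var/lib/postgresql/data
      volumeClaimTemplates:
        - metadata:
            name: data                   # yields PVCs data-postgres-0 and data-postgres-1
          spec:
            accessModes: [ReadWriteOncePod]
            storageClassName: gp3-retain # hypothetical Retain + WaitForFirstConsumer class
            resources:
              requests:
                storage: 20Gi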