Storage Classes

The complete StorageClass reference — every field, all major cloud provisioner parameters, topology constraints, multi-tier design patterns, and how to safely migrate PVCs between StorageClasses without downtime.

What This Page Covers
  • StorageClass object model — cluster-scoped; relationship to PV/PVC/provisioner
  • Full StorageClass spec reference — provisioner, parameters, reclaimPolicy, volumeBindingMode, allowVolumeExpansion, mountOptions, allowedTopologies
  • provisioner field — CSI driver name; kubernetes.io/no-provisioner for local; legacy in-tree names
  • reclaimPolicy — Delete vs Retain; does not affect existing PVs
  • volumeBindingMode — Immediate vs WaitForFirstConsumer; topology-aware flow
  • allowVolumeExpansion — enabling on existing SC; per-driver support
  • mountOptions — passed to mount syscall; driver must support; debugging mount options
  • allowedTopologies — restrict provisioning to specific zones/regions
  • AWS EBS CSI driver — all gp2/gp3/io1/io2/st1/sc1 parameters; KMS encryption; throughput/IOPS tuning; multi-attach io1/io2
  • GCE Persistent Disk CSI driver — pd-standard/pd-ssd/pd-balanced/pd-extreme; replication-type; disk encryption key; provisioned-iops-on-create
  • Azure Disk CSI driver — Standard_LRS/Premium_LRS/UltraSSD_LRS/Premium_ZRS; cachingMode; enableBursting; networkAccessPolicy
  • Azure Files CSI driver — NFS vs SMB protocol; skuName; storageAccount; allowBlobPublicAccess
  • AWS EFS CSI driver — dynamic provisioning with access points; basePath; dirName; provisioningMode
  • Ceph RBD CSI — pool; imageFormat; imageFeatures; csi.storage.k8s.io/node-stage-secret-name
  • CephFS CSI — fsName; pool; mounter; kernelMountOptions
  • NFS subdir external provisioner — pathPattern; onDelete; archiveOnDelete
  • Multi-tier StorageClass design — fast/balanced/slow/shared tiers with example manifests
  • StorageClass for different teams — namespace-scoped access via ResourceQuota + LimitRange
  • Changing default StorageClass — annotation swap; multiple-defaults failure; retroactive assignment
  • StorageClass migration — cloning PVCs to new SC; Velero approach; zero-downtime migration runbook
  • StorageClass and VolumeSnapshotClass alignment — snapshotting must use compatible class
  • 4 alerting rules + 5 troubleshooting runbooks
  • 8 best practices

    Object Model

    A StorageClass is a cluster-scoped object that acts as a template for dynamic PV provisioning. It encodes everything the provisioner needs to create a volume on behalf of a PVC: which backend to call, what type of disk to create, how to handle reclaim, and topology constraints.

    StorageClass (cluster-scoped)
      name: gp3-encrypted
      provisioner: ebs.csi.aws.com    ──────────► CSI driver controller plugin
      parameters: {type: gp3, ...}              (running in kube-system as Deployment)
      reclaimPolicy: Delete
      volumeBindingMode: WaitForFirstConsumer
      allowVolumeExpansion: true
             │
             │  PVC created with storageClassName: gp3-encrypted
             ▼
      external-provisioner sidecar calls CSI CreateVolume
             │
             ▼
      PersistentVolume created (spec inherited from StorageClass + provisioner response)
      PVC bound to PV
    

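    StorageClasses are largely immutable once created. The API server rejects changes to parameters, provisioner, reclaimPolicy, and volumeBindingMode, so you cannot change the disk type or provisioner in place. To change any of these, create a new StorageClass and migrate PVCs (see StorageClass Migration). allowVolumeExpansion and mountOptions, by contrast, can be updated on an existing StorageClass.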
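    To make the rule concrete, here is a minimal Python sketch, not part of any Kubernetes client library. It compares two StorageClass manifests, held as plain dicts that the API server has already defaulted, and lists the edits an update would be rejected for. The immutable set is taken from Kubernetes' StorageClass update validation: parameters, provisioner, reclaimPolicy, and volumeBindingMode. The function names are hypothetical.

```python
# Hypothetical helper: which StorageClass edits would the API server reject?
# Assumes both manifests are plain dicts already defaulted by the API server.
IMMUTABLE_FIELDS = ("parameters", "provisioner", "reclaimPolicy", "volumeBindingMode")


def rejected_changes(old: dict, new: dict) -> list:
    """Return the immutable top-level fields whose values differ."""
    return [field for field in IMMUTABLE_FIELDS if old.get(field) != new.get(field)]


def requires_migration(old: dict, new: dict) -> bool:
    """True when the change can only be made by creating a new StorageClass."""
    return bool(rejected_changes(old, new))


if __name__ == "__main__":
    current = {
        "provisioner": "ebs.csi.aws.com",
        "parameters": {"type": "gp2"},
        "reclaimPolicy": "Delete",
        "volumeBindingMode": "WaitForFirstConsumer",
        "allowVolumeExpansion": False,
    }
    # Enabling expansion is not in the immutable set, so it can be patched in place.
    print(rejected_changes(current, {**current, "allowVolumeExpansion": True}))  # prints []
    # Changing the disk type touches parameters and forces a migration.
    print(rejected_changes(current, {**current, "parameters": {"type": "gp3"}}))  # prints ['parameters']
```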

    Full StorageClass Spec

    apiVersion: storage.k8s.io/v1
    kind: StorageClass
    metadata:
      name: gp3-encrypted
      annotations:
        storageclass.kubernetes.io/is-default-class: "true"   # at most one per cluster
    provisioner: ebs.csi.aws.com          # CSI driver name; must match CSIDriver object name
    
    parameters:                           # driver-specific; opaque to Kubernetes core
      type: gp3
      encrypted: "true"
      kmsKeyId: arn:aws:kms:us-east-1:123456789012:key/mrk-abc123
    
    reclaimPolicy: Delete                 # Delete (default) | Retain
                                          # Recycle is deprecated and not accepted in a StorageClass
    
    volumeBindingMode: WaitForFirstConsumer  # Immediate | WaitForFirstConsumer
    
    allowVolumeExpansion: true            # false by default; enable for resize support
    
    mountOptions:                         # passed to the mount command on the node
      - noatime
      - discard
    
    allowedTopologies:                    # restrict provisioning to these zones
      - matchLabelExpressions:
          - key: topology.ebs.csi.aws.com/zone
            values:
              - us-east-1a
              - us-east-1b
              - us-east-1c
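    The comments above imply what happens when a field is left out. The sketch below is a hypothetical helper, not Kubernetes code, and it makes those effective values explicit:
      • an omitted reclaimPolicy becomes Delete
      • an omitted volumeBindingMode becomes Immediate, the API default
      • an omitted allowVolumeExpansion is treated as false
    It also rejects values the API server refuses, such as Recycle.

```python
# Hypothetical helper: effective StorageClass settings when fields are omitted.
VALID_RECLAIM = {"Delete", "Retain"}  # Recycle is not accepted for StorageClasses
VALID_BINDING = {"Immediate", "WaitForFirstConsumer"}


def effective_settings(sc: dict) -> dict:
    """Return reclaimPolicy, volumeBindingMode and allowVolumeExpansion with defaults applied."""
    settings = {
        "reclaimPolicy": sc.get("reclaimPolicy", "Delete"),
        "volumeBindingMode": sc.get("volumeBindingMode", "Immediate"),
        "allowVolumeExpansion": bool(sc.get("allowVolumeExpansion", False)),
    }
    if settings["reclaimPolicy"] not in VALID_RECLAIM:
        raise ValueError(f"unsupported reclaimPolicy: {settings['reclaimPolicy']}")
    if settings["volumeBindingMode"] not in VALID_BINDING:
        raise ValueError(f"unsupported volumeBindingMode: {settings['volumeBindingMode']}")
    return settings


if __name__ == "__main__":
    # A minimal EBS class that forgot volumeBindingMode silently gets Immediate.
    print(effective_settings({"provisioner": "ebs.csi.aws.com"}))
```

    The Immediate default is why a StorageClass for a zonal driver should always set volumeBindingMode explicitly.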

    provisioner Field

    The provisioner name must exactly match the name registered in the CSIDriver object. CSI driver names are typically reverse-domain formatted. Common values:

    Provisioner                     Storage Backend
    ebs.csi.aws.com                 AWS EBS (gp2, gp3, io1, io2, st1, sc1)
    efs.csi.aws.com                 AWS EFS (NFS managed file system)
    pd.csi.storage.gke.io           GCE Persistent Disk (pd-standard, pd-ssd, pd-balanced, pd-extreme)
    filestore.csi.storage.gke.io    GCP Filestore (NFS)
    disk.csi.azure.com              Azure Managed Disk
    file.csi.azure.com              Azure Files (SMB / NFS)
    rbd.csi.ceph.com                Ceph RBD (block)
    cephfs.csi.ceph.com             CephFS (file, RWX)
    kubernetes.io/no-provisioner    Local volumes (no dynamic provisioning)
    nfs.csi.k8s.io                  NFS CSI driver (one subdirectory per PV on an existing NFS server)
    driver.longhorn.io              Longhorn distributed block storage
    rancher.io/local-path           Local Path Provisioner (Rancher)

    volumeBindingMode

    This field controls when the PV is provisioned relative to pod scheduling. See the full explanation in Persistent Volumes — WaitForFirstConsumer. The summary:

    Mode                    PV Provisioned When                            Use For
    Immediate               PVC is created (provisioner picks the zone)    Topology-agnostic storage: EFS, CephFS, NFS, Azure Files
    WaitForFirstConsumer    A pod using the PVC is scheduled to a node     Zonal block storage: EBS, Azure Disk, GCE PD, local PVs
    🔴
    Never use Immediate for EBS, Azure Disk, or GCE PD. These are zonal resources. With Immediate, the provisioner creates the volume before the pod is scheduled and may pick an AZ where the pod cannot run, for example because of capacity, nodeSelector, or affinity rules. The pod then stays in Pending with an event such as node(s) had volume node affinity conflict or node(s) had no available volume zone. Always use WaitForFirstConsumer for any zonal block driver.
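    The table and warning above reduce to a simple rule that a CI check could enforce. This sketch hard-codes the provisioner names listed on this page. ZONAL_BLOCK, TOPOLOGY_AGNOSTIC and binding_mode_warning are illustrative names, and any driver in neither set is left unjudged.

```python
# Illustrative lint: flag StorageClasses whose binding mode fights the driver's topology.
ZONAL_BLOCK = {
    "ebs.csi.aws.com",
    "disk.csi.azure.com",
    "pd.csi.storage.gke.io",
    "kubernetes.io/no-provisioner",  # local PVs
}
TOPOLOGY_AGNOSTIC = {
    "efs.csi.aws.com",
    "file.csi.azure.com",
    "cephfs.csi.ceph.com",
    "nfs.csi.k8s.io",
}


def recommended_binding_mode(provisioner: str):
    """Return the mode suggested on this page, or None for drivers it does not cover."""
    if provisioner in ZONAL_BLOCK:
        return "WaitForFirstConsumer"
    if provisioner in TOPOLOGY_AGNOSTIC:
        return "Immediate"
    return None


def binding_mode_warning(sc: dict):
    """Return a warning for zonal drivers not using WaitForFirstConsumer, else None."""
    provisioner = sc.get("provisioner", "")
    mode = sc.get("volumeBindingMode", "Immediate")  # API default when omitted
    if provisioner in ZONAL_BLOCK and mode != "WaitForFirstConsumer":
        return f"{provisioner} is zonal; volumeBindingMode {mode} can strand pods in Pending"
    return None
```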

    mountOptions

    Mount options are passed directly to the mount command on the node when a filesystem volume is mounted; they do not apply to volumeMode: Block. Kubernetes does not validate them, so an invalid option causes the mount to fail and leaves the pod stuck in ContainerCreating.

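    # Illustrative list: mixes block-filesystem and NFS options. A real StorageClass
    # carries only the options valid for its filesystem type.
    mountOptions:
      - noatime          # don't update atime on reads; reduces write I/O (recommended for DBs)
      - nodiratime       # don't update directory atime (already implied by noatime on Linux)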
      - discard          # enable TRIM for SSDs — frees blocks on delete (EBS gp3/io2 support it)
      - rsize=1048576    # NFS read size 1MiB (tune for throughput)
      - wsize=1048576    # NFS write size 1MiB
      - nfsvers=4.1      # force NFSv4.1 (pNFS capable)
      - hard             # NFS: retry indefinitely on server failure
      - timeo=600        # NFS: units are tenths of a second, so 60 s before retry
    ⚠️
    discard and database write performance. The discard (TRIM) option causes the filesystem to issue TRIM commands on block deallocation. On some cloud block storage implementations this adds latency to delete-heavy workloads. PostgreSQL VACUUM, MySQL purge threads, and compaction jobs can be significantly slower with discard. Use fstrim via a periodic job instead, or test the performance impact before enabling.
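    Because Kubernetes passes mountOptions through unvalidated, a pre-flight lint can catch obvious mistakes before a pod sits in ContainerCreating. This Python sketch does three things:
      • flags NFS-only options on non-NFS volumes, using a hand-picked and deliberately incomplete list of option names (an assumption, not an exhaustive table from nfs(5))
      • warns about discard, as discussed above
      • converts timeo from tenths of a second into seconds

```python
# Sketch of a mountOptions pre-flight lint. NFS_ONLY is a hand-picked, incomplete list.
NFS_ONLY = {"rsize", "wsize", "nfsvers", "vers", "hard", "soft", "timeo", "retrans",
            "nconnect", "actimeo"}


def option_name(option: str) -> str:
    """'rsize=1048576' -> 'rsize'; 'noatime' -> 'noatime'."""
    return option.split("=", 1)[0].strip()


def lint_mount_options(options, is_nfs: bool) -> list:
    """Return human-readable warnings, in the order the options appear."""
    warnings = []
    for opt in options:
        name = option_name(opt)
        if name in NFS_ONLY and not is_nfs:
            warnings.append(f"{name}: NFS-only option on a non-NFS volume; mount is likely to fail")
        if name == "discard":
            warnings.append("discard: online TRIM can slow delete-heavy databases; consider periodic fstrim")
    return warnings


def timeo_seconds(option: str) -> float:
    """nfs(5) expresses timeo in tenths of a second: timeo=600 is 60 s."""
    name, _, value = option.partition("=")
    if name != "timeo" or not value:
        raise ValueError(f"not a timeo option: {option}")
    return int(value) / 10


if __name__ == "__main__":
    print(lint_mount_options(["noatime", "discard", "rsize=1048576"], is_nfs=False))
    print(timeo_seconds("timeo=600"))  # prints 60.0
```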

    allowedTopologies

    Restricts the provisioner to create volumes only within the specified topology zones. This is useful for performance (co-locate volume with expected workload zones) and cost optimization (avoid cross-AZ transfer fees).

    allowedTopologies:
      - matchLabelExpressions:
          - key: topology.ebs.csi.aws.com/zone
            values: [us-east-1a, us-east-1b]
      # EBS volumes will ONLY be provisioned in these two AZs
      # With WaitForFirstConsumer, pods using this class are only scheduled onto nodes in these AZs
    ℹ️
    allowedTopologies + WaitForFirstConsumer interaction. When both are set, the scheduler must choose a node in one of the allowed topology zones. If no schedulable nodes exist in the allowed zones, the pod stays Pending. Use allowedTopologies to pin storage to specific zones, which is useful for regulated workloads that must stay in specific locations.
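    To picture the interaction, the sketch below models only the topology part of the scheduler's decision: given node labels, which nodes could host a new volume from this class? It is a simplification, not the scheduler's VolumeBinding plugin. It assumes the allowedTopologies terms are ORed and the expressions inside a term are ANDed, and the node names are hypothetical.

```python
# Sketch: does a node satisfy a StorageClass's allowedTopologies?
# Assumption: terms are ORed; matchLabelExpressions within one term are ANDed.


def node_allowed(node_labels: dict, allowed_topologies) -> bool:
    """True if the node's labels satisfy at least one allowedTopologies term."""
    if not allowed_topologies:  # empty or missing list: no restriction
        return True
    for term in allowed_topologies:
        expressions = term.get("matchLabelExpressions", [])
        if not expressions:  # an empty term matches nothing
            continue
        if all(node_labels.get(e["key"]) in e["values"] for e in expressions):
            return True
    return False


def eligible_nodes(nodes: dict, allowed_topologies) -> list:
    """nodes maps node name -> label dict; returns sorted names that may host the volume."""
    return sorted(name for name, labels in nodes.items() if node_allowed(labels, allowed_topologies))


if __name__ == "__main__":
    allowed = [{"matchLabelExpressions": [
        {"key": "topology.ebs.csi.aws.com/zone", "values": ["us-east-1a", "us-east-1b"]}]}]
    nodes = {
        "node-a": {"topology.ebs.csi.aws.com/zone": "us-east-1a"},
        "node-c": {"topology.ebs.csi.aws.com/zone": "us-east-1c"},
    }
    print(eligible_nodes(nodes, allowed))  # prints ['node-a']
```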

    AWS EBS CSI Driver Parameters

    Driver: ebs.csi.aws.com

    Parameter       Values                          Description
    type            gp2, gp3, io1, io2, st1, sc1    EBS volume type
    iops            Integer string                  Provisioned IOPS for io1/io2/gp3 (gp3 includes 3000 at no extra cost)
    throughput      Integer string (MiB/s)          Throughput for gp3 (default 125, max 1000)
    encrypted       "true" / "false"                Enable EBS encryption
    kmsKeyId        ARN or key alias                Customer-managed KMS key for encryption
    blockExpress    "true" / "false"                Enable io2 Block Express (higher IOPS/GiB, sub-millisecond latency)
    iopsPerGB       Integer string                  IOPS per GiB for io1/io2/gp3; total IOPS scales with volume size

    gp3 — General Purpose (Recommended Default)

    apiVersion: storage.k8s.io/v1
    kind: StorageClass
    metadata:
      name: gp3
      annotations:
        storageclass.kubernetes.io/is-default-class: "true"
    provisioner: ebs.csi.aws.com
    parameters:
      type: gp3
      # gp3 defaults: 3000 IOPS, 125 MiB/s throughput (included in base price)
      # Override for higher performance (charged extra above baseline):
      iops: "6000"           # up to 16000 IOPS
      throughput: "250"       # up to 1000 MiB/s
      encrypted: "true"
    reclaimPolicy: Delete
    volumeBindingMode: WaitForFirstConsumer
    allowVolumeExpansion: true
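    The comments above note that IOPS and throughput beyond the baseline cost extra, but both are also bounded by volume size. Below is a Python sketch of those checks. It uses the gp3 limits quoted in this section (3000 to 16000 IOPS, 125 to 1000 MiB/s) plus AWS's documented ratios of 500 IOPS per GiB and 0.25 MiB/s per provisioned IOPS. AWS revises these limits, so treat the constants as assumptions to verify. validate_gp3 is a hypothetical name.

```python
# Sketch of gp3 sizing checks. Constants are assumptions; verify against current AWS docs.
GP3_IOPS_RANGE = (3000, 16000)
GP3_THROUGHPUT_RANGE = (125, 1000)  # MiB/s
MAX_IOPS_PER_GIB = 500
MAX_MIBS_PER_IOPS = 0.25


def validate_gp3(size_gib: int, iops: int = 3000, throughput: int = 125) -> list:
    """Return a list of problems; an empty list means the combination looks valid."""
    problems = []
    if not GP3_IOPS_RANGE[0] <= iops <= GP3_IOPS_RANGE[1]:
        problems.append(f"iops {iops} outside {GP3_IOPS_RANGE}")
    if not GP3_THROUGHPUT_RANGE[0] <= throughput <= GP3_THROUGHPUT_RANGE[1]:
        problems.append(f"throughput {throughput} outside {GP3_THROUGHPUT_RANGE}")
    # The 3000 IOPS baseline is available at any size; provisioning more needs capacity.
    if iops > GP3_IOPS_RANGE[0] and iops > MAX_IOPS_PER_GIB * size_gib:
        min_size = -(-iops // MAX_IOPS_PER_GIB)  # ceiling division
        problems.append(f"iops {iops} needs at least {min_size} GiB")
    # Throughput above the 125 MiB/s baseline is capped by provisioned IOPS.
    if throughput > GP3_THROUGHPUT_RANGE[0] and throughput > MAX_MIBS_PER_IOPS * iops:
        min_iops = int(throughput / MAX_MIBS_PER_IOPS)
        problems.append(f"throughput {throughput} MiB/s needs at least {min_iops} IOPS")
    return problems
```

    For the class above, validate_gp3(100, iops=6000, throughput=250) returns an empty list. A 10 GiB claim against the same class would fall short, because 6000 IOPS needs at least 12 GiB.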

    io2 Block Express — High Performance Databases

    apiVersion: storage.k8s.io/v1
    kind: StorageClass
    metadata:
      name: io2-block-express
    provisioner: ebs.csi.aws.com
    parameters:
      type: io2
      iops: "64000"          # up to 256000 IOPS with Block Express
      encrypted: "true"
      kmsKeyId: arn:aws:kms:us-east-1:123456789012:key/mrk-prod-db
      blockExpress: "true"
    reclaimPolicy: Retain    # always Retain for production DBs
    volumeBindingMode: WaitForFirstConsumer
    allowVolumeExpansion: true

    st1 — Throughput Optimized HDD (Cold Data / Data Lake)

    parameters:
      type: st1
      # Minimum size: 125 GiB; Maximum: 16 TiB
      # 40 MiB/s per TiB baseline, 250 MiB/s per TiB burst; both capped at 500 MiB/s per volume
      # NOT suitable for random I/O workloads — sequential only
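    The per-TiB figures translate into concrete numbers only once a size is chosen. This sketch computes baseline and burst throughput from those figures. It assumes AWS's 500 MiB/s per-volume cap and the 125 GiB to 16 TiB size range quoted above; st1_throughput is a hypothetical name.

```python
# Sketch: st1 throughput for a given size, using the figures quoted above.
# The 500 MiB/s per-volume cap is an assumption to verify against AWS docs.
ST1_MIN_GIB, ST1_MAX_GIB = 125, 16 * 1024
ST1_CAP_MIBS = 500


def st1_throughput(size_gib: int) -> tuple:
    """Return (baseline, burst) throughput in MiB/s for an st1 volume."""
    if not ST1_MIN_GIB <= size_gib <= ST1_MAX_GIB:
        raise ValueError(f"st1 size must be {ST1_MIN_GIB}..{ST1_MAX_GIB} GiB")
    tib = size_gib / 1024
    baseline = min(40 * tib, ST1_CAP_MIBS)
    burst = min(250 * tib, ST1_CAP_MIBS)
    return baseline, burst


if __name__ == "__main__":
    for size in (1024, 2048, 16384):
        print(size, st1_throughput(size))
```

    The burst ceiling is reached at 2 TiB and the baseline ceiling at 12.5 TiB. Beyond those sizes, a larger st1 volume buys capacity rather than throughput.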

    io1/io2 Multi-Attach

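    parameters:
      type: io2               # Multi-Attach is requested per PVC, not via a StorageClass parameter
      iops: "10000"
    # The EBS CSI driver multi-attaches a volume when the PVC requests
    # accessModes: [ReadWriteMany] together with volumeMode: Block; supported
    # volume types depend on the driver version (check its multi-attach docs).
    # WARNING: Multi-Attach is not a substitute for distributed locking.
    # The application must coordinate concurrent writes (e.g., cluster-aware filesystems
    # such as GFS2 or OCFS2). Standard ext4/xfs will corrupt with concurrent writers.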

    GCE Persistent Disk CSI Driver Parameters

    Driver: pd.csi.storage.gke.io

    Parameter                          Values                                          Description
    type                               pd-standard, pd-ssd, pd-balanced, pd-extreme    Disk type
    replication-type                   none, regional-pd                               Regional PD: synchronous replication across 2 zones
    disk-encryption-kms-key            KMS key resource name                           Customer-managed encryption key
    provisioned-iops-on-create         Integer string                                  Provisioned IOPS for pd-extreme (min 10000)
    provisioned-throughput-on-create   Integer string (MiB/s)                          Provisioned throughput for pd-extreme

    # pd-ssd: SSD-backed (recommended for databases on GKE)
    apiVersion: storage.k8s.io/v1
    kind: StorageClass
    metadata:
      name: pd-ssd
    provisioner: pd.csi.storage.gke.io
    parameters:
      type: pd-ssd
      replication-type: none
    volumeBindingMode: WaitForFirstConsumer
    allowVolumeExpansion: true
    reclaimPolicy: Delete
    ---
    # Regional PD — synchronous two-zone replication for HA
    apiVersion: storage.k8s.io/v1
    kind: StorageClass
    metadata:
      name: pd-ssd-regional
    provisioner: pd.csi.storage.gke.io
    parameters:
      type: pd-ssd
      replication-type: regional-pd   # provisions in 2 AZs simultaneously
    allowedTopologies:
      - matchLabelExpressions:
          - key: topology.gke.io/zone
            values: [us-central1-a, us-central1-b]
    volumeBindingMode: WaitForFirstConsumer
    allowVolumeExpansion: true

    Azure Disk CSI Driver Parameters

    Driver: disk.csi.azure.com

    Parameter              Values                                                                            Description
    skuName                Standard_LRS, Premium_LRS, StandardSSD_LRS, UltraSSD_LRS, Premium_ZRS, StandardSSD_ZRS   Disk SKU
    cachingMode            None, ReadOnly, ReadWrite                                                         Host caching mode
    kind                   managed                                                                           Always use managed (unmanaged is deprecated)
    diskEncryptionSetID    Resource ID                                                                       Disk Encryption Set for CMK
    enableBursting         "true"                                                                            Enable on-demand bursting for Premium SSDs
    networkAccessPolicy    AllowAll, DenyAll, AllowPrivate                                                   Control disk access for security
    diskAccessID           Resource ID                                                                       Disk Access resource for private endpoint
    tags                   key1=value1,key2=value2                                                           Azure resource tags on disk

    # Premium SSD with encryption and bursting
    apiVersion: storage.k8s.io/v1
    kind: StorageClass
    metadata:
      name: premium-ssd-encrypted
    provisioner: disk.csi.azure.com
    parameters:
      skuName: Premium_LRS
      cachingMode: ReadOnly          # appropriate for most DB data volumes
      kind: managed
      diskEncryptionSetID: /subscriptions/xxx/resourceGroups/rg/providers/Microsoft.Compute/diskEncryptionSets/my-des
      enableBursting: "true"
      networkAccessPolicy: DenyAll   # block export/import of the disk via SAS URI
    volumeBindingMode: WaitForFirstConsumer
    allowVolumeExpansion: true
    reclaimPolicy: Retain
    ---
    # Premium SSD, zone-redundant (ZRS); not Premium SSD v2, whose SKU is PremiumV2_LRS
    parameters:
      skuName: Premium_ZRS           # synchronous zone-redundant storage (3 AZ copies)
      # NOTE: Premium_ZRS does not support cachingMode — must be None

    UltraSSD for demanding databases

    parameters:
      skuName: UltraSSD_LRS
      cachingMode: None              # Ultra Disks do not support host caching
      # Must enable Ultra SSD on the node pool:
      # az aks nodepool update --enable-ultra-ssd
      diskIOPSReadWrite: "160000"    # provisioned IOPS
      diskMBpsReadWrite: "2000"      # provisioned throughput MiB/s

    Azure Files CSI Driver Parameters

    Driver: file.csi.azure.com — supports RWX (ReadWriteMany)

    apiVersion: storage.k8s.io/v1
    kind: StorageClass
    metadata:
      name: azure-files-nfs
    provisioner: file.csi.azure.com
    parameters:
      protocol: nfs                   # nfs | smb; NFS recommended for Linux workloads
      skuName: Premium_LRS            # Standard_LRS, Premium_LRS, Standard_GRS, etc.; NFS shares require a Premium SKU
      # storageAccount: mystoraccount  # optional: use specific storage account
      # resourceGroup: my-rg           # optional: storage account resource group
      # subscriptionID: xxx            # optional: cross-subscription
    mountOptions:
      - nconnect=4                    # NFS: parallel connections per mount (improves throughput)
      - actimeo=30                    # NFS: attribute cache timeout
    volumeBindingMode: Immediate      # Azure Files is topology-agnostic (regional storage account, not zonal)
    allowVolumeExpansion: true
    reclaimPolicy: Delete

    AWS EFS CSI Driver Parameters

    Driver: efs.csi.aws.com — RWX, serverless NFS, auto-scaling

    apiVersion: storage.k8s.io/v1
    kind: StorageClass
    metadata:
      name: efs-sc
    provisioner: efs.csi.aws.com
    parameters:
      provisioningMode: efs-ap         # efs-ap = dynamic access point provisioning
      fileSystemId: fs-0abc123def456   # existing EFS filesystem ID
      directoryPerms: "700"            # permissions for the access point root directory
      basePath: /dynamic               # base path on the EFS filesystem for access points
      # gidRangeStart: "1000"          # optional: GID range for access point
      # gidRangeEnd: "2000"
      # ensureUniqueDirectory: "true"  # prefix dirName with PVC UID to guarantee uniqueness
    volumeBindingMode: Immediate       # EFS is region-wide (not zonal)
    allowVolumeExpansion: true         # EFS is elastic; the driver ignores the requested capacity
    ℹ️
    EFS access points. The EFS CSI driver in efs-ap mode creates an EFS Access Point for each PVC. Each access point gets its own root directory on the EFS filesystem with isolated ownership and permissions. This enables secure multi-tenant NFS on a shared EFS filesystem: different namespaces get different access points with no cross-tenant visibility.

    Ceph RBD CSI Driver Parameters

    Driver: rbd.csi.ceph.com — block storage backed by Ceph (via Rook-Ceph or external Ceph cluster)

    apiVersion: storage.k8s.io/v1
    kind: StorageClass
    metadata:
      name: rook-ceph-block
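    provisioner: rbd.csi.ceph.com          # with Rook: <operator-namespace>.rbd.csi.ceph.com, e.g. rook-ceph.rbd.csi.ceph.com
    parameters:
      clusterID: <ceph-cluster-id>         # ID from the ceph-csi config (often `ceph fsid`); with Rook, the cluster namespace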
      pool: replicapool                      # Ceph RBD pool name
      imageFormat: "2"                       # always use format 2 (format 1 is deprecated)
      imageFeatures: layering                # comma-separated RBD features
      # imageFeatures: layering,fast-diff,object-map,deep-flatten,exclusive-lock
      csi.storage.k8s.io/provisioner-secret-name: rook-csi-rbd-provisioner
      csi.storage.k8s.io/provisioner-secret-namespace: rook-ceph
      csi.storage.k8s.io/controller-expand-secret-name: rook-csi-rbd-provisioner
      csi.storage.k8s.io/controller-expand-secret-namespace: rook-ceph
      csi.storage.k8s.io/node-stage-secret-name: rook-csi-rbd-node
      csi.storage.k8s.io/node-stage-secret-namespace: rook-ceph
      csi.storage.k8s.io/fstype: ext4
    reclaimPolicy: Delete
    allowVolumeExpansion: true
    volumeBindingMode: Immediate
    mountOptions:
      - discard

    CephFS CSI Driver Parameters

    Driver: cephfs.csi.ceph.com — POSIX filesystem, RWX support

    apiVersion: storage.k8s.io/v1
    kind: StorageClass
    metadata:
      name: rook-cephfs
    provisioner: cephfs.csi.ceph.com   # with Rook: <operator-namespace>.cephfs.csi.ceph.com
    parameters:
      clusterID: <ceph-cluster-id>
      fsName: myfs                     # CephFS filesystem name
      pool: myfs-replicated            # optional: data pool for new volumes (not the metadata pool)
      mounter: kernel                  # kernel | fuse; kernel is faster; fuse supports more features
      kernelMountOptions: ms_mode=prefer-crc    # kernel client mount options
      csi.storage.k8s.io/provisioner-secret-name: rook-csi-cephfs-provisioner
      csi.storage.k8s.io/provisioner-secret-namespace: rook-ceph
      csi.storage.k8s.io/controller-expand-secret-name: rook-csi-cephfs-provisioner   # needed for volume expansion
      csi.storage.k8s.io/controller-expand-secret-namespace: rook-ceph
      csi.storage.k8s.io/node-stage-secret-name: rook-csi-cephfs-node
      csi.storage.k8s.io/node-stage-secret-namespace: rook-ceph
    reclaimPolicy: Delete
    allowVolumeExpansion: true
    volumeBindingMode: Immediate

    NFS Subdir External Provisioner

    Dynamically provisions subdirectories on an existing NFS server as PVs. Not a CSI driver — uses the older external provisioner interface.

    apiVersion: storage.k8s.io/v1
    kind: StorageClass
    metadata:
      name: nfs-client
    provisioner: cluster.local/nfs-subdir-external-provisioner
    parameters:
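      # No server/path parameters here: the NFS server and export path are set on the
      # provisioner Deployment (NFS_SERVER=nfs-server.prod.svc.cluster.local, NFS_PATH=/exports)
      pathPattern: "${.PVC.namespace}/${.PVC.annotations.nfs.io/storage-path}"  # dynamic path pattern
      onDelete: retain                 # delete | retain; if unset, the directory is archived
      archiveOnDelete: "true"          # consulted only when onDelete is unset; "false" deletes the directory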
    reclaimPolicy: Delete
    volumeBindingMode: Immediate
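    The pathPattern template substitutes PVC metadata into the directory name. The Python sketch below is a hypothetical re-implementation that shows the idea; it is not the provisioner's code. It handles ${.PVC.<field>}, ${.PVC.annotations.<key>} and ${.PVC.labels.<key>} references. As an assumption, it substitutes an empty string for anything missing.

```python
import re

# Matches ${.PVC.<reference>}, e.g. ${.PVC.namespace} or ${.PVC.annotations.nfs.io/storage-path}
TOKEN = re.compile(r"\$\{\.PVC\.([^}]+)\}")


def expand_path_pattern(pattern: str, pvc: dict) -> str:
    """Expand a pathPattern against a simplified PVC dict (hypothetical helper)."""

    def lookup(match: re.Match) -> str:
        ref = match.group(1)
        for section in ("annotations", "labels"):
            prefix = section + "."
            if ref.startswith(prefix):
                return pvc.get(section, {}).get(ref[len(prefix):], "")
        return str(pvc.get(ref, ""))  # plain metadata field such as namespace or name

    return TOKEN.sub(lookup, pattern)


if __name__ == "__main__":
    pvc = {"namespace": "analytics", "name": "reports",
           "annotations": {"nfs.io/storage-path": "q3-reports"}}
    pattern = "${.PVC.namespace}/${.PVC.annotations.nfs.io/storage-path}"
    print(expand_path_pattern(pattern, pvc))  # prints analytics/q3-reports
```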

    Multi-Tier StorageClass Design

    Production clusters should define multiple StorageClasses covering different performance and cost tiers. Namespace teams select the appropriate tier via PVC storageClassName, and platform teams enforce limits via ResourceQuota.

    • fast (High IOPS): io2 Block Express / pd-extreme / UltraSSD. For production databases requiring <1ms latency and high IOPS. Most expensive. Protected by ResourceQuota.
    • standard (default; Balanced): gp3 / pd-ssd / Premium_LRS. Default for most workloads. Good baseline performance at reasonable cost. Should be the cluster default SC.
    • slow (Low Cost): st1 / pd-standard / Standard_LRS. For batch jobs, backups, cold data, or development. Lowest cost. Not suitable for latency-sensitive workloads.
    • shared (RWX / NFS): EFS / CephFS / Azure Files NFS. ReadWriteMany. For shared config, ML datasets, multi-pod media access. Higher latency than block.

    # ResourceQuota that blocks the fast tier in the development namespace
    apiVersion: v1
    kind: ResourceQuota
    metadata:
      name: storage-quota
      namespace: development
    spec:
      hard:
        requests.storage: 100Gi              # total PVC storage in namespace
        fast.storageclass.storage.k8s.io/requests.storage: "0"   # no fast SC in dev
        fast.storageclass.storage.k8s.io/persistentvolumeclaims: "0"
    💡
    StorageClass-scoped ResourceQuota. The format <storageClassName>.storageclass.storage.k8s.io/requests.storage limits PVC capacity per StorageClass per namespace. This lets you offer all tiers cluster-wide but control which namespaces can access expensive classes.
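    The namespace-wide limit and the per-class limit combine for each new PVC. Below is a simplified Python model of that check, using the development quota above as test data. It is an illustration, not the quota admission plugin. It handles binary suffixes (Ki to Ti) only and ignores the persistentvolumeclaims count limits. The helper names are hypothetical.

```python
import re

# Binary suffixes only; decimal suffixes (G, M) and fractional values are out of scope.
_BINARY = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40}


def parse_quantity(quantity: str) -> int:
    """'100Gi' -> bytes. Raises ValueError for forms this sketch does not handle."""
    match = re.fullmatch(r"(\d+)(Ki|Mi|Gi|Ti)?", quantity)
    if not match:
        raise ValueError(f"unsupported quantity: {quantity}")
    return int(match.group(1)) * _BINARY.get(match.group(2), 1)


def class_key(storage_class: str) -> str:
    return f"{storage_class}.storageclass.storage.k8s.io/requests.storage"


def admits(hard: dict, used: dict, storage_class: str, request: str) -> bool:
    """Would a new PVC of `request` fit both the namespace-wide and the per-class limit?"""
    wanted = parse_quantity(request)
    for key in ("requests.storage", class_key(storage_class)):
        if key in hard:
            if parse_quantity(used.get(key, "0")) + wanted > parse_quantity(hard[key]):
                return False
    return True
```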

    Default StorageClass Management

    Exactly one StorageClass should be marked default at any time. PVCs without a storageClassName field (not "", but truly absent) use the default.

    # List all StorageClasses and their default status
    kubectl get storageclass
    # NAME             PROVISIONER              RECLAIMPOLICY  VOLUMEBINDINGMODE      ALLOWVOLUMEEXPANSION  AGE
    # gp2              kubernetes.io/aws-ebs    Delete         Immediate              false                 2y
    # gp3 (default)    ebs.csi.aws.com          Delete         WaitForFirstConsumer   true                  6mo
    
    # Swap the default (two separate API calls, so the swap is not atomic)
    kubectl patch storageclass gp2 \
      -p '{"metadata":{"annotations":{"storageclass.kubernetes.io/is-default-class":"false"}}}'
    kubectl patch storageclass gp3 \
      -p '{"metadata":{"annotations":{"storageclass.kubernetes.io/is-default-class":"true"}}}'
    ⚠️
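    Brief window with no default. Removing the old default before adding the new one leaves a short window with no default StorageClass. A PVC created in that window without an explicit storageClassName is admitted with no class. On clusters with retroactive default StorageClass assignment (GA in Kubernetes 1.28), it receives the new default once the second patch lands. On older clusters it stays Pending and must be recreated. Run the two patches back to back, or set storageClassName explicitly on PVCs created during the change.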
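    A PVC's storageClassName has three cases that are easy to blur, so here they are as a Python sketch. It assumes the behavior documented for recent Kubernetes releases:
      • an absent field takes the default; if several classes are marked default, the most recently created one wins
      • an explicit empty string means "no class"
      • with no default at all, the field stays unset until one appears
    Older releases instead rejected such PVCs when more than one default existed. The class records are simplified dicts, not API objects.

```python
from typing import Optional


def resolve_storage_class(pvc_class: Optional[str], classes: list) -> Optional[str]:
    """Return the StorageClass a PVC ends up with, or None if it stays unset.

    pvc_class: None when storageClassName is absent, "" when explicitly empty.
    classes: [{"name": str, "default": bool, "created": int}] (simplified records).
    """
    if pvc_class is not None:
        return pvc_class  # explicit name, or "" meaning: bind only to PVs without a class
    defaults = [c for c in classes if c["default"]]
    if not defaults:
        return None  # left unset; newer clusters assign a default retroactively
    return max(defaults, key=lambda c: c["created"])["name"]


if __name__ == "__main__":
    classes = [
        {"name": "gp2", "default": True, "created": 1},
        {"name": "gp3", "default": True, "created": 2},  # both marked mid-swap
    ]
    print(resolve_storage_class(None, classes))  # prints gp3
    print(repr(resolve_storage_class("", classes)))  # prints ''
```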

    StorageClass Migration

    StorageClass parameters are immutable. To move PVCs to a new StorageClass (e.g., upgrading from gp2 to gp3, or adding encryption), you must create new PVCs and migrate data. Three approaches:

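    Approach 1: PVC Clone (Online, Same CSI Driver)

    This approach works when two conditions hold:
      • the source and destination StorageClasses use the same CSI driver with different parameters (e.g., gp2 → gp3, both served by ebs.csi.aws.com)
      • that driver supports volume cloning; not every CSI driver does, so check its documentation first
    The clone is a crash-consistent, point-in-time copy. Writes made to the source after the clone is taken are not carried over, so stop or quiesce writers first if every write must survive. Create the clone with the PVC dataSource field: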

    # 1. Create new PVC as a clone with the new StorageClass
    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: data-postgres-0-gp3
      namespace: production
    spec:
      dataSource:
        kind: PersistentVolumeClaim
        name: data-postgres-0          # source PVC (must be Bound)
      storageClassName: gp3-encrypted  # destination StorageClass
      accessModes: [ReadWriteOnce]
      resources:
        requests:
          storage: 100Gi
    
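    # 2. Verify the clone is Bound (with WaitForFirstConsumer it stays Pending until a pod mounts it)
    kubectl get pvc data-postgres-0-gp3 -n production -o jsonpath='{.status.phase}'
    # Record a baseline from the source; run the same query later from a pod that mounts the clone
    kubectl exec -it postgres-0 -n production -- psql -c "SELECT count(*) FROM orders;"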
    
    # 3. Update StatefulSet to use new PVC name (or swap via volumeClaimTemplate rename)
    # This requires careful StatefulSet management — see 06-stateful-storage-patterns.html

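    Approach 2: Snapshot-Based Migration (Same Driver, No Clone Support)

    Use this when the driver supports snapshots but not cloning. A snapshot can only be restored by the driver that created it, so this approach also requires the same CSI driver on both sides.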

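    # 1. Snapshot the source PVC (snapshot name is illustrative); 2. then create the new PVC as in Approach 1 with dataSource: {apiGroup: snapshot.storage.k8s.io, kind: VolumeSnapshot, name: data-postgres-0-snap}
    kubectl apply -f - <<EOF
    {"apiVersion": "snapshot.storage.k8s.io/v1", "kind": "VolumeSnapshot", "metadata": {"name": "data-postgres-0-snap", "namespace": "production"}, "spec": {"volumeSnapshotClassName": "csi-aws-vsc", "source": {"persistentVolumeClaimName": "data-postgres-0"}}}
    EOF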

    Approach 3: rsync / Velero (Cross-Cluster or Different CSI)

    When cloning and snapshot restore are not available (different cloud providers, on-prem to cloud migration), use a data mover:

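    # Velero backup: file-system backup copies the volume data itself, so it can be
    # restored onto a different CSI driver or cloud (native --snapshot-volumes snapshots cannot)
    velero backup create prod-db-backup \
      --include-namespaces production \
      --default-volumes-to-fs-backup \
      --storage-location default
    
    # Restore into the new cluster. StorageClass remapping is configured beforehand with a
    # ConfigMap in the velero namespace labeled velero.io/plugin-config="" and
    # velero.io/change-storage-class=RestoreItemAction, whose data maps <old-sc>: <new-sc>
    velero restore create --from-backup prod-db-backup \
      --namespace-mappings production:production-new \
      --existing-resource-policy update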
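    The three approaches form a decision tree. Below is a hypothetical Python helper that encodes it and builds the matching destination PVC manifest as a dict. Whether a driver supports cloning or snapshots is an input you must look up in the driver's documentation. The ReadWriteOnce access mode is an assumption copied from the Approach 1 example.

```python
def choose_migration(src_driver: str, dst_driver: str,
                     supports_clone: bool, supports_snapshot: bool) -> str:
    """Pick an approach from this page: 'clone', 'snapshot' or 'data-mover'."""
    if src_driver != dst_driver:
        return "data-mover"  # neither clones nor snapshots cross CSI drivers
    if supports_clone:
        return "clone"
    if supports_snapshot:
        return "snapshot"
    return "data-mover"


def migration_pvc(name, namespace, source, storage_class, size, approach):
    """Build a destination PVC manifest (as a dict) whose dataSource matches the approach."""
    if approach == "clone":
        data_source = {"kind": "PersistentVolumeClaim", "name": source}
    elif approach == "snapshot":
        data_source = {"apiGroup": "snapshot.storage.k8s.io", "kind": "VolumeSnapshot", "name": source}
    else:
        raise ValueError("data-mover migrations restore data outside the PVC spec")
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "dataSource": data_source,
            "storageClassName": storage_class,
            "accessModes": ["ReadWriteOnce"],
            "resources": {"requests": {"storage": size}},
        },
    }
```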

    StorageClass and VolumeSnapshotClass Alignment

    A VolumeSnapshot must be created with a VolumeSnapshotClass that uses the same CSI driver as the PVC's StorageClass. Mismatched drivers cause snapshot creation to fail:

    apiVersion: snapshot.storage.k8s.io/v1
    kind: VolumeSnapshotClass
    metadata:
      name: csi-aws-vsc
      annotations:
        snapshot.storage.kubernetes.io/is-default-class: "true"
    driver: ebs.csi.aws.com      # MUST match the StorageClass provisioner
    deletionPolicy: Delete        # Delete | Retain
    parameters:
      # driver-specific snapshot parameters (e.g., tags)
      tagSpecification_1: "environment=production"
    ⚠️
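    Separate default VolumeSnapshotClass per driver. If your cluster runs more than one snapshot-capable CSI driver (for example EBS and Ceph RBD), create a VolumeSnapshotClass for each. When a VolumeSnapshot omits volumeSnapshotClassName, the snapshot controller uses the default class whose driver matches the source volume. If two classes for the same driver are both marked default, snapshot creation fails. Keep exactly one default VolumeSnapshotClass per driver.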
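    A simplified Python model of that selection rule follows. It is not the snapshot controller's code. The Ceph RBD class name is hypothetical, and the records are plain dicts.

```python
def default_snapshot_class(driver: str, snapshot_classes: list) -> str:
    """Pick the default VolumeSnapshotClass for a volume's CSI driver.

    snapshot_classes: [{"name": str, "driver": str, "default": bool}] (simplified records).
    """
    candidates = [c["name"] for c in snapshot_classes if c["driver"] == driver and c["default"]]
    if not candidates:
        raise LookupError(f"no default VolumeSnapshotClass for {driver}")
    if len(candidates) > 1:
        raise LookupError(f"multiple default VolumeSnapshotClasses for {driver}: {sorted(candidates)}")
    return candidates[0]


if __name__ == "__main__":
    classes = [
        {"name": "csi-aws-vsc", "driver": "ebs.csi.aws.com", "default": True},
        {"name": "csi-rbd-vsc", "driver": "rbd.csi.ceph.com", "default": True},  # hypothetical
    ]
    print(default_snapshot_class("ebs.csi.aws.com", classes))  # prints csi-aws-vsc
```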

    Alerting Rules

    groups:
    - name: storageclass
      rules:
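      - alert: PVCProvisioningFailed
        # CSI sidecar metric: requires scraping the csi-provisioner sidecar's metrics endpoint
        expr: |
          sum by (driver_name) (
            increase(csi_sidecar_operations_seconds_count{method_name="/csi.v1.Controller/CreateVolume", grpc_status_code!="OK"}[5m])
          ) > 0
        labels: {severity: warning}
        annotations:
          summary: "CSI provisioning errors: check CSI controller logs"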
    
      - alert: StorageClassMissing
        expr: |
          (
            kube_persistentvolumeclaim_info{storageclass!=""}
              unless on(storageclass) kube_storageclass_info
          )
          and on(namespace, persistentvolumeclaim)
          (kube_persistentvolumeclaim_status_phase{phase="Pending"} == 1)
        for: 5m
        labels: {severity: warning}
        annotations:
          summary: "PVC references a StorageClass that doesn't exist"
    
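      - alert: DefaultStorageClassMissing
        # kube_storageclass_info has no default-class label. These two rules use the
        # annotations metric, which requires kube-state-metrics to run with
        # --metric-annotations-allowlist=storageclasses=[storageclass.kubernetes.io/is-default-class]
        expr: |
          absent(kube_storageclass_annotations{annotation_storageclass_kubernetes_io_is_default_class="true"})
        for: 2m
        labels: {severity: warning}
        annotations:
          summary: "No default StorageClass defined: PVCs without an explicit SC get no class"
    
      - alert: MultipleDefaultStorageClasses
        expr: |
          count(kube_storageclass_annotations{annotation_storageclass_kubernetes_io_is_default_class="true"}) > 1
        for: 1m
        labels: {severity: critical}
        annotations:
          summary: "Multiple default StorageClasses: PVCs without an SC name get the newest default (older clusters reject them)"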
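    PromQL like this is easy to get subtly wrong; for example, count() over an empty vector returns no result rather than 0. One way to pin down the intended semantics is to encode the inventory-based rules in a small Python function and test it against fixtures. The sketch below is hypothetical. It covers StorageClassMissing, DefaultStorageClassMissing and MultipleDefaultStorageClasses. It leaves out PVCProvisioningFailed because that rule depends on error counters rather than inventory.

```python
def storageclass_alerts(pvcs: list, classes: list) -> set:
    """Evaluate the inventory-based alerts against a simplified cluster snapshot.

    pvcs: [{"name": str, "phase": str, "storageClassName": str or None}]
    classes: [{"name": str, "default": bool}]
    """
    firing = set()
    names = {c["name"] for c in classes}
    for pvc in pvcs:
        sc = pvc.get("storageClassName")
        if pvc["phase"] == "Pending" and sc and sc not in names:
            firing.add("StorageClassMissing")
    defaults = sum(1 for c in classes if c["default"])
    if defaults == 0:
        firing.add("DefaultStorageClassMissing")
    elif defaults > 1:
        firing.add("MultipleDefaultStorageClasses")
    return firing
```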

    Troubleshooting Runbooks

    Runbook: PVC Stuck Pending — Wrong StorageClass

    # Check what StorageClass the PVC references
    kubectl get pvc <name> -n <ns> -o jsonpath='{.spec.storageClassName}'
    
    # Verify the StorageClass exists
    kubectl get storageclass <name>
    # If NotFound: check for typo or missing cluster setup
    
    # Verify the provisioner is running
    kubectl get pods -n kube-system | grep csi
    
    # Check provisioner logs
    kubectl logs -n kube-system -l app=ebs-csi-controller -c csi-provisioner --tail=50

    Runbook: Volume Mount Fails — Bad mountOptions

    # Pod stuck in ContainerCreating with "failed to mount" error
    kubectl describe pod <name> -n <ns>
    # Event: MountVolume.MountDevice failed: ... exit status 32
    # OR: "wrong fs type, bad option, bad superblock" (the filesystem rejected an option)
    
    # Check kubelet logs on the node
    kubectl get pod <name> -n <ns> -o jsonpath='{.spec.nodeName}'   # find the node
    ssh node-ip -- journalctl -u kubelet | grep -i "mount failed" | tail -20
    
    # Common causes:
    # 1. Option not supported by filesystem type (e.g., nfs option on ext4)
    # 2. Typo in mount option name
    # 3. Driver ignores mountOptions: CSIDriver has no field that advertises support,
    #    so check the driver docs, and confirm the options reached the PV
    kubectl get pv <pv-name> -o jsonpath='{.spec.mountOptions}'

    Runbook: Migration — PVC Clone Stuck in Pending

    # PVC clone is Pending; source PVC is Bound
    kubectl describe pvc data-postgres-0-gp3 -n production
    # Event: "waiting for a volume to be created..."
    
    # Common causes:
    # 1. Source and destination StorageClass use different provisioners
    #    Clone requires same driver
    kubectl get storageclass gp2 -o jsonpath='{.provisioner}'
    kubectl get storageclass gp3 -o jsonpath='{.provisioner}'
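    # If different → snapshots cannot cross drivers either; use a data mover (Approach 3)
    
    # 2. Source PVC is not in Bound state at clone time
    kubectl get pvc data-postgres-0 -n production -o jsonpath='{.status.phase}'
    
    # 3. Destination class uses WaitForFirstConsumer: the clone stays Pending until a pod mounts it
    # 4. Driver does not support cloning: use snapshot-based migration (Approach 2)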

    Runbook: EBS Volume Created in Wrong AZ (Immediate Binding)

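    # Pod stuck Pending: "0/N nodes are available: ... node(s) had volume node affinity conflict"
    # PVC events show "successfully provisioned volume"; the zone lives in the bound PV's node affinity
    kubectl get pv "$(kubectl get pvc <name> -n <ns> -o jsonpath='{.spec.volumeName}')" \
      -o jsonpath='{.spec.nodeAffinity.required.nodeSelectorTerms}'
    
    # Fix for existing stuck PVC:
    # 1. Snapshot the volume if the data matters
    # 2. Delete the pod and the PVC (this deletes the EBS volume if reclaimPolicy: Delete)
    # 3. volumeBindingMode is immutable, so recreate the StorageClass under the same name
    #    (deleting a StorageClass does not affect existing PVs)
    kubectl get storageclass <name> -o yaml > sc.yaml
    # edit sc.yaml: set volumeBindingMode: WaitForFirstConsumer; remove uid, resourceVersion, creationTimestamp
    kubectl delete storageclass <name> && kubectl apply -f sc.yaml
    # 4. Recreate the PVC (restoring from the snapshot via dataSource if needed)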
    
    # Prevention: always use WaitForFirstConsumer for EBS/Azure Disk/GCE PD

    Runbook: discard Mount Option Causing Slow DB Writes

    # Database (PostgreSQL VACUUM, MySQL purge) is very slow
    # Suspect: discard mount option on StorageClass
    
    # Check if discard is in mountOptions
    kubectl get storageclass <name> -o jsonpath='{.mountOptions}'
    
    # If discard is enabled and causing issues:
    # 1. Create new StorageClass without discard
    # 2. Migrate PVCs to new SC
    # 3. Existing PVs keep the mountOptions copied from the StorageClass at provisioning time:
    kubectl get pv <pv-name> -o jsonpath='{.spec.mountOptions}'
    # Use periodic fstrim instead (FITRIM needs CAP_SYS_ADMIN: run it on the node or in a privileged pod)
    kubectl exec -it <pod> -- fstrim -v /var/lib/postgresql/data

    Best Practices

    1. Use WaitForFirstConsumer for all zonal block drivers (EBS, Azure Disk, GCE PD). This is the single most common StorageClass misconfiguration that causes stuck pods in multi-AZ clusters.
    2. Set reclaimPolicy: Retain for production database StorageClasses. Even if you forget to patch a specific PV, the StorageClass policy applies at provisioning time.
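    3. Always enable allowVolumeExpansion: true. It costs nothing to enable, and resizing only works if the driver also supports expansion. Unlike parameters, allowVolumeExpansion can be changed on an existing StorageClass (kubectl patch storageclass <name> -p '{"allowVolumeExpansion": true}'). Enabling it from day one still avoids an emergency change when a volume fills up.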
    4. Define named tiers (fast/standard/slow/shared) rather than a single default. Teams that never think about storage will use the default. Make the default the right choice for most workloads (gp3, pd-ssd, Premium_LRS) — not the cheapest or most expensive.
    5. Enforce StorageClass access via ResourceQuota. Use per-class quotas to prevent development namespaces from consuming expensive io2/pd-extreme storage.
    6. Align VolumeSnapshotClass driver with StorageClass provisioner. Mismatched drivers cause silent snapshot failures. Name your VolumeSnapshotClasses to match their StorageClasses for clarity.
    7. Test mountOptions before production rollout. Invalid options cause mount failures with cryptic errors. Test by manually running the mount command on a node, or by deploying a test PVC and pod first.
    8. Document StorageClass intent in annotations. Add metadata.annotations describing the intended workload type, cost tier, and any performance characteristics. Cluster users shouldn't need to read StorageClass parameters to understand what a class is for.
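    Several of these practices are mechanical enough to check in CI before a StorageClass reaches the cluster. Below is a minimal Python sketch that checks practices 1, 3 and 8 against a manifest loaded as a dict (for example with a YAML parser, not shown). The annotation key storage.example.com/description is a hypothetical convention, not a Kubernetes standard.

```python
ZONAL_BLOCK = {"ebs.csi.aws.com", "disk.csi.azure.com", "pd.csi.storage.gke.io"}
# Hypothetical annotation key for practice 8; pick your own convention.
DESCRIPTION_ANNOTATION = "storage.example.com/description"


def lint_storageclass(sc: dict) -> list:
    """Check a StorageClass dict against practices 1, 3 and 8; returns findings."""
    findings = []
    provisioner = sc.get("provisioner", "")
    mode = sc.get("volumeBindingMode", "Immediate")  # API default when omitted
    if provisioner in ZONAL_BLOCK and mode != "WaitForFirstConsumer":
        findings.append("practice 1: use WaitForFirstConsumer for zonal block storage")
    if not sc.get("allowVolumeExpansion", False):
        findings.append("practice 3: set allowVolumeExpansion: true")
    annotations = sc.get("metadata", {}).get("annotations", {})
    if not annotations.get(DESCRIPTION_ANNOTATION):
        findings.append(f"practice 8: add a {DESCRIPTION_ANNOTATION} annotation")
    return findings
```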