Storage Classes

The complete StorageClass reference — every field, all major cloud provisioner parameters, topology constraints, multi-tier design patterns, and how to safely migrate PVCs between StorageClasses without downtime.

What This Page Covers

StorageClass object model — cluster-scoped; relationship to PV/PVC/provisioner

Full StorageClass spec reference — provisioner, parameters, reclaimPolicy, volumeBindingMode, allowVolumeExpansion, mountOptions, allowedTopologies

provisioner field — CSI driver name; kubernetes.io/no-provisioner for local; legacy in-tree names

reclaimPolicy — Delete vs Retain; does not affect existing PVs

volumeBindingMode — Immediate vs WaitForFirstConsumer; topology-aware flow

allowVolumeExpansion — enabling on existing SC; per-driver support

mountOptions — passed to mount syscall; driver must support; debugging mount options

allowedTopologies — restrict provisioning to specific zones/regions

AWS EBS CSI driver — all gp2/gp3/io1/io2/st1/sc1 parameters; KMS encryption; throughput/IOPS tuning; multi-attach io1/io2

GCE Persistent Disk CSI driver — pd-standard/pd-ssd/pd-balanced/pd-extreme; replication-type; disk encryption key; provisioned-iops-on-create

Azure Disk CSI driver — Standard_LRS/Premium_LRS/UltraSSD_LRS/Premium_ZRS; cachingMode; enableBursting; networkAccessPolicy

Azure Files CSI driver — NFS vs SMB protocol; skuName; storageAccount; allowBlobPublicAccess

AWS EFS CSI driver — dynamic provisioning with access points; basePath; dirName; provisioningMode

Ceph RBD CSI — pool; imageFormat; imageFeatures; csi.storage.k8s.io/node-stage-secret-name

CephFS CSI — fsName; pool; mounter; kernelMountOptions

NFS subdir external provisioner — pathPattern; onDelete; archiveOnDelete

Multi-tier StorageClass design — fast/balanced/slow/shared tiers with example manifests

StorageClass for different teams — namespace-scoped access via ResourceQuota + LimitRange

Changing default StorageClass — annotation swap; multiple-defaults failure; retroactive assignment

StorageClass migration — cloning PVCs to new SC; Velero approach; zero-downtime migration runbook

StorageClass and VolumeSnapshotClass alignment — snapshotting must use compatible class

4 alerting rules + 5 troubleshooting runbooks

8 best practices

Object Model

A StorageClass is a cluster-scoped object that acts as a template for dynamic PV provisioning. It encodes everything the provisioner needs to create a volume on behalf of a PVC: which backend to call, what type of disk to create, how to handle reclaim, and topology constraints.

StorageClass (cluster-scoped)
  name: gp3-encrypted
  provisioner: ebs.csi.aws.com    ──────────► CSI driver controller plugin
  parameters: {type: gp3, ...}              (running in kube-system as Deployment)
  reclaimPolicy: Delete
  volumeBindingMode: WaitForFirstConsumer
  allowVolumeExpansion: true
         │
         │  PVC created with storageClassName: gp3-encrypted
         ▼
  external-provisioner sidecar calls CSI CreateVolume
         │
         ▼
  PersistentVolume created (spec inherited from StorageClass + provisioner response)
  PVC bound to PV

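The provisioner, parameters, reclaimPolicy, and volumeBindingMode fields of a StorageClass are immutable once created, so you cannot change the disk type or provisioner in place. Other fields, such as allowVolumeExpansion and mountOptions, can still be updated. To change an immutable field, create a new StorageClass and migrate PVCs (see StorageClass Migration).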

Full StorageClass Spec

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: gp3-encrypted
  annotations:
    storageclass.kubernetes.io/is-default-class: "true"   # at most one per cluster
provisioner: ebs.csi.aws.com          # CSI driver name; must match CSIDriver object name

parameters:                           # driver-specific; opaque to Kubernetes core
  type: gp3
  encrypted: "true"
  kmsKeyId: arn:aws:kms:us-east-1:123456789012:key/mrk-abc123

reclaimPolicy: Delete                 # Delete (default) | Retain
                                      # Recycle is deprecated and not accepted in a StorageClass

volumeBindingMode: WaitForFirstConsumer  # Immediate | WaitForFirstConsumer

allowVolumeExpansion: true            # false by default; enable for resize support

mountOptions:                         # passed to the mount command on the node
  - noatime
  - discard

allowedTopologies:                    # restrict provisioning to these zones
  - matchLabelExpressions:
      - key: topology.ebs.csi.aws.com/zone
        values:
          - us-east-1a
          - us-east-1b
          - us-east-1c

provisioner Field

The provisioner name must exactly match the name registered in the CSIDriver object. CSI driver names are typically reverse-domain formatted. Common values:

Provisioner	Storage Backend
`ebs.csi.aws.com`	AWS EBS (gp2, gp3, io1, io2, st1, sc1)
`efs.csi.aws.com`	AWS EFS (NFS managed file system)
`pd.csi.storage.gke.io`	GCE Persistent Disk (pd-standard, pd-ssd, pd-balanced, pd-extreme)
`filestore.csi.storage.gke.io`	GCP Filestore (NFS)
`disk.csi.azure.com`	Azure Managed Disk
`file.csi.azure.com`	Azure Files (SMB / NFS)
`rbd.csi.ceph.com`	Ceph RBD (block)
`cephfs.csi.ceph.com`	CephFS (file, RWX)
`kubernetes.io/no-provisioner`	Local volumes (no dynamic provisioning)
`nfs.csi.k8s.io`	NFS CSI driver (csi-driver-nfs; creates one subdirectory per PV)
`driver.longhorn.io`	Longhorn distributed block storage
`rancher.io/local-path`	Local Path Provisioner (Rancher)
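
The kubernetes.io/no-provisioner entry is the one value in this table that disables dynamic provisioning. As a minimal sketch (the class name, PV name, node name, and disk path are hypothetical), a local-volume class is paired with pre-created PVs that pin themselves to a node:

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: local-storage                        # hypothetical name
provisioner: kubernetes.io/no-provisioner    # nothing is created on demand
volumeBindingMode: WaitForFirstConsumer      # bind only once a pod is scheduled
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: local-pv-node-a                      # hypothetical name
spec:
  capacity:
    storage: 100Gi
  accessModes: [ReadWriteOnce]
  persistentVolumeReclaimPolicy: Retain
  storageClassName: local-storage
  local:
    path: /mnt/disks/ssd1                    # assumed path on the node
  nodeAffinity:                              # required for local volumes
    required:
      nodeSelectorTerms:
        - matchExpressions:
            - key: kubernetes.io/hostname
              operator: In
              values: [node-a]               # assumed node name
```

A PVC that names local-storage binds to this PV only when its pod can run on node-a.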

volumeBindingMode

This field controls when the PV is provisioned relative to pod scheduling. See the full explanation in Persistent Volumes — WaitForFirstConsumer. The summary:

Mode	PV Provisioned When	Required For
`Immediate`	PVC is created (zone chosen randomly by provisioner)	Topology-agnostic storage: EFS, CephFS, NFS, Azure Files
`WaitForFirstConsumer`	A pod using the PVC is scheduled to a node	Zonal block storage: EBS, Azure Disk, GCE PD, local PVs

🔴

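Never use Immediate for EBS, Azure Disk, or GCE PD: These are zonal resources. With Immediate, the provisioner creates the volume as soon as the PVC exists, before the scheduler has picked a node, so the volume can land in a different AZ from the nodes the pod is able to run on. The pod then stays Pending with an event such as "node(s) had volume node affinity conflict" or "node(s) had no available volume zone" until a suitable node exists in the volume's zone. Always use WaitForFirstConsumer for any zonal block driver.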
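
To make the WaitForFirstConsumer flow concrete, here is a minimal sketch of a claim and its first consumer. It assumes the gp3-encrypted class from the spec reference exists; the PVC name, pod name, and image are hypothetical.

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data-demo                   # hypothetical name
  namespace: default
spec:
  storageClassName: gp3-encrypted   # class with volumeBindingMode: WaitForFirstConsumer
  accessModes: [ReadWriteOnce]
  resources:
    requests:
      storage: 20Gi
---
apiVersion: v1
kind: Pod
metadata:
  name: demo                        # hypothetical name
  namespace: default
spec:
  containers:
    - name: app
      image: busybox:1.36           # any image works; used here only to hold the mount
      command: ["sleep", "3600"]
      volumeMounts:
        - name: data
          mountPath: /data
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: data-demo        # scheduling this pod triggers provisioning
```

If only the PVC is applied, it stays Pending by design. The volume is created in the zone of the node chosen for the pod once the pod exists.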

mountOptions

Mount options are passed directly to the mount command on the node when a filesystem volume is mounted. Kubernetes does not validate them: an invalid option causes the mount to fail and leaves the pod stuck in ContainerCreating.

mountOptions:        # illustrative catalogue; set only the options valid for the class's filesystem (block vs NFS)
  - noatime          # don't update atime on reads — reduces write I/O (recommended for DBs)
  - nodiratime       # don't update directory atime
  - discard          # enable TRIM for SSDs — frees blocks on delete (EBS gp3/io2 support it)
  - rsize=1048576    # NFS read size 1MiB (tune for throughput)
  - wsize=1048576    # NFS write size 1MiB
  - nfsvers=4.1      # force NFSv4.1 (pNFS capable)
  - hard             # NFS: retry indefinitely on server failure
  - timeo=600        # NFS: timeout 60 seconds before retry

⚠️

discard and database write performance: The discard (TRIM) option causes the filesystem to issue TRIM commands on block deallocation. On some cloud block storage implementations this adds latency to delete-heavy workloads. PostgreSQL VACUUM, MySQL purge threads, and compaction jobs can be significantly slower with discard. Use fstrim via a periodic job instead, or test the performance impact before enabling.

allowedTopologies

Restricts the provisioner to create volumes only within the specified topology zones. This is useful for performance (co-locate volume with expected workload zones) and cost optimization (avoid cross-AZ transfer fees).

allowedTopologies:
  - matchLabelExpressions:
      - key: topology.ebs.csi.aws.com/zone
        values: [us-east-1a, us-east-1b]
  # EBS volumes will ONLY be provisioned in these two AZs
  # Pods that schedule to us-east-1c will not get volumes from this StorageClass

ℹ️

allowedTopologies + WaitForFirstConsumer interaction: When both are set, the scheduler must choose a node in one of the allowed topology zones. If no schedulable node exists in the allowed zones, pod scheduling fails. Use allowedTopologies to pin storage to specific zones, which helps regulated workloads that must stay in a defined location.

AWS EBS CSI Driver Parameters

Driver: ebs.csi.aws.com

Parameter	Values	Description
`type`	gp2, gp3, io1, io2, st1, sc1	EBS volume type
`iops`	Integer string	Provisioned IOPS for io1/io2; baseline IOPS for gp3
`throughput`	Integer string (MiB/s)	Throughput for gp3 (default 125, max 1000)
`encrypted`	"true" / "false"	Enable EBS encryption
`kmsKeyId`	ARN or key alias	Customer-managed KMS key for encryption
`blockExpress`	"true" / "false"	Enable io2 Block Express (higher IOPS/GiB, sub-millisecond latency)
`iopsPerGB`	Integer string	IOPS per GiB of requested capacity; alternative to a fixed `iops` value for io1/io2/gp3

gp3 — General Purpose (Recommended Default)

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: gp3
  annotations:
    storageclass.kubernetes.io/is-default-class: "true"
provisioner: ebs.csi.aws.com
parameters:
  type: gp3
  # gp3 defaults: 3000 IOPS, 125 MiB/s throughput (included in base price)
  # Override for higher performance (charged extra above baseline):
  iops: "6000"           # up to 16000 IOPS
  throughput: "250"       # up to 1000 MiB/s
  encrypted: "true"
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer
allowVolumeExpansion: true

io2 Block Express — High Performance Databases

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: io2-block-express
provisioner: ebs.csi.aws.com
parameters:
  type: io2
  iops: "64000"          # up to 256000 IOPS with Block Express
  encrypted: "true"
  kmsKeyId: arn:aws:kms:us-east-1:123456789012:key/mrk-prod-db
  blockExpress: "true"
reclaimPolicy: Retain    # always Retain for production DBs
volumeBindingMode: WaitForFirstConsumer
allowVolumeExpansion: true

st1 — Throughput Optimized HDD (Cold Data / Data Lake)

parameters:
  type: st1
  # Minimum size: 125 GiB; Maximum: 16 TiB
  # 40 MiB/s per TiB baseline throughput, 250 MiB/s per TiB burst
  # NOT suitable for random I/O workloads — sequential only

io1/io2 Multi-Attach

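parameters:
  type: io1               # or io2
  iops: "10000"
  multiAttachEnabled: "true"   # allows RWX-like behavior at EBS level
# Verify this parameter against your driver version: the upstream aws-ebs-csi-driver
# documentation enables Multi-Attach on io2 through a PVC with
# accessModes: [ReadWriteMany] and volumeMode: Block.
# WARNING: Multi-Attach is not a substitute for distributed locking.
# The application must coordinate concurrent writes (e.g., a cluster-aware filesystem
# such as GFS2 or OCFS2). Standard ext4/xfs will corrupt with concurrent writers.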

GCE Persistent Disk CSI Driver Parameters

Driver: pd.csi.storage.gke.io

Parameter	Values	Description
`type`	pd-standard, pd-ssd, pd-balanced, pd-extreme	Disk type
`replication-type`	none, regional-pd	Regional PD: synchronous replication across 2 zones
`disk-encryption-kms-key`	KMS key resource name	Customer-managed encryption key
`provisioned-iops-on-create`	Integer string	Provisioned IOPS for pd-extreme (min 10000)
`provisioned-throughput-on-create`	Integer string (MiB/s)	Provisioned throughput for pd-extreme

# pd-ssd — SSD-backed (recommended for databases on GKE)
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: pd-ssd
provisioner: pd.csi.storage.gke.io
parameters:
  type: pd-ssd
  replication-type: none
volumeBindingMode: WaitForFirstConsumer
allowVolumeExpansion: true
reclaimPolicy: Delete
---
# Regional PD — synchronous two-zone replication for HA
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: pd-ssd-regional
provisioner: pd.csi.storage.gke.io
parameters:
  type: pd-ssd
  replication-type: regional-pd   # provisions in 2 AZs simultaneously
allowedTopologies:
  - matchLabelExpressions:
      - key: topology.gke.io/zone
        values: [us-central1-a, us-central1-b]
volumeBindingMode: WaitForFirstConsumer
allowVolumeExpansion: true

Azure Disk CSI Driver Parameters

Driver: disk.csi.azure.com

Parameter	Values	Description
`skuName`	Standard_LRS, Premium_LRS, StandardSSD_LRS, UltraSSD_LRS, Premium_ZRS, StandardSSD_ZRS	Disk SKU
`cachingMode`	None, ReadOnly, ReadWrite	Host caching mode
`kind`	managed	Always use managed (unmanaged is deprecated)
`diskEncryptionSetID`	Resource ID	Disk Encryption Set for CMK
`enableBursting`	"true"	Enable on-demand bursting for Premium SSDs
`networkAccessPolicy`	AllowAll, DenyAll, AllowPrivate	Control disk access for security
`diskAccessID`	Resource ID	Disk Access resource for private endpoint
`tags`	key1=value1,key2=value2	Azure resource tags on disk

# Premium SSD with encryption and bursting
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: premium-ssd-encrypted
provisioner: disk.csi.azure.com
parameters:
  skuName: Premium_LRS
  cachingMode: ReadOnly          # appropriate for most DB data volumes
  kind: managed
  diskEncryptionSetID: /subscriptions/xxx/resourceGroups/rg/providers/Microsoft.Compute/diskEncryptionSets/my-des
  enableBursting: "true"
  networkAccessPolicy: DenyAll   # disallow direct disk access outside the cluster
volumeBindingMode: WaitForFirstConsumer
allowVolumeExpansion: true
reclaimPolicy: Retain
---
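# Premium SSD with zone-redundant storage (ZRS)
parameters:
  skuName: Premium_ZRS           # synchronous zone-redundant storage (replicated across 3 AZs)
  # NOTE: Premium SSD v2 is a different SKU (PremiumV2_LRS); it and UltraSSD_LRS
  # do not support host caching, so cachingMode must be None for those SKUs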

UltraSSD for demanding databases

parameters:
  skuName: UltraSSD_LRS
  cachingMode: None              # host caching is not supported on Ultra disks
  # Must enable Ultra SSD on the node pool:
  # az aks nodepool update --enable-ultra-ssd
  diskIOPSReadWrite: "160000"    # provisioned IOPS
  diskMBpsReadWrite: "2000"      # provisioned throughput MiB/s

Azure Files CSI Driver Parameters

Driver: file.csi.azure.com — supports RWX (ReadWriteMany)

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: azure-files-nfs
provisioner: file.csi.azure.com
parameters:
  protocol: nfs                   # nfs | smb; NFS recommended for Linux workloads
  skuName: Premium_LRS            # Standard_LRS, Premium_LRS, Standard_GRS, etc.
  # storageAccount: mystoraccount  # optional: use specific storage account
  # resourceGroup: my-rg           # optional: storage account resource group
  # subscriptionID: xxx            # optional: cross-subscription
mountOptions:
  - nconnect=4                    # NFS: parallel connections per mount (improves throughput)
  - actimeo=30                    # NFS: attribute cache timeout
volumeBindingMode: Immediate      # Azure Files shares are not tied to a node's zone
allowVolumeExpansion: true
reclaimPolicy: Delete

AWS EFS CSI Driver Parameters

Driver: efs.csi.aws.com — RWX, serverless NFS, auto-scaling

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: efs-sc
provisioner: efs.csi.aws.com
parameters:
  provisioningMode: efs-ap         # efs-ap = dynamic access point provisioning
  fileSystemId: fs-0abc123def456   # existing EFS filesystem ID
  directoryPerms: "700"            # permissions for the access point root directory
  basePath: /dynamic               # base path on the EFS filesystem for access points
  # gidRangeStart: "1000"          # optional: GID range for access point
  # gidRangeEnd: "2000"
  # ensureUniqueDirectory: "true"  # prefix dirName with PVC UID to guarantee uniqueness
volumeBindingMode: Immediate       # EFS is region-wide (not zonal)
allowVolumeExpansion: true         # EFS auto-expands; this enables resize status tracking

ℹ️

EFS access points: The EFS CSI driver in efs-ap mode creates an EFS Access Point for each PVC. Each access point gets its own root directory on the EFS filesystem with isolated ownership and permissions. This enables secure multi-tenant NFS on a shared EFS filesystem: different namespaces get different access points with no cross-tenant visibility.
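
A claim against this class could look like the following sketch. The PVC name and namespace are hypothetical; efs-sc is the class defined above.

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: shared-assets        # hypothetical name
  namespace: team-a          # hypothetical namespace
spec:
  storageClassName: efs-sc
  accessModes: [ReadWriteMany]   # many pods on many nodes can mount it
  resources:
    requests:
      storage: 5Gi           # required by the API; EFS is elastic and does not enforce it
```

Each such PVC results in its own access point under basePath, so two teams using the same class do not see each other's directories.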

Ceph RBD CSI Driver Parameters

Driver: rbd.csi.ceph.com — block storage backed by Ceph (via Rook-Ceph or external Ceph cluster)

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: rook-ceph-block
provisioner: rbd.csi.ceph.com
parameters:
  clusterID: <ceph-cluster-id>         # Rook: namespace of the CephCluster (e.g. rook-ceph); standalone ceph-csi: typically the fsid from `ceph fsid`
  pool: replicapool                      # Ceph RBD pool name
  imageFormat: "2"                       # always use format 2 (format 1 is deprecated)
  imageFeatures: layering                # comma-separated RBD features
  # imageFeatures: layering,fast-diff,object-map,deep-flatten,exclusive-lock
  csi.storage.k8s.io/provisioner-secret-name: rook-csi-rbd-provisioner
  csi.storage.k8s.io/provisioner-secret-namespace: rook-ceph
  csi.storage.k8s.io/controller-expand-secret-name: rook-csi-rbd-provisioner
  csi.storage.k8s.io/controller-expand-secret-namespace: rook-ceph
  csi.storage.k8s.io/node-stage-secret-name: rook-csi-rbd-node
  csi.storage.k8s.io/node-stage-secret-namespace: rook-ceph
  csi.storage.k8s.io/fstype: ext4
reclaimPolicy: Delete
allowVolumeExpansion: true
volumeBindingMode: Immediate
mountOptions:
  - discard

CephFS CSI Driver Parameters

Driver: cephfs.csi.ceph.com — POSIX filesystem, RWX support

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: rook-cephfs
provisioner: cephfs.csi.ceph.com
parameters:
  clusterID: <ceph-cluster-id>
  fsName: myfs                     # CephFS filesystem name
  pool: myfs-replicated            # data pool that stores volume contents
  mounter: kernel                  # kernel | fuse; kernel is faster; fuse supports more features
  kernelMountOptions: ms_mode=prefer-crc    # kernel client mount options
  csi.storage.k8s.io/provisioner-secret-name: rook-csi-cephfs-provisioner
  csi.storage.k8s.io/provisioner-secret-namespace: rook-ceph
  csi.storage.k8s.io/node-stage-secret-name: rook-csi-cephfs-node
  csi.storage.k8s.io/node-stage-secret-namespace: rook-ceph
reclaimPolicy: Delete
allowVolumeExpansion: true
volumeBindingMode: Immediate

NFS Subdir External Provisioner

Dynamically provisions subdirectories on an existing NFS server as PVs. Not a CSI driver — uses the older external provisioner interface.

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: nfs-client
provisioner: cluster.local/nfs-subdir-external-provisioner
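parameters:
  # The NFS server and export path are configured on the provisioner Deployment
  # (NFS_SERVER / NFS_PATH environment variables), not in StorageClass parameters.
  pathPattern: "${.PVC.namespace}/${.PVC.annotations.nfs.io/storage-path}"  # directory template per PVC
  onDelete: retain                 # delete | retain; when set, archiveOnDelete is ignored
  archiveOnDelete: "true"          # used only if onDelete is unset: rename dir to archived-<volume-name>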
reclaimPolicy: Delete
volumeBindingMode: Immediate

Multi-Tier StorageClass Design

Production clusters should define multiple StorageClasses covering different performance and cost tiers. Namespace teams select the appropriate tier via PVC storageClassName, and platform teams enforce limits via ResourceQuota.

fast

High IOPS

io2 Block Express / pd-extreme / UltraSSD. For production databases requiring <1ms latency and high IOPS. Most expensive. Protected by ResourceQuota.

standard (default)

Balanced

gp3 / pd-ssd / Premium_LRS. Default for most workloads. Good baseline performance at reasonable cost. Should be the cluster default SC.

slow

Low Cost

st1 / pd-standard / Standard_LRS. For batch jobs, backups, cold data, or development. Lowest cost. Not suitable for latency-sensitive workloads.

shared

RWX / NFS

EFS / CephFS / Azure Files NFS. ReadWriteMany. For shared config, ML datasets, multi-pod media access. Higher latency than block.
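
As one possible AWS rendering of two of these tiers, the sketch below defines slow and shared classes by reusing parameters shown earlier on this page. The tier names follow the cards above, the filesystem ID is the placeholder from the EFS section, and the PVC is hypothetical.

```yaml
# "slow" tier: throughput-optimized HDD
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: slow
provisioner: ebs.csi.aws.com
parameters:
  type: st1
  encrypted: "true"
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer
allowVolumeExpansion: true
---
# "shared" tier: RWX on EFS access points
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: shared
provisioner: efs.csi.aws.com
parameters:
  provisioningMode: efs-ap
  fileSystemId: fs-0abc123def456   # placeholder; use your filesystem ID
  directoryPerms: "700"
volumeBindingMode: Immediate
---
# A team selects a tier by name in the PVC
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: batch-scratch              # hypothetical name
  namespace: development
spec:
  storageClassName: slow
  accessModes: [ReadWriteOnce]
  resources:
    requests:
      storage: 125Gi               # st1 volumes have a 125 GiB minimum
```

The fast and standard tiers can follow the io2 and gp3 manifests from the EBS section, renamed to the tier names.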

# ResourceQuota to restrict fast storage to production namespace only
apiVersion: v1
kind: ResourceQuota
metadata:
  name: storage-quota
  namespace: development
spec:
  hard:
    requests.storage: 100Gi              # total PVC storage in namespace
    fast.storageclass.storage.k8s.io/requests.storage: "0"   # no fast SC in dev
    fast.storageclass.storage.k8s.io/persistentvolumeclaims: "0"

💡

StorageClass-scoped ResourceQuota: The format <storageClassName>.storageclass.storage.k8s.io/requests.storage limits PVC capacity per StorageClass per namespace. This lets you offer all tiers cluster-wide but control which namespaces can access expensive classes.
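
The counterpart for a namespace that is allowed to use the fast tier might look like this sketch. The capacity and count limits are hypothetical placeholders to size for your own environment.

```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: storage-quota
  namespace: production
spec:
  hard:
    requests.storage: 5Ti                                          # all classes combined
    fast.storageclass.storage.k8s.io/requests.storage: 2Ti         # cap on the expensive tier
    fast.storageclass.storage.k8s.io/persistentvolumeclaims: "10"  # at most 10 fast PVCs
    slow.storageclass.storage.k8s.io/requests.storage: 1Ti
```

A PVC that would push any of these totals over its limit is rejected when it is created.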

Default StorageClass Management

Exactly one StorageClass should be marked default at any time. PVCs without a storageClassName field (not "", but truly absent) use the default.
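
The difference between an absent storageClassName and an empty one is easy to miss, so here is a minimal sketch of both cases (PVC names are hypothetical):

```yaml
# storageClassName is absent: the default StorageClass is assigned
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: uses-default
spec:
  accessModes: [ReadWriteOnce]
  resources:
    requests:
      storage: 10Gi
---
# storageClassName is the empty string: no class, no dynamic provisioning;
# the claim can only bind to a pre-created PV that also has no class
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: no-class
spec:
  storageClassName: ""
  accessModes: [ReadWriteOnce]
  resources:
    requests:
      storage: 10Gi
```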

# List all StorageClasses and their default status
kubectl get storageclass
# NAME             PROVISIONER            RECLAIMPOLICY  VOLUMEBINDINGMODE      ALLOWVOLUMEEXPANSION  AGE
# gp2              kubernetes.io/aws-ebs  Delete         Immediate              false                 2y
# gp3 (default)    ebs.csi.aws.com        Delete         WaitForFirstConsumer   true                  6mo

# Swap the default in two steps (remove from gp2, then add to gp3); the two patches are not atomic
kubectl patch storageclass gp2 \
  -p '{"metadata":{"annotations":{"storageclass.kubernetes.io/is-default-class":"false"}}}'
kubectl patch storageclass gp3 \
  -p '{"metadata":{"annotations":{"storageclass.kubernetes.io/is-default-class":"true"}}}'

⚠️

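Brief window with no default: Removing the old default before adding the new one leaves a short window in which no default exists. A PVC created in that window without a storageClassName is still admitted, but it gets no class and stays Pending. On clusters with retroactive default assignment (stable since Kubernetes 1.28) it picks up the new default once one is set; on older clusters it must be recreated. The two patches are separate API calls, so run them back to back. On Kubernetes 1.26 and later you can instead mark the new class first, because a PVC created while two defaults exist uses the most recently created default.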

StorageClass Migration

StorageClass parameters are immutable. To move PVCs to a new StorageClass (e.g., upgrading from gp2 to gp3, or adding encryption), you must create new PVCs and migrate data. Three approaches:

Approach 1: PVC Clone (Online, Same StorageClass Driver)

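Works when the source and destination StorageClasses use the same CSI driver with different parameters, the driver supports volume cloning, and both PVCs are in the same namespace. Not every driver implements cloning, so confirm support for yours before relying on the gp2 → gp3 example here (both classes would need to use ebs.csi.aws.com). Uses dataSource PVC cloning: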

# 1. Create new PVC as a clone with the new StorageClass
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data-postgres-0-gp3
  namespace: production
spec:
  dataSource:
    kind: PersistentVolumeClaim
    name: data-postgres-0          # source PVC (must be Bound)
  storageClassName: gp3-encrypted  # destination StorageClass
  accessModes: [ReadWriteOnce]
  resources:
    requests:
      storage: 100Gi               # must be at least the size of the source PVC

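# 2. Verify clone is Bound and data is present
kubectl get pvc data-postgres-0-gp3 -n production   # expect STATUS Bound
# (with WaitForFirstConsumer the clone stays Pending until a pod mounts it)
kubectl exec -it postgres-0 -n production -- psql -c "SELECT count(*) FROM orders;"   # note the source row count to compare after the switch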

# 3. Update StatefulSet to use new PVC name (or swap via volumeClaimTemplate rename)
# This requires careful StatefulSet management — see 06-stateful-storage-patterns.html

Approach 2: Snapshot-Based Migration (Same Driver, Different Parameters)

# 1. Take snapshot of source PVC
kubectl apply -f - <
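
The manifest for this step is cut off above. As a minimal sketch of what a snapshot-and-restore pair could look like, the following assumes the csi-aws-vsc VolumeSnapshotClass from the alignment section and the PVC names from Approach 1; the snapshot name is hypothetical. The restored PVC's StorageClass must use the same CSI driver that took the snapshot.

```yaml
# 1. Snapshot the source PVC
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: data-postgres-0-snap             # hypothetical name
  namespace: production
spec:
  volumeSnapshotClassName: csi-aws-vsc   # driver must match the source PVC's StorageClass
  source:
    persistentVolumeClaimName: data-postgres-0
---
# 2. Restore the snapshot into a new PVC on the destination StorageClass
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data-postgres-0-gp3
  namespace: production
spec:
  storageClassName: gp3-encrypted
  dataSource:
    apiGroup: snapshot.storage.k8s.io
    kind: VolumeSnapshot
    name: data-postgres-0-snap
  accessModes: [ReadWriteOnce]
  resources:
    requests:
      storage: 100Gi                     # at least the snapshot's restore size
```

Wait until the VolumeSnapshot reports readyToUse: true before applying the second object. Writes made to the source after the snapshot are not carried over, so quiesce the application first if you need a consistent cutover.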



Approach 3: rsync / Velero (Cross-Cluster or Different CSI)

When cloning and snapshot restore are not available (different cloud providers, on-prem to cloud migration), use a data mover:

# Velero backup and restore
velero backup create prod-db-backup \
  --include-namespaces production \
  --snapshot-volumes \
  --storage-location default

# Restore into a different namespace (StorageClass remapping is configured separately, via Velero's change-storage-class ConfigMap)
velero restore create --from-backup prod-db-backup \
  --namespace-mappings production:production-new \
  --existing-resource-policy update
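
The restore command itself carries no StorageClass mapping. Velero reads that mapping from a labeled ConfigMap in its own namespace. A minimal sketch, assuming Velero is installed in the velero namespace and that gp2 claims should be restored as gp3-encrypted (the ConfigMap name is hypothetical):

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: change-storage-class-config       # hypothetical name; the labels are what matter
  namespace: velero                        # assumed Velero install namespace
  labels:
    velero.io/plugin-config: ""
    velero.io/change-storage-class: RestoreItemAction
data:
  # <old StorageClass name>: <new StorageClass name>
  gp2: gp3-encrypted
```

Create the ConfigMap before running the restore. When source and destination use different CSI drivers, native volume snapshots generally cannot be restored across them, so this remapping is usually paired with Velero's file-system backup rather than --snapshot-volumes.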


StorageClass and VolumeSnapshotClass Alignment

A VolumeSnapshot must be created with a VolumeSnapshotClass that uses the same CSI driver as the PVC's StorageClass. Mismatched drivers cause snapshot creation to fail:

apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshotClass
metadata:
  name: csi-aws-vsc
  annotations:
    snapshot.storage.kubernetes.io/is-default-class: "true"
driver: ebs.csi.aws.com      # MUST match the StorageClass provisioner
deletionPolicy: Delete        # Delete | Retain
parameters:
  # driver-specific snapshot parameters (e.g., tags)
  tagSpecification_1: "environment=production"


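⚠️

One default VolumeSnapshotClass per driver: Each CSI driver that supports snapshots needs its own VolumeSnapshotClass, so a cluster with both EBS and Ceph RBD StorageClasses needs one for each. At most one class per driver should carry the default annotation. The snapshot controller picks the default whose driver matches the PVC, and a VolumeSnapshot created without an explicit class fails if that driver has more than one default.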



Alerting Rules

groups:
- name: storageclass
  rules:
  - alert: PVCProvisioningFailed
    expr: |
      increase(storage_operation_errors_total{operation_name="provision"}[5m]) > 0
    labels: {severity: warning}
    annotations:
      summary: "CSI provisioning errors — check CSI controller logs"

  - alert: StorageClassMissing
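    expr: |
      # The phase metric has no storageclass label, so join it in from the info metric first
      (kube_persistentvolumeclaim_status_phase{phase="Pending"} == 1)
        * on(namespace, persistentvolumeclaim) group_left(storageclass)
      kube_persistentvolumeclaim_info
        unless on(storageclass)
      kube_storageclass_info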
    for: 5m
    labels: {severity: warning}
    annotations:
      summary: "PVC references a StorageClass that doesn't exist"

  - alert: DefaultStorageClassMissing
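    expr: |
      # count() over an empty result returns nothing, so "== 0" never fires; use absent().
      # Assumes kube_storageclass_info carries an is_default_class label in your kube-state-metrics version.
      absent(kube_storageclass_info{is_default_class="true"})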
    for: 2m
    labels: {severity: warning}
    annotations:
      summary: "No default StorageClass defined — PVCs without explicit SC will fail"

  - alert: MultipleDefaultStorageClasses
    expr: |
      count(kube_storageclass_info{is_default_class="true"}) > 1
    for: 1m
    labels: {severity: critical}
    annotations:
      summary: "Multiple default StorageClasses: PVCs without an SC name are rejected before Kubernetes 1.26 and use the newest default from 1.26 on"


Troubleshooting Runbooks

Runbook: PVC Stuck Pending — Wrong StorageClass
# Check what StorageClass the PVC references
kubectl get pvc <name> -n <ns> -o jsonpath='{.spec.storageClassName}'

# Verify the StorageClass exists
kubectl get storageclass <name>
# If NotFound: check for typo or missing cluster setup

# Verify the provisioner is running
kubectl get pods -n kube-system | grep csi

# Check provisioner logs
kubectl logs -n kube-system -l app=ebs-csi-controller -c csi-provisioner --tail=50

Runbook: Volume Mount Fails — Bad mountOptions
# Pod stuck in ContainerCreating with "failed to mount" error
kubectl describe pod <name> -n <ns>
# Event: MountVolume.MountDevice failed: ... exit status 32
# OR: unrecognized option 'noatime' (driver doesn't pass options)

# Check kubelet logs on the node
kubectl get pod <name> -n <ns> -o wide   # NODE column shows where the pod landed
ssh <node-ip> journalctl -u kubelet --no-pager | grep -i "mount" | tail -20

# Common causes:
# 1. Option not supported by filesystem type (e.g., nfs option on ext4)
# 2. Typo in mount option name
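# 3. Driver ignores or rejects mountOptions (check the driver documentation and node plugin logs;
#    the CSIDriver object does not advertise mount option support)
kubectl logs -n kube-system -l app=ebs-csi-node -c ebs-plugin --tail=50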

Runbook: Migration — PVC Clone Stuck in Pending
# PVC clone is Pending; source PVC is Bound
kubectl describe pvc data-postgres-0-gp3 -n production
# Event: "waiting for a volume to be created..."

# Common causes:
# 1. Source and destination StorageClass use different provisioners
#    Clone requires same driver
kubectl get storageclass gp2 -o jsonpath='{.provisioner}'
kubectl get storageclass gp3 -o jsonpath='{.provisioner}'
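# If different → use a data mover (rsync / Velero) instead; snapshot restore also requires the same driver

# 2. Source PVC is not in Bound state at clone time
kubectl get pvc data-postgres-0 -n production -o jsonpath='{.status.phase}'

# 3. The driver does not implement volume cloning (check the driver documentation)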

Runbook: EBS Volume Created in Wrong AZ (Immediate Binding)
# Pod stuck Pending: "node(s) had volume node affinity conflict"
# PVC Events: "successfully provisioned volume" in wrong zone
# The zone is recorded in the PV's node affinity, not on the PVC
kubectl get pv <pv-name> -o jsonpath='{.spec.nodeAffinity}'

# Fix for existing stuck PVC:
# 1. Note the current data (snapshot if needed)
# 2. Delete the PVC (will delete the EBS volume if reclaimPolicy:Delete)
# 3. Recreate the StorageClass with WaitForFirstConsumer
#    (volumeBindingMode is immutable, so a patch is rejected; existing PVs are unaffected)
kubectl delete storageclass <name>
kubectl apply -f <storageclass-manifest>.yaml   # same manifest with volumeBindingMode: WaitForFirstConsumer
# 4. Recreate the PVC

# Prevention: always use WaitForFirstConsumer for EBS/Azure Disk/GCE PD

Runbook: discard Mount Option Causing Slow DB Writes
# Database (PostgreSQL VACUUM, MySQL purge) is very slow
# Suspect: discard mount option on StorageClass

# Check if discard is in mountOptions
kubectl get storageclass <name> -o jsonpath='{.mountOptions}'

# If discard is enabled and causing issues:
# 1. Create new StorageClass without discard
# 2. Migrate PVCs to new SC
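# 3. Or: remount without discard on the node (lasts until the next mount; tune2fs only
#    changes superblock defaults and does not override an explicit mount option)
ssh <node-ip> sudo mount -o remount,nodiscard <volume-mount-path>
# Use periodic fstrim instead (needs root; run it on the node or from a privileged pod):
ssh <node-ip> sudo fstrim -v <volume-mount-path>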


Best Practices


  Use WaitForFirstConsumer for all zonal block drivers (EBS, Azure Disk, GCE PD). This is the single most common StorageClass misconfiguration that causes stuck pods in multi-AZ clusters.
  Set reclaimPolicy: Retain for production database StorageClasses. Even if you forget to patch a specific PV, the StorageClass policy applies at provisioning time.
  Enable allowVolumeExpansion: true on every class whose driver supports resize. It costs nothing to enable. Unlike parameters, it can be patched onto an existing StorageClass later, but setting it up front avoids discovering the gap during a capacity emergency.
  Define named tiers (fast/standard/slow/shared) rather than a single default. Teams that never think about storage will use the default. Make the default the right choice for most workloads (gp3, pd-ssd, Premium_LRS) — not the cheapest or most expensive.
  Enforce StorageClass access via ResourceQuota. Use per-class quotas to prevent development namespaces from consuming expensive io2/pd-extreme storage.
  Align VolumeSnapshotClass driver with StorageClass provisioner. Mismatched drivers cause silent snapshot failures. Name your VolumeSnapshotClasses to match their StorageClasses for clarity.
  Test mountOptions before production rollout. Invalid options cause mount failures with cryptic errors. Test by manually running the mount command on a node, or by deploying a test PVC and pod first.
  Document StorageClass intent in annotations. Add metadata.annotations describing the intended workload type, cost tier, and any performance characteristics. Cluster users shouldn't need to read StorageClass parameters to understand what a class is for.
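
As a sketch of the last practice, intent can be recorded directly on the object. The annotation keys below are hypothetical (Kubernetes defines no standard keys for this), so pick a prefix your organization owns; the parameters reuse the io2 example from earlier on this page.

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast
  annotations:
    # Hypothetical keys, for human readers and internal tooling only
    storage.example.com/tier: "fast"
    storage.example.com/intended-workloads: "production databases"
    storage.example.com/cost: "highest; restricted by ResourceQuota"
provisioner: ebs.csi.aws.com
parameters:
  type: io2
  iops: "64000"
  encrypted: "true"
reclaimPolicy: Retain
volumeBindingMode: WaitForFirstConsumer
allowVolumeExpansion: true
```

Annotations are mutable, so this documentation can be added to or corrected on existing classes without recreating them.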




  