Storage Classes
The complete StorageClass reference — every field, all major cloud provisioner parameters, topology constraints, multi-tier design patterns, and how to safely migrate PVCs between StorageClasses without downtime.
Object Model
A StorageClass is a cluster-scoped object that acts as a template for dynamic PV provisioning. It encodes everything the provisioner needs to create a volume on behalf of a PVC: which backend to call, what type of disk to create, how to handle reclaim, and topology constraints.
StorageClass (cluster-scoped)
  name: gp3-encrypted
  provisioner: ebs.csi.aws.com ──────────► CSI driver controller plugin
  parameters: {type: gp3, ...}             (running in kube-system as Deployment)
  reclaimPolicy: Delete
  volumeBindingMode: WaitForFirstConsumer
  allowVolumeExpansion: true
      │
      │ PVC created with storageClassName: gp3-encrypted
      ▼
  external-provisioner sidecar calls CSI CreateVolume
      │
      ▼
  PersistentVolume created (spec inherited from StorageClass + provisioner response)
  PVC bound to PV
The provisioner, parameters, reclaimPolicy, and volumeBindingMode fields of a StorageClass are immutable once created, so you cannot change the disk type or provisioner in place. To change them, create a new StorageClass and migrate PVCs (see StorageClass Migration).
Full StorageClass Spec
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: gp3-encrypted
  annotations:
    storageclass.kubernetes.io/is-default-class: "true"  # at most one per cluster
provisioner: ebs.csi.aws.com               # CSI driver name; must match CSIDriver object name
parameters:                                # driver-specific; opaque to Kubernetes core
  type: gp3
  encrypted: "true"
  kmsKeyId: arn:aws:kms:us-east-1:123456789012:key/mrk-abc123
reclaimPolicy: Delete                      # Delete (default) | Retain
                                           # Recycle is deprecated and removed
volumeBindingMode: WaitForFirstConsumer    # Immediate | WaitForFirstConsumer
allowVolumeExpansion: true                 # false by default; enable for resize support
mountOptions:                              # passed to the mount command on the node
  - noatime
  - discard
allowedTopologies:                         # restrict provisioning to these zones
  - matchLabelExpressions:
      - key: topology.ebs.csi.aws.com/zone
        values:
          - us-east-1a
          - us-east-1b
          - us-east-1c
provisioner Field
The provisioner name must exactly match the name registered in the CSIDriver object. CSI driver names are typically reverse-domain formatted. Common values:
| Provisioner | Storage Backend |
|---|---|
| ebs.csi.aws.com | AWS EBS (gp2, gp3, io1, io2, st1, sc1) |
| efs.csi.aws.com | AWS EFS (NFS managed file system) |
| pd.csi.storage.gke.io | GCE Persistent Disk (pd-standard, pd-ssd, pd-balanced, pd-extreme) |
| filestore.csi.storage.gke.io | GCP Filestore (NFS) |
| disk.csi.azure.com | Azure Managed Disk |
| file.csi.azure.com | Azure Files (SMB / NFS) |
| rbd.csi.ceph.com | Ceph RBD (block) |
| cephfs.csi.ceph.com | CephFS (file, RWX) |
| kubernetes.io/no-provisioner | Local volumes (no dynamic provisioning) |
| nfs.csi.k8s.io | NFS CSI driver (subdir provisioner) |
| driver.longhorn.io | Longhorn distributed block storage |
| rancher.io/local-path | Local Path Provisioner (Rancher) |
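The kubernetes.io/no-provisioner entry is the one row in this table that performs no dynamic provisioning. A minimal sketch of such a class follows, assuming the local PVs are created by hand or by a separate tool; the name local-storage is illustrative and does not come from this page.

```yaml
# Sketch: StorageClass for statically provisioned local volumes.
# "local-storage" is a hypothetical name chosen for this example.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: local-storage
provisioner: kubernetes.io/no-provisioner   # no volumes are created on demand
volumeBindingMode: WaitForFirstConsumer     # delay binding until the pod's node is known
```

Delaying binding lets the scheduler choose a node first, so the PVC binds to a local PV on that node rather than to an arbitrary one.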
volumeBindingMode
This field controls when the PV is provisioned relative to pod scheduling. See the full explanation in Persistent Volumes — WaitForFirstConsumer. The summary:
| Mode | PV Provisioned When | Required For |
|---|---|---|
| Immediate | PVC is created (zone chosen randomly by provisioner) | Topology-agnostic storage: EFS, CephFS, NFS, Azure Files |
| WaitForFirstConsumer | A pod using the PVC is scheduled to a node | Zonal block storage: EBS, Azure Disk, GCE PD, local PVs |
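With Immediate binding on a zonal block driver, the volume can be created in a zone where the pod cannot run, and the pod then stays Pending with node(s) had no available volume zone. Always use WaitForFirstConsumer for any zonal block driver.
mountOptions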
Mount options are passed directly to the mount command on the node when attaching a filesystem volume. They are not validated by Kubernetes — an invalid option causes the mount to fail and the pod to be stuck in ContainerCreating.
mountOptions:
  # Block filesystem options (ext4/xfs)
  - noatime        # don't update atime on reads; reduces write I/O (recommended for DBs)
  - nodiratime     # don't update directory atime
  - discard        # enable TRIM for SSDs; frees blocks on delete (EBS gp3/io2 support it)
  # NFS options (use these in an NFS-backed class, not alongside the block options above)
  - rsize=1048576  # NFS read size 1MiB (tune for throughput)
  - wsize=1048576  # NFS write size 1MiB
  - nfsvers=4.1    # force NFSv4.1 (pNFS capable)
  - hard           # NFS: retry indefinitely on server failure
  - timeo=600      # NFS: timeout 60 seconds before retry (the unit is tenths of a second)
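Since Kubernetes does not validate these options, one low-risk check is to mount a throwaway volume before rolling a class out. The sketch below assumes the gp3-encrypted class from the full spec; the PVC and pod names and the busybox image are illustrative choices, not part of the original page.

```yaml
# Sketch: throwaway PVC and pod to confirm that a StorageClass's mountOptions mount cleanly.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: mountopts-test              # hypothetical name
spec:
  storageClassName: gp3-encrypted   # class under test
  accessModes: [ReadWriteOnce]
  resources:
    requests:
      storage: 1Gi
---
apiVersion: v1
kind: Pod
metadata:
  name: mountopts-test              # hypothetical name
spec:
  restartPolicy: Never
  containers:
    - name: check
      image: busybox                # assumption: any small image with a shell works
      # Print the options the volume was actually mounted with, then exit.
      command: ["sh", "-c", "grep ' /data ' /proc/mounts"]
      volumeMounts:
        - name: data
          mountPath: /data
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: mountopts-test
```

If the pod completes, its log shows the effective mount options. If it stays in ContainerCreating, the pod events carry the mount error.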
The discard (TRIM) option causes the filesystem to issue TRIM commands on block deallocation. On some cloud block storage implementations this adds latency to delete-heavy workloads. PostgreSQL VACUUM, MySQL purge threads, and compaction jobs can be significantly slower with discard. Use fstrim via a periodic job instead, or test the performance impact before enabling.
allowedTopologies
Restricts the provisioner to create volumes only within the specified topology zones. This is useful for performance (co-locate volume with expected workload zones) and cost optimization (avoid cross-AZ transfer fees).
allowedTopologies:
  - matchLabelExpressions:
      - key: topology.ebs.csi.aws.com/zone
        values: [us-east-1a, us-east-1b]
# EBS volumes will ONLY be provisioned in these two AZs
# Pods that schedule to us-east-1c will not get volumes from this StorageClass
Use allowedTopologies to pin storage to specific zones. This is useful for regulated workloads that must stay in specific regions.
AWS EBS CSI Driver Parameters
Driver: ebs.csi.aws.com
| Parameter | Values | Description |
|---|---|---|
| type | gp2, gp3, io1, io2, st1, sc1 | EBS volume type |
| iops | Integer string | Provisioned IOPS for io1/io2; baseline IOPS for gp3 |
| throughput | Integer string (MiB/s) | Throughput for gp3 (default 125, max 1000) |
| encrypted | "true" / "false" | Enable EBS encryption |
| kmsKeyId | ARN or key alias | Customer-managed KMS key for encryption |
| blockExpress | "true" / "false" | Enable io2 Block Express (higher IOPS/GiB, sub-millisecond latency) |
| throughputMode | "provisioned" | For io1/io2 only |
gp3 — General Purpose (Recommended Default)
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: gp3
  annotations:
    storageclass.kubernetes.io/is-default-class: "true"
provisioner: ebs.csi.aws.com
parameters:
  type: gp3
  # gp3 defaults: 3000 IOPS, 125 MiB/s throughput (included in base price)
  # Override for higher performance (charged extra above baseline):
  iops: "6000"        # up to 16000 IOPS
  throughput: "250"   # up to 1000 MiB/s
  encrypted: "true"
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer
allowVolumeExpansion: true
io2 Block Express — High Performance Databases
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: io2-block-express
provisioner: ebs.csi.aws.com
parameters:
  type: io2
  iops: "64000"   # up to 256000 IOPS with Block Express
  encrypted: "true"
  kmsKeyId: arn:aws:kms:us-east-1:123456789012:key/mrk-prod-db
  blockExpress: "true"
reclaimPolicy: Retain   # always Retain for production DBs
volumeBindingMode: WaitForFirstConsumer
allowVolumeExpansion: true
st1 — Throughput Optimized HDD (Cold Data / Data Lake)
parameters:
  type: st1
  # Minimum size: 125 GiB; Maximum: 16 TiB
  # 40 MiB/s per TiB baseline throughput, 250 MiB/s per TiB burst
  # NOT suitable for random I/O workloads; sequential only
io1/io2 Multi-Attach
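parameters:
  type: io2
  iops: "10000"
# To the best of our knowledge, the upstream EBS CSI driver enables Multi-Attach per PVC
# (accessModes: [ReadWriteMany] with volumeMode: Block on io2 volumes) rather than
# through a multiAttachEnabled StorageClass parameter; verify against your driver version.
# WARNING: Multi-Attach is not a substitute for distributed locking.
# The application must handle concurrent writes (e.g., cluster-aware filesystems,
# GFS2, OCFS2). Standard ext4/xfs will corrupt with concurrent writers.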
GCE Persistent Disk CSI Driver Parameters
Driver: pd.csi.storage.gke.io
| Parameter | Values | Description |
|---|---|---|
| type | pd-standard, pd-ssd, pd-balanced, pd-extreme | Disk type |
| replication-type | none, regional-pd | Regional PD: synchronous replication across 2 zones |
| disk-encryption-kms-key | KMS key resource name | Customer-managed encryption key |
| provisioned-iops-on-create | Integer string | Provisioned IOPS for pd-extreme (min 10000) |
| provisioned-throughput-on-create | Integer string (MiB/s) | Provisioned throughput for pd-extreme |
# pd-ssd: SSD-backed (recommended for databases on GKE)
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: pd-ssd
provisioner: pd.csi.storage.gke.io
parameters:
  type: pd-ssd
  replication-type: none
volumeBindingMode: WaitForFirstConsumer
allowVolumeExpansion: true
reclaimPolicy: Delete
---
# Regional PD: synchronous two-zone replication for HA
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: pd-ssd-regional
provisioner: pd.csi.storage.gke.io
parameters:
  type: pd-ssd
  replication-type: regional-pd   # provisions in 2 AZs simultaneously
allowedTopologies:
  - matchLabelExpressions:
      - key: topology.gke.io/zone
        values: [us-central1-a, us-central1-b]
volumeBindingMode: WaitForFirstConsumer
allowVolumeExpansion: true
Azure Disk CSI Driver Parameters
Driver: disk.csi.azure.com
| Parameter | Values | Description |
|---|---|---|
| skuName | Standard_LRS, Premium_LRS, StandardSSD_LRS, UltraSSD_LRS, Premium_ZRS, StandardSSD_ZRS | Disk SKU |
| cachingMode | None, ReadOnly, ReadWrite | Host caching mode |
| kind | managed | Always use managed (unmanaged is deprecated) |
| diskEncryptionSetID | Resource ID | Disk Encryption Set for CMK |
| enableBursting | "true" | Enable on-demand bursting for Premium SSDs |
| networkAccessPolicy | AllowAll, DenyAll, AllowPrivate | Control disk access for security |
| diskAccessID | Resource ID | Disk Access resource for private endpoint |
| tags | key1=value1,key2=value2 | Azure resource tags on disk |
# Premium SSD with encryption and bursting
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: premium-ssd-encrypted
provisioner: disk.csi.azure.com
parameters:
  skuName: Premium_LRS
  cachingMode: ReadOnly          # appropriate for most DB data volumes
  kind: managed
  diskEncryptionSetID: /subscriptions/xxx/resourceGroups/rg/providers/Microsoft.Compute/diskEncryptionSets/my-des
  enableBursting: "true"
  networkAccessPolicy: DenyAll   # disallow direct disk access outside the cluster
volumeBindingMode: WaitForFirstConsumer
allowVolumeExpansion: true
reclaimPolicy: Retain
---
# Premium SSD with zone-redundant storage (ZRS); Premium SSD v2 is a different SKU, PremiumV2_LRS
parameters:
  skuName: Premium_ZRS   # synchronous zone-redundant storage (3 AZ copies)
  # NOTE: Premium_ZRS does not support cachingMode; it must be None
UltraSSD for demanding databases
parameters:
  skuName: UltraSSD_LRS
  # Must enable Ultra SSD on the node pool:
  #   az aks nodepool update --enable-ultra-ssd
  diskIOPSReadWrite: "160000"   # provisioned IOPS
  diskMBpsReadWrite: "2000"     # provisioned throughput MiB/s
Azure Files CSI Driver Parameters
Driver: file.csi.azure.com — supports RWX (ReadWriteMany)
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: azure-files-nfs
provisioner: file.csi.azure.com
parameters:
  protocol: nfs          # nfs | smb; NFS recommended for Linux workloads
  skuName: Premium_LRS   # Standard_LRS, Premium_LRS, Standard_GRS, etc.
  # storageAccount: mystoraccount   # optional: use specific storage account
  # resourceGroup: my-rg            # optional: storage account resource group
  # subscriptionID: xxx             # optional: cross-subscription
mountOptions:
  - nconnect=4   # NFS: parallel connections per mount (improves throughput)
  - actimeo=30   # NFS: attribute cache timeout
volumeBindingMode: Immediate   # Azure Files is topology-agnostic (global file system)
allowVolumeExpansion: true
reclaimPolicy: Delete
AWS EFS CSI Driver Parameters
Driver: efs.csi.aws.com — RWX, serverless NFS, auto-scaling
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: efs-sc
provisioner: efs.csi.aws.com
parameters:
  provisioningMode: efs-ap         # efs-ap = dynamic access point provisioning
  fileSystemId: fs-0abc123def456   # existing EFS filesystem ID
  directoryPerms: "700"            # permissions for the access point root directory
  basePath: /dynamic               # base path on the EFS filesystem for access points
  # gidRangeStart: "1000"          # optional: GID range for access point
  # gidRangeEnd: "2000"
  # ensureUniqueDirectory: "true"  # prefix dirName with PVC UID to guarantee uniqueness
volumeBindingMode: Immediate       # EFS is region-wide (not zonal)
allowVolumeExpansion: true         # EFS auto-expands; this enables resize status tracking
The efs-ap mode creates an EFS Access Point for each PVC. Each access point gets its own root directory on the EFS filesystem with isolated ownership and permissions. This enables secure multi-tenant NFS on a shared EFS filesystem: different namespaces get different access points with no cross-tenant visibility.
Ceph RBD CSI Driver Parameters
Driver: rbd.csi.ceph.com — block storage backed by Ceph (via Rook-Ceph or external Ceph cluster)
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: rook-ceph-block
provisioner: rbd.csi.ceph.com
parameters:
  clusterID: <ceph-cluster-id>   # from `ceph fsid`
  pool: replicapool              # Ceph RBD pool name
  imageFormat: "2"               # always use format 2 (format 1 is deprecated)
  imageFeatures: layering        # comma-separated RBD features
  # imageFeatures: layering,fast-diff,object-map,deep-flatten,exclusive-lock
  csi.storage.k8s.io/provisioner-secret-name: rook-csi-rbd-provisioner
  csi.storage.k8s.io/provisioner-secret-namespace: rook-ceph
  csi.storage.k8s.io/controller-expand-secret-name: rook-csi-rbd-provisioner
  csi.storage.k8s.io/controller-expand-secret-namespace: rook-ceph
  csi.storage.k8s.io/node-stage-secret-name: rook-csi-rbd-node
  csi.storage.k8s.io/node-stage-secret-namespace: rook-ceph
  csi.storage.k8s.io/fstype: ext4
reclaimPolicy: Delete
allowVolumeExpansion: true
volumeBindingMode: Immediate
mountOptions:
  - discard
CephFS CSI Driver Parameters
Driver: cephfs.csi.ceph.com — POSIX filesystem, RWX support
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: rook-cephfs
provisioner: cephfs.csi.ceph.com
parameters:
  clusterID: <ceph-cluster-id>
  fsName: myfs                             # CephFS filesystem name
  pool: myfs-replicated                    # metadata or data pool
  mounter: kernel                          # kernel | fuse; kernel is faster; fuse supports more features
  kernelMountOptions: ms_mode=prefer-crc   # kernel client mount options
  csi.storage.k8s.io/provisioner-secret-name: rook-csi-cephfs-provisioner
  csi.storage.k8s.io/provisioner-secret-namespace: rook-ceph
  csi.storage.k8s.io/node-stage-secret-name: rook-csi-cephfs-node
  csi.storage.k8s.io/node-stage-secret-namespace: rook-ceph
reclaimPolicy: Delete
allowVolumeExpansion: true
volumeBindingMode: Immediate
NFS Subdir External Provisioner
Dynamically provisions subdirectories on an existing NFS server as PVs. Not a CSI driver — uses the older external provisioner interface.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: nfs-client
provisioner: cluster.local/nfs-subdir-external-provisioner
parameters:
  server: nfs-server.prod.svc.cluster.local
  path: /exports
  pathPattern: "${.PVC.namespace}/${.PVC.annotations.nfs.io/storage-path}"   # dynamic path pattern
  onDelete: retain          # retain | delete; when set, archiveOnDelete is ignored
  archiveOnDelete: "true"   # consulted only if onDelete is unset: "true" renames the dir with an archived- prefix, "false" deletes it
reclaimPolicy: Delete
volumeBindingMode: Immediate
Multi-Tier StorageClass Design
Production clusters should define multiple StorageClasses covering different performance and cost tiers. Namespace teams select the appropriate tier via PVC storageClassName, and platform teams enforce limits via ResourceQuota.
- Fast: io2 Block Express / pd-extreme / UltraSSD. For production databases requiring <1ms latency and high IOPS. Most expensive. Protected by ResourceQuota.
- Standard: gp3 / pd-ssd / Premium_LRS. Default for most workloads. Good baseline performance at reasonable cost. Should be the cluster default SC.
- Slow: st1 / pd-standard / Standard_LRS. For batch jobs, backups, cold data, or development. Lowest cost. Not suitable for latency-sensitive workloads.
- Shared: EFS / CephFS / Azure Files NFS. ReadWriteMany. For shared config, ML datasets, multi-pod media access. Higher latency than block.
# ResourceQuota to restrict fast storage to production namespace only
apiVersion: v1
kind: ResourceQuota
metadata:
  name: storage-quota
  namespace: development
spec:
  hard:
    requests.storage: 100Gi                                    # total PVC storage in namespace
    fast.storageclass.storage.k8s.io/requests.storage: "0"     # no fast SC in dev
    fast.storageclass.storage.k8s.io/persistentvolumeclaims: "0"
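The quota above shuts the fast class out of development entirely. As a contrasting sketch, a production namespace could admit the fast class but cap it. The class name fast matches the quota keys above; the 10Ti, 2Ti, and 10-claim limits are made-up numbers for illustration only.

```yaml
# Sketch: production namespace may use the "fast" class, within a cap.
apiVersion: v1
kind: ResourceQuota
metadata:
  name: storage-quota
  namespace: production
spec:
  hard:
    requests.storage: 10Ti                                          # illustrative total across all classes
    fast.storageclass.storage.k8s.io/requests.storage: 2Ti          # illustrative cap on fast-tier capacity
    fast.storageclass.storage.k8s.io/persistentvolumeclaims: "10"   # illustrative cap on fast-tier claims
```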
The quota key <storageClassName>.storageclass.storage.k8s.io/requests.storage limits PVC capacity per StorageClass per namespace. This lets you offer all tiers cluster-wide but control which namespaces can access expensive classes.
Default StorageClass Management
Exactly one StorageClass should be marked default at any time. PVCs without a storageClassName field (not "", but truly absent) use the default.
# List all StorageClasses and their default status
kubectl get storageclass
# NAME            PROVISIONER             RECLAIMPOLICY   VOLUMEBINDINGMODE      ALLOWVOLUMEEXPANSION   AGE
# gp2             kubernetes.io/aws-ebs   Delete          Immediate              false                  2y
# gp3 (default)   ebs.csi.aws.com         Delete          WaitForFirstConsumer   true                   6mo
# Swap the default: clear the annotation on gp2, then set it on gp3
# (two separate API calls, so the swap is not atomic)
kubectl patch storageclass gp2 \
  -p '{"metadata":{"annotations":{"storageclass.kubernetes.io/is-default-class":"false"}}}'
kubectl patch storageclass gp3 \
  -p '{"metadata":{"annotations":{"storageclass.kubernetes.io/is-default-class":"true"}}}'
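The distinction between an absent storageClassName and an empty string is easy to miss. A brief sketch with two otherwise identical claims follows; the PVC names are illustrative.

```yaml
# Uses the default StorageClass: storageClassName is not set at all.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: uses-default        # hypothetical name
spec:
  accessModes: [ReadWriteOnce]
  resources:
    requests:
      storage: 10Gi
---
# Opts out of the default: "" never resolves to the default class,
# so this claim can only bind to a pre-created PV that has no class.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: no-class            # hypothetical name
spec:
  storageClassName: ""
  accessModes: [ReadWriteOnce]
  resources:
    requests:
      storage: 10Gi
```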
StorageClass Migration
StorageClass parameters are immutable. To move PVCs to a new StorageClass (e.g., upgrading from gp2 to gp3, or adding encryption), you must create new PVCs and migrate data. Three approaches:
Approach 1: PVC Clone (Online, Same StorageClass Driver)
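Works if the source and destination StorageClasses share the same CSI driver but use different parameters (e.g., gp2 → gp3, both served by ebs.csi.aws.com), and only if that driver implements CSI volume cloning. The source PVC and the clone must be in the same namespace. Uses dataSource PVC cloning: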
# 1. Create new PVC as a clone with the new StorageClass
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data-postgres-0-gp3
  namespace: production
spec:
  dataSource:
    kind: PersistentVolumeClaim
    name: data-postgres-0          # source PVC (must be Bound)
  storageClassName: gp3-encrypted  # destination StorageClass
  accessModes: [ReadWriteOnce]
  resources:
    requests:
      storage: 100Gi
# 2. Verify clone is Bound and data is present
kubectl exec -it postgres-0 -- psql -c "SELECT count(*) FROM orders;"   # note count
# 3. Update StatefulSet to use new PVC name (or swap via volumeClaimTemplate rename)
#    This requires careful StatefulSet management; see 06-stateful-storage-patterns.html
Approach 2: Snapshot-Based Migration (New StorageClass, Same Driver)
# 1. Take snapshot of source PVC
kubectl apply -f - <
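A snapshot-based migration usually comes down to two objects: a VolumeSnapshot of the source PVC and a new PVC that restores from it. The sketch below is an illustration under assumptions, not the page's original manifest. It reuses data-postgres-0, gp3-encrypted, and the csi-aws-vsc VolumeSnapshotClass named elsewhere on this page, while the snapshot and restored-PVC names are hypothetical. The snapshot and the restored volume must be handled by the same CSI driver.

```yaml
# Sketch: snapshot the source PVC, then restore into a PVC on the new StorageClass.
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: data-postgres-0-snap          # hypothetical snapshot name
  namespace: production
spec:
  volumeSnapshotClassName: csi-aws-vsc
  source:
    persistentVolumeClaimName: data-postgres-0
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data-postgres-0-restored      # hypothetical PVC name
  namespace: production
spec:
  storageClassName: gp3-encrypted     # destination StorageClass
  dataSource:
    apiGroup: snapshot.storage.k8s.io
    kind: VolumeSnapshot
    name: data-postgres-0-snap
  accessModes: [ReadWriteOnce]
  resources:
    requests:
      storage: 100Gi                  # must be at least the snapshot's restoreSize
```

Wait for the snapshot's status.readyToUse to report true before relying on the restored PVC.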
Approach 3: rsync / Velero (Cross-Cluster or Different CSI)
When cloning and snapshot restore are not available (different cloud providers, on-prem to cloud migration), use a data mover:
# Velero backup and restore
velero backup create prod-db-backup \
  --include-namespaces production \
  --snapshot-volumes \
  --storage-location default
# Restore into new cluster (these flags remap the namespace only;
# StorageClass remapping is configured separately in Velero)
velero restore create --from-backup prod-db-backup \
  --namespace-mappings production:production-new \
  --existing-resource-policy update
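The restore command above maps namespaces but does not itself name a target StorageClass. Velero documents a ConfigMap-based mapping for that purpose. A minimal sketch follows, assuming Velero runs in the velero namespace and that PVCs backed up on gp2 should land on gp3-encrypted:

```yaml
# Sketch: Velero restore-time StorageClass mapping.
apiVersion: v1
kind: ConfigMap
metadata:
  name: change-storage-class-config
  namespace: velero                   # assumption: Velero's install namespace
  labels:
    velero.io/plugin-config: ""
    velero.io/change-storage-class: RestoreItemAction
data:
  # <old StorageClass name>: <new StorageClass name>
  gp2: gp3-encrypted                  # assumption: source and target class names
```

The ConfigMap needs to exist in the destination cluster before the restore is created.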
StorageClass and VolumeSnapshotClass Alignment
A VolumeSnapshot must be created with a VolumeSnapshotClass that uses the same CSI driver as the PVC's StorageClass. Mismatched drivers cause snapshot creation to fail:
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshotClass
metadata:
  name: csi-aws-vsc
  annotations:
    snapshot.storage.kubernetes.io/is-default-class: "true"
driver: ebs.csi.aws.com   # MUST match the StorageClass provisioner
deletionPolicy: Delete    # Delete | Retain
parameters:
  # driver-specific snapshot parameters (e.g., tags)
  tagSpecification_1: "environment=production"
Alerting Rules
groups:
  - name: storageclass
    rules:
      - alert: PVCProvisioningFailed
        expr: |
          increase(storage_operation_errors_total{operation_name="provision"}[5m]) > 0
        labels: {severity: warning}
        annotations:
          summary: "CSI provisioning errors: check CSI controller logs"
      - alert: StorageClassMissing
        # The phase metric has no storageclass label, so join it in from kube_persistentvolumeclaim_info
        expr: |
          (kube_persistentvolumeclaim_status_phase{phase="Pending"} == 1)
            * on(namespace, persistentvolumeclaim) group_left(storageclass)
              kube_persistentvolumeclaim_info{storageclass!=""}
          unless on(storageclass)
            kube_storageclass_info
        for: 5m
        labels: {severity: warning}
        annotations:
          summary: "PVC references a StorageClass that doesn't exist"
      - alert: DefaultStorageClassMissing
        expr: |
          count(kube_storageclass_info{is_default_class="true"}) == 0
        for: 2m
        labels: {severity: warning}
        annotations:
          summary: "No default StorageClass defined: PVCs without an explicit SC will stay Pending"
      - alert: MultipleDefaultStorageClasses
        expr: |
          count(kube_storageclass_info{is_default_class="true"}) > 1
        for: 1m
        labels: {severity: critical}
        annotations:
          summary: "Multiple default StorageClasses: older Kubernetes versions reject PVCs without an SC name, newer ones pick the most recently created default"
Troubleshooting Runbooks
Runbook: PVC Stuck Pending — Wrong StorageClass
# Check what StorageClass the PVC references
kubectl get pvc <name> -n <ns> -o jsonpath='{.spec.storageClassName}'
# Verify the StorageClass exists
kubectl get storageclass <name>
# If NotFound: check for typo or missing cluster setup
# Verify the provisioner is running
kubectl get pods -n kube-system | grep csi
# Check provisioner logs
kubectl logs -n kube-system -l app=ebs-csi-controller -c csi-provisioner --tail=50
Runbook: Volume Mount Fails — Bad mountOptions
# Pod stuck in ContainerCreating with "failed to mount" error
kubectl describe pod <name> -n <ns>
# Event: MountVolume.MountDevice failed: ... exit status 32
# OR: unrecognized option 'noatime' (driver doesn't pass options)
# Check kubelet logs on the node
kubectl get node -o wide   # find node
ssh <node-ip> -- journalctl -u kubelet | grep -i "mount failed" | tail -20
# Common causes:
# 1. Option not supported by filesystem type (e.g., nfs option on ext4)
# 2. Typo in mount option name
# 3. Driver does not honor mountOptions (check the driver's documentation;
#    the CSIDriver object has no field that declares mountOptions support)
kubectl get csidriver ebs.csi.aws.com -o yaml   # confirm the driver is registered and review its spec
Runbook: Migration — PVC Clone Stuck in Pending
# PVC clone is Pending; source PVC is Bound
kubectl describe pvc data-postgres-0-gp3 -n production
# Event: "waiting for a volume to be created..."
# Common causes:
# 1. Source and destination StorageClass use different provisioners
#    Clone requires same driver
kubectl get storageclass gp2 -o jsonpath='{.provisioner}'
kubectl get storageclass gp3 -o jsonpath='{.provisioner}'
# If different → use a data mover such as rsync or Velero (Approach 3)
# 2. Source PVC is not in Bound state at clone time
kubectl get pvc data-postgres-0 -n production -o jsonpath='{.status.phase}'
# 3. Destination StorageClass uses WaitForFirstConsumer: the clone stays Pending
#    until a pod that mounts it is scheduled
Runbook: EBS Volume Created in Wrong AZ (Immediate Binding)
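# Pod stuck: "no nodes available to schedule pods"
# PVC Events: "successfully provisioned volume" in wrong zone
kubectl describe pv <pv-name> | grep "topology.ebs.csi.aws.com/zone"   # the zone is recorded in the PV's node affinity
# Fix for existing stuck PVC:
# 1. Note the current data (snapshot if needed)
# 2. Delete the PVC (will delete the EBS volume if reclaimPolicy:Delete)
# 3. Recreate the StorageClass with WaitForFirstConsumer
#    (volumeBindingMode is immutable, so kubectl patch is rejected)
kubectl get storageclass <name> -o yaml > sc.yaml
#    Edit sc.yaml: set volumeBindingMode: WaitForFirstConsumer and remove
#    metadata.uid, metadata.resourceVersion, and metadata.creationTimestamp
kubectl delete storageclass <name>
kubectl create -f sc.yaml
# 4. Recreate the PVC
# Prevention: always use WaitForFirstConsumer for EBS/Azure Disk/GCE PD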
Runbook: discard Mount Option Causing Slow DB Writes
# Database (PostgreSQL VACUUM, MySQL purge) is very slow
# Suspect: discard mount option on StorageClass
# Check if discard is in mountOptions
kubectl get storageclass <name> -o jsonpath='{.mountOptions}'
# If discard is enabled and causing issues:
# 1. Create new StorageClass without discard
# 2. Migrate PVCs to new SC
# 3. Or: disable at filesystem level on existing volumes
kubectl exec -it <pod> -- tune2fs -o ^discard /dev/xvda
# Use periodic fstrim instead:
kubectl exec -it <pod> -- fstrim -v /var/lib/postgresql/data
Best Practices
- Use WaitForFirstConsumer for all zonal block drivers (EBS, Azure Disk, GCE PD). This is the single most common StorageClass misconfiguration that causes stuck pods in multi-AZ clusters.
- Set reclaimPolicy: Retain for production database StorageClasses. Even if you forget to patch a specific PV, the StorageClass policy applies at provisioning time.
- Enable allowVolumeExpansion: true from the start. It costs nothing to enable. Unlike parameters, this field can be patched onto an existing StorageClass later, but setting it up front avoids discovering the gap in the middle of a capacity incident.
- Define named tiers (fast/standard/slow/shared) rather than a single default. Teams that never think about storage will use the default. Make the default the right choice for most workloads (gp3, pd-ssd, Premium_LRS) — not the cheapest or most expensive.
- Enforce StorageClass access via ResourceQuota. Use per-class quotas to prevent development namespaces from consuming expensive io2/pd-extreme storage.
- Align VolumeSnapshotClass driver with StorageClass provisioner. Mismatched drivers cause silent snapshot failures. Name your VolumeSnapshotClasses to match their StorageClasses for clarity.
- Test mountOptions before production rollout. Invalid options cause mount failures with cryptic errors. Test by manually running the mount command on a node, or by deploying a test PVC and pod first.
- Document StorageClass intent in annotations. Add metadata.annotations describing the intended workload type, cost tier, and any performance characteristics. Cluster users shouldn't need to read StorageClass parameters to understand what a class is for.
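As one way to act on the last point, the sketch below annotates the gp3 class with its intent. The annotation keys use a placeholder domain (storage.example.com) and are hypothetical; Kubernetes does not interpret them, so any consistent naming scheme would serve.

```yaml
# Sketch: self-describing StorageClass. The annotation keys are hypothetical.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: gp3
  annotations:
    storage.example.com/tier: standard
    storage.example.com/intended-workload: "general-purpose services and small databases"
    storage.example.com/performance: "3000 IOPS and 125 MiB/s baseline"
provisioner: ebs.csi.aws.com
parameters:
  type: gp3
  encrypted: "true"
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer
allowVolumeExpansion: true
```

Running kubectl describe storageclass gp3 then surfaces the intent alongside the parameters.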