Storage Capacity — Kubernetes Docs

What This Page Covers

The pre-GA capacity problem: scheduler blindness and Pending pods with ProvisioningFailed PVCs

CSIStorageCapacity API (GA 1.24) — object model, fields, owner references

How external-provisioner publishes CSIStorageCapacity objects

Capacity-aware scheduling flow: the capacity check in the VolumeBinding scheduler plugin

storageCapacity: true on CSIDriver: the required opt-in gate

WaitForFirstConsumer interaction — why Immediate mode defeats capacity tracking

Topology segments and how they map to CSIStorageCapacity objects

Multi-zone capacity planning: zone-scoped vs node-scoped capacity

Node volume attachment limits per cloud provider (AWS/GCE/Azure)

CSINode object — allocatable volume count, driver topology keys

kube-scheduler VolumeBinding plugin internals

Capacity staleness: resync period, cache TTL, worst-case scheduling race

Local storage capacity with TopoLVM operator

ResourceQuota for storage: PVC count, storage size per StorageClass

LimitRange for PVC minimum/maximum sizes

StorageClass ResourceQuota scoping

Capacity monitoring: available vs allocated vs used

5 metrics + 4 alerting rules + 5 runbooks

8 best practices for capacity planning at scale

The Pre-GA Capacity Problem

Without storage capacity tracking (alpha in 1.19, beta in 1.21, GA in 1.24), the scheduler had no knowledge of how much storage capacity was available in any given topology zone. When a pod with a WaitForFirstConsumer PVC was scheduled to a node in zone us-east-1a, the scheduler committed the pod to that zone before the CSI driver attempted to provision the volume. If the provisioner then discovered that us-east-1a had no capacity (quota exhausted, regional limits hit, pool full), the PVC stayed Pending with ProvisioningFailed events. The pod could be handed back to the scheduler for another attempt, but with no capacity information the scheduler was free to pick the same full zone again. As a result, the pod could stay Pending indefinitely.

Without capacity tracking:
  Scheduler ──► Pod           schedule pod to zone us-east-1a (no capacity info)
  Pod ──► PVC                 triggers PVC binding
  PVC ──► CSI Provisioner     CreateVolume RPC
  CSI Provisioner ──► PVC     RESOURCE_EXHAUSTED (zone full): ProvisioningFailed
  Result                      pod stays Pending; a retry can land in the same full zone

With capacity tracking (GA since 1.24), the scheduler reads CSIStorageCapacity objects BEFORE scheduling:
  us-east-1a:     0 Gi available   ← skip this zone
  us-east-1b:   500 Gi available   ← eligible
  us-east-1c:  1200 Gi available   ← eligible
  → Pod scheduled to us-east-1b or us-east-1c immediately

CSIStorageCapacity Object

CSIStorageCapacity is a namespace-scoped API object (in the same namespace as the CSI controller pod) that represents the available storage capacity for a specific StorageClass within a specific topology segment (zone, rack, node). The external-provisioner sidecar creates and updates these objects periodically.

Object Structure

apiVersion: storage.k8s.io/v1
kind: CSIStorageCapacity
metadata:
  name: csi-sc-ebs-gp3-us-east-1b-a1b2c3   # auto-generated name
  namespace: kube-system                      # same ns as CSI controller
  ownerReferences:                            # garbage-collected when the owner (here the ReplicaSet) is deleted
  - apiVersion: apps/v1
    kind: ReplicaSet
    name: ebs-csi-controller-7d9f8b
    uid: a1b2c3d4-...
    controller: true
    blockOwnerDeletion: true
storageClassName: ebs-gp3                    # the StorageClass this applies to
nodeTopology:                                # which topology segment
  matchLabels:
    topology.kubernetes.io/zone: us-east-1b
capacity: "2Ti"                              # available capacity in this segment
maximumVolumeSize: "16Ti"                    # maximum single-volume size allowed

Field	Type	Description
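`storageClassName`	string (required)	Name of the StorageClass this capacity applies to. The scheduler only consults the object when that StorageClass's CSI driver sets `storageCapacity: true` in its CSIDriver object.
`nodeTopology`	LabelSelector (optional)	Nodes that can access this storage, typically `matchLabels` with a zone key. An empty selector (`{}`) means all nodes (e.g., NFS). An unset field means no node can access it.
`capacity`	Quantity (optional)	Available storage. Unset means unknown, and the scheduler treats unknown capacity as insufficient, not as unlimited (unless `maximumVolumeSize` is set).
`maximumVolumeSize`	Quantity (optional)	Largest single volume the driver can create in this segment. When set, the scheduler compares the PVC request against this value instead of `capacity`, and a larger request fails scheduling.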

Listing CSIStorageCapacity Objects

# List all capacity objects across all namespaces
kubectl get csistoragecapacity -A

# Show capacity per zone for a specific storage class
kubectl get csistoragecapacity -A \
  -o custom-columns='NAMESPACE:.metadata.namespace,SC:.storageClassName,TOPOLOGY:.nodeTopology,CAPACITY:.capacity' \
  --sort-by='.storageClassName'

# Example output:
# NAMESPACE     SC          TOPOLOGY                                    CAPACITY
# kube-system   ebs-gp3     map[topology.kubernetes.io/zone:us-east-1a] 5497558138880
# kube-system   ebs-gp3     map[topology.kubernetes.io/zone:us-east-1b] 2199023255552
# kube-system   ebs-gp3     map[topology.kubernetes.io/zone:us-east-1c] 10995116277760

# Watch for capacity updates in real time
kubectl get csistoragecapacity -A -w
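
Capacity values are Kubernetes resource quantities. A script may therefore see plain byte counts (as in the sample output above) or suffixed values such as 2Ti. The minimal Python sketch below normalizes them to bytes before comparison. It assumes its input is the JSON from kubectl get csistoragecapacity -A -o json. `parse_quantity` and `summarize` are hypothetical helper names, and exponent notation (e.g., 1e3) is deliberately not handled.

```python
import json
from decimal import ROUND_CEILING, Decimal

# Binary (power-of-two) and decimal (power-of-ten) quantity suffixes.
# Exponent notation such as "1e3" is not handled in this sketch.
_SUFFIXES = {
    "Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40, "Pi": 2**50, "Ei": 2**60,
    "k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12, "P": 10**15, "E": 10**18,
}


def parse_quantity(value: str) -> int:
    """Convert a quantity such as '2Ti', '500G' or '5497558138880' to bytes."""
    value = value.strip()
    multiplier = 1
    # Longest suffix first, so the match is unambiguous.
    for suffix in sorted(_SUFFIXES, key=len, reverse=True):
        if value.endswith(suffix):
            value, multiplier = value[: -len(suffix)], _SUFFIXES[suffix]
            break
    # Round fractional bytes up, as Kubernetes does for integer values.
    return int((Decimal(value) * multiplier).to_integral_value(rounding=ROUND_CEILING))


def summarize(capacity_list_json: str) -> list[tuple[str, str, int]]:
    """Return (storageClassName, topology, bytes) rows, largest capacity first."""
    rows = []
    for item in json.loads(capacity_list_json)["items"]:
        if item.get("capacity") is None:
            continue  # capacity not reported: nothing to compare
        labels = (item.get("nodeTopology") or {}).get("matchLabels") or {}
        topology = ",".join(f"{k}={v}" for k, v in sorted(labels.items()))
        rows.append((item["storageClassName"], topology, parse_quantity(item["capacity"])))
    return sorted(rows, key=lambda row: row[2], reverse=True)
```

Passing it the JSON output of kubectl lists every segment by free space, with objects that report no capacity left out.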

How external-provisioner Publishes Capacity

The external-provisioner sidecar is responsible for creating and refreshing CSIStorageCapacity objects. It does this by calling the CSI GetCapacity RPC on the controller plugin for each topology segment it knows about, then writing or updating the corresponding object.

external-provisioner capacity publication loop (runs in the CSI controller pod):
  1. List topology segments from CSINode objects
     (topology keys each node plugin reported through NodeGetInfo, with values from the Node labels)
  2. For each (StorageClass, topology segment) pair, call GetCapacity(parameters, topology)
     on the CSI controller plugin (driver process), which
       → queries its storage backend (e.g., free space in an LVM volume group)
       → returns available_capacity and maximum_volume_size
  3. Create/update CSIStorageCapacity objects in the controller pod's namespace
  Repeat every --capacity-poll-interval (default: 1m).
  The owner reference (ReplicaSet by default) lets the garbage collector delete the objects when that owner is deleted.
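
In effect, the loop reconciles the segments the provisioner knows about against the objects that already exist. The Python sketch below models one cycle under stated assumptions. `get_capacity` stands in for the GetCapacity RPC, and plain dicts stand in for CSIStorageCapacity objects. `reconcile` and `CapacityResult` are hypothetical names, not external-provisioner internals.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Sorted (topology key, value) pairs, e.g. (("topology.kubernetes.io/zone", "us-east-1a"),)
Segment = tuple[tuple[str, str], ...]


@dataclass(frozen=True)
class CapacityResult:
    """Hypothetical stand-in for a GetCapacity response (bytes; None = unset)."""
    available: Optional[int]
    maximum_volume_size: Optional[int]


def reconcile(
    storage_classes: dict[str, dict[str, str]],
    segments: set[Segment],
    existing: dict[tuple[str, Segment], dict],
    get_capacity: Callable[[dict[str, str], Segment], CapacityResult],
) -> tuple[list[dict], list[dict], list[tuple[str, Segment]]]:
    """One poll cycle: return (to_create, to_update, to_delete).

    storage_classes maps StorageClass name -> parameters for this driver;
    existing maps (StorageClass name, segment) -> current object.
    """
    to_create, to_update, wanted = [], [], set()
    for sc_name, params in storage_classes.items():
        for segment in segments:
            key = (sc_name, segment)
            wanted.add(key)
            result = get_capacity(params, segment)  # the GetCapacity RPC in a real driver
            desired = {
                "storageClassName": sc_name,
                "nodeTopology": {"matchLabels": dict(segment)},
                "capacity": result.available,
                "maximumVolumeSize": result.maximum_volume_size,
            }
            current = existing.get(key)
            if current is None:
                to_create.append(desired)
            elif current != desired:
                to_update.append(desired)
    # Objects whose StorageClass or segment is gone are deleted.
    to_delete = [key for key in existing if key not in wanted]
    return to_create, to_update, to_delete
```

This sketch writes only when the desired object differs from the stored one, to keep API writes down. That is a choice made here, not a statement about how the sidecar batches its updates.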

Enabling Capacity Tracking on the Provisioner

# In the CSI controller Deployment, external-provisioner container:
containers:
- name: external-provisioner
  image: registry.k8s.io/sig-storage/csi-provisioner:v4.0.0
  args:
  - --csi-address=/var/lib/csi/sockets/pluginproxy/csi.sock
  - --leader-election
  - --feature-gates=Topology=true
  - --enable-capacity                    # enable CSIStorageCapacity publishing
  - --capacity-ownerref-level=2          # owner of published objects: 0=Pod, 1=ReplicaSet (default), 2=Deployment
  - --capacity-poll-interval=1m          # how often to refresh capacity (default 1m)
  env:                                   # capacity publishing needs the pod's own namespace and name
  - name: NAMESPACE
    valueFrom:
      fieldRef:
        fieldPath: metadata.namespace
  - name: POD_NAME
    valueFrom:
      fieldRef:
        fieldPath: metadata.name
  # RBAC needed: create/update/delete csistoragecapacities in the controller namespace

GetCapacity RPC is Optional

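Not all CSI drivers implement GetCapacity, which a driver advertises as the GET_CAPACITY controller capability in ControllerGetCapabilities. Without it, no CSIStorageCapacity objects are published for that driver, so leave storageCapacity unset or false on its CSIDriver object. Once a driver opts in, the scheduler treats a node with no matching object as having no capacity. Every pod that needs a new volume from that driver would then stay Pending. Check the driver's documentation or release notes for GET_CAPACITY support.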

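CSIDriver storageCapacity Gate

Capacity-aware scheduling is opt-in per CSI driver, via the storageCapacity field of the driver's CSIDriver object. Without it, the scheduler ignores CSIStorageCapacity objects for that driver entirely, even if they exist. StorageClass has no storageCapacity field. What the StorageClass contributes is volumeBindingMode: WaitForFirstConsumer.

apiVersion: storage.k8s.io/v1
kind: CSIDriver
metadata:
  name: ebs.csi.aws.com
spec:
  attachRequired: true
  storageCapacity: true                   # opt in; set only if the driver publishes CSIStorageCapacity objects
---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ebs-gp3
provisioner: ebs.csi.aws.com
volumeBindingMode: WaitForFirstConsumer   # REQUIRED for capacity tracking to be useful
parameters:
  type: gp3
  iops: "3000"
  throughput: "125"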

Immediate Binding Mode Defeats Capacity Tracking

With volumeBindingMode: Immediate, PVCs are bound before any pod is scheduled — the scheduler never has the chance to filter nodes based on capacity. The CSIStorageCapacity objects are published but ignored by the scheduling pipeline. Always use WaitForFirstConsumer for topology-aware drivers when capacity tracking matters.

Capacity-Aware Scheduling Flow

When a pod with an unbound WaitForFirstConsumer PVC is scheduled, the VolumeBinding plugin in kube-scheduler filters nodes using CSIStorageCapacity data. Capacity is not used for scoring unless the alpha StorageCapacityScoring feature gate is enabled.

Capacity-aware scheduling (kube-scheduler VolumeBinding plugin):
  Pod created with an unbound PVC requesting 200Gi on SC "ebs-gp3"

  Filter phase, for each candidate node:
    1. Determine the node's topology: zone=us-east-1b
    2. Find CSIStorageCapacity objects where
         storageClassName = "ebs-gp3"
         nodeTopology matches the node's labels
    3. Node passes if any matching object is large enough:
         maximumVolumeSize >= PVC request, if maximumVolumeSize is set
         otherwise capacity >= PVC request
    4. No matching object, or capacity unknown → node filtered out
       (checked only for drivers whose CSIDriver sets storageCapacity: true)

  Nodes in us-east-1a (0 Gi)   → filtered out
  Nodes in us-east-1b (2 Ti)   → pass (200Gi fits)
  Nodes in us-east-1c (10 Ti)  → pass

  Score phase: capacity does not change node scores by default;
    other scoring plugins choose among the passing nodes
  Pod scheduled to a node in, say, us-east-1c
  VolumeBinding plugin annotates the PVC with the selected node
  external-provisioner calls CreateVolume in us-east-1c

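VolumeBinding Plugin Behavior by CSIStorageCapacity State

CSIStorageCapacity State	Scheduler Behavior	Risk
CSIDriver does not set storageCapacity: true	No capacity check; node passes	May fail at provision time
Object matches node, capacity >= PVC size	Node passes filter	Low: capacity confirmed at last poll
Object matches node, capacity < PVC size	Node filtered out	None: avoided bad zone
Object matches node, capacity unset	Node filtered out (unknown counts as insufficient)	Pod stays Pending if no node qualifies
No object for this (SC, topology)	Node filtered out	Pod stays Pending if no node qualifies (e.g., objects not yet published)
maximumVolumeSize set	Compared instead of capacity; node filtered out if < PVC size	capacity is not consulted for that object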
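
The capacity check can be condensed into one function. Below is a Python sketch of the rules as the VolumeBinding plugin applies them, assuming byte-valued integers and matchLabels-only selectors:

- Only drivers opted in through CSIDriver.spec.storageCapacity are checked.
- maximumVolumeSize, when set, is compared instead of capacity.
- An unset nodeTopology reaches no node, while an empty one reaches all nodes.
- A node passes only if some matching object is large enough.

The class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Capacity:
    """Hypothetical stand-in for one CSIStorageCapacity object (sizes in bytes)."""
    storage_class_name: str
    # None = nodeTopology unset: no node can reach this storage.
    # {}   = empty selector: every node can reach it.
    node_topology: Optional[dict[str, str]]
    capacity: Optional[int] = None
    maximum_volume_size: Optional[int] = None


def capacity_sufficient(obj: Capacity, request: int) -> bool:
    # maximumVolumeSize, when set, is compared instead of capacity;
    # an unknown value never counts as enough.
    limit = obj.maximum_volume_size if obj.maximum_volume_size is not None else obj.capacity
    return limit is not None and limit >= request


def node_has_access(obj: Capacity, node_labels: dict[str, str]) -> bool:
    if obj.node_topology is None:
        return False
    return all(node_labels.get(key) == value for key, value in obj.node_topology.items())


def node_passes(driver_opted_in: bool, storage_class: str, request: int,
                node_labels: dict[str, str], objects: list[Capacity]) -> bool:
    if not driver_opted_in:
        return True  # CSIDriver.spec.storageCapacity not true: no capacity check
    return any(
        obj.storage_class_name == storage_class
        and capacity_sufficient(obj, request)
        and node_has_access(obj, node_labels)
        for obj in objects
    )
```

Suppose us-east-1a reports 0 and us-east-1b reports 2Ti. A 200Gi claim then passes only on nodes in us-east-1b. A node in a zone with no object at all fails as soon as the driver is opted in.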

Topology Segments

A topology segment is a set of node labels that define a storage domain. For zone-scoped drivers (AWS EBS, GCE PD, Azure Disk), each availability zone is one segment. For node-scoped drivers (local volumes, TopoLVM), each node is one segment.

# Zone-scoped: one CSIStorageCapacity per zone per StorageClass
nodeTopology:
  matchLabels:
    topology.kubernetes.io/zone: us-east-1a

# Node-scoped: one CSIStorageCapacity per node per StorageClass (e.g., TopoLVM)
nodeTopology:
  matchLabels:
    kubernetes.io/hostname: node1

# Multi-label (rack-aware Ceph):
nodeTopology:
  matchLabels:
    topology.kubernetes.io/zone: us-east-1a
    topology.rook.io/rack: rack2

# Cluster-wide (NFS, CephFS with global pool): empty selector = all nodes
nodeTopology: {}
capacity: "50Ti"
# Leaving nodeTopology unset means the storage is reachable from NO node.

How Topology Keys Are Discovered

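The external-provisioner discovers topology keys from CSINode objects. When node-driver-registrar registers the CSI node plugin with kubelet, kubelet calls NodeGetInfo. It records the driver's node ID, topology keys, and volume limit in the node's CSINode object, and applies the topology values as labels on the Node object.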

# Inspect CSINode for topology keys a driver exposes
kubectl get csinode node1 -o yaml

# Relevant section:
spec:
  drivers:
  - name: ebs.csi.aws.com
    nodeID: i-0abc123def456789
    topologyKeys:
    - topology.kubernetes.io/zone      # driver announces this key
    allocatable:
      count: 25                        # max volumes this node can attach
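
The segments the provisioner polls combine those keys with the matching Node label values. The Python sketch below derives them from kubectl get csinode -o json and kubectl get nodes -o json output. `topology_segments` is a hypothetical helper, and nodes that lack a label for any reported key are skipped.

```python
def topology_segments(csinodes: dict, nodes: dict, driver: str) -> set[tuple[tuple[str, str], ...]]:
    """Distinct topology segments for `driver`, as sorted (key, value) tuples.

    Keys come from each node's CSINode entry for the driver; values come from
    the Node's labels. Nodes lacking a label for any key are skipped.
    """
    labels_by_node = {
        node["metadata"]["name"]: node["metadata"].get("labels") or {}
        for node in nodes["items"]
    }
    segments = set()
    for csinode in csinodes["items"]:
        node_labels = labels_by_node.get(csinode["metadata"]["name"], {})
        for entry in (csinode.get("spec") or {}).get("drivers") or []:
            keys = entry.get("topologyKeys") or []
            if entry["name"] != driver or not keys:
                continue  # other driver, or a driver without topology (e.g. EFS)
            if all(key in node_labels for key in keys):
                segments.add(tuple(sorted((key, node_labels[key]) for key in keys)))
    return segments
```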

Node Volume Attachment Limits

Every cloud provider imposes a hard limit on how many block volumes can be attached to a single VM instance. Kubernetes enforces these limits through the CSINode.spec.drivers[*].allocatable.count field. The NodeVolumeLimits scheduler plugin checks this field against the volumes already used by pods on the node.

Cloud Provider	Default Limit	Instance-Specific Limits	Notes
AWS EBS	25 volumes	Nitro-based: 28 attachment slots shared by EBS volumes, network interfaces and NVMe instance store; older (Xen): 39 incl. root	CSI driver enforces per-node; `--volume-attach-limit` node flag or auto-detect from instance type
GCE PD	16 volumes	Most machine types: up to 128 persistent disks	Shared-core (f1/g1): max 16 always
Azure Disk	16 volumes	DS-series: up to 64; LS-series: up to 64	Ultra Disk counts separately from Standard/Premium
vSphere	59 volumes	Configurable via vCenter	Includes SCSI controller limit (4 controllers × 15 devices)
Local (hostPath/local)	Unlimited (disk-based)	Constrained by physical disks	No attachment limit; capacity is the constraint

Checking Node Volume Limits in the Cluster

# Check allocatable volume count per node per driver
kubectl get csinode -o json | jq -r '
  .items[] |
  .metadata.name as $node |
  .spec.drivers[]? |
  select(.allocatable != null) |
  "\($node)\t\(.name)\t\(.allocatable.count // "unlimited")"
' | column -t

# Expected output:
# node1   ebs.csi.aws.com   25
# node2   ebs.csi.aws.com   25
# node3   ebs.csi.aws.com   25

# Show the attach limit for one driver (compare with the attached count below)
kubectl get csinode -o json | jq -r '
  .items[] | .metadata.name as $node |
  .spec.drivers[]? | select(.name=="ebs.csi.aws.com") |
  "\($node) max=\(.allocatable.count)"
'

# Count currently attached volumes per node
kubectl get volumeattachments -o json | jq -r '
  [.items[] | select(.status.attached==true) | .spec.nodeName] |
  group_by(.) | map({node: .[0], count: length}) | .[]
  | "\(.node): \(.count) attached"
'

Volume Limit Exhaustion Causes Pending Pods

When a node's volume attachment limit is reached, any pod requiring a new EBS/PD/Azure Disk volume cannot be scheduled to that node. The scheduler event reads: 0/10 nodes are available: 10 node(s) exceed max volume count. Mitigate by right-sizing instances, using NVMe local storage for scratch, or horizontally distributing volumes across more nodes.
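
The limit and attachment queries above can be combined into a per-node headroom report. The Python sketch below assumes the parsed JSON of kubectl get csinode -o json and kubectl get volumeattachments -o json. `attachment_headroom` is a hypothetical name. Counting attached VolumeAttachments is only a proxy, because the NodeVolumeLimits plugin counts the volumes referenced by pods on the node.

```python
from collections import Counter


def attachment_headroom(csinodes: dict, attachments: dict, driver: str) -> dict[str, tuple[int, int, int]]:
    """Map node name -> (attached, limit, free) for one CSI driver.

    Nodes whose CSINode entry has no allocatable.count are left out, since
    there is no published limit to compare against.
    """
    attached = Counter(
        va["spec"]["nodeName"]
        for va in attachments["items"]
        if va["spec"].get("attacher") == driver and (va.get("status") or {}).get("attached")
    )
    headroom = {}
    for csinode in csinodes["items"]:
        node = csinode["metadata"]["name"]
        for entry in (csinode.get("spec") or {}).get("drivers") or []:
            limit = (entry.get("allocatable") or {}).get("count")
            if entry["name"] == driver and limit is not None:
                headroom[node] = (attached[node], limit, limit - attached[node])
    return headroom
```

Sorting the result by the free value shows which nodes will be first to produce the "exceed max volume count" event.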

CSINode Object Deep Dive

apiVersion: storage.k8s.io/v1
kind: CSINode
metadata:
  name: ip-10-0-1-100.ec2.internal
  ownerReferences:
  - apiVersion: v1
    kind: Node
    name: ip-10-0-1-100.ec2.internal
    uid: abc123...
spec:
  drivers:
  - name: ebs.csi.aws.com
    nodeID: i-0abc123def456789   # driver's internal ID for this node (EC2 instance ID)
    topologyKeys:
    - topology.kubernetes.io/zone
    allocatable:
      count: 25                  # max EBS volumes attachable; from NodeGetInfo max_volumes_per_node
  - name: efs.csi.aws.com
    nodeID: i-0abc123def456789
    topologyKeys: []             # EFS is regional, no topology key
    # allocatable: not set for NFS-based drivers (no per-node attachment limit)

Capacity Staleness and Race Conditions

CSIStorageCapacity objects are snapshots of available capacity at the time the external-provisioner last polled the driver. Between poll cycles, actual capacity can decrease (other PVCs provisioned outside this cluster, quota changes) or increase (volumes deleted). This means the scheduler may make decisions based on stale data.

Scenario	Effect	Mitigation
Capacity decreases between poll cycles	Scheduler sends pod to zone, provisioner fails — PVC stuck Pending with ProvisioningFailed	Reduce `--capacity-poll-interval`; provisioner retries with exponential backoff
Capacity increases between poll cycles	Pod not scheduled to zone that now has capacity	Wait for the next poll cycle, or shorten `--capacity-poll-interval`
Two clusters sharing same storage backend	Cluster A schedules based on capacity Cluster B is about to consume	Reserve capacity quotas per cluster at storage layer
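Provisioner pod restarts	With `--capacity-ownerref-level=0` (Pod owner), the objects are GC'd with the pod. Until they are republished, no node has a matching object, so new pods needing volumes from that driver stay Pending	Keep the owner at ReplicaSet (default, level 1) or Deployment (`--capacity-ownerref-level=2`) level so objects survive pod restarts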

# Reduce poll interval for environments with rapid capacity changes
args:
- --capacity-poll-interval=30s   # more aggressive polling (increases driver API calls)

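# Check when each capacity object was last written (creationTimestamp does not change on updates)
kubectl get csistoragecapacity -A -o json --show-managed-fields | jq -r '
  .items[] |
  [.metadata.namespace, .metadata.name, (.capacity // "unknown"),
   ([.metadata.managedFields[]?.time] | max // "unknown")] | @tsv
' | column -t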

Local Storage Capacity with TopoLVM

TopoLVM is a CSI driver that provisions LVM logical volumes from local disks (such as NVMe) on each node. When it is deployed with storage capacity tracking (rather than its own scheduler extender), it publishes per-node CSIStorageCapacity objects. This lets the scheduler choose nodes with sufficient local disk space before committing a pod.

TopoLVM capacity flow:
  Node 1: 800 Gi available in LVM VG
  Node 2: 200 Gi available in LVM VG
  Node 3: 1.5 Ti available in LVM VG

  CSIStorageCapacity objects (node-scoped):
    storageClassName: topolvm-provisioner   nodeTopology: {kubernetes.io/hostname: node1}   capacity: 800Gi
    storageClassName: topolvm-provisioner   nodeTopology: {kubernetes.io/hostname: node2}   capacity: 200Gi
    storageClassName: topolvm-provisioner   nodeTopology: {kubernetes.io/hostname: node3}   capacity: 1.5Ti

  Pod requesting 400Gi PVC on topolvm-provisioner:
    → Node 2 filtered out (200Gi < 400Gi)
    → Node 1 passes (800Gi >= 400Gi)
    → Node 3 passes (1.5Ti >= 400Gi)
    → Pod scheduled to node1 or node3; by default node3's extra headroom does not raise its score

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: topolvm-provisioner
provisioner: topolvm.io
volumeBindingMode: WaitForFirstConsumer
# capacity opt-in is storageCapacity: true on the topolvm.io CSIDriver, not on the StorageClass
parameters:
  "csi.storage.k8s.io/fstype": xfs
  # "topolvm.io/device-class": ssd   # target a specific device class (LVM volume group) on the node

ResourceQuota for Storage

Kubernetes ResourceQuota can limit the total storage consumed by PVCs in a namespace, both in aggregate and per StorageClass. This is the primary capacity governance mechanism for multi-tenant clusters.

Basic Storage Quota

apiVersion: v1
kind: ResourceQuota
metadata:
  name: storage-quota
  namespace: team-alpha
spec:
  hard:
    # Total PVC count across all storage classes
    persistentvolumeclaims: "50"

    # Total storage across all storage classes
    requests.storage: "10Ti"

    # Per-StorageClass limits (format: {storageclass}.storageclass.storage.k8s.io/{resource})
    ebs-gp3.storageclass.storage.k8s.io/persistentvolumeclaims: "20"
    ebs-gp3.storageclass.storage.k8s.io/requests.storage: "5Ti"

    ebs-io2.storageclass.storage.k8s.io/persistentvolumeclaims: "5"
    ebs-io2.storageclass.storage.k8s.io/requests.storage: "500Gi"

    # Ephemeral storage requested and limited by pods (not PVCs)
    requests.ephemeral-storage: "100Gi"
    limits.ephemeral-storage: "200Gi"

LimitRange for PVC Sizes

apiVersion: v1
kind: LimitRange
metadata:
  name: storage-limits
  namespace: team-alpha
spec:
  limits:
  - type: PersistentVolumeClaim
    max:
      storage: 1Ti    # no single PVC can request more than 1Ti
    min:
      storage: 1Gi    # no PVC smaller than 1Gi (prevents waste from tiny claims)

Checking Quota Consumption

# See current storage quota usage in a namespace
kubectl describe resourcequota storage-quota -n team-alpha

# Example output:
# Resource                                               Used   Hard
# --------                                               ----   ----
# ebs-gp3.storageclass.storage.k8s.io/requests.storage  3Ti    5Ti
# persistentvolumeclaims                                 18     50
# requests.storage                                       3.2Ti  10Ti

# Across all namespaces: used/hard for storage-related quota entries
kubectl get resourcequota -A -o json | jq -r '
  .items[] |
  .metadata.namespace as $ns |
  .status as $s |
  ($s.hard // {}) | to_entries[] |
  select(.key | contains("storage")) |
  "\($ns)\t\(.key)\t\($s.used[.key] // "0")/\(.value)"
' | column -t
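
jq cannot compare quantity strings such as 3Ti and 5Ti numerically, so turning used/hard into a ratio needs a small parser. The Python sketch below works on kubectl get resourcequota -A -o json output. `to_number` and `quota_pressure` are hypothetical names, the 0.8 default threshold is an arbitrary example, and exponent notation is not handled.

```python
from decimal import Decimal

_SUFFIXES = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40, "Pi": 2**50,
             "k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12, "P": 10**15}


def to_number(value: str) -> Decimal:
    """Parse a quota quantity such as '3Ti', '3.2Ti' or '50' (no exponent forms)."""
    for suffix in sorted(_SUFFIXES, key=len, reverse=True):
        if value.endswith(suffix):
            return Decimal(value[: -len(suffix)]) * _SUFFIXES[suffix]
    return Decimal(value)


def quota_pressure(quota_list: dict, threshold: float = 0.8) -> list[tuple[str, str, float]]:
    """(namespace, resource, used/hard) for storage-related quota entries at or above threshold."""
    rows = []
    for quota in quota_list["items"]:
        status = quota.get("status") or {}
        hard, used = status.get("hard") or {}, status.get("used") or {}
        for key, hard_value in hard.items():
            if "storage" not in key and "persistentvolumeclaims" not in key:
                continue  # e.g. pods, cpu: not storage-related
            limit = to_number(hard_value)
            if limit == 0:
                continue  # a zero quota forbids the resource; no ratio to report
            ratio = float(to_number(used.get(key, "0")) / limit)
            if ratio >= threshold:
                rows.append((quota["metadata"]["namespace"], key, round(ratio, 3)))
    return sorted(rows, key=lambda row: row[2], reverse=True)
```

With the team-alpha figures from the example output (3Ti of 5Ti on ebs-gp3), the ebs-gp3 entry reports 0.6. It therefore appears at a 0.5 threshold but not at the 0.8 default.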

Multi-Zone Capacity Planning

For clusters spanning multiple availability zones with zone-scoped storage (EBS, GCE PD, Azure Disk), each zone has independent storage capacity. Imbalanced workloads or a zone outage can push more demand onto the remaining zones than their storage capacity or quotas can absorb.

Zone Capacity Health Check Script

#!/bin/bash
# Print available capacity per zone per StorageClass
{
  printf 'STORAGECLASS\tTOPOLOGY\tCAPACITY\n'
  kubectl get csistoragecapacity -A -o json | jq -r '
    .items[] |
    [
      .storageClassName,
      (.nodeTopology.matchLabels // {} | to_entries | map("\(.key)=\(.value)") | join(",") | if . == "" then "-" else . end),
      (.capacity // "unknown")
    ] | @tsv
  ' | sort
} | column -t -s $'\t'
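
For the zone-outage case, one rough planning check asks whether the storage allocated in any single zone could be re-created from the free capacity of the other zones. Per-zone free capacity could come from the CSIStorageCapacity objects, and allocated storage from summed PV sizes per zone. The Python sketch below is a hypothetical heuristic, not a Kubernetes feature. It treats capacity as freely divisible across zones, so a single volume larger than any surviving zone's free space would still fail.

```python
def zones_not_survivable(available: dict[str, int], allocated: dict[str, int]) -> list[str]:
    """Zones whose allocated storage exceeds the free capacity of all other zones combined.

    Hypothetical heuristic: assumes a lost zone's volumes are re-created
    elsewhere and that capacity is divisible across zones.
    """
    total_free = sum(available.values())
    at_risk = [
        zone for zone, needed in allocated.items()
        if needed > total_free - available.get(zone, 0)
    ]
    return sorted(at_risk)
```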

Topology-Aware PVC Placement

# Force a PVC to a specific zone using allowedTopologies on StorageClass
# (Useful for StatefulSets with zone-pinned nodes)
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ebs-gp3-us-east-1a
provisioner: ebs.csi.aws.com
volumeBindingMode: WaitForFirstConsumer
parameters:
  type: gp3
allowedTopologies:
- matchLabelExpressions:
  - key: topology.kubernetes.io/zone
    values:
    - us-east-1a    # constrain to single zone for data locality

Zone-Pinned StorageClasses Reduce Availability

Constraining a StorageClass to a single zone means if that zone runs out of capacity or goes down, provisioning will fail. Use zone-specific StorageClasses only for workloads with strict data locality requirements (e.g., a database replica that must co-locate with a specific Kafka broker). For general workloads, let the scheduler distribute across zones.

Capacity Monitoring

Key Metrics

Metric	Source	What to Watch
`kubelet_volume_stats_capacity_bytes`	kubelet	Total filesystem capacity of each mounted PVC (from NodeGetVolumeStats)
`kubelet_volume_stats_used_bytes`	kubelet	Used bytes; ratio used/capacity alerts at 80% and 90%
`kubelet_volume_stats_available_bytes`	kubelet	Remaining bytes; complements used_bytes
`kubelet_volume_stats_inodes_used`	kubelet	Inode exhaustion is independent of byte usage; many small files can exhaust inodes
`kube_persistentvolumeclaim_resource_requests_storage_bytes`	kube-state-metrics	Requested storage per PVC; sum by namespace for allocation accounting

Alerting Rules

groups:
- name: storage-capacity
  rules:
  - alert: PVCDiskUsageWarning
    expr: |
      (
        kubelet_volume_stats_used_bytes
        / kubelet_volume_stats_capacity_bytes
      ) > 0.80
    for: 5m
    labels:
      severity: warning
    annotations:
      summary: "PVC {{ $labels.persistentvolumeclaim }} in {{ $labels.namespace }} is >80% full"
      description: "{{ $value | humanizePercentage }} used. Expand PVC or clean up data."

  - alert: PVCDiskUsageCritical
    expr: |
      (
        kubelet_volume_stats_used_bytes
        / kubelet_volume_stats_capacity_bytes
      ) > 0.90
    for: 2m
    labels:
      severity: critical

  - alert: PVCInodeExhaustion
    expr: |
      (
        kubelet_volume_stats_inodes_used
        / kubelet_volume_stats_inodes
      ) > 0.90
    for: 5m
    annotations:
      summary: "PVC {{ $labels.persistentvolumeclaim }} inode usage >90%"
      description: "Inode exhaustion causes 'no space left on device' even with bytes available."

  - alert: NodeVolumeAttachmentLimit
    expr: |
      # Proxy: count VolumeAttachments per node from kube-state-metrics rather than
      # comparing with CSINode allocatable (adjust the threshold per instance type)
      count by (node) (
        kube_volumeattachment_info{attacher="ebs.csi.aws.com"}
      ) > 20
    for: 1m
    labels:
      severity: warning
    annotations:
      summary: "Node {{ $labels.node }} has >20 EBS volumes attached"

Grafana Dashboard Queries

# Top 10 largest PVCs by requested storage
topk(10,
  kube_persistentvolumeclaim_resource_requests_storage_bytes
)

# PVC fill rate — estimated time to full (hours)
(
  kubelet_volume_stats_available_bytes
  / deriv(kubelet_volume_stats_used_bytes[1h])
) / 3600

# Total provisioned storage per StorageClass across cluster
sum by (storageclass) (
  kube_persistentvolumeclaim_resource_requests_storage_bytes
  * on(persistentvolumeclaim, namespace)
  group_left(storageclass) kube_persistentvolumeclaim_info
)
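
The fill-rate query divides remaining bytes by the slope that deriv() estimates with simple linear regression. The Python sketch below does the same arithmetic and can be used to check a dashboard panel by hand. `hours_to_full` is a hypothetical name, and samples are (unix_seconds, used_bytes) pairs with at least two distinct timestamps. Unlike the PromQL expression, it returns None instead of a negative number when usage is flat or shrinking.

```python
from typing import Optional


def deriv(samples: list[tuple[float, float]]) -> float:
    """Per-second slope of (timestamp, value) samples via least squares,
    the simple linear regression PromQL's deriv() is documented to use."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_v = sum(v for _, v in samples) / n
    cov = sum((t - mean_t) * (v - mean_v) for t, v in samples)
    var = sum((t - mean_t) ** 2 for t, _ in samples)
    return cov / var  # ZeroDivisionError if all timestamps are equal


def hours_to_full(available_bytes: float, used_samples: list[tuple[float, float]]) -> Optional[float]:
    """Projected hours until available_bytes is consumed at the fitted rate."""
    slope = deriv(used_samples)
    if slope <= 0:
        return None  # usage flat or shrinking: no projected fill time
    return available_bytes / slope / 3600
```

For example, a volume gaining 1GiB every 10 minutes with 60GiB left projects to 10 hours.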

Runbooks

Pod Stuck: "exceed max volume count"

Check the node's VolumeAttachment count against CSINode.spec.drivers[*].allocatable.count. If the node is at its limit, cordon it and move pods elsewhere to free attachments, or switch to an instance type with a higher volume limit. For AWS, check the instance type's EBS volume limit in the EC2 documentation; on many Nitro types the attachment slots are shared with network interfaces.

PVC Stuck Pending: ProvisioningFailed

Describe the PVC for events: kubectl describe pvc NAME. If the ProvisioningFailed events point to insufficient capacity in a zone, check kubectl get csistoragecapacity -A for that zone. If the zone is full, expand quota at the storage layer, delete unused PVs, or steer workloads to another zone via allowedTopologies.

PVC >80% Full — Online Expansion

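Edit the PVC (kubectl edit pvc NAME) and increase spec.resources.requests.storage; the StorageClass needs allowVolumeExpansion: true. If the driver supports online expansion, kubelet grows the filesystem while the pod keeps running and the FileSystemResizePending condition clears on its own. With offline-only drivers, the condition stays until the pod is restarted and the volume remounted. Monitor kubelet_volume_stats_capacity_bytes to confirm expansion.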
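
To pick the new size, work back from a target utilization after expansion. The Python sketch below uses a hypothetical 70% target, rounds up to whole GiB, and never goes below the current size, since a PVC can be expanded but not shrunk.

```python
import math

GIB = 2**30


def expansion_target_gib(used_bytes: int, current_gib: int, target_utilization: float = 0.7) -> int:
    """Smallest whole-GiB size at which used/size <= target_utilization.

    Never below the current size, because a PVC can grow but not shrink.
    """
    if not 0 < target_utilization <= 1:
        raise ValueError("target_utilization must be in (0, 1]")
    needed = math.ceil(used_bytes / target_utilization / GIB)
    return max(needed, current_gib)
```

For 85GiB used on a 100Gi claim this gives 122. That value can be applied with kubectl patch pvc NAME -p '{"spec":{"resources":{"requests":{"storage":"122Gi"}}}}'.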

Inode Exhaustion

Confirm: kubectl exec POD -- df -i /mount/path. If inodes are full but bytes are free, the volume holds many small files (logs, caches). Fix: delete small files, or expand the volume. ext4 fixes the bytes-per-inode ratio at mkfs time, so growing the filesystem adds inodes only in proportion to the added space. A denser ratio needs a freshly formatted volume. xfs allocates inodes dynamically from free space, so its inode supply scales with capacity.
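
ext4 sizes its inode tables from a bytes-per-inode ratio chosen when the filesystem is created, so a rough inode budget follows from the volume size. The Python sketch below assumes a ratio of 16384 bytes per inode (a common mke2fs default; the mkfs options a CSI driver passes can change it) and ignores per-block-group rounding. Both function names are hypothetical.

```python
DEFAULT_BYTES_PER_INODE = 16384  # assumed mke2fs inode_ratio; check your mkfs options


def ext4_inode_estimate(size_bytes: int, bytes_per_inode: int = DEFAULT_BYTES_PER_INODE) -> int:
    """Approximate inode count; real mkfs rounds per block group."""
    return size_bytes // bytes_per_inode


def inodes_run_out_first(avg_file_bytes: int, bytes_per_inode: int = DEFAULT_BYTES_PER_INODE) -> bool:
    """Roughly: if files average less than the ratio, inodes run out before bytes."""
    return avg_file_bytes < bytes_per_inode
```

On a 100GiB volume that ratio yields about 6.5 million inodes. A workload whose files average well under 16KiB will exhaust inodes before bytes.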

CSIStorageCapacity Objects Stale / Missing

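Check the provisioner: kubectl logs -n kube-system deploy/ebs-csi-controller -c csi-provisioner | grep -i capacity. If GetCapacity returns Unimplemented, the driver doesn't support it and capacity tracking is unavailable; make sure its CSIDriver does not set storageCapacity: true. If objects are missing after a provisioner restart, wait for the next poll cycle or reduce --capacity-poll-interval. While they are missing, pods that need new volumes from an opted-in driver stay Pending.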

Best Practices

Enable storageCapacity: true on the CSIDriver of each zone- or node-scoped driver that publishes capacity, and pair it with volumeBindingMode: WaitForFirstConsumer on its StorageClasses. This prevents the most common cause of ProvisioningFailed in multi-zone clusters: the scheduler committing to a full zone before the provisioner discovers there's no space.
Verify that your CSI driver implements GetCapacity before opting in: check its ControllerGetCapabilities response (GET_CAPACITY) or its documentation. TopoLVM supports it; do not assume that cloud block-storage drivers such as AWS EBS CSI or GCE PD CSI do.
Set ResourceQuota per namespace per StorageClass — prevent single teams from exhausting shared storage pools. Use both requests.storage (bytes) and persistentvolumeclaims (count) limits. Enforce a LimitRange minimum (e.g., 1Gi) to prevent dozens of trivially small PVCs that waste API objects.
Monitor inode usage alongside byte usage — kubelet_volume_stats_inodes_used / kubelet_volume_stats_inodes. Inode exhaustion produces the same error as byte exhaustion but is invisible to byte-only monitoring. Especially relevant for log directories and package caches.
Use xfs instead of ext4 where inode density matters: xfs allocates inodes dynamically from free space, while ext4 fixes its bytes-per-inode ratio at mkfs time. For volumes holding many small files (log aggregators, CI artifact stores), xfs avoids inode exhaustion surprises.
Plan for node volume attachment headroom: for AWS, budget 20 EBS volumes per node as a safe limit, leaving headroom for the root volume, network interfaces, and instance store NVMe. Choose instance types by their documented volume limit rather than by size alone. Use NVMe local storage for scratch to free attachment slots for persistent data.
Reduce --capacity-poll-interval in rapidly changing environments: the default of 1 minute is acceptable for most clusters. In high-churn test environments where PVCs are created and deleted constantly, shorten it to 30 seconds to keep capacity data fresh. Weigh this against the extra GetCapacity calls and any rate limits on the storage backend's API.
Alert on fill rate, not just utilization — a volume going from 50% to 90% in one hour is more urgent than a volume at 85% stable. Use deriv(kubelet_volume_stats_used_bytes[1h]) to estimate time-to-full and page on <4 hours remaining regardless of current percentage.
