Volumes

Complete reference for every Kubernetes volume type — from ephemeral scratch directories to NFS network shares — with mount mechanics, subPath gotchas, init container patterns, and the full lifecycle of a volume from pod creation to deletion.


What This Page Covers

How volumes differ from container filesystem (COW layer) — mount semantics

Volume mount lifecycle: kubelet NodeStage → NodePublish → container start → container stop → NodeUnpublish → NodeUnstage

emptyDir — disk vs RAM medium, sizeLimit, multi-container sharing, cache patterns

configMap volume — keys as files, defaultMode, items projection, optional flag, update propagation timing

secret volume — tmpfs backing, defaultMode, items projection, immutable secrets, update propagation

downwardAPI volume: fieldRef (name/namespace/uid/labels/annotations; podIP, nodeName, and serviceAccountName are env-var only) vs resourceFieldRef (limits/requests)

projected volume — combining all four sources; serviceAccountToken with audience + expirationSeconds

hostPath — type options (Directory/DirectoryOrCreate/File/FileOrCreate/Socket/CharDevice/BlockDevice); security risks; legitimate use cases (DaemonSets, node-local monitoring)

NFS volume — server/path/readOnly; mount options; connection pooling; no dynamic provisioner

iscsi volume — targetPortal/iqn/lun/fsType; CHAP authentication; multipath

fc (Fibre Channel) — targetWWNs/lun/fsType; prerequisites

cephfs / rbd (in-tree, deprecated path to CSI)

generic ephemeral volumes — inline volumeClaimTemplate; owner reference auto-cleanup

CSI ephemeral volumes — driver volumeLifecycleModes:Ephemeral; Secrets Store CSI Driver pattern

image volumes (alpha 1.31) — OCI image as read-only volume mount

subPath — mounting a single file/directory from a volume; subPathExpr with env vars; limitations with ConfigMap live-reload

volumeMounts fields — mountPath, readOnly, mountPropagation (None/HostToContainer/Bidirectional)

mountPropagation deep dive — None (isolated), HostToContainer (see host mounts), Bidirectional (propagate to host, requires privileged)

Init containers and shared volumes — data population patterns, wait-for patterns

Sidecar containers and shared volumes — log shipping, metrics scraping via shared emptyDir

Volume ownership and fsGroup — fsGroupChangePolicy (Always vs OnRootMismatch); supplementalGroups

Immutable ConfigMaps and Secrets — immutable: true; performance benefit (kubelet stops watching)

Volume size limits — emptyDir sizeLimit enforcement; ephemeral storage limit interaction

Deprecated and removed volume types: gitRepo (deprecated), flocker (removed), glusterfs (removed 1.26), azureFile/azureDisk (removed 1.27 in-tree), awsElasticBlockStore (removed 1.27 in-tree)

5 troubleshooting runbooks: ConfigMap volume not updating, permission denied on mount, emptyDir memory exhaustion, volume stuck after a previous pod, wrong fsGroup ownership

7 best practices

How Volumes Work

A Kubernetes volume is a named storage object declared at the pod level and mounted into one or more containers within that pod. Unlike a container's ephemeral writable layer (which disappears on container restart), a volume's lifecycle is tied to the pod — it persists across container restarts but is cleaned up when the pod is deleted (for ephemeral volumes) or detached (for persistent volumes).

POD
├── spec.volumes[]          ← named volume declarations (pod-scoped)
│   ├── name: config        ← referenced by containers
│   ├── name: data
│   └── name: tmp
│
└── spec.containers[]
    └── container
        └── volumeMounts[]  ← bind declared volumes into container paths
            ├── name: config   mountPath: /etc/app
            ├── name: data     mountPath: /var/data
            └── name: tmp      mountPath: /tmp

Volume lifecycle (kubelet perspective):
  Pod scheduled → NodeStageVolume (format/mount to staging path if block)
               → NodePublishVolume (bind-mount into pod directory)
               → containers start with /proc/mounts showing the volume
  Pod deleted  → containers stop
               → NodeUnpublishVolume (unmount from pod dir)
               → NodeUnstageVolume (unmount from staging dir)
               → volume reclaimed (ephemeral: deleted; persistent: detached)

Volumes are not container-scoped. Two containers in the same pod sharing a volume see exactly the same bytes — writes by one are immediately visible to the other.
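
A minimal sketch of that sharing, assuming a hypothetical pod name and the public busybox image (neither comes from the original text): one container appends to a file on an emptyDir while a second container, mounting the same volume at a different path, tails it.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: shared-volume-demo          # hypothetical name
spec:
  volumes:
  - name: shared
    emptyDir: {}
  containers:
  - name: writer
    image: busybox:1.36
    command: ["sh", "-c", "while true; do date >> /out/log.txt; sleep 5; done"]
    volumeMounts:
    - name: shared
      mountPath: /out               # writer's view of the volume
  - name: reader
    image: busybox:1.36
    # wait for the first write, then follow the file
    command: ["sh", "-c", "until [ -f /in/log.txt ]; do sleep 1; done; tail -f /in/log.txt"]
    volumeMounts:
    - name: shared
      mountPath: /in                # same volume, different mount path
      readOnly: true
```

The two mountPath values differ on purpose: the volume name, not the path, is what ties the mounts together.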

emptyDir

Ephemeral

An empty directory created by kubelet when the pod is assigned to a node. All containers in the pod can read and write to it. Deleted when the pod is removed from the node (not just when a container crashes — the volume survives container restarts).

volumes:
- name: cache
  emptyDir:
    sizeLimit: 512Mi    # optional: enforced via ephemeral storage eviction
    medium: ""          # "" = node disk (default); "Memory" = tmpfs (RAM)

containers:
- name: app
  volumeMounts:
  - name: cache
    mountPath: /var/cache/app

Memory-backed emptyDir

Setting medium: Memory mounts a tmpfs filesystem. Data lives in RAM — extremely fast but counts against the container's memory limit:

volumes:
- name: shared-mem
  emptyDir:
    medium: Memory
    sizeLimit: 256Mi    # prevents this emptyDir from using more than 256Mi of RAM

⚠️

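Memory medium counts against container memory Files written to a tmpfs emptyDir are charged to the memory cgroup of the container that writes them, so they count against that container's memory limit as well as the node's memory. What changed with the SizeMemoryBackedVolumes feature (beta and enabled by default since 1.22) is the size of the mount: sizeLimit is applied to the tmpfs itself, and without one the volume is sized from the pod's memory limit or node allocatable memory rather than the Linux tmpfs default of half the node's RAM. If your container writes large amounts to a Memory emptyDir without a sizeLimit, it can trigger OOMKill.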

Common emptyDir Patterns

Pattern	Description	Example
Build cache	Compiler/tool cache shared across build steps	Maven `~/.m2`, pip cache
Log sharing	App writes logs; sidecar reads and ships to Fluentd/Loki	App → `/var/log` ← Fluent Bit sidecar
Scratch space	Temp files during data processing (avoids container layer writes)	ETL job staging area
IPC socket	App and sidecar communicate via Unix socket on shared emptyDir	Envoy uses `/tmp/agent.sock`
Init → main handoff	Init container writes config/data; main container consumes	git clone → app reads code

configMap Volume

Ephemeral

Mounts ConfigMap keys as files inside the container. The files are updated automatically when the ConfigMap changes (within the kubelet sync period, typically 1–2 minutes).

apiVersion: v1
kind: ConfigMap
metadata:
  name: app-config
data:
  app.yaml: |
    port: 8080
    log_level: info
  nginx.conf: |
    worker_processes auto;
---
volumes:
- name: config
  configMap:
    name: app-config
    defaultMode: 0644       # file permissions (octal)
    items:                   # optional: project only specific keys
    - key: app.yaml
      path: config/app.yaml  # path within the mountPath directory
      mode: 0640             # per-file override
    optional: false          # if true, pod starts even if ConfigMap missing

Without items, every key becomes a file at the root of mountPath. The directory listing mirrors the ConfigMap's data keys exactly.

Update Propagation

Kubelet syncs ConfigMap volume content on a period governed by --sync-frequency (default 1 minute) plus an additional jitter. The update path:

ConfigMap updated in etcd via API server
Kubelet's reflector detects the change (watch event)
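Kubelet writes the new content into a fresh timestamped directory inside the volume
The ..data symlink in the volume is atomically swapped (via rename) to point to the new directory
Application reads the updated files, but only if it re-reads them; file watchers should watch the mounted directory, because the swap replaces the ..data symlink instead of modifying the files in place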

⚠️

subPath blocks ConfigMap live-reload If you use subPath to mount a single key as a specific file path (e.g., /etc/nginx/nginx.conf), the atomic symlink swap never reaches the container: the subPath bind mount is resolved once, at container start, and keeps pointing at the old file. Updates will not be propagated to a running container. Use a full directory mount and configure your app to read from the directory, or restart the pod on ConfigMap change.
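
A sketch of the full-directory alternative, reusing the app-config ConfigMap from earlier in this section. The mount path /etc/app and the image name are illustrative assumptions, and the application is assumed to read /etc/app/app.yaml.

```yaml
# Pod spec fragment: mount the whole ConfigMap as a directory (no subPath)
volumes:
- name: config
  configMap:
    name: app-config
containers:
- name: app
  image: myapp:latest          # hypothetical image
  volumeMounts:
  - name: config
    mountPath: /etc/app        # keys appear as /etc/app/app.yaml, /etc/app/nginx.conf
    readOnly: true
```

Because the mount is a directory, the container sees the ..data symlink swap and picks up new content once the kubelet syncs.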

secret Volume

Ephemeral

Mounts Secret keys as files. Identical to configMap volumes in mechanics, with one important difference: the files are backed by a tmpfs mount on the node, so the data is never written to the node's disk. The default permission mode is 0644, the same as for configMap volumes; lower it to 0400 for credentials.

volumes:
- name: tls-certs
  secret:
    secretName: my-tls-secret
    defaultMode: 0400        # read-only for owner; recommended for credentials
    items:
    - key: tls.crt
      path: server.crt
    - key: tls.key
      path: server.key
    optional: false

Immutable Secrets and ConfigMaps

Setting immutable: true on a Secret or ConfigMap prevents any updates to its data. Kubelet stops watching immutable objects — removing the watch overhead for large clusters with many ConfigMaps/Secrets.

apiVersion: v1
kind: Secret
metadata:
  name: static-tls-creds
immutable: true    # cannot be changed after creation; must delete and recreate
stringData:
  api-key: "supersecret"

💡

Use immutable for versioned configs Bundle config version in the name (app-config-v3) and mark it immutable. Update the Deployment to reference the new name. This gives you a clear history, prevents accidental mutation, and reduces kubelet overhead. Rollback = point Deployment back to the previous name.
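
As an illustration of the versioned-name pattern, the sketch below pairs an immutable ConfigMap with a Deployment that references it by name. The Deployment name, labels, image, and mount path are assumptions made for the example.

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: app-config-v3            # version is part of the name
immutable: true
data:
  app.yaml: |
    port: 8080
    log_level: debug
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: app                      # hypothetical Deployment
spec:
  replicas: 2
  selector:
    matchLabels:
      app: app
  template:
    metadata:
      labels:
        app: app
    spec:
      containers:
      - name: app
        image: myapp:latest      # hypothetical image
        volumeMounts:
        - name: config
          mountPath: /etc/app
          readOnly: true
      volumes:
      - name: config
        configMap:
          name: app-config-v3    # editing this name changes the pod template and starts a rollout
```

Keeping app-config-v2 in the cluster until the rollout is verified is what makes the rollback a one-line change.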

downwardAPI Volume

Ephemeral

Exposes pod and container metadata as files. The same information available via environment variable downward API, but as files — suitable for larger values like labels and annotations that can exceed env var size limits.

volumes:
- name: pod-info
  downwardAPI:
    defaultMode: 0444
    items:
    - path: pod-name
      fieldRef:
        fieldPath: metadata.name
    - path: pod-namespace
      fieldRef:
        fieldPath: metadata.namespace
    - path: pod-uid
      fieldRef:
        fieldPath: metadata.uid
    # status.podIP and spec.nodeName are not valid here: the API accepts
    # them only for env vars (valueFrom.fieldRef), not downwardAPI volumes
    - path: labels
      fieldRef:
        fieldPath: metadata.labels    # all labels as key="value"\n pairs
    - path: annotations
      fieldRef:
        fieldPath: metadata.annotations
    - path: cpu-limit
      resourceFieldRef:
        containerName: app
        resource: limits.cpu
        divisor: "1m"               # express in millicores
    - path: mem-request
      resourceFieldRef:
        containerName: app
        resource: requests.memory
        divisor: "1Mi"

Available fieldRef Fields

fieldPath	Value	Live Updates?
`metadata.name`	Pod name	No
`metadata.namespace`	Pod namespace	No
`metadata.uid`	Pod UID	No
`metadata.labels`	All labels as `key="value"` pairs, one per line	Yes — kubelet updates file on label change
`metadata.annotations`	All annotations, same format	Yes
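`spec.nodeName`	Node the pod is scheduled on	Env var only; not accepted in a downwardAPI volume
`spec.serviceAccountName`	Service account name	Env var only; not accepted in a downwardAPI volume
`status.podIP`	Primary pod IP	Env var only; not accepted in a downwardAPI volume
`status.hostIP`	Node IP	Env var only; not accepted in a downwardAPI volume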
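
Pod fields under spec and status (for example status.podIP, spec.nodeName, and spec.serviceAccountName) can be read through environment variables with the same fieldRef syntax. A minimal sketch, assuming a container named app and a hypothetical image; the variable names are arbitrary:

```yaml
# Container fragment: downward API via environment variables
containers:
- name: app
  image: myapp:latest            # hypothetical image
  env:
  - name: POD_IP
    valueFrom:
      fieldRef:
        fieldPath: status.podIP
  - name: NODE_NAME
    valueFrom:
      fieldRef:
        fieldPath: spec.nodeName
  - name: SERVICE_ACCOUNT
    valueFrom:
      fieldRef:
        fieldPath: spec.serviceAccountName
```

Environment variables are fixed when the container starts, so they suit values that do not change over the pod's lifetime.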

projected Volume

Ephemeral

A projected volume combines multiple sources — configMap, secret, downwardAPI, and serviceAccountToken — into a single directory mount. All sources appear as files at the same mount point.

volumes:
- name: combined
  projected:
    defaultMode: 0444
    sources:
    - configMap:
        name: app-config
        items:
        - key: app.yaml
          path: config/app.yaml
    - secret:
        name: db-creds
        items:
        - key: password
          path: secrets/db-password
          mode: 0400
    - downwardAPI:
        items:
        - path: meta/pod-name
          fieldRef:
            fieldPath: metadata.name
    - serviceAccountToken:
        audience: api            # audience for the token (OIDC aud claim)
        expirationSeconds: 3600  # token rotated automatically by kubelet
        path: token/sa-token

serviceAccountToken in projected Volume

This is the standard mechanism for injecting service account tokens into pods. It replaced the legacy Secret-based token volume; since Kubernetes 1.21 the auto-mounted token at /var/run/secrets/kubernetes.io/serviceaccount/token is itself delivered through a projected volume like the one below. The token is a bound service account token (audience + expiry) and is rotated by kubelet before expiry without restarting the pod. The kubelet fetches a fresh token via the TokenRequest API and atomically replaces the file.

volumes:
- name: kube-api-access
  projected:
    sources:
    - serviceAccountToken:
        expirationSeconds: 3607         # kubelet rotates at 80% of expiry
        path: token
    - configMap:
        name: kube-root-ca.crt          # cluster CA bundle
        items:
        - key: ca.crt
          path: ca.crt
    - downwardAPI:
        items:
        - path: namespace
          fieldRef:
            fieldPath: metadata.namespace

ℹ️

Why projected replaces the legacy token mount The legacy secret-based token mount (auto-created kubernetes.io/service-account-token Secret) has no expiry. The projected serviceAccountToken is audience-bound, tied to the pod, and expires, so a compromised token is valid only until its expirationSeconds elapse (about an hour with the settings above) rather than indefinitely. Use automountServiceAccountToken: false + an explicit projected volume when you need token control.
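
A sketch of that explicit setup, assuming a hypothetical ServiceAccount app-sa and a hypothetical external audience named vault; the pod name, image, and mount path are also illustrative.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: token-demo                        # hypothetical name
spec:
  serviceAccountName: app-sa              # assumed to exist in the namespace
  automountServiceAccountToken: false     # skip the default API-access volume
  containers:
  - name: app
    image: myapp:latest                   # hypothetical image
    volumeMounts:
    - name: vault-token
      mountPath: /var/run/secrets/tokens
      readOnly: true
  volumes:
  - name: vault-token
    projected:
      sources:
      - serviceAccountToken:
          audience: vault                 # hypothetical audience
          expirationSeconds: 600          # the API requires at least 600 seconds
          path: vault-token
```

With automounting disabled the pod has no credentials for the Kubernetes API itself, which is the point when the token is only meant for an external system.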

hostPath

Ephemeral

Mounts a file or directory from the host node's filesystem directly into the container. Powerful but dangerous — misuse allows container escape to the node.

volumes:
- name: docker-sock
  hostPath:
    path: /var/run/docker.sock
    type: Socket

- name: host-logs
  hostPath:
    path: /var/log/pods
    type: Directory

- name: created-dir
  hostPath:
    path: /mnt/fast-ssd/myapp
    type: DirectoryOrCreate   # creates the directory if it doesn't exist

hostPath Type Options

type	Behavior
`""` (empty)	No check — mount whatever exists (or nothing) at the path
`DirectoryOrCreate`	Create directory with 0755 if it doesn't exist; fail if path is a file
`Directory`	Path must exist and be a directory; fail otherwise
`FileOrCreate`	Create empty file with 0644 if it doesn't exist; fail if path is a directory
`File`	Path must exist and be a regular file
`Socket`	Path must exist and be a Unix socket
`CharDevice`	Path must be a character device
`BlockDevice`	Path must be a block device

🔴

hostPath security risks
• Mounting /, /etc, or /var/run/docker.sock gives the container full node access.
• hostPath volumes bypass Kubernetes storage quotas and LimitRanges entirely.
• Pod scheduled to a different node sees a different (possibly missing) path — workloads are not portable.
• Restrict with Pod Security admission (the Baseline and Restricted profiles both forbid hostPath volumes) or OPA/Kyverno policies.
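
One way to apply the Pod Security admission restriction is through namespace labels. The sketch below assumes a hypothetical namespace name; the baseline level, like restricted, forbids hostPath volumes.

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: team-apps                                   # hypothetical namespace
  labels:
    pod-security.kubernetes.io/enforce: baseline    # reject pods that declare hostPath volumes
    pod-security.kubernetes.io/warn: restricted     # warn about anything the restricted level would block
```

DaemonSets that genuinely need hostPath would then live in a separate namespace with a more permissive level.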

Legitimate hostPath Use Cases

DaemonSets collecting node-level metrics (/proc, /sys, /var/log)
Container runtime socket for container management tools (Falco, Portainer, node-level CRI tools)
Node-local storage benchmarking tools
CNI plugin configuration on /etc/cni/net.d
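
For the node-level log collection case, a hedged sketch of a DaemonSet that mounts a host directory read-only. The DaemonSet name, namespace, and labels are assumptions, and the agent's own configuration is omitted.

```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: node-log-reader              # hypothetical name
  namespace: monitoring              # assumed namespace
spec:
  selector:
    matchLabels:
      app: node-log-reader
  template:
    metadata:
      labels:
        app: node-log-reader
    spec:
      containers:
      - name: reader
        image: fluent/fluent-bit:latest
        volumeMounts:
        - name: host-logs
          mountPath: /var/log/pods
          readOnly: true             # read-only limits what a compromised agent can do
      volumes:
      - name: host-logs
        hostPath:
          path: /var/log/pods
          type: Directory            # fail fast if the path is missing on a node
```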

NFS Volume

Network

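Mounts an NFS export directly into the pod. No CSI driver is required: the kubelet uses the node's kernel NFS client, although each node still needs the NFS mount helper installed (typically the nfs-common or nfs-utils package). Supports RWX (ReadWriteMany) natively.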

volumes:
- name: nfs-share
  nfs:
    server: nfs-server.prod.svc.cluster.local   # NFS server hostname or IP
    path: /exports/shared-data                  # exported path on server
    readOnly: false

ℹ️

NFS mount options The in-tree nfs volume type does not expose mount options. For production NFS with custom options (nfsvers=4.1,rsize=1048576,hard,timeo=600), use a StorageClass backed by the NFS Subdir External Provisioner or a CSI NFS driver, which pass mount options through mountOptions on the StorageClass.
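
A statically provisioned PersistentVolume is another place where mount options can be set, through its spec.mountOptions field. The sketch below assumes a hypothetical server name and object names; the capacity value is a label for matching claims rather than a quota enforced by NFS.

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: nfs-shared-data               # hypothetical name
spec:
  capacity:
    storage: 100Gi
  accessModes: [ReadWriteMany]
  persistentVolumeReclaimPolicy: Retain
  storageClassName: ""                # static binding, no provisioner involved
  mountOptions:
  - nfsvers=4.1
  - hard
  - timeo=600
  nfs:
    server: nfs-server.example.com    # hypothetical server
    path: /exports/shared-data
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: nfs-shared-data
spec:
  accessModes: [ReadWriteMany]
  storageClassName: ""
  volumeName: nfs-shared-data         # bind to the PV above by name
  resources:
    requests:
      storage: 100Gi
```

Pods then reference the claim with persistentVolumeClaim.claimName instead of an inline nfs volume.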

NFS Production Considerations

UID/GID mapping: NFS uses UID/GID for file ownership. If the container runs as UID 1000 but the NFS export expects UID 2000, permission errors occur. Use fsGroup or configure NFS all_squash / anonuid.
NFSv4 vs v3: NFSv4 has stateful connections and simpler firewall rules (single port 2049). NFSv3 uses multiple ports. Default is v4 on modern systems.
Hard vs soft mounts: Hard mounts (default) retry indefinitely on server failure — pods hang but don't corrupt. Soft mounts return EIO on timeout — pods crash but recover quickly.
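Connection from pods: The in-tree NFS volume connects from the node, not the pod. Firewall rules must allow node IPs to reach the NFS server on port 2049, and the server name is resolved by the node's resolver, so a cluster-internal Service name works only if the nodes themselves can resolve cluster DNS.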

iSCSI Volume

Network

volumes:
- name: iscsi-vol
  iscsi:
    targetPortal: 192.168.1.100:3260   # iSCSI target IP:port
    iqn: iqn.2023-01.com.example:storage.target.1   # iSCSI Qualified Name
    lun: 0                              # LUN number
    fsType: ext4
    readOnly: false
    chapAuthDiscovery: true             # enable CHAP for discovery
    chapAuthSession: true               # enable CHAP per session
    secretRef:
      name: chap-secret                 # Secret of type kubernetes.io/iscsi-chap holding the CHAP credentials

iSCSI volumes can be mounted read-write by only one node at a time (ReadWriteOnce); several nodes may mount the same LUN simultaneously only if all of them mount it read-only. Use for legacy SAN storage integration. For new deployments, prefer a CSI driver (e.g., iSCSI-based OpenEBS) which handles node failure, topology awareness, and monitoring.
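
A sketch of the CHAP Secret referenced by secretRef. The key names follow the open-iscsi setting names used in the upstream Kubernetes iSCSI example; all values are placeholders.

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: chap-secret
type: kubernetes.io/iscsi-chap
stringData:
  # read when chapAuthDiscovery: true
  discovery.sendtargets.auth.username: discovery-user       # placeholder
  discovery.sendtargets.auth.password: discovery-password   # placeholder
  # read when chapAuthSession: true
  node.session.auth.username: session-user                  # placeholder
  node.session.auth.password: session-password              # placeholder
```

The Secret must live in the same namespace as the pod that mounts the iscsi volume.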

Generic Ephemeral Volumes

Ephemeral

A generic ephemeral volume embeds a PVC template directly in the pod spec. Kubernetes creates the PVC when the pod is created and garbage-collects it when the pod is deleted (via owner reference). This enables ephemeral use of any StorageClass — including cloud SSDs — without pre-creating PVCs.

volumes:
- name: scratch
  ephemeral:
    volumeClaimTemplate:
      metadata:
        labels:
          type: scratch-volume
      spec:
        accessModes: [ReadWriteOnce]
        storageClassName: gp3-encrypted
        resources:
          requests:
            storage: 50Gi

The created PVC is named <pod-name>-<volume-name> (e.g., my-pod-scratch). It has an owner reference to the pod, so it is deleted automatically when the pod is deleted. If the pod is part of a ReplicaSet/Deployment, each replica gets its own PVC.

⚠️

Scheduler must be capacity-aware Generic ephemeral volumes with WaitForFirstConsumer StorageClass work correctly — the scheduler accounts for the storage when placing the pod. With Immediate binding, the PVC may provision in the wrong zone. Use WaitForFirstConsumer for zonal storage.
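
A sketch of a StorageClass that would back the gp3-encrypted name used in the example above, assuming the AWS EBS CSI driver is installed; other provisioners take different parameters.

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: gp3-encrypted
provisioner: ebs.csi.aws.com              # assumption: AWS EBS CSI driver
parameters:
  type: gp3
  encrypted: "true"
volumeBindingMode: WaitForFirstConsumer   # provision only after the pod has a node
reclaimPolicy: Delete
```

With WaitForFirstConsumer the volume is created in the zone of the node the scheduler picked, which avoids the wrong-zone problem described above.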

CSI Ephemeral Volumes

Ephemeral

An inline CSI volume — no PVC or PV objects are created. The CSI driver must declare volumeLifecycleModes: [Ephemeral] in its CSIDriver object. The most prominent use case is the Secrets Store CSI Driver, which mounts secrets from Vault, AWS Secrets Manager, Azure Key Vault, or GCP Secret Manager as files.

volumes:
- name: secrets-store
  csi:
    driver: secrets-store.csi.k8s.io
    readOnly: true
    volumeAttributes:
      secretProviderClass: my-aws-secrets   # SecretProviderClass CRD reference

apiVersion: secrets-store.csi.x-k8s.io/v1
kind: SecretProviderClass
metadata:
  name: my-aws-secrets
spec:
  provider: aws
  parameters:
    objects: |
      - objectName: "prod/db/password"
        objectType: "secretsmanager"
        objectAlias: "db-password"
      - objectName: "prod/tls/cert"
        objectType: "secretsmanager"
        objectAlias: "tls.crt"

The Secrets Store CSI Driver also supports syncing the mounted secret into a Kubernetes Secret object (for use as env vars or imagePullSecrets), configurable via secretObjects in the SecretProviderClass.
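
A sketch of the secretObjects field, extending the SecretProviderClass above. The Secret name db-creds and the key name password are assumptions; objectName must match the file name in the mount, which here is the objectAlias.

```yaml
apiVersion: secrets-store.csi.x-k8s.io/v1
kind: SecretProviderClass
metadata:
  name: my-aws-secrets
spec:
  provider: aws
  secretObjects:
  - secretName: db-creds              # hypothetical Kubernetes Secret to create
    type: Opaque
    data:
    - objectName: db-password         # file name in the mounted volume
      key: password                   # key inside the synced Secret
  parameters:
    objects: |
      - objectName: "prod/db/password"
        objectType: "secretsmanager"
        objectAlias: "db-password"
```

The synced Secret is created only once a pod mounts the volume, and the driver typically needs its secret-sync option enabled at install time.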

Image Volumes (Alpha, 1.31+)

Ephemeral

Image volumes (alpha in 1.31, requires ImageVolume feature gate) mount an OCI container image as a read-only volume. Useful for distributing large, immutable datasets, ML models, or binary assets packaged as OCI images without including them in the application container image.

volumes:
- name: ml-model
  image:
    reference: registry.example.com/models/resnet50:v3
    pullPolicy: IfNotPresent

The image is pulled by the container runtime and mounted read-only. No write access. The image's filesystem layers are overlaid exactly as they are in a container image (overlayfs), but exposed as a bind-mount into the pod.
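
A sketch of a complete pod consuming the image volume. The pod name, serving image, and mount path are hypothetical, and the cluster is assumed to have the ImageVolume feature gate enabled along with a container runtime that supports it.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: model-server                  # hypothetical name
spec:
  containers:
  - name: server
    image: registry.example.com/serving/app:1.0   # hypothetical serving image
    volumeMounts:
    - name: ml-model
      mountPath: /models/resnet50     # image contents appear here, read-only
  volumes:
  - name: ml-model
    image:
      reference: registry.example.com/models/resnet50:v3
      pullPolicy: IfNotPresent
```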

subPath and subPathExpr

By default, a volumeMount mounts the entire volume at the target path. subPath mounts only a specific file or subdirectory from within the volume, while still mounting it at mountPath.

Common subPath Use Cases

Multiple containers, same volume, different subdirs

volumes:
- name: data
  emptyDir: {}

containers:
- name: app
  volumeMounts:
  - name: data
    mountPath: /app/data
    subPath: app        # mounts data/app/ → /app/data

- name: sidecar
  volumeMounts:
  - name: data
    mountPath: /sidecar/data
    subPath: sidecar    # mounts data/sidecar/ → /sidecar/data

Single ConfigMap key to specific file path

volumes:
- name: nginx-conf
  configMap:
    name: nginx-config

containers:
- name: nginx
  volumeMounts:
  - name: nginx-conf
    mountPath: /etc/nginx/nginx.conf
    subPath: nginx.conf  # mount only key nginx.conf
                         # NOTE: live-reload does NOT work

subPathExpr

subPathExpr uses environment variable expansion to build the subPath dynamically. Requires $(VAR_NAME) syntax — not shell $VAR.

env:
- name: POD_NAME
  valueFrom:
    fieldRef:
      fieldPath: metadata.name

volumeMounts:
- name: shared-logs
  mountPath: /var/log/pods
  subPathExpr: $(POD_NAME)    # each pod writes to its own subdirectory

⚠️

subPath limitations
• ConfigMap/Secret live-reload (atomic symlink swap) does not work with subPath. The file is fixed at pod creation.
• Avoid subPath with a projected volume's serviceAccountToken source: like other subPath mounts it never receives the rotated token, so the mounted copy eventually expires.
• subPathExpr requires the env var to be defined in the same container's env field (not just set in the environment).

volumeMounts Fields

volumeMounts:
- name: data                     # must match a volume name in spec.volumes
  mountPath: /var/data           # absolute path inside container
  subPath: ""                    # optional: subdirectory within volume
  readOnly: false                # default false; true = read-only bind mount
  mountPropagation: None         # None | HostToContainer | Bidirectional

mountPropagation

Controls whether mount events (new bind mounts) inside the container or on the host are visible across the boundary:

Value	Container sees host mounts?	Host sees container mounts?	Use Case
`None` (default)	No (isolation)	No	99% of workloads — complete isolation
`HostToContainer`	Yes — new mounts on host under this path are visible	No	Monitoring agents that need to see node-level mounts (e.g., cAdvisor reading /proc/mounts)
`Bidirectional`	Yes	Yes — container mounts propagate to host	FUSE filesystems, CSI node plugins that mount on behalf of other pods. Requires `privileged: true`.

🔴

Bidirectional requires privileged Kubernetes will reject a Bidirectional mountPropagation unless the container has securityContext.privileged: true. Bidirectional propagation means the container can create mounts visible to the host — a significant privilege. Only use it for CSI node plugin DaemonSets that explicitly need it.
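
A sketch of how a node plugin container typically combines the two settings. The container name and image are hypothetical; the kubelet directory path is the common default and may differ on some distributions.

```yaml
# Pod spec fragment for a DaemonSet-managed node plugin
containers:
- name: node-plugin
  image: registry.example.com/csi/node-plugin:1.0   # hypothetical image
  securityContext:
    privileged: true                   # required for Bidirectional propagation
  volumeMounts:
  - name: kubelet-dir
    mountPath: /var/lib/kubelet
    mountPropagation: Bidirectional    # mounts created here become visible on the host
volumes:
- name: kubelet-dir
  hostPath:
    path: /var/lib/kubelet
    type: Directory
```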

Init Containers and Shared Volumes

Init containers run to completion before any app containers start. They share the same pod volumes — the classic pattern is an init container that populates a volume, which the main container then reads from.

Init Container Volume Patterns

Pattern 1: git clone into emptyDir

initContainers:
- name: git-clone
  image: alpine/git:latest
  command: [git, clone, "https://github.com/org/app-config", /config]
  volumeMounts:
  - name: config-data
    mountPath: /config

containers:
- name: app
  image: myapp:latest
  volumeMounts:
  - name: config-data
    mountPath: /app/config
    readOnly: true

volumes:
- name: config-data
  emptyDir: {}

Pattern 2: Wait for dependency + write signal file

initContainers:
- name: wait-for-db
  image: busybox
  command:
  - sh
  - -c
  - |
    until nc -z postgres-svc 5432; do
      echo "waiting for database..."; sleep 2
    done
    echo "DB ready" > /signal/ready
  volumeMounts:
  - name: signal
    mountPath: /signal

containers:
- name: app
  volumeMounts:
  - name: signal
    mountPath: /signal
    readOnly: true

volumes:
- name: signal
  emptyDir: {}

Pattern 3: Certificate generation

initContainers:
- name: cert-gen
  image: cfssl/cfssl
  command: [/bin/sh, -c, "cfssl gencert ... | cfssljson -bare /certs/server"]
  volumeMounts:
  - name: certs
    mountPath: /certs

containers:
- name: app
  volumeMounts:
  - name: certs
    mountPath: /etc/ssl/app
    readOnly: true

volumes:
- name: certs
  emptyDir:
    medium: Memory    # certs in RAM — never hit disk

Sidecar Containers and Shared Volumes

Native sidecar support (alpha in Kubernetes 1.28, enabled by default since 1.29) is expressed as initContainers with restartPolicy: Always: sidecars start before main containers and stay running. The log-shipping pattern is the canonical use case:

initContainers:
- name: log-shipper           # native sidecar: restartPolicy: Always
  restartPolicy: Always
  image: fluent/fluent-bit:latest
  volumeMounts:
  - name: log-dir
    mountPath: /var/log/app
    readOnly: true
  - name: fluent-config
    mountPath: /fluent-bit/etc

containers:
- name: app
  image: myapp:latest
  volumeMounts:
  - name: log-dir
    mountPath: /var/log/app   # app writes here; sidecar reads from same dir

volumes:
- name: log-dir
  emptyDir: {}
- name: fluent-config
  configMap:
    name: fluent-bit-config

The sidecar starts before the main container (blocking until the sidecar's startup probe passes if configured), and is terminated after the main container exits — ensuring all logs are flushed before the sidecar exits.

Volume Ownership: fsGroup and fsGroupChangePolicy

fsGroup in the pod's securityContext adds a supplemental GID to every container process in the pod and, for volume types that support ownership management, recursively changes the group ownership of the files in mounted volumes to that GID on mount. This solves the common problem where a container running as a non-root user (UID 1000) can't write to a volume provisioned with root ownership.

securityContext:
  runAsUser: 1000
  runAsGroup: 1000
  fsGroup: 2000               # all volume files are chowned to GID 2000
  fsGroupChangePolicy: OnRootMismatch   # default: Always (chown every mount)

fsGroupChangePolicy

Policy	Behavior	Performance
`Always` (default)	Recursively chown all files on every mount — even if ownership is already correct	Slow for large volumes (millions of files)
`OnRootMismatch`	Only chown if the root directory's ownership/permissions don't match the expected fsGroup	Fast after first mount; recommended for large PVCs

⚠️

fsGroup performance with large volumes A PostgreSQL database PVC with millions of files and fsGroupChangePolicy: Always will spend minutes chowning files on every pod restart. This blocks the container from starting and causes spurious CrashLoopBackOff-looking delays. Use OnRootMismatch for database pods, or set the fsGroup correctly at PVC creation time.

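supplementalGroups

supplementalGroups adds extra GIDs to the container processes without touching file ownership on the volume, which suits shared storage whose group ownership is already fixed on the server side (for example an NFS export).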

securityContext:
  fsGroup: 2000
  supplementalGroups: [3000, 4000]   # additional GIDs added to the process's group set

Deprecated and Removed Volume Types

Volume Type	Status	Replacement
`gitRepo`	Deprecated (since 1.11); still present in the API	Init container with `git clone`
`flocker`	Removed (1.25+)	CSI driver
`glusterfs`	Removed (1.26+)	CSI driver (glusterfs-csi)
`azureFile` (in-tree)	Removed (1.27+)	`file.csi.azure.com` CSI driver
`azureDisk` (in-tree)	Removed (1.27+)	`disk.csi.azure.com` CSI driver
`awsElasticBlockStore` (in-tree)	Removed (1.27+)	`ebs.csi.aws.com` CSI driver
`gcePersistentDisk` (in-tree)	Removed (1.28+)	`pd.csi.storage.gke.io` CSI driver
`cephfs` / `rbd` (in-tree)	Deprecated in 1.28, removed in 1.31	`cephfs.csi.ceph.com` / `rbd.csi.ceph.com`
`portworxVolume`	Deprecated 1.25	Portworx CSI driver

🔴

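In-tree removal affects existing PVs The volume fields for cloud in-tree drivers (e.g., awsElasticBlockStore) remain valid in the API, but once the in-tree plugin code is gone (1.27 for awsElasticBlockStore) every operation on those PVs is redirected to the matching CSI driver through CSI migration. If that driver is not installed before you upgrade, attach and mount operations for existing PVs fail. Install the CSI driver first; volume types removed without a migration path (such as glusterfs and flocker) must be reprovisioned onto CSI-backed PVs.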

Volume Size Limits and Ephemeral Storage

When an emptyDir with sizeLimit set exceeds that limit, the kubelet evicts the pod. Without sizeLimit, emptyDir is unbounded but counts against the node's ephemeral storage. A container's resources.limits.ephemeral-storage covers that container's writable layer and log files; emptyDir usage is counted at the pod level, where the pod is evicted once the writable layers, logs, and all disk-backed emptyDir volumes together exceed the sum of its containers' limits.

containers:
- name: app
  resources:
    requests:
      ephemeral-storage: 1Gi
    limits:
      ephemeral-storage: 2Gi    # enforced by kubelet; evicts pod if exceeded

volumes:
- name: tmp
  emptyDir:
    sizeLimit: 500Mi    # usage also counts toward the pod's total ephemeral-storage limit

The kubelet checks ephemeral storage usage periodically (default 1 minute). On eviction, the pod is terminated with Reason: Evicted and Message: Pod ephemeral local storage usage exceeds the total limit of containers.

Troubleshooting Runbooks

Runbook: ConfigMap Volume Not Updating in Container

# Verify the ConfigMap was actually updated
kubectl get cm <name> -o yaml | grep -A 5 data

# Check if subPath is in use — this blocks updates
kubectl get pod <name> -o yaml | grep subPath
# If subPath is present → updates won't propagate → must restart pod

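# If no subPath, allow for the kubelet sync delay (up to ~2 minutes)
# To trigger an immediate refresh, update any annotation on the pod
# (the annotation key below is arbitrary)
kubectl annotate pod <name> config-refresh="$(date +%s)" --overwrite
# If the app only reads its config at startup, restart the workload instead
kubectl rollout restart deployment/<name>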

Runbook: Permission Denied on Volume Mount

# Check what UID/GID the container runs as
kubectl exec -it <pod> -- id
# uid=1000(app) gid=1000(app)

# Check volume file ownership
kubectl exec -it <pod> -- ls -la /var/data
# drwxr-xr-x 2 root root 4096 Jan 1 00:00 .  ← root-owned, GID 0

# Fix: add fsGroup to pod securityContext
# spec.securityContext.fsGroup: 1000
# Then rolling restart

# For PVs provisioned with specific UID: check CSI driver fsGroup support
# fsGroupPolicy: File = chown by kubelet; None = driver handles it

Runbook: emptyDir Memory Exhaustion (OOMKill)

# Symptoms: container OOMKilled despite low heap usage
# Cause: large writes to medium:Memory emptyDir counted in container memory

# Check tmpfs mounts in pod
kubectl exec -it <pod> -- df -h | grep tmpfs

# Add sizeLimit to the emptyDir to cap memory usage
# volumes:
# - name: cache
#   emptyDir:
#     medium: Memory
#     sizeLimit: 256Mi    # prevents this volume from eating container memory

Runbook: Volume Stuck — Previous Pod's Mount Not Cleaned Up

# Symptoms: new pod stuck in ContainerCreating with "already mounted" error
# Cause: previous pod on same node crashed without unmounting volume

# Check node events
kubectl describe node <node> | grep -i mount

# Force delete the stuck pod (use only if pod is truly gone from node)
kubectl delete pod <old-pod> --force --grace-period=0

# If VolumeAttachment is stuck (CSI block volumes)
kubectl get volumeattachment
kubectl delete volumeattachment <stuck-attachment>

# If the node is shut down or unreachable (confirm it is really down first),
# the out-of-service taint lets Kubernetes detach its volumes
kubectl taint nodes <node> node.kubernetes.io/out-of-service=nodeshutdown:NoExecute

Runbook: Wrong fsGroup — Files Not Owned by Expected GID

# Verify pod securityContext
kubectl get pod <name> -o jsonpath='{.spec.securityContext}'

# Check if volume driver supports fsGroup
kubectl get csidriver <driver> -o yaml | grep fsGroupPolicy
# ReadWriteOnceWithFSType = only chown if fsType is set AND accessMode is RWO
# File = always chown (most drivers)
# None = kubelet does NOT chown — driver handles it (e.g., NFS with no-root-squash)

# For NFS: fsGroup has no effect unless nfs driver is configured for root-squash-off
# Set GID at NFS export level instead

Best Practices

Prefer projected volumes over separate configMap/secret/downwardAPI mounts when you need multiple sources — one volume, one directory, less cognitive overhead.
Never use subPath for live-reloaded config. Mount the full directory and configure the application to read from it. Use inotify/fsnotify in the app to watch the directory, not individual files.
Mark configuration ConfigMaps and Secrets immutable when they are versioned. Use the version in the name. This removes kubelet watch overhead and prevents accidental mutation.
Use fsGroupChangePolicy: OnRootMismatch for any PVC with more than a few thousand files. The default Always causes startup delays proportional to file count.
Avoid hostPath in application workloads. Enforce this with a Kyverno or OPA Gatekeeper policy that blocks hostPath except for designated DaemonSet namespaces.
Set sizeLimit on emptyDir volumes used for caches or scratch space. An unbounded emptyDir used by a runaway process can evict the entire pod (and others on the node) via ephemeral storage pressure.
Use native sidecar containers (1.29+) with restartPolicy: Always for log shippers and metric collectors instead of regular sidecars. They have correct startup/shutdown ordering, and pod termination blocks until the sidecar exits.
