Volumes

Complete reference for every Kubernetes volume type — from ephemeral scratch directories to NFS network shares — with mount mechanics, subPath gotchas, init container patterns, and the full lifecycle of a volume from pod creation to deletion.

What This Page Covers
  • How volumes differ from container filesystem (COW layer) — mount semantics
  • Volume mount lifecycle: kubelet NodeStage → NodePublish → container start → container stop → NodeUnpublish → NodeUnstage
  • emptyDir — disk vs RAM medium, sizeLimit, multi-container sharing, cache patterns
  • configMap volume — keys as files, defaultMode, items projection, optional flag, update propagation timing
  • secret volume — tmpfs backing, defaultMode, items projection, immutable secrets, update propagation
  • downwardAPI volume — fieldRef (name/namespace/uid/labels/annotations; podIP/nodeName/serviceAccountName are env-var only) vs resourceFieldRef (limits/requests)
  • projected volume — combining all four sources; serviceAccountToken with audience + expirationSeconds
  • hostPath — type options (Directory/DirectoryOrCreate/File/FileOrCreate/Socket/CharDevice/BlockDevice); security risks; legitimate use cases (DaemonSets, node-local monitoring)
  • NFS volume — server/path/readOnly; mount options; connection pooling; no dynamic provisioner
  • iscsi volume — targetPortal/iqn/lun/fsType; CHAP authentication; multipath
  • fc (Fibre Channel) — targetWWNs/lun/fsType; prerequisites
  • cephfs / rbd (in-tree, deprecated path to CSI)
  • generic ephemeral volumes — inline volumeClaimTemplate; owner reference auto-cleanup
  • CSI ephemeral volumes — driver volumeLifecycleModes:Ephemeral; Secrets Store CSI Driver pattern
  • image volumes (alpha 1.31) — OCI image as read-only volume mount
  • subPath — mounting a single file/directory from a volume; subPathExpr with env vars; limitations with ConfigMap live-reload
  • volumeMounts fields — mountPath, readOnly, mountPropagation (None/HostToContainer/Bidirectional)
  • mountPropagation deep dive — None (isolated), HostToContainer (see host mounts), Bidirectional (propagate to host, requires privileged)
  • Init containers and shared volumes — data population patterns, wait-for patterns
  • Sidecar containers and shared volumes — log shipping, metrics scraping via shared emptyDir
  • Volume ownership and fsGroup — fsGroupChangePolicy (Always vs OnRootMismatch); supplementalGroups
  • Immutable ConfigMaps and Secrets — immutable: true; performance benefit (kubelet stops watching)
  • Volume size limits — emptyDir sizeLimit enforcement; ephemeral storage limit interaction
  • Deprecated and removed volume types — gitRepo (deprecated), flocker (removed 1.25), glusterfs (removed 1.26), azureDisk (removed 1.27 in-tree), azureFile (removed 1.30 in-tree), awsElasticBlockStore (removed 1.27 in-tree)
  • 5 troubleshooting runbooks — ConfigMap volume not updating, permission denied on mount, emptyDir memory exhaustion, volume stuck after a previous pod, wrong fsGroup ownership
  • 7 best practices
    How Volumes Work

    A Kubernetes volume is a named storage object declared at the pod level and mounted into one or more containers within that pod. Unlike a container's ephemeral writable layer (which disappears on container restart), a volume's lifecycle is tied to the pod — it persists across container restarts but is cleaned up when the pod is deleted (for ephemeral volumes) or detached (for persistent volumes).

    POD
    ├── spec.volumes[]          ← named volume declarations (pod-scoped)
    │   ├── name: config        ← referenced by containers
    │   ├── name: data
    │   └── name: tmp
    │
    └── spec.containers[]
        └── container
            └── volumeMounts[]  ← bind declared volumes into container paths
                ├── name: config   mountPath: /etc/app
                ├── name: data     mountPath: /var/data
                └── name: tmp      mountPath: /tmp
    
    Volume lifecycle (kubelet perspective, CSI-backed volumes):
      Pod scheduled → NodeStageVolume (format/mount to staging path if block)
                   → NodePublishVolume (bind-mount into pod directory)
                   → containers start with /proc/mounts showing the volume
      Pod deleted  → containers stop
                   → NodeUnpublishVolume (unmount from pod dir)
                   → NodeUnstageVolume (unmount from staging dir)
                   → volume reclaimed (ephemeral: deleted; persistent: detached)
    

    Volumes are not container-scoped. Two containers in the same pod sharing a volume see exactly the same bytes — writes by one are immediately visible to the other.
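
    As a minimal sketch of the structure above, the following Pod declares one emptyDir volume at pod scope and mounts it into two containers at different paths. The pod name, container names, image tag, and commands are illustrative assumptions, not taken from a real deployment.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: shared-volume-demo        # hypothetical name
spec:
  volumes:
  - name: data                    # declared once, at pod scope
    emptyDir: {}
  containers:
  - name: writer
    image: busybox:1.36           # assumed image; any image with a shell works
    command: ["sh", "-c", "while true; do date >> /out/log.txt; sleep 5; done"]
    volumeMounts:
    - name: data
      mountPath: /out             # the writer sees the volume here
  - name: reader
    image: busybox:1.36
    # wait for the writer's first line, then follow the file
    command: ["sh", "-c", "until [ -f /in/log.txt ]; do sleep 1; done; tail -f /in/log.txt"]
    volumeMounts:
    - name: data
      mountPath: /in              # same bytes, different path
      readOnly: true              # read-only for this container only
```

    Because readOnly is a property of the mount and not of the volume, the writer keeps write access while the reader cannot modify the shared files.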

    emptyDir

    Ephemeral

    An empty directory created by kubelet when the pod is assigned to a node. All containers in the pod can read and write to it. Deleted when the pod is removed from the node (not just when a container crashes — the volume survives container restarts).

    volumes:
    - name: cache
      emptyDir:
        sizeLimit: 512Mi    # optional: enforced via ephemeral storage eviction
        medium: ""          # "" = node disk (default); "Memory" = tmpfs (RAM)
    
    containers:
    - name: app
      volumeMounts:
      - name: cache
        mountPath: /var/cache/app

    Memory-backed emptyDir

    Setting medium: Memory mounts a tmpfs filesystem. Data lives in RAM — extremely fast but counts against the container's memory limit:

    volumes:
    - name: shared-mem
      emptyDir:
        medium: Memory
        sizeLimit: 256Mi    # prevents this emptyDir from using more than 256Mi of RAM
    ⚠️
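    Memory medium counts against container memory
    Files written to a Memory (tmpfs) emptyDir are charged to the container that wrote them, so they count against that container's memory limit as well as the node's memory. Since Kubernetes 1.22 (SizeMemoryBackedVolumes enabled by default), the tmpfs is sized to the sizeLimit, or to the pod's memory limit when no sizeLimit is set (node allocatable memory if the pod has no limit); on older versions it used the Linux tmpfs default of half the node's RAM. If your container writes large amounts to a Memory emptyDir without a sizeLimit, it can trigger OOMKill.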
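
    One way to budget for this, sketched under the assumption that the application itself needs well under 384Mi, is to set the container memory limit so that it covers both process memory and the full sizeLimit of the tmpfs volume. The pod name and the specific sizes are illustrative.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: tmpfs-budget-demo         # hypothetical name
spec:
  volumes:
  - name: shared-mem
    emptyDir:
      medium: Memory
      sizeLimit: 128Mi            # upper bound on tmpfs usage
  containers:
  - name: app
    image: myapp:latest           # placeholder image
    resources:
      requests:
        memory: 256Mi
      limits:
        memory: 512Mi             # process memory plus up to 128Mi of tmpfs data
    volumeMounts:
    - name: shared-mem
      mountPath: /dev/shm
```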

    Common emptyDir Patterns

    | Pattern | Description | Example |
    |---|---|---|
    | Build cache | Compiler/tool cache shared across build steps | Maven ~/.m2, pip cache |
    | Log sharing | App writes logs; sidecar reads and ships to Fluentd/Loki | App → /var/log ← Fluent Bit sidecar |
    | Scratch space | Temp files during data processing (avoids container layer writes) | ETL job staging area |
    | IPC socket | App and sidecar communicate via Unix socket on shared emptyDir | Envoy uses /tmp/agent.sock |
    | Init → main handoff | Init container writes config/data; main container consumes | git clone → app reads code |

    configMap Volume

    Ephemeral

    Mounts ConfigMap keys as files inside the container. The files are updated automatically when the ConfigMap changes (within the kubelet sync period, typically 1–2 minutes).

    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: app-config
    data:
      app.yaml: |
        port: 8080
        log_level: info
      nginx.conf: |
        worker_processes auto;
    ---
    volumes:
    - name: config
      configMap:
        name: app-config
        defaultMode: 0644       # file permissions (octal)
        items:                   # optional: project only specific keys
        - key: app.yaml
          path: config/app.yaml  # path within the mountPath directory
          mode: 0640             # per-file override
        optional: false          # if true, pod starts even if ConfigMap missing

    Without items, every key becomes a file at the root of mountPath. The directory listing mirrors the ConfigMap's data keys exactly.

    Update Propagation

    Kubelet syncs ConfigMap volume content on a period governed by --sync-frequency (default 1 minute) plus an additional jitter. The update path:

    1. ConfigMap updated in etcd via API server
    2. Kubelet's reflector detects the change (watch event)
    3. Kubelet writes new content to a temporary directory (atomic rename)
    4. The ..data symlink in the volume is atomically swapped to point to the new directory
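    5. Application reads the updated files if it re-reads on change. To detect the change with inotify, watch the mounted directory (the ..data symlink is replaced there) rather than an individual file, whose watch stays attached to the old target.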
    ⚠️
    subPath blocks ConfigMap live-reload
    If you use subPath to mount a single key at a specific file path (e.g., /etc/nginx/nginx.conf), the container gets a bind mount of the file as it was resolved when the container started, so the later ..data symlink swap is never visible inside it. Updates are not propagated to the running container. Use a full directory mount and configure your app to read from the directory, or restart the pod on ConfigMap change.
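
    The full-directory alternative could look like the following sketch. It assumes a ConfigMap named nginx-config with a key such as site.conf, and an nginx image whose main configuration includes /etc/nginx/conf.d/*.conf; both are assumptions for illustration.

```yaml
volumes:
- name: nginx-conf
  configMap:
    name: nginx-config            # assumed ConfigMap with a key named site.conf

containers:
- name: nginx
  image: nginx:1.27               # assumed image tag
  volumeMounts:
  - name: nginx-conf
    mountPath: /etc/nginx/conf.d  # whole directory, no subPath: updates propagate
    readOnly: true
```

    The files on disk change after the kubelet sync, but the process still has to re-read them; for nginx that means triggering a reload.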

    secret Volume

    Ephemeral

    Mounts Secret keys as files. Identical to configMap volumes in mechanics, with two important differences: the files are backed by a tmpfs mount on the node (data never written to disk), and the default permission mode is 0644 (you should lower this to 0400 for credentials).

    volumes:
    - name: tls-certs
      secret:
        secretName: my-tls-secret
        defaultMode: 0400        # read-only for owner; recommended for credentials
        items:
        - key: tls.crt
          path: server.crt
        - key: tls.key
          path: server.key
        optional: false

    Immutable Secrets and ConfigMaps

    Setting immutable: true on a Secret or ConfigMap prevents any updates to its data. Kubelet stops watching immutable objects — removing the watch overhead for large clusters with many ConfigMaps/Secrets.

    apiVersion: v1
    kind: Secret
    metadata:
      name: static-tls-creds
    immutable: true    # cannot be changed after creation; must delete and recreate
    stringData:
      api-key: "supersecret"
    💡
    Use immutable for versioned configs
    Bundle the config version in the name (app-config-v3) and mark it immutable. Update the Deployment to reference the new name. This gives you a clear history, prevents accidental mutation, and reduces kubelet overhead. Rollback = point the Deployment back to the previous name.
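
    A sketch of the versioned pattern, assuming a Deployment named app and the placeholder image myapp:latest: the ConfigMap name carries the version, and changing the reference in the pod template triggers an ordinary rolling update.

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: app-config-v3             # version is part of the name
immutable: true
data:
  app.yaml: |
    port: 8080
    log_level: info
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: app                       # hypothetical name
spec:
  replicas: 2
  selector:
    matchLabels:
      app: app
  template:
    metadata:
      labels:
        app: app
    spec:
      containers:
      - name: app
        image: myapp:latest       # placeholder image
        volumeMounts:
        - name: config
          mountPath: /etc/app
          readOnly: true
      volumes:
      - name: config
        configMap:
          name: app-config-v3     # change to app-config-v4 to roll out; revert to roll back
```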

    downwardAPI Volume

    Ephemeral

    Exposes pod and container metadata as files. Some of the same information is available through downward API environment variables, but the complete label and annotation sets are only available as files, which also suits larger values and lets the kubelet refresh them when labels or annotations change.

    volumes:
    - name: pod-info
      downwardAPI:
        defaultMode: 0444
        items:
        - path: pod-name
          fieldRef:
            fieldPath: metadata.name
        - path: pod-namespace
          fieldRef:
            fieldPath: metadata.namespace
        - path: pod-uid
          fieldRef:
            fieldPath: metadata.uid
        # status.podIP and spec.nodeName are not valid in a downwardAPI volume;
        # they are exposed only through environment variables (valueFrom.fieldRef)
        - path: labels
          fieldRef:
            fieldPath: metadata.labels    # all labels as key="value"\n pairs
        - path: annotations
          fieldRef:
            fieldPath: metadata.annotations
        - path: cpu-limit
          resourceFieldRef:
            containerName: app
            resource: limits.cpu
            divisor: "1m"               # express in millicores
        - path: mem-request
          resourceFieldRef:
            containerName: app
            resource: requests.memory
            divisor: "1Mi"

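    Available fieldRef Fields

    | fieldPath | Value | Usable in a downwardAPI volume? | Live updates? |
    |---|---|---|---|
    | metadata.name | Pod name | Yes | No |
    | metadata.namespace | Pod namespace | Yes | No |
    | metadata.uid | Pod UID | Yes | No |
    | metadata.labels | All labels as key="value" pairs, one per line | Yes (volume only) | Yes, kubelet updates the file on label change |
    | metadata.annotations | All annotations, same format | Yes (volume only) | Yes |
    | spec.nodeName | Node the pod is scheduled on | No, environment variable only | n/a |
    | spec.serviceAccountName | Service account name | No, environment variable only | n/a |
    | status.podIP | Primary pod IP | No, environment variable only | n/a |
    | status.hostIP | Node IP | No, environment variable only | n/a |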
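
    Fields such as status.podIP and spec.nodeName are exposed through environment variables rather than downwardAPI volume files. A minimal sketch of that form follows; the variable names and the image are arbitrary choices for illustration.

```yaml
containers:
- name: app
  image: myapp:latest             # placeholder image
  env:
  - name: POD_IP
    valueFrom:
      fieldRef:
        fieldPath: status.podIP
  - name: NODE_NAME
    valueFrom:
      fieldRef:
        fieldPath: spec.nodeName
  - name: HOST_IP
    valueFrom:
      fieldRef:
        fieldPath: status.hostIP
  - name: SERVICE_ACCOUNT
    valueFrom:
      fieldRef:
        fieldPath: spec.serviceAccountName
```

    Environment variables are resolved when the container starts and are not refreshed afterwards.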

    projected Volume

    Ephemeral

    A projected volume combines multiple sources — configMap, secret, downwardAPI, and serviceAccountToken — into a single directory mount. All sources appear as files at the same mount point.

    volumes:
    - name: combined
      projected:
        defaultMode: 0444
        sources:
        - configMap:
            name: app-config
            items:
            - key: app.yaml
              path: config/app.yaml
        - secret:
            name: db-creds
            items:
            - key: password
              path: secrets/db-password
              mode: 0400
        - downwardAPI:
            items:
            - path: meta/pod-name
              fieldRef:
                fieldPath: metadata.name
        - serviceAccountToken:
            audience: api            # audience for the token (OIDC aud claim)
            expirationSeconds: 3600  # token rotated automatically by kubelet
            path: token/sa-token

    serviceAccountToken in projected Volume

    This is the standard mechanism for injecting service account tokens into pods. Since Kubernetes 1.21, the token that is auto-mounted at /var/run/secrets/kubernetes.io/serviceaccount/token comes from a projected volume like the one below rather than from a legacy Secret-based volume. The token is a bound service account token (audience + expiry) and is rotated by kubelet before expiry without restarting the pod. The kubelet fetches a fresh token via the TokenRequest API and atomically replaces the file.

    volumes:
    - name: kube-api-access
      projected:
        sources:
        - serviceAccountToken:
            expirationSeconds: 3607         # kubelet rotates at 80% of expiry
            path: token
        - configMap:
            name: kube-root-ca.crt          # cluster CA bundle
            items:
            - key: ca.crt
              path: ca.crt
        - downwardAPI:
            items:
            - path: namespace
              fieldRef:
                fieldPath: metadata.namespace
    ℹ️
    Why projected replaces the legacy token mount
    The legacy secret-based token mount (auto-created kubernetes.io/service-account-token Secret) has no expiry. The projected serviceAccountToken is audience-bound, expires, and is invalidated when the pod is deleted, so a compromised token is usable for a limited time rather than indefinitely. Use automountServiceAccountToken: false + an explicit projected volume when you need control over the token's audience and lifetime.
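
    The explicit form might look like the sketch below. The ServiceAccount name, the audience value, and the mount path are assumptions; the audience has to match whatever the token's consumer validates.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: token-demo                # hypothetical name
spec:
  serviceAccountName: app-sa      # assumed existing ServiceAccount
  automountServiceAccountToken: false   # no default token mount
  volumes:
  - name: scoped-token
    projected:
      sources:
      - serviceAccountToken:
          audience: vault         # assumed audience expected by the consumer
          expirationSeconds: 600  # minimum allowed lifetime (10 minutes)
          path: token
  containers:
  - name: app
    image: myapp:latest           # placeholder image
    volumeMounts:
    - name: scoped-token
      mountPath: /var/run/secrets/tokens
      readOnly: true
```

    The application should re-read the token file periodically, since the kubelet replaces it before it expires.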

    hostPath

    Ephemeral

    Mounts a file or directory from the host node's filesystem directly into the container. Powerful but dangerous — misuse allows container escape to the node.

    volumes:
    - name: docker-sock
      hostPath:
        path: /var/run/docker.sock
        type: Socket
    
    - name: host-logs
      hostPath:
        path: /var/log/pods
        type: Directory
    
    - name: created-dir
      hostPath:
        path: /mnt/fast-ssd/myapp
        type: DirectoryOrCreate   # creates the directory if it doesn't exist

    hostPath Type Options

    | type | Behavior |
    |---|---|
    | "" (empty) | No check: mount whatever exists (or nothing) at the path |
    | DirectoryOrCreate | Create directory with 0755 if it doesn't exist; fail if path is a file |
    | Directory | Path must exist and be a directory; fail otherwise |
    | FileOrCreate | Create empty file with 0644 if it doesn't exist; fail if path is a directory |
    | File | Path must exist and be a regular file |
    | Socket | Path must exist and be a Unix socket |
    | CharDevice | Path must be a character device |
    | BlockDevice | Path must be a block device |
    🔴
    hostPath security risks
    • Mounting /, /etc, or /var/run/docker.sock gives the container full node access.
    • hostPath volumes bypass Kubernetes storage quotas and LimitRanges entirely.
    • Pod scheduled to a different node sees a different (possibly missing) path — workloads are not portable.
    • Restrict with Pod Security admission (both the Baseline and Restricted profiles forbid hostPath) or OPA/Kyverno policies.

    Legitimate hostPath Use Cases
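
    The overview for this page names DaemonSets and node-local monitoring as the legitimate cases. A minimal sketch of such a workload follows: a node-level log collector that mounts the node's pod log directory read-only. The names, namespace, and image are illustrative assumptions.

```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: node-log-collector        # hypothetical name
  namespace: logging              # assumed namespace exempted from hostPath restrictions
spec:
  selector:
    matchLabels:
      app: node-log-collector
  template:
    metadata:
      labels:
        app: node-log-collector
    spec:
      containers:
      - name: collector
        image: fluent/fluent-bit:latest
        volumeMounts:
        - name: pod-logs
          mountPath: /var/log/pods
          readOnly: true          # the agent only needs to read node logs
      volumes:
      - name: pod-logs
        hostPath:
          path: /var/log/pods
          type: Directory         # fail fast if the path is missing on a node
```

    A DaemonSet runs one pod per node, so the node-specific nature of hostPath is intended here rather than a portability problem.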

    NFS Volume

    Network

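    Mounts an NFS export directly into the pod. No CSI driver is required because the NFS client is built into the Linux kernel, although each node still needs the NFS client utilities (for example nfs-common or nfs-utils) installed. Supports RWX (ReadWriteMany) natively.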

    volumes:
    - name: nfs-share
      nfs:
        server: nfs-server.prod.svc.cluster.local   # hostname or IP; resolved by the node, which may not use cluster DNS
        path: /exports/shared-data                  # exported path on server
        readOnly: false
    ℹ️
    NFS mount options
    The inline nfs volume type does not expose mount options. For production NFS with custom options (nfsvers=4.1,rsize=1048576,hard,timeo=600), use a PersistentVolume with spec.mountOptions, or a StorageClass backed by the NFS Subdir External Provisioner or a CSI NFS driver, which pass mount options through mountOptions on the StorageClass.
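
    For a statically provisioned export, mount options can be set on a PersistentVolume through spec.mountOptions. The sketch below assumes an NFS server at 10.0.0.10 and reuses the export path from the example above; the object names and capacity are placeholders.

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: nfs-shared-data           # hypothetical name
spec:
  capacity:
    storage: 100Gi                # used for claim matching; not enforced on the export
  accessModes: [ReadWriteMany]
  persistentVolumeReclaimPolicy: Retain
  storageClassName: ""            # static PV, no dynamic provisioning
  mountOptions:
  - nfsvers=4.1
  - hard
  - timeo=600
  nfs:
    server: 10.0.0.10             # assumed server address
    path: /exports/shared-data
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: nfs-shared-data
spec:
  accessModes: [ReadWriteMany]
  storageClassName: ""            # bind to the static PV above
  volumeName: nfs-shared-data
  resources:
    requests:
      storage: 100Gi
```

    Pods then reference the claim with persistentVolumeClaim.claimName instead of an inline nfs volume.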

    NFS Production Considerations

    iSCSI Volume

    Network
    volumes:
    - name: iscsi-vol
      iscsi:
        targetPortal: 192.168.1.100:3260   # iSCSI target IP:port
        iqn: iqn.2023-01.com.example:storage.target.1   # iSCSI Qualified Name
        lun: 0                              # LUN number
        fsType: ext4
        readOnly: false
        chapAuthDiscovery: true             # enable CHAP for discovery
        chapAuthSession: true               # enable CHAP per session
        secretRef:
          name: chap-secret                 # Secret (type kubernetes.io/iscsi-chap) holding the CHAP credentials

    An iSCSI volume can be mounted read-write by only one node at a time (several consumers may mount it read-only simultaneously). Use for legacy SAN storage integration. For new deployments, prefer a CSI driver (e.g., iSCSI-based OpenEBS) which handles node failure, topology awareness, and monitoring.

    Generic Ephemeral Volumes

    Ephemeral

    A generic ephemeral volume embeds a PVC template directly in the pod spec. Kubernetes creates the PVC when the pod is created and garbage-collects it when the pod is deleted (via owner reference). This enables ephemeral use of any StorageClass — including cloud SSDs — without pre-creating PVCs.

    volumes:
    - name: scratch
      ephemeral:
        volumeClaimTemplate:
          metadata:
            labels:
              type: scratch-volume
          spec:
            accessModes: [ReadWriteOnce]
            storageClassName: gp3-encrypted
            resources:
              requests:
                storage: 50Gi

    The created PVC is named <pod-name>-<volume-name> (e.g., my-pod-scratch). It has an owner reference to the pod, so it is deleted automatically when the pod is deleted. If the pod is part of a ReplicaSet/Deployment, each replica gets its own PVC.

    ⚠️
    Scheduler must be capacity-aware
    With a WaitForFirstConsumer StorageClass, the volume is provisioned only after the pod has been scheduled, so it is created where the pod actually runs. With Immediate binding, the PVC may be provisioned in a zone where the pod cannot be placed. Use WaitForFirstConsumer for zonal storage.
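
    A StorageClass matching the gp3-encrypted name used in the example above could be defined as in this sketch. It assumes the AWS EBS CSI driver is installed; the parameters shown are that driver's and would differ for other provisioners.

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: gp3-encrypted
provisioner: ebs.csi.aws.com      # assumption: AWS EBS CSI driver is installed
parameters:
  type: gp3
  encrypted: "true"
volumeBindingMode: WaitForFirstConsumer   # provision after the pod is scheduled
reclaimPolicy: Delete
```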

    CSI Ephemeral Volumes

    Ephemeral

    An inline CSI volume — no PVC or PV objects are created. The CSI driver must declare volumeLifecycleModes: [Ephemeral] in its CSIDriver object. The most prominent use case is the Secrets Store CSI Driver, which mounts secrets from Vault, AWS Secrets Manager, Azure Key Vault, or GCP Secret Manager as files.

    volumes:
    - name: secrets-store
      csi:
        driver: secrets-store.csi.k8s.io
        readOnly: true
        volumeAttributes:
          secretProviderClass: my-aws-secrets   # SecretProviderClass CRD reference

    apiVersion: secrets-store.csi.x-k8s.io/v1
    kind: SecretProviderClass
    metadata:
      name: my-aws-secrets
    spec:
      provider: aws
      parameters:
        objects: |
          - objectName: "prod/db/password"
            objectType: "secretsmanager"
            objectAlias: "db-password"
          - objectName: "prod/tls/cert"
            objectType: "secretsmanager"
            objectAlias: "tls.crt"

    The Secrets Store CSI Driver also supports syncing the mounted secret into a Kubernetes Secret object (for use as env vars or imagePullSecrets), configurable via secretObjects in the SecretProviderClass.
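
    A sketch of the sync configuration, extending the SecretProviderClass above. The Secret name db-credentials and the key password are hypothetical; objectName under secretObjects has to match the mounted file name (here the objectAlias).

```yaml
apiVersion: secrets-store.csi.x-k8s.io/v1
kind: SecretProviderClass
metadata:
  name: my-aws-secrets
spec:
  provider: aws
  secretObjects:
  - secretName: db-credentials    # hypothetical Kubernetes Secret to create
    type: Opaque
    data:
    - objectName: db-password     # mounted file name (the objectAlias below)
      key: password               # key inside the Kubernetes Secret
  parameters:
    objects: |
      - objectName: "prod/db/password"
        objectType: "secretsmanager"
        objectAlias: "db-password"
```

    The synced Secret is created only once a pod mounts the CSI volume, so a workload that consumes it as an environment variable still needs the volume mount.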

    Image Volumes (Alpha, 1.31+)

    Ephemeral

    Image volumes (alpha in 1.31, requires ImageVolume feature gate) mount an OCI container image as a read-only volume. Useful for distributing large, immutable datasets, ML models, or binary assets packaged as OCI images without including them in the application container image.

    volumes:
    - name: ml-model
      image:
        reference: registry.example.com/models/resnet50:v3
        pullPolicy: IfNotPresent

    The image is pulled by the container runtime and mounted read-only. No write access. The image's filesystem layers are overlaid exactly as they are in a container image (overlayfs), but exposed as a bind-mount into the pod.

    subPath and subPathExpr

    By default, a volumeMount mounts the entire volume at the target path. subPath mounts only a specific file or subdirectory from within the volume, while still mounting it at mountPath.

    Common subPath Use Cases

    Multiple containers, same volume, different subdirs

    volumes:
    - name: data
      emptyDir: {}
    
    containers:
    - name: app
      volumeMounts:
      - name: data
        mountPath: /app/data
        subPath: app        # mounts data/app/ → /app/data
    
    - name: sidecar
      volumeMounts:
      - name: data
        mountPath: /sidecar/data
        subPath: sidecar    # mounts data/sidecar/ → /sidecar/data

    Single ConfigMap key to specific file path

    volumes:
    - name: nginx-conf
      configMap:
        name: nginx-config
    
    containers:
    - name: nginx
      volumeMounts:
      - name: nginx-conf
        mountPath: /etc/nginx/nginx.conf
        subPath: nginx.conf  # mount only key nginx.conf
                             # NOTE: live-reload does NOT work

    subPathExpr

    subPathExpr uses environment variable expansion to build the subPath dynamically. Requires $(VAR_NAME) syntax — not shell $VAR.

    env:
    - name: POD_NAME
      valueFrom:
        fieldRef:
          fieldPath: metadata.name
    
    volumeMounts:
    - name: shared-logs
      mountPath: /var/log/pods
      subPathExpr: $(POD_NAME)    # each pod writes to its own subdirectory
    ⚠️
    subPath limitations
    • ConfigMap/Secret live-reload (atomic symlink swap) does not work with subPath. The file is fixed at pod creation.
    • With a projected volume, a subPath mount does not receive updates, so a serviceAccountToken mounted this way is never rotated and eventually expires.
    • subPathExpr requires the env var to be defined in the same container's env field (not just set in the environment).

    volumeMounts Fields

    volumeMounts:
    - name: data                     # must match a volume name in spec.volumes
      mountPath: /var/data           # absolute path inside container
      subPath: ""                    # optional: subdirectory within volume
      readOnly: false                # default false; true = read-only bind mount
      mountPropagation: None         # None | HostToContainer | Bidirectional

    mountPropagation

    Controls whether mount events (new bind mounts) inside the container or on the host are visible across the boundary:

    | Value | Container sees host mounts? | Host sees container mounts? | Use Case |
    |---|---|---|---|
    | None (default) | No (isolation) | No | 99% of workloads (complete isolation) |
    | HostToContainer | Yes, new mounts on host under this path are visible | No | Monitoring agents that need to see node-level mounts (e.g., cAdvisor reading /proc/mounts) |
    | Bidirectional | Yes | Yes, container mounts propagate to host | FUSE filesystems, CSI node plugins that mount on behalf of other pods. Requires privileged: true. |
    🔴
    Bidirectional requires privileged
    Kubernetes will reject a Bidirectional mountPropagation unless the container has securityContext.privileged: true. Bidirectional propagation means the container can create mounts visible to the host, which is a significant privilege. Only use it for CSI node plugin DaemonSets that explicitly need it.
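
    The CSI node plugin case typically looks like the following pod-template fragment. The container name and image are placeholders; the kubelet pods directory is the default path and may differ on clusters that relocate the kubelet root directory.

```yaml
containers:
- name: csi-node-plugin           # hypothetical name
  image: example.com/csi-driver:v1    # placeholder image
  securityContext:
    privileged: true              # required for Bidirectional
  volumeMounts:
  - name: pods-mount-dir
    mountPath: /var/lib/kubelet/pods
    mountPropagation: Bidirectional   # mounts created here become visible to the host
volumes:
- name: pods-mount-dir
  hostPath:
    path: /var/lib/kubelet/pods   # default kubelet pods directory (assumption)
    type: Directory
```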

    Init Containers and Shared Volumes

    Init containers run to completion before any app containers start. They share the same pod volumes — the classic pattern is an init container that populates a volume, which the main container then reads from.

    Init Container Volume Patterns

    Pattern 1: git clone into emptyDir

    initContainers:
    - name: git-clone
      image: alpine/git:latest
      command: [git, clone, "https://github.com/org/app-config", /config]
      volumeMounts:
      - name: config-data
        mountPath: /config
    
    containers:
    - name: app
      image: myapp:latest
      volumeMounts:
      - name: config-data
        mountPath: /app/config
        readOnly: true
    
    volumes:
    - name: config-data
      emptyDir: {}

    Pattern 2: Wait for dependency + write signal file

    initContainers:
    - name: wait-for-db
      image: busybox
      command:
      - sh
      - -c
      - |
        until nc -z postgres-svc 5432; do
          echo "waiting for database..."; sleep 2
        done
        echo "DB ready" > /signal/ready
      volumeMounts:
      - name: signal
        mountPath: /signal
    
    containers:
    - name: app
      volumeMounts:
      - name: signal
        mountPath: /signal
        readOnly: true
    
    volumes:
    - name: signal
      emptyDir: {}

    Pattern 3: Certificate generation

    initContainers:
    - name: cert-gen
      image: cfssl/cfssl
      command: [/bin/sh, -c, "cfssl gencert ... | cfssljson -bare /certs/server"]
      volumeMounts:
      - name: certs
        mountPath: /certs
    
    containers:
    - name: app
      volumeMounts:
      - name: certs
        mountPath: /etc/ssl/app
        readOnly: true
    
    volumes:
    - name: certs
      emptyDir:
        medium: Memory    # certs in RAM — never hit disk

    Sidecar Containers and Shared Volumes

    Kubernetes 1.28 introduced native sidecar support (alpha; enabled by default since 1.29) via initContainers with restartPolicy: Always. Sidecars start before main containers and stay running. The log-shipping pattern is the canonical use case:

    initContainers:
    - name: log-shipper           # native sidecar: restartPolicy: Always
      restartPolicy: Always
      image: fluent/fluent-bit:latest
      volumeMounts:
      - name: log-dir
        mountPath: /var/log/app
        readOnly: true
      - name: fluent-config
        mountPath: /fluent-bit/etc
    
    containers:
    - name: app
      image: myapp:latest
      volumeMounts:
      - name: log-dir
        mountPath: /var/log/app   # app writes here; sidecar reads from same dir
    
    volumes:
    - name: log-dir
      emptyDir: {}
    - name: fluent-config
      configMap:
        name: fluent-bit-config

    The sidecar starts before the main container (blocking until the sidecar's startup probe passes if configured), and is terminated after the main container exits — ensuring all logs are flushed before the sidecar exits.

    Volume Ownership: fsGroup and fsGroupChangePolicy

    fsGroup in the pod's securityContext sets the supplemental GID for the pod and chowns all files in mounted volumes to that GID on mount. This solves the common problem where a container running as a non-root user (UID 1000) can't write to a volume provisioned with root ownership.

    securityContext:
      runAsUser: 1000
      runAsGroup: 1000
      fsGroup: 2000               # all volume files are chowned to GID 2000
      fsGroupChangePolicy: OnRootMismatch   # default: Always (chown every mount)

    fsGroupChangePolicy

    | Policy | Behavior | Performance |
    |---|---|---|
    | Always (default) | Recursively chown all files on every mount, even if ownership is already correct | Slow for large volumes (millions of files) |
    | OnRootMismatch | Only chown if the root directory's ownership/permissions don't match the expected fsGroup | Fast after first mount; recommended for large PVCs |
    ⚠️
    fsGroup performance with large volumes
    A PostgreSQL database PVC with millions of files and fsGroupChangePolicy: Always will spend minutes chowning files on every pod restart. The container cannot start until this finishes, so the pod sits in ContainerCreating and can look hung. Use OnRootMismatch for database pods, or make sure the files on the volume already carry the expected group ownership.

    supplementalGroups

    securityContext:
      fsGroup: 2000
      supplementalGroups: [3000, 4000]   # additional GIDs added to the process's group set

    Deprecated and Removed Volume Types

    | Volume Type | Status | Replacement |
    |---|---|---|
    | gitRepo | Deprecated (still in the API) | Init container with git clone |
    | flocker | Removed (1.25+) | CSI driver |
    | glusterfs | Removed (1.26+) | CSI driver (glusterfs-csi) |
    | azureFile (in-tree) | Removed (1.30+) | file.csi.azure.com CSI driver |
    | azureDisk (in-tree) | Removed (1.27+) | disk.csi.azure.com CSI driver |
    | awsElasticBlockStore (in-tree) | Removed (1.27+) | ebs.csi.aws.com CSI driver |
    | gcePersistentDisk (in-tree) | Removed (1.28+) | pd.csi.storage.gke.io CSI driver |
    | cephfs / rbd (in-tree) | Deprecated 1.28, removed 1.31 | cephfs.csi.ceph.com / rbd.csi.ceph.com |
    | portworxVolume | Deprecated 1.25 | Portworx CSI driver |
    🔴
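    In-tree removal affects existing PVs
    Existing PVs that reference a removed in-tree driver (e.g., awsElasticBlockStore) keep their API fields, but once the in-tree plugin is gone every volume operation for them is redirected to the corresponding CSI driver through CSI migration. If that CSI driver is not installed, attach and mount operations fail. Install and verify the CSI driver before upgrading past the release that removes the in-tree plugin.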

    Volume Size Limits and Ephemeral Storage

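    When an emptyDir volume with sizeLimit exceeds that limit, the kubelet evicts the pod. Without sizeLimit, a disk-backed emptyDir is bounded only by the pod's ephemeral storage limit (if any) and the node's available ephemeral storage. A container's resources.limits.ephemeral-storage applies to that container's writable layer and log files; emptyDir usage is counted at the pod level, where the pod is evicted once the writable layers, logs, and all emptyDir volumes together exceed the sum of the containers' ephemeral-storage limits.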

    containers:
    - name: app
      resources:
        requests:
          ephemeral-storage: 1Gi
        limits:
          ephemeral-storage: 2Gi    # enforced by kubelet; evicts pod if exceeded
    
    volumes:
    - name: tmp
      emptyDir:
        sizeLimit: 500Mi    # also counts toward the pod-level ephemeral storage total

    The kubelet checks ephemeral storage usage periodically (default 1 minute). On eviction, the pod is terminated with Reason: Evicted and Message: Pod ephemeral local storage usage exceeds the total limit of containers.
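
    As a worked sketch of how the limits combine, the Pod below has two containers and one emptyDir. It assumes the kubelet's documented accounting, in which container limits cover the writable layer and logs while emptyDir usage counts against the pod-level total; the names and sizes are illustrative.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: ephemeral-accounting-demo     # hypothetical name
spec:
  containers:
  - name: app
    image: myapp:latest               # placeholder image
    resources:
      limits:
        ephemeral-storage: 2Gi        # writable layer + logs of this container
    volumeMounts:
    - name: tmp
      mountPath: /tmp
  - name: helper
    image: busybox:1.36
    command: ["sh", "-c", "while true; do sleep 3600; done"]
    resources:
      limits:
        ephemeral-storage: 1Gi
  volumes:
  - name: tmp
    emptyDir:
      sizeLimit: 500Mi
# The pod is evicted when any one of these holds:
#   app writable layer + logs      > 2Gi
#   helper writable layer + logs   > 1Gi
#   tmp emptyDir usage             > 500Mi
#   everything above combined      > 3Gi (2Gi + 1Gi, the pod-level limit)
```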

    Troubleshooting Runbooks

    Runbook: ConfigMap Volume Not Updating in Container

    # Verify the ConfigMap was actually updated
    kubectl get cm <name> -o yaml | grep -A 5 data
    
    # Check if subPath is in use — this blocks updates
    kubectl get pod <name> -o yaml | grep subPath
    # If subPath is present → updates won't propagate → must restart pod
    
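    # If no subPath, allow for the kubelet sync delay:
    # wait up to 2 minutes after the ConfigMap update.
    # Updating a pod annotation prompts the kubelet to refresh the pod's volumes sooner
    # (the annotation key config-refresh is an arbitrary example)
    kubectl annotate pod <pod> config-refresh="$(date +%s)" --overwrite
    # If the app only reads its config at startup, restart the pods instead
    kubectl rollout restart deployment/<name>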

    Runbook: Permission Denied on Volume Mount

    # Check what UID/GID the container runs as
    kubectl exec -it <pod> -- id
    # uid=1000(app) gid=1000(app)
    
    # Check volume file ownership
    kubectl exec -it <pod> -- ls -la /var/data
    # drwxr-xr-x 2 root root 4096 Jan 1 00:00 .  ← root-owned, GID 0
    
    # Fix: add fsGroup to pod securityContext
    # spec.securityContext.fsGroup: 1000
    # Then rolling restart
    
    # For PVs provisioned with specific UID: check CSI driver fsGroup support
    # fsGroupPolicy: File = chown by kubelet; None = driver handles it

    Runbook: emptyDir Memory Exhaustion (OOMKill)

    # Symptoms: container OOMKilled despite low heap usage
    # Cause: large writes to medium:Memory emptyDir counted in container memory
    
    # Check tmpfs mounts in pod
    kubectl exec -it <pod> -- df -h | grep tmpfs
    
    # Add sizeLimit to the emptyDir to cap memory usage
    # volumes:
    # - name: cache
    #   emptyDir:
    #     medium: Memory
    #     sizeLimit: 256Mi    # prevents this volume from eating container memory

    Runbook: Volume Stuck — Previous Pod's Mount Not Cleaned Up

    # Symptoms: new pod stuck in ContainerCreating with "already mounted" error
    # Cause: previous pod on same node crashed without unmounting volume
    
    # Check node events
    kubectl describe node <node> | grep -i mount
    
    # Force delete the stuck pod (use only if pod is truly gone from node)
    kubectl delete pod <old-pod> --force --grace-period=0
    
    # If VolumeAttachment is stuck (CSI block volumes)
    kubectl get volumeattachment
    kubectl delete volumeattachment <stuck-attachment>
    
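    # If the node is unreachable, the attach/detach controller force-detaches
    # the volume after about 6 minutes. For a node confirmed to be shut down,
    # the out-of-service taint releases its volumes sooner:
    kubectl taint nodes <node> node.kubernetes.io/out-of-service=nodeshutdown:NoExecute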

    Runbook: Wrong fsGroup — Files Not Owned by Expected GID

    # Verify pod securityContext
    kubectl get pod <name> -o jsonpath='{.spec.securityContext}'
    
    # Check if volume driver supports fsGroup
    kubectl get csidriver <driver> -o yaml | grep fsGroupPolicy
    # ReadWriteOnceWithFSType = only chown if fsType is set AND accessMode is RWO
    # File = always chown (most drivers)
    # None = kubelet does NOT chown — driver handles it (e.g., NFS with no-root-squash)
    
    # For NFS: fsGroup has no effect unless nfs driver is configured for root-squash-off
    # Set GID at NFS export level instead

    Best Practices

    1. Prefer projected volumes over separate configMap/secret/downwardAPI mounts when you need multiple sources — one volume, one directory, less cognitive overhead.
    2. Never use subPath for live-reloaded config. Mount the full directory and configure the application to read from it. Use inotify/fsnotify in the app to watch the directory, not individual files.
    3. Mark configuration ConfigMaps and Secrets immutable when they are versioned. Use the version in the name. This removes kubelet watch overhead and prevents accidental mutation.
    4. Use fsGroupChangePolicy: OnRootMismatch for any PVC with more than a few thousand files. The default Always causes startup delays proportional to file count.
    5. Avoid hostPath in application workloads. Enforce this with a Kyverno or OPA Gatekeeper policy that blocks hostPath except for designated DaemonSet namespaces.
    6. Set sizeLimit on emptyDir volumes used for caches or scratch space. An unbounded emptyDir used by a runaway process can evict the entire pod (and others on the node) via ephemeral storage pressure.
    7. Use native sidecar containers (1.29+) with restartPolicy: Always for log shippers and metric collectors instead of regular sidecars. They have correct startup/shutdown ordering, and pod termination blocks until the sidecar exits.