Volumes
Complete reference for every Kubernetes volume type — from ephemeral scratch directories to NFS network shares — with mount mechanics, subPath gotchas, init container patterns, and the full lifecycle of a volume from pod creation to deletion.
How Volumes Work
A Kubernetes volume is a named storage object declared at the pod level and mounted into one or more containers within that pod. Unlike a container's ephemeral writable layer (which disappears on container restart), a volume's lifecycle is tied to the pod — it persists across container restarts but is cleaned up when the pod is deleted (for ephemeral volumes) or detached (for persistent volumes).
POD
├── spec.volumes[]               ← named volume declarations (pod-scoped)
│   ├── name: config             ← referenced by containers
│   ├── name: data
│   └── name: tmp
│
└── spec.containers[]
    └── container
        └── volumeMounts[]       ← bind declared volumes into container paths
            ├── name: config     mountPath: /etc/app
            ├── name: data       mountPath: /var/data
            └── name: tmp        mountPath: /tmp
Volume lifecycle (kubelet perspective):
  Pod scheduled → NodeStageVolume (format/mount to staging path if block)
                → NodePublishVolume (bind-mount into pod directory)
                → containers start with /proc/mounts showing the volume
  Pod deleted   → containers stop
                → NodeUnpublishVolume (unmount from pod dir)
                → NodeUnstageVolume (unmount from staging dir)
                → volume reclaimed (ephemeral: deleted; persistent: detached)
Volumes are not container-scoped. Two containers in the same pod sharing a volume see exactly the same bytes — writes by one are immediately visible to the other.
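The link between spec.volumes[] and volumeMounts[] is purely by name, so it can be checked mechanically. The sketch below assumes a pod spec that has already been parsed into a Python dict; undeclared_mounts is a hypothetical helper written for this page, not part of any Kubernetes client library.

```python
def undeclared_mounts(pod_spec: dict) -> list[tuple[str, str]]:
    """Return (container, volume) pairs whose volumeMount names no declared volume."""
    declared = {v["name"] for v in pod_spec.get("volumes", [])}
    missing = []
    # Init containers share the same pod-scoped volume list as app containers.
    for kind in ("initContainers", "containers"):
        for container in pod_spec.get(kind, []):
            for mount in container.get("volumeMounts", []):
                if mount["name"] not in declared:
                    missing.append((container["name"], mount["name"]))
    return missing


pod_spec = {
    "volumes": [{"name": "config"}, {"name": "data"}, {"name": "tmp"}],
    "containers": [
        {
            "name": "app",
            "volumeMounts": [
                {"name": "config", "mountPath": "/etc/app"},
                {"name": "data", "mountPath": "/var/data"},
                {"name": "cache", "mountPath": "/var/cache"},  # never declared
            ],
        }
    ],
}
print(undeclared_mounts(pod_spec))
```

The API server performs an equivalent check at admission time, so a helper like this is mainly useful in CI, before a manifest reaches the cluster.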
emptyDir
An empty directory created by the kubelet when the pod is assigned to a node. All containers in the pod can read and write to it. It survives container crashes and restarts, and is deleted only when the pod is removed from the node.
volumes:
  - name: cache
    emptyDir:
      sizeLimit: 512Mi   # optional: enforced via ephemeral storage eviction
      medium: ""         # "" = node disk (default); "Memory" = tmpfs (RAM)
containers:
  - name: app
    volumeMounts:
      - name: cache
        mountPath: /var/cache/app
Memory-backed emptyDir
Setting medium: Memory mounts a tmpfs filesystem. Data lives in RAM — extremely fast but counts against the container's memory limit:
volumes:
  - name: shared-mem
    emptyDir:
      medium: Memory
      sizeLimit: 256Mi   # prevents this emptyDir from using more than 256Mi of RAM
Common emptyDir Patterns
| Pattern | Description | Example |
|---|---|---|
| Build cache | Compiler/tool cache shared across build steps | Maven ~/.m2, pip cache |
| Log sharing | App writes logs; sidecar reads and ships to Fluentd/Loki | App → /var/log ← Fluent Bit sidecar |
| Scratch space | Temp files during data processing (avoids container layer writes) | ETL job staging area |
| IPC socket | App and sidecar communicate via Unix socket on shared emptyDir | Envoy uses /tmp/agent.sock |
| Init → main handoff | Init container writes config/data; main container consumes | git clone → app reads code |
configMap Volume
Mounts ConfigMap keys as files inside the container. The files are updated automatically when the ConfigMap changes (within the kubelet sync period, typically 1–2 minutes).
apiVersion: v1
kind: ConfigMap
metadata:
  name: app-config
data:
  app.yaml: |
    port: 8080
    log_level: info
  nginx.conf: |
    worker_processes auto;
---
volumes:
  - name: config
    configMap:
      name: app-config
      defaultMode: 0644             # file permissions (octal)
      items:                        # optional: project only specific keys
        - key: app.yaml
          path: config/app.yaml     # path within the mountPath directory
          mode: 0640                # per-file override
      optional: false               # if true, pod starts even if ConfigMap missing
Without items, every key becomes a file at the root of mountPath. The directory listing mirrors the ConfigMap's data keys exactly.
Update Propagation
The kubelet syncs ConfigMap volume content on a period governed by --sync-frequency (default 1 minute), plus the propagation delay of its ConfigMap cache. The update path:
- ConfigMap updated in etcd via the API server
- Kubelet's reflector detects the change (watch event)
- Kubelet writes the new content to a fresh directory inside the volume
- The ..data symlink in the volume is atomically swapped (via rename) to point to the new directory
- Application reads the updated files, provided it re-reads on change. Watch the mounted directory rather than an individual file, because the swap replaces a symlink instead of modifying the files in place
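The swap in the last two steps can be reproduced on any Linux filesystem. The following is a minimal sketch of the technique, not the kubelet's actual code: the publish helper and the ..v1/..v2 directory names are hypothetical stand-ins for the kubelet's own naming.

```python
import os
import tempfile
from pathlib import Path


def publish(volume: Path, version: str, files: dict[str, str]) -> None:
    """Write files into a fresh versioned directory, then repoint ..data at it."""
    target = volume / f"..{version}"
    target.mkdir()
    for name, content in files.items():
        (target / name).write_text(content)
    # Create the new link under a temporary name, then rename it over ..data.
    # The rename replaces the old link in a single step, so a reader sees
    # either the old directory or the new one, never a mix of both.
    tmp_link = volume / "..data_tmp"
    os.symlink(target.name, tmp_link)
    os.replace(tmp_link, volume / "..data")
    # The visible entries are stable symlinks that resolve through ..data.
    for name in files:
        link = volume / name
        if not link.is_symlink():
            os.symlink(os.path.join("..data", name), link)


volume = Path(tempfile.mkdtemp())
publish(volume, "v1", {"app.yaml": "log_level: info\n"})
first = (volume / "app.yaml").read_text()
publish(volume, "v2", {"app.yaml": "log_level: debug\n"})
second = (volume / "app.yaml").read_text()
print(first.strip(), "->", second.strip())
```

Because app.yaml itself is never rewritten, a watcher attached to that one file may see nothing; a watcher on the directory sees the ..data rename.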
If you use subPath to mount a single key as a specific file path (e.g., /etc/nginx/nginx.conf), the atomic symlink swap does not apply: the container gets a bind mount of the file as it existed at pod start. Updates will not be propagated to a running container. Use a full directory mount and configure your app to read from the directory, or restart the pod on ConfigMap change.
secret Volume
Mounts Secret keys as files. Identical to configMap volumes in mechanics, with one difference and one caveat: the files are backed by a tmpfs mount on the node (the data is never written to node disk), and the default permission mode is still 0644, which you should lower to 0400 for credentials.
volumes:
  - name: tls-certs
    secret:
      secretName: my-tls-secret
      defaultMode: 0400    # read-only for owner; recommended for credentials
      items:
        - key: tls.crt
          path: server.crt
        - key: tls.key
          path: server.key
      optional: false
Immutable Secrets and ConfigMaps
Setting immutable: true on a Secret or ConfigMap prevents any updates to its data. Kubelet stops watching immutable objects — removing the watch overhead for large clusters with many ConfigMaps/Secrets.
apiVersion: v1
kind: Secret
metadata:
  name: static-tls-creds
immutable: true    # cannot be changed after creation; must delete and recreate
stringData:
  api-key: "supersecret"
Create each new configuration version under a new name (e.g., app-config-v3) and mark it immutable. Update the Deployment to reference the new name. This gives you a clear history, prevents accidental mutation, and reduces kubelet overhead. To roll back, point the Deployment back to the previous name.
downwardAPI Volume
Exposes pod and container metadata as files. This overlaps with the environment-variable form of the downward API, but files suit larger values such as the full label and annotation sets, which can exceed env var size limits.
volumes:
  - name: pod-info
    downwardAPI:
      defaultMode: 0444
      items:
        - path: pod-name
          fieldRef:
            fieldPath: metadata.name
        - path: pod-namespace
          fieldRef:
            fieldPath: metadata.namespace
        # status.podIP and spec.nodeName are not valid in a downwardAPI volume;
        # the API accepts them only in env var fieldRefs
        - path: labels
          fieldRef:
            fieldPath: metadata.labels        # all labels as key="value"\n pairs
        - path: annotations
          fieldRef:
            fieldPath: metadata.annotations
        - path: cpu-limit
          resourceFieldRef:
            containerName: app
            resource: limits.cpu
            divisor: "1m"                     # express in millicores
        - path: mem-request
          resourceFieldRef:
            containerName: app
            resource: requests.memory
            divisor: "1Mi"
Available fieldRef Fields
| fieldPath | Value | Live Updates? |
|---|---|---|
| metadata.name | Pod name | No |
| metadata.namespace | Pod namespace | No |
| metadata.uid | Pod UID | No |
| metadata.labels | All labels as key="value" pairs, one per line | Yes, kubelet updates the file on label change |
| metadata.annotations | All annotations, same format | Yes |
| spec.nodeName | Node the pod is scheduled on | Not available in volumes (env var fieldRef only) |
| spec.serviceAccountName | Service account name | Not available in volumes (env var fieldRef only) |
| status.podIP | Primary pod IP | Not available in volumes (env var fieldRef only) |
| status.hostIP | Node IP | Not available in volumes (env var fieldRef only) |
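On the consuming side, the labels and annotations files need a small parser. This is a minimal sketch that assumes the key="value" line format described above and assumes the quoted values are JSON-compatible, which holds for ordinary label values but may not hold for annotations containing unusual escape sequences. The function name and the mount path in the usage note are illustrative only.

```python
import json


def parse_downward_api_map(text: str) -> dict[str, str]:
    """Parse the contents of a downwardAPI labels or annotations file."""
    result = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        # Label and annotation keys cannot contain "=", so the first "="
        # always separates the key from the quoted value.
        key, _, quoted = line.partition("=")
        result[key] = json.loads(quoted)  # assumption: JSON-style quoting
    return result


sample = 'app="web"\ntier="frontend"\n'
print(parse_downward_api_map(sample))
```

In a pod, the input would come from the mounted file, for example open("/etc/podinfo/labels").read() if the volume were mounted at /etc/podinfo. Re-read the file when it changes, since the kubelet refreshes it on label updates.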
projected Volume
A projected volume combines multiple sources (configMap, secret, downwardAPI, and serviceAccountToken) into a single directory mount. All sources appear as files at the same mount point.
volumes:
  - name: combined
    projected:
      defaultMode: 0444
      sources:
        - configMap:
            name: app-config
            items:
              - key: app.yaml
                path: config/app.yaml
        - secret:
            name: db-creds
            items:
              - key: password
                path: secrets/db-password
                mode: 0400
        - downwardAPI:
            items:
              - path: meta/pod-name
                fieldRef:
                  fieldPath: metadata.name
        - serviceAccountToken:
            audience: api              # audience for the token (OIDC aud claim)
            expirationSeconds: 3600    # token rotated automatically by kubelet
            path: token/sa-token
serviceAccountToken in projected Volume
This is the standard mechanism for injecting service account tokens into pods. It replaces the legacy Secret-based token, which was auto-mounted at the same path, /var/run/secrets/kubernetes.io/serviceaccount/token. The token is a bound service account token (audience + expiry) and is rotated by the kubelet before expiry without restarting the pod: the kubelet fetches a fresh token via the TokenRequest API and atomically replaces the file.
volumes:
  - name: kube-api-access
    projected:
      sources:
        - serviceAccountToken:
            expirationSeconds: 3607    # kubelet rotates at 80% of expiry
            path: token
        - configMap:
            name: kube-root-ca.crt     # cluster CA bundle
            items:
              - key: ca.crt
                path: ca.crt
        - downwardAPI:
            items:
              - path: namespace
                fieldRef:
                  fieldPath: metadata.namespace
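The "80% of expiry" rule in the comment above is simple arithmetic. The sketch below only illustrates that rule; refresh_due is a hypothetical function, and the real kubelet logic may apply additional conditions that are not modeled here.

```python
from datetime import datetime, timedelta, timezone


def refresh_due(issued_at: datetime, expiration_seconds: int, now: datetime) -> bool:
    """Return True once 80% of the token lifetime has elapsed."""
    return now >= issued_at + timedelta(seconds=expiration_seconds * 0.8)


issued = datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
# 80% of 3607 s is 2885.6 s, a little over 48 minutes.
print(refresh_due(issued, 3607, issued + timedelta(minutes=48)))
print(refresh_due(issued, 3607, issued + timedelta(minutes=49)))
```

For the application this means the token file changes well before the old token expires, so clients should re-read the file periodically instead of caching its contents for the life of the process.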
The legacy Secret-based token mount (auto-created kubernetes.io/service-account-token Secret) has no expiry. The projected serviceAccountToken is audience-bound and expires, so with a one-hour expirationSeconds a compromised token is valid for about an hour at most rather than indefinitely. Use automountServiceAccountToken: false plus an explicit projected volume when you need token control.
hostPath
Mounts a file or directory from the host node's filesystem directly into the container. Powerful but dangerous: misuse allows container escape to the node.
volumes:
  - name: docker-sock
    hostPath:
      path: /var/run/docker.sock
      type: Socket
  - name: host-logs
    hostPath:
      path: /var/log/pods
      type: Directory
  - name: created-dir
    hostPath:
      path: /mnt/fast-ssd/myapp
      type: DirectoryOrCreate    # creates the directory if it doesn't exist
hostPath Type Options
| type | Behavior |
|---|---|
| "" (empty) | No check; mount whatever exists (or nothing) at the path |
| DirectoryOrCreate | Create directory with 0755 if it doesn't exist; fail if path is a file |
| Directory | Path must exist and be a directory; fail otherwise |
| FileOrCreate | Create empty file with 0644 if it doesn't exist; fail if path is a directory |
| File | Path must exist and be a regular file |
| Socket | Path must exist and be a Unix socket |
| CharDevice | Path must be a character device |
| BlockDevice | Path must be a block device |
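The non-creating types in the table amount to a stat check on the node before the mount happens. The sketch below mirrors that idea in Python; it is an illustration, not the kubelet's code, the function name is hypothetical, and the two OrCreate variants are deliberately left out.

```python
import os
import stat
import tempfile


def matches_hostpath_type(path: str, type_: str) -> bool:
    """Check an existing path against a non-creating hostPath type."""
    if type_ == "":
        return True  # empty type: no check is performed
    checks = {
        "Directory": stat.S_ISDIR,
        "File": stat.S_ISREG,
        "Socket": stat.S_ISSOCK,
        "CharDevice": stat.S_ISCHR,
        "BlockDevice": stat.S_ISBLK,
    }
    try:
        mode = os.stat(path).st_mode
    except FileNotFoundError:
        return False  # these types all require the path to exist
    return checks[type_](mode)


with tempfile.TemporaryDirectory() as d:
    f = os.path.join(d, "file.txt")
    open(f, "w").close()
    print(matches_hostpath_type(d, "Directory"))
    print(matches_hostpath_type(f, "Directory"))
    print(matches_hostpath_type(os.path.join(d, "missing"), "File"))
```

Setting an explicit type turns a silent mount of the wrong thing into a pod start failure, which is usually the easier problem to debug.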
• Mounting /, /etc, or /var/run/docker.sock gives the container full node access.
• hostPath volumes bypass Kubernetes storage quotas and LimitRanges entirely.
• A pod scheduled to a different node sees a different (possibly missing) path, so workloads are not portable.
• Restrict with Pod Security admission (both the Baseline and Restricted profiles block hostPath) or OPA/Kyverno policies.
Legitimate hostPath Use Cases
- DaemonSets collecting node-level metrics (/proc, /sys, /var/log)
- Container runtime socket for container management tools (Falco, Portainer, node-level CRI tools)
- Node-local storage benchmarking tools
- CNI plugin configuration on /etc/cni/net.d
NFS Volume
Mounts an NFS export directly into the pod. No CSI driver is required because the NFS client is built into the Linux kernel, though nodes typically still need the NFS client utilities (such as mount.nfs) installed. Supports RWX (ReadWriteMany) natively.
volumes:
  - name: nfs-share
    nfs:
      server: nfs-server.prod.svc.cluster.local   # NFS server hostname or IP
      path: /exports/shared-data                  # exported path on server
      readOnly: false
The in-tree nfs volume type does not expose mount options. For production NFS with custom options (nfsvers=4.1,rsize=1048576,hard,timeo=600), use a StorageClass backed by the NFS Subdir External Provisioner or a CSI NFS driver, which pass mount options through mountOptions on the StorageClass.
NFS Production Considerations
- UID/GID mapping: NFS uses UID/GID for file ownership. If the container runs as UID 1000 but the NFS export expects UID 2000, permission errors occur. Match the IDs in the pod securityContext (runAsUser, supplementalGroups; fsGroup adds the GID to the process but does not chown files on an in-tree NFS volume) or configure NFS all_squash/anonuid on the export.
- NFSv4 vs v3: NFSv4 has stateful connections and simpler firewall rules (single port 2049). NFSv3 uses multiple ports. Default is v4 on modern systems.
- Hard vs soft mounts: Hard mounts (default) retry indefinitely on server failure, so pods hang but data is not corrupted. Soft mounts return EIO on timeout, so pods crash but recover quickly.
- Connection from pods: The in-tree NFS volume connects from the node, not the pod. Firewall rules must allow node IPs to reach the NFS server on port 2049.
iSCSI Volume
volumes:
  - name: iscsi-vol
    iscsi:
      targetPortal: 192.168.1.100:3260                # iSCSI target IP:port
      iqn: iqn.2023-01.com.example:storage.target.1   # iSCSI Qualified Name
      lun: 0                                          # LUN number
      fsType: ext4
      readOnly: false
      chapAuthDiscovery: true                         # enable CHAP for discovery
      chapAuthSession: true                           # enable CHAP per session
      secretRef:
        name: chap-secret                             # Secret holding the CHAP discovery and session credentials
iSCSI volumes can be mounted read-write by only a single node (RWO); several nodes may share a volume only if every mount is read-only. Use for legacy SAN storage integration. For new deployments, prefer a CSI driver (e.g., iSCSI-based OpenEBS), which handles node failure, topology awareness, and monitoring.
Generic Ephemeral Volumes
A generic ephemeral volume embeds a PVC template directly in the pod spec. Kubernetes creates the PVC when the pod is created and garbage-collects it when the pod is deleted (via owner reference). This enables ephemeral use of any StorageClass, including cloud SSDs, without pre-creating PVCs.
volumes:
  - name: scratch
    ephemeral:
      volumeClaimTemplate:
        metadata:
          labels:
            type: scratch-volume
        spec:
          accessModes: [ReadWriteOnce]
          storageClassName: gp3-encrypted
          resources:
            requests:
              storage: 50Gi
The created PVC is named <pod-name>-<volume-name> (e.g., my-pod-scratch). It has an owner reference to the pod, so it is deleted automatically when the pod is deleted. If the pod is part of a ReplicaSet/Deployment, each replica gets its own PVC.
CSI Ephemeral Volumes
An inline CSI volume: no PVC or PV objects are created. The CSI driver must declare volumeLifecycleModes: [Ephemeral] in its CSIDriver object. The most prominent use case is the Secrets Store CSI Driver, which mounts secrets from Vault, AWS Secrets Manager, Azure Key Vault, or GCP Secret Manager as files.
volumes:
  - name: secrets-store
    csi:
      driver: secrets-store.csi.k8s.io
      readOnly: true
      volumeAttributes:
        secretProviderClass: my-aws-secrets   # SecretProviderClass CRD reference
---
apiVersion: secrets-store.csi.x-k8s.io/v1
kind: SecretProviderClass
metadata:
  name: my-aws-secrets
spec:
  provider: aws
  parameters:
    objects: |
      - objectName: "prod/db/password"
        objectType: "secretsmanager"
        objectAlias: "db-password"
      - objectName: "prod/tls/cert"
        objectType: "secretsmanager"
        objectAlias: "tls.crt"
The Secrets Store CSI Driver also supports syncing the mounted secret into a Kubernetes Secret object (for use as env vars or imagePullSecrets), configurable via secretObjects in the SecretProviderClass.
Image Volumes (Alpha, 1.31+)
Image volumes (alpha in 1.31, requires the ImageVolume feature gate) mount an OCI container image as a read-only volume. Useful for distributing large, immutable datasets, ML models, or binary assets packaged as OCI images without including them in the application container image.
volumes:
  - name: ml-model
    image:
      reference: registry.example.com/models/resnet50:v3
      pullPolicy: IfNotPresent
The image is pulled by the container runtime and mounted read-only. No write access. The image's filesystem layers are overlaid exactly as they are in a container image (overlayfs), but exposed as a bind-mount into the pod.
subPath and subPathExpr
By default, a volumeMount mounts the entire volume at the target path. subPath mounts only a specific file or subdirectory from within the volume, while still mounting it at mountPath.
Common subPath Use Cases
Multiple containers, same volume, different subdirs
volumes:
  - name: data
    emptyDir: {}
containers:
  - name: app
    volumeMounts:
      - name: data
        mountPath: /app/data
        subPath: app          # mounts data/app/ → /app/data
  - name: sidecar
    volumeMounts:
      - name: data
        mountPath: /sidecar/data
        subPath: sidecar      # mounts data/sidecar/ → /sidecar/data
Single ConfigMap key to specific file path
volumes:
  - name: nginx-conf
    configMap:
      name: nginx-config
containers:
  - name: nginx
    volumeMounts:
      - name: nginx-conf
        mountPath: /etc/nginx/nginx.conf
        subPath: nginx.conf   # mount only key nginx.conf
        # NOTE: live-reload does NOT work
subPathExpr
subPathExpr uses environment variable expansion to build the subPath dynamically. Requires $(VAR_NAME) syntax — not shell $VAR.
env:
  - name: POD_NAME
    valueFrom:
      fieldRef:
        fieldPath: metadata.name
volumeMounts:
  - name: shared-logs
    mountPath: /var/log/pods
    subPathExpr: $(POD_NAME)   # each pod writes to its own subdirectory
• ConfigMap/Secret live-reload (atomic symlink swap) does not work with subPath. The file is fixed at pod creation.
• Avoid subPath with a projected volume's serviceAccountToken source: a subPath mount does not receive updates, so the mounted token would never be rotated.
• subPathExpr requires the env var to be defined in the same container's env field (not just set in the environment).
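The $(VAR_NAME) syntax used by subPathExpr can be illustrated with a few lines of Python. This is a sketch of the expansion rules as commonly documented ($(NAME) is substituted, $$ is an escaped literal $, shell-style $NAME is left alone). The expand function is hypothetical, and as an assumption of this sketch unknown names are left unchanged, whereas the kubelet may reject them.

```python
import re

# Matches either the "$$" escape or a "$(NAME)" reference.
_PATTERN = re.compile(r"\$\$|\$\(([^)]+)\)")


def expand(expr: str, env: dict[str, str]) -> str:
    """Expand $(NAME) references in expr using the container's env mapping."""
    def replace(match: re.Match) -> str:
        if match.group(0) == "$$":
            return "$"  # "$$" yields a literal "$"
        name = match.group(1)
        # Assumption: unknown names stay as written in this sketch.
        return env.get(name, match.group(0))
    return _PATTERN.sub(replace, expr)


env = {"POD_NAME": "web-0"}
print(expand("$(POD_NAME)", env))
print(expand("logs/$(POD_NAME)", env))
print(expand("$POD_NAME", env))      # shell syntax is not expanded
print(expand("$$(POD_NAME)", env))   # escaped: stays a literal reference
```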
volumeMounts Fields
volumeMounts:
  - name: data                 # must match a volume name in spec.volumes
    mountPath: /var/data       # absolute path inside container
    subPath: ""                # optional: subdirectory within volume
    readOnly: false            # default false; true = read-only bind mount
    mountPropagation: None     # None | HostToContainer | Bidirectional
mountPropagation
Controls whether mount events (new bind mounts) inside the container or on the host are visible across the boundary:
| Value | Container sees host mounts? | Host sees container mounts? | Use Case |
|---|---|---|---|
| None (default) | No (isolation) | No | 99% of workloads; complete isolation |
| HostToContainer | Yes, new mounts on host under this path are visible | No | Monitoring agents that need to see node-level mounts (e.g., cAdvisor reading /proc/mounts) |
| Bidirectional | Yes | Yes, container mounts propagate to host | FUSE filesystems, CSI node plugins that mount on behalf of other pods. Requires privileged: true. |
Kubernetes rejects Bidirectional mountPropagation unless the container has securityContext.privileged: true. Bidirectional propagation means the container can create mounts visible to the host, which is a significant privilege. Only use it for CSI node plugin DaemonSets that explicitly need it.
Init Containers and Shared Volumes
Init containers run to completion before any app containers start. They share the same pod volumes — the classic pattern is an init container that populates a volume, which the main container then reads from.
Init Container Volume Patterns
Pattern 1: git clone into emptyDir
initContainers:
  - name: git-clone
    image: alpine/git:latest
    command: [git, clone, "https://github.com/org/app-config", /config]
    volumeMounts:
      - name: config-data
        mountPath: /config
containers:
  - name: app
    image: myapp:latest
    volumeMounts:
      - name: config-data
        mountPath: /app/config
        readOnly: true
volumes:
  - name: config-data
    emptyDir: {}
Pattern 2: Wait for dependency + write signal file
initContainers:
  - name: wait-for-db
    image: busybox
    command:
      - sh
      - -c
      - |
        until nc -z postgres-svc 5432; do
          echo "waiting for database..."; sleep 2
        done
        echo "DB ready" > /signal/ready
    volumeMounts:
      - name: signal
        mountPath: /signal
containers:
  - name: app
    volumeMounts:
      - name: signal
        mountPath: /signal
        readOnly: true
volumes:
  - name: signal
    emptyDir: {}
Pattern 3: Certificate generation
initContainers:
  - name: cert-gen
    image: cfssl/cfssl
    command: [/bin/sh, -c, "cfssl gencert ... | cfssljson -bare /certs/server"]
    volumeMounts:
      - name: certs
        mountPath: /certs
containers:
  - name: app
    volumeMounts:
      - name: certs
        mountPath: /etc/ssl/app
        readOnly: true
volumes:
  - name: certs
    emptyDir:
      medium: Memory    # certs in RAM; never hit disk
Sidecar Containers and Shared Volumes
Native sidecar support arrived as alpha in Kubernetes 1.28 and has been enabled by default since 1.29: an init container with restartPolicy: Always starts before the main containers and stays running. The log-shipping pattern is the canonical use case:
initContainers:
  - name: log-shipper          # native sidecar: restartPolicy: Always
    restartPolicy: Always
    image: fluent/fluent-bit:latest
    volumeMounts:
      - name: log-dir
        mountPath: /var/log/app
        readOnly: true
      - name: fluent-config
        mountPath: /fluent-bit/etc
containers:
  - name: app
    image: myapp:latest
    volumeMounts:
      - name: log-dir
        mountPath: /var/log/app   # app writes here; sidecar reads from same dir
volumes:
  - name: log-dir
    emptyDir: {}
  - name: fluent-config
    configMap:
      name: fluent-bit-config
The sidecar starts before the main container (blocking until the sidecar's startup probe passes if configured), and is terminated after the main container exits — ensuring all logs are flushed before the sidecar exits.
Volume Ownership: fsGroup and fsGroupChangePolicy
fsGroup in the pod's securityContext sets the supplemental GID for the pod and chowns all files in mounted volumes to that GID on mount. This solves the common problem where a container running as a non-root user (UID 1000) can't write to a volume provisioned with root ownership.
securityContext:
  runAsUser: 1000
  runAsGroup: 1000
  fsGroup: 2000                         # all volume files are chowned to GID 2000
  fsGroupChangePolicy: OnRootMismatch   # default: Always (chown every mount)
fsGroupChangePolicy
| Policy | Behavior | Performance |
|---|---|---|
| Always (default) | Recursively chown all files on every mount, even if ownership is already correct | Slow for large volumes (millions of files) |
| OnRootMismatch | Only chown if the root directory's ownership/permissions don't match the expected fsGroup | Fast after first mount; recommended for large PVCs |
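The difference between the two policies comes down to one decision made at mount time. The sketch below is loosely modeled on that decision and is an assumption, not the kubelet's code: the function name is hypothetical, and the exact permission bits the kubelet compares may differ from the group read/write/execute plus setgid set used here.

```python
import stat


def needs_recursive_chown(root_gid: int, root_mode: int, fs_group: int,
                          policy: str = "Always") -> bool:
    """Decide whether the whole volume must be walked and re-owned."""
    if policy != "OnRootMismatch":
        return True  # Always: walk every file on every mount
    # Assumed expectation for the volume root: group rwx plus the setgid bit.
    wanted = stat.S_IRGRP | stat.S_IWGRP | stat.S_IXGRP | stat.S_ISGID
    return root_gid != fs_group or (root_mode & wanted) != wanted


# Root already owned by GID 2000 with mode 2775: nothing to do.
print(needs_recursive_chown(2000, 0o2775, 2000, "OnRootMismatch"))
# Freshly provisioned, root-owned volume: one full walk is needed.
print(needs_recursive_chown(0, 0o755, 2000, "OnRootMismatch"))
# Always ignores the current state of the root directory.
print(needs_recursive_chown(2000, 0o2775, 2000, "Always"))
```

The cost of OnRootMismatch is therefore a single stat of the volume root, independent of how many files the volume holds.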
On a volume with many files, fsGroupChangePolicy: Always can spend minutes chowning files on every pod restart. This blocks the containers from starting, so the pod sits in ContainerCreating and looks hung. Use OnRootMismatch for database pods, or set the fsGroup correctly at PVC creation time.
supplementalGroups
securityContext:
  fsGroup: 2000
  supplementalGroups: [3000, 4000]   # additional GIDs added to the process's group set
Deprecated and Removed Volume Types
| Volume Type | Status | Replacement |
|---|---|---|
| gitRepo | Deprecated; disabled by default since 1.33 | Init container with git clone |
| flocker | Removed (1.25+) | CSI driver |
| glusterfs | Removed (1.26+) | CSI driver (glusterfs-csi) |
| azureFile (in-tree) | Removed (1.30+) | file.csi.azure.com CSI driver |
| azureDisk (in-tree) | Removed (1.27+) | disk.csi.azure.com CSI driver |
| awsElasticBlockStore (in-tree) | Removed (1.27+) | ebs.csi.aws.com CSI driver |
| gcePersistentDisk (in-tree) | Removed (1.28+) | pd.csi.storage.gke.io CSI driver |
| cephfs / rbd (in-tree) | Removed (1.31+) | cephfs.csi.ceph.com / rbd.csi.ceph.com |
| portworxVolume | Deprecated 1.25 | Portworx CSI driver |
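If you have PVs that use a removed in-tree type (e.g., awsElasticBlockStore) and upgrade past 1.27, volume operations for those PVs are redirected to the matching CSI driver through CSI migration, so that driver must be installed before the upgrade or mounts will fail. Types removed without a migration path (such as flocker or glusterfs) must be reprovisioned on a supported driver before upgrading.
Volume Size Limits and Ephemeral Storage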
A pod is evicted when one of its emptyDir volumes exceeds its sizeLimit. Without sizeLimit, a disk-backed emptyDir is unbounded but still counts against the node's ephemeral storage. A container's resources.limits.ephemeral-storage covers that container's writable layer and log files; emptyDir usage is added at the pod level, where the total is compared against the sum of all container limits.
containers:
  - name: app
    resources:
      requests:
        ephemeral-storage: 1Gi
      limits:
        ephemeral-storage: 2Gi   # enforced by kubelet; evicts pod if exceeded
volumes:
  - name: tmp
    emptyDir:
      sizeLimit: 500Mi           # also counts toward the pod-level ephemeral storage total
The kubelet checks ephemeral storage usage periodically (default 1 minute). On eviction, the pod is terminated with Reason: Evicted and Message: Pod ephemeral local storage usage exceeds the total limit of containers.
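The eviction conditions can be written down as three comparisons. The sketch below is an assumed model based on the upstream documentation of local ephemeral storage, not the kubelet's implementation: it treats per-container usage as writable layer plus logs, adds disk-backed emptyDir usage only at the pod level, and uses hypothetical field names with all sizes in MiB.

```python
def eviction_reasons(containers: list[dict], empty_dirs: list[dict]) -> list[str]:
    """Return the limits a pod currently violates (all sizes in MiB).

    containers: [{"name", "used", "limit"}], used = writable layer + logs
    empty_dirs: [{"name", "used", "size_limit"}], size_limit may be None
    """
    reasons = []
    for vol in empty_dirs:
        if vol["size_limit"] is not None and vol["used"] > vol["size_limit"]:
            reasons.append(f"emptyDir {vol['name']} exceeds its sizeLimit")
    for c in containers:
        if c["used"] > c["limit"]:
            reasons.append(f"container {c['name']} exceeds its ephemeral-storage limit")
    # Pod-level check: container usage plus emptyDir usage against summed limits.
    pod_used = sum(c["used"] for c in containers) + sum(v["used"] for v in empty_dirs)
    pod_limit = sum(c["limit"] for c in containers)
    if pod_used > pod_limit:
        reasons.append("pod exceeds the sum of its container limits")
    return reasons


containers = [{"name": "app", "used": 1200, "limit": 2048}]
empty_dirs = [{"name": "tmp", "used": 600, "size_limit": 500}]
print(eviction_reasons(containers, empty_dirs))
```

In this example the pod total (1800 MiB) is still under the 2048 MiB limit, so only the emptyDir sizeLimit is reported as violated.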
Troubleshooting Runbooks
Runbook: ConfigMap Volume Not Updating in Container
# Verify the ConfigMap was actually updated
kubectl get cm <name> -o yaml | grep -A 5 data
# Check if subPath is in use — this blocks updates
kubectl get pod <name> -o yaml | grep subPath
# If subPath is present → updates won't propagate → must restart pod
# If no subPath, check kubelet sync delay
# Wait up to 2 minutes after ConfigMap update
# If the change must land immediately, restart the workload so new pods mount fresh content
kubectl rollout restart deployment/<name>
Runbook: Permission Denied on Volume Mount
# Check what UID/GID the container runs as
kubectl exec -it <pod> -- id
# uid=1000(app) gid=1000(app)
# Check volume file ownership
kubectl exec -it <pod> -- ls -la /var/data
# drwxr-xr-x 2 root root 4096 Jan 1 00:00 . ← root-owned, GID 0
# Fix: add fsGroup to pod securityContext
# spec.securityContext.fsGroup: 1000
# Then rolling restart
# For PVs provisioned with specific UID: check CSI driver fsGroup support
# fsGroupPolicy: File = chown by kubelet; None = driver handles it
Runbook: emptyDir Memory Exhaustion (OOMKill)
# Symptoms: container OOMKilled despite low heap usage
# Cause: large writes to medium:Memory emptyDir counted in container memory
# Check tmpfs mounts in pod
kubectl exec -it <pod> -- df -h | grep tmpfs
# Add sizeLimit to the emptyDir to cap memory usage
# volumes:
#   - name: cache
#     emptyDir:
#       medium: Memory
#       sizeLimit: 256Mi   # prevents this volume from eating container memory
Runbook: Volume Stuck — Previous Pod's Mount Not Cleaned Up
# Symptoms: new pod stuck in ContainerCreating with "already mounted" error
# Cause: previous pod on same node crashed without unmounting volume
# Check node events
kubectl describe node <node> | grep -i mount
# Force delete the stuck pod (use only if pod is truly gone from node)
kubectl delete pod <old-pod> --force --grace-period=0
# If VolumeAttachment is stuck (CSI block volumes)
kubectl get volumeattachment
kubectl delete volumeattachment <stuck-attachment>
# If the node is shut down or unreachable, mark it out of service so that
# its volumes are force-detached (non-graceful node shutdown handling):
kubectl taint nodes <node> node.kubernetes.io/out-of-service=nodeshutdown:NoExecute
Runbook: Wrong fsGroup — Files Not Owned by Expected GID
# Verify pod securityContext
kubectl get pod <name> -o jsonpath='{.spec.securityContext}'
# Check if volume driver supports fsGroup
kubectl get csidriver <driver> -o yaml | grep fsGroupPolicy
# ReadWriteOnceWithFSType = only chown if fsType is set AND accessMode is RWO
# File = always chown (most drivers)
# None = kubelet does NOT chown — driver handles it (e.g., NFS with no-root-squash)
# For NFS: fsGroup has no effect unless nfs driver is configured for root-squash-off
# Set GID at NFS export level instead
Best Practices
- Prefer projected volumes over separate configMap/secret/downwardAPI mounts when you need multiple sources: one volume, one directory, less cognitive overhead.
- Never use subPath for live-reloaded config. Mount the full directory and configure the application to read from it. Use inotify/fsnotify in the app to watch the directory, not individual files.
- Mark configuration ConfigMaps and Secrets immutable when they are versioned. Use the version in the name. This removes kubelet watch overhead and prevents accidental mutation.
- Use fsGroupChangePolicy: OnRootMismatch for any PVC with more than a few thousand files. The default Always causes startup delays proportional to file count.
- Avoid hostPath in application workloads. Enforce this with a Kyverno or OPA Gatekeeper policy that blocks hostPath except for designated DaemonSet namespaces.
- Set sizeLimit on emptyDir volumes used for caches or scratch space. An unbounded emptyDir used by a runaway process can evict the entire pod (and others on the node) via ephemeral storage pressure.
- Use native sidecar containers (1.29+) with restartPolicy: Always for log shippers and metric collectors instead of regular sidecars. They have correct startup/shutdown ordering, and pod termination blocks until the sidecar exits.