Volume Attach Flow
Overview
Traces the detailed attach/detach controller flow — from a pod being scheduled to a node, through the VolumeAttachment lifecycle, ControllerPublishVolume, and NodeStageVolume, to the volume being ready for the kubelet to bind-mount into the pod.
Attach-Detach Controller Architecture
kube-controller-manager
└── AttachDetach controller (AD controller)
│
│ Responsibility:
│ - Watches pods (which volumes they need)
│ - Watches nodes (which volumes are attached)
│ - Creates/deletes VolumeAttachment objects
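│ - In-tree plugins: calls the cloud attach/detach API directly
│ - CSI: does not call the driver itself; external-attacher issues ControllerPublishVolume / ControllerUnpublishVolume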
│
├── Desired state: all pods on node X need volumes [v1, v2]
└── Actual state: node X has [v1] attached
→ attach v2
In-tree volumes (e.g., awsElasticBlockStore) were managed directly by AD controller.
CSI volumes: AD controller creates VolumeAttachment → external-attacher watches and calls CSI.
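The desired-versus-actual comparison above can be sketched in a few lines of Python. This is a simplified model, not the controller's real code: `plan_actions` and the dictionary shapes are hypothetical names chosen for illustration.

```python
from typing import Dict, List, Set, Tuple


def plan_actions(desired: Dict[str, Set[str]],
                 actual: Dict[str, Set[str]]) -> List[Tuple[str, str, str]]:
    """Return (action, node, volume) tuples that move actual state toward desired state."""
    actions = []
    for node in sorted(set(desired) | set(actual)):
        want = desired.get(node, set())
        have = actual.get(node, set())
        # For each node, list detaches before attaches.
        for vol in sorted(have - want):
            actions.append(("detach", node, vol))
        for vol in sorted(want - have):
            actions.append(("attach", node, vol))
    return actions


# State from the diagram: node X needs [v1, v2] but only has [v1].
desired = {"node-x": {"v1", "v2"}}
actual = {"node-x": {"v1"}}
print(plan_actions(desired, actual))
```

With the state from the diagram, the plan contains a single attach of v2 to node X.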
Full Volume Attach Sequence
API Server AD Controller external-attacher CSI Controller kubelet etcd
│ │ │ │ │ │
│ [Pod scheduled to worker-3, │ │ │ │
│ references PVC backed by │ │ │ │
│ PV vol-0abc123] │ │ │ │
│ │ │ │ │ │
│──WATCH (pods) ─►│ │ │ │ │
│ pod.spec.nodeName=worker-3 │ │ │ │
│ pod.spec.volumes │ │ │ │
│ │ │ │ │ │
│ ┌─── AD Controller reconcile ──────────────────────────────────────┐ │
│ │ Desired: worker-3 needs vol-0abc123 │ │
│ │ Actual: worker-3 has [] (nothing attached) │ │
│ │ Action: create VolumeAttachment │ │
│ └──────────────────────────────────────────────────────────────────┘ │
│◄── CREATE VolumeAttachment ─────│ │ │ │
│ spec: │ │ │ │
│ attacher: ebs.csi.aws.com │ │ │ │
│ source.persistentVolumeName: pv-vol-0abc123 │ │ │
│ nodeName: worker-3 │ │ │ │
│ status.attached: false │ │ │ │
│──WRITE VA ──────────────────────────────────────────────────────────────────►│
│ │ │ │ │ │
│──WATCH VA ───────────────────────► │ │ │
│ (VolumeAttachment created, │ │ │ │
│ attached=false) │ │ │ │
│ │ │ │ │ │
│ │ ┌─── external-attacher reconcile ──────────────────┐ │
│ │ │ VA.spec.attacher = ebs.csi.aws.com │ │
│ │ │ VA.status.attached = false │ │
│ │ │ Action: call ControllerPublishVolume │ │
│ │ └─────────────────────────────────────────────────┘ │
│ │ │──ControllerPublishVolume──────► │
│ │ │ volumeId: vol-0abc123 │ │
│ │ │ nodeId: i-0worker3 │ │
│ │ │ volumeCapability: rw-once │ │
│ │ │ │──EC2 API─►│ │
│ │ │ │ AttachVolume │
│ │ │ │ vol-0abc123 │
│ │ │ │ to i-0worker3 │
│ │ │◄── {attached:true}──────────── │
│ │ │ publishContext: │ │
│ │ │ devicePath: /dev/xvdba │ │
│ │ │ │ │
│◄── PATCH VA.status ──────────────│ │ │ │
│ .attached: true │ │ │ │
│ .attachmentMetadata: │ │ │ │
│ devicePath: /dev/xvdba │ │ │ │
│──WRITE VA ──────────────────────────────────────────────────────────────────►│
│ │ │ │ │ │
│──WATCH VA.status.attached=true ──────────────────────────────────► │
│ │ │ │ [kubelet sees VA attached]
│ │ │ │ │ │
│ │ │ │ ┌─── kubelet volume manager ─┐
│ │ │ │ │ VA.attached=true │
│ │ │ │ │ PV devicePath=/dev/xvdba │
│ │ │ │ │ Call NodeStageVolume │
│ │ │ │ └────────────────────────────┘
│ │ │ │ │──NodeStageVolume──►
│ │ │ │ │ (format ext4 + │
│ │ │ │ │ mount to staging │
│ │ │ │ │ path) │
│ │ │ │ │◄── staged ──────────│
│ │ │ │ │ │
│ │ │ │ │──NodePublishVolume──►
│ │ │ │ │ (bind-mount staging│
│ │ │ │ │ → pod /data) │
│ │ │ │ │◄── published ───────│
│ │ │ │ │ │
│ [kubelet starts pod containers]
pod running with /data mounted from EBS vol-0abc123
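The external-attacher step in the sequence above (see an unattached VolumeAttachment, call ControllerPublishVolume, patch the status) can be modeled roughly as follows. This is a sketch under assumptions: `sync_attach`, the lookup dictionaries, and the `controller_publish` callback are hypothetical stand-ins for the real sidecar and gRPC call, and the VolumeAttachment is a plain dict.

```python
from typing import Callable, Dict


def sync_attach(va: dict,
                handle_for_pv: Dict[str, str],
                node_ids: Dict[str, str],
                controller_publish: Callable[[str, str], Dict[str, str]]) -> dict:
    """Attach the volume named by a VolumeAttachment-like dict; return a patched copy."""
    status = dict(va.get("status", {}))
    if status.get("attached"):
        return va  # already attached, nothing to do
    volume_id = handle_for_pv[va["spec"]["source"]["persistentVolumeName"]]
    node_id = node_ids[va["spec"]["nodeName"]]
    try:
        publish_context = controller_publish(volume_id, node_id)
        status.update(attached=True, attachmentMetadata=publish_context)
        status.pop("attachError", None)
    except RuntimeError as err:
        # A failed call leaves attached=false and records the error for a later retry.
        status.update(attached=False, attachError={"message": str(err)})
    return {**va, "status": status}


def fake_publish(volume_id: str, node_id: str) -> Dict[str, str]:
    """Stand-in for the CSI driver; a real driver would call the cloud API here."""
    return {"devicePath": "/dev/xvdba"}


va = {
    "spec": {
        "attacher": "ebs.csi.aws.com",
        "nodeName": "worker-3",
        "source": {"persistentVolumeName": "pv-vol-0abc123"},
    },
    "status": {"attached": False},
}
patched = sync_attach(va, {"pv-vol-0abc123": "vol-0abc123"},
                      {"worker-3": "i-0worker3"}, fake_publish)
print(patched["status"])
```

The original object is left untouched; only the returned copy carries `attached: true` and the publish context, which mirrors the PATCH VA.status step in the diagram.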
VolumeAttachment Object Lifecycle
# VolumeAttachment — created by AD controller, updated by external-attacher
apiVersion: storage.k8s.io/v1
kind: VolumeAttachment
metadata:
name: csi-abc123def456
# cluster-scoped — no namespace
spec:
attacher: ebs.csi.aws.com # CSI driver name
source:
persistentVolumeName: pv-vol-0abc123 # which PV to attach
nodeName: worker-3 # target node
status:
attached: true # set by external-attacher after CSI call
attachmentMetadata:
devicePath: /dev/xvdba # device path on node (from publishContext)
detachError: null
attachError: null
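To make the shape of the object concrete, the sketch below builds the same manifest as a Python dict. The naming scheme (`csi-` followed by a SHA-256 of volume handle, driver name, and node name) is an assumption modeled on the in-tree CSI attacher's helper and should be verified against your Kubernetes version; `attachment_name` and `build_volume_attachment` are hypothetical helper names.

```python
import hashlib


def attachment_name(volume_handle: str, driver: str, node_name: str) -> str:
    """Derive a deterministic VolumeAttachment name (assumed scheme, see lead-in)."""
    digest = hashlib.sha256(f"{volume_handle}{driver}{node_name}".encode()).hexdigest()
    return f"csi-{digest}"


def build_volume_attachment(pv_name: str, volume_handle: str,
                            driver: str, node_name: str) -> dict:
    """Return a VolumeAttachment manifest as a plain dict (cluster-scoped, no namespace)."""
    return {
        "apiVersion": "storage.k8s.io/v1",
        "kind": "VolumeAttachment",
        "metadata": {"name": attachment_name(volume_handle, driver, node_name)},
        "spec": {
            "attacher": driver,
            "source": {"persistentVolumeName": pv_name},
            "nodeName": node_name,
        },
    }


manifest = build_volume_attachment("pv-vol-0abc123", "vol-0abc123",
                                   "ebs.csi.aws.com", "worker-3")
print(manifest["metadata"]["name"])
```

A deterministic name means a repeated reconcile for the same volume, driver, and node maps to the same object rather than creating duplicates.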
Detach Sequence — Pod Deleted
API Server AD Controller external-attacher CSI Controller etcd
│ │ │ │ │
│ [Pod deleted / node drain] │ │ │
│──WATCH (pod deleted) ──────────► │ │ │
│ │ │ │ │
│ ┌─── AD Controller reconcile ─────────────────────────────┐ │
│ │ Desired: worker-3 needs [] (pod gone) │ │
│ │ Actual: worker-3 has [vol-0abc123] │ │
│ │ Action: delete VolumeAttachment │ │
│ └──────────────────────────────────────────────────────────┘ │
│◄── DELETE VA ────│ │ │ │
│ │ │ │ │
│ [VA deletion sets deletionTimestamp — finalizer blocks it] │
│ │ │ │ │
│──WATCH VA (terminating) ──────────► │ │
│ │ │ │ │
│ │ ┌─── external-attacher reconcile ───────┐ │
│ │ │ VA has deletionTimestamp │ │
│ │ │ VA.status.attached = true │ │
│ │ │ Action: ControllerUnpublishVolume │ │
│ │ └────────────────────────────────────────┘ │
│ │ │──ControllerUnpublishVolume─────► │
│ │ │ volumeId: vol-0abc123 │
│ │ │ nodeId: i-0worker3 │
│ │ │ │──EC2 Detach──►
│ │ │◄── OK ─────────────│ │
│◄── PATCH VA: remove finalizer ────│ │ │
│──DELETE VA ───────────────────────────────────────────────────────────►│
│ │ │ │ │
│ volume detached from node
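The finalizer handshake in the detach sequence can be sketched as below. It is a simplified model, not the sidecar's implementation: `sync_detach` and the `controller_unpublish` callback are hypothetical, and the finalizer string is illustrative.

```python
from typing import Callable, Optional

# Illustrative finalizer string; real clusters derive it from the driver name.
FINALIZER = "external-attacher/ebs-csi-aws-com"


def sync_detach(va: dict, volume_id: str, node_id: str,
                controller_unpublish: Callable[[str, str], None]) -> Optional[dict]:
    """Return the updated VA, or None once no finalizer blocks its deletion."""
    meta = va["metadata"]
    if "deletionTimestamp" not in meta:
        return va  # not being deleted, nothing to detach
    finalizers = list(meta.get("finalizers", []))
    status = dict(va.get("status", {}))
    if FINALIZER in finalizers:
        try:
            controller_unpublish(volume_id, node_id)
        except RuntimeError as err:
            # Keep the finalizer so the object survives and the detach is retried.
            status["detachError"] = {"message": str(err)}
            return {**va, "status": status}
        finalizers.remove(FINALIZER)
        status["attached"] = False
    if not finalizers:
        return None  # the API server can now remove the object
    return {**va, "metadata": {**meta, "finalizers": finalizers}, "status": status}


detached = []
va = {
    "metadata": {"deletionTimestamp": "2024-01-01T00:00:00Z", "finalizers": [FINALIZER]},
    "status": {"attached": True},
}
result = sync_detach(va, "vol-0abc123", "i-0worker3",
                     lambda vol, node: detached.append((vol, node)))
print(result, detached)
```

The key property is that the finalizer is removed only after the unpublish call succeeds, so a failed detach never lets the VolumeAttachment disappear while the volume is still attached.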
AD Controller vs kubelet Volume Manager
AttachDetach Controller (runs in kube-controller-manager):
- Manages ATTACH and DETACH (cloud-level: EC2 AttachVolume / DetachVolume)
- Runs once per cluster — not per node
- Creates VolumeAttachment objects
kubelet Volume Manager (runs on each node):
- Manages MOUNT and UNMOUNT (node-level: mkfs, mount, bind-mount)
- NodeStageVolume: format filesystem, mount to staging path
- NodePublishVolume: bind-mount staging path into pod directory
- NodeUnpublishVolume, then NodeUnstageVolume: cleanup on pod delete (reverse of the mount order)
Flag: --enable-controller-attach-detach (kubelet; default true)
true (default): the kubelet leaves attach/detach to the central AD controller (recommended).
false: the kubelet performs its own attach/detach (not recommended for CSI).
The same setting is available as enableControllerAttachDetach in the kubelet config file.
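The node-level ordering listed for the kubelet Volume Manager (stage once per volume, publish once per pod, then unpublish and unstage in reverse) can be illustrated with a small bookkeeping model. `NodeVolumes` is a hypothetical class, and the sketch assumes a driver that implements staging.

```python
class NodeVolumes:
    """Tracks which pods use each volume and records the CSI node calls in order."""

    def __init__(self):
        self.published = {}  # volume_id -> set of pod UIDs using it
        self.calls = []      # CSI node calls, in the order they would be issued

    def mount_for_pod(self, volume_id, pod_uid):
        pods = self.published.setdefault(volume_id, set())
        if not pods:
            # First consumer on this node: format and mount to the staging path.
            self.calls.append(("NodeStageVolume", volume_id))
        if pod_uid not in pods:
            pods.add(pod_uid)
            # Bind-mount the staging path into this pod's directory.
            self.calls.append(("NodePublishVolume", volume_id, pod_uid))

    def unmount_for_pod(self, volume_id, pod_uid):
        pods = self.published.get(volume_id, set())
        if pod_uid not in pods:
            return
        pods.remove(pod_uid)
        self.calls.append(("NodeUnpublishVolume", volume_id, pod_uid))
        if not pods:
            # Last consumer gone: the staging mount can be torn down.
            del self.published[volume_id]
            self.calls.append(("NodeUnstageVolume", volume_id))


node = NodeVolumes()
node.mount_for_pod("vol-0abc123", "pod-a")
node.mount_for_pod("vol-0abc123", "pod-b")
node.unmount_for_pod("vol-0abc123", "pod-a")
node.unmount_for_pod("vol-0abc123", "pod-b")
for call in node.calls:
    print(call)
```

Two pods sharing one volume on the same node produce one stage, two publishes, two unpublishes, and a single unstage at the end.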
Multi-Attach Protection (RWO)
EBS volumes are ReadWriteOnce (RWO) — can be attached to only ONE node.
Race condition:
Pod on node-A has vol-abc mounted.
Pod migrated to node-B.
node-B tries to attach vol-abc → AWS API returns error: already attached.
Protection mechanisms:
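1. AD controller: while vol-abc is still attached to node-A, it reports a Multi-Attach error and defers the attach to node-B until the detach completes
2. CSI driver / cloud API: a ControllerPublishVolume for node-B that races ahead fails because the volume is in use; external-attacher records the error in VA.status.attachError and retries
3. Attach limits (CSINode allocatable.count): the scheduler avoids placing pods beyond a node's volume limit (AWS: many Nitro instance types share 28 attachment slots across ENIs, EBS volumes, and instance-store NVMe)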
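Common stuck scenario:
Node-A crashes (not gracefully drained)
→ NodeLifecycleController taints node-A; the pod is evicted and its replacement is scheduled to node-B
→ AD controller wants vol-abc on node-B
→ vol-abc is still recorded as attached to crashed node-A
→ Multi-Attach error: no detach is issued while node-A is still registered and its pod is not confirmed terminated
Fix (only after confirming node-A is really powered off; otherwise two writers can corrupt the filesystem):
kubectl delete node node-a ← removes node from API server
→ AD controller sees no node-A → deletes VA for node-A
→ external-attacher detaches, then the attach to node-B can proceed
Alternative on Kubernetes 1.28+: taint the node with node.kubernetes.io/out-of-service=nodeshutdown:NoExecute to trigger non-graceful shutdown handling.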
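The first protection mechanism (defer a new attach while an RWO volume is still attached elsewhere) boils down to a small decision function. The sketch below is a simplified model with a hypothetical name, `attach_decision`; the real controller considers more cases, such as volumes with no access modes set.

```python
from typing import Iterable, Set


def attach_decision(volume: str, target_node: str,
                    attached_nodes: Set[str], access_modes: Iterable[str]) -> str:
    """Decide whether a volume may be attached to target_node right now."""
    others = sorted(n for n in attached_nodes if n != target_node)
    # Only ReadWriteMany / ReadOnlyMany volumes may sit on several nodes at once.
    multi_ok = bool({"ReadWriteMany", "ReadOnlyMany"} & set(access_modes))
    if target_node in attached_nodes:
        return "already-attached"
    if others and not multi_ok:
        return f"multi-attach-error: {volume} in use on {', '.join(others)}"
    return "attach"


# node-A still holds the RWO volume, so the attach to node-B is deferred.
print(attach_decision("vol-abc", "node-b", {"node-a"}, ["ReadWriteOnce"]))
# Once node-A's attachment is gone, the same request goes through.
print(attach_decision("vol-abc", "node-b", set(), ["ReadWriteOnce"]))
```

This is why the stuck scenario resolves only once node-A's attachment is removed from the recorded state: the decision flips from an error to `attach` without any change on node-B.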
Debugging Volume Attach Issues
# Check VolumeAttachment status
kubectl get volumeattachment -o wide
kubectl describe volumeattachment <name>
# Look for: attachError, detachError, status.attached
# Multi-Attach error — find which node has the volume
kubectl get volumeattachment -o json | jq \
'.items[] | select(.spec.source.persistentVolumeName=="pv-name") |
{node:.spec.nodeName, attached:.status.attached}'
# Check external-attacher logs
kubectl logs -n kube-system \
$(kubectl get pod -n kube-system -l app=ebs-csi-controller \
-o jsonpath='{.items[0].metadata.name}') \
-c csi-attacher --tail=100
# Check CSI controller plugin logs (actual AWS API calls)
kubectl logs -n kube-system \
$(kubectl get pod -n kube-system -l app=ebs-csi-controller \
-o jsonpath='{.items[0].metadata.name}') \
-c ebs-plugin --tail=100
# Verify EBS volume attachment state from AWS
PV_HANDLE=$(kubectl get pv <pv-name> -o jsonpath='{.spec.csi.volumeHandle}')
aws ec2 describe-volumes --volume-ids "$PV_HANDLE" \
--query 'Volumes[0].{State:State,Attachments:Attachments[*].{Instance:InstanceId,State:State}}'
# Pod stuck ContainerCreating — check kubelet
kubectl describe pod <pod-name> -n <ns>
# Events: Unable to attach or mount volumes: ... waiting for a volume to finish attaching
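# Force-detach stuck VolumeAttachment (last resort)
kubectl delete volumeattachment <va-name>
# Sets deletionTimestamp; external-attacher then calls ControllerUnpublishVolume
# and removes its finalizer once the detach succeeds.
# If the VA stays stuck, remove the finalizer directly. This skips
# ControllerUnpublishVolume, so the volume may stay attached in AWS:
kubectl patch volumeattachment <va-name> \
-p '{"metadata":{"finalizers":[]}}' --type=merge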
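Where jq is not available, the Multi-Attach lookup above can be done with a few lines of Python over the output of `kubectl get volumeattachment -o json`. This is a minimal sketch: `nodes_for_pv` is a hypothetical helper, and the sample JSON is made-up data in the shape of a VolumeAttachment list.

```python
import json


def nodes_for_pv(va_list_json: str, pv_name: str) -> list:
    """Return node and attached state for every VolumeAttachment that references pv_name."""
    items = json.loads(va_list_json).get("items", [])
    return [
        {"node": item["spec"]["nodeName"],
         "attached": item.get("status", {}).get("attached", False)}
        for item in items
        if item["spec"].get("source", {}).get("persistentVolumeName") == pv_name
    ]


# Made-up sample in the shape of `kubectl get volumeattachment -o json`.
SAMPLE = """
{"items": [
  {"spec": {"nodeName": "node-a", "source": {"persistentVolumeName": "pv-name"}},
   "status": {"attached": true}},
  {"spec": {"nodeName": "node-b", "source": {"persistentVolumeName": "pv-name"}},
   "status": {"attached": false}},
  {"spec": {"nodeName": "node-c", "source": {"persistentVolumeName": "other-pv"}},
   "status": {"attached": true}}
]}
"""
print(nodes_for_pv(SAMPLE, "pv-name"))
```

Two entries for the same PV, one attached and one not, is the typical signature of a pod waiting on a volume that another node still holds.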
Volume Attach Timing
kubectl apply pod
→ Scheduler assigns node: ~100ms
→ AD controller creates VolumeAttachment: ~500ms
→ external-attacher calls ControllerPublish: ~1s
→ AWS API AttachVolume: 5-30s (cloud latency)
→ VA.status.attached=true: ~1s after AWS confirms
→ kubelet NodeStageVolume (format+mount): 2-10s (format ~2s for new vol)
→ NodePublishVolume: ~100ms
→ Pod container started: ~1-3s
Total: typically 15-45s from pod scheduled to running (EBS)
Faster for pre-warmed volumes already attached.
NVMe (io2 Block Express):
Attach time: 5-15s (faster than gp3)
Mount: ~500ms
EFS (NFS-based):
No attach needed — mount directly in NodePublishVolume
~1-2s to mount NFS
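Summing the per-step EBS figures listed above gives a rough bracket for the end-to-end time. The sketch below only does that arithmetic; the step labels and `total_range_ms` are illustrative names, and the numbers are the estimates from this section, not measurements.

```python
# (step, low estimate in ms, high estimate in ms), taken from the EBS list above.
STEPS_MS = [
    ("scheduler assigns node", 100, 100),
    ("AD controller creates VolumeAttachment", 500, 500),
    ("external-attacher calls ControllerPublish", 1000, 1000),
    ("AWS API AttachVolume", 5000, 30000),
    ("VA.status.attached=true", 1000, 1000),
    ("NodeStageVolume (format + mount)", 2000, 10000),
    ("NodePublishVolume", 100, 100),
    ("pod container started", 1000, 3000),
]


def total_range_ms(steps):
    """Return (sum of low estimates, sum of high estimates) in milliseconds."""
    return sum(low for _, low, _ in steps), sum(high for _, _, high in steps)


low_ms, high_ms = total_range_ms(STEPS_MS)
print(f"{low_ms / 1000:.1f}s to {high_ms / 1000:.1f}s")
```

The bounds come out at about 10.7s to 45.7s, which roughly brackets the 15-45s typical range quoted above; the cloud AttachVolume call dominates the spread.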
Related
- 07 — CSI Flow — PVC creation to pod mount (full CSI lifecycle)
- 06 — Storage Operations — stuck volume operations playbook
- 04 — CSI Drivers — CSI driver architecture reference