Overview

Traces the detailed attach/detach controller flow — from a pod being scheduled to a node, through the VolumeAttachment lifecycle, ControllerPublishVolume, and NodeStageVolume, to the volume being ready for the kubelet to bind-mount into the pod.

Attach-Detach Controller Architecture

kube-controller-manager
  └── AttachDetach controller (AD controller)
        │
        │  Responsibility:
        │  - Watches pods (which volumes they need)
        │  - Watches nodes (which volumes are attached)
        │  - Creates/deletes VolumeAttachment objects
        │  - Triggers ControllerPublishVolume / ControllerUnpublishVolume (for CSI: indirectly, via external-attacher)
        │
        ├── Desired state: all pods on node X need volumes [v1, v2]
        └── Actual state:  node X has [v1] attached
            → attach v2

In-tree volumes (e.g., awsElasticBlockStore) were managed directly by AD controller.
CSI volumes: AD controller creates VolumeAttachment → external-attacher watches and calls CSI.
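
The desired-versus-actual comparison above can be sketched as a set difference per node. This is a minimal illustration, not the controller's actual code: the `reconcile` function and the dict-of-sets state shape are assumptions made for the example.

```python
from typing import Dict, List, Set, Tuple


def reconcile(desired: Dict[str, Set[str]],
              actual: Dict[str, Set[str]]) -> List[Tuple[str, str, str]]:
    """Return (action, volume, node) tuples that move actual toward desired."""
    actions = []
    for node in sorted(set(desired) | set(actual)):
        want = desired.get(node, set())
        have = actual.get(node, set())
        # Volumes a pod needs but the node lacks: attach them.
        for vol in sorted(want - have):
            actions.append(("attach", vol, node))
        # Volumes still attached that no pod needs: detach them.
        for vol in sorted(have - want):
            actions.append(("detach", vol, node))
    return actions


# Mirrors the example above: node X needs [v1, v2] but only has [v1].
print(reconcile({"node-x": {"v1", "v2"}}, {"node-x": {"v1"}}))
```

For CSI volumes each "attach" action corresponds to creating a VolumeAttachment, and each "detach" to deleting one.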

Full Volume Attach Sequence

API Server      AD Controller   external-attacher   CSI Controller    kubelet     etcd
    │                │                 │                  │               │          │
    │  [Pod scheduled to worker-3,     │                  │               │          │
    │   references PVC backed by       │                  │               │          │
    │   PV vol-0abc123]                │                  │               │          │
    │                │                 │                  │               │          │
    │──WATCH (pods) ─►│                │                  │               │          │
    │  pod.spec.nodeName=worker-3      │                  │               │          │
    │  pod.spec.volumes                │                  │               │          │
    │                │                 │                  │               │          │
    │         ┌─── AD Controller reconcile ──────────────────────────────────────┐  │
    │         │  Desired: worker-3 needs vol-0abc123                            │  │
    │         │  Actual:  worker-3 has [] (nothing attached)                    │  │
    │         │  Action:  create VolumeAttachment                               │  │
    │         └─────────────────────────────────────────────────────────────────┘  │
    │◄── CREATE VolumeAttachment ─────│                  │               │          │
    │  spec:                          │                  │               │          │
    │    attacher: ebs.csi.aws.com    │                  │               │          │
    │    source.persistentVolumeName: pv-vol-0abc123     │               │          │
    │    nodeName: worker-3           │                  │               │          │
    │  status.attached: false         │                  │               │          │
    │──WRITE VA ──────────────────────────────────────────────────────────────────►│
    │                │                 │                  │               │          │
    │──WATCH VA ───────────────────────►                  │               │          │
    │  (VolumeAttachment created,      │                  │               │          │
    │   attached=false)                │                  │               │          │
    │                │                 │                  │               │          │
    │                │         ┌─── external-attacher reconcile ──────────────────┐ │
    │                │         │  VA.spec.attacher = ebs.csi.aws.com              │ │
    │                │         │  VA.status.attached = false                      │ │
    │                │         │  Action: call ControllerPublishVolume            │ │
    │                │         └─────────────────────────────────────────────────┘ │
    │                │                 │──ControllerPublishVolume──────►           │
    │                │                 │  volumeId: vol-0abc123        │           │
    │                │                 │  nodeId: i-0worker3           │           │
    │                │                 │  volumeCapability: rw-once    │           │
    │                │                 │                   │──EC2 API─►│           │
    │                │                 │                   │ AttachVolume           │
    │                │                 │                   │ vol-0abc123            │
    │                │                 │                   │ to i-0worker3          │
    │                │                 │◄── OK ────────────────────────            │
    │                │                 │  publishContext:               │           │
    │                │                 │    devicePath: /dev/xvdba      │           │
    │                │                 │                                │           │
    │◄── PATCH VA.status ──────────────│                  │             │           │
    │    .attached: true               │                  │             │           │
    │    .attachmentMetadata:          │                  │             │           │
    │      devicePath: /dev/xvdba      │                  │             │           │
    │──WRITE VA ──────────────────────────────────────────────────────────────────►│
    │                │                 │                  │             │           │
    │──WATCH VA.status.attached=true ──────────────────────────────────►           │
    │                │                 │                  │    [kubelet sees VA attached]
    │                │                 │                  │             │           │
    │                │                 │                  │   ┌─── kubelet volume manager ─┐
    │                │                 │                  │   │  VA.attached=true          │
    │                │                 │                  │   │  VA devicePath=/dev/xvdba  │
    │                │                 │                  │   │  Call NodeStageVolume      │
    │                │                 │                  │   └────────────────────────────┘
    │                │                 │                  │             │──NodeStageVolume──►
    │                │                 │                  │             │  (format ext4 +    │
    │                │                 │                  │             │   mount to staging  │
    │                │                 │                  │             │   path)             │
    │                │                 │                  │             │◄── staged ──────────│
    │                │                 │                  │             │           │
    │                │                 │                  │             │──NodePublishVolume──►
    │                │                 │                  │             │  (bind-mount staging│
    │                │                 │                  │             │   → pod /data)      │
    │                │                 │                  │             │◄── published ───────│
    │                │                 │                  │             │           │
    │                                                             [kubelet starts pod containers]
    pod running with /data mounted from EBS vol-0abc123
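
One way to read the sequence above is as a chain of gates, each owned by one component. The sketch below is a simplified model under stated assumptions: `VolumeState`, `next_step`, and `walk` are hypothetical names, and the real components react to watch events rather than checking flags.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class VolumeState:
    va_exists: bool = False   # VolumeAttachment object has been created
    attached: bool = False    # VA.status.attached
    staged: bool = False      # NodeStageVolume succeeded
    published: bool = False   # NodePublishVolume succeeded


def next_step(s: VolumeState) -> Tuple[str, str]:
    """Return (component, action) for the first gate that is still closed."""
    if not s.va_exists:
        return ("AD controller", "create VolumeAttachment")
    if not s.attached:
        return ("external-attacher", "ControllerPublishVolume")
    if not s.staged:
        return ("kubelet", "NodeStageVolume")
    if not s.published:
        return ("kubelet", "NodePublishVolume")
    return ("kubelet", "start containers")


def walk() -> List[Tuple[str, str]]:
    """Open the gates one by one and record who acts at each stage."""
    s = VolumeState()
    steps = []
    for flag in ("va_exists", "attached", "staged", "published"):
        steps.append(next_step(s))
        setattr(s, flag, True)
    steps.append(next_step(s))
    return steps


for component, action in walk():
    print(f"{component}: {action}")
```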

VolumeAttachment Object Lifecycle

# VolumeAttachment — created by AD controller, updated by external-attacher
apiVersion: storage.k8s.io/v1
kind: VolumeAttachment
metadata:
  name: csi-abc123def456
  # cluster-scoped — no namespace
spec:
  attacher: ebs.csi.aws.com          # CSI driver name
  source:
    persistentVolumeName: pv-vol-0abc123  # which PV to attach
  nodeName: worker-3                  # target node
status:
  attached: true                      # set by external-attacher after CSI call
  attachmentMetadata:
    devicePath: /dev/xvdba            # device path on node (from publishContext)
  detachError: null
  attachError: null
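
The status fields above can be folded into a single state when scripting against VolumeAttachment objects. A minimal sketch, assuming the object has already been loaded into a Python dict (for example from `kubectl get volumeattachment <name> -o json`); the `va_state` helper and its state names are illustrative, not part of any Kubernetes client.

```python
def va_state(va: dict) -> str:
    """Summarize a VolumeAttachment dict as one short state string."""
    status = va.get("status") or {}
    attach_err = status.get("attachError")
    detach_err = status.get("detachError")
    # Errors take priority: they explain why attached has not flipped.
    if attach_err:
        return "attach-failed: " + attach_err.get("message", "")
    if detach_err:
        return "detach-failed: " + detach_err.get("message", "")
    return "attached" if status.get("attached") else "pending"


example = {
    "spec": {"attacher": "ebs.csi.aws.com", "nodeName": "worker-3"},
    "status": {"attached": True,
               "attachmentMetadata": {"devicePath": "/dev/xvdba"}},
}
print(va_state(example))
```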

Detach Sequence — Pod Deleted

API Server       AD Controller    external-attacher    CSI Controller    etcd
    │                 │                 │                   │              │
    │  [Pod deleted / node drain]       │                   │              │
    │──WATCH (pod deleted) ──────────►  │                   │              │
    │                 │                 │                   │              │
    │         ┌─── AD Controller reconcile ─────────────────────────────┐ │
    │         │  Desired: worker-3 needs [] (pod gone)                   │ │
    │         │  Actual:  worker-3 has [vol-0abc123]                     │ │
    │         │  Action:  delete VolumeAttachment                        │ │
    │         └──────────────────────────────────────────────────────────┘ │
    │◄── DELETE VA ────│                │                   │              │
    │                  │                │                   │              │
    │         [VA deletion sets deletionTimestamp — finalizer blocks it]   │
    │                  │                │                   │              │
    │──WATCH VA (terminating) ──────────►                   │              │
    │                  │                │                   │              │
    │                  │         ┌─── external-attacher reconcile ───────┐ │
    │                  │         │  VA has deletionTimestamp              │ │
    │                  │         │  VA.status.attached = true             │ │
    │                  │         │  Action: ControllerUnpublishVolume     │ │
    │                  │         └────────────────────────────────────────┘ │
    │                  │                │──ControllerUnpublishVolume─────►  │
    │                  │                │  volumeId: vol-0abc123            │
    │                  │                │  nodeId: i-0worker3               │
    │                  │                │                    │──EC2 Detach──►
    │                  │                │◄── OK ─────────────│              │
    │◄── PATCH VA: remove finalizer ────│                    │              │
    │──DELETE VA ───────────────────────────────────────────────────────────►│
    │                  │                │                    │              │
    │                  volume detached from node
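
Taken together with the attach sequence, the external-attacher's choice of action depends on two things: whether the VA is being deleted, and whether it is currently attached. The sketch below models that decision on a plain dict. It is an assumption-laden simplification (`attacher_action` is a hypothetical name, and it treats a successful ControllerUnpublishVolume as flipping `status.attached` to false before the finalizer is removed).

```python
def attacher_action(va: dict) -> str:
    """Decide what the external-attacher should do next for one VA."""
    meta = va.get("metadata") or {}
    status = va.get("status") or {}
    deleting = meta.get("deletionTimestamp") is not None
    attached = bool(status.get("attached"))

    if deleting:
        if attached:
            # Detach first; the finalizer keeps the object alive meanwhile.
            return "ControllerUnpublishVolume"
        if meta.get("finalizers"):
            # Detach is done, so the API server may now delete the VA.
            return "remove finalizer"
        return "none"
    return "none" if attached else "ControllerPublishVolume"


terminating = {
    "metadata": {"deletionTimestamp": "2024-01-01T00:00:00Z",
                 "finalizers": ["example-finalizer"]},  # placeholder name
    "status": {"attached": True},
}
print(attacher_action(terminating))
```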

AD Controller vs kubelet Volume Manager

AttachDetach Controller (runs in kube-controller-manager):
  - Manages ATTACH and DETACH (cloud-level: EC2 AttachVolume / DetachVolume)
  - Runs once per cluster — not per node
  - Creates VolumeAttachment objects

kubelet Volume Manager (runs on each node):
  - Manages MOUNT and UNMOUNT (node-level: mkfs, mount, bind-mount)
  - NodeStageVolume: format filesystem, mount to staging path
  - NodePublishVolume: bind-mount staging path into pod directory
  - NodeUnpublishVolume, then NodeUnstageVolume: cleanup on pod delete

Flag: --enable-controller-attach-detach (kubelet; enableControllerAttachDetach in KubeletConfiguration)
  Default (true): kubelet delegates attach/detach to the central AD controller (recommended).
  If set to false: kubelet does its own attach/detach (not recommended for CSI).

Multi-Attach Protection (RWO)

EBS volumes are ReadWriteOnce (RWO): they can be attached to only ONE node at a time.

Race condition:
  Pod on node-A has vol-abc mounted.
  Pod migrated to node-B.
  node-B tries to attach vol-abc → AWS API returns error: already attached.

Protection mechanisms:
  1. AD controller watches: if VA for vol-abc on node-A → wait for detach before new attach
  2. external-attacher: VolumeAttachment for node-B waits until node-A VA is gone
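  3. Per-node volume limit (reported by the CSI driver via CSINode): caps volumes per node; a capacity guard rather than a multi-attach guard (AWS: many Nitro instance types share 28 attachment slots among EBS volumes, network interfaces, and instance-store volumes)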

Common stuck scenario:
  Node-A crashes (not gracefully drained)
  → Pod evicted by the node lifecycle controller; its replacement is scheduled to node-B
  → AD creates VA for node-B
  → external-attacher: vol-abc still attached to crashed node-A
  → Multi-Attach error — cannot force detach while node appears registered

Fix:
  kubectl delete node node-a   ← removes node from API server
  → AD controller sees no node-A → deletes VA for node-A
  → external-attacher can now attach to node-B
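
The first protection mechanism amounts to asking "does any other node still hold a VolumeAttachment for this PV?". A minimal sketch of that check over a list of VA dicts follows; `blocking_nodes` is a hypothetical helper for illustration, not the controller's implementation.

```python
from typing import List


def blocking_nodes(vas: List[dict], pv_name: str, target_node: str) -> List[str]:
    """Nodes other than target_node that still have a VA for pv_name."""
    blockers = set()
    for va in vas:
        spec = va.get("spec") or {}
        same_pv = (spec.get("source") or {}).get("persistentVolumeName") == pv_name
        if same_pv and spec.get("nodeName") != target_node:
            blockers.add(spec.get("nodeName"))
    return sorted(blockers)


def make_va(pv: str, node: str) -> dict:
    """Build a minimal VA-shaped dict for the example."""
    return {"spec": {"source": {"persistentVolumeName": pv}, "nodeName": node}}


# Stuck scenario: node-a's VA still exists while node-b wants the volume.
vas = [make_va("pv-vol-abc", "node-a"), make_va("pv-vol-abc", "node-b")]
print(blocking_nodes(vas, "pv-vol-abc", "node-b"))

# After the node-a VA is deleted, nothing blocks the attach to node-b.
print(blocking_nodes(vas[1:], "pv-vol-abc", "node-b"))
```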

Debugging Volume Attach Issues

# Check VolumeAttachment status
kubectl get volumeattachment -o wide
kubectl describe volumeattachment <name>
# Look for: attachError, detachError, status.attached

# Multi-Attach error — find which node has the volume
kubectl get volumeattachment -o json | jq \
  '.items[] | select(.spec.source.persistentVolumeName=="pv-name") |
  {node:.spec.nodeName, attached:.status.attached}'

# Check external-attacher logs
kubectl logs -n kube-system \
  $(kubectl get pod -n kube-system -l app=ebs-csi-controller \
    -o jsonpath='{.items[0].metadata.name}') \
  -c csi-attacher --tail=100

# Check CSI controller plugin logs (actual AWS API calls)
kubectl logs -n kube-system \
  $(kubectl get pod -n kube-system -l app=ebs-csi-controller \
    -o jsonpath='{.items[0].metadata.name}') \
  -c ebs-plugin --tail=100

# Verify EBS volume attachment state from AWS
PV_HANDLE=$(kubectl get pv <pv-name> -o jsonpath='{.spec.csi.volumeHandle}')
aws ec2 describe-volumes --volume-ids "$PV_HANDLE" \
  --query 'Volumes[0].{State:State,Attachments:Attachments[*].{Instance:InstanceId,State:State}}'

# Pod stuck ContainerCreating — check kubelet
kubectl describe pod <pod-name> -n <ns>
# Events: Unable to attach or mount volumes: ... waiting for a volume to finish attaching

# Force-detach stuck VolumeAttachment (last resort)
kubectl delete volumeattachment <va-name>
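# The external-attacher then calls ControllerUnpublishVolume and removes its finalizer.
# If the VA stays stuck, patch to remove the finalizer directly. This skips
# ControllerUnpublishVolume, so the volume may stay attached on the cloud side: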
kubectl patch volumeattachment <va-name> \
  -p '{"metadata":{"finalizers":[]}}' --type=merge
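
Where jq is not available, the Multi-Attach lookup above can be approximated in Python over the same JSON. This is a sketch only: `attachments_for_pv` is a hypothetical helper, and `SAMPLE` is made-up data shaped like the output of `kubectl get volumeattachment -o json`.

```python
import json
from typing import List

SAMPLE = """
{"items": [
  {"spec": {"source": {"persistentVolumeName": "pv-name"}, "nodeName": "node-a"},
   "status": {"attached": true}},
  {"spec": {"source": {"persistentVolumeName": "pv-name"}, "nodeName": "node-b"},
   "status": {"attached": false}},
  {"spec": {"source": {"persistentVolumeName": "pv-other"}, "nodeName": "node-c"},
   "status": {"attached": true}}
]}
"""


def attachments_for_pv(va_list_json: str, pv_name: str) -> List[dict]:
    """Return node and attached flag for every VA that references pv_name."""
    items = json.loads(va_list_json).get("items", [])
    result = []
    for item in items:
        spec = item.get("spec") or {}
        if (spec.get("source") or {}).get("persistentVolumeName") != pv_name:
            continue
        result.append({
            "node": spec.get("nodeName"),
            "attached": bool((item.get("status") or {}).get("attached")),
        })
    return result


for row in attachments_for_pv(SAMPLE, "pv-name"):
    print(row)
```

Two rows for one RWO volume, with the old node still reporting attached, is the usual signature of the stuck scenario described earlier.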

Volume Attach Timing

kubectl apply pod
  → Scheduler assigns node:                     ~100ms
  → AD controller creates VolumeAttachment:     ~500ms
  → external-attacher calls ControllerPublish:  ~1s
  → AWS API AttachVolume:                        5-30s (cloud latency)
  → VA.status.attached=true:                    ~1s after AWS confirms
  → kubelet NodeStageVolume (format+mount):      2-10s (format ~2s for new vol)
  → NodePublishVolume:                           ~100ms
  → Pod container started:                      ~1-3s

Total: typically 15-45s from pod scheduled to running (EBS)
       Faster for pre-warmed volumes already attached.
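
The per-stage estimates above can be summed to check the quoted total. The sketch below only restates those figures as (best, worst) pairs in seconds; the `STAGES` table and helper names are illustrative.

```python
# (stage, best-case seconds, worst-case seconds), taken from the list above.
STAGES = [
    ("scheduler assigns node",          0.1, 0.1),
    ("AD controller creates VA",        0.5, 0.5),
    ("external-attacher calls publish", 1.0, 1.0),
    ("AWS AttachVolume",                5.0, 30.0),
    ("VA.status.attached=true",         1.0, 1.0),
    ("NodeStageVolume",                 2.0, 10.0),
    ("NodePublishVolume",               0.1, 0.1),
    ("container start",                 1.0, 3.0),
]


def total_range(stages):
    """Sum best-case and worst-case durations across all stages."""
    best = round(sum(s[1] for s in stages), 1)
    worst = round(sum(s[2] for s in stages), 1)
    return best, worst


def dominant_stage(stages):
    """Name of the stage with the largest worst-case duration."""
    return max(stages, key=lambda s: s[2])[0]


print(total_range(STAGES))     # (10.7, 45.7)
print(dominant_stage(STAGES))  # AWS AttachVolume
```

The sum runs from about 10.7s to 45.7s. The low end falls below the 15s "typical" figure because it assumes every stage hits its best case at once, and the cloud AttachVolume call dominates the worst case.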

NVMe (io2 Block Express):
  Attach time: 5-15s (faster than gp3)
  Mount: ~500ms

EFS (NFS-based):
  No attach needed — mount directly in NodePublishVolume
  ~1-2s to mount NFS