CSI Drivers

The Container Storage Interface in depth — the gRPC API contract between Kubernetes and storage backends, the sidecar architecture, every RPC call in the provision/attach/mount/resize/snapshot lifecycle, the CSIDriver object, volume health monitoring, driver upgrade strategy, and a minimal driver implementation walkthrough.


What This Page Covers

Why CSI exists — in-tree plugin problems; CSI spec history (v0.1→v1.0→v1.6)

CSI architecture — controller plugin (Deployment) vs node plugin (DaemonSet); gRPC Unix socket communication

External sidecar responsibilities — external-provisioner, external-attacher, external-resizer, external-snapshotter, node-driver-registrar, liveness-probe, external-health-monitor

CSIDriver object — all spec fields: attachRequired, podInfoOnMount, volumeLifecycleModes, fsGroupPolicy, tokenRequests, requiresRepublish, seLinuxMount

Identity service RPCs — GetPluginInfo, GetPluginCapabilities, Probe

Controller service RPCs — CreateVolume (with topology, secrets, volume content source), DeleteVolume, ControllerPublishVolume, ControllerUnpublishVolume, ValidateVolumeCapabilities, ListVolumes, GetCapacity, CreateSnapshot, DeleteSnapshot, ListSnapshots, ControllerExpandVolume, ControllerGetVolume

Node service RPCs — NodeStageVolume (global mount / format), NodePublishVolume (bind-mount into pod), NodeUnpublishVolume, NodeUnstageVolume, NodeGetCapabilities, NodeGetInfo, NodeExpandVolume, NodeGetVolumeStats

NodeStage vs NodePublish distinction — staging path, bind mount, global device management

Volume access modes in CSI — SINGLE_NODE_WRITER, SINGLE_NODE_READER_ONLY, MULTI_NODE_READER_ONLY, MULTI_NODE_SINGLE_WRITER, MULTI_NODE_MULTI_WRITER

Topology — CreateVolume topology requirements/preferences; accessible topology response; scheduler interaction

Secrets in CSI — per-RPC secret references; StorageClass csi.storage.k8s.io/* parameter keys

Volume content source — clone from volume, restore from snapshot in CreateVolume

CSI ephemeral volumes — volumeLifecycleModes:Ephemeral; NodePublishVolume inline flow

Volume health monitoring — external-health-monitor-controller + node sidecar; VolumeCondition; NodeGetVolumeStats health fields

Pod Info on Mount — podInfoOnMount: true; kubelet injects pod UID/name/namespace into NodePublishVolume context

SELinux mount — seLinuxMount: true; CSI-level SELinux label propagation (1.27+)

Driver deployment patterns — Helm chart structure; RBAC requirements for each sidecar; leader election for controller HA

Driver upgrade strategy — rolling upgrade of DaemonSet (node plugin) and Deployment (controller); impact on running pods; zero-downtime approach

Minimal CSI driver skeleton — Go implementation outline; required RPCs for a basic block driver; registration flow

Secrets Store CSI Driver — SecretProviderClass CRD; provider implementations (AWS, Azure, GCP, Vault); sync to K8s Secret; token rotation

Node volume limits — max-volumes-per-node; AWS instance type limits; scheduler NodeVolumeLimits plugin

CSI migration — in-tree plugin to CSI; feature gate timeline; migrated-to annotation; rollback constraints

5 metrics + 4 alerting rules + 5 troubleshooting runbooks

8 best practices

Why CSI Exists

Before CSI, storage drivers were compiled into the Kubernetes binary as in-tree plugins. Adding or updating a driver required a Kubernetes release, which took months. Bug fixes in a storage driver couldn't ship independently of Kubernetes. This created three problems:

Release coupling — storage vendors blocked on Kubernetes release cadence
Security surface — all storage code ran in kube-controller-manager and kubelet, with full node privileges
Maintenance burden — the core team had to maintain drivers for dozens of storage backends

CSI (Container Storage Interface) decouples storage drivers from Kubernetes. Drivers run as ordinary pods with only the permissions they need. They communicate with Kubernetes via a gRPC interface over a Unix socket. Kubernetes 1.13 declared CSI GA; the last major in-tree drivers were removed in 1.27–1.28.

CSI Architecture

┌─────────────────────────────────────────────────────────────────────┐
│  KUBERNETES CONTROL PLANE                                           │
│  kube-controller-manager:                                           │
│    AttachDetach controller  ──────────────────────────────┐        │
│    PV bind controller       ──────────────────────────────┤        │
│  kube-scheduler                                            │        │
└────────────────────────────────────────────────────────────┼────────┘
                                                             │ watches K8s API
┌───────────────────────────────────────────────────────────┼────────┐
│  CSI CONTROLLER PLUGIN (Deployment, 1-3 replicas)          │        │
│                                                             │        │
│  ┌──────────────────────┐   ┌───────────────────────────────┤       │
│  │  Your CSI Driver     │   │  Kubernetes Sidecar Containers │       │
│  │  (controller plugin) │◄──│                               │       │
│  │                      │   │  external-provisioner  ───────┘       │
│  │  gRPC Unix socket:   │   │  external-attacher                    │
│  │  /csi/csi.sock       │   │  external-resizer                     │
│  │                      │   │  external-snapshotter                 │
│  │  Implements:         │   │  liveness-probe                       │
│  │  Identity service    │   └───────────────────────────────────────┘
│  │  Controller service  │
│  └──────────────────────┘
└─────────────────────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────────────────────┐
│  CSI NODE PLUGIN (DaemonSet — one pod per node)                     │
│                                                                     │
│  ┌──────────────────────┐   ┌───────────────────────────────────┐   │
│  │  Your CSI Driver     │   │  Kubernetes Sidecar Containers    │   │
│  │  (node plugin)       │◄──│                                   │   │
│  │                      │   │  node-driver-registrar            │   │
│  │  gRPC Unix socket:   │   │    (registers with kubelet at     │   │
│  │  /csi/csi.sock       │   │     /var/lib/kubelet/plugins_      │   │
│  │                      │   │     registry/)                    │   │
│  │  Implements:         │   │  liveness-probe                   │   │
│  │  Identity service    │   │  external-health-monitor-agent    │   │
│  │  Node service        │   └───────────────────────────────────┘   │
│  └──────────────────────┘                                           │
│                     ▲                                               │
│                     │ kubelet calls via gRPC Unix socket            │
└─────────────────────────────────────────────────────────────────────┘

The split into controller and node plugins is deliberate. The controller plugin can run on any node that can reach the storage backend's API (cloud APIs, Ceph, etc.); it needs no access to local disks. The node plugin runs on every node where volumes may be mounted, and performs the low-level format/mount/unmount operations that require privileged access to the host's devices and mount namespace.

External Sidecar Responsibilities

Kubernetes SIG Storage maintains a set of standard sidecar containers that bridge the Kubernetes API to CSI gRPC calls. You never write these: your driver implements the gRPC interface, and the sidecars translate Kubernetes API events into gRPC calls.

Sidecar	Watches	Calls CSI	Required
`external-provisioner`	PVC with matching StorageClass	CreateVolume / DeleteVolume	Yes (dynamic provisioning)
`external-attacher`	VolumeAttachment objects	ControllerPublishVolume / ControllerUnpublishVolume	If attachRequired: true
`external-resizer`	PVC resize requests (status conditions)	ControllerExpandVolume	If expansion supported
`external-snapshotter`	VolumeSnapshotContent objects (created from VolumeSnapshots by the cluster-wide snapshot-controller)	CreateSnapshot / DeleteSnapshot / ListSnapshots	If snapshots supported
`node-driver-registrar`	—	GetPluginInfo (to register socket path with kubelet)	Yes (node plugin)
`liveness-probe`	—	Probe (health check; exposes /healthz HTTP endpoint)	Recommended
`external-health-monitor-controller`	PV objects	ControllerGetVolume / ListVolumes	Optional (volume health)
`external-health-monitor-agent`	Node-local volumes	NodeGetVolumeStats (health fields)	Optional (volume health)

ℹ️

Leader election for HA controller. When the controller Deployment has replicas > 1, each sidecar must have leader election enabled (--leader-election flag) so that only the elected leader processes events. Without it, every replica would issue the same CreateVolume and ControllerPublishVolume calls concurrently, and a driver whose idempotency handling is imperfect can then create duplicate volumes. The sidecars use a Lease object for leader election.

CSIDriver Object

The CSIDriver cluster-scoped object declares a driver's capabilities to Kubernetes. It is created by the driver deployment (usually via Helm), not dynamically registered.

apiVersion: storage.k8s.io/v1
kind: CSIDriver
metadata:
  name: ebs.csi.aws.com     # MUST match the provisioner name in StorageClass
spec:
  # ── Attach behavior ───────────────────────────────────────────
  attachRequired: true       # true: driver manages attach/detach (block drivers)
                             # false: no VolumeAttachment created (NFS, CephFS)

  # ── Pod metadata injection ────────────────────────────────────
  podInfoOnMount: true       # kubelet passes pod UID/name/namespace in NodePublishVolume
                             # enables per-pod volume access tracking

  # ── Volume lifecycle modes ────────────────────────────────────
  volumeLifecycleModes:
    - Persistent             # supports PV/PVC (standard)
    # - Ephemeral            # supports inline CSI ephemeral volumes (no PVC)

  # ── fsGroup handling ──────────────────────────────────────────
  fsGroupPolicy: File        # None | File | ReadWriteOnceWithFSType
                             # File: kubelet always chowns mounted volume to fsGroup
                             # None: driver handles fsGroup itself (e.g., NFS squash)
                             # ReadWriteOnceWithFSType: chown only when fsType set + RWO

  # ── Token injection for cloud auth ────────────────────────────
  tokenRequests:
    - audience: sts.amazonaws.com      # AWS IRSA audience
      expirationSeconds: 86400         # token TTL
    # - audience: ""                   # default K8s service account token

  # ── Secrets Store CSI or drivers that re-publish volumes ──────
  requiresRepublish: false   # true: kubelet calls NodePublishVolume periodically
                             # used by Secrets Store CSI to refresh secret values

  # ── SELinux optimization (beta in 1.27) ───────────────────────
  seLinuxMount: true         # driver can mount with SELinux label directly
                             # avoids recursive relabeling of volume files

fsGroupPolicy Values

Value	Behavior	Use For
`File`	kubelet always applies fsGroup ownership and permissions on mount, regardless of fsType or access mode (the pod's fsGroupChangePolicy: OnRootMismatch can skip the recursive walk)	Block CSI drivers (EBS, Azure Disk, GCE PD)
`None`	kubelet does not chown; driver is responsible	NFS, CephFS (server-side uid/gid management)
`ReadWriteOnceWithFSType`	Default when the field is unset. chown only when fsType is set AND accessMode is RWO	Drivers that support both block and file
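
The three policies in the table reduce to a small decision function. The sketch below is a simplified model, not kubelet's actual code: the type, constants, and the shouldApplyFSGroup helper are hypothetical names, and it assumes the pod has an fsGroup set.

```go
package main

// FSGroupPolicy mirrors the three CSIDriver.spec.fsGroupPolicy values.
type FSGroupPolicy string

const (
	PolicyFile                    FSGroupPolicy = "File"
	PolicyNone                    FSGroupPolicy = "None"
	PolicyReadWriteOnceWithFSType FSGroupPolicy = "ReadWriteOnceWithFSType"
)

// shouldApplyFSGroup reports whether kubelet would change ownership of the
// volume for a pod that sets fsGroup. Hypothetical helper for illustration.
func shouldApplyFSGroup(policy FSGroupPolicy, fsType string, accessModes []string) bool {
	switch policy {
	case PolicyFile:
		return true // always chown, whatever the fsType or access mode
	case PolicyNone:
		return false // never chown; the driver or backend owns permissions
	case PolicyReadWriteOnceWithFSType:
		if fsType == "" {
			return false
		}
		for _, mode := range accessModes {
			if mode == "ReadWriteOnce" {
				return true
			}
		}
		return false
	default:
		return false // unknown policy: this sketch stays conservative
	}
}
```

For example, shouldApplyFSGroup(PolicyReadWriteOnceWithFSType, "ext4", []string{"ReadWriteMany"}) is false, which is why shared-filesystem volumes are not chowned under the default policy.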

Identity Service RPCs

Every CSI driver must implement the Identity service — it is called during driver registration and health checks.

RPC	Called By	Purpose
`GetPluginInfo`	node-driver-registrar, sidecars	Returns driver name and version. Name must match CSIDriver object name.
`GetPluginCapabilities`	All sidecars	Returns the plugin-level capabilities: CONTROLLER_SERVICE, VOLUME_ACCESSIBILITY_CONSTRAINTS (topology), and volume expansion (ONLINE or OFFLINE)
`Probe`	liveness-probe sidecar (periodic)	Returns driver readiness. Used for liveness/readiness probes. Return ready=false (or a FAILED_PRECONDITION error) if the driver is not yet initialized.

Controller Service RPCs

The controller plugin runs in a Deployment and calls cloud/storage APIs. Not all RPCs are required — declare which you implement via ControllerGetCapabilities.

CreateVolume

Called by external-provisioner when a PVC needs a new volume. The most complex RPC — it must handle topology, secrets, cloning, and snapshot restore.

// CreateVolumeRequest (gRPC, simplified Go-like pseudocode)
{
  Name: "pvc-abc-123",              // unique name; idempotency key
  CapacityRange: {
    RequiredBytes: 20 * 1024^3,     // 20 GiB minimum
    LimitBytes:    0,               // no upper limit
  },
  VolumeCapabilities: [{
    AccessMode: { Mode: SINGLE_NODE_WRITER },
    Mount: { FsType: "ext4", MountFlags: ["noatime"] },
  }],
  Parameters: {                     // from StorageClass.parameters
    "type": "gp3",
    "encrypted": "true",
  },
  Secrets: { ... },                 // from csi.storage.k8s.io/provisioner-secret-name
  AccessibilityRequirements: {      // from scheduler topology hint
    Requisite: [{ Segments: {"topology.ebs.csi.aws.com/zone": "us-east-1a"} }],
    Preferred: [{ Segments: {"topology.ebs.csi.aws.com/zone": "us-east-1a"} }],
  },
  VolumeContentSource: {            // optional: clone or snapshot restore
    Type: &Snapshot{ SnapshotId: "snap-0abc123" },
    // OR
    Type: &Volume{ VolumeId: "vol-0source" },
  },
}

// CreateVolumeResponse
{
  Volume: {
    VolumeId: "vol-0abc123def456",     // opaque to K8s; stored in PV.spec.csi.volumeHandle
    CapacityBytes: 20 * 1024^3,
    VolumeContext: { "throughput": "250" },  // stored in PV.spec.csi.volumeAttributes
    AccessibleTopology: [{ Segments: {"topology.ebs.csi.aws.com/zone": "us-east-1a"} }],
  }
}
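
Because Name is the idempotency key, a retried CreateVolume must return the volume created by the first call, and a request that reuses the name with an incompatible size must be rejected (ALREADY_EXISTS in the CSI spec). A minimal sketch of that rule, assuming an in-memory stand-in for the storage API; fakeBackend, volume, and errAlreadyExists are hypothetical names:

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// errAlreadyExists stands in for the gRPC ALREADY_EXISTS status code.
var errAlreadyExists = errors.New("volume name in use with incompatible capacity")

// volume is a hypothetical record of a provisioned disk.
type volume struct {
	ID            string
	CapacityBytes int64
}

// fakeBackend is an in-memory stand-in for a storage API.
type fakeBackend struct {
	mu     sync.Mutex
	byName map[string]volume
	nextID int
}

func newFakeBackend() *fakeBackend {
	return &fakeBackend{byName: make(map[string]volume)}
}

// createVolume looks the name up first. A compatible existing volume is
// returned as-is; an incompatible one yields errAlreadyExists.
func (b *fakeBackend) createVolume(name string, requiredBytes, limitBytes int64) (volume, error) {
	b.mu.Lock()
	defer b.mu.Unlock()

	if v, ok := b.byName[name]; ok {
		fitsLimit := limitBytes == 0 || v.CapacityBytes <= limitBytes
		if v.CapacityBytes >= requiredBytes && fitsLimit {
			return v, nil // retry of an earlier call: same volume, no new disk
		}
		return volume{}, errAlreadyExists
	}

	b.nextID++
	v := volume{ID: fmt.Sprintf("vol-%04d", b.nextID), CapacityBytes: requiredBytes}
	b.byName[name] = v
	return v, nil
}
```

A real driver would perform the lookup against the backend itself (for example by tagging the disk with the requested name), since the controller pod can restart between the two calls.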

ControllerPublishVolume / ControllerUnpublishVolume

Called by external-attacher when a VolumeAttachment object is created/deleted. For cloud block storage, this translates to attaching/detaching the block device to the VM instance.

// ControllerPublishVolumeRequest
{
  VolumeId: "vol-0abc123def456",    // EBS volume ID
  NodeId: "i-0abc123def456789",     // EC2 instance ID (from NodeGetInfo)
  VolumeCapability: { ... },
  Readonly: false,
}
// Response contains PublishContext, e.g.: {"devicePath": "/dev/xvdf"}
// This devicePath is passed to NodeStageVolume

ControllerExpandVolume

Called by external-resizer when PVC storage is increased. Resizes the backing cloud volume. Returns the new capacity and whether NodeExpansionRequired is true (meaning the filesystem on the node also needs resizing).

CreateSnapshot / DeleteSnapshot

Called by external-snapshotter. Parameters come from the VolumeSnapshotClass. The driver must handle idempotency — if a snapshot with the same name already exists, return it rather than failing.

Node Service RPCs

The node plugin runs as a DaemonSet. Kubelet calls it directly via the Unix socket registered at /var/lib/kubelet/plugins/<driver-name>/csi.sock.

NodeStageVolume vs NodePublishVolume

This is the most subtle distinction in CSI. The two-phase mount exists to support multiple pods sharing the same block device on a single node:

BLOCK VOLUME MOUNT FLOW (two-phase)

Cloud storage backend
  vol-0abc123 attached to node as /dev/xvdf
        │
        ▼
NodeStageVolume (called once per volume per node)
  stagingTargetPath: /var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-abc/globalmount
  Operations:
    - fsck (filesystem check)
    - mkfs.ext4 /dev/xvdf  (if new volume)
    - mount /dev/xvdf /var/lib/kubelet/.../globalmount
  Result: device formatted and mounted at a GLOBAL path on the node
        │
        ▼ (for each pod using this volume)
NodePublishVolume (called once per pod per volume)
  targetPath: /var/lib/kubelet/pods/<pod-uid>/volumes/kubernetes.io~csi/<vol-name>/mount
  Operations:
    - mount --bind /globalmount /var/lib/kubelet/pods/.../mount
  Result: bind-mount from global mount into the specific pod's directory
        │
        ▼
Container sees /var/data (mountPath) backed by the bind-mounted filesystem

UNMOUNT FLOW (reverse):
  Pod terminates → NodeUnpublishVolume (remove bind-mount from pod dir)
  Last pod using volume → NodeUnstageVolume (unmount the global mount)
  VolumeAttachment deleted → ControllerUnpublishVolume (detach from node)

For NFS/CephFS (filesystem volumes without attachment), attachRequired: false in the CSIDriver object, and the driver may skip NodeStage entirely by not advertising the STAGE_UNSTAGE_VOLUME node capability, mounting directly in NodePublish instead.
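
The ordering constraints of the two-phase flow (stage once per volume, publish once per pod, unstage only after the last unpublish) can be modelled with a small bookkeeping structure. This is an illustrative sketch only: kubelet, not the driver, decides when each RPC is issued, and nodeMounts and its methods are hypothetical names.

```go
package main

import "fmt"

// nodeMounts models the mount state of one node.
type nodeMounts struct {
	staged    map[string]string          // volumeID -> global staging path
	published map[string]map[string]bool // volumeID -> set of pod target paths
}

func newNodeMounts() *nodeMounts {
	return &nodeMounts{
		staged:    map[string]string{},
		published: map[string]map[string]bool{},
	}
}

// stage records the global mount; repeating the call is harmless.
func (n *nodeMounts) stage(volumeID, stagingPath string) {
	n.staged[volumeID] = stagingPath
}

// publish records a per-pod bind mount and requires a prior stage.
func (n *nodeMounts) publish(volumeID, targetPath string) error {
	if _, ok := n.staged[volumeID]; !ok {
		return fmt.Errorf("volume %s is not staged on this node", volumeID)
	}
	if n.published[volumeID] == nil {
		n.published[volumeID] = map[string]bool{}
	}
	n.published[volumeID][targetPath] = true
	return nil
}

// unpublish removes one pod's bind mount; unknown paths are ignored.
func (n *nodeMounts) unpublish(volumeID, targetPath string) {
	delete(n.published[volumeID], targetPath)
}

// unstage removes the global mount once no pod uses the volume.
func (n *nodeMounts) unstage(volumeID string) error {
	if count := len(n.published[volumeID]); count > 0 {
		return fmt.Errorf("volume %s is still published to %d pod(s)", volumeID, count)
	}
	delete(n.staged, volumeID)
	delete(n.published, volumeID)
	return nil
}
```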

NodeExpandVolume

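Called by kubelet after ControllerExpandVolume completes with NodeExpansionRequired set. For online expansion the volume stays mounted while the filesystem grows; for offline expansion the call happens the next time a pod mounts the volume. The driver runs the filesystem-level resize command: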

// NodeExpandVolumeRequest
{
  VolumeId: "vol-0abc123",
  VolumePath: "/var/lib/kubelet/pods/.../mount",   // node path where the volume is published for the pod
  StagingTargetPath: "/var/lib/.../globalmount",
  CapacityRange: { RequiredBytes: 50 * 1024^3 },   // 50 GiB target
  VolumeCapability: { Mount: { FsType: "ext4" } },
}

// Driver implementation:
// 1. Find the block device from stagingTargetPath
// 2. Run: resize2fs /dev/xvdf  (for ext4)
//    Or:  xfs_growfs /var/lib/.../globalmount  (for xfs — uses mount path, not device)
// 3. Return new capacity
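
Step 2 above depends on the filesystem type: resize2fs takes the device, xfs_growfs takes the mount path. A minimal sketch of that selection, returning the command as an argument list rather than executing it; resizeCommand is a hypothetical helper name:

```go
package main

import "fmt"

// resizeCommand returns the argv for growing a filesystem to fill its device.
// ext filesystems are resized via the block device, XFS via the mount path.
func resizeCommand(fsType, devicePath, mountPath string) ([]string, error) {
	switch fsType {
	case "ext3", "ext4":
		return []string{"resize2fs", devicePath}, nil
	case "xfs":
		return []string{"xfs_growfs", mountPath}, nil
	default:
		return nil, fmt.Errorf("unsupported fsType %q", fsType)
	}
}
```

A driver could pass the result to exec.Command(argv[0], argv[1:]...); keeping the selection separate from execution makes it easy to unit test.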

NodeGetInfo

Returns the node's unique ID (used in ControllerPublishVolume) and its topology segments. The topology information is used by the scheduler to place pods near their volumes.

// NodeGetInfoResponse
{
  NodeId: "i-0abc123def456789",           // instance ID for ControllerPublishVolume
  MaxVolumesPerNode: 25,                   // scheduler enforces this limit
  AccessibleTopology: {
    Segments: {
      "topology.ebs.csi.aws.com/zone": "us-east-1a"
    }
  }
}

NodeGetVolumeStats

Returns usage statistics for a mounted volume. Kubelet calls this periodically to populate kubelet_volume_stats_* metrics and to check volume health conditions.

// NodeGetVolumeStatsResponse
{
  Usage: [
    { Available: 15 * 1024^3, Total: 20 * 1024^3, Used: 5 * 1024^3, Unit: BYTES },
    { Available: 6500000, Total: 6553600, Used: 53600, Unit: INODES },
  ],
  VolumeCondition: {           // volume health monitoring (optional)
    Abnormal: false,
    Message: "",
    // Abnormal: true when driver detects I/O errors, corruption, etc.
  }
}
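
A consumer of these statistics usually wants a percentage rather than raw counters. The sketch below computes it from usage entries shaped like the response above; volumeUsage and usedPercent are hypothetical names, not part of the CSI Go bindings.

```go
package main

import "errors"

// volumeUsage mirrors one entry of the Usage list in the response above.
type volumeUsage struct {
	Available, Total, Used int64
	Unit                   string // "BYTES" or "INODES"
}

// usedPercent returns Used/Total as a percentage for the requested unit.
func usedPercent(usages []volumeUsage, unit string) (float64, error) {
	for _, u := range usages {
		if u.Unit == unit && u.Total > 0 {
			return float64(u.Used) / float64(u.Total) * 100, nil
		}
	}
	return 0, errors.New("no usage entry with a non-zero total for unit " + unit)
}
```

With the example values (5 GiB used of 20 GiB), usedPercent returns 25 for "BYTES".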

Volume Health Monitoring

CSI volume health monitoring (alpha since 1.21; the kubelet side sits behind the CSIVolumeHealth feature gate) provides a mechanism for drivers to report that a volume is unhealthy. The health signal surfaces as a Kubernetes Event on the PVC and Pod objects.

Health Monitoring Components

Component	Where	Checks
`external-health-monitor-controller`	Controller Deployment sidecar	Calls ControllerGetVolume / ListVolumes periodically; detects cloud-side anomalies (corrupted snapshot, degraded volume)
`external-health-monitor-agent`	Node DaemonSet sidecar	Calls NodeGetVolumeStats on mounted volumes; checks VolumeCondition.Abnormal

When a volume is detected as abnormal, the controller emits a Kubernetes Event on the PVC:

kubectl describe pvc data-postgres-0 -n production
# Events:
#   Type     Reason               Age   From                       Message
#   ----     ------               ---   ----                       -------
#   Warning  VolumeConditionAbnormal  2m  external-health-monitor  Volume condition is abnormal: I/O error detected

Pod Info on Mount

When podInfoOnMount: true is set in the CSIDriver, kubelet injects the requesting pod's metadata into NodePublishVolume's VolumeContext. This enables drivers to implement per-pod access control, logging, or billing.

// NodePublishVolumeRequest.VolumeContext when podInfoOnMount:true
{
  "csi.storage.k8s.io/pod.name":            "postgres-0",
  "csi.storage.k8s.io/pod.namespace":       "production",
  "csi.storage.k8s.io/pod.uid":             "abc-123-def",
  "csi.storage.k8s.io/serviceAccount.name": "postgres",
  // plus any volumeAttributes from the PV spec
}

The Secrets Store CSI Driver uses this to look up the requesting pod's service account and assume the appropriate cloud IAM role for secret retrieval — without any cluster-wide credentials in the driver.
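
Inside NodePublishVolume, a driver reads these keys from the request's volume context. A minimal sketch, assuming the context arrives as a plain map[string]string; podInfo and podInfoFromContext are hypothetical names:

```go
package main

import "errors"

// Keys that kubelet adds to the volume context when podInfoOnMount is true.
const (
	keyPodName        = "csi.storage.k8s.io/pod.name"
	keyPodNamespace   = "csi.storage.k8s.io/pod.namespace"
	keyPodUID         = "csi.storage.k8s.io/pod.uid"
	keyServiceAccount = "csi.storage.k8s.io/serviceAccount.name"
)

// podInfo holds the identity of the pod that requested the mount.
type podInfo struct {
	Name, Namespace, UID, ServiceAccount string
}

// podInfoFromContext extracts the pod identity and fails when the core keys
// are absent, which usually means podInfoOnMount is not enabled.
func podInfoFromContext(volumeContext map[string]string) (podInfo, error) {
	info := podInfo{
		Name:           volumeContext[keyPodName],
		Namespace:      volumeContext[keyPodNamespace],
		UID:            volumeContext[keyPodUID],
		ServiceAccount: volumeContext[keyServiceAccount],
	}
	if info.Name == "" || info.Namespace == "" || info.UID == "" {
		return podInfo{}, errors.New("pod info missing from volume context; is podInfoOnMount enabled?")
	}
	return info, nil
}
```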

Secrets Store CSI Driver

A widely deployed CSI ephemeral driver. It mounts secrets from external secret stores (AWS Secrets Manager / Parameter Store, Azure Key Vault, GCP Secret Manager, HashiCorp Vault) as files in pods. The secrets are not stored in Kubernetes etcd unless you enable the optional sync to a Kubernetes Secret (secretObjects below).

apiVersion: secrets-store.csi.x-k8s.io/v1
kind: SecretProviderClass
metadata:
  name: aws-secrets
  namespace: production
spec:
  provider: aws
  parameters:
    objects: |
      - objectName: "prod/db/password"
        objectType: "secretsmanager"
        objectAlias: "db-password"        # filename in the pod
      - objectName: "/prod/api-key"
        objectType: "ssmparameter"
        objectAlias: "api-key"
  secretObjects:                          # optional: sync to K8s Secret for env var use
  - secretName: db-secret
    type: Opaque
    data:
    - objectName: db-password
      key: password

# Pod spec excerpt that mounts the SecretProviderClass defined above
volumes:
- name: secrets
  csi:
    driver: secrets-store.csi.k8s.io
    readOnly: true
    volumeAttributes:
      secretProviderClass: aws-secrets    # reference to SecretProviderClass

containers:
- name: app
  volumeMounts:
  - name: secrets
    mountPath: /mnt/secrets
    readOnly: true
  env:
  - name: DB_PASSWORD                    # also available as env var via synced K8s Secret
    valueFrom:
      secretKeyRef:
        name: db-secret
        key: password

Token Rotation with requiresRepublish

When the CSIDriver has requiresRepublish: true, kubelet calls NodePublishVolume again periodically for volumes that are already mounted (the interval is controlled by kubelet, not by the driver). A driver such as Secrets Store can use these calls to re-fetch secrets from the external store and update the files in place, which enables secret rotation without a pod restart.

Node Volume Limits

Each node can attach a limited number of block volumes. The CSI driver reports this via NodeGetInfo.MaxVolumesPerNode, and the kube-scheduler's NodeVolumeLimits plugin enforces it.

Cloud / Instance	Max Volumes	Notes
AWS (most instance types)	25–39 (type-dependent)	EBS CSI reads from instance metadata; c5.4xlarge = 25, i3.metal = 39
AWS (Nitro instances)	28 (shared)	On most Nitro types the limit of 28 attachments is shared among EBS volumes, network interfaces, and NVMe instance store volumes
GCE (most machine types)	16	pd.csi.storage.gke.io reports 16 by default
Azure (Standard_D series)	4–64 (SKU-dependent)	disk.csi.azure.com reads from instance metadata
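
The limit check itself is simple arithmetic: volumes already attached to the node plus the new volumes a pod would add must not exceed the reported maximum. The sketch below models that rule only; it is not the scheduler's NodeVolumeLimits code, fitsVolumeLimit is a hypothetical name, and treating a zero limit as "no limit" is an assumption of this sketch.

```go
package main

// fitsVolumeLimit reports whether a pod's volumes fit on a node.
// attached holds the volume IDs already attached to the node; requested holds
// the volume IDs the pod needs. Volumes that are already attached do not
// count twice.
func fitsVolumeLimit(maxVolumes int, attached map[string]bool, requested []string) bool {
	if maxVolumes <= 0 {
		return true // assumption: zero means the driver reported no limit
	}
	newVolumes := map[string]bool{}
	for _, id := range requested {
		if !attached[id] {
			newVolumes[id] = true
		}
	}
	return len(attached)+len(newVolumes) <= maxVolumes
}
```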

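# Check the attach limit each CSI driver reported for a node (stored in the CSINode object)
kubectl get csinode <name> \
  -o jsonpath='{range .spec.drivers[*]}{.name}{": "}{.allocatable.count}{"\n"}{end}'
# e.g. ebs.csi.aws.com: 25    ← scheduler enforces this

# Check current usage on a node (one VolumeAttachment per attached volume)
kubectl get volumeattachment -o json | \
  jq -r '.items[] | select(.spec.nodeName=="<node>") | .spec.source.persistentVolumeName'

# Identify pods that could exhaust volume limits (counts PVC-backed volumes only)
kubectl get pods --all-namespaces -o json | \
  jq -r '.items[] | select(.spec.nodeName=="<node>") |
  "\(.metadata.namespace)/\(.metadata.name): \([.spec.volumes[]? | select(.persistentVolumeClaim)] | length) volumes"' | sort -t: -k2 -rn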

Driver Deployment Patterns

A typical CSI driver Helm chart deploys two workloads and several supporting objects:

# Typical CSI driver RBAC requirements

# Controller ServiceAccount needs:
# - Secrets: get (for StorageClass secrets)
# - PersistentVolumes: get, list, watch, create, delete, patch
# - PersistentVolumeClaims: get, list, watch, update (for resize status)
# - StorageClasses: get, list, watch
# - VolumeAttachments: get, list, watch, patch
# - CSINodes: get, list, watch
# - VolumeSnapshots / VolumeSnapshotContents: get, list, watch, create, delete, patch
# - Events: create, patch
# - Leases: get, create, update (for leader election)

# Node ServiceAccount needs:
# - Secrets: get (for node-stage secrets)
# - Pods: get (when podInfoOnMount: true)
# - PersistentVolumeClaims: get, list, watch
# - CSINodes: get, create, update, patch
# - Nodes: get, list, watch, patch

Controller HA with Leader Election

# Controller Deployment sidecar flags for HA
containers:
- name: csi-provisioner
  image: registry.k8s.io/sig-storage/csi-provisioner:v3.6.0
  args:
    - --csi-address=/csi/csi.sock
    - --feature-gates=Topology=true
    - --leader-election                       # enable leader election
    - --leader-election-namespace=kube-system
    - --timeout=60s                           # RPC timeout
    - --retry-interval-start=1s
    - --retry-interval-max=5m
    - --worker-threads=100                    # parallel reconciliation workers

- name: csi-attacher
  image: registry.k8s.io/sig-storage/csi-attacher:v4.4.0
  args:
    - --csi-address=/csi/csi.sock
    - --leader-election
    - --leader-election-namespace=kube-system
    - --timeout=300s      # longer timeout for attach operations
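
The two retry flags describe a doubling backoff: the delay starts at --retry-interval-start and doubles after each failure until it reaches --retry-interval-max. The sketch below reproduces only that arithmetic; the sidecar's real rate limiter is more involved, and retryDelay is a hypothetical name.

```go
package main

import "time"

// retryDelay returns the wait before the next attempt after the given number
// of consecutive failures: start, 2*start, 4*start, ... capped at max.
func retryDelay(failures int, start, max time.Duration) time.Duration {
	delay := start
	for i := 1; i < failures; i++ {
		delay *= 2
		if delay >= max {
			return max // stop early so large failure counts cannot overflow
		}
	}
	if delay > max {
		return max
	}
	return delay
}
```

With start=1s and max=5m, the delays run 1s, 2s, 4s, and so on up to 256s, then stay at 5m from the tenth failure onward.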

Driver Upgrade Strategy

Upgrading a CSI driver without disrupting running pods requires careful ordering. The node plugin (DaemonSet) and controller plugin (Deployment) can be upgraded independently.

1. Check compatibility: verify that the new driver version supports existing PV volumeHandles and StorageClass parameters. Read the driver changelog for breaking changes.
2. Upgrade the controller plugin first: apply the new Deployment and let it roll out. New provisioner/attacher sidecars start and take over the leader lease. Existing mounts are unaffected (the controller handles only create/delete/attach/detach).
3. Upgrade the node plugin (DaemonSet): set maxUnavailable: 1 in the DaemonSet update strategy. The node plugin pods are replaced one node at a time. Running workload pods continue using existing mounts during the upgrade window.
4. Verify: create a test PVC, mount it in a pod, write data, delete. Check CSI driver logs for errors.
5. Monitor: watch for a csi_operations_seconds P99 latency increase during rollout; watch storage_operation_errors_total.

# Safe DaemonSet update strategy for CSI node plugin
spec:
  updateStrategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1     # replace one node plugin at a time
                            # avoids all nodes losing CSI capability simultaneously

⚠️

Node plugin upgrade does not restart pods. Upgrading the node plugin DaemonSet does not affect running pods that already have volumes mounted. The new node plugin simply handles subsequent mount/unmount requests. However, while the node plugin on a node is restarting (or if the socket path changes between versions), pods starting or terminating on that node may see mount or unmount errors; kubelet retries these operations once the plugin is registered again.

Minimal CSI Driver Skeleton

A minimal read/write block CSI driver requires implementing three services. This outline shows the structure; full implementations build on the CSI spec's Go bindings (the csi package used below) and often share helpers from the csi-lib-utils library.

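// main.go: CSI driver entry point (outline)
package main

import (
	"context"
	"flag"
	"log"
	"net"
	"os"
	"strings"

	"github.com/container-storage-interface/spec/lib/go/csi"
	"google.golang.org/grpc"
	"google.golang.org/grpc/codes"
	"google.golang.org/grpc/status"
	"google.golang.org/protobuf/types/known/wrapperspb"
)

// MyDriver implements all three services. Embedding the generated
// Unimplemented* types (present in recent versions of the bindings) supplies
// default stubs for every RPC not shown here. A real driver must still
// implement ControllerGetCapabilities and NodeGetCapabilities.
type MyDriver struct {
	csi.UnimplementedIdentityServer
	csi.UnimplementedControllerServer
	csi.UnimplementedNodeServer
	nodeID string
	zone   string
}

func main() {
	// 1. Parse flags: endpoint, node ID, zone
	endpoint := flag.String("endpoint", "unix:///csi/csi.sock", "CSI gRPC endpoint")
	nodeID := flag.String("node-id", "", "unique ID of this node")
	zone := flag.String("zone", "", "topology zone of this node")
	flag.Parse()

	// 2. Create the gRPC server (add a logging interceptor here if needed)
	driver := &MyDriver{nodeID: *nodeID, zone: *zone}
	server := grpc.NewServer()

	// 3. Register the services on the same server
	csi.RegisterIdentityServer(server, driver)
	csi.RegisterControllerServer(server, driver) // needed in the controller plugin
	csi.RegisterNodeServer(server, driver)       // needed in the node plugin

	// 4. Remove any stale socket, then start listening
	socketPath := strings.TrimPrefix(*endpoint, "unix://")
	if err := os.Remove(socketPath); err != nil && !os.IsNotExist(err) {
		log.Fatalf("remove stale socket: %v", err)
	}
	lis, err := net.Listen("unix", socketPath)
	if err != nil {
		log.Fatalf("listen on %s: %v", socketPath, err)
	}
	if err := server.Serve(lis); err != nil {
		log.Fatalf("serve: %v", err)
	}
}

// Identity Service (required by all)
func (d *MyDriver) GetPluginInfo(ctx context.Context, req *csi.GetPluginInfoRequest) (*csi.GetPluginInfoResponse, error) {
	return &csi.GetPluginInfoResponse{
		Name:          "my.csi.driver.example",
		VendorVersion: "v1.0.0",
	}, nil
}
func (d *MyDriver) GetPluginCapabilities(ctx context.Context, req *csi.GetPluginCapabilitiesRequest) (*csi.GetPluginCapabilitiesResponse, error) {
	return &csi.GetPluginCapabilitiesResponse{
		Capabilities: []*csi.PluginCapability{{
			Type: &csi.PluginCapability_Service_{
				Service: &csi.PluginCapability_Service{
					Type: csi.PluginCapability_Service_CONTROLLER_SERVICE,
				},
			},
		}},
	}, nil
}
func (d *MyDriver) Probe(ctx context.Context, req *csi.ProbeRequest) (*csi.ProbeResponse, error) {
	return &csi.ProbeResponse{Ready: &wrapperspb.BoolValue{Value: true}}, nil
}

// Controller Service (minimum for dynamic provisioning)
func (d *MyDriver) CreateVolume(ctx context.Context, req *csi.CreateVolumeRequest) (*csi.CreateVolumeResponse, error) {
	// 1. Validate parameters
	// 2. Call backend API to create disk (idempotent: check if exists first)
	// 3. Return volume ID and topology
	return nil, status.Error(codes.Unimplemented, "CreateVolume: outline only")
}
func (d *MyDriver) DeleteVolume(ctx context.Context, req *csi.DeleteVolumeRequest) (*csi.DeleteVolumeResponse, error) {
	// 1. Call backend API to delete disk
	// 2. Idempotent: if volume not found, return success
	return nil, status.Error(codes.Unimplemented, "DeleteVolume: outline only")
}

// Node Service (minimum for mounting)
func (d *MyDriver) NodeStageVolume(ctx context.Context, req *csi.NodeStageVolumeRequest) (*csi.NodeStageVolumeResponse, error) {
	// 1. Find the block device (from PublishContext devicePath)
	// 2. Format if needed: exec.Command("mkfs.ext4", "-F", devicePath)
	// 3. Mount: exec.Command("mount", devicePath, stagingTargetPath)
	return nil, status.Error(codes.Unimplemented, "NodeStageVolume: outline only")
}
func (d *MyDriver) NodePublishVolume(ctx context.Context, req *csi.NodePublishVolumeRequest) (*csi.NodePublishVolumeResponse, error) {
	// 1. Bind-mount: exec.Command("mount", "--bind", stagingTargetPath, targetPath)
	// 2. Handle readOnly: remount with "-o", "bind,ro"
	return nil, status.Error(codes.Unimplemented, "NodePublishVolume: outline only")
}
func (d *MyDriver) NodeUnpublishVolume(ctx context.Context, req *csi.NodeUnpublishVolumeRequest) (*csi.NodeUnpublishVolumeResponse, error) {
	// 1. Unmount: exec.Command("umount", targetPath)
	// 2. Idempotent: if not mounted, return success
	return nil, status.Error(codes.Unimplemented, "NodeUnpublishVolume: outline only")
}
func (d *MyDriver) NodeUnstageVolume(ctx context.Context, req *csi.NodeUnstageVolumeRequest) (*csi.NodeUnstageVolumeResponse, error) {
	// 1. Unmount: exec.Command("umount", stagingTargetPath)
	return nil, status.Error(codes.Unimplemented, "NodeUnstageVolume: outline only")
}
func (d *MyDriver) NodeGetInfo(ctx context.Context, req *csi.NodeGetInfoRequest) (*csi.NodeGetInfoResponse, error) {
	return &csi.NodeGetInfoResponse{
		NodeId:            d.nodeID,
		MaxVolumesPerNode: 20,
		AccessibleTopology: &csi.Topology{
			Segments: map[string]string{"zone": d.zone},
		},
	}, nil
}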
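
The endpoint flag is usually given as a URL such as unix:///csi/csi.sock, which has to be split into a network and an address before calling net.Listen. A minimal standard-library sketch of that step; parseEndpoint is a hypothetical helper, and accepting tcp:// as well is an assumption made here for local testing:

```go
package main

import (
	"fmt"
	"net/url"
)

// parseEndpoint splits a CSI endpoint URL into the network and address
// arguments expected by net.Listen.
func parseEndpoint(endpoint string) (network, address string, err error) {
	u, err := url.Parse(endpoint)
	if err != nil {
		return "", "", fmt.Errorf("parse endpoint %q: %w", endpoint, err)
	}
	switch u.Scheme {
	case "unix":
		if u.Path == "" {
			return "", "", fmt.Errorf("endpoint %q has no socket path", endpoint)
		}
		return "unix", u.Path, nil
	case "tcp":
		if u.Host == "" {
			return "", "", fmt.Errorf("endpoint %q has no host:port", endpoint)
		}
		return "tcp", u.Host, nil
	default:
		return "", "", fmt.Errorf("unsupported endpoint scheme %q", u.Scheme)
	}
}
```

The driver would then call net.Listen(network, address), after removing any stale socket file when the network is "unix".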

CSI Migration

The in-tree volume plugin migration to CSI was a multi-year project. Each plugin had a feature gate; migration is now complete for all major cloud providers. The migration is transparent: existing PVs with in-tree types are silently routed to the corresponding CSI driver.

In-Tree Plugin	CSI Driver	Migration Status
`kubernetes.io/aws-ebs`	`ebs.csi.aws.com`	Migration beta 1.17, GA 1.25; in-tree removed 1.27
`kubernetes.io/gce-pd`	`pd.csi.storage.gke.io`	Migration beta 1.17, GA 1.25; in-tree removed 1.28
`kubernetes.io/azure-disk`	`disk.csi.azure.com`	Migration beta 1.19, GA 1.24; in-tree removed 1.27
`kubernetes.io/azure-file`	`file.csi.azure.com`	Migration beta 1.21, GA 1.26; in-tree removed after GA (confirm the version in the release notes)
`kubernetes.io/cinder` (OpenStack)	`cinder.csi.openstack.org`	Migration beta 1.18, GA 1.24; in-tree removed 1.26
`kubernetes.io/vsphere-volume`	`csi.vsphere.vmware.com`	Migration beta 1.19, GA 1.26; in-tree removed after GA (confirm the version in the release notes)

🔴

Upgrade path requirement. Before upgrading past the removal version, you must install the CSI driver and ensure the migration feature gate was enabled in the prior version (to translate existing PV specs). If you skip the migration step and upgrade directly, existing PVs with removed in-tree plugin types will be invalid and pods will fail to start. Check your current Kubernetes version against the removal table above before upgrading.
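
Conceptually, the transparent routing is a name translation from the in-tree plugin to its CSI replacement. The sketch below shows only that lookup for a subset of the table above; Kubernetes' real translation also rewrites the volume source and StorageClass parameters, and inTreeToCSI and driverFor are hypothetical names.

```go
package main

// inTreeToCSI maps in-tree plugin names to the CSI drivers that replace them
// (a subset of the migration table).
var inTreeToCSI = map[string]string{
	"kubernetes.io/aws-ebs":    "ebs.csi.aws.com",
	"kubernetes.io/gce-pd":     "pd.csi.storage.gke.io",
	"kubernetes.io/azure-disk": "disk.csi.azure.com",
	"kubernetes.io/azure-file": "file.csi.azure.com",
	"kubernetes.io/cinder":     "cinder.csi.openstack.org",
}

// driverFor returns the driver that should handle a volume: the CSI
// replacement for a migrated in-tree plugin, or the name unchanged otherwise.
func driverFor(pluginName string) string {
	if csiName, ok := inTreeToCSI[pluginName]; ok {
		return csiName
	}
	return pluginName
}
```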

Metrics and Alerting

Metric	Source	Alert Threshold
`csi_sidecar_operations_seconds{driver_name,method_name}` (sidecars); `csi_operations_seconds{driver_name,method_name}` (kubelet)	External sidecars, kubelet	P99 CreateVolume >60s; NodePublish >30s
`storage_operation_duration_seconds{operation_name}`	kubelet	P99 volume_attach >60s, volume_mount >30s
`storage_operation_errors_total`	kubelet	Any non-zero rate
`attachdetach_controller_total_volumes{plugin_name,state}`	kube-controller-manager	actual_state_of_world staying above desired_state_of_world indicates stuck detach
`volume_manager_total_volumes{state="desired_state_of_world"}`	kubelet	DSW vs ASW divergence for >5min

Alerting Rules

groups:
- name: csi-drivers
  rules:
  - alert: CSIProvisioningErrors
    expr: rate(storage_operation_errors_total{operation_name="provision"}[5m]) > 0
    for: 2m
    labels: {severity: warning}
    annotations:
      summary: "CSI provisioning errors on {{ $labels.volume_plugin }}"

  - alert: CSINodePublishSlow
    expr: |
      histogram_quantile(0.99,
        rate(storage_operation_duration_seconds_bucket{operation_name="volume_mount"}[10m])
      ) > 30
    for: 5m
    labels: {severity: warning}
    annotations:
      summary: "CSI NodePublishVolume P99 > 30s — pods taking too long to start"

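  - alert: VolumeAttachDetachStuck
    expr: |
      sum by (plugin_name) (attachdetach_controller_total_volumes{state="actual_state_of_world"})
        > sum by (plugin_name) (attachdetach_controller_total_volumes{state="desired_state_of_world"})
    for: 10m
    labels: {severity: warning}
    annotations:
      summary: "More volumes attached than desired for >10 minutes; detach may be stuck"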

  - alert: CSIDriverDown
    expr: |
      up{job=~".*csi.*"} == 0
    for: 2m
    labels: {severity: critical}
    annotations:
      summary: "CSI driver {{ $labels.job }} is down — new mounts will fail"
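To make the P99 thresholds above concrete, here is a rough sketch of how a quantile is estimated from cumulative histogram buckets by linear interpolation, which is the approach `histogram_quantile` takes. The bucket boundaries and counts in the usage note are invented for illustration, and the function is a simplified model, not the Prometheus implementation.

```python
import math


def estimate_quantile(q, buckets):
    """Estimate quantile q (0 < q <= 1) from cumulative histogram buckets.

    buckets is a list of (upper_bound, cumulative_count) pairs and must include
    the +Inf bucket, as Prometheus *_bucket series do.
    """
    if not 0 < q <= 1:
        raise ValueError("q must be in (0, 1]")
    buckets = sorted(buckets)
    if len(buckets) < 2:
        return math.nan
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    # Find the first bucket whose cumulative count reaches the target rank.
    for i, (upper, count) in enumerate(buckets):
        if count >= rank:
            break
    if math.isinf(upper):
        # The quantile falls in the +Inf bucket: report the highest finite bound.
        return buckets[-2][0]
    lower = buckets[i - 1][0] if i > 0 else 0.0
    lower_count = buckets[i - 1][1] if i > 0 else 0.0
    # Assume observations are spread evenly inside the bucket.
    return lower + (upper - lower) * (rank - lower_count) / (count - lower_count)
```

For example, with buckets `[(1, 50), (5, 90), (15, 97), (30, 98), (60, 100), (math.inf, 100)]`, the 99th observation of 100 lies halfway through the 30s to 60s bucket, so the estimate is 45 seconds and the `> 30` condition would hold. The result is only as precise as the bucket boundaries allow.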

Troubleshooting Runbooks

Runbook: Pod Stuck ContainerCreating — CSI NodePublish Failed

# 1. Get the error
kubectl describe pod <name> -n <ns>
# Events: "MountVolume.SetUp failed for volume... NodePublishVolume failed"

# 2. Check node plugin logs (DaemonSet pod on the same node)
NODE=$(kubectl get pod <name> -n <ns> -o jsonpath='{.spec.nodeName}')
kubectl logs -n kube-system \
  $(kubectl get pod -n kube-system -l app=ebs-csi-node \
      --field-selector spec.nodeName="$NODE" -o name) \
  -c csi-driver --tail=50

# 3. Check if the staging path exists (NodeStage may have partially completed)
#    kubectl debug mounts the node's root filesystem at /host
kubectl debug node/$NODE -it --image=busybox -- \
  ls /host/var/lib/kubelet/plugins/kubernetes.io/csi/

# 4. Check if the block device is available on the node (uses the node's own lsblk)
kubectl debug node/$NODE -it --image=busybox -- chroot /host lsblk

Runbook: CSI Provisioner Not Creating PVs

# 1. Describe PVC — check events
kubectl describe pvc <name> -n <ns>

# 2. Check external-provisioner logs
kubectl logs -n kube-system \
  $(kubectl get pod -n kube-system -l app=ebs-csi-controller -o name | head -1) \
  -c csi-provisioner --tail=100

# Common errors:
# "error calling CreateVolume: ... InvalidParameterValue" → bad StorageClass parameters
# "failed to create volume: ... UnauthorizedOperation" → missing IAM permissions
# "context deadline exceeded" → CSI driver not responding (check driver pod health)

# 3. Check driver pod is healthy
kubectl get pods -n kube-system -l app=ebs-csi-controller
# If CrashLoopBackOff: kubectl logs -n kube-system <pod> -c csi-driver --previous
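The common errors listed in this runbook can be matched mechanically when triaging many PVCs. Below is a small sketch that maps those three log fragments to their likely causes; the function name is illustrative, and the table contains only the patterns named above.

```python
# Log fragments and likely causes, taken from the "Common errors" list in this runbook.
KNOWN_ERRORS = [
    ("InvalidParameterValue", "bad StorageClass parameters"),
    ("UnauthorizedOperation", "missing IAM permissions"),
    ("context deadline exceeded", "CSI driver not responding (check driver pod health)"),
]


def classify_provisioner_log(log_text):
    """Return the likely causes found in external-provisioner log text, in first-seen order."""
    causes = []
    for line in log_text.splitlines():
        for fragment, cause in KNOWN_ERRORS:
            if fragment in line and cause not in causes:
                causes.append(cause)
    return causes
```

Pass it the text captured from the `kubectl logs ... -c csi-provisioner` command above. An empty list means none of the known fragments appeared, not that provisioning is healthy.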

Runbook: VolumeAttachment Stuck in Attaching State

# Attachment has been in progress for >5 minutes
kubectl get volumeattachment
# NAME                          ATTACHER          PV              NODE       ATTACHED
# csi-abc123                    ebs.csi.aws.com   pv-prod-db      node-1     false

# 1. Check external-attacher logs
kubectl logs -n kube-system \
  $(kubectl get pod -n kube-system -l app=ebs-csi-controller -o name | head -1) \
  -c csi-attacher --tail=50

# 2. Check cloud-side: is the volume actually attached to the node?
# AWS: aws ec2 describe-volumes --volume-ids vol-0abc123 | jq '.Volumes[].Attachments'

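# 3. If cloud shows detached but attachment object is stuck, delete and recreate:
kubectl delete volumeattachment csi-abc123
# The attach/detach controller in kube-controller-manager recreates the object while a pod
# still needs the volume, and the external-attacher then retries ControllerPublishVolume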
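The "in progress for >5 minutes" check can be scripted across all attachments. The following is a minimal sketch that assumes the parsed output of `kubectl get volumeattachment -o json`; the function name and the default five-minute threshold are illustrative choices.

```python
from datetime import datetime, timedelta, timezone


def stuck_attachments(va_list, now, max_age=timedelta(minutes=5)):
    """Return VolumeAttachments that are not attached and older than max_age.

    va_list is the dict from `kubectl get volumeattachment -o json`;
    now must be a timezone-aware datetime.
    """
    stuck = []
    for va in va_list.get("items", []):
        status = va.get("status", {})
        if status.get("attached", False):
            continue
        created = datetime.strptime(
            va["metadata"]["creationTimestamp"], "%Y-%m-%dT%H:%M:%SZ"
        ).replace(tzinfo=timezone.utc)
        if now - created > max_age:
            stuck.append({
                "name": va["metadata"]["name"],
                "node": va["spec"]["nodeName"],
                "pv": va["spec"]["source"].get("persistentVolumeName"),
                # attachError is only present when the last attach call failed.
                "error": status.get("attachError", {}).get("message", ""),
            })
    return stuck
```

Call it with `now=datetime.now(timezone.utc)`. Note that an attachment being detached also reports `attached: false`, so check `metadata.deletionTimestamp` before treating an entry as a failed attach.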

Runbook: NodeGetVolumeStats Returns Abnormal — Volume Health Alert

# PVC Event shows VolumeConditionAbnormal
kubectl describe pvc <name> -n <ns>
# Warning: VolumeConditionAbnormal ...

# 1. Check what the driver is reporting
kubectl logs -n kube-system <node-plugin-pod> -c external-health-monitor-agent | tail -50

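# 2. Check for kernel I/O errors on the node that hosts the pod
#    (kernel logs are normally not readable from inside an unprivileged pod;
#    reading dmesg may also require a privileged debug profile)
kubectl debug node/<node> -it --image=busybox -- chroot /host sh -c 'dmesg | tail -20'
kubectl debug node/<node> -it --image=busybox -- chroot /host sh -c 'journalctl -k | grep -i "I/O error"'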

# 3. If filesystem corruption is suspected:
# - Take snapshot immediately (before further damage)
# - Stop writes to the volume
# - Run fsck on an unmounted clone of the snapshot

Runbook: Node Volume Limit Reached — Pod Pending

# Pod stuck Pending with:
# "0/3 nodes are available: 3 node(s) exceed max volume count"

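# 1. Check the per-node attach limit and the current attachment count
# In-tree plugins report the limit as attachable-volumes-* in node allocatable:
kubectl describe nodes | grep "attachable-volumes" -A 3
# CSI drivers report it in the CSINode object:
kubectl get csinode -o custom-columns='NODE:.metadata.name,DRIVER:.spec.drivers[*].name,LIMIT:.spec.drivers[*].allocatable.count'
# Current attachments per node:
kubectl get volumeattachment --no-headers -o custom-columns='NODE:.spec.nodeName' | sort | uniq -c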

# 2. Find which pods are using the most volumes on congested nodes
kubectl get pods --all-namespaces -o json | \
  jq -r '.items[] | select(.spec.nodeName=="<node>") |
  "\(.metadata.namespace)/\(.metadata.name): \(.spec.volumes | map(select(.persistentVolumeClaim)) | length) PVCs"' | \
  sort -t: -k2 -rn | head -10

# 3. Options:
# a) Add more nodes to the cluster
# b) Consolidate workloads (fewer, larger PVCs per pod)
# c) Use EBS multi-attach (io1/io2) for shared access where appropriate
# d) AWS: the attachment limit depends on the instance type; consider an instance type with a higher limit
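To see which nodes are closest to their limit, attachment counts can be compared with the limit each CSI driver publishes in CSINode. This is a sketch under the assumption that you pass in the parsed output of `kubectl get volumeattachment -o json` and `kubectl get csinode -o json`; the function name is illustrative. The scheduler counts volumes from pod specs, so treat the numbers as an approximation.

```python
from collections import Counter


def attachment_headroom(va_list, csinode_list):
    """Return (node, driver, used, limit) tuples, least remaining headroom first.

    Drivers that do not publish allocatable.count in CSINode are skipped.
    """
    used = Counter()
    for va in va_list.get("items", []):
        if va.get("status", {}).get("attached"):
            used[(va["spec"]["nodeName"], va["spec"]["attacher"])] += 1

    report = []
    for csinode in csinode_list.get("items", []):
        node = csinode["metadata"]["name"]  # CSINode is named after its node
        for driver in csinode.get("spec", {}).get("drivers") or []:
            limit = (driver.get("allocatable") or {}).get("count")
            if limit is None:
                continue
            report.append((node, driver["name"], used[(node, driver["name"])], limit))
    return sorted(report, key=lambda row: row[3] - row[2])
```

Rows where `used` equals `limit` are the nodes the scheduler will reject with "exceed max volume count" for further volumes from that driver.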

Best Practices

Always deploy the liveness-probe sidecar. Without it, a hung CSI driver process is invisible to Kubernetes: the pod appears healthy but all storage operations silently fail. Pair it with a Prometheus alert on up{job=~".*csi.*"} == 0.
Enable leader election on all controller sidecars when running more than one controller replica. Without it, you get duplicate volumes, duplicate snapshots, or conflicting resize operations.
Use RBAC least-privilege for driver ServiceAccounts. The controller ServiceAccount does not need node access; the node ServiceAccount does not need PV create/delete. Follow the driver's recommended RBAC manifest exactly — don't grant cluster-admin to the driver.
Pin sidecar versions to your Kubernetes version. The external-provisioner, external-attacher, etc. have compatibility matrices. A sidecar built for K8s 1.28 may use API fields removed or changed in 1.31. Check the sidecar compatibility table before upgrading Kubernetes.
Set maxUnavailable: 1 on node plugin DaemonSet updates. Rolling the node plugin too fast (all nodes at once) means all nodes simultaneously lose the ability to perform new mount/unmount operations.
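Set fsGroupPolicy: File for block drivers. When the field is unset, the default is ReadWriteOnceWithFSType, which applies fsGroup only if the volume defines an fsType and its access mode is ReadWriteOnce. With None, or with a default that does not match your volumes, fsGroup in the pod SecurityContext has no effect, and containers get permission denied on newly provisioned volumes.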
Pre-install the CSI driver before migrating in-tree volumes. If you upgrade Kubernetes past the in-tree removal version without having the CSI driver installed and migration enabled, existing PVs become unserviceable. There is no easy rollback.
Monitor storage_operation_errors_total by operation name. A rate of >0 for provision, volume_attach, or volume_mount means real workloads are experiencing failures. These operations are synchronous blockers for pod startup — every error translates directly to pod latency or unavailability.
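Two of these practices, the liveness-probe sidecar and leader election, can be checked against a controller Deployment manifest. The sketch below assumes a parsed Deployment (for example from `kubectl get deployment <name> -o json`) and identifies sidecars by image name, which is a heuristic; it also assumes flags are passed in `args` rather than `command`. The function name is illustrative.

```python
# Image name fragments of the upstream controller sidecars (heuristic match).
LEADER_ELECTION_SIDECARS = ("csi-provisioner", "csi-attacher", "csi-resizer", "csi-snapshotter")


def lint_controller(deployment):
    """Return a list of problems found in a CSI controller Deployment manifest."""
    spec = deployment["spec"]
    containers = spec["template"]["spec"]["containers"]
    problems = []

    # Best practice: a liveness-probe sidecar should be present.
    if not any("livenessprobe" in c["image"] for c in containers):
        problems.append("no liveness-probe sidecar")

    # Best practice: with more than one replica, every sidecar needs leader election.
    if spec.get("replicas", 1) > 1:
        for c in containers:
            if any(name in c["image"] for name in LEADER_ELECTION_SIDECARS):
                args = c.get("args", [])
                if not any(a in ("--leader-election", "--leader-election=true") for a in args):
                    problems.append(f"{c['name']}: leader election not enabled")
    return problems
```

An empty list means neither problem was detected by this heuristic. It does not validate RBAC, sidecar versions, or the DaemonSet update strategy.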
