CSI Drivers

The Container Storage Interface in depth — the gRPC API contract between Kubernetes and storage backends, the sidecar architecture, every RPC call in the provision/attach/mount/resize/snapshot lifecycle, the CSIDriver object, volume health monitoring, driver upgrade strategy, and a minimal driver implementation walkthrough.

What This Page Covers
  • Why CSI exists — in-tree plugin problems; CSI spec history (v0.1→v1.0→v1.6)
  • CSI architecture — controller plugin (Deployment) vs node plugin (DaemonSet); gRPC Unix socket communication
  • External sidecar responsibilities — external-provisioner, external-attacher, external-resizer, external-snapshotter, node-driver-registrar, liveness-probe, external-health-monitor
  • CSIDriver object — all spec fields: attachRequired, podInfoOnMount, volumeLifecycleModes, fsGroupPolicy, tokenRequests, requiresRepublish, seLinuxMount
  • Identity service RPCs — GetPluginInfo, GetPluginCapabilities, Probe
  • Controller service RPCs — CreateVolume (with topology, secrets, volume content source), DeleteVolume, ControllerPublishVolume, ControllerUnpublishVolume, ValidateVolumeCapabilities, ListVolumes, GetCapacity, CreateSnapshot, DeleteSnapshot, ListSnapshots, ControllerExpandVolume, ControllerGetVolume
  • Node service RPCs — NodeStageVolume (global mount / format), NodePublishVolume (bind-mount into pod), NodeUnpublishVolume, NodeUnstageVolume, NodeGetCapabilities, NodeGetInfo, NodeExpandVolume, NodeGetVolumeStats
  • NodeStage vs NodePublish distinction — staging path, bind mount, global device management
  • Volume access modes in CSI — SINGLE_NODE_WRITER, SINGLE_NODE_READER_ONLY, MULTI_NODE_READER_ONLY, MULTI_NODE_SINGLE_WRITER, MULTI_NODE_MULTI_WRITER
  • Topology — CreateVolume topology requirements/preferences; accessible topology response; scheduler interaction
  • Secrets in CSI — per-RPC secret references; StorageClass csi.storage.k8s.io/* parameter keys
  • Volume content source — clone from volume, restore from snapshot in CreateVolume
  • CSI ephemeral volumes — volumeLifecycleModes:Ephemeral; NodePublishVolume inline flow
  • Volume health monitoring — external-health-monitor-controller + node sidecar; VolumeCondition; NodeGetVolumeStats health fields
  • Pod Info on Mount — podInfoOnMount: true; kubelet injects pod UID/name/namespace into NodePublishVolume context
  • SELinux mount — seLinuxMount: true; CSI-level SELinux label propagation (1.27+)
  • Driver deployment patterns — Helm chart structure; RBAC requirements for each sidecar; leader election for controller HA
  • Driver upgrade strategy — rolling upgrade of DaemonSet (node plugin) and Deployment (controller); impact on running pods; zero-downtime approach
  • Minimal CSI driver skeleton — Go implementation outline; required RPCs for a basic block driver; registration flow
  • Secrets Store CSI Driver — SecretProviderClass CRD; provider implementations (AWS, Azure, GCP, Vault); sync to K8s Secret; token rotation
  • Node volume limits — max-volumes-per-node; AWS instance type limits; scheduler NodeVolumeLimits plugin
  • CSI migration — in-tree plugin to CSI; feature gate timeline; migrated-to annotation; rollback constraints
  • 5 metrics + 4 alerting rules + 5 troubleshooting runbooks
  • 8 best practices
    Why CSI Exists

    Before CSI, storage drivers were compiled into the Kubernetes binaries as in-tree plugins. This created three problems: adding or updating a driver required a Kubernetes release, which took months; bug fixes in a storage driver couldn't ship independently of Kubernetes; and third-party driver code ran inside core components (kubelet, kube-controller-manager) with their full privileges.

    CSI (Container Storage Interface) decouples storage drivers from Kubernetes. Drivers run as ordinary pods with only the permissions they need. They communicate with Kubernetes via a gRPC interface over a Unix socket. Kubernetes 1.13 declared CSI GA; the last major in-tree drivers were removed in 1.27–1.28.

    CSI Architecture

    ┌─────────────────────────────────────────────────────────────────────┐
    │  KUBERNETES CONTROL PLANE                                           │
    │  kube-controller-manager:                                           │
    │    AttachDetach controller  ──────────────────────────────┐        │
    │    PV bind controller       ──────────────────────────────┤        │
    │  kube-scheduler                                            │        │
    └────────────────────────────────────────────────────────────┼────────┘
                                                                 │ watches K8s API
    ┌─────────────────────────────────────────────────────── ───┼────────┐
    │  CSI CONTROLLER PLUGIN (Deployment, 1-3 replicas)          │        │
    │                                                             │        │
    │  ┌──────────────────────┐   ┌───────────────────────────────┤       │
    │  │  Your CSI Driver     │   │  Kubernetes Sidecar Containers │       │
    │  │  (controller plugin) │◄──│                               │       │
    │  │                      │   │  external-provisioner  ───────┘       │
    │  │  gRPC Unix socket:   │   │  external-attacher                    │
    │  │  /csi/csi.sock       │   │  external-resizer                     │
    │  │                      │   │  external-snapshotter                 │
    │  │  Implements:         │   │  liveness-probe                       │
    │  │  Identity service    │   └───────────────────────────────────────┘
    │  │  Controller service  │
    │  └──────────────────────┘
    └─────────────────────────────────────────────────────────────────────┘
    
    ┌─────────────────────────────────────────────────────────────────────┐
    │  CSI NODE PLUGIN (DaemonSet — one pod per node)                     │
    │                                                                     │
    │  ┌──────────────────────┐   ┌───────────────────────────────────┐   │
    │  │  Your CSI Driver     │   │  Kubernetes Sidecar Containers    │   │
    │  │  (node plugin)       │◄──│                                   │   │
    │  │                      │   │  node-driver-registrar            │   │
    │  │  gRPC Unix socket:   │   │    (registers with kubelet at     │   │
    │  │  /csi/csi.sock       │   │     /var/lib/kubelet/plugins_      │   │
    │  │                      │   │     registry/)                    │   │
    │  │  Implements:         │   │  liveness-probe                   │   │
    │  │  Identity service    │   │  external-health-monitor-agent    │   │
    │  │  Node service        │   └───────────────────────────────────┘   │
    │  └──────────────────────┘                                           │
    │                     ▲                                               │
    │                     │ kubelet calls via gRPC Unix socket            │
    └─────────────────────────────────────────────────────────────────────┘
    

    The split into controller and node plugins is deliberate. The controller plugin can run on any node (some deployments pin it to control plane or system nodes with a node selector) and makes API calls to the storage backend (cloud APIs, Ceph, etc.). The node plugin runs on every worker node where volumes may be mounted and performs the low-level format/mount/unmount operations, which require a privileged container with direct access to the host's devices.

    External Sidecar Responsibilities

    Kubernetes ships a set of standard sidecar containers that bridge the Kubernetes API to CSI gRPC calls. You never write these — your driver implements the gRPC interface, and the sidecars translate Kubernetes events into gRPC calls.

    Sidecar | Watches | Calls CSI | Required
    external-provisioner | PVCs with a matching StorageClass | CreateVolume / DeleteVolume | Yes (dynamic provisioning)
    external-attacher | VolumeAttachment objects | ControllerPublishVolume / ControllerUnpublishVolume | If attachRequired: true
    external-resizer | PVC resize requests (increased spec.resources.requests) | ControllerExpandVolume | If expansion supported
    external-snapshotter (csi-snapshotter sidecar) | VolumeSnapshotContent objects (the separate snapshot-controller handles VolumeSnapshots) | CreateSnapshot / DeleteSnapshot / ListSnapshots | If snapshots supported
    node-driver-registrar | None (registers the driver socket path with kubelet) | GetPluginInfo | Yes (node plugin)
    liveness-probe | None (serves a /healthz HTTP endpoint) | Probe | Recommended
    external-health-monitor-controller | PV objects | ControllerGetVolume / ListVolumes | Optional (volume health)
    external-health-monitor-agent | Node-local volumes | NodeGetVolumeStats (health fields) | Optional (volume health; deprecated, node-side checks moved into kubelet)
    ℹ️
    Leader election for HA controller: When the controller Deployment has replicas > 1, each sidecar must have leader election enabled (--leader-election flag). Only the elected leader processes events. Without leader election, multiple replicas calling CreateVolume simultaneously would create duplicate volumes. The sidecars use a Lease object for leader election.

    CSIDriver Object

    The CSIDriver cluster-scoped object declares a driver's capabilities to Kubernetes. It is created by the driver deployment (usually via Helm), not dynamically registered.

    apiVersion: storage.k8s.io/v1
    kind: CSIDriver
    metadata:
      name: ebs.csi.aws.com     # MUST match the provisioner name in StorageClass
    spec:
      # ── Attach behavior ───────────────────────────────────────────
      attachRequired: true       # true: driver manages attach/detach (block drivers)
                                 # false: no VolumeAttachment created (NFS, CephFS)
    
      # ── Pod metadata injection ────────────────────────────────────
      podInfoOnMount: true       # kubelet passes pod UID/name/namespace in NodePublishVolume
                                 # enables per-pod volume access tracking
    
      # ── Volume lifecycle modes ────────────────────────────────────
      volumeLifecycleModes:
        - Persistent             # supports PV/PVC (standard)
        # - Ephemeral            # supports inline CSI ephemeral volumes (no PVC)
    
      # ── fsGroup handling ──────────────────────────────────────────
      fsGroupPolicy: File        # None | File | ReadWriteOnceWithFSType (default)
                                 # File: kubelet always applies fsGroup ownership/permissions
                                 # None: kubelet never changes ownership (e.g., NFS root squash)
                                 # ReadWriteOnceWithFSType: only when fsType is set + RWO
    
      # ── Token injection for cloud auth ────────────────────────────
      tokenRequests:
        - audience: sts.amazonaws.com      # AWS IRSA audience
          expirationSeconds: 86400         # token TTL
        # - audience: ""                   # default K8s service account token
    
      # ── Secrets Store CSI or drivers that re-publish volumes ──────
      requiresRepublish: false   # true: kubelet calls NodePublishVolume periodically
                                 # used by Secrets Store CSI to refresh secret values
    
      # ── SELinux mount (alpha 1.25, beta 1.27) ─────────────────────
      seLinuxMount: true         # driver supports mounting with -o context=<label>
                                 # avoids recursive relabeling of volume files

    fsGroupPolicy Values

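    Value | Behavior | Use For
    File | kubelet recursively changes ownership of all files to fsGroup on mount (fsGroupChangePolicy: OnRootMismatch can skip this) | Block CSI drivers (EBS, Azure Disk, GCE PD)
    None | kubelet mounts the volume without changing ownership; permissions are up to the storage backend | NFS, CephFS: server-side uid/gid management
    ReadWriteOnceWithFSType | Ownership changed only when fsType is set AND the access mode is RWO (the default policy) | Drivers that support both block and file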
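    The three policies boil down to a small decision function. The Go sketch below models it with hypothetical local types (FSGroupPolicy, appliesFSGroup); it simplifies kubelet's behavior by ignoring fsGroupChangePolicy and inline ephemeral volumes, and treats an empty policy as the API default, ReadWriteOnceWithFSType.

```go
package main

import "fmt"

// FSGroupPolicy mirrors the CSIDriver spec.fsGroupPolicy values
// (hypothetical local type, not the Kubernetes API package).
type FSGroupPolicy string

const (
	PolicyNone                    FSGroupPolicy = "None"
	PolicyFile                    FSGroupPolicy = "File"
	PolicyReadWriteOnceWithFSType FSGroupPolicy = "ReadWriteOnceWithFSType"
)

// appliesFSGroup reports whether kubelet would apply the pod's fsGroup to the
// volume under the simplified rules in the table above.
func appliesFSGroup(policy FSGroupPolicy, fsType string, accessModes []string) bool {
	switch policy {
	case PolicyNone:
		return false
	case PolicyFile:
		return true
	case PolicyReadWriteOnceWithFSType, "": // empty = API default
		if fsType == "" {
			return false
		}
		for _, m := range accessModes {
			if m == "ReadWriteOnce" {
				return true
			}
		}
		return false
	default:
		return false // unknown policy: leave ownership alone
	}
}

func main() {
	fmt.Println(appliesFSGroup(PolicyFile, "", []string{"ReadWriteMany"}))                         // true
	fmt.Println(appliesFSGroup(PolicyReadWriteOnceWithFSType, "ext4", []string{"ReadWriteOnce"})) // true
	fmt.Println(appliesFSGroup(PolicyReadWriteOnceWithFSType, "", []string{"ReadWriteOnce"}))     // false
	fmt.Println(appliesFSGroup(PolicyNone, "ext4", []string{"ReadWriteOnce"}))                    // false
}
```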

    Identity Service RPCs

    Every CSI driver must implement the Identity service — it is called during driver registration and health checks.

    RPC | Called By | Purpose
    GetPluginInfo | node-driver-registrar, sidecars | Returns driver name and version. The name must match the CSIDriver object name.
    GetPluginCapabilities | All sidecars | Returns which services and features the plugin supports: CONTROLLER_SERVICE, VOLUME_ACCESSIBILITY_CONSTRAINTS (topology), VolumeExpansion (ONLINE or OFFLINE)
    Probe | liveness-probe sidecar (periodic); other sidecars at startup | Returns driver readiness and backs the liveness probe. Return ready: false (rather than an error) while the driver is still initializing.

    Controller Service RPCs

    The controller plugin runs in a Deployment and calls cloud/storage APIs. Not all RPCs are required — declare which you implement via ControllerGetCapabilities.

    CreateVolume

    Called by external-provisioner when a PVC needs a new volume. The most complex RPC — it must handle topology, secrets, cloning, and snapshot restore.

    // CreateVolumeRequest (gRPC, simplified Go-like pseudocode)
    {
      Name: "pvc-abc-123",              // unique name; idempotency key
      CapacityRange: {
        RequiredBytes: 20 * 1024^3,     // 20 GiB minimum
        LimitBytes:    0,               // no upper limit
      },
      VolumeCapabilities: [{
        AccessMode: { Mode: SINGLE_NODE_WRITER },
        Mount: { FsType: "ext4", MountFlags: ["noatime"] },
      }],
      Parameters: {                     // from StorageClass.parameters
        "type": "gp3",
        "encrypted": "true",
      },
      Secrets: { ... },                 // from csi.storage.k8s.io/provisioner-secret-name
      AccessibilityRequirements: {      // from scheduler topology hint
        Requisite: [{ Segments: {"topology.ebs.csi.aws.com/zone": "us-east-1a"} }],
        Preferred: [{ Segments: {"topology.ebs.csi.aws.com/zone": "us-east-1a"} }],
      },
      VolumeContentSource: {            // optional: clone or snapshot restore
        Type: &Snapshot{ SnapshotId: "snap-0abc123" },
        // OR
        Type: &Volume{ VolumeId: "vol-0source" },
      },
    }
    
    // CreateVolumeResponse
    {
      Volume: {
        VolumeId: "vol-0abc123def456",     // opaque to K8s; stored in PV.spec.csi.volumeHandle
        CapacityBytes: 20 * 1024^3,
        VolumeContext: { "throughput": "250" },  // stored in PV.spec.csi.volumeAttributes
        AccessibleTopology: [{ Segments: {"topology.ebs.csi.aws.com/zone": "us-east-1a"} }],
      }
    }

    ControllerPublishVolume / ControllerUnpublishVolume

    Called by external-attacher when a VolumeAttachment object is created/deleted. For cloud block storage, this translates to attaching/detaching the block device to the VM instance.

    // ControllerPublishVolumeRequest
    {
      VolumeId: "vol-0abc123def456",    // EBS volume ID
      NodeId: "i-0abc123def456789",     // EC2 instance ID (from NodeGetInfo)
      VolumeCapability: { ... },
      Readonly: false,
    }
    // Response contains PublishContext, e.g.: {"devicePath": "/dev/xvdf"}
    // This devicePath is passed to NodeStageVolume

    ControllerExpandVolume

    Called by external-resizer when PVC storage is increased. Resizes the backing cloud volume. Returns the new capacity and whether NodeExpansionRequired is true (meaning the filesystem on the node also needs resizing).

    CreateSnapshot / DeleteSnapshot

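    Called by external-snapshotter. Parameters come from the VolumeSnapshotClass. The driver must handle idempotency: if a snapshot with the same name already exists for the same source volume, return it rather than failing; if the name is already used for a different source volume, return ALREADY_EXISTS.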

    Node Service RPCs

    The node plugin runs as a DaemonSet. Kubelet calls it directly via the Unix socket registered at /var/lib/kubelet/plugins/<driver-name>/csi.sock.

    NodeStageVolume vs NodePublishVolume

    This is the most subtle distinction in CSI. The two-phase mount exists to support multiple pods sharing the same block device on a single node:

    BLOCK VOLUME MOUNT FLOW (two-phase)
    
    Cloud storage backend
      vol-0abc123 attached to node as /dev/xvdf
            │
            ▼
    NodeStageVolume (called once per volume per node)
      stagingTargetPath: /var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-abc/globalmount
      Operations:
        - fsck (filesystem check)
        - mkfs.ext4 /dev/xvdf  (if new volume)
        - mount /dev/xvdf /var/lib/kubelet/.../globalmount
      Result: device formatted and mounted at a GLOBAL path on the node
            │
            ▼ (for each pod using this volume)
    NodePublishVolume (called once per pod per volume)
      targetPath: /var/lib/kubelet/pods/<pod-uid>/volumes/kubernetes.io~csi/<vol-name>/mount
      Operations:
        - mount --bind /globalmount /var/lib/kubelet/pods/.../mount
      Result: bind-mount from global mount into the specific pod's directory
            │
            ▼
    Container sees /var/data (mountPath) backed by the bind-mounted filesystem
    
    UNMOUNT FLOW (reverse):
      Pod terminates → NodeUnpublishVolume (remove bind-mount from pod dir)
      Last pod using volume → NodeUnstageVolume (unmount the global mount)
      VolumeAttachment deleted → ControllerUnpublishVolume (detach from node)
    

    For NFS/CephFS (filesystem volumes without attachment), attachRequired: false is set in the CSIDriver object, and the driver may skip the staging phase entirely by not advertising the STAGE_UNSTAGE_VOLUME node capability; it then mounts directly in NodePublishVolume.

    NodeExpandVolume

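    Called by kubelet after ControllerExpandVolume returns NodeExpansionRequired: true. If the driver supports online expansion, kubelet calls it on the already-mounted volume; otherwise it runs the next time the volume is staged on a node (for example, after the pod is restarted). It runs the filesystem-level resize command: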

    // NodeExpandVolumeRequest
    {
      VolumeId: "vol-0abc123",
      VolumePath: "/var/lib/kubelet/pods/.../mount",   // the pod's target path on the host, not the container mountPath
      StagingTargetPath: "/var/lib/.../globalmount",
      CapacityRange: { RequiredBytes: 50 * 1024^3 },   // 50 GiB target
      VolumeCapability: { Mount: { FsType: "ext4" } },
    }
    
    // Driver implementation:
    // 1. Find the block device from stagingTargetPath
    // 2. Run: resize2fs /dev/xvdf  (for ext4)
    //    Or:  xfs_growfs /var/lib/.../globalmount  (for xfs — uses mount path, not device)
    // 3. Return new capacity
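    Step 2 above depends on the filesystem type. A sketch of that dispatch, assuming the device path has already been resolved from the staging path (that lookup is not shown) and that an empty fsType means ext4; it builds the exec.Cmd without running it so the choice can be inspected:

```go
package main

import (
	"fmt"
	"os/exec"
)

// resizeCommand builds (but does not run) the filesystem grow command a
// NodeExpandVolume implementation might execute. ext3/ext4 are grown through
// the block device; XFS is grown through its mount point. Treating an empty
// fsType as ext4 is an assumption mirroring a common driver default.
func resizeCommand(fsType, devicePath, stagingPath string) (*exec.Cmd, error) {
	switch fsType {
	case "ext3", "ext4", "":
		return exec.Command("resize2fs", devicePath), nil
	case "xfs":
		return exec.Command("xfs_growfs", stagingPath), nil
	default:
		return nil, fmt.Errorf("filesystem resize not supported for fsType %q", fsType)
	}
}

func main() {
	for _, fs := range []string{"ext4", "xfs", "btrfs"} {
		cmd, err := resizeCommand(fs, "/dev/xvdf", "/var/lib/kubelet/plugins/example/globalmount")
		if err != nil {
			fmt.Println(err)
			continue
		}
		fmt.Println(cmd.Args) // a real driver would run cmd.CombinedOutput()
	}
}
```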

    NodeGetInfo

    Returns the node's unique ID (used in ControllerPublishVolume) and its topology segments. The topology information is used by the scheduler to place pods near their volumes.

    // NodeGetInfoResponse
    {
      NodeId: "i-0abc123def456789",           // instance ID for ControllerPublishVolume
      MaxVolumesPerNode: 25,                   // scheduler enforces this limit
      AccessibleTopology: {
        Segments: {
          "topology.ebs.csi.aws.com/zone": "us-east-1a"
        }
      }
    }
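    The topology segments a node reports here come back to the controller as the Requisite and Preferred lists in CreateVolume. A hypothetical sketch of how a driver might choose a zone from those lists, trying preferred entries in order and falling back to the requisite list; real drivers may also weigh capacity or quotas. The chosen zone is what the driver would return in AccessibleTopology.

```go
package main

import (
	"errors"
	"fmt"
)

// segments stands in for csi.Topology.Segments (hypothetical local type).
type segments map[string]string

// zoneKey matches the topology key used in the examples above.
const zoneKey = "topology.ebs.csi.aws.com/zone"

// pickZone walks the preferred list in order and takes the first zone that is
// also requisite; otherwise it falls back to the first requisite zone.
func pickZone(requisite, preferred []segments) (string, error) {
	allowed := map[string]bool{}
	for _, s := range requisite {
		if z, ok := s[zoneKey]; ok {
			allowed[z] = true
		}
	}
	for _, s := range preferred {
		if z, ok := s[zoneKey]; ok && (len(allowed) == 0 || allowed[z]) {
			return z, nil
		}
	}
	for _, s := range requisite {
		if z, ok := s[zoneKey]; ok {
			return z, nil
		}
	}
	return "", errors.New("no topology requirement: the driver must choose a default zone")
}

func main() {
	zone, err := pickZone(
		[]segments{{zoneKey: "us-east-1a"}, {zoneKey: "us-east-1b"}},
		[]segments{{zoneKey: "us-east-1b"}, {zoneKey: "us-east-1a"}},
	)
	fmt.Println(zone, err) // us-east-1b <nil>
}
```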

    NodeGetVolumeStats

    Returns usage statistics for a mounted volume. Kubelet calls this periodically to populate kubelet_volume_stats_* metrics and to check volume health conditions.

    // NodeGetVolumeStatsResponse
    {
      Usage: [
        { Available: 15 * 1024^3, Total: 20 * 1024^3, Used: 5 * 1024^3, Unit: BYTES },
        { Available: 6500000, Total: 6553600, Used: 53600, Unit: INODES },
      ],
      VolumeCondition: {           // volume health monitoring (optional)
        Abnormal: false,
        Message: "",
        // Abnormal: true when driver detects I/O errors, corruption, etc.
      }
    }

    Volume Health Monitoring

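    CSI volume health monitoring (alpha; introduced in 1.19, with the node-side checks moved into kubelet in 1.21 behind the CSIVolumeHealth feature gate) lets drivers report that a volume is unhealthy. The signal surfaces as Kubernetes Events on the PVC (controller side) and on Pods using the volume (node side).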

    Health Monitoring Components

    Component | Where | Checks
    external-health-monitor-controller | Controller Deployment sidecar | Calls ControllerGetVolume / ListVolumes periodically; detects backend-side anomalies (for example, a degraded volume or one deleted outside Kubernetes)
    external-health-monitor-agent (deprecated; kubelet now does this behind the CSIVolumeHealth feature gate) | Node DaemonSet sidecar | Calls NodeGetVolumeStats on mounted volumes; checks VolumeCondition.Abnormal

    When a volume is detected as abnormal, the controller emits a Kubernetes Event on the PVC:

    kubectl describe pvc data-postgres-0 -n production
    # Events:
    #   Type     Reason               Age   From                       Message
    #   ----     ------               ---   ----                       -------
    #   Warning  VolumeConditionAbnormal  2m  external-health-monitor  Volume condition is abnormal: I/O error detected

    Pod Info on Mount

    When podInfoOnMount: true is set in the CSIDriver, kubelet injects the requesting pod's metadata into NodePublishVolume's VolumeContext. This enables drivers to implement per-pod access control, logging, or billing.

    // NodePublishVolumeRequest.VolumeContext when podInfoOnMount:true
    {
      "csi.storage.k8s.io/pod.name":            "postgres-0",
      "csi.storage.k8s.io/pod.namespace":       "production",
      "csi.storage.k8s.io/pod.uid":             "abc-123-def",
      "csi.storage.k8s.io/serviceAccount.name": "postgres",
      // plus any volumeAttributes from the PV spec
    }

    The Secrets Store CSI Driver uses this to look up the requesting pod's service account and assume the appropriate cloud IAM role for secret retrieval — without any cluster-wide credentials in the driver.
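    On the driver side, reading these keys is a map lookup. A sketch with a hypothetical podInfo type and parsePodInfo helper, using the key names shown above; a driver doing per-pod access control would then key its decision on ServiceAccount and Namespace.

```go
package main

import (
	"errors"
	"fmt"
)

// Keys kubelet adds to NodePublishVolume's VolumeContext when the CSIDriver
// sets podInfoOnMount: true (as listed above).
const (
	keyPodName        = "csi.storage.k8s.io/pod.name"
	keyPodNamespace   = "csi.storage.k8s.io/pod.namespace"
	keyPodUID         = "csi.storage.k8s.io/pod.uid"
	keyServiceAccount = "csi.storage.k8s.io/serviceAccount.name"
)

// podInfo is a hypothetical struct a driver might build from those keys.
type podInfo struct {
	Name, Namespace, UID, ServiceAccount string
}

// parsePodInfo extracts pod identity; other keys (volumeAttributes) are ignored.
func parsePodInfo(volumeContext map[string]string) (podInfo, error) {
	pi := podInfo{
		Name:           volumeContext[keyPodName],
		Namespace:      volumeContext[keyPodNamespace],
		UID:            volumeContext[keyPodUID],
		ServiceAccount: volumeContext[keyServiceAccount],
	}
	if pi.Name == "" || pi.Namespace == "" || pi.UID == "" {
		return podInfo{}, errors.New("pod info missing: is podInfoOnMount enabled on the CSIDriver?")
	}
	return pi, nil
}

func main() {
	pi, err := parsePodInfo(map[string]string{
		keyPodName:        "postgres-0",
		keyPodNamespace:   "production",
		keyPodUID:         "abc-123-def",
		keyServiceAccount: "postgres",
		"throughput":      "250", // a volumeAttribute from the PV, passed through unchanged
	})
	fmt.Println(pi.Namespace+"/"+pi.Name, pi.ServiceAccount, err) // production/postgres-0 postgres <nil>
}
```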

    Secrets Store CSI Driver

    One of the most widely deployed CSI ephemeral drivers. It mounts secrets from external secret stores (AWS Secrets Manager / Parameter Store, Azure Key Vault, GCP Secret Manager, HashiCorp Vault) as files in pods, so the secret values are not stored in Kubernetes etcd unless you opt in to syncing them into a Kubernetes Secret (secretObjects).

    apiVersion: secrets-store.csi.x-k8s.io/v1
    kind: SecretProviderClass
    metadata:
      name: aws-secrets
      namespace: production
    spec:
      provider: aws
      parameters:
        objects: |
          - objectName: "prod/db/password"
            objectType: "secretsmanager"
            objectAlias: "db-password"        # filename in the pod
          - objectName: "/prod/api-key"
            objectType: "ssmparameter"
            objectAlias: "api-key"
      secretObjects:                          # optional: sync to K8s Secret for env var use
      - secretName: db-secret
        type: Opaque
        data:
        - objectName: db-password
          key: password
    # Pod spec fragment that mounts the SecretProviderClass above
    volumes:
    - name: secrets
      csi:
        driver: secrets-store.csi.k8s.io
        readOnly: true
        volumeAttributes:
          secretProviderClass: aws-secrets    # reference to SecretProviderClass
    
    containers:
    - name: app
      volumeMounts:
      - name: secrets
        mountPath: /mnt/secrets
        readOnly: true
      env:
      - name: DB_PASSWORD                    # also available as env var via synced K8s Secret
        valueFrom:
          secretKeyRef:
            name: db-secret
            key: password

    Token Rotation with requiresRepublish

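    When the CSIDriver has requiresRepublish: true, kubelet calls NodePublishVolume periodically (on its regular sync, on the order of a minute) for volumes that are already mounted. The Secrets Store driver can use these calls to re-fetch secrets from the external store and update the mounted files, so rotated values reach the pod without a restart. Environment variables populated from a synced Kubernetes Secret are not updated inside a running container; those still need a pod restart.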

    Node Volume Limits

    Each node can attach a limited number of block volumes. The CSI driver reports this via NodeGetInfo.MaxVolumesPerNode, and the kube-scheduler's NodeVolumeLimits plugin enforces it.

    Cloud / Instance | Max Volumes | Notes
    AWS (most Nitro instance types) | 28 attachments, shared | EBS volumes share the limit with network interfaces and NVMe instance store volumes; EBS CSI reads instance metadata to compute what it reports
    AWS (older Xen-based types) | up to 39 EBS volumes | Type-dependent
    GCE | 16 (shared-core types) to 128 (most other types) | pd.csi.storage.gke.io derives the limit from the machine type
    Azure (Standard_D series) | 4–64 (SKU-dependent) | disk.csi.azure.com reads from instance metadata
    # Check the attach limit each CSI driver reported for a node
    # (CSI limits live on the CSINode object, not in node Allocatable)
    kubectl get csinode <node> -o jsonpath='{range .spec.drivers[*]}{.name}{": "}{.allocatable.count}{"\n"}{end}'
    
    # Count volumes currently attached to that node
    kubectl get volumeattachments -o json | \
      jq --arg node "<node>" '[.items[] | select(.spec.nodeName == $node and .status.attached)] | length'
    
    # Identify pods on the node with many volumes (counts every volume type, not only CSI)
    kubectl get pods --all-namespaces --field-selector spec.nodeName=<node> -o json | \
      jq -r '.items[] | "\(.metadata.namespace)/\(.metadata.name): \(.spec.volumes | length) volumes"' | sort -t: -k2 -rn

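    The scheduler-side check is essentially counting. This hypothetical sketch models the rule the NodeVolumeLimits plugin applies (it is not the plugin's code): the limit comes from CSINode .spec.drivers[].allocatable.count, volumes already on the node count once, and a node with no reported limit is unconstrained.

```go
package main

import "fmt"

// nodeVolumes is a hypothetical snapshot of one node for one CSI driver.
type nodeVolumes struct {
	Limit int             // CSINode .spec.drivers[].allocatable.count; 0 = not reported
	InUse map[string]bool // unique volume handles already used by pods on the node
}

// fits reports whether a pod needing podVolumes can be placed on the node.
// Volumes already on the node, or repeated within the pod, count only once.
func fits(n nodeVolumes, podVolumes []string) bool {
	if n.Limit <= 0 {
		return true // no limit reported: nothing to enforce
	}
	added := map[string]bool{}
	for _, v := range podVolumes {
		if !n.InUse[v] {
			added[v] = true
		}
	}
	return len(n.InUse)+len(added) <= n.Limit
}

func main() {
	n := nodeVolumes{Limit: 25, InUse: map[string]bool{}}
	for i := 0; i < 24; i++ {
		n.InUse[fmt.Sprintf("vol-%02d", i)] = true
	}
	fmt.Println(fits(n, []string{"vol-new"}))             // true: 24 + 1 <= 25
	fmt.Println(fits(n, []string{"vol-new", "vol-new2"})) // false: 24 + 2 > 25
	fmt.Println(fits(n, []string{"vol-00", "vol-new"}))   // true: vol-00 is already attached
}
```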
    Driver Deployment Patterns

    A typical CSI driver Helm chart deploys two workloads and several supporting objects:

    # Typical CSI driver RBAC requirements
    
    # Controller ServiceAccount needs:
    # - Secrets: get (for StorageClass secrets)
    # - PersistentVolumes: get, list, watch, create, delete, patch
    # - PersistentVolumeClaims: get, list, watch, update (for resize status)
    # - StorageClasses: get, list, watch
    # - VolumeAttachments: get, list, watch, patch
    # - CSINodes: get, list, watch
    # - VolumeSnapshots / VolumeSnapshotContents: get, list, watch, create, delete, patch
    # - Events: create, patch
    # - Leases: get, create, update (for leader election)
    
    # Node ServiceAccount needs:
    # - Secrets: get (for node-stage secrets)
    # - Pods: get (when podInfoOnMount: true)
    # - PersistentVolumeClaims: get, list, watch
    # - CSINodes: get, create, update, patch
    # - Nodes: get, list, watch, patch

    Controller HA with Leader Election

    # Controller Deployment sidecar flags for HA
    containers:
    - name: csi-provisioner
      image: registry.k8s.io/sig-storage/csi-provisioner:v3.6.0
      args:
        - --csi-address=/csi/csi.sock
        - --feature-gates=Topology=true
        - --leader-election                       # enable leader election
        - --leader-election-namespace=kube-system
        - --timeout=60s                           # RPC timeout
        - --retry-interval-start=1s
        - --retry-interval-max=5m
        - --worker-threads=100                    # parallel reconciliation workers
    
    - name: csi-attacher
      image: registry.k8s.io/sig-storage/csi-attacher:v4.4.0
      args:
        - --csi-address=/csi/csi.sock
        - --leader-election
        - --leader-election-namespace=kube-system
        - --timeout=300s      # longer timeout for attach operations

    Driver Upgrade Strategy

    Upgrading a CSI driver without disrupting running pods requires careful ordering. The node plugin (DaemonSet) and controller plugin (Deployment) can be upgraded independently.

    1. Check compatibility — verify new driver version supports existing PV volumeHandles and StorageClass parameters. Read the driver changelog for breaking changes.
    2. Upgrade controller plugin first — scale down old Deployment, apply new Deployment. New provisioner/attacher sidecars start. Existing mounts are unaffected (controller handles only create/delete/attach/detach).
    3. Upgrade node plugin (DaemonSet) — set maxUnavailable: 1 in DaemonSet update strategy. Pods on each node are replaced one at a time. Running pods continue using existing mounts during the upgrade window.
    4. Verify — create a test PVC, mount it in a pod, write data, delete. Check CSI driver logs for errors.
    5. Monitor — watch for csi_operations_seconds P99 latency increase during rollout; watch for storage_operation_errors_total.
    # Safe DaemonSet update strategy for CSI node plugin
    spec:
      updateStrategy:
        type: RollingUpdate
        rollingUpdate:
          maxUnavailable: 1     # replace one node plugin at a time
                                # avoids all nodes losing CSI capability simultaneously
    ⚠️
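    Node plugin upgrade does not restart pods: Upgrading the node plugin DaemonSet does not affect running pods whose volumes are already mounted, because those mounts live in the host's mount namespace. The new node plugin handles subsequent mount/unmount requests. Two caveats: while the node plugin pod on a node is restarting, new mounts and unmounts on that node fail and are retried by kubelet; and if the socket path changes between versions, kubelet cannot reach the driver until node-driver-registrar re-registers it. Drivers whose mounts are served by a userspace process inside the node plugin pod (FUSE-based drivers) are an exception: restarting that pod can break existing mounts.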

    Minimal CSI Driver Skeleton

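    A minimal read/write block CSI driver requires implementing three services. This outline shows the structure; production drivers typically add helper libraries such as k8s.io/mount-utils (safe format-and-mount, mount-point checks) and kubernetes-csi/csi-lib-utils.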

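    // main.go: CSI driver entry point (outline; backend calls and mount-state checks are comments)
    package main
    
    import (
        "context"
        "flag"
        "log"
        "net"
        "os"
        "os/exec"
    
        "github.com/container-storage-interface/spec/lib/go/csi"
        "google.golang.org/grpc"
        "google.golang.org/grpc/codes"
        "google.golang.org/grpc/status"
        "google.golang.org/protobuf/types/known/wrapperspb"
    )
    
    // Embedding the generated Unimplemented*Server types satisfies the full
    // interfaces; any RPC not written below returns codes.Unimplemented.
    type MyDriver struct {
        csi.UnimplementedIdentityServer
        csi.UnimplementedControllerServer
        csi.UnimplementedNodeServer
        nodeID, zone string
    }
    
    func main() {
        endpoint := flag.String("endpoint", "/csi/csi.sock", "Unix socket path")
        nodeID := flag.String("node-id", "", "node ID reported by NodeGetInfo")
        zone := flag.String("zone", "", "topology zone reported by NodeGetInfo")
        flag.Parse()
    
        driver := &MyDriver{nodeID: *nodeID, zone: *zone}
        server := grpc.NewServer()
        csi.RegisterIdentityServer(server, driver)   // required by all plugins
        csi.RegisterControllerServer(server, driver) // used in the controller plugin
        csi.RegisterNodeServer(server, driver)       // used in the node plugin
    
        _ = os.Remove(*endpoint) // clear a stale socket left by a previous container
        lis, err := net.Listen("unix", *endpoint)
        if err != nil {
            log.Fatalf("listen on %s: %v", *endpoint, err)
        }
        log.Fatal(server.Serve(lis))
    }
    
    // Identity Service (required by all)
    func (d *MyDriver) GetPluginInfo(ctx context.Context, req *csi.GetPluginInfoRequest) (*csi.GetPluginInfoResponse, error) {
        return &csi.GetPluginInfoResponse{
            Name:          "my.csi.driver.example", // must equal the CSIDriver object name
            VendorVersion: "v1.0.0",
        }, nil
    }
    func (d *MyDriver) GetPluginCapabilities(ctx context.Context, req *csi.GetPluginCapabilitiesRequest) (*csi.GetPluginCapabilitiesResponse, error) {
        return &csi.GetPluginCapabilitiesResponse{
            Capabilities: []*csi.PluginCapability{{
                Type: &csi.PluginCapability_Service_{
                    Service: &csi.PluginCapability_Service{
                        Type: csi.PluginCapability_Service_CONTROLLER_SERVICE,
                    },
                },
            }},
        }, nil
    }
    func (d *MyDriver) Probe(ctx context.Context, req *csi.ProbeRequest) (*csi.ProbeResponse, error) {
        return &csi.ProbeResponse{Ready: &wrapperspb.BoolValue{Value: true}}, nil
    }
    
    // Controller Service (minimum for dynamic provisioning)
    func (d *MyDriver) ControllerGetCapabilities(ctx context.Context, req *csi.ControllerGetCapabilitiesRequest) (*csi.ControllerGetCapabilitiesResponse, error) {
        // Sidecars only call advertised RPCs. Add PUBLISH_UNPUBLISH_VOLUME once
        // ControllerPublishVolume is implemented (CSIDriver attachRequired: true).
        return &csi.ControllerGetCapabilitiesResponse{
            Capabilities: []*csi.ControllerServiceCapability{{
                Type: &csi.ControllerServiceCapability_Rpc{Rpc: &csi.ControllerServiceCapability_RPC{
                    Type: csi.ControllerServiceCapability_RPC_CREATE_DELETE_VOLUME,
                }},
            }},
        }, nil
    }
    func (d *MyDriver) CreateVolume(ctx context.Context, req *csi.CreateVolumeRequest) (*csi.CreateVolumeResponse, error) {
        // 1. Validate parameters and capacity range
        // 2. Call backend API to create disk (idempotent: look it up by req.GetName() first)
        // 3. Return volume ID and topology
        return nil, status.Error(codes.Unimplemented, "CreateVolume: backend call not written yet")
    }
    func (d *MyDriver) DeleteVolume(ctx context.Context, req *csi.DeleteVolumeRequest) (*csi.DeleteVolumeResponse, error) {
        // 1. Call backend API to delete disk
        // 2. Idempotent: if volume not found, return success
        return nil, status.Error(codes.Unimplemented, "DeleteVolume: backend call not written yet")
    }
    
    // Node Service (minimum for mounting)
    func (d *MyDriver) NodeGetCapabilities(ctx context.Context, req *csi.NodeGetCapabilitiesRequest) (*csi.NodeGetCapabilitiesResponse, error) {
        // Without STAGE_UNSTAGE_VOLUME, kubelet never calls NodeStageVolume.
        return &csi.NodeGetCapabilitiesResponse{
            Capabilities: []*csi.NodeServiceCapability{{
                Type: &csi.NodeServiceCapability_Rpc{Rpc: &csi.NodeServiceCapability_RPC{
                    Type: csi.NodeServiceCapability_RPC_STAGE_UNSTAGE_VOLUME,
                }},
            }},
        }, nil
    }
    func (d *MyDriver) NodeStageVolume(ctx context.Context, req *csi.NodeStageVolumeRequest) (*csi.NodeStageVolumeResponse, error) {
        // devicePath comes from ControllerPublishVolume's PublishContext.
        devicePath := req.GetPublishContext()["devicePath"]
        // 1. Idempotent: return success if the staging path is already mounted
        // 2. Format only if blkid finds no filesystem; never run mkfs on a device blindly
        if err := run(ctx, "mount", devicePath, req.GetStagingTargetPath()); err != nil {
            return nil, err
        }
        return &csi.NodeStageVolumeResponse{}, nil
    }
    func (d *MyDriver) NodePublishVolume(ctx context.Context, req *csi.NodePublishVolumeRequest) (*csi.NodePublishVolumeResponse, error) {
        target := req.GetTargetPath()
        if err := os.MkdirAll(target, 0o750); err != nil { // the plugin creates target_path
            return nil, status.Error(codes.Internal, err.Error())
        }
        if err := run(ctx, "mount", "--bind", req.GetStagingTargetPath(), target); err != nil {
            return nil, err
        }
        if req.GetReadonly() { // a bind mount only becomes read-only via a remount
            if err := run(ctx, "mount", "-o", "remount,bind,ro", target); err != nil {
                return nil, err
            }
        }
        return &csi.NodePublishVolumeResponse{}, nil
    }
    func (d *MyDriver) NodeUnpublishVolume(ctx context.Context, req *csi.NodeUnpublishVolumeRequest) (*csi.NodeUnpublishVolumeResponse, error) {
        // Idempotent: a real driver returns success if the target is not mounted
        if err := run(ctx, "umount", req.GetTargetPath()); err != nil {
            return nil, err
        }
        return &csi.NodeUnpublishVolumeResponse{}, nil
    }
    func (d *MyDriver) NodeUnstageVolume(ctx context.Context, req *csi.NodeUnstageVolumeRequest) (*csi.NodeUnstageVolumeResponse, error) {
        if err := run(ctx, "umount", req.GetStagingTargetPath()); err != nil {
            return nil, err
        }
        return &csi.NodeUnstageVolumeResponse{}, nil
    }
    func (d *MyDriver) NodeGetInfo(ctx context.Context, req *csi.NodeGetInfoRequest) (*csi.NodeGetInfoResponse, error) {
        return &csi.NodeGetInfoResponse{
            NodeId:            d.nodeID,
            MaxVolumesPerNode: 20,
            AccessibleTopology: &csi.Topology{
                Segments: map[string]string{"my.csi.driver.example/zone": d.zone}, // driver-prefixed key
            },
        }, nil
    }
    
    // run executes a mount helper and maps failures to a gRPC Internal error.
    func run(ctx context.Context, name string, args ...string) error {
        if out, err := exec.CommandContext(ctx, name, args...).CombinedOutput(); err != nil {
            return status.Errorf(codes.Internal, "%s %v: %v: %s", name, args, err, out)
        }
        return nil
    }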

    CSI Migration

    The in-tree volume plugin migration to CSI was a multi-year project. Each plugin had a feature gate; migration is now complete for all major cloud providers. The migration is transparent: existing PVs with in-tree types are silently routed to the corresponding CSI driver.

    In-Tree Plugin | CSI Driver | Migration Status
    kubernetes.io/aws-ebs | ebs.csi.aws.com | Migration GA 1.25; in-tree removed 1.27
    kubernetes.io/gce-pd | pd.csi.storage.gke.io | Migration GA 1.25; in-tree removed 1.28
    kubernetes.io/azure-disk | disk.csi.azure.com | Migration GA 1.24; in-tree removed 1.27
    kubernetes.io/azure-file | file.csi.azure.com | Migration GA 1.26; in-tree removed 1.27
    kubernetes.io/cinder (OpenStack) | cinder.csi.openstack.org | Migration GA 1.24; in-tree removed 1.26
    kubernetes.io/vsphere-volume | csi.vsphere.volume | Migration GA 1.26; in-tree removed 1.26
    🔴
    Upgrade path requirement: Before upgrading past the removal version, you must install the CSI driver and ensure the migration feature gate was enabled in the prior version (to translate existing PV specs). If you skip the migration step and upgrade directly, existing PVs with removed in-tree plugin types will be invalid and pods will fail to start. Check your current Kubernetes version against the removal table above before upgrading.

    Metrics and Alerting

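    Metric | Source | Alert Threshold
    csi_sidecar_operations_seconds{driver_name,method_name} | External sidecars (controller RPCs) | P99 CreateVolume >60s
    csi_operations_seconds{driver_name,method_name} | kubelet (node RPCs) | P99 NodePublishVolume >30s
    storage_operation_duration_seconds{operation_name} | kube-controller-manager (attach), kubelet (mount) | P99 volume_attach >60s, volume_mount >30s
    storage_operation_errors_total{operation_name} | kubelet, kube-controller-manager | Any non-zero rate
    attachdetach_controller_total_volumes{state} | kube-controller-manager | desired_state_of_world vs actual_state_of_world divergence for >10min indicates a stuck attach or detach
    volume_manager_total_volumes{state} | kubelet | desired_state_of_world vs actual_state_of_world divergence for >5min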

    Alerting Rules

    groups:
    - name: csi-drivers
      rules:
      - alert: CSIProvisioningErrors
        expr: |
          sum by (driver_name) (
            rate(csi_sidecar_operations_seconds_count{method_name=~".*CreateVolume", grpc_status_code!="OK"}[5m])
          ) > 0
        for: 2m
        labels: {severity: warning}
        annotations:
          summary: "CSI CreateVolume errors for driver {{ $labels.driver_name }}"
    
      - alert: CSINodePublishSlow
        expr: |
          histogram_quantile(0.99,
            sum by (le, volume_plugin) (
              rate(storage_operation_duration_seconds_bucket{operation_name="volume_mount"}[10m])
            )
          ) > 30
        for: 5m
        labels: {severity: warning}
        annotations:
          summary: "Volume mount (NodePublishVolume) P99 > 30s for {{ $labels.volume_plugin }}: pods are slow to start"
    
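      - alert: VolumeAttachDetachStuck
        expr: |
          sum by (plugin_name) (attachdetach_controller_total_volumes{state="desired_state_of_world"})
            != sum by (plugin_name) (attachdetach_controller_total_volumes{state="actual_state_of_world"})
        for: 10m
        labels: {severity: warning}
        annotations:
          summary: "Attach/detach desired and actual state differ for {{ $labels.plugin_name }} for >10 minutes"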
    
      - alert: CSIDriverDown
        expr: |
          up{job=~".*csi.*"} == 0
        for: 2m
        labels: {severity: critical}
        annotations:
          summary: "CSI driver {{ $labels.job }} is down — new mounts will fail"

    Troubleshooting Runbooks

    Runbook: Pod Stuck ContainerCreating — CSI NodePublish Failed

    # 1. Get the error
    kubectl describe pod <name> -n <ns>
    # Events: "MountVolume.SetUp failed for volume... NodePublishVolume failed"
    
    # 2. Check node plugin logs (DaemonSet pod on the same node)
    NODE=$(kubectl get pod <name> -n <ns> -o jsonpath='{.spec.nodeName}')
    kubectl logs -n kube-system \
      $(kubectl get pod -n kube-system -l app=ebs-csi-node --field-selector spec.nodeName=$NODE -o name) \
      -c ebs-plugin --tail=50   # driver container; ebs-plugin for the AWS EBS driver, name varies by driver
    
    # 3. Check if the staging path exists (NodeStage may have partially completed)
    #    kubectl debug mounts the node's root filesystem at /host
    kubectl debug node/$NODE -it --image=busybox -- \
      ls /host/var/lib/kubelet/plugins/kubernetes.io/csi/
    
    # 4. Check if the block device is available on the node (lsblk runs from the host, not busybox)
    kubectl debug node/$NODE -it --image=busybox -- chroot /host lsblk

    Runbook: CSI Provisioner Not Creating PVs

    # 1. Describe PVC — check events
    kubectl describe pvc <name> -n <ns>
    
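    # 2. Check external-provisioner logs (with leader election, only the leader replica
    #    acts on PVCs; if this pod's log shows no activity, check the other replicas)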
    kubectl logs -n kube-system \
      $(kubectl get pod -n kube-system -l app=ebs-csi-controller -o name | head -1) \
      -c csi-provisioner --tail=100
    
    # Common errors:
    # "error calling CreateVolume: ... InvalidParameterValue" → bad StorageClass parameters
    # "failed to create volume: ... UnauthorizedOperation" → missing IAM permissions
    # "context deadline exceeded" → CSI driver not responding (check driver pod health)
    
    # 3. Check driver pod is healthy
    kubectl get pods -n kube-system -l app=ebs-csi-controller
    # If CrashLoopBackOff: kubectl logs -n kube-system <pod> -c ebs-plugin --previous   # driver container name varies by driver
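The common-error list in this runbook is effectively a lookup from log substring to likely cause. A sketch of that triage as a filter, assuming external-provisioner log lines are piped on stdin (for example from `kubectl logs ... -c csi-provisioner`). `triage` and the pattern table are hypothetical and cover only the three patterns listed above; extend them with your driver's own messages.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strings"
)

// causes maps a substring seen in external-provisioner logs to the likely
// cause from the runbook. The first matching pattern wins.
var causes = []struct{ pattern, cause string }{
	{"InvalidParameterValue", "bad StorageClass parameters"},
	{"UnauthorizedOperation", "missing IAM permissions"},
	{"context deadline exceeded", "CSI driver not responding; check driver pod health"},
}

// triage returns the likely cause for one log line, or "" if nothing matches.
func triage(line string) string {
	for _, c := range causes {
		if strings.Contains(line, c.pattern) {
			return c.cause
		}
	}
	return ""
}

func main() {
	// Usage: kubectl logs ... -c csi-provisioner | go run triage.go
	sc := bufio.NewScanner(os.Stdin)
	for sc.Scan() {
		if cause := triage(sc.Text()); cause != "" {
			fmt.Printf("%s\n  -> %s\n", sc.Text(), cause)
		}
	}
}
```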

    Runbook: VolumeAttachment Stuck in Attaching State

    # Attachment has been in progress for >5 minutes
    kubectl get volumeattachment
    # NAME                          ATTACHER          PV              NODE       ATTACHED
    # csi-abc123                    ebs.csi.aws.com   pv-prod-db      node-1     false
    
    # 1. Check external-attacher logs
    kubectl logs -n kube-system \
      $(kubectl get pod -n kube-system -l app=ebs-csi-controller -o name | head -1) \
      -c csi-attacher --tail=50
    
    # 2. Check cloud-side: is the volume actually attached to the node?
    # AWS: aws ec2 describe-volumes --volume-ids vol-0abc123 | jq '.Volumes[].Attachments'
    
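    # 3. If the cloud side shows the volume detached but the VolumeAttachment is stuck, delete it:
    kubectl delete volumeattachment csi-abc123
    # The attach/detach controller in kube-controller-manager recreates it if a pod on the node
    # still needs the volume, which triggers a fresh attach attempt. If the delete hangs, the
    # external-attacher finalizer has not been released; check the attacher logs from step 1.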

    Runbook: NodeGetVolumeStats Returns Abnormal — Volume Health Alert

    # PVC Event shows VolumeConditionAbnormal
    kubectl describe pvc <name> -n <ns>
    # Warning: VolumeConditionAbnormal ...
    
    # 1. Check what the driver is reporting
    kubectl logs -n kube-system <node-plugin-pod> -c external-health-monitor-agent | tail -50
    
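    # 2. Check the node's kernel log for I/O errors (an unprivileged container usually cannot read it)
    kubectl debug node/<node> -i --image=busybox -- chroot /host dmesg | grep -i "I/O error"
    kubectl debug node/<node> -i --image=busybox -- chroot /host journalctl -k | grep -i "I/O error"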
    
    # 3. If filesystem corruption is suspected:
    # - Take snapshot immediately (before further damage)
    # - Stop writes to the volume
    # - Run fsck on an unmounted clone of the snapshot

    Runbook: Node Volume Limit Reached — Pod Pending

    # Pod stuck Pending with:
    # "0/3 nodes are available: 3 node(s) exceed max volume count"
    
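    # 1. Check the per-node CSI volume limit (CSINode allocatable) and the current attachment count
    kubectl get csinode <node> -o jsonpath='{range .spec.drivers[*]}{.name}{": "}{.allocatable.count}{"\n"}{end}'
    kubectl get volumeattachment -o jsonpath='{range .items[?(@.spec.nodeName=="<node>")]}{.metadata.name}{"\n"}{end}' | wc -l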
    
    # 2. Find which pods are using the most volumes on congested nodes
    kubectl get pods --all-namespaces -o json | \
      jq -r '.items[] | select(.spec.nodeName=="<node>") |
      "\(.metadata.namespace)/\(.metadata.name): \((.spec.volumes // []) | map(select(.persistentVolumeClaim)) | length) PVCs"' | \
      sort -t: -k2 -rn | head -10
    
    # 3. Options:
    # a) Add more nodes to the cluster
    # b) Consolidate workloads (fewer, larger PVCs per pod)
    # c) Use EBS multi-attach (io1/io2) for shared access where appropriate
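    # d) AWS: move to instance types with a higher EBS attachment limit (per-instance volume
    #    attachment limits are set by instance type and cannot be raised through AWS Support)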

    Best Practices

    1. Always deploy the liveness-probe sidecar. Without it, a hung CSI driver process is invisible to Kubernetes — the pod appears healthy but all storage operations silently fail. Pair with a Prometheus alert on up{job="csi-driver"} == 0.
    2. Enable leader election on all controller sidecars when running more than one controller replica. Without it, you get duplicate volumes, duplicate snapshots, or conflicting resize operations.
    3. Use RBAC least-privilege for driver ServiceAccounts. The controller ServiceAccount does not need node access; the node ServiceAccount does not need PV create/delete. Follow the driver's recommended RBAC manifest exactly — don't grant cluster-admin to the driver.
    4. Pin sidecar versions to your Kubernetes version. The external-provisioner, external-attacher, etc. have compatibility matrices. A sidecar built for K8s 1.28 may use API fields removed or changed in 1.31. Check the sidecar compatibility table before upgrading Kubernetes.
    5. Keep maxUnavailable: 1 (the DaemonSet RollingUpdate default) for node plugin updates. Rolling the node plugin too fast (many or all nodes at once) means those nodes simultaneously lose the ability to perform new mount/unmount operations until the new plugin pods are ready.
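    6. Set fsGroupPolicy: File for block drivers that format and own the filesystem. If the field is unset, Kubernetes defaults to ReadWriteOnceWithFSType, which applies fsGroup only when the volume has an fsType and a ReadWriteOnce access mode; with None, fsGroup is never applied. A wrong policy means fsGroup in the pod SecurityContext has no effect, and non-root containers get permission denied on newly provisioned volumes.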
    7. Pre-install the CSI driver before migrating in-tree volumes. If you upgrade Kubernetes past the in-tree removal version without having the CSI driver installed and migration enabled, existing PVs become unserviceable. There is no easy rollback.
    8. Monitor storage_operation_errors_total by operation name. A rate of >0 for provision, volume_attach, or volume_mount means real workloads are experiencing failures. These operations are synchronous blockers for pod startup — every error translates directly to pod latency or unavailability.
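How the three CSIDriver fsGroupPolicy values (ReadWriteOnceWithFSType, File, None) decide whether kubelet changes volume ownership to a pod's fsGroup can be sketched as a single predicate. This is a simplified model written for illustration, not kubelet's code. `appliesFSGroup` is hypothetical, and it encodes only the documented rule that ReadWriteOnceWithFSType changes ownership when an fsType is set and the PV's access modes include ReadWriteOnce.

```go
package main

import "fmt"

// appliesFSGroup models whether kubelet changes volume ownership to the pod's
// fsGroup under a given CSIDriver fsGroupPolicy. Simplified illustration only.
func appliesFSGroup(policy string, podHasFSGroup bool, fsType string, accessModes []string) bool {
	if !podHasFSGroup {
		return false // nothing to apply
	}
	switch policy {
	case "File":
		return true // always apply when the pod sets fsGroup
	case "None":
		return false // never modify ownership
	case "ReadWriteOnceWithFSType":
		if fsType == "" {
			return false
		}
		for _, m := range accessModes {
			if m == "ReadWriteOnce" {
				return true
			}
		}
		return false
	default:
		return false // unknown policy: conservative choice in this sketch
	}
}

func main() {
	rwx := []string{"ReadWriteMany"}
	fmt.Println(appliesFSGroup("ReadWriteOnceWithFSType", true, "ext4", rwx)) // false: not ReadWriteOnce
	fmt.Println(appliesFSGroup("File", true, "ext4", rwx))                    // true
}
```

Under this model, File is the only policy that applies fsGroup regardless of access mode or fsType, which is why it suits drivers that own the filesystem they format.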