01-control-plane/05-cloud-controller-manager.html Port: :10258 (secure) Prerequisites: 04-kube-controller-manager.html Related: 02-node-components/06-node-lifecycle.html

cloud-controller-manager

The bridge between Kubernetes and cloud provider APIs — how nodes get addresses, load balancers get provisioned, routes get programmed, and zones get discovered. Covers the CCM architecture, the cloud provider interface, out-of-tree provider pattern, and all four built-in controllers.

Why cloud-controller-manager Exists

Before Kubernetes 1.6, cloud-specific logic (AWS ELB creation, GCE route management, Azure VM metadata) was compiled directly into kube-apiserver, kube-controller-manager, and kubelet. This created a tight coupling: every cloud provider had to submit code to the upstream Kubernetes repo, and a bug in the AWS provider could break GKE clusters.

CCM extracts all cloud-specific code into a separate binary that runs alongside — not inside — the core Kubernetes components. This enables:

Cloud providers to release fixes independently of the Kubernetes release cycle
Clusters to run without any cloud provider (bare-metal, on-premises)
Third-party cloud providers (Hetzner, DigitalOcean, OVH) to integrate without upstream changes
The core Kubernetes binary to shrink and stabilize

Process identity

Binary: cloud-controller-manager (cloud-provider-specific build)
Secure port: :10258 (HTTPS, metrics + healthz)
Leader-elected: one active instance per cluster
Kubeconfig: typically /etc/kubernetes/cloud-controller-manager.conf
Cloud credentials: IAM role (preferred), static key file, or Workload Identity
Deployed as: DaemonSet on control-plane nodes, or Deployment

What it manages

Node controller: cloud metadata on Node objects (addresses, zones, instance type)
Route controller: cloud VPC routing tables for pod CIDR blocks
Service controller: cloud load balancers for type: LoadBalancer Services
Cloud node lifecycle: detect and handle cloud instance termination

CCM Architecture and the Cloud Provider Interface

CCM implements the same reconciliation loop pattern as kube-controller-manager (see 04-kube-controller-manager.html § Reconciliation Loop). It watches Kubernetes API objects and calls cloud provider APIs to converge state.
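As a rough illustration of that pattern, the Python sketch below diffs desired state (what the Kubernetes objects ask for) against actual state (what the cloud reports) and returns the calls needed to converge. The `reconcile` function and the dictionary shapes are hypothetical and not taken from any CCM codebase.

```python
from typing import Dict, List, Tuple


def reconcile(desired: Dict[str, str], actual: Dict[str, str]) -> List[Tuple[str, str]]:
    """Return (action, key) pairs that would move `actual` toward `desired`.

    Both maps are illustrative: key = resource name, value = a spec fingerprint.
    """
    actions: List[Tuple[str, str]] = []
    for key in sorted(desired):
        if key not in actual:
            actions.append(("create", key))   # wanted but missing in the cloud
        elif actual[key] != desired[key]:
            actions.append(("update", key))   # present but drifted
    for key in sorted(actual):
        if key not in desired:
            actions.append(("delete", key))   # exists in the cloud but no longer wanted
    return actions
```

A real controller runs this kind of diff repeatedly, triggered by watch events and a periodic resync, so a failed cloud call is simply retried on the next pass.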

The Cloud Provider Interface (Go)

Every CCM implementation must implement the cloudprovider.Interface Go interface from k8s.io/cloud-provider. This defines the contract between Kubernetes and any cloud platform.

// k8s.io/cloud-provider/cloud.go (simplified)
type Interface interface {
    Initialize(clientBuilder ControllerClientBuilder, stop <-chan struct{})
    LoadBalancer() (LoadBalancer, bool)      // nil, false if not supported
    Instances() (Instances, bool)            // VM metadata lookup
    InstancesV2() (InstancesV2, bool)        // newer V2 interface
    Zones() (Zones, bool)                    // zone/region metadata
    Clusters() (Clusters, bool)              // cluster management (rarely used)
    Routes() (Routes, bool)                  // VPC route table management
    ProviderName() string                    // "aws", "gce", "azure", etc.
    HasClusterID() bool
}

type LoadBalancer interface {
    GetLoadBalancer(ctx, clusterName, service) (*LoadBalancerStatus, bool, error)
    GetLoadBalancerName(ctx, clusterName, service) string
    EnsureLoadBalancer(ctx, clusterName, service, nodes) (*LoadBalancerStatus, error)
    UpdateLoadBalancer(ctx, clusterName, service, nodes) error
    EnsureLoadBalancerDeleted(ctx, clusterName, service) error
}

type Instances interface {
    NodeAddresses(ctx, nodeName) ([]NodeAddress, error)
    NodeAddressesByProviderID(ctx, providerID) ([]NodeAddress, error)
    InstanceID(ctx, nodeName) (string, error)
    InstanceType(ctx, nodeName) (string, error)
    InstanceTypeByProviderID(ctx, providerID) (string, error)
    InstanceExistsByProviderID(ctx, providerID) (bool, error)
    InstanceShutdownByProviderID(ctx, providerID) (bool, error)
}

The Four Built-in Controllers

Node Controller

The Node Controller populates cloud-specific metadata on Node objects after they register. When a new Node appears carrying the node.cloudprovider.kubernetes.io/uninitialized taint, this controller uses spec.providerID (or the node name) to fetch instance details from the cloud API, patches the Node's addresses and topology labels, and removes the taint.

Node registers (kubelet) → CCM watches Node → Cloud API: DescribeInstances → PATCH node.status + labels

Fields populated by the Node Controller:

# After CCM processes the new node:
status:
  addresses:
  - type: InternalIP
    address: "10.0.1.42"
  - type: ExternalIP
    address: "54.123.45.67"     # AWS: EIP or public IP
  - type: InternalDNS
    address: "ip-10-0-1-42.ec2.internal"
  - type: ExternalDNS
    address: "ec2-54-123-45-67.compute-1.amazonaws.com"
  - type: Hostname
    address: "ip-10-0-1-42.ec2.internal"

# Labels added by cloud provider
labels:
  topology.kubernetes.io/zone: "us-east-1a"
  topology.kubernetes.io/region: "us-east-1"
  node.kubernetes.io/instance-type: "m5.xlarge"
  failure-domain.beta.kubernetes.io/zone: "us-east-1a"   # legacy label
  failure-domain.beta.kubernetes.io/region: "us-east-1"  # legacy label

# ProviderID set by kubelet, used by CCM to look up instance
spec:
  providerID: "aws:///us-east-1a/i-0abc123def456"

▶ providerID Format

Each cloud has its own providerID format. AWS: aws:///<zone>/<instance-id>. GCE: gce://<project>/<zone>/<instance-name>. Azure: azure:///subscriptions/<sub>/resourceGroups/<rg>/providers/Microsoft.Compute/virtualMachines/<name>. The providerID is set by kubelet via the --provider-id flag; if kubelet leaves it empty, the CCM Node Controller looks the instance up by node name and fills it in.
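The three formats can be told apart mechanically. The Python sketch below parses only the exact shapes listed above; `parse_provider_id` and the field names it returns are hypothetical, and real providers accept additional variants (for example scale-set instances on Azure) that this does not cover.

```python
import re
from typing import Dict

# One pattern per format described above; anything else is rejected.
_PATTERNS = {
    "aws": re.compile(r"^aws:///(?P<zone>[^/]+)/(?P<instance_id>i-[0-9a-f]+)$"),
    "gce": re.compile(r"^gce://(?P<project>[^/]+)/(?P<zone>[^/]+)/(?P<instance_name>[^/]+)$"),
    "azure": re.compile(
        r"^azure:///subscriptions/(?P<subscription>[^/]+)"
        r"/resourceGroups/(?P<resource_group>[^/]+)"
        r"/providers/Microsoft\.Compute/virtualMachines/(?P<name>[^/]+)$"
    ),
}


def parse_provider_id(provider_id: str) -> Dict[str, str]:
    """Split a providerID into its parts, or raise ValueError if the shape is unknown."""
    for provider, pattern in _PATTERNS.items():
        match = pattern.match(provider_id)
        if match:
            return {"provider": provider, **match.groupdict()}
    raise ValueError(f"unrecognized providerID: {provider_id!r}")
```

A check like this is mainly useful in troubleshooting scripts, where a malformed or empty providerID is a common reason a node stays uninitialized.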

Route Controller

Kubernetes requires every Pod to be directly reachable from every other Pod without NAT. On cloud platforms, this means each node's pod CIDR must be programmed into the cloud's VPC routing table so that inter-node pod traffic is routed correctly.

Node gets podCIDR assigned → CCM watches Node.spec.podCIDR → Cloud API: CreateRoute → VPC route: 10.244.1.0/24 → i-0abc123

# Example VPC route entry created by CCM (AWS)
# Destination: 10.244.1.0/24 (pod CIDR for node worker-1)
# Target:      i-0abc123def456 (EC2 instance ID for worker-1)

# This allows pod-on-worker-2 (10.244.2.5) to reach pod-on-worker-1 (10.244.1.3)
# without SNAT — traffic routes directly through the VPC.

# Check routes created by CCM
aws ec2 describe-route-tables --filters Name=tag:kubernetes.io/cluster/my-cluster,Values=owned \
  --query 'RouteTables[].Routes[]' | jq '.[] | select((.DestinationCidrBlock // "") | startswith("10.244"))'

# On GCP:
gcloud compute routes list --filter="name~k8s-*"
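The reconciliation the Route Controller performs amounts to a diff between node pod CIDRs and existing routes. The Python sketch below shows that diff under simplifying assumptions: `plan_routes` is a hypothetical helper, and `cloud_routes` is assumed to contain only routes owned by this cluster (real providers identify those by tag or name).

```python
from typing import Dict, List, Tuple


def plan_routes(
    node_pod_cidrs: Dict[str, str],   # instance ID -> node.spec.podCIDR ("" if unassigned)
    cloud_routes: Dict[str, str],     # destination CIDR -> target instance ID
) -> Tuple[List[Tuple[str, str]], List[str]]:
    """Return (routes to create, destinations to delete)."""
    # Desired state: one route per node that already has a pod CIDR.
    desired = {cidr: instance for instance, cidr in node_pod_cidrs.items() if cidr}
    to_create = sorted(
        (cidr, instance)
        for cidr, instance in desired.items()
        if cloud_routes.get(cidr) != instance
    )
    # A route pointing at the wrong instance appears in both lists (delete, then create).
    to_delete = sorted(
        cidr for cidr, instance in cloud_routes.items() if desired.get(cidr) != instance
    )
    return to_create, to_delete
```

For example, a node that was terminated leaves a route whose destination is no longer in the desired set, so it shows up in the delete list on the next reconciliation.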

▶ Route Controller vs CNI

Not all CNI plugins need the Route Controller. Flannel in VXLAN mode encapsulates inter-node traffic in UDP, so no VPC routes are needed. Calico in BGP mode programs routes via BGP peers, with no CCM involvement. AWS VPC CNI assigns pods IPs from ENIs in the VPC subnets, so it needs no per-node routes either. The Route Controller is for setups such as kubenet or GCE routes-based clusters, where each node's pod CIDR is routed natively in the cloud network without encapsulation.

Service Controller (LoadBalancer Provisioning)

When a Service of type: LoadBalancer is created, the Service Controller calls the cloud's load balancer API to provision an external LB, then writes the allocated IP/hostname back to service.status.loadBalancer.ingress.

apiVersion: v1
kind: Service
metadata:
  name: nginx
  annotations:
    # AWS-specific annotations controlling LB behavior
    # (scheme and nlb-target-type are read by the separate AWS Load Balancer Controller)
    service.beta.kubernetes.io/aws-load-balancer-type: "nlb"
    service.beta.kubernetes.io/aws-load-balancer-scheme: "internet-facing"
    service.beta.kubernetes.io/aws-load-balancer-nlb-target-type: "ip"
    service.beta.kubernetes.io/aws-load-balancer-cross-zone-load-balancing-enabled: "true"
    service.beta.kubernetes.io/aws-load-balancer-ssl-cert: "arn:aws:acm:us-east-1:123:certificate/abc"
    service.beta.kubernetes.io/aws-load-balancer-backend-protocol: "ssl"
    # GCP-specific (shown alongside AWS for illustration; a real Service carries one provider's annotations)
    cloud.google.com/load-balancer-type: "Internal"
    cloud.google.com/neg: '{"ingress": true}'
spec:
  type: LoadBalancer
  selector:
    app: nginx
  ports:
  - port: 443
    targetPort: 8443
status:
  loadBalancer:
    ingress:
    - hostname: a1b2c3d4e5.elb.amazonaws.com  # set by CCM after provisioning

Cloud Node Lifecycle Controller

This controller handles the case where a cloud VM is terminated externally (e.g., AWS terminates a spot instance, or an admin deletes a VM in the cloud console). Without this controller, the deleted VM's Node object would remain in the cluster as NotReady indefinitely, and pods bound to it could stay stuck in Terminating or Unknown state.

Cloud terminates VM → Node goes NotReady → CCM: InstanceExistsByProviderID = false → Node deleted from API

Instance stopped but still exists → CCM: InstanceShutdownByProviderID = true → Node tainted: node.cloudprovider.kubernetes.io/shutdown

# The cloud-node-lifecycle controller periodically checks each NotReady node against the cloud API:
# - Instance no longer exists: the Node object is deleted; pods bound to it
#   are then garbage-collected and their controllers create replacements
# - Instance exists but is shut down: add taint
#   node.cloudprovider.kubernetes.io/shutdown:NoSchedule (removed once the node is Ready again)

# Check for nodes that are not plain "Ready" (grep -v Ready would also hide NotReady nodes)
kubectl get nodes --no-headers | awk '$2 != "Ready"'
kubectl describe node <node> | grep -A5 "Taints:"

# Spot instance interruption handling (AWS):
# AWS sends 2-minute warning → IMDS endpoint /latest/meta-data/spot/instance-action
# A node-level agent such as AWS Node Termination Handler cordons and drains the node;
# the cloud-node-lifecycle controller removes the Node object once the instance is gone
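The per-node decision can be sketched in Python as below. It follows the upstream k8s.io/cloud-provider behaviour as understood here (delete the Node when the instance is gone, taint it when the instance is merely shut down); `lifecycle_action` and its return strings are hypothetical names for illustration.

```python
def lifecycle_action(node_ready: bool, instance_exists: bool, instance_shutdown: bool) -> str:
    """Map one node's observed state to the action the controller would take."""
    if node_ready:
        # Healthy nodes are not checked against the cloud API.
        return "skip"
    if not instance_exists:
        # InstanceExistsByProviderID returned false: drop the stale Node object.
        return "delete-node"
    if instance_shutdown:
        # InstanceShutdownByProviderID returned true: keep the Node, block scheduling.
        return "taint-shutdown"
    # NotReady for another reason (kubelet down, network partition): leave it alone.
    return "none"
```

Note that only NotReady nodes reach the cloud API in this sketch, which keeps API usage proportional to the number of unhealthy nodes rather than the cluster size.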

Out-of-Tree Provider Pattern

Every major cloud provider now ships its own CCM binary, versioned and released independently from Kubernetes core. The provider's CCM binary imports k8s.io/cloud-provider and implements the interface.

Cloud Provider	Repository	Notes
AWS	`kubernetes/cloud-provider-aws`	Classic ELB/NLB provisioning. AWS Load Balancer Controller (separate) handles ALB via Ingress
GCP	`kubernetes/cloud-provider-gcp`	GCE persistent disk, GKE integrated LB, Cloud NAT routes
Azure	`kubernetes-sigs/cloud-provider-azure`	Azure Load Balancer, Azure Disk/File CSI, AKS nodepools
OpenStack	`kubernetes/cloud-provider-openstack`	Octavia LB, Cinder volumes, Nova instance metadata
vSphere	`kubernetes/cloud-provider-vsphere`	vCenter VMs, vSAN storage, NSX-T networking
Hetzner	`hetznercloud/hcloud-cloud-controller-manager`	Community provider; Hcloud LB, Floating IPs
DigitalOcean	`digitalocean/digitalocean-cloud-controller-manager`	DO Load Balancers, Floating IPs
Bare metal / None	N/A	Leave `--cloud-provider` unset and deploy no CCM; with `--cloud-provider=external` and no CCM running, nodes keep the uninitialized taint. Route/LB controllers are not available

In-Tree to Out-of-Tree Migration

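Clusters created before CCM existed may still use the in-tree cloud providers via --cloud-provider=aws on kube-apiserver, kube-controller-manager, and kubelet. In-tree providers were disabled by default in 1.29 and their code was removed in 1.31, so such clusters must migrate before upgrading past those versions. Migration steps: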

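# Step 1: Update kube-apiserver, kube-controller-manager, and kubelet flags
# Remove: --cloud-provider=aws
# Add:    --cloud-provider=external
# (kube-apiserver has deprecated this flag; on versions that no longer accept it, just remove it)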

# Step 2: Deploy the out-of-tree CCM DaemonSet
# (example: AWS CCM via Helm)
helm repo add aws-cloud-controller-manager https://kubernetes.github.io/cloud-provider-aws
helm upgrade --install aws-cloud-controller-manager \
  aws-cloud-controller-manager/aws-cloud-controller-manager \
  --namespace kube-system \
  --set 'args={--v=2,--cloud-provider=aws}'   # quoted so the shell does not brace-expand the list

# Step 3: Verify nodes get cloud metadata from CCM
kubectl get nodes -o yaml | grep -A 10 "addresses:"

# Step 4: CSI migration — replace in-tree volume plugins
# In-tree: kubernetes.io/aws-ebs (deprecated)
# Out-of-tree: ebs.csi.aws.com (CSI driver)
# Migration controlled by feature gate: CSIMigrationAWS (on by default since 1.23, GA in 1.25)

# Check if CSI migration is active
kubectl get csidriver ebs.csi.aws.com
kubectl get storageclass | grep ebs

Kubelet and the --cloud-provider=external Flag

Kubelet must be told that an external CCM will initialize the Node, so that the node is kept out of scheduling until that has happened. Without this, pods may be scheduled to a node whose zone labels and addresses haven't been populated yet.

# kubelet flag required when using CCM:
--cloud-provider=external

# Effect: kubelet adds taint to newly registered node:
# node.cloudprovider.kubernetes.io/uninitialized:NoSchedule
# CCM Node Controller removes this taint after populating cloud metadata
# Only then does the scheduler consider the node for pod placement

# Verify the taint is removed after CCM processes the node
kubectl describe node new-node | grep -A5 Taints:
# Should show: <none> (or only user-defined taints)

# If node stays with uninitialized taint:
kubectl -n kube-system logs <ccm-pod> | grep "node controller"

⚠ Missing --cloud-provider=external

If kubelet is not started with --cloud-provider=external, nodes will be marked Ready immediately without the uninitialized taint. Pods may land on nodes whose zone labels are still empty, breaking topology-aware routing and zone-aware scheduling. Always set this flag on all kubelets when deploying CCM.

RBAC, Credentials, and Security

Kubernetes RBAC

CCM needs broad Kubernetes API access to watch and patch Nodes, Services, and Events. A ClusterRole is required:

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: system:cloud-controller-manager
rules:
- apiGroups: [""]
  resources: [events]
  verbs: [create, patch, update]
- apiGroups: [""]
  resources: [nodes]
  verbs: [get, list, watch, delete, patch, update]
- apiGroups: [""]
  resources: [nodes/status]
  verbs: [patch]
- apiGroups: [""]
  resources: [services]
  verbs: [get, list, watch, patch]
- apiGroups: [""]
  resources: [services/status]
  verbs: [update, patch]
- apiGroups: [""]
  resources: [serviceaccounts]
  verbs: [create]
- apiGroups: [""]
  resources: [persistentvolumes]
  verbs: [get, list, update, watch]
- apiGroups: [""]
  resources: [endpoints]
  verbs: [create, get, list, watch, update]
- apiGroups: ["coordination.k8s.io"]
  resources: [leases]
  verbs: [get, create, update]

Cloud Provider Credentials

IAM Role (Recommended — AWS)

Attach an IAM role to the EC2 instances running the control plane. CCM uses the EC2 instance metadata service (IMDS) to fetch credentials. No key files on disk — credentials rotate automatically.

# Required IAM permissions for AWS CCM:
ec2:DescribeInstances
ec2:DescribeRegions
ec2:DescribeRouteTables
ec2:CreateRoute
ec2:DeleteRoute
ec2:ModifyInstanceAttribute
elasticloadbalancing:* (for Service controller)
autoscaling:DescribeAutoScalingGroups

Workload Identity (GCP)

GKE uses Workload Identity to bind a Kubernetes ServiceAccount to a GCP ServiceAccount. CCM's pod SA is annotated with the GCP SA — no JSON key files needed.

# GKE Workload Identity annotation
metadata:
  annotations:
    iam.gke.io/gcp-service-account: ccm@my-project.iam.gserviceaccount.com

# Required GCP roles:
roles/compute.instanceAdmin.v1
roles/compute.networkAdmin
roles/iam.serviceAccountUser

Azure Managed Identity

AKS uses Managed Identity (system-assigned or user-assigned) for the CCM. No client secrets — IMDS provides a token. Ensure the managed identity has Contributor on the MC_ resource group and Network Contributor on the VNet.

# Azure CCM config (cloud-config secret)
apiVersion: v1
kind: Secret
metadata:
  name: cloud-config
  namespace: kube-system
stringData:   # plain-text values; under data: the value would have to be base64-encoded
  cloud-config: |
    {
      "cloud": "AzurePublicCloud",
      "useManagedIdentityExtension": true,
      "subscriptionId": "...",
      "resourceGroup": "...",
      "vnetName": "..."
    }

Static Credentials (Not Recommended)

A JSON/YAML config file containing API keys, client secrets, or access keys mounted as a Secret into the CCM pod. Acceptable for dev/test but not production — credentials don't auto-rotate and are stored in etcd (encrypted at rest required).

# Flag to pass static credentials
--cloud-config=/etc/kubernetes/cloud-config.json

# Ensure Secrets are encrypted at rest!
# See 01-kube-apiserver.html § Encryption at Rest

Deployment Patterns

DaemonSet on Control Plane Nodes (most common)

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: cloud-controller-manager
  namespace: kube-system
spec:
  selector:
    matchLabels:
      component: cloud-controller-manager
  template:
    metadata:
      labels:
        component: cloud-controller-manager
    spec:
      serviceAccountName: cloud-controller-manager
      tolerations:
      - key: node-role.kubernetes.io/control-plane   # run on control-plane nodes
        effect: NoSchedule
      - key: node.cloudprovider.kubernetes.io/uninitialized  # tolerate uninitialized nodes
        effect: NoSchedule
        value: "true"
      nodeSelector:
        node-role.kubernetes.io/control-plane: ""
      priorityClassName: system-cluster-critical
      hostNetwork: true          # access cloud IMDS on link-local (169.254.169.254)
      containers:
      - name: cloud-controller-manager
        image: registry.k8s.io/provider-aws/cloud-controller-manager:v1.30.0
        command:
        - /bin/aws-cloud-controller-manager
        - --cloud-provider=aws
        - --leader-elect=true
        - --use-service-account-credentials=true
        - --configure-cloud-routes=true  # enable Route Controller
        - --v=2

▶ Why tolerations for uninitialized nodes?

The CCM pod itself must be able to run on nodes that have the node.cloudprovider.kubernetes.io/uninitialized taint. Otherwise there's a chicken-and-egg deadlock: the taint prevents pods from running, but the CCM pod (which removes the taint) can't be scheduled. hostNetwork: true serves a similar purpose: the CCM can start before pod networking is available on the node, and it reaches the cloud IMDS at 169.254.169.254 directly from the host, avoiding cases where metadata access from the pod network is blocked (for example by a low IMDSv2 hop limit).
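To make the matching rule concrete, the Python sketch below mirrors the standard Kubernetes toleration semantics (empty effect matches any effect, operator defaults to Equal, Exists ignores the value). The helper names `tolerates` and `schedulable` are hypothetical, and the dictionaries are simplified stand-ins for the API objects.

```python
from typing import Dict, List


def tolerates(toleration: Dict[str, str], taint: Dict[str, str]) -> bool:
    """Return True if a single toleration matches a single taint."""
    effect = toleration.get("effect", "")
    if effect and effect != taint.get("effect", ""):
        return False
    key = toleration.get("key", "")
    if key and key != taint.get("key", ""):
        return False
    operator = toleration.get("operator", "Equal")
    if operator == "Exists":
        return True
    if operator == "Equal":
        return toleration.get("value", "") == taint.get("value", "")
    return False


def schedulable(tolerations: List[Dict[str, str]], taints: List[Dict[str, str]]) -> bool:
    """A pod can be scheduled only if every NoSchedule/NoExecute taint is tolerated."""
    blocking = [t for t in taints if t.get("effect") in ("NoSchedule", "NoExecute")]
    return all(any(tolerates(tol, taint) for tol in tolerations) for taint in blocking)
```

Applied to the DaemonSet above, both of its tolerations are needed on a fresh control-plane node: one for the control-plane taint and one for the uninitialized taint with value "true".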

Configuration Reference

Flag	Default	Purpose
`--cloud-provider`	—	Cloud provider name (must match ProviderName() implementation)
`--cloud-config`	—	Path to cloud provider config file (credentials, region, VPC IDs)
`--leader-elect`	true	Enable leader election; only one CCM active at a time
`--use-service-account-credentials`	false	Use individual SA tokens per controller (mirrors kcm behavior)
`--configure-cloud-routes`	true	Enable Route Controller (set false for encapsulated CNIs like Calico VXLAN)
`--cluster-name`	—	Cluster name used to tag cloud resources; must match the value used when provisioning
`--cidr-allocator-type`	RangeAllocator	CIDR allocation strategy: RangeAllocator or CloudAllocator
`--cluster-cidr`	—	Pod CIDR range (required for Route Controller)
`--allocate-node-cidrs`	false	Enable CIDR allocation by CCM instead of kube-controller-manager
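`--node-status-update-frequency`	5m	How often the Node Controller refreshes node addresses and metadata from the cloud API
`--node-monitor-period`	5s	How often the cloud-node-lifecycle controller checks node status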
`--route-reconciliation-period`	10s	How often Route Controller reconciles VPC routes
`--concurrent-service-syncs`	1	Number of Service LB provisioning goroutines
`--v`	0	Log verbosity (the examples here use 2). 4+ shows cloud API calls

Metrics and Alerting

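# Scrape CCM metrics (the secure port requires a bearer token authorized for /metrics;
# TOKEN is assumed to hold a ServiceAccount token, e.g. from `kubectl create token <sa>`)
curl -sk -H "Authorization: Bearer $TOKEN" https://localhost:10258/metrics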

# Key metrics
cloudprovider_aws_api_request_duration_seconds{request="DescribeInstances"}
cloudprovider_aws_api_request_errors_total{request="CreateLoadBalancer"}

# Controller-generic metrics (same workqueue metrics as kcm)
workqueue_depth{name="cloud-node"}
workqueue_depth{name="service"}
workqueue_depth{name="cloud-node-lifecycle"}
workqueue_retries_total{name="service"}

# Load balancer specific (varies by provider)
cloudprovider_aws_api_throttled_requests_total   # AWS throttling events
cloud_provider_reconcile_attempts_total{provider="aws",controller="service"}

# Node initialization health
# Monitor: nodes stuck with uninitialized taint > 2 minutes
kubectl get nodes -o json | jq -r '
  .items[] |
  select(.spec.taints != null) |
  select(.spec.taints[] | .key == "node.cloudprovider.kubernetes.io/uninitialized") |
  .metadata.name
'

Prometheus Alerting Rules

groups:
- name: cloud-controller-manager
  rules:
  - alert: CCMDown
    expr: absent(up{job="cloud-controller-manager"} == 1)
    for: 5m
    labels:
      severity: critical
    annotations:
      summary: "cloud-controller-manager is down"
      description: "No CCM is running. LoadBalancer Services will not be provisioned and new nodes will stay uninitialized."

  - alert: CCMNodeUninitializedStuck
    expr: |
      kube_node_spec_taint{key="node.cloudprovider.kubernetes.io/uninitialized"} > 0
    for: 5m
    labels:
      severity: warning
    annotations:
      summary: "Node stuck with cloud-uninitialized taint"
      description: "Node {{ $labels.node }} has been uninitialized for >5m. Check CCM logs."

  - alert: CCMCloudAPIErrors
    expr: rate(cloudprovider_aws_api_request_errors_total[5m]) > 0.1
    for: 10m
    labels:
      severity: warning
    annotations:
      summary: "High CCM cloud API error rate"
      description: "CCM is seeing {{ $value }} cloud API errors/s on {{ $labels.request }}."

  - alert: CCMLoadBalancerSyncFailed
    expr: rate(workqueue_retries_total{job="cloud-controller-manager",name="service"}[5m]) > 0.5
    for: 5m
    labels:
      severity: warning
    annotations:
      summary: "Service LB sync failing repeatedly"
      description: "CCM service controller is retrying LB sync at {{ $value }}/s."

Troubleshooting

Service type LoadBalancer stuck in <pending>

# Check CCM is running
kubectl -n kube-system get pods -l component=cloud-controller-manager

# Check CCM logs for the service
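# (with a label selector, kubectl logs returns only the last 10 lines per pod unless --tail is set)
kubectl -n kube-system logs -l component=cloud-controller-manager --tail=-1 | grep -i "service\|loadbalancer\|error" | tail -30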

# Check service events
kubectl describe service my-svc | grep -A 10 Events:
# Look for: "Error creating load balancer", "Timeout", "Throttling"

# AWS: check IAM permissions
# "AccessDenied" errors indicate missing IAM policy on control plane node role
aws iam simulate-principal-policy \
  --policy-source-arn arn:aws:iam::ACCOUNT:role/k8s-control-plane \
  --action-names "elasticloadbalancing:CreateLoadBalancer" \
  --resource-arns "*"

# GCP: check service account permissions
gcloud projects get-iam-policy my-project \
  --flatten="bindings[].members" \
  --filter="bindings.members:serviceAccount:ccm@my-project.iam.gserviceaccount.com"

Nodes stuck with cloud-uninitialized taint

# Check CCM logs for node initialization
kubectl -n kube-system logs <ccm-pod> | grep "node controller\|initialize\|providerID"

# Verify providerID is set on the node (required for CCM lookup)
kubectl get node <node-name> -o jsonpath='{.spec.providerID}'
# Should be: aws:///us-east-1a/i-0abc123def456
# If empty, kubelet is missing --provider-id or --cloud-provider=external

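# Check IMDS access (for IAM role fetching). CCM images are often distroless and lack curl,
# so run this on the control-plane node itself. IMDSv2 requires a session token:
TOKEN=$(curl -s -X PUT http://169.254.169.254/latest/api/token \
  -H "X-aws-ec2-metadata-token-ttl-seconds: 60")
curl -s -H "X-aws-ec2-metadata-token: $TOKEN" http://169.254.169.254/latest/meta-data/instance-id
# Must return an instance ID; failure = network/IMDS issue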

# Check if Node has valid providerID format for the cloud
kubectl get nodes -o json | jq -r '.items[] | "\(.metadata.name): \(.spec.providerID)"'

VPC routes not created / pod cross-node connectivity broken

# Verify Route Controller is enabled
kubectl -n kube-system logs <ccm-pod> | grep -i "route"

# Check nodes have podCIDR assigned
kubectl get nodes -o json | jq -r '.items[] | "\(.metadata.name): \(.spec.podCIDR)"'

# AWS: verify routes in VPC route table
aws ec2 describe-route-tables \
  --filters Name=vpc-id,Values=vpc-0abc123 \
  --query 'RouteTables[].Routes[?InstanceId!=null]'

# If using Calico VXLAN or Flannel VXLAN, routes are NOT needed:
# Set --configure-cloud-routes=false to disable Route Controller

# Check if the pod CIDR conflicts with the VPC CIDR
# Pod CIDR should not overlap with VPC subnet CIDRs
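The overlap check itself is easy to script. The Python sketch below uses the standard `ipaddress` module; `overlapping_subnets` is a hypothetical helper, and the CIDRs in the usage note are made-up examples.

```python
import ipaddress
from typing import List


def overlapping_subnets(pod_cidr: str, subnet_cidrs: List[str]) -> List[str]:
    """Return the subnets (in input order) that overlap the cluster pod CIDR."""
    pod_net = ipaddress.ip_network(pod_cidr)
    return [s for s in subnet_cidrs if ipaddress.ip_network(s).overlaps(pod_net)]
```

For instance, `overlapping_subnets("10.244.0.0/16", ["10.0.1.0/24", "10.244.5.0/24"])` flags only the second subnet; an empty result means the pod CIDR is safe to route alongside those subnets.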

Cloud API rate limiting / throttling

# AWS: EC2 API has per-region throttling limits
# Reduce polling frequency:
# --node-monitor-period=30s (default 5s)
# --route-reconciliation-period=30s (default 10s)

# Check for throttling in CCM logs
kubectl -n kube-system logs <ccm-pod> | grep -i "throttl\|rateLim\|RequestLimitExceeded"

# AWS: request a limit increase for EC2 API calls via AWS console

# Azure: enable client-side rate limiting in the cloud config (azure.json):
#   "cloudProviderRateLimit": true,
#   "cloudProviderRateLimitQPS": 3,
#   "cloudProviderRateLimitBucket": 5

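# Monitor cloud API call rate (query the CCM endpoint; kubectl get --raw /metrics returns kube-apiserver metrics)
curl -sk -H "Authorization: Bearer $TOKEN" https://localhost:10258/metrics | grep 'cloudprovider_.*_api_request_duration'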
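A QPS value paired with a bucket size, as in the rate-limit settings above, usually describes a token bucket: the bucket size caps bursts and the QPS sets the steady refill rate. The Python sketch below is a generic token bucket, not the provider's actual implementation; the `TokenBucket` class and its explicit `now` argument (used here instead of a real clock so the behaviour is easy to follow) are assumptions for illustration.

```python
class TokenBucket:
    """Allow bursts of up to `bucket` calls, refilled at `qps` tokens per second."""

    def __init__(self, qps: float, bucket: int, now: float = 0.0) -> None:
        self.qps = qps
        self.capacity = float(bucket)
        self.tokens = float(bucket)   # start full, so an initial burst is allowed
        self.last = now

    def allow(self, now: float) -> bool:
        """Return True and consume a token if a call may be made at time `now`."""
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.qps)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

With QPS 3 and bucket 5, five calls can go out back to back, after which calls are admitted at roughly three per second. Raising the bucket tolerates larger bursts (for example a batch of new nodes) without raising the sustained rate.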

On-Premises Clusters Without CCM

Clusters running on bare metal, VMware, or private cloud without a CCM must handle the functionality themselves or accept limitations.

Load Balancer alternatives

MetalLB: BGP or L2 mode load balancer for bare metal. Acts as a CCM Service controller substitute. Watches type: LoadBalancer Services and allocates IPs from configured address pools.
kube-vip: VIP-based LB using ARP/BGP. Works well for small clusters.
External DNS + NodePort: Route traffic to NodePort services via external DNS A records pointing to node IPs.

Node metadata alternatives

Without CCM, manually label nodes with topology information:
kubectl label node worker-1 topology.kubernetes.io/zone=dc1-rack-a
kubectl label node worker-1 topology.kubernetes.io/region=dc1
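Node Feature Discovery (NFD) can auto-label nodes with CPU features, PCI devices, and kernel capabilities, but it does not supply zone or region labels, so topology labels still have to be set by hand or by your provisioning tooling.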

Route controller alternatives

On bare metal, pod routing is handled entirely by the CNI plugin:
Calico BGP: advertises pod CIDRs via BGP to ToR switches
Cilium: eBPF-based routing, no VPC routes needed
Flannel VXLAN: encapsulates pod traffic, no underlay routing required

Node deletion handling

Without CCM's cloud-node-lifecycle controller, deleted VMs leave stale Node objects. Use node problem detector + custom scripts, or run kubectl delete node manually. Some node managers (like Cluster Autoscaler) have built-in cleanup logic.

Production Best Practices

Use IAM roles / Workload Identity, never static credentials

Static credentials in a Secret or config file require manual rotation, are stored in etcd, and are a security liability. IAM roles (AWS), Workload Identity (GCP), or Managed Identity (Azure) auto-rotate and follow least-privilege without keys on disk.

Match CCM version to Kubernetes version

Per the upstream version skew policy, CCM must not be newer than kube-apiserver and may be at most one minor version older. In practice, deploy the CCM release that matches your Kubernetes minor version. Providers usually publish a matching CCM release soon after each Kubernetes minor release.
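A pre-upgrade check for this can be a few lines of Python. The sketch below encodes the upstream skew rule as understood here (CCM not newer than kube-apiserver, at most one minor version older); `ccm_skew_ok` and `parse_minor` are hypothetical helpers, and the version strings are examples.

```python
from typing import Tuple


def parse_minor(version: str) -> Tuple[int, int]:
    """Extract (major, minor) from strings such as 'v1.30.2' or '1.29.4'."""
    parts = version.lstrip("v").split(".")
    return int(parts[0]), int(parts[1])


def ccm_skew_ok(apiserver_version: str, ccm_version: str) -> bool:
    """True if the CCM is the same minor version as kube-apiserver or one older."""
    api_major, api_minor = parse_minor(apiserver_version)
    ccm_major, ccm_minor = parse_minor(ccm_version)
    return api_major == ccm_major and 0 <= api_minor - ccm_minor <= 1
```

During a control-plane upgrade this implies upgrading kube-apiserver first and the CCM afterwards, so the CCM is never ahead of the API server.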

Disable Route Controller for overlay CNIs

If using Calico VXLAN, Flannel VXLAN, or Cilium with VXLAN/Geneve encapsulation, set --configure-cloud-routes=false. Creating unused VPC routes wastes cloud API quota and can cause routing conflicts if pod CIDRs overlap with subnet CIDRs.

Run on control-plane nodes with system-cluster-critical priority

CCM must run before worker nodes can be initialized. Place it on control-plane nodes (taints tolerated) with priorityClassName: system-cluster-critical to prevent eviction during resource pressure.

Monitor node initialization latency

Alert if any node has the uninitialized taint for more than 2 minutes. This indicates CCM is having trouble reaching the cloud API (throttling, IAM permission issue, or CCM pod is not running). Nodes stuck uninitialized cannot receive workloads.

Use Service annotations for LB customization

Cloud-specific Service annotations (AWS, GCP, Azure) control LB type (NLB vs CLB), scheme (internal vs internet-facing), health check paths, SSL certificates, and more. Document your organization's standard LB annotations in a runbook — they vary significantly between clouds.

Plan for LB provisioning latency

Cloud LB provisioning takes 30 seconds to 3 minutes depending on provider and LB type. Do not treat a newly-created Service's EXTERNAL-IP as instantly ready. Use health check endpoints and readiness gates in your CI/CD pipeline rather than a fixed sleep.
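One way to replace a fixed sleep is to poll the Service status with a deadline. The Python sketch below assumes a caller-supplied `get_ingress` function that returns the current `status.loadBalancer.ingress` list (for example by shelling out to kubectl or using a client library); `wait_for_lb` and its parameters are hypothetical, and the `sleep` and `clock` arguments exist only so the loop can be exercised without real waiting.

```python
import time
from typing import Callable, Dict, List, Optional


def wait_for_lb(
    get_ingress: Callable[[], List[Dict[str, str]]],
    timeout_s: float = 300.0,
    interval_s: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Optional[str]:
    """Poll until the Service reports an LB hostname or IP; return None on timeout."""
    deadline = clock() + timeout_s
    while True:
        for entry in get_ingress():
            address = entry.get("hostname") or entry.get("ip")
            if address:
                return address
        if clock() >= deadline:
            return None
        sleep(interval_s)
```

An address appearing in the status only means the load balancer was allocated; DNS propagation and target health checks can take longer, so a pipeline would typically follow this with a request against the application's health endpoint.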