What Is the Control Plane?

The control plane is the set of processes that implement the desired-state management loop of the cluster. It never runs your workloads; it only watches, decides, and instructs. Every cluster mutation flows through the control plane, and every node periodically reports status back to it.

Design Principle
The control plane is the "brain" of Kubernetes. It is intentionally separated from the data plane (where your workloads run) so that node failures never corrupt cluster state, and control-plane upgrades don't interrupt running containers.

The Five Core Components

kube-apiserver

:6443 (HTTPS)

The single, authoritative REST API gateway for the entire cluster. All state reads and writes go through it. Stateless and horizontally scalable. Persists every object to etcd.

etcd

:2379 (client) · :2380 (peer)

Strongly consistent, distributed key-value store. The ground truth for all cluster state. Uses Raft consensus. Loss of etcd = loss of the cluster.

kube-scheduler

:10259 (HTTPS metrics/healthz)

Watches for unscheduled Pods and assigns each to a Node. Two-phase: Filter (which nodes can run this pod?) → Score (which is optimal?). Highly extensible via plugins.

kube-controller-manager

:10257 (HTTPS metrics/healthz)

Runs 30+ reconciliation loops (controllers) in a single binary. Each controller drives actual cluster state toward desired state. Node controller, ReplicaSet controller, Job controller, etc.

cloud-controller-manager

:10258 (HTTPS metrics/healthz)

Optional. Integrates with cloud provider APIs for LoadBalancer provisioning, Node address annotation, Route programming. Decouples cloud logic from core controllers.

Control Plane Architecture Diagram

[Diagram: kubectl / API clients → VIP / Load Balancer → kube-apiserver (:6443, HTTPS). kube-apiserver ↔ etcd cluster of 3 members (etcd-0 leader, etcd-1 and etcd-2 followers) over gRPC+TLS. kube-scheduler (:10259), kube-controller-manager (:10257), and cloud-controller-manager (:10258) watch and update via kube-apiserver. Worker nodes (N) run kubelet (:10250), kube-proxy (:10256), containerd, and Pods A/B/C; apiserver→kubelet carries exec/log/portfwd. Legend: etcd gRPC, Watch/Update (HTTP/2), controller reconcile, cloud API calls, kubelet reporting.]

Figure 1: Control plane component communication topology. All external and internal traffic routes through kube-apiserver. etcd is only accessible by apiserver. The scheduler and controllers only talk to the apiserver — never directly to nodes.

Communication Matrix

Knowing who talks to whom, over which protocol and port, and with which authentication method is critical for firewall rules, mTLS policy, and audit log interpretation.

| Source | Destination | Port(s) | Protocol | AuthN Method | Direction | Notes |
|---|---|---|---|---|---|---|
| kubectl / API clients | kube-apiserver | 6443 | HTTPS (TLS 1.3) | x509 cert / token / OIDC | → CP | External-facing. Must be behind LB in HA. |
| kube-apiserver | etcd | 2379 | gRPC / TLS | mTLS (client cert) | → etcd | Only apiserver talks to etcd. Dedicated cert. |
| kube-scheduler | kube-apiserver | 6443 | HTTPS + HTTP/2 | in-cluster ServiceAccount / kubeconfig | → CP | Watch Pods (unscheduled), patch Pod.Spec.NodeName |
| kube-controller-manager | kube-apiserver | 6443 | HTTPS + HTTP/2 | in-cluster ServiceAccount / kubeconfig | → CP | Each controller has separate SA + RBAC |
| cloud-controller-manager | kube-apiserver | 6443 | HTTPS + HTTP/2 | in-cluster ServiceAccount / kubeconfig | → CP | Also calls cloud provider API externally |
| kubelet | kube-apiserver | 6443 | HTTPS + HTTP/2 | TLS bootstrap → node cert | → CP | Node authn group system:nodes |
| kube-apiserver | kubelet | 10250 | HTTPS | apiserver presents client cert to kubelet | → Node | exec, log, portforward, attach |
| kube-proxy | kube-apiserver | 6443 | HTTPS + HTTP/2 | ServiceAccount / kubeconfig | → CP | Watches Services and EndpointSlices |
| etcd leader | etcd followers | 2380 | gRPC / TLS | mTLS (peer cert) | intra-etcd | Raft replication and heartbeats |
| kube-apiserver (HA) | kube-apiserver peers | - | - | - | independent | APIservers are stateless; they don't talk to each other. LB distributes. |

Critical Security Rule
etcd must NEVER be accessible from outside the control-plane network. Restrict TCP 2379 (client) to the apiserver IPs and TCP 2380 (peer) to the other etcd members. Direct etcd access bypasses all Kubernetes RBAC and audit logging.

Component Roles — Internal Mechanics

kube-apiserver — The API Gateway

The apiserver is the only component that reads from and writes to etcd. It is completely stateless: all state lives in etcd. Multiple apiserver replicas can run simultaneously because they share etcd as the source of truth.

Internal request lifecycle (full detail in 04-kubernetes-api-model.html §Request Lifecycle):

TLS Termination
Authentication
Authorization (RBAC)
Mutating Admission
Schema Validation
Validating Admission
etcd persist

The apiserver also proxies requests to webhooks (MutatingAdmissionWebhook, ValidatingAdmissionWebhook) and to aggregated API servers (metrics-server, custom API servers via APIService). It handles Watch via long-lived HTTP/2 streaming connections — see the Informer architecture in 04-kubernetes-api-model.html §Informers.
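
The ordering of the lifecycle stages matters: mutation happens before validation, and nothing is persisted unless every stage accepts. The following is a minimal sketch of that ordering, not the apiserver's actual handler chain. TLS termination is omitted, and every name in it (`ALLOWED_VERBS`, `handle`, the stage functions, the dict-based `store` standing in for etcd) is hypothetical.

```python
ALLOWED_VERBS = {"alice": {"create", "get"}}  # hypothetical stand-in for RBAC rules


def authenticate(req):
    # A request without an identity is rejected before anything else runs.
    return bool(req.get("user")), req


def authorize(req):
    return req["verb"] in ALLOWED_VERBS.get(req["user"], set()), req


def mutate(req):
    # Mutating admission may change the object, here by defaulting the namespace.
    obj = {"namespace": "default", **req["object"]}
    return True, {**req, "object": obj}


def validate_schema(req):
    return isinstance(req["object"].get("name"), str), req


def validate_policy(req):
    # Validating admission can only accept or reject, never modify.
    return req["object"]["namespace"] != "kube-system", req


STAGES = [
    ("authentication", authenticate),
    ("authorization", authorize),
    ("mutating admission", mutate),
    ("schema validation", validate_schema),
    ("validating admission", validate_policy),
]


def handle(req, store):
    """Run the stages in order; persist only if all of them accept."""
    for name, stage in STAGES:
        ok, req = stage(req)
        if not ok:
            return f"rejected at {name}"
    obj = req["object"]
    store[(obj["namespace"], obj["name"])] = obj  # stands in for the etcd write
    return "persisted"
```

Because mutation runs before both validation stages, validators always see the object in its final, defaulted form.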

Key apiserver flags

# Minimal production-relevant flags
# (comments are collected here because a shell comment cannot follow a line-continuation backslash)
#   --advertise-address   IP advertised to cluster members for reaching this apiserver (<node-ip> is a placeholder)
#   --bind-address        0.0.0.0 listens on all interfaces
#   --secure-port         HTTPS port (6443 is the default)
#   --feature-gates       feature flags
#   --max-requests-inflight / --max-mutating-requests-inflight   concurrency limits
kube-apiserver \
  --advertise-address=<node-ip> \
  --bind-address=0.0.0.0 \
  --secure-port=6443 \
  --etcd-servers=https://etcd-0:2379,https://etcd-1:2379,https://etcd-2:2379 \
  --etcd-cafile=/etc/kubernetes/pki/etcd/ca.crt \
  --etcd-certfile=/etc/kubernetes/pki/apiserver-etcd-client.crt \
  --etcd-keyfile=/etc/kubernetes/pki/apiserver-etcd-client.key \
  --client-ca-file=/etc/kubernetes/pki/ca.crt \
  --tls-cert-file=/etc/kubernetes/pki/apiserver.crt \
  --tls-private-key-file=/etc/kubernetes/pki/apiserver.key \
  --authorization-mode=Node,RBAC \
  --enable-admission-plugins=NodeRestriction,PodSecurity,MutatingAdmissionWebhook,ValidatingAdmissionWebhook \
  --service-cluster-ip-range=10.96.0.0/12 \
  --service-node-port-range=30000-32767 \
  --allow-privileged=false \
  --audit-log-path=/var/log/kubernetes/audit.log \
  --audit-policy-file=/etc/kubernetes/audit-policy.yaml \
  --feature-gates=... \
  --max-requests-inflight=800 \
  --max-mutating-requests-inflight=400 \
  --request-timeout=60s

etcd — The Ground Truth

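etcd is a distributed key-value store using the Raft consensus algorithm. It guarantees strong consistency: every read reflects the most recently committed write (given quorum). Kubernetes stores most built-in API objects as protobuf-encoded values (custom resources are stored as JSON) under keys of the form /registry/{resource}/{namespace}/{name}, for example /registry/pods/default/nginx; some resources add a group segment, as in /registry/{group}/{resource}/{namespace}/{name}.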

Raft requires a quorum of ⌊n/2⌋ + 1 members. A 3-member cluster tolerates 1 failure (quorum = 2). A 5-member cluster tolerates 2 failures (quorum = 3). Always run an odd number of members.
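
The quorum arithmetic can be checked with a few lines of Python. This is a small illustrative sketch; the function names `quorum` and `fault_tolerance` are made up for this example and are not part of etcd.

```python
def quorum(members: int) -> int:
    """Smallest majority of a Raft cluster: floor(n/2) + 1."""
    if members < 1:
        raise ValueError("a cluster needs at least one member")
    return members // 2 + 1


def fault_tolerance(members: int) -> int:
    """Number of members that can fail while a majority stays reachable."""
    return members - quorum(members)


for n in range(1, 8):
    print(f"members={n} quorum={quorum(n)} tolerates={fault_tolerance(n)}")
```

The loop shows why odd sizes are preferred: a 4-member cluster needs a quorum of 3 and still tolerates only 1 failure, the same as a 3-member cluster, so the extra member adds replication cost without adding fault tolerance.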

etcd Raft Leader Election Flow
  1. All members start as Followers. Each has a randomized election timeout (the Raft paper suggests 150–300ms; etcd derives it from --election-timeout, which defaults to 1000ms).
  2. When a Follower doesn't receive a heartbeat within its timeout, it transitions to Candidate and increments its term.
  3. The Candidate sends RequestVote RPCs to all peers. Each peer grants one vote per term to the first Candidate that has an up-to-date log.
  4. If the Candidate receives votes from a majority, it becomes Leader.
  5. The Leader sends periodic heartbeat AppendEntries RPCs (at etcd's --heartbeat-interval, default 100ms) to prevent elections.
  6. All client writes (from apiserver) go to the Leader. It appends the entry to its log, replicates to followers, and commits once a majority acknowledges.
# Inspect etcd leader
ETCDCTL_API=3 etcdctl \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/peer.crt \
  --key=/etc/kubernetes/pki/etcd/peer.key \
  endpoint status --write-out=table

# Output columns: ENDPOINT, ID, VERSION, DB SIZE, IS LEADER, RAFT TERM, RAFT INDEX

kube-scheduler — Two-Phase Placement

The scheduler watches the apiserver for Pods in Pending state with spec.nodeName == "". For each such pod, it runs a two-phase algorithm:

Phase 1: Filter (Predicates)

Eliminate nodes that cannot run this pod. Built-in filter plugins include:

  • NodeResourcesFit: CPU/memory requests must fit
  • NodeAffinity: nodeSelector labels and requiredDuringScheduling rules must match
  • TaintToleration: pod must tolerate every NoSchedule/NoExecute taint on the node
  • PodTopologySpread: hard (DoNotSchedule) spread constraints
  • VolumeBinding: PVCs must be bindable to node
  • InterPodAffinity: required pod affinity and anti-affinity rules
  • NodePorts: host ports must be free

Phase 2: Score (Priorities)

Rank remaining feasible nodes. Built-in score plugins include:

  • NodeResourcesFit (LeastAllocated strategy): prefer nodes with most free resources
  • NodeResourcesBalancedAllocation: balance CPU vs memory
  • NodeAffinity: preferredDuringScheduling rules
  • InterPodAffinity: soft affinity/anti-affinity
  • ImageLocality: prefer nodes with image already pulled
  • TaintToleration: deprioritize nodes with PreferNoSchedule taints

Highest-scoring node wins. Scheduler writes pod.spec.nodeName via a Bind operation.
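
The filter-then-score flow can be sketched in a few lines. This is a toy model under stated assumptions, not the scheduler's implementation: the `Node` and `Pod` classes, the taint representation as plain strings, and the scoring formula (a crude "most free resources" stand-in) are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    free_cpu_m: int                      # unallocated CPU in millicores
    free_mem_mi: int                     # unallocated memory in MiB
    taints: set = field(default_factory=set)


@dataclass
class Pod:
    cpu_m: int
    mem_mi: int
    tolerations: set = field(default_factory=set)


def feasible(pod: Pod, node: Node) -> bool:
    """Filter phase: resource fit and taint toleration only."""
    fits = pod.cpu_m <= node.free_cpu_m and pod.mem_mi <= node.free_mem_mi
    tolerated = node.taints <= pod.tolerations   # every taint must be tolerated
    return fits and tolerated


def score(pod: Pod, node: Node) -> int:
    """Score phase: toy metric, more resources left after placement is better."""
    return (node.free_cpu_m - pod.cpu_m) + (node.free_mem_mi - pod.mem_mi)


def schedule(pod: Pod, nodes: list):
    """Return the name of the best feasible node, or None if the pod is unschedulable."""
    candidates = [n for n in nodes if feasible(pod, n)]
    if not candidates:
        return None
    return max(candidates, key=lambda n: score(pod, n)).name
```

Filtering first keeps scoring cheap: only nodes that can actually run the pod are ranked, and an empty candidate list corresponds to a FailedScheduling event.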

The scheduler uses the Scheduling Framework (introduced v1.15, stable v1.19) — a plugin-based system where all logic is expressed as plugins implementing extension points: PreFilter, Filter, PostFilter, PreScore, Score, NormalizeScore, Reserve, Permit, PreBind, Bind, PostBind.

Scheduler Internals: Scheduling Queue and Backoff

The scheduler maintains a priority queue (heap-based) ordered by pod priority. When a pod fails scheduling, it goes into a backoff queue with exponential backoff (by default 1s → 2s → 4s → 8s, capped at 10s; tunable via podInitialBackoffSeconds and podMaxBackoffSeconds in the scheduler configuration). Failed pods are retried when cluster state changes (node added, resource freed, taint removed).
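
The backoff sequence is a doubling delay with a cap. A short sketch, with the initial delay and the cap passed in as parameters rather than hard-coded (the function name `backoff_schedule` is hypothetical):

```python
def backoff_schedule(initial_s: float, max_s: float, attempts: int) -> list:
    """Delay before each retry: doubles every attempt, never exceeds max_s."""
    delays = []
    delay = initial_s
    for _ in range(attempts):
        delays.append(min(delay, max_s))
        delay *= 2
    return delays


print(backoff_schedule(1, 10, 6))   # [1, 2, 4, 8, 10, 10]
```

The cap keeps a repeatedly unschedulable pod from waiting arbitrarily long once the cluster state does change in its favor.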

# Check scheduling failures
kubectl get events --field-selector reason=FailedScheduling
kubectl describe pod <pod-name>    # "Events" section shows scheduler decisions

# Enable verbose scheduler logs
kube-scheduler --v=10  # Logs filter/score results per node

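# Inspect scheduler metrics (/metrics on the secure port requires an authorized bearer token; $TOKEN is a placeholder)
curl -sk -H "Authorization: Bearer $TOKEN" https://localhost:10259/metrics | grep scheduler_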

kube-controller-manager — The Reconciliation Engine

The kube-controller-manager runs 30+ controllers in a single binary with a single process. Each controller is an independent goroutine running a reconciliation loop:

// Generic reconciliation loop pattern (all controllers follow this)
for {
    desired := readDesiredState(apiserver)  // via Informer cache
    actual  := readActualState(apiserver)   // via Informer cache
    if desired != actual {
        makeChanges(apiserver)              // Create/Update/Delete child objects
    }
    // Sleep or wait for next Informer event
}
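
The same pattern can be made concrete with a runnable toy. The sketch below imitates a ReplicaSet-style controller that converges a list of pod names toward a desired replica count; the `reconcile` function, the naming scheme, and the in-memory list standing in for the API are assumptions made for illustration only.

```python
def reconcile(desired_replicas: int, pods: list, next_id: int):
    """One reconciliation pass: return the converged pod list and the next free id.

    The pass is level-triggered: it compares desired and actual state and acts
    on the difference, so running it again on converged state changes nothing.
    """
    pods = list(pods)                      # never mutate the caller's view
    while len(pods) < desired_replicas:    # too few: create
        pods.append(f"pod-{next_id}")
        next_id += 1
    while len(pods) > desired_replicas:    # too many: delete the newest
        pods.pop()
    return pods, next_id


pods, next_id = reconcile(3, [], 0)
print(pods)                                # ['pod-0', 'pod-1', 'pod-2']
pods, next_id = reconcile(1, pods, next_id)
print(pods)                                # ['pod-0']
```

Idempotence is the important property: a controller may be restarted or see the same event twice, and repeating a pass on already-converged state must be harmless.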

The key controllers and what they manage:

| Controller | Watches | Creates / Manages | Key Action |
|---|---|---|---|
| ReplicationController | RC objects + Pods | Pods | Maintain pod count == spec.replicas |
| ReplicaSet controller | RS objects + Pods | Pods | Maintain pod count, owns pods via ownerRef |
| Deployment controller | Deployments + RS | ReplicaSets | Manage rolling updates, rollbacks |
| StatefulSet controller | StatefulSets + Pods + PVCs | Pods, PVCs | Ordered pod creation, stable network IDs |
| DaemonSet controller | DaemonSets + Nodes + Pods | Pods (one per node) | Schedule pod to every matching node |
| Job controller | Jobs + Pods | Pods | Run pods to completion, handle retries |
| CronJob controller | CronJobs | Jobs | Create Job objects on schedule |
| Node controller | Nodes | - | Taint nodes unreachable after 40s, evict pods after 5min |
| Endpoints controller | Services + Pods | Endpoints | Update endpoints when pod IPs change (legacy; EndpointSlice preferred) |
| EndpointSlice controller | Services + Pods | EndpointSlices | Scale-friendly endpoint tracking |
| Namespace controller | Namespaces | - | Delete all objects when namespace is deleted |
| ServiceAccount controller | Namespaces | ServiceAccounts | Create default SA in every namespace |
| Token controller | ServiceAccounts + Secrets | Secrets (token) | Create SA token secrets (pre-v1.22 legacy) |
| PersistentVolume controller | PVs + PVCs | - | Bind PVCs to PVs, handle reclaim policies |
| ResourceQuota controller | ResourceQuotas | - | Track and enforce quota usage |
| GarbageCollection controller | All objects | - | Delete orphaned objects via ownerReferences |
| TTLAfterFinished controller | Jobs | - | Delete finished Jobs after TTL |

Single Binary, Multiple Controllers
All controllers share one process and one process-wide leader election Lease. If the kube-controller-manager pod restarts, all controllers restart together. This simplifies deployment but means a single bug can impact all controllers. The --controllers flag can selectively disable specific controllers (e.g., --controllers=*,-ttl keeps the default set but disables the ttl controller; without the leading * only the explicitly listed controllers run).

cloud-controller-manager — Cloud Integration

Introduced to decouple cloud-provider-specific logic from the core Kubernetes binary. Before its introduction (pre-v1.6), every cloud provider patched their logic directly into kube-controller-manager and kubelet, creating a monolithic and slow release cycle.

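The CCM runs three cloud-specific controllers: the Node controller (initializes Node objects with cloud metadata such as addresses, and removes Nodes whose instances no longer exist), the Route controller (programs routes in the cloud network), and the Service controller (provisions cloud load balancers for Services of type LoadBalancer).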

High Availability Control Plane

Stacked vs External etcd Topology

Stacked etcd (Default kubeadm)

etcd runs on the same nodes as the control plane components. Simpler to manage. Used by default in kubeadm clusters.

  • Each CP node: apiserver + scheduler + controller-manager + etcd
  • etcd cluster: 3 members on 3 CP nodes
  • Risk: Losing a CP node loses both a control plane member AND an etcd member simultaneously
  • Minimum: 3 CP nodes for quorum tolerance

External etcd

etcd runs on separate dedicated nodes. More resilient, harder to operate.

  • CP nodes: apiserver + scheduler + controller-manager (no etcd)
  • etcd nodes: 3 or 5 dedicated etcd members
  • CP node failure doesn't affect etcd quorum
  • Requires more nodes: minimum 3 CP + 3 etcd = 6 nodes
  • Recommended for production clusters > 50 nodes

apiserver HA — Stateless Horizontal Scale

Since the apiserver is stateless (all state in etcd), you can run any number of replicas. A TCP load balancer (or VIP via keepalived/haproxy) distributes client connections across all healthy apiserver instances. There is no leader election for apiservers — all replicas serve reads and writes concurrently.

# Verify HA apiserver (kubeadm)
kubectl get pods -n kube-system -l component=kube-apiserver

# Show the API endpoint your kubeconfig targets (normally the VIP/LB, not an individual apiserver)
kubectl config view --minify -o jsonpath='{.clusters[0].cluster.server}'

# List the individual apiserver endpoints behind the VIP
kubectl get endpoints kubernetes -n default

Scheduler and Controller-Manager Leader Election

Unlike the apiserver, the scheduler and controller-manager are NOT safe to run as multiple active instances: two active schedulers could make conflicting placement decisions for the same pod. They therefore use Kubernetes Lease-based leader election, which works as follows:

  1. Each replica tries to acquire a Lease object in the kube-system namespace.
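  2. The Lease records leaseDurationSeconds (default 15s). Each replica is also configured with a renew deadline (default 10s) and a retry period (default 2s); these are client-side settings, not fields of the Lease object.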
  3. The leader continuously renews the Lease by updating renewTime.
  4. If the leader fails to renew within leaseDuration, another replica acquires the Lease and becomes leader.
  5. Non-leaders sleep and periodically attempt to acquire the Lease.
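
The acquire-or-renew decision in the steps above can be sketched as follows. This is a deliberately simplified model: the real client-go implementation relies on optimistic concurrency (resourceVersion) and on locally observed time rather than trusting timestamps, and the `Lease` class and `try_acquire_or_renew` function here are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Lease:
    holder: Optional[str] = None
    renew_time: float = 0.0
    lease_duration_s: float = 15.0


def try_acquire_or_renew(lease: Lease, candidate: str, now: float) -> bool:
    """Return True if `candidate` holds the lease after this attempt."""
    expired = lease.holder is None or (now - lease.renew_time) > lease.lease_duration_s
    if lease.holder == candidate or expired:
        lease.holder = candidate       # take over, or stay leader
        lease.renew_time = now         # renewing resets the expiry clock
        return True
    return False                       # someone else holds a live lease


lease = Lease()
print(try_acquire_or_renew(lease, "replica-a", now=0.0))    # True: lease was free
print(try_acquire_or_renew(lease, "replica-b", now=5.0))    # False: still held by replica-a
print(try_acquire_or_renew(lease, "replica-b", now=16.0))   # True: replica-a stopped renewing
```

A standby only takes over once the lease duration has fully elapsed since the last renewal, which is why failover after a leader crash takes on the order of the lease duration rather than being instantaneous.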
# Check scheduler leader
kubectl get lease kube-scheduler -n kube-system -o yaml
# holderIdentity: kube-scheduler-node1_abc-uuid

# Check controller-manager leader
kubectl get lease kube-controller-manager -n kube-system -o yaml

# Watch for leadership changes
kubectl get lease -n kube-system --watch

# Controller-manager leader election flags
kube-controller-manager \
  --leader-elect=true \
  --leader-elect-lease-duration=15s \
  --leader-elect-renew-deadline=10s \
  --leader-elect-retry-period=2s
Split-Brain Risk
If the leader election Lease is not renewed due to a network partition (not a process crash), both old and new leaders may believe they are active for up to leaseDurationSeconds. This is the "split-brain" window. Kubernetes mitigates this with optimistic locking (resourceVersion conflicts) but double-scheduling is theoretically possible during this window.

Static Pods — How Control Plane Components Run

On clusters provisioned by kubeadm, all control plane components (apiserver, etcd, scheduler, controller-manager) run as static pods. Static pods are managed directly by the kubelet on the node — not by the apiserver or any controller.

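The kubelet's static pod directory (staticPodPath in the kubelet configuration file, or the deprecated --pod-manifest-path flag; kubeadm sets it to /etc/kubernetes/manifests/) contains YAML manifests. The kubelet watches this directory and creates, restarts, or removes pods whenever files are added, modified, or deleted.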

# Static pod manifests location (kubeadm)
ls /etc/kubernetes/manifests/
# etcd.yaml
# kube-apiserver.yaml
# kube-controller-manager.yaml
# kube-scheduler.yaml

# Static pods appear as mirror pods in the API server
# (prefixed with node name, e.g. kube-apiserver-control-plane-1)
kubectl get pods -n kube-system

# Editing a static pod manifest immediately restarts the component
# (kubelet detects inotify change and recreates the pod)
vim /etc/kubernetes/manifests/kube-apiserver.yaml
Static Pod Bootstrap Problem
The kube-apiserver runs as a static pod managed by kubelet. But the kubelet itself needs to communicate with the apiserver for many operations. This is the chicken-and-egg bootstrap: kubelet can start and manage static pods before the apiserver is running. Once the apiserver comes up, the kubelet registers the static pods as "mirror pods" (read-only copies in etcd). The kubelet continues to manage these pods locally — deleting the mirror pod from the API server does NOT delete the actual static pod.

Component Startup Order and Dependencies

1. etcd
Must be fully up and serving before any other component starts
2. kube-apiserver
Connects to etcd. Other components cannot start without a healthy apiserver.
3. kube-controller-manager
Connects to apiserver. Acquires leader election Lease.
4. kube-scheduler
Connects to apiserver. Acquires leader election Lease. Can start before or after controller-manager.
5. cloud-controller-manager
Optional. Connects to apiserver and cloud provider API.
6. kubelet (on each node)
Registers Node object. Requires apiserver to be reachable.
7. kube-proxy (on each node)
Watches Services and EndpointSlices. Requires apiserver.
8. CoreDNS (addon)
Scheduled as pods. Requires scheduler and kubelet to be running.
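
The ordering above is a dependency graph, so a valid startup sequence can be derived with a topological sort. The sketch below encodes the dependencies as listed in this section using Python's standard `graphlib` module; the `DEPENDS_ON` mapping and `startup_order` function are illustrative names, and "kubelet" here refers to node registration (the kubelet process itself starts earlier in order to launch static pods).

```python
from graphlib import TopologicalSorter

# component -> set of components that must be healthy first
DEPENDS_ON = {
    "etcd": set(),
    "kube-apiserver": {"etcd"},
    "kube-controller-manager": {"kube-apiserver"},
    "kube-scheduler": {"kube-apiserver"},
    "cloud-controller-manager": {"kube-apiserver"},
    "kubelet": {"kube-apiserver"},
    "kube-proxy": {"kube-apiserver"},
    "CoreDNS": {"kube-scheduler", "kubelet"},
}


def startup_order(graph: dict) -> list:
    """Return one valid order in which every dependency precedes its dependents."""
    return list(TopologicalSorter(graph).static_order())


for step, component in enumerate(startup_order(DEPENDS_ON), start=1):
    print(step, component)
```

Only etcd and kube-apiserver have fixed positions; the components that depend solely on the apiserver may come up in any order relative to each other, which matches the note that the scheduler can start before or after the controller-manager.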

PKI and Certificate Architecture

Every communication in the control plane uses TLS. The control plane PKI is a hierarchy of Certificate Authorities managed by kubeadm (or manually for more complex setups).

[Diagram: kubeadm PKI hierarchy]
Kubernetes Root CA (/etc/kubernetes/pki/ca.crt); the Kubernetes CA is the same as the root in kubeadm. Certificates shown under it:
  • apiserver.crt (SANs: localhost, kubernetes.default.svc, VIP)
  • apiserver-kubelet-client.crt (apiserver→kubelet authn; CN: kube-apiserver-kubelet-client)
  • controller-manager.crt (CN: system:kube-controller-manager)
  • scheduler.crt (CN: system:kube-scheduler)
etcd CA (/etc/kubernetes/pki/etcd/ca.crt). Certificates shown under it:
  • etcd-server.crt (etcd TLS serving cert)
  • apiserver-etcd-client.crt (apiserver mTLS to etcd)
Front-Proxy CA (/etc/kubernetes/pki/front-proxy-ca.crt)

# View all cluster certificates and their expiry
kubeadm certs check-expiration

# Renew all certificates (kubeadm)
kubeadm certs renew all

# View cert details
openssl x509 -in /etc/kubernetes/pki/apiserver.crt -noout -text | grep -A 5 "Subject Alternative"
openssl x509 -in /etc/kubernetes/pki/apiserver.crt -noout -dates

# Check the subject of the apiserver's etcd client cert
openssl x509 -in /etc/kubernetes/pki/apiserver-etcd-client.crt -noout -text | grep "Subject:"

Health Check Endpoints and Monitoring

Every control plane component exposes health check endpoints. These are polled by load balancers, monitoring systems, and readiness probes in the static pod manifests.

| Component | Endpoint | Port | Expected Response |
|---|---|---|---|
| kube-apiserver | /healthz | 6443 | ok |
| kube-apiserver | /readyz | 6443 | ok (all checks pass) |
| kube-apiserver | /livez | 6443 | ok |
| kube-apiserver | /metrics | 6443 | Prometheus text format |
| kube-scheduler | /healthz | 10259 | ok |
| kube-scheduler | /metrics | 10259 | Prometheus text format |
| kube-controller-manager | /healthz | 10257 | ok |
| kube-controller-manager | /metrics | 10257 | Prometheus text format |
| etcd | /health | 2379 | {"health":"true"} |
| etcd | /metrics | 2381 | Prometheus text format |

# Check apiserver health from within cluster
kubectl get --raw /healthz
kubectl get --raw /readyz
kubectl get --raw /livez
kubectl get --raw '/readyz?verbose'   # Shows each check's status (quoted so the shell does not treat ? as a glob)

# Individual readiness checks
kubectl get --raw /readyz/poststarthook/rbac/bootstrap-roles
kubectl get --raw /readyz/etcd

# From node directly
curl -sk https://localhost:6443/healthz --cert /etc/kubernetes/pki/admin.crt --key /etc/kubernetes/pki/admin.key

# Check scheduler and controller-manager from CP node
curl -sk https://127.0.0.1:10259/healthz
curl -sk https://127.0.0.1:10257/healthz

# etcd health check
ETCDCTL_API=3 etcdctl endpoint health \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/healthcheck-client.crt \
  --key=/etc/kubernetes/pki/etcd/healthcheck-client.key

Critical Metrics to Monitor

| Metric | Component | Alert Threshold | Meaning |
|---|---|---|---|
| apiserver_request_duration_seconds | apiserver | p99 > 1s | API request latency by verb/resource |
| apiserver_request_total | apiserver | error rate > 1% | Total requests by code/verb/resource |
| etcd_request_duration_seconds | etcd | p99 > 100ms | etcd operation latency (critical for apiserver throughput) |
| etcd_server_leader_changes_seen_total | etcd | > 0 in 5min | etcd leader elections (indicates instability) |
| etcd_mvcc_db_total_size_in_bytes | etcd | approaching the configured quota | etcd database size (default quota 2GB, configurable to 8GB) |
| scheduler_schedule_attempts_total | scheduler | unschedulable > 0 | Pods that couldn't be scheduled |
| scheduler_pending_pods | scheduler | > 50 sustained | Pods waiting to be scheduled |
| workqueue_depth | controller-manager | > 100 sustained | Controller reconciliation queue backlog |
| workqueue_queue_duration_seconds | controller-manager | p99 > 5s | Time items wait in controller work queue |
| apiserver_current_inflight_requests | apiserver | near max-requests-inflight | Concurrency pressure on apiserver |

Troubleshooting the Control Plane

apiserver Not Responding

# Step 1: Check if the static pod is running on CP node
ssh cp-node-1
crictl ps | grep kube-apiserver

# Step 2: Check kubelet logs (manages static pods)
journalctl -u kubelet --since "10 minutes ago" | grep apiserver

# Step 3: Check apiserver logs directly
kubectl logs -n kube-system kube-apiserver-cp-node-1   # only works if another apiserver replica is still reachable
# Otherwise read the container logs on the node via crictl:
crictl logs $(crictl ps --name kube-apiserver -q)

# Step 4: Check manifest for YAML syntax errors (requires PyYAML; no output means the file parsed)
python3 -c "import sys,yaml;yaml.safe_load(sys.stdin)" < /etc/kubernetes/manifests/kube-apiserver.yaml

# Step 5: Verify etcd connectivity
ETCDCTL_API=3 etcdctl endpoint health ...
# If etcd is down → apiserver will not serve writes (reads from cache possible)

# Step 6: Check certificates
openssl x509 -in /etc/kubernetes/pki/apiserver.crt -noout -dates
kubeadm certs check-expiration

Scheduler Not Scheduling Pods

# Check if scheduler is running
kubectl get pods -n kube-system -l component=kube-scheduler

# Check leader election
kubectl get lease kube-scheduler -n kube-system

# Check scheduler logs for specific pod
kubectl logs -n kube-system kube-scheduler-cp-node-1 | grep "Failed to schedule"
kubectl logs -n kube-system kube-scheduler-cp-node-1 | grep <pod-name>

# Get scheduling failure events
kubectl get events --field-selector reason=FailedScheduling --all-namespaces

# Describe the unscheduled pod
kubectl describe pod <pod-name>    # Look at "Events:" section

# Common causes:
# - Insufficient CPU/memory on all nodes
# - Taint with no toleration
# - NodeSelector/Affinity that matches no nodes
# - PVC cannot be bound (VolumeBinding plugin failure)

Controller Not Reconciling

# Check controller-manager status
kubectl get pods -n kube-system -l component=kube-controller-manager
kubectl logs -n kube-system kube-controller-manager-cp-node-1 --tail=100

# Check leader
kubectl get lease kube-controller-manager -n kube-system

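# Check work queue metrics directly (port-forward blocks, so background it or use a second terminal)
kubectl port-forward -n kube-system kube-controller-manager-cp-node-1 10257:10257 &
# /metrics on the secure port requires an authorized bearer token; $TOKEN is a placeholder
curl -sk -H "Authorization: Bearer $TOKEN" https://localhost:10257/metrics | grep workqueue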

# Check for throttling / rate limiting
kubectl logs ... | grep "Throttling request"

# Check RBAC permissions for specific controller SA
kubectl auth can-i list deployments --as=system:serviceaccount:kube-system:deployment-controller

etcd Issues

# etcd cannot reach quorum
# Symptom: apiserver returns 503 for writes, etcd logs show "failed to reach quorum"
ETCDCTL_API=3 etcdctl endpoint status --write-out=table ...

# etcd database size too large
ETCDCTL_API=3 etcdctl endpoint status ...  # check DB SIZE column
# Compact and defragment:
ETCDCTL_API=3 etcdctl compact $(etcdctl endpoint status --write-out=json | jq '.[0].Status.header.revision')
ETCDCTL_API=3 etcdctl defrag

# etcd slow writes / high fsync latency
# Check disk performance:
iostat -x 1 5
# etcd is sensitive to fsync latency; use SSDs; avoid NFS/network storage

# Check etcd alarms
ETCDCTL_API=3 etcdctl alarm list
ETCDCTL_API=3 etcdctl alarm disarm  # After resolving the root cause

Upgrading the Control Plane

Control plane upgrades must be done before upgrading worker nodes. The supported pattern is to upgrade one minor version at a time (e.g., 1.29 → 1.30, not 1.29 → 1.31).
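
The one-minor-version-at-a-time rule means a multi-version jump has to be planned as a sequence of hops. A small sketch of that planning step (the `upgrade_path` helper is hypothetical and only handles versions within the same major release):

```python
def upgrade_path(current: str, target: str) -> list:
    """List the minor versions to pass through, one hop per minor release."""
    cur_major, cur_minor = (int(p) for p in current.lstrip("v").split(".")[:2])
    tgt_major, tgt_minor = (int(p) for p in target.lstrip("v").split(".")[:2])
    if cur_major != tgt_major:
        raise ValueError("this sketch only handles upgrades within one major version")
    if tgt_minor < cur_minor:
        raise ValueError("downgrades are not supported")
    return [f"{cur_major}.{minor}" for minor in range(cur_minor + 1, tgt_minor + 1)]


print(upgrade_path("1.29", "1.31"))       # ['1.30', '1.31']
print(upgrade_path("v1.29.3", "v1.30.0")) # ['1.30']
```

Each entry in the returned list corresponds to one full pass through the kubeadm upgrade flow for the control plane before moving on to the next minor version.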

# Using kubeadm (standard upgrade flow)

# 1. Upgrade kubeadm on first CP node
apt-get update && apt-get install -y kubeadm=1.30.0-1.1

# 2. Verify upgrade plan
kubeadm upgrade plan

# 3. Apply upgrade to first CP node (upgrades apiserver, scheduler, controller-manager, etcd)
kubeadm upgrade apply v1.30.0

# 4. Upgrade kubeadm on remaining CP nodes
# Then on each additional CP node:
kubeadm upgrade node

# 5. Upgrade kubelet and kubectl on CP nodes
apt-get install -y kubelet=1.30.0-1.1 kubectl=1.30.0-1.1
systemctl daemon-reload && systemctl restart kubelet

# 6. Then upgrade worker nodes (drain → upgrade kubelet → uncordon)
Upgrade Order is Critical
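NEVER upgrade worker node kubelets before the control plane. The kubelet must never be newer than the apiserver and may lag it only by a limited number of minor versions (three in current releases, per the upstream version skew policy). kube-proxy likewise must not be newer than the apiserver. Running outside this skew policy is unsupported and can cause undefined behavior.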

Production Control Plane Checklist

20-Item Production Readiness Checklist
| # | Item | Default | Production Setting |
|---|---|---|---|
| 1 | etcd node count | 1 (single) | 3 or 5 (odd, for quorum) |
| 2 | apiserver replicas | 1 | 3 (behind VIP/LB) |
| 3 | etcd encryption at rest | Disabled | Enable with --encryption-provider-config |
| 4 | Audit logging | Disabled | Enable with policy covering Secrets, RBAC, exec |
| 5 | Certificate rotation | Manual | Use cert-manager or kubeadm auto-renewal + alerting |
| 6 | etcd backup | None | Automated hourly snapshots to off-cluster storage |
| 7 | etcd disk | Any | Dedicated NVMe SSD; separate from OS disk |
| 8 | API server request limits | 400 / 200 (non-mutating / mutating) | Tune per cluster size; enable APF FlowSchemas |
| 9 | etcd quota | 2GB | Increase to 8GB for large clusters; add compaction cron |
| 10 | Authorization mode | AlwaysAllow (dev) | --authorization-mode=Node,RBAC |
| 11 | Admission plugins | minimal | Enable PodSecurity, NodeRestriction, ResourceQuota |
| 12 | anonymous-auth | true | Disable: --anonymous-auth=false |
| 13 | insecure-port | 0 (disabled v1.20+; flag removed in v1.24) | On older versions, ensure --insecure-port=0 |
| 14 | profiling | Enabled | Disable in production: --profiling=false |
| 15 | etcd peer encryption | mTLS | Verify peer certs are separate from client certs |
| 16 | Control plane node isolation | Tolerates all (kubeadm default) | Taint CP nodes: NoSchedule for workloads |
| 17 | Resource requests on CP pods | None (static pods) | Set requests in static pod manifests; use separate resource classes |
| 18 | etcd compaction | Auto (every 5min default) | Verify compaction is running; alert on DB growth rate |
| 19 | OIDC integration | x509 only | Integrate with org IdP (Dex, Okta) for human user authn |
| 20 | Monitoring coverage | None | Prometheus + Alertmanager for all 10 metrics above |

Dependency Graph and Next Files

This File Covers

  • Control plane component roles
  • Communication topology
  • HA: stacked vs external etcd
  • Leader election mechanism
  • Static pods and bootstrap
  • PKI and certificate hierarchy
  • Health checks and metrics
  • Upgrade procedure