File path: 00-foundations/03-cluster-architecture-overview.html
Prerequisites: 00-intro, 01-history, 02-containers
Concepts covered: Cluster topology, Control plane components, Worker node components, HA architecture, Multi-master setup, Component communication, Port reference, Certificate topology, Failure domains, Network topology, Cloud vs bare metal, Stacked vs external etcd

Cluster Architecture Overview

This document is the architectural map of a Kubernetes cluster — every component, its role, how it communicates with other components, what ports it uses, and how the whole system maintains consistency under failures. Read this before diving into any individual component's deep-dive.

1 · The Two Planes

A Kubernetes cluster is logically split into two planes:

  • Control Plane — makes decisions (scheduling, reconciliation, admission), stores state. Brains of the cluster.
  • Data Plane (Worker Nodes) — executes workloads. Runs the actual Pods that serve traffic.

In a production cluster these planes run on separate physical or virtual machines. The control plane should never run user workloads: keep the node-role.kubernetes.io/control-plane:NoSchedule taint on control plane nodes (kubeadm applies it by default).

[Figure 1: an external load balancer (VIP) in front of 3 identical control plane nodes (kube-apiserver :6443, etcd :2379 with Raft quorum 2/3, kube-scheduler, kube-controller-manager, cloud-controller-manager) and worker nodes 1..N, up to 5,000+ (kubelet :10250, kube-proxy, containerd via Unix socket, CNI plugin, Pods on 10.244.x.x). Legend: blue = kubelet to apiserver watch/updates; orange = etcd Raft replication; purple = client to VIP to apiserver.]
Figure 1 — Production HA cluster: 3 control plane nodes (etcd stacked), N worker nodes. All components communicate via kube-apiserver.

2 · Control Plane Components In Depth

2.1 kube-apiserver

The API server is the central component that all other components talk to, and the only gateway to the cluster's source of truth. Components never talk to each other directly; they read and write state through the API server. The exceptions are connections the API server itself opens (to etcd for storage, to the kubelet for exec/logs/port-forward, and to admission webhooks) and etcd's own peer replication.

Property | Value
Port | 6443 (HTTPS). The legacy insecure port (8080) is gone from current releases; never expose an unauthenticated port.
Horizontally scalable | Yes. Multiple replicas run behind a load balancer, and all of them are active (no leader election).
Stateful? | No. Completely stateless; all state is in etcd.
Auth mechanisms | x509 client certificates, bearer tokens, OIDC, webhook token authentication, ServiceAccount tokens
Deep-dive | 01-control-plane/01-kube-apiserver.html

2.2 etcd

etcd is the cluster's persistent memory. Every Kubernetes object (Pod spec, Deployment state, Secret value, Node status, Service endpoint) is persisted as a key-value entry in etcd. The kube-apiserver is the only component that reads from or writes to etcd directly.

Property | Value
Client port | 2379 (only the API server connects)
Peer port | 2380 (etcd-to-etcd Raft replication)
Consensus | Raft; a write commits once a quorum of ⌊n/2⌋ + 1 members has persisted it
Recommended cluster size | 3 members (tolerates 1 failure) or 5 members (tolerates 2)
Deep-dive | 01-control-plane/02-etcd.html

2.3 kube-scheduler

The scheduler watches for Pods with an empty spec.nodeName (unscheduled) and, for each one, runs a filter → score → bind pipeline. The bind step creates a Binding object through the Pod's binding subresource, and the API server then sets spec.nodeName.

Property | Value
Metrics/health port | 10259 (HTTPS)
Leader election | Yes; only one scheduler instance is active at a time, coordinated through a Lease object
Talks to | Only kube-apiserver (watch for unscheduled Pods, write bindings)
Deep-dive | 01-control-plane/03-kube-scheduler.html

2.4 kube-controller-manager

A single binary that embeds 30+ individual controllers, each running its own reconciliation loop. All controllers use the same watch-and-reconcile pattern.

Property | Value
Metrics/health port | 10257 (HTTPS)
Leader election | Yes; one active instance at a time via a Lease
Key controllers included | ReplicationController, ReplicaSet, Deployment, StatefulSet, DaemonSet, Job, CronJob, Node, Endpoint/EndpointSlice, Namespace, PersistentVolume, ServiceAccount, Token, Certificate
Deep-dive | 01-control-plane/04-kube-controller-manager.html

2.5 cloud-controller-manager

Separates cloud-provider-specific logic from the core Kubernetes code. Manages: cloud load balancers (LoadBalancer type Services), cloud node metadata (instance types, zones), and cloud routes (for overlay-less networking on GCE/AWS).

Property | Value
Port | 10258 (HTTPS)
Leader election | Yes
Absent on | Bare metal clusters without a cloud provider integration
Deep-dive | 01-control-plane/05-cloud-controller-manager.html

3 · Worker Node Components In Depth

3.1 kubelet

The kubelet is the primary node agent. It is the bridge between the Kubernetes API and the container runtime. It watches the API server for Pods assigned to its node, drives the CRI to create/destroy containers, runs probes, and reports status back.

Property | Value
HTTPS API port | 10250; the API server calls back here for exec/logs/port-forward
Read-only port | 10255 (unauthenticated and deprecated; keep it disabled)
Healthz port | 10248
Talks to | kube-apiserver (watch + status updates), the CRI socket (containerd/CRI-O), and, through the runtime, CNI plugins
Deep-dive | 02-node-components/01-kubelet.html

3.2 kube-proxy

kube-proxy watches Service and EndpointSlice objects and programs the host's networking stack (iptables rules, IPVS virtual servers, or nftables) to implement the Service virtual IP abstraction.

Property | Value
Healthz port | 10256
Modes | iptables (default), IPVS (large clusters), nftables (beta in 1.31); alternatively, an eBPF dataplane such as Cilium can replace kube-proxy entirely
Deep-dive | 02-node-components/02-kube-proxy.html

3.3 Container Runtime (CRI)

The CRI-compliant runtime (containerd or CRI-O) receives instructions from the kubelet via gRPC, pulls images, manages container lifecycle, and interacts with the OCI low-level runtime (runc).

3.4 CNI Plugin

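The CNI plugin (Calico, Cilium, Flannel, Weave, etc.) is invoked by the container runtime when the kubelet asks it, over CRI, to create a Pod sandbox. It assigns a Pod IP, creates the veth pair, and programs routes. CNI itself is not a running daemon: it is a set of executables in /opt/cni/bin/, configured from /etc/cni/net.d/, that the runtime calls for each sandbox. Many plugins additionally run a node agent DaemonSet (e.g., calico-node, cilium-agent) alongside those binaries.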
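To make "CNI is just executables" concrete: under the CNI specification, the runtime passes parameters in CNI_* environment variables (CNI_COMMAND, CNI_CONTAINERID, CNI_NETNS, CNI_IFNAME, CNI_PATH), writes the network configuration to the plugin's stdin, and reads a JSON result from its stdout. The bash sketch below imitates that call. cni_exec is a hypothetical helper; containerd and CRI-O do this through the Go libcni library rather than a shell wrapper, and CNI_ARGS is omitted for brevity.

```bash
#!/usr/bin/env bash
# Hypothetical helper that mimics how a container runtime invokes a CNI plugin.
# Per the CNI spec: parameters travel in CNI_* environment variables, the network
# config JSON arrives on stdin, and the plugin prints its result JSON on stdout.
cni_exec() {  # usage: cni_exec <plugin> <ADD|DEL|CHECK|VERSION> <container-id> <netns-path> <config-file>
  local plugin=$1 cmd=$2 cid=$3 netns=$4 conf=$5
  local cni_path=${CNI_PATH:-/opt/cni/bin}   # directory holding the plugin binaries
  CNI_COMMAND="$cmd" \
  CNI_CONTAINERID="$cid" \
  CNI_NETNS="$netns" \
  CNI_IFNAME=eth0 \
  CNI_PATH="$cni_path" \
    "$cni_path/$plugin" < "$conf"
}
```

Run as root on a scratch host with a single-plugin config file, something like `cni_exec bridge ADD <sandbox-id> /var/run/netns/<name> /etc/cni/net.d/<config-file>` would ask the bridge plugin to wire up a network namespace; the IDs and paths are placeholders.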

4 · Component Communication Matrix

A critical production security concept: understanding exactly which components talk to which, over which port, authenticated with which certificate.

Source | Destination | Protocol/Port | Auth method | Direction
kubectl / any client | kube-apiserver | HTTPS :6443 | x509 cert, OIDC token, bearer token | Client → API server
kube-apiserver | etcd | HTTPS :2379 | x509 client cert (signed by the etcd CA) | API server → etcd only
kube-scheduler | kube-apiserver | HTTPS :6443 | x509 client cert (system:kube-scheduler) | Scheduler → API server (watch + write binding)
kube-controller-manager | kube-apiserver | HTTPS :6443 | x509 client cert (system:kube-controller-manager) | Controller-manager → API server
cloud-controller-manager | kube-apiserver | HTTPS :6443 | x509 client cert | CCM → API server
kubelet | kube-apiserver | HTTPS :6443 | x509 client cert (system:node:<nodeName>, obtained via TLS bootstrap or kubeadm) | kubelet → API server (watch Pods, update status)
kube-apiserver | kubelet | HTTPS :10250 | x509 client cert (apiserver-kubelet-client); the kubelet's serving cert is verified only if --kubelet-certificate-authority is set | API server → kubelet (exec, logs, port-forward)
kube-proxy | kube-apiserver | HTTPS :6443 | x509 client cert or ServiceAccount token | kube-proxy → API server (watch Services/EndpointSlices)
CoreDNS | kube-apiserver | HTTPS :6443 | ServiceAccount token (RBAC: get/list/watch Services, Endpoints/EndpointSlices) | CoreDNS → API server
etcd peer | etcd peer | HTTPS :2380 | Mutual TLS (etcd peer certs, etcd CA) | Raft replication between etcd members
kube-apiserver | Admission webhook server | HTTPS (webhook TLS) | API server verifies the webhook's serving cert against the caBundle field | API server → webhook server
The "Hub and Spoke" model
The kube-apiserver is at the centre of all communication. Apart from etcd peer replication and the connections the API server itself opens (to etcd, kubelets, and webhooks), components never communicate directly with each other. This is a deliberate design choice that simplifies security: there is one primary endpoint to protect, and all state transitions are auditable in one place.

5 · Complete Port Reference

Component | Port | Protocol | Purpose | Firewall rule needed?
kube-apiserver | 6443 | HTTPS | Kubernetes API (all clients) | Yes, from all nodes + kubectl clients
etcd client | 2379 | HTTPS | etcd API (API server only) | Yes, from control plane nodes only
etcd peer | 2380 | HTTPS | Raft replication between etcd members | Yes, between control plane nodes only
kube-scheduler | 10259 | HTTPS | Healthz + metrics (Prometheus scrape) | Optional, monitoring only
kube-controller-manager | 10257 | HTTPS | Healthz + metrics | Optional, monitoring only
cloud-controller-manager | 10258 | HTTPS | Healthz + metrics | Optional, monitoring only
kubelet | 10250 | HTTPS | kubelet API (exec/logs/port-forward) | Yes, from control plane nodes
kubelet healthz | 10248 | HTTP | Local health check only | No, local loopback only
kube-proxy healthz | 10256 | HTTP | Health probe | No, local
NodePort range | 30000–32767 | TCP/UDP | External Service exposure | Yes, from external clients
CoreDNS | 53 (Pod IP) | UDP/TCP | DNS resolution for cluster DNS | No, internal Pod traffic only
metrics-server | 443 (Service) | HTTPS | Resource metrics (kubectl top) | No, internal
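To verify the "Firewall rule needed?" column from a given source host, a TCP connect test is often enough. This sketch uses bash's /dev/tcp redirection plus coreutils timeout; port_open and check_ports are hypothetical helper names. A successful check only proves that a TCP handshake completed, not that TLS or authentication will succeed, and it cannot test UDP (e.g., DNS on 53/UDP).

```bash
#!/usr/bin/env bash
# Return 0 if a TCP connection to host:port completes within the timeout (default 2s).
port_open() {  # usage: port_open <host> <port> [timeout_seconds]
  timeout "${3:-2}" bash -c "exec 3<>/dev/tcp/$1/$2" 2>/dev/null
}

# Report open/closed for each listed port on one host.
check_ports() {  # usage: check_ports <host> <port>...
  local host=$1 port
  shift
  for port in "$@"; do
    if port_open "$host" "$port"; then
      echo "$host:$port open"
    else
      echo "$host:$port closed"
    fi
  done
}
```

For example, from a worker node, `check_ports cp-node-1 6443` should report open, while `check_ports cp-node-1 2379 2380` should report closed if etcd is firewalled to control plane nodes only (the host name is a placeholder).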

6 · High Availability Architecture

6.1 Stacked vs. External etcd

There are two HA topologies for the control plane:

Stacked etcd (kubeadm default): each of the three control plane nodes runs kube-apiserver, etcd, kube-scheduler and kube-controller-manager.
  • ✓ Simpler to operate, fewer nodes needed, kubeadm default
  • ✗ etcd and apiserver compete for I/O (noisy-neighbour risk), upgrade cycles are tied together, and losing one node removes both an etcd member and an API server

External etcd (recommended for large or critical clusters): three control plane nodes run kube-apiserver, kube-scheduler and kube-controller-manager, while a separate 3-node etcd cluster (etcd1–etcd3, client port 2379, Raft peer port 2380) holds the state.
  • ✓ No resource contention, independent etcd scaling and upgrades, best for large clusters
  • ✗ Twice as many hosts (3 control plane + 3 etcd) and a second cluster to operate
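With kubeadm, the external topology is selected in the ClusterConfiguration by listing the etcd endpoints and the client certificate the API server should present. The bash sketch below renders such a file; render_external_etcd_config is a hypothetical helper, the host names are placeholders, and the field names follow the kubeadm HA guide. The apiVersion (v1beta3 here) must match what your kubeadm release accepts.

```bash
#!/usr/bin/env bash
# Hypothetical helper: write a kubeadm ClusterConfiguration for the external-etcd topology.
render_external_etcd_config() {  # usage: render_external_etcd_config <out-file> <lb-host> <etcd-host>...
  local out=$1 lb=$2
  shift 2
  # etcd needs an odd member count (3 or more) to gain fault tolerance.
  if (( $# < 3 || $# % 2 == 0 )); then
    echo "need an odd number (>= 3) of etcd endpoints, got $#" >&2
    return 1
  fi
  local endpoints
  # One YAML list item per member; the API server uses the etcd client port 2379.
  endpoints=$(printf '      - https://%s:2379\n' "$@")
  cat > "$out" <<EOF
apiVersion: kubeadm.k8s.io/v1beta3
kind: ClusterConfiguration
controlPlaneEndpoint: "${lb}:6443"
etcd:
  external:
    endpoints:
${endpoints}
    caFile: /etc/kubernetes/pki/etcd/ca.crt
    certFile: /etc/kubernetes/pki/apiserver-etcd-client.crt
    keyFile: /etc/kubernetes/pki/apiserver-etcd-client.key
EOF
}
```

Usage might look like `render_external_etcd_config kubeadm-config.yaml lb.example.internal etcd1.example.internal etcd2.example.internal etcd3.example.internal`, followed by `kubeadm init --config kubeadm-config.yaml --upload-certs` on the first control plane node. The etcd CA and the apiserver-etcd-client certificate and key must already be copied from the etcd cluster to that node.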

6.2 Quorum Math and Failure Tolerance

etcd members | Quorum required | Failure tolerance | Recommendation
1 | 1 | 0 | Development only; any failure loses the cluster
2 | 2 | 0 | No better than 1: both members must be up, so either failure breaks quorum, and there are twice as many machines that can fail
3 | 2 | 1 | Minimum production HA; standard for most clusters
5 | 3 | 2 | Large/critical clusters, concurrent failure scenarios
7 | 4 | 3 | Very large clusters; write latency increases, rarely worth it
Never run an even number of etcd members
2 and 4 members provide the same fault tolerance as 1 and 3, respectively, but need more nodes, each of which can fail. A network partition that splits an even-sized cluster into equal halves leaves neither side with quorum, so the whole cluster stops accepting writes. Always use odd numbers.
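The quorum table follows from a single rule: quorum is a strict majority, ⌊n/2⌋ + 1, and fault tolerance is whatever remains, n − quorum. A few lines of bash (function names are illustrative) regenerate the table for 1 to 7 members and show the even-number penalty directly: 4 members tolerate 1 failure, exactly like 3, and 6 tolerate 2, like 5.

```bash
#!/usr/bin/env bash
# Raft quorum is a strict majority: floor(n/2) + 1 (bash integer division floors).
etcd_quorum() { echo $(( $1 / 2 + 1 )); }

# Members that can fail while a majority survives: n - quorum.
etcd_fault_tolerance() { echo $(( $1 - ($1 / 2 + 1) )); }

for n in 1 2 3 4 5 6 7; do
  printf 'members=%d quorum=%d tolerates=%d\n' \
    "$n" "$(etcd_quorum "$n")" "$(etcd_fault_tolerance "$n")"
done
```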

6.3 Controller/Scheduler Leader Election

While the API server is active-active (all replicas serve traffic), the scheduler and controller-manager use leader election to ensure only one instance makes decisions at a time. Leader election is implemented using a coordination.k8s.io/v1/Lease object in the kube-system namespace.

# Inspect leader election leases
kubectl get leases -n kube-system
# NAME                      HOLDER                    AGE
# kube-controller-manager   node1.cluster.internal    45d
# kube-scheduler            node1.cluster.internal    45d

# Who currently holds the scheduler lease?
kubectl get lease kube-scheduler -n kube-system -o jsonpath='{.spec.holderIdentity}'

# Leader election flags (accepted by both kube-controller-manager and kube-scheduler)
# --leader-elect=true (default)
# --leader-elect-lease-duration=15s  (how long standbys wait after the last observed renewal before trying to take over)
# --leader-elect-renew-deadline=10s  (how long the leader keeps retrying a renewal before it stops leading)
# --leader-elect-retry-period=2s     (wait between acquire/renew attempts)
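These three values constrain each other. client-go's leaderelection package, which both components use, rejects a configuration unless lease-duration > renew-deadline and renew-deadline > 1.2 × retry-period (1.2 is that package's JitterFactor). The bash sketch below mirrors those two checks in whole seconds; check_leader_election is a hypothetical name, not a real tool. Shortening the lease speeds up failover after a leader dies, but forces shorter renew and retry intervals and therefore more frequent Lease updates against the API server.

```bash
#!/usr/bin/env bash
# Hypothetical sanity check mirroring client-go's leader-election validation
# (values in whole seconds): lease > renew and renew > 1.2 * retry.
check_leader_election() {  # usage: check_leader_election <lease> <renew> <retry>
  local lease=$1 renew=$2 retry=$3
  if (( lease <= renew )); then
    echo "invalid: lease-duration (${lease}s) must exceed renew-deadline (${renew}s)" >&2
    return 1
  fi
  # Integer form of renew > 1.2 * retry: 10 * renew > 12 * retry.
  if (( 10 * renew <= 12 * retry )); then
    echo "invalid: renew-deadline (${renew}s) must exceed 1.2 x retry-period (${retry}s)" >&2
    return 1
  fi
  echo "ok: lease=${lease}s renew=${renew}s retry=${retry}s"
}
```

With the defaults, `check_leader_election 15 10 2` passes, while `check_leader_election 15 2 2` fails because a 2s renew deadline leaves no room for a jittered 2s retry.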

7 · Network Topology

A Kubernetes cluster uses three distinct IP address ranges, plus one well-known address inside the Service range:

Network | Default range | What it covers | Configured in
Node network | Depends on infrastructure | Physical/VM NIC IPs on each node | Cloud VPC or bare metal network
Pod CIDR | 10.244.0.0/16 (Flannel) / 192.168.0.0/16 (Calico) | Pod IP addresses (one unique IP per Pod) | --pod-network-cidr on kubeadm / CNI config
Service CIDR | 10.96.0.0/12 (kubeadm default) | ClusterIP virtual IPs for Services | --service-cluster-ip-range on the API server (and controller-manager)
DNS ClusterIP | 10.96.0.10 | CoreDNS Service IP, fixed at install time (kubeadm uses the 10th address of the Service CIDR) | clusterDNS in the kubelet config (--cluster-dns flag)
IP range overlap causes silent routing failures
The Pod CIDR, Service CIDR, and Node network MUST NOT overlap with each other or with on-premises corporate networks that the nodes can reach. A Pod CIDR of 10.244.0.0/16 will cause routing issues if any corporate subnet uses 10.244.x.x. Plan your CIDR allocations carefully before cluster creation: changing them afterwards is disruptive and in practice usually means a cluster rebuild.
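The overlap rule can be checked mechanically before cluster creation. The bash sketch below (IPv4 only; the helper names are hypothetical) converts each CIDR to an integer address range and reports every pair that shares an address. Feed it the Pod CIDR, the Service CIDR, the node network, and every corporate range the nodes can route to.

```bash
#!/usr/bin/env bash
# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() {
  local IFS=.
  local -a o
  read -r -a o <<< "$1"
  echo $(( (o[0] << 24) | (o[1] << 16) | (o[2] << 8) | o[3] ))
}

# Print the first and last address (as integers) covered by a CIDR block.
cidr_bounds() {
  local ip=${1%/*} len=${1#*/} base size
  base=$(ip_to_int "$ip")
  size=$(( 1 << (32 - len) ))
  base=$(( base & ~(size - 1) ))   # normalise to the network address
  echo "$base $(( base + size - 1 ))"
}

# Succeed if two CIDR blocks share at least one address.
cidrs_overlap() {
  local a1 a2 b1 b2
  read -r a1 a2 <<< "$(cidr_bounds "$1")"
  read -r b1 b2 <<< "$(cidr_bounds "$2")"
  (( a1 <= b2 && b1 <= a2 ))
}

# Check every pair; print each overlap and fail if any exist.
check_plan() {  # usage: check_plan <cidr> <cidr>...
  local -a nets=("$@")
  local i j rc=0
  for (( i = 0; i < ${#nets[@]}; i++ )); do
    for (( j = i + 1; j < ${#nets[@]}; j++ )); do
      if cidrs_overlap "${nets[i]}" "${nets[j]}"; then
        echo "OVERLAP: ${nets[i]} and ${nets[j]}"
        rc=1
      fi
    done
  done
  return "$rc"
}
```

For the corporate-subnet example in the warning, `check_plan 10.244.0.0/16 10.96.0.0/12 10.244.128.0/20` prints `OVERLAP: 10.244.0.0/16 and 10.244.128.0/20` and exits non-zero.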

7.1 Kubernetes Networking Requirements

The Kubernetes networking model mandates:

  1. Every Pod gets a unique, cluster-routable IP (no NAT for Pod-to-Pod traffic within the cluster)
  2. Every Pod can reach every other Pod by IP, regardless of which node they are on
  3. Every node can reach every Pod IP
  4. The IP a Pod sees for itself is the same IP that other Pods see it as (no NAT on the path between Pods)

CNI plugins are responsible for implementing these rules. They achieve this via: VXLAN overlays (Flannel), BGP routing (Calico), eBPF dataplane (Cilium), etc.

8 · Failure Domains and Topology

8.1 Multi-AZ Worker Node Distribution

In cloud deployments, worker nodes should be spread across at least 3 availability zones. Kubernetes provides topology spread constraints to enforce the same spread at the Pod level.

# Pod spec fragment: spread Pods evenly across AZs (in a Deployment this goes under spec.template.spec)
spec:
  topologySpreadConstraints:
  - maxSkew: 1                                  # Max difference in matching Pods between zones
    topologyKey: topology.kubernetes.io/zone   # Use the zone label on nodes
    whenUnsatisfiable: DoNotSchedule           # Leave the Pod Pending rather than violate the skew
    labelSelector:
      matchLabels:
        app: web
  - maxSkew: 1
    topologyKey: kubernetes.io/hostname        # Also spread across nodes within a zone
    whenUnsatisfiable: ScheduleAnyway          # Best-effort node spread
    labelSelector:                             # Each constraint needs its own selector
      matchLabels:
        app: web
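To make maxSkew concrete: for a candidate zone, the skew is (matching Pods already in that zone + 1) minus the smallest matching-Pod count in any eligible zone, and the placement is allowed only if that is at most maxSkew. The bash model below is a deliberate simplification (can_place is a hypothetical name; it ignores node affinity, taints, minDomains and per-node eligibility).

```bash
#!/usr/bin/env bash
# Simplified model of one zone-level topology spread constraint.
# Pass maxSkew, the candidate zone, then zone=count pairs giving the number of
# Pods matching labelSelector in every eligible zone (include zones with 0).
can_place() {  # usage: can_place <maxSkew> <candidate-zone> <zone>=<count>...
  local max_skew=$1 target=$2
  shift 2
  local pair zone count min="" target_count=""
  for pair in "$@"; do
    zone=${pair%%=*}
    count=${pair#*=}
    # Track the global minimum across eligible zones.
    if [[ -z $min ]] || (( count < min )); then
      min=$count
    fi
    if [[ $zone == "$target" ]]; then
      target_count=$count
    fi
  done
  if [[ -z $target_count ]]; then
    echo "unknown zone: $target" >&2
    return 2
  fi
  # Skew if the new Pod lands in the candidate zone.
  local skew=$(( target_count + 1 - min ))
  echo "skew=$skew"
  (( skew <= max_skew ))
}
```

With matching Pods at us-east-1a=2, us-east-1b=1, us-east-1c=1 and `maxSkew: 1`, `can_place` rejects us-east-1a (skew=2) and accepts us-east-1b (skew=1). Under `DoNotSchedule`, a Pod for which no zone passes stays Pending.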

8.2 Control Plane AZ Distribution

# Verify control plane nodes are spread across AZs
kubectl get nodes -l node-role.kubernetes.io/control-plane \
  -o custom-columns='NAME:.metadata.name,ZONE:.metadata.labels.topology\.kubernetes\.io/zone'

# Example output (good):
# NAME              ZONE
# cp-node-1         us-east-1a
# cp-node-2         us-east-1b
# cp-node-3         us-east-1c
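The same check can be automated, for example in a CI job after provisioning. The awk-based sketch below (check_zone_spread is a hypothetical helper) reads the two-column output of the custom-columns command above, skips the header and nodes without a zone label (kubectl prints `<none>`), and fails if fewer than the required number of distinct zones appear.

```bash
#!/usr/bin/env bash
# Read "NAME ZONE" rows (header first) and require a minimum number of distinct zones.
check_zone_spread() {  # usage: <custom-columns output> | check_zone_spread [min_zones]
  local min=${1:-3} zones
  # Skip the header, ignore unlabeled nodes, count each zone once.
  zones=$(awk 'NR > 1 && $2 != "<none>" && !seen[$2]++ { n++ } END { print n + 0 }')
  if (( zones < min )); then
    echo "control plane spans ${zones} zone(s); want at least ${min}" >&2
    return 1
  fi
  echo "ok: control plane spans ${zones} zones"
}
```

In practice, pipe the `kubectl get nodes -l node-role.kubernetes.io/control-plane -o custom-columns=...` command into `check_zone_spread 3`.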

9 · Cluster Sizing Reference

Cluster size | Nodes | Pods | Control plane | etcd | API server instances
Development | 1–5 | <100 | 1 node, all-in-one | Single (stacked) | 1
Small production | 5–20 | <500 | 3 nodes (stacked) | 3 members | 3 (one per CP node)
Medium production | 20–100 | <5,000 | 3 nodes (external etcd recommended) | 3–5 dedicated nodes | 3
Large production | 100–500 | <25,000 | 5 nodes (external etcd) | 5 dedicated nodes | 3–5 behind LB
Very large | 500–5,000 | up to 150,000 | 5+ nodes, tuned API server | 5 dedicated high-I/O nodes (SSD) | 5+ behind LB + caching
Kubernetes scalability thresholds (SIG Scalability)
The published thresholds, unchanged across many recent releases, are 5,000 nodes, 150,000 Pods, 300,000 total containers, and 110 Pods per node. The API call latency SLOs (p99) are ≤1s for mutating calls and single-object reads, ≤5s for namespace-scoped LIST, and ≤30s for cluster-scoped LIST. Reaching these limits requires careful etcd tuning, sufficient API server memory, and low-latency storage for etcd (NVMe SSD, p99 WAL fsync latency under 10ms).
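Two pieces of arithmetic sit behind the sizing table. First, at the default kubelet limit of 110 Pods per node, a Pod target implies a minimum node count. Second, when node IPAM is done by kube-controller-manager (--allocate-node-cidrs, with the IPv4 default of a /24 per node), the Pod CIDR caps the node count. CNIs with their own IPAM (Calico, or Cilium in some modes) allocate differently, so treat this as a sketch under those assumptions; the function names are hypothetical.

```bash
#!/usr/bin/env bash
# Minimum nodes for a Pod count at a given max-pods-per-node (ceiling division).
nodes_needed() {  # usage: nodes_needed <total_pods> <max_pods_per_node>
  echo $(( ($1 + $2 - 1) / $2 ))
}

# How many per-node subnets a cluster Pod CIDR of /cluster_len yields at /node_len.
max_nodes_for_pod_cidr() {  # usage: max_nodes_for_pod_cidr <cluster_prefix_len> <node_prefix_len>
  local cluster=$1 node=$2
  if (( node < cluster )); then
    echo "node prefix (/$node) must be at least as long as cluster prefix (/$cluster)" >&2
    return 1
  fi
  echo $(( 1 << (node - cluster) ))
}

# Addresses in each node subnet; the IPAM plugin may reserve a few of them.
addresses_per_node() {  # usage: addresses_per_node <node_prefix_len>
  echo $(( 1 << (32 - $1) ))
}
```

Applied to the table: 150,000 Pods at 110 per node implies at least 1,364 nodes; Flannel's default 10.244.0.0/16 split into /24s tops out at 256 nodes; and a 5,000-node cluster needs at least a /11 Pod CIDR (2^13 = 8,192 node subnets).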

10 · Cloud vs Bare Metal Architecture Differences

Concern | Cloud (GKE/EKS/AKS) | Bare metal / on-premises
Control plane management | Fully managed; you never see CP nodes | You manage CP nodes, upgrades, HA, backups
etcd management | Managed by the cloud provider | You manage backup, recovery, disk sizing
Load balancer | cloud-controller-manager provisions a cloud LB automatically for LoadBalancer Services | MetalLB, kube-vip, or an external F5/HAProxy is required
Node provisioning | Cluster Autoscaler calls the cloud API (ASG, MIG) to add nodes | Cluster API with a MAAS/vSphere provider, or manual
Storage | EBS/GCE PD/Azure Disk CSI drivers built in | Ceph/Longhorn/NFS CSI, or SAN with a custom driver
Networking | VPC-native routing (no overlay needed on GKE, EKS VPC CNI) | Must configure an overlay (VXLAN) or a routed fabric (BGP)
Certificate rotation | Automatic (managed control plane) | Must monitor expiry; kubelet client certs rotate automatically when rotateCertificates is enabled
Workload identity (OIDC) | Workload Identity (GKE) / IRSA or EKS Pod Identity (EKS) / Microsoft Entra Workload ID (AKS) | OIDC provider + external Vault / Keycloak integration

11 · Component Startup Order

When bootstrapping or recovering a cluster, components must start in the correct order:

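  1. etcd: must be healthy first; the API server cannot start without etcd
  2. kube-apiserver: reads/writes etcd; all other components depend on it
  3. kube-controller-manager: starts watching the API server; leader election requires the API server
  4. kube-scheduler: same dependency as the controller-manager (steps 3 and 4 can start in either order)
  5. cloud-controller-manager (if present): starts after the API server
  6. kubelet on each node: registers its Node with the API server, then starts the Pods assigned to it
  7. kube-proxy: DaemonSet Pod, starts once the kubelet is ready and programs Service rules, including the kubernetes Service ClusterIP that in-cluster clients use to reach the API server
  8. CNI agent (if daemon-based, e.g., calico-node, cilium-agent): DaemonSet started by the kubelet; Pods that need the Pod network cannot start until the CNI is ready
  9. CoreDNS: Deployment; needs a working CNI (for its Pod IP) and kube-proxy rules (to reach the API server)

On kubeadm clusters the kubelet actually starts first on control plane nodes, because it launches etcd, kube-apiserver, kube-controller-manager and kube-scheduler as static Pods; the dependency order above still governs when each one becomes healthy.
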
# Monitor startup on a control plane node where components run as systemd units
journalctl -f -u etcd -u kube-apiserver -u kube-controller-manager -u kube-scheduler

# For kubeadm clusters (static Pods in /etc/kubernetes/manifests/):
# these are started by the kubelet as static Pods, so there are no per-component systemd units
ls /etc/kubernetes/manifests/
# etcd.yaml  kube-apiserver.yaml  kube-controller-manager.yaml  kube-scheduler.yaml
journalctl -u kubelet -f                 # kubelet logs show static Pod startup errors
crictl ps -a | grep -E "etcd|kube-"      # works even while the API server is down

# Check static Pod health once the API server responds
kubectl get pods -n kube-system | grep -E "etcd|apiserver|scheduler|controller"
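Because each step in the startup order gates the next, bootstrap and recovery scripts usually poll a health check before moving on. A generic retry helper in bash (wait_for is a hypothetical name) covers most of these gates:

```bash
#!/usr/bin/env bash
# Retry a command until it succeeds or the attempt budget runs out.
wait_for() {  # usage: wait_for <attempts> <delay_seconds> <command> [args...]
  local attempts=$1 delay=$2 i
  shift 2
  for (( i = 1; i <= attempts; i++ )); do
    if "$@" >/dev/null 2>&1; then
      echo "ready after ${i} attempt(s): $*"
      return 0
    fi
    sleep "$delay"
  done
  echo "gave up after ${attempts} attempt(s): $*" >&2
  return 1
}
```

For example, `wait_for 60 2 kubectl get --raw /readyz` before joining workers, or `wait_for 30 2 curl -fsk https://127.0.0.1:10257/healthz` on a control plane node to wait for the controller-manager.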

12 · Certificate Topology

Kubernetes uses TLS for essentially all inter-component communication, with mutual TLS (client certificates) between the core control plane components. Understanding the certificate chain is essential for troubleshooting and for manually bootstrapping clusters.

Full certificate authority hierarchy
Kubernetes PKI hierarchy (typical kubeadm cluster):

/etc/kubernetes/pki/
├── ca.crt / ca.key
│   └── Cluster CA — signs:
│       ├── apiserver.crt (server cert for kube-apiserver TLS)
│       ├── apiserver-kubelet-client.crt (API server → kubelet client cert)
│       ├── controller-manager.conf (in /etc/kubernetes/; embedded client cert for controller-manager)
│       ├── scheduler.conf (in /etc/kubernetes/; embedded client cert for scheduler)
│       └── kubelet client certs (in /var/lib/kubelet/pki/ on each node)
│
├── etcd/ca.crt / etcd/ca.key
│   └── etcd CA (separate) — signs:
│       ├── etcd/server.crt (etcd server TLS)
│       ├── etcd/peer.crt (etcd peer replication TLS)
│       └── apiserver-etcd-client.crt (API server → etcd client cert)
│
├── front-proxy-ca.crt / front-proxy-ca.key
│   └── Front-proxy CA — signs:
│       └── front-proxy-client.crt (used for API aggregation layer)
│
└── sa.pub / sa.key
    └── Service Account signing key pair
        (sa.key signs SA tokens; sa.pub used by API server to verify them)

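Default certificate validity: 1 year (kubeadm)
CA validity: 10 years
Auto-rotation: kubelet client certs rotate automatically when rotateCertificates
  is enabled in the kubelet config (kubeadm enables it; the RotateKubeletClientCertificate
  feature gate has been on by default since 1.8 and GA since 1.19)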
# Check certificate expiry dates on a control plane node
kubeadm certs check-expiration

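# Renew certificates manually (kubeadm clusters), then restart the control plane static Pods
# (kube-apiserver, kube-controller-manager, kube-scheduler, etcd) so they load the new certs
kubeadm certs renew all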

# Check a specific cert
openssl x509 -in /etc/kubernetes/pki/apiserver.crt -noout -dates
# notAfter=Jun  1 00:00:00 2026 GMT

# Check kubelet node cert (on each worker node)
openssl x509 -in /var/lib/kubelet/pki/kubelet-client-current.pem -noout -subject -dates
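Certificate expiry is a production incident waiting to happen
Kubernetes certificates default to 1-year validity with kubeadm. Set up monitoring: probe_ssl_earliest_cert_expiry from the Prometheus blackbox_exporter (for TLS endpoints), or kubeadm certs check-expiration in a scheduled job. Alert at 60 days remaining. Once the certificates expire, kubectl and the control plane components can no longer authenticate to the API server, so the cluster becomes unmanageable even though already-running Pods keep serving.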
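The openssl output shown earlier can feed a simple threshold check, for example from a cron job that alerts below 60 days. The bash sketch below assumes GNU date (for `-d` parsing of openssl's date format; BSD/macOS date differs), and the helper names are hypothetical.

```bash
#!/usr/bin/env bash
# Strip the "notAfter=" prefix printed by `openssl x509 -noout -enddate`.
parse_not_after() { echo "${1#notAfter=}"; }

# Whole days from now (epoch seconds, default: current time) until the given date.
days_left() {  # usage: days_left "<date string>" [now_epoch]
  local expiry now
  expiry=$(date -u -d "$1" +%s) || return 1
  now=${2:-$(date -u +%s)}
  echo $(( (expiry - now) / 86400 ))
}

# Warn and exit 1 when fewer than min_days (default 60) remain.
check_cert_days() {  # usage: check_cert_days "<notAfter line>" [min_days] [now_epoch]
  local left
  left=$(days_left "$(parse_not_after "$1")" "${3:-}") || return 2
  if (( left < ${2:-60} )); then
    echo "WARN: ${left} days left" >&2
    return 1
  fi
  echo "OK: ${left} days left"
}
```

Typical use: `check_cert_days "$(openssl x509 -in /etc/kubernetes/pki/apiserver.crt -noout -enddate)"`.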

13 · Single-Node Development Clusters

For local development, all components run on a single machine. The control plane and worker roles are combined.

Tool | Mechanism | Use case | Production parity
kind (K8s in Docker) | Each node is a Docker container; uses kubeadm internally | CI/CD pipelines, multi-node local testing | High: real kubeadm, real etcd
minikube | Single-node VM or Docker driver | Developer laptop, quick prototyping | Medium: add-ons differ from production CNI/CSI
k3s | Single binary bundling containerd, Flannel, CoreDNS, Traefik and a local-path provisioner; SQLite (default) or embedded etcd datastore | Edge, IoT, CI, resource-constrained | Medium: some defaults differ (Traefik instead of ingress-nginx)
k3d | k3s in Docker containers | Fast local multi-node k3s clusters | Medium
Desktop (Docker/Rancher) | Bundled K8s in Docker Desktop or Rancher Desktop | Developer convenience, integrated with local registry | Low: opinionated defaults
# Create a multi-node kind cluster (3 nodes: 1 control-plane, 2 workers)
cat <<EOF | kind create cluster --name dev --config=-
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
- role: control-plane
- role: worker
- role: worker
networking:
  podSubnet: "10.244.0.0/16"
  serviceSubnet: "10.96.0.0/12"
EOF

# Load a local image into kind (no registry needed)
kind load docker-image myapp:dev --name dev

14 · Cluster Health Checks

# ---- Overall cluster health ----
kubectl get nodes -o wide                    # All nodes, status, version
kubectl get pods -n kube-system              # All control plane + system Pods

# ---- API server health ----
kubectl get --raw /healthz                   # "ok"
kubectl get --raw /healthz/etcd              # "ok"
kubectl get --raw /readyz                    # Checks all health indicators
kubectl get --raw /livez                     # Liveness check

# ---- etcd health (on control plane node) ----
ETCDCTL_API=3 etcdctl \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/healthcheck-client.crt \
  --key=/etc/kubernetes/pki/etcd/healthcheck-client.key \
  endpoint health

ETCDCTL_API=3 etcdctl ... endpoint status --write-out=table
# Shows: DB size, Raft index, leader, version

# ---- Node conditions (kubectl get componentstatuses is deprecated) ----
kubectl get --raw /api/v1/nodes | jq '.items[] | {name: .metadata.name, ready: (.status.conditions[] | select(.type == "Ready") | .status)}'
kubectl describe node <cp-node> | grep -A5 "Conditions:"

# ---- Scheduler / controller-manager ----
# The API server's /healthz does not cover these; query their own endpoints on the control plane
# node (kubeadm binds them to 127.0.0.1, and /healthz is reachable without credentials)
curl -sk https://127.0.0.1:10259/healthz    # kube-scheduler
curl -sk https://127.0.0.1:10257/healthz    # kube-controller-manager

# ---- Event stream for recent issues ----
kubectl get events --sort-by='.lastTimestamp' -A | tail -30

15 · Architecture Production Checklist

15-point production architecture checklist
  1. 3 control plane nodes minimum, spread across 3 AZs. Prefer 5 for critical clusters.
  2. External etcd for large/critical clusters (100+ nodes or high churn). Dedicate high-I/O SSDs with <10ms fsync latency.
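  3. Load balancer in front of API servers. The LB VIP (or its DNS name) must be included in the API server certificate's SANs (Subject Alternative Names; controlPlaneEndpoint or apiServer.certSANs in kubeadm), or clients connecting through the VIP fail TLS verification.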
  4. Taint control plane nodes with node-role.kubernetes.io/control-plane:NoSchedule to prevent workloads from landing on CP nodes.
  5. Size control plane nodes for API request volume: 4 CPU / 8 GB RAM minimum; 8 CPU / 16 GB for 100+ node clusters; etcd needs dedicated I/O.
  6. Monitor certificate expiry with alerting at ≥60 days before expiry.
  7. Back up etcd daily to off-cluster storage. Test restores quarterly.
  8. Plan CIDR ranges before cluster creation; changing them later is disruptive and usually means a rebuild.
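  9. Audit API server flags: ensure --anonymous-auth=false (on kubeadm clusters, confirm the static Pod's /livez and /readyz probes still pass, since kubelet probes send no credentials), --audit-log-path is set, and --enable-admission-plugins includes critical plugins.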
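  10. Enable audit logging with an --audit-policy-file; without a policy the API server records no audit events. Start with a policy that logs at least Metadata for all requests, and store logs in an external SIEM.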
  11. Worker node sizing: avoid >110 Pods/node (default kubelet limit). Use at least 4 vCPU / 8 GB RAM for general worker nodes.
  12. Use Cluster Autoscaler with min/max node group bounds to prevent runaway scale-out.
  13. Deploy PodDisruptionBudgets for all critical workloads before the first node drain.
  14. Document your disaster recovery procedure and test it. etcd restore → API server restart → verify all Pods recover.
  15. Use a managed K8s service unless you have a strong reason for self-hosted. The operational burden of managing control planes is significant.
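Items 9 and 10 can be spot-checked with a script. A minimal bash sketch, assuming a kubeadm-style static Pod manifest at /etc/kubernetes/manifests/kube-apiserver.yaml where each flag is a YAML list item (audit_apiserver_flags is a hypothetical name). A plain text match like this only confirms that a flag is present; it cannot judge whether the admission plugin list or audit policy is adequate.

```bash
#!/usr/bin/env bash
# Report PASS/FAIL for required kube-apiserver flags in a static Pod manifest.
audit_apiserver_flags() {  # usage: audit_apiserver_flags [manifest]
  local manifest=${1:-/etc/kubernetes/manifests/kube-apiserver.yaml}
  local rc=0 flag
  for flag in --anonymous-auth=false --audit-log-path= --audit-policy-file= --enable-admission-plugins=; do
    # kubeadm manifests list each flag as "    - --flag=value" under the container command.
    if grep -qF -- "- ${flag}" "$manifest"; then
      echo "PASS ${flag}"
    else
      echo "FAIL ${flag}"
      rc=1
    fi
  done
  return "$rc"
}
```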


References

  • Kubernetes architecture docs: kubernetes.io/docs/concepts/architecture/
  • kubeadm HA docs: kubernetes.io/docs/setup/production-environment/tools/kubeadm/high-availability/
  • etcd documentation: etcd.io/docs/
  • Kubernetes scalability thresholds: github.com/kubernetes/community/blob/master/sig-scalability/configs-and-limits/thresholds.md
  • PKI certificates documentation: kubernetes.io/docs/setup/best-practices/certificates/
  • Production Kubernetes, Josh Rosso et al., O'Reilly, 2021