History of Kubernetes
Kubernetes did not emerge in a vacuum. It carries over a decade of hard-won operational knowledge from running the world's largest fleets of containers at Google. Understanding this lineage is not merely academic — the architectural decisions made in Borg and Omega explain why Kubernetes works the way it does today, and why certain design choices that seem unusual have deep roots in production experience at scale.
1 · Google Borg (2003–2013)
1.1 Origin and Purpose
Around 2003, Google's infrastructure team built the first version of what would become Borg. The immediate trigger was the rapid growth of Google's internal services, starting with web search and its crawler and later extending to Gmail, Maps, and YouTube, each of which needed to run on thousands of machines simultaneously. Managing this by hand was impossible.
Borg's design goal was to maximise utilisation of Google's data centre hardware while providing a reliable, scalable runtime for both long-running services (user-facing production jobs) and batch workloads (MapReduce, analytics jobs) on the same shared fleet of machines.
1.2 Borg Architecture
Borgmaster
- Centralised control plane
- Replicated 5× for HA (Paxos)
- Handles all scheduling
- Stores state in Paxos-replicated log
- Serves read/write API (not HTTP)
- Equivalent: kube-apiserver + scheduler + etcd
Borglet
- Agent on every machine
- Launches/stops tasks
- Reports machine state to Borgmaster
- Restarts failed tasks locally
- Manages local cgroup resources
- Equivalent: kubelet
Borg Scheduler
- Feasibility checking (like K8s filter plugins)
- Scoring that blends best-fit and worst-fit packing (early versions leaned towards worst-fit spreading)
- Resource reclamation: hands reserved-but-unused resources to lower-priority work
- Priority and quota system
- Equivalent: kube-scheduler
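The feasibility-then-scoring split listed above can be sketched in a few lines. This is an illustrative toy, not Borg's or kube-scheduler's actual algorithm: the `Machine` and `Task` types, the two resources, and the best-fit scoring formula are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Machine:
    name: str
    free_cpu: float  # free cores
    free_mem: float  # free GiB


@dataclass
class Task:
    cpu: float  # requested cores (assumed > 0)
    mem: float  # requested GiB (assumed > 0)


def feasible(machine, task):
    """Filter step: the machine must have room for the task."""
    return machine.free_cpu >= task.cpu and machine.free_mem >= task.mem


def score(machine, task):
    """Score step (best-fit flavour): prefer the machine left with the least slack."""
    cpu_left = (machine.free_cpu - task.cpu) / machine.free_cpu
    mem_left = (machine.free_mem - task.mem) / machine.free_mem
    return 1.0 - (cpu_left + mem_left) / 2


def schedule(machines, task):
    """Return the highest-scoring feasible machine, or None if nothing fits."""
    candidates = [m for m in machines if feasible(m, task)]
    if not candidates:
        return None
    return max(candidates, key=lambda m: score(m, task))
```

Swapping `score` for a function that prefers the emptiest machine would turn the same skeleton into a worst-fit (spreading) scheduler, which is why real schedulers keep the two phases separate.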
1.3 Borg Key Concepts That Influenced Kubernetes
| Borg concept | Kubernetes equivalent | Lesson learned |
|---|---|---|
| Job (set of identical Tasks) | ReplicaSet / Deployment | Grouping identical tasks simplifies management |
| Task | Pod / Container | The atomic scheduling unit |
| Alloc (resource envelope shared by tasks) | Pod (shared network/cgroup envelope for containers) | Co-located containers should share IPC and network |
| Priority classes (prod vs. batch) | PriorityClass, QoS classes | Mixed workloads need preemption and tiered eviction |
| Resource reclamation (compaction) | VPA, resource requests vs. limits | Applications over-request; reclaim unused capacity |
| BNS (Borg Naming Service) | Service + CoreDNS | Service discovery must be cluster-native, not DNS-only |
| Sigma (Borg UI/debugger) | kubectl, Dashboard | Operators need rich introspection tooling |
| Faborg (failure injection) | Chaos engineering tools | Build for failure, not just availability |
| ConfigFile (job description language) | Kubernetes YAML manifests | Declarative job descriptions beat imperative scripts |
1.4 Lessons Borg Taught Google (and Kubernetes)
The 10 most important operational lessons from Borg
- Don't expose raw machine IDs to users. Borg users could see which machine a task ran on and sometimes hard-coded assumptions about it. This created fragility. Kubernetes abstracts node identity — workloads should be location-agnostic.
- Allocs (pods) are the right unit of co-location. Tasks that needed to communicate or share data should run as a group with shared resources. This became the Pod — the fundamental scheduling atom in K8s.
- Introspection is critical at scale. Sigma (Borg's dashboarding/debugging system) was the most heavily used internal tool. K8s invested in kubectl describe, events, and the metrics API from day one.
- The master is the bottleneck. Borgmaster's monolithic architecture limited scale. K8s addressed this with watch semantics (not poll), horizontal API server scaling, and etcd as a separate store.
- Users want higher-level abstractions. Raw Borg jobs were too low-level. Users built frameworks on top. K8s formalised this as Deployments, StatefulSets, DaemonSets — and made them extensible via CRDs.
- Configuration sprawl is a real problem. BCL (Borg Configuration Language) became too complex. K8s YAML, combined with Helm/Kustomize/CUE, gives structured templating without an embedded language.
- Resource reclamation matters enormously. Real utilisation is often 10–30% of requested. Borg reclaimed unused reservations. K8s exposes this via requests vs. limits, and VPA automatically right-sizes.
- Priority and preemption are necessary for mixed workloads. Running batch next to production requires a clear priority hierarchy and controlled preemption. K8s PriorityClass + preemption implement this.
- Treat infrastructure as code. Borg job files were versioned in source control at Google. K8s doubles down on this — GitOps (ArgoCD/Flux) is the standard operating model.
- Health checking must be cluster-native. External health monitors that SSH into machines do not scale. Liveness and readiness probes are executed by the kubelet, which runs on the same node as the workload.
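The requests-versus-limits lesson above is what drives Kubernetes' tiered eviction: each Pod is assigned a QoS class from its containers' CPU and memory settings. The sketch below is a simplified rendering of that rule, assuming plain dicts in place of real Pod specs. The function name `qos_class` is hypothetical, quantities are compared as literal strings, and init containers and other resource types are ignored.

```python
def qos_class(containers):
    """Classify a Pod from a list of container dicts.

    Each dict may carry optional 'requests' and 'limits' dicts keyed by
    'cpu' and 'memory'. Simplified: values are compared literally.
    """
    any_set = False
    guaranteed = True
    for container in containers:
        requests = container.get("requests", {})
        limits = container.get("limits", {})
        if requests or limits:
            any_set = True
        for resource in ("cpu", "memory"):
            limit = limits.get(resource)
            # An unset request defaults to the limit.
            request = requests.get(resource, limit)
            if limit is None or request != limit:
                guaranteed = False
    if not any_set:
        return "BestEffort"
    return "Guaranteed" if guaranteed else "Burstable"
```

Under node pressure, BestEffort Pods are evicted first and Guaranteed Pods last, which mirrors Borg's distinction between batch and production work.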
2 · Google Omega (2011–2013)
2.1 What Omega Was
While Borg continued evolving, a separate Google infrastructure team designed Omega as a research prototype to address Borg's architectural limitations. Omega was published academically in the 2013 EuroSys paper "Omega: flexible, scalable schedulers for large compute clusters".
2.2 Omega's Key Innovations
| Innovation | Description | K8s impact |
|---|---|---|
| Shared-state scheduling | All schedulers see the full cluster state via a shared, consistent in-memory store — no single scheduler bottleneck. Multiple schedulers run in parallel with optimistic concurrency (OCC). | K8s allows multiple schedulers via schedulerName. The scheduler framework plugin model comes from Omega's extensibility ideas. |
| Optimistic concurrency control | Schedulers speculatively assign tasks; a transaction commits only if no conflicts occurred. Conflicts trigger retry, not blocking. | K8s uses resourceVersion optimistic locking on API objects. Etcd's MVCC is the underlying mechanism. |
| No central scheduler lock | Borg's scheduler held a global lock. Omega's OCC model eliminated this, allowing 10× throughput. | K8s API server is horizontally scalable; controllers and schedulers run independently using watch + compare-and-swap. |
| Cell state visibility | All components can read the full cluster state — enables richer scheduling decisions (e.g., topology awareness). | K8s's watch API makes full cluster state available to any authorised client, enabling topology-aware scheduling, custom schedulers, and cluster-autoscaler. |
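The optimistic concurrency idea in the table can be illustrated with a toy versioned store. This is a sketch under stated assumptions, not etcd or the Kubernetes API server: `Store`, `Conflict`, and `update_with_retry` are hypothetical names, and an integer counter stands in for `resourceVersion`.

```python
class Conflict(Exception):
    """Raised when a write is based on a stale version."""


class Store:
    """Toy versioned key-value store (stand-in for an API server backed by etcd)."""

    def __init__(self):
        self._data = {}  # key -> (version, value)

    def get(self, key):
        # Missing keys read as version 0 with no value.
        return self._data.get(key, (0, None))

    def update(self, key, value, expected_version):
        current_version, _ = self.get(key)
        if current_version != expected_version:
            raise Conflict(f"{key}: expected {expected_version}, found {current_version}")
        self._data[key] = (current_version + 1, value)
        return current_version + 1


def update_with_retry(store, key, mutate, max_attempts=5):
    """Read, mutate, and write back; on conflict, re-read and try again."""
    for _ in range(max_attempts):
        version, value = store.get(key)
        try:
            return store.update(key, mutate(value), version)
        except Conflict:
            continue
    raise Conflict(f"{key}: gave up after {max_attempts} attempts")
```

No writer ever blocks another: a scheduler that loses the race simply re-reads the object and retries, which is the behaviour the Omega paper describes and the pattern Kubernetes clients follow on a version conflict.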
2.3 Architecture Comparison: Borg vs Omega vs Kubernetes
3 · Birth of Kubernetes (2013–2014)
3.1 The Origin Story
In mid-2013, three engineers who had worked on Borg and Omega at Google — Joe Beda, Brendan Burns, and Craig McLuckie — began a skunkworks project internally to build an open-source container orchestrator. Their insight was that Docker's container format was becoming an industry standard, but nobody had built the orchestration layer for it yet.
They were joined by Brian Grant (who led much of the API design), Tim Hockin (networking), and many others. The project was internally code-named "Project Seven" (a reference to Star Trek's Seven of Nine — a Borg character, fittingly).
3.2 Founding Design Principles
The founding team made several explicit architectural decisions that deliberately diverged from Borg:
| Decision | Borg approach | Kubernetes approach | Rationale |
|---|---|---|---|
| API surface | Internal RPC, not HTTP | Public REST API over HTTPS | Open ecosystem; any language can speak HTTP |
| State storage | Paxos-replicated in-memory in Borgmaster | External etcd (Raft) | Decouple storage from control plane; API servers are stateless |
| Job model | Single "Job" concept | Multiple resource types (Pod, RC, Service, etc.) | Different workloads need different management semantics |
| Networking | Custom Google fabric | CNI plugin interface | Cloud-agnostic; any network fabric can implement CNI |
| Storage | Google's Colossus (successor to GFS) | CSI plugin interface | Vendor-neutral; EBS, GCP PD, Ceph, NFS all work the same |
| Identity model | Internal Loas/GAIA | Pluggable auth (x509, OIDC, tokens) | Works with any existing enterprise identity system |
| Label system | Named "labels" in Borg, not queryable across jobs | First-class labels with selectors | Flexible grouping, decouples controllers from specific resource names |
| Extension model | Fork Borg (very hard) | CRDs + Webhooks + Aggregation API | Users can extend without forking the core |
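The label-and-selector decision in the table is easy to see in miniature. The sketch below assumes equality-based selectors only and uses made-up Pod names and labels; `matches` and `select` are hypothetical helpers, not Kubernetes client functions.

```python
def matches(selector, labels):
    """Equality-based selector: every key/value pair must be present in labels."""
    return all(labels.get(key) == value for key, value in selector.items())


def select(pods, selector):
    """Return the names of the Pods whose labels satisfy the selector."""
    return [pod["name"] for pod in pods if matches(selector, pod["labels"])]


# Hypothetical Pods for illustration.
PODS = [
    {"name": "web-1", "labels": {"app": "web", "track": "stable"}},
    {"name": "web-2", "labels": {"app": "web", "track": "canary"}},
    {"name": "db-1", "labels": {"app": "db"}},
]
```

A Service that selects `{"app": "web"}` keeps working as Pods come and go, because it depends on labels rather than on Pod names. Narrowing the selector to `{"app": "web", "track": "canary"}` picks out only the canary.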
3.3 First Public Commit and Launch
First public commit — Kubernetes v0.1
Joe Beda made the first public commit to GitHub on June 6, 2014. The initial codebase had ~14,000 lines of Go. The core concepts were already there: Pods, Replication Controllers, Services, and a REST API. Google announced the project at DockerCon 2014.
Microsoft, Red Hat, IBM, Docker join
Within one month of the announcement, major vendors committed to contributing. This was the first sign that K8s would become the industry standard rather than a Google-proprietary system.
v0.4 — Namespaces, resource quotas, persistent volumes
Early features that are still core today: Namespaces for multi-tenancy, ResourceQuotas to limit namespace consumption, and the first PersistentVolume implementation (GCE PD-backed).
4 · Kubernetes v1.0 and CNCF (2015)
Kubernetes v1.0 Released (stable)
Announced at OSCON 2015, v1.0 was declared production-ready. Along with the release, Google and the Linux Foundation announced the Cloud Native Computing Foundation (CNCF), with Kubernetes as its first seed project. This was a strategic move: by donating K8s to a neutral foundation, Google signalled that no single vendor would control it — which accelerated enterprise adoption.
KubeCon North America I — 500 attendees
The first KubeCon had 500 attendees. By 2023 it regularly draws 10,000+ in person and tens of thousands virtually. This growth arc mirrors the explosive adoption of K8s across the industry.
v1.1 – v1.3: Horizontal autoscaling, federation, node resource management
v1.1 added Horizontal Pod Autoscaler (HPA) and HTTP-based health checks. v1.2 added ConfigMaps, Ingress, DaemonSets, and rolling updates through the Deployment object (beta). v1.3 added init containers (alpha), PetSets (alpha), and cluster federation. FlexVolume, an exec-based out-of-tree volume plugin mechanism that preceded CSI, also dates from this period.
5 · Major Version Milestones (1.0 → 1.32)
Full milestone table — v1.0 through v1.32 with key features
| Version | Date | Key features / changes |
|---|---|---|
| v1.0 | Jul 2015 | GA release, CNCF donation. Pods, RCs, Services, Namespaces, basic auth. |
| v1.1 | Nov 2015 | HPA, HTTP probes, resource limits on containers. |
| v1.2 | Mar 2016 | Deployments (beta), ConfigMaps, Ingress (beta), DaemonSets. 1000-node clusters. |
| v1.3 | Jul 2016 | PetSets (alpha), init containers (alpha), flexVolume, Cluster Federation (alpha). 2000-node clusters. |
| v1.4 | Sep 2016 | kubeadm (alpha), ScheduledJobs (alpha, CronJob predecessor), PodDisruptionBudgets (alpha). |
| v1.5 | Dec 2016 | StatefulSets (beta, renamed from PetSets), kubefed (federation), Windows node support (alpha), CRI (alpha). |
| v1.6 | Mar 2017 | RBAC (beta), etcd v3 default, 5000-node clusters, dynamic volume provisioning (GA), PodPresets (alpha). |
| v1.7 | Jun 2017 | Network Policy GA, API aggregation layer (beta), CustomResourceDefinitions (beta). |
| v1.8 | Sep 2017 | RBAC GA, CronJob beta, Priority and Preemption (alpha), Volume snapshots (alpha). |
| v1.9 | Dec 2017 | Workloads API GA (Deployments, DaemonSets, ReplicaSets, StatefulSets stable under apps/v1). CSI (alpha), Windows support (beta). |
| v1.10 | Mar 2018 | External cloud provider support (cloud-controller-manager alpha). CSI (beta). Lease API. |
| v1.11 | Jun 2018 | IPVS kube-proxy (stable), CoreDNS GA. Dynamic kubelet config (beta). |
| v1.12 | Sep 2018 | RuntimeClass (alpha), TLS bootstrapping improvements. |
| v1.13 | Dec 2018 | kubeadm GA, CSI GA, CoreDNS as the default DNS server. A short release focused on stability. |
| v1.14 | Mar 2019 | Windows nodes (stable), PersistentLocalVolumes GA, kubectl plugin mechanism (stable). |
| v1.15 | Jun 2019 | CRD structural schemas, CustomResourceWebhookConversion, go module support. |
| v1.16 | Sep 2019 | CRD GA (v1), beta Deployments/DaemonSets/ReplicaSets APIs removed (extensions/v1beta1, apps/v1beta1, apps/v1beta2), endpoint slices (alpha). |
| v1.17 | Dec 2019 | Cloud provider labels GA, Volume snapshots (beta), CSI migration (alpha, moving in-tree to CSI). |
| v1.18 | Mar 2020 | Topology manager (beta), Server-side apply (beta), IngressClass resource, HPA v2 (beta). |
| v1.19 | Aug 2020 | Ingress GA, Immutable Secrets/ConfigMaps, EndpointSlices (beta), Storage capacity tracking (alpha). |
| v1.20 | Dec 2020 | Docker shim deprecation announced. Volume snapshots GA, API Priority and Fairness (beta), graceful node shutdown (alpha). |
| v1.21 | Apr 2021 | CronJob GA, Immutable Secrets GA, PodDisruptionBudget GA, EndpointSlices GA, IPv4/IPv6 dual-stack (beta), PodSecurityPolicy deprecated. |
| v1.22 | Aug 2021 | Server-side apply GA, memory manager (beta), removal of beta Ingress/RBAC/CRD APIs. |
| v1.23 | Dec 2021 | FlexVolume deprecated, HPA v2 GA, IPv4/IPv6 dual-stack GA, ephemeral containers (beta), PodSecurity admission (beta, replacing PSP). |
| v1.24 | May 2022 | Docker shim removed. gRPC probes (beta), OpenAPI v3 (beta), new beta APIs disabled by default. |
| v1.25 | Aug 2022 | PodSecurityPolicy removed (deprecated since 1.21). PodSecurity admission GA, ephemeral containers GA, cgroup v2 (stable). CSI migration complete for most in-tree providers. |
| v1.26 | Dec 2022 | Cross-namespace VolumeDataSource (alpha), CPUManager static policy improvements, ValidatingAdmissionPolicy (alpha). |
| v1.27 | Apr 2023 | SeccompDefault GA, In-place pod resource resize (alpha), node log access API (alpha). |
| v1.28 | Aug 2023 | Retroactive default StorageClass (GA), NodeSwap (beta), sidecar containers (alpha, KEP-753). |
| v1.29 | Dec 2023 | ReadWriteOncePod PV access mode (GA), KV audit log (beta), LoadBalancer IP mode (alpha). |
| v1.30 | Apr 2024 | Structured auth config (beta), ValidatingAdmissionPolicy GA, sidecar containers (beta). |
| v1.31 | Aug 2024 | AppArmor stable, persistent volume last phase transition time GA, nftables kube-proxy (beta). |
| v1.32 | Dec 2024 | Asynchronous preemption (alpha), mutating admission policies (alpha), DRA structured parameters (beta). |
6 · The Docker Shim Removal — A Case Study in API Evolution
One of the most misunderstood events in Kubernetes history is the removal of the Docker shim in v1.24. This section explains exactly what happened and why it was the right decision.
6.1 What Was the Docker Shim?
When Kubernetes introduced the Container Runtime Interface (CRI) in v1.5, Docker did not natively implement CRI. To continue supporting Docker, the Kubernetes team added a shim — a translation layer built into the kubelet that converted CRI calls into Docker API calls. This shim was maintained inside the kubelet source tree.
6.2 Removal Timeline
| Version | Action |
|---|---|
| v1.5 (Dec 2016) | CRI interface introduced; dockershim added as compatibility layer |
| v1.20 (Dec 2020) | dockershim deprecated; warning added to kubelet logs |
| v1.23 (Dec 2021) | Last release with dockershim in-tree |
| v1.24 (May 2022) | dockershim removed from kubelet. cri-dockerd external shim available for Docker users. |
7 · CNCF Ecosystem Growth
Kubernetes' donation to the CNCF in 2015 seeded an entire ecosystem of complementary projects. The CNCF now has 160+ projects spanning every layer of the cloud-native stack.
7.1 Key CNCF Projects by Layer
| Layer | Notable projects (CNCF maturity varies) | Role |
|---|---|---|
| Runtime | containerd, CRI-O | CRI-compliant container runtimes |
| Networking | CNI (spec), Cilium, Calico (not a CNCF project) | Pod networking, NetworkPolicy, eBPF |
| Storage | Rook, Longhorn | Cloud-native distributed storage |
| Service mesh | Istio (graduated 2023), Linkerd (graduated 2021) | mTLS, traffic management, observability |
| Monitoring | Prometheus, Thanos | Metrics collection, long-term storage |
| Tracing | OpenTelemetry, Jaeger | Distributed tracing, OTLP |
| Logging | Fluentd, Fluentbit | Log aggregation and routing |
| GitOps / CD | Argo CD (graduated), Flux (graduated) | Declarative continuous delivery |
| Package management | Helm | Kubernetes application packaging |
| Policy | OPA (Graduated) | Policy-as-code for admission, authorization |
| Security | Falco, TUF, in-toto | Runtime threat detection, supply chain |
| Cluster lifecycle | Cluster API | Declarative cluster provisioning |
| Registry | Harbor | Container image registry with scanning |
7.2 The Container Orchestrator Wars (2015–2017)
Kubernetes did not become the de-facto standard without competition. Three orchestrators competed from 2015 to 2017:
Docker Swarm, Apache Mesos, and Kubernetes — comparative analysis
| Dimension | Docker Swarm | Apache Mesos + Marathon | Kubernetes |
|---|---|---|---|
| Origin | Docker Inc., 2015 | UC Berkeley research project (2009), run at scale by Twitter, Airbnb, and Apple; Marathon from Mesosphere | Google, 2014 |
| Ease of setup | Extremely easy (built into Docker) | Complex multi-component stack | Moderate; kubeadm simplified this |
| Scheduling model | Simple, host-affinity only | Two-level (Mesos + Marathon). Very flexible for heterogeneous workloads. | Rich multi-constraint scheduler with plugin framework |
| API | Docker Compose-like YAML, Docker API | Marathon REST API, Mesos API | Open versioned REST API, CRDs |
| Networking | Docker overlay network | Custom; CNI support added later | Pluggable networking, standardised on CNI in early releases |
| Extensibility | Limited | High (frameworks), but complex | Very high (CRDs, webhooks, operators) |
| Ecosystem | Docker-centric | Datacenter-oriented, Hadoop/Spark focus | CNCF, vendor-neutral, cloud-native |
| Outcome | Lost momentum; Docker's enterprise business (including its Swarm-based products) was acquired by Mirantis in 2019 | Still used at scale in some enterprises; D2iQ (formerly Mesosphere) pivoted to Kubernetes | Industry standard as of 2018–2019 |
Kubernetes won for several reasons: rich API, strong community, CNCF neutrality, powerful extensibility (CRDs), and managed Kubernetes services from all major cloud providers (GKE in 2015, AKS and EKS generally available in 2018) that removed the operational complexity.
8 · Managed Kubernetes Services
The most significant accelerator for enterprise Kubernetes adoption was managed services — cloud providers abstracting away control plane management entirely.
| Service | Provider | GA date | Notable features |
|---|---|---|---|
| GKE (Google Kubernetes Engine) | Google Cloud | Aug 2015 | Autopilot mode, Workload Identity, GKE Sandbox (gVisor), integrated logging |
| AKS (Azure Kubernetes Service) | Microsoft Azure | Jun 2018 | Virtual nodes (ACI), Azure AD integration, confidential computing nodes |
| EKS (Elastic Kubernetes Service) | AWS | Jun 2018 | Fargate serverless nodes, IAM for ServiceAccounts, EKS Anywhere (bare metal) |
| DOKS (DigitalOcean Kubernetes) | DigitalOcean | May 2019 | Simplified cluster management for smaller teams |
| OKE (Oracle Container Engine) | Oracle Cloud | 2018 | Virtual nodes, free control plane |
| ROKS (Red Hat OpenShift on IBM Cloud) | IBM Cloud | 2019 | OpenShift layer (SCC, Routes, Operators) on top of Kubernetes |
8.1 On-Premises Distributions
| Distribution | Vendor | Key differentiator |
|---|---|---|
| OpenShift | Red Hat | Enterprise hardening, SCCs, Routes, built-in CI/CD (Tekton), Operator Framework |
| Rancher (RKE/RKE2) | SUSE | Multi-cluster management, simplified UI, Rancher Desktop for local dev |
| Tanzu Kubernetes Grid | VMware/Broadcom | vSphere integration, Carvel tooling, regulated industry focus |
| k3s | Rancher/SUSE | Lightweight (<100MB binary), SQLite or etcd, ARM support, edge/IoT |
| microk8s | Canonical | Single-binary snap, add-on ecosystem, Ubuntu-native |
| kind (Kubernetes IN Docker) | SIG Testing | Local testing clusters inside Docker containers; CI/CD pipeline standard |
| minikube | Community | Single-node local dev cluster, multi-driver (Docker, QEMU, VirtualBox) |
9 · Community Governance and SIGs
Kubernetes is governed by the CNCF Technical Oversight Committee (TOC) and the Kubernetes Steering Committee. The project is divided into Special Interest Groups (SIGs) and Working Groups (WGs), each owning a specific domain.
9.1 Key SIGs and Their Scope
| SIG | Domain | Key deliverables |
|---|---|---|
| SIG API Machinery | Core API server, CRDs, webhooks, client-go | API versioning, server-side apply, CRD validation, watch semantics |
| SIG Apps | Workload APIs | Deployments, StatefulSets, DaemonSets, Jobs, CronJobs |
| SIG Node | kubelet, CRI, cgroups, resource management | kubelet, resource manager, device plugins, topology manager |
| SIG Network | CNI, Services, Ingress, Gateway API, DNS, NetworkPolicy | kube-proxy, EndpointSlices, dual-stack, network policy spec |
| SIG Storage | CSI, PV/PVC lifecycle, volume plugins | CSI spec, dynamic provisioning, volume snapshots, volume health |
| SIG Scheduling | kube-scheduler, scheduler framework | Scheduling framework, topology-aware scheduling, descheduler |
| SIG Auth | RBAC, AuthN/AuthZ, Secrets, admission, certificates | RBAC, TokenRequest API, BoundServiceAccount tokens, audit |
| SIG Security | Pod security, supply chain, security policy | PodSecurity admission, security benchmarks, SLSA compliance |
| SIG Instrumentation | Metrics, logging, events, tracing | metrics-server, structured logging, OpenTelemetry integration |
| SIG Cluster Lifecycle | kubeadm, cluster bootstrap, upgrades | kubeadm, cluster-api, upgrade tooling |
| SIG Multicluster | Federation, multi-cluster services | MCS (Multi-Cluster Services) API, KubeFed |
| SIG Windows | Windows node support | Windows containerd, GMSA, HostProcess containers |
| WG Batch | Batch workloads at scale | JobSet, indexed jobs, pod failure policy, job backoff |
| WG Structured Logging | Contextual logging | Contextual logging, structured JSON output |
9.2 The KEP Process — How Features Get Added
All significant changes to Kubernetes go through a Kubernetes Enhancement Proposal (KEP). This is a design document that follows a lifecycle:
Provisional → Implementable → Implemented → Deferred / Withdrawn
Feature maturity stages within an implemented KEP:
Alpha (opt-in, hidden behind feature gate) — one release minimum
Beta (on by default, may have API changes) — two releases minimum
Stable (GA, cannot be removed without deprecation period)
Feature gates control alpha/beta features:
--feature-gates=NewFeature=true # Enable in kubelet/apiserver
# List the feature gates the API server reports, with 1 (enabled) or 0 (disabled); metric available since v1.26
kubectl get --raw /metrics | grep kubernetes_feature_enabled
# Or check apiserver flags:
ps aux | grep kube-apiserver | grep feature-gates
# Check feature gate defaults for your version
# https://kubernetes.io/docs/reference/command-line-tools-reference/feature-gates/
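The `--feature-gates` value shown above is a comma-separated list of `Name=bool` pairs. When auditing many nodes it can help to parse that string into a dictionary. The following is a minimal sketch, assuming the flag value has already been extracted from the process arguments; `parse_feature_gates` is a hypothetical helper and the gate names in the usage line are placeholders.

```python
def parse_feature_gates(flag_value):
    """Parse 'A=true,B=false' into {'A': True, 'B': False}.

    Raises ValueError on entries that are not Name=true or Name=false.
    """
    gates = {}
    for item in flag_value.split(","):
        item = item.strip()
        if not item:
            continue  # tolerate empty input and trailing commas
        name, sep, raw = item.partition("=")
        if not sep or raw.strip().lower() not in ("true", "false"):
            raise ValueError(f"malformed feature gate: {item!r}")
        gates[name.strip()] = raw.strip().lower() == "true"
    return gates
```

For example, `parse_feature_gates("NewFeature=true,OldFeature=false")` yields a mapping that can be diffed between control-plane nodes to spot inconsistent settings.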
10 · API Deprecation and Removal Policy
One of the most operationally important aspects of Kubernetes history is its API deprecation policy. Understanding this prevents surprise breakage during upgrades.
| API maturity | Deprecation notice required | Minimum support period after deprecation |
|---|---|---|
| GA (v1, v2…) | Yes, via release notes | 12 months OR 3 releases (whichever is longer) |
| Beta (v1beta1, v1beta2…) | Yes | 9 months OR 3 releases (whichever is longer) |
| Alpha (v1alpha1…) | No guarantee; may disappear in any release | None — always opt-in, never enabled by default |
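The "whichever is longer" rule in the table turns into simple arithmetic once a release cadence is assumed. The sketch below is illustrative only: the four-month default cadence is an assumption (Kubernetes has shipped roughly three releases a year since 2021), and both function names are hypothetical.

```python
import math

# (minimum months, minimum releases) after the deprecation announcement,
# taken from the table above.
MIN_SUPPORT = {"ga": (12, 3), "beta": (9, 3), "alpha": (0, 0)}


def releases_supported_after_deprecation(maturity, months_per_release=4):
    """Number of releases an API must keep working after it is deprecated."""
    months, releases = MIN_SUPPORT[maturity]
    return max(releases, math.ceil(months / months_per_release))


def earliest_removal_minor(deprecated_in_minor, maturity, months_per_release=4):
    """Earliest minor version in which the API may stop being served."""
    return deprecated_in_minor + releases_supported_after_deprecation(
        maturity, months_per_release
    )
```

With these assumptions, a beta API deprecated in v1.19 could be removed in v1.22 at the earliest, which matches the networking.k8s.io/v1beta1 Ingress timeline. At the older three-month cadence, a GA API would need four releases to cover twelve months.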
- v1.16: extensions/v1beta1 Deployments, ReplicaSets, DaemonSets removed. Replacement: apps/v1.
- v1.22: networking.k8s.io/v1beta1 Ingress removed. Replacement: networking.k8s.io/v1.
- v1.25: PodSecurityPolicy removed. Replacement: PodSecurity admission controller.
- v1.25: batch/v1beta1 CronJob removed. Replacement: batch/v1.
Use kubectl convert (now shipped as the separate kubectl-convert plugin) to migrate manifests, and review deprecated API usage before upgrading.
# Detect deprecated API usage in your cluster before upgrading
# Option 1: pluto (recommended tool)
pluto detect-all-in-cluster --target-versions k8s=v1.26
# Option 2: built-in API server metric that records requests to deprecated APIs
kubectl get --raw /metrics | grep apiserver_requested_deprecated_apis
# Option 3: kubent (kube no-trouble)
kubent
11 · Production Implications of Kubernetes History
Understanding this history directly impacts how you run Kubernetes in production:
7 production lessons from K8s architectural history
- The declarative API is non-negotiable. The entire ecosystem (GitOps, operators, autoscalers) depends on it. Avoid imperative kubectl run or kubectl create in production; always use kubectl apply -f with versioned manifests.
- etcd is the source of truth. The Borg lesson: store state outside the control plane components. Back up etcd. Treat it as your most critical service, because it IS your cluster. Without etcd, you cannot recover the cluster state.
- Controllers are eventually consistent. Like Borg's reconciliation loops, Kubernetes controllers do not guarantee immediate consistency. A kubectl apply returns success when the API server accepts the write, not when the change has fully propagated. Design your tooling around watch + status checking, not timing assumptions.
- The watch API scales better than polling. Both Borg and Omega learned that polling creates thundering herds. K8s controllers use informers (List+Watch with local cache). Your own operators should do the same: use controller-runtime or client-go informers.
- Upgrades require version skew awareness. K8s's version skew policy is strict for good reasons: kubelets may lag the API server by only a few minor versions (two before v1.28, three since), and the Borg team learned that running mixed versions creates impossible-to-debug race conditions. Always upgrade the control plane first, then nodes, and never skip minor versions.
- Labels and selectors decouple services from implementation. Borg's tight coupling between job name and service discovery was a maintenance burden. K8s label selectors mean you can replace all Pods in a Deployment without changing any Service configuration — blue/green is just a label swap.
- Extension points exist for a reason — use them. CRDs, admission webhooks, and scheduler plugins are the designed extension points. Adding features by forking K8s (as some operators tried early on) creates an unmaintainable upgrade nightmare. The operator pattern solved this.
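The eventual-consistency lesson above comes from the way controllers work: each pass compares desired state with observed state and emits only the actions needed to close the gap. The sketch below is a toy version of that loop for a replica count, assuming Pods are plain dicts with `name` and `phase` keys. `reconcile` is a hypothetical function, not controller-runtime's API.

```python
def reconcile(desired_replicas, observed_pods):
    """Return the actions that move observed state toward desired state.

    Only Pods in phase 'Running' count as healthy replicas. The result is
    a list of ('create', None) or ('delete', pod_name) tuples.
    """
    running = [pod for pod in observed_pods if pod["phase"] == "Running"]
    diff = desired_replicas - len(running)
    if diff > 0:
        return [("create", None)] * diff
    if diff < 0:
        # Too many replicas: remove the surplus, oldest-listed first.
        return [("delete", pod["name"]) for pod in running[:-diff]]
    return []
```

Because the function depends only on the current snapshot, it can be re-run after every watch event or missed update and still converge. That is why tooling should wait on status rather than assume a write has taken effect.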
Next Files to Study
Dependency Graph — recommended reading order from this file
- 00-foundations/02-container-orchestration.html — Linux primitives (cgroups, namespaces) that make containers possible
- 00-foundations/03-cluster-architecture-overview.html — Full component interaction diagram and HA topology
- 00-foundations/04-kubernetes-api-model.html — API versioning, watch semantics, etcd key encoding
- 01-control-plane/02-etcd.html — Raft consensus, etcd internals, backup/restore
- 09-production-operations/02-cluster-upgrades.html — Safe upgrade procedures informed by version history
References
- Large-scale cluster management at Google with Borg — Verma et al., EuroSys 2015. The foundational paper.
- Omega: flexible, scalable schedulers for large compute clusters — Schwarzkopf et al., EuroSys 2013.
- Borg, Omega, and Kubernetes — Burns, Grant, Oppenheimer, Brewer, Wilkes. ACM Queue, 2016. The best single overview of the lineage.
- Kubernetes GitHub — github.com/kubernetes/kubernetes — CHANGELOG.md for per-version history
- Kubernetes Enhancement Proposals — github.com/kubernetes/enhancements
- CNCF Landscape — landscape.cncf.io
- Kubernetes release notes — kubernetes.io/docs/setup/release/notes
- The Kubernetes Book — Nigel Poulton (updated annually)