Introduction to Kubernetes
Kubernetes (often abbreviated as K8s) is a production-grade, open-source platform for automating deployment, scaling, and management of containerised applications. It provides a uniform, declarative API surface that abstracts away the underlying infrastructure so engineers can focus on expressing what they want rather than how to achieve it.
1 · What is Kubernetes?
At its core, Kubernetes is an operating system for distributed systems. Just as a single-machine OS (Linux, Windows) schedules processes, manages memory, handles I/O, and provides system-call abstractions, Kubernetes performs the equivalent tasks for containerised workloads across a fleet of machines.
| Single-Machine OS analogy | Kubernetes equivalent |
|---|---|
| Process (PID) | Container (inside a Pod) |
| Process group | Pod |
| Process supervisor (systemd/launchd) | kubelet + controllers |
| Scheduler (CFS) | kube-scheduler |
| File system mount | PersistentVolume + CSI driver |
| Network socket | Service + kube-proxy |
| System config (/etc) | ConfigMap / Secret |
| Kernel | kube-apiserver + etcd |
| cron | CronJob |
| Daemon (/etc/init.d) | DaemonSet |
Unlike a traditional OS, Kubernetes is cluster-wide: the unit of resource is a node (physical or virtual machine), and workloads are transparently distributed across all nodes by the scheduler.
Official definition (project README)
"Kubernetes is an open-source system for automating deployment, scaling, and management of containerized applications. It groups containers that make up an application into logical units for easy management and discovery."
Broader production definition
For platform engineers, Kubernetes is also:
- A declarative configuration API – you describe desired state in YAML/JSON; the system converges reality toward that state.
- A self-healing runtime – components watch for drift and correct it automatically, without human intervention.
- A control plane for your entire infrastructure – networking, storage, compute, certificates, and secrets are all first-class API resources.
- An extensibility framework – CRDs, webhooks, and the aggregation layer let you extend the API to model any domain concept.
- A standard layer – the same manifests run on bare metal in your own data centre, AWS, GCP, Azure, Raspberry Pi, or a laptop.
2 · Why Kubernetes Exists
To understand why Kubernetes exists, you must understand the problems that preceded it. The evolution goes through three distinct eras:
Era 1 — Bare Metal (pre-2000s)
- One application per physical server
- Massive under-utilisation (5–15% CPU avg)
- Long provisioning cycles (weeks)
- No isolation between services
- Manual patching and deployment
Era 2 — Virtual Machines (2000s–2013)
- Better utilisation via hypervisors
- Still slow (minutes to boot)
- Image drift between environments
- "Works on my VM" problems
- Config management sprawl (Chef/Puppet)
Era 3 — Containers (2013–present)
- Sub-second startup via cgroups/namespaces (no guest OS to boot)
- Immutable images = reproducibility
- High density (100s per node)
- New problem: who orchestrates them?
- Kubernetes provides the orchestration layer
When Docker popularised containers in 2013, organisations quickly ran into the "container sprawl" problem: running hundreds of containers by hand across dozens of hosts is operationally unmanageable. You need:
- Automatic placement of containers onto healthy nodes
- Restart of containers that crash
- Scaling up or down based on demand
- Rolling updates with zero downtime
- Service discovery – how does container A find container B?
- Load balancing across replicas
- Secret and configuration management
- Resource accounting and fair-share scheduling
- Storage lifecycle management
- Health checking and eviction
Google had already tackled this problem internally with Borg (2003) and later Omega. Kubernetes (2014) is their open-source successor. It was redesigned for community development and incorporates a decade of Google's production experience. See 01 · History for the full timeline.
3 · Problems Kubernetes Solves
3.1 Scheduling and Placement
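Manually deciding which server runs which container is error-prone and doesn't scale. For each unscheduled Pod, the kube-scheduler works in two steps:

- It filters out nodes that cannot run the Pod. Reasons include insufficient CPU or memory, untolerated taints, and unmet affinity rules or policies.
- It scores the remaining nodes, for example on resource balance and hardware topology, and binds the Pod to the highest-scoring one.

This happens automatically, typically within milliseconds per Pod at moderate scale.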
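The filter-then-score flow can be sketched as a toy Go program. Everything here is an assumption made for illustration:

- the Node and Pod structs,
- a single filter that checks resource fit and taints,
- a "most CPU left over" scoring rule.

The real kube-scheduler runs many configurable filter and score plugins and weights their results.

```go
package main

import "fmt"

// Toy stand-ins for Node and Pod; real scheduling looks at far more fields.
type Node struct {
	Name         string
	FreeCPUMilli int64 // allocatable CPU not yet requested, in millicores
	FreeMemMiB   int64
	Tainted      bool
}

type Pod struct {
	Name           string
	CPUMilli       int64 // CPU request
	MemMiB         int64 // memory request
	ToleratesTaint bool
}

// feasible is the "filter" phase: reject nodes that cannot run the Pod at all.
func feasible(p Pod, n Node) bool {
	if n.Tainted && !p.ToleratesTaint {
		return false
	}
	return n.FreeCPUMilli >= p.CPUMilli && n.FreeMemMiB >= p.MemMiB
}

// score is the "score" phase. Here: prefer the node with the most CPU left
// after placement, a spreading heuristic chosen purely for illustration.
func score(p Pod, n Node) int64 {
	return n.FreeCPUMilli - p.CPUMilli
}

// schedule returns the best-scoring feasible node, or "" if none fits
// (in a real cluster the Pod would stay Pending).
func schedule(p Pod, nodes []Node) string {
	best, bestScore := "", int64(-1)
	for _, n := range nodes {
		if !feasible(p, n) {
			continue
		}
		if s := score(p, n); s > bestScore {
			best, bestScore = n.Name, s
		}
	}
	return best
}

func main() {
	nodes := []Node{
		{Name: "node-a", FreeCPUMilli: 500, FreeMemMiB: 1024},
		{Name: "node-b", FreeCPUMilli: 2000, FreeMemMiB: 512},
		{Name: "node-c", FreeCPUMilli: 4000, FreeMemMiB: 4096, Tainted: true},
	}
	pod := Pod{Name: "web-0", CPUMilli: 250, MemMiB: 256}
	fmt.Println(schedule(pod, nodes)) // node-b (node-c is filtered out by its taint)
}
```

You can swap the score function for one that prefers the fullest node. The same loop then does bin-packing instead of spreading, which is the kind of trade-off the real scoring plugins let you configure.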
3.2 Self-Healing
Kubernetes components run reconciliation loops. They compare desired state (stored in etcd) with observed state (reported by the kubelet) and take corrective action without paging anyone:

- The kubelet restarts crashed containers.
- Controllers create replacement Pods for those lost on failed nodes or evicted under resource pressure.
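Reconciliation loop sketch (runnable toy model)
```go
// Simplified reconciler pattern, modelled on the ReplicaSet controller (the
// controller that actually creates Pods). Toy maps stand in for the API server.
package main
import (
	"context"
	"fmt"
)

type Request struct{ Name string }
type Result struct{ Requeue bool }
type ReplicaSetReconciler struct {
	Desired map[string]int      // desired state: spec.replicas per ReplicaSet
	Pods    map[string][]string // observed state: running Pods per ReplicaSet
}
func (r *ReplicaSetReconciler) Reconcile(ctx context.Context, req Request) (Result, error) {
	replicas, ok := r.Desired[req.Name] // 1. Fetch desired state
	if !ok {
		return Result{}, nil // object was deleted: nothing to converge
	}
	current := r.Pods[req.Name]        // 2. Fetch current state
	missing := replicas - len(current) // 3. Diff
	for i := 0; i < missing; i++ {     // 4. Act: create missing Pods...
		current = append(current, fmt.Sprintf("%s-%d", req.Name, len(current)))
	}
	if missing < 0 { // ...or delete surplus ones
		current = current[:replicas]
	}
	r.Pods[req.Name] = current // 5. Update status (observed state)
	return Result{}, nil
}

func main() {
	r := &ReplicaSetReconciler{Desired: map[string]int{"web": 3}, Pods: map[string][]string{}}
	_, _ = r.Reconcile(context.Background(), Request{Name: "web"})
	fmt.Println(r.Pods["web"]) // [web-0 web-1 web-2]
}
```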
3.3 Horizontal Scaling
The HorizontalPodAutoscaler reads CPU, memory, or custom metrics through the metrics APIs. These are served by metrics-server or by an adapter such as Prometheus Adapter. The HPA then automatically adjusts the replicas field of a Deployment, StatefulSet, or other scalable resource. Cluster Autoscaler adds nodes when Pods cannot be scheduled and removes nodes that stay underused.
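The Kubernetes documentation describes the HPA's core calculation as desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue). Below is a minimal Go sketch of that formula. The function name, the explicit tolerance parameter, and the clamping step are simplifications for illustration, not the controller's actual code.

```go
package main

import (
	"fmt"
	"math"
)

// desiredReplicas sketches the core HPA calculation:
//
//	desired = ceil(current * currentMetric / targetMetric)
//
// No change is made while the ratio is within tolerance of 1.0 (the
// controller's default tolerance is 0.1), and the result is clamped to
// [minReplicas, maxReplicas]. The real controller also handles missing
// metrics, not-yet-ready Pods, and scale-down stabilisation windows.
func desiredReplicas(current int, currentMetric, targetMetric, tolerance float64, minReplicas, maxReplicas int) int {
	ratio := currentMetric / targetMetric
	desired := current
	if math.Abs(ratio-1.0) > tolerance {
		desired = int(math.Ceil(float64(current) * ratio))
	}
	if desired < minReplicas {
		desired = minReplicas
	}
	if desired > maxReplicas {
		desired = maxReplicas
	}
	return desired
}

func main() {
	// 4 replicas averaging 90% CPU against a 50% target: ceil(4 * 1.8) = 8.
	fmt.Println(desiredReplicas(4, 90, 50, 0.1, 1, 10)) // 8
}
```

The tolerance band keeps the autoscaler from flapping when the metric hovers near its target. For example, 52% against a 50% target leaves 4 replicas unchanged.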
3.4 Rolling Updates and Rollbacks
A Deployment performs a rolling update through a new ReplicaSet. It starts Pods with the new image, waits for them to become Ready, and only then terminates old Pods. With readiness probes configured, the update causes no downtime.
A single kubectl rollout undo reverts to the previous ReplicaSet revision.
```shell
# Trigger a rolling update
kubectl set image deployment/web app=myimage:v2

# Watch rollout progress
kubectl rollout status deployment/web

# Undo if something is wrong
kubectl rollout undo deployment/web

# Rollout history
kubectl rollout history deployment/web
```
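A Deployment's maxSurge and maxUnavailable fields (25% each by default) bound how far a rolling update may deviate from the desired replica count. The sketch below shows the documented arithmetic: maxSurge percentages round up and maxUnavailable percentages round down. The helper is hypothetical, not controller code.

```go
package main

import (
	"fmt"
	"math"
)

// rolloutBounds converts percentage-based maxSurge/maxUnavailable into Pod
// counts: maxSurge rounds up, maxUnavailable rounds down. Edge cases handled
// by the real Deployment controller are ignored in this sketch.
func rolloutBounds(replicas int, surgePct, unavailablePct float64) (maxPods, minAvailable int) {
	surge := int(math.Ceil(float64(replicas) * surgePct / 100))
	unavailable := int(math.Floor(float64(replicas) * unavailablePct / 100))
	return replicas + surge, replicas - unavailable
}

func main() {
	maxPods, minAvail := rolloutBounds(3, 25, 25)
	// With 3 replicas, at most 4 Pods may exist and at least 3 must stay
	// available, so one new Pod must become Ready before an old one is removed.
	fmt.Println(maxPods, minAvail) // 4 3
}
```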
3.5 Service Discovery and Load Balancing
A Service object gives a stable virtual IP (ClusterIP) and DNS name to a dynamic, label-selected set of Pods. kube-proxy programs iptables or IPVS rules (nftables on newer versions) so that traffic to the ClusterIP is load-balanced across all ready Pod endpoints. The Pods themselves can be rescheduled and get new IPs; the Service absorbs that churn transparently.
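In iptables mode, kube-proxy spreads new connections across endpoints with a chain of rules that each match at random, using the iptables statistic module. The toy Go sketch below reproduces the probability arithmetic under that assumption. It does not generate real iptables rules, and the IPVS and nftables modes use different mechanisms.

```go
package main

import "fmt"

// iptablesProbabilities returns the match probability of each rule in a
// kube-proxy-style chain with n endpoints. Rules are tried in order; rule i
// matches with probability 1/(n-i), so the last rule always matches.
func iptablesProbabilities(n int) []float64 {
	p := make([]float64, n)
	for i := 0; i < n; i++ {
		p[i] = 1.0 / float64(n-i)
	}
	return p
}

// effectiveShare computes each endpoint's overall selection chance: the
// chance that every earlier rule missed, times this rule's match probability.
func effectiveShare(p []float64) []float64 {
	share := make([]float64, len(p))
	missedSoFar := 1.0
	for i, prob := range p {
		share[i] = missedSoFar * prob
		missedSoFar *= 1 - prob
	}
	return share
}

func main() {
	p := iptablesProbabilities(3)
	fmt.Printf("%.4f\n", p)                 // [0.3333 0.5000 1.0000]
	fmt.Printf("%.4f\n", effectiveShare(p)) // [0.3333 0.3333 0.3333]
}
```

The increasing per-rule probabilities compensate for the traffic already claimed by earlier rules, which is why every endpoint ends up with an equal 1/n share.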
3.6 Configuration and Secret Management
ConfigMaps store non-sensitive configuration (environment variables, config files). Secrets store sensitive data (passwords, TLS certificates, tokens) and are exposed to Pods as environment variables or mounted files. By default, Secret values in etcd are only base64-encoded. They are encrypted at rest only if the API server is configured with an encryption provider. External secret stores (Vault, AWS Secrets Manager) can integrate via the Secrets Store CSI Driver or External Secrets Operator.
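A Secret's data field holds base64-encoded values, and that encoding is trivially reversible. This short Go sketch uses only the standard library's encoding/base64 package to show that no key is involved. The helper names are illustrative.

```go
package main

import (
	"encoding/base64"
	"fmt"
)

// encodeSecretValue produces the form a value takes under a Secret's `data:`
// field. This is an encoding, not encryption.
func encodeSecretValue(plain string) string {
	return base64.StdEncoding.EncodeToString([]byte(plain))
}

// decodeSecretValue reverses the encoding; no key or password is required.
func decodeSecretValue(encoded string) (string, error) {
	b, err := base64.StdEncoding.DecodeString(encoded)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	enc := encodeSecretValue("admin")
	dec, err := decodeSecretValue(enc)
	if err != nil {
		panic(err)
	}
	fmt.Println(enc, dec) // YWRtaW4= admin
}
```

Anyone who can read the Secret object, or raw etcd data without encryption at rest, can recover the value this way. That is why RBAC on Secrets and encryption at rest matter.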
3.7 Storage Lifecycle
The CSI (Container Storage Interface) plugin model decouples storage drivers from the Kubernetes release cycle. PersistentVolume and PersistentVolumeClaim objects model the storage lifecycle: claim, bind, mount, unmount, release, and (optionally) reclaim. StorageClasses enable dynamic provisioning: a PVC that names a StorageClass gets a volume created on demand, without a pre-provisioned PV.
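Static binding matches a PVC to an existing PV. The Go sketch below models that matching under these simplifying assumptions:

- hypothetical PV and PVC structs,
- matching on storage class, access mode, and capacity only,
- a preference for the smallest PV that fits, which is our reading of how the binder avoids wasting large volumes.

Real binding also considers label selectors, volume mode, and node affinity.

```go
package main

import "fmt"

// Toy stand-ins for the real PersistentVolume / PersistentVolumeClaim specs.
type PV struct {
	Name         string
	StorageClass string
	CapacityGiB  int
	AccessModes  []string
	Bound        bool
}

type PVC struct {
	StorageClass string
	RequestGiB   int
	AccessMode   string
}

func hasMode(modes []string, want string) bool {
	for _, m := range modes {
		if m == want {
			return true
		}
	}
	return false
}

// bestMatch returns the smallest unbound PV that satisfies the claim, or ""
// if none does (dynamic provisioning would then apply, if the claim's
// StorageClass supports it).
func bestMatch(claim PVC, pvs []PV) string {
	best, bestSize := "", -1
	for _, pv := range pvs {
		if pv.Bound || pv.StorageClass != claim.StorageClass ||
			pv.CapacityGiB < claim.RequestGiB || !hasMode(pv.AccessModes, claim.AccessMode) {
			continue
		}
		if bestSize == -1 || pv.CapacityGiB < bestSize {
			best, bestSize = pv.Name, pv.CapacityGiB
		}
	}
	return best
}

func main() {
	pvs := []PV{
		{Name: "pv-100g", StorageClass: "fast", CapacityGiB: 100, AccessModes: []string{"ReadWriteOnce"}},
		{Name: "pv-20g", StorageClass: "fast", CapacityGiB: 20, AccessModes: []string{"ReadWriteOnce"}},
		{Name: "pv-10g", StorageClass: "fast", CapacityGiB: 10, AccessModes: []string{"ReadWriteOnce"}},
	}
	claim := PVC{StorageClass: "fast", RequestGiB: 15, AccessMode: "ReadWriteOnce"}
	fmt.Println(bestMatch(claim, pvs)) // pv-20g
}
```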
3.8 Environment Parity
The same container image and manifest that runs on a developer's laptop (kind, minikube, k3s) behaves the same way in production. Because of the CRI, CNI, and CSI plugin model, only the plugin implementations differ between environments; the API is identical everywhere.
4 · Traditional Systems vs Kubernetes
| Dimension | Traditional (VMs + scripts) | Kubernetes |
|---|---|---|
| Deployment unit | RPM/deb package, systemd unit, VM image | OCI container image + YAML manifest |
| Configuration model | Imperative (bash/Ansible/Chef) | Declarative (YAML, controllers reconcile) |
| Scheduling | Manual or simplistic (round-robin) | Multi-constraint bin-packing scheduler |
| Self-healing | Nagios alert → on-call → SSH → restart | kubelet detects crash → container restarted in seconds |
| Scaling | Manual + pre-provisioned capacity | HPA + Cluster Autoscaler (Pods in seconds, nodes in minutes) |
| Service discovery | Static DNS / load balancer config | CoreDNS auto-registers every Service |
| Rolling updates | Complex scripts, long maintenance windows | Built-in; zero-downtime when readiness probes are configured |
| Secret management | Config files on disk, SSH key sharing | Secrets API, RBAC-controlled, optional encryption |
| Multi-tenancy | Separate physical/VM clusters | Namespaces + RBAC + Network Policies + quotas |
| Observability | Fragmented (Nagios, Splunk, bespoke) | Standardised (metrics-server, Prometheus, OpenTelemetry) |
| Audit trail | Sparse, per-machine | Structured audit log from API server for every mutation |
| Time-to-deploy | Hours (VM) / minutes (script) | Seconds (Pod scheduling + container pull) |
| Infrastructure portability | Cloud-specific, vendor locked | Same manifests on any conformant cluster |
| Extension model | Bespoke automation, ad-hoc tooling | CRDs + Operators + Admission Webhooks |
5 · The Kubernetes Mental Model
Before diving into internals, you need the right mental model. There are three key ideas that unlock everything else:
5.1 Desired State vs. Observed State
Every Kubernetes resource object has two logical halves:
- spec: what you want (desired state). You write this.
- status: what the cluster has (observed state). The system writes this.
Controllers are the agents that continuously work to make status match spec.
This is the control loop, and it is the fundamental pattern behind every Kubernetes component.
5.2 Everything is an API Object
Kubernetes exposes a RESTful API. Every resource (Pod, Node, Service, ConfigMap, CRD instance) is an object with an API group and version, a kind, and metadata (name, namespace, labels, annotations). Most kinds also have a spec and a status. A few, such as ConfigMap and Secret, hold data instead.
```yaml
# Anatomy of a Kubernetes resource object
apiVersion: apps/v1        # <group>/<version>; the "core" group uses just "v1"
kind: Deployment           # Object type
metadata:
  name: web                # Unique name within the namespace
  namespace: production    # Logical partition
  labels:                  # Key/value pairs used by selectors
    app: web
    version: v3
  annotations:             # Non-identifying metadata (arbitrary data)
    deployment.kubernetes.io/revision: "3"
spec:                      # DESIRED STATE: you write this
  replicas: 3
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
        - name: app
          image: myapp:v3
          resources:
            requests:
              cpu: "250m"
              memory: "128Mi"
            limits:
              cpu: "500m"
              memory: "256Mi"
status:                    # OBSERVED STATE: the system writes this
  replicas: 3
  readyReplicas: 3
  availableReplicas: 3
  conditions:
    - type: Available
      status: "True"
```
5.3 Watch, Not Poll
The API server supports long-lived watch requests. These are streamed over HTTP (chunked transfer encoding on HTTP/1.1, or HTTP/2 streams), and WebSocket is also supported. Controllers and the kubelet use watches, usually through client-go informers, to receive change events as they happen instead of polling. kubectl get --watch uses the same mechanism. This is what makes Kubernetes responsive: a controller can react to a state change within milliseconds rather than waiting for the next poll interval.
```shell
# Watch all Pod events in real time (the same mechanism controllers use internally)
kubectl get pods --watch

# Watch from a specific resourceVersion via the raw API
# (quote the URL so the shell does not treat "&" as a background operator)
curl -k "https://<apiserver>/api/v1/pods?watch=1&resourceVersion=12345" \
  -H "Authorization: Bearer <token>"
```
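An informer is essentially a watch consumer that keeps a local cache. The following Go sketch shows the idea, with a channel of toy events standing in for the API server's watch stream. The Event type is hypothetical. resourceVersion is simplified to an integer here; real resourceVersions are opaque strings that clients should not compare numerically.

```go
package main

import "fmt"

// EventType mirrors the watch event types the API server streams.
type EventType string

const (
	Added    EventType = "ADDED"
	Modified EventType = "MODIFIED"
	Deleted  EventType = "DELETED"
)

// Event is a toy watch event carrying one status field for illustration.
type Event struct {
	Type            EventType
	Name            string // object key, e.g. "namespace/name"
	Phase           string
	ResourceVersion int
}

// applyEvents consumes a watch stream and maintains a local cache, the core
// idea behind client-go informers. It also records the last resourceVersion
// seen, which a real client would use to resume the watch after a disconnect.
func applyEvents(events <-chan Event) (map[string]string, int) {
	cache := map[string]string{}
	lastRV := 0
	for ev := range events {
		switch ev.Type {
		case Added, Modified:
			cache[ev.Name] = ev.Phase
		case Deleted:
			delete(cache, ev.Name)
		}
		lastRV = ev.ResourceVersion
	}
	return cache, lastRV
}

func main() {
	events := make(chan Event, 4)
	events <- Event{Added, "default/web-0", "Pending", 101}
	events <- Event{Modified, "default/web-0", "Running", 102}
	events <- Event{Added, "default/web-1", "Pending", 103}
	events <- Event{Deleted, "default/web-1", "", 104}
	close(events)

	cache, rv := applyEvents(events)
	fmt.Println(cache, rv) // map[default/web-0:Running] 104
}
```

Reads then hit the local cache instead of the API server, which is how many controllers can watch the same objects without overloading it.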
6 · Cluster Architecture Overview
A Kubernetes cluster consists of two logical planes:
- Control Plane — the brain: API server, etcd, scheduler, controller manager, cloud-controller-manager.
- Data Plane (worker nodes) — the muscle: kubelet, kube-proxy, container runtime, CNI plugin.
6.1 Control Plane Components
The control plane is typically run on dedicated nodes (control plane nodes, formerly called masters). In a production HA cluster you have 3 or 5 control plane nodes.
| Component | Role | Port(s) |
|---|---|---|
| kube-apiserver | Stateless front end for the whole cluster. Authenticates, validates, persists, and serves all API objects, and is the only component that talks to etcd directly. | 6443 (HTTPS) |
| etcd | Distributed key-value store using the Raft consensus algorithm. Stores all cluster state; the only persistent component. | 2379 (client), 2380 (peer) |
| kube-scheduler | Watches for unscheduled Pods and assigns them to nodes using a pipeline of filter and score plugins. | 10259 (HTTPS; health and metrics) |
| kube-controller-manager | Runs 30+ control loops (ReplicaSet, Node, Endpoints, Namespace, Job, etc.). Each loop reconciles one resource type. | 10257 (HTTPS; health and metrics) |
| cloud-controller-manager | Cloud-provider-specific controllers: Service (load balancers), Node (cloud metadata), Route. Decouples cloud logic from the core. | 10258 (HTTPS; health and metrics) |
6.2 Worker Node Components
| Component | Role | Port(s) |
|---|---|---|
| kubelet | Primary node agent. Watches the API server for Pods assigned to its node, creates and destroys containers via the CRI, and reports node and Pod status. | 10250 (HTTPS) |
| kube-proxy | Programs host networking rules (iptables, IPVS, or nftables) to implement the Service virtual IP abstraction. Some CNI plugins, such as Cilium, can replace it with eBPF. | 10256 (healthz) |
| Container Runtime (CRI) | Runs containers. containerd and CRI-O are standard; the dockershim was removed in 1.24. | Unix socket (gRPC) |
| CNI plugin | Programs Pod networking: assigns IPs, creates veth pairs, sets up routes. Calico, Cilium, Flannel, Weave, etc. | n/a |
7 · The Declarative Model in Depth
The declarative model is Kubernetes' most important design decision. Contrast it with the imperative alternative:
Imperative vs Declarative — side-by-side example
```shell
## IMPERATIVE (old way: scripting)
# Create a container
docker run -d --name web -p 80:8080 --restart=always myapp:v1

# Scale it (manual loop)
for i in 2 3; do
  docker run -d --name web-$i -p 808$i:8080 myapp:v1
done

# Update it (manual, risky)
docker stop web; docker rm web
docker run -d --name web -p 80:8080 myapp:v2

# Problems:
# - State is in your head / scripts
# - Not idempotent
# - No audit trail
# - Does not self-heal
```

```yaml
## DECLARATIVE (Kubernetes way)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
spec:
  replicas: 3
  selector:
    matchLabels: {app: web}
  template:
    metadata:
      labels: {app: web}
    spec:
      containers:
        - name: app
          image: myapp:v1
          ports:
            - containerPort: 8080
# Benefits:
# - Apply is idempotent (kubectl apply is safe to run repeatedly)
# - State is in version control (GitOps)
# - Any drift is auto-corrected by controllers
# - Rolling update = change the image to myapp:v2 and reapply
```
When you run kubectl apply -f deployment.yaml, here is what happens internally:
- kubectl serialises the object to JSON and sends it to the API server: a POST for a new object, or a PATCH for an existing one.
- The API server authenticates and authorises the request. It then runs mutating admission (including mutating webhooks), validates the object against its schema, and runs validating admission (including validating webhooks). Finally it persists the object to etcd.
- The relevant controller (e.g., the Deployment controller) receives a watch event for the changed object.
- The controller computes the diff and makes further API calls (e.g., creates a new ReplicaSet).
- The ReplicaSet controller sees the new ReplicaSet and creates Pods from its template.
- The scheduler sees the unscheduled Pods and binds them to nodes.
- The kubelet on each target node sees the Pod binding. It asks the container runtime, via the CRI, to pull the image and start the containers.
- The kubelet reports Pod status back to the API server, which stores it in etcd.
The entire chain from kubectl apply to running containers typically completes in 2–30 seconds, depending on image pull time.
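The API server processes a write in this order:

1. authentication
2. authorisation
3. mutating admission
4. schema validation
5. validating admission
6. persistence

This can be modelled as a pipeline of stages, any of which can reject the request. In the Go sketch below, the stage order follows that flow. The individual checks are hypothetical policies invented for illustration: rejecting anonymous users, defaulting a team label, and capping replicas at 10.

```go
package main

import (
	"errors"
	"fmt"
)

// Object is a toy stand-in for an incoming create request.
type Object struct {
	User     string // authenticated user; "" means anonymous
	Name     string
	Replicas int
	Labels   map[string]string
}

// stage is one step of a simplified request pipeline. A stage may mutate the
// object or reject the request by returning an error.
type stage struct {
	name string
	run  func(*Object) error
}

// pipeline returns the stages in the order the API server applies them.
func pipeline() []stage {
	return []stage{
		{"authentication", func(o *Object) error {
			if o.User == "" {
				return errors.New("anonymous request")
			}
			return nil
		}},
		{"authorization", func(*Object) error { return nil }}, // e.g. RBAC; allows everything here
		{"mutating admission", func(o *Object) error {
			if o.Labels == nil {
				o.Labels = map[string]string{}
			}
			if _, ok := o.Labels["team"]; !ok {
				o.Labels["team"] = "unassigned" // defaulting, like a mutating webhook
			}
			return nil
		}},
		{"schema validation", func(o *Object) error {
			if o.Name == "" || o.Replicas < 0 {
				return errors.New("invalid object")
			}
			return nil
		}},
		{"validating admission", func(o *Object) error {
			if o.Replicas > 10 {
				return errors.New("policy: at most 10 replicas")
			}
			return nil
		}},
	}
}

// handleCreate runs every stage in order and persists only if all pass.
func handleCreate(obj *Object, store map[string]Object) error {
	for _, s := range pipeline() {
		if err := s.run(obj); err != nil {
			return fmt.Errorf("%s: %w", s.name, err)
		}
	}
	store[obj.Name] = *obj // stand-in for the write to etcd
	return nil
}

func main() {
	store := map[string]Object{}
	fmt.Println(handleCreate(&Object{User: "alice", Name: "web", Replicas: 3}, store)) // <nil>
	fmt.Println(handleCreate(&Object{Name: "anon", Replicas: 1}, store))              // authentication: anonymous request
	fmt.Println(store["web"].Labels["team"])                                          // unassigned
}
```

Mutation runs before validation so that validators see the final, defaulted object, including any fields a mutating webhook added.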
8 · The API Object Model
8.1 API Groups and Versions
Kubernetes organises resources into API groups that evolve independently.
| API Group | apiVersion example | Key resources |
|---|---|---|
| core (legacy) | v1 | Pod, Service, ConfigMap, Secret, Node, PersistentVolume, Namespace |
| apps | apps/v1 | Deployment, ReplicaSet, StatefulSet, DaemonSet |
| batch | batch/v1 | Job, CronJob |
| networking.k8s.io | networking.k8s.io/v1 | Ingress, NetworkPolicy, IngressClass |
| gateway.networking.k8s.io (installed as CRDs) | gateway.networking.k8s.io/v1 | Gateway, HTTPRoute, GRPCRoute |
| storage.k8s.io | storage.k8s.io/v1 | StorageClass, VolumeAttachment, CSINode |
| rbac.authorization.k8s.io | rbac.authorization.k8s.io/v1 | Role, ClusterRole, RoleBinding, ClusterRoleBinding |
| autoscaling | autoscaling/v2 | HorizontalPodAutoscaler |
| policy | policy/v1 | PodDisruptionBudget |
| apiextensions.k8s.io | apiextensions.k8s.io/v1 | CustomResourceDefinition |
| Custom (CRDs) | myco.io/v1alpha1 | Whatever you define via CRD |
8.2 Resource Naming and Namespacing
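Resources are either cluster-scoped (Nodes, PersistentVolumes, ClusterRoles, Namespaces) or namespace-scoped (Pods, Deployments, Services, Secrets).
Informally, an object is identified as follows:

- <apiVersion>/<kind>/<namespace>/<name> for namespace-scoped resources,
- <apiVersion>/<kind>/<name> for cluster-scoped resources.

On the wire, the REST path uses the plural resource name instead of the kind. Examples: /apis/apps/v1/namespaces/production/deployments/web or /api/v1/nodes/<name>.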
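The mapping from an object's identity to its REST path can be written as a small helper. This Go sketch assumes you already know the plural resource name; in practice, API discovery (kubectl api-resources) provides it. restPath is a hypothetical name, not a client-go function.

```go
package main

import (
	"fmt"
	"strings"
)

// restPath builds the API server URL path for an object. The core group
// (apiVersion "v1") lives under /api; named groups live under /apis/<group>.
// resource is the lowercase plural (e.g. "deployments"), not the kind.
// Pass namespace "" for cluster-scoped resources and name "" for a list.
func restPath(apiVersion, resource, namespace, name string) string {
	var b strings.Builder
	if strings.Contains(apiVersion, "/") {
		b.WriteString("/apis/" + apiVersion) // e.g. /apis/apps/v1
	} else {
		b.WriteString("/api/" + apiVersion) // core group: /api/v1
	}
	if namespace != "" {
		b.WriteString("/namespaces/" + namespace)
	}
	b.WriteString("/" + resource)
	if name != "" {
		b.WriteString("/" + name)
	}
	return b.String()
}

func main() {
	fmt.Println(restPath("apps/v1", "deployments", "production", "web"))
	// /apis/apps/v1/namespaces/production/deployments/web
	fmt.Println(restPath("v1", "nodes", "", "worker-1"))
	// /api/v1/nodes/worker-1
}
```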
8.3 Labels, Annotations, and Selectors
Labels are key/value pairs used for grouping and selection. Selectors are queries over labels — Services, ReplicaSets, and NetworkPolicies all use label selectors to identify target Pods.
Annotations carry arbitrary non-identifying metadata — tool configuration, audit information, last-applied config. They are not queryable as selectors.
```yaml
metadata:
  labels:
    app: web
    env: production
    tier: frontend
    version: v3.2.1
  annotations:
    # Ingress controller config
    nginx.ingress.kubernetes.io/rewrite-target: /
    # GitOps metadata (Argo CD sync ordering)
    argocd.argoproj.io/sync-wave: "1"
    # Deployment tooling
    kubectl.kubernetes.io/last-applied-configuration: |
      {...}
```
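Equality-based selection, as used by matchLabels, reduces to a subset check. The Go sketch below is a hypothetical helper, not the apimachinery labels package. It ignores set-based matchExpressions (In, NotIn, Exists, DoesNotExist).

```go
package main

import "fmt"

// matchesLabels reports whether every key/value pair in the selector is
// present on the object. Extra labels on the object are ignored, and an empty
// selector matches every object.
func matchesLabels(selector, labels map[string]string) bool {
	for k, v := range selector {
		if got, ok := labels[k]; !ok || got != v {
			return false
		}
	}
	return true
}

func main() {
	podLabels := map[string]string{"app": "web", "env": "production", "tier": "frontend"}
	fmt.Println(matchesLabels(map[string]string{"app": "web"}, podLabels))                  // true
	fmt.Println(matchesLabels(map[string]string{"app": "web", "env": "staging"}, podLabels)) // false
}
```

Annotations never take part in this check, which is why they are free to hold large or tool-specific values.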
9 · Core Concepts Glossary
Full glossary of 50+ Kubernetes terms
| Term | Definition |
|---|---|
| Pod | Smallest deployable unit. 1+ containers sharing a network namespace and (optionally) volumes. |
| Container | An OCI-compliant process with its own filesystem layer, run by a CRI-compliant runtime. |
| Node | A physical or virtual machine in the cluster running kubelet, kube-proxy, and a CRI runtime. |
| Namespace | Virtual cluster within a cluster. Provides scope for names, RBAC, quotas. |
| ReplicaSet | Maintains N replicas of a Pod template. Rarely used directly — Deployments own them. |
| Deployment | Manages ReplicaSets to provide declarative updates and rollbacks for stateless workloads. |
| StatefulSet | Like a Deployment but with stable network identities and ordered, graceful operations — for stateful apps. |
| DaemonSet | Ensures one Pod per node (or per matching node) — for cluster-wide agents: CNI, log shippers, etc. |
| Job | Runs Pods to completion. Retries on failure. Tracks success/failure counts. |
| CronJob | Creates Jobs on a cron schedule. |
| Service | Stable virtual IP + DNS name for a set of Pods. Implements load balancing via kube-proxy. |
| ClusterIP | Default Service type. VIP only reachable within the cluster. |
| NodePort | Exposes a Service on a static port on every node's IP. Rarely used in production directly. |
| LoadBalancer | Requests a cloud load balancer. Usually used for external-facing services in cloud environments. |
| Ingress | HTTP/HTTPS layer-7 routing rules (virtual hosts, paths). Requires an Ingress controller. |
| Gateway API | Next-gen replacement for Ingress. Richer model with roles, traffic policies, TCP/UDP support. |
| ConfigMap | Stores non-sensitive key-value config. Injected as env vars or mounted files. |
| Secret | Base64-encoded (optionally encrypted) sensitive data. Same injection mechanics as ConfigMap. |
| PersistentVolume (PV) | A piece of storage provisioned by an admin or dynamically. Cluster resource, not namespaced. |
| PersistentVolumeClaim (PVC) | A namespace-scoped request for storage. Binds to a matching PV. |
| StorageClass | Defines how storage is dynamically provisioned (provisioner, parameters, reclaim policy). |
| Volume | A directory accessible to containers in a Pod. Ephemeral or persistent. |
| emptyDir | Temporary volume, lives as long as the Pod. Good for inter-container data sharing. |
| hostPath | Mounts a host filesystem path. Security risk — avoid in multi-tenant clusters. |
| ServiceAccount | Pod identity for API server authentication. Mounted as a projected token. |
| Role / ClusterRole | A set of permissions (verbs on resources). Role is namespace-scoped, ClusterRole is cluster-scoped. |
| RoleBinding / ClusterRoleBinding | Assigns a Role or ClusterRole to a subject (user, group, ServiceAccount). |
| RBAC | Role-Based Access Control — Kubernetes' primary authorisation mechanism. |
| NetworkPolicy | Firewall rules at the Pod level. Requires a CNI plugin that implements them (Calico, Cilium, etc.). |
| HPA | HorizontalPodAutoscaler — scales replicas based on metrics. |
| VPA | VerticalPodAutoscaler — right-sizes container resource requests/limits. |
| PodDisruptionBudget (PDB) | Limits voluntary disruptions to maintain minimum availability during node drains/upgrades. |
| LimitRange | Sets default and maximum resource requests/limits per Pod/container in a namespace. |
| ResourceQuota | Caps total resource consumption (CPU, memory, object counts) per namespace. |
| CRD | CustomResourceDefinition — extends the Kubernetes API with new resource types. |
| Operator | A controller that manages a CRD-based resource, encoding domain-specific operational knowledge. |
| Admission Controller | API server plug-in that intercepts requests before persistence to validate or mutate them. |
| Webhook | External HTTP server called by the API server for admission. Validating or Mutating. |
| OwnerReference | Links child objects to their parent (e.g., Pod → ReplicaSet → Deployment). Enables cascading deletes. |
| Finalizer | Prevents object deletion until a controller removes the finalizer — used for clean-up logic. |
| taint | A node property that repels Pods unless they have a matching toleration. |
| toleration | A Pod property allowing it to be scheduled on a tainted node. |
| affinity | Scheduling constraint expressing attraction (required or preferred) to nodes or other Pods. |
| anti-affinity | Scheduling constraint expressing repulsion from nodes or other Pods — for spread/HA. |
| QoS class | Guaranteed, Burstable, or BestEffort — determines eviction priority under memory pressure. |
| cgroups | Linux kernel resource isolation: CPU, memory, I/O, PID counts — per container. |
| namespace (Linux) | Linux kernel isolation primitive: pid, net, mnt, uts, ipc, user, cgroup — not Kubernetes Namespace. |
| eBPF | Extended Berkeley Packet Filter — kernel-level programmable hooks used by Cilium, Falco, etc. |
| Informer | client-go library component: combines List+Watch with a local cache + event handlers. |
| leader election | Mechanism using a Lease API object to ensure only one controller instance is active at a time. |
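The QoS class entry can be made concrete with the documented classification rules. The Go sketch below simplifies in three ways:

- resource quantities are compared as plain strings, so "1" and "1000m" would wrongly differ,
- init containers are ignored,
- the Resources type is hypothetical.

```go
package main

import "fmt"

// Resources holds one container's CPU/memory requests and limits ("" = unset).
type Resources struct {
	CPURequest, CPULimit, MemRequest, MemLimit string
}

// qosClass applies the documented rules:
//   - BestEffort: no container sets any CPU or memory request or limit.
//   - Guaranteed: every container sets CPU and memory limits, with requests
//     equal to limits (an unset request defaults to the limit).
//   - Burstable: anything else.
func qosClass(containers []Resources) string {
	anySet, allGuaranteed := false, true
	for _, c := range containers {
		if c.CPURequest != "" || c.CPULimit != "" || c.MemRequest != "" || c.MemLimit != "" {
			anySet = true
		}
		cpuReq, memReq := c.CPURequest, c.MemRequest
		if cpuReq == "" {
			cpuReq = c.CPULimit // defaulting: request = limit when only a limit is set
		}
		if memReq == "" {
			memReq = c.MemLimit
		}
		if c.CPULimit == "" || c.MemLimit == "" || cpuReq != c.CPULimit || memReq != c.MemLimit {
			allGuaranteed = false
		}
	}
	switch {
	case !anySet:
		return "BestEffort"
	case allGuaranteed:
		return "Guaranteed"
	default:
		return "Burstable"
	}
}

func main() {
	// Requests below limits, as in the Deployment example in 5.2.
	web := Resources{CPURequest: "250m", CPULimit: "500m", MemRequest: "128Mi", MemLimit: "256Mi"}
	fmt.Println(qosClass([]Resources{web})) // Burstable
}
```

Under memory pressure, the kubelet evicts BestEffort Pods first and Guaranteed Pods last. Setting requests equal to limits is therefore the usual way to protect critical workloads.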
10 · Platform Personas — Who Uses Kubernetes and How
Kubernetes serves many different engineering roles. Understanding which persona you are helps you focus on the right sections of this documentation.
Platform Engineers
- Install and upgrade clusters
- Design namespace and RBAC topology
- Manage CRDs, operators, admission webhooks
- Build internal developer platforms on K8s
Application Developers
- Write Deployments, Services, Ingress
- Configure resource requests/limits
- Use ConfigMaps and Secrets
- Debug pod failures, read logs
SRE Teams
- Define SLOs and PDBs
- Tune HPA/VPA/Cluster Autoscaler
- Operate on-call for cluster incidents
- Capacity planning, cost attribution
Security Teams
- RBAC auditing and policy enforcement
- OPA/Gatekeeper or Kyverno policies
- Pod Security Standards / Admission
- Image scanning, runtime security (Falco)
DevOps / CI-CD
- Build CI pipelines that push images
- Manage Helm charts / Kustomize overlays
- GitOps with ArgoCD or FluxCD
- Blue/green and canary deployments
Data / MLOps Engineers
- Run Spark/Flink/Ray on Kubernetes
- GPU node pools and device plugins
- Kubeflow for ML pipelines
- StatefulSet databases, object storage
11 · Quick Reference: Essential kubectl Commands
```shell
# ---- Cluster Info ----
kubectl cluster-info                          # API server URL and CoreDNS
kubectl get nodes -o wide                     # All nodes with IPs, OS, kernel
kubectl describe node <node-name>             # Node capacity, allocatable, events
kubectl get componentstatuses                 # etcd, scheduler, controller-manager health (deprecated since 1.19)
kubectl get --raw='/readyz?verbose'           # API server readiness checks (/healthz is deprecated)
kubectl get --raw /metrics | head -40         # Prometheus metrics from the API server

# ---- Namespace management ----
kubectl get namespaces
kubectl create namespace prod
kubectl config set-context --current --namespace=prod

# ---- Pod operations ----
kubectl get pods -A -o wide                   # All pods, all namespaces, node info
kubectl describe pod <name>                   # Full event stream, status, volumes
kubectl logs <pod> -c <container> --previous  # Previous container's logs
kubectl exec -it <pod> -- bash                # Interactive shell (use sh if the image has no bash)
kubectl port-forward pod/<name> 8080:80       # Local port tunnel

# ---- Deployment operations ----
kubectl apply -f manifest.yaml                # Declarative apply (idempotent)
kubectl diff -f manifest.yaml                 # Preview changes before apply
kubectl rollout status deployment/<name>      # Watch rollout progress
kubectl rollout history deployment/<name>     # See revision history
kubectl rollout undo deployment/<name>        # Roll back to the previous revision
kubectl scale deployment/<name> --replicas=5

# ---- Debugging ----
kubectl get events --sort-by=.lastTimestamp
kubectl top nodes                             # CPU/memory usage (needs metrics-server)
kubectl top pods -A --sort-by=memory
kubectl debug node/<node> -it --image=busybox # Debug Pod in the node's host namespaces; host filesystem at /host
kubectl auth can-i create pods --as=system:serviceaccount:default:mysa   # RBAC check as a ServiceAccount

# ---- Raw API access ----
kubectl get --raw /api/v1/namespaces
kubectl get --raw /apis/apps/v1/deployments
kubectl proxy &                               # Proxy the API to localhost:8001 (plain HTTP, authenticated with your kubeconfig)
```
12 · Production Best Practices (Introduction Level)
12 foundational best practices
- Always set resource requests and limits on every container. Without requests, the scheduler cannot make good placement decisions. Without limits, a runaway container can starve the entire node.
- Use namespaces for isolation. Don't run production workloads in default. At minimum, use kube-system (platform), monitoring, and one namespace per team or application.
- Configure liveness and readiness probes. Without them, Kubernetes cannot tell whether your container is actually healthy. A started but broken container with no probe will receive traffic indefinitely.
- Run 3 control plane nodes for HA. etcd requires a quorum: 3 members tolerate 1 failure, and 5 members tolerate 2. Never run a single control plane node in production.
- Separate etcd from other control plane components on dedicated nodes in large clusters. etcd is I/O-heavy, and noisy neighbours degrade consensus latency.
- Enable RBAC and disable anonymous access. Never use cluster-admin for application service accounts; follow the principle of least privilege.
- Use PodDisruptionBudgets for critical workloads to prevent total unavailability during node drains and cluster upgrades.
- Tag everything with labels: app, env, team, version. This enables cost attribution, filtering, and policy targeting.
- Run a default-deny NetworkPolicy in each namespace and explicitly allow only required flows. Without it, any Pod can reach any other Pod.
- Enable audit logging on the API server and store audit logs externally. They are your forensic record for security incidents and compliance.
- Back up etcd regularly to an off-cluster location. etcd holds the entire cluster state; without a backup, recovery means rebuilding from scratch.
- Use a container image vulnerability scanner (Trivy, Snyk, Anchore) in CI, and enforce admission policies that block images with known critical CVEs.
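The quorum arithmetic behind the 3- and 5-node recommendation is simple enough to sketch. This Go helper (a hypothetical name) computes the Raft majority, floor(n/2) + 1, and the number of members that can fail without losing it.

```go
package main

import "fmt"

// quorum returns the Raft majority needed by an n-member etcd cluster and
// how many members can fail while that majority is still reachable.
func quorum(members int) (needed, tolerated int) {
	needed = members/2 + 1
	return needed, members - needed
}

func main() {
	for _, n := range []int{1, 3, 4, 5} {
		q, f := quorum(n)
		fmt.Printf("members=%d quorum=%d tolerates=%d\n", n, q, f)
	}
	// members=1 quorum=1 tolerates=0
	// members=3 quorum=2 tolerates=1
	// members=4 quorum=3 tolerates=1
	// members=5 quorum=3 tolerates=2
}
```

The output shows why even member counts are avoided: 4 members need a quorum of 3 and so tolerate only 1 failure, the same as 3 members.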
13 · Kubernetes Versioning and Release Cadence
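Kubernetes follows a semantic-ish versioning scheme: v<MAJOR>.<MINOR>.<PATCH>. In practice MAJOR is always 1, and every new feature release is a MINOR bump. As of early 2025, the current stable release is 1.32.
- Release cadence: 3 minor releases per year (roughly every 4 months).
- Support window: the 3 most recent minor versions (N, N-1, N-2) are supported at any time. Each minor version receives patch releases for about 14 months.
- API deprecation policy: once deprecated, a superseded GA API version must remain available for at least 12 months or 3 releases, whichever is longer. A Beta version must remain available for at least 9 months or 3 releases, whichever is longer.
- Skew policy: the kubelet must not be newer than the API server and may be up to 3 minor versions older (2 for kubelets older than 1.25). kube-proxy follows the same limits relative to the API server. It may also be up to 3 minor versions older or newer than the kubelet on its node.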
```shell
# Check cluster and client versions (--short was removed in kubectl 1.28; short output is now the default)
kubectl version

# Rewrite a manifest that uses a deprecated API version (before upgrading)
kubectl convert -f old-manifest.yaml --output-version apps/v1   # requires the kubectl-convert plugin

# API resource availability
kubectl api-resources
kubectl api-versions | grep apps
```
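A pre-upgrade check of kubelet/API server skew can be sketched in Go. The helpers are hypothetical. The allowed gap is passed in as maxSkew because the policy has changed between releases; take the value from the version skew policy page for your target version.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// minorOf extracts the minor number from a version string such as "v1.32.1"
// (the format kubectl version prints). Only major version 1 is accepted.
func minorOf(v string) (int, error) {
	parts := strings.Split(strings.TrimPrefix(v, "v"), ".")
	if len(parts) < 2 || parts[0] != "1" {
		return 0, fmt.Errorf("unexpected version %q", v)
	}
	return strconv.Atoi(parts[1])
}

// kubeletSkewOK reports whether a kubelet may run against an API server: it
// must never be newer, and at most maxSkew minor versions older.
func kubeletSkewOK(apiMinor, kubeletMinor, maxSkew int) bool {
	return kubeletMinor <= apiMinor && apiMinor-kubeletMinor <= maxSkew
}

func main() {
	api, err := minorOf("v1.32.1")
	if err != nil {
		panic(err)
	}
	kubelet, err := minorOf("v1.29.6")
	if err != nil {
		panic(err)
	}
	fmt.Println(kubeletSkewOK(api, kubelet, 3)) // true: three minor versions behind
	fmt.Println(kubeletSkewOK(api, 33, 3))      // false: kubelet newer than API server
}
```

Upgrade the control plane first, then the nodes. The "never newer than the API server" half of the check is what enforces that order.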
14 · Introduction to Troubleshooting Mindset
Kubernetes debugging follows a systematic top-down approach:
- Start with events: run kubectl describe pod <name>. The Events section at the bottom tells you what happened.
- Check controller logs: if a Deployment isn't rolling out, check the kube-controller-manager logs.
- Check scheduler logs: if Pods are stuck in Pending, the scheduler log shows why it couldn't place them.
- Check kubelet logs on the target node: image pull errors, CRI failures, and volume mount failures appear here.
- Check the container logs: kubectl logs <pod> -p (previous container) shows crash reasons.
- Check the network: use kubectl exec -it <debug-pod> -- curl http://<service> to test connectivity.
Next Files to Study
Dependency Graph — where to go after this file
- 00-foundations/01-history-of-kubernetes.html — Borg → Omega → Kubernetes lineage, key design decisions
- 00-foundations/02-container-orchestration.html — cgroups, namespaces, OCI, CRI deep dive
- 00-foundations/03-cluster-architecture-overview.html — Component interactions, HA topology, network topology
- 00-foundations/04-kubernetes-api-model.html — API object lifecycle, watch semantics, etcd encoding
- 01-control-plane/00-control-plane-overview.html — Full control plane internals
- 11-api-flows/01-pod-creation-sequence.html — End-to-end pod creation sequence diagram
References
- Kubernetes Official Documentation — kubernetes.io/docs
- Borg, Omega, and Kubernetes (Google, ACM Queue 2016) — foundational paper describing the lineage
- Kubernetes the Hard Way — Kelsey Hightower's manual cluster bootstrap tutorial
- CNCF Certified Kubernetes Conformance Program – github.com/cncf/k8s-conformance
- Kubernetes API Reference — kubernetes.io/docs/reference/kubernetes-api/
- client-go source — github.com/kubernetes/client-go (informers, work queues, leader election)
- OCI Image Spec — github.com/opencontainers/image-spec
- OCI Runtime Spec — github.com/opencontainers/runtime-spec
- CNI Spec — github.com/containernetworking/cni
- CSI Spec — github.com/container-storage-interface/spec