File path: 00-foundations/00-introduction.html
Prerequisites: Basic Linux familiarity, an understanding of processes, and networking fundamentals (TCP/IP). No prior Kubernetes knowledge required.
Concepts covered: what Kubernetes is, why K8s exists, problems solved, traditional vs container deployment, the mental model, architecture overview, the declarative model, the control loop, the API resource model, core components, cluster topology, platform personas.

Introduction to Kubernetes

Kubernetes (often abbreviated as K8s) is a production-grade, open-source platform for automating deployment, scaling, and management of containerised applications. It provides a uniform, declarative API surface that abstracts away the underlying infrastructure so engineers can focus on expressing what they want rather than how to achieve it.

Reading guide: This document is the conceptual entry point to a 13-section, 90+ file reference. Every section below links forward to the dedicated deep-dive file. Start here, then follow the Next → button at the bottom to proceed through the series in order.

1 · What is Kubernetes?

At its core, Kubernetes is an operating system for distributed systems. Just as a single-machine OS (Linux, Windows) schedules processes, manages memory, handles I/O, and provides system-call abstractions, Kubernetes performs the equivalent tasks for containerised workloads across a fleet of machines.

Single-machine OS concept | Kubernetes equivalent
Process (PID) | Container (inside a Pod)
Process group | Pod
Process supervisor (systemd/launchd) | kubelet + controllers
Scheduler (CFS) | kube-scheduler
File system mount | PersistentVolume + CSI driver
Network socket | Service + kube-proxy
System config (/etc) | ConfigMap / Secret
Kernel | kube-apiserver + etcd
cron | CronJob
Daemon (/etc/init.d) | DaemonSet

Unlike a traditional OS, Kubernetes is cluster-wide: the unit of resource is a node (physical or virtual machine), and workloads are transparently distributed across all nodes by the scheduler.

Official definition (kubernetes.io)

"Kubernetes is an open-source system for automating deployment, scaling, and management of containerized applications. It groups containers that make up an application into logical units for easy management and discovery."

Broader production definition

For platform engineers, Kubernetes is also:

  • A declarative configuration API – you describe desired state in YAML/JSON; the system converges reality toward that state.
  • A self-healing runtime – components watch for drift and correct it automatically, without human intervention.
  • A control plane for your entire infrastructure – networking, storage, compute, certificates, and secrets are all first-class API resources.
  • An extensibility framework – CRDs, webhooks, and the aggregation layer let you extend the API to model any domain concept.
  • A standard layer – the same manifests run on bare metal in your own data centre, AWS, GCP, Azure, Raspberry Pi, or a laptop.

2 · Why Kubernetes Exists

To understand why Kubernetes exists, you must understand the problems that preceded it. The evolution goes through three distinct eras:

Era 1 — Bare Metal (pre-2000s)

  • One application per physical server
  • Massive under-utilisation (5–15% CPU avg)
  • Long provisioning cycles (weeks)
  • No isolation between services
  • Manual patching and deployment

Era 2 — Virtual Machines (2000s–2013)

  • Better utilisation via hypervisors
  • Still slow (minutes to boot)
  • Image drift between environments
  • "Works on my VM" problems
  • Config management sprawl (Chef/Puppet)

Era 3 — Containers (2013–present)

  • Sub-second startup (no guest OS to boot) via cgroups/namespaces
  • Immutable images = reproducibility
  • High density (100s per node)
  • New problem: who orchestrates them?
  • K8s solves the orchestration layer

When Docker popularised containers in 2013, organisations quickly ran into the "container sprawl" problem: running hundreds of containers manually across dozens of hosts is operationally impossible. You need:

  • Automatic placement of containers onto healthy nodes
  • Restart of containers that crash
  • Scaling up or down based on demand
  • Rolling updates with zero downtime
  • Service discovery – how does container A find container B?
  • Load balancing across replicas
  • Secret and configuration management
  • Resource accounting and fair-share scheduling
  • Storage lifecycle management
  • Health checking and eviction

Google had already solved this problem internally with Borg (2003) and later Omega. Kubernetes (2014) is the open-source, redesigned-for-community successor that incorporates a decade of Google's production learnings. See 01 · History for the full timeline.

3 · Problems Kubernetes Solves

3.1 Scheduling and Placement

Manually deciding which server runs which container is error-prone and doesn't scale. The kube-scheduler evaluates every node's available CPU/memory, topology, affinity rules, taints, and policies, filters out nodes that cannot run the Pod, scores the rest, and binds each Pod to the best-scoring node automatically, typically within milliseconds per Pod.
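Under the hood the scheduler works in two phases: filter out nodes that cannot run the Pod, then score the remaining ones. The Go sketch below illustrates that shape only; it is not kube-scheduler code. The `Node` and `PodRequest` types are hypothetical, a boolean stands in for "has a taint the Pod does not tolerate", and the single score loosely mirrors a least-allocated strategy (prefer the node with the most capacity left after placement). The real scheduler runs many filter and score plugins and combines weighted scores.

```go
package main

import "fmt"

// Node is a hypothetical, pared-down view of a node (CPU in millicores, memory in MiB).
type Node struct {
	Name               string
	AllocCPU, AllocMem int64 // allocatable capacity
	UsedCPU, UsedMem   int64 // sum of requests of Pods already placed
	Tainted            bool  // stand-in for an untolerated taint
}

// PodRequest is the sum of a Pod's container resource requests.
type PodRequest struct{ CPU, Mem int64 }

// feasible is the filter step: can this node run the Pod at all?
func feasible(n Node, r PodRequest) bool {
	return !n.Tainted && n.AllocCPU > 0 && n.AllocMem > 0 &&
		n.UsedCPU+r.CPU <= n.AllocCPU && n.UsedMem+r.Mem <= n.AllocMem
}

// score favours nodes with the most capacity left after placement (0-100).
func score(n Node, r PodRequest) int64 {
	cpu := (n.AllocCPU - n.UsedCPU - r.CPU) * 100 / n.AllocCPU
	mem := (n.AllocMem - n.UsedMem - r.Mem) * 100 / n.AllocMem
	return (cpu + mem) / 2
}

// schedule filters, scores, and returns the best node name (false if none fits).
func schedule(nodes []Node, r PodRequest) (string, bool) {
	best, bestScore := "", int64(-1)
	for _, n := range nodes {
		if !feasible(n, r) {
			continue
		}
		if s := score(n, r); s > bestScore {
			best, bestScore = n.Name, s
		}
	}
	return best, best != ""
}

func main() {
	nodes := []Node{
		{Name: "a", AllocCPU: 4000, AllocMem: 8192, UsedCPU: 3500, UsedMem: 1024},
		{Name: "b", AllocCPU: 4000, AllocMem: 8192, UsedCPU: 1000, UsedMem: 2048},
		{Name: "c", AllocCPU: 8000, AllocMem: 16384, Tainted: true},
	}
	name, ok := schedule(nodes, PodRequest{CPU: 500, Mem: 512})
	fmt.Println(name, ok) // b true
}
```

Node "a" fits but would be left with no spare CPU, so it scores lower than "b"; node "c" never reaches scoring because the filter removes it.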

3.2 Self-Healing

Controllers run reconciliation loops: they compare desired state (stored in etcd) with observed state (reported by the kubelet and other controllers) and take corrective action, such as restarting crashed containers, recreating Pods that were lost with a failed node, or rescheduling evicted Pods, without paging anyone.

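Reconciliation loop (simplified sketch; in-memory maps stand in for the API server and cluster)
// Simplified reconciler pattern (every controller follows this shape); types are hypothetical, not client-go.
package main

import (
	"context"
	"fmt"
)

type ReplicaSetSpec struct{ Replicas int } // only the desired replica count
type ReplicaSetController struct {
	desired map[string]ReplicaSetSpec // spec, as read from the API server
	running map[string]int            // Pods currently matching each selector
}

func (r *ReplicaSetController) Reconcile(ctx context.Context, name string) error {
	// 1. Fetch desired state; a missing object means it was deleted.
	spec, ok := r.desired[name]
	if !ok {
		return nil
	}
	// 2. Fetch current state from the cluster.
	current := r.running[name]
	// 3. Diff.
	missing := spec.Replicas - current
	// 4. Act: create or delete Pods to close the gap.
	if missing > 0 {
		fmt.Printf("%s: creating %d pod(s)\n", name, missing)
	} else if missing < 0 {
		fmt.Printf("%s: deleting %d pod(s)\n", name, -missing)
	}
	r.running[name] = spec.Replicas
	// 5. Update status (a real controller writes .status back via the API server).
	fmt.Printf("%s: %d/%d replicas\n", name, r.running[name], spec.Replicas)
	return nil
}

func main() {
	r := &ReplicaSetController{
		desired: map[string]ReplicaSetSpec{"web": {Replicas: 3}},
		running: map[string]int{"web": 1},
	}
	_ = r.Reconcile(context.Background(), "web") // creates 2 pods
	_ = r.Reconcile(context.Background(), "web") // level-triggered: nothing left to do
}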

3.3 Horizontal Scaling

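The Horizontal Pod Autoscaler reads metrics (CPU, memory, custom) through the metrics APIs, served by Metrics Server or an adapter such as Prometheus Adapter, and adjusts the replicas field of a scalable workload (Deployment, StatefulSet, or any resource exposing the scale subresource). Cluster Autoscaler then adds nodes when Pods cannot be scheduled for lack of capacity and removes nodes that stay underutilised.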
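The core rule documented for the HPA is desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue). A minimal Go sketch, assuming a single metric, the default 10% tolerance (no change when the ratio is within 0.1 of 1.0), and clamping to the HPA's minReplicas/maxReplicas; the real controller also accounts for unready Pods, missing metrics, and scale-down stabilisation windows.

```go
package main

import (
	"fmt"
	"math"
)

// desiredReplicas applies desired = ceil(current * currentMetric / targetMetric),
// skips changes within the tolerance band, then clamps to [minReplicas, maxReplicas].
func desiredReplicas(current int, currentMetric, targetMetric, tolerance float64, minReplicas, maxReplicas int) int {
	ratio := currentMetric / targetMetric
	desired := current
	if math.Abs(ratio-1.0) > tolerance {
		desired = int(math.Ceil(float64(current) * ratio))
	}
	if desired < minReplicas {
		desired = minReplicas
	}
	if desired > maxReplicas {
		desired = maxReplicas
	}
	return desired
}

func main() {
	// 4 replicas averaging 90% CPU against a 60% target: ceil(4 * 1.5) = 6.
	fmt.Println(desiredReplicas(4, 90, 60, 0.1, 2, 10))
}
```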

3.4 Rolling Updates and Rollbacks

A Deployment orchestrates rolling updates: it creates a new ReplicaSet for the updated Pod template, scales it up as the old ReplicaSet scales down, and only proceeds as new Pods become Ready, so (given working readiness probes) the application stays available throughout. A single kubectl rollout undo reverts to the previous ReplicaSet revision.

# Trigger a rolling update
kubectl set image deployment/web app=myimage:v2

# Watch rollout progress
kubectl rollout status deployment/web

# Undo if something is wrong
kubectl rollout undo deployment/web

# Rollout history
kubectl rollout history deployment/web
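How far a rollout may overshoot or undershoot the desired count is set by the Deployment's maxSurge and maxUnavailable, which default to 25% each; percentages are resolved against the replica count, rounding maxSurge up and maxUnavailable down. The Go sketch below assumes both settings are percentages and, like the Deployment controller, falls back to one unavailable Pod if both resolve to zero; the function name is illustrative.

```go
package main

import "fmt"

// resolve turns percentage maxSurge/maxUnavailable into Pod counts:
// surge rounds up, unavailable rounds down.
func resolve(replicas, surgePct, unavailPct int) (maxSurge, maxUnavailable int) {
	maxSurge = (replicas*surgePct + 99) / 100 // integer ceiling division
	maxUnavailable = replicas * unavailPct / 100
	if maxSurge == 0 && maxUnavailable == 0 {
		maxUnavailable = 1 // otherwise the rollout could never make progress
	}
	return maxSurge, maxUnavailable
}

func main() {
	for _, r := range []int{1, 4, 10} {
		s, u := resolve(r, 25, 25) // the Deployment defaults
		fmt.Printf("replicas=%d: at most %d Pods in total, at least %d available\n", r, r+s, r-u)
	}
}
```

With 10 replicas the defaults allow up to 13 Pods during the rollout while keeping at least 8 available.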

3.5 Service Discovery and Load Balancing

A Service object gives a stable virtual IP (ClusterIP) and DNS name to a dynamic set of Pods. kube-proxy programs iptables/IPVS rules so that traffic to the ClusterIP is load-balanced across all healthy Pod endpoints. The Pods themselves can be rescheduled and get new IPs — the Service absorbs that churn transparently.
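In iptables mode, kube-proxy spreads new connections over n endpoints with a chain of rules using the iptables statistic match: the first rule matches with probability 1/n, the next with 1/(n-1), and so on, with the last rule matching unconditionally. The Go sketch below (ignoring the rounding kube-proxy applies when writing the probabilities into rules) checks that this sequence gives every endpoint an equal 1/n share. IPVS mode instead delegates to a kernel scheduler, round-robin by default.

```go
package main

import "fmt"

// ruleProbabilities returns the per-rule match probability for n endpoints:
// 1/n, 1/(n-1), ..., 1/1 (the last rule always matches).
func ruleProbabilities(n int) []float64 {
	p := make([]float64, n)
	for i := 0; i < n; i++ {
		p[i] = 1.0 / float64(n-i)
	}
	return p
}

// effectiveShare computes each endpoint's overall chance of being chosen when
// rules are evaluated in order and the first match wins.
func effectiveShare(p []float64) []float64 {
	share := make([]float64, len(p))
	reach := 1.0 // probability that evaluation reaches rule i
	for i, pi := range p {
		share[i] = reach * pi
		reach *= 1 - pi
	}
	return share
}

func main() {
	p := ruleProbabilities(4)
	fmt.Println("rule probabilities:", p)
	fmt.Println("effective share:   ", effectiveShare(p))
}
```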

3.6 Configuration and Secret Management

ConfigMaps store non-sensitive configuration (environment variables, config files). Secrets store sensitive data (passwords, TLS certs, tokens) and are injected into Pods as environment variables or mounted files. By default, Secret values are only base64-encoded in etcd; encryption at rest must be enabled explicitly on the API server. External secret stores (Vault, AWS Secrets Manager) can integrate via the Secrets Store CSI Driver or External Secrets Operator.
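Because the values under a Secret's data field are only base64-encoded unless encryption at rest is configured, anyone who can read the object (through the API or an etcd backup) can recover them. A small Go sketch using the standard library's encoding/base64 shows the round trip; the helper names are illustrative. Protecting the stored bytes requires an EncryptionConfiguration passed to kube-apiserver via --encryption-provider-config, plus RBAC that restricts get/list on Secrets.

```go
package main

import (
	"encoding/base64"
	"fmt"
)

// encodeSecretValue produces the form a value takes under a Secret's data field.
func encodeSecretValue(plain string) string {
	return base64.StdEncoding.EncodeToString([]byte(plain))
}

// decodeSecretValue shows that recovering the value needs no key at all.
func decodeSecretValue(encoded string) (string, error) {
	b, err := base64.StdEncoding.DecodeString(encoded)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	enc := encodeSecretValue("password")
	fmt.Println(enc) // cGFzc3dvcmQ=
	dec, err := decodeSecretValue(enc)
	if err != nil {
		panic(err)
	}
	fmt.Println(dec) // password
}
```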

3.7 Storage Lifecycle

The CSI (Container Storage Interface) plugin model decouples storage drivers from the Kubernetes release cycle. PersistentVolume and PersistentVolumeClaim objects model the storage lifecycle: claim, bind, mount, unmount, release, and (optionally) reclaim. StorageClasses enable dynamic provisioning: a PVC that names a StorageClass gets a newly provisioned PV, with no administrator pre-creating one.
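When a claim arrives, the control plane first looks for an existing PersistentVolume it can bind. A simplified Go sketch of that matching, assuming hypothetical pared-down PV/PVC types with a single access mode, capacity in GiB, and a StorageClass name: it picks the smallest unbound PV that satisfies the claim, roughly the preference the PV controller applies, while the real matcher also checks volume mode, label selectors, and node affinity.

```go
package main

import "fmt"

// PV and PVC are hypothetical, pared-down versions of the real objects.
type PV struct {
	Name         string
	CapacityGiB  int
	AccessMode   string // e.g. "ReadWriteOnce"
	StorageClass string
	Bound        bool
}

type PVC struct {
	RequestGiB   int
	AccessMode   string
	StorageClass string
}

// bestMatch returns the smallest unbound PV that satisfies the claim, or false
// if none does (a dynamic provisioner would then create a new PV).
func bestMatch(pvs []PV, c PVC) (string, bool) {
	best := -1
	for i, pv := range pvs {
		fits := !pv.Bound && pv.StorageClass == c.StorageClass &&
			pv.AccessMode == c.AccessMode && pv.CapacityGiB >= c.RequestGiB
		if fits && (best < 0 || pv.CapacityGiB < pvs[best].CapacityGiB) {
			best = i
		}
	}
	if best < 0 {
		return "", false
	}
	return pvs[best].Name, true
}

func main() {
	pvs := []PV{
		{"pv-100", 100, "ReadWriteOnce", "fast", false},
		{"pv-20", 20, "ReadWriteOnce", "fast", false},
		{"pv-10", 10, "ReadWriteOnce", "fast", false},
		{"pv-50", 50, "ReadWriteOnce", "fast", true},
	}
	fmt.Println(bestMatch(pvs, PVC{RequestGiB: 15, AccessMode: "ReadWriteOnce", StorageClass: "fast"}))
}
```

Choosing the smallest fit avoids tying up a 100 GiB volume for a 15 GiB request.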

3.8 Environment Parity

The same container image and manifests that run on a developer's laptop (kind, minikube, k3s) run unchanged in production. The CRI, CNI, and CSI plugin model means only the plugin implementations differ; the API is the same everywhere.

4 · Traditional Systems vs Kubernetes

Dimension | Traditional (VMs + scripts) | Kubernetes
Deployment unit | RPM/deb package, systemd unit, VM image | OCI container image + YAML manifest
Configuration model | Imperative (bash/Ansible/Chef) | Declarative (YAML; controllers reconcile)
Scheduling | Manual or simplistic (round-robin) | Multi-constraint bin-packing scheduler
Self-healing | Nagios alert → on-call → SSH → restart | kubelet detects crash → container restarted in seconds
Scaling | Manual + pre-provisioned capacity | HPA (Pods, seconds) + Cluster Autoscaler (nodes, typically minutes)
Service discovery | Static DNS / load balancer config | CoreDNS auto-registers every Service
Rolling updates | Complex scripts, long maintenance windows | Built-in; zero-downtime when readiness probes are configured
Secret management | Config files on disk, SSH key sharing | Secrets API, RBAC-controlled, optional encryption at rest
Multi-tenancy | Separate physical/VM clusters | Namespaces + RBAC + NetworkPolicies + quotas
Observability | Fragmented (Nagios, Splunk, bespoke) | Standardised (metrics-server, Prometheus, OpenTelemetry)
Audit trail | Sparse, per-machine | Structured API server audit log (scope set by the audit policy)
Time-to-deploy | Hours (VM) / minutes (script) | Seconds (Pod scheduling + container pull)
Infrastructure portability | Cloud-specific, vendor-locked | Same manifests on any conformant cluster
Extension model | Bespoke automation, ad-hoc tooling | CRDs + Operators + admission webhooks

5 · The Kubernetes Mental Model

Before diving into internals, you need the right mental model. There are three key ideas that unlock everything else:

5.1 Desired State vs. Observed State

Most Kubernetes resource objects have two logical halves:

  • spec — what you want (desired state). You write this.
  • status — what the cluster has (observed state). The system writes this.

Controllers are the agents that continuously work to make status match spec. This is the control loop, and it is the fundamental pattern behind every Kubernetes component.

Figure 1: The control loop. You write the spec (replicas: 3) with kubectl apply and it is stored in etcd as desired state; the ReplicaSet controller watches it, creates or deletes Pods in the actual cluster, and updates status. Desired state lives in etcd; controllers drive the cluster toward it.

5.2 Everything is an API Object

Kubernetes exposes a RESTful API. Every resource (Pod, Node, Service, ConfigMap, CRD instance) is an object with a versioned API group, a kind, and metadata (name, namespace, labels, annotations). Most kinds also carry spec and status, though some, such as ConfigMap and Secret, hold data fields instead.

# Anatomy of a Kubernetes resource object
apiVersion: apps/v1         # <group>/<version> — "core" group uses just "v1"
kind: Deployment            # Object type
metadata:
  name: web                 # Unique name within namespace
  namespace: production     # Logical partition
  labels:                   # Key/value pairs — used by selectors
    app: web
    version: v3
  annotations:              # Non-identifying metadata — arbitrary data
    deployment.kubernetes.io/revision: "3"
spec:                       # DESIRED STATE — you write this
  replicas: 3
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
      - name: app
        image: myapp:v3
        resources:
          requests:
            cpu: "250m"
            memory: "128Mi"
          limits:
            cpu: "500m"
            memory: "256Mi"
status:                     # OBSERVED STATE — system writes this
  replicas: 3
  readyReplicas: 3
  availableReplicas: 3
  conditions:
  - type: Available
    status: "True"

5.3 Watch, Not Poll

The API server supports long-lived watch connections (HTTP chunked streaming, or WebSocket for some clients). Controllers and the kubelet use watches, usually through client-go informers, to receive change events as they happen instead of polling; kubectl get --watch uses the same mechanism. This is what makes Kubernetes highly responsive: a controller reacts within milliseconds of a state change rather than after a 30-second poll interval.

# Watching all Pod events in real time (same mechanism controllers use internally)
kubectl get pods --watch

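# Watch from a specific resource version (raw API). Quote the URL so the shell
# does not treat & as a background operator; -k skips TLS verification.
curl -k "https://<apiserver>/api/v1/pods?watch=1&resourceVersion=12345" \
  -H "Authorization: Bearer <token>"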

6 · Cluster Architecture Overview

A Kubernetes cluster consists of two logical planes:

  • Control Plane — the brain: API server, etcd, scheduler, controller manager, cloud-controller-manager.
  • Data Plane (worker nodes) — the muscle: kubelet, kube-proxy, container runtime, CNI plugin.
Figure 2: High-level cluster architecture, with the control plane at the top and N worker nodes (1 to 1000s) below; all components communicate via the kube-apiserver. Control plane: kube-apiserver (REST + watch API, port 6443, TLS), etcd (distributed KV store, Raft consensus), kube-scheduler (Pod placement: filter + score), kube-controller-manager (Node, ReplicaSet, Deployment, Job, and more; 30+ controllers), cloud-controller-manager (LoadBalancer, Node, Route). Each worker node: kubelet (node agent), kube-proxy (iptables/IPVS), and containerd or CRI-O (container runtime via CRI), running Pods (e.g., Pod A with 2 containers, Pod B with 1).

6.1 Control Plane Components

The control plane is typically run on dedicated nodes (control plane nodes, formerly called masters). In a production HA cluster you have 3 or 5 control plane nodes.

Component | Role | Port(s)
kube-apiserver | Stateless front end to etcd and the only component that talks to it directly. Authenticates, validates, persists, and serves all API objects; the single gateway to the cluster's source of truth. | 6443 (HTTPS)
etcd | Distributed key-value store using the Raft consensus algorithm. Stores all cluster state; the only persistent component. | 2379 (client), 2380 (peer)
kube-scheduler | Watches for unscheduled Pods and assigns them to nodes using a pipeline of filter and score plugins. | 10259 (HTTPS health/metrics)
kube-controller-manager | Runs 30+ control loops (ReplicaSet, Node, Endpoint, Namespace, Job, etc.). Each loop reconciles one resource type. | 10257 (HTTPS health/metrics)
cloud-controller-manager | Cloud-provider-specific controllers: LoadBalancer, Node (cloud metadata), Route. Decouples cloud logic from the core. | 10258 (HTTPS health/metrics)

6.2 Worker Node Components

Component | Role | Port(s)
kubelet | Primary node agent. Watches (via the API server) for Pods assigned to its node, creates and destroys containers through the CRI, and reports node and Pod status. | 10250 (HTTPS)
kube-proxy | Programs host networking rules (iptables, IPVS, or nftables) to implement the Service virtual IP abstraction. eBPF-based CNIs such as Cilium can replace it entirely. | 10256 (healthz)
Container runtime (CRI) | Runs containers. containerd and CRI-O are the standard choices; dockershim was removed in v1.24. | Unix socket (gRPC)
CNI plugin | Programs Pod networking: assigns IPs, creates veth pairs, sets up routes. Examples: Calico, Cilium, Flannel, Weave. | n/a

7 · The Declarative Model in Depth

The declarative model is Kubernetes' most important design decision. Contrast it with the imperative alternative:

Imperative vs Declarative — side-by-side example
## IMPERATIVE (old way — scripting)
# Create a container
docker run -d --name web -p 80:8080 --restart=always myapp:v1

# Scale it (manual loop)
for i in 2 3; do
  docker run -d --name web-$i -p 808$i:8080 myapp:v1
done

# Update it (manual, risky)
docker stop web; docker rm web
docker run -d --name web -p 80:8080 myapp:v2

# Problems:
# - State is in your head / scripts
# - Not idempotent
# - No audit trail
# - Does not self-heal
## DECLARATIVE (Kubernetes way)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
spec:
  replicas: 3
  selector:
    matchLabels: {app: web}
  template:
    metadata:
      labels: {app: web}
    spec:
      containers:
      - name: app
        image: myapp:v1
        ports:
        - containerPort: 8080
---
# Benefits:
# - Apply is idempotent (kubectl apply is safe to run repeatedly)
# - State is in version control (GitOps)
# - Any drift is auto-corrected by controllers
# - Rolling update = change image: myapp:v2, reapply

When you run kubectl apply -f deployment.yaml, here is what happens internally:

  1. kubectl serialises the object to JSON and sends an HTTP PATCH (or POST for new objects) to the API server.
  2. The API server authenticates and authorises the request, runs mutating admission (including webhooks), validates the object against its schema, runs validating admission, and persists the object to etcd.
  3. The relevant controller (e.g., the Deployment controller) receives a watch event for the changed object.
  4. The controller computes the diff and makes further API calls (e.g., creates or scales a ReplicaSet); the ReplicaSet controller, in turn, creates the Pods.
  5. The scheduler sees the unscheduled Pods and binds them to nodes.
  6. The kubelet on each target node sees the Pod binding and asks the container runtime, via the CRI, to pull the image and start the containers.
  7. The kubelet reports Pod status back to the API server, which stores it in etcd.

The entire chain from kubectl apply to running containers typically completes in 2–30 seconds, depending on image pull time.

8 · The API Object Model

8.1 API Groups and Versions

Kubernetes organises resources into API groups that evolve independently.

API Group | apiVersion example | Key resources
core (legacy) | v1 | Pod, Service, ConfigMap, Secret, Node, PersistentVolume, Namespace
apps | apps/v1 | Deployment, ReplicaSet, StatefulSet, DaemonSet
batch | batch/v1 | Job, CronJob
networking.k8s.io | networking.k8s.io/v1 | Ingress, NetworkPolicy, IngressClass
gateway.networking.k8s.io | gateway.networking.k8s.io/v1 | Gateway, HTTPRoute, GRPCRoute (installed separately as CRDs)
storage.k8s.io | storage.k8s.io/v1 | StorageClass, VolumeAttachment, CSINode
rbac.authorization.k8s.io | rbac.authorization.k8s.io/v1 | Role, ClusterRole, RoleBinding, ClusterRoleBinding
autoscaling | autoscaling/v2 | HorizontalPodAutoscaler
policy | policy/v1 | PodDisruptionBudget
apiextensions.k8s.io | apiextensions.k8s.io/v1 | CustomResourceDefinition
Custom (CRDs) | myco.io/v1alpha1 | Whatever you define via CRD

8.2 Resource Naming and Namespacing

Resources are either cluster-scoped (Nodes, PersistentVolumes, ClusterRoles, Namespaces) or namespace-scoped (Pods, Deployments, Services, Secrets). A convenient shorthand for a fully-qualified reference is <apiVersion>/<kind>/<namespace>/<name> for namespace-scoped resources and <apiVersion>/<kind>/<name> for cluster-scoped ones.
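On the wire, the API server addresses objects by REST path: core-group resources live under /api/v1, every other group under /apis/<group>/<version>, with /namespaces/<namespace> inserted for namespace-scoped resources and the resource named in its lowercase plural form. A small Go helper (the function name is illustrative) builds such paths; the same paths work with kubectl get --raw, as in the raw API examples in the kubectl quick reference.

```go
package main

import "fmt"

// resourcePath builds the REST path for a resource or object. An empty group
// means the core group; an empty namespace means the resource is cluster-scoped;
// an empty name addresses the whole collection.
func resourcePath(group, version, namespace, resource, name string) string {
	prefix := "/apis/" + group + "/" + version
	if group == "" {
		prefix = "/api/" + version
	}
	if namespace != "" {
		prefix += "/namespaces/" + namespace
	}
	path := prefix + "/" + resource
	if name != "" {
		path += "/" + name
	}
	return path
}

func main() {
	fmt.Println(resourcePath("apps", "v1", "production", "deployments", "web"))
	// /apis/apps/v1/namespaces/production/deployments/web
	fmt.Println(resourcePath("", "v1", "", "nodes", "node-1"))
	// /api/v1/nodes/node-1
}
```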

8.3 Labels, Annotations, and Selectors

Labels are key/value pairs used for grouping and selection. Selectors are queries over labels — Services, ReplicaSets, and NetworkPolicies all use label selectors to identify target Pods.

Annotations carry arbitrary non-identifying metadata — tool configuration, audit information, last-applied config. They are not queryable as selectors.

metadata:
  labels:
    app: web
    env: production
    tier: frontend
    version: v3.2.1
  annotations:
    # Ingress controller config
    nginx.ingress.kubernetes.io/rewrite-target: /
    # GitOps metadata
    argocd.argoproj.io/managed-by: argocd
    # Deployment tooling
    kubectl.kubernetes.io/last-applied-configuration: |
      {...}
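Equality-based selection, the matchLabels form used in the Deployment manifest earlier, means every key/value pair in the selector must be present on the object. A Go sketch of that rule with illustrative names; the API also supports set-based matchExpressions (In, NotIn, Exists, DoesNotExist), which this sketch omits, and here an empty selector matches every object.

```go
package main

import "fmt"

// matchesLabels reports whether every key/value in the selector is present
// on the object's labels (all terms are ANDed).
func matchesLabels(selector, labels map[string]string) bool {
	for k, v := range selector {
		if got, ok := labels[k]; !ok || got != v {
			return false
		}
	}
	return true
}

func main() {
	pods := []struct {
		Name   string
		Labels map[string]string
	}{
		{"web-1", map[string]string{"app": "web", "env": "production", "tier": "frontend"}},
		{"web-2", map[string]string{"app": "web", "env": "staging"}},
		{"api-1", map[string]string{"app": "api", "env": "production"}},
	}
	selector := map[string]string{"app": "web", "env": "production"}
	for _, p := range pods {
		if matchesLabels(selector, p.Labels) {
			fmt.Println("selected:", p.Name) // selected: web-1
		}
	}
}
```

Extra labels on a Pod (tier here) never prevent a match; only missing or differing selector keys do.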

9 · Core Concepts Glossary

Full glossary of 50+ Kubernetes terms
Term | Definition
Pod | Smallest deployable unit. 1+ containers sharing a network namespace and (optionally) volumes.
Container | An OCI-compliant process with its own filesystem layer, run by a CRI-compliant runtime.
Node | A physical or virtual machine in the cluster running kubelet, kube-proxy, and a CRI runtime.
Namespace | Virtual cluster within a cluster. Provides scope for names, RBAC, quotas.
ReplicaSet | Maintains N replicas of a Pod template. Rarely used directly; Deployments own them.
Deployment | Manages ReplicaSets to provide declarative updates and rollbacks for stateless workloads.
StatefulSet | Like a Deployment but with stable network identities and ordered, graceful operations, for stateful apps.
DaemonSet | Ensures one Pod per node (or per matching node), for cluster-wide agents: CNI, log shippers, etc.
Job | Runs Pods to completion. Retries on failure. Tracks success/failure counts.
CronJob | Creates Jobs on a cron schedule.
Service | Stable virtual IP + DNS name for a set of Pods. Implements load balancing via kube-proxy.
ClusterIP | Default Service type. VIP only reachable within the cluster.
NodePort | Exposes a Service on a static port on every node's IP. Rarely used in production directly.
LoadBalancer | Requests a cloud load balancer. Usually used for external-facing services in cloud environments.
Ingress | HTTP/HTTPS layer-7 routing rules (virtual hosts, paths). Requires an Ingress controller.
Gateway API | Next-gen replacement for Ingress. Richer model with roles, traffic policies, TCP/UDP support.
ConfigMap | Stores non-sensitive key-value config. Injected as env vars or mounted files.
Secret | Base64-encoded (optionally encrypted at rest) sensitive data. Same injection mechanics as ConfigMap.
PersistentVolume (PV) | A piece of storage provisioned by an admin or dynamically. Cluster resource, not namespaced.
PersistentVolumeClaim (PVC) | A namespace-scoped request for storage. Binds to a matching PV.
StorageClass | Defines how storage is dynamically provisioned (provisioner, parameters, reclaim policy).
Volume | A directory accessible to containers in a Pod. Ephemeral or persistent.
emptyDir | Temporary volume, lives as long as the Pod. Good for inter-container data sharing.
hostPath | Mounts a host filesystem path. Security risk; avoid in multi-tenant clusters.
ServiceAccount | Pod identity for API server authentication. Mounted as a projected token.
Role / ClusterRole | A set of permissions (verbs on resources). Role is namespace-scoped, ClusterRole is cluster-scoped.
RoleBinding / ClusterRoleBinding | Assigns a Role or ClusterRole to a subject (user, group, ServiceAccount).
RBAC | Role-Based Access Control, Kubernetes' primary authorisation mechanism.
NetworkPolicy | Firewall rules at the Pod level. Requires a CNI plugin that implements them (Calico, Cilium, etc.).
HPA | HorizontalPodAutoscaler; scales replicas based on metrics.
VPA | VerticalPodAutoscaler; right-sizes container resource requests/limits.
PodDisruptionBudget (PDB) | Limits voluntary disruptions to maintain minimum availability during node drains/upgrades.
LimitRange | Sets default and maximum resource requests/limits per Pod/container in a namespace.
ResourceQuota | Caps total resource consumption (CPU, memory, object counts) per namespace.
CRD | CustomResourceDefinition; extends the Kubernetes API with new resource types.
Operator | A controller that manages a CRD-based resource, encoding domain-specific operational knowledge.
Admission Controller | API server plug-in that intercepts requests before persistence to validate or mutate them.
Webhook | External HTTPS endpoint called by the API server for admission. Validating or Mutating.
OwnerReference | Links child objects to their parent (e.g., Pod → ReplicaSet → Deployment). Enables cascading deletes.
Finalizer | Prevents object deletion until a controller removes the finalizer; used for clean-up logic.
taint | A node property that repels Pods unless they have a matching toleration.
toleration | A Pod property allowing it to be scheduled on a tainted node.
affinity | Scheduling constraint expressing attraction (required or preferred) to nodes or other Pods.
anti-affinity | Scheduling constraint expressing repulsion from nodes or other Pods, for spread/HA.
QoS class | Guaranteed, Burstable, or BestEffort; determines eviction priority under memory pressure.
cgroups | Linux kernel resource isolation: CPU, memory, I/O, PID counts, per container.
namespace (Linux) | Linux kernel isolation primitive: pid, net, mnt, uts, ipc, user, cgroup. Not the same as a Kubernetes Namespace.
eBPF | Extended Berkeley Packet Filter; kernel-level programmable hooks used by Cilium, Falco, etc.
Informer | Client-go library component: combines List+Watch with a local cache + event handlers.
leader election | Mechanism using a Lease API object to ensure only one controller instance is active at a time.

10 · Platform Personas — Who Uses Kubernetes and How

Kubernetes serves many different engineering roles. Understanding which persona you are helps you focus on the right sections of this documentation.

Platform Engineers

  • Install and upgrade clusters
  • Design namespace and RBAC topology
  • Manage CRDs, operators, admission webhooks
  • Build internal developer platforms on K8s

Application Developers

  • Write Deployments, Services, Ingress
  • Configure resource requests/limits
  • Use ConfigMaps and Secrets
  • Debug pod failures, read logs

SRE Teams

  • Define SLOs and PDBs
  • Tune HPA/VPA/Cluster Autoscaler
  • Operate on-call for cluster incidents
  • Capacity planning, cost attribution

Security Teams

  • RBAC auditing and policy enforcement
  • OPA/Gatekeeper or Kyverno policies
  • Pod Security Standards / Admission
  • Image scanning, runtime security (Falco)

DevOps / CI-CD

  • Build CI pipelines that push images
  • Manage Helm charts / Kustomize overlays
  • GitOps with ArgoCD or FluxCD
  • Blue/green and canary deployments

Data / MLOps Engineers

  • Run Spark/Flink/Ray on Kubernetes
  • GPU node pools and device plugins
  • Kubeflow for ML pipelines
  • StatefulSet databases, object storage

11 · Quick Reference: Essential kubectl Commands

# ---- Cluster Info ----
kubectl cluster-info                      # API server URL and CoreDNS
kubectl get nodes -o wide                 # All nodes with IPs, OS, kernel
kubectl describe node <node-name>         # Node capacity, allocatable, events
kubectl get componentstatuses             # etcd, scheduler, controller-manager health (deprecated since v1.19)
kubectl get --raw '/readyz?verbose'       # API server readiness checks (/healthz is deprecated)
kubectl get --raw /metrics | head -40     # Prometheus metrics from API server

# ---- Namespace management ----
kubectl get namespaces
kubectl create namespace prod
kubectl config set-context --current --namespace=prod

# ---- Pod operations ----
kubectl get pods -A -o wide               # All pods, all namespaces, node info
kubectl describe pod <name>              # Full event stream, status, volumes
kubectl logs <pod> -c <container> --previous  # Previous container's logs
kubectl exec -it <pod> -- bash           # Interactive shell
kubectl port-forward pod/<name> 8080:80  # Local port tunnel

# ---- Deployment operations ----
kubectl apply -f manifest.yaml           # Declarative apply (idempotent)
kubectl diff -f manifest.yaml            # Preview changes before apply
kubectl rollout status deployment/<name> # Watch rollout progress
kubectl rollout history deployment/<name> # See revision history
kubectl rollout undo deployment/<name>   # Rollback to previous revision
kubectl scale deployment/<name> --replicas=5

# ---- Debugging ----
kubectl get events --sort-by=.lastTimestamp
kubectl top nodes                        # CPU/Memory usage (needs metrics-server)
kubectl top pods -A --sort-by=memory
kubectl debug node/<node> -it --image=busybox  # Debug Pod in the node's host namespaces; host filesystem at /host
kubectl auth can-i create pods --as=system:serviceaccount:default:mysa  # RBAC check

# ---- Raw API access ----
kubectl get --raw /api/v1/namespaces
kubectl get --raw /apis/apps/v1/deployments
kubectl proxy &                          # Proxy API to localhost:8001 (no TLS)

12 · Production Best Practices (Introduction Level)

Start here before reading anything else: these are the non-negotiable practices that distinguish a production Kubernetes cluster from a demo.
12 foundational best practices
  1. Always set resource requests AND limits on every container. Without requests, the scheduler cannot make good placement decisions. Without limits, a runaway container can starve the entire node.
  2. Use namespaces for isolation. Don't run production workloads in default. At minimum: kube-system (platform), monitoring, one namespace per team or application.
  3. Configure liveness and readiness probes. Without them, Kubernetes cannot determine if your container is actually healthy. A started but broken container with no probe will receive traffic indefinitely.
  4. Run 3 control plane nodes for HA. etcd requires a quorum; 3 nodes tolerate 1 failure. 5 nodes tolerate 2 failures. Never run a single control plane in production.
  5. Separate etcd from other control plane components on dedicated nodes in large clusters. etcd is I/O-heavy; noisy neighbours degrade consensus latency.
  6. Enable RBAC and disable anonymous access. Never use cluster-admin for application service accounts. Follow the principle of least privilege.
  7. Use PodDisruptionBudgets for critical workloads to prevent total unavailability during node drains and cluster upgrades.
  8. Tag everything with labels: app, env, team, version. This enables cost attribution, filtering, and policy targeting.
  9. Apply a default-deny NetworkPolicy in each namespace and explicitly allow only required flows. Without one, any Pod can reach any other Pod. (Enforcement requires a CNI plugin that implements NetworkPolicy.)
  10. Enable audit logging on the API server. Store audit logs externally. They are your forensic record for security incidents and compliance.
  11. Back up etcd regularly to an off-cluster location. etcd holds the entire cluster state — without it, recovery is a rebuild from scratch.
  12. Use a container image vulnerability scanner (Trivy, Snyk, Anchore) in CI and enforce admission policies to block known-critical CVEs.

13 · Kubernetes Versioning and Release Cadence

Kubernetes follows a semantic-ish versioning scheme: v<MAJOR>.<MINOR>.<PATCH>. In practice MAJOR is always 1; every new feature release is a MINOR bump. As of early 2025, the current stable release is 1.32.

  • Release cadence: 3 minor releases per year (roughly every 4 months).
  • Support window: 3 minor versions supported at any time (N, N-1, N-2). Each minor version supported for ~14 months.
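  • API deprecation policy: GA API versions may be deprecated but are not removed within a major version. Beta API versions remain served for at least 9 months or 3 releases (whichever is longer) after deprecation. Alpha APIs can be removed in any release.
  • Skew policy: the kubelet must not be newer than the API server and may be up to 3 minor versions older (2 for kubelets older than v1.25). kube-proxy follows the same rule relative to the API server and may be up to 3 minor versions older or newer than the kubelet on the same node.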
# Check cluster and client versions
kubectl version

# Convert manifests that use deprecated API versions (before upgrading)
kubectl convert -f old-manifest.yaml --output-version apps/v1  # requires the kubectl-convert plugin

# API resource availability
kubectl api-resources
kubectl api-versions | grep apps
Version skew is a common cause of production incidents: when upgrading, upgrade the control plane first, then the worker nodes, and never let a kubelet be newer than the API server. See Cluster Upgrades for the full upgrade sequence.
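A pre-upgrade check of the kubelet rule is easy to script. This Go sketch parses v1.<minor>.<patch> strings and takes the allowed gap as a parameter: 3 minor versions under the current skew policy (2 for kubelets older than v1.25); check the published version skew policy for your release. Function names are illustrative, and the sketch compares minor versions only.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// minor extracts the minor version from strings like "v1.32.1" or "1.30".
func minor(v string) (int, error) {
	parts := strings.Split(strings.TrimPrefix(v, "v"), ".")
	if len(parts) < 2 || parts[0] != "1" {
		return 0, fmt.Errorf("unexpected version %q", v)
	}
	return strconv.Atoi(parts[1])
}

// kubeletSkewOK reports whether a kubelet version may run against an API server
// version: never newer, and at most maxOlder minor versions older.
func kubeletSkewOK(apiserver, kubelet string, maxOlder int) (bool, error) {
	a, err := minor(apiserver)
	if err != nil {
		return false, err
	}
	k, err := minor(kubelet)
	if err != nil {
		return false, err
	}
	return k <= a && a-k <= maxOlder, nil
}

func main() {
	for _, kubelet := range []string{"v1.32.0", "v1.29.4", "v1.28.9", "v1.33.0"} {
		ok, err := kubeletSkewOK("v1.32.1", kubelet, 3)
		fmt.Println(kubelet, ok, err)
	}
}
```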

14 · Introduction to Troubleshooting Mindset

Kubernetes debugging follows a systematic top-down approach:

  1. Start with events: kubectl describe pod <name> — the Events section at the bottom tells you what happened.
  2. Check controller logs: if a Deployment isn't rolling out and its events don't explain why, check the kube-controller-manager logs.
  3. Check scheduler logs: if Pods are stuck in Pending, the scheduler log shows why it couldn't place them.
  4. Check kubelet logs on the target node: container pull errors, CRI failures, volume mount failures appear here.
  5. Check the container logs: kubectl logs <pod> -p (previous container) for crash reasons.
  6. Check network: use kubectl exec -it <debug-pod> -- curl http://<service> to test connectivity.
Full troubleshooting guides: see 12 · Troubleshooting for exhaustive guides on every failure category.


References

  • Kubernetes Official Documentation — kubernetes.io/docs
  • Borg, Omega, and Kubernetes (Google, ACM Queue 2016) — foundational paper describing the lineage
  • Kubernetes the Hard Way — Kelsey Hightower's manual cluster bootstrap tutorial
  • CNCF Certified Kubernetes conformance program: github.com/cncf/k8s-conformance
  • Kubernetes API Reference — kubernetes.io/docs/reference/kubernetes-api/
  • client-go source — github.com/kubernetes/client-go (informers, work queues, leader election)
  • OCI Image Spec — github.com/opencontainers/image-spec
  • OCI Runtime Spec — github.com/opencontainers/runtime-spec
  • CNI Spec — github.com/containernetworking/cni
  • CSI Spec — github.com/container-storage-interface/spec