Pod Creation Flow
Overview
This flow traces a kubectl apply for a Pod from the client to a running container, through every Kubernetes component that participates.
Full Sequence Diagram
kubectl API Server etcd Scheduler kubelet CRI (containerd)
│ │ │ │ │ │
│──POST /api/v1/pods──► │ │ │ │ │
│ (with manifest) │ │ │ │ │
│ │─ authn / authz ─►│ │ │ │
│ │ (see admission │ │ │ │
│ │ flow 03) │ │ │ │
│ │ │ │ │ │
│ │──WRITE pod ────► │ │ │ │
│ │ (phase:Pending, │ │ │ │
│ │ nodeName: "") │ │ │ │
│ │ │ │ │ │
│◄── 201 Created ────────│ │ │ │ │
│ (pod object) │ │ │ │ │
│ │ │ │ │ │
│ │──WATCH event ──► │ │ │
│ │ (pod Added, │ │ │
│ │ nodeName=="") │ │ │
│ │ │ │ │
│ │ ┌─ Filter phase ──────┤ │ │
│ │ │ (NodeResourcesFit, │ │ │
│ │ │ NodeAffinity, │ │ │
│ │ │ Taints/Tolerations)│ │ │
│ │ │ │ │ │
│ │ └─ Score phase ────────┤ │ │
│ │ (LeastAllocated, │ │ │
│ │ ImageLocality) │ │ │
│ │ │ │ │
│ │◄─ POST pods/<name>/binding ────┤ │ │
│ │ (Bind: node="worker-2") │ │ │
│ │ │ │ │
│ │──WRITE nodeName ────────────► │ │ │
│ │ │ │ │
│ │──WATCH event ──────────────────────────────► │ │
│ │ (pod Modified, nodeName=="worker-2") │ │
│ │ │ │
│ │ ┌─ kubelet reconcile ─────────────────►│ │
│ │ │ pod assigned to this node │ │
│ │ │ │ │
│ │ │ ┌─ Pull image (if not cached) ─────►│──► containerd │
│ │ │ │ │ (ImagePull) │
│ │ │ │ │◄── image ready │
│ │ │ │ │ │
│ │ │ └─ RunPodSandbox ──────────────────►│──► pause container │
│ │ │ (network namespace created) │ (CNI called) │
│ │ │ │ │
│ │ │ ┌─ CreateContainer ────────────────►│──► container │
│ │ │ │ (for each container in spec) │ created │
│ │ │ │ │ │
│ │ │ └─ StartContainer ─────────────────►│──► container │
│ │ │ │ started │
│ │ │ │ │
│ │◄─ PATCH pod status ────────────────────────── │ │
│ │ phase: Running │ │
│ │ containerStatuses: [{ready:true}] │ │
│ │ │ │
│ │──WRITE status ──────────────────────────────► │ │
│ │ │ │
kubectl get pod → phase: Running, IP: 10.0.1.5
Step-by-Step Breakdown
Step 1: Client submits the manifest
kubectl apply -f pod.yaml
# kubectl serialises the manifest to JSON, sends:
# POST /api/v1/namespaces/default/pods HTTP/1.1
The API server runs the request through authentication, authorization, and the admission chain, in this order:
- Authentication: verifies the bearer token or client certificate
- Authorization: RBAC check (can this identity create pods in this namespace?)
- Mutating admission webhooks: may inject sidecars, add labels, set defaults
- Schema validation: validates the manifest against the OpenAPI schema
- Validating admission webhooks: reject the request on policy violations
Only after all of the above pass does the API server write to etcd.
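The ordering above can be sketched as a chain of stages where any stage may reject the request and the write happens only at the end. This is a toy model, not the API server's code: the `Request` type, the `ALLOWED` table, the injected label, and the `:latest` policy are all hypothetical stand-ins chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Request:
    user: Optional[str]  # None models a failed authentication
    verb: str
    resource: str
    namespace: str
    obj: dict = field(default_factory=dict)


class Rejected(Exception):
    """Raised by any stage that refuses the request."""


# Hypothetical RBAC table: tuples that are allowed.
ALLOWED = {("alice", "create", "pods", "default")}


def authenticate(req):
    if req.user is None:
        raise Rejected("unauthenticated")


def authorize(req):
    if (req.user, req.verb, req.resource, req.namespace) not in ALLOWED:
        raise Rejected("forbidden")


def mutate(req):
    # Stand-in for a mutating webhook: add a label.
    labels = req.obj.setdefault("metadata", {}).setdefault("labels", {})
    labels["injected"] = "true"


def validate_schema(req):
    if not req.obj.get("spec", {}).get("containers"):
        raise Rejected("invalid: spec.containers is required")


def validate_policy(req):
    # Stand-in for a validating webhook: forbid the ':latest' tag.
    for container in req.obj["spec"]["containers"]:
        if container.get("image", "").endswith(":latest"):
            raise Rejected("denied by policy")


STAGES = [authenticate, authorize, mutate, validate_schema, validate_policy]


def handle_create(req, store):
    for stage in STAGES:
        stage(req)         # any stage may raise Rejected
    store.append(req.obj)  # reached only when every stage passed
    return "201 Created"
```

Because the mutating stage runs before both validation stages, the validators see the object as it will actually be stored, including anything a webhook injected.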
Step 2: etcd write — pod in Pending state
The pod is persisted with:
- spec.nodeName: "" (no node assigned yet)
- status.phase: Pending
- status.conditions: [{type:PodScheduled, status:False}]
The API server returns 201 Created to kubectl.
Step 3: Scheduler watches for unscheduled pods
The scheduler has a watch on pods where spec.nodeName == "". When it receives the Added event, it places the pod in the scheduling queue.
Filter phase eliminates nodes that cannot run the pod:
- NodeResourcesFit: node has enough CPU/memory
- NodeAffinity: required node selectors match
- TaintToleration: pod tolerates node taints
- PodTopologySpread: topology spread constraints satisfied
- VolumeZone: PVC's zone matches node's zone
Score phase ranks remaining nodes (0–100):
- LeastAllocated: prefer nodes with more free resources
- ImageLocality: prefer nodes already caching the image
- InterPodAffinity: prefer/avoid co-location with other pods
The scheduler binds the pod by POSTing a Binding object (target node "worker-2") to the pod's binding subresource; the API server then sets pod.spec.nodeName = "worker-2".
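The Filter and Score phases can be sketched as a two-step selection: discard infeasible nodes, then pick the highest-scoring survivor. The `Node` and `Pod` types, the equal weights, and the scoring arithmetic below are assumptions made for illustration; the real plugins use different inputs and formulas.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    free_cpu_m: int   # free CPU in millicores
    free_mem_mi: int  # free memory in MiB
    total_cpu_m: int  # allocatable CPU in millicores
    tainted: bool = False
    cached_images: tuple = ()


@dataclass
class Pod:
    name: str
    cpu_m: int
    mem_mi: int
    image: str
    tolerates_taint: bool = False


def filter_nodes(pod, nodes):
    """Filter phase: drop nodes that cannot run the pod at all."""
    return [
        n for n in nodes
        if n.free_cpu_m >= pod.cpu_m
        and n.free_mem_mi >= pod.mem_mi
        and (not n.tainted or pod.tolerates_taint)
    ]


def score(pod, node):
    """Score phase: 0-100, higher is better (toy weights)."""
    free_after = (node.free_cpu_m - pod.cpu_m) / node.total_cpu_m
    least_allocated = free_after * 100
    image_locality = 100 if pod.image in node.cached_images else 0
    return round(0.5 * least_allocated + 0.5 * image_locality)


def schedule(pod, nodes):
    """Return the chosen node name, or None if the pod is unschedulable."""
    feasible = filter_nodes(pod, nodes)
    if not feasible:
        return None
    return max(feasible, key=lambda n: score(pod, n)).name


nodes = [
    Node("worker-1", free_cpu_m=500, free_mem_mi=1024, total_cpu_m=4000),
    Node("worker-2", free_cpu_m=2000, free_mem_mi=4096, total_cpu_m=4000,
         cached_images=("ghcr.io/acme/app:sha-abc",)),
    Node("worker-3", free_cpu_m=3500, free_mem_mi=8192, total_cpu_m=4000,
         tainted=True),
]
pod = Pod("app", cpu_m=1000, mem_mi=512, image="ghcr.io/acme/app:sha-abc")
chosen = schedule(pod, nodes)
```

In this example worker-1 fails on CPU and worker-3 fails on its taint, so only worker-2 reaches the Score phase. A pod that no node can fit returns `None`, which corresponds to the pod staying Pending.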
Step 4: kubelet watches for pods assigned to its node
The kubelet on worker-2 has a watch filtered to pods where spec.nodeName == "worker-2". When the pod's nodeName is set, the kubelet receives a Modified event and begins the pod lifecycle.
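The scheduler and the kubelet both select pods by the same field, just with different values. A minimal sketch of that selection follows; it only models the predicate, whereas in a real cluster the filtering is done by the API server on behalf of each watcher. The pod names `db` and `cache` are hypothetical.

```python
def matches_node(pod: dict, node_name: str) -> bool:
    """True when spec.nodeName equals node_name ('' means unscheduled)."""
    return pod.get("spec", {}).get("nodeName", "") == node_name


def names(selected):
    return [p["metadata"]["name"] for p in selected]


pods = [
    {"metadata": {"name": "app"}, "spec": {"nodeName": ""}},
    {"metadata": {"name": "db"}, "spec": {"nodeName": "worker-2"}},
    {"metadata": {"name": "cache"}, "spec": {"nodeName": "worker-1"}},
]

# The scheduler's view: pods with no node assigned.
scheduler_view = names(p for p in pods if matches_node(p, ""))

# The view of the kubelet on worker-2: pods bound to that node.
kubelet_view = names(p for p in pods if matches_node(p, "worker-2"))
```

Once a pod is bound, it leaves the scheduler's view and enters exactly one kubelet's view, which is what hands the pod from one component to the next.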
Step 5: kubelet drives container creation
The kubelet calls containerd via the CRI (gRPC):
1. ImageService.PullImage("ghcr.io/acme/app:sha-abc") → if not in image cache
2. RuntimeService.RunPodSandbox(config) → creates pause container
└─ CNI plugin called here (sets up veth pair, assigns IP)
3. RuntimeService.CreateContainer(sandboxID, containerConfig)
└─ for each container in pod.spec.containers
4. RuntimeService.StartContainer(containerID)
└─ entrypoint process started
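The call sequence above can be sketched against an in-memory stand-in for the runtime. `FakeRuntime` and `sync_pod` are hypothetical names, and the code mirrors the numbered order listed above rather than the kubelet's actual implementation, which may order or interleave these calls differently.

```python
class FakeRuntime:
    """In-memory stand-in for a CRI runtime; records calls only."""

    def __init__(self, cached_images=()):
        self.images = set(cached_images)
        self.calls = []
        self._next_id = 0

    def _new_id(self, prefix):
        self._next_id += 1
        return f"{prefix}-{self._next_id}"

    def pull_image(self, image):
        self.calls.append(("PullImage", image))
        self.images.add(image)

    def run_pod_sandbox(self, pod_name):
        self.calls.append(("RunPodSandbox", pod_name))
        return self._new_id("sandbox")

    def create_container(self, sandbox_id, name, image):
        self.calls.append(("CreateContainer", name))
        return self._new_id("ctr")

    def start_container(self, container_id):
        self.calls.append(("StartContainer", container_id))


def sync_pod(runtime, pod):
    """Drive the runtime through the four steps; return started container IDs."""
    containers = pod["spec"]["containers"]
    # 1. Pull only images that are not already cached.
    for c in containers:
        if c["image"] not in runtime.images:
            runtime.pull_image(c["image"])
    # 2. One sandbox per pod.
    sandbox_id = runtime.run_pod_sandbox(pod["metadata"]["name"])
    # 3 and 4. Create, then start, each container inside the sandbox.
    started = []
    for c in containers:
        container_id = runtime.create_container(sandbox_id, c["name"], c["image"])
        runtime.start_container(container_id)
        started.append(container_id)
    return started


runtime = FakeRuntime(cached_images=("envoy:v1",))
pod = {
    "metadata": {"name": "app"},
    "spec": {"containers": [
        {"name": "app", "image": "ghcr.io/acme/app:sha-abc"},
        {"name": "sidecar", "image": "envoy:v1"},
    ]},
}
started = sync_pod(runtime, pod)
```

The sidecar image is already cached, so only one pull is recorded, and both containers share the single sandbox created in step 2.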
Step 6: kubelet patches pod status
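Once the containers are running, the kubelet reports phase Running; the ContainersReady and Ready conditions turn True only after the containers pass their startupProbe/readinessProbe: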
kubelet → PATCH /api/v1/namespaces/default/pods/<name>/status
{
"status": {
"phase": "Running",
"podIP": "10.0.1.5",
"containerStatuses": [{"ready": true, "state": {"running": {...}}}],
"conditions": [
{"type": "PodScheduled", "status": "True"},
{"type": "Initialized", "status": "True"},
{"type": "ContainersReady", "status": "True"},
{"type": "Ready", "status": "True"}
]
}
}
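How the phase and the readiness conditions relate to per-container state can be sketched as a small function. This is a simplification under stated assumptions: it ignores the Succeeded and Failed phases, init containers, and readiness gates, and `summarize_status` is a hypothetical helper, not a kubelet function.

```python
def summarize_status(container_statuses):
    """Derive phase and readiness conditions from container statuses."""
    any_running = any("running" in s["state"] for s in container_statuses)
    all_ready = bool(container_statuses) and all(
        s["ready"] for s in container_statuses
    )
    ready = "True" if all_ready else "False"
    return {
        "phase": "Running" if any_running else "Pending",
        "conditions": [
            {"type": "ContainersReady", "status": ready},
            {"type": "Ready", "status": ready},
        ],
    }


# A container that has started but has not yet passed its readiness probe.
starting = summarize_status([{"ready": False, "state": {"running": {}}}])

# The same container after the probe passes.
serving = summarize_status([{"ready": True, "state": {"running": {}}}])
```

The first call yields phase Running with Ready set to False, which is the "Running but not Ready" state: the process is up, but the pod does not yet count as ready to serve.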
Key Timing Characteristics
| Phase | Typical duration | What dominates |
|---|---|---|
| API server + etcd write | 5–50ms | etcd fsync latency |
| Scheduling | 1–100ms | Number of nodes × scoring plugins |
| Image pull (cached) | 1–5ms | Local disk read |
| Image pull (uncached) | 5s–5min | Image size × network bandwidth |
| Container start | 10–500ms | Application startup time |
| Readiness probe pass | seconds to minutes | Application-specific |
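To see why an uncached image pull dominates the table, a rough estimate helps. The sketch below assumes the pull is purely bandwidth-bound (no registry throttling, no decompression cost); the 500 MB image and 100 Mbit/s link are made-up example values, and the per-phase figures are simply the upper bounds from the table.

```python
def pull_seconds(image_mb: float, bandwidth_mbit_s: float) -> float:
    """Lower bound on pull time: megabytes to megabits, divided by link speed."""
    return image_mb * 8 / bandwidth_mbit_s


def budget_ms(phases_ms: dict) -> float:
    """Sum the per-phase latencies of one pod start."""
    return sum(phases_ms.values())


# Upper bounds from the table, cached-image path, excluding readiness.
cached_path = {
    "api_server_and_etcd": 50,
    "scheduling": 100,
    "image_pull_cached": 5,
    "container_start": 500,
}

cached_total_ms = budget_ms(cached_path)
uncached_pull_s = pull_seconds(image_mb=500, bandwidth_mbit_s=100)
```

With these numbers the whole cached path stays under a second, while the uncached pull alone takes about 40 seconds, so image caching matters far more than API server or scheduler latency.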
What Can Go Wrong
| Symptom | Phase | Likely cause |
|---|---|---|
| Pending, no events | Post-write | Scheduler not running |
| Pending: Unschedulable | Scheduling | No node passes filter (resource, affinity, taint) |
| ContainerCreating | Post-bind | Image pull / CNI failure |
| ImagePullBackOff | Image pull | Auth failure, registry unreachable, tag not found |
| CrashLoopBackOff | Post-start | App exits immediately (check logs --previous) |
| Running but not Ready | Post-start | Readiness probe failing |
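The table can be read as a decision procedure over the pod object. A minimal sketch, assuming the pod is available as a dict shaped like the API's JSON; `triage` is a hypothetical helper and covers only the six rows above.

```python
def triage(pod: dict) -> str:
    """Map a pod object to the most likely failing phase from the table."""
    spec = pod.get("spec", {})
    status = pod.get("status", {})

    # Container-level waiting reasons are the most specific signal.
    for cs in status.get("containerStatuses", []):
        reason = cs.get("state", {}).get("waiting", {}).get("reason", "")
        if reason == "ImagePullBackOff":
            return "image pull: auth failure, registry unreachable, or tag not found"
        if reason == "CrashLoopBackOff":
            return "post-start: app exits immediately, check logs --previous"
        if reason == "ContainerCreating":
            return "post-bind: image pull or CNI failure"

    conditions = {c["type"]: c for c in status.get("conditions", [])}
    if conditions.get("PodScheduled", {}).get("reason") == "Unschedulable":
        return "scheduling: no node passes the filter phase"
    if status.get("phase") == "Pending" and not spec.get("nodeName"):
        return "post-write: scheduler may not be running"
    if (status.get("phase") == "Running"
            and conditions.get("Ready", {}).get("status") != "True"):
        return "post-start: readiness probe failing"
    return "no known symptom matched"


unscheduled = {"spec": {}, "status": {"phase": "Pending"}}
result = triage(unscheduled)
```

The checks run from most to least specific, so a pod that is both Pending and reporting ImagePullBackOff is attributed to the image pull rather than to the scheduler.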
Related
- 03 — Admission Flow — the admission chain in step 1
- 04 — Scheduler Flow — scheduler internals in step 3
- 13 — CNI Setup Flow — network namespace wiring in step 5
- 14 — Volume Attach Flow — PVC attachment before step 5