Pod Lifecycle
Phases, conditions, probes, init containers, graceful termination, and restart policies
Every pod moves through a defined lifecycle from creation to termination. Understanding the exact semantics of each phase, condition, container state, and probe, and how they interact with controllers, load balancers, and autoscalers, is the foundation for building operationally reliable workloads. This page covers the complete lifecycle model that underpins everything from rolling updates to graceful shutdowns.
Lifecycle Overview
Pod Phases
A pod's status.phase is a high-level summary of where in its lifecycle the pod is. It is a single string set by the kubelet.
| Phase | Meaning | Containers running? |
|---|---|---|
| Pending | Pod accepted by the cluster but not yet running. Covers: waiting for scheduling, image pulling, init containers running, PVC binding. | No (or only init containers) |
| Running | Pod bound to a node and at least one container is running (or starting/restarting). | Yes (at least one) |
| Succeeded | All containers exited with code 0 and will not be restarted. Terminal state. | No |
| Failed | All containers have terminated; at least one exited with non-zero code or was killed. Terminal state. | No |
| Unknown | Pod state cannot be determined, typically because the node hosting it stopped reporting to the control plane. | Unknown |
A pod in the Running phase may have containers in CrashLoopBackOff or failing readiness probes. Running only means at least one container is running or being restarted. For an accurate health assessment, read status.conditions (especially Ready) and status.containerStatuses.
Pod Conditions
Conditions are structured status fields that provide granular lifecycle state. Each condition has type, status (True/False/Unknown), reason, and message.
| Condition type | True when | Gate for |
|---|---|---|
| PodScheduled | Pod has been assigned to a node | Kubelet begins pulling images and starting init containers |
| Initialized | All init containers have completed successfully | App containers start |
| ContainersReady | All app containers are ready (passing readiness probes) | Contributes to pod Ready condition |
| Ready | Pod is able to serve requests: ContainersReady=True AND all readinessGates satisfied | Pod added to Service Endpoints / EndpointSlices |
| DisruptionTarget | Pod is being terminated due to a voluntary disruption (drain, preemption, eviction) | Informs podFailurePolicy in Jobs (Ignore action) |
| PodReadyToStartContainers | Sandbox created and network configured (1.29+) | Init containers may start |
# View all pod conditions
kubectl get pod <pod> -o jsonpath='{.status.conditions}' | jq .
# Quick condition summary
kubectl describe pod <pod> | grep -A15 "^Conditions:"
# Find all pods not Ready in a namespace
# (uses jq because nested filters are unreliable in kubectl's JSONPath)
kubectl get pods -n <namespace> -o json \
  | jq -r '.items[] | select(any(.status.conditions[]?; .type == "Ready" and .status == "True") | not) | .metadata.name'
Container States
Each container within a pod has an independent state captured in status.containerStatuses[].state.
| State | Meaning | Key sub-fields |
|---|---|---|
| Waiting | Container not yet running: waiting for image pull, init containers, or backoff | reason: ContainerCreating, ImagePullBackOff, ErrImagePull, CrashLoopBackOff, CreateContainerConfigError |
| Running | Container executing | startedAt: timestamp when container started |
| Terminated | Container finished (success or failure) | exitCode, reason (Completed, OOMKilled, Error, ContainerCannotRun), startedAt, finishedAt |
Common Container Reason Codes
| Reason | State | Cause |
|---|---|---|
| Completed | Terminated | Exit code 0: normal successful exit |
| OOMKilled | Terminated | Exit code 137: exceeded memory limit |
| Error | Terminated | Non-zero exit code (app error) |
| ContainerCannotRun | Terminated | Container runtime failed to start the container (bad entrypoint, permission error) |
| DeadlineExceeded | Terminated | activeDeadlineSeconds expired (surfaced as the pod-level status.reason rather than a per-container reason) |
| CrashLoopBackOff | Waiting | Container repeatedly failing: kubelet backing off restarts |
| ImagePullBackOff | Waiting | Container image cannot be pulled: auth failure, image not found |
| CreateContainerConfigError | Waiting | Referenced Secret or ConfigMap does not exist |
# Get current and last container state
kubectl get pod <pod> -o jsonpath='{range .status.containerStatuses[*]}{.name}: state={.state}, lastState={.lastState}{"\n"}{end}'
# Check exit code and reason
kubectl get pod <pod> -o jsonpath='{.status.containerStatuses[0].lastState.terminated}'
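The exit code returned by the command above can be decoded by hand. The following is a minimal sketch, assuming the common convention that an exit code above 128 means the process was killed by signal (code - 128); `describe_exit_code` is a hypothetical helper name, not part of kubectl.

```shell
#!/bin/sh
# Hypothetical helper: describe a container exit code.
# Assumes the usual convention: exit codes above 128 mean
# "terminated by signal (code - 128)".
describe_exit_code() {
  code="$1"
  if [ "$code" -eq 0 ]; then
    echo "Completed"
  elif [ "$code" -gt 128 ]; then
    echo "killed by signal $((code - 128))"
  else
    echo "application error (exit $code)"
  fi
}

describe_exit_code 137   # signal 9 (SIGKILL), the code reported with OOMKilled
describe_exit_code 143   # signal 15 (SIGTERM)
```

This is why OOMKilled appears with exit code 137: the kernel kills the process with SIGKILL (signal 9), and 128 + 9 = 137.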
Container Restart Backoff
When a container fails and restartPolicy allows restart, the kubelet uses exponential backoff before restarting: 10s → 20s → 40s → 80s → 160s → 300s (capped). The pod shows CrashLoopBackOff in Waiting state during the backoff window.
A pod with a container in CrashLoopBackOff is still in the Running phase (the kubelet is actively managing it). The pod will never enter the Failed phase while the kubelet is retrying. To terminate the retry loop, either fix the container or set restartPolicy: Never.
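The backoff schedule described above can be reproduced with a few lines of arithmetic. This is only an illustrative sketch of the doubling-with-cap rule (start at 10s, double each time, cap at 300s); `backoff_delay` is a hypothetical name and this is not the kubelet's actual implementation.

```shell
#!/bin/sh
# Hypothetical helper: delay in seconds before the Nth restart attempt,
# following the schedule in the text (10s, doubling, capped at 300s).
backoff_delay() {
  n="$1"        # 1 = first backoff
  delay=10
  i=1
  while [ "$i" -lt "$n" ] && [ "$delay" -lt 300 ]; do
    delay=$((delay * 2))
    i=$((i + 1))
  done
  if [ "$delay" -gt 300 ]; then
    delay=300   # cap
  fi
  echo "$delay"
}

# Print the delays for the first seven restart attempts:
# 10, 20, 40, 80, 160, 300, 300
for n in 1 2 3 4 5 6 7; do
  echo "restart $n: $(backoff_delay "$n")s"
done
```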
restartPolicy
| Policy | Restart on failure? | Restart on success? | Use case |
|---|---|---|---|
| Always (default) | Yes | Yes: container is always restarted when it exits | Long-running services (Deployments, DaemonSets) |
| OnFailure | Yes (non-zero exit) | No: container left in Terminated state on exit 0 | Jobs where success = done, failure = retry |
| Never | No | No: pod enters Succeeded/Failed immediately | One-shot tasks, migration Jobs |
Resulting pod phase for each restartPolicy:
| restartPolicy | All exit 0 | Any exit non-0 |
|---|---|---|
| Always | Running (restarting) | Running (restarting) |
| OnFailure | Succeeded | Running (restarting); for Job pods, until backoffLimit is reached |
| Never | Succeeded | Failed |
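The restart decision in the tables above reduces to a small lookup on the policy and the container's exit code. The sketch below encodes just that rule; `should_restart` is a hypothetical helper for illustration and ignores backoff timing and Job-level limits.

```shell
#!/bin/sh
# Hypothetical helper: would a container be restarted, given the pod's
# restartPolicy and the container's exit code?
should_restart() {
  policy="$1"
  exit_code="$2"
  case "$policy" in
    Always)
      echo yes ;;
    OnFailure)
      if [ "$exit_code" -ne 0 ]; then echo yes; else echo no; fi ;;
    Never)
      echo no ;;
    *)
      echo "unknown restartPolicy: $policy" >&2
      return 1 ;;
  esac
}

should_restart Always 0      # yes
should_restart OnFailure 0   # no
should_restart OnFailure 1   # yes
should_restart Never 1       # no
```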
Init Containers
Init containers run sequentially before app containers start. Each must complete successfully before the next begins. If an init container fails, the kubelet retries it according to the pod's restartPolicy until it succeeds; with restartPolicy: Never, the pod is marked Failed instead.
spec:
  initContainers:
    # Wait for database to be ready
    - name: wait-for-db
      image: busybox:1.36
      command: ['sh', '-c',
        'until nc -z postgres-svc 5432; do echo waiting; sleep 2; done']
    # Run schema migration
    - name: migrate
      image: myapp:v3
      command: ['./migrate', '--direction=up']
      env:
        - name: DATABASE_URL
          valueFrom:
            secretKeyRef:
              name: db-credentials
              key: url
  containers:
    - name: app
      image: myapp:v3
Regular init containers do not support livenessProbe, readinessProbe, startupProbe, or lifecycle hooks; those fields are only accepted on native sidecars (init containers with restartPolicy: Always). An init container that hangs indefinitely will block the pod from starting, so set activeDeadlineSeconds on the pod or design init containers to fail fast.
Native Sidecar Containers (Stable 1.33)
A sidecar is an init container with restartPolicy: Always. It starts before app containers (like other init containers) but keeps running alongside them, and is shut down gracefully after app containers exit.
spec:
  initContainers:
    # Native sidecar: starts first, stays running alongside app
    - name: log-forwarder
      image: fluent/fluent-bit:3.0
      restartPolicy: Always   # makes this a sidecar container
      resources:
        requests:
          cpu: 50m
          memory: 64Mi
      volumeMounts:
        - name: varlog
          mountPath: /var/log/app
    - name: envoy-proxy
      image: envoyproxy/envoy:v1.28
      restartPolicy: Always   # sidecar
      ports:
        - containerPort: 15001
  containers:
    - name: app
      image: myapp:v3
Before native sidecars, running Istio or Datadog alongside a Job prevented the Job from completing (the sidecar kept running after the worker exited). With restartPolicy: Always on the sidecar init container, Kubernetes sends SIGTERM to sidecars when the main container exits, enabling clean Job completion. See Jobs page.
Probes
The kubelet runs three types of probes against containers to determine their health and readiness. They are independent: each has its own timing, threshold, and action when it fails.
| Probe | Failure action | Success action | Purpose |
|---|---|---|---|
| startupProbe | After failureThreshold failures: container killed and restarted | Probe disabled permanently; liveness + readiness activate | One-time startup gate for slow-starting containers (JVM, model loading) |
| livenessProbe | After failureThreshold failures: container killed and restarted | No action | Detect hung/deadlocked containers that should be restarted |
| readinessProbe | Pod condition Ready=False → removed from Service Endpoints | Pod condition Ready=True → added back to Endpoints | Signal when container is ready to receive traffic |
Probe Mechanisms
# 1. httpGet: HTTP GET request; success = 200-399 response
livenessProbe:
  httpGet:
    path: /healthz
    port: 8080
    httpHeaders:
      - name: Custom-Header
        value: probe
  initialDelaySeconds: 10
  periodSeconds: 10
  failureThreshold: 3
  successThreshold: 1
  timeoutSeconds: 5
# 2. exec: run command in container; success = exit code 0
readinessProbe:
  exec:
    command: ["/bin/sh", "-c", "redis-cli ping | grep PONG"]
  initialDelaySeconds: 5
  periodSeconds: 5
  failureThreshold: 3
# 3. tcpSocket: TCP connection attempt; success = connection established
livenessProbe:
  tcpSocket:
    port: 5432
  initialDelaySeconds: 15
  periodSeconds: 20
# 4. grpc: gRPC Health Check Protocol (v1); success = SERVING status
readinessProbe:
  grpc:
    port: 9090
    service: "liveness"   # gRPC service name (optional)
  initialDelaySeconds: 10
  periodSeconds: 5
Probe Timing Fields
| Field | Default | Meaning |
|---|---|---|
| initialDelaySeconds | 0 | Seconds to wait after container start before first probe. Use for slow-starting apps if not using startupProbe. |
| periodSeconds | 10 | How often to run the probe (minimum 1s) |
| timeoutSeconds | 1 | Seconds to wait for probe response before counting as failure |
| successThreshold | 1 | Minimum consecutive successes to mark probe as passing (must be 1 for liveness/startup) |
| failureThreshold | 3 | Consecutive failures before action is taken (restart for liveness/startup; remove from endpoints for readiness) |
| terminationGracePeriodSeconds | pod-level default | Override for liveness/startup probe: grace period before SIGKILL after probe-triggered kill |
startupProbe: Extending Slow-Start Grace
# Pattern: startupProbe gives up to 5 minutes for startup,
# then hands off to liveness with a 30s timeout window
startupProbe:
  httpGet:
    path: /ready
    port: 8080
  failureThreshold: 30   # 30 failures × 10s period = 5 minute startup window
  periodSeconds: 10
  timeoutSeconds: 5
livenessProbe:
  httpGet:
    path: /healthz
    port: 8080
  failureThreshold: 3    # 3 failures × 10s = 30s to detect hung state
  periodSeconds: 10
readinessProbe:
  httpGet:
    path: /ready
    port: 8080
  failureThreshold: 2
  periodSeconds: 5
  successThreshold: 1
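The windows quoted in the comments above come from multiplying failureThreshold by periodSeconds. As a rough sketch of that arithmetic (it deliberately ignores initialDelaySeconds and timeoutSeconds, so real-world windows can be somewhat longer), with `probe_window` being a hypothetical helper name:

```shell
#!/bin/sh
# Hypothetical helper: approximate window, in seconds, before a probe's
# failure action fires: failureThreshold * periodSeconds.
probe_window() {
  failure_threshold="$1"
  period_seconds="$2"
  echo $((failure_threshold * period_seconds))
}

echo "startup window:      $(probe_window 30 10)s"   # 300s = 5 minutes
echo "liveness detection:  $(probe_window 3 10)s"    # 30s
echo "readiness detection: $(probe_window 2 5)s"     # 10s
```

Plugging in your own worst-case startup time makes it easy to check whether a chosen failureThreshold is large enough before deploying.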
A liveness probe that calls an upstream service (database, cache, external API) will restart the container when the dependency is down, not when the container itself is broken. This causes cascading restarts across all replicas simultaneously when a shared dependency degrades. Liveness probes must only check the container's own internal health (goroutine liveness, heap state, internal deadlock detection). Dependency health belongs in readiness probes.
Graceful Termination
When a pod is deleted, Kubernetes executes a structured termination sequence designed to drain in-flight requests before killing the process.
spec:
  terminationGracePeriodSeconds: 60   # Total time before SIGKILL (default: 30)
  containers:
    - name: app
      lifecycle:
        preStop:
          # Sleep to absorb kube-proxy iptables update propagation (1-5s typical)
          exec:
            command: ["/bin/sh", "-c", "sleep 10"]
          # OR: call a shutdown endpoint
          # httpGet:
          #   path: /shutdown
          #   port: 8080
When a pod is deleted, the Endpoints controller removes the pod from the EndpointSlice, but kube-proxy on each node needs time to update iptables/ipvs rules. New connections may still route to the terminating pod for 1 to 5 seconds after SIGTERM arrives. The canonical fix is a preStop sleep of 5 seconds (or longer), which delays SIGTERM delivery, giving kube-proxy time to drain the pod from the load balancer before the application starts refusing new connections.
# Production-grade graceful termination config
spec:
  terminationGracePeriodSeconds: 90   # preStop(10s) + drain(60s) + buffer(20s)
  containers:
    - name: app
      lifecycle:
        preStop:
          exec:
            command: ["/bin/sh", "-c", "sleep 10"]   # Wait for LB drain
        postStart:
          # Run after container starts (before readiness probe)
          exec:
            command: ["/bin/sh", "-c", "./scripts/warm-cache.sh"]
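The grace-period budget in the config above (preStop + drain + buffer) can be sanity-checked before rollout. The following is a minimal sketch under the assumption that the three components simply add up; `grace_period_ok` is a hypothetical helper, and the numbers are the ones from the example.

```shell
#!/bin/sh
# Hypothetical helper: check that terminationGracePeriodSeconds covers
# preStop duration + max request drain time + safety buffer.
# Usage: grace_period_ok <gracePeriod> <preStop> <drain> <buffer>
grace_period_ok() {
  needed=$(( $2 + $3 + $4 ))
  if [ "$needed" -le "$1" ]; then
    echo "ok: need ${needed}s of ${1}s"
  else
    echo "too short: need ${needed}s, have ${1}s"
    return 1
  fi
}

# Values from the production-grade example: 10s + 60s + 20s within 90s
grace_period_ok 90 10 60 20
```

With the default of 30 seconds, the same workload would be reported as too short, which is the situation where in-flight requests get cut off by SIGKILL.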
Lifecycle Hooks
| Hook | When it runs | Blocks? | Failure behavior |
|---|---|---|---|
| postStart | After container is created and started (not guaranteed to run before entrypoint) | Yes: container stays in ContainerCreating until hook completes or times out | Container killed and restarted |
| preStop | Before container receives SIGTERM (executed synchronously) | Yes: SIGTERM is delayed until hook completes (within grace period) | Container is still killed; hook failure is logged (FailedPreStopHook event) but not propagated |
The postStart hook runs concurrently with the container's entrypoint: there is no guaranteed ordering. For initialization that must complete before the application serves traffic, use an init container instead. Use postStart only for best-effort work (cache warming, metric registration) that doesn't block application startup.
lifecycle:
  postStart:
    httpGet:   # Register with service discovery after start
      path: /register
      port: 8500   # Consul agent port
      host: localhost
  preStop:
    httpGet:   # Deregister before shutdown
      path: /deregister
      port: 8500
      host: localhost
Pod Readiness Gates
Readiness gates (stable since 1.14) add custom conditions to the pod's readiness calculation. A pod is considered Ready only if all container readiness probes pass AND all readiness gate conditions are True.
spec:
  readinessGates:
    - conditionType: "feature-gates.example.com/canary-approved"
    - conditionType: "load-balancer.example.com/in-pool"
# An external controller must set these conditions on the pod's status subresource.
# Manual equivalent (kubectl 1.24+; strategic merge keeps the other conditions intact):
# kubectl patch pod <pod> --subresource=status --type=strategic -p '{"status":{"conditions":[
#   {"type":"feature-gates.example.com/canary-approved","status":"True"}
# ]}}'
Readiness gates enable external systems to gate traffic routing independently of container health:
- Canary gates: progressive delivery controllers hold a pod out of rotation until analysis approves
- Load balancer registration: cloud LB controller signals when the pod is registered in the target group
- Warm-up gates: custom warm-up controller marks pod ready only after cache pre-population
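The readiness calculation described above is a plain logical AND over ContainersReady and every readiness gate condition. The sketch below illustrates that rule only; `pod_ready` is a hypothetical helper, and a gate whose condition has not been set yet should be passed as anything other than True (for example False), since an absent condition does not satisfy the gate.

```shell
#!/bin/sh
# Hypothetical helper: compute the pod Ready condition.
# $1 = ContainersReady status; remaining args = readiness gate statuses.
# Ready is True only if every input is exactly "True".
pod_ready() {
  for status in "$@"; do
    if [ "$status" != "True" ]; then
      echo "False"
      return 0
    fi
  done
  echo "True"
}

pod_ready True True True      # containers ready, both gates approved
pod_ready True True False     # one gate still pending, pod stays out of rotation
```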
Pod Deletion vs Force Delete
# Normal deletion: graceful (uses terminationGracePeriodSeconds)
kubectl delete pod <pod> -n <namespace>
# Override grace period (useful when pod is stuck terminating)
kubectl delete pod <pod> -n <namespace> --grace-period=5
# Force delete: immediately removes from API server WITHOUT waiting for kubelet confirmation
# WARNING: The pod process may still be running on the node
kubectl delete pod <pod> -n <namespace> --grace-period=0 --force
# Force delete is dangerous for StatefulSets:
# The pod identity may be recreated before the old pod has fully stopped,
# causing two pods with the same identity (split-brain, data corruption)
Force-deleting a StatefulSet pod removes it from the API immediately, allowing a new pod with the same identity to start. If the old pod is still running (kubelet temporarily unreachable, not crashed), two pods will have the same network identity and claim the same PVC, causing data corruption in databases and split-brain in consensus systems. Only force-delete when you have confirmed the node is truly dead and the old pod cannot be running.
Complete Lifecycle Configuration
apiVersion: v1
kind: Pod
metadata:
  name: myapp   # hypothetical name; a name is required for the manifest to apply
spec:
  terminationGracePeriodSeconds: 90
  # Readiness gate from external controller
  readinessGates:
    - conditionType: "platform.example.com/warmed-up"
  initContainers:
    # Native sidecar (runs alongside app containers)
    - name: log-agent
      image: fluent/fluent-bit:3.0
      restartPolicy: Always
    # Classic init container (runs before app, then exits)
    - name: db-migrate
      image: myapp:v3
      command: ["./migrate", "--up"]
  containers:
    - name: app
      image: myapp:v3
      ports:
        - containerPort: 8080
      # Startup: up to 5 min for JVM to warm up
      startupProbe:
        httpGet:
          path: /actuator/health
          port: 8080
        failureThreshold: 30
        periodSeconds: 10
      # Liveness: internal deadlock detection only
      livenessProbe:
        httpGet:
          path: /actuator/health/liveness
          port: 8080
        periodSeconds: 10
        failureThreshold: 3
      # Readiness: ready to accept traffic
      readinessProbe:
        httpGet:
          path: /actuator/health/readiness
          port: 8080
        periodSeconds: 5
        failureThreshold: 2
        successThreshold: 1
      # Lifecycle hooks
      lifecycle:
        postStart:
          exec:
            command: ["/bin/sh", "-c", "echo started >> /tmp/lifecycle.log"]
        preStop:
          exec:
            command: ["/bin/sh", "-c", "sleep 10 && /app/shutdown.sh"]
      resources:
        requests:
          cpu: 500m
          memory: 1Gi
        limits:
          cpu: 2
          memory: 2Gi
Metrics
| Metric | Labels | Use |
|---|---|---|
| kube_pod_status_phase | phase, pod, namespace | Count pods in each phase; watch for rising Pending/Failed |
| kube_pod_container_status_restarts_total | container, pod, namespace | Cumulative restarts; a high rate indicates CrashLoopBackOff |
| kube_pod_container_status_ready | container, pod | Binary readiness: 0 = not ready (removed from endpoints) |
| kube_pod_status_ready | condition, pod, namespace | Pod-level Ready condition |
| kubelet_pod_start_duration_seconds | quantile | Pod start latency, including image pull + init containers |
Alerting Rules
groups:
  - name: pod-lifecycle
    rules:
      # Container restart rate high (CrashLoopBackOff indicator)
      - alert: ContainerHighRestartRate
        expr: |
          rate(kube_pod_container_status_restarts_total[15m]) * 60 > 0.1
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "{{ $labels.namespace }}/{{ $labels.pod }}/{{ $labels.container }} restarting >6/hr"
      # Pod stuck in Pending for > 5 minutes
      - alert: PodLongPending
        expr: kube_pod_status_phase{phase="Pending"} == 1
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Pod {{ $labels.namespace }}/{{ $labels.pod }} stuck in Pending"
      # Pod stuck in Failed phase
      - alert: PodFailed
        expr: kube_pod_status_phase{phase="Failed"} == 1
        for: 0m
        labels:
          severity: warning
        annotations:
          summary: "Pod {{ $labels.namespace }}/{{ $labels.pod }} is in Failed phase"
      # Container not ready for extended period
      - alert: ContainerNotReady
        expr: kube_pod_container_status_ready == 0
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "Container {{ $labels.container }} in {{ $labels.namespace }}/{{ $labels.pod }} not ready for 10m"
Runbooks
Pod Stuck in Pending
# Check events for scheduling failure reason
kubectl describe pod <pod> -n <namespace> | grep -A20 Events
# Common: insufficient resources (check node allocatable)
kubectl describe nodes | grep -A5 "Allocatable:\|Allocated resources:"
# Common: image pull error (check image name and registry credentials)
kubectl get pod <pod> -o jsonpath='{.spec.containers[*].image}'
kubectl get events -n <namespace> --field-selector reason=Failed | grep ImagePull
# Common: PVC not bound (check StorageClass and PV availability)
kubectl get pvc -n <namespace>
kubectl describe pvc <pvc-name> -n <namespace>
Container in CrashLoopBackOff
# Get current logs (if container is running)
kubectl logs <pod> -n <namespace> -c <container>
# Get logs from previous container instance
kubectl logs <pod> -n <namespace> -c <container> --previous
# Get exit code and reason
kubectl describe pod <pod> -n <namespace> | grep -A5 "Last State"
# Common: OOMKilled (increase memory limit)
kubectl set resources deployment <name> --limits=memory=2Gi -n <namespace>
# Common: bad entrypoint (check image CMD/entrypoint)
kubectl get pod <pod> -o jsonpath='{.spec.containers[*].command} {.spec.containers[*].args}'
# Common: missing Secret/ConfigMap (CreateContainerConfigError)
kubectl get pod <pod> -o yaml | grep -A5 "envFrom\|secretRef\|configMapRef"
kubectl get secret <secret-name> -n <namespace>
Pod Not Becoming Ready (Readiness Probe Failing)
# Check readiness probe config
kubectl describe pod <pod> -n <namespace> | grep -A10 "Readiness:"
# Run readiness check manually inside the container
kubectl exec <pod> -n <namespace> -- curl -s http://localhost:8080/ready
# Check if app is actually listening
kubectl exec <pod> -n <namespace> -- ss -tlnp
# Check if it's a readiness gate blocking (not the container probe)
kubectl get pod <pod> -n <namespace> -o jsonpath='{.status.conditions}' | \
jq '.[] | select(.type != "Ready" and .status != "True")'
Pod Stuck in Terminating
# Check if finalizers are blocking deletion
kubectl get pod <pod> -n <namespace> -o jsonpath='{.metadata.finalizers}'
# Remove finalizer (if stuck)
kubectl patch pod <pod> -n <namespace> -p '{"metadata":{"finalizers":[]}}' --type=merge
# Check if preStop hook is hanging
kubectl describe pod <pod> -n <namespace> | grep -A5 "PreStop\|terminationGrace"
# Force delete if node is dead (StatefulSet: confirm old pod is truly gone first)
kubectl delete pod <pod> -n <namespace> --grace-period=0 --force
Liveness Probe Causing Cascading Restarts
# Check restart pattern: are all replicas restarting simultaneously?
# (the column spec is quoted so the shell does not try to glob the [0])
kubectl get pods -n <namespace> -l app=<app> \
  -o 'custom-columns=NAME:.metadata.name,RESTARTS:.status.containerStatuses[0].restartCount,AGE:.status.startTime'
# Check liveness probe config
kubectl describe pods -n <namespace> -l app=<app> | grep -A10 "Liveness:"
# If probe checks external dependency, change to internal health only
# Temporary mitigation: increase failureThreshold to reduce restart rate
kubectl patch deployment <name> -n <namespace> --type=json -p='[{
"op":"replace",
"path":"/spec/template/spec/containers/0/livenessProbe/failureThreshold",
"value":10
}]'
Best Practices
- Use startupProbe for slow-starting applications: set failureThreshold × periodSeconds to cover your worst-case startup time. This prevents liveness probe failures during startup without requiring a dangerously high initialDelaySeconds on the liveness probe.
- Liveness probes must check only internal state: never check database connectivity, upstream services, or external dependencies in a liveness probe. Failures cascade across all replicas simultaneously when a shared dependency degrades. Dependency health belongs in readiness probes.
- Add a preStop sleep to absorb kube-proxy propagation delay: preStop: exec: sleep 5 (or 10 for high-traffic services) gives kube-proxy time to update iptables before the application stops accepting connections, preventing connection resets on in-flight requests.
- Set terminationGracePeriodSeconds to cover preStop + drain time. Formula: preStop duration + max request duration + buffer ≤ terminationGracePeriodSeconds. The default 30s is often too short for services with long-running requests.
- Use native sidecar containers (1.33) for Jobs with injected sidecars: Istio, Datadog, and Vault Agent injected sidecars previously blocked Job completion. Upgrading to native sidecars (restartPolicy: Always in initContainers) resolves this cleanly.
- Set readinessProbe.successThreshold: 1 for all non-startup probes: the default is 1, but higher values require the probe to pass multiple consecutive times before marking the pod ready, which slows down rolling updates. Only increase if you need debouncing.
- Never force-delete StatefulSet pods without confirming the node is truly dead: force deletion removes the pod from the API immediately, allowing a replacement to start with the same identity. If the original pod is still running (temporary network partition), two pods will share the same PVC and network identity, causing data corruption.
- Use readiness gates for external traffic management coordination: if a progressive delivery controller (Argo Rollouts, Flagger) or cloud load balancer controller needs to signal readiness independently of container health, readiness gates provide the correct integration point without hacking probes.