Deployments
Deployment → ReplicaSet → Pod Ownership Chain
A Deployment never creates pods directly. Instead, the Deployment controller creates and manages ReplicaSets. Each distinct pod template version gets its own ReplicaSet. During a rolling update, the Deployment controller scales up the new ReplicaSet and scales down the old one simultaneously — the old ReplicaSet is retained (scaled to zero) for rollback purposes.
Full Deployment Spec
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-app
  namespace: production
  labels:
    app: web-app
  annotations:
    deployment.kubernetes.io/revision: "7"   # auto-managed; do not set manually
spec:
  replicas: 4
  # Selector is IMMUTABLE after creation
  selector:
    matchLabels:
      app: web-app                # must be a subset of template.metadata.labels
  # Revision history: number of old ReplicaSets to retain (default 10)
  revisionHistoryLimit: 5
  # Rollout is marked failed if it makes no progress for this many seconds (default 600)
  progressDeadlineSeconds: 300
  # Minimum seconds a new pod must be ready (all probes passing) before
  # it counts as "available"; acts as soak time (default 0)
  minReadySeconds: 10
  strategy:
    type: RollingUpdate           # RollingUpdate (default) | Recreate
    rollingUpdate:
      maxSurge: 1                 # max pods ABOVE desired during update (absolute or %)
      maxUnavailable: 0           # max pods BELOW desired during update (absolute or %)
      # maxSurge: 25%             # percentage form: ceil(replicas * 0.25)
      # maxUnavailable: 25%       # percentage form: floor(replicas * 0.25)
      # Both cannot be 0 simultaneously
  template:
    metadata:
      labels:
        app: web-app              # must satisfy selector.matchLabels
        version: v2.1.0           # informational; changing this triggers rollout
      annotations:
        prometheus.io/scrape: "true"
        prometheus.io/port: "9090"
    spec:
      terminationGracePeriodSeconds: 60
      containers:
        - name: web-app
          image: myregistry/web-app:v2.1.0@sha256:abc123...   # digest pin
          ports:
            - containerPort: 8080
          resources:
            requests:
              cpu: "250m"
              memory: "256Mi"
            limits:
              memory: "512Mi"
          readinessProbe:         # REQUIRED for zero-downtime rolling updates
            httpGet:
              path: /ready
              port: 8080
            periodSeconds: 5
            failureThreshold: 3
          lifecycle:
            preStop:
              exec:
                command: ["sh", "-c", "sleep 5"]   # drain race prevention
RollingUpdate Strategy
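RollingUpdate replaces pods incrementally so that the application remains available throughout the update. The speed and safety of the rollout are controlled by two parameters that bound the pod count from opposite sides: maxSurge caps how far it may rise above the desired count, and maxUnavailable caps how far available pods may fall below it.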
| Parameter | Controls | Default | Effect of Higher Value |
|---|---|---|---|
| maxSurge | Max pods above desired during update | 25% | Faster rollout (more new pods at once), higher temporary resource usage |
| maxUnavailable | Max pods below desired that can be unavailable | 25% | Faster rollout (removes old pods faster), lower availability during update |
Step-by-Step Sequence (replicas=4, maxSurge=1, maxUnavailable=0)
maxSurge=25% and maxUnavailable=25% (defaults, replicas=8)
Special Cases and Edge Cases
| Configuration | Behavior | Use Case |
|---|---|---|
| maxSurge=1, maxUnavailable=0 | Always at full capacity; one-at-a-time replacement; slowest but safest | Production APIs where capacity must never drop |
| maxSurge=0, maxUnavailable=1 | No extra capacity needed (good for resource-constrained clusters); one pod terminated before new one starts | Resource-constrained environments; acceptable brief capacity reduction |
| maxSurge=100%, maxUnavailable=0 | Doubles pod count temporarily; all new pods created before any old ones deleted | Maximum speed with guaranteed capacity; needs 2× resources |
| maxSurge=0, maxUnavailable=100% | Effectively Recreate: all old pods deleted before new ones start | Avoid; same as Recreate but less explicit |
| replicas=1, maxSurge=0, maxUnavailable=1 | Brief downtime (old pod terminated, new started); effectively Recreate for single replicas | Dev environments; single-replica workloads that can afford brief downtime |
maxSurge percentages are rounded up (ceiling). maxUnavailable percentages are rounded down (floor). For replicas=3, maxSurge=33%: ceil(3×0.33)=1. For replicas=3, maxUnavailable=33%: floor(3×0.33)=0. This means that with 3 replicas and both values set to 33%, you effectively get maxSurge=1, maxUnavailable=0, which is the safe profile.
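The rounding rules above can be checked with a few lines of arithmetic. The sketch below is illustrative only: the function names `resolve_percent` and `rollout_bounds` are hypothetical helpers, not part of any Kubernetes client, and the code models just the ceiling/floor rule described here.

```python
def resolve_percent(value, replicas, round_up):
    """Turn an absolute int or an "N%" string into a pod count."""
    if isinstance(value, int):
        return value
    percent = int(value.rstrip("%"))
    scaled = replicas * percent
    # Integer arithmetic avoids floating-point surprises at exact boundaries.
    return -(-scaled // 100) if round_up else scaled // 100


def rollout_bounds(replicas, max_surge="25%", max_unavailable="25%"):
    """Return the resolved limits and the resulting pod-count envelope."""
    surge = resolve_percent(max_surge, replicas, round_up=True)          # ceiling
    unavailable = resolve_percent(max_unavailable, replicas, round_up=False)  # floor
    return {
        "maxSurge": surge,
        "maxUnavailable": unavailable,
        "max_total_pods": replicas + surge,
        "min_available_pods": replicas - unavailable,
    }


if __name__ == "__main__":
    print(rollout_bounds(3, "33%", "33%"))  # surge 1, unavailable 0
    print(rollout_bounds(8))                # defaults: surge 2, unavailable 2
    print(rollout_bounds(4, 1, 0))          # absolute values pass through
```

For replicas=8 with the 25% defaults this gives at most 10 pods in total and at least 6 available at any point in the rollout; for replicas=4 with maxSurge=1 and maxUnavailable=0 it gives at most 5 pods and never fewer than 4 available.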
Recreate Strategy
Recreate terminates all existing pods before creating new ones. There is a period of zero availability between the old pods terminating and the new pods becoming ready. Use it only when two versions of the app cannot run simultaneously — exclusive database schema migrations, singleton processes, apps that conflict on shared resources.
strategy:
  type: Recreate
  # No rollingUpdate block: Recreate has no parameters
# Timeline:
# 1. All old pods receive SIGTERM simultaneously
# 2. Wait for all old pods to terminate (up to terminationGracePeriodSeconds)
# 3. All new pods created simultaneously
# 4. Wait for all new pods to become Ready
# Downtime = termination time + new pod startup time
minReadySeconds and progressDeadlineSeconds
minReadySeconds — Soak Time
minReadySeconds specifies the minimum number of seconds a newly created pod must be continuously Ready (all readiness probes passing) before it counts as available. The Deployment controller waits this duration before considering the pod stable and proceeding to the next step of the rollout.
spec:
  minReadySeconds: 30   # pod must stay Ready for 30s before it's "available"
# Without this (default 0): a pod that passes readiness probe once then immediately
# crashes might appear available, allowing the rollout to proceed before the issue
# is caught. 30s soak catches flapping pods.
# Rough estimate of total rollout time with minReadySeconds:
# replicas × (pod-start-time + minReadySeconds + periodSeconds) / parallelism
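The rollout-time formula above is easy to evaluate for a concrete configuration. The sketch below is a rough estimate under stated assumptions, not a model of the controller: `parallelism` is taken to be the number of pods replaced per step (roughly maxSurge + maxUnavailable), the batch count is rounded up where the formula divides, and the 20-second pod start time is an invented example value.

```python
import math


def estimate_rollout_seconds(replicas, pod_start_s, min_ready_s,
                             probe_period_s, parallelism):
    """Estimate rollout duration: batches x time for one batch to become available."""
    if parallelism < 1:
        raise ValueError("parallelism must be at least 1")
    batches = math.ceil(replicas / parallelism)
    per_batch = pod_start_s + min_ready_s + probe_period_s
    return batches * per_batch


if __name__ == "__main__":
    # replicas=4, one pod at a time, 30s soak, 5s probe period, assumed 20s start time
    print(estimate_rollout_seconds(4, 20, 30, 5, parallelism=1))  # 220
    # replicas=8 with the 25% defaults (2 surge + 2 unavailable = 4 in parallel)
    print(estimate_rollout_seconds(8, 20, 30, 5, parallelism=4))  # 110
```

Comparing the estimate against progressDeadlineSeconds is a quick sanity check, keeping in mind that the deadline measures time without progress rather than total rollout time.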
progressDeadlineSeconds — Stalled Rollout Detection
If a rollout makes no progress for progressDeadlineSeconds (default 600s), the Deployment is marked with condition Progressing=False, reason=ProgressDeadlineExceeded. The rollout is NOT automatically rolled back — it stalls and waits for operator intervention.
spec:
  progressDeadlineSeconds: 300   # Progressing=False (ProgressDeadlineExceeded) after 5 min of no progress
# Check deployment conditions:
kubectl rollout status deployment/web-app
# Waiting for deployment "web-app" rollout to finish: 1 out of 4 new replicas have been updated...
# error: deployment "web-app" exceeded its progress deadline
kubectl get deployment web-app -o json | jq '.status.conditions'
# [
# {"type":"Available","status":"True",...},
# {"type":"Progressing","status":"False","reason":"ProgressDeadlineExceeded",...}
# ]
| Condition Type | Status | Reason | Meaning |
|---|---|---|---|
| Available | True | MinimumReplicasAvailable | ≥ desired−maxUnavailable pods available |
| Available | False | MinimumReplicasUnavailable | Fewer than minimum required pods available |
| Progressing | True | NewReplicaSetAvailable | Rollout completed successfully |
| Progressing | True | ReplicaSetUpdated | Rollout in progress |
| Progressing | False | ProgressDeadlineExceeded | Rollout stalled; deadline hit |
| ReplicaFailure | True | FailedCreate | ReplicaSet cannot create pods (quota, RBAC, admission) |
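The condition table can be turned into a small classifier, for example in a CI gate that reads the JSON printed by the kubectl command shown earlier. This is a sketch under assumptions: the function name `rollout_state` and the returned labels are invented for illustration, and only the type/status/reason combinations listed in the table are handled.

```python
import json


def rollout_state(conditions):
    """Map a Deployment's status.conditions list to a coarse rollout state."""
    by_type = {c["type"]: c for c in conditions}
    progressing = by_type.get("Progressing", {})
    if by_type.get("ReplicaFailure", {}).get("status") == "True":
        return "replica-failure"
    if (progressing.get("status") == "False"
            and progressing.get("reason") == "ProgressDeadlineExceeded"):
        return "stalled"
    if progressing.get("reason") == "NewReplicaSetAvailable":
        return "complete"
    if progressing.get("status") == "True":
        return "in-progress"
    return "unknown"


if __name__ == "__main__":
    sample = json.loads("""
    [
      {"type": "Available", "status": "True", "reason": "MinimumReplicasAvailable"},
      {"type": "Progressing", "status": "False", "reason": "ProgressDeadlineExceeded"}
    ]
    """)
    print(rollout_state(sample))  # stalled
```

Note that a stalled rollout can still report Available=True, as in the sample: the old pods keep serving while the new ReplicaSet fails to come up.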
Revision History and Rollback
How Revisions Work
Each time the pod template changes, a new revision is created. The Deployment controller assigns increasing revision numbers by annotating the ReplicaSet with deployment.kubernetes.io/revision. Old ReplicaSets (scaled to zero) are kept up to revisionHistoryLimit (default 10). Keeping these is what makes rollback possible.
# Inspect rollout history
kubectl rollout history deployment/web-app
# REVISION CHANGE-CAUSE
# 1 Initial deployment
# 2 Update to v1.1.0
# 3 Update to v1.2.0
# 4 Update to v2.0.0 (current)
# Record change cause (annotates the Deployment object; add to CI pipeline)
kubectl annotate deployment/web-app kubernetes.io/change-cause="Deploy v2.0.0: adds OAuth2 login"
# OR set in manifest:
metadata:
  annotations:
    kubernetes.io/change-cause: "Deploy v2.0.0: adds OAuth2 login"
# Inspect a specific revision
kubectl rollout history deployment/web-app --revision=3
# Shows the full pod template at that revision
# See the underlying ReplicaSets with their revision annotations
kubectl get rs -l app=web-app \
-o custom-columns='NAME:.metadata.name,REVISION:.metadata.annotations.deployment\.kubernetes\.io/revision,REPLICAS:.status.replicas'
Rollback Commands
# Roll back to the previous revision (most common)
kubectl rollout undo deployment/web-app
# Roll back to a specific revision
kubectl rollout undo deployment/web-app --to-revision=2
# Rollback is just a new rollout using the old ReplicaSet's pod template
# It increments the revision number (rollback to rev 2 creates rev 5, not rev 2)
# The rolled-back ReplicaSet is promoted (scaled up), not recreated
# Verify rollback completed
kubectl rollout status deployment/web-app
kubectl get deployment web-app -o jsonpath='{.spec.template.spec.containers[0].image}'
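The revision renumbering described in the comments above can be modelled in a few lines. This is a simplified sketch: `history` is a hypothetical mapping from revision number to a label for the pod template, and the model assumes that the reused ReplicaSet takes the new revision number, so the old number no longer appears in the history.

```python
def rollback(history, to_revision):
    """Return the revision history after rolling back to `to_revision`.

    The target template is promoted to a new, highest revision number.
    """
    if to_revision not in history:
        raise KeyError(f"revision {to_revision} not found in history")
    template = history[to_revision]
    new_history = {rev: tpl for rev, tpl in history.items() if rev != to_revision}
    new_history[max(history) + 1] = template
    return new_history


if __name__ == "__main__":
    history = {1: "v1.0.0", 2: "v1.1.0", 3: "v1.2.0", 4: "v2.0.0"}
    print(rollback(history, 2))
    # {1: 'v1.0.0', 3: 'v1.2.0', 4: 'v2.0.0', 5: 'v1.1.0'}
```

One practical consequence is that revision numbers are not stable identifiers for a template; automation that rolls back by number should read the history immediately before acting.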
Rolling back a Deployment restores the pod template (image, env vars, resource limits) to a previous state. It does NOT restore ConfigMap or Secret contents that the pods reference. If a bad ConfigMap was deployed alongside the image update, rolling back the Deployment while the bad ConfigMap remains will still result in broken pods. Always version ConfigMaps alongside Deployments for config-driven rollbacks.
Pausing and Resuming Rollouts
Pausing a Deployment freezes the rollout at its current state: the controller stops creating pods from the new template and stops scaling down old ones until the Deployment is resumed. Pod template changes made while paused are recorded but do not start a rollout. Useful for manual canary validation.
# Update image (triggers rollout)
kubectl set image deployment/web-app web-app=myapp:v2.1.0
# Immediately pause the rollout (freeze mid-flight)
kubectl rollout pause deployment/web-app
# At this point: some pods run new version, some run old
# Manually inspect the new pods:
kubectl get pods -l app=web-app -o wide
kubectl exec -it POD_NAME -- curl localhost:8080/version
# If new pods look good, resume the rollout
kubectl rollout resume deployment/web-app
# If new pods have issues, resume first and then roll back
# (kubectl refuses to roll back a paused Deployment)
kubectl rollout resume deployment/web-app
kubectl rollout undo deployment/web-app
Zero-Downtime Deployment Checklist
A Deployment by itself does not guarantee zero downtime. Zero-downtime requires a combination of several mechanisms working together:
| # | Requirement | Why It Matters | Configuration |
|---|---|---|---|
| 1 | readinessProbe defined | Without it, pods are added to Endpoints immediately after starting — before the app is ready to serve traffic | readinessProbe: httpGet: path: /ready |
| 2 | maxUnavailable: 0 | Ensures capacity never drops below desired during rollout | strategy.rollingUpdate.maxUnavailable: 0 |
| 3 | preStop sleep | Prevents kube-proxy/Endpoints propagation race: new requests routed to terminating pods | lifecycle.preStop.exec.command: ["sleep","5"] |
| 4 | terminationGracePeriodSeconds sufficient | App needs time to finish in-flight requests after SIGTERM | ≥ preStop duration + max request duration + buffer |
| 5 | PodDisruptionBudget | Protects against node drain and cluster operations removing too many pods at once | minAvailable: N-1 or maxUnavailable: 1 |
| 6 | minReadySeconds | Catches flapping pods that pass readiness probe briefly then crash | minReadySeconds: 10 (tune per app stability) |
| 7 | replicas ≥ 2 | A single replica has no redundancy: any crash, eviction, or node failure is a full outage, and with maxUnavailable: 0 the rollout stalls if the cluster has no room for the surge pod | At least 2 replicas in production |
| 8 | Pod anti-affinity | Without it, all replicas may land on one node — node failure = full outage | podAntiAffinity: topologyKey: kubernetes.io/hostname |
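Row 4 of the checklist is a simple inequality that can be checked before deploying. The sketch below assumes a hypothetical worst-case request duration of 30 seconds and a 5-second buffer; both numbers are illustrative and should come from the application's own latency data.

```python
def min_grace_period(pre_stop_s, max_request_s, buffer_s=5):
    """Smallest terminationGracePeriodSeconds that satisfies checklist row 4."""
    return pre_stop_s + max_request_s + buffer_s


def grace_period_ok(termination_grace_s, pre_stop_s, max_request_s, buffer_s=5):
    """True if the configured grace period covers preStop + in-flight requests + buffer."""
    return termination_grace_s >= min_grace_period(pre_stop_s, max_request_s, buffer_s)


if __name__ == "__main__":
    # Values from the spec above: 60s grace period, 5s preStop sleep.
    print(min_grace_period(pre_stop_s=5, max_request_s=30))   # 40
    print(grace_period_ok(60, pre_stop_s=5, max_request_s=30))  # True
    print(grace_period_ok(30, pre_stop_s=5, max_request_s=30))  # False
```

The preStop duration counts against the grace period, which is why it appears in the sum rather than being added on top of terminationGracePeriodSeconds.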
Blue/Green Deployment Pattern
Blue/green runs both the old (blue) and new (green) versions simultaneously. Traffic is switched atomically by updating the Service selector. Rollback is instant — switch the selector back.
# Blue Deployment (current live)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-app-blue
spec:
  replicas: 4
  selector:
    matchLabels:
      app: web-app
      slot: blue
  template:
    metadata:
      labels:
        app: web-app
        slot: blue
    spec:
      containers:
        - name: web-app
          image: myapp:v1.0.0
---
# Green Deployment (new version, deployed alongside blue)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-app-green
spec:
  replicas: 4
  selector:
    matchLabels:
      app: web-app
      slot: green
  template:
    metadata:
      labels:
        app: web-app
        slot: green
    spec:
      containers:
        - name: web-app
          image: myapp:v2.0.0
---
# Service: points at blue initially
apiVersion: v1
kind: Service
metadata:
  name: web-app
spec:
  selector:
    app: web-app
    slot: blue   # ← change to "green" to cut over; change back to roll back
  ports:
    - port: 80
      targetPort: 8080
# Cut over to green (atomic — no rolling; zero downtime if green is healthy)
kubectl patch service web-app -p '{"spec":{"selector":{"slot":"green"}}}'
# Rollback: switch back to blue (instant)
kubectl patch service web-app -p '{"spec":{"selector":{"slot":"blue"}}}'
# After successful validation, scale down blue to free resources
kubectl scale deployment web-app-blue --replicas=0
Blue/green requires 2× the normal pod count during the switchover window. Ensure the cluster has sufficient headroom before deploying the green version. For cost optimization, scale the standby (blue after cutover) to zero rather than deleting it — keeping it at zero allows instant rollback by scaling up and switching the Service selector.
Canary Deployment Pattern
A canary release routes a small percentage of traffic to the new version to validate it under real load before full rollout. Two main approaches: replica-ratio (built-in) or Ingress-based weighted routing (more precise).
Replica-Ratio Canary (Simple)
# Stable deployment: 9 replicas
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-app-stable
spec:
  replicas: 9
  selector:
    matchLabels:
      app: web-app
      track: stable      # the two Deployments' selectors must not overlap
  template:
    metadata:
      labels:
        app: web-app
        track: stable
    spec:
      containers:
        - name: web-app
          image: myapp:v1.0.0
---
# Canary deployment: 1 replica (≈10% of traffic if Service uses both)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-app-canary
spec:
  replicas: 1
  selector:
    matchLabels:
      app: web-app
      track: canary      # distinct from stable; the Service selects on app only
  template:
    metadata:
      labels:
        app: web-app     # shared label → matched by the same Service selector
        track: canary    # also used for monitoring differentiation
    spec:
      containers:
        - name: web-app
          image: myapp:v2.0.0
---
# Single Service: load-balances across ALL 10 pods (9 stable + 1 canary = 10% canary)
apiVersion: v1
kind: Service
metadata:
  name: web-app
spec:
  selector:
    app: web-app   # matches both stable and canary pods
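With replica-ratio canaries the traffic share follows from the pod counts, so it is worth computing which counts give a target percentage. The helpers below are hypothetical names for illustration, and the calculation assumes the Service spreads requests evenly across ready pods, which holds only approximately in practice (long-lived connections skew it).

```python
def canary_share(stable, canary):
    """Fraction of pods (and, approximately, of traffic) that run the canary."""
    return canary / (stable + canary)


def canary_replicas_needed(stable, target_percent):
    """Smallest canary replica count whose share is at least target_percent.

    Solves c / (stable + c) >= p / 100 for c using integer ceiling division.
    """
    if not 0 < target_percent < 100:
        raise ValueError("target_percent must be between 0 and 100 (exclusive)")
    return -(-target_percent * stable // (100 - target_percent))


if __name__ == "__main__":
    print(canary_share(9, 1))              # 0.1
    print(canary_replicas_needed(9, 10))   # 1
    print(canary_replicas_needed(9, 25))   # 3
```

The granularity is coarse: with 9 stable pods the smallest possible canary share is 10%, so a 1% canary would need roughly 99 stable pods or the Ingress-based approach.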
Ingress-Based Weighted Canary (Precise)
# NGINX Ingress Controller canary annotations:
# This canary Ingress sits alongside the primary (non-canary) Ingress for the same host.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: web-app-canary
  annotations:
    nginx.ingress.kubernetes.io/canary: "true"
    nginx.ingress.kubernetes.io/canary-weight: "10"   # roughly 10% of requests, independent of replica counts
    # Header-based canary (bypass weight for internal testing):
    # nginx.ingress.kubernetes.io/canary-by-header: "X-Canary"
    # nginx.ingress.kubernetes.io/canary-by-header-value: "always"
spec:
  rules:
    - host: api.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: web-app-canary-svc
                port:
                  number: 80
HPA Interaction
When an HPA targets a Deployment, the HPA controller writes to spec.replicas. This creates a potential conflict: if you also set spec.replicas in your manifest and apply it, you will override the HPA's current scale. The conventional solution is to omit spec.replicas from the manifest (or set it only on initial creation) and let the HPA own it thereafter.
# WRONG: applying this manifest will reset replicas to 3, overriding HPA
spec:
  replicas: 3   # ← this will stomp the HPA's current value every kubectl apply
# RIGHT: omit replicas from the manifest when HPA is managing it
spec:
  # replicas: field absent; HPA manages this field
  selector:
    matchLabels:
      app: web-app
  template: ...
# OR: use server-side apply with field ownership
# With replicas left out of the manifest, the HPA owns the field and apply will not overwrite it
kubectl apply --server-side -f deployment.yaml
An HPA with minReplicas: 2 prevents the Deployment from being scaled below 2, even if the metric drops to zero. Always set minReplicas ≥ 2 in production to maintain availability. Without it, a metric dip (e.g., no traffic at 3am) scales the Deployment to minReplicas: 1 by default, leaving a single replica exposed to node failures.
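The effect of minReplicas can be seen by working through the HPA's basic scaling rule, desired = ceil(current × currentMetric / targetMetric), clamped to the configured bounds. The sketch below is deliberately simplified: it ignores the tolerance band, stabilization windows, and scaling policies that the real controller applies, and the metric values are invented.

```python
import math


def hpa_desired_replicas(current_replicas, current_metric, target_metric,
                         min_replicas=1, max_replicas=10):
    """Simplified HPA calculation: proportional scaling clamped to [min, max]."""
    raw = math.ceil(current_replicas * current_metric / target_metric)
    return max(min_replicas, min(max_replicas, raw))


if __name__ == "__main__":
    # Traffic drops to zero overnight: the floor is whatever minReplicas says.
    print(hpa_desired_replicas(4, 0, 50, min_replicas=2))  # 2
    print(hpa_desired_replicas(4, 0, 50))                  # 1 (default minReplicas)
    # Load doubles relative to target: scale from 4 to 8.
    print(hpa_desired_replicas(4, 100, 50))                # 8
```

The second call shows the failure mode described above: with the default floor of 1, an idle period leaves a single replica and no redundancy.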
Forcing Rollout on ConfigMap/Secret Changes
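Kubernetes does not automatically roll out a Deployment when a referenced ConfigMap or Secret changes. Values injected as environment variables stay fixed until the pods are replaced; mounted volumes are eventually refreshed by the kubelet (except subPath mounts), but the application must re-read the files itself. In practice, a rollout has to be triggered explicitly.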
# Option 1: kubectl rollout restart (adds restartedAt annotation to pod template)
kubectl rollout restart deployment/web-app
# Option 2: checksum annotation in pod template (GitOps-friendly)
# In your deployment manifest, compute a checksum of the ConfigMap content:
template:
  metadata:
    annotations:
      checksum/config: "sha256:abc123..."   # update this when ConfigMap changes
# With Helm: use the built-in template function:
# checksum/config: {{ include (print $.Template.BasePath "/configmap.yaml") . | sha256sum }}
# Option 3: immutable ConfigMaps with versioned names
# ConfigMap: app-config-v7 → new ConfigMap: app-config-v8
# Update envFrom.configMapRef.name to app-config-v8 in deployment → triggers rollout
# Best for GitOps: the manifest change itself IS the trigger
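For Option 2 outside Helm, the checksum can be computed in a CI step. The following is a minimal sketch, assuming the ConfigMap's data is available as a dictionary of strings; the function name `config_checksum` and the sample keys are hypothetical. Serializing with sorted keys makes the digest independent of key order, so only real content changes alter the annotation.

```python
import hashlib
import json


def config_checksum(data):
    """Return a stable "sha256:<hex>" digest of a ConfigMap's data mapping."""
    canonical = json.dumps(data, sort_keys=True, separators=(",", ":"))
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    return "sha256:" + digest


if __name__ == "__main__":
    old = {"LOG_LEVEL": "info", "CACHE_TTL": "60"}
    new = {"LOG_LEVEL": "debug", "CACHE_TTL": "60"}
    print(config_checksum(old))
    print(config_checksum(old) == config_checksum(new))  # False: triggers a rollout
```

Writing the result into the checksum/config annotation changes the pod template whenever the config content changes, which is what causes the Deployment controller to start a new rollout.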
Image Digest Pinning
# Mutable tag (DANGEROUS): same tag can point to different images on different nodes
image: myapp:v2.1.0
# Immutable digest (SAFE): always the exact same image bytes
image: myapp:v2.1.0@sha256:a3b4c5d6e7f8...
# Get the digest for a tag:
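# (.config.digest from "docker manifest inspect" is the image config digest, not the pullable manifest digest)
docker pull myapp:v2.1.0 && docker inspect --format='{{index .RepoDigests 0}}' myapp:v2.1.0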
crane digest myapp:v2.1.0 # github.com/google/go-containerregistry
# Cosign for supply-chain verification:
cosign verify --key cosign.pub myapp:v2.1.0@sha256:a3b4c5d6...
# In CI: resolve tag to digest at build time, pin digest in manifest
# Keel, Renovate, or Flux image automation can manage digest updates
Useful Operational Commands
# Watch rollout progress in real time
kubectl rollout status deployment/web-app --watch
# Force a restart without changing the image or config (stamps a restartedAt annotation on the pod template)
kubectl rollout restart deployment/web-app
# Scale immediately (bypasses HPA until HPA next reconciles)
kubectl scale deployment/web-app --replicas=8
# Update image
kubectl set image deployment/web-app web-app=myapp:v2.2.0
# Set resource requests/limits
kubectl set resources deployment/web-app --containers=web-app \
--requests=cpu=500m,memory=512Mi --limits=memory=1Gi
# Show all ReplicaSets for a Deployment (to see revision history)
kubectl get rs -l app=web-app --show-labels
# Annotate a revision for change-cause tracking
kubectl annotate deployment/web-app kubernetes.io/change-cause="v2.1.0: fix memory leak"
# Describe to see events and conditions
kubectl describe deployment/web-app
Metrics, Alerts, and Runbooks
Key Metrics
| Metric | Source | Alert Condition |
|---|---|---|
| kube_deployment_status_replicas_available | kube-state-metrics | < kube_deployment_spec_replicas for > 5 min |
| kube_deployment_status_replicas_updated | kube-state-metrics | < kube_deployment_spec_replicas for > 15 min → rollout stalled |
| kube_deployment_status_observed_generation != kube_deployment_metadata_generation | kube-state-metrics | Controller hasn't processed latest spec update |
| kube_deployment_spec_paused | kube-state-metrics | Deployment paused > 1h (forgotten pause) |
| kube_replicaset_status_ready_replicas | kube-state-metrics | Multiple non-zero RSes (old RS not scaling down = stalled) |
Alerting Rules
groups:
  - name: deployment-health
    rules:
      - alert: DeploymentReplicasMismatch
        # ">" rather than "!=" so that surge pods during a rollout do not fire the alert
        expr: |
          kube_deployment_spec_replicas
            > kube_deployment_status_replicas_available
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Deployment {{ $labels.namespace }}/{{ $labels.deployment }} has fewer available replicas than desired"
      - alert: DeploymentRolloutStuck
        expr: |
          kube_deployment_status_replicas_updated
            != kube_deployment_spec_replicas
        for: 15m
        labels:
          severity: warning
        annotations:
          summary: "Deployment {{ $labels.namespace }}/{{ $labels.deployment }} rollout stuck for 15 minutes"
      - alert: DeploymentProgressDeadlineExceeded
        expr: |
          kube_deployment_status_condition{condition="Progressing",status="false"} == 1
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Deployment {{ $labels.namespace }}/{{ $labels.deployment }} progress deadline exceeded"
      - alert: DeploymentPausedTooLong
        expr: |
          kube_deployment_spec_paused == 1
        for: 1h
        labels:
          severity: warning
        annotations:
          summary: "Deployment {{ $labels.namespace }}/{{ $labels.deployment }} has been paused for over 1 hour"
Runbooks
Check new RS pods: kubectl describe rs NEW_RS_NAME for events. Common causes: image pull error (wrong tag, missing pull secret), readiness probe never passing (app bug, wrong port), resource quota exceeded, admission webhook rejection. Fix the underlying cause; rollout continues automatically or roll back with kubectl rollout undo.
Check pod status: kubectl get pods -l app=NAME. If pods are Pending: node capacity, affinity, PVC issues. If CrashLoopBackOff: kubectl logs POD --previous for exit reason. If ImagePullBackOff: verify image tag and pull secrets. Check kubectl describe deployment NAME for ReplicaFailure condition.
Immediate rollback: kubectl rollout undo deployment/NAME. If rollout not yet complete (old RS still exists): rollback re-scales old RS up, new RS down — faster than new rollout. Verify: kubectl rollout status deployment/NAME. If rollback itself stalls: check if old RS pods also fail (bad ConfigMap still present).
Symptom: desired replicas keep resetting. Check if HPA exists: kubectl get hpa. If HPA manages the Deployment, remove spec.replicas from your manifest or use server-side apply. If both kubectl and HPA are writing replicas, the last writer wins — use field ownership via SSA to resolve permanently.
Pods reference old config. Trigger restart: kubectl rollout restart deployment/NAME. Verify pods restarted: kubectl get pods -l app=NAME -w. For future: adopt versioned ConfigMap names or checksum annotations in pod template. Verify new pods have updated config: kubectl exec POD -- env | grep KEY.
Best Practices
- Always define a readinessProbe. A Deployment without a readiness probe marks pods Available immediately after the container starts, before the application is ready. This causes traffic to hit unready pods during rollouts. The readiness probe is the single most impactful zero-downtime configuration.
- Use maxUnavailable: 0 for production APIs. This guarantees full capacity throughout the rollout at the cost of needing maxSurge ≥ 1 (temporary extra capacity). For resource-constrained environments, maxSurge: 0, maxUnavailable: 1 accepts a brief partial capacity reduction.
- Set revisionHistoryLimit: 5, not 0. Keeping zero history prevents rollbacks. 5 revisions is a reasonable balance between rollback capability and etcd storage. Never set it to 0 in production.
- Record change causes with kubernetes.io/change-cause annotations. kubectl rollout history is meaningless without this. Automate it in your CI pipeline: annotate every deployment with the git commit SHA and PR title.
- Pin image digests, not tags. A mutable tag can silently point to a different image digest if re-pushed. Digest-pinned images are fully reproducible and support supply-chain verification with Cosign. Automate digest resolution in CI.
- Set minReadySeconds to 10–30. Without a soak time, a pod that passes its readiness probe once and immediately crashes is counted as available, causing the rollout to proceed. Even a 10-second soak catches most flapping pod patterns.
- Create a PodDisruptionBudget alongside every Deployment. Rolling updates themselves are not governed by PDBs; a PDB limits voluntary evictions such as node drains. Without one, a node drain during a rolling update can remove pods on top of those the rollout has already taken down, exceeding what maxUnavailable allows and causing an outage. See PodDisruptionBudgets.
- Omit spec.replicas from manifests managed by HPA. Applying a manifest with a hard-coded replicas value will override the HPA's current scale on every kubectl apply. Use server-side apply with field ownership, or simply omit the field after initial creation.