Deployments — Kubernetes Docs

What This Page Covers

Deployment → ReplicaSet → Pod ownership chain

Full Deployment spec anatomy with all fields

RollingUpdate strategy: maxSurge and maxUnavailable mechanics

Recreate strategy: full downtime, use cases

RollingUpdate step-by-step sequence diagram

maxSurge/maxUnavailable: absolute vs percentage values, edge cases

Revision history: revisionHistoryLimit, how old ReplicaSets are retained

Rollback mechanics: kubectl rollout undo, --to-revision, history inspection

Pausing and resuming a rollout

Zero-downtime deployment checklist: readiness probe, PDB, preStop, minReadySeconds

minReadySeconds — soak time before pod counts as available

progressDeadlineSeconds — stalled rollout detection

Deployment conditions: Available, Progressing, ReplicaFailure

Blue/green deployment pattern with Services

Canary deployment pattern with weighted Services or Ingress

HPA interaction with Deployments: who wins on replicas

ConfigMap/Secret updates and how to force pod rollout

Image digest pinning for reproducible deployments

Deployment anti-patterns: missing readiness probe, no PDB, :latest image

5 metrics + 4 alerts + 5 runbooks + 8 best practices

Deployment → ReplicaSet → Pod Ownership Chain

A Deployment never creates pods directly. Instead, the Deployment controller creates and manages ReplicaSets. Each distinct pod template version gets its own ReplicaSet. During a rolling update, the Deployment controller scales up the new ReplicaSet and scales down the old one simultaneously — the old ReplicaSet is retained (scaled to zero) for rollback purposes.

Deployment object model:

Deployment: nginx-deploy (desired: 4 replicas, image: nginx:1.25)
│
├── ReplicaSet: nginx-deploy-7d9f8b6c4 (current, 4/4 pods)   ← pod-template-hash=7d9f8b6c4
│   ├── Pod: nginx-deploy-7d9f8b6c4-xk9p2
│   ├── Pod: nginx-deploy-7d9f8b6c4-mn7q1
│   ├── Pod: nginx-deploy-7d9f8b6c4-r4t8w
│   └── Pod: nginx-deploy-7d9f8b6c4-vb3s6
│
└── ReplicaSet: nginx-deploy-5c8f9d2a1 (previous, 0/0 pods)  ← retained for rollback
                                                               (scaled to zero after successful rollout)

After update to nginx:1.26:

Deployment: nginx-deploy (desired: 4 replicas, image: nginx:1.26)
│
├── ReplicaSet: nginx-deploy-9a2b7c5e8 (new, scaling up)
└── ReplicaSet: nginx-deploy-7d9f8b6c4 (old, scaling down)

pod-template-hash label: auto-injected by Deployment controller to distinguish RS pods
Each RS owns its pods via ownerReferences + pod-template-hash label selector
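The ownership chain can be followed through metadata.ownerReferences. The sketch below works on plain dictionaries shaped like API objects; the helper name owner_chain and the sample objects are illustrative assumptions, not output from a real cluster.

```python
def owner_chain(obj, index):
    """Return ["Kind/name", ...] from obj up to its top-level owner.

    index maps (kind, name) -> object dict. Only the controlling
    ownerReference (controller: true) is followed.
    """
    chain = [f"{obj['kind']}/{obj['metadata']['name']}"]
    current = obj
    while True:
        refs = current["metadata"].get("ownerReferences", [])
        ctrl = next((r for r in refs if r.get("controller")), None)
        if ctrl is None:
            return chain
        chain.append(f"{ctrl['kind']}/{ctrl['name']}")
        current = index.get((ctrl["kind"], ctrl["name"]))
        if current is None:  # owner is not part of the sample index
            return chain


# Hypothetical sample objects mirroring the object model above.
deployment = {"kind": "Deployment", "metadata": {"name": "nginx-deploy"}}
replicaset = {
    "kind": "ReplicaSet",
    "metadata": {
        "name": "nginx-deploy-7d9f8b6c4",
        "ownerReferences": [
            {"kind": "Deployment", "name": "nginx-deploy", "controller": True}
        ],
    },
}
pod = {
    "kind": "Pod",
    "metadata": {
        "name": "nginx-deploy-7d9f8b6c4-xk9p2",
        "labels": {"pod-template-hash": "7d9f8b6c4"},
        "ownerReferences": [
            {"kind": "ReplicaSet", "name": "nginx-deploy-7d9f8b6c4", "controller": True}
        ],
    },
}
index = {(o["kind"], o["metadata"]["name"]): o for o in (deployment, replicaset, pod)}

print(" -> ".join(owner_chain(pod, index)))
```

Against a live cluster the same walk would start from the JSON of a pod and look up each owner by kind and name in the pod's namespace.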

Full Deployment Spec

apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-app
  namespace: production
  labels:
    app: web-app
  annotations:
    deployment.kubernetes.io/revision: "7"   # auto-managed; do not set manually
spec:
  replicas: 4

  # Selector is IMMUTABLE after creation
  selector:
    matchLabels:
      app: web-app       # must be a subset of template.metadata.labels

  # Revision history: number of old ReplicaSets to retain (default 10)
  revisionHistoryLimit: 5

  # Rollout is reported as failed (Progressing=False) if it makes no progress for this many seconds (default 600)
  progressDeadlineSeconds: 300

  # Minimum seconds a new pod must be ready (all probes passing) before
  # it counts as "available" — soak time (default 0)
  minReadySeconds: 10

  strategy:
    type: RollingUpdate     # RollingUpdate (default) | Recreate
    rollingUpdate:
      maxSurge: 1           # max pods ABOVE desired during update (absolute or %)
      maxUnavailable: 0     # max pods BELOW desired during update (absolute or %)
      # maxSurge: 25%       # percentage form: ceil(replicas * 0.25)
      # maxUnavailable: 25% # percentage form: floor(replicas * 0.25)
      # Both cannot be 0 simultaneously

  template:
    metadata:
      labels:
        app: web-app        # must satisfy selector.matchLabels
        version: v2.1.0     # informational; changing this triggers rollout
      annotations:
        prometheus.io/scrape: "true"
        prometheus.io/port: "9090"
    spec:
      terminationGracePeriodSeconds: 60
      containers:
      - name: web-app
        image: myregistry/web-app:v2.1.0@sha256:abc123...  # digest pin
        ports:
        - containerPort: 8080
        resources:
          requests:
            cpu: "250m"
            memory: "256Mi"
          limits:
            memory: "512Mi"
        readinessProbe:            # REQUIRED for zero-downtime rolling updates
          httpGet:
            path: /ready
            port: 8080
          periodSeconds: 5
          failureThreshold: 3
        lifecycle:
          preStop:
            exec:
              command: ["sh", "-c", "sleep 5"]  # drain race prevention

RollingUpdate Strategy

RollingUpdate replaces pods incrementally so that the application remains available throughout the update. The speed and safety of the rollout are controlled by two parameters. Raising either one speeds up the rollout, but each trades away something different:

Parameter	Controls	Default	Effect of Higher Value
`maxSurge`	Max pods above desired during update	25%	Faster rollout (more new pods at once), higher temporary resource usage
`maxUnavailable`	Max pods below desired that can be unavailable	25%	Faster rollout (removes old pods faster), lower availability during update

Step-by-Step Sequence (replicas=4, maxSurge=1, maxUnavailable=0)

Initial state: 4 old pods running (v1)
  [v1] [v1] [v1] [v1]                  running=4, desired=4

Step 1: Create 1 new pod (surge to 5 total = desired+maxSurge)
  [v1] [v1] [v1] [v1] [v2-pending]

Step 2: New pod becomes Ready
  [v1] [v1] [v1] [v1] [v2]             ← v2 ready; now 5 available ≥ desired-maxUnavailable (4)

Step 3: Delete 1 old pod (back to desired count=4)
  [v1] [v1] [v1] [v2]                  ← 4 available; at minimum

Step 4: Create another new pod
  [v1] [v1] [v1] [v2] [v2-pending]

Step 5: New pod Ready, delete old
  [v1] [v1] [v2] [v2]

... repeat until all 4 are v2 ...

Final state: 4 new pods (v2)
  [v2] [v2] [v2] [v2]

Total pods alive at any time: 4–5 (never below 4, never above 5)
Total time ≈ 4 × (pod-start-time + minReadySeconds)

maxSurge=25% and maxUnavailable=25% (defaults, replicas=8)

replicas=8, maxSurge=25% → ceil(8×0.25)=2, maxUnavailable=25% → floor(8×0.25)=2
Pods alive range: [desired - maxUnavailable, desired + maxSurge] = [6, 10]
Rollout can proceed 2 pods at a time (faster but briefly less available)

Step 1: Create 2 new pods + mark 2 old as terminating simultaneously
  New: [v2][v2] pending
  Old: 8 → 6 available (2 terminating = maxUnavailable)
  Total: up to 10 (= 8 + maxSurge)
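The two walk-throughs above can be reproduced with a small simulation. This is a simplified model, not the Deployment controller's actual algorithm: it assumes a pending pod becomes Ready exactly one step after creation and that deleted pods disappear immediately. The function name simulate_rolling_update is hypothetical.

```python
def simulate_rolling_update(desired, max_surge, max_unavailable):
    """Return the (old, new_ready, new_pending) pod counts after each step.

    Simplified model: a pending pod becomes Ready one step after it is
    created, and deleted pods vanish immediately.
    """
    if max_surge == 0 and max_unavailable == 0:
        raise ValueError("maxSurge and maxUnavailable cannot both be 0")
    old, new_ready, new_pending = desired, 0, 0
    states = [(old, new_ready, new_pending)]
    while old > 0 or new_ready < desired:
        # Pods created in the previous step become Ready.
        new_ready += new_pending
        new_pending = 0
        # Scale up: total pods may not exceed desired + maxSurge.
        total = old + new_ready
        new_pending = max(0, min(desired - new_ready, desired + max_surge - total))
        # Scale down: available pods may not drop below desired - maxUnavailable.
        available = old + new_ready
        old -= max(0, min(old, available - (desired - max_unavailable)))
        states.append((old, new_ready, new_pending))
    return states


for old, ready, pending in simulate_rolling_update(4, 1, 0):
    print(f"old={old} new_ready={ready} new_pending={pending}")
```

With (4, 1, 0) the available count (old + new_ready) never drops below 4; with (8, 2, 2) it bottoms out at 6, matching the [6, 10] range described above.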

Special Cases and Edge Cases

Configuration	Behavior	Use Case
`maxSurge=1, maxUnavailable=0`	Always at full capacity; one-at-a-time replacement; slowest but safest	Production APIs where capacity must never drop
`maxSurge=0, maxUnavailable=1`	No extra capacity needed (good for resource-constrained clusters); one pod terminated before new one starts	Resource-constrained environments; acceptable brief capacity reduction
`maxSurge=100%, maxUnavailable=0`	Doubles pod count temporarily; all new pods created before any old ones deleted	Maximum speed with guaranteed capacity; needs 2× resources
`maxSurge=0, maxUnavailable=100%`	Close to Recreate: all old pods are scaled down at once, but new pods may start while old ones are still terminating	Avoid; use Recreate if the versions must not overlap, since it is explicit and waits for old pods to terminate
`replicas=1, maxSurge=0, maxUnavailable=1`	Brief downtime (old pod terminated, then new one started); behaves like Recreate for a single replica	Dev environments; single-replica workloads that can afford brief downtime

Percentage Rounding Rules

maxSurge percentages are rounded up (ceiling). maxUnavailable percentages are rounded down (floor). For replicas=3, maxSurge=33%: ceil(3×0.33)=1. For replicas=3, maxUnavailable=33%: floor(3×0.33)=0. With 3 replicas, 33%/33% therefore resolves to maxSurge=1, maxUnavailable=0, the safe profile. The 25%/25% defaults resolve the same way for 3 replicas (ceil(0.75)=1, floor(0.75)=0).
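The rounding rules can be checked with a few lines of integer arithmetic. This is a minimal sketch; the helper names resolve and resolve_rolling_update are hypothetical and only mirror the ceil/floor behaviour described above.

```python
def resolve(value, replicas, round_up):
    """Resolve an int-or-percent field (e.g. 1 or "25%") to a pod count."""
    if isinstance(value, int):
        return value
    percent = int(value.rstrip("%"))
    scaled = replicas * percent
    # Integer arithmetic avoids floating-point rounding surprises.
    return -(-scaled // 100) if round_up else scaled // 100


def resolve_rolling_update(max_surge, max_unavailable, replicas):
    """maxSurge rounds up, maxUnavailable rounds down."""
    surge = resolve(max_surge, replicas, round_up=True)
    unavailable = resolve(max_unavailable, replicas, round_up=False)
    return surge, unavailable


for replicas in (1, 3, 8):
    print(replicas, resolve_rolling_update("25%", "25%", replicas))
```

For small replica counts the percentage defaults collapse to (1, 0), which is why explicit absolute values are often easier to reason about.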

Recreate Strategy

Recreate terminates all existing pods before creating new ones. There is a period of zero availability between the old pods terminating and the new pods becoming ready. Use it only when two versions of the app cannot run simultaneously — exclusive database schema migrations, singleton processes, apps that conflict on shared resources.

strategy:
  type: Recreate
  # No rollingUpdate block — Recreate has no parameters

# Timeline:
# 1. All old pods receive SIGTERM simultaneously
# 2. Wait for all old pods to terminate (up to terminationGracePeriodSeconds)
# 3. All new pods created simultaneously
# 4. Wait for all new pods to become Ready
# Downtime = termination time + new pod startup time

minReadySeconds and progressDeadlineSeconds

minReadySeconds — Soak Time

minReadySeconds specifies the minimum number of seconds a newly created pod must be continuously Ready (all readiness probes passing) before it counts as available. The Deployment controller waits this duration before considering the pod stable and proceeding to the next step of the rollout.

spec:
  minReadySeconds: 30   # pod must stay Ready for 30s before it's "available"
  # Without this (default 0): a pod that passes readiness probe once then immediately
  # crashes might appear available, allowing the rollout to proceed before the issue
  # is caught. 30s soak catches flapping pods.

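# Rough upper bound on rollout time with minReadySeconds:
# replicas × (pod-start-time + minReadySeconds + periodSeconds) / parallelism
# where parallelism is the number of pods replaced per step (roughly maxSurge + maxUnavailable)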
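The rollout-time formula can be turned into a quick estimator. This is a back-of-the-envelope sketch: it assumes pods are replaced in equal batches of size parallelism and that every pod takes the same time to start; the function and parameter names are illustrative.

```python
import math


def estimate_rollout_seconds(replicas, pod_start, min_ready, probe_period, parallelism):
    """Approximate worst-case rollout duration in seconds."""
    if parallelism < 1:
        raise ValueError("parallelism must be at least 1")
    batches = math.ceil(replicas / parallelism)
    # Each batch waits for startup, one probe period, and the soak time.
    return batches * (pod_start + min_ready + probe_period)


# 4 replicas, one at a time, 20s startup, 30s soak, 5s probe period
print(estimate_rollout_seconds(4, 20, 30, 5, 1))
```

Comparing the estimate with progressDeadlineSeconds is a useful sanity check: the deadline measures time without progress rather than total duration, so a single batch should comfortably fit inside it.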

progressDeadlineSeconds — Stalled Rollout Detection

If a rollout makes no progress for progressDeadlineSeconds (default 600s), the Deployment is marked with condition Progressing=False, reason=ProgressDeadlineExceeded. The rollout is NOT automatically rolled back — it stalls and waits for operator intervention.

spec:
  progressDeadlineSeconds: 300   # report ProgressDeadlineExceeded after 5 min of no progress

# Check deployment conditions:
kubectl rollout status deployment/web-app
# Waiting for deployment "web-app" rollout to finish: 1 out of 4 new replicas have been updated...
# error: deployment "web-app" exceeded its progress deadline

kubectl get deployment web-app -o json | jq '.status.conditions'
# [
#   {"type":"Available","status":"True",...},
#   {"type":"Progressing","status":"False","reason":"ProgressDeadlineExceeded",...}
# ]

Condition Type	Status	Reason	Meaning
Available	True	MinimumReplicasAvailable	≥ desired−maxUnavailable pods available
Available	False	MinimumReplicasUnavailable	Fewer than minimum required pods available
Progressing	True	NewReplicaSetAvailable	Rollout completed successfully
Progressing	True	ReplicaSetUpdated	Rollout in progress
Progressing	False	ProgressDeadlineExceeded	Rollout stalled; deadline hit
ReplicaFailure	True	FailedCreate	ReplicaSet cannot create pods (quota, RBAC, admission)
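The condition table can be collapsed into a single rollout state for scripts or dashboards. The sketch below operates on the parsed .status.conditions list; the function name rollout_state and the returned state labels are assumptions made for illustration.

```python
def rollout_state(conditions):
    """Map Deployment conditions to a coarse rollout state string."""
    by_type = {c["type"]: c for c in conditions}
    progressing = by_type.get("Progressing", {})
    if by_type.get("ReplicaFailure", {}).get("status") == "True":
        return "replica-failure"
    if (progressing.get("status") == "False"
            and progressing.get("reason") == "ProgressDeadlineExceeded"):
        return "stalled"
    if progressing.get("reason") == "NewReplicaSetAvailable":
        return "complete"
    if progressing.get("status") == "True":
        return "in-progress"
    return "unknown"


# Sample conditions matching the stalled rollout shown earlier.
sample = [
    {"type": "Available", "status": "True", "reason": "MinimumReplicasAvailable"},
    {"type": "Progressing", "status": "False", "reason": "ProgressDeadlineExceeded"},
]
print(rollout_state(sample))
```

Note that a stalled rollout can still report Available=True, because the old pods keep serving while the new ReplicaSet fails to come up.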

Revision History and Rollback

How Revisions Work

Each time the pod template changes, a new revision is created. The Deployment controller assigns increasing revision numbers by annotating the ReplicaSet with deployment.kubernetes.io/revision. Old ReplicaSets (scaled to zero) are kept up to revisionHistoryLimit (default 10). Keeping these is what makes rollback possible.

# Inspect rollout history
kubectl rollout history deployment/web-app
# REVISION  CHANGE-CAUSE
# 1         Initial deployment
# 2         Update to v1.1.0
# 3         Update to v1.2.0
# 4         Update to v2.0.0 (current)

# Record change cause (annotation on the Deployment object; add to CI pipeline)
kubectl annotate deployment/web-app kubernetes.io/change-cause="Deploy v2.0.0: adds OAuth2 login"
# OR set in manifest:
metadata:
  annotations:
    kubernetes.io/change-cause: "Deploy v2.0.0: adds OAuth2 login"

# Inspect a specific revision
kubectl rollout history deployment/web-app --revision=3
# Shows the full pod template at that revision

# See the underlying ReplicaSets with their revision annotations
kubectl get rs -l app=web-app \
  -o custom-columns='NAME:.metadata.name,REVISION:.metadata.annotations.deployment\.kubernetes\.io/revision,REPLICAS:.status.replicas'

Rollback Commands

# Roll back to the previous revision (most common)
kubectl rollout undo deployment/web-app

# Roll back to a specific revision
kubectl rollout undo deployment/web-app --to-revision=2

# Rollback is just a new rollout using the old ReplicaSet's pod template
# It increments the revision number (rollback to rev 2 creates rev 5, not rev 2)
# The rolled-back ReplicaSet is promoted (scaled up), not recreated

# Verify rollback completed
kubectl rollout status deployment/web-app
kubectl get deployment web-app -o jsonpath='{.spec.template.spec.containers[0].image}'
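The revision renumbering on rollback is easy to misread, so here is a toy model of it. It treats the history as a dictionary of revision number to image tag; rollout_undo is a hypothetical helper that mimics the numbering behaviour described above, not kubectl itself.

```python
def rollout_undo(history, to_revision=None):
    """Return the revision history after an undo.

    history maps revision number -> template description.
    The target revision is re-registered under a new, higher number.
    """
    current = max(history)
    if to_revision is None:
        previous = [r for r in history if r != current]
        if not previous:
            raise ValueError("no previous revision to roll back to")
        to_revision = max(previous)
    if to_revision not in history:
        raise KeyError(f"revision {to_revision} not found")
    if to_revision == current:
        return dict(history)  # already at that revision; nothing changes
    updated = {r: t for r, t in history.items() if r != to_revision}
    updated[current + 1] = history[to_revision]
    return updated


history = {1: "v1.0.0", 2: "v1.1.0", 3: "v1.2.0", 4: "v2.0.0"}
print(rollout_undo(history, to_revision=2))
```

After the undo, revision 2 no longer appears in the history: the same ReplicaSet is now listed as revision 5.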

Rollback Does Not Restore ConfigMaps or Secrets

Rolling back a Deployment restores the pod template (image, env vars, resource limits) to a previous state. It does NOT restore ConfigMap or Secret contents that the pods reference. If a bad ConfigMap was deployed alongside the image update, rolling back the Deployment while the bad ConfigMap remains will still result in broken pods. Always version ConfigMaps alongside Deployments for config-driven rollbacks.

Pausing and Resuming Rollouts

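Pausing a Deployment freezes the rollout at its current state: the Deployment controller stops scaling the new ReplicaSet up and the old ReplicaSet down, and any further pod template changes are recorded but not rolled out until the Deployment is resumed. Pods that already exist keep running, so both versions serve traffic while paused. Useful for manual canary validation.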

# Update image (triggers rollout)
kubectl set image deployment/web-app web-app=myapp:v2.1.0

# Immediately pause the rollout (freeze mid-flight)
kubectl rollout pause deployment/web-app

# At this point: some pods run new version, some run old
# Manually inspect the new pods:
kubectl get pods -l app=web-app -o wide
kubectl exec -it POD_NAME -- curl localhost:8080/version

# If new pods look good, resume the rollout
kubectl rollout resume deployment/web-app

# If new pods have issues, resume and then roll back
# (kubectl refuses to undo a Deployment that is still paused)
kubectl rollout resume deployment/web-app
kubectl rollout undo deployment/web-app

Zero-Downtime Deployment Checklist

A Deployment by itself does not guarantee zero downtime. That requires several mechanisms working together:

#	Requirement	Why It Matters	Configuration
1	readinessProbe defined	Without it, pods are added to Endpoints immediately after starting — before the app is ready to serve traffic	`readinessProbe: httpGet: path: /ready`
2	maxUnavailable: 0	Ensures capacity never drops below desired during rollout	`strategy.rollingUpdate.maxUnavailable: 0`
3	preStop sleep	Prevents the kube-proxy/Endpoints propagation race, in which new requests are still routed to terminating pods	`lifecycle.preStop.exec.command: ["sleep","5"]`
4	terminationGracePeriodSeconds sufficient	App needs time to finish in-flight requests after SIGTERM	≥ preStop duration + max request duration + buffer
5	PodDisruptionBudget	Protects against node drain and cluster operations removing too many pods at once	`minAvailable: N-1` or `maxUnavailable: 1`
6	minReadySeconds	Catches flapping pods that pass readiness probe briefly then crash	`minReadySeconds: 10` (tune per app stability)
7	replicas ≥ 2	A single replica has no redundancy: any crash, eviction, or node failure is a full outage, and a rollout only avoids downtime if the cluster has room for the surge pod (maxSurge ≥ 1, maxUnavailable: 0)	At least 2 replicas in production
8	Pod anti-affinity	Without it, all replicas may land on one node — node failure = full outage	`podAntiAffinity: topologyKey: kubernetes.io/hostname`
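Part of this checklist can be verified mechanically before a manifest is applied. The sketch below inspects a Deployment given as a plain dictionary (for example, parsed from YAML); the function name and finding messages are illustrative, and it covers only the items visible in the Deployment itself, not the PodDisruptionBudget.

```python
def zero_downtime_findings(deployment):
    """Return a list of checklist violations found in a Deployment dict."""
    spec = deployment.get("spec", {})
    pod_spec = spec.get("template", {}).get("spec", {})
    findings = []
    if spec.get("replicas", 1) < 2:
        findings.append("replicas < 2")
    strategy = spec.get("strategy", {})
    if strategy.get("type", "RollingUpdate") == "Recreate":
        findings.append("strategy Recreate implies downtime")
    elif strategy.get("rollingUpdate", {}).get("maxUnavailable", "25%") not in (0, "0%"):
        findings.append("maxUnavailable is not 0")
    if spec.get("minReadySeconds", 0) == 0:
        findings.append("minReadySeconds is 0")
    for container in pod_spec.get("containers", []):
        name = container.get("name", "?")
        if "readinessProbe" not in container:
            findings.append(f"container {name}: no readinessProbe")
        if "preStop" not in container.get("lifecycle", {}):
            findings.append(f"container {name}: no preStop hook")
    return findings


bare = {"spec": {"replicas": 1, "template": {"spec": {"containers": [{"name": "web-app"}]}}}}
for finding in zero_downtime_findings(bare):
    print(finding)
```

A check like this fits naturally into CI, where it can fail the pipeline before a risky manifest reaches the cluster.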

Blue/Green Deployment Pattern

Blue/green runs both the old (blue) and new (green) versions simultaneously. Traffic is switched atomically by updating the Service selector. Rollback is instant — switch the selector back.

# Blue Deployment (current live)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-app-blue
spec:
  replicas: 4
  selector:
    matchLabels:
      app: web-app
      slot: blue
  template:
    metadata:
      labels:
        app: web-app
        slot: blue
    spec:
      containers:
      - name: web-app
        image: myapp:v1.0.0
---
# Green Deployment (new version, deployed alongside blue)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-app-green
spec:
  replicas: 4
  selector:
    matchLabels:
      app: web-app
      slot: green
  template:
    metadata:
      labels:
        app: web-app
        slot: green
    spec:
      containers:
      - name: web-app
        image: myapp:v2.0.0
---
# Service — points at blue initially
apiVersion: v1
kind: Service
metadata:
  name: web-app
spec:
  selector:
    app: web-app
    slot: blue      # ← change to "green" to cut over; change back to rollback
  ports:
  - port: 80
    targetPort: 8080

# Cut over to green (atomic — no rolling; zero downtime if green is healthy)
kubectl patch service web-app -p '{"spec":{"selector":{"slot":"green"}}}'

# Rollback: switch back to blue (instant)
kubectl patch service web-app -p '{"spec":{"selector":{"slot":"blue"}}}'

# After successful validation, scale down blue to free resources
kubectl scale deployment web-app-blue --replicas=0

Blue/Green Resource Cost

Blue/green requires 2× the normal pod count during the switchover window. Ensure the cluster has sufficient headroom before deploying the green version. For cost optimization, scale the standby (blue after cutover) to zero rather than deleting it. Rollback then means scaling blue back up, waiting for its pods to become Ready, and switching the Service selector, so it is fast but only instant while blue is still running.
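The cutover works because a Service selects pods purely by label equality. The following sketch models that matching with plain dictionaries; the pod names and the helpers matches and endpoints are hypothetical.

```python
def matches(selector, labels):
    """True if every selector key/value pair is present in labels."""
    return all(labels.get(key) == value for key, value in selector.items())


def endpoints(selector, pods):
    """Names of the pods a Service with this selector would route to."""
    return sorted(name for name, labels in pods.items() if matches(selector, labels))


pods = {
    "web-app-blue-1": {"app": "web-app", "slot": "blue"},
    "web-app-blue-2": {"app": "web-app", "slot": "blue"},
    "web-app-green-1": {"app": "web-app", "slot": "green"},
    "web-app-green-2": {"app": "web-app", "slot": "green"},
}

selector = {"app": "web-app", "slot": "blue"}
print("before cutover:", endpoints(selector, pods))

selector["slot"] = "green"  # equivalent of the kubectl patch on the Service
print("after cutover: ", endpoints(selector, pods))
```

Dropping the slot key from the selector would match all four pods at once, which is the mechanism the replica-ratio canary pattern relies on.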

Canary Deployment Pattern

A canary release routes a small percentage of traffic to the new version to validate it under real load before full rollout. Two main approaches: replica-ratio (built-in) or Ingress-based weighted routing (more precise).

Replica-Ratio Canary (Simple)

# Stable deployment: 9 replicas
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-app-stable
spec:
  replicas: 9
  selector:
    matchLabels:
      app: web-app
      track: stable      # keeps this selector disjoint from the canary Deployment
  template:
    metadata:
      labels:
        app: web-app
        track: stable
    spec:
      containers:
      - name: web-app
        image: myapp:v1.0.0
---
# Canary deployment: 1 replica (≈10% of traffic if Service uses both)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-app-canary
spec:
  replicas: 1
  selector:
    matchLabels:
      app: web-app
      track: canary      # disjoint from stable so the two Deployments never overlap
  template:
    metadata:
      labels:
        app: web-app     # shared label → matched by the same Service selector
        track: canary    # also used for monitoring differentiation
    spec:
      containers:
      - name: web-app
        image: myapp:v2.0.0
---
# Single Service — load-balances across ALL 10 pods (9 stable + 1 canary = 10% canary)
apiVersion: v1
kind: Service
metadata:
  name: web-app
spec:
  selector:
    app: web-app    # matches both stable and canary pods
  ports:
  - port: 80
    targetPort: 8080
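With replica-ratio canaries the traffic split is only as fine-grained as the pod counts allow. The arithmetic below is a rough sketch that assumes the Service spreads requests evenly across Ready pods; both function names are illustrative.

```python
def canary_share(stable_replicas, canary_replicas):
    """Expected fraction of requests that reach canary pods."""
    total = stable_replicas + canary_replicas
    if total == 0:
        raise ValueError("no pods behind the Service")
    return canary_replicas / total


def stable_replicas_for(target_percent, canary_replicas=1):
    """Smallest stable replica count that keeps the canary at or below target_percent."""
    if not 0 < target_percent <= 100:
        raise ValueError("target_percent must be in (0, 100]")
    # canary / (stable + canary) <= p/100  =>  stable >= canary * (100 - p) / p
    needed = canary_replicas * (100 - target_percent)
    return -(-needed // target_percent)  # integer ceiling division


print(canary_share(9, 1))
print(stable_replicas_for(5))
```

Reaching a 1% canary this way needs 99 stable pods per canary pod, which is the main reason to move to weighted routing at the Ingress or mesh layer.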

Ingress-Based Weighted Canary (Precise)

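# NGINX Ingress Controller canary annotations.
# Assumes a regular (non-canary) Ingress for the same host and path already routes to the stable Service.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: web-app-canary
  annotations:
    nginx.ingress.kubernetes.io/canary: "true"
    nginx.ingress.kubernetes.io/canary-weight: "10"  # roughly 10% of requests (chosen randomly per request)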
    # Header-based canary (bypass weight for internal testing):
    # nginx.ingress.kubernetes.io/canary-by-header: "X-Canary"
    # nginx.ingress.kubernetes.io/canary-by-header-value: "always"
spec:
  rules:
  - host: api.example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: web-app-canary-svc
            port:
              number: 80

HPA Interaction

When an HPA targets a Deployment, the HPA controller writes to spec.replicas. This creates a potential conflict: if you also set spec.replicas in your manifest and apply it, you will override the HPA's current scale. The conventional solution is to omit spec.replicas from the manifest (or set it only on initial creation) and let the HPA own it thereafter.

# WRONG: applying this manifest will reset replicas to 3, overriding HPA
spec:
  replicas: 3    # ← this will stomp the HPA's current value every kubectl apply

# RIGHT: omit replicas from the manifest when HPA is managing it
spec:
  # replicas: field absent — HPA manages this field
  selector:
    matchLabels:
      app: web-app
  template: ...

# OR: use server-side apply with field ownership
# As long as the manifest omits replicas, the apply never claims the field and the HPA's value is left alone
kubectl apply --server-side -f deployment.yaml

HPA minReplicas Protects Against Scale-to-Zero

An HPA with minReplicas: 2 prevents the Deployment from being scaled below 2, even if the metric drops to zero. Always set minReplicas ≥ 2 in production to maintain availability. If minReplicas is left unset it defaults to 1, so a metric dip (e.g., no traffic at 3am) scales the Deployment down to a single replica, leaving it exposed to node failures.
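The effect of minReplicas can be seen in the basic HPA scaling rule, desired = ceil(current × currentMetric / targetMetric), clamped to the configured bounds. The sketch below is a simplification that ignores the tolerance band, stabilization windows, and scaling policies; the function name is illustrative.

```python
import math


def hpa_desired_replicas(current_replicas, current_metric, target_metric,
                         min_replicas=1, max_replicas=10):
    """Simplified HPA recommendation, clamped to [min_replicas, max_replicas]."""
    raw = math.ceil(current_replicas * current_metric / target_metric)
    return max(min_replicas, min(max_replicas, raw))


# Traffic drops to zero overnight: the floor is whatever minReplicas says.
print(hpa_desired_replicas(4, 0, 50))                  # default minReplicas
print(hpa_desired_replicas(4, 0, 50, min_replicas=2))  # production-safe floor
```

The clamp is the only thing standing between an idle metric and a single-replica Deployment, which is why the floor deserves an explicit value.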

Forcing Rollout on ConfigMap/Secret Changes

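Kubernetes does not automatically roll out a Deployment when a referenced ConfigMap or Secret changes. Values consumed as environment variables (or through subPath mounts) stay at their old contents until the pods are replaced; files from regular volume mounts are refreshed eventually, but most applications read them only at startup. Either way, a rollout has to be triggered explicitly.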

# Option 1: kubectl rollout restart (adds restartedAt annotation to pod template)
kubectl rollout restart deployment/web-app

# Option 2: checksum annotation in pod template (GitOps-friendly)
# In your deployment manifest, compute a checksum of the ConfigMap content:
template:
  metadata:
    annotations:
      checksum/config: "sha256:abc123..."   # update this when ConfigMap changes

# With Helm: use the built-in template function:
# checksum/config: {{ include (print $.Template.BasePath "/configmap.yaml") . | sha256sum }}

# Option 3: immutable ConfigMaps with versioned names
# ConfigMap: app-config-v7 → new ConfigMap: app-config-v8
# Update envFrom.configMapRef.name to app-config-v8 in deployment → triggers rollout
# Best for GitOps: the manifest change itself IS the trigger
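For Option 2 outside of Helm, the checksum can be computed by any tool that serializes the ConfigMap data deterministically. The sketch below shows one way to do it; the canonical-JSON encoding and the function name config_checksum are assumptions, and any stable encoding works as long as it is used consistently.

```python
import hashlib
import json


def config_checksum(data):
    """SHA-256 over a canonical JSON encoding of the ConfigMap data."""
    canonical = json.dumps(data, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Hypothetical ConfigMap contents.
old = {"LOG_LEVEL": "info", "FEATURE_X": "off"}
new = {"LOG_LEVEL": "info", "FEATURE_X": "on"}

print("checksum/config:", "sha256:" + config_checksum(old))
print("changed:", config_checksum(old) != config_checksum(new))
```

Because the annotation lives in the pod template, any change to the checksum is a template change and triggers a normal rolling update.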

Image Digest Pinning

# Mutable tag (DANGEROUS): same tag can point to different images on different nodes
image: myapp:v2.1.0

# Immutable digest (SAFE): always the exact same image bytes
image: myapp:v2.1.0@sha256:a3b4c5d6e7f8...

# Get the digest for a tag:
docker pull myapp:v2.1.0 && docker inspect --format='{{index .RepoDigests 0}}' myapp:v2.1.0   # manifest digest, not .config.digest
crane digest myapp:v2.1.0   # github.com/google/go-containerregistry

# Cosign for supply-chain verification:
cosign verify --key cosign.pub myapp:v2.1.0@sha256:a3b4c5d6...

# In CI: resolve tag to digest at build time, pin digest in manifest
# Keel, Renovate, or Flux image automation can manage digest updates
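A CI gate can reject manifests whose images are not digest-pinned. The following is a minimal sketch that checks for a trailing @sha256:<64 hex chars> on every container image in a pod spec dictionary; the helper names are hypothetical and other digest algorithms are deliberately not handled.

```python
import re

# A pinned reference ends in @sha256: followed by 64 lowercase hex characters.
DIGEST_RE = re.compile(r"@sha256:[0-9a-f]{64}$")


def is_digest_pinned(image):
    """True if the image reference carries a full sha256 digest."""
    return bool(DIGEST_RE.search(image))


def unpinned_images(pod_spec):
    """Return image references in a pod spec that lack a digest."""
    containers = pod_spec.get("initContainers", []) + pod_spec.get("containers", [])
    return [c["image"] for c in containers if not is_digest_pinned(c["image"])]


# Hypothetical pod spec: one pinned image, one tag-only image.
pod_spec = {
    "containers": [
        {"name": "web-app", "image": "myapp:v2.1.0@sha256:" + "a" * 64},
        {"name": "sidecar", "image": "logshipper:latest"},
    ]
}
print(unpinned_images(pod_spec))
```

The same check applies unchanged to .spec.template.spec of a Deployment.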

Useful Operational Commands

# Watch rollout progress in real time
kubectl rollout status deployment/web-app --watch

# Force restart without template change
kubectl rollout restart deployment/web-app

# Scale immediately (bypasses HPA until HPA next reconciles)
kubectl scale deployment/web-app --replicas=8

# Update image
kubectl set image deployment/web-app web-app=myapp:v2.2.0

# Set resource requests/limits
kubectl set resources deployment/web-app --containers=web-app \
  --requests=cpu=500m,memory=512Mi --limits=memory=1Gi

# Show all ReplicaSets for a Deployment (to see revision history)
kubectl get rs -l app=web-app --show-labels

# Annotate a revision for change-cause tracking
kubectl annotate deployment/web-app kubernetes.io/change-cause="v2.1.0: fix memory leak"

# Describe to see events and conditions
kubectl describe deployment/web-app

Metrics, Alerts, and Runbooks

Key Metrics

Metric	Source	Alert Condition
`kube_deployment_status_replicas_available`	kube-state-metrics	< `kube_deployment_spec_replicas` for > 5 min
`kube_deployment_status_replicas_updated`	kube-state-metrics	< `kube_deployment_spec_replicas` for > 15 min → rollout stalled
`kube_deployment_status_observed_generation != kube_deployment_metadata_generation`	kube-state-metrics	Controller hasn't processed latest spec update
`kube_deployment_spec_paused`	kube-state-metrics	Deployment paused > 1h (forgotten pause)
`kube_replicaset_status_ready_replicas`	kube-state-metrics	Multiple non-zero RSes (old RS not scaling down = stalled)

Alerting Rules

groups:
- name: deployment-health
  rules:
  - alert: DeploymentReplicasMismatch
    expr: |
      kube_deployment_spec_replicas
        != kube_deployment_status_replicas_available
    for: 5m
    labels:
      severity: warning
    annotations:
      summary: "Deployment {{ $labels.namespace }}/{{ $labels.deployment }} has fewer available replicas than desired"

  - alert: DeploymentRolloutStuck
    expr: |
      kube_deployment_status_replicas_updated
        != kube_deployment_spec_replicas
    for: 15m
    labels:
      severity: warning
    annotations:
      summary: "Deployment {{ $labels.namespace }}/{{ $labels.deployment }} rollout stuck for 15 minutes"

  - alert: DeploymentProgressDeadlineExceeded
    expr: |
      kube_deployment_status_condition{condition="Progressing",status="false"} == 1
    for: 1m
    labels:
      severity: critical
    annotations:
      summary: "Deployment {{ $labels.namespace }}/{{ $labels.deployment }} progress deadline exceeded"

  - alert: DeploymentPausedTooLong
    expr: |
      kube_deployment_spec_paused == 1
    for: 1h
    labels:
      severity: warning
    annotations:
      summary: "Deployment {{ $labels.namespace }}/{{ $labels.deployment }} has been paused for over 1 hour"

Runbooks

Rollout Stuck / ProgressDeadlineExceeded

Check new RS pods: kubectl describe rs NEW_RS_NAME for events. Common causes: image pull error (wrong tag, missing pull secret), readiness probe never passing (app bug, wrong port), resource quota exceeded, admission webhook rejection. Fix the underlying cause and the rollout continues automatically; otherwise roll back with kubectl rollout undo.

Deployment Available < Desired

Check pod status: kubectl get pods -l app=NAME. If pods are Pending: node capacity, affinity, PVC issues. If CrashLoopBackOff: kubectl logs POD --previous for exit reason. If ImagePullBackOff: verify image tag and pull secrets. Check kubectl describe deployment NAME for ReplicaFailure condition.

Bad Deploy — Emergency Rollback

Immediate rollback: kubectl rollout undo deployment/NAME. If the rollout is not yet complete (old RS still has running pods), the rollback scales the old RS back up and the new RS down, which is faster than a fresh rollout. Verify: kubectl rollout status deployment/NAME. If the rollback itself stalls, check whether the old RS pods also fail (for example, a bad ConfigMap is still present).

HPA Fighting Deployment replicas

Symptom: desired replicas keep resetting. Check if HPA exists: kubectl get hpa. If HPA manages the Deployment, remove spec.replicas from your manifest or use server-side apply. If both kubectl and HPA are writing replicas, the last writer wins — use field ownership via SSA to resolve permanently.

ConfigMap Change Not Taking Effect

Pods reference old config. Trigger restart: kubectl rollout restart deployment/NAME. Verify pods restarted: kubectl get pods -l app=NAME -w. For future: adopt versioned ConfigMap names or checksum annotations in pod template. Verify new pods have updated config: kubectl exec POD -- env | grep KEY.

Best Practices

Always define a readinessProbe — a Deployment without a readiness probe marks pods Available immediately after the container starts, before the application is ready. This causes traffic to hit unready pods during rollouts. The readiness probe is the single most impactful zero-downtime configuration.
Use maxUnavailable: 0 for production APIs — this guarantees full capacity throughout the rollout at the cost of needing maxSurge ≥ 1 (temporary extra capacity). For resource-constrained environments, maxSurge: 0, maxUnavailable: 1 accepts brief partial capacity reduction.
Set revisionHistoryLimit: 5 not 0 — keeping zero history prevents rollbacks. 5 revisions is a reasonable balance between rollback capability and etcd storage. Never set to 0 in production.
Record change causes with kubernetes.io/change-cause annotations — kubectl rollout history is meaningless without this. Automate it in your CI pipeline: annotate every deployment with the git commit SHA and PR title.
Pin image digests, not tags — a mutable tag can silently point to a different image digest if re-pushed. Digest-pinned images are fully reproducible and support supply-chain verification with Cosign. Automate digest resolution in CI.
Set minReadySeconds: 10–30 — without a soak time, a pod that passes readiness probe once and immediately crashes is counted as available, causing the rollout to proceed. Even a 10-second soak catches most flapping pod patterns.
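Create a PodDisruptionBudget alongside every Deployment: rolling updates themselves are not limited by PDBs (the Deployment controller deletes pods directly), but PDBs gate voluntary evictions such as kubectl drain. Pods made unavailable by a rollout still count against the budget, so a PDB stops a node drain during a rolling update from evicting pods beyond what the budget allows. Without one, drain plus rollout can remove more pods than maxUnavailable permits and cause an outage. See PodDisruptionBudgets.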
Omit spec.replicas from manifests managed by HPA — applying a manifest with a hard-coded replicas value will override the HPA's current scale on every kubectl apply. Use server-side apply with field ownership, or simply omit the field after initial creation.
