Jobs & CronJobs
Run-to-completion workloads, indexed parallelism, and time-based scheduling
Unlike Deployments or StatefulSets, Jobs and CronJobs model finite work: they create Pods, track their completions, and succeed or fail as a whole unit. Understanding their controller mechanics (completion tracking, retry semantics, parallelism, and scheduling guarantees) is essential for building reliable batch pipelines, ETL jobs, database migrations, and any workload that must run once (or on a schedule) and stop.
Job Controller Mechanics
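The Job controller lives in kube-controller-manager and reconciles the observed Pod states against the desired completion count. On each sync it counts succeeded and failed Pods, creates Pods up to the parallelism limit while completions remain, and marks the Job Complete or Failed once the completion count or a failure limit is reached.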
Key invariant: the Job controller does not use a ReplicaSet intermediary. It owns Pods directly, identified by the auto-generated label controller-uid=<job-uid>. This selector is immutable once set.
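The reconcile decision can be approximated in a few lines. This is a simplified sketch rather than the controller's actual code: it ignores backoff delays, suspension, and failure limits, and the function name `pods_to_create` is hypothetical.

```python
from typing import Optional


def pods_to_create(parallelism: int, completions: Optional[int],
                   active: int, succeeded: int) -> int:
    """Return how many Pods one sync would start (negative means surplus Pods)."""
    if completions is None:
        # Work-queue style Job: once any Pod has succeeded, start nothing new.
        want_active = active if succeeded > 0 else parallelism
    else:
        # Never run more Pods than there are completions still outstanding.
        remaining = max(completions - succeeded, 0)
        want_active = min(parallelism, remaining)
    return want_active - active


if __name__ == "__main__":
    # completions=5, parallelism=2, nothing running yet: start two Pods.
    print(pods_to_create(parallelism=2, completions=5, active=0, succeeded=0))
```

Near the end of a Job the completion count, not parallelism, becomes the limiting factor: with four of five completions done and one Pod active, the sketch starts no further Pods.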
Job Spec Anatomy
apiVersion: batch/v1
kind: Job
metadata:
  name: data-migration-v2
  namespace: platform
  labels:
    app: data-migration
    version: v2
spec:
  # --- Completion semantics ---
  completions: 5  # Total successful pods needed (default: 1)
  parallelism: 2  # Max pods running simultaneously (default: 1)
  completionMode: Indexed  # NonIndexed (default) or Indexed
  # --- Failure handling ---
  backoffLimit: 4  # Max pod failures before Job fails (default: 6)
  backoffLimitPerIndex: 1  # Per-index failures before that index fails (alpha 1.28, beta 1.29)
  activeDeadlineSeconds: 600  # Job-level timeout (wall clock)
  # --- Cleanup ---
  ttlSecondsAfterFinished: 3600  # Auto-delete 1hr after completion
  # --- Suspension ---
  suspend: false  # Set true to pause (deletes active pods)
  # --- Pod replacement ---
  podReplacementPolicy: Failed  # Failed | TerminatingOrFailed; must be Failed when podFailurePolicy is set
  # --- Pod failure policy (1.25+) ---
  podFailurePolicy:
    rules:
      - action: FailJob  # Fail entire Job immediately
        onExitCodes:
          containerName: worker
          operator: In
          values: [42]  # Exit code 42 = non-retriable error
      - action: Ignore  # Don't count toward backoffLimit
        onPodConditions:
          - type: DisruptionTarget  # Node preemption / eviction
      - action: FailIndex  # Fail this index only (Indexed mode)
        onExitCodes:
          operator: In
          values: [1, 2, 127]
  # --- Selector (auto-generated; only set if manualSelector: true) ---
  # selector:
  #   matchLabels:
  #     controller-uid: <job-uid>
  template:
    metadata:
      labels:
        app: data-migration
    spec:
      restartPolicy: Never  # REQUIRED: Never or OnFailure
      containers:
        - name: worker
          image: registry.example.com/migration:v2@sha256:abc123
          env:
            - name: JOB_COMPLETION_INDEX  # Injected by controller (Indexed mode)
              valueFrom:
                fieldRef:
                  fieldPath: metadata.annotations['batch.kubernetes.io/job-completion-index']
          resources:
            requests:
              cpu: 500m
              memory: 512Mi
            limits:
              cpu: 2
              memory: 1Gi
          volumeMounts:
            - name: work-dir
              mountPath: /work
      volumes:
        - name: work-dir
          emptyDir: {}
      # Batch pods typically don't need high availability
      tolerations:
        - key: batch
          operator: Equal
          value: "true"
          effect: NoSchedule
      nodeSelector:
        workload-type: batch
Jobs require restartPolicy: Never or restartPolicy: OnFailure. Always is forbidden: a pod that always restarts never counts as "succeeded." With Never, each failure creates a new Pod (counting against backoffLimit). With OnFailure, the same Pod is restarted in-place (container restart, not pod replacement).
Completion Modes
NonIndexed (default)
Pods are interchangeable. The controller creates pods until completions successful pods exist. Use this when every pod performs the same kind of work (e.g., draining a shared queue).
- No completion index assigned to pods
- Any pod success counts toward total; pod failures are retried up to backoffLimit
- Work distribution is external (e.g., message queue, database cursor)
Indexed (GA 1.24)
Each pod gets a unique stable index from 0 to completions-1. Exactly one pod must succeed at each index for the Job to complete.
# Expose the index from the Pod annotation via the downward API.
# (In Indexed mode the controller also sets JOB_COMPLETION_INDEX automatically.)
env:
  - name: JOB_COMPLETION_INDEX
    valueFrom:
      fieldRef:
        fieldPath: metadata.annotations['batch.kubernetes.io/job-completion-index']
The index can also be read from a static file mounted at /etc/podinfo/job-completion-index if you use a downwardAPI volume.
Indexed Job use cases
Shard-parallel ETL
Each index processes one partition/shard. The index-to-shard mapping is deterministic from the env var.
Distributed training
Index maps to worker rank. Used by PyTorch distributed and MPI-based frameworks.
Parallel test matrix
Each index runs one test suite combination. Scatter-gather after all complete.
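On the worker side, the index-to-shard mapping can be as small as a slice. The sketch below assumes a round-robin assignment; `TOTAL_SHARDS` and the file names are hypothetical placeholders, and only `JOB_COMPLETION_INDEX` comes from Kubernetes.

```python
import os


def shard_slice(items, index, total):
    """Return the items owned by shard `index` out of `total`, round-robin."""
    if not 0 <= index < total:
        raise ValueError(f"index {index} is outside 0..{total - 1}")
    return items[index::total]


if __name__ == "__main__":
    # JOB_COMPLETION_INDEX is set for Indexed Jobs; default to 0 for local runs.
    index = int(os.environ.get("JOB_COMPLETION_INDEX", "0"))
    # TOTAL_SHARDS is a hypothetical variable that should equal spec.completions.
    total = int(os.environ.get("TOTAL_SHARDS", "1"))
    files = [f"part-{n:04d}.csv" for n in range(10)]  # placeholder work list
    for name in shard_slice(files, index, total):
        print(f"shard {index}: processing {name}")
```

Because the mapping depends only on the index and the total, a retried Pod for the same index picks up exactly the same items, which is what makes per-index retries safe for idempotent work.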
Pod Failure Policy (1.25+)
The default behavior counts every pod failure toward backoffLimit. Pod failure policy lets you classify failures and take different actions; critically, it separates non-retriable application errors, which should fail fast, from infrastructure events, which should not burn retries. It requires restartPolicy: Never in the Pod template.
| Action | Effect | Best for |
|---|---|---|
| FailJob | Immediately mark entire Job as failed, delete active pods | Non-retriable exit codes (config error, data corruption) |
| FailIndex | Fail only this index (Indexed mode), others continue | Per-shard non-retriable error without killing the whole job |
| Ignore | Don't count toward backoffLimit, pod failure is recorded but ignored | Node preemption, spot instance reclaim (DisruptionTarget condition) |
| Count | Default: increment failure counter toward backoffLimit | Retriable transient errors |
podFailurePolicy:
  rules:
    # Rule 1: non-retriable application error
    - action: FailJob
      onExitCodes:
        containerName: worker
        operator: In
        values: [42, 43]  # Exit 42/43 = configuration / data error
    # Rule 2: OOM, an infrastructure issue: retry
    - action: Count
      onExitCodes:
        operator: In
        values: [137]  # SIGKILL (OOM)
    # Rule 3: preemption / eviction: don't waste retries
    - action: Ignore
      onPodConditions:
        - type: DisruptionTarget
    # Rule 4: per-index isolation (Indexed jobs only)
    - action: FailIndex
      onExitCodes:
        operator: NotIn
        values: [0]  # Any non-zero exit in this index = index fails
Kubernetes 1.25+ sets the DisruptionTarget pod condition when a Pod is terminated due to node pressure eviction, preemption, or kubectl drain. Using Ignore on this condition prevents spot instance reclaims from burning your retry budget.
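Rules are evaluated in order and the first match wins, so rule ordering matters. The following is a minimal model of that evaluation, assuming one exit code per failed Pod and ignoring containerName; the function names are hypothetical and this is not the controller's implementation.

```python
def rule_matches(rule, exit_code, conditions):
    """Check one rule against a failed Pod's exit code and condition types."""
    if "onExitCodes" in rule:
        # Exit-code rules only look at containers that exited non-zero.
        if exit_code is None or exit_code == 0:
            return False
        hit = exit_code in rule["onExitCodes"]["values"]
        return hit if rule["onExitCodes"]["operator"] == "In" else not hit
    if "onPodConditions" in rule:
        return any(c["type"] in conditions for c in rule["onPodConditions"])
    return False


def decide(rules, exit_code, conditions=frozenset()):
    """Return the action of the first matching rule, or Count if none match."""
    for rule in rules:
        if rule_matches(rule, exit_code, conditions):
            return rule["action"]
    return "Count"


# The four rules from the policy above, in the same order.
RULES = [
    {"action": "FailJob", "onExitCodes": {"operator": "In", "values": [42, 43]}},
    {"action": "Count", "onExitCodes": {"operator": "In", "values": [137]}},
    {"action": "Ignore", "onPodConditions": [{"type": "DisruptionTarget"}]},
    {"action": "FailIndex", "onExitCodes": {"operator": "NotIn", "values": [0]}},
]

if __name__ == "__main__":
    print(decide(RULES, 137, {"DisruptionTarget"}))
```

In this model an evicted Pod whose container was killed with exit code 137 hits the Count rule before the Ignore rule. If disruptions should never consume retries, placing the DisruptionTarget rule first is one way to get that behavior.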
Per-Index Backoff (1.28+)
backoffLimitPerIndex applies backoffLimit semantics independently to each index. Without it, failures across all indexes share a single global counter, so one pathologically failing index can exhaust retries for all others.
spec:
  completions: 100
  parallelism: 10
  completionMode: Indexed
  backoffLimit: 10000  # High global limit (not meaningful with perIndex)
  backoffLimitPerIndex: 2  # Each index may fail 2 times before that index fails
  maxFailedIndexes: 5  # Job fails if more than 5 indexes fail (optional)
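The difference between the two counters is easy to see in a toy model. This sketch assumes a limit is exceeded when the failure count is strictly greater than it, and represents history as a list holding the index of each failed Pod; the function names are hypothetical.

```python
from collections import Counter


def fails_on_global_limit(failures, backoff_limit):
    """Global counter: every failure, from any index, draws on one budget."""
    return len(failures) > backoff_limit


def failed_indexes(failures, backoff_limit_per_index):
    """Per-index counters: only indexes that exceed their own budget fail."""
    counts = Counter(failures)
    return sorted(i for i, n in counts.items() if n > backoff_limit_per_index)


def fails_on_max_failed_indexes(failures, backoff_limit_per_index, max_failed_indexes):
    """The Job fails once more than max_failed_indexes indexes have failed."""
    return len(failed_indexes(failures, backoff_limit_per_index)) > max_failed_indexes


if __name__ == "__main__":
    history = [7, 7, 7, 3]  # index 7 failed three times, index 3 once
    print(fails_on_global_limit(history, backoff_limit=3))
    print(failed_indexes(history, backoff_limit_per_index=2))
```

With the same history, a global backoffLimit of 3 fails the whole Job, while a per-index limit of 2 fails only index 7 and lets index 3 keep retrying.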
Suspend & Resume
Setting spec.suspend: true pauses a Job: all active Pods are deleted (terminated gracefully), and no new Pods are created. The Job's Suspended condition is set to True. Resuming (setting suspend: false) restores scheduling.
# Suspend a running job
kubectl patch job data-migration-v2 -p '{"spec":{"suspend":true}}'
# Resume it
kubectl patch job data-migration-v2 -p '{"spec":{"suspend":false}}'
# Check suspension status
kubectl get job data-migration-v2 -o jsonpath='{.status.conditions}'
Suspend/resume is the foundation for external job schedulers (Volcano, YuniKorn, Apache Airflow Kubernetes executor) that need to queue Jobs without creating Pods until resources are available.
TTL-based Cleanup
Finished Jobs (succeeded or failed) accumulate indefinitely without cleanup. The TTL-after-finished controller (GA 1.23) auto-deletes Jobs and their owned Pods after a configurable delay.
spec:
  ttlSecondsAfterFinished: 86400  # Delete 24 hours after finish
  # 0 = delete immediately after finish (cascade deletes pods too)
  # omit = never auto-delete (manual cleanup required)
CronJobs manage their own Job history via successfulJobsHistoryLimit and failedJobsHistoryLimit. If you also set ttlSecondsAfterFinished on the Job template, the TTL controller may delete Jobs before CronJob history limits are evaluated. Use one mechanism or the other, not both.
Pod Replacement Policy
podReplacementPolicy (1.28+) controls when replacement pods are created:
| Policy | Replacement created when | Use case |
|---|---|---|
| TerminatingOrFailed (default) | Pod has deletionTimestamp (terminating) OR Failed phase | Long graceful termination; don't wait for full shutdown before scheduling replacement |
| Failed | Pod reaches Failed phase (all containers terminated) | Never run old and new pods side by side; the only allowed value, and the default, when podFailurePolicy is set |
Job Patterns
Pattern 1: Work Queue (NonIndexed)
Pods pull tasks from an external queue (Redis, SQS, RabbitMQ, Kafka). When the queue is empty, pods exit 0. Set parallelism to the number of workers you want running and leave completions unset; once a worker succeeds and the rest have exited (having drained the queue), the Job completes.
spec:
  # completions unset (null): the Job succeeds once at least one pod has
  # succeeded and all pods have terminated (work queue pattern)
  completions: null
  parallelism: 5  # Number of concurrent workers
  template:
    spec:
      restartPolicy: OnFailure
      containers:
        - name: worker
          image: queue-worker:v1
          env:
            - name: QUEUE_URL
              value: redis://redis-svc:6379/0
When completions is null and completionMode: NonIndexed, the Job succeeds when at least one pod succeeds and all pods have terminated. This is the classic work-queue model where workers self-terminate when the queue is empty.
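A self-terminating worker follows a simple shape: pull until the queue is empty, then exit 0. The sketch below uses the standard library's in-process `queue.Queue` purely as a stand-in for Redis, SQS, or similar; the `drain` helper is hypothetical, and a real worker would also need visibility timeouts or acknowledgements from its queue client.

```python
import queue
import sys


def drain(tasks, handle):
    """Process tasks until the queue is empty; return how many were handled."""
    handled = 0
    while True:
        try:
            item = tasks.get_nowait()
        except queue.Empty:
            # Nothing left to do: the caller should exit 0 so the Pod succeeds.
            return handled
        handle(item)
        handled += 1


if __name__ == "__main__":
    work = queue.Queue()
    for n in range(3):
        work.put(f"task-{n}")
    count = drain(work, lambda item: print("processed", item))
    print(f"queue empty after {count} tasks")
    sys.exit(0)
```

An unhandled exception in the handler produces a non-zero exit, which the Job treats as a failure and retries according to restartPolicy and backoffLimit.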
Pattern 2: Indexed Fan-out / Fan-in
# Stage 1: fan-out (Indexed Job)
apiVersion: batch/v1
kind: Job
metadata:
  name: process-shards
spec:
  completions: 32
  parallelism: 8
  completionMode: Indexed
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: processor
          image: shard-processor:v1
          command: ["/bin/sh", "-c"]
          args:
            - |
              INDEX=${JOB_COMPLETION_INDEX}
              # Process shard $INDEX of 32 total shards
              ./process-shard --index=${INDEX} --total=32 --output=s3://bucket/shards/${INDEX}
          env:
            - name: JOB_COMPLETION_INDEX
              valueFrom:
                fieldRef:
                  fieldPath: metadata.annotations['batch.kubernetes.io/job-completion-index']
---
# Stage 2: fan-in (separate Job triggered by CI/workflow engine after fan-out completes)
apiVersion: batch/v1
kind: Job
metadata:
  name: merge-shards
spec:
  completions: 1
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: merger
          image: shard-merger:v1
          command: ["./merge-shards", "--input=s3://bucket/shards/", "--count=32"]
Pattern 3: Database Migration
apiVersion: batch/v1
kind: Job
metadata:
  name: db-migrate-v3-2-0
  annotations:
    # Annotations for the audit trail
    migration/version: "3.2.0"
    migration/type: "schema"
spec:
  completions: 1
  parallelism: 1
  backoffLimit: 0  # Schema migrations are NOT idempotent: fail fast
  activeDeadlineSeconds: 300
  ttlSecondsAfterFinished: 604800  # Keep for 1 week for debugging
  template:
    spec:
      restartPolicy: Never
      initContainers:
        - name: wait-for-db
          image: busybox:1.36
          command: ['sh', '-c', 'until nc -z postgres-svc 5432; do sleep 2; done']
      containers:
        - name: migrator
          image: myapp:v3.2.0
          command: ["./migrate", "--direction=up", "--target=3.2.0"]
          env:
            - name: DATABASE_URL
              valueFrom:
                secretKeyRef:
                  name: db-credentials
                  key: url
          resources:
            requests:
              cpu: 100m
              memory: 128Mi
Sidecar Termination Problem
Before Kubernetes 1.28, running sidecars (e.g., Istio envoy, Datadog agent, Vault agent) alongside a Job was problematic: when the main container exits 0, the Job wants to complete, but the sidecar is still running, so the Pod never reaches the Succeeded phase.
# Native sidecar in a Job (1.28+, stable 1.33)
spec:
  template:
    spec:
      restartPolicy: Never
      initContainers:
        - name: vault-agent
          image: vault:1.15
          restartPolicy: Always  # sidecar designation
          args: ["agent", "-config=/vault/config"]
          volumeMounts:
            - name: vault-config
              mountPath: /vault/config
        - name: datadog-agent
          image: datadog/agent:7
          restartPolicy: Always  # sidecar designation
          env:
            - name: DD_API_KEY
              valueFrom:
                secretKeyRef:
                  name: datadog-secret
                  key: api-key
      containers:
        - name: worker
          image: batch-worker:v2
          command: ["./process"]
With restartPolicy: Always in an initContainer, Kubernetes treats it as a sidecar: it starts before regular containers, receives SIGTERM when the main container exits, and its exit does not fail the Pod. This cleanly solves the Job sidecar problem without shell hacks.
CronJob
CronJob creates Job objects on a time-based schedule. The CronJob controller runs in kube-controller-manager and periodically checks whether a new Job should be spawned based on the schedule and concurrency policy.
CronJob Spec
apiVersion: batch/v1
kind: CronJob
metadata:
  name: nightly-report
  namespace: analytics
spec:
  # --- Schedule ---
  schedule: "0 2 * * *"  # Every day at 02:00
  timeZone: "America/New_York"  # Named timezone (GA 1.27); if unset, the controller-manager's local zone (usually UTC)
  # --- Concurrency ---
  concurrencyPolicy: Forbid  # Allow | Forbid | Replace
  # --- Missed schedule deadline ---
  startingDeadlineSeconds: 300  # Allow up to 5 min late start; nil = no deadline
  # --- History ---
  successfulJobsHistoryLimit: 3  # Keep last 3 successful Jobs (default: 3)
  failedJobsHistoryLimit: 1  # Keep last 1 failed Job (default: 1)
  # --- Suspension ---
  suspend: false  # Suspend scheduling (existing Jobs unaffected)
  jobTemplate:
    spec:
      backoffLimit: 2
      activeDeadlineSeconds: 3600
      ttlSecondsAfterFinished: 86400
      template:
        spec:
          restartPolicy: OnFailure
          containers:
            - name: reporter
              image: analytics/reporter:v3
              # env values are not shell-expanded, so compute the date at runtime
              command: ["/bin/sh", "-c"]
              args:
                - |
                  export REPORT_DATE="$(date -d 'yesterday' +%Y-%m-%d)"
                  exec /app/reporter  # hypothetical entrypoint; use the image's real command
              resources:
                requests:
                  cpu: 500m
                  memory: 1Gi
                limits:
                  cpu: 2
                  memory: 4Gi
Cron Schedule Syntax
| Field | Range | Special chars |
|---|---|---|
| Minute | 0-59 | * , - / |
| Hour | 0-23 | * , - / |
| Day of month | 1-31 | * , - / ? |
| Month | 1-12 or JAN-DEC | * , - / |
| Day of week | 0-6 (Sun=0) or SUN-SAT | * , - / ? |
| Example schedule | Meaning |
|---|---|
| 0 * * * * | Every hour at :00 |
| */15 * * * * | Every 15 minutes |
| 0 2 * * * | Daily at 02:00 UTC |
| 0 9 * * 1-5 | Weekdays at 09:00 |
| 0 0 1 * * | Monthly, 1st day at midnight |
| 0 0 * * 0 | Every Sunday at midnight |
| @hourly | Macro = 0 * * * * |
| @daily | Macro = 0 0 * * * |
| @weekly | Macro = 0 0 * * 0 |
| @monthly | Macro = 0 0 1 * * |
Time Zones (GA 1.27)
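Before the timeZone field (GA 1.27), CronJob schedules were interpreted in the kube-controller-manager's local time zone, which is UTC on most clusters; teams worked around this by shifting schedule values manually. The timeZone field accepts IANA timezone names from the tz database.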
spec:
  schedule: "0 9 * * 1-5"  # 09:00 local time, weekdays
  timeZone: "Europe/Berlin"  # CET/CEST automatically handled
# Common zones
#   America/New_York     UTC-5/UTC-4 (EST/EDT)
#   America/Los_Angeles  UTC-8/UTC-7 (PST/PDT)
#   Asia/Tokyo           UTC+9
#   Australia/Sydney     UTC+10/UTC+11
#   UTC                  Always UTC (explicit is better than implicit)
When a clock change makes a time slot ambiguous (e.g., 02:30 appears twice during fall-back) or skipped (spring-forward), the CronJob controller uses the first occurrence. Schedules at midnight in DST-observing zones can shift by one hour seasonally, so audit CronJobs for timezone-sensitive business logic.
Concurrency Policy
| Policy | Behavior when previous Job still running | Use case |
|---|---|---|
| Allow (default) | Create new Job anyway; multiple Jobs may run simultaneously | Independent periodic tasks; each run is isolated |
| Forbid | Skip this schedule tick; record a missed schedule | Non-reentrant jobs (DB maintenance, cache warm-up) |
| Replace | Delete the current running Job, create a new one | Stateless jobs that must always run on fresh data; old run is stale |
With concurrencyPolicy: Replace, the in-flight Job is forcefully deleted (all its Pods are terminated) before the new Job starts. Any partially completed work is lost. Only use Replace when jobs are fully idempotent and partial runs have no side effects.
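The three policies reduce to a small decision at each schedule tick. This is a sketch of that decision as described in the table above, not the controller's code; `on_tick` and its return shape are hypothetical.

```python
def on_tick(policy, active_jobs):
    """Return (jobs_to_delete, create_new_job) for a single schedule tick."""
    if not active_jobs or policy == "Allow":
        # Nothing is running, or overlap is permitted: just start the new Job.
        return [], True
    if policy == "Forbid":
        # A previous run is still active: skip this tick entirely.
        return [], False
    if policy == "Replace":
        # Delete every still-running Job, then start a fresh one.
        return list(active_jobs), True
    raise ValueError(f"unknown concurrencyPolicy: {policy!r}")


if __name__ == "__main__":
    running = ["nightly-report-28391040"]  # placeholder Job name
    for policy in ("Allow", "Forbid", "Replace"):
        print(policy, on_tick(policy, running))
```

The policies only differ when a previous run is still active; with no active Jobs all three simply create the next Job.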
Missed Schedules & startingDeadlineSeconds
If the CronJob controller is unavailable (controller-manager downtime, cluster upgrade) or a Job is stuck, schedule ticks may be missed. The controller catches up by counting missed schedules since the last successful run.
If more than 100 schedule ticks have been missed since the last run (or since the CronJob was created), the controller stops scheduling entirely and logs:
Cannot determine if job needs to be started. Too many missed start time (> 100). Set or decrease .spec.startingDeadlineSeconds or check clock skew.
This is a hard stop: the CronJob will not auto-recover. You must manually delete and recreate it, or update the schedule to reset the counter.
spec:
  startingDeadlineSeconds: 300
  # If the job cannot start within 5 minutes of its scheduled time,
  # skip this occurrence and wait for the next schedule tick.
  # null (default) = no deadline; may trigger the 100-missed-schedule trap
  # if controller is down for many schedule periods
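Why startingDeadlineSeconds sidesteps the 100-missed limit becomes clearer with numbers. The sketch below simplifies the schedule to a fixed interval in seconds (ticks at exact multiples of the interval) instead of parsing cron expressions, and assumes the deadline shortens the look-back window to the last `starting_deadline` seconds; the names are hypothetical.

```python
MAX_MISSED = 100  # threshold quoted in the error message above


def missed_ticks(last_schedule, now, interval, starting_deadline=None):
    """Count ticks in (window_start, now]; all times are seconds since an epoch."""
    window_start = last_schedule
    if starting_deadline is not None:
        # With a deadline, only the most recent slice of time is examined.
        window_start = max(window_start, now - starting_deadline)
    return now // interval - window_start // interval


def too_many_missed(last_schedule, now, interval, starting_deadline=None):
    return missed_ticks(last_schedule, now, interval, starting_deadline) > MAX_MISSED


if __name__ == "__main__":
    outage = 112 * 60  # controller down for 112 minutes, schedule runs every minute
    print(missed_ticks(0, outage, 60), too_many_missed(0, outage, 60))
    print(missed_ticks(0, outage, 60, 200), too_many_missed(0, outage, 60, 200))
```

For an every-minute schedule, a 112-minute outage exceeds the limit when no deadline is set, while a 200-second deadline keeps the count in the single digits.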
lastScheduleTime in the CronJob status shows when the last Job was successfully spawned. Use this to detect scheduling staleness:
# Check last schedule time
kubectl get cronjob nightly-report -o jsonpath='{.status.lastScheduleTime}'
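# List all Jobs owned by this CronJob (matched via ownerReferences;
# a label selector only works if the jobTemplate sets its own labels)
kubectl get jobs -o json | jq -r '.items[] | select(any(.metadata.ownerReferences[]?; .kind == "CronJob" and .name == "nightly-report")) | .metadata.name'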
# Manually trigger a CronJob (create Job from template)
kubectl create job --from=cronjob/nightly-report nightly-report-manual-$(date +%s)
History Limits
CronJob prunes old Jobs to prevent unbounded accumulation. The controller keeps the N most recent Jobs of each type.
| Field | Default | Recommendation |
|---|---|---|
| successfulJobsHistoryLimit | 3 | 3-10 for debugging; 0 keeps no successful Jobs (not recommended for prod) |
| failedJobsHistoryLimit | 1 | 3-5 to preserve failure logs; higher if jobs are long-lived |
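The pruning rule (keep the N most recent Jobs of each type) can be sketched as follows. The tuple layout and the function name are hypothetical, and ordering by finish time is an assumption of this sketch.

```python
def jobs_to_prune(finished, successful_limit=3, failed_limit=1):
    """finished: list of (name, finish_time, succeeded). Return names to delete."""
    doomed = []
    for succeeded, limit in ((True, successful_limit), (False, failed_limit)):
        # Oldest first, so everything before the last `limit` entries is pruned.
        group = sorted((job for job in finished if job[2] == succeeded),
                       key=lambda job: job[1])
        excess = max(len(group) - limit, 0)
        doomed += [name for name, _, _ in group[:excess]]
    return doomed


if __name__ == "__main__":
    history = [("run-1", 1, True), ("run-2", 2, True), ("run-3", 3, True),
               ("run-4", 4, True), ("run-5", 5, False), ("run-6", 6, False)]
    print(jobs_to_prune(history))
```

Successful and failed Jobs are counted separately, so a burst of failures never evicts the last few successful runs and vice versa.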
When a Job is deleted (by history limits or TTL), its Pods and their logs are deleted too; only copies already shipped to a log aggregation pipeline (Loki, Elasticsearch, Datadog) survive. Always ship batch job logs to external storage rather than relying on history limits for debugging.
Resource Management for Batch
PriorityClass for Batch Isolation
# Batch priority class: lower than production workloads
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: batch-low
value: 100  # Production = 1000; system-cluster-critical = 2000000000
globalDefault: false
preemptionPolicy: Never  # Batch should not preempt production pods
description: "Low-priority batch workloads"
---
# Interactive/short jobs
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: batch-high
value: 500
globalDefault: false
preemptionPolicy: PreemptLowerPriority
description: "Time-sensitive batch jobs (e.g., SLA-bound reports)"
---
# Use in Job template
spec:
  template:
    spec:
      priorityClassName: batch-low
Dedicated Batch Nodes
# Taint batch nodes to prevent non-batch workloads from landing there
kubectl taint nodes batch-node-1 batch=true:NoSchedule
kubectl taint nodes batch-node-2 batch=true:NoSchedule
# Label batch nodes
kubectl label nodes batch-node-1 workload-type=batch
kubectl label nodes batch-node-2 workload-type=batch
# Job template tolerations + nodeSelector for batch nodes
spec:
  template:
    spec:
      tolerations:
        - key: batch
          operator: Equal
          value: "true"
          effect: NoSchedule
      nodeSelector:
        workload-type: batch
      # Low priority with preemptionPolicy: Never, so batch pods never preempt production pods
      priorityClassName: batch-low
Right-sizing Batch Resources
Batch jobs often have predictable resource profiles. Over-requesting wastes cluster capacity; under-requesting causes OOM kills that count as failures.
resources:
  requests:
    cpu: "500m"  # What the scheduler uses for placement
    memory: "1Gi"  # Set to p95 of observed usage
  limits:
    cpu: "4"  # Generous CPU limit (throttling is recoverable)
    memory: "2Gi"  # Tight memory limit = OOM = pod failure
    # Memory limit should be >= p99.9 of usage
For batch jobs processing variable-size inputs (e.g., large files), memory needs may vary per run. Either provision generously, use VPA recommendations (see VPA page), or implement input-size-based resource selection in your workflow engine. Pod OOM kills increment the backoff counter.
KEDA for Event-Driven Job Scaling
KEDA (Kubernetes Event-Driven Autoscaling) can trigger Jobs based on queue depth, creating zero Jobs when the queue is empty and scaling to N Jobs as messages accumulate. This complements CronJob (time-based) with event-driven batch execution.
apiVersion: keda.sh/v1alpha1
kind: ScaledJob
metadata:
  name: queue-processor
  namespace: platform
spec:
  jobTargetRef:
    parallelism: 5
    completions: 5
    backoffLimit: 3
    template:
      spec:
        restartPolicy: Never
        containers:
          - name: processor
            image: queue-processor:v2
            resources:
              requests:
                cpu: 200m
                memory: 256Mi
  pollingInterval: 30  # Check queue every 30 seconds
  maxReplicaCount: 50  # Max concurrent Jobs
  scalingStrategy:
    strategy: "accurate"  # Create one Job per queue message
  triggers:
    - type: redis
      metadata:
        address: redis-svc.platform.svc.cluster.local:6379
        listName: work-queue
        listLength: "5"  # One Job per 5 messages (batch processing)
Operational Commands
# --- Job operations ---
# List jobs with status
kubectl get jobs -n platform -o wide
# Watch a job complete
kubectl get job data-migration-v2 -w
# Get job status details
kubectl describe job data-migration-v2
# Get job completion ratio
kubectl get job data-migration-v2 \
-o jsonpath='{.status.succeeded}/{.spec.completions} succeeded'
# View logs from all pods of a job
kubectl logs -l job-name=data-migration-v2 --all-containers
# Delete finished jobs older than 1 day (manual cleanup)
kubectl get jobs -o json | jq -r \
'.items[] | select(.status.completionTime != null) |
select(now - (.status.completionTime | fromdateiso8601) > 86400) |
.metadata.name' | xargs -I{} kubectl delete job {}
# --- CronJob operations ---
# List cronjobs with schedule and last schedule time
kubectl get cronjobs -o wide
# Manually trigger a CronJob
kubectl create job --from=cronjob/nightly-report nightly-report-manual-$(date +%s)
# Suspend a CronJob (stop new jobs from being created)
kubectl patch cronjob nightly-report -p '{"spec":{"suspend":true}}'
# Resume a CronJob
kubectl patch cronjob nightly-report -p '{"spec":{"suspend":false}}'
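# List Jobs created by a specific CronJob (matched via ownerReferences)
kubectl get jobs -o json | jq -r '.items[] | select(any(.metadata.ownerReferences[]?; .kind == "CronJob" and .name == "nightly-report")) | .metadata.name'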
# View CronJob events (missed schedules, etc.)
kubectl describe cronjob nightly-report
# --- Indexed Job debugging ---
# Get pods for each index
kubectl get pods -l job-name=process-shards -L batch.kubernetes.io/job-completion-index
# Get logs for specific index (combine selectors with a comma; a repeated -l flag overrides the earlier one)
kubectl logs -l job-name=process-shards,batch.kubernetes.io/job-completion-index=3
Job Status Fields
| Field | Type | Description |
|---|---|---|
| status.active | int | Number of currently running pods |
| status.succeeded | int | Number of successfully completed pods |
| status.failed | int | Number of failed pods (total, not just toward backoffLimit) |
| status.completedIndexes | string | Compact range notation of completed indexes (Indexed mode) |
| status.failedIndexes | string | Compact range notation of failed indexes (with backoffLimitPerIndex) |
| status.startTime | time | When the Job was acknowledged by the controller |
| status.completionTime | time | When the Job completed; set only when the Job finishes successfully |
| status.conditions | []Condition | Complete, Failed, Suspended, FailureTarget |
| status.uncountedTerminatedPods | object | Pods that terminated but haven't been counted yet (transient) |
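The compact range notation used by completedIndexes and failedIndexes (for example `1,3-5,7`) is straightforward to expand when scripting around Job status. A minimal sketch, with hypothetical helper names:

```python
def expand_indexes(compact):
    """Expand a compact index string such as '1,3-5,7' into a list of ints."""
    result = []
    for part in compact.split(","):
        part = part.strip()
        if not part:
            continue  # an empty status field means no indexes yet
        first, _, last = part.partition("-")
        result.extend(range(int(first), int(last or first) + 1))
    return result


def pending_indexes(completions, completed, failed=""):
    """Indexes that are neither completed nor failed yet."""
    done = set(expand_indexes(completed)) | set(expand_indexes(failed))
    return [i for i in range(completions) if i not in done]


if __name__ == "__main__":
    print(expand_indexes("1,3-5,7"))
    print(pending_indexes(8, "1,3-5,7", failed="0"))
```

Feeding it the output of `kubectl get job <job-name> -o jsonpath='{.status.completedIndexes}'` gives the list of indexes still outstanding.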
Metrics
| Metric | Labels | Use |
|---|---|---|
| kube_job_status_active | job_name, namespace | Currently running pods in a Job |
| kube_job_status_succeeded | job_name, namespace | Successful completions |
| kube_job_status_failed | job_name, namespace | Cumulative failures |
| kube_job_complete | job_name, namespace, condition | 1 on the series whose condition label matches the Job's Complete condition (true/false/unknown) |
| kube_cronjob_next_schedule_time | cronjob, namespace | Unix timestamp of next scheduled execution |
Alerting Rules
groups:
  - name: jobs-cronjobs
    rules:
      # Job failed
      - alert: JobFailed
        expr: kube_job_failed{condition="true"} > 0
        for: 0m
        labels:
          severity: critical
        annotations:
          summary: "Job {{ $labels.namespace }}/{{ $labels.job_name }} has failed"
          description: "Check logs: kubectl logs -l job-name={{ $labels.job_name }} -n {{ $labels.namespace }}"
      # Job taking too long (no activeDeadlineSeconds set)
      - alert: JobStalled
        expr: |
          kube_job_status_active > 0
          and (time() - kube_job_status_start_time) > 7200
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Job {{ $labels.job_name }} has been running for > 2 hours"
      # CronJob not scheduled on time
      - alert: CronJobMissedSchedule
        expr: |
          time() - kube_cronjob_status_last_schedule_time > 3600
          unless kube_cronjob_spec_suspend == 1
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "CronJob {{ $labels.namespace }}/{{ $labels.cronjob }} missed last schedule"
      # CronJob suspended unexpectedly
      - alert: CronJobSuspended
        expr: kube_cronjob_spec_suspend == 1
        for: 1h
        labels:
          severity: info
        annotations:
          summary: "CronJob {{ $labels.cronjob }} has been suspended for > 1 hour"
Runbooks
Job Stuck / Not Completing
# 1. Check job status and conditions
kubectl describe job <job-name> -n <namespace>
# 2. Check pod states
kubectl get pods -l job-name=<job-name> -n <namespace>
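# 3. Look at logs from failed pods
kubectl get pods -l job-name=<job-name> -n <namespace> --field-selector=status.phase=Failed
kubectl logs <failed-pod-name> -n <namespace>  # add --previous for containers restarted under OnFailure
# 4. Check if backoffLimit exhausted
kubectl get job <job-name> -n <namespace> -o jsonpath='{.status.failed}/{.spec.backoffLimit}'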
# 5. Check for resource constraints (Pending pods)
kubectl describe pods -l job-name=<job-name> -n <namespace> | grep -A5 Events
CronJob Stopped Scheduling (100-missed limit)
# Check events
kubectl describe cronjob <name> -n <namespace> | grep -A20 Events
# If "Too many missed start time" error:
# Option 1: delete and recreate (loses history)
kubectl delete cronjob <name> -n <namespace>
kubectl apply -f cronjob.yaml
# Option 2: add/reduce startingDeadlineSeconds to reset the window
kubectl patch cronjob <name> -p '{"spec":{"startingDeadlineSeconds":300}}'
# Option 3: check for controller-manager issues
kubectl logs -n kube-system -l component=kube-controller-manager | grep cronjob
Indexed Job Index Stuck
# Which indexes are complete?
kubectl get job <job-name> -o jsonpath='{.status.completedIndexes}'
# Which indexes failed (with backoffLimitPerIndex)?
kubectl get job <job-name> -o jsonpath='{.status.failedIndexes}'
# Get pods for a specific index
kubectl get pods -l job-name=<job-name>,batch.kubernetes.io/job-completion-index=5
# Retry a specific index by setting backoffLimitPerIndex higher and re-patching
# (Must be done before the index is in terminal Failed state)
Manually Triggering a CronJob
# One-shot manual run from CronJob template
JOB_NAME="<cronjob-name>-manual-$(date +%s)"
kubectl create job --from=cronjob/<cronjob-name> "$JOB_NAME" -n <namespace>
# Monitor the manually triggered Job
kubectl get job "$JOB_NAME" -n <namespace> -w
Job Pod Failures Due to Node Preemption
# Check if DisruptionTarget condition is set on failed pods
kubectl get pods -l job-name=<job-name> -o json | \
jq '.items[].status.conditions[] | select(.type=="DisruptionTarget")'
# Solution: add podFailurePolicy to Ignore DisruptionTarget (see above)
# This prevents preemption from consuming backoffLimit retries
CronJob Job Not Appearing
# Check if CronJob is suspended
kubectl get cronjob <name> -o jsonpath='{.spec.suspend}'
# Check concurrencyPolicy blocking new jobs
kubectl get cronjob <name> -o jsonpath='{.spec.concurrencyPolicy}'
kubectl get jobs --sort-by=.metadata.creationTimestamp -o json | jq -r '.items[] | select(any(.metadata.ownerReferences[]?; .kind == "CronJob" and .name == "<name>")) | .metadata.name'
# Check controller-manager logs for scheduling decisions
kubectl logs -n kube-system -l component=kube-controller-manager --tail=200 | grep <name>
Best Practices
- Always set activeDeadlineSeconds: without it, a hung Job runs forever and its pods accumulate. Set it to 2-3× the expected runtime.
- Use ttlSecondsAfterFinished to prevent Job accumulation. Even with CronJob history limits, standalone Jobs need explicit cleanup.
- Set backoffLimit: 0 for non-idempotent operations (schema migrations, one-time data fixes). Retrying a migration that partially ran can corrupt data.
- Add podFailurePolicy with DisruptionTarget: Ignore on any Job that runs on spot or preemptible nodes; this prevents spot reclaims from wasting retry budget.
- Use Indexed completion mode for sharded workloads: deterministic index-to-shard mapping eliminates distributed coordination overhead.
- Use backoffLimitPerIndex for large Indexed jobs; it prevents one hot shard from exhausting global retries.
- Never set concurrencyPolicy: Replace on stateful jobs: the in-flight job is deleted without waiting for graceful termination. Only safe for truly idempotent, stateless work.
- Set explicit timeZone on CronJobs: UTC-only schedules create operational confusion for teams in non-UTC zones; DST errors cause missed or doubled runs.
Job vs CronJob vs Deployment
| Dimension | Job | CronJob | Deployment |
|---|---|---|---|
| Lifecycle | Run-to-completion | Recurring run-to-completion | Long-running (always on) |
| Pod restartPolicy | Never or OnFailure | Never or OnFailure | Always |
| Completion tracking | Yes (succeeded count) | Yes (per-Job) | No (desired replicas) |
| Scheduling | Immediate on creation | Time-based (cron) | Continuous |
| History | TTL or manual | successfulJobsHistoryLimit | ReplicaSet revisions |
| Parallelism | spec.parallelism | Via Job template | spec.replicas |
| Failure semantics | backoffLimit, podFailurePolicy | Inherited from Job | Restart controller |