
Container Runtime Interface (CRI)

The Container Runtime Interface is the gRPC API contract between the kubelet and the container runtime. It was introduced in Kubernetes 1.5, gained its v1 API in 1.20, and became the only supported path into container runtimes when Dockershim was removed in Kubernetes 1.24. Every call the kubelet makes to start, stop, and inspect containers goes through CRI: no direct OCI calls, no Docker socket, only the gRPC services defined in staging/src/k8s.io/cri-api/pkg/apis/runtime/v1/api.proto.

Why CRI Exists

Before CRI, the kubelet contained substantial Docker-specific code and supported rkt through a separate in-tree code path. (Dockershim came later, as the in-tree CRI adapter for Docker.) The coupling caused several problems:

Before CRI (pre-1.5)

  • Docker-specific calls hardcoded in kubelet
  • Each new runtime required kubelet code changes
  • Runtime bugs required Kubernetes releases to fix
  • Docker upgrade lockstep with Kubernetes version
  • No standard interface for testing or validation

After CRI (1.5+)

  • Single gRPC API, versioned independently
  • Runtimes ship their own shims; kubelet unchanged
  • Runtime bugs fixed in runtime releases
  • containerd, CRI-O, any future runtime hot-swappable
  • CRI conformance test suite validates correctness

Dockershim Removal Timeline

Dockershim was deprecated in Kubernetes 1.20 (Dec 2020) and removed in 1.24 (May 2022). Docker Engine itself still works — but now through the cri-dockerd adapter (maintained by Mirantis), which implements CRI and proxies to the Docker daemon. Most clusters migrated to containerd directly.

CRI Architecture

[Diagram: kubelet (CRI client, PLEG, eviction manager, image GC, volume manager) → gRPC over Unix socket → containerd CRI plugin (RuntimeService: RunPodSandbox, CreateContainer…; ImageService: PullImage, ListImages, RemoveImage; sandbox manager, container manager, image store / snapshotter, CNI invocation) → OCI shim → OCI runtimes: runc (default), crun, kata-containers, gVisor (runsc)]

The kubelet's CRI client connects over a Unix domain socket (default /run/containerd/containerd.sock). All communication is gRPC, and the two CRI services, RuntimeService and ImageService, are defined in the same proto file. The runtime (containerd, CRI-O) implements the gRPC server. In normal operation the kubelet is the only client; debugging tools such as crictl can connect to the same socket.

The CRI Proto API

The canonical source is staging/src/k8s.io/cri-api/pkg/apis/runtime/v1/api.proto in the Kubernetes repository. The API has two services:

RuntimeService

Manages pod sandboxes and containers: lifecycle (create/start/stop/remove), exec, attach, port-forward, resource updates, stats.

ImageService

Manages images on the node: pull, list, status, remove, filesystem info.

RuntimeService RPC Methods

RPC | Direction | Purpose | Key Fields
RunPodSandbox | kubelet → runtime | Create and start a pod sandbox (pause container + network namespace) | PodSandboxConfig: metadata, hostname, log_directory, dns_config, port_mappings, labels, annotations, linux (seccomp, sysctls, cgroup_parent)
StopPodSandbox | kubelet → runtime | Stop all containers in sandbox; sandbox transitions to NotReady | pod_sandbox_id
RemovePodSandbox | kubelet → runtime | Remove sandbox and all its containers; release network namespace | pod_sandbox_id
PodSandboxStatus | kubelet → runtime | Get sandbox status (Ready/NotReady), IP addresses, network info | pod_sandbox_id, verbose
ListPodSandbox | kubelet → runtime | List sandboxes with optional filter by state/label | PodSandboxFilter: id, state, label_selector
CreateContainer | kubelet → runtime | Create a container within an existing sandbox (does not start it) | pod_sandbox_id, ContainerConfig: metadata, image, command, args, envs, mounts, devices, labels, annotations, linux
StartContainer | kubelet → runtime | Start a created container | container_id
StopContainer | kubelet → runtime | Stop a running container (sends signal, waits timeout) | container_id, timeout (seconds)
RemoveContainer | kubelet → runtime | Delete a container; must be stopped first | container_id
ListContainers | kubelet → runtime | List containers with optional state/label filter | ContainerFilter: id, state, pod_sandbox_id, label_selector
ContainerStatus | kubelet → runtime | Get container status, exit code, image, mounts | container_id, verbose
UpdateContainerResources | kubelet → runtime | Update CPU/memory limits without restart; used by in-place pod vertical scaling (beta in 1.33) | container_id, LinuxContainerResources
ExecSync | kubelet → runtime | Run a command in a container synchronously; returns stdout/stderr/exit code | container_id, cmd[], timeout
Exec | kubelet → runtime | Prepare a streaming endpoint for interactive exec (returns URL) | container_id, cmd[], tty, stdin/stdout/stderr
Attach | kubelet → runtime | Prepare a streaming endpoint for attaching to container stdio | container_id, stdin/stdout/stderr, tty
PortForward | kubelet → runtime | Prepare a streaming endpoint for port-forwarding into sandbox | pod_sandbox_id, port[]
ContainerStats | kubelet → runtime | Get CPU/memory stats for a single container | container_id
ListContainerStats | kubelet → runtime | Get stats for all containers (feeds the kubelet stats endpoints that Metrics Server scrapes) | ContainerStatsFilter
PodSandboxStats | kubelet → runtime | Get aggregated stats for a pod sandbox (added in v1alpha2) | pod_sandbox_id
UpdateRuntimeConfig | kubelet → runtime | Push config updates (e.g., pod CIDR) to the runtime | RuntimeConfig: network_config.pod_cidr
Status | kubelet → runtime | Get runtime health status (RuntimeReady, NetworkReady conditions) | verbose
GetContainerEvents | kubelet → runtime (stream) | Evented PLEG: server-streaming RPC; runtime pushes container lifecycle events | GetEventsRequest → stream of ContainerEventResponse
CheckpointContainer | kubelet → runtime | CRIU-based container checkpoint (alpha in 1.25, beta since 1.30); used for forensics and live migration | container_id, location

ImageService RPC Methods

RPC | Purpose | Key Fields
PullImage | Pull an image from registry to node | ImageSpec: image (ref), annotations; AuthConfig; PodSandboxConfig (for runtime class)
ListImages | List images on node with digest and size | ImageFilter
ImageStatus | Get info about a specific image (exists/size/digest) | ImageSpec, verbose
RemoveImage | Remove an image from the node | ImageSpec
ImageFsInfo | Get filesystem usage for image store (used by eviction/GC) | (none)

Pod Sandbox Concept

The pod sandbox is the isolation boundary for a pod. It maps directly to the pause container (also called the "infra container"). The sandbox is created first; all application containers then join its namespaces.

[Diagram: Pod Sandbox (Linux namespaces + cgroup hierarchy). The pause container holds the namespaces (net: eth0 with the pod IP; ipc: shared IPC; pid: optional; uts: pod hostname; mnt: /proc, /sys) and never exits. App container A, app container B, and the sidecar container each share net, ipc, and uts, and own their mnt and pid namespaces. Shared emptyDir / projected volumes are mounted into each container's mnt namespace, plus pod-level /etc/hosts, /etc/resolv.conf, and the service account token. cgroup: kubepods/burstable/pod<uid>/, with all containers in the same pod cgroup hierarchy.]

Sandbox + Container Startup Sequence

When the kubelet decides to start a pod, it calls CRI in a strict order:

1. RunPodSandbox (runtime invokes CNI ADD) → 2. PullImage (each) → 3. CreateContainer (each) → 4. StartContainer (each)

Step | CRI Call | What Happens Inside Runtime
1 | RunPodSandbox | Create Linux namespaces (net/ipc/uts), start pause container, invoke CNI ADD to assign pod IP, set up /etc/hosts, /etc/resolv.conf, create pod cgroup slice
2 | PullImage (per container) | Check if image digest already in content store; if not, pull layers from registry, verify digests, unpack via snapshotter
3 | CreateContainer | Prepare container rootfs (overlay snapshot), generate OCI spec (namespaces inherit from sandbox, apply seccomp/apparmor/capabilities), create shim process, do NOT start yet
4 | StartContainer | Shim calls runc create then runc start; container process runs; cgroup limits applied

Why Separate Create and Start?

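The split lets the kubelet do work between the two calls. Once CreateContainer has returned a container ID, the kubelet can run its internal pre-start steps against that ID before calling StartContainer. Note that postStart lifecycle hooks are not part of this window: the kubelet runs them after StartContainer has returned. Init container ordering and sidecar startup ordering are likewise enforced by the kubelet, which sequences the Create/Start pairs one container at a time. The CREATED state also gives runtimes time to finish filesystem setup before the container enters the running state.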
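The call ordering in the startup sequence can be illustrated with a toy model. This is only a sketch: `FakeRuntime`, `start_pod`, and the ID formats are hypothetical stand-ins invented for illustration, not the CRI client or any runtime's implementation. It shows the two invariants the sequence relies on: a container cannot be created without its sandbox, and only a CREATED container can be started.

```python
from enum import Enum


class State(Enum):
    CREATED = "CONTAINER_CREATED"
    RUNNING = "CONTAINER_RUNNING"


class FakeRuntime:
    """Toy in-memory stand-in for a CRI runtime (hypothetical, for illustration)."""

    def __init__(self):
        self.sandboxes = set()
        self.containers = {}  # container_id -> (sandbox_id, State)
        self.calls = []       # recorded call order

    def run_pod_sandbox(self, pod_name):
        sandbox_id = f"sb-{pod_name}"
        self.sandboxes.add(sandbox_id)
        self.calls.append("RunPodSandbox")
        return sandbox_id

    def pull_image(self, image):
        self.calls.append("PullImage")
        return image

    def create_container(self, sandbox_id, name, image):
        if sandbox_id not in self.sandboxes:
            raise RuntimeError("sandbox must exist before CreateContainer")
        container_id = f"{sandbox_id}/{name}"
        self.containers[container_id] = (sandbox_id, State.CREATED)
        self.calls.append("CreateContainer")
        return container_id

    def start_container(self, container_id):
        sandbox_id, state = self.containers[container_id]
        if state is not State.CREATED:
            raise RuntimeError("only a CREATED container can be started")
        self.containers[container_id] = (sandbox_id, State.RUNNING)
        self.calls.append("StartContainer")


def start_pod(runtime, pod_name, containers):
    """Drive the four steps in order for each (name, image) pair."""
    sandbox_id = runtime.run_pod_sandbox(pod_name)
    ids = []
    for name, image in containers:
        runtime.pull_image(image)
        cid = runtime.create_container(sandbox_id, name, image)
        # The gap between create and start is where a caller could do
        # per-container preparation keyed by the new container ID.
        runtime.start_container(cid)
        ids.append(cid)
    return ids
```

Calling `start_pod(FakeRuntime(), "web", [("app", "nginx:1.25")])` records the four calls in the same order as the table above.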

Evented PLEG

The Pod Lifecycle Event Generator (PLEG) is how the kubelet learns about container state changes. The classic PLEG relists pod sandboxes and containers every second and diffs the result against a local cache. Each relist costs O(n) in the number of containers on the node, regardless of how much has actually changed, so it becomes a bottleneck on nodes with hundreds of pods.
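The relist-and-diff idea can be sketched in a few lines. This is a simplified illustration, not the kubelet's code: `diff_relist` is a hypothetical helper, states are plain strings, and the event names are the ones classic PLEG emits.

```python
def diff_relist(old, new):
    """Compare two {container_id: state} snapshots and return PLEG-style events.

    States are plain strings such as "running" or "exited" (an assumption of
    this sketch). Every container in either snapshot is visited, which is why
    the cost grows with the number of containers even when nothing changed.
    """
    events = []
    for cid in sorted(set(old) | set(new)):
        before, after = old.get(cid), new.get(cid)
        if before == after:
            continue
        if after is None:
            events.append((cid, "ContainerRemoved"))
        elif after == "running":
            events.append((cid, "ContainerStarted"))
        elif after == "exited":
            events.append((cid, "ContainerDied"))
        else:
            events.append((cid, "ContainerChanged"))
    return events
```

The result of each relist becomes the `old` snapshot for the next cycle, so a quiet node still pays for a full comparison every period.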

Classic PLEG (polling)

  • Relist interval: 1 second (configurable)
  • Calls ListContainers + ListPodSandbox each cycle
  • Diffs result against PLEG cache
  • Emits: ContainerStarted, ContainerDied, ContainerRemoved, ContainerChanged
  • Risk: "PLEG not healthy" if no relist completes within 3 minutes
  • High CPU under churn (many short-lived containers)

Evented PLEG (streaming)

  • Feature gate: EventedPLEG (beta 1.27, see KEP-3386)
  • Kubelet calls GetContainerEvents — server-streaming RPC
  • Runtime pushes events as they happen
  • Kubelet still does occasional reconciliation relists
  • Eliminates per-second full list on quiet nodes
  • Requires runtime support (containerd 1.7+, CRI-O 1.26+)

Evented PLEG Data Flow

# KubeletConfiguration: enable evented PLEG
featureGates:
  EventedPLEG: true

# kubelet invokes:
# GetContainerEvents(GetEventsRequest{}) -> stream ContainerEventResponse

# Runtime pushes on each lifecycle event:
ContainerEventResponse {
  container_id: "abc123..."
  container_event_type: CONTAINER_STARTED_EVENT   # or CREATED / STOPPED / DELETED
  created_at: <timestamp>
  pod_sandbox_status: PodSandboxStatus{...}
  containers_statuses: [ContainerStatus{...}, ...]
}

Stream Reconnect Required

If the gRPC stream breaks (runtime restart, connection error), the kubelet must re-establish GetContainerEvents and perform a full reconciliation relist to catch any events missed during the gap. In production, watch for repeated stream reconnects or a fallback to polling: either one means the node is no longer getting the benefit of evented PLEG.
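The reconnect-then-reconcile pattern can be sketched as follows. The function and its three callables (`open_stream`, `relist`, `handle`) are hypothetical hooks for illustration, not kubelet or gRPC APIs; a `ConnectionError` stands in for a broken stream.

```python
def consume_events(open_stream, relist, handle, max_reconnects=3):
    """Consume pushed events; after a stream failure, relist and reconnect.

    open_stream() returns an iterator of events and may raise ConnectionError
    while iterating. relist() performs a full reconciliation. handle(event)
    processes one event. Returns the number of reconnects that were needed.
    """
    reconnects = 0
    while True:
        try:
            for event in open_stream():
                handle(event)
            return reconnects  # stream ended cleanly
        except ConnectionError:
            reconnects += 1
            if reconnects > max_reconnects:
                raise  # give up; a caller could fall back to polling here
            relist()  # catch anything missed while the stream was down
```

The relist runs before the next connection attempt, so state observed through the new stream is layered on top of a fresh full snapshot rather than a stale one.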

Streaming RPCs: Exec, Attach, Port-Forward

For interactive operations, CRI uses a two-phase pattern so that stream I/O is served by the runtime's streaming server instead of being carried inside CRI gRPC messages:

kubectl exec → apiserver upgrade WebSocket → kubelet :10250/exec/… → CRI Exec RPC → streaming URL → SPDY/WebSocket stream to runtime

  1. The kubelet calls Exec(ExecRequest). The runtime prepares the command but does not run it yet, and returns a URL (e.g., http://localhost:PORT/exec/<token>).
  2. The kubelet's streaming server proxies the kubectl connection to that URL.
  3. The runtime's streaming server handles I/O and TTY resize events over SPDY/3.1 or WebSocket.
  4. The kubelet does not interpret the stream itself: after the Exec RPC it only relays bytes between the API server and the runtime's streaming server.

ExecSync vs Exec

ExecSync is used internally by the kubelet for liveness/readiness/startup exec probes — it runs the command to completion and returns stdout/stderr as bytes. Exec (non-sync) is used for interactive kubectl exec sessions. Both go through CRI; they are not the same RPC.

CRI API Versions

CRI Version | Kubernetes Version | Key Changes
v1alpha1 | 1.5–1.9 | Initial CRI API; basic lifecycle RPCs
v1alpha2 | 1.10–1.25 | Added PodSandboxStats, ListPodSandboxStats, ReopenContainerLog
v1 (stable) | 1.20+ (GA 1.23) | Promoted to stable; v1alpha2 removed in 1.26
v1 + Evented PLEG | 1.26+ (alpha 1.26, beta 1.27) | Added GetContainerEvents streaming RPC
v1 + in-place resize | 1.27+ | UpdateContainerResources supports live CPU/memory update
v1 + checkpoint | 1.25+ (alpha 1.25, beta 1.30) | CheckpointContainer for CRIU-based forensics

v1alpha2 Removed in 1.26

Runtimes that only implemented v1alpha2 (containerd < 1.6, CRI-O < 1.24) will fail to connect to the kubelet in Kubernetes 1.26+. Always check runtime compatibility on each node before upgrading its kubelet to 1.26 or later.

Runtime Compatibility Matrix

Kubernetes | containerd | CRI-O | cri-dockerd | Notes
1.24 | 1.6.x+ | 1.24.x | 0.2.x+ | Dockershim removed; cri-dockerd required for Docker
1.25 | 1.6.x+ | 1.25.x | 0.2.x+ |
1.26 | 1.6.x+ | 1.26.x | 0.3.x+ | v1alpha2 CRI removed; requires CRI v1
1.27 | 1.7.x+ | 1.27.x | 0.3.x+ | Evented PLEG beta (needs containerd 1.7+)
1.28 | 1.7.x+ | 1.28.x | 0.3.x+ | Sidecar containers beta (KEP-753)
1.29 | 1.7.x+ | 1.29.x | 0.3.x+ |
1.30+ | 1.7.x+ | 1.30.x | 0.3.x+ | In-place pod vertical scaling maturing (beta in 1.33)

A common rule of thumb is an n-3 runtime version skew window: a given Kubernetes version is expected to work with up to 3 prior minor versions of containerd/CRI-O. Treat this as guidance and not as a formal guarantee, and confirm against the support matrix each runtime project publishes before relying on a combination in production.
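A pre-upgrade check against the matrix above could look like the following sketch. The minimum versions are copied from the containerd column of the matrix; the function names are hypothetical, and the "1.30+" row is represented only by the 1.30 entry.

```python
# Minimum containerd (major, minor) per Kubernetes minor, from the matrix above.
MIN_CONTAINERD = {
    (1, 24): (1, 6), (1, 25): (1, 6), (1, 26): (1, 6),
    (1, 27): (1, 7), (1, 28): (1, 7), (1, 29): (1, 7), (1, 30): (1, 7),
}


def parse_minor(version):
    """Turn 'v1.27.3' or '1.27' into (1, 27)."""
    parts = version.lstrip("v").split(".")
    return int(parts[0]), int(parts[1])


def containerd_ok(kubernetes_version, containerd_version):
    """Return True if containerd meets the matrix minimum for this Kubernetes minor."""
    k8s = parse_minor(kubernetes_version)
    if k8s not in MIN_CONTAINERD:
        raise KeyError(f"Kubernetes {kubernetes_version} is not in the matrix")
    return parse_minor(containerd_version) >= MIN_CONTAINERD[k8s]
```

A check like this only encodes the table; it cannot replace testing the actual runtime build with critest or a staging node.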

CRI Socket Configuration

The kubelet discovers its CRI endpoint from the --container-runtime-endpoint flag (or containerRuntimeEndpoint in KubeletConfiguration).

# KubeletConfiguration (preferred over flags)
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
containerRuntimeEndpoint: "unix:///run/containerd/containerd.sock"
imageServiceEndpoint: ""   # defaults to containerRuntimeEndpoint if empty
# imageServiceEndpoint can point to a separate service
# (e.g., a registry proxy that implements ImageService only)

Common Socket Paths

Runtime | Default Socket Path | Notes
containerd | /run/containerd/containerd.sock | Primary path; can be overridden in config.toml
CRI-O | /var/run/crio/crio.sock | Symlink also at /run/crio/crio.sock
cri-dockerd | /run/cri-dockerd.sock | Adapter socket; Docker daemon must be running separately
Kata Containers (standalone) | /run/vc/sbs/.../shim.sock | Per-sandbox shimv2 sockets; containerd manages lifecycle

Socket Permissions

The CRI socket must be readable and writable by the user running the kubelet (typically root). Do not make it accessible to other users: any process that can call the CRI socket has root-equivalent capabilities on the node (run arbitrary containers, exec into existing ones). Restrict socket permissions to 0600, or to 0660 with a restricted group.
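A node audit script could verify these permissions with a check along the following lines. This is a minimal sketch using only the Python standard library; `check_socket` is a hypothetical helper and the policy it encodes (no access for "other") is the one described above.

```python
import os
import stat


def check_socket(path):
    """Return a list of problems found with a CRI socket path (empty if none)."""
    problems = []
    try:
        st = os.stat(path)
    except FileNotFoundError:
        return [f"{path}: does not exist"]
    if not stat.S_ISSOCK(st.st_mode):
        problems.append(f"{path}: not a Unix socket")
    if st.st_mode & stat.S_IRWXO:
        mode = stat.S_IMODE(st.st_mode)
        problems.append(f"{path}: accessible by 'other' (mode {mode:04o})")
    return problems
```

For example, `check_socket("/run/containerd/containerd.sock")` returns an empty list when the socket exists and grants nothing to "other".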

Dockershim Removal Deep Dive

Understanding what changed in 1.24 is critical for anyone operating older clusters or building migration tools.

Component | Before 1.24 (Dockershim) | After 1.24 (Direct CRI)
kubelet → runtime | Internal shim translating CRI calls to Docker API calls | Direct gRPC to containerd/CRI-O
Container metadata | Stored in Docker daemon | Stored in containerd/CRI-O metadata store
docker ps | Shows all containers including k8s workloads | Shows nothing (docker not used); use crictl ps
Image management | docker images, docker pull | crictl images, ctr images
Log format | Docker JSON log format | CRI log format (plain-text lines instead of JSON, written under /var/log/pods)
Cgroup driver | Docker controlled (cgroupfs default) | containerd controls (must match kubelet cgroupDriver)
Mirror/insecure registry | Docker daemon config (/etc/docker/daemon.json) | containerd config (/etc/containerd/config.toml)
DinD (Docker in Docker) | Shared docker socket; containers could escape | Must use --privileged with nested containerd or kaniko/buildkit

Container Log Path Convention

CRI standardizes where container logs are written on the node. The kubelet instructs the runtime where to write logs via the log_path field in ContainerConfig. The full path is:

/var/log/pods/<namespace>_<pod-name>_<pod-uid>/<container-name>/<restart-count>.log

Example:

/var/log/pods/kube-system_coredns-abc123-xyz_d4e5f6a7-b8c9-1234.../coredns/0.log
/var/log/pods/kube-system_coredns-abc123-xyz_d4e5f6a7-b8c9-1234.../coredns/1.log  # after restart

CRI Log Format

# Each line in the log file:
<RFC3339Nano timestamp> <stream: stdout|stderr> <tag: F|P> <log message>

# Examples:
2024-01-15T10:30:45.123456789Z stdout F Hello from container
2024-01-15T10:30:45.123456790Z stderr P This is a partial
2024-01-15T10:30:45.123456791Z stderr F line continued here

Tag values: F = full line (newline terminated), P = partial (line continues in next log entry). Log aggregators like Fluentd, Fluent Bit, and Vector parse this format natively.
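For a sense of what a log shipper does with this format, here is a small parsing sketch. The two functions are hypothetical helpers written for illustration; they split each line on the first three spaces and merge P (partial) entries into the following F (full) entry of the same stream.

```python
def parse_cri_line(line):
    """Split one CRI log line into (timestamp, stream, tag, message)."""
    parts = line.rstrip("\n").split(" ", 3)
    if len(parts) == 3:
        parts.append("")  # entry with an empty message
    if len(parts) != 4 or parts[1] not in ("stdout", "stderr") or parts[2] not in ("F", "P"):
        raise ValueError(f"not a CRI log line: {line!r}")
    return tuple(parts)


def join_partials(lines):
    """Yield (timestamp, stream, message), merging P entries into the next F entry.

    The timestamp of the first fragment is kept for the merged record.
    """
    pending = {}  # stream -> (first timestamp, accumulated text)
    for line in lines:
        ts, stream, tag, msg = parse_cri_line(line)
        first_ts, text = pending.pop(stream, (ts, ""))
        if tag == "P":
            pending[stream] = (first_ts, text + msg)
        else:
            yield first_ts, stream, text + msg
```

Fragments are concatenated without a separator, because a partial entry marks a split inside one logical line and not a word boundary.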

Symlinks under /var/log/containers

The kubelet also creates symlinks at /var/log/containers/<pod-name>_<namespace>_<container-name>-<container-id>.log pointing to the CRI log path. Many legacy log shippers target /var/log/containers/*.log — these symlinks maintain compatibility.
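Both naming conventions are simple enough to compute directly, which is handy when writing log-collection tooling. The helpers below are a sketch with hypothetical names; they only assemble strings following the two patterns described in this section and do not touch the filesystem.

```python
import posixpath


def cri_log_path(namespace, pod_name, pod_uid, container_name, restart_count):
    """Build the per-container log path under /var/log/pods."""
    pod_dir = f"{namespace}_{pod_name}_{pod_uid}"
    return posixpath.join("/var/log/pods", pod_dir, container_name, f"{restart_count}.log")


def containers_symlink(namespace, pod_name, container_name, container_id):
    """Build the compatibility symlink path under /var/log/containers."""
    name = f"{pod_name}_{namespace}_{container_name}-{container_id}.log"
    return posixpath.join("/var/log/containers", name)
```

Note the different field order: the pod directory starts with the namespace, while the symlink name starts with the pod name.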

CRI Conformance Testing

The Kubernetes project provides critest (part of the cri-tools project) to validate that a CRI implementation is correct.

# Install cri-tools
VERSION="v1.29.0"
curl -LO https://github.com/kubernetes-sigs/cri-tools/releases/download/${VERSION}/critest-${VERSION}-linux-amd64.tar.gz
tar -C /usr/local/bin -xzf critest-*.tar.gz

# Run conformance tests against containerd
sudo critest --runtime-endpoint unix:///run/containerd/containerd.sock

# Run a subset of tests (e.g., only sandbox tests); critest takes Ginkgo flags
sudo critest --runtime-endpoint unix:///run/containerd/containerd.sock \
  --ginkgo.focus="Sandbox"

# Run image service tests
sudo critest --runtime-endpoint unix:///run/containerd/containerd.sock \
  --ginkgo.focus="Image"

The conformance suite tests all required CRI RPCs, container lifecycle state transitions, log path creation, exec and streaming, and resource limit enforcement.

crictl Command Reference

crictl is the debug CLI for the CRI. It speaks directly to the runtime's CRI socket, bypassing the kubelet entirely. See Container Runtime for full crictl usage. Key CRI-specific commands:

# Check runtime and network readiness (calls CRI Status RPC)
crictl info

# Inspect sandbox (PodSandboxStatus RPC: metadata, IPs, network info)
crictl inspectp <pod-id>

# Inspect container (ContainerStatus, OCI spec, mounts)
crictl inspect <container-id>

# Open an interactive shell (Exec streaming RPC)
crictl exec -it <container-id> /bin/sh

# Run a command to completion via ExecSync
crictl exec --sync <container-id> cat /etc/hostname

# Get container stats (ListContainerStats RPC)
crictl stats

# Pull an image explicitly (PullImage RPC)
crictl pull nginx:1.25

# Check image filesystem usage (ImageFsInfo RPC)
crictl imagefsinfo

CRI-Level Debugging

gRPC Debug Logging

# Enable CRI gRPC debug in kubelet (very verbose — use temporarily)
# In KubeletConfiguration:
logging:
  verbosity: 5   # level 4+ shows CRI calls, level 5 shows full payloads

# Or via flag (deprecated path):
kubelet --v=5 ...

# Tail kubelet logs for CRI calls:
journalctl -u kubelet -f | grep -i "cri\|grpc\|runtime"

# containerd side: enable debug logging
# In /etc/containerd/config.toml:
[debug]
  level = "debug"

# Then restart containerd and tail:
journalctl -u containerd -f | grep -E "RunPodSandbox|CreateContainer|StartContainer"

Runtime Status Check

# CRI Status RPC: check RuntimeReady and NetworkReady
crictl info | jq '.status.conditions'
# Expected: both conditions report "status": true
#   {"type": "RuntimeReady", "status": true, ...}
#   {"type": "NetworkReady", "status": true, ...}

# If NetworkReady=false: CNI plugin is not installed or misconfigured
# If RuntimeReady=false: containerd/CRI-O itself has a problem

# kubelet uses Status RPC every nodeStatusUpdateFrequency (default 10s)
# RuntimeReady=false -> node condition Ready=False -> no new pods scheduled to the node

# Check node conditions reflecting CRI status:
kubectl get node <node-name> -o jsonpath='{.status.conditions}' | jq .
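For scripted health checks, the same information can be extracted from the JSON that crictl info prints. The sketch below assumes the output carries a status.conditions list whose entries have "type" and "status" keys; that shape, the function names, and the hint wording are assumptions of this example, so adjust them if your runtime's output differs.

```python
import json


def runtime_conditions(info_json):
    """Map condition type -> bool from `crictl info` JSON text.

    Assumes status.conditions is a list of objects with "type" and "status".
    """
    info = json.loads(info_json)
    conditions = info.get("status", {}).get("conditions", [])
    return {c["type"]: bool(c["status"]) for c in conditions}


def diagnose(conditions):
    """Return hints that mirror the comments in the status check above."""
    hints = []
    if not conditions.get("RuntimeReady", False):
        hints.append("runtime not ready: check containerd/CRI-O itself")
    if not conditions.get("NetworkReady", False):
        hints.append("network not ready: CNI plugin missing or misconfigured")
    return hints
```

A wrapper could feed it the output of `subprocess.run(["crictl", "info"], capture_output=True, text=True).stdout` on the node.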

Orphaned Sandbox Cleanup

# List all sandboxes known to runtime (may include orphans not in kubelet cache)
crictl pods

# List orphaned sandboxes (stopped but not removed)
crictl pods --state notready

# Manually remove an orphaned sandbox (kubelet should do this automatically)
crictl stopp <pod-id>
crictl rmp <pod-id>

# Check containerd tasks (lower level — shim processes)
ctr -n k8s.io tasks list

RuntimeClass and CRI Interaction

RuntimeClass (see Container Runtime: RuntimeClass) maps to CRI via the runtime_handler field in RunPodSandboxRequest; containers created in that sandbox inherit its handler:

# RunPodSandboxRequest message (proto excerpt):
message RunPodSandboxRequest {
  PodSandboxConfig config = 1;
  string runtime_handler = 2;   // e.g., "runc", "kata", "runsc"
}

# kubelet reads pod.spec.runtimeClassName ->
#   looks up RuntimeClass object ->
#   extracts handler name ->
#   passes as runtime_handler in RunPodSandbox RPC

# containerd resolves runtime_handler to a runtime entry in config.toml:
# [plugins."io.containerd.grpc.v1.cri".containerd.runtimes."kata"]
#   runtime_type = "io.containerd.kata.v2"

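If the runtime_handler is not registered in the runtime, RunPodSandbox fails (containerd, for example, reports that no runtime is configured for that handler). The pod stays in ContainerCreating and kubectl describe pod shows a FailedCreatePodSandBox event.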
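The lookup chain from pod spec to runtime entry can be modelled as two dictionary lookups. This is an illustrative sketch only: the function, the exception class, and the example mappings are hypothetical, and an empty handler is modelled as "use the runtime's default", here assumed to be runc.

```python
class UnknownRuntimeHandler(Exception):
    """Raised when a handler has no matching runtime entry."""


def resolve_handler(runtime_class_name, runtime_classes, configured_runtimes,
                    default_runtime="runc"):
    """Follow runtimeClassName -> RuntimeClass handler -> runtime_type.

    runtime_classes maps RuntimeClass name -> handler string.
    configured_runtimes maps handler -> runtime_type, mirroring config.toml.
    """
    if runtime_class_name is None:
        handler = ""  # no RuntimeClass: an empty handler is sent
    else:
        handler = runtime_classes[runtime_class_name]  # KeyError: no such RuntimeClass
    name = handler or default_runtime
    if name not in configured_runtimes:
        raise UnknownRuntimeHandler(f'no runtime for "{name}" is configured')
    return configured_runtimes[name]
```

Keeping an inventory like `configured_runtimes` per cluster makes it possible to validate every RuntimeClass object before a pod ever references it.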

In-Place Pod Vertical Scaling via CRI

KEP-1287 (In-Place Pod Vertical Scaling) was introduced as alpha in Kubernetes 1.27 and graduated to beta, enabled by default, in 1.33. It uses UpdateContainerResources to change CPU and memory limits without restarting the container:

# Pod spec with resizePolicy (controls restart behavior per resource)
spec:
  containers:
  - name: app
    resources:
      requests:
        cpu: "500m"
        memory: "256Mi"
      limits:
        cpu: "1"
        memory: "512Mi"
    resizePolicy:
    - resourceName: cpu
      restartPolicy: NotRequired   # CPU changes without restart
    - resourceName: memory
      restartPolicy: RestartContainer  # memory changes restart container

# Trigger in-place resize (a strategic merge patch keeps the other container fields;
# a JSON merge patch would replace the whole containers list):
kubectl patch pod my-pod --subresource resize --type strategic \
  -p '{"spec":{"containers":[{"name":"app","resources":{"requests":{"cpu":"750m"}}}]}}'

# kubelet calls UpdateContainerResources RPC with new LinuxContainerResources
# containerd updates cgroup limits via libcontainer without exec'ing new process
# Progress is reported in pod status: the resize field in older releases,
# the PodResizePending / PodResizeInProgress conditions from 1.33
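The effect of resizePolicy can be expressed as a small decision function. The sketch below is a hypothetical helper, not kubelet code: it compares old and new resource values for one container and reports whether any changed resource is marked RestartContainer, treating a missing policy entry as NotRequired.

```python
def plan_resize(old, new, resize_policy):
    """Return (changed_resources, restart_needed) for one container.

    old and new map resource name -> quantity string (compared as text).
    resize_policy maps resource name -> "NotRequired" or "RestartContainer".
    """
    changed = sorted(r for r in set(old) | set(new) if old.get(r) != new.get(r))
    restart = any(
        resize_policy.get(r, "NotRequired") == "RestartContainer" for r in changed
    )
    return changed, restart
```

With the policy from the pod spec above, a CPU-only change yields `(["cpu"], False)`, while any memory change yields a restart.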

Container Checkpointing

The CheckpointContainer RPC (feature gate ContainerCheckpoint; alpha in 1.25, beta since 1.30) enables CRIU-based checkpointing:

# Enable feature gates
# In KubeletConfiguration:
featureGates:
  ContainerCheckpoint: true

# Create a checkpoint via kubelet API:
curl -sk -X POST \
  "https://localhost:10250/checkpoint/<namespace>/<pod>/<container>" \
  --cert /var/lib/kubelet/pki/kubelet-client-current.pem \
  --key /var/lib/kubelet/pki/kubelet-client-current.pem

# Output: archive at /var/lib/kubelet/checkpoints/checkpoint-<name>-<ts>.tar
# Used for: forensics, debugging OOMKilled containers, live migration (future)

CRI Metrics

Metric | Type | Labels | Description
kubelet_runtime_operations_total | Counter | operation_type | Total CRI operations by type (RunPodSandbox, CreateContainer, etc.)
kubelet_runtime_operations_errors_total | Counter | operation_type | Failed CRI operations; key alert metric
kubelet_runtime_operations_duration_seconds | Histogram | operation_type | Latency per CRI operation type
kubelet_pleg_relist_duration_seconds | Histogram | (none) | Time to complete PLEG relist (classic PLEG)
kubelet_pleg_relist_interval_seconds | Histogram | (none) | Interval between relist starts (should be ~1s)
kubelet_pleg_last_seen_seconds | Gauge | (none) | Timestamp of last PLEG event seen (staleness indicator)
container_runtime_crio_operations_total | Counter | operation | CRI-O side: total operations (if using CRI-O)
containerd_grpc_server_handled_total | Counter | grpc_method, grpc_code | containerd: gRPC calls by method and status code

Alerting Rules

# Alert: CRI operation errors elevated
- alert: KubeletCRIOperationErrors
  expr: |
    increase(kubelet_runtime_operations_errors_total[5m]) > 5
  for: 2m
  labels:
    severity: warning
  annotations:
    summary: "CRI operation errors on {{ $labels.node }}"
    description: "{{ $labels.operation_type }} errors: {{ $value }} in last 5m"

# Alert: PLEG relist taking too long (classic PLEG)
- alert: KubeletPLEGRelistDurationHigh
  expr: |
    histogram_quantile(0.99,
      rate(kubelet_pleg_relist_duration_seconds_bucket[5m])
    ) > 2
  for: 5m
  labels:
    severity: warning
  annotations:
    summary: "PLEG relist p99 > 2s on {{ $labels.node }}"
    description: "PLEG is under pressure; node may be running too many containers"

# Alert: PLEG not healthy (missed relist deadline)
- alert: KubeletPLEGNotHealthy
  expr: |
    (time() - kubelet_pleg_last_seen_seconds) > 180
  for: 1m
  labels:
    severity: critical
  annotations:
    summary: "PLEG not healthy on {{ $labels.node }}"
    description: "PLEG has not produced events for 3 minutes; node may be stuck"

# Alert: RunPodSandbox latency high
- alert: KubeletRunPodSandboxLatencyHigh
  expr: |
    histogram_quantile(0.99,
      rate(kubelet_runtime_operations_duration_seconds_bucket{
        operation_type="run_podsandbox"}[5m])
    ) > 5
  for: 5m
  labels:
    severity: warning
  annotations:
    summary: "RunPodSandbox p99 > 5s on {{ $labels.node }}"
    description: "Slow sandbox creation; check CNI plugin and containerd"

Troubleshooting Runbooks

CRI socket connection refused / kubelet fails to start
# Symptom: kubelet fails with "no such file or directory" on socket
# or "connection refused"

# 1. Check if runtime is running
systemctl status containerd   # or crio
crictl --runtime-endpoint unix:///run/containerd/containerd.sock info

# 2. Check kubelet CRI endpoint config
cat /var/lib/kubelet/config.yaml | grep -i runtime

# 3. Verify socket exists and permissions
ls -la /run/containerd/containerd.sock
# Expected: srw-rw---- 1 root root ... containerd.sock

# 4. If socket missing, containerd crashed
journalctl -u containerd -n 100 --no-pager
# Common causes: disk full, config.toml syntax error, missing snapshotter

# 5. Fix and restart
systemctl restart containerd
systemctl restart kubelet

Pod stuck in ContainerCreating: CRI CreateContainer failed
# Symptom: kubectl describe pod shows:
# Warning  Failed  CreateContainerError: ...

# 1. Get detailed event
kubectl describe pod <pod> -n <ns>

# 2. Check kubelet logs on the node
journalctl -u kubelet -n 200 | grep -i "create\|cri\|error"

# 3. Inspect sandbox first (state must be SANDBOX_READY)
crictl pods | grep <pod-name>
crictl inspectp <sandbox-id> | jq .status.state

# 4. Try to manually pull the image
crictl pull <image-ref>
# "unauthorized": check imagePullSecrets
# "not found": check image name/tag
# "tls": check registry TLS / insecure config in config.toml

# 5. Check OCI spec generation (runtime_handler mismatch)
crictl inspect <container-id> 2>&1 | head -30
# "runtime handler not found": check RuntimeClass handler name vs config.toml

# 6. Check containerd snapshotter issues
ctr -n k8s.io snapshots ls | grep <container-id>

PLEG not healthy: node marked NotReady
# Symptom: node condition KubeletNotReady with "PLEG is not healthy"

# 1. Check PLEG relist duration metric
kubectl get --raw "/api/v1/nodes/<node>/proxy/metrics" | \
  grep pleg_relist_duration

# 2. Count containers on node (PLEG cost is O(n))
crictl ps | wc -l
# If >200: node may be over capacity for classic PLEG

# 3. Check runtime responsiveness
time crictl pods >/dev/null
# If this takes >5s, runtime is under pressure

# 4. Check containerd goroutine count / memory
#    (assumes containerd's debug endpoint is exposed on localhost:1338)
curl -s "http://localhost:1338/debug/pprof/goroutine?debug=1" | head -20

# 5. Enable Evented PLEG if on containerd 1.7+
# Add to KubeletConfiguration:
# featureGates:
#   EventedPLEG: true

# 6. Immediate mitigation: restart the kubelet first, not containerd
# Restarting containerd normally leaves containers running under their shims,
# but it interrupts in-flight CRI calls and exec/log streams
systemctl restart kubelet

kubectl exec fails with "error executing in container"
# Symptom: kubectl exec hangs or returns error

# 1. Check if it's a network routing issue (apiserver -> kubelet)
kubectl get node <node> -o wide   # verify node IP
curl -sk https://<node-ip>:10250/healthz

# 2. Test exec at CRI level directly
crictl exec -it <container-id> /bin/sh
# If this works: problem is in kubelet streaming server or apiserver proxy
# If this fails: problem is in the runtime's streaming server

# 3. Check streaming server address in kubelet
# kubelet must be reachable at its advertised address
kubectl get node <node> -o jsonpath='{.status.addresses}'

# 4. Check kubelet certificate SANs include node IP
openssl x509 -in /var/lib/kubelet/pki/kubelet.crt -noout -text | grep "IP Address"

# 5. Firewall: ensure apiserver can reach node :10250

NetworkReady=false: CNI not configured
# Symptom: crictl info reports the NetworkReady condition with "status": false
# Pods stuck in Pending or ContainerCreating with "network not ready"

# 1. Check CNI plugin installation
ls /opt/cni/bin/
ls /etc/cni/net.d/

# 2. Check containerd CRI plugin CNI config
grep -A5 "cni" /etc/containerd/config.toml
# Expected:
# [plugins."io.containerd.grpc.v1.cri".cni]
#   bin_dir = "/opt/cni/bin"
#   conf_dir = "/etc/cni/net.d"

# 3. If CNI config dir is empty, CNI DaemonSet not deployed
# Deploy your CNI (Calico, Cilium, Flannel etc.)
kubectl get pods -n kube-system | grep -E "calico|cilium|flannel"

# 4. After CNI deploys, runtime auto-detects CNI config
# NetworkReady should flip to true within 30s
crictl info | jq '.status.conditions[] | select(.type == "NetworkReady")'

Production Best Practices

  1. Pin runtime versions — use containerd 1.7.x or CRI-O 1.27.x and update deliberately; runtime bugs directly affect node stability. Subscribe to runtime security advisories.
  2. Enable Evented PLEG on nodes with >100 pods if using containerd 1.7+ — reduces per-second CRI polling from O(pods) to near-zero on quiet nodes.
  3. Restrict CRI socket permissions: chmod 0660 and assign to a dedicated group. Never bind-mount the socket into pods; it grants host escape.
  4. Match cgroup driver — containerd SystemdCgroup = true must match kubelet cgroupDriver: systemd. A mismatch causes containers to be placed in wrong cgroups and makes eviction unreliable. Verify with crictl info | grep -i cgroup.
  5. Monitor kubelet_runtime_operations_errors_total — alert on any sustained increase in CRI errors. A spike in run_podsandbox errors often indicates CNI or disk pressure before the node goes NotReady.
  6. Test CRI upgrades with critest — run the CRI conformance suite against a new runtime version in staging before rolling to production. Breaking changes in streaming or log path handling will silently break kubectl exec/logs.
  7. Use imageServiceEndpoint separation when pre-warming images — run a separate image service proxy that pre-pulls images during node provisioning, keeping the main runtime socket for workload operations only.
  8. Pre-pull critical images — use imagePullPolicy: IfNotPresent with node-level image pre-pull DaemonSets for startup-critical images. This ensures PullImage never blocks pod startup on the critical path.
  9. Handle cri-dockerd carefully — if your cluster still uses Docker Engine via cri-dockerd, remember it adds a translation layer (kubelet → cri-dockerd → Docker API → containerd). Debugging requires checking both cri-dockerd and Docker daemon logs.
  10. Audit RuntimeClass handlers: a misconfigured handler string makes RunPodSandbox fail, leaving pods stuck in ContainerCreating with FailedCreatePodSandBox events. Maintain an inventory of registered handlers in each cluster's containerd config.toml and validate it against the verbose output of crictl info.