
Container Runtime Interface (CRI)

The Container Runtime Interface is the gRPC API contract between the kubelet and the container runtime. It was introduced in Kubernetes 1.5; the stable v1 API was added in 1.20 and reached GA in 1.23. CRI became the only supported path into container runtimes when Dockershim was removed in Kubernetes 1.24. Every call the kubelet makes to start, stop, and inspect containers goes through CRI: no direct OCI calls, no Docker socket, only the gRPC services defined in k8s.io/cri-api/pkg/apis/runtime/v1/api.proto.

Why CRI Exists

Before CRI, the kubelet contained substantial Docker-specific code and later gained rkt support through a separate code path. This tight coupling caused several problems:

Before CRI (pre-1.5)

Docker-specific calls hardcoded in kubelet
Each new runtime required kubelet code changes
Runtime bugs required Kubernetes releases to fix
Docker upgrade lockstep with Kubernetes version
No standard interface for testing or validation

After CRI (1.5+)

Single gRPC API, versioned independently
Runtimes ship their own shims; kubelet unchanged
Runtime bugs fixed in runtime releases
containerd, CRI-O, any future runtime hot-swappable
CRI conformance test suite validates correctness

Dockershim Removal Timeline

Dockershim was deprecated in Kubernetes 1.20 (Dec 2020) and removed in 1.24 (May 2022). Docker Engine itself still works — but now through the cri-dockerd adapter (maintained by Mirantis), which implements CRI and proxies to the Docker daemon. Most clusters migrated to containerd directly.

CRI Architecture

The kubelet's CRI client connects over a Unix domain socket (default /run/containerd/containerd.sock). All communication is gRPC, and the two CRI services, RuntimeService and ImageService, are defined in the same proto file. The runtime (containerd, CRI-O) implements the gRPC server. The kubelet is the main gRPC client; debugging tools such as crictl connect to the same socket.

The CRI Proto API

The canonical source is staging/src/k8s.io/cri-api/pkg/apis/runtime/v1/api.proto in the Kubernetes repository. The API has two services:

RuntimeService

Manages pod sandboxes and containers: lifecycle (create/start/stop/remove), exec, attach, port-forward, resource updates, stats.

ImageService

Manages images on the node: pull, list, status, remove, filesystem info.

RuntimeService RPC Methods

RPC	Direction	Purpose	Key Fields
`RunPodSandbox`	kubelet → runtime	Create and start a pod sandbox (pause container + network namespace)	`PodSandboxConfig`: metadata, hostname, log_directory, dns_config, port_mappings, labels, annotations, linux (seccomp, sysctls, cgroup_parent)
`StopPodSandbox`	kubelet → runtime	Stop all containers in sandbox; sandbox transitions to NotReady	`pod_sandbox_id`
`RemovePodSandbox`	kubelet → runtime	Remove sandbox and all its containers; release network namespace	`pod_sandbox_id`
`PodSandboxStatus`	kubelet → runtime	Get sandbox status (Ready/NotReady), IP addresses, network info	`pod_sandbox_id`, `verbose`
`ListPodSandbox`	kubelet → runtime	List sandboxes with optional filter by state/label	`PodSandboxFilter`: id, state, label_selector
`CreateContainer`	kubelet → runtime	Create a container within an existing sandbox (does not start it)	`pod_sandbox_id`, `ContainerConfig`: metadata, image, command, args, envs, mounts, devices, labels, annotations, linux
`StartContainer`	kubelet → runtime	Start a created container	`container_id`
`StopContainer`	kubelet → runtime	Stop a running container (sends signal, waits timeout)	`container_id`, `timeout` (seconds)
`RemoveContainer`	kubelet → runtime	Delete a container; must be stopped first	`container_id`
`ListContainers`	kubelet → runtime	List containers with optional state/label filter	`ContainerFilter`: id, state, pod_sandbox_id, label_selector
`ContainerStatus`	kubelet → runtime	Get container status, exit code, image, mounts	`container_id`, `verbose`
`UpdateContainerResources`	kubelet → runtime	Update CPU/memory resources of a running container without restart; used by in-place pod vertical scaling (alpha 1.27, beta 1.33)	`container_id`, `LinuxContainerResources`
`ExecSync`	kubelet → runtime	Run a command in a container synchronously; returns stdout/stderr/exit code	`container_id`, `cmd[]`, `timeout`
`Exec`	kubelet → runtime	Prepare a streaming endpoint for interactive exec (returns URL)	`container_id`, `cmd[]`, `tty`, `stdin/stdout/stderr`
`Attach`	kubelet → runtime	Prepare a streaming endpoint for attaching to container stdio	`container_id`, `stdin/stdout/stderr`, `tty`
`PortForward`	kubelet → runtime	Prepare a streaming endpoint for port-forwarding into sandbox	`pod_sandbox_id`, `port[]`
`ContainerStats`	kubelet → runtime	Get CPU/memory stats for a single container	`container_id`
`ListContainerStats`	kubelet → runtime	Get stats for all containers (used by cAdvisor / Metrics Server)	`ContainerStatsFilter`
`PodSandboxStats`	kubelet → runtime	Get aggregated stats for a pod sandbox (added in v1alpha2)	`pod_sandbox_id`
`UpdateRuntimeConfig`	kubelet → runtime	Push config updates (e.g., pod CIDR) to the runtime	`RuntimeConfig`: network_config.pod_cidr
`Status`	kubelet → runtime	Get runtime health status (RuntimeReady, NetworkReady conditions)	`verbose`
`GetContainerEvents`	kubelet → runtime (stream)	Evented PLEG: server-streaming RPC; runtime pushes container lifecycle events	`GetEventsRequest` → stream of `ContainerEventResponse`
`CheckpointContainer`	kubelet → runtime	CRIU-based container checkpoint (alpha in 1.25, beta in 1.30); used for forensics and live migration	`container_id`, `location`

ImageService RPC Methods

RPC	Purpose	Key Fields
`PullImage`	Pull an image from registry to node	`ImageSpec`: image (ref), annotations; `AuthConfig`; `PodSandboxConfig` (for runtime class)
`ListImages`	List images on node with digest and size	`ImageFilter`
`ImageStatus`	Get info about a specific image (exists/size/digest)	`ImageSpec`, `verbose`
`RemoveImage`	Remove an image from the node	`ImageSpec`
`ImageFsInfo`	Get filesystem usage for image store (used by eviction/GC)	—

Pod Sandbox Concept

The pod sandbox is the isolation boundary for a pod. With runc-based runtimes it maps directly to the pause container (also called the "infra container"); VM-based runtimes such as Kata Containers implement it as a lightweight VM. The sandbox is created first; all application containers then join its namespaces.

Sandbox + Container Startup Sequence

When the kubelet decides to start a pod, it calls CRI in a strict order:

1. RunPodSandbox → CNI ADD → 2. PullImage (each) → 3. CreateContainer (each) → 4. StartContainer (each)

Step	CRI Call	What Happens Inside Runtime
1	`RunPodSandbox`	Create Linux namespaces (net/ipc/uts), start pause container, invoke CNI ADD to assign pod IP, set up `/etc/hosts`, `/etc/resolv.conf`, place the sandbox under the pod cgroup named by `cgroup_parent`
2	`PullImage` (per container)	Check if image digest already in content store; if not, pull layers from registry, verify digests, unpack via snapshotter
3	`CreateContainer`	Prepare container rootfs (overlay snapshot), generate OCI spec (namespaces inherit from sandbox, apply seccomp/apparmor/capabilities), create shim process, do NOT start yet
4	`StartContainer`	Shim calls `runc create` then `runc start`; container process runs; cgroup limits applied
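
The ordering above can be illustrated with a small sketch. `FakeRuntime` and `start_pod` are hypothetical names, not part of any CRI client library; the fake only records which RPC would be called, and the sketch assumes each container is pulled, created, and started in turn after the sandbox exists.

```python
from dataclasses import dataclass, field


@dataclass
class FakeRuntime:
    """Hypothetical in-memory stand-in for a CRI runtime; records call order."""
    calls: list = field(default_factory=list)

    def run_pod_sandbox(self, pod_name):
        self.calls.append(("RunPodSandbox", pod_name))
        return f"sandbox-{pod_name}"

    def pull_image(self, image):
        self.calls.append(("PullImage", image))

    def create_container(self, sandbox_id, name, image):
        self.calls.append(("CreateContainer", name))
        return f"{sandbox_id}/{name}"

    def start_container(self, container_id):
        self.calls.append(("StartContainer", container_id))


def start_pod(runtime, pod_name, containers):
    """Sandbox first, then pull, create, and start each container in order."""
    sandbox_id = runtime.run_pod_sandbox(pod_name)
    started = []
    for name, image in containers:
        runtime.pull_image(image)
        container_id = runtime.create_container(sandbox_id, name, image)
        runtime.start_container(container_id)
        started.append(container_id)
    return started


runtime = FakeRuntime()
ids = start_pod(runtime, "web", [("app", "nginx:1.25")])
print([op for op, _ in runtime.calls])
```

A container ID only exists after CreateContainer returns, which is why StartContainer cannot be reordered ahead of it.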

Why Separate Create and Start?

The split gives the kubelet a window between the two calls in which the container exists, and has an ID, but its process is not yet running. The kubelet uses that window for its internal pre-start steps, such as resource manager bookkeeping that needs the container ID. Note that postStart lifecycle hooks are not part of this window: the kubelet runs them after StartContainer returns. The CREATED state also gives runtimes time to resolve filesystem setup before entering the running state.

Evented PLEG

The Pod Lifecycle Event Generator (PLEG) is how the kubelet learns about container state changes. The classic PLEG relists containers and sandboxes through CRI every second and diffs the result against a local cache. The cost of each relist grows with the number of containers on the node regardless of how much has actually changed, so it becomes a bottleneck on nodes with hundreds of pods.

Classic PLEG (polling)

Relist interval: 1 second (configurable)
Calls ListContainers + ListPodSandbox each cycle
Diffs result against PLEG cache
Emits: ContainerStarted, ContainerDied, ContainerRemoved, ContainerChanged
Risk: "PLEG not healthy" if no relist completes within 3 minutes
High CPU under churn (many short-lived containers)
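
As a rough illustration of the relist-and-diff idea, the sketch below compares two snapshots of container state and emits the event names listed above. The function name, the snapshot shape (a dict of container ID to state string), and the state strings are assumptions made for the example; the kubelet's real implementation is more involved.

```python
def relist_events(old, new):
    """Diff two {container_id: state} snapshots into PLEG-style events."""
    events = []
    for cid in sorted(set(old) | set(new)):
        before, after = old.get(cid), new.get(cid)
        if before == after:
            continue  # nothing changed for this container
        if after == "running":
            events.append((cid, "ContainerStarted"))
        elif after == "exited":
            events.append((cid, "ContainerDied"))
        elif after is None:
            events.append((cid, "ContainerRemoved"))
        else:
            events.append((cid, "ContainerChanged"))
    return events


old = {"a": "running", "b": "exited"}
new = {"a": "exited", "c": "running"}
print(relist_events(old, new))
```

Every container appears in the loop on every cycle, even when nothing changed, which is the cost that grows with node density.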

Evented PLEG (streaming)

Feature gate: EventedPLEG (beta 1.27, see KEP-3386)
Kubelet calls GetContainerEvents — server-streaming RPC
Runtime pushes events as they happen
Kubelet still does occasional reconciliation relists
Eliminates per-second full list on quiet nodes
Requires runtime support (containerd 1.7+, CRI-O 1.26+)

Evented PLEG Data Flow

# KubeletConfiguration: enable evented PLEG
featureGates:
  EventedPLEG: true

# kubelet invokes:
# GetContainerEvents(GetEventsRequest{}) -> stream ContainerEventResponse

# Runtime pushes on each lifecycle event:
ContainerEventResponse {
  container_id: "abc123..."
  container_event_type: CONTAINER_STARTED_EVENT   # or CREATED / STOPPED / DELETED
  created_at: <timestamp>
  pod_sandbox_status: PodSandboxStatus{...}
}

Stream Reconnect Required

If the gRPC stream breaks (runtime restart, connection error), the kubelet must re-establish GetContainerEvents and perform a full reconciliation relist to catch any events missed during the gap. In production, watch the kubelet logs for repeated stream reconnects, since each one forces a full relist.
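
The reconnect-then-relist pattern can be sketched as follows. All names here (`consume_events`, `open_stream`, `relist`, `handle`, `max_reconnects`) are hypothetical, and relisting before the first connection as well is an assumption of this sketch rather than a statement about the kubelet's code.

```python
def consume_events(open_stream, relist, handle, max_reconnects=3):
    """Consume an event stream; do a full relist before every (re)connect."""
    for _ in range(max_reconnects + 1):
        relist()  # catch anything that happened while no stream was open
        try:
            for event in open_stream():
                handle(event)
            return  # stream ended cleanly
        except ConnectionError:
            continue  # stream broke: loop around, relist, reconnect
    raise RuntimeError("event stream kept failing; fall back to polling")


attempts = []


def flaky_stream():
    """First call yields one event and then breaks; second call succeeds."""
    attempts.append(1)
    if len(attempts) == 1:
        def broken():
            yield "started:abc"
            raise ConnectionError("runtime restarted")
        return broken()
    return iter(["stopped:abc"])


seen, relists = [], []
consume_events(flaky_stream, lambda: relists.append("relist"), seen.append)
print(seen, len(relists))
```

The relist runs once per connection attempt, so a stream that flaps frequently costs about as much as classic polling.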

Streaming RPCs: Exec, Attach, Port-Forward

For interactive operations, CRI uses a two-phase pattern so that the runtime's streaming server, not the kubelet, implements the actual stream handling:

kubectl exec → apiserver upgrade WebSocket → kubelet :10250/exec/… → CRI Exec RPC → streaming URL → SPDY/WebSocket stream direct to runtime

The kubelet calls Exec(ExecRequest). The runtime prepares the command but does not run it yet, and returns a URL (e.g., http://localhost:PORT/exec/<token>).
The kubelet's streaming server proxies the kubectl connection to that URL.
The runtime's streaming server handles I/O and TTY resize events over SPDY/3.1 or WebSocket.
The kubelet never interprets the stream; it only relays bytes between the API server and the runtime's streaming server.
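
A toy model of the two phases may help. `StreamingServer`, its method names, and port 10010 are all hypothetical; the point is only that the Exec RPC caches the request and returns a URL, and the command is looked up later when a client connects to that URL.

```python
import secrets


class StreamingServer:
    """Toy model of a runtime's streaming server (names are hypothetical)."""

    def __init__(self, base_url):
        self.base_url = base_url
        self._pending = {}

    def exec(self, container_id, cmd):
        """Phase 1: cache the request and hand back a one-time URL."""
        token = secrets.token_urlsafe(8)
        self._pending[token] = (container_id, list(cmd))
        return f"{self.base_url}/exec/{token}"

    def serve(self, url):
        """Phase 2: a client connects to the URL; the request is consumed."""
        token = url.rsplit("/", 1)[-1]
        return self._pending.pop(token)  # KeyError if unknown or already used


server = StreamingServer("http://localhost:10010")
url = server.exec("abc123", ["/bin/sh"])
print(server.serve(url))
```

Because the token is consumed on first use, a replayed URL fails instead of starting a second session.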

ExecSync vs Exec

ExecSync is used internally by the kubelet for liveness/readiness/startup exec probes — it runs the command to completion and returns stdout/stderr as bytes. Exec (non-sync) is used for interactive kubectl exec sessions. Both go through CRI; they are not the same RPC.

CRI API Versions

CRI Version	Kubernetes Version	Key Changes
`v1alpha1`	1.5–1.9	Initial CRI API; basic lifecycle RPCs
`v1alpha2`	1.10–1.25	Added `PodSandboxStats`, `ListPodSandboxStats`, `ReopenContainerLog`
`v1` (stable)	1.20+ (GA 1.23)	Promoted to stable; `v1alpha2` removed in 1.26
`v1` + Evented PLEG	1.26–1.27	Added `GetContainerEvents` streaming RPC
`v1` + in-place resize	1.27+	`UpdateContainerResources` supports live CPU/memory update
`v1` + checkpoint	1.25+ alpha	`CheckpointContainer` for CRIU-based forensics

v1alpha2 Removed in 1.26

Runtimes that only implemented v1alpha2 (containerd < 1.6, CRI-O < 1.24) will fail to connect to the kubelet in Kubernetes 1.26+. Always check runtime compatibility before upgrading the kubelet on a node.

Runtime Compatibility Matrix

Kubernetes	containerd	CRI-O	cri-dockerd	Notes
1.24	1.6.x+	1.24.x	0.2.x+	Dockershim removed; cri-dockerd required for Docker
1.25	1.6.x+	1.25.x	0.2.x+
1.26	1.6.x+	1.26.x	0.3.x+	`v1alpha2` CRI removed; requires CRI v1
1.27	1.7.x+	1.27.x	0.3.x+	Evented PLEG beta (needs containerd 1.7+)
1.28	1.7.x+	1.28.x	0.3.x+	Sidecar containers beta (KEP-753)
1.29	1.7.x+	1.29.x	0.3.x+
1.30+	1.7.x+	1.30.x	0.3.x+	In-place pod vertical scaling still alpha (beta in 1.33)

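Kubernetes does not define a formal version skew policy for container runtimes in the way it does for the kubelet and the API server. Each runtime documents which Kubernetes versions it supports: CRI-O minor versions track Kubernetes minor versions, and containerd publishes its own support matrix. Treat the table above as a guide and confirm against the runtime's release notes before upgrading.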
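
For upgrade automation, the containerd column of the matrix above can be encoded as a lookup. This is a minimal sketch: `MIN_CONTAINERD`, `parse_minor`, and `containerd_ok` are hypothetical names, and the table only covers the Kubernetes versions listed above.

```python
# Kubernetes (major, minor) -> minimum containerd (major, minor),
# copied from the compatibility matrix above.
MIN_CONTAINERD = {
    (1, 24): (1, 6), (1, 25): (1, 6), (1, 26): (1, 6),
    (1, 27): (1, 7), (1, 28): (1, 7), (1, 29): (1, 7), (1, 30): (1, 7),
}


def parse_minor(version):
    """Turn 'v1.27.3' or '1.7.13' into a (major, minor) tuple."""
    major, minor = version.lstrip("v").split(".")[:2]
    return int(major), int(minor)


def containerd_ok(kubernetes, containerd):
    """True if the containerd version meets the minimum for this Kubernetes."""
    required = MIN_CONTAINERD.get(parse_minor(kubernetes))
    if required is None:
        raise KeyError(f"no matrix entry for Kubernetes {kubernetes}")
    return parse_minor(containerd) >= required


print(containerd_ok("v1.27.3", "1.6.20"))
print(containerd_ok("v1.27.3", "1.7.13"))
```

Running a check like this as a pre-flight step on each node turns a silent CRI mismatch into an explicit upgrade blocker.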

CRI Socket Configuration

The kubelet discovers its CRI endpoint from the --container-runtime-endpoint flag (or containerRuntimeEndpoint in KubeletConfiguration).

# KubeletConfiguration (preferred over flags)
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
containerRuntimeEndpoint: "unix:///run/containerd/containerd.sock"
imageServiceEndpoint: ""   # defaults to containerRuntimeEndpoint if empty
# imageServiceEndpoint can point to a separate service
# (e.g., a registry proxy that implements ImageService only)
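
The defaulting rule in the comments above (an empty imageServiceEndpoint falls back to containerRuntimeEndpoint) can be sketched in a few lines. `socket_path` and `resolve_endpoints` are hypothetical helpers, and the sketch only handles `unix://` endpoints.

```python
from urllib.parse import urlparse


def socket_path(endpoint):
    """Extract the filesystem path from a unix:// CRI endpoint."""
    parsed = urlparse(endpoint)
    if parsed.scheme != "unix" or not parsed.path:
        raise ValueError(f"expected unix:///path endpoint, got {endpoint!r}")
    return parsed.path


def resolve_endpoints(config):
    """Return (runtime socket, image socket); image defaults to runtime."""
    runtime = config["containerRuntimeEndpoint"]
    image = config.get("imageServiceEndpoint") or runtime
    return socket_path(runtime), socket_path(image)


config = {
    "containerRuntimeEndpoint": "unix:///run/containerd/containerd.sock",
    "imageServiceEndpoint": "",
}
print(resolve_endpoints(config))
```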

Common Socket Paths

Runtime	Default Socket Path	Notes
containerd	`/run/containerd/containerd.sock`	Primary path; can be overridden in config.toml
CRI-O	`/var/run/crio/crio.sock`	Symlink also at `/run/crio/crio.sock`
cri-dockerd	`/run/cri-dockerd.sock`	Adapter socket; Docker daemon must be running separately
Kata Containers (standalone)	`/run/vc/sbs/.../shim.sock`	Per-sandbox shimv2 sockets; containerd manages lifecycle

Socket Permissions

The CRI socket must be readable and writable by the user running the kubelet (typically root). Do not make it world-accessible: any process that can call the CRI socket has root-equivalent capabilities on the node (run arbitrary containers, exec into existing ones). Restrict socket permissions to 0600, or 0660 with a restricted group.
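
A node audit script could check the permission rule with something like the sketch below. The helper name is hypothetical; it takes a raw `st_mode` value so that it can be exercised without a real socket.

```python
import stat


def socket_mode_is_restricted(mode):
    """True if the mode grants no read/write/execute bits to 'other'."""
    return stat.S_IMODE(mode) & stat.S_IRWXO == 0


# 0o140000 is the file-type bit pattern for a socket; only the low bits matter.
print(socket_mode_is_restricted(0o140660))
print(socket_mode_is_restricted(0o140666))
```

On a node, the input would come from `os.stat("/run/containerd/containerd.sock").st_mode`.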

Dockershim Removal Deep Dive

Understanding what changed in 1.24 is critical for anyone operating older clusters or building migration tools.

Component	Before 1.24 (Dockershim)	After 1.24 (Direct CRI)
kubelet → runtime	Internal shim translating CRI calls to Docker API calls	Direct gRPC to containerd/CRI-O
Container metadata	Stored in Docker daemon	Stored in containerd/CRI-O metadata store
`docker ps`	Shows all containers including k8s workloads	Shows nothing (docker not used); use `crictl ps`
Image management	`docker images`, `docker pull`	`crictl images`, `ctr images`
Log format	Docker JSON log format (json-file driver)	CRI text log format (timestamp, stream, tag, message)
Cgroup driver	Docker controlled (cgroupfs default)	containerd controls (must match kubelet `cgroupDriver`)
Mirror/insecure registry	Docker daemon config (`/etc/docker/daemon.json`)	containerd config (`/etc/containerd/config.toml`)
DinD (Docker in Docker)	Shared docker socket; containers could escape	No Docker socket on the node; run a nested daemon with `--privileged`, or use a daemonless builder such as kaniko or BuildKit

Container Log Path Convention

CRI standardizes where container logs are written on the node. The kubelet instructs the runtime where to write logs via the log_path field in ContainerConfig. The full path is:

/var/log/pods/<namespace>_<pod-name>_<pod-uid>/<container-name>/<restart-count>.log

Example:

/var/log/pods/kube-system_coredns-abc123-xyz_d4e5f6a7-b8c9-1234.../coredns/0.log
/var/log/pods/kube-system_coredns-abc123-xyz_d4e5f6a7-b8c9-1234.../coredns/1.log  # after restart
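
The path convention is simple enough to express as a helper. `pod_log_path` is a hypothetical name, and the UID in the usage line is a made-up placeholder.

```python
from pathlib import PurePosixPath


def pod_log_path(namespace, pod_name, pod_uid, container, restart_count):
    """Build the CRI log path for one container instance."""
    pod_dir = f"{namespace}_{pod_name}_{pod_uid}"
    return str(
        PurePosixPath("/var/log/pods") / pod_dir / container / f"{restart_count}.log"
    )


print(pod_log_path("kube-system", "coredns-abc", "example-uid", "coredns", 1))
```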

CRI Log Format

# Each line in the log file:
<RFC3339Nano timestamp> <stream: stdout|stderr> <tag: F|P> <log message>

# Examples:
2024-01-15T10:30:45.123456789Z stdout F Hello from container
2024-01-15T10:30:45.123456790Z stderr P This is a partial
2024-01-15T10:30:45.123456791Z stderr F line continued here

Tag values: F = full line (newline terminated), P = partial (line continues in next log entry). Log aggregators like Fluentd, Fluent Bit, and Vector parse this format natively.
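
A minimal parser for this format, including reassembly of partial lines, might look like the following. The function names are hypothetical, and the sketch assumes fragments of one logical line are never interleaved with other entries on the same stream.

```python
def parse_cri_line(line):
    """Split one CRI log line into (timestamp, stream, tag, message)."""
    parts = line.rstrip("\n").split(" ", 3)
    if len(parts) == 3:
        parts.append("")  # entry with an empty message
    timestamp, stream, tag, message = parts
    if tag not in ("F", "P"):
        raise ValueError(f"unknown tag {tag!r}")
    return timestamp, stream, tag, message


def merge_partials(lines):
    """Join P-tagged fragments with the F-tagged entry that completes them."""
    buffers = {}
    for line in lines:
        _, stream, tag, message = parse_cri_line(line)
        buffers.setdefault(stream, []).append(message)
        if tag == "F":
            yield stream, "".join(buffers.pop(stream))


sample = [
    "2024-01-15T10:30:45.123456789Z stdout F Hello from container",
    "2024-01-15T10:30:45.123456790Z stderr P This is a partial ",
    "2024-01-15T10:30:45.123456791Z stderr F line continued here",
]
print(list(merge_partials(sample)))
```

Fragments are concatenated without a separator because the runtime splits long lines at an arbitrary byte offset, not at word boundaries.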

Symlinks under /var/log/containers

The kubelet also creates symlinks at /var/log/containers/<pod-name>_<namespace>_<container-name>-<container-id>.log pointing to the CRI log path. Many legacy log shippers target /var/log/containers/*.log — these symlinks maintain compatibility.

CRI Conformance Testing

The Kubernetes project provides critest (part of the cri-tools project) to validate that a CRI implementation is correct.

# Install critest (from the cri-tools project)
VERSION="v1.29.0"
curl -LO "https://github.com/kubernetes-sigs/cri-tools/releases/download/${VERSION}/critest-${VERSION}-linux-amd64.tar.gz"
sudo tar -C /usr/local/bin -xzf "critest-${VERSION}-linux-amd64.tar.gz"

# Run conformance tests against containerd
sudo critest --runtime-endpoint unix:///run/containerd/containerd.sock

# Run specific test suite (e.g., only sandbox tests); critest accepts Ginkgo flags
sudo critest --runtime-endpoint unix:///run/containerd/containerd.sock \
  --ginkgo.focus="Sandbox"

# Run image service tests
sudo critest --runtime-endpoint unix:///run/containerd/containerd.sock \
  --ginkgo.focus="Image"

The conformance suite tests all required CRI RPCs, container lifecycle state transitions, log path creation, exec and streaming, and resource limit enforcement.

crictl Command Reference

crictl is the debug CLI for the CRI. It speaks directly to the runtime's CRI socket, bypassing the kubelet entirely. See Container Runtime for full crictl usage. Key CRI-specific commands:

# Check runtime and network readiness (calls CRI Status RPC)
crictl info

# Inspect sandbox (RunPodSandbox metadata, IPs, network info)
crictl inspectp <pod-id>

# Inspect container (ContainerStatus, OCI spec, mounts)
crictl inspect <container-id>

# Execute a command interactively (Exec streaming RPC; add --sync to use ExecSync)
crictl exec -it <container-id> /bin/sh

# Get container stats (ListContainerStats RPC)
crictl stats

# Pull an image explicitly (PullImage RPC)
crictl pull nginx:1.25

# Check image filesystem usage (ImageFsInfo RPC)
crictl imagefsinfo

CRI-Level Debugging

gRPC Debug Logging

# Enable CRI gRPC debug in kubelet (very verbose — use temporarily)
# In KubeletConfiguration:
logging:
  verbosity: 5   # level 4+ shows CRI calls, level 5 shows full payloads

# Or via flag:
kubelet --v=5 ...

# Tail kubelet logs for CRI calls:
journalctl -u kubelet -f | grep -i "cri\|grpc\|runtime"

# containerd side: enable debug logging
# In /etc/containerd/config.toml:
[debug]
  level = "debug"

# Then restart containerd and tail:
journalctl -u containerd -f | grep -E "RunPodSandbox|CreateContainer|StartContainer"

Runtime Status Check

# CRI Status RPC — check RuntimeReady and NetworkReady
crictl info
# Expected output: both runtime conditions report a true status
# RuntimeReady: true
# NetworkReady: true

# If NetworkReady=false: CNI plugin is not installed or misconfigured
# If RuntimeReady=false: containerd/CRI-O itself has a problem

# kubelet uses Status RPC every nodeStatusUpdateFrequency (default 10s)
# RuntimeReady=false -> node condition Ready=False -> new pods are not scheduled to the node

# Check node conditions reflecting CRI status:
kubectl get node <node-name> -o jsonpath='{.status.conditions}' | jq .

Orphaned Sandbox Cleanup

# List all sandboxes known to runtime (may include orphans not in kubelet cache)
crictl pods

# List NotReady sandboxes (stopped but not removed; these are the orphan candidates)
crictl pods --state notready

# Manually remove an orphaned sandbox (kubelet should do this automatically)
crictl stopp <pod-id>
crictl rmp <pod-id>

# Check containerd tasks (lower level — shim processes)
ctr -n k8s.io tasks list

RuntimeClass and CRI Interaction

RuntimeClass (see Container Runtime: RuntimeClass) maps to CRI via the runtime_handler field of RunPodSandboxRequest; containers created in that sandbox use the same handler:

# RunPodSandboxRequest message (proto excerpt):
message RunPodSandboxRequest {
  PodSandboxConfig config = 1;
  string runtime_handler = 2;   // e.g., "runc", "kata", "runsc"
}

# kubelet reads pod.spec.runtimeClassName ->
#   looks up RuntimeClass object ->
#   extracts handler name ->
#   passes as runtime_handler in RunPodSandbox RPC

# containerd resolves runtime_handler to a runtime entry in config.toml:
# [plugins."io.containerd.grpc.v1.cri".containerd.runtimes."kata"]
#   runtime_type = "io.containerd.kata.v2"

If the runtime_handler is not registered in the runtime, RunPodSandbox returns an error and the pod stays in ContainerCreating with a FailedCreatePodSandBox event.
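
The lookup chain from pod spec to handler can be sketched as below. `resolve_handler`, `UnknownRuntimeHandler`, and the sample data are hypothetical; the only CRI behavior assumed is that an empty handler string selects the runtime's default.

```python
class UnknownRuntimeHandler(Exception):
    """Raised when a handler is not registered in the runtime's config."""


def resolve_handler(runtime_class_name, runtime_classes, registered_handlers):
    """Map pod.spec.runtimeClassName to the runtime_handler string."""
    if runtime_class_name is None:
        return ""  # empty handler: the runtime picks its default
    handler = runtime_classes[runtime_class_name]  # RuntimeClass -> handler
    if handler not in registered_handlers:
        raise UnknownRuntimeHandler(handler)
    return handler


runtime_classes = {"sandboxed": "kata", "gvisor": "runsc"}
registered = {"runc", "kata"}
print(resolve_handler("sandboxed", runtime_classes, registered))
```

Here `resolve_handler("gvisor", ...)` would raise, mirroring a RuntimeClass whose handler has no matching entry in config.toml.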

In-Place Pod Vertical Scaling via CRI

KEP-1287 (In-Place Pod Vertical Scaling) was introduced as alpha in Kubernetes 1.27 and graduated to beta in 1.33. It uses UpdateContainerResources to change CPU and memory limits without restarting the container:

# Pod spec with resizePolicy (controls restart behavior per resource)
spec:
  containers:
  - name: app
    resources:
      requests:
        cpu: "500m"
        memory: "256Mi"
      limits:
        cpu: "1"
        memory: "512Mi"
    resizePolicy:
    - resourceName: cpu
      restartPolicy: NotRequired   # CPU changes without restart
    - resourceName: memory
      restartPolicy: RestartContainer  # memory changes restart container

# Trigger in-place resize (default strategic merge patch, so containers are matched by name):
kubectl patch pod my-pod --subresource resize \
  -p '{"spec":{"containers":[{"name":"app","resources":{"requests":{"cpu":"750m"}}}]}}'

# kubelet calls UpdateContainerResources RPC with new LinuxContainerResources
# containerd updates the cgroup limits through the OCI runtime; no new process is exec'ed
# Pod conditions PodResizePending / PodResizeInProgress report progress (1.33+; they replace status.resize)
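
When resizes are driven from tooling rather than typed by hand, the patch body is easier to build than to quote. The helper below is a hypothetical sketch that produces the same JSON as the kubectl example above.

```python
import json


def resize_patch(container, requests=None, limits=None):
    """Build the patch body for the pod resize subresource."""
    resources = {}
    if requests:
        resources["requests"] = dict(requests)
    if limits:
        resources["limits"] = dict(limits)
    if not resources:
        raise ValueError("nothing to resize")
    return {"spec": {"containers": [{"name": container, "resources": resources}]}}


patch = resize_patch("app", requests={"cpu": "750m"})
print(json.dumps(patch, separators=(",", ":")))
```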

Container Checkpointing

The CheckpointContainer RPC (feature gate ContainerCheckpoint; alpha in 1.25, beta in 1.30) enables CRIU-based checkpointing:

# Enable feature gates
# In KubeletConfiguration:
featureGates:
  ContainerCheckpoint: true

# Create a checkpoint via kubelet API:
curl -sk -X POST \
  "https://localhost:10250/checkpoint/<namespace>/<pod>/<container>" \
  --cert /var/lib/kubelet/pki/kubelet-client-current.pem \
  --key /var/lib/kubelet/pki/kubelet-client-current.pem

# Output: archive at /var/lib/kubelet/checkpoints/checkpoint-<name>-<ts>.tar
# Used for: forensics, debugging OOMKilled containers, live migration (future)

CRI Metrics

Metric	Type	Labels	Description
`kubelet_runtime_operations_total`	Counter	`operation_type`	Total CRI operations by type (RunPodSandbox, CreateContainer, etc.)
`kubelet_runtime_operations_errors_total`	Counter	`operation_type`	Failed CRI operations — key alert metric
`kubelet_runtime_operations_duration_seconds`	Histogram	`operation_type`	Latency per CRI operation type
`kubelet_pleg_relist_duration_seconds`	Histogram	—	Time to complete PLEG relist (classic PLEG)
`kubelet_pleg_relist_interval_seconds`	Histogram	—	Interval between relist starts (should be ~1s)
`kubelet_pleg_last_seen_seconds`	Gauge	—	Timestamp of last PLEG event seen (staleness indicator)
`container_runtime_crio_operations_total`	Counter	`operation`	CRI-O side: total operations (if using CRI-O)
`containerd_grpc_server_handled_total`	Counter	`grpc_method`, `grpc_code`	containerd: gRPC calls by method and status code

Alerting Rules

# Alert: CRI operation errors elevated
- alert: KubeletCRIOperationErrors
  expr: |
    increase(kubelet_runtime_operations_errors_total[5m]) > 5
  for: 2m
  labels:
    severity: warning
  annotations:
    summary: "CRI operation errors on {{ $labels.node }}"
    description: "{{ $labels.operation_type }} errors: {{ $value }} in last 5m"

# Alert: PLEG relist taking too long (classic PLEG)
- alert: KubeletPLEGRelistDurationHigh
  expr: |
    histogram_quantile(0.99,
      rate(kubelet_pleg_relist_duration_seconds_bucket[5m])
    ) > 2
  for: 5m
  labels:
    severity: warning
  annotations:
    summary: "PLEG relist p99 > 2s on {{ $labels.node }}"
    description: "PLEG is under pressure; node may be running too many containers"

# Alert: PLEG not healthy (missed relist deadline)
- alert: KubeletPLEGNotHealthy
  expr: |
    (time() - kubelet_pleg_last_seen_seconds) > 180
  for: 1m
  labels:
    severity: critical
  annotations:
    summary: "PLEG not healthy on {{ $labels.node }}"
    description: "PLEG has not produced events for 3 minutes; node may be stuck"

# Alert: RunPodSandbox latency high
- alert: KubeletRunPodSandboxLatencyHigh
  expr: |
    histogram_quantile(0.99,
      rate(kubelet_runtime_operations_duration_seconds_bucket{
        operation_type="run_podsandbox"}[5m])
    ) > 5
  for: 5m
  labels:
    severity: warning
  annotations:
    summary: "RunPodSandbox p99 > 5s on {{ $labels.node }}"
    description: "Slow sandbox creation; check CNI plugin and containerd"

Troubleshooting Runbooks

CRI socket connection refused / kubelet fails to start

# Symptom: kubelet fails with "no such file or directory" on socket
# or "connection refused"

# 1. Check if runtime is running
systemctl status containerd   # or crio
crictl --runtime-endpoint unix:///run/containerd/containerd.sock info

# 2. Check kubelet CRI endpoint config
cat /var/lib/kubelet/config.yaml | grep -i runtime

# 3. Verify socket exists and permissions
ls -la /run/containerd/containerd.sock
# Expected: srw-rw---- 1 root root ... containerd.sock

# 4. If socket missing, containerd crashed
journalctl -u containerd -n 100 --no-pager
# Common causes: disk full, config.toml syntax error, missing snapshotter

# 5. Fix and restart
systemctl restart containerd
systemctl restart kubelet

Pod stuck in ContainerCreating — CRI CreateContainer failed

# Symptom: kubectl describe pod shows:
# Warning  Failed  CreateContainerError: ...

# 1. Get detailed event
kubectl describe pod <pod> -n <ns>

# 2. Check kubelet logs on the node
journalctl -u kubelet -n 200 | grep -i "create\|cri\|error"

# 3. Inspect sandbox first (state must be SANDBOX_READY)
crictl pods | grep <pod-name>
crictl inspectp <sandbox-id> | jq .status.state

# 4. Try to manually pull the image
crictl pull <image-ref>
# "unauthorized": check imagePullSecrets
# "not found": check image name/tag
# "tls": check registry TLS / insecure config in config.toml

# 5. Check OCI spec generation (runtime_handler mismatch)
crictl inspect <container-id> 2>&1 | head -30
# "runtime handler not found": check RuntimeClass handler name vs config.toml

# 6. Check containerd snapshotter issues
ctr -n k8s.io snapshots ls | grep <container-id>

PLEG not healthy — node marked NotReady

# Symptom: node condition KubeletNotReady with "PLEG is not healthy"

# 1. Check PLEG relist duration metric
kubectl get --raw "/api/v1/nodes/<node>/proxy/metrics" | \
  grep pleg_relist_duration

# 2. Count containers on node (PLEG cost is O(n))
crictl ps -q | wc -l
# If >200: node may be over capacity for classic PLEG

# 3. Check runtime responsiveness
time crictl pods >/dev/null
# If this takes >5s, runtime is under pressure

# 4. Check containerd goroutine count / memory
# (needs a TCP debug address set under [debug] in config.toml; port 1338 is an example)
curl -s "http://localhost:1338/debug/pprof/goroutine?debug=1" | head -20

# 5. Enable Evented PLEG if on containerd 1.7+
# Add to KubeletConfiguration:
# featureGates:
#   EventedPLEG: true

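# 6. Immediate mitigation: restart the kubelet first
# Restarting containerd normally leaves containers running (their shims stay up),
# but it interrupts every in-flight CRI call, so try the kubelet before the runtime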
systemctl restart kubelet

kubectl exec fails with "error executing in container"

# Symptom: kubectl exec hangs or returns error

# 1. Check if it's a network routing issue (apiserver -> kubelet)
kubectl get node <node> -o wide   # verify node IP
curl -sk https://<node-ip>:10250/healthz

# 2. Test exec at CRI level directly
crictl exec -it <container-id> /bin/sh
# If this works: problem is in kubelet streaming server or apiserver proxy
# If this fails: problem is in the runtime's streaming server

# 3. Check streaming server address in kubelet
# kubelet must be reachable at its advertised address
kubectl get node <node> -o jsonpath='{.status.addresses}'

# 4. Check kubelet certificate SANs include node IP
openssl x509 -in /var/lib/kubelet/pki/kubelet.crt -noout -text | grep "IP Address"

# 5. Firewall: ensure apiserver can reach node :10250

NetworkReady=false — CNI not configured

# Symptom: crictl info shows "networkReady": false
# Pods stuck in Pending or ContainerCreating with "network not ready"

# 1. Check CNI plugin installation
ls /opt/cni/bin/
ls /etc/cni/net.d/

# 2. Check containerd CRI plugin CNI config
grep -A5 "cni" /etc/containerd/config.toml
# Expected:
# [plugins."io.containerd.grpc.v1.cri".cni]
#   bin_dir = "/opt/cni/bin"
#   conf_dir = "/etc/cni/net.d"

# 3. If CNI config dir is empty, CNI DaemonSet not deployed
# Deploy your CNI (Calico, Cilium, Flannel etc.)
kubectl get pods -n kube-system | grep -E "calico|cilium|flannel"

# 4. After CNI deploys, runtime auto-detects CNI config
# NetworkReady should flip to true within 30s
crictl info | grep -i -A1 networkready

Production Best Practices

Pin runtime versions — use containerd 1.7.x or CRI-O 1.27.x and update deliberately; runtime bugs directly affect node stability. Subscribe to runtime security advisories.
Enable Evented PLEG on nodes with >100 pods if using containerd 1.7+ — reduces per-second CRI polling from O(pods) to near-zero on quiet nodes.
Restrict CRI socket permissions — chmod 0660 and assign to a dedicated group. Never bind-mount the socket into pods; it grants host escape.
Match cgroup driver — containerd SystemdCgroup = true must match kubelet cgroupDriver: systemd. A mismatch causes containers to be placed in wrong cgroups and makes eviction unreliable. Verify with crictl info | grep -i cgroup.
Monitor kubelet_runtime_operations_errors_total — alert on any sustained increase in CRI errors. A spike in run_podsandbox errors often indicates CNI or disk pressure before the node goes NotReady.
Test CRI upgrades with critest — run the CRI conformance suite against a new runtime version in staging before rolling to production. Breaking changes in streaming or log path handling will silently break kubectl exec/logs.
Use imageServiceEndpoint separation when pre-warming images — run a separate image service proxy that pre-pulls images during node provisioning, keeping the main runtime socket for workload operations only.
Pre-pull critical images: use imagePullPolicy: IfNotPresent with node-level image pre-pull DaemonSets for startup-critical images. This keeps PullImage off the critical path of pod startup for those images.
Handle cri-dockerd carefully — if your cluster still uses Docker Engine via cri-dockerd, remember it adds a translation layer (kubelet → cri-dockerd → Docker API → containerd). Debugging requires checking both cri-dockerd and Docker daemon logs.
Audit RuntimeClass handlers: a misconfigured handler string makes sandbox creation fail, which leaves pods stuck in ContainerCreating. Maintain an inventory of registered handlers in each cluster's containerd config.toml and validate it against the crictl info output.