CNI Plugins — Deep Dive

The Container Network Interface (CNI) is the pluggable layer that gives every Kubernetes pod a routable IP address. This page covers the CNI specification in full, chained plugin architecture, and detailed internals of Calico, Cilium, Flannel, AWS VPC CNI, Azure CNI, and Antrea, including eBPF dataplanes, BGP configuration, Hubble observability, ENI management, and production CNI selection criteria.

CNI Specification

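CNI is a CNCF specification (not Kubernetes-specific) that defines how container runtimes invoke network plugins. The spec version history: 0.1.0 → 0.2.0 → 0.3.0/0.3.1 → 0.4.0 → 1.0.0 (2021) → 1.1.0 (2024). Since the removal of dockershim in Kubernetes 1.24, the kubelet no longer invokes CNI plugins itself, so which spec versions are accepted depends on the container runtime (containerd, CRI-O) and the CNI library it bundles.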

Operations: ADD / DEL / CHECK / GC / VERSION

Operation | Trigger | Plugin Must Do | Return
ADD | Container created; pod scheduled to node | Create veth, assign IP (IPAM), configure routes, set up iptables | CNI Result (interfaces[], ips[], routes[], dns{})
DEL | Container removed; pod deleted | Remove veth, release IP back to IPAM, tear down routes | Success (errors tolerated if container already gone)
CHECK | Runtime health verification (spec 0.4.0+) | Verify container network matches last ADD result | Success or error with reason
GC | Periodic cleanup (spec 1.1.0+) | Remove orphaned network state (IPs not in validAttachments) | Success
VERSION | Plugin discovery | Report supported CNI spec versions | {"cniVersion":"1.0.0","supportedVersions":[...]}

Environment Variables

The container runtime passes configuration via environment variables before exec-ing the CNI binary:

CNI_COMMAND=ADD
CNI_CONTAINERID=abc123def456...   # container ID (unique per sandbox)
CNI_NETNS=/var/run/netns/abc123   # path to network namespace
CNI_IFNAME=eth0                   # interface name inside container
CNI_ARGS=K8S_POD_NAMESPACE=default;K8S_POD_NAME=nginx-abc;K8S_POD_INFRA_CONTAINER_ID=abc123
CNI_PATH=/opt/cni/bin             # colon-separated dirs to search for plugins

stdin

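Network config JSON is passed on stdin, not as CLI args. The plugin reads the config from stdin and writes the CNI result to stdout. On failure it exits non-zero and prints a JSON error object (code, msg, details) to stdout; stderr is reserved for unstructured log output.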
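The invocation contract above can be sketched as a toy dispatcher. This is a minimal illustration, not a working plugin: the `handle` function, the `SUPPORTED_VERSIONS` list, and the empty ADD result are all hypothetical, and the error codes 4 and 6 follow my reading of the spec's reserved codes and should be checked against the spec version you target.

```python
import json

SUPPORTED_VERSIONS = ["0.4.0", "1.0.0"]  # assumption: what this toy plugin claims
REQUIRED_ENV = ("CNI_CONTAINERID", "CNI_IFNAME")


def error(code, msg, details=""):
    """Build the JSON error object a plugin prints to stdout on failure."""
    return json.dumps({"cniVersion": "1.0.0", "code": code, "msg": msg, "details": details})


def handle(env, stdin_text):
    """Return (exit_code, stdout_text) for a single plugin invocation."""
    command = env.get("CNI_COMMAND")
    if command == "VERSION":
        return 0, json.dumps({"cniVersion": "1.0.0", "supportedVersions": SUPPORTED_VERSIONS})
    if command not in ("ADD", "DEL", "CHECK"):
        return 1, error(4, "unsupported CNI_COMMAND", str(command))
    missing = [name for name in REQUIRED_ENV if not env.get(name)]
    if missing:
        return 1, error(4, "missing environment variables", ", ".join(missing))
    try:
        conf = json.loads(stdin_text)
    except json.JSONDecodeError as exc:
        return 1, error(6, "cannot decode network config", str(exc))
    if command == "ADD":
        # A real plugin would create the veth pair and delegate to IPAM here.
        result = {"cniVersion": conf.get("cniVersion", "1.0.0"), "interfaces": [], "ips": []}
        return 0, json.dumps(result)
    return 0, ""  # DEL and CHECK print nothing on success


env = {"CNI_COMMAND": "ADD", "CNI_CONTAINERID": "abc123",
       "CNI_IFNAME": "eth0", "CNI_NETNS": "/var/run/netns/abc123"}
code, out = handle(env, '{"cniVersion": "1.0.0", "name": "demo", "type": "toy"}')
print(code, out)
```

Keeping the dispatcher a pure function of the environment and stdin makes it easy to exercise each command without a container runtime.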

Full ADD Request / Response

// stdin → plugin (network config)
{
  "cniVersion": "1.0.0",
  "name": "k8s-pod-network",
  "type": "calico",
  "ipam": {
    "type": "calico-ipam"
  },
  "kubernetes": {
    "k8s_api_root": "https://10.96.0.1:443",
    "kubeconfig": "/etc/cni/net.d/calico-kubeconfig"
  },
  "policy": {
    "type": "k8s"
  },
  "nodename": "worker-1",
  "datastore_type": "kubernetes",
  "log_level": "info",
  "prevResult": null   // null for first ADD; populated for chained plugins
}

// stdout ← plugin (CNI result v1.0.0)
{
  "cniVersion": "1.0.0",
  "interfaces": [
    {
      "name": "veth8a3f12b",
      "mac": "aa:bb:cc:dd:ee:ff",
      "sandbox": ""           // host side: no sandbox path
    },
    {
      "name": "eth0",
      "mac": "aa:bb:cc:dd:ee:00",
      "sandbox": "/var/run/netns/abc123"   // container side
    }
  ],
  "ips": [
    {
      "interface": 1,          // index into interfaces[]
      "address": "10.244.1.15/24",
      "gateway": "10.244.1.1"
    }
  ],
  "routes": [
    { "dst": "0.0.0.0/0", "gw": "10.244.1.1" }
  ],
  "dns": {
    "nameservers": ["10.96.0.10"],
    "domain": "cluster.local",
    "search": ["default.svc.cluster.local","svc.cluster.local","cluster.local"]
  }
}
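Because `ips[].interface` is an index into `interfaces[]`, consumers of a CNI result have to join the two arrays. The sketch below shows one way to do that; `summarize_result` is a hypothetical helper, and the sample is a trimmed, comment-free copy of the result above.

```python
import json


def summarize_result(result_json):
    """Join each ips[] entry to the interface it points at."""
    result = json.loads(result_json)
    interfaces = result.get("interfaces", [])
    summary = []
    for ip in result.get("ips", []):
        idx = ip.get("interface")
        valid = idx is not None and 0 <= idx < len(interfaces)
        iface = interfaces[idx] if valid else None
        summary.append({
            "address": ip["address"],
            "gateway": ip.get("gateway"),
            "ifname": iface["name"] if iface else None,
            # An empty sandbox path means the interface lives on the host side.
            "in_sandbox": bool(iface and iface.get("sandbox")),
        })
    return summary


sample = json.dumps({
    "cniVersion": "1.0.0",
    "interfaces": [
        {"name": "veth8a3f12b", "mac": "aa:bb:cc:dd:ee:ff", "sandbox": ""},
        {"name": "eth0", "mac": "aa:bb:cc:dd:ee:00", "sandbox": "/var/run/netns/abc123"},
    ],
    "ips": [{"interface": 1, "address": "10.244.1.15/24", "gateway": "10.244.1.1"}],
})
summary = summarize_result(sample)
print(summary)
```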

Chained Plugins — conflist

A .conflist file defines an ordered list of plugins. The runtime calls them sequentially on ADD (and in reverse order on DEL); each plugin receives the prevResult from the prior plugin in the chain. This is how the main network plugin is combined with bandwidth shaping, port mapping, and firewall rules:

{
  "cniVersion": "1.0.0",
  "name": "k8s-pod-network",
  "plugins": [
    {
      "type": "calico",           // main plugin: sets up veth + routes
      "ipam": { "type": "calico-ipam" },
      "kubernetes": { "kubeconfig": "/etc/cni/net.d/calico-kubeconfig" }
    },
    {
      "type": "bandwidth",        // meta plugin: shapes traffic
      "ingressRate": 104857600,   // 100 Mbps in bps
      "ingressBurst": 209715200,
      "egressRate": 104857600,
      "egressBurst": 209715200
    },
    {
      "type": "portmap",          // meta plugin: iptables DNAT for hostPort
      "capabilities": { "portMappings": true },
      "snat": true
    },
    {
      "type": "firewall"          // meta plugin: iptables allow/deny rules
    }
  ]
}
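The chaining behaviour described above can be sketched in a few lines. This assumes, per my reading of the spec, that the runtime copies `name` and `cniVersion` from the list into each plugin's config and injects `prevResult` for every plugin after the first. `run_chain` and `fake_plugin` are hypothetical stand-ins, not part of any CNI library.

```python
import copy


def run_chain(conflist, run_plugin):
    """Build the stdin config for each plugin in order, threading prevResult through."""
    prev_result = None
    stdin_configs = []
    for entry in conflist["plugins"]:
        conf = copy.deepcopy(entry)
        conf["name"] = conflist["name"]
        conf["cniVersion"] = conflist["cniVersion"]
        if prev_result is not None:
            conf["prevResult"] = prev_result
        stdin_configs.append(conf)
        prev_result = run_plugin(conf)
    return stdin_configs, prev_result


def fake_plugin(conf):
    """Hypothetical plugin: the first one invents a result, later ones pass it through."""
    return conf.get("prevResult", {"ips": [{"address": "10.244.1.15/24"}]})


conflist = {
    "cniVersion": "1.0.0",
    "name": "k8s-pod-network",
    "plugins": [{"type": "calico"}, {"type": "bandwidth"}, {"type": "portmap"}],
}
configs, final_result = run_chain(conflist, fake_plugin)
for conf in configs:
    print(conf["type"], "prevResult" in conf)
```

The result of the last plugin is what the runtime reports back as the result of the whole ADD.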

Plugin search order

The runtime uses the first .conf or .conflist file found in /etc/cni/net.d/ sorted lexicographically. Files beginning with 10- load before 99-. Stale configs left by old CNI installations can cause the wrong plugin to load.
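The selection rule is easy to reproduce when auditing a node. The sketch below is an approximation: `pick_cni_config` is a hypothetical helper, and accepting `.json` alongside `.conf` and `.conflist` is an assumption based on common runtime behaviour rather than something stated on this page.

```python
import os

VALID_EXTENSIONS = {".conf", ".conflist", ".json"}  # .json is an assumption


def pick_cni_config(filenames):
    """Return the file a runtime would most likely load: first match in sorted order."""
    candidates = sorted(
        name for name in filenames
        if os.path.splitext(name)[1] in VALID_EXTENSIONS
    )
    return candidates[0] if candidates else None


files = ["99-loopback.conf", "10-calico.conflist", "05-old-flannel.conflist", "README.md"]
print(pick_cni_config(files))
```

In this listing the stale `05-old-flannel.conflist` sorts first and shadows the intended Calico config, which is exactly the failure mode described above.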

CNI Invocation Flow

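[Diagram: CNI invocation flow. kubelet calls containerd over CRI; containerd creates the pause container and a new network namespace, then execs a CNI plugin from /opt/cni/bin/ (calico, flannel, or cilium) with CNI_COMMAND=ADD, CNI_CONTAINERID, CNI_NETNS, and CNI_IFNAME=eth0 set and /etc/cni/net.d/10-calico.conflist on stdin. The plugin execs the IPAM plugin (host-local or calico-ipam), which allocates an IP lease from the IPAM store; the plugin then creates the veth pair and routes in the kernel netns and returns the CNI Result JSON on stdout.]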

CNI Reference Plugins

The containernetworking/plugins repo maintains the reference plugins, which are bundled with most distributions:

Main Plugins

  • bridge: Linux bridge with a veth pair per container
  • host-device: moves an existing host device (for example an SR-IOV VF) into the container
  • ipvlan: ipvlan L2/L3
  • macvlan: macvlan
  • ptp: veth pair (no bridge)
  • vlan: VLAN sub-interface
  • dummy: dummy interface (the separate loopback plugin configures lo)

IPAM Plugins

  • host-local: file-based, per-node allocation
  • dhcp: DHCP daemon
  • static: hardcoded IPs
  • whereabouts: cluster-wide IPAM (a separate project, not part of the reference repo)

Meta Plugins

  • portmap: hostPort DNAT
  • bandwidth: tc tbf shaping
  • firewall: iptables rules
  • sbr: source-based routing
  • tuning: sysctl / hwaddr
  • vrf: VRF routing table

Calico

Calico (Project Calico / Tigera) is one of the most widely deployed CNIs in production. It supports several dataplanes, including standard Linux (iptables), eBPF, and Windows HNS. Network policy is a first-class feature, implemented by Felix through iptables or eBPF rules rather than through a separate plugin.

Architecture Components

Felix (DaemonSet)

The policy enforcement agent on every node. Felix programs iptables/eBPF rules, writes interface routes into the kernel FIB, and enforces NetworkPolicy; route distribution over BGP is handled by BIRD. Felix talks to the datastore (etcd or the Kubernetes API via CRDs).

Pod: calico-node in kube-system

BIRD (BGP Daemon)

Runs inside the calico-node pod. Advertises pod CIDRs via BGP to upstream routers and to peer nodes. In full-mesh mode each node peers with every other node (n(n−1)/2 sessions, growing as O(n²)); in Route Reflector mode designated RR nodes relay routes.

Port 179 TCP (BGP)
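The scaling difference between the two BGP topologies is simple arithmetic. The sketch below assumes a common Route Reflector layout in which every client node peers with every reflector and the reflectors are fully meshed among themselves; both function names are hypothetical.

```python
def full_mesh_sessions(nodes):
    """Each pair of nodes shares one BGP session."""
    return nodes * (nodes - 1) // 2


def route_reflector_sessions(nodes, reflectors):
    """Clients peer with every reflector; reflectors are fully meshed with each other."""
    clients = nodes - reflectors
    return clients * reflectors + full_mesh_sessions(reflectors)


for n in (10, 100, 500):
    print(n, full_mesh_sessions(n), route_reflector_sessions(n, 3))
```

At 100 nodes the full mesh needs 4950 sessions, while three reflectors need 294, which is why the mesh is usually disabled well before clusters reach that size.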

confd (template engine)

Watches the datastore for BGP configuration changes and renders BIRD config files dynamically. Eliminates the need to restart BIRD on config changes — confd triggers a graceful reload.

Typha (optional)

Fan-out cache between the datastore and Felix daemons. In large clusters (>100 nodes), Typha absorbs the watch fan-out: 1,000 Felix instances watch Typha instead of the API server directly. Reduces API server load dramatically.

Deploy 1 Typha replica per 100-200 nodes

calico-kube-controllers

Deployment (not DaemonSet). Reconciles Kubernetes objects (Namespaces, Pods, NetworkPolicies, ServiceAccounts) into Calico CRDs. Also runs node cleanup when nodes are deleted.

calico-apiserver (optional)

Aggregated API extension exposing Calico CRDs through the Kubernetes API server. Required for Tigera Enterprise features and for applying Calico resources with kubectl via API aggregation.

Routing Modes

Mode | Encapsulation | Requires | Overhead | Use When
BGP (native) | None | Routable pod CIDRs; BGP-capable fabric | ~0 | On-prem with BGP ToR switches; maximum performance
VXLAN | VXLAN (UDP 4789) | UDP connectivity between nodes | ~50 bytes/pkt | Cloud VPCs that block BGP; most common in EKS/AKS self-managed
IP-in-IP (IPIP) | IP-in-IP (proto 4) | IP connectivity between nodes | ~20 bytes/pkt | Legacy; lighter than VXLAN but less universally supported
WireGuard | WireGuard (UDP 51820) | WireGuard kernel module | ~60 bytes/pkt | Encrypted pod-to-pod traffic without service mesh
None (CrossSubnet) | None within subnet, VXLAN/IPIP across | BGP for intra-subnet, overlay for cross-subnet | Mixed | AWS multi-AZ or hybrid setups

Key Calico CRDs

# IPPool — defines allocatable pod CIDR range
apiVersion: projectcalico.org/v3   # apply with calicoctl or the Calico API server; crd.projectcalico.org/v1 is the internal backing CRD
kind: IPPool
metadata:
  name: default-ipv4-ippool
spec:
  cidr: 192.168.0.0/16
  ipipMode: Never           # Never | Always | CrossSubnet
  vxlanMode: Always         # Never | Always | CrossSubnet
  natOutgoing: true         # SNAT pods accessing non-cluster IPs
  nodeSelector: all()       # which nodes can use this pool
  blockSize: 26             # /26 = 64 IPs per node block (default /26 for IPv4)
  disabled: false

---
# IPAMBlock — auto-created per node; not edited manually
# kubectl get ipamblocks -o yaml
# Each block is a /26 subnet allocated from an IPPool

---
# BGPConfiguration — global BGP settings
apiVersion: projectcalico.org/v3   # apply with calicoctl or the Calico API server
kind: BGPConfiguration
metadata:
  name: default
spec:
  logSeverityScreen: Info
  nodeToNodeMeshEnabled: true    # full-mesh BGP (disable for RR topology)
  asNumber: 64512                # AS number for all nodes
  serviceClusterIPs:
  - cidr: 10.96.0.0/12           # advertise service CIDRs to BGP peers
  serviceExternalIPs:
  - cidr: 203.0.113.0/24

---
# BGPPeer — configure external BGP peer (Route Reflector or ToR)
apiVersion: projectcalico.org/v3   # apply with calicoctl or the Calico API server
kind: BGPPeer
metadata:
  name: tor-switch-1
spec:
  peerIP: 192.168.1.1
  asNumber: 65000
  nodeSelector: rack == "rack-1"
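The relationship between an IPPool's `cidr` and its `blockSize` can be checked with the standard library. This is a small sizing sketch; `block_stats` is a hypothetical helper, and it ignores the fact that a node may claim more than one block.

```python
import ipaddress


def block_stats(pool_cidr, block_size):
    """Return (IPs per block, number of blocks) for a pool carved into fixed-size blocks."""
    pool = ipaddress.ip_network(pool_cidr)
    if block_size < pool.prefixlen:
        raise ValueError("blockSize must not be larger than the pool itself")
    ips_per_block = 2 ** (pool.max_prefixlen - block_size)
    blocks = 2 ** (block_size - pool.prefixlen)
    return ips_per_block, blocks


print(block_stats("192.168.0.0/16", 26))
print(block_stats("192.168.0.0/16", 24))
```

With the pool above, /26 blocks give 64 addresses each across 1024 blocks, while /24 blocks give 256 addresses each but only 256 blocks, so denser nodes trade off against the number of nodes the pool can serve.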

Calico eBPF Dataplane

Enable with calicoNetwork.linuxDataplane: BPF (Operator) or FELIX_BPFENABLED=true. Requirements: Linux 5.3+ (5.8+ recommended) and kube-proxy disabled (on kubeadm clusters, skip it at init time with --skip-phases=addon/kube-proxy).

eBPF advantages vs iptables

  • O(1) service lookup via BPF maps vs O(n) iptables chains
  • Direct server return (DSR): response packets go straight from the backend node to the client instead of returning through the node that received the request
  • Preserved source IP for NodePort (no SNAT)
  • Lower latency at scale (10k+ services)
  • TC hook (traffic control) replaces netfilter in the fast path
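The O(1) versus O(n) point in the list above can be illustrated with a toy model. This is only an analogy: a Python list stands in for a sequentially evaluated iptables chain and a dict stands in for a BPF hash map; the service addresses and `linear_lookup` are made up for the example.

```python
def linear_lookup(rules, vip):
    """Scan rules in order, like walking a chain. Returns (backend, comparisons)."""
    for comparisons, (rule_vip, backend) in enumerate(rules, start=1):
        if rule_vip == vip:
            return backend, comparisons
    return None, len(rules)


# 10,000 hypothetical services, each with one backend.
services = [(f"10.96.{i // 256}.{i % 256}:80", f"backend-{i}") for i in range(10_000)]
service_map = dict(services)  # hash map keyed by service VIP

target = services[-1][0]
backend, comparisons = linear_lookup(services, target)
print(comparisons)                     # worst case: every rule is inspected
print(service_map[target] == backend)  # a single hash lookup finds the same backend
```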

eBPF limitations

  • Requires kube-proxy to be disabled (full replacement)
  • Kernel < 5.3 not supported
  • No support for some exotic iptables rules
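  • More complex troubleshooting (bpftool required)

# Inspect Calico eBPF programs and maps on the node
bpftool map list | grep -i cali
bpftool prog show | grep -i cali

# Dump eBPF conntrack and NAT tables from inside the calico-node pod
# (calico-node-xyz is a placeholder; operator installs use the calico-system namespace)
kubectl exec -n kube-system calico-node-xyz -c calico-node -- calico-node -bpf conntrack dump
kubectl exec -n kube-system calico-node-xyz -c calico-node -- calico-node -bpf nat dump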

Cilium

Cilium is an eBPF-native CNI: its dataplane runs as eBPF programs loaded into the Linux kernel, and it can also replace kube-proxy entirely. The minimum kernel depends on the Cilium release (4.9.17+ for older releases, newer kernels for recent ones; 5.10+ recommended). It provides deep observability via Hubble.

Architecture

cilium-agent (DaemonSet)

Core daemon on every node. Loads eBPF programs, manages BPF maps, handles IPAM (CiliumNode CRD), enforces network policies. Communicates with the Kubernetes API server and (optionally) Cilium KVStore (etcd).

cilium-operator (Deployment)

Cluster-scoped operations: IPAM allocation for large clusters, CiliumNetworkPolicy sync, and garbage collection of stale CiliumEndpoints left behind by terminated pods.

Hubble (observability)

Ring buffer-based eBPF observability layer. hubble-relay aggregates per-node Hubble servers into a cluster-wide gRPC API. hubble-ui provides a real-time network flow visualization graph. Overhead is low, though not strictly zero, when no observer is connected.

cilium-envoy (optional)

Embedded Envoy proxy for L7 policy enforcement (HTTP, gRPC, Kafka). Runs inside the cilium-agent pod or, in newer releases, as a separate cilium-envoy DaemonSet. L7 policies use CiliumNetworkPolicy spec.ingress[].toPorts[].rules with HTTP match expressions.

eBPF Dataplane Internals

Cilium loads eBPF programs at multiple kernel hooks:

Hook Point | Direction | Purpose
tc ingress on veth host side | Pod → host | Enforce egress NetworkPolicy from pod's perspective; SNAT for masquerade
tc egress on veth host side | Host → pod | Enforce ingress NetworkPolicy from pod's perspective; DNAT for service
XDP / tc ingress on physical NIC | External → node | NodePort load balancing; early drop for DDoS; DSR redirect
cgroup/connect4 (cgroup sock_addr) | Socket level | Transparent socket-level load balancing: the service VIP is translated at connect() time, so no per-packet NAT through netfilter is needed
kprobe (optional) | Kernel events | Process-level visibility for Hubble

IPAM Modes

Mode | Config | Where IPs Come From | Use Case
cluster-pool (default) | ipam: cluster-pool | Operator assigns per-node PodCIDR from clusterPoolIPv4PodCIDR | Generic on-prem / self-managed
kubernetes | ipam: kubernetes | Uses node.spec.podCIDR set by kube-controller-manager | kubeadm clusters
aws-eni | ipam: eni | Cilium operator attaches ENIs and assigns secondary IPs via EC2 API | EKS with native VPC routing
azure | ipam: azure | Cilium operator assigns IPs from Azure VNET subnet | AKS with native Azure networking
crd | ipam: crd | CiliumNode CRD spec.ipam.pools | Multi-homing, custom IPAM

CiliumNetworkPolicy

apiVersion: "cilium.io/v2"
kind: CiliumNetworkPolicy
metadata:
  name: allow-frontend-to-api
  namespace: production
spec:
  endpointSelector:
    matchLabels:
      app: api-server        # applies to pods with this label
  ingress:
  - fromEndpoints:
    - matchLabels:
        app: frontend        # allow from frontend pods
    toPorts:
    - ports:
      - port: "8080"
        protocol: TCP
      rules:
        http:                # L7 HTTP rules
        - method: "GET"
          path: "/api/v1/.*"
        - method: "POST"
          path: "/api/v1/items"
  - fromEntities:
    - cluster               # allows ALL in-cluster sources on every port; remove to keep the L7 rule above restrictive
  egress:
  - toEndpoints:
    - matchLabels:
        k8s:io.kubernetes.pod.namespace: kube-system
        k8s:k8s-app: kube-dns    # allow DNS (CoreDNS/kube-dns pods carry k8s-app=kube-dns)
    toPorts:
    - ports:
      - port: "53"
        protocol: ANY
      rules:
        dns:
        - matchPattern: "*"   # the DNS proxy must see the lookup for toFQDNs below to work
  - toFQDNs:                               # FQDN-based policy
    - matchName: "api.stripe.com"
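To reason about which requests the HTTP rules in the policy above would admit, it can help to model them as (method, path) regular-expression pairs. This is only an approximation: it assumes both fields are matched as fully anchored regexes, and Python's `re` module is not the engine Envoy uses. `RULES` and `allowed` are hypothetical names.

```python
import re

# (method regex, path regex) pairs copied from the policy's http rules.
RULES = [
    ("GET", "/api/v1/.*"),
    ("POST", "/api/v1/items"),
]


def allowed(method, path, rules=RULES):
    """A request passes if any single rule matches both its method and its path."""
    return any(
        re.fullmatch(method_re, method) and re.fullmatch(path_re, path)
        for method_re, path_re in rules
    )


for request in [("GET", "/api/v1/users"), ("POST", "/api/v1/items"),
                ("POST", "/api/v1/users"), ("DELETE", "/api/v1/items")]:
    print(request, allowed(*request))
```

Requests that match no rule are rejected at L7 even though the L3/L4 part of the policy admitted the connection.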

Hubble Observability

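# Install Hubble CLI
export HUBBLE_VERSION=$(curl -s https://raw.githubusercontent.com/cilium/hubble/master/stable.txt)
curl -L --remote-name-all https://github.com/cilium/hubble/releases/download/$HUBBLE_VERSION/hubble-linux-amd64.tar.gz
sudo tar xzvfC hubble-linux-amd64.tar.gz /usr/local/bin   # unpack the hubble binary onto PATH
rm hubble-linux-amd64.tar.gz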

# Enable Hubble in Cilium
helm upgrade cilium cilium/cilium \
  --namespace kube-system \
  --set hubble.enabled=true \
  --set hubble.relay.enabled=true \
  --set hubble.ui.enabled=true

# Port-forward Hubble relay
cilium hubble port-forward &

# Real-time flow inspection
hubble observe --namespace production --follow
hubble observe --pod frontend/frontend-abc --verdict DROPPED
hubble observe --protocol TCP --port 8080 --output json

# Policy troubleshooting (jq needs JSON output)
hubble observe --namespace production --verdict DROPPED --last 100 --output json | \
  jq '.flow | select(.l4.TCP.destination_port == 5432)'

# Network flow statistics
hubble observe --namespace production --type trace --last 1000 --output json | \
  jq -r '.flow | [.source.pod_name, .destination.pod_name, .verdict] | @tsv' | \
  sort | uniq -c | sort -rn | head -20
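The flow-statistics pipeline above can also be done in Python when the output needs further processing. The sketch assumes one JSON object per line with the same `flow.source.pod_name`, `flow.destination.pod_name`, and `flow.verdict` fields the jq filter relies on; the sample lines are made up, and `count_flows` is a hypothetical helper.

```python
import json
from collections import Counter


def count_flows(lines):
    """Count (source pod, destination pod, verdict) triples, most frequent first."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        flow = json.loads(line).get("flow", {})
        key = (
            flow.get("source", {}).get("pod_name", "-"),
            flow.get("destination", {}).get("pod_name", "-"),
            flow.get("verdict", "-"),
        )
        counts[key] += 1
    return counts.most_common()


sample_lines = [
    '{"flow": {"verdict": "FORWARDED", "source": {"pod_name": "frontend-abc"}, "destination": {"pod_name": "api-server-def"}}}',
    '{"flow": {"verdict": "FORWARDED", "source": {"pod_name": "frontend-abc"}, "destination": {"pod_name": "api-server-def"}}}',
    '{"flow": {"verdict": "DROPPED", "source": {"pod_name": "batch-xyz"}, "destination": {"pod_name": "api-server-def"}}}',
]
top = count_flows(sample_lines)
for (src, dst, verdict), n in top:
    print(n, src, dst, verdict)
```

In practice the lines would come from the JSON output of `hubble observe`, read from a file or from standard input.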

Cluster Mesh

Cilium Cluster Mesh connects up to 255 clusters with a shared service discovery and network policy model. Each cluster runs a clustermesh-apiserver that exposes its Cilium KVStore over TLS. Remote clusters are registered as Kubernetes Secrets:

# Enable cluster mesh on cluster-1
cilium clustermesh enable --context cluster-1

# Connect cluster-2 to cluster-1
cilium clustermesh connect \
  --context cluster-1 \
  --destination-context cluster-2

# Global service — load balance across clusters
kubectl annotate service my-svc \
  service.cilium.io/global=true \
  service.cilium.io/shared=true

Flannel

Flannel (CoreOS → flannel-io) is the simplest CNI plugin: a single binary that assigns a subnet to each node and wraps packets in a configurable backend. It has NO network policy support — Calico is often chained with Flannel (Canal) for policy enforcement.

Backend Types

Backend | Mechanism | Overhead | Notes
vxlan (default) | VXLAN over UDP 8472 | ~50 bytes | Works in most cloud environments; DirectRouting bypasses VXLAN within same subnet
host-gw | Static kernel routes (no encapsulation) | ~0 | Requires all nodes in same L2 domain; no cloud NAT support
wireguard | WireGuard UDP 51820 | ~60 bytes | Encrypted; kernel 5.6+ or wireguard-go
udp (deprecated) | Userspace TUN + UDP | High | Legacy; extremely slow; only use for debugging
ipip | IP-in-IP | ~20 bytes | Some cloud providers block protocol 4
alloc | IPAM only (no dataplane) | N/A | When another component handles the dataplane

Flannel Configuration

// /etc/kube-flannel/net-conf.json (ConfigMap kube-flannel-cfg)
{
  "Network": "10.244.0.0/16",
  "Backend": {
    "Type": "vxlan",
    "VNI": 1,
    "Port": 8472,
    "DirectRouting": true    // use host-gw within same subnet, VXLAN across
  }
}
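The "Network" value in this config bounds how many nodes Flannel can serve, since each node leases one subnet from it. The sketch below assumes Flannel's default per-node subnet length of /24 (the SubnetLen option, which the config above does not set); `node_subnets` is a hypothetical helper.

```python
import ipaddress


def node_subnets(network, subnet_len=24):
    """List the per-node subnets that can be leased from the cluster network."""
    net = ipaddress.ip_network(network)
    return list(net.subnets(new_prefix=subnet_len))


subnets = node_subnets("10.244.0.0/16")
print(len(subnets), subnets[0], subnets[-1])
```

A /16 network split into /24 leases supports at most 256 nodes with roughly 254 pod addresses each, so larger clusters need a wider Network or a different SubnetLen.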

No Network Policy

Flannel does not implement Kubernetes NetworkPolicy. Use Canal (Flannel + Calico policy engine) for NetworkPolicy support with Flannel-style routing, or migrate to Calico/Cilium entirely.

AWS VPC CNI

The AWS VPC CNI (amazon-vpc-cni-k8s) places pod IPs directly in the VPC subnet — pods get real VPC IP addresses, not overlay addresses. This enables native VPC routing, security groups per pod, and direct connectivity to AWS services without NAT.

ENI Secondary IP Model

[Diagram: ENI secondary IP model. An EC2 node attaches several ENIs (m5.xlarge: 4 ENIs × 15 IPv4 addresses each). Each ENI's primary IP (10.0.1.5 on the primary ENI, 10.0.1.20 on Secondary ENI-1) belongs to the node; each secondary IP is handed to a pod (10.0.1.6 → pod-a, 10.0.1.7 → pod-b, 10.0.1.8 → pod-c, 10.0.1.21 → pod-d, 10.0.1.22 → pod-e), up to 14 secondary IPs per ENI. In prefix delegation mode (Secondary ENI-2), each slot holds a /28 prefix such as 10.0.1.32/28, giving 16 IPs per slot instead of 1, so roughly 16× more pod IPs per ENI.]

Max Pods Calculation

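# Formula: (ENIs × (IPs_per_ENI - 1)) + 2
# -1 because each ENI's primary IP is not given to pods
# +2 for the host-network pods (aws-node, kube-proxy), which need no pod IP

# m5.xlarge: 4 ENIs × 15 IPs/ENI
# Max pods = (4 × (15-1)) + 2 = 58

# With prefix delegation (/28): (4 × (15-1) × 16) + 2 = 898 addresses
# (max pods is normally capped well below this, e.g. 110 on smaller instances)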

# View instance type limits
aws ec2 describe-instance-types \
  --instance-types m5.xlarge \
  --query 'InstanceTypes[].NetworkInfo.[MaximumNetworkInterfaces,Ipv4AddressesPerInterface]'

# Check current pod count vs limit on node
kubectl describe node ip-10-0-1-5.ec2.internal | grep "Allocated resources" -A 5
kubectl get node ip-10-0-1-5.ec2.internal -o jsonpath='{.status.allocatable.pods}'
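The max-pods formula is easy to wrap in a helper so the two numbers returned by `describe-instance-types` can be plugged in directly. `max_pods` is a hypothetical function; the m5.large figures used below (3 ENIs, 10 IPv4 addresses per ENI) are my assumption and should be confirmed with the command above.

```python
PREFIX_SIZE = 16  # addresses in one delegated /28 prefix


def max_pods(enis, ips_per_eni, prefix_delegation=False):
    """Apply (ENIs x (IPs_per_ENI - 1)) + 2, optionally with /28 prefix delegation."""
    slots = enis * (ips_per_eni - 1)  # each ENI's primary IP is not given to pods
    if prefix_delegation:
        slots *= PREFIX_SIZE          # each slot holds a /28 instead of a single IP
    return slots + 2                  # host-network pods need no pod IP


print(max_pods(3, 10))                          # m5.large, secondary IP mode
print(max_pods(3, 10, prefix_delegation=True))  # m5.large, prefix delegation
```

The prefix-delegation figure is an address ceiling, not a scheduling recommendation; CPU, memory, and the kubelet's configured pod limit usually bind first.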

AWS VPC CNI Configuration

# Enable prefix delegation (dramatically increases pod density)
kubectl set env daemonset aws-node -n kube-system \
  ENABLE_PREFIX_DELEGATION=true \
  WARM_PREFIX_TARGET=1

# Security Groups for Pods
kubectl set env daemonset aws-node -n kube-system \
  ENABLE_POD_ENI=true

# Pod-level security group (per-pod ENI trunking)
# Uses ENI trunking — requires Nitro-based instances
# SecurityGroupPolicy CRD
cat <

IP Exhaustion

AWS VPC CNI consumes real VPC subnet IPs. A /24 subnet (251 usable IPs, since AWS reserves 5 addresses per subnet) shared across 10 nodes that each pre-allocate up to 15 IPs per ENI can be exhausted quickly. Plan separate subnets for worker nodes and pods, and use prefix delegation or larger private CIDRs for pod addressing. Consider ENABLE_SUBNET_DISCOVERY=true for multi-subnet IPAM.

Azure CNI

Azure CNI has two modes: flat/overlay-free (pods get VNET IPs directly) and overlay (pods get private IPs from an overlay space, nodes NAT to VNET). The traditional flat mode has the same IP exhaustion concern as AWS VPC CNI.

Mode | Pod IPs | Pros | Cons
Azure CNI (flat) | Real VNET IPs from subnet | Direct VNET routing; no NAT; Azure NSG enforcement | Requires large subnets; 250 pods = 250 IPs consumed
Azure CNI Overlay | Private overlay CIDRs (e.g. 10.244.0.0/16) | No subnet IP exhaustion; scale to 50,000 pods | NAT to VNET; latency overhead; less direct NSG
kubenet | Private CIDRs + host NAT | Simple; no VNET IP consumption | No network policy; UDR required for cross-node; deprecated in AKS

# AKS with Azure CNI Overlay (<rg> and <cluster> are placeholders)
az aks create \
  --resource-group <rg> \
  --name <cluster> \
  --network-plugin azure \
  --network-plugin-mode overlay \
  --pod-cidr 192.168.0.0/16 \
  --service-cidr 10.96.0.0/12 \
  --dns-service-ip 10.96.0.10

# AKS with Azure CNI + Cilium dataplane
az aks create \
  --resource-group <rg> \
  --name <cluster> \
  --network-plugin azure \
  --network-dataplane cilium \
  --network-policy cilium

Antrea

Antrea (VMware) uses Open vSwitch (OVS) as its dataplane. It supports Geneve encapsulation, eBPF (experimental), and integrates with NSX-T for enterprise policy management. Traceflow is Antrea's killer feature: trace a packet through the entire network pipeline.

Architecture

antrea-agent (DaemonSet)

Runs OVS daemon + agent per node. Manages OVS flows, IPAM, NetworkPolicy enforcement. Communicates with antrea-controller via gRPC over the Antrea network.

antrea-controller (Deployment)

Computes and distributes NetworkPolicy rules. Watches Kubernetes API server; distributes computed policies to agents. Single instance with leader election.

Traceflow

# Inject a synthetic probe packet and trace its path
apiVersion: crd.antrea.io/v1alpha1   # ops.antrea.io/v1alpha1 on pre-1.0 Antrea releases
kind: Traceflow
metadata:
  name: frontend-to-api
spec:
  source:
    namespace: production
    pod: frontend-abc-123
  destination:
    namespace: production
    pod: api-server-def-456
    port: 8080
  packet:
    ipHeader:
      protocol: 6    # TCP
    transportHeader:
      tcp:
        srcPort: 12345
        dstPort: 8080
        flags: 2     # SYN
# kubectl get traceflow frontend-to-api -o yaml
# Results show each hop: ingress/egress NetworkPolicy matches, OVS flow actions

CNI Selection Guide

Requirement | Recommendation | Why
Max performance, eBPF, no kube-proxy | Cilium | eBPF native; socket-level LB; O(1) service lookup; Hubble observability
NetworkPolicy + BGP on-prem | Calico | Native BGP; Felix policy engine; WireGuard encryption; battle-tested at scale
AWS EKS native VPC routing | AWS VPC CNI | ENI-native IPs; SG per pod; no overlay needed in AWS VPC
AKS with Azure VNET integration | Azure CNI | Native VNET routing; NSG per pod; AKS managed
Simple cluster, no policy needed | Flannel | Simple, low overhead, easy to debug; pair with Calico policy (Canal) if needed
VMware / NSX-T integration | Antrea | OVS dataplane; NSX-T integration; Traceflow; enterprise support
Multi-network (multiple interfaces) | Multus | Meta-CNI; chains multiple CNIs per pod; SR-IOV for NFV/telco
Very large clusters (>5k nodes) | Cilium | eBPF maps O(1) scaling; Typha-like fan-out; Cluster Mesh for multi-cluster
GKE Autopilot | GKE Dataplane V2 (Cilium) | Managed; deeply integrated; no user choice needed

Feature Matrix

FeatureCalicoCiliumFlannelAWS VPC CNIAntrea
NetworkPolicy (K8s)✗*
Extended NetworkPolicy (L7)Limited✓ (HTTP/DNS/Kafka)Limited
eBPF dataplane✓ opt-in✓ defaultExperimental
kube-proxy replacement✓ eBPF mode✓ default
EncryptionWireGuard/IPSecWireGuard/IPSecWireGuardTLS app layerIPSec
ObservabilityMetricsHubble (flows)MinimalVPC Flow LogsTraceflow
BGP support✓ native✓ BGP Control Plane
IPv6 / dual-stackLimited
Windows nodes
Multi-clusterFederationCluster MeshMulti-cluster GW

* AWS VPC CNI has no native policy; use Calico for NetworkPolicy on EKS.

CNI Debugging

crictl — CRI-Level CNI Inspection

# List pod sandboxes and their network namespace status
crictl pods

# Inspect a specific pod sandbox
SANDBOX_ID=$(crictl pods --name nginx-abc --quiet)
crictl inspectp $SANDBOX_ID | jq '.status.network'

# Check CNI logs (containerd)
journalctl -u containerd --since "10 minutes ago" | grep -i cni

# Enable CNI debug logging
export CNI_DEBUG=1   # set in containerd config or systemd environment

Manual CNI Plugin Invocation

# Create a test network namespace
ip netns add test-ns

# Manually invoke CNI ADD (for debugging)
cat <<EOF | CNI_COMMAND=ADD CNI_CONTAINERID=test123 \
  CNI_NETNS=/var/run/netns/test-ns \
  CNI_IFNAME=eth0 \
  CNI_PATH=/opt/cni/bin \
  /opt/cni/bin/bridge
{
  "cniVersion": "1.0.0",
  "name": "test",
  "type": "bridge",
  "bridge": "cni-test0",
  "ipam": {
    "type": "host-local",
    "subnet": "172.19.0.0/24",
    "gateway": "172.19.0.1"
  }
}
EOF

# Verify result
ip netns exec test-ns ip addr show eth0
ip netns exec test-ns ip route

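# Clean up: DEL must receive the same config as ADD, otherwise host-local never releases the IP
cat <<EOF | CNI_COMMAND=DEL CNI_CONTAINERID=test123 \
  CNI_NETNS=/var/run/netns/test-ns \
  CNI_IFNAME=eth0 \
  CNI_PATH=/opt/cni/bin \
  /opt/cni/bin/bridge
{
  "cniVersion": "1.0.0", "name": "test", "type": "bridge", "bridge": "cni-test0",
  "ipam": { "type": "host-local", "subnet": "172.19.0.0/24", "gateway": "172.19.0.1" }
}
EOF
ip netns del test-ns
ip link del cni-test0   # DEL does not remove the bridge itself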

Calico-Specific Debugging

# Check Felix status
kubectl exec -n kube-system -it $(kubectl get pod -n kube-system -l k8s-app=calico-node -o jsonpath='{.items[0].metadata.name}') -c calico-node -- calico-node -felix-live
kubectl exec -n kube-system calico-node-xyz -c calico-node -- calico-node -bird-live

# calicoctl commands (install separately)
calicoctl node status        # BGP peer (BIRD) status; run on the node itself
calicoctl get ippool -o wide
calicoctl get ipamblock      # per-node blocks
calicoctl ipam show --show-blocks
calicoctl ipam check         # detect IPAM inconsistencies

# Felix logs (verbose)
kubectl logs -n kube-system calico-node-xyz -c calico-node --since=5m | grep -E "ERROR|WARN|policy"

# Datastore connectivity
DATASTORE_TYPE=kubernetes KUBECONFIG=/etc/cni/net.d/calico-kubeconfig calicoctl get nodes

Cilium-Specific Debugging

# Cilium status
cilium status --verbose

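# Endpoint (pod) status: agent CLI, run inside the agent pod
# (e.g. kubectl exec -n kube-system cilium-xyz -- ...; recent releases name the in-pod binary cilium-dbg)
cilium endpoint list
cilium endpoint get $(cilium endpoint list | grep 10.244.1.15 | awk '{print $1}')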

# Policy verdict for a specific connection
cilium policy trace --src-k8s-pod default/frontend-abc --dst-k8s-pod default/api-server-def --dport 8080 --protocol tcp

# BPF map inspection
cilium bpf lb list           # service load balancer map
cilium bpf ct list global    # conntrack entries
cilium bpf tunnel list       # VXLAN tunnel endpoints
cilium bpf nat list          # NAT map

# Connectivity test
cilium connectivity test --test pod-to-pod,pod-to-service

# Hubble (if enabled)
cilium hubble port-forward &
hubble observe --namespace default --verdict DROPPED --last 50

Troubleshooting Runbooks

Runbook 1: Pod stuck in ContainerCreating — CNI failure

# 1. Get pod events
kubectl describe pod <name> -n <ns>
# Look for: "Failed to create pod sandbox" or "network plugin is not ready"

# 2. Check kubelet logs on the node
journalctl -u kubelet --since "5 minutes ago" | grep -E "cni|network|sandbox"

# 3. Verify CNI binary exists and is executable
ls -la /opt/cni/bin/
ls -la /etc/cni/net.d/

# 4. Check containerd CNI config
cat /etc/containerd/config.toml | grep -A5 cni
systemctl status containerd

# 5. If Calico: check calico-node pod on that node
NODE=$(kubectl get pod <name> -n <ns> -o jsonpath='{.spec.nodeName}')
kubectl get pod -n kube-system -l k8s-app=calico-node --field-selector spec.nodeName=$NODE
kubectl logs -n kube-system <calico-node-pod> -c calico-node --tail=50

# 6. If Cilium: check cilium agent on that node
kubectl exec -n kube-system cilium-xyz -- cilium status

Runbook 2: IP address exhaustion (IPAM out of IPs)

# Calico: check IPAM utilization
calicoctl ipam show --show-blocks
calicoctl ipam check
# Look for "blocks with no matching node" — orphaned blocks

# Release leaked IPs
calicoctl ipam release --ip=192.168.5.20   # release specific IP

# Calico: use larger blocks (disruptive)
# blockSize cannot be changed on an existing IPPool. Create a new pool with the desired
# blockSize (e.g. /24 = 256 IPs per block), disable the old pool, then recreate the pods

# AWS VPC CNI: check ENI limits
kubectl describe node <node> | grep "vpc.amazonaws.com"
kubectl get node <node> -o jsonpath='{.metadata.annotations}' | jq

# Enable prefix delegation on AWS
kubectl set env ds aws-node -n kube-system ENABLE_PREFIX_DELEGATION=true

# Cilium: check pool utilization
kubectl get ciliumnodes.cilium.io -o json | jq '.items[] | {name:.metadata.name, used:.status.ipam.used|length, available:.status.ipam.available|length}'

Runbook 3: Cross-node pod connectivity failure

# 1. Confirm pod IPs and node assignment
kubectl get pods -o wide -n default

# 2. Test from netshoot debug pod on each node
kubectl debug node/worker-1 -it --image=nicolaka/netshoot -- bash
# Inside: ping <pod-ip-on-worker-2>
# Inside: traceroute <pod-ip-on-worker-2>

# 3. Check VXLAN tunnel (Flannel/Calico VXLAN mode)
ip link show flannel.1    # or vxlan.calico
bridge fdb show dev flannel.1 | grep <destination-node-mac>
ip route | grep <pod-cidr-of-remote-node>

# 4. Check BGP routes (Calico BGP mode)
calicoctl node status
# Should show "Established" for all peer connections
ip route | grep blackhole   # each blackhole route marks an IPAM block owned by this node

# 5. Check Cilium tunnel endpoints
cilium bpf tunnel list
# Verify destination node IP is present

# 6. Check MTU mismatch (common overlay issue)
# Payload = pod MTU - 28 (20-byte IP header + 8-byte ICMP header); 1422 assumes a 1450-byte VXLAN pod MTU
kubectl exec -it netshoot -- ping -M do -s 1422 <remote-pod-ip>
# If this fails but small pings work → MTU problem
# Calico: set MTU in CNI config / Felix MTUIfacePattern
# Flannel: set backend.MTU in net-conf.json

Runbook 4: CNI plugin upgrade procedure

# Calico upgrade (Operator-managed): apply the new operator manifest; the operator then rolls calico-node
kubectl apply --server-side --force-conflicts -f https://raw.githubusercontent.com/projectcalico/calico/v3.27.0/manifests/tigera-operator.yaml
# Watch rollout (operator installs run in the calico-system namespace)
kubectl rollout status ds/calico-node -n calico-system
kubectl get tigerastatus

# Cilium upgrade (Helm)
helm upgrade cilium cilium/cilium \
  --namespace kube-system \
  --version 1.15.0 \
  --reuse-values \
  --set upgradeCompatibility=1.14
# Monitor
cilium status --wait

# Critical: never skip minor versions (upgrade one minor release at a time, e.g. 1.14 → 1.15)
# Check upgrade notes: https://docs.cilium.io/en/stable/operations/upgrade/

Key CNI Metrics

Metric | Source | Alert Threshold
felix_ipset_errors_total | Calico Felix | >0 sustained
felix_route_table_list_seconds | Calico Felix | p99 > 1s
ipam_ips_in_use / ipam_ips_total | Calico | ratio > 0.85 (85% exhaustion)
cilium_endpoint_state{state="not-ready"} | Cilium | >0
cilium_drop_count_total | Cilium | spike > baseline × 3
cilium_bpf_map_ops_total{outcome="fail"} | Cilium | >0
awscni_assigned_ip_addresses / awscni_total_ip_addresses | AWS VPC CNI | ratio > 0.90
network_plugin_operations_latency_microseconds | kubelet | p99 > 5s

Production Best Practices

IPAM Planning

  • Size pod CIDR for 2× peak capacity
  • For Calico: /26 blocks (default) → 64 IPs per block (a node can claim more than one block); use /24 for dense nodes
  • Monitor IPAM utilization at 70% → start planning expansion
  • Never overlap pod CIDR with service CIDR or node CIDR
  • Reserve separate subnets in cloud VPCs for nodes

MTU Configuration

  • Overlay (VXLAN): set pod MTU = NIC MTU − 50 (typically 1450)
  • IPIP: pod MTU = NIC MTU − 20 (typically 1480)
  • No overlay: pod MTU = NIC MTU (1500 or 9000 with jumbo frames)
  • Test with: ping -M do -s 1472 (1472 + 28 IP/ICMP = 1500)
  • Jumbo frames: set NIC MTU 9000 + Calico mtu: 8950
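The MTU rules in this list reduce to two subtractions, sketched below. The overhead values are the per-packet figures quoted earlier on this page for each encapsulation; `pod_mtu` and `ping_payload` are hypothetical helper names.

```python
# Per-packet encapsulation overhead in bytes, as quoted in the routing mode tables.
ENCAP_OVERHEAD = {"none": 0, "ipip": 20, "vxlan": 50, "wireguard": 60}

ICMP_AND_IP_HEADERS = 28  # 20-byte IPv4 header + 8-byte ICMP header


def pod_mtu(nic_mtu, encap):
    """MTU to configure inside pods for a given NIC MTU and encapsulation."""
    return nic_mtu - ENCAP_OVERHEAD[encap]


def ping_payload(mtu):
    """Largest payload for `ping -M do -s <size>` that fits the MTU without fragmenting."""
    return mtu - ICMP_AND_IP_HEADERS


for encap in ENCAP_OVERHEAD:
    mtu = pod_mtu(1500, encap)
    print(encap, mtu, ping_payload(mtu))
```

Testing from inside a pod should use the payload for the pod MTU, not the NIC MTU, so that the probe exercises the encapsulated path.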

Upgrade Safety

  • Test CNI upgrades in staging first
  • Use PodDisruptionBudgets on critical workloads
  • Calico: rolling upgrade via DaemonSet maxUnavailable=1
  • Cilium: use --set upgradeCompatibility flag
  • Never upgrade CNI and Kubernetes simultaneously

Security

  • Enable NetworkPolicy deny-by-default in all namespaces
  • Use Calico GlobalNetworkPolicy for cluster-wide baseline rules
  • Enable WireGuard encryption for inter-node pod traffic
  • Restrict CNI DaemonSet to only needed host mounts
  • Enable Hubble/Traceflow for anomaly detection

Observability

  • Scrape Felix / cilium-agent metrics in Prometheus
  • Alert on IPAM exhaustion at 80% threshold
  • Use Hubble UI to visualize network flows in staging
  • Enable Cilium drop metrics to catch NetworkPolicy rejections
  • Monitor network_plugin_operations_latency_microseconds

Performance

  • Prefer eBPF (Cilium or Calico eBPF) for >1k services or >500 nodes
  • Enable Typha for Calico at >100 nodes
  • Use BGP underlay when possible (zero encapsulation overhead)
  • Jumbo frames (9000 MTU) for storage-intensive pods
  • Pin NUMA-sensitive workloads; check CNI NUMA awareness