CNI Plugins — Deep Dive

The Container Network Interface (CNI) is the pluggable layer that gives every Kubernetes pod a routable IP address. This page covers the CNI specification, chained plugin architecture, and the internals of Calico, Cilium, Flannel, AWS VPC CNI, Azure CNI, and Antrea, including eBPF dataplanes, BGP configuration, Hubble observability, ENI management, and production CNI selection criteria.

CNI Specification

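CNI is a CNCF specification (not Kubernetes-specific) that defines how container runtimes invoke network plugins. Spec version history: 0.1.0 → 0.2.0 → 0.3.0/0.3.1 → 0.4.0 → 1.0.0 (2021) → 1.1.0 (2024). Which spec versions a cluster accepts depends on the container runtime (containerd, CRI-O) and its bundled libcni rather than on Kubernetes itself, because the kubelet has not invoked CNI plugins directly since dockershim was removed in Kubernetes 1.24.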

Operations: ADD / DEL / CHECK / GC / VERSION

Operation	Trigger	Plugin Must Do	Return
`ADD`	Container created; pod scheduled to node	Create veth, assign IP (IPAM), configure routes, set up iptables	CNI Result (interfaces[], ips[], routes[], dns{})
`DEL`	Container removed; pod deleted	Remove veth, release IP back to IPAM, tear down routes	Success (errors tolerated if container already gone)
`CHECK`	Runtime health verification	Verify container network matches last ADD result	Success or error with reason
`GC`	Spec 1.1+; periodic cleanup by the runtime	Remove orphaned network state (attachments not listed in cni.dev/valid-attachments)	Success
`VERSION`	Plugin discovery	Report supported CNI spec versions	{"cniVersion":"1.0.0","supportedVersions":[...]}

Environment Variables

The container runtime passes configuration via environment variables before exec-ing the CNI binary:

CNI_COMMAND=ADD
CNI_CONTAINERID=abc123def456...   # container ID (unique per sandbox)
CNI_NETNS=/var/run/netns/abc123   # path to network namespace
CNI_IFNAME=eth0                   # interface name inside container
CNI_ARGS=K8S_POD_NAMESPACE=default;K8S_POD_NAME=nginx-abc;K8S_POD_INFRA_CONTAINER_ID=abc123
CNI_PATH=/opt/cni/bin             # colon-separated dirs to search for plugins

stdin

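Network config JSON is passed on stdin, not as CLI args. The plugin reads the config from stdin and, on success, writes the CNI result to stdout and exits 0. On failure it exits non-zero and writes a JSON error object (code, msg, details) to stdout; stderr is reserved for unstructured log output.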
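
The plugin side of this contract can be sketched in a few lines. This is a minimal illustration, not a working plugin: `handle` and `SUPPORTED_VERSIONS` are hypothetical names, no interfaces are created, and the error codes (4 for a bad environment, 6 for undecodable input) follow the well-known codes in the CNI spec.

```python
import json

SUPPORTED_VERSIONS = ["0.3.1", "0.4.0", "1.0.0"]  # assumption for this sketch


def handle(env, stdin_text):
    """Return (exit_code, stdout_text) for a single CNI invocation."""
    command = env.get("CNI_COMMAND", "")
    if command == "VERSION":
        return 0, json.dumps({"cniVersion": "1.0.0",
                              "supportedVersions": SUPPORTED_VERSIONS})
    if command not in ("ADD", "DEL", "CHECK"):
        # Structured errors go to stdout as JSON, with a non-zero exit code.
        return 1, json.dumps({"cniVersion": "1.0.0", "code": 4,
                              "msg": f"unsupported CNI_COMMAND {command!r}"})
    try:
        conf = json.loads(stdin_text)
    except json.JSONDecodeError as exc:
        return 1, json.dumps({"cniVersion": "1.0.0", "code": 6,
                              "msg": "failed to decode network config",
                              "details": str(exc)})
    if command == "ADD":
        # A real plugin would create the veth pair and call IPAM here.
        result = {"cniVersion": conf.get("cniVersion", "1.0.0"),
                  "interfaces": [{"name": env["CNI_IFNAME"],
                                  "sandbox": env["CNI_NETNS"]}],
                  "ips": [], "routes": [], "dns": {}}
        return 0, json.dumps(result)
    return 0, ""  # DEL and CHECK: success is exit code 0 with no output
```

A real binary would call `handle(os.environ, sys.stdin.read())`, write the returned text to stdout, and exit with the returned code.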

Full ADD Request / Response

// stdin → plugin (network config)
{
  "cniVersion": "1.0.0",
  "name": "k8s-pod-network",
  "type": "calico",
  "ipam": {
    "type": "calico-ipam"
  },
  "kubernetes": {
    "k8s_api_root": "https://10.96.0.1:443",
    "kubeconfig": "/etc/cni/net.d/calico-kubeconfig"
  },
  "policy": {
    "type": "k8s"
  },
  "nodename": "worker-1",
  "datastore_type": "kubernetes",
  "log_level": "info",
  "prevResult": null   // null for first ADD; populated for chained plugins
}

// stdout ← plugin (CNI result v1.0.0)
{
  "cniVersion": "1.0.0",
  "interfaces": [
    {
      "name": "veth8a3f12b",
      "mac": "aa:bb:cc:dd:ee:ff",
      "sandbox": ""           // host side: no sandbox path
    },
    {
      "name": "eth0",
      "mac": "aa:bb:cc:dd:ee:00",
      "sandbox": "/var/run/netns/abc123"   // container side
    }
  ],
  "ips": [
    {
      "interface": 1,          // index into interfaces[]
      "address": "10.244.1.15/24",
      "gateway": "10.244.1.1"
    }
  ],
  "routes": [
    { "dst": "0.0.0.0/0", "gw": "10.244.1.1" }
  ],
  "dns": {
    "nameservers": ["10.96.0.10"],
    "domain": "cluster.local",
    "search": ["default.svc.cluster.local","svc.cluster.local","cluster.local"]
  }
}
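
The `interface` field in each `ips[]` entry is an index into `interfaces[]`, which is easy to misread. The sketch below resolves that index and splits the address with Python's `ipaddress` module; `summarize_result` is a hypothetical helper, and the input must be plain JSON (the `//` comments in the listing above are for illustration only).

```python
import ipaddress
import json


def summarize_result(result_json):
    """Resolve ips[].interface indexes into interface names."""
    result = json.loads(result_json)
    interfaces = result.get("interfaces", [])
    summary = []
    for ip in result.get("ips", []):
        # "interface" is optional; when present it indexes interfaces[].
        iface = interfaces[ip["interface"]] if "interface" in ip else {}
        addr = ipaddress.ip_interface(ip["address"])
        summary.append({
            "ifname": iface.get("name"),
            # A non-empty sandbox path means the interface is inside the container.
            "in_container": bool(iface.get("sandbox")),
            "ip": str(addr.ip),
            "prefixlen": addr.network.prefixlen,
            "gateway": ip.get("gateway"),
        })
    return summary
```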

Chained Plugins — conflist

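A .conflist file defines an ordered list of plugins. On ADD the runtime calls them in order, and each plugin receives the previous plugin's output as prevResult; on DEL it calls them in reverse order. This is how a main plugin is combined with meta plugins for bandwidth shaping, port mapping, and firewall rules (IPAM is not a chain entry; the main plugin delegates to it):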

{
  "cniVersion": "1.0.0",
  "name": "k8s-pod-network",
  "plugins": [
    {
      "type": "calico",           // main plugin: sets up veth + routes
      "ipam": { "type": "calico-ipam" },
      "kubernetes": { "kubeconfig": "/etc/cni/net.d/calico-kubeconfig" }
    },
    {
      "type": "bandwidth",        // meta plugin: shapes traffic
      "ingressRate": 104857600,   // bits per second (100 × 2^20, about 105 Mbps)
      "ingressBurst": 209715200,
      "egressRate": 104857600,
      "egressBurst": 209715200
    },
    {
      "type": "portmap",          // meta plugin: iptables DNAT for hostPort
      "capabilities": { "portMappings": true },
      "snat": true
    },
    {
      "type": "firewall"          // meta plugin: iptables allow/deny rules
    }
  ]
}
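
How a runtime walks such a chain can be sketched as follows. `run_chain` and the `invoke` callback are hypothetical names: `invoke(conf, command)` stands in for executing one plugin binary and returning its parsed result. The sketch assumes the runtime cached the final ADD result and hands it back to every plugin on DEL.

```python
def run_chain(conflist, command, invoke, cached_result=None):
    """Call each plugin in a conflist; returns the final result for ADD."""
    plugins = conflist["plugins"]
    if command == "DEL":
        # Teardown runs in the reverse of ADD order.
        plugins = list(reversed(plugins))
    prev = cached_result
    for plugin in plugins:
        # The runtime injects the list-level name and cniVersion into each entry.
        conf = dict(plugin, cniVersion=conflist["cniVersion"], name=conflist["name"])
        if prev is not None:
            conf["prevResult"] = prev
        out = invoke(conf, command)
        if command == "ADD":
            prev = out  # the next plugin sees this plugin's output
    return prev
```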

Plugin search order

The runtime uses the first .conf or .conflist file found in /etc/cni/net.d/ sorted lexicographically. Files beginning with 10- load before 99-. Stale configs left by old CNI installations can cause the wrong plugin to load.
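
The selection rule is simple enough to reproduce when checking which file a node will actually load. A minimal sketch, assuming plain lexicographic ordering over file names; `pick_cni_config` is a hypothetical helper and the accepted extensions are an assumption based on common runtime behavior.

```python
CONFIG_SUFFIXES = (".conf", ".conflist", ".json")  # assumption for this sketch


def pick_cni_config(filenames):
    """Return the config file a runtime would load, or None if there is none."""
    candidates = sorted(name for name in filenames if name.endswith(CONFIG_SUFFIXES))
    return candidates[0] if candidates else None
```

For example, a leftover `10-flannel.conflist` loses to `10-calico.conflist` only because "c" sorts before "f"; a stale `05-` file from any old installation would win over both.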

CNI Invocation Flow
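
Seen from the runtime's side, the flow described in the preceding sections (environment variables in, config on stdin, result on stdout) can be sketched as below. The function names are hypothetical, only the first `CNI_PATH` directory is searched, and real runtimes do this through libcni rather than hand-rolled code.

```python
import json
import subprocess


def build_invocation(command, container_id, netns, ifname, conf,
                     cni_path="/opt/cni/bin"):
    """Return (binary, env, stdin_bytes) for one plugin call."""
    env = {
        "CNI_COMMAND": command,
        "CNI_CONTAINERID": container_id,
        "CNI_NETNS": netns,
        "CNI_IFNAME": ifname,
        "CNI_PATH": cni_path,
    }
    # The binary is named after the config's "type" field.
    binary = f"{cni_path.split(':')[0]}/{conf['type']}"
    return binary, env, json.dumps(conf).encode()


def invoke(binary, env, stdin_bytes):
    """Execute the plugin; returns (exit_code, stdout_text)."""
    proc = subprocess.run([binary], input=stdin_bytes, env=env,
                          capture_output=True, check=False)
    return proc.returncode, proc.stdout.decode()
```

`invoke` needs root and a real plugin binary, so on a workstation only `build_invocation` is useful, for example to print the exact environment a plugin would receive.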

CNI Reference Plugins

The containernetworking/plugins repo maintains reference plugins that are bundled with most distributions:

Main Plugins

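bridge: Linux bridge with a veth pair per container
host-device: moves an existing host device (for example an SR-IOV VF) into the container
ipvlan: ipvlan L2/L3
macvlan: macvlan
ptp: veth pair (no bridge)
vlan: VLAN sub-interface of a host interface
dummy: dummy interface (the separate loopback plugin configures lo)

IPAM Plugins

host-local: file-based, per-node ranges
dhcp: leases from an external DHCP server via a helper daemon
static: hardcoded IPs
whereabouts: cluster-wide IPAM (a separate project, not part of containernetworking/plugins)

Meta Plugins

portmap: hostPort DNAT
bandwidth: tc tbf shaping
firewall: iptables rules
sbr: source-based routing
tuning: sysctl / hwaddr
vrf: VRF routing table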

Calico

Calico (Project Calico / Tigera) is one of the most widely deployed CNIs in production. It supports three dataplanes: standard Linux (iptables), eBPF, and Windows HNS. Network policy is a first-class citizen implemented via Felix and iptables/eBPF rules, not through a separate plugin.

Architecture Components

Felix (DaemonSet)

The policy enforcement agent on every node. Felix programs iptables/eBPF rules, manages routing (kernel FIB or BGP), writes interface routes, enforces NetworkPolicy. Talks to the datastore (etcd or Kubernetes CRD API).

Pod: calico-node in kube-system

BIRD (BGP Daemon)

Runs inside the calico-node pod. Advertises pod CIDRs via BGP to upstream routers and to peer nodes. In full-mesh mode each node peers with every other node, which takes n(n−1)/2 sessions and grows as O(n²); in Route Reflector mode designated RR nodes relay routes.

Port 179 TCP (BGP)
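
The difference between the two topologies is easiest to see as arithmetic. A small sketch, assuming every non-reflector node peers with every route reflector and the reflectors form a full mesh among themselves; the function names are illustrative only.

```python
def full_mesh_sessions(nodes):
    """BGP sessions when every node peers with every other node."""
    return nodes * (nodes - 1) // 2


def route_reflector_sessions(nodes, reflectors):
    """BGP sessions when clients peer only with the route reflectors."""
    clients = nodes - reflectors
    # Each client peers with each reflector, plus a mesh between reflectors.
    return clients * reflectors + reflectors * (reflectors - 1) // 2
```

With 100 nodes, a full mesh needs 4,950 sessions, while three route reflectors need 294.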

confd (template engine)

Watches the datastore for BGP configuration changes and renders BIRD config files dynamically. Eliminates the need to restart BIRD on config changes — confd triggers a graceful reload.

Typha (optional)

Fan-out cache between the datastore and Felix daemons. In large clusters (>100 nodes), Typha absorbs the watch fan-out: 1,000 Felix instances watch Typha instead of the API server directly. Reduces API server load dramatically.

Deploy 1 Typha replica per 100-200 nodes

calico-kube-controllers

Deployment (not DaemonSet). Reconciles Kubernetes objects (Namespaces, Pods, NetworkPolicies, ServiceAccounts) into Calico CRDs. Also runs node cleanup when nodes are deleted.

calico-apiserver (optional)

Aggregated API extension exposing Calico CRDs through the Kubernetes API server. Required for Tigera Enterprise features and for applying Calico resources with kubectl via API aggregation.

Routing Modes

Mode	Encapsulation	Requires	Overhead	Use When
BGP (native)	None	Routable pod CIDRs; BGP-capable fabric	~0	On-prem with BGP ToR switches; maximum performance
VXLAN	VXLAN (UDP 4789)	UDP connectivity between nodes	~50 bytes/pkt	Cloud VPCs that block BGP; most common in EKS/AKS self-managed
IP-in-IP (IPIP)	IP-in-IP (proto 4)	IP connectivity between nodes	~20 bytes/pkt	Legacy; lighter than VXLAN but less universally supported
WireGuard	WireGuard (UDP 51820)	WireGuard kernel module	~60 bytes/pkt	Encrypted pod-to-pod traffic without service mesh
CrossSubnet	None within a subnet, VXLAN/IPIP across subnets	ipipMode or vxlanMode set to CrossSubnet; direct reachability between nodes in the same subnet	0 within a subnet, 20 to 50 bytes/pkt across	AWS multi-AZ or hybrid setups
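
The overhead column translates directly into the MTU that pods must use. A minimal sketch using the approximate per-packet figures from the table; the dictionary and function names are illustrative, and the 28-byte figure is the IPv4 header (20) plus the ICMP header (8) used when probing with ping.

```python
# Approximate encapsulation overhead in bytes, taken from the table above.
OVERHEAD_BYTES = {"none": 0, "ipip": 20, "vxlan": 50, "wireguard": 60}


def pod_mtu(nic_mtu, encapsulation):
    """Largest pod MTU that still fits in one underlay packet."""
    return nic_mtu - OVERHEAD_BYTES[encapsulation]


def max_ping_payload(mtu):
    """Payload size for `ping -M do -s <size>` that exactly fills the MTU."""
    return mtu - 28  # 20-byte IPv4 header + 8-byte ICMP header
```

For a 1500-byte NIC this gives 1450 for VXLAN and 1480 for IPIP, and a 1450-byte pod MTU is probed with a 1422-byte ping payload.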

Key Calico CRDs

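# IPPool: defines the allocatable pod CIDR range
# (projectcalico.org/v3 is the supported API, applied with calicoctl or via calico-apiserver;
# crd.projectcalico.org/v1 is the internal storage format and should not be edited directly)
apiVersion: projectcalico.org/v3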
kind: IPPool
metadata:
  name: default-ipv4-ippool
spec:
  cidr: 192.168.0.0/16
  ipipMode: Never           # Never | Always | CrossSubnet
  vxlanMode: Always         # Never | Always | CrossSubnet
  natOutgoing: true         # SNAT pods accessing non-cluster IPs
  nodeSelector: all()       # which nodes can use this pool
  blockSize: 26             # /26 = 64 IPs per node block (default /26 for IPv4)
  disabled: false

---
# IPAMBlock — auto-created per node; not edited manually
# kubectl get ipamblocks -o yaml
# Each block is a /26 subnet allocated from an IPPool

---
# BGPConfiguration — global BGP settings
apiVersion: projectcalico.org/v3
kind: BGPConfiguration
metadata:
  name: default
spec:
  logSeverityScreen: Info
  nodeToNodeMeshEnabled: true    # full-mesh BGP (disable for RR topology)
  asNumber: 64512                # AS number for all nodes
  serviceClusterIPs:
  - cidr: 10.96.0.0/12           # advertise service CIDRs to BGP peers
  serviceExternalIPs:
  - cidr: 203.0.113.0/24

---
# BGPPeer — configure external BGP peer (Route Reflector or ToR)
apiVersion: projectcalico.org/v3
kind: BGPPeer
metadata:
  name: tor-switch-1
spec:
  peerIP: 192.168.1.1
  asNumber: 65000
  nodeSelector: rack == "rack-1"
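
The relationship between an IPPool's `cidr` and its `blockSize` determines how many IPAM blocks exist and how many addresses each one holds. A quick way to check a planned pool, using only the standard library (`pool_capacity` is a hypothetical helper):

```python
import ipaddress


def pool_capacity(cidr, block_size):
    """Return (number_of_blocks, ips_per_block) for an IPPool."""
    pool = ipaddress.ip_network(cidr)
    if block_size < pool.prefixlen:
        raise ValueError("blockSize must not be larger than the pool itself")
    blocks = 2 ** (block_size - pool.prefixlen)
    ips_per_block = 2 ** (pool.max_prefixlen - block_size)
    return blocks, ips_per_block
```

The default pool above (192.168.0.0/16 with /26 blocks) yields 1,024 blocks of 64 addresses each.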

Calico eBPF Dataplane

Enable with calicoNetwork.linuxDataplane: BPF (Operator) or FELIX_BPFENABLED=true (manifest installs). Requirements: Linux 5.3+ (5.8+ recommended) and kube-proxy disabled, for example by creating the cluster with kubeadm init --skip-phases=addon/kube-proxy.

eBPF advantages vs iptables

O(1) service lookup via BPF maps vs O(n) iptables chains
Direct server return (DSR): response packets bypass kube-proxy SNAT
Preserved source IP for NodePort (no SNAT)
Lower latency at scale (10k+ services)
TC hook (traffic control) replaces netfilter in the fast path

eBPF limitations

Requires kube-proxy to be disabled (full replacement)
Kernel < 5.3 not supported
No support for some exotic iptables rules
More complex troubleshooting (bpftool required)

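# Collect a node diagnostics bundle, then list Calico's BPF maps and programs
calicoctl node diags
bpftool map list | grep -i cali
bpftool prog show | grep -i cali

# Inspect eBPF conntrack and NAT tables from inside the calico-node pod
# (replace calico-node-xyz; the namespace is calico-system on Operator installs)
kubectl exec -n kube-system calico-node-xyz -c calico-node -- calico-node -bpf conntrack dump
kubectl exec -n kube-system calico-node-xyz -c calico-node -- calico-node -bpf nat dump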

Cilium

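Cilium is an eBPF-native CNI that implements the network dataplane with eBPF programs loaded into the Linux kernel and can also replace kube-proxy entirely. The minimum kernel depends on the Cilium release (4.9.17 for older releases, newer for current ones; 5.10+ recommended), so check the system requirements for your version. Hubble provides deep observability.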

Architecture

cilium-agent (DaemonSet)

Core daemon on every node. Loads eBPF programs, manages BPF maps, handles IPAM (CiliumNode CRD), enforces network policies. Communicates with the Kubernetes API server and (optionally) Cilium KVStore (etcd).

cilium-operator (Deployment)

Cluster-scoped operations: IPAM allocation for large clusters, CiliumNetworkPolicy sync, and garbage collection of stale CiliumEndpoints left behind by terminated pods.

Hubble (observability)

Ring buffer-based eBPF observability layer. hubble-relay aggregates per-node Hubble servers into a cluster-wide gRPC API. hubble-ui provides a real-time network flow visualization graph. Overhead is low when no observer is connected.

cilium-envoy (optional)

Embedded Envoy proxy for L7 policy enforcement (HTTP, gRPC, Kafka). Runs as a sub-process of cilium-agent or, in newer releases, as a separate cilium-envoy DaemonSet. L7 policies use CiliumNetworkPolicy spec.ingress[].toPorts[].rules with HTTP match expressions.

eBPF Dataplane Internals

Cilium loads eBPF programs at multiple kernel hooks:

Hook Point	Direction	Purpose
`tc ingress` on veth host side	Pod → host	Enforce egress NetworkPolicy from pod's perspective; SNAT for masquerade
`tc egress` on veth host side	Host → pod	Enforce ingress NetworkPolicy from pod's perspective; DNAT for service
XDP or `tc ingress` on physical NIC	External → node	NodePort load balancing; early drop for DDoS; DSR redirect
`cgroup/connect4` (cgroup sock_addr)	Socket level	Transparent socket-level load balancing: the service IP is rewritten to a backend at connect() time, so no per-packet NAT is needed
`kprobe` (optional)	Kernel events	Process-level visibility for Hubble

IPAM Modes

Mode	Config	Where IPs Come From	Use Case
cluster-pool (default)	`ipam: cluster-pool`	Operator assigns per-node PodCIDR from `clusterPoolIPv4PodCIDR`	Generic on-prem / self-managed
kubernetes	`ipam: kubernetes`	Uses `node.spec.podCIDR` set by kube-controller-manager	kubeadm clusters
aws-eni	`ipam: eni`	Cilium operator attaches ENIs and assigns secondary IPs via EC2 API	EKS with native VPC routing
azure	`ipam: azure`	Cilium operator assigns IPs from Azure VNET subnet	AKS with native Azure networking
crd	`ipam: crd`	CiliumNode CRD `spec.ipam.pools`	Multi-homing, custom IPAM
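
In cluster-pool mode the operator carves one PodCIDR per node out of the cluster-wide range. The idea can be sketched with the standard `ipaddress` module; `allocate_pod_cidrs` is a hypothetical helper, the sequential assignment order is an assumption, and the pool and mask values in the usage note are example inputs rather than required settings.

```python
import ipaddress


def allocate_pod_cidrs(cluster_cidr, node_mask, nodes):
    """Assign one /node_mask PodCIDR per node from the cluster pool."""
    subnets = ipaddress.ip_network(cluster_cidr).subnets(new_prefix=node_mask)
    allocation = {}
    for node in nodes:
        try:
            allocation[node] = str(next(subnets))
        except StopIteration:
            raise ValueError("cluster pool exhausted") from None
    return allocation
```

For instance, `allocate_pod_cidrs("10.0.0.0/8", 24, ["worker-1", "worker-2"])` hands out 10.0.0.0/24 and 10.0.1.0/24, and a /22 pool with /24 node masks runs out after four nodes.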

CiliumNetworkPolicy

apiVersion: "cilium.io/v2"
kind: CiliumNetworkPolicy
metadata:
  name: allow-frontend-to-api
  namespace: production
spec:
  endpointSelector:
    matchLabels:
      app: api-server        # applies to pods with this label
  ingress:
  - fromEndpoints:
    - matchLabels:
        app: frontend        # allow from frontend pods
    toPorts:
    - ports:
      - port: "8080"
        protocol: TCP
      rules:
        http:                # L7 HTTP rules
        - method: "GET"
          path: "/api/v1/.*"
        - method: "POST"
          path: "/api/v1/items"
  - fromEntities:
    - cluster               # also allows all intra-cluster traffic at L3/L4; drop this rule for least privilege
  egress:
  - toEndpoints:
    - matchLabels:
        k8s:io.kubernetes.pod.namespace: kube-system
        k8s:k8s-app: kube-dns    # allow DNS (CoreDNS/kube-dns pods carry the k8s-app=kube-dns label)
    toPorts:
    - ports:
      - port: "53"
        protocol: UDP
      rules:
        dns:
        - matchPattern: "*"   # L7 DNS visibility; must cover every name used in toFQDNs
  - toFQDNs:                               # FQDN-based policy
    - matchName: "api.stripe.com"
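
To reason about which requests the HTTP rules in this policy admit, it helps to model them as method and path regular expressions. The sketch below is a simplified model, not Cilium's implementation: it assumes both fields are regexes that must match the whole value, and `http_allowed` is a hypothetical helper.

```python
import re

# The two HTTP rules from the policy above.
HTTP_RULES = [
    {"method": "GET", "path": "/api/v1/.*"},
    {"method": "POST", "path": "/api/v1/items"},
]


def http_allowed(method, path, rules=HTTP_RULES):
    """True if any rule matches both the method and the full path."""
    for rule in rules:
        # A missing field matches anything (assumption for this sketch).
        if (re.fullmatch(rule.get("method", ".*"), method)
                and re.fullmatch(rule.get("path", ".*"), path)):
            return True
    return False
```

Under this model `GET /api/v1/users` and `POST /api/v1/items` pass, while `POST /api/v1/users` and `DELETE /api/v1/items` are rejected.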

Hubble Observability

# Install Hubble CLI
export HUBBLE_VERSION=$(curl -s https://raw.githubusercontent.com/cilium/hubble/master/stable.txt)
curl -L --remote-name-all https://github.com/cilium/hubble/releases/download/$HUBBLE_VERSION/hubble-linux-amd64.tar.gz
sudo tar xzvfC hubble-linux-amd64.tar.gz /usr/local/bin   # unpack the hubble binary onto PATH

# Enable Hubble in Cilium
helm upgrade cilium cilium/cilium \
  --namespace kube-system \
  --set hubble.enabled=true \
  --set hubble.relay.enabled=true \
  --set hubble.ui.enabled=true

# Port-forward Hubble relay
cilium hubble port-forward &

# Real-time flow inspection
hubble observe --namespace production --follow
hubble observe --pod frontend/frontend-abc --verdict DROPPED
hubble observe --protocol TCP --port 8080 --output json

# Policy troubleshooting (jq needs JSON output)
hubble observe --namespace production --verdict DROPPED --last 100 --output json | \
  jq 'select(.flow.l4.TCP.destination_port == 5432)'

# Network flow statistics
hubble observe --namespace production --type trace --last 1000 --output json | \
  jq -r '.flow | [.source.pod_name, .destination.pod_name, .verdict] | @tsv' | \
  sort | uniq -c | sort -rn | head -20
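
The same flow statistics can be computed in Python when the output needs further processing. This sketch assumes one JSON object per line, shaped like the fields the jq expression above reads (`flow.source.pod_name`, `flow.destination.pod_name`, `flow.verdict`); `top_flows` is a hypothetical helper.

```python
import json
from collections import Counter


def top_flows(json_lines, limit=20):
    """Count (source pod, destination pod, verdict) triples, most common first."""
    counts = Counter()
    for line in json_lines:
        line = line.strip()
        if not line:
            continue
        flow = json.loads(line).get("flow", {})
        key = (flow.get("source", {}).get("pod_name", ""),
               flow.get("destination", {}).get("pod_name", ""),
               flow.get("verdict", ""))
        counts[key] += 1
    return counts.most_common(limit)
```

It can be fed from a saved capture, for example the lines of a file written by `hubble observe ... --output json > flows.json`.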

Cluster Mesh

Cilium Cluster Mesh connects up to 255 clusters with a shared service discovery and network policy model. Each cluster runs a clustermesh-apiserver that exposes its Cilium KVStore over TLS. Remote clusters are registered as Kubernetes Secrets:

# Enable cluster mesh on cluster-1
cilium clustermesh enable --context cluster-1

# Connect cluster-2 to cluster-1
cilium clustermesh connect \
  --context cluster-1 \
  --destination-context cluster-2

# Global service — load balance across clusters
kubectl annotate service my-svc \
  service.cilium.io/global=true \
  service.cilium.io/shared=true

Flannel

Flannel (CoreOS → flannel-io) is the simplest CNI plugin: a single binary that assigns a subnet to each node and wraps packets in a configurable backend. It has NO network policy support — Calico is often chained with Flannel (Canal) for policy enforcement.

Backend Types

Backend	Mechanism	Overhead	Notes
vxlan (default)	VXLAN over UDP 8472	~50 bytes	Works in most cloud environments; DirectRouting bypasses VXLAN within same subnet
host-gw	Static kernel routes (no encapsulation)	~0	Requires all nodes in same L2 domain; no cloud NAT support
wireguard	WireGuard UDP 51820	~60 bytes	Encrypted; kernel 5.6+ or wireguard-go
udp (deprecated)	Userspace TUN + UDP	High	Legacy; extremely slow; only use for debugging
ipip	IP-in-IP	~20 bytes	Some cloud providers block protocol 4
alloc	IPAM only (no dataplane)	N/A	When another component handles the dataplane

Flannel Configuration

// /etc/kube-flannel/net-conf.json (ConfigMap kube-flannel-cfg)
{
  "Network": "10.244.0.0/16",
  "Backend": {
    "Type": "vxlan",
    "VNI": 1,
    "Port": 8472,
    "DirectRouting": true    // use host-gw within same subnet, VXLAN across
  }
}
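
With `DirectRouting` enabled, the choice between a plain route and VXLAN comes down to whether the remote node sits in the local node's subnet. A simplified model of that decision (`flannel_path` is a hypothetical helper, and the real implementation works from routing information rather than a single interface address):

```python
import ipaddress


def flannel_path(local_node, remote_node_ip):
    """Pick the forwarding path for traffic to a remote node.

    local_node is the local address with its prefix, e.g. "10.0.1.5/24".
    """
    local = ipaddress.ip_interface(local_node)
    if ipaddress.ip_address(remote_node_ip) in local.network:
        return "host-gw"  # same subnet: direct kernel route, no encapsulation
    return "vxlan"        # different subnet: encapsulate over UDP 8472
```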

No Network Policy

Flannel does not implement Kubernetes NetworkPolicy. Use Canal (Flannel + Calico policy engine) for NetworkPolicy support with Flannel-style routing, or migrate to Calico/Cilium entirely.

AWS VPC CNI

The AWS VPC CNI (amazon-vpc-cni-k8s) places pod IPs directly in the VPC subnet — pods get real VPC IP addresses, not overlay addresses. This enables native VPC routing, security groups per pod, and direct connectivity to AWS services without NAT.

ENI Secondary IP Model

Max Pods Calculation

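# Formula: (ENIs × (IPs_per_ENI - 1)) + 2
# -1 because each ENI's primary IP is not handed to pods
# +2 for the host-network pods (aws-node and kube-proxy) that need no VPC IP

# m5.xlarge: 4 ENIs × 15 IPs/ENI
# Max pods = (4 × (15-1)) + 2 = 58

# With prefix delegation (/28 = 16 IPs per slot): (4 × (15-1) × 16) + 2 = 898
# In practice EKS recommends capping max pods at 110 for instances of this size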

# View instance type limits
aws ec2 describe-instance-types \
  --instance-types m5.xlarge \
  --query 'InstanceTypes[].NetworkInfo.[MaximumNetworkInterfaces,Ipv4AddressesPerInterface]'

# Check current pod count vs limit on node
kubectl describe node ip-10-0-1-5.ec2.internal | grep "Allocated resources" -A 5
kubectl get node ip-10-0-1-5.ec2.internal -o jsonpath='{.status.allocatable.pods}'
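
The formula above is easy to wrap in a helper for capacity planning. A minimal sketch: `max_pods` is a hypothetical function, the ENI and per-ENI address counts must come from the `describe-instance-types` output for the instance type in question, and the result is the theoretical ceiling before any recommended cap is applied.

```python
def max_pods(enis, ips_per_eni, prefix_delegation=False):
    """Theoretical pod limit for a node under the AWS VPC CNI formula."""
    slots = enis * (ips_per_eni - 1)  # the primary IP of each ENI is excluded
    if prefix_delegation:
        slots *= 16                   # each slot holds a /28 prefix (16 IPs)
    return slots + 2                  # plus two host-network pods
```

For example, 4 ENIs with 15 addresses each give 58 pods, or 898 with prefix delegation.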

AWS VPC CNI Configuration

# Enable prefix delegation (dramatically increases pod density)
kubectl set env daemonset aws-node -n kube-system \
  ENABLE_PREFIX_DELEGATION=true \
  WARM_PREFIX_TARGET=1

# Security Groups for Pods
kubectl set env daemonset aws-node -n kube-system \
  ENABLE_POD_ENI=true

# Pod-level security group (per-pod ENI trunking)
# Uses ENI trunking — requires Nitro-based instances
# SecurityGroupPolicy CRD
cat <



IP Exhaustion

AWS VPC CNI consumes real VPC subnet IPs. A /24 subnet has 251 usable addresses (AWS reserves 5 per subnet), so sharing it across 10 nodes with 15 IPs/ENI can exhaust it quickly. Plan separate subnets for worker nodes and use prefix delegation or large RFC 1918 private CIDRs. Consider ENABLE_SUBNET_DISCOVERY=true for multi-subnet IPAM.


Azure CNI
Azure CNI has two modes: flat/overlay-free (pods get VNET IPs directly) and overlay (pods get private IPs from an overlay space, nodes NAT to VNET). The traditional flat mode has the same IP exhaustion concern as AWS VPC CNI.


Mode	Pod IPs	Pros	Cons
Azure CNI (flat)	Real VNET IPs from subnet	Direct VNET routing; no NAT; Azure NSG enforcement	Requires large subnets; 250 pods = 250 IPs consumed
Azure CNI Overlay	Private overlay CIDRs (e.g. 10.244.0.0/16)	No subnet IP exhaustion; scale to 50,000 pods	NAT to VNET; latency overhead; less direct NSG
kubenet	Private CIDRs + host NAT	Simple; no VNET IP consumption	No network policy; UDR required for cross-node; deprecated in AKS
  


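# AKS with Azure CNI Overlay (replace the resource group and cluster name placeholders)
az aks create \
  --resource-group <resource-group> \
  --name <cluster-name> \
  --network-plugin azure \
  --network-plugin-mode overlay \
  --pod-cidr 192.168.0.0/16 \
  --service-cidr 10.96.0.0/12 \
  --dns-service-ip 10.96.0.10

# AKS with Azure CNI + Cilium dataplane
az aks create \
  --resource-group <resource-group> \
  --name <cluster-name> \
  --network-plugin azure \
  --network-dataplane cilium \
  --network-policy cilium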


Antrea
Antrea (started at VMware, now a CNCF project) uses Open vSwitch (OVS) as its dataplane. It supports Geneve encapsulation, eBPF (experimental), and integrates with NSX-T for enterprise policy management. Traceflow is Antrea's standout feature: it traces a packet through the entire network pipeline.

Architecture

  
antrea-agent (DaemonSet)

Runs OVS daemon + agent per node. Manages OVS flows, IPAM, NetworkPolicy enforcement. Communicates with antrea-controller via gRPC over the Antrea network.

antrea-controller (Deployment)

Computes and distributes NetworkPolicy rules. Watches Kubernetes API server; distributes computed policies to agents. Single instance with leader election.
  


Traceflow
# Inject a synthetic probe packet and trace its path
apiVersion: ops.antrea.io/v1alpha1
kind: Traceflow
metadata:
  name: frontend-to-api
spec:
  source:
    namespace: production
    pod: frontend-abc-123
  destination:
    namespace: production
    pod: api-server-def-456
    port: 8080
  packet:
    ipHeader:
      protocol: 6    # TCP
    transportHeader:
      tcp:
        srcPort: 12345
        dstPort: 8080
        flags: 2     # SYN
# kubectl get traceflow frontend-to-api -o yaml
# Results show each hop: ingress/egress NetworkPolicy matches, OVS flow actions


CNI Selection Guide

Requirement	Recommendation	Why
Max performance, eBPF, no kube-proxy	Cilium	eBPF native; socket-level LB; O(1) service lookup; Hubble observability
NetworkPolicy + BGP on-prem	Calico	Native BGP; Felix policy engine; WireGuard encryption; battle-tested at scale
AWS EKS native VPC routing	AWS VPC CNI	ENI-native IPs; SG per pod; no overlay needed in AWS VPC
AKS with Azure VNET integration	Azure CNI	Native VNET routing; NSG per pod; AKS managed
Simple cluster, no policy needed	Flannel	Simple, low overhead, easy to debug; pair with Calico policy (Canal) if needed
VMware / NSX-T integration	Antrea	OVS dataplane; NSX-T integration; Traceflow; enterprise support
Multi-network (multiple interfaces)	Multus	Meta-CNI; chains multiple CNIs per pod; SR-IOV for NFV/telco
Very large clusters (>5k nodes)	Cilium	eBPF maps O(1) scaling; Typha-like fan-out; Cluster Mesh for multi-cluster
GKE Autopilot	GKE Dataplane V2 (Cilium)	Managed; deeply integrated; no user choice needed
  


Feature Matrix

Feature	Calico	Cilium	Flannel	AWS VPC CNI	Antrea
NetworkPolicy (K8s)	✓	✓	✗	✗*	✓
Extended NetworkPolicy (L7)	Limited	✓ (HTTP/DNS/Kafka)	✗	✗	Limited
eBPF dataplane	✓ opt-in	✓ default	✗	✗	Experimental
kube-proxy replacement	✓ eBPF mode	✓ default	✗	✗	✗
Encryption	WireGuard/IPSec	WireGuard/IPSec	WireGuard	TLS app layer	IPSec
Observability	Metrics	Hubble (flows)	Minimal	VPC Flow Logs	Traceflow
BGP support	✓ native	✓ BGP Control Plane	✗	✗	✗
IPv6 / dual-stack	✓	✓	Limited	✓	✓
Windows nodes	✓	✗	✓	✓	✓
Multi-cluster	Federation	Cluster Mesh	✗	✗	Multi-cluster GW
  

* Off by default. AWS VPC CNI v1.14+ can enforce Kubernetes NetworkPolicy natively when enabled; on older versions, use Calico for NetworkPolicy on EKS.


CNI Debugging
crictl — CRI-Level CNI Inspection
# List pod sandboxes and their network namespace status
crictl pods

# Inspect a specific pod sandbox
SANDBOX_ID=$(crictl pods --name nginx-abc --quiet)
crictl inspectp $SANDBOX_ID | jq '.status.network'

# Check CNI logs (containerd)
journalctl -u containerd --since "10 minutes ago" | grep -i cni

# Enable CNI debug logging
export CNI_DEBUG=1   # set in containerd config or systemd environment

Manual CNI Plugin Invocation
# Create a test network namespace
ip netns add test-ns

# Manually invoke CNI ADD (for debugging)
cat <<EOF | CNI_COMMAND=ADD CNI_CONTAINERID=test123 \
  CNI_NETNS=/var/run/netns/test-ns \
  CNI_IFNAME=eth0 \
  CNI_PATH=/opt/cni/bin \
  /opt/cni/bin/bridge
{
  "cniVersion": "1.0.0",
  "name": "test",
  "type": "bridge",
  "bridge": "cni-test0",
  "ipam": {
    "type": "host-local",
    "subnet": "172.19.0.0/24",
    "gateway": "172.19.0.1"
  }
}
EOF

# Verify result
ip netns exec test-ns ip addr show eth0
ip netns exec test-ns ip route

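# Clean up: DEL must receive the same config as ADD so host-local releases the IP
cat <<EOF | CNI_COMMAND=DEL CNI_CONTAINERID=test123 \
  CNI_NETNS=/var/run/netns/test-ns \
  CNI_IFNAME=eth0 \
  CNI_PATH=/opt/cni/bin \
  /opt/cni/bin/bridge
{
  "cniVersion": "1.0.0", "name": "test", "type": "bridge", "bridge": "cni-test0",
  "ipam": { "type": "host-local", "subnet": "172.19.0.0/24", "gateway": "172.19.0.1" }
}
EOF
ip netns del test-ns
ip link del cni-test0   # the bridge itself is not removed by DEL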

Calico-Specific Debugging
# Check Felix status
kubectl exec -n kube-system -it $(kubectl get pod -n kube-system -l k8s-app=calico-node -o jsonpath='{.items[0].metadata.name}') -c calico-node -- calico-node -felix-live
kubectl exec -n kube-system calico-node-xyz -c calico-node -- calico-node -bird-live

# calicoctl commands (install separately)
calicoctl node status        # Felix + BIRD status
calicoctl get ippool -o wide
calicoctl get ipamblock      # per-node blocks
calicoctl ipam show --show-blocks
calicoctl ipam check         # detect IPAM inconsistencies

# Felix logs (verbose)
kubectl logs -n kube-system calico-node-xyz -c calico-node --since=5m | grep -E "ERROR|WARN|policy"

# Datastore connectivity
DATASTORE_TYPE=kubernetes KUBECONFIG=/etc/cni/net.d/calico-kubeconfig calicoctl get nodes

Cilium-Specific Debugging
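# Cilium status (run agent commands inside the agent pod, for example
# kubectl -n kube-system exec ds/cilium -- cilium status --verbose;
# the in-pod binary is named cilium-dbg since Cilium 1.15)
cilium status --verbose

# Endpoint (pod) status
cilium endpoint list
cilium endpoint get $(cilium endpoint list | grep -F 10.244.1.15 | awk '{print $1}')

# Policy verdict for a specific connection (pods are given as namespace:name)
cilium policy trace --src-k8s-pod default:frontend-abc --dst-k8s-pod default:api-server-def --dport 8080/tcp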

# BPF map inspection
cilium bpf lb list           # service load balancer map
cilium bpf ct list global    # conntrack entries
cilium bpf tunnel list       # VXLAN tunnel endpoints
cilium bpf nat list          # NAT map

# Connectivity test
cilium connectivity test --test pod-to-pod,pod-to-service

# Hubble (if enabled)
cilium hubble port-forward &
hubble observe --namespace default --verdict DROPPED --last 50


Troubleshooting Runbooks


Runbook 1: Pod stuck in ContainerCreating — CNI failure
# 1. Get pod events
kubectl describe pod <name> -n <ns>
# Look for: "Failed to create pod sandbox" or "network plugin is not ready"

# 2. Check kubelet logs on the node
journalctl -u kubelet --since "5 minutes ago" | grep -E "cni|network|sandbox"

# 3. Verify CNI binary exists and is executable
ls -la /opt/cni/bin/
ls -la /etc/cni/net.d/

# 4. Check containerd CNI config
grep -A5 cni /etc/containerd/config.toml
systemctl status containerd

# 5. If Calico: check calico-node pod on that node
NODE=$(kubectl get pod <name> -n <ns> -o jsonpath='{.spec.nodeName}')
kubectl get pod -n kube-system -l k8s-app=calico-node --field-selector spec.nodeName=$NODE
kubectl logs -n kube-system <calico-node-pod> -c calico-node --tail=50

# 6. If Cilium: check cilium agent on that node
kubectl exec -n kube-system cilium-xyz -- cilium status



Runbook 2: IP address exhaustion (IPAM out of IPs)
# Calico: check IPAM utilization
calicoctl ipam show --show-blocks
calicoctl ipam check
# Look for "blocks with no matching node" — orphaned blocks

# Release leaked IPs
calicoctl ipam release --ip=192.168.5.20   # release specific IP

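# Calico: changing block size
# blockSize is immutable on an existing IPPool. To change it, create a new
# IPPool with the desired blockSize, disable the old pool, recreate the pods
# so they draw IPs from the new pool, then delete the old pool.
# Larger blocks do not add addresses; exhaustion is fixed by adding or enlarging pools.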

# AWS VPC CNI: check ENI limits
kubectl describe node <node> | grep "vpc.amazonaws.com"
kubectl get node <node> -o jsonpath='{.metadata.annotations}' | jq

# Enable prefix delegation on AWS
kubectl set env ds aws-node -n kube-system ENABLE_PREFIX_DELEGATION=true

# Cilium: check pool utilization
kubectl get ciliumnodes.cilium.io -o json | jq '.items[] | {name:.metadata.name, used:.status.ipam.used|length, available:.status.ipam.available|length}'



Runbook 3: Cross-node pod connectivity failure
# 1. Confirm pod IPs and node assignment
kubectl get pods -o wide -n default

# 2. Test from netshoot debug pod on each node
kubectl debug node/worker-1 -it --image=nicolaka/netshoot -- bash
# Inside: ping <pod-ip-on-worker-2>
# Inside: traceroute <pod-ip-on-worker-2>

# 3. Check VXLAN tunnel (Flannel/Calico VXLAN mode)
ip link show flannel.1    # or vxlan.calico
bridge fdb show dev flannel.1 | grep <destination-node-mac>
ip route | grep <pod-cidr-of-remote-node>

# 4. Check BGP routes (Calico BGP mode)
calicoctl node status
# Should show "Established" for all peer connections
ip route | grep blackhole   # blackhole routes = IPAM reserved

# 5. Check Cilium tunnel endpoints
cilium bpf tunnel list
# Verify destination node IP is present

# 6. Check MTU mismatch (common overlay issue)
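kubectl exec -it netshoot -- ping -M do -s 1422 <remote-pod-ip>
# Payload = pod MTU - 28 (20-byte IPv4 header + 8-byte ICMP header); 1422 tests a 1450 MTU
# If this fails but small pings work → MTU problem
# Calico: set spec.calicoNetwork.mtu (Operator) or veth_mtu in the calico-config ConfigMap
# Flannel: set Backend.MTU in net-conf.json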



Runbook 4: CNI plugin upgrade procedure
# Calico upgrade (Operator-managed): apply the new release's operator manifest;
# the operator then rolls calico-node to the matching version
kubectl apply --server-side --force-conflicts -f https://raw.githubusercontent.com/projectcalico/calico/v3.27.0/manifests/tigera-operator.yaml
# Watch rollout (Operator installs run calico-node in calico-system)
kubectl rollout status ds/calico-node -n calico-system

# Cilium upgrade (Helm)
helm upgrade cilium cilium/cilium \
  --namespace kube-system \
  --version 1.15.0 \
  --reuse-values \
  --set upgradeCompatibility=1.14
# Monitor
cilium status --wait

# Critical: upgrade one minor version at a time (for example 1.13 → 1.14 → 1.15); never skip a minor release
# Check upgrade notes: https://docs.cilium.io/en/stable/operations/upgrade/



Key CNI Metrics

Metric	Source	Alert Threshold
`felix_ipset_errors_total`	Calico Felix	>0 sustained
`felix_route_table_list_seconds`	Calico Felix	p99 > 1s
`ipam_ips_in_use` / `ipam_ips_total`	Calico	ratio > 0.85 (85% exhaustion)
`cilium_endpoint_state{state="not-ready"}`	Cilium	>0
`cilium_drop_count_total`	Cilium	spike > baseline × 3
`cilium_bpf_map_ops_total{outcome="fail"}`	Cilium	>0
`awscni_assigned_ip_addresses` / `awscni_total_ip_addresses`	AWS VPC CNI	ratio > 0.90
`network_plugin_operations_latency_microseconds` (renamed `kubelet_network_plugin_operations_duration_seconds` in Kubernetes 1.14)	kubelet	p99 > 5s
  



Production Best Practices

  
IPAM Planning

Size pod CIDR for 2× peak capacity
For Calico: /26 blocks (default) hold 64 IPs each, and nodes claim more blocks on demand; use larger blocks such as /24 for dense nodes
Monitor IPAM utilization; at 70%, start planning expansion
Never overlap pod CIDR with service CIDR or node CIDR
Reserve separate subnets in cloud VPCs for nodes

MTU Configuration

Overlay (VXLAN): set pod MTU = NIC MTU − 50 (typically 1450)
IPIP: pod MTU = NIC MTU − 20 (typically 1480)
No overlay: pod MTU = NIC MTU (1500 or 9000 with jumbo frames)
Test with: ping -M do -s 1472 (1472 + 28 IP/ICMP = 1500)
Jumbo frames: set NIC MTU 9000 + Calico mtu: 8950

Upgrade Safety

Test CNI upgrades in staging first
Use PodDisruptionBudgets on critical workloads
Calico: rolling upgrade via DaemonSet maxUnavailable=1
Cilium: pass --set upgradeCompatibility=<previous minor version> to helm upgrade
Never upgrade CNI and Kubernetes simultaneously

Security

Enable NetworkPolicy deny-by-default in all namespaces
Use Calico GlobalNetworkPolicy for cluster-wide baseline rules
Enable WireGuard encryption for inter-node pod traffic
Restrict CNI DaemonSet to only needed host mounts
Enable Hubble/Traceflow for anomaly detection

Observability

Scrape Felix / cilium-agent metrics in Prometheus
Alert on IPAM exhaustion at 80% threshold
Use Hubble UI to visualize network flows in staging
Enable Cilium drop metrics to catch NetworkPolicy rejections
Monitor network_plugin_operations_latency_microseconds

Performance

Prefer eBPF (Cilium or Calico eBPF) for >1k services or >500 nodes
Enable Typha for Calico at >100 nodes
Use BGP underlay when possible (zero encapsulation overhead)
Jumbo frames (9000 MTU) for storage-intensive pods
Pin NUMA-sensitive workloads; check CNI NUMA awareness
    
  


