DNS & Service Discovery

CoreDNS has been the default cluster DNS server since Kubernetes 1.13, replacing kube-dns. It resolves service names, pod hostnames, and external FQDNs for pods in the cluster. This page covers CoreDNS architecture, the Corefile plugin pipeline, every DNS record type Kubernetes creates, ndots and search-domain resolution mechanics, autopath, stub zones, forward zones, negative caching, DNS-based service discovery patterns, NodeLocal DNSCache, and production tuning.

CoreDNS Architecture

CoreDNS runs as a Deployment (typically 2 replicas) in the kube-system namespace. The Service kube-dns holds the stable ClusterIP (e.g. 10.96.0.10), which kubelet writes into the /etc/resolv.conf of every pod that uses cluster DNS when the pod is created.

CoreDNS Process Model

Single binary, plugin chain. Each DNS query traverses the plugin chain in order. Plugins can handle, pass-through, or modify the query. No forking — goroutine per query.

kubernetes plugin

Watches the API server for Services, EndpointSlices, and Namespaces (plus Pods when pods verified is set) via shared informers. Answers in-cluster queries from an in-memory cache, so no DNS query hits the API server. Serves the cluster.local zone.

forward plugin

Forwards unresolved queries upstream. Default: node's /etc/resolv.conf (inherits node DNS). Can be overridden to point at specific resolvers (8.8.8.8, corporate DNS, etc.).
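
The plugin chain described above can be pictured with a toy model. This is only a sketch of the "first plugin that answers wins" idea, not CoreDNS code: the names kubernetes_plugin, forward_plugin, and serve are hypothetical, and real plugins exchange DNS messages rather than strings.

```python
from typing import Callable, Optional

# A handler takes a query name and returns an answer, or None to signal
# "not mine, pass the query to the next plugin". (Hypothetical interface.)
Handler = Callable[[str], Optional[str]]


def kubernetes_plugin(records: dict[str, str], zone: str = "cluster.local") -> Handler:
    """Answer names inside the cluster zone from an in-memory map."""
    def handle(qname: str) -> Optional[str]:
        in_zone = qname == zone or qname.endswith("." + zone)
        if not in_zone:
            return None                       # outside our zone: pass through
        return records.get(qname, "NXDOMAIN")  # authoritative for the zone
    return handle


def forward_plugin(upstream: str) -> Handler:
    """Stand-in for forwarding: only reports which upstream would be asked."""
    def handle(qname: str) -> Optional[str]:
        return f"forwarded to {upstream}"
    return handle


def serve(chain: list[Handler], qname: str) -> str:
    """Walk the chain in order; the first plugin that returns an answer wins."""
    for plugin in chain:
        answer = plugin(qname)
        if answer is not None:
            return answer
    return "SERVFAIL"
```

With a chain of [kubernetes_plugin(...), forward_plugin("8.8.8.8")], a cluster.local name is answered from the map and anything else reaches the forwarder.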

The Corefile — Plugin Pipeline

The Corefile is CoreDNS's configuration file, mounted from a ConfigMap. Each server block defines a zone and an ordered list of plugins:

# kubectl get cm -n kube-system coredns -o jsonpath='{.data.Corefile}'

.:53 {
    errors                    # log errors to stdout
    health {                  # /health endpoint (HTTP :8080)
        lameduck 5s           # keep serving 5s after unhealthy for graceful shutdown
    }
    ready                     # /ready endpoint (HTTP :8181)
    kubernetes cluster.local in-addr.arpa ip6.arpa {
        pods insecure         # create A records for pods (insecure=no verification)
        fallthrough in-addr.arpa ip6.arpa   # pass PTR queries not found to next plugin
        ttl 30                # TTL for synthesized records (default 5s)
    }
    prometheus :9153          # expose metrics at :9153/metrics
    forward . /etc/resolv.conf {   # forward all other queries to node DNS
        max_concurrent 1000
    }
    cache 30                  # cap cached TTLs at 30s (positive and negative answers)
    loop                      # detect forwarding loops
    reload                    # hot-reload Corefile without restart (checks every 30s by default)
    loadbalance               # randomize A/AAAA/MX record order (poor-man's LB)
}

CoreDNS Plugin Reference

Plugin	Purpose	Key Config Options
`kubernetes`	Serve in-cluster DNS (Services, Pods, StatefulSets)	`pods insecure\|verified\|disabled`, `ttl`, `endpoint_pod_names`, `namespaces`, `fallthrough`
`forward`	Proxy queries upstream	`. 8.8.8.8 8.8.4.4`, `max_concurrent`, `health_check`, `prefer_udp`, `tls` (DoT)
`cache`	In-memory DNS cache	`cache 30` (max TTL), `denial 9984 5` (capacity, NXDOMAIN TTL), `success 9984 30` (capacity, TTL)
`errors`	Log SERVFAIL/REFUSED to stdout	`consolidate 5m ".*"` (rate-limit identical errors)
`health`	HTTP liveness probe at `:8080/health`	`lameduck Xs` for graceful shutdown
`ready`	HTTP readiness probe at `:8181/ready`	Optional listen address; returns 200 when all plugins are ready
`prometheus`	Expose metrics at `:9153/metrics`	Port configurable
`rewrite`	Rewrite query names / types	`rewrite name suffix .legacy.svc.cluster.local .svc.cluster.local`
`hosts`	Serve records from a hosts-file	`hosts /etc/hosts { fallthrough }`
`file`	Serve a zone from a RFC 1035 zone file	`file db.example.com example.com`
`etcd`	SkyDNS-compatible etcd backend (legacy)	Mostly replaced by kubernetes plugin
`autopath`	Resolve short names server-side (collapses the client's search-path queries into 1)	`autopath @kubernetes` (needs `pods verified`)
`loop`	Detect forwarding loops and halt CoreDNS (the pod then crash-loops)	No options
`reload`	Watch Corefile for changes and reload without restart	`reload 10s`
`loadbalance`	Rotate A/AAAA/MX records for naive load balancing	`loadbalance round_robin`
`log`	Log all DNS queries (verbose; use sparingly in production)	`log . {class denial error}`

DNS Records Kubernetes Creates

Service DNS Records

Record Type	Name Pattern	Resolves To	Condition
A / AAAA	`<svc>.<ns>.svc.cluster.local`	ClusterIP (IPv4 or IPv6)	All services with ClusterIP
A / AAAA	`<svc>.<ns>.svc.cluster.local`	All ready pod IPs (multiple A records)	Headless service (`clusterIP: None`)
SRV	`_<port-name>._<proto>.<svc>.<ns>.svc.cluster.local`	Port number + target A record	Named ports only
CNAME	`<svc>.<ns>.svc.cluster.local`	External hostname	ExternalName services
PTR	`<reversed-ip>.in-addr.arpa`	Service FQDN	ClusterIP services (reverse DNS)
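
As a rough illustration of the name patterns in the table, the helpers below build the A, SRV, and PTR query names for a Service. The function names are hypothetical; cluster.local is assumed as the cluster domain, and the PTR name relies on the standard library's ipaddress module.

```python
import ipaddress


def service_fqdn(service: str, namespace: str, cluster_domain: str = "cluster.local") -> str:
    """A/AAAA name of a Service: <svc>.<ns>.svc.<cluster-domain>."""
    return f"{service}.{namespace}.svc.{cluster_domain}"


def srv_name(port_name: str, protocol: str, service: str, namespace: str,
             cluster_domain: str = "cluster.local") -> str:
    """SRV name for a named port: _<port-name>._<proto>.<service FQDN>."""
    return f"_{port_name}._{protocol.lower()}.{service_fqdn(service, namespace, cluster_domain)}"


def ptr_name(ip: str) -> str:
    """Reverse-lookup name for an IPv4 or IPv6 address."""
    return ipaddress.ip_address(ip).reverse_pointer


print(service_fqdn("nginx", "production"))
print(srv_name("cql", "TCP", "cassandra", "default"))
print(ptr_name("10.96.0.10"))
```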

Pod DNS Records

Record Type	Name Pattern	Resolves To	Notes
A	`<ip-dashed>.<ns>.pod.cluster.local`	Pod IP	IP with dots replaced by dashes: `10-244-1-5.default.pod.cluster.local`
A	`<hostname>.<subdomain>.<ns>.svc.cluster.local`	Pod IP	Only when pod has `hostname` + `subdomain` and a matching headless service

StatefulSet Pod Records

StatefulSets combine hostname (pod ordinal name) with subdomain (headless service name) to create stable per-pod DNS entries:

apiVersion: v1
kind: Service
metadata:
  name: cassandra           # headless service — must match subdomain
spec:
  clusterIP: None
  selector:
    app: cassandra
  ports:
  - name: cql               # named port, needed for the SRV record
    port: 9042
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: cassandra
spec:
  serviceName: cassandra    # sets pod subdomain
  replicas: 3
  # Pods: cassandra-0, cassandra-1, cassandra-2
  # DNS: cassandra-0.cassandra.default.svc.cluster.local → pod-0 IP
  #      cassandra-1.cassandra.default.svc.cluster.local → pod-1 IP
  #      cassandra-2.cassandra.default.svc.cluster.local → pod-2 IP
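
The pod record patterns above are mechanical enough to generate. A small sketch, assuming IPv4 pod IPs and the cluster.local domain; both function names are made up for this example.

```python
def pod_ip_record(pod_ip: str, namespace: str, cluster_domain: str = "cluster.local") -> str:
    """IP-based pod name: dots in the IPv4 address become dashes."""
    return f"{pod_ip.replace('.', '-')}.{namespace}.pod.{cluster_domain}"


def statefulset_pod_fqdns(name: str, service_name: str, namespace: str, replicas: int,
                          cluster_domain: str = "cluster.local") -> list[str]:
    """Stable per-pod names: <name>-<ordinal>.<headless-svc>.<ns>.svc.<domain>."""
    return [
        f"{name}-{ordinal}.{service_name}.{namespace}.svc.{cluster_domain}"
        for ordinal in range(replicas)
    ]


print(pod_ip_record("10.244.1.5", "default"))
for fqdn in statefulset_pod_fqdns("cassandra", "cassandra", "default", 3):
    print(fqdn)
```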

Full DNS Name Resolution Table

Query (from pod in `default` namespace)	Resolves To	Search domain hit
`nginx`	nginx ClusterIP (same namespace)	`nginx.default.svc.cluster.local`
`nginx.production`	nginx ClusterIP in production ns	`nginx.production.svc.cluster.local`
`nginx.production.svc`	nginx ClusterIP in production ns	`nginx.production.svc.cluster.local`
`nginx.production.svc.cluster.local`	nginx ClusterIP in production ns	Absolute name, but only after the search domains fail (4 dots < ndots:5); add a trailing dot to skip the search list
`cassandra-0.cassandra`	cassandra-0 pod IP	`cassandra-0.cassandra.default.svc.cluster.local`
`google.com`	Forwarded to upstream → 142.250.x.x	All search domains tried first (1 dot < ndots:5), then the absolute name
`google.com.` (trailing dot)	Forwarded immediately (FQDN)	No search domain expansion

Pod /etc/resolv.conf and ndots

# /etc/resolv.conf inside a pod in 'default' namespace
nameserver 10.96.0.10
search default.svc.cluster.local svc.cluster.local cluster.local
options ndots:5
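
The file above follows directly from the pod's namespace and the cluster DNS settings. A minimal sketch of that derivation, assuming a ClusterFirst pod and no search domains inherited from the node; render_resolv_conf is a hypothetical helper, not kubelet code.

```python
def render_resolv_conf(namespace: str, dns_ip: str = "10.96.0.10",
                       cluster_domain: str = "cluster.local", ndots: int = 5) -> str:
    """Build the resolv.conf text a ClusterFirst pod would typically see."""
    search = [
        f"{namespace}.svc.{cluster_domain}",  # same-namespace services first
        f"svc.{cluster_domain}",              # then <svc>.<ns> style names
        cluster_domain,                       # then anything under the cluster domain
    ]
    lines = [
        f"nameserver {dns_ip}",
        "search " + " ".join(search),
        f"options ndots:{ndots}",
    ]
    return "\n".join(lines) + "\n"


print(render_resolv_conf("default"), end="")
```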

ndots:5 Resolution Mechanics

The ndots:5 option means: if the query name has fewer than 5 dots, the resolver tries each search domain first and only then treats the name as absolute. With the three cluster search domains shown above, a simple external hostname like google.com costs up to 4 DNS lookups:

The ndots:5 performance tax

A pod querying google.com (1 dot) sends 4 queries before getting an answer:
1. google.com.default.svc.cluster.local → NXDOMAIN
2. google.com.svc.cluster.local → NXDOMAIN
3. google.com.cluster.local → NXDOMAIN
4. google.com → actual answer
(Search domains inherited from the node's resolv.conf add further failed lookups, and resolvers that request A and AAAA records in parallel double each step.)
Each NXDOMAIN adds latency. In high-throughput services making frequent external DNS lookups, this multiplies CoreDNS query load by roughly 4×.
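
The ordering can be reproduced with a short function. This is a sketch of glibc-style behaviour as commonly documented (search list first when the name has fewer than ndots dots, absolute name first otherwise); other resolvers such as musl differ in detail, and candidate_queries is a hypothetical name.

```python
def candidate_queries(name: str, search: list[str], ndots: int = 5) -> list[str]:
    """Return the absolute names a stub resolver would try, in order.

    The resolver stops at the first name that yields an answer.
    """
    if name.endswith("."):
        return [name]                         # already absolute: no expansion
    absolute = name + "."
    expanded = [f"{name}.{domain}." for domain in search]
    if name.count(".") >= ndots:
        return [absolute] + expanded          # enough dots: try as-is first
    return expanded + [absolute]              # too few dots: search list first


SEARCH = ["default.svc.cluster.local", "svc.cluster.local", "cluster.local"]

for query in candidate_queries("google.com", SEARCH):
    print(query)
```

A name with four dots, such as nginx.production.svc.cluster.local, is still below ndots:5, so it also goes through the search list unless it is written with a trailing dot.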

Mitigating ndots:5 Overhead

# Option 1: use trailing dot for FQDNs in application code
# app connects to "stripe.com." (trailing dot) → no search expansion → 1 query

# Option 2: lower ndots per pod via dnsConfig
apiVersion: v1
kind: Pod
spec:
  dnsConfig:
    options:
    - name: ndots
      value: "2"          # only expand if fewer than 2 dots (affects in-cluster too)
    - name: timeout
      value: "2"
    - name: attempts
      value: "3"
  containers:
  - name: app
    image: myapp

# Option 3: enable autopath plugin in CoreDNS (server-side search expansion)
# CoreDNS walks the search path itself: the client sends 1 query, CoreDNS tries the rest
# Add to the server block (not inside the kubernetes block); requires `pods verified`:
# autopath @kubernetes

# Option 4: NodeLocal DNSCache (caches NXDOMAIN responses too)
# Reduces repeated NXDOMAIN round trips to CoreDNS pods

Pod dnsPolicy

dnsPolicy	resolv.conf behavior	Use Case
`ClusterFirst` (default)	CoreDNS IP as nameserver; cluster search domains	All normal pods; in-cluster service resolution
`ClusterFirstWithHostNet`	Same as ClusterFirst but for `hostNetwork: true` pods	DaemonSet pods needing both host network + cluster DNS
`Default`	Inherits node's `/etc/resolv.conf` exactly	Pods that must use node's DNS (corporate resolver, etc.)
`None`	Completely custom; must provide `dnsConfig`	Full control; custom nameservers + search domains

CoreDNS Configuration Patterns

Stub Zones (Split-Horizon DNS)

# Route corporate.internal queries to internal DNS server
# kubectl edit cm -n kube-system coredns

apiVersion: v1
kind: ConfigMap
metadata:
  name: coredns
  namespace: kube-system
data:
  Corefile: |
    .:53 {
        errors
        health {
            lameduck 5s
        }
        ready
        kubernetes cluster.local in-addr.arpa ip6.arpa {
            pods insecure
            fallthrough in-addr.arpa ip6.arpa
            ttl 30
        }
        prometheus :9153
        forward . /etc/resolv.conf {
            max_concurrent 1000
        }
        cache 30
        loop
        reload
        loadbalance
    }

    # Stub zone: forward corporate.internal to internal resolver
    corporate.internal:53 {
        errors
        cache 30
        forward . 10.0.0.2 10.0.0.3 {
            prefer_udp
        }
    }

    # Stub zone for on-prem services
    on-prem.example.com:53 {
        errors
        cache 60
        forward . 192.168.1.53
    }
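
With several server blocks in one Corefile, a query is handled by the block whose zone is the most specific match for the name. The sketch below illustrates that longest-suffix selection; select_zone is a hypothetical helper and only models the matching rule, not CoreDNS internals.

```python
def select_zone(qname: str, zones: list[str]) -> str:
    """Pick the most specific zone (longest matching suffix) for a query name."""
    name = qname.rstrip(".").lower()
    matches = []
    for zone in zones:
        z = zone.rstrip(".").lower()          # "." becomes "" (the root zone)
        if z == "" or name == z or name.endswith("." + z):
            matches.append(z)
    if not matches:
        raise LookupError(f"no server block serves {qname!r}")
    # More labels means more specific; the root zone has zero labels.
    best = max(matches, key=lambda z: z.count(".") + 1 if z else 0)
    return best or "."


ZONES = [".", "corporate.internal", "on-prem.example.com"]

print(select_zone("git.corporate.internal", ZONES))
print(select_zone("google.com", ZONES))
```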

Custom In-Cluster Records

# Serve custom A records for hostnames not backed by Services
# Useful for external endpoints that pods reference by name

data:
  Corefile: |
    .:53 {
        # ... standard config ...
    }
    # Custom zone with static records
    custom.cluster.local:53 {
        errors
        hosts /etc/coredns/custom-hosts {
            10.0.100.50  db-primary.custom.cluster.local
            10.0.100.51  db-replica.custom.cluster.local
            fallthrough
        }
    }
  # Mount custom-hosts from a separate ConfigMap key
  custom-hosts: |
    10.0.100.50  db-primary.custom.cluster.local
    10.0.100.51  db-replica.custom.cluster.local

DNS-over-TLS Upstream

.:53 {
    forward . tls://8.8.8.8 tls://8.8.4.4 {
        tls_servername dns.google
        health_check 5s
    }
    cache 30
}

Negative Caching

# cache plugin controls positive and negative TTLs
cache {
    success 9984 30 0     # capacity=9984, TTL=30s, min-TTL=0
    denial  9984 5  0     # NXDOMAIN capacity=9984, TTL=5s
    # A short NXDOMAIN TTL (5s) lets newly created names resolve sooner,
    # at the cost of more CoreDNS load for names that stay NXDOMAIN
    prefetch 10 1m 10%    # prefetch popular records when TTL < 10% of original
}
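
To make the success/denial split concrete, here is a toy cache that caps positive and negative answers at different TTLs. It is an illustration of the idea only: ToyDnsCache is a made-up class, capacity limits and prefetch are left out, and time is passed in explicitly so the behaviour is easy to follow.

```python
class ToyDnsCache:
    """Caches answers, capping TTLs separately for success and denial."""

    def __init__(self, success_ttl: int = 30, denial_ttl: int = 5):
        self.success_ttl = success_ttl
        self.denial_ttl = denial_ttl
        self._entries = {}  # name -> (answer or None for NXDOMAIN, expires_at)

    def put(self, name, answer, record_ttl, now):
        """Store an answer; answer=None represents a cached NXDOMAIN."""
        cap = self.denial_ttl if answer is None else self.success_ttl
        self._entries[name] = (answer, now + min(record_ttl, cap))

    def get(self, name, now):
        """Return (hit, answer); expired entries count as misses."""
        entry = self._entries.get(name)
        if entry is None or now >= entry[1]:
            self._entries.pop(name, None)
            return (False, None)
        return (True, entry[0])


cache = ToyDnsCache(success_ttl=30, denial_ttl=5)
cache.put("api.example.com", "203.0.113.7", record_ttl=300, now=0)
cache.put("typo.example.com", None, record_ttl=60, now=0)
print(cache.get("api.example.com", now=10))   # still cached
print(cache.get("typo.example.com", now=10))  # denial already expired
```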

NodeLocal DNSCache

NodeLocal DNSCache (GA 1.18) runs a DNS caching agent (node-local-dns DaemonSet) on every node using a link-local IP (169.254.20.10). Pods are reconfigured to use this local cache instead of the CoreDNS ClusterIP, eliminating the UDP conntrack race condition and reducing latency for cached queries.

Why NodeLocal DNSCache?

Problem: UDP Conntrack Race (without NodeLocal)

Each node runs many pods. UDP DNS queries to CoreDNS ClusterIP go through conntrack DNAT. Under high query rates, multiple simultaneous queries from the same source port can collide in the conntrack table — resulting in 5-second DNS timeouts on some queries. This is the infamous "5s DNS timeout" bug endemic to iptables-mode Kubernetes.

Solution: NodeLocal DNSCache

node-local-dns listens on 169.254.20.10:53 (link-local, no conntrack needed — no DNAT). Pods send queries to this local address. Cache hits return immediately (sub-millisecond). Cache misses are forwarded to CoreDNS pods over TCP (not UDP), avoiding the conntrack race entirely.

Installing NodeLocal DNSCache

# Download and apply the NodeLocal DNSCache DaemonSet manifest
# Replace __PILLAR__DNS__SERVER__ with CoreDNS ClusterIP
# Replace __PILLAR__LOCAL__DNS__ with 169.254.20.10
# Replace __PILLAR__DNS__DOMAIN__ with cluster.local

COREDNS_IP=$(kubectl get svc -n kube-system kube-dns -o jsonpath='{.spec.clusterIP}')

curl -sL "https://raw.githubusercontent.com/kubernetes/kubernetes/master/cluster/addons/dns/nodelocaldns/nodelocaldns.yaml" | \
  sed "s/__PILLAR__DNS__SERVER__/${COREDNS_IP}/g" | \
  sed "s/__PILLAR__LOCAL__DNS__/169.254.20.10/g" | \
  sed "s/__PILLAR__DNS__DOMAIN__/cluster.local/g" | \
  kubectl apply -f -

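# With kube-proxy in IPVS mode, kubelet must be reconfigured to use 169.254.20.10 as clusterDNS
# (in iptables mode node-local-dns also listens on the kube-dns ClusterIP, so pods need no change)
# In KubeletConfiguration: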
# clusterDNS:
# - 169.254.20.10

# Or set at cluster creation:
# kubeadm init --config kubeadm.yaml
# KubeletConfiguration.clusterDNS: ["169.254.20.10"]

NodeLocal DNSCache Corefile

cluster.local:53 {
    errors
    cache {
        success 9984 30
        denial 9984 5
    }
    reload
    loop
    bind 169.254.20.10        # bind to link-local address
    forward . __PILLAR__DNS__SERVER__ {
        force_tcp             # forward to CoreDNS over TCP (avoids conntrack)
    }
    prometheus :9253
    health 169.254.20.10:8080
}
.:53 {
    errors
    cache 30
    reload
    loop
    bind 169.254.20.10
    forward . __PILLAR__DNS__SERVER__ {
        force_tcp
    }
    prometheus :9253
}

ExternalDNS — Automatic External DNS

ExternalDNS is a Kubernetes add-on that synchronizes Service and Ingress resources with external DNS providers (Route53, Cloud DNS, Cloudflare, etc.). It watches Services of type LoadBalancer and Ingress objects and creates/updates DNS records automatically.

# Annotate a LoadBalancer Service to create a Route53 record
apiVersion: v1
kind: Service
metadata:
  name: web
  annotations:
    external-dns.alpha.kubernetes.io/hostname: api.example.com
    external-dns.alpha.kubernetes.io/ttl: "300"
spec:
  type: LoadBalancer
  selector:
    app: web
  ports:
  - port: 443

---
# ExternalDNS Deployment (Route53 example)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: external-dns
  namespace: kube-system
spec:
  selector:
    matchLabels:
      app: external-dns
  template:
    metadata:
      labels:
        app: external-dns
    spec:
      serviceAccountName: external-dns
      containers:
      - name: external-dns
        image: registry.k8s.io/external-dns/external-dns:v0.14.2
        args:
        - --source=service
        - --source=ingress
        - --domain-filter=example.com
        - --provider=aws
        - --aws-zone-type=public
        - --registry=txt
        - --txt-owner-id=my-cluster    # prevents multiple clusters clobbering records

Service Discovery Patterns

Environment Variable Discovery (Legacy)

Before DNS was the primary mechanism, Kubernetes injected service information as environment variables into pods. This is still present but order-dependent (services must exist before pods) and generally discouraged:

# In a pod, if service 'nginx' exists in the same namespace:
env | grep NGINX
# NGINX_SERVICE_HOST=10.96.14.3
# NGINX_SERVICE_PORT=80
# NGINX_PORT=tcp://10.96.14.3:80
# NGINX_PORT_80_TCP=tcp://10.96.14.3:80
# NGINX_PORT_80_TCP_PROTO=tcp
# NGINX_PORT_80_TCP_PORT=80
# NGINX_PORT_80_TCP_ADDR=10.96.14.3

# Limitation: services created AFTER the pod don't get env vars
# Use DNS instead — DNS is dynamic and doesn't have this ordering issue
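
For completeness, this is roughly how an application would read those variables. A sketch only: service_from_env is a hypothetical helper, and it relies on the naming rule visible above (service name upper-cased, dashes turned into underscores).

```python
import os


def service_from_env(service_name: str, environ=None):
    """Return (host, port) from the injected env vars, or None if absent."""
    env = os.environ if environ is None else environ
    prefix = service_name.upper().replace("-", "_")
    host = env.get(f"{prefix}_SERVICE_HOST")
    port = env.get(f"{prefix}_SERVICE_PORT")
    if host is None or port is None:
        return None  # service did not exist when the pod started
    return host, int(port)


sample_env = {"NGINX_SERVICE_HOST": "10.96.14.3", "NGINX_SERVICE_PORT": "80"}
print(service_from_env("nginx", sample_env))
print(service_from_env("payments-api", sample_env))
```

The None case is the ordering problem described above: a Service created after the pod started never shows up in its environment.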

Headless Service Discovery

Headless services are the foundation for stateful workload discovery. DNS returns all ready pod IPs, enabling clients to implement their own load balancing or connect to specific replicas:

# From inside a pod, querying a headless service returns all pod IPs
nslookup cassandra.default.svc.cluster.local
# Server:         10.96.0.10
# Address:        10.96.0.10#53
# Name:   cassandra.default.svc.cluster.local
# Address: 10.244.3.9
# Address: 10.244.1.5
# Address: 10.244.2.7

# Round-robin across all IPs (client-side load balancing)

# Individual pod DNS (StatefulSet)
nslookup cassandra-0.cassandra.default.svc.cluster.local
# Address: 10.244.1.5   (always same pod — stable address)

# SRV record (port discovery)
nslookup -type=SRV _cql._tcp.cassandra.default.svc.cluster.local
# _cql._tcp.cassandra.default.svc.cluster.local service = 0 50 9042 cassandra-0.cassandra.default.svc.cluster.local.
# _cql._tcp.cassandra.default.svc.cluster.local service = 0 50 9042 cassandra-1.cassandra.default.svc.cluster.local.
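
A client doing its own load balancing needs the full address list rather than a single IP. The sketch below collects every address returned for a headless service name using the standard socket.getaddrinfo call; resolve_all is a hypothetical helper, and the resolver argument exists only so the logic can be exercised without a cluster.

```python
import socket


def resolve_all(host: str, port: int, resolver=socket.getaddrinfo) -> list[str]:
    """Return the distinct IPs behind a name, sorted for stable output."""
    infos = resolver(host, port, type=socket.SOCK_STREAM)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({info[4][0] for info in infos})


if __name__ == "__main__":
    # Inside a pod this would list every ready cassandra pod IP.
    try:
        print(resolve_all("cassandra.default.svc.cluster.local", 9042))
    except socket.gaierror as exc:
        print(f"lookup failed (expected outside the cluster): {exc}")
```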

Multi-Cluster DNS (Clusterset)

The Multicluster Services API (KEP-1645) defines ServiceImport and ServiceExport objects. CoreDNS with the multicluster plugin resolves cross-cluster service names:

# Multi-cluster DNS name format:
# <service>.<namespace>.svc.clusterset.local

# Export a service from cluster-1
kubectl apply -f - <<EOF
apiVersion: multicluster.x-k8s.io/v1alpha1
kind: ServiceExport
metadata:
  name: nginx
  namespace: production
EOF

# In cluster-2, the service is discoverable as:
# nginx.production.svc.clusterset.local

CoreDNS Scaling

Autoscaling CoreDNS

# Default CoreDNS deployment has 2 replicas
# For large clusters, scale CoreDNS with static replicas or an autoscaler

# Option 1: Static scale (simple)
kubectl scale deployment coredns -n kube-system --replicas=4

# Option 2: cluster-proportional-autoscaler
# Scales CoreDNS based on node count
apiVersion: apps/v1
kind: Deployment
metadata:
  name: dns-autoscaler
  namespace: kube-system
spec:
  selector:
    matchLabels:
      k8s-app: dns-autoscaler
  template:
    metadata:
      labels:
        k8s-app: dns-autoscaler
    spec:
      containers:
      - name: autoscaler
        image: registry.k8s.io/cpa/cluster-proportional-autoscaler:v1.8.8
        command:
        - /cluster-proportional-autoscaler
        - --namespace=kube-system
        - --configmap=dns-autoscaler
        - --target=Deployment/coredns
        - --default-params={"linear":{"coresPerReplica":256,"nodesPerReplica":16,"min":2,"max":20}}
        - --logtostderr=true
        - --v=2
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: dns-autoscaler
  namespace: kube-system
data:
  linear: |
    {
      "coresPerReplica": 256,
      "nodesPerReplica": 16,
      "min": 2,
      "max": 20,
      "preventSinglePointOfFailure": true
    }
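
The linear parameters translate into a replica count as follows. This sketch follows the formula given in the cluster-proportional-autoscaler documentation for linear mode (take the larger of the cores-based and nodes-based counts, then clamp); linear_replicas is a hypothetical function name, and the defaults mirror the ConfigMap above.

```python
import math


def linear_replicas(nodes: int, cores: int, cores_per_replica: int = 256,
                    nodes_per_replica: int = 16, min_replicas: int = 2,
                    max_replicas: int = 20, prevent_spof: bool = True) -> int:
    """Replica count for linear mode given cluster size."""
    replicas = max(
        math.ceil(cores / cores_per_replica),
        math.ceil(nodes / nodes_per_replica),
    )
    replicas = min(replicas, max_replicas)
    replicas = max(replicas, min_replicas)
    if prevent_spof and nodes > 1:
        replicas = max(replicas, 2)   # never a single replica on a multi-node cluster
    return replicas


for nodes, cores in [(3, 12), (100, 800), (1000, 8000)]:
    print(nodes, cores, linear_replicas(nodes, cores))
```

For 100 nodes with 800 cores the node term dominates (ceil(100/16) = 7 versus ceil(800/256) = 4), so the autoscaler would target 7 replicas.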

CoreDNS Resource Tuning

# CoreDNS pod resource requests/limits
# Defaults are too conservative for large clusters
resources:
  requests:
    cpu: 100m
    memory: 70Mi
  limits:
    cpu: 500m          # increase for high-QPS clusters
    memory: 170Mi      # increase for large service count

# CoreDNS cache tuning for large clusters
cache {
    success 9984 60    # 9984 entries, 60s TTL (longer reduces CoreDNS load)
    denial  9984 5     # keep NXDOMAIN TTL short
    prefetch 10 1m 10% # prefetch entries when 10% of TTL remains
    serve_stale 15s    # serve stale cache entries during upstream outages (up to 15s)
}

DNS Debugging

# 1. Test basic DNS resolution from a pod
kubectl run dns-test --image=nicolaka/netshoot --rm -it -- bash
  # Inside:
  nslookup kubernetes.default.svc.cluster.local    # should return ClusterIP
  nslookup kubernetes                              # should expand via search domain
  dig @10.96.0.10 kubernetes.default.svc.cluster.local   # explicit query to CoreDNS
  dig kubernetes.default.svc.cluster.local +search +ndots=5

# 2. Check CoreDNS pod logs
kubectl logs -n kube-system -l k8s-app=kube-dns --tail=50
# Enable query logging (verbose — use only temporarily):
# Add to Corefile: log .

# 3. Check CoreDNS metrics
# (the CoreDNS image ships no shell or curl, so port-forward instead of kubectl exec)
kubectl port-forward -n kube-system deploy/coredns 9153:9153 &
curl -s localhost:9153/metrics | grep -E "coredns_dns_request|coredns_cache"

# 4. CoreDNS health/ready check
kubectl port-forward -n kube-system deploy/coredns 8080:8080 8181:8181 &
curl -s localhost:8080/health
curl -s localhost:8181/ready

# 5. Test external resolution
kubectl run dns-test --image=nicolaka/netshoot --rm -it -- \
  dig google.com @10.96.0.10 +short

# 6. Check pod's resolv.conf
kubectl exec my-pod -- cat /etc/resolv.conf

# 7. Measure DNS latency
kubectl run dns-bench --image=nicolaka/netshoot --rm -it -- \
  bash -c 'for i in $(seq 50); do time nslookup nginx.default; done 2>&1 | grep real'

# 8. Check if NodeLocal DNSCache is active
kubectl get pods -n kube-system -l k8s-app=node-local-dns -o wide
kubectl exec my-pod -- cat /etc/resolv.conf
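# Shows 169.254.20.10 as nameserver only when kubelet's clusterDNS points at it (e.g. IPVS mode);
# in iptables mode pods keep the kube-dns ClusterIP, which node-local-dns intercepts on the node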

Key CoreDNS Metrics

Metric	Type	Alert Threshold
`coredns_dns_requests_total`	counter	Rate growth > 3× baseline
`coredns_dns_responses_total{rcode="SERVFAIL"}`	counter	> 1% of total responses
`coredns_dns_responses_total{rcode="NXDOMAIN"}`	counter	High rate → ndots:5 waste or misconfigured apps
`coredns_dns_request_duration_seconds`	histogram	p99 > 500ms → CoreDNS overloaded or upstream slow
`coredns_cache_hits_total`	counter	ratio vs requests; < 50% cache hit → increase cache TTL
`coredns_cache_misses_total`	counter	Informational; high = low cache hit rate
`coredns_panics_total`	counter	> 0 → immediate investigation
`coredns_kubernetes_dns_programming_duration_seconds`	histogram	p99 > 5s → API server pressure
`process_open_fds`	gauge	> 80% of ulimit → file descriptor exhaustion

Alerting Rules

groups:
- name: coredns
  rules:
  - alert: CoreDNSHighErrorRate
    expr: |
      sum(rate(coredns_dns_responses_total{rcode="SERVFAIL"}[5m]))
      / sum(rate(coredns_dns_requests_total[5m])) > 0.01
    for: 5m
    labels:
      severity: critical
    annotations:
      summary: "CoreDNS SERVFAIL rate >1% — DNS resolution failures in cluster"

  - alert: CoreDNSHighLatency
    expr: histogram_quantile(0.99, rate(coredns_dns_request_duration_seconds_bucket[5m])) > 0.5
    for: 5m
    labels:
      severity: warning
    annotations:
      summary: "CoreDNS p99 latency >500ms — DNS slow for cluster workloads"

  - alert: CoreDNSPanic
    expr: increase(coredns_panics_total[5m]) > 0
    for: 1m
    labels:
      severity: critical
    annotations:
      summary: "CoreDNS panic detected — immediate investigation required"

  - alert: CoreDNSLowCacheHitRate
    expr: |
      sum(rate(coredns_cache_hits_total[10m]))
      / (sum(rate(coredns_cache_hits_total[10m])) + sum(rate(coredns_cache_misses_total[10m]))) < 0.5
    for: 15m
    labels:
      severity: warning
    annotations:
      summary: "CoreDNS cache hit rate <50% — consider increasing cache TTL"

Troubleshooting Runbooks

Runbook 1: Pod cannot resolve service names (DNS broken)

# 1. Verify CoreDNS pods are running
kubectl get pods -n kube-system -l k8s-app=kube-dns

# 2. Test DNS from a debug pod
kubectl run dns-test --image=nicolaka/netshoot --rm -it -- nslookup kubernetes
# Expected: Server 10.96.0.10, Address: 10.96.0.1

# 3. If nslookup fails — check if CoreDNS ClusterIP is reachable
kubectl run dns-test --image=nicolaka/netshoot --rm -it -- \
  nc -vz 10.96.0.10 53
# If timeout → kube-proxy issue (check 03-kube-proxy-internals.html)

# 4. Check CoreDNS logs for errors
kubectl logs -n kube-system -l k8s-app=kube-dns --since=5m | grep -i "error\|refused\|failed"

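# 5. Check that the Corefile loaded (the coredns binary has no validate flag)
kubectl get cm -n kube-system coredns -o jsonpath='{.data.Corefile}'
# With the reload plugin, a Corefile that fails to parse is logged as an error
kubectl logs -n kube-system -l k8s-app=kube-dns --since=10m | grep -i "reload"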

# 6. Check CoreDNS resource limits (OOMKilled?)
kubectl describe pod -n kube-system -l k8s-app=kube-dns | grep -A5 "Last State\|OOM\|Limits"

# 7. Restart CoreDNS (last resort)
kubectl rollout restart deployment -n kube-system coredns

Runbook 2: DNS lookup slow (intermittent 5s timeouts)

# Root cause: UDP conntrack collision (classic Kubernetes DNS bug)
# Fix: deploy NodeLocal DNSCache (see above)

# Diagnose: check if timeouts correlate with conntrack drops
kubectl debug node/worker-1 -it --image=ubuntu -- bash
  cat /proc/net/stat/nf_conntrack | head -2    # look for insert_failed counter
  # insert_failed > 0 and growing = conntrack collision confirmed

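# Workaround: raise the conntrack limit. This only helps if the table is actually full
# (dmesg shows "nf_conntrack: table full, dropping packet"); it does not fix the race itself
sysctl -w net.netfilter.nf_conntrack_max=1048576
echo "net.netfilter.nf_conntrack_max=1048576" >> /etc/sysctl.d/99-coredns.conf

# Long-term fix 1: NodeLocal DNSCache (preferred)
# Long-term fix 2: stop the pod's resolver from sending parallel UDP lookups on one socket,
# using dnsConfig options (glibc only; musl/Alpine ignores them):
#   single-request-reopen   (A and AAAA go out from separate sockets)
#   use-vc                  (query over TCP; adds latency but no collisions)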

# Also check: are pods using ndots:5 for external lookups?
kubectl exec my-pod -- cat /etc/resolv.conf
# If yes, add ndots:2 via dnsConfig or deploy autopath plugin
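
The insert_failed check in Runbook 2 is easier to track as a single number. The sketch below parses the text of /proc/net/stat/nf_conntrack, which has one header row followed by one row of hexadecimal counters per CPU, and sums the insert_failed column. insert_failed_total is a hypothetical helper; run it twice a few seconds apart and compare the results to see whether the counter is growing.

```python
def insert_failed_total(stat_text: str) -> int:
    """Sum the insert_failed column across all per-CPU rows."""
    rows = [line.split() for line in stat_text.strip().splitlines() if line.strip()]
    header, per_cpu = rows[0], rows[1:]
    column = header.index("insert_failed")       # locate the column by name
    return sum(int(row[column], 16) for row in per_cpu)  # values are hex


if __name__ == "__main__":
    try:
        with open("/proc/net/stat/nf_conntrack") as handle:
            print("insert_failed:", insert_failed_total(handle.read()))
    except FileNotFoundError:
        print("nf_conntrack stats not available on this host")
```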

Runbook 3: External DNS not resolving (SERVFAIL for external names)

# 1. Test directly
kubectl run dns-test --image=nicolaka/netshoot --rm -it -- \
  dig google.com @10.96.0.10

# 2. Check CoreDNS forward plugin config
kubectl get cm -n kube-system coredns -o jsonpath='{.data.Corefile}' | grep -A3 forward

# 3. Verify CoreDNS can reach upstream from its node
# (use an image that ships dig; the stock ubuntu image does not)
kubectl debug node/$(kubectl get pod -n kube-system -l k8s-app=kube-dns -o jsonpath='{.items[0].spec.nodeName}') \
  -it --image=nicolaka/netshoot -- dig google.com @8.8.8.8

# 4. If using /etc/resolv.conf as upstream: check node resolv.conf
# (kubectl debug mounts the node's root filesystem at /host)
kubectl debug node/worker-1 -it --image=ubuntu -- cat /host/etc/resolv.conf

# 5. NetworkPolicy blocking CoreDNS egress?
kubectl get networkpolicies -n kube-system
# Ensure kube-system allows egress to upstream DNS (port 53)

# 6. Check CoreDNS loop plugin (CoreDNS exits and crash-loops if it forwards to itself)
# --previous reads the log of the container instance that crashed
kubectl logs -n kube-system coredns-abc123 --previous | grep "Loop"
# "Loop ... detected" → CoreDNS pointing to itself; check node resolv.conf
# Fix: explicitly set forward target: forward . 8.8.8.8

Runbook 4: CoreDNS high CPU / memory (overloaded)

# 1. Check CoreDNS QPS (port-forward; the CoreDNS image has no curl to exec)
kubectl port-forward -n kube-system deploy/coredns 9153:9153 &
curl -s localhost:9153/metrics | grep "coredns_dns_requests_total"
# rate() in Prometheus:
# rate(coredns_dns_requests_total[1m]) — queries per second

# 2. Identify top query patterns (enable log plugin temporarily)
kubectl edit cm -n kube-system coredns
# Add under .:53 block: log .
# Watch: kubectl logs -n kube-system -l k8s-app=kube-dns -f | head -200

# 3. Check for NXDOMAIN storms (misconfigured apps hammering CoreDNS)
kubectl port-forward -n kube-system deploy/coredns 9153:9153 &   # skip if already forwarding
curl -s localhost:9153/metrics | grep 'rcode="NXDOMAIN"'

# 4. Scale CoreDNS horizontally
kubectl scale deployment coredns -n kube-system --replicas=6

# 5. Increase cache TTL to reduce upstream queries
kubectl edit cm -n kube-system coredns
# Change: cache 30 → cache 120

# 6. Enable NodeLocal DNSCache to offload CoreDNS
# See NodeLocal DNSCache section above

Production Best Practices

Always Run NodeLocal DNSCache

Eliminates the conntrack UDP race (5s timeout bug)
Sub-millisecond cache hits for popular records
Reduces CoreDNS pod load, by 60-80% in favorable workloads (actual savings depend on cache hit rate)
Use force_tcp for cache misses to CoreDNS

Scale CoreDNS Proportionally

Use cluster-proportional-autoscaler (not HPA)
1 replica per 16 nodes or 256 cores as baseline
Spread across failure domains with anti-affinity
Set resource limits to prevent OOMKill on traffic spikes

Reduce ndots Overhead

Use trailing dots for external FQDNs in app config
Set ndots: 2 for pods making many external calls
Enable autopath @kubernetes for server-side search (requires pods verified)
Use fully-qualified service names where latency matters

Security

Enable NetworkPolicy to restrict who can query CoreDNS
Use DNS-over-TLS for upstream forwarding
Avoid pods insecure if pod spoofing is a concern
Monitor for DNS exfiltration (high TXT/NULL query rate)
Use Cilium L7 DNS policy to restrict FQDN access per pod
