Ingress

What This Page Covers
  • Ingress API object — spec anatomy (rules, paths, TLS, defaultBackend)
  • IngressClass — controller selection, default class, parameters
  • Path types — Exact, Prefix, ImplementationSpecific semantics
  • Ingress controller architecture — how controllers watch & reconcile
  • NGINX Ingress Controller — deep dive, annotations, ConfigMap tuning
  • Traefik — IngressRoute CRD, middlewares, automatic HTTPS
  • HAProxy Ingress — backend config, traffic shaping
  • AWS ALB Ingress Controller — target types, annotation reference
  • TLS termination — cert-manager integration, Let's Encrypt, wildcard certs
  • Advanced routing — canary, session affinity, custom headers, rate limiting
  • Multi-tenancy — IngressClass per team, namespace isolation
  • Ingress vs Gateway API — when to use which
  • Ingress controller selection guide
  • Metrics, alerting rules, 5 troubleshooting runbooks, best practices

Ingress exposes HTTP and HTTPS routes from outside the cluster to Services within it. An Ingress controller (NGINX, Traefik, HAProxy, AWS ALB, etc.) reads Ingress objects and programs the underlying proxy. This page covers the Ingress API in full, every path type, TLS configuration, cert-manager integration, and deep dives into NGINX and Traefik controllers — including advanced annotations, canary deployments, rate limiting, and multi-tenancy patterns.

What Ingress Is (and Is Not)

What Ingress provides

  • L7 (HTTP/HTTPS) routing from external to cluster
  • Host-based virtual hosting (api.example.com vs app.example.com)
  • Path-based routing (/api → api-service, /static → cdn-service)
  • TLS termination (HTTPS → HTTP to backend)
  • Name-based virtual hosting (single IP, multiple domains)

What Ingress does NOT provide

  • L4 TCP/UDP routing (use Service type=LoadBalancer or Gateway API)
  • mTLS between client and service (use service mesh or Gateway API)
  • Traffic splitting / weighted routing natively (use annotations or Gateway API)
  • gRPC routing natively in all controllers
  • Fine-grained traffic policies (use Gateway API)

Ingress requires an Ingress controller

The Ingress resource itself does nothing without a running controller. Unlike kube-proxy (which ships with Kubernetes), Ingress controllers must be installed separately. The Kubernetes project maintains the NGINX Ingress Controller; cloud providers offer their own (AWS ALB, GCE L7, Azure Application Gateway).

Ingress API Object

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: web-ingress
  namespace: production
  annotations:
    # Annotations are controller-specific — see per-controller sections below
    nginx.ingress.kubernetes.io/rewrite-target: /   # rewrites every matched path to "/"; use regex capture groups to keep the suffix
    nginx.ingress.kubernetes.io/ssl-redirect: "true"
spec:
  ingressClassName: nginx           # selects which controller handles this Ingress

  # TLS configuration — one entry per host or wildcard
  tls:
  - hosts:
    - api.example.com
    - app.example.com
    secretName: example-tls         # Secret must contain tls.crt and tls.key

  # Rules: matched by host, then by path; the longest matching path wins (not rule order),
  # and Exact beats Prefix when two paths match equally
  rules:
  - host: api.example.com           # empty host = wildcard catch-all
    http:
      paths:
      - path: /v1
        pathType: Prefix             # Exact | Prefix | ImplementationSpecific
        backend:
          service:
            name: api-v1-service
            port:
              number: 8080
      - path: /v2
        pathType: Prefix
        backend:
          service:
            name: api-v2-service
            port:
              number: 8080

  - host: app.example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: frontend-service
            port:
              number: 80

  # Default backend — handles requests that match no rule
  defaultBackend:
    service:
      name: default-404-service
      port:
        number: 80

Path Types

| pathType | Match semantics | Example path | Matches | Does NOT match |
|---|---|---|---|---|
| Exact | Full path equality, case-sensitive | /foo | /foo | /foo/, /foobar |
| Prefix | Path prefix, element-based (split by /) | /foo | /foo, /foo/, /foo/bar | /foobar |
| ImplementationSpecific | Controller decides (regex, glob, etc.) | /foo.* | Controller-defined | Controller-defined |

Prefix semantics gotcha

Prefix matching is element-based, not substring-based. Path /foo matches /foo/bar but NOT /foobar. The longest matching path wins when multiple rules match. If you need /foobar to match, use a separate rule or ImplementationSpecific with a regex.
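
A minimal sketch of the "separate rule" workaround, assuming two hypothetical Services named foo-service and foobar-service. The Prefix rule covers /foo and everything beneath it, while an Exact rule picks up /foobar, which the Prefix rule does not match.

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: prefix-demo                # hypothetical name
  namespace: production
spec:
  ingressClassName: nginx
  rules:
  - host: app.example.com
    http:
      paths:
      - path: /foo
        pathType: Prefix           # matches /foo, /foo/, /foo/bar; not /foobar
        backend:
          service:
            name: foo-service      # hypothetical Service
            port:
              number: 80
      - path: /foobar
        pathType: Exact            # matches only /foobar
        backend:
          service:
            name: foobar-service   # hypothetical Service
            port:
              number: 80
```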

IngressClass

apiVersion: networking.k8s.io/v1
kind: IngressClass
metadata:
  name: nginx
  annotations:
    # Mark as default — Ingress objects without ingressClassName use this
    ingressclass.kubernetes.io/is-default-class: "true"
spec:
  controller: k8s.io/ingress-nginx   # matches the controller's --controller-class flag

---
# IngressClass with parameters (controller-specific config)
apiVersion: networking.k8s.io/v1
kind: IngressClass
metadata:
  name: alb-internal
spec:
  controller: ingress.k8s.aws/alb
  parameters:
    apiGroup: elbv2.k8s.aws
    kind: IngressClassParams
    name: alb-internal-params
    # IngressClassParams is cluster-scoped, so no namespace or scope is set here.
    # For a namespaced parameters kind, add scope: Namespace and namespace: <ns>.

Ingress Controller Architecture

[Diagram: External Client → Cloud LB / NodePort (80 / 443) → Ingress Controller (NGINX / Traefik / ALB). The controller watches Ingress, EndpointSlice, and TLS Secret objects on the API Server, programs its proxy config, and proxies traffic to Service A and Service B pods.]

Every Ingress controller follows the same pattern: it runs as a Pod (usually a Deployment), watches the API server for Ingress, Service, Endpoints/EndpointSlice, and Secret objects, then configures its embedded proxy (NGINX, Envoy, HAProxy, etc.) to match the declared rules. The controller never modifies iptables — traffic reaches it via a LoadBalancer Service or NodePort, then the controller proxies it to backend pods.
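
As an illustration of how traffic reaches the controller, here is a sketch of the LoadBalancer Service that typically fronts the controller pods. The Helm chart normally creates an equivalent object for you; the selector labels and named target ports below are assumptions about how the controller pods are labeled, so check them against your own Deployment.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: ingress-nginx-controller
  namespace: ingress-nginx
spec:
  type: LoadBalancer
  externalTrafficPolicy: Local     # only route to nodes running a controller pod; keeps the client source IP
  selector:
    app.kubernetes.io/name: ingress-nginx        # assumed pod labels
    app.kubernetes.io/component: controller
  ports:
  - name: http
    port: 80
    targetPort: http               # assumes the container ports are named http/https
  - name: https
    port: 443
    targetPort: https
```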

NGINX Ingress Controller

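The Kubernetes-maintained NGINX Ingress Controller (kubernetes/ingress-nginx, controller class k8s.io/ingress-nginx) is the most widely deployed controller. It embeds NGINX and regenerates nginx.conf when Ingress, Service, or Secret configuration changes; endpoint-only changes are applied in place by its Lua balancer, without a reload.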

Two different NGINX controllers

kubernetes/ingress-nginx (maintained by Kubernetes SIG Network — this page) vs nginxinc/kubernetes-ingress (maintained by NGINX Inc / F5). Their annotations and CRDs differ significantly. Confirm which one you're deploying.

Installation

# Helm (recommended)
helm repo add ingress-nginx https://kubernetes.github.io/ingress-nginx
helm repo update
helm install ingress-nginx ingress-nginx/ingress-nginx \
  --namespace ingress-nginx \
  --create-namespace \
  --set controller.replicaCount=2 \
  --set controller.nodeSelector."kubernetes\.io/os"=linux \
  --set controller.admissionWebhooks.enabled=true \
  --set controller.metrics.enabled=true \
  --set controller.metrics.serviceMonitor.enabled=true

# Bare-metal (NodePort instead of LoadBalancer)
helm install ingress-nginx ingress-nginx/ingress-nginx \
  --namespace ingress-nginx \
  --create-namespace \
  --set controller.service.type=NodePort \
  --set controller.hostPort.enabled=true

Global ConfigMap Tuning

# kubectl edit cm -n ingress-nginx ingress-nginx-controller
apiVersion: v1
kind: ConfigMap
metadata:
  name: ingress-nginx-controller
  namespace: ingress-nginx
data:
  # Performance
  worker-processes: "auto"          # matches CPU count
  max-worker-connections: "16384"   # ConfigMap key that sets NGINX worker_connections
  keep-alive: "75"                  # keep-alive timeout seconds
  keep-alive-requests: "10000"

  # Timeouts
  proxy-connect-timeout: "5"        # seconds to connect to backend
  proxy-send-timeout: "60"
  proxy-read-timeout: "60"

  # Request handling
  proxy-body-size: "100m"           # max client body size (default 1m!)
  use-forwarded-headers: "true"     # trust X-Forwarded-* from upstream LB
  compute-full-forwarded-for: "true"

  # TLS
  ssl-protocols: "TLSv1.2 TLSv1.3"
  ssl-ciphers: "ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256"
  ssl-session-cache: "true"         # boolean toggle; the size is a separate key
  ssl-session-cache-size: "10m"
  ssl-session-timeout: "10m"
  hsts: "true"
  hsts-max-age: "31536000"
  hsts-include-subdomains: "true"

  # Logging
  log-format-upstream: '$remote_addr - $remote_user [$time_local] "$request" $status $body_bytes_sent "$http_referer" "$http_user_agent" $request_length $request_time [$proxy_upstream_name] [$proxy_alternative_upstream_name] $upstream_addr $upstream_response_length $upstream_response_time $upstream_status $req_id'
  access-log-path: "/var/log/nginx/access.log"

  # Enable Brotli compression
  enable-brotli: "true"

  # Rate limiting
  limit-req-status-code: "429"

Key Annotations Reference

| Annotation | Default | Effect |
|---|---|---|
| nginx.ingress.kubernetes.io/rewrite-target | | Rewrite URL path before forwarding (use capture groups with regex) |
| nginx.ingress.kubernetes.io/ssl-redirect | true | Redirect HTTP → HTTPS |
| nginx.ingress.kubernetes.io/force-ssl-redirect | false | Force HTTPS even behind HTTP LB |
| nginx.ingress.kubernetes.io/proxy-body-size | 1m | Override global max body size per Ingress |
| nginx.ingress.kubernetes.io/proxy-read-timeout | 60 | Backend read timeout in seconds |
| nginx.ingress.kubernetes.io/backend-protocol | HTTP | Protocol used to reach the backend; alternatives: HTTPS, GRPC, GRPCS, AJP, FCGI |
| nginx.ingress.kubernetes.io/affinity | | Set to cookie to enable session affinity via cookie |
| nginx.ingress.kubernetes.io/session-cookie-name | INGRESSCOOKIE | Name of the affinity cookie |
| nginx.ingress.kubernetes.io/canary | false | Mark as canary Ingress (see canary section) |
| nginx.ingress.kubernetes.io/canary-weight | | Traffic percentage (0–100) to route to canary |
| nginx.ingress.kubernetes.io/canary-by-header | | Header name; if value=always → send to canary |
| nginx.ingress.kubernetes.io/limit-rps | | Rate limit: requests per second per IP |
| nginx.ingress.kubernetes.io/limit-connections | | Max concurrent connections per IP |
| nginx.ingress.kubernetes.io/configuration-snippet | | Inject raw NGINX config into location block |
| nginx.ingress.kubernetes.io/server-snippet | | Inject raw NGINX config into server block |
| nginx.ingress.kubernetes.io/use-regex | false | Enable regex in path matching |
| nginx.ingress.kubernetes.io/auth-type | | basic or digest HTTP auth |
| nginx.ingress.kubernetes.io/auth-secret | | Secret name containing htpasswd content |
| nginx.ingress.kubernetes.io/auth-url | | External auth service URL (OAuth2 proxy pattern) |
| nginx.ingress.kubernetes.io/cors-allow-origin | | CORS allowed origin header value |
| nginx.ingress.kubernetes.io/enable-cors | false | Enable CORS response headers |
| nginx.ingress.kubernetes.io/whitelist-source-range | | Comma-separated IP CIDRs allowed (IP allowlist) |
| nginx.ingress.kubernetes.io/modsecurity-snippet | | ModSecurity WAF rules |
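
To make the rewrite-target and use-regex rows concrete, here is a sketch that strips a /v1 prefix while preserving the rest of the path. It assumes a hypothetical backend named api-v1-service that serves from the root; the second capture group in the path becomes $2 in the rewrite target.

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: api-rewrite                # hypothetical name
  namespace: production
  annotations:
    nginx.ingress.kubernetes.io/use-regex: "true"
    nginx.ingress.kubernetes.io/rewrite-target: /$2   # keep everything after /v1
spec:
  ingressClassName: nginx
  rules:
  - host: api.example.com
    http:
      paths:
      - path: /v1(/|$)(.*)         # group 1: "/" or end of path, group 2: the remainder
        pathType: ImplementationSpecific
        backend:
          service:
            name: api-v1-service
            port:
              number: 8080
```

With this in place, a request for /v1/users should reach the backend as /users, and /v1 as /.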

Canary Deployments with NGINX

# 1. Primary Ingress (production traffic)
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: api-production
  namespace: production
spec:
  ingressClassName: nginx
  rules:
  - host: api.example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: api-v1
            port:
              number: 80

---
# 2. Canary Ingress (routes % of traffic to v2)
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: api-canary
  namespace: production
  annotations:
    nginx.ingress.kubernetes.io/canary: "true"
    nginx.ingress.kubernetes.io/canary-weight: "10"    # 10% to v2
    # OR by header: nginx.ingress.kubernetes.io/canary-by-header: "X-Canary"
    # OR by cookie:  nginx.ingress.kubernetes.io/canary-by-cookie: "canary"
spec:
  ingressClassName: nginx
  rules:
  - host: api.example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: api-v2
            port:
              number: 80
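
As a variation on the weight-based canary above, the sketch below routes only requests that carry a specific header value, which is useful for letting testers opt in before any production traffic is shifted. The header name X-Canary and value beta are arbitrary choices for illustration.

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: api-canary-header          # hypothetical name
  namespace: production
  annotations:
    nginx.ingress.kubernetes.io/canary: "true"
    nginx.ingress.kubernetes.io/canary-by-header: "X-Canary"
    nginx.ingress.kubernetes.io/canary-by-header-value: "beta"   # only this value selects the canary
spec:
  ingressClassName: nginx
  rules:
  - host: api.example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: api-v2
            port:
              number: 80
```

A request sent with the header X-Canary: beta should land on api-v2; requests without it fall through to any other canary rule and otherwise to the primary Ingress.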

Rate Limiting

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: api-rate-limited
  annotations:
    nginx.ingress.kubernetes.io/limit-rps: "10"          # 10 req/s per client IP
    nginx.ingress.kubernetes.io/limit-connections: "20"  # max 20 concurrent connections
    nginx.ingress.kubernetes.io/limit-burst-multiplier: "5"   # burst = 5 × limit-rps = 50
    # Whitelist internal CIDRs from rate limiting:
    nginx.ingress.kubernetes.io/limit-whitelist: "10.0.0.0/8,192.168.0.0/16"
spec:
  ingressClassName: nginx
  rules:
  - host: api.example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: api
            port:
              number: 80

TLS Termination and cert-manager

Manual TLS Secret

# Create TLS Secret from certificate files
kubectl create secret tls example-tls \
  --cert=tls.crt \
  --key=tls.key \
  --namespace=production

# Secret structure
kubectl get secret example-tls -o yaml
# data:
#   tls.crt: <base64-encoded PEM chain: server cert + intermediate CAs>
#   tls.key: <base64-encoded private key>
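
For teams that manage manifests declaratively, the same Secret can be written as YAML. This is a sketch only: the two data values are placeholders, and real key material should come from a secret manager rather than a file committed to version control.

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: example-tls
  namespace: production
type: kubernetes.io/tls            # tells Kubernetes to expect tls.crt and tls.key
data:
  tls.crt: <base64-encoded PEM chain>     # placeholder, not a real value
  tls.key: <base64-encoded private key>   # placeholder, not a real value
```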

cert-manager Integration

cert-manager automates certificate issuance and renewal from ACME (Let's Encrypt), Vault, Venafi, or self-signed CAs. It watches for Certificate resources and Ingress TLS annotations, then populates Secrets automatically.

# Install cert-manager
helm repo add jetstack https://charts.jetstack.io
helm install cert-manager jetstack/cert-manager \
  --namespace cert-manager \
  --create-namespace \
  --set crds.enabled=true

# ClusterIssuer: Let's Encrypt production
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: ops@example.com
    privateKeySecretRef:
      name: letsencrypt-prod-key
    solvers:
    - http01:                         # HTTP-01 challenge (requires port 80 access)
        ingress:
          ingressClassName: nginx
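    - dns01:                          # DNS-01 challenge (required for wildcard certs)
        route53:
          region: us-east-1
          hostedZoneID: Z1234567890ABC
      selector:                       # without a selector the first solver (HTTP-01) is tried,
        dnsZones:                     # and HTTP-01 cannot validate wildcard names
        - example.com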

---
# Ingress with automatic cert-manager TLS (annotation approach)
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: web
  namespace: production
  annotations:
    cert-manager.io/cluster-issuer: "letsencrypt-prod"
    # cert-manager sees this annotation + tls.secretName → creates Certificate object
spec:
  ingressClassName: nginx
  tls:
  - hosts:
    - app.example.com
    secretName: app-example-com-tls   # cert-manager creates this Secret
  rules:
  - host: app.example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: frontend
            port:
              number: 80

# Monitor certificate status
kubectl get certificate -n production
kubectl describe certificate app-example-com-tls -n production
# Look for: Ready=True, Not Before/After dates, renewal time

# Manual certificate request
kubectl apply -f - <<EOF
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: wildcard-example-com
  namespace: production
spec:
  secretName: wildcard-example-com-tls
  issuerRef:
    name: letsencrypt-prod
    kind: ClusterIssuer
  dnsNames:
  - "*.example.com"
  - "example.com"
EOF
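
While iterating on issuer configuration it can help to point at the Let's Encrypt staging endpoint first, since the production endpoint is rate limited. The sketch below mirrors the production ClusterIssuer; the name letsencrypt-staging and its key Secret name are assumptions, and staging certificates are not trusted by browsers.

```yaml
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-staging        # hypothetical name
spec:
  acme:
    server: https://acme-staging-v02.api.letsencrypt.org/directory
    email: ops@example.com
    privateKeySecretRef:
      name: letsencrypt-staging-key
    solvers:
    - http01:
        ingress:
          ingressClassName: nginx
```

Once issuance works end to end, switching the Ingress annotation back to letsencrypt-prod should trigger a fresh, trusted certificate.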

Traefik Ingress Controller

Traefik is a cloud-native reverse proxy that supports both standard Ingress objects and its own IngressRoute CRD for more expressive routing. It integrates natively with Let's Encrypt, Consul, Docker, Kubernetes, and more.

IngressRoute CRD (Traefik v2+)

apiVersion: traefik.io/v1alpha1       # Traefik v2.10+ and v3; older v2 releases use traefik.containo.us/v1alpha1
kind: IngressRoute
metadata:
  name: api-route
  namespace: production
spec:
  entryPoints:
  - websecure                         # listens on HTTPS (port 443)
  routes:
  - match: Host(`api.example.com`) && PathPrefix(`/v1`)
    kind: Rule
    services:
    - name: api-v1
      port: 8080
    middlewares:
    - name: rate-limit
    - name: auth-forward
  - match: Host(`api.example.com`) && PathPrefix(`/v2`)
    kind: Rule
    services:
    - name: api-v2
      port: 8080
      weight: 80                      # weighted traffic splitting
    - name: api-v2-canary
      port: 8080
      weight: 20
  tls:
    certResolver: le                  # uses built-in Let's Encrypt resolver

Traefik Middlewares

# Rate limiting middleware
apiVersion: traefik.io/v1alpha1
kind: Middleware
metadata:
  name: rate-limit
  namespace: production
spec:
  rateLimit:
    average: 100          # requests per second average
    burst: 200            # burst size

---
# Headers middleware
apiVersion: traefik.io/v1alpha1
kind: Middleware
metadata:
  name: security-headers
  namespace: production
spec:
  headers:
    frameDeny: true
    sslRedirect: true         # deprecated in Traefik v2 and removed in v3; prefer a redirectScheme middleware
    browserXssFilter: true
    contentTypeNosniff: true
    stsSeconds: 31536000
    stsIncludeSubdomains: true

---
# Forward auth middleware (OAuth2 proxy / SSO)
apiVersion: traefik.io/v1alpha1
kind: Middleware
metadata:
  name: auth-forward
  namespace: production
spec:
  forwardAuth:
    address: http://oauth2-proxy.auth.svc.cluster.local/oauth2/auth
    trustForwardHeader: true
    authResponseHeaders:
    - X-Auth-User
    - X-Auth-Email

---
# Retry middleware
apiVersion: traefik.io/v1alpha1
kind: Middleware
metadata:
  name: retry
spec:
  retry:
    attempts: 3
    initialInterval: 100ms
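
One more middleware that often accompanies the ones above is an HTTP-to-HTTPS redirect. The sketch below assumes the plain-HTTP entry point is named web (a common convention, but an assumption here) and uses a hypothetical middleware name https-redirect.

```yaml
apiVersion: traefik.io/v1alpha1
kind: Middleware
metadata:
  name: https-redirect             # hypothetical name
  namespace: production
spec:
  redirectScheme:
    scheme: https
    permanent: true                # issue a permanent redirect
---
# Attach the redirect to the HTTP entry point for the same host
apiVersion: traefik.io/v1alpha1
kind: IngressRoute
metadata:
  name: api-route-http             # hypothetical name
  namespace: production
spec:
  entryPoints:
  - web                            # assumed name of the port-80 entry point
  routes:
  - match: Host(`api.example.com`)
    kind: Rule
    middlewares:
    - name: https-redirect
    services:
    - name: api-v1
      port: 8080
```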

AWS ALB Ingress Controller (aws-load-balancer-controller)

The AWS Load Balancer Controller provisions an Application Load Balancer (ALB) per Ingress object (or one shared ALB via IngressGroup). It programs ALB listener rules and target groups directly, so there is no in-cluster proxy; with target-type: ip, external traffic also bypasses kube-proxy entirely.

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: api-alb
  namespace: production
  annotations:
    kubernetes.io/ingress.class: alb                       # legacy annotation; newer setups use spec.ingressClassName: alb
    alb.ingress.kubernetes.io/scheme: internet-facing      # or internal
    alb.ingress.kubernetes.io/target-type: ip              # ip | instance
    alb.ingress.kubernetes.io/certificate-arn: arn:aws:acm:us-east-1:123456789:certificate/abc
    alb.ingress.kubernetes.io/listen-ports: '[{"HTTP": 80}, {"HTTPS": 443}]'
    alb.ingress.kubernetes.io/ssl-redirect: "443"
    alb.ingress.kubernetes.io/healthcheck-path: /healthz
    alb.ingress.kubernetes.io/healthcheck-interval-seconds: "15"
    alb.ingress.kubernetes.io/group.name: shared-alb      # share ALB across Ingresses
    alb.ingress.kubernetes.io/group.order: "10"           # rule priority within group
    alb.ingress.kubernetes.io/wafv2-acl-arn: arn:aws:wafv2:...  # attach WAF
    # Single comma-separated line, so no newlines end up inside the attribute list
    alb.ingress.kubernetes.io/load-balancer-attributes: idle_timeout.timeout_seconds=120,routing.http2.enabled=true,access_logs.s3.enabled=true,access_logs.s3.bucket=my-alb-logs
spec:
  rules:
  - host: api.example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: api
            port:
              number: 80

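target-type: ip

ALB registers pod IPs directly as targets, bypassing the NodePort and kube-proxy DNAT hop. Requires pods with VPC-routable IPs (for example, the AWS VPC CNI). Lower latency. Recommended for EKS.

target-type: instance

ALB targets node IPs at the Service's NodePort. Works with any CNI, at the cost of an extra hop through kube-proxy. Works for self-managed clusters.

In both modes the ALB terminates the client connection, so backends see a load balancer or node address as the peer and must read the original client IP from X-Forwarded-For.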
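
The example above selects the controller through the kubernetes.io/ingress.class annotation. A sketch of the same selection through an IngressClass and spec.ingressClassName follows, assuming an IngressClass named alb does not already exist in the cluster (some installations create it for you).

```yaml
apiVersion: networking.k8s.io/v1
kind: IngressClass
metadata:
  name: alb
spec:
  controller: ingress.k8s.aws/alb
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: api-alb
  namespace: production
  annotations:
    alb.ingress.kubernetes.io/scheme: internet-facing
    alb.ingress.kubernetes.io/target-type: ip    # register pod IPs directly
spec:
  ingressClassName: alb            # replaces the ingress.class annotation
  rules:
  - host: api.example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: api
            port:
              number: 80
```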

Multi-Tenancy Patterns

IngressClass per Team

# Dedicated NGINX controller per team for hard isolation
# Team A controller in namespace ingress-team-a
helm install ingress-team-a ingress-nginx/ingress-nginx \
  --namespace ingress-team-a \
  --set controller.ingressClassResource.name=nginx-team-a \
  --set controller.ingressClassResource.controllerValue=k8s.io/ingress-nginx-team-a \
  --set controller.watchIngressWithoutClass=false \
  --set controller.scope.enabled=true \
  --set controller.scope.namespace=team-a   # only watch team-a namespace
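
With a dedicated controller installed as above, a tenant opts in by naming that class on its Ingress objects. A minimal sketch, assuming a hypothetical Service team-a-web in the team-a namespace:

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: team-a-web                 # hypothetical name
  namespace: team-a
spec:
  ingressClassName: nginx-team-a   # handled only by the team-a controller
  rules:
  - host: team-a.example.com       # hypothetical host
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: team-a-web
            port:
              number: 80
```

Because watchIngressWithoutClass is false, an Ingress that omits ingressClassName is ignored by this controller rather than silently picked up.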

Namespace Isolation

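# Restrict a controller to specific namespaces.
# Single namespace (Helm values):
#   --set controller.scope.enabled=true --set controller.scope.namespace=team-a
# By namespace label, e.g. ingress-controller=nginx-prod:
#   controller flag --watch-namespace-selector (Helm value controller.scope.namespaceSelector)
# In both cases, also ignore Ingresses that name no class:
#   --set controller.watchIngressWithoutClass=false
# POD_NAMESPACE in the controller Deployment env identifies the controller's own namespace:
# - name: POD_NAMESPACE
#   value: ingress-nginx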

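# IngressClass with a namespace-scoped parameters reference.
# The parameters apiGroup and kind below are illustrative; confirm that your
# controller actually consumes IngressClass parameters before relying on them.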
apiVersion: networking.k8s.io/v1
kind: IngressClass
metadata:
  name: nginx-team-a
spec:
  controller: k8s.io/ingress-nginx
  parameters:
    apiGroup: k8s.nginx.org
    kind: IngressClassParameters
    name: nginx-team-a-params
    namespace: team-a
    scope: Namespace

Ingress vs Gateway API

| Dimension | Ingress | Gateway API |
|---|---|---|
| API stability | GA (stable) | GA for core resources (Gateway API v1.0+) |
| L4 TCP/UDP routing | No | Yes (TCPRoute, UDPRoute) |
| Traffic splitting | Via annotations (controller-specific) | Native (HTTPRoute weights) |
| Header-based routing | Via annotations | Native (HTTPRoute matches) |
| mTLS | No | Gateway-to-backend TLS (BackendTLSPolicy) |
| Role separation | Single object; no RBAC split | GatewayClass/Gateway/Route: infra vs app team split |
| Cross-namespace routing | No | Yes (ReferenceGrant) |
| Portability | Low (annotations differ per controller) | High (standard API) |
| Ecosystem maturity | Very mature; wide tooling support | Maturing; most major controllers now support it |

Migration path

Ingress is not deprecated and will not be removed. New workloads with complex routing requirements should use Gateway API. Existing Ingress deployments can remain as-is. See Gateway API for the full Gateway deep-dive.
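
For comparison with the annotation-based NGINX canary, here is a sketch of the same 90/10 split expressed natively as a Gateway API HTTPRoute. It assumes a Gateway named shared-gateway already exists in the same namespace; that name is hypothetical.

```yaml
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: api-canary
  namespace: production
spec:
  parentRefs:
  - name: shared-gateway           # hypothetical Gateway owned by the platform team
  hostnames:
  - api.example.com
  rules:
  - matches:
    - path:
        type: PathPrefix
        value: /
    backendRefs:
    - name: api-v1
      port: 80
      weight: 90                   # 90% of requests
    - name: api-v2
      port: 80
      weight: 10                   # 10% of requests
```

The split lives in one portable object instead of two Ingresses tied together by controller-specific annotations.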

Ingress Controller Selection Guide

| Controller | Best For | Strengths | Limitations |
|---|---|---|---|
| NGINX (k8s-maintained) | General purpose; most environments | Massive ecosystem; stable; rich annotations; ModSecurity WAF | Config reload causes brief traffic interruption; complex at scale |
| Traefik | Dynamic environments; automatic HTTPS | Native Let's Encrypt; IngressRoute CRD; Middleware system; dashboard | More moving parts; CRD-heavy |
| HAProxy | High-performance L4+L7; financial services | Lowest latency; hot config reload (zero disruption); advanced ACLs | Less cloud-native; steeper learning curve |
| AWS ALB Controller | EKS on AWS | Native ALB; WAF integration; target-type=ip; no extra proxy hop | AWS-only; costs per ALB rule |
| GCE/GKE | GKE on GCP | Native GCP HTTPS LB; Cloud Armor; CDN; no controller to manage | GCP-only; limited customization |
| Contour (Envoy) | Envoy-based; Gateway API forward-looking | HTTPProxy CRD; Envoy performance; Gateway API support | Smaller ecosystem than NGINX |
| Kong | API gateway features needed | Plugin system; auth, rate limiting, transformations; enterprise support | Heavier; datastore optional (DB-less mode available) |

Key Ingress Metrics (NGINX)

| Metric | Type | Alert Threshold |
|---|---|---|
| nginx_ingress_controller_requests | counter | Sudden drop → controller issue |
| nginx_ingress_controller_request_duration_seconds | histogram | p99 > 2s → backend slow or overloaded |
| nginx_ingress_controller_response_size | histogram | Spike → large response anomaly |
| nginx_ingress_controller_requests{status=~"5.."} | counter | 5xx rate > 1% of total requests |
| nginx_ingress_controller_nginx_process_connections{state="active"} | gauge | Near worker_connections limit → scale controller |
| nginx_ingress_controller_config_last_reload_successful | gauge | = 0 → config reload failed; routes may be stale |
| nginx_ingress_controller_ssl_expire_time_seconds | gauge | Within 30 days of expiry → certificate renewal needed |

Alerting Rules

groups:
- name: ingress-nginx
  rules:
  - alert: IngressHighErrorRate
    expr: |
      sum(rate(nginx_ingress_controller_requests{status=~"5.."}[5m])) by (ingress, namespace)
      / sum(rate(nginx_ingress_controller_requests[5m])) by (ingress, namespace)
      > 0.01
    for: 5m
    labels:
      severity: critical
    annotations:
      summary: "Ingress {{ $labels.namespace }}/{{ $labels.ingress }} has >1% 5xx rate"

  - alert: IngressHighLatency
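    # Aggregate buckets per Ingress before taking the quantile, so the p99 is not computed per raw series
    expr: histogram_quantile(0.99, sum by (le, ingress, namespace) (rate(nginx_ingress_controller_request_duration_seconds_bucket[5m]))) > 2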
    for: 5m
    labels:
      severity: warning
    annotations:
      summary: "Ingress p99 latency >2s"

  - alert: IngressConfigReloadFailed
    expr: nginx_ingress_controller_config_last_reload_successful == 0
    for: 2m
    labels:
      severity: critical
    annotations:
      summary: "NGINX config reload failed — Ingress rules may be stale"

  - alert: IngressCertExpirySoon
    expr: nginx_ingress_controller_ssl_expire_time_seconds - time() < 86400 * 30
    for: 1h
    labels:
      severity: warning
    annotations:
      summary: "TLS certificate expires in less than 30 days for {{ $labels.host }}"

Troubleshooting Runbooks

Runbook 1: 404 for all paths — Ingress not matching

# 1. Verify Ingress object exists and is picked up by controller
kubectl get ingress -n production
kubectl describe ingress web-ingress -n production
# Check "Address" field — if empty, controller hasn't assigned LB yet

# 2. Check IngressClass assignment
kubectl get ingress web-ingress -n production -o jsonpath='{.spec.ingressClassName}'
kubectl get ingressclass   # verify controller exists and matches

# 3. Check NGINX controller logs
kubectl logs -n ingress-nginx -l app.kubernetes.io/name=ingress-nginx --tail=100 | \
  grep -i "error\|warn\|ingress\|production"

# 4. Verify backend Service exists and has endpoints
kubectl get svc api-service -n production
kubectl get endpoints api-service -n production   # must have addresses

# 5. Test backend directly from controller pod
CONTROLLER=$(kubectl get pod -n ingress-nginx -l app.kubernetes.io/name=ingress-nginx -o jsonpath='{.items[0].metadata.name}')
kubectl exec -n ingress-nginx $CONTROLLER -- curl -s http://api-service.production:8080/healthz

# 6. Check generated NGINX config
kubectl exec -n ingress-nginx $CONTROLLER -- nginx -T | grep -A 20 "api.example.com"

Runbook 2: TLS certificate errors (ERR_CERT_COMMON_NAME_INVALID)

# 1. Check certificate is valid and covers the hostname
kubectl get secret example-tls -n production -o jsonpath='{.data.tls\.crt}' | \
  base64 -d | openssl x509 -noout -text | grep -E "Subject|DNS"

# 2. Check cert-manager Certificate status
kubectl get certificate -n production
kubectl describe certificate app-example-com-tls -n production
# Look for Events: "Successfully issued certificate" vs error messages

# 3. Check cert-manager challenge status (for ACME)
kubectl get challenges -n production
kubectl describe challenge -n production
# Common issue: HTTP-01 challenge blocked by firewall or ingress not routing /.well-known/

# 4. Check CertificateRequest and Order objects
kubectl get certificaterequest -n production
kubectl get order -n production

# 5. Verify Secret has correct keys
kubectl get secret example-tls -n production -o jsonpath='{.data}' | jq 'keys'
# Must have: tls.crt and tls.key

# 6. Force certificate renewal
kubectl delete secret app-example-com-tls -n production
# cert-manager will re-issue automatically

Runbook 3: 502 Bad Gateway — backend not responding

# 1. Check pods behind the service
kubectl get pods -l app=api -n production
kubectl describe pod <crashing-pod> -n production

# 2. Test backend health directly
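kubectl port-forward svc/api-service 8080:8080 -n production &
PF_PID=$!; sleep 2                 # give the tunnel a moment to open
curl -v http://localhost:8080/healthz
kill "$PF_PID"                     # stop the background port-forward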

# 3. Check NGINX upstream configuration
CONTROLLER=$(kubectl get pod -n ingress-nginx -l app.kubernetes.io/name=ingress-nginx -o jsonpath='{.items[0].metadata.name}')
kubectl exec -n ingress-nginx $CONTROLLER -- \
  curl -s http://localhost:10246/configuration/backends | jq '.[] | select(.name | contains("api"))'

# 4. Check proxy timeouts — backend may be slow
# If p99 latency ~60s → read timeout hit
# Increase: nginx.ingress.kubernetes.io/proxy-read-timeout: "120"

# 5. Check if service port name matches
kubectl get svc api-service -n production -o jsonpath='{.spec.ports[*].name}'
# Ingress backend port.name must match svc port.name (or use port number)

Runbook 4: NGINX config reload failing

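# 1. Check last reload status
CONTROLLER=$(kubectl get pod -n ingress-nginx -l app.kubernetes.io/name=ingress-nginx -o jsonpath='{.items[0].metadata.name}')
kubectl exec -n ingress-nginx $CONTROLLER -- \
  curl -s http://localhost:10254/metrics | grep config_last_reload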

# 2. Check for config syntax errors
kubectl logs -n ingress-nginx $CONTROLLER --tail=50 | grep "error\|NGINX config"

# 3. Common causes of reload failure:
# a) Invalid annotation (e.g. regex syntax error in configuration-snippet)
#    "// {}" keeps jq from failing on Ingresses that have no annotations
kubectl get ingress --all-namespaces -o json | \
  jq -r '.items[] | select((.metadata.annotations // {}) | has("nginx.ingress.kubernetes.io/configuration-snippet")) | "\(.metadata.namespace)/\(.metadata.name)"'

# b) Duplicate server_name (two Ingresses with same host in same controller)
#    "[]?" skips Ingresses that define no rules (defaultBackend only)
kubectl get ingress --all-namespaces -o json | \
  jq '[.items[].spec.rules[]?.host] | group_by(.) | map(select(length>1))'

# 4. Force reload
kubectl exec -n ingress-nginx $CONTROLLER -- nginx -s reload

Runbook 5: Ingress working but original client IP lost

# Symptom: app logs show NGINX pod IP, not client IP

# 1. Verify the X-Forwarded-For chain by sending a request through the Ingress
#    to an endpoint that echoes request headers (/debug/headers is a placeholder)
curl -s https://api.example.com/debug/headers
# Should show X-Forwarded-For: <real-client-ip>

# 2. Configure NGINX to trust upstream LB headers
# In ConfigMap: use-forwarded-headers: "true"
# For AWS NLB: use-proxy-protocol: "true" (proxy protocol must also be enabled on the NLB)

# 3. For AWS ALB (either target type): the ALB terminates the client connection,
# so read the client IP from the X-Forwarded-For header it adds

# 4. Ensure app reads X-Real-IP or X-Forwarded-For header
# NGINX sets: X-Real-IP = first client IP, X-Forwarded-For = full chain

Production Best Practices

Reliability

  • Run ≥2 controller replicas with PodDisruptionBudget minAvailable:1
  • Use pod anti-affinity to spread replicas across nodes/zones
  • Set resource requests/limits to prevent OOMKill on traffic spikes
  • Enable admission webhook to catch config errors before apply
  • Use --default-backend-service for graceful 404 handling
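
The first bullet can be sketched as a PodDisruptionBudget like the one below. The selector labels are assumptions about how the controller pods are labeled, so verify them against your Deployment before applying.

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: ingress-nginx-controller
  namespace: ingress-nginx
spec:
  minAvailable: 1                  # never voluntarily evict the last controller pod
  selector:
    matchLabels:
      app.kubernetes.io/name: ingress-nginx        # assumed pod labels
      app.kubernetes.io/component: controller
```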

Security

  • Always enable HSTS headers and SSL redirect
  • Use cert-manager with Let's Encrypt — never commit private keys
  • Restrict configuration-snippet annotation (CVE risk — arbitrary NGINX injection)
  • Use IP allowlist annotations for admin paths
  • Enable ModSecurity WAF for public APIs
  • Rotate TLS secrets; alert on <30 day expiry
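
One way to restrict snippet annotations cluster-wide is the controller's own ConfigMap switch. The fragment below is a sketch meant to be merged into the existing ingress-nginx-controller ConfigMap, not applied on its own, since a bare apply would drop the other keys.

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: ingress-nginx-controller
  namespace: ingress-nginx
data:
  allow-snippet-annotations: "false"   # controller ignores configuration-snippet and server-snippet annotations
```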

Performance

  • Tune worker-processes: auto and worker-connections
  • Enable keep-alive to backend pods (reduces TCP handshake overhead)
  • Use proxy-buffering: on for slow clients
  • Enable Brotli/gzip compression
  • Use HTTP/2 for TLS connections

Multi-Tenancy

  • Dedicate an IngressClass per tenant for hard isolation
  • Use --watch-namespace to scope each controller
  • Apply ResourceQuota on Ingress objects per namespace
  • Consider migrating to Gateway API for cleaner role separation
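
The ResourceQuota bullet can be expressed with an object-count quota. A minimal sketch for a hypothetical team-a namespace, with an arbitrary limit of 20:

```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: ingress-count              # hypothetical name
  namespace: team-a
spec:
  hard:
    count/ingresses.networking.k8s.io: "20"   # arbitrary example limit
```

Once the namespace holds 20 Ingress objects, the API server rejects further creates until one is removed or the quota is raised.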