Ingress

What This Page Covers

Ingress API object — spec anatomy (rules, paths, TLS, defaultBackend)
IngressClass — controller selection, default class, parameters
Path types — Exact, Prefix, ImplementationSpecific semantics
Ingress controller architecture — how controllers watch & reconcile
NGINX Ingress Controller — deep dive, annotations, ConfigMap tuning
Traefik — IngressRoute CRD, middlewares, automatic HTTPS
HAProxy Ingress — backend config, traffic shaping
AWS ALB Ingress Controller — target types, annotation reference
TLS termination — cert-manager integration, Let's Encrypt, wildcard certs
Advanced routing — canary, session affinity, custom headers, rate limiting
Multi-tenancy — IngressClass per team, namespace isolation
Ingress vs Gateway API — when to use which
Ingress controller selection guide
Metrics, alerting rules, 5 troubleshooting runbooks, best practices

Ingress exposes HTTP and HTTPS routes from outside the cluster to Services within it. An Ingress controller (NGINX, Traefik, HAProxy, AWS ALB, etc.) reads Ingress objects and programs the underlying proxy. This page covers the Ingress API in full, every path type, TLS configuration, cert-manager integration, and deep dives into NGINX and Traefik controllers — including advanced annotations, canary deployments, rate limiting, and multi-tenancy patterns.

What Ingress Is (and Is Not)

What Ingress provides

L7 (HTTP/HTTPS) routing from external to cluster
Host-based (name-based) virtual hosting: several domains behind a single IP (api.example.com vs app.example.com)
Path-based routing (/api → api-service, /static → cdn-service)
TLS termination (HTTPS → HTTP to backend)

What Ingress does NOT provide

L4 TCP/UDP routing (use Service type=LoadBalancer or Gateway API)
mTLS between client and service (use service mesh or Gateway API)
Traffic splitting / weighted routing natively (use annotations or Gateway API)
Uniform gRPC routing (support and configuration vary by controller)
Fine-grained traffic policies (use Gateway API)

Ingress requires an Ingress controller

The Ingress resource itself does nothing without a running controller. Unlike kube-proxy (which ships with Kubernetes), Ingress controllers must be installed separately. The Kubernetes project maintains the NGINX Ingress Controller; cloud providers offer their own (AWS ALB, GCE L7, Azure Application Gateway).

Ingress API Object

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: web-ingress
  namespace: production
  annotations:
    # Annotations are controller-specific — see per-controller sections below
    # nginx.ingress.kubernetes.io/rewrite-target: /   # left commented out: it would rewrite every matched path (/v1/..., /v2/...) to "/"
    nginx.ingress.kubernetes.io/ssl-redirect: "true"
spec:
  ingressClassName: nginx           # selects which controller handles this Ingress

  # TLS configuration: one entry per certificate Secret, listing the hosts it covers
  tls:
  - hosts:
    - api.example.com
    - app.example.com
    secretName: example-tls         # Secret must contain tls.crt and tls.key

  # Rules: matched by host, then by path; the longest matching path wins (not list order)
  rules:
  - host: api.example.com           # empty host = wildcard catch-all
    http:
      paths:
      - path: /v1
        pathType: Prefix             # Exact | Prefix | ImplementationSpecific
        backend:
          service:
            name: api-v1-service
            port:
              number: 8080
      - path: /v2
        pathType: Prefix
        backend:
          service:
            name: api-v2-service
            port:
              number: 8080

  - host: app.example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: frontend-service
            port:
              number: 80

  # Default backend — handles requests that match no rule
  defaultBackend:
    service:
      name: default-404-service
      port:
        number: 80

Path Types

pathType	Match Semantics	Example path	Matches	Does NOT match
Exact	Full path equality, case-sensitive	`/foo`	`/foo`	`/foo/`, `/foobar`
Prefix	Path prefix, element-based (split by /)	`/foo`	`/foo`, `/foo/`, `/foo/bar`	`/foobar`
ImplementationSpecific	Controller decides (regex, glob, etc.)	`/foo.*`	Controller-defined	Controller-defined

Prefix semantics gotcha

Prefix matching is element-based, not substring-based: path /foo matches /foo/bar but not /foobar. When several paths match a request, the longest one wins, and an Exact path takes precedence over a Prefix path of the same length. To match /foobar as well, add a separate rule or use ImplementationSpecific with a regex.
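
The precedence rules can be seen in a single Ingress that declares overlapping paths. This is a minimal sketch: the host and the three Service names are hypothetical, and the comments describe the matching the Ingress API specifies for Exact and Prefix.

```yaml
# Hypothetical host and Service names, for illustration only.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: path-type-demo
spec:
  ingressClassName: nginx
  rules:
  - host: demo.example.com
    http:
      paths:
      - path: /foo
        pathType: Exact          # only /foo
        backend:
          service:
            name: foo-exact
            port:
              number: 80
      - path: /foo
        pathType: Prefix         # /foo/ and /foo/bar (for /foo itself, Exact wins)
        backend:
          service:
            name: foo-prefix
            port:
              number: 80
      - path: /
        pathType: Prefix         # everything else, including /foobar
        backend:
          service:
            name: catch-all
            port:
              number: 80
```

A request for /foobar lands on catch-all because no element-wise prefix other than / matches it.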

IngressClass

apiVersion: networking.k8s.io/v1
kind: IngressClass
metadata:
  name: nginx
  annotations:
    # Mark as default — Ingress objects without ingressClassName use this
    ingressclass.kubernetes.io/is-default-class: "true"
spec:
  controller: k8s.io/ingress-nginx   # matches the controller's --controller-class flag

---
# IngressClass with parameters (controller-specific config)
apiVersion: networking.k8s.io/v1
kind: IngressClass
metadata:
  name: alb-internal
spec:
  controller: ingress.k8s.aws/alb
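  parameters:
    apiGroup: elbv2.k8s.aws
    kind: IngressClassParams
    name: alb-internal-params
    # IngressClassParams is cluster-scoped, so scope stays at its default (Cluster)
    # and namespace is omitted. For a namespaced parameters kind, set
    # scope: Namespace together with namespace: <name>.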

Ingress Controller Architecture

Most Ingress controllers follow the same pattern: the controller runs as a Pod (usually a Deployment), watches the API server for Ingress, Service, Endpoints/EndpointSlice, and Secret objects, then configures its data plane to match the declared rules. For in-cluster controllers that data plane is an embedded proxy (NGINX, Envoy, HAProxy, etc.); cloud controllers such as AWS ALB program an external load balancer instead. An in-cluster controller never modifies iptables: traffic reaches it via a LoadBalancer Service or NodePort, and the proxy then forwards it to backend pods.
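
The Service that puts an in-cluster controller on the network might look like the following. Treat it as a sketch: the Service name, namespace, selector labels, and named target ports are assumptions modeled on common ingress-nginx chart defaults, so compare them with what your installation actually created.

```yaml
# Sketch only: names, labels, and port names are assumptions; verify against your install.
apiVersion: v1
kind: Service
metadata:
  name: ingress-nginx-controller
  namespace: ingress-nginx
spec:
  type: LoadBalancer               # or NodePort on bare metal
  externalTrafficPolicy: Local     # keeps the client source IP; only nodes running a controller pod receive traffic
  selector:
    app.kubernetes.io/name: ingress-nginx
    app.kubernetes.io/component: controller
  ports:
  - name: http
    port: 80
    targetPort: http               # named container port on the controller pod
  - name: https
    port: 443
    targetPort: https
```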

NGINX Ingress Controller

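The Kubernetes-maintained NGINX Ingress Controller (kubernetes/ingress-nginx, controller class k8s.io/ingress-nginx) is the most widely deployed controller. It embeds NGINX and regenerates nginx.conf, followed by a reload, when a change affects the generated configuration, such as a new Ingress, host, path, or annotation. Endpoint-only changes, for example pods scaling up or down, are applied in place by its Lua balancer without a reload.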

Two different NGINX controllers

kubernetes/ingress-nginx (maintained by Kubernetes SIG Network — this page) vs nginxinc/kubernetes-ingress (maintained by NGINX Inc / F5). Their annotations and CRDs differ significantly. Confirm which one you're deploying.

Installation

# Helm (recommended)
helm repo add ingress-nginx https://kubernetes.github.io/ingress-nginx
helm repo update
helm install ingress-nginx ingress-nginx/ingress-nginx \
  --namespace ingress-nginx \
  --create-namespace \
  --set controller.replicaCount=2 \
  --set controller.nodeSelector."kubernetes\.io/os"=linux \
  --set controller.admissionWebhooks.enabled=true \
  --set controller.metrics.enabled=true \
  --set controller.metrics.serviceMonitor.enabled=true   # requires the Prometheus Operator CRDs

# Bare-metal (NodePort instead of LoadBalancer)
helm install ingress-nginx ingress-nginx/ingress-nginx \
  --namespace ingress-nginx \
  --create-namespace \
  --set controller.service.type=NodePort \
  --set controller.hostPort.enabled=true

Global ConfigMap Tuning

# kubectl edit cm -n ingress-nginx ingress-nginx-controller
apiVersion: v1
kind: ConfigMap
metadata:
  name: ingress-nginx-controller
  namespace: ingress-nginx
data:
  # Performance
  worker-processes: "auto"          # matches CPU count
  max-worker-connections: "16384"   # connections per worker process
  keep-alive: "75"                  # keep-alive timeout seconds
  keep-alive-requests: "10000"

  # Timeouts
  proxy-connect-timeout: "5"        # seconds to connect to backend
  proxy-send-timeout: "60"
  proxy-read-timeout: "60"

  # Request handling
  proxy-body-size: "100m"           # max client body size (default 1m!)
  use-forwarded-headers: "true"     # trust X-Forwarded-* from upstream LB
  compute-full-forwarded-for: "true"

  # TLS
  ssl-protocols: "TLSv1.2 TLSv1.3"
  ssl-ciphers: "ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256"
  ssl-session-cache: "true"         # boolean; the size is set separately
  ssl-session-cache-size: "10m"
  ssl-session-timeout: "10m"
  hsts: "true"
  hsts-max-age: "31536000"
  hsts-include-subdomains: "true"

  # Logging
  log-format-upstream: '$remote_addr - $remote_user [$time_local] "$request" $status $body_bytes_sent "$http_referer" "$http_user_agent" $request_length $request_time [$proxy_upstream_name] [$proxy_alternative_upstream_name] $upstream_addr $upstream_response_length $upstream_response_time $upstream_status $req_id'
  access-log-path: "/var/log/nginx/access.log"

  # Enable Brotli compression
  enable-brotli: "true"

  # Rate limiting
  limit-req-status-code: "429"

Key Annotations Reference

Annotation	Default	Effect
`nginx.ingress.kubernetes.io/rewrite-target`	—	Rewrite URL path before forwarding (use capture groups with regex)
`nginx.ingress.kubernetes.io/ssl-redirect`	`true`	Redirect HTTP → HTTPS
`nginx.ingress.kubernetes.io/force-ssl-redirect`	`false`	Force HTTPS even behind HTTP LB
`nginx.ingress.kubernetes.io/proxy-body-size`	`1m`	Override global max body size per Ingress
`nginx.ingress.kubernetes.io/proxy-read-timeout`	`60`	Backend read timeout in seconds
`nginx.ingress.kubernetes.io/backend-protocol`	`HTTP`	`HTTPS`, `AUTO_HTTP`, `GRPC`, `GRPCS`, `FCGI` (`AJP` only in older releases)
`nginx.ingress.kubernetes.io/affinity`	—	`cookie` — enable session affinity via cookie
`nginx.ingress.kubernetes.io/session-cookie-name`	`INGRESSCOOKIE`	Name of the affinity cookie
`nginx.ingress.kubernetes.io/canary`	`false`	Mark as canary Ingress (see canary section)
`nginx.ingress.kubernetes.io/canary-weight`	—	Traffic percentage (0–100) to route to canary
`nginx.ingress.kubernetes.io/canary-by-header`	—	Header name; if value=always → send to canary
`nginx.ingress.kubernetes.io/limit-rps`	—	Rate limit: requests per second per IP
`nginx.ingress.kubernetes.io/limit-connections`	—	Max concurrent connections per IP
`nginx.ingress.kubernetes.io/configuration-snippet`	—	Inject raw NGINX config into location block
`nginx.ingress.kubernetes.io/server-snippet`	—	Inject raw NGINX config into server block
`nginx.ingress.kubernetes.io/use-regex`	`false`	Enable regex in path matching
`nginx.ingress.kubernetes.io/auth-type`	—	`basic` or `digest` HTTP auth
`nginx.ingress.kubernetes.io/auth-secret`	—	Secret name containing htpasswd content
`nginx.ingress.kubernetes.io/auth-url`	—	External auth service URL (OAuth2 proxy pattern)
`nginx.ingress.kubernetes.io/cors-allow-origin`	—	CORS allowed origin header value
`nginx.ingress.kubernetes.io/enable-cors`	`false`	Enable CORS response headers
`nginx.ingress.kubernetes.io/whitelist-source-range`	—	Comma-separated IP CIDRs allowed (IP allowlist)
`nginx.ingress.kubernetes.io/modsecurity-snippet`	—	ModSecurity WAF rules
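
As an illustration of how the auth annotations combine, the sketch below protects an admin host with HTTP basic auth. The Secret name basic-auth, the host, and the Service name are hypothetical; the Secret is assumed to hold an htpasswd file under the key auth.

```yaml
# Assumes a Secret named basic-auth in the same namespace, created from an
# htpasswd file stored under the key "auth", for example:
#   htpasswd -c auth admin
#   kubectl create secret generic basic-auth --from-file=auth -n production
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: admin
  namespace: production
  annotations:
    nginx.ingress.kubernetes.io/auth-type: basic
    nginx.ingress.kubernetes.io/auth-secret: basic-auth
    nginx.ingress.kubernetes.io/auth-realm: "Authentication Required"
spec:
  ingressClassName: nginx
  rules:
  - host: admin.example.com        # hypothetical host
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: admin-ui         # hypothetical Service
            port:
              number: 80
```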

Canary Deployments with NGINX

# 1. Primary Ingress (production traffic)
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: api-production
  namespace: production
spec:
  ingressClassName: nginx
  rules:
  - host: api.example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: api-v1
            port:
              number: 80

---
# 2. Canary Ingress (routes % of traffic to v2)
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: api-canary
  namespace: production
  annotations:
    nginx.ingress.kubernetes.io/canary: "true"
    nginx.ingress.kubernetes.io/canary-weight: "10"    # 10% to v2
    # OR by header: nginx.ingress.kubernetes.io/canary-by-header: "X-Canary"
    # OR by cookie:  nginx.ingress.kubernetes.io/canary-by-cookie: "canary"
spec:
  ingressClassName: nginx
  rules:
  - host: api.example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: api-v2
            port:
              number: 80
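
A header-driven variant of the canary Ingress is sketched below. It would replace the weight-only version rather than sit beside it, since ingress-nginx accepts one canary per Ingress rule. The header value beta is a hypothetical opt-in token. The controller evaluates canary rules in the order header, cookie, weight, so requests without a matching header fall through to the weight.

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: api-canary
  namespace: production
  annotations:
    nginx.ingress.kubernetes.io/canary: "true"
    nginx.ingress.kubernetes.io/canary-by-header: "X-Canary"
    nginx.ingress.kubernetes.io/canary-by-header-value: "beta"   # hypothetical value; only an exact match forces the canary
    nginx.ingress.kubernetes.io/canary-weight: "10"              # applied when the header is absent or has another value
spec:
  ingressClassName: nginx
  rules:
  - host: api.example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: api-v2
            port:
              number: 80
```

With this in place, a request carrying X-Canary: beta always reaches api-v2, which is convenient for testers, while roughly 10% of other traffic is still sampled.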

Rate Limiting

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: api-rate-limited
  annotations:
    nginx.ingress.kubernetes.io/limit-rps: "10"          # 10 req/s per client IP
    nginx.ingress.kubernetes.io/limit-connections: "20"  # max 20 concurrent connections
    nginx.ingress.kubernetes.io/limit-burst-multiplier: "5"   # burst = 5 × limit-rps = 50
    # Whitelist internal CIDRs from rate limiting:
    nginx.ingress.kubernetes.io/limit-whitelist: "10.0.0.0/8,192.168.0.0/16"
spec:
  ingressClassName: nginx
  rules:
  - host: api.example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: api
            port:
              number: 80

TLS Termination and cert-manager

Manual TLS Secret

# Create TLS Secret from certificate files
kubectl create secret tls example-tls \
  --cert=tls.crt \
  --key=tls.key \
  --namespace=production

# Secret structure
kubectl get secret example-tls -o yaml
# data:
#   tls.crt: <base64-encoded PEM chain: server cert + intermediate CAs>
#   tls.key: <base64-encoded private key>

cert-manager Integration

cert-manager automates certificate issuance and renewal from ACME (Let's Encrypt), Vault, Venafi, or self-signed CAs. It watches for Certificate resources and Ingress TLS annotations, then populates Secrets automatically.

# Install cert-manager
helm repo add jetstack https://charts.jetstack.io
helm install cert-manager jetstack/cert-manager \
  --namespace cert-manager \
  --create-namespace \
  --set crds.enabled=true

# ClusterIssuer — Let's Encrypt production
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: ops@example.com
    privateKeySecretRef:
      name: letsencrypt-prod-key
    solvers:
    - http01:                         # HTTP-01 challenge (requires port 80 access)
        ingress:
          ingressClassName: nginx
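    - selector:
        dnsZones:
        - example.com                 # names in this zone, wildcards included, use DNS-01; other names fall back to HTTP-01
      dns01:                          # DNS-01 challenge (required for wildcard certs)
        route53:
          region: us-east-1
          hostedZoneID: Z1234567890ABC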

---
# Ingress with automatic cert-manager TLS (annotation approach)
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: web
  namespace: production
  annotations:
    cert-manager.io/cluster-issuer: "letsencrypt-prod"
    # cert-manager sees this annotation + tls.secretName → creates Certificate object
spec:
  ingressClassName: nginx
  tls:
  - hosts:
    - app.example.com
    secretName: app-example-com-tls   # cert-manager creates this Secret
  rules:
  - host: app.example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: frontend
            port:
              number: 80

# Monitor certificate status
kubectl get certificate -n production
kubectl describe certificate app-example-com-tls -n production
# Look for: Ready=True, Not Before/After dates, renewal time

# Manual certificate request
kubectl apply -f - <<EOF
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: wildcard-example-com
  namespace: production
spec:
  secretName: wildcard-example-com-tls
  issuerRef:
    name: letsencrypt-prod
    kind: ClusterIssuer
  dnsNames:
  - "*.example.com"
  - "example.com"
EOF
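
While testing issuance, a second issuer pointed at the Let's Encrypt staging endpoint avoids burning production rate limits. The sketch below mirrors the production ClusterIssuer; the issuer and key Secret names are hypothetical, and staging certificates are not trusted by browsers.

```yaml
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-staging            # hypothetical name
spec:
  acme:
    server: https://acme-staging-v02.api.letsencrypt.org/directory
    email: ops@example.com
    privateKeySecretRef:
      name: letsencrypt-staging-key    # hypothetical Secret for the ACME account key
    solvers:
    - http01:
        ingress:
          ingressClassName: nginx
```

Switching an Ingress between the two is then a matter of changing the cert-manager.io/cluster-issuer annotation value.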

Traefik Ingress Controller

Traefik is a cloud-native reverse proxy that supports both standard Ingress objects and its own IngressRoute CRD for more expressive routing. It integrates natively with Let's Encrypt, Consul, Docker, Kubernetes, and more.

IngressRoute CRD (Traefik v2.10+ and v3; older v2 releases use traefik.containo.us/v1alpha1)

apiVersion: traefik.io/v1alpha1
kind: IngressRoute
metadata:
  name: api-route
  namespace: production
spec:
  entryPoints:
  - websecure                         # listens on HTTPS (port 443)
  routes:
  - match: Host(`api.example.com`) && PathPrefix(`/v1`)
    kind: Rule
    services:
    - name: api-v1
      port: 8080
    middlewares:
    - name: rate-limit
    - name: auth-forward
  - match: Host(`api.example.com`) && PathPrefix(`/v2`)
    kind: Rule
    services:
    - name: api-v2
      port: 8080
      weight: 80                      # weighted traffic splitting
    - name: api-v2-canary
      port: 8080
      weight: 20
  tls:
    certResolver: le                  # uses built-in Let's Encrypt resolver

Traefik Middlewares

# Rate limiting middleware
apiVersion: traefik.io/v1alpha1
kind: Middleware
metadata:
  name: rate-limit
  namespace: production
spec:
  rateLimit:
    average: 100          # requests per second average
    burst: 200            # burst size

---
# Headers middleware
apiVersion: traefik.io/v1alpha1
kind: Middleware
metadata:
  name: security-headers
  namespace: production
spec:
  headers:
    frameDeny: true
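    # sslRedirect is deprecated in Traefik v2 and removed in v3; redirect with a redirectScheme middleware or at the entryPoint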
    browserXssFilter: true
    contentTypeNosniff: true
    stsSeconds: 31536000
    stsIncludeSubdomains: true

---
# Forward auth middleware (OAuth2 proxy / SSO)
apiVersion: traefik.io/v1alpha1
kind: Middleware
metadata:
  name: auth-forward
  namespace: production
spec:
  forwardAuth:
    address: http://oauth2-proxy.auth.svc.cluster.local/oauth2/auth
    trustForwardHeader: true
    authResponseHeaders:
    - X-Auth-User
    - X-Auth-Email

---
# Retry middleware
apiVersion: traefik.io/v1alpha1
kind: Middleware
metadata:
  name: retry
spec:
  retry:
    attempts: 3
    initialInterval: 100ms
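
HTTP to HTTPS redirection can also be expressed as its own middleware and attached to a route on the plain-HTTP entryPoint. In this sketch the middleware name https-redirect, the route name, and an entryPoint called web are assumptions about the Traefik static configuration.

```yaml
apiVersion: traefik.io/v1alpha1
kind: Middleware
metadata:
  name: https-redirect         # hypothetical name
  namespace: production
spec:
  redirectScheme:
    scheme: https
    permanent: true            # respond with a permanent redirect

---
apiVersion: traefik.io/v1alpha1
kind: IngressRoute
metadata:
  name: api-route-http         # hypothetical name
  namespace: production
spec:
  entryPoints:
  - web                        # assumes an HTTP entryPoint named "web"
  routes:
  - match: Host(`api.example.com`)
    kind: Rule
    middlewares:
    - name: https-redirect
    services:
    - name: api-v1
      port: 8080
```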

AWS ALB Ingress Controller (aws-load-balancer-controller)

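The AWS Load Balancer Controller provisions an Application Load Balancer (ALB) per Ingress object, or one shared ALB for all Ingresses in the same IngressGroup. It programs ALB listener rules and target groups directly rather than running a proxy inside the cluster. With target-type: ip, external traffic goes straight to pod IPs and bypasses kube-proxy; with target-type: instance it still passes through a NodePort.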

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: api-alb
  namespace: production
  annotations:
    kubernetes.io/ingress.class: alb                       # deprecated annotation; newer setups use spec.ingressClassName: alb
    alb.ingress.kubernetes.io/scheme: internet-facing      # or internal
    alb.ingress.kubernetes.io/target-type: ip              # ip | instance
    alb.ingress.kubernetes.io/certificate-arn: arn:aws:acm:us-east-1:123456789:certificate/abc
    alb.ingress.kubernetes.io/listen-ports: '[{"HTTP": 80}, {"HTTPS": 443}]'
    alb.ingress.kubernetes.io/ssl-redirect: "443"
    alb.ingress.kubernetes.io/healthcheck-path: /healthz
    alb.ingress.kubernetes.io/healthcheck-interval-seconds: "15"
    alb.ingress.kubernetes.io/group.name: shared-alb      # share ALB across Ingresses
    alb.ingress.kubernetes.io/group.order: "10"           # rule priority within group
    alb.ingress.kubernetes.io/wafv2-acl-arn: arn:aws:wafv2:...  # attach WAF
    alb.ingress.kubernetes.io/load-balancer-attributes: idle_timeout.timeout_seconds=120,routing.http2.enabled=true,access_logs.s3.enabled=true,access_logs.s3.bucket=my-alb-logs
spec:
  rules:
  - host: api.example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: api
            port:
              number: 80

target-type: ip

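ALB registers pod IPs directly as targets, bypassing the NodePort and kube-proxy DNAT hop. Requires pods with routable VPC IPs (the Amazon VPC CNI). Lower latency; recommended for EKS. Because the ALB is an L7 proxy, the pod sees an ALB address as the TCP source and the client IP arrives in X-Forwarded-For.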

target-type: instance

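ALB targets node IPs at the Service's NodePort. Works with any CNI, including self-managed clusters, at the cost of an extra hop through kube-proxy. The TCP source seen by the pod is a node or ALB address; the client IP is still available in X-Forwarded-For.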

Multi-Tenancy Patterns

IngressClass per Team

# Dedicated NGINX controller per team for hard isolation
# Team A controller in namespace ingress-team-a
helm install ingress-team-a ingress-nginx/ingress-nginx \
  --namespace ingress-team-a \
  --set controller.ingressClassResource.name=nginx-team-a \
  --set controller.ingressClassResource.controllerValue=k8s.io/ingress-nginx-team-a \
  --set controller.watchIngressWithoutClass=false \
  --set controller.scope.enabled=true \
  --set controller.scope.namespace=team-a   # only watch team-a namespace

Namespace Isolation

# Restrict the controller to namespaces that carry a label,
# here ingress-controller=nginx-prod.
# Controller flag: --watch-namespace-selector=ingress-controller=nginx-prod
# (exposed by the Helm chart as controller.scope.namespaceSelector; check the
# chart's values.yaml for how it interacts with controller.scope.enabled)
# Plus helm flag:
# --set controller.watchIngressWithoutClass=false

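# IngressClass with a namespace-scoped parameters reference (generic shape only:
# the apiGroup and kind below are placeholders, and kubernetes/ingress-nginx does not read spec.parameters)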
apiVersion: networking.k8s.io/v1
kind: IngressClass
metadata:
  name: nginx-team-a
spec:
  controller: k8s.io/ingress-nginx
  parameters:
    apiGroup: k8s.nginx.org
    kind: IngressClassParameters
    name: nginx-team-a-params
    namespace: team-a
    scope: Namespace

Ingress vs Gateway API

Dimension	Ingress	Gateway API
API stability	GA (stable, feature-frozen)	GA for core resources (GatewayClass, Gateway, HTTPRoute) since Gateway API v1.0
L4 TCP/UDP routing	No	Yes (TCPRoute, UDPRoute; experimental channel)
Traffic splitting	Via annotations (controller-specific)	Native (HTTPRoute weights)
Header-based routing	Via annotations	Native (HTTPRoute matches)
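mTLS	No	Partial (BackendTLSPolicy secures gateway-to-backend TLS; client certificate validation depends on Gateway API version and implementation)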
Role separation	Single object; no RBAC split	GatewayClass/Gateway/Route — infra vs app team split
Cross-namespace routing	No	Yes (ReferenceGrant)
Portability	Low (annotations differ per controller)	High (standard API)
Ecosystem maturity	Very mature; wide tooling support	Maturing; most major controllers now support it

Migration path

Ingress is not deprecated and there are no plans to remove it, but the API is frozen: new features go to Gateway API. New workloads with complex routing requirements should use Gateway API; existing Ingress deployments can remain as-is. See Gateway API for the full Gateway deep-dive.
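
To give a feel for the migration, the weighted canary that needs annotations under Ingress maps to a single HTTPRoute. This is a sketch under assumptions: the Gateway named shared-gateway and its infra namespace are hypothetical, and the Gateway must allow routes from the production namespace.

```yaml
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: api-canary
  namespace: production
spec:
  parentRefs:
  - name: shared-gateway       # hypothetical Gateway owned by the platform team
    namespace: infra           # hypothetical namespace
  hostnames:
  - api.example.com
  rules:
  - matches:
    - path:
        type: PathPrefix
        value: /
    backendRefs:
    - name: api-v1
      port: 80
      weight: 90               # weights are relative; 90/10 here
    - name: api-v2
      port: 80
      weight: 10
```

The split lives in the portable API rather than in controller-specific annotations, which is the main portability gain listed in the table above.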

Ingress Controller Selection Guide

Controller	Best For	Strengths	Limitations
NGINX (k8s-maintained)	General purpose; most environments	Massive ecosystem; stable; rich annotations; ModSecurity WAF	Config reload causes brief traffic interruption; complex at scale
Traefik	Dynamic environments; automatic HTTPS	Native Let's Encrypt; IngressRoute CRD; Middleware system; dashboard	More moving parts; CRD-heavy
HAProxy	High-performance L4+L7; financial services	Lowest latency; hot config reload (zero disruption); advanced ACLs	Less cloud-native; steeper learning curve
AWS ALB Controller	EKS on AWS	Native ALB; WAF integration; target-type=ip; no extra proxy hop	AWS-only; costs per ALB rule
GCE/GKE	GKE on GCP	Native GCP HTTPS LB; Cloud Armor; CDN; no controller to manage	GCP-only; limited customization
Contour (Envoy)	Envoy-based; Gateway API forward-looking	HTTPProxy CRD; Envoy performance; Gateway API support	Smaller ecosystem than NGINX
Kong	API gateway features needed	Plugin system; auth, rate limiting, transformations; enterprise support	Heavier; optional external datastore (DB-less mode avoids it)

Key Ingress Metrics (NGINX)

Metric	Type	Alert Threshold
`nginx_ingress_controller_requests`	counter	Sudden drop → controller issue
`nginx_ingress_controller_request_duration_seconds`	histogram	p99 > 2s → backend slow or overloaded
`nginx_ingress_controller_response_size`	histogram	Spike → large response anomaly
`nginx_ingress_controller_requests{status=~"5.."}`	counter	5xx rate > 1% of total requests
`nginx_ingress_controller_nginx_process_connections{state="active"}`	gauge	Near worker_connections limit → scale controller
`nginx_ingress_controller_config_last_reload_successful`	gauge	= 0 → config reload failed; routes may be stale
`nginx_ingress_controller_ssl_expire_time_seconds`	gauge	Within 30 days of expiry → certificate renewal needed

Alerting Rules

groups:
- name: ingress-nginx
  rules:
  - alert: IngressHighErrorRate
    expr: |
      sum(rate(nginx_ingress_controller_requests{status=~"5.."}[5m])) by (ingress, namespace)
      / sum(rate(nginx_ingress_controller_requests[5m])) by (ingress, namespace)
      > 0.01
    for: 5m
    labels:
      severity: critical
    annotations:
      summary: "Ingress {{ $labels.namespace }}/{{ $labels.ingress }} has >1% 5xx rate"

  - alert: IngressHighLatency
    expr: histogram_quantile(0.99, sum by (le, ingress, namespace) (rate(nginx_ingress_controller_request_duration_seconds_bucket[5m]))) > 2
    for: 5m
    labels:
      severity: warning
    annotations:
      summary: "Ingress p99 latency >2s"

  - alert: IngressConfigReloadFailed
    expr: nginx_ingress_controller_config_last_reload_successful == 0
    for: 2m
    labels:
      severity: critical
    annotations:
      summary: "NGINX config reload failed — Ingress rules may be stale"

  - alert: IngressCertExpirySoon
    expr: nginx_ingress_controller_ssl_expire_time_seconds - time() < 86400 * 30
    for: 1h
    labels:
      severity: warning
    annotations:
      summary: "TLS certificate expires in less than 30 days for {{ $labels.host }}"

Troubleshooting Runbooks

Runbook 1: 404 for all paths — Ingress not matching

# 1. Verify Ingress object exists and is picked up by controller
kubectl get ingress -n production
kubectl describe ingress web-ingress -n production
# Check "Address" field — if empty, controller hasn't assigned LB yet

# 2. Check IngressClass assignment
kubectl get ingress web-ingress -n production -o jsonpath='{.spec.ingressClassName}'
kubectl get ingressclass   # verify controller exists and matches

# 3. Check NGINX controller logs
kubectl logs -n ingress-nginx -l app.kubernetes.io/name=ingress-nginx --tail=100 | \
  grep -i "error\|warn\|ingress\|production"

# 4. Verify backend Service exists and has endpoints
kubectl get svc api-service -n production
kubectl get endpoints api-service -n production   # must have addresses

# 5. Test backend directly from controller pod
CONTROLLER=$(kubectl get pod -n ingress-nginx -l app.kubernetes.io/name=ingress-nginx -o jsonpath='{.items[0].metadata.name}')
kubectl exec -n ingress-nginx $CONTROLLER -- curl -s http://api-service.production:8080/healthz

# 6. Check generated NGINX config
kubectl exec -n ingress-nginx $CONTROLLER -- nginx -T | grep -A 20 "api.example.com"

Runbook 2: TLS certificate errors (ERR_CERT_COMMON_NAME_INVALID)

# 1. Check certificate is valid and covers the hostname
kubectl get secret example-tls -n production -o jsonpath='{.data.tls\.crt}' | \
  base64 -d | openssl x509 -noout -text | grep -E "Subject|DNS"

# 2. Check cert-manager Certificate status
kubectl get certificate -n production
kubectl describe certificate app-example-com-tls -n production
# Look for Events: "Successfully issued certificate" vs error messages

# 3. Check cert-manager challenge status (for ACME)
kubectl get challenges -n production
kubectl describe challenge -n production
# Common issue: HTTP-01 challenge blocked by firewall or ingress not routing /.well-known/

# 4. Check CertificateRequest and Order objects
kubectl get certificaterequest -n production
kubectl get order -n production

# 5. Verify Secret has correct keys
kubectl get secret example-tls -n production -o jsonpath='{.data}' | jq 'keys'
# Must have: tls.crt and tls.key

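# 6. Force certificate renewal
# Preferred, if the cmctl CLI is installed: re-issues while the current Secret stays in place
cmctl renew app-example-com-tls -n production
# Fallback: delete the Secret and let cert-manager re-issue (TLS for the host is broken until it completes)
# kubectl delete secret app-example-com-tls -n production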

Runbook 3: 502 Bad Gateway — backend not responding

# 1. Check pods behind the service
kubectl get pods -l app=api -n production
kubectl describe pod <crashing-pod> -n production

# 2. Test backend health directly
kubectl port-forward svc/api-service 8080:8080 -n production &
curl -v http://localhost:8080/healthz

# 3. Check NGINX upstream configuration
CONTROLLER=$(kubectl get pod -n ingress-nginx -l app.kubernetes.io/name=ingress-nginx -o jsonpath='{.items[0].metadata.name}')
kubectl exec -n ingress-nginx $CONTROLLER -- \
  curl -s http://localhost:10246/configuration/backends | jq '.[] | select(.name | contains("api"))'

# 4. Check proxy timeouts (backend may be slow)
# Timeouts normally surface as 504, not 502; rule them out if both codes appear
# If p99 latency ~60s → read timeout hit
# Increase: nginx.ingress.kubernetes.io/proxy-read-timeout: "120"

# 5. Check if service port name matches
kubectl get svc api-service -n production -o jsonpath='{.spec.ports[*].name}'
# Ingress backend port.name must match svc port.name (or use port number)
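
For reference, a name-based port reference pairs up as sketched below. The port name http, the selector label, and the Ingress name are assumptions chosen for illustration.

```yaml
# Sketch: the port name "http" and the selector label are assumptions.
apiVersion: v1
kind: Service
metadata:
  name: api-service
  namespace: production
spec:
  selector:
    app: api
  ports:
  - name: http              # the name an Ingress backend can reference
    port: 8080
    targetPort: 8080

---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: api-by-port-name    # hypothetical name
  namespace: production
spec:
  ingressClassName: nginx
  rules:
  - host: api.example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: api-service
            port:
              name: http    # must equal the Service port name; set either name or number, not both
```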

Runbook 4: NGINX config reload failing

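# 1. Check last reload status
CONTROLLER=$(kubectl get pod -n ingress-nginx -l app.kubernetes.io/name=ingress-nginx -o jsonpath='{.items[0].metadata.name}')
kubectl exec -n ingress-nginx $CONTROLLER -- \
  curl -s http://localhost:10254/metrics | grep config_last_reload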

# 2. Check for config syntax errors
kubectl logs -n ingress-nginx $CONTROLLER --tail=50 | grep "error\|NGINX config"

# 3. Common causes of reload failure:
# a) Invalid annotation (e.g. syntax error in configuration-snippet); list Ingresses that use one
kubectl get ingress --all-namespaces -o json | \
  jq -r '.items[] | select((.metadata.annotations // {}) | has("nginx.ingress.kubernetes.io/configuration-snippet")) | "\(.metadata.namespace)/\(.metadata.name)"'

# b) Same host claimed by more than one Ingress (conflicting paths or snippets can collide); list such hosts
kubectl get ingress --all-namespaces -o json | \
  jq '[.items[].spec.rules[]?.host] | group_by(.) | map(select(length>1) | .[0])'

# 4. Force reload
kubectl exec -n ingress-nginx $CONTROLLER -- nginx -s reload

Runbook 5: Ingress working but original client IP lost

# Symptom: app logs show NGINX pod IP, not client IP

# 1. Verify the X-Forwarded-For chain by sending a request through the Ingress
#    (assumes the app exposes a header-echo endpoint such as /debug/headers)
curl -s https://api.example.com/debug/headers
# Should show X-Forwarded-For: <real-client-ip>

# 2. Configure NGINX to trust upstream LB headers
# In ConfigMap: use-forwarded-headers: "true"
# For AWS NLB: use-proxy-protocol: "true" (proxy protocol must also be enabled on the NLB)
# Without an L7 LB in front: set externalTrafficPolicy: Local on the controller Service

# 3. For AWS ALB (either target type): the ALB terminates the client connection,
# so the client IP is only available in X-Forwarded-For

# 4. Ensure app reads X-Real-IP or X-Forwarded-For header
# NGINX sets: X-Real-IP = client address as resolved by NGINX, X-Forwarded-For = full chain

Production Best Practices

Reliability

Run ≥2 controller replicas with PodDisruptionBudget minAvailable:1
Use pod anti-affinity to spread replicas across nodes/zones
Set resource requests/limits to prevent OOMKill on traffic spikes
Enable admission webhook to catch config errors before apply
Use --default-backend-service for graceful 404 handling
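
The PodDisruptionBudget from the first item could be sketched as follows. The name, namespace, and selector labels are assumptions based on common ingress-nginx chart defaults; they must match the labels on your controller pods.

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: ingress-nginx-controller     # hypothetical name
  namespace: ingress-nginx
spec:
  minAvailable: 1                    # voluntary disruptions (drains, upgrades) must leave one replica running
  selector:
    matchLabels:
      app.kubernetes.io/name: ingress-nginx
      app.kubernetes.io/component: controller
```

With only two replicas, minAvailable: 1 still lets a node drain proceed one pod at a time.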

Security

Always enable HSTS headers and SSL redirect
Use cert-manager with Let's Encrypt — never commit private keys
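Restrict the configuration-snippet and server-snippet annotations (arbitrary NGINX config injection, e.g. CVE-2021-25742); recent releases disable them by default via allow-snippet-annotations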
Use IP allowlist annotations for admin paths
Enable ModSecurity WAF for public APIs
Rotate TLS secrets; alert on <30 day expiry

Performance

Tune worker-processes: auto and worker-connections
Enable keep-alive to backend pods (reduces TCP handshake overhead)
Use proxy-buffering: on for slow clients
Enable Brotli/gzip compression
Use HTTP/2 for TLS connections

Multi-Tenancy

Dedicate an IngressClass per tenant for hard isolation
Use --watch-namespace to scope each controller
Apply ResourceQuota on Ingress objects per namespace
Consider migrating to Gateway API for cleaner role separation
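
A per-namespace cap on Ingress objects uses the generic object-count quota syntax. In this sketch the quota name and the limit of 20 are hypothetical.

```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: ingress-quota                          # hypothetical name
  namespace: team-a
spec:
  hard:
    count/ingresses.networking.k8s.io: "20"    # hypothetical limit on Ingress objects in this namespace
```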
