Certificate Landscape in Kubernetes

A production Kubernetes cluster uses certificates at multiple layers — each with different issuers, lifetimes, and rotation mechanisms. Understanding the full landscape prevents surprise expirations that take down clusters or applications.

Certificate Layers in a Production Cluster
  Layer 1: Cluster PKI (managed by kubeadm / cloud provider)
  ├─ etcd peer CA  +  etcd server/client certs          (10 year CA, 1 year leaf)
  ├─ Kubernetes CA +  API server cert                   (10 year CA, 1 year leaf)
  ├─ Front-proxy CA                                     (aggregation layer)
  ├─ kubelet client cert (node identity to API server)  (auto-rotated by kubelet)
  └─ Service Account signing key pair                   (not a cert, RSA/ECDSA key)

  Layer 2: Ingress / Edge TLS (managed by cert-manager)
  ├─ Let's Encrypt ACME (public domains, 90-day, auto-renewed)
  └─ Internal CA (private domains, configurable lifetime)

  Layer 3: Workload mTLS (managed by service mesh)
  ├─ Istio SPIFFE certs (24-hour, auto-rotated by istiod)
  └─ SPIRE (external SPIFFE implementation for multi-cluster)

  Layer 4: Webhook & CRD server certs (managed by cert-manager)
  └─ Admission webhook TLS, CSR serving certs

Certificate | Issuer | Lifetime | Rotation | Impact if expired
API server TLS | Cluster CA | 1 year | Manual (kubeadm) / auto (EKS) | All kubectl commands fail
kubelet client cert | Cluster CA | 1 year | Auto (kubelet rotateCertificates) | Node goes NotReady
etcd server cert | etcd CA | 1 year | Manual (kubeadm / etcd restart) | Cluster total outage
Ingress TLS (Let's Encrypt) | ACME | 90 days | Auto (cert-manager, 30d before expiry) | Browser TLS errors
Ingress TLS (internal CA) | Internal CA | Configurable | Auto (cert-manager) | Internal service TLS errors
Webhook serving cert | cert-manager | 90 days | Auto (cert-manager) | Admission requests fail (pods rejected when failurePolicy is Fail)
Istio workload cert | istiod (SPIFFE) | 24 hours | Auto (Envoy SDS rotation) | mTLS between services fails

cert-manager Installation and Operation

cert-manager is the de facto standard for automating certificate issuance and renewal in Kubernetes. It provides Certificate, Issuer, ClusterIssuer, and CertificateRequest CRDs that integrate with Let's Encrypt ACME, HashiCorp Vault, and internal CAs.

Install via Helm

helm repo add jetstack https://charts.jetstack.io
helm repo update

helm upgrade --install cert-manager jetstack/cert-manager \
  --namespace cert-manager \
  --create-namespace \
  --version v1.16.3 \
  --set crds.enabled=true \
  --set replicaCount=2 \
  --set webhook.replicaCount=2 \
  --set cainjector.replicaCount=2 \
  --set prometheus.enabled=true \
  --set prometheus.servicemonitor.enabled=true \
  --set global.leaderElection.namespace=cert-manager

# Verify all cert-manager pods are running
kubectl get pods -n cert-manager

cert-manager Component Architecture

cert-manager Components
  cert-manager controller
    ├─ Watches Certificate, CertificateRequest, Order, Challenge CRDs
    ├─ Issues CertificateRequests to Issuers
    └─ Renews at spec.renewBefore before expiry; if unset, once 2/3 of the lifetime has elapsed (30 days before expiry for a 90-day cert)

  cert-manager webhook
    ├─ Validates Issuer/Certificate spec on admission
    └─ Defaults missing fields (duration, renewBefore)

  cert-manager cainjector
    ├─ Injects CA bundles into ValidatingWebhookConfiguration
    └─ Keeps MutatingWebhookConfiguration caBundle up to date

  Reconciliation loop per Certificate:
    Certificate → CertificateRequest → Order (ACME) → Challenge (HTTP-01 or DNS-01)
                                     → CSR (Vault/CA)
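
As a rough illustration of the renewal timing, the sketch below estimates when a renewal falls due from a certificate's validity window. The function name `renewal_epoch` and its arguments are hypothetical (this is not a cert-manager command). It assumes that an explicit renewBefore shorter than the lifetime is honored, and that renewal otherwise happens once 2/3 of the lifetime has elapsed.

```shell
#!/bin/sh
# Hypothetical helper (not part of cert-manager): estimate the renewal time.
# Args: not_before_epoch not_after_epoch [renew_before_seconds]
renewal_epoch() {
  not_before=$1
  not_after=$2
  renew_before=${3:-0}
  lifetime=$((not_after - not_before))
  if [ "$renew_before" -gt 0 ] && [ "$renew_before" -lt "$lifetime" ]; then
    # Explicit renewBefore: renew that long before expiry.
    echo $((not_after - renew_before))
  else
    # Assumed default: renew once 2/3 of the lifetime has elapsed.
    echo $((not_before + lifetime * 2 / 3))
  fi
}

# A 90-day certificate valid from epoch 0 has a lifetime of 7776000 seconds.
renewal_epoch 0 7776000            # prints 5184000 (day 60)
renewal_epoch 0 7776000 1296000    # prints 6480000 (day 75, renewBefore = 15 days)
```

The real value for a given Certificate is whatever the Renewal Time field in `kubectl describe certificate` reports; the helper is only for reasoning about the arithmetic.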

cert-manager Health Check

# Check all cert-manager pods
kubectl get pods -n cert-manager -o wide

# Check the cert-manager controller logs
kubectl logs -n cert-manager \
  -l app.kubernetes.io/component=controller \
  --tail=50 --follow

# Check the webhook is healthy (test with a dry-run cert)
kubectl apply -f - --dry-run=server <<'EOF'
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: test-cert
  namespace: default
spec:
  secretName: test-cert-tls
  issuerRef:
    name: letsencrypt-prod
    kind: ClusterIssuer
  dnsNames: ["test.example.com"]
EOF

# List all Certificates and their status across all namespaces
kubectl get certificates -A

# List failed CertificateRequests
kubectl get certificaterequests -A | grep -v True

# List all ACME Orders (pending challenges show here)
kubectl get orders -A

# List all ACME Challenges
kubectl get challenges -A

Issuers and ClusterIssuers

Issuer is namespace-scoped; ClusterIssuer is cluster-scoped and can issue certificates for any namespace. For shared infrastructure (Let's Encrypt, internal CA), always use ClusterIssuer.

Let's Encrypt ACME — HTTP-01 Challenge

HTTP-01 works by serving a token at http://<domain>/.well-known/acme-challenge/<token>. cert-manager creates a temporary Ingress, Service, and Pod to serve the challenge response. The domain must be publicly resolvable and reachable on port 80 from Let's Encrypt's validation servers.

# Staging issuer for testing (won't burn rate limits)
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-staging
spec:
  acme:
    server: https://acme-staging-v02.api.letsencrypt.org/directory
    email: ops@example.com
    privateKeySecretRef:
      name: letsencrypt-staging-key
    solvers:
    - http01:
        ingress:
          ingressClassName: nginx

---
# Production issuer
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: ops@example.com
    privateKeySecretRef:
      name: letsencrypt-prod-key
    solvers:
    - http01:
        ingress:
          ingressClassName: nginx
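
To make the HTTP-01 exchange concrete, here is a minimal sketch of the URL the validation server fetches and a format-only check of the response body. In ACME (RFC 8555) the body is the key authorization, which has the form `<token>.<account-key-thumbprint>`. The function names and the sample token and thumbprint values are hypothetical, and the check does not verify the thumbprint itself.

```shell
#!/bin/sh
# Hypothetical helpers for reasoning about HTTP-01; not part of cert-manager.

# Build the URL that the ACME server requests for a domain and token.
http01_url() {
  printf 'http://%s/.well-known/acme-challenge/%s\n' "$1" "$2"
}

# Format-only check: the body must be "<token>.<something non-empty>".
looks_like_key_auth() {
  token=$1
  body=$2
  case "$body" in
    "$token".?*) return 0 ;;
    *) return 1 ;;
  esac
}

http01_url payments.example.com abc123
if looks_like_key_auth abc123 "abc123.examplethumbprint"; then
  echo "response has the key authorization shape"
fi
```

A body that fails this check (an HTML error page or a redirect target, for example) usually means the request never reached the solver pod.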

Let's Encrypt ACME — DNS-01 Challenge (Route 53)

DNS-01 works by creating a TXT record at _acme-challenge.<domain>. It is required for wildcard certificates and also works for hosts that are not reachable over HTTP from the internet, provided the DNS zone itself is publicly resolvable. The example below uses IRSA (IAM Roles for Service Accounts) for Route 53 permissions.

apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-dns
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: ops@example.com
    privateKeySecretRef:
      name: letsencrypt-dns-key
    solvers:
    - dns01:
        route53:
          region: us-east-1
          hostedZoneID: Z1D633PJN98FT9    # optional — scopes to specific zone
          # Use IRSA — no access key needed
      selector:
        dnsZones:
        - "example.com"
        - "internal.example.com"

IAM policy for the role that cert-manager assumes via IRSA:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "route53:GetChange",
        "route53:ChangeResourceRecordSets",
        "route53:ListResourceRecordSets"
      ],
      "Resource": [
        "arn:aws:route53:::hostedzone/Z1D633PJN98FT9",
        "arn:aws:route53:::change/*"
      ]
    },
    {
      "Effect": "Allow",
      "Action": ["route53:ListHostedZonesByName"],
      "Resource": "*"
    }
  ]
}
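
When checking a DNS-01 challenge by hand, it helps to know exactly which record name to query. The sketch below derives it from a requested DNS name; the helper name `acme_txt_name` is hypothetical. A wildcard name is validated against its base domain, so the leading `*.` is dropped.

```shell
#!/bin/sh
# Hypothetical helper: name of the TXT record a DNS-01 solver must create.
acme_txt_name() {
  domain=$1
  # A wildcard validates against its base domain, so strip a leading "*.".
  domain=${domain#\*.}
  printf '_acme-challenge.%s\n' "$domain"
}

acme_txt_name '*.example.com'         # prints _acme-challenge.example.com
acme_txt_name internal.example.com    # prints _acme-challenge.internal.example.com
```

The result can be passed straight to a lookup, for example `dig +short TXT "$(acme_txt_name '*.example.com')"`.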

Internal CA ClusterIssuer (Self-Signed Root)

For internal services, cluster-internal mTLS, and webhook certs: create a self-signed root CA in a Secret, then use it as a CA Issuer.

# Step 1: Bootstrap — create a self-signed issuer
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: selfsigned-bootstrap
spec:
  selfSigned: {}

---
# Step 2: Issue a root CA certificate using the bootstrap issuer
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: internal-root-ca
  namespace: cert-manager
spec:
  isCA: true
  commonName: "Internal Kubernetes CA"
  secretName: internal-root-ca-secret
  duration: 87600h     # 10 years for root CA
  renewBefore: 720h    # renew 30 days before
  subject:
    organizations: ["Example Corp"]
  privateKey:
    algorithm: ECDSA
    size: 384
  issuerRef:
    name: selfsigned-bootstrap
    kind: ClusterIssuer

---
# Step 3: Create a CA ClusterIssuer using the root CA cert
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: internal-ca
spec:
  ca:
    secretName: internal-root-ca-secret

Certificate Resources

Certificate for a Workload Service

apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: payments-api-tls
  namespace: production
spec:
  secretName: payments-api-tls         # Secret created/updated by cert-manager
  duration: 2160h                      # 90 days
  renewBefore: 720h                    # renew 30 days before expiry
  subject:
    organizations: ["Example Corp"]
  commonName: payments-api.production.svc.cluster.local
  dnsNames:
  - payments-api.production.svc.cluster.local
  - payments-api.production.svc
  - payments-api
  - payments.example.com              # external domain if needed
  privateKey:
    algorithm: ECDSA
    size: 256
    rotationPolicy: Always            # generate new key on every renewal
  issuerRef:
    name: internal-ca
    kind: ClusterIssuer

Wildcard Certificate (DNS-01 Required)

apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: wildcard-example-com
  namespace: cert-manager
spec:
  secretName: wildcard-example-com-tls
  duration: 2160h
  renewBefore: 720h
  dnsNames:
  - "*.example.com"
  - "example.com"                      # SANs include bare domain too
  privateKey:
    algorithm: ECDSA
    size: 256
    rotationPolicy: Always
  issuerRef:
    name: letsencrypt-dns
    kind: ClusterIssuer
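
The reason the bare domain is listed next to the wildcard is that a `*` stands for exactly one leftmost DNS label. The sketch below models that matching rule; the function name `san_covers` is hypothetical, and it ignores case-insensitivity and other details a real TLS library handles.

```shell
#!/bin/sh
# Hypothetical helper: does a certificate SAN cover a hostname?
# A "*" wildcard stands for exactly one leftmost DNS label.
san_covers() {
  san=$1
  host=$2
  case "$san" in
    \*.*)
      base=${san#\*.}       # "example.com" for "*.example.com"
      label=${host%%.*}     # first label of the host
      rest=${host#*.}       # everything after the first dot
      # The host needs a non-empty first label followed by exactly the base domain.
      [ "$host" != "$rest" ] && [ -n "$label" ] && [ "$rest" = "$base" ]
      ;;
    *)
      [ "$san" = "$host" ]
      ;;
  esac
}

san_covers '*.example.com' admin.example.com && echo "admin.example.com covered"
san_covers '*.example.com' example.com       || echo "example.com needs its own SAN"
san_covers '*.example.com' a.b.example.com   || echo "a.b.example.com is not covered"
```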

Wildcard Certificate Distribution

Once the certificate is issued into a Secret in the cert-manager namespace, use a replication tool such as External Secrets Operator or kubernetes-reflector to copy the Secret to the namespaces that need it. Alternatively, issue per-namespace Certificates that reference the same ClusterIssuer. Note that each Certificate creates its own ACME Order, so every copy counts against Let's Encrypt rate limits.

Certificate Status Inspection

# Check a Certificate's status and renewal timeline
kubectl describe certificate payments-api-tls -n production
# Look for:
#   Status.Conditions: Ready=True
#   Status.NotBefore / NotAfter (expiry)
#   Status.RenewalTime (when cert-manager will auto-renew)

# Check the TLS Secret's actual expiry
kubectl get secret payments-api-tls -n production \
  -o jsonpath='{.data.tls\.crt}' | \
  base64 -d | \
  openssl x509 -noout -dates -subject -issuer

# Check all certs and days until expiry (bash one-liner)
kubectl get secrets -A \
  -o json | \
jq -r '.items[] | select(.type=="kubernetes.io/tls") |
  [.metadata.namespace, .metadata.name, (.data."tls.crt" // "")] |
  @tsv' | \
while IFS=$'\t' read ns name cert; do
  if [ -n "$cert" ]; then
    EXPIRY=$(echo "$cert" | base64 -d | openssl x509 -noout -enddate 2>/dev/null | cut -d= -f2)
    DAYS=$(( ($(date -d "$EXPIRY" +%s 2>/dev/null || date -jf "%b %d %T %Y %Z" "$EXPIRY" +%s 2>/dev/null) - $(date +%s)) / 86400 ))
    echo "$DAYS days | $ns/$name | expires $EXPIRY"
  fi
done | sort -n

Ingress TLS Automation

NGINX Ingress with cert-manager Annotation

The simplest integration: annotate an Ingress with the issuer name. cert-manager's ingress-shim watches for this annotation and automatically creates a Certificate resource.

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: payments-ingress
  namespace: production
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt-prod
    nginx.ingress.kubernetes.io/ssl-redirect: "true"
    nginx.ingress.kubernetes.io/force-ssl-redirect: "true"
spec:
  ingressClassName: nginx
  tls:
  - hosts:
    - payments.example.com
    secretName: payments-tls           # cert-manager creates this Secret
  rules:
  - host: payments.example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: payments-api
            port:
              number: 8080

Using a Pre-Issued Certificate Secret

For wildcard certs issued centrally (stored in cert-manager namespace), reference the replicated Secret directly:

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: admin-ingress
  namespace: production
spec:
  ingressClassName: nginx
  tls:
  - hosts:
    - admin.example.com
    secretName: wildcard-example-com-tls   # pre-existing wildcard cert Secret
  rules:
  - host: admin.example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: admin-ui
            port:
              number: 3000

ACME Challenge Debugging

# Find failing challenges
kubectl get challenges -A
kubectl describe challenge <challenge-name> -n <ns>

# For HTTP-01: verify the challenge URL is accessible
DOMAIN=payments.example.com
TOKEN=$(kubectl get challenge -n production -o jsonpath='{.items[0].spec.token}')
curl -vL "http://$DOMAIN/.well-known/acme-challenge/$TOKEN"
# Should return the key authorization string

# Check if cert-manager created the temporary solver pod/ingress
kubectl get pods -n production -l acme.cert-manager.io/http01-solver=true
kubectl get ingress -n production | grep cm-acme

# For DNS-01: verify the TXT record was created
dig +short TXT "_acme-challenge.$DOMAIN" @8.8.8.8

# Check cert-manager controller logs for ACME errors
kubectl logs -n cert-manager \
  -l app.kubernetes.io/component=controller \
  --tail=100 | grep -i "error\|acme\|challenge"

# Common HTTP-01 failures:
# - Firewall blocks port 80 from Let's Encrypt servers
# - Ingress class mismatch (solver uses wrong class)
# - Missing ingressClassName on ClusterIssuer solver

# Force re-issue a failed certificate
kubectl delete certificaterequest -n production \
  $(kubectl get certificaterequest -n production \
    -l cert-manager.io/certificate-name=payments-api-tls \
    -o jsonpath='{.items[0].metadata.name}')

Cluster PKI Certificate Management

EKS Managed Certificate Rotation

On EKS, AWS manages the Kubernetes CA and API server certificate. The kubelet client certificate is auto-rotated by the kubelet when rotateCertificates: true is set (covered in Security Hardening). The cluster CA itself can be rotated via EKS managed rotation.

# Check when EKS cluster CA expires
aws eks describe-cluster \
  --name <cluster-name> \
  --query 'cluster.certificateAuthority.data' \
  --output text | \
  base64 -d | \
  openssl x509 -noout -dates

# Check the kubelet client certificate on a node
# (the node debug pod mounts the host filesystem under /host)
kubectl debug node/<node-name> -it --image=nicolaka/netshoot -- \
  openssl x509 -in /host/var/lib/kubelet/pki/kubelet-client-current.pem -noout -dates

# Update API server endpoint access settings
# (this call changes endpoint access only; it does not rotate the cluster CA)
aws eks update-cluster-config \
  --name <cluster-name> \
  --resources-vpc-config endpointPrivateAccess=true,endpointPublicAccess=true

Self-Managed Cluster: kubeadm Certificate Renewal

# Check all certificate expiry dates (kubeadm clusters)
kubeadm certs check-expiration

# Sample output:
# CERTIFICATE                EXPIRES                  RESIDUAL TIME   ...
# admin.conf                 May 24, 2027 10:00 UTC   364d            OK
# apiserver                  May 24, 2027 10:00 UTC   364d            OK
# apiserver-etcd-client      May 24, 2027 10:00 UTC   364d            OK
# etcd-healthcheck-client    May 24, 2027 10:00 UTC   364d            OK

# Renew all certificates (run on each control plane node)
# This rotates all certs but does NOT restart components automatically
kubeadm certs renew all

# Restart control plane components to pick up new certs
# (static pods read certs only at startup; kubelet recreates each stopped container)
for c in kube-apiserver kube-controller-manager kube-scheduler etcd; do
  crictl ps --name "$c" -q | xargs -r crictl stop
done

# kubelet picks up the new client cert automatically (rotateCertificates)
# Update kubeconfig after renewal
cp /etc/kubernetes/admin.conf ~/.kube/config

Expired API Server Certificate = Total Cluster Lock-out

If the API server certificate expires, kubectl stops working entirely. You cannot use cert-manager or any in-cluster tooling to recover — you must have out-of-band node access (SSH, SSM) to run kubeadm certs renew all directly on the control plane. Set up expiry alerts with at least 30 days advance warning.

Certificate Rotation Strategies

Zero-Downtime Application Certificate Rotation

When cert-manager renews a Certificate, it updates the TLS Secret in-place. Applications must reload the certificate without restart — or Kubernetes must restart them. The two main strategies:

Strategy | How it works | Downtime | Best for
Volume mount (file watch) | App watches the cert file path and reloads on change; Kubernetes propagates Secret updates to mounted files within about a minute (subPath mounts are not updated) | None | NGINX, custom servers with reload support
Rolling restart via Reloader | Stakater Reloader watches Secrets and triggers a rolling Deployment restart when the TLS Secret changes | Rolling (pods replaced) | Apps without file-watch, any container
Manual restart annotation | Run kubectl rollout restart after renewal, which stamps a restartedAt annotation on the pod template | Rolling | Simple deployments
Envoy SDS (xDS) | Envoy dynamically fetches new certs via Secret Discovery Service; no restart needed | None | Istio-managed workloads
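
The file-watch strategy can be sketched as a small polling loop that reloads only when the mounted certificate content actually changes. Everything named here is an assumption for illustration: `watch_cert`, `cert_changed`, the 30-second interval, and the placeholder `reload_server` (in practice something like `nginx -s reload`).

```shell
#!/bin/sh
# Hypothetical sketch of the file-watch strategy; names are illustrative only.

# Fingerprint the certificate file. cksum is POSIX; sha256sum would also work.
cert_fingerprint() {
  cksum < "$1"
}

# Succeeds when the file content differs from the previous fingerprint.
cert_changed() {
  [ "$(cert_fingerprint "$1")" != "$2" ]
}

# Placeholder reload hook; replace with the server's real reload command.
reload_server() {
  echo "certificate changed, reloading"
}

# Poll the mounted certificate and reload on change (runs until interrupted).
watch_cert() {
  cert_file=$1
  last=$(cert_fingerprint "$cert_file")
  while sleep 30; do
    if cert_changed "$cert_file" "$last"; then
      last=$(cert_fingerprint "$cert_file")
      reload_server
    fi
  done
}

# Example invocation (not run here): watch_cert /etc/tls/tls.crt
```

Polling content rather than modification time avoids spurious reloads when the kubelet rewrites the volume without changing the certificate.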

Stakater Reloader — Auto-Restart on Secret Change

helm repo add stakater https://stakater.github.io/stakater-charts
helm upgrade --install reloader stakater/reloader \
  --namespace reloader \
  --create-namespace \
  --set reloader.watchGlobally=true   # watch Secrets in all namespaces, not only "reloader"

apiVersion: apps/v1
kind: Deployment
metadata:
  name: payments-api
  namespace: production
  annotations:
    # Trigger rolling restart when this Secret changes
    secret.reloader.stakater.com/reload: "payments-api-tls"
spec:
  # ...rest of deployment spec

Secret Rotation — Private Key Rotation Policy

apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: payments-api-tls
  namespace: production
spec:
  secretName: payments-api-tls
  privateKey:
    rotationPolicy: Always    # generate NEW private key on every renewal
    # vs "Never" — reuse same key (easier for cert pinning, less secure)
  duration: 2160h
  renewBefore: 720h
  # ...rest of spec

mTLS with Istio

Istio uses SPIFFE (Secure Production Identity Framework For Everyone) X.509 certificates for pod-to-pod mTLS. Every pod gets a SPIFFE identity: spiffe://<trust-domain>/ns/<namespace>/sa/<service-account>. Certs are 24-hour by default and rotated automatically by istiod.
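
As a small illustration of the identity format, the sketch below builds a SPIFFE ID from its three parts and derives the scheme-less form used in AuthorizationPolicy principals. The helper names `spiffe_id` and `istio_principal` are hypothetical; the sample values match the ones used elsewhere in this guide.

```shell
#!/bin/sh
# Hypothetical helpers: build the SPIFFE identity Istio assigns to a workload.
# Args: trust_domain namespace service_account
spiffe_id() {
  printf 'spiffe://%s/ns/%s/sa/%s\n' "$1" "$2" "$3"
}

# AuthorizationPolicy principals use the same identity without the scheme.
istio_principal() {
  id=$(spiffe_id "$1" "$2" "$3")
  printf '%s\n' "${id#spiffe://}"
}

spiffe_id cluster.local production payments-sa
# prints spiffe://cluster.local/ns/production/sa/payments-sa
istio_principal cluster.local production payments-sa
# prints cluster.local/ns/production/sa/payments-sa
```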

Enabling Strict mTLS Cluster-Wide

# Enforce mTLS for all services in the mesh (no plaintext)
apiVersion: security.istio.io/v1
kind: PeerAuthentication
metadata:
  name: default
  namespace: istio-system     # mesh-wide policy
spec:
  mtls:
    mode: STRICT

# Inspect the mTLS and policy settings that apply to a pod
istioctl x describe pod <pod-name> -n production

# Check a pod's SPIFFE identity
kubectl exec -n production <pod-name> -c istio-proxy -- \
  openssl s_client -connect <target-service>:<port> -showcerts 2>/dev/null | \
  openssl x509 -noout -text | grep -A2 "Subject Alternative"
# Should show: URI:spiffe://cluster.local/ns/production/sa/payments-sa

# Check Citadel (istiod) cert rotation interval
kubectl get configmap istio -n istio-system \
  -o jsonpath='{.data.mesh}' | grep -i "workloadCertTtl\|certTtl"

# Force new workload certs by restarting the workloads
# (restarting istiod alone does not reissue certs already held by sidecars)
kubectl rollout restart deployment -n production

AuthorizationPolicy for Service-to-Service mTLS

# Allow only the payments service account to reach the database service on port 5432
apiVersion: security.istio.io/v1
kind: AuthorizationPolicy
metadata:
  name: database-allow-payments
  namespace: production
spec:
  selector:
    matchLabels:
      app: database
  action: ALLOW
  rules:
  - from:
    - source:
        principals:
        - "cluster.local/ns/production/sa/payments-sa"
    to:
    - operation:
        ports: ["5432"]

HashiCorp Vault PKI Integration

For organizations requiring an enterprise CA, audit trail for all certificate issuances, or short-lived certificates (minutes not days), cert-manager integrates with Vault's PKI secrets engine.

cert-manager Vault ClusterIssuer

apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: vault-pki
spec:
  vault:
    server: https://vault.example.com
    path: pki_int/sign/kubernetes-role   # Vault PKI sign endpoint (the role lives on the pki_int mount)
    auth:
      kubernetes:
        mountPath: /v1/auth/kubernetes
        role: cert-manager
        secretRef:
          name: vault-cert-manager-token
          key: token

# Configure Vault PKI engine (one-time setup)
vault secrets enable pki
vault secrets tune -max-lease-ttl=87600h pki   # must cover the 10-year root CA TTL below

# Generate root CA inside Vault
vault write -field=certificate pki/root/generate/internal \
  common_name="example.com" \
  ttl=87600h > /tmp/vault-root-ca.crt

# Create an intermediate CA for Kubernetes
vault secrets enable -path=pki_int pki
vault secrets tune -max-lease-ttl=43800h pki_int   # must cover the 5-year intermediate TTL
vault write -format=json pki_int/intermediate/generate/internal \
  common_name="kubernetes.example.com Intermediate CA" | \
  jq -r '.data.csr' > /tmp/pki_int.csr

vault write -format=json pki/root/sign-intermediate \
  csr=@/tmp/pki_int.csr \
  format=pem_bundle \
  ttl=43800h | \
  jq -r '.data.certificate' > /tmp/signed_cert.pem

vault write pki_int/intermediate/set-signed certificate=@/tmp/signed_cert.pem

# Create a role for cert-manager
vault write pki_int/roles/kubernetes-role \
  allowed_domains="svc.cluster.local,example.com" \
  allow_subdomains=true \
  max_ttl=720h \
  require_cn=false

# Configure Kubernetes auth method in Vault
vault auth enable kubernetes
vault write auth/kubernetes/config \
  kubernetes_host="https://kubernetes.default.svc" \
  kubernetes_ca_cert=@/var/run/secrets/kubernetes.io/serviceaccount/ca.crt \
  token_reviewer_jwt=@/var/run/secrets/kubernetes.io/serviceaccount/token

vault write auth/kubernetes/role/cert-manager \
  bound_service_account_names=cert-manager \
  bound_service_account_namespaces=cert-manager \
  policies=pki-policy \
  ttl=1h

Troubleshooting Certificate Issues

Certificate Not Issuing / Stuck Pending

# Step 1: Check Certificate object status
kubectl describe certificate <name> -n <ns>
# Look for: Status.Conditions[] — Ready=False with Reason and Message

# Step 2: Check the CertificateRequest
kubectl get certificaterequest -n <ns> -l cert-manager.io/certificate-name=<name>
kubectl describe certificaterequest <cr-name> -n <ns>

# Step 3: For ACME — check Order and Challenge
kubectl get order -n <ns>
kubectl describe order <order-name> -n <ns>
kubectl get challenge -n <ns>
kubectl describe challenge <challenge-name> -n <ns>

# Step 4: Check cert-manager controller logs for errors
kubectl logs -n cert-manager \
  -l app.kubernetes.io/component=controller \
  --since=10m | grep -i "error\|failed"

# Step 5: Check Issuer/ClusterIssuer is Ready
kubectl get clusterissuer
# READY column should be True
kubectl describe clusterissuer letsencrypt-prod | grep -A5 Conditions

# Step 6: For DNS-01 — verify solver has Route 53 permissions
kubectl logs -n cert-manager \
  -l app.kubernetes.io/component=controller \
  --since=10m | grep -i "route53\|dns"

Certificate Expired — Emergency Recovery

# Force immediate reissuance (cert-manager normally won't reissue a valid cert)
# Method 1: temporarily raise spec.renewBefore so the renewal time is already past
# (the value must stay below spec.duration; 2159h assumes duration: 2160h.
#  Restore the original renewBefore once the new cert is issued.)
kubectl patch certificate <name> -n <ns> --type merge \
  -p '{"spec":{"renewBefore":"2159h"}}'

# Method 2: delete the Certificate's TLS Secret — cert-manager detects it's gone
# and immediately issues a new one (brief outage during reissue)
kubectl delete secret <tls-secret-name> -n <ns>

# Method 3: cmctl (cert-manager CLI tool)
cmctl renew <certificate-name> -n <ns>

# Install cmctl (released from the cert-manager/cmctl repository)
curl -fsSL https://github.com/cert-manager/cmctl/releases/latest/download/cmctl_linux_amd64 \
  -o /usr/local/bin/cmctl && chmod +x /usr/local/bin/cmctl

# Inspect one Certificate together with its CertificateRequest, Order, and events
cmctl status certificate <certificate-name> -n <ns>

Webhook Certificate Bootstrap Problem

Chicken-and-Egg: cert-manager Webhook Needs a Certificate to Issue Certificates

If cert-manager's webhook certificate expires or is missing, cert-manager's own webhook becomes unavailable, blocking all Certificate resource operations. The cainjector handles keeping the webhook CA bundle up to date. If you hit this state, delete the webhook temporarily: kubectl delete validatingwebhookconfiguration cert-manager-webhook, let cert-manager re-issue its own cert, then re-install cert-manager to restore the webhook.

TLS Debugging with openssl

# Test TLS handshake and certificate chain from outside the cluster
openssl s_client -connect payments.example.com:443 -showcerts </dev/null 2>/dev/null | \
  openssl x509 -noout -text | grep -E "Issuer:|Subject:|Not (Before|After)"

# Without external exposure: port-forward to the Service and test locally
kubectl port-forward svc/payments-api 8443:443 -n production &
openssl s_client -connect 127.0.0.1:8443 -servername payments-api \
  </dev/null 2>/dev/null | openssl x509 -noout -dates

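# Verify cert chain (both leaf and intermediates)
# (stdin is redirected so s_client exits after the handshake instead of waiting for input)
openssl s_client -connect payments.example.com:443 \
  -CAfile /etc/ssl/certs/ca-certificates.crt </dev/null 2>&1 | \
  grep -E "verify|Verification"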

# Check which SANs are in a certificate
kubectl get secret payments-api-tls -n production \
  -o jsonpath='{.data.tls\.crt}' | \
  base64 -d | \
  openssl x509 -noout -text | grep -A10 "Subject Alternative"

Certificate Alerting

apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: certificate-management-alerts
  namespace: monitoring
  labels:
    prometheus: kube-prometheus
    role: alert-rules
spec:
  groups:
  - name: cert-manager.certificates
    rules:

    - alert: CertificateExpiryWarning
      expr: |
        certmanager_certificate_expiration_timestamp_seconds
          - time() < 30 * 24 * 3600
      for: 1h
      labels:
        severity: warning
      annotations:
        summary: "Certificate expires in less than 30 days"
        description: "Certificate {{ $labels.name }} in namespace {{ $labels.namespace }} expires in {{ $value | humanizeDuration }}."

    - alert: CertificateExpiryCritical
      expr: |
        certmanager_certificate_expiration_timestamp_seconds
          - time() < 7 * 24 * 3600
      for: 5m
      labels:
        severity: critical
      annotations:
        summary: "Certificate expires in less than 7 days"
        description: "Certificate {{ $labels.name }} in namespace {{ $labels.namespace }} expires in {{ $value | humanizeDuration }}. Immediate renewal required."

    - alert: CertificateNotReady
      expr: |
        certmanager_certificate_ready_status{condition="False"} == 1
      for: 10m
      labels:
        severity: warning
      annotations:
        summary: "cert-manager Certificate is not Ready"
        description: "Certificate {{ $labels.name }} in namespace {{ $labels.namespace }} has been in a non-Ready state for > 10 minutes."

    - alert: CertificateRenewalFailed
      expr: |
        time() - certmanager_certificate_renewal_timestamp_seconds > 24 * 3600
      for: 1h
      labels:
        severity: warning
      annotations:
        summary: "cert-manager certificate renewal is overdue"
        description: "Certificate {{ $labels.name }} in namespace {{ $labels.namespace }} is more than 24 hours past its scheduled renewal time. Check cert-manager controller logs."

  - name: cert-manager.acme
    rules:

    - alert: ACMEOrderFailed
      expr: |
        increase(certmanager_http_acme_client_request_count{status=~"4..|5.."}[15m]) > 0
      for: 5m
      labels:
        severity: warning
      annotations:
        summary: "ACME order HTTP error responses"
        description: "cert-manager ACME client is receiving {{ $value }} error responses. Let's Encrypt rate limits or outage may be affecting certificate issuance."

  - name: cert-manager.tls-secrets
    rules:

    - alert: TLSSecretExpiringSoon
      # Assumes each Certificate's name equals its secretName.
      expr: |
        (
          label_replace(
            certmanager_certificate_expiration_timestamp_seconds,
            "secret", "$1", "name", "(.*)"
          )
            * on (namespace, secret) group_left()
          kube_secret_type{type="kubernetes.io/tls"}
        ) - time() < 14 * 24 * 3600
      for: 1h
      labels:
        severity: warning
      annotations:
        summary: "TLS Secret expires in less than 14 days"
        description: "TLS Secret {{ $labels.secret }} in {{ $labels.namespace }} expires soon."

  - name: cluster-pki
    rules:

    - alert: KubeAPIServerCertExpiringSoon
      expr: |
        apiserver_client_certificate_expiration_seconds_count{job="apiserver"} > 0
          and
        histogram_quantile(0.01, rate(apiserver_client_certificate_expiration_seconds_bucket{job="apiserver"}[5m]))
          < 7 * 24 * 3600
      labels:
        severity: critical
      annotations:
        summary: "Kubernetes API server client certificate expires in < 7 days"
        description: "A client certificate used with the API server expires in less than 7 days. Run `kubeadm certs renew all` on control plane nodes."

cert-manager Grafana Dashboard Key Metrics

# Total certificates by Ready status
count by (condition) (certmanager_certificate_ready_status)

# Certificates expiring within N days
count(
  certmanager_certificate_expiration_timestamp_seconds - time()
    < 30 * 24 * 3600
)

# ACME request rate by status
sum by (status) (
  rate(certmanager_http_acme_client_request_count[5m])
)

# Controller sync rate per controller
sum by (controller) (
  rate(certmanager_controller_sync_call_count[5m])
)

# Certificates reissued in the last 24h (expiry timestamp changed)
count(
  changes(certmanager_certificate_expiration_timestamp_seconds[24h]) > 0
)

Best Practices Summary

Use ClusterIssuers for Shared CAs

Always define Let's Encrypt and internal CA issuers as ClusterIssuer — namespace-scoped Issuer requires duplication per namespace and creates operational overhead.

Staging Before Production

Test all ACME configurations against letsencrypt-staging first. Let's Encrypt production enforces rate limits, including 5 duplicate certificates (the same exact set of names) per week and a cap on failed validations per hostname; a misconfigured challenge burns through these quickly.

rotationPolicy: Always

Set privateKey.rotationPolicy: Always on Certificate resources. Reusing the same private key on renewal means a compromised key remains usable across every renewed certificate; each renewal should generate a fresh key pair.

Alert at 30 Days, Not 7

Set warning alerts at 30 days before expiry and critical alerts at 7 days. cert-manager renews 90-day certificates 30 days before expiry by default, but DNS propagation delays, ACME outages, or rate limits can stall renewal for days. A 30-day warning gives time to intervene.
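
The same thresholds can be applied in ad-hoc scripts, for example when auditing certificates that Prometheus does not scrape. A minimal sketch follows; the function name `expiry_severity` is hypothetical, and the optional second argument (the current time) exists only to make the logic easy to check.

```shell
#!/bin/sh
# Hypothetical helper: classify a certificate by time remaining until expiry.
# Args: not_after_epoch [now_epoch]
expiry_severity() {
  not_after=$1
  now=${2:-$(date +%s)}
  remaining=$((not_after - now))
  if [ "$remaining" -lt $((7 * 86400)) ]; then
    echo critical     # under 7 days left (or already expired)
  elif [ "$remaining" -lt $((30 * 86400)) ]; then
    echo warning      # under 30 days left
  else
    echo ok
  fi
}

expiry_severity $((45 * 86400)) 0   # prints ok
expiry_severity $((20 * 86400)) 0   # prints warning
expiry_severity $((6 * 86400)) 0    # prints critical
```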

DNS-01 for Internal Domains

Use DNS-01 challenge for any domain that is not publicly reachable via HTTP — internal services, VPN-only dashboards, staging environments. HTTP-01 requires port 80 to be publicly accessible from Let's Encrypt servers.

Audit Cluster PKI Annually

Run kubeadm certs check-expiration at least annually on self-managed clusters. The cluster CA is 10 years, but leaf certs (API server, etcd) are 1 year. A missed annual renewal causes a hard cluster outage.