Certificate Management
Managing TLS certificates for workloads, ingress, cluster components, and service mesh mTLS — with automated issuance, renewal, and rotation using cert-manager.
Certificate Landscape in Kubernetes
A production Kubernetes cluster uses certificates at multiple layers — each with different issuers, lifetimes, and rotation mechanisms. Understanding the full landscape prevents surprise expirations that take down clusters or applications.
Layer 1: Cluster PKI (managed by kubeadm / cloud provider)
├─ etcd peer CA + etcd server/client certs (10 year CA, 1 year leaf)
├─ Kubernetes CA + API server cert (10 year CA, 1 year leaf)
├─ Front-proxy CA (aggregation layer)
├─ kubelet client cert (node identity to API server) (auto-rotated by kubelet)
└─ Service Account signing key pair (not a cert, RSA/ECDSA key)
Layer 2: Ingress / Edge TLS (managed by cert-manager)
├─ Let's Encrypt ACME (public domains, 90-day, auto-renewed)
└─ Internal CA (private domains, configurable lifetime)
Layer 3: Workload mTLS (managed by service mesh)
├─ Istio SPIFFE certs (24-hour, auto-rotated by istiod)
└─ SPIRE (external SPIFFE implementation for multi-cluster)
Layer 4: Webhook & CRD server certs (managed by cert-manager)
└─ Admission webhook TLS, CSR serving certs
| Certificate | Issuer | Lifetime | Rotation | Impact if expired |
|---|---|---|---|---|
| API server TLS | Cluster CA | 1 year | Manual (kubeadm) / auto (EKS) | All kubectl commands fail |
| kubelet client cert | Cluster CA | 1 year | Auto (kubelet rotateCertificates) | Node goes NotReady |
| etcd server cert | etcd CA | 1 year | Manual (kubeadm / etcd restart) | Cluster total outage |
| Ingress TLS (Let's Encrypt) | ACME | 90 days | Auto (cert-manager, 30d before expiry) | Browser TLS errors |
| Ingress TLS (internal CA) | Internal CA | Configurable | Auto (cert-manager) | Internal service TLS errors |
| Webhook serving cert | cert-manager | 90 days | Auto (cert-manager) | Admission webhooks reject all pods |
| Istio workload cert | istiod (SPIFFE) | 24 hours | Auto (Envoy SDS rotation) | mTLS between services fails |
cert-manager Installation and Operation
cert-manager is the de facto standard for automating certificate issuance and renewal in Kubernetes. It provides Certificate, Issuer, ClusterIssuer, and CertificateRequest CRDs that integrate with Let's Encrypt ACME, HashiCorp Vault, and internal CAs.
Install via Helm
helm repo add jetstack https://charts.jetstack.io
helm repo update
helm upgrade --install cert-manager jetstack/cert-manager \
--namespace cert-manager \
--create-namespace \
--version v1.16.3 \
--set crds.enabled=true \
--set replicaCount=2 \
--set webhook.replicaCount=2 \
--set cainjector.replicaCount=2 \
--set prometheus.enabled=true \
--set prometheus.servicemonitor.enabled=true \
--set global.leaderElection.namespace=cert-manager
# Verify all cert-manager pods are running
kubectl get pods -n cert-manager
cert-manager Component Architecture
cert-manager controller
├─ Watches Certificate, CertificateRequest, Order, Challenge CRDs
├─ Issues CertificateRequests to Issuers
└─ Renews certs at 2/3 of their lifetime by default (30 days before expiry for a 90-day cert), or at the time set by spec.renewBefore
cert-manager webhook
├─ Validates Issuer/Certificate spec on admission
└─ Defaults missing fields (duration, renewBefore)
cert-manager cainjector
├─ Injects CA bundles into ValidatingWebhookConfiguration
└─ Keeps MutatingWebhookConfiguration caBundle up to date
Reconciliation loop per Certificate:
Certificate → CertificateRequest → Order (ACME) → Challenge (HTTP-01 or DNS-01)
→ CSR (Vault/CA)
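The renewal timing can be sketched as a small calculation. This is a minimal illustration, not cert-manager's code: the function name renew_after_hours is hypothetical, and it assumes the default of renewing once two-thirds of the lifetime has elapsed when renewBefore is not set.

```shell
#!/usr/bin/env bash
# renew_after_hours DURATION_H [RENEW_BEFORE_H]
# Prints how many hours after issuance a renewal would be triggered.
# Assumption: with no renewBefore, renewal happens at 2/3 of the lifetime,
# which is the same as renewBefore = duration / 3.
renew_after_hours() {
  local duration=$1 renew_before=${2:-}
  if [ -z "$renew_before" ]; then
    renew_before=$(( duration / 3 ))
  fi
  echo $(( duration - renew_before ))
}

renew_after_hours 2160        # 90-day cert, default: 1440 (day 60)
renew_after_hours 87600 720   # 10-year root CA, renewBefore 720h: 86880
renew_after_hours 24          # 24-hour cert, default: 16
```

For short-lived certificates the default leaves only a few hours of slack, which is why an explicit renewBefore is worth setting on anything with an unusual lifetime.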
cert-manager Health Check
# Check all cert-manager pods
kubectl get pods -n cert-manager -o wide
# Check the cert-manager controller logs
kubectl logs -n cert-manager \
-l app.kubernetes.io/component=controller \
--tail=50 --follow
# Check the webhook is healthy (test with a dry-run cert)
kubectl apply -f - --dry-run=server <<'EOF'
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
name: test-cert
namespace: default
spec:
secretName: test-cert-tls
issuerRef:
name: letsencrypt-prod
kind: ClusterIssuer
dnsNames: ["test.example.com"]
EOF
# List all Certificates and their status across all namespaces
kubectl get certificates -A
# List failed CertificateRequests
kubectl get certificaterequests -A | grep -v True
# List all ACME Orders (pending challenges show here)
kubectl get orders -A
# List all ACME Challenges
kubectl get challenges -A
Issuers and ClusterIssuers
Issuer is namespace-scoped; ClusterIssuer is cluster-scoped and can issue certificates for any namespace. For shared infrastructure (Let's Encrypt, internal CA), always use ClusterIssuer.
Let's Encrypt ACME — HTTP-01 Challenge
HTTP-01 works by placing a file at http://<domain>/.well-known/acme-challenge/<token>. cert-manager creates a temporary Ingress/Service/Pod to serve the challenge response. Requires the domain to be publicly resolvable and port 80 accessible.
# Staging issuer for testing (won't burn rate limits)
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
name: letsencrypt-staging
spec:
acme:
server: https://acme-staging-v02.api.letsencrypt.org/directory
email: ops@example.com
privateKeySecretRef:
name: letsencrypt-staging-key
solvers:
- http01:
ingress:
ingressClassName: nginx
---
# Production issuer
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
name: letsencrypt-prod
spec:
acme:
server: https://acme-v02.api.letsencrypt.org/directory
email: ops@example.com
privateKeySecretRef:
name: letsencrypt-prod-key
solvers:
- http01:
ingress:
ingressClassName: nginx
Let's Encrypt ACME — DNS-01 Challenge (Route 53)
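DNS-01 works by creating a TXT record at _acme-challenge.<domain>. It is required for wildcard certificates and also works for hosts that are not reachable over HTTP, provided the DNS zone itself is publicly resolvable so Let's Encrypt can query the record. The example assumes IRSA (IAM Roles for Service Accounts) for Route 53 permissions.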
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
name: letsencrypt-dns
spec:
acme:
server: https://acme-v02.api.letsencrypt.org/directory
email: ops@example.com
privateKeySecretRef:
name: letsencrypt-dns-key
solvers:
- dns01:
route53:
region: us-east-1
hostedZoneID: Z1D633PJN98FT9 # optional — scopes to specific zone
# Use IRSA — no access key needed
selector:
dnsZones:
- "example.com"
- "internal.example.com"
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": [
"route53:GetChange",
"route53:ChangeResourceRecordSets",
"route53:ListResourceRecordSets"
],
"Resource": [
"arn:aws:route53:::hostedzone/Z1D633PJN98FT9",
"arn:aws:route53:::change/*"
]
},
{
"Effect": "Allow",
"Action": ["route53:ListHostedZonesByName"],
"Resource": "*"
}
]
}
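When debugging DNS-01, it helps to know exactly which record name to query. The sketch below derives it from a dnsNames entry; the helper name acme_txt_name is hypothetical. A leading wildcard label is dropped, so "*.example.com" and "example.com" are validated through the same record name.

```shell
#!/usr/bin/env bash
# acme_txt_name DNSNAME
# Prints the TXT record name used for DNS-01 validation of DNSNAME.
# A leading "*." is removed before "_acme-challenge." is prepended.
acme_txt_name() {
  local name=${1#\*.}
  echo "_acme-challenge.${name}"
}

acme_txt_name "*.example.com"             # _acme-challenge.example.com
acme_txt_name "example.com"               # _acme-challenge.example.com
acme_txt_name "api.internal.example.com"  # _acme-challenge.api.internal.example.com
```

The output can be passed straight to dig, for example: dig +short TXT "$(acme_txt_name '*.example.com')".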
Internal CA ClusterIssuer (Self-Signed Root)
For internal services, cluster-internal mTLS, and webhook certs: create a self-signed root CA in a Secret, then use it as a CA Issuer.
# Step 1: Bootstrap — create a self-signed issuer
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
name: selfsigned-bootstrap
spec:
selfSigned: {}
---
# Step 2: Issue a root CA certificate using the bootstrap issuer
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
name: internal-root-ca
namespace: cert-manager
spec:
isCA: true
commonName: "Internal Kubernetes CA"
secretName: internal-root-ca-secret
duration: 87600h # 10 years for root CA
renewBefore: 720h # renew 30 days before
subject:
organizations: ["Example Corp"]
privateKey:
algorithm: ECDSA
size: 384
issuerRef:
name: selfsigned-bootstrap
kind: ClusterIssuer
---
# Step 3: Create a CA ClusterIssuer using the root CA cert
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
name: internal-ca
spec:
ca:
secretName: internal-root-ca-secret
Certificate Resources
Certificate for a Workload Service
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
name: payments-api-tls
namespace: production
spec:
secretName: payments-api-tls # Secret created/updated by cert-manager
duration: 2160h # 90 days
renewBefore: 720h # renew 30 days before expiry
subject:
organizations: ["Example Corp"]
commonName: payments-api.production.svc.cluster.local
dnsNames:
- payments-api.production.svc.cluster.local
- payments-api.production.svc
- payments-api
- payments.example.com # external domain if needed
privateKey:
algorithm: ECDSA
size: 256
rotationPolicy: Always # generate new key on every renewal
issuerRef:
name: internal-ca
kind: ClusterIssuer
Wildcard Certificate (DNS-01 Required)
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
name: wildcard-example-com
namespace: cert-manager
spec:
secretName: wildcard-example-com-tls
duration: 2160h
renewBefore: 720h
dnsNames:
- "*.example.com"
- "example.com" # SANs include bare domain too
privateKey:
algorithm: ECDSA
size: 256
rotationPolicy: Always
issuerRef:
name: letsencrypt-dns
kind: ClusterIssuer
Once issued into a Secret in the cert-manager namespace, use External Secrets Operator or kubernetes-reflector to replicate the Secret to namespaces that need it. Alternatively, issue per-namespace Certificates referencing the same ClusterIssuer. Note that cert-manager does not deduplicate these: each Certificate places its own ACME Order, and every Order counts against Let's Encrypt rate limits.
Certificate Status Inspection
# Check a Certificate's status and renewal timeline
kubectl describe certificate payments-api-tls -n production
# Look for:
# Status.Conditions: Ready=True
# Status.NotBefore / NotAfter (expiry)
# Status.RenewalTime (when cert-manager will auto-renew)
# Check the TLS Secret's actual expiry
kubectl get secret payments-api-tls -n production \
-o jsonpath='{.data.tls\.crt}' | \
base64 -d | \
openssl x509 -noout -dates -subject -issuer
# Check all certs and days until expiry (bash one-liner)
kubectl get secrets -A \
-o json | \
jq -r '.items[] | select(.type=="kubernetes.io/tls") |
[.metadata.namespace, .metadata.name, (.data."tls.crt" // "")] |
@tsv' | \
while IFS=$'\t' read ns name cert; do
if [ -n "$cert" ]; then
EXPIRY=$(echo "$cert" | base64 -d | openssl x509 -noout -enddate 2>/dev/null | cut -d= -f2)
DAYS=$(( ($(date -d "$EXPIRY" +%s 2>/dev/null || date -jf "%b %d %T %Y %Z" "$EXPIRY" +%s 2>/dev/null) - $(date +%s)) / 86400 ))
echo "$DAYS days | $ns/$name | expires $EXPIRY"
fi
done | sort -n
Ingress TLS Automation
NGINX Ingress with cert-manager Annotation
The simplest integration: annotate an Ingress with the issuer name. cert-manager's ingress-shim watches for this annotation and automatically creates a Certificate resource.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
name: payments-ingress
namespace: production
annotations:
cert-manager.io/cluster-issuer: letsencrypt-prod
nginx.ingress.kubernetes.io/ssl-redirect: "true"
nginx.ingress.kubernetes.io/force-ssl-redirect: "true"
spec:
ingressClassName: nginx
tls:
- hosts:
- payments.example.com
secretName: payments-tls # cert-manager creates this Secret
rules:
- host: payments.example.com
http:
paths:
- path: /
pathType: Prefix
backend:
service:
name: payments-api
port:
number: 8080
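To make the ingress-shim behaviour concrete, the sketch below prints roughly the Certificate it would derive from the annotated Ingress above. It is an approximation under stated assumptions: the helper name shim_certificate is hypothetical, the Certificate is assumed to be named after the tls secretName, and fields such as owner references are left out.

```shell
#!/usr/bin/env bash
# shim_certificate NAMESPACE SECRET_NAME CLUSTER_ISSUER HOST...
# Prints an approximation of the Certificate that ingress-shim generates
# from an Ingress carrying the cert-manager.io/cluster-issuer annotation.
shim_certificate() {
  local ns=$1 secret=$2 issuer=$3; shift 3
  cat <<EOF
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: ${secret}
  namespace: ${ns}
spec:
  secretName: ${secret}
  dnsNames:
EOF
  local host
  for host in "$@"; do
    echo "    - ${host}"      # one entry per host in the Ingress tls block
  done
  cat <<EOF
  issuerRef:
    name: ${issuer}
    kind: ClusterIssuer
EOF
}

shim_certificate production payments-tls letsencrypt-prod payments.example.com
```

Comparing this output with kubectl get certificate payments-tls -n production -o yaml is a quick way to confirm the annotation was picked up.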
Using a Pre-Issued Certificate Secret
For wildcard certs issued centrally (stored in cert-manager namespace), reference the replicated Secret directly:
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
name: admin-ingress
namespace: production
spec:
ingressClassName: nginx
tls:
- hosts:
- admin.example.com
secretName: wildcard-example-com-tls # pre-existing wildcard cert Secret
rules:
- host: admin.example.com
http:
paths:
- path: /
pathType: Prefix
backend:
service:
name: admin-ui
port:
number: 3000
ACME Challenge Debugging
# Find failing challenges
kubectl get challenges -A
kubectl describe challenge <challenge-name> -n <ns>
# For HTTP-01: verify the challenge URL is accessible
DOMAIN=payments.example.com
TOKEN=$(kubectl get challenge -n production -o jsonpath='{.items[0].spec.token}')
curl -vL "http://$DOMAIN/.well-known/acme-challenge/$TOKEN"
# Should return the key authorization string
# Check if cert-manager created the temporary solver pod/ingress
kubectl get pods -n production -l acme.cert-manager.io/http01-solver=true
kubectl get ingress -n production | grep cm-acme
# For DNS-01: verify the TXT record was created
dig +short TXT "_acme-challenge.$DOMAIN" @8.8.8.8
# Check cert-manager controller logs for ACME errors
kubectl logs -n cert-manager \
-l app.kubernetes.io/component=controller \
--tail=100 | grep -i "error\|acme\|challenge"
# Common HTTP-01 failures:
# - Firewall blocks port 80 from Let's Encrypt servers
# - Ingress class mismatch (solver uses wrong class)
# - Missing ingressClassName on ClusterIssuer solver
# Force re-issue a failed certificate
kubectl delete certificaterequest -n production \
$(kubectl get certificaterequest -n production \
-l cert-manager.io/certificate-name=payments-api-tls \
-o jsonpath='{.items[0].metadata.name}')
Cluster PKI Certificate Management
EKS Managed Certificate Rotation
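On EKS, AWS manages the Kubernetes CA and the API server certificate, including renewal of the API server certificate. The kubelet client certificate is auto-rotated by the kubelet when rotateCertificates: true is set (covered in Security Hardening). The cluster CA is long-lived; check its expiry with the command below and consult current AWS documentation for the supported rotation procedure.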
# Check when EKS cluster CA expires
aws eks describe-cluster \
--name <cluster-name> \
--query 'cluster.certificateAuthority.data' \
--output text | \
base64 -d | \
openssl x509 -noout -dates
# Check the kubelet client certificate on a node
# (kubectl debug mounts the node's root filesystem at /host)
kubectl debug node/<node-name> -it --image=nicolaka/netshoot -- \
openssl x509 -in /host/var/lib/kubelet/pki/kubelet-client-current.pem -noout -dates
# Note: `aws eks update-cluster-config --resources-vpc-config ...` only changes
# API endpoint access settings; it does not rotate the cluster CA
Self-Managed Cluster: kubeadm Certificate Renewal
# Check all certificate expiry dates (kubeadm clusters)
kubeadm certs check-expiration
# Sample output:
# CERTIFICATE EXPIRES RESIDUAL TIME ...
# admin.conf May 24, 2027 10:00 UTC 364d OK
# apiserver May 24, 2027 10:00 UTC 364d OK
# apiserver-etcd-client May 24, 2027 10:00 UTC 364d OK
# etcd-healthcheck-client May 24, 2027 10:00 UTC 364d OK
# Renew all certificates (run on each control plane node)
# This rotates all certs but does NOT restart components automatically
kubeadm certs renew all
# Restart control plane components so they load the new certs
# (static pods read certs at startup; the kubelet recreates removed containers)
crictl rm -f $(crictl ps -q --name kube-apiserver)
crictl rm -f $(crictl ps -q --name kube-controller-manager)
crictl rm -f $(crictl ps -q --name kube-scheduler)
crictl rm -f $(crictl ps -q --name etcd)
# kubeadm does not renew the kubelet client cert; the kubelet rotates it itself (rotateCertificates)
# Update kubeconfig after renewal
cp /etc/kubernetes/admin.conf ~/.kube/config
If the API server certificate expires, kubectl stops working entirely. You cannot use cert-manager or any in-cluster tooling to recover — you must have out-of-band node access (SSH, SSM) to run kubeadm certs renew all directly on the control plane. Set up expiry alerts with at least 30 days advance warning.
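Because in-cluster alerting cannot help once the API server certificate is gone, a node-local check is a useful backstop. The sketch below filters kubeadm certs check-expiration output for rows below a threshold. The function name expiring_certs is hypothetical, and the parsing assumes residual times look like "364d", "23h", or "<invalid>", as in the sample output above; verify against the output of your kubeadm version.

```shell
#!/usr/bin/env bash
# expiring_certs THRESHOLD_DAYS
# Reads `kubeadm certs check-expiration` output on stdin and prints
# "name days" for each certificate with fewer than THRESHOLD_DAYS left.
# Assumed residual formats: "<N>d" days, "<N>h|m|s" under a day (printed as 0),
# "<invalid>" already expired. Values in years ("9y") are ignored.
expiring_certs() {
  awk -v max="$1" '
    {
      for (i = 2; i <= NF; i++) {
        if ($i == "<invalid>")    { print $1, "expired"; break }
        if ($i ~ /^[0-9]+[smh]$/) { print $1, 0; break }
        if ($i ~ /^[0-9]+d$/) {
          days = $i + 0            # numeric prefix of e.g. "364d"
          if (days < max) print $1, days
          break
        }
      }
    }'
}

# Usage on a control plane node, for example from cron with a 30-day threshold:
#   kubeadm certs check-expiration | expiring_certs 30
```

Any non-empty output can then be forwarded to whatever out-of-band notification channel the team already uses.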
Certificate Rotation Strategies
Zero-Downtime Application Certificate Rotation
When cert-manager renews a Certificate, it updates the TLS Secret in-place. Applications must either reload the certificate without a restart or be restarted by Kubernetes. The main strategies:
| Strategy | How it works | Downtime | Best for |
|---|---|---|---|
| Volume mount (file watch) | App watches cert file path and reloads on change; K8s propagates Secret updates to mounted files within ~1 minute | None | NGINX, custom servers with reload support |
| Rolling restart via Reloader | Stakater Reloader watches Secrets; triggers rolling Deployment restart when TLS Secret changes | Rolling (pods replaced) | Apps without file-watch, any container |
| Manual rolling restart | Run kubectl rollout restart after renewal, which stamps a restartedAt annotation on the pod template | Rolling | Simple deployments |
| Envoy SDS (xDS) | Envoy dynamically fetches new certs via Secret Discovery Service — no restart needed | None | Istio-managed workloads |
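The file-watch strategy can be sketched as a small polling loop for a sidecar or entrypoint script. This is an illustration only: the function names, the path, and the reload command in the usage comment are hypothetical. It compares file contents rather than watching the file itself, because the kubelet updates Secret volumes by swapping a symlink, and it will not see updates for Secrets mounted with subPath.

```shell
#!/usr/bin/env bash
# fingerprint FILE: print a checksum of the file's current contents.
fingerprint() { cksum < "$1"; }

# watch_cert FILE INTERVAL_SECONDS CMD [ARGS...]
# Polls FILE and runs CMD each time its contents change. Runs until killed.
watch_cert() {
  local file=$1 interval=$2; shift 2
  local last now
  last=$(fingerprint "$file")
  while sleep "$interval"; do
    now=$(fingerprint "$file")
    if [ "$now" != "$last" ]; then
      last=$now
      "$@"                      # e.g. send the server its reload signal
    fi
  done
}

# Hypothetical usage in an NGINX container:
#   watch_cert /etc/tls/tls.crt 30 nginx -s reload &
```

Polling every 30 seconds adds at most that much delay on top of the kubelet's own propagation time, which is negligible against a 30-day renewal window.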
Stakater Reloader — Auto-Restart on Secret Change
helm repo add stakater https://stakater.github.io/stakater-charts
helm upgrade --install reloader stakater/reloader \
--namespace reloader \
--create-namespace \
--set reloader.watchGlobally=false
apiVersion: apps/v1
kind: Deployment
metadata:
name: payments-api
namespace: production
annotations:
# Trigger rolling restart when this Secret changes
secret.reloader.stakater.com/reload: "payments-api-tls"
spec:
# ...rest of deployment spec
Secret Rotation — Private Key Rotation Policy
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
name: payments-api-tls
namespace: production
spec:
secretName: payments-api-tls
privateKey:
rotationPolicy: Always # generate NEW private key on every renewal
# vs "Never" — reuse same key (easier for cert pinning, less secure)
duration: 2160h
renewBefore: 720h
# ...rest of spec
mTLS with Istio
Istio uses SPIFFE (Secure Production Identity Framework For Everyone) X.509 certificates for pod-to-pod mTLS. Every pod gets a SPIFFE identity: spiffe://<trust-domain>/ns/<namespace>/sa/<service-account>. Certs are 24-hour by default and rotated automatically by istiod.
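The identity format is easy to get wrong when writing policies by hand, so here is a minimal sketch that builds it. The helper names spiffe_id and istio_principal are hypothetical. The second helper strips the spiffe:// scheme, which is the form Istio expects in an AuthorizationPolicy source.principals entry.

```shell
#!/usr/bin/env bash
# spiffe_id TRUST_DOMAIN NAMESPACE SERVICE_ACCOUNT
# Prints the SPIFFE URI that appears in the workload certificate's SAN.
spiffe_id() { echo "spiffe://$1/ns/$2/sa/$3"; }

# istio_principal SPIFFE_ID
# Prints the same identity without the scheme, as used in source.principals.
istio_principal() { echo "${1#spiffe://}"; }

id=$(spiffe_id cluster.local production payments-sa)
echo "$id"               # spiffe://cluster.local/ns/production/sa/payments-sa
istio_principal "$id"    # cluster.local/ns/production/sa/payments-sa
```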
Enabling Strict mTLS Cluster-Wide
# Enforce mTLS for all services in the mesh (no plaintext)
apiVersion: security.istio.io/v1
kind: PeerAuthentication
metadata:
name: default
namespace: istio-system # mesh-wide policy
spec:
mtls:
mode: STRICT
# Inspect the mTLS mode and policies that apply to a pod
istioctl x describe pod <pod-name> -n production
# Check a pod's SPIFFE identity
kubectl exec -n production <pod-name> -c istio-proxy -- \
openssl s_client -connect <target-service>:<port> -showcerts 2>/dev/null | \
openssl x509 -noout -text | grep -A2 "Subject Alternative"
# Should show: URI:spiffe://cluster.local/ns/production/sa/payments-sa
# Check Citadel (istiod) cert rotation interval
kubectl get configmap istio -n istio-system \
-o jsonpath='{.data.mesh}' | grep -i "workloadCertTtl\|certTtl"
# Force new workload certs by restarting the workloads; restarting istiod alone
# does not reissue certs already held by the sidecars
kubectl rollout restart deployment -n production
AuthorizationPolicy for Service-to-Service mTLS
# Allow only the payments service account to reach the database service on port 5432
apiVersion: security.istio.io/v1
kind: AuthorizationPolicy
metadata:
name: database-allow-payments
namespace: production
spec:
selector:
matchLabels:
app: database
action: ALLOW
rules:
- from:
- source:
principals:
- "cluster.local/ns/production/sa/payments-sa"
to:
- operation:
ports: ["5432"]
HashiCorp Vault PKI Integration
For organizations requiring an enterprise CA, audit trail for all certificate issuances, or short-lived certificates (minutes not days), cert-manager integrates with Vault's PKI secrets engine.
cert-manager Vault ClusterIssuer
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
name: vault-pki
spec:
vault:
server: https://vault.example.com
path: pki_int/sign/kubernetes-role # Vault PKI sign endpoint on the mount where the role is defined
auth:
kubernetes:
mountPath: /v1/auth/kubernetes
role: cert-manager
secretRef:
name: vault-cert-manager-token
key: token
# Configure Vault PKI engine (one-time setup)
vault secrets enable pki
vault secrets tune -max-lease-ttl=87600h pki
# Generate root CA inside Vault
vault write -field=certificate pki/root/generate/internal \
common_name="example.com" \
ttl=87600h > /tmp/vault-root-ca.crt
# Create an intermediate CA for Kubernetes
vault secrets enable -path=pki_int pki
vault secrets tune -max-lease-ttl=43800h pki_int
vault write -format=json pki_int/intermediate/generate/internal \
common_name="kubernetes.example.com Intermediate CA" | \
jq -r '.data.csr' > /tmp/pki_int.csr
vault write -format=json pki/root/sign-intermediate \
csr=@/tmp/pki_int.csr \
format=pem_bundle \
ttl=43800h | \
jq -r '.data.certificate' > /tmp/signed_cert.pem
vault write pki_int/intermediate/set-signed certificate=@/tmp/signed_cert.pem
# Create a role for cert-manager
vault write pki_int/roles/kubernetes-role \
allowed_domains="svc.cluster.local,example.com" \
allow_subdomains=true \
max_ttl=720h \
require_cn=false
# Configure Kubernetes auth method in Vault
vault auth enable kubernetes
vault write auth/kubernetes/config \
kubernetes_host="https://kubernetes.default.svc" \
kubernetes_ca_cert=@/var/run/secrets/kubernetes.io/serviceaccount/ca.crt \
token_reviewer_jwt=@/var/run/secrets/kubernetes.io/serviceaccount/token
vault write auth/kubernetes/role/cert-manager \
bound_service_account_names=cert-manager \
bound_service_account_namespaces=cert-manager \
policies=pki-policy \
ttl=1h
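The cert-manager role above references a policy named pki-policy that the setup does not define. A minimal sketch of what it might contain follows; the helper name pki_policy_hcl is hypothetical, and the capabilities shown are an assumption to check against your Vault version. The mount and role must match the ones where the signing role was created (pki_int and kubernetes-role here).

```shell
#!/usr/bin/env bash
# pki_policy_hcl MOUNT ROLE
# Prints a minimal Vault policy that allows signing CSRs against one PKI role.
pki_policy_hcl() {
  cat <<EOF
path "$1/sign/$2" {
  capabilities = ["create", "update"]
}
EOF
}

pki_policy_hcl pki_int kubernetes-role

# To load it with a logged-in vault CLI (reads the policy from stdin):
#   pki_policy_hcl pki_int kubernetes-role | vault policy write pki-policy -
```

Scoping the policy to a single sign path keeps the cert-manager identity from issuing against other roles or reading CA material.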
Troubleshooting Certificate Issues
Certificate Not Issuing / Stuck Pending
# Step 1: Check Certificate object status
kubectl describe certificate <name> -n <ns>
# Look for: Status.Conditions[] — Ready=False with Reason and Message
# Step 2: Check the CertificateRequest
kubectl get certificaterequest -n <ns> -l cert-manager.io/certificate-name=<name>
kubectl describe certificaterequest <cr-name> -n <ns>
# Step 3: For ACME — check Order and Challenge
kubectl get order -n <ns>
kubectl describe order <order-name> -n <ns>
kubectl get challenge -n <ns>
kubectl describe challenge <challenge-name> -n <ns>
# Step 4: Check cert-manager controller logs for errors
kubectl logs -n cert-manager \
-l app.kubernetes.io/component=controller \
--since=10m | grep -i "error\|failed"
# Step 5: Check Issuer/ClusterIssuer is Ready
kubectl get clusterissuer
# READY column should be True
kubectl describe clusterissuer letsencrypt-prod | grep -A5 Conditions
# Step 6: For DNS-01 — verify solver has Route 53 permissions
kubectl logs -n cert-manager \
-l app.kubernetes.io/component=controller \
--since=10m | grep -i "route53\|dns"
Certificate Expired — Emergency Recovery
# Force immediate reissuance (cert-manager normally won't reissue a valid cert)
# Method 1: cmctl (cert-manager CLI tool) marks the Certificate for reissuance.
# Annotating the Certificate does not trigger a renewal.
cmctl renew <certificate-name> -n <ns>
# Method 2: delete the Certificate's TLS Secret; cert-manager detects it's gone
# and immediately issues a new one (brief outage during reissue)
kubectl delete secret <tls-secret-name> -n <ns>
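# Install cmctl (released from the cert-manager/cmctl repository)
curl -fsSL https://github.com/cert-manager/cmctl/releases/latest/download/cmctl_linux_amd64 \
-o /usr/local/bin/cmctl && chmod +x /usr/local/bin/cmctl
# Check all cert statuses at a glance, then drill into one
kubectl get certificates -A
cmctl status certificate <certificate-name> -n <ns>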
Webhook Certificate Bootstrap Problem
If cert-manager's webhook certificate expires or is missing, cert-manager's own webhook becomes unavailable, blocking all Certificate resource operations. The cainjector handles keeping the webhook CA bundle up to date. If you hit this state, delete the webhook temporarily: kubectl delete validatingwebhookconfiguration cert-manager-webhook, let cert-manager re-issue its own cert, then re-install cert-manager to restore the webhook.
TLS Debugging with openssl
# Test TLS handshake and certificate chain from outside the cluster
openssl s_client -connect payments.example.com:443 -showcerts </dev/null 2>/dev/null | \
openssl x509 -noout -text | grep -E "Issuer:|Subject:|Not (Before|After)"
# From inside the cluster — port-forward to the service and test
kubectl port-forward svc/payments-api 8443:443 -n production &
openssl s_client -connect 127.0.0.1:8443 -servername payments-api \
</dev/null 2>/dev/null | openssl x509 -noout -dates
# Verify cert chain (both leaf and intermediates)
openssl s_client -connect payments.example.com:443 \
-CAfile /etc/ssl/certs/ca-certificates.crt 2>&1 | \
grep -E "verify|Verification"
# Check which SANs are in a certificate
kubectl get secret payments-api-tls -n production \
-o jsonpath='{.data.tls\.crt}' | \
base64 -d | \
openssl x509 -noout -text | grep -A10 "Subject Alternative"
Certificate Alerting
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
name: certificate-management-alerts
namespace: monitoring
labels:
prometheus: kube-prometheus
role: alert-rules
spec:
groups:
- name: cert-manager.certificates
rules:
- alert: CertificateExpiryWarning
expr: |
certmanager_certificate_expiration_timestamp_seconds
- time() < 30 * 24 * 3600
for: 1h
labels:
severity: warning
annotations:
summary: "Certificate expires in less than 30 days"
description: "Certificate {{ $labels.name }} in namespace {{ $labels.namespace }} expires in {{ $value | humanizeDuration }}."
- alert: CertificateExpiryCritical
expr: |
certmanager_certificate_expiration_timestamp_seconds
- time() < 7 * 24 * 3600
for: 5m
labels:
severity: critical
annotations:
summary: "Certificate expires in less than 7 days"
description: "Certificate {{ $labels.name }} in namespace {{ $labels.namespace }} expires in {{ $value | humanizeDuration }}. Immediate renewal required."
- alert: CertificateNotReady
expr: |
certmanager_certificate_ready_status{condition="False"} == 1
for: 10m
labels:
severity: warning
annotations:
summary: "cert-manager Certificate is not Ready"
description: "Certificate {{ $labels.name }} in namespace {{ $labels.namespace }} has been in a non-Ready state for > 10 minutes."
- alert: CertificateRenewalFailed
expr: |
time() - certmanager_certificate_renewal_timestamp_seconds > 24 * 3600
for: 1h
labels:
severity: warning
annotations:
summary: "cert-manager certificate renewal is overdue"
description: "Certificate {{ $labels.name }} in namespace {{ $labels.namespace }} passed its scheduled renewal time more than 24 hours ago and has not been renewed. Check cert-manager controller logs."
- name: cert-manager.acme
rules:
- alert: ACMEOrderFailed
expr: |
increase(certmanager_acme_client_request_count{status=~"4..|5.."}[15m]) > 0
for: 5m
labels:
severity: warning
annotations:
summary: "ACME order HTTP error responses"
description: "cert-manager ACME client is receiving {{ $value }} error responses. Let's Encrypt rate limits or outage may be affecting certificate issuance."
- name: cert-manager.tls-secrets
rules:
- alert: TLSSecretExpiringSoon
expr: |
# Assumes each Certificate is named after the Secret it writes
(
kube_secret_type{type="kubernetes.io/tls"}
* on (namespace, secret)
group_right()
label_replace(
certmanager_certificate_expiration_timestamp_seconds,
"secret", "$1", "name", "(.*)"
)
) - time() < 14 * 24 * 3600
for: 1h
labels:
severity: warning
annotations:
summary: "TLS Secret expires in less than 14 days"
description: "TLS Secret {{ $labels.secret }} in {{ $labels.namespace }} expires soon."
- name: cluster-pki
rules:
- alert: KubeAPIServerCertExpiringSoon
expr: |
apiserver_client_certificate_expiration_seconds_count{job="apiserver"} > 0
and
histogram_quantile(0.01, rate(apiserver_client_certificate_expiration_seconds_bucket{job="apiserver"}[5m]))
< 7 * 24 * 3600
labels:
severity: critical
annotations:
summary: "Kubernetes API server client certificate expires in < 7 days"
description: "A client certificate used with the API server expires in less than 7 days. Run `kubeadm certs renew all` on control plane nodes."
cert-manager Grafana Dashboard Key Metrics
# Total certificates by Ready status (the metric is 1 for the active condition, 0 otherwise)
sum by (condition) (certmanager_certificate_ready_status)
# Certificates expiring within N days
count(
certmanager_certificate_expiration_timestamp_seconds - time()
< 30 * 24 * 3600
)
# ACME request rate by status
sum by (status) (
rate(certmanager_acme_client_request_count[5m])
)
# Controller sync rate by controller (the metric is a counter, not a histogram)
sum by (controller) (
rate(certmanager_controller_sync_call_count[5m])
)
# Certificates renewed in the last 24h (their expiry timestamp changed)
count(changes(certmanager_certificate_expiration_timestamp_seconds[24h]) > 0)
Best Practices Summary
Use ClusterIssuers for Shared CAs
Always define Let's Encrypt and internal CA issuers as ClusterIssuer — namespace-scoped Issuer requires duplication per namespace and creates operational overhead.
Staging Before Production
Test all ACME configurations against letsencrypt-staging first. Let's Encrypt production enforces rate limits, including 5 duplicate certificates (the same set of names) per week and 5 failed validations per hostname per account per hour, and a misconfigured challenge burns through these quickly.
rotationPolicy: Always
Set privateKey.rotationPolicy: Always on Certificate resources. Reusing the same private key on renewal means a compromised key stays valid indefinitely — each renewal should generate a fresh key pair.
Alert at 30 Days, Not 7
Set warning alerts at 30 days before expiry and critical alerts at 7 days. cert-manager renews a 90-day certificate 30 days before expiry by default, but DNS propagation delays, ACME outages, or rate limits can stall renewal for days. A 30-day warning gives time to intervene.
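For certificates that are not covered by cert-manager metrics, the same thresholds can be applied in a script. A minimal sketch, assuming the expiry is already available as a Unix timestamp; the function name expiry_severity is hypothetical and the 30/7-day cut-offs mirror the alert rules.

```shell
#!/usr/bin/env bash
# expiry_severity NOT_AFTER_EPOCH [NOW_EPOCH]
# Prints "critical" (< 7 days left, including already expired),
# "warning" (< 30 days left), or "ok".
expiry_severity() {
  local not_after=$1 now=${2:-$(date +%s)}
  local days=$(( (not_after - now) / 86400 ))
  if   [ "$days" -lt 7 ];  then echo "critical"
  elif [ "$days" -lt 30 ]; then echo "warning"
  else                          echo "ok"
  fi
}

now=1700000000                                  # fixed clock for the demo
expiry_severity $(( now + 45 * 86400 )) "$now"  # ok
expiry_severity $(( now + 20 * 86400 )) "$now"  # warning
expiry_severity $(( now +  3 * 86400 )) "$now"  # critical
```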
DNS-01 for Internal Domains
Use DNS-01 challenge for any domain that is not publicly reachable via HTTP — internal services, VPN-only dashboards, staging environments. HTTP-01 requires port 80 to be publicly accessible from Let's Encrypt servers.
Audit Cluster PKI Annually
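Run kubeadm certs check-expiration at least once a year on self-managed clusters, well ahead of the renewal date. The cluster CA lasts 10 years, but leaf certs (API server, etcd) last 1 year. kubeadm renews them during control plane upgrades, so a cluster that goes a year without an upgrade or a manual renewal suffers a hard outage.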