What Is kube-apiserver?

kube-apiserver is the single, central REST API gateway for the entire Kubernetes cluster. It is the only component that reads from and writes to etcd. All other components — scheduler, controller-manager, kubelet, kube-proxy — interact with the cluster exclusively through the apiserver. It is written in Go and lives at k8s.io/kubernetes/cmd/kube-apiserver.

Internal Architecture

[Figure 1 diagram: kubectl / API clients → TLS termination (:6443, HTTP/2 mux) → Authn (x509, token, OIDC, webhook, bootstrap token) → Authz (RBAC, Node, ABAC, webhook, AlwaysAllow) → Admission (mutating, validating, PodSecurity, ResourceQuota) → Storage (protobuf encoding, versioning, etcd gRPC) → etcd (:2379, gRPC + mTLS). Side components: schema validation; Watch subsystem (in-memory watchCache, one cacheWatcher per client); Aggregation Layer (APIService objects → external API servers such as metrics-server and custom API servers); API Priority & Fairness (FlowSchema → request pipeline, per request).]

Figure 1: kube-apiserver internal pipeline. Every inbound request passes through TLS→Authn→Authz→Admission→Storage. The Watch subsystem feeds from the storage layer. The Aggregation Layer proxies requests to external API servers. APF enforces concurrency limits once the caller is authenticated and before authorization runs.

Full Request Pipeline

Every write request to the apiserver passes through the following stages in order. A failure at any stage returns an HTTP error immediately — the request never reaches the next stage.
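
The short-circuit behaviour can be illustrated with a minimal Python sketch. The stage functions, request dictionary keys, and status codes below are hypothetical stand-ins chosen for illustration; the real handler chain is written in Go and has more stages.

```python
from typing import Callable, List, Optional, Tuple

# A stage returns None to let the request continue, or (status, reason) to stop it.
Failure = Optional[Tuple[int, str]]
Stage = Callable[[dict], Failure]

def authenticate(req: dict) -> Failure:
    return None if req.get("user") else (401, "Unauthorized")

def authorize(req: dict) -> Failure:
    return None if req.get("allowed") else (403, "Forbidden")

def admit(req: dict) -> Failure:
    return None if req.get("object") is not None else (400, "rejected by admission")

def run_pipeline(req: dict, stages: List[Stage]) -> Tuple[int, str]:
    for stage in stages:
        failure = stage(req)
        if failure is not None:
            return failure  # short-circuit: later stages never run
    return (201, "Created")

STAGES = [authenticate, authorize, admit]
```

A request without an identity stops at the first stage with 401, so authorization and admission are never consulted for it.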

Stage 0: TLS Termination and HTTP/2 Multiplexing

The apiserver listens on --secure-port (default 6443) with TLS 1.2+ (TLS 1.3 preferred). It uses Go's net/http server with HTTP/2 enabled via golang.org/x/net/http2. HTTP/2 multiplexing allows a single TCP connection to carry many concurrent requests — this is why a single kubectl session or a single Informer connection can carry multiple Watch streams simultaneously without connection exhaustion.

Why HTTP/2 Matters
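Without HTTP/2, each Watch stream (from every Informer in every controller and kubelet) would require a dedicated TCP connection. In a 1000-node cluster, every kubelet and kube-proxy holds several watches and each controller holds one per Informer, so the stream count runs well into the thousands. HTTP/2 multiplexing lets each client process carry all of its streams over one or a few connections, so the connection count scales with the number of clients rather than the number of watches.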

The apiserver used to expose a plain-HTTP port as well (--insecure-port). From v1.20 the flag accepted only 0 (disabled), and it was removed in v1.24, so it cannot be re-enabled on current releases.

Stage 0.5: API Priority and Fairness (APF)

Each incoming request is classified by a FlowSchema and assigned to a PriorityLevelConfiguration. Classification runs after authentication, because FlowSchema subjects match on the authenticated user, group, or ServiceAccount, and before authorization. This implements per-flow concurrency limits so that a misbehaving client or a "thundering herd" of reconciliations can't starve critical traffic.

# Example: FlowSchema that puts leader election requests in the leader-election priority level
apiVersion: flowcontrol.apiserver.k8s.io/v1   # GA since v1.29; use v1beta3 on v1.26 to v1.28
kind: FlowSchema
metadata:
  name: leader-election
spec:
  priorityLevelConfiguration:
    name: leader-election      # dedicated high-priority bucket
  matchingPrecedence: 100      # lower = higher precedence
  rules:
  - subjects:
    - kind: ServiceAccount
      serviceAccount:
        name: kube-controller-manager
        namespace: kube-system
    resourceRules:
    - verbs: ["get","update","patch"]
      apiGroups: ["coordination.k8s.io"]
      resources: ["leases"]
      namespaces: ["kube-system"]   # rules for namespaced resources must name at least one namespace

# Inspect APF state
kubectl get flowschemas
kubectl get prioritylevelconfigurations
kubectl get --raw /debug/api_priority_and_fairness/dump_priority_levels
kubectl get --raw /debug/api_priority_and_fairness/dump_queues
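
How a request ends up in a priority level can be sketched as follows. This is a simplified model, assuming matching on username only; the FlowSchema class and classify function are hypothetical names, and real FlowSchemas also match on groups, verbs, resources, and non-resource URLs.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class FlowSchema:
    name: str
    matching_precedence: int
    priority_level: str
    users: frozenset  # simplified: usernames this schema matches, "*" for any

def classify(username, schemas):
    """Return the priority level of the matching schema with the lowest
    matchingPrecedence (ties broken by name), or None if nothing matches."""
    matches = [s for s in schemas if "*" in s.users or username in s.users]
    if not matches:
        return None
    winner = min(matches, key=lambda s: (s.matching_precedence, s.name))
    return winner.priority_level

SCHEMAS = [
    FlowSchema("catch-all", 10000, "catch-all", frozenset({"*"})),
    FlowSchema("leader-election", 100, "leader-election",
               frozenset({"system:kube-controller-manager"})),
]
```

Because the lowest matchingPrecedence wins, the controller manager lands in the leader-election level even though the catch-all schema also matches it.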

Stage 1: Authentication

The apiserver tries each configured authenticator in order until one succeeds or all fail. The request identity is a UserInfo struct containing: Username, UID, Groups, and Extra claims.
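
The "first authenticator that succeeds wins" rule can be sketched in Python. The UserInfo fields mirror the ones named above; the authenticator functions and the request shape are hypothetical. One simplification to be aware of: a real authenticator can also return an error (for example on an invalid bearer token), which ends the chain with 401 instead of falling through.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

@dataclass
class UserInfo:
    username: str
    uid: str = ""
    groups: List[str] = field(default_factory=list)
    extra: Dict[str, List[str]] = field(default_factory=dict)

Authenticator = Callable[[dict], Optional[UserInfo]]

def static_token_authn(tokens: Dict[str, UserInfo]) -> Authenticator:
    def authn(req: dict) -> Optional[UserInfo]:
        header = req.get("headers", {}).get("Authorization", "")
        if header.startswith("Bearer "):
            return tokens.get(header[len("Bearer "):])
        return None
    return authn

def anonymous_authn(req: dict) -> Optional[UserInfo]:
    return UserInfo("system:anonymous", groups=["system:unauthenticated"])

def authenticate(req: dict, chain: List[Authenticator]) -> Optional[UserInfo]:
    for authn in chain:
        user = authn(req)
        if user is not None:
            return user   # first success wins
    return None           # every authenticator passed: the caller answers 401
```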

Method | Flag | Identity Source | Common Use
x509 client certificate | --client-ca-file | Certificate CN → username, O → groups | Control plane components, admin kubeconfig
Static token file | --token-auth-file | CSV file of token,user,uid,group | Dev/test only; no rotation
Bootstrap token | --enable-bootstrap-token-auth | Secret in kube-system namespace | Node TLS bootstrapping
ServiceAccount token (JWT) | always enabled | JWT signed by SA key, bound to SA/pod/node | In-cluster workloads
OIDC | --oidc-issuer-url, --oidc-client-id | JWT from OIDC provider (Dex, Okta, etc.) | Human users via SSO
Webhook token | --authentication-token-webhook-config-file | External HTTP endpoint validates token | Custom authn (cloud IAM, etc.)
Anonymous | --anonymous-auth=true (default) | Username: system:anonymous, Group: system:unauthenticated | Health checks; disable in production

x509 Certificate Authentication: Internals

When a client presents a TLS client certificate, the apiserver validates the certificate chain against --client-ca-file. The Subject CN becomes the username; each Subject O becomes a group. This is how control plane components authenticate:

  • kube-scheduler cert: CN=system:kube-scheduler
  • kube-controller-manager cert: CN=system:kube-controller-manager
  • Node certs: CN=system:node:nodename, O=system:nodes
  • Admin: CN=kubernetes-admin, O=system:masters (bypasses RBAC!)

# View your kubeconfig cert's identity
kubectl config view --minify --raw -o jsonpath='{.users[0].user.client-certificate-data}' | base64 -d | openssl x509 -noout -subject

# Check what groups a cert belongs to
openssl x509 -in /etc/kubernetes/pki/apiserver-kubelet-client.crt -noout -text | grep -E "Subject:|Issuer:"
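
The CN-to-username and O-to-group mapping can be sketched on the subject string that openssl prints. identity_from_subject is a hypothetical helper for illustration; it splits naively on commas, so it does not handle escaped commas inside attribute values.

```python
from typing import List, Tuple

def identity_from_subject(subject: str) -> Tuple[str, List[str]]:
    """Map a subject such as 'CN=system:node:worker-1,O=system:nodes'
    to (username, groups): CN becomes the username, each O a group."""
    username, groups = "", []
    for part in subject.split(","):
        key, _, value = part.partition("=")
        key, value = key.strip().upper(), value.strip()
        if key == "CN":
            username = value
        elif key == "O":
            groups.append(value)
    return username, groups
```

For example, identity_from_subject("O = system:masters, CN = kubernetes-admin") yields the username kubernetes-admin in the group system:masters.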

ServiceAccount Token: Bound Token Internals (v1.22+)

Before v1.22, ServiceAccount tokens were long-lived JWTs stored in Secrets. Since v1.22, Kubernetes uses bound service account tokens (audience/expiry/pod-bound), created on-demand by the TokenRequest API and mounted via the projected volume mechanism.

# How kubelet requests a token for a pod (TokenRequest API call)
# POST /api/v1/namespaces/{ns}/serviceaccounts/{name}/token
{
  "spec": {
    "audiences": ["https://kubernetes.default.svc"],
    "expirationSeconds": 3600,
    "boundObjectRef": {
      "kind": "Pod",
      "name": "my-pod",
      "uid": "..."
    }
  }
}
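
# Inspect a projected service account token mounted in a pod
# (no -it: a TTY can add carriage returns to the piped output; tr maps base64url to plain base64)
kubectl exec mypod -- cat /var/run/secrets/kubernetes.io/serviceaccount/token | \
  cut -d. -f2 | tr '_-' '/+' | base64 -d 2>/dev/null | python3 -m json.tool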
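
JWT segments are base64url-encoded without padding, which is why a plain base64 decode can stop short. A small Python sketch that restores the padding before decoding is shown below; the sample token and its claims are fabricated locally for the demonstration and are not a real ServiceAccount token.

```python
import base64
import json

def decode_jwt_payload(token: str) -> dict:
    """Decode the claims (second segment) of a JWT without verifying the signature."""
    payload = token.split(".")[1]
    payload += "=" * (-len(payload) % 4)      # restore the stripped padding
    return json.loads(base64.urlsafe_b64decode(payload))

def _b64url(obj: dict) -> str:
    raw = json.dumps(obj).encode()
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode()

# Illustrative claims only; real tokens carry more fields and a valid signature.
SAMPLE_CLAIMS = {
    "aud": ["https://kubernetes.default.svc"],
    "sub": "system:serviceaccount:default:default",
    "exp": 1700003600,
}
SAMPLE_TOKEN = ".".join([_b64url({"alg": "RS256"}), _b64url(SAMPLE_CLAIMS), "signature"])
```

Decoding only inspects the claims; it says nothing about whether the token is valid, since the signature is not checked.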

Stage 2: Authorization

After authentication succeeds, the apiserver evaluates whether the identified user (UserInfo) is allowed to perform the requested action (verb on resource in namespace). Multiple authorizers are tried in order; the first to return allow or deny wins. If all return no opinion, the request is denied.

Authorizer | Description | Production Use
Node | Special authorizer for kubelet: nodes can only read/write resources bound to their own node | Always enable alongside RBAC
RBAC | Role/ClusterRole + RoleBinding/ClusterRoleBinding evaluated at request time | Primary authorizer in all production clusters
ABAC | Static policy file; can't be changed without restart | Legacy; use RBAC
Webhook | Calls external HTTP service to make authz decision (SubjectAccessReview) | Custom authz (e.g. an OPA-based service), cloud IAM integration
AlwaysAllow | Allows everything | Dev only; never production
AlwaysDeny | Denies everything | Testing only
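
The chaining rule (first allow or deny wins, no opinion from everyone means deny) can be sketched as below. The two toy authorizers, the GRANTS set, and the attribute keys are hypothetical and far simpler than the real Node and RBAC authorizers.

```python
from enum import Enum

class Decision(Enum):
    ALLOW = "allow"
    DENY = "deny"
    NO_OPINION = "no opinion"

# Hypothetical grants consulted by the toy RBAC authorizer below.
GRANTS = {("alice", "list", "pods")}

def node_authorizer(attrs: dict) -> Decision:
    prefix = "system:node:"
    user = attrs["user"]
    if user.startswith(prefix) and attrs.get("bound_node") == user[len(prefix):]:
        return Decision.ALLOW
    return Decision.NO_OPINION

def rbac_authorizer(attrs: dict) -> Decision:
    key = (attrs["user"], attrs["verb"], attrs["resource"])
    return Decision.ALLOW if key in GRANTS else Decision.NO_OPINION

def authorize(attrs: dict, authorizers) -> Decision:
    for authorizer in authorizers:
        decision = authorizer(attrs)
        if decision is not Decision.NO_OPINION:
            return decision   # first allow or deny wins
    return Decision.DENY      # nobody had an opinion

CHAIN = [node_authorizer, rbac_authorizer]
```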

RBAC evaluation flow: The apiserver collects all RoleBindings and ClusterRoleBindings that reference the user (or any of their groups). For each binding, it checks whether the referenced Role/ClusterRole grants the requested verb on the requested resource. If any binding grants the permission, the request is allowed.

# Test authorization decisions (SubjectAccessReview)
kubectl auth can-i create pods --namespace default
kubectl auth can-i create pods --as developer --namespace default
kubectl auth can-i '*' '*' --as admin --as-group system:masters   # system:masters is a group, so impersonate it with --as-group

# Check what a ServiceAccount can do
kubectl auth can-i list secrets --as system:serviceaccount:default:myapp

# Same check via the API (access reviews are never persisted; -o yaml prints status.allowed)
kubectl create -f - -o yaml << EOF
apiVersion: authorization.k8s.io/v1
kind: SelfSubjectAccessReview
spec:
  resourceAttributes:
    namespace: default
    verb: create
    resource: pods
EOF
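
The RBAC evaluation flow described above is purely additive: any one matching binding is enough. A minimal sketch follows, assuming simplified dictionaries for roles and bindings (hypothetical shapes, not the real API objects) and ignoring API groups, resource names, and non-resource URLs.

```python
def rule_matches(rule: dict, verb: str, resource: str) -> bool:
    verb_ok = "*" in rule["verbs"] or verb in rule["verbs"]
    resource_ok = "*" in rule["resources"] or resource in rule["resources"]
    return verb_ok and resource_ok

def rbac_allows(user, groups, verb, resource, namespace, roles, bindings) -> bool:
    subjects = {user, *groups}
    for binding in bindings:
        # namespace None models a ClusterRoleBinding (applies everywhere)
        if binding["namespace"] not in (None, namespace):
            continue
        if subjects.isdisjoint(binding["subjects"]):
            continue
        if any(rule_matches(r, verb, resource) for r in roles[binding["role"]]):
            return True   # one granting binding is enough
    return False          # RBAC has no deny rules; absence of a grant means no

ROLES = {
    "pod-reader": [{"verbs": ["get", "list"], "resources": ["pods"]}],
    "admin": [{"verbs": ["*"], "resources": ["*"]}],
}
BINDINGS = [
    {"namespace": "default", "subjects": {"alice"}, "role": "pod-reader"},
    {"namespace": None, "subjects": {"ops"}, "role": "admin"},
]
```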

Stage 3: Admission Control

Admission controllers run after authentication and authorization but before the object is persisted to etcd. Two phases:

Mutating Admission (Phase 1)

Can modify the incoming object. Run sequentially. Built-in mutating plugins:

  • DefaultStorageClass: adds default storage class to PVCs with no class
  • DefaultTolerationSeconds: adds default tolerations for node taints
  • MutatingAdmissionWebhook: calls external webhooks one after another, applying each webhook's patch before calling the next
  • NamespaceLifecycle: rejects creates in terminating namespaces
  • ServiceAccount: auto-mounts SA token, sets imagePullSecrets
  • LimitRanger (mutating phase): applies default requests and limits from LimitRange objects

After all mutating webhooks run, the object is re-validated against the schema (to catch webhook mutations that broke structure).

Validating Admission (Phase 2)

Can only allow or reject. Validating webhooks are called in parallel, and a single rejection fails the request. Built-in validating plugins:

  • PodSecurity: enforces Pod Security Standards (Privileged/Baseline/Restricted); it only validates and never mutates the pod
  • ResourceQuota: rejects if the request would exceed namespace quota
  • LimitRanger: rejects if outside min/max range (its defaulting happens in the mutating phase)
  • NodeRestriction: limits what resources a kubelet can modify
  • ValidatingAdmissionWebhook: calls external webhooks in parallel
  • ValidatingAdmissionPolicy (CEL; alpha in v1.26, beta in v1.28, GA in v1.30): in-process policy evaluation

# Check active admission plugins
kube-apiserver --help | grep enable-admission-plugins
# Or read from the running process:
ps aux | grep kube-apiserver | tr ' ' '\n' | grep admission

# Typical production set:
# --enable-admission-plugins=NodeRestriction,PodSecurity,ResourceQuota,LimitRanger,\
#   ServiceAccount,DefaultStorageClass,DefaultTolerationSeconds,\
#   MutatingAdmissionWebhook,ValidatingAdmissionWebhook,\
#   Priority,StorageObjectInUseProtection,PersistentVolumeClaimResize
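
The two admission phases can be sketched as a sequential mutate pass followed by a validate pass over the final object. The plugin functions below are hypothetical stand-ins (the sidecar injector imitates a typical mutating webhook); only the ordering is the point.

```python
import copy
from typing import Optional

def add_default_toleration(pod: dict) -> dict:
    pod.setdefault("tolerations", []).append(
        {"key": "node.kubernetes.io/not-ready", "tolerationSeconds": 300})
    return pod

def inject_sidecar(pod: dict) -> dict:
    pod["containers"].append({"name": "proxy"})  # hypothetical webhook behaviour
    return pod

def max_two_containers(pod: dict) -> Optional[str]:
    return None if len(pod["containers"]) <= 2 else "too many containers"

def admit(pod: dict, mutators, validators) -> dict:
    pod = copy.deepcopy(pod)       # leave the caller's object untouched
    for mutate in mutators:        # phase 1: sequential, each sees the previous result
        pod = mutate(pod)
    # phase 2: validators see the fully mutated object and cannot change it
    errors = [e for e in (check(pod) for check in validators) if e]
    if errors:
        raise ValueError("; ".join(errors))
    return pod
```

Because validation sees the mutated object, a pod that was within policy as submitted can still be rejected after a webhook adds a container to it.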

Stage 4: Schema Validation

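Built-in types are validated by hand-written Go validation functions; custom resources are validated against the OpenAPI v3 structural schema declared in their CRD. Typical checks cover required fields, field types and formats, and immutable fields on update.

Malformed input is rejected when the request body is decoded, before mutating admission; full validation runs after mutation (to catch webhook-introduced invalidity).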

Stage 5: Storage (etcd Persistence)

If all previous stages pass, the object is persisted to etcd. The storage layer performs:

  1. Version conversion: The object is converted from the API version the client sent to the internal "hub" version, then to the storage version. Kubernetes maintains internal versions (not exposed to clients) as a conversion hub between all external versions.
  2. Protobuf encoding: The storage-version object is serialized to protobuf (not JSON) for storage efficiency; custom resources are the exception and are stored as JSON. The etcd key prefix /registry/ differentiates Kubernetes objects from other etcd data.
  3. Optimistic locking: The resourceVersion from the client's request is compared with the current etcd modRevision. If they differ, a 409 Conflict is returned. This prevents lost updates.
  4. Encryption at rest (optional): Objects passing through the storage layer can be encrypted before writing to etcd using a configured KMS provider or AES key.

# EncryptionConfiguration: encrypt Secrets at rest
apiVersion: apiserver.config.k8s.io/v1
kind: EncryptionConfiguration
resources:
- resources: ["secrets"]
  providers:
  - aescbc:
      keys:
      - name: key1
        secret: <BASE64_ENCODED_32_BYTE_KEY>   # placeholder: base64 of a random 32-byte key
  - identity: {}   # fallback for unencrypted (reads only)

# Verify encryption is working
kubectl create secret generic test --from-literal=key=value
# Read directly from etcd — should see encrypted bytes, not JSON
ETCDCTL_API=3 etcdctl get /registry/secrets/default/test \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/peer.crt \
  --key=/etc/kubernetes/pki/etcd/peer.key | hexdump -C | head
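
The optimistic-locking step from the storage list above can be sketched with an in-memory store. Store and Conflict are hypothetical names; the single counter stands in for etcd's revision, and the comparison mirrors the resourceVersion check that produces a 409.

```python
class Conflict(Exception):
    status = 409

class Store:
    def __init__(self):
        self._objects = {}   # key -> (resource_version, obj)
        self._revision = 0   # stand-in for etcd's global revision counter

    def _write(self, key, obj):
        self._revision += 1
        self._objects[key] = (self._revision, obj)
        return self._revision

    def create(self, key, obj):
        if key in self._objects:
            raise Conflict(f"{key} already exists")
        return self._write(key, obj)

    def get(self, key):
        return self._objects[key]

    def update(self, key, obj, resource_version):
        current_rv, _ = self._objects[key]
        if resource_version != current_rv:
            # someone else wrote in between: reject instead of overwriting
            raise Conflict(f"{key}: have {resource_version}, current is {current_rv}")
        return self._write(key, obj)
```

Two clients that both read version 1 cannot both win: the second update carries a stale version and is rejected, so the client must re-read and retry.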

Watch Machinery — Internals

The Watch API is one of the most critical and subtle parts of the apiserver. Every Informer in every controller, every kubelet, and every kube-proxy uses Watch to receive real-time updates.

watchCache (In-Memory Cache)

The apiserver maintains an in-memory circular buffer of events called the watchCache. When an object is written to etcd, the etcd watch notifies the apiserver, which updates the watchCache. Client Watch requests are served from this cache, not from etcd directly — this dramatically reduces etcd read load in large clusters.

[Figure: watchCache fan-out. etcd Watch stream → etcd watcher (apiserver side, watches all resources) → watchCache (in-memory circular buffer holding the last N events, default 100–1000, plus a full object snapshot) → one cacheWatcher per client (scheduler Informer, controller-manager Informer, kubelet Informer, custom controller, kubectl get -w). Events fan out to all watchers; each cacheWatcher has a per-channel buffer (default 100).]

Watch Protocol — HTTP Chunked Streaming

A Watch request is a long-lived HTTP GET with query parameter ?watch=true&resourceVersion=XYZ. The apiserver holds the connection open and sends newline-delimited JSON objects (chunked transfer encoding) as events occur. Each event has type ADDED, MODIFIED, DELETED, BOOKMARK, or ERROR.

# Watch pods raw HTTP stream
kubectl get pods --watch -v=8 2>&1 | grep -A2 "GET.*watch=true"

# Directly via curl (requires a valid token)
TOKEN=$(kubectl create token default)
curl -sk "https://localhost:6443/api/v1/pods?watch=true&resourceVersion=0" \
  -H "Authorization: Bearer $TOKEN" | head -50

# Each line is one JSON event:
# {"type":"ADDED","object":{"kind":"Pod","metadata":{"name":"nginx","resourceVersion":"12345"},...}}
# {"type":"MODIFIED",...}
# {"type":"BOOKMARK","object":{"kind":"Pod","metadata":{"resourceVersion":"99999"}}}  ← progress marker
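
A client consuming this stream applies each event to a local cache and remembers the newest resourceVersion it has seen. The sketch below assumes the line format shown above; apply_events is a hypothetical helper, and keying the cache by name alone (ignoring namespace) is a simplification.

```python
import json

def apply_events(lines, cache: dict, last_rv: str = "0") -> str:
    """Apply newline-delimited watch events to cache; return the newest resourceVersion."""
    for line in lines:
        if not line.strip():
            continue
        event = json.loads(line)
        kind, obj = event["type"], event["object"]
        if kind == "ERROR":
            raise RuntimeError(f"watch error: {obj.get('message', obj)}")
        meta = obj["metadata"]
        if kind in ("ADDED", "MODIFIED"):
            cache[meta["name"]] = obj
        elif kind == "DELETED":
            cache.pop(meta["name"], None)
        # BOOKMARK changes no object; it only advances the resourceVersion
        last_rv = meta["resourceVersion"]
    return last_rv

STREAM = [
    '{"type":"ADDED","object":{"kind":"Pod","metadata":{"name":"nginx","resourceVersion":"12345"}}}',
    '{"type":"MODIFIED","object":{"kind":"Pod","metadata":{"name":"nginx","resourceVersion":"12350"}}}',
    '{"type":"BOOKMARK","object":{"kind":"Pod","metadata":{"resourceVersion":"99999"}}}',
]
```

The returned resourceVersion is what the client would pass on its next Watch request to resume where it left off.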

When the apiserver sends a 410 Gone (because the requested resourceVersion has been compacted out of the watchCache circular buffer), the Informer performs a full relist — a fresh List followed by a new Watch starting at the returned resourceVersion.
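
Why an old resourceVersion leads to 410 Gone follows from the circular buffer: once the oldest retained event is newer than what the client last saw, intermediate events may have been dropped. A minimal sketch, assuming integer resourceVersions and hypothetical WatchCache and Gone names:

```python
from collections import deque

class Gone(Exception):
    status = 410

class WatchCache:
    def __init__(self, capacity: int):
        # deque with maxlen drops the oldest event when full, like a ring buffer
        self.events = deque(maxlen=capacity)   # items: (resource_version, type, name)

    def add(self, resource_version: int, event_type: str, name: str) -> None:
        self.events.append((resource_version, event_type, name))

    def events_since(self, resource_version: int):
        """Events newer than resource_version, or Gone if history was dropped."""
        if self.events:
            oldest = self.events[0][0]
            if resource_version < oldest - 1:
                raise Gone(f"too old resource version: {resource_version} ({oldest - 1})")
        return [e for e in self.events if e[0] > resource_version]
```

With capacity 3 and events 1 through 5 written, a client at version 2 can still resume (events 3, 4, 5 are retained), while a client at version 1 gets Gone and has to relist.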

Concurrency Control and Rate Limiting

Max In-Flight Requests

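The apiserver has two semaphores controlling maximum concurrent requests:

  • --max-requests-inflight (default 400): concurrent non-mutating requests (get, list)
  • --max-mutating-requests-inflight (default 200): concurrent mutating requests (create, update, patch, delete)

Long-running requests such as watch are exempt from both limits.

When these limits are hit, new requests receive 429 Too Many Requests with a Retry-After header. When APF is enabled, it replaces the two fixed pools with fine-grained per-flow queuing and uses the sum of both flags as its total concurrency budget.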
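
The max-in-flight behaviour can be sketched with two non-blocking semaphores, one per request class. InFlightLimiter and handle are hypothetical names; the defaults match --max-requests-inflight and --max-mutating-requests-inflight, and the response is reduced to a status code and body.

```python
import threading

MUTATING_VERBS = {"create", "update", "patch", "delete", "deletecollection"}

class InFlightLimiter:
    def __init__(self, max_readonly: int = 400, max_mutating: int = 200):
        self._readonly = threading.BoundedSemaphore(max_readonly)
        self._mutating = threading.BoundedSemaphore(max_mutating)

    def _pool(self, verb: str):
        return self._mutating if verb in MUTATING_VERBS else self._readonly

    def try_acquire(self, verb: str) -> bool:
        return self._pool(verb).acquire(blocking=False)   # never queue, just refuse

    def release(self, verb: str) -> None:
        self._pool(verb).release()

def handle(limiter: InFlightLimiter, verb: str, work):
    if not limiter.try_acquire(verb):
        return 429, "Too Many Requests"   # the real response also sets Retry-After
    try:
        return 200, work()
    finally:
        limiter.release(verb)
```

The two pools are independent, so a flood of reads cannot consume the slots reserved for writes, and vice versa.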

etcd Request Rate Limiting

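The apiserver itself doesn't rate-limit outbound etcd requests. Pressure on etcd comes mainly from writes and from reads that bypass the watchCache, such as LIST requests sent without a resourceVersion: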

# Monitor etcd pressure from apiserver metrics
curl -sk https://localhost:6443/metrics | grep etcd_request_duration

# Check if LIST requests are hitting etcd vs cache
# resourceVersion unset or "" → must hit etcd (consistent read)
# resourceVersion="0"         → may serve any version from watchCache
# resourceVersion="<N>"       → serve from cache once it is at least as new as N

# kubectl get sends no resourceVersion, so this is a consistent (expensive) read;
# -v=9 prints the request URL so you can confirm it
kubectl get pods --request-timeout=30s -v=9 2>&1 | grep resourceVersion

TLS Configuration Details

The apiserver uses multiple TLS identities simultaneously:

Identity | Cert File Flag | Key File Flag | Purpose
Serving cert (HTTPS) | --tls-cert-file | --tls-private-key-file | Presented to all clients connecting to :6443
Client CA | --client-ca-file | n/a | Verifies x509 client certificates during authn
etcd client cert | --etcd-certfile | --etcd-keyfile | apiserver's mTLS identity when connecting to etcd
etcd CA | --etcd-cafile | n/a | Verifies etcd server certificate
kubelet client cert | --kubelet-client-certificate | --kubelet-client-key | apiserver's identity when connecting to kubelet :10250
Front-proxy cert | --proxy-client-cert-file | --proxy-client-key-file | Used when proxying to aggregated API servers
ServiceAccount signing key | --service-account-key-file | --service-account-signing-key-file | Public key for verifying SA tokens; private for issuing

# Verify TLS configuration (</dev/null stops s_client from waiting on stdin)
openssl s_client -connect localhost:6443 -showcerts </dev/null 2>/dev/null | openssl x509 -noout -text | grep -E "Subject:|DNS:|IP Address:"

# Check SANs on the apiserver cert (must include all CP node IPs and DNS names)
openssl x509 -in /etc/kubernetes/pki/apiserver.crt -noout -text | grep -A 10 "Subject Alternative Name"

# Typical SANs for apiserver cert:
# DNS:kubernetes, DNS:kubernetes.default, DNS:kubernetes.default.svc
# DNS:kubernetes.default.svc.cluster.local, DNS:cp-node-1
# IP:10.96.0.1 (cluster IP), IP:192.168.1.10 (CP node IP), IP:127.0.0.1
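
SAN Mismatch: Most Common TLS Error
If kubelet or kubectl can't connect to the apiserver with a certificate error like "x509: certificate is valid for ..., not ...", the apiserver cert is missing a SAN for the address being used (VIP, new IP, new DNS name). Fix: regenerate the apiserver cert with the correct SANs. With kubeadm: add the name to apiServer.certSANs in the ClusterConfiguration (kubeadm-config ConfigMap), move the old apiserver.crt and apiserver.key out of /etc/kubernetes/pki, run kubeadm init phase certs apiserver with that configuration, and restart the apiserver. Note that kubeadm certs renew apiserver (kubeadm alpha certs renew before v1.20) only extends the existing certificate and keeps its current SANs.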
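
The mismatch check itself is simple: the address the client dialed must appear among the certificate's SANs, with IPs compared against IP entries and names against DNS entries. A sketch under simplifying assumptions (no wildcard handling; SANs given as "DNS:..." or "IP:..." strings as in the list above; san_covers is a hypothetical helper):

```python
import ipaddress

def san_covers(address: str, sans) -> bool:
    """True if address (hostname or IP) is listed in the certificate SANs."""
    try:
        ip = ipaddress.ip_address(address)
    except ValueError:
        ip = None   # not an IP literal, treat as a DNS name
    for san in sans:
        kind, _, value = san.partition(":")
        if ip is not None:
            if kind == "IP" and ipaddress.ip_address(value) == ip:
                return True
        elif kind == "DNS" and value.lower() == address.lower():
            return True
    return False

SANS = [
    "DNS:kubernetes", "DNS:kubernetes.default", "DNS:kubernetes.default.svc",
    "DNS:kubernetes.default.svc.cluster.local", "DNS:cp-node-1",
    "IP:10.96.0.1", "IP:192.168.1.10", "IP:127.0.0.1",
]
```

Connecting through a new load-balancer VIP such as 192.168.1.100 fails this check until the certificate is reissued with that address.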

API Aggregation Layer

The aggregation layer allows external API servers (called "extension API servers") to serve custom API groups under the Kubernetes API tree. This is different from CRDs — CRDs serve resources through the built-in apiserver; aggregated API servers are separate processes that handle their own storage.

# APIService registers an external API server for a group/version
apiVersion: apiregistration.k8s.io/v1
kind: APIService
metadata:
  name: v1beta1.metrics.k8s.io
spec:
  service:
    name: metrics-server
    namespace: kube-system
    port: 443
  group: metrics.k8s.io
  version: v1beta1
  insecureSkipTLSVerify: false       # verify metrics-server cert
  caBundle: <BASE64_ENCODED_CA_BUNDLE>   # placeholder: CA that signed the metrics-server serving cert
  groupPriorityMinimum: 100
  versionPriority: 100

# List all registered API services (includes both built-in and aggregated)
kubectl get apiservices
# Look for: v1beta1.metrics.k8s.io, v1.admissionregistration.k8s.io, etc.

# Check status of aggregated services
kubectl get apiservices | grep -v True   # show any unavailable ones

# Debug: test if metrics-server is reachable via aggregation
kubectl get --raw /apis/metrics.k8s.io/v1beta1/nodes
kubectl top nodes   # uses aggregated metrics API

Audit Logging

The apiserver can log every request and response to an audit log. This is essential for compliance (SOC2, PCI, HIPAA) and incident investigation. Audit events are structured JSON with full request details, user identity, response code, and object changes.

# Production-grade audit policy
apiVersion: audit.k8s.io/v1
kind: Policy
rules:
# Log Secrets, ConfigMaps and token requests at Metadata level only:
# RequestResponse would write secret payloads and tokens into the audit log
- level: Metadata
  resources:
  - group: ""
    resources: ["secrets", "configmaps", "serviceaccounts/token"]

# Log RBAC changes at RequestResponse
- level: RequestResponse
  resources:
  - group: "rbac.authorization.k8s.io"
    resources: ["roles", "clusterroles", "rolebindings", "clusterrolebindings"]

# Log exec/attach/portforward
- level: Request
  resources:
  - group: ""
    resources: ["pods/exec", "pods/attach", "pods/portforward"]

# Log all other resource modifications at Metadata level
- level: Metadata
  verbs: ["create", "update", "patch", "delete"]

# Don't log read operations on non-sensitive resources
- level: None
  resources:
  - group: ""
    resources: ["events"]
  verbs: ["get", "list", "watch"]

# Default: log metadata for everything else
- level: Metadata

# apiserver flags for audit
kube-apiserver \
  --audit-log-path=/var/log/kubernetes/audit.log \
  --audit-policy-file=/etc/kubernetes/audit-policy.yaml \
  --audit-log-maxage=30 \
  --audit-log-maxbackup=10 \
  --audit-log-maxsize=100    # MB

# Parse audit logs
cat /var/log/kubernetes/audit.log | jq 'select(.verb=="create" and .responseStatus.code==201)'
cat /var/log/kubernetes/audit.log | jq 'select(.user.username != "system:serviceaccount:kube-system:node-controller" and .verb=="delete")'

# Who deleted what? (lists every delete in the file; filter on requestReceivedTimestamp to narrow the window)
cat /var/log/kubernetes/audit.log | jq -r 'select(.verb=="delete") | [.requestReceivedTimestamp, .user.username, .objectRef.resource, .objectRef.name] | @tsv'
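
Rule order in an audit policy matters because the first matching rule decides the level. The sketch below models that with flattened, hypothetical rule dictionaries (no API groups, users, or namespaces); audit_level is an illustrative helper, not part of any Kubernetes library.

```python
def audit_level(request: dict, rules: list) -> str:
    """Return the level of the first rule matching the request."""
    for rule in rules:
        if "verbs" in rule and request["verb"] not in rule["verbs"]:
            continue
        if "resources" in rule and request["resource"] not in rule["resources"]:
            continue
        return rule["level"]   # first match wins, later rules are ignored
    return "None"              # no rule matched: nothing is logged

RULES = [
    {"level": "Metadata", "resources": ["secrets", "configmaps"]},
    {"level": "Request", "resources": ["pods/exec"]},
    {"level": "None", "resources": ["events"], "verbs": ["get", "list", "watch"]},
    {"level": "Metadata"},     # catch-all, must come last
]
```

If the catch-all rule were placed first, it would shadow every rule after it, including the one that silences reads of events.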

HA Deployment and Horizontal Scaling

The apiserver is designed for horizontal scaling. Add replicas freely — each instance connects to the same etcd cluster and shares the same state. However, several practical considerations apply:

Load Balancer Configuration

# HAProxy configuration for HA apiserver (TCP mode)
frontend kubernetes-api
  bind *:6443
  mode tcp
  option tcplog
  default_backend kubernetes-api-servers

backend kubernetes-api-servers
  mode tcp
  option tcp-check
  balance roundrobin
  server cp1 192.168.1.10:6443 check
  server cp2 192.168.1.11:6443 check
  server cp3 192.168.1.12:6443 check

Watch Reconnect on LB Failover
When the LB routes a Watch connection to a different apiserver replica (e.g., due to the old replica dying), the Informer reconnects and resumes the Watch from its last-seen resourceVersion; it falls back to a full re-List only if the new replica answers 410 Gone. This is expected and handled automatically. However, if ALL apiserver replicas die simultaneously, all controllers/kubelets lose their Watch connections. Existing pods continue running (they don't need apiserver to operate), but no new scheduling or reconciliation occurs until connectivity is restored.
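
Client-side recovery after a failover can be sketched as "try to resume the watch, relist only on 410 Gone". FakeReplica is a hypothetical stand-in for one apiserver replica, with integer resourceVersions and an empty event stream to keep the example small.

```python
class Gone(Exception):
    status = 410

class FakeReplica:
    """Hypothetical stand-in for one apiserver replica."""
    def __init__(self, objects: dict, current_rv: int, oldest_rv: int):
        self.objects, self.current_rv, self.oldest_rv = objects, current_rv, oldest_rv

    def list(self):
        return dict(self.objects), self.current_rv

    def watch(self, since_rv: int):
        if since_rv < self.oldest_rv:
            raise Gone(f"too old resource version: {since_rv}")
        return []   # no pending events in this sketch

def reconnect(replica: FakeReplica, cache: dict, last_rv: int):
    """Resume the watch on a new replica; fall back to a relist on 410 Gone."""
    try:
        replica.watch(last_rv)
        return cache, last_rv, "resumed"
    except Gone:
        objects, rv = replica.list()
        return objects, rv, "relisted"
```

A client whose last-seen version is still inside the replica's history resumes cheaply; only a client that has fallen too far behind pays for a full List.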

Scaling Considerations

Cluster Size | Recommended apiserver Replicas | Notes
< 50 nodes | 1–2 | Development/staging. 2 for HA.
50–500 nodes | 3 | Standard production HA
500–2000 nodes | 3–5 | Tune --max-requests-inflight
> 2000 nodes | 5+ | Shard by namespace or use APF aggressively

Critical Flags Reference

Full Production-Relevant Flags
Flag | Default | Production Value | Why
--secure-port | 6443 | 6443 | Standard; don't change
--insecure-port | 0 | 0 | Must be 0; removed v1.24
--anonymous-auth | true | false | Disable to prevent anonymous API access
--authorization-mode | AlwaysAllow | Node,RBAC | Critical: AlwaysAllow in dev is dangerous
--enable-admission-plugins | few | see above | Enable security plugins
--audit-log-path | "" | /var/log/... | Required for compliance
--encryption-provider-config | "" | config file | Encrypt Secrets at rest
--max-requests-inflight | 400 | 1200 (large) | Tune per cluster size
--max-mutating-requests-inflight | 200 | 400 (large) | Tune per cluster size
--request-timeout | 60s | 60s | Applies to non-watch requests only; watch lifetime is governed by --min-request-timeout (default 1800s)
--watch-cache-sizes | auto | pods#1000 | Increase for busy resources (older releases only; newer ones size the cache dynamically and honor only 0, which disables caching)
--default-watch-cache-size | 100 | 500 | Events buffered per resource type (older releases only)
--profiling | true | false | Disable in production; exposes /debug/pprof
--enable-priority-and-fairness | true (v1.20+) | true | APF replaces old max-inflight when enabled
--tls-min-version | TLS1.2 | VersionTLS12 | Never TLS1.0/1.1
--tls-cipher-suites | defaults | restrict to strong ciphers | Disable RC4, 3DES

Key Metrics to Monitor

# Get all apiserver metrics
kubectl get --raw /metrics | grep "^apiserver_"

# Request latency (most important SLO metric)
apiserver_request_duration_seconds_bucket{verb="GET",resource="pods"}
apiserver_request_duration_seconds_bucket{verb="POST",resource="pods"}

# Request rate and error rate
apiserver_request_total{code="200"}
apiserver_request_total{code="500"}
apiserver_request_total{code="429"}    # rate limiting

# Watch connection count
apiserver_registered_watchers          # total active watch connections
apiserver_watch_events_total           # events sent per resource

# etcd latency from apiserver's perspective
etcd_request_duration_seconds_bucket{operation="get"}
etcd_request_duration_seconds_bucket{operation="put"}

# Cache state
apiserver_cache_list_fetched_objects_total
apiserver_cache_list_returned_objects_total  # ≤ fetched; a large gap means heavy server-side filtering

# APF metrics
apiserver_flowcontrol_current_inqueue_requests
apiserver_flowcontrol_current_executing_requests
apiserver_flowcontrol_rejected_requests_total

Prometheus alerting rules to configure:

- alert: APIServerHighErrorRate
  expr: sum(rate(apiserver_request_total{code=~"5.."}[5m])) / sum(rate(apiserver_request_total[5m])) > 0.01
  for: 5m
  annotations:
    summary: "apiserver error rate > 1%"

- alert: APIServerHighLatency
  expr: histogram_quantile(0.99, sum by (le) (rate(apiserver_request_duration_seconds_bucket{verb!="WATCH"}[5m]))) > 1
  for: 5m
  annotations:
    summary: "p99 API latency > 1s"

- alert: APIServerRateLimiting
  expr: rate(apiserver_request_total{code="429"}[5m]) > 0
  for: 1m
  annotations:
    summary: "API server is rate-limiting requests"

Troubleshooting kube-apiserver

Startup Failures

# Check static pod is being attempted
journalctl -u kubelet --since "5 min ago" | grep -i apiserver

# Check if the process is running
crictl ps --name kube-apiserver
crictl logs $(crictl ps --name kube-apiserver -q) 2>&1 | tail -50

# Common startup failures:
# 1. etcd not reachable
#    Log: "Failed to connect to etcd"
#    Fix: check etcd status, verify --etcd-servers flag, check firewall

# 2. Certificate errors
#    Log: "Failed to read or parse CA cert"
#    Fix: verify cert file paths in manifest; check cert not expired

# 3. Port already in use
#    Log: "bind: address already in use"
#    Fix: check what's on :6443; likely another apiserver or lingering process

# 4. Bad flag/config
#    Log: "Error: invalid value for flag"
#    Fix: validate manifest YAML and flag values

503 Service Unavailable

# 503 from apiserver usually means the apiserver itself is healthy but
# a backend (aggregated API server) is down

# Check aggregated API services
kubectl get apiservices | grep -v True
kubectl describe apiservice v1beta1.metrics.k8s.io  # see the Available condition and its message under Status

# If metrics-server is down and HPA depends on it, HPA will fail
# Fix: debug the metrics-server pods
kubectl get pods -n kube-system -l k8s-app=metrics-server
kubectl logs -n kube-system -l k8s-app=metrics-server

Watch Streams Stuck / Stale Informers

# Symptoms: controller reconciles old state; kubectl get shows stale data
# Cause: watch connection was silently dropped without 410 error

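# Recovery: the apiserver closes each watch after a randomized timeout
# (--min-request-timeout, default 1800s, so roughly every 30–60 minutes) and the
# client then re-establishes it. An informer resync only replays the local cache;
# it does not refetch state from the apiserver.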

# Check apiserver watch metrics
kubectl get --raw /metrics | grep apiserver_registered_watchers

# Check for 410 Gone events causing relists (expected but should be infrequent)
# In controller logs, look for:
# "watch closed with: very old resource version" → relist triggered

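# Increase watchCache size for frequently-watched resources
# (effective on older releases only; newer ones size the cache dynamically and
# treat every non-zero value the same, with 0 disabling the cache for that resource)
kube-apiserver --watch-cache-sizes=pods#2000,nodes#500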

Slow API Responses

# Step 1: Check if the slowness is in etcd or apiserver processing
kubectl get --raw /metrics | grep "etcd_request_duration"

# High etcd latency (p99 > 100ms):
# - etcd disk I/O problem → check iostat, ensure SSD
# - etcd leader election in progress → check etcd_server_leader_changes_seen_total
# - Large objects → check etcd_mvcc_db_total_size_in_bytes

# Step 2: Check apiserver admission webhook latency
kubectl get --raw /metrics | grep "apiserver_admission_webhook_admission_duration"
# Slow mutating/validating webhooks block every write request
# Fix: add timeout to webhook config, or disable slow webhooks

# Step 3: Check APF queue depth
kubectl get --raw /debug/api_priority_and_fairness/dump_queues | python3 -m json.tool

# Step 4: Profile the apiserver (if --profiling=true)
# kubectl get --raw reuses your kubeconfig credentials, so no port-forward or token handling is needed
kubectl get --raw '/debug/pprof/goroutine?debug=2' > goroutine.txt

Production Checklist

15-Item apiserver Production Checklist
Each item gives the check, then how to verify it:

  1. anonymous-auth disabled
     Verify: ps aux | grep apiserver | grep anonymous-auth=false
  2. Authorization mode = Node,RBAC
     Verify: ps aux | grep apiserver | grep authorization-mode
  3. Audit logging enabled with production policy
     Verify: ls -la /var/log/kubernetes/audit.log
  4. Encryption at rest for Secrets
     Verify: etcdctl get /registry/secrets/default/test --print-value-only | head -c 16 (output should start with k8s:enc:)
  5. profiling disabled
     Verify: kubectl get --raw /debug/pprof/ should return 404 (an unauthenticated curl is rejected whether or not profiling is on)
  6. Cert expiry > 30 days
     Verify: kubeadm certs check-expiration
  7. TLS min version 1.2
     Verify: openssl s_client -tls1_1 -connect localhost:6443 should fail
  8. APF FlowSchemas configured for critical traffic
     Verify: kubectl get flowschemas
  9. NodeRestriction admission enabled
     Verify: check --enable-admission-plugins flag
  10. PodSecurity admission enabled
     Verify: check --enable-admission-plugins flag
  11. Aggregated API services all healthy
     Verify: kubectl get apiservices | grep -v True
  12. Resource requests set on static pod manifest
     Verify: grep resources /etc/kubernetes/manifests/kube-apiserver.yaml
  13. Alerting on error rate and latency
     Verify: check Prometheus/Alertmanager rules
  14. etcd client certs on separate CA from serving certs
     Verify: openssl x509 -in /etc/kubernetes/pki/apiserver-etcd-client.crt -noout -issuer
  15. Service account signing key rotation policy
     Verify: check if --service-account-key-file supports multiple keys for rotation

Dependency Graph and Next Files

This File Covers

  • Full request pipeline (7 stages)
  • Authentication methods (7)
  • Authorization modes + RBAC flow
  • Admission control (mutating/validating)
  • Watch machinery + watchCache
  • APF and concurrency control
  • TLS config and all cert identities
  • Aggregation layer
  • Audit logging
  • HA scaling
  • Key metrics + alerting rules