
kube-apiserver — Complete Internal Architecture

Full internals of the API server: request pipeline, authentication chains, authorization, admission, storage encoding, watch machinery, concurrency control, TLS, HA, and production tuning.

What Is kube-apiserver?

kube-apiserver is the single, central REST API gateway for the entire Kubernetes cluster. It is the only component that reads from and writes to etcd. All other components — scheduler, controller-manager, kubelet, kube-proxy — interact with the cluster exclusively through the apiserver. It is written in Go and lives at k8s.io/kubernetes/cmd/kube-apiserver.

Key properties:

Stateless: All persistent state lives in etcd. Multiple replicas can run simultaneously.
RESTful: Every Kubernetes object is a resource with standard CRUD HTTP verbs.
Declarative: Clients state desired state; the apiserver persists it; controllers act on it.
Extensible: CRDs, API aggregation, and webhooks all integrate through the same pipeline.

Internal Architecture

Figure 1: kube-apiserver internal pipeline. Every inbound request passes through TLS→Authn→Authz→Admission→Storage. The Watch subsystem feeds from the storage layer. The Aggregation Layer proxies requests to external API servers. APF enforces concurrency limits after authentication and before authorization.

Full Request Pipeline

Every write request to the apiserver passes through the following stages in order. A failure at any stage returns an HTTP error immediately — the request never reaches the next stage.

Stage 0: TLS Termination and HTTP/2 Multiplexing

The apiserver listens on --secure-port (default 6443) with TLS 1.2+ (TLS 1.3 preferred). It uses Go's net/http server with HTTP/2 enabled via golang.org/x/net/http2. HTTP/2 multiplexing allows a single TCP connection to carry many concurrent requests — this is why a single kubectl session or a single Informer connection can carry multiple Watch streams simultaneously without connection exhaustion.

Why HTTP/2 Matters

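Without HTTP/2, each Watch stream (from every Informer in every controller and kubelet) would require a dedicated TCP connection. Thirty controllers each running 10 Informers already account for 300 streams, and 1000 nodes whose kubelet and kube-proxy each hold several watches add thousands more. HTTP/2 multiplexing lets each client process carry all of its streams over one or a few connections, so the connection count scales with the number of clients rather than the number of watches.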

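The apiserver also used to expose a plaintext HTTP port (--insecure-port). It was deprecated in v1.10, disabled in v1.20 (the flag could only be set to 0), and the flag itself was removed in v1.24, so it cannot be re-enabled on current releases.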

Stage 0.5: API Priority and Fairness (APF)

Although numbered 0.5 here, APF runs after authentication and before authorization, because classification needs the authenticated user. Each incoming request is matched to a FlowSchema, which assigns it to a PriorityLevelConfiguration. This implements per-flow concurrency limits so that a misbehaving client or a "thundering herd" of reconciliations can't starve critical traffic.

# Example: FlowSchema that puts leader election requests in the leader-election priority level
apiVersion: flowcontrol.apiserver.k8s.io/v1   # GA in v1.29; use v1beta3 on v1.26 to v1.28
kind: FlowSchema
metadata:
  name: leader-election
spec:
  priorityLevelConfiguration:
    name: leader-election      # dedicated high-priority bucket
  matchingPrecedence: 100      # lower = higher precedence
  rules:
  - subjects:
    - kind: ServiceAccount     # if the component authenticates with an x509 cert, match kind: User instead
      serviceAccount:
        name: kube-controller-manager
        namespace: kube-system
    resourceRules:
    - verbs: ["get","update","patch"]
      apiGroups: ["coordination.k8s.io"]
      resources: ["leases"]
      namespaces: ["*"]        # needed for namespaced resources such as leases

# Inspect APF state
kubectl get flowschemas
kubectl get prioritylevelconfigurations
kubectl get --raw /debug/api_priority_and_fairness/dump_priority_levels
kubectl get --raw /debug/api_priority_and_fairness/dump_queues
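
The classification step can be sketched roughly as follows: among all FlowSchemas that match a request, the one with the lowest matchingPrecedence wins. The `FlowSchema` dataclass, the `classify` helper, and the sample schemas are hypothetical simplifications for illustration, not the apiserver's implementation; real matching also considers verbs, API groups, namespaces, and non-resource URLs.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FlowSchema:
    name: str
    matching_precedence: int   # lower value wins
    priority_level: str
    users: frozenset           # simplified subject match; "*" means any user
    resources: frozenset       # simplified resource match; "*" means any


def matches(fs: FlowSchema, user: str, resource: str) -> bool:
    user_ok = "*" in fs.users or user in fs.users
    resource_ok = "*" in fs.resources or resource in fs.resources
    return user_ok and resource_ok


def classify(schemas, user: str, resource: str) -> Optional[FlowSchema]:
    """Return the matching schema with the lowest matchingPrecedence."""
    candidates = [fs for fs in schemas if matches(fs, user, resource)]
    return min(candidates, key=lambda fs: fs.matching_precedence, default=None)


schemas = [
    FlowSchema("catch-all", 10000, "catch-all",
               frozenset({"*"}), frozenset({"*"})),
    FlowSchema("leader-election", 100, "leader-election",
               frozenset({"system:kube-controller-manager"}),
               frozenset({"leases"})),
]

chosen = classify(schemas, "system:kube-controller-manager", "leases")
print(chosen.name, "->", chosen.priority_level)
```

Both schemas match the lease request here; the lower precedence value decides which priority level the request is queued in.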

Stage 1: Authentication

The apiserver tries each configured authenticator in order until one succeeds or all fail. The request identity is a UserInfo struct containing: Username, UID, Groups, and Extra claims.

Method	Flag	Identity Source	Common Use
x509 client certificate	`--client-ca-file`	Certificate CN → username, O → groups	Control plane components, admin kubeconfig
Static token file	`--token-auth-file`	CSV file of token,user,uid,group	Dev/test only; no rotation
Bootstrap token	`--enable-bootstrap-token-auth`	Secret in kube-system namespace	Node TLS bootstrapping
ServiceAccount token (JWT)	always enabled	JWT signed by SA key, bound to SA/pod/node	In-cluster workloads
OIDC	`--oidc-issuer-url`, `--oidc-client-id`	JWT from OIDC provider (Dex, Okta, etc.)	Human users via SSO
Webhook token	`--authentication-token-webhook-config-file`	External HTTP endpoint validates token	Custom authn (cloud IAM, etc.)
Anonymous	`--anonymous-auth=true` (default)	Username: `system:anonymous`, Group: `system:unauthenticated`	Health checks; disable in production
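
The "first authenticator to succeed wins" behaviour can be sketched as a simple chain. Everything below (the `UserInfo` dataclass, the callable signature, the token value) is an illustrative assumption rather than the apiserver's Go code, and the sketch does not model a presented-but-invalid credential being rejected outright instead of falling through to anonymous.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class UserInfo:
    username: str
    uid: str = ""
    groups: list = field(default_factory=list)
    extra: dict = field(default_factory=dict)


# Hypothetical signature: return a UserInfo on success, None otherwise.
Authenticator = Callable[[dict], Optional[UserInfo]]


def static_token(tokens: dict) -> Authenticator:
    def check(request: dict) -> Optional[UserInfo]:
        header = request.get("Authorization", "")
        if not header.startswith("Bearer "):
            return None
        return tokens.get(header[len("Bearer "):])
    return check


def anonymous(request: dict) -> Optional[UserInfo]:
    return UserInfo("system:anonymous", groups=["system:unauthenticated"])


def authenticate_request(chain, request: dict) -> Optional[UserInfo]:
    for authenticator in chain:
        user = authenticator(request)
        if user is not None:
            return user      # first success wins; later ones are skipped
    return None              # every authenticator failed: respond 401


chain = [static_token({"s3cr3t": UserInfo("alice", groups=["dev"])}), anonymous]
print(authenticate_request(chain, {"Authorization": "Bearer s3cr3t"}).username)
print(authenticate_request(chain, {}).username)
```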

x509 Certificate Authentication — Internals

When a client presents a TLS client certificate, the apiserver validates the certificate chain against --client-ca-file. The Subject CN becomes the username; each Subject O becomes a group. This is how control plane components authenticate:

kube-scheduler cert: CN=system:kube-scheduler
kube-controller-manager cert: CN=system:kube-controller-manager
Node certs: CN=system:node:nodename, O=system:nodes
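Admin: CN=kubernetes-admin, O=system:masters (bypasses all authorization, including RBAC; kubeadm v1.29+ issues admin.conf with O=kubeadm:cluster-admins instead and keeps system:masters for super-admin.conf)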

# View your kubeconfig cert's identity
kubectl config view --minify --raw -o jsonpath='{.users[0].user.client-certificate-data}' | base64 -d | openssl x509 -noout -subject

# Check what groups a cert belongs to
openssl x509 -in /etc/kubernetes/pki/apiserver-kubelet-client.crt -noout -text | grep -E "Subject:|Issuer:"
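
The CN-to-username and O-to-groups mapping can be illustrated with a small parser over the subject line that `openssl x509 -noout -subject` prints. This is a naive sketch: `identity_from_subject` is a hypothetical helper, it splits on commas without handling escaped commas, and the apiserver itself reads the parsed certificate rather than a string.

```python
def identity_from_subject(subject: str):
    """Map an openssl-style subject line to (username, groups)."""
    if subject.startswith("subject="):
        subject = subject[len("subject="):]
    username, groups = None, []
    for part in subject.split(","):
        key, _, value = part.partition("=")
        key, value = key.strip(), value.strip()
        if key == "CN":
            username = value          # Common Name becomes the username
        elif key == "O":
            groups.append(value)      # each Organization becomes a group
    return username, groups


node = identity_from_subject("subject=O = system:nodes, CN = system:node:worker-1")
print(node)
```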

ServiceAccount Token — Bound Token Internals (v1.22+)

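Before v1.22, ServiceAccount tokens were long-lived JWTs stored in Secrets. Since v1.22, pods get bound service account tokens (bound to an audience, an expiry, and the pod), created on demand through the TokenRequest API and mounted via a projected volume; the kubelet refreshes them before they expire. Since v1.24, Secret-based tokens are no longer generated automatically for each ServiceAccount.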

# How kubelet requests a token for a pod (TokenRequest API call)
# POST /api/v1/namespaces/{ns}/serviceaccounts/{name}/token
{
  "apiVersion": "authentication.k8s.io/v1",
  "kind": "TokenRequest",
  "spec": {
    "audiences": ["https://kubernetes.default.svc"],
    "expirationSeconds": 3600,
    "boundObjectRef": {
      "apiVersion": "v1",
      "kind": "Pod",
      "name": "my-pod",
      "uid": "..."
    }
  }
}

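# Inspect a projected service account token mounted in a pod
# (JWT segments are unpadded base64url, so plain `base64 -d` can fail; no -it, since a TTY would corrupt the pipe)
kubectl exec mypod -- cat /var/run/secrets/kubernetes.io/serviceaccount/token | \
  cut -d. -f2 | python3 -c 'import sys, base64, json; p = sys.stdin.read().strip(); print(json.dumps(json.loads(base64.urlsafe_b64decode(p + "=" * (-len(p) % 4))), indent=2))'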
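
For scripted inspection, the same decoding can be done in a few lines of Python. The sketch below builds a fake, unsigned token with made-up claims so it runs without a cluster; `decode_jwt_payload` and `b64url` are hypothetical helper names, and nothing here verifies the signature.

```python
import base64
import json


def decode_jwt_payload(token: str) -> dict:
    """Decode (without verifying) the claims segment of a JWT."""
    payload = token.split(".")[1]
    payload += "=" * (-len(payload) % 4)   # restore stripped base64url padding
    return json.loads(base64.urlsafe_b64decode(payload))


def b64url(data: dict) -> str:
    raw = json.dumps(data).encode()
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode()


# Illustrative claims only; a real bound token carries more fields.
sample_claims = {
    "aud": ["https://kubernetes.default.svc"],
    "exp": 1700003600,
    "iat": 1700000000,
    "sub": "system:serviceaccount:default:myapp",
}
sample_token = ".".join([b64url({"alg": "RS256"}), b64url(sample_claims), "sig"])

claims = decode_jwt_payload(sample_token)
print(claims["sub"], "lifetime:", claims["exp"] - claims["iat"], "seconds")
```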

Stage 2: Authorization

After authentication succeeds, the apiserver evaluates whether the identified user (UserInfo) is allowed to perform the requested action (verb on resource in namespace). Multiple authorizers are tried in order; the first to return allow or deny wins. If all return no opinion, the request is denied.

Authorizer	Description	Production Use
`Node`	Special authorizer for kubelet: nodes can only read/write resources bound to their own node	Always enable alongside RBAC
`RBAC`	Role/ClusterRole + RoleBinding/ClusterRoleBinding evaluated at request time	Primary authorizer in all production clusters
`ABAC`	Static policy file; can't be changed without restart	Deprecated; use RBAC
`Webhook`	Calls external HTTP service to make authz decision (SubjectAccessReview)	OPA-based custom authz, cloud IAM integration
`AlwaysAllow`	Allows everything	Dev only; never production
`AlwaysDeny`	Denies everything	Testing only

RBAC evaluation flow: The apiserver collects all RoleBindings and ClusterRoleBindings that reference the user (or any of their groups). For each binding, it checks whether the referenced Role/ClusterRole grants the requested verb on the requested resource. If any binding grants the permission, the request is allowed.
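
A minimal sketch of that flow, combined with the "first allow or deny wins, otherwise deny" chain described above. The `Rule` and `Binding` types are hypothetical simplifications (a binding here carries the rules of the role it references), and the sketch ignores API groups, resourceNames, and non-resource URLs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    verbs: tuple
    resources: tuple


@dataclass(frozen=True)
class Binding:
    subjects: tuple    # usernames or group names (simplified)
    namespace: str     # "" stands for a ClusterRoleBinding
    rules: tuple       # rules of the referenced Role/ClusterRole


def rule_allows(rule, verb, resource):
    return (("*" in rule.verbs or verb in rule.verbs)
            and ("*" in rule.resources or resource in rule.resources))


def rbac_authorize(bindings, user, groups, verb, resource, namespace):
    """Return "allow" or "no opinion"; RBAC has no deny rules."""
    identities = {user, *groups}
    for b in bindings:
        if not identities & set(b.subjects):
            continue                      # binding does not reference this user
        if b.namespace not in ("", namespace):
            continue                      # RoleBinding in another namespace
        if any(rule_allows(r, verb, resource) for r in b.rules):
            return "allow"
    return "no opinion"


def authorize(authorizers, *request):
    for authz in authorizers:
        decision = authz(*request)
        if decision in ("allow", "deny"):
            return decision               # first definite answer wins
    return "deny"                         # nobody had an opinion


bindings = [Binding(("dev",), "default", (Rule(("get", "list"), ("pods",)),))]
rbac = lambda *req: rbac_authorize(bindings, *req)
print(authorize([rbac], "alice", ["dev"], "list", "pods", "default"))
print(authorize([rbac], "alice", ["dev"], "delete", "pods", "default"))
```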

# Test authorization decisions (SubjectAccessReview)
kubectl auth can-i create pods --namespace default
kubectl auth can-i create pods --as developer --namespace default
kubectl auth can-i '*' '*' --as admin --as-group system:masters   # cluster-admin test; system:masters is a group, "admin" is an arbitrary username

# Check what a ServiceAccount can do
kubectl auth can-i list secrets --as system:serviceaccount:default:myapp

# Access review via the API (the review is never persisted, so no dry-run is needed; the answer is in status.allowed)
kubectl create -f - -o yaml << EOF
apiVersion: authorization.k8s.io/v1
kind: SelfSubjectAccessReview
spec:
  resourceAttributes:
    namespace: default
    verb: create
    resource: pods
EOF

Stage 3: Admission Control

Admission controllers run after authentication and authorization but before the object is persisted to etcd. Two phases:

Mutating Admission (Phase 1)

Can modify the incoming object. Run sequentially. Built-in mutating plugins:

DefaultStorageClass — adds default storage class to PVCs with no class
DefaultTolerationSeconds — adds default tolerations for node taints
MutatingAdmissionWebhook — calls external webhooks serially, one after another; applies their JSON patches
NamespaceLifecycle — rejects creates in terminating namespaces
ServiceAccount — auto-mounts SA token, sets imagePullSecrets
PodSecurity — not a mutating plugin; it only validates (Phase 2) and never modifies pods

After all mutating webhooks run, the object is re-validated against the schema (to catch webhook mutations that broke structure).

Validating Admission (Phase 2)

Can only allow or reject. Validating webhooks are called in parallel; built-in plugins run in order. Built-in validating plugins:

PodSecurity — enforces Pod Security Standards (Privileged/Baseline/Restricted)
ResourceQuota — rejects if the request would exceed namespace quota
LimitRanger — sets default limits; rejects if outside min/max range
NodeRestriction — limits what resources a kubelet can modify
ValidatingAdmissionWebhook — calls external webhooks in parallel
ValidatingAdmissionPolicy (CEL; alpha in v1.26, beta in v1.28, GA in v1.30) — in-process policy evaluation
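
The two phases can be sketched as a chain of mutators, each receiving the previous one's output, followed by validators that can only accept or reject. The plugin functions below are hypothetical stand-ins, not the built-in plugins, and the `standard` StorageClass name and `team` label are made up for the example.

```python
import copy


def default_storage_class(obj):
    # Mutator: fill in a default class on PVCs that have none.
    if obj.get("kind") == "PersistentVolumeClaim":
        obj.setdefault("spec", {}).setdefault("storageClassName", "standard")
    return obj


def add_team_label(obj):
    # Mutator: sees the object as modified by the previous mutator.
    obj.setdefault("metadata", {}).setdefault("labels", {})["team"] = "platform"
    return obj


def require_name(obj):
    # Validator: returns an error string, or None to allow.
    if not obj.get("metadata", {}).get("name"):
        return "metadata.name is required"
    return None


def admit(obj, mutators, validators):
    obj = copy.deepcopy(obj)              # never modify the caller's object
    for mutate in mutators:               # phase 1: sequential
        obj = mutate(obj)
    errors = [e for e in (v(obj) for v in validators) if e]   # phase 2
    if errors:
        raise ValueError("; ".join(errors))
    return obj


pvc = {"kind": "PersistentVolumeClaim", "metadata": {"name": "data"}}
admitted = admit(pvc, [default_storage_class, add_team_label], [require_name])
print(admitted)
```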

# Check active admission plugins
kube-apiserver --help | grep enable-admission-plugins
# Or read from the running process:
ps aux | grep kube-apiserver | tr ' ' '\n' | grep admission

# Typical production set:
# --enable-admission-plugins=NodeRestriction,PodSecurity,ResourceQuota,LimitRanger,\
#   ServiceAccount,DefaultStorageClass,DefaultTolerationSeconds,\
#   MutatingAdmissionWebhook,ValidatingAdmissionWebhook,\
#   Priority,StorageObjectInUseProtection,PersistentVolumeClaimResize

Stage 4: Schema Validation

Built-in types are validated by hand-written Go validation functions; custom resources are validated against the OpenAPI v3 structural schema declared in their CRD. Validation checks:

Required fields are present
Field types match (string vs int)
Enum values are valid
Pattern constraints (e.g., DNS label format for names)
Custom validation via CEL x-kubernetes-validations in CRDs

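Decoding rejects malformed input (wrong types, and unknown fields under strict field validation) before admission. Full object validation runs after mutating admission, so webhook-introduced invalidity is caught before validating admission and persistence.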

Stage 5: Storage (etcd Persistence)

If all previous stages pass, the object is persisted to etcd. The storage layer performs:

Version conversion: The object is converted from the API version the client sent to the internal "hub" version, then to the storage version. Kubernetes maintains internal versions (not exposed to clients) as a conversion hub between all external versions.
Protobuf encoding: For built-in types, the storage-version object is serialized to protobuf (not JSON) for storage efficiency; custom resources are stored as JSON. The etcd key prefix /registry/ differentiates Kubernetes objects from other etcd data.
Optimistic locking: The resourceVersion from the client's request is compared with the current etcd modRevision. If they differ, a 409 Conflict is returned. This prevents lost updates.
Encryption at rest (optional): Objects passing through the storage layer can be encrypted before writing to etcd using a configured KMS provider or AES key.

# EncryptionConfiguration — encrypt Secrets at rest
apiVersion: apiserver.config.k8s.io/v1
kind: EncryptionConfiguration
resources:
- resources: ["secrets"]
  providers:
  - aescbc:
      keys:
      - name: key1
        secret:    # base64-encoded random key goes here (for example 32 bytes)
  - identity: {}   # fallback for unencrypted (reads only)

# Verify encryption is working
kubectl create secret generic test --from-literal=key=value
# Read directly from etcd — should see encrypted bytes, not JSON
ETCDCTL_API=3 etcdctl get /registry/secrets/default/test \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/peer.crt \
  --key=/etc/kubernetes/pki/etcd/peer.key | hexdump -C | head
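
The optimistic locking step described above is a compare-and-swap on resourceVersion, usually paired with a client-side read-modify-write retry loop. The sketch below uses a hypothetical in-memory `FakeStore` in place of etcd and a `Conflict` exception in place of HTTP 409; it illustrates the pattern only.

```python
class Conflict(Exception):
    """Stands in for HTTP 409 Conflict."""


class FakeStore:
    def __init__(self):
        self.revision = 0
        self.objects = {}    # key -> (resource_version, value)

    def get(self, key):
        return self.objects[key]

    def create(self, key, value):
        self.revision += 1
        self.objects[key] = (self.revision, value)
        return self.revision

    def update(self, key, value, resource_version):
        current_rv, _ = self.objects[key]
        if resource_version != current_rv:      # someone else wrote first
            raise Conflict(f"{key}: have rv {current_rv}, got {resource_version}")
        self.revision += 1
        self.objects[key] = (self.revision, value)
        return self.revision


def update_with_retry(store, key, mutate, attempts=5):
    """Re-read and re-apply the change until the write is not stale."""
    for _ in range(attempts):
        rv, value = store.get(key)
        try:
            return store.update(key, mutate(value), rv)
        except Conflict:
            continue
    raise Conflict(f"{key}: gave up after {attempts} attempts")


store = FakeStore()
rv1 = store.create("deployments/default/web", {"replicas": 1})
rv2 = store.update("deployments/default/web", {"replicas": 2}, rv1)
try:
    store.update("deployments/default/web", {"replicas": 9}, rv1)  # stale rv
    stale_rejected = False
except Conflict:
    stale_rejected = True
rv3 = update_with_retry(store, "deployments/default/web",
                        lambda v: {**v, "replicas": v["replicas"] + 1})
print(rv1, rv2, rv3, stale_rejected)
```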

Watch Machinery — Internals

The Watch API is one of the most critical and subtle parts of the apiserver. Every Informer in every controller, every kubelet, and every kube-proxy uses Watch to receive real-time updates.

watchCache (In-Memory Cache)

The apiserver maintains an in-memory circular buffer of events called the watchCache. When an object is written to etcd, the etcd watch notifies the apiserver, which updates the watchCache. Client Watch requests are served from this cache, not from etcd directly — this dramatically reduces etcd read load in large clusters.

Watch Protocol — HTTP Chunked Streaming

A Watch request is a long-lived HTTP GET with query parameter ?watch=true&resourceVersion=XYZ. The apiserver holds the connection open and sends newline-delimited JSON objects (chunked transfer encoding) as events occur. Each event has type ADDED, MODIFIED, DELETED, BOOKMARK, or ERROR.

# Watch pods raw HTTP stream
kubectl get pods --watch -v=8 2>&1 | grep -A2 "GET.*watch=true"

# Directly via curl (requires a valid token)
TOKEN=$(kubectl create token default)
curl -sk "https://localhost:6443/api/v1/pods?watch=true&resourceVersion=0" \
  -H "Authorization: Bearer $TOKEN" | head -50

# Each line is one JSON event:
# {"type":"ADDED","object":{"kind":"Pod","metadata":{"name":"nginx","resourceVersion":"12345"},...}}
# {"type":"MODIFIED",...}
# {"type":"BOOKMARK","object":{"kind":"Pod","metadata":{"resourceVersion":"99999"}}}  ← progress marker

When the apiserver sends a 410 Gone (because the requested resourceVersion has been compacted out of the watchCache circular buffer), the Informer performs a full relist — a fresh List followed by a new Watch starting at the returned resourceVersion.
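
The ring-buffer behaviour and the 410 fallback can be sketched as follows. `WatchCache`, `Gone`, and `resume_or_relist` are hypothetical names for this illustration; the real watchCache also indexes current objects and sizes itself dynamically.

```python
from collections import deque


class Gone(Exception):
    """Stands in for HTTP 410 Gone."""


class WatchCache:
    def __init__(self, capacity):
        self.events = deque(maxlen=capacity)   # oldest events fall off
        self.dropped_rv = 0                    # highest resourceVersion evicted

    def add(self, rv, event_type, name):
        if len(self.events) == self.events.maxlen:
            self.dropped_rv = self.events[0][0]   # about to be evicted
        self.events.append((rv, event_type, name))

    def events_since(self, rv):
        # A client that last saw `rv` needs every event after it. If any of
        # those were already evicted, the gap cannot be filled.
        if rv < self.dropped_rv:
            raise Gone(f"too old resource version: {rv}")
        return [e for e in self.events if e[0] > rv]


def resume_or_relist(cache, last_rv, list_fn):
    """Resume the watch if possible; otherwise fall back to a full List."""
    try:
        return "watch", cache.events_since(last_rv)
    except Gone:
        return "relist", list_fn()


cache = WatchCache(capacity=3)
for rv in range(1, 6):
    cache.add(rv, "MODIFIED", "nginx")

print(resume_or_relist(cache, 3, lambda: ["full list"]))
print(resume_or_relist(cache, 1, lambda: ["full list"]))
```

With capacity 3 and five events written, versions 1 and 2 have been evicted, so a client resuming from version 1 has missed event 2 and must relist.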

Concurrency Control and Rate Limiting

Max In-Flight Requests

The apiserver has two semaphores controlling maximum concurrent requests:

--max-requests-inflight: Non-mutating (GET/LIST/WATCH) requests (default: 400)
--max-mutating-requests-inflight: Mutating (POST/PUT/PATCH/DELETE) requests (default: 200)

When these limits are hit, new requests receive 429 Too Many Requests with a Retry-After header. APF (if enabled) replaces this with more fine-grained per-flow queuing.
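
The semaphore behaviour can be sketched with a non-blocking acquire: when no slot is free the request is rejected immediately instead of waiting. `InflightLimiter` is a hypothetical class for illustration, and the Retry-After value is an assumption of this sketch.

```python
import threading


class InflightLimiter:
    def __init__(self, limit):
        self._sem = threading.BoundedSemaphore(limit)

    def handle(self, handler):
        # Non-blocking acquire: reject rather than queue when saturated.
        if not self._sem.acquire(blocking=False):
            return 429, {"Retry-After": "1"}
        try:
            return 200, handler()
        finally:
            self._sem.release()      # free the slot even if handler raises


limiter = InflightLimiter(limit=1)


def slow_request():
    # While this request holds the only slot, a second one is rejected.
    return limiter.handle(lambda: "second request")


status, nested = limiter.handle(slow_request)
print(status, nested)
print(limiter.handle(lambda: "ok"))
```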

etcd Request Rate Limiting

The apiserver itself doesn't rate-limit outbound etcd requests. Pressure on etcd comes from:

High write rates (many creates/updates/deletes per second)
Excessive LIST requests that bypass the watchCache (use resourceVersion=0 for cache reads)
Large objects (e.g., ConfigMaps with large data) that slow etcd serialization

# Monitor etcd pressure from apiserver metrics
curl -sk https://localhost:6443/metrics | grep etcd_request_duration

# Check if LIST requests are hitting etcd vs cache
# resourceVersion="" → must hit etcd (consistent read)
# resourceVersion="0" → may serve from watchCache (possibly stale)
# resourceVersion="N" (a specific version) → serve from cache at a version not older than N

# Force consistent read (expensive)
kubectl get pods --request-timeout=30s -v=9 2>&1 | grep resourceVersion

TLS Configuration Details

The apiserver uses multiple TLS identities simultaneously:

Identity	Cert File Flag	Key File Flag	Purpose
Serving cert (HTTPS)	`--tls-cert-file`	`--tls-private-key-file`	Presented to all clients connecting to :6443
Client CA	`--client-ca-file`	—	Verifies x509 client certificates during authn
etcd client cert	`--etcd-certfile`	`--etcd-keyfile`	apiserver's mTLS identity when connecting to etcd
etcd CA	`--etcd-cafile`	—	Verifies etcd server certificate
kubelet client cert	`--kubelet-client-certificate`	`--kubelet-client-key`	apiserver's identity when connecting to kubelet :10250
Front-proxy cert	`--proxy-client-cert-file`	`--proxy-client-key-file`	Used when proxying to aggregated API servers
ServiceAccount signing key	`--service-account-key-file`	`--service-account-signing-key-file`	Public key for verifying SA tokens; private for issuing

# Verify TLS configuration
openssl s_client -connect localhost:6443 -showcerts </dev/null 2>/dev/null | openssl x509 -noout -text | grep -E "Subject:|DNS:|IP Address:"

# Check SANs on the apiserver cert (must include all CP node IPs and DNS names)
openssl x509 -in /etc/kubernetes/pki/apiserver.crt -noout -text | grep -A 10 "Subject Alternative Name"

# Typical SANs for apiserver cert:
# DNS:kubernetes, DNS:kubernetes.default, DNS:kubernetes.default.svc
# DNS:kubernetes.default.svc.cluster.local, DNS:cp-node-1
# IP:10.96.0.1 (cluster IP), IP:192.168.1.10 (CP node IP), IP:127.0.0.1

SAN Mismatch — Most Common TLS Error

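If kubelet or kubectl can't connect to the apiserver with a certificate error like "x509: certificate is valid for ..., not ...", the apiserver cert is missing a SAN for the address being used (VIP, new IP, new DNS name). Fix: regenerate the apiserver cert with the correct SANs. With kubeadm: add the names to certSANs in the ClusterConfiguration (kubeadm-config ConfigMap), move the old apiserver.crt and apiserver.key aside, and run kubeadm init phase certs apiserver with that configuration. Running kubeadm certs renew apiserver (kubeadm alpha certs renew before v1.20) is not enough, because renewal copies the SANs from the existing certificate.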

API Aggregation Layer

The aggregation layer allows external API servers (called "extension API servers") to serve custom API groups under the Kubernetes API tree. This is different from CRDs — CRDs serve resources through the built-in apiserver; aggregated API servers are separate processes that handle their own storage.

# APIService registers an external API server for a group/version
apiVersion: apiregistration.k8s.io/v1
kind: APIService
metadata:
  name: v1beta1.metrics.k8s.io
spec:
  service:
    name: metrics-server
    namespace: kube-system
    port: 443
  group: metrics.k8s.io
  version: v1beta1
  insecureSkipTLSVerify: false       # verify metrics-server cert
  caBundle:                          # base64-encoded CA bundle that signed the metrics-server serving cert
  groupPriorityMinimum: 100
  versionPriority: 100

# List all registered API services (includes both built-in and aggregated)
kubectl get apiservices
# Look for: v1beta1.metrics.k8s.io, v1.admissionregistration.k8s.io, etc.

# Check status of aggregated services
kubectl get apiservices | grep -v True   # show any unavailable ones

# Debug: test if metrics-server is reachable via aggregation
kubectl get --raw /apis/metrics.k8s.io/v1beta1/nodes
kubectl top nodes   # uses aggregated metrics API

Audit Logging

The apiserver can log every request and response to an audit log. This is essential for compliance (SOC2, PCI, HIPAA) and incident investigation. Audit events are structured JSON with full request details, user identity, response code, and object changes.

# Production-grade audit policy
apiVersion: audit.k8s.io/v1
kind: Policy
rules:
# Log requests to Secrets, ConfigMaps, and token requests at Metadata level only;
# Request or RequestResponse would write secret values and tokens into the audit log
- level: Metadata
  resources:
  - group: ""
    resources: ["secrets", "configmaps", "serviceaccounts/token"]

# Log RBAC changes at RequestResponse
- level: RequestResponse
  resources:
  - group: "rbac.authorization.k8s.io"
    resources: ["roles", "clusterroles", "rolebindings", "clusterrolebindings"]

# Log exec/attach/portforward
- level: Request
  resources:
  - group: ""
    resources: ["pods/exec", "pods/attach", "pods/portforward"]

# Log all other resource modifications at Metadata level
- level: Metadata
  verbs: ["create", "update", "patch", "delete"]

# Don't log read operations on non-sensitive resources
- level: None
  resources:
  - group: ""
    resources: ["events"]
  verbs: ["get", "list", "watch"]

# Default: log metadata for everything else
- level: Metadata
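
Audit rules are evaluated top to bottom and the first matching rule decides the level, which is why rule order matters in a policy like the one above. The sketch below shows that evaluation on a reduced, hypothetical rule set; `audit_level` is an illustrative helper, and real matching also covers users, namespaces, and non-resource URLs.

```python
def audit_level(rules, verb, group, resource):
    """Return the level of the first rule that matches the request."""
    for rule in rules:
        if "verbs" in rule and verb not in rule["verbs"]:
            continue
        if "resources" in rule and not any(
            group == gr["group"] and resource in gr["resources"]
            for gr in rule["resources"]
        ):
            continue
        return rule["level"]
    return "None"    # no rule matched: nothing is logged


rules = [
    {"level": "RequestResponse",
     "resources": [{"group": "rbac.authorization.k8s.io",
                    "resources": ["roles", "clusterroles"]}]},
    {"level": "Metadata", "verbs": ["create", "update", "patch", "delete"]},
    {"level": "None", "verbs": ["get", "list", "watch"],
     "resources": [{"group": "", "resources": ["events"]}]},
    {"level": "Metadata"},
]

print(audit_level(rules, "update", "rbac.authorization.k8s.io", "roles"))
print(audit_level(rules, "get", "", "events"))
print(audit_level(rules, "create", "", "events"))
```

Creating an Event is logged at Metadata here because the modification rule precedes the events rule, while reading Events is dropped.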

# apiserver flags for audit
kube-apiserver \
  --audit-log-path=/var/log/kubernetes/audit.log \
  --audit-policy-file=/etc/kubernetes/audit-policy.yaml \
  --audit-log-maxage=30 \
  --audit-log-maxbackup=10 \
  --audit-log-maxsize=100    # MB

# Parse audit logs
cat /var/log/kubernetes/audit.log | jq 'select(.verb=="create" and .responseStatus.code==201)'
cat /var/log/kubernetes/audit.log | jq 'select(.user.username != "system:serviceaccount:kube-system:node-controller" and .verb=="delete")'

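# Who deleted what in the last hour? (fractional seconds are stripped so jq can parse the timestamp)
cat /var/log/kubernetes/audit.log | jq -r 'select(.verb=="delete" and ((.requestReceivedTimestamp | sub("\\.[0-9]+Z$"; "Z") | fromdateiso8601) > (now - 3600))) | [.requestReceivedTimestamp, .user.username, .objectRef.resource, .objectRef.name] | @tsv'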

HA Deployment and Horizontal Scaling

The apiserver is designed for horizontal scaling. Add replicas freely — each instance connects to the same etcd cluster and shares the same state. However, several practical considerations apply:

Load Balancer Configuration

# HAProxy configuration for HA apiserver (TCP mode)
frontend kubernetes-api
  bind *:6443
  mode tcp
  option tcplog
  default_backend kubernetes-api-servers

backend kubernetes-api-servers
  mode tcp
  option tcp-check
  balance roundrobin
  server cp1 192.168.1.10:6443 check
  server cp2 192.168.1.11:6443 check
  server cp3 192.168.1.12:6443 check

Watch Reconnect on LB Failover

When the LB routes a Watch connection to a different apiserver replica (e.g., because the old replica died), the client opens a new connection and resumes the Watch from its last seen resourceVersion; it falls back to a full re-List only if the new replica answers 410 Gone. This is expected and handled automatically by client-go. However, if ALL apiserver replicas die simultaneously, all controllers and kubelets lose their Watch connections. Existing pods continue running (they don't need the apiserver to operate), but no new scheduling or reconciliation occurs until connectivity is restored.

Scaling Considerations

Cluster Size	Recommended apiserver Replicas	Notes
< 50 nodes	1–2	Development/staging. 2 for HA.
50–500 nodes	3	Standard production HA
500–2000 nodes	3–5	Tune `--max-requests-inflight`
> 2000 nodes	5+	Shard by namespace or use APF aggressively

Critical Flags Reference

Full Production-Relevant Flags

Flag	Default	Production Value	Why
`--secure-port`	6443	6443	Standard; don't change
`--insecure-port`	0	0	Must be 0; removed v1.24
`--anonymous-auth`	true	false	Disable to prevent anonymous API access
`--authorization-mode`	AlwaysAllow	Node,RBAC	Critical: AlwaysAllow in dev is dangerous
`--enable-admission-plugins`	few	see above	Enable security plugins
`--audit-log-path`	""	/var/log/...	Required for compliance
`--encryption-provider-config`	""	config file	Encrypt Secrets at rest
`--max-requests-inflight`	400	1200 (large)	Tune per cluster size
`--max-mutating-requests-inflight`	200	400 (large)	Tune per cluster size
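`--request-timeout`	60s	60s	Does not govern watches; their lifetime is set by `--min-request-timeout` (default 1800s)
`--watch-cache-sizes`	auto	leave unset	Recent releases size the cache dynamically; only `resource#0` (disable) is meaningful
`--default-watch-cache-size`	100	leave unset	Deprecated; events buffered per resource type on older releases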
`--profiling`	true	false	Disable in production; exposes /debug/pprof
`--enable-priority-and-fairness`	true (v1.20+)	true	APF replaces old max-inflight when enabled
`--tls-min-version`	TLS1.2	VersionTLS12	Never TLS1.0/1.1
`--tls-cipher-suites`	defaults	restrict to strong ciphers	Disable RC4, 3DES

Key Metrics to Monitor

# Get all apiserver metrics
kubectl get --raw /metrics | grep "^apiserver_"

# Request latency (most important SLO metric)
apiserver_request_duration_seconds_bucket{verb="GET",resource="pods"}
apiserver_request_duration_seconds_bucket{verb="POST",resource="pods"}

# Request rate and error rate
apiserver_request_total{code="200"}
apiserver_request_total{code="500"}
apiserver_request_total{code="429"}    # rate limiting

# Watch connection count
apiserver_registered_watchers          # total active watch connections (if absent on your release, try apiserver_longrunning_requests{verb="WATCH"})
apiserver_watch_events_total           # events sent per resource

# etcd latency from apiserver's perspective
etcd_request_duration_seconds_bucket{operation="get"}
etcd_request_duration_seconds_bucket{operation="update"}

# Cache state
apiserver_cache_list_fetched_objects_total
apiserver_cache_list_returned_objects_total  # should be ≤ fetched; a large gap means heavy server-side filtering

# APF metrics
apiserver_flowcontrol_current_inqueue_requests
apiserver_flowcontrol_current_executing_requests
apiserver_flowcontrol_rejected_requests_total

Prometheus alerting rules to configure:

- alert: APIServerHighErrorRate
  expr: sum(rate(apiserver_request_total{code=~"5.."}[5m])) / sum(rate(apiserver_request_total[5m])) > 0.01
  for: 5m
  annotations:
    summary: "apiserver error rate > 1%"

- alert: APIServerHighLatency
  expr: histogram_quantile(0.99, sum by (le, verb) (rate(apiserver_request_duration_seconds_bucket{verb!~"WATCH|CONNECT"}[5m]))) > 1
  for: 5m
  annotations:
    summary: "p99 API latency > 1s"

- alert: APIServerRateLimiting
  expr: sum(rate(apiserver_request_total{code="429"}[5m])) > 0
  for: 1m
  annotations:
    summary: "API server is rate-limiting requests"

Troubleshooting kube-apiserver

Startup Failures

# Check static pod is being attempted
journalctl -u kubelet --since "5 min ago" | grep -i apiserver

# Check if the process is running
crictl ps --name kube-apiserver
crictl logs $(crictl ps --name kube-apiserver -q) 2>&1 | tail -50

# Common startup failures:
# 1. etcd not reachable
#    Log: "Failed to connect to etcd"
#    Fix: check etcd status, verify --etcd-servers flag, check firewall

# 2. Certificate errors
#    Log: "Failed to read or parse CA cert"
#    Fix: verify cert file paths in manifest; check cert not expired

# 3. Port already in use
#    Log: "bind: address already in use"
#    Fix: check what's on :6443; likely another apiserver or lingering process

# 4. Bad flag/config
#    Log: "Error: invalid value for flag"
#    Fix: validate manifest YAML and flag values

503 Service Unavailable

# 503 from apiserver usually means the apiserver itself is healthy but
# a backend (aggregated API server) is down

# Check aggregated API services
kubectl get apiservices | grep -v True
kubectl describe apiservice v1beta1.metrics.k8s.io  # see Status > Conditions (Available, Reason, Message)

# If metrics-server is down and HPA depends on it, HPA will fail
# Fix: debug the metrics-server pods
kubectl get pods -n kube-system -l k8s-app=metrics-server
kubectl logs -n kube-system -l k8s-app=metrics-server

Watch Streams Stuck / Stale Informers

# Symptoms: controller reconciles old state; kubectl get shows stale data
# Cause: watch connection was silently dropped without 410 error

# A periodic informer resync only replays the local cache to event handlers;
# it does not refetch from the apiserver. Restart the controller to force a fresh List+Watch.

# Check apiserver watch metrics
kubectl get --raw /metrics | grep apiserver_registered_watchers

# Check for 410 Gone events causing relists (expected but should be infrequent)
# In controller logs, look for:
# "watch closed with: very old resource version" → relist triggered

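# On older releases, increase the watchCache size for frequently-watched resources.
# Recent releases size the cache dynamically and treat every non-zero value the same.
kube-apiserver --watch-cache-sizes=pods#2000,nodes#500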

Slow API Responses

# Step 1: Check if the slowness is in etcd or apiserver processing
kubectl get --raw /metrics | grep "etcd_request_duration"

# High etcd latency (p99 > 100ms):
# - etcd disk I/O problem → check iostat, ensure SSD
# - etcd leader election in progress → check etcd_server_leader_changes_seen_total
# - Large objects → check etcd_mvcc_db_total_size_in_bytes

# Step 2: Check apiserver admission webhook latency
kubectl get --raw /metrics | grep "apiserver_admission_webhook_admission_duration"
# Slow mutating/validating webhooks block every write request
# Fix: add timeout to webhook config, or disable slow webhooks

# Step 3: Check APF queue depth
kubectl get --raw /debug/api_priority_and_fairness/dump_queues   # output is comma-separated text, not JSON

# Step 4: Profile the apiserver (if --profiling=true)
kubectl get --raw '/debug/pprof/goroutine?debug=2' > goroutine.txt

Production Checklist

15-Item apiserver Production Checklist

#	Check	Command to Verify
1	anonymous-auth disabled	`ps aux \| grep apiserver \| grep anonymous-auth=false`
2	Authorization mode = Node,RBAC	`ps aux \| grep apiserver \| grep authorization-mode`
3	Audit logging enabled with production policy	`ls -la /var/log/kubernetes/audit.log`
4	Encryption at rest for Secrets	`etcdctl get /registry/secrets/default/test \| grep -a -c k8s:enc` should print a count of at least 1
5	profiling disabled	`kubectl get --raw /debug/pprof/` as an admin should return NotFound (404)
6	Cert expiry > 30 days	`kubeadm certs check-expiration`
7	TLS min version 1.2	`openssl s_client -tls1_1 -connect localhost:6443` should fail
8	APF FlowSchemas configured for critical traffic	`kubectl get flowschemas`
9	NodeRestriction admission enabled	check `--enable-admission-plugins` flag
10	PodSecurity admission enabled	check `--enable-admission-plugins` flag
11	Aggregated API services all healthy	`kubectl get apiservices \| grep -v True`
12	Resource requests set on static pod manifest	`grep resources /etc/kubernetes/manifests/kube-apiserver.yaml`
13	Alerting on error rate and latency	check Prometheus/Alertmanager rules
14	etcd client certs on separate CA from serving certs	`openssl x509 -in /etc/kubernetes/pki/apiserver-etcd-client.crt -noout -issuer`
15	Service account signing key rotation policy	check if `--service-account-key-file` supports multiple keys for rotation

Dependency Graph and Next Files

This File Covers

Full request pipeline (7 stages)
Authentication methods (7)
Authorization modes + RBAC flow
Admission control (mutating/validating)
Watch machinery + watchCache
APF and concurrency control
TLS config and all cert identities
Aggregation layer
Audit logging
HA scaling
Key metrics + alerting rules