kube-apiserver — Complete Internal Architecture
Full internals of the API server: request pipeline, authentication chains, authorization, admission, storage encoding, watch machinery, concurrency control, TLS, HA, and production tuning.
What Is kube-apiserver?
kube-apiserver is the single, central REST API gateway for the entire Kubernetes cluster. It is the only component that reads from and writes to etcd. All other components — scheduler, controller-manager, kubelet, kube-proxy — interact with the cluster exclusively through the apiserver. It is written in Go and lives at k8s.io/kubernetes/cmd/kube-apiserver.
Key properties:
- Stateless: All persistent state lives in etcd. Multiple replicas can run simultaneously.
- RESTful: Every Kubernetes object is a resource with standard CRUD HTTP verbs.
- Declarative: Clients state desired state; the apiserver persists it; controllers act on it.
- Extensible: CRDs, API aggregation, and webhooks all integrate through the same pipeline.
Internal Architecture
Figure 1: kube-apiserver internal pipeline. Every inbound request passes through TLS→Authn→Authz→Admission→Storage. The Watch subsystem feeds from the storage layer. The Aggregation Layer proxies requests to external API servers. APF enforces concurrency limits after authentication and before authorization.
Full Request Pipeline
Every write request to the apiserver passes through the following stages in order. A failure at any stage returns an HTTP error immediately — the request never reaches the next stage.
Stage 0: TLS Termination and HTTP/2 Multiplexing
The apiserver listens on --secure-port (default 6443) with TLS 1.2+ (TLS 1.3 preferred). It uses Go's net/http server with HTTP/2 enabled via golang.org/x/net/http2. HTTP/2 multiplexing allows a single TCP connection to carry many concurrent requests — this is why a single kubectl session or a single Informer connection can carry multiple Watch streams simultaneously without connection exhaustion.
The apiserver also used to expose a plain-HTTP port (--insecure-port). It was deprecated in v1.10, could only be set to 0 from v1.20, and the flag was removed in v1.24. Do NOT try to re-enable it.
Stage 0.5: API Priority and Fairness (APF)
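Each incoming request is classified by a FlowSchema and assigned to a PriorityLevelConfiguration. FlowSchemas match on the request's user, groups, or ServiceAccount, so in the handler chain the APF filter runs after authentication and before authorization. This implements per-flow concurrency limits so that a misbehaving client or a "thundering herd" of reconciliations can't starve critical traffic.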
# Example: FlowSchema that puts leader election requests in the leader-election priority level
apiVersion: flowcontrol.apiserver.k8s.io/v1beta3 # use flowcontrol.apiserver.k8s.io/v1 on v1.29+
kind: FlowSchema
metadata:
  name: leader-election
spec:
  priorityLevelConfiguration:
    name: leader-election # dedicated high-priority bucket
  matchingPrecedence: 100 # lower = higher precedence
  rules:
  - subjects:
    - kind: ServiceAccount
      serviceAccount:
        name: kube-controller-manager
        namespace: kube-system
    resourceRules:
    - verbs: ["get","update","patch"]
      apiGroups: ["coordination.k8s.io"]
      resources: ["leases"]
      namespaces: ["*"] # required for namespaced resources such as leases
# Inspect APF state
kubectl get flowschemas
kubectl get prioritylevelconfigurations
kubectl get --raw /debug/api_priority_and_fairness/dump_priority_levels
kubectl get --raw /debug/api_priority_and_fairness/dump_queues
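The classification step can be pictured with a short Python sketch. It assumes a heavily simplified FlowSchema that matches only on username and API group. The `FlowSchema` class, the `classify` helper, and the sample schemas are hypothetical illustrations, not the apiserver's implementation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FlowSchema:
    """Simplified, hypothetical FlowSchema: matches on user and API group only."""
    name: str
    priority_level: str
    matching_precedence: int  # lower value is evaluated first
    users: frozenset          # usernames matched; "*" matches everyone
    api_groups: frozenset     # API groups matched; "*" matches every group

    def matches(self, user: str, api_group: str) -> bool:
        user_ok = "*" in self.users or user in self.users
        group_ok = "*" in self.api_groups or api_group in self.api_groups
        return user_ok and group_ok


def classify(schemas, user: str, api_group: str) -> Optional[str]:
    """Return the priority level of the first matching schema.

    Schemas are tried in ascending matchingPrecedence; ties fall back to name order.
    """
    for schema in sorted(schemas, key=lambda s: (s.matching_precedence, s.name)):
        if schema.matches(user, api_group):
            return schema.priority_level
    return None


KCM = "system:serviceaccount:kube-system:kube-controller-manager"
SCHEMAS = [
    FlowSchema("catch-all", "catch-all", 10000, frozenset({"*"}), frozenset({"*"})),
    FlowSchema("leader-election", "leader-election", 100,
               frozenset({KCM}), frozenset({"coordination.k8s.io"})),
]

print(classify(SCHEMAS, KCM, "coordination.k8s.io"))
print(classify(SCHEMAS, "alice", "apps"))
```

The controller-manager's lease traffic lands in the dedicated level because its schema has the lower precedence value; everything else falls through to the catch-all.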
Stage 1: Authentication
The apiserver tries each configured authenticator in order until one succeeds or all fail. The request identity is a UserInfo struct containing: Username, UID, Groups, and Extra claims.
| Method | Flag | Identity Source | Common Use |
|---|---|---|---|
| x509 client certificate | --client-ca-file | Certificate CN → username, O → groups | Control plane components, admin kubeconfig |
| Static token file | --token-auth-file | CSV file of token,user,uid,group | Dev/test only; no rotation |
| Bootstrap token | --enable-bootstrap-token-auth | Secret in kube-system namespace | Node TLS bootstrapping |
| ServiceAccount token (JWT) | always enabled | JWT signed by SA key, bound to SA/pod/node | In-cluster workloads |
| OIDC | --oidc-issuer-url, --oidc-client-id | JWT from OIDC provider (Dex, Okta, etc.) | Human users via SSO |
| Webhook token | --authentication-token-webhook-config-file | External HTTP endpoint validates token | Custom authn (cloud IAM, etc.) |
| Anonymous | --anonymous-auth=true (default) | Username: system:anonymous, Group: system:unauthenticated | Health checks; disable in production |
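The "try each authenticator in order" behavior can be sketched as a simple chain. This is a minimal illustration under stated assumptions: the `UserInfo` tuple mirrors only three of the fields named above, the static-token authenticator and all function names are hypothetical, and the anonymous fallback is modeled as applying only when no presented credential was rejected.

```python
from typing import Callable, NamedTuple, Optional


class UserInfo(NamedTuple):
    username: str
    uid: str
    groups: tuple


class InvalidCredentials(Exception):
    """Credentials of a known kind were presented but did not validate."""


ANONYMOUS = UserInfo("system:anonymous", "", ("system:unauthenticated",))


def static_token_authenticator(tokens: dict) -> Callable[[dict], Optional[UserInfo]]:
    """Build an authenticator backed by a token -> UserInfo mapping (hypothetical)."""
    def authenticate(headers: dict) -> Optional[UserInfo]:
        value = headers.get("Authorization", "")
        if not value.startswith("Bearer "):
            return None  # no bearer token: let the next authenticator try
        user = tokens.get(value[len("Bearer "):])
        if user is None:
            raise InvalidCredentials("unknown bearer token")
        return user
    return authenticate


def authenticate_request(chain, headers: dict, allow_anonymous: bool = True):
    """Return the first identity produced by the chain, or None for a 401."""
    rejected = False
    for authenticator in chain:
        try:
            user = authenticator(headers)
        except InvalidCredentials:
            rejected = True  # remember the failure, keep trying other methods
            continue
        if user is not None:
            return user      # first success wins
    if rejected or not allow_anonymous:
        return None          # caller answers 401 Unauthorized
    return ANONYMOUS


chain = [static_token_authenticator({"s3cret": UserInfo("alice", "1", ("dev",))})]
print(authenticate_request(chain, {"Authorization": "Bearer s3cret"}))
print(authenticate_request(chain, {}))
```

A request with a bad token is rejected rather than silently downgraded to system:anonymous, which is why the sketch tracks `rejected` separately from "no credentials at all".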
x509 Certificate Authentication — Internals
When a client presents a TLS client certificate, the apiserver validates the certificate chain against --client-ca-file. The Subject CN becomes the username; each Subject O becomes a group. This is how control plane components authenticate:
- kube-scheduler cert: CN=system:kube-scheduler
- kube-controller-manager cert: CN=system:kube-controller-manager
- Node certs: CN=system:node:nodename, O=system:nodes
- Admin: CN=kubernetes-admin, O=system:masters (bypasses RBAC!)
# View your kubeconfig cert's identity
kubectl config view --minify --raw -o jsonpath='{.users[0].user.client-certificate-data}' | base64 -d | openssl x509 -noout -subject
# Check what groups a cert belongs to
openssl x509 -in /etc/kubernetes/pki/apiserver-kubelet-client.crt -noout -text | grep -E "Subject:|Issuer:"
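The CN-to-username and O-to-groups mapping can be illustrated with a few lines of Python that operate on a subject string such as the one `openssl x509 -noout -subject` prints. This is a rough sketch: `identity_from_subject` is a hypothetical helper, and the naive comma split assumes no escaped commas inside attribute values.

```python
def identity_from_subject(subject: str):
    """Map an X.509 subject string to (username, groups).

    CN becomes the username; every O becomes a group.
    """
    if subject.startswith("subject="):
        subject = subject[len("subject="):]  # strip the prefix openssl prints
    username, groups = None, []
    for part in subject.split(","):
        key, sep, value = part.partition("=")
        if not sep:
            continue  # not a key=value attribute
        key, value = key.strip().upper(), value.strip()
        if key == "CN":
            username = value
        elif key == "O":
            groups.append(value)
    return username, groups


print(identity_from_subject("subject=O = system:nodes, CN = system:node:worker-1"))
print(identity_from_subject("CN=kubernetes-admin,O=system:masters"))
```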
ServiceAccount Token — Bound Token Internals (v1.22+)
Before v1.22, ServiceAccount tokens were long-lived JWTs stored in Secrets. Since v1.22, Kubernetes uses bound service account tokens (audience/expiry/pod-bound), created on-demand by the TokenRequest API and mounted via the projected volume mechanism.
# How kubelet requests a token for a pod (TokenRequest API call)
# POST /api/v1/namespaces/{ns}/serviceaccounts/{name}/token
{
  "spec": {
    "audiences": ["https://kubernetes.default.svc"],
    "expirationSeconds": 3600,
    "boundObjectRef": {
      "kind": "Pod",
      "name": "my-pod",
      "uid": "..."
    }
  }
}
# Inspect a projected service account token mounted in a pod
kubectl exec mypod -- cat /var/run/secrets/kubernetes.io/serviceaccount/token | \
  cut -d. -f2 | tr '_-' '/+' | base64 -d 2>/dev/null | python3 -m json.tool
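JWT segments are base64url-encoded without padding, which shell `base64 -d` does not always accept. A small Python sketch that restores the padding before decoding is shown below; `decode_jwt_payload` is a hypothetical helper that only inspects the claims and does not verify the signature.

```python
import base64
import json


def decode_jwt_payload(token: str) -> dict:
    """Return the claims of a JWT without verifying its signature."""
    parts = token.strip().split(".")
    if len(parts) != 3:
        raise ValueError("expected header.payload.signature")
    payload = parts[1]
    payload += "=" * (-len(payload) % 4)  # restore stripped base64url padding
    return json.loads(base64.urlsafe_b64decode(payload))


def _segment(obj: dict) -> str:
    """Encode a dict the way a JWT segment is encoded (demo data only)."""
    raw = json.dumps(obj).encode()
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode()


# Build an unsigned demo token so the example runs without a cluster.
demo = ".".join([
    _segment({"alg": "none"}),
    _segment({"sub": "system:serviceaccount:default:myapp",
              "aud": ["https://kubernetes.default.svc"]}),
    "signature",
])
print(decode_jwt_payload(demo)["sub"])
```

For a real projected token, pass the contents of the mounted token file to `decode_jwt_payload` and look at the `aud`, `exp`, and `kubernetes.io` claims.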
Stage 2: Authorization
After authentication succeeds, the apiserver evaluates whether the identified user (UserInfo) is allowed to perform the requested action (verb on resource in namespace). Multiple authorizers are tried in order; the first to return allow or deny wins. If all return no opinion, the request is denied.
| Authorizer | Description | Production Use |
|---|---|---|
| Node | Special authorizer for kubelet: nodes can only read/write resources bound to their own node | Always enable alongside RBAC |
| RBAC | Role/ClusterRole + RoleBinding/ClusterRoleBinding evaluated at request time | Primary authorizer in all production clusters |
| ABAC | Static policy file; can't be changed without restart | Deprecated; use RBAC |
| Webhook | Calls external HTTP service to make authz decision (SubjectAccessReview) | OPA-based custom authz, cloud IAM integration |
| AlwaysAllow | Allows everything | Dev only; never production |
| AlwaysDeny | Denies everything | Testing only |
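The "first allow or deny wins, no opinion falls through" rule can be sketched as a short chain evaluator. The two authorizers below are toy stand-ins with hypothetical names and attribute keys; they only illustrate the control flow, not the real Node or RBAC logic.

```python
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"
    DENY = "deny"
    NO_OPINION = "no opinion"


def toy_node_authorizer(attrs: dict) -> Decision:
    """Toy rule: a node identity may act on objects bound to that node."""
    user = attrs["user"]
    if not user.startswith("system:node:"):
        return Decision.NO_OPINION
    node = user[len("system:node:"):]
    return Decision.ALLOW if attrs.get("bound_node") == node else Decision.NO_OPINION


def toy_rbac_authorizer(allowed: set):
    """Build an authorizer from a set of (user, verb, resource) grants."""
    def authorize(attrs: dict) -> Decision:
        key = (attrs["user"], attrs["verb"], attrs["resource"])
        return Decision.ALLOW if key in allowed else Decision.NO_OPINION
    return authorize


def authorize(chain, attrs: dict) -> Decision:
    """Return the first definite decision; deny if every authorizer abstains."""
    for authorizer in chain:
        decision = authorizer(attrs)
        if decision is not Decision.NO_OPINION:
            return decision
    return Decision.DENY


chain = [toy_node_authorizer, toy_rbac_authorizer({("alice", "list", "pods")})]
print(authorize(chain, {"user": "alice", "verb": "list", "resource": "pods"}))
print(authorize(chain, {"user": "bob", "verb": "list", "resource": "pods"}))
```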
RBAC evaluation flow: The apiserver collects all RoleBindings and ClusterRoleBindings that reference the user (or any of their groups). For each binding, it checks whether the referenced Role/ClusterRole grants the requested verb on the requested resource. If any binding grants the permission, the request is allowed.
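The evaluation flow above is purely additive, which a small sketch makes concrete. It assumes a simplified model in which each binding carries the rules of its referenced role directly, and it ignores apiGroups, resourceNames, and non-resource URLs; `Rule`, `Binding`, and `rbac_allows` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    verbs: frozenset      # e.g. {"get", "list"}; "*" matches any verb
    resources: frozenset  # e.g. {"pods"}; "*" matches any resource


@dataclass(frozen=True)
class Binding:
    subjects: frozenset   # user and group names the binding references
    rules: tuple          # rules of the referenced Role/ClusterRole
    namespace: str = ""   # "" models a ClusterRoleBinding


def rbac_allows(bindings, user, groups, verb, resource, namespace) -> bool:
    """Allow if any binding for the user or their groups grants the action."""
    identities = {user, *groups}
    for binding in bindings:
        if not identities & binding.subjects:
            continue  # binding does not reference this user or their groups
        if binding.namespace and binding.namespace != namespace:
            continue  # RoleBinding applies only inside its own namespace
        for rule in binding.rules:
            if {"*", verb} & rule.verbs and {"*", resource} & rule.resources:
                return True
    return False      # no grant found; RBAC has no deny rules


BINDINGS = [
    Binding(frozenset({"dev-team"}),
            (Rule(frozenset({"get", "list"}), frozenset({"pods"})),),
            namespace="default"),
    Binding(frozenset({"system:masters"}),
            (Rule(frozenset({"*"}), frozenset({"*"})),)),
]
print(rbac_allows(BINDINGS, "alice", ["dev-team"], "list", "pods", "default"))
print(rbac_allows(BINDINGS, "alice", ["dev-team"], "delete", "pods", "default"))
```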
# Test authorization decisions (SubjectAccessReview)
kubectl auth can-i create pods --namespace default
kubectl auth can-i create pods --as developer --namespace default
kubectl auth can-i '*' '*' --as admin --as-group system:masters # cluster-admin test; system:masters is a group, the --as user name is arbitrary
# Check what a ServiceAccount can do
kubectl auth can-i list secrets --as system:serviceaccount:default:myapp
# Query the authorization API directly; status.allowed in the output holds the decision
kubectl create -f - -o yaml << EOF
apiVersion: authorization.k8s.io/v1
kind: SelfSubjectAccessReview
spec:
  resourceAttributes:
    namespace: default
    verb: create
    resource: pods
EOF
Stage 3: Admission Control
Admission controllers run after authentication and authorization but before the object is persisted to etcd. Two phases:
Mutating Admission (Phase 1)
Can modify the incoming object. Run sequentially. Built-in mutating plugins:
- DefaultStorageClass: adds the default storage class to PVCs with no class
- DefaultTolerationSeconds: adds default tolerations for node taints
- MutatingAdmissionWebhook: calls external webhooks one after another (serially) and applies their patches
- NamespaceLifecycle: rejects creates in terminating namespaces
- ServiceAccount: auto-mounts SA token, sets imagePullSecrets
- PodSecurity: does not mutate; the built-in plugin only validates pods against the Pod Security Standards (see Phase 2)
After all mutating webhooks run, the object is re-validated against the schema (to catch webhook mutations that broke structure).
Validating Admission (Phase 2)
Can only allow or reject. Run in parallel. Built-in validating plugins:
- PodSecurity: enforces Pod Security Standards (Privileged/Baseline/Restricted)
- ResourceQuota: rejects if the request would exceed namespace quota
- LimitRanger: sets default limits; rejects if outside min/max range
- NodeRestriction: limits what resources a kubelet can modify
- ValidatingAdmissionWebhook: calls external webhooks in parallel
- ValidatingAdmissionPolicy (CEL; alpha in v1.26, beta in v1.28, GA in v1.30): in-process policy evaluation
# Check active admission plugins
kube-apiserver --help | grep enable-admission-plugins
# Or read from the running process:
ps aux | grep kube-apiserver | tr ' ' '\n' | grep admission
# Typical production set:
# --enable-admission-plugins=NodeRestriction,PodSecurity,ResourceQuota,LimitRanger,\
# ServiceAccount,DefaultStorageClass,DefaultTolerationSeconds,\
# MutatingAdmissionWebhook,ValidatingAdmissionWebhook,\
# Priority,StorageObjectInUseProtection,PersistentVolumeClaimResize
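The two admission phases can be sketched as follows: mutators run one after another so each sees the previous output, then validators all inspect the final object and any rejection fails the request. The plugin functions and the `admit` helper are hypothetical stand-ins, and the thread pool is only one way to model "run in parallel".

```python
import copy
from concurrent.futures import ThreadPoolExecutor


class AdmissionDenied(Exception):
    """Raised when at least one validator rejects the object."""


def default_service_account(pod: dict) -> dict:
    """Mutator: fill in the default ServiceAccount when none is set."""
    pod["spec"].setdefault("serviceAccountName", "default")
    return pod


def deny_privileged(pod: dict):
    """Validator: return a rejection reason, or None to allow."""
    for container in pod["spec"].get("containers", []):
        if container.get("securityContext", {}).get("privileged"):
            return f"container {container['name']} is privileged"
    return None


def admit(obj: dict, mutators, validators) -> dict:
    obj = copy.deepcopy(obj)          # never modify the caller's object
    for mutate in mutators:           # phase 1: sequential mutation
        obj = mutate(obj)
    workers = max(1, len(validators))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # phase 2: validators only read the object, so they can run concurrently
        reasons = [r for r in pool.map(lambda v: v(obj), validators) if r]
    if reasons:
        raise AdmissionDenied("; ".join(reasons))
    return obj


pod = {"spec": {"containers": [{"name": "app"}]}}
print(admit(pod, [default_service_account], [deny_privileged]))
```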
Stage 4: Schema Validation
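Kubernetes validates object structure in two ways: built-in types are checked primarily by Go validation functions (their published OpenAPI v3 schemas are generated from the Go types), while CRDs are validated against the OpenAPI v3 schema in their definition. Validation checks: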
- Required fields are present
- Field types match (string vs int)
- Enum values are valid
- Pattern constraints (e.g., DNS label format for names)
- Custom validation via CEL (x-kubernetes-validations) in CRDs
Schema validation runs both before mutating admission (to reject malformed input early) and after mutation (to catch webhook-introduced invalidity).
Stage 5: Storage (etcd Persistence)
If all previous stages pass, the object is persisted to etcd. The storage layer performs:
- Version conversion: The object is converted from the API version the client sent to the internal "hub" version, then to the storage version. Kubernetes maintains internal versions (not exposed to clients) as a conversion hub between all external versions.
- Protobuf encoding: Built-in objects are serialized to protobuf (not JSON) for storage efficiency; custom resources are stored as JSON. The etcd key prefix /registry/ differentiates Kubernetes objects from other etcd data.
- Optimistic locking: The resourceVersion from the client's request is compared with the current etcd modRevision. If they differ, a 409 Conflict is returned. This prevents lost updates.
- Encryption at rest (optional): Objects passing through the storage layer can be encrypted before writing to etcd using a configured KMS provider or AES key.
# EncryptionConfiguration — encrypt Secrets at rest
apiVersion: apiserver.config.k8s.io/v1
kind: EncryptionConfiguration
resources:
- resources: ["secrets"]
  providers:
  - aescbc:
      keys:
      - name: key1
        secret:
  - identity: {} # fallback for unencrypted (reads only)
# Verify encryption is working
kubectl create secret generic test --from-literal=key=value
# Read directly from etcd — should see encrypted bytes, not JSON
ETCDCTL_API=3 etcdctl get /registry/secrets/default/test \
--cacert=/etc/kubernetes/pki/etcd/ca.crt \
--cert=/etc/kubernetes/pki/etcd/peer.crt \
--key=/etc/kubernetes/pki/etcd/peer.key | hexdump -C | head
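The optimistic-locking step of the storage layer can be illustrated with an in-memory stand-in for etcd. This is a sketch only: the `Store` class and its integer revisions are hypothetical, and the real comparison happens inside an etcd transaction.

```python
class Conflict(Exception):
    """Models the 409 Conflict returned on a stale resourceVersion."""
    status_code = 409


class Store:
    """In-memory stand-in for etcd with a global, monotonically increasing revision."""

    def __init__(self):
        self._revision = 0
        self._data = {}  # key -> (mod_revision, value)

    def get(self, key):
        return self._data[key]

    def create(self, key, value) -> int:
        if key in self._data:
            raise Conflict(f"{key} already exists")
        self._revision += 1
        self._data[key] = (self._revision, value)
        return self._revision

    def update(self, key, value, resource_version: int) -> int:
        mod_revision, _ = self._data[key]
        if resource_version != mod_revision:
            # someone else wrote in between: refuse instead of losing their update
            raise Conflict(f"{key}: have {resource_version}, current {mod_revision}")
        self._revision += 1
        self._data[key] = (self._revision, value)
        return self._revision


store = Store()
rv = store.create("/registry/pods/default/nginx", {"replicas": 1})
rv = store.update("/registry/pods/default/nginx", {"replicas": 2}, rv)
print(store.get("/registry/pods/default/nginx"))
```

A client that receives the conflict is expected to re-read the object, reapply its change to the fresh copy, and retry.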
Watch Machinery — Internals
The Watch API is one of the most critical and subtle parts of the apiserver. Every Informer in every controller, every kubelet, and every kube-proxy uses Watch to receive real-time updates.
watchCache (In-Memory Cache)
The apiserver maintains an in-memory circular buffer of events called the watchCache. When an object is written to etcd, the etcd watch notifies the apiserver, which updates the watchCache. Client Watch requests are served from this cache, not from etcd directly — this dramatically reduces etcd read load in large clusters.
Watch Protocol — HTTP Chunked Streaming
A Watch request is a long-lived HTTP GET with query parameter ?watch=true&resourceVersion=XYZ. The apiserver holds the connection open and sends newline-delimited JSON objects (chunked transfer encoding) as events occur. Each event has type ADDED, MODIFIED, DELETED, BOOKMARK, or ERROR.
# Watch pods raw HTTP stream
kubectl get pods --watch -v=8 2>&1 | grep -A2 "GET.*watch=true"
# Directly via curl (requires a valid token)
TOKEN=$(kubectl create token default)
curl -sk "https://localhost:6443/api/v1/pods?watch=true&resourceVersion=0" \
-H "Authorization: Bearer $TOKEN" | head -50
# Each line is one JSON event:
# {"type":"ADDED","object":{"kind":"Pod","metadata":{"name":"nginx","resourceVersion":"12345"},...}}
# {"type":"MODIFIED",...}
# {"type":"BOOKMARK","object":{"kind":"Pod","metadata":{"resourceVersion":"99999"}}} ← progress marker
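A client consuming this stream reads one JSON document per line and remembers the latest resourceVersion so it can resume later. The sketch below replays a list of sample lines instead of a live connection; `iter_watch_events` and `replay` are hypothetical helper names, and the sample events are abbreviated.

```python
import json


def iter_watch_events(lines):
    """Yield (type, object) for each newline-delimited JSON watch event."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        event = json.loads(line)
        yield event["type"], event["object"]


def replay(lines):
    """Track which objects exist and the last resourceVersion seen."""
    names, last_rv = set(), None
    for event_type, obj in iter_watch_events(lines):
        if event_type == "ERROR":
            raise RuntimeError(f"watch failed: {obj}")
        meta = obj.get("metadata", {})
        last_rv = meta.get("resourceVersion", last_rv)
        if event_type in ("ADDED", "MODIFIED"):
            names.add(meta["name"])
        elif event_type == "DELETED":
            names.discard(meta["name"])
        # BOOKMARK carries no object change, only a newer resourceVersion
    return names, last_rv


STREAM = [
    '{"type":"ADDED","object":{"metadata":{"name":"nginx","resourceVersion":"12345"}}}',
    '{"type":"MODIFIED","object":{"metadata":{"name":"nginx","resourceVersion":"12350"}}}',
    '{"type":"BOOKMARK","object":{"metadata":{"resourceVersion":"99999"}}}',
]
print(replay(STREAM))
```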
When the apiserver sends a 410 Gone (because the requested resourceVersion has been compacted out of the watchCache circular buffer), the Informer performs a full relist — a fresh List followed by a new Watch starting at the returned resourceVersion.
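The interaction between the circular buffer and 410 Gone can be sketched with a bounded deque. This is a simplified model under stated assumptions: `WatchCache`, `Gone`, and `resume` are hypothetical names, resource versions are plain integers, and a request counts as too old once an event newer than the requested version may already have been evicted.

```python
from collections import deque


class Gone(Exception):
    """Models 410 Gone: the requested resourceVersion is no longer buffered."""
    status_code = 410


class WatchCache:
    def __init__(self, capacity: int):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self._events = deque(maxlen=capacity)  # oldest entries fall off the left
        self._evicted = False

    def add(self, resource_version: int, event_type: str, name: str) -> None:
        if len(self._events) == self._events.maxlen:
            self._evicted = True               # this append drops the oldest event
        self._events.append((resource_version, event_type, name))

    def events_since(self, resource_version: int):
        """Return buffered events newer than resource_version, or raise Gone."""
        if self._evicted and resource_version < self._events[0][0] - 1:
            raise Gone("too old resource version")
        return [e for e in self._events if e[0] > resource_version]


def resume(cache: WatchCache, last_seen: int, relist):
    """Resume a watch; fall back to a full relist on 410 Gone."""
    try:
        return cache.events_since(last_seen)
    except Gone:
        return relist()


cache = WatchCache(capacity=3)
for rv in (10, 11, 12, 13):
    cache.add(rv, "MODIFIED", "nginx")
print(cache.events_since(12))
print(resume(cache, 5, relist=lambda: "relist"))
```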
Concurrency Control and Rate Limiting
Max In-Flight Requests
The apiserver has two semaphores controlling maximum concurrent requests:
- --max-requests-inflight: Non-mutating (GET/LIST) requests (default: 400)
- --max-mutating-requests-inflight: Mutating (POST/PUT/PATCH/DELETE) requests (default: 200)
Long-running requests such as WATCH are exempt from both limits.
When these limits are hit, new requests receive 429 Too Many Requests with a Retry-After header. APF (if enabled) replaces this with more fine-grained per-flow queuing.
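The two-semaphore scheme can be sketched with non-blocking acquires: a request that cannot get a slot is rejected immediately instead of waiting. `InflightLimiter` is a hypothetical class, and the Retry-After value of 1 second is an illustrative assumption.

```python
import threading

MUTATING_VERBS = {"create", "update", "patch", "delete"}


class InflightLimiter:
    """Separate concurrency budgets for read-only and mutating requests."""

    def __init__(self, max_readonly: int = 400, max_mutating: int = 200):
        self._readonly = threading.BoundedSemaphore(max_readonly)
        self._mutating = threading.BoundedSemaphore(max_mutating)

    def handle(self, verb: str, handler):
        """Run handler if a slot is free; otherwise answer 429 immediately."""
        sem = self._mutating if verb in MUTATING_VERBS else self._readonly
        if not sem.acquire(blocking=False):
            return 429, {"Retry-After": "1"}, None
        try:
            return 200, {}, handler()
        finally:
            sem.release()  # free the slot even if the handler raises


limiter = InflightLimiter(max_readonly=1, max_mutating=1)
# The nested call arrives while the outer mutating request still holds the only slot.
status, _, inner = limiter.handle(
    "create", lambda: limiter.handle("create", lambda: "second write"))
print(status, inner)
```

Because the budgets are separate, a burst of writes that exhausts the mutating semaphore does not block reads, and vice versa.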
etcd Request Rate Limiting
The apiserver itself doesn't rate-limit outbound etcd requests. Pressure on etcd comes from:
- High write rates (many creates/updates/deletes per second)
- Excessive LIST requests that bypass the watchCache (use resourceVersion=0 for cache reads)
- Large objects (e.g., ConfigMaps with large data) that slow etcd serialization
# Monitor etcd pressure from apiserver metrics
curl -sk https://localhost:6443/metrics | grep etcd_request_duration
# Check if LIST requests are hitting etcd vs cache
# resourceVersion="" → must hit etcd (consistent read)
# resourceVersion="0" → may serve from watchCache
# resourceVersion="N" → serve from cache at a version no older than N
# Force consistent read (expensive)
kubectl get pods --request-timeout=30s -v=9 2>&1 | grep resourceVersion
TLS Configuration Details
The apiserver uses multiple TLS identities simultaneously:
| Identity | Cert File Flag | Key File Flag | Purpose |
|---|---|---|---|
| Serving cert (HTTPS) | --tls-cert-file | --tls-private-key-file | Presented to all clients connecting to :6443 |
| Client CA | --client-ca-file | — | Verifies x509 client certificates during authn |
| etcd client cert | --etcd-certfile | --etcd-keyfile | apiserver's mTLS identity when connecting to etcd |
| etcd CA | --etcd-cafile | — | Verifies etcd server certificate |
| kubelet client cert | --kubelet-client-certificate | --kubelet-client-key | apiserver's identity when connecting to kubelet :10250 |
| Front-proxy cert | --proxy-client-cert-file | --proxy-client-key-file | Used when proxying to aggregated API servers |
| ServiceAccount signing key | --service-account-key-file | --service-account-signing-key-file | Public key for verifying SA tokens; private for issuing |
# Verify TLS configuration
openssl s_client -connect localhost:6443 -showcerts </dev/null 2>/dev/null | openssl x509 -noout -text | grep -E "Subject:|SAN:|DNS:|IP:"
# Check SANs on the apiserver cert (must include all CP node IPs and DNS names)
openssl x509 -in /etc/kubernetes/pki/apiserver.crt -noout -text | grep -A 10 "Subject Alternative Name"
# Typical SANs for apiserver cert:
# DNS:kubernetes, DNS:kubernetes.default, DNS:kubernetes.default.svc
# DNS:kubernetes.default.svc.cluster.local, DNS:cp-node-1
# IP:10.96.0.1 (cluster IP), IP:192.168.1.10 (CP node IP), IP:127.0.0.1
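To add SANs on a kubeadm cluster, add them to certSANs in the kubeadm-config ConfigMap, then regenerate the apiserver certificate. Note that kubeadm certs renew apiserver (kubeadm alpha certs renew apiserver before v1.20) reuses the SANs of the existing certificate, so to pick up new SANs move the old certificate and key aside and run kubeadm init phase certs apiserver with the updated configuration.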
API Aggregation Layer
The aggregation layer allows external API servers (called "extension API servers") to serve custom API groups under the Kubernetes API tree. This is different from CRDs — CRDs serve resources through the built-in apiserver; aggregated API servers are separate processes that handle their own storage.
# APIService registers an external API server for a group/version
apiVersion: apiregistration.k8s.io/v1
kind: APIService
metadata:
  name: v1beta1.metrics.k8s.io
spec:
  service:
    name: metrics-server
    namespace: kube-system
    port: 443
  group: metrics.k8s.io
  version: v1beta1
  insecureSkipTLSVerify: false # verify metrics-server cert
  caBundle:
  groupPriorityMinimum: 100
  versionPriority: 100
# List all registered API services (includes both built-in and aggregated)
kubectl get apiservices
# Look for: v1beta1.metrics.k8s.io, v1.admissionregistration.k8s.io, etc.
# Check status of aggregated services
kubectl get apiservices | grep -v True # show any unavailable ones
# Debug: test if metrics-server is reachable via aggregation
kubectl get --raw /apis/metrics.k8s.io/v1beta1/nodes
kubectl top nodes # uses aggregated metrics API
Audit Logging
The apiserver can log every request and response to an audit log. This is essential for compliance (SOC2, PCI, HIPAA) and incident investigation. Audit events are structured JSON with full request details, user identity, response code, and object changes.
# Production-grade audit policy
apiVersion: audit.k8s.io/v1
kind: Policy
rules:
# Log requests to Secrets, ConfigMaps, and token requests at Metadata level only;
# Request or RequestResponse would write secret values and tokens into the audit log
- level: Metadata
  resources:
  - group: ""
    resources: ["secrets", "configmaps", "serviceaccounts/token"]
# Log RBAC changes at RequestResponse
- level: RequestResponse
  resources:
  - group: "rbac.authorization.k8s.io"
    resources: ["roles", "clusterroles", "rolebindings", "clusterrolebindings"]
# Log exec/attach/portforward
- level: Request
  resources:
  - group: ""
    resources: ["pods/exec", "pods/attach", "pods/portforward"]
# Log all other resource modifications at Metadata level
- level: Metadata
  verbs: ["create", "update", "patch", "delete"]
# Don't log read operations on non-sensitive resources
- level: None
  resources:
  - group: ""
    resources: ["events"]
  verbs: ["get", "list", "watch"]
# Default: log metadata for everything else
- level: Metadata
# apiserver flags for audit
kube-apiserver \
--audit-log-path=/var/log/kubernetes/audit.log \
--audit-policy-file=/etc/kubernetes/audit-policy.yaml \
--audit-log-maxage=30 \
--audit-log-maxbackup=10 \
--audit-log-maxsize=100 # MB
# Parse audit logs
cat /var/log/kubernetes/audit.log | jq 'select(.verb=="create" and .responseStatus.code==201)'
cat /var/log/kubernetes/audit.log | jq 'select(.user.username != "system:serviceaccount:kube-system:node-controller" and .verb=="delete")'
# Who deleted what? (add a filter on .requestReceivedTimestamp to limit the time window)
cat /var/log/kubernetes/audit.log | jq -r 'select(.verb=="delete") | [.requestReceivedTimestamp, .user.username, .objectRef.resource, .objectRef.name] | @tsv'
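Restricting the delete report to a time window is easier in a script than in a jq one-liner. The sketch below assumes the JSON-lines audit format used above, with RFC 3339 timestamps in UTC; `recent_deletes` is a hypothetical helper and the sample events are invented for the demo.

```python
import json
from datetime import datetime, timedelta, timezone


def parse_timestamp(value: str) -> datetime:
    """Parse an RFC 3339 UTC timestamp such as 2024-05-01T11:30:00.000000Z."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def recent_deletes(lines, now: datetime, window: timedelta = timedelta(hours=1)):
    """Return (timestamp, user, resource, name) for deletes inside the window."""
    rows = []
    for line in lines:
        if not line.strip():
            continue
        event = json.loads(line)
        if event.get("verb") != "delete":
            continue
        age = now - parse_timestamp(event["requestReceivedTimestamp"])
        if timedelta(0) <= age <= window:
            ref = event.get("objectRef", {})
            rows.append((event["requestReceivedTimestamp"],
                         event["user"]["username"],
                         ref.get("resource", ""),
                         ref.get("name", "")))
    return rows


SAMPLE = [
    '{"verb":"delete","requestReceivedTimestamp":"2024-05-01T11:30:00.000000Z",'
    '"user":{"username":"alice"},"objectRef":{"resource":"pods","name":"nginx"}}',
    '{"verb":"delete","requestReceivedTimestamp":"2024-05-01T09:00:00.000000Z",'
    '"user":{"username":"bob"},"objectRef":{"resource":"secrets","name":"db"}}',
    '{"verb":"create","requestReceivedTimestamp":"2024-05-01T11:45:00.000000Z",'
    '"user":{"username":"carol"},"objectRef":{"resource":"pods","name":"web"}}',
]
NOW = datetime(2024, 5, 1, 12, 0, tzinfo=timezone.utc)
for row in recent_deletes(SAMPLE, NOW):
    print("\t".join(row))
```

Against a real log, pass an open file handle for the audit log as `lines` and `datetime.now(timezone.utc)` as `now`.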
HA Deployment and Horizontal Scaling
The apiserver is designed for horizontal scaling. Add replicas freely — each instance connects to the same etcd cluster and shares the same state. However, several practical considerations apply:
Load Balancer Configuration
# HAProxy configuration for HA apiserver (TCP mode)
frontend kubernetes-api
    bind *:6443
    mode tcp
    option tcplog
    default_backend kubernetes-api-servers

backend kubernetes-api-servers
    mode tcp
    option tcp-check
    balance roundrobin
    server cp1 192.168.1.10:6443 check
    server cp2 192.168.1.11:6443 check
    server cp3 192.168.1.12:6443 check
Scaling Considerations
| Cluster Size | Recommended apiserver Replicas | Notes |
|---|---|---|
| < 50 nodes | 1–2 | Development/staging. 2 for HA. |
| 50–500 nodes | 3 | Standard production HA |
| 500–2000 nodes | 3–5 | Tune --max-requests-inflight |
| > 2000 nodes | 5+ | Shard by namespace or use APF aggressively |
Critical Flags Reference
Full Production-Relevant Flags
| Flag | Default | Production Value | Why |
|---|---|---|---|
| --secure-port | 6443 | 6443 | Standard; don't change |
| --insecure-port | 0 | 0 | Must be 0; removed v1.24 |
| --anonymous-auth | true | false | Disable to prevent anonymous API access |
| --authorization-mode | AlwaysAllow | Node,RBAC | Critical: AlwaysAllow in dev is dangerous |
| --enable-admission-plugins | few | see above | Enable security plugins |
| --audit-log-path | "" | /var/log/... | Required for compliance |
| --encryption-provider-config | "" | config file | Encrypt Secrets at rest |
| --max-requests-inflight | 400 | 1200 (large) | Tune per cluster size |
| --max-mutating-requests-inflight | 200 | 400 (large) | Tune per cluster size |
| --request-timeout | 60s | 300s | Raise only if large LISTs time out; watches are long-running and are governed by --min-request-timeout instead |
| --watch-cache-sizes | auto | pods#1000 | Recent releases size the cache dynamically; only 0 (disable caching for a resource) is meaningful there |
| --default-watch-cache-size | 100 | 500 | Events buffered per resource type; superseded by dynamic sizing in recent releases |
| --profiling | true | false | Disable in production; exposes /debug/pprof |
| --enable-priority-and-fairness | true (v1.20+) | true | APF replaces old max-inflight when enabled |
| --tls-min-version | TLS1.2 | VersionTLS12 | Never TLS1.0/1.1 |
| --tls-cipher-suites | defaults | restrict to strong ciphers | Disable RC4, 3DES |
Key Metrics to Monitor
# Get all apiserver metrics
kubectl get --raw /metrics | grep "^apiserver_"
# Request latency (most important SLO metric)
apiserver_request_duration_seconds_bucket{verb="GET",resource="pods"}
apiserver_request_duration_seconds_bucket{verb="POST",resource="pods"}
# Request rate and error rate
apiserver_request_total{code="200"}
apiserver_request_total{code="500"}
apiserver_request_total{code="429"} # rate limiting
# Watch connection count
apiserver_longrunning_requests{verb="WATCH"} # active watch connections (apiserver_registered_watchers on older releases)
apiserver_watch_events_total # events sent per resource
# etcd latency from apiserver's perspective
etcd_request_duration_seconds_bucket{operation="get"}
etcd_request_duration_seconds_bucket{operation="update"}
# Cache state
apiserver_cache_list_fetched_objects_total
apiserver_cache_list_returned_objects_total # should be ≤ fetched; a large gap means heavy server-side filtering
# APF metrics
apiserver_flowcontrol_current_inqueue_requests
apiserver_flowcontrol_current_executing_requests
apiserver_flowcontrol_rejected_requests_total
Prometheus alerting rules to configure:
- alert: APIServerHighErrorRate
  expr: sum(rate(apiserver_request_total{code=~"5.."}[5m])) / sum(rate(apiserver_request_total[5m])) > 0.01
  for: 5m
  annotations:
    summary: "apiserver error rate > 1%"
- alert: APIServerHighLatency
  expr: histogram_quantile(0.99, sum by (le) (rate(apiserver_request_duration_seconds_bucket{verb!="WATCH"}[5m]))) > 1
  for: 5m
  annotations:
    summary: "p99 API latency > 1s"
- alert: APIServerRateLimiting
  expr: sum(rate(apiserver_request_total{code="429"}[5m])) > 0
  for: 1m
  annotations:
    summary: "API server is rate-limiting requests"
Troubleshooting kube-apiserver
Startup Failures
# Check static pod is being attempted
journalctl -u kubelet --since "5 min ago" | grep -i apiserver
# Check if the process is running
crictl ps --name kube-apiserver
crictl logs $(crictl ps --name kube-apiserver -q) 2>&1 | tail -50
# Common startup failures:
# 1. etcd not reachable
# Log: "Failed to connect to etcd"
# Fix: check etcd status, verify --etcd-servers flag, check firewall
# 2. Certificate errors
# Log: "Failed to read or parse CA cert"
# Fix: verify cert file paths in manifest; check cert not expired
# 3. Port already in use
# Log: "bind: address already in use"
# Fix: check what's on :6443; likely another apiserver or lingering process
# 4. Bad flag/config
# Log: "Error: invalid value for flag"
# Fix: validate manifest YAML and flag values
503 Service Unavailable
# 503 from apiserver usually means the apiserver itself is healthy but
# a backend (aggregated API server) is down
# Check aggregated API services
kubectl get apiservices | grep -v True
kubectl describe apiservice v1beta1.metrics.k8s.io # see "Last Transition Condition"
# If metrics-server is down and HPA depends on it, HPA will fail
# Fix: debug the metrics-server pods
kubectl get pods -n kube-system -l k8s-app=metrics-server
kubectl logs -n kube-system -l k8s-app=metrics-server
Watch Streams Stuck / Stale Informers
# Symptoms: controller reconciles old state; kubectl get shows stale data
# Cause: watch connection was silently dropped without 410 error
# Force informer resync
# (most controllers do this automatically every 30–60 minutes)
# Check apiserver watch metrics
kubectl get --raw /metrics | grep -E "apiserver_longrunning_requests|apiserver_registered_watchers" # the second name applies to older releases
# Check for 410 Gone events causing relists (expected but should be infrequent)
# In controller logs, look for:
# "watch closed with: very old resource version" → relist triggered
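# Increase watchCache size for frequently-watched resources
# (recent releases size the cache dynamically; there, only a size of 0, which disables caching, is meaningful)
kube-apiserver --watch-cache-sizes=pods#2000,nodes#500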
Slow API Responses
# Step 1: Check if the slowness is in etcd or apiserver processing
kubectl get --raw /metrics | grep "etcd_request_duration"
# High etcd latency (p99 > 100ms):
# - etcd disk I/O problem → check iostat, ensure SSD
# - etcd leader election in progress → check etcd_server_leader_changes_seen_total
# - Large objects → check etcd_mvcc_db_total_size_in_bytes
# Step 2: Check apiserver admission webhook latency
kubectl get --raw /metrics | grep "apiserver_admission_webhook_admission_duration"
# Slow mutating/validating webhooks block every write request
# Fix: add timeout to webhook config, or disable slow webhooks
# Step 3: Check APF queue depth
kubectl get --raw /debug/api_priority_and_fairness/dump_queues | python3 -m json.tool
# Step 4: Profile the apiserver (if --profiling=true); kubectl supplies the credentials
kubectl get --raw '/debug/pprof/goroutine?debug=2' > goroutine.txt
Production Checklist
15-Item apiserver Production Checklist
| # | Check | Command to Verify |
|---|---|---|
| 1 | anonymous-auth disabled | ps aux \| grep apiserver \| grep anonymous-auth=false |
| 2 | Authorization mode = Node,RBAC | ps aux \| grep apiserver \| grep authorization-mode |
| 3 | Audit logging enabled with production policy | ls -la /var/log/kubernetes/audit.log |
| 4 | Encryption at rest for Secrets | etcdctl get /registry/secrets/default/test \| grep -a k8s:enc |
| 5 | profiling disabled | kubectl get --raw /debug/pprof/ should return 404 (an unauthenticated curl is rejected before the path is checked) |
| 6 | Cert expiry > 30 days | kubeadm certs check-expiration |
| 7 | TLS min version 1.2 | openssl s_client -tls1_1 -connect localhost:6443 should fail |
| 8 | APF FlowSchemas configured for critical traffic | kubectl get flowschemas |
| 9 | NodeRestriction admission enabled | check --enable-admission-plugins flag |
| 10 | PodSecurity admission enabled | check --enable-admission-plugins flag |
| 11 | Aggregated API services all healthy | kubectl get apiservices \| grep -v True |
| 12 | Resource requests set on static pod manifest | grep resources /etc/kubernetes/manifests/kube-apiserver.yaml |
| 13 | Alerting on error rate and latency | check Prometheus/Alertmanager rules |
| 14 | etcd client certs on separate CA from serving certs | openssl x509 -in /etc/kubernetes/pki/apiserver-etcd-client.crt -noout -issuer |
| 15 | Service account signing key rotation policy | check if --service-account-key-file supports multiple keys for rotation |
Dependency Graph and Next Files
This File Covers
- Full request pipeline (7 stages)
- Authentication methods (7)
- Authorization modes + RBAC flow
- Admission control (mutating/validating)
- Watch machinery + watchCache
- APF and concurrency control
- TLS config and all cert identities
- Aggregation layer
- Audit logging
- HA scaling
- Key metrics + alerting rules