Audit Logging
Kubernetes API server audit logging: policy levels, stages, backends, log format, noise reduction, SIEM integration, and forensic query patterns for security investigation and compliance.
Coverage Checklist
- Audit logging architecture: kube-apiserver only
- 4 policy levels: None/Metadata/Request/RequestResponse
- 4 audit stages: RequestReceived/ResponseStarted/ResponseComplete/Panic
- AuditPolicy YAML: rules, verbs, resources, namespaces, users
- omitStages and omitManagedFields
- Log backend: file rotation, path, maxAge, maxBackups
- Webhook backend: config, batchType, throttle, truncate
- Audit event JSON: all fields documented
- Noise reduction: system components, health checks, watch events
- Production policy: tiered rules pattern
- SIEM: Fluentd/Fluent Bit → Elasticsearch, Splunk, Loki
- Falco audit rules via webhook
- Forensic queries: secret access, exec, privilege escalation
- jq query patterns for log analysis
- Dynamic audit webhooks (auditregistration.k8s.io)
- Compliance: PCI DSS, SOC 2, CIS requirements
- 5 metrics, 4 alerts, 5 runbooks, 8 best practices
Audit Logging Overview
Kubernetes audit logging records every request processed by the API server — who made the request, what they requested, and what the outcome was. It is the authoritative trail for security investigation, compliance, and forensics.
Kubernetes audit logs only capture activity that goes through the API server. Direct kubelet API calls (port 10250), container runtime operations, and in-process activity within pods are not captured here. Use Falco for syscall-level runtime visibility alongside audit logs.
Client
  │  (kubectl / controller / service account / external)
  │
  ▼
kube-apiserver
  │
  ├── Authentication
  │
  ├── Stage: RequestReceived   ← logged when the audit handler receives the request
  │
  ├── Authorization + Admission
  │
  ├── Stage: ResponseStarted   ← only for long-running requests (watch, exec)
  │
  ├── Handler executes (etcd read/write)
  │
  └── Stage: ResponseComplete  ← most important; includes response code
      Stage: Panic             ← if handler panics
        │                │
        ▼                ▼
   Log Backend     Webhook Backend
   (local file)    (external audit sink)
What Audit Logs Answer
Who accessed what?
Track which user, service account, or node read a Secret, ConfigMap, or other sensitive resource.
What changed and when?
Full timeline of creates, updates, and deletes for any resource, including the request body at Request level and above.
Failed authentication attempts
Unauthorized (401) and Forbidden (403) responses indicate credential probing or misconfigured RBAC.
Privilege escalation paths
Track creation of ClusterRoleBindings, ServiceAccounts, and impersonation attempts.
Policy Levels & Stages
Audit Levels
Each audit rule assigns a level that controls how much detail is recorded. Higher levels capture more data but increase log volume and API server overhead.
| Level | What Is Logged | Overhead | Use For |
|---|---|---|---|
| None | Nothing; the request is not logged at all | Zero | High-volume noise: health checks, metrics scrapes, leader election |
| Metadata | Request metadata only: user, timestamp, resource, verb, response code. No request or response body. | Low | Most resources, read operations, and anything whose body may hold sensitive data (Secrets, ConfigMaps, tokens) |
| Request | Metadata + request body (but not response body) | Medium | Writes to important resources where you need to see what was submitted |
| RequestResponse | Metadata + request body + response body | High | RBAC changes, ServiceAccount changes, admission webhook configuration: the highest-value forensic data |
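Logging Secrets at Request or RequestResponse level can place base64-encoded secret data in the log: the request body of a create, update, or patch contains the data, and so does the response body of a read. If you do log at these levels, encrypt audit logs at rest and restrict access to the audit backend; unprotected secret values in logs typically violate compliance requirements.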
Audit Stages
| Stage | When Emitted | Response Code Available? | Use |
|---|---|---|---|
| RequestReceived | When the audit handler receives the request, before it is passed down the handler chain | No | Detecting requests that crash the handler |
| ResponseStarted | After response headers are sent, before the body; only generated for long-running requests | Yes | Long-running: watch, exec, port-forward |
| ResponseComplete | After the full response body is sent | Yes | Standard request completion: the primary stage to capture |
| Panic | On handler panic (500) | Yes (500) | Bug detection and intrusion via panic exploitation |
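A watch request emits ResponseStarted when the watch begins and ResponseComplete when it closes; the individual change notifications delivered over the watch are not audited. Adding ResponseStarted to omitStages removes the watch-start events while keeping completion events. Omitting RequestReceived, as the example policies in this guide do, removes the extra event that would otherwise be logged for every request.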
Writing Audit Policies
The audit policy is a YAML file passed to the API server via --audit-policy-file. Rules are evaluated in order; the first matching rule determines the level. If no rule matches, the request is not logged.
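The first-match behaviour described above can be sketched in a few lines of Python. This is a simplified model, not the API server's implementation: the `Rule` class and `audit_level` function are hypothetical names, and the matcher ignores API groups, namespaces, user groups, and non-resource URLs.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Rule:
    """Hypothetical, simplified stand-in for one audit policy rule."""
    level: str
    users: List[str] = field(default_factory=list)
    verbs: List[str] = field(default_factory=list)
    resources: List[str] = field(default_factory=list)  # API groups ignored for brevity


def rule_matches(rule: Rule, user: str, verb: str, resource: str) -> bool:
    # An empty list on a rule means "match anything" for that field.
    return (
        (not rule.users or user in rule.users)
        and (not rule.verbs or verb in rule.verbs)
        and (not rule.resources or resource in rule.resources)
    )


def audit_level(rules: List[Rule], user: str, verb: str, resource: str) -> str:
    # Rules are evaluated in order; the first match decides the level.
    for rule in rules:
        if rule_matches(rule, user, verb, resource):
            return rule.level
    # No rule matched: the request is not logged.
    return "None"


RULES = [
    Rule("None", users=["system:kube-scheduler"], verbs=["get", "list", "watch"]),
    Rule("Metadata", resources=["secrets"]),
    # Never reached for secrets: the Metadata rule above matches first.
    Rule("RequestResponse", resources=["secrets", "clusterrolebindings"]),
]

print(audit_level(RULES, "alice", "get", "secrets"))
print(audit_level(RULES, "alice", "create", "clusterrolebindings"))
```

The practical consequence is that broad rules placed early shadow specific rules placed later, so suppression rules and sensitive-resource rules need to come before any catch-all.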
Minimal Policy (Baseline)
apiVersion: audit.k8s.io/v1
kind: Policy
# Omit the RequestReceived stage so each request is not logged twice
omitStages:
  - RequestReceived
rules:
  # Rule 1: Don't log read-only requests from system components
  - level: None
    users:
      - system:kube-scheduler
      - system:kube-proxy
      - system:apiserver
      - system:kube-controller-manager
      - system:serviceaccount:kube-system:endpoint-controller
    verbs: [get, watch, list]
  # Rule 2: Don't log health/readiness probes
  - level: None
    nonResourceURLs:
      - /healthz*
      - /readyz*
      - /livez*
      - /version
      - /swagger*
      - /openapi*
  # Rule 3: Don't log metrics scrape
  - level: None
    nonResourceURLs:
      - /metrics
      - /metrics/cadvisor
  # Rule 4: Metadata only for resources whose bodies hold sensitive data.
  # Request level would log secret values submitted on create/update/patch.
  - level: Metadata
    resources:
      - group: ""
        resources: [secrets, configmaps, serviceaccounts/token]
  # Rule 5: Full detail for RBAC changes
  - level: RequestResponse
    resources:
      - group: rbac.authorization.k8s.io
        resources: [roles, clusterroles, rolebindings, clusterrolebindings]
  # Rule 6: Log exec/attach/portforward at Request level
  # (the exec command arguments appear in requestURI)
  - level: Request
    resources:
      - group: ""
        resources: [pods/exec, pods/attach, pods/portforward]
  # Rule 7: Metadata for most other resources
  - level: Metadata
    resources:
      - group: ""
      - group: apps
      - group: batch
      - group: networking.k8s.io
      - group: policy
      - group: storage.k8s.io
Production Policy (Tiered)
A production policy uses a tiered approach: suppress noise aggressively, capture sensitive operations in full, and use Metadata as the safe default for everything else.
apiVersion: audit.k8s.io/v1
kind: Policy
omitStages:
  - RequestReceived
omitManagedFields: true # Removes managedFields from request/response bodies
rules:
  # TIER 0: Complete suppression (high volume, no security value)
  - level: None
    userGroups: [system:nodes]
    verbs: [get, watch, list]
    resources:
      - group: ""
        resources: [endpoints, services, pods, nodes]
  - level: None
    nonResourceURLs: [/healthz*, /readyz*, /livez*, /version, /metrics*]
  # Suppress leader election coordination (very high volume, no security value)
  - level: None
    resources:
      - group: coordination.k8s.io
        resources: [leases]
  # Suppress event updates (informational only, not security relevant)
  - level: None
    resources:
      - group: ""
        resources: [events]
  # TIER 1: RequestResponse (highest security value)
  # RBAC mutations: full request+response for privilege escalation detection
  - level: RequestResponse
    verbs: [create, update, patch, delete]
    resources:
      - group: rbac.authorization.k8s.io
        resources:
          - roles
          - clusterroles
          - rolebindings
          - clusterrolebindings
  # ServiceAccount mutations
  - level: RequestResponse
    resources:
      - group: ""
        resources: [serviceaccounts]
  # Admission webhook configurations: changes here affect all cluster security
  - level: RequestResponse
    resources:
      - group: admissionregistration.k8s.io
        resources: [mutatingwebhookconfigurations, validatingwebhookconfigurations]
  # TIER 2: Request (important writes)
  # Pod exec/attach/portforward (the exec command appears in requestURI params)
  - level: Request
    resources:
      - group: ""
        resources: [pods/exec, pods/attach, pods/portforward]
  # TIER 3: Metadata (sensitive bodies, and the default for everything else)
  # Secrets and token requests stay at Metadata. Request level would log secret
  # values submitted on create/update/patch; RequestResponse would also log
  # secret values on reads and the tokens issued by serviceaccounts/token.
  - level: Metadata
    resources:
      - group: ""
        resources: [secrets, serviceaccounts/token]
  - level: Metadata
API Server Flags
# kube-apiserver flags for audit logging
--audit-log-path=/var/log/kubernetes/audit/audit.log
--audit-log-maxage=30 # Days to retain rotated log files
--audit-log-maxbackup=10 # Max number of old log files to keep
--audit-log-maxsize=100 # Max size in MB before rotation
--audit-log-compress # Compress rotated files with gzip
--audit-policy-file=/etc/kubernetes/audit-policy.yaml
--audit-log-format=json # json (default) or legacy (text)
Audit Backends
Log Backend
The log backend writes audit events to a file on the API server node in JSON Lines format (one JSON object per line). This is the simplest setup and suitable for environments with a log collection agent (Fluentd, Fluent Bit, Filebeat) on the node.
| Flag | Default | Description |
|---|---|---|
| --audit-log-path | (none) | File path; required to enable log backend. Use - for stdout. |
| --audit-log-maxage | 0 | Max days to retain rotated files (0 = unlimited) |
| --audit-log-maxbackup | 0 | Max number of rotated backup files (0 = unlimited) |
| --audit-log-maxsize | 100 | Max size in megabytes before rotation |
| --audit-log-compress | false | gzip compress rotated files |
| --audit-log-format | json | json (structured) or legacy (text) |
Webhook Backend
The webhook backend sends audit events to an external HTTP endpoint in batches. Used when you want real-time delivery to a SIEM or an audit aggregator without first writing to disk.
# webhook-audit-config.yaml (kubeconfig-format endpoint configuration)
apiVersion: v1
kind: Config
clusters:
  - name: audit-webhook
    cluster:
      server: https://audit-sink.internal:8888/audit
      certificate-authority: /etc/kubernetes/audit-webhook-ca.crt
contexts:
  - name: webhook
    context:
      cluster: audit-webhook
      user: ""
current-context: webhook
# API server flags for webhook backend
--audit-webhook-config-file=/etc/kubernetes/webhook-audit-config.yaml
--audit-webhook-mode=batch # batch (default) or blocking
--audit-webhook-batch-max-size=400 # Events per batch
--audit-webhook-batch-max-wait=30s # Max time before sending incomplete batch
--audit-webhook-initial-backoff=10s # Initial retry delay
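--audit-webhook-truncate-enabled # Enable event and batch truncation
--audit-webhook-truncate-max-event-size=102400 # Max bytes per event (default); larger events lose request/response bodies, then are dropped
--audit-webhook-truncate-max-batch-size=10485760 # Max bytes per batch, 10 MiB (default); larger batches are split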
batch mode (default) is asynchronous — the API server does not wait for the webhook to confirm receipt. Events can be dropped if the webhook is down and the buffer fills. blocking mode makes every request wait for the webhook to respond before the API server returns — this adds latency to every API call and should only be used if you can guarantee sub-millisecond webhook response times.
Audit Event Format
Each audit event is a JSON object with a well-defined schema. Understanding the fields is essential for writing effective SIEM queries and Falco rules.
{
"kind": "Event",
"apiVersion": "audit.k8s.io/v1",
// Unique ID for this event (UUID)
"auditID": "f23a4b56-1234-5678-abcd-000000000001",
// Which stage this event was emitted from
"stage": "ResponseComplete",
// Audit level the matching policy rule assigned to this event
"level": "Metadata",
// The full request URI
"requestURI": "/api/v1/namespaces/production/secrets/db-password",
// HTTP verb
"verb": "get",
// Authenticated user information
"user": {
"username": "system:serviceaccount:production:myapp",
"uid": "abc123",
"groups": ["system:serviceaccounts", "system:serviceaccounts:production"]
},
// If request was made via impersonation
"impersonatedUser": {
"username": "admin@example.com"
},
// Source IPs (first is original, rest are proxies)
"sourceIPs": ["10.244.1.5", "192.168.1.100"],
// User-Agent of the client
"userAgent": "kubectl/v1.29.0 (linux/amd64)",
// What resource was accessed
"objectRef": {
"resource": "secrets",
"namespace": "production",
"name": "db-password",
"apiVersion": "v1"
},
// HTTP response code
"responseStatus": {
"code": 200
},
// Request body (if level=Request or RequestResponse)
"requestObject": { "...": "..." },
// Response body (if level=RequestResponse)
"responseObject": { "...": "..." },
// Timestamps
"requestReceivedTimestamp": "2024-01-15T10:30:00.000000Z",
"stageTimestamp": "2024-01-15T10:30:00.005000Z",
// Annotations added by the authorization layer and by admission plugins
"annotations": {
"authorization.k8s.io/decision": "allow",
"authorization.k8s.io/reason": "RBAC: allowed by ClusterRoleBinding..."
}
}
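As a minimal sketch of consuming this format, the Python below parses one JSON Lines record and derives the time spent between receipt and the logged stage. The `summarize` and `parse_ts` helpers and the `SAMPLE` record are illustrative assumptions; the timestamp parser assumes the microsecond-precision UTC form shown in the example event.

```python
import json
from datetime import datetime, timedelta


def parse_ts(value: str) -> datetime:
    # Assumes the microsecond-precision UTC form shown above,
    # e.g. "2024-01-15T10:30:00.005000Z".
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%S.%fZ")


def summarize(line: str) -> dict:
    """Flatten one audit event (one JSON object per line) into a small dict."""
    event = json.loads(line)
    ref = event.get("objectRef") or {}      # absent for non-resource URLs
    status = event.get("responseStatus") or {}
    elapsed = parse_ts(event["stageTimestamp"]) - parse_ts(event["requestReceivedTimestamp"])
    return {
        "user": (event.get("user") or {}).get("username"),
        "verb": event.get("verb"),
        "resource": ref.get("resource"),
        "namespace": ref.get("namespace"),
        "name": ref.get("name"),
        "code": status.get("code"),
        "latency_ms": elapsed / timedelta(milliseconds=1),
    }


# Illustrative record built from the fields documented above.
SAMPLE = json.dumps({
    "stage": "ResponseComplete",
    "verb": "get",
    "user": {"username": "system:serviceaccount:production:myapp"},
    "objectRef": {"resource": "secrets", "namespace": "production", "name": "db-password"},
    "responseStatus": {"code": 200},
    "requestReceivedTimestamp": "2024-01-15T10:30:00.000000Z",
    "stageTimestamp": "2024-01-15T10:30:00.005000Z",
})

print(summarize(SAMPLE))
```

Treating `objectRef` and `responseStatus` as optional matters in practice: non-resource requests such as /healthz carry no `objectRef`.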
Noise Reduction
An untuned audit policy on a production cluster can generate hundreds of megabytes of logs per hour. Most of this is noise. The goal is to capture high-value events without drowning the pipeline.
High-Volume Low-Value Sources
| Source | Why It's Noisy | Recommendation |
|---|---|---|
| kube-controller-manager | Constantly lists/watches almost all resources | None for system:kube-controller-manager reads |
| kube-scheduler | Lists pods and nodes constantly | None for system:kube-scheduler reads |
| kube-proxy / CNI | Watches endpoints, services, nodes | None for these system components |
| Lease updates | Leader election emits update every 2–5s per component | None for coordination.k8s.io/leases |
| Watch events | Every watch emits a ResponseStarted event when it opens, in addition to ResponseComplete when it closes | Add ResponseStarted to omitStages |
| Health checks | /healthz, /readyz, /livez hit every few seconds from load balancers | None for nonResourceURLs matching /healthz*, /readyz*, /livez* |
| Metrics scrapes | Prometheus scrapes /metrics every 15–30s | None for nonResourceURLs matching /metrics* |
| Events resource | High churn; informational only | None for core/events |
Volume Estimation
# Estimate audit log volume before wiring up a pipeline: write events to stdout
# (--audit-log-path=-) or a local file, then measure during normal operation.
# Count events per minute from the audit log (Ctrl-C to stop)
tail -f /var/log/kubernetes/audit/audit.log | \
jq --unbuffered -r '.stageTimestamp' | \
awk -F: '{print $1":"$2; fflush()}' | uniq -c
# Top users by event count (last 1000 events)
tail -1000 /var/log/kubernetes/audit/audit.log | \
jq -r '.user.username' | sort | uniq -c | sort -rn | head -20
# Top resources by event count
tail -1000 /var/log/kubernetes/audit/audit.log | \
jq -r '.objectRef.resource // "non-resource"' | sort | uniq -c | sort -rn | head -20
SIEM Integration
│
├── audit.log (JSON Lines on node)
│ │
│ Fluent Bit DaemonSet
│ (tail /var/log/kubernetes/audit/audit.log)
│ │
│ ├──▶ Elasticsearch / OpenSearch (Kibana dashboards)
│ ├──▶ Splunk HEC (Splunk SIEM)
│ └──▶ Loki (Grafana dashboards)
│
└── Webhook backend
│
└──▶ Falco gRPC / HTTP sink (real-time alerting)
Fluent Bit Configuration for Audit Logs
# fluent-bit-configmap.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: fluent-bit-config
  namespace: logging
data:
  fluent-bit.conf: |
    [SERVICE]
        Flush             5
        Log_Level         info
        Parsers_File      parsers.conf
    # Read audit log file (must run on a control plane node)
    [INPUT]
        Name              tail
        Tag               kube.audit
        Path              /var/log/kubernetes/audit/audit.log
        Parser            json
        DB                /var/log/flb_kube_audit.db
        Mem_Buf_Limit     50MB
        Skip_Long_Lines   On
        Refresh_Interval  10
    # Add cluster metadata
    [FILTER]
        Name              record_modifier
        Match             kube.audit
        Record            cluster ${CLUSTER_NAME}
        Record            environment ${ENVIRONMENT}
    # Forward to Elasticsearch
    [OUTPUT]
        Name              es
        Match             kube.audit
        Host              elasticsearch.logging.svc.cluster.local
        Port              9200
        Index             k8s-audit
        Type              _doc
        Logstash_Format   On
        Logstash_Prefix   k8s-audit
Elasticsearch Query Examples
# Elasticsearch: find all secret reads in the last hour
GET k8s-audit-*/_search
{
"query": {
"bool": {
"must": [
{ "term": { "objectRef.resource": "secrets" } },
{ "term": { "verb": "get" } },
{ "range": { "stageTimestamp": { "gte": "now-1h" } } }
],
"must_not": [
{ "term": { "user.username": "system:serviceaccount:kube-system:cert-manager" } }
]
}
}
}
Forensic Query Patterns
These jq patterns work against raw audit log files and translate directly to SIEM queries.
Secret Access Investigation
# All successful secret reads in this log file (get only; list and watch also return secret data)
cat audit.log | jq -r '
select(.objectRef.resource == "secrets") |
select(.verb == "get") |
select(.responseStatus.code == 200) |
[.stageTimestamp, .user.username, .objectRef.namespace, .objectRef.name] |
@tsv
'
# Secrets read by service accounts outside their own namespace
cat audit.log | jq -r '
select(.objectRef.resource == "secrets") |
select(.verb == "get") |
select(.user.username | startswith("system:serviceaccount:")) |
select(
(.user.username | split(":")[2]) !=
(.objectRef.namespace // "cluster-scoped")
) |
[.stageTimestamp, .user.username, .objectRef.namespace, .objectRef.name] |
@tsv
'
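The same cross-namespace check can be sketched in Python for use in a scheduled job or notebook. The function names here are hypothetical. The key detail is that a service account username has the form system:serviceaccount:&lt;namespace&gt;:&lt;name&gt;, so the namespace is the third colon-separated field (index 2).

```python
import json
from typing import Iterable, Iterator, Optional, Tuple

SA_PREFIX = "system:serviceaccount:"


def service_account_namespace(username: str) -> Optional[str]:
    # "system:serviceaccount:<namespace>:<name>" -> "<namespace>"; anything else -> None
    if not username.startswith(SA_PREFIX):
        return None
    parts = username.split(":")
    return parts[2] if len(parts) == 4 else None


def cross_namespace_secret_reads(lines: Iterable[str]) -> Iterator[Tuple]:
    """Yield (timestamp, user, namespace, secret) for reads outside the SA's own namespace."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        event = json.loads(line)
        ref = event.get("objectRef") or {}
        if ref.get("resource") != "secrets":
            continue
        if event.get("verb") not in ("get", "list", "watch"):
            continue
        username = (event.get("user") or {}).get("username", "")
        sa_namespace = service_account_namespace(username)
        if sa_namespace is not None and sa_namespace != ref.get("namespace"):
            yield (event.get("stageTimestamp"), username, ref.get("namespace"), ref.get("name"))


if __name__ == "__main__":
    import sys
    # Usage: python cross_ns.py < audit.log
    for row in cross_namespace_secret_reads(sys.stdin):
        print("\t".join(str(col) for col in row))
```

Unlike the jq version, this sketch also counts list and watch, since both return secret data.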
Privilege Escalation Detection
# New ClusterRoleBindings created (potential privilege escalation)
cat audit.log | jq -r '
select(.objectRef.resource == "clusterrolebindings") |
select(.verb == "create") |
select(.responseStatus.code == 201) |
[.stageTimestamp, .user.username, .objectRef.name,
(.requestObject.roleRef.name // "unknown")] |
@tsv
'
# Any binding to cluster-admin role
cat audit.log | jq -r '
select((.objectRef.resource // "") | test("rolebinding")) |
select(.verb == "create") |
select(.requestObject.roleRef.name == "cluster-admin") |
[.stageTimestamp, .user.username, .objectRef.namespace // "cluster",
.objectRef.name, (.requestObject.subjects[0].name // "unknown")] |
@tsv
'
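The jq query above reports only the first subject of each binding. A Python sketch that lists every subject is shown below; `cluster_admin_bindings` is a hypothetical helper, and it assumes the binding was logged at Request level or higher so that `requestObject` is present.

```python
import json
from typing import Iterable, Iterator


def cluster_admin_bindings(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one record per created (Cluster)RoleBinding that references cluster-admin."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        event = json.loads(line)
        ref = event.get("objectRef") or {}
        if ref.get("resource") not in ("rolebindings", "clusterrolebindings"):
            continue
        if event.get("verb") != "create":
            continue
        # requestObject is only present at Request level or higher.
        body = event.get("requestObject") or {}
        if (body.get("roleRef") or {}).get("name") != "cluster-admin":
            continue
        subjects = [
            "{}:{}".format(s.get("kind"), s.get("name"))
            for s in (body.get("subjects") or [])
        ]
        yield {
            "time": event.get("stageTimestamp"),
            "creator": (event.get("user") or {}).get("username"),
            "namespace": ref.get("namespace") or "cluster",
            "binding": ref.get("name"),
            "subjects": subjects,
        }
```

Feeding it the open audit log file (for example `cluster_admin_bindings(open("audit.log"))`) yields dictionaries that can be printed or forwarded to an alerting system.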
Pod Exec Investigation
# All exec sessions (command in URI params)
cat audit.log | jq -r '
select(.objectRef.subresource == "exec") |
[.stageTimestamp, .user.username, .objectRef.namespace,
.objectRef.name, .requestURI] |
@tsv
'
# Exec by humans (non-service-accounts) — should be rare in prod
cat audit.log | jq -r '
select(.objectRef.subresource == "exec") |
select(.user.username | startswith("system:serviceaccount") | not) |
[.stageTimestamp, .user.username, .objectRef.namespace,
.objectRef.name, .requestURI] |
@tsv
'
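Because the exec command travels as repeated `command` query parameters in `requestURI`, it can be recovered with the standard library. The sketch below uses a hypothetical `exec_command` helper and a made-up example URI.

```python
from typing import List, Optional, Tuple
from urllib.parse import parse_qs, urlsplit


def exec_command(request_uri: str) -> Tuple[List[str], Optional[str]]:
    """Return (argv, container) extracted from a pods/exec requestURI."""
    query = parse_qs(urlsplit(request_uri).query)
    argv = query.get("command", [])            # one entry per command argument, in order
    container = (query.get("container") or [None])[0]
    return argv, container


# Illustrative URI; namespace, pod, and container names are made up.
uri = ("/api/v1/namespaces/prod/pods/web-0/exec"
       "?command=sh&command=-c&command=cat+%2Fetc%2Fpasswd&container=app&stdin=true")
argv, container = exec_command(uri)
print(argv, container)
```

`parse_qs` percent-decodes each argument, which makes the reconstructed command easier to read and to match against deny lists than the raw URI.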
Unauthorized Access Patterns
# All 403 Forbidden responses — RBAC denials
cat audit.log | jq -r '
select(.responseStatus.code == 403) |
[.stageTimestamp, .user.username, .verb,
.objectRef.resource, .objectRef.namespace, .objectRef.name] |
@tsv
'
# Top users by 403 count (credential probing detection)
cat audit.log | jq -r '
select(.responseStatus.code == 403) |
.user.username
' | sort | uniq -c | sort -rn | head -10
# 401 Unauthorized (authentication failures)
cat audit.log | jq -r '
select(.responseStatus.code == 401) |
[.stageTimestamp, .user.username, .sourceIPs[0], .requestURI] |
@tsv
'
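For probing detection it often helps to rank 403 responses by source IP as well as by user. A small Python sketch follows; `forbidden_counts` is a hypothetical helper, and it takes the first entry of `sourceIPs` as the client address.

```python
import json
from collections import Counter
from typing import Iterable, Tuple


def forbidden_counts(lines: Iterable[str]) -> Tuple[Counter, Counter]:
    """Count 403 responses per username and per first source IP."""
    by_user: Counter = Counter()
    by_ip: Counter = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        event = json.loads(line)
        if (event.get("responseStatus") or {}).get("code") != 403:
            continue
        by_user[(event.get("user") or {}).get("username", "<unknown>")] += 1
        source_ips = event.get("sourceIPs") or ["<unknown>"]
        by_ip[source_ips[0]] += 1
    return by_user, by_ip


if __name__ == "__main__":
    import sys
    # Usage: python forbidden.py < audit.log
    users, ips = forbidden_counts(sys.stdin)
    for name, count in users.most_common(10):
        print(count, name)
    for ip, count in ips.most_common(10):
        print(count, ip)
```

A single user spread across many IPs and a single IP cycling through many users tend to indicate different problems, so keeping both counters separate is a deliberate choice here.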
Node and Kubelet Activity
# Nodes attempting to modify other nodes' objects (should be denied by NodeRestriction)
cat audit.log | jq -r '
select((.user.groups // []) | index("system:nodes")) |
select(.objectRef.resource == "nodes") |
select(.verb | test("^(create|update|patch|delete)$")) |
select(.user.username != ("system:node:" + (.objectRef.name // ""))) |
[.stageTimestamp, .user.username, .verb, .objectRef.name] |
@tsv
'
Dynamic Audit Webhooks
Dynamic audit webhooks (auditregistration.k8s.io) allowed audit sinks to be configured through AuditSink API objects without restarting the API server. The feature never progressed beyond alpha.
The AuditSink API was introduced as alpha in 1.13 and was removed in 1.19. On current clusters, configure the static webhook backend at API server startup; dynamic registration through the API is no longer available.
Falco as an Audit Sink
Falco can receive Kubernetes audit events via webhook and apply its rule engine to detect security-relevant patterns in real time.
# Configure audit webhook to send to Falco's audit endpoint
# webhook-audit-config.yaml
apiVersion: v1
kind: Config
clusters:
  - name: falco
    cluster:
      server: http://falco.falco-system.svc.cluster.local:8765/k8s-audit
contexts:
  - name: falco
    context:
      cluster: falco
      user: ""
current-context: falco
# Falco rule triggered by audit event: detect anonymous API access
- rule: K8s Anonymous Request
  desc: Detect requests from system:anonymous or system:unauthenticated
  condition: >
    ka.user.name in ("system:anonymous", "system:unauthenticated")
  output: >
    Anonymous request to API server
    (user=%ka.user.name verb=%ka.verb uri=%ka.uri
    response=%ka.response.code sourceips=%ka.sourceips)
  priority: CRITICAL
  source: k8s_audit
- rule: K8s ClusterAdmin Binding Created
  desc: Detect creation of ClusterRoleBindings granting cluster-admin
  condition: >
    ka.verb = "create" and
    ka.target.resource = "clusterrolebindings" and
    ka.req.binding.role = "cluster-admin"
  output: >
    cluster-admin ClusterRoleBinding created
    (user=%ka.user.name binding=%ka.target.name subject=%ka.req.binding.subjects)
  priority: CRITICAL
  source: k8s_audit
- rule: K8s Secret Access
  desc: Detect reads of Kubernetes Secrets
  condition: >
    ka.target.resource = "secrets" and
    ka.verb in ("get", "list") and
    not ka.user.name startswith "system:serviceaccount:kube-system:"
  output: >
    Kubernetes Secret accessed
    (user=%ka.user.name secret=%ka.target.name ns=%ka.target.namespace)
  priority: WARNING
  source: k8s_audit
Compliance Requirements
| Framework | Requirement | Kubernetes Audit Coverage |
|---|---|---|
| PCI DSS 10 | Log all access to cardholder data, all auth attempts, privileged actions | Secrets access at Request level; 403/401 logging; RBAC change logging |
| SOC 2 CC6.1 | Logical access controls, monitoring of access | All read/write to sensitive resources; user activity timeline |
| HIPAA § 164.312(b) | Audit controls: record and examine activity on systems with ePHI | Full audit trail for any namespace containing ePHI workloads |
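| CIS Benchmark 3.2.1 | Ensure that a minimal audit policy is created | --audit-policy-file must point to a valid policy (--audit-log-path and the rotation flags are covered by the section 1.2 API server controls) |
| CIS Benchmark 3.2.2 | Ensure that the audit policy covers key security concerns | Policy must include secrets, RBAC, exec, auth failures |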
| NIST 800-53 AU-2 | Audit event logging | All authentication, authorization, and privileged actions |
| ISO 27001 A.12.4 | Event logging and protection of log information | Immutable log storage; log access controls |
PCI DSS requires at least 12 months of audit log retention with the most recent 3 months immediately available; other frameworks set their own periods, so confirm the figure that applies to you. Plan log storage accordingly: a busy cluster at Metadata level for most resources can generate on the order of 50–200GB/month. Use compressed cold storage (S3 Glacier, GCS Nearline) for logs older than 90 days.
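A rough sizing calculation for a one-year retention window with 90 days kept hot can be sketched as follows. The 100 GB/month input is simply a mid-range value from the estimate above, and the 10:1 compression ratio for cold storage is an assumption for illustration; measure both on your own cluster.

```python
def retention_storage_gb(gb_per_month: float,
                         hot_days: int = 90,
                         total_days: int = 365,
                         cold_compression_ratio: float = 10.0):
    """Return (hot_gb, cold_gb) needed at steady state.

    cold_compression_ratio is an assumed ratio for compressed JSON logs,
    not a measured figure.
    """
    gb_per_day = gb_per_month / 30
    hot_gb = gb_per_day * hot_days
    cold_gb = gb_per_day * (total_days - hot_days) / cold_compression_ratio
    return hot_gb, cold_gb


hot, cold = retention_storage_gb(100)
print(f"hot: {hot:.0f} GB, cold: {cold:.0f} GB")
```

With these inputs the hot tier dominates, so most of the cost pressure comes from how long logs stay readily accessible, not from the total retention period.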
Audit Log Integrity
# Audit logs must be protected from modification
# Ship to immutable append-only storage immediately:
# 1. S3 Object Lock (compliance mode)
aws s3api put-object-lock-configuration \
--bucket k8s-audit-logs \
--object-lock-configuration '{"ObjectLockEnabled":"Enabled","Rule":{"DefaultRetention":{"Mode":"COMPLIANCE","Years":1}}}'
# 2. Restrict who can access audit logs via RBAC
# Audit logs should NOT be readable by workload service accounts
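# 3. WORM storage at OS level (if keeping on-disk)
# chattr +a /var/log/kubernetes/audit/audit.log (append-only)
# Note: an append-only file cannot be renamed or deleted, so the API server's
# built-in rotation of this file will fail while the attribute is set.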
Metrics, Alerts & Runbooks
Key Metrics
| Metric | Source | Description |
|---|---|---|
| apiserver_audit_event_total | kube-apiserver | Total audit events generated and sent to the audit backend |
| apiserver_audit_requests_rejected_total | kube-apiserver | API requests rejected because of an audit backend error; events that failed to be audited are counted in apiserver_audit_error_total |
| apiserver_audit_level_total | kube-apiserver | Events per audit level (None/Metadata/Request/RequestResponse) |
| falco_events_total | Falco | Falco events triggered by k8s_audit rules |
| node_filesystem_avail_bytes | Node exporter | Free space on the filesystem that holds the audit log |
Alerts
# Alert: Audit events being dropped
- alert: AuditEventsDropped
  expr: >
    (sum(increase(apiserver_audit_error_total[5m])) > 0)
    or (sum(increase(apiserver_audit_requests_rejected_total[5m])) > 0)
  for: 1m
  labels:
    severity: critical
  annotations:
    summary: "Audit events being dropped: compliance gap"
# Alert: High rate of 403 responses (potential attack probe)
- alert: HighForbiddenRate
  expr: >
    sum(rate(apiserver_request_total{code="403"}[5m])) > 10
  for: 5m
  annotations:
    summary: "High rate of 403 responses: possible RBAC probing"
# Alert: cluster-admin binding created
- alert: ClusterAdminBindingCreated
  expr: increase(falco_events_total{rule="K8s ClusterAdmin Binding Created"}[5m]) > 0
  for: 0m
  labels:
    severity: critical
  annotations:
    summary: "cluster-admin ClusterRoleBinding created: immediate investigation required"
# Alert: No audit events being generated (logging broken)
- alert: AuditLogStale
  expr: sum(rate(apiserver_audit_event_total[5m])) == 0
  for: 10m
  annotations:
    summary: "No audit events generated in 10 minutes: audit logging may be broken"
Runbooks
Audit Events Being Dropped
1. Check apiserver_audit_requests_rejected_total for trend
2. Check webhook backend latency and availability
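3. Increase --audit-webhook-batch-buffer-size or the batch throttle limits (--audit-webhook-batch-throttle-qps, --audit-webhook-batch-throttle-burst), or reduce policy verbosity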
4. If log backend: check disk space on control plane node
Secret Accessed Unexpectedly
1. Query audit log: who accessed the secret and when
2. Check if access was from expected service account
3. If unexpected: rotate the secret immediately
4. Check if pod was compromised (Falco events, exec history)
cluster-admin Binding Alert
1. Identify who created it: jq query on audit log
2. Determine if authorized (planned infra change vs incident)
3. If unauthorized: delete binding, investigate originating pod/user
4. Review RBAC for how the creator had permission to create the binding
High 403 Rate
1. Identify source: top users/IPs by 403 count
2. Determine if misconfigured app (wrong SA permissions) vs attack
3. For misconfigured app: fix RBAC
4. For attack: block source IP at ingress/firewall level
Audit Log Pipeline Failure
1. Check Fluent Bit pod health: kubectl logs -n logging fluent-bit-xxx
2. Verify audit log file is being written on control plane
3. Check Elasticsearch/SIEM receiver is accepting connections
4. Verify TLS certificates for webhook backend haven't expired
Best Practices
Always set an audit policy — never use the default (no policy = no logging)
A missing --audit-policy-file flag means nothing is logged. The absence of audit logs is a compliance failure and makes incident investigation impossible.
Use tiered rules: None → Metadata → Request → RequestResponse
Start with aggressive noise suppression, escalate to higher levels only for security-sensitive resources. A flat "log everything at RequestResponse" policy will generate terabytes of data and obscure the signals you need.
Log Secrets at Metadata level, never Request or RequestResponse
Request and response bodies for Secrets contain base64-encoded secret data: the response body on reads, and the request body on create, update, and patch. Log Secrets at Metadata level, which records who accessed which Secret and when without exposing the value.
Ship logs to immutable external storage immediately
Audit logs on the control plane node can be deleted by a cluster administrator. Stream to an external SIEM or object storage with Object Lock enabled before an attacker can cover their tracks.
Alert on audit event drops
Dropped audit events are a compliance gap. Monitor apiserver_audit_error_total and apiserver_audit_requests_rejected_total, and treat any increase as a critical alert requiring immediate investigation.
Add Falco as a webhook audit sink for real-time alerting
File-based audit logs are useful for investigation but not for real-time detection. Route audit events to Falco via webhook backend to trigger alerts within seconds of suspicious activity.
Keep the authorization annotations in your analysis
The authorization.k8s.io/decision annotation in audit events records whether access was allowed or denied. It is already present at Metadata level, so capturing it for all resources costs almost nothing but enables powerful RBAC misconfiguration analysis.
Set omitManagedFields: true
Managed fields metadata in request/response bodies can be extremely verbose (often larger than the actual object). Setting omitManagedFields: true in the audit policy reduces log size by 30–60% with no loss of security-relevant information.