Defense in Depth

Security hardening is not a checklist you complete once — it is a continuous process across multiple layers. No single control prevents all attacks. The goal is to make lateral movement expensive enough that attackers are detected before they reach critical assets.

Kubernetes Security Layers
  Layer 7: Supply chain    ← Image signing, SBOM, Cosign, SLSA
  Layer 6: Workload        ← Pod Security Standards, securityContext
  Layer 5: Application     ← RBAC, ServiceAccount, NetworkPolicy
  Layer 4: Secrets         ← Vault, ESO, etcd encryption, IRSA
  Layer 3: Cluster         ← CIS benchmark, admission control, audit logs
  Layer 2: Node            ← OS hardening, IMDSv2, containerd AppArmor/Seccomp
  Layer 1: Network         ← VPC isolation, private endpoint, Security Groups
  Layer 0: Identity        ← IAM roles, OIDC federation, MFA, PAM

  Attacker must breach ALL layers. Defender needs to stop at ANY layer.

| Threat | Primary Control | Detection |
|---|---|---|
| Container escape (kernel exploit) | seccomp RuntimeDefault, AppArmor, non-root, read-only rootfs | Falco: syscall anomaly |
| Privileged container abuse | PSS Restricted, Kyverno disallow-privileged | Falco: privileged spawn, kube-bench |
| Lateral movement via ServiceAccount | automountServiceAccountToken: false, least-privilege RBAC | Audit logs: unexpected API calls |
| Secret exfiltration | etcd encryption, Vault, NetworkPolicy restrict egress | Falco: /proc/*/environ read, unusual network |
| Malicious image | Cosign image signing, Kyverno verifyImages | Trivy Operator: CVE scan on running pods |
| RBAC escalation | No cluster-admin, audit RoleBindings quarterly | Audit logs: bind/escalate verbs |
| Node compromise via IMDS | IMDSv2 mandatory, hop-limit 1 (blocks pod access) | CloudTrail: unexpected IAM calls from EC2 |
| etcd direct access | etcd mTLS, private subnet, no public endpoint | etcd audit logs |
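
The layered model can be read as simple arithmetic: if an attacker has to get past every layer, the chance of a full breach is the product of the per-layer bypass chances. The sketch below assumes the layers fail independently, and the 30% figure is a made-up number for illustration, not a measurement.

```python
from math import prod


def full_breach_probability(bypass_chance_per_layer):
    """Chance that every layer is bypassed, assuming layers fail independently."""
    return prod(bypass_chance_per_layer)


# Hypothetical figures for illustration only: eight layers, each bypassed
# 30% of the time.
single_layer = full_breach_probability([0.3])
all_layers = full_breach_probability([0.3] * 8)
print(f"one layer: {single_layer:.2f}, eight layers: {all_layers:.6f}")
```

Real layers are rarely independent (one stolen credential can defeat several at once), so treat the result as an upper bound on the benefit rather than a prediction.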

CIS Kubernetes Benchmark

The CIS Kubernetes Benchmark provides prescriptive hardening guidance covering control plane, worker nodes, policies, and managed services. The benchmark is versioned per Kubernetes release — use the version matching your cluster.

Key CIS control areas

| Section | Key Controls | Automated Check |
|---|---|---|
| 1. Control Plane | API server flags: anonymous-auth=false, kubelet HTTPS, RBAC enabled, NodeRestriction admission, audit logging | kube-bench section 1 |
| 2. etcd | TLS client auth, data encryption at rest, separate etcd network interface | kube-bench section 2 |
| 3. Control Plane Config | Controller manager: service-account-private-key, root-ca-file, profiling disabled | kube-bench section 3 |
| 4. Worker Nodes | kubelet: anonymous auth disabled, authorization mode Webhook, read-only port disabled, TLS cert rotation, protect kernel defaults | kube-bench section 4 |
| 5. Policies | RBAC, ServiceAccount token projection, Pod Security Standards, Network Policies, Secrets encryption | kube-bench section 5 |
| EKS/GKE/AKS | Managed plane controls; focus on worker node hardening + policies | kube-bench with --benchmark eks-stig-node or gke |

kube-bench

kube-bench runs CIS Kubernetes Benchmark checks automatically against a live cluster. It auto-detects the Kubernetes version and selects the corresponding benchmark.

Running kube-bench

# Run against current cluster as a Job (recommended over local binary)
kubectl apply -f https://raw.githubusercontent.com/aquasecurity/kube-bench/main/job.yaml

# Wait for completion and get results
kubectl wait --for=condition=complete job/kube-bench --timeout=120s
kubectl logs job/kube-bench

# EKS-specific benchmark
kubectl apply -f https://raw.githubusercontent.com/aquasecurity/kube-bench/main/job-eks.yaml
kubectl logs job/kube-bench-eks

# Worker node only (run on each node)
kubectl apply -f https://raw.githubusercontent.com/aquasecurity/kube-bench/main/job-node.yaml

# kube-bench Job that writes JSON results to a file in the pod
# (shipping the file to S3 for centralized reporting is not shown)
apiVersion: batch/v1
kind: Job
metadata:
  name: kube-bench
  namespace: kube-system
spec:
  template:
    spec:
      hostPID: true
      tolerations:
        - key: node-role.kubernetes.io/control-plane
          effect: NoSchedule
      restartPolicy: Never
      volumes:
        - name: var-lib-etcd
          hostPath:
            path: /var/lib/etcd
        - name: etc-kubernetes
          hostPath:
            path: /etc/kubernetes
        - name: usr-local-mount-1
          hostPath:
            path: /usr/local/mount-from-host/bin
        - name: etc-cni-netd
          hostPath:
            path: /etc/cni/net.d/
        - name: opt-cni-bin
          hostPath:
            path: /opt/cni/bin/
      containers:
        - name: kube-bench
          image: aquasec/kube-bench:v0.8.0
          command:
            - kube-bench
            - --json
            - --outputfile
            - /tmp/kube-bench-results.json
          volumeMounts:
            - name: var-lib-etcd
              mountPath: /var/lib/etcd
              readOnly: true
            - name: etc-kubernetes
              mountPath: /etc/kubernetes
              readOnly: true
          env:
            - name: NODE_NAME
              valueFrom:
                fieldRef:
                  fieldPath: spec.nodeName

Interpreting and prioritizing results

# Parse kube-bench JSON output — show only FAILs
kubectl logs job/kube-bench | \
  python3 -c "
import json, sys
data = json.load(sys.stdin)
for c in data.get('Controls', []):
    for g in c.get('tests', []):
        for t in g.get('results', []):
            if t['status'] == 'FAIL':
                print(f\"[FAIL] {t['test_number']}: {t['test_desc']}\")
                print(f\"       Remediation: {t.get('remediation','N/A')[:120]}\")
                print()
"

# Quick summary
kubectl logs job/kube-bench | grep -E "^\[FAIL\]|^\[WARN\]|^== Summary" | head -40
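
For prioritizing, a per-section tally is often more useful than a flat list of failures. The sketch below assumes the same JSON layout the parser above relies on (`Controls` → `tests` → `results`, each result carrying a `status`); the `sample` report is invented for illustration.

```python
from collections import Counter


def summarize(report):
    """Count result statuses (PASS/FAIL/WARN/INFO) per top-level control section."""
    summary = {}
    for control in report.get("Controls", []):
        counts = Counter()
        for group in control.get("tests", []):
            for result in group.get("results", []):
                counts[result.get("status", "UNKNOWN")] += 1
        title = f"{control.get('id', '?')} {control.get('text', '')}".strip()
        summary[title] = counts
    return summary


# Invented sample with the assumed layout.
sample = {
    "Controls": [
        {"id": "1", "text": "Control Plane Security Configuration", "tests": [
            {"section": "1.2", "results": [
                {"test_number": "1.2.1", "status": "FAIL"},
                {"test_number": "1.2.2", "status": "PASS"},
            ]},
        ]},
        {"id": "4", "text": "Worker Node Security Configuration", "tests": [
            {"section": "4.2", "results": [{"test_number": "4.2.1", "status": "WARN"}]},
        ]},
    ]
}

for section, counts in summarize(sample).items():
    print(section, dict(counts))
```

To use it on real output, load the JSON produced by a kube-bench run with `--json` and pass the parsed dictionary to `summarize`.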

Common kube-bench remediations

| Check ID | Description | Remediation |
|---|---|---|
| 1.2.1 | anonymous-auth not disabled | --anonymous-auth=false on kube-apiserver |
| 1.2.6 | NodeRestriction admission not enabled | Add NodeRestriction to --enable-admission-plugins |
| 1.2.22 | Audit logging not enabled | Add --audit-log-path, --audit-policy-file, --audit-log-maxage=30 |
| 4.2.1 | kubelet anonymous auth enabled | authentication.anonymous.enabled: false in kubelet config |
| 4.2.2 | kubelet authorization mode not Webhook | authorization.mode: Webhook in kubelet config |
| 4.2.6 | Read-only port not disabled | readOnlyPort: 0 in kubelet config |
| 5.1.3 | Wildcards in ClusterRoles | Replace * verbs/resources with explicit permissions |
| 5.2.2 | Privileged containers allowed | Enforce PSS Restricted or Kyverno policy |
| 5.3.2 | NetworkPolicies not set | Default-deny + explicit allow rules per namespace |
| 5.4.1 | Secrets not encrypted at rest | Enable EncryptionConfiguration for etcd |

Node Hardening

IMDSv2 enforcement (AWS EKS)

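Instance Metadata Service v2 (IMDSv2) requires a session token, obtained with a PUT request, on every IMDS call. Most SSRF flaws can only issue GET requests, so requiring the token closes the most common path for stealing the node's IAM role credentials. The hop limit then decides whether pods can obtain a token at all.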

# Enforce IMDSv2 with hop limit 1 on an existing instance
# For new instances, set the same options in the EKS managed node group launch template (or Karpenter EC2NodeClass)
aws ec2 modify-instance-metadata-options \
  --instance-id i-0123456789abcdef0 \
  --http-tokens required \
  --http-put-response-hop-limit 1 \
  --http-endpoint enabled

# Karpenter EC2NodeClass — enforce on all Karpenter-managed nodes
metadataOptions:
  httpTokens: required
  httpPutResponseHopLimit: 1
  httpEndpoint: enabled
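Warning: hop-limit=1 is critical

The default hop limit of 2 allows pods to reach IMDS directly (pod → node → IMDS = 2 hops). Setting httpPutResponseHopLimit: 1 makes the token response expire after a single hop, so only processes in the node's own network namespace can obtain a session token. This keeps ordinary pods off IMDS even if IRSA is not used, a critical defense against credential theft. Pods running with hostNetwork: true share the node's network namespace and can still reach IMDS.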
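
To find instances that drifted from this setting, the `MetadataOptions` block in `aws ec2 describe-instances` output can be checked offline. This is a minimal sketch over already-parsed JSON; the function name and the sample data are illustrative, and it treats instances with IMDS disabled entirely as compliant.

```python
def imds_violations(describe_instances):
    """Return instance IDs that allow IMDSv1 or a hop limit above 1."""
    violations = []
    for reservation in describe_instances.get("Reservations", []):
        for instance in reservation.get("Instances", []):
            options = instance.get("MetadataOptions", {})
            if options.get("HttpEndpoint") == "disabled":
                continue  # IMDS is switched off, nothing to steal
            tokens_optional = options.get("HttpTokens") != "required"
            hop_limit_too_high = options.get("HttpPutResponseHopLimit", 1) > 1
            if tokens_optional or hop_limit_too_high:
                violations.append(instance["InstanceId"])
    return violations


# Illustrative data shaped like `aws ec2 describe-instances --output json`.
sample = {"Reservations": [{"Instances": [
    {"InstanceId": "i-aaa", "MetadataOptions": {
        "HttpTokens": "required", "HttpPutResponseHopLimit": 1, "HttpEndpoint": "enabled"}},
    {"InstanceId": "i-bbb", "MetadataOptions": {
        "HttpTokens": "optional", "HttpPutResponseHopLimit": 1, "HttpEndpoint": "enabled"}},
    {"InstanceId": "i-ccc", "MetadataOptions": {
        "HttpTokens": "required", "HttpPutResponseHopLimit": 2, "HttpEndpoint": "enabled"}},
]}]}

print(imds_violations(sample))
```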

kubelet hardening configuration

# /etc/kubernetes/kubelet-config.yaml (kubeadm or EKS custom kubelet config)
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration

# Authentication
authentication:
  anonymous:
    enabled: false          # CIS 4.2.1
  webhook:
    enabled: true
    cacheTTL: 2m
  x509:
    clientCAFile: /etc/kubernetes/pki/ca.crt

# Authorization
authorization:
  mode: Webhook             # CIS 4.2.2
  webhook:
    cacheAuthorizedTTL: 5m
    cacheUnauthorizedTTL: 30s

# TLS
tlsMinVersion: VersionTLS12
tlsCipherSuites:
  - TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256
  - TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256
  - TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384
  - TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384

# Port security
readOnlyPort: 0             # CIS 4.2.6 — disable unauthenticated port

# Certificate rotation
rotateCertificates: true    # CIS 4.2.11 — auto-renew client certs

# Protect kernel defaults
protectKernelDefaults: true # CIS 4.2.7: kubelet fails instead of silently changing kernel tunables that differ from its expected defaults

# Event recording
eventRecordQPS: 5

# Resource management
maxPods: 110
kubeReserved:
  cpu: "250m"
  memory: "1Gi"
systemReserved:
  cpu: "250m"
  memory: "500Mi"
evictionHard:
  memory.available: "500Mi"
  nodefs.available: "10%"
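
A quick way to confirm a node actually runs with these settings is to compare its effective configuration against the hardened values. The sketch below works on an already-parsed KubeletConfiguration mapping; one possible source (an assumption about your setup) is the `kubeletconfig` object returned by the kubelet's `/configz` endpoint through the API server node proxy. Unset fields are reported rather than assumed to be safe.

```python
# Hardened values taken from the configuration above.
EXPECTED = {
    ("authentication", "anonymous", "enabled"): False,
    ("authorization", "mode"): "Webhook",
    ("readOnlyPort",): 0,
    ("rotateCertificates",): True,
    ("protectKernelDefaults",): True,
}


def lookup(config, path):
    """Walk nested dictionaries; return None when any key is missing."""
    node = config
    for key in path:
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


def kubelet_findings(config):
    """List settings whose value (or type) differs from the hardened baseline."""
    findings = []
    for path, expected in EXPECTED.items():
        actual = lookup(config, path)
        if type(actual) is not type(expected) or actual != expected:
            findings.append(f"{'.'.join(path)}: expected {expected!r}, found {actual!r}")
    return findings


# Illustrative input: anonymous auth left on, read-only port still open.
weak = {
    "authentication": {"anonymous": {"enabled": True}},
    "authorization": {"mode": "Webhook"},
    "readOnlyPort": 10255,
    "rotateCertificates": True,
    "protectKernelDefaults": True,
}
for finding in kubelet_findings(weak):
    print(finding)
```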

AppArmor and Seccomp profiles

# Pod-level seccomp (RuntimeDefault restricts dangerous syscalls)
spec:
  securityContext:
    seccompProfile:
      type: RuntimeDefault   # uses containerd/runc default seccomp profile

# Custom seccomp profile (Localhost type)
spec:
  securityContext:
    seccompProfile:
      type: Localhost
      localhostProfile: profiles/payment-service.json
      # Profile stored at: /var/lib/kubelet/seccomp/profiles/payment-service.json
{
  "defaultAction": "SCMP_ACT_ERRNO",
  "architectures": ["SCMP_ARCH_X86_64", "SCMP_ARCH_X86", "SCMP_ARCH_X32"],
  "syscalls": [
    {
      "names": [
        "accept4", "access", "arch_prctl", "bind", "brk", "capget",
        "capset", "chdir", "chmod", "chown", "clock_gettime", "clone",
        "close", "connect", "dup", "dup2", "epoll_create1", "epoll_ctl",
        "epoll_wait", "execve", "exit", "exit_group", "fchmod", "fchown",
        "fcntl", "fstat", "fstatfs", "futex", "getcwd", "getdents64",
        "getegid", "geteuid", "getgid", "getpid", "getppid", "getuid",
        "ioctl", "listen", "lseek", "madvise", "mmap", "mprotect",
        "munmap", "nanosleep", "open", "openat", "pipe2", "poll",
        "prctl", "pread64", "pwrite64", "read", "readlink", "recvfrom",
        "recvmsg", "rt_sigaction", "rt_sigprocmask", "rt_sigreturn",
        "sched_getaffinity", "sendmsg", "sendto", "set_tid_address",
        "setgid", "setgroups", "setuid", "sigaltstack", "socket",
        "stat", "statfs", "tgkill", "uname", "unlink", "wait4", "write",
        "writev"
      ],
      "action": "SCMP_ACT_ALLOW"
    }
  ]
}
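
Before rolling out a deny-by-default profile like this one, it helps to compare it with the syscalls the workload was actually seen making (for example from an strace or audit capture). The sketch below assumes an allowlist-style profile and ignores per-argument conditions; the observed syscall set is invented.

```python
def blocked_syscalls(profile, observed):
    """Return observed syscalls that an allowlist-style profile would reject."""
    if profile.get("defaultAction") == "SCMP_ACT_ALLOW":
        raise ValueError("expected a deny-by-default (allowlist) profile")
    allowed = set()
    for rule in profile.get("syscalls", []):
        if rule.get("action") == "SCMP_ACT_ALLOW":
            allowed.update(rule.get("names", []))
    return set(observed) - allowed


# Shortened, illustrative profile in the same shape as the JSON above.
profile = {
    "defaultAction": "SCMP_ACT_ERRNO",
    "syscalls": [{"names": ["read", "write", "openat", "close"],
                  "action": "SCMP_ACT_ALLOW"}],
}
observed = {"read", "write", "openat", "ptrace", "mount"}

print(sorted(blocked_syscalls(profile, observed)))
```

Anything the function returns is either a syscall to add to the profile or a behavior worth questioning before the profile goes live.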

RBAC Hardening

Audit existing RBAC for over-privilege

# Find all ClusterRoleBindings granting cluster-admin (one line per subject)
kubectl get clusterrolebindings -o json | \
  jq -r '.items[] | select(.roleRef.name=="cluster-admin") |
  .metadata.name as $crb | .subjects[]? |
  "\($crb) → \(.kind)/\(.name)"'

# Find ClusterRoles with wildcard (*) permissions (each role listed once)
kubectl get clusterroles -o json | \
  jq -r '.items[] | select(any(
    .rules[]? | (.verbs[]?, .resources[]?, .apiGroups[]?);
    . == "*"
  )) | .metadata.name' | grep -v "^system:"

# Find ServiceAccounts with cluster-admin
kubectl get clusterrolebindings -o json | \
  jq -r '.items[] |
  select(.roleRef.name=="cluster-admin") |
  .metadata.name as $crb | .subjects[]? |
  select(.kind=="ServiceAccount") |
  "CRB: \($crb) → SA: \(.namespace)/\(.name)"'

# who-can audit (krew plugin)
kubectl who-can create pods --all-namespaces
kubectl who-can delete secrets --all-namespaces
kubectl who-can escalate clusterroles --all-namespaces

# access-matrix: show all permissions for a ServiceAccount
kubectl access-matrix --sa payments:payment-service

Disable automounted ServiceAccount tokens

# Default ServiceAccount: disable auto-mount
# The setting is per ServiceAccount, so apply it to the default SA in every namespace
apiVersion: v1
kind: ServiceAccount
metadata:
  name: default
  namespace: payments
automountServiceAccountToken: false   # Pods without explicit SA get no token

---
# Dedicated SA for services that need API access (explicit, scoped)
apiVersion: v1
kind: ServiceAccount
metadata:
  name: payment-service
  namespace: payments
automountServiceAccountToken: false   # mount explicitly in pod spec

---
# In the pod: explicitly mount with projected token (short TTL)
spec:
  serviceAccountName: payment-service
  automountServiceAccountToken: false  # belt + suspenders
  volumes:
    - name: kube-api-access
      projected:
        sources:
          - serviceAccountToken:
              audience: https://kubernetes.default.svc.cluster.local
              expirationSeconds: 3600   # 1h vs default 1y
              path: token
          - configMap:
              name: kube-root-ca.crt
              items:
                - key: ca.crt
                  path: ca.crt
  containers:
    - name: payment-service
      volumeMounts:
        - name: kube-api-access
          mountPath: /var/run/secrets/kubernetes.io/serviceaccount
          readOnly: true
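
To confirm that a mounted token really has the short lifetime requested above, its claims can be decoded without verifying the signature. The sketch assumes the token is a standard JWT carrying `iat` and `exp` claims; the demo token is an unsigned stand-in built only for illustration.

```python
import base64
import json


def _b64url(data):
    """Encode bytes the way JWT segments are encoded (URL-safe, no padding)."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()


def token_claims(jwt):
    """Decode the claims segment of a JWT without verifying the signature."""
    payload = jwt.split(".")[1]
    payload += "=" * (-len(payload) % 4)  # restore stripped base64 padding
    return json.loads(base64.urlsafe_b64decode(payload))


def token_lifetime_seconds(jwt):
    """Lifetime requested for the token: expiry minus issue time."""
    claims = token_claims(jwt)
    return claims["exp"] - claims["iat"]


# Unsigned stand-in token, for illustration only.
demo_token = ".".join([
    _b64url(b'{"alg":"none"}'),
    _b64url(json.dumps({"iat": 1700000000, "exp": 1700003600}).encode()),
    "",
])
print(token_lifetime_seconds(demo_token))
```

Inside a pod, the real token would be read from the `token` file under the mount path shown above.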

RBAC least-privilege patterns

# Good: namespace-scoped, explicit resources and verbs
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: payment-service-role
  namespace: payments
rules:
  - apiGroups: [""]
    resources: ["configmaps"]
    verbs: ["get", "list", "watch"]
    resourceNames: ["payment-config", "feature-flags"]   # name-scoped
  - apiGroups: [""]
    resources: ["secrets"]
    verbs: ["get"]
    resourceNames: ["db-credentials"]   # only specific secret

---
# BAD: avoid these patterns
# - apiGroups: ["*"]       # wildcard API group
#   resources: ["*"]       # wildcard resource
#   verbs: ["*"]           # wildcard verb
# - apiGroups: [""]
#   resources: ["secrets"] # all secrets in namespace
#   verbs: ["list"]        # list (and watch) returns full Secret objects, data included, even without "get"
Warning: list and watch on Secrets expose secret contents

A list (or watch) request on Secrets returns the full objects, including their data, so a subject with list can read every secret in the namespace without ever being granted get. Only grant get on explicitly named secrets, and never grant list or watch on secrets broadly.
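
Because list and watch on Secrets are the risky verbs here, existing Roles and ClusterRoles can be screened for them. The sketch below takes one role object as parsed from `kubectl get roles,clusterroles -A -o json` (each entry in `items`) and returns the offending rules; the helper name and sample role are illustrative.

```python
def risky_secret_rules(role):
    """Return rules that grant list/watch (or wildcard verbs) on Secrets."""
    risky = []
    for rule in role.get("rules") or []:  # aggregated ClusterRoles may have null rules
        resources = rule.get("resources") or []
        groups = rule.get("apiGroups") or []
        verbs = set(rule.get("verbs") or [])
        touches_secrets = (
            ("secrets" in resources or "*" in resources)
            and ("" in groups or "*" in groups)
        )
        if touches_secrets and verbs & {"list", "watch", "*"}:
            risky.append(rule)
    return risky


# Illustrative role: the first rule is fine, the second is flagged.
role = {"metadata": {"name": "payment-service-role"}, "rules": [
    {"apiGroups": [""], "resources": ["secrets"], "verbs": ["get"],
     "resourceNames": ["db-credentials"]},
    {"apiGroups": [""], "resources": ["secrets"], "verbs": ["list"]},
]}

for rule in risky_secret_rules(role):
    print(role["metadata"]["name"], rule["verbs"])
```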

Network Hardening

API server access restriction

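# EKS: restrict public API server access to VPN egress + CI/CD runner IPs only
# publicAccessCidrs must be public ranges (placeholder documentation ranges shown);
# clients inside the VPC use the private endpoint
aws eks update-cluster-config \
  --name my-cluster \
  --resources-vpc-config \
    endpointPublicAccess=true,publicAccessCidrs="198.51.100.0/24,203.0.113.10/32",endpointPrivateAccess=true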

# Verify
aws eks describe-cluster --name my-cluster \
  --query 'cluster.resourcesVpcConfig'

Zero-trust network policy baseline

Every namespace should start with a default-deny NetworkPolicy and add explicit allows. This is covered in detail in Section 08-06. The security-hardening perspective adds monitoring of policy coverage:

# Find namespaces without any NetworkPolicy (gaps in zero-trust)
comm -23 \
  <(kubectl get namespaces -o jsonpath='{range .items[*]}{.metadata.name}{"\n"}{end}' | \
    grep -v "^kube-\|^velero\|^monitoring" | sort) \
  <(kubectl get networkpolicies -A -o jsonpath='{range .items[*]}{.metadata.namespace}{"\n"}{end}' | \
    sort -u)

# List pods outside system namespaces as candidates for a NetworkPolicy coverage check
# (this loop only prints labels; comparing them against each policy's podSelector is manual)
kubectl get pods -A -o json | \
  jq -r '.items[] | select(.metadata.namespace | test("^kube-|^velero") | not) |
  "\(.metadata.namespace)/\(.metadata.name)"' | \
  while IFS=/ read -r ns pod_name; do
    labels=$(kubectl get pod "$pod_name" -n "$ns" -o jsonpath='{.metadata.labels}' 2>/dev/null)
    echo "$ns/$pod_name labels=$labels (check manually)"
  done
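
The per-pod check that the loop above leaves open can be sketched as a label-selector match. The code below works on the `items` lists from `kubectl get pods -A -o json` and `kubectl get networkpolicies -A -o json`; it only answers whether any policy in the pod's namespace selects the pod, not whether that policy is restrictive. Function names and sample objects are illustrative.

```python
def selector_matches(selector, labels):
    """Evaluate a Kubernetes label selector against a pod's labels."""
    for key, value in (selector.get("matchLabels") or {}).items():
        if labels.get(key) != value:
            return False
    for expr in selector.get("matchExpressions") or []:
        key, op, values = expr["key"], expr["operator"], expr.get("values") or []
        if op == "In" and labels.get(key) not in values:
            return False
        if op == "NotIn" and labels.get(key) in values:
            return False
        if op == "Exists" and key not in labels:
            return False
        if op == "DoesNotExist" and key in labels:
            return False
    return True  # an empty selector matches every pod


def uncovered_pods(pods, policies):
    """Return namespace/name for pods that no NetworkPolicy selects."""
    result = []
    for pod in pods:
        meta = pod["metadata"]
        labels = meta.get("labels") or {}
        covered = any(
            policy["metadata"]["namespace"] == meta["namespace"]
            and selector_matches(policy["spec"].get("podSelector") or {}, labels)
            for policy in policies
        )
        if not covered:
            result.append(f"{meta['namespace']}/{meta['name']}")
    return result


# Illustrative objects.
pods = [
    {"metadata": {"namespace": "payments", "name": "api-1", "labels": {"app": "payment-service"}}},
    {"metadata": {"namespace": "payments", "name": "debug", "labels": {"app": "toolbox"}}},
]
policies = [
    {"metadata": {"namespace": "payments", "name": "payment-service-egress"},
     "spec": {"podSelector": {"matchLabels": {"app": "payment-service"}}}},
]
print(uncovered_pods(pods, policies))
```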

Egress restriction for workloads

# Payments service: only allow egress to known upstreams
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: payment-service-egress
  namespace: payments
spec:
  podSelector:
    matchLabels:
      app: payment-service
  policyTypes:
    - Egress
  egress:
    # Allow DNS (always required)
    - ports:
        - port: 53
          protocol: UDP
        - port: 53
          protocol: TCP

    # Allow PostgreSQL database
    - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: databases
          podSelector:
            matchLabels:
              app: postgresql
      ports:
        - port: 5432

    # Allow HTTPS to any public IP (covers the Stripe API; narrow this with an FQDN policy in Cilium or provider CIDRs on a vanilla CNI)
    - to:
        - ipBlock:
            cidr: 0.0.0.0/0
            except:
              - 10.0.0.0/8
              - 172.16.0.0/12
              - 192.168.0.0/16
      ports:
        - port: 443
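
An ipBlock peer matches an address that falls inside `cidr` and outside every `except` range. The sketch below replays that rule with Python's `ipaddress` module to show what the last egress rule admits; the test addresses are documentation and private-range examples.

```python
import ipaddress


def ipblock_allows(ip, cidr, excepts):
    """Return True when ip is inside cidr and outside every except range."""
    address = ipaddress.ip_address(ip)
    if address not in ipaddress.ip_network(cidr):
        return False
    return not any(address in ipaddress.ip_network(block) for block in excepts)


PRIVATE_RANGES = ["10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16"]

for candidate in ["198.51.100.7", "10.1.2.3", "172.20.0.5", "192.168.1.1"]:
    print(candidate, ipblock_allows(candidate, "0.0.0.0/0", PRIVATE_RANGES))
```

Any public address passes, so on a CNI without FQDN support this rule restricts the port and keeps traffic out of private ranges, but it does not pin traffic to one provider.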

Supply Chain Security

Supply chain attacks target the software delivery pipeline — compromised base images, malicious dependencies, or tampered build artifacts. The defense is cryptographic verification at every stage.

Supply Chain Security Gates
  Developer   CI Pipeline        Registry        Cluster
  ─────────   ───────────        ────────        ───────
  git push →  build image    →   push + tag  →   admission webhook
              scan (Trivy)       sign (Cosign)    verifyImages (Kyverno)
              SBOM (Syft)        attest SBOM      Trivy Operator scan
              unit tests         OCI artifact     block :latest
              lint + SAST        immutable tag     require digest

Cosign image signing and verification

Full Cosign setup is covered in Section 08-03. The key runtime enforcement is via Kyverno verifyImages:

apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: verify-image-signatures
spec:
  validationFailureAction: Enforce
  background: false
  rules:
    - name: verify-cosign-signature
      match:
        any:
          - resources:
              kinds: ["Pod"]
              namespaces: ["payments", "auth", "orders"]
      verifyImages:
        - imageReferences:
            - "123456789.dkr.ecr.us-east-1.amazonaws.com/*"
          attestors:
            - count: 1
              entries:
                - keyless:
                    subject: "https://github.com/org/repo/.github/workflows/ci.yaml@refs/heads/main"
                    issuer: "https://token.actions.githubusercontent.com"
                    rekor:
                      url: https://rekor.sigstore.dev
          mutateDigest: true    # replace :tag with @sha256:... at admission
          verifyDigest: true    # reject if digest doesn't match signed content
          required: true

SLSA provenance verification

# Verify SLSA provenance attestation (generated by GitHub Actions SLSA builder)
cosign verify-attestation \
  --type slsaprovenance \
  --certificate-identity-regexp "https://github.com/slsa-framework/slsa-github-generator/.github/workflows/generator_container_slsa3.yml" \
  --certificate-oidc-issuer "https://token.actions.githubusercontent.com" \
  123456789.dkr.ecr.us-east-1.amazonaws.com/payment-service:v1.2.3 | \
  jq '.payload | @base64d | fromjson | .predicateType'

# Expected output: "https://slsa.dev/provenance/v0.2"
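
The jq pipeline above unwraps a signed envelope: each output line is JSON whose `payload` field holds a base64-encoded in-toto statement. The same step in Python, as a sketch; the sample envelope is hand-built for illustration and carries no real signature.

```python
import base64
import json


def decode_statement(envelope_json):
    """Decode the in-toto statement carried in an attestation envelope."""
    envelope = json.loads(envelope_json)
    return json.loads(base64.b64decode(envelope["payload"]))


# Hand-built envelope, for illustration only.
statement = {
    "_type": "https://in-toto.io/Statement/v0.1",
    "predicateType": "https://slsa.dev/provenance/v0.2",
    "predicate": {"builder": {"id": "example-builder"}},
}
sample_envelope = json.dumps({
    "payloadType": "application/vnd.in-toto+json",
    "payload": base64.b64encode(json.dumps(statement).encode()).decode(),
    "signatures": [],
})

decoded = decode_statement(sample_envelope)
print(decoded["predicateType"])
```

Decoding only reads the statement. The trust decision still comes from `cosign verify-attestation` succeeding against the expected identity and issuer.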

Allowed registries enforcement

apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: restrict-image-registries
spec:
  validationFailureAction: Enforce
  rules:
    - name: allowed-registries
      match:
        any:
          - resources:
              kinds: ["Pod"]
      exclude:
        any:
          - resources:
              namespaces: ["kube-system"]
      validate:
        message: "Images must come from approved registries"
        foreach:
          - list: "request.object.spec.containers"
            deny:
              conditions:
                any:
                  - key: "{{ element.image }}"
                    operator: AnyNotIn   # NotIn is deprecated in current Kyverno releases
                    value:
                      - "123456789.dkr.ecr.us-east-1.amazonaws.com/*"
                      - "registry.k8s.io/*"
                      - "gcr.io/distroless/*"
                      - "quay.io/prometheus/*"
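
The same allowlist can be dry-run against manifests before the policy is enforced. The sketch below uses glob matching from the standard library and, as a deliberate extension of the policy above, also covers init and ephemeral containers; the pod spec is illustrative.

```python
from fnmatch import fnmatchcase

ALLOWED_REGISTRIES = [
    "123456789.dkr.ecr.us-east-1.amazonaws.com/*",
    "registry.k8s.io/*",
    "gcr.io/distroless/*",
    "quay.io/prometheus/*",
]


def disallowed_images(pod_spec, allowed=ALLOWED_REGISTRIES):
    """Return images in the pod spec that match none of the allowed patterns."""
    images = [
        container["image"]
        for field in ("initContainers", "containers", "ephemeralContainers")
        for container in pod_spec.get(field) or []
    ]
    return [
        image for image in images
        if not any(fnmatchcase(image, pattern) for pattern in allowed)
    ]


# Illustrative pod spec: the init container pulls from an unapproved registry.
pod_spec = {
    "initContainers": [{"name": "init", "image": "docker.io/library/busybox:1.36"}],
    "containers": [
        {"name": "app", "image": "123456789.dkr.ecr.us-east-1.amazonaws.com/payment-service:v1.2.3"},
    ],
}
print(disallowed_images(pod_spec))
```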

Falco Runtime Detection

Falco monitors Linux system calls and Kubernetes audit events in real time. It detects container escapes, privilege escalation, unexpected network connections, file system modifications in sensitive paths, and many other runtime threats.

Falco installation

helm repo add falcosecurity https://falcosecurity.github.io/charts
helm repo update

helm install falco falcosecurity/falco \
  --namespace falco \
  --create-namespace \
  --version 4.3.0 \
  -f falco-values.yaml

# falco-values.yaml
falco:
  grpc:
    enabled: true
  grpc_output:
    enabled: true
  # JSON output for log aggregation
  json_output: true
  json_include_output_property: true
  # Output to stdout (captured by Loki/Fluent Bit)
  stdout_output:
    enabled: true
  # Kubernetes audit events (requires audit webhook in apiserver)
  k8s_audit:
    enabled: true
  # Priority: only load rules at WARNING and above (reduce noise)
  # Kept in this block because a second top-level "falco:" key would override the first
  priority: warning

driver:
  kind: modern_ebpf   # preferred: CO-RE probe shipped with Falco, no kernel module or headers to build
  # alternatives: module (requires kernel headers), ebpf (legacy)

falcoctl:
  artifact:
    follow:
      enabled: true    # auto-update rules from falcosecurity artifact registry

# Falco Sidekick: forward alerts to Slack/PagerDuty/Elasticsearch
falcosidekick:
  enabled: true
  config:
    slack:
      webhookurl: "https://hooks.slack.com/services/..."
      minimumpriority: error
      messageformat: "long"
    pagerduty:
      routingkey: "your-pagerduty-integration-key"
      minimumpriority: critical
    elasticsearch:
      hostport: "https://elasticsearch.monitoring.svc:9200"
      index: falco
      minimumpriority: warning

resources:
  requests:
    cpu: 100m
    memory: 512Mi
  limits:
    cpu: 1000m
    memory: 1Gi

Critical Falco rules

# /etc/falco/rules.d/custom-rules.yaml

# Detect container escape attempt via nsenter/chroot
- rule: Container Escape Attempt
  desc: Detect nsenter or chroot execution inside container
  condition: >
    spawned_process and
    container.id != host and
    (proc.name = "nsenter" or proc.name = "chroot" or
     proc.cmdline contains "mount --bind" or
     proc.cmdline contains "/proc/1/root")
  output: >
    Container escape attempt (user=%user.name command=%proc.cmdline
    container=%container.name image=%container.image.repository)
  priority: CRITICAL
  tags: [container, escape]

# Detect shell spawned in production container
- rule: Shell Spawned in Production Container
  desc: A shell was spawned in a production container
  condition: >
    spawned_process and
    container.id != host and
    (proc.name = bash or proc.name = sh or proc.name = zsh or proc.name = fish) and
    not proc.pname in (containerd, dockerd, runc, kubectl) and
    not container.image.repository in (
      "registry.k8s.io/pause",
      "nicolaka/netshoot",
      "busybox"
    )
  output: >
    Shell spawned in production container
    (user=%user.name shell=%proc.name parent=%proc.pname
    image=%container.image.repository:%container.image.tag
    pod=%k8s.pod.name ns=%k8s.ns.name)
  priority: WARNING
  tags: [container, shell, mitre_execution]

# Detect write to /etc inside container
- rule: Write to Sensitive Directory
  desc: Detect writes to /etc, /usr, /bin inside running container
  condition: >
    open_write and
    container.id != host and
    (fd.name startswith /etc/ or
     fd.name startswith /usr/ or
     fd.name startswith /bin/ or
     fd.name startswith /sbin/) and
    not proc.name in (sed, find, chmod, chown, cp, mv, ln)
  output: >
    File written in sensitive directory
    (user=%user.name file=%fd.name proc=%proc.name
    image=%container.image.repository pod=%k8s.pod.name ns=%k8s.ns.name)
  priority: ERROR
  tags: [container, filesystem]

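# Detect unexpected outbound connection
# The rule references this list, so it must be defined; the address is a placeholder
- list: allowed_outbound_destinations
  items: ['"203.0.113.10"']

- rule: Unexpected Outbound Connection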
  desc: Container connects to unexpected external IP
  condition: >
    outbound and
    container.id != host and
    not fd.sip in (allowed_outbound_destinations) and
    not fd.sport in (53, 443, 80) and
    fd.net != "127.0.0.0/8" and
    fd.net != "10.0.0.0/8" and
    fd.net != "172.16.0.0/12"
  output: >
    Unexpected outbound connection
    (proc=%proc.name sip=%fd.sip sport=%fd.sport
    image=%container.image.repository pod=%k8s.pod.name)
  priority: WARNING
  tags: [network, exfiltration]

# Detect privileged container started
- rule: Privileged Container Started
  desc: A privileged container was started
  condition: >
    container_started and
    container.privileged = true and
    not container.image.repository in (
      "falcosecurity/falco",
      "quay.io/cilium/cilium"
    )
  output: >
    Privileged container started
    (image=%container.image.repository pod=%k8s.pod.name ns=%k8s.ns.name)
  priority: CRITICAL
  tags: [container, privilege_escalation]

# Detect read of service account token
- rule: ServiceAccount Token Read
  desc: Unexpected process reading the SA token
  condition: >
    open_read and
    container.id != host and
    fd.name in (/var/run/secrets/kubernetes.io/serviceaccount/token,
                /run/secrets/kubernetes.io/serviceaccount/token) and
    not proc.name in (java, python3, node, ruby, python, go)
  output: >
    Service account token read by unexpected process
    (proc=%proc.name image=%container.image.repository pod=%k8s.pod.name)
  priority: WARNING
  tags: [secrets, lateral_movement]
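
With `json_output` enabled, each alert is one JSON object per line carrying `rule` and `priority` fields, which makes triage easy to script. The sketch below counts alerts per rule at or above a chosen priority; the sample lines are invented, and non-JSON log lines are skipped.

```python
import json
from collections import Counter

# Falco priorities from least to most severe.
PRIORITIES = ["Debug", "Informational", "Notice", "Warning",
              "Error", "Critical", "Alert", "Emergency"]


def count_alerts(lines, minimum="Warning"):
    """Count alerts per rule whose priority is at or above `minimum`."""
    threshold = PRIORITIES.index(minimum)
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line.startswith("{"):
            continue  # startup banners and other plain-text log lines
        event = json.loads(line)
        priority = event.get("priority", "")
        if priority in PRIORITIES and PRIORITIES.index(priority) >= threshold:
            counts[event.get("rule", "unknown")] += 1
    return counts


# Invented sample output.
sample_lines = [
    "Falco initialized",
    '{"rule": "Shell Spawned in Production Container", "priority": "Warning"}',
    '{"rule": "Privileged Container Started", "priority": "Critical"}',
    '{"rule": "Shell Spawned in Production Container", "priority": "Warning"}',
    '{"rule": "Some Noisy Rule", "priority": "Notice"}',
]
print(count_alerts(sample_lines).most_common())
```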

Falco Kubernetes audit rules

# Falco K8s audit: configure kube-apiserver audit webhook
# Add to /etc/kubernetes/manifests/kube-apiserver.yaml:
# --audit-webhook-config-file=/etc/kubernetes/audit-webhook.yaml
# --audit-webhook-batch-max-size=400
# --audit-webhook-batch-max-wait=5s

# /etc/kubernetes/audit-webhook.yaml
apiVersion: v1
kind: Config
clusters:
  - name: falco
    cluster:
      server: http://falco.falco.svc.cluster.local:9765/k8s-audit
users:
  - name: ""
contexts:
  - context:
      cluster: falco
      user: ""
    name: default-context
current-context: default-context

Trivy Operator

Trivy Operator runs continuously in the cluster, scanning all running workloads for CVEs, misconfigurations, exposed secrets, and RBAC risks. Unlike one-time CI scans, it catches newly disclosed CVEs in already-deployed images.

Installation and configuration

helm repo add aquasecurity https://aquasecurity.github.io/helm-charts
helm repo update

helm install trivy-operator aquasecurity/trivy-operator \
  --namespace trivy-system \
  --create-namespace \
  --version 0.24.0 \
  --set="trivy.ignoreUnfixed=true" \
  --set="trivy.severity=CRITICAL\,HIGH" \
  --set="operator.scanJobTTL=1h" \
  --set="operator.vulnerabilityReportsPlugin=Trivy" \
  --set="operator.configAuditScannerEnabled=true" \
  --set="operator.rbacAssessmentScannerEnabled=true" \
  --set="operator.infraAssessmentScannerEnabled=true" \
  --set="operator.clusterComplianceEnabled=true" \
  --set="compliance.cron=0 */6 * * *"

Trivy Operator report types

| Report Kind | What it checks | Scope |
|---|---|---|
| VulnerabilityReport | Container image CVEs (OS + language packages) | Per Pod/container |
| ConfigAuditReport | Kubernetes manifest misconfigurations (30+ checks) | Per workload resource |
| RbacAssessmentReport | RBAC over-privilege, risky permissions | Per ClusterRole/Role |
| InfraAssessmentReport | Node and control plane component security | Per node |
| ExposedSecretReport | Secrets accidentally baked into images or env vars | Per Pod/container |
| ClusterComplianceReport | CIS Kubernetes Benchmark or NSA/CISA guidance | Cluster-wide |
| SbomReport | Software Bill of Materials (CycloneDX or SPDX) | Per Pod/container |

Querying reports

# List CRITICAL/HIGH vulnerabilities in payments namespace
kubectl get vulnerabilityreports -n payments -o json | \
  jq -r '.items[] |
  .metadata.name as $name |
  .report.summary |
  "\($name): CRITICAL=\(.criticalCount) HIGH=\(.highCount)"' | \
  sort -t= -k2 -rn | head -20

# List all CRITICAL CVEs with fix available
kubectl get vulnerabilityreports -A -o json | \
  jq -r '.items[].report.vulnerabilities[] |
  select(.severity=="CRITICAL" and .fixedVersion != "") |
  "\(.vulnerabilityID) \(.resource) \(.installedVersion) → \(.fixedVersion)"' | \
  sort -u | head -30

# Config audit failures (HIGH severity) in payments
kubectl get configauditreports -n payments -o json | \
  jq -r '.items[] |
  .metadata.name as $name |
  .report.checks[] |
  select(.severity == "HIGH" and .success == false) |
  "\($name): [\(.checkID)] \(.title)"'

# Exposed secrets in any namespace
kubectl get exposedsecretreports -A -o json | \
  jq -r '.items[] |
  select(.report.summary.criticalCount > 0) |
  "\(.metadata.namespace)/\(.metadata.name): \(.report.summary.criticalCount) secrets exposed"'

Trivy Operator Prometheus metrics

# Total CRITICAL CVEs by namespace
sum by (namespace) (
  trivy_image_vulnerabilities{severity="Critical"}
)

# Workloads with exposed secrets
sum(trivy_image_exposedsecrets{severity="Critical"}) by (namespace)

# Config audit HIGH failures by resource kind (the metric counts failed checks)
sum by (resource_kind) (
  trivy_resource_configaudits{severity="High"}
)

Audit Log Analysis

Kubernetes audit logs record every API server request — who did what, when, on which resource, from which IP. Continuous audit log analysis detects RBAC abuse, credential theft, and reconnaissance activity.

Audit policy configuration

# /etc/kubernetes/audit-policy.yaml
apiVersion: audit.k8s.io/v1
kind: Policy
omitStages:
  - RequestReceived   # skip early stage (reduce volume)
rules:
  # Log secret/configmap access at Metadata level
  # (RequestResponse would copy secret values into the audit log)
  - level: Metadata
    resources:
      - group: ""
        resources: ["secrets", "configmaps"]
    verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]

  # Log all auth/RBAC changes
  - level: RequestResponse
    resources:
      - group: "rbac.authorization.k8s.io"
        resources:
          - clusterroles
          - clusterrolebindings
          - roles
          - rolebindings
    verbs: ["create", "update", "patch", "delete"]

  # Log pod exec and port-forward (potential compromise vector)
  - level: RequestResponse
    resources:
      - group: ""
        resources: ["pods/exec", "pods/portforward", "pods/attach"]

  # Log namespace creation/deletion
  - level: RequestResponse
    resources:
      - group: ""
        resources: ["namespaces"]
    verbs: ["create", "delete"]

  # Log node changes (potential node compromise)
  - level: RequestResponse
    resources:
      - group: ""
        resources: ["nodes"]
    verbs: ["patch", "update", "delete"]

  # Reduce noise: skip read-only requests to common non-sensitive resources
  # (rules are evaluated in order and the first match wins, so these must come before the catch-all)
  - level: None
    resources:
      - group: ""
        resources:
          - events
          - endpoints
          - services
    verbs: ["get", "list", "watch"]

  # Skip kube-proxy watches
  - level: None
    users: ["system:kube-proxy"]
    verbs: ["watch"]
    resources:
      - group: ""
        resources: ["endpoints", "services", "services/status"]

  # Default: log metadata for all other requests
  - level: Metadata
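
Audit policy rules are evaluated top to bottom, and the first rule that matches a request decides its level. The simplified matcher below makes that ordering effect visible; it only understands `users`, `verbs`, and `resources`, which is an intentional reduction of the real policy schema.

```python
def audit_level(rules, *, user, verb, group, resource):
    """Return the level of the first rule that matches the request."""
    for rule in rules:
        if "users" in rule and user not in rule["users"]:
            continue
        if "verbs" in rule and verb not in rule["verbs"]:
            continue
        if "resources" in rule and not any(
            entry.get("group", "") == group
            and (not entry.get("resources") or resource in entry["resources"])
            for entry in rule["resources"]
        ):
            continue
        return rule["level"]
    return "None"  # no rule matched: the event is not logged


catch_all = {"level": "Metadata"}
skip_events = {
    "level": "None",
    "resources": [{"group": "", "resources": ["events"]}],
    "verbs": ["get", "list", "watch"],
}
request = dict(user="alice", verb="list", group="", resource="events")

# A catch-all placed first shadows every rule after it.
print(audit_level([catch_all, skip_events], **request))
# Placing the specific rule first lets the noise filter take effect.
print(audit_level([skip_events, catch_all], **request))
```

The first call reports `Metadata` and the second `None`, which is why noise-reduction rules belong above any rule without selectors.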

Audit log analysis queries (Loki)

# Detect cluster-admin binding events (RBAC escalation)
{job="kubernetes-audit"}
  | json
  | objectRef_resource = "clusterrolebindings"
  | verb = "create"
  |= "cluster-admin"

# Detect pod exec events (interactive access to production pods)
{job="kubernetes-audit"}
  | json
  | objectRef_subresource = "exec"
  | objectRef_namespace !~ "kube-system|velero|monitoring"

# Detect secret listing (reconnaissance)
{job="kubernetes-audit"}
  | json
  | objectRef_resource = "secrets"
  | verb = "list"
  | user_username !~ "system:.*"

# Detect API server access from unexpected user agents (not kubectl or controller)
{job="kubernetes-audit"}
  | json
  | userAgent !~ "kubectl.*|kube-.*|argocd.*|velero.*|cert-manager.*"
  | user_username !~ "system:.*"
  | responseStatus_code != "401"

Audit log alert patterns

# Real-time alert: escalate/bind verb on RBAC resources
# In Falco k8s audit rules or Loki alert:

# Who accessed secrets, ranked by count (assumes audit events go to the API server's
# stdout via --audit-log-path=-; no time filter is applied)
kubectl logs -n kube-system kube-apiserver-master-0 2>/dev/null | \
  grep '"resource":"secrets"' | \
  grep -v '"verb":"watch"' | \
  jq -r '"\(.user.username) \(.verb) \(.objectRef.namespace)/\(.objectRef.name) from \(.sourceIPs[0])"' | \
  sort | uniq -c | sort -rn | head -20

# EKS: audit logs in CloudWatch Logs
aws logs filter-log-events \
  --log-group-name "/aws/eks/my-cluster/cluster" \
  --log-stream-names "kube-apiserver-audit*" \
  --filter-pattern '{ $.verb = "create" && $.objectRef.resource = "clusterrolebindings" }' \
  --start-time $(date -d '1 hour ago' +%s000) \
  --query 'events[].message' \
  --output text | jq -r '"\(.user.username) bound \(.requestObject.subjects[].name) to \(.requestObject.roleRef.name)"'
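
For real-time alerting on RBAC changes, a Loki ruler rule along the following lines may serve as a starting point. This is a minimal sketch, not a definitive rule: the group name, alert name, and runbook URL are hypothetical, and it assumes the audit stream carries the job="kubernetes-audit" label used in the queries above. It alerts on write verbs against role bindings rather than on bind/escalate, because audit events record the request verb (create, update, patch).

```yaml
# Loki ruler rule file (Prometheus-compatible format).
# Assumption: audit events are shipped to Loki with job="kubernetes-audit".
groups:
  - name: kubernetes-audit-rbac            # hypothetical group name
    rules:
      - alert: RBACBindingChanged          # hypothetical alert name
        expr: |
          sum by (username) (
            count_over_time(
              {job="kubernetes-audit"}
                | json verb="verb", resource="objectRef.resource", username="user.username"
                | resource =~ "clusterrolebindings|rolebindings"
                | verb =~ "create|update|patch"
                | username !~ "system:.*"
              [5m]
            )
          ) > 0
        for: 0m
        labels:
          severity: critical
        annotations:
          summary: "RBAC binding created or modified by {{ $labels.username }}"
          runbook_url: https://runbooks.example.com/security/rbac-binding-changed   # hypothetical
```

Extracting only the three fields the rule needs, instead of a bare `| json`, keeps high-cardinality fields such as auditID out of the label set.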

etcd Encryption at Rest

# /etc/kubernetes/encryption-config.yaml
apiVersion: apiserver.config.k8s.io/v1
kind: EncryptionConfiguration
resources:
  - resources:
      - secrets
      - configmaps
    providers:
      # Primary: AES-CBC with PKCS#7 padding and a 32-byte key
      - aescbc:
          keys:
            - name: key1
              secret:    # openssl rand -base64 32
      # Fallback: identity (plaintext — for reading unencrypted data during migration)
      - identity: {}

  # Also encrypt CRDs that may contain sensitive data
  - resources:
      - externalsecrets.external-secrets.io
      - sealedsecrets.bitnami.com
    providers:
      - aescbc:
          keys:
            - name: key1
              secret: 
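
# Add to kube-apiserver flags:
# --encryption-provider-config=/etc/kubernetes/encryption-config.yaml

# Rotate the encryption key: add the new key as the first entry under keys, keep the old key second,
# and restart every kube-apiserver so it loads the new configuration.
# Then rewrite all Secrets and ConfigMaps so they are stored with the new key:
kubectl get secrets -A -o json | kubectl replace -f -
kubectl get configmaps -A -o json | kubectl replace -f -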

# Verify a secret is encrypted (should show encrypted bytes, not plaintext)
ETCDCTL_API=3 etcdctl get /registry/secrets/default/my-secret \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/healthcheck-client.crt \
  --key=/etc/kubernetes/pki/etcd/healthcheck-client.key | \
  hexdump -C | head -5
# Should start with: "k8s:enc:aescbc:v1:key1:" prefix (not plaintext JSON)
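
The rotation step described in the comments above can be sketched as the following configuration. It is illustrative only: the key name key2 and both placeholder values are assumptions, and the real keys must be generated separately (for example with openssl rand -base64 32).

```yaml
# /etc/kubernetes/encryption-config.yaml during a key rotation.
# The first key in the list encrypts new writes; every listed key can decrypt.
apiVersion: apiserver.config.k8s.io/v1
kind: EncryptionConfiguration
resources:
  - resources:
      - secrets
      - configmaps
    providers:
      - aescbc:
          keys:
            - name: key2                          # hypothetical name for the new key
              secret: <BASE64_NEW_32_BYTE_KEY>    # placeholder, not a real key
            - name: key1                          # old key, kept so existing data stays readable
              secret: <BASE64_OLD_32_BYTE_KEY>    # placeholder, not a real key
      # Fallback for data written before encryption was enabled
      - identity: {}
```

With more than one API server, it is safer to first add the new key in second position and restart all servers, and only then move it to first position, so that no server receives data it cannot decrypt. Remove key1 only after every object has been rewritten with key2.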

Secrets Hardening

Kubernetes Secrets are base64-encoded (not encrypted) by default. Full secrets hardening requires layered controls:

| Control | What it prevents | Implementation |
| --- | --- | --- |
| etcd encryption at rest | etcd disk dump reveals plaintext secrets | EncryptionConfiguration above |
| ESO + Vault / ASM | Secrets stored in etcd entirely | Section 08-09 |
| No list/watch on Secrets | RBAC enumeration of secret names | RBAC least-privilege above |
| No env vars for secrets | printenv / /proc/*/environ exposure | Use volume mounts, not envFrom |
| Falco: SA token read detection | Unexpected process reads token | Falco rule above |
| Audit log on secrets get | Detect bulk secret reads | Audit policy above |
| Trivy: ExposedSecretReport | Secrets baked into images | Trivy Operator above |

# Prefer volume-mounted secrets over environment variables
spec:
  volumes:
    - name: db-secret
      secret:
        secretName: db-credentials
        defaultMode: 0400   # read-only, owner only
  containers:
    - name: payment-service
      volumeMounts:
        - name: db-secret
          mountPath: /run/secrets/db
          readOnly: true
      # App reads from file: /run/secrets/db/password
      # NOT from env var: DB_PASSWORD (visible in /proc/*/environ)

Security Hardening Alerts

apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: security-hardening-alerts
  namespace: monitoring
spec:
  groups:
    - name: security.hardening
      rules:

        # Falco critical alert rate spike
        - alert: FalcoCriticalAlertsHigh
          expr: |
            sum(rate(falco_events{priority="Critical"}[5m])) > 0.1
          for: 2m
          labels:
            severity: critical
          annotations:
            summary: "Falco critical security events firing"
            description: "{{ $value | humanize }} critical events/sec — potential active attack"
            runbook_url: https://runbooks.example.com/security/falco-critical

        # Trivy CRITICAL CVE in running pod
        - alert: CriticalCVEInRunningPod
          expr: |
            sum by (namespace, resource_name) (
              trivy_image_vulnerabilities{severity="Critical"}
            ) > 0
          for: 1h
          labels:
            severity: warning
          annotations:
            summary: "CRITICAL CVE in running pod {{ $labels.namespace }}/{{ $labels.resource_name }}"
            runbook_url: https://runbooks.example.com/security/critical-cve

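        # kube-bench FAIL count (run weekly)
        # kube-bench ships no Prometheus exporter; the metric name below depends on the exporter you use.
        # count() assumes one series per failed check.
        - alert: KubeBenchFailuresHigh
          expr: |
            count(kube_bench_test_status{status="FAIL"}) > 20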
          for: 1h
          labels:
            severity: warning
          annotations:
            summary: "kube-bench reports {{ $value }} CIS benchmark failures"
            runbook_url: https://runbooks.example.com/security/kube-bench

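        # Privileged pod running
        # kube-state-metrics exposes no "privileged" signal, so joining kube_pod_container_info
        # with kube_pod_container_status_running would match every running container.
        # This version assumes Kyverno metrics are scraped and that the policy is named
        # disallow-privileged-containers; adjust the name to match your policy.
        - alert: PrivilegedPodRunning
          expr: |
            sum by (resource_namespace) (
              increase(kyverno_policy_results_total{
                policy_name="disallow-privileged-containers",
                rule_result="fail",
                resource_namespace!~"kube-system|falco|cilium"
              }[10m])
            ) > 0
          for: 5m
          labels:
            severity: critical
          annotations:
            summary: "Privileged container detected in namespace {{ $labels.resource_namespace }}"
            runbook_url: https://runbooks.example.com/security/privileged-container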

        # Exposed secret in image
        - alert: ExposedSecretInImage
          expr: |
            sum by (namespace, resource_name) (
              trivy_image_exposedsecrets{severity="Critical"}
            ) > 0
          for: 1m
          labels:
            severity: critical
          annotations:
            summary: "Secret exposed in image for {{ $labels.namespace }}/{{ $labels.resource_name }}"
            runbook_url: https://runbooks.example.com/security/exposed-secret

        # No NetworkPolicy in namespace
        - alert: NamespaceWithoutNetworkPolicy
          expr: |
            count by (namespace) (kube_namespace_labels{namespace!~"kube-.*|velero|monitoring"})
            unless
            count by (namespace) (kube_networkpolicy_created)
            > 0
          for: 30m
          labels:
            severity: warning
          annotations:
            summary: "Namespace {{ $labels.namespace }} has no NetworkPolicy"
            runbook_url: https://runbooks.example.com/security/missing-networkpolicy

Best Practices

Run kube-bench weekly

Automate kube-bench as a weekly CronJob and track the FAIL count over time. Any increase means that configuration has drifted or that a new cluster component has introduced a gap.

IMDSv2 + hop-limit 1

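Requiring IMDSv2 with a PUT response hop limit of 1 stops pods on the pod network from reaching the instance metadata service and stealing the node IAM role. Pods running with hostNetwork: true can still reach it, so restrict host networking as well. Apply the setting to ALL nodes via launch template or EC2NodeClass. It is one of the highest-value AWS-specific hardening steps.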
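
For clusters that provision nodes with Karpenter, the setting might look like the sketch below. It assumes the karpenter.k8s.aws/v1 API; the resource name, IAM role, AMI alias, and discovery tag values are hypothetical placeholders. Only the metadataOptions block is the point of the example.

```yaml
apiVersion: karpenter.k8s.aws/v1
kind: EC2NodeClass
metadata:
  name: hardened                            # hypothetical name
spec:
  role: KarpenterNodeRole-my-cluster        # hypothetical IAM role name
  amiSelectorTerms:
    - alias: al2023@latest                  # assumption: Amazon Linux 2023 nodes
  subnetSelectorTerms:
    - tags:
        karpenter.sh/discovery: my-cluster  # hypothetical discovery tag
  securityGroupSelectorTerms:
    - tags:
        karpenter.sh/discovery: my-cluster  # hypothetical discovery tag
  metadataOptions:
    httpEndpoint: enabled
    httpProtocolIPv6: disabled
    httpTokens: required                    # IMDSv2 only; IMDSv1 requests are rejected
    httpPutResponseHopLimit: 1              # token response cannot cross the extra pod network hop
```

Existing nodes keep their old metadata options until they are replaced, so roll the node pool after changing the node class.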

Falco in production

Falco with the modern eBPF driver typically adds little CPU overhead (under 2% is the figure usually quoted, but measure it on your own workloads) and needs no kernel module. Run it on every node. Start with the default rules; suppress false positives with macros before writing custom rules.

Audit RBAC quarterly

Run the who-can and access-matrix queries quarterly. Privilege creep is the norm — developers request broad permissions for debugging and never revoke them.

Encrypt etcd + use Vault

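etcd encryption at rest is table stakes. Then make Vault the source of truth via ESO. Note that ESO still syncs values into native Secret objects, so they remain in etcd, encrypted at rest. To keep values out of etcd altogether, deliver them to pods directly, for example with the Vault Agent Injector or the Secrets Store CSI Driver. The goal is zero plaintext secrets in etcd.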

Require Cosign signatures

Kyverno verifyImages with required: true in production namespaces means unsigned images are rejected at admission — even if an attacker pushes them to the registry directly.