Pod Security

Section 06 › 02 Last updated: 2025 ~40 min read

PSP removal in 1.25 background
PSA: enforce / audit / warn modes
PSA namespace labels
PSA exemptions (namespaces, runtimeClasses, usernames)
PSS: privileged / baseline / restricted levels
PSS field-by-field complete reference table
securityContext: pod-level vs container-level fields
runAsUser / runAsGroup / fsGroup semantics
runAsNonRoot enforcement
allowPrivilegeEscalation
readOnlyRootFilesystem
capabilities: full Linux caps list, drop ALL + add pattern
Dangerous capabilities: CAP_SYS_ADMIN, CAP_NET_ADMIN, CAP_SYS_PTRACE
seccomp: RuntimeDefault, Localhost, Unconfined
Custom seccomp profile JSON example

AppArmor: profile modes (enforce/complain), appArmorProfile field (GA 1.30) and legacy annotation
SELinux: seLinuxOptions fields
Privileged containers: kernel module load, device access, container escape
hostPID / hostNetwork / hostIPC risks
hostPath mount risks and mitigation
PSP → PSA migration checklist
Kyverno policies for pod security
OPA/Gatekeeper ConstraintTemplate for pod security
5 metrics, 4 alerts, 5 runbooks
8 best practices

PSP Removal Background

PodSecurityPolicy (PSP) was the original Kubernetes mechanism for enforcing pod-level security constraints. It was deprecated in Kubernetes 1.21 and removed in 1.25. Clusters upgrading to 1.25+ must migrate to an alternative before the upgrade.

PSP had several fundamental design flaws:

Authorization complexity: PSPs were enforced via RBAC. A pod was admitted only if its ServiceAccount (or the user creating the pod) had been granted the use verb on a PSP that allowed it. This created confusing situations where PSPs existed but were not applied to the workloads people expected.
Permissive-policy creep: if no authorized PSP allowed a pod, the pod was rejected; if any authorized PSP allowed it, the pod was admitted (non-mutating policies were preferred, then the first matching policy by name). A single broadly bound permissive PSP therefore unlocked privileged pods for everyone who could use it, which led to overly permissive PSPs being deployed just to unblock workloads.
Namespace-level control was poor: PSPs were cluster-scoped objects with namespace-level applicability controlled by RBAC bindings — a confusing indirection.

Replacement options:

Pod Security Admission (PSA) — built into Kubernetes since 1.22 (GA 1.25). Label-based, simple, but not fully customizable.
OPA/Gatekeeper — fully custom Rego policies via CRDs; maximum flexibility.
Kyverno — Kubernetes-native YAML-based policies; lower learning curve than Rego.
Kubewarden — WebAssembly-based policies.

Pod Security Admission (PSA)

PSA is a built-in admission controller (GA in 1.25) that enforces Pod Security Standards by evaluating pod specs against one of three levels. It is configured per-namespace using labels.

Modes

Mode	Label	Effect on Violation
`enforce`	`pod-security.kubernetes.io/enforce`	Pod is rejected; cannot be created
`audit`	`pod-security.kubernetes.io/audit`	Pod is allowed; violation recorded in audit log
`warn`	`pod-security.kubernetes.io/warn`	Pod is allowed; warning message returned to client

Each mode is independent and can reference a different level. A namespace can have all three modes active simultaneously with different levels:

```
# Apply PSA labels to a namespace
kubectl label namespace production \
  pod-security.kubernetes.io/enforce=restricted \
  pod-security.kubernetes.io/enforce-version=v1.28 \
  pod-security.kubernetes.io/audit=restricted \
  pod-security.kubernetes.io/audit-version=v1.28 \
  pod-security.kubernetes.io/warn=restricted \
  pod-security.kubernetes.io/warn-version=v1.28
```

Pin the PSS version with -version labels. Each mode label has a corresponding -version label (e.g., pod-security.kubernetes.io/enforce-version=v1.28). Without it, the version defaults to latest, which means a Kubernetes upgrade could change what's enforced and suddenly break workloads. Always pin to a specific version and bump it deliberately during upgrades.
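The label set above can also be generated and linted in code, for example in a CI check on namespace manifests. The Python sketch below is illustrative: the label keys are the real PSA keys, but the helper names psa_labels and unpinned_modes are hypothetical. It builds the six labels for one pinned version and flags any mode whose version label is missing or set to "latest".

```python
PSA_MODES = ("enforce", "audit", "warn")
PSS_LEVELS = ("privileged", "baseline", "restricted")
PREFIX = "pod-security.kubernetes.io/"


def psa_labels(level_by_mode: dict[str, str], version: str) -> dict[str, str]:
    """Build namespace labels that set a level and a pinned version for each mode."""
    labels = {}
    for mode, level in level_by_mode.items():
        if mode not in PSA_MODES:
            raise ValueError(f"unknown PSA mode: {mode}")
        if level not in PSS_LEVELS:
            raise ValueError(f"unknown PSS level: {level}")
        labels[PREFIX + mode] = level
        labels[PREFIX + mode + "-version"] = version
    return labels


def unpinned_modes(labels: dict[str, str]) -> list[str]:
    """Return modes that set a level but leave the version unset or at 'latest'."""
    return [
        mode
        for mode in PSA_MODES
        if PREFIX + mode in labels
        and labels.get(PREFIX + mode + "-version", "latest") == "latest"
    ]


labels = psa_labels({"enforce": "restricted", "audit": "restricted", "warn": "restricted"}, "v1.28")
print(unpinned_modes(labels))                             # → []
print(unpinned_modes({PREFIX + "enforce": "baseline"}))   # → ['enforce']
```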

PSA Exemptions

Some system workloads legitimately need privileged access. PSA supports three exemption types, configured in the PodSecurity admission plugin config (via kube-apiserver --admission-control-config-file):

```
apiVersion: apiserver.config.k8s.io/v1
kind: AdmissionConfiguration
plugins:
- name: PodSecurity
  configuration:
    apiVersion: pod-security.admission.config.k8s.io/v1
    kind: PodSecurityConfiguration
    defaults:
      enforce: "restricted"          # cluster-wide default for namespaces without labels
      enforce-version: "v1.28"       # pin versions; "latest" changes behavior on upgrade
      audit: "restricted"
      audit-version: "v1.28"
      warn: "restricted"
      warn-version: "v1.28"
    exemptions:
      usernames:                     # authenticated users exempt from PSA (e.g., a cluster bootstrapper)
      - "bootstrap-admin"            # hypothetical user name; do NOT exempt controller SAs such as
                                     # replicaset-controller, which would exempt every ReplicaSet-managed pod
      runtimeClasses:                # RuntimeClasses exempt from PSA (e.g., kata-containers)
      - "kata-containers"
      namespaces:                    # Namespaces fully exempt from PSA
      - kube-system
      - kube-public
      - cert-manager                 # example only; many add-ons can run under a labeled PSS level instead
```

Use namespace exemptions sparingly. Exempting a namespace disables PSA for every pod in it and also skips audit and warn, removing all visibility. Prefer labeling system namespaces with enforce=privileged while keeping audit and warn at a stricter level such as baseline, so violations are still reported.
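A minimal Python sketch of how the three exemption dimensions combine, assuming the semantics described above: a match on any one dimension skips enforce, audit, and warn for the request. The Exemptions class and is_exempt function are hypothetical. The example also shows why exempting a controller ServiceAccount is dangerous: pods created from Deployments are submitted by the ReplicaSet controller, so the exemption covers every such pod in every namespace.

```python
from dataclasses import dataclass, field


@dataclass
class Exemptions:
    usernames: set[str] = field(default_factory=set)
    runtime_classes: set[str] = field(default_factory=set)
    namespaces: set[str] = field(default_factory=set)


def is_exempt(ex: Exemptions, username: str, namespace: str, pod_spec: dict) -> bool:
    """Any single matching dimension exempts the request from enforce, audit, and warn."""
    return (
        username in ex.usernames
        or namespace in ex.namespaces
        or pod_spec.get("runtimeClassName") in ex.runtime_classes
    )


ex = Exemptions(
    usernames={"system:serviceaccount:kube-system:replicaset-controller"},
    runtime_classes={"kata-containers"},
    namespaces={"kube-system"},
)
# The ReplicaSet controller is the requesting user for Deployment pods,
# so this exemption silently covers every tenant's Deployment.
print(is_exempt(ex, "system:serviceaccount:kube-system:replicaset-controller", "team-a", {}))  # → True
print(is_exempt(ex, "alice", "team-a", {}))  # → False
```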

Graduation Timeline

Version	Status	Notes
1.22	Alpha	PSA introduced; PSP still active
1.23	Beta (on by default)	Enabled by default; PSP still active
1.25	GA; PSP removed	Clusters must migrate before upgrading to 1.25

Pod Security Standards

Pod Security Standards (PSS) define three levels, each a superset of restrictions from the previous level. The levels apply to pod specs as a whole: if any container (including init and ephemeral containers) violates the level, the entire pod is in violation and, in enforce mode, rejected.

Privileged

No restrictions
All host namespace access allowed
Privileged containers allowed
Any capabilities allowed
Any seccomp profile (or none)
Use: kube-system, CNI plugins, node agents, storage CSI drivers

Baseline

No privileged containers
No host namespaces (hostPID, hostIPC, hostNetwork)
No dangerous capabilities
No hostPath volumes
Host port restriction
AppArmor: only runtime/default or localhost profiles
seccomp: must not be explicitly set to Unconfined
Restricted SELinux options, /proc mount type, and sysctls (safe set only)
Use: General workloads not needing host access

Restricted

All baseline restrictions plus:
Must drop ALL capabilities (only NET_BIND_SERVICE may be added back)
seccompProfile: RuntimeDefault or Localhost required
runAsNonRoot: true
allowPrivilegeEscalation: false
Only specific volume types (configMap, CSI, downwardAPI, emptyDir, ephemeral, persistentVolumeClaim, projected, secret)
Use: Untrusted workloads, public-facing services, high-security requirements

PSS Field-by-Field Reference

These tables map the main PSS checks to their pod spec fields, what each level allows, and any nuances. SELinux, /proc mount type, and Windows-specific checks are not covered here.

Host Namespaces

Field	Privileged	Baseline	Restricted	Notes
`spec.hostNetwork`	Any	Must be false or unset	Must be false or unset	Shares host network stack; bypasses CNI
`spec.hostPID`	Any	Must be false or unset	Must be false or unset	Can see and signal all host processes
`spec.hostIPC`	Any	Must be false or unset	Must be false or unset	Access host IPC namespace (shared memory, semaphores)

Privileged Containers

Field	Privileged	Baseline	Restricted
`containers[].securityContext.privileged`	Any	Must be false or unset	Must be false or unset
`initContainers[].securityContext.privileged`	Any	Must be false or unset	Must be false or unset
`ephemeralContainers[].securityContext.privileged`	Any	Must be false or unset	Must be false or unset
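The host-namespace and privileged-container checks are simple boolean tests. A Python sketch over a pod spec represented as a plain dict (camelCase keys as in the Kubernetes JSON); the function name and messages are hypothetical, and unset and false are treated the same way, as in the tables above.

```python
CONTAINER_LISTS = ("containers", "initContainers", "ephemeralContainers")


def baseline_host_violations(spec: dict) -> list[str]:
    """Return baseline violations for host namespaces and privileged containers."""
    violations = []
    for field_name in ("hostNetwork", "hostPID", "hostIPC"):
        if spec.get(field_name):  # unset and false both pass
            violations.append(f"spec.{field_name}=true")
    for list_name in CONTAINER_LISTS:
        for c in spec.get(list_name, []):
            if (c.get("securityContext") or {}).get("privileged"):
                violations.append(f"{list_name}[{c['name']}].securityContext.privileged=true")
    return violations


pod = {
    "hostPID": True,
    "containers": [{"name": "app"}],
    "initContainers": [{"name": "setup", "securityContext": {"privileged": True}}],
}
print(baseline_host_violations(pod))
# → ['spec.hostPID=true', 'initContainers[setup].securityContext.privileged=true']
```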

Capabilities

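Field	Baseline	Restricted	Notes
`securityContext.capabilities.add`	Only the default set: AUDIT_WRITE, CHOWN, DAC_OVERRIDE, FOWNER, FSETID, KILL, MKNOD, NET_BIND_SERVICE, SETFCAP, SETGID, SETPCAP, SETUID, SYS_CHROOT (or empty)	Only NET_BIND_SERVICE (or empty)	Anything outside these lists (e.g., SYS_ADMIN, NET_ADMIN, NET_RAW) violates the level
`securityContext.capabilities.drop`	No requirement	Must include ALL	Restricted requires dropping ALL; NET_BIND_SERVICE is the only capability that may be added back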
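A Python sketch of the capability checks, with the allowed sets taken from the PSS definitions (baseline's default list; NET_BIND_SERVICE only for restricted). The function name and messages are hypothetical, and names are compared exactly as written in the spec.

```python
# Capabilities baseline permits in `add` (the PSS baseline default list).
BASELINE_ALLOWED_ADD = {
    "AUDIT_WRITE", "CHOWN", "DAC_OVERRIDE", "FOWNER", "FSETID", "KILL", "MKNOD",
    "NET_BIND_SERVICE", "SETFCAP", "SETGID", "SETPCAP", "SETUID", "SYS_CHROOT",
}
RESTRICTED_ALLOWED_ADD = {"NET_BIND_SERVICE"}


def capability_violations(container: dict, level: str) -> list[str]:
    if level == "privileged":
        return []  # no capability restrictions
    caps = (container.get("securityContext") or {}).get("capabilities") or {}
    add, drop = caps.get("add") or [], caps.get("drop") or []
    allowed = RESTRICTED_ALLOWED_ADD if level == "restricted" else BASELINE_ALLOWED_ADD
    violations = [f"add {c} not allowed at {level}" for c in add if c not in allowed]
    if level == "restricted" and "ALL" not in drop:
        violations.append("restricted requires drop: [ALL]")
    return violations


web = {"name": "web", "securityContext": {"capabilities": {"drop": ["ALL"], "add": ["NET_BIND_SERVICE"]}}}
print(capability_violations(web, "restricted"))  # → []
```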

Privilege Escalation

Field	Baseline	Restricted
`securityContext.allowPrivilegeEscalation`	No restriction	Must be false

Run As Non-Root

Field	Baseline	Restricted
`spec.securityContext.runAsNonRoot`	No restriction	Must be true (pod or all containers)
`containers[].securityContext.runAsNonRoot`	No restriction	Must be true OR pod-level must be true
`spec.securityContext.runAsUser`	No restriction	Must not be 0 (if set)
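The pod-level fallback is the subtle part of the restricted non-root check: a container passes if its own runAsNonRoot is true, or if it leaves the field unset and the pod sets it to true; an explicit false on a container fails even when the pod says true. A Python sketch of that rule plus the runAsUser check (hypothetical function name, dict-shaped pod spec):

```python
def run_as_violations(spec: dict) -> list[str]:
    """Restricted runAsNonRoot / runAsUser checks with pod-level fallback."""
    violations = []
    pod_sc = spec.get("securityContext") or {}
    if pod_sc.get("runAsUser") == 0:
        violations.append("spec.securityContext.runAsUser=0")
    for list_name in ("containers", "initContainers", "ephemeralContainers"):
        for c in spec.get(list_name, []):
            sc = c.get("securityContext") or {}
            # A container-level value, when set, overrides the pod-level value.
            non_root = sc.get("runAsNonRoot", pod_sc.get("runAsNonRoot"))
            if non_root is not True:
                violations.append(f"{c['name']}: runAsNonRoot must be true")
            if sc.get("runAsUser") == 0:
                violations.append(f"{c['name']}: runAsUser=0")
    return violations


pod = {
    "securityContext": {"runAsNonRoot": True},
    "containers": [{"name": "app"}, {"name": "sidecar", "securityContext": {"runAsNonRoot": False}}],
}
print(run_as_violations(pod))  # → ['sidecar: runAsNonRoot must be true']
```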

seccomp

Field	Baseline	Restricted
`spec.securityContext.seccompProfile.type`	Must not be Unconfined (unset, RuntimeDefault, or Localhost allowed)	Must be RuntimeDefault or Localhost (may be unset only if every container sets it)
`containers[].securityContext.seccompProfile.type`	Must not be Unconfined	Must be RuntimeDefault or Localhost (OR pod-level set)
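A Python sketch of both seccomp checks under the same precedence (container value if set, else pod value). Baseline only rejects an explicit Unconfined; restricted requires every container's effective type to be RuntimeDefault or Localhost. Function names and messages are hypothetical.

```python
ALLOWED_RESTRICTED = {"RuntimeDefault", "Localhost"}


def _seccomp_type(security_context):
    return ((security_context or {}).get("seccompProfile") or {}).get("type")


def seccomp_violations(spec: dict, level: str) -> list[str]:
    pod_type = _seccomp_type(spec.get("securityContext"))
    violations = []
    if level in ("baseline", "restricted") and pod_type == "Unconfined":
        violations.append("pod: seccompProfile Unconfined")
    for list_name in ("containers", "initContainers", "ephemeralContainers"):
        for c in spec.get(list_name, []):
            own = _seccomp_type(c.get("securityContext"))
            if level in ("baseline", "restricted") and own == "Unconfined":
                violations.append(f"{c['name']}: seccompProfile Unconfined")
            elif level == "restricted" and (own or pod_type) not in ALLOWED_RESTRICTED:
                violations.append(f"{c['name']}: needs RuntimeDefault or Localhost")
    return violations


unset = {"containers": [{"name": "app"}]}
print(seccomp_violations(unset, "baseline"))    # → []
print(seccomp_violations(unset, "restricted"))  # → ['app: needs RuntimeDefault or Localhost']
```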

Volumes

Baseline Allowed Volume Types	Restricted (subset of baseline)
All volume types except hostPath (hostPath is forbidden entirely, regardless of path)	configMap, csi, downwardAPI, emptyDir, ephemeral, persistentVolumeClaim, projected, secret only: no hostPath, no NFS, no other in-tree volume types
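A Python sketch of the volume check, assuming each volume entry is a dict holding a name plus one source key, as in the pod spec JSON; the function name and messages are hypothetical.

```python
RESTRICTED_VOLUME_TYPES = {
    "configMap", "csi", "downwardAPI", "emptyDir", "ephemeral",
    "persistentVolumeClaim", "projected", "secret",
}


def volume_violations(spec: dict, level: str) -> list[str]:
    violations = []
    for vol in spec.get("volumes", []):
        # Each volume has a name plus exactly one source key (emptyDir, hostPath, ...).
        source = next((k for k in vol if k != "name"), None)
        if level in ("baseline", "restricted") and source == "hostPath":
            violations.append(f"{vol['name']}: hostPath")
        elif level == "restricted" and source not in RESTRICTED_VOLUME_TYPES:
            violations.append(f"{vol['name']}: {source} not allowed in restricted")
    return violations


spec = {"volumes": [
    {"name": "tmp", "emptyDir": {}},
    {"name": "data", "nfs": {"server": "nfs.example", "path": "/exports"}},
    {"name": "host", "hostPath": {"path": "/etc"}},
]}
print(volume_violations(spec, "baseline"))    # → ['host: hostPath']
print(volume_violations(spec, "restricted"))  # → ['data: nfs not allowed in restricted', 'host: hostPath']
```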

Host Ports

Field	Baseline / Restricted
`containers[].ports[].hostPort`	Must be 0 or unset. The standard permits an allowlist of known ports, but the built-in PSA controller does not implement one, so any hostPort > 0 violates baseline.
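A one-function Python sketch of the hostPort check as the built-in controller applies it (any non-zero hostPort fails); the function name is hypothetical.

```python
def host_port_violations(spec: dict) -> list[str]:
    """Baseline as enforced by PSA: hostPort must be unset or 0."""
    return [
        f"{c['name']}: hostPort {p['hostPort']}"
        for list_name in ("containers", "initContainers", "ephemeralContainers")
        for c in spec.get(list_name, [])
        for p in c.get("ports", [])
        if p.get("hostPort", 0) != 0
    ]


spec = {"containers": [{"name": "web", "ports": [{"containerPort": 8080}, {"containerPort": 80, "hostPort": 80}]}]}
print(host_port_violations(spec))  # → ['web: hostPort 80']
```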

AppArmor (baseline only check)

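Field or annotation	Baseline: Allowed values
`securityContext.appArmorProfile.type` (pod or container, 1.30+)	Unset, `RuntimeDefault`, or `Localhost`; `Unconfined` violates baseline
`container.apparmor.security.beta.kubernetes.io/<container>`	`runtime/default` or `localhost/<profile>`; `unconfined` violates baseline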

Sysctls

Field	Baseline / Restricted
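`spec.securityContext.sysctls[].name`	Only "safe" sysctls: `kernel.shm_rmid_forced`, `net.ipv4.ip_local_port_range`, `net.ipv4.ip_unprivileged_port_start`, `net.ipv4.tcp_syncookies`, `net.ipv4.ping_group_range` (list as of PSS v1.26; v1.27 adds `net.ipv4.ip_local_reserved_ports`, and v1.29 adds `net.ipv4.tcp_keepalive_time`, `net.ipv4.tcp_fin_timeout`, `net.ipv4.tcp_keepalive_intvl`, `net.ipv4.tcp_keepalive_probes`)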
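A Python sketch of the sysctl check against the five safe sysctls listed above; the set name and function are hypothetical, and a newer PSS version would pass a longer list through the safe parameter.

```python
SAFE_SYSCTLS = {
    "kernel.shm_rmid_forced",
    "net.ipv4.ip_local_port_range",
    "net.ipv4.ip_unprivileged_port_start",
    "net.ipv4.tcp_syncookies",
    "net.ipv4.ping_group_range",
}


def sysctl_violations(spec: dict, safe=SAFE_SYSCTLS) -> list[str]:
    """Return the names of pod-level sysctls outside the safe list."""
    sysctls = (spec.get("securityContext") or {}).get("sysctls") or []
    return [s["name"] for s in sysctls if s["name"] not in safe]


spec = {"securityContext": {"sysctls": [
    {"name": "net.ipv4.tcp_syncookies", "value": "1"},
    {"name": "kernel.msgmax", "value": "65536"},
]}}
print(sysctl_violations(spec))  # → ['kernel.msgmax']
```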

securityContext Deep Dive

Pod-Level vs Container-Level Fields

Some fields exist at both pod and container scope. Container-level overrides pod-level. Some fields only exist at one scope:

Field	Pod-Level	Container-Level	Notes
`runAsUser`	✅ Default for all containers	✅ Overrides pod-level	UID for the container process; overrides the image's USER
`runAsGroup`	✅ Primary GID for all containers	✅ Overrides pod-level
`runAsNonRoot`	✅	✅ Overrides pod-level	Kubelet verifies UID != 0 at container start; fails if image runs as root
`fsGroup`	✅ Only pod-level	❌	Supplemental GID applied to volume mounts; owns files in mounted volumes
`fsGroupChangePolicy`	✅ Only pod-level	❌	`OnRootMismatch` (beta 1.20, GA 1.23): only change ownership and permissions when the volume root doesn't already match; avoids slow chown on large volumes
`supplementalGroups`	✅ Only pod-level	❌	Additional GIDs for all containers
`seccompProfile`	✅ Default for all containers	✅ Overrides pod-level	Container-level overrides pod-level profile
`seLinuxOptions`	✅ Default	✅ Overrides pod-level
`sysctls`	✅ Only pod-level	❌	Only safe sysctls without special runtime config
`privileged`	❌	✅ Only container-level
`allowPrivilegeEscalation`	❌	✅ Only container-level
`capabilities`	❌	✅ Only container-level
`readOnlyRootFilesystem`	❌	✅ Only container-level

Complete Restricted-Compliant securityContext

```
apiVersion: v1
kind: Pod
metadata:
  name: restricted-demo                  # example name
spec:
  securityContext:                       # Pod-level
    runAsNonRoot: true
    runAsUser: 1000                      # must not be 0
    runAsGroup: 3000
    fsGroup: 2000                        # volume files owned by GID 2000
    fsGroupChangePolicy: OnRootMismatch  # avoid slow recursive chown
    seccompProfile:
      type: RuntimeDefault               # applies to all containers unless overridden
    sysctls: []                          # no unsafe sysctls
  containers:
  - name: app
    image: myapp:latest
    securityContext:                     # Container-level (overrides pod-level where applicable)
      allowPrivilegeEscalation: false    # MUST be false for restricted
      readOnlyRootFilesystem: true       # prevents writes to container rootfs
      capabilities:
        drop: ["ALL"]                    # MUST drop ALL for restricted
        # add: ["NET_BIND_SERVICE"]      # only if binding port < 1024
      seccompProfile:
        type: RuntimeDefault             # can override pod-level per-container
      runAsNonRoot: true                 # belt-and-suspenders with pod-level
    volumeMounts:
    - name: tmp
      mountPath: /tmp                    # writable temp dir for apps that need it
    - name: cache
      mountPath: /app/cache
  volumes:
  - name: tmp
    emptyDir: {}                         # allowed in restricted
  - name: cache
    emptyDir: {}
  # NOT allowed in restricted: hostPath, NFS and other in-tree volume types,
  # hostNetwork/hostPID/hostIPC. (Secrets exposed as env vars are allowed but discouraged.)
```

fsGroup and Volume Ownership

When fsGroup is set, the kubelet changes ownership of mounted volumes to the specified GID. For large volumes (hundreds of thousands of files), this recursive chown causes significant pod startup latency. Use fsGroupChangePolicy: OnRootMismatch (GA 1.23) to skip the recursive walk when the volume's root directory already has the expected group ownership and permissions.
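The OnRootMismatch decision can be sketched in Python as below. The bit set mirrors our understanding of what the kubelet checks on the volume root (group ownership, the setgid bit, and owner/group read-write), but treat it as an assumption; should_walk_volume is a hypothetical name, and read-only volumes are not modeled.

```python
import stat

# Owner and group read/write bits expected on the volume root (assumption, see lead-in).
REQUIRED_RW = 0o660


def should_walk_volume(root_gid: int, root_mode: int, fs_group: int, policy: str = "Always") -> bool:
    """Return True if the kubelet would recursively chown/chmod the volume."""
    if policy == "Always":          # default when fsGroupChangePolicy is unset
        return True
    if policy != "OnRootMismatch":
        raise ValueError(f"unknown fsGroupChangePolicy: {policy}")
    if root_gid != fs_group:
        return True                 # root owned by the wrong group
    if not root_mode & stat.S_ISGID:
        return True                 # new files would not inherit the GID
    return (root_mode & REQUIRED_RW) != REQUIRED_RW


print(should_walk_volume(2000, 0o2770, 2000, "OnRootMismatch"))  # → False (walk skipped)
print(should_walk_volume(2000, 0o2770, 2000))                    # → True (Always)
```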

Linux Capabilities

Linux capabilities split root privilege into discrete units. When a container runs without privileged: true, it starts with a default set of capabilities inherited from the container runtime. The recommended security posture is drop ALL, add only what's needed.

Default Container Capabilities (containerd/CRI-O)

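containerd gives containers Docker's default set: CHOWN, DAC_OVERRIDE, FSETID, FOWNER, MKNOD, NET_RAW, SETGID, SETUID, SETFCAP, SETPCAP, NET_BIND_SERVICE, SYS_CHROOT, KILL, AUDIT_WRITE. CRI-O ships a smaller default (CHOWN, DAC_OVERRIDE, FSETID, FOWNER, SETGID, SETUID, SETPCAP, NET_BIND_SERVICE, KILL), and both runtimes let administrators change the default set.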

NET_RAW in default capabilities is a known risk. NET_RAW allows raw socket creation (ping, ARP spoofing, packet crafting). It's in the default set for compatibility but should be dropped in almost all workloads. Drop it explicitly: capabilities.drop: ["NET_RAW"] or drop: ["ALL"].
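A simplified Python model of how drop and add combine into a container's capability set, assuming containerd's defaults and its behavior of applying add after drop, so drop: ["ALL"] clears every default and add puts individual capabilities back. effective_caps is a hypothetical helper; it ignores add: ["ALL"] and does not model how non-root processes gain capabilities.

```python
# containerd's default set (inherited from Docker); CRI-O's default is smaller.
CONTAINERD_DEFAULT_CAPS = frozenset({
    "CHOWN", "DAC_OVERRIDE", "FSETID", "FOWNER", "MKNOD", "NET_RAW", "SETGID",
    "SETUID", "SETFCAP", "SETPCAP", "NET_BIND_SERVICE", "SYS_CHROOT", "KILL", "AUDIT_WRITE",
})


def effective_caps(drop, add, defaults=CONTAINERD_DEFAULT_CAPS) -> set[str]:
    """Simplified model: (defaults - drop) | add, where drop 'ALL' clears every default."""
    base = set() if "ALL" in drop else set(defaults) - set(drop)
    return base | set(add)


print(sorted(effective_caps(drop=["ALL"], add=["NET_BIND_SERVICE"])))  # → ['NET_BIND_SERVICE']
print("NET_RAW" in effective_caps(drop=[], add=[]))                     # → True
```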

Dangerous Capabilities Reference

Capability	What It Allows	Attack Potential
`CAP_SYS_ADMIN`	Mount filesystems, manipulate namespaces, and a long list of other administrative operations (an overloaded catch-all)	Container escape: effectively root on host in many configurations
`CAP_SYS_PTRACE`	Trace and inspect other processes; inject code	Process hijack — can ptrace processes on host if hostPID=true
`CAP_NET_ADMIN`	Configure network interfaces, routes, firewall rules, packet mangling	Network attack — ARP poison, traffic interception, iptables modification
`CAP_SYS_MODULE`	Load/unload kernel modules	Container escape — insert rootkit kernel module
`CAP_SYS_RAWIO`	Direct access to I/O ports, /dev/mem, /dev/kmem	Container escape via physical memory access
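`CAP_DAC_READ_SEARCH`	Bypass file read and directory search permission checks; use open_by_handle_at	Data exfiltration (read /etc/shadow and other protected files); open_by_handle_at enabled the "Shocker" container escape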
`CAP_CHOWN`	Change file ownership	Privilege — change ownership of sensitive files
`CAP_SETUID`	Set UID; switch to any user including root	Root escalation
`CAP_NET_RAW`	Raw sockets, packet injection	Network attack
`CAP_MKNOD`	Create device files	Device access

Capability Drop ALL Pattern

```
# Minimum viable capabilities for most web applications
securityContext:
  capabilities:
    drop: ["ALL"]
    # Most apps need zero capabilities after dropping ALL
    # Common exceptions:
    # add: ["NET_BIND_SERVICE"]   # ONLY if binding a port below 1024 as non-root; depending on the
    #                             # runtime, the binary may also need file capabilities (setcap)
    # add: ["SYS_NICE"]           # ONLY if app sets process priority (rare); violates baseline and restricted
    # add: ["IPC_LOCK"]           # ONLY if app uses mlock() (e.g., Vault); violates baseline and restricted
```

seccomp Profiles

seccomp (secure computing mode) is a Linux kernel feature that restricts the syscalls a process can make. Kubernetes supports three seccomp profile types:

Type	Description	Security Level
`Unconfined`	No syscall filtering; all syscalls allowed	Minimum; what pods with no profile get unless the kubelet's seccompDefault is enabled
`RuntimeDefault`	Container runtime's default profile (Docker's profile, on which containerd's is based, blocks around 44 syscalls)	Good baseline; recommended for most workloads
`Localhost`	Custom JSON profile from `/var/lib/kubelet/seccomp/` on the node	Maximum — application-specific allowlist

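RuntimeDefault is not applied automatically out of the box. The SeccompDefault feature reached GA in Kubernetes 1.27, but it is opt-in per node: the kubelet applies RuntimeDefault to pods that don't specify a profile only when started with --seccomp-default (or seccompDefault: true in the kubelet config file). Without it, an unspecified profile still means Unconfined, so explicitly setting seccompProfile in the pod spec remains best practice for clarity and portability.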
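A Python sketch of how the profile a container actually runs with is resolved: the container-level value, then the pod-level value, then the kubelet's seccompDefault setting. effective_seccomp is a hypothetical helper and ignores special cases such as privileged containers.

```python
def effective_seccomp(pod_sc: dict, container_sc: dict, kubelet_seccomp_default: bool) -> str:
    """Return the seccomp profile type a container ends up with (simplified)."""
    for sc in (container_sc, pod_sc):  # container-level wins over pod-level
        profile_type = ((sc or {}).get("seccompProfile") or {}).get("type")
        if profile_type:
            return profile_type
    # Nothing specified in the pod: the node's kubelet setting decides.
    return "RuntimeDefault" if kubelet_seccomp_default else "Unconfined"


print(effective_seccomp({}, {}, kubelet_seccomp_default=False))  # → Unconfined
print(effective_seccomp({}, {}, kubelet_seccomp_default=True))   # → RuntimeDefault
```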

RuntimeDefault Profile

The RuntimeDefault profile is defined by the container runtime. For containerd, it's based on Docker's default seccomp profile. It blocks syscalls including:

kexec_load — load new kernel
keyctl, add_key, request_key — kernel key management
ptrace — process tracing (blocks in some profiles)
reboot, create_module, finit_module — system-level ops
mount, umount2 — filesystem mounting

Custom Localhost Profile

Illustrative allowlist profile for nginx, saved on each node as /var/lib/kubelet/seccomp/profiles/nginx.json. It denies every syscall not listed (SCMP_ACT_ERRNO); treat the list as a starting point to validate against your own build, not a tested profile.

```
{
  "defaultAction": "SCMP_ACT_ERRNO",
  "architectures": ["SCMP_ARCH_X86_64", "SCMP_ARCH_X86", "SCMP_ARCH_X32"],
  "syscalls": [
    {
      "names": [
        "accept4", "access", "arch_prctl", "bind", "brk", "capget", "capset",
        "chdir", "chmod", "chown", "clone", "clone3", "close", "connect", "dup", "dup2",
        "epoll_create1", "epoll_ctl", "epoll_pwait", "epoll_wait", "eventfd2", "execve",
        "exit", "exit_group", "faccessat", "fchmod", "fchown", "fcntl",
        "fstat", "fstatfs", "futex", "getcwd", "getdents64", "getegid",
        "geteuid", "getgid", "getpid", "getppid", "getrandom", "getsockname",
        "getsockopt", "gettid", "gettimeofday", "getuid", "ioctl", "listen",
        "lseek", "lstat", "madvise", "mkdir", "mmap", "mprotect", "munmap",
        "nanosleep", "newfstatat", "open", "openat", "pipe2", "poll", "prctl",
        "pread64", "prlimit64", "pwrite64", "read", "readlink", "recvfrom",
        "recvmsg", "rename", "rseq", "rt_sigaction", "rt_sigprocmask",
        "rt_sigreturn", "rt_sigsuspend", "sendfile", "sendmsg", "sendto",
        "set_robust_list", "set_tid_address", "setgid", "setgroups",
        "setitimer", "setsockopt", "setuid", "shutdown", "socket", "socketpair",
        "stat", "statfs", "sysinfo", "tgkill", "uname", "unlink", "wait4",
        "write", "writev"
      ],
      "action": "SCMP_ACT_ALLOW"
    }
  ]
}
```

```
# Use the custom localhost profile in a pod
securityContext:
  seccompProfile:
    type: Localhost
    localhostProfile: profiles/nginx.json   # relative to /var/lib/kubelet/seccomp/
```
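Writing the allowlist by hand is error-prone. A Python sketch that turns a list of observed syscall names (for example, collected by tracing the application) into a profile with the same JSON shape as the one above; build_allowlist_profile and the observed list are hypothetical, and a real list must come from exercising every code path of the application.

```python
import json


def build_allowlist_profile(syscalls, arches=("SCMP_ARCH_X86_64", "SCMP_ARCH_X86", "SCMP_ARCH_X32")) -> dict:
    """Deny by default (SCMP_ACT_ERRNO); allow only the given syscalls, deduplicated and sorted."""
    return {
        "defaultAction": "SCMP_ACT_ERRNO",
        "architectures": list(arches),
        "syscalls": [{"names": sorted(set(syscalls)), "action": "SCMP_ACT_ALLOW"}],
    }


# Hypothetical input: syscall names recorded while exercising the app.
observed = ["read", "write", "openat", "close", "execve", "read", "exit_group"]
profile = build_allowlist_profile(observed)
print(json.dumps(profile, indent=2))
```

The output can be written to the node's seccomp directory and referenced through localhostProfile as shown above.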

Custom seccomp profiles must be present on every node. Localhost profiles are read from the node's filesystem, so the profile file must be deployed to every node where pods using it may run; otherwise those containers fail to start. Use a DaemonSet to distribute profiles, or use the Security Profiles Operator (a kubernetes-sigs project, formerly seccomp-operator), which manages profile distribution via CRDs.

AppArmor

AppArmor is a Linux MAC (Mandatory Access Control) system that restricts program capabilities based on per-program profiles. It complements seccomp: seccomp restricts syscalls, AppArmor restricts file/capability/network access by pathname.

Profile Modes

Mode	Description
`enforce`	Policy violations are blocked and logged
`complain`	Policy violations are logged but not blocked — useful for profile development
`unconfined`	No AppArmor restrictions applied

Kubernetes Integration

AppArmor support moved to GA in Kubernetes 1.30 with a dedicated appArmorProfile field in the securityContext. Prior to 1.30, annotations were used:

```
# Kubernetes 1.30+ (GA): securityContext field
spec:
  securityContext:
    appArmorProfile:
      type: RuntimeDefault         # or Localhost
  containers:
  - name: app
    securityContext:
      appArmorProfile:
        type: Localhost
        localhostProfile: k8s-nginx  # profile name loaded on node

---
# Prior to 1.30: annotation-based (still supported for compatibility)
metadata:
  annotations:
    container.apparmor.security.beta.kubernetes.io/app: "runtime/default"
    # or: localhost/<profile-name>
    # or: unconfined (violates PSS baseline)
```

AppArmor requires support from the node OS. AppArmor is available on Debian/Ubuntu-based distributions and SUSE. It is not available on RHEL/CentOS/Fedora (which use SELinux instead). Check node OS before deploying AppArmor profiles — pods requesting a missing profile will fail to start.

SELinux

SELinux (Security-Enhanced Linux) is a MAC system used on RHEL/CentOS/Fedora. It uses a label-based policy model where every process, file, and socket has a security label. Access is governed by policy rules between labels.

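```
spec:
  securityContext:
    seLinuxOptions:
      level: "s0:c123,c456"    # MCS categories; pods with different categories can't access each other's files
  containers:
  - name: app
    securityContext:
      seLinuxOptions:
        type: "container_t"     # process type; PSS baseline allows only the container_* process types
# Setting seLinuxOptions.user or .role violates PSS baseline. File types such as
# svirt_sandbox_file_t label files, not processes, so they don't belong here.
```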

In most Kubernetes deployments on RHEL/CoreOS, the container runtime (CRI-O, containerd) automatically assigns SELinux labels to containers. Manual configuration is primarily needed when:

Accessing host volumes that need specific SELinux labels
Implementing MCS (Multi-Category Security) for strict container isolation
Running on OpenShift (which has a specific SELinux policy model)

Privileged Container Risks

A privileged container (securityContext.privileged: true) disables most container isolation. It receives all capabilities, sees every host device under /dev, gets a writable /sys, and typically runs without seccomp or AppArmor confinement. It can usually escape to the host, so treat it as an unrestricted process on the node.

Container escape via a privileged container is trivially easy. A privileged container can load kernel modules and mount the host's block devices (e.g., /dev/sda) to read and write the host filesystem; with hostNetwork it can rewrite the node's iptables rules, and with hostPID it can reach the host root filesystem through /proc/1/root. A typical escape from a pod that is both privileged and hostPID: true:
```
nsenter --target 1 --mount --uts --ipc --net --pid -- /bin/bash
```
This spawns a shell in the host's namespaces. Any workload that doesn't require host access has zero business being privileged.

What Actually Requires Privileged

Workload Type	Requires Privileged?	Alternative
CNI plugins (calico-node, cilium-agent)	Often yes — needs to configure host network interfaces	Use specific capabilities + hostNetwork instead of full privileged
CSI drivers (storage)	Sometimes — for mount operations	Use only on specific CSI plugin containers; not main application
GPU workloads	Device access via device plugin — not full privileged	GPU device plugin via `resources.limits`; no privileged needed
Node monitoring agents (Datadog, Falco)	Partially — needs access to host proc, host net, host pid	Specific capabilities + hostPID/hostNetwork; not full privileged
Application containers (99%)	Never	Proper securityContext with drop ALL

Host Namespaces

Field	Risk When Enabled	Legitimate Use
`hostNetwork: true`	Pod shares host network stack; can bind to all host ports; traffic not routed through CNI policies; bypasses NetworkPolicy	CNI plugins, host-level monitoring (node-exporter), some service mesh data planes
`hostPID: true`	Pod can see and signal all processes on the host; can read their command lines and, with CAP_SYS_PTRACE or a matching UID, their environment and memory via /proc/<host-pid>/	System-level debuggers, Falco (for syscall monitoring), some monitoring agents
`hostIPC: true`	Pod can access host shared memory segments and semaphores; can read/write IPC data from other processes	Extremely rare — specific HPC or legacy enterprise applications

hostPath Volume Risks

The hostPath volume type mounts a directory from the host node's filesystem directly into the container. This creates a direct path to host data:

hostPath Mount	Risk
`/`	Full read/write access to host root filesystem — equivalent to privileged
`/etc`	Modify /etc/sudoers, /etc/passwd, host certs
`/var/run/docker.sock` or `/run/containerd/containerd.sock`	Full control of container runtime — can launch privileged containers
`/proc`	Read/write host kernel parameters, process memory
`/var/lib/kubelet`	Access to all pod secrets and service account tokens on the node
`/sys`	Modify kernel settings and hardware interfaces

Mounting the container runtime socket grants full cluster access. Mounting /var/run/docker.sock or /run/containerd/containerd.sock into a container lets that container launch new privileged containers on the host, bypass all pod security controls, and read all secrets cached on the node. This is a complete cluster compromise vector. Block it via admission policy.

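```
# Kyverno policy: block mounting container runtime sockets
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: block-runtime-socket
spec:
  validationFailureAction: Enforce
  background: true
  rules:
  - name: no-docker-socket
    match:
      any:
      - resources:
          kinds: ["Pod"]
    validate:
      message: "Mounting the Docker socket is not allowed"
      pattern:
        spec:
          =(volumes):
          - =(hostPath):
              path: "!*/docker.sock"
  - name: no-containerd-socket
    match:
      any:
      - resources:
          kinds: ["Pod"]
    validate:
      message: "Mounting the containerd socket is not allowed"
      pattern:
        spec:
          =(volumes):
          - =(hostPath):
              path: "!*/containerd.sock"
  # These patterns don't catch mounts of a parent directory such as /run or
  # /var/run; pair them with a broader hostPath ban for tenant namespaces.
```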
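Matching on the socket path alone misses two common bypasses: mounting a parent directory (/run, /var/run, or /) and spelling the same path through the /var/run alias. A Python sketch of a stricter path test that an admission webhook or CI check could apply; the socket list and function names are hypothetical examples, and the code only normalizes the /var/run alias rather than resolving symlinks on the node.

```python
from pathlib import PurePosixPath

# Example runtime socket paths (Docker, containerd, CRI-O); extend for your nodes.
RUNTIME_SOCKETS = [
    "/var/run/docker.sock",
    "/run/containerd/containerd.sock",
    "/var/run/crio/crio.sock",
]


def _normalize(path: str) -> PurePosixPath:
    p = PurePosixPath(path.rstrip("/") or "/")
    # On most distros /var/run is a symlink to /run; treat both spellings as one.
    if p.parts[:3] == ("/", "var", "run"):
        p = PurePosixPath("/run", *p.parts[3:])
    return p


def exposes_runtime_socket(host_path: str) -> bool:
    """True if mounting host_path exposes a runtime socket (the socket itself or a parent dir)."""
    mounted = _normalize(host_path)
    for sock in RUNTIME_SOCKETS:
        s = _normalize(sock)
        if s == mounted or mounted in s.parents:
            return True
    return False


for p in ["/var/run/docker.sock", "/var/run", "/", "/var/lib/app-data"]:
    print(p, exposes_runtime_socket(p))
```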

Migrating from PSP to PSA

Migration Strategy

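Audit current PSP usage:

```
kubectl get psp -o json | jq '.items[] | {name:.metadata.name, privileged:.spec.privileged, hostNetwork:.spec.hostNetwork}'
```

Map PSPs to PSS levels: Identify which PSS level (privileged/baseline/restricted) each namespace's effective PSP corresponds to.
Enable PSA in warn mode first: Apply pod-security.kubernetes.io/warn=restricted and run your workloads. Collect warnings. This is non-breaking.
```
kubectl label namespace my-app pod-security.kubernetes.io/warn=restricted
```
Enable PSA in audit mode: Add pod-security.kubernetes.io/audit=restricted. Audit-mode violations are written to the API server audit log as the pod-security.kubernetes.io/audit-violations annotation (auditing must be enabled with a policy that records pod requests), not as Kubernetes events. A server-side dry run also previews enforcement against existing pods by returning warnings without changing the namespace.
```
# Audit log location depends on the API server's --audit-log-path setting
grep 'pod-security.kubernetes.io/audit-violations' /var/log/kubernetes/audit.log
kubectl label --dry-run=server --overwrite namespace my-app pod-security.kubernetes.io/enforce=restricted
```
Fix violating workloads: Update pod specs to meet the target PSS level (add securityContext, drop capabilities, set seccomp, etc.).
Enable enforce mode: Switch to pod-security.kubernetes.io/enforce=restricted when all violations are resolved.
Remove PSP resources: Delete PSP objects and RBAC bindings granting use verb on PSPs.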

PSP → PSS Field Mapping

PSP Field	PSS Equivalent
`privileged: false`	Baseline: privileged=false enforced
`hostPID/hostIPC/hostNetwork: false`	Baseline: all three forbidden
`allowedCapabilities: []`	Restricted: drop ALL; only NET_BIND_SERVICE may be added
`requiredDropCapabilities: [ALL]`	Restricted: drop ALL required
`volumes: [configMap, emptyDir, ...]`	Restricted: only allowed volume types
`runAsUser.rule: MustRunAsNonRoot`	Restricted: runAsNonRoot enforced
`allowPrivilegeEscalation: false`	Restricted: allowPrivilegeEscalation=false
`seccomp.security.alpha.kubernetes.io/allowedProfileNames: runtime/default` (annotation)	Restricted: seccompProfile RuntimeDefault or Localhost
`apparmor.security.beta.kubernetes.io/allowedProfileNames: runtime/default` (annotation)	Baseline: AppArmor unconfined violates baseline
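When many PSPs need triage, the mapping can be approximated in code. The Python sketch below takes a simplified PSP spec as a dict (field names as in the PSP API) and returns the strictest PSS level that pods admitted under it would be expected to meet. min_pss_level is hypothetical, and it ignores seccomp/AppArmor annotations, SELinux, and runAsUser ranges, so treat its output as a starting point for manual review.

```python
BASELINE_CAPS = {"AUDIT_WRITE", "CHOWN", "DAC_OVERRIDE", "FOWNER", "FSETID", "KILL", "MKNOD",
                 "NET_BIND_SERVICE", "SETFCAP", "SETGID", "SETPCAP", "SETUID", "SYS_CHROOT"}
RESTRICTED_VOLUMES = {"configMap", "csi", "downwardAPI", "emptyDir", "ephemeral",
                      "persistentVolumeClaim", "projected", "secret"}


def min_pss_level(psp: dict) -> str:
    """Strictest PSS level that pods admitted by this (simplified) PSP spec should satisfy."""
    volumes = set(psp.get("volumes", []))
    caps = set(psp.get("allowedCapabilities", []))
    if (psp.get("privileged") or psp.get("hostPID") or psp.get("hostIPC")
            or psp.get("hostNetwork") or psp.get("hostPorts")
            or {"hostPath", "*"} & volumes or not caps <= BASELINE_CAPS):
        return "privileged"
    if ("ALL" in psp.get("requiredDropCapabilities", [])
            and caps <= {"NET_BIND_SERVICE"}
            and psp.get("allowPrivilegeEscalation") is False
            and psp.get("runAsUser", {}).get("rule") == "MustRunAsNonRoot"
            and volumes <= RESTRICTED_VOLUMES):
        return "restricted"
    return "baseline"


restricted_psp = {
    "privileged": False,
    "allowPrivilegeEscalation": False,
    "requiredDropCapabilities": ["ALL"],
    "runAsUser": {"rule": "MustRunAsNonRoot"},
    "volumes": ["configMap", "emptyDir", "secret", "projected"],
}
print(min_pss_level(restricted_psp))  # → restricted
```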

Extending with Kyverno / OPA

PSA is intentionally simple — it enforces only the three built-in PSS levels. For custom policies (e.g., "images must come from our registry", "all pods must have a specific label", "no latest tags"), use Kyverno or OPA/Gatekeeper.

Kyverno — enforce securityContext patterns

```
# Kyverno: require readOnlyRootFilesystem on all containers
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: require-ro-rootfs
spec:
  validationFailureAction: Enforce
  background: true
  rules:
  - name: check-readonly-rootfs
    match:
      any:
      - resources:
          kinds: ["Pod"]
    validate:
      message: "readOnlyRootFilesystem must be true"
      pattern:
        spec:
          containers:
          - securityContext:
              readOnlyRootFilesystem: true
          =(initContainers):
          - securityContext:
              readOnlyRootFilesystem: true
```

OPA/Gatekeeper — ConstraintTemplate for capabilities

```
apiVersion: templates.gatekeeper.sh/v1
kind: ConstraintTemplate
metadata:
  name: k8snocapabilities
spec:
  crd:
    spec:
      names:
        kind: K8sNoCapabilities
      validation:
        openAPIV3Schema:
          properties:
            allowedCapabilities:
              type: array
              items:
                type: string
  targets:
  - target: admission.k8s.gatekeeper.sh
    rego: |
      package k8snocapabilities

      violation[{"msg": msg}] {
        container := input_containers[_]
        cap := container.securityContext.capabilities.add[_]
        not allowed(cap)
        msg := sprintf("Container %v adds disallowed capability: %v", [container.name, cap])
      }

      allowed(cap) {
        cap == input.parameters.allowedCapabilities[_]
      }

      # Check init and ephemeral containers too; otherwise they are a bypass.
      input_containers[c] {
        c := input.review.object.spec.containers[_]
      }

      input_containers[c] {
        c := input.review.object.spec.initContainers[_]
      }

      input_containers[c] {
        c := input.review.object.spec.ephemeralContainers[_]
      }
---
apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sNoCapabilities
metadata:
  name: no-dangerous-caps
spec:
  match:
    kinds:
    - apiGroups: [""]
      kinds: ["Pod"]
  parameters:
    allowedCapabilities: ["NET_BIND_SERVICE"]  # only this cap is allowed to be added
```

Metrics & Alerts

Key Metrics

Metric	Source	What It Tells You
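`pod_security_evaluations_total{decision,policy_level,policy_version,mode,request_operation,resource,subresource}`	kube-apiserver	PSA evaluation results (decision is allow or deny) per level and mode; track deny counts for violations. Exemptions are counted separately in `pod_security_exemptions_total`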
`apiserver_admission_controller_admission_duration_seconds{name="PodSecurity"}`	kube-apiserver	PSA admission latency — should be sub-millisecond
`falco_events{rule,priority}`	Falco (via falco-exporter)	Runtime security events by rule and priority
`container_processes{container, namespace}`	cAdvisor/kubelet	Process count per container — unexpected process spike may indicate exec/shell spawn
`apiserver_audit_event_total`	kube-apiserver	Total audit events; correlated with PSA audit-mode violations

Alerts

```
groups:
- name: pod-security.rules
  rules:

  - alert: PSAViolationsHigh
    expr: |
      sum by (policy_level) (rate(pod_security_evaluations_total{decision="deny",mode="enforce"}[5m])) > 0
    annotations:
      summary: "PSA enforce mode is rejecting pods at level {{ $labels.policy_level }}"
      description: "Pod Security Admission is blocking pod creation. The metric has no namespace label; check FailedCreate events or the audit log to find the workload."
    labels:
      severity: warning

  - alert: PrivilegedPodRunning
    # Assumes Falco's "Launch Privileged Container" rule is enabled and falco-exporter
    # metrics are scraped; kube-state-metrics exposes no privileged flag.
    expr: |
      sum by (k8s_ns_name, k8s_pod_name) (rate(falco_events{rule="Launch Privileged Container"}[5m])) > 0
    annotations:
      summary: "Privileged container detected: {{ $labels.k8s_pod_name }} in {{ $labels.k8s_ns_name }}"
    labels:
      severity: critical

  - alert: FalcoHighPriorityEvent
    expr: |
      rate(falco_events{priority="Critical"}[5m]) > 0
    annotations:
      summary: "Falco critical event: {{ $labels.rule }}"
    labels:
      severity: critical

  - alert: ContainerRunningAsRoot
    # Assumes a custom Falco rule (hypothetical name) matching user.uid=0 in containers
    expr: |
      sum by (k8s_ns_name, k8s_pod_name) (rate(falco_events{rule="Container Running As Root"}[5m])) > 0
    annotations:
      summary: "Container running as root in {{ $labels.k8s_pod_name }} ({{ $labels.k8s_ns_name }})"
    labels:
      severity: warning
```

Runbooks

PSA enforce mode blocking pods: Pods created by controllers are rejected when the controller tries to create them, so look for FailedCreate events on the owning ReplicaSet, Job, or StatefulSet (kubectl get events -n <namespace> --field-selector reason=FailedCreate); the message names the failed PSS checks. Options: fix the pod spec to comply, lower the namespace's enforce level if legitimate (and explain why), or add a PSA exemption for the specific workload. Do not blindly lower the PSS level; understand why the violation is occurring first.
Privileged container detected at runtime: Identify the pod and node (kubectl get pod -o wide). If unauthorized: cordon the node, delete the pod, audit what it did (Falco logs, audit log). If authorized (CNI plugin, CSI driver): verify it matches expected workloads. Consider whether capabilities + specific access can replace full privileged.
Falco critical event (shell spawned in container): Identify the pod, the container, and the user who ran the exec. Exec does not produce Kubernetes events; the API server audit log records the pods/exec request with the user and timestamp. Determine if it was authorized (troubleshooting by a human) or automated (attacker executing code). If unauthorized: isolate the pod (relabel it out of its Service and apply a deny-all NetworkPolicy selecting it), preserve forensic evidence, trigger incident response.
Container running as root (unexpected): Check the image's Dockerfile for a USER instruction. If missing, add a numeric USER (e.g., USER 10001), rebuild, and redeploy. Add runAsNonRoot: true to the pod spec as a safety net: the kubelet refuses to start a container whose image runs as root, forcing the issue to be resolved at deploy time rather than silently. The kubelet can only verify numeric UIDs, so an image with a named user such as USER nonroot also fails to start unless runAsUser is set.
Workload fails to start after PSA enforce upgrade: Rejected pods are never created, so kubectl describe pod has nothing to show. Use kubectl describe rs -n <ns> (or the FailedCreate events) to see the specific PSA violation. Common issues: missing seccompProfile (add RuntimeDefault), missing capabilities.drop: [ALL], allowPrivilegeEscalation not set to false. Apply the minimal fix rather than lowering the PSS level.
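The runAsNonRoot runbook hinges on how the kubelet decides whether a container would run as root. A simplified Python model of that decision, based on our reading of the kubelet's behavior (the function name is hypothetical and the messages paraphrase the kubelet's errors): an explicit runAsUser wins, otherwise the image's USER is used, a missing USER counts as root, and a non-numeric USER cannot be verified.

```python
def verify_run_as_non_root(run_as_non_root, run_as_user, image_user: str):
    """Return None if the container may start, else a reason string.

    run_as_non_root / run_as_user: effective values after container-over-pod precedence.
    image_user: the image's USER setting ("" if the image sets none).
    """
    if not run_as_non_root:
        return None  # nothing to enforce
    if run_as_user is not None:
        # An explicit UID in the pod spec overrides the image's USER.
        return "runAsUser breaks non-root policy" if run_as_user == 0 else None
    user = image_user.split(":", 1)[0]  # USER may be "uid" or "uid:gid"
    if user == "":
        return "image will run as root"  # no USER in the image is treated as root
    if not user.isdigit():
        return f"image has non-numeric user ({user}), cannot verify user is non-root"
    return "image will run as root" if int(user) == 0 else None


for args in [(True, None, "nonroot"), (True, None, "10001"), (True, 10001, "nonroot")]:
    print(args, "->", verify_run_as_non_root(*args))
```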

Best Practices

Make PSA restricted the target for all tenant namespaces. A pragmatic cluster default in the PSA admission config is enforce=baseline with warn=restricted and audit=restricted, so unlabeled namespaces are protected from the most dangerous settings while restricted violations are still reported. Label tenant namespaces enforce=restricted once their workloads comply, and use privileged only for explicitly approved system namespaces.
Use enforce, audit, and warn together. Enforce alone gives no visibility before violations. During migration, keep enforce at the current level and set audit and warn to the stricter target level, so you see what would break before it does; in steady state, set all three to the same level.
Drop ALL capabilities, then add only what's proven necessary. Start with capabilities.drop: ["ALL"]. Run the workload. If it fails, check the error, identify the needed syscall, determine which capability grants it, add only that capability. Most applications need zero capabilities after dropping ALL.
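Set seccompProfile RuntimeDefault as your baseline, consider Localhost for critical services. RuntimeDefault blocks dozens of dangerous, rarely needed syscalls with zero configuration. For services handling sensitive data (auth services, secret managers, payment processors), build a custom allowlist from the syscalls the application actually makes, for example by tracing it with strace, by running a profile whose default action is SCMP_ACT_LOG and collecting the logged syscalls, or with the Security Profiles Operator's recording feature, then deploy it as a Localhost profile.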
Use readOnlyRootFilesystem: true and back it with emptyDir mounts for writable paths. Set readOnlyRootFilesystem: true on all containers. If the application writes to disk (temp files, caches, logs), mount specific writable directories as emptyDir. This prevents malware from writing persistence or tooling to the container filesystem.
Ban hostPath mounts in tenant namespaces via admission policy. PSA baseline and restricted already forbid hostPath, but namespaces labeled privileged or exempted get no hostPath check at all. Use Kyverno or OPA/Gatekeeper as a second layer that denies hostPath in non-system namespaces regardless of their PSA labels, and block specific dangerous paths (runtime sockets, /proc, /etc, /var/lib/kubelet) even in system namespaces via path-specific policies.
Pair Falco with PSA for runtime detection. PSA prevents known-bad configurations at deploy time. Falco detects unknown-bad behavior at runtime (a container that starts compliant but later executes a shell, makes unexpected network connections, or writes to sensitive paths). Both layers are needed.
Pin PSA version labels and update them deliberately. Always set pod-security.kubernetes.io/enforce-version=v1.<N>. When upgrading Kubernetes, review PSS changelog for new checks, test with warn mode first, then bump the version pin once workloads are compliant. Never use latest in production enforce mode.
