Policy Enforcement in Kubernetes
Complete guide to Kubernetes admission control, OPA Gatekeeper, Kyverno, Pod Security Admission, and policy-as-code workflows — ensuring every workload meets security, operational, and compliance standards before it lands in the cluster.
Admission Control Architecture
Every API request in Kubernetes passes through an ordered admission chain before being persisted to etcd. Policy engines hook into this chain as dynamic admission webhooks — they are called synchronously, making them enforceable gates rather than after-the-fact detectors.
With failurePolicy: Fail, all matching requests are blocked if the webhook pod is unavailable. Deploy policy engines with podAntiAffinity across zones, set PodDisruptionBudgets, and size replicas to survive a zone failure. failurePolicy: Ignore is an escape hatch, but it removes your enforcement guarantees.
Webhook Configuration Anatomy
# ValidatingWebhookConfiguration (simplified — Gatekeeper creates this)
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingWebhookConfiguration
metadata:
  name: gatekeeper-validating-webhook-configuration
webhooks:
- name: validation.gatekeeper.sh
  admissionReviewVersions: ["v1"]
  clientConfig:
    service:
      name: gatekeeper-webhook-service
      namespace: gatekeeper-system
      path: /v1/admit
  rules:
  - apiGroups: ["*"]
    apiVersions: ["*"]
    operations: ["CREATE","UPDATE"]
    resources: ["*"]
    scope: "*" # include cluster-scoped objects such as Namespace
  namespaceSelector:
    matchExpressions:
    - key: admission.gatekeeper.sh/ignore
      operator: DoesNotExist
  failurePolicy: Fail
  sideEffects: None
  timeoutSeconds: 10
  matchPolicy: Equivalent # catches admission of old API versions too
Built-in Admission Plugins
| Plugin | Type | Purpose | Enabled by Default |
|---|---|---|---|
| PodSecurity | Validating | Enforce Pod Security Standards per namespace | Yes (since 1.23; GA in 1.25) |
| LimitRanger | Mutating+Validating | Apply LimitRange defaults and enforce bounds | Yes |
| ResourceQuota | Validating | Block creation when namespace quota exceeded | Yes |
| DefaultStorageClass | Mutating | Set the default StorageClass on PVCs that do not request one | Yes |
| MutatingAdmissionWebhook | Mutating | Call registered mutating webhooks | Yes |
| ValidatingAdmissionWebhook | Validating | Call registered validating webhooks | Yes |
| ValidatingAdmissionPolicy | Validating | CEL-based in-process policies (1.30 GA) | Yes (1.30+) |
Pod Security Admission (PSA)
PSA is the built-in replacement for the deprecated PodSecurityPolicy. It enforces three hardcoded Pod Security Standards at the namespace level via labels. No CRDs, no external webhook — zero operational overhead.
The Three Pod Security Standards
Privileged
Completely unrestricted. Allows all host namespaces, privileged containers, any capabilities. Reserved for system-level DaemonSets (CNI, node agents, eBPF tools) in kube-system-equivalent namespaces.
Baseline
Prevents known privilege escalations. Disallows privileged containers, hostPID/hostNetwork/hostIPC, dangerous capabilities (NET_ADMIN, SYS_ADMIN), host path volumes, hostPort. Suitable for most general workloads.
Restricted
Heavily hardened. Requires runAsNonRoot: true and allowPrivilegeEscalation: false, requires dropping ALL capabilities (only NET_BIND_SERVICE may be added back), requires seccompProfile: RuntimeDefault or Localhost, and disallows all volume types except configMap, secret, projected, downwardAPI, emptyDir, ephemeral, csi, and persistentVolumeClaim. Target for internet-facing services.
PSA Modes
| Mode | Label Key | Behavior | Use Case |
|---|---|---|---|
| enforce | pod-security.kubernetes.io/enforce | Reject violating pods synchronously | Production namespaces |
| audit | pod-security.kubernetes.io/audit | Allow but add audit annotation to API audit log | Monitoring compliance without breakage |
| warn | pod-security.kubernetes.io/warn | Allow but return HTTP warning header to client | Gradual rollout — warns kubectl users |
Namespace Labeling Examples
# Recommended: enforce baseline, warn+audit restricted
apiVersion: v1
kind: Namespace
metadata:
  name: payments-api
  labels:
    # Reject pods that aren't at least baseline
    pod-security.kubernetes.io/enforce: baseline
    pod-security.kubernetes.io/enforce-version: latest
    # Warn and audit against restricted (see violations without breaking)
    pod-security.kubernetes.io/warn: restricted
    pod-security.kubernetes.io/warn-version: latest
    pod-security.kubernetes.io/audit: restricted
    pod-security.kubernetes.io/audit-version: latest
---
# Infrastructure namespace — privileged for CNI/monitoring DaemonSets
apiVersion: v1
kind: Namespace
metadata:
  name: monitoring
  labels:
    pod-security.kubernetes.io/enforce: privileged
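Namespace labels can be complemented by cluster-wide defaults for namespaces that carry no PSA labels at all. The following is a minimal sketch, assuming you control the kube-apiserver flags (many managed services do not expose them): the file is passed to the API server with --admission-control-config-file. The chosen levels and the exempted namespace are illustrative, not a recommendation from the original text.

```yaml
# Cluster-wide Pod Security Admission defaults (sketch).
# Applied only where a namespace does not set the corresponding label.
apiVersion: apiserver.config.k8s.io/v1
kind: AdmissionConfiguration
plugins:
- name: PodSecurity
  configuration:
    apiVersion: pod-security.admission.config.k8s.io/v1
    kind: PodSecurityConfiguration
    defaults:
      enforce: "baseline"
      enforce-version: "latest"
      audit: "restricted"
      audit-version: "latest"
      warn: "restricted"
      warn-version: "latest"
    exemptions:
      usernames: []
      runtimeClasses: []
      namespaces: ["kube-system"]  # illustrative exemption
```

Explicit namespace labels still take precedence over these defaults, so the per-namespace examples above keep working unchanged.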
Labeling Existing Namespaces with Dry-Run
# Check what would break before enforcing
kubectl label --dry-run=server --overwrite ns my-app \
  pod-security.kubernetes.io/enforce=restricted 2>&1 | grep -i warning
# Bulk check all namespaces against restricted
# (dry-run the enforce label: only enforce changes report existing violating pods)
kubectl label --dry-run=server --overwrite ns --all \
  pod-security.kubernetes.io/enforce=restricted 2>&1 | grep -i warning
Compliant Pod securityContext for Restricted
spec:
  securityContext:
    runAsNonRoot: true
    runAsUser: 65532
    runAsGroup: 65532
    fsGroup: 65532
    seccompProfile:
      type: RuntimeDefault
  containers:
  - name: app
    securityContext:
      allowPrivilegeEscalation: false
      readOnlyRootFilesystem: true
      capabilities:
        drop: ["ALL"]
OPA Gatekeeper
Gatekeeper extends OPA (Open Policy Agent) into Kubernetes as a set of CRDs. Policies are expressed as ConstraintTemplates (Rego logic) and instantiated as Constraints (parameters). This two-layer model lets platform teams ship reusable policy libraries while teams configure enforcement parameters.
Install Gatekeeper
helm repo add gatekeeper https://open-policy-agent.github.io/gatekeeper/charts
helm repo update
helm install gatekeeper gatekeeper/gatekeeper \
--namespace gatekeeper-system \
--create-namespace \
--version 3.17.1 \
--set replicas=3 \
--set auditInterval=60 \
--set auditMatchKindOnly=false \
--set constraintViolationsLimit=100 \
--set disableMutation=false \
--set logDenies=true \
--set emitAdmissionEvents=true \
--set emitAuditEvents=true
ConstraintTemplate: Required Labels
apiVersion: templates.gatekeeper.sh/v1
kind: ConstraintTemplate
metadata:
  name: k8srequiredlabels
  annotations:
    metadata.gatekeeper.sh/title: "Required Labels"
    metadata.gatekeeper.sh/description: "Requires specified labels on resources"
spec:
  crd:
    spec:
      names:
        kind: K8sRequiredLabels
      validation:
        openAPIV3Schema:
          type: object
          properties:
            labels:
              type: array
              items:
                type: string
  targets:
  - target: admission.k8s.gatekeeper.sh
    rego: |
      package k8srequiredlabels
      violation[{"msg": msg}] {
        provided := {label | input.review.object.metadata.labels[label]}
        required := {label | label := input.parameters.labels[_]}
        missing := required - provided
        count(missing) > 0
        msg := sprintf("Missing required labels: %v", [missing])
      }
Constraint: Enforce Labels on Namespaces
apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sRequiredLabels
metadata:
  name: require-team-env-labels
spec:
  enforcementAction: deny # deny | warn | dryrun
  match:
    kinds:
    - apiGroups: [""]
      kinds: ["Namespace"]
    excludedNamespaces:
    - kube-system
    - kube-public
    - gatekeeper-system
    - cert-manager
    - argocd
  parameters:
    labels: ["team", "env", "cost-center"]
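To make the constraint concrete, here is a sketch of a Namespace that would be admitted under require-team-env-labels. The label values are hypothetical; only the three keys are checked by the Rego above. Removing any one of them would produce the "Missing required labels" message.

```yaml
# Namespace carrying all three required label keys (values are illustrative)
apiVersion: v1
kind: Namespace
metadata:
  name: payments-api
  labels:
    team: payments
    env: prod
    cost-center: "cc-1234"
```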
ConstraintTemplate: Allowed Image Registries
apiVersion: templates.gatekeeper.sh/v1
kind: ConstraintTemplate
metadata:
  name: k8sallowedrepos
spec:
  crd:
    spec:
      names:
        kind: K8sAllowedRepos
      validation:
        openAPIV3Schema:
          type: object
          properties:
            repos:
              type: array
              items: {type: string}
  targets:
  - target: admission.k8s.gatekeeper.sh
    rego: |
      package k8sallowedrepos
      violation[{"msg": msg}] {
        container := input_containers[_]
        not strings.any_prefix_match(container.image, input.parameters.repos)
        msg := sprintf("Container %q image %q not from allowed registries %v",
          [container.name, container.image, input.parameters.repos])
      }
      input_containers[c] {
        c := input.review.object.spec.containers[_]
      }
      input_containers[c] {
        c := input.review.object.spec.initContainers[_]
      }
      input_containers[c] {
        c := input.review.object.spec.ephemeralContainers[_]
      }
---
apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sAllowedRepos
metadata:
  name: allowed-image-registries
spec:
  enforcementAction: deny
  match:
    kinds:
    - apiGroups: [""]
      kinds: ["Pod"]
  parameters:
    repos:
    - "123456789.dkr.ecr.us-east-1.amazonaws.com/"
    - "gcr.io/distroless/"
    - "registry.k8s.io/"
Gatekeeper Mutation
# Add a default cost-center label (AssignMetadata only sets labels that are not already present)
apiVersion: mutations.gatekeeper.sh/v1
kind: AssignMetadata
metadata:
  name: add-cost-center-label
spec:
  match:
    scope: Namespaced
    kinds:
    - apiGroups: ["apps"]
      kinds: ["Deployment","StatefulSet","DaemonSet"]
  location: "metadata.labels.cost-center"
  parameters:
    assign:
      value: "platform"
---
# Set readOnlyRootFilesystem if not explicitly set
apiVersion: mutations.gatekeeper.sh/v1
kind: Assign
metadata:
  name: set-readonly-rootfs
spec:
  applyTo: # required for Assign: the exact group/version/kind being mutated
  - groups: [""]
    kinds: ["Pod"]
    versions: ["v1"]
  match:
    scope: Namespaced
    kinds:
    - apiGroups: [""]
      kinds: ["Pod"]
  location: "spec.containers[name:*].securityContext.readOnlyRootFilesystem"
  parameters:
    pathTests: # mutate only when the field is absent
    - subPath: "spec.containers[name:*].securityContext.readOnlyRootFilesystem"
      condition: MustNotExist
    assign:
      value: true
Reading Audit Violations
# Show all violations across all constraints
kubectl get constraints -A
kubectl describe k8srequiredlabels require-team-env-labels
# Structured violation output
kubectl get k8srequiredlabels require-team-env-labels \
-o jsonpath='{.status.violations[*]}' | jq .
# Count by constraint
kubectl get constraints -o json | jq '
.items[] | {
constraint: .metadata.name,
violations: (.status.violations | length)
}'
Kyverno
Kyverno is a Kubernetes-native policy engine that uses YAML (with JMESPath expressions) instead of a separate policy language. Policies are ClusterPolicy or namespace-scoped Policy resources with three rule types: validate, mutate, and generate.
Install Kyverno
helm repo add kyverno https://kyverno.github.io/kyverno/
helm repo update
helm install kyverno kyverno/kyverno \
--namespace kyverno \
--create-namespace \
--version 3.2.6 \
--set admissionController.replicas=3 \
--set backgroundController.replicas=2 \
--set cleanupController.replicas=1 \
--set reportsController.replicas=1 \
--set features.deferredLoading.enabled=true \
--set features.policyExceptions.enabled=true \
--set features.globalContext.enabled=true
ClusterPolicy: Validate
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: require-pod-probes
  annotations:
    policies.kyverno.io/title: Require Liveness and Readiness Probes
    policies.kyverno.io/category: Best Practices
    policies.kyverno.io/severity: medium
    policies.kyverno.io/subject: Pod
    policies.kyverno.io/description: >-
      Containers without liveness probes are not restarted when they hang.
      Pods without readiness probes receive traffic before they are ready.
spec:
  validationFailureAction: Enforce # Enforce | Audit
  # Admission-only: rules that match or exclude on subjects cannot run as background scans
  background: false
  rules:
  - name: check-container-probes
    match:
      any:
      - resources:
          kinds: ["Pod"]
          operations: ["CREATE","UPDATE"]
    exclude:
      any:
      - resources:
          namespaces: ["kube-system","monitoring","kyverno","argocd"]
      - subjects:
        - kind: ServiceAccount
          name: "argo-rollouts"
          namespace: "argo-rollouts"
    validate:
      message: "Liveness and readiness probes are required for all containers."
      foreach:
      - list: "request.object.spec.containers"
        deny:
          conditions:
            any:
            # keys(@) lists the fields set on the container; a missing probe key is a violation
            - key: livenessProbe
              operator: AnyNotIn
              value: "{{ element.keys(@)[] }}"
            - key: readinessProbe
              operator: AnyNotIn
              value: "{{ element.keys(@)[] }}"
ClusterPolicy: Mutate — Inject Default Resources
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: add-default-resources
spec:
  validationFailureAction: Audit
  background: false
  rules:
  - name: add-resource-defaults
    match:
      any:
      - resources:
          kinds: ["Pod"]
          operations: ["CREATE"]
    mutate:
      foreach:
      - list: "request.object.spec.containers[]"
        patchStrategicMerge:
          spec:
            containers:
            - name: "{{ element.name }}"
              resources:
                requests:
                  cpu: "{{ element.resources.requests.cpu || '100m' }}"
                  memory: "{{ element.resources.requests.memory || '128Mi' }}"
                limits:
                  memory: "{{ element.resources.limits.memory || '256Mi' }}"
ClusterPolicy: Generate — NetworkPolicy on Namespace Create
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: generate-default-network-policy
spec:
  rules:
  - name: generate-deny-all-ingress
    match:
      any:
      - resources:
          kinds: ["Namespace"]
          operations: ["CREATE"]
    exclude:
      any:
      - resources:
          names: ["kube-system","kube-public","kube-node-lease",
                  "monitoring","kyverno","argocd","cert-manager"]
    generate:
      synchronize: true # keep in sync; delete if namespace deleted
      apiVersion: networking.k8s.io/v1
      kind: NetworkPolicy
      name: deny-all-ingress
      namespace: "{{ request.object.metadata.name }}"
      data:
        spec:
          podSelector: {}
          policyTypes: ["Ingress"]
          # No ingress rules = deny all. Teams must add their own allow rules.
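Because the generated policy denies all ingress, each team needs at least one allow rule before traffic flows. The sketch below shows what such a rule might look like; the namespace, pod label, ingress-controller namespace, and port are all hypothetical and would differ per workload.

```yaml
# Team-owned allow rule layered on top of the generated deny-all-ingress policy
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-ingress-from-gateway
  namespace: payments-api
spec:
  podSelector:
    matchLabels:
      app: payments-api            # hypothetical workload label
  policyTypes: ["Ingress"]
  ingress:
  - from:
    - namespaceSelector:
        matchLabels:
          # label set automatically on every namespace by Kubernetes
          kubernetes.io/metadata.name: ingress-nginx
    ports:
    - protocol: TCP
      port: 8080
```

NetworkPolicies are additive, so this rule opens one path without weakening the deny-all default for other pods in the namespace.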
ClusterPolicy: Verify Image Signatures
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: verify-image-signatures
spec:
  validationFailureAction: Enforce
  background: false
  webhookTimeoutSeconds: 30
  rules:
  - name: verify-cosign-keyless
    match:
      any:
      - resources:
          kinds: ["Pod"]
          operations: ["CREATE","UPDATE"]
    verifyImages:
    - imageReferences:
      - "123456789.dkr.ecr.us-east-1.amazonaws.com/*"
      mutateDigest: true # replace tag with digest (immutability)
      verifyDigest: true
      required: true
      attestors:
      - count: 1
        entries:
        - keyless:
            subject: "https://github.com/myorg/myrepo/.github/workflows/ci.yaml@refs/heads/main"
            issuer: "https://token.actions.githubusercontent.com"
            rekor:
              url: https://rekor.sigstore.dev
mutateDigest: true replaces tags such as :v1.2.3 with @sha256:... digests at admission time, preventing tag mutation attacks even after the pod is admitted.
Kyverno CLI — Test Locally
# Install
brew install kyverno # or: curl -sL https://github.com/kyverno/kyverno/releases/...
# Test a policy against resources without a cluster
kyverno apply policy.yaml --resource pod.yaml
# Test a policy with values (simulating variables)
kyverno apply policy.yaml --resource pod.yaml \
--values-file values.yaml
# Run a full test suite
kyverno test . # looks for kyverno-test.yaml in current dir
# Evaluate a policy against a live cluster and print a policy report
kyverno apply policy.yaml --cluster --policy-report
Kyverno Test Manifest
# kyverno-test.yaml
name: require-pod-probes-test
policies:
- require-pod-probes.yaml
resources:
- good-pod.yaml
- bad-pod.yaml
results:
- policy: require-pod-probes
  rule: check-container-probes
  resource: good-pod
  kind: Pod
  result: pass
- policy: require-pod-probes
  rule: check-container-probes
  resource: bad-pod
  kind: Pod
  result: fail
Gatekeeper vs Kyverno
| Dimension | OPA Gatekeeper | Kyverno |
|---|---|---|
| Policy language | Rego (OPA's declarative language) | YAML + JMESPath expressions |
| Learning curve | High (Rego has unique syntax/semantics) | Low (K8s-native YAML) |
| Rule types | Validate + Mutate (separate CRDs) | Validate + Mutate + Generate in one ClusterPolicy |
| Mutation support | AssignMetadata, Assign, ModifySet CRDs | patchStrategicMerge, patchesJSON6902, foreach |
| Resource generation | No native generate | Yes — generate NetworkPolicy, RBAC, etc. on triggers |
| Image verification | No native support (use an external data provider such as Ratify, or run Kyverno alongside) | Yes: verifyImages with cosign keyless/key-based |
| External data | OPA external data providers (HTTP cache) | Global context (K8s resources, API calls) |
| Testing tooling | gator verify, conftest, opa test | kyverno test (full test suite with pass/fail) |
| Policy Reports | via status.violations (custom) | PolicyReport / ClusterPolicyReport (K8s standard CRD) |
| Audit mode | Separate audit controller, enforcementAction: dryrun | background: true + validationFailureAction: Audit |
| Exceptions | Excluded scopes in match, separate namespace annotations | PolicyException CRD |
| Community adoption | CNCF graduated; strong enterprise adoption | CNCF graduated; rapidly growing; preferred for K8s teams |
| Best for | Teams already using OPA/Rego for other policy; complex cross-resource logic | Teams wanting K8s-native approach; need generate rules; image signing |
ValidatingAdmissionPolicy (VAP) — Built-in CEL
Kubernetes 1.30 GA'd ValidatingAdmissionPolicy — lightweight CEL-based validation without an external webhook. Ideal for simple constraints where you don't want another running pod.
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingAdmissionPolicy
metadata:
  name: require-run-as-non-root
spec:
  failurePolicy: Fail
  matchConstraints:
    resourceRules:
    - apiGroups: [""]
      apiVersions: ["v1"]
      operations: ["CREATE","UPDATE"]
      resources: ["pods"]
  validations:
  # has() guards keep the expression from erroring when securityContext is unset
  - expression: >-
      (has(object.spec.securityContext) &&
      has(object.spec.securityContext.runAsNonRoot) &&
      object.spec.securityContext.runAsNonRoot) ||
      object.spec.containers.all(c,
      has(c.securityContext) &&
      has(c.securityContext.runAsNonRoot) &&
      c.securityContext.runAsNonRoot)
    message: "Pods must set runAsNonRoot: true"
---
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingAdmissionPolicyBinding
metadata:
  name: require-run-as-non-root-binding
spec:
  policyName: require-run-as-non-root
  validationActions: [Deny]
  matchResources:
    namespaceSelector:
      matchLabels:
        enforce-security: "true"
Common Policy Library
These are the policies every production platform should enforce. Ship them in enforcementAction: dryrun / Audit first to find violations, then graduate to deny / Enforce.
| Policy | Risk Mitigated | Recommended Action | Notes |
|---|---|---|---|
| Require resource limits & requests | CPU/memory OOM, node pressure | Enforce | Use LimitRange defaults as fallback |
| Require liveness + readiness probes | Traffic to not-ready pods, no auto-restart | Enforce | Exclude Jobs and one-shot containers |
| Disallow privileged containers | Node escape, kernel access | Enforce | Allow in kube-system with annotation |
| Disallow hostPID / hostIPC / hostNetwork | Process snooping, network sniffing | Enforce | Allow for CNI/monitoring DaemonSets |
| Disallow hostPath volumes | Host filesystem read/write | Enforce | Allow specific paths for log agents |
| Require non-root user | Container breakout impact | Enforce | Set runAsNonRoot + runAsUser ≥ 1000 |
| Disallow privilege escalation | setuid binary exploitation | Enforce | allowPrivilegeEscalation: false |
| Drop ALL capabilities | Linux capability abuse | Enforce | Allow NET_BIND_SERVICE explicitly |
| Require readOnlyRootFilesystem | Reduce attack surface for malware | Audit → Enforce | Many apps need emptyDir for /tmp |
| Require seccompProfile: RuntimeDefault | Syscall surface reduction | Enforce | Required for restricted PSS |
| Allowed image registries | Supply chain compromise | Enforce | Allowlist internal ECR + distroless |
| Require image digest (not tag) | Tag mutation attacks | Enforce (via mutateDigest) | Kyverno verifyImages handles this |
| Require cosign signature | Unsigned / untrusted images | Enforce | Gate on specific namespaces first |
| Disallow latest tag | Non-reproducible deployments | Enforce | Block pods with image ending in :latest |
| Require team + env labels | Ownership, cost attribution | Enforce on Namespace/Deployment | Required for cost chargeback |
| Require PodDisruptionBudget | Accidental zero-replica drain | Audit | Complex to enforce — use audit + alert |
| Require NetworkPolicy exists | Unrestricted lateral movement | Audit → Enforce | Kyverno generate creates deny-all |
| Max replicas without HPA | Over-provisioning | Audit | Warn if replicas > 3 without HPA |
| Disallow NodePort services | Direct node exposure | Enforce | Allow ClusterIP + LoadBalancer only |
| Require Ingress TLS | Plaintext traffic | Enforce | Check spec.tls is set |
Kyverno: Disallow Latest Tag
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: disallow-latest-tag
  annotations:
    policies.kyverno.io/title: Disallow Latest Tag
    policies.kyverno.io/severity: medium
spec:
  validationFailureAction: Enforce
  background: true
  rules:
  - name: require-image-tag
    match:
      any:
      - resources:
          kinds: ["Pod"]
          operations: ["CREATE","UPDATE"]
    exclude:
      any:
      - resources:
          namespaces: ["kube-system"]
    validate:
      message: "Image tag ':latest' or missing tag is not allowed. Use a specific tag or digest."
      foreach:
      - list: "request.object.spec.containers"
        deny:
          conditions:
            any:
            - key: "{{ element.image }}"
              operator: Equals
              value: "*:latest"
            - key: "{{ element.image }}"
              operator: NotEquals
              value: "*:*" # images without any tag
Kyverno: Require Resource Limits
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: require-requests-limits
  annotations:
    policies.kyverno.io/title: Require Resource Requests and Limits
    policies.kyverno.io/severity: high
spec:
  validationFailureAction: Enforce
  background: true
  rules:
  - name: validate-resources
    match:
      any:
      - resources:
          kinds: ["Pod"]
          operations: ["CREATE","UPDATE"]
    exclude:
      any:
      - resources:
          namespaces: ["kube-system","kyverno"]
    validate:
      message: "CPU/memory requests and memory limits are required on all containers."
      foreach:
      - list: "request.object.spec.containers"
        deny:
          conditions:
            any:
            - key: "{{ element.resources.requests.cpu || '' }}"
              operator: Equals
              value: ""
            - key: "{{ element.resources.requests.memory || '' }}"
              operator: Equals
              value: ""
            - key: "{{ element.resources.limits.memory || '' }}"
              operator: Equals
              value: ""
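The library table also lists "Disallow NodePort services". As a further illustration, here is a sketch of how that row could be expressed in Kyverno, modeled on the pattern used in the public Kyverno policy library. The policy and rule names are illustrative, and it starts in Audit in line with the rollout advice above.

```yaml
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: restrict-nodeport
  annotations:
    policies.kyverno.io/title: Disallow NodePort
    policies.kyverno.io/severity: medium
spec:
  validationFailureAction: Audit   # promote to Enforce after reviewing reports
  background: true
  rules:
  - name: validate-nodeport
    match:
      any:
      - resources:
          kinds: ["Service"]
    validate:
      message: "Services of type NodePort are not allowed."
      pattern:
        spec:
          # =(type): evaluate only if the field is present; "!NodePort" negates the value
          =(type): "!NodePort"
```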
Policy Exceptions
No policy library is perfect — there will always be legitimate exceptions (legacy apps, specialized system components, vendor-provided DaemonSets). Manage these explicitly rather than widening the policy's exclusion scope.
Kyverno PolicyException
apiVersion: kyverno.io/v2
kind: PolicyException
metadata:
  name: datadog-agent-exception
  namespace: monitoring # exceptions are namespaced
spec:
  exceptions:
  - policyName: disallow-privileged-containers
    ruleNames:
    - check-privileged
  - policyName: require-requests-limits
    ruleNames:
    - validate-resources
  match:
    any:
    - resources:
        kinds: ["Pod"]
        namespaces: ["monitoring"]
        selector:
          matchLabels:
            app: datadog-agent
  # Optional: Pod Security Standard controls to exempt (none here)
  podSecurity: []
Require the annotation approved-by: platform-team on every PolicyException. Track all exceptions in a GitOps repo for auditability.
Gatekeeper Exemption via Namespace Label
# Exclude a namespace from ALL Gatekeeper webhooks
# (the namespace must also be listed in Gatekeeper's --exempt-namespace flag)
kubectl label namespace legacy-app \
  admission.gatekeeper.sh/ignore=no-validation
# Per-constraint: use excludedNamespaces in Constraint spec
spec:
  match:
    excludedNamespaces:
    - legacy-app
    - vendor-system
Exception Tracking in Git
policy/
  library/
    require-probes.yaml
    disallow-privileged.yaml
    allowed-registries.yaml
  constraints/
    cluster-wide-constraints.yaml
  exceptions/
    README.md # exception registry with justification
    datadog-agent.yaml
    legacy-payment-service.yaml # expires: 2025-12-31, ticket: PLAT-4892
  tests/
    require-probes/
      kyverno-test.yaml
      good-pod.yaml
      bad-pod.yaml
Policy Testing
Policy changes must be tested before reaching production. The testing pyramid applies: unit tests (kyverno test / conftest), integration against a Kind cluster, then graduated rollout in audit mode.
Kyverno Test Suite Structure
# tests/require-probes/kyverno-test.yaml
name: require-probes-tests
policies:
- ../../library/require-probes.yaml
resources:
- good-pod.yaml
- bad-pod-no-liveness.yaml
- bad-pod-no-readiness.yaml
- excluded-namespace-pod.yaml
variables: variables.yaml
results:
- policy: require-pod-probes
  rule: check-container-probes
  resource: good-pod
  kind: Pod
  result: pass
- policy: require-pod-probes
  rule: check-container-probes
  resource: bad-pod-no-liveness
  kind: Pod
  result: fail
- policy: require-pod-probes
  rule: check-container-probes
  resource: bad-pod-no-readiness
  kind: Pod
  result: fail
- policy: require-pod-probes
  rule: check-container-probes
  resource: excluded-namespace-pod # in kube-system
  kind: Pod
  result: skip
conftest for Gatekeeper Rego
# Test Rego policy logic in isolation
conftest test pod.json \
--policy policy/library/required-labels.rego \
--namespace k8srequiredlabels
# Test all policies against a manifest directory
conftest test manifests/ \
--policy policy/library/ \
--all-namespaces
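# Test against a live object. Gatekeeper Rego reads input.review.object and
# input.parameters, so wrap the object in that shape before piping it in.
kubectl get pod my-pod -o json | \
  jq '{review: {object: .}, parameters: {labels: ["team"]}}' | \
  conftest test - \
  --policy policy/library/required-labels.rego \
  --namespace k8srequiredlabels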
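Gatekeeper also ships its own test runner, gator, which evaluates a ConstraintTemplate and Constraint together against sample objects. The suite below is a sketch of what that could look like for the required-labels policy; all file paths are hypothetical and would need to match your repository layout.

```yaml
# suite.yaml: run with `gator verify <directory>`
kind: Suite
apiVersion: test.gatekeeper.sh/v1alpha1
metadata:
  name: required-labels
tests:
- name: namespace-labels
  template: template.yaml        # the K8sRequiredLabels ConstraintTemplate
  constraint: constraint.yaml    # the require-team-env-labels Constraint
  cases:
  - name: labelled-namespace-allowed
    object: samples/good-namespace.yaml
    assertions:
    - violations: no
  - name: unlabelled-namespace-denied
    object: samples/bad-namespace.yaml
    assertions:
    - violations: yes
```

Unlike raw conftest runs, gator builds the input.review and input.parameters structure itself, so the sample objects can be ordinary Kubernetes manifests.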
GitHub Actions Policy CI
name: Policy Tests
on:
  pull_request:
    paths: ["policy/**"]
jobs:
  kyverno-test:
    runs-on: ubuntu-latest
    steps:
    - uses: actions/checkout@v4
    - name: Install Kyverno CLI
      run: |
        curl -sL "https://github.com/kyverno/kyverno/releases/download/v1.12.0/kyverno-cli_v1.12.0_linux_x86_64.tar.gz" \
          | sudo tar -xz -C /usr/local/bin kyverno
    - name: Run Kyverno tests
      run: kyverno test policy/tests/
  audit-against-kind:
    runs-on: ubuntu-latest
    steps:
    - uses: actions/checkout@v4
    - name: Create Kind cluster
      uses: helm/kind-action@v1.9.0
    - name: Install Kyverno
      run: |
        helm repo add kyverno https://kyverno.github.io/kyverno/
        helm install kyverno kyverno/kyverno -n kyverno --create-namespace --wait
    - name: Apply policies in Audit mode
      run: |
        # Rewrite Enforce to Audit file by file so each manifest is applied on its own
        find policy/library -name "*.yaml" | while read -r f; do
          sed 's/validationFailureAction: Enforce/validationFailureAction: Audit/' "$f" \
            | kubectl apply -f -
        done
    - name: Apply test manifests
      run: kubectl apply -f policy/tests/fixtures/
    - name: Check policy reports
      run: |
        kubectl get policyreport -A -o json | \
          jq '.items[].results[] | select(.result=="fail") | .message' | \
          tee /tmp/violations.txt
        # Fail CI if unexpected violations
        [ -s /tmp/violations.txt ] && exit 1 || true
Graduated Rollout Strategy
1. Audit mode cluster-wide — deploy with validationFailureAction: Audit. Let the audit controller scan all existing resources. Review PolicyReport violations.
2. Warn mode on new namespaces — label new namespaces with the PSA warn label. Developers see warnings in kubectl output without failures.
3. Enforce on non-production first — flip to Enforce in dev/staging namespaces. Fix violations that surface.
4. Enforce on production — after 2-week soak in staging with zero new violations, promote to production. Document timeline in GitOps PR.
5. Enforce globally — remove per-namespace overrides. Only named PolicyExceptions remain.
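The "enforce on non-production first" stage can be expressed in a single Kyverno policy rather than by maintaining two copies. The following is a sketch using the policy-level validationFailureActionOverrides field available in the Kyverno 1.12 line installed earlier; the dev and staging namespace names are assumptions, and newer Kyverno releases may expect these settings at rule level, so check the reference for your version.

```yaml
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: disallow-privileged-containers
spec:
  validationFailureAction: Audit        # default everywhere, including production
  validationFailureActionOverrides:
  - action: Enforce                     # stricter action for non-production only
    namespaces: ["dev", "staging"]      # hypothetical namespace names
  background: true
  rules:
  - name: check-privileged
    match:
      any:
      - resources:
          kinds: ["Pod"]
    validate:
      message: "Privileged containers are not allowed."
      pattern:
        spec:
          containers:
          # =() anchors: check the value only when the field is present
          - =(securityContext):
              =(privileged): "false"
```

Promoting to production then becomes a one-line change in Git: either add the production namespaces to the override list or switch the default action to Enforce and delete the override.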
Policy as Code in GitOps
Policies must be in Git — not applied ad-hoc. The GitOps loop (see 02-gitops.html) ensures policies are version-controlled, reviewed, and automatically reconciled. Drift from approved policy triggers alerts.
Argo CD Application for Policy
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: cluster-policies
  namespace: argocd
  annotations:
    argocd.argoproj.io/sync-wave: "-2" # before workloads
spec:
  project: platform
  source:
    repoURL: https://github.com/myorg/platform
    targetRevision: main
    path: policy/
    kustomize:
      namePrefix: ""
  destination:
    server: https://kubernetes.default.svc
    namespace: kyverno
  syncPolicy:
    automated:
      prune: true
      selfHeal: true # revert manual policy changes immediately
    syncOptions:
    - CreateNamespace=true
    - ServerSideApply=true
Policy Repo Structure
policy/
├── kustomization.yaml
├── library/
│ ├── security/
│ │ ├── disallow-privileged.yaml
│ │ ├── require-non-root.yaml
│ │ ├── drop-all-capabilities.yaml
│ │ ├── disallow-host-namespaces.yaml
│ │ └── require-seccomp.yaml
│ ├── best-practices/
│ │ ├── require-probes.yaml
│ │ ├── require-resource-limits.yaml
│ │ ├── disallow-latest-tag.yaml
│ │ └── require-labels.yaml
│ ├── supply-chain/
│ │ ├── allowed-registries.yaml
│ │ └── verify-image-signatures.yaml
│ └── networking/
│ ├── disallow-nodeport.yaml
│ └── require-ingress-tls.yaml
├── exceptions/
│ ├── kustomization.yaml
│ └── datadog-agent.yaml
└── tests/
└── ...
OPA Policy Bundle for Multi-Cluster
# Gatekeeper Config — sync K8s resources into OPA cache
# (needed for policies that reference other objects)
apiVersion: config.gatekeeper.sh/v1alpha1
kind: Config
metadata:
  name: config
  namespace: gatekeeper-system
spec:
  sync:
    syncOnly:
    - group: ""
      version: "v1"
      kind: Namespace
    - group: ""
      version: "v1"
      kind: Service
    - group: "networking.k8s.io"
      version: "v1"
      kind: Ingress
  validation:
    traces: []
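Synced objects become available to Rego under data.inventory. As an illustration of why the sync list matters, here is a sketch of a referential ConstraintTemplate, modeled on the unique-ingress-host example from the public Gatekeeper library, that rejects an Ingress whose host is already claimed by another Ingress. It only works if Ingress is in the syncOnly list, as it is above.

```yaml
apiVersion: templates.gatekeeper.sh/v1
kind: ConstraintTemplate
metadata:
  name: k8suniqueingresshost
spec:
  crd:
    spec:
      names:
        kind: K8sUniqueIngressHost
  targets:
  - target: admission.k8s.gatekeeper.sh
    rego: |
      package k8suniqueingresshost

      # True when the cached object is the same object that is under review
      identical(obj, review) {
        obj.metadata.namespace == review.object.metadata.namespace
        obj.metadata.name == review.object.metadata.name
      }

      violation[{"msg": msg}] {
        input.review.kind.kind == "Ingress"
        host := input.review.object.spec.rules[_].host
        # data.inventory.namespace[<ns>][<groupVersion>][<kind>][<name>] holds synced objects
        other := data.inventory.namespace[_]["networking.k8s.io/v1"]["Ingress"][_]
        other.spec.rules[_].host == host
        not identical(other, input.review)
        msg := sprintf("Ingress host %v is already used by another Ingress", [host])
      }
```

The cache is eventually consistent, so two conflicting objects created at almost the same moment can both be admitted; the audit run is what catches those later.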
Audit & Compliance Reporting
Kyverno PolicyReport
# PolicyReport is namespaced — one per namespace
kubectl get policyreport -n payments-api -o yaml
# ClusterPolicyReport is cluster-scoped — for non-namespaced resources
kubectl get clusterpolicyreport -o yaml
# Summary: pass/fail/warn/error counts per policy
kubectl get policyreport -A \
-o jsonpath='{range .items[*]}{.metadata.namespace}: {.summary}{"\n"}{end}'
# Find all failing resources across cluster
kubectl get policyreport -A -o json | jq '
.items[] | .metadata.namespace as $ns |
.results[] | select(.result == "fail") | {
namespace: $ns,
resource: .resources[0].name,
policy: .policy,
message: .message
}'
Policy Reporter Dashboard
# Policy Reporter: visualization layer over PolicyReports
helm repo add policy-reporter https://kyverno.github.io/policy-reporter
helm repo update
# monitoring.enabled=true also creates a ServiceMonitor
helm install policy-reporter policy-reporter/policy-reporter \
  --namespace policy-reporter \
  --create-namespace \
  --set ui.enabled=true \
  --set kyvernoPlugin.enabled=true \
  --set monitoring.enabled=true \
  --set target.slack.webhook="https://hooks.slack.com/..." \
  --set target.slack.minimumSeverity="high"
K8s Audit Logs for Policy Actions
# Audit policy on kube-apiserver — log admission denials
apiVersion: audit.k8s.io/v1
kind: Policy
rules:
# Log pod and deployment writes at RequestResponse level so the response status
# (including webhook denials) is recorded
- level: RequestResponse
  omitStages: []
  resources:
  - group: ""
    resources: ["pods"]
  - group: "apps" # deployments live in the apps group, not the core group
    resources: ["deployments"]
  verbs: ["create","update"]
# Filter in SIEM/log pipeline for responseStatus.code=403
Alerting & Monitoring
Gatekeeper Prometheus Metrics
# Key metrics exposed by gatekeeper-controller-manager
gatekeeper_violations # gauge: current audit violations per enforcement action
gatekeeper_audit_last_run_time # gauge: Unix timestamp of last audit
gatekeeper_audit_duration_seconds # histogram: audit run duration
gatekeeper_validation_request_count # counter: webhook requests (admitted/denied)
gatekeeper_validation_request_duration_seconds # histogram: webhook latency
PrometheusRule
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: policy-enforcement-alerts
  namespace: monitoring
  labels:
    prometheus: kube-prometheus
    role: alert-rules
spec:
  groups:
  - name: policy.enforcement
    interval: 60s
    rules:
    # Kyverno webhook pod health
    - alert: KyvernoWebhookPodsLow
      expr: |
        kube_deployment_status_replicas_available{
          namespace="kyverno",
          deployment="kyverno-admission-controller"
        } < 2
      for: 5m
      labels:
        severity: critical
      annotations:
        summary: "Kyverno admission controller replicas below minimum"
        description: "Only {{ $value }} replica(s) available. Policy enforcement may be degraded."
    # Gatekeeper audit violations spiking
    - alert: GatekeeperHighViolations
      expr: |
        sum(gatekeeper_violations) by (enforcement_action) > 50
      for: 15m
      labels:
        severity: warning
      annotations:
        summary: "High number of Gatekeeper violations in audit"
        description: "{{ $value }} violations detected. Run 'kubectl get constraints -A' to review."
    # Policy webhook latency
    - alert: PolicyWebhookHighLatency
      expr: |
        histogram_quantile(0.99,
          sum by (le) (rate(gatekeeper_validation_request_duration_seconds_bucket[5m]))
        ) > 2
      for: 5m
      labels:
        severity: warning
      annotations:
        summary: "Gatekeeper webhook P99 latency exceeds 2s"
        description: "High webhook latency may cause timeout-based admission failures."
    # Kyverno background scan stale: no background-scan results recorded in the last hour
    # (label names follow Kyverno's metrics reference; verify against your version)
    - alert: KyvernoAuditStale
      expr: |
        sum(increase(kyverno_policy_results_total{
          rule_execution_cause="background_scan"
        }[1h])) == 0
      for: 10m
      labels:
        severity: warning
      annotations:
        summary: "Kyverno policy audit not running"
    # New violations recorded by Kyverno validate rules
    - alert: KyvernoCriticalPolicyViolation
      expr: |
        increase(kyverno_policy_results_total{
          rule_type="validate",
          rule_result="fail"
        }[10m]) > 0
      for: 0m
      labels:
        severity: warning
      annotations:
        summary: "New Kyverno policy violations detected"
        description: "Check 'kubectl get policyreport -A' for details."
Grafana Dashboard Panels
| Panel | Query | Visualization |
|---|---|---|
| Violations by enforcement action | sum by (enforcement_action) (gatekeeper_violations) | Bar chart |
| Webhook admit/deny ratio | sum by (admission_status) (rate(gatekeeper_validation_request_count[5m])) | Stacked area |
| P99 webhook latency | histogram_quantile(0.99, sum by (le) (rate(gatekeeper_validation_request_duration_seconds_bucket[5m]))) | Stat (threshold: >1s yellow, >2s red) |
| Kyverno pass/fail trend | sum by (rule_result) (rate(kyverno_policy_results_total[5m])) | Time series |
| Namespace PSS compliance | Custom query on namespace labels | Table |
Best Practices
Audit Before Enforce
Always deploy new policies in Audit/dryrun mode first. Run for at least 2 weeks and fix all violations before flipping to Enforce. New policies in Enforce without audit break existing workloads.
Policy as Code in GitOps
Store all policies in Git with Argo CD selfHeal: true on the policy Application. Any manual policy change is reverted within 3 minutes — policies can never drift from reviewed state.
Sync Waves for Policies
Install Gatekeeper/Kyverno in an earlier Argo CD sync wave than the policies, for example wave -3 for the engine and wave -2 for policies. Lower waves sync first, so the engine and its CRDs exist before any policy is applied, and a workload in wave 0 is validated by webhooks that are already running.
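A minimal sketch of this ordering, assuming an app-of-apps layout in which a parent Application syncs the two child Applications below. Argo CD applies lower-numbered waves first, so the engine sits in wave -3 and the policies in wave -2. The application names, repository URL, and paths are placeholders, not values from this guide.

```yaml
# Engine first: the lowest wave syncs first, so CRDs and webhooks exist
# before any policy resource is applied.
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: kyverno                        # hypothetical name
  namespace: argocd
  annotations:
    argocd.argoproj.io/sync-wave: "-3"
spec:
  project: default
  source:
    repoURL: https://example.com/platform/cluster-config.git  # placeholder
    targetRevision: main
    path: engines/kyverno              # placeholder path
  destination:
    server: https://kubernetes.default.svc
    namespace: kyverno
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
    syncOptions:
      - CreateNamespace=true
      - ServerSideApply=true
---
# Policies second: applied only after the engine wave has synced.
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: cluster-policies               # hypothetical name
  namespace: argocd
  annotations:
    argocd.argoproj.io/sync-wave: "-2"
spec:
  project: default
  source:
    repoURL: https://example.com/platform/cluster-config.git  # placeholder
    targetRevision: main
    path: policies                     # placeholder path
  destination:
    server: https://kubernetes.default.svc
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
    syncOptions:
      - ServerSideApply=true
```

Waves only order resources inside a single sync operation, so the annotations take effect when both Applications are children of one parent. Depending on the Argo CD version, the parent may also need a health check for the Application kind so that it waits for the engine to become healthy before moving to the next wave.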
PodDisruptionBudget on Policy Engine
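Set a PodDisruptionBudget with minAvailable: 2 on the Kyverno/Gatekeeper admission controllers, and run at least 3 replicas so a drain can still evict one pod. This prevents node drains from taking down all webhook replicas at once, which with failurePolicy: Fail would block every matching create and update request, including new pods.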
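As a rough illustration, a PodDisruptionBudget for a Kyverno admission controller could look like the following. The namespace and selector labels are assumptions about how the controller pods are labeled; verify them with kubectl get pods --show-labels before applying.

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: kyverno-admission-controller   # hypothetical name
  namespace: kyverno                   # assumed install namespace
spec:
  # Keep at least two webhook replicas running during voluntary disruptions
  # such as node drains. With exactly two replicas this blocks every drain,
  # so pair it with three or more replicas.
  minAvailable: 2
  selector:
    matchLabels:
      # Assumed pod labels; adjust to match the actual Deployment.
      app.kubernetes.io/component: admission-controller
      app.kubernetes.io/instance: kyverno
```

The engine Helm charts may be able to render an equivalent PDB from chart values, which keeps the selector in step with the chart's own labels. Check the chart's values file before maintaining a separate manifest.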
Minimize Webhook Scope
Use namespaceSelector and objectSelector so the webhook is called only for the resources your policies cover. A webhook registered for every resource in the cluster adds a network round trip to every write request, and evaluation time grows with the number of policies.
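The fragment below sketches how the two selectors narrow a webhook's scope. It describes a hypothetical standalone webhook; the webhook name, service, namespace list, and the policy.example.com/skip label are illustrative only. Gatekeeper and Kyverno manage their own webhook configurations, so for those engines the equivalent settings would normally go through the engine's configuration rather than a hand-edited manifest.

```yaml
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingWebhookConfiguration
metadata:
  name: example-policy-webhook          # hypothetical
webhooks:
  - name: validate.policy.example.com   # hypothetical
    admissionReviewVersions: ["v1"]
    sideEffects: None
    failurePolicy: Fail
    timeoutSeconds: 10
    clientConfig:
      # A real configuration also needs caBundle (or CA injection) here.
      service:
        name: example-policy-webhook
        namespace: policy-system
        path: /validate
    rules:
      # Only the workload kinds the policies actually inspect.
      - apiGroups: ["", "apps"]
        apiVersions: ["v1"]
        operations: ["CREATE", "UPDATE"]
        resources: ["pods", "deployments", "statefulsets", "daemonsets"]
        scope: Namespaced
    # Skip system namespaces entirely.
    namespaceSelector:
      matchExpressions:
        - key: kubernetes.io/metadata.name
          operator: NotIn
          values: ["kube-system", "policy-system"]
    # Skip individual objects that carry an explicit opt-out label.
    objectSelector:
      matchExpressions:
        - key: policy.example.com/skip
          operator: DoesNotExist
```

An opt-out label on objects is convenient but lets anyone who can set labels bypass the webhook, so it is worth restricting who may apply it.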
Test Policies in CI
Run kyverno test or conftest in every PR that touches the policy directory. Include both positive (compliant) and negative (violating) fixtures. Aim for more than 80% coverage of Rego rules, measured with opa test --coverage.
Gate PolicyExceptions
Create a Kyverno policy that requires annotations.approved-by: platform-team on every PolicyException. Log all exception creations to your SIEM. Review quarterly and expire by date.
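One possible shape for such a gate is sketched below, assuming Kyverno's admission webhook is configured to receive PolicyException resources. The policy and rule names are hypothetical, and newer Kyverno releases may prefer a per-rule failure action over the spec-level field used here.

```yaml
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: require-approved-policy-exceptions   # hypothetical name
spec:
  validationFailureAction: Enforce
  background: true
  rules:
    - name: require-approved-by-annotation   # hypothetical name
      match:
        any:
          - resources:
              kinds:
                - PolicyException
      validate:
        message: "PolicyException resources must carry the annotation approved-by: platform-team."
        # Reject any PolicyException without this exact annotation value.
        pattern:
          metadata:
            annotations:
              approved-by: platform-team
```

An annotation check alone only proves that someone typed the value. Combining it with RBAC that limits who can create PolicyException resources, or with review on the Git path where exceptions live, gives the approval real weight.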
No Blanket Webhook Ignore
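Labeling a namespace admission.gatekeeper.sh/ignore makes the Gatekeeper admission webhook skip every request in that namespace, so no constraint is enforced there at admission time. This is an escape hatch, not standard practice. Use per-constraint excludedNamespaces in Gatekeeper, or named PolicyExceptions in Kyverno, instead.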
Coverage: 05 · Policy Enforcement
- Admission control architecture diagram (authentication → authorization → mutating webhooks → schema validation → validating webhooks → etcd)
- ValidatingWebhookConfiguration anatomy (clientConfig, rules, namespaceSelector, failurePolicy, matchPolicy: Equivalent)
- Built-in admission plugins reference table (PodSecurity/LimitRanger/ResourceQuota/DefaultStorageClass/MutatingAdmissionWebhook/ValidatingAdmissionWebhook/ValidatingAdmissionPolicy)
- Pod Security Standards: Privileged / Baseline / Restricted with allowed/disallowed controls
- PSA modes: enforce / audit / warn with label keys and behavior differences
- Namespace labeling examples (enforce baseline + warn/audit restricted dual-mode)
- kubectl dry-run PSS audit commands (check breakage before enforcing)
- Compliant Pod securityContext for restricted PSS (runAsNonRoot/runAsUser/fsGroup/seccompProfile/allowPrivilegeEscalation:false/capabilities:drop:ALL/readOnlyRootFilesystem)
- Gatekeeper architecture (ConstraintTemplate + Constraint two-layer model + audit controller)
- Gatekeeper Helm install (replicas/auditInterval/logDenies/emitEvents)
- ConstraintTemplate: K8sRequiredLabels (Rego: provided-required-missing set arithmetic)
- Constraint: K8sRequiredLabels with enforcementAction/match/excludedNamespaces/parameters
- ConstraintTemplate: K8sAllowedRepos (all container types: containers/initContainers/ephemeralContainers)
- Gatekeeper Mutation: AssignMetadata (label injection) + Assign (security context defaults)
- Reading audit violations with kubectl (describe, jsonpath, jq count by constraint)
- Kyverno install (4 controllers: admission/background/cleanup/reports; features: policyExceptions/globalContext)
- Kyverno ClusterPolicy: validate rule (require-pod-probes with foreach, exclude namespaces+serviceaccounts)
- Kyverno ClusterPolicy: mutate rule (add-default-resources with foreach + patchStrategicMerge + JMESPath defaults)
- Kyverno ClusterPolicy: generate rule (NetworkPolicy on Namespace CREATE with synchronize:true)
- Kyverno ClusterPolicy: verifyImages (cosign keyless, mutateDigest:true, verifyDigest, subject regexp, rekor)
- Kyverno CLI: apply, test, report commands
- Kyverno test manifest (name/policies/resources/results with pass/fail/skip)
- Gatekeeper vs Kyverno 13-dimension comparison table
- ValidatingAdmissionPolicy (CEL, 1.30 GA): require-run-as-non-root + ValidatingAdmissionPolicyBinding
- Common policy library table: 20 policies with risk, recommended action, notes
- Kyverno: disallow-latest-tag (foreach deny with wildcard operator)
- Kyverno: require-requests-limits (foreach deny on missing cpu/memory requests and memory limits)
- PolicyException CRD: datadog-agent example with policyName+ruleNames+match+expiry pattern
- Gatekeeper exemption: admission.gatekeeper.sh/ignore label + per-constraint excludedNamespaces
- Exception tracking in Git (exceptions/ directory with README and dated tickets)
- Kyverno test suite (kyverno-test.yaml with fixtures + good/bad/excluded resources)
- conftest for Gatekeeper Rego (test, all-namespaces, parse K8s YAML to JSON)
- GitHub Actions policy CI (kyverno test + Kind integration test with PolicyReport failure check)
- Graduated rollout strategy: 5-step progression (audit → warn → enforce non-prod → enforce prod → global)
- Argo CD Application for policies (sync-wave -2, selfHeal:true, ServerSideApply)
- Policy repo structure (library/security/best-practices/supply-chain/networking + exceptions + tests)
- Gatekeeper Config CRD (sync Namespace/Service/Ingress into OPA cache for cross-resource policies)
- Kyverno PolicyReport and ClusterPolicyReport (get, jsonpath summary, jq fail extraction)
- Policy Reporter Helm install (ui/kyvernoPlugin/monitoring/Slack target)
- K8s audit policy for admission denials (RequestResponse level for pod create/update)
- Gatekeeper Prometheus metrics reference (violations/audit_last_run/request_count/latency)
- PrometheusRule: KyvernoWebhookPodsLow/GatekeeperHighViolations/PolicyWebhookHighLatency/KyvernoAuditStale/KyvernoCriticalPolicyViolation
- Grafana dashboard panels for policy (violations/admit-deny ratio/P99 latency/pass-fail trend)
- 8 best practices cards (audit-before-enforce/GitOps selfHeal/sync waves/PDB/minimize scope/CI testing/gate exceptions/no blanket ignore)