🛡️ Policy Enforcement

Policy Enforcement in Kubernetes

Complete guide to Kubernetes admission control, OPA Gatekeeper, Kyverno, Pod Security Admission, and policy-as-code workflows — ensuring every workload meets security, operational, and compliance standards before it lands in the cluster.

🧩 OPA Gatekeeper 🔒 Kyverno 🔐 Pod Security Admission ⚙️ Admission Webhooks 🧪 Policy Testing

Contents

  1. Admission Control Architecture
  2. Pod Security Admission
  3. OPA Gatekeeper
  4. Kyverno
  5. Gatekeeper vs Kyverno
  6. Common Policy Library
  7. Policy Exceptions
  8. Policy Testing
  9. Policy as Code in GitOps
  10. Audit & Compliance Reporting
  11. Alerting & Monitoring
  12. Best Practices

Admission Control Architecture

Every write request to the Kubernetes API (create, update, delete, connect) passes through an ordered admission chain before being persisted to etcd; read requests are not subject to admission. Policy engines hook into this chain as dynamic admission webhooks. Because the API server calls them synchronously, they act as enforceable gates rather than after-the-fact detectors.

kubectl apply / CI pipeline / GitOps reconciler
                    │
                    ▼
┌─────────────────────────────────────────────────────────────┐
│                       kube-apiserver                        │
│                                                             │
│ 1. Authentication (certificate / OIDC / bearer token)       │
│ 2. Authorization (RBAC / ABAC / Node)                       │
│ 3. Admission                                                │
│    ├── Mutating admission (webhooks called serially)        │
│    │   ├── Kyverno mutate webhook                           │
│    │   ├── Gatekeeper mutation webhook                      │
│    │   └── cert-manager / linkerd injectors                 │
│    ├── Object schema validation                             │
│    └── Validating admission (webhooks called in parallel)   │
│        ├── Gatekeeper validating webhook                    │
│        ├── Kyverno validate webhook                         │
│        └── Pod Security Admission (built-in plugin)         │
│                                                             │
│ 4. Persist to etcd (only if every step admits)              │
└─────────────────────────────────────────────────────────────┘
          │                              │
          ▼                              ▼
  ALLOW + audit log        DENY (HTTP 403 back to client)
⚠️
Webhook availability matters. Validating webhooks with failurePolicy: Fail block all matching requests if the webhook pod is unavailable. Deploy policy engines with podAntiAffinity across zones, set PodDisruptionBudgets, and size replicas to survive zone failure. failurePolicy: Ignore is an escape hatch but removes your guarantees.
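
The ordering and failure semantics above can be sketched as a toy model. This is not how kube-apiserver is implemented; the function and hook names are hypothetical. It only illustrates three points: mutating webhooks are invoked one after another, every validating webhook must admit, and failurePolicy decides the outcome when a webhook gives no answer.

```python
from typing import Callable, Optional

# A webhook is modelled as a function that returns True (admit), False (deny),
# or None (unreachable or timed out). All names here are illustrative.
Webhook = Callable[[dict], Optional[bool]]

def call(hook: Webhook, obj: dict, failure_policy: str) -> bool:
    """Resolve one webhook call, applying failurePolicy when there is no answer."""
    verdict = hook(obj)
    if verdict is None:
        return failure_policy == "Ignore"  # Fail rejects, Ignore admits
    return verdict

def admit(obj: dict, mutating: list, validating: list) -> bool:
    """mutating and validating are lists of (hook, failure_policy) pairs."""
    for hook, policy in mutating:  # serial: each hook sees earlier mutations
        if not call(hook, obj, policy):
            return False
    # Every validating webhook must admit; their order does not matter.
    return all(call(hook, obj, policy) for hook, policy in validating)

def add_team_label(obj: dict) -> bool:
    obj.setdefault("labels", {})["team"] = "platform"
    return True

def require_team_label(obj: dict) -> bool:
    return "team" in obj.get("labels", {})

def unreachable(obj: dict) -> None:
    return None

print(admit({}, [(add_team_label, "Fail")], [(require_team_label, "Fail")]))  # True
print(admit({}, [], [(unreachable, "Fail")]))    # False
print(admit({}, [], [(unreachable, "Ignore")]))  # True
```

The last two calls show the trade-off in the warning above: Fail turns an outage of the policy engine into rejected requests, while Ignore silently admits objects that were never checked.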

Webhook Configuration Anatomy

# ValidatingWebhookConfiguration (simplified — Gatekeeper creates this)
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingWebhookConfiguration
metadata:
  name: gatekeeper-validating-webhook-configuration
webhooks:
- name: validation.gatekeeper.sh
  admissionReviewVersions: ["v1"]
  clientConfig:
    service:
      name: gatekeeper-webhook-service
      namespace: gatekeeper-system
      path: /v1/admit
  rules:
  - apiGroups: ["*"]
    apiVersions: ["*"]
    operations: ["CREATE","UPDATE"]
    resources: ["*"]
    scope: "*"   # include cluster-scoped objects such as Namespaces
  namespaceSelector:
    matchExpressions:
    - key: admission.gatekeeper.sh/ignore
      operator: DoesNotExist
  failurePolicy: Fail
  sideEffects: None
  timeoutSeconds: 10
  matchPolicy: Equivalent   # catches admission of old API versions too

Built-in Admission Plugins

| Plugin | Type | Purpose | Enabled by Default |
|---|---|---|---|
| PodSecurity | Validating | Enforce Pod Security Standards per namespace | Yes (beta since 1.23, GA in 1.25) |
| LimitRanger | Mutating + Validating | Apply LimitRange defaults and enforce bounds | Yes |
| ResourceQuota | Validating | Block creation when namespace quota is exceeded | Yes |
| DefaultStorageClass | Mutating | Set the default StorageClass on PVCs that do not request one | Yes |
| MutatingAdmissionWebhook | Mutating | Call registered mutating webhooks | Yes |
| ValidatingAdmissionWebhook | Validating | Call registered validating webhooks | Yes |
| ValidatingAdmissionPolicy | Validating | CEL-based in-process policies (GA in 1.30) | Yes (1.30+) |

Pod Security Admission (PSA)

PSA is the built-in replacement for PodSecurityPolicy, which was deprecated in 1.21 and removed in 1.25. It enforces the three predefined Pod Security Standards at the namespace level via labels. It needs no CRDs and no external webhook, so there is nothing extra to deploy or keep available.

The Three Pod Security Standards

Privileged

Completely unrestricted. Allows all host namespaces, privileged containers, any capabilities. Reserved for system-level DaemonSets (CNI, node agents, eBPF tools) in kube-system-equivalent namespaces.

Baseline

Prevents known privilege escalations. Disallows privileged containers, hostPID/hostNetwork/hostIPC, hostPath volumes, hostPort, and adding capabilities beyond the default set (so NET_ADMIN and SYS_ADMIN are rejected). Suitable for most general workloads.

Restricted

Heavily hardened. Builds on Baseline and additionally requires runAsNonRoot: true, allowPrivilegeEscalation: false, dropping ALL capabilities (only NET_BIND_SERVICE may be added back), and seccompProfile: RuntimeDefault or Localhost. Volume types are limited to configMap, secret, projected, emptyDir, downwardAPI, csi, ephemeral, and persistentVolumeClaim. Target for internet-facing services.

PSA Modes

| Mode | Label Key | Behavior | Use Case |
|---|---|---|---|
| enforce | pod-security.kubernetes.io/enforce | Reject violating pods synchronously | Production namespaces |
| audit | pod-security.kubernetes.io/audit | Allow but add audit annotation to API audit log | Monitoring compliance without breakage |
| warn | pod-security.kubernetes.io/warn | Allow but return HTTP warning header to client | Gradual rollout: warns kubectl users |
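
The three modes are evaluated independently, which is what makes "enforce baseline, warn on restricted" possible. A minimal sketch of that logic follows, assuming the pod has already been classified by the strictest standard it satisfies; the function name and return shape are hypothetical, not part of any Kubernetes API.

```python
LEVELS = {"privileged": 0, "baseline": 1, "restricted": 2}
PREFIX = "pod-security.kubernetes.io/"

def evaluate(pod_level: str, ns_labels: dict) -> dict:
    """pod_level is the strictest standard the pod satisfies."""
    def violates(mode: str) -> bool:
        # A mode without a label behaves like "privileged" (nothing is checked).
        required = ns_labels.get(PREFIX + mode, "privileged")
        return LEVELS[pod_level] < LEVELS[required]

    return {
        "allowed": not violates("enforce"),     # enforce rejects the pod
        "warning": violates("warn"),            # warn only tells the client
        "audit_annotation": violates("audit"),  # audit only marks the audit log
    }

labels = {
    PREFIX + "enforce": "baseline",
    PREFIX + "warn": "restricted",
    PREFIX + "audit": "restricted",
}
print(evaluate("baseline", labels))
# {'allowed': True, 'warning': True, 'audit_annotation': True}
print(evaluate("privileged", labels)["allowed"])  # False
```

A baseline-compliant pod is admitted but still produces a warning and an audit annotation, which is the signal teams use to plan a move to restricted.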

Namespace Labeling Examples

# Recommended: enforce baseline, warn+audit restricted
apiVersion: v1
kind: Namespace
metadata:
  name: payments-api
  labels:
    # Reject pods that aren't at least baseline
    pod-security.kubernetes.io/enforce: baseline
    pod-security.kubernetes.io/enforce-version: latest
    # Warn and audit against restricted (see violations without breaking)
    pod-security.kubernetes.io/warn: restricted
    pod-security.kubernetes.io/warn-version: latest
    pod-security.kubernetes.io/audit: restricted
    pod-security.kubernetes.io/audit-version: latest
---
# Infrastructure namespace — privileged for CNI/monitoring DaemonSets
apiVersion: v1
kind: Namespace
metadata:
  name: monitoring
  labels:
    pod-security.kubernetes.io/enforce: privileged

Labeling Existing Namespaces with Dry-Run

# Check what would break before enforcing
kubectl label --dry-run=server --overwrite ns my-app \
  pod-security.kubernetes.io/enforce=restricted 2>&1 | grep -i warning

# Bulk check: which existing pods in any namespace would violate restricted
# (the server returns these warnings when the enforce label changes)
kubectl label --dry-run=server --overwrite ns --all \
  pod-security.kubernetes.io/enforce=restricted 2>&1 | grep -i warning

Compliant Pod securityContext for Restricted

spec:
  securityContext:
    runAsNonRoot: true
    runAsUser: 65532
    runAsGroup: 65532
    fsGroup: 65532
    seccompProfile:
      type: RuntimeDefault
  containers:
  - name: app
    securityContext:
      allowPrivilegeEscalation: false
      readOnlyRootFilesystem: true
      capabilities:
        drop: ["ALL"]
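
As a rough pre-flight check, the fields above can be verified before a manifest ever reaches the cluster. The sketch below covers only four of the restricted controls and only regular containers (no initContainers, ephemeral containers, volumes, or host settings), so it is an illustration of the rules rather than a substitute for PSA. The function name is hypothetical.

```python
def restricted_violations(pod_spec: dict) -> list[str]:
    """Check a subset of the restricted standard against a pod spec dict."""
    problems = []
    pod_sc = pod_spec.get("securityContext") or {}
    for container in pod_spec.get("containers", []):
        name = container.get("name", "?")
        sc = container.get("securityContext") or {}

        if sc.get("allowPrivilegeEscalation") is not False:
            problems.append(f"{name}: allowPrivilegeEscalation must be false")

        dropped = (sc.get("capabilities") or {}).get("drop") or []
        if "ALL" not in dropped:
            problems.append(f"{name}: capabilities.drop must include ALL")

        # Container-level settings override pod-level ones when both exist.
        run_as_non_root = sc.get("runAsNonRoot", pod_sc.get("runAsNonRoot"))
        if run_as_non_root is not True:
            problems.append(f"{name}: runAsNonRoot must be true")

        seccomp = sc.get("seccompProfile") or pod_sc.get("seccompProfile") or {}
        if seccomp.get("type") not in ("RuntimeDefault", "Localhost"):
            problems.append(f"{name}: seccompProfile must be RuntimeDefault or Localhost")
    return problems

compliant = {
    "securityContext": {
        "runAsNonRoot": True,
        "seccompProfile": {"type": "RuntimeDefault"},
    },
    "containers": [{
        "name": "app",
        "securityContext": {
            "allowPrivilegeEscalation": False,
            "readOnlyRootFilesystem": True,
            "capabilities": {"drop": ["ALL"]},
        },
    }],
}
print(restricted_violations(compliant))                          # []
print(len(restricted_violations({"containers": [{"name": "web"}]})))  # 4
```

Note that readOnlyRootFilesystem and the numeric runAsUser/runAsGroup/fsGroup values in the example are good practice but are not what the restricted standard itself checks in this sketch.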

OPA Gatekeeper

Gatekeeper extends OPA (Open Policy Agent) into Kubernetes as a set of CRDs. Policies are expressed as ConstraintTemplates (Rego logic) and instantiated as Constraints (parameters). This two-layer model lets platform teams ship reusable policy libraries while teams configure enforcement parameters.

ConstraintTemplate                     Constraint (instance)
  name: K8sRequiredLabels                kind: K8sRequiredLabels
  rego: deny if labels missing           name: require-team-label
  creates CRD: K8sRequiredLabels         params: labels: ["team","env"]
            │                                      │
            └──────────────────┬───────────────────┘
                               │
                 Gatekeeper admission webhook
                               │
                 Evaluates Rego against object
                               │
                     ┌─────────┴──────────┐
                   ALLOW       DENY (with violation message)

Audit controller polls all existing objects → populates
status.violations[] on each Constraint (catches pre-existing violations)

Install Gatekeeper

helm repo add gatekeeper https://open-policy-agent.github.io/gatekeeper/charts
helm repo update

helm install gatekeeper gatekeeper/gatekeeper \
  --namespace gatekeeper-system \
  --create-namespace \
  --version 3.17.1 \
  --set replicas=3 \
  --set auditInterval=60 \
  --set auditMatchKindOnly=false \
  --set constraintViolationsLimit=100 \
  --set disableMutation=false \
  --set logDenies=true \
  --set emitAdmissionEvents=true \
  --set emitAuditEvents=true

ConstraintTemplate: Required Labels

apiVersion: templates.gatekeeper.sh/v1
kind: ConstraintTemplate
metadata:
  name: k8srequiredlabels
  annotations:
    metadata.gatekeeper.sh/title: "Required Labels"
    metadata.gatekeeper.sh/description: "Requires specified labels on resources"
spec:
  crd:
    spec:
      names:
        kind: K8sRequiredLabels
      validation:
        openAPIV3Schema:
          type: object
          properties:
            labels:
              type: array
              items:
                type: string
  targets:
  - target: admission.k8s.gatekeeper.sh
    rego: |
      package k8srequiredlabels

      violation[{"msg": msg}] {
        provided := {label | input.review.object.metadata.labels[label]}
        required := {label | label := input.parameters.labels[_]}
        missing  := required - provided
        count(missing) > 0
        msg := sprintf("Missing required labels: %v", [missing])
      }
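
For readers new to Rego, the rule above is a set difference: the required labels minus the labels present on the object. The Python sketch below mirrors that logic for illustration only; Gatekeeper evaluates the Rego, and these function names are hypothetical.

```python
from typing import Optional

def missing_labels(obj: dict, required: list[str]) -> set[str]:
    """Required label keys that are absent from the object's metadata."""
    provided = set((obj.get("metadata") or {}).get("labels") or {})
    return set(required) - provided

def violation_message(obj: dict, required: list[str]) -> Optional[str]:
    """Return a message when labels are missing, or None when the object passes."""
    missing = missing_labels(obj, required)
    if not missing:
        return None
    return "Missing required labels: " + ", ".join(sorted(missing))

namespace = {"metadata": {"name": "payments-api", "labels": {"team": "payments"}}}
print(violation_message(namespace, ["team", "env", "cost-center"]))
# Missing required labels: cost-center, env
```

As in the Rego version, only the presence of each key is checked; label values are not validated.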

Constraint: Enforce Labels on Namespaces

apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sRequiredLabels
metadata:
  name: require-team-env-labels
spec:
  enforcementAction: deny     # deny | warn | dryrun
  match:
    kinds:
    - apiGroups: [""]
      kinds: ["Namespace"]
    excludedNamespaces:
    - kube-system
    - kube-public
    - gatekeeper-system
    - cert-manager
    - argocd
  parameters:
    labels: ["team", "env", "cost-center"]

ConstraintTemplate: Allowed Image Registries

apiVersion: templates.gatekeeper.sh/v1
kind: ConstraintTemplate
metadata:
  name: k8sallowedrepos
spec:
  crd:
    spec:
      names:
        kind: K8sAllowedRepos
      validation:
        openAPIV3Schema:
          type: object
          properties:
            repos:
              type: array
              items: {type: string}
  targets:
  - target: admission.k8s.gatekeeper.sh
    rego: |
      package k8sallowedrepos

      violation[{"msg": msg}] {
        container := input_containers[_]
        not strings.any_prefix_match(container.image, input.parameters.repos)
        msg := sprintf("Container %q image %q not from allowed registries %v",
          [container.name, container.image, input.parameters.repos])
      }

      input_containers[c] {
        c := input.review.object.spec.containers[_]
      }
      input_containers[c] {
        c := input.review.object.spec.initContainers[_]
      }
      input_containers[c] {
        c := input.review.object.spec.ephemeralContainers[_]
      }
---
apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sAllowedRepos
metadata:
  name: allowed-image-registries
spec:
  enforcementAction: deny
  match:
    kinds:
    - apiGroups: [""]
      kinds: ["Pod"]
  parameters:
    repos:
    - "123456789.dkr.ecr.us-east-1.amazonaws.com/"
    - "gcr.io/distroless/"
    - "registry.k8s.io/"

Gatekeeper Mutation

# Add a default cost-center label to workloads that lack one
apiVersion: mutations.gatekeeper.sh/v1
kind: AssignMetadata
metadata:
  name: add-cost-center-label
spec:
  match:
    scope: Namespaced
    kinds:
    - apiGroups: ["apps"]
      kinds: ["Deployment","StatefulSet","DaemonSet"]
  location: "metadata.labels.cost-center"
  parameters:
    assign:
      value: "platform"
---
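# Set readOnlyRootFilesystem only where it is not already set
apiVersion: mutations.gatekeeper.sh/v1
kind: Assign
metadata:
  name: set-readonly-rootfs
spec:
  applyTo:                # required for Assign: which kinds may be mutated
  - groups: [""]
    kinds: ["Pod"]
    versions: ["v1"]
  match:
    scope: Namespaced
    kinds:
    - apiGroups: [""]
      kinds: ["Pod"]
  location: "spec.containers[name:*].securityContext.readOnlyRootFilesystem"
  parameters:
    pathTests:            # skip containers that already set a value
    - subPath: "spec.containers[name:*].securityContext.readOnlyRootFilesystem"
      condition: MustNotExist
    assign:
      value: true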

Reading Audit Violations

# Show all violations across all constraints
kubectl get constraints -A
kubectl describe k8srequiredlabels require-team-env-labels

# Structured violation output
kubectl get k8srequiredlabels require-team-env-labels -o json \
  | jq '.status.violations'

# Count by constraint (status.violations is capped by constraintViolationsLimit,
# so prefer totalViolations when it is present)
kubectl get constraints -o json | jq '
  .items[] | {
    constraint: .metadata.name,
    violations: (.status.totalViolations // (.status.violations | length))
  }'

Kyverno

Kyverno is a Kubernetes-native policy engine that uses YAML (with JMESPath expressions) instead of a separate policy language. Policies are ClusterPolicy or namespace-scoped Policy resources whose rules can validate, mutate, generate, or verify images (verifyImages).

Install Kyverno

helm repo add kyverno https://kyverno.github.io/kyverno/
helm repo update

helm install kyverno kyverno/kyverno \
  --namespace kyverno \
  --create-namespace \
  --version 3.2.6 \
  --set admissionController.replicas=3 \
  --set backgroundController.replicas=2 \
  --set cleanupController.replicas=1 \
  --set reportsController.replicas=1 \
  --set admissionController.container.args.enableDeferredLoading=true \
  --set features.policyExceptions.enabled=true \
  --set features.globalContext.enabled=true

ClusterPolicy: Validate

apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: require-pod-probes
  annotations:
    policies.kyverno.io/title: Require Liveness and Readiness Probes
    policies.kyverno.io/category: Best Practices
    policies.kyverno.io/severity: medium
    policies.kyverno.io/subject: Pod
    policies.kyverno.io/description: >-
      Pods without liveness probes cannot be automatically restarted.
      Pods without readiness probes receive traffic before they are ready.
spec:
  validationFailureAction: Enforce   # Enforce | Audit
  background: true   # also scan existing resources
  rules:
  - name: check-container-probes
    match:
      any:
      - resources:
          kinds: ["Pod"]
          operations: ["CREATE","UPDATE"]
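    exclude:
      any:
      - resources:
          namespaces: ["kube-system","monitoring","kyverno","argocd"]
      # Exclusions on subjects (users, ServiceAccounts) rely on request user
      # info, which background scans do not have. Put them in a separate
      # policy with background: false.
    validate:
      message: "Liveness and readiness probes are required for all containers."
      foreach:
      - list: "request.object.spec.containers"
        deny:
          conditions:
            any:
            # element.keys(@) lists the fields present on the container, so a
            # missing probe is detected without dereferencing an absent field
            - key: livenessProbe
              operator: AllNotIn
              value: "{{ element.keys(@)[] }}"
            - key: readinessProbe
              operator: AllNotIn
              value: "{{ element.keys(@)[] }}"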

ClusterPolicy: Mutate — Inject Default Resources

apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: add-default-resources
spec:
  validationFailureAction: Audit
  background: false
  rules:
  - name: add-resource-defaults
    match:
      any:
      - resources:
          kinds: ["Pod"]
          operations: ["CREATE"]
    mutate:
      foreach:
      - list: "request.object.spec.containers[]"
        patchStrategicMerge:
          spec:
            containers:
            - name: "{{ element.name }}"
              resources:
                requests:
                  cpu: "{{ element.resources.requests.cpu || '100m' }}"
                  memory: "{{ element.resources.requests.memory || '128Mi' }}"
                limits:
                  memory: "{{ element.resources.limits.memory || '256Mi' }}"
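
The `||` in those JMESPath expressions means "use the existing value, otherwise this default". The sketch below reproduces that defaulting in Python so the result can be inspected offline. It is an approximation of what the patch produces for one container, not Kyverno's implementation, and the function name is hypothetical.

```python
def with_defaults(container: dict) -> dict:
    """Return the resources block after applying the policy's defaults."""
    resources = container.get("resources") or {}
    requests = resources.get("requests") or {}
    limits = resources.get("limits") or {}
    return {
        "requests": {
            **requests,  # keep anything the author already set
            "cpu": requests.get("cpu") or "100m",
            "memory": requests.get("memory") or "128Mi",
        },
        "limits": {**limits, "memory": limits.get("memory") or "256Mi"},
    }

print(with_defaults({"name": "app"}))
# {'requests': {'cpu': '100m', 'memory': '128Mi'}, 'limits': {'memory': '256Mi'}}

big = {"name": "cache", "resources": {"requests": {"memory": "512Mi"}}}
print(with_defaults(big)["limits"]["memory"])  # 256Mi
```

The second case is worth checking against your own workloads: a container that requests 512Mi but sets no limit ends up with a 256Mi default limit, which is lower than its request. Fixed defaults like these work best when they are at least as large as any request teams are likely to set.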

ClusterPolicy: Generate — NetworkPolicy on Namespace Create

apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: generate-default-network-policy
spec:
  rules:
  - name: generate-deny-all-ingress
    match:
      any:
      - resources:
          kinds: ["Namespace"]
          operations: ["CREATE"]
    exclude:
      any:
      - resources:
          names: ["kube-system","kube-public","kube-node-lease",
                  "monitoring","kyverno","argocd","cert-manager"]
    generate:
      synchronize: true   # keep in sync; delete if namespace deleted
      apiVersion: networking.k8s.io/v1
      kind: NetworkPolicy
      name: deny-all-ingress
      namespace: "{{ request.object.metadata.name }}"
      data:
        spec:
          podSelector: {}
          policyTypes: ["Ingress"]
          # No ingress rules = deny all. Teams must add their own allow rules.

ClusterPolicy: Verify Image Signatures

apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: verify-image-signatures
spec:
  validationFailureAction: Enforce
  background: false
  webhookTimeoutSeconds: 30
  rules:
  - name: verify-cosign-keyless
    match:
      any:
      - resources:
          kinds: ["Pod"]
          operations: ["CREATE","UPDATE"]
    verifyImages:
    - imageReferences:
      - "123456789.dkr.ecr.us-east-1.amazonaws.com/*"
      mutateDigest: true       # replace tag with digest (immutability)
      verifyDigest: true
      required: true
      attestors:
      - count: 1
        entries:
        - keyless:
            subject: "https://github.com/myorg/myrepo/.github/workflows/ci.yaml@refs/heads/main"
            issuer: "https://token.actions.githubusercontent.com"
            rekor:
              url: https://rekor.sigstore.dev
ℹ️
mutateDigest: true is a security-critical feature. At admission time it rewrites tag references such as :v1.2.3 to the verified @sha256:... digest, so a tag that is later repointed cannot change what the admitted pod runs.

Kyverno CLI — Test Locally

# Install
brew install kyverno   # or: curl -sL https://github.com/kyverno/kyverno/releases/...

# Test a policy against resources without a cluster
kyverno apply policy.yaml --resource pod.yaml

# Test a policy with values (simulating variables)
kyverno apply policy.yaml --resource pod.yaml \
  --values-file values.yaml

# Run a full test suite
kyverno test .    # looks for kyverno-test.yaml in current dir

# Evaluate a policy against live cluster resources and print a policy report
kyverno apply policy.yaml --cluster --policy-report

Kyverno Test Manifest

# kyverno-test.yaml
apiVersion: cli.kyverno.io/v1alpha1
kind: Test
metadata:
  name: require-pod-probes-test
policies:
- require-pod-probes.yaml
resources:
- good-pod.yaml
- bad-pod.yaml
results:
- policy: require-pod-probes
  rule: check-container-probes
  kind: Pod
  resources: [good-pod]
  result: pass
- policy: require-pod-probes
  rule: check-container-probes
  kind: Pod
  resources: [bad-pod]
  result: fail

Gatekeeper vs Kyverno

| Dimension | OPA Gatekeeper | Kyverno |
|---|---|---|
| Policy language | Rego (OPA's declarative language) | YAML + JMESPath expressions |
| Learning curve | High (Rego has unique syntax/semantics) | Low (K8s-native YAML) |
| Rule types | Validate + Mutate (separate CRDs) | Validate + Mutate + Generate in one ClusterPolicy |
| Mutation support | AssignMetadata, Assign, ModifySet CRDs | patchStrategicMerge, patchesJson6902, foreach |
| Resource generation | No native generate | Yes: generate NetworkPolicy, RBAC, etc. on triggers |
| Image verification | Not built in (pair with Kyverno or an external data provider) | Yes: verifyImages with cosign keyless/key-based |
| External data | OPA external data providers (HTTP cache) | Global context (K8s resources, API calls) |
| Testing tooling | gator, conftest, opa test | kyverno test (full test suite with pass/fail) |
| Policy Reports | via status.violations (custom) | PolicyReport / ClusterPolicyReport CRDs |
| Audit mode | Separate audit controller, enforcementAction: dryrun | background: true + validationFailureAction: Audit |
| Exceptions | excludedNamespaces in match, exemption label on namespaces | PolicyException CRD |
| Community adoption | Part of OPA, a CNCF graduated project; strong enterprise adoption | CNCF project; rapidly growing; preferred for K8s teams |
| Best for | Teams already using OPA/Rego for other policy; complex cross-resource logic | Teams wanting K8s-native approach; need generate rules; image signing |
ℹ️
You can run both. A common pattern: Kyverno handles validate + mutate + generate + image verification; Gatekeeper handles complex Rego-based cross-resource validation (e.g., "no two services with same external hostname"). Ensure webhook timeout budgets don't cascade.
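
One way to reason about the timeout budget: mutating webhooks are called in sequence, so their timeouts add up, while validating webhooks are called in parallel, so only the slowest one counts. The sketch below turns that into a worst-case estimate. It ignores reinvocation and network overhead, and the 3-second and second 10-second values are hypothetical placeholders; the 10 and 30 correspond to the timeoutSeconds and webhookTimeoutSeconds values used elsewhere in this guide.

```python
def worst_case_admission_seconds(mutating: list[int], validating: list[int]) -> int:
    """Upper bound on time spent in webhooks for one request."""
    serial = sum(mutating)                 # mutating webhooks run one by one
    parallel = max(validating, default=0)  # validating webhooks run together
    return serial + parallel

# Kyverno mutate (10s), Gatekeeper mutation (3s, assumed),
# Gatekeeper validate (10s), Kyverno verifyImages policy (30s)
print(worst_case_admission_seconds([10, 3], [10, 30]))  # 43
```

If the result approaches the client or API server request timeout, a slow webhook can surface as a generic timeout instead of a clear policy denial.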

ValidatingAdmissionPolicy (VAP) — Built-in CEL

Kubernetes 1.30 GA'd ValidatingAdmissionPolicy — lightweight CEL-based validation without an external webhook. Ideal for simple constraints where you don't want another running pod.

apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingAdmissionPolicy
metadata:
  name: require-run-as-non-root
spec:
  failurePolicy: Fail
  matchConstraints:
    resourceRules:
    - apiGroups: [""]
      apiVersions: ["v1"]
      operations: ["CREATE","UPDATE"]
      resources: ["pods"]
  validations:
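  - expression: >-
      (has(object.spec.securityContext) &&
       has(object.spec.securityContext.runAsNonRoot) &&
       object.spec.securityContext.runAsNonRoot) ||
      object.spec.containers.all(c,
        has(c.securityContext) &&
        has(c.securityContext.runAsNonRoot) &&
        c.securityContext.runAsNonRoot)
    message: "Pods must set runAsNonRoot: true"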
---
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingAdmissionPolicyBinding
metadata:
  name: require-run-as-non-root-binding
spec:
  policyName: require-run-as-non-root
  validationActions: [Deny]
  matchResources:
    namespaceSelector:
      matchLabels:
        enforce-security: "true"

Common Policy Library

These are the policies every production platform should enforce. Ship them in enforcementAction: dryrun / Audit first to find violations, then graduate to deny / Enforce.

| Policy | Risk Mitigated | Recommended Action | Notes |
|---|---|---|---|
| Require resource limits & requests | CPU/memory OOM, node pressure | Enforce | Use LimitRange defaults as fallback |
| Require liveness + readiness probes | Traffic to not-ready pods, no auto-restart | Enforce | Exclude Jobs and one-shot containers |
| Disallow privileged containers | Node escape, kernel access | Enforce | Allow in kube-system with annotation |
| Disallow hostPID / hostIPC / hostNetwork | Process snooping, network sniffing | Enforce | Allow for CNI/monitoring DaemonSets |
| Disallow hostPath volumes | Host filesystem read/write | Enforce | Allow specific paths for log agents |
| Require non-root user | Container breakout impact | Enforce | Set runAsNonRoot + runAsUser ≥ 1000 |
| Disallow privilege escalation | setuid binary exploitation | Enforce | allowPrivilegeEscalation: false |
| Drop ALL capabilities | Linux capability abuse | Enforce | Allow NET_BIND_SERVICE explicitly |
| Require readOnlyRootFilesystem | Reduce attack surface for malware | Audit → Enforce | Many apps need emptyDir for /tmp |
| Require seccompProfile: RuntimeDefault | Syscall surface reduction | Enforce | Required for restricted PSS |
| Allowed image registries | Supply chain compromise | Enforce | Allowlist internal ECR + distroless |
| Require image digest (not tag) | Tag mutation attacks | Enforce (via mutateDigest) | Kyverno verifyImages handles this |
| Require cosign signature | Unsigned / untrusted images | Enforce | Gate on specific namespaces first |
| Disallow latest tag | Non-reproducible deployments | Enforce | Block pods with image ending in :latest |
| Require team + env labels | Ownership, cost attribution | Enforce on Namespace/Deployment | Required for cost chargeback |
| Require PodDisruptionBudget | Accidental zero-replica drain | Audit | Complex to enforce; use audit + alert |
| Require NetworkPolicy exists | Unrestricted lateral movement | Audit → Enforce | Kyverno generate creates deny-all |
| Max replicas without HPA | Over-provisioning | Audit | Warn if replicas > 3 without HPA |
| Disallow NodePort services | Direct node exposure | Enforce | Allow ClusterIP + LoadBalancer only |
| Require Ingress TLS | Plaintext traffic | Enforce | Check spec.tls is set |

Kyverno: Disallow Latest Tag

apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: disallow-latest-tag
  annotations:
    policies.kyverno.io/title: Disallow Latest Tag
    policies.kyverno.io/severity: medium
spec:
  validationFailureAction: Enforce
  background: true
  rules:
  - name: require-image-tag
    match:
      any:
      - resources:
          kinds: ["Pod"]
          operations: ["CREATE","UPDATE"]
    exclude:
      any:
      - resources:
          namespaces: ["kube-system"]
    validate:
      message: "Image tag ':latest' or missing tag is not allowed. Use a specific tag or digest."
      foreach:
      - list: "request.object.spec.containers"
        deny:
          conditions:
            any:
            - key: "{{ element.image }}"
              operator: Equals
              value: "*:latest"
            - key: "{{ element.image }}"
              operator: NotEquals
              value: "*:*"   # images without any tag

Kyverno: Require Resource Limits

apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: require-requests-limits
  annotations:
    policies.kyverno.io/title: Require Resource Requests and Limits
    policies.kyverno.io/severity: high
spec:
  validationFailureAction: Enforce
  background: true
  rules:
  - name: validate-resources
    match:
      any:
      - resources:
          kinds: ["Pod"]
          operations: ["CREATE","UPDATE"]
    exclude:
      any:
      - resources:
          namespaces: ["kube-system","kyverno"]
    validate:
      message: "CPU/memory requests and memory limits are required on all containers."
      foreach:
      - list: "request.object.spec.containers"
        deny:
          conditions:
            any:
            - key: "{{ element.resources.requests.cpu || '' }}"
              operator: Equals
              value: ""
            - key: "{{ element.resources.requests.memory || '' }}"
              operator: Equals
              value: ""
            - key: "{{ element.resources.limits.memory || '' }}"
              operator: Equals
              value: ""

Policy Exceptions

No policy library is perfect — there will always be legitimate exceptions (legacy apps, specialized system components, vendor-provided DaemonSets). Manage these explicitly rather than widening the policy's exclusion scope.

Kyverno PolicyException

apiVersion: kyverno.io/v2
kind: PolicyException
metadata:
  name: datadog-agent-exception
  namespace: monitoring        # exceptions are namespaced
spec:
  exceptions:
  - policyName: disallow-privileged-containers
    ruleNames:
    - check-privileged
  - policyName: require-requests-limits
    ruleNames:
    - validate-resources
  match:
    any:
    - resources:
        kinds: ["Pod"]
        namespaces: ["monitoring"]
        selector:
          matchLabels:
            app: datadog-agent
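  # podSecurity lists Pod Security Standard controls to exempt (none here).
  # Expiry is not expressed in this spec; record a review date in an
  # annotation or in the Git exception registry to force re-review.
  podSecurity: []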
⚠️
Gate PolicyException creation. Use RBAC to allow only the platform team to create PolicyExceptions. Add a Kyverno policy that requires annotations: approved-by: platform-team on every PolicyException. Track all exceptions in a GitOps repo for auditability.

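Gatekeeper Exemption via Namespace Label

# Exclude a namespace from ALL Gatekeeper webhooks.
# Gatekeeper only accepts this label on namespaces listed in its
# --exempt-namespace flags, so add the namespace there first.
kubectl label namespace legacy-app \
  admission.gatekeeper.sh/ignore=no-validation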

# Per-constraint: use excludedNamespaces in Constraint spec
spec:
  match:
    excludedNamespaces:
    - legacy-app
    - vendor-system

Exception Tracking in Git

policy/
  library/
    require-probes.yaml
    disallow-privileged.yaml
    allowed-registries.yaml
  constraints/
    cluster-wide-constraints.yaml
  exceptions/
    README.md          # exception registry with justification
    datadog-agent.yaml
    legacy-payment-service.yaml   # expires: 2025-12-31, ticket: PLAT-4892
  tests/
    require-probes/
      kyverno-test.yaml
      good-pod.yaml
      bad-pod.yaml
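
Expiry dates like the one noted on legacy-payment-service.yaml are only useful if something checks them. A minimal sketch of a CI check follows, assuming each exception manifest carries its expiry in an annotation. The annotation key `policy.example.com/expires` is hypothetical, and the manifests are shown as already-parsed dictionaries to keep the example free of third-party YAML libraries.

```python
from datetime import date

EXPIRY_ANNOTATION = "policy.example.com/expires"  # hypothetical key

def expired_exceptions(exceptions: list[dict], today: date) -> list[str]:
    """Names of exceptions that are past their expiry or have none recorded."""
    flagged = []
    for exc in exceptions:
        meta = exc.get("metadata") or {}
        raw = (meta.get("annotations") or {}).get(EXPIRY_ANNOTATION)
        # A missing date is treated as a failure so every exception gets reviewed.
        if raw is None or date.fromisoformat(raw) < today:
            flagged.append(meta.get("name", "<unnamed>"))
    return flagged

exceptions = [
    {"metadata": {"name": "legacy-payment-service",
                  "annotations": {EXPIRY_ANNOTATION: "2025-12-31"}}},
    {"metadata": {"name": "datadog-agent-exception",
                  "annotations": {EXPIRY_ANNOTATION: "2026-06-30"}}},
    {"metadata": {"name": "no-date-exception"}},
]
print(expired_exceptions(exceptions, date(2026, 1, 1)))
# ['legacy-payment-service', 'no-date-exception']
```

Failing the pipeline on a non-empty result turns the expiry comment in Git into an enforced review deadline.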

Policy Testing

Policy changes must be tested before reaching production. The testing pyramid applies: unit tests (kyverno test / conftest), integration against a Kind cluster, then graduated rollout in audit mode.

Kyverno Test Suite Structure

# tests/require-probes/kyverno-test.yaml
apiVersion: cli.kyverno.io/v1alpha1
kind: Test
metadata:
  name: require-probes-tests
policies:
- ../../library/require-probes.yaml
resources:
- good-pod.yaml
- bad-pod-no-liveness.yaml
- bad-pod-no-readiness.yaml
- excluded-namespace-pod.yaml
variables: variables.yaml
results:
- policy: require-pod-probes
  rule: check-container-probes
  kind: Pod
  resources: [good-pod]
  result: pass
- policy: require-pod-probes
  rule: check-container-probes
  kind: Pod
  resources: [bad-pod-no-liveness]
  result: fail
- policy: require-pod-probes
  rule: check-container-probes
  kind: Pod
  resources: [bad-pod-no-readiness]
  result: fail
- policy: require-pod-probes
  rule: check-container-probes
  kind: Pod
  resources: [excluded-namespace-pod]  # in kube-system
  result: skip

conftest for Gatekeeper Rego

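# Test Rego policy logic in isolation.
# Gatekeeper rules read input.review.object and input.parameters, so the
# input file must use that shape: {"review": {"object": {...}}, "parameters": {...}}
conftest test pod.json \
  --policy policy/library/required-labels.rego \
  --namespace k8srequiredlabels

# Test all policies against a directory of inputs in that same shape
conftest test manifests/ \
  --policy policy/library/ \
  --all-namespaces

# Wrap a live object in the Gatekeeper input shape before piping it to conftest
kubectl get pod my-pod -o json \
  | jq '{review: {object: .}, parameters: {labels: ["team", "env"]}}' \
  | conftest test - \
      --policy policy/library/required-labels.rego \
      --namespace k8srequiredlabels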

GitHub Actions Policy CI

name: Policy Tests
on:
  pull_request:
    paths: ["policy/**"]

jobs:
  kyverno-test:
    runs-on: ubuntu-latest
    steps:
    - uses: actions/checkout@v4
    - name: Install Kyverno CLI
      run: |
        curl -sL "https://github.com/kyverno/kyverno/releases/download/v1.12.0/kyverno-cli_v1.12.0_linux_x86_64.tar.gz" \
          | sudo tar -xz -C /usr/local/bin kyverno
    - name: Run Kyverno tests
      run: kyverno test policy/tests/

  audit-against-kind:
    runs-on: ubuntu-latest
    steps:
    - uses: actions/checkout@v4
    - name: Create Kind cluster
      uses: helm/kind-action@v1.9.0
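    - name: Install Kyverno
      run: |
        helm repo add kyverno https://kyverno.github.io/kyverno/
        helm repo update
        helm install kyverno kyverno/kyverno -n kyverno --create-namespace --wait
    - name: Apply policies in Audit mode
      run: |
        # Rewrite Enforce to Audit per file, then apply each one
        find policy/library -name "*.yaml" -print0 | while IFS= read -r -d '' f; do
          sed 's/validationFailureAction: Enforce/validationFailureAction: Audit/' "$f" \
            | kubectl apply -f -
        done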
    - name: Apply test manifests
      run: kubectl apply -f policy/tests/fixtures/
    - name: Check policy reports
      run: |
        kubectl get policyreport -A -o json | \
          jq '.items[].results[] | select(.result=="fail") | .message' | \
          tee /tmp/violations.txt
        # Fail CI if unexpected violations
        [ -s /tmp/violations.txt ] && exit 1 || true

Graduated Rollout Strategy

  1. Audit mode cluster-wide — deploy with validationFailureAction: Audit. Let the audit controller scan all existing resources. Review PolicyReport violations.

  2. Warn mode on new namespaces — label new namespaces with the PSA warn label. Developers see warnings in kubectl output without failures.

  3. Enforce on non-production first — flip to Enforce in dev/staging namespaces. Fix violations that surface.

  4. Enforce on production — after 2-week soak in staging with zero new violations, promote to production. Document timeline in GitOps PR.

  5. Enforce globally — remove per-namespace overrides. Only named PolicyExceptions remain.

Policy as Code in GitOps

Policies must be in Git — not applied ad-hoc. The GitOps loop (see 02-gitops.html) ensures policies are version-controlled, reviewed, and automatically reconciled. Drift from approved policy triggers alerts.

Argo CD Application for Policy

apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: cluster-policies
  namespace: argocd
  annotations:
    argocd.argoproj.io/sync-wave: "-2"   # before workloads
spec:
  project: platform
  source:
    repoURL: https://github.com/myorg/platform
    targetRevision: main
    path: policy/
    kustomize:
      namePrefix: ""
  destination:
    server: https://kubernetes.default.svc
    namespace: kyverno
  syncPolicy:
    automated:
      prune: true
      selfHeal: true    # revert manual policy changes immediately
    syncOptions:
    - CreateNamespace=true
    - ServerSideApply=true

Policy Repo Structure

policy/
├── kustomization.yaml
├── library/
│   ├── security/
│   │   ├── disallow-privileged.yaml
│   │   ├── require-non-root.yaml
│   │   ├── drop-all-capabilities.yaml
│   │   ├── disallow-host-namespaces.yaml
│   │   └── require-seccomp.yaml
│   ├── best-practices/
│   │   ├── require-probes.yaml
│   │   ├── require-resource-limits.yaml
│   │   ├── disallow-latest-tag.yaml
│   │   └── require-labels.yaml
│   ├── supply-chain/
│   │   ├── allowed-registries.yaml
│   │   └── verify-image-signatures.yaml
│   └── networking/
│       ├── disallow-nodeport.yaml
│       └── require-ingress-tls.yaml
├── exceptions/
│   ├── kustomization.yaml
│   └── datadog-agent.yaml
└── tests/
    └── ...

Gatekeeper Config: Syncing Resources for Cross-Object Policies

# Gatekeeper Config — sync K8s resources into OPA cache
# (needed for policies that reference other objects)
apiVersion: config.gatekeeper.sh/v1alpha1
kind: Config
metadata:
  name: config
  namespace: gatekeeper-system
spec:
  sync:
    syncOnly:
    - group: ""
      version: "v1"
      kind: Namespace
    - group: ""
      version: "v1"
      kind: Service
    - group: "networking.k8s.io"
      version: "v1"
      kind: Ingress
  validation:
    traces: []

Audit & Compliance Reporting

Kyverno PolicyReport

# PolicyReport is namespaced — one per namespace
kubectl get policyreport -n payments-api -o yaml

# ClusterPolicyReport is cluster-scoped — for non-namespaced resources
kubectl get clusterpolicyreport -o yaml

# Summary: pass/fail/warn/error counts per policy
kubectl get policyreport -A \
  -o jsonpath='{range .items[*]}{.metadata.namespace}: {.summary}{"\n"}{end}'

# Find all failing resources across cluster
kubectl get policyreport -A -o json | jq '
  .items[] | .metadata.namespace as $ns |
  .results[] | select(.result == "fail") | {
    namespace: $ns,
    resource: .resources[0].name,
    policy: .policy,
    message: .message
  }'

Policy Reporter Dashboard

# Policy Reporter: visualization layer over PolicyReports
# monitoring.enabled=true creates a ServiceMonitor
# (value names are as in the original and can differ between chart versions)
helm install policy-reporter policy-reporter/policy-reporter \
  --namespace policy-reporter \
  --create-namespace \
  --set ui.enabled=true \
  --set kyvernoPlugin.enabled=true \
  --set monitoring.enabled=true \
  --set target.slack.webhook="https://hooks.slack.com/..." \
  --set target.slack.minimumSeverity="high"

K8s Audit Logs for Policy Actions

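# Audit policy on kube-apiserver: record pod and deployment writes,
# including requests rejected by admission webhooks
apiVersion: audit.k8s.io/v1
kind: Policy
rules:
# Log creates/updates at RequestResponse level; denials are found by response code
- level: RequestResponse
  omitStages: []
  resources:
  - group: ""          # core API group
    resources: ["pods"]
  - group: "apps"      # deployments live in the apps group, not the core group
    resources: ["deployments"]
  verbs: ["create","update"]
  # Filter in SIEM/log pipeline for responseStatus.code=403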

Alerting & Monitoring

Gatekeeper Prometheus Metrics

# Key metrics exposed by Gatekeeper (audit and webhook pods)
# Names can vary slightly by Gatekeeper version; confirm against the /metrics endpoint
gatekeeper_violations                           # gauge: current audit violations per enforcement action
gatekeeper_audit_last_run_time                  # gauge: Unix timestamp of last audit
gatekeeper_audit_duration_seconds               # histogram: audit run duration
gatekeeper_validation_request_count             # counter: validating webhook requests (label: admission_status)
gatekeeper_validation_request_duration_seconds  # histogram: validating webhook latency

PrometheusRule

apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: policy-enforcement-alerts
  namespace: monitoring
  labels:
    prometheus: kube-prometheus
    role: alert-rules
spec:
  groups:
  - name: policy.enforcement
    interval: 60s
    rules:

    # Kyverno webhook pod health
    - alert: KyvernoWebhookPodsLow
      expr: |
        kube_deployment_status_replicas_available{
          namespace="kyverno",
          deployment="kyverno-admission-controller"
        } < 2
      for: 5m
      labels:
        severity: critical
      annotations:
        summary: "Kyverno admission controller replicas below minimum"
        description: "Only {{ $value }} replica(s) available. Policy enforcement may be degraded."

    # Gatekeeper audit violations spiking
    - alert: GatekeeperHighViolations
      expr: |
        sum(gatekeeper_violations) by (enforcement_action) > 50
      for: 15m
      labels:
        severity: warning
      annotations:
        summary: "High number of Gatekeeper violations in audit"
        description: "{{ $value }} violations detected. Run 'kubectl get constraints -A' to review."

    # Policy webhook latency
    - alert: PolicyWebhookHighLatency
      expr: |
        histogram_quantile(0.99,
          rate(gatekeeper_validation_request_duration_seconds_bucket[5m])
        ) > 2
      for: 5m
      labels:
        severity: warning
      annotations:
        summary: "Gatekeeper webhook P99 latency exceeds 2s"
        description: "High webhook latency may cause timeout-based admission failures."

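    # Kyverno background scan stale: no background-scan results in the last hour
    # (label names follow Kyverno's documented metrics; verify against your version)
    - alert: KyvernoAuditStale
      expr: |
        (
          sum(increase(kyverno_policy_results_total{rule_execution_cause="background_scan"}[1h]))
          or vector(0)
        ) == 0
      for: 10m
      labels:
        severity: warning
      annotations:
        summary: "Kyverno policy audit not running"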

    # New validate-rule failures recorded by Kyverno
    - alert: KyvernoCriticalPolicyViolation
      expr: |
        increase(kyverno_policy_results_total{
          rule_type="validate",
          rule_result="fail"
        }[10m]) > 0
      for: 0m
      labels:
        severity: warning
      annotations:
        summary: "New Kyverno policy violations detected"
        description: "Check 'kubectl get policyreport -A' for details."

Grafana Dashboard Panels

| Panel | Query | Visualization |
|---|---|---|
| Violations by enforcement action | sum by (enforcement_action) (gatekeeper_violations) | Bar chart |
| Webhook admit/deny ratio | sum by (admission_status) (rate(gatekeeper_validation_request_count[5m])) | Stacked area |
| P99 webhook latency | histogram_quantile(0.99, sum by (le) (rate(gatekeeper_validation_request_duration_seconds_bucket[5m]))) | Stat (threshold: >1s yellow, >2s red) |
| Kyverno pass/fail trend | sum by (rule_result) (rate(kyverno_policy_results_total[5m])) | Time series |
| Namespace PSS compliance | Custom query on namespace labels | Table |

Best Practices

Audit Before Enforce

Always deploy new policies in Audit/dryrun mode first. Run for at least 2 weeks and fix all reported violations before flipping to Enforce. A policy enforced without an audit period can block updates and rollouts of existing workloads that already violate it.
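
A minimal sketch of what an audit-first rollout can look like in Kyverno. The policy name and rule are illustrative; the spec-level validationFailureAction field is used here, and newer Kyverno releases also accept a per-rule setting, so check the form your version expects.

```yaml
# Illustrative policy: starts in Audit so violations are reported, not blocked
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: disallow-latest-tag
spec:
  validationFailureAction: Audit   # switch to Enforce once reports are clean
  background: true                 # also scan existing resources into PolicyReports
  rules:
  - name: require-pinned-image-tag
    match:
      any:
      - resources:
          kinds:
          - Pod
    validate:
      message: "Images must not use the ':latest' tag."
      pattern:
        spec:
          containers:
          - image: "!*:latest"
```

With background: true, existing violators show up in PolicyReports during the audit window, which gives the list of workloads to fix before enforcement.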

Policy as Code in GitOps

Store all policies in Git and set selfHeal: true on the Argo CD policy Application. Manual policy changes are then reverted automatically, typically within the default 3-minute reconciliation interval, so the cluster does not stay drifted from the reviewed state.

Sync Waves for Policies

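Install Gatekeeper/Kyverno in an earlier Argo CD sync wave than the policies, and the policies in an earlier wave than workloads: for example wave -3 for the engine, -2 for policies, and 0 for workloads. Lower waves sync first, so the engine and its CRDs exist before policies are applied, and a workload in wave 0 is validated by already-running webhooks.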

PodDisruptionBudget on Policy Engine

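Set a PodDisruptionBudget with minAvailable: 2 on the Kyverno/Gatekeeper admission controllers, and run at least 3 replicas so node drains can still proceed. This prevents a drain from evicting all webhook replicas at once, which with failurePolicy: Fail would block every matching create and update request, including new pods.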
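
A sketch of such a PodDisruptionBudget for the Kyverno admission controller. The selector label is an assumption about how the pods are labeled; confirm it with kubectl get pods -n kyverno --show-labels before relying on it.

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: kyverno-admission-controller
  namespace: kyverno
spec:
  minAvailable: 2
  selector:
    matchLabels:
      # Assumed label; must match the admission controller pods in your install
      app.kubernetes.io/component: admission-controller
```

Many Helm charts can create this object for you, in which case setting the chart value is preferable to maintaining a separate manifest.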

Minimize Webhook Scope

Use namespaceSelector and objectSelector to minimize what your webhook is called for. A webhook called for every resource in the cluster adds latency proportional to your policy count.
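
For illustration, this is roughly where those selectors sit in a webhook registration. The webhook and service names are hypothetical, and caBundle is omitted; Kyverno and Gatekeeper generate their own webhook configurations, so in practice you would set the equivalent options through the engine's configuration rather than apply this by hand.

```yaml
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingWebhookConfiguration
metadata:
  name: example-policy-webhook          # hypothetical name
webhooks:
- name: validate.policy.example.com     # hypothetical name
  admissionReviewVersions: ["v1"]
  sideEffects: None
  failurePolicy: Fail
  timeoutSeconds: 5
  clientConfig:
    service:
      name: example-policy-svc          # hypothetical service
      namespace: example-policy
      path: /validate
  rules:                                # only call the webhook for pod writes
  - apiGroups: [""]
    apiVersions: ["v1"]
    operations: ["CREATE", "UPDATE"]
    resources: ["pods"]
  namespaceSelector:                    # skip system namespaces
    matchExpressions:
    - key: kubernetes.io/metadata.name
      operator: NotIn
      values: ["kube-system", "kube-node-lease"]
  objectSelector:                       # skip objects that carry an opt-out label
    matchExpressions:
    - key: policy.example.com/skip      # hypothetical label key
      operator: DoesNotExist
```

An opt-out label like the one above is itself a bypass, so it should be restricted by RBAC or by another policy if you use it.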

Test Policies in CI

Run kyverno test or conftest in every PR that touches the policy directory. Include both positive (compliant) and negative (violating) fixtures. For Rego rules, aim for >80% coverage as reported by opa test --coverage.

Gate PolicyExceptions

Create a Kyverno policy that requires annotations.approved-by: platform-team on every PolicyException. Log all exception creations to your SIEM. Review quarterly and expire by date.
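
One way such a gate could be written, as a sketch: the policy name is illustrative, and the annotation key and value are taken from the text above. This assumes Kyverno's webhooks are configured to receive PolicyException requests in your installation.

```yaml
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: require-exception-approval      # illustrative name
spec:
  validationFailureAction: Enforce
  background: false
  rules:
  - name: check-approved-by
    match:
      any:
      - resources:
          kinds:
          - PolicyException
    validate:
      message: "PolicyExceptions must carry the annotation approved-by: platform-team."
      pattern:
        metadata:
          annotations:
            approved-by: platform-team
```

An annotation only records intent; pairing this with RBAC that limits who can create PolicyExceptions keeps the approval meaningful.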

No Blanket Webhook Ignore

Labeling a namespace admission.gatekeeper.sh/ignore makes the Gatekeeper admission webhook skip that namespace entirely, so no constraint is enforced there. This is an escape hatch, not standard practice. Use per-constraint exclusions (Gatekeeper) or named PolicyExceptions (Kyverno) instead.
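
A sketch of a per-constraint exclusion in Gatekeeper. It assumes a K8sRequiredLabels ConstraintTemplate is already installed and that it accepts a simple list of label names; the constraint name, excluded namespace, and parameters are illustrative.

```yaml
apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sRequiredLabels                 # assumes this ConstraintTemplate exists
metadata:
  name: pods-must-have-team-label       # illustrative name
spec:
  enforcementAction: deny
  match:
    kinds:
    - apiGroups: [""]
      kinds: ["Pod"]
    excludedNamespaces: ["monitoring"]  # only this constraint is skipped there
  parameters:
    labels: ["team"]                    # parameter shape depends on the template
```

Every other constraint still applies in the excluded namespace, which is the difference from the namespace-wide ignore label.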
