Image Security
Container image trust chain, vulnerability scanning, registry security, image signing with Sigstore/Cosign, SLSA provenance, runtime image policies, and supply chain integrity from build to deploy.
Coverage Checklist
- Image threat model: 5 attack vectors
- OCI image spec: layers, manifest, config
- Base image selection: distroless, scratch, alpine
- Multi-stage build pattern
- Vulnerability scanning tools: Trivy, Grype, Snyk
- CVSS scoring and triage
- Registry security: auth, RBAC, replication
- imagePullPolicy: Always vs IfNotPresent
- Image digest pinning vs tag
- Private registry imagePullSecrets
- Cosign signing and verification
- Keyless signing with OIDC/Fulcio/Rekor
- SLSA framework levels
- SLSA provenance attestation
- Admission policy: registry allow-list
- Kyverno verifyImages policy
- OPA Gatekeeper image policy
- AlwaysPullImages admission plugin
- Dockerfile hardening: non-root, read-only, no secrets
- Runtime image integrity: Falco rules
- 5 metrics, 4 alerts, 5 runbooks, 8 best practices
Image Threat Model
Container images are a primary attack vector in Kubernetes. Compromised images bypass most runtime controls because the attack is baked in before the container ever starts.
Vulnerable Dependencies
OS packages and language libraries with known CVEs. Many breaches exploit publicly disclosed vulnerabilities for which a patch was already available.
Malicious Base Images
Typosquatted or compromised public images on Docker Hub, for example a misspelled reference such as mysql:lastest, or a backdoored build published under a familiar-looking name such as node:slim.
Secrets Baked In
API keys, TLS certs, SSH keys, or database passwords embedded in image layers via COPY, RUN, or ENV.
Unverified Provenance
No proof that an image was built from source you control. Compromised CI pipeline could inject malicious code without changing the tag.
Tag Mutability
Image tags are mutable. :latest or even :1.2.3 can be overwritten. Re-deploying the "same" tag may run different code.
Example attack chain: compromised registry credentials → push malicious image with legitimate tag → Kubernetes pulls on next rollout → attacker code runs with the pod's RBAC permissions and network access.
Image Anatomy & Attack Surface
Understanding the OCI image specification reveals exactly where each attack type enters.
Image manifest (JSON) ← references config and layers by digest
├── Config (image config JSON) ← ENV vars, CMD, ENTRYPOINT
│   └── Attack: secrets in ENV, dangerous capabilities in config
└── Layers (ordered)
    ├── Layer 0: FROM ubuntu:22.04 ← base image CVEs
    ├── Layer 1: RUN apt-get install ← package CVEs added here
    ├── Layer 2: COPY . /app ← source code + possibly secrets
    └── Layer N: RUN pip install ← library CVEs
# Secrets in deleted layers are still accessible
COPY secret.key /tmp/secret.key # Layer 3: secret present
RUN rm /tmp/secret.key # Layer 4: deleted — but Layer 3 is still in image!
OCI layers are additive. A file deleted in a later layer is still present in the earlier layer and can be recovered with docker save <image> -o image.tar && tar xf image.tar. Never add secrets to any layer, even temporarily.
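The additive behaviour can be reproduced without Docker. The sketch below builds two in-memory tar archives that stand in for image layers: the first adds a secret, the second "deletes" it with an OCI whiteout entry (`.wh.<name>`). The helper names `make_layer` and `find_file_in_layers` are hypothetical and exist only for this illustration.

```python
import io
import tarfile


def make_layer(files: dict) -> bytes:
    """Build an uncompressed tar archive in memory, one entry per file."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def find_file_in_layers(layers: list, path: str) -> list:
    """Return the index of every layer that still contains `path`."""
    hits = []
    for index, layer in enumerate(layers):
        with tarfile.open(fileobj=io.BytesIO(layer), mode="r") as tar:
            if path in tar.getnames():
                hits.append(index)
    return hits


# Layer 0 adds the secret; layer 1 only records a whiteout marker for it.
layers = [
    make_layer({"tmp/secret.key": b"example-secret"}),
    make_layer({"tmp/.wh.secret.key": b""}),
]

# The secret is hidden from the merged filesystem but still sits in layer 0.
print(find_file_in_layers(layers, "tmp/secret.key"))
```

Anyone who can pull the image can read layer 0 directly, which is why removing the file in a later instruction does not help.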
Image Identity: Tag vs Digest
| Reference Type | Example | Mutable? | Security Implication |
|---|---|---|---|
| Tag | nginx:1.25.3 | Yes | Registry owner can overwrite; no guarantee of content |
| Digest | nginx@sha256:abc123... | No | Cryptographically immutable; content cannot change |
| Tag + Digest | nginx:1.25.3@sha256:abc... | No | Best of both: human-readable + immutable reference |
Tags are human convenience. Pin production workloads to digest references. Admission controllers can enforce this. Update digests as part of your dependency update process.
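The three reference types in the table can be told apart mechanically. The following is a minimal sketch, not a full implementation of the OCI reference grammar; `split_reference` and `is_digest_pinned` are hypothetical helper names, and only sha256 digests are recognized.

```python
import re

SHA256_DIGEST = re.compile(r"sha256:[0-9a-f]{64}")


def split_reference(image_ref: str):
    """Split an image reference into (name, tag, digest); missing parts are None."""
    name, _, digest = image_ref.partition("@")
    # A colon only marks a tag when it appears after the last slash;
    # otherwise it is a registry port such as localhost:5000.
    last_segment = name.rsplit("/", 1)[-1]
    tag = None
    if ":" in last_segment:
        name, _, tag = name.rpartition(":")
    return name, tag, digest or None


def is_digest_pinned(image_ref: str) -> bool:
    """True when the reference carries a well-formed sha256 digest."""
    _, _, digest = split_reference(image_ref)
    return digest is not None and SHA256_DIGEST.fullmatch(digest) is not None


example_digest = "sha256:" + "a" * 64  # placeholder digest, not a real image
print(split_reference("nginx:1.25.3"))
print(is_digest_pinned("nginx:1.25.3@" + example_digest))
```

A check like this is what the admission policies later in this document express declaratively.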
Base Image Strategy
The base image determines your initial attack surface. Smaller base images have fewer packages to patch and a smaller CVE surface.
| Base Image | Size | Shell | Package Mgr | Use Case | Security |
|---|---|---|---|---|---|
| ubuntu:22.04 | ~77MB | Yes | apt | Development, debugging | Low |
| debian:slim | ~75MB | Yes | apt | General purpose | Medium |
| alpine:3.19 | ~7MB | ash | apk | Small footprint | Medium |
| gcr.io/distroless/base | ~20MB | No | None | Compiled binaries | High |
| gcr.io/distroless/java17 | ~200MB | No | None | JVM applications | High |
| gcr.io/distroless/python3 | ~100MB | No | None | Python apps | High |
| scratch | 0MB | No | None | Static Go binaries | Highest |
Distroless images contain only your application and its runtime dependencies: no shell, no package manager, no coreutils. This eliminates entire classes of post-exploitation techniques (reverse shells, package installation, script execution). Without a shell, kubectl exec cannot open an interactive session in the container. Debug with an ephemeral container instead: kubectl debug -it <pod> --image=busybox --target=<container>.
Multi-Stage Build Pattern
Multi-stage builds are the primary mechanism for separating build-time tools from the final image, preventing the leakage of build secrets and tools into production images.
# Stage 1: Build (all build deps, secrets, tools)
FROM golang:1.22 AS builder
WORKDIR /app
# Dependency layer (cached unless go.mod/sum changes)
COPY go.mod go.sum ./
RUN go mod download
# Source code
COPY . .
# Build static binary — no CGO means no glibc dependency
RUN CGO_ENABLED=0 GOOS=linux go build -ldflags="-w -s" -o /server ./cmd/server
# Stage 2: Final image — only the binary
FROM gcr.io/distroless/static-debian12:nonroot
# nonroot tag: runs as UID 65532 by default
COPY --from=builder /server /server
# Document the port (does not publish it)
EXPOSE 8080
# Use a numeric UID so the kubelet can verify runAsNonRoot (a user name cannot be checked)
USER 65532:65532
ENTRYPOINT ["/server"]
For private package registries during build, use BuildKit's secret mount — the secret is never written to any layer:
# BuildKit secret mount — never appears in any layer
RUN --mount=type=secret,id=npmrc,target=/root/.npmrc \
npm install --production
# Build with:
# docker buildx build --secret id=npmrc,src=.npmrc .
Vulnerability Scanning
Vulnerability scanners match image packages and libraries against CVE databases (NVD, OSV, GHSA). They are a necessary but not sufficient security control — zero-day vulnerabilities have no CVE.
Scanning Tools Comparison
| Tool | Type | Databases | Formats | Kubernetes Integration |
|---|---|---|---|---|
| Trivy | Open source | NVD, GHSA, OSV, language-specific | JSON, SARIF, CycloneDX, SPDX | Operator, CLI, CI/CD |
| Grype (Anchore) | Open source | NVD, GHSA, OSV | JSON, SARIF, CycloneDX | CLI, CI/CD |
| Snyk | Commercial | Snyk DB (proprietary) | JSON, SARIF | Operator, admission webhook |
| Prisma Cloud (Twistlock) | Commercial | Multiple + runtime | Multiple | Deep K8s integration |
| Clair | Open source | NVD, RHEL, Ubuntu | JSON | Registry-integrated |
Trivy Usage
# Scan image — includes OS packages and language libs
trivy image nginx:1.25.3
# Scan with severity filter (fail CI on CRITICAL/HIGH)
trivy image --exit-code 1 --severity CRITICAL,HIGH nginx:1.25.3
# SBOM-first scan: generate CycloneDX SBOM then scan it
trivy image --format cyclonedx --output sbom.json nginx:1.25.3
trivy sbom sbom.json
# Scan in-cluster images (requires kubeconfig; trivy k8s argument syntax varies across Trivy versions)
trivy k8s --report summary cluster
# Scan a specific namespace
trivy k8s --namespace production --report all
# Ignore unfixed vulnerabilities (useful for reducing noise)
trivy image --ignore-unfixed nginx:1.25.3
# Scan with .trivyignore for known-acceptable CVEs
trivy image --ignorefile .trivyignore nginx:1.25.3
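When a pipeline needs more control than --exit-code and --severity offer, the same gate can be applied to Trivy's JSON output (trivy image --format json). The sketch below assumes the report layout of a top-level Results list whose entries carry an optional Vulnerabilities list with a Severity field; `gate_exit_code` and the sample report are hypothetical.

```python
import json

BLOCKING_SEVERITIES = frozenset({"CRITICAL", "HIGH"})


def gate_exit_code(report_json: str, blocking=BLOCKING_SEVERITIES) -> int:
    """Return 1 if any finding has a blocking severity, else 0."""
    report = json.loads(report_json)
    # Both keys may be absent or null when nothing was found.
    for result in report.get("Results") or []:
        for vuln in result.get("Vulnerabilities") or []:
            if vuln.get("Severity") in blocking:
                return 1
    return 0


# Hand-written sample in the assumed layout, not real scanner output.
sample_report = json.dumps({
    "Results": [
        {"Target": "os-packages", "Vulnerabilities": [
            {"VulnerabilityID": "CVE-2023-44487", "Severity": "HIGH"},
        ]},
        {"Target": "app/package-lock.json", "Vulnerabilities": None},
    ]
})

print(gate_exit_code(sample_report))
```

Passing a narrower set such as {"CRITICAL"} relaxes the gate without changing the scan itself.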
CVSS Scoring and Triage
| Severity | CVSS Score | Action | SLA |
|---|---|---|---|
| Critical | 9.0–10.0 | Block deployment, immediate patch | 24–48 hours |
| High | 7.0–8.9 | Block deployment, patch in sprint | 7 days |
| Medium | 4.0–6.9 | Track, patch in next release | 30 days |
| Low | 0.1–3.9 | Track, patch opportunistically | 90 days |
| None | 0.0 | Informational | Best effort |
A CVSS score measures theoretical severity, not actual exploitability in your environment. A Critical CVE in a library that has no network path to an attacker is less urgent than a Medium CVE in your public API handler. Use CVSS as a starting point, not a final decision. VEX (Vulnerability Exploitability eXchange) documents add that context by recording whether a given CVE is actually exploitable in a specific product.
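The triage table translates directly into a lookup. This sketch encodes the score bands and SLAs exactly as listed above, using the upper bound of the Critical window (48 hours, expressed as 2 days); `triage` is a hypothetical helper and is only the starting point the paragraph describes, before exploitability context is applied.

```python
from typing import Optional, Tuple


def triage(score: float) -> Tuple[str, Optional[int]]:
    """Map a CVSS base score to (severity, SLA in days); None means best effort."""
    if not 0.0 <= score <= 10.0:
        raise ValueError(f"CVSS score out of range: {score}")
    if score >= 9.0:
        return "Critical", 2   # 24 to 48 hours
    if score >= 7.0:
        return "High", 7
    if score >= 4.0:
        return "Medium", 30
    if score > 0.0:
        return "Low", 90
    return "None", None


for example_score in (9.8, 7.5, 5.0, 0.0):
    print(example_score, triage(example_score))
```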
Trivy Operator — In-Cluster Continuous Scanning
# Install Trivy Operator via Helm
helm repo add aqua https://aquasecurity.github.io/helm-charts/
helm install trivy-operator aqua/trivy-operator \
--namespace trivy-system \
--create-namespace \
--set="trivy.ignoreUnfixed=true"
# Operator creates VulnerabilityReport CRDs per workload
kubectl get vulnerabilityreports -A
kubectl describe vulnerabilityreport replicaset-nginx-abc123 -n production
# VulnerabilityReport CRD (auto-created by Trivy Operator)
apiVersion: aquasecurity.github.io/v1alpha1
kind: VulnerabilityReport
metadata:
  name: replicaset-nginx-abc123-nginx
  namespace: production
report:
  summary:
    criticalCount: 0
    highCount: 2
    mediumCount: 5
  vulnerabilities:
    - vulnerabilityID: CVE-2023-44487
      title: HTTP/2 Rapid Reset Attack
      severity: HIGH
      fixedVersion: 1.25.3
      resource: nginx
Registry Security
Registry Authentication
# Create imagePullSecret for private registry
kubectl create secret docker-registry regcred \
--docker-server=registry.example.com \
--docker-username=myuser \
--docker-password=mypassword \
--docker-email=ops@example.com \
--namespace production
# Reference in pod spec
spec:
  imagePullSecrets:
    - name: regcred
  containers:
    - image: registry.example.com/myapp:1.2.3
# Attach imagePullSecret to ServiceAccount so all pods
# in the namespace automatically get it
apiVersion: v1
kind: ServiceAccount
metadata:
  name: default
  namespace: production
imagePullSecrets:
  - name: regcred
Attaching an imagePullSecret to the default ServiceAccount makes it available to all pods that use that SA in that namespace. Secrets are namespaced, so for cross-namespace use you must replicate the secret. A tool such as Reflector can keep copies in sync across namespaces; Reloader does not copy secrets, it only restarts workloads when a secret they reference changes.
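It helps to know what a docker-registry secret actually contains when replicating or rotating it. The sketch below builds the .dockerconfigjson payload that such a secret stores: an auths map keyed by registry host, with an auth field holding base64 of username:password. `docker_config_json` is a hypothetical helper and the credentials are placeholders.

```python
import base64
import json


def docker_config_json(server: str, username: str, password: str, email: str = "") -> str:
    """Build the JSON document stored under the .dockerconfigjson key."""
    auth = base64.b64encode(f"{username}:{password}".encode()).decode()
    entry = {"username": username, "password": password, "auth": auth}
    if email:
        entry["email"] = email
    return json.dumps({"auths": {server: entry}})


# Placeholder credentials for illustration only.
payload = docker_config_json("registry.example.com", "myuser", "mypassword")
print(payload)
```

Because the auth field is only base64-encoded, anyone who can read the secret can recover the registry password, so RBAC on the secret matters as much as the registry's own access control.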
imagePullPolicy
| Policy | Behavior | Security Risk | When to Use |
|---|---|---|---|
| Always | Pull on every pod start | Lowest: always gets latest | Production with mutable tags |
| IfNotPresent | Pull only if not on node | Medium: cached image may be stale | Pinned digests |
| Never | Never pull, fail if absent | High: relies on pre-loaded images | Air-gapped environments |
Enabling the AlwaysPullImages admission plugin forces imagePullPolicy: Always on every pod, regardless of what the pod spec says. This closes a multi-tenant gap: without it, a pod using IfNotPresent or Never can start from a private image that another tenant already pulled onto the same node, without holding credentials for that image. With the plugin enabled, every pod start is checked against the registry with the pod's own pull credentials.
Registry Security Controls
| Control | Description | Implementation |
|---|---|---|
| Private registry | No anonymous pulls | Harbor, ECR, GCR, ACR with auth required |
| Image scanning gate | Block push if vulnerabilities exceed threshold | Harbor built-in, ECR Enhanced Scanning |
| Content trust | Only signed images allowed | Notary v2, Cosign, registry policy |
| Repository RBAC | Teams can only push to their repos | Harbor projects, ECR resource policies |
| Immutable tags | Tags cannot be overwritten | ECR immutable tags, Harbor tag immutability rules |
| Geo-replication | Images closer to clusters | Harbor replication, ECR replication |
| Audit logging | Who pushed/pulled when | Registry audit logs → SIEM |
Image Signing with Cosign
Cosign (part of the Sigstore project) enables cryptographic signing of container images, creating a verifiable link between an image digest and an identity that signed it.
CI/CD Pipeline
│
├─ Build image → push to registry
│ └── Image Digest: sha256:abc123...
│
├─ Sign with private key:
│ cosign sign --key cosign.key registry/image@sha256:abc123
│ └── Signature stored in registry as: registry/image:sha256-abc123.sig
│
└─ Push SBOM attestation:
cosign attest --key cosign.key --predicate sbom.json registry/image@sha256:abc123
── Verification at admission time ────────────────────────────────────
Admission Webhook (Kyverno / Policy Controller)
│
├─ Pod CREATE request arrives
├─ Extract image reference
├─ Fetch signature from registry
├─ Verify: cosign verify --key cosign.pub registry/image@sha256:abc123
└─ Allow (signature valid) or Deny (no valid signature)
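The diagram relies on two small conventions: the image digest is the SHA-256 of the manifest bytes, and the signature is stored under a tag derived from that digest (sha256-<hex>.sig, as shown above). A minimal sketch of both, with hypothetical helper names and no registry access:

```python
import hashlib


def manifest_digest(manifest_bytes: bytes) -> str:
    """Compute the content digest a registry would assign to these manifest bytes."""
    return "sha256:" + hashlib.sha256(manifest_bytes).hexdigest()


def cosign_signature_ref(repository: str, digest: str) -> str:
    """Derive the tag under which the signature for `digest` is stored."""
    algorithm, _, hex_value = digest.partition(":")
    return f"{repository}:{algorithm}-{hex_value}.sig"


print(cosign_signature_ref("registry/image", "sha256:abc123"))
```

Because the tag is derived from the digest, a signature always refers to exact image content and never to a mutable tag.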
Key-Based Signing
# Generate key pair (store private key in Vault/KMS, not CI env var)
cosign generate-key-pair
# Sign image (after push to registry)
cosign sign --key cosign.key \
registry.example.com/myapp@sha256:abc123...
# Sign with KMS key (Google Cloud KMS example)
cosign sign --key gcpkms://projects/my-project/locations/global/keyRings/my-ring/cryptoKeys/cosign \
registry.example.com/myapp@sha256:abc123...
# Verify image signature
cosign verify --key cosign.pub \
registry.example.com/myapp@sha256:abc123...
# Verify outputs JSON with signer identity and timestamp
Keyless Signing (Sigstore)
Keyless signing uses ephemeral keys tied to OIDC identity (GitHub Actions, Google SA, etc.) and records signatures in the public Rekor transparency log. No long-lived private keys to manage or protect.
│
├─ OIDC token issued by GitHub (id_token: write permission)
│ iss: https://token.actions.githubusercontent.com
│ sub: repo:myorg/myrepo:ref:refs/heads/main
│
├─ cosign sign (transparency log upload enabled, the default)
│ ├── Request ephemeral cert from Fulcio CA
│ │ └── Cert binds: OIDC identity ↔ ephemeral public key
│ ├── Sign image digest with ephemeral private key
│ └── Upload signature + cert to Rekor transparency log
│
└─ Ephemeral private key discarded
Verifier
├─ Fetch signature from registry
├─ Verify cert chain (Fulcio root CA)
├─ Verify signature with cert public key
├─ Check Rekor inclusion proof
└─ Verify cert subject matches expected OIDC identity
# Keyless sign in GitHub Actions (OIDC token available automatically)
- name: Sign image
  run: |
    cosign sign \
      --yes \
      registry.example.com/myapp@${{ steps.build.outputs.digest }}
  env:
    COSIGN_EXPERIMENTAL: "1" # Not needed from cosign v2.0+
# Keyless verify — check identity matches expected GitHub workflow
cosign verify \
--certificate-identity-regexp "https://github.com/myorg/myrepo/.github/workflows/.*" \
--certificate-oidc-issuer "https://token.actions.githubusercontent.com" \
registry.example.com/myapp@sha256:abc123...
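The last verifier step, matching the certificate identity, is where overly loose patterns cause trouble. The sketch below shows the comparison in isolation: an exact match on the issuer and a fully anchored regular expression on the subject. `identity_matches` is a hypothetical helper; it does not describe how cosign evaluates its regexp flag internally, so anchor patterns explicitly if in doubt.

```python
import re

EXPECTED_ISSUER = "https://token.actions.githubusercontent.com"
# Dots are escaped so they match literally rather than any character.
SUBJECT_PATTERN = r"https://github\.com/myorg/myrepo/\.github/workflows/.*"


def identity_matches(cert_subject: str, cert_issuer: str,
                     subject_pattern: str = SUBJECT_PATTERN,
                     expected_issuer: str = EXPECTED_ISSUER) -> bool:
    """Accept only the expected issuer and a subject that matches the whole pattern."""
    if cert_issuer != expected_issuer:
        return False
    return re.fullmatch(subject_pattern, cert_subject) is not None


trusted = "https://github.com/myorg/myrepo/.github/workflows/release.yaml@refs/heads/main"
forked = "https://github.com/attacker/myrepo/.github/workflows/release.yaml@refs/heads/main"
print(identity_matches(trusted, EXPECTED_ISSUER))
print(identity_matches(forked, EXPECTED_ISSUER))
```

Pinning the issuer matters as much as the subject: the same subject string issued by a different OIDC provider proves nothing about your workflow.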
Attestations
# Attach SBOM as attestation (CycloneDX format)
cosign attest \
--key cosign.key \
--type cyclonedx \
--predicate sbom.cyclonedx.json \
registry.example.com/myapp@sha256:abc123...
# Attach SLSA provenance as attestation
cosign attest \
--key cosign.key \
--type slsaprovenance \
--predicate provenance.json \
registry.example.com/myapp@sha256:abc123...
# Verify attestation
cosign verify-attestation \
  --key cosign.pub \
  --type cyclonedx \
  registry.example.com/myapp@sha256:abc123... | \
  jq -r .payload | base64 -d | jq .
# jq -r emits the raw payload string; without it the surrounding quotes break base64 -d
SLSA Provenance
SLSA (Supply chain Levels for Software Artifacts, pronounced "salsa") is a framework of incrementally adoptable security levels for software supply chains, defined by Google and the OpenSSF.
| Level | Requirements | What It Prevents |
|---|---|---|
| SLSA 0 | No guarantees | Nothing |
| SLSA 1 | Provenance generated (not authenticated) | Accidental mistakes; provides artifact lineage |
| SLSA 2 | Signed provenance from hosted build service | Tampering after the build; identifies build service |
| SLSA 3 | Hardened build platform; non-forgeable provenance | Compromised build service modifying artifacts |
| SLSA 4 | Two-person review; hermetic builds | Insider threats; reproducible builds verification |
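SLSA 3 requires a hosted, hardened build platform (GitHub Actions, Google Cloud Build, etc.) that generates non-forgeable provenance. This is achievable with existing CI/CD tooling. SLSA 4 requires hermetic and reproducible builds, which many projects cannot achieve today. The table uses the original SLSA v0.1 level numbering; the SLSA v1.0 specification reorganizes these requirements into a Build track with levels L0 through L3 and no longer defines a Level 4.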
SLSA Provenance with GitHub Actions
# .github/workflows/release.yaml
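jobs:
  build:
    runs-on: ubuntu-latest
    permissions:
      id-token: write # For OIDC / Sigstore
      contents: read
      packages: write # For pushing to GHCR
    outputs:
      digest: ${{ steps.build.outputs.digest }} # Consumed by the provenance job
    steps:
      - uses: actions/checkout@v4
      - name: Log in to GHCR
        uses: docker/login-action@v3
        with:
          registry: ghcr.io
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}
      - name: Build and push
        id: build
        uses: docker/build-push-action@v5
        with:
          push: true
          tags: ghcr.io/myorg/myapp:${{ github.sha }}
  provenance:
    needs: [build]
    permissions:
      id-token: write
      contents: read
      packages: write
      actions: read
    uses: slsa-framework/slsa-github-generator/.github/workflows/generator_container_slsa3.yml@v1.10.0
    with:
      image: ghcr.io/myorg/myapp
      digest: ${{ needs.build.outputs.digest }}
      registry-username: ${{ github.actor }}
    secrets:
      registry-password: ${{ secrets.GITHUB_TOKEN }}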
# Verify SLSA provenance
slsa-verifier verify-image \
--source-uri github.com/myorg/myapp \
--source-tag v1.2.3 \
ghcr.io/myorg/myapp@sha256:abc123...
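One of the checks behind provenance verification is that the attestation actually talks about the image being deployed. Provenance is carried in an in-toto statement whose subject list names the artifact digests it covers. The sketch below performs only that subject comparison on an already decoded statement; it does not verify the signature or the builder identity, which slsa-verifier handles. `provenance_covers_digest` and the sample statement are hypothetical.

```python
def provenance_covers_digest(statement: dict, image_digest: str) -> bool:
    """True when any subject in the statement lists the given image digest."""
    algorithm, _, hex_value = image_digest.partition(":")
    for subject in statement.get("subject") or []:
        if (subject.get("digest") or {}).get(algorithm) == hex_value:
            return True
    return False


# Hand-written statement with a placeholder digest, not real provenance.
example_hex = "b" * 64
example_statement = {
    "_type": "https://in-toto.io/Statement/v0.1",
    "predicateType": "https://slsa.dev/provenance/v0.2",
    "subject": [
        {"name": "ghcr.io/myorg/myapp", "digest": {"sha256": example_hex}},
    ],
    "predicate": {},
}

print(provenance_covers_digest(example_statement, "sha256:" + example_hex))
```

An attestation that is validly signed but names a different digest says nothing about the image under review, so this comparison must always accompany signature verification.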
Policy Enforcement in Kubernetes
Kyverno verifyImages
Kyverno's verifyImages rule validates image signatures and attestations at admission time. It supports Cosign key-based, keyless, and attestation verification.
# Kyverno ClusterPolicy: require signed images from trusted registry
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: verify-image-signatures
spec:
  validationFailureAction: Enforce
  background: false # Don't check existing pods
  rules:
    - name: verify-signature
      match:
        any:
          - resources:
              kinds: [Pod]
      verifyImages:
        - imageReferences:
            - "registry.example.com/*"
          attestors:
            - entries:
                - keys:
                    publicKeys: |-
                      -----BEGIN PUBLIC KEY-----
                      MFkwEwYHKoZIzj0CAQYIKoZIzj0DAQcDQgAE...
                      -----END PUBLIC KEY-----
          mutateDigest: true # Replace tag with verified digest
          verifyDigest: true # Ensure image ref uses digest
          required: true
# Kyverno verifyImages: keyless (Sigstore) verification
verifyImages:
  - imageReferences:
      - "ghcr.io/myorg/*"
    attestors:
      - entries:
          - keyless:
              subject: "https://github.com/myorg/myrepo/.github/workflows/release.yaml@refs/heads/main"
              issuer: "https://token.actions.githubusercontent.com"
              rekor:
                url: https://rekor.sigstore.dev
# Kyverno verifyImages: verify SBOM attestation exists
verifyImages:
  - imageReferences:
      - "registry.example.com/*"
    attestations:
      - type: https://cyclonedx.org/bom
        attestors:
          - entries:
              - keys:
                  publicKeys: "..."
        conditions:
          - all:
              - key: "{{ components | length(@) }}"
                operator: GreaterThan
                value: "0"
Registry Allow-List with Kyverno
# Block images from untrusted registries
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: restrict-image-registries
spec:
  validationFailureAction: Enforce
  rules:
    - name: validate-registries
      match:
        any:
          - resources:
              kinds: [Pod]
      validate:
        message: "Images must come from registry.example.com or gcr.io/distroless"
        pattern:
          spec:
            containers:
              - image: "registry.example.com/* | gcr.io/distroless/*"
OPA Gatekeeper: Digest Pinning Policy
# ConstraintTemplate: require digest-pinned images
apiVersion: templates.gatekeeper.sh/v1
kind: ConstraintTemplate
metadata:
  name: k8srequiredimagedigest
spec:
  crd:
    spec:
      names:
        kind: K8sRequiredImageDigest
  targets:
    - target: admission.k8s.gatekeeper.sh
      rego: |
        package k8srequiredimagedigest

        violation[{"msg": msg}] {
          container := input.review.object.spec.containers[_]
          not contains(container.image, "@sha256:")
          msg := sprintf("Container '%v' image '%v' must use digest pinning (@sha256:...)", [container.name, container.image])
        }

        violation[{"msg": msg}] {
          container := input.review.object.spec.initContainers[_]
          not contains(container.image, "@sha256:")
          msg := sprintf("InitContainer '%v' image '%v' must use digest pinning", [container.name, container.image])
        }
ValidatingAdmissionPolicy (CEL): Registry Check
# Takes effect only once a ValidatingAdmissionPolicyBinding references this policy
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingAdmissionPolicy
metadata:
  name: check-image-registry
spec:
  matchConstraints:
    resourceRules:
      - apiGroups: [""]
        apiVersions: [v1]
        operations: [CREATE, UPDATE]
        resources: [pods]
  validations:
    - expression: >
        object.spec.containers.all(c,
          c.image.startsWith("registry.example.com/") ||
          c.image.startsWith("gcr.io/distroless/")
        )
      message: "Images must come from approved registries"
    - expression: >
        object.spec.containers.all(c,
          c.image.contains("@sha256:")
        )
      message: "Images must be pinned to a digest"
Runtime Image Security
Falco Rules for Image Anomalies
Falco detects runtime behavior that indicates a compromised or malicious image. Crucially, Falco detects but does not prevent — pair it with admission controls for prevention.
# Falco rule: detect container running as root
- rule: Container Running as Root
  desc: Detect containers running as root user
  condition: >
    container and
    proc.vpid=1 and
    user.uid=0 and
    not container.image.repository in (trusted_root_images)
  output: >
    Container running as root (user=%user.name image=%container.image.repository
    container=%container.name pod=%k8s.pod.name ns=%k8s.ns.name)
  priority: WARNING
# Falco rule: detect unexpected network connection from container
- rule: Unexpected Outbound Connection
  desc: Detect outbound network connections to unusual destinations
  condition: >
    outbound and
    container and
    not fd.sport in (allowed_outbound_ports) and
    not fd.sip in (allowed_outbound_ips)
  output: >
    Unexpected outbound connection (image=%container.image.repository
    connection=%fd.name pod=%k8s.pod.name)
  priority: CRITICAL
# Falco rule: detect binary execution in container
- rule: Unexpected Process in Container
  desc: Detect execution of binaries not expected in container
  condition: >
    spawned_process and
    container and
    not proc.name in (allowed_processes) and
    not proc.pname in (allowed_parent_processes)
  output: >
    Unexpected process (proc=%proc.name pproc=%proc.pname
    image=%container.image.repository pod=%k8s.pod.name)
  priority: ERROR
Image Pull Verification Workflow
Pod CREATE Request
Admission webhook intercepts the pod creation request before it reaches etcd.
Registry Allow-List Check
Validate image references start with approved registry prefixes. Reject images from Docker Hub or unrecognized registries.
Digest Pinning Check
Validate image references contain @sha256: digest. Reject mutable tag-only references.
Signature Verification
Fetch and verify Cosign signature from registry. Validate against trusted public keys or OIDC identity constraints.
Admission Allowed
Image reference mutated to verified digest (if tag was supplied). Pod created with immutable image reference.
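The checks in this workflow run in a fixed order and stop at the first failure, which keeps the expensive signature lookup for images that already passed the cheap string checks. A minimal sketch of that ordering follows; `admit` is a hypothetical function, the allow-list prefixes are the example values used earlier, and the signature check is passed in as a callable because real verification needs registry access.

```python
from typing import Callable, Optional, Tuple

ALLOWED_PREFIXES: Tuple[str, ...] = ("registry.example.com/", "gcr.io/distroless/")


def admit(image_ref: str,
          signature_ok: Callable[[str], bool],
          allowed_prefixes: Tuple[str, ...] = ALLOWED_PREFIXES) -> Optional[str]:
    """Return None when the image is admitted, otherwise the reason for denial."""
    # 1. Registry allow-list: cheapest check, so it runs first.
    if not image_ref.startswith(allowed_prefixes):
        return "registry not in allow-list"
    # 2. Digest pinning: reject mutable tag-only references.
    if "@sha256:" not in image_ref:
        return "image is not pinned to a digest"
    # 3. Signature verification: delegated to the caller-supplied check.
    if not signature_ok(image_ref):
        return "no valid signature"
    return None


pinned_ref = "registry.example.com/myapp@sha256:" + "a" * 64  # placeholder digest
print(admit("docker.io/library/nginx:1.25.3", lambda ref: True))
print(admit(pinned_ref, lambda ref: True))
```

In a cluster the same ordering is expressed by combining the allow-list, digest, and verifyImages policies rather than by custom code.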
Dockerfile Hardening
Complete Hardened Dockerfile
###############################################################
# Stage 1: Dependency download (separate for better caching)
###############################################################
FROM node:20-alpine3.19 AS deps
WORKDIR /app
# Copy only manifests first — layer cache: only invalidated when deps change
COPY package.json package-lock.json ./
RUN npm ci --omit=dev --ignore-scripts
###############################################################
# Stage 2: Application build
###############################################################
FROM node:20-alpine3.19 AS build
WORKDIR /app
COPY package.json package-lock.json ./
RUN npm ci # include devDeps for build
COPY . .
RUN npm run build
###############################################################
# Stage 3: Production image
###############################################################
FROM node:20-alpine3.19
# Upgrade OS packages to get latest CVE fixes
RUN apk update && apk upgrade --no-cache && rm -rf /var/cache/apk/*
# Create non-root user with fixed UID (important for runAsNonRoot + runAsUser)
RUN addgroup -g 10001 -S appgroup && \
adduser -u 10001 -S appuser -G appgroup
WORKDIR /app
# Copy artifacts from build stages (not source code or devDeps)
COPY --from=deps --chown=appuser:appgroup /app/node_modules ./node_modules
COPY --from=build --chown=appuser:appgroup /app/dist ./dist
COPY --chown=appuser:appgroup package.json .
# Drop to non-root user
USER 10001:10001
# Document (do not publish) the application port
EXPOSE 3000
# Use exec form (not shell form) to ensure signals propagate correctly
ENTRYPOINT ["node", "dist/server.js"]
Dockerfile Anti-Patterns
| Anti-Pattern | Risk | Fix |
|---|---|---|
| FROM ubuntu:latest | Non-reproducible, mutable | Pin to specific digest: FROM ubuntu:22.04@sha256:... |
| USER root (final image) | Container runs as root | Add non-root user, use USER 10001 |
| COPY . . without .dockerignore | Source secrets, .env copied | Add .dockerignore; copy specific files |
| ENV API_KEY=secret | Secret in image layers forever | Use runtime injection: K8s Secrets or External Secrets Operator (ESO) |
| RUN curl \| bash | Arbitrary code execution at build time | Pin to specific version + verify checksum |
| RUN apt-get install && rm -rf /var/lib/apt | Package manager remains; packages cached | Multi-stage; apt in build stage only |
| ADD https://... /app/ | Remote content without verification | Download with RUN curl -fsSLO URL, then check the file against a pinned hash: echo "<sha256>  <file>" \| sha256sum -c - |
| Relying on Dockerfile HEALTHCHECK | Kubernetes ignores HEALTHCHECK, so app-level failures go undetected | Define liveness and readiness probes in the pod spec |
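Several fixes in the table come down to "verify the checksum before using a download". The sketch below shows the comparison itself, assuming the expected SHA-256 value is pinned in the repository next to the download URL; `verify_sha256` is a hypothetical helper.

```python
import hashlib
import hmac


def verify_sha256(data: bytes, expected_hex: str) -> bool:
    """Compare the SHA-256 of `data` with a pinned hex digest."""
    actual_hex = hashlib.sha256(data).hexdigest()
    # compare_digest avoids leaking the mismatch position through timing.
    return hmac.compare_digest(actual_hex, expected_hex.strip().lower())


downloaded = b"example artifact contents"
pinned_hash = hashlib.sha256(downloaded).hexdigest()  # would normally be committed, not computed
print(verify_sha256(downloaded, pinned_hash))
print(verify_sha256(downloaded + b"tampered", pinned_hash))
```

The check is only meaningful when the pinned hash comes from a different channel than the download, for example a value reviewed and committed alongside the Dockerfile.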
.dockerignore
# .dockerignore — always include this file
.git
.gitignore
*.md
.env
.env.*
*.key
*.pem
*.cert
*.crt
node_modules
dist
coverage
.DS_Store
Dockerfile*
docker-compose*
.github
tests
__tests__
*.test.ts
*.spec.ts
Metrics, Alerts & Runbooks
Key Metrics
| Metric | Source | Description |
|---|---|---|
| trivy_image_vulnerabilities | Trivy Operator | Per-image CVE count by severity |
| kyverno_policy_results_total | Kyverno | Policy pass/fail counts by policy name |
| kube_pod_container_status_waiting_reason | kube-state-metrics | Containers waiting on a failed image pull (ErrImagePull, ImagePullBackOff) |
| cosign_verification_duration_seconds | Custom webhook | Signature verification latency at admission |
| falco_events_total | Falco | Runtime security events by rule name |
Alerts
# Alert: Critical CVE in running workload
# Label names below follow exporter defaults; confirm them against your /metrics output
- alert: CriticalCVEInRunningWorkload
  expr: sum by (namespace, resource_name, image_repository) (trivy_image_vulnerabilities{severity="Critical"}) > 0
  for: 5m
  annotations:
    summary: "Critical CVE in {{ $labels.resource_name }} ({{ $labels.namespace }})"
# Alert: Image from untrusted registry blocked
- alert: UntrustedRegistryBlocked
  expr: increase(kyverno_policy_results_total{policy_name="restrict-image-registries",rule_result="fail"}[5m]) > 0
  annotations:
    summary: "Attempt to deploy image from untrusted registry"
# Alert: Signature verification failures spike
- alert: ImageSignatureVerificationFailure
  expr: increase(kyverno_policy_results_total{policy_name="verify-image-signatures",rule_result="fail"}[5m]) > 3
  annotations:
    summary: "Multiple image signature verification failures (possible supply chain attack)"
# Alert: Falco runtime anomaly
- alert: FalcoRuntimeAnomaly
  expr: increase(falco_events_total{priority=~"CRITICAL|ERROR"}[5m]) > 0
  annotations:
    summary: "Falco detected runtime anomaly: {{ $labels.rule }}"
Runbooks
Critical CVE in Production
1. Identify affected images: kubectl get vulnerabilityreports -A
2. Check if CVE is exploitable (network path? fix available?)
3. Update base image or package, rebuild, re-sign, redeploy
4. If no fix: apply compensating controls (NetworkPolicy, seccomp)
ImagePullBackOff
1. Check events: kubectl describe pod <name>
2. Verify imagePullSecret exists and is valid
3. Verify registry is reachable from node
4. Check if admission policy mutated the image reference unexpectedly
Signature Verification Failure Spike
1. Check which images are failing: Kyverno PolicyReport
2. Verify CI pipeline is signing correctly
3. Check if public key in policy matches signing key
4. Investigate for unauthorized image push attempts
Falco Runtime Alert
1. Identify pod: kubectl get pod -n <ns> <name>
2. Capture forensics: kubectl debug -it <name> -n <ns> --image=busybox --target=<container>
3. Consider immediate isolation: remove from Service, cordon node
4. Preserve evidence before terminating pod
Registry Credential Rotation
1. Create new imagePullSecret with new credentials
2. Update ServiceAccount or pod specs
3. Roll deployments: kubectl rollout restart deployment/<name> -n <ns>
4. Revoke old credentials in registry
Best Practices
Pin all images to digest in production
Tags are mutable. Use image@sha256: references for all production workloads. Automate digest updates with tools like Renovate or Dependabot.
Use distroless or scratch base images
Eliminate shell access, package managers, and unneeded utilities from production images. This removes entire attack classes post-exploitation.
Scan in CI and block on critical/high CVEs
Run trivy image --exit-code 1 --severity CRITICAL,HIGH in every build pipeline. Never deploy images that fail the threshold.
Sign every image with Cosign
Adopt keyless signing via GitHub Actions OIDC for zero key management overhead. Verify signatures at admission with Kyverno verifyImages.
Enforce registry allow-list at admission
Use Kyverno or ValidatingAdmissionPolicy to reject images from unapproved registries. Docker Hub images should be mirrored to your private registry before use.
Never bake secrets into images
Use BuildKit's --mount=type=secret for build-time secrets. Inject runtime secrets via Kubernetes Secrets or an external secrets operator — not ENV in Dockerfile.
Enable AlwaysPullImages in multi-tenant clusters
The AlwaysPullImages admission plugin prevents cross-tenant image cache exploitation. Performance impact is mitigated by registry proximity (same datacenter).
Deploy Trivy Operator for continuous scanning
One-time CI scans miss new CVEs in running workloads. The Trivy Operator continuously scans running images and surfaces new vulnerabilities via VulnerabilityReport CRDs.