Overview

Testing Kubernetes applications requires a wider test pyramid than traditional software. On top of unit and integration tests, you need cluster-level tests that validate Kubernetes-specific behaviour: does the pod start? does the probe work? does the NetworkPolicy allow the right traffic? does the HPA scale correctly?

                    ┌──────────────┐
                    │  Chaos Tests │  ← infrastructure resilience
                    └──────┬───────┘
                   ┌───────┴────────┐
                   │  E2E Tests     │  ← full user journeys in cluster
                   └───────┬────────┘
                  ┌────────┴─────────┐
                  │  Contract Tests  │  ← API consumer/producer contracts
                  └────────┬─────────┘
              ┌────────────┴──────────────┐
              │  Integration Tests        │  ← service + DB + K8s primitives
              └────────────┬──────────────┘
         ┌──────────────────┴───────────────────┐
         │  Unit Tests                           │  ← pure logic, no I/O
         └───────────────────────────────────────┘

Unit Tests

Unit tests run without a cluster. They test pure business logic, parsing, transformations. Target: < 1 second per test, 80%+ coverage on core logic.

// Go example — test business logic, mock external dependencies
func TestCalculatePaymentFee(t *testing.T) {
    tests := []struct {
        name     string
        amount   int64
        currency string
        want     int64
    }{
        {"USD small", 1000, "USD", 29},
        {"EUR small", 1000, "EUR", 25},
        {"USD large", 100000, "USD", 230},
    }
    for _, tc := range tests {
        t.Run(tc.name, func(t *testing.T) {
            got := CalculatePaymentFee(tc.amount, tc.currency)
            if got != tc.want {
                t.Errorf("CalculatePaymentFee(%d, %s) = %d, want %d",
                    tc.amount, tc.currency, got, tc.want)
            }
        })
    }
}

# Run unit tests with race detection and coverage
go test ./... -race -coverprofile=coverage.out -covermode=atomic

# View coverage report
go tool cover -html=coverage.out

# Run specific package
go test ./internal/payments/... -v -run TestCalculatePaymentFee
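The test above mentions mocking external dependencies but does not show it. A minimal sketch of one common approach follows: the logic depends on a small interface, and the test passes an in-memory fake. The names RateProvider, fakeRates, and ConvertToUSD are hypothetical and do not come from the payments code.

```go
package main

import (
	"errors"
	"fmt"
)

// RateProvider is a hypothetical external dependency (for example an FX-rate service).
type RateProvider interface {
	// RatePerMille returns the USD value of 1000 units of the given currency.
	RatePerMille(currency string) (int64, error)
}

// fakeRates is an in-memory test double: no network, no cluster.
type fakeRates map[string]int64

func (f fakeRates) RatePerMille(currency string) (int64, error) {
	r, ok := f[currency]
	if !ok {
		return 0, errors.New("unknown currency: " + currency)
	}
	return r, nil
}

// ConvertToUSD is hypothetical business logic that only sees the interface,
// so a unit test can swap in fakeRates for the real client.
func ConvertToUSD(p RateProvider, amount int64, currency string) (int64, error) {
	rate, err := p.RatePerMille(currency)
	if err != nil {
		return 0, err
	}
	return amount * rate / 1000, nil
}

func main() {
	rates := fakeRates{"USD": 1000, "EUR": 1100}
	got, err := ConvertToUSD(rates, 1000, "EUR")
	fmt.Println(got, err)
}
```

Integer per-mille rates keep the arithmetic exact, which makes table-driven assertions simpler than comparing floats.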

Integration Tests

Integration tests verify your service interacts correctly with real dependencies — a real database, a real message queue, a real Redis — but not the full Kubernetes cluster. Use testcontainers-go or docker-compose to spin up dependencies.

testcontainers-go

//go:build integration

package integration_test

import (
    "context"
    "database/sql"
    "testing"
    "time"

    _ "github.com/lib/pq" // registers the "postgres" driver with database/sql
    "github.com/testcontainers/testcontainers-go"
    "github.com/testcontainers/testcontainers-go/modules/postgres"
    "github.com/testcontainers/testcontainers-go/wait"
)

func TestPaymentRepository(t *testing.T) {
    ctx := context.Background()

    // Spin up a real Postgres container
    pgContainer, err := postgres.RunContainer(ctx,
        testcontainers.WithImage("postgres:16-alpine"),
        postgres.WithDatabase("payments_test"),
        postgres.WithUsername("test"),
        postgres.WithPassword("test"),
        testcontainers.WithWaitStrategy(
            // Postgres logs this line twice during first start (init, then real start)
            wait.ForLog("database system is ready to accept connections").
                WithOccurrence(2).
                WithStartupTimeout(30*time.Second),
        ),
    )
    if err != nil {
        t.Fatal(err)
    }
    // t.Cleanup runs even when a later t.Fatal aborts the test
    t.Cleanup(func() { _ = pgContainer.Terminate(context.Background()) })

    connStr, err := pgContainer.ConnectionString(ctx, "sslmode=disable")
    if err != nil {
        t.Fatalf("ConnectionString failed: %v", err)
    }

    db, err := sql.Open("postgres", connStr)
    if err != nil {
        t.Fatalf("sql.Open failed: %v", err)
    }
    t.Cleanup(func() { _ = db.Close() })

    // Run migrations (runMigrations is the project's own helper)
    runMigrations(db)

    // Test the repository
    repo := NewPaymentRepository(db)
    payment, err := repo.Create(ctx, &Payment{Amount: 1000, Currency: "USD"})
    if err != nil {
        t.Fatalf("Create failed: %v", err)
    }
    if payment.ID == "" {
        t.Error("expected non-empty payment ID")
    }
}

# Run integration tests (requires Docker)
go test ./integration/... -tags=integration -timeout=120s

Cluster-Level Tests with envtest

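The envtest package from controller-runtime starts real kube-apiserver and etcd binaries locally, with no cluster needed, so you can test controllers, webhooks, and admission logic against a real Kubernetes API. It runs no kubelet, scheduler, or built-in controllers, so Pods are never scheduled or started; use it to assert on API objects, not on running workloads.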

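package controllers_test

import (
    "context"
    "path/filepath"
    "testing"
    "time"

    . "github.com/onsi/ginkgo/v2"
    . "github.com/onsi/gomega"
    corev1 "k8s.io/api/core/v1"
    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    "k8s.io/apimachinery/pkg/types"
    "k8s.io/client-go/kubernetes/scheme"
    "k8s.io/client-go/rest"
    "sigs.k8s.io/controller-runtime/pkg/client"
    "sigs.k8s.io/controller-runtime/pkg/envtest"

    // Assumed import path for the Payment API types; adjust to your module.
    paymentv1 "example.com/payments/api/v1"
)

var (
    testEnv   *envtest.Environment
    cfg       *rest.Config
    k8sClient client.Client
    ctx       = context.Background()
)

func TestControllers(t *testing.T) {
    RegisterFailHandler(Fail)
    RunSpecs(t, "Controller Suite")
}

var _ = BeforeSuite(func() {
    testEnv = &envtest.Environment{
        CRDDirectoryPaths: []string{
            filepath.Join("..", "config", "crd", "bases"),
        },
        ErrorIfCRDPathMissing: true,
    }
    var err error
    cfg, err = testEnv.Start()
    Expect(err).NotTo(HaveOccurred())

    // Register the Payment types (AddToScheme is the usual kubebuilder-generated
    // helper, assumed here), then build a client for the test API server.
    Expect(paymentv1.AddToScheme(scheme.Scheme)).To(Succeed())
    k8sClient, err = client.New(cfg, client.Options{Scheme: scheme.Scheme})
    Expect(err).NotTo(HaveOccurred())

    // envtest runs no controllers. Start a manager with PaymentReconciler here,
    // otherwise nothing will create the ConfigMap asserted below.
})

var _ = AfterSuite(func() {
    Expect(testEnv.Stop()).To(Succeed())
})

var _ = Describe("PaymentReconciler", func() {
    It("creates a ConfigMap when a Payment is created", func() {
        payment := &paymentv1.Payment{
            ObjectMeta: metav1.ObjectMeta{Name: "test-payment", Namespace: "default"},
            Spec:       paymentv1.PaymentSpec{Amount: 100},
        }
        Expect(k8sClient.Create(ctx, payment)).To(Succeed())

        // Poll every 100ms for up to 10s until the reconciler has created the ConfigMap
        cm := &corev1.ConfigMap{}
        Eventually(func() error {
            return k8sClient.Get(ctx, types.NamespacedName{
                Name: "payment-config", Namespace: "default",
            }, cm)
        }, 10*time.Second, 100*time.Millisecond).Should(Succeed())
    })
})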

E2E Tests — kubectl/kuttl

kuttl (KUbernetes Test TooL) runs test cases as YAML files against a real cluster. It applies manifests, asserts state, and cleans up.

# tests/e2e/payment-flow/00-create-payment.yaml
apiVersion: kuttl.dev/v1beta1
kind: TestStep
apply:
- payment.yaml

---
# tests/e2e/payment-flow/01-assert.yaml
apiVersion: kuttl.dev/v1beta1
kind: TestAssert
timeout: 60
collectors:
- type: pod
  selector: app=payments-api
---
# Assert the payment object reaches Ready state
apiVersion: payments.acme.com/v1
kind: Payment
metadata:
  name: test-payment
status:
  phase: Ready

# Install kuttl
kubectl krew install kuttl

# Run E2E tests against current cluster context
kubectl kuttl test --config kuttl-test.yaml

# kuttl-test.yaml
apiVersion: kuttl.dev/v1beta1
kind: TestSuite
testDirs:
- ./tests/e2e
startKIND: false    # use existing cluster context
timeout: 120

E2E Tests with Chainsaw (kuttl successor)

Chainsaw is a declarative end-to-end test tool from the Kyverno project. Compared with kuttl it offers richer assertion syntax, explicit try/catch/finally steps, and parallel test execution.

# chainsaw-test.yaml
apiVersion: chainsaw.kyverno.io/v1alpha1
kind: Test
metadata:
  name: payment-e2e
spec:
  steps:
  - name: create-payment
    try:
    - apply:
        file: payment.yaml
    - assert:
        file: payment-ready.yaml
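    # command runs on the machine executing chainsaw, so this cluster-internal
    # name only resolves when chainsaw itself runs inside the cluster
    - command:
        entrypoint: curl
        args: ["-sf", "http://payments-api.staging.svc.cluster.local:8080/healthz"]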
    catch:
    - describe:
        apiVersion: payments.acme.com/v1
        kind: Payment
    - podLogs:
        selector: app=payments-api
    finally:
    - delete:
        file: payment.yaml

# Run chainsaw tests
chainsaw test --test-dir ./tests/e2e

Helm Test Integration

After every Helm deploy in CI, run helm test to validate the release:

helm upgrade --install payments-api ./charts/payments-api \
  --namespace staging \
  --values values-staging.yaml \
  --wait \
  --timeout 5m

# Run helm tests immediately after
helm test payments-api --namespace staging --logs

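# If a test pod fails, helm test exits non-zero and CI fails.
# Note: --atomic only rolls back a failed install/upgrade. A failed helm test
# does not trigger a rollback, so run `helm rollback payments-api --namespace staging` explicitly.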

Policy Tests with Kyverno

Test Kyverno policies without a running cluster using kyverno test:

# kyverno-test/kyverno-test.yaml
name: require-resource-limits-test
policies:
- ../../policies/require-resource-limits.yaml
resources:
- resources/compliant-pod.yaml
- resources/non-compliant-pod.yaml
results:
- policy: require-resource-limits
  rule: check-resource-limits
  resource: compliant-pod
  kind: Pod
  result: pass
- policy: require-resource-limits
  rule: check-resource-limits
  resource: non-compliant-pod
  kind: Pod
  result: fail

# Run every test file found under kyverno-test/
kyverno test kyverno-test/
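
Conceptually, kyverno test evaluates each listed rule against each resource and compares the outcome with the declared result, failing on any mismatch. The sketch below mimics that comparison with plain Go structs. The policy file is not shown in this guide, so hasResourceLimits is only a guess at what check-resource-limits verifies (every container sets CPU and memory limits), and all type names are hypothetical.

```go
package main

import "fmt"

// Container and Pod are trimmed-down stand-ins for the Kubernetes types.
type Container struct {
	Name        string
	CPULimit    string
	MemoryLimit string
}

type Pod struct {
	Name       string
	Containers []Container
}

// hasResourceLimits is an assumed version of the check-resource-limits rule:
// the pod passes only if every container sets both limits.
func hasResourceLimits(p Pod) bool {
	for _, c := range p.Containers {
		if c.CPULimit == "" || c.MemoryLimit == "" {
			return false
		}
	}
	return len(p.Containers) > 0
}

// expectation mirrors one entry under results: in the test file.
type expectation struct {
	resource string
	result   string // "pass" or "fail"
}

// mismatches returns the resources whose actual outcome differs from the declared one.
func mismatches(pods map[string]Pod, want []expectation) []string {
	var out []string
	for _, e := range want {
		got := "fail"
		if hasResourceLimits(pods[e.resource]) {
			got = "pass"
		}
		if got != e.result {
			out = append(out, e.resource)
		}
	}
	return out
}

func main() {
	pods := map[string]Pod{
		"compliant-pod":     {Name: "compliant-pod", Containers: []Container{{"app", "500m", "256Mi"}}},
		"non-compliant-pod": {Name: "non-compliant-pod", Containers: []Container{{"app", "", ""}}},
	}
	want := []expectation{{"compliant-pod", "pass"}, {"non-compliant-pod", "fail"}}
	fmt.Println("mismatches:", mismatches(pods, want))
}
```

Declaring an expected fail for the non-compliant pod matters as much as the pass case: it proves the policy actually rejects something.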

Chaos Testing

Chaos tests verify your application survives infrastructure failures. See 09 — Disaster Recovery for the full chaos toolchain. Below is the developer-facing subset for per-service chaos testing.

chaos-mesh Pod Kill

# Kill one random payments-api pod every 5 minutes for as long as this Schedule exists.
# Chaos Mesh 2.x syntax: the 1.x spec.scheduler field was replaced by the Schedule kind.
apiVersion: chaos-mesh.org/v1alpha1
kind: Schedule
metadata:
  name: payments-api-pod-kill
  namespace: chaos-testing
spec:
  schedule: "@every 5m"
  type: PodChaos
  historyLimit: 5
  concurrencyPolicy: Forbid
  podChaos:
    action: pod-kill
    mode: one
    selector:
      namespaces: [staging]
      labelSelectors:
        app: payments-api

# Apply chaos, run load test, verify no errors
kubectl apply -f pod-kill-chaos.yaml
k6 run --duration 10m load-test.js
kubectl delete -f pod-kill-chaos.yaml

Network Latency Injection

apiVersion: chaos-mesh.org/v1alpha1
kind: NetworkChaos
metadata:
  name: payments-latency
  namespace: chaos-testing
spec:
  action: delay
  mode: all
  selector:
    namespaces: [staging]
    labelSelectors:
      app: payments-api
  delay:
    latency: "100ms"
    correlation: "25"
    jitter: "50ms"
  duration: "5m"
  direction: to            # delays packets the selected pods send (egress), so callers see slower responses

Load Testing with k6

// load-test.js
import http from 'k6/http';
import { check, sleep } from 'k6';
import { Rate } from 'k6/metrics';

const errorRate = new Rate('errors');

export const options = {
  stages: [
    { duration: '2m', target: 50 },    // ramp up to 50 VUs
    { duration: '5m', target: 50 },    // hold at 50 VUs
    { duration: '2m', target: 100 },   // ramp up to 100 VUs
    { duration: '5m', target: 100 },   // hold at 100 VUs
    { duration: '2m', target: 0 },     // ramp down
  ],
  thresholds: {
    http_req_duration: ['p(99)<500'],  // 99th percentile < 500ms
    errors: ['rate<0.01'],             // error rate < 1%
  },
};

export default function () {
  const res = http.post(
    'http://payments-api.staging.svc.cluster.local:8080/payments',
    JSON.stringify({ amount: 1000, currency: 'USD', idempotency_key: `key-${__VU}-${__ITER}` }),
    { headers: { 'Content-Type': 'application/json' } }
  );

  check(res, {
    'status is 201': (r) => r.status === 201,
    'response time < 200ms': (r) => r.timings.duration < 200,
  });

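  // Count anything other than 201 as an error; k6 reports status 0 for
  // connection failures and timeouts, which a ">= 400" check would miss.
  errorRate.add(res.status !== 201);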
  sleep(0.1);
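}

# Run k6 from a pod inside the cluster so the Service DNS name resolves and ingress latency is excluded.
# Use -i without -t because stdin is a file, not a terminal.
kubectl run k6 --image=grafana/k6:latest --rm -i --restart=Never -- \
  run - < load-test.js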
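
To make the two thresholds in load-test.js concrete, the sketch below evaluates the same conditions (p99 under 500 ms, error rate under 1%) over a set of samples. It uses the nearest-rank percentile definition, which may differ slightly from how k6 computes percentiles internally; the sample type and function names are hypothetical.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// sample is one recorded request: latency in milliseconds and HTTP status.
type sample struct {
	durationMs float64
	status     int
}

// percentile returns the nearest-rank p-th percentile (0 < p <= 100) of values.
func percentile(values []float64, p float64) float64 {
	if len(values) == 0 {
		return 0
	}
	sorted := append([]float64(nil), values...)
	sort.Float64s(sorted)
	rank := int(math.Ceil(p * float64(len(sorted)) / 100))
	if rank < 1 {
		rank = 1
	}
	return sorted[rank-1]
}

// evaluate applies the two thresholds from the script: p(99) < 500 ms
// and error rate < 1%, where any status other than 201 counts as an error.
func evaluate(samples []sample) (p99, errRate float64, pass bool) {
	if len(samples) == 0 {
		return 0, 0, false
	}
	durations := make([]float64, 0, len(samples))
	failed := 0
	for _, s := range samples {
		durations = append(durations, s.durationMs)
		if s.status != 201 {
			failed++
		}
	}
	p99 = percentile(durations, 99)
	errRate = float64(failed) / float64(len(samples))
	return p99, errRate, p99 < 500 && errRate < 0.01
}

func main() {
	var samples []sample
	for i := 1; i <= 100; i++ {
		samples = append(samples, sample{durationMs: float64(i), status: 201})
	}
	p99, errRate, pass := evaluate(samples)
	fmt.Printf("p99=%.0fms errRate=%.2f pass=%v\n", p99, errRate, pass)
}
```

Both thresholds are strict inequalities, so exactly 1 failure in 100 requests (a rate of 0.01) fails the run.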

CI Test Stages Summary

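| Stage                | Tool                           | Runs on                    | Blocks merge?      |
|----------------------|--------------------------------|----------------------------|--------------------|
| Unit tests           | go test / jest / pytest        | Every PR                   | Yes                |
| Lint + type check    | golangci-lint / eslint / mypy  | Every PR                   | Yes                |
| Security scan (code) | gosec / semgrep                | Every PR                   | Yes                |
| Image build          | BuildKit                       | Every PR                   | Yes                |
| Image scan (CVE)     | Trivy                          | Every PR                   | Yes (CRITICAL)     |
| Policy test          | kyverno test                   | Every PR                   | Yes                |
| Integration tests    | testcontainers                 | Every PR                   | Yes                |
| Helm test            | helm test                      | Post-deploy to staging     | Yes                |
| E2E tests            | kuttl / chainsaw               | Post-deploy to staging     | Yes                |
| Load test            | k6                             | Scheduled / release branch | No (informational) |
| Chaos test           | chaos-mesh                     | Scheduled weekly           | No                 |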