Testing Strategies

Overview

Testing Kubernetes applications requires more layers in the test pyramid than traditional software. On top of unit and integration tests, you need cluster-level tests that validate Kubernetes-specific behaviour: Does the pod start? Do the probes pass? Does the NetworkPolicy allow only the intended traffic? Does the HPA scale as configured?

                    ┌──────────────┐
                    │  Chaos Tests │  ← infrastructure resilience
                    └──────┬───────┘
                   ┌───────┴────────┐
                   │  E2E Tests     │  ← full user journeys in cluster
                   └───────┬────────┘
                  ┌────────┴─────────┐
                  │  Contract Tests  │  ← API consumer/producer contracts
                  └────────┬─────────┘
              ┌────────────┴──────────────┐
              │  Integration Tests        │  ← service + DB + K8s primitives
              └────────────┬──────────────┘
         ┌──────────────────┴───────────────────┐
         │  Unit Tests                           │  ← pure logic, no I/O
         └───────────────────────────────────────┘

Unit Tests

Unit tests run without a cluster. They test pure business logic, parsing, transformations. Target: < 1 second per test, 80%+ coverage on core logic.

// Go example — test business logic, mock external dependencies
func TestCalculatePaymentFee(t *testing.T) {
    tests := []struct {
        name     string
        amount   int64
        currency string
        want     int64
    }{
        {"USD small", 1000, "USD", 29},
        {"EUR small", 1000, "EUR", 25},
        {"USD large", 100000, "USD", 230},
    }
    for _, tc := range tests {
        t.Run(tc.name, func(t *testing.T) {
            got := CalculatePaymentFee(tc.amount, tc.currency)
            if got != tc.want {
                t.Errorf("CalculatePaymentFee(%d, %s) = %d, want %d",
                    tc.amount, tc.currency, got, tc.want)
            }
        })
    }
}

# Run unit tests with race detection and coverage
go test ./... -race -coverprofile=coverage.out -covermode=atomic

# View coverage report
go tool cover -html=coverage.out

# Run specific package
go test ./internal/payments/... -v -run TestCalculatePaymentFee

Integration Tests

Integration tests verify your service interacts correctly with real dependencies — a real database, a real message queue, a real Redis — but not the full Kubernetes cluster. Use testcontainers-go or docker-compose to spin up dependencies.

testcontainers-go

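//go:build integration

package integration_test

import (
    "context"
    "database/sql"
    "testing"
    "time"

    _ "github.com/lib/pq" // registers the "postgres" driver with database/sql
    "github.com/testcontainers/testcontainers-go"
    "github.com/testcontainers/testcontainers-go/modules/postgres"
    "github.com/testcontainers/testcontainers-go/wait"
)

// NewPaymentRepository, Payment, and runMigrations belong to the service under
// test; import them from your own module.
func TestPaymentRepository(t *testing.T) {
    ctx := context.Background()

    // Spin up a real Postgres container.
    // Newer testcontainers-go releases offer postgres.Run(ctx, image, opts...)
    // as the replacement for RunContainer.
    pgContainer, err := postgres.RunContainer(ctx,
        testcontainers.WithImage("postgres:16-alpine"),
        postgres.WithDatabase("payments_test"),
        postgres.WithUsername("test"),
        postgres.WithPassword("test"),
        testcontainers.WithWaitStrategy(
            // Postgres logs this line twice: once during init, once when ready.
            wait.ForLog("database system is ready to accept connections").
                WithOccurrence(2).
                WithStartupTimeout(30*time.Second),
        ),
    )
    if err != nil {
        t.Fatalf("start postgres container: %v", err)
    }
    t.Cleanup(func() {
        if err := pgContainer.Terminate(ctx); err != nil {
            t.Logf("terminate postgres container: %v", err)
        }
    })

    connStr, err := pgContainer.ConnectionString(ctx, "sslmode=disable")
    if err != nil {
        t.Fatalf("connection string: %v", err)
    }

    db, err := sql.Open("postgres", connStr)
    if err != nil {
        t.Fatalf("open database: %v", err)
    }
    t.Cleanup(func() { db.Close() })

    // Run migrations
    runMigrations(db)

    // Test the repository
    repo := NewPaymentRepository(db)
    payment, err := repo.Create(ctx, &Payment{Amount: 1000, Currency: "USD"})
    if err != nil {
        t.Fatalf("Create failed: %v", err)
    }
    if payment.ID == "" {
        t.Error("expected non-empty payment ID")
    }
}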

# Run integration tests (requires Docker)
go test ./integration/... -tags=integration -timeout=120s

Cluster-Level Tests with envtest

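controller-runtime/envtest starts a real kube-apiserver and etcd locally, with no cluster needed, and lets you test controllers, webhooks, and admission logic against a real Kubernetes API. It runs no kubelet, scheduler, or built-in controllers, so Pods are never scheduled or started; behaviour that depends on running workloads belongs in the E2E tests described later in this section.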

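package controllers_test

import (
    "context"
    "path/filepath"
    "testing"
    "time"

    . "github.com/onsi/ginkgo/v2"
    . "github.com/onsi/gomega"
    corev1 "k8s.io/api/core/v1"
    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    "k8s.io/apimachinery/pkg/types"
    "k8s.io/client-go/kubernetes/scheme"
    "k8s.io/client-go/rest"
    "sigs.k8s.io/controller-runtime/pkg/client"
    "sigs.k8s.io/controller-runtime/pkg/envtest"

    paymentv1 "example.com/payments/api/v1" // placeholder path: your API types package
)

var (
    testEnv   *envtest.Environment
    cfg       *rest.Config
    k8sClient client.Client
    ctx       = context.Background()
)

func TestControllers(t *testing.T) {
    RegisterFailHandler(Fail)
    RunSpecs(t, "Controller Suite")
}

var _ = BeforeSuite(func() {
    testEnv = &envtest.Environment{
        CRDDirectoryPaths: []string{
            filepath.Join("..", "config", "crd", "bases"),
        },
        ErrorIfCRDPathMissing: true,
    }
    var err error
    cfg, err = testEnv.Start()
    Expect(err).NotTo(HaveOccurred())

    // Register the Payment types (AddToScheme is assumed to be the usual
    // kubebuilder-generated helper), then build a client for the test API server.
    Expect(paymentv1.AddToScheme(scheme.Scheme)).To(Succeed())
    k8sClient, err = client.New(cfg, client.Options{Scheme: scheme.Scheme})
    Expect(err).NotTo(HaveOccurred())

    // envtest does not run your controller. Start a manager with
    // PaymentReconciler registered here (in a goroutine); otherwise the
    // ConfigMap asserted below is never created.
})

var _ = AfterSuite(func() {
    Expect(testEnv.Stop()).To(Succeed())
})

var _ = Describe("PaymentReconciler", func() {
    It("creates a ConfigMap when a Payment is created", func() {
        payment := &paymentv1.Payment{
            ObjectMeta: metav1.ObjectMeta{Name: "test-payment", Namespace: "default"},
            Spec:       paymentv1.PaymentSpec{Amount: 100},
        }
        Expect(k8sClient.Create(ctx, payment)).To(Succeed())

        // Reconciliation is asynchronous, so poll until the ConfigMap appears.
        cm := &corev1.ConfigMap{}
        Eventually(func() error {
            return k8sClient.Get(ctx, types.NamespacedName{
                Name: "payment-config", Namespace: "default",
            }, cm)
        }, 10*time.Second, 100*time.Millisecond).Should(Succeed())
    })
})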

E2E Tests — kubectl/kuttl

kuttl (KUbernetes Test TooL) runs test cases as YAML files against a real cluster. It applies manifests, asserts state, and cleans up.

# tests/e2e/payment-flow/00-create-payment.yaml
apiVersion: kuttl.dev/v1beta1
kind: TestStep
apply:
- payment.yaml

---
# tests/e2e/payment-flow/01-assert.yaml
apiVersion: kuttl.dev/v1beta1
kind: TestAssert
timeout: 60
collectors:
- type: pod
  selector: app=payments-api
---
# Assert the payment object reaches Ready state
apiVersion: payments.acme.com/v1
kind: Payment
metadata:
  name: test-payment
status:
  phase: Ready

# Install kuttl
kubectl krew install kuttl

# Run E2E tests against current cluster context
kubectl kuttl test --config kuttl-test.yaml

# kuttl-test.yaml
apiVersion: kuttl.dev/v1beta1
kind: TestSuite
testDirs:
- ./tests/e2e
startKIND: false    # use existing cluster context
timeout: 120

E2E Tests with Chainsaw (kuttl successor)

Chainsaw is the next-generation test tool from the Kyverno project, with better assertion syntax and parallel test execution.

# chainsaw-test.yaml
apiVersion: chainsaw.kyverno.io/v1alpha1
kind: Test
metadata:
  name: payment-e2e
spec:
  steps:
  - name: create-payment
    try:
    - apply:
        file: payment.yaml
    - assert:
        file: payment-ready.yaml
    # command runs on the machine executing chainsaw, so the URL must be reachable from there
    - command:
        entrypoint: curl
        args: ["-sf", "http://payments-api.staging:8080/healthz"]
    catch:
    - describe:
        apiVersion: payments.acme.com/v1
        kind: Payment
    - podLogs:
        selector: app=payments-api
    finally:
    - delete:
        file: payment.yaml

# Run chainsaw tests
chainsaw test --test-dir ./tests/e2e

Helm Test Integration

After every Helm deploy in CI, run helm test to validate the release:

helm upgrade --install payments-api ./charts/payments-api \
  --namespace staging \
  --values values-staging.yaml \
  --wait \
  --timeout 5m

# Run helm tests immediately after
helm test payments-api --namespace staging --logs

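# helm test exits non-zero when a test pod fails, which fails the CI job.
# It does not roll back the release: --atomic only covers a failed install or upgrade.
# To revert after a failed test, run: helm rollback payments-api --namespace staging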

Policy Tests with Kyverno

Test Kyverno policies without a running cluster using kyverno test:

# kyverno-test/kyverno-test.yaml
name: require-resource-limits-test
policies:
- ../../policies/require-resource-limits.yaml
resources:
- resources/compliant-pod.yaml
- resources/non-compliant-pod.yaml
results:
- policy: require-resource-limits
  rule: check-resource-limits
  resource: compliant-pod
  kind: Pod
  result: pass
- policy: require-resource-limits
  rule: check-resource-limits
  resource: non-compliant-pod
  kind: Pod
  result: fail

kyverno test kyverno-test/

Chaos Testing

Chaos tests verify your application survives infrastructure failures. See 09 — Disaster Recovery for the full chaos toolchain. Below is the developer-facing subset for per-service chaos testing.

chaos-mesh Pod Kill

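# Kill one random payments-api pod every 5 minutes for as long as this Schedule exists.
# Chaos Mesh 2.x syntax: recurring experiments use a Schedule object
# (the spec.scheduler field on PodChaos is 1.x only).
apiVersion: chaos-mesh.org/v1alpha1
kind: Schedule
metadata:
  name: payments-api-pod-kill
  namespace: chaos-testing
spec:
  schedule: "@every 5m"
  type: PodChaos
  historyLimit: 5
  concurrencyPolicy: Forbid
  podChaos:
    action: pod-kill
    mode: one
    selector:
      namespaces: [staging]
      labelSelectors:
        app: payments-api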

# Apply chaos, run load test, verify no errors
kubectl apply -f pod-kill-chaos.yaml
k6 run --duration 10m load-test.js
kubectl delete -f pod-kill-chaos.yaml

Network Latency Injection

apiVersion: chaos-mesh.org/v1alpha1
kind: NetworkChaos
metadata:
  name: payments-latency
  namespace: chaos-testing
spec:
  action: delay
  mode: all
  selector:
    namespaces: [staging]
    labelSelectors:
      app: payments-api
  delay:
    latency: "100ms"
    correlation: "25"
    jitter: "50ms"
  duration: "5m"
  direction: to            # delay packets sent by the selected pods (the default)

Load Testing with k6

// load-test.js
import http from 'k6/http';
import { check, sleep } from 'k6';
import { Rate } from 'k6/metrics';

const errorRate = new Rate('errors');

export const options = {
  stages: [
    { duration: '2m', target: 50 },    // ramp up to 50 VUs
    { duration: '5m', target: 50 },    // hold at 50 VUs
    { duration: '2m', target: 100 },   // ramp up to 100 VUs
    { duration: '5m', target: 100 },   // hold at 100 VUs
    { duration: '2m', target: 0 },     // ramp down
  ],
  thresholds: {
    http_req_duration: ['p(99)<500'],  // 99th percentile < 500ms
    errors: ['rate<0.01'],             // error rate < 1%
  },
};

export default function () {
  const res = http.post(
    'http://payments-api.staging.svc.cluster.local:8080/payments',
    JSON.stringify({ amount: 1000, currency: 'USD', idempotency_key: `key-${__VU}-${__ITER}` }),
    { headers: { 'Content-Type': 'application/json' } }
  );

  check(res, {
    'status is 201': (r) => r.status === 201,
    'response time < 200ms': (r) => r.timings.duration < 200,
  });

  errorRate.add(res.status !== 201);  // also counts network failures, which report status 0
  sleep(0.1);
}

# Run k6 from a pod inside the cluster so the Service DNS name resolves.
# Use -i without -t: stdin is a redirected file here, not a terminal.
kubectl run k6 --image=grafana/k6:latest --rm -i --restart=Never -- \
  run - < load-test.js

CI Test Stages Summary

Stage	Tool	Runs on	Blocks merge?
Unit tests	go test / jest / pytest	Every PR	Yes
Lint + type check	golangci-lint / eslint / mypy	Every PR	Yes
Security scan (code)	gosec / semgrep	Every PR	Yes
Image build	BuildKit	Every PR	Yes
Image scan (CVE)	Trivy	Every PR	Yes (CRITICAL)
Policy test	kyverno test	Every PR	Yes
Integration tests	testcontainers	Every PR	Yes
Helm test	helm test	Post-deploy to staging	Yes
E2E tests	kuttl / chainsaw	Post-deploy to staging	Yes
Load test	k6	Scheduled / release branch	No (informational)
Chaos test	chaos-mesh	Scheduled weekly	No

Related

Local Development: running tests against a local cluster
CI/CD Pipelines: where tests plug into the pipeline
Progressive Delivery: using test results to gate canary promotion
Disaster Recovery: full chaos testing