Progressive Delivery

Overview

Progressive delivery is the practice of releasing changes to a controlled subset of users before rolling them out fully. It turns deployment from a binary event (old → new) into a measurable process with automatic rollback if metrics degrade.

Traditional deployment:
  v1 → v2 (all users, instant, no rollback without re-deploy)

Progressive delivery:
  v1 → 5% v2 (measure) → 25% v2 (measure) → 100% v2
            ↕                     ↕
         rollback              rollback
         if error             if latency ↑
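
The staged flow above can be sketched as a loop over traffic weights with a health gate at each stage. This is a minimal illustration, not how any particular controller implements it; `Rollout` and `HealthCheck` are hypothetical names, and the health callback stands in for a real metrics query.

```go
package main

import "fmt"

// HealthCheck reports whether the new version looks healthy at the given
// traffic weight. Hypothetical callback: a real gate would query metrics.
type HealthCheck func(weight int) bool

// Rollout applies each weight in order and checks the gate after each step.
// It returns the final weight of the new version and whether a rollback
// happened. On a failed gate all traffic goes back to the old version (0).
func Rollout(weights []int, healthy HealthCheck) (final int, rolledBack bool) {
	for _, w := range weights {
		final = w
		if !healthy(w) {
			return 0, true
		}
	}
	return final, false
}

func main() {
	weights := []int{5, 25, 100}

	w, rb := Rollout(weights, func(int) bool { return true })
	fmt.Println(w, rb) // 100 false

	// Simulate a latency regression that only shows up at 25% traffic.
	w, rb = Rollout(weights, func(weight int) bool { return weight < 25 })
	fmt.Println(w, rb) // 0 true
}
```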

Strategies

Strategy	Traffic split	Rollback speed	Complexity	Best for
Rolling update	Gradual pod replacement	Medium (pod restart)	Low	Stateless services, low risk
Canary	% of requests to new version	Fast (weight change)	Medium	APIs, high-traffic services
Blue-Green	100% switch on DNS/LB	Instant	High (2x resources)	Stateful migrations, big-bang changes
A/B test	Header/cookie routing	Fast	High	Feature experiments with user segmentation
Shadow	Mirror traffic to new version	N/A (no user impact)	High	Data pipelines, ML models
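
To make the canary row concrete, here is a small sketch of a percentage-based split. It assumes a sticky variant in which a request key (for example a user ID) is hashed into one of 100 buckets; ingress controllers usually split per request instead, so treat `InCanary` as a hypothetical helper rather than what NGINX or Argo Rollouts does internally.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// InCanary reports whether key falls into the canary slice for the given
// weight (0-100). The same key always maps to the same bucket, so raising
// the weight only ever adds keys to the canary group.
func InCanary(key string, weightPercent uint32) bool {
	h := fnv.New32a()
	h.Write([]byte(key)) // Write on an FNV hash never returns an error
	return h.Sum32()%100 < weightPercent
}

func main() {
	fmt.Println(InCanary("user-42", 0))   // false: weight 0 sends nobody to the canary
	fmt.Println(InCanary("user-42", 100)) // true: weight 100 sends everybody
}
```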

Argo Rollouts

Argo Rollouts provides a Rollout resource that takes the place of a Kubernetes Deployment and adds canary and blue-green strategies, metric-based promotion, and integration with Argo CD.

Installation

kubectl create namespace argo-rollouts
kubectl apply -n argo-rollouts \
  -f https://github.com/argoproj/argo-rollouts/releases/latest/download/install.yaml

# Install kubectl plugin
kubectl krew install argo-rollouts

Canary Rollout

apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: payments-api
  namespace: production
spec:
  replicas: 10
  selector:
    matchLabels:
      app: payments-api
  template:
    metadata:
      labels:
        app: payments-api
    spec:
      containers:
      - name: payments-api
        image: ghcr.io/acme/payments-api:sha-abc123
        ports:
        - containerPort: 8080
        resources:
          requests:
            cpu: 200m
            memory: 256Mi

  strategy:
    canary:
      # Traffic shifting via NGINX Ingress
      canaryService: payments-api-canary
      stableService: payments-api-stable
      trafficRouting:
        nginx:
          stableIngress: payments-api-ingress
          annotationPrefix: nginx.ingress.kubernetes.io

      steps:
      - setWeight: 5              # 5% to canary
      - pause: {duration: 5m}    # wait 5 minutes
      - analysis:                 # run metrics analysis
          templates:
          - templateName: success-rate
          args:
          - name: service-name
            value: payments-api-canary
      - setWeight: 25
      - pause: {duration: 10m}
      - analysis:
          templates:
          - templateName: success-rate
          - templateName: latency-p99
          args:
          - name: service-name
            value: payments-api-canary
      - setWeight: 50
      - pause: {duration: 10m}
      - setWeight: 100            # promote to stable

      # No extra flag is needed for rollback: a failed analysis step aborts
      # the rollout and shifts traffic back to the stable version.

AnalysisTemplate — Prometheus Metrics

apiVersion: argoproj.io/v1alpha1
kind: AnalysisTemplate
metadata:
  name: success-rate
  namespace: production
spec:
  args:
  - name: service-name

  metrics:
  - name: success-rate
    interval: 60s
    count: 5                  # take 5 measurements, one every 60s
    successCondition: result[0] >= 0.99    # 99% success rate required
    failureLimit: 1
    provider:
      prometheus:
        address: http://prometheus-operated.monitoring:9090
        query: |
          sum(
            rate(http_requests_total{
              service="{{args.service-name}}",
              status_code!~"5.."
            }[2m])
          ) /
          sum(
            rate(http_requests_total{
              service="{{args.service-name}}"
            }[2m])
          )

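---
apiVersion: argoproj.io/v1alpha1
kind: AnalysisTemplate
metadata:
  name: latency-p99
  namespace: production
spec:
  args:
  - name: service-name

  metrics:
  - name: latency-p99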
    interval: 60s
    count: 5
    successCondition: result[0] <= 0.5    # p99 latency <= 500ms
    failureLimit: 1
    provider:
      prometheus:
        address: http://prometheus-operated.monitoring:9090
        query: |
          histogram_quantile(0.99,
            sum by (le) (
              rate(http_request_duration_seconds_bucket{
                service="{{args.service-name}}"
              }[2m])
            )
          )
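
How `count`, `successCondition`, and `failureLimit` interact can be sketched in a few lines. The sketch assumes that a metric fails once the number of failed measurements exceeds `failureLimit`; `Evaluate` is a hypothetical helper, not part of Argo Rollouts.

```go
package main

import "fmt"

// Evaluate applies a success condition to a series of measurements.
// It tolerates up to failureLimit failed measurements and reports failure
// as soon as that limit is exceeded.
func Evaluate(results []float64, ok func(float64) bool, failureLimit int) bool {
	failures := 0
	for _, r := range results {
		if ok(r) {
			continue
		}
		failures++
		if failures > failureLimit {
			return false
		}
	}
	return true
}

func main() {
	atLeast99 := func(r float64) bool { return r >= 0.99 }

	// One dip below 99% is tolerated with failureLimit 1.
	fmt.Println(Evaluate([]float64{0.995, 0.980, 0.999, 0.997, 0.996}, atLeast99, 1)) // true

	// A second dip exceeds the limit and fails the analysis.
	fmt.Println(Evaluate([]float64{0.995, 0.980, 0.970, 0.997, 0.996}, atLeast99, 1)) // false
}
```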

Rollout Operations

# Watch rollout progress
kubectl argo rollouts get rollout payments-api -n production --watch

# Manually promote a paused rollout (an indefinite `pause: {}` canary step, or blue-green with autoPromotionEnabled: false)
kubectl argo rollouts promote payments-api -n production

# Skip remaining steps and go to 100% (force promote)
kubectl argo rollouts promote payments-api -n production --full

# Abort a rollout (immediately rolls back to stable)
kubectl argo rollouts abort payments-api -n production

# Manually retry a failed rollout after fixing the issue
kubectl argo rollouts retry rollout payments-api -n production

# Update the container image, which starts a new rollout
kubectl argo rollouts set image payments-api \
  payments-api=ghcr.io/acme/payments-api:sha-newsha -n production

Blue-Green with Argo Rollouts

apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: payments-api-bg
  namespace: production
spec:
  replicas: 5
  selector:
    matchLabels:
      app: payments-api
  template:
    metadata:
      labels:
        app: payments-api
    spec:
      containers:
      - name: payments-api
        image: ghcr.io/acme/payments-api:sha-abc123

  strategy:
    blueGreen:
      activeService: payments-api-active      # receives 100% production traffic
      previewService: payments-api-preview    # serves the new version for testing; no production traffic until promoted
      autoPromotionEnabled: false             # require manual promote
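      prePromotionAnalysis:                   # must pass before promotion; needs test traffic on preview, otherwise the success-rate query returns no data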
        templates:
        - templateName: success-rate
        args:
        - name: service-name
          value: payments-api-preview
      scaleDownDelaySeconds: 300              # keep old (blue) pods for 5min after switch

# After a new image is deployed, the preview service points at the new pods.
# Smoke test against preview before promoting.
kubectl port-forward svc/payments-api-preview 8080:8080 -n production &
curl http://localhost:8080/healthz

# Promote blue-green (switches active service selector)
kubectl argo rollouts promote payments-api-bg -n production

Flagger — Alternative to Argo Rollouts

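Flagger integrates with Flux and supports Istio, NGINX, Contour, and more as traffic routers. Unlike Argo Rollouts, it leaves the existing Deployment in place and references it through targetRef.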

apiVersion: flagger.app/v1beta1
kind: Canary
metadata:
  name: payments-api
  namespace: production
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: payments-api
  progressDeadlineSeconds: 600

  service:
    port: 8080
    targetPort: 8080
    trafficPolicy:           # Istio traffic policy; only applies when Istio is the router
      tls:
        mode: DISABLE

  # Canary analysis configuration
  analysis:
    interval: 1m
    threshold: 5             # max 5 failed analysis checks before rollback
    maxWeight: 50            # max 50% canary traffic
    stepWeight: 5            # increase by 5% per interval

    metrics:
    - name: request-success-rate
      thresholdRange:
        min: 99              # minimum 99% success rate
      interval: 1m
    - name: request-duration
      thresholdRange:
        max: 500             # maximum 500ms p99 latency
      interval: 1m

    # Webhooks: a pre-rollout hook runs before any traffic is routed to the canary
    webhooks:
    - name: smoke-test
      type: pre-rollout
      url: http://smoke-tester.testing/
      timeout: 30s
      metadata:
        type: bash
        cmd: "curl -sd 'test' http://payments-api-canary.production:8080/payments | grep '201'"

Feature Flags — Decoupling Deploy from Release

Feature flags let you deploy code that is inactive until explicitly enabled — removing the coupling between git merge and user-visible change.

OpenFeature (vendor-neutral SDK)

// Go service using the OpenFeature SDK
import (
    "context"

    "github.com/open-feature/go-sdk/pkg/openfeature"
    flagd "github.com/open-feature/go-sdk-contrib/providers/flagd/pkg"
)

// flagClient evaluates flags through the globally registered provider.
var flagClient *openfeature.Client

func init() {
    openfeature.SetProvider(flagd.NewProvider(
        flagd.WithHost("flagd.production.svc.cluster.local"),
        flagd.WithPort(8013),
    ))
    flagClient = openfeature.NewClient("payments-api")
}

func HandlePayment(ctx context.Context, req *PaymentRequest) (*PaymentResponse, error) {
    // Check the feature flag before using the new payment processor.
    // If evaluation fails, the SDK returns the default value (false),
    // so the legacy path is the fallback and the error is ignored here.
    useNewProcessor, _ := flagClient.BooleanValue(
        ctx, "new-payment-processor", false,
        openfeature.NewEvaluationContext("", map[string]interface{}{
            "user_id":  req.UserID,
            "country":  req.Country,
        }),
    )

    if useNewProcessor {
        return newProcessor.Process(ctx, req)
    }
    return legacyProcessor.Process(ctx, req)
}

# flagd ConfigMap — feature flag definitions
apiVersion: v1
kind: ConfigMap
metadata:
  name: flagd-config
  namespace: production
data:
  flags.json: |
    {
      "$schema": "https://flagd.dev/schema/v0/flags.json",
      "flags": {
        "new-payment-processor": {
          "state": "ENABLED",
          "variants": {
            "on": true,
            "off": false
          },
          "defaultVariant": "off",
          "targeting": {
            "if": [
              {"in": [{"var": "country"}, ["US", "CA"]]},
              "on", "off"
            ]
          }
        }
      }
    }
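
The targeting rule in this flag definition reads as: if `country` is US or CA, serve the `on` variant, otherwise `off`. The sketch below mirrors that logic by hand to show what a caller should expect; it is not flagd's evaluator, and `ResolveNewPaymentProcessor` is a hypothetical name.

```go
package main

import "fmt"

// variants mirrors the "variants" object of the flag definition.
var variants = map[string]bool{"on": true, "off": false}

// ResolveNewPaymentProcessor mirrors the targeting rule:
// country in ["US", "CA"] selects "on", anything else selects "off".
func ResolveNewPaymentProcessor(country string) bool {
	variant := "off"
	switch country {
	case "US", "CA":
		variant = "on"
	}
	return variants[variant]
}

func main() {
	fmt.Println(ResolveNewPaymentProcessor("US")) // true
	fmt.Println(ResolveNewPaymentProcessor("CA")) // true
	fmt.Println(ResolveNewPaymentProcessor("DE")) // false
}
```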

Argo CD + Argo Rollouts Integration

# Argo CD Application watching a Rollout
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: payments-api
  namespace: argocd
spec:
  project: default                # required field; assumes Argo CD's built-in default project
  source:
    repoURL: https://github.com/acme/k8s-config
    targetRevision: main
    path: services/payments-api/overlays/production
  destination:
    server: https://kubernetes.default.svc
    namespace: production
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
    syncOptions:
    - RespectIgnoreDifferences=true
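  # Ignore fields that are changed outside Git: replicas (for example by an HPA)
  # and the image (only when it is set imperatively; drop that pointer if image tags live in Git)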
  ignoreDifferences:
  - group: argoproj.io
    kind: Rollout
    jsonPointers:
    - /spec/replicas
    - /spec/template/spec/containers/0/image

Rollout Metrics and SLO Gates

# PromQL — track canary vs stable error rate during rollout
sum(rate(http_requests_total{
  service=~"payments-api-canary|payments-api-stable",
  status_code=~"5.."
}[5m])) by (service)
/
sum(rate(http_requests_total{
  service=~"payments-api-canary|payments-api-stable"
}[5m])) by (service)
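
One way to turn the two ratios from this query into a promotion gate is to compare the canary against the stable version with a tolerance. This is a sketch under assumptions: the relative comparison, the tolerance value, and the function names are illustrative, and the inputs would come from the query results.

```go
package main

import "fmt"

// ErrorRatio returns errors/total. ok is false when there was no traffic,
// which is the case where the PromQL division yields no usable value.
func ErrorRatio(errors, total float64) (ratio float64, ok bool) {
	if total <= 0 {
		return 0, false
	}
	return errors / total, true
}

// CanaryWithinBudget reports whether the canary error ratio is at most the
// stable error ratio plus tolerance. Missing data on either side fails the
// gate so that an empty signal never promotes a release.
func CanaryWithinBudget(canaryErr, canaryTotal, stableErr, stableTotal, tolerance float64) bool {
	c, okC := ErrorRatio(canaryErr, canaryTotal)
	s, okS := ErrorRatio(stableErr, stableTotal)
	if !okC || !okS {
		return false
	}
	return c <= s+tolerance
}

func main() {
	// Canary 0.2% errors vs stable ~0.21%, tolerance 0.5 percentage points.
	fmt.Println(CanaryWithinBudget(1, 500, 20, 9500, 0.005)) // true

	// Canary 2% errors is well outside the budget.
	fmt.Println(CanaryWithinBudget(10, 500, 20, 9500, 0.005)) // false

	// No canary traffic yet: do not promote.
	fmt.Println(CanaryWithinBudget(0, 0, 20, 9500, 0.005)) // false
}
```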

# Argo Rollouts dashboard
kubectl argo rollouts dashboard     # opens at http://localhost:3100

Related

CI/CD Pipelines: image builds that feed into rollouts
Testing Strategies: AnalysisTemplates reference metrics from load tests
SRE Practices: SLO-based promotion gates
09 Production Overview: change management tiers