Progressive Delivery
Overview
Progressive delivery is the practice of releasing changes to a controlled subset of users before rolling them out fully. It turns deployment from a binary event (old → new) into a measurable process with automatic rollback if metrics degrade.
Traditional deployment:
  v1 → v2   (all users, instant, no rollback without re-deploy)

Progressive delivery:
  v1 → 5% v2 (measure) → 25% v2 (measure) → 100% v2
            ↕                     ↕
         rollback              rollback
         if error              if latency ↑
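The measure-then-advance loop in the diagram can be sketched in a few lines of Go. This is an illustrative model only: the `gate` type and `progress` function are hypothetical names, and a real controller would query a metrics backend rather than call an in-process function.

```go
package main

import "fmt"

// gate is a hypothetical health check: it reports whether metrics are
// acceptable while the new version serves `weight` percent of traffic.
type gate func(weight int) bool

// progress walks the weight steps in order. It returns the final traffic
// weight for the new version and whether the rollout completed.
// A failed gate sends the weight back to 0, which models a rollback.
func progress(steps []int, healthy gate) (weight int, promoted bool) {
	for _, w := range steps {
		if !healthy(w) {
			return 0, false
		}
		weight = w
	}
	return weight, true
}

func main() {
	steps := []int{5, 25, 100}

	// Simulated failure: metrics degrade once the new version gets more than 5%.
	w, ok := progress(steps, func(weight int) bool { return weight <= 5 })
	fmt.Println(w, ok) // 0 false

	// Healthy at every step: the new version ends up with all traffic.
	w, ok = progress(steps, func(int) bool { return true })
	fmt.Println(w, ok) // 100 true
}
```

The point of the model is that every step is reversible: the rollout never reaches the next weight unless the previous one passed its check.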
Strategies
| Strategy | Traffic split | Rollback speed | Complexity | Best for |
|---|---|---|---|---|
| Rolling update | Gradual pod replacement | Medium (pod restart) | Low | Stateless services, low risk |
| Canary | % of requests to new version | Fast (weight change) | Medium | APIs, high-traffic services |
| Blue-Green | 100% switch on DNS/LB | Instant | High (2x resources) | Stateful migrations, big-bang changes |
| A/B test | Header/cookie routing | Fast | High | Feature experiments with user segmentation |
| Shadow | Mirror traffic to new version | N/A (no user impact) | High | Data pipelines, ML models |
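To make the difference between the canary and A/B rows concrete, here is a minimal Go sketch of the two routing decisions. It is a simplification under stated assumptions: the function names and the `X-Experiment-Group` header are hypothetical, and the weight split hashes the user ID so that a given user always lands on the same version, whereas an ingress controller typically applies the weight to individual requests.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// bucket maps a user ID to a stable value in [0, 100).
func bucket(userID string) uint32 {
	h := fnv.New32a()
	h.Write([]byte(userID))
	return h.Sum32() % 100
}

// pickByWeight models a canary split: users whose bucket falls below
// `weight` go to the canary, everyone else stays on stable.
func pickByWeight(userID string, weight uint32) string {
	if bucket(userID) < weight {
		return "canary"
	}
	return "stable"
}

// pickByHeader models A/B routing: the decision depends on an explicit
// request attribute instead of a percentage.
func pickByHeader(headers map[string]string) string {
	if headers["X-Experiment-Group"] == "b" { // hypothetical header name
		return "canary"
	}
	return "stable"
}

func main() {
	fmt.Println(pickByWeight("user-42", 0))   // stable: a weight of 0 sends nobody to the canary
	fmt.Println(pickByWeight("user-42", 100)) // canary: a weight of 100 sends everybody
	fmt.Println(pickByHeader(map[string]string{"X-Experiment-Group": "b"})) // canary
	fmt.Println(pickByHeader(map[string]string{}))                          // stable
}
```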
Argo Rollouts
Argo Rollouts provides a Rollout custom resource, a drop-in replacement for the Kubernetes Deployment, that adds canary and blue-green strategies, metric-based promotion, and integration with Argo CD.
Installation
kubectl create namespace argo-rollouts
kubectl apply -n argo-rollouts \
-f https://github.com/argoproj/argo-rollouts/releases/latest/download/install.yaml
# Install kubectl plugin
kubectl krew install argo-rollouts
Canary Rollout
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: payments-api
  namespace: production
spec:
  replicas: 10
  selector:
    matchLabels:
      app: payments-api
  template:
    metadata:
      labels:
        app: payments-api
    spec:
      containers:
        - name: payments-api
          image: ghcr.io/acme/payments-api:sha-abc123
          ports:
            - containerPort: 8080
          resources:
            requests:
              cpu: 200m
              memory: 256Mi
  strategy:
    canary:
      # Traffic shifting via NGINX Ingress
      canaryService: payments-api-canary
      stableService: payments-api-stable
      trafficRouting:
        nginx:
          stableIngress: payments-api-ingress
          annotationPrefix: nginx.ingress.kubernetes.io
      steps:
        - setWeight: 5             # 5% to canary
        - pause: {duration: 5m}    # wait 5 minutes
        - analysis:                # run metrics analysis
            templates:
              - templateName: success-rate
            args:
              - name: service-name
                value: payments-api-canary
        - setWeight: 25
        - pause: {duration: 10m}
        - analysis:
            templates:
              - templateName: success-rate
              - templateName: latency-p99
            args:
              - name: service-name
                value: payments-api-canary
        - setWeight: 50
        - pause: {duration: 10m}
        - setWeight: 100           # promote to stable
      # A failed analysis step aborts the rollout and shifts traffic back to
      # stable automatically. autoPromotionEnabled is a blueGreen-only field,
      # so it is not set here; add an indefinite `pause: {}` step for a manual gate.
AnalysisTemplate — Prometheus Metrics
apiVersion: argoproj.io/v1alpha1
kind: AnalysisTemplate
metadata:
  name: success-rate
  namespace: production
spec:
  args:
    - name: service-name
  metrics:
    - name: success-rate
      interval: 60s
      count: 5                              # take 5 measurements, one every 60s
      successCondition: result[0] >= 0.99   # 99% success rate required
      failureLimit: 1                       # tolerate one failed measurement
      provider:
        prometheus:
          address: http://prometheus-operated.monitoring:9090
          query: |
            sum(
              rate(http_requests_total{
                service="{{args.service-name}}",
                status_code!~"5.."
              }[2m])
            ) /
            sum(
              rate(http_requests_total{
                service="{{args.service-name}}"
              }[2m])
            )
---
# Separate template, so a Rollout can reference `latency-p99` by templateName
apiVersion: argoproj.io/v1alpha1
kind: AnalysisTemplate
metadata:
  name: latency-p99
  namespace: production
spec:
  args:
    - name: service-name
  metrics:
    - name: latency-p99
      interval: 60s
      count: 5
      successCondition: result[0] <= 0.5    # p99 latency <= 500ms
      failureLimit: 1
      provider:
        prometheus:
          address: http://prometheus-operated.monitoring:9090
          query: |
            histogram_quantile(0.99,
              sum by (le) (
                rate(http_request_duration_seconds_bucket{
                  service="{{args.service-name}}"
                }[2m])
              )
            )
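How `count`, `successCondition`, and `failureLimit` interact is easy to misread, so here is a small Go model of one analysis run. It assumes the run fails as soon as the number of failed measurements exceeds `failureLimit`; the `evaluate` function is a hypothetical illustration, not Argo Rollouts code, and it handles only a "greater than or equal" condition.

```go
package main

import "fmt"

// evaluate models one analysis run. Each measurement passes when it is at
// least `threshold` (the successCondition); the run fails once the number
// of failed measurements exceeds failureLimit.
func evaluate(measurements []float64, threshold float64, failureLimit int) string {
	failed := 0
	for _, m := range measurements {
		if m < threshold {
			failed++
			if failed > failureLimit {
				return "Failed"
			}
		}
	}
	return "Successful"
}

func main() {
	// Five measurements (count: 5) of the success ratio, threshold 0.99.
	oneDip := []float64{0.999, 0.985, 0.997, 0.998, 0.999}
	twoDips := []float64{0.999, 0.985, 0.970, 0.998, 0.999}

	fmt.Println(evaluate(oneDip, 0.99, 1))  // Successful: one failure is tolerated
	fmt.Println(evaluate(twoDips, 0.99, 1)) // Failed: two failures exceed failureLimit
}
```

With `failureLimit: 1`, a single bad scrape does not abort the rollout, but a second one does.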
Rollout Operations
# Watch rollout progress
kubectl argo rollouts get rollout payments-api -n production --watch
# Manually promote a paused rollout (an indefinite `pause: {}` step, or blue-green with autoPromotionEnabled: false)
kubectl argo rollouts promote payments-api -n production
# Skip remaining steps and go to 100% (force promote)
kubectl argo rollouts promote payments-api -n production --full
# Abort a rollout (immediately rolls back to stable)
kubectl argo rollouts abort payments-api -n production
# Manually retry a failed rollout after fixing the issue
kubectl argo rollouts retry rollout payments-api -n production
# Update the container image (this starts a new rollout)
kubectl argo rollouts set image payments-api \
  payments-api=ghcr.io/acme/payments-api:sha-newsha -n production
Blue-Green with Argo Rollouts
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: payments-api-bg
  namespace: production
spec:
  replicas: 5
  selector:
    matchLabels:
      app: payments-api
  template:
    metadata:
      labels:
        app: payments-api
    spec:
      containers:
        - name: payments-api
          image: ghcr.io/acme/payments-api:sha-abc123
  strategy:
    blueGreen:
      activeService: payments-api-active     # receives 100% of production traffic
      previewService: payments-api-preview   # points at the new version; no production traffic until promoted
      autoPromotionEnabled: false            # require manual promote
      prePromotionAnalysis:
        # The preview service gets no production traffic, so this query only
        # returns data if test traffic is sent to the preview pods.
        templates:
          - templateName: success-rate
        args:
          - name: service-name
            value: payments-api-preview
      scaleDownDelaySeconds: 300             # keep old (blue) pods for 5 min after the switch
# After deploying new image, preview service is live
# Smoke test against preview before promoting
kubectl port-forward svc/payments-api-preview 8080:8080 -n production &
curl http://localhost:8080/healthz
# Promote blue-green (switches active service selector)
kubectl argo rollouts promote payments-api-bg -n production
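The promote step is essentially a pointer swap between two Services. The Go sketch below is a toy model of that behavior, not Argo Rollouts code; the `services` type and its methods are hypothetical names.

```go
package main

import "fmt"

// services is a toy model of the two Service selectors in a blue-green rollout.
type services struct {
	active  string // revision behind activeService
	preview string // revision behind previewService
}

// deploy points the preview service at a new revision; active is untouched.
func (s *services) deploy(rev string) { s.preview = rev }

// promote switches the active service to the previewed revision and returns
// the previous revision, which would keep running for scaleDownDelaySeconds.
func (s *services) promote() (previous string) {
	previous = s.active
	s.active = s.preview
	return previous
}

func main() {
	s := &services{active: "sha-abc123", preview: "sha-abc123"}

	s.deploy("sha-newsha")
	fmt.Println(s.active, s.preview) // sha-abc123 sha-newsha

	old := s.promote()
	fmt.Println(s.active, old) // sha-newsha sha-abc123
}
```

Because the old revision is returned rather than discarded, rolling back within the scale-down window is just another selector switch.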
Flagger — Alternative to Argo Rollouts
Flagger integrates with Flux and supports Istio, NGINX, Contour, and more as traffic routers.
apiVersion: flagger.app/v1beta1
kind: Canary
metadata:
  name: payments-api
  namespace: production
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: payments-api
  progressDeadlineSeconds: 600
  service:
    port: 8080
    targetPort: 8080
    trafficPolicy:
      tls:
        mode: DISABLE
  # Canary analysis configuration
  analysis:
    interval: 1m
    threshold: 5      # max 5 failed analysis checks before rollback
    maxWeight: 50     # max 50% canary traffic
    stepWeight: 5     # increase by 5% per interval
    metrics:
      - name: request-success-rate
        thresholdRange:
          min: 99     # minimum 99% success rate
        interval: 1m
      - name: request-duration
        thresholdRange:
          max: 500    # maximum 500ms p99 latency
        interval: 1m
    # Run webhooks during analysis
    webhooks:
      - name: smoke-test
        type: pre-rollout
        url: http://smoke-tester.testing/
        timeout: 30s
        metadata:
          type: bash
          # The canary service listens on port 8080 (see service.port above)
          cmd: "curl -sd 'test' http://payments-api-canary.production:8080/payments | grep '201'"
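The `interval`, `stepWeight`, `maxWeight`, and `threshold` values together determine how long a release takes. The Flagger documentation gives the minimum promotion time as `interval * (maxWeight / stepWeight)` and the rollback time as `interval * threshold`; the Go sketch below applies those two formulas to the values in the Canary above. The function names are illustrative.

```go
package main

import (
	"fmt"
	"time"
)

// minPromotionTime is the shortest time to reach maxWeight when every
// check passes: one interval per weight step.
func minPromotionTime(interval time.Duration, maxWeight, stepWeight int) time.Duration {
	return interval * time.Duration(maxWeight/stepWeight)
}

// rollbackTime is how long a consistently failing canary runs before the
// failed-check threshold is reached.
func rollbackTime(interval time.Duration, threshold int) time.Duration {
	return interval * time.Duration(threshold)
}

func main() {
	// Values from the Canary spec: interval 1m, maxWeight 50, stepWeight 5, threshold 5.
	fmt.Println(minPromotionTime(time.Minute, 50, 5)) // 10m0s
	fmt.Println(rollbackTime(time.Minute, 5))         // 5m0s
}
```

So this configuration needs at least ten minutes of healthy metrics to promote, and a broken canary is exposed to users for about five minutes at most.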
Feature Flags — Decoupling Deploy from Release
Feature flags let you deploy code that is inactive until explicitly enabled — removing the coupling between git merge and user-visible change.
OpenFeature (vendor-neutral SDK)
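// Go service using the OpenFeature SDK
import (
	"context"

	"github.com/open-feature/go-sdk/pkg/openfeature"
	flagd "github.com/open-feature/go-sdk-contrib/providers/flagd/pkg"
)

// Flags are evaluated through a client, not through package-level functions.
var flags = openfeature.NewClient("payments-api")

func init() {
	// Depending on the provider version, NewProvider may also return an
	// error; adjust this call to match the release you pin.
	openfeature.SetProvider(flagd.NewProvider(
		flagd.WithHost("flagd.production.svc.cluster.local"),
		flagd.WithPort(8013),
	))
}

func HandlePayment(ctx context.Context, req *PaymentRequest) (*PaymentResponse, error) {
	// Check the feature flag before using the new payment processor.
	// If evaluation fails, the default (false) is returned, so the legacy
	// processor is the fallback.
	useNewProcessor, _ := flags.BooleanValue(
		ctx, "new-payment-processor", false,
		openfeature.NewEvaluationContext("", map[string]interface{}{
			"user_id": req.UserID,
			"country": req.Country,
		}),
	)
	if useNewProcessor {
		return newProcessor.Process(ctx, req)
	}
	return legacyProcessor.Process(ctx, req)
}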
# flagd ConfigMap — feature flag definitions
apiVersion: v1
kind: ConfigMap
metadata:
  name: flagd-config
  namespace: production
data:
  flags.json: |
    {
      "$schema": "https://flagd.dev/schema/v0/flags.json",
      "flags": {
        "new-payment-processor": {
          "state": "ENABLED",
          "variants": {
            "on": true,
            "off": false
          },
          "defaultVariant": "off",
          "targeting": {
            "if": [
              {"in": [{"var": "country"}, ["US", "CA"]]},
              "on", "off"
            ]
          }
        }
      }
    }
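The targeting rule in the ConfigMap reads as "if `country` is US or CA, serve variant `on`, otherwise `off`". The Go sketch below hard-codes that one rule to show what the evaluation resolves to; it is a simplified stand-in with hypothetical names, not the flagd evaluator, which interprets the JSON rule generically.

```go
package main

import "fmt"

// rule is a simplified stand-in for the flag definition in the ConfigMap.
type rule struct {
	countries map[string]bool // countries that receive the "on" variant
	variants  map[string]bool // variant name -> flag value
}

// newPaymentProcessorRule mirrors the "new-payment-processor" flag.
func newPaymentProcessorRule() rule {
	return rule{
		countries: map[string]bool{"US": true, "CA": true},
		variants:  map[string]bool{"on": true, "off": false},
	}
}

// evaluate picks the variant for a country and returns its boolean value.
func (r rule) evaluate(country string) bool {
	if r.countries[country] {
		return r.variants["on"]
	}
	return r.variants["off"]
}

func main() {
	r := newPaymentProcessorRule()
	fmt.Println(r.evaluate("US")) // true
	fmt.Println(r.evaluate("CA")) // true
	fmt.Println(r.evaluate("DE")) // false
}
```

Widening the release then means editing the country list in the ConfigMap, with no new deployment of the service.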
Argo CD + Argo Rollouts Integration
# Argo CD Application watching a Rollout
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: payments-api
  namespace: argocd
spec:
  project: default   # required field; "default" is Argo CD's built-in project
  source:
    repoURL: https://github.com/acme/k8s-config
    targetRevision: main
    path: services/payments-api/overlays/production
  destination:
    server: https://kubernetes.default.svc
    namespace: production
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
    syncOptions:
      - RespectIgnoreDifferences=true
  # Ignore fields that are changed in the cluster rather than in Git (for
  # example replicas adjusted by an autoscaler, or an image set with
  # `kubectl argo rollouts set image`), so selfHeal does not revert them
  ignoreDifferences:
    - group: argoproj.io
      kind: Rollout
      jsonPointers:
        - /spec/replicas
        - /spec/template/spec/containers/0/image
Rollout Metrics and SLO Gates
# PromQL — track canary vs stable error rate during rollout
sum(rate(http_requests_total{
service=~"payments-api-canary|payments-api-stable",
status_code=~"5.."
}[5m])) by (service)
/
sum(rate(http_requests_total{
service=~"payments-api-canary|payments-api-stable"
}[5m])) by (service)
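The query above yields one error ratio per service. One way to turn those two numbers into a promotion gate is to require that the canary is not worse than stable by more than a fixed tolerance. The Go sketch below illustrates that comparison; the `counts` type, the function names, and the tolerance value are assumptions made for the example.

```go
package main

import "fmt"

// counts holds request totals for one service over the query window.
type counts struct {
	total  float64
	errors float64
}

// errorRatio is errors / total. It returns 0 when there were no requests,
// so a service without traffic is treated as error-free in this sketch.
func errorRatio(c counts) float64 {
	if c.total == 0 {
		return 0
	}
	return c.errors / c.total
}

// canaryWithinBudget reports whether the canary's error ratio exceeds the
// stable ratio by no more than `tolerance`.
func canaryWithinBudget(canary, stable counts, tolerance float64) bool {
	return errorRatio(canary) <= errorRatio(stable)+tolerance
}

func main() {
	stable := counts{total: 10000, errors: 40} // 0.4% errors

	fmt.Println(canaryWithinBudget(counts{total: 1000, errors: 5}, stable, 0.002))  // true: 0.5% is within budget
	fmt.Println(canaryWithinBudget(counts{total: 1000, errors: 20}, stable, 0.002)) // false: 2% is not
}
```

Comparing against stable rather than a fixed threshold keeps the gate meaningful when an upstream dependency raises error rates for both versions at once.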
# Argo Rollouts dashboard
kubectl argo rollouts dashboard # opens at http://localhost:3100
Related
- CI/CD Pipelines — image builds that feed into rollouts
- Testing Strategies — AnalysisTemplates reference metrics from load tests
- SRE Practices — SLO-based promotion gates
- 09 — Production Overview — change management tiers