GitOps
- GitOps Principles
- Argo CD Architecture
- Argo CD Applications
- App-of-Apps Pattern
- ApplicationSets
- Sync Waves & Hooks
- Argo CD RBAC & SSO
- Flux Architecture
- Flux: Kustomize & Helm
- Flux Image Automation
- Argo CD vs Flux
- Git Branching Strategies
- Secrets in GitOps
- Drift Detection & Remediation
- Best Practices
1. GitOps Principles
GitOps is an operational model where the desired state of a system is declared in Git, and an automated agent continuously reconciles the live state to match the declared state. The four core principles (OpenGitOps spec):
1. Declarative
The entire system is described declaratively — not imperative scripts. Kubernetes YAML, Helm charts, and Kustomize overlays express what should exist, not how to get there.
2. Versioned and Immutable
The desired state is stored in Git with a complete history. Every change is a commit with authorship, timestamp, and message. Rollback is a Git revert.
3. Pulled Automatically
Software agents (Argo CD, Flux) pull from the Git source and apply changes. No CI system needs cluster credentials. The cluster-side agent initiates all connections — reducing the attack surface.
4. Continuously Reconciled
Agents continuously compare live state with desired state and correct drift. A manual kubectl apply that deviates from Git is automatically reverted (self-heal mode).
GitOps flow:
Developer                Git Repository             Kubernetes Cluster
    │                           │                            │
    │── git commit + push ─────▶│                            │
    │                           │◀── poll (every 3min) ──────│
    │                           │    (Argo CD / Flux)        │
    │                           │──── diff detected ────────▶│
    │                           │                            │── apply/sync
    │                           │                            │── reconcile
    │                           │◀────── Health: Healthy ────│
    │                           │                            │
    │                           │  [manual kubectl apply]    │
    │                           │                            │── drift detected
    │                           │◀────── selfHeal ───────────│── revert to Git state
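The reconcile step in the flow above can be sketched in a few lines of Python. This is a toy model under stated assumptions, not how Argo CD or Flux are implemented: `desired` and `live` are hypothetical in-memory dictionaries keyed by resource name, standing in for the manifests in Git and the objects in the cluster.

```python
def diff(desired: dict, live: dict) -> dict:
    """Return the actions needed to make the live state match the desired state."""
    return {
        "create": sorted(k for k in desired if k not in live),
        "update": sorted(k for k in desired if k in live and live[k] != desired[k]),
        "prune": sorted(k for k in live if k not in desired),
    }


def reconcile(desired: dict, live: dict, prune: bool = True) -> dict:
    """Run one reconciliation pass and return the new live state."""
    actions = diff(desired, live)
    new_live = dict(live)
    for name in actions["create"] + actions["update"]:
        new_live[name] = desired[name]
    if prune:
        for name in actions["prune"]:
            del new_live[name]
    return new_live


# Hypothetical state: Git declares a Deployment and a Service.
desired = {"deploy/order-service": {"replicas": 3}, "svc/order-service": {"port": 80}}
# A manual change scaled the Deployment down and left a stray ConfigMap behind.
live = {"deploy/order-service": {"replicas": 1}, "cm/debug": {"verbose": "true"}}

actions = diff(desired, live)
live = reconcile(desired, live)
```

With `prune=False` the stray ConfigMap would survive the pass, which is the behaviour the `prune` setting controls in both tools.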
Push-based vs Pull-based CD
| Dimension | Push-based (traditional CD) | Pull-based (GitOps) |
|---|---|---|
| Who initiates deployment | CI pipeline pushes to cluster | Agent in cluster pulls from Git |
| Credentials location | CI system holds kubeconfig/service account | Only Git credentials needed outside cluster |
| Attack surface | CI compromise = cluster compromise | Cluster credentials never leave cluster |
| Drift detection | None (pipeline only runs on trigger) | Continuous — every reconciliation loop |
| Auditability | CI logs (may be transient) | Git history (permanent; signed if commit signing is enforced) |
| Multi-cluster | Complex (credential management per cluster) | Single agent per cluster, single Git repo |
2. Argo CD Architecture
┌──────────────────────────────────────────────────────┐
│ Argo CD │
│ │
│ ┌──────────────┐ ┌───────────────┐ │
│ │ API Server │ │ Repo Server │ │
│ │ (argocd- │ │ (git clone, │ │
│ │ server) │ │ template, │ │
│ │ REST + gRPC │ │ diff cache) │ │
│ └──────┬───────┘ └───────┬───────┘ │
│ │ │ │
│ ┌──────▼──────────────────▼───────┐ │
│ │ Application Controller │ │
│ │ (reconcile loop, sync, │ │
│ │ health assessment, │ │
│ │ hook execution) │ │
│ └──────────────────────────────── ┘ │
│ │
│ ┌──────────────┐ ┌───────────────┐ │
│ │ Redis │ │ Dex (SSO) │ │
│ │ (cache, │ │ (OIDC auth) │ │
│ │ sessions) │ │ │ │
│ └──────────────┘ └───────────────┘ │
└──────────────────────────────────────────────────────┘
Storage: Applications and AppProjects are stored as Kubernetes custom resources in etcd (no external database)
Argo CD Install (Production)
kubectl create namespace argocd
# HA install (recommended for production; runs multiple replicas of the stateless components plus Redis HA, and needs at least three nodes)
kubectl apply -n argocd \
-f https://raw.githubusercontent.com/argoproj/argo-cd/stable/manifests/ha/install.yaml
# Or via Helm (recommended for managing Argo CD config-as-code)
helm repo add argo https://argoproj.github.io/argo-helm
helm repo update
helm upgrade --install argocd argo/argo-cd \
--namespace argocd \
--create-namespace \
--values argocd-values.yaml
# argocd-values.yaml (production)
global:
  domain: argocd.internal.example.com
configs:
  cm:
    # Resource tracking method: label, annotation, or annotation+label
    application.resourceTrackingMethod: annotation
    # How often Argo CD re-checks Git for changes (reconciliation interval, not a sync timeout)
    timeout.reconciliation: 180s
    # OIDC SSO configured directly against the provider (Dex is bypassed; Dex would use dex.config)
    url: https://argocd.internal.example.com
    oidc.config: |
      name: Okta
      issuer: https://myorg.okta.com/oauth2/default
      clientID: $oidc.okta.clientId
      clientSecret: $oidc.okta.clientSecret
      requestedScopes: [openid, profile, email, groups]
      requestedIDTokenClaims:
        groups:
          essential: true
  rbac:
    policy.csv: |
      g, platform-admins, role:admin
      g, developers, role:readonly
      p, role:team-deployer, applications, sync, */*, allow
      p, role:team-deployer, applications, get, */*, allow
    policy.default: role:readonly
    scopes: '[groups]'
  params:
    server.insecure: false
    controller.diff.server.side: "true" # use server-side diff
repoServer:
  replicas: 2
  resources:
    requests: {cpu: 250m, memory: 256Mi}
    limits: {memory: 1Gi}
  # Enable Helm plugins / custom tools.
  # Sketch only: as written, alpine:3.18 ships no helm binary and the emptyDir is not mounted.
  # Use an image that includes helm and mount custom-tools in both the init container and the repo server.
  volumes:
    - name: custom-tools
      emptyDir: {}
  initContainers:
    - name: download-tools
      image: alpine:3.18
      command: [sh, -c]
      args:
        - |
          # Install helm-secrets plugin for SOPS decryption
          helm plugin install https://github.com/jkroepke/helm-secrets
applicationSet:
  replicas: 2
server:
  replicas: 2
  service:
    type: ClusterIP
  ingress:
    enabled: true
    ingressClassName: nginx
    hostname: argocd.internal.example.com
    tls: true
    annotations:
      cert-manager.io/cluster-issuer: letsencrypt-prod
controller:
  replicas: 2 # HA: sharded app controller
  env:
    - name: ARGOCD_CONTROLLER_REPLICAS
      value: "2"
3. Argo CD Applications
Application CRD — Full Reference
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: order-service
  namespace: argocd
  finalizers:
    - resources-finalizer.argocd.argoproj.io # cascade delete managed resources on app deletion
  annotations:
    argocd.argoproj.io/sync-wave: "5" # sync ordering
spec:
  project: team-a # Argo CD AppProject for RBAC
  source:
    repoURL: https://github.com/myorg/services
    targetRevision: HEAD # or specific tag/branch/SHA
    path: order-service/helm
    # Helm source
    helm:
      releaseName: order-service
      valueFiles:
        - values.yaml
        - values-production.yaml
      values: |
        image:
          tag: "1.4.2"
      ignoreMissingValueFiles: false
      skipCrds: false
  # Alternative: Kustomize source
  # source:
  #   path: order-service/kustomize/overlays/production
  #   kustomize:
  #     namePrefix: prod-
  #     images:
  #       - myregistry/order-service:1.4.2
  destination:
    server: https://kubernetes.default.svc # in-cluster
    # or: server: https://prod-us.example.com (registered external cluster)
    namespace: order-service
  syncPolicy:
    automated:
      prune: true # delete resources removed from Git
      selfHeal: true # revert manual kubectl changes
      allowEmpty: false # never sync to an empty state
    syncOptions:
      - CreateNamespace=true
      - ServerSideApply=true # use SSA instead of kubectl apply
      - PrunePropagationPolicy=foreground
      - PruneLast=true # delete resources after all others synced
      - RespectIgnoreDifferences=true
    retry:
      limit: 5
      backoff:
        duration: 5s
        factor: 2
        maxDuration: 3m
  # Ignore specific fields that change outside Git (e.g., HPA replicas)
  ignoreDifferences:
    - group: apps
      kind: Deployment
      jsonPointers:
        - /spec/replicas # HPA manages this
    - group: ""
      kind: Secret
      managedFieldsManagers:
        - secrets-store-sync # ESO manages secret data
  revisionHistoryLimit: 10
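To make the `retry` settings above concrete, here is a small Python calculation of the wait before each retry. It assumes plain exponential backoff (base duration multiplied by the factor after each failed attempt, capped at `maxDuration`); `backoff_schedule` is a hypothetical helper for illustration, not an Argo CD function.

```python
def backoff_schedule(limit: int, duration_s: float, factor: float, max_duration_s: float) -> list:
    """Delay before each retry: duration * factor**n, capped at max_duration."""
    return [min(duration_s * factor ** n, max_duration_s) for n in range(limit)]


# Values from the Application above: limit 5, duration 5s, factor 2, maxDuration 3m.
delays = backoff_schedule(limit=5, duration_s=5, factor=2, max_duration_s=180)
print(delays)       # [5, 10, 20, 40, 80]
print(sum(delays))  # 155 seconds of waiting in total before the sync is given up
```

Under this assumption the 3m cap never applies with five retries; it would only start to matter from the seventh retry onward.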
AppProject — Team Isolation
apiVersion: argoproj.io/v1alpha1
kind: AppProject
metadata:
name: team-a
namespace: argocd
spec:
description: "Team A microservices"
# Only allow syncing to specific clusters/namespaces
destinations:
- server: https://kubernetes.default.svc
namespace: team-a-* # wildcard namespace match
- server: https://prod-eu.example.com
namespace: team-a-*
# Only allow sources from team's repo
sourceRepos:
- https://github.com/myorg/team-a-services
- https://charts.helm.sh/stable # approved Helm repos
# Restrict which K8s resources can be managed
clusterResourceWhitelist:
- group: ""
kind: Namespace
namespaceResourceBlacklist:
- group: ""
kind: ResourceQuota # only platform team can set quotas
- group: networking.k8s.io
kind: NetworkPolicy # only platform team can create network policies
# Orphaned resource monitoring (alert on resources not in any app)
orphanedResources:
warn: true
roles:
- name: team-a-deployer
policies:
- p, proj:team-a:team-a-deployer, applications, sync, team-a/*, allow
- p, proj:team-a:team-a-deployer, applications, get, team-a/*, allow
groups:
- team-a-developers
4. App-of-Apps Pattern
The app-of-apps pattern uses a single root Application that manages other Applications. This allows the entire cluster state to be bootstrapped from a single kubectl apply and maintained declaratively in Git.
platform-gitops/
└── clusters/production/apps/ ← root app points here
├── cert-manager.yaml ← each file is an Application CR
├── ingress-nginx.yaml
├── kube-prometheus-stack.yaml
├── external-secrets.yaml
├── gatekeeper.yaml
├── karpenter.yaml
└── team-apps/
├── team-a-apps.yaml ← nested app-of-apps for each team
└── team-b-apps.yaml
# clusters/production/apps/cert-manager.yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
name: cert-manager
namespace: argocd
annotations:
argocd.argoproj.io/sync-wave: "-1" # install before things that need certs
spec:
project: platform
source:
repoURL: https://charts.jetstack.io
chart: cert-manager
targetRevision: v1.15.0
helm:
values: |
installCRDs: true
replicaCount: 2
resources:
requests: {cpu: 100m, memory: 64Mi}
limits: {memory: 256Mi}
destination:
server: https://kubernetes.default.svc
namespace: cert-manager
syncPolicy:
automated:
prune: true
selfHeal: true
syncOptions:
- CreateNamespace=true
- ServerSideApply=true
5. ApplicationSets
ApplicationSet is a controller that generates multiple Argo CD Application resources from a single template using generators. It enables fleet-wide deployments and per-team application management without manual Application YAML per environment.
List Generator — Explicit Multi-Cluster
apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
name: guestbook
namespace: argocd
spec:
generators:
- list:
elements:
- cluster: prod-us-east
url: https://prod-us-east.example.com
env: production
- cluster: prod-eu-west
url: https://prod-eu-west.example.com
env: production
- cluster: staging
url: https://staging.example.com
env: staging
template:
metadata:
name: "guestbook-{{cluster}}"
spec:
project: platform
source:
repoURL: https://github.com/myorg/platform-gitops
targetRevision: HEAD
path: "apps/guestbook/overlays/{{env}}"
destination:
server: "{{url}}"
namespace: guestbook
syncPolicy:
automated:
prune: true
selfHeal: true
Git Generator — Directory-based Multi-Env
apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
name: cluster-addons
namespace: argocd
spec:
generators:
- git:
repoURL: https://github.com/myorg/platform-gitops
revision: HEAD
directories:
- path: "add-ons/*" # one Application per directory
- path: "add-ons/kustomize/*"
exclude: true # exclude subdirectories
template:
metadata:
name: "addon-{{path.basename}}"
spec:
project: platform
source:
repoURL: https://github.com/myorg/platform-gitops
targetRevision: HEAD
path: "{{path}}"
destination:
server: https://kubernetes.default.svc
namespace: "{{path.basename}}"
syncPolicy:
automated:
prune: true
selfHeal: true
syncOptions:
- CreateNamespace=true
Matrix Generator — Env × App Cartesian Product
apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
name: services-matrix
namespace: argocd
spec:
generators:
- matrix:
generators:
# Generator 1: list of clusters
- clusters:
selector:
matchLabels:
platform.example.com/tier: workload
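# Generator 2: list of services from git. Each config.json is assumed to have the shape
# {"values": {"service": "<name>", "team": "<project>"}} so that {{values.service}} and {{values.team}} resolve.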
- git:
repoURL: https://github.com/myorg/services
revision: HEAD
files:
- path: "services/*/config.json" # reads JSON per service
template:
metadata:
name: "{{name}}-{{values.service}}"
spec:
project: "{{values.team}}"
source:
repoURL: https://github.com/myorg/services
targetRevision: HEAD
path: "services/{{values.service}}/helm"
helm:
valueFiles:
- values.yaml
- "values-{{metadata.labels.environment}}.yaml"
destination:
server: "{{server}}"
namespace: "{{values.service}}"
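How the matrix generator combines its two child generators can be sketched in Python. This is an illustration only: the cluster and service parameter sets are hypothetical, and `render` is a simplified stand-in for ApplicationSet templating.

```python
from itertools import product

# Hypothetical output of the cluster generator.
clusters = [
    {"name": "prod-us-east", "server": "https://prod-us-east.example.com"},
    {"name": "prod-eu-west", "server": "https://prod-eu-west.example.com"},
]
# Hypothetical output of the git files generator (flattened JSON keys).
services = [
    {"values.service": "order-service", "values.team": "team-a"},
    {"values.service": "billing", "values.team": "team-b"},
]


def render(template: str, params: dict) -> str:
    """Replace each {{key}} placeholder with its parameter value."""
    for key, value in params.items():
        template = template.replace("{{" + key + "}}", value)
    return template


# Every cluster is paired with every service; the two parameter sets are merged.
combos = [{**cluster, **service} for cluster, service in product(clusters, services)]
names = [render("{{name}}-{{values.service}}", params) for params in combos]
print(names)
```

Two clusters and two services yield four Applications, so the number of generated Applications grows multiplicatively with each generator's output.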
Pull Request Generator — Preview Environments
apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
name: preview-environments
namespace: argocd
spec:
generators:
- pullRequest:
github:
owner: myorg
repo: order-service
tokenRef:
secretName: github-token
key: token
labels:
- preview # only PRs with this label
template:
metadata:
name: "pr-{{number}}-order-service"
spec:
project: preview
source:
repoURL: https://github.com/myorg/order-service
targetRevision: "{{head_sha}}"
path: helm
helm:
values: |
image:
tag: "pr-{{number}}"
ingress:
host: "pr-{{number}}.preview.example.com"
destination:
server: https://kubernetes.default.svc
namespace: "preview-pr-{{number}}"
syncPolicy:
automated:
prune: true
selfHeal: true
syncOptions:
- CreateNamespace=true
6. Sync Waves & Hooks
Sync Waves
Sync waves control the order in which resources are applied within a single sync operation. Resources in lower waves are applied first and must be healthy before higher waves begin.
# Resource annotation — wave number (default 0, range: any integer)
metadata:
annotations:
argocd.argoproj.io/sync-wave: "5"
# Common wave ordering pattern:
# Wave -3: Namespaces
# Wave -2: CRDs (cert-manager CRDs, Gatekeeper CRDs, etc.)
# Wave -1: Core infrastructure (cert-manager, external-dns, kube-state-metrics)
# Wave 0: Default (most resources)
# Wave 1: Applications that depend on infrastructure
# Wave 5: Ingress resources (after cert-manager and ingress controller)
# Wave 10: Smoke test jobs / verification
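The wave ordering can be sketched in Python as a sort followed by a grouping. This is a simplified model: it only looks at the sync-wave annotation, whereas Argo CD also orders by sync phase and resource kind, and the resource names below are hypothetical.

```python
from itertools import groupby

WAVE_KEY = "argocd.argoproj.io/sync-wave"


def wave_of(resource: dict) -> int:
    """Read the sync-wave annotation; resources without one are in wave 0."""
    annotations = resource.get("metadata", {}).get("annotations", {})
    return int(annotations.get(WAVE_KEY, "0"))


def plan(resources: list) -> list:
    """Group resource names into (wave, names) batches, lowest wave first."""
    ordered = sorted(resources, key=wave_of)
    return [
        (wave, [r["metadata"]["name"] for r in group])
        for wave, group in groupby(ordered, key=wave_of)
    ]


resources = [
    {"metadata": {"name": "ingress", "annotations": {WAVE_KEY: "5"}}},
    {"metadata": {"name": "order-service"}},
    {"metadata": {"name": "certificates-crd", "annotations": {WAVE_KEY: "-2"}}},
    {"metadata": {"name": "cert-manager", "annotations": {WAVE_KEY: "-1"}}},
]
batches = plan(resources)
for wave, names in batches:
    print(wave, names)
```

Each batch would be applied and allowed to become healthy before the next one starts.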
Resource Hooks
# PreSync hook: run before sync (e.g., database migrations)
apiVersion: batch/v1
kind: Job
metadata:
  name: db-migrate
  annotations:
    argocd.argoproj.io/hook: PreSync
    argocd.argoproj.io/hook-delete-policy: BeforeHookCreation
spec:
  template:
    spec:
      containers:
        - name: migrate
          image: myregistry/order-service:1.4.2
          command: ["/app/migrate", "up"]
          env:
            - name: DATABASE_URL
              valueFrom:
                secretKeyRef:
                  name: db-credentials
                  key: url
      restartPolicy: Never
  backoffLimit: 3
---
# PostSync hook: run after all resources are Healthy (e.g., smoke test)
apiVersion: batch/v1
kind: Job
metadata:
  name: smoke-test
  annotations:
    argocd.argoproj.io/hook: PostSync
    argocd.argoproj.io/hook-delete-policy: HookSucceeded # auto-cleanup on success
spec:
  template:
    spec:
      containers:
        - name: smoke-test
          image: curlimages/curl:8.4.0
          command:
            - sh
            - -c
            - |
              until curl -sf http://order-service.order-service/health; do
                echo "waiting..."; sleep 3
              done
              echo "smoke test passed"
      restartPolicy: Never
# Available hook types:
# PreSync: before sync begins
# Sync: during sync (parallel with resource apply)
# PostSync: after all resources are Healthy
# SyncFail: if sync fails (rollback notification, cleanup)
# PostDelete: when application is deleted
# Hook delete policies:
# HookSucceeded: delete after a successful run
# HookFailed: delete after a failed run
# BeforeHookCreation: delete the previous hook resource before creating the new one (idempotent; the default when no policy is set)
7. Argo CD RBAC & SSO
RBAC Policy Syntax
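# argocd-rbac-cm ConfigMap policy.csv
# Permission format: p, <role/user/group>, <resource>, <action>, <object>
#   For applications, <object> is <project>/<application>; the examples in this guide append an effect (allow or deny).
# Group binding format: g, <user/group>, <role>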
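How `p` and `g` lines combine into an access decision can be sketched in Python. This is a simplified model under stated assumptions: it uses shell-style glob matching and ignores deny rules and `policy.default`, while Argo CD evaluates policies with Casbin. The group `team-a-developers` and both helper functions are hypothetical.

```python
from fnmatch import fnmatchcase

POLICY = """
p, role:team-deployer, applications, sync, team-a/*, allow
p, role:team-deployer, applications, get, */*, allow
g, team-a-developers, role:team-deployer
"""


def parse(policy: str):
    """Split policy text into permission rules (p) and group bindings (g)."""
    rules, bindings = [], []
    for line in policy.strip().splitlines():
        fields = [field.strip() for field in line.split(",")]
        if fields[0] == "p":
            rules.append(tuple(fields[1:]))
        elif fields[0] == "g":
            bindings.append((fields[1], fields[2]))
    return rules, bindings


def allowed(group: str, resource: str, action: str, obj: str, policy: str = POLICY) -> bool:
    """True if any role bound to the group has a matching allow rule."""
    rules, bindings = parse(policy)
    roles = {role for member, role in bindings if member == group}
    return any(
        subject in roles
        and res == resource
        and fnmatchcase(action, act)
        and fnmatchcase(obj, pattern)
        and effect == "allow"
        for subject, res, act, pattern, effect in rules
    )


print(allowed("team-a-developers", "applications", "sync", "team-a/order-service"))
print(allowed("team-a-developers", "applications", "sync", "team-b/billing"))
```

The second call is rejected because the sync rule is scoped to the `team-a` project, while the `get` rule with `*/*` lets the same group read applications in every project.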
argocd CLI Usage
# Login
argocd login argocd.internal.example.com \
--username admin \
--password $(kubectl -n argocd get secret argocd-initial-admin-secret -o jsonpath='{.data.password}' | base64 -d)
# App management
argocd app list
argocd app get order-service
argocd app diff order-service # diff live vs desired state
argocd app sync order-service # trigger manual sync
argocd app sync order-service --dry-run # preview sync without applying
argocd app history order-service # deployment history
argocd app rollback order-service 5 # rollback to history ID 5 (automated sync must be disabled first)
# Wait for sync to complete in CI
argocd app wait order-service \
--sync \
--health \
--timeout 300
# Refresh (re-fetch from Git without syncing)
argocd app get order-service --refresh
# Force sync (force-applies resources, which can delete and re-create them; it does not bypass sync windows)
argocd app sync order-service --force
# Terminate running sync
argocd app terminate-op order-service
8. Flux Architecture
Flux v2 Controllers (GitOps Toolkit):
┌───────────────────────────────────────────────────────┐
│ Flux Controllers │
│ │
│ source-controller → GitRepository │
│ → HelmRepository │
│ → OCIRepository │
│ → Bucket (S3/GCS) │
│ │
│ kustomize-controller → Kustomization │
│ (applies kustomize overlays)│
│ │
│ helm-controller → HelmRelease │
│ (manages Helm releases) │
│ │
│ notification-controller → Provider (Slack/Teams/PD) │
│ → Alert (what to notify) │
│ → Receiver (webhook) │
│ │
│ image-reflector-controller → ImageRepository │
│ → ImagePolicy │
│ │
│ image-automation-controller → ImageUpdateAutomation │
│ (writes back to Git) │
└───────────────────────────────────────────────────────┘
Flux Bootstrap
# Install flux CLI
curl -s https://fluxcd.io/install.sh | bash
# Bootstrap: installs Flux into cluster AND commits manifests to Git
flux bootstrap github \
--owner=myorg \
--repository=platform-gitops \
--branch=main \
--path=clusters/production \
--personal=false \
--token-auth=false \
--ssh-key-algorithm=ed25519
# This:
# 1. Creates GitHub deploy key
# 2. Installs Flux CRDs + controllers in flux-system namespace
# 3. Commits Flux manifests to clusters/production/flux-system/
# 4. Creates GitRepository pointing at the repo
# 5. Creates Kustomization pointing at clusters/production/
GitRepository Source
apiVersion: source.toolkit.fluxcd.io/v1
kind: GitRepository
metadata:
name: platform-gitops
namespace: flux-system
spec:
interval: 1m # poll interval
url: https://github.com/myorg/platform-gitops
ref:
branch: main
secretRef:
name: flux-system # SSH deploy key or HTTPS token
ignore: |
# ignore non-Kubernetes files
/.github/
/docs/
**/*.md
**/*.png
9. Flux: Kustomize & Helm
Kustomization (Flux)
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
name: cluster-addons
namespace: flux-system
spec:
interval: 10m # reconcile every 10 minutes
path: "./clusters/production/add-ons"
prune: true # delete removed resources from cluster
sourceRef:
kind: GitRepository
name: platform-gitops
healthChecks: # wait for these to be healthy before proceeding
- apiVersion: apps/v1
kind: Deployment
name: cert-manager
namespace: cert-manager
postBuild:
substitute:
CLUSTER_NAME: "production"
AWS_ACCOUNT_ID: "123456789"
substituteFrom:
- kind: ConfigMap
name: cluster-vars # inject per-cluster variables
decryption:
provider: sops # decrypt SOPS-encrypted secrets
secretRef:
name: sops-age # age key for SOPS decryption
timeout: 5m
retryInterval: 30s
# Dependency: only apply after infrastructure Kustomization is healthy
dependsOn:
- name: infrastructure
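The effect of `dependsOn` is a dependency graph that must be applied in topological order. A minimal Python sketch using the standard library, where every Kustomization name other than `cluster-addons` and `infrastructure` is hypothetical:

```python
from graphlib import CycleError, TopologicalSorter

# Each Kustomization mapped to the names listed under its dependsOn.
depends_on = {
    "infrastructure": [],
    "cluster-addons": ["infrastructure"],
    "services": ["cluster-addons"],
    "monitoring": ["infrastructure"],
}

# static_order() yields every dependency before the things that depend on it.
order = list(TopologicalSorter(depends_on).static_order())
print(order)

# A circular dependsOn can never be satisfied; graphlib reports it as an error.
try:
    list(TopologicalSorter({"a": ["b"], "b": ["a"]}).static_order())
    has_cycle = False
except CycleError:
    has_cycle = True
```

Flux does not compute a global plan like this; each Kustomization simply waits until the objects it depends on report ready, which produces the same ordering.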
HelmRelease
apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
name: kube-prometheus-stack
namespace: observability
spec:
interval: 1h
chart:
spec:
chart: kube-prometheus-stack
version: ">=60.0.0 <61.0.0" # semver constraint
sourceRef:
kind: HelmRepository
name: prometheus-community
namespace: flux-system
interval: 12h # check for chart updates
# Override values
values:
prometheus:
prometheusSpec:
retention: 15d
storageSpec:
volumeClaimTemplate:
spec:
storageClassName: ebs-gp3
resources:
requests:
storage: 100Gi
# Merge values from ConfigMap/Secret
valuesFrom:
- kind: ConfigMap
name: prometheus-values
optional: false
- kind: Secret
name: prometheus-secrets
optional: true
# Upgrade strategy
upgrade:
remediation:
retries: 3 # retry failed upgrades
cleanupOnFail: true # remove new resources if upgrade fails
# Rollback on upgrade failure
rollback:
timeout: 5m
cleanupOnFail: true
# Test hook after upgrade
test:
enable: true
ignoreFailures: false
# Drift detection: reconcile even if Helm thinks it's in sync
driftDetection:
mode: enabled
ignore:
- paths: ["/spec/replicas"] # HPA manages replicas
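Ignoring a path during drift detection amounts to removing that field from both sides before comparing. A Python sketch under stated assumptions: `drop_pointer` and `drifted` are hypothetical helpers, and the pointer handling skips JSON Pointer escaping and array indexes.

```python
import copy


def drop_pointer(obj: dict, pointer: str) -> dict:
    """Return a copy of obj without the field at a pointer such as /spec/replicas."""
    result = copy.deepcopy(obj)
    *parents, leaf = pointer.strip("/").split("/")
    node = result
    for key in parents:
        node = node.get(key)
        if not isinstance(node, dict):
            return result  # path does not exist, nothing to remove
    node.pop(leaf, None)
    return result


def drifted(desired: dict, live: dict, ignore=()) -> bool:
    """Compare desired and live state after removing the ignored paths."""
    for pointer in ignore:
        desired, live = drop_pointer(desired, pointer), drop_pointer(live, pointer)
    return desired != live


desired = {"spec": {"replicas": 3, "image": "order-service:1.4.2"}}
scaled_by_hpa = {"spec": {"replicas": 7, "image": "order-service:1.4.2"}}
edited_by_hand = {"spec": {"replicas": 7, "image": "order-service:debug"}}

print(drifted(desired, scaled_by_hpa))                             # True
print(drifted(desired, scaled_by_hpa, ignore=["/spec/replicas"]))  # False
print(drifted(desired, edited_by_hand, ignore=["/spec/replicas"])) # True
```

The ignore list should stay as narrow as possible, because every ignored path is a field where manual changes will no longer be corrected.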
HelmRepository Source
apiVersion: source.toolkit.fluxcd.io/v1
kind: HelmRepository
metadata:
name: prometheus-community
namespace: flux-system
spec:
interval: 12h
url: https://prometheus-community.github.io/helm-charts
---
# OCI Helm registry (Helm 3.8+, preferred for private charts)
apiVersion: source.toolkit.fluxcd.io/v1beta2
kind: HelmRepository
metadata:
name: myorg-charts
namespace: flux-system
spec:
interval: 12h
type: oci
url: oci://123456789.dkr.ecr.us-east-1.amazonaws.com/helm-charts
secretRef:
name: ecr-credentials
10. Flux Image Automation
Flux image automation monitors container registries for new image tags, selects the latest tag according to a policy (semver, alphabetical, or numerical ordering, optionally after a regex tag filter), and writes the updated tag back to Git, which triggers a GitOps sync automatically.
# 1. Define which registry to watch
apiVersion: image.toolkit.fluxcd.io/v1beta2
kind: ImageRepository
metadata:
name: order-service
namespace: flux-system
spec:
interval: 5m
image: 123456789.dkr.ecr.us-east-1.amazonaws.com/order-service
secretRef:
name: ecr-credentials
---
# 2. Define how to select the "latest" tag
apiVersion: image.toolkit.fluxcd.io/v1beta2
kind: ImagePolicy
metadata:
name: order-service
namespace: flux-system
spec:
imageRepositoryRef:
name: order-service
policy:
semver:
range: ">=1.0.0 <2.0.0" # only 1.x.x tags
# Alternative: alphabetical (latest timestamp-tagged image)
# alphabetical:
# order: asc
---
# 3. In the Deployment manifest, add a marker comment at the end of the image line:
# (Flux image automation rewrites the value on the marked line)
apiVersion: apps/v1
kind: Deployment
metadata:
name: order-service
spec:
template:
spec:
containers:
- name: order-service
image: 123456789.dkr.ecr.us-east-1.amazonaws.com/order-service:1.3.5 # {"$imagepolicy": "flux-system:order-service"}
---
# 4. Configure the automation — writes updated tag back to Git
apiVersion: image.toolkit.fluxcd.io/v1beta2
kind: ImageUpdateAutomation
metadata:
name: flux-system
namespace: flux-system
spec:
interval: 5m
sourceRef:
kind: GitRepository
name: platform-gitops
git:
push:
branch: main
commit:
author:
name: Fluxbot
email: fluxbot@example.com
messageTemplate: |
chore: update {{range .Updated.Images}}{{println .}}{{end}}
update:
strategy: Setters # uses imagepolicy markers
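The tag selection that the semver policy performs can be sketched in Python. This is a simplified model, assuming plain MAJOR.MINOR.PATCH tags with no pre-release or build metadata; `parse` and `latest_in_range` are hypothetical helpers, and the tag list is made up.

```python
import re

SEMVER = re.compile(r"^v?(\d+)\.(\d+)\.(\d+)$")


def parse(tag: str):
    """Return (major, minor, patch), or None for tags that are not plain semver."""
    match = SEMVER.match(tag)
    return tuple(int(part) for part in match.groups()) if match else None


def latest_in_range(tags, lower=(1, 0, 0), upper=(2, 0, 0)):
    """Pick the highest tag with lower <= version < upper, i.e. '>=1.0.0 <2.0.0'."""
    candidates = [(parse(tag), tag) for tag in tags if parse(tag) is not None]
    in_range = [(version, tag) for version, tag in candidates if lower <= version < upper]
    return max(in_range)[1] if in_range else None


tags = ["1.3.5", "1.4.2", "1.10.0", "2.0.0", "latest", "pr-17"]
print(latest_in_range(tags))  # 1.10.0
```

Versions are compared numerically, so `1.10.0` ranks above `1.4.2`; a plain string sort would get this wrong, which is why an alphabetical policy only suits tags designed to sort lexically, such as timestamps.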
11. Argo CD vs Flux
| Feature | Argo CD | Flux v2 |
|---|---|---|
| UI | Rich web UI (app tree, diff view, sync log) | No built-in UI (Weave GitOps provides one) |
| Multi-cluster | Central management cluster deploys to all | Each cluster runs its own Flux controllers |
| Application model | Application CRD (Helm/Kustomize/raw YAML) | Separate HelmRelease + Kustomization CRDs |
| Fleet management | ApplicationSet (generators for fleet) | Per-cluster Flux install, no central ApplicationSet |
| Image automation | Argo CD Image Updater (separate project) | Built-in image-reflector + image-automation controllers |
| Secrets decryption | Helm-secrets plugin (SOPS) or ESO | Native SOPS + age/gpg decryption in Kustomization |
| Drift self-heal | selfHeal: true (automatic) | Continuous reconciliation (always self-heals) |
| Notification | Built-in notifications controller | notification-controller with Provider/Alert CRDs |
| Dependency ordering | Sync waves + hooks | dependsOn in Kustomization/HelmRelease |
| RBAC | AppProject + RBAC policies (Casbin) | Kubernetes RBAC on CRDs; multi-tenancy via namespacing |
| Rollback | argocd app rollback to history entry | Helm rollback via HelmRelease spec changes |
| Progressive delivery | Argo Rollouts (separate but integrated) | Flagger (separate, supported) |
| CNCF status | Graduated (2022) | Graduated (2022) |
Use Argo CD when you need a central management plane for many clusters, a UI for developer visibility, and explicit sync control (sync windows, manual approval before prod). Use Flux when you prefer a UI-less, controller-per-cluster approach, built-in image automation, native SOPS decryption, or when clusters do not accept inbound connections from a central management cluster (each in-cluster agent only makes outbound calls to Git and registries). Some large organisations run both: Flux inside each cluster and Argo CD in a management cluster for fleet visibility.
12. Git Branching Strategies for GitOps
Strategy Comparison
| Strategy | Branch per Env | Path per Env | Tag-based |
|---|---|---|---|
| Description | main→prod, staging→staging, dev→dev branches | Single branch; clusters/dev/, clusters/staging/, clusters/prod/ paths | Tags (v1.4.2-prod) trigger environment promotion |
| Promotion | PR/merge from dev→staging→main | Copy/update values from one path to another; single PR | Tag push triggers deployment pipeline |
| Auditability | Hard (diff across branches) | Easy (diff within same repo/branch) | Medium (tag history) |
| Complexity | High (merge conflicts, branch divergence) | Low (single source of truth) | Medium |
| Recommended | Not recommended | ✓ Recommended | For Helm chart promotion only |
Recommended: Path-per-Environment Structure
platform-gitops/
├── clusters/
│ ├── production/
│ │ ├── flux-system/ ← Flux bootstrap manifests (auto-managed)
│ │ └── apps/ ← ApplicationSet / Kustomization manifests
│ ├── staging/
│ │ └── apps/
│ └── dev/
│ └── apps/
│
├── add-ons/ ← Helm chart wrappers (environment-agnostic)
│ ├── cert-manager/
│ │ ├── kustomization.yaml
│ │ ├── helmrelease.yaml
│ │ └── values.yaml ← common defaults
│ └── kube-prometheus-stack/
│ ├── helmrelease.yaml
│ └── values.yaml
│
└── services/ ← Application workload definitions
├── order-service/
│ ├── base/ ← kustomize base
│ └── overlays/
│ ├── dev/ ← 1 replica, debug env vars
│ ├── staging/ ← 2 replicas, staging config
│ └── production/ ← 3 replicas, HPA enabled, PDB
Promotion Workflow
# Promotion = updating the image tag (or other config) in the target environment path
# Method 1: Automated (Flux image automation or Argo CD Image Updater)
# CI builds image → pushes tag → image automation detects → commits to services/order-service/overlays/staging/
# Method 2: Scripted promotion (in CI pipeline after staging tests pass)
CURRENT_TAG=$(yq eval '.images[0].newTag' services/order-service/overlays/staging/kustomization.yaml)
yq eval -i ".images[0].newTag = \"${CURRENT_TAG}\"" \
services/order-service/overlays/production/kustomization.yaml
git add services/order-service/overlays/production/
git commit -m "chore: promote order-service ${CURRENT_TAG} to production"
git push origin main
# GitOps agent detects the commit → syncs production cluster
# Method 3: Manual PR-based promotion (for regulated environments)
# Developer opens PR updating production overlay
# Requires approval from tech lead + platform team
# Auto-merge on approval → GitOps syncs
13. Secrets in GitOps
The fundamental constraint: Git must never contain plaintext secrets. Three approaches are widely used, each with different trust boundaries.
SOPS + age/GPG (in-repo)
Encrypt secret values before committing to Git. Decrypt at apply time in-cluster. Flux has native SOPS support. Argo CD needs a plugin such as helm-secrets.
External Secrets Operator (ESO) (recommended)
Secrets live in Vault/AWS SM/GCP SM. ESO syncs them into Kubernetes Secrets. GitOps repos only contain ExternalSecret CRs (references, no values).
Sealed Secrets (simple)
Encrypt with cluster's public key; only that cluster can decrypt. GitOps repo contains SealedSecret CRs. Simple but requires re-encryption for key rotation.
SOPS + age Integration (Flux)
# 1. Generate age key pair
age-keygen -o age.agekey
# Public key: age1ql3z7hjy54pw3hyww5ayyfg7zqgvc7w3j2elw8zmrj2kg5sf7m4scg9y4f
# 2. Store private key in cluster (for Flux decryption)
kubectl create secret generic sops-age \
  -n flux-system \
  --from-file=age.agekey=age.agekey
# 3. Create .sops.yaml to configure which files to encrypt
# and which keys to use:
cat .sops.yaml
creation_rules:
  - path_regex: .*/secrets/.*\.yaml
    # Encrypt only the secret payload so kind/metadata stay readable in Git
    encrypted_regex: ^(data|stringData)$
    age: age1ql3z7hjy54pw3hyww5ayyfg7zqgvc7w3j2elw8zmrj2kg5sf7m4scg9y4f
# 4. Encrypt a secret file (commit only the .enc.yaml output, never the plaintext input)
sops -e secrets/db-credentials.yaml > secrets/db-credentials.enc.yaml
# 5. In Flux Kustomization, enable SOPS decryption:
spec:
  decryption:
    provider: sops
    secretRef:
      name: sops-age
External Secrets Operator (GitOps-compatible)
# In GitOps repo — no secret values, only references:
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
name: db-credentials
namespace: order-service
spec:
refreshInterval: 1h
secretStoreRef:
kind: ClusterSecretStore
name: aws-secretsmanager
target:
name: db-credentials # creates this K8s Secret
creationPolicy: Owner
data:
- secretKey: url
remoteRef:
key: production/order-service/db
property: connection_string
14. Drift Detection & Remediation
Argo CD Drift Detection
# View drift for an application
argocd app diff order-service
# Application status when drifted:
# STATUS: OutOfSync (even if selfHeal is enabled, there is a window before correction)
# PrometheusRule: alert when applications are out of sync
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: argocd-alerts
  namespace: argocd
spec:
  groups:
    - name: argocd
      rules:
        - alert: ArgoCDAppOutOfSync
          expr: |
            argocd_app_info{sync_status="OutOfSync"} == 1
          for: 10m # allow selfHeal window
          labels:
            severity: warning
            team: platform
          annotations:
            summary: "Argo CD app {{ $labels.name }} is OutOfSync"
            description: "App {{ $labels.name }} in project {{ $labels.project }} has been OutOfSync for 10m"
        - alert: ArgoCDAppUnhealthy
          expr: |
            argocd_app_info{health_status!~"Healthy|Progressing"} == 1
          for: 15m
          labels:
            severity: warning
          annotations:
            summary: "Argo CD app {{ $labels.name }} is {{ $labels.health_status }}"
        - alert: ArgoCDSyncFailed
          # Counts sync operations that ended in Error or Failed.
          # sync_status="Unknown" only means the comparison could not be computed, not that a sync failed.
          expr: |
            increase(argocd_app_sync_total{phase=~"Error|Failed"}[10m]) > 0
          labels:
            severity: critical
Argo CD Notifications
# notifications-cm ConfigMap — send Slack message on sync failure
apiVersion: v1
kind: ConfigMap
metadata:
name: argocd-notifications-cm
namespace: argocd
data:
service.slack: |
token: $slack-token
template.app-sync-failed: |
slack:
attachments: |
[{
"title": "{{.app.metadata.name}}",
"title_link": "{{.context.argocdUrl}}/applications/{{.app.metadata.name}}",
"color": "danger",
"fields": [
{"title": "Sync Status", "value": "{{.app.status.sync.status}}", "short": true},
{"title": "Health", "value": "{{.app.status.health.status}}", "short": true},
{"title": "Error", "value": "{{.app.status.operationState.message}}"}
]
}]
trigger.on-sync-failed: |
- when: app.status.operationState.phase in ['Error', 'Failed']
send: [app-sync-failed]
trigger.on-health-degraded: |
- when: app.status.health.status == 'Degraded'
send: [app-sync-failed]
subscriptions: |
- recipients: [slack:platform-alerts]
triggers: [on-sync-failed, on-health-degraded]
15. Best Practices
1. Use path-per-environment, not branch-per-env
Branch-per-environment causes merge conflicts and divergence. A single main branch with clusters/dev/, clusters/staging/, clusters/production/ paths is auditable and easy to diff.
2. Enable prune and selfHeal from day one
Disable these during initial migration only. Without prune, orphaned resources accumulate silently. Without selfHeal, manual changes are reported as OutOfSync but never corrected. Git is the source of truth, so enforce it.
3. Use AppProjects to enforce team isolation
Never give teams access to the default AppProject. Each team gets an AppProject restricting their source repos, destination clusters/namespaces, and allowed resource types. This prevents a misconfigured app from affecting other teams.
4. Never store plaintext secrets in Git
Use SOPS (for in-repo encrypted secrets), ESO (for external secret stores), or Sealed Secrets. All three are compatible with GitOps. A plaintext secret that reaches Git must be treated as compromised: history retains it after the file is deleted, so rotate the credential and do not rely on removing the commit.
5. Pin chart versions and image tags
Never use targetRevision: HEAD for production add-ons. Pin Helm chart versions (version: 60.3.0) and image tags. Use Renovate or Dependabot to propose version bumps via PR — not automatic floating.
6. Use sync waves for dependency ordering
CRDs must exist before CRs. Cert-manager must be healthy before Ingress with TLS. Database migrations must complete before new pods start. Model these with sync waves and hooks rather than sleeps in scripts.
7. Alert on OutOfSync and Degraded
A GitOps system that doesn't alert on persistent OutOfSync is providing a false sense of security. argocd_app_info{sync_status="OutOfSync"} for >10 minutes deserves a Slack notification at minimum.
8. Use ServerSideApply for large resources
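Add ServerSideApply=true to syncOptions for all applications. SSA handles CRD updates, larger objects, and field-manager conflicts better than client-side apply. It is often needed for very large CRDs, such as some shipped by Kyverno or Gatekeeper, which can exceed the size limit of the last-applied-configuration annotation that client-side apply writes.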