GitOps

1. GitOps Principles

GitOps is an operational model where the desired state of a system is declared in Git, and an automated agent continuously reconciles the live state to match the declared state. The four core principles (OpenGitOps spec):

1. Declarative

The entire system is described declaratively rather than through imperative scripts. Kubernetes YAML, Helm charts, and Kustomize overlays express what should exist, not how to get there.

2. Versioned and Immutable

The desired state is stored in Git with a complete history. Every change is a commit with authorship, timestamp, and message. Rollback is a Git revert.

3. Pulled Automatically

Software agents (Argo CD, Flux) pull from the Git source and apply changes. No CI system needs cluster credentials. The cluster-side agent initiates all connections — reducing the attack surface.

4. Continuously Reconciled

Agents continuously compare live state with desired state and correct drift. A manual kubectl apply that deviates from Git is automatically reverted (self-heal mode).
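
The reconciliation principle can be illustrated with a small, self-contained sketch. It is not how Argo CD or Flux are implemented: the `plan` and `reconcile` functions, the in-memory dictionaries standing in for Git and the cluster, and the "kind/namespace/name" keys are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple

Manifest = Dict[str, object]   # simplified stand-in for a Kubernetes object
State = Dict[str, Manifest]    # keyed by "kind/namespace/name" (hypothetical convention)


def plan(desired: State, live: State, prune: bool = True) -> List[Tuple[str, str]]:
    """Return the (action, key) pairs needed to make `live` match `desired`."""
    actions: List[Tuple[str, str]] = []
    for key, manifest in sorted(desired.items()):
        if key not in live:
            actions.append(("create", key))
        elif live[key] != manifest:
            actions.append(("update", key))   # drift: live differs from Git
    if prune:
        for key in sorted(live.keys() - desired.keys()):
            actions.append(("delete", key))   # resource was removed from Git
    return actions


def reconcile(desired: State, live: State, prune: bool = True) -> State:
    """Apply the plan to a copy of the live state and return the result."""
    result = dict(live)
    for action, key in plan(desired, live, prune):
        if action == "delete":
            del result[key]
        else:
            result[key] = desired[key]
    return result


desired = {
    "Deployment/shop/api": {"replicas": 3, "image": "api:1.4.2"},
    "Service/shop/api": {"port": 80},
}
live = {
    "Deployment/shop/api": {"replicas": 5, "image": "api:1.4.2"},  # manual scale-up
    "ConfigMap/shop/old": {"k": "v"},                               # no longer in Git
}

for action, key in plan(desired, live):
    print(action, key)
```

Running `reconcile` repeatedly is idempotent: once the live state matches Git, the plan is empty. Passing `prune=False` mirrors an agent that corrects drift but leaves resources deleted from Git in place.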

GitOps flow:

Developer                  Git Repository            Kubernetes Cluster
    │                            │                          │
    │── git commit + push ──────▶│                          │
    │                            │◀── poll (every 3min) ────│
    │                            │       (Argo CD / Flux)   │
    │                            │──── diff detected ──────▶│
    │                            │                          │── apply/sync
    │                            │                          │── reconcile
    │                            │◀────── Health: Healthy ──│
    │                            │                          │
    │                            │  [manual kubectl apply]  │
    │                            │                          │── drift detected
    │                            │◀────── selfHeal ─────────│── revert to Git state

Push-based vs Pull-based CD

| Dimension | Push-based (traditional CD) | Pull-based (GitOps) |
|---|---|---|
| Who initiates deployment | CI pipeline pushes to cluster | Agent in cluster pulls from Git |
| Credentials location | CI system holds kubeconfig/service account | Only Git credentials needed outside cluster |
| Attack surface | CI compromise = cluster compromise | Cluster credentials never leave cluster |
| Drift detection | None (pipeline only runs on trigger) | Continuous, on every reconciliation loop |
| Auditability | CI logs (may be transient) | Git history (durable; signed if commit signing is enforced) |
| Multi-cluster | Complex (credential management per cluster) | Single agent per cluster, single Git repo |

2. Argo CD Architecture

┌──────────────────────────────────────────────────────┐
│                    Argo CD                            │
│                                                      │
│  ┌──────────────┐  ┌───────────────┐                 │
│  │  API Server  │  │   Repo Server │                 │
│  │  (argocd-    │  │  (git clone,  │                 │
│  │   server)    │  │   template,   │                 │
│  │  REST + gRPC │  │   diff cache) │                 │
│  └──────┬───────┘  └───────┬───────┘                 │
│         │                  │                         │
│  ┌──────▼──────────────────▼───────┐                 │
│  │        Application Controller   │                 │
│  │  (reconcile loop, sync,         │                 │
│  │   health assessment,            │                 │
│  │   hook execution)               │                 │
│  └─────────────────────────────────┘                 │
│                                                      │
│  ┌──────────────┐  ┌───────────────┐                 │
│  │  Redis       │  │  Dex (SSO)    │                 │
│  │  (cache,     │  │  (OIDC auth)  │                 │
│  │   sessions)  │  │               │                 │
│  └──────────────┘  └───────────────┘                 │
└──────────────────────────────────────────────────────┘

Storage: Application and AppProject custom resources are stored through the Kubernetes API in the cluster's etcd (no external database)

Argo CD Install (Production)

kubectl create namespace argocd

# HA install (recommended for production; runs Redis in HA mode plus multiple
# server and repo-server replicas, and needs at least 3 nodes)
kubectl apply -n argocd \
  -f https://raw.githubusercontent.com/argoproj/argo-cd/stable/manifests/ha/install.yaml

# Or via Helm (recommended for managing Argo CD config-as-code)
helm repo add argo https://argoproj.github.io/argo-helm
helm repo update

helm upgrade --install argocd argo/argo-cd \
  --namespace argocd \
  --create-namespace \
  --values argocd-values.yaml

# argocd-values.yaml (production)
global:
  domain: argocd.internal.example.com

configs:
  cm:
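    # Resource tracking: annotation, label, or annotation+label
    # (label was the default before Argo CD 3.0; annotation is the default since)
    application.resourceTrackingMethod: annotation

    # How often Argo CD re-checks Git for changes (refresh interval), not a sync timeout
    timeout.reconciliation: 180s

    # SSO via direct OIDC to Okta (oidc.config bypasses the bundled Dex; use dex.config to go through Dex)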
    url: https://argocd.internal.example.com
    oidc.config: |
      name: Okta
      issuer: https://myorg.okta.com/oauth2/default
      clientID: $oidc.okta.clientId
      clientSecret: $oidc.okta.clientSecret
      requestedScopes: [openid, profile, email, groups]
      requestedIDTokenClaims:
        groups:
          essential: true

  rbac:
    policy.csv: |
      g, platform-admins, role:admin
      g, developers, role:readonly
      p, role:team-deployer, applications, sync, */*, allow
      p, role:team-deployer, applications, get, */*, allow
    policy.default: role:readonly
    scopes: '[groups]'

  params:
    server.insecure: false
    controller.diff.server.side: "true"    # use server-side diff

repoServer:
  replicas: 2
  resources:
    requests: {cpu: 250m, memory: 256Mi}
    limits: {memory: 1Gi}
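  # Enable Helm plugins / custom tools through a shared volume
  env:
    - name: HELM_PLUGINS
      value: /custom-tools/helm-plugins/
  volumes:
    - name: custom-tools
      emptyDir: {}
  volumeMounts:
    - name: custom-tools
      mountPath: /custom-tools
  initContainers:
    - name: download-tools
      image: alpine:3.18
      command: [sh, -ec]
      args:
        - |
          # alpine ships no helm binary, so unpack the helm-secrets release tarball instead of running
          # `helm plugin install`. 4.6.0 is an example version; sops itself must be added the same way.
          mkdir -p /custom-tools/helm-plugins
          wget -qO- https://github.com/jkroepke/helm-secrets/releases/download/v4.6.0/helm-secrets.tar.gz \
            | tar -C /custom-tools/helm-plugins -xzf -
      volumeMounts:
        - name: custom-tools
          mountPath: /custom-tools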

applicationSet:
  replicas: 2

server:
  replicas: 2
  service:
    type: ClusterIP
  ingress:
    enabled: true
    ingressClassName: nginx
    hostname: argocd.internal.example.com
    tls: true
    annotations:
      cert-manager.io/cluster-issuer: letsencrypt-prod

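controller:
  replicas: 2   # sharded app controller: managed clusters are split across replicas
  # The argo-cd chart derives ARGOCD_CONTROLLER_REPLICAS from `replicas`,
  # so it does not need to be set by hand here.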

3. Argo CD Applications

Application CRD — Full Reference

apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: order-service
  namespace: argocd
  finalizers:
    - resources-finalizer.argocd.argoproj.io  # cascade delete managed resources on app deletion
  annotations:
    argocd.argoproj.io/sync-wave: "5"          # sync ordering
spec:
  project: team-a                              # Argo CD AppProject for RBAC

  source:
    repoURL: https://github.com/myorg/services
    targetRevision: HEAD                       # or specific tag/branch/SHA
    path: order-service/helm

    # Helm source
    helm:
      releaseName: order-service
      valueFiles:
        - values.yaml
        - values-production.yaml
      values: |
        image:
          tag: "1.4.2"
      ignoreMissingValueFiles: false
      skipCrds: false

  # Alternative: Kustomize source
  # source:
  #   path: order-service/kustomize/overlays/production
  #   kustomize:
  #     namePrefix: prod-
  #     images:
  #       - myregistry/order-service:1.4.2

  destination:
    server: https://kubernetes.default.svc    # in-cluster
    # or: server: https://prod-us.example.com (registered external cluster)
    namespace: order-service

  syncPolicy:
    automated:
      prune: true        # delete resources removed from Git
      selfHeal: true     # revert manual kubectl changes
      allowEmpty: false  # never sync to an empty state
    syncOptions:
      - CreateNamespace=true
      - ServerSideApply=true       # use SSA instead of kubectl apply
      - PrunePropagationPolicy=foreground
      - PruneLast=true             # delete resources after all others synced
      - RespectIgnoreDifferences=true
    retry:
      limit: 5
      backoff:
        duration: 5s
        factor: 2
        maxDuration: 3m

  # Ignore specific fields that change outside Git (e.g., HPA replicas)
  ignoreDifferences:
    - group: apps
      kind: Deployment
      jsonPointers:
        - /spec/replicas                    # HPA manages this
    - group: ""
      kind: Secret
      managedFieldsManagers:
        - secrets-store-sync               # ESO manages secret data

  revisionHistoryLimit: 10
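
The retry block in the Application above (limit: 5, duration: 5s, factor: 2, maxDuration: 3m) describes an exponential backoff. As a rough sketch, assuming each delay is the base duration multiplied by the factor once per previous attempt and capped at maxDuration, the wait times can be worked out as follows. `backoff_schedule` is a hypothetical helper, not part of Argo CD.

```python
def backoff_schedule(duration_s, factor, max_duration_s, limit):
    """Delay before each retry: duration * factor**attempt, capped at max_duration."""
    return [min(duration_s * factor ** attempt, max_duration_s) for attempt in range(limit)]


# Values taken from the retry block above (3m = 180s).
delays = backoff_schedule(duration_s=5, factor=2, max_duration_s=180, limit=5)
print(delays)       # [5, 10, 20, 40, 80]
print(sum(delays))  # 155 seconds of waiting in total before the sync is marked failed
```

With these settings the cap is never reached; it would only start to matter from the seventh retry onward (5 × 2^6 = 320s, capped to 180s).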

AppProject — Team Isolation

apiVersion: argoproj.io/v1alpha1
kind: AppProject
metadata:
  name: team-a
  namespace: argocd
spec:
  description: "Team A microservices"

  # Only allow syncing to specific clusters/namespaces
  destinations:
    - server: https://kubernetes.default.svc
      namespace: team-a-*    # wildcard namespace match
    - server: https://prod-eu.example.com
      namespace: team-a-*

  # Only allow sources from team's repo
  sourceRepos:
    - https://github.com/myorg/team-a-services
    - https://charts.helm.sh/stable          # approved Helm repos

  # Restrict which K8s resources can be managed
  clusterResourceWhitelist:
    - group: ""
      kind: Namespace
  namespaceResourceBlacklist:
    - group: ""
      kind: ResourceQuota              # only platform team can set quotas
    - group: networking.k8s.io
      kind: NetworkPolicy              # only platform team can create network policies

  # Orphaned resource monitoring (alert on resources not in any app)
  orphanedResources:
    warn: true

  roles:
    - name: team-a-deployer
      policies:
        - p, proj:team-a:team-a-deployer, applications, sync, team-a/*, allow
        - p, proj:team-a:team-a-deployer, applications, get, team-a/*, allow
      groups:
        - team-a-developers

4. App-of-Apps Pattern

The app-of-apps pattern uses a single root Application that manages other Applications. This allows the entire cluster state to be bootstrapped from a single kubectl apply and maintained declaratively in Git.

platform-gitops/
└── clusters/production/apps/         ← root app points here
    ├── cert-manager.yaml              ← each file is an Application CR
    ├── ingress-nginx.yaml
    ├── kube-prometheus-stack.yaml
    ├── external-secrets.yaml
    ├── gatekeeper.yaml
    ├── karpenter.yaml
    └── team-apps/
        ├── team-a-apps.yaml           ← nested app-of-apps for each team
        └── team-b-apps.yaml

# clusters/production/apps/cert-manager.yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: cert-manager
  namespace: argocd
  annotations:
    argocd.argoproj.io/sync-wave: "-1"   # install before things that need certs
spec:
  project: platform
  source:
    repoURL: https://charts.jetstack.io
    chart: cert-manager
    targetRevision: v1.15.0
    helm:
      values: |
        installCRDs: true
        replicaCount: 2
        resources:
          requests: {cpu: 100m, memory: 64Mi}
          limits: {memory: 256Mi}
  destination:
    server: https://kubernetes.default.svc
    namespace: cert-manager
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
    syncOptions:
      - CreateNamespace=true
      - ServerSideApply=true

5. ApplicationSets

ApplicationSet is a controller that generates multiple Argo CD Application resources from a single template using generators. It enables fleet-wide deployments and per-team application management without manual Application YAML per environment.

List Generator — Explicit Multi-Cluster

apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: guestbook
  namespace: argocd
spec:
  generators:
    - list:
        elements:
          - cluster: prod-us-east
            url: https://prod-us-east.example.com
            env: production
          - cluster: prod-eu-west
            url: https://prod-eu-west.example.com
            env: production
          - cluster: staging
            url: https://staging.example.com
            env: staging
  template:
    metadata:
      name: "guestbook-{{cluster}}"
    spec:
      project: platform
      source:
        repoURL: https://github.com/myorg/platform-gitops
        targetRevision: HEAD
        path: "apps/guestbook/overlays/{{env}}"
      destination:
        server: "{{url}}"
        namespace: guestbook
      syncPolicy:
        automated:
          prune: true
          selfHeal: true

Git Generator — Directory-based Multi-Env

apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: cluster-addons
  namespace: argocd
spec:
  generators:
    - git:
        repoURL: https://github.com/myorg/platform-gitops
        revision: HEAD
        directories:
          - path: "add-ons/*"           # one Application per directory
          - path: "add-ons/kustomize/*"
            exclude: true               # exclude subdirectories
  template:
    metadata:
      name: "addon-{{path.basename}}"
    spec:
      project: platform
      source:
        repoURL: https://github.com/myorg/platform-gitops
        targetRevision: HEAD
        path: "{{path}}"
      destination:
        server: https://kubernetes.default.svc
        namespace: "{{path.basename}}"
      syncPolicy:
        automated:
          prune: true
          selfHeal: true
        syncOptions:
          - CreateNamespace=true

Matrix Generator — Env × App Cartesian Product

apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: services-matrix
  namespace: argocd
spec:
  generators:
    - matrix:
        generators:
          # Generator 1: list of clusters
          - clusters:
              selector:
                matchLabels:
                  platform.example.com/tier: workload
          # Generator 2: list of services from git
          - git:
              repoURL: https://github.com/myorg/services
              revision: HEAD
              files:
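                - path: "services/*/config.json"  # one parameter set per matching file
                # JSON keys are flattened into parameters, so {{values.service}} assumes
                # config.json looks like {"values": {"service": "...", "team": "..."}}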
  template:
    metadata:
      name: "{{name}}-{{values.service}}"
    spec:
      project: "{{values.team}}"
      source:
        repoURL: https://github.com/myorg/services
        targetRevision: HEAD
        path: "services/{{values.service}}/helm"
        helm:
          valueFiles:
            - values.yaml
            - "values-{{metadata.labels.environment}}.yaml"
      destination:
        server: "{{server}}"
        namespace: "{{values.service}}"

Pull Request Generator — Preview Environments

apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: preview-environments
  namespace: argocd
spec:
  generators:
    - pullRequest:
        github:
          owner: myorg
          repo: order-service
          tokenRef:
            secretName: github-token
            key: token
          labels:
            - preview             # only PRs with this label
  template:
    metadata:
      name: "pr-{{number}}-order-service"
    spec:
      project: preview
      source:
        repoURL: https://github.com/myorg/order-service
        targetRevision: "{{head_sha}}"
        path: helm
        helm:
          values: |
            image:
              tag: "pr-{{number}}"
            ingress:
              host: "pr-{{number}}.preview.example.com"
      destination:
        server: https://kubernetes.default.svc
        namespace: "preview-pr-{{number}}"
      syncPolicy:
        automated:
          prune: true
          selfHeal: true
        syncOptions:
          - CreateNamespace=true

6. Sync Waves & Hooks

Sync Waves

Sync waves control the order in which resources are applied within a single sync operation. Resources in lower waves are applied first and must be healthy before higher waves begin.

# Resource annotation — wave number (default 0, range: any integer)
metadata:
  annotations:
    argocd.argoproj.io/sync-wave: "5"

# Common wave ordering pattern:
# Wave -3: Namespaces
# Wave -2: CRDs (cert-manager CRDs, Gatekeeper CRDs, etc.)
# Wave -1: Core infrastructure (cert-manager, external-dns, kube-state-metrics)
# Wave  0: Default (most resources)
# Wave  1: Applications that depend on infrastructure
# Wave  5: Ingress resources (after cert-manager and ingress controller)
# Wave 10: Smoke test jobs / verification
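
The ordering rule can be sketched in a few lines. This example only groups resources by their sync-wave annotation and sorts the groups; the health gating between waves and Argo CD's additional ordering by phase and kind are left out. The resource names are invented.

```python
from itertools import groupby

WAVE_ANNOTATION = "argocd.argoproj.io/sync-wave"


def wave_of(resource):
    """Read the sync wave from the annotation; resources without one are wave 0."""
    annotations = resource.get("metadata", {}).get("annotations", {})
    return int(annotations.get(WAVE_ANNOTATION, "0"))


def group_by_wave(resources):
    """Return [(wave, [names...]), ...] in the order the waves would be applied."""
    ordered = sorted(resources, key=wave_of)   # stable: keeps manifest order within a wave
    return [(wave, [r["metadata"]["name"] for r in group])
            for wave, group in groupby(ordered, key=wave_of)]


def res(name, wave=None):
    """Build a minimal manifest for the example."""
    annotations = {} if wave is None else {WAVE_ANNOTATION: str(wave)}
    return {"metadata": {"name": name, "annotations": annotations}}


resources = [
    res("api-ingress", 5),
    res("shop", -3),
    res("api"),
    res("certificates.cert-manager.io", -2),
    res("api-config"),
]

for wave, names in group_by_wave(resources):
    print(wave, names)
```

Negative waves sort before the default wave 0, which is why namespaces and CRDs are conventionally given negative numbers.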

Resource Hooks

# PreSync hook — run before sync (e.g., database migrations)
apiVersion: batch/v1
kind: Job
metadata:
  name: db-migrate
  annotations:
    argocd.argoproj.io/hook: PreSync
    argocd.argoproj.io/hook-delete-policy: BeforeHookCreation
spec:
  template:
    spec:
      containers:
        - name: migrate
          image: myregistry/order-service:1.4.2
          command: ["/app/migrate", "up"]
          env:
            - name: DATABASE_URL
              valueFrom:
                secretKeyRef:
                  name: db-credentials
                  key: url
      restartPolicy: Never
  backoffLimit: 3

# PostSync hook: run after all resources are Healthy (e.g., smoke test)
apiVersion: batch/v1
kind: Job
metadata:
  name: smoke-test
  annotations:
    argocd.argoproj.io/hook: PostSync
    argocd.argoproj.io/hook-delete-policy: HookSucceeded  # auto-cleanup on success
spec:
  template:
    spec:
      containers:
        - name: smoke-test
          image: curlimages/curl:8.4.0
          command:
            - sh
            - -c
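            - |
              # Poll the health endpoint for about a minute so a broken release fails the hook
              for attempt in $(seq 1 20); do
                if curl -sf http://order-service.order-service/health; then
                  echo "smoke test passed"; exit 0
                fi
                echo "waiting..."; sleep 3
              done
              echo "smoke test failed"; exit 1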
      restartPolicy: Never

# Available hook types:
# PreSync     - before sync begins
# Sync        - during sync (parallel with resource apply)
# PostSync    - after all resources are Healthy
# SyncFail    - if sync fails (rollback notification, cleanup)
# PostDelete  - after the application's resources are deleted (Argo CD v2.10+)

# Hook delete policies:
# HookSucceeded      - delete after successful run
# HookFailed         - delete after failed run
# BeforeHookCreation - delete old hook before creating new one (idempotent; the default when no policy is set)

7. Argo CD RBAC & SSO

RBAC Policy Syntax

# argocd-rbac-cm ConfigMap policy.csv
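# Format: p, <subject>, <resource>, <action>, <object>, <effect>
# Subject: a user, a role:<name>, or a group bound to a role via a g mapping
# Object (for applications): <project>/<application>; effect: allow or deny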

# Built-in roles: role:admin, role:readonly

# Grant team-a group ability to sync/delete only their apps
p, role:team-a-deployer, applications, sync,   team-a/*, allow
p, role:team-a-deployer, applications, delete, team-a/*, allow
p, role:team-a-deployer, applications, get,    team-a/*, allow
p, role:team-a-deployer, logs,        get,    team-a/*, allow

# Bind groups to roles
g, okta-group:team-a-engineers, role:team-a-deployer
g, okta-group:platform-engineers, role:admin

# Resources: applications, repositories, clusters, certificates, gpgkeys,
#            exec, logs, projects, accounts, applicationsets

# Actions: get, create, update, delete, sync, override, action/*
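
How a policy line is evaluated can be approximated with glob matching. The sketch below is a deliberately simplified model: Argo CD evaluates policies with Casbin, and this version ignores deny rules, policy.default, and project-scoped roles. `parse_policy` and `is_allowed` are hypothetical names.

```python
from fnmatch import fnmatchcase

POLICY_CSV = """
p, role:team-a-deployer, applications, sync,   team-a/*, allow
p, role:team-a-deployer, applications, get,    team-a/*, allow
g, okta-group:team-a-engineers, role:team-a-deployer
"""


def parse_policy(policy_csv):
    """Split policy.csv into permission tuples (p lines) and group bindings (g lines)."""
    permissions, bindings = [], []
    for line in policy_csv.splitlines():
        fields = [field.strip() for field in line.split(",")]
        if fields[0] == "p" and len(fields) == 6:
            permissions.append(tuple(fields[1:]))
        elif fields[0] == "g" and len(fields) == 3:
            bindings.append((fields[1], fields[2]))
    return permissions, bindings


def is_allowed(groups, resource, action, obj, policy_csv=POLICY_CSV):
    """True if any role bound to the caller's groups has a matching allow rule."""
    permissions, bindings = parse_policy(policy_csv)
    subjects = set(groups) | {role for group, role in bindings if group in groups}
    return any(
        subject in subjects
        and res == resource
        and fnmatchcase(action, act)
        and fnmatchcase(obj, pattern)
        and effect == "allow"
        for subject, res, act, pattern, effect in permissions
    )


caller = ["okta-group:team-a-engineers"]
print(is_allowed(caller, "applications", "sync", "team-a/order-service"))
print(is_allowed(caller, "applications", "sync", "team-b/billing"))
```

The object pattern `team-a/*` is what confines the role to one project: the same caller is refused for any application outside `team-a`.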

argocd CLI Usage

# Login
argocd login argocd.internal.example.com \
  --username admin \
  --password $(kubectl -n argocd get secret argocd-initial-admin-secret -o jsonpath='{.data.password}' | base64 -d)

# App management
argocd app list
argocd app get order-service
argocd app diff order-service           # diff live vs desired state
argocd app sync order-service           # trigger manual sync
argocd app sync order-service --dry-run # preview sync without applying
argocd app history order-service        # deployment history
argocd app rollback order-service 5     # roll back to history ID 5 (automated sync must be disabled first)

# Wait for sync to complete in CI
argocd app wait order-service \
  --sync \
  --health \
  --timeout 300

# Refresh (re-fetch from Git without syncing)
argocd app get order-service --refresh

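# Force apply: resources that cannot be patched are deleted and recreated
# (this is a kubectl-style --force, not a sync-window override)
argocd app sync order-service --force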

# Terminate running sync
argocd app terminate-op order-service

8. Flux Architecture

Flux v2 Controllers (GitOps Toolkit):

  ┌───────────────────────────────────────────────────────┐
  │                    Flux Controllers                    │
  │                                                       │
  │  source-controller      → GitRepository               │
  │                         → HelmRepository              │
  │                         → OCIRepository               │
  │                         → Bucket (S3/GCS)             │
  │                                                       │
  │  kustomize-controller   → Kustomization               │
  │                           (applies kustomize overlays)│
  │                                                       │
  │  helm-controller        → HelmRelease                 │
  │                           (manages Helm releases)     │
  │                                                       │
  │  notification-controller → Provider (Slack/Teams/PD) │
  │                          → Alert (what to notify)     │
  │                          → Receiver (webhook)         │
  │                                                       │
  │  image-reflector-controller → ImageRepository         │
  │                             → ImagePolicy             │
  │                                                       │
  │  image-automation-controller → ImageUpdateAutomation  │
  │                                (writes back to Git)   │
  └───────────────────────────────────────────────────────┘

Flux Bootstrap

# Install flux CLI
curl -s https://fluxcd.io/install.sh | bash

# Bootstrap: installs Flux into the cluster AND commits its manifests to Git
# (expects a GitHub personal access token in the GITHUB_TOKEN environment variable)
flux bootstrap github \
  --owner=myorg \
  --repository=platform-gitops \
  --branch=main \
  --path=clusters/production \
  --personal=false \
  --token-auth=false \
  --ssh-key-algorithm=ed25519

# This:
# 1. Creates GitHub deploy key
# 2. Installs Flux CRDs + controllers in flux-system namespace
# 3. Commits Flux manifests to clusters/production/flux-system/
# 4. Creates GitRepository pointing at the repo
# 5. Creates Kustomization pointing at clusters/production/

GitRepository Source

apiVersion: source.toolkit.fluxcd.io/v1
kind: GitRepository
metadata:
  name: platform-gitops
  namespace: flux-system
spec:
  interval: 1m                 # poll interval
  url: https://github.com/myorg/platform-gitops
  ref:
    branch: main
  secretRef:
    name: flux-system           # SSH deploy key or HTTPS token
  ignore: |
    # ignore non-Kubernetes files
    /.github/
    /docs/
    **/*.md
    **/*.png

9. Flux: Kustomize & Helm

Kustomization (Flux)

apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: cluster-addons
  namespace: flux-system
spec:
  interval: 10m                # reconcile every 10 minutes
  path: "./clusters/production/add-ons"
  prune: true                  # delete removed resources from cluster
  sourceRef:
    kind: GitRepository
    name: platform-gitops
  healthChecks:                # wait for these to be healthy before proceeding
    - apiVersion: apps/v1
      kind: Deployment
      name: cert-manager
      namespace: cert-manager
  postBuild:
    substitute:
      CLUSTER_NAME: "production"
      AWS_ACCOUNT_ID: "123456789"
    substituteFrom:
      - kind: ConfigMap
        name: cluster-vars     # inject per-cluster variables
  decryption:
    provider: sops             # decrypt SOPS-encrypted secrets
    secretRef:
      name: sops-age           # age key for SOPS decryption
  timeout: 5m
  retryInterval: 30s

  # Dependency: only apply after infrastructure Kustomization is healthy
  dependsOn:
    - name: infrastructure

HelmRelease

apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
  name: kube-prometheus-stack
  namespace: observability
spec:
  interval: 1h
  chart:
    spec:
      chart: kube-prometheus-stack
      version: ">=60.0.0 <61.0.0"   # semver constraint
      sourceRef:
        kind: HelmRepository
        name: prometheus-community
        namespace: flux-system
      interval: 12h                  # check for chart updates

  # Override values
  values:
    prometheus:
      prometheusSpec:
        retention: 15d
        storageSpec:
          volumeClaimTemplate:
            spec:
              storageClassName: ebs-gp3
              resources:
                requests:
                  storage: 100Gi

  # Merge values from ConfigMap/Secret
  valuesFrom:
    - kind: ConfigMap
      name: prometheus-values
      optional: false
    - kind: Secret
      name: prometheus-secrets
      optional: true

  # Upgrade strategy
  upgrade:
    remediation:
      retries: 3               # retry failed upgrades
    cleanupOnFail: true        # remove new resources if upgrade fails

  # Rollback on upgrade failure
  rollback:
    timeout: 5m
    cleanupOnFail: true

  # Test hook after upgrade
  test:
    enable: true
    ignoreFailures: false

  # Drift detection: reconcile even if Helm thinks it's in sync
  driftDetection:
    mode: enabled
    ignore:
      - paths: ["/spec/replicas"]   # HPA manages replicas
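
Ignoring a path during drift detection amounts to removing that field from both sides before comparing. The following sketch shows the idea with JSON pointers such as /spec/replicas; it handles nested dictionaries only (no list indices) and is not the controllers' actual diff logic. `remove_pointer` and `has_drift` are hypothetical names.

```python
import copy


def remove_pointer(doc, pointer):
    """Delete the value at an RFC 6901 JSON pointer if it exists (dict keys only)."""
    parts = [p.replace("~1", "/").replace("~0", "~") for p in pointer.split("/")[1:]]
    node = doc
    for part in parts[:-1]:
        node = node.get(part) if isinstance(node, dict) else None
        if node is None:
            return
    if isinstance(node, dict) and parts:
        node.pop(parts[-1], None)


def has_drift(desired, live, ignore_paths):
    """Compare two objects after stripping the ignored paths from both."""
    desired, live = copy.deepcopy(desired), copy.deepcopy(live)
    for pointer in ignore_paths:
        remove_pointer(desired, pointer)
        remove_pointer(live, pointer)
    return desired != live


desired = {"spec": {"replicas": 3, "template": {"image": "api:1.4.2"}}}
scaled = {"spec": {"replicas": 7, "template": {"image": "api:1.4.2"}}}   # HPA scaled up
patched = {"spec": {"replicas": 3, "template": {"image": "api:hotfix"}}}  # manual edit

print(has_drift(desired, scaled, ["/spec/replicas"]))   # False: only an ignored field differs
print(has_drift(desired, patched, ["/spec/replicas"]))  # True: image was changed by hand
```

The same idea underlies the jsonPointers entries in Argo CD's ignoreDifferences: fields owned by another controller are excluded so they do not trigger a revert.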

HelmRepository Source

apiVersion: source.toolkit.fluxcd.io/v1
kind: HelmRepository
metadata:
  name: prometheus-community
  namespace: flux-system
spec:
  interval: 12h
  url: https://prometheus-community.github.io/helm-charts

---
# OCI Helm registry (Helm 3.8+, preferred for private charts)
apiVersion: source.toolkit.fluxcd.io/v1
kind: HelmRepository
metadata:
  name: myorg-charts
  namespace: flux-system
spec:
  interval: 12h
  type: oci
  url: oci://123456789.dkr.ecr.us-east-1.amazonaws.com/helm-charts
  secretRef:
    name: ecr-credentials

10. Flux Image Automation

Flux image automation monitors container registries for new image tags, selects the latest one according to a policy (semver, alphabetical, or numerical, optionally after a regex tag filter), and writes the updated tag back to Git, which triggers a GitOps sync automatically.

# 1. Define which registry to watch
apiVersion: image.toolkit.fluxcd.io/v1beta2
kind: ImageRepository
metadata:
  name: order-service
  namespace: flux-system
spec:
  interval: 5m
  image: 123456789.dkr.ecr.us-east-1.amazonaws.com/order-service
  secretRef:
    name: ecr-credentials

---
# 2. Define how to select the "latest" tag
apiVersion: image.toolkit.fluxcd.io/v1beta2
kind: ImagePolicy
metadata:
  name: order-service
  namespace: flux-system
spec:
  imageRepositoryRef:
    name: order-service
  policy:
    semver:
      range: ">=1.0.0 <2.0.0"   # only 1.x.x tags
    # Alternative: alphabetical (latest timestamp-tagged image)
    # alphabetical:
    #   order: asc

---
# 3. In the Deployment manifest, add a marker comment on the image line:
# (Flux image automation rewrites the image reference on the line that carries the marker)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: order-service
spec:
  template:
    spec:
      containers:
        - name: order-service
          image: 123456789.dkr.ecr.us-east-1.amazonaws.com/order-service:1.3.5 # {"$imagepolicy": "flux-system:order-service"}

---
# 4. Configure the automation — writes updated tag back to Git
apiVersion: image.toolkit.fluxcd.io/v1beta2
kind: ImageUpdateAutomation
metadata:
  name: flux-system
  namespace: flux-system
spec:
  interval: 5m
  sourceRef:
    kind: GitRepository
    name: platform-gitops
  git:
    push:
      branch: main
    commit:
      author:
        name: Fluxbot
        email: fluxbot@example.com
      messageTemplate: |
        chore: update {{range .Updated.Images}}{{println .}}{{end}}
  update:
    strategy: Setters    # uses imagepolicy markers
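
The semver policy in step 2 can be approximated as follows. This is a simplified sketch rather than Flux's implementation: it accepts only plain MAJOR.MINOR.PATCH tags (pre-release tags are skipped) and hard-codes the range from the ImagePolicy as tuple bounds. `parse_version` and `select_latest` are hypothetical names.

```python
import re

SEMVER = re.compile(r"^v?(\d+)\.(\d+)\.(\d+)$")   # release versions only


def parse_version(tag):
    """Return (major, minor, patch) for a plain semver tag, else None."""
    match = SEMVER.match(tag)
    return tuple(int(part) for part in match.groups()) if match else None


def select_latest(tags, lower=(1, 0, 0), upper=(2, 0, 0)):
    """Return the highest tag with lower <= version < upper, or None if nothing matches."""
    candidates = [(parse_version(tag), tag) for tag in tags if parse_version(tag) is not None]
    in_range = [(version, tag) for version, tag in candidates if lower <= version < upper]
    return max(in_range)[1] if in_range else None


tags = ["1.3.5", "1.10.0", "1.4.2", "2.0.0", "0.9.1", "latest", "1.4.2-rc.1"]
print(select_latest(tags))  # 1.10.0
```

Versions are compared numerically, so 1.10.0 ranks above 1.4.2 even though it sorts lower as a string. That difference is the main reason to prefer a semver policy over an alphabetical one for version tags.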

11. Argo CD vs Flux

| Feature | Argo CD | Flux v2 |
|---|---|---|
| UI | Rich web UI (app tree, diff view, sync log) | No built-in UI (third-party options such as Weave GitOps exist) |
| Multi-cluster | Central management cluster deploys to all | Each cluster runs its own Flux controllers |
| Application model | Application CRD (Helm/Kustomize/raw YAML) | Separate HelmRelease + Kustomization CRDs |
| Fleet management | ApplicationSet (generators for fleet) | Per-cluster Flux install, no central ApplicationSet |
| Image automation | Argo CD Image Updater (separate project) | Built-in image-reflector + image-automation controllers |
| Secrets decryption | helm-secrets plugin (SOPS) or ESO | Native SOPS + age/gpg decryption in Kustomization |
| Drift self-heal | selfHeal: true (opt-in, automatic once set) | Kustomizations are re-applied on every interval; HelmRelease needs driftDetection enabled |
| Notification | Built-in notifications controller | notification-controller with Provider/Alert CRDs |
| Dependency ordering | Sync waves + hooks | dependsOn in Kustomization/HelmRelease |
| RBAC | AppProject + RBAC policies (Casbin) | Kubernetes RBAC on CRDs; multi-tenancy via namespacing |
| Rollback | argocd app rollback to history entry | Helm rollback via HelmRelease spec changes |
| Progressive delivery | Argo Rollouts (separate but integrated) | Flagger (separate, supported) |
| CNCF status | Graduated (2022) | Graduated (2022) |

Choosing between them

Use Argo CD when you need a central management plane for many clusters, a UI for developer visibility, and explicit sync control (sync windows, manual approval before prod). Use Flux when you prefer a fully GitOps-native approach with no UI, native image automation, SOPS secrets integration, or when clusters cannot expose their API server to an external management plane (the agents run entirely in-cluster and only need outbound access to Git and registries). Many large organisations use both: Flux in the workload clusters and Argo CD in a management cluster for fleet visibility.

12. Git Branching Strategies for GitOps

Strategy Comparison

| Strategy | Branch per Env | Path per Env | Tag-based |
|---|---|---|---|
| Description | main→prod, staging→staging, dev→dev branches | Single branch; clusters/dev/, clusters/staging/, clusters/prod/ paths | Tags (v1.4.2-prod) trigger environment promotion |
| Promotion | PR/merge from dev→staging→main | Copy/update values from one path to another; single PR | Tag push triggers deployment pipeline |
| Auditability | Hard (diff across branches) | Easy (diff within same repo/branch) | Medium (tag history) |
| Complexity | High (merge conflicts, branch divergence) | Low (single source of truth) | Medium |
| Recommended | Not recommended | ✓ Recommended | For Helm chart promotion only |

Recommended: Path-per-Environment Structure

platform-gitops/
├── clusters/
│   ├── production/
│   │   ├── flux-system/         ← Flux bootstrap manifests (auto-managed)
│   │   └── apps/                ← ApplicationSet / Kustomization manifests
│   ├── staging/
│   │   └── apps/
│   └── dev/
│       └── apps/
│
├── add-ons/                     ← Helm chart wrappers (environment-agnostic)
│   ├── cert-manager/
│   │   ├── kustomization.yaml
│   │   ├── helmrelease.yaml
│   │   └── values.yaml          ← common defaults
│   └── kube-prometheus-stack/
│       ├── helmrelease.yaml
│       └── values.yaml
│
└── services/                    ← Application workload definitions
    ├── order-service/
    │   ├── base/                ← kustomize base
    │   └── overlays/
    │       ├── dev/             ← 1 replica, debug env vars
    │       ├── staging/         ← 2 replicas, staging config
    │       └── production/      ← 3 replicas, HPA enabled, PDB

Promotion Workflow

# Promotion = updating the image tag (or other config) in the target environment path

# Method 1: Automated (Flux image automation or Argo CD Image Updater)
# CI builds image → pushes tag → image automation detects → commits to services/order-service/overlays/staging/

# Method 2: Scripted promotion (in CI pipeline after staging tests pass)
CURRENT_TAG=$(yq eval '.images[0].newTag' services/order-service/overlays/staging/kustomization.yaml)
yq eval -i ".images[0].newTag = \"${CURRENT_TAG}\"" \
  services/order-service/overlays/production/kustomization.yaml

git add services/order-service/overlays/production/
git commit -m "chore: promote order-service ${CURRENT_TAG} to production"
git push origin main

# GitOps agent detects the commit → syncs production cluster

# Method 3: Manual PR-based promotion (for regulated environments)
# Developer opens PR updating production overlay
# Requires approval from tech lead + platform team
# Auto-merge on approval → GitOps syncs

13. Secrets in GitOps

The fundamental constraint: Git must never contain plaintext secrets. Three approaches are widely used, each with different trust boundaries.

SOPS + age/GPG [in-repo]

Encrypt secret values before committing to Git and decrypt at apply time in-cluster. Flux has native SOPS support. Argo CD has none and needs a plugin such as helm-secrets.

External Secrets Operator (ESO) [recommended]

Secrets live in Vault, AWS Secrets Manager, or GCP Secret Manager. ESO syncs them into Kubernetes Secrets. GitOps repos contain only ExternalSecret CRs (references, no values).

Sealed Secrets [simple]

Encrypt with the cluster's public key; only that cluster can decrypt. The GitOps repo contains SealedSecret CRs. Simple, but existing SealedSecrets must be re-encrypted to pick up a rotated sealing key.
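The "no plaintext secrets in Git" constraint can be backed by a pre-commit or CI check. The sketch below is one possible heuristic, not a complete scanner: it flags files that declare a core `kind: Secret` but carry no top-level `sops:` metadata block. The function names are hypothetical.

```shell
# Returns 0 (true) when the file declares a core Kubernetes Secret
# but has no top-level `sops:` block, i.e. it looks unencrypted.
# SealedSecret and ExternalSecret manifests do not match the kind pattern.
is_plaintext_secret() {
  local file="$1"
  grep -Eq '^kind:[[:space:]]*Secret[[:space:]]*$' "$file" || return 1
  ! grep -Eq '^sops:' "$file"
}

# Report every offending file among the arguments.
# Exit status is 1 if at least one plaintext Secret was found, else 0.
check_secrets() {
  local f bad=0
  for f in "$@"; do
    if is_plaintext_secret "$f"; then
      echo "plaintext Secret: $f" >&2
      bad=1
    fi
  done
  return "$bad"
}
```

A pre-commit hook could call `check_secrets` with the staged YAML files, for example the output of `git diff --cached --name-only --diff-filter=ACM -- '*.yaml'`.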

SOPS + age Integration (Flux)

# 1. Generate age key pair (age.agekey holds the private key; never commit it)
age-keygen -o age.agekey
# Public key: age1ql3z7hjy54pw3hyww5ayyfg7zqgvc7w3j2elw8zmrj2kg5sf7m4scg9y4f

# 2. Store private key in cluster (for Flux decryption)
kubectl create secret generic sops-age \
  -n flux-system \
  --from-file=age.agekey=age.agekey

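# 3. Create .sops.yaml to configure which files to encrypt
# and which keys to use (path_regex is matched relative to the .sops.yaml location):
cat .sops.yaml
creation_rules:
  - path_regex: (^|/)secrets/.*\.yaml$
    encrypted_regex: ^(data|stringData)$   # encrypt only secret values, keep metadata readable
    age: age1ql3z7hjy54pw3hyww5ayyfg7zqgvc7w3j2elw8zmrj2kg5sf7m4scg9y4f

# 4. Encrypt a secret file; commit only the .enc.yaml and keep the plaintext out of Git
sops -e secrets/db-credentials.yaml > secrets/db-credentials.enc.yaml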

# 5. In Flux Kustomization, enable SOPS decryption:
spec:
  decryption:
    provider: sops
    secretRef:
      name: sops-age
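Step 5 shows only the `decryption` fragment. As a sketch of where it sits in a complete Flux Kustomization, the helper below prints a full manifest. The function name, the `interval` value, and the `flux-system` GitRepository name are assumptions for illustration; the `sops-age` Secret is the one created in step 2.

```shell
# Print a complete Flux Kustomization manifest with SOPS decryption enabled.
# $1 = Kustomization name, $2 = repository path to reconcile (caller-supplied).
render_sops_kustomization() {
  local name="$1" repo_path="$2"
  cat <<EOF
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: ${name}
  namespace: flux-system
spec:
  interval: 10m
  path: ${repo_path}
  prune: true
  sourceRef:
    kind: GitRepository
    name: flux-system
  decryption:
    provider: sops
    secretRef:
      name: sops-age        # Secret holding age.agekey, created in step 2
EOF
}
```

For example, `render_sops_kustomization order-service-secrets ./clusters/production/secrets > secrets-kustomization.yaml` would produce a manifest that can be committed next to the encrypted files.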

External Secrets Operator (GitOps-compatible)

# In GitOps repo — no secret values, only references:
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: db-credentials
  namespace: order-service
spec:
  refreshInterval: 1h
  secretStoreRef:
    kind: ClusterSecretStore
    name: aws-secretsmanager
  target:
    name: db-credentials           # creates this K8s Secret
    creationPolicy: Owner
  data:
    - secretKey: url
      remoteRef:
        key: production/order-service/db
        property: connection_string

14. Drift Detection & Remediation

Argo CD Drift Detection

# View drift for an application
argocd app diff order-service

# Application status when drifted:
# STATUS: OutOfSync (even if selfHeal is enabled, there is a window before correction)

# PrometheusRule: alert when applications are out of sync
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: argocd-alerts
  namespace: argocd
spec:
  groups:
    - name: argocd
      rules:
        - alert: ArgoCDAppOutOfSync
          expr: |
            argocd_app_info{sync_status="OutOfSync"} == 1
          for: 10m          # allow selfHeal window
          labels:
            severity: warning
            team: platform
          annotations:
            summary: "Argo CD app {{ $labels.name }} is OutOfSync"
            description: "App {{ $labels.name }} in project {{ $labels.project }} has been OutOfSync for 10m"

        - alert: ArgoCDAppUnhealthy
          expr: |
            argocd_app_info{health_status!~"Healthy|Progressing"} == 1
          for: 15m
          labels:
            severity: warning
          annotations:
            summary: "Argo CD app {{ $labels.name }} is {{ $labels.health_status }}"

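        - alert: ArgoCDSyncFailed
          # sync_status="Unknown" only means the comparison could not be computed;
          # failed sync operations are counted in argocd_app_sync_total by phase
          expr: |
            sum by (name, project) (increase(argocd_app_sync_total{phase=~"Error|Failed"}[10m])) > 0
          labels:
            severity: critical
          annotations:
            summary: "Argo CD app {{ $labels.name }} had a failed sync in the last 10m"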
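Besides alerting, `argocd app diff` can gate a scheduled CI job. The CLI help documents its exit codes as 0 for no diff, 1 when a diff is found, and 2 on general errors. A minimal sketch that turns those codes into a report follows; the function names are hypothetical and a logged-in `argocd` CLI is assumed.

```shell
# Map the exit status of `argocd app diff` to a label.
# 0 = no diff, 1 = diff found, anything else = error.
classify_diff_status() {
  case "$1" in
    0) echo "in-sync" ;;
    1) echo "drifted" ;;
    *) echo "error" ;;
  esac
}

# Run the diff for one application and print "<app>: <label>".
# Returns 0 only when live state matches Git.
check_drift() {
  local app="$1" status=0
  argocd app diff "$app" >/dev/null 2>&1 || status=$?
  echo "$app: $(classify_diff_status "$status")"
  [ "$status" -eq 0 ]
}
```

Calling `check_drift order-service` in a nightly job fails the job on drift or on CLI errors, which complements the Prometheus alerts for clusters where selfHeal is disabled.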

Argo CD Notifications

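# argocd-notifications-cm ConfigMap: send a Slack message on sync failure or degraded health
# ($slack-token is resolved from the argocd-notifications-secret Secret)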
apiVersion: v1
kind: ConfigMap
metadata:
  name: argocd-notifications-cm
  namespace: argocd
data:
  service.slack: |
    token: $slack-token

  template.app-sync-failed: |
    slack:
      attachments: |
        [{
          "title": "{{.app.metadata.name}}",
          "title_link": "{{.context.argocdUrl}}/applications/{{.app.metadata.name}}",
          "color": "danger",
          "fields": [
            {"title": "Sync Status", "value": "{{.app.status.sync.status}}", "short": true},
            {"title": "Health", "value": "{{.app.status.health.status}}", "short": true},
            {"title": "Error", "value": "{{.app.status.operationState.message}}"}
          ]
        }]

  trigger.on-sync-failed: |
    - when: app.status.operationState != nil and app.status.operationState.phase in ['Error', 'Failed']
      send: [app-sync-failed]

  trigger.on-health-degraded: |
    - when: app.status.health.status == 'Degraded'
      send: [app-sync-failed]

  subscriptions: |
    - recipients: [slack:platform-alerts]
      triggers: [on-sync-failed, on-health-degraded]

15. Best Practices

1. Use path-per-environment, not branch-per-env

Branch-per-environment causes merge conflicts and divergence. A single main branch with clusters/dev/, clusters/staging/, clusters/production/ paths is auditable and easy to diff.

2. Enable prune and selfHeal from day one

Disable these only during the initial migration. Without prune, orphaned resources accumulate silently. Without selfHeal, manual changes show up as OutOfSync but are never reverted. Git is the source of truth, so enforce it.

3. Use AppProjects to enforce team isolation

Never give teams access to the default AppProject. Each team gets an AppProject restricting their source repos, destination clusters/namespaces, and allowed resource types. This prevents a misconfigured app from affecting other teams.

4. Never store plaintext secrets in Git

Use SOPS (for in-repo encrypted secrets), ESO (for external secret stores), or Sealed Secrets. All three are compatible with GitOps. A plaintext secret committed to Git stays in the history even after it is deleted or rotated, so treat any such secret as compromised.

5. Pin chart versions and image tags

Never use targetRevision: HEAD for production add-ons. Pin Helm chart versions (for example version: 60.3.0) and image tags. Use Renovate or Dependabot to propose version bumps via PR instead of letting versions float.
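Pinning is easier to keep up when CI rejects floating references. The sketch below greps manifests for `targetRevision: HEAD`, `:latest` image tags, and `newTag: latest`. The function name and the exact patterns are assumptions and would need tuning for a real repository.

```shell
# Print "file:line:text" for each floating reference in the given manifests.
# Exit status is 0 when everything is pinned, 1 when a floating reference exists.
find_floating_refs() {
  local pattern
  pattern='(targetRevision:[[:space:]]*"?HEAD"?[[:space:]]*$)'
  pattern="$pattern"'|(image:[[:space:]]*"?[^[:space:]#"]+:latest"?[[:space:]]*$)'
  pattern="$pattern"'|(newTag:[[:space:]]*"?latest"?[[:space:]]*$)'
  # With no files, grep would read stdin, so treat that case as "nothing to check".
  [ "$#" -gt 0 ] || return 0
  if grep -EnH "$pattern" "$@"; then
    return 1
  fi
  return 0
}
```

Running `find_floating_refs` over the production overlays and add-on definitions in a PR check keeps version bumps explicit and reviewable.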

6. Use sync waves for dependency ordering

CRDs must exist before CRs. Cert-manager must be healthy before Ingress with TLS. Database migrations must complete before new pods start. Model these with sync waves and hooks rather than sleeps in scripts.

7. Alert on OutOfSync and Degraded

A GitOps system that doesn't alert on persistent OutOfSync is providing a false sense of security. argocd_app_info{sync_status="OutOfSync"} for >10 minutes deserves a Slack notification at minimum.

8. Use ServerSideApply for large resources

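Add ServerSideApply=true to syncOptions, ideally for all applications. SSA handles CRD updates, large objects, and field-manager conflicts better than client-side apply, which stores the full manifest in the last-applied-configuration annotation and fails once that exceeds the 256 KiB annotation limit. Large CRDs such as those shipped with Kyverno and Gatekeeper typically need it.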

Coverage Checklist
  • GitOps four principles (OpenGitOps spec): declarative/versioned/pull/reconciled
  • GitOps flow diagram: git push → agent poll → diff → apply → self-heal
  • Push-based vs pull-based CD comparison table (credentials/drift/auditability/multi-cluster)
  • Argo CD architecture diagram (API server / repo server / application controller / Redis / Dex)
  • Argo CD Helm install with argocd-values.yaml (HA, OIDC SSO, RBAC, repoServer replicas, ingress, SSA)
  • Application CRD full reference: source (Helm/Kustomize), syncPolicy (automated prune/selfHeal/allowEmpty), syncOptions, retry, ignoreDifferences, finalizers
  • AppProject CRD: destinations, sourceRepos, resource whitelist/blacklist, orphanedResources, roles
  • App-of-apps pattern: directory structure + cert-manager Application example with sync-wave annotation
  • ApplicationSet List generator (explicit multi-cluster)
  • ApplicationSet Git generator (directory-based, one app per dir)
  • ApplicationSet Matrix generator (clusters × services cartesian product)
  • ApplicationSet Pull Request generator (GitHub PR preview environments)
  • Sync waves: ordering pattern (-3 to 10) with resource annotation
  • PreSync hook (db-migrate Job with BeforeHookCreation policy)
  • PostSync hook (smoke-test Job with HookSucceeded delete policy)
  • All hook types and delete policies reference
  • Argo CD RBAC policy.csv syntax (p/g entries, resources, actions)
  • argocd CLI: login, list, get, diff, sync, dry-run, history, rollback, wait, refresh, force, terminate-op
  • Flux v2 controllers diagram (source/kustomize/helm/notification/image-reflector/image-automation)
  • flux bootstrap github command (creates deploy key, installs, commits to Git)
  • GitRepository source with interval, ref, secretRef, ignore patterns
  • Flux Kustomization: interval, prune, healthChecks, postBuild substitute/substituteFrom, decryption (SOPS), dependsOn, timeout
  • HelmRelease: chart spec with semver constraint, values, valuesFrom, upgrade remediation, rollback, driftDetection, test
  • HelmRepository (HTTP + OCI type with ECR)
  • Flux image automation: ImageRepository → ImagePolicy (semver range) → Deployment marker comment → ImageUpdateAutomation (writes back to Git)
  • Argo CD vs Flux comparison table (12 dimensions)
  • Choosing between them callout (central management vs per-cluster)
  • Git branching strategies: branch-per-env vs path-per-env vs tag-based comparison table
  • Path-per-environment recommended structure (clusters/add-ons/services with overlays)
  • Promotion workflow: automated (image automation), scripted (yq + git push), manual PR
  • Secrets in GitOps: SOPS vs ESO vs Sealed Secrets cards
  • SOPS + age setup: keygen, store in cluster, .sops.yaml creation rules, encrypt command, Flux decryption config
  • ESO ExternalSecret in GitOps repo (reference-only, no values)
  • Drift detection: argocd app diff + PrometheusRule (OutOfSync/Unhealthy/SyncFailed alerts)
  • Argo CD notifications ConfigMap (Slack on sync-failed + health-degraded)
  • 8 best practices cards