💰 Cost Management

Kubernetes Cost Management

Complete guide to understanding, attributing, and optimizing cloud-native infrastructure spend — from resource right-sizing and spot instances to chargeback models, OpenCost, Kubecost, and FinOps practices for Kubernetes platforms.

📊 OpenCost / Kubecost ⚡ VPA / Right-sizing 🎯 Spot Instances 🏷️ Cost Allocation 📉 Savings Plans

Contents

  1. Kubernetes Cost Model
  2. OpenCost & Kubecost
  3. Cost Attribution & Chargeback
  4. Resource Right-Sizing
  5. Vertical Pod Autoscaler
  6. Spot / Preemptible Instances
  7. Cost-Aware Scheduling
  8. Reserved Instances & Savings Plans
  9. Cluster Efficiency Metrics
  10. FinOps Practices
  11. Cost Alerts & Budgets
  12. Best Practices

Kubernetes Cost Model

Cloud infrastructure costs in Kubernetes come from multiple sources that need to be decomposed and allocated to tenants. Understanding the model is prerequisite to optimization.

KUBERNETES CLUSTER COST COMPONENTS

Cloud Provider Bill
│
├── Compute (60-80% of bill)
│   ├── Node instance hours (EC2 / GCE / Azure VM)
│   │   ├── On-demand
│   │   ├── Spot/Preemptible (60-90% cheaper)
│   │   └── Reserved/Savings Plans (30-60% cheaper)
│   └── Fargate/Autopilot pod-hours (serverless nodes)
│
├── Storage (5-15%)
│   ├── EBS/PD/managed disk (PersistentVolumes)
│   ├── S3/GCS object storage (logs, backups, TechDocs)
│   └── EFS/Filestore (shared ReadWriteMany PVCs)
│
├── Network (5-20%, often invisible until it's huge)
│   ├── Data transfer out to internet
│   ├── Cross-AZ traffic (same region, different AZ)
│   ├── Cross-region replication
│   └── NAT Gateway (pods → internet)
│
└── Managed Services (variable)
    ├── RDS / Cloud SQL (per-service databases)
    ├── ElastiCache / Memorystore
    └── Load Balancers (one per Service type=LoadBalancer)
        │
        ▼
Allocated to: Namespace → Team → Service → Feature → Environment

Cost vs Efficiency vs Waste

Term | Definition | Measured By
Cost | Dollars spent on infrastructure | Cloud bill + cost allocation tools
Efficiency | Workload output per dollar (requests/s per $, SLO met per $) | Business metrics / cost per request
Waste | Resources paid for but not used (idle CPU/RAM, orphaned PVs, zombie load balancers) | requests.cpu vs actual usage; PV Bound but no pods
Right-sizing gap | Difference between resource requests and actual usage | VPA recommendations, Goldilocks
Utilization | Actual usage / requested resources | container_cpu_usage_seconds_total and container_memory_working_set_bytes vs kube_pod_container_resource_requests
⚠️
Cloud providers bill by node, not by pod. If your nodes average 15% CPU utilization, roughly 85% of the compute you pay for sits idle, even if every pod has resource requests set. The levers are bin-packing (Karpenter consolidation) and right-sizing resource requests, not adding more nodes.
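
The arithmetic behind this warning can be sketched in a few lines. This is a minimal illustration, not part of any tool: `idle_spend` is a hypothetical helper, the node price and fleet size are made-up numbers, and CPU is treated as the only billed dimension.

```python
def idle_spend(node_hourly_cost: float, node_count: int, utilization: float) -> float:
    """Hourly dollars paid for node capacity that no workload is using."""
    if not 0.0 <= utilization <= 1.0:
        raise ValueError("utilization must be between 0 and 1")
    return node_hourly_cost * node_count * (1.0 - utilization)


# Hypothetical fleet: 20 nodes at $0.40/hour, averaging 15% CPU utilization.
billed = 0.40 * 20
wasted = idle_spend(node_hourly_cost=0.40, node_count=20, utilization=0.15)
print(f"${wasted:.2f}/hour idle out of ${billed:.2f}/hour billed")
```

Adding nodes raises the billed figure without touching utilization, which is why the idle share only shrinks through consolidation or smaller requests.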

OpenCost & Kubecost

OpenCost is the CNCF-incubating open standard and open-source implementation for Kubernetes cost monitoring. Kubecost builds a commercial layer on top with additional features like savings recommendations, cluster right-sizing, and multi-cloud support.

OpenCost Architecture

Cloud Provider APIs            Kubernetes API
(billing/pricing data)         (pod/node/PV metrics)
         │                           │
         └─────────────┬─────────────┘
                       ▼
                OpenCost Server
                ├── Cost model (CPU/RAM/GPU/PV/network)
                ├── Allocation engine (namespace/label/pod)
                └── /allocation, /assets REST API
                       │
        ┌──────────────┼──────────────┐
        ▼              ▼              ▼
   Prometheus       Grafana       Kubecost UI
   (metrics         (dashboards   (recommendations
    storage)         /alerts)      /savings/budgets)

Install OpenCost

helm repo add opencost https://opencost.github.io/opencost-helm-chart
helm repo update

helm install opencost opencost/opencost \
  --namespace opencost \
  --create-namespace \
  --version 1.42.0 \
  --set opencost.exporter.defaultClusterId=prod-us-east-1 \
  --set opencost.prometheus.internal.enabled=true \
  --set opencost.ui.enabled=true \
  --set opencost.cloudCost.enabled=true

# For AWS: provide cloud integration credentials
kubectl create secret generic cloud-integration \
  --namespace opencost \
  --from-literal=cloud-integration.json='{
    "aws": [{
      "athenaBucketName": "s3://my-cur-bucket",
      "athenaRegion": "us-east-1",
      "athenaDatabase": "athenacurcfn_my_database",
      "athenaTable": "my_cur_table",
      "projectID": "123456789012",
      "serviceKeyName": "opencost-aws-access"
    }]
  }'

OpenCost API Queries

# Cost by namespace for last 7 days
curl "http://opencost.opencost.svc:9003/allocation/compute?window=7d&aggregate=namespace&accumulate=true" | jq .

# Cost by label (e.g., by team)
curl "http://opencost.opencost.svc:9003/allocation/compute?window=30d&aggregate=label:team&accumulate=true" | jq .

# Cost breakdown for a specific namespace (filtering uses the filter parameter)
curl -G "http://opencost.opencost.svc:9003/allocation/compute" \
  -d window=1d -d aggregate=pod \
  --data-urlencode 'filter=namespace:"payments-api-production"' | jq .

# Asset costs (nodes, PVs, load balancers)
curl "http://opencost.opencost.svc:9003/assets?window=7d&aggregate=type" | jq .

# Efficiency: CPU/RAM request utilization
curl "http://opencost.opencost.svc:9003/allocation/compute?window=1d&aggregate=namespace" | \
  jq '.data[0] | to_entries[] | {
    namespace: .key,
    cpuEfficiency: .value.cpuEfficiency,
    ramEfficiency: .value.ramEfficiency,
    totalCost: .value.totalCost
  }'
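
The same efficiency filter can be applied in a script once the response is saved. The sketch below assumes the response shape implied by the jq filter above (a `data` list whose first element maps namespace names to allocations carrying `cpuEfficiency`, `ramEfficiency`, and `totalCost`); the sample values and the `wasteful_namespaces` helper are hypothetical.

```python
import json

# Hypothetical sample shaped like the allocation response queried above.
SAMPLE = json.loads("""
{"code": 200, "data": [{
  "payments": {"cpuEfficiency": 0.12, "ramEfficiency": 0.45, "totalCost": 310.0},
  "search":   {"cpuEfficiency": 0.64, "ramEfficiency": 0.71, "totalCost": 120.0}
}]}
""")


def wasteful_namespaces(response: dict, cpu_floor: float = 0.3) -> list:
    """Return (namespace, totalCost) pairs with CPU efficiency below cpu_floor, costliest first."""
    allocations = response["data"][0]
    flagged = [
        (name, alloc["totalCost"])
        for name, alloc in allocations.items()
        if alloc["cpuEfficiency"] < cpu_floor
    ]
    return sorted(flagged, key=lambda pair: pair[1], reverse=True)


print(wasteful_namespaces(SAMPLE))
```

Sorting by cost rather than by efficiency puts the namespaces where right-sizing saves the most money at the top of the list.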

Kubecost Install (with savings features)

helm repo add kubecost https://kubecost.github.io/cost-analyzer/
helm repo update

helm install kubecost kubecost/cost-analyzer \
  --namespace kubecost \
  --create-namespace \
  --version 2.3.4 \
  --set kubecostToken="your-token" \
  --set global.prometheus.enabled=false \
  --set global.prometheus.fqdn=http://kube-prometheus-stack-prometheus.monitoring:9090 \
  --set kubecostProductConfigs.clusterName=prod-us-east-1 \
  --set kubecostProductConfigs.currencyCode=USD \
  --set savings.enabled=true \
  --set networkCosts.enabled=true

OpenCost Prometheus Metrics

# Key OpenCost metrics exposed for Prometheus to scrape
# (metric names vary by release; confirm against your exporter's /metrics endpoint)
opencost_load_balancer_cost          # per LB hourly cost
opencost_node_total_hourly_cost      # per node
opencost_pod_seconds_total           # pod lifecycle seconds
node_total_hourly_cost               # node cost (from cloud pricing)
container_cpu_allocation             # CPU cores allocated (gauge)
container_memory_allocation_bytes    # RAM bytes allocated (gauge)

# Efficiency metrics (also via kube-state-metrics + cAdvisor)
container_cpu_usage_seconds_total    # actual CPU use
container_memory_working_set_bytes   # actual memory use
kube_pod_container_resource_requests # requested resources

Cost Attribution & Chargeback

Cost attribution maps cloud spend back to business units, teams, services, or features. This is the foundation of FinOps — you cannot optimize what you cannot measure and attribute.

Tagging Strategy

# Required labels on all K8s workloads (enforced by Kyverno — see 05-policy-enforcement.html)
# These labels flow into cost allocation tools
metadata:
  labels:
    team: payments
    env: production
    cost-center: CC-4892
    service: payments-api
    product: checkout-flow    # maps to P&L line
    component: backend        # frontend | backend | worker | batch

AWS Resource Tagging (static EC2 tags on Karpenter nodes)

# EC2NodeClass: apply static tags to every EC2 instance Karpenter launches
apiVersion: karpenter.k8s.aws/v1beta1
kind: EC2NodeClass
metadata:
  name: general
spec:
  amiFamily: AL2
  role: karpenter-node-role
  subnetSelectorTerms:
  - tags:
      karpenter.sh/discovery: prod-us-east-1
  securityGroupSelectorTerms:
  - tags:
      karpenter.sh/discovery: prod-us-east-1
  tags:
    # Static tags on every node
    Cluster: prod-us-east-1
    ManagedBy: karpenter
    Environment: production
  # Pod labels are NOT copied to EC2 tags: one node hosts pods from many teams.
  # For team-level EC2 tags, give each team its own NodePool/EC2NodeClass with a
  # team tag; for pod-level attribution, rely on OpenCost label aggregation.

OpenCost Allocation by Label

# Monthly cost per team
curl "http://opencost.opencost.svc:9003/allocation/compute?window=month&aggregate=label:team&accumulate=true" | \
  jq '.data[0] | to_entries | sort_by(-.value.totalCost) |
  .[] | "\(.key): $\(.value.totalCost | . * 100 | round / 100)"'

# Cost split by environment
curl "http://opencost.opencost.svc:9003/allocation/compute?window=30d&aggregate=label:env&accumulate=true" | jq .

# Multi-level: team + service (groupBy in Kubecost UI)
curl "http://opencost.opencost.svc:9003/allocation/compute?window=7d&aggregate=label:team,label:service" | jq .

Chargeback vs Showback

Showback

Teams see their costs but are not financially charged. Builds awareness and accountability without creating accounting complexity. Good starting point. Teams respond to social pressure but have no financial incentive to optimize.

Chargeback

Teams are actually billed — deducted from their budget. Strongest incentive to right-size and eliminate waste. Requires accurate allocation data and agreed-upon shared cost methodology (how do you split shared infra like monitoring?).

Shared Cost Allocation Strategies

Strategy | How It Works | Fair For | Avoid When
Even split | Divide shared infra cost equally among all tenants | Small teams, similar sizes | One large + many small tenants
Proportional | Each tenant pays share proportional to their workload spend | Most platforms (default in Kubecost) | Teams have very different usage patterns
Weighted by usage | Share monitoring/logging costs by metric cardinality or log volume emitted | Observability cost attribution | Hard to instrument accurately
Fixed overhead rate | Flat per-namespace fee covers shared platform costs | Simplicity, SaaS model | Large size variance across tenants
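
The difference between the first two strategies is easiest to see with numbers. A minimal sketch, assuming hypothetical tenant names and dollar amounts; `even_split` and `proportional_split` are illustrative helpers, not functions from OpenCost or Kubecost.

```python
def even_split(shared_cost: float, tenant_spend: dict) -> dict:
    """Every tenant pays the same share of the shared cost."""
    share = shared_cost / len(tenant_spend)
    return {tenant: share for tenant in tenant_spend}


def proportional_split(shared_cost: float, tenant_spend: dict) -> dict:
    """Each tenant pays in proportion to its own direct workload spend."""
    total = sum(tenant_spend.values())
    return {tenant: shared_cost * spend / total for tenant, spend in tenant_spend.items()}


# Hypothetical monthly direct spend per tenant and a $2,000 shared platform bill.
spend = {"payments": 6000.0, "search": 3000.0, "internal-tools": 1000.0}
shared = 2000.0

even = even_split(shared, spend)
proportional = proportional_split(shared, spend)
for tenant in spend:
    print(f"{tenant}: even ${even[tenant]:.2f}, proportional ${proportional[tenant]:.2f}")
```

With one large and several small tenants, the even split charges the smallest tenant more than three times what the proportional split does, which is the case the "Avoid When" column warns about.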

Resource Right-Sizing

Over-provisioned resource requests are the most common source of Kubernetes cost waste. Teams set requests high "to be safe" and never revisit them. Right-sizing closes the gap between requested and actual resources.

Identifying Over-Provisioned Workloads

# List pods by current CPU usage (kubectl top reports usage only, not requests)
kubectl top pods -A --sort-by=cpu | awk 'NR>1 {print $1, $2, $3}'

# PromQL: CPU efficiency per pod (lower = more waste)
# Ratio < 0.3 means using less than 30% of requested CPU
sum by (namespace, pod) (
  rate(container_cpu_usage_seconds_total{container!="",container!="POD"}[5m])
)
/
sum by (namespace, pod) (
  kube_pod_container_resource_requests{resource="cpu",container!=""}
)

# Memory efficiency (working set vs request)
sum by (namespace, pod) (
  container_memory_working_set_bytes{container!="",container!="POD"}
)
/
sum by (namespace, pod) (
  kube_pod_container_resource_requests{resource="memory",container!=""}
)

Goldilocks — VPA Recommendation UI

# Goldilocks runs VPA in recommendation-only mode and surfaces results in a UI
helm repo add fairwinds-stable https://charts.fairwinds.com/stable
helm repo update

helm install goldilocks fairwinds-stable/goldilocks \
  --namespace goldilocks \
  --create-namespace \
  --set vpa.enabled=true \
  --set dashboard.enabled=true

# Label namespaces to enable Goldilocks analysis
kubectl label ns payments-api-production goldilocks.fairwinds.com/enabled=true

# Goldilocks dashboard shows VPA recommendations per container
# with QoS class (Guaranteed/Burstable) and cost impact estimates

Typical Right-Sizing Findings

Pattern | Example | Correction | Typical Savings
CPU request >> actual usage | requests.cpu: 1000m, actual: 80m | Set requests.cpu: 200m (2.5× average usage, leaving headroom for peaks) | 5× reduction in CPU allocation cost
Memory request >> working set | requests.memory: 2Gi, actual: 200Mi | Set requests.memory: 300Mi | ~85% memory cost reduction
Limits >> requests (burstable) | limit: 4Gi, request: 128Mi | Tighten limit to 2×–4× request | Prevents memory bombs, better bin-packing
No limits set | Pod can use unlimited CPU/RAM | Set limits via LimitRange defaults or explicit spec | Node stability; predictable cost
Batch jobs with prod-class requests | CronJob with requests.cpu: 500m | Use batch-low PriorityClass, reduce requests | 10–30% on batch-heavy workloads
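
One way to turn observed usage into a request is a high percentile plus headroom. The sketch below is a simplified stand-in for what VPA or Goldilocks would recommend, not their algorithm; the `recommend_request` helper, the 95th percentile, the 20% headroom, and the sample values are all assumptions chosen for illustration.

```python
import math


def recommend_request(usage_millicores: list, percentile: float = 0.95, headroom: float = 0.20) -> int:
    """Suggest a CPU request: the chosen percentile of usage plus headroom, rounded up to 10m."""
    if not usage_millicores:
        raise ValueError("need at least one usage sample")
    ordered = sorted(usage_millicores)
    # Nearest-rank percentile: smallest sample with at least `percentile` of samples at or below it.
    rank = max(1, math.ceil(percentile * len(ordered)))
    baseline = ordered[rank - 1]
    return math.ceil(baseline * (1.0 + headroom) / 10) * 10


# Hypothetical CPU samples (millicores) for a pod that averages ~80m with one spike.
samples = [60, 70, 80, 75, 90, 85, 65, 160, 72, 78]
current_request = 1000
suggested = recommend_request(samples)
print(f"suggested request: {suggested}m, reduction: {1 - suggested / current_request:.0%}")
```

Sizing from a percentile rather than the average keeps the request above routine spikes, so the pod is not throttled the first time traffic peaks.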

Vertical Pod Autoscaler (VPA)

VPA automatically adjusts CPU and memory requests based on observed usage history. It has four update modes. Choose carefully: Recreate mode (and, today, Auto) restarts pods, which has availability implications.

Install VPA

git clone https://github.com/kubernetes/autoscaler.git
cd autoscaler/vertical-pod-autoscaler
./hack/vpa-up.sh

# Or via Helm
helm repo add cowboysysop https://cowboysysop.github.io/charts/
helm install vpa cowboysysop/vertical-pod-autoscaler \
  --namespace kube-system

VPA Modes

Mode | Behavior | Use When | Risk
Off | Only compute recommendations; no changes | Audit & right-sizing review with Goldilocks | None (read-only)
Initial | Set requests on new pods only (at creation) | Gradual adoption; existing pods unchanged | Low (no evictions)
Recreate | Evict & recreate pods to apply updated requests | Stateless pods, non-critical workloads | Pod restarts; availability impact
Auto | Recreate now; in-place update when K8s supports it (alpha) | Future default when in-place resize is GA | Pod restarts currently

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: payments-api-vpa
  namespace: payments-api-production
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: payments-api
  updatePolicy:
    updateMode: "Off"    # Start with Off; observe recommendations first
  resourcePolicy:
    containerPolicies:
    - containerName: payments-api
      minAllowed:
        cpu: "50m"
        memory: 64Mi
      maxAllowed:
        cpu: "2"          # cap to prevent runaway recommendations
        memory: 2Gi
      controlledResources: ["cpu","memory"]
      controlledValues: RequestsAndLimits

Reading VPA Recommendations

# View recommendations
kubectl describe vpa payments-api-vpa -n payments-api-production

# Output section:
#   Recommendation:
#     Container Recommendations:
#       Container Name: payments-api
#       Lower Bound:        ← requests below this are considered under-provisioned
#         Cpu:    50m
#         Memory: 123Mi
#       Target:             ← use this for your resource requests
#         Cpu:    210m
#         Memory: 340Mi
#       Uncapped Target:    ← target before min/maxAllowed are applied (same here: nothing was capped)
#         Cpu:    210m
#         Memory: 340Mi
#       Upper Bound:        ← requests above this are considered over-provisioned
#         Cpu:    1200m
#         Memory: 2Gi

# Bulk export recommendations for all VPAs
# (the trailing ? skips VPAs that have no recommendation yet instead of failing)
kubectl get vpa -A -o json | jq '
  .items[] | {
    namespace: .metadata.namespace,
    name: .metadata.name,
    containers: [.status.recommendation.containerRecommendations[]? | {
      name: .containerName,
      targetCpu: .target.cpu,
      targetMemory: .target.memory
    }]
  }'
⚠️
VPA and HPA conflict on CPU. Do not use both VPA (controlling CPU requests) and HPA (scaling on CPU utilization) on the same Deployment simultaneously — VPA changing requests resets the utilization baseline, causing HPA oscillation. Use VPA for memory only, or HPA on custom metrics (queue depth, RPS) instead of CPU.

Spot / Preemptible Instances

Spot instances (AWS), Spot VMs (GCP, formerly Preemptible VMs), and Spot VMs (Azure) offer 60–90% discounts over on-demand pricing in exchange for short-notice eviction: about 2 minutes on AWS and roughly 30 seconds on GCP and Azure. With proper workload design, most stateless workloads can run on spot.

Workload Suitability for Spot

Workload Type | Spot Suitability | Notes
Stateless web/API servers (≥2 replicas) | ✅ Excellent | Rolling restart handles eviction; keep 1 on-demand replica
Batch/ML training jobs | ✅ Excellent | Checkpoint frequently; retryable; batch-low priority
CI/CD runners (Tekton, GitHub Actions) | ✅ Excellent | Jobs are ephemeral by nature
Dev/staging environments | ✅ Excellent | Brief disruptions acceptable
Stateful services (databases) | ⚠️ Risky | Need at least 1 on-demand replica; external managed DB preferred
Prometheus / Grafana | ⚠️ Caution | Scrape gaps during eviction; use remote_write to Thanos/Mimir
System components (CoreDNS, Kyverno) | ❌ Avoid | Must stay on on-demand/system node pool

Karpenter Spot Configuration

# NodePool with spot preference and on-demand fallback
apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
  name: spot-general
spec:
  template:
    metadata:
      labels:
        node.kubernetes.io/workload: spot
    spec:
      requirements:
      - key: karpenter.sh/capacity-type
        operator: In
        values: ["spot", "on-demand"]   # spot preferred; on-demand fallback
      - key: kubernetes.io/arch
        operator: In
        values: ["amd64","arm64"]
      - key: karpenter.k8s.aws/instance-category
        operator: In
        values: ["m","c","r"]
      - key: karpenter.k8s.aws/instance-generation
        operator: Gt
        values: ["4"]
      - key: karpenter.k8s.aws/instance-size
        operator: NotIn
        values: ["nano","micro","small"]   # too small for bin-packing
      nodeClassRef:
        apiVersion: karpenter.k8s.aws/v1beta1
        kind: EC2NodeClass
        name: general
  disruption:
    consolidationPolicy: WhenUnderutilized   # v1beta1 allows consolidateAfter only with WhenEmpty
    expireAfter: 336h   # rotate nodes every 2 weeks (AMI patching); set under disruption in v1beta1
    budgets:
    - nodes: "20%"   # max 20% of nodes replaced at once
  limits:
    cpu: "100"
    memory: 400Gi

Spot Instance Interruption Handling

# AWS Node Termination Handler: gracefully drain spot nodes
# Karpenter handles this natively via SQS interruption queue
# For managed node groups without Karpenter (Queue Processor mode):
helm repo add eks https://aws.github.io/eks-charts
helm install aws-node-termination-handler eks/aws-node-termination-handler \
  --namespace kube-system \
  --set enableSqsTerminationDraining=true \
  --set queueURL=https://sqs.us-east-1.amazonaws.com/123456789012/spot-interruption
# In IMDS (DaemonSet) mode, omit the two SQS settings and instead set
# enableSpotInterruptionDraining, enableRebalanceMonitoring, and
# enableScheduledEventDraining to true.

Designing Workloads for Spot Resilience

apiVersion: apps/v1
kind: Deployment
metadata:
  name: payments-api
  namespace: payments-api-production
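spec:
  replicas: 4
  selector:
    matchLabels:
      app: payments-api
  strategy:
    rollingUpdate:
      maxUnavailable: 1       # tolerate 1 pod down during eviction
      maxSurge: 1
  template:
    metadata:
      labels:
        app: payments-api     # must match the selector and the spread constraints below
    spec:
      # Schedule onto the spot-general NodePool via its template label;
      # that pool falls back to on-demand when spot capacity is unavailable
      nodeSelector:
        node.kubernetes.io/workload: spot
      # Only needed if the NodePool taints its nodes (the NodePool above defines no taints)
      tolerations:
      - key: "karpenter.sh/capacity-type"
        operator: "Equal"
        value: "spot"
        effect: "NoSchedule"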
      # Spread across AZs AND capacity types to reduce correlated eviction
      topologySpreadConstraints:
      - maxSkew: 1
        topologyKey: topology.kubernetes.io/zone
        whenUnsatisfiable: DoNotSchedule
        labelSelector:
          matchLabels:
            app: payments-api
      - maxSkew: 2
        topologyKey: karpenter.sh/capacity-type
        whenUnsatisfiable: ScheduleAnyway
        labelSelector:
          matchLabels:
            app: payments-api
      # Graceful shutdown — handle SIGTERM before pod is killed
      terminationGracePeriodSeconds: 60
      containers:
      - name: payments-api
        lifecycle:
          preStop:
            exec:
              command: ["/bin/sh","-c","sleep 5"]  # let LB drain connections

Cost-Aware Scheduling

The Kubernetes scheduler places pods based on resource fit, not price. These patterns steer workloads toward cheaper infrastructure without manual intervention.

Node Consolidation with Karpenter

# Karpenter consolidation: pack pods onto fewer nodes, terminate underutilized ones
# Controlled by NodePool disruption.consolidationPolicy: WhenUnderutilized

# Monitor consolidation events (emitted on Node/NodeClaim objects, so search all namespaces)
kubectl get events -A --field-selector reason=Unconsolidatable
kubectl logs -n karpenter -l app.kubernetes.io/name=karpenter -c controller | grep "consolidat"

# List the NodeClaims (and backing nodes) that Karpenter manages
kubectl get nodeclaims -o wide

Descheduler — Rebalance After Spot Evictions

helm repo add descheduler https://kubernetes-sigs.github.io/descheduler/
helm install descheduler descheduler/descheduler \
  --namespace kube-system \
  --set schedule="0 */6 * * *"

# Descheduler policy (ConfigMap)
apiVersion: "descheduler/v1alpha2"
kind: "DeschedulerPolicy"
profiles:
- name: default
  pluginConfig:
  - name: DefaultEvictor
    args:
      ignorePvcPods: true
      evictLocalStoragePods: false
  - name: RemoveDuplicates        # remove duplicate pods on same node
    args: {}
  - name: LowNodeUtilization      # move pods from underutilized nodes
    args:
      thresholds:
        cpu: 20
        memory: 20
        pods: 20
      targetThresholds:
        cpu: 50
        memory: 50
        pods: 50
  - name: RemovePodsViolatingTopologySpreadConstraint  # rebalance after spot loss
    args:
      constraints: ["DoNotSchedule","ScheduleAnyway"]
  plugins:
    balance:
      enabled:
      - RemoveDuplicates
      - LowNodeUtilization
      - RemovePodsViolatingTopologySpreadConstraint

KEDA for Event-Driven Cost Savings

# KEDA ScaledObject: scale to zero when queue is empty
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: payments-worker-scaler
  namespace: payments-api-production
spec:
  scaleTargetRef:
    name: payments-worker
  minReplicaCount: 0    # scale to zero — no cost when idle
  maxReplicaCount: 20
  cooldownPeriod: 300   # seconds before scaling to zero
  pollingInterval: 15
  triggers:
  - type: aws-sqs-queue
    metadata:
      queueURL: https://sqs.us-east-1.amazonaws.com/123456789/payments-queue
      queueLength: "5"   # target messages per pod
      awsRegion: us-east-1
      identityOwner: operator  # use IRSA

Reserved Instances & Savings Plans

Committed-use discounts give 30–60% off in exchange for 1- or 3-year commitments. The right mix of reservations + spot + on-demand minimizes cost while maintaining reliability.

Coverage Strategy

OPTIMAL COVERAGE MIX (example)

100% of baseline (always-on) workload capacity
├── 70% Savings Plans / Reserved Instances   ← commit to baseline
│       (predictable, 1yr no-upfront or 3yr)
├── 20% Spot instances                       ← stateless burst
└── 10% On-demand                            ← system components + overflow

Result: ~45% blended discount vs all on-demand
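
The blended figure is a weighted average of the discount on each slice of capacity. A minimal sketch, assuming a 45% discount for commitments and 70% for spot (both inside the ranges quoted in this guide, but chosen here only for illustration); `blended_discount` is a hypothetical helper.

```python
def blended_discount(mix: dict) -> float:
    """mix maps a purchase option to (share of capacity, discount vs on-demand)."""
    total_share = sum(share for share, _ in mix.values())
    if abs(total_share - 1.0) > 1e-9:
        raise ValueError("capacity shares must sum to 1.0")
    return sum(share * discount for share, discount in mix.values())


mix = {
    "savings_plans": (0.70, 0.45),  # assumed discount, not a quoted price
    "spot": (0.20, 0.70),           # assumed discount, not a quoted price
    "on_demand": (0.10, 0.00),
}
print(f"blended discount: {blended_discount(mix) * 100:.1f}%")
```

Because the committed slice carries 70% of capacity, the commitment discount dominates the result; moving spot from 20% to 30% of capacity changes the blend far less than a few points of Savings Plan discount.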

Identifying Commitment Opportunities

# AWS CLI: see current On-Demand usage vs Savings Plan coverage
aws ce get-savings-plans-coverage \
  --time-period Start=2025-01-01,End=2025-02-01 \
  --granularity MONTHLY \
  --filter '{"Dimensions":{"Key":"REGION","Values":["us-east-1"]}}' \
  --output json

# Get EC2 Reserved Instance purchase recommendations for commitment analysis
aws ce get-reservation-purchase-recommendation \
  --service "Amazon Elastic Compute Cloud - Compute" \
  --lookback-period-in-days SIXTY_DAYS \
  --payment-option NO_UPFRONT \
  --term-in-years ONE_YEAR

# Kubecost request right-sizing recommendations (check your Kubecost version's API docs for the exact path)
curl http://kubecost.kubecost.svc:9090/savings/requestSizing | jq .

Compute Savings Plans vs EC2 Instance Savings Plans

Type | Applies To | Flexibility | Discount | Best For
Compute Savings Plans | EC2, Fargate, Lambda | Any instance family, size, region, OS | Up to 66% | Kubernetes workloads (Karpenter changes instance types)
EC2 Instance Savings Plans | EC2 only | Any size within committed family+region | Up to 72% | Fixed instance families (e.g., always m5)
Reserved Instances (Standard) | EC2 only | Exact instance type+AZ or region | Up to 72% | Stable, predictable fixed workloads
GCP Committed Use Discounts | vCPU + RAM commitments | Any machine type within commitment | Up to 57% | GKE workloads
Azure Reserved VM Instances | Specific VM family | Size flexibility within family | Up to 72% | AKS fixed node pools
ℹ️
Compute Savings Plans are ideal for Karpenter clusters because Karpenter selects instance types dynamically. Committing to a dollar-per-hour amount of compute (not a specific instance type) means your reservation applies regardless of which instance Karpenter provisions. Commit to ~70% of your average hourly spend.
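
How much of a commitment goes unused depends on how often hourly spend dips below it. The sketch below is a simplification: it sizes the commitment as a fraction of average on-demand-equivalent spend and ignores that a real Savings Plan is priced at the discounted rate. The `commitment_outcome` helper and the hourly figures are hypothetical.

```python
def commitment_outcome(hourly_spend: list, commit_fraction: float = 0.70) -> tuple:
    """Return (hourly commitment, fraction of the commitment actually used)."""
    if not hourly_spend:
        raise ValueError("need at least one hourly spend sample")
    average = sum(hourly_spend) / len(hourly_spend)
    commitment = average * commit_fraction
    # Any hour whose spend is below the commitment leaves part of it unused.
    unused = sum(max(0.0, commitment - spend) for spend in hourly_spend)
    utilization = 1.0 - unused / (commitment * len(hourly_spend))
    return commitment, utilization


# Hypothetical hourly compute spend in dollars across six sampled hours.
hours = [10.0, 12.0, 8.0, 6.0, 14.0, 10.0]
commitment, utilization = commitment_outcome(hours)
print(f"commit ${commitment:.2f}/hour, utilization {utilization:.1%}")
```

Raising `commit_fraction` toward 1.0 on the same data lowers utilization, which is the over-commitment risk that the ~70% guideline is meant to limit.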

Cluster Efficiency Metrics

Track these KPIs on a weekly cadence. Healthy clusters sit above 60% CPU utilization and 70% memory utilization against allocated resources.

Target CPU Utilization: >60%
Target RAM Utilization: >70%
Max Idle Node %: <30%
Spot Coverage Target: 70%+

Key Efficiency PromQL Queries

# Cluster-wide CPU utilization (actual / requested)
sum(rate(container_cpu_usage_seconds_total{container!="",container!="POD"}[5m]))
/
sum(kube_pod_container_resource_requests{resource="cpu",container!=""})

# Cluster-wide memory efficiency
sum(container_memory_working_set_bytes{container!="",container!="POD"})
/
sum(kube_pod_container_resource_requests{resource="memory",container!=""})

# Node allocatable utilization (requested / allocatable)
sum(kube_pod_container_resource_requests{resource="cpu"})
/
sum(kube_node_status_allocatable{resource="cpu"})

# Idle nodes (Ready nodes running no non-DaemonSet pods)
count(kube_node_status_condition{condition="Ready",status="true"} == 1)
-
count(
  count by (node) (kube_pod_info{node!="",created_by_kind!="DaemonSet"})
)

# Cost per request in dollars (requires a request-rate metric from APM + node cost from OpenCost)
sum(node_total_hourly_cost)
/
(sum(rate(http_requests_total[5m])) * 3600)

# Namespace waste: allocated but unused CPU
sum by (namespace) (
  kube_pod_container_resource_requests{resource="cpu"}
) - sum by (namespace) (
  rate(container_cpu_usage_seconds_total{container!="",container!="POD"}[5m])
)
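
Cost per request is hourly cost divided by the number of requests served in that hour, so a per-second request rate has to be scaled by 3600 first. A minimal sketch with made-up numbers; `cost_per_million_requests` is a hypothetical helper, and it reports per million requests because the per-request figure is usually a tiny fraction of a cent.

```python
def cost_per_million_requests(cluster_hourly_cost: float, requests_per_second: float) -> float:
    """Dollars spent per one million requests at the given cost and request rate."""
    if requests_per_second <= 0:
        raise ValueError("requests_per_second must be positive")
    requests_per_hour = requests_per_second * 3600
    return cluster_hourly_cost / requests_per_hour * 1_000_000


# Hypothetical cluster: $36/hour of node cost serving 500 requests per second.
unit_cost = cost_per_million_requests(cluster_hourly_cost=36.0, requests_per_second=500.0)
print(f"${unit_cost:.2f} per million requests")
```

Tracking this number over time separates growth (cost and traffic rise together) from waste (cost rises while traffic is flat).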

Cost Efficiency Grafana Dashboard Panels

Panel | Query Type | Threshold
Monthly cloud spend trend | OpenCost API + Prometheus | Alert if +20% MoM
Top 10 most expensive namespaces | OpenCost allocation API (bar chart) | Review top 3 for right-sizing
CPU efficiency by namespace | PromQL ratio (actual/request) | Red: <30%, Yellow: 30–60%, Green: >60%
Memory efficiency by namespace | PromQL ratio | Red: <40%, Yellow: 40–70%, Green: >70%
Spot coverage % | kube_node_labels{label_karpenter_sh_capacity_type="spot"} | Target: >60%
Idle node count | PromQL (nodes with no non-daemonset pods) | Alert if >3 nodes idle >30min
Unattached PVs | kube_persistentvolume_status_phase{phase="Released"} | Alert on any Released PVs
Orphaned load balancers | kube_service_spec_type{type="LoadBalancer"} with no Endpoints | Alert immediately

FinOps Practices

FinOps is a cultural practice that brings financial accountability to cloud spend. The FinOps lifecycle has three phases: Inform, Optimize, Operate.

FinOps Maturity for Kubernetes

Phase | Crawl | Walk | Run
Inform | Cloud bill visible to central team | Per-namespace cost with showback | Real-time cost per team/service/feature; unit economics (cost per request)
Optimize | Tag cloud resources; eliminate obvious waste (unused LBs, idle nodes) | Right-sizing recommendations acted on quarterly; spot for dev | Automated right-sizing (VPA); spot for >60% workloads; KEDA scale-to-zero; savings plan coverage >70%
Operate | Monthly cost reviews | Cost budgets per team; chargeback model defined | Cost alerts trigger team tickets; anomaly detection; cost encoded in architectural decisions

Weekly FinOps Review Checklist

#!/bin/bash
# weekly-finops-review.sh

echo "=== TOP 10 NAMESPACES BY COST (last 7 days) ==="
# OpenCost: accumulated 7-day cost per namespace; compare with last week's output to spot anomalies
curl -s "http://opencost.opencost.svc:9003/allocation/compute?window=7d&aggregate=namespace&accumulate=true" | \
  jq -r '.data[0] | to_entries | sort_by(-.value.totalCost) | .[0:10] |
  .[] | "\(.key): $\(.value.totalCost | . * 100 | round / 100)"'

echo "=== IDLE NODES (no running non-DaemonSet pods) ==="
# Collect nodes that host at least one running pod not owned by a DaemonSet
busy_nodes=$(kubectl get pods -A -o json | jq -r '
  .items[] | select(.status.phase == "Running" and .spec.nodeName != null) |
  select(any(.metadata.ownerReferences[]?; .kind == "DaemonSet") | not) |
  .spec.nodeName' | sort -u)
# Print every node that is not in the busy list
kubectl get nodes -o name | sed 's|^node/||' | grep -vxF -f <(echo "$busy_nodes")

echo "=== UNATTACHED PVs ==="
kubectl get pv -o json | jq '.items[] | select(.status.phase == "Released") |
  {name: .metadata.name, capacity: .spec.capacity.storage, storageClass: .spec.storageClassName}'

echo "=== VPA RECOMMENDATIONS (first container target) ==="
kubectl get vpa -A -o json | jq '
  .items[] | {
    ns: .metadata.namespace, name: .metadata.name,
    rec: .status.recommendation.containerRecommendations[0].target
  }'

echo "=== LOADBALANCER SERVICES WITHOUT READY ENDPOINTS ==="
# List LoadBalancer Services, then report those whose Endpoints have no ready addresses
kubectl get svc -A -o json | jq -r '
  .items[] | select(.spec.type == "LoadBalancer") |
  "\(.metadata.namespace) \(.metadata.name)"' |
while read -r ns name; do
  ready=$(kubectl get endpoints "$name" -n "$ns" -o json 2>/dev/null |
    jq '[.subsets[]?.addresses[]?] | length')
  if [ "${ready:-0}" -eq 0 ]; then echo "$ns/$name"; fi
done

Cost Alerts & Budgets

PrometheusRule: Cost Anomaly Alerts

apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: cost-management-alerts
  namespace: monitoring
  labels:
    prometheus: kube-prometheus
    role: alert-rules
spec:
  groups:
  - name: cost.efficiency
    rules:

    # Namespace CPU efficiency < 20% for 1 hour (severe waste)
    - alert: NamespaceLowCPUEfficiency
      expr: |
        (
          sum by (namespace) (
            rate(container_cpu_usage_seconds_total{container!="",container!="POD"}[30m])
          )
          /
          sum by (namespace) (
            kube_pod_container_resource_requests{resource="cpu",container!=""}
          )
        ) < 0.20
      for: 1h
      labels:
        severity: warning
      annotations:
        summary: "Low CPU efficiency in namespace {{ $labels.namespace }}"
        description: "CPU efficiency is {{ $value | humanizePercentage }}. Run VPA analysis."

    # Unattached (Released) PersistentVolumes — pay for storage doing nothing
    - alert: OrphanedPersistentVolume
      expr: |
        kube_persistentvolume_status_phase{phase="Released"} > 0
      for: 30m
      labels:
        severity: warning
      annotations:
        summary: "Orphaned PersistentVolume: {{ $labels.persistentvolume }}"
        description: "PV is in Released state and still incurring storage costs. Delete or recycle."

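    # Idle LoadBalancer service (no endpoints for 30 minutes)
    # The endpoint label is renamed to service so the two metrics can be matched
    - alert: IdleLoadBalancer
      expr: |
        kube_service_spec_type{type="LoadBalancer"}
        unless on (namespace, service)
        label_replace(kube_endpoint_address_available > 0, "service", "$1", "endpoint", "(.+)")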
      for: 30m
      labels:
        severity: warning
      annotations:
        summary: "Idle LoadBalancer {{ $labels.namespace }}/{{ $labels.service }}"
        description: "LoadBalancer has no healthy endpoints. Still incurring hourly LB cost."

    # Node underutilization: paying for idle nodes
    # node_exporter series carry an instance label (no node label unless you relabel)
    - alert: NodeLowCPUUtilization
      expr: |
        (1 - avg by (instance) (
          rate(node_cpu_seconds_total{mode="idle"}[10m])
        )) < 0.10
      for: 2h
      labels:
        severity: info
      annotations:
        summary: "Node {{ $labels.instance }} CPU < 10% for 2 hours"
        description: "Consider Karpenter consolidation or draining this node."

    # Namespace exceeding 90% of CPU quota (may need quota increase or right-sizing)
    - alert: NamespaceCPURequestsNearQuota
      expr: |
        (
          kube_resourcequota{type="used",resource="requests.cpu"}
          / ignoring (type)
          kube_resourcequota{type="hard",resource="requests.cpu"}
        ) > 0.90
      for: 15m
      labels:
        severity: warning
      annotations:
        summary: "{{ $labels.namespace }} CPU requests at {{ $value | humanizePercentage }} of quota"

Kubecost Budget API

# Create a budget via the Kubecost API (requires Kubecost Enterprise for enforcement)
curl -X POST http://kubecost.kubecost.svc:9090/budgets \
  -H "Content-Type: application/json" \
  -d '{
    "name": "payments-team-monthly",
    "window": "month",
    "amount": 5000,
    "filters": [{"property": "namespace", "value": "payments-api-production"}],
    "actions": [{"threshold": 0.80, "type": "slack",
                 "target": "https://hooks.slack.com/..."}]
  }'

Best Practices

Right-Size Before Scaling Out

A 5× over-provisioned pod running as 10 replicas reserves as much capacity as 50 right-sized pods. Always run VPA in Off mode on new services for 2 weeks to collect recommendations, then apply targets before considering horizontal scale-out.

Spot for All Stateless Workloads

Design stateless services to tolerate 2-minute eviction from day one — terminationGracePeriodSeconds: 60, preStop sleep 5, ≥2 replicas, topology spread across AZs. Spot can cut compute bills by 60–80%.

Enforce Resource Requests via Policy

Workloads without resource requests cannot be bin-packed by the scheduler. Use the Kyverno require-requests-limits policy from 05-policy-enforcement.html — this is a prerequisite for all cost optimization.

Label Everything for Attribution

Cost allocation is only as good as your label taxonomy. Enforce team, env, cost-center labels on all namespaces and workloads via Kyverno. Without labels, a large share of spend (often 30–40%) ends up "unallocated" and cannot be attributed to an owner who could optimize it.

Scale to Zero with KEDA

Non-production environments and event-driven workers that sit idle between jobs are zero-cost candidates. KEDA ScaledObjects with minReplicaCount: 0 eliminate idle compute entirely when no work is queued.

Weekly Cost Reviews

Cost waste accumulates silently. Schedule a 30-minute weekly review using the OpenCost dashboard: check top-10 namespaces for efficiency regression, review VPA recommendations, verify no orphaned PVs or idle LBs accumulated.

Commit After Observing

Buy Reserved Instances/Savings Plans only after 3+ months of stable production usage data. Commit to ~70% of your average baseline, leave 30% for on-demand. Over-committing to the wrong instance types locks in waste.

Watch Network Costs

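Cross-AZ traffic (a pod in AZ-a calling a pod in AZ-b) costs about $0.01/GB in each direction on AWS and adds up fast at scale. Use topology-aware routing (the service.kubernetes.io/topology-mode: Auto annotation on a Service, spec.trafficDistribution: PreferClose on newer Kubernetes releases, or Cilium's topology-aware hints support) to prefer same-AZ communication. The older topologyKeys Service field was removed in Kubernetes 1.22.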
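
A quick estimate shows how fast this grows. The sketch below assumes the AWS model, where the per-GB rate is charged on both the sending and the receiving side; `cross_az_monthly_cost`, the traffic volume, and the 30-day month are illustrative assumptions, so substitute your provider's actual rates.

```python
def cross_az_monthly_cost(
    gb_per_day: float,
    price_per_gb: float = 0.01,
    billed_directions: int = 2,
    days: int = 30,
) -> float:
    """Estimated monthly dollars for traffic that crosses availability zones."""
    return gb_per_day * price_per_gb * billed_directions * days


# Hypothetical service mesh pushing 5 TB/day between zones.
monthly = cross_az_monthly_cost(gb_per_day=5000.0)
print(f"${monthly:,.2f} per month")
```

If topology-aware routing keeps most calls inside a zone, `gb_per_day` drops proportionally and so does the bill.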

Coverage: 07 · Cost Management