💰 Cost Management

Kubernetes Cost Management

Complete guide to understanding, attributing, and optimizing cloud-native infrastructure spend — from resource right-sizing and spot instances to chargeback models, OpenCost, Kubecost, and FinOps practices for Kubernetes platforms.

📊 OpenCost / Kubecost ⚡ VPA / Right-sizing 🎯 Spot Instances 🏷️ Cost Allocation 📉 Savings Plans

Kubernetes Cost Model
OpenCost & Kubecost
Cost Attribution & Chargeback
Resource Right-Sizing
Vertical Pod Autoscaler
Spot / Preemptible Instances
Cost-Aware Scheduling
Reserved Instances & Savings Plans
Cluster Efficiency Metrics
FinOps Practices
Cost Alerts & Budgets
Best Practices

Kubernetes Cost Model

Cloud infrastructure costs in Kubernetes come from multiple sources that need to be decomposed and allocated to tenants. Understanding the model is prerequisite to optimization.

KUBERNETES CLUSTER COST COMPONENTS

Cloud Provider Bill
  Compute (60-80% of bill)
  ├── Node instance hours (EC2 / GCE / Azure VM)
  │   ├── On-demand
  │   ├── Spot/Preemptible (60-90% cheaper)
  │   └── Reserved/Savings Plans (30-60% cheaper)
  └── Fargate/Autopilot pod-hours (serverless nodes)

  Storage (5-15%)
  ├── EBS/PD/managed disk (PersistentVolumes)
  ├── S3/GCS object storage (logs, backups, TechDocs)
  └── EFS/Filestore (shared ReadWriteMany PVCs)

  Network (5-20%, often invisible until it's huge)
  ├── Data transfer out to internet
  ├── Cross-AZ traffic (same region, different AZ)
  ├── Cross-region replication
  └── NAT Gateway (pods → internet)

  Managed Services (variable)
  ├── RDS / Cloud SQL (per-service databases)
  ├── ElastiCache / Memorystore
  └── Load Balancers (one per Service type=LoadBalancer)
        │
        ▼
Allocated to: Namespace → Team → Service → Feature → Environment

Cost vs Efficiency vs Waste

Term	Definition	Measured By
Cost	Dollars spent on infrastructure	Cloud bill + cost allocation tools
Efficiency	Workload output per dollar (requests/s per $, SLO met per $)	Business metrics / cost per request
Waste	Resources paid for but not used (idle CPU/RAM, orphaned PVs, zombie load balancers)	requests.cpu vs actual usage; PV Bound but no pods
Right-sizing gap	Difference between resource requests and actual usage	VPA recommendations, Goldilocks
Utilization	Actual usage / requested resources	container_cpu_usage_seconds_total and container_memory_working_set_bytes vs kube_pod_container_resource_requests

⚠️

Cloud providers bill by node, not by pod. If your nodes run at 15% CPU utilization, roughly 85% of the compute spend buys idle capacity, even if every pod has resource requests set. The levers are bin-packing (Karpenter consolidation) and right-sizing resource requests, not adding more nodes.
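The node-billing point can be made concrete with a little arithmetic. The sketch below is illustrative only: the node price, node count, and the 730-hour month are assumed values, and `idle_spend` is a hypothetical helper rather than part of any tool mentioned here.

```python
def idle_spend(node_hourly_cost: float, node_count: int,
               utilization: float, hours: float = 730.0) -> float:
    """Return the dollars per month paid for node capacity that sits unused.

    utilization is the fraction of node capacity actually used (0.0 to 1.0).
    """
    if not 0.0 <= utilization <= 1.0:
        raise ValueError("utilization must be between 0 and 1")
    total = node_hourly_cost * node_count * hours
    return total * (1.0 - utilization)


# Assumed example cluster: 20 nodes at $0.20/hour running at 15% CPU utilization.
monthly_total = 0.20 * 20 * 730.0
monthly_idle = idle_spend(0.20, 20, 0.15)
print(f"total ${monthly_total:.2f}/month, idle ${monthly_idle:.2f}/month")
```

Under these assumptions, the idle share of the bill is the same 85% regardless of how many pods declare requests, which is why consolidation and right-sizing matter more than pod-level hygiene alone.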

OpenCost & Kubecost

OpenCost is the CNCF-incubating open standard and open-source implementation for Kubernetes cost monitoring. Kubecost builds a commercial layer on top with additional features like savings recommendations, cluster right-sizing, and multi-cloud support.

OpenCost Architecture

Cloud Provider APIs          Kubernetes API
(billing/pricing data)       (pod/node/PV metrics)
          │                         │
          └────────────┬────────────┘
                       ▼
                OpenCost Server
                ├── Cost model (CPU/RAM/GPU/PV/network)
                ├── Allocation engine (namespace/label/pod)
                └── /allocation, /assets REST API
                       │
          ┌────────────┼────────────┐
          ▼            ▼            ▼
     Prometheus     Grafana      Kubecost UI
     (metrics       (dashboards  (recommendations
      storage)      /alerts)     /savings/budgets)

Install OpenCost

helm repo add opencost https://opencost.github.io/opencost-helm-chart
helm repo update

helm install opencost opencost/opencost \
  --namespace opencost \
  --create-namespace \
  --version 1.42.0 \
  --set opencost.exporter.defaultClusterId=prod-us-east-1 \
  --set opencost.prometheus.internal.enabled=true \
  --set opencost.ui.enabled=true \
  --set opencost.cloudCost.enabled=true

# For AWS: provide cloud integration credentials
kubectl create secret generic cloud-integration \
  --namespace opencost \
  --from-literal=cloud-integration.json='{
    "aws": [{
      "athenaBucketName": "s3://my-cur-bucket",
      "athenaRegion": "us-east-1",
      "athenaDatabase": "athenacurcfn_my_database",
      "athenaTable": "my_cur_table",
      "projectID": "123456789012",
      "serviceKeyName": "opencost-aws-access"
    }]
  }'

OpenCost API Queries

# Cost by namespace for last 7 days
curl "http://opencost.opencost.svc:9003/allocation/compute?window=7d&aggregate=namespace&accumulate=true" | jq .

# Cost by label (e.g., by team)
curl "http://opencost.opencost.svc:9003/allocation/compute?window=30d&aggregate=label:team&accumulate=true" | jq .

# Cost breakdown by pod for a specific namespace (the filter expression must be URL-encoded)
curl -G "http://opencost.opencost.svc:9003/allocation/compute" \
  -d window=1d -d aggregate=pod \
  --data-urlencode 'filter=namespace:"payments-api-production"' | jq .

# Asset costs (nodes, PVs, load balancers)
curl "http://opencost.opencost.svc:9003/assets?window=7d&aggregate=type" | jq .

# Efficiency: CPU/RAM request utilization
curl "http://opencost.opencost.svc:9003/allocation/compute?window=1d&aggregate=namespace" | \
  jq '.data[0] | to_entries[] | {
    namespace: .key,
    cpuEfficiency: .value.cpuEfficiency,
    ramEfficiency: .value.ramEfficiency,
    totalCost: .value.totalCost
  }'
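The same efficiency filter can be applied in a script once the JSON has been fetched. The following is a minimal sketch, assuming the response shape used by the jq query above (a `data` list whose entries map an aggregate name to an object with `cpuEfficiency` and `totalCost`); the sample payload and the `low_efficiency` helper are hypothetical and no network call is made.

```python
from typing import Any

# Hypothetical sample shaped like an allocation response aggregated by namespace.
sample_response: dict[str, Any] = {
    "code": 200,
    "data": [{
        "payments-api-production": {"cpuEfficiency": 0.12, "ramEfficiency": 0.35, "totalCost": 412.50},
        "search-production": {"cpuEfficiency": 0.64, "ramEfficiency": 0.71, "totalCost": 198.20},
        "ci-runners": {"cpuEfficiency": 0.28, "ramEfficiency": 0.55, "totalCost": 87.00},
    }],
}


def low_efficiency(response: dict[str, Any], threshold: float = 0.30) -> list[tuple[str, float, float]]:
    """Return (name, cpuEfficiency, totalCost) for entries below the CPU
    efficiency threshold, most expensive first."""
    rows = []
    for window in response.get("data", []):
        for name, alloc in (window or {}).items():
            cpu = alloc.get("cpuEfficiency", 0.0)
            if cpu < threshold:
                rows.append((name, cpu, alloc.get("totalCost", 0.0)))
    return sorted(rows, key=lambda row: row[2], reverse=True)


for name, cpu, cost in low_efficiency(sample_response):
    print(f"{name}: cpuEfficiency={cpu:.2f} totalCost=${cost:.2f}")
```

Sorting by cost rather than by efficiency puts the namespaces where right-sizing would save the most money at the top of the list.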

Kubecost Install (with savings features)

helm repo add kubecost https://kubecost.github.io/cost-analyzer/
helm repo update

helm install kubecost kubecost/cost-analyzer \
  --namespace kubecost \
  --create-namespace \
  --version 2.3.4 \
  --set kubecostToken="your-token" \
  --set global.prometheus.enabled=false \
  --set global.prometheus.fqdn=http://kube-prometheus-stack-prometheus.monitoring:9090 \
  --set kubecostProductConfigs.clusterName=prod-us-east-1 \
  --set kubecostProductConfigs.currencyCode=USD \
  --set savings.enabled=true \
  --set networkCosts.enabled=true

OpenCost Prometheus Metrics

# Key OpenCost metrics exposed for Prometheus to scrape
kubecost_load_balancer_cost          # per LB hourly cost
node_total_hourly_cost               # total hourly cost per node (from cloud pricing)
node_cpu_hourly_cost                 # hourly cost per vCPU on the node
node_ram_hourly_cost                 # hourly cost per GiB of RAM on the node
pv_hourly_cost                       # hourly cost per GiB of PersistentVolume
container_cpu_allocation             # CPU cores allocated to the container
container_memory_allocation_bytes    # RAM bytes allocated to the container

# Efficiency metrics (also via kube-state-metrics + cAdvisor)
container_cpu_usage_seconds_total    # actual CPU use
container_memory_working_set_bytes   # actual memory use
kube_pod_container_resource_requests # requested resources

Cost Attribution & Chargeback

Cost attribution maps cloud spend back to business units, teams, services, or features. This is the foundation of FinOps — you cannot optimize what you cannot measure and attribute.

Tagging Strategy

# Required labels on all K8s workloads (enforced by Kyverno — see 05-policy-enforcement.html)
# These labels flow into cost allocation tools
metadata:
  labels:
    team: payments
    env: production
    cost-center: CC-4892
    service: payments-api
    product: checkout-flow    # maps to P&L line
    component: backend        # frontend | backend | worker | batch

AWS Resource Tagging (Karpenter nodes)

# EC2NodeClass: apply static AWS tags to every EC2 instance Karpenter launches
apiVersion: karpenter.k8s.aws/v1beta1
kind: EC2NodeClass
metadata:
  name: general
spec:
  amiFamily: AL2
  role: karpenter-node-role
  subnetSelectorTerms:
  - tags:
      karpenter.sh/discovery: prod-us-east-1
  securityGroupSelectorTerms:
  - tags:
      karpenter.sh/discovery: prod-us-east-1
  tags:
    # Static tags on every node
    Cluster: prod-us-east-1
    ManagedBy: karpenter
    Environment: production
  # These tags are static per EC2NodeClass; pod labels are not copied onto EC2 tags.
  # For per-team node tags, use a dedicated NodePool + EC2NodeClass per team, or rely on OpenCost label allocation.

OpenCost Allocation by Label

# Monthly cost per team
curl "http://opencost.opencost.svc:9003/allocation/compute?window=month&aggregate=label:team&accumulate=true" | \
  jq '.data[0] | to_entries | sort_by(-.value.totalCost) |
  .[] | "\(.key): $\(.value.totalCost | . * 100 | round / 100)"'

# Cost split by environment
curl "http://opencost.opencost.svc:9003/allocation/compute?window=30d&aggregate=label:env&accumulate=true" | jq .

# Multi-level: team + service (groupBy in Kubecost UI)
curl "http://opencost.opencost.svc:9003/allocation/compute?window=7d&aggregate=label:team,label:service" | jq .

Chargeback vs Showback

Showback

Teams see their costs but are not financially charged. Builds awareness and accountability without creating accounting complexity. Good starting point. Teams respond to social pressure but have no financial incentive to optimize.

Chargeback

Teams are actually billed — deducted from their budget. Strongest incentive to right-size and eliminate waste. Requires accurate allocation data and agreed-upon shared cost methodology (how do you split shared infra like monitoring?).

Shared Cost Allocation Strategies

Strategy	How It Works	Fair For	Avoid When
Even split	Divide shared infra cost equally among all tenants	Small teams, similar sizes	One large + many small tenants
Proportional	Each tenant pays share proportional to their workload spend	Most platforms (default in Kubecost)	Teams have very different usage patterns
Weighted by usage	Share monitoring/logging costs by metric cardinality or log volume emitted	Observability cost attribution	Hard to instrument accurately
Fixed overhead rate	Flat per-namespace fee covers shared platform costs	Simplicity, SaaS model	Large size variance across tenants
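The difference between the even and proportional strategies is easiest to see with numbers. This is a small sketch with made-up tenant spend figures; `even_split` and `proportional_split` are hypothetical helpers, not functions from OpenCost or Kubecost.

```python
def even_split(shared_cost: float, tenants: list[str]) -> dict[str, float]:
    """Divide shared cost equally among tenants."""
    share = shared_cost / len(tenants)
    return {tenant: share for tenant in tenants}


def proportional_split(shared_cost: float, direct_spend: dict[str, float]) -> dict[str, float]:
    """Divide shared cost in proportion to each tenant's own workload spend.

    Falls back to an even split when nobody has any direct spend.
    """
    total = sum(direct_spend.values())
    if total == 0:
        return even_split(shared_cost, list(direct_spend))
    return {tenant: shared_cost * spend / total for tenant, spend in direct_spend.items()}


# Assumed monthly figures: one large tenant and two smaller ones, $2,000 of shared infra.
direct = {"payments": 6000.0, "search": 3000.0, "ml": 1000.0}
print(even_split(2000.0, list(direct)))
print(proportional_split(2000.0, direct))
```

With one large and several small tenants, the even split charges the smallest team the same as the largest, which is the situation the "Avoid When" column warns about.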

Resource Right-Sizing

Over-provisioned resource requests are the most common source of Kubernetes cost waste. Teams set requests high "to be safe" and never revisit them. Right-sizing closes the gap between requested and actual resources.

Identifying Over-Provisioned Workloads

# List pods by current CPU usage (namespace, pod, CPU); compare against requests to spot over-provisioning
kubectl top pods -A --sort-by=cpu | awk 'NR>1 {print $1, $2, $3}'

# PromQL: CPU efficiency per pod (lower = more waste)
# Ratio < 0.3 means the pod uses less than 30% of its requested CPU
sum by (namespace, pod) (
  rate(container_cpu_usage_seconds_total{container!="",container!="POD"}[5m])
)
/
sum by (namespace, pod) (
  kube_pod_container_resource_requests{resource="cpu",container!=""}
)

# Memory efficiency (working set vs request)
sum by (namespace, pod) (
  container_memory_working_set_bytes{container!="",container!="POD"}
)
/
sum by (namespace, pod) (
  kube_pod_container_resource_requests{resource="memory",container!=""}
)

Goldilocks — VPA Recommendation UI

# Goldilocks runs VPA in recommendation-only mode and surfaces results in a UI
helm repo add fairwinds-stable https://charts.fairwinds.com/stable
helm repo update

helm install goldilocks fairwinds-stable/goldilocks \
  --namespace goldilocks \
  --create-namespace \
  --set vpa.enabled=true \
  --set dashboard.enabled=true

# Label namespaces to enable Goldilocks analysis
kubectl label ns payments-api-production goldilocks.fairwinds.com/enabled=true

# Goldilocks dashboard shows VPA recommendations per container
# with QoS class (Guaranteed/Burstable) and cost impact estimates

Typical Right-Sizing Findings

Pattern	Example	Correction	Typical Savings
CPU request >> actual usage	requests.cpu: 1000m, actual: 80m	Set requests.cpu: 200m (2.5× observed usage, leaving headroom for peaks)	5× reduction in CPU allocation cost
Memory request >> working set	requests.memory: 2Gi, actual: 200Mi	Set requests.memory: 300Mi	~85% memory cost reduction
Limits >> requests (burstable)	limit: 4Gi, request: 128Mi	Tighten limit to 2×–4× request	Prevents memory bombs, better bin-packing
No limits set	Pod can use unlimited CPU/RAM	Set limits via LimitRange defaults or explicit spec	Node stability; predictable cost
Batch jobs with prod-class requests	CronJob with requests.cpu: 500m	Use batch-low PriorityClass, reduce requests	10–30% on batch-heavy workloads

Vertical Pod Autoscaler (VPA)

VPA automatically adjusts CPU and memory requests based on observed usage history. It has four update modes; choose carefully, because Recreate and Auto evict pods to apply new requests, which has availability implications.

Install VPA

git clone https://github.com/kubernetes/autoscaler.git
cd autoscaler/vertical-pod-autoscaler
./hack/vpa-up.sh

# Or via Helm
helm repo add cowboysysop https://cowboysysop.github.io/charts/
helm install vpa cowboysysop/vertical-pod-autoscaler \
  --namespace kube-system

VPA Modes

Mode	Behavior	Use When	Risk
`Off`	Only compute recommendations; no changes	Audit & right-sizing review with Goldilocks	None — read-only
`Initial`	Set requests on new pods only (at creation)	Gradual adoption; existing pods unchanged	Low — no evictions
`Recreate`	Evict & recreate pods to apply updated requests	Stateless pods, non-critical workloads	Pod restarts; availability impact
`Auto`	Recreate now; in-place update when K8s supports it (alpha)	Future default when in-place resize is GA	Pod restarts currently

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: payments-api-vpa
  namespace: payments-api-production
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: payments-api
  updatePolicy:
    updateMode: "Off"    # Start with Off; observe recommendations first
  resourcePolicy:
    containerPolicies:
    - containerName: payments-api
      minAllowed:
        cpu: "50m"
        memory: 64Mi
      maxAllowed:
        cpu: "2"          # cap to prevent runaway recommendations
        memory: 2Gi
      controlledResources: ["cpu","memory"]
      controlledValues: RequestsAndLimits

Reading VPA Recommendations

# View recommendations
kubectl describe vpa payments-api-vpa -n payments-api-production

# Output section:
#   Recommendation:
#     Container Recommendations:
#       Container Name: payments-api
#       Lower Bound:
#         Cpu:    50m
#         Memory: 123Mi
#       Target:             ← use this for your resource requests
#         Cpu:    210m
#         Memory: 340Mi
#       Uncapped Target:    ← Target before min/maxAllowed is applied; identical here because Target is within bounds
#         Cpu:    210m
#         Memory: 340Mi
#       Upper Bound:        ← requests above this are considered over-provisioned
#         Cpu:    1200m
#         Memory: 2Gi

# Bulk export recommendations for all VPAs
# (the "// []" guard skips VPAs that have not produced a recommendation yet)
kubectl get vpa -A -o json | jq '
  .items[] | {
    namespace: .metadata.namespace,
    name: .metadata.name,
    containers: [(.status.recommendation.containerRecommendations // [])[] | {
      name: .containerName,
      targetCpu: .target.cpu,
      targetMemory: .target.memory
    }]
  }'

⚠️

VPA and HPA conflict on CPU. Do not use both VPA (controlling CPU requests) and HPA (scaling on CPU utilization) on the same Deployment simultaneously — VPA changing requests resets the utilization baseline, causing HPA oscillation. Use VPA for memory only, or HPA on custom metrics (queue depth, RPS) instead of CPU.
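The oscillation can be reproduced on paper. HPA computes CPU utilization as usage divided by request and then scales by the ratio of current to target utilization, so a VPA-driven change to the request moves the HPA's answer even when load is flat. The sketch below uses a simplified form of that calculation (it ignores the HPA tolerance band and stabilization windows); the pod numbers are assumed.

```python
import math


def hpa_desired_replicas(current_replicas: int, usage_millicores: int,
                         request_millicores: int, target_utilization_pct: int) -> int:
    """Simplified HPA rule: ceil(currentReplicas * currentUtilization / targetUtilization).

    Utilization is per-pod CPU usage divided by the per-pod CPU request.
    Integer inputs keep the division exact for the example values.
    """
    numerator = current_replicas * usage_millicores * 100
    denominator = request_millicores * target_utilization_pct
    return math.ceil(numerator / denominator)


# Steady state: 4 pods, each using 400m of a 500m request, HPA target 80%.
before = hpa_desired_replicas(4, 400, 500, 80)

# VPA raises the request to 800m. Load is unchanged, but utilization now reads 50%.
after = hpa_desired_replicas(4, 400, 800, 80)

print(f"replicas before VPA change: {before}, after: {after}")
```

Once HPA removes a replica, per-pod usage rises, VPA sees higher usage and adjusts requests again, and the two controllers keep chasing each other.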

Spot / Preemptible Instances

Spot Instances (AWS), Spot VMs (GCP, formerly Preemptible VMs), and Spot VMs (Azure) offer 60–90% discounts over on-demand pricing in exchange for a short eviction notice: 2 minutes on AWS, roughly 30 seconds on GCP and Azure. With proper workload design, most stateless workloads can run on spot.

Workload Suitability for Spot

Workload Type	Spot Suitability	Notes
Stateless web/API servers (≥2 replicas)	✅ Excellent	Rolling restart handles eviction; keep 1 on-demand replica
Batch/ML training jobs	✅ Excellent	Checkpoint frequently; retryable; batch-low priority
CI/CD runners (Tekton, GitHub Actions)	✅ Excellent	Jobs are ephemeral by nature
Dev/staging environments	✅ Excellent	Brief disruptions acceptable
Stateful services (databases)	⚠️ Risky	Need at least 1 on-demand replica; external managed DB preferred
Prometheus / Grafana	⚠️ Caution	Scrape gaps during eviction; use remote_write to Thanos/Mimir
System components (CoreDNS, Kyverno)	❌ Avoid	Must stay on on-demand/system node pool

Karpenter Spot Configuration

# NodePool with spot preference and on-demand fallback
apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
  name: spot-general
spec:
  template:
    metadata:
      labels:
        node.kubernetes.io/workload: spot
    spec:
      requirements:
      - key: karpenter.sh/capacity-type
        operator: In
        values: ["spot", "on-demand"]   # spot preferred; on-demand fallback
      - key: kubernetes.io/arch
        operator: In
        values: ["amd64","arm64"]
      - key: karpenter.k8s.aws/instance-category
        operator: In
        values: ["m","c","r"]
      - key: karpenter.k8s.aws/instance-generation
        operator: Gt
        values: ["4"]
      - key: karpenter.k8s.aws/instance-size
        operator: NotIn
        values: ["nano","micro","small"]   # too small for bin-packing
      nodeClassRef:
        apiVersion: karpenter.k8s.aws/v1beta1
        kind: EC2NodeClass
        name: general
  disruption:
    consolidationPolicy: WhenUnderutilized
    # In v1beta1, consolidateAfter is only valid with WhenEmpty, so it is omitted here
    expireAfter: 336h   # rotate nodes every 2 weeks (AMI patching)
    budgets:
    - nodes: "20%"   # max 20% of nodes replaced at once
  limits:
    cpu: "100"
    memory: 400Gi

Spot Instance Interruption Handling

# AWS Node Termination Handler: gracefully drain spot nodes
# Karpenter handles this natively via its SQS interruption queue
# For managed node groups without Karpenter, run NTH in Queue Processor mode.
# Which events are handled (spot interruption, rebalance, scheduled events)
# depends on the EventBridge rules that feed the queue.
helm repo add eks https://aws.github.io/eks-charts
helm install aws-node-termination-handler eks/aws-node-termination-handler \
  --namespace kube-system \
  --set enableSqsTerminationDraining=true \
  --set queueURL=https://sqs.us-east-1.amazonaws.com/123456789012/spot-interruption

Designing Workloads for Spot Resilience

apiVersion: apps/v1
kind: Deployment
metadata:
  name: payments-api
  namespace: payments-api-production
spec:
  replicas: 4
  strategy:
    rollingUpdate:
      maxUnavailable: 1       # tolerate 1 pod down during eviction
      maxSurge: 1
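  selector:
    matchLabels:
      app: payments-api
  template:
    metadata:
      labels:
        app: payments-api     # matched by the topologySpreadConstraints below
    spec:
      # Schedule onto the spot-general NodePool (spot preferred, on-demand fallback)
      nodeSelector:
        node.kubernetes.io/workload: spot
      # Only required if the NodePool taints its nodes with this key
      tolerations:
      - key: "karpenter.sh/capacity-type"
        operator: "Equal"
        value: "spot"
        effect: "NoSchedule"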
      # Spread across AZs AND capacity types to reduce correlated eviction
      topologySpreadConstraints:
      - maxSkew: 1
        topologyKey: topology.kubernetes.io/zone
        whenUnsatisfiable: DoNotSchedule
        labelSelector:
          matchLabels:
            app: payments-api
      - maxSkew: 2
        topologyKey: karpenter.sh/capacity-type
        whenUnsatisfiable: ScheduleAnyway
        labelSelector:
          matchLabels:
            app: payments-api
      # Graceful shutdown — handle SIGTERM before pod is killed
      terminationGracePeriodSeconds: 60
      containers:
      - name: payments-api
        lifecycle:
          preStop:
            exec:
              command: ["/bin/sh","-c","sleep 5"]  # let LB drain connections

Cost-Aware Scheduling

The Kubernetes scheduler places pods based on resource fit, not price. These patterns steer workloads toward cheaper infrastructure without manual intervention.

Node Consolidation with Karpenter

# Karpenter consolidation: pack pods onto fewer nodes, terminate underutilized ones
# Controlled by NodePool disruption.consolidationPolicy: WhenUnderutilized

# Monitor consolidation events (recorded on Node/NodeClaim objects, so search all namespaces)
kubectl get events -A --field-selector reason=Unconsolidatable
kubectl logs -n karpenter -l app.kubernetes.io/name=karpenter -c controller | grep "consolidat"

# List the NodeClaims (and backing nodes) that Karpenter manages
kubectl get nodeclaims -o wide

Descheduler — Rebalance After Spot Evictions

helm repo add descheduler https://kubernetes-sigs.github.io/descheduler/
helm install descheduler descheduler/descheduler \
  --namespace kube-system \
  --set schedule="0 */6 * * *"

# Descheduler policy (ConfigMap)
apiVersion: "descheduler/v1alpha2"
kind: "DeschedulerPolicy"
profiles:
- name: default
  pluginConfig:
  - name: DefaultEvictor
    args:
      ignorePvcPods: true
      evictLocalStoragePods: false
  - name: RemoveDuplicates        # remove duplicate pods on same node
    args: {}
  - name: LowNodeUtilization      # evict pods from overutilized nodes so they reschedule onto underutilized ones
    args:
      thresholds:
        cpu: 20
        memory: 20
        pods: 20
      targetThresholds:
        cpu: 50
        memory: 50
        pods: 50
  - name: RemovePodsViolatingTopologySpreadConstraint  # rebalance after spot loss
    args:
      constraints: ["DoNotSchedule","ScheduleAnyway"]
  plugins:
    balance:
      enabled:
      - RemoveDuplicates
      - LowNodeUtilization
      - RemovePodsViolatingTopologySpreadConstraint

KEDA for Event-Driven Cost Savings

# KEDA ScaledObject: scale to zero when queue is empty
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: payments-worker-scaler
  namespace: payments-api-production
spec:
  scaleTargetRef:
    name: payments-worker
  minReplicaCount: 0    # scale to zero — no cost when idle
  maxReplicaCount: 20
  cooldownPeriod: 300   # seconds before scaling to zero
  pollingInterval: 15
  triggers:
  - type: aws-sqs-queue
    metadata:
      queueURL: https://sqs.us-east-1.amazonaws.com/123456789012/payments-queue
      queueLength: "5"   # target messages per pod
      awsRegion: us-east-1
      identityOwner: operator  # use IRSA

Reserved Instances & Savings Plans

Committed-use discounts give 30–60% off in exchange for 1- or 3-year commitments. The right mix of reservations + spot + on-demand minimizes cost while maintaining reliability.

Coverage Strategy

OPTIMAL COVERAGE MIX (example)

100% of baseline (always-on) workload capacity
├── 70% Savings Plans / Reserved Instances   ← commit to baseline
│       (predictable, 1yr no-upfront or 3yr)
├── 20% Spot instances                       ← stateless burst
└── 10% On-demand                            ← system components + overflow

Result: ~45% blended discount vs all on-demand
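The blended figure is a weighted average of the discount on each slice. As a rough check, the sketch below plugs in the midpoints of the discount ranges quoted in this guide (45% for commitments, 75% for spot); those midpoints are assumptions for illustration, and `blended_discount` is a hypothetical helper.

```python
def blended_discount(mix: list[tuple[float, float]]) -> float:
    """Weighted-average discount for a list of (capacity share, discount) pairs.

    Shares must sum to 1.0; discounts are fractions off the on-demand price.
    """
    total_share = sum(share for share, _ in mix)
    if abs(total_share - 1.0) > 1e-9:
        raise ValueError("capacity shares must sum to 1.0")
    return sum(share * discount for share, discount in mix)


coverage_mix = [
    (0.70, 0.45),  # Savings Plans / Reserved Instances (assumed midpoint of 30-60%)
    (0.20, 0.75),  # Spot (assumed midpoint of 60-90%)
    (0.10, 0.00),  # On-demand, no discount
]

print(f"blended discount: {blended_discount(coverage_mix):.1%}")
```

Under these assumptions the result lands in the mid-40s, in line with the example above; actual discounts depend on term, payment option, and the spot market.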

Identifying Commitment Opportunities

# AWS CLI: see current On-Demand usage vs Savings Plan coverage
aws ce get-savings-plans-coverage \
  --time-period Start=2025-01-01,End=2025-02-01 \
  --granularity MONTHLY \
  --filter '{"Dimensions":{"Key":"REGION","Values":["us-east-1"]}}' \
  --output json

# Get Reserved Instance purchase recommendations based on the last 60 days of EC2 usage
aws ce get-reservation-purchase-recommendation \
  --service "Amazon Elastic Compute Cloud - Compute" \
  --lookback-period-in-days SIXTY_DAYS \
  --payment-option NO_UPFRONT \
  --term-in-years ONE_YEAR

# Kubecost request right-sizing recommendations (not a Savings Plans report)
curl "http://kubecost.kubecost.svc:9090/model/savings/requestSizingV2?window=7d" | jq .

Compute Savings Plans vs EC2 Instance Savings Plans

Type	Applies To	Flexibility	Discount	Best For
Compute Savings Plans	EC2, Fargate, Lambda	Any instance family, size, region, OS	Up to 66%	Kubernetes workloads (Karpenter changes instance types)
EC2 Instance Savings Plans	EC2 only	Any size within committed family+region	Up to 72%	Fixed instance families (e.g., always m5)
Reserved Instances (Standard)	EC2 only	Exact instance type+AZ or region	Up to 72%	Stable, predictable fixed workloads
GCP Committed Use Discounts	vCPU + RAM commitments	Any machine type within commitment	Up to 57%	GKE workloads
Azure Reserved VM Instances	Specific VM family	Size flexibility within family	Up to 72%	AKS fixed node pools

ℹ️

Compute Savings Plans are ideal for Karpenter clusters because Karpenter selects instance types dynamically. Committing to a dollar-per-hour amount of compute (not a specific instance type) means your reservation applies regardless of which instance Karpenter provisions. Commit to ~70% of your average hourly spend.

Cluster Efficiency Metrics

Track these KPIs on a weekly cadence. Healthy clusters sit above 60% CPU utilization and 70% memory utilization against allocated resources.

KPI	Target
CPU Utilization	>60%
RAM Utilization	>70%
Max Idle Node %	<30%
Spot Coverage	70%+

Key Efficiency PromQL Queries

# Cluster-wide CPU utilization (actual / requested)
sum(rate(container_cpu_usage_seconds_total{container!="",container!="POD"}[5m]))
/
sum(kube_pod_container_resource_requests{resource="cpu",container!=""})

# Cluster-wide memory efficiency
sum(container_memory_working_set_bytes{container!="",container!="POD"})
/
sum(kube_pod_container_resource_requests{resource="memory",container!=""})

# Node allocatable utilization (requested / allocatable)
sum(kube_pod_container_resource_requests{resource="cpu"})
/
sum(kube_node_status_allocatable{resource="cpu"})

# Idle nodes (Ready nodes running nothing but DaemonSet pods)
count(
  kube_node_status_condition{condition="Ready",status="true"} == 1
) - (
  count(
    count by (node) (kube_pod_info{node!="",created_by_kind!="DaemonSet"})
  ) or vector(0)
)

# Cost per request in dollars (requires a request-rate metric from APM + node cost from OpenCost)
# Hourly cost is divided by 3600 to match the per-second request rate
(sum(node_total_hourly_cost) / 3600)
/
sum(rate(http_requests_total[5m]))

# Namespace waste: allocated but unused CPU
sum by (namespace) (
  kube_pod_container_resource_requests{resource="cpu"}
) - sum by (namespace) (
  rate(container_cpu_usage_seconds_total{container!="",container!="POD"}[5m])
)
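Cost per request is a unit-conversion exercise: node cost is reported per hour while request rate is per second, so the hourly figure has to be divided by 3600 before dividing by requests per second. A small worked sketch, with an assumed cluster cost and request rate and a hypothetical `cost_per_request` helper:

```python
def cost_per_request(hourly_cost_usd: float, requests_per_second: float) -> float:
    """Return dollars spent per request.

    hourly_cost_usd is the summed node cost per hour;
    requests_per_second is the summed request rate over the same scope.
    """
    if requests_per_second <= 0:
        raise ValueError("requests_per_second must be positive")
    cost_per_second = hourly_cost_usd / 3600.0
    return cost_per_second / requests_per_second


# Assumed values: $36/hour of node cost serving 500 requests/second.
unit_cost = cost_per_request(36.0, 500.0)
print(f"${unit_cost * 1_000_000:.2f} per million requests")
```

Reporting the figure per million requests keeps it readable; tracked over time, a rising value signals that spend is growing faster than traffic.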

Cost Efficiency Grafana Dashboard Panels

Panel	Query Type	Threshold
Monthly cloud spend trend	OpenCost API + Prometheus	Alert if +20% MoM
Top 10 most expensive namespaces	OpenCost allocation API (bar chart)	Review top 3 for right-sizing
CPU efficiency by namespace	PromQL ratio (actual/request)	Red: <30%, Yellow: 30–60%, Green: >60%
Memory efficiency by namespace	PromQL ratio	Red: <40%, Yellow: 40–70%, Green: >70%
Spot coverage %	kube_node_labels{label_karpenter_sh_capacity_type="spot"}	Target: >60%
Idle node count	PromQL (nodes with no non-daemonset pods)	Alert if >3 nodes idle >30min
Unattached PVs	kube_persistentvolume_status_phase{phase="Released"}	Alert on any Released PVs
Orphaned load balancers	kube_service_spec_type{type="LoadBalancer"} with no Endpoints	Alert immediately

FinOps Practices

FinOps is a cultural practice that brings financial accountability to cloud spend. The FinOps lifecycle has three phases: Inform, Optimize, Operate.

FinOps Maturity for Kubernetes

Phase	Crawl	Walk	Run
Inform	Cloud bill visible to central team	Per-namespace cost with showback	Real-time cost per team/service/feature; unit economics (cost per request)
Optimize	Tag cloud resources; eliminate obvious waste (unused LBs, idle nodes)	Right-sizing recommendations acted on quarterly; spot for dev	Automated right-sizing (VPA); spot for >60% workloads; KEDA scale-to-zero; savings plan coverage >70%
Operate	Monthly cost reviews	Cost budgets per team; chargeback model defined	Cost alerts trigger team tickets; anomaly detection; cost encoded in architectural decisions

Weekly FinOps Review Checklist

#!/bin/bash
# weekly-finops-review.sh

echo "=== TOP 10 NAMESPACES BY COST (last 7 days) ==="
# OpenCost: top spenders this week; compare against last week's output to spot anomalies
curl -s "http://opencost.opencost.svc:9003/allocation/compute?window=7d&aggregate=namespace&accumulate=true" | \
  jq -r '.data[0] | to_entries | sort_by(-.value.totalCost) | .[0:10] |
  .[] | "\(.key): $\(.value.totalCost | . * 100 | round / 100)"'

echo "=== IDLE NODES (no non-DaemonSet pods) ==="
# Nodes that run at least one pod not owned by a DaemonSet
busy_nodes=$(kubectl get pods -A -o json | jq -r '
  .items[] | select(.spec.nodeName != null) |
  select((.metadata.ownerReferences // []) | all(.kind != "DaemonSet")) |
  .spec.nodeName' | sort -u)
# Every node not in that list is idle
kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\n"}{end}' | sort | \
  comm -23 - <(echo "$busy_nodes")

echo "=== UNATTACHED PVs ==="
kubectl get pv -o json | jq '.items[] | select(.status.phase == "Released") |
  {name: .metadata.name, capacity: .spec.capacity.storage, storageClass: .spec.storageClassName}'

echo "=== VPA RECOMMENDATIONS (first container target) ==="
kubectl get vpa -A -o json | jq '
  .items[] | {
    ns: .metadata.namespace, name: .metadata.name,
    rec: .status.recommendation.containerRecommendations[0].target
  }'

echo "=== ORPHANED LOAD BALANCERS (no ready endpoints) ==="
kubectl get svc -A -o json | jq -r '
  .items[] | select(.spec.type == "LoadBalancer") |
  "\(.metadata.namespace) \(.metadata.name)"' | \
while read -r ns name; do
  addrs=$(kubectl get endpoints "$name" -n "$ns" -o jsonpath='{.subsets[*].addresses[*].ip}' 2>/dev/null)
  if [ -z "$addrs" ]; then echo "$ns/$name"; fi
done

Cost Alerts & Budgets

PrometheusRule: Cost Anomaly Alerts

apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: cost-management-alerts
  namespace: monitoring
  labels:
    prometheus: kube-prometheus
    role: alert-rules
spec:
  groups:
  - name: cost.efficiency
    rules:

    # Namespace CPU efficiency < 20% for 1 hour (severe waste)
    - alert: NamespaceLowCPUEfficiency
      expr: |
        (
          sum by (namespace) (
            rate(container_cpu_usage_seconds_total{container!="",container!="POD"}[30m])
          )
          /
          sum by (namespace) (
            kube_pod_container_resource_requests{resource="cpu",container!=""}
          )
        ) < 0.20
      for: 1h
      labels:
        severity: warning
      annotations:
        summary: "Low CPU efficiency in namespace {{ $labels.namespace }}"
        description: "CPU efficiency is {{ $value | humanizePercentage }}. Run VPA analysis."

    # Unattached (Released) PersistentVolumes — pay for storage doing nothing
    - alert: OrphanedPersistentVolume
      expr: |
        kube_persistentvolume_status_phase{phase="Released"} > 0
      for: 30m
      labels:
        severity: warning
      annotations:
        summary: "Orphaned PersistentVolume: {{ $labels.persistentvolume }}"
        description: "PV is in Released state and still incurring storage costs. Delete or recycle."

    # Idle LoadBalancer service (no endpoints for 30 minutes)
    # The endpoint metric carries an "endpoint" label, so it is renamed to "service" before matching
    - alert: IdleLoadBalancer
      expr: |
        kube_service_spec_type{type="LoadBalancer"}
        unless on (namespace, service)
        label_replace(kube_endpoint_address_available > 0, "service", "$1", "endpoint", "(.+)")
      for: 30m
      labels:
        severity: warning
      annotations:
        summary: "Idle LoadBalancer {{ $labels.namespace }}/{{ $labels.service }}"
        description: "LoadBalancer has no healthy endpoints. Still incurring hourly LB cost."

    # Node underutilization: paying for idle nodes
    # node_exporter series are keyed by "instance" unless a relabeling rule adds a "node" label
    - alert: NodeLowCPUUtilization
      expr: |
        (1 - avg by (instance) (
          rate(node_cpu_seconds_total{mode="idle"}[10m])
        )) < 0.10
      for: 2h
      labels:
        severity: info
      annotations:
        summary: "Node {{ $labels.instance }} CPU < 10% for 2 hours"
        description: "Consider Karpenter consolidation or draining this node."

    # Namespace exceeding 90% of CPU quota (may need quota increase or right-sizing)
    - alert: NamespaceCPURequestsNearQuota
      expr: |
        (
          kube_resourcequota{type="used",resource="requests.cpu"}
          /
          kube_resourcequota{type="hard",resource="requests.cpu"}
        ) > 0.90
      for: 15m
      labels:
        severity: warning
      annotations:
        summary: "{{ $labels.namespace }} CPU requests at {{ $value | humanizePercentage }} of quota"

Kubecost Budget API

# Create a budget via Kubecost API (requires Kubecost enterprise for enforcement)
curl -X POST http://kubecost.kubecost.svc:9090/budgets \
  -H "Content-Type: application/json" \
  -d '{
    "name": "payments-team-monthly",
    "window": "month",
    "amount": 5000,
    "filters": [{"property": "namespace", "value": "payments-api-production"}],
    "actions": [{"threshold": 0.80, "type": "slack",
                 "target": "https://hooks.slack.com/..."}]
  }'

Best Practices

Right-Size Before Scaling Out

A 5× over-provisioned pod running on 10 replicas costs as much as 50 right-sized pods. Always run VPA in Off mode on new services for 2 weeks to collect recommendations, then apply targets before considering horizontal scale-out.

Spot for All Stateless Workloads

Design stateless services to tolerate 2-minute eviction from day one — terminationGracePeriodSeconds: 60, preStop sleep 5, ≥2 replicas, topology spread across AZs. Spot can cut compute bills by 60–80%.

Enforce Resource Requests via Policy

Workloads without resource requests cannot be bin-packed reliably, because the scheduler treats them as requesting no resources. Use the Kyverno require-requests-limits policy from 05-policy-enforcement.html; this is a prerequisite for all cost optimization.

Label Everything for Attribution

Cost allocation is only as good as your label taxonomy. Enforce team, env, cost-center labels on all namespaces and workloads via Kyverno. Without labels, 30–40% of spend is "unallocated" and impossible to optimize.

Scale to Zero with KEDA

Non-production environments and event-driven workers that sit idle between jobs are candidates for scale-to-zero. KEDA ScaledObjects with minReplicaCount: 0 remove every idle replica when no work is queued, which lets Karpenter or the cluster autoscaler reclaim the emptied nodes.

Weekly Cost Reviews

Waste accumulates silently. Schedule a 30-minute weekly review using the OpenCost dashboard: check the top-10 namespaces for efficiency regressions, review VPA recommendations, and verify that no orphaned PVs or idle LBs have accumulated.

Commit After Observing

Buy Reserved Instances or Savings Plans only after 3+ months of stable production usage data. Commit to ~70% of your average baseline and leave the remaining ~30% to spot and on-demand capacity. Over-committing to the wrong instance types locks in waste.
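
As a rough illustration of the 70% rule, the sketch below averages observed hourly compute spend and returns the hourly amount to commit. The function name and the sample values are hypothetical; real input would come from several months of billing data.

```python
def commitment_target(hourly_spend_samples: list[float], coverage: float = 0.70) -> float:
    """Return the hourly commitment: the average observed baseline times the coverage ratio."""
    if not hourly_spend_samples:
        raise ValueError("need at least one spend sample")
    if not 0.0 <= coverage <= 1.0:
        raise ValueError("coverage must be between 0 and 1")
    baseline = sum(hourly_spend_samples) / len(hourly_spend_samples)
    return baseline * coverage


# Hypothetical average hourly on-demand-equivalent spend for three months.
samples = [10.0, 12.0, 8.0]
commit = commitment_target(samples)
print(f"commit ${commit:.2f}/hour")  # commit $7.00/hour
```

Committing against the average rather than the peak keeps the commitment below what the cluster is likely to use even in a quiet month.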

Watch Network Costs

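Cross-AZ traffic (a pod in AZ-a calling a pod in AZ-b) costs $0.01/GB and adds up fast at scale. Use topology-aware routing (the service.kubernetes.io/topology-mode: Auto annotation on a Service, spec.trafficDistribution: PreferClose on newer clusters, or Cilium's support for topology-aware hints) to prefer same-AZ communication. The older Service topologyKeys field has been removed from Kubernetes.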
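
A quick estimate shows how the per-GB charge compounds. This sketch uses the $0.01/GB figure from the text as a default parameter; the traffic volume is a made-up example, and some providers bill the charge in each direction, so treat the rate as an input to adjust.

```python
def cross_az_monthly_cost(gb_per_day: float, price_per_gb: float = 0.01, days: int = 30) -> float:
    """Estimate the monthly cross-AZ transfer cost for a steady daily traffic volume."""
    if gb_per_day < 0 or price_per_gb < 0 or days < 0:
        raise ValueError("inputs must be non-negative")
    return gb_per_day * days * price_per_gb


# Hypothetical service mesh moving 2 TB per day between zones.
print(f"${cross_az_monthly_cost(gb_per_day=2000):.2f} per month")  # $600.00 per month
```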

Coverage: 07 · Cost Management

Kubernetes cost model diagram (compute/storage/network/managed services with % of bill)
Cost vs efficiency vs waste vs right-sizing gap vs utilization definitions table
Node billing callout (bills by node not pod; 85% waste at 15% utilization)
OpenCost architecture diagram (cloud APIs + K8s API → allocation engine → Prometheus/Grafana)
OpenCost Helm install (defaultClusterId, cloudCost enabled, AWS CUR Athena integration secret)
OpenCost API queries (namespace 7d/label 30d/pod namespace/asset costs/efficiency ratio via jq)
Kubecost Helm install (external Prometheus, savings.enabled, networkCosts.enabled)
OpenCost Prometheus metrics reference (node_total_hourly_cost, container_cpu/memory_allocation)
Tagging strategy YAML (team/env/cost-center/service/product/component labels)
EC2NodeClass tags block for Karpenter node AWS tag propagation
OpenCost allocation by label (monthly per-team, by env, multi-level team+service)
Showback vs chargeback comparison cards
Shared cost allocation strategies table (even split / proportional / weighted by usage / fixed overhead)
PromQL: CPU and memory efficiency ratios per deployment and namespace
Goldilocks Helm install (VPA recommendation-only mode + dashboard)
Right-sizing findings table (5 patterns: CPU over-provisioned / memory over-provisioned / limits>>requests / no limits / batch with prod requests)
VPA modes table (Off/Initial/Recreate/Auto with behavior, use case, risk)
VPA CRD: payments-api-vpa with minAllowed/maxAllowed/controlledResources
kubectl describe VPA output interpretation (LowerBound/Target/UncappedTarget/UpperBound)
Bulk VPA recommendation export via kubectl + jq
VPA + HPA conflict callout (do not use both on CPU; use HPA on custom metrics)
Workload suitability for spot table (stateless/batch/CI/dev/stateful/monitoring/system)
Karpenter NodePool: spot+on-demand requirements, instance generation/size constraints, consolidation budget 20%
AWS Node Termination Handler Helm install (SQS queue, spot interruption + rebalance + scheduled event draining)
Spot-resilient Deployment: maxUnavailable:1, topologySpreadConstraints (zone + capacity-type), terminationGracePeriodSeconds:60, preStop sleep
Karpenter consolidation monitoring (events, logs, nodeclaims)
Descheduler Helm install + DeschedulerPolicy: RemoveDuplicates/LowNodeUtilization(20→50%)/RemovePodsViolatingTopologySpreadConstraint
KEDA ScaledObject: scale-to-zero on SQS queue (minReplicaCount:0, queueLength target, IRSA)
Optimal coverage mix diagram (70% savings plans + 20% spot + 10% on-demand → ~45% discount)
AWS CLI: get-savings-plans-coverage + get-reservation-purchase-recommendation
Savings plan types comparison table (Compute/EC2 Instance/Reserved/GCP CUD/Azure RI with flexibility and discount)
Compute Savings Plans for Karpenter callout (commit to dollar amount not instance type)
Efficiency KPI targets (>60% CPU util, >70% RAM util, <30% idle nodes, 70%+ spot coverage)
Key efficiency PromQL queries (cluster CPU/memory/node allocatable/idle nodes/cost per request/namespace waste)
Grafana dashboard panels table (8 panels with queries, thresholds)
FinOps maturity table: Crawl/Walk/Run for Inform/Optimize/Operate phases
Weekly FinOps review shell script (cost anomalies/idle nodes/unattached PVs/VPA recs/orphaned LBs)
PrometheusRule: NamespaceLowCPUEfficiency / OrphanedPersistentVolume / IdleLoadBalancer / NodeLowCPUUtilization / NamespaceCPURequestsNearQuota
OpenCost/Kubecost Budget API (POST budget with namespace filter + Slack webhook threshold)
8 best practices cards (right-size before scale / spot for stateless / enforce requests / label everything / KEDA scale-to-zero / weekly reviews / commit after observing / watch network costs)