Kubernetes Cost Management
Complete guide to understanding, attributing, and optimizing cloud-native infrastructure spend — from resource right-sizing and spot instances to chargeback models, OpenCost, Kubecost, and FinOps practices for Kubernetes platforms.
Kubernetes Cost Model
Cloud infrastructure costs in Kubernetes come from multiple sources that need to be decomposed and allocated to tenants. Understanding the model is a prerequisite to optimization.
Cost vs Efficiency vs Waste
| Term | Definition | Measured By |
|---|---|---|
| Cost | Dollars spent on infrastructure | Cloud bill + cost allocation tools |
| Efficiency | Workload output per dollar (requests/s per $, SLO met per $) | Business metrics / cost per request |
| Waste | Resources paid for but not used (idle CPU/RAM, orphaned PVs, zombie load balancers) | requests.cpu vs actual usage; PV Bound but no pods |
| Right-sizing gap | Difference between resource requests and actual usage | VPA recommendations, Goldilocks |
| Utilization | Actual usage / requested resources | container_cpu_usage_seconds_total, container_memory_working_set_bytes vs kube_pod_container_resource_requests |
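The definitions in the table reduce to simple arithmetic. A minimal sketch for a single workload follows; the function name `summarize` and the prices are hypothetical and chosen only for illustration.

```python
def summarize(requested_cores: float, used_cores: float, hourly_cost_per_core: float) -> dict:
    """Relate utilization, right-sizing gap, and waste for one workload."""
    utilization = used_cores / requested_cores          # actual usage / requested
    gap_cores = requested_cores - used_cores            # right-sizing gap
    waste_per_hour = gap_cores * hourly_cost_per_core   # paid for but not used
    return {
        "utilization": round(utilization, 3),
        "gap_cores": round(gap_cores, 3),
        "waste_per_hour": round(waste_per_hour, 4),
    }


# Hypothetical workload: requests 1 core, uses 0.15 cores, core priced at $0.04/hour.
print(summarize(1.0, 0.15, 0.04))
```

At 15% utilization, 85% of what is paid for the request is waste, which is why the request (not the usage) is the number to attack first.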
OpenCost & Kubecost
OpenCost is the CNCF-incubating open standard and open-source implementation for Kubernetes cost monitoring. Kubecost builds a commercial layer on top with additional features like savings recommendations, cluster right-sizing, and multi-cloud support.
Install OpenCost
helm repo add opencost https://opencost.github.io/opencost-helm-chart
helm repo update
helm install opencost opencost/opencost \
--namespace opencost \
--create-namespace \
--version 1.42.0 \
--set opencost.exporter.defaultClusterId=prod-us-east-1 \
--set opencost.prometheus.internal.enabled=true \
--set opencost.ui.enabled=true \
--set opencost.cloudCost.enabled=true
# For AWS: provide cloud integration credentials
kubectl create secret generic cloud-integration \
--namespace opencost \
--from-literal=cloud-integration.json='{
"aws": [{
"athenaBucketName": "s3://my-cur-bucket",
"athenaRegion": "us-east-1",
"athenaDatabase": "athenacurcfn_my_database",
"athenaTable": "my_cur_table",
"projectID": "123456789012",
"serviceKeyName": "opencost-aws-access"
}]
}'
OpenCost API Queries
# Cost by namespace for last 7 days
curl "http://opencost.opencost.svc:9003/allocation/compute?window=7d&aggregate=namespace&accumulate=true" | jq .
# Cost by label (e.g., by team)
curl "http://opencost.opencost.svc:9003/allocation/compute?window=30d&aggregate=label:team&accumulate=true" | jq .
# Cost breakdown for a specific namespace
curl -G "http://opencost.opencost.svc:9003/allocation/compute" \
  --data-urlencode 'window=1d' \
  --data-urlencode 'aggregate=pod' \
  --data-urlencode 'filter=namespace:"payments-api-production"' | jq .
# Asset costs (nodes, PVs, load balancers)
curl "http://opencost.opencost.svc:9003/assets?window=7d&aggregate=type" | jq .
# Efficiency: CPU/RAM request utilization
curl "http://opencost.opencost.svc:9003/allocation/compute?window=1d&aggregate=namespace" | \
jq '.data[0] | to_entries[] | {
namespace: .key,
cpuEfficiency: .value.cpuEfficiency,
ramEfficiency: .value.ramEfficiency,
totalCost: .value.totalCost
}'
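The same efficiency filter can be applied in a script once the JSON is fetched. The sketch below assumes the response shape that the jq filter above relies on (`data[0]` keyed by namespace); the sample payload, its values, and the `low_efficiency` helper are hypothetical.

```python
import json

# Hypothetical response in the shape the jq filter expects:
# {"data": [ { "<namespace>": {"cpuEfficiency": ..., "ramEfficiency": ..., "totalCost": ...} } ]}
SAMPLE = json.loads("""
{"code": 200, "data": [{
  "payments-api-production": {"cpuEfficiency": 0.12, "ramEfficiency": 0.35, "totalCost": 412.5},
  "search": {"cpuEfficiency": 0.64, "ramEfficiency": 0.71, "totalCost": 120.0}
}]}
""")


def low_efficiency(response: dict, cpu_threshold: float = 0.3) -> list:
    """Return (namespace, cpuEfficiency, totalCost) below the threshold, most expensive first."""
    allocations = response["data"][0]
    rows = [
        (namespace, alloc["cpuEfficiency"], alloc["totalCost"])
        for namespace, alloc in allocations.items()
        if alloc["cpuEfficiency"] < cpu_threshold
    ]
    return sorted(rows, key=lambda row: row[2], reverse=True)


print(low_efficiency(SAMPLE))
```

Sorting by cost rather than by efficiency puts the namespaces where right-sizing saves the most money at the top.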
Kubecost Install (with savings features)
helm repo add kubecost https://kubecost.github.io/cost-analyzer/
helm repo update
helm install kubecost kubecost/cost-analyzer \
--namespace kubecost \
--create-namespace \
--version 2.3.4 \
--set kubecostToken="your-token" \
--set global.prometheus.enabled=false \
--set global.prometheus.fqdn=http://kube-prometheus-stack-prometheus.monitoring:9090 \
--set kubecostProductConfigs.clusterName=prod-us-east-1 \
--set kubecostProductConfigs.currencyCode=USD \
--set savings.enabled=true \
--set networkCosts.enabled=true
OpenCost Prometheus Metrics
# Key OpenCost metrics exposed for Prometheus to scrape (exact names can vary by OpenCost version)
opencost_load_balancer_cost # per LB hourly cost
opencost_node_total_hourly_cost # per node
opencost_pod_seconds_total # pod lifecycle seconds
node_total_hourly_cost # node cost (from cloud pricing)
container_cpu_allocation # CPU cores allocated
container_memory_allocation_bytes # RAM bytes allocated
# Efficiency metrics (also via kube-state-metrics + cAdvisor)
container_cpu_usage_seconds_total # actual CPU use
container_memory_working_set_bytes # actual memory use
kube_pod_container_resource_requests # requested resources
Cost Attribution & Chargeback
Cost attribution maps cloud spend back to business units, teams, services, or features. This is the foundation of FinOps — you cannot optimize what you cannot measure and attribute.
Tagging Strategy
# Required labels on all K8s workloads (enforced by Kyverno — see 05-policy-enforcement.html)
# These labels flow into cost allocation tools
metadata:
  labels:
    team: payments
    env: production
    cost-center: CC-4892
    service: payments-api
    product: checkout-flow # maps to P&L line
    component: backend # frontend | backend | worker | batch
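Before relying on admission policy, it can help to audit existing manifests for the labels listed above. This is a minimal sketch, not the Kyverno policy itself; the `missing_labels` helper is hypothetical and operates on a manifest already parsed into a dict.

```python
REQUIRED_LABELS = ("team", "env", "cost-center", "service", "product", "component")


def missing_labels(manifest: dict) -> list:
    """Return the required cost-allocation labels that are absent or empty."""
    labels = (manifest.get("metadata") or {}).get("labels") or {}
    return [key for key in REQUIRED_LABELS if not labels.get(key)]


# Hypothetical workload that forgot its cost-center label.
deployment = {
    "metadata": {
        "labels": {
            "team": "payments",
            "env": "production",
            "service": "payments-api",
            "product": "checkout-flow",
            "component": "backend",
        }
    }
}
print(missing_labels(deployment))
```

Any workload that returns a non-empty list would show up as unallocated spend in label-based cost reports.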
AWS Resource Tagging (Karpenter EC2NodeClass tags)
# EC2NodeClass: apply static AWS tags to every instance Karpenter launches
apiVersion: karpenter.k8s.aws/v1beta1
kind: EC2NodeClass
metadata:
  name: general
spec:
  amiFamily: AL2
  role: karpenter-node-role
  subnetSelectorTerms:
    - tags:
        karpenter.sh/discovery: prod-us-east-1
  securityGroupSelectorTerms:
    - tags:
        karpenter.sh/discovery: prod-us-east-1
  tags:
    # Static tags on every node
    Cluster: prod-us-east-1
    ManagedBy: karpenter
    Environment: production
    # Pod labels are not copied to EC2 tags. For per-team node attribution,
    # use a dedicated NodePool/EC2NodeClass per team with its own tags, and
    # rely on OpenCost label aggregation for pod-level allocation.
OpenCost Allocation by Label
# Monthly cost per team
curl "http://opencost.opencost.svc:9003/allocation/compute?window=month&aggregate=label:team&accumulate=true" | \
jq '.data[0] | to_entries | sort_by(-.value.totalCost) |
.[] | "\(.key): $\(.value.totalCost | . * 100 | round / 100)"'
# Cost split by environment
curl "http://opencost.opencost.svc:9003/allocation/compute?window=30d&aggregate=label:env&accumulate=true" | jq .
# Multi-level: team + service (groupBy in Kubecost UI)
curl "http://opencost.opencost.svc:9003/allocation/compute?window=7d&aggregate=label:team,label:service" | jq .
Chargeback vs Showback
Showback
Teams see their costs but are not financially charged. Builds awareness and accountability without creating accounting complexity. Good starting point. Teams respond to social pressure but have no financial incentive to optimize.
Chargeback
Teams are actually billed — deducted from their budget. Strongest incentive to right-size and eliminate waste. Requires accurate allocation data and agreed-upon shared cost methodology (how do you split shared infra like monitoring?).
Shared Cost Allocation Strategies
| Strategy | How It Works | Fair For | Avoid When |
|---|---|---|---|
| Even split | Divide shared infra cost equally among all tenants | Small teams, similar sizes | One large + many small tenants |
| Proportional | Each tenant pays share proportional to their workload spend | Most platforms (default in Kubecost) | Teams have very different usage patterns |
| Weighted by usage | Share monitoring/logging costs by metric cardinality or log volume emitted | Observability cost attribution | Hard to instrument accurately |
| Fixed overhead rate | Flat per-namespace fee covers shared platform costs | Simplicity, SaaS model | Large size variance across tenants |
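To compare the strategies in the table on the same numbers, the sketch below splits a shared bill three ways. Team names, spend figures, and function names are hypothetical; weighted-by-usage is the proportional formula with a different weight (for example log volume) in place of workload spend.

```python
def even_split(shared_cost: float, spend: dict) -> dict:
    """Every tenant pays the same share."""
    return {tenant: shared_cost / len(spend) for tenant in spend}


def proportional(shared_cost: float, weights: dict) -> dict:
    """Each tenant pays in proportion to its weight (workload spend, log volume, ...)."""
    total = sum(weights.values())
    return {tenant: shared_cost * weight / total for tenant, weight in weights.items()}


def fixed_overhead(fee_per_tenant: float, spend: dict) -> dict:
    """Flat platform fee per tenant, independent of the actual shared bill."""
    return {tenant: fee_per_tenant for tenant in spend}


# Hypothetical monthly workload spend per team and a $900 shared monitoring bill.
spend = {"payments": 6000.0, "search": 2000.0, "growth": 1000.0}
print(even_split(900.0, spend))
print(proportional(900.0, spend))
print(fixed_overhead(250.0, spend))
```

With one large and two small tenants, the even split charges the smallest team three times what the proportional split does, which is the situation the "Avoid When" column warns about.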
Resource Right-Sizing
Over-provisioned resource requests are the most common source of Kubernetes cost waste. Teams set requests high "to be safe" and never revisit them. Right-sizing closes the gap between requested and actual resources.
Identifying Over-Provisioned Workloads
# List pods by current CPU usage; compare against their requests to find those using < 20%
kubectl top pods -A --sort-by=cpu | awk 'NR>1 {print $1, $2, $3}'
# PromQL: CPU efficiency per pod (lower = more waste)
# Ratio < 0.3 means using less than 30% of requested CPU
sum by (namespace, pod) (
rate(container_cpu_usage_seconds_total{container!="",container!="POD"}[5m])
)
/
sum by (namespace, pod) (
kube_pod_container_resource_requests{resource="cpu",container!=""}
)
# Memory efficiency (working set vs request)
sum by (namespace, pod) (
container_memory_working_set_bytes{container!="",container!="POD"}
)
/
sum by (namespace, pod) (
kube_pod_container_resource_requests{resource="memory",container!=""}
)
Goldilocks — VPA Recommendation UI
# Goldilocks runs VPA in recommendation-only mode and surfaces results in a UI
helm repo add fairwinds-stable https://charts.fairwinds.com/stable
helm repo update
helm install goldilocks fairwinds-stable/goldilocks \
--namespace goldilocks \
--create-namespace \
--set vpa.enabled=true \
--set dashboard.enabled=true
# Label namespaces to enable Goldilocks analysis
kubectl label ns payments-api-production goldilocks.fairwinds.com/enabled=true
# Goldilocks dashboard shows VPA recommendations per container
# with QoS class (Guaranteed/Burstable) and cost impact estimates
Typical Right-Sizing Findings
| Pattern | Example | Correction | Typical Savings |
|---|---|---|---|
| CPU request >> actual usage | requests.cpu: 1000m, actual: 80m | Set requests.cpu: 200m (≈2.5× observed usage as headroom) | 5× reduction in CPU allocation cost |
| Memory request >> working set | requests.memory: 2Gi, actual: 200Mi | Set requests.memory: 300Mi | ~85% memory cost reduction |
| Limits >> requests (burstable) | limit: 4Gi, request: 128Mi | Tighten limit to 2×–4× request | Prevents memory bombs, better bin-packing |
| No limits set | Pod can use unlimited CPU/RAM | Set limits via LimitRange defaults or explicit spec | Node stability; predictable cost |
| Batch jobs with prod-class requests | CronJob with requests.cpu: 500m | Use batch-low PriorityClass, reduce requests | 10–30% on batch-heavy workloads |
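The corrections in the table come down to "observed usage plus headroom" and a ratio of old to new request. A minimal sketch of that arithmetic follows; the headroom, rounding step, and helper names are assumptions, not a recommendation from any tool.

```python
import math


def recommend_request(observed_peak: float, headroom: float = 0.2, step: int = 10) -> int:
    """Observed peak usage plus headroom, rounded up to the next multiple of `step`."""
    return math.ceil(observed_peak * (1 + headroom) / step) * step


def reduction_factor(current_request: float, new_request: float) -> float:
    """How many times smaller the new request is."""
    return current_request / new_request


def savings_percent(current_request: float, new_request: float) -> float:
    """Share of the original request that is no longer paid for."""
    return round((1 - new_request / current_request) * 100, 1)


# CPU row: 80m observed, 1000m requested (millicores).
print(recommend_request(80))            # request with 20% headroom, rounded to 10m
print(reduction_factor(1000, 200))      # the 200m correction from the table
# Memory row: 2Gi (2048Mi) requested, corrected to 300Mi.
print(savings_percent(2048, 300))
```

The headroom is a judgment call: spiky services usually need more than 20%, while steady batch workers can run closer to observed usage.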
Vertical Pod Autoscaler (VPA)
VPA automatically adjusts CPU and memory requests based on observed usage history. It has four update modes. Choose carefully, because Recreate mode (and, today, Auto mode) evicts and restarts pods, which has availability implications.
Install VPA
git clone https://github.com/kubernetes/autoscaler.git
cd autoscaler/vertical-pod-autoscaler
./hack/vpa-up.sh
# Or via Helm
helm repo add cowboysysop https://cowboysysop.github.io/charts/
helm install vpa cowboysysop/vertical-pod-autoscaler \
--namespace kube-system
VPA Modes
| Mode | Behavior | Use When | Risk |
|---|---|---|---|
| Off | Only compute recommendations; no changes | Audit & right-sizing review with Goldilocks | None (read-only) |
| Initial | Set requests on new pods only (at creation) | Gradual adoption; existing pods unchanged | Low (no evictions) |
| Recreate | Evict & recreate pods to apply updated requests | Stateless pods, non-critical workloads | Pod restarts; availability impact |
| Auto | Recreate now; in-place update when K8s supports it (alpha) | Future default when in-place resize is GA | Pod restarts currently |
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: payments-api-vpa
  namespace: payments-api-production
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: payments-api
  updatePolicy:
    updateMode: "Off" # Start with Off; observe recommendations first
  resourcePolicy:
    containerPolicies:
      - containerName: payments-api
        minAllowed:
          cpu: "50m"
          memory: 64Mi
        maxAllowed:
          cpu: "2" # cap to prevent runaway recommendations
          memory: 2Gi
        controlledResources: ["cpu","memory"]
        controlledValues: RequestsAndLimits
Reading VPA Recommendations
# View recommendations
kubectl describe vpa payments-api-vpa -n payments-api-production
# Output section:
# Recommendation:
#   Container Recommendations:
#     Container Name: payments-api
#     Lower Bound:
#       Cpu: 50m
#       Memory: 123Mi
#     Target: ← use this for your resource requests
#       Cpu: 210m
#       Memory: 340Mi
#     Uncapped Target: ← what Target would be without min/maxAllowed (identical here because Target is within bounds)
#       Cpu: 210m
#       Memory: 340Mi
#     Upper Bound: ← requests above this are treated as over-provisioned
#       Cpu: 1200m
#       Memory: 2Gi
# Bulk export recommendations for all VPAs
kubectl get vpa -A -o json | jq '
.items[] | {
namespace: .metadata.namespace,
name: .metadata.name,
containers: [(.status.recommendation.containerRecommendations // [])[] | {
name: .containerName,
targetCpu: .target.cpu,
targetMemory: .target.memory
}]
}'
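For comparing targets against current requests in a script, the exported JSON usually needs two things: tolerance for VPAs that have no recommendation yet, and CPU quantities converted to a single unit. The sketch below assumes the `kubectl get vpa -A -o json` list shape; the sample data and helper names are hypothetical.

```python
def cpu_to_millicores(quantity: str) -> int:
    """Convert a Kubernetes CPU quantity such as '210m', '2', or '0.5' to millicores."""
    if quantity.endswith("m"):
        return int(quantity[:-1])
    return int(round(float(quantity) * 1000))


def vpa_targets(vpa_list: dict) -> list:
    """Flatten VPA target recommendations; VPAs without a recommendation are skipped."""
    rows = []
    for item in vpa_list.get("items", []):
        recommendation = (item.get("status") or {}).get("recommendation") or {}
        for container in recommendation.get("containerRecommendations") or []:
            rows.append({
                "namespace": item["metadata"]["namespace"],
                "vpa": item["metadata"]["name"],
                "container": container["containerName"],
                "target_cpu_millicores": cpu_to_millicores(container["target"]["cpu"]),
                "target_memory": container["target"]["memory"],
            })
    return rows


# Hypothetical export: one VPA with a recommendation, one that has not produced any yet.
SAMPLE = {"items": [
    {"metadata": {"namespace": "payments-api-production", "name": "payments-api-vpa"},
     "status": {"recommendation": {"containerRecommendations": [
         {"containerName": "payments-api", "target": {"cpu": "210m", "memory": "340Mi"}}]}}},
    {"metadata": {"namespace": "search", "name": "search-vpa"}},
]}
print(vpa_targets(SAMPLE))
```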
Spot / Preemptible Instances
Spot instances (AWS) / Preemptible VMs (GCP) / Spot VMs (Azure) offer discounts of roughly 60–90% off on-demand pricing in exchange for short-notice reclamation: a 2-minute warning on AWS and about 30 seconds on GCP and Azure. With proper workload design, most stateless workloads can run on spot.
Workload Suitability for Spot
| Workload Type | Spot Suitability | Notes |
|---|---|---|
| Stateless web/API servers (≥2 replicas) | ✅ Excellent | Rolling restart handles eviction; keep 1 on-demand replica |
| Batch/ML training jobs | ✅ Excellent | Checkpoint frequently; retryable; batch-low priority |
| CI/CD runners (Tekton, GitHub Actions) | ✅ Excellent | Jobs are ephemeral by nature |
| Dev/staging environments | ✅ Excellent | Brief disruptions acceptable |
| Stateful services (databases) | ⚠️ Risky | Need at least 1 on-demand replica; external managed DB preferred |
| Prometheus / Grafana | ⚠️ Caution | Scrape gaps during eviction; use remote_write to Thanos/Mimir |
| System components (CoreDNS, Kyverno) | ❌ Avoid | Must stay on on-demand/system node pool |
Karpenter Spot Configuration
# NodePool with spot preference and on-demand fallback
apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
  name: spot-general
spec:
  template:
    metadata:
      labels:
        node.kubernetes.io/workload: spot
    spec:
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot", "on-demand"] # spot preferred; on-demand fallback
        - key: kubernetes.io/arch
          operator: In
          values: ["amd64","arm64"]
        - key: karpenter.k8s.aws/instance-category
          operator: In
          values: ["m","c","r"]
        - key: karpenter.k8s.aws/instance-generation
          operator: Gt
          values: ["4"]
        - key: karpenter.k8s.aws/instance-size
          operator: NotIn
          values: ["nano","micro","small"] # too small for bin-packing
      nodeClassRef:
        apiVersion: karpenter.k8s.aws/v1beta1
        kind: EC2NodeClass
        name: general
  disruption:
    consolidationPolicy: WhenUnderutilized
    # In v1beta1, consolidateAfter is only accepted with consolidationPolicy: WhenEmpty
    expireAfter: 336h # rotate nodes every 2 weeks (AMI patching)
    budgets:
      - nodes: "20%" # max 20% of nodes disrupted at once
  limits:
    cpu: "100"
    memory: 400Gi
Spot Instance Interruption Handling
# AWS Node Termination Handler: gracefully drain spot nodes
# Karpenter handles this natively via an SQS interruption queue, and EKS managed node groups drain on spot interruption automatically
# For self-managed node groups without Karpenter (queue-processor mode):
helm repo add eks https://aws.github.io/eks-charts
helm install aws-node-termination-handler eks/aws-node-termination-handler \
  --namespace kube-system \
  --set enableSqsTerminationDraining=true \
  --set queueURL=https://sqs.us-east-1.amazonaws.com/123456789012/spot-interruption
# In IMDS (DaemonSet) mode, drop the two SQS settings and set these instead:
#   --set enableSpotInterruptionDraining=true
#   --set enableRebalanceMonitoring=true
#   --set enableScheduledEventDraining=true
Designing Workloads for Spot Resilience
apiVersion: apps/v1
kind: Deployment
metadata:
  name: payments-api
  namespace: payments-api-production
spec:
  replicas: 4
  selector:
    matchLabels:
      app: payments-api
  strategy:
    rollingUpdate:
      maxUnavailable: 1 # at most 1 pod down during a rollout; add a PodDisruptionBudget to bound node drains
      maxSurge: 1
  template:
    metadata:
      labels:
        app: payments-api
    spec:
      # Target the spot-general NodePool (spot preferred, on-demand fallback)
      nodeSelector:
        node.kubernetes.io/workload: spot
      # Only needed if the NodePool taints its nodes; capacity-type is a node label, not a taint, by default
      tolerations:
        - key: "karpenter.sh/capacity-type"
          operator: "Equal"
          value: "spot"
          effect: "NoSchedule"
      # Spread across AZs AND capacity types to reduce correlated eviction
      topologySpreadConstraints:
        - maxSkew: 1
          topologyKey: topology.kubernetes.io/zone
          whenUnsatisfiable: DoNotSchedule
          labelSelector:
            matchLabels:
              app: payments-api
        - maxSkew: 2
          topologyKey: karpenter.sh/capacity-type
          whenUnsatisfiable: ScheduleAnyway
          labelSelector:
            matchLabels:
              app: payments-api
      # Graceful shutdown: handle SIGTERM before the pod is killed
      terminationGracePeriodSeconds: 60
      containers:
        - name: payments-api
          # image, ports, and resources omitted for brevity
          lifecycle:
            preStop:
              exec:
                command: ["/bin/sh","-c","sleep 5"] # let LB drain connections
Cost-Aware Scheduling
Kubernetes scheduler places pods based on resource fit. These patterns steer workloads toward cheaper infrastructure without manual intervention.
Node Consolidation with Karpenter
# Karpenter consolidation: pack pods onto fewer nodes, terminate underutilized ones
# Controlled by NodePool disruption.consolidationPolicy: WhenUnderutilized
# Monitor consolidation events (emitted on Node/NodeClaim objects, so search all namespaces)
kubectl get events -A --field-selector reason=Unconsolidatable
kubectl logs -n karpenter -l app.kubernetes.io/name=karpenter -c controller | grep "consolidat"
# List the NodeClaims (and their nodes) that Karpenter manages
kubectl get nodeclaims -o wide
Descheduler — Rebalance After Spot Evictions
helm repo add descheduler https://kubernetes-sigs.github.io/descheduler/
helm install descheduler descheduler/descheduler \
--namespace kube-system \
--set schedule="0 */6 * * *"
# Descheduler policy (ConfigMap)
apiVersion: "descheduler/v1alpha2"
kind: "DeschedulerPolicy"
profiles:
  - name: default
    pluginConfig:
      - name: DefaultEvictor
        args:
          ignorePvcPods: true
          evictLocalStoragePods: false
      - name: RemoveDuplicates # remove duplicate pods on same node
        args: {}
      - name: LowNodeUtilization # evict pods from overutilized nodes so they reschedule onto underutilized ones
        args:
          thresholds: # a node below all of these counts as underutilized
            cpu: 20
            memory: 20
            pods: 20
          targetThresholds: # a node above any of these counts as overutilized
            cpu: 50
            memory: 50
            pods: 50
      - name: RemovePodsViolatingTopologySpreadConstraint # rebalance after spot loss
        args:
          constraints: ["DoNotSchedule","ScheduleAnyway"]
    plugins:
      balance:
        enabled:
          - RemoveDuplicates
          - LowNodeUtilization
          - RemovePodsViolatingTopologySpreadConstraint
KEDA for Event-Driven Cost Savings
# KEDA ScaledObject: scale to zero when queue is empty
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: payments-worker-scaler
  namespace: payments-api-production
spec:
  scaleTargetRef:
    name: payments-worker
  minReplicaCount: 0 # scale to zero: no cost when idle
  maxReplicaCount: 20
  cooldownPeriod: 300 # seconds before scaling to zero
  pollingInterval: 15
  triggers:
    - type: aws-sqs-queue
      metadata:
        queueURL: https://sqs.us-east-1.amazonaws.com/123456789012/payments-queue
        queueLength: "5" # target messages per pod
        awsRegion: us-east-1
        identityOwner: operator # use IRSA
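To reason about what the settings above do to replica count, and therefore cost, the sketch below approximates the steady-state calculation: queue depth divided by the per-pod target, clamped to the configured bounds. It is a simplification under stated assumptions: it ignores `cooldownPeriod`, `pollingInterval`, and the HPA's tolerance and stabilization behavior, and `desired_replicas` is a hypothetical helper, not a KEDA API.

```python
import math


def desired_replicas(visible_messages: int, queue_length: int = 5,
                     min_replicas: int = 0, max_replicas: int = 20) -> int:
    """Approximate steady-state replica count for a queue-length target."""
    if visible_messages <= 0:
        return min_replicas                      # idle queue: fall back to the minimum (zero here)
    wanted = math.ceil(visible_messages / queue_length)
    return max(min_replicas, 1, min(wanted, max_replicas))


for depth in (0, 3, 47, 1000):
    print(depth, desired_replicas(depth))
```

Lowering `queueLength` buys faster drain times at the price of more pods per message; the cap at `maxReplicaCount` bounds the worst-case spend during a backlog.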
Reserved Instances & Savings Plans
Committed-use discounts give 30–60% off in exchange for 1- or 3-year commitments. The right mix of reservations + spot + on-demand minimizes cost while maintaining reliability.
Coverage Strategy
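The effect of a coverage mix can be estimated as a weighted average of discounts. The sketch below assumes a 70/20/10 split between Savings Plans, spot, and on-demand; the discount rates are illustrative values picked from within the ranges quoted in this guide, not quoted prices.

```python
def blended_discount(mix: dict, discount: dict) -> float:
    """Weighted-average discount versus paying on-demand for everything."""
    if abs(sum(mix.values()) - 1.0) > 1e-9:
        raise ValueError("capacity mix must sum to 1.0")
    return sum(share * discount[kind] for kind, share in mix.items())


# Assumed share of compute usage per purchase option, and assumed discount for each.
MIX = {"savings_plan": 0.70, "spot": 0.20, "on_demand": 0.10}
DISCOUNT = {"savings_plan": 0.44, "spot": 0.70, "on_demand": 0.0}

print(f"blended discount: {blended_discount(MIX, DISCOUNT):.1%}")
```

Under these assumptions the blend lands in the mid-40% range. The on-demand slice earns nothing but absorbs demand spikes and spot shortfalls, so shrinking it to zero trades savings for reliability risk.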
Identifying Commitment Opportunities
# AWS CLI: see current On-Demand usage vs Savings Plan coverage
aws ce get-savings-plans-coverage \
--time-period Start=2025-01-01,End=2025-02-01 \
--granularity MONTHLY \
--filter '{"Dimensions":{"Key":"REGION","Values":["us-east-1"]}}' \
--output json
# Get EC2 usage for commitment analysis
aws ce get-reservation-purchase-recommendation \
  --service "Amazon Elastic Compute Cloud - Compute" \
--lookback-period-in-days SIXTY_DAYS \
--payment-option NO_UPFRONT \
--term-in-years ONE_YEAR
# Kubecost request right-sizing recommendations (endpoint path varies by Kubecost version)
curl http://kubecost.kubecost.svc:9090/savings/requestSizing | jq .
Compute Savings Plans vs EC2 Instance Savings Plans
| Type | Applies To | Flexibility | Discount | Best For |
|---|---|---|---|---|
| Compute Savings Plans | EC2, Fargate, Lambda | Any instance family, size, region, OS | Up to 66% | Kubernetes workloads (Karpenter changes instance types) |
| EC2 Instance Savings Plans | EC2 only | Any size within committed family+region | Up to 72% | Fixed instance families (e.g., always m5) |
| Reserved Instances (Standard) | EC2 only | Exact instance type+AZ or region | Up to 72% | Stable, predictable fixed workloads |
| GCP Committed Use Discounts | vCPU + RAM commitments | Any machine type within commitment | Up to 57% | GKE workloads |
| Azure Reserved VM Instances | Specific VM family | Size flexibility within family | Up to 72% | AKS fixed node pools |
Cluster Efficiency Metrics
Track these KPIs on a weekly cadence. Healthy clusters sit above 60% CPU utilization and 70% memory utilization against allocated resources.
Key Efficiency PromQL Queries
# Cluster-wide CPU utilization (actual / requested)
sum(rate(container_cpu_usage_seconds_total{container!="",container!="POD"}[5m]))
/
sum(kube_pod_container_resource_requests{resource="cpu",container!=""})
# Cluster-wide memory efficiency
sum(container_memory_working_set_bytes{container!="",container!="POD"})
/
sum(kube_pod_container_resource_requests{resource="memory",container!=""})
# Node allocatable utilization (requested / allocatable)
sum(kube_pod_container_resource_requests{resource="cpu"})
/
sum(kube_node_status_allocatable{resource="cpu"})
# Idle nodes (Ready nodes running no pods other than DaemonSet pods)
count(
  kube_node_status_condition{condition="Ready",status="true"} == 1
) - count(
  count by (node) (kube_pod_info{node!="",created_by_kind!="DaemonSet"})
)
# Cost per request in dollars (requires request-rate metric from APM + node cost)
sum(node_total_hourly_cost) # from OpenCost, dollars per hour
/
(sum(rate(http_requests_total[5m])) * 3600) # requests per hour
# Namespace waste: requested but unused CPU cores
sum by (namespace) (
  kube_pod_container_resource_requests{resource="cpu"}
) - sum by (namespace) (
  rate(container_cpu_usage_seconds_total{container!="",container!="POD"}[5m])
)
Cost Efficiency Grafana Dashboard Panels
| Panel | Query Type | Threshold |
|---|---|---|
| Monthly cloud spend trend | OpenCost API + Prometheus | Alert if +20% MoM |
| Top 10 most expensive namespaces | OpenCost allocation API (bar chart) | Review top 3 for right-sizing |
| CPU efficiency by namespace | PromQL ratio (actual/request) | Red: <30%, Yellow: 30–60%, Green: >60% |
| Memory efficiency by namespace | PromQL ratio | Red: <40%, Yellow: 40–70%, Green: >70% |
| Spot coverage % | kube_node_labels{label_karpenter_sh_capacity_type="spot"} | Target: >60% |
| Idle node count | PromQL (nodes with no non-daemonset pods) | Alert if >3 nodes idle >30min |
| Unattached PVs | kube_persistentvolume_status_phase{phase="Released"} | Alert on any Released PVs |
| Orphaned load balancers | kube_service_spec_type{type="LoadBalancer"} with no Endpoints | Alert immediately |
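The red/yellow/green bands for the two efficiency panels can be expressed as a small lookup, which is handy when the same thresholds drive both dashboards and reports. The thresholds are taken from the table; the `classify` helper is hypothetical.

```python
# (red below, green above) per resource, as fractions of requested resources.
THRESHOLDS = {"cpu": (0.30, 0.60), "memory": (0.40, 0.70)}


def classify(resource: str, efficiency: float) -> str:
    """Map an efficiency ratio (actual / requested) to a dashboard color band."""
    red_below, green_above = THRESHOLDS[resource]
    if efficiency < red_below:
        return "red"
    if efficiency > green_above:
        return "green"
    return "yellow"


print(classify("cpu", 0.25), classify("cpu", 0.45), classify("memory", 0.75))
```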
FinOps Practices
FinOps is a cultural practice that brings financial accountability to cloud spend. The FinOps lifecycle has three phases: Inform, Optimize, Operate.
FinOps Maturity for Kubernetes
| Phase | Crawl | Walk | Run |
|---|---|---|---|
| Inform | Cloud bill visible to central team | Per-namespace cost with showback | Real-time cost per team/service/feature; unit economics (cost per request) |
| Optimize | Tag cloud resources; eliminate obvious waste (unused LBs, idle nodes) | Right-sizing recommendations acted on quarterly; spot for dev | Automated right-sizing (VPA); spot for >60% workloads; KEDA scale-to-zero; savings plan coverage >70% |
| Operate | Monthly cost reviews | Cost budgets per team; chargeback model defined | Cost alerts trigger team tickets; anomaly detection; cost encoded in architectural decisions |
Weekly FinOps Review Checklist
#!/bin/bash
# weekly-finops-review.sh
echo "=== COST ANOMALIES ==="
# OpenCost: top 10 namespaces by cost this week; diff against last week's run to spot anomalies
curl -s "http://opencost.opencost.svc:9003/allocation/compute?window=7d&aggregate=namespace&accumulate=true" | \
  jq -r '.data[0] | to_entries | sort_by(-.value.totalCost) | .[0:10] |
    .[] | "\(.key): $\(.value.totalCost | . * 100 | round / 100)"'
echo "=== IDLE NODES ==="
# Nodes that run no Running pods other than DaemonSet pods
busy_nodes=$(kubectl get pods -A -o json | jq -r '
  [.items[] | select(.status.phase == "Running")
   | select((.metadata.ownerReferences // []) | all(.kind != "DaemonSet"))
   | .spec.nodeName] | unique | .[]')
kubectl get nodes -o json | jq -r '.items[].metadata.name' | grep -vxF -f <(echo "$busy_nodes")
echo "=== UNATTACHED PVs ==="
kubectl get pv -o json | jq '.items[] | select(.status.phase == "Released") |
  {name: .metadata.name, capacity: .spec.capacity.storage, storageClass: .spec.storageClassName}'
echo "=== VPA RECOMMENDATIONS ==="
kubectl get vpa -A -o json | jq '
  .items[] | {
    ns: .metadata.namespace, name: .metadata.name,
    rec: .status.recommendation.containerRecommendations[0].target
  }'
echo "=== ORPHANED LOAD BALANCERS ==="
# Services do not support a spec.type field selector, so filter in jq,
# then report LoadBalancer Services whose Endpoints have no ready addresses
kubectl get svc -A -o json | jq -r '
  .items[] | select(.spec.type == "LoadBalancer") | "\(.metadata.namespace) \(.metadata.name)"' |
while read -r ns name; do
  ready=$(kubectl get endpoints "$name" -n "$ns" -o json 2>/dev/null | jq '[.subsets[]?.addresses[]?] | length')
  [ "${ready:-0}" -eq 0 ] && echo "$ns/$name"
done
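Comparing one week against the previous one is easier on saved totals than by eye. The sketch below flags namespaces whose cost grew by more than a threshold; the 20% default mirrors the month-over-month alert threshold in the dashboard table, and the input dicts and `cost_anomalies` helper are hypothetical.

```python
def cost_anomalies(previous: dict, current: dict, threshold: float = 0.20) -> list:
    """Return (namespace, relative_change) for namespaces growing faster than the threshold.

    Namespaces with no spend in the previous window are reported with a change of None.
    """
    flagged = []
    for namespace, cost in current.items():
        before = previous.get(namespace, 0.0)
        if before == 0:
            if cost > 0:
                flagged.append((namespace, None))   # new spend, no baseline to compare with
            continue
        change = (cost - before) / before
        if change > threshold:
            flagged.append((namespace, round(change, 3)))
    return sorted(flagged, key=lambda row: row[0])


# Hypothetical per-namespace totals for two consecutive 7-day windows.
last_week = {"payments": 1000.0, "search": 400.0}
this_week = {"payments": 1300.0, "search": 410.0, "ml": 50.0}
print(cost_anomalies(last_week, this_week))
```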
Cost Alerts & Budgets
PrometheusRule: Cost Anomaly Alerts
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: cost-management-alerts
  namespace: monitoring
  labels:
    prometheus: kube-prometheus
    role: alert-rules
spec:
  groups:
    - name: cost.efficiency
      rules:
        # Namespace CPU efficiency < 20% for 1 hour (severe waste)
        - alert: NamespaceLowCPUEfficiency
          expr: |
            (
              sum by (namespace) (
                rate(container_cpu_usage_seconds_total{container!="",container!="POD"}[30m])
              )
              /
              sum by (namespace) (
                kube_pod_container_resource_requests{resource="cpu",container!=""}
              )
            ) < 0.20
          for: 1h
          labels:
            severity: warning
          annotations:
            summary: "Low CPU efficiency in namespace {{ $labels.namespace }}"
            description: "CPU efficiency is {{ $value | humanizePercentage }}. Run VPA analysis."
        # Unattached (Released) PersistentVolumes: paying for storage that does nothing
        - alert: OrphanedPersistentVolume
          expr: |
            kube_persistentvolume_status_phase{phase="Released"} > 0
          for: 30m
          labels:
            severity: warning
          annotations:
            summary: "Orphaned PersistentVolume: {{ $labels.persistentvolume }}"
            description: "PV is in Released state and still incurring storage costs. Delete or recycle."
        # Idle LoadBalancer service (no endpoints for 30 minutes)
        # The two metrics carry different labels, so match on namespace + service name
        - alert: IdleLoadBalancer
          expr: |
            kube_service_spec_type{type="LoadBalancer"}
            unless on (namespace, service)
            label_replace(kube_endpoint_address_available > 0, "service", "$1", "endpoint", "(.+)")
          for: 30m
          labels:
            severity: warning
          annotations:
            summary: "Idle LoadBalancer {{ $labels.namespace }}/{{ $labels.service }}"
            description: "LoadBalancer has no healthy endpoints. Still incurring hourly LB cost."
        # Node underutilization: paying for idle nodes
        # node_exporter series are labeled by instance; use node only if your scrape config relabels it
        - alert: NodeLowCPUUtilization
          expr: |
            (1 - avg by (instance) (
              rate(node_cpu_seconds_total{mode="idle"}[10m])
            )) < 0.10
          for: 2h
          labels:
            severity: info
          annotations:
            summary: "Node {{ $labels.instance }} CPU < 10% for 2 hours"
            description: "Consider Karpenter consolidation or draining this node."
        # Namespace exceeding 90% of CPU quota (may need quota increase or right-sizing)
        # ignoring(type) is required because the two sides differ in the type label
        - alert: NamespaceCPURequestsNearQuota
          expr: |
            (
              kube_resourcequota{type="used",resource="requests.cpu"}
              / ignoring (type)
              kube_resourcequota{type="hard",resource="requests.cpu"}
            ) > 0.90
          for: 15m
          labels:
            severity: warning
          annotations:
            summary: "{{ $labels.namespace }} CPU requests at {{ $value | humanizePercentage }} of quota"
Kubecost Budget API
# Create a budget via the Kubecost API (requires Kubecost Enterprise for enforcement; endpoint path and payload schema vary by Kubecost version)
curl -X POST http://kubecost.kubecost.svc:9090/budgets \
-H "Content-Type: application/json" \
-d '{
"name": "payments-team-monthly",
"window": "month",
"amount": 5000,
"filters": [{"property": "namespace", "value": "payments-api-production"}],
"actions": [{"threshold": 0.80, "type": "slack",
"target": "https://hooks.slack.com/..."}]
}'
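The budget above fires at 80% of a $5,000 monthly amount. A minimal sketch of that check, plus a naive linear month-end projection, follows; the projection and the `budget_status` helper are assumptions for illustration, not how Kubecost evaluates budgets.

```python
def budget_status(spend_to_date: float, day_of_month: int, days_in_month: int,
                  amount: float = 5000.0, threshold: float = 0.80) -> dict:
    """Compare spend against a monthly budget and project month-end spend linearly."""
    used = spend_to_date / amount
    projected = spend_to_date / day_of_month * days_in_month
    return {
        "used_fraction": round(used, 3),
        "alert": used >= threshold,                 # same 0.80 threshold as the budget action
        "projected_month_end": round(projected, 2),
        "projected_over_budget": projected > amount,
    }


# Hypothetical: $3,000 spent by day 15 of a 30-day month.
print(budget_status(3000.0, 15, 30))
```

A threshold alert only fires once most of the money is gone; the projection gives an earlier signal when the run rate alone is enough to blow the budget.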
Best Practices
Right-Size Before Scaling Out
A 5× over-provisioned pod running on 10 replicas costs as much as 50 right-sized pods. Always run VPA in Off mode on new services for 2 weeks to collect recommendations, then apply targets before considering horizontal scale-out.
Spot for All Stateless Workloads
Design stateless services to tolerate 2-minute eviction from day one — terminationGracePeriodSeconds: 60, preStop sleep 5, ≥2 replicas, topology spread across AZs. Spot can cut compute bills by 60–80%.
Enforce Resource Requests via Policy
Workloads without resource requests cannot be bin-packed by the scheduler. Use the Kyverno require-requests-limits policy from 05-policy-enforcement.html — this is a prerequisite for all cost optimization.
Label Everything for Attribution
Cost allocation is only as good as your label taxonomy. Enforce team, env, cost-center labels on all namespaces and workloads via Kyverno. Without labels, 30–40% of spend is "unallocated" and impossible to optimize.
Scale to Zero with KEDA
Non-production environments and event-driven workers that sit idle between jobs are zero-cost candidates. KEDA ScaledObjects with minReplicaCount: 0 eliminate idle compute entirely when no work is queued.
Weekly Cost Reviews
Cost waste accumulates silently. Schedule a 30-minute weekly review using the OpenCost dashboard: check top-10 namespaces for efficiency regression, review VPA recommendations, verify no orphaned PVs or idle LBs accumulated.
Commit After Observing
Buy Reserved Instances/Savings Plans only after 3+ months of stable production usage data. Commit to ~70% of your average baseline, leave 30% for on-demand. Over-committing to the wrong instance types locks in waste.
Watch Network Costs
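Cross-AZ traffic (a pod in AZ-a calling a pod in AZ-b) costs about $0.01/GB in each direction on AWS and adds up fast at scale. Use topology-aware routing (the service.kubernetes.io/topology-mode: Auto annotation on a Service, trafficDistribution: PreferClose on newer Kubernetes versions, or Cilium's support for topology-aware hints) to prefer same-AZ communication. The older topologyKeys field was removed in Kubernetes 1.22.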
Coverage: 07 · Cost Management
- Kubernetes cost model diagram (compute/storage/network/managed services with % of bill)
- Cost vs efficiency vs waste vs right-sizing gap vs utilization definitions table
- Node billing callout (bills by node not pod; 85% waste at 15% utilization)
- OpenCost architecture diagram (cloud APIs + K8s API → allocation engine → Prometheus/Grafana)
- OpenCost Helm install (defaultClusterId, cloudCost enabled, AWS CUR Athena integration secret)
- OpenCost API queries (namespace 7d/label 30d/pod namespace/asset costs/efficiency ratio via jq)
- Kubecost Helm install (external Prometheus, savings.enabled, networkCosts.enabled)
- OpenCost Prometheus metrics reference (node_total_hourly_cost, container_cpu/memory_allocation)
- Tagging strategy YAML (team/env/cost-center/service/product/component labels)
- EC2NodeClass tags block for Karpenter node AWS tag propagation
- OpenCost allocation by label (monthly per-team, by env, multi-level team+service)
- Showback vs chargeback comparison cards
- Shared cost allocation strategies table (even split / proportional / weighted by usage / fixed overhead)
- PromQL: CPU and memory efficiency ratios per deployment and namespace
- Goldilocks Helm install (VPA recommendation-only mode + dashboard)
- Right-sizing findings table (5 patterns: CPU over-provisioned / memory over-provisioned / limits>>requests / no limits / batch with prod requests)
- VPA modes table (Off/Initial/Recreate/Auto with behavior, use case, risk)
- VPA CRD: payments-api-vpa with minAllowed/maxAllowed/controlledResources
- kubectl describe VPA output interpretation (LowerBound/Target/UncappedTarget/UpperBound)
- Bulk VPA recommendation export via kubectl + jq
- VPA + HPA conflict callout (do not use both on CPU; use HPA on custom metrics)
- Workload suitability for spot table (stateless/batch/CI/dev/stateful/monitoring/system)
- Karpenter NodePool: spot+on-demand requirements, instance generation/size constraints, consolidation budget 20%
- AWS Node Termination Handler Helm install (SQS queue, spot interruption + rebalance + scheduled event draining)
- Spot-resilient Deployment: maxUnavailable:1, topologySpreadConstraints (zone + capacity-type), terminationGracePeriodSeconds:60, preStop sleep
- Karpenter consolidation monitoring (events, logs, nodeclaims)
- Descheduler Helm install + DeschedulerPolicy: RemoveDuplicates/LowNodeUtilization(20→50%)/RemovePodsViolatingTopologySpreadConstraint
- KEDA ScaledObject: scale-to-zero on SQS queue (minReplicaCount:0, queueLength target, IRSA)
- Optimal coverage mix diagram (70% savings plans + 20% spot + 10% on-demand → ~45% discount)
- AWS CLI: get-savings-plans-coverage + get-reservation-purchase-recommendation
- Savings plan types comparison table (Compute/EC2 Instance/Reserved/GCP CUD/Azure RI with flexibility and discount)
- Compute Savings Plans for Karpenter callout (commit to dollar amount not instance type)
- Efficiency KPI targets (>60% CPU util, >70% RAM util, <30% idle nodes, 70%+ spot coverage)
- Key efficiency PromQL queries (cluster CPU/memory/node allocatable/idle nodes/cost per request/namespace waste)
- Grafana dashboard panels table (8 panels with queries, thresholds)
- FinOps maturity table: Crawl/Walk/Run for Inform/Optimize/Operate phases
- Weekly FinOps review shell script (cost anomalies/idle nodes/unattached PVs/VPA recs/orphaned LBs)
- PrometheusRule: NamespaceLowCPUEfficiency / OrphanedPersistentVolume / IdleLoadBalancer / NodeLowCPUUtilization / NamespaceCPURequestsNearQuota
- OpenCost/Kubecost Budget API (POST budget with namespace filter + Slack webhook threshold)
- 8 best practices cards (right-size before scale / spot for stateless / enforce requests / label everything / KEDA scale-to-zero / weekly reviews / commit after observing / watch network costs)