Resource Management

Requests, limits, QoS classes, LimitRanges, ResourceQuotas, and right-sizing

v1/Pod v1/ResourceQuota v1/LimitRange Platform Engineer

Resource management in Kubernetes operates at two distinct layers: the scheduler uses resource requests to decide where pods land, and the Linux kernel uses resource limits to constrain what running containers can consume. Understanding this separation — and the mechanisms behind each — is essential for building clusters that are both highly utilized and operationally stable.

Requests vs Limits

Resource model summary:

spec.containers[].resources:
  requests:            ← Used by SCHEDULER for placement
    cpu: 500m          ← Reserve 0.5 CPU shares on the node
    memory: 512Mi      ← Reserve 512MiB for OOM scoring
  limits:              ← Used by KERNEL at runtime
    cpu: 2             ← CFS quota: max 2 CPU seconds per second
    memory: 1Gi        ← Hard limit: OOM kill if exceeded

Scheduler sees:   sum(requests) ≤ node.allocatable
Kernel enforces:  container cannot exceed limits

Overcommit:
  Node capacity:   8 CPU, 32Gi memory
  Total requests:  6 CPU, 24Gi  → fits on node ✓
  Total limits:    24 CPU, 96Gi → 3× overcommitted (allowed)

Property	Requests	Limits
Used by	kube-scheduler (pod placement)	Linux kernel cgroups (runtime enforcement)
CPU mechanism	CFS cpu.shares (proportional)	CFS cpu.cfs_quota_us (hard cap)
Memory mechanism	OOM score adjustment	cgroup memory.limit_in_bytes (hard kill)
Effect if exceeded	Pod won't schedule (Pending)	CPU: throttled; Memory: OOM killed
Required?	No (but strongly recommended)	No (but required for Guaranteed QoS)
Node overcommit	Sum of requests ≤ allocatable	Sum of limits may far exceed capacity

CPU: CFS Shares and Quota

CPU Requests → CFS Shares

CPU requests map to Linux CFS (Completely Fair Scheduler) cpu.shares. Shares are proportional — a pod with requests.cpu: 1000m gets twice as much CPU time as one with requests.cpu: 500m when the node is contended. When the node is idle, any container can use all available CPU regardless of its request.

cpu.shares = request_millicores × 1024 / 1000
  → requests.cpu: 500m  → cpu.shares = 512
  → requests.cpu: 1     → cpu.shares = 1024
  → requests.cpu: 250m  → cpu.shares = 256
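
The conversion above can be checked with a few lines of Python. This is a minimal sketch, not the kubelet's code: the function names are hypothetical, and the lower bound of 2 shares is an assumption based on the kernel's minimum for cpu.shares.

```python
MIN_SHARES = 2         # assumed lower bound for cpu.shares
SHARES_PER_CPU = 1024
MILLI_PER_CPU = 1000


def cpu_shares(request_millicores: int) -> int:
    """Convert a CPU request in millicores to a cpu.shares value."""
    if request_millicores <= 0:
        return MIN_SHARES
    return max(MIN_SHARES, request_millicores * SHARES_PER_CPU // MILLI_PER_CPU)


def contended_split(requests_millicores: dict) -> dict:
    """Fraction of CPU time each container gets when all of them are runnable."""
    shares = {name: cpu_shares(m) for name, m in requests_millicores.items()}
    total = sum(shares.values())
    return {name: s / total for name, s in shares.items()}


if __name__ == "__main__":
    print(cpu_shares(500), cpu_shares(1000), cpu_shares(250))
    print(contended_split({"a": 1000, "b": 500}))
```

Under full contention, container "a" (1000m) receives two thirds of the CPU time and "b" (500m) one third, which is the 2:1 ratio described above.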

CPU Limits → CFS Quota (Throttling Trap)

CPU limits map to cpu.cfs_quota_us and cpu.cfs_period_us. Every 100ms (the default period), each container is allocated a quota of CPU time equal to its limit. If a container exhausts its quota before the period ends, it is throttled (suspended) until the next period — even if CPUs are otherwise idle.

CFS quota mechanics (limit: 1 CPU = 100ms quota per 100ms period):

Period 1 (0–100ms):
  Multi-threaded container burns its 100ms of CPU time by the 80ms mark → quota exhausted
  Container THROTTLED for the remaining 20ms (idle CPU goes unused)

Period 2 (100–200ms):
  Quota refills to 100ms
  Container resumes

Result: p99 latency spikes every ~100ms even though the node has free CPU

CPU throttling causes latency spikes, not errors
A throttled container doesn't crash — it silently pauses. This manifests as p99/p999 latency spikes, timeout errors from downstream callers, and HPA confusion (CPU utilization appears low because throttled time doesn't count as "used"). Throttling is one of the most common and least-diagnosed performance issues in Kubernetes. Monitor container_cpu_cfs_throttled_periods_total and alert when throttling exceeds 25%.

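# Detect CPU throttling for a container (cgroup v1 path)
kubectl exec -n <ns> <pod> -- cat /sys/fs/cgroup/cpu/cpu.stat
# On cgroup v2 nodes the file is /sys/fs/cgroup/cpu.stat
# nr_periods: number of enforcement periods elapsed
# nr_throttled: number of periods in which the container was throttled
# throttled_time: nanoseconds spent throttled (throttled_usec, in microseconds, on cgroup v2)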

# Prometheus query for throttling ratio
# (throttled periods / total periods) per container
rate(container_cpu_cfs_throttled_periods_total[5m])
/ rate(container_cpu_cfs_periods_total[5m])
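
The same ratio can be derived by hand from two cpu.stat readings taken some time apart. The sketch below assumes the file contents have already been captured as text (for example with the kubectl exec command above); the function names and the sample numbers are illustrative, not taken from a real node.

```python
def parse_cpu_stat(text: str) -> dict:
    """Parse 'key value' lines from cpu.stat into a dict of integers."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            stats[parts[0]] = int(parts[1])
    return stats


def throttle_ratio(before: dict, after: dict) -> float:
    """Fraction of CFS periods between two samples in which the cgroup was throttled."""
    periods = after["nr_periods"] - before["nr_periods"]
    throttled = after["nr_throttled"] - before["nr_throttled"]
    return throttled / periods if periods > 0 else 0.0


if __name__ == "__main__":
    # Hypothetical readings taken five minutes apart.
    first = parse_cpu_stat("nr_periods 1000\nnr_throttled 100\nthrottled_time 5000000000")
    second = parse_cpu_stat("nr_periods 4000\nnr_throttled 1000\nthrottled_time 9000000000")
    print(f"throttled in {throttle_ratio(first, second):.0%} of periods")
```

Using deltas rather than the raw counters matters because both values are cumulative since the cgroup was created.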

No CPU Limit Pattern

Some teams deliberately omit CPU limits to avoid throttling, relying on requests alone for scheduling. This is viable when:

Nodes run homogeneous workloads with predictable contention
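ResourceQuota caps aggregate CPU requests at the namespace level (requests.cpu); a quota on limits.cpu would force every pod to declare a CPU limit
LimitRange provides a defaultRequest so pods without explicit requests still receive CFS shares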

No CPU limit = unlimited burst potential
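Without CPU limits, one misbehaving container can consume all idle CPU on a node. Under contention, CFS shares still give each pod CPU in proportion to its request, but pods that had been bursting above their requests lose that headroom. If you run without per-pod CPU limits, cap what a namespace can reserve with a ResourceQuota on requests.cpu; a quota on limits.cpu rejects pods that do not declare a CPU limit.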

Memory: OOM Scoring and Hard Limits

Memory Requests → OOM Score

Memory requests affect the OOM score adjustment (oom_score_adj) of the container's processes. A lower score means the OOM killer is less likely to kill that process when the node runs out of memory.

OOM score adjustment range: -1000 (never kill) to +1000 (kill first)

  Guaranteed pods (req = limit):   oom_score_adj = -997  (protected; -998 in older releases)
  Burstable pods:                  oom_score_adj = 2–999  (shrinks as the memory request grows)
  BestEffort pods (no requests):   oom_score_adj = 1000  (kill first)

  Formula (Burstable):
  oom_score_adj = min(max(2, 1000 - (1000 × memory_request / node_memory_capacity)), 999)
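
A short worked example makes the Burstable calculation concrete. This sketch assumes the clamped form of the formula (results bounded to 2–999) and uses the node's total memory capacity as the denominator; both are assumptions based on upstream documentation, and the function name is hypothetical.

```python
GI = 1024 ** 3


def burstable_oom_score_adj(memory_request_bytes: int, node_capacity_bytes: int) -> int:
    """Approximate oom_score_adj for a container in a Burstable pod."""
    raw = 1000 - (1000 * memory_request_bytes) // node_capacity_bytes
    # Clamp so Burstable always sits between Guaranteed and BestEffort.
    return min(max(2, raw), 999)


if __name__ == "__main__":
    node = 32 * GI
    for request in (0, 1 * GI, 8 * GI, 32 * GI):
        print(request // GI, "Gi ->", burstable_oom_score_adj(request, node))
```

A container requesting a quarter of the node's memory scores 750, while one with no memory request scores 999, just below BestEffort.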

Memory Limits → Hard OOM Kill

When a container's memory usage, as accounted by its cgroup, exceeds limits.memory, the kernel's OOM killer sends SIGKILL to a process in the container, which typically terminates the container with exit code 137. Unlike CPU throttling, this is immediate and unrecoverable: the container is killed and restarted by the kubelet (if restartPolicy allows).

# Check if container was OOM killed
kubectl get pod <pod> -o jsonpath='{.status.containerStatuses[*].lastState.terminated.reason}'
# → OOMKilled

kubectl describe pod <pod> | grep -A3 "Last State"
# Last State: Terminated
#   Reason: OOMKilled
#   Exit Code: 137

Memory limit = RSS + page cache working set
The cgroup memory accounting includes both anonymous RSS (heap, stack) and file-backed page cache. A container reading large files can be OOM killed even if its heap is small if the total memory.usage_in_bytes hits the limit. Set limits with headroom for file I/O working sets, not just heap.

QoS Classes

Kubernetes assigns one of three QoS classes to every pod based on its resource configuration. This class determines eviction order when nodes face memory pressure.

QoS Class	Criteria	OOM priority	Eviction order
Guaranteed	Every container has both cpu and memory requests AND limits, and requests == limits	oom_score_adj = -997 (protected; -998 in older releases)	Last to be evicted
Burstable	At least one container has a request or limit set, but not all Guaranteed criteria met	oom_score_adj 2–999 (proportional)	Middle — evicted before Guaranteed
BestEffort	No containers have any requests or limits set	oom_score_adj = 1000 (first killed)	First to be evicted

# Guaranteed QoS — requests must equal limits for ALL containers
containers:
  - name: app
    resources:
      requests:
        cpu: 500m
        memory: 512Mi
      limits:
        cpu: 500m       # ← must equal request
        memory: 512Mi   # ← must equal request

# Burstable QoS — requests < limits (or only some containers have them)
containers:
  - name: app
    resources:
      requests:
        cpu: 200m
        memory: 256Mi
      limits:
        cpu: 2
        memory: 1Gi     # Higher limits allow bursting

# BestEffort QoS — no resources at all (avoid in production)
containers:
  - name: app
    resources: {}       # No requests or limits

# Check QoS class of a pod
kubectl get pod <pod> -o jsonpath='{.status.qosClass}'
# → Guaranteed | Burstable | BestEffort
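
The classification rules in the table can be expressed as a small function. This is a simplified sketch, not the API server's implementation: it considers only cpu and memory, ignores init containers, and compares quantities as strings, so "500m" and "0.5" would be treated as different. The function name and input shape are assumptions for illustration.

```python
def qos_class(containers: list) -> str:
    """Classify a pod from its containers' {'requests': {...}, 'limits': {...}} dicts."""
    any_set = False
    guaranteed = True
    for c in containers:
        limits = c.get("limits") or {}
        # A missing request defaults to the limit, so merge limits first.
        requests = {**limits, **(c.get("requests") or {})}
        for res in ("cpu", "memory"):
            if res in requests or res in limits:
                any_set = True
            if res not in limits or requests.get(res) != limits[res]:
                guaranteed = False
    if not any_set:
        return "BestEffort"
    return "Guaranteed" if guaranteed else "Burstable"


if __name__ == "__main__":
    equal = {"requests": {"cpu": "500m", "memory": "512Mi"},
             "limits": {"cpu": "500m", "memory": "512Mi"}}
    burst = {"requests": {"cpu": "200m", "memory": "256Mi"},
             "limits": {"cpu": "2", "memory": "1Gi"}}
    print(qos_class([equal]), qos_class([burst]), qos_class([{}]))
```

Note that a single container without resources is enough to drop an otherwise Guaranteed pod to Burstable.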

Guaranteed QoS does not mean no CPU throttling
Even a Guaranteed pod (cpu request = limit) will be throttled by CFS quota when it exceeds its limit. Guaranteed QoS only controls OOM kill order and memory eviction priority, not CPU scheduling behavior.

Node Allocatable

Not all of a node's capacity is available for pods. The scheduler schedules pods against Allocatable, which reserves capacity for the OS, kubelet, and eviction headroom.

Node capacity breakdown:

Node capacity (e.g., 16 CPU, 64Gi RAM)
│
├─ kube-reserved (reserved for kubelet, container runtime)
│    e.g., 500m CPU, 1Gi memory
│
├─ system-reserved (reserved for OS daemons, sshd, etc.)
│    e.g., 500m CPU, 2Gi memory
│
├─ eviction-threshold (kubelet memory buffer)
│    e.g., 200Mi memory (hard); soft thresholds do not reduce allocatable
│
└─ Allocatable = capacity - kube-reserved - system-reserved - hard eviction threshold
     CPU:    16 CPU - 1 CPU = 15 CPU allocatable
     Memory: 64Gi - 1Gi - 2Gi - 200Mi ≈ 60.8Gi allocatable

kubectl describe node <node> | grep -A5 "Allocatable:"

# Check node capacity and allocatable
kubectl describe node <node> | grep -A10 "Capacity:\|Allocatable:"

# Check current resource consumption vs allocatable
kubectl describe node <node> | grep -A20 "Allocated resources:"

# Get allocatable across all nodes (JSON)
kubectl get nodes -o json | jq '.items[] | {
  name: .metadata.name,
  allocatable: .status.allocatable
}'

# kubelet configuration for reservations (in KubeletConfiguration)
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
kubeReserved:
  cpu: "500m"
  memory: "1Gi"
  ephemeral-storage: "2Gi"
systemReserved:
  cpu: "500m"
  memory: "2Gi"
evictionHard:
  memory.available: "200Mi"
  nodefs.available: "10%"
  nodefs.inodesFree: "5%"
evictionSoft:
  memory.available: "500Mi"
evictionSoftGracePeriod:
  memory.available: "90s"

LimitRange

LimitRange sets default, minimum, and maximum resource values for containers, pods, and PVCs within a namespace. It applies at admission time — pods created without explicit resource values receive the defaults.

apiVersion: v1
kind: LimitRange
metadata:
  name: platform-limits
  namespace: production
spec:
  limits:
    # Container-level defaults and bounds
    - type: Container
      default:              # Applied as limit if none specified
        cpu: "1"
        memory: 512Mi
      defaultRequest:       # Applied as request if none specified
        cpu: 100m
        memory: 128Mi
      min:                  # Reject containers with requests below this
        cpu: 50m
        memory: 64Mi
      max:                  # Reject containers with limits above this
        cpu: "8"
        memory: 16Gi
      maxLimitRequestRatio: # Reject if limit/request exceeds this ratio
        cpu: "10"           # Limit may be at most 10× the request
        memory: "4"

    # Pod-level (sum of all containers)
    - type: Pod
      max:
        cpu: "16"
        memory: 32Gi

    # PVC storage bounds
    - type: PersistentVolumeClaim
      min:
        storage: 1Gi
      max:
        storage: 100Gi

LimitRange defaults apply to containers without explicit resources
If a container specifies a limit but no request, the request is set equal to the limit. If it specifies a request but no limit, the default limit applies. If neither is specified, both default and defaultRequest apply. LimitRange does not retroactively change existing pods; it only affects pods created after the LimitRange exists.

# View effective LimitRange in a namespace
kubectl describe limitrange -n production

# Test what resources a new pod would get
kubectl run test --image=nginx --dry-run=server -o yaml -n production | \
  grep -A10 resources
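
The defaulting behavior for a single container can be modeled in a few lines. This is a simplified sketch under stated assumptions: it covers only cpu and memory, skips min/max/ratio validation, and treats quantities as opaque strings. The function name is hypothetical, and the default values mirror the platform-limits example above.

```python
def apply_limitrange_defaults(resources: dict, default_limit: dict,
                              default_request: dict) -> dict:
    """Return the effective requests/limits for one container after defaulting."""
    requests = dict(resources.get("requests", {}))
    limits = dict(resources.get("limits", {}))
    for res in ("cpu", "memory"):
        # An explicit limit without a request makes the request equal to the limit.
        if res in limits and res not in requests:
            requests[res] = limits[res]
        if res not in requests and res in default_request:
            requests[res] = default_request[res]
        if res not in limits and res in default_limit:
            limits[res] = default_limit[res]
    return {"requests": requests, "limits": limits}


if __name__ == "__main__":
    default = {"cpu": "1", "memory": "512Mi"}
    default_request = {"cpu": "100m", "memory": "128Mi"}
    print(apply_limitrange_defaults({}, default, default_request))
    print(apply_limitrange_defaults({"limits": {"cpu": "2"}}, default, default_request))
```

The second call shows the case that often surprises people: setting only a CPU limit of 2 yields a CPU request of 2, not the 100m default.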

ResourceQuota

ResourceQuota enforces aggregate resource consumption limits within a namespace. Unlike LimitRange (per-object bounds), ResourceQuota tracks cumulative usage across all objects.

apiVersion: v1
kind: ResourceQuota
metadata:
  name: production-quota
  namespace: production
spec:
  hard:
    # Compute resources
    requests.cpu: "20"
    requests.memory: 40Gi
    limits.cpu: "100"
    limits.memory: 200Gi

    # Object counts
    pods: "200"
    services: "50"
    secrets: "200"
    configmaps: "100"
    persistentvolumeclaims: "50"
    services.loadbalancers: "5"
    services.nodeports: "0"     # Prohibit NodePort services

    # Storage
    requests.storage: 2Ti
    requests.ephemeral-storage: 50Gi

    # Per-StorageClass storage quota
    gold-ssd.storageclass.storage.k8s.io/requests.storage: 500Gi
    standard.storageclass.storage.k8s.io/requests.storage: 1Ti
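
Conceptually, quota admission adds the new object's demand to current usage and rejects the request if any tracked resource would exceed its hard value. The sketch below illustrates that check with plain numbers (cores, counts); it is not the quota controller's code, and the function name and sample usage figures are hypothetical.

```python
def quota_violations(hard: dict, used: dict, delta: dict) -> list:
    """Return the quota keys that a new object would push past their hard limit."""
    return sorted(
        key for key, amount in delta.items()
        if key in hard and used.get(key, 0) + amount > hard[key]
    )


if __name__ == "__main__":
    hard = {"requests.cpu": 20, "limits.cpu": 100, "pods": 200}
    used = {"requests.cpu": 19.5, "limits.cpu": 40, "pods": 57}
    new_pod = {"requests.cpu": 1, "limits.cpu": 2, "pods": 1}
    print(quota_violations(hard, used, new_pod))
```

Here the pod is rejected on requests.cpu alone, even though the pod count and CPU limits are far from exhausted.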

Scoped Quotas

# Quota applying only to BestEffort pods (no resources set)
apiVersion: v1
kind: ResourceQuota
metadata:
  name: besteffort-quota
  namespace: production
spec:
  hard:
    pods: "10"
  scopeSelector:
    matchExpressions:
      - scopeName: BestEffort
        operator: Exists

---
# Quota for high-priority batch jobs
apiVersion: v1
kind: ResourceQuota
metadata:
  name: high-priority-quota
  namespace: platform
spec:
  hard:
    requests.cpu: "50"
    requests.memory: 100Gi
    pods: "50"
  scopeSelector:
    matchExpressions:
      - scopeName: PriorityClass
        operator: In
        values: ["high-priority"]

Scope	Matches pods that
`BestEffort`	Have no requests or limits (BestEffort QoS)
`NotBestEffort`	Have at least one request or limit (Guaranteed or Burstable)
`Terminating`	Have `activeDeadlineSeconds` set (Jobs)
`NotTerminating`	Do not have `activeDeadlineSeconds` (long-running workloads)
`PriorityClass`	Have a specific PriorityClass name

# Check quota usage in a namespace
kubectl describe resourcequota -n production

# Get quota as JSON for automation
kubectl get resourcequota production-quota -n production -o json | \
  jq '.status | {hard, used}'

# Watch quota consumption
watch kubectl get resourcequota -n production

Extended Resources

Extended resources represent non-standard hardware (GPUs, FPGAs, InfiniBand, SR-IOV NICs). They are advertised by node device plugins via the kubelet API and consumed in pod specs like CPU/memory.

NVIDIA GPU

# Pod requesting 1 NVIDIA GPU
spec:
  containers:
    - name: ml-trainer
      image: nvcr.io/nvidia/pytorch:23.10-py3
      resources:
        requests:
          nvidia.com/gpu: 1
        limits:
          nvidia.com/gpu: 1   # GPU resources: requests must equal limits

GPU resources must have requests == limits
Extended resources like GPUs are integer resources — they cannot be fractionally requested, and requests must always equal limits. Fractional GPU sharing (e.g., NVIDIA MIG, time-slicing) requires specific device plugin configurations that expose virtual GPU resources (e.g., nvidia.com/mig-1g.5gb).

# Check available GPU resources on nodes
kubectl get nodes -o json | jq '.items[] | {
  name: .metadata.name,
  gpus: .status.allocatable["nvidia.com/gpu"]
}'

# Check GPU allocation per pod (summed across containers)
kubectl get pods -A -o json | jq '
  .items[]
  | {
      name: .metadata.name,
      namespace: .metadata.namespace,
      gpus: ([.spec.containers[].resources.requests["nvidia.com/gpu"] // "0" | tonumber] | add)
    }
  | select(.gpus > 0)'

Custom Extended Resources

# Manually advertise an extended resource on a node (for testing)
kubectl proxy &
curl -X PATCH \
  "http://localhost:8001/api/v1/nodes/<node>/status" \
  -H "Content-Type: application/json-patch+json" \
  -d '[{"op":"add","path":"/status/capacity/example.com~1fpga","value":"2"}]'

Ephemeral Storage

Ephemeral storage includes emptyDir volumes, container logs, and container image layers written at runtime. Like CPU/memory, it can have requests and limits.

resources:
  requests:
    ephemeral-storage: 1Gi    # Scheduler reserves this on the node
  limits:
    ephemeral-storage: 5Gi    # Pod evicted if total ephemeral usage exceeds this

Ephemeral storage is measured as the sum of:

Writable container layer (overlay diff from image)
Container logs written to /var/log/pods/
emptyDir volumes (unless backed by tmpfs/memory)

Log-heavy containers can exhaust ephemeral storage
A container writing 100MB/s of logs with limits.ephemeral-storage: 2Gi exhausts its limit in roughly 20 seconds and is evicted unless the logs are rotated or shipped externally first. Set explicit log rotation (containerLogMaxSize and containerLogMaxFiles in the kubelet configuration for CRI runtimes, or --log-opt max-size=100m --log-opt max-file=5 for Docker) and forward logs to an external system rather than relying on ephemeral storage limits alone.
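
The time to reach an ephemeral storage limit is simple division, and working it out for your own write rates is a useful check. This sketch assumes a constant write rate with no rotation or external shipping; the function name and parameters are illustrative.

```python
GI = 1024 ** 3
MB = 1000 ** 2


def seconds_until_limit(limit_bytes: int, write_rate_bytes_per_s: float,
                        already_used_bytes: int = 0) -> float:
    """Seconds until usage reaches the limit at a constant write rate."""
    remaining = max(0, limit_bytes - already_used_bytes)
    return remaining / write_rate_bytes_per_s


if __name__ == "__main__":
    # 2Gi limit, 100MB/s of log output, nothing written yet.
    print(f"{seconds_until_limit(2 * GI, 100 * MB):.1f}s")
```

At 100MB/s a 2Gi limit lasts a little over 21 seconds, so the eviction arrives long before most alerting pipelines would notice.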

Pod Overhead (RuntimeClass)

Sandbox runtimes (gVisor, Kata Containers, Firecracker) introduce fixed overhead beyond what the containers request. The overhead field on a RuntimeClass declares this overhead, and the scheduler adds it to pod resource consumption.

apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: kata-containers
handler: kata
overhead:
  podFixed:
    cpu: 250m         # Fixed overhead per pod for the VM/sandbox runtime
    memory: 128Mi     # Included in scheduler placement decisions
scheduling:
  tolerations:
    - key: kata-containers
      operator: Exists
      effect: NoSchedule

# Pod using this RuntimeClass
spec:
  runtimeClassName: kata-containers
  containers:
    - name: app
      resources:
        requests:
          cpu: 500m      # Scheduler charges 500m + 250m overhead = 750m
          memory: 512Mi  # Scheduler charges 512Mi + 128Mi overhead = 640Mi
        limits:
          cpu: 2
          memory: 1Gi
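
The amount the scheduler charges for a sandboxed pod is the sum of its container requests plus the fixed overhead. A minimal sketch of that arithmetic follows, using millicores and MiB as plain integers; it ignores init containers, and the function and key names are hypothetical.

```python
def charged_requests(container_requests: list, overhead: dict) -> dict:
    """Sum container requests and add the RuntimeClass podFixed overhead once per pod."""
    totals = dict(overhead)
    for req in container_requests:
        for resource, amount in req.items():
            totals[resource] = totals.get(resource, 0) + amount
    return totals


if __name__ == "__main__":
    overhead = {"cpu_m": 250, "memory_mi": 128}   # podFixed from the RuntimeClass above
    app = {"cpu_m": 500, "memory_mi": 512}
    print(charged_requests([app], overhead))
```

Because the overhead is charged per pod rather than per container, packing more containers into one sandboxed pod amortizes it.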

Right-Sizing Workflow

Over-requesting wastes cluster capacity; under-requesting causes throttling or OOM. A systematic right-sizing workflow:

Right-sizing workflow:

Step 1: Observe current usage
  └─ kubectl top pods --containers -n <ns>
     OR Prometheus: container_cpu_usage_seconds_total, container_memory_working_set_bytes

Step 2: Compare to requests (over/under-provisioned?)
  └─ VPA Off mode creates recommendations in VPA.status
     OR Goldilocks dashboard shows current vs recommended

Step 3: Apply recommendations
  └─ Option A: Update Deployment manifest manually with VPA target values
     Option B: VPA Initial/Recreate/Auto mode applies automatically
     Option C: Use Goldilocks copy-paste YAML snippet

Step 4: Validate
  └─ Monitor throttling ratio (should be < 5% for latency-sensitive)
     Monitor OOM kill rate (should be 0)
     Monitor HPA behavior (requests affect utilization %)

Step 5: Iterate
  └─ Re-run after traffic pattern changes (seasonal, new features)

# Current pod resource usage vs requests
kubectl top pods --containers -n production

# Pods with at least one container lacking resource requests (risk: unknown scheduling behavior)
kubectl get pods -A -o json | jq '
  .items[] |
  select(any(.spec.containers[]; .resources.requests == null)) |
  {ns: .metadata.namespace, name: .metadata.name}'

# Namespace-level resource consumption summary
kubectl describe resourcequota -n production | grep -E "requests|limits"

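# Find containers where actual CPU < 20% of requested (over-provisioned)
# (requires Prometheus with cAdvisor and kube-state-metrics)
# promql:
# rate(container_cpu_usage_seconds_total{container!=""}[5m])
#   / on(namespace, pod, container) kube_pod_container_resource_requests{resource="cpu"} < 0.2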
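
When no recommender is available, a rough request can be derived from usage samples with a percentile plus headroom. The sketch below is a simple heuristic for illustration only; it is not the algorithm VPA or Goldilocks uses, and the function names, the 90th percentile, and the 15% headroom are all assumptions.

```python
def percentile(samples: list, pct: int) -> int:
    """Nearest-rank percentile of integer samples (pct is a whole number 1-100)."""
    ordered = sorted(samples)
    rank = max(1, -(-pct * len(ordered) // 100))  # ceiling division
    return ordered[rank - 1]


def recommend_request(samples_millicores: list, pct: int = 90,
                      headroom_pct: int = 15) -> int:
    """Suggest a CPU request: the chosen percentile plus headroom, rounded up."""
    base = percentile(samples_millicores, pct)
    return -(-base * (100 + headroom_pct) // 100)


if __name__ == "__main__":
    # Hypothetical per-minute CPU usage samples in millicores, including one burst.
    usage = [100, 120, 110, 400, 130, 125, 115, 105, 118, 122]
    print(recommend_request(usage), "m")
```

The single 400m burst is excluded by the percentile, which suits requests; a limit, if set, would need to accommodate that burst to avoid throttling.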

FinOps: Cost Attribution

Resource requests are the primary driver of infrastructure cost in Kubernetes — node sizing, autoscaling, and reserved instance purchasing all flow from request profiles. Proper cost attribution requires namespace/team labeling.

# Namespace with team labels for chargeback
apiVersion: v1
kind: Namespace
metadata:
  name: checkout-service
  labels:
    team: checkout
    cost-center: "CC-1042"
    environment: production

# List per-container requests by namespace and count identical profiles (rough cost proxy)
kubectl get pods -A -o json | jq -r '
  .items[] | .metadata.namespace as $ns | .spec.containers[] |
  "\($ns) \(.resources.requests.cpu // "0") \(.resources.requests.memory // "0")"' | \
  sort | uniq -c

# Tools for cost attribution:
# - Kubecost: per-namespace/pod/deployment cost breakdown
# - OpenCost (CNCF): open-source Kubecost alternative
# - Cloud provider cost allocation tags
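
For a quick chargeback estimate without a dedicated tool, per-container requests can be summed per namespace after converting Kubernetes quantities to numbers. This is a minimal sketch: the parsers handle only the common suffixes (m for CPU; Ki/Mi/Gi/Ti and k/M/G/T for memory), the function names are hypothetical, and the input is assumed to be (namespace, cpu, memory) tuples such as those produced by a kubectl and jq listing.

```python
from collections import defaultdict

# Two-letter binary suffixes are listed first so "Mi" is matched before "M".
_SUFFIXES = {"Ki": 1024, "Mi": 1024 ** 2, "Gi": 1024 ** 3, "Ti": 1024 ** 4,
             "k": 1000, "M": 1000 ** 2, "G": 1000 ** 3, "T": 1000 ** 4}


def parse_cpu(quantity: str) -> float:
    """Convert a CPU quantity such as '500m' or '2' to cores."""
    return float(quantity[:-1]) / 1000 if quantity.endswith("m") else float(quantity)


def parse_memory(quantity: str) -> float:
    """Convert a memory quantity such as '512Mi' or '1G' to bytes."""
    for suffix, factor in _SUFFIXES.items():
        if quantity.endswith(suffix):
            return float(quantity[:-len(suffix)]) * factor
    return float(quantity)


def requests_by_namespace(containers: list) -> dict:
    """Sum (cpu_cores, memory_bytes) per namespace from (ns, cpu, memory) tuples."""
    totals = defaultdict(lambda: [0.0, 0.0])
    for namespace, cpu, memory in containers:
        totals[namespace][0] += parse_cpu(cpu)
        totals[namespace][1] += parse_memory(memory)
    return {ns: tuple(values) for ns, values in totals.items()}


if __name__ == "__main__":
    rows = [("checkout", "500m", "512Mi"), ("checkout", "1", "1Gi"), ("search", "250m", "0")]
    for ns, (cores, mem) in requests_by_namespace(rows).items():
        print(f"{ns}: {cores} cores, {mem / 1024 ** 3:.2f}Gi")
```

Multiplying these totals by a per-core and per-GiB rate gives a first-order cost figure per team; dedicated tools refine it with actual usage and node pricing.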

Full Resource Spec Reference

containers:
  - name: app
    image: myapp:v2
    resources:
      requests:
        cpu: "500m"              # 0.5 vCPU shares
        memory: "512Mi"          # 512 mebibytes
        ephemeral-storage: "1Gi" # Ephemeral storage reservation
        hugepages-2Mi: "128Mi"   # Huge pages (if supported)
      limits:
        cpu: "2"                 # 2 vCPU hard cap (CFS quota)
        memory: "1Gi"            # 1 GiB hard cap (OOM kill)
        ephemeral-storage: "5Gi" # Ephemeral storage cap
        nvidia.com/gpu: "1"      # 1 GPU (extended resource)
        hugepages-2Mi: "128Mi"   # Huge pages (req must == limit)

Metrics

Metric	Labels	Use
`container_cpu_cfs_throttled_periods_total`	`container`, `pod`, `namespace`	CFS throttling periods — high ratio = CPU limit too tight
`container_memory_working_set_bytes`	`container`, `pod`, `namespace`	Actual memory in use (vs limit)
`kube_pod_container_resource_requests`	`resource`, `container`, `namespace`	Configured requests — denominator for utilization %
`kube_resourcequota`	`resource`, `type` (hard/used)	Quota utilization — alert when near limit
`kube_pod_container_status_last_terminated_reason`	`reason`=OOMKilled	OOM kill rate — should be 0 in steady state

Alerting Rules

groups:
  - name: resource-management
    rules:
      # CPU throttling exceeding 25%
      - alert: ContainerCPUThrottling
        expr: |
          rate(container_cpu_cfs_throttled_periods_total{container!=""}[5m])
          / rate(container_cpu_cfs_periods_total{container!=""}[5m]) > 0.25
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "{{ $labels.namespace }}/{{ $labels.pod }}/{{ $labels.container }} is CPU throttled >25%"
          description: "Increase CPU limit or reduce request/limit ratio"

      # Container OOM killed
      - alert: ContainerOOMKilled
        expr: |
          kube_pod_container_status_last_terminated_reason{reason="OOMKilled"} == 1
        for: 0m
        labels:
          severity: warning
        annotations:
          summary: "{{ $labels.namespace }}/{{ $labels.pod }}/{{ $labels.container }} OOM killed"

      # ResourceQuota near limit (>85% used)
      - alert: ResourceQuotaAlmostFull
        expr: |
          kube_resourcequota{type="used"} / kube_resourcequota{type="hard"} > 0.85
        for: 15m
        labels:
          severity: warning
        annotations:
          summary: "ResourceQuota {{ $labels.namespace }}/{{ $labels.resourcequota }} {{ $labels.resource }} at >85%"

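      # Pods with no resource requests (scheduling risk)
      # kube-state-metrics emits no request series for containers without a request,
      # so match containers that lack one rather than comparing to 0
      - alert: PodMissingResourceRequests
        expr: |
          kube_pod_container_info
          unless on(namespace, pod, container) kube_pod_container_resource_requests{resource="cpu"}
          unless on(namespace, pod) (kube_pod_status_phase{phase="Succeeded"} == 1)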
        for: 1h
        labels:
          severity: info
        annotations:
          summary: "{{ $labels.namespace }}/{{ $labels.pod }} has no CPU request"

Runbooks

Container Stuck in CrashLoopBackOff with OOMKilled

# Confirm OOM kill
kubectl describe pod <pod> -n <namespace> | grep -A5 "Last State"

# Check current memory limit
kubectl get pod <pod> -n <namespace> -o jsonpath=\
'{.spec.containers[*].resources.limits.memory}'

# Check historical usage (Prometheus)
# container_memory_working_set_bytes{pod="<pod>", container="app"}

# Fix: increase memory limit
kubectl set resources deployment <name> -n <namespace> \
  --containers=app --limits=memory=2Gi

# Or patch directly
kubectl patch deployment <name> -n <namespace> --type=json -p='[
  {"op":"replace","path":"/spec/template/spec/containers/0/resources/limits/memory","value":"2Gi"}
]'

Pod Pending Due to Insufficient Resources

# Check pod events
kubectl describe pod <pod> -n <namespace> | grep -A10 Events

# Check node capacity
kubectl describe nodes | grep -A5 "Allocatable:"

# Check what's consuming resources on each node
kubectl describe nodes | grep -A20 "Allocated resources:"

# Find the most resource-hungry pods
kubectl top pods -A --sort-by=memory | head -20

# Check if ResourceQuota is blocking (compare the Used and Hard columns)
kubectl describe resourcequota -n <namespace>

CPU Throttling Causing Latency Spikes

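# Check CPU usage per container (kubectl top does not report throttling)
kubectl top pods --containers -n <namespace>
# Read throttling counters directly in the container
# (cgroup v1 path; on cgroup v2 use /sys/fs/cgroup/cpu.stat):
kubectl exec -n <ns> <pod> -c <container> -- \
  cat /sys/fs/cgroup/cpu/cpu.stat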

# Quick fix: remove CPU limit (allow bursting)
kubectl patch deployment <name> -n <namespace> --type=json -p='[
  {"op":"remove","path":"/spec/template/spec/containers/0/resources/limits/cpu"}
]'

# Better fix: increase CPU limit to match observed usage
kubectl set resources deployment <name> -n <namespace> \
  --containers=app --limits=cpu=2

Namespace ResourceQuota Full

# See what's using the quota
kubectl describe resourcequota -n <namespace>

# Find over-provisioned pods in the namespace
kubectl top pods -n <namespace> --containers --sort-by=cpu

# Identify pods with high request but low usage
kubectl get pods -n <namespace> -o json | jq '
  .items[] | {
    name: .metadata.name,
    cpu_request: .spec.containers[].resources.requests.cpu,
    memory_request: .spec.containers[].resources.requests.memory
  }'

# Increase quota (requires cluster-admin)
kubectl patch resourcequota production-quota -n <namespace> --type=merge \
  -p '{"spec":{"hard":{"requests.cpu":"30","requests.memory":"60Gi"}}}'

LimitRange Rejecting Pod Creation

# Check LimitRange in namespace
kubectl describe limitrange -n <namespace>

# Error from pod creation:
# "pods maximum cpu usage per Container is 8, but limit is 16"
# Fix: reduce the container's limit or update LimitRange max

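# JSON patch edits one field; a merge patch would replace the whole limits list.
# Assumes the Container entry is the first item in spec.limits.
kubectl patch limitrange platform-limits -n <namespace> --type=json \
  -p '[{"op":"replace","path":"/spec/limits/0/max/cpu","value":"16"}]'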

Best Practices

Always set CPU and memory requests: pods with no requests or limits at all receive BestEffort QoS and are first to be evicted under node pressure. Without requests, the scheduler also cannot make intelligent placement decisions.
Set memory limits conservatively (p99.9 of observed usage) — OOM kills are disruptive but better than unbounded memory consumption starving other pods. Give 20–30% headroom above steady-state RSS.
Monitor CPU throttling, not just CPU usage — a container at 30% CPU utilization can still be throttled 80% of the time if its burst exceeds the limit. Check container_cpu_cfs_throttled_periods_total alongside utilization.
Use LimitRange to enforce defaults in every namespace — prevents pods deployed without resource specs from becoming BestEffort or consuming unbounded resources.
Apply ResourceQuota to every tenant namespace — without quotas, one team's bug (infinite loop, memory leak) can exhaust cluster capacity for all tenants.
Right-size with VPA Off mode before enabling VPA updates — use Goldilocks to get recommendations passively for 1–2 weeks, then apply them in a controlled rollout. Don't jump straight to Auto mode.
For latency-sensitive workloads, prefer Guaranteed QoS — set requests = limits to avoid OOM scoring disadvantage under memory pressure and make CPU scheduling predictable (though throttling still applies).
Include ephemeral-storage limits for log-heavy workloads — without limits, a container writing excessive logs can fill the node's disk and trigger eviction of all pods on that node.