Resource Management

    Requests, limits, QoS classes, LimitRanges, ResourceQuotas, and right-sizing

    v1/Pod v1/ResourceQuota v1/LimitRange Platform Engineer

    Resource management in Kubernetes operates at two distinct layers: the scheduler uses resource requests to decide where pods land, and the Linux kernel uses resource limits to constrain what running containers can consume. Understanding this separation — and the mechanisms behind each — is essential for building clusters that are both highly utilized and operationally stable.

    Requests vs Limits

    Resource model summary:

      spec.containers[].resources:
        requests:          ← Used by SCHEDULER for placement
          cpu: 500m        ← Reserve 0.5 CPU shares on the node
          memory: 512Mi    ← Reserve 512MiB for OOM scoring
        limits:            ← Used by KERNEL at runtime
          cpu: 2           ← CFS quota: max 2 CPU seconds per second
          memory: 1Gi      ← Hard limit: OOM kill if exceeded

      Scheduler sees:  sum(requests) ≤ node.allocatable
      Kernel enforces: container cannot exceed limits

      Overcommit:
        Node capacity:  8 CPU, 32Gi memory
        Total requests: 6 CPU, 24Gi  → fits on node ✓
        Total limits:   24 CPU, 96Gi → 3× overcommitted (allowed)
    Property | Requests | Limits
    Used by | kube-scheduler (pod placement) | Linux kernel cgroups (runtime enforcement)
    CPU mechanism | CFS cpu.shares (proportional) | CFS cpu.cfs_quota_us (hard cap)
    Memory mechanism | OOM score adjustment | cgroup memory.limit_in_bytes (hard kill)
    Effect if exceeded | Pod won't schedule (Pending) | CPU: throttled; Memory: OOM killed
    Required? | No (but strongly recommended) | No (but required for Guaranteed QoS)
    Node overcommit | Sum of requests ≤ allocatable | Sum of limits may far exceed capacity
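The overcommit arithmetic from the summary above can be checked with a few lines of shell. This is a minimal sketch: `overcommit_ratio` and `fits_on_node` are hypothetical helper names, not kubectl features, and both assume you have already summed requests and limits in a common unit (millicores or MiB).

```shell
# overcommit_ratio TOTAL_LIMITS CAPACITY
# Prints limits divided by capacity, to one decimal place.
overcommit_ratio() {
  awk -v limits="$1" -v capacity="$2" 'BEGIN { printf "%.1f\n", limits / capacity }'
}

# fits_on_node TOTAL_REQUESTS ALLOCATABLE
# Exit status 0 when the summed requests fit, mirroring the scheduler's check.
fits_on_node() {
  [ "$1" -le "$2" ]
}

# Values from the summary: 24 CPU of limits on an 8 CPU node, in millicores.
overcommit_ratio 24000 8000
```

The same functions work for memory if both arguments are expressed in MiB.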

    CPU: CFS Shares and Quota

    CPU Requests → CFS Shares

    CPU requests map to Linux CFS (Completely Fair Scheduler) cpu.shares. Shares are proportional — a pod with requests.cpu: 1000m gets twice as much CPU time as one with requests.cpu: 500m when the node is contended. When the node is idle, any container can use all available CPU regardless of its request.

    cpu.shares = request_millicores × 1024 / 1000
      → requests.cpu: 500m  → cpu.shares = 512
      → requests.cpu: 1     → cpu.shares = 1024
      → requests.cpu: 250m  → cpu.shares = 256
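As a quick way to reproduce the conversion above, the following sketch computes cpu.shares from a request in millicores. The function name `cpu_shares` is hypothetical, and the clamp to a minimum of 2 is an assumption based on the kernel's lower bound for cpu.shares.

```shell
# cpu_shares MILLICORES
# Prints the cgroup v1 cpu.shares value for a CPU request.
cpu_shares() {
  shares=$(( $1 * 1024 / 1000 ))
  # Assumption: values below the kernel minimum of 2 are raised to 2.
  if [ "$shares" -lt 2 ]; then
    shares=2
  fi
  echo "$shares"
}

for request in 250 500 1000; do
  echo "${request}m -> $(cpu_shares "$request")"
done
```

Because shares are only relative weights, the absolute numbers matter less than the ratios between containers on the same node.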

    CPU Limits → CFS Quota (Throttling Trap)

    CPU limits map to cpu.cfs_quota_us and cpu.cfs_period_us. Every 100ms (the default period), each container is allocated a quota of CPU time equal to its limit. If a container exhausts its quota before the period ends, it is throttled (suspended) until the next period — even if CPUs are otherwise idle.

    CFS quota mechanics (limit: 1 CPU = 100ms quota per 100ms period):

      Period 1 (0–100ms):
        Container's threads together consume 100ms of CPU time → quota exhausted at 80ms
        Container THROTTLED for the remaining 20ms (idle CPU wasted)
      Period 2 (100–200ms):
        Quota refills to 100ms
        Container resumes

      Result: p99 latency spikes every ~100ms even though the node has free CPU
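The quota arithmetic can be sketched in shell to see why multi-threaded containers are hit hardest. The helper names `cfs_quota_us` and `throttled_us` are hypothetical, and the model assumes every busy thread runs flat out on its own core, which is a simplification of real scheduling.

```shell
# cfs_quota_us LIMIT_MILLICORES [PERIOD_US]
# CPU time allowed per period, in microseconds (default period: 100ms).
cfs_quota_us() {
  period_us=${2:-100000}
  echo $(( $1 * period_us / 1000 ))
}

# throttled_us LIMIT_MILLICORES BUSY_THREADS [PERIOD_US]
# Wall-clock microseconds per period that the container spends throttled.
throttled_us() {
  period_us=${3:-100000}
  quota_us=$(cfs_quota_us "$1" "$period_us")
  run_us=$(( quota_us / $2 ))   # wall time until the quota is used up
  if [ "$run_us" -ge "$period_us" ]; then
    echo 0
  else
    echo $(( period_us - run_us ))
  fi
}

# A 1 CPU limit with 4 busy threads: quota gone after 25ms, paused for 75ms.
throttled_us 1000 4
```

Under this model a single busy thread with a 1 CPU limit is never throttled, while four busy threads spend three quarters of every period paused.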
    CPU throttling causes latency spikes, not errors
    A throttled container doesn't crash — it silently pauses. This manifests as p99/p999 latency spikes, timeout errors from downstream callers, and HPA confusion (CPU utilization appears low because throttled time doesn't count as "used"). Throttling is one of the most common and least-diagnosed performance issues in Kubernetes. Monitor container_cpu_cfs_throttled_periods_total and alert when throttling exceeds 25%.
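    # Detect CPU throttling for a container (cgroup v1 path)
    kubectl exec -n <ns> <pod> -- cat /sys/fs/cgroup/cpu/cpu.stat
    # On cgroup v2 nodes the file is /sys/fs/cgroup/cpu.stat
    # nr_periods: number of enforcement periods elapsed
    # nr_throttled: number of periods in which the container was throttled
    # throttled_time: nanoseconds spent throttled (throttled_usec, in microseconds, on cgroup v2)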
    
    # Prometheus query for throttling ratio
    # (throttled periods / total periods) per container
    rate(container_cpu_cfs_throttled_periods_total[5m])
    / rate(container_cpu_cfs_periods_total[5m])
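When Prometheus is not available, the same ratio can be derived from a single cpu.stat reading. This is a rough sketch: `throttle_percent` is a hypothetical helper that reads cpu.stat content on stdin and relies only on the nr_periods and nr_throttled fields. The counters are cumulative since container start, so the result is a lifetime average rather than a 5-minute rate.

```shell
# throttle_percent < cpu.stat
# Prints the percentage of enforcement periods in which the cgroup was throttled.
throttle_percent() {
  awk '
    $1 == "nr_periods"   { periods = $2 }
    $1 == "nr_throttled" { throttled = $2 }
    END {
      if (periods > 0) printf "%.1f\n", 100 * throttled / periods
      else             print "0.0"
    }'
}

# Sample input shaped like a cpu.stat file.
printf 'nr_periods 1000\nnr_throttled 250\nthrottled_time 4200000000\n' | throttle_percent
```

In practice you would pipe the output of the `kubectl exec ... cat` command above into the function.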

    No CPU Limit Pattern

    Some teams deliberately omit CPU limits to avoid throttling, relying on requests alone for scheduling. This is viable when:

    • Nodes run homogeneous workloads with predictable contention
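    • ResourceQuota caps aggregate CPU requests at the namespace level (requests.cpu); a quota on limits.cpu would force every pod to declare a CPU limit
    • LimitRange supplies a defaultRequest so every container still has a CPU request
    No CPU limit = unlimited burst potential
    Without CPU limits, one misbehaving container can consume all idle CPU on a node. CFS shares still give other containers their requested proportion under contention, but they lose the burst headroom they may have been relying on. If you run without per-pod CPU limits, make sure every container has a realistic CPU request and cap total requests per namespace with ResourceQuota.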

    Memory: OOM Scoring and Hard Limits

    Memory Requests → OOM Score

    Memory requests affect the OOM score adjustment (oom_score_adj) of the container's processes. A lower score means the OOM killer is less likely to kill that process when the node runs out of memory.

    OOM score adjustment range: -1000 (never kill) to +1000 (kill first)
    
      Guaranteed pods (req = limit):   oom_score_adj = -997  (protected)
      Burstable pods:                  oom_score_adj = 2–999  (proportional to memory)
      BestEffort pods (no requests):   oom_score_adj = 1000  (kill first)
    
      Formula (Burstable):
      oom_score_adj = min(max(2, 1000 - (1000 × memory_request / node_memory_capacity)), 999)
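The Burstable formula can be worked through with integer arithmetic. This is a sketch under two assumptions: the divisor is the node's total memory capacity, and the result is clamped to the 2–999 range, as the kubelet's documented behavior describes. `burstable_oom_score_adj` is a hypothetical helper name.

```shell
# burstable_oom_score_adj REQUEST_MIB NODE_MEMORY_MIB
# Prints the oom_score_adj a Burstable container would receive.
burstable_oom_score_adj() {
  adj=$(( 1000 - (1000 * $1) / $2 ))
  # Clamp so Burstable never overlaps Guaranteed or BestEffort values.
  if [ "$adj" -lt 2 ]; then
    adj=2
  fi
  if [ "$adj" -gt 999 ]; then
    adj=999
  fi
  echo "$adj"
}

# A 512Mi request on a 32Gi node.
burstable_oom_score_adj 512 32768
```

A larger memory request lowers the score, so containers that reserve more of the node are killed later than small ones that overshoot.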

    Memory Limits → Hard OOM Kill

    When a container's memory usage, as accounted by its cgroup, exceeds limits.memory, the kernel's OOM killer sends SIGKILL to a process in the container, and the container exits with code 137. Unlike CPU throttling, this is immediate and unrecoverable: the container is killed and restarted by kubelet (if restartPolicy allows).

    # Check if container was OOM killed
    kubectl get pod <pod> -o jsonpath='{.status.containerStatuses[*].lastState.terminated.reason}'
    # → OOMKilled
    
    kubectl describe pod <pod> | grep -A3 "Last State"
    # Last State: Terminated
    #   Reason: OOMKilled
    #   Exit Code: 137
    Memory limit = RSS + page cache working set
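    The cgroup memory accounting includes both anonymous memory (heap, stack) and file-backed page cache. The kernel reclaims clean page cache before it resorts to an OOM kill, but cache it cannot reclaim (dirty pages awaiting writeback, tmpfs and memory-backed emptyDir data, actively mapped files) still counts, so a container with a small heap can be OOM killed once total memory.usage_in_bytes hits the limit. Set limits with headroom for file I/O working sets, not just heap.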

    QoS Classes

    Kubernetes assigns one of three QoS classes to every pod based on its resource configuration. This class determines eviction order when nodes face memory pressure.

    QoS Class | Criteria | OOM priority | Eviction order
    Guaranteed | Every container has both cpu and memory requests AND limits, and requests == limits | oom_score_adj = -997 (protected) | Last to be evicted
    Burstable | At least one container has a request or limit set, but not all Guaranteed criteria met | oom_score_adj 2–999 (proportional) | Middle: evicted before Guaranteed
    BestEffort | No containers have any requests or limits set | oom_score_adj = 1000 (first killed) | First to be evicted
    # Guaranteed QoS — requests must equal limits for ALL containers
    containers:
      - name: app
        resources:
          requests:
            cpu: 500m
            memory: 512Mi
          limits:
            cpu: 500m       # ← must equal request
            memory: 512Mi   # ← must equal request
    
    # Burstable QoS — requests < limits (or only some containers have them)
    containers:
      - name: app
        resources:
          requests:
            cpu: 200m
            memory: 256Mi
          limits:
            cpu: 2
            memory: 1Gi     # Higher limits allow bursting
    
    # BestEffort QoS — no resources at all (avoid in production)
    containers:
      - name: app
        resources: {}       # No requests or limits
    # Check QoS class of a pod
    kubectl get pod <pod> -o jsonpath='{.status.qosClass}'
    # → Guaranteed | Burstable | BestEffort
    Guaranteed QoS does not mean no CPU throttling
    Even a Guaranteed pod (cpu request = limit) will be throttled by CFS quota when it exceeds its limit. Guaranteed QoS only controls OOM kill order and memory eviction priority, not CPU scheduling behavior.
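The classification rules can be expressed as a small decision function. This sketch covers a single-container pod only. `qos_class` is a hypothetical helper, and it compares quantities as plain strings, so `500m` and `0.5` would be treated as different values even though Kubernetes considers them equal.

```shell
# qos_class CPU_REQUEST CPU_LIMIT MEM_REQUEST MEM_LIMIT
# Pass an empty string for any value that is not set.
qos_class() {
  # A missing request is defaulted to the limit, as the API server does.
  cpu_req=${1:-$2}; cpu_lim=$2
  mem_req=${3:-$4}; mem_lim=$4
  if [ -z "$cpu_req$cpu_lim$mem_req$mem_lim" ]; then
    echo BestEffort
  elif [ -n "$cpu_lim" ] && [ -n "$mem_lim" ] &&
       [ "$cpu_req" = "$cpu_lim" ] && [ "$mem_req" = "$mem_lim" ]; then
    echo Guaranteed
  else
    echo Burstable
  fi
}

qos_class 500m 500m 512Mi 512Mi
qos_class 200m 2 256Mi 1Gi
qos_class "" "" "" ""
```

For multi-container pods the real rule requires every container to meet the Guaranteed criteria, so the pod-level class is the weakest of its containers.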

    Node Allocatable

    Not all of a node's capacity is available for pods. The scheduler schedules pods against Allocatable, which reserves capacity for the OS, kubelet, and eviction headroom.

    Node capacity breakdown:

      Node capacity (e.g., 16 CPU, 64Gi RAM)
      │
      ├─ kube-reserved (reserved for kubelet, container runtime)
      │    e.g., 500m CPU, 1Gi memory
      │
      ├─ system-reserved (reserved for OS daemons, sshd, etc.)
      │    e.g., 500m CPU, 2Gi memory
      │
      ├─ eviction-threshold (kubelet memory buffer; only the hard threshold is subtracted)
      │    e.g., 200Mi memory (hard)
      │
      └─ Allocatable = capacity - kube-reserved - system-reserved - hard eviction threshold
           CPU:    16 CPU - 1 CPU = 15 CPU allocatable
           Memory: 64Gi - 1Gi - 2Gi - 200Mi ≈ 60.8Gi allocatable

      kubectl describe node <node> | grep -A5 "Allocatable:"
    # Check node capacity and allocatable
    kubectl describe node <node> | grep -A10 "Capacity:\|Allocatable:"
    
    # Check current resource consumption vs allocatable
    kubectl describe node <node> | grep -A20 "Allocated resources:"
    
    # Get allocatable across all nodes (JSON)
    kubectl get nodes -o json | jq '.items[] | {
      name: .metadata.name,
      allocatable: .status.allocatable
    }'
    # kubelet configuration for reservations (in KubeletConfiguration)
    apiVersion: kubelet.config.k8s.io/v1beta1
    kind: KubeletConfiguration
    kubeReserved:
      cpu: "500m"
      memory: "1Gi"
      ephemeral-storage: "2Gi"
    systemReserved:
      cpu: "500m"
      memory: "2Gi"
    evictionHard:
      memory.available: "200Mi"
      nodefs.available: "10%"
      nodefs.inodesFree: "5%"
    evictionSoft:
      memory.available: "500Mi"
    evictionSoftGracePeriod:
      memory.available: "90s"
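To see what a given set of reservations leaves for pods, the subtraction can be scripted. This is a sketch: `allocatable` is a hypothetical helper, all arguments share one unit, and it assumes only the hard eviction threshold is deducted (soft thresholds trigger eviction but do not reduce allocatable).

```shell
# allocatable CAPACITY KUBE_RESERVED SYSTEM_RESERVED [EVICTION_HARD]
# All arguments in the same unit (MiB for memory, millicores for CPU).
allocatable() {
  echo $(( $1 - $2 - $3 - ${4:-0} ))
}

# Memory on a 64Gi node with the reservations configured above, in MiB.
allocatable 65536 1024 2048 200

# CPU on a 16-core node, in millicores (no CPU eviction threshold).
allocatable 16000 500 500
```

The memory result, 62264 MiB, is roughly 60.8Gi, and the CPU result is 15 cores.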

    LimitRange

    LimitRange sets default, minimum, and maximum resource values for containers, pods, and PVCs within a namespace. It applies at admission time — pods created without explicit resource values receive the defaults.

    apiVersion: v1
    kind: LimitRange
    metadata:
      name: platform-limits
      namespace: production
    spec:
      limits:
        # Container-level defaults and bounds
        - type: Container
          default:              # Applied as limit if none specified
            cpu: "1"
            memory: 512Mi
          defaultRequest:       # Applied as request if none specified
            cpu: 100m
            memory: 128Mi
          min:                  # Reject pods with requests below this
            cpu: 50m
            memory: 64Mi
          max:                  # Reject pods with limits above this
            cpu: "8"
            memory: 16Gi
          maxLimitRequestRatio: # Reject if limit/request exceeds this ratio
            cpu: "10"           # Reject if limit is more than 10× the request
            memory: "4"
    
        # Pod-level (sum of all containers)
        - type: Pod
          max:
            cpu: "16"
            memory: 32Gi
    
        # PVC storage bounds
        - type: PersistentVolumeClaim
          min:
            storage: 1Gi
          max:
            storage: 100Gi
    LimitRange defaults apply to containers without explicit resources
    If a container specifies a limit but no request, LimitRange sets the request equal to the limit. If neither is specified, both default and defaultRequest apply. LimitRange does not retroactively change existing pods — it only affects pods created after the LimitRange exists.
    # View effective LimitRange in a namespace
    kubectl describe limitrange -n production
    
    # Test what resources a new pod would get
    kubectl run test --image=nginx --dry-run=server -o yaml -n production | \
      grep -A10 resources
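The defaulting rules described in this section can be summarized as a short function. This is an illustrative sketch, not the admission controller's code: `effective_resources` is a hypothetical helper that handles one resource of one container and skips the min/max validation that follows defaulting.

```shell
# effective_resources REQUEST LIMIT DEFAULT_REQUEST DEFAULT_LIMIT
# Pass an empty string for a value the container does not set.
effective_resources() {
  request=$1
  limit=$2
  if [ -z "$limit" ]; then
    limit=$4                 # LimitRange "default" fills a missing limit
  fi
  if [ -z "$request" ]; then
    if [ -n "$2" ]; then
      request=$2             # explicit limit, no request: request copies the limit
    else
      request=$3             # neither set: LimitRange "defaultRequest" applies
    fi
  fi
  echo "request=$request limit=$limit"
}

# CPU values from the platform-limits example: defaultRequest 100m, default 1.
effective_resources "" "" 100m 1
effective_resources "" 2 100m 1
effective_resources 200m "" 100m 1
```

Comparing these results with the output of the dry-run command above is a quick way to confirm the LimitRange behaves as expected in your namespace.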

    ResourceQuota

    ResourceQuota enforces aggregate resource consumption limits within a namespace. Unlike LimitRange (per-object bounds), ResourceQuota tracks cumulative usage across all objects.

    apiVersion: v1
    kind: ResourceQuota
    metadata:
      name: production-quota
      namespace: production
    spec:
      hard:
        # Compute resources
        requests.cpu: "20"
        requests.memory: 40Gi
        limits.cpu: "100"
        limits.memory: 200Gi
    
        # Object counts
        pods: "200"
        services: "50"
        secrets: "200"
        configmaps: "100"
        persistentvolumeclaims: "50"
        services.loadbalancers: "5"
        services.nodeports: "0"     # Prohibit NodePort services
    
        # Storage
        requests.storage: 2Ti
        requests.ephemeral-storage: 50Gi
    
        # Per-StorageClass storage quota
        gold-ssd.storageclass.storage.k8s.io/requests.storage: 500Gi
        standard.storageclass.storage.k8s.io/requests.storage: 1Ti
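Because the quota tracks cumulative usage, admission comes down to a simple comparison per resource. The sketch below illustrates that check. `quota_admits` and `quota_used_pct` are hypothetical helpers, and the numbers are made-up millicore values measured against the `requests.cpu: "20"` entry above.

```shell
# quota_admits USED HARD NEW
# Exit status 0 when USED + NEW stays within HARD (same unit for all three).
quota_admits() {
  [ $(( $1 + $3 )) -le "$2" ]
}

# quota_used_pct USED HARD
# Prints usage as a whole-number percentage of the hard limit.
quota_used_pct() {
  echo $(( 100 * $1 / $2 ))
}

# 18 CPU already requested out of 20: a 1.5 CPU pod fits, a 2.5 CPU pod does not.
quota_admits 18000 20000 1500 && echo "1500m: admitted"
quota_admits 18000 20000 2500 || echo "2500m: rejected"
quota_used_pct 17000 20000
```

A pod is rejected if any single tracked resource would exceed its hard limit, so the check has to pass for every entry in spec.hard.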

    Scoped Quotas

    # Quota applying only to BestEffort pods (no resources set)
    apiVersion: v1
    kind: ResourceQuota
    metadata:
      name: besteffort-quota
      namespace: production
    spec:
      hard:
        pods: "10"
      scopeSelector:
        matchExpressions:
          - scopeName: BestEffort
            operator: Exists
    
    ---
    # Quota for high-priority batch jobs
    apiVersion: v1
    kind: ResourceQuota
    metadata:
      name: high-priority-quota
      namespace: platform
    spec:
      hard:
        requests.cpu: "50"
        requests.memory: 100Gi
        pods: "50"
      scopeSelector:
        matchExpressions:
          - scopeName: PriorityClass
            operator: In
            values: ["high-priority"]
    Scope | Matches pods that
    BestEffort | Have no requests or limits (BestEffort QoS)
    NotBestEffort | Have at least one request or limit (Guaranteed or Burstable)
    Terminating | Have activeDeadlineSeconds set (Jobs)
    NotTerminating | Do not have activeDeadlineSeconds (long-running workloads)
    PriorityClass | Have a specific PriorityClass name
    # Check quota usage in a namespace
    kubectl describe resourcequota -n production
    
    # Get quota as JSON for automation (hard vs used per resource)
    kubectl get resourcequota production-quota -n production -o json | \
      jq -r '.status as $s | $s.hard | to_entries[] | "\(.key): hard=\(.value), used=\($s.used[.key] // "0")"'
    
    # Watch quota consumption
    watch kubectl get resourcequota -n production

    Extended Resources

    Extended resources represent non-standard hardware (GPUs, FPGAs, InfiniBand, SR-IOV NICs). They are advertised by node device plugins via the kubelet API and consumed in pod specs like CPU/memory.

    NVIDIA GPU

    # Pod requesting 1 NVIDIA GPU
    spec:
      containers:
        - name: ml-trainer
          image: nvcr.io/nvidia/pytorch:23.10-py3
          resources:
            requests:
              nvidia.com/gpu: 1
            limits:
              nvidia.com/gpu: 1   # GPU resources: requests must equal limits
    GPU resources must have requests == limits
    Extended resources like GPUs are integer resources — they cannot be fractionally requested, and requests must always equal limits. Fractional GPU sharing (e.g., NVIDIA MIG, time-slicing) requires specific device plugin configurations that expose virtual GPU resources (e.g., nvidia.com/mig-1g.5gb).
    # Check available GPU resources on nodes
    kubectl get nodes -o json | jq '.items[] | {
      name: .metadata.name,
      gpus: .status.allocatable["nvidia.com/gpu"]
    }'
    
    # Check GPU allocation per pod (one entry per pod, GPUs listed per container)
    kubectl get pods -A -o json | jq '
      .items[]
      | select(any(.spec.containers[]; .resources.requests["nvidia.com/gpu"] != null))
      | {
          name: .metadata.name,
          namespace: .metadata.namespace,
          gpus: [.spec.containers[].resources.requests["nvidia.com/gpu"] // empty]
        }'

    Custom Extended Resources

    # Manually advertise an extended resource on a node (for testing)
    kubectl proxy &
    curl -X PATCH \
      "http://localhost:8001/api/v1/nodes/<node>/status" \
      -H "Content-Type: application/json-patch+json" \
      -d '[{"op":"add","path":"/status/capacity/example.com~1fpga","value":"2"}]'

    Ephemeral Storage

    Ephemeral storage includes emptyDir volumes, container logs, and container image layers written at runtime. Like CPU/memory, it can have requests and limits.

    resources:
      requests:
        ephemeral-storage: 1Gi    # Scheduler reserves this on the node
      limits:
        ephemeral-storage: 5Gi    # Pod evicted if total ephemeral usage exceeds this

    Ephemeral storage is measured as the sum of:

    • Writable container layer (overlay diff from image)
    • Container logs written to /var/log/pods/
    • emptyDir volumes (unless backed by tmpfs/memory)
    Log-heavy containers can exhaust ephemeral storage
    A container writing 100MB/s of logs with limits.ephemeral-storage: 2Gi can exceed its limit and be evicted in about 20 seconds unless log rotation caps the on-disk log size. Shipping logs to an external system does not free local disk by itself. Configure log rotation (containerLogMaxSize and containerLogMaxFiles in the kubelet configuration for CRI runtimes, or --log-opt max-size=100m --log-opt max-file=5 for Docker) and forward logs externally before relying on ephemeral storage limits.
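The back-of-the-envelope numbers behind this warning are easy to reproduce. This sketch uses two hypothetical helpers, `seconds_until_limit` and `max_log_mib`, and treats MB and MiB as interchangeable for the estimate.

```shell
# seconds_until_limit LIMIT_MIB WRITE_RATE_MIB_PER_S
# Whole seconds until unrotated output reaches the ephemeral-storage limit.
seconds_until_limit() {
  echo $(( $1 / $2 ))
}

# max_log_mib MAX_SIZE_MIB MAX_FILES
# Upper bound on disk used by one container's logs once rotation is in place.
max_log_mib() {
  echo $(( $1 * $2 ))
}

seconds_until_limit 2048 100   # 2Gi limit, 100 MiB/s of logs
max_log_mib 100 5              # 100 MiB per file, 5 files kept
```

With rotation at 100 MiB × 5 files the logs stay near 500 MiB, well under a 2Gi limit, whereas without rotation the limit is reached in about 20 seconds.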

    Pod Overhead (RuntimeClass)

    Sandbox runtimes (gVisor, Kata Containers, Firecracker) introduce fixed overhead beyond what the containers request. The overhead field on a RuntimeClass declares this overhead, and the scheduler adds it to pod resource consumption.

    apiVersion: node.k8s.io/v1
    kind: RuntimeClass
    metadata:
      name: kata-containers
    handler: kata
    overhead:
      podFixed:
        cpu: 250m         # Fixed overhead per pod for the VM/sandbox runtime
        memory: 128Mi     # Included in scheduler placement decisions
    scheduling:
      tolerations:
        - key: kata-containers
          operator: Exists
          effect: NoSchedule
    # Pod using this RuntimeClass
    spec:
      runtimeClassName: kata-containers
      containers:
        - name: app
          resources:
            requests:
              cpu: 500m      # Scheduler places on node with >= 750m available
              memory: 512Mi  # (500m + 250m overhead = 750m total charged)
            limits:
              cpu: 2
              memory: 1Gi
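The amount the scheduler charges for such a pod is the sum of its container requests plus the fixed overhead. The sketch below adds them up. `pod_charge` is a hypothetical helper, and it ignores init containers, which the real calculation also takes into account.

```shell
# pod_charge OVERHEAD REQUEST...
# Sums the RuntimeClass overhead and every container request (same unit).
pod_charge() {
  total=$1
  shift
  for r in "$@"; do
    total=$(( total + r ))
  done
  echo "$total"
}

pod_charge 250 500   # CPU in millicores: 250m overhead + 500m request
pod_charge 128 512   # memory in MiB: 128Mi overhead + 512Mi request
```

For the pod above this gives 750m CPU and 640Mi memory, which is what must be free on a node for the pod to schedule.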

    Right-Sizing Workflow

    Over-requesting wastes cluster capacity; under-requesting causes throttling or OOM. A systematic right-sizing workflow:

    Right-sizing workflow:

      Step 1: Observe current usage
        └─ kubectl top pods --containers -n <ns>
           OR Prometheus: container_cpu_usage_seconds_total, container_memory_working_set_bytes
      Step 2: Compare to requests (over/under-provisioned?)
        └─ VPA Off mode creates recommendations in VPA.status
           OR Goldilocks dashboard shows current vs recommended
      Step 3: Apply recommendations
        └─ Option A: Update Deployment manifest manually with VPA target values
           Option B: VPA Initial/Recreate/Auto mode applies automatically
           Option C: Use Goldilocks copy-paste YAML snippet
      Step 4: Validate
        └─ Monitor throttling ratio (should be < 5% for latency-sensitive)
           Monitor OOM kill rate (should be 0)
           Monitor HPA behavior (requests affect utilization %)
      Step 5: Iterate
        └─ Re-run after traffic pattern changes (seasonal, new features)
    # Current pod resource usage vs requests
    kubectl top pods --containers -n production
    
    # Pods with at least one container lacking resource requests (risk: unknown scheduling behavior)
    kubectl get pods -A -o json | jq '
      .items[] |
      select(any(.spec.containers[]; .resources.requests == null)) |
      {ns: .metadata.namespace, name: .metadata.name}'
    
    # Namespace-level resource consumption summary
    kubectl describe resourcequota -n production | grep -E "requests|limits"
    
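    # Find containers where actual CPU < 20% of requested (over-provisioned)
    # (requires Prometheus with cAdvisor and kube-state-metrics)
    # promql:
    # sum by (namespace, pod, container) (rate(container_cpu_usage_seconds_total{container!=""}[5m]))
    #   / sum by (namespace, pod, container) (kube_pod_container_resource_requests{resource="cpu"}) < 0.2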
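Without Prometheus, the same over-provisioning check can be approximated from a usage snapshot. This sketch assumes you have already produced lines of the form `name usage_millicores request_millicores` (for example by joining `kubectl top` output with the pod specs). `flag_overprovisioned` is a hypothetical helper that reads those lines on stdin.

```shell
# flag_overprovisioned THRESHOLD_PCT < "name usage_m request_m" lines
# Prints containers whose usage is below the threshold percentage of their request.
flag_overprovisioned() {
  awk -v threshold="$1" '
    $3 > 0 {                       # skip containers with no CPU request
      pct = 100 * $2 / $3
      if (pct < threshold) printf "%s %.0f%%\n", $1, pct
    }'
}

printf 'api 80 500\nworker 400 500\ncache 10 0\n' | flag_overprovisioned 20
```

A single snapshot can mislead for bursty workloads, so treat the output as a list of candidates to examine over a longer window rather than a verdict.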

    FinOps: Cost Attribution

    Resource requests are the primary driver of infrastructure cost in Kubernetes — node sizing, autoscaling, and reserved instance purchasing all flow from request profiles. Proper cost attribution requires namespace/team labeling.

    # Namespace with team labels for chargeback
    apiVersion: v1
    kind: Namespace
    metadata:
      name: checkout-service
      labels:
        team: checkout
        cost-center: "CC-1042"
        environment: production
    # Count containers per (namespace, CPU request, memory request) combination (rough cost proxy)
    kubectl get pods -A -o json | jq -r '
      .items[] | .metadata.namespace as $ns | .spec.containers[] |
      "\($ns) \(.resources.requests.cpu // "0") \(.resources.requests.memory // "0")"' | \
      sort | uniq -c
    
    # Tools for cost attribution:
    # - Kubecost: per-namespace/pod/deployment cost breakdown
    # - OpenCost (CNCF): open-source Kubecost alternative
    # - Cloud provider cost allocation tags
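For a simple chargeback figure, per-container CPU requests can be summed per namespace. This sketch assumes input lines of the form `namespace cpu_request`, where the request is either whole or fractional cores (`2`, `0.5`) or millicores (`500m`). `sum_cpu_requests_by_namespace` is a hypothetical helper, not part of any cost tool.

```shell
# sum_cpu_requests_by_namespace < "namespace cpu_request" lines
# Prints one line per namespace with its total CPU request in millicores.
sum_cpu_requests_by_namespace() {
  awk '
    {
      cpu = $2
      if (cpu ~ /m$/) { sub(/m$/, "", cpu); millis = cpu + 0 }
      else            { millis = cpu * 1000 }
      total[$1] += millis
    }
    END { for (ns in total) printf "%s %dm\n", ns, total[ns] }' | sort
}

printf 'checkout 500m\ncheckout 1\npayments 250m\n' | sum_cpu_requests_by_namespace
```

Multiplying each total by an assumed per-core price gives a rough per-team figure, although dedicated tools also account for memory, storage, and idle capacity.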

    Full Resource Spec Reference

    containers:
      - name: app
        image: myapp:v2
        resources:
          requests:
            cpu: "500m"              # 0.5 vCPU shares
            memory: "512Mi"          # 512 mebibytes
            ephemeral-storage: "1Gi" # Ephemeral storage reservation
            hugepages-2Mi: "128Mi"   # Huge pages (if supported)
          limits:
            cpu: "2"                 # 2 vCPU hard cap (CFS quota)
            memory: "1Gi"            # 1 GiB hard cap (OOM kill)
            ephemeral-storage: "5Gi" # Ephemeral storage cap
            nvidia.com/gpu: "1"      # 1 GPU (extended resource)
            hugepages-2Mi: "128Mi"   # Huge pages (req must == limit)

    Metrics

    Metric | Labels | Use
    container_cpu_cfs_throttled_periods_total | container, pod, namespace | CFS throttling periods; high ratio = CPU limit too tight
    container_memory_working_set_bytes | container, pod, namespace | Actual memory in use (vs limit)
    kube_pod_container_resource_requests | resource, container, namespace | Configured requests; denominator for utilization %
    kube_resourcequota | resource, type (hard/used) | Quota utilization; alert when near limit
    kube_pod_container_status_last_terminated_reason | reason=OOMKilled | OOM kill rate; should be 0 in steady state

    Alerting Rules

    groups:
      - name: resource-management
        rules:
          # CPU throttling exceeding 25%
          - alert: ContainerCPUThrottling
            expr: |
              rate(container_cpu_cfs_throttled_periods_total{container!=""}[5m])
              / rate(container_cpu_cfs_periods_total{container!=""}[5m]) > 0.25
            for: 5m
            labels:
              severity: warning
            annotations:
              summary: "{{ $labels.namespace }}/{{ $labels.pod }}/{{ $labels.container }} is CPU throttled >25%"
              description: "Increase CPU limit or reduce request/limit ratio"
    
          # Container OOM killed
          - alert: ContainerOOMKilled
            expr: |
              kube_pod_container_status_last_terminated_reason{reason="OOMKilled"} == 1
            for: 0m
            labels:
              severity: warning
            annotations:
              summary: "{{ $labels.namespace }}/{{ $labels.pod }}/{{ $labels.container }} OOM killed"
    
          # ResourceQuota near limit (>85% used)
          - alert: ResourceQuotaAlmostFull
            expr: |
              kube_resourcequota{type="used"}
              / ignoring(type) (kube_resourcequota{type="hard"} > 0) > 0.85
            for: 15m
            labels:
              severity: warning
            annotations:
              summary: "ResourceQuota {{ $labels.namespace }}/{{ $labels.resourcequota }} {{ $labels.resource }} at >85%"
    
          # Containers with no CPU request (scheduling risk)
          - alert: PodMissingResourceRequests
            expr: |
              kube_pod_container_info
              unless on(namespace, pod, container) kube_pod_container_resource_requests{resource="cpu"}
              unless on(namespace, pod) kube_pod_status_phase{phase="Succeeded"} == 1
            for: 1h
            labels:
              severity: info
            annotations:
              summary: "{{ $labels.namespace }}/{{ $labels.pod }} has no CPU request"

    Runbooks

    Container Stuck in CrashLoopBackOff with OOMKilled

    # Confirm OOM kill
    kubectl describe pod <pod> -n <namespace> | grep -A5 "Last State"
    
    # Check current memory limit
    kubectl get pod <pod> -n <namespace> \
      -o jsonpath='{.spec.containers[*].resources.limits.memory}'
    
    # Check historical usage (Prometheus)
    # container_memory_working_set_bytes{pod="<pod>", container="app"}
    
    # Fix: increase memory limit
    kubectl set resources deployment <name> -n <namespace> \
      --containers=app --limits=memory=2Gi
    
    # Or patch directly
    kubectl patch deployment <name> -n <namespace> --type=json -p='[
      {"op":"replace","path":"/spec/template/spec/containers/0/resources/limits/memory","value":"2Gi"}
    ]'

    Pod Pending Due to Insufficient Resources

    # Check pod events
    kubectl describe pod <pod> -n <namespace> | grep -A10 Events
    
    # Check node capacity
    kubectl describe nodes | grep -A5 "Allocatable:"
    
    # Check what's consuming resources on each node
    kubectl describe nodes | grep -A20 "Allocated resources:"
    
    # Find the most resource-hungry pods
    kubectl top pods -A --sort-by=memory | head -20
    
    # Check if ResourceQuota is blocking (compare the Used and Hard columns)
    kubectl describe resourcequota -n <namespace>

    CPU Throttling Causing Latency Spikes

    # kubectl top shows usage only, not throttling
    kubectl top pods --containers -n <namespace>
    # Check throttling counters in the container (cgroup v1 path;
    # on cgroup v2 nodes read /sys/fs/cgroup/cpu.stat instead):
    kubectl exec -n <ns> <pod> -c <container> -- \
      cat /sys/fs/cgroup/cpu/cpu.stat
    
    # Quick fix: remove CPU limit (allow bursting)
    kubectl patch deployment <name> -n <namespace> --type=json -p='[
      {"op":"remove","path":"/spec/template/spec/containers/0/resources/limits/cpu"}
    ]'
    
    # Better fix: increase CPU limit to match observed usage
    kubectl set resources deployment <name> -n <namespace> \
      --containers=app --limits=cpu=2

    Namespace ResourceQuota Full

    # See what's using the quota
    kubectl describe resourcequota -n <namespace>
    
    # Find over-provisioned pods in the namespace
    kubectl top pods -n <namespace> --containers --sort-by=cpu
    
    # List per-container requests to compare against the usage above
    kubectl get pods -n <namespace> -o json | jq '
      .items[] | .metadata.name as $pod | .spec.containers[] | {
        pod: $pod,
        container: .name,
        cpu_request: .resources.requests.cpu,
        memory_request: .resources.requests.memory
      }'
    
    # Increase quota (requires cluster-admin)
    kubectl patch resourcequota production-quota -n <namespace> --type=merge \
      -p '{"spec":{"hard":{"requests.cpu":"30","requests.memory":"60Gi"}}}'

    LimitRange Rejecting Pod Creation

    # Check LimitRange in namespace
    kubectl describe limitrange -n <namespace>
    
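    # Error from pod creation:
    # "pods maximum cpu usage per Container is 8, but limit is 16"
    # Fix: reduce the container's limit or update LimitRange max
    
    # A merge patch would replace the whole limits list, so patch only the one field
    # (assumes the Container entry is the first item in spec.limits)
    kubectl patch limitrange platform-limits -n <namespace> --type=json \
      -p '[{"op":"replace","path":"/spec/limits/0/max/cpu","value":"16"}]'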

    Best Practices

    1. Always set CPU and memory requests — without requests, pods receive BestEffort QoS and are first to be evicted under node pressure. The scheduler also cannot make intelligent placement decisions.
    2. Set memory limits conservatively (p99.9 of observed usage) — OOM kills are disruptive but better than unbounded memory consumption starving other pods. Give 20–30% headroom above steady-state RSS.
    3. Monitor CPU throttling, not just CPU usage — a container at 30% CPU utilization can still be throttled 80% of the time if its burst exceeds the limit. Check container_cpu_cfs_throttled_periods_total alongside utilization.
    4. Use LimitRange to enforce defaults in every namespace — prevents pods deployed without resource specs from becoming BestEffort or consuming unbounded resources.
    5. Apply ResourceQuota to every tenant namespace — without quotas, one team's bug (infinite loop, memory leak) can exhaust cluster capacity for all tenants.
    6. Right-size with VPA Off mode before enabling VPA updates — use Goldilocks to get recommendations passively for 1–2 weeks, then apply them in a controlled rollout. Don't jump straight to Auto mode.
    7. For latency-sensitive workloads, prefer Guaranteed QoS — set requests = limits to avoid OOM scoring disadvantage under memory pressure and make CPU scheduling predictable (though throttling still applies).
    8. Include ephemeral-storage limits for log-heavy workloads — without limits, a container writing excessive logs can fill the node's disk and trigger eviction of all pods on that node.
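The headroom guidance in practice 2 can be turned into a one-line calculation. This is a sketch only: `limit_with_headroom` is a hypothetical helper, and the peak value is whatever you observed from container_memory_working_set_bytes over a representative window.

```shell
# limit_with_headroom OBSERVED_PEAK_MIB HEADROOM_PCT
# Prints a suggested memory limit in MiB, rounded down.
limit_with_headroom() {
  echo $(( $1 * (100 + $2) / 100 ))
}

limit_with_headroom 800 25   # 800Mi observed peak with 25% headroom
```

The result can be passed to `kubectl set resources` as the new `--limits=memory=` value, with a suffix such as `1000Mi`.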