Continuous Profiling

1. What Is Continuous Profiling?

Profiling answers the question "where is time and memory being spent?", which metrics, logs, and traces cannot fully answer. Metrics tell you a service is slow. Traces narrow it to a span or function call. Profiling shows which function, often down to the exact line of code, consumed (for example) 37% of CPU.

Continuous profiling means collecting profiles in production constantly, not just during a debug session. Profiles are stored indexed by service/version/time, enabling comparison across deploys and correlation with incidents.

Metrics tell you what

CPU 90%, p99 latency 2s, memory 4 GiB. Aggregate, low-cardinality, no code-level detail.

prometheus

Traces tell you where

Request took 2s: 1.8s in db.Query(). Span-level, single-request granularity.

tempo / jaeger

Profiles tell you why

db.Query() spent 1.4s in JSON serialisation of a 50-field struct. Function-level hot path.

pyroscope / pprof

Continuous vs On-Demand Profiling

Dimension | On-Demand | Continuous
Trigger | Manual (incident or debug session) | Always running, timed samples
Overhead | High during collection (can be >10%) | Low constant overhead (0.5–2%)
Reproducibility | Difficult: must reproduce the issue | Profiles captured during the original event
Historical comparison | None | Compare any two timestamps or commits
Kubernetes fit | Poor: pods are ephemeral | Excellent: samples labelled by pod/node

The Observability Stack with Profiling

Alert fires (Alertmanager)
  │
  ▼
Metric spike  ──────── Grafana / Prometheus
  │  (exemplar)
  ▼
Trace (Tempo / Jaeger) — which function was slow?
  │  (profile_id link)
  ▼
Profile (Pyroscope) — which line consumed CPU/memory?
  │
  ▼
Code fix + deploy — compare profiles before/after

2. Profile Types Reference

Profile Type | What It Measures | Languages | Typical Use Case
cpu | CPU time consumed by each function (sampled at 100 Hz) | Go, Java, Python, Rust, .NET, Ruby | High CPU usage, latency regression
heap | Live heap allocations (bytes allocated / objects in use) | Go, Java (heap dump), .NET | Memory leak, OOM root cause
allocs | All memory allocations including those already freed | Go | GC pressure, excessive short-lived objects
goroutine | Stack traces of all current goroutines | Go | Goroutine leak, deadlock investigation
mutex | Contended mutex lock wait time | Go | Lock contention under concurrency
block | Time goroutines spend blocked on synchronisation (channel operations, select, sync primitives) | Go | Channel stalls, goroutines waiting on each other
wall | Wall-clock time (CPU + I/O wait) | Go, Java (async-profiler), eBPF | Distinguishing I/O-bound vs CPU-bound work
threadcreate | Stack traces that led to OS thread creation | Go | Excessive cgo or syscall thread spawning
eBPF CPU | CPU samples from OS kernel (zero instrumentation) | Any (Go/C/C++/Rust/Java) | Cross-language profiling, kernel overhead

Sampling Profilers vs Instrumented Profilers

Sampling profilers (pprof, async-profiler, eBPF) interrupt the process at a fixed rate (e.g., 100 Hz) and record the call stack. Overhead is proportional to sample rate. Instrumented profilers (JVM TI, .NET ETW) inject code at every method entry/exit — exact counts but 5–20× overhead. Always prefer sampling for production continuous profiling.

3. Go: pprof Endpoints & Analysis

Enabling the pprof HTTP Server

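// main.go: serve net/http/pprof handlers on a separate port
package main

import (
    "log"
    "net/http"
    _ "net/http/pprof" // registers /debug/pprof/* on http.DefaultServeMux as a side effect
)

func main() {
    // pprof on :6060, served from http.DefaultServeMux
    go func() {
        log.Println(http.ListenAndServe(":6060", nil))
    }()

    // Application server on :8080 uses its own mux, so pprof is not reachable here
    app := http.NewServeMux()
    // ... register application handlers on app ...
    log.Fatal(http.ListenAndServe(":8080", app))
}

Never expose pprof on the public-facing port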

pprof endpoints reveal heap contents, goroutine stacks (may contain request data), and allow CPU load spikes. Bind to a separate port (e.g., :6060) and restrict via NetworkPolicy to only the profiling scraper DaemonSet or admin namespace.

pprof HTTP Endpoints

Endpoint | Profile | Duration
/debug/pprof/profile?seconds=30 | CPU (30s sample) | Optional seconds param (defaults to 30)
/debug/pprof/heap | Heap (live allocations) | Instant snapshot
/debug/pprof/allocs | All allocations since start | Instant snapshot
/debug/pprof/goroutine | All goroutine stacks | Instant snapshot
/debug/pprof/mutex | Mutex contention | Instant (requires mutex fraction)
/debug/pprof/block | Block/channel waits | Instant (requires block rate)
/debug/pprof/threadcreate | Thread creation stack traces | Instant snapshot
/debug/pprof/trace?seconds=5 | Go runtime execution trace | Optional seconds param (defaults to 1)

Enabling Mutex and Block Profiling

import "runtime"

func init() {
    // Report on average 1 in 5 mutex contention events (fraction = 5)
    runtime.SetMutexProfileFraction(5)
    // Record every blocking event (rate = 1: one sample per nanosecond spent blocked)
    runtime.SetBlockProfileRate(1)
}
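To see what these two settings produce, the following self-contained sketch creates some artificial lock contention and then dumps the resulting mutex and block profiles in text form. The `contend` and `dumpProfile` helpers are hypothetical names for this illustration, and the fraction of 1 is chosen only so the short demo records every event.

```go
package main

import (
	"bytes"
	"fmt"
	"runtime"
	"runtime/pprof"
	"sync"
	"time"
)

// contend makes several goroutines fight over one mutex so that
// contention events exist to be sampled.
func contend(workers int) {
	var mu sync.Mutex
	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for j := 0; j < 50; j++ {
				mu.Lock()
				time.Sleep(100 * time.Microsecond) // hold the lock so others wait
				mu.Unlock()
			}
		}()
	}
	wg.Wait()
}

// dumpProfile renders a named runtime profile ("mutex" or "block")
// in human-readable form (debug=1).
func dumpProfile(name string) (string, error) {
	p := pprof.Lookup(name)
	if p == nil {
		return "", fmt.Errorf("unknown profile %q", name)
	}
	var buf bytes.Buffer
	if err := p.WriteTo(&buf, 1); err != nil {
		return "", err
	}
	return buf.String(), nil
}

func main() {
	prev := runtime.SetMutexProfileFraction(1) // sample every contention event for the demo
	defer runtime.SetMutexProfileFraction(prev)
	runtime.SetBlockProfileRate(1)
	defer runtime.SetBlockProfileRate(0)

	contend(4)

	for _, name := range []string{"mutex", "block"} {
		out, err := dumpProfile(name)
		if err != nil {
			fmt.Println(err)
			continue
		}
		fmt.Printf("%s profile: %d bytes\n", name, len(out))
	}
}
```

With both settings left at their defaults (0), the same endpoints still respond but the profiles contain no contention records, which is a common source of confusion.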

Collecting and Analysing Profiles with go tool pprof

# Download CPU profile (30s sample) from a pod
kubectl port-forward pod/myapp-7d9f8b6c4-xk2p9 6060:6060 &

# CPU profile
go tool pprof "http://localhost:6060/debug/pprof/profile?seconds=30"

# Heap profile
go tool pprof http://localhost:6060/debug/pprof/heap

# Interactive pprof commands:
(pprof) top10              # top 10 functions by flat/cumulative time
(pprof) web                # open call graph in browser (requires graphviz)
(pprof) list FunctionName  # annotated source listing
(pprof) traces             # sample-level call stacks
(pprof) tree               # tree view of cumulative costs
(pprof) svg                # output SVG call graph

# Save for later comparison (-proto writes the profile in compressed protobuf format):
go tool pprof -proto -output before.pb.gz http://localhost:6060/debug/pprof/heap
# After deploy:
go tool pprof -proto -output after.pb.gz http://localhost:6060/debug/pprof/heap
# Diff:
go tool pprof -diff_base before.pb.gz after.pb.gz

Continuous pprof Collection via Pyroscope SDK (Go)

import (
    "log"
    "os"

    "github.com/grafana/pyroscope-go"
)

func initProfiling() {
    // Start returns a profiler handle and an error; do not ignore the error
    _, err := pyroscope.Start(pyroscope.Config{
        ApplicationName: "order-service",
        ServerAddress:   "http://pyroscope.observability.svc:4040",
        Logger:          pyroscope.StandardLogger,
        Tags: map[string]string{
            "pod":       os.Getenv("POD_NAME"),
            "namespace": os.Getenv("POD_NAMESPACE"),
            "version":   os.Getenv("APP_VERSION"),
        },
        ProfileTypes: []pyroscope.ProfileType{
            pyroscope.ProfileCPU,
            pyroscope.ProfileAllocObjects,
            pyroscope.ProfileAllocSpace,
            pyroscope.ProfileInuseObjects,
            pyroscope.ProfileInuseSpace,
            pyroscope.ProfileGoroutines,
            // Mutex and block profiles stay empty unless runtime.SetMutexProfileFraction
            // and runtime.SetBlockProfileRate have been set (see above)
            pyroscope.ProfileMutexCount,
            pyroscope.ProfileMutexDuration,
            pyroscope.ProfileBlockCount,
            pyroscope.ProfileBlockDuration,
        },
    })
    if err != nil {
        log.Printf("pyroscope: continuous profiling disabled: %v", err)
    }
}

Labels for Per-Request Profiling

Pyroscope labels (and pprof runtime labels) allow attributing CPU time to a specific tenant, endpoint, or user — without separate profiling runs.

import (
    "context"
    "net/http"

    "github.com/grafana/pyroscope-go"
)

func handleRequest(w http.ResponseWriter, r *http.Request) {
    tenantID := r.Header.Get("X-Tenant-ID")
    endpoint := r.URL.Path

    // Dynamic labels — profile data is segmented by these in Pyroscope
    pyroscope.TagWrapper(r.Context(), pyroscope.Labels(
        "tenant_id", tenantID,
        "endpoint", endpoint,
    ), func(ctx context.Context) {
        // All CPU time while this closure executes is tagged
        processRequest(ctx, w, r)
    })
}

4. Language-Specific Profilers

Java: async-profiler & JFR

async-profiler is an async-safe sampling profiler for the JVM. It uses the AsyncGetCallTrace API, which avoids the safepoint bias present in older profilers (such as YourKit in sampling mode), and perf_events for CPU samples.

# To attach async-profiler to a running JVM (via agent), first ship it in the image.
# Dockerfile step shown here; downloading it in a K8s initContainer also works.
RUN wget https://github.com/async-profiler/async-profiler/releases/download/v3.0/async-profiler-3.0-linux-x64.tar.gz

# Kubernetes: mount via initContainer, attach via profiler container
# Or use Pyroscope Java agent for continuous collection:
java -javaagent:/opt/pyroscope.jar \
     -Dpyroscope.server.address=http://pyroscope.observability.svc:4040 \
     -Dpyroscope.application.name=payment-service \
     -Dpyroscope.format=jfr \
     -Dpyroscope.profiler.event=cpu \
     -jar app.jar

# application.properties for Pyroscope Spring Boot integration
pyroscope.server.address=http://pyroscope.observability.svc:4040
pyroscope.application.name=${spring.application.name}
pyroscope.format=jfr
pyroscope.profiling.interval=10ms
pyroscope.labels.pod=${POD_NAME}
pyroscope.labels.namespace=${POD_NAMESPACE}

JVM Flight Recorder (JFR) — Built-in since JDK 11

# Start JFR recording in a running pod
kubectl exec -it pod/payment-svc-abc123 -- \
  jcmd 1 JFR.start duration=60s filename=/tmp/recording.jfr settings=profile

# Copy and analyse with JDK Mission Control
kubectl cp payment-svc-abc123:/tmp/recording.jfr ./recording.jfr
jmc -open recording.jfr

Python: py-spy

py-spy is a sampling profiler for Python that reads the CPython process memory without requiring code changes or a Python interpreter restart.

# Install py-spy in your Python container (or as ephemeral container)
pip install py-spy

# Attach to a running Python process (PID 1 in container)
py-spy record --output profile.svg --duration 30 --pid 1

# For continuous collection with Pyroscope:
pip install pyroscope-io

# Python app: Pyroscope SDK
import os

import pyroscope

pyroscope.configure(
    application_name="ml-inference",
    server_address="http://pyroscope.observability.svc:4040",
    tags={
        "pod": os.environ.get("POD_NAME", ""),
        "version": os.environ.get("APP_VERSION", ""),
    },
)

# Tag per-request
with pyroscope.tag_wrapper({"endpoint": "/predict", "model": model_name}):
    result = run_inference(payload)

Node.js: Clinic.js & 0x

# Install profiling tools
npm install -g clinic 0x

# Production-safe: Pyroscope Node.js SDK
npm install @pyroscope/nodejs

// index.js: Pyroscope SDK for Node.js
const Pyroscope = require('@pyroscope/nodejs');

Pyroscope.init({
  serverAddress: 'http://pyroscope.observability.svc:4040',
  appName: 'api-gateway',
  tags: {
    pod: process.env.POD_NAME || '',
    namespace: process.env.POD_NAMESPACE || '',
  },
});
Pyroscope.start();

Rust: pprof-rs

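# Cargo.toml
[dependencies]
# prost-codec provides the encode() call used below; protobuf-codec exposes write_to_vec() instead
pprof = { version = "0.13", features = ["flamegraph", "prost-codec"] }

// Expose a /debug/pprof endpoint via an actix-web handler (an axum handler works similarly)
use std::time::Duration;

use actix_web::{HttpResponse, Responder};
use pprof::protos::Message; // brings encode() into scope
use pprof::ProfilerGuardBuilder;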

async fn cpu_profile() -> impl Responder {
    let guard = ProfilerGuardBuilder::default()
        .frequency(100)
        .blocklist(&["libc", "libgcc", "pthread"])
        .build()
        .unwrap();
    tokio::time::sleep(Duration::from_secs(30)).await;
    let report = guard.report().build().unwrap();
    let mut body = Vec::new();
    report.pprof().unwrap().encode(&mut body).unwrap();
    HttpResponse::Ok().content_type("application/octet-stream").body(body)
}

.NET: dotnet-trace & Pyroscope

# dotnet-trace (.NET diagnostic CLI; install with: dotnet tool install --global dotnet-trace)
# --duration uses the dd:hh:mm:ss format
dotnet-trace collect --process-id 1 --duration 00:00:00:30

# Pyroscope .NET agent (via environment variable injection)
# In Kubernetes Pod spec:
env:
- name: CORECLR_ENABLE_PROFILING
  value: "1"
- name: CORECLR_PROFILER
  value: "{BD1A650D-AC5D-4896-B64F-D6FA25D6B26A}"
- name: CORECLR_PROFILER_PATH
  value: /pyroscope/Pyroscope.Profiler.Native.so
- name: PYROSCOPE_SERVER_ADDRESS
  value: http://pyroscope.observability.svc:4040
- name: PYROSCOPE_APPLICATION_NAME
  value: cart-service

5. Grafana Pyroscope

Architecture

┌─────────────────────────────────────────────────────────┐
│                    Pyroscope Cluster                     │
│                                                          │
│  Push (SDK)          Pull (scrape)                       │
│  ┌────────┐         ┌────────────────┐                   │
│  │ app    │──push──▶│  distributor   │                   │
│  │ SDK    │         └───────┬────────┘                   │
│  └────────┘                 │ fan-out                    │
│                    ┌────────▼────────┐                   │
│  ┌────────────┐   │    ingester      │ (WAL + ring)      │
│  │ Pyroscope  │───│    (3 replicas)  │                   │
│  │ scrape     │   └────────┬─────────┘                   │
│  │ (pprof pull│            │ flush                       │
│  │  targets)  │   ┌────────▼────────┐                   │
│  └────────────┘   │   object store  │ (S3 / GCS)        │
│                    │   (blocks)      │                   │
│                    └────────┬────────┘                   │
│                    ┌────────▼────────┐                   │
│                    │  store-gateway  │ (cache + query)   │
│                    └────────┬────────┘                   │
│                    ┌────────▼────────┐                   │
│                    │ query-frontend  │◀── Grafana        │
│                    └─────────────────┘                   │
└─────────────────────────────────────────────────────────┘

Helm Install — Pyroscope Distributed

helm repo add grafana https://grafana.github.io/helm-charts
helm repo update

helm upgrade --install pyroscope grafana/pyroscope \
  --namespace observability \
  --create-namespace \
  --values pyroscope-values.yaml

pyroscope-values.yaml (Production)

pyroscope:
  replicationFactor: 3

  storage:
    backend: s3
    s3:
      bucket_name: my-pyroscope-profiles
      region: us-east-1
      # Use IRSA — no access keys in cluster

  components:
    distributor:
      replicas: 2
      resources:
        requests: {cpu: 500m, memory: 512Mi}
        limits: {memory: 1Gi}

    ingester:
      replicas: 3
      persistence:
        enabled: true
        size: 20Gi
      resources:
        requests: {cpu: 1, memory: 2Gi}
        limits: {memory: 4Gi}

    querier:
      replicas: 2
      resources:
        requests: {cpu: 1, memory: 1Gi}
        limits: {memory: 2Gi}

    query-frontend:
      replicas: 2

    store-gateway:
      replicas: 3
      persistence:
        enabled: true
        size: 50Gi

    compactor:
      replicas: 1
      persistence:
        enabled: true
        size: 50Gi

  limits:
    # Global defaults
    max_sample_age: 24h
    # Per-tenant overrides via ConfigMap

  retention:
    default: 720h  # 30 days

  # Scrape pprof endpoints from pods with annotation
  scrapeConfigs:
    - job_name: kubernetes-pods
      kubernetes_sd_configs:
        - role: pod
      relabel_configs:
        - source_labels: [__meta_kubernetes_pod_annotation_profiles_grafana_com_cpu_scrape]
          action: keep
          regex: "true"
        # Rewrite the scrape address to use the port from the annotation
        - source_labels: [__address__, __meta_kubernetes_pod_annotation_profiles_grafana_com_cpu_port]
          action: replace
          target_label: __address__
          regex: ([^:]+)(?::\d+)?;(\d+)
          replacement: "$1:$2"
        - source_labels: [__meta_kubernetes_pod_name]
          target_label: pod
        - source_labels: [__meta_kubernetes_namespace]
          target_label: namespace
        - source_labels: [__meta_kubernetes_pod_label_app]
          target_label: service_name

serviceAccount:
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::123456789:role/pyroscope-s3-role

Pod Annotations for Pull-Mode Scraping

apiVersion: apps/v1
kind: Deployment
metadata:
  name: order-service
spec:
  template:
    metadata:
      annotations:
        # Enable CPU profiling scrape
        profiles.grafana.com/cpu.scrape: "true"
        profiles.grafana.com/cpu.port: "6060"
        profiles.grafana.com/cpu.path: "/debug/pprof/profile"
        # Enable memory profiling scrape
        profiles.grafana.com/memory.scrape: "true"
        profiles.grafana.com/memory.port: "6060"
        profiles.grafana.com/memory.path: "/debug/pprof/heap"
        # Enable goroutine profiling scrape
        profiles.grafana.com/goroutine.scrape: "true"
        profiles.grafana.com/goroutine.port: "6060"
        profiles.grafana.com/goroutine.path: "/debug/pprof/goroutine"

Grafana Data Source for Pyroscope

apiVersion: v1
kind: ConfigMap
metadata:
  name: grafana-datasource-pyroscope
  namespace: observability
  labels:
    grafana_datasource: "1"
data:
  pyroscope.yaml: |
    apiVersion: 1
    datasources:
      - name: Pyroscope
        type: grafana-pyroscope-datasource
        url: http://pyroscope.observability.svc:4040
        isDefault: false
        jsonData:
          minStep: "15s"

Pyroscope Query Language (FlameQL)

# Select CPU profiles for the order-service
{service_name="order-service"}

# Filter by namespace and pod
{service_name="order-service", namespace="production", pod=~"order-service-.*"}

# Filter by dynamic label (set via SDK TagWrapper)
{service_name="order-service", endpoint="/api/checkout"}

# Compare two time ranges (diff mode in Grafana UI):
# baseline: 2024-01-15T10:00:00Z / 2024-01-15T10:30:00Z
# comparison: 2024-01-15T11:00:00Z / 2024-01-15T11:30:00Z

6. eBPF-Based Profiling

eBPF profilers run in the Linux kernel and capture CPU stack traces for every process on a node — including native code, JVM internals, and kernel functions — without any application instrumentation. This makes them ideal for profiling languages where SDK injection is impractical (C/C++, Rust) or for getting a full-system view.

Grafana Alloy (eBPF profiler)

Part of the Grafana observability stack. DaemonSet agent that uses eBPF perf events + DWARF unwinding. Ships profiles directly to Pyroscope.

recommended

Parca Agent

CNCF project. eBPF CPU profiling DaemonSet. Stores profiles in Parca server (similar to Pyroscope). Open source, BPF CO-RE (no kernel header deps).

CNCF

Pixie

CNCF project. eBPF-based auto-telemetry (metrics + traces + profiles + network maps) with no instrumentation. Strong Go/Java/C++ support.

CNCF

Elastic Universal Profiling

Commercial eBPF profiler in Elastic Stack. Full-stack profiling from kernel to application. Correlates with Elastic APM traces.

commercial

Grafana Alloy eBPF Profiling DaemonSet

helm upgrade --install alloy grafana/alloy \
  --namespace observability \
  --values alloy-values.yaml

# alloy-values.yaml (eBPF profiling config)
alloy:
  configMap:
    content: |
      // eBPF CPU profiler: no app instrumentation required
      pyroscope.ebpf "all_pods" {
        // Targets supply the pod/namespace labels attached to each profile
        targets           = discovery.kubernetes.pods.targets
        targets_only      = false          // profile all processes
        demangle          = "full"         // C++ symbol demangling
        python_enabled    = true           // Python frame unwinding
        collect_interval  = "15s"

        forward_to = [pyroscope.write.default.receiver]
      }

      // Kubernetes pod enrichment — add namespace/pod/service labels
      discovery.kubernetes "pods" {
        role = "pod"
      }

      pyroscope.write "default" {
        endpoint {
          url = "http://pyroscope.observability.svc:4040"
        }
        external_labels = {
          "cluster" = "production",
        }
      }

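  # eBPF requires elevated privileges: root, privileged container.
  # Key names follow the grafana/alloy chart; verify them against your chart version.
  securityContext:
    privileged: true
    runAsUser: 0
    runAsGroup: 0
    capabilities:
      add:
        - SYS_ADMIN
        - SYS_PTRACE
        - NET_ADMIN

controller:
  type: daemonset
  hostPID: true             # needed to see the processes of every container on the node

eBPF Privileges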

eBPF profilers require SYS_ADMIN and SYS_PTRACE capabilities. They must run as root (UID 0) and use hostPID: true to see all container processes. This is expected for node-level agents — apply strict NetworkPolicy and RBAC so the DaemonSet ServiceAccount cannot access application secrets or APIs beyond what's needed for pod label enrichment.

eBPF vs SDK Profiling Comparison

Dimension | SDK / Pull (pprof) | eBPF Agent (Alloy/Parca)
Instrumentation required | Yes (import SDK or add agent JVM arg) | None
Languages supported | Per-SDK (Go, Java, Python, .NET, Node) | All (C, Go, Rust, Java, Python, Node)
Profile types | CPU, heap, allocs, goroutine, mutex, block | CPU (wall), some memory via USDT probes
Per-request labels | Yes (TagWrapper / runtime labels) | No (pod/container level only)
Kernel visibility | Userspace only | Full kernel + userspace stacks
JVM Java frames | Full (JVM knows method names) | Partial without javaagent; use perf-map-agent
Container overhead | Low per app (~1–2% CPU) | Low per node (~0.5% per profiled process)
Zero-day coverage | Only opted-in services | Every process on the node automatically

7. Kubernetes Component Profiling

All core Kubernetes components expose pprof endpoints natively — enabling profiling of the control plane itself when diagnosing API server latency, scheduler queue depth, or etcd compaction pauses.

Accessing Component pprof Endpoints

# kube-apiserver — requires authentication
# Port-forward via kubectl proxy or direct pod port-forward
kubectl -n kube-system port-forward pod/kube-apiserver-controlplane-0 6443:6443

# CPU profile of the API server (30s)
curl -sk --cert /etc/kubernetes/pki/admin.crt \
         --key  /etc/kubernetes/pki/admin.key \
         "https://localhost:6443/debug/pprof/profile?seconds=30" \
         > apiserver.pprof

go tool pprof apiserver.pprof

# kube-scheduler (secure port 10259; the insecure port 10251 has been removed in current Kubernetes releases)
kubectl -n kube-system port-forward pod/kube-scheduler-controlplane-0 10259:10259
curl -sk "https://localhost:10259/debug/pprof/heap" --cert ... > scheduler-heap.pprof

# kube-controller-manager (port 10257)
kubectl -n kube-system port-forward pod/kube-controller-manager-controlplane-0 10257:10257
curl -sk "https://localhost:10257/debug/pprof/goroutine?debug=1" --cert ...

# kubelet (port 10250 on each node)
NODE_IP=$(kubectl get node worker-1 -o jsonpath='{.status.addresses[0].address}')
curl -sk "https://${NODE_IP}:10250/debug/pprof/profile?seconds=30" \
     --header "Authorization: Bearer $(kubectl create token default)" > kubelet.pprof

# etcd (port 2379 — requires etcd client cert)
kubectl -n kube-system exec etcd-controlplane-0 -- \
  curl -sk "https://localhost:2379/debug/pprof/heap" \
       --cert /etc/kubernetes/pki/etcd/server.crt \
       --key  /etc/kubernetes/pki/etcd/server.key > etcd-heap.pprof

Common Control Plane Profiling Scenarios

Symptom | Component | Profile Type | What to Look For
API server high latency | kube-apiserver | CPU + goroutine | etcd calls, admission webhook latency, LIST serialisation
Scheduler queue growing | kube-scheduler | CPU + goroutine | Predicate/scoring plugins, priority queue operations
etcd high memory | etcd | heap | Large objects in watch cache, compaction lag, mvcc index
Controller manager slow | kube-controller-manager | CPU | GC loops, requeue storms, informer cache resync
Node CPU spike | kubelet | CPU + goroutine | Image pulls, pod lifecycle, CRI calls, eviction checks

8. Reading Flame Graphs

A flame graph shows all stack traces collected during the profiling period, merged and sorted alphabetically at each level. The x-axis represents proportion of sampled time (not wall-clock order). The y-axis represents call depth (root at bottom in traditional flame graphs, or top in Pyroscope's icicle graphs).

Reading a flame graph:

  ┌──────────────────────────────────────────────────────┐
  │                     runtime.main                     │  ← root (widest = most time)
  ├──────────────────────────┬───────────────────────────┤
  │    http.(*ServeMux).     │    runtime.gcBgMarkWorker │  ← GC taking ~30% CPU!
  │    ServeHTTP  (70%)      │    (30%)                  │
  ├──────────┬───────────────┤                           │
  │ handler  │ middleware.Do │                           │
  │ (45%)    │ (25%)         │                           │
  ├───┬──────┤               │                           │
  │db │json  │               │                           │
  │   │.Marshal              │                           │
  └───┴──────┴───────────────┴───────────────────────────┘

Width   = proportion of total samples where this function was on the stack
Tall    = deep call stack (not necessarily slow)
Wide    = this function (or its callees) uses a lot of CPU
Plateau = the function itself (not its callees) is using the CPU
Color   = arbitrary (typically indicates package/module/type in Pyroscope)

Identifying Hot Functions

Look for wide frames at the top of the flame graph (or bottom of an icicle graph) — these are "leaf" functions that consume time themselves rather than passing it to callees. A wide frame in the middle means a common call path, but the cost may be in its children. Use the diff mode in Pyroscope to highlight functions that grew between a before/after comparison.

Pyroscope Diff Workflow

# Using Pyroscope HTTP API to compare two time ranges
# Baseline: before deploy (T-1h to T-30m)
# Comparison: after deploy (T-10m to now)

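# -G sends the url-encoded pairs as a query string, so the URL stays on one line
curl -sG "http://pyroscope.observability.svc:4040/render" \
  --data-urlencode 'query=order-service.cpu{namespace="production"}' \
  --data-urlencode 'from=now-1h' \
  --data-urlencode 'until=now-30m' \
  --data-urlencode 'format=json' > baseline.json

curl -sG "http://pyroscope.observability.svc:4040/render" \
  --data-urlencode 'query=order-service.cpu{namespace="production"}' \
  --data-urlencode 'from=now-10m' \
  --data-urlencode 'until=now' \
  --data-urlencode 'format=json' > after.json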

# Or use Grafana Explore → Pyroscope datasource → Diff mode (select two date ranges)

9. Linking Traces to Profiles

Pyroscope supports attaching a profile_id label to profiles at the same time a trace span is active. This allows Grafana Tempo to show a "View Profile" link directly from a trace span, drilling into the exact CPU usage during that request.

Go: Profiling a Specific Trace Span

import (
    "net/http"

    otelpyroscope "github.com/grafana/otel-profiling-go"
    "go.opentelemetry.io/otel"
    sdktrace "go.opentelemetry.io/otel/sdk/trace"
)

// At startup: wrap the tracer provider so that span IDs are attached to
// profiling samples as pprof labels while a span is active.
func initTracing() {
    tp := sdktrace.NewTracerProvider( /* exporters, resource, ... */ )
    otel.SetTracerProvider(otelpyroscope.NewTracerProvider(tp))
}

// In your HTTP handler, after setting up the OTel tracer and Pyroscope:
func handleCheckout(w http.ResponseWriter, r *http.Request) {
    ctx, span := otel.Tracer("order-service").Start(r.Context(), "checkout")
    defer span.End()

    // CPU samples taken while processCheckout runs are labelled with this
    // span's ID (span_id in current releases, profile_id in older ones),
    // which is what lets Tempo link the span to its profile
    processCheckout(ctx, w, r)
}

Grafana Tempo → Pyroscope Integration

# grafana/provisioning/datasources/tempo.yaml
apiVersion: 1
datasources:
  - name: Tempo
    type: tempo
    url: http://tempo-query-frontend.observability.svc:3100
    jsonData:
      tracesToProfiles:
        datasourceUid: pyroscope
        tags:
          - key: service.name
            value: service_name
        profileTypeId: "process_cpu:cpu:nanoseconds:cpu:nanoseconds"
        customQuery: false

With this configuration, Tempo trace spans show a "Profile" button. Clicking it opens Pyroscope filtered to the same service_name and time window as the trace span — giving a CPU flame graph for the exact duration of the slow request.

Correlation Flow

1. Alert: "p99 checkout latency > 3s for last 10 min"
              │
2. Grafana → Tempo trace search
   → Find a slow checkout trace (2.8s)
   → Identify slow span: "process_payment" (2.4s)
              │
3. Click "View Profile" on the span
   → Pyroscope flame graph for that 2.4s window
   → 68% of CPU in json.Marshal(*PaymentResponse)
   → PaymentResponse has 120 fields, most nil
              │
4. Fix: return only populated fields / use proto instead of JSON
   → Deploy → compare Pyroscope profiles before/after
   → json.Marshal reduced from 68% → 4% of CPU
   → p99 latency: 2.8s → 180ms

10. Alerting & Anomaly Detection

Pyroscope Self-Metrics (Prometheus)

# Pyroscope exposes Prometheus metrics on :4040/metrics
pyroscope_ingester_profiles_received_total    # profiles being pushed
pyroscope_distributor_received_samples_total  # samples received
pyroscope_querier_query_duration_seconds      # query latency histogram
pyroscope_compactor_block_cleanup_failures_total
pyroscope_ring_members{name="ingester",state="ACTIVE"}  # ring health

PrometheusRule — Pyroscope Health

apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: pyroscope-alerts
  namespace: observability
spec:
  groups:
    - name: pyroscope
      interval: 1m
      rules:
        - alert: PyroscopeIngesterDown
          expr: |
            max(pyroscope_ring_members{name="ingester",state="ACTIVE"}) < 2
          for: 5m
          labels:
            severity: warning
            team: platform
          annotations:
            summary: "Pyroscope ingester ring below quorum"
            description: "Only {{ $value }} ingesters active, expected ≥ 2"

        - alert: PyroscopeDroppingProfiles
          expr: |
            rate(pyroscope_distributor_received_samples_total{status="dropped"}[5m]) > 0
          for: 5m
          labels:
            severity: warning
          annotations:
            summary: "Pyroscope distributor is dropping profiles"

        - alert: PyroscopeQueryLatencyHigh
          expr: |
            histogram_quantile(0.99,
              rate(pyroscope_querier_query_duration_seconds_bucket[5m])
            ) > 30
          for: 10m
          labels:
            severity: warning
          annotations:
            summary: "Pyroscope p99 query latency > 30s"

Detecting CPU Regressions via Pyroscope API

Because Pyroscope stores profiles as time series, you can query aggregated CPU time and alert on regressions using Prometheus-style metric queries if you enable Pyroscope's metric export.

# pyroscope-values.yaml — enable metric exporter
pyroscope:
  extraConfig:
    metric_store:
      enabled: true
      # Aggregates CPU samples as prometheus metrics:
      # pyroscope_app_cpu_seconds_total{service_name, namespace}

# Then alert on CPU regression:
- alert: ServiceCPURegressionDetected
  expr: |
    (
      rate(pyroscope_app_cpu_seconds_total{namespace="production"}[30m])
      /
      rate(pyroscope_app_cpu_seconds_total{namespace="production"}[30m] offset 1h)
    ) > 1.5
  for: 15m
  labels:
    severity: warning
  annotations:
    summary: "{{ $labels.service_name }} CPU increased >50% vs 1h ago"
    description: "CPU regression detected — compare profiles in Pyroscope"
    runbook: "https://runbooks.internal/cpu-regression"

NetworkPolicy: Restrict pprof Access

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-pprof-only-from-pyroscope
  namespace: production
spec:
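  podSelector:
    matchLabels:
      # podSelector matches pod labels, not annotations: the target pods must
      # also carry this key as a label for the policy to select them
      profiles.grafana.com/cpu.scrape: "true"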
  policyTypes: [Ingress]
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: observability
          podSelector:
            matchLabels:
              app.kubernetes.io/name: alloy
      ports:
        - port: 6060
          protocol: TCP

11. Best Practices

1. Use SDK for heap + goroutine; eBPF for CPU

eBPF gives zero-instrumentation CPU visibility across all services. SDK (pprof/Pyroscope) gives heap, allocs, goroutine, mutex profiles that eBPF cannot collect from userspace.

2. Add dynamic labels at request boundaries

Use pyroscope.TagWrapper / runtime/pprof.Do to tag CPU time by endpoint, tenant, queue. Otherwise all requests merge into an unfiltered flame graph.

3. Separate pprof port from app port

Bind pprof to :6060 (or similar). Add a NetworkPolicy that only allows the Pyroscope scraper to reach it. Never expose pprof through Ingress.

4. Enable mutex and block profiling deliberately

Call runtime.SetMutexProfileFraction(5) and SetBlockProfileRate(1) only in services where you suspect contention. These add ~1–3% overhead.

5. Compare profiles across deploys, not just time

Store the git SHA as a Pyroscope label. Use Pyroscope diff mode with version="v1.4.2" vs version="v1.4.3" to attribute CPU changes to a specific commit.

6. Profile control plane components during incidents

API server, scheduler, and etcd have built-in pprof. High API latency is often a LIST cardinality issue visible as json.Marshal in the API server heap profile.

7. Use Pyroscope diff for post-deploy regression check

Make profile comparison part of your deployment runbook: collect a 10-minute CPU profile before the deploy and after, confirm no function gained >20% CPU share.

8. Set retention to 30 days minimum

Profiles that have already expired cannot serve as a baseline when an incident occurs. 30 days retains enough history for seasonal pattern analysis (e.g., end-of-month batch jobs).

Profiling Overhead Reference

Profiler | CPU Overhead | Memory Overhead | Notes
Go pprof CPU (pull, 100 Hz) | ~1–2% | Negligible | Only during active sample collection
Go pprof heap (pull) | ~0.5% | Proportional to live heap | Always active when enabled
Go mutex profile (fraction=5) | ~1–3% | Low | Reports 1/5 contention events
Pyroscope Go SDK (push, 100 Hz) | ~1–3% | ~50 MiB | Continuous with periodic upload
Java async-profiler (100 Hz) | ~1–2% | ~30 MiB agent | JFR format reduces overhead vs JVMTI
Python py-spy (100 Hz) | ~2–5% | Negligible (external) | Does not require code changes
eBPF CPU (Alloy/Parca) | ~0.5–1% per node | ~100 MiB agent | Covers all processes on node