Logging — Kubernetes Observability

Logging in Kubernetes

Complete guide to log collection, structured logging, Fluent Bit, Loki/LogQL, Elasticsearch, OpenTelemetry logs, and production log management at scale.

Kubernetes Logging Architecture

Kubernetes does not provide a built-in log aggregation or persistence system. Containers write to stdout/stderr; the container runtime captures those streams and writes them to log files on the node. Kubernetes exposes these via kubectl logs but does not ship them anywhere. Cluster operators are responsible for collecting, forwarding, and storing logs.

Logs Are Ephemeral by Default

Pod logs exist only while the pod exists on the node, subject to log rotation. If a pod is evicted, rescheduled, or deleted, its logs disappear. A log collector DaemonSet must tail the files in real time to avoid data loss.

Node-Level Log File Paths

The container runtime writes each container's log to a path that the kubelet supplies through CRI. The kubelet also creates stable symlinks:

Container stdout/stderr
        │
        ▼
Container Runtime (containerd / CRI-O)
        │  writes to: /var/log/pods/<ns>_<pod>_<uid>/<container>/<n>.log
        │
        ├── Kubelet creates symlinks:
        │     /var/log/containers/<pod>_<ns>_<container>-<id>.log
        │       → /var/log/pods/.../<n>.log
        │
        ├── Log rotation managed by kubelet:
        │     --container-log-max-size=10Mi   (default 10Mi per file)
        │     --container-log-max-files=5     (default 5 rotated files)
        │
        └── kubectl logs reads from: /var/log/pods/... via kubelet API
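The symlink naming scheme can be unpacked with a small parser. The following is a minimal Python sketch, not a reference implementation: the function name `parse_container_log_name` and the sample file name are hypothetical, and the 64-character hexadecimal container ID is an assumption about what the runtime produces.

```python
import re
from typing import Optional

# Assumed layout: <pod>_<namespace>_<container>-<container-id>.log
# The 64-hex-character container ID is an assumption about the runtime.
_LOG_NAME = re.compile(
    r"^(?P<pod>[^_]+)_(?P<namespace>[^_]+)_"
    r"(?P<container>.+)-(?P<container_id>[0-9a-f]{64})\.log$"
)


def parse_container_log_name(filename: str) -> Optional[dict]:
    """Split a /var/log/containers file name into its parts, or return None."""
    match = _LOG_NAME.match(filename)
    return match.groupdict() if match else None


if __name__ == "__main__":
    sample = "order-7d5f9-xk2pq_production_order-service-" + "a" * 64 + ".log"
    print(parse_container_log_name(sample))
```

Pod and namespace names cannot contain underscores, which is why splitting on `_` is safe for the first two fields while the container name is matched greedily up to the final ID.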

CRI Log Format

Since Kubernetes 1.14, the CRI log format (used by containerd and CRI-O) prefixes each line with metadata:

# Format: <timestamp> <stream> <flags> <log message>
2024-01-15T10:23:45.123456789Z stdout F {"level":"info","msg":"server started","port":8080}
2024-01-15T10:23:45.124000000Z stderr F Error: connection refused to db:5432
# Flags: F = full line, P = partial line (multiline handling needed)

Docker vs containerd Log Formats

Docker's default json-file driver wraps each log line in JSON: {"log":"...\n","stream":"stdout","time":"..."}. containerd uses the CRI format above. Fluent Bit parsers must match the actual runtime in your cluster — check with kubectl get nodes -o wide for the container runtime version.
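To make the difference between the two formats concrete, here is a hedged Python sketch that normalizes a raw line from either runtime into one record shape. The function name `parse_runtime_line` and the output keys are hypothetical; treating a Docker chunk without a trailing newline as a partial line is an assumption, not something stated above.

```python
import json
import re

# <timestamp> <stream> <flag> <message>, as in the CRI examples above
_CRI_LINE = re.compile(
    r"^(?P<time>\S+) (?P<stream>stdout|stderr) (?P<flag>[FP]) (?P<log>.*)$"
)


def parse_runtime_line(line: str) -> dict:
    """Normalize one raw log line from either runtime format."""
    line = line.rstrip("\n")
    if line.startswith("{"):
        # Docker json-file: the payload sits in the "log" key.
        entry = json.loads(line)
        text = entry["log"]
        return {
            "time": entry["time"],
            "stream": entry["stream"],
            "log": text.rstrip("\n"),
            # Assumption: a chunk without a trailing newline is a partial line.
            "partial": not text.endswith("\n"),
        }
    match = _CRI_LINE.match(line)
    if match is None:
        raise ValueError("unrecognized log line format")
    record = match.groupdict()
    record["partial"] = record.pop("flag") == "P"
    return record
```

A CRI line never starts with `{` because it begins with a timestamp, so the first character is enough to tell the formats apart in this sketch.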

Log Retention on Node

| kubelet Flag | Default | Effect |
|---|---|---|
| --container-log-max-size | 10Mi | Max size before rotation |
| --container-log-max-files | 5 | Number of rotated files to keep |
| --node-status-update-frequency | 10s | How often the kubelet reports node status; it does not change how long log files are kept |
| Total per-container storage | ~50Mi | max-size × max-files |
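The per-container figure follows directly from the two rotation settings. As a rough sketch (the function name and the 100-containers-per-node figure are hypothetical), the node-level disk budget for container logs can be estimated like this:

```python
def node_log_budget_mib(max_size_mib: int = 10, max_files: int = 5,
                        containers_per_node: int = 100) -> tuple:
    """Return (per-container MiB, per-node MiB) for rotated container logs."""
    per_container = max_size_mib * max_files
    return per_container, per_container * containers_per_node


if __name__ == "__main__":
    per_container, per_node = node_log_budget_mib()
    print(f"{per_container} MiB per container, {per_node} MiB per node")
```

With the defaults this gives 50 MiB per container, so a node running 100 containers should reserve about 5,000 MiB for logs in the worst case.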

Log Collection Patterns

There are four fundamental architectures for log collection in Kubernetes. Most production clusters use a combination of patterns 1 and 2.

1. Node-Level Agent (DaemonSet)

A log collector runs as a DaemonSet on every node, tailing /var/log/containers/. One collector per node — low overhead, zero application changes required. Works for any container writing to stdout/stderr.

Recommended for most workloads

2. Sidecar Container

A dedicated log-shipper container runs in the same pod as the application, reading logs from a shared volume. Required when the application writes to files rather than stdout, or when per-application routing logic is needed.

Use sparingly — resource overhead per pod

3. Sidecar Streaming

A sidecar reads from the application log file and re-emits to stdout. Useful for legacy applications that cannot be modified to write to stdout. The node-level agent then picks up the sidecar's stdout.

Legacy apps only

4. Direct-to-Backend

The application SDK ships logs directly to the logging backend (e.g., Datadog Agent via UDP, Loki HTTP API). Bypasses the node filesystem entirely. Requires code changes; provides richest metadata but no fallback if backend unreachable.

Cloud-native apps

Pattern Comparison

| Pattern | Resource Overhead | App Changes | File Logging | Per-Pod Routing | Data Loss Risk |
|---|---|---|---|---|---|
| Node-Level DaemonSet | Low (1 agent/node) | None | No (stdout only) | Limited | If pod deleted before tail |
| Sidecar Shipper | High (1 sidecar/pod) | Shared volume | Yes | Yes | Low (direct ship) |
| Sidecar Streaming | Medium | Shared volume | Yes | No | Same as DaemonSet |
| Direct SDK | None extra | Yes | Yes | Full | If backend unreachable |

Structured Logging

Structured logs emit machine-parseable records (JSON) rather than free-text strings. This enables log backends to index fields, run aggregations, and support precise query predicates — critical for high-volume environments.
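As a minimal illustration of the idea, the sketch below emits one JSON object per log record using only the Python standard library. The class name `JsonFormatter`, the helper `build_logger`, and the chosen keys are assumptions made for this example; dedicated libraries add context binding and better performance.

```python
import json
import logging
import sys
from datetime import datetime, timezone


class JsonFormatter(logging.Formatter):
    """Render each record as a single-line JSON object."""

    def format(self, record: logging.LogRecord) -> str:
        stamp = datetime.fromtimestamp(record.created, tz=timezone.utc)
        payload = {
            "timestamp": stamp.isoformat(timespec="milliseconds").replace("+00:00", "Z"),
            "level": record.levelname.lower(),
            "message": record.getMessage(),
            # "service" is read from the `extra` dict if the caller supplies it.
            "service": getattr(record, "service", "unknown"),
        }
        if record.exc_info:
            payload["error"] = self.formatException(record.exc_info)
        return json.dumps(payload)


def build_logger(name: str = "order-service") -> logging.Logger:
    """Return a logger that writes JSON lines to stdout."""
    handler = logging.StreamHandler(sys.stdout)
    handler.setFormatter(JsonFormatter())
    logger = logging.getLogger(name)
    logger.handlers = [handler]
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger


if __name__ == "__main__":
    build_logger().info("request completed", extra={"service": "order-service"})
```

Because every record is a single JSON line on stdout, a node-level collector can parse it without per-application regular expressions.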

Standard Log Fields

| Field | Type | Purpose | Example |
|---|---|---|---|
| timestamp | RFC3339 | Event time (UTC) | "2024-01-15T10:23:45.123Z" |
| level | string | Severity | "info", "error", "warn" |
| message | string | Human-readable summary | "request completed" |
| service | string | Application name | "order-service" |
| version | string | App version/build | "v2.4.1" |
| trace_id | string (hex) | W3C TraceContext; links to trace | "4bf92f3577b34da6..." |
| span_id | string (hex) | Span within trace | "00f067aa0ba902b7" |
| pod | string | Kubernetes pod name | "order-7d5f9-xk2pq" |
| namespace | string | Kubernetes namespace | "production" |
| node | string | Node hosting the pod | "ip-10-0-1-45" |
| error | string / object | Error details when level=error | "connection refused" |
| duration_ms | float | Request/operation duration | 34.7 |
| http.method | string | HTTP verb (OTel semantic conventions) | "POST" |
| http.status_code | int | HTTP response code | 200 |

Go: Zap Structured Logger

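package main

import (
    "go.opentelemetry.io/otel/trace"
    "go.uber.org/zap"
    "go.uber.org/zap/zapcore"
)

// setupLogger builds a JSON logger with "timestamp" and "message" keys.
func setupLogger() *zap.Logger {
    cfg := zap.NewProductionConfig()
    cfg.EncoderConfig.TimeKey = "timestamp"
    cfg.EncoderConfig.MessageKey = "message"
    // The production default encodes time as epoch seconds; emit RFC 3339 instead.
    cfg.EncoderConfig.EncodeTime = zapcore.RFC3339NanoTimeEncoder
    logger, err := cfg.Build()
    if err != nil {
        panic(err) // a logger that cannot be built is a startup failure
    }
    return logger
}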

// Inject trace context into every log record
func loggerFromSpan(base *zap.Logger, span trace.Span) *zap.Logger {
    sc := span.SpanContext()
    return base.With(
        zap.String("trace_id", sc.TraceID().String()),
        zap.String("span_id", sc.SpanID().String()),
        zap.Bool("trace_sampled", sc.IsSampled()),
    )
}

func handleRequest(logger *zap.Logger, span trace.Span, method, path string) {
    log := loggerFromSpan(logger, span)
    log.Info("request received",
        zap.String("http.method", method),
        zap.String("http.path", path),
    )
}

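// Example output (abridged; zap's production config also adds a caller field, and trace_sampled is omitted here):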
// {"timestamp":"2024-01-15T10:23:45.123Z","level":"info","message":"request received",
//  "http.method":"POST","http.path":"/orders","trace_id":"4bf92f3577b34da6...","span_id":"00f067aa0ba902b7"}

Java: Logback JSON Encoder (Logstash)

<!-- pom.xml dependency -->
<dependency>
  <groupId>net.logstash.logback</groupId>
  <artifactId>logstash-logback-encoder</artifactId>
  <version>7.4</version>
</dependency>

<!-- logback-spring.xml -->
<configuration>
  <appender name="JSON" class="ch.qos.logback.core.ConsoleAppender">
    <encoder class="net.logstash.logback.encoder.LogstashEncoder">
      <fieldNames>
        <timestamp>timestamp</timestamp>
        <message>message</message>
        <logger>logger</logger>
      </fieldNames>
      <includeMdcKeyName>trace_id</includeMdcKeyName>
      <includeMdcKeyName>span_id</includeMdcKeyName>
    </encoder>
  </appender>
  <root level="INFO">
    <appender-ref ref="JSON"/>
  </root>
</configuration>

Python: structlog

import structlog
import logging

structlog.configure(
    processors=[
        structlog.contextvars.merge_contextvars,   # context-local values bound via contextvars (trace_id etc.)
        structlog.processors.add_log_level,
        structlog.processors.TimeStamper(fmt="iso", utc=True, key="timestamp"),
        structlog.processors.dict_tracebacks,
        structlog.processors.JSONRenderer(),
    ],
    logger_factory=structlog.PrintLoggerFactory(),
)

log = structlog.get_logger()

# Bind request context once per request:
structlog.contextvars.bind_contextvars(
    trace_id="4bf92f3577b34da6a3ce929d0e0e4736",
    span_id="00f067aa0ba902b7",
    service="order-service",
)

log.info("order created", order_id=12345, amount=99.50)
# {"timestamp":"2024-01-15T10:23:45.123Z","level":"info","event":"order created",
#  "order_id":12345,"amount":99.5,"trace_id":"4bf92f3577b34da6...","service":"order-service"}

Node.js: pino

pino is a widely used structured logger for Node.js: const log = pino({ level: 'info', timestamp: pino.stdTimeFunctions.isoTime }). Use the pino-http middleware to log HTTP requests automatically, including duration and status code.

Fluent Bit

Fluent Bit is the preferred lightweight log collector for Kubernetes (written in C, ~700KB binary, ~30MB RSS). It replaces Fluentd in most modern deployments due to its lower resource footprint. Fluent Bit reads CRI log files, enriches them with Kubernetes metadata via the API, and forwards to one or more backends.

Fluent Bit vs Fluentd

| Aspect | Fluent Bit | Fluentd |
|---|---|---|
| Language | C | Ruby |
| Memory footprint | ~30–50MB | ~200–600MB |
| Plugin ecosystem | ~100 plugins | ~1,000 plugins |
| Performance | Higher (native) | Lower (Ruby GIL) |
| Complex routing | Limited | Excellent |
| Use case | Edge/DaemonSet collector | Aggregator / heavy processing |
| CNCF status | Graduated | Graduated |

Fluent Bit Pipeline Model

INPUT   (tail /var/log/containers/*.log)
   │
   ▼
PARSER  (containerd / docker / json / regex)
   │
   ▼
FILTER: kubernetes   ← calls the Kubernetes API server (or the kubelet when Use_Kubelet is enabled)
   │                   for pod/namespace/labels/annotations
   ▼
FILTER: modify / nest / rewrite_tag / throttle / lua
   │
   ▼
BUFFER  (filesystem or memory)
   │
   ▼
OUTPUT  → Loki / Elasticsearch / S3 / Kafka / Splunk / CloudWatch / multiple

Helm Install

helm repo add fluent https://fluent.github.io/helm-charts
helm repo update

helm upgrade --install fluent-bit fluent/fluent-bit \
  --namespace logging \
  --create-namespace \
  --values fluentbit-values.yaml

Production ConfigMap (Loki backend)

apiVersion: v1
kind: ConfigMap
metadata:
  name: fluent-bit-config
  namespace: logging
data:
  fluent-bit.conf: |
    # Classic-mode config treats '#' as a comment only at the start of a line,
    # so comments are kept on their own lines rather than after a value.
    [SERVICE]
        Flush         5
        Daemon        Off
        Log_Level     warn
        Parsers_File  parsers.conf
        HTTP_Server   On
        HTTP_Listen   0.0.0.0
        HTTP_Port     2020
        # Location and tuning for filesystem buffering; each input opts in via storage.type
        storage.path  /var/log/flb-storage/
        storage.sync  normal
        storage.checksum Off
        storage.max_chunks_up 128

    [INPUT]
        Name              tail
        Path              /var/log/containers/*.log
        # Exclude fluent-bit's own logs and kube-system noise
        Exclude_Path      /var/log/containers/fluent-bit*,/var/log/containers/*_kube-system_*
        # Handles both runtimes
        multiline.parser  docker, cri
        Tag               kube.*
        Refresh_Interval  5
        Rotate_Wait       30
        Mem_Buf_Limit     64MB
        Skip_Long_Lines   On
        # Buffer chunks on disk (survive node pressure and restarts)
        storage.type      filesystem
        # Position tracking
        DB                /var/log/flb-storage/tail.db
        DB.sync           normal
        Ignore_Older      24h

    [FILTER]
        Name                kubernetes
        Match               kube.*
        Kube_URL            https://kubernetes.default.svc:443
        Kube_CA_File        /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        Kube_Token_File     /var/run/secrets/kubernetes.io/serviceaccount/token
        Kube_Tag_Prefix     kube.var.log.containers.
        # Parse nested JSON from the container 'log' field
        Merge_Log           On
        # Drop the original 'log' key after a successful merge
        Keep_Log            Off
        # Place merged fields under the 'app' key
        Merge_Log_Key       app
        # Respect pod annotations that select a custom parser or exclude the pod
        K8S-Logging.Parser  On
        K8S-Logging.Exclude On
        Labels              On
        # Annotations add cardinality; disable unless needed
        Annotations         Off

    # Drop noisy health/readiness probe logs
    [FILTER]
        Name   grep
        Match  kube.*
        Exclude log /healthz|/readyz|/livez|/metrics

    # Add cluster identifier
    [FILTER]
        Name   modify
        Match  kube.*
        Add    cluster prod-us-east-1

    [OUTPUT]
        Name            loki
        Match           kube.*
        Host            loki-gateway.monitoring.svc.cluster.local
        Port            80
        # Record-accessor syntax: nested keys are written as $map['key']
        Labels          job=fluent-bit, cluster=$cluster, namespace=$kubernetes['namespace_name'], pod=$kubernetes['pod_name'], container=$kubernetes['container_name']
        # Merged application fields live under the 'app' key (see Merge_Log_Key)
        Label_Keys      $app['level'],$app['severity']
        Remove_Keys     kubernetes,stream
        # Retry indefinitely (disk-buffered)
        Retry_Limit     False
        Workers         4

  parsers.conf: |
    [PARSER]
        Name        docker
        Format      json
        Time_Key    time
        Time_Format %Y-%m-%dT%H:%M:%S.%L
        Time_Keep   On

    [PARSER]
        Name        cri
        Format      regex
        Regex       ^(?<time>[^ ]+) (?<stream>stdout|stderr) (?<flags>[^ ]*) (?<log>.*)$
        Time_Key    time
        Time_Format %Y-%m-%dT%H:%M:%S.%L%z

    [PARSER]
        Name        json
        Format      json
        Time_Key    timestamp
        Time_Format %Y-%m-%dT%H:%M:%S.%LZ

Fluent Bit DaemonSet Resource Requirements

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: fluent-bit
  namespace: logging
spec:
  selector:
    matchLabels:
      app: fluent-bit
  template:
    metadata:
      labels:
        app: fluent-bit
      annotations:
        fluentbit.io/exclude: "true"     # exclude own logs from collection
    spec:
      serviceAccountName: fluent-bit
      tolerations:
        - key: node-role.kubernetes.io/control-plane
          operator: Exists
          effect: NoSchedule
        - key: node.kubernetes.io/not-ready
          operator: Exists
          effect: NoSchedule
      containers:
        - name: fluent-bit
          image: cr.fluentbit.io/fluent/fluent-bit:3.2.0
          resources:
            requests:
              cpu: 50m
              memory: 64Mi
            limits:
              cpu: 200m
              memory: 256Mi
          volumeMounts:
            - name: varlog
              mountPath: /var/log
              readOnly: true
            - name: config
              mountPath: /fluent-bit/etc
            - name: storage
              mountPath: /var/log/flb-storage
          ports:
            - name: http
              containerPort: 2020
          livenessProbe:
            httpGet:
              path: /
              port: http
            initialDelaySeconds: 10
      volumes:
        - name: varlog
          hostPath:
            path: /var/log
        - name: config
          configMap:
            name: fluent-bit-config
        - name: storage
          hostPath:
            path: /var/lib/fluent-bit
            type: DirectoryOrCreate
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: fluent-bit
  namespace: logging
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: fluent-bit
rules:
  - apiGroups: [""]
    resources: [pods, namespaces, nodes, nodes/proxy]
    verbs: [get, list, watch]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: fluent-bit
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: fluent-bit
subjects:
  - kind: ServiceAccount
    name: fluent-bit
    namespace: logging

Multiline Log Handling

Stack traces (Java, Python) and structured exceptions span multiple lines. The CRI P (partial) flag only marks long lines that the runtime itself split; it does not group the separate lines an application writes for one stack trace. Fluent Bit reassembles those with the multiline.parser option on the input or with the dedicated multiline filter:

[FILTER]
    Name            multiline
    Match           kube.*
    multiline.key_content  log
    # Built-in parsers: java, go, python, ruby, docker
    multiline.parser       java

# Custom multiline rule (e.g., Python traceback)
[MULTILINE_PARSER]
    name          python_custom
    type          regex
    # Flush after 2s of inactivity
    flush_timeout 2000
    rule "start_state" "/^[^\s]/" "cont"
    rule "cont"        "/^\s/"   "cont"
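
The two-state rule above can be read as a small state machine. The Python sketch below is a simplified model of that behavior, not Fluent Bit's implementation: the function name `group_multiline` is hypothetical, and timeouts are ignored.

```python
import re
from typing import Iterable, List

# Mirrors the "cont" rule: a line that starts with whitespace continues a record.
CONT = re.compile(r"^\s")


def group_multiline(lines: Iterable[str]) -> List[str]:
    """Group indented continuation lines with the line that precedes them."""
    records: List[str] = []
    current: List[str] = []
    for line in lines:
        if current and CONT.match(line):
            current.append(line)
            continue
        if current:
            records.append("\n".join(current))
        current = [line]  # any other line opens a new record
    if current:
        records.append("\n".join(current))
    return records


if __name__ == "__main__":
    sample = [
        "Traceback (most recent call last):",
        '  File "app.py", line 3, in <module>',
        "    run()",
        "ValueError: bad input",
    ]
    for record in group_multiline(sample):
        print(repr(record))
```

Note that under this rule the final `ValueError: bad input` line starts a new record because it is not indented, which is one reason to prefer the built-in python parser for real tracebacks.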

Loki & LogQL

Grafana Loki is a horizontally scalable, highly available log aggregation system. Unlike Elasticsearch, Loki does not index the contents of log lines — only the metadata labels. This makes it extremely cost-efficient. Log content is compressed and stored in object storage (S3, GCS, Azure Blob).

Loki Architecture

Fluent Bit / Promtail / OTel Collector
        │  HTTP POST /loki/api/v1/push
        ▼
┌────────────────────────────────────────────────────────┐
│                      Loki Cluster                      │
│                                                        │
│  Distributor   →   Ingester   →   Compactor            │
│  (hash ring,       (WAL,          (index/chunk         │
│   validation)       chunks)        compaction)         │
│                                                        │
│  Querier   ←   Query Frontend   ←   Grafana            │
│     │          (caching,                               │
│     │           sharding)                              │
│     ▼                                                  │
│  Index Gateway  ←  Object Store (S3/GCS/Azure Blob)    │
│  (chunk index)     + Index Store (BoltDB / tsdb)       │
└────────────────────────────────────────────────────────┘

Label Cardinality Warning

Loki indexes labels, not log content. High-cardinality labels (e.g., pod_name with thousands of values, user_id, request_id) cause index explosion, query performance degradation, and excessive memory use in ingesters. Keep label count <10, cardinality per label <1,000. Put high-cardinality values in the log body and use |= / | json filter expressions to query them.
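
A quick way to see why one high-cardinality label matters: the worst-case number of streams is the product of the distinct values of every label. The following Python sketch illustrates that arithmetic; the function name `estimate_max_streams` and the sample cardinalities are hypothetical.

```python
from math import prod


def estimate_max_streams(label_cardinalities: dict) -> int:
    """Upper bound on stream count: product of distinct values per label."""
    return prod(label_cardinalities.values())


if __name__ == "__main__":
    base = {"cluster": 1, "namespace": 20, "container": 5, "level": 4}
    print(estimate_max_streams(base))                    # upper bound without pod label
    print(estimate_max_streams({**base, "pod": 2000}))   # upper bound with pod label
```

Real clusters stay below this bound because not every combination occurs, but the multiplication shows how a single label with thousands of values dominates the total.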

Recommended Label Set

cluster="prod-us-east-1"    # static — low cardinality
namespace="payments"         # ~10–100 values
pod="payments-api-7d5f9-xk"  # AVOID: high cardinality — use for debugging only
container="payments-api"     # ~1–5 per namespace
app="payments-api"           # from Kubernetes labels
level="error"                # from log body parsing

Loki Helm Install

helm repo add grafana https://grafana.github.io/helm-charts
helm repo update

# Single-binary mode (dev/small clusters)
helm upgrade --install loki grafana/loki \
  --namespace monitoring \
  --set deploymentMode=SingleBinary \
  --set loki.storage.type=s3 \
  --set loki.storage.s3.region=us-east-1 \
  --set loki.storage.bucketNames.chunks=my-loki-chunks \
  --set loki.auth_enabled=false \
  --set singleBinary.replicas=3

# Distributed mode (production)
helm upgrade --install loki grafana/loki-distributed \
  --namespace monitoring \
  --values loki-distributed-values.yaml

Loki Distributed Values (Production)

# loki-distributed-values.yaml
loki:
  auth_enabled: true
  commonConfig:
    replication_factor: 3
  storage:
    type: s3
    s3:
      region: us-east-1
      bucketnames: prod-loki-chunks
      insecure: false
  schemaConfig:
    configs:
      - from: "2024-01-01"
        store: tsdb              # TSDB index (Loki 2.8+)
        object_store: s3
        schema: v13
        index:
          prefix: loki_index_
          period: 24h
  limits_config:
    ingestion_rate_mb: 64
    ingestion_burst_size_mb: 128
    max_label_names_per_series: 15
    max_label_value_length: 2048
    max_streams_per_user: 50000
    max_chunks_per_query: 2000000
    retention_period: 744h       # 31 days
    per_tenant_override_config: /etc/loki/overrides.yaml

ingester:
  replicas: 3
  resources:
    requests: {cpu: 500m, memory: 1Gi}
    limits: {cpu: 2, memory: 4Gi}

distributor:
  replicas: 3
  resources:
    requests: {cpu: 200m, memory: 256Mi}

querier:
  replicas: 3
  resources:
    requests: {cpu: 500m, memory: 512Mi}
    limits: {cpu: 2, memory: 2Gi}

queryFrontend:
  replicas: 2

compactor:
  enabled: true
  resources:
    requests: {cpu: 200m, memory: 256Mi}

LogQL Reference

LogQL is Loki's query language. It has two types: log queries (return log lines) and metric queries (return time series derived from log lines).

Log Stream Selectors

# Select all logs from namespace=payments
{namespace="payments"}

# Multiple label matchers
{cluster="prod", namespace="payments", container="api"}

# Regex match on label value
{namespace=~"payments|orders"}

# Negative match
{namespace!="kube-system"}

Log Pipeline Expressions

# Line filter — substring match (fastest)
{namespace="payments"} |= "error"
{namespace="payments"} != "health"

# Line filter — regex
{namespace="payments"} |~ "ERROR|FATAL"
{namespace="payments"} !~ "GET /health|GET /metrics"

# JSON parser — extract fields from JSON log body
{namespace="payments"} | json

# JSON parser with field aliases
{namespace="payments"} | json level="level", traceId="trace_id"

# Logfmt parser (key=value format)
{namespace="payments"} | logfmt

# Pattern parser (positional; the capture names below are illustrative placeholders)
{namespace="payments"} | pattern `<_> <level> <msg>`

# Label filter — after parsing
{namespace="payments"} | json | level = "error"
{namespace="payments"} | json | status_code >= 500
{namespace="payments"} | json | duration_ms > 1000

# Line format — reshape output
{namespace="payments"} | json | line_format "{{.level}} {{.message}} traceId={{.trace_id}}"

# Label format — rename/derive labels
{namespace="payments"} | json | label_format service=app, lvl=level

Metric Queries

# Count error log rate per service (last 5m window)
sum by (app) (rate({namespace="production"} |= "error" [5m]))

# P99 request duration from logs (requires duration_ms field)
quantile_over_time(0.99,
  {namespace="payments"} | json | unwrap duration_ms [5m]
) by (app)

# Error rate as percentage of all logs
sum(rate({namespace="payments"} | json | level="error" [5m]))
  /
sum(rate({namespace="payments"} [5m]))

# Bytes ingested per namespace
sum by (namespace) (bytes_rate({cluster="prod"} [5m]))

# Log volume (lines per second) per service
sum by (app) (rate({namespace="production"} [1m]))

# Count distinct user_id values seen in the last hour
count(sum by (user_id) (count_over_time({namespace="payments"} |= "user_id" | json | __error__="" [1h])))

Useful Operational Queries

# Find all errors (select the window, e.g. the last 15 minutes, with the time picker or API start/end)
{namespace="production"} | json | level="error" | line_format "{{.timestamp}} {{.service}} {{.message}}"

# Trace a request by trace_id
{cluster="prod"} | json | trace_id = "4bf92f3577b34da6a3ce929d0e0e4736"

# Find OOMKilled events across all namespaces (matches only if Kubernetes events or kubelet logs are shipped to Loki)
{cluster="prod"} |= "OOMKilled"

# Slow queries (> 5s)
{namespace="payments"} | json | duration_ms > 5000 | line_format "{{.timestamp}} {{.message}} dur={{.duration_ms}}ms"

# Top 10 most frequent error messages over the last hour
topk(10, sum by (message) (count_over_time({namespace="payments"} | json | level="error" [1h])))
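
The same queries can be issued outside Grafana through Loki's HTTP API. The sketch below only builds a `query_range` request URL and performs no network call; the function name `build_query_range_url` and the gateway address are assumptions for illustration.

```python
from urllib.parse import urlencode


def build_query_range_url(base_url: str, logql: str, start_ns: int,
                          end_ns: int, limit: int = 100) -> str:
    """Build a GET URL for Loki's /loki/api/v1/query_range endpoint."""
    params = {
        "query": logql,
        "start": str(start_ns),   # Unix epoch in nanoseconds
        "end": str(end_ns),
        "limit": str(limit),
        "direction": "backward",  # newest entries first
    }
    return f"{base_url.rstrip('/')}/loki/api/v1/query_range?{urlencode(params)}"


if __name__ == "__main__":
    url = build_query_range_url(
        "http://loki-gateway.monitoring.svc",  # hypothetical in-cluster address
        '{namespace="payments"} | json | level="error"',
        start_ns=1705313000000000000,
        end_ns=1705313900000000000,
    )
    print(url)
```

Passing the URL to any HTTP client returns the matching log lines as JSON; when `auth_enabled` is true, the request also needs an `X-Scope-OrgID` header naming the tenant.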

Loki Recording Rules

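# LogQL rules are evaluated by the Loki ruler, not by Prometheus. A PrometheusRule object
# only takes effect if something syncs it to the ruler; otherwise load the groups below
# from the ruler's own rule storage.
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule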
metadata:
  name: loki-log-rates
  namespace: monitoring
spec:
  groups:
    - name: log-rates
      interval: 1m
      rules:
        - record: namespace:log_lines:rate5m
          expr: |
            sum by (namespace) (
              rate({cluster="prod"} [5m])
            )
        - record: namespace_app:log_errors:rate5m
          expr: |
            sum by (namespace, app) (
              rate({cluster="prod"} | json | level="error" [5m])
            )

Elasticsearch / OpenSearch

Elasticsearch (and its open-source fork OpenSearch) indexes every field in every log record, enabling full-text search and complex aggregations. This power comes at significantly higher resource cost (10–20× more storage and CPU than Loki for equivalent log volume).

ECK (Elastic Cloud on Kubernetes)

# Install ECK Operator
kubectl create -f https://download.elastic.co/downloads/eck/2.11.0/crds.yaml
kubectl apply -f https://download.elastic.co/downloads/eck/2.11.0/operator.yaml

# Production Elasticsearch cluster
apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
  name: prod-logs
  namespace: logging
spec:
  version: 8.12.0
  nodeSets:
    - name: hot
      count: 3
      config:
        node.roles: [master, data_hot, data_content, ingest]
        xpack.security.enabled: true
      podTemplate:
        spec:
          containers:
            - name: elasticsearch
              resources:
                requests: {cpu: 2, memory: 8Gi}
                limits: {cpu: 4, memory: 8Gi}
              env:
                - name: ES_JAVA_OPTS
                  value: "-Xms4g -Xmx4g"
      volumeClaimTemplates:
        - metadata:
            name: elasticsearch-data
          spec:
            storageClassName: gp3
            resources:
              requests:
                storage: 500Gi
    - name: warm
      count: 2
      config:
        node.roles: [data_warm]
      volumeClaimTemplates:
        - metadata:
            name: elasticsearch-data
          spec:
            storageClassName: gp2     # cheaper storage for warm tier
            resources:
              requests:
                storage: 2Ti

Index Lifecycle Management (ILM)

{
  "policy": {
    "phases": {
      "hot": {
        "min_age": "0ms",
        "actions": {
          "rollover": {
            "max_primary_shard_size": "50gb",
            "max_age": "1d"
          },
          "set_priority": {"priority": 100}
        }
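      },
      "warm": {
        "min_age": "3d",
        "actions": {
          "shrink": {"number_of_shards": 1},
          "forcemerge": {"max_num_segments": 1},
          "set_priority": {"priority": 50}
        }
      },
      "cold": {
        "min_age": "30d",
        "actions": {
          "set_priority": {"priority": 0}
        }
      },
      "delete": {
        "min_age": "90d",
        "actions": {"delete": {}}
      }
    }
  }
}

Tier movement in this policy relies on the data tier node roles (data_hot, data_warm) declared in the cluster manifest. ILM's implicit migrate action moves indices between tiers, so the policy sets no attribute-based allocate actions, which would require a custom node attribute such as node.attr.data. The freeze action is not used because it is deprecated and has no effect in Elasticsearch 8.x. Without data_cold nodes, indices in the cold phase remain on the warm tier.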
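
To check how the min_age thresholds partition an index's lifetime, the sketch below maps an index age to the phase this policy would place it in. It is a simplified model with a hypothetical function name, `phase_for_age`; in Elasticsearch, min_age for the phases after hot is measured from rollover, and phase transitions are evaluated periodically rather than instantly.

```python
# (phase, min_age in days) taken from the policy above, in order.
PHASES = [("hot", 0), ("warm", 3), ("cold", 30), ("delete", 90)]


def phase_for_age(age_days: float) -> str:
    """Return the last phase whose min_age has been reached."""
    current = PHASES[0][0]
    for name, min_age_days in PHASES:
        if age_days >= min_age_days:
            current = name
    return current


if __name__ == "__main__":
    for age in (0, 2, 3, 45, 90):
        print(age, phase_for_age(age))
```

Reading the thresholds this way shows that an index spends up to 3 days hot, 27 days warm, and 60 days cold before deletion.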

Elasticsearch vs Loki Comparison

| Aspect | Elasticsearch / OpenSearch | Grafana Loki |
|---|---|---|
| Indexing | Full inverted index of all fields | Labels only; log body compressed |
| Storage cost | High (roughly 10–20× Loki for the same log volume) | Low (2–5× compressed) |
| Full-text search | Excellent (analyzed text fields) | Via line filter (regex scan) |
| Query language | Lucene / KQL / EQL | LogQL |
| Scalability | Complex (shard management) | Simpler (object store backed) |
| Operational complexity | High (JVM tuning, ILM, shards) | Medium |
| Grafana integration | Via data source | Native (first-class) |
| Best for | Security events, full-text search, compliance | Application logs, cost-sensitive, correlated observability |

OpenTelemetry Logs

The OpenTelemetry Collector can act as a log collection and forwarding pipeline, replacing Fluent Bit or running alongside it. It enables unified signal collection — the same collector handles metrics, traces, and logs.

OTel Collector Log Pipeline

# OTel Collector config for log collection
receivers:
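  filelog:
    include:
      # Read the pod log directories directly so the path regex below matches
      - /var/log/pods/*/*/*.log
    exclude:
      # Skip the collector's own logs (assumes its pod name starts with "otelcol")
      - /var/log/pods/*_otelcol*_*/*/*.log
    start_at: beginning
    include_file_path: true
    include_file_name: false
    operators:
      # Parse CRI log format
      - type: regex_parser
        id: parse_cri
        regex: '^(?P<time>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{9}Z)\s(?P<stream>stdout|stderr)\s(?P<logtag>[^ ]*)\s(?P<log>.*)$'
        timestamp:
          parse_from: attributes.time
          layout: '%Y-%m-%dT%H:%M:%S.%LZ'
      # Extract Kubernetes metadata from file path
      # (/var/log/pods/<ns>_<pod>_<uid>/<container>/<n>.log)
      - type: regex_parser
        id: parse_k8s
        regex: '\/(?P<namespace>[^_]+)_(?P<pod_name>[^_]+)_(?P<uid>[^\/]+)\/(?P<container_name>[^\/]+)\/'
        parse_from: attributes["log.file.path"]
      # Promote identifiers to resource attributes so k8sattributes can associate the record
      - type: move
        from: attributes.uid
        to: resource["k8s.pod.uid"]
      - type: move
        from: attributes.container_name
        to: resource["k8s.container.name"]
      # Try to parse JSON log body
      - type: json_parser
        id: parse_json
        parse_from: attributes.log
        if: 'attributes.log matches "^\\{"'
        on_error: send
      # Move log to body
      - type: move
        from: attributes.log
        to: body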

  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317

processors:
  batch:
    timeout: 5s
    send_batch_size: 8192
  memory_limiter:
    check_interval: 1s    # required: how often memory usage is sampled
    limit_mib: 512
    spike_limit_mib: 128
  k8sattributes:        # enrich with K8s metadata
    extract:
      metadata:
        - k8s.namespace.name
        - k8s.pod.name
        - k8s.pod.uid
        - k8s.deployment.name
        - k8s.node.name
    pod_association:
      - sources:
          - from: resource_attribute
            name: k8s.pod.uid
  resource:
    attributes:
      - key: cluster
        value: "prod-us-east-1"
        action: insert
      # Hint read by the loki exporter: promote these resource attributes to Loki labels
      - key: loki.resource.labels
        value: cluster, k8s.namespace.name, k8s.pod.name, k8s.container.name
        action: insert

exporters:
  loki:
    endpoint: http://loki-gateway.monitoring.svc/loki/api/v1/push
    default_labels_enabled:
      exporter: false
      job: true

# Tempo accepts traces only, so logs are not exported to it. Log-to-trace correlation
# comes from the trace_id carried in each log record (see the Grafana derived field setup).
service:
  pipelines:
    logs:
      receivers: [filelog, otlp]
      processors: [memory_limiter, k8sattributes, resource, batch]
      exporters: [loki]

OTel Log Data Model

The OpenTelemetry log data model provides a standard schema that maps to existing formats:

| OTel Field | Description | Maps to (Loki) |
|---|---|---|
| Timestamp | Event time (nanoseconds) | Log timestamp |
| SeverityNumber | 1–24 (TRACE, DEBUG, INFO, WARN, ERROR, FATAL) | Label level |
| SeverityText | Original severity string | Label severity |
| Body | Log message (any type) | Log line |
| TraceId | W3C trace ID (16 bytes) | Indexed attribute |
| SpanId | W3C span ID (8 bytes) | Indexed attribute |
| Attributes | Key-value pairs (structured fields) | Parsed fields |
| Resource | Origin resource (service, host, k8s) | Labels |
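
The SeverityNumber range splits into six bands of four values each. The Python sketch below converts between numbers and band names; the helper names and the alias table (for example "warning" mapping to WARN) are hypothetical conveniences, not part of the OpenTelemetry specification.

```python
# SeverityNumber bands from the OpenTelemetry log data model.
SEVERITY_RANGES = {
    "TRACE": (1, 4),
    "DEBUG": (5, 8),
    "INFO": (9, 12),
    "WARN": (13, 16),
    "ERROR": (17, 20),
    "FATAL": (21, 24),
}

# Hypothetical aliases for level names that common loggers emit.
ALIASES = {"WARNING": "WARN", "ERR": "ERROR", "CRITICAL": "FATAL"}


def severity_text(number: int) -> str:
    """Return the band name for a SeverityNumber, or UNSPECIFIED."""
    for name, (low, high) in SEVERITY_RANGES.items():
        if low <= number <= high:
            return name
    return "UNSPECIFIED"


def severity_number(text: str) -> int:
    """Return the lowest SeverityNumber of the band, or 0 if unknown."""
    name = text.strip().upper()
    name = ALIASES.get(name, name)
    return SEVERITY_RANGES.get(name, (0, 0))[0]


if __name__ == "__main__":
    print(severity_number("warning"), severity_text(18))
```

Mapping application level strings to the numeric bands at ingestion time lets queries compare severities numerically instead of matching many spellings.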

Kubernetes Component Logs

Control-plane component logs are critical for diagnosing cluster-level issues. Their collection differs from application logs because they may run as static pods, system services, or managed cloud services.

kube-apiserver

# Static pod — logs via kubectl
kubectl -n kube-system logs kube-apiserver-<node> --tail=100 --since=1h

# Increase verbosity temporarily (restart required for static pods)
# Edit /etc/kubernetes/manifests/kube-apiserver.yaml and add --v=4

# Key log patterns to watch:
# "too many requests" — client throttling
# "Timeout: request did not complete" — etcd latency
# "DENY" — admission controller rejection
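# The path below exists only if the API server is configured to log to a file;
# on kubeadm clusters, pipe the kubectl logs output into grep instead.
grep -i "too many requests\|timeout\|DENY" /var/log/kubernetes/apiserver.log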

kubelet

# kubelet runs as a systemd service (not a pod)
journalctl -u kubelet -n 500 --since "1 hour ago"
journalctl -u kubelet -f    # follow

# Increase kubelet verbosity (kubelet restart required)
# Edit /etc/systemd/system/kubelet.service.d/10-kubeadm.conf
# Add --v=4 to KUBELET_EXTRA_ARGS, then run: systemctl daemon-reload && systemctl restart kubelet

# Key kubelet log patterns:
# "orphaned pod" — stale pod directories
# "failed to get container info" — runtime disconnected
# "evicted" — resource pressure eviction
journalctl -u kubelet | grep -i "evict\|oom\|failed to\|error"

containerd

journalctl -u containerd -n 200

# containerd log level (edit /etc/containerd/config.toml)
[debug]
  level = "warn"   # info = verbose, warn = quieter

# Key patterns:
# "failed to pull image" — registry auth or network
# "failed to create shim" — runc error (OCI runtime issue)
# "context deadline exceeded" — slow image pulls

etcd

kubectl -n kube-system logs etcd-<node> --tail=100

# Key patterns:
# "slow fdatasync" — disk I/O issue (use SSD with <1ms fsync)
# "leader changed" — leader election event (check disk/network)
# "failed to send message" — inter-etcd network issue
# "applying snapshot" — member catching up (acceptable on rejoin)

# etcd verbosity (add to manifest):
# --log-level=warn   (options: debug, info, warn, error, panic, fatal)

Control-Plane Log Verbosity Levels

| --v Level | Content | Use When |
|---|---|---|
| 0 | Always-visible messages (errors, panics) | Normal production |
| 1 | Basic high-level info | Normal production |
| 2 | Steady-state changes, reconciliation loops | Default for most components |
| 3 | Extended info, informational changes | Light debugging |
| 4 | Debug-level logging | Active debugging session |
| 5–6 | Verbose: each API call logged | Deep debugging (high volume) |
| 7–10 | Extremely verbose | Core development only |

kubectl logs & Tooling

kubectl logs Reference

# Basic usage
kubectl logs <pod>
kubectl logs <pod> -c <container>          # specific container in multi-container pod
kubectl logs <pod> --previous              # logs from the previous (crashed) container

# Streaming
kubectl logs -f <pod>                      # follow (like tail -f)
kubectl logs -f deployment/<name>          # follow deployment (picks one pod)

# Filtering by time
kubectl logs <pod> --since=1h             # last 1 hour
kubectl logs <pod> --since-time="2024-01-15T10:00:00Z"

# Limit output
kubectl logs <pod> --tail=100             # last 100 lines

# All pods matching label selector
kubectl logs -l app=payments --all-containers=true --prefix=true

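# All containers in all pods of a deployment: select by label, because deployment/<name> follows only one pod
# (assumes the pods carry the label app=payments-api)
kubectl logs -f -l app=payments-api --all-containers=true --max-log-requests=10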

Stern — Multi-Pod Log Streaming

# Install
brew install stern     # macOS
# or: https://github.com/stern/stern/releases

# Follow all pods matching a regex in the production namespace (use --all-namespaces to search every namespace)
stern "payments.*" --namespace production

# Multiple namespaces
stern "api" --namespace production --namespace staging

# Filter by container name
stern "." --container "main" --namespace production

# Tail last N lines then follow
stern "payments" --tail 50

# Output as JSON with pod prefix
stern "payments" --output json --namespace production

# Filter with grep
stern "payments" | grep "error"

# Custom output template (end with {{"\n"}} so each record gets its own line)
stern "payments" --template '{{.PodName}} {{.ContainerName}} {{.Message}}{{"\n"}}'

k9s Log Viewer

k9s provides a TUI (terminal UI) for Kubernetes. Press l on a pod to view logs, / to filter, and Ctrl-S to save the current log buffer to a file (in the log view, s toggles autoscroll).

Log Correlation via Trace ID

# Find all logs for a specific trace (across all services)
# In Grafana: Explore → Loki → LogQL:
{cluster="prod"} | json | trace_id = "4bf92f3577b34da6a3ce929d0e0e4736"

# In the CLI, using stern with grep:
stern "." --namespace production --since 1h | grep "4bf92f3577b34da6a3ce929d0e0e4736"

# In Loki with a derived field (configured in the Grafana data source):
# "Derived fields" → regex matching your log format → internal link to Tempo
#   logfmt lines: trace_id=(\w+)
#   JSON lines:   "trace_id":"(\w+)"
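The derived-field regex can be checked offline before it goes into the Grafana data source. The sketch below assumes two log shapes (JSON and logfmt) and uses hypothetical helper names; it groups sample lines by the trace ID the patterns capture.

```python
import re
from typing import Dict, List, Optional

# Illustrative patterns: one for JSON bodies, one for logfmt bodies.
JSON_TRACE_RE = re.compile(r'"trace_id"\s*:\s*"(\w+)"')
LOGFMT_TRACE_RE = re.compile(r'\btrace_id=(\w+)')

# Made-up sample lines; the trace ID is the one used in the queries above.
SAMPLE_LINES = [
    '{"level":"error","msg":"charge failed","trace_id":"4bf92f3577b34da6a3ce929d0e0e4736"}',
    'level=info msg="retrying" trace_id=4bf92f3577b34da6a3ce929d0e0e4736',
    '{"level":"info","msg":"health check"}',
]


def extract_trace_id(line: str) -> Optional[str]:
    """Return the first trace ID found in the line, or None if there is none."""
    for pattern in (JSON_TRACE_RE, LOGFMT_TRACE_RE):
        match = pattern.search(line)
        if match:
            return match.group(1)
    return None


def group_by_trace(lines: List[str]) -> Dict[str, List[str]]:
    """Group log lines by trace ID, skipping lines without one."""
    groups: Dict[str, List[str]] = {}
    for line in lines:
        trace_id = extract_trace_id(line)
        if trace_id is not None:
            groups.setdefault(trace_id, []).append(line)
    return groups


if __name__ == "__main__":
    for trace_id, lines in group_by_trace(SAMPLE_LINES).items():
        print(trace_id, len(lines))
```

If a line that should carry a trace ID is missing from the grouping, the regex in the data source would miss it as well.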

Cardinality & Cost Management

Log Volume Estimation

# Estimate current log volume per namespace (in Loki)
sum by (namespace) (bytes_rate({cluster="prod"} [1h]))

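# Estimate bytes via Fluent Bit metrics (background the port-forward or run it in a second terminal)
kubectl port-forward ds/fluent-bit 2020:2020 -n logging &
# "output" is an object keyed by plugin instance; proc_bytes is the byte count processed per output
curl -s localhost:2020/api/v1/metrics | jq '.output | to_entries[] | {name: .key, bytes: .value.proc_bytes}'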

# Check Loki ingestion rate
curl -s http://loki-gateway:80/metrics | grep loki_distributor_bytes_received_total
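To turn these counters into a capacity estimate, a small script can read the Fluent Bit metrics JSON and project a byte rate onto a daily volume. This is a minimal sketch: the sample payload mirrors the shape the /api/v1/metrics endpoint is assumed to return (objects keyed by plugin instance), and the numbers are made up.

```python
import json
from typing import Dict

# Assumed response shape of Fluent Bit's /api/v1/metrics; values are illustrative.
SAMPLE_METRICS = json.loads("""
{
  "input":  {"tail.0": {"records": 120000, "bytes": 48000000}},
  "output": {"loki.0": {"proc_records": 118000, "proc_bytes": 47200000,
                        "errors": 0, "retries": 3, "retries_failed": 0}}
}
""")


def output_bytes(metrics: dict) -> Dict[str, int]:
    """Map each output plugin instance to the bytes it has processed."""
    return {
        name: stats.get("proc_bytes", 0)
        for name, stats in metrics.get("output", {}).items()
    }


def daily_gib(bytes_per_second: float) -> float:
    """Project a sustained byte rate onto GiB per day."""
    return bytes_per_second * 86_400 / 2**30


if __name__ == "__main__":
    print(output_bytes(SAMPLE_METRICS))
    # 2 MiB/s sustained, e.g. a value read from a bytes_rate query
    print(round(daily_gib(2 * 2**20), 1), "GiB/day")
```

The counters are cumulative since Fluent Bit started, so a rate needs two readings taken a known interval apart.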

Cost Reduction Strategies

Drop Debug/Trace Logs in Production

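Add a Fluent Bit grep FILTER rule to drop level=debug and level=trace records before forwarding. Reductions of 60–80% are plausible for verbose services, but measure volume before and after rather than assuming a figure.

[FILTER]
    Name    grep
    Match   kube.*
    # Matches the raw JSON text in the log field. If the kubernetes filter runs
    # with Merge_Log On, match the parsed key instead: Exclude level ^(debug|trace)$
    Exclude log \"level\":\"(debug|trace)\"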
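A drop rule is easy to get subtly wrong, so it helps to run the pattern against sample lines first. The sketch below uses Python's re module as a stand-in for Fluent Bit's regex engine (an approximation that holds for simple patterns like this one); the sample lines and function name are illustrative.

```python
import re

# Pattern of the kind a grep Exclude rule would apply to the raw JSON text.
DROP_PATTERN = re.compile(r'"level":"(debug|trace)"')

SAMPLES = [
    '{"level":"debug","msg":"cache miss"}',     # dropped
    '{"level":"trace","msg":"enter handler"}',  # dropped
    '{"level":"info","msg":"server started"}',  # kept
    '{"level": "debug","msg":"spaced JSON"}',   # kept: the space defeats the pattern
]


def should_drop(log_line: str) -> bool:
    """Return True when the line would be excluded by the drop rule."""
    return DROP_PATTERN.search(log_line) is not None


if __name__ == "__main__":
    for line in SAMPLES:
        print("DROP" if should_drop(line) else "KEEP", line)
```

The last sample shows why matching raw text is brittle: an encoder that emits a space after the colon slips through, which is one reason to match on a parsed field where possible.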

Throttle Noisy Sources

Fluent Bit's throttle filter caps the record rate for everything its Match pattern selects, averaged over a sliding window. Use rewrite_tag to give chatty pods their own tag, then apply a stricter throttle filter to that tag.

Sampling

For INFO logs from high-volume services, sample 10–20% using Fluent Bit's Lua filter or OTel Collector's probabilistic_sampler processor. Always forward 100% of WARN/ERROR.

Tiered Retention

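Keep the last 7 days in hot storage (fast SSD), 30 days in warm (standard disks), and 365 days in cold (for example S3 Glacier). In Elasticsearch, an ILM policy moves indices between tiers automatically. Loki has no ILM: its compactor only enforces retention by deleting expired data, so any tiering of Loki chunks is done with object-store lifecycle rules, and chunks moved to an archive class such as Glacier cannot be queried until they are restored.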

Dedup Before Ship

Avoid shipping the same log to both Loki and Elasticsearch. Use a single authoritative store; export compliance-required logs to S3 separately via a Fluent Bit S3 output with compression.

Compression

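Enable gzip or snappy compression in the Fluent Bit output and in Loki chunk storage. Repetitive JSON logs often compress by roughly 10:1, though the ratio depends on the payload. Loki's chunk codec is set by the ingester chunk_encoding option (snappy trades some ratio for faster decompression); check the value in your deployment rather than assuming a default.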
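The compression ratio is easy to measure on your own data. The sketch below generates synthetic JSON log lines (field names and values are invented for illustration) and compares raw size to gzip size; real logs will give a different ratio, so treat the result as a method rather than a benchmark.

```python
import gzip
import json


def make_sample_logs(count: int) -> bytes:
    """Build newline-delimited JSON logs with repetitive, realistic-looking fields."""
    lines = []
    for i in range(count):
        record = {
            "ts": f"2024-01-15T10:23:{i % 60:02d}Z",
            "level": "info",
            "service": "payments-api",
            "msg": "request completed",
            "path": f"/v1/orders/{i}",
            "status": 200,
            "duration_ms": 10 + i % 90,
        }
        lines.append(json.dumps(record))
    return ("\n".join(lines) + "\n").encode("utf-8")


def compression_ratio(raw: bytes) -> float:
    """Return raw size divided by gzip-compressed size."""
    return len(raw) / len(gzip.compress(raw))


if __name__ == "__main__":
    raw = make_sample_logs(10_000)
    print(f"raw={len(raw)} bytes, ratio={compression_ratio(raw):.1f}:1")
```

Feeding a captured sample of production logs through compression_ratio gives a more honest input for storage sizing than a rule of thumb.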

Fluent Bit Throttle Filter

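[FILTER]
    Name          throttle
    Match         kube.*
    # Average of at most 5,000 records per interval
    Rate          5000
    # Number of intervals the average is computed over
    Window        5
    # Interval length; with 1s the window spans 5 seconds
    Interval      1s
    # Log the current rate and limits
    Print_Status  On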
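Rate and Window interact in a way that is easier to see in code: the limit applies to the average over the window, so a short burst can exceed Rate in a single interval. The class below is a simplified model of that idea with hypothetical names, not Fluent Bit's actual implementation.

```python
from collections import deque


class SlidingWindowThrottle:
    """Allow records while the average count per interval stays within `rate`."""

    def __init__(self, rate: int, window: int) -> None:
        self.rate = rate
        self.window = window
        # One counter per interval; the rightmost slot is the current interval.
        self.counts = deque([0] * window, maxlen=window)

    def tick(self) -> None:
        """Start a new interval; the oldest counter falls out of the window."""
        self.counts.append(0)

    def allow(self) -> bool:
        """Accept the record only if the windowed average would not exceed rate."""
        average_if_accepted = (sum(self.counts) + 1) / self.window
        if average_if_accepted > self.rate:
            return False
        self.counts[-1] += 1
        return True


if __name__ == "__main__":
    throttle = SlidingWindowThrottle(rate=2, window=2)
    print([throttle.allow() for _ in range(5)])  # burst in one interval
    throttle.tick()
    print(throttle.allow())  # window still carries the earlier burst
```

In this model a burst of up to rate × window records passes in one interval, after which the source is held back until the burst ages out of the window.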

Sampling with OTel Collector

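processors:
  # Mark WARN and above so the sampler always keeps them.
  # "sampling.priority" is an illustrative attribute name.
  transform/sampling_priority:
    log_statements:
      - context: log
        statements:
          - set(attributes["sampling.priority"], 100) where severity_number >= SEVERITY_NUMBER_WARN

  # Probabilistic sampler for log records: keeps ~10% of everything else.
  # List transform/sampling_priority before it in the logs pipeline.
  probabilistic_sampler:
    sampling_percentage: 10
    # Hash on the trace ID so all records of a trace share one decision
    # (requires trace context on the log records)
    attribute_source: traceID
    # Record attribute whose value (0-100) overrides sampling_percentage
    sampling_priority: sampling.priority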
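The policy behind this configuration (keep every WARN and above, keep a fixed share of the rest, decide consistently per key) can be sketched in a few lines. This is an illustration of the technique under assumed names, not the collector's algorithm: it hashes a sampling key such as a trace ID so that related records share one decision.

```python
import hashlib

# Severities that are never sampled away.
ALWAYS_KEEP = {"WARN", "ERROR", "FATAL", "CRITICAL"}


def keep_record(level: str, sampling_key: str, sampling_percentage: float = 10.0) -> bool:
    """Decide deterministically whether a log record is forwarded."""
    if level.upper() in ALWAYS_KEEP:
        return True
    # Hash the key into one of 10,000 buckets; equal keys always land together.
    digest = hashlib.sha256(sampling_key.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 10_000
    return bucket < sampling_percentage * 100


if __name__ == "__main__":
    kept = sum(keep_record("info", f"trace-{i}") for i in range(10_000))
    print(f"kept {kept} of 10000 INFO records")
    print(keep_record("error", "trace-1"))
```

Hashing instead of drawing a random number keeps the decision stable across collectors, so a trace's INFO lines are either all present or all absent.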

Alerting on Logs

Loki supports alerting rules using LogQL metric expressions. Alerts are evaluated by the Loki ruler and routed to Alertmanager — the same Alertmanager used for Prometheus alerts. See 06-alerting.html for Alertmanager routing configuration.

Loki Alerting Rules

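# PrometheusRule is served by monitoring.coreos.com/v1. Prometheus cannot evaluate
# LogQL: these rules must be loaded into the Loki ruler (for example by a sync
# component such as Grafana Alloy's loki.rules.kubernetes), not selected by Prometheus.
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule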
metadata:
  name: loki-log-alerts
  namespace: monitoring
spec:
  groups:
    - name: log-error-rates
      interval: 1m
      rules:
        - alert: HighErrorLogRate
          expr: |
            sum by (namespace, app) (
              rate({cluster="prod"} | json | level="error" [5m])
            ) > 1
          for: 2m
          labels:
            severity: warning
            team: platform
          annotations:
            summary: "High error log rate in {{ $labels.namespace }}/{{ $labels.app }}"
            description: "Error rate is {{ $value | humanize }} errors/sec"
            runbook_url: "https://wiki/runbooks/high-error-log-rate"

        - alert: OOMKilledContainer
          expr: |
            count_over_time({cluster="prod"} |= "OOMKilled" [5m]) > 0
          labels:
            severity: critical
          annotations:
            summary: "Container OOMKilled detected"
            description: "A container was OOMKilled in the last 5 minutes"

        - alert: PanicInApplication
          expr: |
            sum by (namespace, app) (
              rate({cluster="prod"} |~ "panic:|PANIC|runtime error:" [5m])
            ) > 0
          labels:
            severity: critical
          annotations:
            summary: "Application panic detected in {{ $labels.namespace }}/{{ $labels.app }}"

        - alert: DatabaseConnectionErrors
          expr: |
            sum by (namespace, app) (
              rate({cluster="prod"} |= "connection refused" |= "database" [5m])
            ) > 0.5
          for: 3m
          labels:
            severity: warning
          annotations:
            summary: "Database connection errors in {{ $labels.namespace }}/{{ $labels.app }}"

Log-Based SLO Burn Rate Alerts

# Error rate SLO: 99.5% of requests must succeed
# Derived from HTTP status logs rather than metrics

- alert: ErrorBudgetBurnRateFast
  expr: |
    (
      sum(rate({namespace="payments"} | json | http_status >= 500 [1h]))
      /
      sum(rate({namespace="payments"} | json [1h]))
    ) > (14.4 * 0.005)   # 14.4× budget burn rate for 1h window (page fast)
  labels:
    severity: critical
  annotations:
    summary: "Error budget burning fast for payments service"
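The 14.4 multiplier is not arbitrary. Under the common multi-window convention (assumed here: a 30-day SLO period, paging when 2% of the error budget burns within 1 hour), the burn rate is the budget fraction scaled by the ratio of period to window. The function below reproduces that arithmetic; its name and defaults are illustrative.

```python
from typing import Tuple


def burn_rate_threshold(
    slo_target: float,
    budget_fraction: float,
    window_hours: float,
    slo_period_hours: float = 30 * 24,
) -> Tuple[float, float]:
    """Return (burn rate, error-ratio threshold) for one alerting window.

    budget_fraction is the share of the total error budget that may be
    consumed inside the window before the alert fires.
    """
    error_budget = 1.0 - slo_target
    burn_rate = budget_fraction * slo_period_hours / window_hours
    return burn_rate, burn_rate * error_budget


if __name__ == "__main__":
    burn, threshold = burn_rate_threshold(0.995, budget_fraction=0.02, window_hours=1)
    # Roughly 14.4 and 0.072, matching the 14.4 * 0.005 expression in the alert
    print(round(burn, 2), round(threshold, 4))
```

With a 99.5% target, the alert therefore fires when about 7.2% of requests in the last hour returned a 5xx status.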

Metrics, Alerts & Runbooks

Key Logging Infrastructure Metrics

| Metric | Source | Alert Threshold | Meaning |
| --- | --- | --- | --- |
| fluentbit_input_records_total | Fluent Bit | (none) | Records ingested from all sources |
| fluentbit_output_errors_total | Fluent Bit | >10/min | Failed deliveries to backend |
| fluentbit_output_retried_records_total | Fluent Bit | >100/min | Records being retried (backend pressure) |
| loki_distributor_ingester_append_failures_total | Loki | >0 | Ingestion failures in Loki |
| loki_ingester_wal_replay_active | Loki | >0 for >5m | Ingester replaying WAL; possible crash recovery |
| loki_query_frontend_retries | Loki | >10/min | Query retries: querier under pressure |
| loki_compactor_runs_total | Loki | Stops increasing (should run regularly) | Compaction health |

Alert Rules

groups:
  - name: logging-infrastructure
    rules:
      - alert: FluentBitOutputErrors
        expr: rate(fluentbit_output_errors_total[5m]) > 0.1
        for: 5m
        labels: {severity: warning}
        annotations:
          summary: "Fluent Bit output errors — logs may be lost"
          runbook: "Check output plugin config and backend connectivity"

      - alert: LokiIngestionFailing
        expr: rate(loki_distributor_ingester_append_failures_total[5m]) > 0
        for: 3m
        labels: {severity: critical}
        annotations:
          summary: "Loki ingestion failures — check ingester health"

      - alert: LokiQueryLatencyHigh
        expr: histogram_quantile(0.99, sum by (le) (rate(loki_request_duration_seconds_bucket{route="loki_api_v1_query_range"}[5m]))) > 30
        for: 5m
        labels: {severity: warning}
        annotations:
          summary: "Loki query p99 latency > 30s — user queries degraded"

      - alert: FluentBitBufferDiskFull
        expr: (1 - node_filesystem_avail_bytes{mountpoint="/var/log/flb-storage"} / node_filesystem_size_bytes{mountpoint="/var/log/flb-storage"}) > 0.8
        for: 10m
        labels: {severity: warning}
        annotations:
          summary: "Fluent Bit disk buffer > 80% full on {{ $labels.instance }}"

Runbooks

Fluent Bit Not Collecting Logs

  1. Check DaemonSet pod status: kubectl get ds fluent-bit -n logging
  2. Check pod logs: kubectl logs ds/fluent-bit -n logging --tail=50
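  3. Verify hostPath mount: kubectl exec -n logging ds/fluent-bit -- ls /var/log/containers/ (needs an image with a shell, such as the -debug variant; the default image is distroless)
  4. Check RBAC: kubectl auth can-i list pods --all-namespaces --as=system:serviceaccount:logging:fluent-bit
  5. Check Fluent Bit metrics: kubectl port-forward ds/fluent-bit 2020:2020 -n logging, then curl localhost:2020/api/v1/metrics/prometheus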

Logs Missing in Loki

  1. Verify Fluent Bit output has no errors: check fluentbit_output_errors_total
  2. Test Loki endpoint directly: curl -v http://loki-gateway/loki/api/v1/labels
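  3. Check Loki distributor logs: kubectl logs -n monitoring -l app=loki-distributed-distributor
  4. Verify the label set is valid (label names match [a-zA-Z_][a-zA-Z0-9_]*, no high-cardinality labels) and check loki_discarded_samples_total by reason for rejected records
  5. Check timestamps and retention: widen the query time range in case records arrived with skewed timestamps or have already expired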
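Step 4 can be automated as a pre-flight check on the label set the collector attaches. The sketch below validates label names against the Prometheus-style naming rule and flags labels from an illustrative deny-list of unbounded values; the list and function name are assumptions to adapt.

```python
import re
from typing import Dict, List

# Label names must start with a letter or underscore, then letters, digits, underscores.
LABEL_NAME_RE = re.compile(r"^[a-zA-Z_][a-zA-Z0-9_]*$")

# Illustrative deny-list of labels that tend to be unbounded.
HIGH_CARDINALITY = {"pod", "pod_name", "user_id", "request_id", "trace_id"}


def check_labels(labels: Dict[str, str]) -> List[str]:
    """Return a list of human-readable problems found in a label set."""
    problems = []
    for name in labels:
        if not LABEL_NAME_RE.match(name):
            problems.append(f"invalid label name: {name}")
        elif name in HIGH_CARDINALITY:
            problems.append(f"high-cardinality label: {name}")
    return problems


if __name__ == "__main__":
    print(check_labels({"namespace": "production", "app": "payments"}))
    print(check_labels({"app.kubernetes.io/name": "payments", "request_id": "r-1"}))
```

Kubernetes label keys containing dots or slashes are a frequent source of invalid names; collectors usually need a rename or relabel step before such keys become Loki labels.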

Loki Query Timeout

  1. Add stream selectors to narrow scan: avoid {cluster="prod"} alone
  2. Reduce time range of query
  3. Check querier CPU/memory: kubectl top pod -n monitoring -l app=loki-distributed-querier
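  4. Increase query_timeout in limits_config
  5. Tune query parallelism: confirm parallelise_shardable_queries is enabled under query_range, then adjust split_queries_by_interval and max_query_parallelism in limits_config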

Log Volume Spike

  1. Find top namespaces: LogQL topk(5, sum by(namespace)(bytes_rate({cluster="prod"}[5m])))
  2. Find chatty pods: topk(10, sum by(pod)(rate({cluster="prod"}[5m]))) (pod must be a stream label; if it is kept in the log body, add | json before the range)
  3. Apply throttle filter for that pod label
  4. Check for log loop (service logging its own log output)
  5. Drop debug logs from offending service

OOMKilled Log Collector

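  1. Check log volume: a sudden spike may have filled the in-memory input buffer
  2. Reduce Mem_Buf_Limit so the input holds less in memory (with storage.type memory the input pauses at the limit; with filesystem, excess chunks stay on disk)
  3. Increase memory limits in the DaemonSet if steady-state volume requires it
  4. Add a throttle filter so noisy sources are capped before buffering
  5. Switch storage.type to filesystem and cap in-memory chunks with storage.max_chunks_up to reduce memory pressure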

Best Practices

  1. Always use structured JSON logging. Free-text logs require brittle regex parsers and cannot support field-level alerting, aggregation, or precise filtering at scale.
  2. Always inject trace_id and span_id into log records. This enables one-click navigation from a log line to the originating distributed trace in Grafana (Loki → Tempo derived fields).
  3. Keep Loki label cardinality low. Never use pod name, user ID, request ID, or any unbounded value as a Loki label. These belong in the log body, queryable via | json pipeline expressions.
  4. Buffer to disk in Fluent Bit. Set storage.type filesystem and configure a hostPath volume. In-memory buffering loses logs on node memory pressure or Fluent Bit OOMKill.
  5. Exclude health-check and metrics-scrape logs. Kubelet liveness and readiness probes and Prometheus scrapes can generate thousands of lines per hour with little diagnostic value. Drop them with a Fluent Bit grep filter.
  6. Set per-namespace retention policies in Loki. Production namespaces may need 90-day retention for compliance; dev/staging namespaces can use 7-day retention. Use retention_stream selectors on the namespace label in limits_config, or per-tenant overrides when tenants map to namespaces.
  7. Sample by severity. Always forward 100% of ERROR and WARN logs. Apply sampling (10–20%) only to INFO logs from high-volume services. Never sample FATAL or CRITICAL.
  8. Test log collection in your deployment pipeline. Run a canary pod that emits known log patterns and write a test asserting those patterns appear in the logging backend within 60 seconds of deployment.
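
The pipeline test in item 8 can be a short polling loop against Loki's query_range HTTP API. The sketch below is one way to write it under stated assumptions: the canary pod carries an app="log-canary" label and emits a unique marker string, both hypothetical, and the fetch function is injected so the logic can be exercised without a cluster.

```python
import json
import time
import urllib.parse
import urllib.request


def query_loki(base_url: str, logql: str, lookback_seconds: int = 300) -> dict:
    """Run a LogQL query against /loki/api/v1/query_range and return the parsed JSON."""
    end_ns = time.time_ns()
    start_ns = end_ns - lookback_seconds * 1_000_000_000
    params = urllib.parse.urlencode(
        {"query": logql, "start": start_ns, "end": end_ns, "limit": 10}
    )
    url = f"{base_url}/loki/api/v1/query_range?{params}"
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)


def count_lines(response: dict) -> int:
    """Count log lines across all streams in a query_range response."""
    streams = response.get("data", {}).get("result", [])
    return sum(len(stream.get("values", [])) for stream in streams)


def wait_for_canary(fetch, marker, timeout_seconds=60, poll_seconds=5,
                    sleep=time.sleep, clock=time.monotonic) -> bool:
    """Poll until the marker shows up in the backend or the timeout expires."""
    # app="log-canary" is an assumed label on the canary pod's log stream.
    logql = f'{{app="log-canary"}} |= "{marker}"'
    deadline = clock() + timeout_seconds
    while True:
        if count_lines(fetch(logql)) > 0:
            return True
        if clock() >= deadline:
            return False
        sleep(poll_seconds)
```

In a deployment pipeline this would be called as wait_for_canary(lambda q: query_loki("http://loki-gateway", q), marker), failing the stage when it returns False.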

Coverage Details
  • Kubernetes logging architecture: stdout/stderr → CRI → node files
  • CRI log format (/var/log/containers/, /var/log/pods/)
  • Docker vs containerd log format differences
  • kubelet log rotation flags (container-log-max-size, max-files)
  • Four log collection patterns: DaemonSet / sidecar / streaming sidecar / direct SDK
  • Pattern comparison table (overhead, app changes, file logging)
  • Structured logging: standard field schema (15 fields)
  • Go: Zap logger with trace context injection
  • Java: Logback + logstash-logback-encoder JSON config
  • Python: structlog with contextvars for trace injection
  • Node.js: pino reference
  • Fluent Bit vs Fluentd comparison table
  • Fluent Bit pipeline model: INPUT → PARSER → FILTER → BUFFER → OUTPUT
  • Fluent Bit Helm install command
  • Production Fluent Bit ConfigMap (filesystem storage, kubernetes filter, Loki output)
  • Fluent Bit parsers.conf (docker, CRI, JSON parsers)
  • Fluent Bit DaemonSet YAML (tolerations, hostPath volumes, RBAC)
  • Multiline log handling (built-in parsers + custom MULTILINE_PARSER)
  • Loki architecture: distributor/ingester/compactor/querier/store gateway
  • Label cardinality warning (anti-patterns: pod_name, user_id, request_id)
  • Recommended Loki label set
  • Loki Helm install (single-binary and distributed modes)
  • Loki distributed values YAML (TSDB schema v13, S3, retention, limits_config)
  • LogQL: log stream selectors, label matchers, regex match
  • LogQL pipeline: line filters, JSON/logfmt/pattern parsers, label filters, line_format, label_format
  • LogQL metric queries: rate, count_over_time, quantile_over_time, bytes_rate, unwrap
  • Useful operational LogQL queries (error trace, OOMKilled, slow queries, top errors)
  • Loki recording rules via PrometheusRule CRD
  • Elasticsearch/OpenSearch: ECK operator install and cluster YAML (hot/warm tiers)
  • Index Lifecycle Management (ILM) JSON: hot/warm/cold/delete phases
  • Elasticsearch vs Loki comparison table
  • OTel Collector filelog receiver config (CRI parser, K8s path regex, JSON merge)
  • OTel Collector k8sattributes processor for metadata enrichment
  • OTel Log data model fields (Timestamp, SeverityNumber, Body, TraceId, SpanId, Resource)
  • Kubernetes component logs: kube-apiserver, kubelet, containerd, etcd commands
  • klog verbosity levels (--v 0–10)
  • kubectl logs reference: --previous, --since, --tail, --follow, label selector, deployment
  • Stern multi-pod log streaming (install, usage patterns, output templates)
  • Log correlation via trace_id (Loki LogQL, Grafana derived fields)
  • Log volume estimation with LogQL bytes_rate and Fluent Bit metrics API
  • Cost reduction strategies: drop debug logs, throttle filter, sampling, tiered retention, dedup, compression
  • Fluent Bit throttle filter and OTel probabilistic_sampler
  • Loki alerting rules via PrometheusRule: HighErrorLogRate, OOMKilled, Panic, DB errors
  • Log-based SLO burn rate alert pattern
  • 7 logging infrastructure metrics table with alert thresholds
  • 4 PrometheusRule alert rules (FluentBitOutputErrors, LokiIngestionFailing, QueryLatencyHigh, BufferDiskFull)
  • 5 runbooks (not collecting, missing in Loki, query timeout, volume spike, OOMKilled collector)
  • 8 best practices (structured JSON, trace injection, label cardinality, disk buffer, health exclusion, retention, sampling policy, pipeline testing)