Logging in Kubernetes

Complete guide to log collection, structured logging, Fluent Bit, Loki/LogQL, Elasticsearch, OpenTelemetry logs, and production log management at scale.

Kubernetes Logging Architecture

Kubernetes does not provide a built-in log aggregation or persistence system. Containers write to stdout/stderr; the container runtime captures those streams and writes them to log files on the node. Kubernetes exposes these via kubectl logs but does not ship them anywhere. Cluster operators are responsible for collecting, forwarding, and storing logs.

Logs Are Ephemeral by Default

Pod logs exist only while the pod exists on the node, subject to log rotation. If a pod is evicted, rescheduled, or deleted, its logs disappear. A log collector DaemonSet must tail files in real-time to avoid data loss.

Node-Level Log File Paths

The container runtime writes logs at a location determined by its log driver. The kubelet creates stable symlinks:

Container stdout/stderr
        │
        ▼
Container runtime (containerd / CRI-O)
        │  writes to:
        │    /var/log/pods/<ns>_<pod>_<uid>/<container>/<n>.log
        │
        ├── Kubelet creates symlinks:
        │     /var/log/containers/<pod>_<ns>_<container>-<id>.log
        │       → /var/log/pods/.../<n>.log
        │
        ├── Log rotation managed by kubelet:
        │     --container-log-max-size=10Mi   (default: 10Mi per file)
        │     --container-log-max-files=5     (default: 5 files per container)
        │
        └── kubectl logs reads from:
              /var/log/pods/... via the kubelet API
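The symlink naming scheme can be taken apart mechanically, which is roughly what a node-level collector does to derive pod metadata from a file name. The sketch below is a minimal illustration, not how any particular collector implements it; the function name is hypothetical and it assumes the container ID is a 64-character hex string, as containerd typically produces.

```python
import re

# /var/log/containers/<pod>_<namespace>_<container>-<container-id>.log
# Pod and namespace names cannot contain underscores, so "_" is a safe separator.
CONTAINER_LOG_RE = re.compile(
    r"^(?P<pod>[^_]+)_(?P<namespace>[^_]+)_(?P<container>.+)-(?P<container_id>[0-9a-f]{64})\.log$"
)


def parse_container_log_name(filename: str) -> dict:
    """Split a symlink name from /var/log/containers/ into its parts."""
    match = CONTAINER_LOG_RE.match(filename)
    if match is None:
        raise ValueError(f"not a container log name: {filename!r}")
    return match.groupdict()


# Hypothetical file name; the container ID is a made-up 64-character hex string.
example_name = "order-7d5f9-xk2pq_production_order-service-" + "ab12" * 16 + ".log"
parts = parse_container_log_name(example_name)
print(parts["namespace"], parts["pod"], parts["container"])
```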

CRI Log Format

Since Kubernetes 1.14, the CRI log format (used by containerd and CRI-O) prefixes each line with metadata:

# Format: <timestamp> <stream> <flags> <log message>
2024-01-15T10:23:45.123456789Z stdout F {"level":"info","msg":"server started","port":8080}
2024-01-15T10:23:45.124000000Z stderr F Error: connection refused to db:5432
# Flags: F = full line, P = partial line (multiline handling needed)
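To make the role of the F and P flags concrete, here is a minimal sketch of a CRI line parser that joins partial chunks until a full line arrives. It is an illustration only: the names are hypothetical, chunks are tracked per stream, and the timestamp of the final chunk is used for the joined record.

```python
from typing import Iterable, Iterator, NamedTuple


class CriRecord(NamedTuple):
    timestamp: str
    stream: str
    message: str


def parse_cri_lines(lines: Iterable[str]) -> Iterator[CriRecord]:
    """Parse CRI-formatted lines, joining P (partial) chunks until an F line."""
    pending = {}  # stream name -> list of chunks seen since the last F line
    for raw in lines:
        fields = raw.rstrip("\n").split(" ", 3)
        if len(fields) == 3:
            fields.append("")  # an empty log message
        timestamp, stream, flag, message = fields
        pending.setdefault(stream, []).append(message)
        if flag == "F":
            # The record carries the timestamp of its final chunk.
            yield CriRecord(timestamp, stream, "".join(pending.pop(stream)))


sample = [
    '2024-01-15T10:23:45.123456789Z stdout P {"level":"info",',
    '2024-01-15T10:23:45.123456790Z stdout F "msg":"server started"}',
    "2024-01-15T10:23:45.124000000Z stderr F Error: connection refused to db:5432",
]
records = list(parse_cri_lines(sample))
for record in records:
    print(record.stream, record.message)
```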

Docker vs containerd Log Formats

Docker's default json-file driver wraps each log line in JSON: {"log":"...\n","stream":"stdout","time":"..."}. containerd uses the CRI format above. Fluent Bit parsers must match the actual runtime in your cluster: the CONTAINER-RUNTIME column of kubectl get nodes -o wide shows the runtime and its version.

Log Retention on Node

kubelet Flag	Default	Effect
`--container-log-max-size`	10Mi	Max size before rotation
`--container-log-max-files`	5	Number of rotated files to keep
`--node-status-update-frequency`	10s	How often the kubelet reports node status; it does not control log rotation or retention
Total per-container storage	~50Mi	max-size × max-files
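The last row is simple arithmetic, and the same product gives a rough upper bound for a whole node. The helper below is a hypothetical sketch with the kubelet defaults as parameters; the figure of 110 containers is only an illustrative assumption.

```python
def node_log_budget_mib(containers: int, max_size_mib: int = 10, max_files: int = 5) -> int:
    """Rough upper bound on container log storage for one node, in MiB."""
    return containers * max_size_mib * max_files


per_container = node_log_budget_mib(1)   # 10 MiB x 5 files = 50 MiB
busy_node = node_log_budget_mib(110)     # 110 containers x 50 MiB = 5500 MiB
print(per_container, busy_node)
```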

Log Collection Patterns

There are four fundamental architectures for log collection in Kubernetes. Most production clusters use a combination of patterns 1 and 2.

1. Node-Level Agent (DaemonSet)

A log collector runs as a DaemonSet on every node, tailing /var/log/containers/. One collector per node — low overhead, zero application changes required. Works for any container writing to stdout/stderr.

Recommended for most workloads

2. Sidecar Container

A dedicated log-shipper container runs in the same pod as the application, reading logs from a shared volume. Required when the application writes to files rather than stdout, or when per-application routing logic is needed.

Use sparingly — resource overhead per pod

3. Sidecar Streaming

A sidecar reads from the application log file and re-emits to stdout. Useful for legacy applications that cannot be modified to write to stdout. The node-level agent then picks up the sidecar's stdout.

Legacy apps only

4. Direct-to-Backend

The application SDK ships logs directly to the logging backend (e.g., Datadog Agent via UDP, Loki HTTP API). Bypasses the node filesystem entirely. Requires code changes; provides richest metadata but no fallback if backend unreachable.

Cloud-native apps

Pattern Comparison

Pattern	Resource Overhead	App Changes	File Logging	Per-Pod Routing	Data Loss Risk
Node-Level DaemonSet	Low (1 agent/node)	None	No (stdout only)	Limited	If pod deleted before tail
Sidecar Shipper	High (1 sidecar/pod)	Shared volume	Yes	Yes	Low (direct ship)
Sidecar Streaming	Medium	Shared volume	Yes	No	Same as DaemonSet
Direct SDK	None extra	Yes	Yes	Full	If backend unreachable

Structured Logging

Structured logs emit machine-parseable records (JSON) rather than free-text strings. This enables log backends to index fields, run aggregations, and support precise query predicates — critical for high-volume environments.

Standard Log Fields

Field	Type	Purpose	Example
`timestamp`	RFC3339	Event time (UTC)	`"2024-01-15T10:23:45.123Z"`
`level`	string	Severity	`"info"`, `"error"`, `"warn"`
`message`	string	Human-readable summary	`"request completed"`
`service`	string	Application name	`"order-service"`
`version`	string	App version/build	`"v2.4.1"`
`trace_id`	string (hex)	W3C TraceContext — links to trace	`"4bf92f3577b34da6..."`
`span_id`	string (hex)	Span within trace	`"00f067aa0ba902b7"`
`pod`	string	Kubernetes pod name	`"order-7d5f9-xk2pq"`
`namespace`	string	Kubernetes namespace	`"production"`
`node`	string	Node hosting the pod	`"ip-10-0-1-45"`
`error`	string / object	Error details when level=error	`"connection refused"`
`duration_ms`	float	Request/operation duration	`34.7`
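`http.method`	string	HTTP verb (OTel semantic conventions; named `http.request.method` in the stable HTTP conventions)	`"POST"`
`http.status_code`	int	HTTP response code (`http.response.status_code` in the stable HTTP conventions)	`200`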
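As a dependency-free illustration of these fields, the sketch below renders records as JSON with Python's standard logging module. It is one possible approach rather than a recommended library; the JsonFormatter class and the service and version values are hypothetical.

```python
import json
import logging
from datetime import datetime, timezone


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object using the field names above."""

    def __init__(self, service: str, version: str):
        super().__init__()
        self.static = {"service": service, "version": version}

    def format(self, record: logging.LogRecord) -> str:
        ts = datetime.fromtimestamp(record.created, tz=timezone.utc)
        entry = {
            "timestamp": ts.isoformat(timespec="milliseconds").replace("+00:00", "Z"),
            "level": record.levelname.lower(),
            "message": record.getMessage(),
            **self.static,
        }
        # Values passed via `extra=` become attributes on the record.
        for key in ("trace_id", "span_id", "duration_ms"):
            if hasattr(record, key):
                entry[key] = getattr(record, key)
        if record.exc_info:
            entry["error"] = self.formatException(record.exc_info)
        return json.dumps(entry)


handler = logging.StreamHandler()  # writes to stderr unless a stream is passed
handler.setFormatter(JsonFormatter(service="order-service", version="v2.4.1"))
logger = logging.getLogger("order-service")
logger.addHandler(handler)
logger.setLevel(logging.INFO)
logger.info(
    "request completed",
    extra={"trace_id": "4bf92f3577b34da6a3ce929d0e0e4736", "duration_ms": 34.7},
)
```

Kubernetes fields such as pod, namespace, and node are usually added by the collector rather than by the application, so they are left out here.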

Go: Zap Structured Logger

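package main

import (
    "context"

    "go.opentelemetry.io/otel/trace"
    "go.uber.org/zap"
    "go.uber.org/zap/zapcore"
)

// setupLogger builds a JSON logger whose keys match the field table above.
func setupLogger() (*zap.Logger, error) {
    cfg := zap.NewProductionConfig()
    cfg.EncoderConfig.TimeKey = "timestamp"
    cfg.EncoderConfig.MessageKey = "message"
    // The production default encodes time as epoch seconds; use ISO 8601 instead.
    cfg.EncoderConfig.EncodeTime = zapcore.ISO8601TimeEncoder
    return cfg.Build()
}

// loggerFromSpan injects trace context into every log record.
func loggerFromSpan(base *zap.Logger, span trace.Span) *zap.Logger {
    sc := span.SpanContext()
    return base.With(
        zap.String("trace_id", sc.TraceID().String()),
        zap.String("span_id", sc.SpanID().String()),
        zap.Bool("trace_sampled", sc.IsSampled()),
    )
}

func handleRequest(logger *zap.Logger, span trace.Span, method, path string) {
    log := loggerFromSpan(logger, span)
    log.Info("request received",
        zap.String("http.method", method),
        zap.String("http.path", path),
    )
}

func main() {
    logger, err := setupLogger()
    if err != nil {
        panic(err)
    }
    defer logger.Sync()

    // SpanFromContext returns a no-op span when the context carries none;
    // in a real handler the span comes from the incoming request context.
    span := trace.SpanFromContext(context.Background())
    handleRequest(logger, span, "POST", "/orders")
}

// Example output with an active span (abridged; zap also emits a "caller" field):
// {"level":"info","timestamp":"2024-01-15T10:23:45.123Z","message":"request received",
//  "trace_id":"4bf92f3577b34da6...","span_id":"00f067aa0ba902b7","trace_sampled":true,
//  "http.method":"POST","http.path":"/orders"}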

Java: Logback JSON Encoder (Logstash)

<!-- pom.xml dependency -->
<dependency>
  <groupId>net.logstash.logback</groupId>
  <artifactId>logstash-logback-encoder</artifactId>
  <version>7.4</version>
</dependency>

<!-- logback-spring.xml -->
<configuration>
  <appender name="JSON" class="ch.qos.logback.core.ConsoleAppender">
    <encoder class="net.logstash.logback.encoder.LogstashEncoder">
      <fieldNames>
        <timestamp>timestamp</timestamp>
        <message>message</message>
        <logger>logger</logger>
      </fieldNames>
      <includeMdcKeyName>trace_id</includeMdcKeyName>
      <includeMdcKeyName>span_id</includeMdcKeyName>
    </encoder>
  </appender>
  <root level="INFO">
    <appender-ref ref="JSON"/>
  </root>
</configuration>

Python: structlog

import structlog

structlog.configure(
    processors=[
        structlog.contextvars.merge_contextvars,   # context-local values (trace_id etc.) bound via contextvars
        structlog.processors.add_log_level,
        structlog.processors.TimeStamper(fmt="iso", utc=True, key="timestamp"),
        structlog.processors.dict_tracebacks,
        structlog.processors.JSONRenderer(),
    ],
    logger_factory=structlog.PrintLoggerFactory(),
)

log = structlog.get_logger()

# Bind request context once per request:
structlog.contextvars.bind_contextvars(
    trace_id="4bf92f3577b34da6a3ce929d0e0e4736",
    span_id="00f067aa0ba902b7",
    service="order-service",
)

log.info("order created", order_id=12345, amount=99.50)
# {"timestamp":"2024-01-15T10:23:45.123Z","level":"info","event":"order created",
#  "order_id":12345,"amount":99.5,"trace_id":"4bf92f3577b34da6...","service":"order-service"}

Node.js: pino

pino is a widely used structured logger for Node.js: const log = pino({ level: 'info', timestamp: pino.stdTimeFunctions.isoTime }). Use pino-http middleware to automatically log HTTP requests with duration and status code.

Fluent Bit

Fluent Bit is the preferred lightweight log collector for Kubernetes (written in C, ~700KB binary, ~30MB RSS). It replaces Fluentd in most modern deployments due to its lower resource footprint. Fluent Bit reads CRI log files, enriches them with Kubernetes metadata via the API, and forwards to one or more backends.

Fluent Bit vs Fluentd

Aspect	Fluent Bit	Fluentd
Language	C	Ruby
Memory footprint	~30–50MB	~200–600MB
Plugin ecosystem	~100 plugins	~1,000 plugins
Performance	Higher (native)	Lower (Ruby GIL)
Complex routing	Limited	Excellent
Use case	Edge/DaemonSet collector	Aggregator / heavy processing
CNCF status	Graduated	Graduated

Fluent Bit Pipeline Model

INPUT (tail /var/log/containers/*.log)
  │
  ▼
PARSER (containerd / docker / json / regex)
  │
  ▼
FILTER: kubernetes   ← calls the Kubernetes API server (or the kubelet when Use_Kubelet is On)
  │                    for pod/namespace/labels/annotations
  ▼
FILTER: modify / nest / rewrite_tag / throttle / lua
  │
  ▼
BUFFER (filesystem or memory)
  │
  ▼
OUTPUT → Loki / Elasticsearch / S3 / Kafka / Splunk / CloudWatch / multiple

Helm Install

helm repo add fluent https://fluent.github.io/helm-charts
helm repo update

helm upgrade --install fluent-bit fluent/fluent-bit \
  --namespace logging \
  --create-namespace \
  --values fluentbit-values.yaml

Production ConfigMap (Loki backend)

apiVersion: v1
kind: ConfigMap
metadata:
  name: fluent-bit-config
  namespace: logging
data:
  fluent-bit.conf: |
    # Classic-mode config files only treat '#' as a comment at the start of a line,
    # so every comment below sits on its own line.
    [SERVICE]
        Flush         5
        Daemon        Off
        Log_Level     warn
        Parsers_File  parsers.conf
        HTTP_Server   On
        HTTP_Listen   0.0.0.0
        HTTP_Port     2020
        # Directory for disk-backed chunks; inputs opt in with storage.type filesystem
        storage.path  /var/log/flb-storage/
        storage.sync  normal
        storage.checksum Off
        storage.max_chunks_up 128

    [INPUT]
        Name              tail
        Path              /var/log/containers/*.log
        # Exclude fluent-bit's own logs and kube-system noise
        Exclude_Path      /var/log/containers/fluent-bit*,/var/log/containers/*_kube-system_*
        # Built-in multiline parsers; handles both runtimes
        multiline.parser  docker, cri
        Tag               kube.*
        Refresh_Interval  5
        Rotate_Wait       30
        # Buffer to disk so data survives restarts and back-pressure
        storage.type      filesystem
        Mem_Buf_Limit     64MB
        Skip_Long_Lines   On
        # Position tracking
        DB                /var/log/flb-storage/tail.db
        DB.sync           normal
        Ignore_Older      24h

    # Drop noisy health/readiness probe logs. This filter runs before the kubernetes
    # filter because Keep_Log Off removes the raw 'log' key from merged JSON records.
    [FILTER]
        Name   grep
        Match  kube.*
        Exclude log /healthz|/readyz|/livez|/metrics

    [FILTER]
        Name                kubernetes
        Match               kube.*
        Kube_URL            https://kubernetes.default.svc:443
        Kube_CA_File        /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        Kube_Token_File     /var/run/secrets/kubernetes.io/serviceaccount/token
        Kube_Tag_Prefix     kube.var.log.containers.
        # Parse JSON found in the 'log' field and nest the result under 'app'
        Merge_Log           On
        Merge_Log_Key       app
        # Drop the original 'log' key after a successful merge
        Keep_Log            Off
        # Respect pod annotations that select a custom parser or exclude the pod
        K8S-Logging.Parser  On
        K8S-Logging.Exclude On
        Labels              On
        # Annotations add cardinality; disable unless needed
        Annotations         Off

    # Add cluster identifier
    [FILTER]
        Name   modify
        Match  kube.*
        Add    cluster prod-us-east-1

    [OUTPUT]
        Name            loki
        Match           kube.*
        Host            loki-gateway.monitoring.svc.cluster.local
        Port            80
        # Nested fields use record-accessor syntax: $parent['child']
        Labels          job=fluent-bit, cluster=$cluster, namespace=$kubernetes['namespace_name'], pod=$kubernetes['pod_name'], container=$kubernetes['container_name']
        # Merge_Log_Key nests parsed application fields under 'app'
        Label_Keys      $app['level'],$app['severity']
        Remove_Keys     kubernetes,stream
        # Retry indefinitely; chunks are disk-buffered
        Retry_Limit     False
        Workers         4

  parsers.conf: |
    [PARSER]
        Name        docker
        Format      json
        Time_Key    time
        Time_Format %Y-%m-%dT%H:%M:%S.%L
        Time_Keep   On

    [PARSER]
        Name        cri
        Format      regex
        Regex       ^(?<time>[^ ]+) (?<stream>stdout|stderr) (?<flags>[^ ]*) (?<log>.*)$
        Time_Key    time
        Time_Format %Y-%m-%dT%H:%M:%S.%L%z

    [PARSER]
        Name        json
        Format      json
        Time_Key    timestamp
        Time_Format %Y-%m-%dT%H:%M:%S.%LZ

Fluent Bit DaemonSet Resource Requirements

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: fluent-bit
  namespace: logging
spec:
  selector:
    matchLabels:
      app: fluent-bit
  template:
    metadata:
      labels:
        app: fluent-bit
      annotations:
        fluentbit.io/exclude: "true"     # exclude own logs from collection
    spec:
      serviceAccountName: fluent-bit
      tolerations:
        - key: node-role.kubernetes.io/control-plane
          operator: Exists
          effect: NoSchedule
        - key: node.kubernetes.io/not-ready
          operator: Exists
          effect: NoSchedule
      containers:
        - name: fluent-bit
          image: cr.fluentbit.io/fluent/fluent-bit:3.2.0
          resources:
            requests:
              cpu: 50m
              memory: 64Mi
            limits:
              cpu: 200m
              memory: 256Mi
          volumeMounts:
            - name: varlog
              mountPath: /var/log
              readOnly: true
            - name: config
              mountPath: /fluent-bit/etc
            - name: storage
              mountPath: /var/log/flb-storage
          ports:
            - name: http
              containerPort: 2020
          livenessProbe:
            httpGet:
              path: /
              port: http
            initialDelaySeconds: 10
      volumes:
        - name: varlog
          hostPath:
            path: /var/log
        - name: config
          configMap:
            name: fluent-bit-config
        - name: storage
          hostPath:
            path: /var/lib/fluent-bit
            type: DirectoryOrCreate
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: fluent-bit
  namespace: logging
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: fluent-bit
rules:
  - apiGroups: [""]
    resources: [pods, namespaces, nodes, nodes/proxy]
    verbs: [get, list, watch]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: fluent-bit
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: fluent-bit
subjects:
  - kind: ServiceAccount
    name: fluent-bit
    namespace: logging

Multiline Log Handling

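Stack traces (Java, Python) and structured exceptions span multiple lines, and each line arrives as a separate record. The CRI P (partial) flag does not help here: the runtime sets it only when it splits a single long line into chunks, not when an application prints several lines. Fluent Bit reassembles multi-line messages with the multiline.parser option on the tail input or with the dedicated multiline filter: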

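[FILTER]
    Name            multiline
    Match           kube.*
    multiline.key_content  log
    # Built-in parsers include docker, cri, go, python, java
    multiline.parser       java

# Custom multiline rule, defined in the parsers file: a non-indented line starts
# a record and indented lines continue it. In a Python traceback the final
# "SomeError: ..." line is not indented, so this rule emits it as its own record.
[MULTILINE_PARSER]
    name          python_custom
    type          regex
    # Flush after 2 s of inactivity
    flush_timeout 2000
    rule "start_state" "/^[^\s]/" "cont"
    rule "cont"        "/^\s/"   "cont"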
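The two-rule state machine above can be sketched in a few lines of Python to show how lines are grouped. This is an illustration of the rule logic only, with a hypothetical function name, and it ignores the flush timeout.

```python
import re
from typing import Iterable, Iterator

CONT = re.compile(r"^\s")  # an indented line continues the current record


def group_multiline(lines: Iterable[str]) -> Iterator[str]:
    """Group lines into records: a start line plus any indented continuations."""
    current = []
    for line in lines:
        if current and CONT.match(line):
            current.append(line)
            continue
        if current:
            yield "\n".join(current)
        current = [line]  # any non-continuation line opens a new record
    if current:
        yield "\n".join(current)


sample_lines = [
    "Traceback (most recent call last):",
    '  File "app.py", line 3, in <module>',
    "    main()",
    "ValueError: bad input",
    "INFO next request",
]
grouped = list(group_multiline(sample_lines))
print(len(grouped))
```

The sample yields three records: the traceback header with its indented frames, the unindented exception line on its own, and the following log line. A stricter rule set would be needed to keep the exception line attached to its traceback.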

Loki & LogQL

Grafana Loki is a horizontally scalable, highly available log aggregation system. Unlike Elasticsearch, Loki does not index the contents of log lines — only the metadata labels. This makes it extremely cost-efficient. Log content is compressed and stored in object storage (S3, GCS, Azure Blob).

Loki Architecture

Fluent Bit / Promtail / OTel Collector
        │  HTTP POST /loki/api/v1/push
        ▼
┌────────────────────────────────────────────────────────┐
│                      Loki Cluster                      │
│                                                        │
│  Distributor  →  Ingester  →  Compactor                │
│  (hash ring,     (WAL,        (index/chunk             │
│   validation)     chunks)      compaction)             │
│                                                        │
│  Querier  ←  Query Frontend  ←  Grafana                │
│     │        (caching,                                 │
│     │         sharding)                                │
│     ▼                                                  │
│  Index Gateway  ←  Object Store (S3/GCS/Azure Blob)    │
│  (index lookups)   + Index Store (BoltDB / TSDB)       │
└────────────────────────────────────────────────────────┘

Label Cardinality Warning

Loki indexes labels, not log content. High-cardinality labels (e.g., pod_name with thousands of values, user_id, request_id) cause index explosion, query performance degradation, and excessive memory use in ingesters. Keep label count <10, cardinality per label <1,000. Put high-cardinality values in the log body and use |= / | json filter expressions to query them.
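A quick way to reason about these guidelines is to multiply the expected number of values per label, which gives a worst-case stream count. The sketch below is a hypothetical pre-flight check, not a Loki API; the thresholds are the guideline values from the paragraph above, and real stream counts are usually lower because label values are correlated.

```python
from math import prod

MAX_LABELS = 10
MAX_VALUES_PER_LABEL = 1_000


def check_label_set(cardinalities: dict) -> tuple:
    """Return (worst-case stream count, warnings) for a proposed label set."""
    warnings = []
    if len(cardinalities) >= MAX_LABELS:
        warnings.append(f"{len(cardinalities)} labels (guideline: fewer than {MAX_LABELS})")
    for name, count in cardinalities.items():
        if count >= MAX_VALUES_PER_LABEL:
            warnings.append(f"label {name!r} has {count} values")
    return prod(cardinalities.values()), warnings


# Hypothetical cardinalities for illustration.
streams, problems = check_label_set({"cluster": 1, "namespace": 20, "app": 50, "level": 4})
print(streams, problems)

risky_streams, risky_problems = check_label_set({"namespace": 20, "user_id": 50_000})
print(risky_streams, risky_problems)
```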

Recommended Label Set

cluster="prod-us-east-1"    # static — low cardinality
namespace="payments"         # ~10–100 values
pod="payments-api-7d5f9-xk"  # AVOID: high cardinality — use for debugging only
container="payments-api"     # ~1–5 per namespace
app="payments-api"           # from Kubernetes labels
level="error"                # from log body parsing

Loki Helm Install

helm repo add grafana https://grafana.github.io/helm-charts
helm repo update

# Single-binary mode (dev/small clusters)
helm upgrade --install loki grafana/loki \
  --namespace monitoring \
  --set deploymentMode=SingleBinary \
  --set loki.storage.type=s3 \
  --set loki.storage.s3.region=us-east-1 \
  --set loki.storage.bucketNames.chunks=my-loki-chunks \
  --set loki.auth_enabled=false \
  --set singleBinary.replicas=3

# Distributed mode (production): the same chart with one workload per component
helm upgrade --install loki grafana/loki \
  --namespace monitoring \
  --set deploymentMode=Distributed \
  --values loki-distributed-values.yaml

Loki Distributed Values (Production)

# loki-distributed-values.yaml
loki:
  auth_enabled: true
  commonConfig:
    replication_factor: 3
  storage:
    type: s3
    bucketNames:
      chunks: prod-loki-chunks
    s3:
      region: us-east-1
      insecure: false
  schemaConfig:
    configs:
      - from: "2024-01-01"
        store: tsdb              # TSDB index (Loki 2.8+)
        object_store: s3
        schema: v13
        index:
          prefix: loki_index_
          period: 24h
  limits_config:
    ingestion_rate_mb: 64
    ingestion_burst_size_mb: 128
    max_label_names_per_series: 15
    max_label_value_length: 2048
    max_streams_per_user: 50000
    max_chunks_per_query: 2000000
    retention_period: 744h       # 31 days; enforced only when compactor retention is enabled
    per_tenant_override_config: /etc/loki/overrides.yaml

ingester:
  replicas: 3
  resources:
    requests: {cpu: 500m, memory: 1Gi}
    limits: {cpu: 2, memory: 4Gi}

distributor:
  replicas: 3
  resources:
    requests: {cpu: 200m, memory: 256Mi}

querier:
  replicas: 3
  resources:
    requests: {cpu: 500m, memory: 512Mi}
    limits: {cpu: 2, memory: 2Gi}

queryFrontend:
  replicas: 2

compactor:
  enabled: true
  resources:
    requests: {cpu: 200m, memory: 256Mi}

LogQL Reference

LogQL is Loki's query language. It has two types: log queries (return log lines) and metric queries (return time series derived from log lines).

Log Stream Selectors

# Select all logs from namespace=payments
{namespace="payments"}

# Multiple label matchers
{cluster="prod", namespace="payments", container="api"}

# Regex match on label value
{namespace=~"payments|orders"}

# Negative match
{namespace!="kube-system"}

Log Pipeline Expressions

# Line filter — substring match (fastest)
{namespace="payments"} |= "error"
{namespace="payments"} != "health"

# Line filter — regex
{namespace="payments"} |~ "ERROR|FATAL"
{namespace="payments"} !~ "GET /health|GET /metrics"

# JSON parser — extract fields from JSON log body
{namespace="payments"} | json

# JSON parser with field aliases
{namespace="payments"} | json level="level", traceId="trace_id"

# Logfmt parser (key=value format)
{namespace="payments"} | logfmt

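# Pattern parser (positional); this example assumes NGINX/Apache-style access-log lines
{namespace="payments"} | pattern `<ip> - - <_> "<method> <uri> <_>" <status> <size>`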

# Label filter — after parsing
{namespace="payments"} | json | level = "error"
{namespace="payments"} | json | status_code >= 500
{namespace="payments"} | json | duration_ms > 1000

# Line format — reshape output
{namespace="payments"} | json | line_format "{{.level}} {{.message}} traceId={{.trace_id}}"

# Label format — rename/derive labels
{namespace="payments"} | json | label_format service=app, lvl=level

Metric Queries

# Count error log rate per service (last 5m window)
sum by (app) (rate({namespace="production"} |= "error" [5m]))

# P99 request duration from logs (requires duration_ms field)
quantile_over_time(0.99,
  {namespace="payments"} | json | unwrap duration_ms [5m]
) by (app)

# Error rate as a percentage of all logs
sum(rate({namespace="payments"} | json | level="error" [5m]))
  /
sum(rate({namespace="payments"} [5m]))
  * 100

# Bytes ingested per namespace
sum by (namespace) (bytes_rate({cluster="prod"} [5m]))

# Log volume (lines per second) per service
sum by (app) (rate({namespace="production"} [1m]))

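# Count lines that mention user_id and parse as valid JSON (a line count, not a distinct count)
count_over_time({namespace="payments"} |= "user_id" | json | __error__="" [1h])

# Approximate distinct user_id values: one series per value, then count the series
# (subject to the per-query series limit)
count(sum by (user_id) (count_over_time({namespace="payments"} |= "user_id" | json | __error__="" [1h])))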

Useful Operational Queries

# Find all errors in last 15 minutes
{namespace="production"} | json | level="error" | line_format "{{.timestamp}} {{.service}} {{.message}}"

# Trace a request by trace_id
{cluster="prod"} | json | trace_id = "4bf92f3577b34da6a3ce929d0e0e4736"

# Find OOMKilled events across all namespaces
{cluster="prod"} |= "OOMKilled"

# Slow queries (> 5s)
{namespace="payments"} | json | duration_ms > 5000 | line_format "{{.timestamp}} {{.message}} dur={{.duration_ms}}ms"

# Count unique error messages (top 10)
topk(10, sum by (message) (count_over_time({namespace="payments"} | json | level="error" [1h])))
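Outside Grafana, queries like these are sent to Loki's HTTP API. The sketch below only builds a query_range URL and does not send it; the helper name and the gateway host are assumptions, and timestamps are Unix epoch nanoseconds.

```python
from urllib.parse import parse_qs, urlencode, urlparse


def build_query_range_url(base_url: str, logql: str, start_ns: int, end_ns: int,
                          limit: int = 100) -> str:
    """Build a GET URL for Loki's /loki/api/v1/query_range endpoint."""
    params = {
        "query": logql,
        "start": start_ns,
        "end": end_ns,
        "limit": limit,
        "direction": "backward",  # newest entries first
    }
    return f"{base_url.rstrip('/')}/loki/api/v1/query_range?{urlencode(params)}"


# Hypothetical gateway address and a 15-minute window.
logql = '{namespace="payments"} |= "error"'
url = build_query_range_url(
    "http://loki-gateway.monitoring.svc",
    logql,
    start_ns=1705314000_000000000,
    end_ns=1705314900_000000000,
)
decoded = parse_qs(urlparse(url).query)
print(urlparse(url).path, decoded["query"][0])
```

When auth_enabled is true, requests also need an X-Scope-OrgID header naming the tenant.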

Loki Recording Rules

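# Loki does not read PrometheusRule objects on its own; a component such as
# Grafana Alloy's loki.rules.kubernetes must sync them into the Loki ruler.
apiVersion: monitoring.coreos.com/v1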
kind: PrometheusRule
metadata:
  name: loki-log-rates
  namespace: monitoring
spec:
  groups:
    - name: log-rates
      interval: 1m
      rules:
        - record: namespace:log_lines:rate5m
          expr: |
            sum by (namespace) (
              rate({cluster="prod"} [5m])
            )
        - record: namespace_app:log_errors:rate5m
          expr: |
            sum by (namespace, app) (
              rate({cluster="prod"} | json | level="error" [5m])
            )

Elasticsearch / OpenSearch

Elasticsearch (and its open-source fork OpenSearch) indexes every field in every log record, enabling full-text search and complex aggregations. This power comes at significantly higher resource cost (10–20× more storage and CPU than Loki for equivalent log volume).

ECK (Elastic Cloud on Kubernetes)

# Install ECK Operator
kubectl create -f https://download.elastic.co/downloads/eck/2.11.0/crds.yaml
kubectl apply -f https://download.elastic.co/downloads/eck/2.11.0/operator.yaml

# Production Elasticsearch cluster
apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
  name: prod-logs
  namespace: logging
spec:
  version: 8.12.0
  nodeSets:
    - name: hot
      count: 3
      config:
        node.roles: [master, data_hot, data_content, ingest]
        xpack.security.enabled: true
      podTemplate:
        spec:
          containers:
            - name: elasticsearch
              resources:
                requests: {cpu: 2, memory: 8Gi}
                limits: {cpu: 4, memory: 8Gi}
              env:
                - name: ES_JAVA_OPTS
                  value: "-Xms4g -Xmx4g"
      volumeClaimTemplates:
        - metadata:
            name: elasticsearch-data
          spec:
            storageClassName: gp3
            resources:
              requests:
                storage: 500Gi
    - name: warm
      count: 2
      config:
        node.roles: [data_warm]
      volumeClaimTemplates:
        - metadata:
            name: elasticsearch-data
          spec:
            storageClassName: gp2     # cheaper storage for warm tier
            resources:
              requests:
                storage: 2Ti

Index Lifecycle Management (ILM)

With the data-tier node roles used in the cluster above (data_hot, data_warm), ILM moves indices between tiers automatically, so the policy needs no explicit allocate rules. The freeze action is omitted because it is deprecated and has no effect in Elasticsearch 8.x.

{
  "policy": {
    "phases": {
      "hot": {
        "min_age": "0ms",
        "actions": {
          "rollover": {
            "max_primary_shard_size": "50gb",
            "max_age": "1d"
          },
          "set_priority": {"priority": 100}
        }
      },
      "warm": {
        "min_age": "3d",
        "actions": {
          "shrink": {"number_of_shards": 1},
          "forcemerge": {"max_num_segments": 1},
          "set_priority": {"priority": 50}
        }
      },
      "cold": {
        "min_age": "30d",
        "actions": {
          "set_priority": {"priority": 0}
        }
      },
      "delete": {
        "min_age": "90d",
        "actions": {"delete": {}}
      }
    }
  }
}

Elasticsearch vs Loki Comparison

Aspect	Elasticsearch / OpenSearch	Grafana Loki
Indexing	Full inverted index of all fields	Labels only; log body compressed
Storage cost	High (10–20× raw log size)	Low (2–5× compressed)
Full-text search	Excellent (analyzed text fields)	Via line filter (regex scan)
Query language	Lucene / KQL / EQL	LogQL
Scalability	Complex (shard management)	Simpler (object store backed)
Operational complexity	High (JVM tuning, ILM, shards)	Medium
Grafana integration	Via data source	Native (first-class)
Best for	Security events, full-text search, compliance	Application logs, cost-sensitive, correlated observability

OpenTelemetry Logs

The OpenTelemetry Collector can act as a log collection and forwarding pipeline, replacing Fluent Bit or running alongside it. It enables unified signal collection — the same collector handles metrics, traces, and logs.

OTel Collector Log Pipeline

# OTel Collector config for log collection
receivers:
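  filelog:
    include:
      - /var/log/pods/*/*/*.log
    exclude:
      # Skip the collector's own logs (assumes its pod name starts with "otelcol")
      - /var/log/pods/*_otelcol*_*/*/*.log
    start_at: beginning
    include_file_path: true
    include_file_name: false
    operators:
      # Parse CRI log format
      - type: regex_parser
        id: parse_cri
        regex: '^(?P<time>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{9}Z)\s(?P<stream>stdout|stderr)\s(?P<logtag>[^ ]*)\s(?P<log>.*)$'
        timestamp:
          parse_from: attributes.time
          layout: '%Y-%m-%dT%H:%M:%S.%LZ'
      # Extract Kubernetes metadata from the file path:
      # /var/log/pods/<namespace>_<pod>_<uid>/<container>/<n>.log
      - type: regex_parser
        id: parse_k8s
        regex: '\/(?P<namespace>[^_]+)_(?P<pod_name>[^_]+)_(?P<uid>[^\/]+)\/(?P<container_name>[^\/]+)\/'
        parse_from: attributes["log.file.path"]
      # Promote the extracted values to resource attributes so that
      # k8sattributes can associate records with pods by k8s.pod.uid
      - type: move
        from: attributes.uid
        to: resource["k8s.pod.uid"]
      - type: move
        from: attributes.namespace
        to: resource["k8s.namespace.name"]
      - type: move
        from: attributes.pod_name
        to: resource["k8s.pod.name"]
      - type: move
        from: attributes.container_name
        to: resource["k8s.container.name"]
      # Try to parse JSON log body
      - type: json_parser
        id: parse_json
        parse_from: attributes.log
        if: 'attributes.log matches "^\\{"'
        on_error: send
      # Move log to body
      - type: move
        from: attributes.log
        to: body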

  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317

processors:
  batch:
    timeout: 5s
    send_batch_size: 8192
  memory_limiter:
    check_interval: 1s    # required; the collector rejects a zero interval
    limit_mib: 512
    spike_limit_mib: 128
  k8sattributes:        # enrich with K8s metadata
    extract:
      metadata:
        - k8s.namespace.name
        - k8s.pod.name
        - k8s.pod.uid
        - k8s.deployment.name
        - k8s.node.name
    pod_association:
      - sources:
          - from: resource_attribute
            name: k8s.pod.uid
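  resource:
    attributes:
      - key: cluster
        value: "prod-us-east-1"
        action: insert
      # Hint for the loki exporter: promote these resource attributes to Loki labels
      - key: loki.resource.labels
        value: cluster, k8s.namespace.name, k8s.pod.name, k8s.container.name
        action: insert

exporters:
  # The contrib loki exporter is deprecated in recent Collector releases in favor of
  # otlphttp pointed at Loki's native OTLP endpoint; this form applies to versions that still ship it.
  loki:
    endpoint: http://loki-gateway.monitoring.svc/loki/api/v1/push
    default_labels_enabled:
      exporter: false
      job: true

service:
  pipelines:
    # Logs are not exported to Tempo, which ingests traces only; log-to-trace
    # correlation comes from the trace_id field and a Grafana derived field.
    logs:
      receivers: [filelog, otlp]
      processors: [memory_limiter, k8sattributes, resource, batch]
      exporters: [loki]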

OTel Log Data Model

The OpenTelemetry log data model provides a standard schema that maps to existing formats:

OTel Field	Description	Maps to (Loki)
`Timestamp`	Event time (nanoseconds)	Log timestamp
`SeverityNumber`	1–24 (TRACE, DEBUG, INFO, WARN, ERROR, FATAL)	Label `level`
`SeverityText`	Original severity string	Label `severity`
`Body`	Log message (any type)	Log line
`TraceId`	W3C trace ID (16 bytes)	Indexed attribute
`SpanId`	W3C span ID (8 bytes)	Indexed attribute
`Attributes`	Key-value pairs (structured fields)	Parsed fields
`Resource`	Origin resource (service, host, k8s)	Labels
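SeverityNumber groups its 24 values into six ranges of four (TRACE 1–4, DEBUG 5–8, INFO 9–12, WARN 13–16, ERROR 17–20, FATAL 21–24). The sketch below maps a severity string to the first number of its range and back; the function names and the alias table are assumptions made for illustration.

```python
SEVERITY_NAMES = ["TRACE", "DEBUG", "INFO", "WARN", "ERROR", "FATAL"]
SEVERITY_BASE = {name: 1 + 4 * index for index, name in enumerate(SEVERITY_NAMES)}
# Assumed aliases for common logger spellings; not part of the OTel data model.
ALIASES = {"WARNING": "WARN", "ERR": "ERROR", "CRITICAL": "FATAL"}


def severity_number(text: str) -> int:
    """Map a severity string to the first number of its range (0 = unspecified)."""
    name = text.strip().upper()
    name = ALIASES.get(name, name)
    return SEVERITY_BASE.get(name, 0)


def severity_range_name(number: int) -> str:
    """Map a SeverityNumber (1-24) back to the name of its range."""
    if not 1 <= number <= 24:
        return "UNSPECIFIED"
    return SEVERITY_NAMES[(number - 1) // 4]


print(severity_number("warn"), severity_number("error"), severity_range_name(18))
```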

Kubernetes Component Logs

Control-plane component logs are critical for diagnosing cluster-level issues. Their collection differs from application logs because they may run as static pods, system services, or managed cloud services.

kube-apiserver

# Static pod — logs via kubectl
kubectl -n kube-system logs kube-apiserver-<node> --tail=100 --since=1h

# Increase verbosity temporarily (restart required for static pods)
# Edit /etc/kubernetes/manifests/kube-apiserver.yaml and add --v=4

# Key log patterns to watch:
# "too many requests" — client throttling
# "Timeout: request did not complete" — etcd latency
# "DENY" — admission controller rejection
kubectl -n kube-system logs kube-apiserver-<node> --since=1h | grep -iE "too many requests|timeout|DENY"

kubelet

# kubelet runs as a systemd service (not a pod)
journalctl -u kubelet -n 500 --since "1 hour ago"
journalctl -u kubelet -f    # follow

# Increase kubelet verbosity (requires a kubelet restart)
# Edit /etc/systemd/system/kubelet.service.d/10-kubeadm.conf
# Add --v=4 to KUBELET_EXTRA_ARGS, then apply the change:
systemctl daemon-reload && systemctl restart kubelet

# Key kubelet log patterns:
# "orphaned pod" — stale pod directories
# "failed to get container info" — runtime disconnected
# "evicted" — resource pressure eviction
journalctl -u kubelet | grep -i "evict\|oom\|failed to\|error"

containerd

journalctl -u containerd -n 200

# containerd log level (edit /etc/containerd/config.toml)
[debug]
  level = "warn"   # info = verbose, warn = quieter

# Key patterns:
# "failed to pull image" — registry auth or network
# "failed to create shim" — runc error (OCI runtime issue)
# "context deadline exceeded" — slow image pulls

etcd

kubectl -n kube-system logs etcd-<node> --tail=100

# Key patterns:
# "slow fdatasync": disk I/O issue (use SSDs; etcd guidance is p99 fsync latency under 10ms)
# "leader changed" — leader election event (check disk/network)
# "failed to send message" — inter-etcd network issue
# "applying snapshot" — member catching up (acceptable on rejoin)

# etcd verbosity (add to manifest):
# --log-level=warn   (options: debug, info, warn, error, panic, fatal)

Control-Plane Log Verbosity Levels

--v Level	Content	Use When
0	Always-visible messages (errors, panics)	Normal production
1	Basic high-level info	Normal production
2	Steady-state changes, reconciliation loops	Default for most components
3	Extended info, informational changes	Light debugging
4	Debug-level logging	Active debugging session
5–6	Verbose — each API call logged	Deep debugging (high volume)
7–10	Extremely verbose	Core development only

kubectl logs & Tooling

kubectl logs Reference

# Basic usage
kubectl logs <pod>
kubectl logs <pod> -c <container>          # specific container in multi-container pod
kubectl logs <pod> --previous              # logs from the previous (crashed) container

# Streaming
kubectl logs -f <pod>                      # follow (like tail -f)
kubectl logs -f deployment/<name>          # follow deployment (picks one pod)

# Filtering by time
kubectl logs <pod> --since=1h             # last 1 hour
kubectl logs <pod> --since-time="2024-01-15T10:00:00Z"

# Limit output
kubectl logs <pod> --tail=100             # last 100 lines

# All pods matching label selector
kubectl logs -l app=payments --all-containers=true --prefix=true

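# All containers in all pods of a deployment: select by pod label,
# because deployment/<name> follows only one pod (app=payments-api is an assumed label)
kubectl logs -f -l app=payments-api --all-containers=true --prefix=true --max-log-requests=10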

Stern — Multi-Pod Log Streaming

# Install
brew install stern     # macOS
# or: https://github.com/stern/stern/releases

# Follow all pods matching a regex in one namespace (use --all-namespaces for every namespace)
stern "payments.*" --namespace production

# Multiple namespaces
stern "api" --namespace production --namespace staging

# Filter by container name
stern "." --container "main" --namespace production

# Tail last N lines then follow
stern "payments" --tail 50

# Output as JSON with pod prefix
stern "payments" --output json --namespace production

# Filter with grep
stern "payments" | grep "error"

# Custom output template
stern "payments" --template '{{.PodName}} {{.ContainerName}} {{.Message}}{{"\n"}}'

k9s Log Viewer

k9s provides a TUI (terminal UI) for Kubernetes. Press l on a pod to view logs, / to filter, and Ctrl-s to save to a file (in the log view, s toggles autoscroll).

Log Correlation via Trace ID

# Find all logs for a specific trace (across all services)
# In Grafana: Explore → Loki → LogQL:
{cluster="prod"} | json | trace_id = "4bf92f3577b34da6a3ce929d0e0e4736"

# In the CLI, using stern with grep:
stern "." --namespace production --since 1h | grep "4bf92f3577b34da6"

# In Grafana, add a derived field on the Loki data source:
# "Derived fields" → regex  "trace_id":"(\w+)"  (matches JSON logs) → internal link to Tempo
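
The CLI route above can be made more precise than a plain grep by parsing each line. The following is a minimal sketch, assuming every log line is a JSON object with a `trace_id` field; the `ts`, `service`, and `msg` field names, the helper name `lines_for_trace`, and the sample records are hypothetical.

```python
import json


def lines_for_trace(raw_lines, trace_id):
    """Return parsed JSON log records for one trace, oldest first."""
    matched = []
    for raw in raw_lines:
        try:
            record = json.loads(raw)
        except json.JSONDecodeError:
            continue  # skip non-JSON lines such as stack traces
        if isinstance(record, dict) and record.get("trace_id") == trace_id:
            matched.append(record)
    # ISO-8601 UTC timestamps of equal precision sort correctly as strings.
    return sorted(matched, key=lambda r: r.get("ts", ""))


TRACE = "4bf92f3577b34da6a3ce929d0e0e4736"

# Hypothetical sample lines from two services.
sample = [
    '{"ts":"2024-01-15T10:23:46Z","service":"payments","trace_id":"' + TRACE + '","msg":"charge failed"}',
    "plain text line that is not JSON",
    '{"ts":"2024-01-15T10:23:45Z","service":"gateway","trace_id":"' + TRACE + '","msg":"request received"}',
    '{"ts":"2024-01-15T10:23:45Z","service":"gateway","trace_id":"00000000000000000000000000000000","msg":"unrelated"}',
]

for record in lines_for_trace(sample, TRACE):
    print(record["ts"], record["service"], record["msg"])
```

Matching on the parsed field avoids false positives when the trace ID happens to appear inside another field, at the cost of skipping lines that are not valid JSON.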

Cardinality & Cost Management

Log Volume Estimation

# Estimate current log volume per namespace (in Loki)
sum by (namespace) (bytes_rate({cluster="prod"} [1h]))

# Estimate bytes via Fluent Bit metrics
kubectl port-forward ds/fluent-bit 2020:2020 -n logging
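# The JSON endpoint keys outputs by plugin instance (for example "loki.0") and reports proc_bytes
curl -s localhost:2020/api/v1/metrics | jq '.output | to_entries[] | {name: .key, bytes: .value.proc_bytes}'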

# Check Loki ingestion rate
curl -s http://loki-gateway:80/metrics | grep loki_distributor_bytes_received_total
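
Counters such as Fluent Bit's per-output byte count only become a volume estimate once two readings are compared. The sketch below assumes the JSON metrics payload has the shape `{"output": {"<instance>": {"proc_bytes": N}}}`; the function names and the sample numbers are hypothetical.

```python
def output_bytes_per_second(before, after, elapsed_seconds):
    """Per-output throughput from two snapshots of the metrics payload."""
    rates = {}
    for name, stats in after.get("output", {}).items():
        previous = before.get("output", {}).get(name, {}).get("proc_bytes", 0)
        delta = stats.get("proc_bytes", 0) - previous
        # A negative delta means the counter reset (collector restart).
        rates[name] = max(delta, 0) / elapsed_seconds
    return rates


def gib_per_day(bytes_per_second):
    """Convert a sustained byte rate into GiB per day."""
    return bytes_per_second * 86_400 / 2**30


# Two made-up snapshots taken 60 seconds apart on one node.
snapshot_a = {"output": {"loki.0": {"proc_bytes": 1_000_000}}}
snapshot_b = {"output": {"loki.0": {"proc_bytes": 7_000_000}}}

for name, rate in output_bytes_per_second(snapshot_a, snapshot_b, 60).items():
    print(f"{name}: {rate:.0f} B/s, about {gib_per_day(rate):.2f} GiB/day")
```

The result is per node, so multiply by the number of nodes (or sum across DaemonSet pods) for a cluster-wide figure.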

Cost Reduction Strategies

Drop Debug/Trace Logs in Production

Add a Fluent Bit grep filter that drops records with level=debug or level=trace before forwarding. For verbose services this can cut volume by as much as 60–80%.

[FILTER]
    Name  grep
    Match kube.*
    Exclude log "\"level\":\"(debug|trace)\""

Throttle Noisy Sources

Fluent Bit's throttle filter caps the record rate for everything its Match pattern selects. Use rewrite_tag to give chatty pods their own tag, then apply a stricter throttle filter to that tag.

Sampling

For INFO logs from high-volume services, sample 10–20% using Fluent Bit's Lua filter or OTel Collector's probabilistic_sampler processor. Always forward 100% of WARN/ERROR.

Tiered Retention

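Keep the last 7 days in hot storage (fast SSD), up to 30 days in warm (standard disks), and up to 365 days in cold storage (such as S3 Glacier). In Elasticsearch, an ILM policy moves indices between tiers automatically. Loki has no tiering of its own: its compactor only enforces retention (deletion), so any move to a colder storage class has to come from object-store lifecycle rules.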

Dedup Before Ship

Avoid shipping the same log to both Loki and Elasticsearch. Use a single authoritative store; export compliance-required logs to S3 separately via a Fluent Bit S3 output with compression.

Compression

Enable gzip or snappy compression in the Fluent Bit output and in Loki chunk storage. JSON logs are repetitive and often compress by roughly 10:1. Loki's chunk_encoding defaults to gzip; snappy gives a lower ratio but faster reads.
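
To see how tiered retention and compression combine, the sketch below estimates the steady-state size of each tier. It assumes the 7/30/365-day figures are age boundaries (data leaves hot at day 7, warm at day 30, cold at day 365) and uses an illustrative 100 GiB/day of raw logs with a 10:1 ratio; the function name and the input volume are hypothetical.

```python
def stored_gib_per_tier(raw_gib_per_day, compression_ratio, tier_end_days):
    """Steady-state compressed GiB held in each tier.

    tier_end_days maps tier name -> age in days at which data leaves the tier,
    listed from hottest to coldest.
    """
    compressed_per_day = raw_gib_per_day / compression_ratio
    sizes = {}
    previous_end = 0
    for tier, end_day in tier_end_days.items():
        sizes[tier] = compressed_per_day * (end_day - previous_end)
        previous_end = end_day
    return sizes


tiers = {"hot": 7, "warm": 30, "cold": 365}
print(stored_gib_per_tier(100, 10, tiers))
# → {'hot': 70.0, 'warm': 230.0, 'cold': 3350.0}
```

Most of the bytes sit in the cold tier, which is why the storage class chosen for old data tends to dominate the bill.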

Fluent Bit Throttle Filter

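[FILTER]
    # Allow an average of 5,000 records per Interval, averaged over a Window of 5 intervals.
    # Classic-mode config does not support inline comments, so notes go on their own lines.
    Name          throttle
    Match         kube.*
    Rate          5000
    Window        5
    Interval      1s
    # Log a status line with the current rate and limit
    Print_Status  On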

Sampling with OTel Collector

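processors:
  # Tag WARN and above so the sampler always keeps them
  transform/sampling_priority:
    log_statements:
      - context: log
        statements:
          - set(attributes["sampling.priority"], 100) where severity_number >= SEVERITY_NUMBER_WARN

  # Keep ~10% of everything else (INFO and below)
  probabilistic_sampler:
    sampling_percentage: 10
    # Hash the trace ID so one trace's logs are kept or dropped together.
    # Assumes records carry a trace ID; otherwise use attribute_source: record
    # with from_attribute naming a per-record ID attribute.
    attribute_source: traceID
    # Record attribute holding a 0-100 priority; 100 means always keep
    sampling_priority: sampling.priority

# List transform/sampling_priority before probabilistic_sampler in the logs pipeline.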
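
The policy behind this configuration (keep every WARN and above, keep a fixed share of the rest) can be sketched in a few lines. This illustrates the idea only and is not the collector's actual hashing algorithm; the function name, the level names, and the choice of SHA-256 are assumptions.

```python
import hashlib

# Levels that are never sampled away (names are assumed, adjust to your schema).
ALWAYS_KEEP = {"warn", "warning", "error", "fatal", "critical"}


def keep_record(level, sample_key, percentage=10):
    """Decide whether to forward a record.

    sample_key is any stable ID (trace ID, request ID). Hashing it makes the
    decision deterministic, so every record sharing the key gets the same verdict.
    """
    if level.lower() in ALWAYS_KEEP:
        return True
    digest = hashlib.sha256(sample_key.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % 100
    return bucket < percentage


kept = sum(keep_record("info", f"trace-{i}") for i in range(10_000))
print(f"kept {kept} of 10000 INFO records")
print(keep_record("error", "trace-1"))
```

Hashing a trace ID rather than drawing a random number keeps or drops a whole request's INFO logs together, which makes the sampled logs easier to read.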

Alerting on Logs

Loki supports alerting rules using LogQL metric expressions. Alerts are evaluated by the Loki ruler and routed to Alertmanager — the same Alertmanager used for Prometheus alerts. See 06-alerting.html for Alertmanager routing configuration.

Loki Alerting Rules

apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: loki-log-alerts
  namespace: monitoring
spec:
  groups:
    - name: log-error-rates
      interval: 1m
      rules:
        - alert: HighErrorLogRate
          expr: |
            sum by (namespace, app) (
              rate({cluster="prod"} | json | level="error" [5m])
            ) > 1
          for: 2m
          labels:
            severity: warning
            team: platform
          annotations:
            summary: "High error log rate in {{ $labels.namespace }}/{{ $labels.app }}"
            description: "Error rate is {{ $value | humanize }} errors/sec"
            runbook_url: "https://wiki/runbooks/high-error-log-rate"

        - alert: OOMKilledContainer
          expr: |
            count_over_time({cluster="prod"} |= "OOMKilled" [5m]) > 0
          labels:
            severity: critical
          annotations:
            summary: "Container OOMKilled detected"
            description: "A container was OOMKilled in the last 5 minutes"

        - alert: PanicInApplication
          expr: |
            sum by (namespace, app) (
              rate({cluster="prod"} |~ "panic:|PANIC|runtime error:" [5m])
            ) > 0
          labels:
            severity: critical
          annotations:
            summary: "Application panic detected in {{ $labels.namespace }}/{{ $labels.app }}"

        - alert: DatabaseConnectionErrors
          expr: |
            sum by (namespace, app) (
              rate({cluster="prod"} |= "connection refused" |= "database" [5m])
            ) > 0.5
          for: 3m
          labels:
            severity: warning
          annotations:
            summary: "Database connection errors in {{ $labels.namespace }}/{{ $labels.app }}"

Log-Based SLO Burn Rate Alerts

# Error rate SLO: 99.5% of requests must succeed
# Derived from HTTP status logs rather than metrics

- alert: ErrorBudgetBurnRateFast
  expr: |
    (
      sum(rate({namespace="payments"} | json | http_status >= 500 [1h]))
      /
      sum(rate({namespace="payments"} | json [1h]))
    ) > (14.4 * 0.005)   # 14.4× budget burn rate for 1h window (page fast)
  labels:
    severity: critical
  annotations:
    summary: "Error budget burning fast for payments service"
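
The 14.4 × 0.005 threshold in the rule above follows from simple arithmetic, worked through below. The sketch assumes a 30-day SLO period, which the rule itself does not state; the function names are hypothetical.

```python
def burn_rate_threshold(slo_target, burn_rate):
    """Error ratio at which the budget burns `burn_rate` times faster than allowed."""
    error_budget = 1.0 - slo_target
    return burn_rate * error_budget


def budget_consumed(burn_rate, window_hours, slo_period_hours=30 * 24):
    """Fraction of the whole period's error budget spent during the window."""
    return burn_rate * window_hours / slo_period_hours


threshold = burn_rate_threshold(0.995, 14.4)
print(f"alert when error ratio > {threshold:.3f}")
print(f"budget spent in 1h at that rate: {budget_consumed(14.4, 1):.1%}")
print(f"hours until budget is gone: {30 * 24 / 14.4:.1f}")
```

At a 14.4× burn rate, one hour consumes about 2% of a 30-day budget and the whole budget lasts roughly 50 hours, which is why this window is treated as page-worthy.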

Metrics, Alerts & Runbooks

Key Logging Infrastructure Metrics

Metric	Source	Alert Threshold	Meaning
`fluentbit_input_records_total`	Fluent Bit	—	Records ingested from all sources
`fluentbit_output_errors_total`	Fluent Bit	>10/min	Failed deliveries to backend
`fluentbit_output_retried_records_total`	Fluent Bit	>100/min	Records being retried (backend pressure)
`loki_distributor_ingester_append_failures_total`	Loki	>0	Ingestion failures in Loki
`loki_ingester_wal_replay_active`	Loki	>0 for >5m	Ingester replaying WAL — possible crash recovery
`loki_query_frontend_retries`	Loki	>10/min	Query retries — querier under pressure
`loki_compactor_runs_total`	Loki	Not zero (should run regularly)	Compaction health

Alert Rules

groups:
  - name: logging-infrastructure
    rules:
      - alert: FluentBitOutputErrors
        expr: rate(fluentbit_output_errors_total[5m]) > 0.1
        for: 5m
        labels: {severity: warning}
        annotations:
          summary: "Fluent Bit output errors — logs may be lost"
          runbook: "Check output plugin config and backend connectivity"

      - alert: LokiIngestionFailing
        expr: rate(loki_distributor_ingester_append_failures_total[5m]) > 0
        for: 3m
        labels: {severity: critical}
        annotations:
          summary: "Loki ingestion failures — check ingester health"

      - alert: LokiQueryLatencyHigh
        expr: histogram_quantile(0.99, sum by (le) (rate(loki_request_duration_seconds_bucket{route="loki_api_v1_query_range"}[5m]))) > 30
        for: 5m
        labels: {severity: warning}
        annotations:
          summary: "Loki query p99 latency > 30s — user queries degraded"

      - alert: FluentBitBufferDiskFull
        expr: (1 - node_filesystem_avail_bytes{mountpoint="/var/log/flb-storage"} / node_filesystem_size_bytes{mountpoint="/var/log/flb-storage"}) > 0.8
        for: 10m
        labels: {severity: warning}
        annotations:
          summary: "Fluent Bit disk buffer > 80% full on {{ $labels.node }}"

Runbooks

Fluent Bit Not Collecting Logs

Check DaemonSet pod status: kubectl get ds fluent-bit -n logging
Check pod logs: kubectl logs ds/fluent-bit -n logging --tail=50
Verify hostPath mount: kubectl exec -n logging ds/fluent-bit -- ls /var/log/containers/
Check RBAC: kubectl auth can-i list pods --as=system:serviceaccount:logging:fluent-bit
Check Fluent Bit metrics: curl localhost:2020/api/v1/metrics/prometheus

Logs Missing in Loki

Verify Fluent Bit output has no errors: check fluentbit_output_errors_total
Test Loki endpoint directly: curl -v http://loki-gateway/loki/api/v1/labels
Check Loki distributor logs: kubectl logs -l app=loki-distributed-distributor
Verify label set is valid (no special chars, no high-cardinality labels)
Check retention: query logs with a longer time range

Loki Query Timeout

Add stream selectors to narrow scan: avoid {cluster="prod"} alone
Reduce time range of query
Check querier CPU/memory: kubectl top pod -n monitoring -l app=loki-distributed-querier
Increase query_timeout in Loki config
Check query sharding and splitting: parallelise_shardable_queries under query_range (enabled by default) and split_queries_by_interval in limits_config

Log Volume Spike

Find top namespaces: LogQL topk(5, sum by(namespace)(bytes_rate({cluster="prod"}[5m])))
Find chatty pods: topk(10, sum by(pod)(rate({cluster="prod"}[5m])))
Apply throttle filter for that pod label
Check for log loop (service logging its own log output)
Drop debug logs from offending service

OOMKilled Log Collector

Check log volume: sudden spike may have exhausted input buffer
Lower Mem_Buf_Limit so inputs pause (or, with filesystem storage, spill to disk) before memory runs out
Increase memory limits in DaemonSet
Add throttle filter before buffering
Switch storage.type to filesystem to reduce memory pressure

Best Practices

Always use structured JSON logging. Free-text logs require brittle regex parsers and cannot support field-level alerting, aggregation, or precise filtering at scale.
Always inject trace_id and span_id into log records. This enables one-click navigation from a log line to the originating distributed trace in Grafana (Loki → Tempo derived fields).
Keep Loki label cardinality low. Never use pod name, user ID, request ID, or any unbounded value as a Loki label. These belong in the log body, queryable via | json pipeline expressions.
Buffer to disk in Fluent Bit. Set storage.type filesystem and configure a hostPath volume. In-memory buffering loses logs on node memory pressure or Fluent Bit OOMKill.
Exclude health-check and metrics-scrape logs. Kubelet liveness and readiness probes and Prometheus scrapes generate thousands of lines per hour with little diagnostic value. Drop them with a Fluent Bit grep filter.
Set per-namespace retention policies in Loki. Production namespaces may need 90-day retention for compliance; dev/staging namespaces can use 7-day retention. Use retention_stream selectors in limits_config, or per-tenant overrides if each environment is its own tenant; both require retention to be enabled on the compactor.
Sample by severity, never uniformly. Always forward 100% of WARN and ERROR logs, and never sample FATAL or CRITICAL. Apply sampling (10–20%) only to INFO logs from high-volume services.
Test log collection in your deployment pipeline. Run a canary pod that emits known log patterns and write a test asserting those patterns appear in the logging backend within 60 seconds of deployment.
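
The pipeline test in the last practice could look roughly like the sketch below, which polls Loki's query_range HTTP API until a marker line shows up. The `app="log-canary"` label, the marker string, and the function names are hypothetical; the default fetcher sends no auth or tenant header, which many deployments will need.

```python
import json
import time
import urllib.parse
import urllib.request


def build_query_url(base_url, logql, start_ns, end_ns, limit=10):
    """Build a Loki query_range URL (timestamps are nanosecond epochs)."""
    params = urllib.parse.urlencode(
        {"query": logql, "start": start_ns, "end": end_ns, "limit": limit}
    )
    return f"{base_url}/loki/api/v1/query_range?{params}"


def count_lines(response_body):
    """Count log lines in a query_range response of resultType 'streams'."""
    result = json.loads(response_body).get("data", {}).get("result", [])
    return sum(len(stream.get("values", [])) for stream in result)


def http_fetch(url):
    """Default fetcher: plain HTTP GET."""
    with urllib.request.urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8")


def wait_for_marker(marker, base_url, fetch=http_fetch,
                    timeout_s=60, poll_s=5, sleep=time.sleep):
    """Return True once a canary line containing `marker` is queryable."""
    logql = f'{{app="log-canary"}} |= "{marker}"'  # assumed canary label
    deadline = time.monotonic() + timeout_s
    while True:
        end_ns = time.time_ns()
        start_ns = end_ns - 5 * 60 * 1_000_000_000  # look back five minutes
        url = build_query_url(base_url, logql, start_ns, end_ns)
        if count_lines(fetch(url)) > 0:
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(poll_s)
```

A deployment job would call `wait_for_marker("canary-<build id>", "http://loki-gateway")` after starting the canary pod and fail the pipeline if it returns False. Passing `fetch` and `sleep` as parameters keeps the function testable without a live Loki.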

Coverage Details

Kubernetes logging architecture: stdout/stderr → CRI → node files
CRI log format (/var/log/containers/, /var/log/pods/)
Docker vs containerd log format differences
kubelet log rotation flags (container-log-max-size, max-files)
Four log collection patterns: DaemonSet / sidecar / streaming sidecar / direct SDK
Pattern comparison table (overhead, app changes, file logging)
Structured logging: standard field schema (15 fields)
Go: Zap logger with trace context injection
Java: Logback + logstash-logback-encoder JSON config
Python: structlog with contextvars for trace injection
Node.js: pino reference
Fluent Bit vs Fluentd comparison table
Fluent Bit pipeline model: INPUT → PARSER → FILTER → BUFFER → OUTPUT
Fluent Bit Helm install command
Production Fluent Bit ConfigMap (filesystem storage, kubernetes filter, Loki output)
Fluent Bit parsers.conf (docker, CRI, JSON parsers)
Fluent Bit DaemonSet YAML (tolerations, hostPath volumes, RBAC)
Multiline log handling (built-in parsers + custom MULTILINE_PARSER)
Loki architecture: distributor/ingester/compactor/querier/store gateway
Label cardinality warning (anti-patterns: pod_name, user_id, request_id)
Recommended Loki label set
Loki Helm install (single-binary and distributed modes)
Loki distributed values YAML (TSDB schema v13, S3, retention, limits_config)
LogQL: log stream selectors, label matchers, regex match
LogQL pipeline: line filters, JSON/logfmt/pattern parsers, label filters, line_format, label_format
LogQL metric queries: rate, count_over_time, quantile_over_time, bytes_rate, unwrap
Useful operational LogQL queries (error trace, OOMKilled, slow queries, top errors)
Loki recording rules via PrometheusRule CRD
Elasticsearch/OpenSearch: ECK operator install and cluster YAML (hot/warm tiers)
Index Lifecycle Management (ILM) JSON: hot/warm/cold/delete phases
Elasticsearch vs Loki comparison table
OTel Collector filelog receiver config (CRI parser, K8s path regex, JSON merge)
OTel Collector k8sattributes processor for metadata enrichment
OTel Log data model fields (Timestamp, SeverityNumber, Body, TraceId, SpanId, Resource)
Kubernetes component logs: kube-apiserver, kubelet, containerd, etcd commands
klog verbosity levels (--v 0–10)
kubectl logs reference: --previous, --since, --tail, --follow, label selector, deployment
Stern multi-pod log streaming (install, usage patterns, output templates)
Log correlation via trace_id (Loki LogQL, Grafana derived fields)
Log volume estimation with LogQL bytes_rate and Fluent Bit metrics API
Cost reduction strategies: drop debug logs, throttle filter, sampling, tiered retention, dedup, compression
Fluent Bit throttle filter and OTel probabilistic_sampler
Loki alerting rules via PrometheusRule: HighErrorLogRate, OOMKilled, Panic, DB errors
Log-based SLO burn rate alert pattern
7 logging infrastructure metrics table with alert thresholds
4 PrometheusRule alert rules (FluentBitOutputErrors, LokiIngestionFailing, QueryLatencyHigh, BufferDiskFull)
5 runbooks (not collecting, missing in Loki, query timeout, volume spike, OOMKilled collector)
8 best practices (structured JSON, trace injection, label cardinality, disk buffer, health exclusion, retention, sampling policy, pipeline testing)