Logging in Kubernetes
Complete guide to log collection, structured logging, Fluent Bit, Loki/LogQL, Elasticsearch, OpenTelemetry logs, and production log management at scale.
Kubernetes Logging Architecture
Kubernetes does not provide a built-in log aggregation or persistence system. Containers write to stdout/stderr; the container runtime captures those streams and writes them to log files on the node. Kubernetes exposes these via kubectl logs but does not ship them anywhere. Cluster operators are responsible for collecting, forwarding, and storing logs.
Pod logs exist only while the pod exists on the node, subject to log rotation. If a pod is evicted, rescheduled, or deleted, its logs disappear. A log collector DaemonSet must tail files in real-time to avoid data loss.
Node-Level Log File Paths
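The container runtime writes logs at a location determined by its log driver. The kubelet exposes them under stable paths: per-pod files at /var/log/pods/<namespace>_<pod-name>_<pod-uid>/<container>/<restart-count>.log, and symlinks to those files at /var/log/containers/<pod-name>_<namespace>_<container>-<container-id>.log.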
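The symlink names under /var/log/containers/ follow the pattern <pod>_<namespace>_<container>-<container-id>.log, which is what node-level agents rely on to recover pod metadata from a tag or file path. A minimal Python sketch of that parsing follows; the function and type names are hypothetical, and the regex assumes a 64-character hex container ID.

```python
import re
from typing import NamedTuple, Optional


class ContainerLogName(NamedTuple):
    pod: str
    namespace: str
    container: str
    container_id: str


# <pod>_<namespace>_<container>-<64-hex-container-id>.log
# Pod and namespace names cannot contain "_", so "_" is a safe separator.
_NAME_RE = re.compile(
    r"^(?P<pod>[^_]+)_(?P<namespace>[^_]+)_"
    r"(?P<container>.+)-(?P<container_id>[0-9a-f]{64})\.log$"
)


def parse_container_log_name(filename: str) -> Optional[ContainerLogName]:
    """Split a /var/log/containers/ file name into its metadata parts."""
    match = _NAME_RE.match(filename)
    return ContainerLogName(**match.groupdict()) if match else None
```

Names that do not match (for example, files from another tool) return None rather than raising, so a caller can skip them.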
CRI Log Format
Since Kubernetes 1.14, the CRI log format (used by containerd and CRI-O) prefixes each line with metadata:
# Format: <timestamp> <stream> <flags> <log message>
2024-01-15T10:23:45.123456789Z stdout F {"level":"info","msg":"server started","port":8080}
2024-01-15T10:23:45.124000000Z stderr F Error: connection refused to db:5432
# Flags: F = full line, P = partial line (multiline handling needed)
Docker's default json-file driver wraps each log line in JSON: {"log":"...\n","stream":"stdout","time":"..."}. containerd uses the CRI format above. Fluent Bit parsers must match the actual runtime in your cluster — check with kubectl get nodes -o wide for the container runtime version.
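To make the CRI format above concrete, here is a small Python sketch that splits each line into timestamp, stream, flag, and message, and joins P (partial) fragments with the F (full) line that ends them. It is an illustration only: the names are hypothetical, and it assumes the flag field is exactly P or F.

```python
from typing import Dict, Iterable, Iterator, List, NamedTuple


class CriRecord(NamedTuple):
    timestamp: str
    stream: str
    message: str


def parse_cri_lines(lines: Iterable[str]) -> Iterator[CriRecord]:
    """Yield one record per complete log line, merging partial fragments."""
    pending: Dict[str, List[str]] = {}  # stream -> buffered partial fragments
    for raw in lines:
        parts = raw.rstrip("\n").split(" ", 3)
        if len(parts) < 3:
            continue  # not a CRI-formatted line
        timestamp, stream, flag = parts[:3]
        content = parts[3] if len(parts) == 4 else ""
        if flag == "P":
            # Partial line: buffer until the closing F line arrives.
            pending.setdefault(stream, []).append(content)
            continue
        fragments = pending.pop(stream, [])
        # The record keeps the timestamp of the final (F) fragment.
        yield CriRecord(timestamp, stream, "".join(fragments) + content)
```

Buffering per stream keeps interleaved stdout and stderr fragments from being merged into each other.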
Log Retention on Node
| kubelet Flag | Default | Effect |
|---|---|---|
| --container-log-max-size | 10Mi | Max size before rotation |
| --container-log-max-files | 5 | Number of rotated files to keep |
| --node-status-update-frequency | 10s | How often the kubelet reports node status; it does not change log rotation |
| Total per-container storage | ~50Mi | max-size × max-files |
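The last table row is simple arithmetic, and the same numbers bound how long a chatty container's logs survive on the node. The sketch below is illustrative; the helper names are hypothetical, and the write rate is an input you would measure yourself.

```python
# Binary suffixes as used by Kubernetes quantities.
UNITS = {"Ki": 1024, "Mi": 1024 ** 2, "Gi": 1024 ** 3}


def to_bytes(quantity: str) -> int:
    """Convert a quantity such as '10Mi' to bytes (plain integers pass through)."""
    for suffix, factor in UNITS.items():
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * factor
    return int(quantity)


def retention_window_seconds(max_size: str, max_files: int,
                             bytes_per_second: float) -> float:
    """Upper bound on how long log data stays on the node before rotation drops it."""
    return to_bytes(max_size) * max_files / bytes_per_second
```

With the defaults, a container writing 1 MiB/s keeps at most about 50 seconds of history on disk, which is why the collector has to tail files continuously.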
Log Collection Patterns
There are four fundamental patterns for log collection in Kubernetes. Most production clusters use a combination of patterns 1 and 2.
1. Node-Level Agent (DaemonSet)
A log collector runs as a DaemonSet on every node, tailing /var/log/containers/. One collector per node — low overhead, zero application changes required. Works for any container writing to stdout/stderr.
Recommended for most workloads
2. Sidecar Container
A dedicated log-shipper container runs in the same pod as the application, reading logs from a shared volume. Required when the application writes to files rather than stdout, or when per-application routing logic is needed.
Use sparingly — resource overhead per pod
3. Sidecar Streaming
A sidecar reads from the application log file and re-emits to stdout. Useful for legacy applications that cannot be modified to write to stdout. The node-level agent then picks up the sidecar's stdout.
Legacy apps only
4. Direct-to-Backend
The application SDK ships logs directly to the logging backend (e.g., Datadog Agent via UDP, Loki HTTP API). Bypasses the node filesystem entirely. Requires code changes; provides richest metadata but no fallback if backend unreachable.
Cloud-native apps
Pattern Comparison
| Pattern | Resource Overhead | App Changes | File Logging | Per-Pod Routing | Data Loss Risk |
|---|---|---|---|---|---|
| Node-Level DaemonSet | Low (1 agent/node) | None | No (stdout only) | Limited | If pod deleted before tail |
| Sidecar Shipper | High (1 sidecar/pod) | Shared volume | Yes | Yes | Low (direct ship) |
| Sidecar Streaming | Medium | Shared volume | Yes | No | Same as DaemonSet |
| Direct SDK | None extra | Yes | Yes | Full | If backend unreachable |
Structured Logging
Structured logs emit machine-parseable records (JSON) rather than free-text strings. This enables log backends to index fields, run aggregations, and support precise query predicates — critical for high-volume environments.
Standard Log Fields
| Field | Type | Purpose | Example |
|---|---|---|---|
| timestamp | RFC3339 | Event time (UTC) | "2024-01-15T10:23:45.123Z" |
| level | string | Severity | "info", "error", "warn" |
| message | string | Human-readable summary | "request completed" |
| service | string | Application name | "order-service" |
| version | string | App version/build | "v2.4.1" |
| trace_id | string (hex) | W3C TraceContext trace ID; links to trace | "4bf92f3577b34da6..." |
| span_id | string (hex) | Span within trace | "00f067aa0ba902b7" |
| pod | string | Kubernetes pod name | "order-7d5f9-xk2pq" |
| namespace | string | Kubernetes namespace | "production" |
| node | string | Node hosting the pod | "ip-10-0-1-45" |
| error | string / object | Error details when level=error | "connection refused" |
| duration_ms | float | Request/operation duration | 34.7 |
| http.method | string | HTTP verb (OTel semantic conventions) | "POST" |
| http.status_code | int | HTTP response code | 200 |
Go: Zap Structured Logger
package main

import (
    "go.opentelemetry.io/otel/trace"
    "go.uber.org/zap"
    "go.uber.org/zap/zapcore"
)

// setupLogger builds a JSON production logger with renamed time/message keys.
func setupLogger() *zap.Logger {
    cfg := zap.NewProductionConfig()
    cfg.EncoderConfig.TimeKey = "timestamp"
    cfg.EncoderConfig.MessageKey = "message"
    // The production default encodes time as epoch seconds; use ISO 8601 instead.
    cfg.EncoderConfig.EncodeTime = zapcore.ISO8601TimeEncoder
    // zap.Must panics if the config is invalid instead of silently dropping the error.
    return zap.Must(cfg.Build())
}

// loggerFromSpan injects trace context into every log record.
func loggerFromSpan(base *zap.Logger, span trace.Span) *zap.Logger {
    sc := span.SpanContext()
    return base.With(
        zap.String("trace_id", sc.TraceID().String()),
        zap.String("span_id", sc.SpanID().String()),
        zap.Bool("trace_sampled", sc.IsSampled()),
    )
}

func handleRequest(logger *zap.Logger, span trace.Span, method, path string) {
    log := loggerFromSpan(logger, span)
    log.Info("request received",
        zap.String("http.method", method),
        zap.String("http.path", path),
    )
}

// Output (abridged; zap also emits "caller" and "trace_sampled"):
// {"level":"info","timestamp":"2024-01-15T10:23:45.123Z","message":"request received",
//  "trace_id":"4bf92f3577b34da6...","span_id":"00f067aa0ba902b7","http.method":"POST","http.path":"/orders"}
Java: Logback JSON Encoder (Logstash)
<!-- pom.xml dependency -->
<dependency>
    <groupId>net.logstash.logback</groupId>
    <artifactId>logstash-logback-encoder</artifactId>
    <version>7.4</version>
</dependency>

<!-- logback-spring.xml -->
<configuration>
    <appender name="JSON" class="ch.qos.logback.core.ConsoleAppender">
        <encoder class="net.logstash.logback.encoder.LogstashEncoder">
            <fieldNames>
                <timestamp>timestamp</timestamp>
                <message>message</message>
                <logger>logger</logger>
            </fieldNames>
            <includeMdcKeyName>trace_id</includeMdcKeyName>
            <includeMdcKeyName>span_id</includeMdcKeyName>
        </encoder>
    </appender>
    <root level="INFO">
        <appender-ref ref="JSON"/>
    </root>
</configuration>
Python: structlog
import structlog

structlog.configure(
    processors=[
        # Merge context-local values (trace_id etc.) bound via contextvars
        structlog.contextvars.merge_contextvars,
        structlog.processors.add_log_level,
        structlog.processors.TimeStamper(fmt="iso", utc=True, key="timestamp"),
        structlog.processors.dict_tracebacks,
        structlog.processors.JSONRenderer(),
    ],
    logger_factory=structlog.PrintLoggerFactory(),
)

log = structlog.get_logger()

# Bind request context once per request:
structlog.contextvars.bind_contextvars(
    trace_id="4bf92f3577b34da6a3ce929d0e0e4736",
    span_id="00f067aa0ba902b7",
    service="order-service",
)

log.info("order created", order_id=12345, amount=99.50)
# Output (key order may differ):
# {"timestamp":"2024-01-15T10:23:45.123456Z","level":"info","event":"order created",
#  "order_id":12345,"amount":99.5,"trace_id":"4bf92f3577b34da6...","span_id":"00f067aa0ba902b7","service":"order-service"}
pino is the standard for Node.js structured logging: const log = pino({ level: 'info', timestamp: pino.stdTimeFunctions.isoTime }). Use pino-http middleware to automatically log HTTP requests with duration and status code.
Fluent Bit
Fluent Bit is the preferred lightweight log collector for Kubernetes (written in C, ~700KB binary, ~30MB RSS). It replaces Fluentd in most modern deployments due to its lower resource footprint. Fluent Bit reads CRI log files, enriches them with Kubernetes metadata via the API, and forwards to one or more backends.
Fluent Bit vs Fluentd
| Aspect | Fluent Bit | Fluentd |
|---|---|---|
| Language | C | Ruby |
| Memory footprint | ~30–50MB | ~200–600MB |
| Plugin ecosystem | ~100 plugins | ~1,000 plugins |
| Performance | Higher (native) | Lower (Ruby GIL) |
| Complex routing | Limited | Excellent |
| Use case | Edge/DaemonSet collector | Aggregator / heavy processing |
| CNCF status | Graduated | Graduated |
Fluent Bit Pipeline Model
Helm Install
helm repo add fluent https://fluent.github.io/helm-charts
helm repo update
helm upgrade --install fluent-bit fluent/fluent-bit \
--namespace logging \
--create-namespace \
--values fluentbit-values.yaml
Production ConfigMap (Loki backend)
apiVersion: v1
kind: ConfigMap
metadata:
  name: fluent-bit-config
  namespace: logging
data:
  fluent-bit.conf: |
    # Classic-mode syntax: comments must sit on their own line.
    [SERVICE]
        Flush                  5
        Daemon                 Off
        Log_Level              warn
        Parsers_File           parsers.conf
        HTTP_Server            On
        HTTP_Listen            0.0.0.0
        HTTP_Port              2020
        # Location of disk buffers used by inputs with storage.type filesystem
        storage.path           /var/log/flb-storage/
        storage.sync           normal
        storage.checksum       Off
        storage.max_chunks_up  128

    [INPUT]
        Name              tail
        Path              /var/log/containers/*.log
        # Exclude fluent-bit's own logs and kube-system noise
        Exclude_Path      /var/log/containers/fluent-bit*,/var/log/containers/*_kube-system_*
        # Handles both runtimes
        multiline.parser  docker, cri
        Tag               kube.*
        # Buffer to disk (survive node pressure); this is an input-level option
        storage.type      filesystem
        Refresh_Interval  5
        Rotate_Wait       30
        Mem_Buf_Limit     64MB
        Skip_Long_Lines   On
        # Position tracking
        DB                /var/log/flb-storage/tail.db
        DB.sync           normal
        Ignore_Older      24h

    [FILTER]
        Name                 kubernetes
        Match                kube.*
        Kube_URL             https://kubernetes.default.svc:443
        Kube_CA_File         /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        Kube_Token_File      /var/run/secrets/kubernetes.io/serviceaccount/token
        Kube_Tag_Prefix      kube.var.log.containers.
        # Parse nested JSON from the container 'log' field
        Merge_Log            On
        # Drop the original 'log' key after a successful merge
        Keep_Log             Off
        # Merged fields are nested under the 'app' key
        Merge_Log_Key        app
        # Respect pod annotations for custom parser and exclusion
        K8S-Logging.Parser   On
        K8S-Logging.Exclude  On
        Labels               On
        # Annotations add cardinality; disable unless needed
        Annotations          Off

    # Drop noisy health/readiness probe logs
    [FILTER]
        Name     grep
        Match    kube.*
        Exclude  log /healthz|/readyz|/livez|/metrics

    # Add cluster identifier
    [FILTER]
        Name   modify
        Match  kube.*
        Add    cluster prod-us-east-1

    [OUTPUT]
        Name         loki
        Match        kube.*
        Host         loki-gateway.monitoring.svc.cluster.local
        Port         80
        # Label values use record-accessor syntax ($key or $map['key'])
        Labels       job=fluent-bit, cluster=$cluster, namespace=$kubernetes['namespace_name'], pod=$kubernetes['pod_name'], container=$kubernetes['container_name']
        # Merge_Log_Key nests parsed fields under 'app', so address them there
        Label_Keys   $app['level'],$app['severity']
        Remove_Keys  kubernetes,stream
        # Retry indefinitely (disk-buffered)
        Retry_Limit  False
        Workers      4

  parsers.conf: |
    [PARSER]
        Name         docker
        Format       json
        Time_Key     time
        Time_Format  %Y-%m-%dT%H:%M:%S.%L
        Time_Keep    On

    [PARSER]
        Name         cri
        Format       regex
        Regex        ^(?<time>[^ ]+) (?<stream>stdout|stderr) (?<flags>[^ ]*) (?<log>.*)$
        Time_Key     time
        Time_Format  %Y-%m-%dT%H:%M:%S.%L%z

    [PARSER]
        Name         json
        Format       json
        Time_Key     timestamp
        Time_Format  %Y-%m-%dT%H:%M:%S.%LZ
Fluent Bit DaemonSet Resource Requirements
apiVersion: apps/v1
kind: DaemonSet
metadata:
name: fluent-bit
namespace: logging
spec:
selector:
matchLabels:
app: fluent-bit
template:
metadata:
labels:
app: fluent-bit
annotations:
fluentbit.io/exclude: "true" # exclude own logs from collection
spec:
serviceAccountName: fluent-bit
tolerations:
- key: node-role.kubernetes.io/control-plane
effect: NoSchedule
- key: node.kubernetes.io/not-ready
effect: NoSchedule
containers:
- name: fluent-bit
image: cr.fluentbit.io/fluent/fluent-bit:3.2.0
resources:
requests:
cpu: 50m
memory: 64Mi
limits:
cpu: 200m
memory: 256Mi
volumeMounts:
- name: varlog
mountPath: /var/log
readOnly: true
- name: config
mountPath: /fluent-bit/etc
- name: storage
mountPath: /var/log/flb-storage
ports:
- name: http
containerPort: 2020
livenessProbe:
httpGet:
path: /
port: http
initialDelaySeconds: 10
volumes:
- name: varlog
hostPath:
path: /var/log
- name: config
configMap:
name: fluent-bit-config
- name: storage
hostPath:
path: /var/lib/fluent-bit
type: DirectoryOrCreate
---
apiVersion: v1
kind: ServiceAccount
metadata:
name: fluent-bit
namespace: logging
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
name: fluent-bit
rules:
- apiGroups: [""]
resources: [pods, namespaces, nodes, nodes/proxy]
verbs: [get, list, watch]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
name: fluent-bit
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: ClusterRole
name: fluent-bit
subjects:
- kind: ServiceAccount
name: fluent-bit
namespace: logging
Multiline Log Handling
Stack traces (Java, Python) and structured exceptions span multiple lines. The CRI P (partial) flag does not help here: the runtime sets it only when it splits one overlong line, while each stack-trace frame arrives as its own complete (F) line. Fluent Bit joins such lines via the multiline.parser option or the dedicated multiline filter:
[FILTER]
    Name                   multiline
    Match                  kube.*
    multiline.key_content  log
    # Built-in parsers include java, go, python, ruby, docker
    multiline.parser       java

# Custom multiline rule (e.g., Python traceback); define it in the parsers file
[MULTILINE_PARSER]
    name           python_custom
    type           regex
    # Flush after 2s of inactivity
    flush_timeout  2000
    rule           "start_state"  "/^[^\s]/"  "cont"
    rule           "cont"         "/^\s/"     "cont"
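The two-state rule above (a line starting with non-whitespace opens a record, lines starting with whitespace continue it) can be sketched in a few lines of Python. This is a simplified model for reasoning about the rule, not how Fluent Bit implements it; the function name is hypothetical and the flush timeout is ignored.

```python
import re
from typing import Iterable, Iterator, List

CONT = re.compile(r"^\s")  # "cont" state: line begins with whitespace


def group_multiline(lines: Iterable[str]) -> Iterator[str]:
    """Join indented continuation lines onto the record that precedes them."""
    buffer: List[str] = []
    for line in lines:
        if buffer and CONT.match(line):
            buffer.append(line)  # stay in "cont"
            continue
        if buffer:
            yield "\n".join(buffer)  # a new start line flushes the previous record
        buffer = [line]
    if buffer:
        yield "\n".join(buffer)
```

Running it on a Python traceback shows a limitation of the rule: the final "ValueError: ..." line starts in column 0, so it is emitted as its own record rather than joined to the traceback.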
Loki & LogQL
Grafana Loki is a horizontally scalable, highly available log aggregation system. Unlike Elasticsearch, Loki does not index the contents of log lines — only the metadata labels. This makes it extremely cost-efficient. Log content is compressed and stored in object storage (S3, GCS, Azure Blob).
Loki Architecture
Loki indexes labels, not log content. High-cardinality labels (e.g., pod_name with thousands of values, user_id, request_id) cause index explosion, query performance degradation, and excessive memory use in ingesters. Keep label count <10, cardinality per label <1,000. Put high-cardinality values in the log body and use |= / | json filter expressions to query them.
Recommended Label Set
cluster="prod-us-east-1" # static — low cardinality
namespace="payments" # ~10–100 values
pod="payments-api-7d5f9-xk" # AVOID: high cardinality — use for debugging only
container="payments-api" # ~1–5 per namespace
app="payments-api" # from Kubernetes labels
level="error" # from log body parsing
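Each distinct combination of label values is one Loki stream, so the stream count grows with the product of label cardinalities. The following Python sketch counts streams and per-label cardinality for a sample of label sets; the function names and sample data are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Mapping, Set


def count_streams(label_sets: List[Mapping[str, str]]) -> int:
    """Number of distinct label combinations, i.e. Loki streams."""
    return len({tuple(sorted(labels.items())) for labels in label_sets})


def label_cardinality(label_sets: List[Mapping[str, str]]) -> Dict[str, int]:
    """Number of distinct values observed for each label name."""
    values: Dict[str, Set[str]] = defaultdict(set)
    for labels in label_sets:
        for name, value in labels.items():
            values[name].add(value)
    return {name: len(seen) for name, seen in values.items()}
```

Comparing the stream count with and without the pod label on real data is a quick way to see how much that single label contributes.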
Loki Helm Install
helm repo add grafana https://grafana.github.io/helm-charts
helm repo update
# Single-binary mode (dev/small clusters)
helm upgrade --install loki grafana/loki \
--namespace monitoring \
--set deploymentMode=SingleBinary \
--set loki.storage.type=s3 \
--set loki.storage.s3.region=us-east-1 \
--set loki.storage.bucketNames.chunks=my-loki-chunks \
--set loki.auth_enabled=false \
--set singleBinary.replicas=3
# Distributed mode (production)
helm upgrade --install loki grafana/loki-distributed \
--namespace monitoring \
--values loki-distributed-values.yaml
Loki Distributed Values (Production)
# loki-distributed-values.yaml
loki:
auth_enabled: true
commonConfig:
replication_factor: 3
storage:
type: s3
s3:
region: us-east-1
bucketnames: prod-loki-chunks
insecure: false
schemaConfig:
configs:
- from: "2024-01-01"
store: tsdb # TSDB index (Loki 2.8+)
object_store: s3
schema: v13
index:
prefix: loki_index_
period: 24h
limits_config:
ingestion_rate_mb: 64
ingestion_burst_size_mb: 128
max_label_names_per_series: 15
max_label_value_length: 2048
max_streams_per_user: 50000
max_chunks_per_query: 2000000
retention_period: 744h # 31 days
per_tenant_override_config: /etc/loki/overrides.yaml
ingester:
replicas: 3
resources:
requests: {cpu: 500m, memory: 1Gi}
limits: {cpu: 2, memory: 4Gi}
distributor:
replicas: 3
resources:
requests: {cpu: 200m, memory: 256Mi}
querier:
replicas: 3
resources:
requests: {cpu: 500m, memory: 512Mi}
limits: {cpu: 2, memory: 2Gi}
queryFrontend:
replicas: 2
compactor:
enabled: true
resources:
requests: {cpu: 200m, memory: 256Mi}
LogQL Reference
LogQL is Loki's query language. It has two types: log queries (return log lines) and metric queries (return time series derived from log lines).
Log Stream Selectors
# Select all logs from namespace=payments
{namespace="payments"}
# Multiple label matchers
{cluster="prod", namespace="payments", container="api"}
# Regex match on label value
{namespace=~"payments|orders"}
# Negative match
{namespace!="kube-system"}
Log Pipeline Expressions
# Line filter — substring match (fastest)
{namespace="payments"} |= "error"
{namespace="payments"} != "health"
# Line filter — regex
{namespace="payments"} |~ "ERROR|FATAL"
{namespace="payments"} !~ "GET /health|GET /metrics"
# JSON parser — extract fields from JSON log body
{namespace="payments"} | json
# JSON parser with field aliases
{namespace="payments"} | json level="level", traceId="trace_id"
# Logfmt parser (key=value format)
{namespace="payments"} | logfmt
# Pattern parser (positional); this template assumes an NGINX-style access log line
{namespace="payments"} | pattern `<ip> - - <_> "<method> <uri> <_>" <status> <size>`
# Label filter — after parsing
{namespace="payments"} | json | level = "error"
{namespace="payments"} | json | status_code >= 500
{namespace="payments"} | json | duration_ms > 1000
# Line format — reshape output
{namespace="payments"} | json | line_format "{{.level}} {{.message}} traceId={{.trace_id}}"
# Label format — rename/derive labels
{namespace="payments"} | json | label_format service=app, lvl=level
Metric Queries
# Count error log rate per service (last 5m window)
sum by (app) (rate({namespace="production"} |= "error" [5m]))
# P99 request duration from logs (requires duration_ms field)
quantile_over_time(0.99,
{namespace="payments"} | json | unwrap duration_ms [5m]
) by (app)
# Error rate as percentage of all logs
sum(rate({namespace="payments"} | json | level="error" [5m]))
/
sum(rate({namespace="payments"} [5m]))
# Bytes ingested per namespace
sum by (namespace) (bytes_rate({cluster="prod"} [5m]))
# Log volume (lines per second) per service
sum by (app) (rate({namespace="production"} [1m]))
# Count distinct user_id values seen in the last hour (one inner series per value)
count(sum by (user_id) (count_over_time({namespace="payments"} |= "user_id" | json | __error__="" [1h])))
Useful Operational Queries
# Find all errors in last 15 minutes
{namespace="production"} | json | level="error" | line_format "{{.timestamp}} {{.service}} {{.message}}"
# Trace a request by trace_id
{cluster="prod"} | json | trace_id = "4bf92f3577b34da6a3ce929d0e0e4736"
# Find OOMKilled events across all namespaces (assumes Kubernetes events are shipped to Loki; application stdout rarely contains this string)
{cluster="prod"} |= "OOMKilled"
# Slow queries (> 5s)
{namespace="payments"} | json | duration_ms > 5000 | line_format "{{.timestamp}} {{.message}} dur={{.duration_ms}}ms"
# Count unique error messages (top 10)
topk(10, sum by (message) (count_over_time({namespace="payments"} | json | level="error" [1h])))
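Any of the queries above can also be run outside Grafana through Loki's HTTP API, whose range endpoint is /loki/api/v1/query_range. A minimal Python sketch that builds such a request URL follows; the function name and base URL are placeholders, and authentication or tenant headers are left out.

```python
from urllib.parse import urlencode


def build_query_range_url(base_url: str, logql: str, start_ns: int,
                          end_ns: int, limit: int = 100) -> str:
    """Build a Loki range-query URL; start/end are Unix epoch nanoseconds."""
    params = {
        "query": logql,
        "start": str(start_ns),
        "end": str(end_ns),
        "limit": str(limit),
        "direction": "backward",  # newest entries first
    }
    return f"{base_url.rstrip('/')}/loki/api/v1/query_range?{urlencode(params)}"
```

urlencode takes care of escaping the braces, quotes, and pipes that LogQL uses, which is the part most often broken when building these URLs by hand.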
Loki Recording Rules
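# PrometheusRule is served at monitoring.coreos.com/v1. The Loki ruler does not read these
# objects itself; sync them (e.g. with Grafana Alloy's loki.rules.kubernetes) or load the groups as ruler rule files.
apiVersion: monitoring.coreos.com/v1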
kind: PrometheusRule
metadata:
name: loki-log-rates
namespace: monitoring
spec:
groups:
- name: log-rates
interval: 1m
rules:
- record: namespace:log_lines:rate5m
expr: |
sum by (namespace) (
rate({cluster="prod"} [5m])
)
- record: namespace_app:log_errors:rate5m
expr: |
sum by (namespace, app) (
rate({cluster="prod"} | json | level="error" [5m])
)
Elasticsearch / OpenSearch
Elasticsearch (and its open-source fork OpenSearch) indexes every field in every log record, enabling full-text search and complex aggregations. This power comes at significantly higher resource cost (10–20× more storage and CPU than Loki for equivalent log volume).
ECK (Elastic Cloud on Kubernetes)
# Install ECK Operator
kubectl create -f https://download.elastic.co/downloads/eck/2.11.0/crds.yaml
kubectl apply -f https://download.elastic.co/downloads/eck/2.11.0/operator.yaml
# Production Elasticsearch cluster
apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
name: prod-logs
namespace: logging
spec:
version: 8.12.0
nodeSets:
- name: hot
count: 3
config:
node.roles: [master, data_hot, data_content, ingest]
xpack.security.enabled: true
podTemplate:
spec:
containers:
- name: elasticsearch
resources:
requests: {cpu: 2, memory: 8Gi}
limits: {cpu: 4, memory: 8Gi}
env:
- name: ES_JAVA_OPTS
value: "-Xms4g -Xmx4g"
volumeClaimTemplates:
- metadata:
name: elasticsearch-data
spec:
storageClassName: gp3
resources:
requests:
storage: 500Gi
- name: warm
count: 2
config:
node.roles: [data_warm]
volumeClaimTemplates:
- metadata:
name: elasticsearch-data
spec:
storageClassName: gp2 # cheaper storage for warm tier
resources:
requests:
storage: 2Ti
Index Lifecycle Management (ILM)
{
  "policy": {
    "phases": {
      "hot": {
        "min_age": "0ms",
        "actions": {
          "rollover": {
            "max_primary_shard_size": "50gb",
            "max_age": "1d"
          },
          "set_priority": {"priority": 100}
        }
      },
      "warm": {
        "min_age": "3d",
        "actions": {
          "shrink": {"number_of_shards": 1},
          "forcemerge": {"max_num_segments": 1},
          "set_priority": {"priority": 50}
        }
      },
      "cold": {
        "min_age": "30d",
        "actions": {
          "set_priority": {"priority": 0}
        }
      },
      "delete": {
        "min_age": "90d",
        "actions": {"delete": {}}
      }
    }
  }
}
Elasticsearch vs Loki Comparison
| Aspect | Elasticsearch / OpenSearch | Grafana Loki |
|---|---|---|
| Indexing | Full inverted index of all fields | Labels only; log body compressed |
| Storage cost | High (10–20× raw log size) | Low (2–5× compressed) |
| Full-text search | Excellent (analyzed text fields) | Via line filter (regex scan) |
| Query language | Lucene / KQL / EQL | LogQL |
| Scalability | Complex (shard management) | Simpler (object store backed) |
| Operational complexity | High (JVM tuning, ILM, shards) | Medium |
| Grafana integration | Via data source | Native (first-class) |
| Best for | Security events, full-text search, compliance | Application logs, cost-sensitive, correlated observability |
OpenTelemetry Logs
The OpenTelemetry Collector can act as a log collection and forwarding pipeline, replacing Fluent Bit or running alongside it. It enables unified signal collection — the same collector handles metrics, traces, and logs.
OTel Collector Log Pipeline
# OTel Collector config for log collection
receivers:
  filelog:
    include:
      # Pod log directories carry namespace, pod name, and pod UID in the path
      - /var/log/pods/*/*/*.log
    exclude:
      # Skip the collector's own pods (assumes their names start with "otelcol")
      - /var/log/pods/*_otelcol*_*/*/*.log
    start_at: beginning
    include_file_path: true
    include_file_name: false
    operators:
      # Parse CRI log format (fractional seconds vary in length)
      - type: regex_parser
        id: parse_cri
        regex: '^(?P<time>[^ Z]+Z) (?P<stream>stdout|stderr) (?P<logtag>[^ ]*) ?(?P<log>.*)$'
        timestamp:
          parse_from: attributes.time
          layout: '%Y-%m-%dT%H:%M:%S.%LZ'
      # Extract Kubernetes metadata from file path
      - type: regex_parser
        id: parse_k8s
        regex: '\/(?P<namespace>[^_]+)_(?P<pod_name>[^_]+)_(?P<uid>[^\/]+)\/(?P<container_name>[^\/]+)\/'
        parse_from: attributes["log.file.path"]
      # Promote path metadata to resource attributes so k8sattributes can associate the pod
      - type: move
        from: attributes.uid
        to: resource["k8s.pod.uid"]
      - type: move
        from: attributes.namespace
        to: resource["k8s.namespace.name"]
      - type: move
        from: attributes.pod_name
        to: resource["k8s.pod.name"]
      - type: move
        from: attributes.container_name
        to: resource["k8s.container.name"]
      # Try to parse JSON log body
      - type: json_parser
        id: parse_json
        parse_from: attributes.log
        if: 'attributes.log matches "^\\{"'
        on_error: send
      # Move log to body
      - type: move
        from: attributes.log
        to: body
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
processors:
  batch:
    timeout: 5s
    send_batch_size: 8192
  memory_limiter:
    # check_interval is required; the processor rejects a zero interval
    check_interval: 1s
    limit_mib: 512
    spike_limit_mib: 128
  k8sattributes:   # enrich with K8s metadata
    extract:
      metadata:
        - k8s.namespace.name
        - k8s.pod.name
        - k8s.pod.uid
        - k8s.deployment.name
        - k8s.node.name
    pod_association:
      - sources:
          - from: resource_attribute
            name: k8s.pod.uid
  resource:
    attributes:
      - key: cluster
        value: "prod-us-east-1"
        action: insert
exporters:
  loki:
    endpoint: http://loki-gateway.monitoring.svc/loki/api/v1/push
    default_labels_enabled:
      exporter: false
      job: true
    labels:
      resource:
        cluster: ""
        k8s.namespace.name: ""
        k8s.pod.name: ""
        k8s.container.name: ""
service:
  pipelines:
    logs:
      receivers: [filelog, otlp]
      processors: [memory_limiter, k8sattributes, resource, batch]
      # Tempo ingests traces only, so logs are not exported to it;
      # log-to-trace correlation works through the trace_id carried in each record.
      exporters: [loki]
OTel Log Data Model
The OpenTelemetry log data model provides a standard schema that maps to existing formats:
| OTel Field | Description | Maps to (Loki) |
|---|---|---|
| Timestamp | Event time (nanoseconds) | Log timestamp |
| SeverityNumber | 1–24 (TRACE, DEBUG, INFO, WARN, ERROR, FATAL) | Label level |
| SeverityText | Original severity string | Label severity |
| Body | Log message (any type) | Log line |
| TraceId | W3C trace ID (16 bytes) | Indexed attribute |
| SpanId | W3C span ID (8 bytes) | Indexed attribute |
| Attributes | Key-value pairs (structured fields) | Parsed fields |
| Resource | Origin resource (service, host, k8s) | Labels |
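SeverityNumber divides its 1–24 range into six bands of four values each, so mapping a number to its short name is integer arithmetic. A small Python sketch follows; the function name is hypothetical, and values outside 1–24 are reported as UNSPECIFIED.

```python
SEVERITY_NAMES = ("TRACE", "DEBUG", "INFO", "WARN", "ERROR", "FATAL")


def severity_text(severity_number: int) -> str:
    """Map an OTel SeverityNumber (1-24) to its band name."""
    if not 1 <= severity_number <= 24:
        return "UNSPECIFIED"
    # 1-4 TRACE, 5-8 DEBUG, 9-12 INFO, 13-16 WARN, 17-20 ERROR, 21-24 FATAL
    return SEVERITY_NAMES[(severity_number - 1) // 4]
```

A comparison such as severity_number >= 13 therefore selects WARN and everything more severe, which is how severity filters are usually written.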
Kubernetes Component Logs
Control-plane component logs are critical for diagnosing cluster-level issues. Their collection differs from application logs because they may run as static pods, system services, or managed cloud services.
kube-apiserver
# Static pod — logs via kubectl
kubectl -n kube-system logs kube-apiserver-<node> --tail=100 --since=1h
# Increase verbosity temporarily (restart required for static pods)
# Edit /etc/kubernetes/manifests/kube-apiserver.yaml and add --v=4
# Key log patterns to watch:
# "too many requests" — client throttling
# "Timeout: request did not complete" — etcd latency
# "DENY" — admission controller rejection
# On the control-plane node, static-pod logs are under /var/log/containers/
grep -i "too many requests\|timeout\|DENY" /var/log/containers/kube-apiserver-*.log
kubelet
# kubelet runs as a systemd service (not a pod)
journalctl -u kubelet -n 500 --since "1 hour ago"
journalctl -u kubelet -f # follow
# Increase kubelet verbosity (kubelet restart required)
# On kubeadm nodes, add --v=4 to KUBELET_EXTRA_ARGS in /etc/default/kubelet (deb) or /etc/sysconfig/kubelet (rpm)
# Then run: systemctl daemon-reload && systemctl restart kubelet
# Key kubelet log patterns:
# "orphaned pod" — stale pod directories
# "failed to get container info" — runtime disconnected
# "evicted" — resource pressure eviction
journalctl -u kubelet | grep -i "evict\|oom\|failed to\|error"
containerd
journalctl -u containerd -n 200
# containerd log level (edit /etc/containerd/config.toml)
[debug]
level = "warn" # info = verbose, warn = quieter
# Key patterns:
# "failed to pull image" — registry auth or network
# "failed to create shim" — runc error (OCI runtime issue)
# "context deadline exceeded" — slow image pulls
etcd
kubectl -n kube-system logs etcd-<node> --tail=100
# Key patterns:
# "slow fdatasync": disk I/O issue (use SSD; etcd guidance is p99 WAL fsync under 10ms)
# "leader changed" — leader election event (check disk/network)
# "failed to send message" — inter-etcd network issue
# "applying snapshot" — member catching up (acceptable on rejoin)
# etcd verbosity (add to manifest):
# --log-level=warn (options: debug, info, warn, error, panic, fatal)
Control-Plane Log Verbosity Levels
| --v Level | Content | Use When |
|---|---|---|
| 0 | Always-visible messages (errors, panics) | Normal production |
| 1 | Basic high-level info | Normal production |
| 2 | Steady-state changes, reconciliation loops | Default for most components |
| 3 | Extended info, informational changes | Light debugging |
| 4 | Debug-level logging | Active debugging session |
| 5–6 | Verbose — each API call logged | Deep debugging (high volume) |
| 7–10 | Extremely verbose | Core development only |
kubectl logs & Tooling
kubectl logs Reference
# Basic usage
kubectl logs <pod>
kubectl logs <pod> -c <container> # specific container in multi-container pod
kubectl logs <pod> --previous # logs from the previous (crashed) container
# Streaming
kubectl logs -f <pod> # follow (like tail -f)
kubectl logs -f deployment/<name> # follow deployment (picks one pod)
# Filtering by time
kubectl logs <pod> --since=1h # last 1 hour
kubectl logs <pod> --since-time="2024-01-15T10:00:00Z"
# Limit output
kubectl logs <pod> --tail=100 # last 100 lines
# All pods matching label selector
kubectl logs -l app=payments --all-containers=true --prefix=true
# All containers in all pods of a deployment (select by label; deployment/<name> follows only one pod)
kubectl logs -f -l app=payments-api --all-containers=true --max-log-requests=10
Stern — Multi-Pod Log Streaming
# Install
brew install stern # macOS
# or: https://github.com/stern/stern/releases
# Follow all pods matching regex in any namespace
stern "payments.*" --namespace production
# Multiple namespaces
stern "api" --namespace production --namespace staging
# Filter by container name
stern "." --container "main" --namespace production
# Tail last N lines then follow
stern "payments" --tail 50
# Output as JSON with pod prefix
stern "payments" --output json --namespace production
# Filter with grep
stern "payments" | grep "error"
# Custom output template
stern "payments" --template '{{.PodName}} {{.ContainerName}} {{.Message}}'
k9s Log Viewer
k9s provides a TUI (terminal UI) for Kubernetes. Press l on a pod to view logs, / to filter, and Ctrl-s to save to file (s toggles autoscroll).
Log Correlation via Trace ID
# Find all logs for a specific trace (across all services)
# In Grafana: Explore → Loki → LogQL:
{cluster="prod"} | json | trace_id = "4bf92f3577b34da6a3ce929d0e0e4736"
# In CLI using kubelog or stern with grep:
stern "." --namespace production --since 1h | grep "4bf92f3577b34da6"
# In Loki with derived field (configured in Grafana data source):
# "Derived fields" → regex "trace_id":"(\w+)" for JSON logs (trace_id=(\w+) for logfmt) → internal link to Tempo
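A derived field is just a regex capture applied to each log line. The Python sketch below applies the same idea to a JSON log line and pulls out a 32-hex-character W3C trace ID; the function name is hypothetical, and the pattern tolerates optional whitespace after the colon.

```python
import re
from typing import Optional

# Matches "trace_id":"<32 hex chars>" with optional whitespace around the colon.
TRACE_ID_RE = re.compile(r'"trace_id"\s*:\s*"([0-9a-f]{32})"')


def extract_trace_id(line: str) -> Optional[str]:
    """Return the trace ID embedded in a JSON log line, if any."""
    match = TRACE_ID_RE.search(line)
    return match.group(1) if match else None
```

Testing the regex against real log lines this way before pasting it into the data source settings avoids a silent "no link shown" failure.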
Cardinality & Cost Management
Log Volume Estimation
# Estimate current log volume per namespace (in Loki)
sum by (namespace) (bytes_rate({cluster="prod"} [1h]))
# Estimate bytes via Fluent Bit metrics
kubectl port-forward ds/fluent-bit 2020:2020 -n logging
curl -s localhost:2020/api/v1/metrics | jq '.output | to_entries[] | {name: .key, bytes: .value.proc_bytes}'
# Check Loki ingestion rate
curl -s http://loki-gateway:80/metrics | grep loki_distributor_bytes_received_total
Cost Reduction Strategies
Drop Debug/Trace Logs in Production
Add a Fluent Bit FILTER grep rule to drop level=debug or level=trace before forwarding. Can reduce volume by 60–80% for verbose services.
[FILTER]
Name grep
Match kube.*
Exclude log "\"level\":\"debug\""
Throttle Noisy Sources
Fluent Bit's throttle filter caps logs per interval per tag. Use rewrite_tag to separate chatty pods and apply different rate limits to them.
Sampling
For INFO logs from high-volume services, sample 10–20% using Fluent Bit's Lua filter or OTel Collector's probabilistic_sampler processor. Always forward 100% of WARN/ERROR.
Tiered Retention
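Keep last 7 days in hot storage (fast SSD), 30 days in warm (standard), 365 days in cold (S3 Glacier). In Elasticsearch, an ILM policy handles tier movement automatically. Loki's compactor only enforces retention (deletion), so any tiering of its chunks relies on object-storage lifecycle rules.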
Dedup Before Ship
Avoid shipping the same log to both Loki and Elasticsearch. Use a single authoritative store; export compliance-required logs to S3 separately via a Fluent Bit S3 output with compression.
Compression
Enable gzip/snappy compression in Fluent Bit output and Loki storage. JSON logs typically compress 10:1. Loki compresses chunks with gzip by default; setting chunk_encoding to snappy trades some compression ratio for faster queries.
Fluent Bit Throttle Filter
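[FILTER]
    Name          throttle
    Match         kube.*
    # Allow an average of at most 5,000 log records per interval
    Rate          5000
    # Average over a sliding window of 5 intervals
    Window        5
    # Length of one interval
    Interval      1s
    # Log a status line when throttling occurs
    Print_Status  On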
Sampling with OTel Collector
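processors:
  # Mark WARN and above so the sampler always keeps them
  transform/sampling_priority:
    log_statements:
      - context: log
        statements:
          - set(attributes["sampling.priority"], 100) where severity_number >= SEVERITY_NUMBER_WARN
  # Probabilistic sampler for the remaining (INFO and below) log records
  probabilistic_sampler:
    sampling_percentage: 10
    # Hash the trace ID so all records of one trace share a sampling decision;
    # use attribute_source: record with from_attribute for logs without trace IDs
    attribute_source: traceID
    # Records whose attribute value is >= 100 are always kept
    sampling_priority: sampling.priority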
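The sampling policy described in this section (keep every WARN/ERROR record, keep a fixed share of the rest) can be sketched as a deterministic hash-based decision. This is an illustrative model, not the Collector's algorithm; the function name and the choice of SHA-256 are assumptions.

```python
import hashlib

ALWAYS_KEEP = {"warn", "warning", "error", "fatal"}


def keep_record(level: str, key: str, sample_percent: int = 10) -> bool:
    """Decide whether to forward a record.

    key is any stable identifier (for example a trace ID), so that all
    records sharing it receive the same decision.
    """
    if level.lower() in ALWAYS_KEEP:
        return True
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100  # uniform bucket 0..99
    return bucket < sample_percent
```

Hashing a stable key instead of calling a random number generator keeps the decision reproducible across collector replicas and restarts.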
Alerting on Logs
Loki supports alerting rules using LogQL metric expressions. Alerts are evaluated by the Loki ruler and routed to Alertmanager — the same Alertmanager used for Prometheus alerts. See 06-alerting.html for Alertmanager routing configuration.
Loki Alerting Rules
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: loki-log-alerts
  namespace: monitoring
spec:
  groups:
    - name: log-error-rates
      interval: 1m
      rules:
        - alert: HighErrorLogRate
          expr: |
            sum by (namespace, app) (
              rate({cluster="prod"} | json | level="error" [5m])
            ) > 1
          for: 2m
          labels:
            severity: warning
            team: platform
          annotations:
            summary: "High error log rate in {{ $labels.namespace }}/{{ $labels.app }}"
            description: "Error rate is {{ $value | humanize }} errors/sec"
            runbook_url: "https://wiki/runbooks/high-error-log-rate"
        - alert: OOMKilledContainer
          expr: |
            count_over_time({cluster="prod"} |= "OOMKilled" [5m]) > 0
          labels:
            severity: critical
          annotations:
            summary: "Container OOMKilled detected"
            description: "A container was OOMKilled in the last 5 minutes"
        - alert: PanicInApplication
          expr: |
            sum by (namespace, app) (
              rate({cluster="prod"} |~ "panic:|PANIC|runtime error:" [5m])
            ) > 0
          labels:
            severity: critical
          annotations:
            summary: "Application panic detected in {{ $labels.namespace }}/{{ $labels.app }}"
        - alert: DatabaseConnectionErrors
          expr: |
            sum by (namespace, app) (
              rate({cluster="prod"} |= "connection refused" |= "database" [5m])
            ) > 0.5
          for: 3m
          labels:
            severity: warning
          annotations:
            summary: "Database connection errors in {{ $labels.namespace }}/{{ $labels.app }}"
Log-Based SLO Burn Rate Alerts
# Error rate SLO: 99.5% of requests must succeed
# Derived from HTTP status logs rather than metrics
- alert: ErrorBudgetBurnRateFast
  expr: |
    (
      sum(rate({namespace="payments"} | json | http_status >= 500 [1h]))
      /
      sum(rate({namespace="payments"} | json [1h]))
    ) > (14.4 * 0.005)  # 14.4× budget burn rate for 1h window (page fast)
  labels:
    severity: critical
  annotations:
    summary: "Error budget burning fast for payments service"
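The threshold in the rule above is the burn rate multiplied by the error budget. The short calculation below shows where 14.4 × 0.005 comes from and how much of the budget a one-hour burn at that rate would consume. The 30-day SLO period is an assumption (it is the period for which 14.4× is the conventional fast-burn figure); the function names are illustrative.

```python
def burn_rate_threshold(slo_target: float, burn_rate: float) -> float:
    """Error ratio above which the alert fires: burn_rate x error budget."""
    error_budget = 1.0 - slo_target
    return burn_rate * error_budget


def budget_consumed(burn_rate: float, window_hours: float, slo_period_days: int = 30) -> float:
    """Fraction of the whole period's error budget spent if the burn rate
    is sustained for window_hours (30-day period assumed by default)."""
    return burn_rate * window_hours / (slo_period_days * 24)


threshold = burn_rate_threshold(slo_target=0.995, burn_rate=14.4)
consumed = budget_consumed(burn_rate=14.4, window_hours=1)

print(f"alert when error ratio > {threshold:.3f}")
print(f"budget consumed by 1h at that rate: {consumed:.1%}")
```

With a 99.5% target the alert fires once more than about 7.2% of requests fail over the hour, which corresponds to spending roughly 2% of a 30-day budget in that single hour.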
Metrics, Alerts & Runbooks
Key Logging Infrastructure Metrics
| Metric | Source | Alert Threshold | Meaning |
|---|---|---|---|
| fluentbit_input_records_total | Fluent Bit | n/a | Records ingested from all sources |
| fluentbit_output_errors_total | Fluent Bit | >10/min | Failed deliveries to backend |
| fluentbit_output_retried_records_total | Fluent Bit | >100/min | Records being retried (backend pressure) |
| loki_distributor_ingester_append_failures_total | Loki | >0 | Ingestion failures in Loki |
| loki_ingester_wal_replay_active | Loki | >0 for >5m | Ingester replaying WAL (possible crash recovery) |
| loki_query_frontend_retries | Loki | >10/min | Query retries (querier under pressure) |
| loki_compactor_runs_total | Loki | No increase over the expected compaction interval | Compaction health (should run regularly) |
Alert Rules
groups:
  - name: logging-infrastructure
    rules:
      - alert: FluentBitOutputErrors
        expr: rate(fluentbit_output_errors_total[5m]) > 0.1
        for: 5m
        labels: {severity: warning}
        annotations:
          summary: "Fluent Bit output errors: logs may be lost"
          runbook: "Check output plugin config and backend connectivity"
      - alert: LokiIngestionFailing
        expr: rate(loki_distributor_ingester_append_failures_total[5m]) > 0
        for: 3m
        labels: {severity: critical}
        annotations:
          summary: "Loki ingestion failures: check ingester health"
      - alert: LokiQueryLatencyHigh
        # histogram_quantile needs per-bucket rates aggregated by le
        expr: |
          histogram_quantile(0.99,
            sum by (le) (rate(loki_request_duration_seconds_bucket{route="loki_api_v1_query_range"}[5m]))
          ) > 30
        for: 5m
        labels: {severity: warning}
        annotations:
          summary: "Loki query p99 latency > 30s: user queries degraded"
      - alert: FluentBitBufferDiskFull
        expr: (1 - node_filesystem_avail_bytes{mountpoint="/var/log/flb-storage"} / node_filesystem_size_bytes{mountpoint="/var/log/flb-storage"}) > 0.8
        for: 10m
        labels: {severity: warning}
        annotations:
          summary: "Fluent Bit disk buffer > 80% full on {{ $labels.node }}"
Runbooks
Fluent Bit Not Collecting Logs
- Check DaemonSet pod status: `kubectl get ds fluent-bit -n logging`
- Check pod logs: `kubectl logs ds/fluent-bit -n logging --tail=50`
- Verify hostPath mount: `kubectl exec -n logging ds/fluent-bit -- ls /var/log/containers/`
- Check RBAC: `kubectl auth can-i list pods --as=system:serviceaccount:logging:fluent-bit`
- Check Fluent Bit metrics: `curl localhost:2020/api/v1/metrics/prometheus`
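The last step returns Prometheus text exposition, which is tedious to read by eye when several outputs are configured. A small helper along these lines can total a counter across all of its label sets. The function name and the sample payload are hypothetical; the sample only mimics the general shape of the endpoint's output.

```python
def sum_metric(exposition_text: str, metric: str) -> float:
    """Sum every sample of `metric` in Prometheus text exposition format."""
    total = 0.0
    for line in exposition_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        if "{" in line:
            name = line[: line.index("{")]
            rest = line[line.rindex("}") + 1:]
        else:
            name, _, rest = line.partition(" ")
        if name != metric:
            continue
        fields = rest.split()  # value, optionally followed by a timestamp
        if fields:
            total += float(fields[0])
    return total


# Hypothetical payload resembling Fluent Bit's metrics output.
SAMPLE = """\
# HELP fluentbit_output_errors_total Number of output errors.
# TYPE fluentbit_output_errors_total counter
fluentbit_output_errors_total{name="loki.0"} 3 1700000000000
fluentbit_output_errors_total{name="s3.1"} 2 1700000000000
fluentbit_output_retried_records_total{name="loki.0"} 40 1700000000000
"""

print(sum_metric(SAMPLE, "fluentbit_output_errors_total"))
```

Running it twice a minute apart against the real endpoint and comparing totals shows whether errors are still accumulating or are left over from an earlier incident.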
Logs Missing in Loki
- Verify Fluent Bit output has no errors: check `fluentbit_output_errors_total`
- Test Loki endpoint directly: `curl -v http://loki-gateway/loki/api/v1/labels`
- Check Loki distributor logs: `kubectl logs -n monitoring -l app=loki-distributed-distributor`
- Verify label set is valid (no special chars, no high-cardinality labels)
- Check retention: query logs with a longer time range
Loki Query Timeout
- Add stream selectors to narrow scan: avoid `{cluster="prod"}` alone
- Reduce time range of query
- Check querier CPU/memory: `kubectl top pod -n monitoring -l app=loki-distributed-querier`
- Increase `query_timeout` in Loki config
- Add query sharding: enable `query_shards` in limits_config
Log Volume Spike
- Find top namespaces: LogQL `topk(5, sum by(namespace)(bytes_rate({cluster="prod"}[5m])))`
- Find chatty pods: `topk(10, sum by(pod)(rate({cluster="prod"}[5m])))`
- Apply throttle filter for that pod label
- Check for log loop (service logging its own log output)
- Drop debug logs from offending service
OOMKilled Log Collector
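- Check log volume: a sudden spike may have exhausted the input buffer
- Lower `Mem_Buf_Limit` so inputs pause before memory is exhausted (memory buffering); with filesystem buffering, lower `storage.max_chunks_up` so chunks spill to disk earlier
- Increase memory limits in the DaemonSet
- Add a throttle filter before buffering
- Switch `storage.type` to `filesystem` to reduce memory pressure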
Best Practices
- Always use structured JSON logging. Free-text logs require brittle regex parsers and cannot support field-level alerting, aggregation, or precise filtering at scale.
- Always inject trace_id and span_id into log records. This enables one-click navigation from a log line to the originating distributed trace in Grafana (Loki → Tempo derived fields).
- Keep Loki label cardinality low. Never use pod name, user ID, request ID, or any unbounded value as a Loki label. These belong in the log body, queryable via `| json` pipeline expressions.
- Buffer to disk in Fluent Bit. Set `storage.type filesystem` and configure a `hostPath` volume. In-memory buffering loses logs on node memory pressure or Fluent Bit OOMKill.
- Exclude health-check and metrics-scrape logs. Kubelet liveness probes and Prometheus scrapes generate thousands of lines per hour with zero diagnostic value. Drop them with a Fluent Bit grep filter.
- Set per-namespace retention policies in Loki. Production namespaces may need 90-day retention for compliance; dev/staging namespaces can use 7-day retention. Use Loki's per-tenant override config.
- Separate log streams by severity in alerting. Always forward 100% of ERROR and WARN logs. Apply sampling (10–20%) only to INFO logs from high-volume services. Never sample FATAL or CRITICAL.
- Test log collection in your deployment pipeline. Run a canary pod that emits known log patterns and write a test asserting those patterns appear in the logging backend within 60 seconds of deployment.
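The canary check in the last practice can be sketched as a polling loop against Loki's query_range HTTP API. This is a minimal sketch under assumptions: the helper names are hypothetical, the Loki base URL and LogQL selector are placeholders you would supply, and authentication or tenant headers are omitted.

```python
import json
import time
import urllib.parse
import urllib.request
from typing import Callable


def query_loki(base_url: str, logql: str, lookback_s: int = 300) -> dict:
    """Run a LogQL log query over the last lookback_s seconds."""
    end_ns = time.time_ns()
    start_ns = end_ns - lookback_s * 1_000_000_000
    params = urllib.parse.urlencode(
        {"query": logql, "start": start_ns, "end": end_ns, "limit": 10}
    )
    url = f"{base_url}/loki/api/v1/query_range?{params}"
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)


def count_lines(response: dict) -> int:
    """Count log lines in a query_range response with resultType 'streams'."""
    streams = response.get("data", {}).get("result", [])
    return sum(len(stream.get("values", [])) for stream in streams)


def wait_for_marker(
    fetch: Callable[[], dict],
    deadline_s: float = 60.0,
    poll_s: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll until the marker shows up or the deadline passes."""
    stop = time.monotonic() + deadline_s
    while True:
        if count_lines(fetch()) > 0:
            return True
        if time.monotonic() >= stop:
            return False
        sleep(poll_s)
```

A pipeline step might call `wait_for_marker(lambda: query_loki("http://loki-gateway", '{namespace="canary"} |= "canary-marker-123"'))` and fail the deployment when it returns False. The namespace and marker string here are illustrative; a marker that is unique per run avoids matching output from earlier deployments.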
Coverage Details
- Kubernetes logging architecture: stdout/stderr → CRI → node files
- CRI log format (/var/log/containers/, /var/log/pods/)
- Docker vs containerd log format differences
- kubelet log rotation flags (container-log-max-size, max-files)
- Four log collection patterns: DaemonSet / sidecar / streaming sidecar / direct SDK
- Pattern comparison table (overhead, app changes, file logging)
- Structured logging: standard field schema (15 fields)
- Go: Zap logger with trace context injection
- Java: Logback + logstash-logback-encoder JSON config
- Python: structlog with contextvars for trace injection
- Node.js: pino reference
- Fluent Bit vs Fluentd comparison table
- Fluent Bit pipeline model: INPUT → PARSER → FILTER → BUFFER → OUTPUT
- Fluent Bit Helm install command
- Production Fluent Bit ConfigMap (filesystem storage, kubernetes filter, Loki output)
- Fluent Bit parsers.conf (docker, CRI, JSON parsers)
- Fluent Bit DaemonSet YAML (tolerations, hostPath volumes, RBAC)
- Multiline log handling (built-in parsers + custom MULTILINE_PARSER)
- Loki architecture: distributor/ingester/compactor/querier/store gateway
- Label cardinality warning (anti-patterns: pod_name, user_id, request_id)
- Recommended Loki label set
- Loki Helm install (single-binary and distributed modes)
- Loki distributed values YAML (TSDB schema v13, S3, retention, limits_config)
- LogQL: log stream selectors, label matchers, regex match
- LogQL pipeline: line filters, JSON/logfmt/pattern parsers, label filters, line_format, label_format
- LogQL metric queries: rate, count_over_time, quantile_over_time, bytes_rate, unwrap
- Useful operational LogQL queries (error trace, OOMKilled, slow queries, top errors)
- Loki recording rules via PrometheusRule CRD
- Elasticsearch/OpenSearch: ECK operator install and cluster YAML (hot/warm tiers)
- Index Lifecycle Management (ILM) JSON: hot/warm/cold/delete phases
- Elasticsearch vs Loki comparison table
- OTel Collector filelog receiver config (CRI parser, K8s path regex, JSON merge)
- OTel Collector k8sattributes processor for metadata enrichment
- OTel Log data model fields (Timestamp, SeverityNumber, Body, TraceId, SpanId, Resource)
- Kubernetes component logs: kube-apiserver, kubelet, containerd, etcd commands
- klog verbosity levels (--v 0–10)
- kubectl logs reference: --previous, --since, --tail, --follow, label selector, deployment
- Stern multi-pod log streaming (install, usage patterns, output templates)
- Log correlation via trace_id (Loki LogQL, Grafana derived fields)
- Log volume estimation with LogQL bytes_rate and Fluent Bit metrics API
- Cost reduction strategies: drop debug logs, throttle filter, sampling, tiered retention, dedup, compression
- Fluent Bit throttle filter and OTel probabilistic_sampler
- Loki alerting rules via PrometheusRule: HighErrorLogRate, OOMKilled, Panic, DB errors
- Log-based SLO burn rate alert pattern
- 7 logging infrastructure metrics table with alert thresholds
- 4 PrometheusRule alert rules (FluentBitOutputErrors, LokiIngestionFailing, QueryLatencyHigh, BufferDiskFull)
- 5 runbooks (not collecting, missing in Loki, query timeout, volume spike, OOMKilled collector)
- 8 best practices (structured JSON, trace injection, label cardinality, disk buffer, health exclusion, retention, sampling policy, pipeline testing)