ContainerLogManager to rotate logs of all containers in case of disk pressure on host #129447

Open
@Zeel-Patel

Description

What would you like to be added?

The ContainerLogManager, which is responsible for rotating and cleaning up container log files, should also rotate the logs of all containers when the host is under disk pressure.

Why is this needed?

Containers that generate heavy log volume often end up with compressed log files larger than the containerLogMaxSize limit set in the kubelet config.

For example, with the kubelet configured as:

containerLogMaxSize: 200M
containerLogMaxFiles: 6
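As a reference point, these two settings imply a per-container ceiling of roughly containerLogMaxSize × containerLogMaxFiles bytes of logs (the live file plus its rotated copies). A minimal POSIX sh sketch of that arithmetic, assuming `200M` is read as a Kubernetes quantity with a decimal suffix (200,000,000 bytes, as opposed to `200Mi` = 209,715,200 bytes); the helper names `to_bytes` and `log_budget_bytes` are hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: convert a size with a Kubernetes-style suffix to bytes.
# Decimal suffixes (k, M, G) use powers of 1000; binary ones (Ki, Mi, Gi) use
# powers of 1024. Only whole numbers are handled in this sketch.
to_bytes() {
  case "$1" in
    *Ki) echo $(( ${1%Ki} * 1024 )) ;;
    *Mi) echo $(( ${1%Mi} * 1024 * 1024 )) ;;
    *Gi) echo $(( ${1%Gi} * 1024 * 1024 * 1024 )) ;;
    *k)  echo $(( ${1%k} * 1000 )) ;;
    *M)  echo $(( ${1%M} * 1000 * 1000 )) ;;
    *G)  echo $(( ${1%G} * 1000 * 1000 * 1000 )) ;;
    *)   echo $(( $1 )) ;;
  esac
}

# Hypothetical helper: the most disk a container's logs should use if each of
# the max_files files (live log plus rotated copies) stays at max_size.
# Usage: log_budget_bytes <max_size> <max_files>
log_budget_bytes() {
  echo $(( $(to_bytes "$1") * $2 ))
}
```

With the example settings, `log_budget_bytes 200M 6` prints 1200000000, i.e. about 1.2 GB per container.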

Spec 1

Continuously generates 10 MiB of log data, sleeping 0.1 s between writes:

apiVersion: batch/v1
kind: Job
metadata:
  name: generate-huge-logs
spec:
  template:
    spec:
      containers:
      - name: log-generator
        image: busybox
        command: ["/bin/sh", "-c"]
        args:
          - |
            # Generate huge log entries to stdout
            start_time=$(date +%s)
            log_size=0
            target_size=$((4 * 1024 * 1024 * 1024))  # 4 GiB target size in bytes
            while [ $log_size -lt $target_size ]; do
              # Emit 10 MiB of random data (dd bs=10M count=1) to stdout
              echo "Generating huge log entry at $(date) - $(dd if=/dev/urandom bs=10M count=1 2>/dev/null)"
              log_size=$((log_size + 10 * 1024 * 1024))  # Count roughly the 10 MiB just written
              sleep 0.1  # Sleep to control log generation speed
            done
            end_time=$(date +%s)
            echo "Log generation completed in $((end_time - start_time)) seconds"
      restartPolicy: Never
  backoffLimit: 4

File sizes

-rw-r----- 1 root root  24142862 Jan  1 11:41 0.log
-rw-r--r-- 1 root root 183335398 Jan  1 11:40 0.log.20250101-113948.gz
-rw-r--r-- 1 root root 364144934 Jan  1 11:40 0.log.20250101-114003.gz
-rw-r--r-- 1 root root 487803789 Jan  1 11:40 0.log.20250101-114023.gz
-rw-r--r-- 1 root root 577188544 Jan  1 11:41 0.log.20250101-114047.gz
-rw-r----- 1 root root 730449620 Jan  1 11:41 0.log.20250101-114115
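Summing this listing makes the overshoot concrete. The POSIX sh/awk sketch below (the function name `check_listing` is hypothetical) totals the sizes in `ls -l` output and flags every file above a per-file limit given in bytes.

```shell
#!/bin/sh
# Hypothetical helper: read `ls -l` output on stdin, print every file whose
# size (field 5) exceeds the given per-file limit in bytes, then the total.
# Sizes are printed with %.0f so totals above 2^31 are not truncated.
check_listing() {
  awk -v limit="$1" '
    NF >= 9 {
      size = $5 + 0
      total += size
      if (size > limit + 0) printf "over limit: %s (%.0f bytes)\n", $9, size
    }
    END { printf "total: %.0f bytes\n", total }
  '
}
```

Fed the Spec 1 listing with a limit of 200000000 (200M), it flags four files (the three largest .gz files and the uncompressed 0.log.20250101-114115) and reports a total of 2367065147 bytes, nearly twice the 1.2 GB that 200M × 6 files would allow. On a node, `ls -l | check_listing 200000000` inside a container's log directory produces the same kind of report.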

Spec 2

Continuously generates 10 MiB of log data, sleeping 10 s between writes:

apiVersion: batch/v1
kind: Job
metadata:
  name: generate-huge-logs
spec:
  template:
    spec:
      containers:
      - name: log-generator
        image: busybox
        command: ["/bin/sh", "-c"]
        args:
          - |
            # Generate huge log entries to stdout
            start_time=$(date +%s)
            log_size=0
            target_size=$((4 * 1024 * 1024 * 1024))  # 4 GiB target size in bytes
            while [ $log_size -lt $target_size ]; do
              # Emit 10 MiB of random data (dd bs=10M count=1) to stdout
              echo "Generating huge log entry at $(date) - $(dd if=/dev/urandom bs=10M count=1 2>/dev/null)"
              log_size=$((log_size + 10 * 1024 * 1024))  # Count roughly the 10 MiB just written
              sleep 10  # Sleep to control log generation speed
            done
            end_time=$(date +%s)
            echo "Log generation completed in $((end_time - start_time)) seconds"
      restartPolicy: Never
  backoffLimit: 4

File sizes

-rw-r----- 1 root root 181176268 Jan  1 11:31 0.log
-rw-r--r-- 1 root root 183336647 Jan  1 11:20 0.log.20250101-111730.gz
-rw-r--r-- 1 root root 183323382 Jan  1 11:23 0.log.20250101-112026.gz
-rw-r--r-- 1 root root 183327676 Jan  1 11:26 0.log.20250101-112321.gz
-rw-r--r-- 1 root root 183336376 Jan  1 11:29 0.log.20250101-112616.gz
-rw-r----- 1 root root 205360966 Jan  1 11:29 0.log.20250101-112911
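The timestamps in the rotated file names also give a rough write rate. In Spec 2, 0.log.20250101-112026.gz (183,323,382 bytes) was rotated 176 s after 0.log.20250101-111730.gz, about 1.04 MB/s, close to 10 MiB per 10 s (1,048,576 B/s); in Spec 1 the first two rotations are only 15 s apart (about 24 MB/s). A minimal POSIX sh sketch of that arithmetic follows; the helper names are hypothetical, and the estimate is approximate because the .gz sizes are compressed sizes.

```shell
#!/bin/sh
# Hypothetical helper: convert the HHMMSS part of a YYYYMMDD-HHMMSS stamp
# (as in 0.log.20250101-111730.gz) to seconds since midnight.
clock_secs() {
  t=${1#*-}                  # HHMMSS
  hh=${t%????}               # first two digits
  mm=${t#??}; mm=${mm%??}    # middle two digits
  ss=${t#????}               # last two digits
  # Drop one leading zero so values like 08 are not parsed as octal.
  echo $(( ${hh#0} * 3600 + ${mm#0} * 60 + ${ss#0} ))
}

# Hypothetical helper: seconds elapsed between two same-day rotation stamps.
secs_between() {
  echo $(( $(clock_secs "$2") - $(clock_secs "$1") ))
}

# Approximate write rate in bytes/second: a rotated file's size divided by
# the time since the previous rotation.
# Usage: rate_bytes_per_sec <bytes> <previous_stamp> <this_stamp>
rate_bytes_per_sec() {
  echo $(( $1 / $(secs_between "$2" "$3") ))
}
```

For example, `rate_bytes_per_sec 183323382 20250101-111730 20250101-112026` prints 1041610.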

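In Spec 1, the six log files add up to about 2.37 GB, nearly twice the 1.2 GB (200M × 6) that the kubelet settings are meant to allow. A pod that writes gigabytes of logs with minimal delay can therefore cause disk pressure on the host.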
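The proposal adds a node-wide trigger: under disk pressure, rotate the logs of all containers. A dry-run POSIX sh sketch of that trigger, under loud assumptions: it is not kubelet code, the function names are hypothetical, it uses the `df -P` usage percentage as a stand-in for kubelet's real disk-pressure signal (which comes from eviction thresholds such as `nodefs.available`), and it only prints candidate directories under the pod log root (typically `/var/log/pods/<namespace>_<pod>_<uid>/<container>/`) instead of rotating anything.

```shell
#!/bin/sh
# Dry-run sketch of the proposed trigger (hypothetical, not kubelet code).

# Percentage of the filesystem holding $1 that is in use, from POSIX `df -P`.
disk_usage_pct() {
  df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# If usage of the filesystem holding the pod log root ($1) is at or above the
# threshold percentage ($2), print every <pod>/<container>/ directory that has
# a live *.log file (0.log, 1.log, ... by restart count), i.e. every container
# whose logs the proposed trigger would rotate.
rotation_candidates() {
  root=$1
  threshold=$2
  usage=$(disk_usage_pct "$root")
  if [ "$usage" -lt "$threshold" ]; then
    return 0
  fi
  for dir in "$root"/*/*/; do
    for log in "$dir"*.log; do
      if [ -f "$log" ]; then
        echo "rotate: ${dir%/}"
        break
      fi
    done
  done
}
```

For example, `rotation_candidates /var/log/pods 85` (85 being an arbitrary illustrative threshold) prints one line per container directory once the log filesystem is at least 85% full, and nothing otherwise.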

Labels: kind/feature, needs-triage, sig/node
