ContainerLogManager to rotate logs of all containers in case of disk pressure on host #129447
Open
Description
What would you like to be added?
The ContainerLogManager, which is responsible for rotating and cleaning up container log files, should also rotate the logs of all containers when the host comes under disk pressure.
Why is this needed?
Containers that generate heavy log output often end up with compressed log files whose size exceeds the containerLogMaxSize limit set in the kubelet configuration.
For example, the kubelet is configured with:
containerLogMaxSize: 200M
containerLogMaxFiles: 6
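For reference, these two limits map to existing fields in the kubelet's KubeletConfiguration file; a minimal sketch of the relevant snippet, using the values from the example above (the size is a quantity string, e.g. "200M" or "200Mi"):

apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
# Rotate a container's log file once it reaches this size.
containerLogMaxSize: "200M"
# Keep at most this many log files per container (current file plus rotated ones).
containerLogMaxFiles: 6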
Spec 1
Continuously generates 10 MiB chunks of log output with a 0.1 s sleep between writes.
apiVersion: batch/v1
kind: Job
metadata:
  name: generate-huge-logs
spec:
  template:
    spec:
      containers:
      - name: log-generator
        image: busybox
        command: ["/bin/sh", "-c"]
        args:
        - |
          # Generate huge log entries to stdout
          start_time=$(date +%s)
          log_size=0
          target_size=$((4 * 1024 * 1024 * 1024)) # 4 GiB target size in bytes
          while [ $log_size -lt $target_size ]; do
            # Generate 10 MiB of random data and write it to stdout
            echo "Generating huge log entry at $(date) - $(dd if=/dev/urandom bs=10M count=1 2>/dev/null)"
            log_size=$(($log_size + 10485760)) # Increment tracked size by 10 MiB
            sleep 0.1 # Sleep to control log generation speed
          done
          end_time=$(date +%s)
          echo "Log generation completed in $((end_time - start_time)) seconds"
      restartPolicy: Never
  backoffLimit: 4
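The listings below were taken from the container's log directory on the node. A minimal sketch of how to locate these files and their total footprint, assuming the Job runs in the default namespace and the default kubelet log layout under /var/log/pods (the pod name suffix and UID vary per run):

# On the node: list the current and rotated log files for the job's container.
ls -l /var/log/pods/default_generate-huge-logs-*/log-generator/

# Total on-disk footprint, including the compressed rotations.
du -sh /var/log/pods/default_generate-huge-logs-*/log-generator/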
File sizes
-rw-r----- 1 root root 24142862 Jan 1 11:41 0.log
-rw-r--r-- 1 root root 183335398 Jan 1 11:40 0.log.20250101-113948.gz
-rw-r--r-- 1 root root 364144934 Jan 1 11:40 0.log.20250101-114003.gz
-rw-r--r-- 1 root root 487803789 Jan 1 11:40 0.log.20250101-114023.gz
-rw-r--r-- 1 root root 577188544 Jan 1 11:41 0.log.20250101-114047.gz
-rw-r----- 1 root root 730449620 Jan 1 11:41 0.log.20250101-114115
Spec 2
Continuously generates 10 MiB chunks of log output with a 10 s sleep between writes.
apiVersion: batch/v1
kind: Job
metadata:
  name: generate-huge-logs
spec:
  template:
    spec:
      containers:
      - name: log-generator
        image: busybox
        command: ["/bin/sh", "-c"]
        args:
        - |
          # Generate huge log entries to stdout
          start_time=$(date +%s)
          log_size=0
          target_size=$((4 * 1024 * 1024 * 1024)) # 4 GiB target size in bytes
          while [ $log_size -lt $target_size ]; do
            # Generate 10 MiB of random data and write it to stdout
            echo "Generating huge log entry at $(date) - $(dd if=/dev/urandom bs=10M count=1 2>/dev/null)"
            log_size=$(($log_size + 10485760)) # Increment tracked size by 10 MiB
            sleep 10 # Sleep to control log generation speed
          done
          end_time=$(date +%s)
          echo "Log generation completed in $((end_time - start_time)) seconds"
      restartPolicy: Never
  backoffLimit: 4
File sizes
-rw-r----- 1 root root 181176268 Jan 1 11:31 0.log
-rw-r--r-- 1 root root 183336647 Jan 1 11:20 0.log.20250101-111730.gz
-rw-r--r-- 1 root root 183323382 Jan 1 11:23 0.log.20250101-112026.gz
-rw-r--r-- 1 root root 183327676 Jan 1 11:26 0.log.20250101-112321.gz
-rw-r--r-- 1 root root 183336376 Jan 1 11:29 0.log.20250101-112616.gz
-rw-r----- 1 root root 205360966 Jan 1 11:29 0.log.20250101-112911
If a pod generates logs at gigabyte scale with minimal delay between writes, the current log file plus the compressed rotations can together far exceed the configured containerLogMaxSize * containerLogMaxFiles budget and cause disk pressure on the host.
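For context, the disk pressure signal this proposal would react to is already surfaced as the node's DiskPressure condition; a hedged sketch of how to observe it (substitute the actual node name):

# Status of the node's DiskPressure condition (True when the kubelet detects low disk).
kubectl get node <node-name> -o jsonpath='{.status.conditions[?(@.type=="DiskPressure")].status}{"\n"}'

# Related kubelet events, if any.
kubectl get events -A --field-selector reason=NodeHasDiskPressure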