
Cluster join fails due to insufficient ephemeral-storage #99305

Closed
@hvenev-vmware

Description

What happened:

kubeadm join on the new node fails intermittently. When it fails, the following warning appears in the kubelet log:

Feb 22 14:04:33 NEWHOST kubelet[7331]: W0222 14:04:33.866713    7331 predicate.go:113] Failed to admit pod etcd-NEWHOST_kube-system(13c2d55e88e08af2d946555ab67e9eab) - Unexpected error while attempting to recover from admission failure: preemption: error finding a set of pods to preempt: no set of running pods found to reclaim resources: [(res: ephemeral-storage, q: 104857600), ]
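
For reference, the quantity q: 104857600 in the warning above is a byte count. A minimal Go sketch of the conversion shows that it is exactly 100 MiB. The helper name bytesToMiB is hypothetical and is not a kubelet function:

```go
package main

import "fmt"

// bytesToMiB converts a byte count to whole mebibytes (1 MiB = 1024 * 1024 bytes).
// Hypothetical helper for illustration only; it is not part of the kubelet.
func bytesToMiB(b int64) int64 {
	return b / (1024 * 1024)
}

func main() {
	// Quantity copied from the "q: 104857600" field of the kubelet warning.
	const requested int64 = 104857600
	fmt.Printf("%d bytes = %d MiB\n", requested, bytesToMiB(requested))
	// prints "104857600 bytes = 100 MiB"
}
```

So the pod is rejected over a 100 MiB ephemeral-storage request, which is tiny compared to the free space on the node.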

What you expected to happen:

kubeadm join on the new node succeeds every time.

How to reproduce it (as minimally and precisely as possible):

Join a control plane node

Anything else we need to know?:

The node has over 40 GB of free space on its root partition.

I added some more logging to the following functions:

  • pkg/kubelet/cm/container_manager_linux.go, Start
  • pkg/kubelet/lifecycle/predicate.go, GeneralPredicates
  • pkg/kubelet/nodestatus/setters.go, MachineInfo
  • pkg/kubelet/nodestatus/setters.go, ReadyCondition
  • pkg/kubelet/preemption/preemption.go, HandleAdmissionFailure
  • pkg/scheduler/framework/plugins/noderesources/fit.go, Filter

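It appears that the ephemeral-storage capacity is detected too late. In the trace below, MachineInfo already sees ephemeral-storage 51384074240 at 14:04:33.826, but the admission check in predicate.go (14:04:33.846 to 14:04:33.866) still uses an Allocatable with EphemeralStorage:0, so the etcd pod is rejected. The reported node capacity only shows the non-zero value at 14:04:33.905, after the failure: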

Feb 22 14:04:33 NEWHOST kubelet[7331]: E0222 14:04:33.599153    7331 setters.go:284] TRACE MachineInfo start
Feb 22 14:04:33 NEWHOST kubelet[7331]: E0222 14:04:33.599202    7331 setters.go:334] TRACE MachineInfo got initialCapacity map[cpu:{i:{value:12000 scale:-3} d:{Dec:<nil>} s: Format:DecimalSI} hugepages-1Gi:{i:{value:0 scale:0} d:{Dec:<nil>} s: Format:BinarySI} hugepages-2Mi:{i:{value:0 scale:0} d:{Dec:<nil>} s: Format:BinarySI} memory:{i:{value:42191773696 scale:0} d:{Dec:<nil>} s: Format:BinarySI}]
Feb 22 14:04:33 NEWHOST kubelet[7331]: E0222 14:04:33.599333    7331 setters.go:408] TRACE MachineInfo done, Capacity: map[cpu:{i:{value:12000 scale:-3} d:{Dec:<nil>} s: Format:DecimalSI} hugepages-1Gi:{i:{value:0 scale:0} d:{Dec:<nil>} s: Format:BinarySI} hugepages-2Mi:{i:{value:0 scale:0} d:{Dec:<nil>} s: Format:BinarySI} memory:{i:{value:42191773696 scale:0} d:{Dec:<nil>} s: Format:BinarySI} pods:{i:{value:210 scale:0} d:{Dec:<nil>} s: Format:DecimalSI}]
Feb 22 14:04:33 NEWHOST kubelet[7331]: E0222 14:04:33.599357    7331 setters.go:409] TRACE MachineInfo done, Allocatable: map[cpu:{i:{value:12000 scale:-3} d:{Dec:<nil>} s: Format:DecimalSI} hugepages-1Gi:{i:{value:0 scale:0} d:{Dec:<nil>} s: Format:BinarySI} hugepages-2Mi:{i:{value:0 scale:0} d:{Dec:<nil>} s: Format:BinarySI} memory:{i:{value:42086916096 scale:0} d:{Dec:<nil>} s: Format:BinarySI} pods:{i:{value:210 scale:0} d:{Dec:<nil>} s: Format:DecimalSI}]
Feb 22 14:04:33 NEWHOST kubelet[7331]: E0222 14:04:33.666348    7331 setters.go:527] TRACE capacity cpu: {i:{value:12000 scale:-3} d:{Dec:<nil>} s: Format:DecimalSI}
Feb 22 14:04:33 NEWHOST kubelet[7331]: E0222 14:04:33.666413    7331 setters.go:527] TRACE capacity memory: {i:{value:42191773696 scale:0} d:{Dec:<nil>} s: Format:BinarySI}
Feb 22 14:04:33 NEWHOST kubelet[7331]: E0222 14:04:33.666425    7331 setters.go:527] TRACE capacity pods: {i:{value:210 scale:0} d:{Dec:<nil>} s: Format:DecimalSI}
Feb 22 14:04:33 NEWHOST kubelet[7331]: E0222 14:04:33.666435    7331 setters.go:527] TRACE capacity ephemeral-storage: {i:{value:0 scale:0} d:{Dec:<nil>} s: Format:}
Feb 22 14:04:33 NEWHOST kubelet[7331]: E0222 14:04:33.673445    7331 setters.go:284] TRACE MachineInfo start
Feb 22 14:04:33 NEWHOST kubelet[7331]: E0222 14:04:33.673490    7331 setters.go:334] TRACE MachineInfo got initialCapacity map[cpu:{i:{value:12000 scale:-3} d:{Dec:<nil>} s: Format:DecimalSI} hugepages-1Gi:{i:{value:0 scale:0} d:{Dec:<nil>} s: Format:BinarySI} hugepages-2Mi:{i:{value:0 scale:0} d:{Dec:<nil>} s: Format:BinarySI} memory:{i:{value:42191773696 scale:0} d:{Dec:<nil>} s: Format:BinarySI}]
Feb 22 14:04:33 NEWHOST kubelet[7331]: E0222 14:04:33.673540    7331 setters.go:408] TRACE MachineInfo done, Capacity: map[cpu:{i:{value:12000 scale:-3} d:{Dec:<nil>} s: Format:DecimalSI} hugepages-1Gi:{i:{value:0 scale:0} d:{Dec:<nil>} s: Format:BinarySI} hugepages-2Mi:{i:{value:0 scale:0} d:{Dec:<nil>} s: Format:BinarySI} memory:{i:{value:42191773696 scale:0} d:{Dec:<nil>} s: Format:BinarySI} pods:{i:{value:210 scale:0} d:{Dec:<nil>} s: Format:DecimalSI}]
Feb 22 14:04:33 NEWHOST kubelet[7331]: E0222 14:04:33.673563    7331 setters.go:409] TRACE MachineInfo done, Allocatable: map[cpu:{i:{value:12000 scale:-3} d:{Dec:<nil>} s: Format:DecimalSI} hugepages-1Gi:{i:{value:0 scale:0} d:{Dec:<nil>} s: Format:BinarySI} hugepages-2Mi:{i:{value:0 scale:0} d:{Dec:<nil>} s: Format:BinarySI} memory:{i:{value:42086916096 scale:0} d:{Dec:<nil>} s: Format:BinarySI} pods:{i:{value:210 scale:0} d:{Dec:<nil>} s: Format:DecimalSI}]
Feb 22 14:04:33 NEWHOST kubelet[7331]: E0222 14:04:33.692186    7331 setters.go:284] TRACE MachineInfo start
Feb 22 14:04:33 NEWHOST kubelet[7331]: E0222 14:04:33.692250    7331 setters.go:334] TRACE MachineInfo got initialCapacity map[cpu:{i:{value:12000 scale:-3} d:{Dec:<nil>} s: Format:DecimalSI} hugepages-1Gi:{i:{value:0 scale:0} d:{Dec:<nil>} s: Format:BinarySI} hugepages-2Mi:{i:{value:0 scale:0} d:{Dec:<nil>} s: Format:BinarySI} memory:{i:{value:42191773696 scale:0} d:{Dec:<nil>} s: Format:BinarySI}]
Feb 22 14:04:33 NEWHOST kubelet[7331]: E0222 14:04:33.692306    7331 setters.go:408] TRACE MachineInfo done, Capacity: map[cpu:{i:{value:12000 scale:-3} d:{Dec:<nil>} s: Format:DecimalSI} hugepages-1Gi:{i:{value:0 scale:0} d:{Dec:<nil>} s: Format:BinarySI} hugepages-2Mi:{i:{value:0 scale:0} d:{Dec:<nil>} s: Format:BinarySI} memory:{i:{value:42191773696 scale:0} d:{Dec:<nil>} s: Format:BinarySI} pods:{i:{value:210 scale:0} d:{Dec:<nil>} s: Format:DecimalSI}]
Feb 22 14:04:33 NEWHOST kubelet[7331]: E0222 14:04:33.692325    7331 setters.go:409] TRACE MachineInfo done, Allocatable: map[cpu:{i:{value:12000 scale:-3} d:{Dec:<nil>} s: Format:DecimalSI} hugepages-1Gi:{i:{value:0 scale:0} d:{Dec:<nil>} s: Format:BinarySI} hugepages-2Mi:{i:{value:0 scale:0} d:{Dec:<nil>} s: Format:BinarySI} memory:{i:{value:42086916096 scale:0} d:{Dec:<nil>} s: Format:BinarySI} pods:{i:{value:210 scale:0} d:{Dec:<nil>} s: Format:DecimalSI}]
Feb 22 14:04:33 NEWHOST kubelet[7331]: E0222 14:04:33.743289    7331 setters.go:527] TRACE capacity cpu: {i:{value:12000 scale:-3} d:{Dec:<nil>} s: Format:DecimalSI}
Feb 22 14:04:33 NEWHOST kubelet[7331]: E0222 14:04:33.743339    7331 setters.go:527] TRACE capacity memory: {i:{value:42191773696 scale:0} d:{Dec:<nil>} s: Format:BinarySI}
Feb 22 14:04:33 NEWHOST kubelet[7331]: E0222 14:04:33.743349    7331 setters.go:527] TRACE capacity pods: {i:{value:210 scale:0} d:{Dec:<nil>} s: Format:DecimalSI}
Feb 22 14:04:33 NEWHOST kubelet[7331]: E0222 14:04:33.743362    7331 setters.go:527] TRACE capacity ephemeral-storage: {i:{value:0 scale:0} d:{Dec:<nil>} s: Format:}
Feb 22 14:04:33 NEWHOST kubelet[7331]: E0222 14:04:33.752328    7331 container_manager_linux.go:613] TRACE containerManagerImpl.Start
Feb 22 14:04:33 NEWHOST kubelet[7331]: E0222 14:04:33.752381    7331 container_manager_linux.go:625] TRACE containerManagerImpl.Start, map[cpu:{i:{value:12000 scale:-3} d:{Dec:<nil>} s: Format:DecimalSI} ephemeral-storage:{i:{value:51384074240 scale:0} d:{Dec:<nil>} s: Format:BinarySI} hugepages-1Gi:{i:{value:0 scale:0} d:{Dec:<nil>} s: Format:BinarySI} hugepages-2Mi:{i:{value:0 scale:0} d:{Dec:<nil>} s: Format:BinarySI} memory:{i:{value:42191773696 scale:0} d:{Dec:<nil>} s: Format:BinarySI}]
Feb 22 14:04:33 NEWHOST kubelet[7331]: E0222 14:04:33.772730    7331 setters.go:527] TRACE capacity cpu: {i:{value:12000 scale:-3} d:{Dec:<nil>} s: Format:DecimalSI}
Feb 22 14:04:33 NEWHOST kubelet[7331]: E0222 14:04:33.772773    7331 setters.go:527] TRACE capacity memory: {i:{value:42191773696 scale:0} d:{Dec:<nil>} s: Format:BinarySI}
Feb 22 14:04:33 NEWHOST kubelet[7331]: E0222 14:04:33.772780    7331 setters.go:527] TRACE capacity pods: {i:{value:210 scale:0} d:{Dec:<nil>} s: Format:DecimalSI}
Feb 22 14:04:33 NEWHOST kubelet[7331]: E0222 14:04:33.772786    7331 setters.go:527] TRACE capacity ephemeral-storage: {i:{value:0 scale:0} d:{Dec:<nil>} s: Format:}
Feb 22 14:04:33 NEWHOST kubelet[7331]: E0222 14:04:33.826100    7331 setters.go:284] TRACE MachineInfo start
Feb 22 14:04:33 NEWHOST kubelet[7331]: E0222 14:04:33.826146    7331 setters.go:334] TRACE MachineInfo got initialCapacity map[cpu:{i:{value:12000 scale:-3} d:{Dec:<nil>} s: Format:DecimalSI} ephemeral-storage:{i:{value:51384074240 scale:0} d:{Dec:<nil>} s: Format:BinarySI} hugepages-1Gi:{i:{value:0 scale:0} d:{Dec:<nil>} s: Format:BinarySI} hugepages-2Mi:{i:{value:0 scale:0} d:{Dec:<nil>} s: Format:BinarySI} memory:{i:{value:42191773696 scale:0} d:{Dec:<nil>} s: Format:BinarySI}]
Feb 22 14:04:33 NEWHOST kubelet[7331]: E0222 14:04:33.826353    7331 setters.go:408] TRACE MachineInfo done, Capacity: map[cpu:{i:{value:12000 scale:-3} d:{Dec:<nil>} s: Format:DecimalSI} ephemeral-storage:{i:{value:51384074240 scale:0} d:{Dec:<nil>} s: Format:BinarySI} hugepages-1Gi:{i:{value:0 scale:0} d:{Dec:<nil>} s: Format:BinarySI} hugepages-2Mi:{i:{value:0 scale:0} d:{Dec:<nil>} s: Format:BinarySI} memory:{i:{value:42191773696 scale:0} d:{Dec:<nil>} s: Format:BinarySI} pods:{i:{value:210 scale:0} d:{Dec:<nil>} s: Format:DecimalSI}]
Feb 22 14:04:33 NEWHOST kubelet[7331]: E0222 14:04:33.826414    7331 setters.go:409] TRACE MachineInfo done, Allocatable: map[cpu:{i:{value:12000 scale:-3} d:{Dec:<nil>} s: Format:DecimalSI} ephemeral-storage:{i:{value:46245666740 scale:0} d:{Dec:<nil>} s: Format:BinarySI} hugepages-1Gi:{i:{value:0 scale:0} d:{Dec:<nil>} s: Format:BinarySI} hugepages-2Mi:{i:{value:0 scale:0} d:{Dec:<nil>} s: Format:BinarySI} memory:{i:{value:42086916096 scale:0} d:{Dec:<nil>} s: Format:BinarySI} pods:{i:{value:210 scale:0} d:{Dec:<nil>} s: Format:DecimalSI}]
Feb 22 14:04:33 NEWHOST kubelet[7331]: E0222 14:04:33.846712    7331 predicate.go:225] TRACE nodeInfo.Allocatable &{MilliCPU:12000 Memory:42086916096 EphemeralStorage:0 AllowedPodNumber:210 ScalarResources:map[hugepages-1Gi:0 hugepages-2Mi:0]}
Feb 22 14:04:33 NEWHOST kubelet[7331]: E0222 14:04:33.846749    7331 predicate.go:226] TRACE nodeInfo.Requested &{MilliCPU:0 Memory:0 EphemeralStorage:0 AllowedPodNumber:0 ScalarResources:map[]}
Feb 22 14:04:33 NEWHOST kubelet[7331]: E0222 14:04:33.846845    7331 predicate.go:225] TRACE nodeInfo.Allocatable &{MilliCPU:12000 Memory:42086916096 EphemeralStorage:0 AllowedPodNumber:210 ScalarResources:map[hugepages-1Gi:0 hugepages-2Mi:0]}
Feb 22 14:04:33 NEWHOST kubelet[7331]: E0222 14:04:33.846854    7331 predicate.go:226] TRACE nodeInfo.Requested &{MilliCPU:0 Memory:0 EphemeralStorage:0 AllowedPodNumber:0 ScalarResources:map[]}
Feb 22 14:04:33 NEWHOST kubelet[7331]: E0222 14:04:33.846892    7331 predicate.go:225] TRACE nodeInfo.Allocatable &{MilliCPU:12000 Memory:42086916096 EphemeralStorage:0 AllowedPodNumber:210 ScalarResources:map[hugepages-1Gi:0 hugepages-2Mi:0]}
Feb 22 14:04:33 NEWHOST kubelet[7331]: E0222 14:04:33.846940    7331 predicate.go:226] TRACE nodeInfo.Requested &{MilliCPU:0 Memory:0 EphemeralStorage:0 AllowedPodNumber:0 ScalarResources:map[]}
Feb 22 14:04:33 NEWHOST kubelet[7331]: E0222 14:04:33.847005    7331 predicate.go:225] TRACE nodeInfo.Allocatable &{MilliCPU:12000 Memory:42086916096 EphemeralStorage:0 AllowedPodNumber:210 ScalarResources:map[hugepages-1Gi:0 hugepages-2Mi:0]}
Feb 22 14:04:33 NEWHOST kubelet[7331]: E0222 14:04:33.847014    7331 predicate.go:226] TRACE nodeInfo.Requested &{MilliCPU:0 Memory:0 EphemeralStorage:0 AllowedPodNumber:0 ScalarResources:map[]}
Feb 22 14:04:33 NEWHOST kubelet[7331]: E0222 14:04:33.847072    7331 predicate.go:225] TRACE nodeInfo.Allocatable &{MilliCPU:12000 Memory:42086916096 EphemeralStorage:0 AllowedPodNumber:210 ScalarResources:map[hugepages-1Gi:0 hugepages-2Mi:0]}
Feb 22 14:04:33 NEWHOST kubelet[7331]: E0222 14:04:33.847080    7331 predicate.go:226] TRACE nodeInfo.Requested &{MilliCPU:250 Memory:0 EphemeralStorage:0 AllowedPodNumber:0 ScalarResources:map[]}
Feb 22 14:04:33 NEWHOST kubelet[7331]: E0222 14:04:33.847160    7331 predicate.go:225] TRACE nodeInfo.Allocatable &{MilliCPU:12000 Memory:42086916096 EphemeralStorage:0 AllowedPodNumber:210 ScalarResources:map[hugepages-1Gi:0 hugepages-2Mi:0]}
Feb 22 14:04:33 NEWHOST kubelet[7331]: E0222 14:04:33.847177    7331 predicate.go:226] TRACE nodeInfo.Requested &{MilliCPU:450 Memory:0 EphemeralStorage:0 AllowedPodNumber:0 ScalarResources:map[]}
Feb 22 14:04:33 NEWHOST kubelet[7331]: E0222 14:04:33.866562    7331 predicate.go:225] TRACE nodeInfo.Allocatable &{MilliCPU:12000 Memory:42086916096 EphemeralStorage:0 AllowedPodNumber:210 ScalarResources:map[hugepages-1Gi:0 hugepages-2Mi:0]}
Feb 22 14:04:33 NEWHOST kubelet[7331]: E0222 14:04:33.866598    7331 predicate.go:226] TRACE nodeInfo.Requested &{MilliCPU:550 Memory:0 EphemeralStorage:0 AllowedPodNumber:0 ScalarResources:map[]}
Feb 22 14:04:33 NEWHOST kubelet[7331]: E0222 14:04:33.866620    7331 preemption.go:67] TRACE cannot admit etcd-NEWHOST, [Node didn't have enough resource: ephemeral-storage, requested: 104857600, used: 0, capacity: 0]
Feb 22 14:04:33 NEWHOST kubelet[7331]: W0222 14:04:33.866713    7331 predicate.go:113] Failed to admit pod etcd-NEWHOST_kube-system(13c2d55e88e08af2d946555ab67e9eab) - Unexpected error while attempting to recover from admission failure: preemption: error finding a set of pods to preempt: no set of running pods found to reclaim resources: [(res: ephemeral-storage, q: 104857600), ]
Feb 22 14:04:33 NEWHOST kubelet[7331]: E0222 14:04:33.905188    7331 setters.go:527] TRACE capacity cpu: {i:{value:12000 scale:-3} d:{Dec:<nil>} s: Format:DecimalSI}
Feb 22 14:04:33 NEWHOST kubelet[7331]: E0222 14:04:33.905231    7331 setters.go:527] TRACE capacity memory: {i:{value:42191773696 scale:0} d:{Dec:<nil>} s: Format:BinarySI}
Feb 22 14:04:33 NEWHOST kubelet[7331]: E0222 14:04:33.905239    7331 setters.go:527] TRACE capacity pods: {i:{value:210 scale:0} d:{Dec:<nil>} s: Format:DecimalSI}
Feb 22 14:04:33 NEWHOST kubelet[7331]: E0222 14:04:33.905248    7331 setters.go:527] TRACE capacity ephemeral-storage: {i:{value:51384074240 scale:0} d:{Dec:<nil>} s: Format:BinarySI}
Feb 22 14:04:43 NEWHOST kubelet[7331]: E0222 14:04:43.789009    7331 setters.go:284] TRACE MachineInfo start
Feb 22 14:04:43 NEWHOST kubelet[7331]: E0222 14:04:43.789081    7331 setters.go:334] TRACE MachineInfo setting capacity to map[cpu:{i:{value:12000 scale:-3} d:{Dec:<nil>} s: Format:DecimalSI} ephemeral-storage:{i:{value:51384074240 scale:0} d:{Dec:<nil>} s: Format:BinarySI} hugepages-1Gi:{i:{value:0 scale:0} d:{Dec:<nil>} s: Format:BinarySI} hugepages-2Mi:{i:{value:0 scale:0} d:{Dec:<nil>} s: Format:BinarySI} memory:{i:{value:42191773696 scale:0} d:{Dec:<nil>} s: Format:BinarySI}]
Feb 22 14:04:43 NEWHOST kubelet[7331]: E0222 14:04:43.789183    7331 setters.go:408] TRACE MachineInfo done, Capacity: map[cpu:{i:{value:12000 scale:-3} d:{Dec:<nil>} s: Format:DecimalSI} ephemeral-storage:{i:{value:51384074240 scale:0} d:{Dec:<nil>} s: Format:BinarySI} hugepages-1Gi:{i:{value:0 scale:0} d:{Dec:<nil>} s: Format:BinarySI} hugepages-2Mi:{i:{value:0 scale:0} d:{Dec:<nil>} s: Format:BinarySI} memory:{i:{value:42191773696 scale:0} d:{Dec:<nil>} s: Format:BinarySI} pods:{i:{value:210 scale:0} d:{Dec:<nil>} s: Format:DecimalSI}]
Feb 22 14:04:43 NEWHOST kubelet[7331]: E0222 14:04:43.789234    7331 setters.go:409] TRACE MachineInfo done, Allocatable: map[cpu:{i:{value:12000 scale:-3} d:{Dec:<nil>} s: Format:DecimalSI} ephemeral-storage:{i:{value:46245666740 scale:0} d:{Dec:<nil>} s: Format:BinarySI} hugepages-1Gi:{i:{value:0 scale:0} d:{Dec:<nil>} s: Format:BinarySI} hugepages-2Mi:{i:{value:0 scale:0} d:{Dec:<nil>} s: Format:BinarySI} memory:{i:{value:42086916096 scale:0} d:{Dec:<nil>} s: Format:BinarySI} pods:{i:{value:210 scale:0} d:{Dec:<nil>} s: Format:DecimalSI}]
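
The failure in the trace above can be reproduced with a simplified model of the fit check. This is only a sketch: the resources type and the insufficient function are hypothetical stand-ins, not the real kubelet or scheduler code, and the pod's CPU and memory requests are left at zero because they do not appear in the log. The numbers are copied from the TRACE lines.

```go
package main

import "fmt"

// resources is a hypothetical, simplified stand-in for the Allocatable and
// Requested structs printed in the predicate.go TRACE lines.
type resources struct {
	MilliCPU         int64
	Memory           int64
	EphemeralStorage int64
}

// insufficient returns the names of the resources for which the pod's request
// exceeds what is still free (allocatable minus already requested).
func insufficient(allocatable, inUse, pod resources) []string {
	var short []string
	if pod.MilliCPU > allocatable.MilliCPU-inUse.MilliCPU {
		short = append(short, "cpu")
	}
	if pod.Memory > allocatable.Memory-inUse.Memory {
		short = append(short, "memory")
	}
	if pod.EphemeralStorage > allocatable.EphemeralStorage-inUse.EphemeralStorage {
		short = append(short, "ephemeral-storage")
	}
	return short
}

func main() {
	// Values taken from the log; the pod's CPU and memory requests are not
	// shown there, so only ephemeral-storage is set (an assumption).
	inUse := resources{MilliCPU: 550}
	pod := resources{EphemeralStorage: 104857600}

	// Snapshot seen by the admission check at 14:04:33.866.
	stale := resources{MilliCPU: 12000, Memory: 42086916096, EphemeralStorage: 0}

	// Same snapshot with the Allocatable value MachineInfo logged at 14:04:33.826.
	fresh := stale
	fresh.EphemeralStorage = 46245666740

	fmt.Println("stale snapshot:", insufficient(stale, inUse, pod)) // [ephemeral-storage]
	fmt.Println("fresh snapshot:", insufficient(fresh, inUse, pod)) // []
}
```

With the stale snapshot any non-zero ephemeral-storage request fails, and preemption cannot help because no running pod holds ephemeral-storage to reclaim. With the value MachineInfo had already computed, the same request fits.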

/sig node storage

Environment:

  • Kubernetes version (use kubectl version): 1.20.4, commit e87da0b
  • Kernel (e.g. uname -a): 4.9.252-1.ph2

Metadata

Assignees

No one assigned

    Labels

      • kind/bug: Categorizes issue or PR as related to a bug.
      • lifecycle/rotten: Denotes an issue or PR that has aged beyond stale and will be auto-closed.
      • sig/cluster-lifecycle: Categorizes an issue or PR as relevant to SIG Cluster Lifecycle.
      • sig/node: Categorizes an issue or PR as relevant to SIG Node.
      • sig/storage: Categorizes an issue or PR as relevant to SIG Storage.
      • triage/accepted: Indicates an issue or PR is ready to be actively worked on.

    Projects

    • Status

      Done
