Change eviction manager to manage one single local storage resource #50889
Conversation
cc @ddysher
	m.dedicatedImageFs = &hasImageFs
	m.resourceToRankFunc = buildResourceToRankFunc(hasImageFs)
	m.resourceToNodeReclaimFuncs = buildResourceToNodeReclaimFuncs(m.imageGC, m.containerGC, hasImageFs)
	if len(m.resourceToRankFunc) == 0 || len(m.resourceToNodeReclaimFuncs) == 0 {
I don't think you should change the code here. HasDedicatedImageFs is needed to build resourceToNodeReclaimFuncs. Before my change, the eviction manager monitored both nodefs and imagefs usage and performed eviction when there was an out-of-disk condition. This part should not be changed.
Thanks for your review. Is imagefs still managed?
			evicted = append(evicted, pod)
		}
	}

	return evicted
}

func getPodContainersTotalStorageLimits(pod *v1.Pod) resource.Quantity {
	containersTotalStorageLimits := resource.Quantity{Format: resource.BinarySI}
I think you can reuse the function PodRequestsAndLimits in helpers.go to get the pod-level resources.
ok, thanks
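As a rough illustration of the suggestion above (not the actual helpers.go code; the function name and import paths here are assumptions), the pod-level limit can be obtained by summing each container's ephemeral-storage limit, which is the kind of aggregation PodRequestsAndLimits already performs for all resources:

	package eviction

	import (
		v1 "k8s.io/api/core/v1"
		"k8s.io/apimachinery/pkg/api/resource"
	)

	// podEphemeralStorageLimit (hypothetical name) sums every container's
	// ephemeral-storage limit into a single pod-level quantity.
	func podEphemeralStorageLimit(pod *v1.Pod) resource.Quantity {
		total := resource.Quantity{Format: resource.BinarySI}
		for _, c := range pod.Spec.Containers {
			if limit, ok := c.Resources.Limits[v1.ResourceEphemeralStorage]; ok {
				total.Add(limit)
			}
		}
		return total
	}

A caller would compare the returned quantity against the pod's measured usage before deciding to evict.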
			if used != nil && size.Sign() == 1 && used.Cmp(size) > 0 {
				// the emptyDir usage exceeds the size limit, evict the pod
				return m.evictPod(pod, v1.ResourceName("EmptyDir"), fmt.Sprintf("emptyDir usage exceeds the limit %q", size.String()))
			}
		}
	}
	containersTotalStorageLimits := getPodContainersTotalStorageLimits(pod)
	if !containersTotalStorageLimits.IsZero() && containersTotalStorageLimits.Cmp(emptyDirTotalUsed) < 0 {
This pod-level check should include all containers' usage plus emptyDir usage, not just the total emptyDir usage, right?
To make it clearer, maybe have a separate function podLimitEviction().
done
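A minimal, self-contained sketch of the pod-level check being discussed (hypothetical function name; the actual change lives inside the eviction manager and uses its own helpers):

	package eviction

	import "k8s.io/apimachinery/pkg/api/resource"

	// exceedsPodStorageLimit (hypothetical) reports whether the combined usage
	// of all containers plus all emptyDir volumes exceeds the pod-level limit.
	// A zero limit means "no limit set", so it never triggers eviction.
	func exceedsPodStorageLimit(containersUsed, emptyDirUsed, podLimit resource.Quantity) bool {
		totalUsed := resource.Quantity{Format: resource.BinarySI}
		totalUsed.Add(containersUsed)
		totalUsed.Add(emptyDirUsed)
		return !podLimit.IsZero() && podLimit.Cmp(totalUsed) < 0
	}

When this returns true, the eviction manager would call m.evictPod with an explanatory message, as in the surrounding hunks.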
pkg/kubelet/eviction/helpers.go
Outdated
@@ -1001,26 +987,15 @@ func isHardEvictionThreshold(threshold evictionapi.Threshold) bool {
 }

 // buildResourceToRankFunc returns ranking functions associated with resources
-func buildResourceToRankFunc(withImageFs bool) map[v1.ResourceName]rankFunc {
+func buildResourceToRankFunc() map[v1.ResourceName]rankFunc {
this part should not be changed
pkg/kubelet/eviction/helpers.go
Outdated
@@ -1030,24 +1005,13 @@ func PodIsEvicted(podStatus v1.PodStatus) bool {
 }

 // buildResourceToNodeReclaimFuncs returns reclaim functions associated with resources.
-func buildResourceToNodeReclaimFuncs(imageGC ImageGC, containerGC ContainerGC, withImageFs bool) map[v1.ResourceName]nodeReclaimFuncs {
+func buildResourceToNodeReclaimFuncs(imageGC ImageGC, containerGC ContainerGC) map[v1.ResourceName]nodeReclaimFuncs {
should not be changed
pkg/kubelet/eviction/helpers.go
Outdated
-	} else {
-		storageScratchAllocatable.Sub(usage[resourceDisk])
-	}
+	storageScratchAllocatable.Sub(usage[resourceDisk])
Should not be changed. This is for the node allocatable feature.
Force-pushed from 38ba12d to ba7fe66
Force-pushed from ba7fe66 to 6a4a6e0
/retest
/retest Seems not related to the code changed
Force-pushed from 6a4a6e0 to 6d4d9e2
cmd/kubelet/app/server.go
Outdated
@@ -742,9 +742,9 @@ func parseResourceList(m kubeletconfiginternal.ConfigurationMap) (v1.ResourceLis
 		if q.Sign() == -1 {
 			return nil, fmt.Errorf("resource quantity for %q cannot be negative: %v", k, v)
 		}
-		// storage specified in configuration map is mapped to ResourceStorageScratch API
+		// storage specified in configuration map is mapped to ResourceEphemeralStorage API
 		if v1.ResourceName(k) == v1.ResourceStorage {
I think we should change this too:

	if v1.ResourceName(k) == v1.ResourceEphemeralStorage {
pkg/kubelet/eviction/helpers.go
Outdated
		podVolumeUsed[volume.Name] = resource.NewQuantity(int64(*volume.UsedBytes), resource.BinarySI)
	}
	for _, vol := range pod.Spec.Volumes {
		if vol.EmptyDir != nil && vol.EmptyDir.Medium != v1.StorageMediumMemory && vol.EmptyDir.SizeLimit.Sign() == 1 {
Why do you need to check vol.EmptyDir.SizeLimit.Sign() == 1 here?
Yeah, actually we do not need it.
-		if overlayThreshold, ok := thresholdsMap[containerStat.Name]; ok {
-			if overlayThreshold.Cmp(*rootfs) < 0 {
-				return m.evictPod(pod, v1.ResourceName("containerOverlay"), fmt.Sprintf("container's overlay usage exceeds the limit %q", overlayThreshold.String()))
+		containerUsed := diskUsage(containerStat.Rootfs)
containerStat.Rootfs could come from imagefs, and should not be counted here, right?
	podEphemeralStorageTotalUsage := &resource.Quantity{}
	podEphemeralStorageTotalUsage.Add(*getTotalEmptyDirUsed(podStats, pod))
	podEphemeralStorageTotalUsage.Add(*getTotalContainerEphemeralStorageUsed(podStats))
To get the total storage usage for a pod, please also check podDiskUsage() in helpers.go
ok, got it
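For illustration, a rough sketch of that aggregation over the stats summary (hypothetical function name and import paths; the actual helper is podDiskUsage() in helpers.go and its signature may differ): sum each container's rootfs and log usage plus the pod's volume usage.

	package eviction

	import (
		"k8s.io/apimachinery/pkg/api/resource"
		statsapi "k8s.io/kubernetes/pkg/kubelet/apis/stats/v1alpha1"
	)

	// totalPodLocalStorageUsed (hypothetical) adds up container writable-layer
	// (rootfs) usage, container log usage, and volume usage for a single pod.
	func totalPodLocalStorageUsed(podStats statsapi.PodStats) *resource.Quantity {
		total := resource.NewQuantity(0, resource.BinarySI)
		for _, container := range podStats.Containers {
			if container.Rootfs != nil && container.Rootfs.UsedBytes != nil {
				total.Add(*resource.NewQuantity(int64(*container.Rootfs.UsedBytes), resource.BinarySI))
			}
			if container.Logs != nil && container.Logs.UsedBytes != nil {
				total.Add(*resource.NewQuantity(int64(*container.Logs.UsedBytes), resource.BinarySI))
			}
		}
		for _, volume := range podStats.VolumeStats {
			if volume.UsedBytes != nil {
				total.Add(*resource.NewQuantity(int64(*volume.UsedBytes), resource.BinarySI))
			}
		}
		return total
	}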
Force-pushed from 6d4d9e2 to fc0eef9
Force-pushed from fc0eef9 to e1ee299
/retest
Force-pushed from e1ee299 to c7d9c67
cc @dashpole
couple of nits. Looks good overall.
-			if overlayThreshold.Cmp(*rootfs) < 0 {
-				return m.evictPod(pod, v1.ResourceName("containerOverlay"), fmt.Sprintf("container's overlay usage exceeds the limit %q", overlayThreshold.String()))
+			containerUsed := &resource.Quantity{}
+			if *m.dedicatedImageFs {
nit: simplify this statement:

	containerUsed := diskUsage(containerStat.Logs)
	if !*m.dedicatedImageFs {
		containerUsed.Add(*diskUsage(containerStat.Rootfs))
	}
Yeah, thanks, done.
			}
			if ephemeralStorageThreshold, ok := thresholdsMap[containerStat.Name]; ok {
				if ephemeralStorageThreshold.Cmp(*containerUsed) < 0 {
					return m.evictPod(pod, v1.ResourceName("containerEphemeralStorage"), fmt.Sprintf("container's ephemeral local storage usage exceeds the limit %q", ephemeralStorageThreshold.String()))
I don't think we need separate resource types for pod and container ephemeral storage. I would prefer they both used ResourceEphemeralStorage.
We can differentiate via the message portion.
Done
Sorry for not being specific. I meant v1.ResourceEphemeralStorage, not a new resource type.
			if ephemeralStorageThreshold, ok := thresholdsMap[containerStat.Name]; ok {
				if ephemeralStorageThreshold.Cmp(*containerUsed) < 0 {
					return m.evictPod(pod, v1.ResourceName("ResourceEphemeralStorage"), fmt.Sprintf("container's ephemeral local storage usage exceeds the limit %q", ephemeralStorageThreshold.String()))
Sorry, I wasn't specific. I meant use v1.ResourceEphemeralStorage.
My bad. Modified
Force-pushed from b438abc to 810008a
LGTM
/retest
/lgtm
/assign @smarterclayton @thockin thanks @jingxu97
ping @smarterclayton @thockin for approval
/approve
		glog.Errorf("eviction manager: error getting pod disk usage %v", err)
		return false
	}
	podEphemeralStorageTotalUsage.Add(podUsage[resourceDisk])
Seems like you can directly use podUsage[resourceDisk] instead of assigning it to podEphemeralStorageTotalUsage? But it is also OK.
pkg/kubelet/eviction/helpers.go
Outdated
@@ -746,14 +746,14 @@ func makeSignalObservations(summaryProvider stats.SummaryProvider, capacityProvi
	diskUsageP := &diskUsage
I think here we should use the same approach as you did in the eviction manager to calculate ephemeralStorageAllocatable. That is clearer:
	if *m.dedicatedImageFs {
		fsStatsSet = []fsStatsType{fsStatsLogs, fsStatsLocalVolumeSource}
	} else {
		fsStatsSet = []fsStatsType{fsStatsRoot, fsStatsLogs, fsStatsLocalVolumeSource}
	}
OK, done
Force-pushed from 810008a to 9730e3d
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: NickrenREN, jingxu97, thockin
Associated issue: 50818
The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these OWNERS files:
You can indicate your approval by writing
/retest
1 similar comment
/retest
/test all
[submit-queue is verifying that this PR is safe to merge]
/retest
Automatic merge from submit-queue (batch tested with PRs 50889, 51347, 50582, 51297, 51264)
	podEphemeralStorageTotalUsage := &resource.Quantity{}
	fsStatsSet := []fsStatsType{}
	if *m.dedicatedImageFs {
		fsStatsSet = []fsStatsType{fsStatsLogs, fsStatsLocalVolumeSource}
fsStatsLocalVolumeSource also contains hostPath; are we counting hostPath usage against the pod? E.g. if we limit pod usage to 500M and the pod mounts a hostPath which is about 1Gi, are we going to evict the pod?
Oh, my bad. I did not notice that localVolumeNames contains other volume sources.
cc @jingxu97
Thinking further, if hostPath is something outside rootfs, e.g. /etc, then do we calculate it at all?
Yes, I think hostPath should not be counted since it is considered persistent storage. But secret, configMap, gitRepo, and downwardAPI should be counted because they are basically wrappers around emptyDir. We might need a separate podLocalEphemeralStorageUsage function.
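A rough sketch of what such a function could look like (hypothetical names and import paths; which volume types count as local ephemeral storage is exactly the assumption under discussion): count emptyDir and the emptyDir-like wrapper volumes, and skip hostPath since it is treated as persistent storage.

	package eviction

	import (
		v1 "k8s.io/api/core/v1"
		"k8s.io/apimachinery/pkg/api/resource"
		statsapi "k8s.io/kubernetes/pkg/kubelet/apis/stats/v1alpha1"
	)

	// isLocalEphemeralVolume reports whether a volume's usage should count
	// against the pod's local ephemeral storage; hostPath is excluded because
	// it is considered persistent storage.
	func isLocalEphemeralVolume(vol v1.Volume) bool {
		return (vol.EmptyDir != nil && vol.EmptyDir.Medium != v1.StorageMediumMemory) ||
			vol.Secret != nil || vol.ConfigMap != nil || vol.GitRepo != nil || vol.DownwardAPI != nil
	}

	// podLocalEphemeralStorageUsage (hypothetical) sums the usage of only
	// those volumes backed by local ephemeral storage.
	func podLocalEphemeralStorageUsage(podStats statsapi.PodStats, pod *v1.Pod) *resource.Quantity {
		ephemeralVolumes := map[string]bool{}
		for _, vol := range pod.Spec.Volumes {
			if isLocalEphemeralVolume(vol) {
				ephemeralVolumes[vol.Name] = true
			}
		}
		total := resource.NewQuantity(0, resource.BinarySI)
		for _, volStats := range podStats.VolumeStats {
			if ephemeralVolumes[volStats.Name] && volStats.UsedBytes != nil {
				total.Add(*resource.NewQuantity(int64(*volStats.UsedBytes), resource.BinarySI))
			}
		}
		return total
	}

With a filter like this, a hostPath mount larger than the pod's limit would not by itself trigger eviction.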
	podEphemeralStorageTotalUsage := &resource.Quantity{}
	fsStatsSet := []fsStatsType{}
	if *m.dedicatedImageFs {
Sorry if this is already discussed somewhere; I have two questions regarding dedicated imagefs:
- Why do we still count fsStatsLogs if we have a dedicated imagefs? IIUC, it is yet to be decided how to handle container logs; it's possible that imagefs provides storage for logs?
- Are we simply ignoring eviction for imagefs? E.g. if I have imagefs set up and a container keeps writing to its writable layer, it can use up all storage in imagefs?
We only manage one partition (root) now. If we do not have an imagefs partition, the writable layer is satisfied by the root partition and we need to calculate its usage; if we do have an imagefs partition, we do not manage it for now.
Which means this is true? If I have imagefs set up and a container keeps writing to its writable layer, it can use up all storage in imagefs?
@ddysher Yes, that is true.
What this PR does / why we need it:
We decided to manage one single resource name, so the eviction policy should be modified too.

Which issue this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close that issue when PR gets merged): part of #50818

Special notes for your reviewer:

Release note:

/assign @jingxu97