Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Change eviction manager to manage one single local storage resource #50889

Merged
merged 2 commits into from
Aug 26, 2017

Conversation

NickrenREN
Copy link
Contributor

@NickrenREN NickrenREN commented Aug 18, 2017

What this PR does / why we need it:
We decided to manage one single resource name, eviction policy should be modified too.

Which issue this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close that issue when PR gets merged): part of #50818

Special notes for your reviewer:

Release note:

Change eviction manager to manage one single local ephemeral storage resource

/assign @jingxu97

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Aug 18, 2017
@k8s-github-robot k8s-github-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. release-note Denotes a PR that will be considered when it comes time to generate release notes. labels Aug 18, 2017
@NickrenREN
Copy link
Contributor Author

cc @ddysher

m.dedicatedImageFs = &hasImageFs
m.resourceToRankFunc = buildResourceToRankFunc(hasImageFs)
m.resourceToNodeReclaimFuncs = buildResourceToNodeReclaimFuncs(m.imageGC, m.containerGC, hasImageFs)
if len(m.resourceToRankFunc) == 0 || len(m.resourceToNodeReclaimFuncs) == 0 {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think you should change the code here. HasDedicatedImageFs is need to build for resourceToNodeReclaimFunc. Before my change, the eviction manager monitors both nodefs and imagefs usage and perform eviction when there is outofdisk condition. This part should not be changed.

Copy link
Contributor Author

@NickrenREN NickrenREN Aug 19, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for your review. imagefs is still managed ?

evicted = append(evicted, pod)
}
}

return evicted
}

func getPodContainersTotalStorageLimits(pod *v1.Pod) resource.Quantity {
containersTotalStorageLimits := resource.Quantity{Format: resource.BinarySI}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you can resue function PodRequestsAndLimits in helpers.go to get the pod-level resources

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, thanks

if used != nil && size.Sign() == 1 && used.Cmp(size) > 0 {
// the emptyDir usage exceeds the size limit, evict the pod
return m.evictPod(pod, v1.ResourceName("EmptyDir"), fmt.Sprintf("emptyDir usage exceeds the limit %q", size.String()))
}
}
}
containersTotalStorageLimits := getPodContainersTotalStorageLimits(pod)
if !containersTotalStorageLimits.IsZero() && containersTotalStorageLimits.Cmp(emptyDirTotalUsed) < 0 {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This pod-level check should include all containers usage and emptyDir usage, not just only total emptyDir usage right?
To make it more clear, maybe have a separate function podLimitEviction()..

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@@ -1001,26 +987,15 @@ func isHardEvictionThreshold(threshold evictionapi.Threshold) bool {
}

// buildResourceToRankFunc returns ranking functions associated with resources
func buildResourceToRankFunc(withImageFs bool) map[v1.ResourceName]rankFunc {
func buildResourceToRankFunc() map[v1.ResourceName]rankFunc {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this part should not be changed

@@ -1030,24 +1005,13 @@ func PodIsEvicted(podStatus v1.PodStatus) bool {
}

// buildResourceToNodeReclaimFuncs returns reclaim functions associated with resources.
func buildResourceToNodeReclaimFuncs(imageGC ImageGC, containerGC ContainerGC, withImageFs bool) map[v1.ResourceName]nodeReclaimFuncs {
func buildResourceToNodeReclaimFuncs(imageGC ImageGC, containerGC ContainerGC) map[v1.ResourceName]nodeReclaimFuncs {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should not be changed

} else {
storageScratchAllocatable.Sub(usage[resourceDisk])
}
storageScratchAllocatable.Sub(usage[resourceDisk])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should not be changed. This is for node allocatable feature.

@NickrenREN NickrenREN force-pushed the local-storage-eviction branch 2 times, most recently from 38ba12d to ba7fe66 Compare August 22, 2017 07:07
@k8s-github-robot k8s-github-robot added the kind/api-change Categorizes issue or PR as related to adding, removing, or otherwise changing an API label Aug 22, 2017
@NickrenREN NickrenREN force-pushed the local-storage-eviction branch from ba7fe66 to 6a4a6e0 Compare August 22, 2017 07:58
@NickrenREN
Copy link
Contributor Author

/retest

@NickrenREN
Copy link
Contributor Author

/retest

Seems not related to the code changed

@NickrenREN NickrenREN force-pushed the local-storage-eviction branch from 6a4a6e0 to 6d4d9e2 Compare August 23, 2017 05:32
@@ -742,9 +742,9 @@ func parseResourceList(m kubeletconfiginternal.ConfigurationMap) (v1.ResourceLis
if q.Sign() == -1 {
return nil, fmt.Errorf("resource quantity for %q cannot be negative: %v", k, v)
}
// storage specified in configuration map is mapped to ResourceStorageScratch API
// storage specified in configuration map is mapped to ResourceEphemeralStorage API
if v1.ResourceName(k) == v1.ResourceStorage {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should change this too
if v1.ResourceName(k) == v1.ResourceEphemeralStorage {

podVolumeUsed[volume.Name] = resource.NewQuantity(int64(*volume.UsedBytes), resource.BinarySI)
}
for _, vol := range pod.Spec.Volumes {
if vol.EmptyDir != nil && vol.EmptyDir.Medium != v1.StorageMediumMemory && vol.EmptyDir.SizeLimit.Sign() == 1 {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why you need to check vol.EmptyDir.SizeLimit.Sign() == 1 here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, actually do not need.

if overlayThreshold, ok := thresholdsMap[containerStat.Name]; ok {
if overlayThreshold.Cmp(*rootfs) < 0 {
return m.evictPod(pod, v1.ResourceName("containerOverlay"), fmt.Sprintf("container's overlay usage exceeds the limit %q", overlayThreshold.String()))
containerUsed := diskUsage(containerStat.Rootfs)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

containerStat.Rootfs could comes from imagefs, and should not be counted here, right?


podEphemeralStorageTotalUsage := &resource.Quantity{}
podEphemeralStorageTotalUsage.Add(*getTotalEmptyDirUsed(podStats, pod))
podEphemeralStorageTotalUsage.Add(*getTotalContainerEphemeralStorageUsed(podStats))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To get the total storage usage for a pod, please also check podDiskUsage() in helpers.go

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, got it

@NickrenREN NickrenREN force-pushed the local-storage-eviction branch from 6d4d9e2 to fc0eef9 Compare August 24, 2017 04:45
@k8s-github-robot k8s-github-robot removed the kind/api-change Categorizes issue or PR as related to adding, removing, or otherwise changing an API label Aug 24, 2017
@NickrenREN NickrenREN force-pushed the local-storage-eviction branch from fc0eef9 to e1ee299 Compare August 24, 2017 04:56
@NickrenREN
Copy link
Contributor Author

/retest

@NickrenREN NickrenREN force-pushed the local-storage-eviction branch from e1ee299 to c7d9c67 Compare August 24, 2017 07:25
@jingxu97 jingxu97 requested a review from dashpole August 24, 2017 17:42
@jingxu97
Copy link
Contributor

cc @dashpole

Copy link
Contributor

@dashpole dashpole left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

couple of nits. Looks good overall.

if overlayThreshold.Cmp(*rootfs) < 0 {
return m.evictPod(pod, v1.ResourceName("containerOverlay"), fmt.Sprintf("container's overlay usage exceeds the limit %q", overlayThreshold.String()))
containerUsed := &resource.Quantity{}
if *m.dedicatedImageFs {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: simplify this statement:
containerUsed := diskUsage(containerStat.Logs)
if !*m.dedicatedImageFs {
containerUsed.Add(*diskUsage(containerStat.Rootfs))
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah,thanks, done

}
if ephemeralStorageThreshold, ok := thresholdsMap[containerStat.Name]; ok {
if ephemeralStorageThreshold.Cmp(*containerUsed) < 0 {
return m.evictPod(pod, v1.ResourceName("containerEphemeralStorage"), fmt.Sprintf("container's ephemeral local storage usage exceeds the limit %q", ephemeralStorageThreshold.String()))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I dont think we need seperate resource types for pod and container ephemeral storage. I would prefer they both used: ResourceEphemeralStorage.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can differentiate via the message portion.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry for not being specific. I meant v1.RessourceEphemeralStorage, not a new resource type.


if ephemeralStorageThreshold, ok := thresholdsMap[containerStat.Name]; ok {
if ephemeralStorageThreshold.Cmp(*containerUsed) < 0 {
return m.evictPod(pod, v1.ResourceName("ResourceEphemeralStorage"), fmt.Sprintf("container's ephemeral local storage usage exceeds the limit %q", ephemeralStorageThreshold.String()))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I wasnt specific. I meant use v1.ResourceEphemeralStorage

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My bad. Modified

@NickrenREN NickrenREN force-pushed the local-storage-eviction branch from b438abc to 810008a Compare August 25, 2017 03:03
Copy link
Contributor

@dashpole dashpole left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@NickrenREN
Copy link
Contributor Author

/retest

@jingxu97
Copy link
Contributor

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Aug 25, 2017
@NickrenREN
Copy link
Contributor Author

NickrenREN commented Aug 25, 2017

/assign @smarterclayton @thockin

thanks @jingxu97

@NickrenREN
Copy link
Contributor Author

ping @smarterclayton @thockin for approval

@thockin
Copy link
Member

thockin commented Aug 25, 2017

/approve

@k8s-github-robot k8s-github-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Aug 25, 2017
glog.Errorf("eviction manager: error getting pod disk usage %v", err)
return false
}
podEphemeralStorageTotalUsage.Add(podUsage[resourceDisk])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seem like you can directly use podUsage[resourceDisk] instead of assigning to podEphemeralStorageTotalUsage? But it is also ok

@@ -746,14 +746,14 @@ func makeSignalObservations(summaryProvider stats.SummaryProvider, capacityProvi
diskUsageP := &diskUsage
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think here we should use the similar as you did in eviction manager to calculate ephemeralStorageAllocatable. That is more clear

if *m.dedicatedImageFs {

  •  fsStatsSet = []fsStatsType{fsStatsLogs, fsStatsLocalVolumeSource}
    
  • } else {
  •  fsStatsSet = []fsStatsType{fsStatsRoot, fsStatsLogs, fsStatsLocalVolumeSource}
    
  • }

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, done

@jingxu97 jingxu97 removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Aug 25, 2017
@NickrenREN NickrenREN force-pushed the local-storage-eviction branch from 810008a to 9730e3d Compare August 25, 2017 21:15
@jingxu97
Copy link
Contributor

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Aug 25, 2017
@k8s-github-robot
Copy link

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: NickrenREN, jingxu97, thockin

Associated issue: 50818

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these OWNERS Files:

You can indicate your approval by writing /approve in a comment
You can cancel your approval by writing /approve cancel in a comment

@NickrenREN
Copy link
Contributor Author

/retest

1 similar comment
@NickrenREN
Copy link
Contributor Author

/retest

@k8s-github-robot
Copy link

/test all [submit-queue is verifying that this PR is safe to merge]

@NickrenREN
Copy link
Contributor Author

/retest

@k8s-github-robot
Copy link

Automatic merge from submit-queue (batch tested with PRs 50889, 51347, 50582, 51297, 51264)

@k8s-github-robot k8s-github-robot merged commit 76c520c into kubernetes:master Aug 26, 2017
@NickrenREN NickrenREN deleted the local-storage-eviction branch August 26, 2017 06:09
podEphemeralStorageTotalUsage := &resource.Quantity{}
fsStatsSet := []fsStatsType{}
if *m.dedicatedImageFs {
fsStatsSet = []fsStatsType{fsStatsLogs, fsStatsLocalVolumeSource}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fsStatsLocalVolumeSource also contains hostPath, are we counting host path usage to Pod? e.g. if we limit Pod usage to 500M and the pod mount a hostpath which is about 1Gi, are we gonna evict the pod?

Copy link
Contributor Author

@NickrenREN NickrenREN Aug 28, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh, my bad. I did not notice that localVolumeNames contains other volume sources
cc @jingxu97

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thinking further, if hostPath something outside rootfs, e.g. /etc, then do we calculate it at all?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I think hostapath should not be counted since it is considered as persistent storage. But secrets, configumap, gitrepo, downwardapi should be counted because they are basically a wrapper of emptydir. We might need to have a separate podLocalEphemeralStorageUsage function


podEphemeralStorageTotalUsage := &resource.Quantity{}
fsStatsSet := []fsStatsType{}
if *m.dedicatedImageFs {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry if this is already discussed somewhere; i have two questions regarding dedicated image fs:

  • why do we still count fsStatsLogs if we have dedicated imagefs? IIUC, it is yet to decide how to handle container logs, it's possible that imagefs provides storage for logs?
  • are we simply ignoring eviction for imagefs? e.g. if i have imagefs setup and a container keeps writing to writable layer, it can use up all storage in imagefs?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We only manage one partition(root) now. If we do not have imagefs partition, then writable layer is satisfied by root partition, we need to calculate its usage, if we have imagefs partition, do not manage it now.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

which means this is true? if i have imagefs setup and a container keeps writing to writable layer, it can use up all storage in imagefs?

We only manage one partition(root) now. If we do not have imagefs partition, then writable layer is satisfied by root partition, we need to calculate its usage, if we have imagefs partition, do not manage it now.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ddysher yes, that is true..

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. release-note Denotes a PR that will be considered when it comes time to generate release notes. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants