fix leaking memory backed volumes of terminated pods #36779
Conversation
```go
// Iterate through all pods and add to desired state of world if they don't
// exist but should
func (dswp *desiredStateOfWorldPopulator) findAndAddNewPods() {
	for _, pod := range dswp.podManager.GetPods() {
		if isPodTerminated(pod) {
			// Do not (re)add volumes for terminated pods
```
Unless "non-memory backed volume"?
I thought about this, but I couldn't think of a good reason to add a volume back to a node for a pod that is terminated. The only reason I'm filtering the ones we remove is to not make waves for the release. In general, I can't think of a reason we would want to leave volumes attached to a node for terminated pods except for debugging.
Neither can I, but in the interest of minimizing potentially disruptive changes, it makes sense to limit the scope of the change as much as possible. Thoughts?
@saad-ali the change required to check that would actually make this change more invasive, because I need to get the volume. In `findAndRemoveDeletedPods()`, that is already available via `volumeToMount`. In `findAndAddNewPods()`, the logic is inverted: for each pod, process its volumes. I'm not sure it is safe to call `dswp.desiredStateOfWorld.GetVolumesToMount()` in `findAndAddNewPods()`.
I can extend my e2e tests to include a check that ensures non-memory-backed volumes are untouched by this change?
Ack. That's fine. I'm a little nervous about it, but let's just get it in and give it time to bake and keep an eye out for strange volume behaviors.
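For context on the scope being debated here, a minimal sketch of what the memory-backed check could look like, covering the sources this PR removes (emptyDir with medium Memory, secrets, configmaps); the package and helper name are assumptions for illustration, not the PR's actual code:

```go
package volumeutil // illustrative package, not from the PR

import "k8s.io/kubernetes/pkg/api"

// isMemoryBackedVolume sketches the check under discussion: the volume
// sources the kubelet materializes on tmpfs and that this PR removes for
// terminated pods. The helper name is assumed.
func isMemoryBackedVolume(v *api.Volume) bool {
	switch {
	case v.EmptyDir != nil:
		// Only emptyDir with medium Memory lives on tmpfs.
		return v.EmptyDir.Medium == api.StorageMediumMemory
	case v.Secret != nil, v.ConfigMap != nil:
		// Secret and configmap volumes are always tmpfs-backed on the node.
		return true
	default:
		return false
	}
}
```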
@yujuhong Can you also take a look at this PR?
Is it possible to write a node E2E test that verifies this behavior?
For example, can we do a test like the following:
See example of something similar here:
@derekwaynecarr I'm looking into it
Change LGTM. Like @derekwaynecarr suggested, an e2e test would be nice.
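The inline example above is elided, but roughly, such a node test would run a pod with a memory-backed emptyDir to completion and then assert that the volume directory disappears from the kubelet pods dir. A sketch of the verification step only, as a Ginkgo/Gomega fragment (assumes the e2e framework's dot-imports, the `filepath`/`os`/`time` packages, and a `pod` already run to completion; the volume name is assumed):

```go
// Fragment only: the kubelet lays out emptyDir volumes under
// /var/lib/kubelet/pods/<pod-UID>/volumes/kubernetes.io~empty-dir/<name>.
volumePath := filepath.Join("/var/lib/kubelet/pods", string(pod.UID),
	"volumes/kubernetes.io~empty-dir", "memory-volume") // name assumed
Eventually(func() bool {
	_, err := os.Stat(volumePath)
	return os.IsNotExist(err) // true once the kubelet tears the volume down
}, 2*time.Minute, 5*time.Second).Should(BeTrue())
```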
Force-pushed eaa1be0 to 010a826
@derekwaynecarr @yujuhong updated PR with e2e node test
Jenkins GCE Node e2e failed for commit 010a826c6a0b25dbcc0913357acba92fdeb1e2c3. Full PR test history.
Jenkins verification failed for commit 010a826c6a0b25dbcc0913357acba92fdeb1e2c3. Full PR test history.
Force-pushed 010a826 to 3dc3de1
```go
{
	Name: "kubelet-pods",
	VolumeSource: api.VolumeSource{
		HostPath: &api.HostPathVolumeSource{Path: "/var/lib/kubelet/pods"},
```
Can you put a `// TODO` to pull this value from the test context for the node (it needs to be added)?
one nit for a todo, otherwise this LGTM. @saad-ali - can you take a pass and mark for 1.5 milestone if you agree?
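The requested nit would presumably look something like this in the test fixture (illustrative placement, not the actual commit):

```go
VolumeSource: api.VolumeSource{
	// TODO: pull this path from the node test context instead of
	// hard-coding it (the context field needs to be added first).
	HostPath: &api.HostPathVolumeSource{Path: "/var/lib/kubelet/pods"},
},
```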
Force-pushed 3dc3de1 to b80bea4
One minor comment about potentially minimizing the change further
/lgtm
Automatic merge from submit-queue
Automatic merge from submit-queue (batch tested with PRs 39493, 39496)

kubelet: fix nil deref in volume type check

PR #36779, which addressed memory exhaustion from a build-up of terminated pods with memory-backed volumes on the node, introduced this. For the `VolumeSpec`, either the `Volume` or the `PersistentVolume` field is set, never both. This results in a nil deref on PVs. Since PVs are inherently not memory-backed, only local/ephemeral volumes need to be considered. This needs to go into 1.5 as well.

Fixes #39480

@saad-ali @derekwaynecarr @grosskur @gnufied

```release-note
fixes nil dereference when doing a volume type check on persistent volumes
```
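The guard the fix needs can be sketched as follows: a `volume.Spec` carries either a `Volume` or a `PersistentVolume` pointer, so the check must bail out before dereferencing `spec.Volume`. The helper name is assumed and reuses the earlier sketch; this is the shape of the fix, not its actual diff:

```go
package volumeutil // illustrative, not the actual fix

import "k8s.io/kubernetes/pkg/volume"

// isMemoryBackedSpec guards against the nil deref described above: a
// volume.Spec is populated with either Volume or PersistentVolume, never
// both, and PVs are by nature not memory-backed.
func isMemoryBackedSpec(spec *volume.Spec) bool {
	if spec.Volume == nil {
		return false // PersistentVolume-backed; nothing tmpfs here
	}
	return isMemoryBackedVolume(spec.Volume) // helper from the earlier sketch
}
```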
Automatic merge from submit-queue (batch tested with PRs 37228, 40146, 40075, 38789, 40189)

kubelet: storage: teardown terminated pod volumes

This is a continuation of the work done in #36779. There really is no reason to keep volumes for terminated pods attached on the node. This PR extends the removal of volumes on the node from memory-backed volumes (the current policy) to all volumes. @pmorie raised a concern about the impact on debugging volume-related issues if terminated pod volumes are removed. To address this, the PR adds a `--keep-terminated-pod-volumes` flag to the kubelet and sets it for `hack/local-up-cluster.sh`. For consideration in 1.6.

Fixes #35406

@derekwaynecarr @vishh @dashpole

```release-note
kubelet tears down pod volumes on pod termination rather than pod deletion
```
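Shape-wise, the follow-up's policy plus the new flag might gate teardown like this; all names here are assumptions about the change, not the actual diff:

```go
package teardownsketch // illustrative only

import "k8s.io/kubernetes/pkg/api"

// volumePolicy stands in for kubelet config; the field mirrors the
// proposed --keep-terminated-pod-volumes flag.
type volumePolicy struct {
	keepTerminatedPodVolumes bool
}

// isPodTerminated is simplified here: a pod in a final phase. The real
// check also considers deleted pods with no running containers.
func isPodTerminated(pod *api.Pod) bool {
	return pod.Status.Phase == api.PodSucceeded ||
		pod.Status.Phase == api.PodFailed
}

// shouldTearDownVolumes sketches the 1.6 policy: tear down all volumes of
// terminated pods unless the operator opted to keep them for debugging.
func (p *volumePolicy) shouldTearDownVolumes(pod *api.Pod) bool {
	return isPodTerminated(pod) && !p.keepTerminatedPodVolumes
}
```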
@sjenning Should this change also delete the emptyDir-Memory volume for OOMKilled terminated containers? I still get an OOMKilled loop when my cgroup memory limit is reached because of large files written to the emptyDir, with OpenShift 3.6.
Currently, we allow volumes to remain mounted on the node even though the pod is terminated. This creates a vector for a malicious user to exhaust memory on the node by creating memory-backed volumes containing large files.

This PR removes memory-backed volumes (emptyDir w/ medium Memory, secrets, configmaps) of terminated pods from the node.
@saad-ali @derekwaynecarr