
Make kubelet never delete files on mounted filesystems #37698

Merged 1 commit into kubernetes:master on Mar 24, 2017

Conversation

@jsafrane (Member)

With bug #27653, kubelet could remove mounted volumes and delete user data.
The bug itself is fixed, however our trust in kubelet is significantly lower.
Let's add an extra version of RemoveAll that does not cross mount boundary
(rm -rf --one-file-system).

It calls lstat(path) three times for each removed directory - once in
RemoveAllOneFilesystem and twice in IsLikelyNotMountPoint, however this way
it's platform independent and the directory that is being removed by kubelet
should be almost empty.

@jsafrane jsafrane added sig/storage Categorizes an issue or PR as relevant to SIG Storage. do-not-merge DEPRECATED. Indicates that a PR should not merge. Label can only be manually applied/removed. release-note-none Denotes a PR that doesn't merit a release note. labels Nov 30, 2016
@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Nov 30, 2016
@k8s-oncall

This change is Reviewable

@jsafrane (Member Author)

Marked as WIP for discussion; it also needs a test (mounting in tests could be difficult, though).

@saad-ali, @jingxu97, would you consider this patch useful as a safe-guard in kubelet? Will it break anything?

@kubernetes/rh-storage PTAL. Tests are welcome.

@jsafrane jsafrane force-pushed the remove-all-filesystems branch from ba0fa89 to 7444d09 Compare November 30, 2016 16:49
pkg/util/util.go Outdated
// files from other filesystems, like 'rm -rf --one-file-system'.
// It is copied from the RemoveAll() sources, with IsLikelyNotMountPoint checks added.
func RemoveAllOneFilesystem(mounter mount.Interface, path string) error {
// Simple case: if Remove works, we're done.
Contributor:

Check for a mount point here; os.Remove can potentially remove a directory too:
https://golang.org/pkg/os/#Remove

Contributor:

os.Remove() will only remove an empty directory, so it shouldn't work if this is a mount point or if it has files in it.

Member Author:

If the directory was empty and a mount point at the same time, then we do not remove any user data, right?

Contributor:

The API contract guarantees nothing about directory emptiness:

func Remove(name string) error
Remove removes the named file or directory. If there is an error, it will be of type *PathError.

@k8s-github-robot k8s-github-robot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label Nov 30, 2016
@jsafrane jsafrane assigned jingxu97 and unassigned dchen1107 Nov 30, 2016
pkg/util/util.go Outdated
// Remove contents & return first error.
err = nil
for {
names, err1 := fd.Readdirnames(100)
Contributor:

why 100, not -1?

Contributor:

@wongma7 that's a question for os.RemoveAll

Contributor:

oh, it's reading 100 at a time in the loop.

@jsafrane jsafrane force-pushed the remove-all-filesystems branch 3 times, most recently from 34cce0f to 88e47e3 Compare December 1, 2016 09:11
@jsafrane jsafrane changed the title WIP: Make kubelet never delete files on mounted filesystems Make kubelet never delete files on mounted filesystems Dec 1, 2016
@jsafrane jsafrane removed the do-not-merge DEPRECATED. Indicates that a PR should not merge. Label can only be manually applied/removed. label Dec 1, 2016
@jsafrane (Member Author)

jsafrane commented Dec 1, 2016

Unit test added

@jsafrane jsafrane force-pushed the remove-all-filesystems branch from 88e47e3 to 7573c9c Compare December 1, 2016 12:23
@k8s-github-robot k8s-github-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Dec 1, 2016
@jsafrane jsafrane force-pushed the remove-all-filesystems branch from 7573c9c to 5ac0bea Compare December 1, 2016 13:12
@jsafrane (Member Author)

jsafrane commented Dec 1, 2016

Tested manually with Kubernetes 1.3, where #27653 is still reproducible: the patched kubelet does not remove anything in a mounted secret (tmpfs) or an AWS volume, and shows in the log:

Failed to remove orphaned pod "869411ac-b7ce-11e6-90d8-0e2ed4bd5858" dir; err: cannot delete directory /var/lib/origin/openshift.local.volumes/pods/869411ac-b7ce-11e6-90d8-0e2ed4bd5858/volumes/kubernetes.io~secret/default-token-9rqfl: it is a mount point

The secret and my AWS volume are still attached and mounted, and they're never unmounted (which is fixed by #27653; this PR is just a safety check).

pkg/util/util.go Outdated
}

// Close directory, because windows won't remove opened directory.
fd.Close()
Member:

might be worth moving to a defer?

Member Author:

that's a question for os.RemoveAll

@k8s-github-robot k8s-github-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jan 11, 2017
@jsafrane jsafrane force-pushed the remove-all-filesystems branch from 5ac0bea to 3647aa9 Compare January 12, 2017 09:45
@k8s-github-robot k8s-github-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jan 12, 2017
@jsafrane jsafrane force-pushed the remove-all-filesystems branch from 3647aa9 to 73d3e2f Compare January 12, 2017 12:56
@k8s-github-robot k8s-github-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jan 30, 2017
@jingxu97 (Contributor)

@jsafrane The main concern with this PR is the added os.Stat() calls while removing directories. In the current design, cleanupOrphanedPodDirs is called directly from the main kubelet sync loop, and such calls might hang when cleaning up NFS-mounted directories. I am not sure whether we should consider using a separate thread for the cleanup.

@jsafrane (Member Author)

@jingxu97, os.RemoveAll calls os.Stat() too; this PR does not change that. It adds more stat calls, but if they got stuck, os.RemoveAll would get stuck too.

@jingxu97 (Contributor)

jingxu97 commented Feb 13, 2017

@jsafrane I see. Another thing I want to check: before RemoveAll() is called on a directory, there is a check for whether any volume directories exist. That looks like a strong check already — as long as a volume path exists, it returns without removing the path. I have an old PR #33616 which tries to check whether there are any mounted directories inside the volumes path; only if a volume path is still mounted does it skip removing the path. I later removed this logic in #37255 because users complained about NFS hangs. But as you said, since RemoveAll also calls os.Stat(), this check will not make anything worse.
So I think we only need to put this IsLikelyNotMountPoint check on the volume directory instead of checking every subdirectory of the pod, which may delay the pod cleanup routine.
I also think we might be able to use a separate goroutine for this cleanup. This was discussed before in #13418

@jsafrane (Member Author)

I think there is not much in the pod directory: volumes/ and plugins/ (both holding mounted stuff) and containers/, which just holds empty files. If everything is unmounted, there are just three nearly empty directories to check; if something is mounted, we find it really quickly. The beauty of RemoveAllOneFilesystem is that it does not depend on implementation details: if we restructure the pod directory in the future, it will keep working and will still prevent data on mounted filesystems from being deleted.

@jingxu97 (Contributor)

@jsafrane I see. Do you think we should remove the check for whether any directories exist under /volumes? I think that check is too strict and might fail to clean up orphaned pods.

@jsafrane (Member Author)

@jingxu97, there already is a check for the existence of volumes/*, just a few lines above in getPodVolumePathListFromDisk. This RemoveAllOneFilesystem is another safety mechanism, to be 1000% sure that we don't remove any data on mounted filesystems even if all the checks above fail or are buggy.

@@ -111,7 +111,7 @@ func (kl *Kubelet) cleanupOrphanedPodDirs(
continue
}
Contributor:

I think we should remove the check len(volumePaths) > 0. With this check, any directory existing under /volumes makes us skip the cleanup, and RemoveAllOneFilesystem is never called. The purpose of checking len(volumePaths) > 0 was to prevent data loss in case the directories under /volumes are not cleaned up properly. Currently, it is possible that the directories under /volumes are unmounted but fail to be deleted; if that happens, cleanupOrphanedPod will fail to delete the pod directory.

Contributor:

Talked with @jsafrane offline; we will address this issue in a separate PR. Opened an issue to track it: #41930

@jingxu97 (Contributor)

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Feb 22, 2017
@jsafrane (Member Author)

@k8s-bot test this
"Jenkins overloaded. Please try again later"

@jsafrane (Member Author)

@k8s-bot kubemark e2e test this
@k8s-bot gce etcd3 e2e test this
@k8s-bot kops aws e2e test this

@jsafrane (Member Author)

@thockin, please approve

@thockin (Member)

thockin commented Feb 27, 2017

It would be nice to have a benchmark on this vs plain old RemoveAll.

@thockin thockin added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Feb 27, 2017
@rmmh (Contributor)

rmmh commented Feb 27, 2017

@k8s-bot verify test this

A batch test failed, indicating that this needs a staging client-go update.

@jsafrane jsafrane force-pushed the remove-all-filesystems branch from 02741e5 to d7d039d Compare February 28, 2017 14:33
@k8s-github-robot k8s-github-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Feb 28, 2017
@jsafrane (Member Author)

I moved the function to standalone package at pkg/util/removeall.

pkg/util/util.go is part of client-go, and adding RemoveAllOneFilesystem there would pull mounter and exec into client-go as dependencies, which IMO do not belong there.

@k8s-github-robot

[APPROVALNOTIFIER] This PR is NOT APPROVED

The following people have approved this PR: jingxu97, jsafrane, k8s-merge-robot

Needs approval from an approver in each of these OWNERS Files:

We suggest the following people:
cc @lavalamp
You can indicate your approval by writing /approve in a comment
You can cancel your approval by writing /approve cancel in a comment

@jsafrane jsafrane added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Mar 1, 2017
@k8s-github-robot

Automatic merge from submit-queue (batch tested with PRs 41139, 41186, 38882, 37698, 42034)

@k8s-github-robot k8s-github-robot merged commit d14854f into kubernetes:master Mar 24, 2017
@g0194776

g0194776 commented Jun 14, 2017

Will this fix be merged into the v1.6.5 release of Kubernetes?

@jsafrane (Member Author)

Well, nobody really asked for a backport, and I think we missed the 1.6.5 deadline. This PR is just a safety mechanism to ensure #27653 never happens again; the real bugfix is in #27970, which has already been merged into every branch starting with 1.3.

@g0194776

@jsafrane We are running a Kubernetes v1.3.4 cluster with 50+ nodes. We've hit an issue where the kubelet automatically deletes all NFS-mounted files after a pod becomes orphaned. That looks just like the problem we met, so I really care about the bug fix for it.

I have no way to prove that the v1.6.4 kubelet won't delete mounted files again. Do you have any suggestions?

@wongma7 (Contributor)

wongma7 commented Jun 20, 2017

@g0194776 the fix is in the 1.3 branch only starting with 1.3.6: https://github.com/kubernetes/kubernetes/releases/tag/v1.3.6. The fix is definitely in 1.6.4.

@g0194776

@wongma7 Obviously, it is not fixed yet. I checked out the source code of v1.6.4 a few days ago, and I also reviewed the implementation here.

The newest change just replaces os.RemoveAll with another method called removeall.RemoveAllOneFilesystem.

@wongma7 (Contributor)

wongma7 commented Jun 22, 2017 via email

@g0194776

@wongma7 Okay, I got it.
Thanks a lot.
