Workaround for "PDs stop mounting after a few hours" issue #10169
Conversation
GCE e2e build/test passed for commit 5f2a9a5315e07fd82ae623da1d2827f60d2cca33.
Per the new merge policy, risk assessment for this PR: it touches the GCE PD volume attach and detach code. I'd consider this a critical bug fix that will improve system stability; without this PR, GCE PD attachment may stop working after some time on some nodes.
// Calls "udevadm trigger --action=change" on the specified drive. | ||
// drivePath must be the the block device path to trigger on, in the format "/dev/sd*", or a symlink to it. | ||
// This is workaround for Issue #7972. Once the underlying issue has been resolved, this may be removed. | ||
func udevadmChangeToDrive(drivePath string) { |
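For reference, a plausible sketch of the body behind this signature, assuming only the contract stated in the comment above (resolve any symlink, validate the "/dev/sd*" form, then shell out to `udevadm trigger --action=change` scoped to the device); the logging and package wiring here are illustrative, not the PR's exact code:

```go
package gcepd // illustrative package name

import (
	"fmt"
	"log" // the real code logs via glog; stdlib log keeps this sketch self-contained
	"os/exec"
	"path/filepath"
	"strings"
)

func udevadmChangeToDrive(drivePath string) {
	// Resolve a possible symlink to the underlying block device.
	drive, err := filepath.EvalSymlinks(drivePath)
	if err != nil {
		log.Printf("udevadmChangeToDrive: EvalSymlinks(%q) failed: %v", drivePath, err)
		return
	}

	// Per the comment above, the path must be of the form "/dev/sd*".
	if !strings.HasPrefix(drive, "/dev/sd") {
		log.Printf("udevadmChangeToDrive: expected \"/dev/sd*\", got %q", drive)
		return
	}

	// Trigger a udev "change" event scoped to just this device.
	out, err := exec.Command("udevadm", "trigger", "--action=change",
		fmt.Sprintf("--property-match=DEVNAME=%s", drive)).CombinedOutput()
	if err != nil {
		log.Printf("udevadmChangeToDrive: udevadm trigger failed: %v: %s", err, out)
	}
}
```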
Let's try to split this function up; it does too many things.
I would prefer to leave this as one function since it is implementing temporary behavior that I'd like to keep all together (and remove all together when the time comes). Let me know if you feel strongly otherwise.
I think that you can put the split functions into a separate source file if you want to make it easy to delete them together at a later date. I actually suspect that this code might be required for some time still, given that the underlying GCE fix does not directly address the fact that udev events are not always processed correctly (which this PR works around).
I will leave the decision to you, but if there are bugs in this code, we're going to need to split it up to debug effectively.
And you're definitely going to need to write unit and integration tests for this code, which is going to be easier if you split up the function.
More to come on this review. I'll get back on it tomorrow - gotta run, sorry.
GCE e2e build/test passed for commit f4bb795f59a67293b279c9666289b9f298c449be.
glog.Errorf("Error filepath.Glob(\"/dev/sd*\"): %v\r\n", err) | ||
} | ||
udevadmChangeToNewDrives(sdBefore, sdAfter) | ||
|
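From this call site, `udevadmChangeToNewDrives` presumably diffs the two glob results and triggers a change event only for devices that newly appeared; a minimal sketch of that diffing, using the `udevadmChangeToDrive` sketch from earlier (the exact PR implementation may differ):

```go
// udevadmChangeToNewDrives triggers a udev change event only for devices
// that appear in sdAfter but not in sdBefore, i.e. newly attached drives.
func udevadmChangeToNewDrives(sdBefore, sdAfter []string) {
	before := make(map[string]struct{}, len(sdBefore))
	for _, p := range sdBefore {
		before[p] = struct{}{}
	}
	for _, p := range sdAfter {
		if _, seen := before[p]; !seen {
			udevadmChangeToDrive(p)
		}
	}
}
```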
Not new code below, but this causes kubelet to go to sleep for 1 second, perhaps 10 times in a row, as far as I understand. Is this in a separate goroutine?
@dchen1107 @lavalamp I'm being a bit lazy here, but what is the preferred way of doing these sorts of retries inside kubelet?
It should block only one pod and not the entire queue. I'm not clear on why this is a `for {` loop rather than a `for numTries := 0; numTries < 10; numTries++ {` loop.
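The bounded form being suggested, sketched as a helper; `attemptDetach` is a hypothetical stand-in for the operation being retried, and the 10-attempt/1-second numbers come from the discussion above rather than the PR's exact code:

```go
import (
	"fmt"
	"time"
)

// detachWithRetry retries a detach up to 10 times with a 1-second pause,
// instead of looping forever.
func detachWithRetry(attemptDetach func() error) error {
	var err error
	for numTries := 0; numTries < 10; numTries++ {
		if err = attemptDetach(); err == nil {
			return nil // succeeded; stop retrying
		}
		time.Sleep(1 * time.Second)
	}
	return fmt.Errorf("detach failed after 10 attempts: last error: %v", err)
}
```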
Actually @quinton-hoole's intuition is right here. This could cause kubelet's `syncPods` loop to sleep for an additional 10 seconds, see kubelet.go#L1328, since volume teardown is synchronous (unlike volume creation, which is asynchronous, see pkg/kubelet/pod_workers.go#L151). Any opposition to spawning a new thread (goroutine) to execute `vol.TearDown()` in `kubelet.go:cleanupOrphanedVolumes(...)`?
OK, review done. If you can strip all of the stuff I suggested out of udevadmChangeToNewDrives() and udevadmChangeToDrive() then there's no need to split up udevadmChangeToDrive(). Ideally I'd like unit and integration tests for this, but it's going to be tricky to stub out udevadm, and given how simple the code should become if you implement the suggested changes, I think that it's OK to merge without these tests. The e2e tests should catch any bad breakages in future, albeit at more test execution time.
GCE e2e build/test failed for commit 95d694245aa0a760ae530d4eca5581ddaf1fa7f4.
@k8s-bot retest this please
GCE e2e build/test failed for commit 95d694245aa0a760ae530d4eca5581ddaf1fa7f4.
Thanks for the feedback @quinton-hoole
if err != nil {
	glog.Errorf("Could not tear down volume %q: %v", name, err)
}
go func(volumeName string) {
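A hedged completion of the goroutine above, showing why the volume name is passed as a parameter (each goroutine gets its own copy rather than sharing the loop variable); the body is illustrative, not the PR's exact code:

```go
go func(volumeName string) {
	// Running teardown asynchronously keeps a slow detach from blocking
	// the whole syncPods loop, per the concern raised earlier.
	if err := vol.TearDown(); err != nil {
		glog.Errorf("Could not tear down volume %q: %v", volumeName, err)
	}
}(name)
```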
Remove TODO. @dchen1107 is it OK to start a random goroutine here?
E.g., what if the sync loop comes through next time while it's still tearing this volume down, and tears it down again?
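One way to address that double-teardown concern, sketched with hypothetical names (the PR itself may handle this differently): track in-flight teardowns so a later sync pass skips volumes already being torn down.

```go
import (
	"log"
	"sync"
)

// inFlight tracks volumes whose teardown has already been spawned, so a
// later sync pass does not tear the same volume down twice concurrently.
var (
	inFlightMu sync.Mutex
	inFlight   = map[string]bool{}
)

func tearDownOnce(volumeName string, tearDown func() error) {
	inFlightMu.Lock()
	if inFlight[volumeName] {
		inFlightMu.Unlock()
		return // a teardown for this volume is already running
	}
	inFlight[volumeName] = true
	inFlightMu.Unlock()

	go func() {
		defer func() {
			inFlightMu.Lock()
			delete(inFlight, volumeName)
			inFlightMu.Unlock()
		}()
		if err := tearDown(); err != nil {
			log.Printf("Could not tear down volume %q: %v", volumeName, err)
		}
	}()
}
```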
@saad-ali Let me know when you're ready for another round of reviews on this one. Assigning to you until then.
// Verifies the specified persistent disk device has been successfully detached, and retries if it fails.
func verifyDetached(pd *gcePersistentDisk, gce cloudprovider.Interface) {
	ch := detachChanManager.CreateAndAddChan(pd.pdName, 0 /* bufferSize */)
Please add a comment to this effect:
bufferSize being 0 is very important, because it means that when senders send, they are blocked until we receive; this avoids having to have a separate "did you exit yet" check.
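A minimal, standalone illustration of that property (the channel and value names here are made up): with a buffer size of 0, a send blocks until the receiver takes the value, so a completed send doubles as proof the receiver was still alive.

```go
package main

import "fmt"

func main() {
	ch := make(chan string) // bufferSize 0: a send blocks until someone receives
	done := make(chan struct{})
	go func() {
		name := <-ch // receiver
		fmt.Println("received detach notification for", name)
		close(done)
	}()
	ch <- "my-pd" // does not return until the goroutine above has received it
	<-done        // wait for the receiver to finish
}
```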
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
removing ok-to-merge since it looks like saad has unpushed changes ;)
GCE e2e build/test failed for commit 1215b17ec0374e582a45b484ebd544913cf2ccff.
LGTM
GCE e2e build/test failed for commit c952ee2.
@k8s-bot retest this please
@k8s-bot test this please
GCE e2e build/test failed for commit c952ee2.
Looks like GitHub is flaky today. This is the second error accessing GitHub that I've seen in the past hour. Retrying...
@k8s-bot test this please
Oh wait. It's a 401 (unauthorized) error. I wonder whether the problem is on our Jenkins side with credentials? Checking...
@ixdy tells me that these 401 errors have been happening intermittently for a while now, and are mostly innocuous.
The new e2e run seems to be succeeding...
Thanks for taking a look @quinton-hoole
GCE e2e build/test passed for commit c952ee2.
e2e Passed. LGTM, ok-to-merge
Workaround for "PDs stop mounting after a few hours" issue
This is a temporary workaround for #7972 and fixes #9994 (and is a follow-up to PR #9929).
The problem was that if we repeatedly create, attach, mount, unmount, detach, and delete a PD from a GCE instance (PD soak test did this), after a few hours the GCE instance gets corrupted and stops attaching PDs correctly.
The real fix lies on the GCE side; in the meantime, they suggest calling `udevadm trigger` as a workaround. @quinton-hoole dug into `udevadm trigger` and suggested that we target the trigger more narrowly to the device being mounted/unmounted, to prevent any unintended side effects.
In addition to calling `udevadm trigger` on the device being attached/detached, this adds logic to verify that a disk detach completed (for #9994) and adds robust retry logic for both attaches and detaches.
Tested this locally by running PD tests back to back for hours. Although it is much more stable than before, the tests still flake out every few hours (a disk doesn't attach or detach properly). But when this happens, it doesn't leave the machine unable to attach any new disks (as was the case before this fix).
CC @quinton-hoole @brendanburns @dchen1107