-
Notifications
You must be signed in to change notification settings - Fork 40.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
apiservers: add synchronous shutdown mechanism on SIGTERM+INT #50439
apiservers: add synchronous shutdown mechanism on SIGTERM+INT #50439
Conversation
cmd/hyperkube/kube-aggregator.go
Outdated
if err := o.Complete(); err != nil { | ||
return err | ||
} | ||
if err := o.Validate(args); err != nil { | ||
return err | ||
} | ||
if err := o.RunAggregator(wait.NeverStop); err != nil { | ||
if err := o.RunAggregator(stopCh); err == nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
err == nil
Why?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typo
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
@crassirostris can you create a PR against this one to actually send the batched events? |
@crassirostris either against this one or a follow-up. Btw, I tested this manually and I see some shutdown messages (possibly from informers). @ALL: If anybody has a Windows machine, please test this there as well. |
@sttts Sure, I'll try to do this by the EOD or EOW |
Thanks a lot for doing this work! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did you try to repeatedly bang on Ctrl-C (i.e., generating multiple SIGINT's, for example)?
shutdownStopCh := make(chan struct{}) | ||
go func() { | ||
defer close(shutdownStopCh) | ||
<-time.After(time.Minute) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is just temporary, correct? Otherwise 1 minute seems very arbitrary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. We can also leave that out and trust a SIGKILL following after some time anyway. @crassirostris @soltysh opinions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's start with this and TODO, once we reach a point where we'll gather some feedback we can decide what to do with it further.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I prefer leaving this out, for the exact reason @sttts mentioned: in most setups SIGKILL will follow anyway
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since I can't even figure out why this is here, a comment is very necessary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In fact we decided to remove it. Will do.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
No. I wonder what the usual behavour is in those case? Directly calling |
Calling os.Exit(?) on subsequent SIGINTs is probably better than trying to call the teardown sequence again, unless it can handle being repeatedly called. That seems a harder ask. As a user you just have to expect that a Ctrl-C might take some time; repeatedly hitting Ctrl-C will be a harsher exit. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few nits, overall this LGTM! Thanks Stefan!!!
cmd/hyperkube/hyperkube.go
Outdated
hk.Println("Error:", err) | ||
if !s.RespectsStopCh { | ||
// for command that do not respect the stopCh, we run them in a go | ||
// routine while leaving them running while when stopCh is closed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
running while when
@@ -0,0 +1,55 @@ | |||
/* | |||
Copyright 2014 The Kubernetes Authors. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
2017
) | ||
|
||
// SetupSignalHandler registered for SIGTERM and SIGINT. A stop channel is returned | ||
// which is closed on one of these signals. A exitCode channel is returned for the program |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
An exitCode
cmd/hyperkube/hyperkube.go
Outdated
exitCode := 1 | ||
defer func() { exitCodeCh <- exitCode }() | ||
|
||
err := hk.Run(args, stopCh) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit, but I like when these kind of things are written with the short-if. Ask @pweil-, he calls me short-if 👮
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Btw. I'm surprised we don't print that error here anywhere, like we do in most cases.
if err := cmd.Execute(); err == nil { | ||
exitCode = 0 | ||
} else { | ||
glog.Error(err.Error()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's nice, your SetupSignalHandler
also nicely takes care of exiting with code != 0 :)
shutdownStopCh := make(chan struct{}) | ||
go func() { | ||
defer close(shutdownStopCh) | ||
<-time.After(time.Minute) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's start with this and TODO, once we reach a point where we'll gather some feedback we can decide what to do with it further.
@frobware @crassirostris @soltysh thanks for the quick review. Addressed all comments. ptal |
shutdownStopCh := make(chan struct{}) | ||
go func() { | ||
defer close(shutdownStopCh) | ||
<-time.After(time.Minute) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I prefer leaving this out, for the exact reason @sttts mentioned: in most setups SIGKILL will follow anyway
} | ||
case code := <-exit: | ||
close(stop) | ||
os.Exit(code) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doesn't that create a race between main function returning 0 and the os.Exit
call?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch. Fixing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wondering why I did this exit channel logic at all. ptal.
cmd/hyperkube/hyperkube.go
Outdated
@@ -174,7 +176,22 @@ func (hk *HyperKube) Run(args []string) error { | |||
logs.InitLogs() | |||
defer logs.FlushLogs() | |||
|
|||
err = s.Run(s, s.Flags().Args()) | |||
if !s.RespectsStopCh { | |||
// for commands that do not respect the stopCh, we run them in a go |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Small nit: capitalize the first letter for conformity.
stop := make(chan struct{}) | ||
c := make(chan os.Signal, 2) | ||
signal.Notify(c, shutdownSignals...) | ||
go func() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: will this work?
<-c
close(stop)
<-c
os.Exit(1)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Slightly nicer 👍
cmd.Flags().AddGoFlagSet(flag.CommandLine) | ||
if err := cmd.Execute(); err != nil { | ||
panic(err) | ||
glog.Fatal(err.Error()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
err.Error()
Is it necessary? Can it be just glog.Fatal(err)
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
right, Fatalf wants a string. Fatal is generic.
cmd.Flags().AddGoFlagSet(flag.CommandLine) | ||
if err := cmd.Execute(); err != nil { | ||
panic(err) | ||
glog.Fatal(err.Error()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as above
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
81879e2
to
a47779e
Compare
a47779e
to
f998764
Compare
@crassirostris @soltysh here is still a slight chance to drop events. To be really sure we want to disable the http handlers and at kill all existing connections. But that's follow-up. At least we have a chance to implement this now. |
You mean some requests may go through not audited while auditing is being shut down?
Yes, thanks again for working on this :) |
c88f3b1
to
f9f7625
Compare
f9f7625
to
f1c10c1
Compare
/assign @smarterclayton @smarterclayton is it approved? |
/retest |
/approve |
/approve no-issue |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: crassirostris, smarterclayton, sttts Associated issue requirement bypassed by: sttts The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these OWNERS Files:
You can indicate your approval by writing |
/retest Review the full test history for this PR. |
2 similar comments
/retest Review the full test history for this PR. |
/retest Review the full test history for this PR. |
f1c10c1
to
5a75be1
Compare
@mikedanese there is something odd with the |
5a75be1
to
11b2536
Compare
/test all [submit-queue is verifying that this PR is safe to merge] |
Automatic merge from submit-queue |
Automatic merge from submit-queue advanced audit: shutdown batching audit webhook gracefully Follow-up of #50439 When the `stopCh` passed to the batching audit webhook is closed, it stops accepting new events and when `Shutdown` method is called afterwards, it blocks until the last request to the webhook has finished. /cc @tallclair @soltysh
This is used to shutdown the auditing backend in order not to drop any pending events on the floor.