Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

apiservers: add synchronous shutdown mechanism on SIGTERM+INT #50439

Merged
merged 1 commit into from
Aug 15, 2017

Conversation

sttts
Copy link
Contributor

@sttts sttts commented Aug 10, 2017

This is used to shutdown the auditing backend in order not to drop any pending events on the floor.

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Aug 10, 2017
@k8s-github-robot k8s-github-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Aug 10, 2017
@sttts sttts assigned soltysh and crassirostris and unassigned mikedanese Aug 10, 2017
if err := o.Complete(); err != nil {
return err
}
if err := o.Validate(args); err != nil {
return err
}
if err := o.RunAggregator(wait.NeverStop); err != nil {
if err := o.RunAggregator(stopCh); err == nil {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

err == nil

Why?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

@sttts
Copy link
Contributor Author

sttts commented Aug 10, 2017

@crassirostris can you create a PR against this one to actually send the batched events?

@sttts
Copy link
Contributor Author

sttts commented Aug 10, 2017

@crassirostris either against this one or a follow-up.

Btw, I tested this manually and I see some shutdown messages (possibly from informers).

@ALL: If anybody has a Windows machine, please test this there as well.

@crassirostris
Copy link

@sttts Sure, I'll try to do this by the EOD or EOW

@crassirostris
Copy link

Thanks a lot for doing this work!

Copy link
Contributor

@frobware frobware left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you try to repeatedly bang on Ctrl-C (i.e., generating multiple SIGINT's, for example)?

shutdownStopCh := make(chan struct{})
go func() {
defer close(shutdownStopCh)
<-time.After(time.Minute)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is just temporary, correct? Otherwise 1 minute seems very arbitrary.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. We can also leave that out and trust a SIGKILL following after some time anyway. @crassirostris @soltysh opinions?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's start with this and TODO, once we reach a point where we'll gather some feedback we can decide what to do with it further.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer leaving this out, for the exact reason @sttts mentioned: in most setups SIGKILL will follow anyway

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since I can't even figure out why this is here, a comment is very necessary.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In fact we decided to remove it. Will do.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@sttts
Copy link
Contributor Author

sttts commented Aug 10, 2017

Did you try to repeatedly bang on Ctrl-C (i.e., generating multiple SIGINT's, for example)?

No. I wonder what the usual behavour is in those case? Directly calling os.Exit on the second SIGINT?

@frobware
Copy link
Contributor

Calling os.Exit(?) on subsequent SIGINTs is probably better than trying to call the teardown sequence again, unless it can handle being repeatedly called. That seems a harder ask.

As a user you just have to expect that a Ctrl-C might take some time; repeatedly hitting Ctrl-C will be a harsher exit.

Copy link
Contributor

@soltysh soltysh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few nits, overall this LGTM! Thanks Stefan!!!

hk.Println("Error:", err)
if !s.RespectsStopCh {
// for command that do not respect the stopCh, we run them in a go
// routine while leaving them running while when stopCh is closed.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

running while when

@@ -0,0 +1,55 @@
/*
Copyright 2014 The Kubernetes Authors.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

2017

)

// SetupSignalHandler registered for SIGTERM and SIGINT. A stop channel is returned
// which is closed on one of these signals. A exitCode channel is returned for the program
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An exitCode

exitCode := 1
defer func() { exitCodeCh <- exitCode }()

err := hk.Run(args, stopCh)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit, but I like when these kind of things are written with the short-if. Ask @pweil-, he calls me short-if 👮

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Btw. I'm surprised we don't print that error here anywhere, like we do in most cases.

if err := cmd.Execute(); err == nil {
exitCode = 0
} else {
glog.Error(err.Error())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's nice, your SetupSignalHandler also nicely takes care of exiting with code != 0 :)

shutdownStopCh := make(chan struct{})
go func() {
defer close(shutdownStopCh)
<-time.After(time.Minute)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's start with this and TODO, once we reach a point where we'll gather some feedback we can decide what to do with it further.

@soltysh
Copy link
Contributor

soltysh commented Aug 10, 2017

@frobware @sttts I'm not sure you're gonna get multiple SIGINTs here.

@sttts
Copy link
Contributor Author

sttts commented Aug 10, 2017

@frobware @crassirostris @soltysh thanks for the quick review. Addressed all comments. ptal

shutdownStopCh := make(chan struct{})
go func() {
defer close(shutdownStopCh)
<-time.After(time.Minute)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer leaving this out, for the exact reason @sttts mentioned: in most setups SIGKILL will follow anyway

}
case code := <-exit:
close(stop)
os.Exit(code)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't that create a race between main function returning 0 and the os.Exit call?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch. Fixing.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wondering why I did this exit channel logic at all. ptal.

@@ -174,7 +176,22 @@ func (hk *HyperKube) Run(args []string) error {
logs.InitLogs()
defer logs.FlushLogs()

err = s.Run(s, s.Flags().Args())
if !s.RespectsStopCh {
// for commands that do not respect the stopCh, we run them in a go

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Small nit: capitalize the first letter for conformity.

stop := make(chan struct{})
c := make(chan os.Signal, 2)
signal.Notify(c, shutdownSignals...)
go func() {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: will this work?

<-c
close(stop)
<-c
os.Exit(1)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Slightly nicer 👍

cmd.Flags().AddGoFlagSet(flag.CommandLine)
if err := cmd.Execute(); err != nil {
panic(err)
glog.Fatal(err.Error())

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

err.Error()

Is it necessary? Can it be just glog.Fatal(err)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

right, Fatalf wants a string. Fatal is generic.

cmd.Flags().AddGoFlagSet(flag.CommandLine)
if err := cmd.Execute(); err != nil {
panic(err)
glog.Fatal(err.Error())

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above

Copy link

@crassirostris crassirostris left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@sttts sttts added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Aug 10, 2017
@sttts sttts force-pushed the sttts-shutdown-apiservers branch from 81879e2 to a47779e Compare August 10, 2017 17:18
@k8s-github-robot k8s-github-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Aug 10, 2017
@sttts sttts force-pushed the sttts-shutdown-apiservers branch from a47779e to f998764 Compare August 10, 2017 17:25
@sttts
Copy link
Contributor Author

sttts commented Aug 10, 2017

@crassirostris @soltysh here is still a slight chance to drop events. To be really sure we want to disable the http handlers and at kill all existing connections. But that's follow-up. At least we have a chance to implement this now.

@sttts sttts added release-note Denotes a PR that will be considered when it comes time to generate release notes. and removed release-note-label-needed labels Aug 10, 2017
@crassirostris
Copy link

@sttts

To be really sure we want to disable the http handlers and at kill all existing connections

You mean some requests may go through not audited while auditing is being shut down?

At least we have a chance to implement this now

Yes, thanks again for working on this :)

@sttts sttts force-pushed the sttts-shutdown-apiservers branch from c88f3b1 to f9f7625 Compare August 11, 2017 07:26
@k8s-github-robot k8s-github-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Aug 11, 2017
@sttts sttts added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Aug 11, 2017
@sttts sttts force-pushed the sttts-shutdown-apiservers branch from f9f7625 to f1c10c1 Compare August 11, 2017 16:13
@k8s-github-robot k8s-github-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Aug 11, 2017
@sttts
Copy link
Contributor Author

sttts commented Aug 11, 2017

/assign @smarterclayton

@smarterclayton is it approved?

@sttts sttts added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Aug 11, 2017
@sttts
Copy link
Contributor Author

sttts commented Aug 14, 2017

/retest

@smarterclayton
Copy link
Contributor

/approve

@sttts
Copy link
Contributor Author

sttts commented Aug 14, 2017

/approve no-issue

@k8s-github-robot
Copy link

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: crassirostris, smarterclayton, sttts

Associated issue requirement bypassed by: sttts

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these OWNERS Files:

You can indicate your approval by writing /approve in a comment
You can cancel your approval by writing /approve cancel in a comment

@k8s-github-robot k8s-github-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Aug 14, 2017
@fejta-bot
Copy link

/retest
This bot automatically retries jobs that failed/flaked on approved PRs (send feedback to @fejta).

Review the full test history for this PR.

2 similar comments
@fejta-bot
Copy link

/retest
This bot automatically retries jobs that failed/flaked on approved PRs (send feedback to @fejta).

Review the full test history for this PR.

@fejta-bot
Copy link

/retest
This bot automatically retries jobs that failed/flaked on approved PRs (send feedback to @fejta).

Review the full test history for this PR.

@sttts sttts force-pushed the sttts-shutdown-apiservers branch from f1c10c1 to 5a75be1 Compare August 15, 2017 07:15
@k8s-github-robot k8s-github-robot added needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. and removed lgtm "Looks good to me", indicates that a PR is ready to be merged. labels Aug 15, 2017
@sttts
Copy link
Contributor Author

sttts commented Aug 15, 2017

@mikedanese there is something odd with the signal_windows.go file, compare https://k8s-gubernator.appspot.com/build/kubernetes-jenkins/pr-logs/pull/50439/pull-kubernetes-verify/45239/. Our update-bazel.sh script on Mac will not create that block.

@sttts sttts force-pushed the sttts-shutdown-apiservers branch from 5a75be1 to 11b2536 Compare August 15, 2017 07:17
@sttts sttts added lgtm "Looks good to me", indicates that a PR is ready to be merged. and removed needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. labels Aug 15, 2017
@k8s-github-robot
Copy link

/test all [submit-queue is verifying that this PR is safe to merge]

@k8s-github-robot
Copy link

Automatic merge from submit-queue

@k8s-github-robot k8s-github-robot merged commit 4d6db74 into kubernetes:master Aug 15, 2017
k8s-github-robot pushed a commit that referenced this pull request Aug 15, 2017
Automatic merge from submit-queue

advanced audit: shutdown batching audit webhook gracefully

Follow-up of #50439

When the `stopCh` passed to the batching audit webhook is closed, it stops accepting new events and when `Shutdown` method is called afterwards, it blocks until the last request to the webhook has finished.

/cc @tallclair @soltysh
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. release-note Denotes a PR that will be considered when it comes time to generate release notes. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants