Skip to content
This repository has been archived by the owner on Dec 15, 2021. It is now read-only.

Perform a graceful shutdown for the function proxy on SIGINT & SIGTERM. #1108

Merged
merged 6 commits into from
Jan 16, 2020

Conversation

d0x2f
Copy link
Contributor

@d0x2f d0x2f commented Jan 15, 2020

Issue Ref: #1107

Description:

Performs a graceful shutdown of the function proxy so that requests that are still in flight have a chance to complete before the function pod is terminated.

TODOs:

  • Ready to review
  • Automated Tests
  • Docs

Copy link
Contributor

@andresmgot andresmgot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR @d0x2f! I have a couple of minor comments. Were you able to test this changes?

@@ -80,10 +83,35 @@ func startNativeDaemon() {
}
}

func gracefulShutdown(server *http.Server) {
stop := make(chan os.Signal, 1)
signal.Notify(stop, os.Interrupt, syscall.SIGINT, syscall.SIGTERM)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as far as I know, you cannot capture a SIGTERM signal

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually you can capture SIGTERM, and it's what kubernetes sends to a pod when it requests a graceful shutdown. 30 seconds later it'll send SIGKILL which can't be captured, maybe that's what you're thinking of?

It can be tested by running the proxy locally and running kill -15 <pid>.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, indeed they suggest so. I tried it with a simple example and it didn't seem to work:

▶ cat t.go 
package main

import (
    "fmt"
    "os"
    "os/signal"
    "syscall"
)

func main() {

    sigs := make(chan os.Signal, 1)
    done := make(chan bool, 1)

    signal.Notify(sigs, syscall.SIGINT, syscall.SIGTERM)

    go func() {
        sig := <-sigs
        fmt.Println()
        fmt.Println(sig)
        done <- true
    }()

    fmt.Println("awaiting signal")
    <-done
    fmt.Println("exiting")
}

▶ go run t.go
awaiting signal
^C
interrupt
exiting

▶ go run t.go  
awaiting signal
[1]    17141 terminated  go run t.go

As you can see, in the second execution I am sending kill -15 <pid> and it's not catching the signal (but I may be missing something),

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Curiously I tried your example.
When running with go run foo.go it seems to make two processes, when I sent sigterm to the binary in /tmp it worked as expected.
i.e. the second entry in this ps output
image
Compiling with go build foo.go and doing the same with the produced binary should work as well.

Anyway thanks for being so responsive!
I hope to get the node runtime doing the same soon.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I see, good to know, thanks for following up!

stop := make(chan os.Signal, 1)
signal.Notify(stop, os.Interrupt, syscall.SIGINT, syscall.SIGTERM)
<-stop
timeout := 10 * time.Second
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it may be useful to configure this timeout, maybe we should add a env var to be able to change the default value?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added an env var SHUTDOWN_TIMEOUT to control this.

@d0x2f
Copy link
Contributor Author

d0x2f commented Jan 16, 2020

Actually, don't merge this yet, I've hit some issues while testing.

@d0x2f
Copy link
Contributor Author

d0x2f commented Jan 16, 2020

Ok I had just accidently deleted the call to startNativeDaemon() woops!
I've tested it with the ruby runtime, screenshots attached before and after:

Before:
image

After:
image

Copy link
Contributor

@andresmgot andresmgot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great! Thanks for the changes

@andresmgot andresmgot merged commit ff0ba5b into vmware-archive:master Jan 16, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants