Kubernetes downscale results in 5xx for the pods that have been killed #2343
Description
Describe the bug
We are running an API using Puma in a Kubernetes cluster that uses a Horizontal Pod Autoscaler (HPA). We noticed that whenever the HPA scales down, the downstream services consuming our API get a 5xx error response, one for each of the pods that got killed in the process.
We've added puma-metrics and can see that the number of running threads is well below the configured maximum whenever the errors occur. Requests are fairly quick (<200 ms), so it's not that the graceful shutdown timeout is being exceeded.
It looks like requests are thrown away/not fully processed after a graceful shutdown triggered by Kubernetes.
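For context, one mitigation we've seen suggested (not yet verified on our side) is a `preStop` hook that delays SIGTERM, so the pod is removed from the Service's endpoints before Puma stops accepting connections. A sketch, with placeholder container/image names:

```yaml
# Hypothetical deployment fragment: delay SIGTERM with a preStop sleep so
# kube-proxy/endpoint updates stop routing new requests to the pod first.
spec:
  terminationGracePeriodSeconds: 30
  containers:
    - name: api
      image: our-api:latest
      lifecycle:
        preStop:
          exec:
            command: ["sh", "-c", "sleep 10"]
```

This only papers over the race between endpoint removal and SIGTERM delivery; it doesn't explain why in-flight requests appear to be dropped.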
We're not sure where to dig further, so I'd like to learn more about this (and about Puma's internals) so we can spot the culprit.
Puma config:
It reproduces with the default configuration.
To Reproduce
See the description above: run Puma behind a Kubernetes HPA and scale down while requests are in flight; each killed pod produces a 5xx downstream. It reproduces with the default Puma configuration.
Expected behavior
Requests shouldn't get discarded, and no 5xx errors should occur. On graceful shutdown, Puma should stop accepting new requests and wait until the ongoing ones finish.
Desktop (please complete the following information):
- OS: Alpine 3.11 in Docker, running in Kubernetes
- Puma Version: 4.3.5