Kubernetes downscale results in 5xx for the pods that have been killed #2343
Description
Describe the bug
We are running an API using Puma in a Kubernetes cluster that uses a Horizontal Pod Autoscaler (HPA). We noticed that whenever the HPA scales down, the downstream services consuming our API get a 5xx error response, one for each of the pods that got killed in the process.
We've added puma-metrics and can see that the number of running threads is well below the configured maximum whenever the errors occur. Requests are fairly quick (<200 ms), so it's not that the graceful shutdown timeout is being exceeded.
It looks like requests are thrown away/not fully processed after a graceful shutdown triggered by Kubernetes.
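For context, one mitigation we've seen suggested (not yet verified on our side) is a `preStop` hook that delays SIGTERM, so the pod is removed from the Service's endpoints before Puma stops accepting connections. A sketch, with placeholder container/image names:

```yaml
# Hypothetical deployment fragment: delay SIGTERM with a preStop sleep so
# kube-proxy/endpoint updates stop routing new requests to the pod first.
spec:
  terminationGracePeriodSeconds: 30
  containers:
    - name: api
      image: our-api:latest
      lifecycle:
        preStop:
          exec:
            command: ["sh", "-c", "sleep 10"]
```

This only papers over the race between endpoint removal and SIGTERM delivery; it doesn't explain why in-flight requests appear to be dropped.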
We're not sure where to dig further, so I'd like to learn more about this (and about Puma's internals) so we can spot the culprit.
Puma config:
It reproduces with the default configuration.
To Reproduce
See the description above: run Puma behind a Kubernetes HPA and scale down while requests are in flight; each killed pod produces a 5xx downstream. It reproduces with the default Puma configuration.
Expected behavior
Requests shouldn't get discarded, and no 5xx errors should occur. On graceful shutdown, Puma should stop accepting new requests and wait until the ongoing ones finish.
Desktop (please complete the following information):
- OS: Alpine 3.11 in Docker, running in Kubernetes
- Puma Version: 4.3.5