
Kubernetes downscale result in 5xx for the pods that have been killed #2343

Closed
@linkyndy

Description

Describe the bug
We are running an API using Puma in a Kubernetes cluster that uses a Horizontal Pod Autoscaler (HPA). We noticed that whenever the HPA scales down, the downstream services consuming our API get a 5xx error response, one for each of the pods that got killed in the process.

We've added puma-metrics and see that the number of running threads is well below the configured maximum whenever the errors occur. Requests are fairly quick (<200ms), so it's not that the graceful shutdown timeout is being exceeded.

It looks like requests are thrown away or not fully processed after Kubernetes triggers a graceful shutdown.

We don't know where to dig further, so I'm curious to learn more about this (and perhaps the relevant Puma internals) so we can spot the culprit.

Puma config:

It reproduces with the default configuration.
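
For reference, a minimal puma.rb spelling out roughly what "default configuration" means here; the values are the stock Puma 4 defaults as I understand them (assumed, not tuned settings from our deployment):

```ruby
# puma.rb -- sketch of the effective defaults (values assumed, not tuned)
threads 0, 16                     # default thread pool: min 0, max 16
workers 0                         # single mode, no cluster workers
bind "tcp://0.0.0.0:9292"         # default bind address/port
environment ENV.fetch("RACK_ENV", "development")
```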

To Reproduce

Mentioned above.

Expected behavior

Requests shouldn't get discarded and no 5xx errors should occur. On graceful shutdown, Puma should stop accepting new requests and wait until the in-flight ones finish.
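
A minimal way to check that drain behavior outside Kubernetes (a hypothetical config.ru, not our app): run it with `puma config.ru`, fire a request, and send SIGTERM to the Puma process while the request is still in flight. The expectation is that the response still comes back with a 200 instead of the connection being dropped.

```ruby
# config.ru -- hypothetical slow endpoint to observe graceful-shutdown draining
run lambda { |env|
  sleep 5  # simulate an in-flight request that outlives the SIGTERM
  [200, { "Content-Type" => "text/plain" }, ["drained cleanly\n"]]
}
```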

Desktop (please complete the following information):

  • OS: Alpine 3.11 in Docker, running in Kubernetes
  • Puma Version: 4.3.5
