
Int overflow in hpa causing incorrect replica count #126892

Open
sheepster1 opened this issue Aug 23, 2024 · 5 comments · May be fixed by #126979
Assignees
Labels
  • kind/bug: Categorizes issue or PR as related to a bug.
  • needs-triage: Indicates an issue or PR lacks a `triage/foo` label and requires one.
  • sig/autoscaling: Categorizes an issue or PR as relevant to SIG Autoscaling.

Comments

@sheepster1

What happened?

The setup:
I am using KEDA with the Prometheus scaler. The query returns the lag of the message queue I am consuming from, and the threshold is set to 0.1.

What happened:
The lag kept increasing for a long time, and the replica count reached the configured maximum as expected. Everything ran fine for a while, but once the lag value reached 214,748,364, the HPA reduced the replicas from the max limit to 1.

What I think is the problem:
When the lag passes 214,748,364, the calculation here divides it by the threshold of 0.1, and the result exceeds the max int32 value; the overflow causes the HPA to scale to the minimum value, 1.
It also looks like several other places in this file cast a 64-bit float to a 32-bit int. Should there be an overflow check everywhere this is done?
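For illustration, here is a minimal Go sketch of the suspected failure mode. This is not the HPA source itself; it only assumes the desired replica count is computed roughly as ceil(metricValue / threshold) and then converted to int32, and toInt32Clamped is a hypothetical example of the kind of guard asked about above.

```go
package main

import (
	"fmt"
	"math"
)

// toInt32Clamped is a hypothetical guard (not from the Kubernetes codebase)
// that clamps a float64 into the int32 range before converting.
func toInt32Clamped(v float64) int32 {
	if v > math.MaxInt32 {
		return math.MaxInt32
	}
	if v < math.MinInt32 {
		return math.MinInt32
	}
	return int32(v)
}

func main() {
	metricValue := 214_748_365.0 // external metric value just past the reported boundary
	threshold := 0.1             // target value from the report

	// ceil(214,748,365 / 0.1) ≈ 2,147,483,650, which exceeds math.MaxInt32 (2,147,483,647).
	desired := math.Ceil(metricValue / threshold)

	// Per the Go spec, converting an out-of-range float64 to int32 yields an
	// implementation-dependent value, so this is not the large positive count
	// the caller intended.
	unchecked := int32(desired)

	fmt.Printf("desired=%.0f unchecked=%d clamped=%d\n", desired, unchecked, toInt32Clamped(desired))
}
```

On the cluster in the report, the out-of-range result apparently ended up at or below the minimum replica bound, which would explain the scale-down to 1; a clamp like the sketch above would instead keep the count pinned at the top of the int32 range.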

What did you expect to happen?

I expected the replica count to stay at the max value, or alternatively to get an error indicating that the external metric value exceeded the maximum supported value.

How can we reproduce it (as minimally and precisely as possible)?

Use an external metric, set its value above 214,748,364, and set the threshold to 0.1.
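As a rough aid for picking reproduction values (illustrative only, not part of the reproduction itself): any metric value above math.MaxInt32 * threshold, roughly 214,748,364.7 for a threshold of 0.1, pushes the computed replica count past the int32 range.

```go
package main

import (
	"fmt"
	"math"
)

func main() {
	threshold := 0.1
	boundary := float64(math.MaxInt32) * threshold // ≈ 214,748,364.7 for a 0.1 threshold

	// Values at or below the boundary stay inside int32; anything above it overflows.
	for _, v := range []float64{214_748_364, 214_748_365} {
		overflows := v/threshold > float64(math.MaxInt32)
		fmt.Printf("metric=%.0f boundary=%.1f overflows int32: %v\n", v, boundary, overflows)
	}
}
```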

Anything else we need to know?

No response

Kubernetes version

1.29

Cloud provider

AWS EKS

OS version


Install tools

Container runtime (CRI) and version (if applicable)

Related plugins (CNI, CSI, ...) and versions (if applicable)

@sheepster1 added the kind/bug label on Aug 23, 2024
@k8s-ci-robot added the needs-sig and needs-triage labels on Aug 23, 2024
@k8s-ci-robot
Contributor

This issue is currently awaiting triage.

If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@neolit123
Member

/sig autoscaling

@k8s-ci-robot added the sig/autoscaling label and removed the needs-sig label on Aug 27, 2024
@omerap12
Member

/assign

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot added the lifecycle/stale label on Nov 25, 2024
@vaibhav2107
Member

As the PR is still open,
/remove-lifecycle stale

@k8s-ci-robot removed the lifecycle/stale label on Dec 1, 2024