HPA uses wrong count to calculate target replicas #34821

mwielgus · 2016-10-14T13:35:40Z

The current formula looks like:

```go
currentReplicas := scale.Status.Replicas
// [...]
usageRatio := float64(utilization) / float64(targetUtilization)
if math.Abs(1.0-usageRatio) > tolerance {
	return int32(math.Ceil(usageRatio * float64(currentReplicas))), &utilization, timestamp, nil
}
```

If currentReplicas doesn't match the number of pods that were used to calculate utilization, the result can be much larger than necessary. For example, if 10 pods have been created in the apiserver (= currentReplicas) but only 1 is scheduled, and that one is overloaded at 300% of the target utilization, the target grows to 30 pods (ceil(3.0 * 10)).
The 20 (30-10) new pods get created but are not scheduled either, so on the next iteration the same 300% reading is multiplied by 30 and the replica count grows even further.
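To make the arithmetic concrete, here is a minimal Go sketch of the formula quoted above. `desiredReplicas` is a hypothetical helper, not the HPA controller's actual function, and the tolerance of 0.1 is an assumption for illustration. The example uses a target of 50% and an observed 150%, i.e. 300% of the target.

```go
package main

import (
	"fmt"
	"math"
)

// tolerance plays the role of the HPA tolerance in the snippet above;
// the value 0.1 is an assumption for illustration.
const tolerance = 0.1

// desiredReplicas (hypothetical) applies the formula from the issue: scale the
// replica count by observed/target utilization unless the ratio is within
// tolerance of 1.0, in which case the count is left unchanged.
func desiredReplicas(replicas, utilization, targetUtilization int32) int32 {
	usageRatio := float64(utilization) / float64(targetUtilization)
	if math.Abs(1.0-usageRatio) > tolerance {
		return int32(math.Ceil(usageRatio * float64(replicas)))
	}
	return replicas
}

func main() {
	// 10 pods exist in the apiserver, but only 1 is scheduled and reporting
	// metrics, at 150% against a 50% target (300% of the target).
	fmt.Println(desiredReplicas(10, 150, 50)) // 30: multiplied by the apiserver count
	fmt.Println(desiredReplicas(1, 150, 50))  // 3: multiplied by the pods that produced the metric
}
```

The only difference between the two calls is which replica count the ratio is multiplied by, which is exactly the mismatch this issue describes.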

Solly, you made some fixes in this area in #33593.
Will that PR fix the scenario above?

cc: @DirectXMan12 @jszczepkowski @fgrzadkowski @davidopp

jszczepkowski · 2016-10-14T14:01:30Z

If this is urgent and we need to fix it in 1.4, I can prepare a small, separate PR with the fix. PR #33593 is huge and will not be cherry-picked into 1.4.

DirectXMan12 · 2016-10-14T20:04:24Z

I think #33593 currently has this problem as well, but I'm still working out the right way to solve it. As I see it, the problem is this: while we can be reasonably certain that an unready pod will eventually become ready (certain-ish, at least), a pod in a pending state may effectively be permanently failed (e.g. if you don't specify a command, the pod appears to report pending and unready, with one of its containers reporting that the runtime found no command specified). So we need to decide what to assume about pending pods. I need to dig a bit more into the pod lifecycle.

I talked to @decarr about the lifecycle a bit. I think it's safe to lump "phase: running, status: unready" together with "phase: pending", and to count total = running|ready + running|unready + pending.
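The bucketing proposed above can be sketched in Go roughly as follows. The `pod` type, the phase constants and `countPods` are hypothetical simplifications, not the Kubernetes API types, and pods in any other phase (e.g. Failed) are assumed to be excluded from the total.

```go
package main

import "fmt"

// podPhase and pod are hypothetical, trimmed-down stand-ins for the
// Kubernetes Pod type; only the fields this sketch needs are kept.
type podPhase string

const (
	podPending podPhase = "Pending"
	podRunning podPhase = "Running"
	podFailed  podPhase = "Failed"
)

type pod struct {
	name  string
	phase podPhase
	ready bool // value of the pod's Ready condition
}

// podCounts holds the three buckets from the comment above.
type podCounts struct {
	runningReady   int
	runningUnready int
	pending        int
}

// total is running|ready + running|unready + pending.
func (c podCounts) total() int {
	return c.runningReady + c.runningUnready + c.pending
}

// countPods sorts pods into the three buckets; pods in any other phase
// (assumed here: Failed, Succeeded, ...) are not counted at all.
func countPods(pods []pod) podCounts {
	var c podCounts
	for _, p := range pods {
		switch {
		case p.phase == podRunning && p.ready:
			c.runningReady++
		case p.phase == podRunning:
			c.runningUnready++
		case p.phase == podPending:
			c.pending++
		}
	}
	return c
}

func main() {
	pods := []pod{
		{"a", podRunning, true},
		{"b", podRunning, false},
		{"c", podPending, false},
		{"d", podFailed, false},
	}
	c := countPods(pods)
	fmt.Println(c.runningReady, c.runningUnready, c.pending, c.total()) // 1 1 1 3
}
```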

davidopp · 2016-10-14T20:14:00Z

cc @lavalamp for the HPA problem we saw today

derekwaynecarr · 2016-10-14T20:31:36Z

@DirectXMan12 and I chatted about this today. I think we should split pods into two groups based on the pod's Ready condition, and ignore the CPU usage of pods whose Ready condition is not true.
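A minimal Go sketch of this suggestion, assuming utilization is computed as total CPU usage over total CPU request of the Ready pods only, and that the usage ratio is then multiplied by the number of Ready pods rather than by scale.Status.Replicas. `podMetric` and `readyUtilization` are hypothetical names; this is not the code that landed in #33593 or #34955.

```go
package main

import (
	"fmt"
	"math"
)

// podMetric is a hypothetical record pairing a pod's Ready condition with
// its CPU usage and CPU request.
type podMetric struct {
	ready        bool
	usageMilli   int64 // CPU usage in millicores
	requestMilli int64 // CPU request in millicores
}

// readyUtilization returns the average CPU utilization (percent of request)
// over Ready pods only, plus how many pods contributed. Pods whose Ready
// condition is not true are skipped entirely.
func readyUtilization(metrics []podMetric) (utilization float64, readyCount int) {
	var usage, request int64
	for _, m := range metrics {
		if !m.ready {
			continue
		}
		usage += m.usageMilli
		request += m.requestMilli
		readyCount++
	}
	if request == 0 {
		return 0, 0
	}
	return 100 * float64(usage) / float64(request), readyCount
}

func main() {
	// One Ready pod at 300% of its request; nine pods that are not Ready.
	metrics := []podMetric{{ready: true, usageMilli: 300, requestMilli: 100}}
	for i := 0; i < 9; i++ {
		metrics = append(metrics, podMetric{ready: false, requestMilli: 100})
	}
	util, n := readyUtilization(metrics)
	target := 100.0 // assumed target: 100% of the CPU request
	desired := int(math.Ceil(util / target * float64(n)))
	fmt.Println(util, n, desired) // 300 1 3
}
```

In the scenario from the issue this yields 3 rather than 30. How to treat the nine unready pods when the result is below the current replica count (for example, whether to scale down) is a separate policy question that this sketch leaves open.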


Automatic merge from submit-queue

HPA: fixed wrong count for target replicas calculations.

```release-note
HPA: fixed wrong count for target replicas calculations (#34821).
```

HPA: fixed wrong count for target replicas calculations (#34821).


jayunit100 · 2016-11-15T16:13:24Z

Is this sorted now?

DirectXMan12 · 2016-11-15T16:18:33Z

It should have been fixed as part of #33593.

jszczepkowski · 2016-11-16T08:41:42Z

Fixed by #33593.


mwielgus added the sig/autoscaling label Oct 14, 2016

mwielgus assigned jszczepkowski Oct 14, 2016

k8s-github-robot added area/controller-manager team/control-plane labels Oct 14, 2016

fgrzadkowski added the priority/critical-urgent label Oct 17, 2016

jszczepkowski added a commit to jszczepkowski/kubernetes that referenced this issue Oct 17, 2016

HPA: fixed wrong count for target replicas calculations.

47451cb

HPA: fixed wrong count for target replicas calculations (kubernetes#34821).

jszczepkowski mentioned this issue Oct 17, 2016

HPA: fixed wrong count for target replicas calculations. #34955

Merged

DirectXMan12 mentioned this issue Oct 17, 2016

HPA: Consider unready pods separately #33593

Merged

jszczepkowski added a commit to jszczepkowski/kubernetes that referenced this issue Oct 18, 2016

HPA: fixed wrong count for target replicas calculations.

678c291

HPA: fixed wrong count for target replicas calculations (kubernetes#34821).

jszczepkowski added a commit to jszczepkowski/kubernetes that referenced this issue Oct 18, 2016

HPA: fixed wrong count for target replicas calculations.

f495e73

HPA: fixed wrong count for target replicas calculations (kubernetes#34821).

jessfraz pushed a commit to jessfraz/kubernetes that referenced this issue Oct 18, 2016

HPA: fixed wrong count for target replicas calculations.

d5ad431

HPA: fixed wrong count for target replicas calculations (kubernetes#34821).

rootfs pushed a commit to rootfs/kubernetes that referenced this issue Oct 19, 2016

HPA: fixed wrong count for target replicas calculations.

7244a19

HPA: fixed wrong count for target replicas calculations (kubernetes#34821).

jszczepkowski closed this as completed Nov 16, 2016

shyamjvs pushed a commit to shyamjvs/kubernetes that referenced this issue Dec 1, 2016

HPA: fixed wrong count for target replicas calculations.

6a744bc

HPA: fixed wrong count for target replicas calculations (kubernetes#34821).

tallclair pushed a commit to tallclair/kubernetes that referenced this issue Aug 3, 2020

HPA: fixed wrong count for target replicas calculations.

3aae030

HPA: fixed wrong count for target replicas calculations (kubernetes#34821).
