Do not count failed pods as unready in HPA controller #60648

bskiba · 2018-03-01T15:25:33Z

What this PR does / why we need it:
Currently, when the HPA controller performs a scale up, any failed pods (which can be present, for example, after evictions performed by the kubelet) are treated as unready. Unready pods are treated as if they had 0% utilization, which slows down or even blocks the scale up.

After this change, failed pods are ignored in all calculations, so they influence neither scale-up nor scale-down replica calculations.
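
To illustrate the effect described above, here is a minimal sketch in Go. It is not the controller's actual code: the `Pod` type, the `desiredReplicas` function, the `ignoreFailed` switch, and the `tolerance` value are all hypothetical names and assumptions made for this example. It only models the idea that unready pods are counted at 0% utilization during a scale up, and that a scale up is abandoned once the recomputed usage ratio no longer exceeds the target.

```go
package main

import (
	"fmt"
	"math"
)

// PodPhase is a simplified stand-in for a pod's phase (assumption for this sketch).
type PodPhase string

const (
	PodRunning PodPhase = "Running"
	PodFailed  PodPhase = "Failed"
)

// Pod is a hypothetical, trimmed-down pod record used only in this example.
type Pod struct {
	Phase       PodPhase
	Ready       bool
	Utilization float64 // percent of requested resource; only read for ready pods
}

// tolerance is an assumed dead band around the target ratio of 1.0.
const tolerance = 0.1

// desiredReplicas sketches a utilization-based replica calculation.
// With ignoreFailed=false, failed pods are lumped in with unready pods
// (the behaviour described as "currently"); with ignoreFailed=true they
// are skipped entirely (the behaviour described as "after this change").
func desiredReplicas(pods []Pod, currentReplicas int, target float64, ignoreFailed bool) int {
	var sum float64
	ready, unready := 0, 0
	for _, p := range pods {
		switch {
		case p.Phase == PodFailed && ignoreFailed:
			continue // failed pods take no part in the calculation
		case p.Phase == PodFailed || !p.Ready:
			unready++
		default:
			sum += p.Utilization
			ready++
		}
	}
	if ready == 0 {
		return currentReplicas
	}

	ratio := sum / float64(ready) / target
	if unready == 0 || ratio <= 1.0 {
		if math.Abs(ratio-1.0) <= tolerance {
			return currentReplicas
		}
		return int(math.Ceil(ratio * float64(ready)))
	}

	// Scale up with unready pods present: count them at 0% and recompute.
	total := ready + unready
	newRatio := sum / float64(total) / target
	if newRatio <= 1.0 || math.Abs(newRatio-1.0) <= tolerance {
		return currentReplicas // the scale up is blocked
	}
	return int(math.Ceil(newRatio * float64(total)))
}

func main() {
	// Three busy running pods plus three evicted (failed) pods.
	pods := []Pod{
		{PodRunning, true, 90}, {PodRunning, true, 90}, {PodRunning, true, 90},
		{PodFailed, false, 0}, {PodFailed, false, 0}, {PodFailed, false, 0},
	}
	fmt.Println("failed counted as unready:", desiredReplicas(pods, 3, 50, false)) // 3
	fmt.Println("failed ignored:", desiredReplicas(pods, 3, 50, true))             // 6
}
```

In this sketch, three ready pods at 90% against a 50% target would call for six replicas, but counting the three failed pods at 0% drags the average down to 45%, so the scale up is blocked and the replica count stays at three.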

@MaciekPytel @DirectXMan12

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #55630

Special notes for your reviewer:

Release note:

Stop counting failed pods as unready in HPA controller to avoid failed pods incorrectly affecting scale up replica count calculation.

DirectXMan12 · 2018-03-01T15:37:40Z

This should probably have a release note filled out, because it's a change in behavior.

bskiba · 2018-03-01T17:45:52Z

Fair point, added.

bskiba · 2018-03-01T21:05:58Z

@DirectXMan12 The bug seems to be quite an inconvenience: the only workaround I know of is to manually remove the evicted pods, and since at least 1.7.5 evicted pods seem to stay around for a fairly long time (#55051 (comment)). Do you think this could go into 1.10?

DirectXMan12 · 2018-03-02T17:21:14Z

Yeah, I'll add it to the milestone. This seems like it could prevent the HPA from working at all, which makes it a decently bad bug.

DirectXMan12 · 2018-03-02T17:28:36Z

/kind bug
/approve
/lgtm

k8s-ci-robot · 2018-03-02T17:28:47Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: bskiba, DirectXMan12

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~pkg/controller/podautoscaler/OWNERS~~ [DirectXMan12]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

DirectXMan12 · 2018-03-02T20:04:31Z

/sig autoscaling
/priority critical-urgent

DirectXMan12 · 2018-03-02T20:05:16Z

/status approved-for-milestone

k8s-github-robot · 2018-03-02T20:05:59Z

[MILESTONENOTIFIER] Milestone Pull Request: Up-to-date for process

@DirectXMan12 @bskiba

Pull Request Labels

sig/autoscaling: Pull Request will be escalated to these SIGs if needed.
priority/critical-urgent: Never automatically move pull request out of a release milestone; continually escalate to contributor and SIG through all available channels.
kind/bug: Fixes a bug discovered during the current release.

k8s-github-robot · 2018-03-02T22:25:53Z

Automatic merge from submit-queue (batch tested with PRs 60732, 60689, 60648, 60704). If you want to cherry-pick this change to another branch, please follow the instructions here.

k8s-ci-robot added the release-note-none, size/L, and cncf-cla: yes labels Mar 1, 2018

k8s-ci-robot requested review from DirectXMan12 and jszczepkowski March 1, 2018 15:25

k8s-ci-robot added the release-note label and removed the release-note-none label Mar 1, 2018

DirectXMan12 added this to the v1.10 milestone Mar 2, 2018

k8s-ci-robot assigned DirectXMan12 Mar 2, 2018

k8s-ci-robot added the lgtm and kind/bug labels Mar 2, 2018

k8s-ci-robot added the approved label Mar 2, 2018

k8s-github-robot added the milestone/incomplete-labels label Mar 2, 2018

k8s-ci-robot added the sig/autoscaling and priority/critical-urgent labels Mar 2, 2018

k8s-github-robot added milestone/needs-approval and removed milestone/incomplete-labels labels Mar 2, 2018

k8s-ci-robot added the status/approved-for-milestone label Mar 2, 2018

k8s-github-robot removed the milestone/needs-approval label Mar 2, 2018

k8s-github-robot merged commit 30eb1aa into kubernetes:master Mar 2, 2018

moonek mentioned this pull request May 16, 2018

HPA not working properly when pod status "Unknown" (node failure) #62845

Closed
