Do not count failed pods as unready in HPA controller #60648

Merged: 1 commit, merged Mar 2, 2018

@bskiba (Member) commented Mar 1, 2018

What this PR does / why we need it:
Currently, when performing a scale-up, the HPA controller treats any failed pods (which can be present, for example, after evictions performed by the kubelet) as unready. Unready pods are treated as if they had 0% utilization, which slows down or even blocks the scale-up.
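To see how counting failed pods as unready can block a scale-up, here is a simplified worked example. It assumes per-pod utilization is averaged against a single target (the controller actually works from resource requests), a 50% target, two ready pods at 100%, two evicted (failed) pods, and a tolerance of 0.1 around a usage ratio of 1.0:

```math
\begin{aligned}
r  &= \frac{\sum_{i \in \text{ready}} u_i}{n_{\text{ready}} \, u_{\text{target}}} = \frac{100 + 100}{2 \cdot 50} = 2.0 \\
r' &= \frac{\sum_{i \in \text{ready}} u_i + 0 \cdot n_{\text{unready}}}{(n_{\text{ready}} + n_{\text{unready}}) \, u_{\text{target}}} = \frac{200}{4 \cdot 50} = 1.0
\end{aligned}
```

Because $r'$ lands within the tolerance of 1.0, the recomputed ratio no longer asks for more replicas and the replica count is left unchanged, even though the pods that are actually running sit at twice the target. With the failed pods left out, $r = 2.0$ stands and the count would go to $\lceil 2.0 \cdot 2 \rceil = 4$.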

After this change, failed pods are ignored in all calculations, so they influence neither scale-up nor scale-down replica calculations.

@MaciekPytel @DirectXMan12

Which issue(s) this PR fixes:
Fixes #55630

Special notes for your reviewer:

Release note:

Stop counting failed pods as unready in the HPA controller, so that failed pods no longer incorrectly affect the scale-up replica count calculation.

@k8s-ci-robot added the release-note-none, size/L, and cncf-cla: yes labels Mar 1, 2018
@DirectXMan12 (Contributor) commented:

This should probably have a release note filled out, because it's a change in behavior.

@k8s-ci-robot added the release-note label and removed the release-note-none label Mar 1, 2018
@bskiba (Member, Author) commented Mar 1, 2018

Fair point, added.

@bskiba (Member, Author) commented Mar 1, 2018

@DirectXMan12 Since the bug seems to be quite an inconvenience (the only workaround I know of is to manually remove the evicted pods, and since at least 1.7.5 evicted pods seem to stay around for a fairly long time - #55051 (comment)), do you think this could go into 1.10?

@DirectXMan12 (Contributor) commented Mar 2, 2018

yeah, I'll add it to the milestone. This seems like it could prevent the HPA from working at all, which makes it a decently bad bug.

@DirectXMan12 added this to the v1.10 milestone Mar 2, 2018
@DirectXMan12 (Contributor) commented:

/kind bug
/approve
/lgtm

@k8s-ci-robot added the lgtm and kind/bug labels Mar 2, 2018
@k8s-ci-robot (Contributor) commented:

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: bskiba, DirectXMan12

@k8s-ci-robot added the approved label Mar 2, 2018
@DirectXMan12 (Contributor) commented:

/sig autoscaling
/priority critical-urgent

@k8s-ci-robot added the sig/autoscaling and priority/critical-urgent labels Mar 2, 2018
@DirectXMan12 (Contributor) commented:

/status approved-for-milestone

@k8s-github-robot commented:

[MILESTONENOTIFIER] Milestone Pull Request: Up-to-date for process

@DirectXMan12 @bskiba

Pull Request Labels
  • sig/autoscaling: Pull Request will be escalated to these SIGs if needed.
  • priority/critical-urgent: Never automatically move pull request out of a release milestone; continually escalate to contributor and SIG through all available channels.
  • kind/bug: Fixes a bug discovered during the current release.

@k8s-github-robot commented:

Automatic merge from submit-queue (batch tested with PRs 60732, 60689, 60648, 60704).

Labels: approved, cncf-cla: yes, kind/bug, lgtm, priority/critical-urgent, release-note, sig/autoscaling, size/L
Development: successfully merging this pull request may close the issue "HPA not scaling due to evicted pods".
4 participants