Update Rescheduler's manifest #65454

Merged
merged 1 commit into from
Jun 26, 2018

Conversation

bsalamat
Member

What this PR does / why we need it: Updates Rescheduler's manifest to use version 0.4.0

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #

Special notes for your reviewer:

Release note:

Update Rescheduler's manifest to use version 0.4.0.

@bsalamat bsalamat added this to the v1.11 milestone Jun 25, 2018
@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Jun 25, 2018
@bsalamat bsalamat requested a review from yguo0905 June 25, 2018 23:42
@bsalamat bsalamat added sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling. kind/feature Categorizes issue or PR as related to a new feature. status/approved-for-milestone labels Jun 25, 2018
@vishh
Contributor

vishh commented Jun 25, 2018

@bsalamat can you add more details on what this change means to k8s on GCE?

@jberkus

jberkus commented Jun 26, 2018

adding priority

/priority critical-urgent

@k8s-ci-robot k8s-ci-robot added the priority/critical-urgent Highest priority. Must be actively worked on as someone's top priority right now. label Jun 26, 2018
@bsalamat
Member Author

The Rescheduler has been changed to work with the scheduler's preemption. The latest version evicts Pods only when a critical DaemonSet Pod cannot be scheduled; other critical system Pods rely on the scheduler's preemption logic to be scheduled.
Before these changes, the Rescheduler would evict Pods to make room for any critical system Pod. Using an older version of the Rescheduler in Kubernetes 1.11 could therefore cause double preemption (once by the default scheduler and once by the Rescheduler) when a critical system Pod remains pending due to a lack of resources in the cluster.
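
The behavior change described above can be illustrated with a minimal sketch. This is not the Rescheduler's code: `PendingPod`, `rescheduler_should_evict`, and `preempting_components` are hypothetical names, the two boolean fields are simplifications, and the sketch assumes that scheduler preemption applies to critical Pods that are not DaemonSet Pods.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PendingPod:
    """Hypothetical, simplified view of a pending Pod (not a Kubernetes API type)."""
    name: str
    is_critical: bool         # assumed flag: the Pod is a critical system Pod
    owned_by_daemonset: bool  # assumed flag: the Pod is managed by a DaemonSet


def rescheduler_should_evict(pod: PendingPod, new_behavior: bool) -> bool:
    """Return True if the Rescheduler would evict other Pods to fit `pod`.

    new_behavior=True models the latest version as described in this thread;
    new_behavior=False models the behavior before the recent changes.
    """
    if not pod.is_critical:
        return False
    if new_behavior:
        # Latest version: act only for critical DaemonSet Pods.
        return pod.owned_by_daemonset
    # Older versions: act for any critical system Pod.
    return True


def preempting_components(pod: PendingPod, new_behavior: bool) -> List[str]:
    """List the components that may preempt or evict on behalf of `pod`."""
    components = []
    if pod.is_critical and not pod.owned_by_daemonset:
        # Assumption: scheduler preemption covers critical non-DaemonSet Pods.
        components.append("default-scheduler")
    if rescheduler_should_evict(pod, new_behavior):
        components.append("rescheduler")
    return components
```

Under these assumptions, a critical Pod that is not a DaemonSet Pod yields `["default-scheduler", "rescheduler"]` with the older behavior (the double preemption case) and only `["default-scheduler"]` with the new behavior.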

@vishh
Contributor

vishh commented Jun 26, 2018

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jun 26, 2018
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: bsalamat, vishh

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-github-robot

[MILESTONENOTIFIER] Milestone Pull Request: Up-to-date for process

@bsalamat @vishh

Pull Request Labels
  • sig/scheduling: Pull Request will be escalated to these SIGs if needed.
  • priority/critical-urgent: Never automatically move pull request out of a release milestone; continually escalate to contributor and SIG through all available channels.
  • kind/feature: New functionality.
Help

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jun 26, 2018
@ravisantoshgudimetla
Contributor

/retest

@AishSundar
Contributor

/retest

@AishSundar
Contributor

/retest

@fejta-bot

/retest
This bot automatically retries jobs that failed/flaked on approved PRs (send feedback to fejta).

Review the full test history for this PR.

Silence the bot with an /lgtm cancel comment for consistent failures.

@bsalamat
Member Author

/retest

@yguo0905
Contributor

/retest

@yguo0905
Contributor

/retest

@ravisantoshgudimetla
Contributor

Is this a network issue? Skimming through the logs, I keep seeing the following error:

{default-scheduler } FailedScheduling: 0/5 nodes are available: 1 node(s) were unschedulable, 5 node(s) had unavailable network, 5 node(s) were not ready.

@bsalamat
Member Author

I think tests should be fine now, if they finish!

@ravisantoshgudimetla
Contributor

@bsalamat So, what changed to make the tests pass? Was this related to something on the infra side?

@yguo0905
Contributor

The tests were failing because the upgraded rescheduler:v0.4.0 image had not been published yet. They pass now that the image is available.

@k8s-github-robot

/test all [submit-queue is verifying that this PR is safe to merge]

@ravisantoshgudimetla
Contributor

@yguo0905 Thanks, but I think the scheduler wouldn't throw those errors and rescheduler pod won't be in Pending state for a long time if there is an issue with container image not available in registry.

@bsalamat
Member Author

@ravisantoshgudimetla All the tests passed after we uploaded the image. The merge bot is running the tests again for merging. The issue was definitely caused by the absence of the image.

@yguo0905
Contributor

> rescheduler pod won't be in Pending state for a long time if there is an issue with container image not available in registry

It's a static pod, which I guess makes a difference here.

@k8s-github-robot

Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions here.

@k8s-github-robot k8s-github-robot merged commit 35d5daa into kubernetes:master Jun 26, 2018
@bsalamat bsalamat deleted the rescheduler_version branch June 26, 2018 22:08
k8s-github-robot pushed a commit that referenced this pull request Jun 27, 2018
…54-upstream-release-1.11

Automatic merge from submit-queue.

Automated cherry pick of #65454: Update Rescheduler's manifest

Cherry pick of #65454 on release-1.11.

#65454: Update Rescheduler's manifest
Labels
  • approved: Indicates a PR has been approved by an approver from all required OWNERS files.
  • cncf-cla: yes: Indicates the PR's author has signed the CNCF CLA.
  • kind/feature: Categorizes issue or PR as related to a new feature.
  • lgtm: "Looks good to me", indicates that a PR is ready to be merged.
  • priority/critical-urgent: Highest priority. Must be actively worked on as someone's top priority right now.
  • release-note: Denotes a PR that will be considered when it comes time to generate release notes.
  • sig/scheduling: Categorizes an issue or PR as relevant to SIG Scheduling.
  • size/XS: Denotes a PR that changes 0-9 lines, ignoring generated files.