-
Notifications
You must be signed in to change notification settings - Fork 40.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
scheduler: initialize podsWithAffinity #33967
Conversation
LGTM Thanks! |
BTW the bot says you haven't signed the Linux Foundation CLA... |
@davidopp Just signed. Do you know how can I trigger the CLA check again? |
Not sure. I closed anre reopened the PR, maybe that will help? |
@davidopp FYI, we signed the CNCF corp CLA and it should cover all @coreos.com emails; I escalated to @caniszczyk and @sarahnovotny over email as it isn't working elsewhere. |
Jenkins GCI GCE e2e failed for commit 8ddac82abadc4dc050b64db52404c56ed92ff0ad. Full PR test history. The magic incantation to run this job again is |
Jenkins GCI GKE smoke e2e failed for commit 8ddac82abadc4dc050b64db52404c56ed92ff0ad. Full PR test history. The magic incantation to run this job again is |
@xiang90 I have just tried out this patch on our cluster I still see the nil pointer exception unfortunately. The exact same stack trace. |
@kdima How did you reproduce this? Can you try to print out variable |
I can see the following errors before the first exception happens:
|
@xiang90 yes sure give me a second. It usually happens within 10-15 minutes of provisioning a cluster. I am not quite sure how exactly to reproduce it. |
@xiang90 actually as I was investigating this before I saw that |
That would be helpful. It seems that there are multiple issues with the cache. |
@xiang90 yeap that was my impression as well. I hit quite a few |
@xiang90 yeap |
So I checked whether it is related to nodes being destroyed or created but it does not correlate. |
I just saw the exception on three schedulers (i.e. all of them) running on three separate instances at the same time. I thought it was master elected?
The other box log
|
please also cherrypick this to 1.4.1. |
Removing label |
LGTM Thanks for fixing! |
@xiang90 - thanks a lot for working on it. However, I completely don't understand why the first commit is needed. In fact I'm pretty sure the first commit is not needed and we should just left the second one. I temporary removed LGTM because of that, so can you please clarify it? |
OK. I thought we cannot range over the nil map. But it seems that ranging over a nil map is actually OK.
It is a good practice to initialize the map, but I guess it is not necessary for the cache. I am going to remove the 1st commit. |
@wojtek-t Fixed. PTAL. |
Yes we can. @xiang90 - thanks a lot. LGTM |
@k8s-bot test this [submit-queue is verifying that this PR is safe to merge] |
Automatic merge from submit-queue |
@xiang90 can you open the PR to cherry-pick this into release-1.4 |
#33163-#33227-#33359-#33605-#33967-#33977-#34158-origin-release-1.4 Automatic merge from submit-queue Automated cherry pick of #32914 #33163 #33227 #33359 #33605 #33967 #33977 #34158 origin release 1.4 Cherry pick of #32914 #33163 #33227 #33359 #33605 #33967 #33977 #34158 on release-1.4. #32914: Limit the number of names per image reported in the node #33163: fix the appending bug #33227: remove cpu limits for dns pod. The current limits are not #33359: Fix goroutine leak in federation service controller #33605: Add periodic ingress reconciliations. #33967: scheduler: cache.delete deletes the pod from node specified #33977: Heal the namespaceless ingresses in federation e2e. #34158: Add missing argument to log message in federated ingress
Commit found in the "release-1.4" branch appears to be this PR. Removing the "cherrypick-candidate" label. If this is an error find help to get your PR picked. |
…ck-of-#32914-kubernetes#33163-kubernetes#33227-kubernetes#33359-kubernetes#33605-kubernetes#33967-kubernetes#33977-kubernetes#34158-origin-release-1.4 Automatic merge from submit-queue Automated cherry pick of kubernetes#32914 kubernetes#33163 kubernetes#33227 kubernetes#33359 kubernetes#33605 kubernetes#33967 kubernetes#33977 kubernetes#34158 origin release 1.4 Cherry pick of kubernetes#32914 kubernetes#33163 kubernetes#33227 kubernetes#33359 kubernetes#33605 kubernetes#33967 kubernetes#33977 kubernetes#34158 on release-1.4. kubernetes#32914: Limit the number of names per image reported in the node kubernetes#33163: fix the appending bug kubernetes#33227: remove cpu limits for dns pod. The current limits are not kubernetes#33359: Fix goroutine leak in federation service controller kubernetes#33605: Add periodic ingress reconciliations. kubernetes#33967: scheduler: cache.delete deletes the pod from node specified kubernetes#33977: Heal the namespaceless ingresses in federation e2e. kubernetes#34158: Add missing argument to log message in federated ingress
Without initializing podsWithAffinity, scheduler panics when deleting
a pod from a node that has no pods with affinity ever scheduled to.
Fix #33772
This change is