Broken test: [k8s.io] SchedulerPredicates [Serial] validates MaxPods limit number of pods that are allowed to run [Slow] #24262
@davidopp to triage. This is breaking kubernetes-e2e-gce-serial right now.
kubernetes-e2e-gce-serial hasn't passed for several days. This is a gating job for this week's release.
I took a quick look at the failed builds of kubernetes-e2e-gce-serial this morning, and all of them are caused by this very test. I quickly checked why and noticed that most of them failed due to fluentd:
23:12:53 Apr 17 23:12:53.746: INFO: fluentd-elasticsearch-jenkins-e2e-minion-8ukz started at (0 container statuses recorded)
kubernetes-e2e-gce-serial is still not passing. Are we going to pass on the alpha release tomorrow?
cc @krousey (build cop for a couple of days), though the issue has been around for a couple of weeks.
ping @davidopp
@gmarek can you take a look at this?
Yup. It's a real bug.
My best guess is that it's caused by parallel computation of bindings, as the number of Pods assigned to a Node is read from NodeInfo, not from the 'AssumedPods' struct. @hongchaodeng
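A minimal Go sketch of the failure mode described in that guess, under the assumption it states: if the pod-count check reads only NodeInfo (pods already bound on the node) and ignores pods that have been assumed but not yet bound, several in-flight bindings can all pass the check for the same node. The types and function names below are illustrative, not the actual scheduler code.

```go
package main

import "fmt"

type nodeInfo struct {
	boundPods   int // pods the apiserver already reports on the node
	allowedPods int // the node's pod capacity (max pods)
}

// fitsMaxPodsStale mirrors the suspected bug: it never looks at assumed pods.
func fitsMaxPodsStale(n nodeInfo) bool {
	return n.boundPods < n.allowedPods
}

// fitsMaxPods also counts assumed (in-flight) pods, which is what the comment
// above argues the predicate should be doing.
func fitsMaxPods(n nodeInfo, assumedPods int) bool {
	return n.boundPods+assumedPods < n.allowedPods
}

func main() {
	n := nodeInfo{boundPods: 109, allowedPods: 110}
	// Two pods evaluated "in parallel" before either binding shows up in NodeInfo:
	fmt.Println(fitsMaxPodsStale(n), fitsMaxPodsStale(n)) // true true -> 111 pods land on the node
	fmt.Println(fitsMaxPods(n, 0), fitsMaxPods(n, 1))     // true false -> the second pod is rejected
}
```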
The fun thing is that tests started to fail in this way on Apr 12th between 12:44 and 16:26 UTC, and as far as I can tell nothing was merged in the scheduler codebase back then.
Run 1038 is the last 'good' one that I was able to find.
How can we reproduce it?
@gmarek Can you clarify in more detail what's happening?
Easily - just run the MaxPods test; it reliably fails :). If you want to dig a bit deeper: the failure shows that the scheduler assigned 111 pods to a given node. It certainly looks like the binding issue, but as I wrote, nothing was merged at the time it started to fail.
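To make that failure concrete, here is a hedged Go sketch of the property the MaxPods e2e test is asserting: no node may end up with more scheduled pods than its pod capacity, and the run discussed here violated it with 111 pods on one node. The helper and types are invented for illustration, not the real e2e framework code.

```go
package main

import "fmt"

type cluster struct {
	podsPerNode map[string]int // node name -> number of pods scheduled onto it
	capacity    int            // max pods per node, e.g. 110
}

// verifyMaxPods checks the invariant the test cares about; a violation is the
// "scheduler assigned 111 pods to a given node" failure quoted above.
func verifyMaxPods(c cluster) error {
	for node, n := range c.podsPerNode {
		if n > c.capacity {
			return fmt.Errorf("scheduler assigned %d pods to node %s (capacity %d)", n, node, c.capacity)
		}
	}
	return nil
}

func main() {
	c := cluster{podsPerNode: map[string]int{"jenkins-e2e-minion-8ukz": 111}, capacity: 110}
	fmt.Println(verifyMaxPods(c))
}
```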
But parallel binding wasn't merged at that point... so there has to be at least one other issue.
Yeah - my point exactly. It may be that we were always broken and some environment change caused this.
Hmm - take a look here: it seems we are not checking the maxPods predicate at all.
#20204 was merged at the time.
(I was looking in the wrong place when checking what was merged back then) - I'll try to revert it.
Yeah - #20204 broke our scheduler. Test for
Sounds like this could have been caught by a unit or integration test.
Yup - if you can convince someone that rewriting the SchedulerPredicates tests into integration tests is a P0 more important than other things, I'm happy to do it :) Or I can help someone.
Automatic merge from submit-queue:
Enforce --max-pods in kubelet admission; previously it was only enforced in the scheduler.
This is an ugly hack - I spent some time trying to understand what one NodeInfo has in common with the other one, but at some point decided that I just don't have time to do that.
Fixes #24262
Fixes #20263
cc @HaiyangDING @lavalamp
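A rough Go sketch of the idea behind that fix, as described in the merge summary: enforce the pod limit at kubelet admission time as well, so a scheduler that over-commits a node cannot actually over-pack it. The types and method names are illustrative, not the actual kubelet admission code.

```go
package main

import "fmt"

type kubelet struct {
	maxPods     int // value of the kubelet's --max-pods flag
	runningPods int // pods currently admitted to this node
}

// canAdmitPod rejects a pod when the node already runs --max-pods pods,
// regardless of what the scheduler decided.
func (k *kubelet) canAdmitPod() (bool, string) {
	if k.runningPods >= k.maxPods {
		return false, fmt.Sprintf("node is at the --max-pods limit (%d)", k.maxPods)
	}
	return true, ""
}

func main() {
	k := &kubelet{maxPods: 110, runningPods: 110}
	ok, reason := k.canAdmitPod()
	fmt.Println(ok, reason) // false node is at the --max-pods limit (110)
}
```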
This isn't a flake, the test is just broken. Starting from here:
http://kubekins.dls.corp.google.com/view/Critical%20Builds/job/kubernetes-e2e-gce-serial/1064/
https://console.cloud.google.com/storage/kubernetes-jenkins/logs/kubernetes-e2e-gce-serial/1064/
I guess maybe something dies while it is scheduling everything?