integration flake: TestMultiScheduler is flaky #22848
Comments
@mml any progress on this?
http://pr-test.k8s.io/23415/kubernetes-pull-test-unit-integration/19839/ is another failure.
Only this one is an actual error.
And... the logging logic doesn't make much sense. It logs "err", but it only gets there if err was nil. I can't see any other logs besides the junit logs, which don't provide any extra info. Am I missing something?
There is a concurrency bug in the test. We signal the scheduler to stop by closing a channel, but we don't synchronize at that point to verify that the scheduler has stopped. Indeed, since
We need to synchronize here, but once we do that, we may need to prevent the deadlock that occurs while
I don't know that this is the bug, but the bug is there and it fits the symptom.
This is similar to the bug I fixed with #22727. The
We should probably redo this interface to avoid the problem, although in practice I bet it only pops up in tests. @davidopp I see two choices: invest in fixing this now, or comment out all the tests that come after the assumption "now we've stopped this scheduler" and fix it later. WDYT?
When you say
Are you talking about everything after step 7?
@davidopp Yes.
I guess it is fine to comment out everything after step 7 for now, if you don't see any easy alternative. The key is that we test that the right scheduler is scheduling the pod, and while steps 8/9 do that, earlier parts of the test also do it and I think they're adequate. Please file an issue related to the problem you described, so we can fix it eventually.
https://storage.googleapis.com/kubernetes-jenkins/pr-logs/pull/22835/kubernetes-pull-test-unit-integration/18111/build-log.txt
https://pantheon.corp.google.com/storage/browser/kubernetes-jenkins/pr-logs/pull/22835/kubernetes-pull-test-unit-integration/18111/?debugUI=CLOUD
@davidopp