Bumping ContainerVM image is breaking kubemark #25949
One of these PRs is the culprit: Bump GCE ContainerVM to container-v1-3-v20160517
ok, all the hollow node pods got scheduled...
Does the hollow master of the interior cluster not get logged?
What do you mean by that?
@hongchaodeng that's the outer master, is there no inner master? what do the hollow nodes register with?
yes - the hollow nodes register with the inner master; let me take a look
Looking into it. It seems that only ~370-380 kubelets ever register with the kubemark master.
What @bprashanth wrote above seems like a problem. |
To be honest, those errors suggest that maybe this PR is in fact the culprit: Bump GCE ContainerVM to container-v1-3-v20160517. I will try to verify it.
In other words, it seems to me that the problem is somewhere around kubelet/docker/node, and the quantity change by @smarterclayton doesn't seem to affect that. On the other hand, the second change clearly touches that part of the system.
So far I verified that I'm getting exactly the same issues when running locally.
OK - so I locally reverted the ContainerVM bump PR, and this fixes the problem for me.
OK - reverting to the previous ContainerVM image solves the problem. So we clearly have some incompatibility between Docker 1.9 and 1.11 which I don't have time to debug. So I'm reassigning this to @dchen1107 for a decision on how to proceed.
kubelet.log is full of errors like this, as the other issue suggested:
It seems to be caused by the fact that runc creates a unique session key for each container. |
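If runc allocates a fresh session keyring per container, the per-user kernel key quota is the limit that would get exhausted. A minimal way to check that theory on a node (these are the standard Linux /proc interfaces for key quotas, nothing kubemark-specific):

```shell
#!/bin/sh
# Every session keyring counts against the owning user's key quota.
# The kernel exposes the limits as sysctls:
echo "maxkeys (per-user quota, non-root): $(cat /proc/sys/kernel/keys/maxkeys)"
echo "root_maxkeys (quota for root):      $(cat /proc/sys/kernel/keys/root_maxkeys)"
# Current usage per uid; each line shows usage, keys/maxkeys, bytes/maxbytes:
cat /proc/key-users
```

If the keys count for the uid running the containers is pinned at its maxkeys value, new containers will fail to start until keys are reclaimed (or the sysctl is raised), which would match a hard ceiling on how many hollow nodes come up.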
Didn't see this issue, I updated with #25951 (comment) |
If we reached that quota limit, how come our Docker performance tests & e2e density tests didn't catch the issue?
Failed four times in a row now.
https://console.cloud.google.com/storage/kubernetes-jenkins/logs/kubernetes-kubemark-500-gce/3307/