e2e flake: Downward API should provide pod IP as an env var #20403
Previous occurrence: #13690
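For context, the flaking test exercises the Downward API path that exposes status.podIP as a container environment variable. A minimal sketch of such a pod using the current k8s.io/api Go types (the pod name, container name, and image here are illustrative, not the actual e2e fixture):

```go
package main

import (
	"fmt"

	v1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// downwardAPIPod builds a pod whose POD_IP env var is populated by the
// kubelet from the pod's status.podIP via the Downward API. The flake is
// that this variable occasionally comes up empty.
func downwardAPIPod() *v1.Pod {
	return &v1.Pod{
		ObjectMeta: metav1.ObjectMeta{Name: "downward-api-pod-ip"}, // illustrative name
		Spec: v1.PodSpec{
			RestartPolicy: v1.RestartPolicyNever,
			Containers: []v1.Container{{
				Name:    "env-test",
				Image:   "busybox",
				Command: []string{"sh", "-c", "echo POD_IP=$POD_IP"},
				Env: []v1.EnvVar{{
					Name: "POD_IP",
					ValueFrom: &v1.EnvVarSource{
						FieldRef: &v1.ObjectFieldSelector{FieldPath: "status.podIP"},
					},
				}},
			}},
		},
	}
}

func main() {
	fmt.Println(downwardAPIPod().Spec.Containers[0].Env[0].Name)
}
```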
@gmarek is this in a normal e2e? In a soak env?
It's a normal e2e.
@gmarek I mean, the context in which the failure was discovered -- previously this happened only in a soak env. I'm trying to reproduce now.
Yup, I understand. It happened in our main test suite.
Ran this about 500 times so far in a loop w/o a failure.
Wasn't able to repro today, will continue tomorrow.
Can we decorate logs (in a non-horrible way) to make a failure more visible? That is a viable answer.
@thockin I was doing just that and found that the information is present in the logs, in a log message related to the hosts file mount. Analyzing now, it looks like on the node where the flake happened, a little over half the containers started with the podIP unknown.
^^ example message. So, it looks like the underlying issue here is happening much more often than I personally thought. Will try to debug.
Never mind, it looks like most of the containers without a pod IP are infra containers. I remember now that this is expected for the infra container.
Looks like it happened 4 times to actual containers on this node that flaked during e2e.
This is flaking 1 or 2 times per day on kubernetes-e2e-gce.
Bumping to P0.
@pmorie status on this?
@thockin the fix PR has LGTM; running through Jenkins now.
@thockin correction, the PR has a regression for kube-managed /etc/hosts :-( we'll need to update it.
I think I've figured out why this is flaking. Here's what I'm seeing locally, and also what I've noticed from one of the GCE flakes:
It looks something like this in the logs (with lots of snipping):
@wojtek-t I'm working on an e2e that captures the existing failure; hope to have a PR in the next day or so.
I was testing Paul's fix for this on Friday, but unfortunately the lack of #22607 in the tree resulted in a different error condition. Instead of failing because POD_IP was blank, it failed because the pod was marked as Failed and the container never ran.
@ncdc I think the problem originated from relying on passing the pod IP to downstream functions by modifying kubernetes/pkg/kubelet/dockertools/manager.go line 1887 (in b1a6ee2).
@pmorie's PR (#22666) builds on top of this hack. The call stack that needs to be modified:
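To illustrate the shape of the problem being described, here is a hypothetical sketch, not the actual manager.go code, with made-up names: the pod IP gets recorded as a side effect of creating the infra container, and later container starts silently depend on that side effect, so a sync iteration that skips infra creation sees an empty IP.

```go
// Hypothetical sketch of the fragile pattern, not the real kubelet code.
package sketch

type podSync struct {
	podIP string // only populated on the path that creates the infra container
}

// createInfraContainer sets podIP as a side effect (the "hack" referenced above).
func (s *podSync) createInfraContainer(assignedIP string) {
	s.podIP = assignedIP
}

// startAppContainer reads the side-effected field; when the infra container
// already existed and createInfraContainer was never called in this sync
// iteration, s.podIP is "" and the Downward API env var resolves to empty.
func (s *podSync) startAppContainer(makeEnv func(podIP string) []string) []string {
	return makeEnv(s.podIP)
}
```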
/cc @kubernetes/sig-node, FYI.
@yujuhong cool, thanks for the detailed explanation. Where in SyncPod does the pod IP exist other than the line you linked above, which only gets called when the infra container is created?
If the infra container was not created in the same sync iteration, you should be able to get the pod IP from
Ok, so if we put if/else logic in for that check, then we can pass the pod IP around as needed, right?
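A minimal sketch of that if/else, with hypothetical helper and parameter names, just to pin down the flow being proposed: resolve the pod IP up front, either from the freshly created infra container or from the existing pod status, and pass it explicitly into every container start.

```go
// Hypothetical sketch of the proposed flow; names are illustrative.
package sketch

// resolvePodIP picks the pod IP regardless of whether the infra container was
// created in this sync iteration.
func resolvePodIP(infraCreatedThisSync bool, newInfraIP, statusPodIP string) string {
	if infraCreatedThisSync {
		// IP just obtained while setting up the new infra container's network.
		return newInfraIP
	}
	// Infra container already existed; reuse the IP recorded in the pod status.
	return statusPodIP
}

// Callers would then pass the result down the call stack explicitly,
// e.g. podIP := resolvePodIP(createdInfra, newIP, status.IP), instead of
// relying on a field mutated as a side effect of infra container creation.
```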
Has anyone hit this recently? AFAIK, it can be closed. I haven't seen it linked to anything recently.
http://kubekins.dls.corp.google.com/view/Critical%20Builds/job/kubernetes-e2e-gce/10735/
I'm not sure who's responsible for the Downward API. cc @lavalamp @davidopp @bgrant0607