-
Notifications
You must be signed in to change notification settings - Fork 40k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
WIP: Catch kubelet-master hostname mismatch during health check #4497
Conversation
Thanks for your pull request. It looks like this may be your first contribution to a Google open source project, in which case you'll need to sign a Contributor License Agreement (CLA) at https://cla.developers.google.com/. If you've already signed a CLA, it's possible we don't have your GitHub username or you're using a different email address. Check the information on your CLA or see this help article on setting the email on your git commits. Once you've done that, please reply here to let us know. If you signed the CLA as a corporation, please let us know the company's name. |
Hostname string | ||
Health string | ||
} | ||
var healthResponse HealthResponse |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this approach is okay this struct can be generalized and moved somewhere where it can be shared by kubelet and master code. I think one of the types.go files from what I have read sofar
Is this an illegal condition? See https://github.com/GoogleCloudPlatform/kubernetes/blob/8b25b43039f03e97ca930b2f0c8099253948ec5f/pkg/kubelet/server/server.go#L131 Edit: Ahh never mind, I see you've addressed this in your comment. A better question is when is this an illegal condition and when is it legal? |
@swagiaal Is a Red Hatter. That should cover his CLA requirements. |
@mikedanese so for example in one situation if you do not provide a hostnames override the pod thinks its hostname is |
In the case where the master thinks the machines hostname is host.example.com but the minion thinks it's hostname is localhost.localdomain the apiserver will schedule work for host.example.com but the minion will look for work for localhost.localdomain. Thus the minion will never pick up work to do! |
Okay. That makes sense. |
@eparis gotcha! |
@mikedanese I have updated the patch to use the hostname in the header of the request |
I like this a lot better. Looks like you will have to send a Host header with curl here as well https://github.com/GoogleCloudPlatform/kubernetes/blob/master/hack/lib/util.sh#L21. It would also be worthwhile to investigate downstream effects (if any) of this change. |
@mikedanese curl sets the Host header but I had to add a --hostname_override="127.0.0.1" to match it. What do you mean by downstream effects BTW ? I am looking into the test failure... |
@swagiaal particularly I was thinking about monit monitoring on that healthcheck with a localhost url, which would cause that check to fail. From a skim, I don't think that was an issue as I couldn't find any monit monitoring in the salt configs for kubelet. |
@mikedanese Hmmm.. isnt that a valid use case though ? I am starting to think relying on the Host header might be a bit presumptuous. |
@mikedanese I like what you are proposing in #4646. If we have that then we can use the Host header more comfortably since consumer of healthz which only want to do a simple ping would have the option of using healthz/ping |
During the kubelet's /healthz responce check to see if the hostname used by the master matches the hostname the kubelet knows itself by. If not fail the health check. Signed-off-by: Sami Wagiaalla <swagiaal@redhat.com>
Ping... Can someone review this for me |
LGTM. Merging. |
WIP: Catch kubelet-master hostname mismatch during health check
@brendandburns thanks! |
Hello all,
I have seen this bug a few times and run into it myself. For a variety of reasons a minion ends up with an incorrect hostname override. So it thinks its hostname is one thing while the master thinks it is something else. 'get minions' then shows the node as ready but pods never start on it.
This patch adds a hostname comparison as part of the node health check. @eparis had suggested using pkg/healthz.go although the health check code as changed since then.
Is this approach ok ? (expading the /healthz repsonse, checking against 'host' in 'HealthCheck')