
docs: Self-hosted Kubelet proposal #23343

Merged

Conversation

derekparker
Contributor

Provides a proposal for changes needed with Kubernetes to allow for a
self-hosted Kubelet bootstrap.

@k8s-bot

k8s-bot commented Mar 22, 2016

Can one of the admins verify that this patch is reasonable to test? (reply "ok to test", or if you trust the user, reply "add to whitelist")

This message may repeat a few times in short succession due to jenkinsci/ghprb-plugin#292. Sorry.

Otherwise, if this message is too spammy, please complain to ixdy.


@k8s-github-robot added the kind/design and size/M labels on Mar 22, 2016
@mikedanese
Member

@dchen1107


To expand on this, we envision a flow similar to the following:

1. Systemd (or $init_system) continually runs “bootstrap” Kubelet in “runonce” mode with a file lock until it pulls down a “self-hosted” Kubelet pod and runs it.
Member

assume --runonce-timeout results in a failure exit-code?

Contributor

If there happens to be nothing scheduled to the node (hence hitting the --runonce-timeout), would we want that to be considered an error case?

In terms of the bootstrap kubelet / self-hosted kubelet pivot, I don't think coordinating around exit codes would be strictly necessary. Instead, the coordination point becomes "has another kubelet started", rather than "was I scheduled something". So $init could essentially loop on something like "if lockfile not acquired, start bootstrap kubelet; sleep X".
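
To make that loop concrete, a minimal shell sketch of what $init could do is below; the lock path, kubelet flags, and sleep interval are assumptions for illustration only:

```sh
#!/bin/sh
# Hypothetical supervisor loop: if no kubelet currently holds the lock file,
# start the bootstrap kubelet; otherwise just wait and check again.
LOCKFILE=/var/run/kubelet.lock

while true; do
    if flock -n "$LOCKFILE" -c true; then
        # Nothing held the lock: start the bootstrap kubelet, which acquires
        # the lock itself and runs until a self-hosted kubelet takes over.
        /usr/bin/kubelet --runonce=true    # remaining bootstrap flags omitted
    fi
    sleep 10
done
```

The real coordination would of course live in the kubelet itself; this only shows the "check lock, start, sleep" shape of the loop.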

@derekparker
Contributor Author

cc: @vishh @mikedanese @dchen1107 @aaronlevy

To summarize the results of the in person meetings last Wednesday:

As per @vishh's feedback, we decided that from a system administrator's perspective, having a service continually restart several times would likely be taken as an indication of failure, even though it essentially represents a "loop iteration" until the bootstrap kubelet can pull down the "self-hosted" kubelet (again, this is until we get Taints and Tolerations).

So, instead of modifying the "runonce" code path to be able to contact an API Server, we will modify the default code path of the kubelet with a --bootstrap flag, which indicates that after the kubelet acquires the file lock, it should wait for another kubelet to attempt to acquire that lock (via inotify) and then exit.

I will update the proposal to reflect this new approach. @vishh please let me know if I remembered anything about our discussion incorrectly :).
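
For illustration, a rough Go sketch of the lock/exit handshake described above, using flock plus an inotify watch on the lock file; the lock path and the program structure are assumptions for this sketch, not the actual kubelet change:

```go
// Hypothetical sketch: acquire the kubelet lock file, then exit once inotify
// shows another process opening it (i.e. a self-hosted kubelet attempting to
// take the lock), so that init can stop running the bootstrap kubelet.
package main

import (
	"log"
	"os"

	"golang.org/x/sys/unix"
)

func main() {
	const lockPath = "/var/run/kubelet.lock" // illustrative path

	f, err := os.OpenFile(lockPath, os.O_CREATE|os.O_RDWR, 0600)
	if err != nil {
		log.Fatal(err)
	}
	// Block until we hold the exclusive lock.
	if err := unix.Flock(int(f.Fd()), unix.LOCK_EX); err != nil {
		log.Fatal(err)
	}
	log.Print("lock acquired; acting as the node's kubelet")

	// Watch the lock file: IN_OPEN fires when another kubelet opens it to
	// attempt the lock, which is the signal to step aside.
	ifd, err := unix.InotifyInit()
	if err != nil {
		log.Fatal(err)
	}
	if _, err := unix.InotifyAddWatch(ifd, lockPath, unix.IN_OPEN|unix.IN_DELETE_SELF); err != nil {
		log.Fatal(err)
	}

	buf := make([]byte, 4096)
	if _, err := unix.Read(ifd, buf); err != nil {
		log.Fatal(err)
	}
	log.Print("lock contention observed; exiting so the self-hosted kubelet can take over")
}
```

In the real kubelet the bootstrap instance would keep running pods while it holds the lock; the sketch only covers the pivot signal.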

@vishh
Contributor

vishh commented Apr 4, 2016

@derekparker: Thanks for the summary. LGTM.
The bootstrapping Kubelet will exit once it notices another process attempting to acquire the lock.
It will restart immediately though, and take over the node if the lock were to be released. There will be no restart loops.

During upgrades, the bootstrap kubelet might take over from the old version before it lets the upgraded version run.

It is possible that the bootstrap kubelet version is incompatible with the newer versions that have run on the node. For example, the cgroup configurations might be incompatible.
In the beginning, we will require cluster admins to keep the configuration in sync. Since we want the bootstrap kubelet to come up and run even if the API server is not available, we should persist the configuration for the bootstrap kubelet on the node.
Once we have checkpointing in kubelet, we will checkpoint the updated config and have the bootstrap kubelet use the updated config, if it were to take over.
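
For illustration only, the bootstrap kubelet under this model might be supervised with a unit along these lines (unit name, paths, and flag spelling are assumptions drawn from the discussion above):

```ini
# /etc/systemd/system/bootstrap-kubelet.service (hypothetical)
[Unit]
Description=Bootstrap kubelet; steps aside when a self-hosted kubelet contends for the lock

[Service]
# Configuration is persisted on the node so the bootstrap kubelet can start
# even when the API server is unreachable.
EnvironmentFile=/etc/kubernetes/bootstrap-kubelet.env
ExecStart=/usr/bin/kubelet --bootstrap --lock-file=/var/run/kubelet.lock $KUBELET_OPTS
# Restart immediately after exiting on lock contention; the restarted process
# simply blocks on the lock, so there is no visible crash loop.
Restart=always
RestartSec=5

[Install]
WantedBy=multi-user.target
```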

@aaronlevy
Contributor

@vishh, coming back to this, I'm still not fully clear on the upgrade scenario.

bootstrap+upgrade scenario:

  1. bootstrap-kubelet gets pod for v1-kubelet
  2. v1-kubelet tries to acquire the lock (generating an inotify event for the bootstrap kubelet)
  3. bootstrap-kubelet exits due to the inotify event (but is restarted by init, and is now waiting on the lockfile)
  4. v1-kubelet gets pod for the upgraded v2-kubelet
    • now v1-kubelet is running, while v2-kubelet and bootstrap-kubelet are waiting on the file-lock
  5. v1-kubelet pod definition is deleted, causing v1-kubelet to kill itself
  6. If bootstrap-kubelet wins the lock-file race and starts before v2-kubelet, how does bootstrap-kubelet ever know to die (inotify events are not queued)?

@vishh
Contributor

vishh commented Apr 4, 2016

  1. If bootstrap-kubelet wins the lock-file race and starts before v2-kubelet, how does bootstrap-kubelet ever know to die (inotify events are not queued)?

Wouldn't the health checks on the kubelet pod end up restarting the kubelet?
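
As a sketch of the health check being referred to (image, probe values, and pod shape are assumptions; 10248 is the kubelet's default healthz port), the self-hosted kubelet pod could carry a liveness probe like this:

```yaml
# Hypothetical excerpt of the self-hosted kubelet pod spec: if the kubelet
# container is stuck (e.g. blocked on the lock file) or unhealthy, the probe
# fails and the container is restarted, re-triggering the lock handshake.
apiVersion: v1
kind: Pod
metadata:
  name: self-hosted-kubelet
  namespace: kube-system
spec:
  hostNetwork: true
  containers:
  - name: kubelet
    image: example.com/hyperkube:v1.3.0   # illustrative image
    command: ["/hyperkube", "kubelet", "--lock-file=/var/run/kubelet.lock"]
    livenessProbe:
      httpGet:
        host: 127.0.0.1
        path: /healthz
        port: 10248
      initialDelaySeconds: 30
      timeoutSeconds: 5
```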

@aaronlevy
Contributor

Ah right, that seems like it should work. Thanks.

@derekparker
Contributor Author

I have updated the proposal to reflect the latest discussions.

cc: @aaronlevy @dchen1107 @vishh @mikedanese @derekwaynecarr


## Abstract

In a self-hosted Kubernetes deployment, we have the initial bootstrap problem. This proposal presents a solution to the kubelet bootstrap, and assumes a functioning control plane and a kubelet that can securely contact the API server.
Contributor

What does a functioning control plane mean?

Contributor

  1. Could you describe the bootstrap problem?
  2. At what point in the bootstrap process is a functioning control plane (presumably the apiserver) needed?

Contributor

In terms of the abstract, maybe we can elaborate a bit. Possibly something along the lines of:

When running self-hosted components, there needs to be a mechanism for pivoting from the initial bootstrap state to the kubernetes-managed (self-hosted) state. In the case of a self-hosted kubelet, this means pivoting from the initial kubelet defined & run on the host, to the kubelet pod which has been scheduled to the node.

This proposal presents a solution to the initial kubelet bootstrap, and the mechanism for pivoting to the self-hosted kubelet. This proposal assumes that the initial kubelet on the host is able to connect to a properly configured api-server.


Not sure if the above changes would answer the questions, but:

  1. The bootstrap problem is essentially that we want the kubelet to be managed by kubernetes, but we need an initial kubelet to do that. So we need some mechanism for us to launch a kubelet, then give up control once a new kubelet has started.

  2. A functioning apiserver would be required from the beginning (assuming no checkpointed pods I guess). Otherwise the initial kubelet would behave like any other kubelet without apiserver access.

Contributor

  • I think it'd help to explain why we want the kubelet to be managed by Kubernetes.
  • It's unclear from this proposal who's going to start the initial apiserver. I assume the bootstrap kubelet will have to start the apiserver pod based on the manifest files? In that case, the apiserver wasn't actually functioning at the time the kubelet was being started.

Contributor

I guess I was thinking about this from the perspective of "in terms of this proposal, assume a functioning apiserver" as a means of keeping the discussion more focused. Ultimately the apiserver could be a static pod, or just a binary run directly on the host, or docker container outside k8s, etc.

But you're right that you wouldn't strictly need an apiserver to demonstrate the same functionality. The kubelet pod could be a static pod as well (would be an odd use-case, but should work assuming the kubelet pod doesn't need secrets/configMap/etc). So maybe just drop that line, as it is somewhat orthogonal?

Also, agree that it would be helpful to add a "motivations" section to cover reasons we want self-hosted kubelet.

@vishh
Contributor

vishh commented Apr 12, 2016

@derekparker Completed review pass.

# Proposal: Self-hosted kubelet

## Abstract

Contributor

nit: breaking up a paragraph with new lines will make commenting easier.

@derekparker
Contributor Author

Proposal has been updated.

cc: @aaronlevy @vishh @yujuhong @derekwaynecarr

@derekparker force-pushed the self-hosted-kubelet-proposal branch from 5cb2e42 to a7f4402 on May 2, 2016
@derekparker
Contributor Author

Updated proposal based on @vishh review.

@dchen1107 @derekwaynecarr any thoughts on this?

@derekwaynecarr
Member

@derekparker - apologies for the delay reviewing. I am ok with this as described.

@philips
Contributor

philips commented May 10, 2016

@vishh Can you ask @dchen1107 to review this? Or perhaps we can proceed without her review as you and @derekwaynecarr think that the approach is OK. We have been waiting for 12 days for a review.

@vishh added the lgtm and release-note-none labels and removed the release-note-label-needed label on May 10, 2016
@k8s-bot

k8s-bot commented May 10, 2016

GCE e2e build/test passed for commit a7f4402.

@k8s-github-robot

Automatic merge from submit-queue

@k8s-github-robot merged commit 088694f into kubernetes:master on May 10, 2016
@bgrant0607
Member

@derekparker This PR should have been squashed before merge. Next time, please squash after LGTM.

@derekparker
Contributor Author

@bgrant0607 will do. FWIW, I didn't really have much time and wasn't notified after the lgtm tag was applied before the ok-to-merge tag was applied.

@tmrts
Contributor

tmrts commented May 12, 2016

Adding an ok-to-merge-after-squash label to the submit-queue might be useful.

@bgrant0607
Member

@tmrts File an issue on the contrib repo for that.

@philips
Contributor

philips commented May 14, 2016

The PR for the self-hosted kubelet --bootstrap flag, based on this proposal, is up now: #25596

@cheld
Contributor

cheld commented May 17, 2016

CC @batikanu, @zreigz

@pwittrock
Member

@aaronlevy
@vishh

Would you provide an update on the status of the documentation for this feature, as well as add any PRs as they are created?

Not Started / In Progress / In Review / Done

Thanks
@pwittrock

@aaronlevy
Contributor

I do not believe any documentation has been started. Really the only work which has come out of this proposal thus far is adding the --exit-on-lock-contention flag to the kubelet.

I'm happy to add a blurb about the flag's behavior as it stands if I can get a pointer to the best place to document it.

@pwittrock
Member

Is this a feature that would be used by anyone as-is, or one that changes existing behavior? I see that the flag itself has been documented in the --help output with "Whether kubelet should exit upon lock-file contention." If there are no other user-facing changes, and this flag is not meant to be consumed by users yet, we can wait to document it until the feature is complete.

@pwittrock
Member

From your response, it sounds like no doc changes are required for 1.3.

@aaronlevy
Contributor

Probably not. We should probably update this proposal to reflect the changed flag name though (s/--bootstrap/--exit-on-lock-contention), but I assume that isn't a blocker.
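
For reference, a hedged example of how the bootstrap kubelet is invoked with the flag as it landed; --lock-file and --exit-on-lock-contention are the relevant kubelet flags, and the paths are illustrative:

```sh
# Bootstrap kubelet: hold the lock file and exit as soon as another kubelet
# contends for it, so the self-hosted kubelet can take over; the supervising
# service restarts it, after which it simply blocks on the lock again.
kubelet \
  --lock-file=/var/run/kubelet.lock \
  --exit-on-lock-contention \
  --kubeconfig=/etc/kubernetes/bootstrap-kubeconfig
```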
