Add daemon design doc #13368
Conversation
@bgrant0607 @davidopp @vmarmol @rjnagal @vishh @erictune @timothysc @smarterclayton @eparis (cc'ing people who commented on the original proposal, feel free to add others if I missed anyone!)
GCE e2e build/test failed for commit da64887.
## Motivation

In Kubernetes, a Replication Controller ensures that a specified number of replicas of a specified pod are running in the cluster at all times (pods are restarted if they are killed). With the Replication Controller, users cannot control which nodes their pods run on - Kubernetes decides how to schedule the pods onto nodes. However, many users want control over how certain pods are scheduled. In particular, many users have requested a way to run a daemon on every node in the cluster, or on a certain set of nodes in the cluster. This is essential for use cases such as building a sharded datastore, or running a logger on every node. In comes the daemon controller, a way to conveniently create and manage daemon-like workloads in Kubernetes.
It's not really true that "users cannot control which nodes their pods run on" -- you can use a node selector in your Pod.Spec. The reason why ReplicationController is sub-optimal for this use case is that you can't know what value to set for replicas in the RC spec, since the size of the cluster (or the size of the subset of the cluster defined by the node selector) may vary over time.
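For reference, a minimal sketch of the pattern being described: a ReplicationController whose pod template carries a nodeSelector. All names, labels, and the image below are illustrative; the point is that spec.replicas still has to be pinned to a fixed number even though the count of nodes matching the selector can change over time.

```yaml
apiVersion: v1
kind: ReplicationController
metadata:
  name: node-logger
spec:
  replicas: 5                 # must be hard-coded, even though the matching node count varies
  selector:
    app: node-logger
  template:
    metadata:
      labels:
        app: node-logger
    spec:
      nodeSelector:
        role: logging         # only nodes carrying this label are eligible
      containers:
      - name: logger
        image: example/logger # placeholder image
```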
Will change.
Made a few comments. One stylistic note: in parts of the doc you write in the future tense (e.g. "The daemon controller will support...") while in others you write in the past tense (e.g. "Each component was unit tested."). Probably best to stick to one tense.
@davidopp Thanks for the review, I'll make the changes tomorrow! I'll change all the tenses to the past tense.
Labelling this PR as size/L
@davidopp Sorry for the delay, made the requested changes. PTAL when you have time!
GCE e2e build/test passed for commit f2297d8.
In docs/design/daemon.md:

- commands that get info
  - get (e.g. kubectl get dc)
  - describe
- Modifiers
  - delete
  - stop: first we turn down all the pods controlled by the daemon (by setting the nodeName to a non-existent name). Then we turn down the daemon controller.
  - label
  - update
- Daemon controllers have labels, so you could, for example, list all daemon controllers with a certain label (the same way you would for a Replication Controller).
- In general, for all the supported features like get, describe, update, etc., the Daemon Controller works in a similar way to the Replication Controller. However, note that the Daemon Controller and the Replication Controller are different constructs.
I think that the update needs to be explained a bit more carefully. Will there be a rolling update to limit the number of instances that are changing? The description of the daemon controller's operation doesn't seem to allow for that.
We'd probably describe this in deployment terminology as MaxAvailability: 80% and MaxSurge: 0. The deployment process would need a way to make the daemon controller "inert" or "inactive", create a new daemon controller, and then selectively delete old pods.
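As a rough illustration of the knobs being referred to, expressed in Deployment-style fields (keeping at least 80% of pods available, with no extra pods created during the update), the strategy might look like the sketch below. This is not a field layout the daemon controller proposal defines; whether and how a daemon controller would expose such a strategy is exactly the open question here.

```yaml
# Illustrative Deployment-style update strategy; the daemon controller
# proposal does not define these fields.
strategy:
  type: RollingUpdate
  rollingUpdate:
    maxUnavailable: 20%   # keep at least 80% of daemon pods available during the update
    maxSurge: 0           # create no extra pods (for a daemon: never a second pod on a node)
```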
I agree you should provide some description of a plausible update process. Even if it is managed by the deployment controller, I assume we need some way for the deployment controller to tweak the daemon controller to accomplish the update? (Or would the deployment controller manage the update directly?)
Also, since one of the main purposes of the daemon controller is to guarantee one replica per node, you probably want to kill the old pod before starting the new pod (i.e. the application may break if there are two instances running on the same node).
I don't want to rathole on updates in the initial implementation, but I envision Daemon orchestrating the updates, analogous to (but not the same as) Deployment.
For now I just specified which fields can be updated.
@AnanyaKumar one thing I think is missing is QoS, and possibly priority of some kind. e.g.
**Author**: Ananya Kumar (@AnanyaKumar)

**Status**: Draft proposal; prototype in progress.
Implementation is in progress.
labels:
  name: datastore-shard
spec:
  node-selector:
This should be nodeSelector: https://github.com/kubernetes/kubernetes/blob/master/pkg/api/v1/types.go#L1215. node-selector violates our API conventions: https://github.com/kubernetes/kubernetes/blob/master/docs/devel/api-conventions.md#naming-conventions
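A sketch of how that fragment might read with the conventional camelCase spelling; the nesting under metadata and the node label key/value are assumptions for illustration, not part of the proposal.

```yaml
metadata:
  labels:
    name: datastore-shard
spec:
  nodeSelector:          # camelCase, per the API naming conventions
    role: datastore      # illustrative node label
```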
Can we merge this before v1.1, considering the DaemonSet will have been entirely merged? @AnanyaKumar do you have time to address comments? Thanks for all your work. I would love to have some accompanying doc included in v1.1.
@mikedanese Sorry for the delay, I had a few big deadlines. Is @erictune taking over this, based on the PR he sent out? Or is that for something else? Thanks for finishing up the DaemonSet PRs! :)
I don't mind taking over this PR and making the necessary changes given how soon the code freeze is, if @AnanyaKumar doesn't have time. But I've pinged #14079 to see if @erictune is trying to replace this doc or just augment it.
I'm taking over this PR. I am going to send out a new PR that incorporates the review comments made since Ananya's last commit. In the case of comments that merit a reply, I will reply in this PR (not the new one) since there's no way I'm aware of to move the comments over.
@davidopp Thanks for taking over, and I'm so sorry for not revising the design doc - I've been pretty occupied at school, and spent the weekend at a hackathon. If you have any questions feel free to ping me.
Originally started as PR kubernetes#13368.
Superseded by #14326. Please put any followup comments there.