Proposal for implementing nominal services AKA StatefulSets AKA The-Proposal-Formerly-Known-As-PetSets #18016
fc13e71 to cc9eb23
Like a replication controller, a PetSet may be targeted by an autoscaler. The PetSet makes no assumptions
about upgrading or altering the pods in the set (similar to a DaemonSet) - instead, the user can trigger
graceful deletion and the PetSet will replace the terminated member with the newer template once it exits.
Future proposals may offer update capabilities. A PetSet requires RunAlways pods.
I think you mean RestartPolicyAlways (or "a restart policy of Always")
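The replacement rule in the quoted paragraph (the user deletes a member, and the set recreates it from the newer template once it exits) can be sketched as a toy model. This is only an illustration in Python, not the controller's implementation: the function names, the `db-N` member names, and the integer `template_version` are all hypothetical stand-ins for pods and the pod template.

```python
def scale_to(members, prefix, replicas, template_version):
    """Create any missing members from the current template (names are hypothetical)."""
    for i in range(replicas):
        name = f"{prefix}-{i}"
        if name not in members:
            members[name] = {"template": template_version, "terminating": False}


def request_delete(members, name):
    """User-triggered graceful deletion: mark the member, do not replace it yet."""
    members[name]["terminating"] = True


def reconcile(members, exited, template_version):
    """Replace members that were deleted and have fully exited."""
    for name in exited:
        member = members.get(name)
        if member is not None and member["terminating"]:
            # Same name, current template: the only path by which a member is updated.
            members[name] = {"template": template_version, "terminating": False}


members = {}
scale_to(members, "db", 3, template_version=1)

# The template moves to version 2, but running members are left alone.
request_delete(members, "db-1")
reconcile(members, exited=[], template_version=2)        # still shutting down: no replacement
reconcile(members, exited=["db-1"], template_version=2)  # exited: recreated from version 2
print(members["db-1"]["template"], members["db-0"]["template"])  # prints "2 1"
```

In this model, changing the template alone alters nothing. A member picks up the new template only after it has been deleted and has exited, which is the manual update path the paragraph describes.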
I like this design a lot. I think this is going to make a lot of users very happy, and will make it practical for normal users to deploy applications that today require bizarre contortions (like creating one controller per pod).
Nice proposal, and looking forward to this! I am one such user, looking forward to moving away from one RC per instance to achieve this.
I propose RabbitMQ as an active-active example. It also has some specific requirements, being master-master.
This requires using network storage. I would prefer to have the choice of whether or not my Pets can be rescheduled onto another node. If that were possible, using local storage on the cluster would be easy.
cc @kubernetes/rh-cluster-infra @kubernetes/rh-scalability
Thanks, will add rabbit as an example.
This is a broader topic, and I agree it is important. I will add a section [...]
It is likely that locality is associated with volumes, not the sets - so as [...]
If you care about locality today the DaemonSet is the appropriate tool. I [...]
So the key sticking point that I'm missing is gravity, forgiveness, and recovery. Once a pet has found its home, it's not going to want to leave unless there is a maintenance / migration plan. Otherwise many clustered systems will attempt to recover from a failure condition when in fact it was a planned outage.
The first two are separate issues and while they can and should be [...]
I'm actually going to walk back on saying update is out of scope a bit - a [...]
Forgiveness is described in another issue but we can implement this without [...]
For recovery, is this post pod death recovery? The new pod is allowed to [...]
replicas exist as quickly as possible (by creating new pods as soon as old ones begin graceful deletion, for
instance). In addition, pods by design have no stable network identity other than their assigned pod IP,
which can change over the lifetime of a pod resource. ReplicaSets are best leveraged for shared-nothing,
zero-coordination software.
other applicable adjectives: stateless, embarrassingly parallel, fungible
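The contrast this hunk draws can be made concrete with a small sketch. It assumes simplified pod records with only `name` and `terminating` fields, and both function names are hypothetical. The first function mirrors the quoted ReplicaSet behavior (a replacement is created as soon as the old pod begins graceful deletion). The second is one possible reading of the "replace once it exits" rule proposed for the new set, not a specified algorithm.

```python
def replica_set_missing(desired, pods):
    """Interchangeable replicas: a terminating pod no longer counts, so its
    replacement can be created while the old pod is still shutting down."""
    active = sum(1 for pod in pods if not pod["terminating"])
    return max(0, desired - active)


def identity_set_missing(desired_names, pods):
    """Named members: a name stays occupied until its pod is completely gone,
    so a replacement never overlaps with the pod it replaces."""
    occupied = {pod["name"] for pod in pods}
    return [name for name in desired_names if name not in occupied]


pods = [
    {"name": "web-0", "terminating": False},
    {"name": "web-1", "terminating": True},  # graceful deletion in progress
]

print(replica_set_missing(2, pods))                        # prints 1
print(identity_set_missing(["web-0", "web-1"], pods))      # prints []
print(identity_set_missing(["web-0", "web-1"], pods[:1]))  # prints ['web-1']
```

For fungible, stateless workloads the eager rule keeps capacity up. For members with a stable identity, waiting avoids two pods claiming the same name at once.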
@solsson https://github.com/kubernetes/charts is probably a great place for your kafka example.
/subscribe
@smarterclayton Thanks for the proposal! We are using PetSet, and we find these additional features useful for our use case:
Will they be supported in the future? Thanks!
GCE e2e build/test passed for commit b9d998f.
We can integrate with deployment to achieve this (perhaps in the future).
Did you try the initialized annotation?
Very glad to know it. PetSet and deployment are used for different use cases, and I think PetSet needs it too. Is there any plan to support it in PetSet?
If I understand it correctly, it is used for initialization and could not be used for a rolling update after the app has been running for some time. Could you explain it more? Thanks!
Will update this after the rename and the pod safety proposal #34160 is reviewed.
Proposal has been updated to reflect the changes to naming as decided in #27430 and has included the beta and GA criteria as described. @bprashanth I think this is ready for merge and subsequent changes can be reflected as updates.
b9d998f to 3f059e7
LGTM. Comments were nits; I'm fine with fixing them or letting it in as is.
* Add examples
* Discuss failure modes for various types of clusters
* Provide an active-active example
* Templating proposals need to be argued through to reduce options
aren't all these already done?
rebalances.
* Active-active
  * Galera - has multiple active masters which must remain in sync
  * ???
master slave? (non-quorum, unilateral master)
## Design Assumptions
In retrospect these assumptions feel a little pessimistic. We've discussed:
* External access direct to cluster members is out of scope
* No built-in update
* Limited scaling

on issues for longer than we should if we were just designing something that ignores any of that ;)
@bprashanth Out of curiosity - does the "limited scaling" also include limited down-scaling? Is it possible to say I can't go lower than 3-4 replicas?
## Proposed Design

Add a new resource to Kubernetes to represent a set of pods that are individually distinct but each
individual can safely be replaced-- the name **StatefulSet** (working name) is chosen to convey that the
no longer (working name)?
individual can safely be replaced-- the name **StatefulSet** (working name) is chosen to convey that the
individual members of the set are themselves "members" and thus each one is preserved. A relevant analogy
is that a StatefulSet is composed of members, but the members are like goldfish. If you have a blue, red, and
yellow goldfish, and the red goldfish dies, you replace it with another red goldfish and no one would
goldfish analogy doesn't work half as well
shame, it was an awesome analogy.
Requested features:

* IPs per member for clustered software like Cassandra that cache resolved DNS addresses that can be used outside the cluster (scope growth)
Might be worth noting that a service per pod might also not solve this case, because there is a delay in updating endpoints.
StatefulSets now? Oh boy. I guess this is just the nature of alpha / work in progress features. Will get started refactoring now. ... also, isn't the idea behind petsets / statefulsets keeping things healthy with persistent identities rather than culling off unhealthy pets? I certainly would NOT kill off one of my pets if it got sick. And I don't think Cassandra would appreciate random nodes dying, either.
3f059e7 to 14258ce
Applied prashanth's changes - labelling. Thanks for round 1 of feedback :)
LGTM (we're just a month away from 1 year to merge though)
The amount of comments definitely breaks github, for sure.
Automatic merge from submit-queue
Requested features:

* IPs per member for clustered software like Cassandra that cache resolved DNS addresses that can be used outside the cluster
Does this mean it would be possible to access a specific cluster member from the outside of the cloud? Or by outside of the cluster you mean "within the same namespace"?
Automatic merge from submit-queue Proposal for implementing nominal services AKA StatefulSets AKA The-Proposal-Formerly-Known-As-PetSets This is the draft proposal for kubernetes#260.
This is the draft proposal for #260.