Provide a way to relabel controllers and their pods #36897
cc @mfojtik
@Kargakis Sorry, quick comments for now. We may want to think about a subresource, similar to /scale, that could do this uniformly across workload APIs. Maybe we could even extend /scale to be a generic workload-controller interface, but I haven't thought about that much yet.
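For context, a minimal sketch of how the existing /scale subresource is exercised; the Deployment name "web" and namespace "default" are hypothetical, and the apps/v1 path shown postdates this discussion (it was extensions/v1beta1 at the time). A relabel subresource could be shaped the same way:

```sh
# Scaling goes through the /scale subresource rather than a full update of the
# workload object; "web" and "default" are placeholder names.
kubectl scale deployment web --replicas=5

# The subresource can also be read directly:
kubectl get --raw /apis/apps/v1/namespaces/default/deployments/web/scale
```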
cc @erictune
@bgrant0607 agreed that this should be similar to /scale; not sure about making it generic.
Use case: easier blue-green deployments: openshift/origin#11954
See also comments here: one pattern that's safe and doesn't require complex orchestration when introducing a new attribute is to modify the selectors of existing workload controllers to select pods without the new label key, and then add the key only to new pod templates. This isn't possible with RCs, however, since they don't support the more powerful set-based selectors.
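A sketch of that pattern with hypothetical names (existing label app=web, new key track): set-based selectors can require the absence of a key, which an RC's equality-only selector map cannot express.

```sh
# List the pods that do NOT yet carry the new "track" key. The "!track"
# requirement is set-based selector syntax and has no equivalent in an RC's
# plain key/value selector map.
kubectl get pods -l 'app=web,!track'

# A Deployment/ReplicaSet selector can express the same requirement
# declaratively (shown here only as a fragment; whether the selector can still
# be changed depends on the API version, see the later comments in this thread):
#   "selector": {
#     "matchLabels": {"app": "web"},
#     "matchExpressions": [{"key": "track", "operator": "DoesNotExist"}]
#   }
```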
Would a relabel spin down old pods and spin up new ones, or would existing pods be reused? I don't know if this is by design, but at present it isn't really possible to store pod state in labels, as changing the RC causes new pods to be created. Having state in pod labels is useful for services, since it allows them to select pods based on their state.
Changing the RC does not cause new pods to be created; you are probably thinking of Deployments / DeploymentConfigs. An RC will create new pods only if it is scaled up or old pods die. Relabeling would merely update all old pods to use the new label(s).
If the RC selector is changed, the system would suddenly see 0 pods and try to start scaling up. Since existing pods cannot have their labels changed in an atomic operation when the RC selector is changed, there is possible contention.
The RC selector should change as the last part of the "transaction", i.e. once all old pods and the RC pod template labels have changed. If any of those earlier actions fails, we won't change the selector. I guess once we start using transactions in etcd3, this won't be a problem.
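A client-side sketch of that ordering, using hypothetical names (RC "web", existing selector app=web, new label track=stable); it only works because RC selectors are mutable.

```sh
# 1. Relabel the existing pods first; they keep matching the current selector
#    since we are only adding a key.
kubectl label pods -l 'app=web' track=stable

# 2. Add the label to the RC's pod template so newly created pods carry it too.
kubectl patch rc web -p '{"spec":{"template":{"metadata":{"labels":{"track":"stable"}}}}}'

# 3. Only after steps 1 and 2 succeed, add the key to the selector (a strategic
#    merge patch merges it into the existing map). If anything above failed,
#    the selector is left untouched.
kubectl patch rc web -p '{"spec":{"selector":{"track":"stable"}}}'
```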
Don't assume transactions will help us - there are limitations that may prevent some of these scenarios from being possible.
@bprashanth because we had discussed pod labels changing to track state (like leader). I think the atomicity challenge remains even with etcd3, so we should at least consider whether this is something that we might indicate another way.
I think at this point atomic multi-object operations are an explicit non-goal of the Kubernetes API system.
Was that officially decided somewhere? I think there are some reasonable use cases for multi-object transactions; for example, the way we do pod binding today is not great because it only updates the pod, not the node, so you can't actually detect conflicts. Even just being able to bump the resource version on the node and the pod atomically, while only actually mutating the NodeName in the pod, would be useful. But I suspect there are other useful scenarios (whether the one here is one of those, I don't know). We were afraid of the implications for sharding etcd, but we never actually needed to shard etcd (yet)...
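For reference, a sketch of the binding call mentioned above, with hypothetical pod/node names: the request only sets the Pod's spec.nodeName, and nothing is written to the Node object, which is why a conflicting change on the node side can't be detected through resource versions.

```sh
# The scheduler normally POSTs this object to the pods/<name>/binding
# subresource; kubectl can submit it via the core v1 bindings resource.
# "my-pod" and "node-1" are placeholder names.
kubectl create -f - <<'EOF'
apiVersion: v1
kind: Binding
metadata:
  name: my-pod
target:
  apiVersion: v1
  kind: Node
  name: node-1
EOF
```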
Doing a transaction under the covers of a particular resource is probably ok. Trying to do it across resource types via the public API is probably not.
If we're going to shard on anything, it's going to be resource types, and it's happening O(soon) for the API extension work. I doubt a third party extension will be allowed to have root access to the etcd cluster, and in any case it's easier if they have their own storage.
Right. API server federation means the set of resources over which a transaction is even possible is complex, since different groups may be stored in different etcds or even different storage systems. Even if we ultimately do decide to allow a few curated transactions, I'm quite confident it won't happen in 2017.
It's off topic for this issue, so please don't discuss it further here, but I agree with smarterclayton and lavalamp that we should not support atomic transactions across multiple resources, and we have documented that. OTOH, I could imagine some operation similar to rollback, which performs the relabeling orchestration server-side.
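To make the rollback analogy concrete: rollback is already a small client request that triggers server-side orchestration, and a relabel operation could be triggered the same way. Only the rollout command below is real; the relabel endpoint is purely hypothetical, with placeholder names throughout.

```sh
# Real today: the client asks for a rollback and the controller performs the
# orchestration server-side ("web" is a placeholder name).
kubectl rollout undo deployment/web

# Hypothetical sketch only (no such subresource exists): a relabel request
# could similarly hand the whole orchestration to the controller, e.g.
#   POST /apis/.../namespaces/default/replicasets/web/relabel
#   {"addLabels": {"track": "stable"}}
```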
See also #14894
Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale to prevent it from auto-closing. If this issue is safe to close now please do so with /close. Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
Stale issues rot after 30d of inactivity. Mark the issue as fresh with /remove-lifecycle rotten. If this issue is safe to close now please do so with /close. Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/remove-lifecycle rotten |
The Deployment controller stopped relabeling adopted ReplicaSets and Pods after #61615 got merged; selectors became immutable in the apps/v1 endpoints. Can we close this issue now that it's no longer needed by the workload controllers?
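For reference, a sketch of the immutability mentioned here (names hypothetical): apps/v1 rejects any change to spec.selector after creation, so relabeling adopted objects by rewriting selectors is no longer an option there.

```sh
# Attempting to change the selector of an apps/v1 Deployment is rejected by
# validation ("field is immutable"); only the pre-apps/v1 API versions
# allowed this. "web" is a placeholder name.
kubectl patch deployment web --type=merge \
  -p '{"spec":{"selector":{"matchLabels":{"track":"stable"}}}}'
```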
I agree we are unlikely to get to this, and permitting it would increase complexity of object management.
Currently, the deployment controller relabels replica sets and pods when it needs to adopt a replica set for a deployment. It would be more convenient to have a replica set endpoint that would force the replica set controller to do all the relabeling (change pod template labels, update old pods, change selector). The implementation should be similar to how the rollback spec works for Deployments.
It would help:
kubectl set selector, from Add new command "kubectl set selector" #28949, currently enabled only for Services. Eventually we could enable it for controllers by using the new endpoint (see the sketch below).
@kubernetes/api-review-team @kubernetes/sig-api-machinery @ncdc @liggitt @bgrant0607
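A sketch of the Service-only command referenced above, with a hypothetical Service name and labels; the proposal is to make the equivalent possible for controllers via the new endpoint.

```sh
# kubectl set selector is currently wired up for Services only: this points the
# Service "frontend" at the pods labeled track=blue (e.g. a blue/green switch).
kubectl set selector service frontend 'track=blue'
```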