StatefulSets with the same selector interfere with each other #36859
Filter on label selector AND created-by annotation should be safe
This comes up a lot. Seems like we want some machinery to make this easier. @Kargakis
Per discussion with @erictune, we should fix this in 1.6, presumably with Controller Reference instead
This is the good old overlapping selectors bug, right? (Fixable with the good old overlapping selectors solution, as Janet mentioned.) We basically punted on this for a bunch of releases for RC/RS etc., then put in ControllerRef, and now everything that doesn't use ControllerRef still suffers from it.
This is not the solution we adopted everywhere else, though I don't deny that it might work for this specific case. An easy short-term fix would be to filter the pets returned by the selector list (https://github.com/kubernetes/kubernetes/blob/master/pkg/controller/petset/pet_set.go#L223) by an additional petset-name filter, because we can. IIRC the selector list is used to figure out who to delete, not who to create; we know who to create based on the expected identities of the set.
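The short-term fix proposed above can be sketched as follows. This is an illustration, not the actual controller code: `Pod` is a pared-down stand-in for `v1.Pod`, and `filterByPetSetName` is a hypothetical helper. A simple name-prefix check is used here for brevity; a robust version would parse the `<set>-<ordinal>` suffix exactly.

```go
package main

import (
	"fmt"
	"strings"
)

// Pod is a minimal stand-in for the real v1.Pod type.
type Pod struct {
	Name string
}

// filterByPetSetName sketches the proposed short-term fix: after listing
// pods by label selector, keep only the pods whose names carry the
// "<setName>-" prefix of this set, discarding pods that merely share labels.
func filterByPetSetName(setName string, pods []Pod) []Pod {
	var owned []Pod
	for _, p := range pods {
		if strings.HasPrefix(p.Name, setName+"-") {
			owned = append(owned, p)
		}
	}
	return owned
}

func main() {
	pods := []Pod{{Name: "web-0"}, {Name: "db-0"}, {Name: "web-1"}}
	fmt.Println(filterByPetSetName("web", pods)) // [{web-0} {web-1}]
}
```

With this filter in place, a second set whose selector overlaps would no longer cause this set to delete the other set's pods, since deletion candidates come from the filtered list.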
Then there's the approach implemented in Jobs, where we have auto-generated selectors, which can still be combined with user-created ones. That, combined with controller ref, should make the pods distinguishable enough.
@bgrant0607 argued somewhere that the autogenerated selector is more appropriate for Jobs than for long-running controllers (users will care more about labels on the latter vs the former) and I agree with that statement.
That makes sense, thanks for bringing that up.
The original discussion started in #2210. We should document this (hint hint). This is a tradeoff between ease of creation (the first operation) and ease of management (subsequent operations). Labels are like names in most respects. It is desirable for a collection of labels to be unique to a given set so that there is a way to unambiguously refer to that set in management operations.

We encourage use of a unique collection of labels rather than a single unique label value since the additional attributes are generally needed anyway (e.g., due to rolling update), and otherwise the user would need to combine multiple attributes in a single label value or generate the unique label by some means that would likely result in an unpredictable and/or not-human-friendly value. It is desirable that the collection of labels be predictable and human-friendly in order to facilitate references from other resources (e.g., services, pod affinity/anti-affinity policies, disruption budgets, pod modification policies), and from controllers in adoption cases, even for StatefulSet, where a user might want to delete their previous PetSet or StatefulSet and create a new one to adopt the pre-existing pods.

Unlike many other REST APIs, we don't expect clients to need to query most resources first in order to figure out how to refer to them in subsequent operations.

Jobs are different from most other resources because we do typically expect them to be managed by higher-level APIs and/or other orchestrators, and to exist concurrently with subsequently created Jobs. In those respects, they are similar to Pods and the ReplicaSets generated by Deployment, so generated names and labels make sense.

Controller references are intended to prevent bad behavior due to overlapping selectors, especially across different types of controllers, but are not intended as a replacement for label uniqueness.
See also: StatefulSets are different from every other workload controller in that the instances have predictable names derived from the set resource name. Could we eliminate the use of labels for identifying the set's pods? In addition to making management of the pods harder, as discussed above, my main concern about that is that it would be inconsistent with all other workload controllers. Therefore, I'd like to think about a strategy that would work for other workload controllers as well.

The main problem with Job's approach is that it creates a uid label, which is unpredictable and not human-friendly. The current approach of defaulting the label selector from all labels in the template has problems as well (#34292, #17902). Defaulting the template labels and selector from just the resource name (and not the uid) would address the problem with Job's approach and would be neither better nor worse with respect to the other problems. We'd probably use

It appears that we can't expect new users to understand what labels they need ahead of time, though our tooling (e.g.,
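The defaulting idea above (deriving the selector and template labels from the resource name rather than a generated uid) can be sketched minimally. The label key `statefulset-name` here is hypothetical, chosen only for illustration; it is not an actual Kubernetes API label key.

```go
package main

import "fmt"

// defaultSelectorLabels sketches the proposal above: derive the default
// selector and template labels from the resource name alone, producing a
// predictable, human-friendly label, unlike Job's generated uid label.
// The key "statefulset-name" is a hypothetical placeholder.
func defaultSelectorLabels(setName string) map[string]string {
	return map[string]string{"statefulset-name": setName}
}

func main() {
	fmt.Println(defaultSelectorLabels("web")) // map[statefulset-name:web]
}
```

A name-derived label is stable across delete-and-recreate of the set, which is what makes the adoption scenario described above (creating a new set to adopt pre-existing pods) workable.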
Also note that the /scale subresource requires a selector. I'd be on board with preventing changes that would break the StatefulSet. For instance, for now we could prevent the user from setting the selector and from removing the
Same thing should apply for the pod-template-hash label in ReplicaSets owned by Deployments.
I'm pursuing a fix for this via controllerRef (#24946). |
Automatic merge from submit-queue (batch tested with PRs 40796, 40878, 36033, 40838, 41210)

StatefulSet hardening

**What this PR does / why we need it**: This PR contains the following changes to StatefulSet. Only one change affects the semantics of how the controller operates (described in #38418), and that change only brings the controller into conformance with its documented behavior.

1. pcb and pcb controller are removed and their functionality is encapsulated in StatefulPodControlInterface. This class models the design of controller.PodControlInterface and provides an abstraction over clientset.Interface which is useful for testing purposes.
2. IdentityMappers has been removed to clarify what properties of a Pod are mutated by the controller. All mutations are performed in the UpdateStatefulPod method of StatefulPodControlInterface.
3. The statefulSetIterator and petQueue classes are removed. These classes sorted Pods by CreationTimestamp, which is brittle and not resilient to clock skew. The current control loop, which implements the same logic, is in stateful_set_control.go. The Pods are now sorted and considered by their ordinal indices, as is outlined in the documentation.
4. StatefulSetController now checks whether the Pods matching a StatefulSet's Selector also match the Name of the StatefulSet. This makes the controller resilient to overlapping selectors, and will be enhanced by the addition of ControllerRefs.
5. The total lines of production code have been reduced, and the total number of unit tests has been increased. All new code has 100% unit coverage, giving the module 83% coverage. Tests for StatefulSetController have been added, but it is not practical to achieve greater coverage in unit testing for this code (the e2e tests for StatefulSet cover these areas).
6. Issue #38418 is fixed in that StatefulSet will ensure that all Pods that are predecessors of another Pod are Running and Ready prior to launching a new Pod. This removes the potential for deadlock when a Pod needs to be rescheduled while its predecessor is hung in Pending or Initializing.
7. All references to pet have been removed from the code and comments.

**Which issue this PR fixes**: fixes #38418, #36859

**Special notes for your reviewer**:

**Release note**:
```release-note
Fixes issue #38418 which, under certain circumstances, could cause StatefulSet to deadlock. Mediates issue #36859. StatefulSet only acts on Pods whose identity matches the StatefulSet, providing a partial mediation for overlapping controllers.
```
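Change 4 above (only acting on Pods whose identity matches the StatefulSet) hinges on the `<parent>-<ordinal>` pod-naming convention. Here is a minimal sketch of that membership check, assuming the `(.*)-([0-9]+)$` naming pattern; the function names are illustrative and may not match the controller source exactly.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// statefulPodRegex matches pod names of the form "<parent>-<ordinal>".
var statefulPodRegex = regexp.MustCompile(`(.*)-([0-9]+)$`)

// getParentNameAndOrdinal parses a pod name into the owning set's name and
// the pod's ordinal index, returning ("", -1) when the name doesn't match.
func getParentNameAndOrdinal(podName string) (string, int) {
	parts := statefulPodRegex.FindStringSubmatch(podName)
	if len(parts) != 3 {
		return "", -1
	}
	ordinal, err := strconv.Atoi(parts[2])
	if err != nil {
		return "", -1
	}
	return parts[1], ordinal
}

// isMemberOf reports whether a pod's name identifies it as belonging to the
// given StatefulSet. Pods that match the selector but fail this check are
// ignored by the controller rather than fought over.
func isMemberOf(setName, podName string) bool {
	parent, _ := getParentNameAndOrdinal(podName)
	return parent == setName
}

func main() {
	fmt.Println(isMemberOf("web", "web-0"))   // true
	fmt.Println(isMemberOf("web", "other-0")) // false
}
```

This is only a partial mediation: two sets named `web` and `web-2` could still collide on pathological names, which is why ControllerRef is the complete fix.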
Automatic merge from submit-queue (batch tested with PRs 42080, 41653, 42598, 42555)

StatefulSet: Respect ControllerRef

**What this PR does / why we need it**: This is part of the completion of the [ControllerRef](https://github.com/kubernetes/community/blob/master/contributors/design-proposals/controller-ref.md) proposal. It brings StatefulSet into full compliance with ControllerRef. See the individual commit messages for details.

**Which issue this PR fixes**: Fixes #36859

**Special notes for your reviewer**:

**Release note**:
```release-note
StatefulSet now respects ControllerRef to avoid fighting over Pods. At the time of upgrade, **you must not have StatefulSets with selectors that overlap** with any other controllers (such as ReplicaSets), or else [ownership of Pods may change](https://github.com/kubernetes/community/blob/master/contributors/design-proposals/controller-ref.md#upgrading).
```

cc @erictune @kubernetes/sig-apps-pr-reviews
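The ControllerRef mechanism this PR completes can be sketched as follows. `OwnerReference` here is a simplified stand-in for `metav1.OwnerReference` (the real field `Controller` is a `*bool`); the point is that at most one owner reference may be marked as the controller, so overlapping selectors no longer cause two controllers to fight over the same pod.

```go
package main

import "fmt"

// OwnerReference is a simplified stand-in for metav1.OwnerReference.
type OwnerReference struct {
	UID        string
	Controller bool
}

// controllerOf returns the controlling owner reference, if any.
// Under ControllerRef, at most one owner may have Controller set.
func controllerOf(refs []OwnerReference) *OwnerReference {
	for i := range refs {
		if refs[i].Controller {
			return &refs[i]
		}
	}
	return nil
}

// ownedBy reports whether a pod with the given owner references is
// controlled by the controller with the given uid. A controller skips
// pods whose controlling owner is someone else, even if the labels match.
func ownedBy(refs []OwnerReference, uid string) bool {
	ref := controllerOf(refs)
	return ref != nil && ref.UID == uid
}

func main() {
	refs := []OwnerReference{{UID: "sts-a", Controller: true}}
	fmt.Println(ownedBy(refs, "sts-a")) // true
	fmt.Println(ownedBy(refs, "sts-b")) // false
}
```

This is why the release note warns about upgrading with overlapping selectors: adoption stamps a controlling owner onto previously unowned pods, and which controller wins the race determines ownership from then on.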
What happened:
When two StatefulSets are created with the same labels, only one manages to come up.
This is caused by the same issue as #36855 (comment).
What you expected to happen:
Both StatefulSets should come up. Instead, only one was created successfully; the other was unable to come up.
How to reproduce it (as minimally and precisely as possible):
What we should be doing:
cc @kubernetes/sig-apps