StatefulSets with the same selector interfere with each other #36859

Closed
foxish opened this issue Nov 16, 2016 · 11 comments · Fixed by #42080

@foxish
Contributor

foxish commented Nov 16, 2016

What happened:
When two StatefulSets are created with the same labels, only one manages to come up.
This is caused by the same issue as #36855 (comment).

What you expected to happen:
Both StatefulSets should come up; instead, only one (ss2) did, while the other's first pod (ss1-0) was repeatedly created and terminated.

```
~ kubectl get pods --watch
NAME      READY     STATUS    RESTARTS   AGE
ss2-0     1/1       Running   0          7s
ss2-1     1/1       Running   0          6s
ss2-2     1/1       Running   0          4s
NAME      READY     STATUS    RESTARTS   AGE
ss1-0     0/1       Pending   0          1s
ss1-0     0/1       Pending   0         1s
ss1-0     0/1       ContainerCreating   0         1s
ss1-0     1/1       Running   0         2s
ss1-0     1/1       Terminating   0         2s
ss1-0     1/1       Terminating   0         2s
ss1-0     0/1       Pending   0         0s
ss1-0     0/1       Pending   0         0s
ss1-0     0/1       ContainerCreating   0         1s
ss1-0     1/1       Running   0         2s
ss1-0     1/1       Terminating   0         2s
ss1-0     1/1       Terminating   0         2s
```

How to reproduce it (as minimally and precisely as possible):

```yaml
apiVersion: apps/v1beta1
kind: StatefulSet
metadata:
  name: ss1
spec:
  serviceName: "ss1"
  replicas: 3
  template:
    metadata:
      labels:
        app: ss
    spec:
      containers:
      - name: alpine
        image: alpine
        command:
        - /bin/sleep
        - "1000000"
---
apiVersion: apps/v1beta1
kind: StatefulSet
metadata:
  name: ss2
spec:
  serviceName: "ss2"
  replicas: 3
  template:
    metadata:
      labels:
        app: ss
    spec:
      containers:
      - name: alpine
        image: alpine
        command:
        - /bin/sleep
        - "1000000"
```

What we should be doing:

  • Do not filter on the label selector to find the pods associated with a PetSet.

cc @kubernetes/sig-apps

@foxish foxish added area/stateful-apps kind/bug Categorizes issue or PR as related to a bug. sig/apps Categorizes an issue or PR as relevant to SIG Apps. labels Nov 16, 2016
@foxish foxish added this to the v1.5 milestone Nov 16, 2016
@foxish foxish mentioned this issue Nov 16, 2016
@foxish foxish added the priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. label Nov 16, 2016
@janetkuo
Member

Filtering on the label selector AND the created-by annotation should be safe
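
A minimal sketch of such a filter, assuming the legacy `kubernetes.io/created-by` annotation (whose value is a JSON-serialized reference to the creating object). This is illustrative only, not controller code; `filterByCreator` is a hypothetical helper, and a crude substring check stands in for decoding the serialized reference:

```go
package petset

import (
	"strings"

	v1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/types"
)

// createdByAnnotation is the legacy annotation that controllers of this era
// stamped on the pods they created.
const createdByAnnotation = "kubernetes.io/created-by"

// filterByCreator keeps only the pods whose created-by annotation references
// the given controller UID, so two controllers with overlapping selectors
// never act on each other's pods.
func filterByCreator(pods []*v1.Pod, controllerUID types.UID) []*v1.Pod {
	var owned []*v1.Pod
	for _, pod := range pods {
		if ref, ok := pod.Annotations[createdByAnnotation]; ok &&
			strings.Contains(ref, string(controllerUID)) {
			owned = append(owned, pod)
		}
	}
	return owned
}
```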

@smarterclayton
Contributor

This comes up a lot. Seems like we want some machinery to make this easier. @Kargakis

@janetkuo
Member

Per discussion with @erictune, we should fix this in 1.6, presumably with Controller Reference instead

@bprashanth
Contributor

This is the good old overlapping-selectors bug, right? (Fixable with the good old overlapping-selectors solution, as Janet mentioned.) We basically punted on this for a bunch of releases for RC/RS etc., then put in ControllerRef, and now everything that doesn't use ControllerRef still suffers from it.

Do not filter on the label selector to find the pods associated with a PetSet.

This is not the solution we adopted everywhere else, though I don't deny that it might work for this specific case.

An easy short-term fix would be to filter the pets returned by the selector list (https://github.com/kubernetes/kubernetes/blob/master/pkg/controller/petset/pet_set.go#L223) by an additional petset-name filter, because we can. IIRC the selector list is used to figure out who to delete, not who to create; we know who to create based on the expected identities of the set.
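
Concretely, such a name filter could look like the following (a sketch under the assumption that pet names always take the form `<set-name>-<ordinal>`; `isMemberOf` is a hypothetical helper, not the linked code):

```go
package petset

import "regexp"

// statefulPodRegex matches pod names of the form "<set-name>-<ordinal>".
var statefulPodRegex = regexp.MustCompile(`^(.+)-([0-9]+)$`)

// isMemberOf reports whether podName parses as "<setName>-<ordinal>", i.e.
// whether the pod's identity belongs to the named set, regardless of which
// other sets' selectors happen to match its labels.
func isMemberOf(setName, podName string) bool {
	m := statefulPodRegex.FindStringSubmatch(podName)
	return m != nil && m[1] == setName
}
```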

@soltysh
Contributor

soltysh commented Nov 19, 2016

Then there's the approach implemented in Jobs, where we have auto-generated selectors, which can still be combined with user-created ones. That, with controller ref, should make the pods distinguishable enough.
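
For reference, a rough sketch of the Jobs defaulting being described, using current apimachinery types (illustrative; `defaultJobSelector` is a hypothetical name, not the API server's actual defaulting function). The generated `controller-uid` label keeps each Job's pods distinguishable, while user-supplied labels on the template are preserved:

```go
package petset

import (
	batch "k8s.io/api/batch/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// defaultJobSelector mimics how Job defaulting makes pods distinguishable:
// a unique controller-uid label backs the selector and is added to the pod
// template alongside any user-provided labels.
func defaultJobSelector(job *batch.Job) {
	uid := string(job.UID)
	if job.Spec.Selector == nil {
		job.Spec.Selector = &metav1.LabelSelector{
			MatchLabels: map[string]string{"controller-uid": uid},
		}
	}
	if job.Spec.Template.Labels == nil {
		job.Spec.Template.Labels = map[string]string{}
	}
	job.Spec.Template.Labels["controller-uid"] = uid
	job.Spec.Template.Labels["job-name"] = job.Name
}
```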

@0xmichalis
Contributor

Then there's the approach implemented in Jobs, where we have auto-generated selectors, which can still be combined with user-created ones. That, with controller ref, should make the pods distinguishable enough.

@bgrant0607 argued somewhere that the autogenerated selector is more appropriate for Jobs than for long-running controllers (users will care more about labels on the latter vs the former) and I agree with that statement.

@soltysh
Contributor

soltysh commented Nov 21, 2016

That makes sense, thx for bringing that up.

@bgrant0607
Member

The original discussion started in #2210.

We should document this (hint hint).

This is a tradeoff between ease of creation (1st operation) and ease of management (subsequent operations).

Labels are like names in most respects.

It is desirable for a collection of labels to be unique for a given set so that there is a way to unambiguously refer to that set, in management operations (e.g., kubectl delete all --selector=app=ss, monitoring/logging queries). Indeed, the most common reason to want to change labels and selectors (#26202) is lack of planning ahead about what labels one would need: controller, release/version, canary, service, environment, etc. etc.

We encourage use of a unique collection of labels rather than a single unique label value since the additional attributes are generally needed, anyway (e.g., due to rolling update), and otherwise the user would need to combine multiple attributes in a single label value or generate the unique label by some means that would likely result in an unpredictable and/or not-human-friendly value.

It is desirable that the collection of labels be predictable and human-friendly in order to facilitate references from other resources (e.g., services, pod affinity/anti-affinity policies, disruption budget, pod modification policies), and from controllers in adoption cases, even for StatefulSet, where a user might want to delete their previous PetSet or StatefulSet and create a new one to adopt the pre-existing pods.

Unlike many other REST APIs, we don't expect clients to need to query most resources first in order to figure out how to refer to them in subsequent operations. Jobs are different than most other resources because we do typically expect them to be managed by higher-level APIs and/or other orchestrators, and to exist concurrently with subsequently created Jobs. In those respects, they are similar to Pods and the ReplicaSets generated by Deployment, so generated names and labels make sense.

Controller references are intended to prevent bad behavior due to overlapping selectors, especially across different types of controllers, but are not intended as a replacement for label uniqueness. See also:
#14961 (comment)

StatefulSets are different than every other workload controller in that the instances have predictable names derived from the set resource name. Could we eliminate the use of labels for identifying the set's pods? In addition to making management of the pods harder, as discussed above, my main concern about that is that it would be inconsistent with all other workload controllers. Therefore, I'd like to think about a strategy that would work for other workload controllers, as well.

The main problem with Job's approach is that it creates a uid label, which is unpredictable and not human-friendly. The current approach of defaulting the label selector from all labels in the template has problems, as well (#34292, #17902). Defaulting the template labels and selector from just the resource name (and not the uid) would address the problem with Job's approach and would be neither better nor worse with respect to the other problems. We'd probably use name as the label key, which would enable users to not think as much about what --selector=name=ss1 means, which, again, mostly defers confusion to later.
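
A sketch of that name-based defaulting (hypothetical; `defaultSelectorFromName` does not exist in the codebase and this was only a proposal):

```go
package petset

import (
	apps "k8s.io/api/apps/v1beta1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// defaultSelectorFromName illustrates the proposal above: when the user sets
// neither a selector nor template labels, derive both from the resource name,
// which is predictable and human-friendly, rather than from a generated uid.
func defaultSelectorFromName(set *apps.StatefulSet) {
	if set.Spec.Selector == nil && len(set.Spec.Template.Labels) == 0 {
		nameLabels := map[string]string{"name": set.Name}
		set.Spec.Selector = &metav1.LabelSelector{MatchLabels: nameLabels}
		set.Spec.Template.Labels = nameLabels
	}
}
```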

It appears that we can't expect new users to understand what labels they need ahead of time, though our tooling (e.g., kubectl run) and documentation don't help them. We probably need to expect operations like expose and rolling-update to add new labels for their own use (#36897).

cc @erictune @pwittrock @Kargakis

@bgrant0607
Member

Also note that the /scale subresource requires a selector.

I'd be on board with preventing changes that would break the StatefulSet. For instance, for now we could prevent the user from setting the selector and from removing the name label.

@0xmichalis
Contributor

I'd be on board with preventing changes that would break the StatefulSet. For instance, for now we could prevent the user from setting the selector and from removing the name label.

Same thing should apply for the podtemplate-hash label in ReplicaSets owned by Deployments.

@enisoc
Member

enisoc commented Jan 7, 2017

I'm pursuing a fix for this via controllerRef (#24946).

kow3ns added a commit to kow3ns/kubernetes that referenced this issue Feb 8, 2017
1. pcb and pcb controller are removed and their functionality is
encapsulated in StatefulPodControlInterface.
2. IdentityMappers has been removed to clarify what properties of a Pod are
mutated by the controller. All mutations are performed in the
UpdateStatefulPod method of the StatefulPodControlInterface.
3. The current control loop, which implements the same logic,
is in stateful_set_control.go. The Pods are now sorted and considered by
their ordinal indices, as is outlined in the documentation.
4. StatefulSetController now checks to see if the Pods matching a
StatefulSet's Selector also match the Name of the StatefulSet.
k8s-github-robot pushed a commit that referenced this issue Feb 10, 2017
Automatic merge from submit-queue (batch tested with PRs 40796, 40878, 36033, 40838, 41210)

StatefulSet hardening

**What this PR does / why we need it**:

This PR contains the following changes to StatefulSet. Only one change affects the semantics of how the controller operates (this is described in #38418), and this change only brings the controller into conformance with its documented behavior.

1. pcb and pcb controller are removed and their functionality is encapsulated in StatefulPodControlInterface. This class models the design of controller.PodControlInterface and provides an abstraction over clientset.Interface which is useful for testing purposes.
2. IdentityMappers has been removed to clarify what properties of a Pod are mutated by the controller. All mutations are performed in the UpdateStatefulPod method of the StatefulPodControlInterface.
3. The statefulSetIterator and petQueue classes are removed. These classes sorted Pods by CreationTimestamp. This is brittle and not resilient to clock skew. The current control loop, which implements the same logic, is in stateful_set_control.go. The Pods are now sorted and considered by their ordinal indices, as is outlined in the documentation.
4. StatefulSetController now checks to see if the Pods matching a StatefulSet's Selector also match the Name of the StatefulSet. This will make the controller resilient to overlapping selectors, and will be enhanced by the addition of ControllerRefs (see the sketch after this list).
5. The total lines of production code have been reduced, and the total number of unit tests has been increased. All new code has 100% unit coverage giving the module 83% coverage. Tests for StatefulSetController have been added, but it is not practical to achieve greater coverage in unit testing for this code (the e2e tests for StatefulSet cover these areas).
6. Issue #38418 is fixed in that StatefulSet will ensure that all Pods that are predecessors of another Pod are Running and Ready prior to launching a new Pod. This removes the potential for deadlock when a Pod needs to be rescheduled while its predecessor is hung in Pending or Initializing.
7. All reference to pet have been removed from the code and comments.
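
A rough sketch of the ordinal handling described in items 3 and 4 (illustrative only, not the PR's actual code; `parseIdentity` and `sortByOrdinal` are hypothetical names). Item 4's check compares the parsed parent name against the set's Name; item 3 replaces CreationTimestamp ordering with ordinal ordering:

```go
package statefulset

import (
	"regexp"
	"sort"
	"strconv"

	v1 "k8s.io/api/core/v1"
)

// statefulPodRegex matches pod names of the form "<set-name>-<ordinal>".
var statefulPodRegex = regexp.MustCompile(`^(.+)-([0-9]+)$`)

// parseIdentity returns the parent set name and ordinal encoded in a pod's
// name, or ("", -1) when the name carries no StatefulSet identity.
func parseIdentity(pod *v1.Pod) (parent string, ordinal int) {
	m := statefulPodRegex.FindStringSubmatch(pod.Name)
	if m == nil {
		return "", -1
	}
	ordinal, _ = strconv.Atoi(m[2])
	return m[1], ordinal
}

// sortByOrdinal orders pods by ordinal index rather than CreationTimestamp,
// so the ordering is stable under clock skew.
func sortByOrdinal(pods []*v1.Pod) {
	sort.Slice(pods, func(i, j int) bool {
		_, a := parseIdentity(pods[i])
		_, b := parseIdentity(pods[j])
		return a < b
	})
}
```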

**Which issue this PR fixes**
 fixes #38418,#36859
**Special notes for your reviewer**:

**Release note**:

```release-note
Fixes issue #38418 which, under certain circumstances, could cause StatefulSet to deadlock.
Mediates issue #36859. StatefulSet only acts on Pods whose identity matches the StatefulSet, providing a partial mediation for overlapping controllers.
```
k8s-github-robot pushed a commit that referenced this issue Mar 7, 2017
Automatic merge from submit-queue (batch tested with PRs 42080, 41653, 42598, 42555)

StatefulSet: Respect ControllerRef

**What this PR does / why we need it**:

This is part of the completion of the [ControllerRef](https://github.com/kubernetes/community/blob/master/contributors/design-proposals/controller-ref.md) proposal. It brings StatefulSet into full compliance with ControllerRef. See the individual commit messages for details.
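
The ownership check at the heart of "respecting ControllerRef" can be sketched with the apimachinery helper metav1.GetControllerOf (illustrative, not the PR's exact code; `isOwnedBy` is a hypothetical name):

```go
package statefulset

import (
	apps "k8s.io/api/apps/v1beta1"
	v1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// isOwnedBy reports whether the pod's controller owner reference points at
// this StatefulSet. With this check, two sets with overlapping selectors no
// longer fight: each acts only on the pods it actually owns.
func isOwnedBy(set *apps.StatefulSet, pod *v1.Pod) bool {
	ref := metav1.GetControllerOf(pod)
	return ref != nil && ref.UID == set.UID
}
```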

**Which issue this PR fixes**:

Fixes #36859

**Special notes for your reviewer**:

**Release note**:

```release-note
StatefulSet now respects ControllerRef to avoid fighting over Pods. At the time of upgrade, **you must not have StatefulSets with selectors that overlap** with any other controllers (such as ReplicaSets), or else [ownership of Pods may change](https://github.com/kubernetes/community/blob/master/contributors/design-proposals/controller-ref.md#upgrading).
```
cc @erictune @kubernetes/sig-apps-pr-reviews