
Atomic Configmaps/Secrets #43112

Closed
kfox1111 opened this issue Mar 15, 2017 · 23 comments
Labels
area/workload-api/deployment sig/apps Categorizes an issue or PR as relevant to SIG Apps.

Comments

@kfox1111

FEATURE REQUEST

  • Note: anywhere that says 'configmap' below, 'secret' could be substituted.

Longer description at the bottom. TL;DR here:

We need a new volume type or a new property on configmap volumes that has the following behavior:

  • When a k8s object that contains a pod template is created and one or more of its configmaps is marked atomic, a snapshot of each referenced configmap is made and the template is updated to point to the snapshot instead. The lifetime of the snapshot tracks the templated object. This allows, for example, all pods in a ReplicaSet to be atomic: they will always have the same configmaps between member pods, no matter in what order they were (re)created.
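
For concreteness, a rough sketch of what the 'new property' option could look like as an API type; the Atomic field is purely hypothetical and does not exist in Kubernetes today:

```go
package atomicconfig

// Hypothetical extension of the ConfigMap volume source (not an existing
// Kubernetes field). When Atomic is true, the controller that instantiates
// the pod template (e.g. the Deployment controller creating a ReplicaSet)
// would snapshot the referenced ConfigMap and rewrite the reference in the
// created object to point at the snapshot.
type ConfigMapVolumeSource struct {
	// Name of the ConfigMap to mount.
	Name string `json:"name"`
	// Atomic asks the controller to snapshot the ConfigMap at
	// template-instantiation time instead of tracking the live object.
	Atomic bool `json:"atomic,omitempty"`
}
```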

Example workflow:

  1. The user creates a configmap named 'bar'.
  2. The user creates a deployment 'foo' that contains a reference to an atomic configmap named 'bar'.
    • k8s decides it needs to create a ReplicaSet 'foo-1' based on the template in deployment 'foo'.
    • The template contains an atomic configmap reference to 'bar', so k8s snapshots the existing configmap 'bar' to 'bar-1' and changes the ReplicaSet template, as it's being created, to point to the non-atomic configmap 'bar-1' (see the sketch after this workflow).
  3. A user updates configmap 'bar' with new settings.
    • k8s receives the new configmap but does not push it out to the existing pods, as they all reference 'bar-1' instead.
  4. The user decides now is the time to roll the config changes into production, so they trigger a deployment update on 'foo'.
    • k8s creates ReplicaSet 'foo-2' snapshotting 'bar' as before, making 'bar-2' and pointing foo-2's configmap references to it.
    • k8s starts the process of doing a rolling upgrade from 'foo-1' to 'foo-2'
  5. At this point, the user can roll all the way forward and delete ReplicaSet 'foo-1', causing k8s to delete the associated 'bar-1', or roll back to 'foo-1' and delete 'foo-2', in which case k8s deletes 'bar-2'.
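
As a rough illustration of the snapshot step in 2 and 4 above, here is a minimal sketch of what the controller-side copy could look like, assuming client-go; the function name, the '-<revision>' suffix convention, and the owner parameter are illustrative, not existing Kubernetes code:

```go
package atomicconfig

import (
	"context"
	"fmt"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// snapshotConfigMap copies ConfigMap `name` to `name-<revision>` and ties the
// copy's lifetime to its owner (e.g. the ReplicaSet being created), so deleting
// that owner garbage-collects the snapshot but never the original.
func snapshotConfigMap(ctx context.Context, c kubernetes.Interface, ns, name string, revision int, owner metav1.OwnerReference) (*corev1.ConfigMap, error) {
	src, err := c.CoreV1().ConfigMaps(ns).Get(ctx, name, metav1.GetOptions{})
	if err != nil {
		return nil, err
	}
	snap := &corev1.ConfigMap{
		ObjectMeta: metav1.ObjectMeta{
			Name:            fmt.Sprintf("%s-%d", name, revision),
			Namespace:       ns,
			OwnerReferences: []metav1.OwnerReference{owner},
		},
		Data:       src.Data,
		BinaryData: src.BinaryData,
	}
	return c.CoreV1().ConfigMaps(ns).Create(ctx, snap, metav1.CreateOptions{})
}
```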

Longer explanation:

Before Deployments, it was possible to orchestrate a rolling upgrade manually by performing ReplicaSet creation/deletion in the right order. To save the user a lot of manual work, Deployments were created to automate that workflow. One piece of the workflow was skipped, though: configmaps. For some use cases, users want all members of a ReplicaSet to behave identically; this is one of the great features of containers. But if config is done outside the container using configmaps, you can get into weird situations where, to upgrade from RS-1 to RS-2, the configmap has to be changed first. Any pod failure in RS-1 after this change, or a rollback to RS-1, can leave RS-1 pods running with the configmap intended for RS-2's pods, potentially causing malfunction or at least inconsistent behavior. This can be worked around by putting a version number in the configmap name, but that is a very manual process, paralleling the manual ReplicaSet creation/versioning that Deployments were intended to alleviate.

The proposal is to make this part of the workflow of Deployments and of other objects that support rolling upgrades/downgrades, such as the newly introduced DaemonSet upgrades, so that they perform this logic on behalf of the user. The user states that they want a referenced configmap to be atomic, and the lifecycle of a copy of that configmap is then managed by the Deployment/DaemonSet. This unburdens the user from having to deal with a manual workflow around the configmaps.

A nice follow-on feature would be to automatically trigger a rolling upgrade of a Deployment/DaemonSet whenever a configmap referenced as atomic changes. This would eliminate the need for the user to perform step 4 in the example above.
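
A minimal sketch of one way that trigger could work, assuming client-go: stamp a checksum of the configmap data into the pod template's annotations so the Deployment controller rolls out a new ReplicaSet. The annotation key is made up for illustration:

```go
package atomicconfig

import (
	"context"
	"crypto/sha256"
	"fmt"
	"sort"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/types"
	"k8s.io/client-go/kubernetes"
)

// triggerRollout writes a checksum of the ConfigMap's data into the
// Deployment's pod-template annotations; any change to the checksum causes
// the Deployment controller to create a new ReplicaSet and roll it out.
func triggerRollout(ctx context.Context, c kubernetes.Interface, ns, deployName string, cm *corev1.ConfigMap) error {
	keys := make([]string, 0, len(cm.Data))
	for k := range cm.Data {
		keys = append(keys, k)
	}
	sort.Strings(keys) // hash deterministically regardless of map order
	h := sha256.New()
	for _, k := range keys {
		h.Write([]byte(k))
		h.Write([]byte(cm.Data[k]))
	}
	patch := fmt.Sprintf(
		`{"spec":{"template":{"metadata":{"annotations":{"checksum/%s":"%x"}}}}}`,
		cm.Name, h.Sum(nil))
	_, err := c.AppsV1().Deployments(ns).Patch(ctx, deployName,
		types.StrategicMergePatchType, []byte(patch), metav1.PatchOptions{})
	return err
}
```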

@kfox1111
Author

@idvoretskyi @flaper87

@mtaufen
Contributor

mtaufen commented Mar 15, 2017

A user updates configmap 'bar' with new settings.
k8s receives the new configmap but does not push it out to the existing pods, as they all reference 'bar-1' instead.

Why not, instead of mutating bar, just create a ConfigMap named bar2, and update your Deployment to point the Pods at that new ConfigMap?

@liggitt
Member

liggitt commented Mar 15, 2017

@kfox1111
Author

@mtaufen, you can do the same thing with ReplicaSets without needing Deployments at all. The point of Deployments was to make the user's life easier by not having to manually orchestrate creating parallel ReplicaSets and managing their lifecycles separately. Configmaps/Secrets have the same problem as ReplicaSets and could benefit from the same treatment.

@kfox1111
Author

@liggitt Yeah, sounds kind of similar to that. I hadn't seen it before. Thanks for pointing me at it. It does take a different implementation approach, I think, which may be more tolerable?

@rata
Member

rata commented Mar 15, 2017

@kfox1111 deployments are not just for an easier user workflow. I think that is a minor part.

The problems with not doing it server-side were many, leaving aside the user experience you mention. If it's not done server-side, then a network issue on the client (or just a crash, power outage, etc.) would leave the deployment in an intermediate state that is not trivial to recover from automatically. It also made things like HPA very difficult.

I really don't think the new abstraction was due to the user workflow you describe, but for those more fundamental problems.

Also, there is a valid question of whether to do this on the Deployments object or on a new one. I think, before adding it to Deployments, the pros and cons of both approaches should be weighed, and we should even consider whether this belongs in core Kubernetes or not (and why :)).

What do you think?

@kfox1111
Author

Fair enough. Maybe it was done for reasons beyond just usability; I don't know all the history of it.

It did make it much easier for users to do the common workflow of upgrading something from X to Y, though.

But in its current state, it's kind of incomplete and not atomic.

It could be a different object type, but I think it would have almost complete overlap with Deployments as they exist today, so it would probably be extra work to split it out for maybe not much benefit.

Deployments are not the only object that I think should get the functionality though. Daemonsets are another example of one that could use atomic configmap/secret functionality.

I guess raw ReplicaSets could also use the same treatment. So maybe the change doesn't belong in Deployments at all, but instead at ReplicaSet creation?

I do believe it belongs in k8s, as it's a very common pattern that is very tedious. Being straightforward and tedious, it's better to make a machine do it than a person. :)

@rata
Member

rata commented Mar 15, 2017 via email

@kfox1111
Author

Immutable might be a better name for it; 'atomic' was just the first thing that occurred to me.

I have looked at the daemonset update discussion a bit. But I don't think they have handled the immutable configmap case there either.

Optimizing the workflow is what I would like to do. In Deployments, you don't have to worry about the orchestration of create RS-1, create RS-2, then garbage collect as needed. This is the same kind of thing. The user just deals with foo-configmap, rather than foo-configmap-1/foo-configmap-2, and the system automatically tacks on the -1/-2 as appropriate. There's a lot of prior art in k8s for this pattern/behavior.

I don't think it's a large amount of code to write, or a burden on the k8s team to maintain once it's written, but it does eliminate a bunch of busy work for users. So I think it's worth doing.

As it stands now, I feel like we're optimizing for (IMO) the antipattern of letting an RS's members' config drift. Users will tend to do what's easy, and right now it's easier to do things that may break on them in unexpected ways. I'd rather we provide the tools to make it easy for them to do the thing that has the least chance of surprise/breakage.

@rata
Member

rata commented Mar 15, 2017 via email

@kfox1111
Author

I can do a proposal. Is there a link to how that process works? I thought a feature request was how that was handled.

Today, that functionality is correct at the pod level but not at the ReplicaSet level. If a pod is ever recreated, or an RS is scaled, the new pod suddenly behaves differently than its brethren, based on an event that may have occurred a long time ago but didn't apply to the others. I don't want pods in a single ReplicaSet to behave differently from each other depending on when each pod was scheduled. Nor do I want a rollback of a Deployment to leave a mix of old-config and new-config pods in the same RS. This is a recipe for hard-to-debug problems. As we have both agreed, you can do this manually today. But it's a lot of work to ensure that inconsistency doesn't happen, and errors can creep in when humans skip a step by mistake. We should make it easy for the user to do the right thing and get an expected/consistent result.

@rata
Member

rata commented Mar 15, 2017 via email

@rata
Member

rata commented Mar 15, 2017

@kfox1111 ohh, I was on my phone and didn't notice the link @liggitt shared. This is exactly what was proposed in that link: #31701

@kfox1111 please see the discussion there. It seems it was dismissed in the last comment for relying on GC, but lengthier reasoning runs through the PR comments (and in the file as well): #31701 (comment)

I guess that addresses this issue for the moment and it should be closed, then?

@kfox1111
Author

Oh, are you saying a PR, not a spec kind of thing?

Ok, I think the confusion is over the definition of "app". I think you're defining an app to be a running pod; I'm defining an app to be something akin to a Deployment. An app in my world is cloud native: pods come and go automatically as the system autoscales up and down. The members of a ReplicaSet should be atomic through time no matter what actions are performed: scaling, rolling back, machines dying and pods getting evicted, etc. That is not handled today without a lot of manual work on behalf of the user.

@kfox1111
Author

I think the difference between this request and the other is around snapshotting and ownership.

I want to own the main configmap and share that one configmap amongst multiple deployments/daemonsets. If I delete all the deployments/daemonsets referencing it, the configmap should stay. After one of those objects is launched, no further access to the main configmap is ever made. I should even be able to delete the configmap, and the atomic configmap references in existence should continue to work, as they have a snapshot of its state. The snapshots are garbage collected, not the main configmap.
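
For illustration, a minimal sketch of the ownership wiring I mean, assuming client-go/apimachinery types; only the snapshot gets an ownerReference, and the root configmap deliberately gets none:

```go
package atomicconfig

import (
	appsv1 "k8s.io/api/apps/v1"
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// ownSnapshot ties a snapshot ConfigMap's lifetime to the ReplicaSet that uses
// it. The root ConfigMap is never given an ownerReference, so it survives as
// ReplicaSets come and go, while the snapshots are garbage-collected with them.
func ownSnapshot(snap *corev1.ConfigMap, rs *appsv1.ReplicaSet) {
	ref := metav1.NewControllerRef(rs, appsv1.SchemeGroupVersion.WithKind("ReplicaSet"))
	snap.OwnerReferences = append(snap.OwnerReferences, *ref)
}
```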

I think this behavior will work better with package managers such as Helm. The owner of the root configmap is Helm in this case, and it wouldn't be deleted out from under that owner.

A separate activity could trigger a deployment rollout when that root configmap changes, but that is totally orthogonal to this feature. Helm could do that itself, I think.

I think the ownerReference approach in the other PR, as referenced in #31701 (comment), doesn't have the same GC behavior as I'm describing above? It's assuming that if no ownerReferences are there, then it's free to GC?

It seems like the use case was deprioritized in the other PR. Is this due to a lack of manpower or of desire to tackle the issue at the moment? I think this will be important to Kolla-Kubernetes and other projects, and I would be willing to add manpower if that's the issue.

@rata
Member

rata commented Mar 15, 2017 via email

@rata
Member

rata commented Mar 15, 2017 via email

@technosophos
Contributor

Helm currently does this manually by versioning the name of each configmap. Obviously, we would benefit from having this implemented lower down the stack. @liggitt @smarterclayton how does OpenShift handle this kind of thing?

@bgrant0607
Member

See also #22368

@bgrant0607 bgrant0607 added sig/apps Categorizes an issue or PR as relevant to SIG Apps. area/workload-api/deployment labels Mar 21, 2017
@smarterclayton
Contributor

smarterclayton commented Apr 23, 2017 via email

@pmorie
Member

pmorie commented May 5, 2017

I would definitely support a scheme similar to what is proposed here. When I read what is proposed, I imagine a controller that creates a new snapshot-resource from a 'master' resource. We do have a similar pattern in OpenShift for the ImageStream and Image resources.

One thing we should consider: typically, a rollout for an application might involve:

  1. Changes to the images used
  2. Changes to the configs used
  3. Changes to the secrets used
  4. Changes to the pod descriptors involved

We need a way to batch changes to all of the above into a single rollout event to avoid incomplete states.

There's also the question of how many versions we keep around for rollback.

@kfox1111
Author

kfox1111 commented Jul 5, 2017

@pmorie If the configmap/secret snapshot's lifecycle matched the rc/rs it was associated with, it would be deleted by the same action that deletes the rc/rs, and wouldn't need to be tracked individually?

@bgrant0607
Member

A lightweight approach was previously described here:
#31701 (comment)

Closing as a dupe of #22368.
