Atomic Configmaps/Secrets #43112

kfox1111 · 2017-03-15T00:19:30Z

FEATURE REQUEST

note, anywhere that says 'configmap' below could have 'secret' substituted.

Longer description at the bottom. TL:DR; here:

We need a new volume type or a new property on configmap volumes that has the following behavior:

When a k8s object type that contains a pod template is created, when one or more of its configmaps is marked atomic, a snapshot of the referenced configmap is made and the template is updated to point to the snapshot instead. The lifetime of the snapshot tracks the templated object. This allows, for example, all pods in a ReplicaSet to be atomic, as they will always have the same configmaps between member pods no matter in what order they were (re)created.

Example workflow:

The user creates a configmap named 'bar'.
The user creates a deployment 'foo' that contains a reference to an atomic configmap named 'bar'.
- k8s decides it needs to create a ReplicaSet 'foo-1' based on the template in deployment 'foo'.
- The template contains atomic configmap reference 'bar' so k8s snapshots the existing configmap 'bar' to 'bar-1' and changes the ReplicaSet template as its being created to point to the non atomic configmap 'bar-1'
A user updates configmap 'bar' with new settings.
- k8s receives the new configmap but does not push it out to the existing pods, as they all reference 'bar-1' instead.
The user decides now is the time to roll the config changes into production so triggers a deployment update on 'foo'.
- k8s creates ReplicaSet 'foo-2' snapshotting 'bar' as before, making 'bar-2' and pointing foo-2's configmap references to it.
- k8s starts the process of doing a rolling upgrade from 'foo-1' to 'foo-2'
At this point, the user can roll all the way forward and delete ReplicaSet 'foo-1' causing k8s to delete the associated 'bar-1' or roll back to 'foo-1's version and delete 'foo-2' and k8s will delete 'bar-2'

Longer explanation:

Before Deployments, it was possible to orchestrate a rolling upgrade manually by performing ReplicaSet creation/deletion in the right orders. But to save the user a lot of manual work, Deployments were created to automate the workflow. One piece of the workflow was skipped though. That of configmaps. For some use cases, users want the ReplicaSet to contain members that all behave identically. This is one of the great features of containers. But if config is done outside the container using configmaps, you can get into weird situations where, to upgrade from RS-1 to RS-2, the configmap has to be changed first. Any pod failures in RS-1 after this change, or performing the action of rolling back to RS-1 could cause RS-1 pods to contain the configmap for RS-2's pods potentially causing malfunction or at least inconsistent behavior. This can be worked around by putting a version number in the configmap name, but that is a very manual process that parallels the manual process of doing ReplicaSet creation/versioning that Deployments were intended to alleviate.

The proposal is to make that part of the workflow of Deployments and other objects that support rolling upgrades/downgrades, such as the newly introduced Daemonset upgrades to perform this logic on behalf of the user. The user states that they want a referenced configmap to be atomic, and the management/lifecycle of a copy of the configmap is managed by the Deployment/Deamonset. This unburdons the user from having to deal with manual workflow around the configmaps.

A nice follow on feature would be to automatically trigger a rolling upgrade of a deployment/daemonset whenever the configmaps referenced by atomic configmaps change. This would eliminate the need for the user to perform step 4 in the example above.

kfox1111 · 2017-03-15T00:23:20Z

@idvoretskyi @flaper87

mtaufen · 2017-03-15T00:56:06Z

A user updates configmap 'bar' with new settings.
k8s receives the new configmap but does not push it out to the existing pods, as they all reference 'bar-1' instead.

Why not, instead of mutating bar, just create a ConfigMap named bar2, and update your Deployment to point the Pods at that new ConfigMap?

liggitt · 2017-03-15T01:56:39Z

echoes of https://github.com/kubernetes/kubernetes/pull/31701/files

kfox1111 · 2017-03-15T17:06:38Z

@mtaufen, you can do the same thing with ReplicaSets without needing Deployments at all. The point of Deployment was to make the users life easier in not having to manually orchestrate through creating parallel ReplicaSets and managing their lifecycle separately. Configmaps/Secrets have the same problem as ReplicaSets and could benefit from the same treatment.

kfox1111 · 2017-03-15T17:11:15Z

@liggitt Yeah, sounds kind of similar to that. I hadn't seen it before. Thanks for pointing me at it. It does take a different implementation approach to it I think though, which may be more tolerable?

rata · 2017-03-15T18:13:33Z

@kfox111 deployments are not just for a easier user workflow. I think that is a minor part.

The problems of not doing it server side were tons, leaving aside the user experience you mention. If it's not done server side, then a network issue on the client (or just a crash, power outage, etc.) Would leave the deployment in an intermediate state, not trivial to automate the recovery. It also made very difficult things like HPA, etc.

I really don't think the new abstraction was due to the user workflow you describe, but for those more fundamental problems.

Also, there is a valid question on why to do this on the deployments object or a new one. I think, before adding it to deployments, the pro and cons of both approaches should be weighted and even consider if this belongs to core kubernetes or not (and why :)).

What do you think?

kfox1111 · 2017-03-15T18:34:27Z

Fair enough .Maybe it was done for reasons beyond just usability. I don't know all the history of it.

It did make it much more easy for users to do the common thing of the workflow of upgrading thing from X to Y though.

But in its current state, its kind of incomplete and not atomic.

It could be a different object type, but I think it would have almost complete overlap with Deployments as it exists today, so probably extra work splitting it out, for maybe not much benifit.

Deployments are not the only object that I think should get the functionality though. Daemonsets are another example of one that could use atomic configmap/secret functionality.

I guess raw ReplicaSet also could use the same treatment. So maybe the change doesnt' belong in Deployments at all, but instead at ReplicaSet creation?

I do believe it belongs in K8s as its a very common pattern that is very tedious. Being straight forward and tedious, its better to make a machine do it then a person. :)

rata · 2017-03-15T19:38:56Z

On Wed, Mar 15, 2017 at 11:35:00AM -0700, kfox1111 wrote: Fair enough .Maybe it was done for reasons beyond just usability. I don't know all the history of it. It did make it much more easy for users to do the common thing of the workflow of upgrading thing from X to Y though. But in its current state, its kind of incomplete and not atomic.

Well, if you leave the "usability" of creating a new configmap aside, then this is almost solved with the GC that will delete orphaned objects. Am I missing something? Just a new configmap/secret and update the deployment. This has what you call "atomic" (I think immutable it's more appropriate) is solved and no objects are floating arround. Also, upgrading daemon sets I think is treated in a different discussion. Have you looked for that? (sorry, don't have it handy now)

kfox1111 · 2017-03-15T19:51:27Z

Immutable might be a better name for it. atomic was just the first thing that occurred to me.

I have looked at the daemonset update discussion a bit. But I don't think they have handled the immutable configmap case there either.

Optimizing the workflow is what I would like to do. In Deployments, you don't have to worry about the orchestration around, create RS-1, create RS-2, then garbage collect as needed. This is the same kind of thing. The user just deals with foo-configmap, rather then foo-configmap-1/foo-configmap-2 and the system automatically tacks on the -1/-2 as appropriate. There's a lot of prior art in k8s for this pattern/behavior.

I don't think its a large amount of code to write, or a burden on the k8s team to maintain once its written, but does eliminate a bunch of busy work for users. So I think its worth doing.

As it stands now, I feel like we're optimizing for (IMO) the antipattern of letting RS's members config drift. Users will tend to do what's easy, and its easier to do things that may break in unexpected ways on them. I'd rather we provide the tools to make it easy for them to do the thing that has the least chance of surprise/breakage.

rata · 2017-03-15T20:31:24Z

On Wed, Mar 15, 2017 at 12:51:56PM -0700, kfox1111 wrote: Immutable might be a better name for it. atomic was just the first thing that occurred to me. I have looked at the daemonset update discussion a bit. But I don't think they have handled the immutable configmap case there either. Optimizing the workflow is what I would like to do. In Deployments, you don't have to worry about the orchestration around, create RS-1, create RS-2, then garbage collect as needed. This is the same kind of thing. The user just deals with foo-configmap, rather then foo-configmap-1/foo-configmap-2 and the system automatically tacks on the -1/-2 as appropriate. There's a lot of prior art in k8s for this pattern/behavior. I don't think its a large amount of code to write, or a burden on the k8s team to maintain once its written, but does eliminate a bunch of busy work for users. So I think its worth doing.

The proper way to approach it should be via a proposal in the community repo, AFAIK. So, please, do it if you want/have the time! =)

As it stands now, I feel like we're optimizing for (IMO) the antipattern of letting RS's members config drift. Users will tend to do what's easy, and its easier to do things that may break in unexpected ways on them. I'd rather we provide the tools to make it easy for them to do the thing that has the least chance of surprise/breakage.

Why do you think it is easy to edit a configmap and be done with it? If you edit it, the change will appear on the container but usually no tool that I know of also watches that and auto-reloads. I think that *today* the easiest thing to do is just to create a new configmap

kfox1111 · 2017-03-15T21:12:38Z

I can do a proposal. is there a link to how that process works? I thought the feature request was the way that was handled.

today, that functionality is correct, from a pod level but not a ReplicaSet level. If the pod is ever recreated, or a RS is scaled, the new pod behaves suddenly differently then its brethren, based on an event that may have occurred a long time ago but didn't apply to the others. I don't want pods in a single ReplicaSet to behave differently from each other depending on when each pod was scheduled. Nor do I want a rollback of Deployment to have a mix of old config and new config'ed pods in the same RS. This is a recipe for hard to debug problems. Like we both have agreed, you can manually do that today. But, its a lot of work ensure that inconsistency doesn't happen, and that can cause errors to creep in as humans skipping a step by mistake happen. We should make it easy for the user to be able to do the right thing and get an expected/consistent result.

rata · 2017-03-15T21:25:46Z

On Wed, Mar 15, 2017 at 02:13:11PM -0700, kfox1111 wrote: I can do a proposal. is there a link to how that process works? I thought the feature request was the way that was handled.

Well, is the right thing to do if you are planning to code it. Was that your plan? Sorry, maybe I misunderstood. But if you have the time/will, it will be awesome and I can try to help :)

today, that functionality is correct, from a pod level but not a ReplicaSet level. If the pod is ever recreated, or a RS is scaled, the new pod behaves suddenly differently then its brethren, based on an event that may have

Yes and no. That can happen, of course. But that will only happen if you modify the configmap. And in that case, *no* change is done (if no auto-reload) until pods are re-scheduled. So, you do the change and no change happens. From a semantic POV, I think configmaps should be immutable (for those problems) and a new one should be used and create a new deployment. I think we both agree on this. I think we don't agree (I'm not fully convinced, just that) if the user should be exposed to that or not. But going back to the point you said:

Users will tend to do what's easy, and its easier to do things that may break in unexpected ways on them

And I don't think to change the configmap is the easiest way to do it. Because doing it will not change *anything* on the running app if the app doesn't reload the configuration. So, it's not easy to trigger a configuration change like that, IMHO. Am I missing something?

rata · 2017-03-15T21:34:59Z

@kfox1111 ohh, I was on my phone and didn't notice the link @liggitt shown. This is exactly what was proposed in that link: #31701

@kfox1111 see the discussion there, please. It seems it was dismissed for using GC in the last comment, but a lengthy reasoning seems to be through the PR comments (and on the file also): #31701 (comment)

I guess that address this issue for the moment and should be closed, then?

kfox1111 · 2017-03-15T21:36:18Z

Oh. are you saying a PR, not a spec kind of thing?

Ok, I think the confusion is on definition of "app". I think your defining app to be "running pod". I'm defining app to be something akin to Deployment. An app in my world is cloud native, pods come and go automatically as the system autoscales up and down. The members of a ReplicaSet should be atomic through time no matter what actions are preformed scaling, rolling back, machines dieing and pods getting evicted, etc. That is not handled today without a lot of manual work on behalf of the user.

kfox1111 · 2017-03-15T22:05:32Z

I think the difference between this request and the other is around snapshotting and ownership.

I want to own the main configmap and share the one configmap amongst multiple deployments/daemonsets. If I delete all the deployments/daemonsets referencing it, the configmap should stay. At the time of launching one of those, no further access to the main configmap is ever made again. I should even be able to delete the configmap and the atomic configmap references in existence should still continue to work, as they have a snapshot of the state. The snapshots are garbage collected, not the main configmap.

I think this behavior will deal better with package managers such as Helm. The owner of the root configmap is Helm in this case, and wouldn't be deleted out from under the owner.

A separate activity could trigger a deployment redeploy on that root configmap changing, but is totally orthoganal to this feature. Helm could do that itself I think.

I think the ownerReference in the other PR as referenced in #31701 (comment) Doesn't have the same GC same behavior as I'm describing above? Its assuming if no ownerReferences are there, then its free to GC?

It seems like the use case was de'prioritized in the other PR. Is this due to lack of manpower or desire to tackle the issue at the moment? Because I think this will be important to Kolla-Kubernetes and other projects and would be willing to add manpower if that's the issue.

rata · 2017-03-15T22:31:50Z

On Wed, Mar 15, 2017 at 02:36:48PM -0700, kfox1111 wrote: Oh. are you saying a PR, not a spec kind of thing?

I'm saying a proposal just like the linked :)

Ok, I think the confusion is on definition of "app". I think your defining app to be "running pod". I'm defining app to be something akin to Deployment. An

Doesn't matter, what I'm trying to say is that, IMHO, changing a configmap is not an easy way to update the configuration, as you said. Because it just won't happen, configuration won't be updated for MOST apps. And that is were I disagree that, as that is the easiest thing to do, that is what people will do.

rata · 2017-03-15T22:37:38Z

On Wed, Mar 15, 2017 at 03:06:30PM -0700, kfox1111 wrote: I think the difference between this request and the other is around snapshotting and ownership. I want to own the main configmap and share the one configmap amongst multiple deployments/daemonsets. If I delete all the deployments/daemonsets referencing it, the configmap should stay. At the time of launching one of those, no further access to the main configmap is ever made again. I should even be able to delete the configmap and the atomic configmap references in existence should still continue to work, as they have a snapshot of the state. The snapshots are garbage collected, not the main configmap. I think this behavior will deal better with package managers such as Helm. The owner of the root configmap is Helm in this case, and wouldn't be deleted out from under the owner. A separate activity could trigger a deployment redeploy on that root configmap changing, but is totally orthoganal to this feature. Helm could do that itself I think. I think the ownerReference in the other PR as referenced in #31701 (comment) Doesn't have the same GC same behavior as I'm describing above? Its assuming if no ownerReferences are there, then its free to GC?

No. It is configurable if some object is orphan it should be deleted or not, so you can have configmaps independentntly of a deployment if you want. IIUC.

It seems like the use case was de'prioritized in the other PR. Is this due to lack of manpower or desire to tackle the issue at the moment? Because I think this will be important to Kolla-Kubernetes and other projects and would be willing to add manpower if that's the issue.

Not man power but semantics and how the problem should be approached, as far as I understand (take into account that I've contributed to k8s in my spare time only). A proposal was created, so someone was willing to implement it if the interface was agreed, but it was not. It was decided it will be solved in some other way, not exactly equivalent (of course). And as far as I understand, what you propose is really similar to the proposal. But feel free to keep this issue open if you think otherwise (please don't take my word as a last word, my opinion is only mine :))

technosophos · 2017-03-16T18:00:01Z

Helm currently does this manually by versioning each name of a configmap. Obviously, we would benefit from having this implemented lower down the stack. @liggitt @smarterclayton how does OpenShift handle this kind of thing?

bgrant0607 · 2017-03-21T06:37:25Z

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Atomic Configmaps/Secrets #43112

Atomic Configmaps/Secrets #43112

kfox1111 commented Mar 15, 2017

kfox1111 commented Mar 15, 2017

mtaufen commented Mar 15, 2017

liggitt commented Mar 15, 2017

kfox1111 commented Mar 15, 2017

kfox1111 commented Mar 15, 2017

rata commented Mar 15, 2017

kfox1111 commented Mar 15, 2017

rata commented Mar 15, 2017 via email

kfox1111 commented Mar 15, 2017

rata commented Mar 15, 2017 via email

kfox1111 commented Mar 15, 2017

rata commented Mar 15, 2017 via email

rata commented Mar 15, 2017

kfox1111 commented Mar 15, 2017

kfox1111 commented Mar 15, 2017

rata commented Mar 15, 2017 via email

rata commented Mar 15, 2017 via email

technosophos commented Mar 16, 2017

bgrant0607 commented Mar 21, 2017

smarterclayton commented Apr 23, 2017 via email

pmorie commented May 5, 2017

kfox1111 commented Jul 5, 2017

bgrant0607 commented Jul 6, 2017

Atomic Configmaps/Secrets #43112

Atomic Configmaps/Secrets #43112

Comments

kfox1111 commented Mar 15, 2017

kfox1111 commented Mar 15, 2017

mtaufen commented Mar 15, 2017

liggitt commented Mar 15, 2017

kfox1111 commented Mar 15, 2017

kfox1111 commented Mar 15, 2017

rata commented Mar 15, 2017

kfox1111 commented Mar 15, 2017

rata commented Mar 15, 2017 via email

kfox1111 commented Mar 15, 2017

rata commented Mar 15, 2017 via email

kfox1111 commented Mar 15, 2017

rata commented Mar 15, 2017 via email

rata commented Mar 15, 2017

kfox1111 commented Mar 15, 2017

kfox1111 commented Mar 15, 2017

rata commented Mar 15, 2017 via email

rata commented Mar 15, 2017 via email

technosophos commented Mar 16, 2017

bgrant0607 commented Mar 21, 2017

smarterclayton commented Apr 23, 2017 via email

pmorie commented May 5, 2017

kfox1111 commented Jul 5, 2017

bgrant0607 commented Jul 6, 2017