Add a proposal for ConfigMap/Secret garbage collection #1163
Conversation
// (such as Jobs).
// This can only be specified at creation time.
// Defaults to false.
ControllersAsOwners *bool
This feels yucky and introduces undesirable coupling in the system. How will you deal with extension resources? This won't work for extension controllers, which is a red flag.
I don't really like the idea of controllers being special on config maps.
I agree with Clayton here. If we add a field to the object, then every time we want to extend this mechanism to new resources that do not have any controller-controlled history, we'll be changing the APIs of those resources. Why not just have an annotation that would make an object eligible for GC? This will work fine with every type of object, including CRDs, etc.
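To make the annotation idea concrete, here is a minimal Go sketch, assuming a hypothetical annotation key (the proposal defines no such key). Because it only reads object metadata, the same check works for ConfigMaps, Secrets, or any CRD:

```go
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// Hypothetical annotation key; illustrative only.
const gcEligibleAnnotation = "kubernetes.io/gc-eligible"

// isGCEligible reports whether an object has opted in to garbage
// collection. It only touches ObjectMeta, so it works uniformly for
// any resource type, including extension resources.
func isGCEligible(obj metav1.Object) bool {
	return obj.GetAnnotations()[gcEligibleAnnotation] == "true"
}

func main() {
	cm := &corev1.ConfigMap{
		ObjectMeta: metav1.ObjectMeta{
			Name:        "app-config-v2",
			Annotations: map[string]string{gcEligibleAnnotation: "true"},
		},
		Data: map[string]string{"key": "value"},
	}
	fmt.Println(isGCEligible(cm)) // true
}
```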
I was even thinking this could be a nice way for users to get their resources garbage collected if we combine this annotation with an ownerRef on each object. All we'd need to do is enable users to set them easily. I doubt there's such a way currently. Obviously, where possible, the controllers should handle those situations automatically. Then adding new resources should be pretty straightforward.
> Why not just have an annotation that would make an object eligible for GC?
That sounds interesting. In the past, we've regretted starting things out as annotations if they eventually were destined to be fields. But maybe this is a case where it will always make sense as an annotation, like the ones that Deployment puts on ReplicaSet.
After all, this flag is not really about affecting the ConfigMap itself. It's just a signal to other entities, affecting how they treat the ConfigMap. That sounds more like an annotation than a field to me.
@janetkuo What do you think?
@smarterclayton Extension controllers would need to implement the same logic mentioned in the "controller" section in order to support this feature. The assumption here is that CMs are treated as immutable, and that users would keep creating new CMs and then update the references in the controller pod template. This proposal tries to help users garbage collect those kinds of CMs.
@soltysh @enisoc What would such an annotation look like? If it's just a boolean value, it does not affect the design much. We need to allow users to opt in and out of CM GC, and setting a field seems more straightforward than an annotation. In order to make this bit immutable, making it a field is easier because it can be protected by validation (you cannot make an annotation immutable).
By "combining this annotation with owerRef on each object", did you mean switching this field to an annotation, or having the garbage collector check both the annotation and ownerRef to decide if this CM should be GC'd or not? If the former, the design is the same; if the latter, we'd need to change the behavior of the garbage collector (which would likely be non backward compatible).
This case is mostly needed for ConfigMap and Secret, which are "consumed" by other resources.
I don't like the name of the field, but the idea is that referrers should take ownership -- the resource should be GCed when no longer needed. It's actually a hint to consumers. It could be an annotation, but that wouldn't be very discoverable.
If we had split out the PodTemplate from controllers and used it for controller history universally, this would be easier, and could work OOTB for extensions.
As it is, this does work for extensions, but requires extension implementors to do some work.
BTW, I believe this hint would be useful even without the history issue.
Since it's only a hint (true means it's ok to do it, but doesn't guarantee anything), and the thing it allows is for the object to be deleted when it's no longer referenced, how about AllowRefCount?
When a controller API object (with or without history) exists, the controller
will do the following:

1. Find all referenced ConfigMaps/Secrets of the following:
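As a sketch of what step 1 could involve, here is one way to collect the ConfigMaps a pod template references through the standard places (volumes, envFrom, env); a real implementation would also cover Secrets, init containers, and projected volumes:

```go
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
)

// referencedConfigMaps lists the ConfigMaps a pod spec references.
func referencedConfigMaps(spec *corev1.PodSpec) []string {
	var names []string
	for _, v := range spec.Volumes {
		if v.ConfigMap != nil {
			names = append(names, v.ConfigMap.Name)
		}
	}
	for _, c := range spec.Containers {
		for _, ef := range c.EnvFrom {
			if ef.ConfigMapRef != nil {
				names = append(names, ef.ConfigMapRef.Name)
			}
		}
		for _, e := range c.Env {
			if e.ValueFrom != nil && e.ValueFrom.ConfigMapKeyRef != nil {
				names = append(names, e.ValueFrom.ConfigMapKeyRef.Name)
			}
		}
	}
	return names
}

func main() {
	spec := &corev1.PodSpec{
		Volumes: []corev1.Volume{{
			Name: "cfg",
			VolumeSource: corev1.VolumeSource{
				ConfigMap: &corev1.ConfigMapVolumeSource{
					LocalObjectReference: corev1.LocalObjectReference{Name: "app-config-v2"},
				},
			},
		}},
	}
	fmt.Println(referencedConfigMaps(spec)) // [app-config-v2]
}
```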
Somewhat of a red flag in that it gives those controllers access to all secrets in the system.
It would be better to have a controller that copies config maps and then makes mutations to controllers, without the controllers being aware of it (for that particular problem).
I like the idea of keeping CM/Secret mutation outside the workload controllers if possible. It would be nice, for example, if we could have a controller that does nothing but watch ControllerRevision objects and add ownerRefs to the CMs/Secrets that they reference (if those are opted in to CM GC). However, since ControllerRevision currently only contains opaque data, we would need to modify each controller (DS/StS) to put in a list of the CMs/Secrets that each ControllerRevision references.
We're also planning to consult sig-auth to see how concerned they are about such permissions, given that these controllers can already create Pods, which transitively gives them access to (at least read) all Secrets.
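A rough sketch of the core action such a controller would perform, adopting a ConfigMap on behalf of a ControllerRevision (written against today's apps/v1 types; the helper name is illustrative):

```go
package gc

import (
	appsv1 "k8s.io/api/apps/v1"
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// adoptConfigMap appends an ownerRef so the ConfigMap is garbage
// collected once every ControllerRevision referencing it is gone.
// Controller is left unset (false): the revision keeps the ConfigMap
// alive but does not manage it.
func adoptConfigMap(cm *corev1.ConfigMap, rev *appsv1.ControllerRevision) {
	for _, ref := range cm.OwnerReferences {
		if ref.UID == rev.UID {
			return // already adopted by this revision
		}
	}
	cm.OwnerReferences = append(cm.OwnerReferences, metav1.OwnerReference{
		APIVersion: "apps/v1",
		Kind:       "ControllerRevision",
		Name:       rev.Name,
		UID:        rev.UID,
	})
}
```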
If it's not allowed to give controllers access to secrets in the system, we probably cannot use ownerRefs for secret GC at all.
From the docs, if you can create a Pod that uses the secret, you can access the secret. This seems transitive through controllers today: if I can create a Deployment that creates Pods that can access the secret, I can access the secret. There is nothing to stop a user from creating a Deployment that reads a secret and serves it via HTTP. Giving a controller the ability to mutate the secret adds an additional risk to integrity (of the secret itself, not the data it's protecting), but you could (in some cases) address this by creating a secret per Deployment. We probably don't want to give controllers delete permissions. That defeats the purpose of the GC, and is an unnecessary risk to availability.
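In RBAC terms, the split described here might look like the following sketch (no controller ships with exactly this rule): read and mutate, but never delete.

```go
package gc

import rbacv1 "k8s.io/api/rbac/v1"

// secretRule sketches the permission split: the controller may read
// and mutate secrets (e.g. to set ownerReferences), but deletion is
// deliberately withheld and left to the garbage collector.
var secretRule = rbacv1.PolicyRule{
	APIGroups: []string{""},
	Resources: []string{"secrets"},
	Verbs:     []string{"get", "list", "watch", "update", "patch"},
	// "delete" is intentionally omitted.
}
```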
We deliberately don't enforce hard referential integrity.
kubectl currently doesn't create/update resources in any particular order, and controllers may observe changes out of order even if it did.
However, I would like controllers, such as Deployment and ReplicaSet, to behave gracefully in the absence of dependencies, such as ConfigMap and Secret. For example, possibly they shouldn't create lots of pods that can't function.
If a resource can't function properly due to a nonexistent dependency, I recommend reporting that in status as the reason the resource isn't fully functional.
This would require at least being able to confirm the resource's existence or lack thereof.
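For example, a sketch of surfacing a missing dependency through a status condition, using the shared metav1.Condition type; the condition type and reason names here are made up:

```go
package gc

import (
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// missingDependencyCondition reports a nonexistent ConfigMap as the
// reason a resource is not fully functional, instead of failing
// silently. "DependenciesReady" and "ConfigMapNotFound" are
// illustrative names, not an existing convention.
func missingDependencyCondition(cmName string) metav1.Condition {
	return metav1.Condition{
		Type:               "DependenciesReady",
		Status:             metav1.ConditionFalse,
		Reason:             "ConfigMapNotFound",
		Message:            "referenced ConfigMap " + cmName + " does not exist",
		LastTransitionTime: metav1.Now(),
	}
}
```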
If we're concerned about permissions, we could create a subresource for updating ownerReferences, finalizers, and possibly other metadata in order to facilitate lifecycle management without privilege escalation.
That sounds interesting... filed kubernetes/kubernetes#54269.
**Cons:**

* It becomes tricky for controllers (e.g. Deployments) to compare their pod
template against their history (e.g. ReplicaSets) to find the current history
Have something mutate the deployment - which avoids this problem. Alternatively, make the deployment tolerate refinement of its ReplicaSets by admission and initializers (this is inevitable - image resolution is one concrete example we'll get asked to do).
If something outside mutates the Deployment, will we need a way to prevent the Deployment controller from seeing the original CM ref before it's resolved to point to a copy?
I think the trigger controller (which is what I would call the thing acting on the resource) is the one that resolves the refs. That means none of the existing controllers change.
I'm partially advocating for this approach because I have direct familiarity with it from several years of working on the OpenShift image triggering mechanism. The mechanism is:
- a resource exposes a "latest state value" (an image stream tag, which points to a concrete image by digest)
- a controller that reconciles the latest state value with the spec of a deployment
https://github.com/openshift/origin/blob/master/pkg/image/controller/trigger/image_trigger_controller.go
https://github.com/openshift/origin/blob/master/pkg/image/trigger/annotations/annotations.go
https://github.com/openshift/origin/blob/master/pkg/oc/cli/cmd/set/triggers.go#L809
The user sets an annotation on any object (deployment, stateful set, daemon set, replica set) pointing to an image stream tag (roughly foo/bar:latest, which resolves to sha256:abc). The controller is level driven and ensures that all resources with that annotation have the correct resolved value in their config. The mechanism can be paused by updating the annotation. Otherwise, any watched resource can be mutated.
The key advantages are that this can be extended to any object (it could be used for an etcd operator, a new workload controller, etc) just by adding the annotation. The downside is that the object is under automatic control. However, this is similar to the autoscaler - a controller is programming down onto the object and the object is unaware of its programming.
A similar model for secret and config maps would be:
- support an annotation on the deployment that indicates it wishes to receive the latest copy of a configmap, and give a volume name (or jsonpath, or other target) to replace
- the controller watches the configmap and ensures:
  - a versioned configmap exists for every targeted configmap
  - every resource referencing that versioned configmap is pointing to the latest copy of the configmap
Rollback is accomplished by reverting the old configmap, pausing the trigger on a per-resource basis, or removing the annotation. The same advantage exists here - no changes to existing controllers are required, all logic is centralized, and all extension / CRD objects can benefit from this.
The per-controller approach has merits, but I think its biggest weakness is that it requires a large amount of duplicate code that could be better centralized. Annotations are annoying - but if necessary we could introduce future changes that make this a formal field on an object (it's unlikely a common schema exists), or we could place it on a config map.
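To make the contract concrete, a sketch of what such a trigger annotation might carry, by analogy with OpenShift's image trigger annotation; the key and schema below are hypothetical:

```go
package gc

// Hypothetical annotation key carrying the trigger spec as a JSON
// list, by analogy with image.openshift.io/triggers.
const configTriggerAnnotation = "alpha.config.k8s.io/triggers"

// configTrigger describes one watched ConfigMap and the location in
// the workload's pod template that the trigger controller keeps
// pointed at the latest versioned copy.
type configTrigger struct {
	ConfigMapName string `json:"configMapName"`
	FieldPath     string `json:"fieldPath"` // e.g. a JSON path to a volume's configMap.name
	Paused        bool   `json:"paused,omitempty"` // set to pause triggering, e.g. for rollback
}
```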
The controller model for this is roughly what is described by @janetkuo down below, just automated. I believe it's a natural next step to automate (we had been planning on doing just that for other reasons), which makes me lean towards GC mechanisms that align with that model.
How does GC work in this case? The solution @janetkuo is proposing ensures that the owner references of the config objects point to the objects that implement history for a controller. As long as the config is within the revision history limit, you can be assured that it still exists and that you can roll back to it. Also, how do you ensure that the config injection controller updates the configuration of the ReplicaSet before the ReplicaSet controller materializes Pods that point to a nil configuration?
I agree with @kow3ns. Whether the references are updated by a controller or by a non-continuously running client, the GC implementation is orthogonal and the same.
Updating the image field to point at the digest in the deployment pod template is analogous to updating references to ConfigMaps and Secrets.
template against their history (e.g. ReplicaSets) to find the current history
(ReplicaSet).
* Rollbacks trigger updates to the original ConfigMaps. All consumers of a
ConfigMap/Secret are then forced to do rollbacks together.
I think this section is the right approach - lift the responsibility for cloning CM/S up into a top level controller and have it selectively mutate top level controllers as requested. Define a linkage from a controller object back to a cm template, and make the new controller able to work with many different API types (possibly by using unstructured json path on annotations).
One of the goals of CM rollout is that it should become possible to roll back a Deployment, and have it revert to the exact, known-good combination of CM revision and Pod Template.
Once a Deployment is rolled back, how will the top-level "CM update" controller know it shouldn't bump it back to the newest CM revision?
You need to turn off CM update if you want to roll back. You were going to have to do that anyway, because the controller for cm update could bump you at any time resulting in a new deployment. If you want to roll the state back to a previous version, you either revert the config (in which case this approach works the same as before) or you disable trigger of the configmap and then rollback.
Another advantage of this approach - the deployment is exactly exportable (to a known good state) whereas in the other approach the user would have to export the deployment and the replica set.
The benefit of the model that @janetkuo proposes is that the entire configuration is encapsulated in the config map itself. For any of the controllers, if you point them to a previous config map or secret, they will detect the identical previous state, and perform a minimally disruptive rollback. If you export the Deployment, with a reference to the config map, and reapply it, you will get the same behavior. Also, both approaches can be used in conjunction with kubectl rollback. I think the proposed approach is actually more flexible.
From Declarative application management in Kubernetes#ConfigMap and Secret updates, Issues related to Declarative App Management#Facilitate ConfigMap rollouts / management, and previous discussions in kubernetes/kubernetes#22368: going forward we plan to facilitate ConfigMap and Secret rollouts by treating CM/S as immutable, having client-side tooling generate unique CM/S names (implemented in kubernetes/kubernetes#49961), updating references in the controller pod templates to use the new CM/S, and then triggering a rolling update on the deployment. The missing piece for now is the GC of those CM/S. That said, auto-updating CM/S is not something we plan to support.
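The unique-name generation works by deriving a suffix from the CM contents, so a data change produces a new object rather than an in-place update. A simplified sketch (kubectl's generator hashes a serialized form of the whole ConfigMap, not just the data map):

```go
package main

import (
	"crypto/sha256"
	"fmt"
	"sort"
)

// uniqueName derives a deterministic, content-addressed name so that
// changing the data yields a new ConfigMap instead of mutating the
// old one. Simplified relative to kubectl's actual hashing.
func uniqueName(base string, data map[string]string) string {
	keys := make([]string, 0, len(data))
	for k := range data {
		keys = append(keys, k)
	}
	sort.Strings(keys) // deterministic ordering
	h := sha256.New()
	for _, k := range keys {
		fmt.Fprintf(h, "%s=%s;", k, data[k])
	}
	return fmt.Sprintf("%s-%x", base, h.Sum(nil)[:5])
}

func main() {
	fmt.Println(uniqueName("app-config", map[string]string{"k": "v1"}))
	fmt.Println(uniqueName("app-config", map[string]string{"k": "v2"}))
}
```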
The current design can be implemented in a backward-compatible way. Therefore, this should not block workloads controller GA (1.9). To finalize the ConfigMap/Secret GC design, we also need a more concrete Declarative App Management design.
cc @yliaog
If this is still active, it should be converted to a KEP in the enhancements repo.
/close
@liggitt: Closed this PR. In response to this:
> If this is still active, it should be converted to a KEP in the enhancements repo. /close
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
Original doc: https://docs.google.com/document/d/1RkB1HEz8hWSmHQp8_spDFHj498eARBz4EjPvS2zVfAg/edit (join https://groups.google.com/forum/#!forum/kubernetes-sig-apps for access)
ref kubernetes/kubernetes#22368
@kubernetes/sig-apps-api-reviews @erictune