-
Notifications
You must be signed in to change notification settings - Fork 40k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
WIP: Initializers proposal #17305
WIP: Initializers proposal #17305
Conversation
This is very much a WIP, but I saw @davidopp asked for a write-up, and I had an hour this evening to write something up quickly, but I recognize it needs more refining. Even so, I think its a good basic description on the desired pattern/use case, and we can iterate on the actual API type names and REST API syntax. Major thing to think through now is the overall system flow. |
Note: For the scheduler, I think we would want the binding action to fail on non initialized pods to maintain backward compatibility with older schedulers. Latest code would be initializer aware. |
|
||
The primary mechanism to achieve this use-case today is to write an "admission controller" which is a server-side filter that observes API server actions, and provides the ability for an ordered set of filters to observe, modify, accept, or reject an object from being persisted into the API server. By providing the ability to give an immediate yes/no response to the user, it provides a good place for hooks for use-cases that require an immediate user response. For example, the user is able to be told that they are not able to create an object in a namespace without first creating that namespace. Another common example is quota enforcement where the user is immediately forbidden from exceeding any quota constraints. In essence, its a synchronous visitor pattern. | ||
|
||
While powerful, admission controllers have drawbacks. The major drawback is that it requires a recompilation and restart of the API server to introduce new behavior. For those scenarios where an immediate yes/no response was not required for the user experience, it would be ideal if the system could support an asynchronous visitor pattern that would allow controllers external to the API server to observe, and modify a resource prior to it becoming 'active' in the system. This would allow a new type of controller to be authored in the system that can initialize a resource similar to how other controllers watch and act upon already initialized objects. There are many scenarios where this would be appropriate. For example, a pod auto-sizer could be deployed separate from the API server that watches for incoming pods. The pod-autosizer should be able to assign compute resource requests prior to the pod being scheduled. If authored as an "initializer", it would allow anyone to bring their own logic to the cluster for how sizing occurs without needing to recompile or change the core components. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The pod autosizer probably wasn't a great example because we already have this, as a synchronous admission controller: https://github.com/kubernetes/kubernetes/blob/master/docs/admin/admission-controllers.md#initialresources-experimental
The plan for making it customizable is that it will fetch predictions via an API, so you replace the component that provides the predictions. (Today the prediction logic is compiled into the admission controller.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Took pick too much on the example choice, this was really thrown together quickly to give you a document to refer to when discussing the concept.
Have another alternate example use case that would better fit?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that said, I think auto sizer could have been implemented differently if something like what is described here existed. Admission control was used because that is all that existed. Chicken/egg scenario.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's a perfectly fine example. I would just introduce it as: we could turn the existing initial resources admission controller into an initializer.
One other note, put this out there as a point of reference when discussing topic in mid to long term. Right now, implementation of a proposal may not be a priority for me yet in Kube 1.2 timeframe. TBD. It would be good to collect use cases though and have a definitive PR to reference. |
This is a great writeup. I have a few questions/comments
|
Original discussion was in #3585 |
No. Admission controllers would run like they do now and intercept all incoming requests. So let's say an administrator has configured kind=pod with initializer name=foo. User creates a pod A by POST to API Server. Admission control as it does today intercepts that request with verb=CREATE and verifies that pod A's namespace exists, if not it rejects the request. The last admission control step would be to check if pod A has any configured initializers. It does, so it toggles pod A's ObjectMeta.Initialized to false, and sets its ObjectMeta.Initializers list to include 'foo'. The API server persists the object. At this point, the initialization choreography has begun. When 'foo' completes its work, and the object transitions to initialize, I think I would want admission control to be updated again but with a verb=UPDATE. That was my thought at least, may warrant some more thought.
Yes, it's entirely possible that initializers go down and don't see the object pending initialization. Its reasonable to then want to purge them if they are stuck for extended periods of time. This controller would provide that future intelligence.
None - I think it complicates things. Maybe others feel differently. I imagine the set of initializers to be very small, and that they change really infrequently.
I imagine the controller that transitions ObjectMeta.Initialized=false to true to be in controller-manager. Otherwise, I see no reason why new initializers are not just deployed themselves as pods that work against the kube-apiserver that have a valid cert to talk back. An RC makes sure the pod is running somewhere. EDITED - they do not require to be given 'super access', basically policy could scope an initializer to a select set of group/kinds. |
I see. But aren't there some admission controllers you really want to run after all initializers? For example, let's say you have an "initial resources" initializer like you describe in the doc, but you want an admission controller to enforce quota. It seems that it would have to run after the initializer, since there are no requests until after the initializer has run.
My point is that there are various scenarios where they do need to know about one another. Some rules for being "well-behaved" cannot be defined in isolation, only on the context of which other initializers run. I gave a few examples in (3)(ii). Being small or changing infrequently is orthogonal. You can say the admin must sort this out, which is fine, my point is just that it can't be a free-for-all. |
@davidopp - existing admission controllers would need to be evaluated individually, but in the quota case, its capable of handling pod resource requests updates so the behavior would be fine. My current thought is that no initializer should wait to run last. If coordination was needed across a set of them, then I assume that set is deployed together and knows about each other. If one initializer knows it must run after another, I assume that that initializer a) has verified that the other initializer was actually registered and maybe logs to journal if it wasn't for admin attention, and b) it just does nothing if the other never does its thing, which would cause the controller I discussed to basically time-out the object and delete it altogether. |
Ah, so IIUC admission controllers could run on each I can see how this can work in the quota case, but I'm not sure this works in general (as a substitute for running them after initializers run). For example, say you want to enforce a rule that all pods must have field X set. You can only check this after all initializers have run. Checking it on object creation (before initializers have run) or after each initializer runs doesn't work. |
Is the enforcement of the rule in validation code or somewhere else? I do not think admission controllers would run on
I think they would run on the final step that induces strict validation (rather than just valid ObjectMeta).
|
@davidopp - your questions are good, when I have more time, I will have to document the flow between admission control and initializers similar to what I had to do with namespace finalizers here: |
|
||
**NOTE: It's possible we could support this without introduction of a new type, maybe via the ConfigData resource, needs thought and input from community** | ||
|
||
**ObjectMeta.Initializers** |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't want to pollute ObjectMeta with initializers, finalizers, existence dependencies, controllerRef, etc. I was thinking of a new top-level clause, perhaps called dependencies
. If not at top level, I'd at least group these fields in metadata under it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@bgrant0607 - just to make sure I understand, objects would have a new DependencyMeta
that is embedded? That's fine, the idea is just that we need a field to record who still has work left to do.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Concretely, using Pod as an example:
type Pod struct {
unversioned.TypeMeta `json:",inline"`
ObjectMeta `json:"metadata,omitempty"`
Dependencies ObjectDependencies `json:"dependencies,omitempty"`
Spec PodSpec `json:"spec,omitempty"`
Status PodStatus `json:"status,omitempty"`
}
type ObjectDependencies struct {
Initializers map[string]string `json:"initializers,omitempty"`
Finalizers map[string]string `json:"finalizers,omitempty"`
ExistenceDependencies []ObjectReference `json:"existenceDependencies,omitempty"`
ControllerRef *ObjectReference `json:"controllerRef,omitempty"`
...
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Random observation: The term "finalizer" got redefined somewhere between being proposed (#3586) and being implemented. It initially meant what we now call initializer. The current use of the terms is completely reasonable, this just threw me a bit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might make sense to more loosely structure dependencies - they are
closer to annotations to me, and I'd like to be able to attach limited
metadata with some dependencies (for instance, representing triggers that
need an object reference and a few bits of data). We can serialize it as
an annotation, but then generic clients lose the directed edge relationship.
On Nov 16, 2015, at 7:42 PM, Brian Grant notifications@github.com wrote:
In docs/proposals/initializers.md
#17305 (comment):
+While powerful, admission controllers have drawbacks. The major drawback is that it requires a recompilation and restart of the API server to introduce new behavior. For those scenarios where an immediate yes/no response was not required for the user experience, it would be ideal if the system could support an asynchronous visitor pattern that would allow controllers external to the API server to observe, and modify a resource prior to it becoming 'active' in the system. This would allow a new type of controller to be authored in the system that can initialize a resource similar to how other controllers watch and act upon already initialized objects. There are many scenarios where this would be appropriate. For example, a pod auto-sizer could be deployed separate from the API server that watches for incoming pods. The pod-autosizer should be able to assign compute resource requests prior to the pod being scheduled. If authored as !
an "
initializer", it would allow anyone to bring their own logic to the
cluster for how sizing occurs without needing to recompile or change
the core components.
+
+## Goals
+
+1. Enable asynchronous controller pattern (i.e. initializers) as a viable alternative to existing admission control to support loose coupling of components where synchronous user response is less critical.
+2. Enable registration of a set of initializers to an API resource that must be able to act on an object to support its initialization.
+
+## Model changes
+
+Initializer
+
+A new type introduced into the API that allows the registration of an initializer with an associated group and kind in the API. This allows dynamic registration of initializers without a forced restart or recompilation of the server.
+
+NOTE: It's possible we could support this without introduction of a new type, maybe via the ConfigData resource, needs thought and input from community
+
+ObjectMeta.Initializers
Concretely, using Pod as an example:
type Pod struct {
unversioned.TypeMeta json:",inline"
ObjectMeta json:"metadata,omitempty"
Dependencies ObjectDependencies json:"dependencies,omitempty"
Spec PodSpec json:"spec,omitempty"
Status PodStatus json:"status,omitempty"
}
type ObjectDependencies struct {
Initializers map[string]string json:"initializers,omitempty"
Finalizers map[string]string json:"finalizers,omitempty"
ExistenceDependencies []ObjectReference
json:"existenceDependencies,omitempty"
ControllerRef *ObjectReference json:"controllerRef,omitempty"
...
}
—
Reply to this email directly or view it on GitHub
https://github.com/kubernetes/kubernetes/pull/17305/files#r45007826.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As for why maps for initializers and finalizers, see the annotations used in the multi-scheduler proposal:
https://github.com/kubernetes/kubernetes/blob/master/docs/proposals/multiple-schedulers.md#where-to-start-initial-design
Key is ad hoc (e.g., "scheduler") or field name (e.g., spec.nodeName) and value is an identifier for the responsible component that will remove the initializer/finalizer.
@smarterclayton Could you make a concrete proposal or example?
I'd imagine using the maps I proposed for delegating responsibility and gating activation or deletion (depending on whether initializer or finalizer), and additional configuration could be dropped into annotations or an improved form of extension (#12226). OR the additional policy information could be specified elsewhere along the lines of #17097 or #17324
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@bgrant0607 - I have debated having something more akin to an InitializerCondition
where the name
is ad-hoc (e.g. "scheduler"), it has a flag to say if it's done (eg. "Complete") and potentially a Message to report why its not yet done to aid in error reporting.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the Condition-like approach.
I think we also concluded that performing the initializers in order makes sense, similar to how admission controllers are in order.
|
||
It introspects the set of registered *Initializers*, and uses the *Discovery API* and a generic API client to know what resource types to WATCH that are not yet initialized. It then creates a set of watchers that look for objects whose list of "initializers" is empty, but the object is not yet "initialized". | ||
|
||
It then performs an "/api/initialize" action for each resource that meets the required criteria. If the API action succeeded, the object is now initialized for use by the system. If the API action fails, it's considered to be in an error state that is unrecoverable. When this occurs, the controller will permanently delete the resource from the API server and log an event because the object would forever fail strict validation. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Deleting the object seems pretty hostile to users. Assume there's an errant initializer, what is the user experience: I created a Pod, it was accepted, but it never started and when I went to look for it, it was gone.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@thockin - is time the factor here? if an initializer is errant for pods, its possible it would prevent all pods from ever being created. basically, this was a catch-all for what to do if the initialized object does not pass strict validation. seems like information in the event listing is a fine place to start debugging?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@derekwaynecarr You want to defer vaildation until after initialization? Another approach would be to only allow optional fields to be initialized asynchronously.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@bgrant0607 - we do not make that same statement with admission control today, so I didn't want to create a limitation with initializers that may prohibit their future adoption.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Won't the json marshaller on the client side demand that required fields be filled in? That's true for proto.
A trickier case might be optional but immutable. Those we'd have to make write-once.
But, yes, validation on a field can't be run until the field has been initialized.
A quick status update:
I am targeting an updated draft to be completed by Jan 4 (hopefully sooner) |
|
||
Objects like a ReplicationController should treat a non-initialized pod as a pod, meaning it should not create new ones to satisfy its desired replica set. | ||
|
||
Objects like the Scheduler should treat a non-initialized pod as a pod not yet ready to be scheduled. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The proposal was to not return not-fully-initialized resources in list/watch by default, with a query param to request all/uninitialized.
A longer-term way to deal with pre-initialization and post-deletion could be to dissociate the resource name from the resource, by storing the resource by uid.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about get? Probably has a similar impact if a naive client is polling
for the existence of a specific service by name (kube-dns) in a loop - when
we introduce this path, that client is broken because the service isn't
initialized yet.
On Thu, Jan 7, 2016 at 9:33 PM, Brian Grant notifications@github.com
wrote:
In docs/proposals/initializers.md
#17305 (comment)
:
+NOTE: I will go back and fill in more details here
+
+## Making the transition to initialized
+
+A new controller is introduced into the system that looks for objects that have no more named "initializers" but are not yet "initialized".
+
+It introspects the set of registered Initializers, and uses the Discovery API and a generic API client to know what resource types to WATCH that are not yet initialized. It then creates a set of watchers that look for objects whose list of "initializers" is empty, but the object is not yet "initialized".
+
+It then performs an "/api/initialize" action for each resource that meets the required criteria. If the API action succeeded, the object is now initialized for use by the system. If the API action fails, it's considered to be in an error state that is unrecoverable. When this occurs, the controller will permanently delete the resource from the API server and log an event because the object would forever fail strict validation.
+
+## Existing controller manager components
+
+Objects like a ReplicationController should treat a non-initialized pod as a pod, meaning it should not create new ones to satisfy its desired replica set.
+
+Objects like the Scheduler should treat a non-initialized pod as a pod not yet ready to be scheduled.The proposal was to not return not-fully-initialized resources in
list/watch by default, with a query param to request all/uninitialized.A longer-term way to deal with pre-initialization and post-deletion could
be to dissociate the resource name from the resource, by storing the
resource by uid.—
Reply to this email directly or view it on GitHub
https://github.com/kubernetes/kubernetes/pull/17305/files#r49153363.
Can one of the admins verify that this patch is reasonable to test? (reply "ok to test", or if you trust the user, reply "add to whitelist") If this message is too spammy, please complain to ixdy. |
Reassigning to myself so I don't lose track of this. |
GCE e2e build/test passed for commit 0afd165. |
@derekwaynecarr I'd like to get this proposal merged. Do you have time to finish the proposal? |
@bgrant0607 - my priority this week is to finish up loose ends around the systemd node spec (#17688) and out of resource killing (#18724) proposals. I will make another pass on this proposal so it captures the collected feedback next week. |
Automatic merge from submit-queue Adding a proposal for server-side cascading deletion @lavalamp @bgrant0607 cc @kubernetes/sig-api-machinery @derekwaynecarr ref #12143 #19054 #17305 (initializer proposal)
This PR hasn't been active in 111 days. Feel free to reopen. You can add 'keep-open' label to prevent this from happening again. |
Is this idea dead or just stalled for lack of drive? The topic is re-emerging as part of the cloud-provider excision work, wherein we don't want apiserver to know anything about cloud providers, but we need the cloud-provider-controller to ACK creation/defaulting of resources. |
I think it's so disruptive of a change that no one can get their heads Would need a real API break. On Nov 4, 2016, at 12:56 PM, Tim Hockin notifications@github.com wrote: Is this idea dead or just stalled for lack of drive? The topic is — |
From 10k feet, it's not that different from finalizers though, right? |
That is my feeling. There has to be something simple that serves a lion's An admission controller that reads config and applies initializer names to On Nov 6, 2016 11:12 AM, "David Oppenheimer" notifications@github.com
|
I know @lavalamp is interested in this but I'm not sure how high it is on his priority list. |
Initializers on pods break the pods API (schedulers have to change, and I'm somewhat convinced we need something like initializers (@ncdc's work on On Nov 6, 2016, at 11:19 AM, David Oppenheimer notifications@github.com I know @lavalamp https://github.com/lavalamp is interested in this but — |
I think we're still very interested in this. I thought at Kubecon that @smarterclayton said he could find some people to work on this next quarter? |
I'm currently drafting a proposal for this and externalized admission On Fri, Nov 18, 2016 at 2:38 PM, Daniel Smith notifications@github.com
|
So, I'm trying to get up to speed on initializers/external-adm ... are we saying that the external ADM controllers are:
|
This is a proposal for introducing the concept of an "initializer" into the API.
It is intended to provide a pattern for asynchronous initialization of the object to support the ability for a new class of controller to be authored that can modify an object prior to it becoming fully active or passing strict API validation.
/cc @bgrant0607 @erictune @ncdc @davidopp @lavalamp @smarterclayton