-
Notifications
You must be signed in to change notification settings - Fork 40k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Proposal: API support for diff'ing desired state #1178
Comments
Bonus points: Add a mechanism to GET just part of an object, such as just the DesiredState or just the CurrentState, such as using URL parameters to specify the filter (?field=DesiredState&field=Metadata). That would reduce overhead, as well as being convenient. |
I'm really not a fan of the PrimaryDesiredState/FallbackDesiredState/DesiredState split. Are we sure this will be enough? Why not a general stack of DesiredState? This will turn into CSS. If we need to deal with the "replica count" issue, this seems to me to be a feature of the config system. It should support a "don't modify" value for certain fields so it doesn't clobber other systems. I am a fan of using different schemas for desired vs. current state. While there is a lot duplicated across these having fields that only apply to one but not the other is really confusing. |
@jbeda We don't need a stack. How about just the bit? I think a systematic approach is better than ad hoc addition of secondary shadow fields, some of which are visible in the API and some of which aren't, sentinel values interpreted as UNSET, and other non-uniform approaches. It's not just replicas, but also resources, fields with default values, perhaps ports, ... If this information were only maintained by config clients, then the addition of new automatically set fields would break existing configurations. |
Another option might be to use the same object for Primary / Desired, but instead make those alterations of the way you fetch an object. So instead of
I base this on the use case of getting the fallback being a fairly uncommon operation that UIs might use - you're trying to get an alternate representation of the object for a certain type of user scenario. |
Is "ID" metadata or primary data? I feel like it's primary, whereas everything else in metadata could be optional (even kind, because in theory you know the kind when you asked for it from the server). |
@smarterclayton Overloading the object/field would make it impossible to specify both some primary and fallback fields in the same POST or PUT. I agree that GET would likely only want to fetch one view, which is why I proposed filters for that purpose. IMO, name, namespace, ID, labels, creationTimestamp, kind, resourceVersion, etc. are all identifying metadata, used to find and manipulate objects, but aren't primary configuration data, nor current operational state. Additionally, once created, name and ID are immutable, so they might as well be left out of desired state. |
From a practical perspective I can't say I've ever seen an API that managed to do this well (or really every tried to model it as part of the standard rest resource). That give me a bit of pause when considering making the resource be a composite of these multiple layers. I agree with the general design point of shadow values, but at the same time they seem much less of a common use case than the simple "hey, I want to write a simple client to introspect the actual value". If most clients don't need shadow values (or can get away with being naive about it), is the extra complexity good or bad? |
For all practical purposes, internally we have 2 types of workload: services and batch (e.g., mapreduce). The overwhelmingly vast majority of services, even simple ones, are managed using declarative configuration. It's a pretty important pattern to support well. How about just the 1 bit per field to indicate primary or not? It's somewhat less expressive, but is simpler and less onerous to implement, and could be emulated in client-side glue to support APIs that aren't compliant. An even more minimal approach would be to allow clients to attach non-identifying key-value metadata to our API objects. The config system could then use that to store the primary field info. We'll likely want that, anyway, though I feel using it for this would create an informal API between automation systems, such as between the config system and auto-scalers. |
Why a named field for JSONBase? Lists have JSONBase because the encode/decode logic requires it. Re: automatically assigned "default" values. Why is it important to know @smarterclayton 'Is "ID" metadata or primary data?' I ma also wrestlng a Re: Labels in JSONBase. If JSONBase describes the REST object, rather than My mind keeps driving towards a metaphor of inodes and dirents. JSONData Brian, can you get a bit more detailed about "1 bite per field" ? How On Thu, Sep 4, 2014 at 4:11 PM, Clayton Coleman notifications@github.com
|
@thockin Why a named field for JSONBase? To more easily segregate it from desired and current state, to filter or select its fields, and to treat the collection of fields uniformly across API objects collectively, even as we add new metadata fields. Why NOT a named field? Anyway, I'm not hard set on this one. Re. operations other than GET of a single object: Maybe we need RESTObjectBase, which includes JSONBase. Re. 1 bit per field: The minimum information that needs to be tracked is which fields should be diff'ed (as mentioned above, keeping track of which ones should not be diff'ed instead is problematic). The representation depends on the complexity of the structures of the types we care the most about, such as resource requests. I suppose it could be as simple as an array of the field names. A map of bools would also work, though there's not much point in specifying the false case, unless we simply wanted to mirror the full DesiredState schema. The configuration mechanism would produce a representation (whichever we choose) of all the fields it has user-provided values for, union that with the representation in the server, and diff the resulting set of fields. |
In the case where we have the three facets of the object's properties (effective desired, user desired, fallback desired), the user desired properties must be nullable. That means that all clients have to deal with that complexity (naive clients will set all of them, maybe, and defeat the purpose of the fallbacks) which adds a burden to client authors. I don't believe it's sufficient (I think you mentioned this in the config proposal) to just do a diff because when values are equal intent is still not preserved. Approaching this from the perspective of encouraging client authors to do the right thing - it should be easy (trivial?) to write a client that conveys user desired intent, and to preserve shadow values. It should be possible to atomically update shadow values and user intent values in the same request. It should be clear to a client author how to map user input into those values (both for the naive "i'm scripting this in curl" and the advanced "i'm building the uber ui for this"). Desired, Fallback, and Effective seem like the same struct viewed differently. They are different representations of the same "object" (but maybe not the same "resource"). By default, GET really wants to show Effective. If I GET then PUT, a RESTful client should not clobber my fallback values. Fallback should (?) be sparse - the attributes listed in Fallback should represent values set in preference to the system defaults. As per the discussion about platform supplied defaults, validation of incoming Desired needs to take Fallback into account. Platform supplied values comprise a hidden fourth struct - the defaults applied by the platform underneath fallback, which are not persisted into objects in the store. |
@smarterclayton To paraphrase what I think you're suggesting:
Is that correct? That works for me. Yes, you're right that diff'ing just the fields currently and/or previously set isn't very robust, and not setting a field in the configuration should typically unset that field in the corresponding system object. However, this implies that modifying primary desired state of config-managed objects through a non-config-aware client will produce changes that will show up in the next diff with configuration, and likely would be clobbered by the next config push. That's Working as Intended, IMO. An automation client that wants changes to stick should be setting fallback values. |
Honestly, I think we should leave all of this out of the API itself. We shouldn't assume there is only one config system. Instead, we should set the patterns that config systems should only tweak/compare the parameters that they know about. Anything else should be left unmolested and blank. If we must put this in the API I would support:
I'm 90% sure that we'll regret having a singleton overlay layer -- we should either do this whole hog (with the complexity and confusion in the API that it implies) or we should leave it for higher layers. |
@jbeda I agree that we should allow for multiple implementations of config systems, but any given deployment should only be managed by one, since there needs to be a single authoritative source of the desired state. I also agree that users/clients that don't want to deal with it shouldn't need to worry about it. I think @smarterclayton's proposal addressed that. What counts as "parameters they know about"? If only explicitly set fields, then how would one unset a field? A separate database of fields that should be diff'ed or not diff'ed could create circular dependencies (if the database were run on k8s), as well as being difficult to keep up to date and consistent. We've considered N>2 config layers before. That would allow sophisticated support for overrides represented as overlays. I see the appeal, but flattening, re-expanding, and otherwise managing the layers can be quite complex, and many use cases wanted to expand a single representation into many flattened objects, which is something that should not be part of the core API. The proposal(s) here is intended to facilitate interoperation of a configuration system and other automated systems that would want to set desired state fields without updating the authoritative revision-controlled configuration, default values and auto-scaling being the canonical examples. It's intended to be independent of the method(s) used by the configuration system of choice to produce the literal API objects, and independent of the method(s) used to determine the default values and automatically updated values. Obviously, more coordination is required in order to make these systems play nicely together -- for example, an auto-scaler would need to be configured and granted authority -- but I think that problem is also separable from this one. I really don't think we need an arbitrary number of layers for this. The ad hoc alternatives I've seen in the past were fragile and not general-purpose, requiring special-case code to figure out which values were set by whom at each translation layer, such as code that hardcodes which fields are set to which default values by the system (e.g., that hostPort is sometimes set from containerPort), or how to interpret particular sentinel values in different fields (e.g., -1 implies the system will choose a value for that field), or which fields are set by a particular generator, or which fields are set by the auto-scaler when another configured object is present, or which fields take precedence over which other fields, or which different fields have to be read vs. the ones set, etc. We really want an approach more robust and more uniform than that. The 2-layer approach would have been sufficient to handle what our users have been doing for the past 10 years. |
I have a couple of thoughts:
Now, that said, I see the value in having consistent defaults across a variety of tools/configs/etc. so I think a reasonable compromise is establishing a const default object that is accessible from the API, but isn't a part of each instance of a type. |
@brendandburns Default values are frequently not fixed values, as in the example where hostPort is set from the containerPort value. I also see a universal default template as being problematic in multi-tenant scenarios. Lack of any default values in the system would exacerbate the problem we're trying to solve, since it would push setting of more values to clients that would have no explicit representation in the configuration source. Without direct API support, what I'd then recommend is storing the primary desired state, before any defaults or automatically set values were applied, in a "database". The new desired state after a configuration update would be diff'ed with the state in the database, rather than with the API objects. FWIW, if we're doing that, the rigorous separation of desired and current state isn't buying us much. With that approach, arbitrary key-value metadata and/or a pass-through storage API (which is also needed for API plug-ins) would be useful so that users wouldn't need to run their own databases, deal with bootstrapping/dependency issues, etc. |
Assuming the original proposal is dead. Already filed more specific issues for changes/features that appear to have more support. |
I regret not having something like this. It's complex, but I still regret it. |
…erry-pick-1108-to-release-4.7 [release-4.7] Bug 2054669: UPSTREAM: 89885: SQUASH: Retry fetching clouds.conf
Broken out from #1007, which was unwieldy.
A configuration system needs to be able to diff a new desired state generated from the configuration with the current desired state registered with the system, in a generic, extensible manner, in order to determine which objects require updates (as well as potentially what kinds of updates). In order to do this, it needs uniform structure across all API objects, clean separation of input and output values, and clean differentiation of user-provided values and automatically determined values, including default values.
Currently, our API objects contain:
In JSONBase, only ID is explicitly provided by the client and only ResourceVersion is mutable.
In Pod, DesiredState and CurrentState use the same schema, mixing input and output fields (Manifest, RestartPolicy, Host, HostIP, PodIP, Info).
Service lacks DesiredState and CurrentState.
ReplicationController lacks CurrentState (though that may change with #736).
Proposals:
The text was updated successfully, but these errors were encountered: