Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Can we get consistent defaulting for API? #1502

Closed
thockin opened this issue Sep 30, 2014 · 30 comments
Closed

Can we get consistent defaulting for API? #1502

thockin opened this issue Sep 30, 2014 · 30 comments
Assignees
Labels
area/api Indicates an issue on api area. area/app-lifecycle area/usability kind/design Categorizes issue or PR as related to design. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery.

Comments

@thockin
Copy link
Member

thockin commented Sep 30, 2014

In #1458, Brendan introduced a pattern for accessing and defaulting values: Rather than actually setting the value in the struct during validation (which we do for some other fields), he wrapped the comparisons in functions that captured the defaulting.

This pattern has a distinct advantage of working in tests, where input validation has not run. It has the distinct disadvantage that not all fields are so self-contained - some set their default based on another field's value.

We should choose a pattern and apply it liberally. I'd like to be consistent as much as possible.

@bgrant0607 bgrant0607 added the area/api Indicates an issue on api area. label Sep 30, 2014
@bgrant0607
Copy link
Member

Other disadvantages of the comparison approach:

  • Clients can't introspect default behavior
  • Creates a subtle API version dependency: Changing default behavior is a breaking API change. If we were to ever change such defaults, we'd need to factor out the code into a version-specific location. Furthermore, if we were to allow objects to survive API version changes (which we probably will eventually), we'd want to record the original default in the object.

@brendandburns
Copy link
Contributor

I'd argue rather strongly for my approach.

The main argument in favor is that it respects the user's request. If we automatically add stuff when it comes into validation, then we will be modifying the thing that the user stores, and they will then be surprised when they see a field they didn't set come back in the response.

wrt to the version dependency, this is easy to add a unit test to validate, just test that the predicate that you expect to match, matches on an API object with the default value (e.g. IsPullAlways(v1beta1.Container{}) == true). And then any change to the behavior will cause a break in the unit test, and any such change will require the user to update the test, which should set off red flags.

Regarding client introspection, this is exactly what documentation is for.

--brendan

@bgrant0607
Copy link
Member

As discussed in #1178, values will be initialized through a variety of means without the user directly specifying them: configuration generators, client tools/libraries (e.g., kubecfg), name uniquification, active controllers (e.g., replication controller), auto-scalers, ... I don't think it buys us much to forbid this one narrow case, and it would result in less transparency. Additionally, the code would need to be duplicated or linked in to every component and client looking at the desired state of the object. That approach has been highly problematic in the past, in my experience.

The track record of people changing the API documenting the changes they made makes me doubt that as an effective mechanism, and the proposals to auto-generate the documentation from the Go types wouldn't solve this problem, either.

Unit tests are a good idea, regardless of the approach we settle on.

@brendandburns
Copy link
Contributor

Hrm, I think we violently agree in the first case.

Rather than doing code substitution anywhere, I'm arguing for having a very
specific interpretation of "" (or whatever default value is for the type)
in the code that handles the field, and treating that as a value as value
with meaning, just like "PullAlways" or whatever.

I think this is preferable to writing code anywhere (client or otherwise)
that looks for defaults and writes in a data value to replace the default
with the "correct" value, for all of the reasons you mention.

On Tue, Sep 30, 2014 at 12:05 PM, bgrant0607 notifications@github.com
wrote:

As discussed in #1178
#1178, values
will be initialized through a variety of means without the user directly
specifying them: configuration generators, client tools/libraries (e.g.,
kubecfg), name uniquification, active controllers (e.g., replication
controller), auto-scalers, ... I don't think it buys us much to forbid this
one narrow case, and it would result in less transparency. Additionally,
the code would need to be duplicated or linked in to every component and
client looking at the desired state of the object. That approach has been
highly problematic in the past, in my experience.

The track record of people changing the API documenting the changes they
made makes me doubt that as an effective mechanism, and the proposals to
auto-generate the documentation from the Go types wouldn't solve this
problem, either.

Unit tests are a good idea, regardless of the approach we settle on.


Reply to this email directly or view it on GitHub
#1502 (comment)
.

@bgrant0607
Copy link
Member

I don't think we are in agreement.

The code that would need to be duplicated is the comparison code.

If the field were actually initialized, then all consumers would see the initialized value and not need to reason about the default. Protocol buffer default values are intended to provide similar behavior, though, AFAICT, Go's json library doesn't support defaults.

@thockin
Copy link
Member Author

thockin commented Oct 1, 2014

I don't think anyone would be confused if they write an object and omit
optional fields, and find those fields populated with default values upon
read-back.

It requires a certain amount of rigor to ALWAYS call validation logic
before using objects in tests. But it also takes some rigor to ALWAYS call
IsFoo(value) functions, rather than compare value == Foo.sum.

Unfortunately Go does not have "auto-settable by JSON but
read-encapsulated" fields.

On Tue, Sep 30, 2014 at 1:32 PM, bgrant0607 notifications@github.com
wrote:

I don't think we are in agreement.

The code that would need to be duplicated is the comparison code.

If the field were actually initialized, then all consumers would see the
initialized value and not need to reason about the default. Protocol buffer
default values are intended to provide similar behavior, though, AFAICT,
Go's json library doesn't support defaults.

Reply to this email directly or view it on GitHub
#1502 (comment)
.

@bgrant0607
Copy link
Member

If our code is structured such that we can only create legal objects when they are submitted through the API, then we should fix that problem.

@thockin
Copy link
Member Author

thockin commented Oct 1, 2014

We either chokepoint early and do fixups (validation) so callers see valid
structures or we abstract access (the PR in question) so callers see valid
fields or we scatter knowledge of default values for myriad fields all over
the place.

I know how much pain the last option causes.
On Sep 30, 2014 6:22 PM, "bgrant0607" notifications@github.com wrote:

If our code is structured such that we can only create legal objects when
they are submitted through the API, then we should fix that problem.

Reply to this email directly or view it on GitHub
#1502 (comment)
.

@bgrant0607
Copy link
Member

Abstracting access across multiple components, multiple usage scenarios (e.g., printing the effective behavior in a UI), multiple API versions, multiple repos (e.g., Openshift), (eventually) multiple languages, etc. would require implementing something automatic, like the proto compiler.

@brendandburns
Copy link
Contributor

I'd like to argue more stridently for my approach. Humans may be able to discern when something was defaulted in, but programs often won't.

Imagine I write a loop that does the following:

do {
  updateConfig(newConfig)
  conf := getConfig()
} while (conf != newConfig)

This loop goes forever, and its not even super clear how a user could successfully write a program that understood it should stop.

@bgrant0607
Copy link
Member

@brendandburns It's not that simple, and your suggested approach doesn't solve a sufficiently large part of the problem to be useful. In the case of simultaneous updates from an automated component, this approach is not going to work. An intelligent merge needs to be done with the updated desired state, by tracking which fields should be set/unset by the configuration. See discussion in #1178, #1007, and #1201. If you don't care about clobbering other changes, you can drop the preconditions from the update and keep performing it until operation succeeds -- no need to get desired state back from the apiserver.

@bgrant0607
Copy link
Member

Clarification of #1502 (comment): We should create a factory method for each object that accepts json, parses into an object, performs validation checks, clears fields that shouldn't be set, initializes fields that are set automatically, such as UID, sets default values, and produces a valid object. The API methods should do auth checks, logging, response generation, etc., but should rely on the same factory methods for object creation.

Updates should similarly be factored out.

No significant business logic should be directly implemented in the API methods themselves.

@bgrant0607 bgrant0607 added kind/design Categorizes issue or PR as related to design. area/app-lifecycle labels Oct 1, 2014
@thockin
Copy link
Member Author

thockin commented Oct 2, 2014

Devil's advocate on this proposal:

We receive a blob of versioned JSON. We look up which factory to call by the version & path. We call factory.Create(blob). Either we have version-specific factories or each factory.Create() has to switch on version. It has to decode JSON into a struct, then filter out not-valid-on-input fields, then apply defaults and assign managed fields. At last we have a usable object (without any real encapsulation because, hey, it's Go).

That's a lot of work to do for 49 unique structs * 3 API versions (yes, we have 49 structs in pkg/api, 58 in v1beta3). Can it be slimmed down? Thought experiment.

We receive a blob of versioned JSON. We auto-decode that into a version-specific struct, then almost-auto-convert that to the internal struct (as today). Now we have an internal struct that has not-valid-on-input fields, missing to-be-defaulted fields, and missing managed fields.

We could make a ClearNonInputFields() method on runtime.Object, which each type would have to implement and Decode() would call. Maybe we could even use a struct tag 'nodecode" and have Decode() do that transparently? We could make a SetDefaults() method which Decode would call, but this is maybe a stretch for struct tags. We could maybe make Validate() be a method, but there is an argument to be made that validation is contextual, and we don't know that at Decode time. Likewise managed fields, they have to be set by context-specific code.

We could have a THIRD type (not api internal, not api versioned) that had a real constructor like you suggest here, and convert from api internal to this "stronger" type. Construction implies clearing and defaulting fields. But this is just like C++ - it's still easy to add a field and forget to add it to the contructor init. Is it really much better?

We could have different structs for input and output, where there is no such thing as a field that has to be ignored on input. For example, v1beta3 still has POST receive a Pod, which has a Status field. We would instead have a PodInput struct which just has metadata and spec, and a Pod struct which has the additional Status. This leads to a lot more structs, often overlapping, but at least it is self-documenting. We still need to call SetDefaults() on it.

Better ideas?

@bgrant0607
Copy link
Member

Current code is:

    case "POST":
        if len(parts) != 1 {
            notFound(w, req)
            return
        }
        body, err := readBody(req)
        if err != nil {
            errorJSON(err, h.codec, w)
            return
        }
        obj := storage.New()
        err = h.codec.DecodeInto(body, obj)
        if err != nil {
            errorJSON(err, h.codec, w)
            return
        }
        out, err := storage.Create(ctx, obj)
        if err != nil {
            errorJSON(err, h.codec, w)
            return
        }
        op := h.createOperation(out, sync, timeout, curry(h.setSelfLinkAddID, req))
        h.finishReq(op, req, w)

Right now we do validation and initialize default values in Create(), but we do the conversion in DecodeInto. If we convert between versions, we probably need to do validation twice, once in the original version and again in the target version.

Translations between API versions implies that we will need to fill in at least some default fields, to record version-specific defaults.

@thockin
Copy link
Member Author

thockin commented Oct 2, 2014

On Wed, Oct 1, 2014 at 11:07 PM, bgrant0607 notifications@github.com
wrote:

Current code is:

case "POST":
    if len(parts) != 1 {
        notFound(w, req)
        return
    }
    body, err := readBody(req)
    if err != nil {
        errorJSON(err, h.codec, w)
        return
    }
    obj := storage.New()
    err = h.codec.DecodeInto(body, obj)
    if err != nil {
        errorJSON(err, h.codec, w)
        return
    }
    out, err := storage.Create(ctx, obj)
    if err != nil {
        errorJSON(err, h.codec, w)
        return
    }
    op := h.createOperation(out, sync, timeout, curry(h.setSelfLinkAddID, req))
    h.finishReq(op, req, w)

Right now we do validation and initialize default values in Create(), but
we do the conversion in DecodeInto. If we convert between versions, we
probably need to do validation twice, once in the original version and
again in the target version.

Objects only exist in the versioned form in transit to the internal form.
I agree it's POSSIBLE that we would need to validate before conversion, but
I don't see that actually happening.

Translations between API versions implies that we will need to fill in at
least some default fields, to record version-specific defaults.

That's a fair and fugly point. If the default value changes across
versions, we'll totally get that wrong right now.

@bgrant0607
Copy link
Member

The case for filling in default values:

Pros:

  • Could be done in just one place (per object, per version, of course)
    • In particular, clients and desired-state consumers wouldn't need to hardcode knowledge of default values
    • Would force some object-creation discipline, which is a good thing in my experience, since it helps to ensure we test what the system is doing
  • Straightforward cross-version object translation
    • In particular, if we want objects to survive schema changes, this solves the problem of remembering what the defaults were in the API version used to create the object, including the case where we introduce a new default behavior that didn't even exist in prior API versions
  • Transparent, introspectable, explicit
  • Compatible with the config proposal (Proposal: API support for diff'ing desired state #1178, Document annotations and guidelines regarding when to use annotations vs labels #1201, Proposal: Configuration #1007)
  • Consistent with other means of automatically setting metadata and desired-state fields, such as UID generation, Name uniquification, label attachment (e.g., by replication controller), auto-sizing and auto-scaling, and fields populated by client convenience utilities

Cons:

  • Consumes a little more bandwidth and/or memory for fields that would otherwise be unset

@smarterclayton
Copy link
Contributor

One other con of filling in default values:

  • Requires a migration in the event the default value is implemented in code incorrectly.

@bgrant0607
Copy link
Member

@smarterclayton I'm not sure I understand. Whether or not a default value is explicitly filled in, changing default behavior, whether the original behavior was intended or not, could break clients, in general.

@smarterclayton
Copy link
Contributor

I was referring to an internal migration - today you can roll out a fix to the code that corrects a bad default value, but if the values were persisted at creation you have to roll out a migration. If the default value was non client impactful the latter is more expensive.

I don't think that's a significant con, but I wanted to highlight it.

----- Original Message -----

@smarterclayton I'm not sure I understand. Whether or not a default value is
explicitly filled in, changing default behavior, whether the original
behavior was intended or not, could break clients, in general.


Reply to this email directly or view it on GitHub:
#1502 (comment)

@lavalamp
Copy link
Member

Let me suggest: Translate defaults at conversion time.

Like, if "" means PullAlways in a certain field in a certain version, then when serialized in that version, one should see "" there, but in memory one sees in the unversioned field "PullAlways".

This has the advantage that defaults are versioned, and that you can add support for legacy defaults by adding a new setting, e.g. "PullAlwaysLikeWeUsedToInv1beta1".

Downsides to this: Need to make it super easy to write a field conversion that only changes one field; right now you have to write a bunch of other code. I want to do this anyway, so not that much of a con.

I'm kind of morally opposed to having more than one way to represent any given desired behavior, that makes everything hard. We should independently consider the questions "What's the best way for the user to specify their intent?" and "What's the best way present intentions to the system?". Filling in defaults at random places scattered throughout the code spreads user-intent-deduction code throughout the system, an anti-pattern. No user intent deductions should be required after an object is submitted to the system.

(One could also argue that config stuff should fill in defaults as its last step; I can live with that, too-- but if that's our answer then we should not have defaults in our versioned structs--leaving something unset should just be an error. This would be a PITA if you're not using the/a config system.)

@bgrant0607
Copy link
Member

@smarterclayton Sounds like more of a feature than a bug. Changing default behavior is a breaking change that should require a version change.

@lavalamp Punting the problem to clients would mean that we couldn't add fields without breaking clients. We don't want to do that.

Besides, it's unrealistic to think that there will just be one place/tool/system/whatever that sets all fields not explicitly set by the user. We need a composable solution.

I agree with your moral opposition, though. Sprinkling interpretation of defaults all throughout the code would make the system much harder to understand and evolve.

I think the only question is whether clients should be able to inspect the default values. I argue that they should (provided that the fields are visible in the version of the API they are using).

@lavalamp
Copy link
Member

Punting the problem to clients would mean that we couldn't add fields without breaking clients.

Good point; I will double-down on my assertion that the versioned <-> unversioned struct conversion code is the place for defaults to be expanded.

@bgrant0607
Copy link
Member

As shown in #2388, we need the code that validates and initializes defaults in pods to be callable by Kubelet as well as by apiserver, for pods that don't come from the apiserver.

yujuhong added a commit to yujuhong/kubernetes that referenced this issue Feb 3, 2015
Currently, the validation logic validates fields in an object and supply default
values wherever applies. This change factors out defaulting to a set of
defaulting callback functions for decoding (see kubernetes#1502 for more discussion).

 * This change is based on pull request 2587.

 * Most defaulting has been migrated to defaults.go where the defaulting
   functions are added.

 * validation_test.go and converter_test.go have been adapted to not testing the
   default values.

 * Fixed all tests with that create invalid objects with the absence of
   defaulting logic.
@goltermann goltermann removed this from the v1.0 milestone Feb 6, 2015
@thockin thockin removed this from the v1.0 milestone Feb 6, 2015
@bgrant0607
Copy link
Member

This is mostly done. I did find a stray default:
https://github.com/GoogleCloudPlatform/kubernetes/blob/master/pkg/registry/service/rest.go#L133

@bgrant0607 bgrant0607 assigned yujuhong and unassigned thockin Feb 14, 2015
@bgrant0607
Copy link
Member

We should also document the approach in docs/api-conventions.md.

@bgrant0607 bgrant0607 added priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. area/usability and removed priority/backlog Higher priority than priority/awaiting-more-evidence. labels Feb 14, 2015
@ghodss
Copy link
Contributor

ghodss commented Feb 16, 2015

@bgrant0607 Can you summarize the approach taken for how defaults are handled?

@thockin
Copy link
Member Author

thockin commented Feb 17, 2015

@ghodss Defaulting is now done in a separate per-version pass during conversion from versioned APIs to internal structs.

@bgrant0607
Copy link
Member

@bgrant0607
Copy link
Member

I think this is done.

dobsonj pushed a commit to dobsonj/kubernetes that referenced this issue Mar 28, 2023
…erry-pick-1498-to-release-4.13

[release-4.13] OCPBUGS-8412: Fix mounted volume expansion tests
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/api Indicates an issue on api area. area/app-lifecycle area/usability kind/design Categorizes issue or PR as related to design. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery.
Projects
None yet
Development

No branches or pull requests

8 participants