Kustomize resource ordering regression #3794
Comments
I am currently working on a PR to address the problem described above. If possible, @monopole @Shell32-Natsu @pwittrock, I would love it if you could take a look at the described issue and tell me whether you agree or disagree with the points made.
Thanks! Discussing this on the earliest issue you mentioned, #821: people can order their inputs to get the desired output order, which may be the only answer. There's no one ordering that makes sense for everyone, since no ordering can take custom resources into account.
The original motivation for this has become somewhat moot since the community has moved away from declarative things and into imperative. This leads to a race condition: you cannot apply a system w/ a yaml doc to install Prometheus followed by one that sets up a ServiceMonitor on some other thing. No ordering can fix this (since the order is correct), and the system will not reconcile to the desired state (since the ServiceMonitor is rejected as an unknown type). What is the solution? I dunno; does one call …
Generally I agree that kubernetes resource usage should be declarative, though there are exceptions to the rule, e.g. the Namespace and potentially any custom webhook/CRD. For this reason I am in favour of preserving the resource order within a kustomization as the author defined it; authors who must define complex webhook/CRD constellations need to take the resource order within their manifests into account anyway, or it won't be installable on the first attempt.
So as @mgoltzsche mentions, kustomize emits resources in FIFO order if one specifies:
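A minimal sketch of relying on author-defined (FIFO) ordering; the file names are hypothetical, and the `--reorder none` build flag is assumed to be available in the kustomize version in use:

```sh
# Hypothetical kustomization whose resources are listed in the order the author
# wants them emitted: namespace and CRDs first, then the operator Deployment,
# then the webhook configuration, then custom resources.
cat > kustomization.yaml <<'EOF'
resources:
  - namespace.yaml
  - crds.yaml
  - operator-deployment.yaml
  - webhook-configuration.yaml
  - custom-resources.yaml
EOF

# "--reorder none" (assumed available here) emits resources in the order listed
# above instead of applying kustomize's built-in legacy sort.
kustomize build --reorder none .
```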
Some history: #1104 happened before … @mgoltzsche, @yanniszark, do you agree? Someone will come along later and change it again. I'd like to make clear that the declarative nature of kubernetes isn't going away, but let's not derail this issue. @donbowman, there is indeed a concept of polling for status being developed: https://github.com/kubernetes-sigs/cli-utils/blob/master/cmd/status/cmdstatus.go
Not trying to derail, but in the last 2 years I've had to abandon parts of my strategy of having a single declarative setup. I'm the one that opened the original issue; it was indeed motivated by e.g. cert-manager webhooks. Since then, istio has moved to a standalone binary to install/update, so I cannot have it in my gitops + kustomize. So I'm skeptical at this stage that order alone can solve this (since it's inherently order + wait for operator to initialise + wait for webhooks to finish setting up). Maintaining author order would be a good step though.
@monopole I agree: deprecating the … TL;DR: when it comes to CRDs and operators, I don't think an operator should apply its own CRDs (as opposed to custom resources). CRDs should be applied separately or be contained within the manifest that also contains the operator Deployment. In case the operator really should deploy its own CRDs for some reason, it would at least need to expose the CRD registration status within its own readiness probe. Otherwise the caller can indeed not know when the CRDs are usable unless it polls for their presence and readiness.
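For illustration, one way such polling can be done is waiting on the CRD's Established condition; a minimal sketch, with a hypothetical manifest split and an example CRD name:

```sh
# Apply the operator and its CRDs first (hypothetical file name).
kubectl apply -f operator-and-crds.yaml

# Poll until the API server reports the CRD as established (i.e. usable),
# instead of guessing when registration has finished.
kubectl wait --for condition=established --timeout=120s \
  crd/servicemonitors.monitoring.coreos.com

# Only then apply the custom resources that depend on the CRD.
kubectl apply -f custom-resources.yaml
```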
Thanks for the input @monopole! About:
This may indeed be the case, but it's a big change, since it effectively changes the current promise of kustomize to manifest developers. If I understand correctly, the current (default) promise is: you don't need to care about ordering, kustomize will do it for you.
May I suggest the following:
Where is that promise made? It must be retracted. It's literally impossible right now to make one ordering that can both deploy a working stack to a cluster and universally work for everyone's custom resources now and in the future. Generally one needs a grander mechanism which can apply some things, wait for ready, apply more things, etc. Various things are moving in this direction, and would encapsulate kustomize or some other editor.
The default behavior of kustomize is to reorder. Thus, the implied promise towards the manifest developer is that kustomize takes care of ordering, so you don't have to watch out for it. This is how I meant it.
Well, we need better docs. @yanniszark have you tried reordering your inputs to solve your particular use case? Or is this somehow not an option?
e.g. resources not under your control.
@monopole that's a good question. The answer is that in Kubeflow we have many, many kustomizations which have been constructed and tested with kustomize's reordering feature. This is without accounting for the various downstream uses of Kubeflow. Since downstream uses are essentially supported by different vendors, I'd say that this is way too big of a change to be an option, at least for the upcoming release of Kubeflow. Thus, my thought is to get reordering working as we now use it successfully in 3.2.0, and then possibly bring the bigger change of no longer reordering resources, if kustomize also aligns with this direction long-term. In general, we'd like to be aligned with upstream :)
@yanniszark as you mentioned within the issue description, applying a webhook and a resource that webhook would handle within the same manifest to a cluster for the first time will fail, since the webhook's Deployment will not be available yet.
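A hedged sketch of working around this by splitting the install into two applies and waiting for the webhook's Deployment to become Available in between; the directory layout, namespace, and deployment name are hypothetical:

```sh
# First apply: CRDs, the webhook's Deployment/Service, and the webhook configuration.
kubectl apply -k ./base

# Wait until the webhook's backing Deployment can actually serve admission requests.
kubectl -n my-system wait --for=condition=Available --timeout=180s \
  deployment/my-webhook

# Second apply: the resources that the webhook mutates/validates.
kubectl apply -k ./resources
```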
@mgoltzsche Would you object to accepting #3803? I think we all agree this isn't the right solution, but it would seem to unblock kubeflow in the short term.
I just commented in the PR that I am indeed objecting to accepting it because, while it would unblock kubeflow for now, it would break the cert-manager installation again (and potentially others), although the latter works nicely now and - as opposed to kubeflow, apparently - can be installed with a single apply.
Ack, thanks. Here's a list of ordering changes over the last ~2 years. Mostly insertions (i.e. specifying order for a particular resource where there wasn't an order before).
#3803 would move them back to where they were before #1104. It's been pretty stable for a year, so making a change now may have consequences for many users. As discussed above, it's impossible for kustomize to own the answer to 'correct' ordering. So let's close #3803 and leave the ordering as it is. Some other options are:
@yanniszark if interested, please file a bug for option 2. I think it could be knocked off relatively quickly for the next release, if you think it would help.
@monopole @mgoltzsche thank you for your answers. @monopole thank you for proposing a way forward. We will gladly implement and contribute option (2) to unblock Kubeflow, but first I'd like to get your thoughts on the following comments that have to do with the reasoning stated above:
In addition, I understand that you are reluctant to make changes to ordering because as you say, Kustomize does not know all resources (doesn't know CRs, extension API-Servers). I want to ask two questions here:
All in all, and independently of whether we implement option 2, I wanted to argue that ordering admission webhooks last is not a best practice, and it is not what kustomize is doing right now for similar resources. In addition, for tools that don't use retries (Argo and Flux, e.g., do use retries), @mgoltzsche mentioned that there is an existing practice, and following this practice would not break them. That said, @monopole, I know that you don't want to potentially disrupt existing users. But I thought I would make these arguments first, so that you are also aware of them. We could decide to make a final change which is in line with what Kustomize is doing with every other admission controller and avoid (2). Let me know what you think. Again, thanks for your comments, and to repeat: we are open to implementing (2).
Did an issue for option 2 get filed?
This is the issue for option 2 implementation and design: #3913
Describe the bug
In Kubeflow, we are using Kustomize v3.2.0 and want to upgrade to v4.0.5: kubeflow/manifests#1797
However, our deployment failed for v4.0.5, while it succeeded for v3.2.0. This is what happened:
Regression Background
For reference, here is the original issue and PRs that made these changes:
#821
#1104
#2459
Issue #821 presents the following scenario, which led to PR #1104:
Step (3) fails because the Deployment has not become ready yet.
The solution SHOULD be to retry the apply.
PR #1104's solution was to order the webhook last, so that it doesn't mutate/validate the CR.
This is wrong, as it circumvents logic that the application has explicitly declared should be applied to all relevant resources.
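A minimal sketch of the retry approach argued for above; the kustomization path, retry count, and sleep interval are arbitrary choices here:

```sh
# Re-apply until the webhook's Deployment is ready and admission succeeds.
for attempt in $(seq 1 10); do
  if kubectl apply -k ./example; then
    echo "apply succeeded on attempt ${attempt}"
    break
  fi
  echo "apply failed (webhook likely not ready yet); retrying in 15s..."
  sleep 15
done
```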
Files that can reproduce the issue
Please see https://github.com/kubeflow/manifests/blob/v1.3-branch/README.md, which includes the example kustomization we use for Kubeflow components.
Build the example kustomization with kustomize v3.2.0, as per the README.
Build the example kustomization with kustomize v4.0.5. You will see the WebhookConfigurations ordered last, which causes the issues.
Proposed Solution
Restore the order of Mutating / Validating Webhooks as it was before PR #1104
Kustomize version
v4.0.5
cc'ing authors of the referenced issues and PRs: @donbowman @mgoltzsche @asadali
cc @monopole @Shell32-Natsu @pwittrock