Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Design Server Side Apply #4758

Merged
merged 6 commits into from
Mar 22, 2022

Conversation

JoshVanL
Copy link
Contributor

This PR contains a design document for implementing Server Side Apply for all cert-manager controllers. Discussion is very welcome 🙂

Some related issues/PRs:

#4638
#4598
#4552

NONE

controllers

Signed-off-by: joshvanl <vleeuwenjoshua@gmail.com>
Signed-off-by: joshvanl <vleeuwenjoshua@gmail.com>
@jetstack-bot jetstack-bot added release-note-none Denotes a PR that doesn't merit a release note. dco-signoff: yes Indicates that all commits in the pull request have the valid DCO sign-off message. kind/design Categorizes issue or PR as related to design. approved Indicates a PR has been approved by an approver from all required OWNERS files. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Jan 18, 2022
@JoshVanL JoshVanL changed the title Design Serber Side Apply Design Server Side Apply Jan 18, 2022
Copy link
Member

@munnerz munnerz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good, thanks for putting it together.

It'd be good to see an expanded section on a migration plan too, e.g. if a user turns this feature on, is there any risk of old fields being deleted? What if they disable it again after?


To reach graduation, the cert-manager authors should consider the server side
apply implementation for cert-manager to be safe for end users. Graduation
should ideally be done over no more than one or two releases.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How can we validate it is safe? Are there metrics we can use? (e.g. "we expect to see a decrease in the number of resource version conflicts"?

Will we directly solicit feedback from users?

Graduation should ideally be done over no more than one or two releases.

I think it's great to timebox this, however if we don't have confidence, we should not proceed with graduation. (and I think it's worth noting that here too)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, this is something we should define. I'll add a note about soliciting user feedback, but we should keep thinking of other ideas (if there is anything else we can do).

Indeed, this was meant as a rfc SHOULD rather than MUST. I'll make that more clear

Signed-off-by: joshvanl <vleeuwenjoshua@gmail.com>
@JoshVanL
Copy link
Contributor Author

Thanks for the comments @irbekrm @munnerz. I have tried to address them in the design, however if you think they need more discussion please feel free to un-resolve that conversation.

Signed-off-by: joshvanl <vleeuwenjoshua@gmail.com>
API server to revoke management of that field from the previous owner, overwrite
the field, and change owner ship to the new client.
one controller (issuing and trigger Certificate controllers, and cmctl), and as
such, will need to make use of the `force` option in their API calls. This
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thought: will there be any cases where force should not be used?
I guess the assumption in this design is that all conflicts that we get are caused by controllers modifying different fields of the same resource (I assume that we cannot end up in a state where a controller has an older version of some resource (i.e due to some slow operations) and ends up force overwriting field X that was also modified by another controller in a newer version of the same resource).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will have a think, but I don't think there is any case where we do not want to use force. Getting a conflict error is an opportunity for the client to make a decision on whether they want to overwrite, co-own, or drop management of that value- all of our controllers own their managed fields and will want to overwrite.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a bit confused about using 'force' - this makes me think it is potentially overwriting other controller's progress.

Can we expand in the doc what force does, when it's appropriate to use, when we do use it, and why it's okay that we use it where we do/any special considerations?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll add more comments to the doc to clear up what force does, and how it relates to the management. From the Kube docs;

Conflicts can be forced, in which case the value will be overridden, and the ownership will be transferred.

https://kubernetes.io/docs/reference/using-api/server-side-apply/#conflicts



Overwrite value, become sole manager: If overwriting the value was intentional (or if the applier is an automated process like a controller) the applier should set the force query parameter to true and make the request again. This forces the operation to succeed, changes the value of the field, and removes the field from all other managers' entries in managedFields.

Don't overwrite value, give up management claim: If the applier doesn't care about the value of the field anymore, they can remove it from their config and make the request again. This leaves the value unchanged, and causes the field to be removed from the applier's entry in managedFields.

Don't overwrite value, become shared manager: If the applier still cares about the value of the field, but doesn't want to overwrite it, they can change the value of the field in their config to match the value of the object on the server, and make the request again. This leaves the value unchanged, and causes the field's management to be shared by the applier and all other field managers that already claimed to manage it

fwiw, I think force isn't a great name 🤷

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess the only case where it could overwrite other controllers progress could be something like- controller X and controller Y both pick up resource foo at version N where cluster state is T and resource state is Z. X would be writing to field A if cluster state was T and resource state was Z, Y would be writing to field A if cluster state was K and resource state was Z. X writes to the field, so the resource state is no longer Z. Meanwhile, cluster state changes to K for unrelated reasons and Y who still has foo at state Z and has been doing some slow operation now overwrites the field A.

Signed-off-by: joshvanl <vleeuwenjoshua@gmail.com>
@irbekrm
Copy link
Contributor

irbekrm commented Jan 25, 2022

Looks good to me, thank you for replying to all the comments!

Adding an lgmt with an assumption that this will be feature gated and that graduation steps will be easier worked out once the feature is in use and a hold in case more folks have comments

/lgtm
/hold
/approve

@jetstack-bot jetstack-bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jan 25, 2022
@jetstack-bot jetstack-bot added the lgtm Indicates that a PR is ready to be merged. label Jan 25, 2022
@jetstack-bot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: irbekrm, JoshVanL

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@jetstack-bot jetstack-bot removed the lgtm Indicates that a PR is ready to be merged. label Jan 31, 2022
Signed-off-by: joshvanl <vleeuwenjoshua@gmail.com>
@maelvls
Copy link
Member

maelvls commented Mar 17, 2022

Hi! This is a notice and does not require any action. As part of the effort of migrating from Bazel to Make, I wanted to let you know that we have added two new Prow jobs pull-cert-manager-make-test and pull-cert-manager-make-e2e-v1-23.

@irbekrm
Copy link
Contributor

irbekrm commented Mar 21, 2022

I think that this could now be merged, given that we have already merged most of the actual implementation and I see that all outstanding comments have been resolved

/lgtm
/hold cancel

@jetstack-bot jetstack-bot added lgtm Indicates that a PR is ready to be merged. and removed do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. labels Mar 21, 2022
@irbekrm
Copy link
Contributor

irbekrm commented Mar 22, 2022

/retest

@irbekrm
Copy link
Contributor

irbekrm commented Mar 22, 2022

/test pull-cert-manager-e2e-v1-23

@jetstack-bot
Copy link
Contributor

/retest
This bot automatically retries jobs that failed/flaked on approved PRs (send feedback to jetstack).
Review the full test history for this PR.
Silence the bot with an /lgtm cancel comment for consistent failures.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. dco-signoff: yes Indicates that all commits in the pull request have the valid DCO sign-off message. kind/design Categorizes issue or PR as related to design. lgtm Indicates that a PR is ready to be merged. release-note-none Denotes a PR that doesn't merit a release note. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants