-
Notifications
You must be signed in to change notification settings - Fork 1.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
KEP-4650: StatefulSet Support for Updating Volume Claim Template #4651
base: master
Are you sure you want to change the base?
Conversation
huww98
commented
May 22, 2024
•
edited
Loading
edited
- One-line PR description: initial proposal of KEP-4650: StatefulSet Support for Updating Volume Claim Template
- Issue link: StatefulSet Support for Updating Volume Claim Template #4650
- Other comments: Inspired by and should replace KEP-0661: StatefulSet volume resize #3412
This KEP is inspired by the previous proposal in #3412 . However, there are several major differences, so I decided to make a new KEP for this. Differences:
Please read more in the "Alternatives" section in the KEP. This KEP also contains more details about how to coordinate the update of Pod and PVC, which is another main concern of the previous attempt. |
owning-sig: sig-storage | ||
participating-sigs: | ||
- sig-app |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the main issues with #3412 were around StatefulSet controller and its behavior in error cases. IMO it should be owned by sig-apps.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, given that the changes to code should mainly happen on StatefulSet controller, and the previous attempts all put sig-apps as the owning-sig (it is weird that they are placed in the sig-storage folder)
/sig apps |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're missing production readiness questionnaire filled in and a PRR file in https://github.com/kubernetes/enhancements/tree/master/keps/prod-readiness/sig-apps matching the template
keps/sig-apps/4650-stateful-set-update-claim-template/README.md
Outdated
Show resolved
Hide resolved
keps/sig-apps/4650-stateful-set-update-claim-template/README.md
Outdated
Show resolved
Hide resolved
know that this has succeeded? | ||
--> | ||
* Allow users to update the `volumeClaimTemplates` of a `StatefulSet` in place. | ||
* Automatically update the associated PersistentVolumeClaim objects in-place if applicable. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you mean by in-place
here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For example, the resizedisk feature can automatically adjust the disk and filesystem sizes while the pod is running. In addition to this there will be new features like VolumeAttributesClass that will also support in-place changes to the storage being used.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I use the word in-place to explicitly distinguish from Delete/Create style update for Pods. We should never delete PVC automatically.
Production Readiness review, etc.
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: huww98 The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
Updated "Production Readiness Review Questionnaire" as requested. During the last sig-apps meeting, First, this KEP is still mainly targeting the use-cases where we can update the PVC in-place, without interrupting the running pods. The migration and other use-cases are just by-product of enabling editing the volumeClaimTemplates. But still, migration by editing sts is simpler, just requiring a rolling restart, which should already be familiar by most k8s operators. Editing the sts does not require switching traffic between 2 sts, for example. And the name of sts won't change after migration. Yes, we cannot rollback easily with this procedure. The user should delete the PVC again to rollback. And the data in the PVC may not be recovered if retention policy is Q2: @soltysh suggested we may still go along the way of KEP-0661, then do more after we got more experience. Here is why I don't want to proceed that way:
Please read more in the @soltysh said we don't want the STS to be stuck in a permanently broken state. Of course. With this KEP, as we are not validating the templates. It is actually very hard to get stuck. Larger PVC is compatible with smaller template, so just rolling back the template should unblock us from future rollout, leaving PVCs in the expanding state, or try to cancel the expansion if I think the only way we may get stuck is patching VAC, if one replica is successfully updated to the new VAC, but another replica failed. and rolling back the first replica to the old VAC also failed. Even in this case, the user can just set Q3: @kow3ns thinks it is not appropriate to delete and recreate the PVC to alter the performance characteristics of volumes. VAC is the KEP that actually parameterizing the storage class and allow us to specify and update the performance characteristics of volumes without interrupting the running pod, by patching the existing PVC. So this KEP should also integrate with VAC. The update to VAC in the volumeClaimTemplates should not require re-creating the PVC, and is fully automated if everything goes well. Q4: @kow3ns asked how we should handle each field of This is described in the KEP in "How to update PVCs" section in "Updated Reconciliation Logic". Basically, patch what we can, skip the rest. It seems the wording "update PVC in-place" causes many mis-understandings. I will replace it with "patch PVC". We didn’t actually decide anything during the last meeting. I think these core questions should be decided to push this KEP forward:
|
The general sentiment from that sig-apps call (see https://docs.google.com/document/d/1LZLBGW2wRDwAfdBNHJjFfk9CFoyZPcIYGWU7R1PQ3ng/edit#heading=h.2utc2e8dj14) was that the smaller changes have a greater chances to move forward. Also, it's worth noting that the smaller changes do not stand in opposition to the changes proposed in here, they are only taking the gradual approach by focusing on a minimal subset of changes. |
Okay, we agreed that focusing on the minimal subset of changes. @huww98 and @vie-serendipity will proceed with only
|
At the last sig-apps meeting, we have decided that we should:
But for the validation of the template, I think we still need more discussion. It can be a major blocking point of this KEP. @soltysh think that we should not allow decreasing the size of template. He thinks we can remove the validation later if desired. But I think validation has many drawbacks which may block normal usage of this feature and should be resolved in the initial version:
On the contrast, if we just don't add the validation, we can avoid all these issues, and lose nothing: The user can expand PVC independently today. So, the state that the template is smaller than PVC is already very common and stable. The strategy in this state is not even trying to shrink the PVC. I think this is well defined and easy to follow. If Kubernetes ever supports shrinking in the future, we will still need to support drivers that can't shrink. So, even then we can only support shrinking with a new To take one step back, I think validating the template across resources violates the high-level design. The template describes a desired final state, rather than an immediate instruction. A lot of things can happen externally after we update the template. For example, I have an IaaS platform, which tries to To conclude, I don't want to add the validation, we don't add it just to be removed in the future. |
Agree. By the way, |
That's one way looking at it, also in those cases where a mistake happens (I consider that a rare occurrence), you can always use #3335 and migrate to a new, smaller StatefulSet. |
@huww98 and @liubog2008 are you planning to push this through for 1.32? |
Users have been waiting for many years to be able to scale up statefulset volumes. I agree we shouldn't over complicate that use case trying to handle solving other issues at the same time. Lets focus on the very common use case, and then can reevaluate other features after that is completed. |
@soltysh Enhancement Freeze date is at Oct 11th right? We are on vacation right now and may not make this date. cc @huww98 @vie-serendipity |
Roger that, let's try to work on the enhancement still within 1.32 time frame, so that it's ready in time for 1.33, in that case. |
okay, that sounds great~ |
We propose to patch the PVC as a whole, so it can only succeed if the immutable fields matches. | ||
|
||
If only expansion is supported, patching regardless of the immutable fields can be a logical choice. | ||
But this KEP also integrates with VAC. VAC is closely coupled with storage class. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use full name of VAC first time the acronym is used - I had to infer what it was from the comments (https://github.com/kubernetes/enhancements/blob/master/keps/sig-storage/3751-volume-attributes-class/README.md)
|
||
Introduce a new field in StatefulSet `spec`: `volumeClaimUpdatePolicy` to | ||
specify how to coordinate the update of PVCs and Pods. Possible values are: | ||
- `OnDelete`: the default value, only update the PVC when the the old PVC is deleted. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could have some value in calling this OnPVCDelete
to clarify it is only updated if PVC is deleted, especially if there could be other deletion-based alternatives like OnPodDelete
in the future.
### What PVC is compatible | ||
|
||
A PVC is compatible with the template if: | ||
- All the immutable fields match exactly; and |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(including storage class) matches (inferring from prior discussion that storage class changes will be left to the operator to manually manage)