Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] Set default storage class on PVC sync if default StorageClass is present #100031

Conversation

LittleChimera
Copy link

In case where there's no default StorageClass at the time of creation of a PVC, the PVC will stay without storage class.
This logic was thus far only executed in AdmissionWebhook which would add a storageClassName to PVC with no defined storageClass.
Therefore, PVC will stay without storageClassName, if the PVC manifest is not reapplied to the cluster.

What type of PR is this?

What this PR does / why we need it:

Which issue(s) this PR fixes:

Fixes #99629

Special notes for your reviewer:

Does this PR introduce a user-facing change?

NONE

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:


In case where there's no default StorageClass at the time of creation of a PVC, the PVC will stay without storage class.
This logic was thus far only executed in AdmissionWebhook which would add a storageClassName to PVC with no defined storageClass.
Therefore, PVC will stay without storageClassName, if the PVC manifest is not reapplied to the cluster.
@k8s-ci-robot k8s-ci-robot added release-note-none Denotes a PR that doesn't merit a release note. size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Mar 9, 2021
@k8s-ci-robot
Copy link
Contributor

Thanks for your pull request. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

📝 Please follow instructions at https://git.k8s.io/community/CLA.md#the-contributor-license-agreement to sign the CLA.

It may take a couple minutes for the CLA signature to be fully registered; after that, please reply here with a new comment and we'll verify. Thanks.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@k8s-ci-robot k8s-ci-robot added cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. do-not-merge/needs-kind Indicates a PR lacks a `kind/foo` label and requires one. do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Mar 9, 2021
@k8s-ci-robot
Copy link
Contributor

@LittleChimera: This issue is currently awaiting triage.

If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot
Copy link
Contributor

Welcome @LittleChimera!

It looks like this is your first PR to kubernetes/kubernetes 🎉. Please refer to our pull request process documentation to help your PR have a smooth ride to approval.

You will be prompted by a bot to use commands during the review process. Do not be afraid to follow the prompts! It is okay to experiment. Here is the bot commands documentation.

You can also check if kubernetes/kubernetes has its own contribution guidelines.

You may want to refer to our testing guide if you run into trouble with your tests not passing.

If you are having difficulty getting your pull request seen, please follow the recommended escalation practices. Also, for tips and tricks in the contribution process you may want to read the Kubernetes contributor cheat sheet. We want to make sure your contribution gets all the attention it needs!

Thank you, and welcome to Kubernetes. 😃

@k8s-ci-robot k8s-ci-robot added needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Mar 9, 2021
@k8s-ci-robot
Copy link
Contributor

Hi @LittleChimera. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added the needs-priority Indicates a PR lacks a `priority/foo` label and requires one. label Mar 9, 2021
@k8s-ci-robot k8s-ci-robot requested review from msau42 and thockin March 9, 2021 20:24
@k8s-ci-robot k8s-ci-robot added sig/apps Categorizes an issue or PR as relevant to SIG Apps. and removed do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Mar 9, 2021
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: LittleChimera
To complete the pull request process, please assign thockin after the PR has been reviewed.
You can assign the PR to them by writing /assign @thockin in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. and removed cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. labels Mar 11, 2021
@LittleChimera LittleChimera changed the title Set default storage class on PVC sync if default StorageClass is present [WIP] Set default storage class on PVC sync if default StorageClass is present Mar 11, 2021
@k8s-ci-robot k8s-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Mar 11, 2021
Copy link
Member

@jsafrane jsafrane left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add some unit tests too!

Comment on lines +364 to +372
if storageClasses, err := ctrl.kubeClient.StorageV1().StorageClasses().List(context.TODO(), metav1.ListOptions{}); err != nil {
for _, sc := range storageClasses.Items {
if sutil.IsDefaultAnnotation(sc.ObjectMeta) {
claim.Spec.StorageClassName = &sc.Name
ctrl.updateClaim(claim)
return nil
}
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please extract all this getting default SC and modifying PV into a separate function and just call it. I like the giant if-then-else in syncUnboundClaim as clean as possible.

@@ -359,8 +360,19 @@ func (ctrl *PersistentVolumeController) syncUnboundClaim(claim *v1.PersistentVol
return err
}
return nil
case storagehelpers.GetPersistentVolumeClaimClass(claim) == "":
if storageClasses, err := ctrl.kubeClient.StorageV1().StorageClasses().List(context.TODO(), metav1.ListOptions{}); err != nil {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Setting the default storage class should be done at line 337, "// User did not care which PV they get" - if the PVC has no storage class, file it there and return. Updated PVC will be enqueued shortly and it will have a SC.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hold on, maybe leave the code here - if there is a PV without a storage class, we try to bind to it once. If there is no such PV, we file the default SC into the PVC and try again, hoping for dynamic provisioning. This will keep at least a bit of the existing behavior there.

@msau42 what do you think? It's a bit confusing.

@thockin
Copy link
Member

thockin commented May 21, 2021 via email

@LittleChimera
Copy link
Author

Thank you for the feedback! ❤️

Is this really something we want to do? If I recall correctly, having no class is semantically meaningful, isn't it?

Please add some unit tests too!

I was actually trying to modify these test cases which specify that nothing should be done when storageClass is set to an empty value (with this PR, I would expect that in these two cases, the default storageClass is overridden on this PVC instead, and then a PV provisioned and bound):
https://github.com/jsafrane/kubernetes/blob/bb5d562f37ab029d2b562e23bd23e906ee099a45/pkg/controller/volume/persistentvolume/provision_test.go#L299-L316

Originally, there were written as part of dynamic provisioning. I cannot make out a reason as to why this behaviour was chosen.
#29006

Default storage classes were added later, and I cannot find again the reason for this behavior:
#30900

I would expect that storageClass is not overwritten by dynamic provisioner only if the associated PV was created manually.
I don't see a use-case for using an empty value, if the claim is not bound and doesn't specify a persistentVolume.
Even for manual use-cases, the usual value that's typically used for the storageClass would be "none" or "local".
Does this make sense to you or am I missing something?

@msau42
Copy link
Member

msau42 commented May 21, 2021

Is this really something we want to do? If I recall correctly, having no class is semantically meaningful, isn't it?

I think we definitely don't want to set the default storageclass for PVCs that have already been bound.

For unbound PVCs, I see two scenarios:

  1. Someone deployed all their components in parallel, including default StorageClass and workloads that use it. The ordering constraint currently imposed means that sometimes the default storageclass doesn't get set on those PVCs and they're stuck forever in an unbound state. This forces folks to order their deployments or hardcode a storageclass name in their PVC. One could argue that we have this ordering issue with other objects too, like CRDs and CRs.
  2. Someone does not have a default storageclass and they manually provide PVs for the PVCs. In this case, a nil PVC.storageclassname will bind to the PV. This is the scenario that may be impacted if we make this change. Note that it only breaks if a default storageclass gets installed in the cluster at some later point. The sequence that may break is:
    1. User creates PVC with empty StorageClass
    2. User creates a default StorageClass
    3. User creates a PV with empty StorageClass

IMO the scenario of depending on this behavior before installing a default StorageClass in the system seems a bit contrived and requires careful deployment ordering by the user. It could be mitigated by creating the PVC with "" storageclass.

That being said, I do agree that this makes explaining the API semantics more complicated. There are other issues too, such as validation not allowing updates to storageclassname, and maybe some concerns (that I haven't thought about yet) with pvc templates (statefulset, generic ephemeral volumes).

In summary, this is not as simple to change, we need to think more deeply on the api compatibility and impact to other features.

@LittleChimera
Copy link
Author

LittleChimera commented May 22, 2021

One could argue that we have this ordering issue with other objects too, like CRDs and CR

I think that ordering probably isn't the best way to solve this. In case of CRDs and Namespaces, which fall into this category, API server doesn't accept applying the dependencies. In other words, either you get expected results, or you get an error. Doing this with StorageClasses would make two different results possible based on which order you apply your resources. I.e., this solution would not be eventually consistent, because you would be able to get different results based on order of operations.

  1. Someone does not have a default storageclass and they manually provide PVs for the PVCs. In this case, a nil PVC.storageclassname will bind to the PV. This is the scenario that may be impacted if we make this change. Note that it only breaks if a default storageclass gets installed in the cluster at some later point. The sequence that may break is:

    1. User creates PVC with empty StorageClass
    2. User creates a default StorageClass
    3. User creates a PV with empty StorageClass

It's a good point. But, I think we can break this down further in two scenarios:

  1. User expects a default storage class present on their system. In this case, if you want to provide PV manually, you're expected to define some storage class - e.g. "", "none", "manual". Otherwise, the storageClass will be set to default one by the mutating admission webhook.
  2. User doesn't expect a default storage class present on their system. In this case, the change would be irrelevant. Adding a default StorageClass at a much later stage would not make a difference because a PV will be bound to the PVC.

I think it should be enough to change the logic so that storageClass can change only for PVCs with nil value. I.e. users that don't specify storage class on their PVC, expect that a default one will be set eventually. Users who want to provision PVs manually fall into one of the two scenarios above which both seem safe to me. Additionally, this might require changes to defaulter I think, so that might cause additional issues.

Do you think that makes sense and is worth a try?

That being said, I do agree that this makes explaining the API semantics more complicated. There are other issues too, such as validation not allowing updates to storageclassname, and maybe some concerns (that I haven't thought about yet) with pvc templates (statefulset, generic ephemeral volumes).

I didn't know about generic ephemeral volumes before, but from what I've read I don't see any clear issues for them or STS provisoned PVCs.
I still agree there might be some other concerns that we're missing.

@jsafrane
Copy link
Member

Is this really something we want to do?

I think that issues described my @msau42 are real, we have the same ordering problem too. Similarly, it's not possible to reliably change the default storage class - most fields are read-only and currently we must delete the storage class and re-create it later, hoping that no PVC is created in the meantime.

If I recall correctly, having no class is semantically meaningful, isn't it?

There is a difference between storageClassName: "" (PVC does not want any storage class) and storageClassName: nil (PVC wants the default one and it falls back to "" if there is no default storage class at the point when PVC was created). IMO, it makes sense to extend nil to "PVC wants the default storage class, even if it's created later".

If I remember correctly, with fall back from nil to "" we wanted to keep the existing PVC behavior from time when there were no storage classes at all.

I also remember that we used annotation + admission plugin for the default storage class to have possibility to have different defaulting rules / admission plugins in the future, e.g. a different default class for different namespaces. If this was real, then we cannot implement the defaulting in PV controller, because it does not know what the defaulting rules are. But I haven't seen these different defaulting rules used in the wild, IMO everyone uses our admission plugin.

@k8s-triage-robot
Copy link

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Aug 22, 2021
@LittleChimera
Copy link
Author

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Aug 22, 2021
@k8s-triage-robot
Copy link

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Nov 20, 2021
@LittleChimera
Copy link
Author

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Dec 5, 2021
@dims
Copy link
Member

dims commented Jan 14, 2022

/close

@k8s-ci-robot
Copy link
Contributor

@dims: Closed this PR.

In response to this:

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@dims
Copy link
Member

dims commented Jan 14, 2022

please reopen if you still need this WIP PR. Trying to declutter the open PR(s)

@@ -359,8 +360,19 @@ func (ctrl *PersistentVolumeController) syncUnboundClaim(claim *v1.PersistentVol
return err
}
return nil
case storagehelpers.GetPersistentVolumeClaimClass(claim) == "":
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jsafrane I think that we can change it to case claim.Spec.StorageClassName == nil: to keep the behavior compatible for the case that users intentionally specify "" to Spec.StorageClassName (Default StorageClass should be set to StorageClasssName only when StorageClassName == nil).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added some codes on top of this PR in order to:

  • Work (fix wrong error check, loosen validation)
  • Avoid trying to add default StorageClass when "" is intentionally set by users (although, it is actually blocked by validation)
  • Only allow one default StorageClass

master...mkimuram:storage-class-default-pv-controller-v3

It would be better tighten validation a bit more, but there won't be a big user facing change.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/needs-kind Indicates a PR lacks a `kind/foo` label and requires one. do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. release-note-none Denotes a PR that doesn't merit a release note. sig/apps Categorizes an issue or PR as relevant to SIG Apps. size/S Denotes a PR that changes 10-29 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Default storage class needs to be present before PVCs are created
8 participants