CSI Upgrade Tests Failing for hostPath and gcePD #71228
The specific upgrade job these tests are failing on upgrades the master and nodes to 1.13 and then runs the old version's tests, i.e., in this case it runs the 1.11 tests against the 1.13 cluster.
More info on upgrade tests is at https://github.com/kubernetes/community/blob/master/contributors/devel/e2e-tests.md#version-skewed-and-upgrade-testing
Do we also have skew tests that only upgrade the master but keep the nodes on the older version? If so, we need to detect that case too and use the old 0.3 drivers in that scenario.
Am I understanding correctly that there is no overlap in supported CSI plugin APIs between 1.12 and 1.13?
Quoting @saad-ali from #65246 (comment):
So yes, it looks like users will see exactly the same issue as this CI signal if the entire cluster is upgraded to 1.13, but I will let @saad-ali speak to this more. Also, should we communicate this more broadly than just in the release notes? This test failure did manage to raise a few eyebrows during burndown, so I am wondering if this compatibility expectation is clearly understood and communicated.
CSI 1.0 is not backwards compatible with CSI 0.x. Kubernetes v1.13 adds support for CSI 1.0 while dropping support for CSI 0.x, which means it is not backwards compatible with CSI 0.x drivers. A vendor may choose to make a version of a CSI driver that supports both versions. If they do not, the disruptive upgrade path for Kubernetes cluster admins is to remove the old CSI drivers from the cluster before upgrading, upgrade master and nodes to k8s v1.13, wait until the master and all nodes are on 1.13, and then deploy the new CSI 1.0 compatible version of the driver. This approach is disruptive for the duration of the cluster upgrade, because during the upgrade no volumes of that type can be provisioned/attached/mounted/etc. After the cluster/driver upgrade, things should start working again with the existing PV/PVC objects (unless the driver name changed, or the driver had some other change between 0.3 and 1.0 that prevents it from operating on the old PV/PVC/StorageClass objects). A possible non-disruptive upgrade path is being investigated here: #71282

Regardless, in the CSI community it has been well communicated (and in my opinion well understood) that CSI 0.x will have breaking changes from release to release. The intention has been to treat CSI 0.x as a chance to develop and test drivers before arriving at a stable API. As far as I know, that is the case with the CSI drivers I am aware of (https://kubernetes-csi.github.io/docs/Drivers.html) -- none of them are currently production ready.

The intended path forward is to document the incompatibility as a known (and expected) issue for 1.13, and to document possible upgrade strategies. The existing tests have been modified to support the disruptive upgrade (PR #71241). We're looking into also adding a new upgrade test that exercises and validates a possible non-disruptive upgrade strategy (issue #71282). Moving forward, we intend for all CSI 1.x releases to be backwards compatible with 1.0 and to support them as such in Kubernetes.
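For readers wondering what the "driver supports both versions" option could look like, here is a minimal, hypothetical sketch: because the CSI 0.x and 1.0 protos live in different packages (`csi.v0` vs `csi.v1`), both generated gRPC services can be registered on the same server. Only the Identity service is shown; the driver name, socket path, and stub responses are placeholders, and a real driver would also need Controller/Node services plus matching sidecars.

```go
// Sketch only: one driver binary serving both CSI 0.3 and CSI 1.0 Identity
// services on the same socket, so the same DaemonSet can work with kubelets
// before and after the cluster upgrade. Identifiers below marked as examples
// are placeholders, not from any real driver.
package main

import (
	"context"
	"net"

	csiv1 "github.com/container-storage-interface/spec/lib/go/csi"
	csiv0 "github.com/container-storage-interface/spec/lib/go/csi/v0"
	"google.golang.org/grpc"
)

type identityV0 struct{}

func (identityV0) GetPluginInfo(ctx context.Context, req *csiv0.GetPluginInfoRequest) (*csiv0.GetPluginInfoResponse, error) {
	return &csiv0.GetPluginInfoResponse{Name: "example.csi.driver", VendorVersion: "0.3.0"}, nil
}

func (identityV0) GetPluginCapabilities(ctx context.Context, req *csiv0.GetPluginCapabilitiesRequest) (*csiv0.GetPluginCapabilitiesResponse, error) {
	return &csiv0.GetPluginCapabilitiesResponse{}, nil
}

func (identityV0) Probe(ctx context.Context, req *csiv0.ProbeRequest) (*csiv0.ProbeResponse, error) {
	return &csiv0.ProbeResponse{}, nil
}

type identityV1 struct{}

func (identityV1) GetPluginInfo(ctx context.Context, req *csiv1.GetPluginInfoRequest) (*csiv1.GetPluginInfoResponse, error) {
	return &csiv1.GetPluginInfoResponse{Name: "example.csi.driver", VendorVersion: "1.0.0"}, nil
}

func (identityV1) GetPluginCapabilities(ctx context.Context, req *csiv1.GetPluginCapabilitiesRequest) (*csiv1.GetPluginCapabilitiesResponse, error) {
	return &csiv1.GetPluginCapabilitiesResponse{}, nil
}

func (identityV1) Probe(ctx context.Context, req *csiv1.ProbeRequest) (*csiv1.ProbeResponse, error) {
	return &csiv1.ProbeResponse{}, nil
}

func main() {
	lis, err := net.Listen("unix", "/csi/csi.sock") // example socket path
	if err != nil {
		panic(err)
	}
	srv := grpc.NewServer()
	// Pre-1.13 components speak csi.v0, 1.13+ components speak csi.v1; the
	// gRPC service names differ, so both can live on the same server.
	csiv0.RegisterIdentityServer(srv, identityV0{})
	csiv1.RegisterIdentityServer(srv, identityV1{})
	if err := srv.Serve(lis); err != nil {
		panic(err)
	}
}
```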
Just to explore how disruptive that would be: aren't nodes required to be drained as part of an upgrade? If so, when new replicas of those workloads attempt to get scheduled to other nodes, any that use CSI volumes will not be able to run because those volumes cannot be attached/mounted, correct?
Correct, it is suggested that pods be drained from a node before that node is upgraded. And if a CSI driver is uninstalled before the upgrade, then pods depending on volumes from that driver will remain in the Pending state (the volumes won't attach/mount) until the upgrade is complete and a compatible version of the driver is installed.
This might be tricky for pods that are managed by higher-level controllers that will automatically restart them.
@AishSundar @liggitt I spoke with @thockin. He is not happy with the backwards compatibility break. He wants us to look into what it will take to support both 0.x and 1.0. I will look into that and update this thread on feasibility. Until this is resolved, consider this a release-blocking issue.
/priority critical-urgent
@saad-ali thanks for the update, will watch this thread.
We discussed this at the release meeting today. The plan of record is to try to get PR #71314 approved and merged by EOD. Once it is merged, we'll monitor for regressions over the long weekend, and on Monday we will assess the impact and next steps. As a backup option, we will continue to investigate a non-disruptive, manual-intervention upgrade option (issue #71282).
/reopen
Reopening until we get green CI. @jberkus ^^
@AishSundar: Reopened this issue.
We're getting a lot of flakes across the longer-running tests today, so at this point we don't know if things are passing or not. |
The CSI tests in question turned green after #71314 was merged. This particular issue can be closed.
The flakes are unrelated to this issue. This issue was about fixing a hard failure.
OK, closing this and opening a new issue for the flakes.
/close
@jberkus: Closing this issue.
Quoted from: #65246 (comment)
CSI v1.0.0 is only compatible with k8s >= 1.13
CSI v0.x is only compatible with k8s <= 1.12
It's probable that we will need to detect which version the k8s cluster is on in the test and deploy the "correct" version of the CSI drivers for that version (a rough sketch of such a check is below).
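A minimal sketch of that detection using client-go follows; the manifest directory names and the 1.13 cutover are illustrative assumptions, not the real e2e framework API. Note that this only inspects the API server's version, so the master/node skew case raised above would still need per-node handling.

```go
// Sketch: choose which set of CSI driver manifests to deploy based on the
// server version the test is running against. Paths are hypothetical.
package main

import (
	"fmt"

	"k8s.io/apimachinery/pkg/util/version"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func driverManifestDir(cs kubernetes.Interface) (string, error) {
	info, err := cs.Discovery().ServerVersion()
	if err != nil {
		return "", err
	}
	v, err := version.ParseGeneric(info.GitVersion)
	if err != nil {
		return "", err
	}
	// Clusters at 1.13 or later only speak CSI 1.0; older ones only CSI 0.x.
	if v.AtLeast(version.MustParseGeneric("1.13.0")) {
		return "manifests/csi-1.0", nil
	}
	return "manifests/csi-0.3", nil
}

func main() {
	config, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	cs, err := kubernetes.NewForConfig(config)
	if err != nil {
		panic(err)
	}
	dir, err := driverManifestDir(cs)
	if err != nil {
		panic(err)
	}
	fmt.Println("deploying CSI driver manifests from", dir)
}
```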
/kind failing-test
/assign
/cc @AishSundar @jberkus @saad-ali @msau42
/sig storage