The provisioner exits after 30 minutes of idle. #1099
Comments
I filed kubernetes-csi/csi-lib-utils#153 to disable autoclose. |
/assign |
The idle timeout was disabled, but has been enabled by default in google.golang.org/grpc v1.59. The kubernetes-csi-addons operator acts similarly to the Kubernetes external-provisioner, and benefits from having a functional gRPC connection open to the csi-addons sidecars that run alongside CSI-drivers. See-also: kubernetes-csi/external-provisioner#1099 Signed-off-by: Niels de Vos <ndevos@ibm.com>
/reopen |
@jsafrane: Reopened this issue. |
@jsafrane What is the status of this issue? |
Maybe there is another issue around this: when the socket times out, the container fails without releasing the lease. Is this intended? Once restarted, the container doesn't recover the lease; we have to wait for the lease timeout (300s with vsphere-csi). |
I found that external-provisioner uses a random identity for the lease: external-provisioner/cmd/csi-provisioner/csi-provisioner.go, lines 283 to 286 in b377ea4.
This is not the case for external-attacher, for example. NB: its default is the hostname, i.e. the pod name. |
@sathieu Can you open a different issue? The original issue should have been fixed in the latest patch releases. |
This was fixed in master branch by #1135, sorry I forgot to close it. |
@jsafrane: Closing this issue. |
What happened:
Automatic gRPC bump to 1.59.0 introduced a new gRPC behavior that closes idle connections after 30 minutes of inactivity. After 30 minutes of no provisioning / deletion, the connection to a CSI driver is silently closed. At the next provisioning / deletion, the provisioner realizes the connection is closed and exits with `Lost connection to CSI driver, exiting`. A new provisioner starts immediately, but it must wait for leader election to expire, which adds quite a long delay to volume provisioning (and our downstream e2e tests time out).
What you expected to happen:
The gRPC connection should not close because of inactivity.
How to reproduce it:
On a very quiet cluster (no provisioning/deletion), wait for 30 minutes after external-provisioner start and create a new PVC that should be dynamically provisioned.