The provisioner exits after 30 minutes of idle. #1099

Closed
jsafrane opened this issue Nov 9, 2023 · 11 comments · Fixed by kubernetes-csi/csi-lib-utils#153
@jsafrane
Contributor

jsafrane commented Nov 9, 2023

What happened:

An automatic gRPC bump to 1.59.0 introduced a new gRPC behavior that closes idle connections after 30 minutes of inactivity. After 30 minutes with no provisioning or deletion, the connection to the CSI driver is silently closed. At the next provisioning or deletion, the provisioner notices that the connection is closed and exits with the message "Lost connection to CSI driver, exiting". A new provisioner starts immediately, but it must wait for the old leader election lease to expire, which adds quite a long delay to volume provisioning (and our downstream e2e tests time out).

What you expected to happen:

The gRPC connection should not close because of inactivity.

How to reproduce it:
On a very quiet cluster (no provisioning or deletion), wait 30 minutes after external-provisioner starts, then create a new PVC that should be dynamically provisioned.

@jsafrane
Contributor Author

jsafrane commented Nov 9, 2023

I filed kubernetes-csi/csi-lib-utils#153 to disable autoclose.
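
For readers who hit the same behavior in their own gRPC clients, here is a minimal sketch of what disabling the idle timeout can look like. It is not a copy of the csi-lib-utils patch; it assumes the `grpc.WithIdleTimeout` dial option from google.golang.org/grpc (experimental API, where a zero duration disables idleness), and the socket path is a hypothetical placeholder.

```go
package main

import (
	"log"
	"time"

	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials/insecure"
)

func main() {
	// Hypothetical CSI socket path, used only for illustration.
	const endpoint = "unix:///csi/csi.sock"

	conn, err := grpc.Dial(
		endpoint,
		// CSI sidecars talk to the driver over a local Unix socket,
		// so this sketch uses plaintext transport credentials.
		grpc.WithTransportCredentials(insecure.NewCredentials()),
		// A zero timeout disables idle mode, so the channel is not
		// shut down after 30 minutes without RPCs (the gRPC 1.59 default).
		grpc.WithIdleTimeout(time.Duration(0)),
	)
	if err != nil {
		log.Fatalf("dial %s: %v", endpoint, err)
	}
	defer conn.Close()

	// grpc.Dial does not block by default, so this only reports the
	// initial channel state, not whether the driver is reachable.
	log.Printf("connection state: %v", conn.GetState())
}
```

Any client that treats a closed or idle channel as a fatal "lost connection" condition, as the provisioner does, would likely need the same option.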

@jsafrane
Contributor Author

jsafrane commented Nov 9, 2023

/assign

nixpanic added a commit to nixpanic/kubernetes-csi-addons that referenced this issue Nov 9, 2023
The idle timeout was disabled, but has been enabled by default in
google.golang.org/grpc v1.59. The kubernetes-csi-addons operator acts
similarly to the Kubernetes external-provisioner, and benefits from
having a functional gRPC connection open to the csi-addons sidecars that
run alongside CSI-drivers.

See-also: kubernetes-csi/external-provisioner#1099
Signed-off-by: Niels de Vos <ndevos@ibm.com>
mergify bot pushed a commit to csi-addons/kubernetes-csi-addons that referenced this issue Nov 9, 2023
@jsafrane
Contributor Author

/reopen
we still need to vendor new csi-lib-utils here

@k8s-ci-robot k8s-ci-robot reopened this Nov 13, 2023
@k8s-ci-robot
Contributor

@jsafrane: Reopened this issue.

In response to this:

/reopen
we still need to vendor new csi-lib-utils here

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@sathieu

sathieu commented Jan 15, 2024

@jsafrane What is the status of this issue?

@sathieu

sathieu commented Jan 15, 2024

Maybe there is another issue around this: when the socket times out, the container fails without releasing the lease. Is this intended? Once restarted, the container doesn't recover the lease, so we have to wait for the lease timeout (300s with vsphere-csi).

@sathieu

sathieu commented Jan 15, 2024

I found that external-provisioner uses a random identity for the lease:

identity := strconv.FormatInt(timeStamp, 10) + "-" + strconv.Itoa(rand.Intn(10000)) + "-" + provisionerName
if *enableNodeDeployment {
	identity = identity + "-" + node
}

This is not the case for external-attacher for example:

https://github.com/kubernetes-csi/external-attacher/blob/4e13fc2eabc320c779b574bf35bb79dd00feb2e2/cmd/csi-attacher/main.go#L281-L283

NB: the default identity is the hostname, i.e. the pod name:

https://github.com/kubernetes-csi/csi-lib-utils/blob/f82f9de5b8aeb3c3b236d7f58fc5eeab34438078/leaderelection/leader_election.go#L198-L200

@xing-yang
Contributor

@sathieu Can you open a different issue? The original issue should have been fixed in the latest patch releases.

@jsafrane
Contributor Author

This was fixed in master branch by #1135, sorry I forgot to close it.
/close

@k8s-ci-robot
Contributor

@jsafrane: Closing this issue.

In response to this:

This was fixed in master branch by #1135, sorry I forgot to close it.
/close


@sathieu

sathieu commented Jan 18, 2024

@sathieu Can you open a different issue? The original issue should have been fixed in the latest patch releases.

Done #1147
