The provisioner exits after 30 minutes of idle. #1099

jsafrane · 2023-11-09T13:36:52Z

What happened:

The automatic gRPC bump to 1.59.0 introduced a new gRPC behavior: idle connections are closed after 30 minutes of inactivity. After 30 minutes with no provisioning or deletion, the connection to the CSI driver is silently closed. At the next provisioning or deletion, the provisioner notices that the connection is closed and exits with "Lost connection to CSI driver, exiting". A new provisioner starts immediately, but it must wait for the previous leader election lease to expire, which adds a long delay to volume provisioning (and makes our downstream e2e tests time out).

What you expected to happen:

The gRPC connection should not close because of inactivity.

How to reproduce it:
On a very quiet cluster (no provisioning or deletion), wait 30 minutes after external-provisioner starts, then create a new PVC that should be dynamically provisioned.

jsafrane · 2023-11-09T13:37:20Z

I filed kubernetes-csi/csi-lib-utils#153 to disable autoclose.

jsafrane · 2023-11-09T13:37:25Z

/assign

The idle timeout was disabled, but has been enabled by default in google.golang.org/grpc v1.59. The kubernetes-csi-addons operator acts similarly to the Kubernetes external-provisioner, and benefits from having a functional gRPC connection open to the csi-addons sidecars that run alongside CSI-drivers.

See-also: kubernetes-csi/external-provisioner#1099
Signed-off-by: Niels de Vos <ndevos@ibm.com>

jsafrane · 2023-11-13T10:38:20Z

/reopen
We still need to vendor the new csi-lib-utils here.

k8s-ci-robot · 2023-11-13T10:38:23Z

@jsafrane: Reopened this issue.

sathieu · 2024-01-15T08:19:09Z

@jsafrane What is the status of this issue?

sathieu · 2024-01-15T08:52:04Z

Maybe there is another issue around this: when the socket times out, the container fails without releasing the lease. Is this intended? Once restarted, the container doesn't recover the lease, so we have to wait for the lease timeout (300s with vsphere-csi).

sathieu · 2024-01-15T13:32:19Z

I found that external-provisioner uses a random identity for the lease:

external-provisioner/cmd/csi-provisioner/csi-provisioner.go

Lines 283 to 286 in b377ea4

    
    identity := strconv.FormatInt(timeStamp, 10) + "-" + strconv.Itoa(rand.Intn(10000)) + "-" + provisionerName
    if *enableNodeDeployment {
    	identity = identity + "-" + node
    }

external-provisioner/cmd/csi-provisioner/csi-provisioner.go

Line 674 in b377ea4

le.WithIdentity(identity)

This is not the case for external-attacher, for example:

https://github.com/kubernetes-csi/external-attacher/blob/4e13fc2eabc320c779b574bf35bb79dd00feb2e2/cmd/csi-attacher/main.go#L281-L283

NB: the default is the hostname, i.e. the pod name:

https://github.com/kubernetes-csi/csi-lib-utils/blob/f82f9de5b8aeb3c3b236d7f58fc5eeab34438078/leaderelection/leader_election.go#L198-L200
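
The difference between the two identity schemes can be sketched as follows. Only the string concatenation mirrors the snippet quoted above; the function names `randomIdentity` and `hostnameIdentity`, the timestamp values, and the driver and pod names are hypothetical. The sketch also assumes that a restarted container keeps the hostname of its pod.

```go
package main

import (
	"fmt"
	"math/rand"
	"strconv"
)

// randomIdentity follows the shape of the quoted snippet: a timestamp,
// a random number below 10000, and the provisioner name.
func randomIdentity(timeStamp int64, provisionerName string) string {
	return strconv.FormatInt(timeStamp, 10) + "-" + strconv.Itoa(rand.Intn(10000)) + "-" + provisionerName
}

// hostnameIdentity models the default scheme, where the identity is the
// hostname (the pod name when running in a pod).
func hostnameIdentity(hostname string) string {
	return hostname
}

func main() {
	// Two starts of the same container at two hypothetical timestamps.
	first := randomIdentity(1700000000000, "example.csi.driver")
	second := randomIdentity(1700000005000, "example.csi.driver")
	fmt.Println(first == second) // false: the timestamps differ

	// With the hostname scheme, both starts report the same identity.
	fmt.Println(hostnameIdentity("csi-controller-0") == hostnameIdentity("csi-controller-0")) // true
}
```

Whether a matching identity lets the restarted container resume the lease without waiting depends on the leader election implementation and is not shown here.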

xing-yang · 2024-01-17T18:16:27Z

@sathieu Can you open a different issue? The original issue should have been fixed in the latest patch releases.

jsafrane · 2024-01-17T18:16:58Z

This was fixed in the master branch by #1135; sorry, I forgot to close it.
/close

k8s-ci-robot · 2024-01-17T18:17:02Z

@jsafrane: Closing this issue.

sathieu · 2024-01-18T09:42:54Z

> @sathieu Can you open a different issue? The original issue should have been fixed in the latest patch releases.

Done #1147

k8s-ci-robot assigned jsafrane Nov 9, 2023

jsafrane mentioned this issue Nov 9, 2023

Bump google.golang.org/grpc to v1.59 kubernetes-csi/csi-lib-utils#153

Merged

k8s-ci-robot closed this as completed in kubernetes-csi/csi-lib-utils#153 Nov 9, 2023

nixpanic mentioned this issue Nov 9, 2023

Explicitly disable gRPC idle timeout csi-addons/kubernetes-csi-addons#482

Merged

k8s-ci-robot reopened this Nov 13, 2023

AndrewSirenko mentioned this issue Dec 18, 2023

Csi-attacher Looses Connection to Driver Unix Socket kubernetes-sigs/aws-ebs-csi-driver#1875

Closed

sathieu mentioned this issue Jan 15, 2024

Block volume sometimes takes 5 minutes kubernetes-sigs/vsphere-csi-driver#2710

Closed

Madhu-1 mentioned this issue Jan 15, 2024

Lost connection to unix:///csi/csi-provisioner.sock. rook/rook#13458

Closed

lbogdan mentioned this issue Jan 16, 2024

RBD provisioner stuck attempting to acquire leader lease for 30s rook/rook#13475

Closed

k8s-ci-robot closed this as completed Jan 17, 2024

sathieu mentioned this issue Jan 18, 2024

Improve lease handling to avoid waiting lease timeout when container fails or crashes #1147

Closed
