
[BUG] Persistent volume is not ready for workloads #6776

Closed
ajoskowski opened this issue Sep 25, 2023 · 40 comments
Assignees
Labels
area/stability System or volume stability area/volume-attach-detach Volume attach & detach related area/volume-rwx Volume RWX related backport/1.5.4 kind/bug priority/0 Must be implemented or fixed in this release (managed by PO) require/backport Require backport. Only used when the specific versions to backport have not been defined. require/qa-review-coverage Require QA to review coverage
Milestone

Comments

@ajoskowski

ajoskowski commented Sep 25, 2023

Describe the bug (🐛 if you encounter this issue)

Sometimes we encounter issues where we are not able to mount a Longhorn volume to a pod.
The pod fails to start and the following errors are visible:

  • Kubernetes events for failing pod:
Error: failed to create containerd task: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: chdir to cwd ("/src") set in config.json failed: stale NFS file handle: unknown
AttachVolume.Attach failed for volume "pvc-506e824d-414f-43ce-af59-5821b2b9accf" : rpc error: code = Aborted desc = volume pvc-506e824d-414f-43ce-af59-5821b2b9accf is not ready for workloads
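
A quick way to inspect the underlying volume state when this error appears is via the Longhorn custom resources, for example (a minimal sketch; the label selector is an assumption and may differ by Longhorn version):

# Sketch: check why the volume is reported as "not ready for workloads"
kubectl -n longhorn-system get volumes.longhorn.io pvc-506e824d-414f-43ce-af59-5821b2b9accf -o yaml
# The longhornvolume label selector is an assumption; listing all replicas and filtering by name also works
kubectl -n longhorn-system get replicas.longhorn.io -l longhornvolume=pvc-506e824d-414f-43ce-af59-5821b2b9accf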

To Reproduce

The problem cannot be easily reproduced - it occurs randomly.

Expected behavior

Volumes work correctly and can be mounted to the pods.

Support bundle for troubleshooting

We cannot send a support bundle due to security reasons, but we can provide logs and details - see below:

  • Kubernetes (and Longhorn) nodes:
    • ip-X-X-X-57.compute.internal
    • ip-X-X-X-142.compute.internal
    • ip-X-X-X-140.compute.internal
  • Pod name: ci-state-pr-2463-env-doaks-prod6-uaenorth-v2wnw-override-refs-in-tf-modules-407269516
  • Pod events:
AttachVolume.Attach failed for volume "pvc-506e824d-414f-43ce-af59-5821b2b9accf" : rpc error: code = Aborted desc = volume pvc-506e824d-414f-43ce-af59-5821b2b9accf is not ready for workloads
Error: failed to create containerd task: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: chdir to cwd ("/src") set in config.json failed: stale NFS file handle: unknown
  • instance-manager on ip-X-X-X-142.compute.internal node:
time="2023-09-18T03:08:01Z" level=error msg="I/O error" error="no backend available"
[pvc-506e824d-414f-43ce-af59-5821b2b9accf-e-0df2341a] time="2023-09-18T03:08:01Z" level=error msg="I/O error" error="no backend available"
response_process: Receive error for response 3 of seq 310
tgtd: bs_longhorn_request(111) fail to read at 0 for 4096
tgtd: bs_longhorn_request(210) io error 0xc27700 28 -14 4096 0, Success
[pvc-506e824d-414f-43ce-af59-5821b2b9accf-e-0df2341a] time="2023-09-18T03:08:01Z" level=error msg="I/O error" error="no backend available"
response_process: Receive error for response 3 of seq 311
tgtd: bs_longhorn_request(111) fail to read at 0 for 4096
tgtd: bs_longhorn_request(210) io error 0xc27700 28 -14 4096 0, Success
response_process: Receive error for response 3 of seq 312
tgtd: bs_longhorn_request(111) fail to read at 0 for 4096
tgtd: bs_longhorn_request(210) io error 0xc27700 28 -14 4096 0, Success
response_process: Receive error for response 3 of seq 313
tgtd: bs_longhorn_request(97) fail to write at 10737352704 for 65536
tgtd: bs_longhorn_request(210) io error 0xc27700 2a -14 65536 10737352704, Success
[pvc-506e824d-414f-43ce-af59-5821b2b9accf-e-0df2341a] time="2023-09-18T03:08:01Z" level=error msg="I/O error" error="no backend available"
time="2023-09-18T03:08:01Z" level=error msg="I/O error" error="no backend available"
response_process: Receive error for response 3 of seq 314
tgtd: bs_longhorn_request(97) fail to write at 4337664 for 4096
tgtd: bs_longhorn_request(210) io error 0xc27700 2a -14 4096 4337664, Success
response_process: Receive error for response 3 of seq 315
tgtd: bs_longhorn_request(97) fail to write at 37912576 for 4096
tgtd: bs_longhorn_request(210) io error 0xc27700 2a -14 4096 37912576, Success
[pvc-506e824d-414f-43ce-af59-5821b2b9accf-e-0df2341a] time="2023-09-18T03:08:01Z" level=error msg="I/O error" error="no backend available"
time="2023-09-18T03:08:20Z" level=error msg="Error syncing Longhorn engine" controller=longhorn-engine engine=longhorn-system/pvc-506e824d-414f-43ce-af59-5821b2b9accf-e-0df2341a error="failed to sync engine for longhorn-system/pvc-506e824d-414f-43ce-af59-5821b2b9accf-e-0df2341a: failed to start rebuild for pvc-506e824d-414f-43ce-af59-5821b2b9accf-r-6093cefb of pvc-506e824d-414f-43ce-af59-5821b2b9accf-e-0df2341a: timed out waiting for the condition" node=ip-10-44-45-142.eu-central-1.compute.internal
  • longhorn-csi-plugin on ip-X-X-X-142.compute.internal node:
time="2023-09-18T03:07:34Z" level=error msg="ControllerPublishVolume: err: rpc error: code = Aborted desc = volume pvc-506e824d-414f-43ce-af59-5821b2b9accf is not ready for workloads"
  • csi-attacher on ip-X-X-X-142.compute.internal node:
I0918 03:07:34.632251       1 csi_handler.go:234] Error processing "csi-635290b8ff08b07c1e7e1bdf2434aec2d8e8ef39dd611f725f8f3da595713bf5": failed to attach: rpc error: code = Aborted desc = volume pvc-506e824d-414f-43ce-af59-5821b2b9accf is not ready for workloads
  • longhorn-manager on ip-X-X-X-142.compute.internal node:
time="2023-09-18T03:08:00Z" level=error msg="Failed to rebuild replica X.X.X.245:10205" controller=longhorn-engine engine=pvc-506e824d-414f-43ce-af59-5821b2b9accf-e-0df2341a error="proxyServer=X.X.X.201:8501 destination=X.X.X.201:10079: failed to add replica tcp://X.X.X.245:10205 for volume: rpc error: code = Unknown desc = failed to create replica tcp://X.X.X.245:10205 for volume X.X.X.201:10079: rpc error: code = Unknown desc = cannot get valid result for remain snapshot" node=ip-X-X-X-142.eu-central-1.compute.internal volume=pvc-506e824d-414f-43ce-af59-5821b2b9accf
time="2023-09-18T03:08:00Z" level=error msg="Failed to sync Longhorn volume longhorn-system/pvc-506e824d-414f-43ce-af59-5821b2b9accf" controller=longhorn-volume error="failed to sync longhorn-system/pvc-506e824d-414f-43ce-af59-5821b2b9accf: failed to reconcile volume state for pvc-506e824d-414f-43ce-af59-5821b2b9accf: no healthy or scheduled replica for starting" node=ip-X-X-X-142.eu-central-1.compute.internal

Environment

  • Longhorn version: v1.5.1
  • Installation method: helm
  • Kubernetes distro and version: AWS EKS, version v1.26.6
    • Number of worker node in the cluster: 3
    • Machine type: m5.4xlarge
  • Number of Longhorn volumes in the cluster: tens of volumes created dynamically as temporary storage for CICD builds (Longhorn + Argo Workflows)
  • Impacted Longhorn resources:
    • Volume names: pvc-506e824d-414f-43ce-af59-5821b2b9accf (only example)

Additional context

Cluster autoscaler is enabled on the cluster - Kubernetes Cluster Autoscaler Enabled (Experimental) is enabled in Longhorn configuration

@ajoskowski ajoskowski added kind/bug require/backport Require backport. Only used when the specific versions to backport have not been defined. require/qa-review-coverage Require QA to review coverage labels Sep 25, 2023
@derekbit
Member

derekbit commented Sep 25, 2023

Can you provide a support bundle to our e-mail (not public) longhorn-support-bundle@suse.com?

@ajoskowski
Author

@derekbit We are not able to do that due to processes in our company. I put all the details in the ticket. If you need anything more, please tell me and I will try to help you :)

@derekbit
Member

Questions for clarification:

  1. ...config.json failed: stale NFS file handle: unknown.. => What's the purpose of the NFS filesystem in your system?
  2. What's the accessMode of the problematic volume?
  3. How many replicas does the problematic volume have?
  4. ..level=error msg="I/O error" error="no backend available"... => All replicas failed. Is it because of an i/o timeout? Can you help check why the replicas failed?

@ajoskowski
Author

Questions for clarification:

  1. ...config.json failed: stale NFS file handle: unknown.. => What's the purpose of the NFS filesystem in your system?
  2. What's the accessMode of the problematic volume?
  3. How many replicas does the problematic volume have?
  4. ..level=error msg="I/O error" error="no backend available"... => All replicas failed. Is it because of an i/o timeout? Can you help check why the replicas failed?
  1. It is an error message from an Argo Workflows step - we do not have anything related to NFS except Longhorn.
  2. The access mode is RWX - we have several steps in our pipeline. The first step prepares data (example: git clone); subsequent steps can run in parallel and use the already prepared data - this is the reason why we use RWX.
  3. We have 3 worker nodes and 3 replicas of Longhorn volumes. We also have the cluster autoscaler enabled on the cluster, but in this specific case we had only the 3 initially created nodes.
  4. Where can I get some details about the reason for this state?

@derekbit
Member

Where can I get some details about the reason for this state?

Check the instance-manager logs and see why the replicas cannot be added to the engine or any i/o timeout error.
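
A minimal sketch of pulling those logs (the pod name is a placeholder and the grep pattern is only an example):

kubectl -n longhorn-system get pods -o wide | grep instance-manager
kubectl -n longhorn-system logs <instance-manager-pod-name> --since=24h | grep -iE "i/o error|timeout|fail"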

@karolkieglerski

This issue is related to my issue #6641

@ajoskowski
Author

Where can I get some details about the reason for this state?

Check the instance-manager logs and see why the replicas cannot be added to the engine or any i/o timeout error.

Logs from instance manager on ip-X-X-X-142.compute.internal node -
ip-X-X-X-142.compute.internal-instance-manager.log

@derekbit
Member

I see the replicas were removed when creating an engine. Can you check why longhorn-manager deleted them at this moment?

[longhorn-instance-manager] time="2023-09-18T03:07:50Z" level=info msg="Removing replica" engineName=pvc-506e824d-414f-43ce-af59-5821b2b9accf-e-0df2341a replicaAddress="tcp://X.X.X.100:10215" replicaName= serviceURL="X.X.X.201:10079"
[longhorn-instance-manager] time="2023-09-18T03:07:50Z" level=info msg="Removing replica" engineName=pvc-506e824d-414f-43ce-af59-5821b2b9accf-e-0df2341a replicaAddress="tcp://X.X.X.245:10195" replicaName= serviceURL="X.X.X.201:10079"
[pvc-506e824d-414f-43ce-af59-5821b2b9accf-e-0df2341a] time="2023-09-18T03:07:50Z" level=info msg="Removing backend: tcp://X.X.X.245:10195"
time="2023-09-18T03:07:50Z" level=info msg="Monitoring stopped tcp://X.X.X.245:10195"
[pvc-506e824d-414f-43ce-af59-5821b2b9accf-e-0df2341a] time="2023-09-18T03:07:50Z" level=info msg="Removing backend: tcp://X.X.X.100:10215"

@ajoskowski
Author

I see the replicas were removed when creating an engine. Can you check why longhorn-manager deleted them at this moment?

[longhorn-instance-manager] time="2023-09-18T03:07:50Z" level=info msg="Removing replica" engineName=pvc-506e824d-414f-43ce-af59-5821b2b9accf-e-0df2341a replicaAddress="tcp://X.X.X.100:10215" replicaName= serviceURL="X.X.X.201:10079"
[longhorn-instance-manager] time="2023-09-18T03:07:50Z" level=info msg="Removing replica" engineName=pvc-506e824d-414f-43ce-af59-5821b2b9accf-e-0df2341a replicaAddress="tcp://X.X.X.245:10195" replicaName= serviceURL="X.X.X.201:10079"
[pvc-506e824d-414f-43ce-af59-5821b2b9accf-e-0df2341a] time="2023-09-18T03:07:50Z" level=info msg="Removing backend: tcp://X.X.X.245:10195"
time="2023-09-18T03:07:50Z" level=info msg="Monitoring stopped tcp://X.X.X.245:10195"
[pvc-506e824d-414f-43ce-af59-5821b2b9accf-e-0df2341a] time="2023-09-18T03:07:50Z" level=info msg="Removing backend: tcp://X.X.X.100:10215"

Longhorn manager logs from all (3) instances: all-longhorn-manager.log

@derekbit
Member

RWX volume pvc-506e824d-414f-43ce-af59-5821b2b9accf received many requests from API in a short period
...
-Requested to be detached at 18/Sep/2023:03:06:01 +0000
-Requested to be attached at 18/Sep/2023:03:06:38 +0000
-Requested to be detached at 18/Sep/2023:03:06:50 +0000
-Requested to be detached at 18/Sep/2023:03:07:03 +0000
...

From the log, it looks like a race condition between the attachments and the detachments, but what is the purpose of such frequent requests?

@ajoskowski
Author

As I said before - we use Longhorn as storage for our CI/CD stack. We have many steps which do something with the data. Some steps take several seconds, some several minutes. In the case we are investigating, we have several steps which do not take much time - so a lot of attachments/detachments are expected here. The question is: is this valid for Longhorn? Should it support such cases? We have used Longhorn this way for ~2 years and we did not observe such problems in the past.

@derekbit
Member

The question is: is this valid for Longhorn? Should it support such cases? We have used Longhorn this way for ~2 years and we did not observe such problems in the past.

Got it. Longhorn introduced a new attachment/detachment mechanism in v1.5.0. Not sure if it is related; it is still under investigation. Ref: #3715

cc @PhanLe1010

@derekbit
Member

cc @james-munson

@innobead innobead added the investigation-needed Identified the issue but require further investigation for resolution (won't be stale) label Sep 25, 2023
@derekbit
Member

Longhorn manager logs from all (3) instances: all-longhorn-manager.log

@ajoskowski Sorry, I got confused when reading the log file. Does the all-longhorn-manager.log include all longhorn-manager pods' logs or only one longhorn-manager pod?

@ajoskowski
Author

Longhorn manager logs from all (3) instances: all-longhorn-manager.log

@ajoskowski Sorry, I got confused when reading the log file. Does the all-longhorn-manager.log include all longhorn-manager pods' logs or only one longhorn-manager pod?

These are all logs from all instances of longhorn-manager - that is, a combined set of logs from the 3 instances.
Do you need them in separate files?

@derekbit
Member

derekbit commented Sep 26, 2023

Longhorn manager logs from all (3) instances: all-longhorn-manager.log

@ajoskowski Sorry, I got confused when reading the log file. Does the all-longhorn-manager.log include all longhorn-manager pods' logs or only one longhorn-manager pod?

These are all logs from all instances of longhorn-manager - that is, a combined set of logs from the 3 instances. Do you need them in separate files?

Yeah, some messages are mixed together. Can you help provide separate files? Thank you.

@c3y1huang c3y1huang moved this from New to In progress in Community Review Sprint Sep 26, 2023
@james-munson
Contributor

Does the CI/CD process involve restarting nodes or pods? The share manager recovery backend logs that it is removing NFS client entries on multiple occasions.

@PhanLe1010
Contributor

PhanLe1010 commented Sep 27, 2023

@ajoskowski Could we have:

  • the yaml output of kubectl get volumes.longhorn.io,volumeattachments.longhorn.io,engines.longhorn.io,replicas.longhorn.io -n longhorn-system -oyaml
  • logs from longhorn-csi-plugin-xxx pods in longhorn-system namespace
  • Is the workload pod ci-state-pr-2463-env-doaks-prod6-uaenorth-v2wnw-override-refs-in-tf-modules-407269516 stuck right now?

@PhanLe1010
Contributor

PhanLe1010 commented Sep 27, 2023

In an effort to reproduce, do you think the following steps are similar to your CI pipeline, @ajoskowski?

  1. Create a RWX PVC
  2. Create a deployment using the PVC
  3. Repeatedly and quickly scale the deployment up from 0 to 5 replicas and back down to 0
  4. Verify whether any pod gets stuck coming up
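
A minimal loop along these lines (the deployment name, label, and iteration count are placeholder assumptions, not taken from this issue):

# Sketch: repeatedly scale an RWX-backed deployment between 0 and 5 replicas
for i in $(seq 1 50); do
    kubectl scale deployment rwx-test --replicas=5
    kubectl rollout status deployment rwx-test --timeout=120s || echo "iteration $i: pods did not come up"
    kubectl scale deployment rwx-test --replicas=0
    # wait for the old pods to terminate before the next iteration
    kubectl wait --for=delete pod -l app=rwx-test --timeout=300s || true
done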

@ajoskowski
Author

Does the CI/CD process involve restarting nodes or pods? The share manager recovery backend logs that it is removing NFS client entries on multiple occasions.

Each step in the pipeline is a separate pod with the shared Longhorn volume. It means that if you have 10 steps, you can think of them as 10 pods with the shared Longhorn volume mounted.

@ajoskowski Could we have:

  • the yaml output of kubectl get volumes.longhorn.io,volumeattachments.longhorn.io,engines.longhorn.io,replicas.longhorn.io -n longhorn-system -oyaml
  • logs from longhorn-csi-plugin-xxx pods in longhorn-system namespace
  • Is the workload pod ci-state-pr-2463-env-doaks-prod6-uaenorth-v2wnw-override-refs-in-tf-modules-407269516 stuck right now?
  • Regarding kubectl get volumes.longhorn.io,volumeattachments.longhorn.io,engines.longhorn.io,replicas.longhorn.io -n longhorn-system -oyaml - no problem, but I was only able to do it for the current cluster shape (the problematic volume was already removed) - volumes_attachments_engines_replicas.log
  • Logs from longhorn-csi-plugin-xxx pods in the longhorn-system namespace - I do not see any logs containing information about pvc-506e824d-414f-43ce-af59-5821b2b9accf
  • Pod ci-state-pr-2463-env-doaks-prod6-uaenorth-v2wnw-override-refs-in-tf-modules-407269516 was not able to start due to the problem described in the bug description, and it has already been removed

In an effort to reproduce, do you think the following steps are similar to your CI pipeline, @ajoskowski?

  1. Create a RWX PVC
  2. Create a deployment using the PVC
  3. Repeatedly and quickly scale the deployment up from 0 to 5 replicas and back down to 0
  4. Verify whether any pod gets stuck coming up

Yeah, it's similar. On our side we create new pod definitions (new steps) instead of scaling a deployment, but the logic is the same - creating and removing pods that mount/unmount a Longhorn volume.

@PhanLe1010
Contributor

PhanLe1010 commented Sep 28, 2023

Thanks @ajoskowski !
The provided yaml https://github.com/longhorn/longhorn/files/12734412/volumes_attachments_engines_replicas.log doesn't show anything abnormal, as it was taken when the problem was not occurring.

We will try to reproduce the issue in our lab.

@innobead innobead added the priority/0 Must be implemented or fixed in this release (managed by PO) label Oct 3, 2023
@innobead innobead added area/volume-rwx Volume RWX related and removed investigation-needed Identified the issue but require further investigation for resolution (won't be stale) labels Oct 13, 2023
@shuo-wu
Contributor

shuo-wu commented Oct 20, 2023

As we discussed last time, this part is problematic and may lead to unexpected and unnecessary detachment after a node is temporarily unavailable (kubelet/network down). In fact, keeping volume.Spec.NodeID the same as ShareManager.Status.OwnerID is unnecessary.

The share-manager-controller workflow can be like the following:

  1. Start the share manager pod scheduled by Kubernetes.
  2. Set the volume attachment ticket to the pod node
  3. Unset the volume attachment ticket when the volume or the share manager pod is in an error state or is stopping/stopped
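
For context, the attachment tickets this workflow manipulates can be inspected on the Longhorn VolumeAttachment custom resource (a sketch; the exact field layout depends on the Longhorn version):

# Sketch: show the attachment tickets (e.g. from the CSI attacher or the share manager controller) for a volume
kubectl -n longhorn-system get volumeattachments.longhorn.io <volume-name> -o yaml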

@james-munson
Contributor

Using a similar script, I was unable to duplicate a replica being deleted. After detach, they were stopped, but that's all.
The attach and detach sequences take from 30 to 50 seconds each to reach the expected state. I wonder whether that is too slow for the CI/CD apparatus.
If the line is removed, the attachment never happens, and the workload pod is stuck in ContainerCreating. The volume itself shows a status of "attached", but the volumeattachment resource shows attached=false with an attachError of "failed to attach.... Waiting for volume share to be available."

I'm going to focus on trimming the workflow as described above.
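
The Kubernetes-side view described above can be read straight from the VolumeAttachment objects, for example (a sketch using standard storage.k8s.io/v1 fields):

# Sketch: list attach status and any attach error per VolumeAttachment
kubectl get volumeattachments.storage.k8s.io -o custom-columns=NAME:.metadata.name,PV:.spec.source.persistentVolumeName,ATTACHED:.status.attached,ERROR:.status.attachError.message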

@derekbit
Member

derekbit commented Nov 3, 2023

Hello @james-munson, what's your environment for the reproduction?

@james-munson
Contributor

It's a 4-node (1 control-plane, 3 worker) cluster running Ubuntu 20.04 on Kubernetes v1.25.12+rke2r1, running 1.6-dev (current master-head).

@james-munson
Contributor

I also tried a repro with @PhanLe1010's idea of using a deployment with an RWX volume and scaling it up and down quickly.
Specifically, I used the rwx example from https://github.com/longhorn/longhorn/examples/rwx/rwx-nginx-deployment.yaml, although I modified the container slightly to do this

      containers:
        - image: ubuntu:xenial
          imagePullPolicy: IfNotPresent
          command: [ "/bin/sh", "-c" ]
          args:
            - sleep 10; touch /data/index.html; while true; do  echo "`hostname` `date`" >> /data/index.html; sleep 1; done;

to include the hostname in the periodic writes to the shared volume.

Even with scaling up to 3 and down to 0 at 10-second intervals (far faster than the attach and detach can be accomplished) no containers crashed, and no replicas were broken. Kubernetes and Longhorn are untroubled by the fact that creating and terminating resources overlap.
In fact, I revised the time after scale up to 60 seconds, and the time after scale down to 0, so new pods were created immediately, and that just had the effect of attaching and writing from the new pods while the old ones were still detaching. So for some interval, there were 6 pods writing to the volume without trouble.
I conclude from that test that this is likely not a representative repro of the situation in this issue.

@james-munson
Contributor

I did the same with a script that left the PV intact, but deleted and recreated the service & deployment at intervals, rather than just scaling up and down. It still behaved itself.
I assume from the lack of activity from the issue filer that the symptom has been resolved or worked around, perhaps by turning off the cluster autoscaler setting.

@longhorn-io-github-bot

longhorn-io-github-bot commented Nov 27, 2023

Pre Ready-For-Testing Checklist

  • Where is the reproduce steps/test steps documented?
    We have not been able to reproduce the exact customer situation (all replicas down, volume not ready for workloads) but see above in this issue for some scripts to repeatedly apply and delete a load. The sequence to attach and reattach has been simplified.

  • Is there a workaround for the issue? If so, where is it documented?
    The workaround is at: None known.

  • Does the PR include the explanation for the fix or the feature?

  • Does the PR include deployment change (YAML/Chart)? If so, where are the PRs for both YAML file and Chart?
    The PR for the YAML change is at: n/a
    The PR for the chart change is at: n/a

  • Have the backend code been merged (Manager, Engine, Instance Manager, BackupStore etc) (including backport-needed/*)?
    The PR is at: n/a

  • Which areas/issues this PR might have potential impacts on?
    Area: RWX volume attachment
    Issues

  • If labeled: require/LEP Has the Longhorn Enhancement Proposal PR submitted?
    The LEP PR is at: n/a

  • If labeled: area/ui Has the UI issue filed or ready to be merged (including backport-needed/*)?
    The UI issue/PR is at: n/a

  • If labeled: require/doc Has the necessary document PR submitted or merged (including backport-needed/*)?
    The documentation issue/PR is at: n/a

  • If labeled: require/automation-e2e Has the end-to-end test plan been merged? Have QAs agreed on the automation test case? If only test case skeleton w/o implementation, have you created an implementation issue (including backport-needed/*)
    The automation skeleton PR is at: n/a
    The automation test case PR is at: n/a
    The issue of automation test case implementation is at (please create by the template)

  • If labeled: require/automation-engine Has the engine integration test been merged (including backport-needed/*)?
    The engine automation PR is at: n/a

  • If labeled: require/manual-test-plan Has the manual test plan been documented?
    The updated manual test plan is at: n/a

  • If the fix introduces the code for backward compatibility Has a separate issue been filed with the label release/obsolete-compatibility?
    The compatibility issue is filed at

@james-munson
Contributor

I think this is a good candidate for backport to 1.4 and 1.5. @innobead do you agree?

@roger-ryao

roger-ryao commented Dec 14, 2023

Verified on master-head 20231213

The test steps

Test Method 1: #6776 (comment)
Test Method 2: #6776 (comment)

  1. Create the deployment using the provided YAML.
deployment_rwx.yaml
apiVersion: v1
kind: Service
metadata:
  name: deployment-rwx
  labels:
    app: deployment-rwx
spec:
  ports:
    - port: 3306
  selector:
    app: deployment-rwx
  clusterIP: None
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: deployment-rwx-pvc
spec:
  accessModes:
    - ReadWriteMany    
  storageClassName: longhorn
  resources:
    requests:
      storage: 2Gi
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: deployment-rwx
  labels:
    app: deployment-rwx
spec:
  selector:
    matchLabels:
      app: deployment-rwx # has to match .spec.template.metadata.labels
  strategy:
    type: Recreate
  template:
    metadata:
      labels:
        app: deployment-rwx
    spec:
      restartPolicy: Always
        #      nodeSelector:
        #        kubernetes.io/hostname: ryao-13x-w2-60671ae2-qpfpr  # worker node      
      containers:
      - image: ubuntu
        name: deployment-rwx
        command: ["/bin/sleep", "3650d"]
        volumeMounts:
        - name: deployment-rwx-volume
          mountPath: "/data/"
        env:
        - name: MYSQL_ROOT_PASSWORD
          value: "rancher"
      volumes:
      - name: deployment-rwx-volume
        persistentVolumeClaim:
          claimName: deployment-rwx-pvc
  2. Scale up the replicas to 10.
  3. Check if all workloads are in the "Running" state.
  4. Scale down the replicas to 0.
  5. Wait for detachment.
  6. Verify that all pods are deleted.
  7. Verify that all pods are terminated.
    We can test steps 2-7 using the following shell script.
deployment_rwx_test.sh
#!/bin/bash
# This script is for GitHub issue #6776.
# It assumes that deployment_rwx.yaml has already been applied.

# Define the deployment name
DEPLOYMENT_NAME="deployment-rwx"
VOLUME_NAME=$1

if [[ $# -lt 1 ]]; then
    echo "Please provide the volume and kubeconfig path arguments."
    echo "Usage: ./test.sh <volume name> [<kubeconfig path>]"
    echo "Examples:"
    echo "  ./test.sh pvc-6a507027-3101-408d-86ad-bc7e18faa061"
    echo "  ./test.sh pvc-6a507027-3101-408d-86ad-bc7e18faa061 kubeconfig.yaml"
    exit 1
fi

if [[ $# -ne 2 ]]; then
    echo "KUBECONFIG=~/.kube/config"
    KUBECONFIG=~/.kube/config # Set the default kubeconfig path
else
    echo "KUBECONFIG=$2"
    KUBECONFIG=$2 # Use the provided kubeconfig path
fi

ATTACH_WAIT_SECONDS=120
DETACH_WAIT_SECONDS=300

for ((i=0; i<200; i++)); do
    # Scale deployment to 10 replicas
    echo "Scale deployment up to 10 replicas."
    kubectl --kubeconfig=$KUBECONFIG scale deployment $DEPLOYMENT_NAME --replicas=10

    # Wait for the deployment to have 10 ready replicas
    until [[ "$(kubectl --kubeconfig=$KUBECONFIG get deployment $DEPLOYMENT_NAME -o=jsonpath='{.status.readyReplicas}')" == "10" ]]; do
        ready_replicas=$(kubectl --kubeconfig=$KUBECONFIG get deployment $DEPLOYMENT_NAME -o=jsonpath='{.status.readyReplicas}')
        echo "Iteration #$i: $DEPLOYMENT_NAME has $ready_replicas ready replicas"
        sleep 1
    done

    # Check if all pods are in the "Running" state within a time limit
    c=0
    while [ $c -lt $ATTACH_WAIT_SECONDS ]
    do
        phase=`kubectl --kubeconfig=$KUBECONFIG get pods -l=app=$DEPLOYMENT_NAME -o=jsonpath="{.items[*].status.phase}" 2>/dev/null`
        if [ x"$phase" == x"Running Running Running Running Running Running Running Running Running Running" ]; then
            echo "All pods are ready."
            break
        fi

        sleep 1
        c=$((c+1))

        if [ $c -ge $ATTACH_WAIT_SECONDS ]; then
            echo "Timeout: Not all pods are in the 'Running' state. Elapsed time: $ATTACH_WAIT_SECONDS seconds."
            exit 1
        fi
    done

    # Scale deployment down to 0 replicas
    echo "Scale deployment down to 0 replicas."
    kubectl --kubeconfig=$KUBECONFIG scale deployment $DEPLOYMENT_NAME --replicas=0

    # Wait for the volume to be detached after scaling down to 0 replicas
    c=0
    while [ $c -lt $DETACH_WAIT_SECONDS ]
    do
        phase=`kubectl --kubeconfig=$KUBECONFIG -n longhorn-system get volumes $VOLUME_NAME -o=jsonpath="{.status.state}" 2>/dev/null`
        if [ x"$phase" == x"detached" ]; then
            echo "Successfully detached"
            break
        fi

        sleep 1
        c=$((c+1))

        if [ $c -ge $DETACH_WAIT_SECONDS ]; then
            echo "Failed to detach"
            exit 1
        fi
    done

    # Wait until all pods are terminated
    while [[ $(kubectl --kubeconfig=$KUBECONFIG get pods -l=app=$DEPLOYMENT_NAME -o=jsonpath='{.items[*].status.phase}') != "" ]]; do
        pod_status=$(kubectl --kubeconfig=$KUBECONFIG get pods -l=app=$DEPLOYMENT_NAME -o=jsonpath='{.items[*].status.phase}')
        echo "Waiting for pods of $DEPLOYMENT_NAME to be terminated. Current pod status: $pod_status"
        sleep 5
    done
    echo "All pods of $DEPLOYMENT_NAME are terminated."

done

Result: Passed

  1. I did not observe the issue with either Method 1 or Method 2.
  2. @james-munson , could you please review my test Method 2? If you have no concerns, I suggest building a private image for @ajoskowski. Perhaps the users can help verify whether your commit effectively fixes the issue.

@james-munson
Contributor

@roger-ryao, that looks good. I would be willing to build a 1.5.1-based private build (this fix is also being backported to 1.5.4) if @ajoskowski is willing to pre-test it before 1.5.4 releases.

@roger-ryao

Since we haven't received a response from the user, let's close this issue for now. If the user reports the issue again, we can reopen it.

@ajoskowski
Author

Thanks guys, we will verify this fix in 1.5.4

@derekbit
Member

Hello @ajoskowski ,
Has the fix in version 1.5.4+ resolved the issue? Looking forward to receiving your feedback.

@slotdawg

I am seeing this same behavior with blockmode RWX PVCs in Longhorn v1.6.2 in Harvester 1.3.1. When we attempt to export a volume for backup using Kasten K10, we consistently see FailedAttachVolume errors:

AttachVolume.Attach failed for volume "pvc-51e0acd0-4152-4714-8b6d-ec4e40326c5a" : rpc error: code = Aborted desc = volume pvc-51e0acd0-4152-4714-8b6d-ec4e40326c5a is not ready for workloads

@innobead
Member

I am seeing this same behavior with blockmode RWX PVCs in Longhorn v1.6.2 in Harvester 1.3.1. When we attempt to export a volume for backup using Kasten K10, we consistently see FailedAttachVolume errors:

AttachVolume.Attach failed for volume "pvc-51e0acd0-4152-4714-8b6d-ec4e40326c5a" : rpc error: code = Aborted desc = volume pvc-51e0acd0-4152-4714-8b6d-ec4e40326c5a is not ready for workloads

Harvester uses RWX migratable volumes, which are different from traditional RWX volumes. Please create an issue with the reproduction steps and provide the support bundle for the team to check further. Thanks.
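
If it helps to distinguish the two cases, the migratable mode is driven by a Longhorn StorageClass parameter; a sketch of checking it (the StorageClass name here is an assumption from a default Harvester setup, check the PVC's actual StorageClass):

# Sketch: check whether the StorageClass enables Longhorn's migratable mode (used by Harvester)
kubectl get storageclass harvester-longhorn -o jsonpath='{.parameters.migratable}{"\n"}'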

Projects
Status: Resolved
Status: Closed
Development

No branches or pull requests