[BUG] Persistent volume is not ready for workloads #6776
Comments
Can you provide a support bundle to our e-mail (not public) longhorn-support-bundle@suse.com?
@derekbit We are not able to do it due to processes in our company. I wrote all the details in the ticket. If you need anything more, please tell me and I will try to help you :)
Questions for clarification:
Check the instance-manager logs and see why the replicas cannot be added to the engine, or whether there is any i/o timeout error.
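A hedged sketch of how such logs might be pulled (the namespace and pod label follow the usual Longhorn conventions, but verify against your install; the pod name is a placeholder):

```bash
# Assumption: Longhorn runs in the longhorn-system namespace and instance-manager
# pods carry the longhorn.io/component=instance-manager label.
kubectl -n longhorn-system get pods -l longhorn.io/component=instance-manager

# Dump logs from one instance-manager pod and look for replica/engine errors
# and i/o timeouts (pod name below is a placeholder).
kubectl -n longhorn-system logs instance-manager-xxxxxxxx \
  | grep -iE "timeout|replica|error"
```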
This issue is related to my issue #6641
Logs from the instance manager on
It looks like the replicas were removed when creating an engine. Can you check why longhorn-manager deleted them at this moment?
Longhorn manager logs from all (3) instances: all-longhorn-manager.log
Regarding the RWX volume: from the log, it looks like a race condition between the attachments and the detachments, but what's the purpose of the intensive requests?
Like I said before, we use Longhorn as storage for our CI/CD stack. We have many steps which do something with data. Some steps take several seconds, some several minutes. In the case we are investigating we have several steps which do not take much time, so a lot of attachments/detachments are expected here. The question is: is this valid for Longhorn? Should it support such cases? We have used Longhorn this way for ~2 years and we didn't observe such problems in the past.
Got it. Longhorn introduced a new Attachment/Detachment mechanism in v1.5.0. Not sure if it is related; still under investigation. Ref: #3715 cc @PhanLe1010
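For reference, a hedged way to inspect the attachment tickets managed by that mechanism (the volumeattachments.longhorn.io CRD exists in v1.5+; the volume name below is a placeholder):

```bash
# List Longhorn's own VolumeAttachment custom resources (not the core
# storage.k8s.io/v1 VolumeAttachment objects).
kubectl -n longhorn-system get volumeattachments.longhorn.io

# Show the attachment tickets recorded for a specific volume (placeholder name,
# the CR is usually named after the volume).
kubectl -n longhorn-system get volumeattachments.longhorn.io pvc-xxxxxxxx -o yaml
```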
@ajoskowski Sorry, I got confused when reading the log file. Does the
All logs from all instances of longhorn-manager, i.e. the set of logs from the 3 instances.
Yeah, some messages are mixed together. Can you help provide separate files? Thank you.
Does the CI/CD process involve restarting nodes or pods? The share manager recovery backend logs that it is removing NFS client entries on multiple occasions.
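A hedged sketch of pulling the share-manager logs referenced here (share-manager-<volume> is the usual pod naming convention, but verify in your cluster; names below are placeholders):

```bash
# RWX volumes are exported by a share-manager pod in longhorn-system,
# typically named share-manager-<volume-name>.
kubectl -n longhorn-system get pods | grep share-manager

# Look for recovery-backend messages about NFS client entries being removed.
kubectl -n longhorn-system logs share-manager-pvc-xxxxxxxx | grep -i "nfs\|recovery"
```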
@ajoskowski Could we have:
In an effort to reproduce, do you think the following steps are similar to your CI pipeline, @ajoskowski?
Each step in the pipeline is a separate pod with a shared Longhorn volume. It means that if you have 10 steps, you can think of them as 10 pods with a mounted shared Longhorn volume.
Yeah, it's similar. On our side we create new pod definitions (new steps) instead of scaling a deployment, but the logic is the same: creating and removing pods that mount/unmount a Longhorn volume.
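To make the pattern concrete, a minimal sketch of one such "step" pod mounting a shared RWX PVC (all names are hypothetical, not taken from the reporter's pipeline):

```bash
# Each CI step is a short-lived pod that mounts the same RWX PVC.
# "ci-shared-data" and "ci-step-1" are placeholder names.
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: ci-step-1
spec:
  restartPolicy: Never
  containers:
  - name: step
    image: ubuntu
    command: ["/bin/sh", "-c", "ls /data && sleep 5"]
    volumeMounts:
    - name: shared
      mountPath: /data
  volumes:
  - name: shared
    persistentVolumeClaim:
      claimName: ci-shared-data
EOF

# When the step finishes, the pod is deleted, which detaches the volume
# once no other step pod is using it.
kubectl delete pod ci-step-1
```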
Thanks @ajoskowski! We will try to reproduce the issue in the lab.
As we discussed last time, this part is problematic and may lead to unexpected and unnecessary detachment after a node is temporarily unavailable (kubelet/network down). In fact, keeping volume.Spec.NodeID the same as ShareManager.Status.OwnerID is unnecessary. The share-manager-controller workflow can be like the following:
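The proposed workflow details are not captured in this excerpt. As a side note, a hedged way to compare the two fields mentioned above (volume.Spec.NodeID and ShareManager.Status.OwnerID) on a live cluster, assuming the ShareManager CR shares the volume's name:

```bash
# Placeholder volume name.
VOL=pvc-xxxxxxxx

# Node the volume is requested to attach to.
kubectl -n longhorn-system get volumes.longhorn.io "$VOL" \
  -o jsonpath='{.spec.nodeID}{"\n"}'

# Node currently owning the share-manager for that volume.
kubectl -n longhorn-system get sharemanagers.longhorn.io "$VOL" \
  -o jsonpath='{.status.ownerID}{"\n"}'
```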
Using a similar script, I was unable to duplicate a replica being deleted. After detach, they were stopped, but that's all. I'm going to focus on trimming the workflow as described above.
Hello @james-munson, what's your environment for the reproduction?
It's a 4-node (1 control-plane, 3 worker) cluster running Ubuntu 20.04 on Kubernetes v1.25.12+rke2r1, with Longhorn 1.6-dev (current master-head).
I also tried a repro with @phan's idea of using a deployment with an RWX volume, and scaling it up and down quickly. The container spec was:

containers:
- image: ubuntu:xenial
  imagePullPolicy: IfNotPresent
  command: [ "/bin/sh", "-c" ]
  args:
  - sleep 10; touch /data/index.html; while true; do echo "`hostname` `date`" >> /data/index.html; sleep 1; done;

so as to include the hostname in the periodic writes to the shared volume. Even with scaling up to 3 and down to 0 at 10-second intervals (far faster than the attach and detach can be accomplished), no containers crashed and no replicas were broken. Kubernetes and Longhorn are untroubled by the fact that creating and terminating resources overlap.
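A rough sketch of the rapid scale-up/scale-down loop described above (the deployment name and iteration count are assumptions, not the exact script used):

```bash
# Scale an RWX-backed deployment up to 3 and back to 0 every 10 seconds,
# deliberately faster than attach/detach can complete.
DEPLOY=deployment-rwx   # placeholder name
for i in $(seq 1 50); do
  kubectl scale deployment "$DEPLOY" --replicas=3
  sleep 10
  kubectl scale deployment "$DEPLOY" --replicas=0
  sleep 10
done
```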
Did the same with a script that left the PV intact, but deleted and recreated the service & deployment at intervals, rather than just scaling up and down. It still behaved itself.
Pre Ready-For-Testing Checklist
I think this is a good candidate for backport to 1.4 and 1.5. @innobead do you agree?
Verified on master-head 20231213
The test steps: Test Method 1: #6776 (comment)
deployment_rwx.yaml
apiVersion: v1
kind: Service
metadata:
name: deployment-rwx
labels:
app: deployment-rwx
spec:
ports:
- port: 3306
selector:
app: deployment-rwx
clusterIP: None
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: deployment-rwx-pvc
spec:
accessModes:
- ReadWriteMany
storageClassName: longhorn
resources:
requests:
storage: 2Gi
---
apiVersion: apps/v1
kind: Deployment
metadata:
name: deployment-rwx
labels:
app: deployment-rwx
spec:
selector:
matchLabels:
app: deployment-rwx # has to match .spec.template.metadata.labels
strategy:
type: Recreate
template:
metadata:
labels:
app: deployment-rwx
spec:
restartPolicy: Always
# nodeSelector:
# kubernetes.io/hostname: ryao-13x-w2-60671ae2-qpfpr # worker node
containers:
- image: ubuntu
name: deployment-rwx
command: ["/bin/sleep", "3650d"]
volumeMounts:
- name: deployment-rwx-volume
mountPath: "/data/"
env:
- name: MYSQL_ROOT_PASSWORD
value: "rancher"
volumes:
- name: deployment-rwx-volume
persistentVolumeClaim:
claimName: deployment-rwx-pvc
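Assuming the manifest above is saved as deployment_rwx.yaml, it would typically be applied like this before running the test script below, and the generated Longhorn volume name can be read from the PVC:

```bash
kubectl apply -f deployment_rwx.yaml
# The bound PV name (pvc-<uid>) is the Longhorn volume name passed to the test script.
kubectl get pvc deployment-rwx-pvc -o jsonpath='{.spec.volumeName}{"\n"}'
```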
deployment_rwx_test.sh
#!/bin/bash
# This script is for GitHub issue #6776.
# It assumes that deployment_rwx.yaml has already been applied.
# Define the deployment name
DEPLOYMENT_NAME="deployment-rwx"
VOLUME_NAME=$1
if [[ $# -lt 1 ]]; then
echo "Please provide the volume and kubeconfig path arguments."
echo "Usage: ./test.sh <volume name> [<kubeconfig path>]"
echo "Examples:"
echo " ./test.sh pvc-6a507027-3101-408d-86ad-bc7e18faa061"
echo " ./test.sh pvc-6a507027-3101-408d-86ad-bc7e18faa061 kubeconfig.yaml"
exit 1
fi
if [[ $# -ne 2 ]]; then
echo "KUBECONFIG=~/.kube/config"
KUBECONFIG=~/.kube/config # Set the default kubeconfig path
else
echo "KUBECONFIG=$2"
KUBECONFIG=$2 # Use the provided kubeconfig path
fi
ATTACH_WAIT_SECONDS=120
DETACH_WAIT_SECONDS=300
for ((i=0; i<200; i++)); do
# Scale deployment to 10 replicas
echo "Scale deployment up to 10 replicas."
kubectl --kubeconfig=$KUBECONFIG scale deployment $DEPLOYMENT_NAME --replicas=10
# Wait for the deployment to have 10 ready replicas
until [[ "$(kubectl --kubeconfig=$KUBECONFIG get deployment $DEPLOYMENT_NAME -o=jsonpath='{.status.readyReplicas}')" == "10" ]]; do
ready_replicas=$(kubectl --kubeconfig=$KUBECONFIG get deployment $DEPLOYMENT_NAME -o=jsonpath='{.status.readyReplicas}')
echo "Iteration #$i: $DEPLOYMENT_NAME has $ready_replicas ready replicas"
sleep 1
done
# Check if all pods are in the "Running" state within a time limit
c=0
while [ $c -lt $ATTACH_WAIT_SECONDS ]
do
phase=`kubectl --kubeconfig=$KUBECONFIG get pods -l=app=$DEPLOYMENT_NAME -o=jsonpath="{.items[*].status.phase}" 2>/dev/null`
if [ x"$phase" == x"Running Running Running Running Running Running Running Running Running Running" ]; then
echo "All pods are ready."
break
fi
sleep 1
c=$((c+1))
if [ $c -ge $ATTACH_WAIT_SECONDS ]; then
echo "Timeout: Not all pods are in the 'Running' state. Elapsed time: $ATTACH_WAIT_SECONDS seconds."
exit 1
fi
done
# Scale deployment down to 0 replicas
echo "Scale deployment down to 0 replicas."
kubectl --kubeconfig=$KUBECONFIG scale deployment $DEPLOYMENT_NAME --replicas=0
# Wait for the volume to be detached
c=0
while [ $c -lt $DETACH_WAIT_SECONDS ]
do
state=`kubectl --kubeconfig=$KUBECONFIG -n longhorn-system get volumes $VOLUME_NAME -o=jsonpath="{.status.state}" 2>/dev/null`
if [ x"$state" == x"detached" ]; then
echo "Successfully detached"
break
fi
sleep 1
c=$((c+1))
if [ $c -ge $DETACH_WAIT_SECONDS ]; then
echo "Failed to detach"
exit 1
fi
done
# Wait until all pods are terminated
while [[ $(kubectl --kubeconfig=$KUBECONFIG get pods -l=app=$DEPLOYMENT_NAME -o=jsonpath='{.items[*].status.phase}') != "" ]]; do
pod_status=$(kubectl --kubeconfig=$KUBECONFIG get pods -l=app=$DEPLOYMENT_NAME -o=jsonpath='{.items[*].status.phase}')
echo "Waiting for pods of $DEPLOYMENT_NAME to be terminated. Current pod status: $pod_status"
sleep 5
done
echo "All pods of $DEPLOYMENT_NAME are terminated."
done
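A typical invocation, using the volume name obtained from the PVC (the example names are taken from the script's own usage message):

```bash
chmod +x deployment_rwx_test.sh
./deployment_rwx_test.sh pvc-6a507027-3101-408d-86ad-bc7e18faa061 kubeconfig.yaml
```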
Result: Passed
@roger-ryao, that looks good. I would be up for building a 1.5.1-based private build (this fix is also being backported to 1.5.4) if @ajoskowski would be up for pre-testing it before 1.5.4 releases.
Since we haven't received a response from the user, let's close this issue for now. If the user reports the issue again, we can reopen it. |
Thanks guys, we will verify this fix in
Hello @ajoskowski,
I am seeing this same behavior with block-mode RWX PVCs in Longhorn v1.6.2 in Harvester 1.3.1. When we attempt to export a volume for backup using Kasten K10, we consistently see FailedAttachVolume errors: AttachVolume.Attach failed for volume "pvc-51e0acd0-4152-4714-8b6d-ec4e40326c5a" : rpc error: code = Aborted desc = volume pvc-51e0acd0-4152-4714-8b6d-ec4e40326c5a is not ready for workloads
Harvester uses an RWX migratable volume, which is different from the traditional RWX volume. Please create an issue with the reproduction steps and provide the support bundle for the team to check it further. Thanks.
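For context, the migratable variant is configured through a Longhorn StorageClass parameter rather than the NFS-based share-manager path. A hedged sketch of such a StorageClass, with a placeholder class name (the migratable and numberOfReplicas parameters are documented Longhorn StorageClass parameters):

```bash
# migratable: "true" produces block-mode RWX volumes that support live migration,
# which is the variant Harvester relies on, unlike the NFS-backed RWX above.
kubectl apply -f - <<'EOF'
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: longhorn-migratable   # placeholder name
provisioner: driver.longhorn.io
parameters:
  numberOfReplicas: "3"
  migratable: "true"
EOF
```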
Describe the bug (🐛 if you encounter this issue)
Sometimes we encounter issues where we are not able to mount a Longhorn volume to a pod.
The pod is not able to start and the following errors are visible:
To Reproduce
The problem cannot be easily reproduced - it fails randomly.
Expected behavior
Volumes work fine and can be mounted to the pods.
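For reference, a hedged sketch of commands one might use to check the volume state when the "not ready for workloads" error appears (the pod and volume names are placeholders, not taken from this report):

```bash
POD=my-stuck-pod   # placeholder pod name
# Pod events usually show the FailedAttachVolume / "not ready for workloads" message.
kubectl describe pod "$POD" | grep -A 10 -i "events"

# Check the Longhorn volume's state and robustness (placeholder volume name).
kubectl -n longhorn-system get volumes.longhorn.io pvc-xxxxxxxx \
  -o jsonpath='{.status.state}{" "}{.status.robustness}{"\n"}'
```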
Support bundle for troubleshooting
We must not send a support bundle due to security reasons, but we can provide logs and details - see below:
ip-X-X-X-57..compute.internal
ip-X-X-X-142.compute.internal
ip-X-X-X-140.compute.internal
ci-state-pr-2463-env-doaks-prod6-uaenorth-v2wnw-override-refs-in-tf-modules-407269516
instance-manager on ip-X-X-X-142.compute.internal node:
longhorn-csi-plugin on ip-X-X-X-142.compute.internal node:
csi-attacher on ip-X-X-X-142.compute.internal node:
longhorn-manager on ip-X-X-X-142.compute.internal node:
Environment
Longhorn version: v1.5.1
Installation method: helm
Kubernetes distro and version: AWS EKS, version v1.26.6
Node config: m5.4xlarge
Volume: pvc-506e824d-414f-43ce-af59-5821b2b9accf (only example)
Additional context
Cluster autoscaler is enabled on the cluster - Kubernetes Cluster Autoscaler Enabled (Experimental) is enabled in the Longhorn configuration.