[BUG] migration test cases could fail due to unexpected volume controllers and replicas status #6215
Description
Describe the bug (🐛 if you encounter this issue)
In Longhorn master or v1.5.x, migration-related test cases such as test_migration_with_failed_replica, test_migration_with_unscheduled_replica, test_migration_with_restore_volume, and test_migration_with_rebuilding_replica can fail randomly because the volume's controllers and replicas are not in the expected state:
    def wait_for_volume_migration_node(client, volume_name, node_id):
        ready = False
        for i in range(RETRY_COUNTS):
            v = client.by_id_volume(volume_name)
            engines = v.controllers
            replicas = v.replicas
            if len(engines) == 1 and len(replicas) == v.numberOfReplicas:
                e = engines[0]
                if e.endpoint != "":
                    assert e.hostId == node_id
                    ready = True
                    break
            time.sleep(RETRY_INTERVAL)
>       assert ready
E       AssertionError
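The bare assert ready says nothing about what the loop last observed. As a minimal triage sketch (not the test suite's implementation), the same polling loop can carry the last observed counts into the failure message. The names wait_for_volume_migration_node_verbose and FakeClient, and the small RETRY_COUNTS / RETRY_INTERVAL values, are hypothetical and exist only to keep the example self-contained.

```python
import time
from types import SimpleNamespace

RETRY_COUNTS = 3       # assumed value, shortened for illustration
RETRY_INTERVAL = 0.01  # assumed value, in seconds


def wait_for_volume_migration_node_verbose(client, volume_name, node_id):
    """Poll like the original helper, but keep the last observed counts."""
    last_seen = None
    for _ in range(RETRY_COUNTS):
        v = client.by_id_volume(volume_name)
        engines = v.controllers
        replicas = v.replicas
        last_seen = (len(engines), len(replicas), v.numberOfReplicas)
        if len(engines) == 1 and len(replicas) == v.numberOfReplicas:
            e = engines[0]
            if e.endpoint != "":
                assert e.hostId == node_id
                return v
        time.sleep(RETRY_INTERVAL)
    # Report what was seen instead of failing with an empty AssertionError.
    raise AssertionError(
        f"{volume_name} not ready on {node_id}; last seen "
        f"(engines, replicas, numberOfReplicas) = {last_seen}"
    )


class FakeClient:
    """Hypothetical stand-in for the Longhorn API client."""

    def __init__(self, volume):
        self._volume = volume

    def by_id_volume(self, volume_name):
        return self._volume


# Counts as reported in this issue: 2 controllers, 5 replicas, 3 expected.
stuck = SimpleNamespace(
    controllers=[SimpleNamespace(endpoint="/dev/longhorn/vol", hostId="n1")] * 2,
    replicas=[SimpleNamespace()] * 5,
    numberOfReplicas=3,
)

try:
    wait_for_volume_migration_node_verbose(FakeClient(stuck), "vol", "n1")
except AssertionError as err:
    print(err)
```

With a message like this, the CI report alone would show whether the engine count, the replica count, or both were off when the retries ran out.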
The volume status is:
{
"accessMode": "rwx",
"actions": {
"activate": "http://34.228.2.191:30007/v1/volumes/longhorn-testvol-bl8klw?action=activate",
"attach": "http://34.228.2.191:30007/v1/volumes/longhorn-testvol-bl8klw?action=attach",
"cancelExpansion": "http://34.228.2.191:30007/v1/volumes/longhorn-testvol-bl8klw?action=cancelExpansion",
"detach": "http://34.228.2.191:30007/v1/volumes/longhorn-testvol-bl8klw?action=detach",
"engineUpgrade": "http://34.228.2.191:30007/v1/volumes/longhorn-testvol-bl8klw?action=engineUpgrade",
"expand": "http://34.228.2.191:30007/v1/volumes/longhorn-testvol-bl8klw?action=expand",
"pvCreate": "http://34.228.2.191:30007/v1/volumes/longhorn-testvol-bl8klw?action=pvCreate",
"pvcCreate": "http://34.228.2.191:30007/v1/volumes/longhorn-testvol-bl8klw?action=pvcCreate",
"recurringJobAdd": "http://34.228.2.191:30007/v1/volumes/longhorn-testvol-bl8klw?action=recurringJobAdd",
"recurringJobDelete": "http://34.228.2.191:30007/v1/volumes/longhorn-testvol-bl8klw?action=recurringJobDelete",
"recurringJobList": "http://34.228.2.191:30007/v1/volumes/longhorn-testvol-bl8klw?action=recurringJobList",
"replicaRemove": "http://34.228.2.191:30007/v1/volumes/longhorn-testvol-bl8klw?action=replicaRemove",
"snapshotBackup": "http://34.228.2.191:30007/v1/volumes/longhorn-testvol-bl8klw?action=snapshotBackup",
"snapshotCRCreate": "http://34.228.2.191:30007/v1/volumes/longhorn-testvol-bl8klw?action=snapshotCRCreate",
"snapshotCRDelete": "http://34.228.2.191:30007/v1/volumes/longhorn-testvol-bl8klw?action=snapshotCRDelete",
"snapshotCRGet": "http://34.228.2.191:30007/v1/volumes/longhorn-testvol-bl8klw?action=snapshotCRGet",
"snapshotCRList": "http://34.228.2.191:30007/v1/volumes/longhorn-testvol-bl8klw?action=snapshotCRList",
"snapshotCreate": "http://34.228.2.191:30007/v1/volumes/longhorn-testvol-bl8klw?action=snapshotCreate",
"snapshotDelete": "http://34.228.2.191:30007/v1/volumes/longhorn-testvol-bl8klw?action=snapshotDelete",
"snapshotGet": "http://34.228.2.191:30007/v1/volumes/longhorn-testvol-bl8klw?action=snapshotGet",
"snapshotList": "http://34.228.2.191:30007/v1/volumes/longhorn-testvol-bl8klw?action=snapshotList",
"snapshotPurge": "http://34.228.2.191:30007/v1/volumes/longhorn-testvol-bl8klw?action=snapshotPurge",
"snapshotRevert": "http://34.228.2.191:30007/v1/volumes/longhorn-testvol-bl8klw?action=snapshotRevert",
"trimFilesystem": "http://34.228.2.191:30007/v1/volumes/longhorn-testvol-bl8klw?action=trimFilesystem",
"updateBackupCompressionMethod": "http://34.228.2.191:30007/v1/volumes/longhorn-testvol-bl8klw?action=updateBackupCompressionMethod",
"updateDataLocality": "http://34.228.2.191:30007/v1/volumes/longhorn-testvol-bl8klw?action=updateDataLocality",
"updateOfflineReplicaRebuilding": "http://34.228.2.191:30007/v1/volumes/longhorn-testvol-bl8klw?action=updateOfflineReplicaRebuilding",
"updateReplicaAutoBalance": "http://34.228.2.191:30007/v1/volumes/longhorn-testvol-bl8klw?action=updateReplicaAutoBalance",
"updateReplicaCount": "http://34.228.2.191:30007/v1/volumes/longhorn-testvol-bl8klw?action=updateReplicaCount",
"updateReplicaSoftAntiAffinity": "http://34.228.2.191:30007/v1/volumes/longhorn-testvol-bl8klw?action=updateReplicaSoftAntiAffinity",
"updateReplicaZoneSoftAntiAffinity": "http://34.228.2.191:30007/v1/volumes/longhorn-testvol-bl8klw?action=updateReplicaZoneSoftAntiAffinity",
"updateSnapshotDataIntegrity": "http://34.228.2.191:30007/v1/volumes/longhorn-testvol-bl8klw?action=updateSnapshotDataIntegrity",
"updateUnmapMarkSnapChainRemoved": "http://34.228.2.191:30007/v1/volumes/longhorn-testvol-bl8klw?action=updateUnmapMarkSnapChainRemoved",
},
"backendStoreDriver": "v1",
"backingImage": "",
"backupCompressionMethod": "lz4",
"backupStatus": [ ],
"cloneStatus": {
"snapshot": "",
"sourceVolume": "",
"state": "",
},
"conditions": {
"restore": {
"lastProbeTime": "",
"lastTransitionTime": "2023-06-28T11:38:06Z",
"message": "",
"reason": "",
"status": "False",
"type": "restore",
},
"scheduled": {
"lastProbeTime": "",
"lastTransitionTime": "2023-06-28T11:38:06Z",
"message": "",
"reason": "",
"status": "True",
"type": "scheduled",
},
"toomanysnapshots": {
"lastProbeTime": "",
"lastTransitionTime": "2023-06-28T11:38:06Z",
"message": "",
"reason": "",
"status": "False",
"type": "toomanysnapshots",
},
},
"controllers": [
{
"actualSize": "4096",
"address": "10.42.2.8",
"currentImage": "longhornio/longhorn-engine:master-head",
"endpoint": "/dev/longhorn/longhorn-testvol-bl8klw",
"engineImage": "longhornio/longhorn-engine:master-head",
"hostId": "ip-10-0-1-146",
"instanceManagerName": "instance-manager-6c67701f85b0ae508d92d085b8b2c3ad",
"isExpanding": false,
"lastExpansionError": "",
"lastExpansionFailedAt": "",
"lastRestoredBackup": "",
"name": "longhorn-testvol-bl8klw-e-413a828c",
"requestedBackupRestore": "",
"running": true,
"size": "16777216",
"unmapMarkSnapChainRemovedEnabled": false,
},
{
"actualSize": "4096",
"address": "10.42.1.9",
"currentImage": "longhornio/longhorn-engine:master-head",
"endpoint": "/dev/longhorn/longhorn-testvol-bl8klw",
"engineImage": "longhornio/longhorn-engine:master-head",
"hostId": "ip-10-0-1-21",
"instanceManagerName": "instance-manager-408a4a130067c1351be9778cfa8b9ff7",
"isExpanding": false,
"lastExpansionError": "",
"lastExpansionFailedAt": "",
"lastRestoredBackup": "",
"name": "longhorn-testvol-bl8klw-e-d3c442ff",
"requestedBackupRestore": "",
"running": true,
"size": "16777216",
"unmapMarkSnapChainRemovedEnabled": false,
},
],
"created": "2023-06-28 11:38:05 +0000 UTC",
"currentImage": "longhornio/longhorn-engine:master-head",
"dataLocality": "disabled",
"dataSource": "",
"disableFrontend": false,
"diskSelector": [ ],
"encrypted": false,
"engineImage": "longhornio/longhorn-engine:master-head",
"fromBackup": "",
"frontend": "blockdev",
"id": "longhorn-testvol-bl8klw",
"kubernetesStatus": {
"lastPVCRefAt": "",
"lastPodRefAt": "",
"namespace": "",
"pvName": "",
"pvStatus": "",
"pvcName": "",
"workloadsStatus": null,
},
"lastAttachedBy": "",
"lastBackup": "",
"lastBackupAt": "",
"links": {
"self": "http://34.228.2.191:30007/v1/volumes/longhorn-testvol-bl8klw",
},
"migratable": true,
"name": "longhorn-testvol-bl8klw",
"nodeSelector": [ ],
"numberOfReplicas": 3,
"offlineReplicaRebuilding": "disabled",
"offlineReplicaRebuildingRequired": false,
"purgeStatus": [
{
"actions": null,
"error": "",
"isPurging": false,
"links": null,
"progress": 0,
"replica": "longhorn-testvol-bl8klw-r-6801502c",
"state": "",
},
{
"actions": null,
"error": "",
"isPurging": false,
"links": null,
"progress": 0,
"replica": "longhorn-testvol-bl8klw-r-35f42942",
"state": "",
},
{
"actions": null,
"error": "",
"isPurging": false,
"links": null,
"progress": 0,
"replica": "longhorn-testvol-bl8klw-r-5ca1080c",
"state": "",
},
{
"actions": null,
"error": "",
"isPurging": false,
"links": null,
"progress": 0,
"replica": "longhorn-testvol-bl8klw-r-71c1a3b0",
"state": "",
},
],
"ready": true,
"rebuildStatus": [ ],
"recurringJobSelector": null,
"replicaAutoBalance": "ignored",
"replicaSoftAntiAffinity": "ignored",
"replicaZoneSoftAntiAffinity": "ignored",
"replicas": [
{
"address": "10.42.3.9",
"backendStoreDriver": "v1",
"currentImage": "longhornio/longhorn-engine:master-head",
"dataPath": "/var/lib/longhorn/replicas/longhorn-testvol-bl8klw-c13409ff",
"diskID": "b318893d-402d-49d3-abc5-6ed557895b25",
"diskPath": "/var/lib/longhorn/",
"engineImage": "longhornio/longhorn-engine:master-head",
"failedAt": "",
"hostId": "ip-10-0-1-39",
"instanceManagerName": "instance-manager-0b165eb6a49550ac97473a12c0045a78",
"mode": "RW",
"name": "longhorn-testvol-bl8klw-r-35f42942",
"running": true,
},
{
"address": "10.42.1.9",
"backendStoreDriver": "v1",
"currentImage": "longhornio/longhorn-engine:master-head",
"dataPath": "/var/lib/longhorn/replicas/longhorn-testvol-bl8klw-4be8b763",
"diskID": "7a05bd54-7bf8-4e75-abf2-05497610b825",
"diskPath": "/var/lib/longhorn/",
"engineImage": "longhornio/longhorn-engine:master-head",
"failedAt": "",
"hostId": "ip-10-0-1-21",
"instanceManagerName": "instance-manager-408a4a130067c1351be9778cfa8b9ff7",
"mode": "",
"name": "longhorn-testvol-bl8klw-r-5ca1080c",
"running": true,
},
{
"address": "10.42.1.9",
"backendStoreDriver": "v1",
"currentImage": "longhornio/longhorn-engine:master-head",
"dataPath": "/var/lib/longhorn/replicas/longhorn-testvol-bl8klw-4be8b763",
"diskID": "7a05bd54-7bf8-4e75-abf2-05497610b825",
"diskPath": "/var/lib/longhorn/",
"engineImage": "longhornio/longhorn-engine:master-head",
"failedAt": "",
"hostId": "ip-10-0-1-21",
"instanceManagerName": "instance-manager-408a4a130067c1351be9778cfa8b9ff7",
"mode": "RW",
"name": "longhorn-testvol-bl8klw-r-6801502c",
"running": true,
},
{
"address": "10.42.3.9",
"backendStoreDriver": "v1",
"currentImage": "longhornio/longhorn-engine:master-head",
"dataPath": "/var/lib/longhorn/replicas/longhorn-testvol-bl8klw-c13409ff",
"diskID": "b318893d-402d-49d3-abc5-6ed557895b25",
"diskPath": "/var/lib/longhorn/",
"engineImage": "longhornio/longhorn-engine:master-head",
"failedAt": "",
"hostId": "ip-10-0-1-39",
"instanceManagerName": "instance-manager-0b165eb6a49550ac97473a12c0045a78",
"mode": "",
"name": "longhorn-testvol-bl8klw-r-71c1a3b0",
"running": true,
},
{
"address": "",
"backendStoreDriver": "v1",
"currentImage": "",
"dataPath": "/var/lib/longhorn/replicas/longhorn-testvol-bl8klw-1c4c6b0d",
"diskID": "d2828bc3-5989-4e00-8249-123f20ddad9d",
"diskPath": "/var/lib/longhorn/",
"engineImage": "longhornio/longhorn-engine:master-head",
"failedAt": "2023-06-28T11:38:39Z",
"hostId": "ip-10-0-1-146",
"instanceManagerName": "",
"mode": "",
"name": "longhorn-testvol-bl8klw-r-a6b7051b",
"running": false,
},
],
"restoreInitiated": false,
"restoreRequired": false,
"restoreStatus": [
{
"actions": null,
"backupURL": "",
"error": "",
"filename": "",
"isRestoring": false,
"lastRestored": "",
"links": null,
"progress": 0,
"replica": "longhorn-testvol-bl8klw-r-6801502c",
"state": "",
},
{
"actions": null,
"backupURL": "",
"error": "",
"filename": "",
"isRestoring": false,
"lastRestored": "",
"links": null,
"progress": 0,
"replica": "longhorn-testvol-bl8klw-r-35f42942",
"state": "",
},
{
"actions": null,
"backupURL": "",
"error": "",
"filename": "",
"isRestoring": false,
"lastRestored": "",
"links": null,
"progress": 0,
"replica": "longhorn-testvol-bl8klw-r-5ca1080c",
"state": "",
},
{
"actions": null,
"backupURL": "",
"error": "",
"filename": "",
"isRestoring": false,
"lastRestored": "",
"links": null,
"progress": 0,
"replica": "longhorn-testvol-bl8klw-r-71c1a3b0",
"state": "",
},
],
"restoreVolumeRecurringJob": "ignored",
"revisionCounterDisabled": false,
"robustness": "degraded",
"shareEndpoint": "",
"shareState": "",
"size": "16777216",
"snapshotDataIntegrity": "ignored",
"staleReplicaTimeout": 0,
"standby": false,
"state": "attached",
"type": "volume",
"unmapMarkSnapChainRemoved": "ignored",
"volumeAttachment": {
"attachments": {
"test-attachment-ticket-lhgbeu": {
"attachmentID": "test-attachment-ticket-lhgbeu",
"attachmentType": "csi-attacher",
"conditions": [
{
"lastProbeTime": "",
"lastTransitionTime": "2023-06-28T11:38:49Z",
"message": "The migrating attachment ticket is satisfied",
"reason": "",
"status": "True",
"type": "Satisfied",
},
],
"nodeID": "ip-10-0-1-21",
"parameters": {
"disableFrontend": "false",
"lastAttachedBy": "",
},
"satisfied": true,
},
},
"volume": "longhorn-testvol-bl8klw",
},
}
The volume has 2 controllers instead of 1 and 5 replicas instead of the 3 set by numberOfReplicas, so the readiness condition in wait_for_volume_migration_node is never met and the test case fails.
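To make the mismatch concrete, the readiness condition can be evaluated against a trimmed copy of the status above. This is a minimal sketch that assumes plain dicts in place of the Longhorn client's volume object. migration_ready and replica_summary are hypothetical helpers, not part of the test suite, and migration_ready returns False on a host mismatch where the original helper would assert.

```python
def migration_ready(volume, node_id):
    """Mirror the readiness condition from wait_for_volume_migration_node."""
    engines = volume["controllers"]
    replicas = volume["replicas"]
    if len(engines) != 1 or len(replicas) != volume["numberOfReplicas"]:
        return False
    engine = engines[0]
    return engine["endpoint"] != "" and engine["hostId"] == node_id


def replica_summary(volume):
    """Count replicas that failed, run in RW mode, or run with an empty mode."""
    counts = {"failed": 0, "rw": 0, "running_no_mode": 0}
    for replica in volume["replicas"]:
        if replica["failedAt"] != "":
            counts["failed"] += 1
        elif replica["running"] and replica["mode"] == "RW":
            counts["rw"] += 1
        elif replica["running"] and replica["mode"] == "":
            counts["running_no_mode"] += 1
    return counts


DEVICE = "/dev/longhorn/longhorn-testvol-bl8klw"

# Trimmed copy of the reported status; only the fields used above are kept.
observed = {
    "numberOfReplicas": 3,
    "controllers": [
        {"name": "longhorn-testvol-bl8klw-e-413a828c",
         "hostId": "ip-10-0-1-146", "endpoint": DEVICE},
        {"name": "longhorn-testvol-bl8klw-e-d3c442ff",
         "hostId": "ip-10-0-1-21", "endpoint": DEVICE},
    ],
    "replicas": [
        {"name": "longhorn-testvol-bl8klw-r-35f42942", "hostId": "ip-10-0-1-39",
         "running": True, "mode": "RW", "failedAt": ""},
        {"name": "longhorn-testvol-bl8klw-r-5ca1080c", "hostId": "ip-10-0-1-21",
         "running": True, "mode": "", "failedAt": ""},
        {"name": "longhorn-testvol-bl8klw-r-6801502c", "hostId": "ip-10-0-1-21",
         "running": True, "mode": "RW", "failedAt": ""},
        {"name": "longhorn-testvol-bl8klw-r-71c1a3b0", "hostId": "ip-10-0-1-39",
         "running": True, "mode": "", "failedAt": ""},
        {"name": "longhorn-testvol-bl8klw-r-a6b7051b", "hostId": "ip-10-0-1-146",
         "running": False, "mode": "", "failedAt": "2023-06-28T11:38:39Z"},
    ],
}

print(migration_ready(observed, "ip-10-0-1-21"))  # False
print(replica_summary(observed))
```

The summary shows one failed replica, two running replicas in RW mode, and two running replicas with an empty mode, which together account for the 5 entries in the replicas list.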
To Reproduce
Run the test_migration_with_* test cases.
Expected behavior
The test_migration_with_* test cases should pass consistently.
Log or Support bundle
supportbundle_e4760442-f449-47a1-9bd1-dab5a00e97c5_2023-06-28T12-14-54Z.zip
Environment
- Longhorn version: master-head or v1.5.x-head
- Installation method (e.g. Rancher Catalog App/Helm/Kubectl):
- Kubernetes distro (e.g. RKE/K3s/EKS/OpenShift) and version:
- Number of management node in the cluster:
- Number of worker node in the cluster:
- Node config
- OS type and version:
- CPU per node:
- Memory per node:
- Disk type(e.g. SSD/NVMe):
- Network bandwidth between the nodes:
- Underlying Infrastructure (e.g. on AWS/GCE, EKS/GKE, VMWare/KVM, Baremetal):
- Number of Longhorn volumes in the cluster:
Additional context
Test results:
https://ci.longhorn.io/job/public/job/v1.5.x/job/v1.5.x-longhorn-tests-sles-amd64/28/testReport/junit/tests/test_migration/test_migration_with_failed_replica/
https://ci.longhorn.io/job/public/job/v1.5.x/job/v1.5.x-longhorn-tests-sles-arm64/31/testReport/tests/test_migration/test_migration_with_rebuilding_replica/
https://ci.longhorn.io/job/public/job/master/job/sles/job/amd64/job/longhorn-tests-sles-amd64/533/testReport/tests/test_migration/test_migration_with_rebuilding_replica/
https://ci.longhorn.io/job/public/job/v1.5.x/job/v1.5.x-longhorn-tests-sles-amd64/27/testReport/tests/test_migration/test_migration_with_restore_volume_nfs_/
https://ci.longhorn.io/job/public/job/v1.5.x/job/v1.5.x-longhorn-tests-sles-amd64/24/testReport/tests/test_migration/test_migration_with_unscheduled_replica/
https://ci.longhorn.io/job/public/job/master/job/sles/job/amd64/job/longhorn-tests-sles-amd64/526/testReport/tests/test_migration/test_migration_with_rebuilding_replica/
https://ci.longhorn.io/job/public/job/master/job/sles/job/arm64/job/longhorn-tests-sles-arm64/522/testReport/tests/test_migration/test_migration_with_unscheduled_replica/
Metadata
Status
Closed