[BUG] timestamp or checksum not matched in test_snapshot_hash_detect_corruption test case #6145
Description
Describe the bug (🐛 if you encounter this issue)
In the test cases test_snapshot_hash_detect_corruption_in_global_fast_check_mode and test_snapshot_hash_detect_corruption_in_global_enabled_mode, check_snapshot_checksums_and_change_timestamps checks that the checksum value and the ctime of the checksum file match before corrupting the snapshot:
```python
# Check checksums in snapshot resource and the calculated value
# are matched
checksum = get_checksum_from_snapshot_disk_file(data_path, s.name)
print(f'snapshot {s.name}: '
      f'checksum in resource={s.checksum}, '
      f'checksum recorded={checksum}')
assert checksum == s.checksum

# Check ctime in checksum file and from stat are matched
ctime_recorded = get_ctime_in_checksum_file(disk_path)
ctime = get_ctime_from_snapshot_disk_file(data_path, s.name)
print(f'snapshot {s.name}: '
      f'ctime recorded={ctime_recorded}, '
      f'ctime={ctime}')
assert ctime_recorded == ctime
```
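For orientation, here is a minimal sketch of what those helpers plausibly look like. The volume-snap-&lt;name&gt;.img naming, the .checksum suffix, and the checksum/change_time JSON fields are assumptions for illustration, not verified against the real longhorn-tests utilities:

```python
# Hypothetical re-implementations of the helpers used by the test.
# The "volume-snap-<name>.img" naming, the ".checksum" suffix, and the
# "checksum"/"change_time" JSON fields are assumptions, not confirmed
# against longhorn-tests.
import json
import os
import subprocess


def get_checksum_from_snapshot_disk_file(data_path, snap_name):
    # Checksum recorded by the snapshot-hash run, read from the sidecar file.
    sidecar = os.path.join(data_path, f"volume-snap-{snap_name}.img.checksum")
    with open(sidecar) as f:
        return json.load(f)["checksum"]


def get_ctime_in_checksum_file(disk_path):
    # The snapshot disk file's ctime as recorded at hashing time.
    with open(disk_path + ".checksum") as f:
        return json.load(f)["change_time"]


def get_ctime_from_snapshot_disk_file(data_path, snap_name):
    # The snapshot disk file's current ctime, queried via stat(1);
    # "%z" prints the time of last status change.
    disk_file = os.path.join(data_path, f"volume-snap-{snap_name}.img")
    return subprocess.check_output(
        ["stat", "-c", "%z", disk_file], text=True).strip()
```

Note that ctime tracks metadata changes as well as content changes, so any operation that touches the disk file's metadata between the hashing run and the test's stat call would make the two values diverge.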
But this check fails randomly. Sometimes the checksum does not match:
https://ci.longhorn.io/job/public/job/master/job/sles/job/amd64/job/longhorn-tests-sles-amd64/524/testReport/tests/test_snapshot/test_snapshot_hash_detect_corruption_in_global_fast_check_mode/
https://ci.longhorn.io/job/public/job/master/job/rhel/job/amd64/job/longhorn-tests-rhel-amd64/64/testReport/tests/test_snapshot/test_snapshot_hash_detect_corruption_in_global_fast_check_mode/
Or the ctime of the checksum file does not match:
https://ci.longhorn.io/job/public/job/master/job/rhel/job/amd64/job/longhorn-tests-rhel-amd64/59/testReport/tests/test_snapshot/test_snapshot_hash_detect_corruption_in_global_enabled_mode/
https://ci.longhorn.io/job/public/job/v1.5.x/job/v1.5.x-longhorn-tests-sles-arm64/15/testReport/tests/test_snapshot/test_snapshot_hash_detect_corruption_in_global_enabled_mode/
https://ci.longhorn.io/job/public/job/v1.5.x/job/v1.5.x-longhorn-tests-sles-amd64/6/testReport/tests/test_snapshot/test_snapshot_hash_detect_corruption_in_global_enabled_mode/
https://ci.longhorn.io/job/public/job/v1.5.x/job/v1.5.x-longhorn-tests-sles-amd64/12/testReport/tests/test_snapshot/test_snapshot_hash_detect_corruption_in_global_fast_check_mode/
https://ci.longhorn.io/job/public/job/master/job/rhel/job/amd64/job/longhorn-tests-rhel-amd64/62/testReport/tests/test_snapshot/test_snapshot_hash_detect_corruption_in_global_fast_check_mode/
This can be hard to reproduce manually because of the tedious, time-consuming test setup. Another issue, #6129, also affects this test case, so when the test fails it could be due to either the issue addressed in this ticket or the one addressed in #6129.
This issue may have been introduced after v1.5.0-rc2; at least, we did not observe it in v1.5.0-rc1.
To Reproduce
Run test case test_snapshot_hash_detect_corruption_in_global_fast_check_mode or test_snapshot_hash_detect_corruption_in_global_enabled_mode.
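As a sketch (assuming a checkout of longhorn-tests with its fixtures configured), the two cases can be selected by name with pytest's keyword filter:

```python
# Hypothetical driver: select the two flaky cases by name via pytest's
# keyword filter. Run from the directory containing test_snapshot.py.
import sys

import pytest

sys.exit(pytest.main([
    "-k",
    "test_snapshot_hash_detect_corruption_in_global_fast_check_mode "
    "or test_snapshot_hash_detect_corruption_in_global_enabled_mode",
]))
```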
Expected behavior
Both checks pass consistently: the recorded checksum matches the checksum in the snapshot resource, and the ctime recorded in the checksum file matches the ctime reported by stat.
Log or Support bundle
If applicable, add the Longhorn managers' log or support bundle when the issue happens.
You can generate a Support Bundle using the link at the footer of the Longhorn UI.
Environment
- Longhorn version:
- Installation method (e.g. Rancher Catalog App/Helm/Kubectl):
- Kubernetes distro (e.g. RKE/K3s/EKS/OpenShift) and version:
- Number of management nodes in the cluster:
- Number of worker nodes in the cluster:
- Node config
- OS type and version:
- CPU per node:
- Memory per node:
- Disk type (e.g. SSD/NVMe):
- Network bandwidth between the nodes:
- Underlying Infrastructure (e.g. on AWS/GCE, EKS/GKE, VMware/KVM, Baremetal):
- Number of Longhorn volumes in the cluster:
Additional context
Add any other context about the problem here.
Status: Closed