[BUG] Unable to backup volume after NFS server IP change #5856
This is due to |
@derekbit Can we just clean up the mount point without querying it, no matter whether it's a valid or a dead mount? Have we tested this case before? |
I will make two improvements.
|
We changed the backup target before, but the targets were healthy. |
I mean, can we just clean it up directly? Since:

```go
// mnt, err := filesystem.GetMount(mountPoint)
// if err != nil {
// 	return true, errors.Wrapf(err, "failed to get mount for %v", mountPoint)
// }
// if strings.Contains(mnt.FilesystemType, Kind) {
// 	return true, nil
// }
log.Warnf("Cleaning up the mount point %v because the fstype %v is changed to %v", mountPoint, mnt.FilesystemType, Kind)
if mntErr := cleanupMount(mountPoint, mounter, log); mntErr != nil {
	return true, errors.Wrapf(mntErr, "failed to clean up mount point %v (%v) for %v protocol", mnt.FilesystemType, mountPoint, Kind)
}
return false, nil
```
|
Yes, it is my fix. |
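For illustration, a rough sketch of the unconditional cleanup being discussed, built on the k8s.io/mount-utils helpers; the function name, timeout value, and mount path are assumptions for this example, not the actual longhorn-manager/backupstore code:

```go
package main

import (
	"time"

	"github.com/pkg/errors"
	"github.com/sirupsen/logrus"
	mount "k8s.io/mount-utils"
)

// Assumed timeout value, for illustration only.
const forceCleanupMountTimeout = 30 * time.Second

// ensureMountPointCleanedUp is a hypothetical helper: it cleans the mount
// point up unconditionally, without first querying its filesystem type, so
// the filesystem-type query cannot hang on a dead NFS mount.
func ensureMountPointCleanedUp(mountPoint string, mounter mount.Interface, log logrus.FieldLogger) error {
	log.Warnf("Cleaning up the mount point %v without checking its filesystem type", mountPoint)
	// Prefer a force unmount bounded by a timeout when the mounter supports it.
	if forceUnmounter, ok := mounter.(mount.MounterForceUnmounter); ok {
		if err := mount.CleanupMountWithForce(mountPoint, forceUnmounter, false, forceCleanupMountTimeout); err != nil {
			return errors.Wrapf(err, "failed to force clean up mount point %v", mountPoint)
		}
		return nil
	}
	if err := mount.CleanupMountPoint(mountPoint, mounter, false); err != nil {
		return errors.Wrapf(err, "failed to clean up mount point %v", mountPoint)
	}
	return nil
}

func main() {
	log := logrus.StandardLogger()
	// Assumed mount path; the real path depends on the backup store setup.
	if err := ensureMountPointCleanedUp("/var/lib/longhorn-backupstore-mounts/nfs-example", mount.New(""), log); err != nil {
		log.WithError(err).Warn("Cleanup failed")
	}
}
```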
Pre Ready-For-Testing Checklist
longhorn/backupstore#129 longhorn/backupstore#131
|
I see, that makes sense. However, one question: right now we only have one backup target, so if we blindly clean up the mount point in this case without checking the filesystem type, will there be any side effects? The longhorn/backupstore@149c9aa3 fix is different from this scenario, because it cleans up mount points when the backup target is reset, so it can blindly clean up without any checks. |
Do we still need to do this on the longhorn-manager side? The original fix is for cleaning up the mount points when trying to mount the same mount point again, so fixing it in the backupstore should be good enough? |
For v1.4.x, the filesystem type check is not needed, because it only supports NFS. |
If the backup target's IP is changed, the old mount point still needs cleanup. I feel the task can be handled in longhorn-manager and should not impact the data path (I mean, I think we should check whether there is a better way and just remove the cleanup to avoid the timeout issue). WDYT? |
I still want to clarify a bit. When doing the unmount, we can just unmount it no matter what type of mount it is, right? The code below:

```go
func cleanupMount(mountDir string, mounter mount.Interface, log logrus.FieldLogger) error {
	forceUnmounter, ok := mounter.(mount.MounterForceUnmounter)
	if ok {
		log.Infof("Trying to force clean up mount point %v", mountDir)
		return mount.CleanupMountWithForce(mountDir, forceUnmounter, false, forceCleanupMountTimeout)
	}

	log.Infof("Trying to clean up mount point %v", mountDir)
	return mount.CleanupMountPoint(mountDir, forceUnmounter, false)
}
```

If yes, even though the IP of the backup store has been changed, we should still be able to force unmount it, or not?

P.S. About multiple backup stores, that will be another implementation for sure, because we can't just unmount all mount points. |
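As context for the question above: an unmount is a local kernel operation and does not need to reach the NFS server, so a dead mount can usually still be dropped with a force or lazy (detached) unmount. A minimal sketch using a lazy detach via golang.org/x/sys/unix rather than the mount-utils helpers in the snippet; the mount path is an assumption and this is not the longhorn code path:

```go
package main

import (
	"log"

	"golang.org/x/sys/unix"
)

func main() {
	// Assumed stale mount point left behind after the NFS server IP changed.
	target := "/var/lib/longhorn-backupstore-mounts/nfs-old-target"

	// MNT_DETACH performs a lazy unmount (the equivalent of `umount -l`):
	// the mount is detached from the filesystem tree immediately and cleaned
	// up later, so the call does not block on the unreachable NFS server.
	if err := unix.Unmount(target, unix.MNT_DETACH); err != nil {
		log.Printf("lazy unmount of %s failed: %v", target, err)
	}
}
```

Note that a lazy detach only removes the mount from the filesystem tree; processes already blocked inside it stay blocked until their I/O errors out.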
The mount point is mounted. However, it might be a dead mount. |
Ideally, yes. But the cleanup needs to scan all mount points, and the dead mount points encountered during cleanup will lead to the timeout. |
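To illustrate the timeout concern: a plain stat on a dead NFS mount point can block for a long time, so a scan that touches every mount point has to bound each check. A minimal sketch of such a bounded check; the helper name, timeout, and path are assumptions, not the actual longhorn code:

```go
package main

import (
	"fmt"
	"os"
	"time"
)

// isMountPointResponsive reports whether a stat on the mount point returns
// within the given timeout. On a dead NFS mount the stat can hang, in which
// case the goroutine is abandoned and the mount is treated as unresponsive.
func isMountPointResponsive(mountPoint string, timeout time.Duration) bool {
	done := make(chan error, 1)
	go func() {
		_, err := os.Stat(mountPoint)
		done <- err
	}()
	select {
	case err := <-done:
		return err == nil
	case <-time.After(timeout):
		return false
	}
}

func main() {
	// Assumed mount path for the example.
	fmt.Println(isMountPointResponsive("/var/lib/longhorn-backupstore-mounts/nfs-example", 5*time.Second))
}
```

The abandoned goroutine is only reclaimed once the hung stat eventually returns, which is part of why skipping the per-mount query entirely, as proposed above, is the simpler fix.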
To sum up: cleaning up the mount point is generic no matter what type of backup store (NFS/CIFS), @derekbit, right? |
Yes.
|
Yeah, agreed with this. However, does this mean longhorn/backupstore@149c9aa3 would have issues, since it does a blind unmount when the backup target is reset? |
No, that fix is for unsetting the backup target. |
It looks like there are different paths for cleaning up the mount points. It's good to clarify the test cases as well for QA.
Anything else? BTW, ideally the trigger should be the caller, so it's longhorn-manager. cc @longhorn/qa |
Yeah, I think the cases are covered. |
But it will solve potential performance issues, so it is still worth doing. We just need to ensure test coverage without regression. |
cc @ChanYiLin |
I discussed the action items below with @derekbit.
This issue needs to be fixed in 1.4.2. |
Verified on master-head 20230512.
Followed the test steps. Result: Passed.
|
Describe the bug (🐛 if you encounter this issue)
I tried to stop the NFS server VM instance on AWS EC2 and start it again. The instance public IP changed to a new one. After changing the backup target to the new IP, the volume couldn't be backed up. The icon turned gray, and the backup page showed an error.
To Reproduce
Steps to reproduce the behavior:
Pre-requisite:
To set up an external NFS server for the backup store, perform the following steps:
sudo zypper install nfs-kernel-server
exportfs -a
Note: To simulate network disconnection, download the network_down.sh script from the following link: https://github.com/longhorn/longhorn/files/4864127/network_down.sh.zip
The test steps
Expected behavior
The volume should be backed up, and the backup page should not show an error.
Log or Support bundle
supportbundle_81183726-5b70-4292-a359-3d41e96c9847_2023-05-04T09-29-55Z.zip
Environment
v1.4.x / master