
[BUG] nfs backup broken - NFS server: mkdir - file exists #4626

Closed
@xamindar

Description

Describe the bug

An NFS backup location, when configured, will work for a little while (no more than a day or so), then fail to function.

To Reproduce

Steps to reproduce the behavior:
In the longhorn UI:

  1. Go to 'Settings > General > Backup Target'
  2. Set an NFS location (my example: nfs://172.16.8.11:/mnt/user/k8s/longhorn/backup)
  3. Click 'Save' at the bottom
  4. It will work for a while; I was able to back up all my volumes yesterday
  5. Wait maybe a day, then return to the UI backup tab and see the following error:
    error listing backup volume names: failed to execute: /var/lib/longhorn/engine-binaries/longhornio-longhorn-engine-v1.3.0/longhorn [backup ls --volume-only nfs://172.16.8.11:/mnt/user/k8s/longhorn/backup], output Cannot create mount directory /var/lib/longhorn-backupstore-mounts/172_16_8_11/mnt/user/k8s/longhorn/backup for NFS server: mkdir /var/lib/longhorn-backupstore-mounts/172_16_8_11/mnt/user/k8s/longhorn/backup: file exists , stderr, time="2022-09-27T15:13:59Z" level=error msg="Cannot create mount directory /var/lib/longhorn-backupstore-mounts/172_16_8_11/mnt/user/k8s/longhorn/backup for NFS server: mkdir /var/lib/longhorn-backupstore-mounts/172_16_8_11/mnt/user/k8s/longhorn/backup: file exists" , error exit status 1
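The message points at the mount helper refusing to reuse an already-existing mount directory: once /var/lib/longhorn-backupstore-mounts/... is left behind (for example by a stale or half-torn-down mount), every later mkdir fails with "file exists" and the whole backup listing aborts. Below is a minimal Go sketch of that failure mode and of a tolerant variant; it is an illustration only, not the actual longhorn-engine/backupstore code, and the function names are hypothetical.

```go
// Hypothetical sketch of the failure mode described by the error above;
// not the actual Longhorn backupstore code.
package main

import (
	"fmt"
	"os"
)

// ensureMountDir mimics what the error message suggests the mount helper does:
// an already-existing directory is treated as fatal, so a leftover directory
// from a previous mount attempt blocks every later mount.
func ensureMountDir(path string) error {
	if err := os.Mkdir(path, 0700); err != nil {
		// "file exists" lands here and is returned as a hard failure,
		// mirroring "Cannot create mount directory ... for NFS server".
		return fmt.Errorf("cannot create mount directory %s for NFS server: %v", path, err)
	}
	return nil
}

// ensureMountDirIdempotent is the tolerant variant: an existing (possibly
// stale) directory is accepted so the caller can go on to (re)mount into it.
func ensureMountDirIdempotent(path string) error {
	// MkdirAll returns nil if the directory already exists.
	return os.MkdirAll(path, 0700)
}

func main() {
	dir := "/tmp/longhorn-backupstore-mounts-demo"
	_ = os.Mkdir(dir, 0700)                    // simulate a leftover directory
	fmt.Println(ensureMountDir(dir))           // fails with "file exists"
	fmt.Println(ensureMountDirIdempotent(dir)) // succeeds
}
```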

Expected behavior

The NFS mount should continue to work, or, in the event it goes stale (which it should NOT in my case, as this NFS server is up 24/7), Longhorn should remount it.
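If the fix is to remount rather than fail, the helper first needs a cheap way to tell whether the leftover directory is already (or still) a mount point. A common Linux heuristic is to compare the device ID of the directory with that of its parent; the hedged Go sketch below assumes that approach and is not Longhorn's actual remount logic (a stale NFS mount can also make stat hang or return ESTALE, which a real implementation would have to handle).

```go
// Hedged sketch, assuming Linux and the standard syscall package.
package main

import (
	"fmt"
	"path/filepath"
	"syscall"
)

// isMountPoint reports whether dir sits on a different device than its
// parent, the classic heuristic for "this directory is a mount point".
func isMountPoint(dir string) (bool, error) {
	var dirStat, parentStat syscall.Stat_t
	if err := syscall.Stat(dir, &dirStat); err != nil {
		return false, err
	}
	if err := syscall.Stat(filepath.Dir(dir), &parentStat); err != nil {
		return false, err
	}
	return dirStat.Dev != parentStat.Dev, nil
}

func main() {
	mounted, err := isMountPoint("/var/lib/longhorn-backupstore-mounts")
	fmt.Println(mounted, err)
}
```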

Log or Support bundle

If applicable, add the Longhorn managers' log or support bundle when the issue happens.
You can generate a Support Bundle using the link at the footer of the Longhorn UI.

Environment

  • Longhorn version: 1.3.0
  • Installation method: helm
  • Kubernetes distro (e.g. RKE/K3s/EKS/OpenShift) and version: K3s v1.24.3+k3s1
    • Number of management nodes in the cluster: 3
    • Number of worker nodes in the cluster: 2
  • Node config
    • OS type and version: Debian GNU/Linux 11 (bullseye) 5.15.61-v8+
    • CPU per node: 4 (Raspberry Pi 4)
    • Memory per node: 2-8 GB
    • Disk type (e.g. SSD/NVMe): NVMe over USB 3
    • Network bandwidth between the nodes: 1 Gbps
  • Underlying Infrastructure (e.g. on AWS/GCE, EKS/GKE, VMWare/KVM, Baremetal): Baremetal
  • Number of Longhorn volumes in the cluster: 11

Additional context

When it breaks, if I go into the settings, make any change to the NFS mount setting (for example, change the last folder from "backup" to "backup1", making sure it exists on the server), and save the settings again, the new NFS mount will work for a time before eventually breaking again. I can also go back into the settings and change it back to the original "backup" folder, and again it will work for a time before breaking again.

Note: this NFS server is up 24/7; it is an Unraid server. I am able to mount NFS with no problem on any normal machine, and it works as long as it stays mounted. Longhorn, for whatever reason, consistently breaks in the manner explained above.

Metadata

Labels

  • area/stability: System or volume stability
  • area/volume-backup-restore: Volume backup restore
  • kind/bug
  • priority/0: Must be implemented or fixed in this release (managed by PO)
  • reproduce/always: 100% reproducible
  • severity/3: Function working but has a major issue w/ workaround
