
[BUG] Longhorn manager crashed during backing image 100gb volume export #5209

Closed
c3y1huang opened this issue Jan 5, 2023 · 11 comments
Labels: area/backing-image, component/longhorn-manager, investigation-needed, kind/bug, priority/0


c3y1huang commented Jan 5, 2023

Describe the bug (🐛 if you encounter this issue)

Anyone having issues exporting 100 GB volumes? I keep getting to 28% and then the Longhorn manager pod crashes.

Node harvester-01 is ready	2.2 mins ago 
Node harvester-01 is ready	2.2 mins ago 
Node harvester-01 is ready	2.2 mins ago 
Node harvester-01 is ready	2.2 mins ago 
Node harvester-01 is down: the manager pod longhorn-manager-fdc28 is not running	2.2 mins ago 
Node harvester-01 is down: the manager pod longhorn-manager-fdc28 is not running	2.2 mins ago 
Node harvester-01 is down: the manager pod longhorn-manager-fdc28 is not running	2.2 mins ago 
In the Longhorn UI, the backing image it's trying to create gets to 28%, then dies and starts over.

To Reproduce

Steps to reproduce the behavior:

  1. Go to '...'
  2. Click on '....'
  3. Perform '....'
  4. See error

Expected behavior

  • Longhorn manager should not crash.
  • The backing image should finish exporting volume.

Log or Support bundle

manager.txt
backing.txt

Environment

  • Longhorn version: 1.3.2
  • Installation method (e.g. Rancher Catalog App/Helm/Kubectl):
  • Kubernetes distro (e.g. RKE/K3s/EKS/OpenShift) and version:
    • Number of management node in the cluster:
    • Number of worker node in the cluster:
  • Node config
    • OS type and version:
    • CPU per node:
    • Memory per node:
    • Disk type(e.g. SSD/NVMe):
    • Network bandwidth between the nodes:
  • Underlying Infrastructure (e.g. on AWS/GCE, EKS/GKE, VMWare/KVM, Baremetal):
  • Number of Longhorn volumes in the cluster:

Additional context

https://rancher-users.slack.com/archives/CC2UQM49Y/p1672880888519919

@c3y1huang c3y1huang changed the title [BUG] Longhorn manager crashed during backing image volume export [BUG] Longhorn manager crashed during backing image 100gb volume export Jan 5, 2023
@innobead innobead added this to the v1.5.0 milestone Jan 5, 2023
@innobead innobead added component/longhorn-manager Longhorn manager (control plane) priority/0 Must be implement or fixed in this release (managed by PO) area/backing-image Backing image related labels Jan 5, 2023

innobead commented Jan 5, 2023

cc @longhorn/qa for coverage


shuo-wu commented Jan 5, 2023

The error log in the backing-image-ds pod:

error: stream error: stream ID 3; INTERNAL_ERROR

This is an HTTP error, probably caused by the data-transfer connection between the backing image data source pod and the replica process.

Waiting for the user to figure out the cause or provide more info.

@albertkohl-monotek

@shuo-wu I think the error at the end is from the --follow command while streaming the logs. The actual last line of the log is:
time="2023-01-05T02:46:56Z" level=debug msg="SyncingFile: failed to get the checksum from a valid config during processing wrap-up, will directly calculated it then"

Support bundle attached.
longhorn-support-bundle_b0b85b06-bd06-434b-95e8-c47b11be903f_2023-01-05T03-17-16Z.zip


shuo-wu commented Jan 5, 2023

time="2023-01-05T02:46:56Z" level=debug msg="SyncingFile: failed to get the checksum from a valid config during processing wrap-up, will directly calculated it then"

This means the pod blindly tries to reuse the existing file if possible. Here there is no existing file, so the pod gives up on reuse and follows the regular flow. As I mentioned, this is not an error...
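
For context, here is a minimal, hypothetical Go sketch (not the actual backing-image-manager code; all names are made up) of the wrap-up behavior that debug message describes: try to reuse a checksum recorded in a config file, and only fall back to hashing the whole data file when no valid config exists, which is slow for a 100 GB export.

```go
// Hypothetical sketch of the "reuse checksum if possible, otherwise calculate
// it directly" wrap-up flow; not the actual backing-image-manager code.
package main

import (
	"crypto/sha512"
	"encoding/hex"
	"fmt"
	"io"
	"os"
)

// checksumFromConfig returns the checksum recorded in a config file next to
// the data file, or "" when no valid config exists (the case in the log).
func checksumFromConfig(configPath string) string {
	data, err := os.ReadFile(configPath)
	if err != nil || len(data) == 0 {
		return ""
	}
	return string(data)
}

// calculateChecksum hashes the whole data file; for a 100 GB file this can
// take minutes, which matters for the root cause discussed later in the issue.
func calculateChecksum(dataPath string) (string, error) {
	f, err := os.Open(dataPath)
	if err != nil {
		return "", err
	}
	defer f.Close()
	h := sha512.New()
	if _, err := io.Copy(h, f); err != nil {
		return "", err
	}
	return hex.EncodeToString(h.Sum(nil)), nil
}

func main() {
	dataPath, configPath := "/tmp/backing.img", "/tmp/backing.img.cfg"
	if sum := checksumFromConfig(configPath); sum != "" {
		fmt.Println("reused checksum from config:", sum)
		return
	}
	// No valid config: "will directly calculated it then".
	sum, err := calculateChecksum(dataPath)
	if err != nil {
		fmt.Println("checksum calculation failed:", err)
		return
	}
	fmt.Println("calculated checksum:", sum)
}
```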

@albertkohl-monotek

Last night I shut down all non-essential VMs, so there was very little load on my Harvester cluster, and tried again. Same issue. Today I put one of the nodes (harvester-01) into maintenance mode and rebooted it. When it came back up and tried to sync the degraded volumes, I started seeing more errors like this:

[screenshot of the errors attached]

additional support bundle attached.
longhorn-support-bundle_b0b85b06-bd06-434b-95e8-c47b11be903f_2023-01-06T01-05-49Z.zip

@albertkohl-monotek

Still having the same issue. I've added two additional nodes to my cluster as well, and now it fails at 24% instead of 28%. Every time, same deal.

I can't export volumes at all, it seems.

@innobead innobead added the investigation-needed Identified the issue but require further investigation for resolution (won't be stale) label Jan 9, 2023
@innobead innobead assigned ChanYiLin and unassigned shuo-wu Mar 13, 2023

ChanYiLin commented Mar 14, 2023

Maybe this is also related to the get-checksum timeout issue #5443.

Based on the logs:

  • data source pod: had already received the file and started calculating the checksum.
2023-01-05T03:12:26.104170312Z time="2023-01-05T03:12:26Z" level=info msg="open: receiving fileSize: 107374182400, setting up directIO: true"
2023-01-05T03:12:26.104408768Z time="2023-01-05T03:12:26Z" level=info msg="Ssync server opened and ready"
2023-01-05T03:17:06.811723970Z time="2023-01-05T03:17:06Z" level=info msg="Closing ssync server"
2023-01-05T03:17:06.812229399Z time="2023-01-05T03:17:06Z" level=debug msg="SyncingFile: failed to get the checksum from a valid config during processing wrap-up, will directly calculated it then"
  • replica process (instance-manager-r-1ec4dda7): had exported the snapshot to 10.52.3.170:8002 (data source pod)
2023-01-05T03:12:21.811488872Z [pvc-7ba4e01b-8be4-4a3c-9f43-1564eaac9027-r-4fbc9483] time="2023-01-05T03:12:21Z" level=info msg="Finished creating disk" disk=default-image-fs65q-27f2b820
2023-01-05T03:12:23.965555815Z [pvc-7ba4e01b-8be4-4a3c-9f43-1564eaac9027-r-4fbc9483] time="2023-01-05T03:12:23Z" level=info msg="Exporting snapshot default-image-fs65q-27f2b820 to 10.52.3.170:8002"
2023-01-05T03:12:24.091153636Z [pvc-7ba4e01b-8be4-4a3c-9f43-1564eaac9027-r-4fbc9483] time="2023-01-05T03:12:24Z" level=warning msg="Failed to open server: 10.52.3.170:8002, Retrying..."
2023-01-05T03:12:24.091358856Z [pvc-7ba4e01b-8be4-4a3c-9f43-1564eaac9027-r-4fbc9483] time="2023-01-05T03:12:24Z" level=warning msg="Failed to open server: 10.52.3.170:8002, Retrying..."
2023-01-05T03:12:25.092204999Z [pvc-7ba4e01b-8be4-4a3c-9f43-1564eaac9027-r-4fbc9483] time="2023-01-05T03:12:25Z" level=warning msg="Failed to open server: 10.52.3.170:8002, Retrying..."
2023-01-05T03:17:06.800579564Z [pvc-7ba4e01b-8be4-4a3c-9f43-1564eaac9027-r-4fbc9483] time="2023-01-05T03:17:06Z" level=info msg="Done exporting snapshot default-image-fs65q-27f2b820 to 10.52.3.170:8002"

And the final error in the data source pod seems to be an HTTP error:

error: stream error: stream ID 3; INTERNAL_ERROR

@albertkohl-monotek Can you successfully create a backing image by exporting a smaller volume (say, smaller than 5G)?

cc @shuo-wu

@innobead innobead modified the milestones: v1.5.0, v1.6.0 May 3, 2023

ChanYiLin commented Jul 6, 2023

Hi, I revisited the issue and found this is actually a duplicate of #4865.

The root cause:

  • The backing image exported from the volume was large.
  • Calculating the checksum therefore took a long time.
  • While the checksum was being calculated in finishProcessing(), the syncFile was locked.
  • So the controller monitor couldn't get the info from the data source pod, since the Get() API couldn't acquire the lock to read the syncFile info (see the sketch after this list).
  • So the controller monitor got errors:
time="2023-01-05T02:48:51Z" level=error msg="failed to get default-image-fs65q info from backing image data source server: get failed, err: Get \"http://10.52.1.21:8000/v1/file\": context deadline exceeded (Client.Timeout exceeded while awaiting headers)" backingImageDataSource=default-image-fs65q controller=longhorn-backing-image-data-source diskUUID=0e9897f1-fdf9-4f16-bfb3-0c722f52d650 node=harvester-02 nodeID=harvester-02 parameters="map[export-type:raw volume-name:pvc-7ba4e01b-8be4-4a3c-9f43-1564eaac9027]" sourceType=export-from-volume
time="2023-01-05T02:49:04Z" level=error msg="failed to get default-image-fs65q info from backing image data source server: get failed, err: Get \"http://10.52.1.21:8000/v1/file\": context deadline exceeded (Client.Timeout exceeded while awaiting headers)" backingImageDataSource=default-image-fs65q controller=longhorn-backing-image-data-source diskUUID=0e9897f1-fdf9-4f16-bfb3-0c722f52d650 node=harvester-02 nodeID=harvester-02 parameters="map[export-type:raw volume-name:pvc-7ba4e01b-8be4-4a3c-9f43-1564eaac9027]" sourceType=export-from-volume
time="2023-01-05T02:49:04Z" level=warning msg="Stop monitoring since monitor default-image-fs65q sync reaches the max retry count 10" backingImageDataSource=default-image-fs65q controller=longhorn-backing-image-data-source diskUUID=0e9897f1-fdf9-4f16-bfb3-0c722f52d650 node=harvester-02 nodeID=harvester-02 parameters="map[export-type:raw volume-name:pvc-7ba4e01b-8be4-4a3c-9f43-1564eaac9027]" sourceType=export-from-volume
time="2023-01-05T02:49:04Z" level=info msg="Stopping monitoring" backingImageDataSource=default-image-fs65q controller=longhorn-backing-image-data-source node=harvester-02
panic: close of closed channel

goroutine 117339 [running]:
github.com/longhorn/longhorn-manager/controller.(*BackingImageDataSourceController).stopMonitoring(0xc00018a140, {0xc000f44630, 0x13})
	/go/src/github.com/longhorn/longhorn-manager/controller/backing_image_data_source_controller.go:930 +0x145
github.com/longhorn/longhorn-manager/controller.(*BackingImageDataSourceController).startMonitoring.func1()
	/go/src/github.com/longhorn/longhorn-manager/controller/backing_image_data_source_controller.go:970 +0x4c
created by github.com/longhorn/longhorn-manager/controller.(*BackingImageDataSourceController).startMonitoring
	/go/src/github.com/longhorn/longhorn-manager/controller/backing_image_data_source_controller.go:968 +0x465
  • After 10 failures, it tried to stop monitoring, and because of the panic: close of closed channel, the controller crashed.
  • So another controller took over the data source and restarted the export process.
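
To make the first part of this concrete, here is a hypothetical, self-contained Go sketch (not the actual data source server code) of how a lock held during a long checksum calculation starves an HTTP info handler, producing Client.Timeout errors like the ones above.

```go
// Hypothetical sketch of the lock contention: finishProcessing holds the
// syncing file's lock while "hashing", so a /v1/file-style handler that needs
// the same lock cannot respond before the monitor's client times out.
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"sync"
	"time"
)

type syncingFile struct {
	mu       sync.Mutex
	checksum string
}

// finishProcessing stands in for hashing a 100 GB file while holding the lock.
func (f *syncingFile) finishProcessing() {
	f.mu.Lock()
	defer f.mu.Unlock()
	time.Sleep(5 * time.Second)
	f.checksum = "done"
}

// getInfo is what the info endpoint would call; it needs the same lock.
func (f *syncingFile) getInfo() string {
	f.mu.Lock()
	defer f.mu.Unlock()
	return f.checksum
}

func main() {
	f := &syncingFile{}
	go f.finishProcessing()
	time.Sleep(100 * time.Millisecond) // let finishProcessing grab the lock

	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprint(w, f.getInfo()) // blocks until finishProcessing releases the lock
	}))
	defer srv.Close()

	// The monitor's client has a deadline, so it reports
	// "context deadline exceeded (Client.Timeout exceeded while awaiting headers)".
	client := &http.Client{Timeout: 2 * time.Second}
	if resp, err := client.Get(srv.URL); err != nil {
		fmt.Println("monitor-style request error:", err)
	} else {
		resp.Body.Close()
	}
}
```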

That's why it kept restarting and the controller kept crashing in this case.
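
And a minimal sketch of the crash itself: closing a monitor's stop channel a second time panics with close of closed channel, while a guarded close (sync.Once here, just to illustrate the pattern, not necessarily the exact fix that was applied) tolerates repeated stop calls.

```go
// Minimal, self-contained sketch (not the actual longhorn-manager code) of the
// failure mode: two code paths both stop the same monitor, and closing the
// stop channel twice panics with "close of closed channel".
package main

import (
	"fmt"
	"sync"
)

type monitor struct {
	stopCh   chan struct{}
	stopOnce sync.Once
}

// stopUnguarded panics if called twice, which is the crash seen in the issue.
func (m *monitor) stopUnguarded() {
	close(m.stopCh)
}

// stopGuarded can be called any number of times safely.
func (m *monitor) stopGuarded() {
	m.stopOnce.Do(func() { close(m.stopCh) })
}

func main() {
	// Guarded: the second call is a no-op.
	m := &monitor{stopCh: make(chan struct{})}
	m.stopGuarded()
	m.stopGuarded()
	fmt.Println("guarded stop called twice without panic")

	// Unguarded: the second close panics, which would crash the whole process.
	m2 := &monitor{stopCh: make(chan struct{})}
	m2.stopUnguarded()
	defer func() {
		if r := recover(); r != nil {
			fmt.Println("recovered from:", r) // "close of closed channel"
		}
	}()
	m2.stopUnguarded()
}
```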

This issue was already fixed.

cc @innobead @shuo-wu @albertkohl-monotek @c3y1huang

@ChanYiLin

I have verified that this issue no longer happens after version v1.4.1:

time="2023-07-05T09:01:04Z" level=info msg="Stopping monitoring" backingImageDataSource=nginx controller=longhorn-backing-image-data-source node=ip-10-0-2-90
time="2023-07-05T09:01:04Z" level=info msg="Stopped monitoring" backingImageDataSource=nginx controller=longhorn-backing-image-data-source node=ip-10-0-2-90
time="2023-07-05T09:01:27Z" level=info msg="Start monitoring" backingImageDataSource=nginx controller=longhorn-backing-image-data-source diskUUID=756868ea-99d0-4026-b013-ca807ba33771 node=ip-10-0-2-90 nodeID=ip-10-0-2-90 parameters="map[export-type:raw volume-name:nginx]" sourceType=export-from-volume
time="2023-07-05T09:01:37Z" level=error msg="failed to get nginx info from backing image data source server: get failed, err: Get \"http://10.42.2.24:8000/v1/file\": context deadline exceeded (Client.Timeout exceeded while awaiting headers)" backingImageDataSource=nginx controller=longhorn-backing-image-data-source diskUUID=756868ea-99d0-4026-b013-ca807ba33771 node=ip-10-0-2-90 nodeID=ip-10-0-2-90 parameters="map[export-type:raw volume-name:nginx]" sourceType=export-from-volume
time="2023-07-05T09:01:50Z" level=error msg="failed to get nginx info from backing image data source server: get failed, err: Get \"http://10.42.2.24:8000/v1/file\": context deadline exceeded (Client.Timeout exceeded while awaiting headers)" backingImageDataSource=nginx controller=longhorn-backing-image-data-source diskUUID=756868ea-99d0-4026-b013-ca807ba33771 node=ip-10-0-2-90 nodeID=ip-10-0-2-90 parameters="map[export-type:raw volume-name:nginx]" sourceType=export-from-volume
time="2023-07-05T09:02:03Z" level=error msg="failed to get nginx info from backing image data source server: get failed, err: Get \"http://10.42.2.24:8000/v1/file\": context deadline exceeded (Client.Timeout exceeded while awaiting headers)" backingImageDataSource=nginx controller=longhorn-backing-image-data-source diskUUID=756868ea-99d0-4026-b013-ca807ba33771 node=ip-10-0-2-90 nodeID=ip-10-0-2-90 parameters="map[export-type:raw volume-name:nginx]" sourceType=export-from-volume
time="2023-07-05T09:02:16Z" level=error msg="failed to get nginx info from backing image data source server: get failed, err: Get \"http://10.42.2.24:8000/v1/file\": context deadline exceeded (Client.Timeout exceeded while awaiting headers)" backingImageDataSource=nginx controller=longhorn-backing-image-data-source diskUUID=756868ea-99d0-4026-b013-ca807ba33771 node=ip-10-0-2-90 nodeID=ip-10-0-2-90 parameters="map[export-type:raw volume-name:nginx]" sourceType=export-from-volume
time="2023-07-05T09:02:29Z" level=error msg="failed to get nginx info from backing image data source server: get failed, err: Get \"http://10.42.2.24:8000/v1/file\": context deadline exceeded (Client.Timeout exceeded while awaiting headers)" backingImageDataSource=nginx controller=longhorn-backing-image-data-source diskUUID=756868ea-99d0-4026-b013-ca807ba33771 node=ip-10-0-2-90 nodeID=ip-10-0-2-90 parameters="map[export-type:raw volume-name:nginx]" sourceType=export-from-volume
time="2023-07-05T09:02:42Z" level=error msg="failed to get nginx info from backing image data source server: get failed, err: Get \"http://10.42.2.24:8000/v1/file\": context deadline exceeded (Client.Timeout exceeded while awaiting headers)" backingImageDataSource=nginx controller=longhorn-backing-image-data-source diskUUID=756868ea-99d0-4026-b013-ca807ba33771 node=ip-10-0-2-90 nodeID=ip-10-0-2-90 parameters="map[export-type:raw volume-name:nginx]" sourceType=export-from-volume
time="2023-07-05T09:02:55Z" level=error msg="failed to get nginx info from backing image data source server: get failed, err: Get \"http://10.42.2.24:8000/v1/file\": context deadline exceeded (Client.Timeout exceeded while awaiting headers)" backingImageDataSource=nginx controller=longhorn-backing-image-data-source diskUUID=756868ea-99d0-4026-b013-ca807ba33771 node=ip-10-0-2-90 nodeID=ip-10-0-2-90 parameters="map[export-type:raw volume-name:nginx]" sourceType=export-from-volume
time="2023-07-05T09:03:08Z" level=error msg="failed to get nginx info from backing image data source server: get failed, err: Get \"http://10.42.2.24:8000/v1/file\": context deadline exceeded (Client.Timeout exceeded while awaiting headers)" backingImageDataSource=nginx controller=longhorn-backing-image-data-source diskUUID=756868ea-99d0-4026-b013-ca807ba33771 node=ip-10-0-2-90 nodeID=ip-10-0-2-90 parameters="map[export-type:raw volume-name:nginx]" sourceType=export-from-volume
time="2023-07-05T09:03:21Z" level=error msg="failed to get nginx info from backing image data source server: get failed, err: Get \"http://10.42.2.24:8000/v1/file\": context deadline exceeded (Client.Timeout exceeded while awaiting headers)" backingImageDataSource=nginx controller=longhorn-backing-image-data-source diskUUID=756868ea-99d0-4026-b013-ca807ba33771 node=ip-10-0-2-90 nodeID=ip-10-0-2-90 parameters="map[export-type:raw volume-name:nginx]" sourceType=export-from-volume
time="2023-07-05T09:03:34Z" level=error msg="failed to get nginx info from backing image data source server: get failed, err: Get \"http://10.42.2.24:8000/v1/file\": context deadline exceeded (Client.Timeout exceeded while awaiting headers)" backingImageDataSource=nginx controller=longhorn-backing-image-data-source diskUUID=756868ea-99d0-4026-b013-ca807ba33771 node=ip-10-0-2-90 nodeID=ip-10-0-2-90 parameters="map[export-type:raw volume-name:nginx]" sourceType=export-from-volume
time="2023-07-05T09:03:34Z" level=warning msg="Stop monitoring since monitor nginx sync reaches the max retry count 10" backingImageDataSource=nginx controller=longhorn-backing-image-data-source diskUUID=756868ea-99d0-4026-b013-ca807ba33771 node=ip-10-0-2-90 nodeID=ip-10-0-2-90 parameters="map[export-type:raw volume-name:nginx]" sourceType=export-from-volume
time="2023-07-05T09:03:34Z" level=info msg="Stopping monitoring" backingImageDataSource=nginx controller=longhorn-backing-image-data-source node=ip-10-0-2-90
time="2023-07-05T09:03:34Z" level=info msg="Stopped monitoring" backingImageDataSource=nginx controller=longhorn-backing-image-data-source node=ip-10-0-2-90
time="2023-07-05T09:03:57Z" level=info msg="Start monitoring" backingImageDataSource=nginx controller=longhorn-backing-image-data-source diskUUID=756868ea-99d0-4026-b013-ca807ba33771 node=ip-10-0-2-90 nodeID=ip-10-0-2-90 parameters="map[export-type:raw volume-name:nginx]" sourceType=export-from-volume
time="2023-07-05T09:04:07Z" level=error msg="failed to get nginx info from backing image data source server: resp.StatusCode(500) != http.StatusOK(200), response body content: get failed, err: Get \"http://0.0.0.0:8001/v1/files/%2Fdata%2Ftmp%2Fnginx-dd644b68\": context deadline exceeded (Client.Timeout exceeded while awaiting headers)\n" backingImageDataSource=nginx controller=longhorn-backing-image-data-source diskUUID=756868ea-99d0-4026-b013-ca807ba33771 node=ip-10-0-2-90 nodeID=ip-10-0-2-90 parameters="map[export-type:raw volume-name:nginx]" sourceType=export-from-volume
time="2023-07-05T09:04:20Z" level=error msg="failed to get nginx info from backing image data source server: get failed, err: Get \"http://10.42.2.24:8000/v1/file\": context deadline exceeded (Client.Timeout exceeded while awaiting headers)" backingImageDataSource=nginx controller=longhorn-backing-image-data-source diskUUID=756868ea-99d0-4026-b013-ca807ba33771 node=ip-10-0-2-90 nodeID=ip-10-0-2-90 parameters="map[export-type:raw volume-name:nginx]" sourceType=export-from-volume
time="2023-07-05T09:04:33Z" level=error msg="failed to get nginx info from backing image data source server: get failed, err: Get \"http://10.42.2.24:8000/v1/file\": context deadline exceeded (Client.Timeout exceeded while awaiting headers)" backingImageDataSource=nginx controller=longhorn-backing-image-data-source diskUUID=756868ea-99d0-4026-b013-ca807ba33771 node=ip-10-0-2-90 nodeID=ip-10-0-2-90 parameters="map[export-type:raw volume-name:nginx]" sourceType=export-from-volume
time="2023-07-05T09:04:46Z" level=error msg="failed to get nginx info from backing image data source server: get failed, err: Get \"http://10.42.2.24:8000/v1/file\": context deadline exceeded (Client.Timeout exceeded while awaiting headers)" backingImageDataSource=nginx controller=longhorn-backing-image-data-source diskUUID=756868ea-99d0-4026-b013-ca807ba33771 node=ip-10-0-2-90 nodeID=ip-10-0-2-90 parameters="map[export-type:raw volume-name:nginx]" sourceType=export-from-volume
time="2023-07-05T09:04:59Z" level=error msg="failed to get nginx info from backing image data source server: get failed, err: Get \"http://10.42.2.24:8000/v1/file\": context deadline exceeded (Client.Timeout exceeded while awaiting headers)" backingImageDataSource=nginx controller=longhorn-backing-image-data-source diskUUID=756868ea-99d0-4026-b013-ca807ba33771 node=ip-10-0-2-90 nodeID=ip-10-0-2-90 parameters="map[export-type:raw volume-name:nginx]" sourceType=export-from-volume
time="2023-07-05T09:05:12Z" level=error msg="failed to get nginx info from backing image data source server: get failed, err: Get \"http://10.42.2.24:8000/v1/file\": context deadline exceeded (Client.Timeout exceeded while awaiting headers)" backingImageDataSource=nginx controller=longhorn-backing-image-data-source diskUUID=756868ea-99d0-4026-b013-ca807ba33771 node=ip-10-0-2-90 nodeID=ip-10-0-2-90 parameters="map[export-type:raw volume-name:nginx]" sourceType=export-from-volume
time="2023-07-05T09:05:25Z" level=error msg="failed to get nginx info from backing image data source server: get failed, err: Get \"http://10.42.2.24:8000/v1/file\": context deadline exceeded (Client.Timeout exceeded while awaiting headers)" backingImageDataSource=nginx controller=longhorn-backing-image-data-source diskUUID=756868ea-99d0-4026-b013-ca807ba33771 node=ip-10-0-2-90 nodeID=ip-10-0-2-90 parameters="map[export-type:raw volume-name:nginx]" sourceType=export-from-volume
time="2023-07-05T09:05:38Z" level=error msg="failed to get nginx info from backing image data source server: get failed, err: Get \"http://10.42.2.24:8000/v1/file\": context deadline exceeded (Client.Timeout exceeded while awaiting headers)" backingImageDataSource=nginx controller=longhorn-backing-image-data-source diskUUID=756868ea-99d0-4026-b013-ca807ba33771 node=ip-10-0-2-90 nodeID=ip-10-0-2-90 parameters="map[export-type:raw volume-name:nginx]" sourceType=export-from-volume
time="2023-07-05T09:05:51Z" level=error msg="failed to get nginx info from backing image data source server: get failed, err: Get \"http://10.42.2.24:8000/v1/file\": context deadline exceeded (Client.Timeout exceeded while awaiting headers)" backingImageDataSource=nginx controller=longhorn-backing-image-data-source diskUUID=756868ea-99d0-4026-b013-ca807ba33771 node=ip-10-0-2-90 nodeID=ip-10-0-2-90 parameters="map[export-type:raw volume-name:nginx]" sourceType=export-from-volume
time="2023-07-05T09:06:04Z" level=error msg="failed to get nginx info from backing image data source server: get failed, err: Get \"http://10.42.2.24:8000/v1/file\": context deadline exceeded (Client.Timeout exceeded while awaiting headers)" backingImageDataSource=nginx controller=longhorn-backing-image-data-source diskUUID=756868ea-99d0-4026-b013-ca807ba33771 node=ip-10-0-2-90 nodeID=ip-10-0-2-90 parameters="map[export-type:raw volume-name:nginx]" sourceType=export-from-volume
time="2023-07-05T09:06:04Z" level=warning msg="Stop monitoring since monitor nginx sync reaches the max retry count 10" backingImageDataSource=nginx controller=longhorn-backing-image-data-source diskUUID=756868ea-99d0-4026-b013-ca807ba33771 node=ip-10-0-2-90 nodeID=ip-10-0-2-90 parameters="map[export-type:raw volume-name:nginx]" sourceType=export-from-volume
time="2023-07-05T09:06:04Z" level=info msg="Stopping monitoring" backingImageDataSource=nginx controller=longhorn-backing-image-data-source node=ip-10-0-2-90
time="2023-07-05T09:06:04Z" level=info msg="Stopped monitoring" backingImageDataSource=nginx controller=longhorn-backing-image-data-source node=ip-10-0-2-90
time="2023-07-05T09:06:27Z" level=info msg="Start monitoring" backingImageDataSource=nginx controller=longhorn-backing-image-data-source diskUUID=756868ea-99d0-4026-b013-ca807ba33771 node=ip-10-0-2-90 nodeID=ip-10-0-2-90 parameters="map[export-type:raw volume-name:nginx]" sourceType=export-from-volume
time="2023-07-05T09:06:34Z" level=debug msg="Start to fetch the data source file from the backing image data source work directory /tmp/" backingImage=nginx backingImageManager=backing-image-manager-878e-7568 controller=longhorn-backing-image-manager diskUUID=756868ea-99d0-4026-b013-ca807ba33771 node=ip-10-0-2-90 nodeID=ip-10-0-2-90

It failed many times without panicking and successfully created the backing image.


longhorn-io-github-bot commented Jul 6, 2023

Pre Ready-For-Testing Checklist

  • Where are the reproduce steps/test steps documented?
    The reproduce steps/test steps are:
  • Create a volume and write a large amount of data (>50G).
  • Create a backing image with export-from-volume.
  • Monitor the manager:
    • in v1.3.2, it fails to get the backing image info 10 times and then crashes
    • in the current version, it creates the backing image successfully

@chriscchien chriscchien self-assigned this Aug 15, 2023
@chriscchien

Verified passed on Longhorn master 3b04fa with the test steps.

Creating a backing image of export-from-volume type (source volume size greater than 50 Gi) succeeded.
