feat(lep): implement volume delta copy inside SPDK #7031
In general, LGTM. Just one question needs clarifying.
The concern with the current design is: if the node is suddenly powered off while a replica/lvol is still handling write requests, is the last successful write (returned to the caller) really flushed to disk rather than cached in SPDK/OS? This determines whether we can use the write bitmap recorded during the replica offline period for the delta rebuild. Regardless of the above concern, I raised an alternative for the special case (snapshot creation during the replica offline period) mentioned in LEP option 1. I would explain it with the following example:
In this case, the bitmap is necessary for snapshot1 rebuilding only. In other words, when the first snapshot of the offline period is created, we can stop modifying the bitmap.
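To make the "stop modifying the bitmap" idea concrete, here is a minimal sketch. It assumes a per-cluster write bitmap that is frozen when the first snapshot of the offline period is created; the class and method names (`OfflineWriteTracker`, `record_write`, `on_snapshot_created`) are hypothetical and are not SPDK or Longhorn APIs.

```python
class OfflineWriteTracker:
    """Hypothetical tracker of writes made while a replica is offline."""

    def __init__(self, num_clusters):
        self.bitmap = [False] * num_clusters
        self.frozen = False  # becomes True at the first offline-period snapshot

    def record_write(self, cluster):
        # Only writes landing before the first snapshot matter for the
        # delta rebuild of that snapshot; later writes are ignored here.
        if not self.frozen:
            self.bitmap[cluster] = True

    def on_snapshot_created(self):
        # Assumption: snapshots taken after this point are missing entirely
        # on the failed replica, so they need no bitmap.
        self.frozen = True


tracker = OfflineWriteTracker(num_clusters=4)
tracker.record_write(1)        # recorded: happens before the first snapshot
tracker.on_snapshot_created()  # first snapshot of the offline period
tracker.record_write(3)        # ignored: bitmap is frozen
```

With this shape, the bitmap describes only the clusters that differ in the first offline-period snapshot.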
Thanks @shuo-wu for these considerations.
Inside the SPDK RAID and Lvolstore layers, data is not cached. The SPDK NVMe driver provides a zero-copy data transfer path (using huge pages), so data is not cached in the NVMe layer of SPDK either.
In that case, we can assume that data has really been sent to the disk when a write operation returns to the caller. If instead we rely on a Linux block device, using the SPDK AIO bdev to provide a backing device for the Lvolstore, this is not true, both for NVMe disks and for older disk technologies. IIUC, Linux block devices provide buffered access to hardware devices. I think that, if we don't rely on the SPDK NVMe driver, before starting the rebuild process we should compute a checksum of the snapshot to be rebuilt, maybe over all the clusters that are not present in the bitmap.
Ok, so when the bitmap is retrieved by the caller it can be deleted from RAID.
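The checksum idea above can be sketched as follows. This is only an illustration under stated assumptions: snapshots are modelled as in-memory byte strings, the cluster size is tiny, SHA-256 stands in for whatever checksum would actually be chosen, and `checksum_clean_clusters` is a hypothetical helper, not an SPDK function.

```python
import hashlib


def checksum_clean_clusters(data, cluster_size, dirty):
    """Hash every cluster whose index is NOT in the dirty bitmap (a set here)."""
    h = hashlib.sha256()
    for idx in range(len(data) // cluster_size):
        if idx in dirty:
            continue  # dirty clusters will be copied by the delta rebuild anyway
        h.update(data[idx * cluster_size:(idx + 1) * cluster_size])
    return h.hexdigest()


# Toy snapshots with 4-byte clusters: only cluster 1 differs.
healthy = b"AAAA" + b"BBBB" + b"CCCC"
stale = b"AAAA" + b"XXXX" + b"CCCC"
dirty = {1}

# If the clean clusters match, the bitmap can be trusted for a delta rebuild;
# otherwise a full rebuild would be the safe fallback.
can_delta_rebuild = (
    checksum_clean_clusters(healthy, 4, dirty)
    == checksum_clean_clusters(stale, 4, dirty)
)
```

Hashing only the clusters outside the bitmap keeps the check focused on the data the delta rebuild would leave untouched.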
In today's discussion, we confirmed that spdk But there is another scenario raised: if the node/spdk_tgt that the RAID resides on crashes, how do we handle the failed replica on this node after the RAID is back? There are 2 issues here:
Besides, we discussed the bitmap update flow. The basic idea is to set the bit when a write comes in, and to unset the bit when all base bdevs are in RW mode and have finished that write. Without the bit cancellation, all bits may already be 1 even if the failed replica was only down for a few seconds, which would mean a full rebuild.
I believe we will update this soon, following the previous discussion?
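A rough sketch of the bitmap update flow described above, assuming one bit per cluster. The names (`DeltaBitmap`, `on_write_submit`, `on_write_complete`) are hypothetical, and two details are assumptions of this sketch rather than decisions from the discussion: the per-cluster in-flight counter, and the rule that a cluster written while a base bdev was not RW stays dirty until an explicit reset.

```python
class DeltaBitmap:
    """Hypothetical per-cluster dirty bitmap; not SPDK's actual structure."""

    def __init__(self, num_clusters):
        self.bits = [False] * num_clusters
        self.inflight = [0] * num_clusters     # writes submitted, not yet completed
        self.sticky = [False] * num_clusters   # written while some base bdev was not RW

    def on_write_submit(self, cluster):
        # Set the bit as soon as a write arrives.
        self.bits[cluster] = True
        self.inflight[cluster] += 1

    def on_write_complete(self, cluster, all_bdevs_rw):
        self.inflight[cluster] -= 1
        if not all_bdevs_rw:
            # Assumption: a degraded write keeps the cluster dirty.
            self.sticky[cluster] = True
        # Unset the bit only when every base bdev took the write and
        # nothing else is still in flight for this cluster.
        if self.inflight[cluster] == 0 and not self.sticky[cluster]:
            self.bits[cluster] = False

    def dirty_clusters(self):
        return [i for i, bit in enumerate(self.bits) if bit]


bm = DeltaBitmap(num_clusters=4)
bm.on_write_submit(1)
bm.on_write_complete(1, all_bdevs_rw=True)   # healthy write: bit is cleared
bm.on_write_submit(2)
bm.on_write_complete(2, all_bdevs_rw=False)  # degraded write: bit stays set
```

After a short outage, only the clusters written during the outage (plus any writes still in flight) remain set, so the rebuild stays a delta copy instead of a full one.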
This PR is stale because it has been open 45 days with no activity. Remove stale label or comment or this will be closed in 10 days.
@DamiaSan is this refined as per the recent implementation?
Yes, the general concepts don't change; I am only changing the internal implementation of the delta copy handling inside SPDK.
Just pushed a new revision with the new APIs for the delta map handling and the decision about the calculation of the snapshot checksum.
Signed-off-by: Damiano Cipriani <damiano.cipriani@suse.com>
There are a couple of options for the delta copy implementation in SPDK. I described both so that we can discuss them together and choose the best one for Longhorn.