[EPIC] Fast online replica rebuilding for V2 data engine #9491

Open
shuo-wu opened this issue Sep 19, 2024 · 4 comments
Labels

  • area/spdk: SPDK upstream/downstream
  • area/v2-data-engine: v2 data engine (SPDK)
  • area/volume-replica-rebuild: Volume replica rebuilding related
  • Epic
  • highlight: Important feature/issue to highlight
  • kind/improvement: Request for improvement of existing function
  • priority/0: Must be implement or fixed in this release (managed by PO)
  • require/auto-e2e-test: Require adding/updating auto e2e test cases if they can be automated

Comments

@shuo-wu
Contributor

shuo-wu commented Sep 19, 2024

Context

In v1.7, Longhorn introduced the first version of the v2 volume online rebuilding feature. It works, but it is not as mature as the v1 implementation, especially in terms of efficiency: v2 currently supports only full rebuilding. It cannot validate and then reuse intact snapshots or valid chunks, which causes a lot of unnecessary data transfer during rebuilding.

Improvements

  1. The snapshot checksum feature not only helps identify corrupted or bit-rotted replicas but also makes it possible to reuse intact snapshots. [FEATURE] v2 volume supports snapshot checksum #8666
    • To support the fast rebuilding workflow, Longhorn and SPDK should automatically calculate and store a checksum once a snapshot is created.
  2. Longhorn can reuse the intact snapshots in failed replicas once it can verify them. [IMPROVEMENT] v2 volume supports delta replica rebuilding based on snapshot checksum #8771
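
The two improvements can be illustrated with a rough sketch of checksum-based delta rebuilding. This is not Longhorn or SPDK code: the function names (`snapshot_checksum`, `plan_rebuild`), the use of SHA-256, and the in-memory dictionaries standing in for replicas are all assumptions made for illustration. The idea is only that a snapshot on the failed replica is reused when its checksum matches the one recorded on a healthy replica, and transferred otherwise.

```python
import hashlib


def snapshot_checksum(data: bytes) -> str:
    """Hypothetical helper. SHA-256 is an assumption, not Longhorn's actual algorithm."""
    return hashlib.sha256(data).hexdigest()


def plan_rebuild(healthy_checksums, failed_snapshots):
    """Split snapshots into those reusable in place and those that must be transferred.

    healthy_checksums: dict of snapshot name -> checksum stored on a healthy replica.
    failed_snapshots:  dict of snapshot name -> raw bytes found on the failed replica.
    """
    reuse, transfer = [], []
    for name, expected in healthy_checksums.items():
        local = failed_snapshots.get(name)
        if local is not None and snapshot_checksum(local) == expected:
            reuse.append(name)      # intact: checksum matches, no data transfer needed
        else:
            transfer.append(name)   # missing or corrupted: copy from the healthy replica
    return reuse, transfer


# Illustrative data only: three snapshots on the healthy replica,
# one intact and one corrupted snapshot on the failed replica.
healthy = {
    "snap-1": snapshot_checksum(b"base data"),
    "snap-2": snapshot_checksum(b"second layer"),
    "snap-3": snapshot_checksum(b"third layer"),
}
failed = {
    "snap-1": b"base data",
    "snap-2": b"second layer (corrupted)",
}

reuse, transfer = plan_rebuild(healthy, failed)
print("reuse:", reuse)
print("transfer:", transfer)
```

In this sketch only `snap-1` is reused, while the corrupted `snap-2` and the missing `snap-3` are transferred. A full rebuild would transfer all three.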
@shuo-wu shuo-wu added highlight Important feature/issue to highlight require/auto-e2e-test Require adding/updating auto e2e test cases if they can be automated kind/improvement Request for improvement of existing function area/v2-data-engine v2 data engine (SPDK) area/volume-replica-rebuild Volume replica rebuilding related area/spdk SPDK upstream/downstream labels Sep 19, 2024
@shuo-wu shuo-wu added this to the v1.8.0 milestone Sep 19, 2024
@github-project-automation github-project-automation bot moved this to New Issues in Longhorn Sprint Sep 19, 2024
@innobead innobead changed the title [Epic] Better v2 online rebuilding in v1.8 [EPIC] Better v2 online rebuilding in v1.8 Sep 23, 2024
@innobead innobead added the priority/0 Must be implement or fixed in this release (managed by PO) label Sep 23, 2024
@innobead
Member

@shuo-wu and @DamiaSan Please don't use this EPIC for your development. Instead, you should create the corresponding issues to work on each sub-task and make them testable individually by QA.

@derekbit derekbit moved this from New Issues to Implement in Longhorn Sprint Dec 9, 2024
@innobead innobead changed the title [EPIC] Better v2 online rebuilding in v1.8 [EPIC] Fast online replica rebuilding for V2 data engine Dec 19, 2024
@derekbit
Member

@shuo-wu
Can we use #9488 instead?

@innobead innobead modified the milestones: v1.8.0, Backlog Dec 23, 2024
@innobead
Member

Let me add this EPIC to the backlog instead, so we can keep the whole view and plan its sub-tasks/features into the corresponding releases, since an EPIC sometimes takes several releases to finish. In this case, the tasks in this EPIC will be completed in 1.8 and 1.9.

Let's manage EPICs this way. cc @longhorn/dev

Projects
Status: Implement

4 participants