[FEATURE] Support periodic or on-demand full backups to enhance backup reliability #7070

Closed
derekbit opened this issue Nov 9, 2023 · 8 comments
Assignees
Labels
area/backup-store Remote backup store related area/resilience System or volume resilience area/volume-data-protection Volume data protection related highlight Important feature/issue to highlight kind/feature Feature request, new feature priority/0 Must be implement or fixed in this release (managed by PO) require/auto-e2e-test Require adding/updating auto e2e test cases if they can be automated require/doc Require updating the longhorn.io documentation require/important-note Upgrade, Deprecation, Important notes require/lep Require adding/updating enhancement proposal
Milestone

Comments

@derekbit
Member

derekbit commented Nov 9, 2023

Is your feature request related to a problem? Please describe (👍 if you like this request)

In the existing Longhorn backup system, the initial backup is a full backup, while subsequent backups are incremental. If any block becomes corrupted, all backup revisions relying on that block are corrupted as well. One way to address this is to perform a full backup after every N incremental backups. This decreases the likelihood of backup corruption and improves the overall reliability of the backup process.
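
To make the failure mode concrete, here is a toy model of a deduplicated block store. It is not Longhorn's code: the `BlockStore` class, its methods, and the use of SHA-256 as the block key are all assumptions made for illustration. It sketches how one corrupted block breaks every backup that references it, and how a full backup that re-uploads the block makes those backups restorable again in this model.

```python
import hashlib


class BlockStore:
    """Toy deduplicated backup store: blocks are keyed by their checksum."""

    def __init__(self):
        self.blocks = {}   # checksum -> block bytes
        self.backups = {}  # backup name -> ordered list of checksums

    def backup(self, name, data_blocks, full=False):
        refs = []
        for block in data_blocks:
            key = hashlib.sha256(block).hexdigest()
            # Incremental: upload only blocks the store does not have yet.
            # Full: upload every block again, overwriting what is stored.
            if full or key not in self.blocks:
                self.blocks[key] = block
            refs.append(key)
        self.backups[name] = refs

    def restore(self, name):
        out = []
        for key in self.backups[name]:
            block = self.blocks[key]
            # A stored block whose content no longer matches its key is corrupted.
            if hashlib.sha256(block).hexdigest() != key:
                raise ValueError(f"block {key[:8]} is corrupted")
            out.append(block)
        return b"".join(out)


def can_restore(store, name):
    try:
        store.restore(name)
        return True
    except ValueError:
        return False


store = BlockStore()
volume = [b"A" * 16, b"B" * 16]
store.backup("backup-1", volume)  # initial backup uploads both blocks
store.backup("backup-2", volume)  # incremental, uploads nothing new

# Simulate silent corruption of one block on the backup server.
victim = store.backups["backup-1"][0]
store.blocks[victim] = b"garbage"
print(can_restore(store, "backup-1"), can_restore(store, "backup-2"))  # False False

store.backup("backup-3", volume, full=True)  # full backup rewrites the block
print(can_restore(store, "backup-1"), can_restore(store, "backup-3"))  # True True
```

Because both incremental backups point at the same stored block, neither can be restored after the corruption, which is the risk a periodic full backup is meant to bound.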

Current implementation
image

A possible solution
image

Ref: https://www.architecting.it/blog/incrementals-forever-or-synthetic-fulls/

Describe the solution you'd like

Describe alternatives you've considered

Additional context

@derekbit derekbit added kind/feature Feature request, new feature require/auto-e2e-test Require adding/updating auto e2e test cases if they can be automated require/doc Require updating the longhorn.io documentation require/lep Require adding/updating enhancement proposal area/resilience System or volume resilience area/backup-store Remote backup store related labels Nov 9, 2023
@innobead innobead added this to the v1.7.0 milestone Nov 9, 2023
@derekbit
Member Author

I would like to highlight this feature, since it can improve resilience against silent corruption on the backup server.
cc @innobead

@longhorn-io-github-bot

longhorn-io-github-bot commented Mar 20, 2024

Pre Ready-For-Testing Checklist

Test

  • Create a 10 MB Volume and write 4 MB of data
  • Create an NFS backupstore

Test 1: Full Backup

  • Take a Backup from the UI
  • Go into the NFS backupstore and replace the content of one block with random data
  • Restoring the Volume should fail.
  • Create a full Backup:
    • Open http://${LONGHORN_UI_ENDPOINT}/v1/volumes/${VOLUME}
    • Click snapshotBackup
      • name: the previous snapshot name
      • parameters: {"backup-mode": "full"}
  • Restoring the Volume should succeed.
  • The first backup
    • .Status.NewlyUploadedDataSize: "4194304" (not exactly, because it is compressed)
    • .Status.ReUploadedDataSize: "0"
  • The second backup
    • .Status.NewlyUploadedDataSize: "0"
    • .Status.ReUploadedDataSize: "4194304" (not exactly, because it is compressed)
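
The size expectations above can be sketched with a small accounting model. This is only an illustration under stated assumptions: `take_backup`, the dict-based store, the 2 MiB `BLOCK_SIZE`, and the SHA-256 keys are hypothetical, and compression is ignored, which is why the real values are "not exactly" 4194304.

```python
import hashlib

BLOCK_SIZE = 2 * 1024 * 1024  # assumed block size for this sketch


def take_backup(store, data, mode="incremental"):
    """Return (newly_uploaded, re_uploaded) byte counts for one backup.

    `store` is a dict mapping block checksum -> block bytes.
    """
    newly_uploaded = 0
    re_uploaded = 0
    for offset in range(0, len(data), BLOCK_SIZE):
        block = data[offset:offset + BLOCK_SIZE]
        key = hashlib.sha256(block).hexdigest()
        if key not in store:
            # Block is unknown to the store: always uploaded, counted as new.
            store[key] = block
            newly_uploaded += len(block)
        elif mode == "full":
            # Block already exists: only a full backup uploads it again.
            store[key] = block
            re_uploaded += len(block)
    return newly_uploaded, re_uploaded


store = {}
data = b"\x01" * BLOCK_SIZE + b"\x02" * BLOCK_SIZE  # 4 MiB, two distinct blocks
print(take_backup(store, data))          # (4194304, 0)
print(take_backup(store, data, "full"))  # (0, 4194304)
```

In this model the first backup uploads everything as new data regardless of mode, and a later full backup reports the same bytes as re-uploaded.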

Test 2: Recurring Full Backup - Always incremental

  • Clean up all the Backup/BackupVolume resources
  • Prepare RecurringJob YAML (every 1 min)
    apiVersion: longhorn.io/v1beta1
    kind: RecurringJob
    metadata:
      name: backup-job
      namespace: longhorn-system
    spec:
      cron: "* * * * ?"
      task: "backup"
      groups:
      - default
      retain: 100
      concurrency: 1
      parameters:
        full-backup-interval: "2"
    
  • Change the interval to 0, full-backup-interval: "0"
  • Create the RecurringJob
  • Wait for 2 Backups to be created
  • k describe lhb -n longhorn-system | grep -A 2 "Backup Mode"
    • Both should be "incremental"
  • k describe lhb -n longhorn-system | grep -A 5 "Newly"
    • The first one should be "4194304" (not exactly)
    • The second one should be "0"
  • k describe lhb -n longhorn-system | grep -A 5 "Re Uploaded Data Size"
    • The first one should be "0"
    • The second one should be "0"
  • Clean up all the Backup/BackupVolume resources

Test 3: Recurring Full Backup - Always Full

  • Change the interval to 1, full-backup-interval: "1"
  • Create the RecurringJob
  • Wait for 2 Backups to be created
  • k describe lhb -n longhorn-system | grep -A 2 "Backup Mode"
    • Both should be "full"
  • k describe lhb -n longhorn-system | grep -A 5 "Newly"
    • The first one should be "4194304" (not exactly)
    • The second one should be "0"
  • k describe lhb -n longhorn-system | grep -A 5 "Re Uploaded Data Size"
    • The first one should be "0"
    • The second one should be "4194304" (not exactly)
  • Clean up all the Backup/BackupVolume resources

Test 4: Recurring Full Backup - Every N times

  • Change the interval to 2, full-backup-interval: "2"
  • Create the RecurringJob
  • Wait for 4 Backups to be created
  • k describe lhb -n longhorn-system | grep -A 2 "Backup Mode"
    • The 1st and 3rd should be incremental
    • The 2nd and 4th should be full
  • k describe lhb -n longhorn-system | grep -A 5 "Newly"
    • The 1st should be "4194304" (not exactly)
    • The 2nd, 3rd and 4th should be "0"
  • k describe lhb -n longhorn-system | grep -A 5 "Re Uploaded Data Size"
    • The 1st and 3rd should be "0"
    • The 2nd and 4th should be "4194304" (not exactly, because it is compressed)

@derekbit
Member Author

derekbit commented Mar 21, 2024

The current implementation adds a backup-mode label to a recurring job. To control the frequency of full backups, users must create two separate recurring jobs, one for incremental backups and one for full backups. This complicates management.

However, following discussions with @ChanYiLin and @c3y1huang, we can record the frequency, period, or count within the recurring job or backup volume. This would simplify the configuration, requiring only a single recurring job.
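
A minimal sketch of the count-based idea, assuming a per-volume counter of backups taken since the last full backup. The `next_backup_mode` and `simulate` helpers and the `state` dict are hypothetical and are not taken from the Longhorn code base; the rule was chosen only because it reproduces the modes expected in Tests 2 to 4 of the checklist above.

```python
def next_backup_mode(state, full_backup_interval):
    """Decide the mode of the next recurring backup and update `state`.

    `state` is a dict holding a hypothetical counter of backups taken
    since the last full backup. An interval of 0 disables full backups.
    """
    state["count"] = state.get("count", 0) + 1
    if full_backup_interval > 0 and state["count"] >= full_backup_interval:
        state["count"] = 0  # start counting again after a full backup
        return "full"
    return "incremental"


def simulate(full_backup_interval, runs):
    """Return the sequence of modes chosen over `runs` recurring backups."""
    state = {}
    return [next_backup_mode(state, full_backup_interval) for _ in range(runs)]


print(simulate(0, 2))  # ['incremental', 'incremental']
print(simulate(1, 2))  # ['full', 'full']
print(simulate(2, 4))  # ['incremental', 'full', 'incremental', 'full']
```

With the counter stored next to the volume, a single recurring job with a `full-backup-interval` parameter is enough to drive both backup modes.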

@innobead
Member

innobead commented Jul 1, 2024

Added data-protection label.

@chriscchien
Contributor

Verified as passing on Longhorn master (longhorn e5a1b5) using the test steps.

Waiting for the doc and UI PRs to be merged before closing this ticket.

@ChanYiLin
Contributor

Hi @chriscchien,
both the doc and UI PRs are merged.
Thanks

@chriscchien
Contributor

Closing this ticket: it has already been verified here, the doc is merged, and the UI feature is verified.

@derekbit
Member Author

Updated in longhorn/website#951
