Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG][v1.8.0-rc1] Default backup target is periodically cleared #10043

Closed
roger-ryao opened this issue Dec 23, 2024 · 11 comments
Closed

[BUG][v1.8.0-rc1] Default backup target is periodically cleared #10043

roger-ryao opened this issue Dec 23, 2024 · 11 comments
Assignees
Labels
area/backup-store Remote backup store related kind/bug priority/0 Must be implement or fixed in this release (managed by PO) reproduce/always 100% reproducible require/qa-review-coverage Require QA to review coverage severity/1 Function broken (a critical incident with very high impact (ex: data corruption, failed upgrade)
Milestone

Comments

@roger-ryao
Copy link

Describe the bug

I've noticed that the default backup target configuration is being automatically cleared.

To Reproduce

  1. Open the Longhorn UI
  2. Navigate to Setting > Backup Target
  3. Configure the default backup target with MinIO settings
  4. Wait for a period of time
  5. Check the backup target configuration

Expected behavior

The default backup target should remain configured and not be cleared.

Support bundle for troubleshooting

supportbundle_3005eb36-292c-4bad-835e-4a7acd7d6515_2024-12-23T08-26-03Z.zip

Environment

  • Longhorn version: 1.8.0-rc1
  • Impacted volume (PV):
  • Installation method (e.g. Rancher Catalog App/Helm/Kubectl): Kubectl
  • Kubernetes distro (e.g. RKE/K3s/EKS/OpenShift) and version: v1.30.7+k3s1
    • Number of control plane nodes in the cluster:1
    • Number of worker nodes in the cluster:3
  • Node config
    • OS type and version: ubuntu24.04
    • Kernel version:
    • CPU per node:
    • Memory per node:
    • Disk type (e.g. SSD/NVMe/HDD):
    • Network bandwidth between the nodes (Gbps):
  • Underlying Infrastructure (e.g. on AWS/GCE, EKS/GKE, VMWare/KVM, Baremetal): AWS EC2
  • Number of Longhorn volumes in the cluster:

Additional context

Workaround and Mitigation

@roger-ryao roger-ryao added kind/bug severity/1 Function broken (a critical incident with very high impact (ex: data corruption, failed upgrade) reproduce/always 100% reproducible area/backup-store Remote backup store related require/qa-review-coverage Require QA to review coverage require/backport Require backport. Only used when the specific versions to backport have not been definied. labels Dec 23, 2024
@roger-ryao roger-ryao added this to the v1.8.0 milestone Dec 23, 2024
@github-project-automation github-project-automation bot moved this to New Issues in Longhorn Sprint Dec 23, 2024
@mantissahz mantissahz moved this from New Issues to Analysis and Design in Longhorn Sprint Dec 23, 2024
@mantissahz mantissahz self-assigned this Dec 23, 2024
@mantissahz
Copy link
Contributor

As discussed here longhorn/longhorn-ui#755
We would like to remove the global settings "backup-target", "backup-target-credential-secret", and "backupstore-poll-interval" for this unsynced scenario.

@derekbit
Copy link
Member

As discussed here longhorn/longhorn-ui#755 We would like to remove the global settings "backup-target", "backup-target-credential-secret", and "backupstore-poll-interval" for this unsynced scenario.

@mantissahz We won't directly remove the settings in v1.8.0, so we still need to figure out a solution.

@derekbit
Copy link
Member

derekbit commented Dec 24, 2024

After discussing with @mantissahz, some tasks need to be addressed

  • v1.7.2 behavior: check whether the spec files of the default backupTarget resource are cleaned up when user empties the settings backup-target, backup-target-credential-secret, and backupstore-poll-interval.

If the answer to the above question is true, then the behaviors of the v1.8.0 and older versions are consistent. Then, what we can do for the v1.8.0:

  • Installation or Upgrade

    • We should avoid directly removing the three default settings. Instead, they should be deprecated in v1.8.0 and removed in a later version. Before announcing the deprecation, we need to confirm whether it is feasible to include a default backupTarget resource YAML in the Helm chart and manifest. This will allow users to configure the default backupTarget using the new method.
    • If the answer to the above question is true, add a deprecation announcement in the official document. Currently, we will replace these three settings with a default backupTarget resource YAML in the future.
  • Runtime

    • If users attempt to modify the default backupTarget resource through the UI, the operation should be rejected. Instead, the UI should hint users to update the settings via the Setting page.
    • We should also explicitly state that the default backupTarget resource should not be directly updated by editing the resource. Users must update the three settings as v1.7.x or earlier versions. Additionally, these three settings will be removed starting from v1.9.0.

cc @c3y1huang @innobead

@derekbit
Copy link
Member

@c3y1huang @innobead Do you agree with the proposal?

@mantissahz
Copy link
Contributor

v1.7.2 behavior: check whether the spec files of the default backupTarget resource are cleaned up when user empties the settings backup-target, backup-target-credential-secret, and backupstore-poll-interval.

It is confirmed at v1.7.2, the fields spec.backupTargetURL, spec.credentialSecret, and spec.pollInterval of the BackupTarget CR default will be updated to the global settings backup-target, backup-target-credential, and backupstore-poll-inteval every hour (settingControllerResyncPeriod = time.Hour).

@mantissahz
Copy link
Contributor

mantissahz commented Dec 25, 2024

Summary update:

  • Our original plan:
    Remove the global settings backup-target, backup-target-credential-secret, and backupstore-poll-interval from longhorn-manager and longhorn chart.
  • What we are encountering? Why?
    After removing the settings, we need to have a way to create the default backup target, therefore, I add a new YAML file in the chart/templates directory as
apiVersion: longhorn.io/v1beta2
kind: BackupTarget
metadata:
  labels: {{- include "longhorn.labels" . | nindent 4 }}
  namespace: {{ include "release_namespace" . }}
  name: default
spec:
  {{- if .Values.defaultSettings.backupTarget }}
  backupTargetURL: {{ .Values.defaultSettings.backupTarget }}
  {{- end }}
  {{- if .Values.defaultSettings.defaultBackupTarbackupTargetCredentialSecretetSecret }}
  credentialSecret: {{ .Values.defaultSettings.backupTargetCredentialSecret }}
  {{- end }}
  pollInterval: {{ .Values.defaultSettings.backupstorePollInterval | default "5m0s" }}

It succeeded in deploying the manifest YAML file but failed when installing with Helm because the CRD needs to be installed first.

  • Potential solution
    1. Move creating the default backup target to the post-install hook.
    2. Create the default backup target when longhorn-manager daemon initializes.

@mantissahz mantissahz moved this from Analysis and Design to Implement in Longhorn Sprint Dec 25, 2024
@mantissahz mantissahz moved this from Implement to Analysis and Design in Longhorn Sprint Dec 25, 2024
@mantissahz mantissahz moved this from Analysis and Design to Implement in Longhorn Sprint Dec 26, 2024
@derekbit derekbit added the priority/0 Must be implement or fixed in this release (managed by PO) label Dec 26, 2024
@mantissahz mantissahz moved this from Implement to Review in Longhorn Sprint Dec 27, 2024
@longhorn-io-github-bot
Copy link

longhorn-io-github-bot commented Dec 27, 2024

Pre Ready-For-Testing Checklist

  • Where is the reproduce steps/test steps documented?
    The reproduce steps/test steps are at:
  1. Fresh install or upgrade.
  2. Settings backup-target, backup-target-credential-secret, and backupstore-poll-interval are removed from global settings.
  3. The default backup target should exist.
  4. Users can still control these three settings defaultSettings.backupTarget, defaultSettings.backupTargetCredentialSecret, and defaultSettings.backupstorePollInterval in the file values.yaml to modify the default backup target's URL, secret and poll interval when installing Longhorn with Helm.
  • Is there a workaround for the issue? If so, where is it documented?
    The workaround is at:

  • Does the PR include the explanation for the fix or the feature?

  • Does the PR include deployment change (YAML/Chart)? If so, where are the PRs for both YAML file and Chart?
    The PR for the YAML change is at:
    The PR for the chart change is at:

  • Have the backend code been merged (Manager, Engine, Instance Manager, BackupStore etc) (including backport-needed/*)?
    The PR is at
    feat: remove default backup target settings longhorn-manager#3414

  • Which areas/issues this PR might have potential impacts on?
    Area
    Issues

  • If labeled: require/LEP Has the Longhorn Enhancement Proposal PR submitted?
    The LEP PR is at

  • If labeled: area/ui Has the UI issue filed or ready to be merged (including backport-needed/*)?
    The UI issue/PR is at

  • If labeled: require/doc Has the necessary document PR submitted or merged (including backport-needed/*)?
    The documentation issue/PR is at

  • If labeled: require/automation-e2e Has the end-to-end test plan been merged? Have QAs agreed on the automation test case? If only test case skeleton w/o implementation, have you created an implementation issue (including backport-needed/*)
    The automation skeleton PR is at
    The automation test case PR is at
    The issue of automation test case implementation is at (please create by the template)
    fix: pytest test cases for removing global backup settings. longhorn-tests#2229
    fix(e2e): use backup target API if settings are removed longhorn-tests#2230

  • If labeled: require/automation-engine Has the engine integration test been merged (including backport-needed/*)?
    The engine automation PR is at

  • If labeled: require/manual-test-plan Has the manual test plan been documented?
    The updated manual test plan is at

  • If the fix introduces the code for backward compatibility Has a separate issue been filed with the label release/obsolete-compatibility?
    The compatibility issue is filed at

@derekbit derekbit added require/doc Require updating the longhorn.io documentation and removed require/backport Require backport. Only used when the specific versions to backport have not been definied. labels Dec 27, 2024
@derekbit
Copy link
Member

doc needs to be updated (longhorn/longhorn-manager#3414 (comment))

@roger-ryao
Copy link
Author

Verified on master-head 20241231

Verified on v1.8.0-rc2 20241231

The test steps
#10043 (comment)

Result

    • Manual testing was successfully completed.

doc needs to be updated (longhorn/longhorn-manager#3414 (comment))

    • require/doc: There is currently no related documentation or PR addressing it.

@roger-ryao roger-ryao moved this from Testing to Implement in Longhorn Sprint Dec 31, 2024
@derekbit derekbit added the require/important-note Upgrade, Deprecation, Important notes label Jan 1, 2025
@derekbit
Copy link
Member

derekbit commented Jan 1, 2025

In addition to the doc, add an important note as well.

@derekbit
Copy link
Member

derekbit commented Jan 3, 2025

@mantissahz The PR is merged. Please update the status.

For the doc, it should be handled in #5411

@mantissahz mantissahz moved this from Implement to Ready For Testing in Longhorn Sprint Jan 3, 2025
@mantissahz mantissahz removed the require/important-note Upgrade, Deprecation, Important notes label Jan 3, 2025
@derekbit derekbit removed the require/doc Require updating the longhorn.io documentation label Jan 3, 2025
@github-project-automation github-project-automation bot moved this from Ready For Testing to Closed in Longhorn Sprint Jan 3, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/backup-store Remote backup store related kind/bug priority/0 Must be implement or fixed in this release (managed by PO) reproduce/always 100% reproducible require/qa-review-coverage Require QA to review coverage severity/1 Function broken (a critical incident with very high impact (ex: data corruption, failed upgrade)
Projects
Status: Closed
Development

No branches or pull requests

4 participants