OCPBUGS-45927: Stop removing the finalizers from BMH #7253

Open
wants to merge 1 commit into base: master

MahnoorAsghar

This code was added to handle OCPBUGS-7581 and was meant to be removed once that bug was resolved. The fix merged long ago and was backported as far back as OCP 4.12, so this code can now be removed.
Removing the BMH finalizer causes the BMH to be deleted immediately, which leaves BMO unable to do any necessary cleanup. In OCPBUGS-45927, BMO encounters an error while attempting to update the status of an already deleted BMH:
2024-11-13T07:28:35.215792213Z {"level":"info","ts":1731482915.2157853,"logger":"controllers.BareMetalHost","msg":"saving host status","baremetalhost":{"name":"gateway-1.workload.fc18.lab","namespace":"openshift-machine-api"},"provisioningState":"unmanaged","operational status":"discovered","provisioning state":"deleting"}
2024-11-13T07:28:35.220351574Z {"level":"error","ts":1731482915.2201939,"msg":"Reconciler error","controller":"baremetalhost","controllerGroup":"metal3.io","controllerKind":"BareMetalHost","BareMetalHost":{"name":"gateway-1.workload.fc18.lab","namespace":"openshift-machine-api"},"namespace":"openshift-machine-api","name":"gateway-1.workload.fc18.lab","reconcileID":"dbda2c02-2b4d-4a5e-ac98-531d4fc46263",
"error":"failed to save host status after "unmanaged": Operation cannot be fulfilled on baremetalhosts.metal3.io "gateway-1.workload.fc18.lab": StorageError: invalid object, Code: 4, Key: /kubernetes.io/metal3.io/baremetalhosts/openshift-machine-api/gateway-1.workload.fc18.lab, ResourceVersion: 0, AdditionalErrorMsg: Precondition failed: UID in precondition: c207fd48-3dc4-455a-ae12-b4e44fda2285, UID in object meta: ","errorVerbose":"Operation cannot be fulfilled on baremetalhosts.metal3.io "gateway-1.workload.fc18.lab": StorageError: invalid object, Code: 4, Key: /kubernetes.io/metal3.io/baremetalhosts/openshift-machine-api/gateway-1.workload.fc18.lab, ResourceVersion: 0, AdditionalErrorMsg: Precondition failed: UID in precondition: c207fd48-3dc4-455a-ae12-b4e44fda2285, UID in object meta: \nfailed to save host status after "unmanaged"\n

  • Should this PR be tested by the reviewer? If possible, it would be helpful
  • Is this PR relying on CI for an e2e test run? Yes
  • Should this PR be tested in a specific environment? With baremetal-operator running and when assisted issues a delete request for a BareMetalHost.

List all the issues related to this PR

  • New Feature
  • Enhancement
  • Bug fix
  • Tests
  • Documentation
  • CI/CD

What environments does this code impact?

  • Automation (CI, tools, etc)
  • Cloud
  • Operator Managed Deployments
  • None

How was this code tested?

  • Waiting for CI to do a full test run
  • Manual (Elaborate on how it was tested)
  • No tests needed

Checklist

  • This change does not require a documentation update (docstring, docs, README, etc)
  • Does this change include unit-tests (note that code changes require unit-tests)

This PR should be backported as far back as 4.12.

@openshift-ci-robot openshift-ci-robot added jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. jira/invalid-bug Indicates that a referenced Jira bug is invalid for the branch this PR is targeting. labels Jan 31, 2025
@openshift-ci-robot

@MahnoorAsghar: This pull request references Jira Issue OCPBUGS-45927, which is invalid:

  • expected the bug to target the "4.19.0" version, but no target version was set

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

The bug has been updated to refer to the pull request using the external bug tracker.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci openshift-ci bot added the size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. label Jan 31, 2025
@openshift-ci openshift-ci bot requested review from ori-amizur and pastequo January 31, 2025 13:41

openshift-ci bot commented Jan 31, 2025

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: MahnoorAsghar
Once this PR has been reviewed and has the lgtm label, please assign danielerez for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment


codecov bot commented Jan 31, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 67.93%. Comparing base (4dbd6fe) to head (f91ab27).

Additional details and impacted files

@@           Coverage Diff           @@
##           master    #7253   +/-   ##
=======================================
  Coverage   67.93%   67.93%           
=======================================
  Files         300      300           
  Lines       40895    40891    -4     
=======================================
- Hits        27780    27779    -1     
+ Misses      10624    10623    -1     
+ Partials     2491     2489    -2     
| Files with missing lines | Coverage Δ |
|---|---|
| ...nternal/controller/controllers/agent_controller.go | 76.26% <ø> (+0.08%) ⬆️ |

... and 2 files with indirect coverage changes


openshift-ci bot commented Jan 31, 2025

@MahnoorAsghar: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
|---|---|---|---|---|
| ci/prow/okd-scos-e2e-aws-ovn | f91ab27 | link | false | /test okd-scos-e2e-aws-ovn |

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.
