
csi: fix the ROOK_CSI_DISABLE_DRIVER flag in the csi reconcile #14746

Merged: 1 commit, Sep 23, 2024

Conversation

@parth-gr (Member) commented Sep 20, 2024

ROOK_CSI_DISABLE_DRIVER was not working correctly (from #14489): the CSI driver was installed even when it was expected to be disabled.

Checklist:

  • Commit Message Formatting: Commit titles and messages follow guidelines in the developer guide.
  • Reviewed the developer guide on Submitting a Pull Request
  • Pending release notes updated with breaking and/or notable changes for the next minor release.
  • Documentation has been updated, if necessary.
  • Unit tests have been added, if necessary.
  • Integration tests have been added, if necessary.

@parth-gr parth-gr marked this pull request as draft September 20, 2024 10:04
@parth-gr parth-gr marked this pull request as ready for review September 20, 2024 10:06
@Madhu-1 Madhu-1 requested a review from subhamkrai September 20, 2024 10:22
@subhamkrai (Contributor) left a comment


We don't need a return here. We need to add a condition at https://github.com/rook/rook/blob/master/pkg/operator/ceph/csi/controller.go#L335 and include && !disableCSI, otherwise the CSI operator will not start.

@@ -332,7 +332,7 @@ func (r *ReconcileCSI) reconcile(request reconcile.Request) (reconcile.Result, e
}
}

- if !EnableCSIOperator() {
+ if !EnableCSIOperator() && !disableCSI {

Contributor

Nit, not related to your change, but could you remove this empty line?

Member Author


Yes, removed.

Member

@travisn travisn left a comment


Please add a more descriptive commit message and PR title. You can mention that the driver was being installed even though it was intended to be disabled.

@parth-gr parth-gr force-pushed the fix-csi-driver branch 2 times, most recently from 5c695d0 to 4072dbd Compare September 20, 2024 14:54
@parth-gr parth-gr changed the title csi: fix the csi driver reconcile csi: fix the ROOK_CSI_DISABLE_DRIVER flag in the csi reconcile Sep 20, 2024
ROOK_CSI_DISABLE_DRIVER was not working correctly (from rook#14489): the CSI driver was installed even when it was expected to be disabled.

Signed-off-by: parth-gr <partharora1010@gmail.com>
@parth-gr parth-gr added do-not-merge DO NOT MERGE :) backport-release-1.15 and removed do-not-merge DO NOT MERGE :) labels Sep 20, 2024
@parth-gr parth-gr requested a review from travisn September 20, 2024 15:36
@parth-gr
Member Author

Testing looks good

storageclient-737342087af10580-status-reporter-28780773-jrphc     0/1     Completed   0               49s
ux-backend-server-57b55857db-8hvls                                2/2     Running     0               23h
[rider@fedora Downloads]$ oc get pods | grep csi
ceph-csi-controller-manager-564fdb9f68-z6j7m                      2/2     Running           0               23h
csi-addons-controller-manager-945b5fb8d-4z6ns                     2/2     Running           0               23h
openshift-storage.cephfs.csi.ceph.com-ctrlplugin-76db9dcf55vxd7   7/7     Running           0               23h
openshift-storage.cephfs.csi.ceph.com-ctrlplugin-76db9dcf57f8f8   7/7     Running           0               23h
openshift-storage.cephfs.csi.ceph.com-nodeplugin-h7qt4            3/3     Running           6               43h
openshift-storage.cephfs.csi.ceph.com-nodeplugin-ldp8t            3/3     Running           6               43h
openshift-storage.cephfs.csi.ceph.com-nodeplugin-qk2nd            3/3     Running           0               43h
openshift-storage.rbd.csi.ceph.com-ctrlplugin-7f75d86c95-kst96    7/7     Running           0               23h
openshift-storage.rbd.csi.ceph.com-ctrlplugin-7f75d86c95-r7wpm    7/7     Running           0               23h
openshift-storage.rbd.csi.ceph.com-nodeplugin-6662j               4/4     Running           0               43h
openshift-storage.rbd.csi.ceph.com-nodeplugin-jsvmd               4/4     Running           8               43h
openshift-storage.rbd.csi.ceph.com-nodeplugin-l7ztl               4/4     Running           8               43h
[rider@fedora Downloads]$ 

@@ -332,8 +332,7 @@ func (r *ReconcileCSI) reconcile(request reconcile.Request) (reconcile.Result, e
}
}

- if !EnableCSIOperator() {
-
+ if !disableCSI && !EnableCSIOperator() {
Member


Is the goal of the setting ROOK_CSI_DISABLE_DRIVER: true to completely skip all reconcile of the CSI driver and CSI operator? If so, it seems we should add a return statement above on line 205, so that we skip all reconciling in this method. If that is not the intention, then I'm not clear on the intention of this flag, since it's still reconciling some CSI resources between lines 210-333.

Member Author

@parth-gr parth-gr Sep 20, 2024


I had the same question for @subhamkrai today, as I was under the same impression.

He mentioned that ROOK_CSI_DISABLE_DRIVER should only be responsible for not creating the CSI driver from Rook,

and enable_csi_operator should be responsible for creating the CSI operator.

So if ROOK_CSI_DISABLE_DRIVER: true and enable_csi_operator: true, it should create the CSI operator resources and not the CSI driver.
But we can discuss this if it needs more discussion and changes...
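The semantics described above can be sketched as a small decision helper. This is an illustrative sketch only; the helper and its return strings are hypothetical, not rook's actual code (the real logic lives in the reconcile of pkg/operator/ceph/csi/controller.go):

```go
package main

import "fmt"

// csiAction sketches the behavior described above (hypothetical helper):
//   disableCSI        -> ROOK_CSI_DISABLE_DRIVER
//   enableCSIOperator -> ENABLE_CSI_OPERATOR
func csiAction(disableCSI, enableCSIOperator bool) string {
	if enableCSIOperator {
		// The operator is reconciled even when the driver is disabled.
		return "deploy CSI operator"
	}
	if disableCSI {
		// Driver disabled and operator mode off: nothing to deploy.
		return "skip CSI driver"
	}
	// Matches the merged condition: !disableCSI && !EnableCSIOperator()
	return "deploy CSI driver"
}

func main() {
	fmt.Println(csiAction(true, true))   // deploy CSI operator
	fmt.Println(csiAction(true, false))  // skip CSI driver
	fmt.Println(csiAction(false, false)) // deploy CSI driver
}
```

With this reading, ROOK_CSI_DISABLE_DRIVER only gates the driver, and ENABLE_CSI_OPERATOR independently gates the operator.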

Member Author


@subhamkrai please add your thoughts ^^

Member

@BlaineEXE BlaineEXE Sep 20, 2024


IMO, ROOK_CSI_DISABLE_DRIVER is the config that we should prioritize working correctly. This config has existed for a while, and I think we expect it to continue to be used after the CSI operator is GA'ed as well. This config has had the meaning "Rook should not manage CSI driver resources," and I think keeping that meaning will help others continue to use Rook as they have been without unexpected results.

IIUC, ENABLE_CSI_OPERATOR is something that I think is temporary while we are working to test and migrate to the new CSI operator setup. I think it's fine if ENABLE_CSI_OPERATOR doesn't do anything if ROOK_CSI_DISABLE_DRIVER == true since we don't expect to need the option for a long time.

I would suggest that ROOK_CSI_DISABLE_DRIVER behaves to disable all Rook management of CSI drivers, including the operator. I think that is also what Travis is expecting, based on his comment.

I guess the important thing might be to ensure Rook will remove CSI drivers when ROOK_CSI_DISABLE_DRIVER == true, assuming that is the behavior in Rook today.
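The early-return alternative suggested here, where ROOK_CSI_DISABLE_DRIVER short-circuits all CSI reconciling including the operator, would look roughly like this. Again a sketch with hypothetical names, not the code that was merged:

```go
package main

import "fmt"

// csiActionEarlyReturn sketches the alternative where the disable flag
// skips the entire CSI reconcile, operator included (hypothetical helper).
func csiActionEarlyReturn(disableCSI, enableCSIOperator bool) string {
	if disableCSI {
		// Early return at the top of the reconcile: nothing CSI-related runs.
		return "skip all CSI reconcile"
	}
	if enableCSIOperator {
		return "deploy CSI operator"
	}
	return "deploy CSI driver"
}

func main() {
	fmt.Println(csiActionEarlyReturn(true, true))  // skip all CSI reconcile
	fmt.Println(csiActionEarlyReturn(false, true)) // deploy CSI operator
}
```

The difference between the two sketches is exactly the precedence question debated in this thread: whether ROOK_CSI_DISABLE_DRIVER=true should also suppress the operator when ENABLE_CSI_OPERATOR=true.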

Member


Yes above should be the correct expectation 👍🏻

Contributor


We need to decide which flag we want to give precedence. For example:

  1. The user sets ROOK_CSI_DISABLE_DRIVER == true and ENABLE_CSI_OPERATOR == true; if we place the return higher, it will skip deploying the CSI operator.

While working on this, I treated ROOK_CSI_DISABLE_DRIVER as enabling/disabling the driver code and ENABLE_CSI_OPERATOR as enabling/disabling the operator code.

We can also log an error when the user sets ROOK_CSI_DISABLE_DRIVER == true and ENABLE_CSI_OPERATOR == true. But just to mention, we have two CSI ways of deployment, driver and operator, so to keep it clear for the user I treated it like this currently in the code.

Member Author

@parth-gr parth-gr Sep 23, 2024


As per the huddle discussion, we are keeping the current implementation.

Later, when we remove the CSI operator flag check, we will bring back disabling the controller reconcile at the very start of the reconcile.

Member


Per discussion, approving to go ahead with the current changes

@parth-gr parth-gr requested a review from travisn September 20, 2024 17:46
Member

@travisn travisn left a comment


Marking as changes requested until the discussion is resolved.


@travisn travisn merged commit aba78ea into rook:master Sep 23, 2024
54 checks passed
mergify bot added a commit that referenced this pull request Sep 24, 2024
csi: fix the ROOK_CSI_DISABLE_DRIVER flag in the csi reconcile  (backport #14746)