
feat(lep): support a ReadOnlyMany behavior. #7853

Draft · wants to merge 1 commit into master
Conversation

james-munson (Contributor)

Which issue(s) this PR fixes:

Issue #5041

What this PR does / why we need it:

Adds an LEP proposing how to implement ReadOnlyMany behavior in Longhorn.

Special notes for your reviewer:

Additional documentation or context:

Signed-off-by: James Munson <james.munson@suse.com>
james-munson requested a review from a team as a code owner on February 5, 2024.

## Proposal

Add a parameter named `mountReadOnly`, with a default value of `false`, to the StorageClass and thus to the volume spec.
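For illustration, a minimal sketch of how such a parameter might be read in the CSI driver; StorageClass parameters arrive in the CSI CreateVolume request as a string map, and everything here except the proposed `mountReadOnly` name is an assumption:

```go
package csidriver

import "strconv"

// parseMountReadOnly reads the proposed StorageClass parameter from the
// map[string]string that a CSI CreateVolume request carries. The
// "mountReadOnly" name and the false default follow the proposal above;
// the helper itself is hypothetical.
func parseMountReadOnly(params map[string]string) (bool, error) {
	v, ok := params["mountReadOnly"]
	if !ok {
		return false, nil // proposed default
	}
	return strconv.ParseBool(v)
}
```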
innobead (Member) commented:

We should just leverage the access mode of a PVC to determine if the volume is ROM. The mount ro option should supersede the customized options provided by users.

A project member commented:

Agree with @innobead's suggestion.
If the PVC is ROM, we automatically append the `ro` option.
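A minimal sketch of that auto-append behavior, assuming the ROM intent has already been plumbed to the mount path (the helper and its arguments are assumptions):

```go
package nodeserver

import "slices"

// appendReadOnlyOption forces "ro" into the mount options when the volume
// is ReadOnlyMany, superseding whatever the user supplied via nfsOptions.
// Hypothetical helper; how the ROM intent reaches the mount path is the
// design question under discussion.
func appendReadOnlyOption(mountOptions []string, readOnlyMany bool) []string {
	if readOnlyMany && !slices.Contains(mountOptions, "ro") {
		return append(mountOptions, "ro")
	}
	return mountOptions
}
```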

- The auto-remount feature (for volumes that are read-only due to some mishap but should not be) needs a way to check the volume's intent. It would be cumbersome to make it parse the NFS mount options to figure that out.
- Since this approach requires using `nfsOptions`, the user is responsible for knowing all the other necessary mount options to include alongside `ro`.

#### 2. New `ReadOnlyMany` access mode enum value:
A project member commented:
For official support, we should go with option 2, which provides a better user experience via a standard interface (access mode: ROM).
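For illustration, a sketch of how option 2 might extend longhorn-manager's access-mode enum. The existing `rwo`/`rwx` values mirror Longhorn's current ones; the `rox` value and its placement are assumptions:

```go
package types

// AccessMode mirrors Longhorn's existing volume access modes; the
// ReadOnlyMany value is the proposed addition (sketch only).
type AccessMode string

const (
	AccessModeReadWriteOnce AccessMode = "rwo"
	AccessModeReadWriteMany AccessMode = "rwx"
	// Proposed: a read-only counterpart to rwx.
	AccessModeReadOnlyMany AccessMode = "rox"
)
```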

innobead requested a review from derekbit on February 6, 2024.
innobead (Member) commented on Feb 6, 2024:

Would like to hear thoughts from @derekbit and @shuo-wu

innobead requested a review from shuo-wu on February 6, 2024.
l-mb commented on Feb 6, 2024:

Good idea!

Instead of "only" doing this on the mount side though I believe we'll want to change the actual export to be read-only (or rather, not rw), so that the NFS server side enforces this too.

ejweber (Contributor) left a comment:

I agree with the comments so far: we should leverage the Kubernetes AccessMode = ReadOnlyMany (set in PVC and PV) instead of adding a new field to a StorageClass and Longhorn volume.

We should consider the possibility that we will be asked to create a volume with multiple capabilities / access modes simultaneously. For example:

  • ReadOnlyMany, and
  • ReadWriteMany.

Then, we may be asked to publish the volume with either of these access modes.

What should we do in this case?

I think we should probably create a Longhorn volume with AccessMode == rwx when responding to the CreateVolume request. Then, we should modify behavior appropriately in ControllerPublishVolume (if we want to enforce read-only on the export) and NodeStageVolume/NodePublishVolume (where we will modify mount options).
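As a concrete sketch of how the node side might detect read-only intent from the CSI request (the helper is hypothetical; the CSI types and enum values come from the CSI spec's Go bindings):

```go
package nodeserver

import (
	"github.com/container-storage-interface/spec/lib/go/csi"
)

// isReadOnlyPublish reports whether a publish/stage request asks for
// read-only access, either via the request's explicit readonly flag or
// via the capability's access mode. Hypothetical helper; Longhorn's
// actual plumbing may differ.
func isReadOnlyPublish(readonly bool, volCap *csi.VolumeCapability) bool {
	if readonly {
		return true
	}
	switch volCap.GetAccessMode().GetMode() {
	case csi.VolumeCapability_AccessMode_MULTI_NODE_READER_ONLY,
		csi.VolumeCapability_AccessMode_SINGLE_NODE_READER_ONLY:
		return true
	}
	return false
}
```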

ejweber (Contributor) commented on Feb 6, 2024:

Instead of "only" doing this on the mount side though I believe we'll want to change the actual export to be read-only (or rather, not rw), so that the NFS server side enforces this too.

I am not sure this is strictly necessary, but it may be the "right" thing to do. It looks like we already template the Ganesha configuration, so it might not be too difficult to change the access type on the export if we can get the right information to the share manager pod (https://github.com/longhorn/longhorn-share-manager/blob/3168f43cad72c25cec704494064f3f58e28b7210/pkg/server/nfs/nfs_server.go#L148-L169). That may mean an additional API change though, since I don't think there's an existing ShareManager field to drop the read-only information into.
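For illustration, a sketch of how the templated export might switch its access type, assuming a read-only flag can be delivered to the share-manager (e.g. via the new ShareManager field noted above). `Access_Type`, `Export_Id`, and `Path` are standard Ganesha export keys, but the helper is hypothetical:

```go
package nfs

import "fmt"

// exportBlock renders a deliberately minimal Ganesha EXPORT stanza; the
// real template in nfs_server.go carries many more options. The readOnly
// plumbing is the assumed new piece.
func exportBlock(exportID int, path string, readOnly bool) string {
	accessType := "RW"
	if readOnly {
		accessType = "RO"
	}
	return fmt.Sprintf(`EXPORT {
	Export_Id = %d;
	Path = %s;
	Access_Type = %s;
}`, exportID, path, accessType)
}
```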

PhanLe1010 (Contributor) commented on Feb 8, 2024:

As discussed with @james-munson, Kubernetes currently has 4 access modes in total (https://kubernetes.io/docs/concepts/storage/persistent-volumes/#access-modes):

  1. RWO - ReadWriteOnce
  2. ROX - ReadOnlyMany
  3. RWX - ReadWriteMany
  4. RWOP - ReadWriteOncePod

We are missing ReadOnlyMany and ReadWriteOncePod. From the discussion so far, it looks like we are going to implement ReadOnlyMany. ReadWriteOncePod can go on a different ticket, but it would be great to keep it in mind while developing this ReadOnlyMany mode.

PhanLe1010 (Contributor) commented:

> We should consider the possibility that we will be asked to create a volume with multiple capabilities / access modes simultaneously. For example:
>
>   • ReadOnlyMany, and
>   • ReadWriteMany.
>
> Then, we may be asked to publish the volume with either of these access modes.
>
> What should we do in this case?
>
> I think we should probably create a Longhorn volume with AccessMode == rwx when responding to the CreateVolume request. Then, we should modify behavior appropriately in ControllerPublishVolume (if we want to enforce read-only on the export) and NodeStageVolume/NodePublishVolume (where we will modify mount options).
This is a good point. I think we can take a look at how the CSI sidecar handles the selection when there are multiple modes specified. Some combinations should be blocked in CreateVolume: https://github.com/kubernetes-csi/csi-lib-utils/blob/f82f9de5b8aeb3c3b236d7f58fc5eeab34438078/accessmodes/access_modes.go#L36-L78
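A rough sketch of that kind of gate in CreateVolume (illustrative only; the authoritative combination rules live in the linked csi-lib-utils code, and the allowed set below is an assumption):

```go
package controllerserver

import (
	"fmt"

	"github.com/container-storage-interface/spec/lib/go/csi"
)

// validateCapabilities rejects access modes the driver cannot honor, in
// the spirit of the csi-lib-utils check linked above. The allowed set
// below is illustrative, not Longhorn's final policy.
func validateCapabilities(caps []*csi.VolumeCapability) error {
	for _, c := range caps {
		switch c.GetAccessMode().GetMode() {
		case csi.VolumeCapability_AccessMode_SINGLE_NODE_WRITER,
			csi.VolumeCapability_AccessMode_MULTI_NODE_MULTI_WRITER,
			csi.VolumeCapability_AccessMode_MULTI_NODE_READER_ONLY:
			// assumed-supported modes
		default:
			return fmt.Errorf("unsupported access mode: %v",
				c.GetAccessMode().GetMode())
		}
	}
	return nil
}
```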

PhanLe1010 (Contributor) commented:

> Instead of "only" doing this on the mount side though, I believe we'll want to change the actual export to be read-only (or rather, not rw), so that the NFS server side enforces this too.

Agree with @ejweber that this might not be strictly necessary. When a PVC requests ReadOnlyMany, it is not mandatory to force the underlying storage layer to be read-only: https://kubernetes.io/docs/concepts/storage/persistent-volumes/#:~:text=For%20example%2C%20even%20if%20a%20PersistentVolume%20is%20created%20as%20ReadOnlyMany%2C%20it%20is%20no%20guarantee%20that%20it%20will%20be%20read%2Donly

james-munson marked this pull request as draft on February 16, 2024.
james-munson (Author, Contributor) commented:

Converting this to draft while I incorporate the feedback and rework it along the recommended approach. I'll either re-open this PR or close it and replace it with another.

innobead (Member) commented:

@james-munson This PR has been pending for a while. Let's resume it and still try to get it into 1.7.0.

halfcrazy commented on May 30, 2024:

An NFS-based solution has high-availability problems. If we need a read-only (rox) solution, could it decouple from NFS and use iSCSI instead, since we don't need a write lock?

derekbit (Member) commented:

> An NFS-based solution has high-availability problems. If we need a read-only (rox) solution, could it decouple from NFS and use iSCSI instead, since we don't need a write lock?

@halfcrazy Can you elaborate more on the high-availability problems?

halfcrazy commented on May 30, 2024:

> An NFS-based solution has high-availability problems. If we need a read-only (rox) solution, could it decouple from NFS and use iSCSI instead, since we don't need a write lock?
>
> @halfcrazy Can you elaborate more on the high-availability problems?

#6205

My use case is to use a global rwx PVC to provide a vmtool volume for multiple KubeVirt pods on different nodes. If the share-manager node shuts down unexpectedly, pods can't recover. We disable auto-delete-pod-when-volume-detached-unexpectedly because it may shut down our VMs.

derekbit (Member) commented:

> My use case is to use a global rwx PVC to provide a vmtool volume for multiple KubeVirt pods on different nodes. If the share-manager node shuts down unexpectedly, pods can't recover. We disable auto-delete-pod-when-volume-detached-unexpectedly because it may shut down our VMs.

Thanks for the explanation. It's a real use case for share-manager HA. cc @james-munson @innobead
