Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[IMPROVEMENT] Reject strict-local + RWX volume creation #6735

Closed
yangchiu opened this issue Sep 20, 2023 · 7 comments
Closed

[IMPROVEMENT] Reject strict-local + RWX volume creation #6735

yangchiu opened this issue Sep 20, 2023 · 7 comments
Assignees
Labels
area/admission-webhook Kubernetes validation and mutating admission webhooks backport/1.6.4 backport/1.7.3 kind/improvement Request for improvement of existing function priority/0 Must be implement or fixed in this release (managed by PO) require/auto-e2e-test Require adding/updating auto e2e test cases if they can be automated require/backport Require backport. Only used when the specific versions to backport have not been definied.
Milestone

Comments

@yangchiu
Copy link
Member

Is your improvement request related to a feature? Please describe (👍 if you like this request)

When the server node is down, the RWX nfs server is unable to create on another node because of the strict-local volume.
Maybe we should reject strict-local + RWX.

Also strict-local is for distributed application only. Distributed application means it has replication capability, so they will only RWO.

Describe the solution you'd like

Describe alternatives you've considered

Additional context

https://suse.slack.com/archives/C02DR3N5T24/p1695171814534559

@yangchiu yangchiu added require/auto-e2e-test Require adding/updating auto e2e test cases if they can be automated require/doc Require updating the longhorn.io documentation kind/improvement Request for improvement of existing function require/backport Require backport. Only used when the specific versions to backport have not been definied. labels Sep 20, 2023
@innobead innobead added the area/admission-webhook Kubernetes validation and mutating admission webhooks label Sep 20, 2023
@innobead innobead added this to the v1.7.0 milestone Sep 20, 2023
@innobead innobead added the priority/0 Must be implement or fixed in this release (managed by PO) label Sep 20, 2023
@derekbit derekbit self-assigned this May 17, 2024
@innobead innobead modified the milestones: v1.7.0, v1.8.0 Jun 3, 2024
@innobead innobead assigned mantissahz and unassigned derekbit Jun 3, 2024
@derekbit derekbit moved this to Backlog in Longhorn Sprint Aug 3, 2024
@innobead innobead moved this to New Issues in Longhorn Sprint Sep 10, 2024
@derekbit derekbit assigned COLDTURNIP and unassigned mantissahz Nov 15, 2024
@derekbit
Copy link
Member

cc @ChanYiLin

@COLDTURNIP
Copy link
Contributor

PR: longhorn/longhorn-manager#3334

@mantissahz
Copy link
Contributor

mantissahz commented Dec 10, 2024

PR: longhorn/longhorn-manager#3334

Hi @COLDTURNIP,

Move the status of Longhorn Sprint to Review to make the bot create Pre Ready-For-Testing Checklist and update the list.
Then you could move the status to Ready for Testing.
Thanks.

@COLDTURNIP COLDTURNIP moved this from Implement to Review in Longhorn Sprint Dec 10, 2024
@longhorn-io-github-bot
Copy link

longhorn-io-github-bot commented Dec 10, 2024

Pre Ready-For-Testing Checklist

  • Where is the reproduce steps/test steps documented?
    The reproduce steps/test steps are at: [IMPROVEMENT] Reject strict-local + RWX volume creation #6735 (comment)

  • Is there a workaround for the issue? If so, where is it documented?
    This is not a workaround.

  • Does the PR include the explanation for the fix or the feature?
    Yes, see feat(webhook): reject strict-local RWX volume creation longhorn-manager#3334

  • Does the PR include deployment change (YAML/Chart)? If so, where are the PRs for both YAML file and Chart?
    There's no deployment change.

  • Have the backend code been merged (Manager, Engine, Instance Manager, BackupStore etc) (including backport-needed/*)?
    The PR is at feat(webhook): reject strict-local RWX volume creation longhorn-manager#3334

  • Which areas/issues this PR might have potential impacts on?
    No impact.

  • If labeled: require/LEP Has the Longhorn Enhancement Proposal PR submitted?
    No LEP.

  • If labeled: area/ui Has the UI issue filed or ready to be merged (including backport-needed/*)?
    No UI change.

  • If labeled: require/doc Has the necessary document PR submitted or merged (including backport-needed/*)?
    The documentation issue/PR is at

  • If labeled: require/automation-e2e Has the end-to-end test plan been merged? Have QAs agreed on the automation test case? If only test case skeleton w/o implementation, have you created an implementation issue (including backport-needed/*)
    The automation skeleton PR is at
    The automation test case PR is at
    The issue of automation test case implementation is at (please create by the template)

  • If labeled: require/automation-engine Has the engine integration test been merged (including backport-needed/*)?
    No engine automation PR.

  • If labeled: require/manual-test-plan Has the manual test plan been documented?
    The updated manual test plan is at [IMPROVEMENT] Reject strict-local + RWX volume creation #6735 (comment)

  • If the fix introduces the code for backward compatibility Has a separate issue been filed with the label release/obsolete-compatibility?
    No compatibility issue.

@COLDTURNIP
Copy link
Contributor

Test Steps

  • Step 1. a Longhorn-ready cluster
  • Step 2. create a storage class (see attached example)
    • provisioner: driver.longhorn.io
    • parameters.dataLocality: strict-local
    • parameters.migratable: false
  • Step 3. create a workload, with a PVC that claim a ReadWriteMany volume of given above class
  • Step 4. verify the cluster
    • the pods of the workload must stick as pending because of PV
    • check the status of created PVC, it must be pending, and one of the event indicates that access mode rwx (migratable: false) is incompatible with data locality strict-local mode

The following example resources can be created/cleanup by kubectl apply -f example.yaml and kubectl delete -f example.yaml

kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: longhorn-6735-strict-local-rwx
provisioner: driver.longhorn.io
parameters:
  dataEngine: v1
  numberOfReplicas: "1"
  dataLocality: strict-local
  migratable: "false"
  staleReplicaTimeout: "30"
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: 6735-strict-local-rwx
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 5Mi
  storageClassName: longhorn-6735-strict-local-rwx
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: 6735-strict-local-rwx-verify
  labels:
    app: 6735-strict-local-rwx-verify
spec:
  replicas: 1
  selector:
    matchLabels:
      app: 6735-strict-local-rwx-verify
  template:
    metadata:
      labels:
        app: 6735-strict-local-rwx-verify
    spec:
      containers:
        - name: 6735-strict-local-rwx-verify
          image: busybox
          command: ['sh', '-c', 'while true ; do now=$(date +%s) ; echo $now ; for vol in share1 share2 share3 ; do echo $now >>/mnt/data/${vol}/log || echo $now >/mnt/data/${vol}/log ; done ; ls -l $(find /mnt/data -name log) ; sleep 1 ; done']
          volumeMounts:
            - mountPath: /mnt/data/share1
              name: share1
      volumes:
        - name: share1
          persistentVolumeClaim:
            claimName: 6735-strict-local-rwx

@c3y1huang
Copy link
Contributor

c3y1huang commented Dec 10, 2024

Test Steps

  • Step 1. a Longhorn-ready cluster

  • Step 2. create a storage class (see attached example)

    • provisioner: driver.longhorn.io
    • parameters.dataLocality: strict-local
    • parameters.migratable: false
  • Step 3. create a workload, with a PVC that claim a ReadWriteMany volume of given above class

  • Step 4. verify the cluster

    • the pods of the workload must stick as pending because of PV
    • check the status of created PVC, it must be pending, and one of the event indicates that access mode rwx (migratable: false) is incompatible with data locality strict-local mode

The following example resources can be created/cleanup by kubectl apply -f example.yaml and kubectl delete -f example.yaml

kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: longhorn-6735-strict-local-rwx
provisioner: driver.longhorn.io
parameters:
  dataEngine: v1
  numberOfReplicas: "1"
  dataLocality: strict-local
  migratable: "false"
  staleReplicaTimeout: "30"
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: 6735-strict-local-rwx
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 5Mi
  storageClassName: longhorn-6735-strict-local-rwx
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: 6735-strict-local-rwx-verify
  labels:
    app: 6735-strict-local-rwx-verify
spec:
  replicas: 1
  selector:
    matchLabels:
      app: 6735-strict-local-rwx-verify
  template:
    metadata:
      labels:
        app: 6735-strict-local-rwx-verify
    spec:
      containers:
        - name: 6735-strict-local-rwx-verify
          image: busybox
          command: ['sh', '-c', 'while true ; do now=$(date +%s) ; echo $now ; for vol in share1 share2 share3 ; do echo $now >>/mnt/data/${vol}/log || echo $now >/mnt/data/${vol}/log ; done ; ls -l $(find /mnt/data -name log) ; sleep 1 ; done']
          volumeMounts:
            - mountPath: /mnt/data/share1
              name: share1
      volumes:
        - name: share1
          persistentVolumeClaim:
            claimName: 6735-strict-local-rwx

@COLDTURNIP This seems like a simple case to automate with a robot test. I encourage you to give implementing the test a try.

Ref https://github.com/longhorn/longhorn-tests/tree/master/e2e

cc @derekbit @longhorn/qa

@COLDTURNIP COLDTURNIP moved this from Review to Ready For Testing in Longhorn Sprint Dec 10, 2024
@chriscchien chriscchien self-assigned this Dec 10, 2024
@chriscchien
Copy link
Contributor

Verified pass on longhorn master(longhorn/longhorn-manager@6da14ea)

  • following test steps, can see creation of RWX + strict-local PVC jejected
  Warning  ProvisioningFailed    36s (x8 over 2m43s)   driver.longhorn.io_csi-provisioner-5847bdddf5-425wm_8a4fcc3e-7de1-4616-b9b4-9640c305a198  failed to provision volume with StorageClass "longhorn-6735-strict-local-rwx": rpc error: code = Internal desc = Bad response statusCode [500]. Status [500 Internal Server Error]. Body: [code=Internal Server Error, detail=, message=failed to create volume: unable to create volume pvc-865e5e1f-4c01-461b-b87f-e26cfcef2ea8: admission webhook "validator.longhorn.io" denied the request: access mode rwx (migratable: false) is incompatible with data locality strict-local mode] from [http://longhorn-backend:9500/v1/volumes]
  • Create RWX volume with data locality=strict-local from UI rejected as well.
    image

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/admission-webhook Kubernetes validation and mutating admission webhooks backport/1.6.4 backport/1.7.3 kind/improvement Request for improvement of existing function priority/0 Must be implement or fixed in this release (managed by PO) require/auto-e2e-test Require adding/updating auto e2e test cases if they can be automated require/backport Require backport. Only used when the specific versions to backport have not been definied.
Projects
Status: Closed
Development

No branches or pull requests

8 participants