Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(cvc-operator): add automatic scaling of volumereplicas for CSI volumes #1613

Merged
merged 11 commits into from
Feb 26, 2020

Conversation

mittachaitu
Copy link

@mittachaitu mittachaitu commented Feb 11, 2020

Signed-off-by: mittachaitu sai.chaithanya@mayadata.io

What this PR does / why we need it:
This PR adds a feature to automatically ScaleUp/ScaleDown the volume replicas for CSI volumes.

Current CVC:

apiVersion: openebs.io/v1alpha1
kind: CStorVolumeClaim
metadata:
  annotations:
    openebs.io/volume-policy: ""
    openebs.io/volumeID: pvc-bd58d8d6-4d97-11ea-80a1-42010a8000d3
  creationTimestamp: "2020-02-12T13:01:10Z"
  finalizers:
  - cvc.openebs.io/finalizer
  generation: 2
  labels:
    openebs.io/cstor-pool-cluster: cstor-sparse-cspc
    openebs.io/pod-disruption-budget: cstor-sparse-cspcmssgq
  name: pvc-bd58d8d6-4d97-11ea-80a1-42010a8000d3
  namespace: openebs
  resourceVersion: "23259649"
  selfLink: /apis/openebs.io/v1alpha1/namespaces/openebs/cstorvolumeclaims/pvc-bd58d8d6-4d97-11ea-80a1-42010a8000d3
  uid: bd611b24-4d97-11ea-80a1-42010a8000d3
publish:
  nodeId: gke-sai-test-cluster-pool-1-2ad4a0b7-f2sd
spec:
  capacity:
    storage: 5Gi
  cstorVolumeRef:
    apiVersion: openebs.io/v1alpha1
    kind: CStorVolume
    name: pvc-bd58d8d6-4d97-11ea-80a1-42010a8000d3
    namespace: openebs
    resourceVersion: "23259625"
    uid: bd74b9ef-4d97-11ea-80a1-42010a8000d3
  policy:
    provision:
      replicaAffinity: false
    replica:
      affinity: null
      zvolWorkers: ""
    replicaPoolInfo:
      - poolName: cstor-sparse-cspc-l2ht
      - poolName: cstor-sparse-cspc-wgwf
      - poolName: cstor-sparse-cspc-wzt4
    target: {}
  replicaCount: 3
status:
  capacity:
    storage: 5Gi
  phase: Bound
  poolInfo:
  - cstor-sparse-cspc-l2ht
  - cstor-sparse-cspc-wgwf
  - cstor-sparse-cspc-wzt4
versionDetails:
  autoUpgrade: false
  desired: 1.7.0
  status:
    current: 1.7.0
    dependentsUpgraded: true
    lastUpdateTime: null
    state: ""

After creating PVC above CVC will be created in openebs namespace.

How the user/admin can perform scaling-up volume replicas:
In above CVC under cvc.spec.policy.replicaPool.poolInfo will show the volumereplicas pool information. Now to perform volume replica ScaleUp just add the pool entry under spec of CVC i.e spec.policy.replicaPool.poolInfo.
i.e Just change from

spec:
    replicaPoolInfo:
      - poolName: cstor-sparse-cspc-l2ht
      - poolName: cstor-sparse-cspc-wgwf
      - poolName: cstor-sparse-cspc-wzt4

to

spec:
    replicaPoolInfo:
      - poolName: cstor-sparse-cspc-l2ht
      - poolName: cstor-sparse-cspc-wgwf
      - poolName: cstor-sparse-cspc-wzt4
      - poolName: cstor-sparse-cspc-g54z

In above changes added the pool entry cstor-sparse-cspc-g54z. Rest of the things will be taken care by CVC controller.

How the user/admin can perform scaling-down volume replicas:
Same as above, to perform volume replica ScaleDown just remove the pool entry under spec of CVC i.e spec.policy.replicaPool.poolInfo.

spec:
    replicaPoolInfo:
      - poolName: cstor-sparse-cspc-l2ht
      - poolName: cstor-sparse-cspc-wgwf
      - poolName: cstor-sparse-cspc-wzt4
      - poolName: cstor-sparse-cspc-g54z

to

      ReplicaPoolInfo:
      - poolName: cstor-sparse-cspc-l2ht
      - poolName: cstor-sparse-cspc-wzt4
      - poolName: cstor-sparse-cspc-g54z

In above changes removed the pool entry cstor-sparse-cspc-wgwf. Rest of the things will be taken care by CVC controller.

How User/Admin can know the completed process:
User/Admin can verify from the CVC events (or) Status of CVC will show the current volume replica pool names i.e cvc.Status.PoolInfo.
How events will look like

Events:
  Type     Reason                 Age                From                         Message
  ----     ------                 ----               ----                         -------
  Normal   Synced                 32m                cstorvolumeclaim-controller  cstorvolumeclaim created successfully
  Warning  ScalingVolumeReplicas  20m (x2 over 21m)  cstorvolumeclaim-controller  scaling replicas from 3 to 4 in progress
  Normal   ScalingVolumeReplicas  20m                cstorvolumeclaim-controller  successfully scaled volume replicas to 4
  Warning  ScalingVolumeReplicas  15m                cstorvolumeclaim-controller  Scaling down volume replicas from 4 to 3 is in progress
  Normal   ScalingVolumeReplicas  14m                cstorvolumeclaim-controller  successfully scaled volume replicas to 3

How the user/admin can perform the migration of volume replicas from one pool to another pool:

  • Perform the scaledown step using CVC(i.e remove the pool entry under spec of CVC).
  • Perform the scaleup step using CVC(i.e Add the required pool entry under spec of CVC).

How Non-CSI volumes will scaleup/scaledown volume replicas:
ScaleUp and ScaleDown links.

Note:

  • This PR handles the PDB modifications according to scale up and scale down of volume replicas.
  • After performing scaling up/down the replica count will not be updated on CVC(treating it as initial replica count).
  • Scaling Down more than one replica at a time is not supported.

Release Notes:
Add support for scaling volume replicas for CSI Volumes via CVC(Single point edit to perform scaling operation).

Which issue this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close that issue when PR gets merged): fixes #

Special notes for your reviewer:

Checklist:

  • Fixes #
  • Labelled this PR & related issue with documentation tag
  • PR messages has document related information
  • Labelled this PR & related issue with breaking-changes tag
  • PR messages has breaking changes related information
  • Labelled this PR & related issue with requires-upgrade tag
  • PR messages has upgrade related information
  • Commit has unit tests
  • Commit has integration tests

Signed-off-by: mittachaitu <sai.chaithanya@mayadata.io>
cmd/cvc-operator/controller/cstorvolumeclaim.go Outdated Show resolved Hide resolved
cmd/cvc-operator/controller/cstorvolumeclaim.go Outdated Show resolved Hide resolved
cmd/cvc-operator/controller/cvc_controller.go Show resolved Hide resolved
cmd/cvc-operator/controller/cstorvolumeclaim.go Outdated Show resolved Hide resolved
cmd/cvc-operator/controller/cvc_controller.go Outdated Show resolved Hide resolved
Signed-off-by: mittachaitu <sai.chaithanya@mayadata.io>
@mittachaitu mittachaitu changed the title [WIP] feat(cvc-operator): add automatic scaling of volumereplicas via CVC [WIP] feat(cvc-operator): add automatic scaling of volumereplicas for CSI volumes Feb 12, 2020
@mittachaitu mittachaitu changed the title [WIP] feat(cvc-operator): add automatic scaling of volumereplicas for CSI volumes feat(cvc-operator): add automatic scaling of volumereplicas for CSI volumes Feb 12, 2020
Signed-off-by: mittachaitu <sai.chaithanya@mayadata.io>
@mittachaitu mittachaitu requested review from prateekpandey14, payes, shubham14bajpai and sonasingh46 and removed request for payes February 12, 2020 14:24
Signed-off-by: mittachaitu <sai.chaithanya@mayadata.io>
@mittachaitu mittachaitu requested a review from payes February 12, 2020 14:34
Signed-off-by: mittachaitu <sai.chaithanya@mayadata.io>
Signed-off-by: mittachaitu <sai.chaithanya@mayadata.io>
@@ -375,7 +384,10 @@ func (c *CVCController) distributeCVRs(
for count, pool := range usablePoolList.Items {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

poolList, err := listCStorPools(cspcName, claim.Spec.ReplicaCount)
Just list the cstor pools in listCStorPools, and have the check on count after getting usable pool list against pendingReplicaCount rather than spec.ReplicaCount

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changing name of ReplicaCount?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

'if else' code at if claim.Spec.CstorVolumeSource != "" { need to be rewritten

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added the check other two comments will be fixed while moving the code to cstor-operators repo.

@@ -297,6 +313,8 @@ func (c *CVCController) createVolumeOperation(cvc *apis.CStorVolumeClaim) (*apis
cvc.Spec.Policy = volumePolicy.Spec
cvc.Status.Phase = apis.CStorVolumeClaimPhaseBound
cvc.Status.Capacity = cvc.Spec.Capacity
// update volume replica pool information on cvc spec and status
addReplicaPoolInfo(cvc, poolNames)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how about changing it as receiver function like cvc.addReplicaPoolInfo(poolNames)

// names and current replica pool names
// 1. Below function will check whether there is any change in desired replica
// pool names and current replica pool names.
func (c *CVCController) isCVCScalePending(cvc *apis.CStorVolumeClaim) bool {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add a check that cvc is in bound state.. if not, return false

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and also add comment that scale up/down of cvc will not work until cvc is in bound state

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not required to add a check. Resize, scale Up/Down will execute only once CVC is bound.

} else if len(cvc.Spec.Policy.ReplicaPool.PoolInfo) < len(cvc.Status.PoolInfo) {
err = scaleDownVolumeReplicas(cvc)
} else {
c.recorder.Event(cvc, corev1.EventTypeWarning, "Migration",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should this limitation get into webhooks?

Copy link
Author

@mittachaitu mittachaitu Feb 20, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Once the webhook for CVC is ready we will have following checks

  • Repetition of pool names either under spec or status shouldn't be there.
  • InitialReplicaCount shouldn't be modified.
  • The existing pool name can't be updated with a new pool name(Which is migration case).
  • Not allow scale down/up of volume replicas when one other in progress.
  • Not allow more that one replica at a time for scale down.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Failure of scale up/down if added pool is not healthy ? ( but admin/user shouldn't be providing the unhealthy pool name in first place)

Copy link
Contributor

@vishnuitta vishnuitta left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lets make replicaPool as replicaPoolInfo

Signed-off-by: mittachaitu <sai.chaithanya@mayadata.io>
return errors.Wrapf(err, "failed to get cstorvolumes object %s", cvc.Name)
}
if cvObj.Spec.DesiredReplicationFactor < drCount {
cvObj.Spec.DesiredReplicationFactor = drCount
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we consider picking replicationCount from CVC instead of CV for cstor-volume-mgmt?
This will avoid maintaining this info at multiple places. So, cstor-volume-mgmt will do like:

  • pick the cvc.spec.ReplicaCount if len of ReplicaPoolInfo is zero
  • else consider replicationFactor as len of ReplicaPoolInfo if it is present
    what steps does cstor-vol-mgmt performs on noticing change in DRF?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried to find from code, but unable to find other than scale down

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this info on how cstor-vol-mgmt works helps in review

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will update PR with relevant pr's

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Author

@mittachaitu mittachaitu Feb 25, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

} else if len(cvc.Spec.Policy.ReplicaPool.PoolInfo) < len(cvc.Status.PoolInfo) {
err = scaleDownVolumeReplicas(cvc)
} else {
c.recorder.Event(cvc, corev1.EventTypeWarning, "Migration",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Failure of scale up/down if added pool is not healthy ? ( but admin/user shouldn't be providing the unhealthy pool name in first place)

// If more than one replica was scale down at a time keep on return the error
if (cvObj.Spec.ReplicationFactor - drCount) > 1 {
return errors.Wrapf(err,
"cann't perform %d replicas scaledown at a time",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: can`t

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

("replica scale-down can not be performed more then 1 at a time, requested for %d", (cvObj.Spec.DesiredReplicationFactor - drCount) )

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't Wrapf the error here , it will nil , use errors.Errorf() instead.

}
cvrObj, err = getScaleDownCVR(cvc)
if err != nil && !k8serror.IsNotFound(err) {
return errors.Wrapf(err, "failed to get scale down CVR object")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we have error like --> ("failed to get cvr %d , requested for scale down operation", cvc.Name)

// created PDB name.
// 2. Update CVC label to point it to newely PDB got from above step and also
// replicas pool information on status of CVC.
func handlePostScalingProcess(cvc *apis.CStorVolumeClaim,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: func handlePostScalingOperations() ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

or use func updatePDBForScaledVolume () ?

}
for _, poolName := range newPoolNames {
if _, ok := replicaPoolMap[poolName]; ok {
scaledRPNames = append(scaledRPNames, poolName)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just return error as 'in-progress' if !ok and update the status when all the required CVRs are required

// HAVolume) and update the replica pool info on CVC(API calls).
// 2.2: If scalingUp was going then return error saying scalingUp was in
// progress.
func verifyAndUpdateScaleUpInfo(cvc *apis.CStorVolumeClaim, cvObj *apis.CStorVolume) error {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

change the function name as updateCVCWithScaledUpInfo

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

modifyCVCWithScaledUpInfo

klog.Errorf("%s", errorMsg)
continue
}
hash, err := hash.Hash(pvName + "-" + poolName)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it makes sense to follow same key for hash..

// created PDB name.
// 2. Update CVC label to point it to newely PDB got from above step and also
// replicas pool information on status of CVC.
func handlePostScalingProcess(cvc *apis.CStorVolumeClaim,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

change name to updatePDBForVolume

c.recorder.Eventf(cvc,
corev1.EventTypeNormal,
"ScalingVolumeReplicas",
"successfully scaled volume replicas to %d", len(cvc.Status.PoolInfo))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you need to make cvc point to new object

return errors.Wrapf(err, "failed to get cstorvolumes object %s", cvc.Name)
}
// If more than one replica was scale down at a time keep on return the error
if (cvObj.Spec.ReplicationFactor - drCount) > 1 {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this can get into webhook

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we need similar ones for scale up also

Copy link
Contributor

@vishnuitta vishnuitta Feb 25, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

don't allow scale down if DRF != RF

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

check for spec's and status's ReplicaInfo in webhook

var err error
cvcCopy := cvc.DeepCopy()
cvc.Status.PoolInfo = []string{}
cvc.Status.PoolInfo = append(cvc.Status.PoolInfo, currentRPNames...)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can remove it and let caller take care of this

mittachaitu added 2 commits February 25, 2020 21:54
Signed-off-by: mittachaitu <sai.chaithanya@mayadata.io>
Signed-off-by: mittachaitu <sai.chaithanya@mayadata.io>
pdbName, hasPDB := cvcObj.GetLabels()[string(apis.PodDisruptionBudgetKey)]
pdbLabels := cvclaim.GetPDBPoolLabels(cvcObj.Status.PoolInfo)
labelSelector := apispdb.GetPDBLabelSelector(pdbLabels)
func getUpdatePDBForVolume(cvcObj *apis.CStorVolumeClaim) (string, error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

getModifiedPDBForScaledVolume

pdbName, hasPDB := cvcObj.GetLabels()[string(apis.PodDisruptionBudgetKey)]
pdbLabels := cvclaim.GetPDBPoolLabels(cvcObj.Status.PoolInfo)
labelSelector := apispdb.GetPDBLabelSelector(pdbLabels)
func getUpdatePDBForVolume(cvcObj *apis.CStorVolumeClaim) (string, error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

getModifiedPDBForScaledVolume

Copy link
Contributor

@vishnuitta vishnuitta left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changes are good

Signed-off-by: mittachaitu <sai.chaithanya@mayadata.io>
Copy link
Contributor

@prateekpandey14 prateekpandey14 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor nit pick.
/lgtm

Signed-off-by: mittachaitu <sai.chaithanya@mayadata.io>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
pr/release-note PR should be included in release notes
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants