feat(cvc-operator): add automatic scaling of volumereplicas for CSI volumes #1613
Conversation
Signed-off-by: mittachaitu <sai.chaithanya@mayadata.io>
@@ -375,7 +384,10 @@ func (c *CVCController) distributeCVRs(
for count, pool := range usablePoolList.Items {
poolList, err := listCStorPools(cspcName, claim.Spec.ReplicaCount)
Just list the cstor pools in listCStorPools, and have the count check after getting the usable pool list, against pendingReplicaCount rather than spec.ReplicaCount.
How about changing the name of ReplicaCount?
The 'if else' code at `if claim.Spec.CstorVolumeSource != "" {` needs to be rewritten.
Added the check; the other two comments will be addressed while moving the code to the cstor-operators repo.
@@ -297,6 +313,8 @@ func (c *CVCController) createVolumeOperation(cvc *apis.CStorVolumeClaim) (*apis
cvc.Spec.Policy = volumePolicy.Spec
cvc.Status.Phase = apis.CStorVolumeClaimPhaseBound
cvc.Status.Capacity = cvc.Spec.Capacity
// update volume replica pool information on cvc spec and status
addReplicaPoolInfo(cvc, poolNames)
How about changing it to a receiver function, like cvc.addReplicaPoolInfo(poolNames)?
// names and current replica pool names
// 1. Below function will check whether there is any change in desired replica
// pool names and current replica pool names.
func (c *CVCController) isCVCScalePending(cvc *apis.CStorVolumeClaim) bool {
Add a check that the CVC is in bound state; if not, return false.
And also add a comment that scale up/down of the CVC will not work until the CVC is in bound state.
Not required to add a check. Resize and scale up/down will execute only once the CVC is bound.
} else if len(cvc.Spec.Policy.ReplicaPool.PoolInfo) < len(cvc.Status.PoolInfo) {
err = scaleDownVolumeReplicas(cvc)
} else {
c.recorder.Event(cvc, corev1.EventTypeWarning, "Migration",
should this limitation get into webhooks?
Once the webhook for CVC is ready, we will have the following checks:
- Repetition of pool names, either under spec or status, shouldn't be there.
- InitialReplicaCount shouldn't be modified.
- An existing pool name can't be updated with a new pool name (which is the migration case).
- Don't allow scale down/up of volume replicas when another one is in progress.
- Don't allow more than one replica at a time for scale down.
- Failure of scale up/down if the added pool is not healthy? (But the admin/user shouldn't be providing an unhealthy pool name in the first place.)
Let's rename replicaPool to replicaPoolInfo.
return errors.Wrapf(err, "failed to get cstorvolumes object %s", cvc.Name)
}
if cvObj.Spec.DesiredReplicationFactor < drCount {
cvObj.Spec.DesiredReplicationFactor = drCount
Should we consider picking the replication count from the CVC instead of the CV for cstor-volume-mgmt?
This will avoid maintaining this info in multiple places. So, cstor-volume-mgmt would do:
- pick cvc.spec.ReplicaCount if the length of ReplicaPoolInfo is zero
- else take the replicationFactor as the length of ReplicaPoolInfo, if it is present
What steps does cstor-volume-mgmt perform on noticing a change in DRF?
I tried to find it from the code, but was unable to find anything other than scale down.
This info on how cstor-volume-mgmt works would help the review.
I will update the PR with the relevant PRs.
Linked the scale-up function in cstor-volume-mgmt and updated the PR description with the relevant PRs.
} else if len(cvc.Spec.Policy.ReplicaPool.PoolInfo) < len(cvc.Status.PoolInfo) {
err = scaleDownVolumeReplicas(cvc)
} else {
c.recorder.Event(cvc, corev1.EventTypeWarning, "Migration",
- Failure of scale up/down if the added pool is not healthy? (But the admin/user shouldn't be providing an unhealthy pool name in the first place.)
// If more than one replica was scale down at a time keep on return the error
if (cvObj.Spec.ReplicationFactor - drCount) > 1 {
return errors.Wrapf(err,
"cann't perform %d replicas scaledown at a time",
nit: can`t
("replica scale-down can not be performed more then 1 at a time, requested for %d", (cvObj.Spec.DesiredReplicationFactor - drCount) )
Don't Wrapf the error here, it will be nil; use errors.Errorf() instead.
}
cvrObj, err = getScaleDownCVR(cvc)
if err != nil && !k8serror.IsNotFound(err) {
return errors.Wrapf(err, "failed to get scale down CVR object")
Can we have an error like ("failed to get cvr %s, requested for scale down operation", cvc.Name)?
// created PDB name.
// 2. Update CVC label to point it to newely PDB got from above step and also
// replicas pool information on status of CVC.
func handlePostScalingProcess(cvc *apis.CStorVolumeClaim,
nit: func handlePostScalingOperations()?
Or use func updatePDBForScaledVolume()?
}
for _, poolName := range newPoolNames {
if _, ok := replicaPoolMap[poolName]; ok {
scaledRPNames = append(scaledRPNames, poolName)
Just return an error as 'in progress' if !ok, and update the status once all the required CVRs are present.
// HAVolume) and update the replica pool info on CVC(API calls).
// 2.2: If scalingUp was going then return error saying scalingUp was in
// progress.
func verifyAndUpdateScaleUpInfo(cvc *apis.CStorVolumeClaim, cvObj *apis.CStorVolume) error {
Change the function name to updateCVCWithScaledUpInfo.
modifyCVCWithScaledUpInfo
klog.Errorf("%s", errorMsg)
continue
}
hash, err := hash.Hash(pvName + "-" + poolName)
It makes sense to follow the same key for the hash.
// created PDB name.
// 2. Update CVC label to point it to newely PDB got from above step and also
// replicas pool information on status of CVC.
func handlePostScalingProcess(cvc *apis.CStorVolumeClaim,
Change the name to updatePDBForVolume.
c.recorder.Eventf(cvc,
corev1.EventTypeNormal,
"ScalingVolumeReplicas",
"successfully scaled volume replicas to %d", len(cvc.Status.PoolInfo))
You need to make cvc point to the new object.
return errors.Wrapf(err, "failed to get cstorvolumes object %s", cvc.Name)
}
// If more than one replica was scale down at a time keep on return the error
if (cvObj.Spec.ReplicationFactor - drCount) > 1 {
This can go into the webhook.
We need similar checks for scale up also.
Don't allow scale down if DRF != RF.
Check the spec's and status's ReplicaInfo in the webhook.
var err error
cvcCopy := cvc.DeepCopy()
cvc.Status.PoolInfo = []string{}
cvc.Status.PoolInfo = append(cvc.Status.PoolInfo, currentRPNames...)
You can remove this and let the caller take care of it.
pdbName, hasPDB := cvcObj.GetLabels()[string(apis.PodDisruptionBudgetKey)]
pdbLabels := cvclaim.GetPDBPoolLabels(cvcObj.Status.PoolInfo)
labelSelector := apispdb.GetPDBLabelSelector(pdbLabels)
func getUpdatePDBForVolume(cvcObj *apis.CStorVolumeClaim) (string, error) {
getModifiedPDBForScaledVolume
pdbName, hasPDB := cvcObj.GetLabels()[string(apis.PodDisruptionBudgetKey)]
pdbLabels := cvclaim.GetPDBPoolLabels(cvcObj.Status.PoolInfo)
labelSelector := apispdb.GetPDBLabelSelector(pdbLabels)
func getUpdatePDBForVolume(cvcObj *apis.CStorVolumeClaim) (string, error) {
getModifiedPDBForScaledVolume
Changes look good.
Minor nitpick.
/lgtm
Signed-off-by: mittachaitu sai.chaithanya@mayadata.io
What this PR does / why we need it:
This PR adds a feature to automatically ScaleUp/ScaleDown the volume replicas for CSI volumes.
Current CVC:
After creating a PVC, the above CVC will be created in the openebs namespace.
How the user/admin can perform scaling-up volume replicas:
In the above CVC, cvc.spec.policy.replicaPool.poolInfo shows the volume replica pool information. To scale up volume replicas, just add a pool entry under the spec of the CVC, i.e. spec.policy.replicaPool.poolInfo.
I.e. just change from
to
The above change adds the pool entry cstor-sparse-cspc-g54z. The rest will be taken care of by the CVC controller.
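Illustratively, the spec edit looks like the fragment below (the surrounding CVC YAML is not shown here; cstor-sparse-cspc-wgwf is one of the pool names used elsewhere in this description, and the poolInfo path is the one named above):

```yaml
spec:
  policy:
    replicaPool:
      poolInfo:
      - cstor-sparse-cspc-wgwf
      - cstor-sparse-cspc-g54z   # newly added entry triggers scale-up
```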
How the user/admin can perform scaling-down of volume replicas:
Same as above: to scale down volume replicas, just remove a pool entry under the spec of the CVC, i.e. spec.policy.replicaPool.poolInfo.
to
The above change removes the pool entry cstor-sparse-cspc-wgwf. The rest will be taken care of by the CVC controller.
How the user/admin can know the process has completed:
The user/admin can verify from the CVC events, or from the status of the CVC, which shows the current volume replica pool names, i.e. cvc.Status.PoolInfo.
How the events will look:
How the user/admin can perform the migration of volume replicas from one pool to another pool:
How non-CSI volumes scale up/down volume replicas:
See the ScaleUp and ScaleDown links.
Note:
Release Notes:
Add support for scaling volume replicas for CSI Volumes via CVC(Single point edit to perform scaling operation).
Which issue this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close that issue when PR gets merged): fixes #

Special notes for your reviewer:
Checklist:
- documentation tag
- breaking-changes tag
- requires-upgrade tag