Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve messaging on volume expansion #58415

Merged

Conversation

gnufied
Copy link
Member

@gnufied gnufied commented Jan 17, 2018

  • we now provide clear message to user what to do when cloudprovider resizing is finished
    and file system resizing is needed.
  • add a event when resizing is successful
  • Use PATCH both in controller-manager and kubelet for updating PVC status
  • Remove code duplication between controller-manager and kubelet for updating PVC status
  • Only remove conditions that are managed by resize controller
Improve messages user gets during and after volume resizing is done.

@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Jan 17, 2018
@k8s-github-robot k8s-github-robot added the kind/api-change Categorizes issue or PR as related to adding, removing, or otherwise changing an API label Jan 17, 2018
@gnufied
Copy link
Member Author

gnufied commented Jan 17, 2018

/sig storage

cc @jsafrane @saad-ali

@k8s-ci-robot k8s-ci-robot added the sig/storage Categorizes an issue or PR as relevant to SIG Storage. label Jan 17, 2018
@gnufied
Copy link
Member Author

gnufied commented Jan 18, 2018

/test pull-kubernetes-unit

@jsafrane
Copy link
Member

/assign

  • What removes the condition? I see it's only set and never removed / set to False.
  • What is the UX? Will kubect describe pvc show the condition?
  • An e2e test update would help :-)

@gnufied
Copy link
Member Author

gnufied commented Jan 18, 2018

The condition is removed here - https://github.com/kubernetes/kubernetes/blob/master/pkg/volume/util/operationexecutor/operation_generator.go#L1372

Currently we don't seem to be showing Conditions in pvc describe. Will fix this in a follow up PR.

e2e test.. okay, will add. :-)

@jsafrane
Copy link
Member

The condition is removed here

Ack

Currently we don't seem to be showing Conditions in pvc describe. Will fix this in a follow up PR.

Ack

Type: v1.PersistentVolumeClaimFileSystemResizePending,
Status: v1.ConditionTrue,
LastTransitionTime: metav1.Now(),
Message: "Restart pod that is using this claim to finish File system resize on node",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO, it should say why the condition is set. "Waiting for user to (re-)start a pod to finish resize of the volume" or something in this direction?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

@gnufied gnufied force-pushed the fix-volume-resize-messages branch from fa751f6 to 57efdce Compare January 18, 2018 17:21
@jsafrane
Copy link
Member

/lgtm
for the controller changes Need API review/approvals.

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jan 18, 2018
@gnufied
Copy link
Member Author

gnufied commented Jan 18, 2018

/test pull-kubernetes-e2e-kops-aws

@gnufied
Copy link
Member Author

gnufied commented Jan 19, 2018

Flaked on #58387

/test pull-kubernetes-e2e-kops-aws

@gnufied
Copy link
Member Author

gnufied commented Jan 19, 2018

/test pull-kubernetes-e2e-kops-aws

@liggitt
Copy link
Member

liggitt commented Jan 19, 2018

The condition is removed here - https://github.com/kubernetes/kubernetes/blob/master/pkg/volume/util/operationexecutor/operation_generator.go#L1372

We cannot blindly remove/reset all conditions. Anywhere this flow is doing that needs to be corrected. I saw at least these places:

pvcCopy.Status.Conditions = []v1.PersistentVolumeClaimCondition{}

emptyCondition := []v1.PersistentVolumeClaimCondition{}
err := resizeMap.updatePVCCapacityAndConditions(pvcr, newSize, emptyCondition)

updatePVCCapacityAndConditions should be modified so that it doesn't encourage wholesale replacement of status conditions

  1. Only add/remove/modify conditions you explicitly know about. That maintains interoperability with other components adding conditions for their own purposes.
  2. If you are using conditions to coordinate between multiple components, be aware of the maximum version skew between those components (a controller setting a new condition and expecting it to be updated by a kubelet has to consider that kubelets up to 2 releases old will be processing this object... in this case, it might be accurate for the condition to continue reflecting a pending resize if an old kubelet wasn't paying any attention to the resize). Skew is less of an issue for features when they're in alpha. The point at which a feature goes beta, skew has to be considered carefully.

The kubelet/node condition updating code is a good example to look at:

// Check if NodeMemoryPressure condition already exists and if it does, just pick it up for update.
for i := range node.Status.Conditions {
if node.Status.Conditions[i].Type == v1.NodeMemoryPressure {
condition = &node.Status.Conditions[i]
}
}
newCondition := false
// If the NodeMemoryPressure condition doesn't exist, create one
if condition == nil {
condition = &v1.NodeCondition{
Type: v1.NodeMemoryPressure,
Status: v1.ConditionUnknown,
}
// cannot be appended to node.Status.Conditions here because it gets
// copied to the slice. So if we append to the slice here none of the
// updates we make below are reflected in the slice.
newCondition = true
}

It updates a specific condition, leaving all others undisturbed.

@gnufied gnufied force-pushed the fix-volume-resize-messages branch from 57efdce to 168f39f Compare January 22, 2018 21:32
@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Jan 22, 2018
@k8s-github-robot k8s-github-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jan 22, 2018
@gnufied gnufied force-pushed the fix-volume-resize-messages branch from 168f39f to 18c9a5b Compare January 22, 2018 21:45
@gnufied
Copy link
Member Author

gnufied commented Jan 22, 2018

@liggitt I have updated the code, so as it only considers Conditions that volume expand controller explicitly knows about. Also, I have updated PVC update to use PATCH in both controller-manager and kubelet while I was at it. PTAL

@gnufied gnufied force-pushed the fix-volume-resize-messages branch from 18c9a5b to 0a4a593 Compare January 22, 2018 22:43
@gnufied
Copy link
Member Author

gnufied commented Jan 22, 2018

/retest

@gnufied gnufied force-pushed the fix-volume-resize-messages branch from 0a4a593 to 4d8bf53 Compare January 23, 2018 01:46
@childsb childsb added the priority/critical-urgent Highest priority. Must be actively worked on as someone's top priority right now. label Jan 25, 2018
@k8s-github-robot
Copy link

[MILESTONENOTIFIER] Milestone Pull Request Needs Approval

@davidz627 @gnufied @jsafrane @kubernetes/sig-storage-misc

Action required: This pull request must have the status/approved-for-milestone label applied by a SIG maintainer.

Pull Request Labels
  • sig/storage: Pull Request will be escalated to these SIGs if needed.
  • priority/critical-urgent: Never automatically move pull request out of a release milestone; continually escalate to contributor and SIG through all available channels.
  • kind/bug: Fixes a bug discovered during the current release.
Help

for _, condition := range oldConditions {
// If Condition is of not resize type, we keep it.
if _, ok := knownResizeConditions[condition.Type]; !ok {
newConditions = append(newConditions, condition)
Copy link
Member

@liggitt liggitt Jan 26, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this reorders conditions, which we don't want. see https://github.com/kubernetes/kubernetes/blob/master/pkg/kubelet/kubelet_node_status.go#L770-L788 for an example of in-place updating of an existing condition, then appending to the end if no existing condition is found

  1. if the condition already exists and already has the same status, there should be no change
  2. if the condition already exists with a different status, the status and timestamp should update
  3. if the condition does not exist, it should get appended to the end

only in case 2 and 3 should we attempt an update of the object

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I fixed this. I did not quite use same approach as code you linked but the end result should be same.

@liggitt
Copy link
Member

liggitt commented Jan 26, 2018

once the condition reordering/updating is worked out, I think this looks right, though I'm having trouble following where each of the two conditions are added/removed (controller-side or kubelet-side). Will defer to the storage reviewer for those details.

@gnufied gnufied force-pushed the fix-volume-resize-messages branch from b90cb09 to 061ed90 Compare January 26, 2018 22:22
{
Type: v1.PersistentVolumeClaimResizing,
Status: v1.ConditionTrue,
LastTransitionTime: metav1.Now(),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add time to this so it is distinctly different than currentTime

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

@liggitt
Copy link
Member

liggitt commented Jan 27, 2018

one nit on test, condition handling looks good to me otherwise. will defer to storage reviewer for lgtm on overall flow.
/approve

- we now provide clear message to user what to do when cloudprovider resizing is finished
  and file system resizing is needed.
- add a event when resizing is successful.
- Use Patch for updating PVCs in both kubelet and controller-manager
- Extract updating pvc util function in one place.
- Only update resize conditions on progress
@gnufied gnufied force-pushed the fix-volume-resize-messages branch from 061ed90 to 1fa8cbc Compare January 29, 2018 20:08
@gnufied
Copy link
Member Author

gnufied commented Jan 29, 2018

@jsafrane can you PTAL.

@gnufied
Copy link
Member Author

gnufied commented Jan 29, 2018

@liggitt fixed the minor nit, you pointed out in test case.

/assign @derekwaynecarr

@childsb
Copy link
Contributor

childsb commented Jan 29, 2018

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jan 29, 2018
@gnufied
Copy link
Member Author

gnufied commented Feb 2, 2018

ping @derekwaynecarr @smarterclayton , it would be really nice to have this approved. The pending approval in pkg/kubelet is really minor and just adds an event.

@gnufied
Copy link
Member Author

gnufied commented Feb 2, 2018

/assign @smarterclayton

@derekwaynecarr
Copy link
Member

kubelet changes lgtm.

/approve

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: childsb, derekwaynecarr, gnufied, jsafrane, liggitt

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these OWNERS Files:

You can indicate your approval by writing /approve in a comment
You can cancel your approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Feb 6, 2018
@k8s-github-robot
Copy link

/test all [submit-queue is verifying that this PR is safe to merge]

@gnufied
Copy link
Member Author

gnufied commented Feb 6, 2018

/test pull-kubernetes-e2e-kops-aws

@k8s-github-robot
Copy link

Automatic merge from submit-queue (batch tested with PRs 52942, 58415). If you want to cherry-pick this change to another branch, please follow the instructions here.

@k8s-github-robot k8s-github-robot merged commit 4bd22b5 into kubernetes:master Feb 6, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/api-change Categorizes issue or PR as related to adding, removing, or otherwise changing an API kind/bug Categorizes issue or PR as related to a bug. lgtm "Looks good to me", indicates that a PR is ready to be merged. priority/critical-urgent Highest priority. Must be actively worked on as someone's top priority right now. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/storage Categorizes an issue or PR as relevant to SIG Storage. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants