-
Notifications
You must be signed in to change notification settings - Fork 40k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix race condition in setting node statusUpdateNeeded flag #32807
Fix race condition in setting node statusUpdateNeeded flag #32807
Conversation
Removing this from 1.4 milestone per offline discussion. We will get it merged, give it time to bake, and merge it into a 1.4.x release. |
dfeb259
to
25a0678
Compare
GCE e2e build/test passed for commit 25a0678. |
ResetNodeStatusUpdateNeeded(nodeName string) error | ||
// node to true indicating the AttachedVolume field of the Node's Status | ||
// object needs to be updated by the node updater again. | ||
ResetNodeStatusUpdateNeeded(nodeName string) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Revert the If no node with the...
portion of the comment since it is still applicable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here I remove the returned error because I could not see the use of it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This method can result in a failed to setNodeStatusUpdateNeeded
error if nodeName does not exist. I like documentation comments to capture 1) what the input is, 2) what the normal output is, and 3) what results in an error. So I would leave the comment as is. That said, I'll leave it up to you to decide.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
update the comments
ResetNodeStatusUpdateNeeded(nodeName string) error | ||
// node to true indicating the AttachedVolume field of the Node's Status | ||
// object needs to be updated by the node updater again. | ||
ResetNodeStatusUpdateNeeded(nodeName string) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SInce this method now sets the value of statusUpdatedNeeded
to true
instead of false
, change the name to reflect the behavior: SetNodeStatusUpdateNeeded(...)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A couple minor comments, otherwise LGTM
@saad-ali PTAL |
1 similar comment
@saad-ali PTAL |
25a0678
to
d696d45
Compare
"failed to ResetNodeStatusUpdateNeeded(nodeName=%q) nodeName does not exist", | ||
// should not happen | ||
glog.Errorf( | ||
"failed to setNodeStatusUpdateNeeded(nodeName=%q) nodeName does not exist", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add needed bool
to error message for clarity.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
ResetNodeStatusUpdateNeeded(nodeName string) error | ||
// node to true indicating the AttachedVolume field of the Node's Status | ||
// object needs to be updated by the node updater again. | ||
ResetNodeStatusUpdateNeeded(nodeName string) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This method can result in a failed to setNodeStatusUpdateNeeded
error if nodeName does not exist. I like documentation comments to capture 1) what the input is, 2) what the normal output is, and 3) what results in an error. So I would leave the comment as is. That said, I'll leave it up to you to decide.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few more comments
// Update the flag statusUpdateNeeded to indicate whether node status is already updated or | ||
// needs to be updated again by the node status updater. | ||
// This is an internal function and caller should acquire and release the lock | ||
func (asw *actualStateOfWorld) setNodeStatusUpdateNeeded(nodeName string, needed bool) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: to avoid confusing this with the SetNodeStatusUpdateNeeded
which sets the value to true
, maybe rename it to modifyNodeStatusUpdateNeeded
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
This PR fixes the race condition in setting node statusUpdateNeeded flag in master's attachdetach controller. This flag is used to indicate whether a node status has been updated by the node_status_updater or not. When updater finishes update a node status, it is set to false. When the node status is changed such as volume is detached or new volume is attached to the node, the flag is set to true so that updater can update the status again. The previous workflow has a race condition as follows 1. updater gets the currently attached volume list from the node which needs to be updated. 2. A new volume A is attached to the same node right after 1 and set the flag to TRUE 3. updater updates the node attached volume list (which does not include volume A) and then set the flag to FALSE. The result is that volume A will be never added to the attached volume list so at node side, this volume is never attached. So in this PR, the flag is set to FALSE when updater tries to get the attached volume list (as in an atomic operation). So in the above example, after step 2, the flag will be TRUE again, in step 3, updater does not set the flag if updates is sucessful. So after that, flag is still TRUE and in next round of update, the node status will be updated. This PR also changes a unit test due to the workflow changes
d696d45
to
14cad20
Compare
@saad-ali PTAL |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
This PR should be cherry-picked to v1.4.1. |
Jenkins GCE e2e failed for commit 14cad20. Full PR test history. The magic incantation to run this job again is |
@k8s-bot gce e2e test this |
@k8s-bot test this [submit-queue is verifying that this PR is safe to merge] |
Automatic merge from submit-queue |
Adding |
Commit found in the "release-1.4" branch appears to be this PR. Removing the "cherrypick-candidate" label. If this is an error find help to get your PR picked. |
Now that this has been cherry-picked to the 1.4 branch (for 1.4.1), let's also cherry-pick it to the 1.3 branch (for 1.3.9). |
…ck-of-#32807-upstream-release-1.4 Automatic merge from submit-queue Automated cherry pick of kubernetes#32807 Cherry pick of kubernetes#32807 on release-1.4.
Commit found in the "release-1.3" branch appears to be this PR. Removing the "cherrypick-candidate" label. If this is an error find help to get your PR picked. |
This PR fixes the race condition in setting node statusUpdateNeeded flag
in master's attachdetach controller. This flag is used to indicate
whether a node status has been updated by the node_status_updater or
not. When updater finishes update a node status, it is set to false.
When the node status is changed such as volume is detached or new volume
is attached to the node, the flag is set to true so that updater can
update the status again. The previous workflow has a race condition as
follows
updated.
flag to TRUE
The result is that volume A will be never added to the attached volume
list so at node side, this volume is never attached.
So in this PR, the flag is set to FALSE when updater tries to get the
attached volume list (as in an atomic operation). So in the above
example, after step 2, the flag will be TRUE again, in step 3, updater
does not set the flag if updates is sucessful. So after that, flag is
still TRUE and in next round of update, the node status will be updated.
This change is