-
Notifications
You must be signed in to change notification settings - Fork 39.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Allow nodeName updates when endPoint is updated. #68575
Conversation
testing? |
/ok-to-test |
this needs a release note |
859c0c6
to
f59607f
Compare
Verified via kubectl that nodeNames can now be changed for endpoints. Without my change, I see the error: |
/assign @freehan |
/assign @thockin for approval |
@@ -12968,8 +12968,8 @@ func TestEndpointAddressNodeNameUpdateRestrictions(t *testing.T) { | |||
// Check that NodeName cannot be changed during update (if already set) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/cannot/can
f59607f
to
93fbc8e
Compare
@@ -12965,11 +12965,11 @@ func newNodeNameEndpoint(nodeName string) *core.Endpoints { | |||
func TestEndpointAddressNodeNameUpdateRestrictions(t *testing.T) { | |||
oldEndpoint := newNodeNameEndpoint("kubernetes-node-setup-by-backend") | |||
updatedEndpoint := newNodeNameEndpoint("kubernetes-changed-nodename") | |||
// Check that NodeName cannot be changed during update (if already set) | |||
// Check that NodeName can be changed during update |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add a comment:
This is to accommodate the case where node IP changed or pod CIDR changed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit:add the comment
@@ -5134,19 +5134,6 @@ func buildEndpointAddressNodeNameMap(subsets []core.EndpointSubset) map[string]s | |||
return ipToNodeName |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you can also get rid of the helper function that constructs ipToNode.
93fbc8e
to
f252b1d
Compare
One scenario where nodeName can change for the same ip address is if the endpoints are in hostNetwork mode and nodes are being added/deleted. With the current validation check, if endpoints controller misses a pod delete event, future endpoint updates will never succeed. removed unused helper functions
f252b1d
to
e588ae5
Compare
/retest |
/lgtm |
/assign @thockin for approval |
@freehan: GitHub didn't allow me to assign the following users: for, approval. Note that only kubernetes members and repo collaborators can be assigned. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
So, to trigger this bug you have to have an pod on node X with IP A, node X goes away, node Y comes up. Controller manager misses the old pod going away. Pod gets rescheduled onto Y and gets address A again? That seems pretty niche - am I getting it right? |
That is one case. You can also trigger it with service endpoints in hostNetwork mode and nodes going up and down - autoscaling or other reasons. If there are a lot of these changes, endpoints controller can miss the endpoint deletion events. |
This bug triggers more often with preempt-able VMs + pods with host network. |
Missing events is a MAJOR bug. If we can demonstrate that, we should really REALLY fix that. That said, I am going to OK this. I think it was pedantic, and should be OK. /lgtm |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: freehan, prameshj, thockin The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
…75-upstream-release-1.12 Automated cherry pick of #68575: Allow nodeName updates when endPoint is updated.
Confirmed to be hitting this by having scale down events (removal of nodes) in v1.10.5. Thanks for the fix! |
Thanks for the fix ! I think this also solves the issue referenced here #69668 . I am not able to confirm as we use kubernetes through GCP and they have not yet released |
Any reason not to pick this to 1.10 and 1.11 as well? |
Not reacting to individual events is possible, and I don't think it's a bug. The validation prior to this PR depended on the endpoints controller first deleting the IP+old-node from the endpoints object, then separately adding the IP+new-node. That's an edge-triggered behavior, and we want to be level-triggered, so that the following works:
To accomodate that, loosening the validation to allow a change is correct. |
opened picks to 1.10 and 1.11 |
…5-upstream-release-1.11 Automated cherry pick of #68575: Allow nodeName updates when endPoint is updated.
…5-upstream-release-1.10 Automated cherry pick of #68575: Allow nodeName updates when endPoint is updated.
This change removes a validation check that disallows nodeName change in a service endpoint update.
One scenario where nodeName can change for the same ip address is if the endpoints are in hostNetwork mode and nodes are being added/deleted.
With the current validation check, if endpoints controller misses a pod delete event, future endpoint updates will never succeed. This will result in stale
endpoints for the service.
What this PR does / why we need it:
Which issue(s) this PR fixes (optional, in
fixes #<issue number>(, fixes #<issue_number>, ...)
format, will close the issue(s) when PR gets merged):Fixes #66720
Special notes for your reviewer:
Release note: