-
Notifications
You must be signed in to change notification settings - Fork 7.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve wle reg/unreg robustness #28354
Conversation
pilot/pkg/xds/workloadentry.go
Outdated
wle.Annotations[DisconnectedAtAnnotation] = strconv.FormatInt(time.Now().UnixNano(), 10) | ||
_, err := sg.Store.Update(wle) | ||
_, err := sg.Store.Patch(gvk.WorkloadEntry, entryName, proxy.Metadata.Namespace, func(cfg config.Config) config.Config { | ||
delete(cfg.Annotations, WorkloadControllerAnnotation) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think patch here is intended so if another Istiod' has already taken ownership we do not overwrite.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right about the "so we don't overwrite", but we want to use "Update" since that fails on conflict.
lgtm except for using Patch in the unregister flow. That's probably what's failing the test. |
The ut failure is because the time format precision is not enough. Not related to patch |
/retest |
The race still stands.. when disconnecting and reconnecting to a different pilot, we may end up un-setting the label.. we need to use update on unregister |
I do not see any bad result, the "istio.io/disconnectedAt" is just a indication the latest disconnect timestamp. When it reconnects to a new pilot, this annotation is always kept, and it does not influence the cleanup either |
The delayed cleanup logic checks if we have the controller annotation set. That patch will unset that label. |
Hmm, you mean |
Yes. Eventually we won't need to worry about that.. we've talked about doing a heartbeat based implementation that won't require it. It handles an edge case where istiod crashes and the associated proxy never re-connects, we'll have a hanging WorkloadEntry. See a comment from Shriram in the design doc if you're curious. As far as this PR is concerned, we do use the workloadController annotation to decide whether or not to cleanup and use it to indicate that the WorkloadEntry has its associated proxy still connected to some istiod instance. |
Yeah, Let me fix it. Should be easy, we can check if this wle belongs to me, i will do cleanup. Otherwise, leaves it. |
/test integ-pilot-k8s-tests_istio |
/retest |
* [release-1.8] Improve wle reg/unreg robustness (#28354) * [release-1.8] vm auto-registration: handle changes to WorkloadGroup (#28804) * [release-1.8] merge labels and annotations from workloadgroup.spec.metadata (#29164) * merge labels and annotations from workloadgroup.sepc.metadata * fix lint * nil checks * imports * more nil checks (cherry picked from commit f5f5f78) * remove undefined var from cherry-pick
When
RegisterWorkload
fails(maybe workloadgroup not exist), close the stream and make retry originated from sidecar. This is to make istiod stateless in this case.use patch instead of update for unregister
make connect/disconnect time readable