Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve wle reg/unreg robustness #28354

Merged
merged 5 commits into from
Nov 4, 2020
Merged

Conversation

hzxuzhonghu
Copy link
Member

  • When RegisterWorkload fails(maybe workloadgroup not exist), close the stream and make retry originated from sidecar. This is to make istiod stateless in this case.

  • use patch instead of update for unregister

  • make connect/disconnect time readable

@hzxuzhonghu hzxuzhonghu requested a review from a team as a code owner October 28, 2020 09:53
@google-cla google-cla bot added the cla: yes Set by the Google CLA bot to indicate the author of a PR has signed the Google CLA. label Oct 28, 2020
@istio-testing istio-testing added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label Oct 28, 2020
wle.Annotations[DisconnectedAtAnnotation] = strconv.FormatInt(time.Now().UnixNano(), 10)
_, err := sg.Store.Update(wle)
_, err := sg.Store.Patch(gvk.WorkloadEntry, entryName, proxy.Metadata.Namespace, func(cfg config.Config) config.Config {
delete(cfg.Annotations, WorkloadControllerAnnotation)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think patch here is intended so if another Istiod' has already taken ownership we do not overwrite.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right about the "so we don't overwrite", but we want to use "Update" since that fails on conflict.

@stevenctl
Copy link
Contributor

lgtm except for using Patch in the unregister flow. That's probably what's failing the test.

@stevenctl stevenctl added the release-notes-none Indicates a PR that does not require release notes. label Oct 28, 2020
@hzxuzhonghu
Copy link
Member Author

The ut failure is because the time format precision is not enough. Not related to patch

@hzxuzhonghu
Copy link
Member Author

/retest

@stevenctl
Copy link
Contributor

The race still stands.. when disconnecting and reconnecting to a different pilot, we may end up un-setting the label.. we need to use update on unregister

@hzxuzhonghu
Copy link
Member Author

The race still stands.. when disconnecting and reconnecting to a different pilot, we may end up un-setting the label.

I do not see any bad result, the "istio.io/disconnectedAt" is just a indication the latest disconnect timestamp. When it reconnects to a new pilot, this annotation is always kept, and it does not influence the cleanup either

@stevenctl
Copy link
Contributor

The delayed cleanup logic checks if we have the controller annotation set. That patch will unset that label.

@hzxuzhonghu
Copy link
Member Author

The delayed cleanup logic checks if we have the controller annotation set. That patch will unset that label.

Hmm, you mean "istio.io/workloadController".

@stevenctl
Copy link
Contributor

Yes. Eventually we won't need to worry about that.. we've talked about doing a heartbeat based implementation that won't require it. It handles an edge case where istiod crashes and the associated proxy never re-connects, we'll have a hanging WorkloadEntry. See a comment from Shriram in the design doc if you're curious.

As far as this PR is concerned, we do use the workloadController annotation to decide whether or not to cleanup and use it to indicate that the WorkloadEntry has its associated proxy still connected to some istiod instance.

@hzxuzhonghu
Copy link
Member Author

Yeah, Let me fix it. Should be easy, we can check if this wle belongs to me, i will do cleanup. Otherwise, leaves it.

@hzxuzhonghu
Copy link
Member Author

/test integ-pilot-k8s-tests_istio

@stevenctl
Copy link
Contributor

/retest

@istio-testing istio-testing merged commit b647ea4 into istio:master Nov 4, 2020
@hzxuzhonghu hzxuzhonghu deleted the wle-reg branch November 5, 2020 02:22
stevenctl pushed a commit to stevenctl/istio that referenced this pull request Nov 24, 2020
istio-testing pushed a commit that referenced this pull request Nov 26, 2020
* [release-1.8] Improve wle reg/unreg robustness (#28354)

* [release-1.8] vm auto-registration: handle changes to WorkloadGroup (#28804)

* [release-1.8] merge labels and annotations from workloadgroup.spec.metadata (#29164)

* merge labels and annotations from workloadgroup.sepc.metadata

* fix lint

* nil checks

* imports

* more nil checks

(cherry picked from commit f5f5f78)

* remove undefined var from cherry-pick
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cla: yes Set by the Google CLA bot to indicate the author of a PR has signed the Google CLA. release-notes-none Indicates a PR that does not require release notes. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants