Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement new DualReadMonitor for UriProperties #999

Merged
merged 17 commits into from
May 10, 2024
Merged

Implement new DualReadMonitor for UriProperties #999

merged 17 commits into from
May 10, 2024

Conversation

PapaCharlie
Copy link
Member

@PapaCharlie PapaCharlie commented Apr 29, 2024

Summary

Previous dual read matches URI properties only when the versions are the same (same ZK data served from ZK vs Observer), but with Kafka data served, the version will never be the same. And the OutOfSync metric was never useful since only one version of uri properties is cached for each cluster, when uri data changes frequently, the OutOfSync increases and never comes down.
This new URI matching now produces a similarity metric which represents the fraction of matching uris present in the ZK response and the observer response, just the same as how observer is monitoring the cluster uri similarity. The similarity metric may go up during downstream deployment/updates, but eventually it should be 1. The goal is to verify that client side response processing logic in INDIS is not causing uri data to drift (given that Kafka data matches ZK data on observer already).

The Cluster and Service monitors remain unchanged.
(This new monitor should be as expensive as the previous one since the previous one called the .equals method on UriProperties, which compares every single URI).

Test Done

Added unit tests. Will do manual qei testing with d2-proxy in the counterpart container PR for adding the config.

PapaCharlie and others added 3 commits May 8, 2024 16:38
This new monitor should be as expensive as the previous one since the previous
one called the `.equals` method on `UriProperties`, which compares every single
URI. It now produces a similarity metric which represents the fraction of
matching hosts present in the ZK response and the observer response. The Cluster
and Service monitors remain unchanged.
Copy link
Collaborator

@brycezhongqing brycezhongqing left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks for the improvement

Comment on lines 180 to 181
boolean completed = done.await(1000, TimeUnit.MILLISECONDS);
if (completed) {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Technically, if !completed the test should fail right? You're better off doing a bar done.await() and adding the timeout flag in the @Test annotation. That way you don't have to explicitly handle the failure case. Similarly, I would add throws Exception to the test signature and avoid the explicit catch (InterruptedException e). It should make the test easier to read

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually it does time out rarely, so I added the “completed” check. I didn’t see anything wrong in the code though, do you?

Copy link
Contributor

@bohhyang bohhyang May 10, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

turns out it was because the test has multi-thread accessing some non-thread safe vars in the monitor. Made the getters for testing synchronized.

Copy link
Contributor

@bohhyang bohhyang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@bohhyang bohhyang merged commit 5128dd5 into master May 10, 2024
2 checks passed
@bohhyang bohhyang deleted the pc/compare branch May 10, 2024 22:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants