-
Notifications
You must be signed in to change notification settings - Fork 546
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement new DualReadMonitor for UriProperties #999
Conversation
d2/src/main/java/com/linkedin/d2/balancer/dualread/DualReadLoadBalancerMonitor.java
Show resolved
Hide resolved
d2/src/main/java/com/linkedin/d2/balancer/dualread/UriPropertiesDualReadMonitor.java
Show resolved
Hide resolved
d2/src/main/java/com/linkedin/d2/balancer/dualread/UriPropertiesDualReadMonitor.java
Outdated
Show resolved
Hide resolved
d2/src/main/java/com/linkedin/d2/balancer/dualread/UriPropertiesDualReadMonitor.java
Outdated
Show resolved
Hide resolved
d2/src/main/java/com/linkedin/d2/balancer/dualread/UriPropertiesDualReadMonitor.java
Show resolved
Hide resolved
d2/src/main/java/com/linkedin/d2/balancer/dualread/UriPropertiesDualReadMonitor.java
Outdated
Show resolved
Hide resolved
d2/src/main/java/com/linkedin/d2/balancer/dualread/UriPropertiesDualReadMonitor.java
Outdated
Show resolved
Hide resolved
This new monitor should be as expensive as the previous one since the previous one called the `.equals` method on `UriProperties`, which compares every single URI. It now produces a similarity metric which represents the fraction of matching hosts present in the ZK response and the observer response. The Cluster and Service monitors remain unchanged.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Thanks for the improvement
d2/src/main/java/com/linkedin/d2/balancer/dualread/DualReadLoadBalancerJmx.java
Show resolved
Hide resolved
d2/src/main/java/com/linkedin/d2/balancer/dualread/DualReadStateManager.java
Show resolved
Hide resolved
d2/src/main/java/com/linkedin/d2/balancer/dualread/UriPropertiesDualReadMonitor.java
Outdated
Show resolved
Hide resolved
d2/src/main/java/com/linkedin/d2/balancer/dualread/UriPropertiesDualReadMonitor.java
Outdated
Show resolved
Hide resolved
d2/src/main/java/com/linkedin/d2/balancer/dualread/UriPropertiesDualReadMonitor.java
Outdated
Show resolved
Hide resolved
d2/src/main/java/com/linkedin/d2/balancer/dualread/UriPropertiesDualReadMonitor.java
Outdated
Show resolved
Hide resolved
d2/src/main/java/com/linkedin/d2/balancer/dualread/UriPropertiesDualReadMonitor.java
Outdated
Show resolved
Hide resolved
d2/src/main/java/com/linkedin/d2/balancer/dualread/UriPropertiesDualReadMonitor.java
Outdated
Show resolved
Hide resolved
d2/src/main/java/com/linkedin/d2/balancer/dualread/UriPropertiesDualReadMonitor.java
Outdated
Show resolved
Hide resolved
d2/src/main/java/com/linkedin/d2/balancer/dualread/UriPropertiesDualReadMonitor.java
Outdated
Show resolved
Hide resolved
d2/src/test/java/com/linkedin/d2/balancer/dualread/UriPropertiesDualReadMonitorTest.java
Outdated
Show resolved
Hide resolved
d2/src/test/java/com/linkedin/d2/balancer/dualread/UriPropertiesDualReadMonitorTest.java
Outdated
Show resolved
Hide resolved
d2/src/test/java/com/linkedin/d2/balancer/dualread/UriPropertiesDualReadMonitorTest.java
Outdated
Show resolved
Hide resolved
d2/src/test/java/com/linkedin/d2/balancer/dualread/UriPropertiesDualReadMonitorTest.java
Outdated
Show resolved
Hide resolved
d2/src/test/java/com/linkedin/d2/balancer/dualread/UriPropertiesDualReadMonitorTest.java
Outdated
Show resolved
Hide resolved
boolean completed = done.await(1000, TimeUnit.MILLISECONDS); | ||
if (completed) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Technically, if !completed
the test should fail right? You're better off doing a bar done.await()
and adding the timeout flag in the @Test
annotation. That way you don't have to explicitly handle the failure case. Similarly, I would add throws Exception
to the test signature and avoid the explicit catch (InterruptedException e)
. It should make the test easier to read
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually it does time out rarely, so I added the “completed” check. I didn’t see anything wrong in the code though, do you?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
turns out it was because the test has multi-thread accessing some non-thread safe vars in the monitor. Made the getters for testing synchronized.
d2/src/main/java/com/linkedin/d2/balancer/dualread/UriPropertiesDualReadMonitor.java
Outdated
Show resolved
Hide resolved
d2/src/main/java/com/linkedin/d2/balancer/dualread/UriPropertiesDualReadMonitor.java
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Summary
Previous dual read matches URI properties only when the versions are the same (same ZK data served from ZK vs Observer), but with Kafka data served, the version will never be the same. And the OutOfSync metric was never useful since only one version of uri properties is cached for each cluster, when uri data changes frequently, the OutOfSync increases and never comes down.
This new URI matching now produces a similarity metric which represents the fraction of matching uris present in the ZK response and the observer response, just the same as how observer is monitoring the cluster uri similarity. The similarity metric may go up during downstream deployment/updates, but eventually it should be 1. The goal is to verify that client side response processing logic in INDIS is not causing uri data to drift (given that Kafka data matches ZK data on observer already).
The Cluster and Service monitors remain unchanged.
(This new monitor should be as expensive as the previous one since the previous one called the
.equals
method onUriProperties
, which compares every single URI).Test Done
Added unit tests. Will do manual qei testing with d2-proxy in the counterpart container PR for adding the config.