Fix timing of new replication test #8807
In the CI valgrind run, I saw that even the fast replica (the one that wasn't paused) didn't complete the replication fast enough, and ended up getting disconnected by timeout. Additionally, due to a typo in uname, we never actually ran the CPU efficiency part of the test.
P.S. I ran the new code with valgrind in GitHub Actions twice, and it passed.
In GitHub Actions CI with valgrind, I saw that even the fast replica (the one that wasn't paused) didn't complete the replication fast enough, and ended up getting disconnected by timeout. Additionally, due to a typo in uname, we never actually ran the CPU efficiency part of the test.
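To illustrate how a typo in a platform check can silently disable part of a test, here is a minimal Python sketch. It is not the actual Tcl test code from this PR; the helper names `detect_os` and `should_measure_cpu`, and the misspelled command `unamee`, are hypothetical and used only for illustration. The assumption is that the OS probe swallows errors, so a misspelled command yields an empty result instead of a failure.

```python
import subprocess


def detect_os(command="uname"):
    """Return the trimmed output of `command`, or "" if it cannot be run.

    Swallowing the error is what hides the typo: a misspelled command
    looks the same as "not the platform we care about".
    """
    try:
        result = subprocess.run(
            [command], capture_output=True, text=True, check=True
        )
    except (OSError, subprocess.CalledProcessError):
        return ""
    return result.stdout.strip()


def should_measure_cpu(os_name):
    """Hypothetical gate: only measure CPU efficiency on Linux."""
    return os_name == "Linux"


if __name__ == "__main__":
    # Correct spelling: the gate reflects the real platform.
    print("uname  ->", should_measure_cpu(detect_os("uname")))
    # Misspelled command (assumed not to exist): the gate is always False,
    # so the CPU part is skipped on every platform without any error.
    print("unamee ->", should_measure_cpu(detect_os("unamee")))
```

With a check like this, the test keeps passing while the gated part never runs, which is why the problem is easy to miss until someone reads the test output closely.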
Seems to be the same problem? Both timed out together: https://github.com/redis/redis/runs/4111888647?check_suite_focus=true#step:5:3559
and again here (both on MacOS):
and indeed I see that both replicas get disconnected by timeout, so I suppose we may want to further increase the timeout (from 2 to 3 seconds)?
Yes, it also makes no sense to me. I took another look at the failure logs and didn't find anything. I noticed a strange phenomenon:
somehow the log lines here are four seconds apart
This is very odd. This explains why the other replica doesn't get its data on time (we didn't re-enable the read event on the pipe in but I really don't understand how these two prints can be spaced so far apart, the next line after printing
and again (MacOS daily), the same as the previous one: #8807 (comment)
this time we didn't have a 4-second lag between