Fix exponential backoff minimum duration #2578
Conversation
Codecov Report
Additional details and impacted files
@@ Coverage Diff @@
## main #2578 +/- ##
===========================================
- Coverage 65.96% 27.80% -38.17%
===========================================
Files 419 419
Lines 20545 20553 +8
===========================================
- Hits 13553 5715 -7838
- Misses 6039 14252 +8213
+ Partials 953 586 -367
... and 137 files with indirect coverage changes
Flags with carried forward coverage won't be shown.
internal/util/ctxutil/ctxutil.go
Outdated
lowestValue := int64(math.Min(capMilliseconds, maxMilliseconds)) - 100

// Math/rand is good enough because we don't need the randomness to be cryptographically secure.
return time.Duration(rand.Int63n(lowestValue)+100) * time.Millisecond
I think that would panic with cap = 50 * time.Millisecond, for example. Please add a test.
Should we add a comment that the provided cap cannot be smaller than 100ms, or should we, for example, decrease the minimum to 0ms if the cap is smaller than 100ms?
Also, do we even need a minimum duration?
For now I've updated the function's description to mention the expected parameters, added a custom panic message similar to the one we already have for a negative retry, and added a test case that validates that it actually panics.
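A minimal sketch of the guard and behaviour described above, assuming a DurationWithJitter(cap, retry) signature; the panic messages, the base value, and the inclusive +1 on the random range are illustrative assumptions, not the exact FerretDB code:

package ctxutil

import (
    "math"
    "math/rand"
    "time"
)

// DurationWithJitter returns a random duration between 100ms and cap that grows
// exponentially with retry. It panics if retry is negative or cap is below 100ms.
func DurationWithJitter(cap time.Duration, retry int64) time.Duration {
    if retry < 0 {
        panic("retry must be a non-negative number")
    }
    if cap < 100*time.Millisecond {
        panic("cap must be at least 100 milliseconds")
    }

    // The upper bound grows exponentially with each retry; base = 2 is assumed here.
    const base = 2
    maxMilliseconds := 100 * math.Pow(base, float64(retry))
    capMilliseconds := float64(cap.Milliseconds())

    // Draw from [0, upper] and add the 100ms floor back, so the result stays in
    // [100ms, min(cap, max)]. The +1 keeps rand.Int63n from panicking when cap
    // is exactly 100ms.
    upper := int64(math.Min(capMilliseconds, maxMilliseconds)) - 100

    // math/rand is good enough; the randomness does not need to be cryptographically secure.
    return time.Duration(rand.Int63n(upper+1)+100) * time.Millisecond
}

A matching panic test could look like this (in a _test.go file, assuming "testing" and testify's assert package are imported):

func TestDurationWithJitterSmallCap(t *testing.T) {
    assert.Panics(t, func() {
        DurationWithJitter(50*time.Millisecond, 1)
    })
}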
internal/util/ctxutil/ctxutil.go
Outdated
return time.Duration(rand.Int63n(lowestValue)) * time.Millisecond
lowestValue := int64(math.Min(capMilliseconds, maxMilliseconds)) - 100
Is it the same 100 as base?
It's the same 100 that is added to the rand.Int63n result later to set the minimum duration to 100ms.
The base is a constant that is multiplied on each retry to set the maximum duration.
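As a rough illustration of that distinction (the base value of 2 and the exact growth formula are assumptions here, not taken from this PR):

package main

import (
    "fmt"
    "math"
)

func main() {
    // The 100ms floor stays fixed, while the upper bound grows with each retry.
    const base = 2.0 // assumed value for illustration
    for retry := 1; retry <= 5; retry++ {
        maxMs := 100 * math.Pow(base, float64(retry)) // 200, 400, 800, 1600, 3200
        fmt.Printf("retry %d: jitter drawn from [100ms, %.0fms]\n", retry, maxMs)
    }
}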
Just a tiny comment 🤗 nice work!
Well, we want the graph of exponential backoff to look like an exponential curve, like in the original article. I'm not sure how to read the current graph.
Are you talking about the dot graph or the linear graph from the article? The dot one shows call count over time, and it seems to me that we don't produce enough data to tell how many "calls" we have over time, as we don't collect data over time but just collect all generated durations. I've made a graph that shows how often each duration was returned, and it seems that the function returns durations fairly evenly; repetitions are mostly within the first "chunk" of values, which is also expected.
The exponential graph in their article shows the impact of the number of competing clients on the number of calls, which requires implementing clients that reconnect and either succeed or fail to connect.
I'm still curious to see the distribution, but the approach looks good to me.
In principle, we can run a few loops that call DurationWithJitter in parallel (with retry being increased in each loop) and save the results. That will give a distribution similar to the idea of parallel connections; see the sketch below.
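A minimal sketch of such a simulation, assuming the DurationWithJitter(cap, retry) signature discussed above; the client and retry counts are arbitrary:

package ctxutil

import (
    "sync"
    "testing"
    "time"
)

// TestDurationWithJitterDistribution runs several "clients" in parallel, each one
// increasing its retry counter, and collects the generated durations so they can
// be bucketed and plotted afterwards.
func TestDurationWithJitterDistribution(t *testing.T) {
    const (
        clients    = 10
        maxRetries = 8
    )

    var mu sync.Mutex
    durations := make([]time.Duration, 0, clients*maxRetries)

    var wg sync.WaitGroup
    for c := 0; c < clients; c++ {
        wg.Add(1)
        go func() {
            defer wg.Done()
            for retry := int64(1); retry <= maxRetries; retry++ {
                d := DurationWithJitter(5*time.Second, retry)

                mu.Lock()
                durations = append(durations, d)
                mu.Unlock()
            }
        }()
    }
    wg.Wait()

    // In a real test these durations would be bucketed (per retry, per client)
    // and written out, e.g. as CSV, to plot the distribution.
    t.Logf("collected %d durations", len(durations))
}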
07bf915
@AlekSi New chart based on the results of the new test. It shows the total number of calls for a simulation with n clients.
On a call with Elena, I just realized that I misread the graphs in the original article, and that was the source of confusion. I will send a PR explaining what I meant.
Description
Closes #1720.
Readiness checklist
- I ran task all, and it passed.
- I set Assignees (@FerretDB/core), Labels, Project, and project's Sprint fields.